MakeDict direction

AJ

秀才 (Scholar)
I would be very interested in hearing from those of you who have used MakeDict to construct custom dictionaries. What advice would you give to someone just starting to compile data with the intent of moving it into a custom dictionary? How did you structure your word entries beyond the basic format prescribed in the MakeDict readme? How does MakeDict handle one-to-many or many-to-many relationships? I'm assuming there's no synthesis there, so did you manually create multiple entries combining all possible translations?

I realize I'm rambling a bit. Bottom line: what basic steps and advice would you give, including how you would format definitions?

Thanks in advance for your input.

mikelove

皇帝 (Emperor)
Staff member
Since nobody's chimed in here, I'd say that the most important thing is to keep your data in a clean, tagged/separated text format - the next version of PlecoDict should have considerably more robust dictionary creation abilities and you'll want to be able to reformat your data to work with those.
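One way to read "clean, tagged/separated text format" in practice: keep every field in its own column so a short script can re-emit the data in whatever layout a future tool wants. The minimal Python sketch below assumes a hypothetical three-field, tab-separated master file (headword, pinyin, definition); this is not the MakeDict readme format, only an illustration of the principle.

```python
from dataclasses import dataclass

# Hypothetical master-file layout (an assumption, not MakeDict's format):
# one entry per line, fields separated by tabs: headword, pinyin, definition.
FIELDS = ("headword", "pinyin", "definition")


@dataclass
class Entry:
    headword: str
    pinyin: str
    definition: str


def parse_master(text: str) -> list[Entry]:
    """Read tab-separated lines into structured entries, skipping blank lines."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            raise ValueError(
                f"line {lineno}: expected {len(FIELDS)} fields, got {len(parts)}"
            )
        entries.append(Entry(*parts))
    return entries


def export(entries: list[Entry], template: str) -> str:
    """Re-emit every entry through a format template, e.g. for a new import tool."""
    return "\n".join(template.format(**vars(e)) for e in entries)


master = "书\tshu1\tbook\n预订\tyu4ding4\tto reserve\n"
print(export(parse_master(master), "{headword} [{pinyin}] {definition}"))
# 书 [shu1] book
# 预订 [yu4ding4] to reserve
```

Because the fields never get merged into display text, switching to a different input format later only means writing a new template.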

On the subject of entry structure, of the dictionaries we offer I personally think the ABC has the best format, so you could look at entries from that for some inspiration. Those circled numbers the ABC uses are standard Unicode characters that are built into PlecoDict's font file, so if you use them in your own entries they should show up correctly.

I'm not sure what you mean about one to many relationships - we use a flat (non-relational) database format for dictionary entries, so if you create multiple entries with the same headword they'll each show up as a separate entry in the dictionary - the way entries are ordered and structured in your input file is the way they'll be ordered and structured in the finished dictionary.
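To picture the flat behaviour described above, here is a toy Python model (an illustration only, not PlecoDict's actual storage): entries sharing a headword are simply separate records, nothing links them, and a lookup returns them in the order they appear in the input file.

```python
# A flat, non-relational entry list (illustrative data): entries are kept
# in input-file order and no entry refers to any other entry.
entries = [
    ("行", "xing2", "to walk; to go"),
    ("银行", "yin2hang2", "bank"),
    ("行", "hang2", "row; line; profession"),
]


def lookup(headword):
    """Return every entry whose headword matches, in input-file order."""
    return [e for e in entries if e[0] == headword]


for hw, py, defn in lookup("行"):
    print(hw, py, defn)
# 行 xing2 to walk; to go
# 行 hang2 row; line; profession
```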

AJ

秀才 (Scholar)
Thanks Mike for the info. If I can keep you engaged here for another comment/question or two that would be great.

  • I've played around some with the sample.txt file, creating sample .pdb files just to see what different changes would look like. I can see that MakeDict picks up the first pinyin and renders it in bold with tone marks instead of numbers. Is there any way I can introduce bold text elsewhere in the definition?
  • If I make a longer definition that includes alternate pinyin, I see that I can input that pinyin with tone marks and it shows up correctly in the dictionary.
  • I can find the circled number symbols that the ABC uses under Insert Symbol in MS Word (the Unicode hex listing). Is it safe to assume that most other symbols from that same listing (Asian text font, Unicode hex) should work?
  • My sample dbs all have ? as their icon in PlecoDict. Are the other icons (e.g. ABC, NWP, OX) graphics? How can I add an icon for custom dbs?
  • For the one-to-many, many-to-many relationships, you said
mikelove said:
I'm not sure what you mean about one to many relationships - we use a flat (non-relational) database format for dictionary entries, so if you create multiple entries with the same headword they'll each show up as a separate entry in the dictionary -

I'm not assuming this is anything that can be handled now, but here's a thought for your continued work on the custom dictionary app.

Take a simple English word like "book". If I search for "book", I get a dictionary entry along these lines:

book n. 1. 书 shū; 书籍 shūjí 2. 卷 juàn 3. 名册 míngcè; v. 预订 yùdìng . . .

All of these combined make up one single dictionary entry. If I looked up the above Chinese words in the C-E dictionary, each would also yield multiple English words as its translation. If I am going to build an E-C and a C-E dictionary at the same time, it would be nice to be able to manage those relationships somehow. This might require some type of weighted pairing that MakeDict would then use to order the resulting definition. Coding for that level of many-to-many relationships and weighted pairings would be a large piece of work, but it would be awesome in reducing the amount of input work.
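The weighted-pairing idea could look something like the minimal Python sketch below. It assumes each English-Chinese pair is entered once with a numeric weight (higher means listed earlier); the weights are invented for illustration, MakeDict has no such feature, and parts of speech are not modelled. Both the E-C and C-E lists are derived from the same pair data.

```python
from collections import defaultdict

# Hypothetical input: each translation pair is entered exactly once,
# with an invented weight saying how central that sense is.
pairs = [
    ("book", "书", 10),
    ("book", "书籍", 8),
    ("book", "卷", 3),
    ("reserve", "预订", 9),
    ("book", "预订", 5),
]


def build_index(pairs, src, dst):
    """Group pairs by field `src`; order each group's `dst` terms by weight."""
    index = defaultdict(list)
    for pair in pairs:
        index[pair[src]].append((pair[2], pair[dst]))
    return {
        head: [term for _, term in sorted(group, key=lambda g: -g[0])]
        for head, group in index.items()
    }


english_chinese = build_index(pairs, src=0, dst=1)
chinese_english = build_index(pairs, src=1, dst=0)

print(english_chinese["book"])  # ['书', '书籍', '预订', '卷']
print(chinese_english["预订"])  # ['reserve', 'book']
```

Entering a pair once and deriving both directions keeps the two dictionaries consistent; the hard editorial part, choosing sensible weights, remains manual.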

sfrrr

状元 (Top Scholar)
I'd like to throw in a plea for self-control. I rather like the serendipity involved in inputting parts of phrases, going back and forth, etc. That's how I learn all the interesting words.

Sandra


BTW, you probably won't hear such lofty thoughts when I think up a feature I think PD should have.

mikelove

皇帝 (Emperor)
Staff member
AJ - you can use bold tags by manually inserting the private-use Unicode characters U+EAB2 and U+EAB3 into entries; U+EAB2 marks the start of a bold range and U+EAB3 marks the end of it. I don't think Word's character palette supports private-use characters, though, so you may need to find another Unicode text editor for this (we use EmEditor ourselves).
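Rather than typing the private-use characters by hand, a script can insert them. A minimal Python sketch using only the two code points given above (U+EAB2 opens a bold range, U+EAB3 closes it); whether PlecoDict tolerates nested or unclosed ranges isn't stated, so the checker here conservatively rejects both, which is an assumption.

```python
BOLD_START = "\uEAB2"  # private-use code point that opens a bold range
BOLD_END = "\uEAB3"  # private-use code point that closes a bold range


def bold(text: str) -> str:
    """Wrap a span of entry text in the bold start/end markers."""
    return BOLD_START + text + BOLD_END


def bold_ranges_balanced(entry: str) -> bool:
    """True if every start marker is closed before the next one opens."""
    open_range = False
    for ch in entry:
        if ch == BOLD_START:
            if open_range:
                return False  # nested start: treated as an error here
            open_range = True
        elif ch == BOLD_END:
            if not open_range:
                return False  # end marker with no matching start
            open_range = False
    return not open_range


definition = "1. " + bold("书") + " shū; 书籍 shūjí"
print(bold_ranges_balanced(definition))  # True
```

The resulting text still has to be saved in a Unicode encoding that MakeDict accepts; check the readme for which one.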

For the circled number symbols, the characters begin at code point U+2460. If you select SimSun from the Font menu, that's one that definitely includes those characters. A lot of the other symbols in there actually don't work, as we had to manually design the Unicode symbol characters on Palm and didn't bother doing so for any that we weren't actually using.
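Since the circled numbers occupy consecutive code points starting at U+2460 (① through ⑳ end at U+2473), they can be generated rather than picked from a palette. A minimal Python sketch; it assumes all twenty render in PlecoDict, which the post only confirms for the characters the ABC actually uses.

```python
CIRCLED_ONE = 0x2460  # U+2460 CIRCLED DIGIT ONE; ① through ⑳ run to U+2473


def circled(n: int) -> str:
    """Return the Unicode circled number for 1 <= n <= 20."""
    if not 1 <= n <= 20:
        raise ValueError("this block of circled numbers only covers 1-20")
    return chr(CIRCLED_ONE + n - 1)


def number_senses(senses: list[str]) -> str:
    """Join sense definitions, each prefixed with a circled number."""
    return " ".join(circled(i) + s for i, s in enumerate(senses, start=1))


print(number_senses(["book", "volume", "roster"]))  # ①book ②volume ③roster
```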

There's no way yet to change that ? icon, though there likely will be one in Pleco 2.0. As for the one-to-many relationships, I'm afraid what you describe would be rather difficult at the moment, not only because of the coding required but also because most of our dictionary data isn't very well tagged (having been converted from prepress layout files), and hence we'd need to put in an enormous amount of editorial work just to allow PlecoDict to easily distinguish between and match up the different parts of a dictionary entry. It's something we might consider doing eventually, but it's highly doubtful that it will make it into version 2.0. I do think the idea has a lot of promise, though - in an electronic format it's rather silly to confine ourselves to the simple flat non-relational lists of entries that paper dictionaries use.

sfrrr - if we did implement this it would almost certainly be optional, so nothing to worry about there.

AJ

秀才 (Scholar)
Mike - thanks for all of the info and help. I may well have another question or two as I continue to work with this. I've downloaded EmEditor, so that should help. It's been a few years since I have coded anything, by hand or otherwise, so I need all the help I can get.

As far as the one-to-many / many-to-many relationships go, I realize that's a big piece of work. I also may not have fully communicated my intention. Relational database functionality in a dictionary would, in my opinion, be a great feature. That wasn't specifically what I was aiming for, however. Even if Pleco were written to handle such functionality, the dictionaries themselves would then have to be rebuilt with those relationships coded.

I was speaking purely from the perspective of building and maintaining a specialized dictionary. A neat database front end would be nice: one that would, say, allow me to enter a Chinese word and its pinyin only once and thereafter code its relationship to any number of English translations, and likewise enter an English word only once and code its relationship to any number of Chinese words. Building such a front end shouldn't be too difficult; I'm just not sure it needs to be a priority unless there is a large enough group out there building custom dictionaries. The main difficulty probably still lies in weighting the pairings.
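The "enter once, link many times" front end described above can be sketched as a small in-memory store. Everything here is hypothetical (the `TermStore` class and its method names are invented for illustration); a real tool would more likely use a database with a terms table and a links table, and weighting is left out.

```python
class TermStore:
    """Toy front end: each term is stored once; links refer to term IDs."""

    def __init__(self):
        self.ids = {}  # (language, text) -> term ID
        self.texts = []  # term ID -> text
        self.pinyin = {}  # Chinese term ID -> pinyin, entered once
        self.links = []  # (Chinese ID, English ID) pairs, in entry order

    def _term(self, language, text):
        """Return the ID for a term, creating it on first use."""
        key = (language, text)
        if key not in self.ids:
            self.ids[key] = len(self.texts)
            self.texts.append(text)
        return self.ids[key]

    def add_chinese(self, hanzi, pinyin):
        """Enter a Chinese word and its pinyin a single time."""
        self.pinyin[self._term("zh", hanzi)] = pinyin

    def link(self, hanzi, english):
        """Relate an already-entered Chinese word to an English word."""
        zh = self.ids[("zh", hanzi)]  # KeyError if the word was never added
        pair = (zh, self._term("en", english))
        if pair not in self.links:  # ignore duplicate links
            self.links.append(pair)

    def english_for(self, hanzi):
        zh = self.ids[("zh", hanzi)]
        return [self.texts[en] for z, en in self.links if z == zh]

    def chinese_for(self, english):
        en = self.ids.get(("en", english))
        return [f"{self.texts[z]} {self.pinyin[z]}" for z, e in self.links if e == en]


store = TermStore()
store.add_chinese("书", "shū")
store.add_chinese("预订", "yùdìng")
store.link("书", "book")
store.link("预订", "book")
store.link("预订", "reserve")
print(store.chinese_for("book"))  # ['书 shū', '预订 yùdìng']
print(store.english_for("预订"))  # ['book', 'reserve']
```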

Sandra - didn't mean to scare you with my lofty thoughts. I'll try to muster an appropriate level of self-control henceforth. My intention was really not to alter the way PlecoDict is used, but rather to ease the effort of creating a custom dictionary. Then again, perhaps not everyone using PlecoDict has a serendipitous learning experience as their purpose.

mikelove

皇帝 (Emperor)
Staff member
I read sfrrr's comment to mean that the lofty thoughts were her own ('serendipity' etc) - i.e., that she was suggesting that she wouldn't romanticize the old way of doing something if it was something that she did want changed. If that's not the case, then I would remind everybody to please try to keep a positive tone - there are no bad suggestions, if you want to criticize someone then criticize us (or better yet, criticize Microsoft :D), but try not to say anything negative about other users.

AJ - I see your point, there's actually an English-English database like this called WordNet and it would be interesting to see a similar project emerge for Chinese. And you're correct that it wouldn't be all that difficult for us to code - in fact, as we're planning to allow hyperlinks in our new database creator system it should be very easy to set up a database like this. Have you had a chance to play around with Adsotrans at all? It's a pretty active free online Chinese dictionary project. There's a discussion forum for it at http://www.chinese-forums.com/forumdisplay.php?f=34 which the project's creator checks regularly. I mention them because they have the largest free database of Chinese dictionary data that I know of, and they could be a useful resource to base your specialized dictionary on. Plus they're very open to feedback and might be willing to implement some of your changes in their main database. (not that we won't, but they'll probably do it a lot faster than we could)

AJ

秀才 (Scholar)
Mike & Sandra - Mea culpa. I did not perceive Sandra's comments as negative, merely tongue-in-cheek. My response to her was also not intended to be negative, but likewise facetious.

Mike - I'm familiar with Adsotrans and will explore it and its forums further. WordNet is new to me, so I will check it out.

Warm regards all.