A few questions about Pleco functions and dictionaries

mikelove

皇帝
Staff member
That doesn't seem to be the main reason most of our traditional users prefer traditional; most seem to simply be accustomed to it and so have an easier time reading it than simplified.

Oxford's traditional version was developed by them; the source dictionary for the CE half was simplified-first, but since OUP China is based in Hong Kong they generally do a very good/thorough job with traditional character conversions. Also the Longman Advanced dictionary is traditional-first and in fact traditional-only. (but has not sold very well)
 

timseb

举人
I learn traditional and traditional only, and I'm not sure how automatic conversions can help the cause when they undermine what traditional characters have going for them: precision and the very fact that simplified characters are derived from them.

As for traditional dictionaries not derived from simplified, what have we got? Only ABC, Oxford and MOE? Or really only MOE?
Xiandai Guifan is manually checked by them, right?
 

mikelove

皇帝
Staff member
Sorry, no, looks like there's a bug from a recent update to the catalog file - Guifan traditional is done by us. (I'll fix that right now)
 

timseb

举人
No, KEY's not a bug; their traditional characters did in fact come from them.
That's very good to know, and very important, since their traditional headwords are what I have been using in my Anki. Changing that now would be... horrible. :cool:

I'm looking at the Longman Advanced now. What more can be said about it than is in the information text? 47k words is not a lot but also not nothing. Can anything be said about how words have been chosen? For example, which of these are closer to the truth:

1. Good coverage of somewhat common words, but progressively weaker coverage the rarer the words become.
2. A stricter view on what a word is (splitting compounds that aren't really words per se surely could lead to fewer entries).
3. Somewhat random, missing both common and rare words.

The reason I'm asking is that I'm using Xiandai Guifan more as a filter than as an actual dictionary when I import (edited). When I add words from a book en masse, having a good dictionary with not too many entries means I get the best bang for the buck (time spent). After that I mass-change them to KEY (there are obviously some mismatches, but most seem fine).

Since the Longman has even fewer entries, it might be even better as a filter, especially if #2 is closest to the truth, but also #1. #3 would not be good. I hope I am able to communicate what I mean here.

The headwords that are simplified shouldn't be a big problem when I import simplified, since as stated earlier, T to S shouldn't generally be a big problem.
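The "dictionary as a filter" idea described above can be sketched outside Pleco as well. This is only a minimal illustration, assuming you have the candidate words from a book and a dictionary's headwords as plain Python collections; the function name and the sample data are hypothetical and have nothing to do with Pleco's own import logic.

```python
def filter_by_headwords(candidates, headwords):
    """Keep only candidates that are headwords, preserving order, dropping repeats."""
    seen = set()
    kept = []
    for word in candidates:
        if word in headwords and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


# Hypothetical sample data: a tiny "dictionary" and words segmented from a book.
sample_headwords = {"會計", "會議", "雲"}
sample_candidates = ["會計", "開會的", "雲", "會計"]

filtered = filter_by_headwords(sample_candidates, sample_headwords)
print(filtered)
```

A smaller headword set filters more aggressively, which is why the selection criteria of the dictionary (options 1 to 3 above) matter for this use.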
 

Fernando

秀才
It seems like the Commercial Press in HK does sell a rather old-fashioned CD-ROM version of a 繁體版 of 漢語大詞典. It has slightly fewer entries than the 簡體版. Whether they did the conversion themselves and whether or not it's reliable is, of course, a different matter.

The original publishers in China also seem to be working on a 2nd edition in 25 o_O volumes.
 

mikelove

皇帝
Staff member
I'm looking at the Longman Advanced now. What more can be said about it than is in the information text? 47k words is not a lot but also not nothing. Can anything be said about how words have been chosen?
You can download the demo version of it and Browse Entries to see exactly which words are included. I don't really have a good philosophical statement to give you on it but that should give you a pretty good idea.

It seems like the Commercial Press in HK does sell a rather old-fashioned CD-ROM version of a 繁體版 of 漢語大詞典.
Our data is based on that, actually. (and much of the content in the new 2nd edition was in the "Supplement" volume which we've already incorporated into our version)
 

timseb

举人
You can download the demo version of it and Browse Entries to see exactly which words are included. I don't really have a good philosophical statement to give you on it but that should give you a pretty good idea.
Thank you. I've downloaded the preview and the definitions look really good, easier than Xiandai Guifan according to me. Just a few more questions before I buy it:

1. Are the cards exportable? Since I'm using them in Anki that's important.
2. As you mentioned it's traditional only. How does that affect importing? I've had problems with most dictionary imports with a lot of characters, like 划 being parsed as 劃, 云 turning into 雲 and so forth. If it's traditional only, would that erase that problem? I'm talking about importing from txt here. If not, is there any workaround?
3. More of a bonus question. Do you have any concrete information about how many entries for individual characters the dictionary has? I'm looking at it as my vocabulary dictionary as well, so it's not super important, but would be interesting to know. I'm guessing several thousands at least?
 

Fernando

秀才
Our data is based on that, actually. (and much of the content in the new 2nd edition was in the "Supplement" volume which we've already incorporated into our version)
So the Commercial Press doesn't actually sell a traditional-only edition? I downloaded the demo again from Pleco and it is indeed impressive, just a little jarring to read a quotation in traditional just below the title of the book from which it was taken rendered in simplified! I guess I'll consider it again later...

云 turns into 雲 and so forth.
Those two characters actually provide a good test for traditional dictionaries. In the traditional set 雲 means "cloud" and 云 is an archaic word meaning "to speak". The latter one is still found in some idioms, such as 人云亦云, to repeat what other people say and not to think for oneself. 現代規範 fails at that, rendering it as "人雲亦雲" in one of the examples. The Oxford dictionary gets it right.
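To make the failure mode concrete, here is a minimal sketch of why a purely character-by-character simplified-to-traditional conversion breaks 人云亦云, and how a phrase-level exception avoids it. Both tables below are tiny hypothetical examples written for this illustration; they are not taken from any dictionary or from any real conversion tool.

```python
# Hypothetical one-to-one table: each simplified character maps to one "default" traditional form.
NAIVE_S2T = {"云": "雲"}

# Hypothetical phrase-level exceptions that must be matched before single characters.
PHRASE_EXCEPTIONS = {"人云亦云": "人云亦云"}


def naive_convert(text):
    """Convert character by character, ignoring context."""
    return "".join(NAIVE_S2T.get(ch, ch) for ch in text)


def phrase_aware_convert(text):
    """Try the longest known phrase at each position, then fall back to single characters."""
    phrases = sorted(PHRASE_EXCEPTIONS, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for phrase in phrases:
            if text.startswith(phrase, i):
                out.append(PHRASE_EXCEPTIONS[phrase])
                i += len(phrase)
                break
        else:
            out.append(NAIVE_S2T.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(naive_convert("人云亦云"))         # the idiom is wrongly turned into "cloud"
print(phrase_aware_convert("人云亦云"))  # the idiom is left alone
print(phrase_aware_convert("白云"))      # ordinary "cloud" is still converted
```

The same pair of strings can be used as a quick spot check on any converted traditional dictionary: search for 人云亦云 and see which form appears.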
 

timseb

举人
Those two characters actually provide a good test for traditional dictionaries. In the traditional set 雲 means "cloud" and 云 is an archaic word meaning "to speak". The latter one is still found in some idioms, such as 人云亦云, to repeat what other people say and not to think for oneself. 現代規範 fails at that, rendering it as "人雲亦雲" in one of the examples. The Oxford dictionary gets it right.
Exactly. I know most of the 2000-2500 most common characters already, so a lot of them are in my Anki deck just to keep reviewing them, and these have been my go-to to check how the matching works. The same goes for my earlier example, 鵪, to see if a dictionary explains characters that are almost never seen without their partner (in this case 鶉). The fact that over 2000 characters need to be changed retroactively is also the reason I need the matching to be correct, since I wouldn't have time to do it all manually. If I were starting from scratch now, I would have that time, but that is not my situation.
 

Fernando

秀才
Exactly. I know most of the 2000-2500 most common characters already, so a lot of them are in my Anki deck just to keep reviewing them, and these have been my go-to to check how the matching works. The same goes for my earlier example, 鵪, to see if a dictionary explains characters that are almost never seen without their partner (in this case 鶉). The fact that over 2000 characters need to be changed retroactively is also the reason I need the matching to be correct, since I wouldn't have time to do it all manually. If I were starting from scratch now, I would have that time, but that is not my situation.
I don't understand why you're trying to learn individual characters by rote with all their possible pronunciations. In modern Chinese, characters are words themselves or parts of words, and those words are what carry meaning. Sure, in classical Chinese every character was a word in itself, and so every character will have at least one ancient meaning, such as in the example you gave, which you'll probably find in a huge dictionary like 漢語大詞典, but this is much like a study in etymology. Take 會 as another example. It has three pronunciations I'm aware of: hui4 as in 會議, kuai4 as in 會計 and even gui4 in an alternative pronunciation of the place name 會稽 (otherwise pronounced "kuai4ji1"), but you only need to know all three pronunciations if you know the words "meeting", "accountancy" and that archaic place name.
 

timseb

举人
I don't understand why you're trying to learn individual characters by rote with all their possible pronunciations. In modern Chinese, characters are words themselves or parts of words, and those words are what carry meaning. Sure, in classical Chinese every character was a word in itself, and so every character will have at least one ancient meaning, such as in the example you gave, which you'll probably find in a huge dictionary like 漢語大詞典, but this is much like a study in etymology. Take 會 as another example. It has three pronunciations I'm aware of: hui4 as in 會議, kuai4 as in 會計 and even gui4 in an alternative pronunciation of the place name 會稽 (otherwise pronounced "kuai4ji1"), but you only need to know all three pronunciations if you know the words "meeting", "accountancy" and that archaic place name.
I can explain my system:

Please note that importing *all* readings was not my original plan, but instead the most common ones, like kTGHZ2013 in the Unihan data, but I have not found an authoritative list for traditional characters. In other words, all readings was what I got and I just rolled with it. I'd rather learn one too many than one too few.

The reason for this is quite simple. I have one Anki deck for characters, and one for vocabulary of two characters or more. It has saved me a lot of time. I have about 7500 words in my vocabulary deck by now and reviews are very quick. Please note that I already know the word 會計 and obviously the single character word 會, and that with most characters I'm keeping them fresh in my memory more than actually learning them. This has boosted my reading ability a lot. It has helped me stop mixing up characters, and made me realize that the reason I have been unsure of some readings is that I had not realized it was the *same* character (rare, but it happens). It also very quickly helps me learn most new vocabulary with almost zero effort, since I almost never run into new characters or new readings. It is also very good for surnames, which quite often are pronounced completely differently from what one would expect of the character in question, like 蓋 (ge3).

But again, if I find a better filtering system, it would be nice. That's partly what I've been trying to achieve lately. To take an extreme example: KEY has four readings and definitions of 厭. I say extreme because three of them are, I think at least, either archaic or "used as character XYZ", but I still know them all just because that was the system I had. I've had no problem remembering them, both the readings and the most common meanings of each reading. Now, I know that yan4 is the one I keep encountering, but since I'm not at an advanced level yet I cannot make that judgement on all characters.

The 會 example also shows why I think KEY has the best definitions for individual characters:

Gui4
n (in this pronunciation used in certain place names, such as Guìjī 會稽/会稽 "Mount Guiji" [Zhèjiāng 浙江 Province])
hui4
v | n | sn 1 meeting 2 meet with 3 can, could 4 understand, know (like a language) 5 be able to 6 will, be likely to, be possible, possibly 7 moment, short while 8 Hui (surname)
kuai4
v | n | sn 1 calculate, add up 2 accounting 3 Kuai (surname)

It is also for my mental health... I hate reading books and encountering new characters I have to look up. To be completely prepared and not encounter too many new characters is a good feeling, especially if I not only know the character per se, but know enough of it to understand how it functions in the sentence and what its reading probably is. Since I'm going to read a lot of 19th and 18th century material I've just made sure to learn the archaic ones as well until I find a better filtering system, since I'd rather get a few extra than miss a few.

If the Longman for example has fewer readings and has gotten rid of a lot of the archaic ones, that would be a great filtering method. I tried using the Xiandai Guifan as a filter, but it's missing a lot of common characters because it's not made in that way (since it's not a character dictionary). Please note that this is exactly what I've been chasing both here and on Chinese-Forums: a good list of what readings are actually common! Not only *the most common ones*, for too many characters have several readings where two or more actually are quite common.

And I guess I'm not the first to say this, but... the characters are kind of fun. About 2500 characters in I still think that.
 

timseb

举人
Which 19th and 18th material are you planning to read?
My partner is a PhD student in history and I'm planning on going the same way, starting my master's next year. He's doing older history (Swedish history), but at the moment I'm leaning towards Qing dynasty China. It's still very early on though and a lot can change.
 

Fernando

秀才
Get ready for a steep climb. Chinese evolved into its modern vernacular much later and more slowly than European languages, so 19th century literature is still filled to the brim with classicisms. In fact that was true pretty much all the way up to the Xinhai Revolution. I know this because I've translated works that quoted from late 19th century sources.
 

timseb

举人
Get ready for a steep climb. Chinese evolved into its modern vernacular much later and more slowly than European languages, so 19th century literature is still filled to the brim with classicisms. In fact that was true pretty much all the way up to the Xinhai Revolution. I know this because I've translated works that quoted from late 19th century sources.
I still have quite a few years before the heavy stuff (a master's degree doesn't require that much, to be honest). That's precisely why I do it the way I do it, and have quite a clear road map. I want to have read at least a few novels before the year is over (which doesn't seem to be a problem at the moment, but trust me, I know plans don't always work out exactly as planned). Contemporary novels are somewhat manageable by now, and I'm starting off with a few of them. I'll then move on to the 80s (baby steps), and then proceed to early 20th century novels.

In other words: prioritizing reading speed and recognition. Not until that is somewhat mastered do I see any point in taking a bite of older stuff. And for reading speed I need characters, and I need words. Lots of them. If there's one thing I've learned, it's that pretty good knowledge of thousands of words is much better than deep knowledge of hundreds.

It's also important to point out that moving a hundred years or so isn't frowned upon in the field, so if my master is about 20th century China that's no problem. I could even mostly use Swedish archival material if progress is slow.

This also means I have zero focus on speaking and writing. That goes for most historians I would say.
 

timseb

举人
I'm sorry to say Longman was not exportable. Perhaps that should be added to each dictionary's information text?
 

mikelove

皇帝
Staff member
Sorry, no, not exportable - I wish you'd waited for my reply. Email us if you'd like to cancel / refund that purchase.

We generally avoid listing that in dictionary information pages because we don't want to commit to a particular stance on that in the future, if e.g. a publisher's policy changes or if we license a new version of a dictionary and we're no longer allowed to export from it; we do not as a matter of policy guarantee that any particular dictionary's definitions will be exportable indefinitely. (except of course for open-source dictionaries like CC-CEDICT)

Re your other question, it imports like any other dictionary - there are internal simplified mappings we use for search/indexing, we just don't display them in the interface.
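As a rough picture of what "internal simplified mappings for search/indexing" can mean in general, here is a minimal sketch of an index that stores traditional headwords under a simplified key. This is not Pleco's actual implementation; the mapping table, the function names and the sample headwords are all hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical traditional-to-simplified table covering only the sample headwords.
T2S = {"會": "会", "計": "计", "雲": "云"}


def to_simplified(word):
    """Map each character to its simplified form; unknown characters pass through."""
    return "".join(T2S.get(ch, ch) for ch in word)


def build_index(headwords):
    """Group traditional headwords under their simplified key."""
    index = defaultdict(list)
    for headword in headwords:
        index[to_simplified(headword)].append(headword)
    return index


def lookup(index, query):
    """Find headwords for a query given in either character set."""
    return index.get(to_simplified(query), [])


index = build_index(["會計", "雲", "云"])
print(lookup(index, "会计"))  # simplified query finds the traditional headword
print(lookup(index, "會計"))  # traditional query finds it too
print(lookup(index, "云"))    # ambiguous: both 雲 and 云 share this simplified key
```

The last lookup shows why a simplified 云 in an import file can match more than one traditional headword: the simplified key alone does not say which one was meant.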
 