Pleco - 'Common Mode'

Dunhuang17

秀才
System: iPhone
Dictionaries: PLC, CC, and UNI

Hello,

I have been using Pleco for nearly six months now and am happy with it. At first I used it purely as a dictionary, but then I bought the flash card system, and now I use it as a learning resource.

When I am on the bus or the underground I use Pleco just to familiarise myself with characters, and most importantly to link the characters to words and really just to get a sense of how everything in the Chinese language fits together.

Thus I do a lot of searching and browsing, regularly use the 'Words Containing' feature, and generally end up on a bit of a journey. (The actual bus journey itself is over an hour daily!)

One thing has annoyed me though: the sheer number of characters (and thus words) which are completely redundant.

For example when I type in 'cong', it comes up with 24 different characters.

Of those 24, my university-educated friend knows just seven: 从聪丛匆葱囱琮.

The other 17 are either obscure, variations, or historical.

The same could be said for many, many characters (and words) in the dictionary.

Now I know that a wide variety of people use Pleco, from beginners to scholars of Classical Chinese. The fact that all of this is there is great, but could it be possible for Pleco to perhaps have a 'Common Mode' or 'Streamlined Mode' in the Settings menu?

At first I thought there must be a way of turning this off (or decluttering!), but amazingly there isn't! There is a switch for Simplified to Traditional, but nothing to streamline the dictionary for 'new learners' or just 'standard learners'.

A question one might ask: on what basis would a character be in the 'Common Mode'? However, characters and words are already sorted by frequency, so I don't suppose it would be too hard for Pleco to find a sensible frequency cut-off point.

A fluent speaker knows 4,000 characters, a graduate knows around 5,000. I'm sure somewhere around those figures a cut off point can be found.
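
As a rough illustration of the cut-off idea, here is a minimal sketch in Python. It is not how Pleco works internally: the function name `common_only`, the default cut-off of 5000, and the placeholder entries and ranks are all assumptions made up for the example.

```python
# Hypothetical sketch: hide search results whose frequency rank is past a cut-off.
# The entries and ranks below are invented placeholders, not real dictionary data.

def common_only(results, cutoff=5000):
    """Keep entries ranked within the cut-off, most frequent first.

    `results` is a list of (headword, frequency_rank) pairs, where rank 1 is
    the most frequent. Entries with no known rank (None) are treated as rare.
    """
    kept = [(word, rank) for word, rank in results
            if rank is not None and rank <= cutoff]
    return sorted(kept, key=lambda pair: pair[1])


sample = [("char_b", 4800), ("char_c", 9000), ("char_a", 120), ("char_d", None)]
print(common_only(sample))
```

With the default cut-off only `char_a` and `char_b` survive; raising `cutoff` lets the rarer entries back in, which is the kind of adjustable setting being asked for here.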

I just want a more streamlined mode, so that I come across the seven 'cong' 's that my friend knows, and not the others that are actually not used/needed. He says exactly the same for every other Pinyin word (don't get me started with 'shi'!). He of course knows which characters are still 'in use' and which just aren't, but I don't.

If this 'mode' is already implemented, then I am sorry! If it can be bought, then I will happily buy it.

If not, then I would love to hear views on this.

*** Somebody might suggest turning off some of the dictionaries (especially the CC and UNI dictionaries). However they have some good example sentences in the characters/words that I am interested in so I don't want to turn them off.
 

mikelove

皇帝
Staff member
To be honest, I'm not quite sure what benefit this would offer over the sorting by frequency we do now - how are you hurt by having less common characters show up farther down in search results? If you're trying to figure out which specific 'cong' somebody meant, that's going to be a matter of going through the list definition-by-definition anyway.

Best way I can think of to hack this together at the moment would be to add a list of the most common however-many-characters-you-want-supported to your flashcard database - then their entries will show up with a + in the top right corner so you'll know they're common characters because of that.

In theory this would not be too difficult to implement, but we'd need to see a lot more interest before we could consider it (as far as I know, you're the first person who's ever asked for this particular change).
 

HW60

状元
Dunhuang17 said:
"A fluent speaker knows 4,000 characters, a graduate knows around 5,000. I'm sure somewhere around those figures a cut off point can be found."

You could download the HSK flashcards and load them into a user dictionary. Then you have only about 5000 words. Move this dictionary to the top of your Manage Dictionaries list.
 

Dunhuang17

秀才
Thanks for your replies.

I guess it has not been asked before as most people use Pleco as a dictionary, not so much as a separate learning resource. However, having bought the flashcard add-on, and having lots of spare time with my phone, I use it in this way.

The HSK flashcards are not really at a 'native speaker' level though!! Of course there is no exact number of words or characters that a 'native' knows - it varies. However, what I am saying is that there is a large number of characters (and words) that a native speaker does not know, yet they are in the dictionary! There is also a setting for 'Rare Simplified' and 'Rare Traditional' which I have turned off. Perhaps the level of what counts as 'rare' could be lowered? (Surely if a university graduate doesn't know a character then it must count as rare! It is the Chinese equivalent of putting words like 'ratoon' and 'plethora' into a dictionary.)

The reason it occasionally annoys me is that I have used some words before which have garnered puzzled looks from some people, and then upon showing my phone to these people, I get even more puzzled looks, with people saying that they have never seen/used that character before. Admittedly it does tend to happen when I am using nouns which are not all that common (technical terms etc).

Overall I just think more options are good. Perhaps 'Learners' dictionary, 'Native' dictionary, 'Rare dictionary' etc, groups for 'Scientific terms', 'Botany terms', 'Business terms' etc, the ability to turn off proper nouns and idioms. Or allow the dictionary to just show proper nouns or idioms by turning off the other options etc. I just think that would lead to a more tailored dictionary depending on the level/interests of the user.

I am of course very happy with Pleco though, it is my favourite learning resource.
 

Shun

状元
Hi Dunhuang17,

you could also use the Tuttle Learner's Chinese-English dictionary with 3600 entries or the Tuttle Chinese-English Dictionary with 18'000/13'000 entries, these two should include only the commonly known words. If you activate "Sticky selection" and deactivate "Skip over on button tap" in Manage Dictionaries, you can search exclusively in one of these two dictionaries, and then if you need to know more about a particular word, you could change to the other dictionaries.

Regards, Shun
 

mikelove

皇帝
Staff member
Also, if you put a ! at the start of a search query it will automatically put flashcard words at the top of the results. And we will soon be adding an option to treat flashcards as their own 'dictionary' which will make it even easier to surface words from a particular subset of our supported vocabulary.

Those 'rare' settings are just for the handwriting recognizer, designed to improve accuracy by reducing the number of false matches if you know you're not going to be entering very uncommon characters.

What I'm still having trouble understanding is why this would be useful for Chinese-to-English - if you read or hear a Chinese word then you can be reasonably confident it's something that's still in use. So why do you need to limit the list of Chinese-English results if the most common ones are on top anyway?

It is an issue for English-to-Chinese, but the best way to avoid those 'never heard of this word before' situations with that is to get a good English-to-Chinese dictionary; that will generally spare you from accidentally digging up words or meanings that nobody uses.
 
Shun said:
"Hi Dunhuang17,

you could also use the Tuttle Learner's Chinese-English dictionary with 3600 entries or the Tuttle Chinese-English Dictionary with 18'000/13'000 entries, these two should include only the commonly known words. If you activate "Sticky selection" and deactivate "Skip over on button tap" in Manage Dictionaries, you can search exclusively in one of these two dictionaries, and then if you need to know more about a particular word, you could change to the other dictionaries.

Regards, Shun"

I would agree with this! Bought the Tuttle dictionary and I am very happy with it. It sounds like I use the dictionary in a similar way to you Dunhuang.

The Tuttle has 13,000 entries.
Unihan has 43,000 entries.
Pleco has 82,000 entries.

Apparently an English toddler at 2 and a half knows 500 words, a 16 year old knows 12,000 words, and a university graduate knows 21,000 words.

So Tuttle should get you to a 'normal' level without jargon etc.

mikelove said:
"What I'm still having trouble understanding is why this would be useful for Chinese-to-English - if you read or hear a Chinese word then you can be reasonably confident it's something that's still in use. So why do you need to limit the list of Chinese-English results if the most common ones are on top anyway?

It is an issue for English-to-Chinese, but the best way to avoid those 'never heard of this word before' situations with that is to get a good English-to-Chinese dictionary; that will generally spare you from accidentally digging up words or meanings that nobody uses."

I think a lot of the time you don't hear the words though. It is not a case of listening, it is speaking, so you want to know what the Chinese word for 'x' is. It has happened to me before: I will search for a word, there will be a common word at the top with a definition similar to the English word I am thinking of, but further down there is another (less common) word whose English translation appears more specific. So I choose that second word, but end up saying a word which is simply not in use.

One example was when I wanted to say the word 'cliff' (or perhaps it was 'crevice'; I have forgotten). Anyway, I said a word which was just not in use despite the fact that in the dictionary it was the top entry. The other translation was 'precipice', which of course is much less common in English than 'cliff' is, so I presumed it would be the same for Chinese, but it wasn't.
 
Having just searched for 'cong' now using Tuttle, it comes up with 6 of the 7 'cong' that your university-educated friend knows:

从聪丛匆葱囱

It does not include:

琮

which apparently only means 'jade jewellery' or 'gurgling' anyway (not important).

Not bad at all in my opinion. And it doesn't come up with any of the 17 obscure characters.
 
Dunhuang17 said:
"Perhaps the level of what counts as 'rare' could be lowered? (Surely if a university graduate doesn't know a character then it must count as rare! It is the Chinese equivalent of putting words like 'ratoon' and 'plethora' into a dictionary.)"

Perhaps you might want to get a reality check instead.

Most university graduates, that I have met at least, do not know the character 闩 - but that does not mean that it is 'rare' in any sense of the word.
 

Peter

榜眼
Currently I do something similar to OP's request by using a "Frequency Dictionary".

This is a simple dictionary I created that lists the frequency of each character in SUBTLEX and MODERN (http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO).

When I come across a new word with an unknown character, I use the dictionary to look up the frequency rank(s) of the new character. Currently, if it ranks beyond 3000th in both SUBTLEX and MODERN, I don't spend any effort on it.
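
The skip rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `should_skip`, the dictionary-based rank tables, and the choice to treat a character missing from a list as falling beyond the cut-off are all hypothetical, and the placeholder ranks are not real SUBTLEX or MODERN values.

```python
# Sketch of the "skip if beyond the cut-off in BOTH lists" rule.
# Rank tables map a character to its frequency rank (1 = most frequent).

def should_skip(char, subtlex_ranks, modern_ranks, cutoff=3000):
    """Return True when `char` ranks beyond `cutoff` in both lists.

    Assumption: a character missing from a list counts as beyond the cut-off.
    """
    beyond = float("inf")
    return (subtlex_ranks.get(char, beyond) > cutoff
            and modern_ranks.get(char, beyond) > cutoff)


# Placeholder ranks, invented purely for the example.
subtlex = {"x": 150, "y": 4200, "z": 3500}
modern = {"x": 200, "y": 5100, "z": 2900}

print([c for c in "xyz" if should_skip(c, subtlex, modern)])
```

Requiring both lists to agree is the conservative choice: a character that is rare in one corpus but common in the other (like `z` here) still gets studied.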

If I could make one wish, it would be to let users upload a list of characters, and have Pleco display search results differently when a headword contains a character not in the list. Color to distinguish results would be my choice!

Edit: example:

gui.png
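
The wish above (marking results whose headword contains a character outside a user-supplied list) could look roughly like the following Python sketch. The function names `unknown_chars` and `mark_results` are hypothetical, and nothing here reflects an actual Pleco feature; a real app would use the marking to pick a display colour.

```python
# Sketch: flag headwords that contain characters outside a user's known list.

def unknown_chars(headword, known):
    """Return the characters of `headword` not in `known`, in order, without repeats."""
    missing = []
    for ch in headword:
        if ch not in known and ch not in missing:
            missing.append(ch)
    return missing


def mark_results(headwords, known):
    """Pair each headword with the list of its characters that are not known."""
    return [(word, unknown_chars(word, known)) for word in headwords]


# Known list taken from the six 'cong' characters mentioned earlier in the thread.
known = set("从聪丛匆葱囱")
for word, missing in mark_results(["从", "匆匆", "葱茏"], known):
    print(word, "OK" if not missing else "contains unlisted: " + "".join(missing))
```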
 

HW60

状元
@Peter: do you have an idea why some of the first letters in my user dictionary made of frq.txt are truncated? When I paste frq.txt in an Excel sheet, all letters are visible.
 

Attachments

  • Screenshot_2016-03-16-16-54-37.png

Shun

状元
@HW60: FWIW, you could try loading this Frequency user dictionary which looks fine on my iOS device. I'll try it on Android later.
 

Attachments

  • Frequency.pqb.zip

HW60

状元
@Shun: Thank you, but the result looks exactly like my dictionary entry. Either an Android problem, or one of the reasons why @Mike does not like Samsung ...
 

mikelove

皇帝
Staff member
@HW60 - that's a weird one; what happens if you turn off Night Mode, do they appear then?

(Samsung actually are one of the less annoying OEMs in this particular department - Sony e.g. commits far greater fragmentation sins when it comes to font rendering)
 