[Unofficial] Feature Request / Suggestion List

jurgen85 · May 27, 2020

Inspired by my other post in the Outlier thread:

If the Unihan dictionary is enabled, the distinction between simplified and traditional becomes quite messy. E.g. it now lists 师 as a traditional variant and 師 as simplified (when in the opposite mode).

——————————

Also, it would make sense to be able to search for the same sentence in either character set, i.e. both
#说话太直
and
#說話太直
should hit the same 譬如 entry in OCC.

——————————

Edit: I just noticed that a flashcard with the hanzi 这[這] will not be tested at all with "Remap cards to dicts" set to only the Outlier dicts.

mikelove · May 30, 2020

Unihan: thanks, will investigate. (We don't test much with it enabled as a regular dictionary, to be honest.)

# searches should generally work in both sets, but some dictionaries are tagged as supporting only one set or the other, and in those we don't even try to generate / search for the other set. If you search for a sentence in a dictionary like PLC that supports both, do you get results in both sets?

这 doesn't seem to have an SC/TC map in Outlier at the moment; we'll add it to our correction list. There aren't official SC/TC maps from Outlier yet, we generated our own quick-and-dirty ones for search indexing purposes but didn't want to put too much more into it than that since we expect they probably will add them eventually.

jurgen85 · May 31, 2020

Oxford (non-pocket) should have both sets provided by publisher, no?

Searching for that excerpt, #說話太直, should match either
說話太直率的 (from "bone"), or
譬如他, 就是個說話太直的人。(from "譬如")

The Pleco dictionary behaves the same way (#率師東征).

mikelove · May 31, 2020

Sorry, are you doing this with Pleco in simplified character mode?

Full text Chinese searches are limited to the current character set, matching the way that definitions / examples / etc are displayed in Pleco. In theory I suppose we could add an option to support them in both - it’s an intentional thing, not a bug or a technical limitation - but AFAIK in the many years since we added that behavior you’re the first person to complain.

jurgen85 · Jun 1, 2020

I suppose there are a few cases where it might give inappropriate matches (后, 系, ...), but many of those are either ancient (like 云) or names (like 并), i.e. fairly rare. The vast majority of simplifications like 話 > 话 only exist in one set each. Even our dearly beloved 發 vs 髮 shouldn't be a problem, since there is no traditional 发.

Perhaps any dubious matches from the secondary set (i.e. simplified characters that map to several different traditional characters) could be sorted after the primary-set matches.
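A minimal sketch of that sorting idea, assuming a simple one-character-to-one-character SC/TC table. The tiny `S2T` map and the function names below are hypothetical and nowhere near a real table, and this is not how Pleco actually implements search. It folds the query and each sentence to simplified, keeps exact hits as primary, and pushes hits that only match after folding to the end, flagging them as dubious when the query contains a simplified character with several traditional forms (like 发 for 發/髮).

```python
# Hypothetical toy map: simplified char -> list of traditional forms.
# A real SC/TC table would be far larger; these entries are illustrative only.
S2T = {
    "说": ["說"], "话": ["話"], "这": ["這"], "头": ["頭"],
    "发": ["發", "髮"],  # one simplified form, two traditional forms
    "后": ["后", "後"],
}
# Reverse map: every traditional form -> its simplified form.
T2S = {t: s for s, forms in S2T.items() for t in forms}

RANK = {"primary": 0, "secondary": 1, "dubious": 2}


def to_simplified(text: str) -> str:
    """Fold a string to simplified characters, one char at a time."""
    return "".join(T2S.get(ch, ch) for ch in text)


def is_ambiguous(text: str) -> bool:
    """True if any char folds to a simplified form with several traditional forms."""
    return any(len(S2T.get(ch, [])) > 1 for ch in to_simplified(text))


def search_both_sets(query: str, sentences: list[str]) -> list[tuple[str, str]]:
    """Match the query in either character set; exact (primary-set) hits sort first."""
    key = to_simplified(query)
    hits = []
    for s in sentences:
        if query in s:
            hits.append((s, "primary"))
        elif key in to_simplified(s):
            # Only matched after folding, i.e. found via the other character set.
            tag = "dubious" if is_ambiguous(query) else "secondary"
            hits.append((s, tag))
    # sorted() is stable, so sentences keep their original order within each rank.
    return sorted(hits, key=lambda hit: RANK[hit[1]])
```

With `["說話太直的人", "他说话太直了"]`, searching `说话太直` returns the simplified sentence as primary and the traditional one as secondary; a search for `发` would return traditional hits like 頭髮 and 發生 after any simplified hits, tagged as dubious.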

daal · Jul 1, 2020

Sometimes when I come across an unfamiliar word, I wonder how common it is. Would it be possible for Pleco to show this somehow? One rough indication would be for Pleco to display HSK levels.

mikelove · Jul 1, 2020

That's already supported, just add category tags to your HSK level categories in Organize Cards ((i) button on iOS / long press on Android) and then words will show up on the definition screen with those tags.

timseb · Jul 14, 2020

One thing I would appreciate, which I can't see suggested anywhere, is an "Import all matching dictionary entries" option under "Ambiguous entries". For example, words like 好事, 結果, 琢磨 and so on have two or more entries in Guifan, and it would be nice for all of them to be added automatically. I mostly add words when I'm about to read a book, a chapter, a news article and so on, so I won't know beforehand which entry I will encounter, and therefore prefer to learn all of them so I can identify them.

BenJackson · Jul 14, 2020

daal said:
Sometimes when I come across an unfamiliar word, I wonder how common it is. Would it be possible for Pleco to show this somehow? One rough indication would be for Pleco to display HSK levels.

I believe (@mikelove can confirm?) that a lot of results lists in Pleco are naturally sorted by some rough frequency, such as dictionary searches, the "words" tab, the "chars" tab. I don't know if there's any direct way to view that information.

There's a useful user dictionary here: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/ which has raw frequency data from several sources. As a collector of frequency data myself, I have considered making another one like it.

timseb · Jul 14, 2020

Noticed the reply was not meant for me (though I got a notification). Deleted.

BenJackson · Jul 14, 2020

timseb said:
One thing I would appreciate, which I can't see suggested anywhere, is an "Import all matching dictionary entries" option under "Ambiguous entries". For example, words like 好事, 結果, 琢磨 and so on have two or more entries in Guifan, and it would be nice for all of them to be added automatically. I mostly add words when I'm about to read a book, a chapter, a news article and so on, so I won't know beforehand which entry I will encounter, and therefore prefer to learn all of them so I can identify them.

This is a hack, but I've noticed that if I import a word multiple times with "ambiguous entries" set to "use first" each subsequent import will get the next available entry. So you might get what you want by just importing your list multiple times.

I build a lot of word lists based on things I plan to read. After hitting issues like you describe (and with the import "feature" I mentioned above) I mostly leave "ambiguous entries" set to "prompt" and I consult the source to see the word in context before I choose an entry. Back when I imported 1000 words at a time instead of a few hundred I did just let it pick the first entry, though.

(Also, re: your deleted reply, and my suggestion of the frequency data dictionary: It's definitely a good idea to go to "dictionaries" under "import cards" and make sure the frequency "dictionary" is below the line so it does not get used as a flashcard source!)

mikelove · Jul 14, 2020

BenJackson said:
I believe (@mikelove can confirm?) that a lot of results lists in Pleco are naturally sorted by some rough frequency, such as dictionary searches, the "words" tab, the "chars" tab. I don't know if there's any direct way to view that information.

Yes, and no, not directly viewable - we don't want people building anything around that data (e.g. using it to come up with lists of words to study) because we don't consider it reliable enough for that; picking out the more common items from a small subset of all Chinese words (e.g. those that contain the word 'cake' in their definition) is about the most we'd use it for.

timseb · Jul 14, 2020

BenJackson said:
This is a hack, but I've noticed that if I import a word multiple times with "ambiguous entries" set to "use first" each subsequent import will get the next available entry. So you might get what you want by just importing your list multiple times.

Tried this now, but unfortunately it did not work.

BenJackson · Jul 14, 2020

timseb said:
Tried this now, but unfortunately it did not work.

Does your import have the pinyin column? I assume in that case there would be no ambiguity (e.g. 好事 as hǎoshì vs hàoshì). I always import bare Hanzi lists because I'm working from sources that don't clarify pronunciation.

timseb · Jul 14, 2020

BenJackson said:
Does your import have the pinyin column? I assume in that case there would be no ambiguity (e.g. 好事 as hǎoshì vs hàoshì). I always import bare Hanzi lists because I'm working from sources that don't clarify pronunciation.

My experiment was without pinyin, just a plain hanzi list. I wonder why it did not work.

timseb · Jul 17, 2020

If I haven't misunderstood anything, capitalized pinyin entries don't seem to be treated as ambiguous when importing. Shouldn't they be? For example, a txt file with "沈 shen3" gives me two entries to choose from in the KEY dictionary instead of three, because the third entry is capitalized (Shen3, the surname). In this example Shen as a surname is also included in the non-capitalized entry, but the question is more general: shouldn't capitalized entries still turn up among the ambiguous ones to choose from?

mikelove · Jul 17, 2020

No, we treat capitalization as meaningful so we don't match lower/upper or upper/lower unless there's no exact match.

(however, this behavior will be highly customizable in 4.0)
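A minimal sketch of the matching rule described above (exact case first, case-insensitive only as a fallback), assuming dictionary entries can be reduced to headword/pinyin pairs. The `Entry` type, `match_entries` function and the entry notes are hypothetical placeholders modelled on timseb's 沈 example, not Pleco's actual data or code.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    headword: str
    pinyin: str
    note: str


# Hypothetical entries for 沈, modelled on the KEY example above: two lowercase
# shen3 entries plus one capitalized Shen3 (surname) entry. Notes are placeholders.
ENTRIES = [
    Entry("沈", "shen3", "entry A"),
    Entry("沈", "shen3", "entry B"),
    Entry("沈", "Shen3", "surname"),
]


def match_entries(headword: str, pinyin: str, entries: list[Entry]) -> list[Entry]:
    """Case-sensitive match first; fall back to case-insensitive only if nothing matched."""
    same_word = [e for e in entries if e.headword == headword]
    exact = [e for e in same_word if e.pinyin == pinyin]
    if exact:
        return exact
    return [e for e in same_word if e.pinyin.lower() == pinyin.lower()]
```

Under this rule `shen3` yields only the two lowercase entries and `Shen3` only the surname, which is why the capitalized entry never shows up as an ambiguous choice for a lowercase import; only a spelling with no exact match (e.g. `SHEN3`) falls back to all three.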

giokve · Jul 17, 2020

Not really a suggestion, but the word "stress" (verb/noun) is missing from NEC.

mikelove · Jul 17, 2020

Thanks!

timseb · Jul 17, 2020

Another question from me: is there any reason it isn't considered a missing entry when importing if the pinyin in the txt file is not even close to the readings it tries to match? For example, shao4 for 削 lets me choose between xue1 and xiao1 (and many others) when I import. It would be nice to have a checkbox for a stricter definition of "missing entry".
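A minimal sketch of what such a stricter option could look like, assuming each headword's readings are available as numbered-tone pinyin strings. The `strict` flag, the function names and the tone-insensitive middle tier are all assumptions for illustration, not existing Pleco settings or behaviour. In lenient mode it roughly mirrors what is described above (offer every reading when nothing matches); in strict mode it reports the line as missing instead.

```python
def strip_tones(pinyin: str) -> str:
    """Lowercase and drop tone digits, e.g. 'Xiao1' -> 'xiao'."""
    return "".join(ch for ch in pinyin if not ch.isdigit()).lower()


def import_candidates(pinyin: str, readings: list[str], strict: bool = False) -> list[str]:
    """Readings to offer for one imported line; an empty list means 'missing entry'.

    `strict` is a hypothetical option, not an existing Pleco setting.
    """
    exact = [r for r in readings if r == pinyin]
    if exact:
        return exact
    # Assumed middle tier: same syllable, different (or missing) tone.
    close = [r for r in readings if strip_tones(r) == strip_tones(pinyin)]
    if close:
        return close
    # Nothing is even close: lenient mode offers every reading, strict mode reports missing.
    return [] if strict else list(readings)
```

For 削 with readings `["xue1", "xiao1"]`, an imported `shao4` returns both readings in lenient mode but an empty list (a missing entry) with `strict=True`, while `xiao4` still resolves to `xiao1` in either mode.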
