The MoE dictionary is now open source

abhoriel · Apr 8, 2013

Thanks a lot for this! It's a really beautiful dictionary.
Simplified headwords would indeed be a nice addition, though.

alex_hk90 · Apr 8, 2013

abhoriel said:
Thanks a lot for this! It's a really beautiful dictionary.
Simplified headwords would indeed be a nice addition, though.

You're welcome.

I'm happy to add simplified headwords, if there is a good enough conversion script that deals with the exceptions where a single traditional character can map to multiple simplified characters (as well as the simple one-to-one conversions obviously).

sandwich · Apr 13, 2013

Just noticed a couple of characters with wrong readings when pron system is set to zhuyin. 欸 is "?", and 誒 is "ề". Should be ㄟˋ and ㄝˋ respectively. No idea how often this is a problem (and don't care about who to blame), but might be an idea to give pleco the zhuyin and let it sort out generating the pinyin.

Also are entries that have "(又音)" in their pronunciation a non issue? (see: 欸 and 呆). Wrt 又音, no way to merge them into their original entry right? Given that some are already like this. (see: 腌臢)

Edit: A couple of other things: some characters have "(變)" plus an additional pronunciation, and 兒 has incorrect zhuyin. It shows "˙" instead of "ㄦ". (see: 蔓兒 for both)
Edit2: Ok, looks like the 兒 thing is a Pleco bug since other dictionaries have the same issue.

mikelove · Apr 13, 2013

sandwich said:
Edit: A couple of other things: some characters have "(變)" plus an additional pronunciation, and 兒 has incorrect zhuyin. It shows "˙" instead of "ㄦ". (see: 蔓兒 for both)
Edit2: Ok, looks like the 兒 thing is a Pleco bug since other dictionaries have the same issue.

Yep, glitch in the Zhuyin conversion table that correctly maps 'er' but not 'r'. I believe it's fixed on Android but it would be awkward to merge that change into an interim release on iOS. (very much looking forward to not having two different code bases anymore)

alex_hk90 · Apr 14, 2013

sandwich said:
Just noticed a couple of characters with wrong readings when pron system is set to zhuyin. 欸 is "?", and 誒 is "ề". Should be ㄟˋ and ㄝˋ respectively. No idea how often this is a problem (and don't care about who to blame), but might be an idea to give pleco the zhuyin and let it sort out generating the pinyin.

I'm not sure if you can import the Zhuyin using flashcards, at least I don't know how to do it.

sandwich said:
Also are entries that have "(又音)" in their pronunciation a non issue? (see: 欸 and 呆). Wrt 又音, no way to merge them into their original entry right? Given that some are already like this. (see: 腌臢)

Is the entry for 呆 a typo or is ai2 (not dai2) really an alternate pronunciation? A benefit of not merging them is that the alternate pronunciations are searchable by the Pinyin. Regarding the feasibility of merging them, as long as there is no ambiguity in the entry that it is referring to then it should be possible (if a little tricky).

sandwich said:
Edit: A couple of other things: some characters have "(變)" plus an additional pronunciation, and 兒 has incorrect zhuyin. It shows "˙" instead of "ㄦ". (see: 蔓兒 for both)
Edit2: Ok, looks like the 兒 thing is a Pleco bug since other dictionaries have the same issue.

Yeah I noticed the "(變)" ones - it's from the original data. I was considering splitting them like the "(又音)" ones:
viewtopic.php?f=20&t=3606&start=30#p29339 (point 3.)
viewtopic.php?f=20&t=3606&start=75#p29433
I haven't checked to see if there is a pattern as to why sometimes multiple pronunciations are in the Pinyin field and sometimes it has been split into multiple entries. It's probably possible to either merge or split all of them, but I didn't want to do that until I had worked out what should be done and if there was a reason/pattern to the split/combined entries.

sandwich · Apr 17, 2013

alex_hk90 said:
I'm not sure if you can import the Zhuyin using flashcards, at least I don't know how to do it.

Ah, I just kinda assumed you could swap them and it would work. Obviously not. I haven't noticed any other similar issues yet, so maybe these should just be special cased to 'ei' and 'eh'?

alex_hk90 said:
Is the entry for 呆 a typo or is ai2 (not dai2) really an alternate pronunciation? A benefit of not merging them is that the alternate pronunciations are searchable by the Pinyin. Regarding the feasibility of merging them, as long as there is no ambiguity in the entry that it is referring to then it should be possible (if a little tricky).

I'm not sure what you are asking here. Personally I would assume the dictionary is right. Unfortunately the (又音) don't have usage notes, so if they weren't searchable then there would be no point to having them. Main thought was that they are kinda hidden from view and the actual info they refer to is in the other entry. (I don't know how the multiple entry stuff works, so maybe this is just a temporary non issue on ios?).

alex_hk90 said:
Yeah I noticed the "(變)" ones - it's from the original data. I was considering splitting them like the "(又音)" ones:
http://www.plecoforums.com/viewtopic.ph ... =30#p29339 (point 3.)
http://www.plecoforums.com/viewtopic.ph ... =75#p29433
I haven't checked to see if there is a pattern as to why sometimes multiple pronunciations are in the Pinyin field and sometimes it has been split into multiple entries. It's probably possible to either merge or split all of them, but I didn't want to do that until I had worked out what should be done and if there was a reason/pattern to the split/combined entries.

Ah, sorry. Missed that you had already discussed it.

alex_hk90 · Apr 19, 2013

sandwich said:
Ah, I just kinda assumed you could swap them and it would work. Obviously not. I haven't noticed any other similar issues yet, so maybe these should just be special cased to 'ei' and 'eh'?

In all honesty I haven't tried it - I don't know anything about Zhuyin.

sandwich said:
I'm not sure what you are asking here. Personally I would assume the dictionary is right.

It was just an aside on that particular entry, I would've thought dai2 was a more obvious alternate pronunciation for dai1 than ai2 would be.

sandwich said:
Unfortunately the (又音) don't have usage notes, so if they weren't searchable then there would be no point to having them. Main thought was that they are kinda hidden from view and the actual info they refer to is in the other entry. (I don't know how the multiple entry stuff works, so maybe this is just a temporary non issue on ios?).

I'm not sure what (temporary non-)issue you are talking about?

sandwich said:
Ah, sorry. Missed that you had already discussed it.

No worries - do you have any suggestions of how to resolve these?

feng · Apr 19, 2013

audreyt said:
The missing entries are all variant characters; they have no distinct semantics, and it's safe to discard them.

Variants are important for people reading genuine old books (not modern editions) or doing research into characters.

alex_hk90 said:
- Sometimes the Pinyin has a note before it, like <又音> or <讀音>; this should probably be moved into the main body of the definition as well,...

sandwich said:
Unfortunately the (又音) don't have usage notes, so if they weren't searchable then there would be no point to having them.

又音: "also pronounced", often these are the same as PRC pronunciations, and the analogous form of Xinhua Zidian's 舊讀 (which are often Taiwan pronunciations). I am not really sure that each did this for the purpose of cross-listing the other's pronunciation, but rather because there are historical issues with the change in pronunciation on either side of the straits. In fact, though I can't think of an example, I am sure that some of them are not such. In any case, they don't have usage notes because they are not, in my experience, related to usage. It's a "You say tomato. I say tomato" kind of a thing.
讀音: these are the reading pronunciations for the classical language by which, for example, 白 bai2 is also bo2.
語音: standard, colloquial pronunciation.

alex_hk90 said:
Is the entry for 呆 a typo or is ai2 (not dai2) really an alternate pronunciation?

Yes, really. Taiwan has some old pronunciations that the PRC no longer uses.

Yiliya said:
著 (don't simplify when it's pronounced zhù), very common character that causes most of the confusion in Trad -> Simp conversions
徵 (don't simplify when it's pronounced zhǐ), this is a rare usage, but still, MoE has it
於 (don't simplify when it's pronounced wū), also rare

Also, the 幺/么/麼/麽 confusion. Basically, Trad 么 = Simp 幺 (yāo), Trad 麼 = Simp 么 (me) OR 麽 (mó). This way, 么麼 (yāomó) gets simplified to 幺麽.

Another thing to consider is that the MoE dictionary uses a number of archaic traditional characters throughout the whole dictionary, case in point - 祕 (instead of the nowadays commonly accepted 秘).

* 著: zhuo2 is also in Xinhua Zidian
* Although 么 (yao1) is the official standard in Taiwan, even 《新編國語日報辭典》(Taiwan's equivalent of 《現代漢語詞典》 or 《新華字典》) uses 幺, since 么 is a variant, historically.
* 祕 is the official standard in Taiwan. 秘 is for the PRC (and a variant in origin).

mikelove said:
but iOS actually includes both simplified- and traditional-styled variants of its built-in STHeiti font (with the attendant character / punctuation / etc changes)

At least on a Mac, STHeiti does not display 敢 with a 丅 on top of the 耳, but rather a 乛. There are a number of characters like this. PRC traditional varies occasionally from Taiwan traditional _by font_ no matter what one types (and varies plenty of times for other reasons). Nearly all computer fonts for Chinese on English-language operating systems are entirely PRC, even for traditional characters. Taiwan's MoE has fonts for free and most computers have "BiaoKai" which seems to be the same, but I won't swear to it.

mikelove said:
FWIW, here's a longer list of TC->multiple SC mappings, including a couple of Extension B/C ones which might be considered more "variants":

There are some errors in that list, unless it is a display issue. One example: 锺 is not a character in the PRC, though it is typeable. PRC uses 钟 for both 鐘 and 鍾. There are other such examples there.

A couple of thoughts:

1) I find Taiwan's MoE very responsive. Lately I have been emailing two different offices at the MoE (including the one that controls this dictionary and others) once or twice a month for some research I am doing and they are great. Academia Sinica can go . . . but anyway MoE is responsive.

2) I hope you can do this and that you can save the bo-po-mo (re comment above somewhere). I for one am very sick of always looking at Hanyu Pinyin.

alex_hk90 · Apr 20, 2013

feng said:
Yes, really. Taiwan has some old pronunciations that the PRC no longer uses.

Interesting.

feng said:
* 著: zhuo2 is also in Xinhua Zidian
* Although 么 (yao1) is the official standard in Taiwan, even 《新編國語日報辭典》(Taiwan's equivalent of 《現代漢語詞典》 or 《新華字典》) uses 幺, since 么 is a variant, historically.
* 祕 is the official standard in Taiwan. 秘 is for the PRC (and a variant in origin).

Is there an official source that states all this information (on the Traditional to Simplified conversions), or is it just common knowledge?

feng said:
2) I hope you can do this and that you can save the bo-po-mo (re comment above somewhere). I for one am very sick of always looking at Hanyu Pinyin.

The data has columns for bopomofo:

Code:

sqlite> select bopomofo, bopomofo2, pinyin
   ...> from heteronyms
   ...> where rowid > 100000 and rowid < 100010;
ㄈㄨˇ ㄊㄧㄢˊ|fǔ tián|fǔ tián
ㄩㄥˇ|yǔng|yǒng
ㄩㄥˇ ㄐㄩˋ|yǔng jiù|yǒng jù
ㄩㄥˇ ㄌㄨˋ|yǔng lù|yǒng lù
ㄩㄥˇ ㄉㄠˋ|yǔng dàu|yǒng dào
ㄅㄥˊ|béng|béng
ㄅㄥˊ ㄩㄥˋ|béng yùng|béng yòng
ㄋㄧㄥˋ|nìng|nìng
ㄋㄧㄥˊ|níng|níng

Which column would you want to have? At worst it could probably be imported into the main definition text (of course I could try just replacing the Pinyin with it, but I don't know if Pleco supports that for flashcard/user dictionary imports).
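
For anyone who would rather pull those columns out with a script than with the sqlite shell, here is a minimal Python sketch. The table name `heteronyms` and the three column names come from the query above; the in-memory database, the sample rows (copied from the output above) and the `fetch_readings` helper are only stand-ins so that the snippet runs on its own.

```python
import sqlite3


def fetch_readings(conn, limit=5):
    """Return (bopomofo, bopomofo2, pinyin) tuples from the heteronyms table."""
    cur = conn.execute(
        "SELECT bopomofo, bopomofo2, pinyin FROM heteronyms "
        "ORDER BY rowid LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# Stand-in for the real database file: an in-memory table with the same
# three columns and a few rows copied from the shell output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE heteronyms (bopomofo TEXT, bopomofo2 TEXT, pinyin TEXT)"
)
conn.executemany(
    "INSERT INTO heteronyms VALUES (?, ?, ?)",
    [
        ("ㄩㄥˇ", "yǔng", "yǒng"),
        ("ㄅㄥˊ", "béng", "béng"),
        ("ㄋㄧㄥˋ", "nìng", "nìng"),
    ],
)

readings = fetch_readings(conn)
for bopomofo, bopomofo2, pinyin in readings:
    # Same pipe-separated layout that the sqlite shell prints by default.
    print(bopomofo, bopomofo2, pinyin, sep="|")
```

With the real data, `sqlite3.connect(":memory:")` would be replaced by a connection to the actual database file, and whichever column is wanted could be written into the import file.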

mikelove · Apr 20, 2013

feng said:
At least on a Mac, STHeiti does not display 敢 with a 丅 on top of the 耳, but rather a 乛. There are a number of characters like this. PRC traditional varies occasionally from Taiwan traditional _by font_ no matter what one types (and varies plenty of times for other reasons). Nearly all computer fonts for Chinese on English-language operating systems are entirely PRC, even for traditional characters. Taiwan's MoE has fonts for free and most computers have "BiaoKai" which seems to be the same, but I won't swear to it.

That's why STHeiti on iPhone also has that STHeitiTC variant, which displays 敢 and 骨 and other problematic characters in the correct way for Taiwan.

feng said:
There are some errors in that list, unless it is a display issue. One example: 锺 is not a character in the PRC, though it is typeable. PRC uses 钟 for both 鐘 and 鍾. There are other such examples there.

True, hadn't looked that carefully... just double-checked to make sure that particular mistake isn't in any of our dictionary conversions and it seems like it isn't, thankfully.

Do you have another list that you'd recommend alex_hk90 (or whoever eventually converts this to simplified) use?

feng said:
1) I find Taiwan's MoE very responsive. Lately I have been emailing two different offices at the MoE (including the one that controls this dictionary and others) once or twice a month for some research I am doing and they are great. Academia Sinica can go . . . but anyway MoE is responsive.

That hasn't been my experience with them, sadly.

feng said:
2) I hope you can do this and that you can save the bo-po-mo (re comment above somewhere). I for one am very sick of always looking at Hanyu Pinyin.

Pleco has BoPoMoFo support built-in for all dictionaries (auto-converts from Pinyin), so it's not really necessary to extract it specifically from this one.

feng · Apr 20, 2013

alex_hk90 said:
feng said:

* 著: zhuo2 is also in Xinhua Zidian
* Although 么 (yao1) is the official standard in Taiwan, even 《新編國語日報辭典》(Taiwan's equivalent of 《現代漢語詞典》 or 《新華字典》) uses 幺, since 么 is a variant, historically.
* 祕 is the official standard in Taiwan. 秘 is for the PRC (and a variant in origin).

Is there an official source that states all this information (on the Traditional to Simplified conversions), or is it just common knowledge?

All official. But in lots of different places. The PRC has various official lists. Taiwan has one main official list and then other lists for very uncommon characters. They go about it in very different ways, with issues to be figured out even within both sets of lists -- and between Taiwan and the PRC is a larger issue to figure out all the different forms and disappearing characters and such.

The parts I wrote about variants are not 'official'. You have to research to get that gold, though it is easy to do the larger part of that on Taiwan's online 《異體字字典》 which is quick, easy, and free. One can (and if it is important, must) look in serious paper character dictionaries that at least attempt to base themselves on historical principles such as 《正中形音義綜合大字典》 or 《漢語大字典》（第二版） and other sorts of dictionaries, either ancient or modern about the ancient. I figure it beats having a heroin habit! Though it cost me as much to buy all those books.

Sorry, I don't have any sort of hand-held device, so I can't give an opinion regarding your specific question about placement, other than to say I am in a long term love affair with bo-po-mo :mrgreen:

mikelove said:
Do you have another list that you'd recommend alex_hk90 (or whoever eventually converts this to simplified) use?

I am in the middle of researching all this as a small part of a larger project I am working on. Looking at your list again: 鉋刨,铇 : 铇 does not exist in the PRC, officially. As with the character in the previous post, it is something that theoretically should exist based on List 2 (Zong Biao), but the PRC's Yitizi List proscribes 鉋 altogether, which makes whatever List 2 might do to it moot. 卻卻,却: the PRC proscribes 卻. Still others there. And, as you of course know, there are several funky characters with weird rules about simplification such as 線 (can't type the simplified form; I don't mean 綫／线) and 馀 and 摺 and others (only simplify if your life depends on it). Funny, with three of ten appendices dealing with simplification issues I never thought to make a list for multiple simplified forms. Actually, I think the list is a bit smaller than what you gave. I will take a look at it some more on Monday (or Tuesday . . .).

I was actually staying away from List 1 and List 2 in general (other than to note the simplifications), except where they mess things up (outside of mere simplification) in Taiwan's list of 4,808 common characters that I am using as the corpus for my project. The problem is that even outside of almost 5,000 characters there are still lots of issues like this, so to make a comprehensive list would require of me rather more effort. I have been focusing on the list of 4,808 characters as they represent nearly all the characters one would ever want for anything outside of the numerous rare forms encountered in Chinese history or classical literature. Ideally I want to modify the list for a future project, adding and deleting a hundred or so characters each way to make it "perfect" for daily use. The list provides me with a set of parameters that allow me to finish my project before the next Mayan cycle rolls around (how's that for a Chinese reference? It is.).

I am afraid I may not have answered your question or made much sense. Don't be shy about refocusing my attention.

mikelove said:
That hasn't been my experience with them, sadly.

You emailed onile@mail.naer.edu.tw ? Them is the dictionary people. Actually, http://email.moe.gov.tw/EDU_WEB/sendmail/send.php?sGo=1
would probably be better since you are asking about a larger legal issue. And the second is the office that gives better responses. Of course, I realize you need more than an email, but this can start things off. I will PM you their phone number. Never did phone them as my current situation is not conveniently set up for overseas calls, but they have given me their office number more than once. Their office certainly can't handle your question, but I am confident they can get you to someone who can.

mikelove said:
Pleco has BoPoMoFo support built-in for all dictionaries (auto-converts from Pinyin), so it's not really necessary to extract it specifically from this one.

Reason number 783 to get Pleco! :idea:

alex_hk90 · Apr 21, 2013

feng said:
All official. But in lots of different places. The PRC has various official lists. Taiwan has one main official list and then other lists for very uncommon characters. They go about it in very different ways, with issues to be figured out even within both sets of lists -- and between Taiwan and the PRC is a larger issue to figure out all the different forms and disappearing characters and such.

The parts I wrote about variants are not 'official'. You have to research to get that gold, though it is easy to do the larger part of that on Taiwan's online 《異體字字典》 which is quick, easy, and free. One can (and if it is important, must) look in serious paper character dictionaries that at least attempt to base themselves on historical principles such as 《正中形音義綜合大字典》 or 《漢語大字典》（第二版） and other sorts of dictionaries, either ancient or modern about the ancient. I figure it beats having a heroin habit! Though it cost me as much to buy all those books.

Sorry, I don't have any sort of hand-held device, so I can't give an opinion regarding your specific question about placement, other than to say I am in a long term love affair with bo-po-mo :mrgreen:

Thanks for all the information - interesting stuff.

feng said:
I am in the middle of researching all this as a small part of a larger project I am working on. Looking at your list again: 鉋刨,铇 : 铇 does not exist in the PRC, officially. As with the character in the previous post, it is something that theoretically should exist based on List 2 (Zong Biao), but the PRC's Yitizi List proscribes 鉋 altogether, which makes whatever List 2 might do to it moot. 卻卻,却: the PRC proscribes 卻. Still others there. And, as you of course know, there are several funky characters with weird rules about simplification such as 線 (can't type the simplified form; I don't mean 綫／线) and 馀 and 摺 and others (only simplify if your life depends on it). Funny, with three of ten appendices dealing with simplification issues I never thought to make a list for multiple simplified forms. Actually, I think the list is a bit smaller than what you gave. I will take a look at it some more on Monday (or Tuesday . . .).

I was actually staying away from List 1 and List 2 in general (other than to note the simplifications), except where they mess things up (outside of mere simplification) in Taiwan's list of 4,808 common characters that I am using as the corpus for my project. The problem is that even outside of almost 5,000 characters there are still lots of issues like this, so to make a comprehensive list would require of me rather more effort. I have been focusing on the list of 4,808 characters as they represent nearly all the characters one would ever want for anything outside of the numerous rare forms encountered in Chinese history or classical literature. Ideally I want to modify the list for a future project, adding and deleting a hundred or so characters each way to make it "perfect" for daily use. The list provides me with a set of parameters that allow me to finish my project before the next Mayan cycle rolls around (how's that for a Chinese reference? It is.).

I am afraid I may not have answered your question or made much sense. Don't be shy about refocusing my attention.

Essentially I am looking for an accurate way of converting the Traditional headwords to Simplified in this dictionary. The data has the Traditional characters and the Pinyin, so I'm thinking it could be done in two parts:
1. Convert all one-to-one Traditional to Simplified characters using a list of (Traditional, Simplified) pairs.
2. Convert all one-to-many Traditional to Simplified characters using a list of ({Traditional, Pinyin}, Simplified) pairs, assuming that {Traditional, Pinyin} to Simplified is a one-to-one mapping (i.e. there are no Traditional characters which convert to more than one Simplified characters once you consider the pronunciation).
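
To make the two-part idea concrete, here is a rough Python sketch. It assumes the headword and the Pinyin line up one syllable per character, separated by spaces. The tiny tables are illustrative only (the 著 and 麼 rules follow what was posted earlier in the thread), and `simplify`, `ONE_TO_ONE` and `BY_READING` are made-up names, not part of any existing tool.

```python
import unicodedata


def nfc(text):
    """Normalise so that precomposed and combining tone marks compare equal."""
    return unicodedata.normalize("NFC", text)


# Part 1: plain one-to-one pairs (a real table would be far longer).
ONE_TO_ONE = {"國": "国", "語": "语"}

# Part 2: ({Traditional, Pinyin}, Simplified) rules for one-to-many characters.
_RULES = [
    ("著", "zhù", "著"),  # not simplified when read zhù
    ("著", "zhe", "着"),
    ("麼", "mó", "麽"),
    ("麼", "me", "么"),
]
BY_READING = {(trad, nfc(syl)): simp for trad, syl, simp in _RULES}
AMBIGUOUS = {trad for trad, _, _ in _RULES}


def simplify(headword, pinyin):
    """Convert a Traditional headword using its space-separated Pinyin."""
    syllables = pinyin.split()
    if len(syllables) != len(headword):
        raise ValueError("need exactly one Pinyin syllable per character")
    out = []
    for char, syl in zip(headword, syllables):
        if char in AMBIGUOUS:
            key = (char, nfc(syl))
            if key not in BY_READING:
                # Fail loudly instead of guessing at an unlisted reading.
                raise KeyError(f"no rule for {char} read as {syl}")
            out.append(BY_READING[key])
        else:
            # Characters without an entry are assumed to be unchanged.
            out.append(ONE_TO_ONE.get(char, char))
    return "".join(out)


print(simplify("著名", "zhù míng"))  # 著名
print(simplify("什麼", "shén me"))   # 什么
print(simplify("國語", "guó yǔ"))    # 国语
```

Raising an error on an unlisted reading is deliberate: it surfaces any entry where the one-to-one assumption about {Traditional, Pinyin} does not hold, so those can be checked by hand.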

goldyn chyld · Apr 21, 2013

Speaking of Tw 異體字字典, you can find its wordlist here: https://github.com/kcwu/moedict-variants

I wonder if it'd be possible to make it work in Pleco. But it seems quite complicated, esp. since they often use an image to display a rare character...

feng · Apr 21, 2013

alex_hk90 said:
2. Convert all one-to-many Traditional to Simplified characters using a list of ({Traditional, Pinyin}, Simplified) pairs, assuming that {Traditional, Pinyin} to Simplified is a one-to-one mapping (i.e. there are no Traditional characters which convert to more than one Simplified characters once you consider the pronunciation).

There are some fickle simplifications, which I mentioned in my last two or three replies to this thread. There are also, as mentioned in those same replies, some incorrect simplifications going around. Getting it 99% right is easy; getting it 100% right takes more effort, as there are nitpicking little exceptions to worry about.

goldyn chyld said:
Speaking of Tw 異體字字典, you can find its wordlist here: https://github.com/kcwu/moedict-variants

I wonder if it'd be possible to make it work in Pleco. But it seems quite complicated, esp. since they often use an image to display a rare character...

The bottom of the page at the original site has "中華民國教育部　版權所有　 (c) 2000 Ministry of Education, R.O.C. All rights reserved." Is that open source now?

In any case, how does that thing at github work? Sorry, I am not familiar with this technical stuff. If all they have is a list of characters, that would be pointless. It is having all the scans of old dictionaries that makes that site such a time saver.

alex_hk90 · Apr 21, 2013

feng said:
There are some fickle simplifications, which I mentioned in my last two or three replies to this thread. There are also, as mentioned in those same replies, some incorrect simplifications going around. Getting it 99% right is easy; getting it 100% right takes more effort, as there are nitpicking little exceptions to worry about.

Yeah, it does seem to be the case that it is that last ~1% or so that is the issue. What I'm looking for is a list/table/database that includes all of these, so I can just do a search/replace (more or less) on the Traditional to get the correct Simplified. Do you know if such a list exists? Unfortunately I don't have the time to do the research and collate from different sources.

feng · Apr 22, 2013

I hope to finish my project by autumn. The information you seek will eventually be in there, but I am still working on it. It is a sidelight to the main project, and so I cannot promise it will be in the sort of list form that you are looking for, though it might be. As I alluded to somewhere above, there are a lot of ins and outs to this in both Taiwan and the PRC (especially the latter), and I would direct you to 《新華字典》 or 《當代漢語詞典》 or other such standard PRC dictionary for an authoritative guide to the PRC forms (or the various lists for simplification and standardization). I need to finish all the research and then I can play around with the presentation some more. Good luck with your project!

Mike, new forum look is great!

mikelove · Apr 22, 2013

feng said:
Mike, new forum look is great!

Thanks! We're trying to overhaul all of our website over the next few months - it's embarrassing to be a software company in 2013 with such an outdated / primitive-looking web presence.

feng · Apr 26, 2013

You thought I forgot? I did! Not sure if there is still interest in this or not, but I have corrected the list (from page 3 of this thread) below. As I understand it, this is supposed to be one traditional form on the left with its two simplified forms on the right. I am going by PRC pronunciation since these are for use in the PRC, and therefore pretty much noting things only as they apply to the PRC. If I say something is not a character in the PRC, that means it is not one of the characters approved for general daily use. I did not cite sources as in previous posts on this topic in this thread, since my time is pulling me elsewhere. If something has its own entry in Xinhua Zidian, but says 同 whatever, then it is not (officially) a simplified or variant character; it is a standard PRC character. Below this corrected list I have cut and pasted a new list of the only 11 out of 37 entries from this list that seem to warrant T: S, S status. If you feel I am in error, please say so.

NB: There are more T: S, S situations like this. I may not get around to it till June. Should I edit this post (no one will be notified by the system) or make a new post on this thread?

么么,幺: Taiwan uses 么 for yao1 (a silly thing to do); PRC 么 me5, ma5 (and yao1, 同 “ 幺”) and 幺 yao1.
乾乾,干: 乾 is only used in the PRC when it is pronounced qian2.
份份,分: Both characters exist in the PRC.
俱具,俱: Both characters exist in the PRC.
卻卻,却: 卻 is not a PRC character.
夥伙,夥: 夥 is only used in the PRC when it means 多.
幺么,幺: see first entry in this list
彷彷,仿: Both characters exist in the PRC.
徵徵,征: 徵 zhi3; 征 (征 + 徵) zheng1
摺摺,折: 摺 is only used in the PRC when necessary to avoid confusion.
擣捣,U+22B4F: What is U+22B4F?
於于,於: yv2, 于 preferred; yv1, 於, a surname; wu1, 於, “文言嘆詞”
沈沈,沉: Both characters exist in the PRC.
瀰弥,㳽: 弥 = 彌 and 瀰
甚什,甚: shen2, preferred 什; shi2, 什; shen4, 甚
畫画,划: 划 is the simplified form of 劃, not 畫 (though issues of usage exist, but that is separate).
瞭瞭,了: only 瞭 when it is pronounced liao4, which is rare
矓眬,胧: 眬 (矓) and 胧 (朧) both exist in the PRC.
綵彩,䌽: only 彩 in PRC; no 䌽/綵
著著,着: zhu4, zhuo2, 著; zhuo2 and others, 着
藉藉,借: 借口、凭借，otherwise 藉
覆复,覆: 覆 is OK in PRC since 1986.
託托,讬: 托 stands for 託/讬 in the PRC.
諮咨,谘: 諮/谘同 “ 咨” in both the PRC and Taiwan.
逕径,迳: 径 is not an accepted PRC form; 徑 stands for 逕/迳 in the PRC.
鉅巨,钜: 鉅/钜 is not a character in the PRC.
鉋刨,铇: 鉋/铇 is not a PRC character.
鍾钟,锺: 锺 is not a character in the PRC
阪坂,阪: These are both characters in the PRC.
願愿,U+2B5B8: What is U+2B5B8?
颺扬,飏: 扬 stands for 颺/飏.
餘馀,余: 馀 is only supposed to be used when necessary to prevent confusion.
餸餸,U+2980c: What is U+2980c?
騃呆,U+2B624: What is U+2B624?
鬹鬶,鬹: 鬲 (with丷) is the standard form in both Taiwan and the PRC, and anyway the form 鬲 (with 儿) is a minor issue and doesn’t make it another character; same for 見／见
鹼碱,硷: Both characters exist in the PRC.
麼麽,么: PRC uses 麽 when pronounced mo2 and otherwise it is 么 (see above).

The Eleven:
么么,幺: Taiwan uses 么 for yao1 (a silly thing to do); PRC 么 me5, ma5 (and yao1, 同 “ 幺”) and 幺 yao1. True, I could also look at this from the perspective of the 幺么,幺 entry above, but I am tempted to delete both entries (a list of ten) because for yao1 Taiwan has 么 and the PRC has 幺, and Xinhua Zidian has a separate entry for 么 -- these are character form differences, not simplification or official variant character differences; they're one to one. I may just keep 麼：么、麽。Your thoughts are hereby solicited.
乾乾,干: 乾 is only used in the PRC when it is pronounced qian2.
夥伙,夥: 夥 is only used in the PRC when it means 多.
摺摺,折: 摺 is only used in the PRC when necessary to avoid confusion.
於于,於: yv2, 于 preferred; yv1, 於, a surname; wu1, 於, “文言嘆詞”
甚什,甚: shen2, preferred 什; shi2, 什; shen4, 甚
瞭瞭,了: only 瞭 when it is pronounced liao4, which is rare
著著,着: zhu4, zhuo2, 著; zhuo2 and others, 着
藉藉,借: 借口、凭借，otherwise 藉
餘馀,余: 馀 is only supposed to be used when necessary to prevent confusion.
麼麽,么: PRC uses 麽 when pronounced mo2 and otherwise it is 么 (see above).
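
In case it helps whoever does the conversion, one possible use of the eleven above is as a filter: anything containing one of them gets set aside for checking by reading or meaning, and the rest goes through an ordinary one-to-one table. A small Python sketch of that idea follows; the name `split_for_review` and the sample headwords are only for illustration.

```python
# The eleven traditional characters from the list above whose simplified
# form depends on the reading or the meaning.
NEEDS_REVIEW = set("么乾夥摺於甚瞭著藉餘麼")


def split_for_review(headwords):
    """Partition headwords into (safe, flagged) lists, keeping their order."""
    safe, flagged = [], []
    for word in headwords:
        if NEEDS_REVIEW & set(word):
            flagged.append(word)  # contains at least one of the eleven
        else:
            safe.append(word)
    return safe, flagged


safe, flagged = split_for_review(["國語", "著名", "乾淨", "辭典"])
print("one-to-one is enough:", safe)
print("check by hand:", flagged)
```

The set only covers the eleven entries listed here, so it would need extending as more T: S, S cases turn up.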

audreyt · Apr 29, 2013

The U+2ABCD notation means "Unicode, codepoint 0x2ABCD" and refers to characters outside the Basic Multilingual Plane (sometimes referred to as "Astral Characters").

Unfortunately, this forum software does not support such characters [for OSX/Firefox here, YMMV].

Please refer to http://www.audreyt.org/newdict/astral.html for their original (character) form. Modern systems should have fonts for them; if not, HanaMinB from http://fonts.jp/hanazono/ contains all currently coded Han characters.
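
For those who want to turn the codes from the list back into characters locally, a short Python sketch follows. The helper names `from_codepoint` and `is_astral` are made up for this example; the four codes are the ones that appear in the list above. Whether the characters actually display still depends on having a suitable font installed.

```python
def from_codepoint(notation):
    """Turn a string such as 'U+2B5B8' into the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not in U+XXXX form: {notation!r}")
    return chr(int(notation[2:], 16))


def is_astral(char):
    """True if the character lies outside the Basic Multilingual Plane."""
    return ord(char) > 0xFFFF


for code in ["U+22B4F", "U+2B5B8", "U+2980c", "U+2B624"]:
    char = from_codepoint(code)
    # Astral characters take two UTF-16 code units (a surrogate pair),
    # which is one reason older software often mishandles them.
    utf16_units = len(char.encode("utf-16-le")) // 2
    print(code, char, is_astral(char), utf16_units)
```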

mikelove · Apr 29, 2013

audreyt said:
The U+2ABCD notation means "Unicode, codepoint 0x2ABCD" and refers to characters outside the Basic Multilingual Plane (sometimes referred to as "Astral Characters").

Unfortunately, this forum software does not support such characters...

Actually it does, but so few people have compatible fonts installed on their system that I figured it was easier to just print the codes. But yes, HanaMinB is a great resource, one which we're actually planning to integrate into future versions of our app.

feng said:
NB: There are more T: S, S situations like this. I may not get around to it till June. Should I edit this post (no one will be notified by the system) or make a new post on this thread?

Maybe both? Edit the original post and then make a new one pointing back to it.
