Official MoEDict Pleco Release

mikelove · Aug 5, 2015

Check free section of Add-ons. Available on 3.2.12 too (and in the latest iOS release) but Ruby Zhuyin is only in 3.2.13.

This proved to be a lot harder than we anticipated - some parts of the data needed extensive manual cleaning - but we're pretty satisfied with the result now; feedback certainly welcome though.

两岸词典 should be available soon too, and we've also got another interesting free dictionary we'll be launching probably any day now.

Peter · Aug 6, 2015

Example sentences appear on the DICT page, but not on the SENTS page. No biggie, since the existing MoE import by hsk80 didn't have that either.

alex_hk90 · Aug 6, 2015

mikelove said:
Check free section of Add-ons. Available on 3.2.12 too (and in the latest iOS release) but Ruby Zhuyin is only in 3.2.13.

This proved to be a lot harder than we anticipated - some parts of the data needed extensive manual cleaning - but we're pretty satisfied with the result now; feedback certainly welcome though.

两岸词典 should be available soon too, and we've also got another interesting free dictionary we'll be launching probably any day now.

Thanks Mike.

Out of interest, are you submitting the changes of the manual cleaning back upstream?
Obviously it doesn't really matter to me as I'll likely only be using MoEDict in Pleco anyway.

Abun · Aug 6, 2015

awesome! Excited to maybe see the Minnan and Hakka dictionaries soon as well (hopefully?)

Taichi · Aug 6, 2015

I failed to remap some flashcard entries from the unofficial MoE dictionary to the official one (I attached a screenshot). Some of the failures are due to problems on the unofficial MoE side, but I found some (seemingly) official MoE side problems as well.
- 说人人到，說鬼鬼到, 適材适用 traditional and simplified mixed together.
- 一纲成擒，箱纲养殖，「三日打鱼，两日晒纲」 should be 网 not 纲.
- 大出鋒頭，一著，抽菸 missing simplified?

And some random questions
- Is there any plan for bolding the word?
- 如：「」 Make this type of example sentence playable. Maybe can omit「」and even 如：
- I see "似暖和溫暖" at the bottom of "温和" definitions. This should be appended to the first definition
- 勞動 lao2dong4 definition has 勞動 lao2dong0 definition appended at the bottom without a separator and pronunciation. Maybe it shouldn't appear at all in the first place.
- 塞翁失馬 shows "別人去安慰他，他卻說" as a reference

mikelove · Aug 6, 2015

@Peter - not sure why they're not coming up, actually; we'll investigate + hopefully patch that shortly.

@alex_hk90 - to be honest, the fixing was all done after we'd already converted the files to our own data format, so it would be difficult to back-port to something they could use, and in any event it was mostly fixing issues relevant to us and the particular requirements of our databases; ensuring that every entry had the same # of characters as Pinyin syllables, for example.
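A minimal sketch of that kind of character/syllable consistency check. The function name and the sample entries are illustrative assumptions, not Pleco's actual tooling or data format; it assumes readings are space-separated numbered-Pinyin syllables and ignores punctuation in the headword.

```python
import re

# Assumption: only characters in these two CJK blocks count as "characters";
# punctuation such as the full-width comma is ignored.
HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def count_mismatch(headword: str, pinyin: str) -> bool:
    """Return True when the number of Han characters differs from the syllable count."""
    chars = HAN.findall(headword)
    syllables = pinyin.split()  # assumption: one space between syllables
    return len(chars) != len(syllables)


# Hypothetical sample entries; the second is deliberately one syllable short.
entries = [
    ("勞動", "lao2 dong4"),
    ("塞翁失馬", "sai4 weng1 shi1"),
]
bad = [headword for headword, pinyin in entries if count_mismatch(headword, pinyin)]
```

Entries collected in `bad` would be the ones needing manual cleanup before import.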

We submitted about 600 changes to CC-CEDICT at one point based on our conversion work on that and they couldn't use them (and didn't really have a good reason to, since they weren't particularly important or interesting to anybody other than us), so now we simply maintain our own CC-CEDICT diff file and update it every time we do a new CC-CEDICT release. We don't anticipate regular updates to the MoEDict data, so we didn't bother building an automated process for applying our diffs to that, but if they did start actively developing it we could certainly turn what we have into something automated too.

@Abun - that one would be harder since we'd need to teach Pleco how to understand Minnan and Hakka romanization first; I assume a dictionary like this would be less-than-super-useful if it was only searchable by characters and not by pronunciation, correct?

@Taichi - we'll recheck those simplified mappings, thanks - to be honest our primary goal there was matching simplified versions in our other dictionaries when they also had an entry for the same word (so that you'd get nice clean merged results), so we didn't put as much time into making sure the simplified versions were correct in entries that were exclusive to MoEDict, but we certainly are going to go back and clean that up. In the meantime I'd suggest that you back up your flashcard database, then use the batch command to delete all of the simplified versions in your MoEDict-based cards (so only the traditional version is left) and remap again with those - should match up more nicely that way.

Bold headwords - yes, that's on our to-do list.

Playable 如's - we were planning to do that in our first release but they're a gigantic pain in the butt to parse (formatting is inconsistent and sometimes downright wonky); hopefully in another update or two, though.
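For the well-formed case only, pulling the sentences out of a 如：「」 run might look like the sketch below. The regular expressions, the function name, and the sample definition string are all assumptions made up for illustration; real entries are described above as inconsistently formatted, so this would miss many of them.

```python
import re

# Matches 如： followed by one or more 「…」 groups, optionally separated by 、
EXAMPLE_RUN = re.compile(r"如：((?:「[^「」]*」、?)+)")
# Matches the text inside a single 「…」 pair
QUOTED = re.compile(r"「([^「」]*)」")


def extract_examples(definition: str) -> list:
    """Return the example sentences found in 如：「…」 runs, without 如： or the brackets."""
    examples = []
    for run in EXAMPLE_RUN.finditer(definition):
        examples.extend(QUOTED.findall(run.group(1)))
    return examples


# Invented sample text, not copied from the dictionary data.
sample = "性情平和。如：「他的態度很溫和。」、「溫和的語氣」"
found = extract_examples(sample)
```

Anything outside a 如： run, such as a quotation followed by a full-width colon, is deliberately left alone.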

似: Our understanding was that synonyms/antonyms in the original file that aren't preceded by any numbers applied to the entire word, rather than a particular definition. Is that not the case?

勞動: that one may be on the coding end, actually - weird.

塞翁失馬: that one was a parsing issue, it looked to our system like a quotation. (they don't use 《》in those consistently, or do much of anything else to distinguish them from sentences that happen to be followed by full-width colons)

Taichi · Aug 6, 2015

removing the simplified headword did the trick, thx!

似：I see. I didn't realize the bottom ones are for the whole definitions.
Update: It seems the unofficial Moe does place them to the first definition. For the 温和 case I think they should be for the first definition, but my Chinese isn't good enough to be sure.

Update2: "奇怪"'s synonym "古怪" should be for the first definition. So I guess the numberless ones are for the first definition?

giokve · Aug 7, 2015

mikelove said:
Peter - not sure why they're not coming up, actually; we'll investigate + hopefully patch that shortly.

I thought they weren't supposed to be in that page since the sentences of HDC aren't there as well.

mikelove · Aug 7, 2015

@Taichi - I see that the app does that, but the data file has quite a lot of synonyms that start with a "1." suggesting that those are identified separately. (@audreyt, any clarification?)

@giokve - actually yeah, wasn't even intentional but we copied the formatting instructions for those directly from HDC so they ended up getting treated the same way. Not sure whether that's desirable or not; it's easy enough to hide sentences from dictionaries that you don't want to see them from that we're probably better off including them.

ACardiganAndAFrown · Aug 10, 2015

getting some weird stuff at the bottom of this entry in MoE, although it looks like it's just trying to say pronunciation is ren2 xing4.

mikelove said:
@Taichi - I see that the app does that, but the data file has quite a lot of synonyms that start with a "1." suggesting that those are identified separately. (@audreyt, any clarification?)

yeah seems like [1] would refer to definition #1 - so those would only be synonyms of that definition and not subsequent meanings.

etm001 · Aug 12, 2015

Thanks for adding the MoE dictionary to Pleco, it's great to officially have it (I've been using the user defined dictionary, which was really helpful too).

ACardiganAndAFrown · Aug 22, 2015

Abun said:
awesome! Excited to maybe see the Minnan and Hakka dictionaries soon as well (hopefully?)

mikelove said:
@Abun - that one would be harder since we'd need to teach Pleco how to understand Minnan and Hakka romanization first; I assume a dictionary like this would be less-than-super-useful if it was only searchable by characters and not by pronunciation, correct?

Is this likely to happen? [it would be so amazing] There's also a butt-load of audio already done, right? Hakka has like five or six different audio pronunciations online. Would Pleco be able to use these?

mikelove · Aug 24, 2015

@ACardiganAndAFrown - possible, but not a super high priority at the moment since it's not really something we'd expect to make any money on (absent a grant from the TW government or some such) and we can only afford to spend so much time per year on projects like that

ACardiganAndAFrown · Aug 24, 2015

Kickstart it.

edit:

mikelove said:
@ACardiganAndAFrown - possible, but not a super high priority at the moment since it's not really something we'd expect to make any money on (absent a grant from the TW government or some such) and we can only afford to spend so much time per year on projects like that

is there any chance to just import the chars - first - without 'dialectical' pinyin?

mikelove · Aug 24, 2015

Sure, as a user dictionary you'd just prepend a @ to each of the readings so Pleco wouldn't try to parse them as Pinyin.
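A rough sketch of that preprocessing step, assuming a tab-separated import file with headword, reading, and definition columns, and assuming a single @ at the start of the reading field is what is meant. The function name and sample line are hypothetical.

```python
def escape_readings(lines):
    """Prefix the reading column of each tab-separated line with '@'.

    Assumption: columns are headword, reading, definition.
    Lines without a reading, or already prefixed, are left unchanged.
    """
    result = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[1] and not parts[1].startswith("@"):
            parts[1] = "@" + parts[1]
        result.append("\t".join(parts))
    return result


# Hypothetical entry; "a\u030dh" is "a" + combining vertical line above + "h".
converted = escape_readings(["鴨\ta\u030dh\tduck"])
```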

ACardiganAndAFrown · Aug 24, 2015

mikelove said:
Sure, as a user dictionary you'd just prepend a @ to each of the readings so Pleco wouldn't try to parse them as Pinyin.

@alex_hk90

Please & thanks.

alex_hk90 · Aug 25, 2015

ACardiganAndAFrown said:
@alex_hk90

Please & thanks.

I haven't really been following the sub-thread. What's this about?

ACardiganAndAFrown · Aug 25, 2015

alex_hk90 said:
I haven't really been following the sub-thread. What's this about?

MoE's Minnan and Hakka dictionaries!

Abun · Aug 25, 2015

mikelove said:
@Abun - that one would be harder since we'd need to teach Pleco how to understand Minnan and Hakka romanization first; I assume a dictionary like this would be less-than-super-useful if it was only searchable by characters and not by pronunciation, correct?

mikelove said:
Sure, as a user dictionary you'd just prepend a @ to each of the readings so Pleco wouldn't try to parse them as Pinyin.

Sorry for the late answer. I agree, they would be of limited use without romanization. The workaround using @ would work as a makeshift solution, I guess. In that case it would only be useful if the tone diacritics were changed to numbers, though, because at least one of them (the above-stroke for tone 8 (陽入) in Minnan, as in a̍h 鴨) is impossible to enter without special Minnan keyboards (which are hard to find for mobile devices and often less than optimal to use). In the long run, it would of course be nice to have proper romanization recognition along the lines of what is already possible for Cantonese. That would also make it easier to include custom dictionaries (there is, for example, a database for the super-extensive 台日大辭典, which might even be public license by now, since it was compiled in 1931). But I understand that the demand for that is not quite as high as for Cantonese. Hope it is on the list somewhere, though.
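The diacritic-to-number conversion could be sketched roughly as follows. It assumes Tâi-lô-style spelling with the usual tone marks (acute 2, grave 3, circumflex 5, macron 7, vertical line above 8) and treats unmarked syllables ending in p, t, k or h as tone 4 and all other unmarked syllables as tone 1. The function name is made up, and POJ-specific letters such as a superscript ⁿ are not handled.

```python
import unicodedata

# Combining marks used as tone diacritics (assumed Tâi-lô conventions).
TONE_MARKS = {
    "\u0301": "2",  # acute accent
    "\u0300": "3",  # grave accent
    "\u0302": "5",  # circumflex
    "\u0304": "7",  # macron
    "\u030d": "8",  # vertical line above
}


def to_numbered(syllable: str) -> str:
    """Strip the tone diacritic from one syllable and append a tone number."""
    tone = None
    letters = []
    # NFD splits precomposed letters like "á" into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    base = unicodedata.normalize("NFC", "".join(letters))
    if tone is None:
        # Unmarked: checked syllables (final p/t/k/h) are tone 4, the rest tone 1.
        tone = "4" if base[-1:].lower() in ("p", "t", "k", "h") else "1"
    return base + tone


numbered = to_numbered("a\u030dh")  # the 鴨 example, written with an escape
```

With numbers in place of diacritics, readings like this can be typed on an ordinary keyboard.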

alex_hk90 · Aug 25, 2015

ACardiganAndAFrown said:
MoE's Minnan and Hakka dictionaries!

Do you have links to the data in a usable format and some general information? I don't really know anything about Minnan or Hakka.
