Official MoEDict Pleco Release

Discussion in 'Pleco for Android' started by mikelove, Aug 5, 2015.

  1. mikelove

    mikelove 皇帝 Staff Member

    Check free section of Add-ons. Available on 3.2.12 too (and in the latest iOS release) but Ruby Zhuyin is only in 3.2.13.

    This proved to be a lot harder than we anticipated - some parts of the data needed extensive manual cleaning - but we're pretty satisfied with the result now; feedback certainly welcome though.

    两岸词典 should be available soon too, and we've also got another interesting free dictionary we'll be launching probably any day now.
     
    Last edited: Aug 5, 2015
  2. Peter

    Peter 探花

    Example sentences appear on the DICT page, but not on the SENTS page. No biggie, since the existing MoE import by hsk80 didn't have that either.
     
  3. alex_hk90

    alex_hk90 状元

    Thanks Mike. :)

    Out of interest, are you submitting the changes of the manual cleaning back upstream?
    Obviously it doesn't really matter to me as I'll likely only be using MoEDict in Pleco anyway. :)
     
  4. Abun

    Abun 探花

    awesome! Excited to maybe see the Minnan and Hakka dictionaries soon as well (hopefully?)
     
  5. Taichi

    Taichi 榜眼

    I failed to remap some flashcard entries from the unofficial MoE dictionary to the official one (I attached a screenshot). Some of the failures are due to problems on the unofficial MoE side, but I found some (seemingly) official MoE-side problems as well.
    - 说人人到,說鬼鬼到, 適材适用 traditional and simplified mixed together.
    - 一纲成擒,箱纲养殖,「三日打鱼,两日晒纲」 should be 网 not 纲.
    - 大出鋒頭,一著,抽菸 missing simplified?

    And some random questions
    - Is there any plan for bolding the word?
    - 如:「」 Make this type of example sentence playable. Maybe the 「」 and even the 如: could be omitted.
    - I see "似 暖和 溫暖" at the bottom of the "温和" definitions. This should be appended to the first definition.
    - The 勞動 lao2dong4 entry has the 勞動 lao2dong0 definition appended at the bottom without a separator or pronunciation. Maybe it shouldn't appear there at all in the first place.
    - 塞翁失馬 shows "別人去安慰他,他卻說" as a reference
     


    Last edited: Aug 6, 2015
  6. mikelove

    mikelove 皇帝 Staff Member

    @Peter - not sure why they're not coming up, actually; we'll investigate + hopefully patch that shortly.

    @alex_hk90 - to be honest, the fixing was all done after we'd already converted the files to our own data format, so it would be difficult to back-port to something they could use, and in any event it was mostly fixing issues relevant to us and the particular requirements of our databases; ensuring that every entry had the same # of characters as Pinyin syllables, for example.
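
    As a rough illustration of the character/syllable consistency check mentioned above, here is a minimal Python sketch. It is not Pleco's actual tooling: the function name, the (headword, pinyin) tuple layout, and the assumption that syllables are separated by spaces are all hypothetical.

```python
import re

# Matches one CJK ideograph (basic block plus Extension A). This range is an
# assumption; real dictionary data may also use other extension blocks.
HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def count_mismatches(entries):
    """Return the (headword, pinyin) pairs whose counts disagree.

    Assumes Pinyin syllables are space-separated, e.g. "lao2 dong4".
    """
    bad = []
    for headword, pinyin in entries:
        n_chars = len(HAN.findall(headword))   # punctuation is ignored
        n_syllables = len(pinyin.split())
        if n_chars != n_syllables:
            bad.append((headword, pinyin))
    return bad


sample = [
    ("勞動", "lao2 dong4"),
    ("塞翁失馬", "sai4 weng1 shi1"),  # deliberately one syllable short
]
print(count_mismatches(sample))
```

    Entries flagged this way would still need manual review, since a mismatch can come from either the headword or the reading.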

    We submitted about 600 changes to CC-CEDICT at one point, based on our conversion work on that, and they couldn't use them (and didn't really have a good reason to, since they weren't particularly important or interesting to anybody other than us), so now we simply maintain our own CC-CEDICT diff file and update it every time we do a new CC-CEDICT release. We don't anticipate regular updates to the MoEDict data, so we didn't bother building an automated process for applying our diffs to that, but if they did start actively developing it we could certainly turn what we have into something automated too.

    @Abun - that one would be harder since we'd need to teach Pleco how to understand Minnan and Hakka romanization first; I assume a dictionary like this would be less-than-super-useful if it was only searchable by characters and not by pronunciation, correct?

    @Taichi - we'll recheck those simplified mappings, thanks - to be honest our primary goal there was matching simplified versions in our other dictionaries when they also had an entry for the same word (so that you'd get nice clean merged results), so we didn't put as much time into making sure the simplified versions were correct in entries that were exclusive to MoEDict, but we certainly are going to go back and clean that up. In the meantime I'd suggest that you back up your flashcard database, then use the batch command to delete all of the simplified versions in your MoEDict-based cards (so only the traditional version is left) and remap again with those - should match up more nicely that way.

    Bold headwords - yes, that's on our to-do list.

    Playable 如's - we were planning to do that in our first release but they're a gigantic pain in the butt to parse (formatting is inconsistent and sometimes downright wonky); hopefully in another update or two, though.
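
    To give a sense of the well-behaved case only, a minimal Python sketch for pulling the quoted examples out of a 如:「…」 run might look like the following. The patterns and function name are hypothetical, the sample string is invented for illustration rather than taken from the MoEDict data, and the inconsistent formatting described above is exactly what a simple regex like this would miss.

```python
import re

# "如" + full-width or ASCII colon, then one or more 「…」 items that may be
# separated by 、 or ,. Assumes quotes are never nested.
RUN = re.compile(r"如[::]((?:「[^」]*」[、,]?)+)")
QUOTED = re.compile(r"「([^」]*)」")


def extract_examples(definition):
    """Return the text inside each 「…」 that follows a 如: marker."""
    examples = []
    for run in RUN.findall(definition):
        examples.extend(QUOTED.findall(run))
    return examples


# Invented sample definition text, for illustration only.
text = "如:「天氣溫和」、「性情溫和」。"
print(extract_examples(text))
```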

    似: Our understanding was that synonyms/antonyms in the original file that aren't preceded by any numbers applied to the entire word, rather than a particular definition. Is that not the case?

    勞動: that one may be on the coding end, actually - weird.

    塞翁失馬: that one was a parsing issue, it looked to our system like a quotation. (they don't use 《》in those consistently, or do much of anything else to distinguish them from sentences that happen to be followed by full-width colons)
     
    Last edited: Aug 6, 2015
  7. Taichi

    Taichi 榜眼

    removing the simplified headword did the trick, thx!

    似: I see. I didn't realize the ones at the bottom apply to the whole entry.
    Update: It seems the unofficial MoE does attach them to the first definition. For the 温和 case I think they should belong to the first definition, but my Chinese isn't good enough to be sure.

    Update2: "奇怪"'s synonym "古怪" should be for the first definition. So I guess the numberless ones are for the first definition?
     
    Last edited: Aug 6, 2015
  8. giokve

    giokve 进士

    I thought they weren't supposed to be in that page since the sentences of HDC aren't there as well.
     
  9. mikelove

    mikelove 皇帝 Staff Member

    @Taichi - I see that the app does that, but the data file has quite a lot of synonyms that start with a "1." suggesting that those are identified separately. (@audreyt, any clarification?)

    @giokve - actually yeah, it wasn't even intentional, but we copied the formatting instructions for those directly from HDC, so they ended up getting treated the same way. Not sure whether that's desirable or not; it's easy enough to hide sentences from dictionaries you don't want to see them from, so we're probably better off including them.
     
  10. IMG_0295.PNG
    getting some weird stuff at the bottom of this entry in MoE, although it looks like it's just trying to say pronunciation is ren2 xing4.

    yeah seems like [1] would refer to definition #1 - so those would only be synonyms of that definition and not subsequent meanings.
     
  11. etm001

    etm001 状元

    Thanks for adding the MoE dictionary to Pleco, it's great to officially have it (I've been using the user defined dictionary, which was really helpful too).
     
  12. Is this likely to happen? [it would be so amazing] There's also a butt-load of audio already done, right? Hakka has like five or six different audio pronunciations online. Would Pleco be able to use these?
     
  13. mikelove

    mikelove 皇帝 Staff Member

    @ACardiganAndAFrown - possible, but not a super high priority at the moment since it's not really something we'd expect to make any money on (absent a grant from the TW government or some such) and we can only afford to spend so much time per year on projects like that :)
     
  14. Kickstart it. ;)

    edit:

    is there any chance to just import the chars - first - without 'dialectical' pinyin?
     
  15. mikelove

    mikelove 皇帝 Staff Member

    Sure, as a user dictionary you'd just prepend a @ to each of the readings so Pleco wouldn't try to parse them as Pinyin.
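
    A minimal Python sketch of that preprocessing step, assuming the source file is tab-separated with the headword, reading, and definition in that order (the layout and the function name are assumptions, and "each of the readings" is read here as one @ per entry's pronunciation field):

```python
def escape_readings(lines):
    """Prefix the second tab-separated field of each line with "@".

    Lines with fewer than two fields, an empty reading, or a reading that
    already starts with "@" are passed through unchanged.
    """
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] and not fields[1].startswith("@"):
            fields[1] = "@" + fields[1]
        out.append("\t".join(fields))
    return out


# Hypothetical entry: headword, Minnan reading, definition.
rows = ["鴨\ta\u030dh\tduck"]
print(escape_readings(rows))
```

    Running it twice is harmless, since readings that already start with @ are left alone.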
     
  16. @alex_hk90

    Please & thanks. ;)
     
  17. alex_hk90

    alex_hk90 状元

    :confused: I haven't really been following the sub-thread. What's this about? :)
     
  18. MoE's Minnan and Hakka dictionaries!
     
  19. Abun

    Abun 探花

    Sorry for the late answer. I agree, they would be of limited use without romanization. The workaround using @ would work as a makeshift solution, I guess. In that case it would only be useful if the tone diacritics were changed to numbers, though, because at least one of them (the vertical stroke above the vowel for tone 8 (陽入) in Minnan, as in a̍h 鴨) is impossible to enter without special Minnan keyboards (which are hard to find for mobile devices and often less than optimal to use). In the long run, it would of course be nice to have proper romanization recognition along the lines of what is already possible for Cantonese. That would also make it easier to include custom dictionaries (there is, for example, a database for the super-extensive 台日大辭典, which might even be under a public license by now (it was compiled in 1931)). But I understand that the demand for that is not quite as high as for Cantonese. Hope it is on the list somewhere though :)
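
    For the diacritics-to-numbers idea, a minimal Python sketch could look like this. It assumes the usual Minnan tone-mark convention (acute = 2, grave = 3, circumflex = 5, macron = 7, vertical line above = 8, unmarked = 1, or 4 when the syllable ends in p/t/k/h); the function names are hypothetical, and it does not handle a trailing superscript ⁿ on unmarked checked syllables or any dictionary-specific quirks.

```python
import unicodedata

# Combining tone marks mapped to tone numbers (assumed convention, see above).
TONE_MARKS = {
    "\u0301": "2",  # acute
    "\u0300": "3",  # grave
    "\u0302": "5",  # circumflex
    "\u0304": "7",  # macron
    "\u030d": "8",  # vertical line above
}


def syllable_to_numbered(syllable):
    """Strip the tone diacritic from one syllable and append a tone number."""
    tone = None
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS and tone is None:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # other marks, such as the dot in o͘, are kept
    base = unicodedata.normalize("NFC", "".join(kept))
    if tone is None:
        # Unmarked syllables: tone 4 if checked (final p/t/k/h), else tone 1.
        tone = "4" if base[-1:].lower() in ("p", "t", "k", "h") else "1"
    return base + tone


def word_to_numbered(word):
    """Convert a hyphenated word syllable by syllable."""
    return "-".join(syllable_to_numbered(s) for s in word.split("-") if s)


print(syllable_to_numbered("a\u030dh"))
print(word_to_numbered("T\u00e2i-g\u00ed"))
```

    The numbered forms can be typed on any keyboard, which is the point of the conversion.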
     
  20. alex_hk90

    alex_hk90 状元

    Do you have links to the data in a usable format and some general information? I don't really know anything about Minnan or Hakka.
     
