Add CedPane dictionary to Pleco?

ssb22

Member
My CedPane project (see http://ssb22.user.srcf.net/gradint/cedpane.html) currently has about 14,000 words which I have confirmed from multiple independent sources to be in the public domain but which are not yet in typical dictionaries. They are mostly transliterations of non-Chinese people and place names, which I hope will be useful in Pleco Reader.
Users can already install CedPane as a user dictionary if they have the Flashcards add-on (Pleco .pqb files are provided at the above URL), but it would be more convenient if CedPane were available in the bundled list of free dictionaries, not least because updates could then be handled more automatically. Currently users have to update manually from time to time, and I don't suppose many of them do; if Pleco included it with their free dictionaries, then presumably updates could be pushed out by Pleco themselves.
So what do I have to do to get into Pleco's free dictionary list?
(Edit: updated URL)
 

mikelove

皇帝
Staff member
No specific process; we just have to decide it's worth the time to convert, update and maintain. This certainly looks like it might be interesting, but we've got a pretty spectacular backlog of work at the moment, so it would likely be a while before we can find the time to get it converted. Check back with me once the first 4.0 beta is out.

We're also considering adding the ability for user dictionaries to come with their own update URLs (like an extremely minimal package repository system) - if a user has installed your dictionary then when checking for addon updates we'd post a query to your server with the current ID / version of the dictionary + current version and platform of Pleco and your server would respond back with a URL / date / file size / checksum if there was an update available. Would that be helpful / interesting?
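To make the proposal above concrete, here is a minimal Python sketch of what the server side of such an update check might do. Everything in it is hypothetical: Pleco has not published an API for this, so the query field names (`dict_id`, `dict_version`, `pleco_version`, `platform`), the integer version scheme and the use of SHA-256 as the checksum are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Release:
    """One published build of a user dictionary (hypothetical schema)."""
    version: int    # assumed: plain integer, e.g. a date like 20191003
    url: str
    released: date
    size: int       # file size in bytes
    sha256: str     # checksum; SHA-256 is an assumption


def make_release(version: int, url: str, released: date, payload: bytes) -> Release:
    """Describe a .pqb build by its size and SHA-256 digest."""
    return Release(version, url, released, len(payload), hashlib.sha256(payload).hexdigest())


def check_for_update(catalog: dict[str, Release], query: dict) -> Optional[dict]:
    """Answer a hypothetical update query from the app.

    `query` carries the installed dictionary's ID and version plus the
    Pleco version and platform. Returns None when no newer build exists.
    """
    latest = catalog.get(query["dict_id"])
    if latest is None or latest.version <= query["dict_version"]:
        return None
    # query["pleco_version"] and query["platform"] could be used to serve
    # a different file per platform; this sketch ignores them.
    return {
        "url": latest.url,
        "date": latest.released.isoformat(),
        "size": latest.size,
        "checksum": latest.sha256,
    }
```

The client side would then download the URL and compare the file size and checksum before installing, which is what makes the "URL / date / file size / checksum" reply enough for a very small repository system.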
 

ssb22

Member
Thanks. And yes, the ability for user dictionaries to come with their own update URLs would also be good, especially for work-in-progress projects whose distribution permissions have not yet been confirmed beyond a select group of beta testers.
 

pdwalker

状元
if a user has installed your dictionary then when checking for addon updates we'd post a query to your server with the current ID / version of the dictionary + current version and platform of Pleco and your server would respond back with a URL / date / file size / checksum if there was an update available. Would that be helpful / interesting?

Absolutely! It would allow people to create their own dictionaries that you wouldn't have to manage, like we were discussing in the other thread (dictionaries for food, slang, etc.). Much less time-consuming on your part once the feature is coded.

Also, it's potentially a good way for me to keep my own dictionary in sync between devices (which is something I don't really bother to do at the moment).
 
Just an update on this: the people.ds.cam.ac.uk server was permanently shut down last Friday and CedPane is now at http://ssb22.user.srcf.net/gradint/cedpane.html (and is approaching 28,000 words). It would be very nice indeed if it were one of the free dictionary options in Pleco:)

I downloaded your 2019-10-03 db; it looks like you need to upgrade the database and lock it in Pleco (manage dicts), and then export the pqb files.
 
This looks very promising :) It reminds me of a hacky project I started a while back and never completely finished: https://github.com/danielt998/CC-CEDICT-additions, which extracts proper nouns and the like automatically from Wikidata. Obviously the pinyin is unreliable as it was generated automatically, but I just wanted to point you towards it in case something like this could be of any help when generating names and such.
 

ssb22

Member
Thanks Daniel. I did look at Wikidata before, and it needs a lot of proofreading. Apart from making sure the pinyin is right, there's also: making sure it's spaced correctly when it's a multi-word phrase (the CEDICT format doesn't seem to "do" word separation in phrases, but I do); making sure capital letters are in the right places (including in multi-word phrases); making sure the Wikidata correspondence is actually correct (I ran into some instances where they'd simply linked to the nearest related Chinese article, which wasn't quite the same word); and making sure the term the Wiki editors used is actually the term in common use, etc. It usually is, but there are odd cases where Wiki article titles have been made more specific, usually to distinguish them from other Wiki articles, and if I don't see web search results other than on Wikimedia and its copies, then I won't consider it suitable to plonk in a dictionary. But I did get some proper names out of it, especially by looking at the Chinese article titles that use a mid-dot (·) to separate the first and last names of people, separating those out and sorting by the most frequently used first/last names so I could get more "bang for my buck" from manual editing. Most CedPane words are still from things I happened to come across in my reading, though.
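A rough Python sketch of the mid-dot approach described above, assuming you already have a list of Chinese article titles (how they are pulled from Wikidata is left out). Treating U+2027 and U+30FB as extra separators and the function names `name_parts` / `rank_name_parts` are assumptions, not part of the actual CedPane tooling.

```python
import re
from collections import Counter

# U+00B7 MIDDLE DOT is the separator mentioned above; U+2027 and U+30FB are
# look-alikes sometimes used instead (including them is an assumption).
NAME_SEPARATOR = re.compile("[\u00b7\u2027\u30fb]")


def name_parts(titles):
    """Yield (first, last) for titles that look like dot-separated names."""
    for title in titles:
        parts = [p.strip() for p in NAME_SEPARATOR.split(title)]
        # Skip titles with no separator or with an empty component.
        if len(parts) >= 2 and all(parts):
            yield parts[0], parts[-1]


def rank_name_parts(titles):
    """Return (first names, last names), each sorted most-used first."""
    firsts, lasts = Counter(), Counter()
    for first, last in name_parts(titles):
        firsts[first] += 1
        lasts[last] += 1
    return firsts.most_common(), lasts.most_common()
```

Proofreading the most common components first means each manual check (pinyin, capitalisation) covers as many full names as possible.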

For Pleco, it would probably be possible to turn a public-domain Wikidata dump into a fallback English-to-Chinese dictionary with no pinyin, but I'm not sure how useful Wikidata would be on the Chinese-to-English side without extensive editing (so please add CedPane first :))
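A minimal sketch of that fallback idea, assuming the usual Wikidata JSON dump layout (one entity per line inside a JSON array, each line ending in a comma). The preference order for the Chinese label variants is an assumption, and the code only pairs English labels with Chinese ones; it makes no attempt at pinyin or at any particular Pleco import format.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

# Wikidata stores several Chinese label variants; which one to prefer
# is an assumption made for this sketch.
ZH_PREFERENCE = ("zh-hans", "zh-cn", "zh")


def entities(lines: Iterable[str]) -> Iterator[dict]:
    """Parse a Wikidata JSON dump: one entity per line inside a JSON array."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)


def en_zh_pair(entity: dict) -> Optional[Tuple[str, str]]:
    """Return (English label, Chinese label) if the entity has both."""
    labels = entity.get("labels", {})
    en = labels.get("en", {}).get("value")
    zh = next((labels[c]["value"] for c in ZH_PREFERENCE if c in labels), None)
    return (en, zh) if en and zh else None


def fallback_entries(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield English-to-Chinese pairs usable as an unreviewed fallback list."""
    for entity in entities(lines):
        pair = en_zh_pair(entity)
        if pair:
            yield pair
```

For a real dump you would pass an open text handle (e.g. `gzip.open(path, "rt", encoding="utf-8")`) rather than loading the file into memory. As noted above, the output still needs proofreading before it could be trusted on the Chinese-to-English side.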
 
Yeah, I use it as a fallback right now, which is somewhat useful but needs to be taken with a pinch of salt. One of my ideas was to curate it and create a proper-noun dictionary from it, but it looks as though you've already done that ;) I'm happy to try to help out with CedPane a little if you accept PRs, btw.
 

ssb22

Member
Thanks, no need to make a pull request, just send me words :) The master is a private Wenlin database and I tag which parts are OK to auto-extract, so if I had a pull request I'd have to back-port it into the Wenlin database anyway.
 

ssb22

Member
Just an update: made it past the 40,000 entries mark today:) Would be great if this could be converted so our Pleco-using friends can find it under Add-ons / Free Dictionaries instead of having to download my .pqb files. Let me know if there's anything I can do my end to make the conversion easier (preferably something I can automate in a script:))
 

mikelove

皇帝
Staff member
No, it's just that any time we want to add a new dictionary to our catalog there are a whole bunch of steps (as with a lot of things in Pleco, the current system was never designed to handle as many dictionaries as it now is) and it's hard to justify spending hours of programming time for that when CedPane is already available as a user dictionary to those who want it.

We still intend to support it eventually but, as I indicated before, probably not until after 4.0 beta is out.
 
Just an update: made it past the 40,000 entries mark today:) Would be great if this could be converted so our Pleco-using friends can find it under Add-ons / Free Dictionaries instead of having to download my .pqb files. Let me know if there's anything I can do my end to make the conversion easier (preferably something I can automate in a script:))

I know I mentioned this before, but I'd still recommend upgrading the format, adding a full-text index and locking the database for the pqb files before uploading them to your site. It looks like the format has already been "upgraded" on the EC, though.
 