Chinese Text Project Dictionary; online dictionary searches

bokane

举人
I hadn't checked the dictionary over at CText.org for a few years, and it turns out that it got really good while I wasn't looking! It still seems to be limited to single characters (?), and the data is still very uneven -- some words have really good entries; others less so -- but those usage examples are awesome. I think this has the potential to be a really useful resource -- having the Kangxi/Guangyun/Shuowen entries in a single place makes it useful even in cases where there aren't as many usage examples as one would like.

Would there be any way of using this within Pleco? Downloading the dictionary and packaging it as a file wouldn't make a lot of sense, since it's constantly evolving -- but might it be possible at some point (much further down the line, obviously) to search both local and online dictionaries within the Pleco interface? Even just generating a search URL along the lines of http://ctext.org/dictionary.pl?if=en&char=%s would be a time-saver.
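To make the search-URL idea concrete, here is a minimal sketch in Python of filling in that `%s` template. The function name `build_search_url` is made up for illustration, and it assumes the site accepts the character as percent-encoded UTF-8, which I haven't verified; it is not how Pleco actually does anything.

```python
from urllib.parse import quote

# Template copied from the post above; "%s" marks where the looked-up character goes.
CTEXT_TEMPLATE = "http://ctext.org/dictionary.pl?if=en&char=%s"


def build_search_url(query: str, template: str = CTEXT_TEMPLATE) -> str:
    """Return the template with "%s" replaced by the percent-encoded query.

    Assumption: the target site expects UTF-8 percent-encoding in the query string.
    str.replace is used instead of "%" formatting so that any other percent
    signs already present in a template are left untouched.
    """
    return template.replace("%s", quote(query, safe=""))


if __name__ == "__main__":
    print(build_search_url("學"))
```

The same function would work for any other site that takes the search term in the URL, just by passing a different template string.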
 

mikelove

皇帝
Staff member
Well, Shuowen / Guangyun / Kangxi at least are all public-domain, so we could probably just incorporate those into Pleco directly at some point. (Indeed, it's a long-standing to-do list item, though finding clean electronic texts for them is a challenge.) And/or we could look at making a deal to distribute Ctext's data (I think I approached them a few years ago but never heard back; could try again). I don't view constant updates as a major problem since we've gotten pretty efficient at converting databases; can do a new CC-CEDICT build in about 5 minutes, the only reason we don't do daily updates is that the bandwidth charges would be murder.

Integrating online search is also on our to-do list; the main question would be one of overburdening a small site. Nobody at, say, Youdao would mind much if we started sending large volumes of traffic towards them, but Ctext is a different story.

EDIT: it actually looks like I did hear back from them (in 2012) but they weren't interested in doing anything that wasn't web-based at that point.
 

alex_hk90

状元
I don't view constant updates as a major problem since we've gotten pretty efficient at converting databases; can do a new CC-CEDICT build in about 5 minutes, the only reason we don't do daily updates is that the bandwidth charges would be murder.
Interesting - is there no way you could just distribute the diffs, or would that still be a huge amount of bandwidth?
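By "diffs" I mean something like the sketch below: compare two builds of the source file line by line and ship only the entries that were removed or added. This is just a rough illustration in Python, assuming the usual CC-CEDICT text layout (one entry per line, comment lines starting with `#`). The function name `diff_entries` and the sample entries are made up, and it says nothing about how Pleco's own database format works.

```python
def diff_entries(old_lines, new_lines):
    """Return (removed, added) entry lines between two builds of a dictionary file.

    Blank lines and comment lines (starting with "#") are ignored. A changed
    entry shows up as one removed line plus one added line.
    """
    def entries(lines):
        stripped = (line.strip() for line in lines)
        return {line for line in stripped if line and not line.startswith("#")}

    old, new = entries(old_lines), entries(new_lines)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Illustrative sample entries only, not taken from a real release.
    old_build = [
        "# sample header comment",
        "你好 你好 [ni3 hao3] /hello/",
        "貓 猫 [mao1] /cat/",
    ]
    new_build = [
        "# sample header comment",
        "你好 你好 [ni3 hao3] /hello/hi/",
        "貓 猫 [mao1] /cat/",
    ]
    removed, added = diff_entries(old_build, new_build)
    print("removed:", removed)
    print("added:", added)
```

The diff itself stays small when only a handful of entries change between builds, which is why I'd expect the download to be cheap.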
 

mikelove

皇帝
Staff member
The bandwidth would be small, but the problem is that our database indexes are both very complicated and not conducive to simple updates (we'd have to rebuild them from scratch), so the time involved plus the battery life wasted on rebuilding them would be problematic. If we actually did want to make daily CC-CEDICT updates available, we'd probably just charge some sort of modest subscription fee to pay for the bandwidth. (Heck, even $1/month would do; it doesn't need to be a profit maker, just a way of limiting updates to people who actually need them.)
 