The China Biographical Database (CBDB)

#2
User dictionary attached. It contains just person names, dates, and descriptions. Conversion script:
Code:
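#!/bin/sh
# Dump BIOG_MAIN as tab-separated lines: headword, empty pronunciation, definition.
# U+EAB1 is the private-use character that Pleco import files use as a line break.
PLECO_NEWLINE="$(env printf '\uEAB1')"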
sqlite3 20170829CBDBavBase.db -separator "$(env printf '\t')" \
        "pragma encoding='UTF-8'; select c_name_chn, '', COALESCE(c_name, '') || case
        when c_birthyear > 0 and c_deathyear > 0 then ' (' || c_birthyear || '–' || c_deathyear || ')'
        when c_birthyear > 0 and c_deathyear = 0 then ' (' || c_birthyear || '–' || case when c_birthyear < 990 then '???' else '????' end || ')'
        when c_birthyear = 0 and c_deathyear > 0 then ' (–' || c_deathyear || ')'
        else '' end || COALESCE('$PLECO_NEWLINE $PLECO_NEWLINE' || REPLACE(c_notes, x'7F', ''), '') from BIOG_MAIN"
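
To sanity-check the year formatting without the full database, here is a minimal Python sketch. It runs the CASE expression from the script against a throwaway in-memory SQLite table. The sample rows are made up, the table has only the three columns needed, and it assumes (as the script does) that 0 marks an unknown year.

```python
import sqlite3

# CASE expression copied from the conversion script; only the notes part is left out.
YEARS_SQL = """
select c_name || case
    when c_birthyear > 0 and c_deathyear > 0 then ' (' || c_birthyear || '–' || c_deathyear || ')'
    when c_birthyear > 0 and c_deathyear = 0 then ' (' || c_birthyear || '–' || case when c_birthyear < 990 then '???' else '????' end || ')'
    when c_birthyear = 0 and c_deathyear > 0 then ' (–' || c_deathyear || ')'
    else '' end
from BIOG_MAIN order by rowid
"""


def format_rows(rows):
    """Load (name, birth, death) tuples into a temporary table and format them."""
    con = sqlite3.connect(":memory:")
    con.execute("create table BIOG_MAIN (c_name text, c_birthyear int, c_deathyear int)")
    con.executemany("insert into BIOG_MAIN values (?, ?, ?)", rows)
    out = [row[0] for row in con.execute(YEARS_SQL)]
    con.close()
    return out


# Made-up rows, one per branch of the CASE expression.
samples = [("A", 1021, 1086), ("B", 1100, 0), ("C", 950, 0), ("D", 0, 1200), ("E", 0, 0)]
for line in format_rows(samples):
    print(line)
```

Rows with both years known get a full range, a missing death year is padded with question marks, and rows with neither year get no suffix at all.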
 


#3
Hi,

Since the import takes extremely long (about 4 hours on an iPhone 7), I've uploaded a zipped, locked Pleco user dictionary (49.3 MB) with the same data to my Google Drive:

https://goo.gl/oRpjK3

I hasten to add that I don't need this dictionary at all for my purposes, but I read about how painstakingly the database is being compiled from old sources (from the Song, Yuan, and Ming dynasties). I wonder if they will ever manage to bring this huge endeavor to completion. (I guess they never can, because the sources don't cover everything.)
 
#4
I'm only getting Cantonese pronunciations on the names? Is that right?
 
#5
That would be funny. Have you already compared the dictionary entries to the text file? In any case, the pronunciation field is empty everywhere. Perhaps @Peter can point to where there might be Cantonese pronunciations in the data.
 
#6
Nah, I imported your .pqd file, but I'm only getting Cantonese pronunciations on the names. Not sure why.
 
#7
5A78972E-1CF7-464A-BF92-829C4248BD8C.png

If I Browse Dictionary Entries in the user dictionary settings, an entry looks like this, with no pronunciation. Perhaps you could post a screenshot of yours if it’s different?
 

mikelove

皇帝
Staff member
#8
There doesn't appear to be Cantonese or Pinyin here (user dictionaries don't support Cantonese so there really couldn't be Cantonese), but we offer an option to automatically generate Cantonese readings for words that don't have them, so that's probably where the Cantonese is coming from. We don't currently offer such an option for Pinyin because all of our add-on dictionaries come with Pinyin (either originally or because we added it).
 
#11
You could auto-generate the pinyin in Pleco: import the original text file above into a Flashcards category with "Fill in missing fields" activated, export it with the pinyin, delete the category from Flashcards, and then import the exported file as a new user dictionary, this time including the pinyin. I tried this on my secondary Android phone, but the import aborted at around 31,000 entries (probably more Android's fault than Pleco's), so I had to give up on it.

It'd be nice if we could get Mandarin Pinyin on this.
 
#12
New build of the user dictionary. This build includes computer-generated pinyin.

.txt: attached
.pqb: https://mega.nz/#!Suxk1SBb!L8nHXWfAW_ENV1a_CpBxA4URQU67r411mKdrpFWUSCc

The pinyin was generated by looking up the candidate readings of each headword character and using the romanised name field to narrow them down. When a character still has several candidate readings that match the romanised name, the algorithm outputs the syllable without a tone. This approach is more accurate than the Pleco flashcard pinyin generator.
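
For anyone who wants to experiment with the same idea, here is a minimal Python sketch of the approach as described. It is a guess, not the script used for this build. The `READINGS` table, the function names, and the alignment by prefix matching are all assumptions; a real run would need a complete character-to-readings table and a policy for names that cannot be aligned (this sketch just returns `None`).

```python
# Hypothetical reading table in numbered pinyin; a real run needs a full
# character-to-readings dictionary, which the post does not name.
READINGS = {
    "王": ["wang2", "wang4"],
    "安": ["an1"],
    "石": ["shi2", "dan4"],
}


def strip_tone(reading):
    """Drop the trailing tone digit, e.g. 'shi2' -> 'shi'."""
    return reading.rstrip("012345")


def guess_pinyin(headword, romanised):
    """Return one syllable per character, or None if no alignment is found."""
    # Compare against the romanised name with spaces and punctuation removed.
    target = "".join(ch for ch in romanised.lower() if ch.isalpha())

    def align(i, pos):
        # Try to spell target[pos:] using readings of headword[i:].
        if i == len(headword):
            return [] if pos == len(target) else None
        by_syllable = {}
        for reading in READINGS.get(headword[i], []):
            by_syllable.setdefault(strip_tone(reading), []).append(reading)
        for syllable, readings in by_syllable.items():
            if target.startswith(syllable, pos):
                rest = align(i + 1, pos + len(syllable))
                if rest is not None:
                    # One matching reading: keep its tone. Several: drop the tone.
                    chosen = readings[0] if len(readings) == 1 else syllable
                    return [chosen] + rest
        return None

    return align(0, 0)


print(guess_pinyin("王安石", "Wang Anshi"))
```

In this example the romanised name rules out the reading dan4 for 石, so that character keeps its tone, while both readings of 王 match "wang", so it is written without one.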
 
