How do I format a user dictionary txt file?

bentolsky

Member
I want to create my own user dictionary, and all I can find on this forum is the basic format:

headword<tab>pinyin<tab>definition

I really want to format the definition field with line breaks, bold, and other stuff. I tried using HTML tags, but that just displayed the tags themselves and nothing else. Basically I want my entry to look like this:


Bāo
Bag


背包 Bēi Bāo : Backpack
面包 Miàn Bāo : Bread
蒙古包 Měng Gǔ Bāo : Yurt
钱包 Qián Bāo : Wallet

Thank you, this will help me a lot with studying.

Also, how do I upload a txt file that is on my computer to Pleco on my iTouch? Currently I need to upload it to a web page I maintain (a very tricky thing to do thanks to the great firewall) and then download it from there.

Ben Tolsky
 

mikelove

皇帝
Staff member
bentolsky said:
I really want to format the definition field with line breaks, bold, and other stuff. I tried using HTML tags, but that just displayed the tags themselves and nothing else.
There's no official system for that yet, but unofficially you can do some basic formatting with private use Unicode characters (you have to use the "Insert Symbol" command in Word or the equivalent in whatever editor you have) - see this post for details.

However, if possible I would recommend developing your user dictionary in a semantically tagged format (so you'd have <definition>, <example>, etc type tags in an XML file, then write a simple XSL transformation to output a Pleco-friendly file with formatting tags), as it's likely that we will be extending semantic tagging support to user-created dictionaries in a future release. That would also protect you against any future changes to our data format, and allow you to update it to match the more standardized formatting we're planning to roll out in all of our dictionaries over the next year or so.
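
A minimal sketch of that workflow, written in Python rather than XSL for brevity. It is only an illustration under stated assumptions: the `<dictionary>`, `<entry>`, `<headword>` and `<pinyin>` element names are hypothetical, and the private use code points (U+EAB1 for a line break, U+EAB2 / U+EAB3 for bold on / off) are assumed values that should be checked against the formatting post referenced above before relying on them.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: these private use code points are placeholders for Pleco's
# unofficial formatting codes; confirm them against the forum post.
NEWLINE = "\uEAB1"
BOLD_ON = "\uEAB2"
BOLD_OFF = "\uEAB3"

# Hypothetical semantically tagged source; the schema is made up for this sketch.
SAMPLE_XML = """<dictionary>
  <entry>
    <headword>包</headword>
    <pinyin>bao1</pinyin>
    <definition>bag</definition>
    <example>背包 bei1bao1: backpack</example>
    <example>面包 mian4bao1: bread</example>
  </entry>
</dictionary>"""


def entry_to_line(entry):
    """Turn one <entry> element into a headword<tab>pinyin<tab>definition line."""
    headword = entry.findtext("headword", default="").strip()
    pinyin = entry.findtext("pinyin", default="").strip()
    definition = entry.findtext("definition", default="").strip()
    # Bold the main definition, then put each example on its own line.
    parts = [BOLD_ON + definition + BOLD_OFF]
    parts += [(ex.text or "").strip() for ex in entry.findall("example")]
    return "\t".join([headword, pinyin, NEWLINE.join(parts)])


def convert(xml_text):
    """Convert the whole XML document into a list of import-file lines."""
    root = ET.fromstring(xml_text)
    return [entry_to_line(e) for e in root.findall("entry")]


if __name__ == "__main__":
    for line in convert(SAMPLE_XML):
        print(line)
```

Keeping the XML as the master copy means only `entry_to_line` has to change if the formatting codes change later.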

bentolsky said:
Also, how do I upload a txt file that is on my computer to Pleco on my iTouch? Currently I need to upload it to a web page I maintain (a very tricky thing to do thanks to the great firewall) and then download it from there.
Connect your iPod to your computer over USB, open up iTunes, click on your iPod's name on the left side of the screen, click on "Apps" at the top of the screen, click on "Pleco" in the list of files in the bottom half of the screen, and that will give you a list of files in Pleco's private directory that you can drag items into / out of.
 

bentolsky

Member
Thank you, this is exactly what I was looking for.

mikelove said:
However, if possible I would recommend developing your user dictionary in a semantically tagged format (so you'd have <definition>, <example>, etc type tags in an XML file, then write a simple XSL transformation to output a Pleco-friendly file with formatting tags), as it's likely that we will be extending semantic tagging support to user-created dictionaries in a future release. That would also protect you against any future changes to our data format, and allow you to update it to match the more standardized formatting we're planning to roll out in all of our dictionaries over the next year or so.
I'm not familiar with XSL, do you have any example code? Also, how would I run it? I could always do this in Access with VBA if I don't have the tools for XSL, in fact a database might even make this easier.

Ben
 

mikelove

皇帝
Staff member
bentolsky said:
I'm not familiar with XSL, do you have any example code? Also, how would I run it? I could always do this in Access with VBA if I don't have the tools for XSL, in fact a database might even make this easier.
XSL is kind of tricky if you don't already know it (basically you're using XML to describe a way to transform XML into different XML), though it's very useful in general for dealing with XML data. An export from an Access database should work too, provided that you structure it nicely. I'm not sure Access would be able to generate the right data directly, though; you might have to convert it afterwards with a Perl script or, if you aren't familiar with Perl, a find-and-replace command to turn all of the <example> tags into our bold format codes.
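
The find-and-replace step could look roughly like this in Python (used here instead of Perl). This is a sketch, not an official converter: the bold on / off code points U+EAB2 and U+EAB3 are assumed values, and `tags_to_bold` is a made-up helper name.

```python
# ASSUMPTION: placeholder code points for the bold on / off formatting codes.
BOLD_ON = "\uEAB2"
BOLD_OFF = "\uEAB3"


def tags_to_bold(text, tag="example"):
    """Replace <tag> and </tag> with the bold on / off codes (plain find-and-replace)."""
    return text.replace(f"<{tag}>", BOLD_ON).replace(f"</{tag}>", BOLD_OFF)


if __name__ == "__main__":
    # One exported line: headword<tab>pinyin<tab>definition with tagged examples.
    exported = "包\tbao1\tbag <example>背包</example> backpack"
    print(tags_to_bold(exported))
```

Because it is a literal replace, it only handles tags written exactly as `<example>` and `</example>`, with no attributes or extra whitespace.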
 

Cameroon

进士
Almost 10 years have passed, but this question is still relevant.
Are there any changes to the formatting support for user dictionary txt import files?
Maybe at least line breaks (like <br> tags) can somehow be applied in the source txt without XSL/XML tricks?
I'm ready to live without bold, italics, etc., but line breaks are crucial for readability.

mikelove said:
our bold format codes
Can we see the full list of these format codes so we can apply them manually (if that's possible)?

UPD: The answer is already here: https://plecoforums.com/threads/multiple-new-lines-in-user-defined-flashcards.5916/post-44863
 

Cameroon

进士
One more question, since it also concerns the formatting of the source .txt file:
How can I fix the wrong tone coloring in the examples below?
[Attached image: pleco_tonecolors.png]


It seems that Pleco's import algorithm treats (some) digits and (some) letters as hanzi and therefore colors them.
How should we format the input entries so that they can be told apart?
 

mikelove

皇帝
Staff member
Generally the best bet in cases like this is to match up the number of syllables in your Pinyin - i.e. for '985 gongcheng' you would instead use 'jiu ba wu gongcheng'.
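
As a rough illustration of that workaround, the sketch below spells out each digit in a Pinyin field as its own syllable before import. The helper name `spell_digits` is made up, the syllables are toneless to match the example above, and a digit-by-digit reading is an assumption that will not suit every word.

```python
# Toneless Pinyin for each digit, read one digit at a time.
DIGIT_PINYIN = {
    "0": "ling", "1": "yi", "2": "er", "3": "san", "4": "si",
    "5": "wu", "6": "liu", "7": "qi", "8": "ba", "9": "jiu",
}


def spell_digits(pinyin):
    """Replace all-digit tokens with one Pinyin syllable per digit."""
    out = []
    for token in pinyin.split():
        if all(ch in DIGIT_PINYIN for ch in token):
            # e.g. "985" becomes three syllables, matching three characters.
            out.extend(DIGIT_PINYIN[ch] for ch in token)
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(spell_digits("985 gongcheng"))  # prints "jiu ba wu gongcheng"
```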
 

Cameroon

进士
That works for the given example, but what if we have 16 (yiliu or shiliu? the tones differ) or 26 (erliu or ershiliu? the tones are the same, but a new syllable appears in between)?

Or abbreviations like ABC, etc.?

Maybe it would be better not to tone-color digits at all, since it is hard to resolve whether a number should be read digit by digit or spelled out with thousands, hundreds and tens. As for abbreviations (given that Pleco renders them as 'pseudo-hanzi' and colors them), could Pleco generate pinyin for Latin letters? Mandarin already has commonly used pronunciations of Latin letters: A = ai (I don't know which tone, maybe neutral), etc.
 

mikelove

皇帝
Staff member
The problem is that this really isn't consistent between words; sometimes a series of letters might be mapped to a single Pinyin syllable, sometimes one for each letter, sometimes numbers might include the shi / bai while other times it might just be a series of digits.

You can insert an upside-down question mark (¿) as a 'tone skip' character. It doesn't work entirely reliably in user dictionaries, so if you find any bugs with that method all I can do is apologize (we're no longer fixing any bugs this esoteric in 3.x), but it might get you a little closer to what you want.
 