Romanizations for Hokkien / Hakka / Wu / Sichuanese

Hydramus

Member
Thanks very much for all of this detailed info.

Mikelove, quick question. I've found your example on making custom dictionary text files, but does Pleco supply a template that I could work with? If I want to help with my own custom Hong Kong Hakka dictionary (and potentially add recordings!), where should I start? I haven't read about any support for sound files either. Since there wouldn't be Google dictation for it, these would have to be recordings made by a person.
 

mikelove

皇帝
Staff member
No templates at the moment, no - it's all a little up in the air still. Basically if it's a well-structured tab- or comma-delimited file you should be able to wrangle the import screen into doing something sensible with it, and it may even detect the format automatically to some extent.

We do support embedded audio files, yes - *tentatively* I think you'd want to put in a relative path to them as a Markdown embed link, ![](path/to/my-file.extension). However, I'm not sure how well-supported those would be in user dictionary entries - the only simple thing we've built that uses them so far is flashcard tests. They do generally get rendered into tappable audio icon links (even in the current version) so if you include them embedded like this at the top of your dictionary definition (try this in a small file first) it should be possible to tap on an audio icon in that spot to play the linked audio.
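
As a rough illustration of the above, here is a minimal Python sketch that writes a tab-delimited file with the audio embed link placed at the top of each definition. The three-column layout (headword, reading, definition), the space between the embed and the definition text, the `make_row` / `write_tsv` helper names, and the `audio/tioh-kiann.mp3` path are all assumptions made for the example, not a confirmed Pleco format, so try it with a small file first.

```python
import csv
import io


def make_row(headword, reading, definition, audio_path=None):
    """Build one [headword, reading, definition] row (assumed column layout).

    If audio_path is given, a Markdown embed link to it is put at the very
    top of the definition text.
    """
    if audio_path:
        definition = f"![]({audio_path}) {definition}"
    return [headword, reading, definition]


def write_tsv(rows, out):
    """Write the rows to a file-like object as tab-delimited text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical sample entry; the relative audio path is a placeholder.
rows = [make_row("著驚", "tio̍h-kiann", "to be scared", "audio/tioh-kiann.mp3")]

buf = io.StringIO()
write_tsv(rows, buf)
print(buf.getvalue(), end="")
```

To produce a real file, pass an object opened with `open("my-dictionary.txt", "w", encoding="utf-8", newline="")` instead of the `StringIO` buffer.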
 

Hydramus

Member
Ah, that's a shame. I'll try the tab-delimited approach. Do I still need to add the "<>" after every definition if a word has more than one?


Also...
Thanks. Do you know if HK Hakka can be expressed accurately using the MOE system?

For sandhi in Mandarin and tone changes in Cantonese, we've adopted the approach of letting you search for either or neither - we certainly could support entering both tone numbers, but I'm not sure how many people would be interested in looking up words that way or how much a search might be narrowed down by that versus only entering one.
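
To make the "either or neither" idea concrete, here is a small Python sketch of how such a lookup could behave. It is only an illustration under simple assumptions (numbered tones, one stored citation reading and one stored sandhi reading per word, all-or-nothing tone entry), not Pleco's actual search code, and the function names are made up for the example.

```python
import re


def normalize(text):
    """Lowercase and drop spaces so 'Ni3 hao3' and 'ni3hao3' compare equal."""
    return text.replace(" ", "").lower()


def strip_tones(reading):
    """Remove tone digits from a numbered-tone reading."""
    return re.sub(r"[0-9]", "", reading)


def matches(query, citation, sandhi):
    """Return True if the query matches the word's reading.

    A query without tone numbers matches on syllables alone; a query with
    tone numbers must equal either the citation or the sandhi reading.
    """
    q = normalize(query)
    if q == strip_tones(q):  # no tone digits were typed
        return q == strip_tones(normalize(citation))
    return q in (normalize(citation), normalize(sandhi))


# 你好: citation tones are ni3 hao3; third-tone sandhi gives ni2 hao3.
for query in ["ni3hao3", "ni2hao3", "nihao", "ni4hao3"]:
    print(query, matches(query, "ni3hao3", "ni2hao3"))
```

The first three queries match and the last does not. Supporting both tone numbers at once, or tones on only some syllables, would need a more elaborate comparison than this.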

Oh, I also found a section on sandhi rules in Hakka that you might find helpful.

Typical Tone Sandhi Rules

For the majority of Hakka dialects, which belong to the Jiaying subgroup, tone sandhi occurs on the first syllable, and only for some tones, when the following syllable has a lower tone. For example, if [xn] is a syllable with tone n, then for the Jiaying Hakka subdialect of Shatoujiao (ShaTauKok, SaThewKok, SaTdiuGok, SaTeuGog):


[Attached image: 1724020532419.png]
 

Member
When I use the imported MoE-Minnan dictionary for Hokkien with the bopomofo display setting for headwords, the pronunciation of many Hokkien words is displayed in a mix of Tâi-lô and bopomofo that makes it unreadable. Am I doing something wrong, or is it a bug?
 

mikelove

皇帝
Staff member
No, it's not really supported in the current build, and it's going to end up garbled because it's trying to parse it as Pinyin.

This will be getting much better in our next beta update, but in the meantime my best suggestion would be that you put your Hokkien readings in the definition part of the entry instead.
 
Having Hokkien romanization would be great, but I'm not sure how it would be done, because you can’t just put a Hokkien romanization next to a regular Chinese character word for every word. For example, having tio̍h-kiann as the Hokkien romanization for 害怕 wouldn’t make sense, because the characters usually used are 著驚. You would have to have a whole separate dictionary. If you just assigned Hokkien pronunciations to regular Chinese character words, it would be kind of useless, because for many of those words that’s not how people say them in real life. Maybe you already have a solution to this, and I would love to hear it. Another issue is actually getting the dictionary data: I’ve never seen an English-to-Hokkien dictionary that is actually accurate for modern-day Hokkien, at least as spoken in Taiwan.
 

Hydramus

Member
Well... from my perspective with Hakka, which also has quite a bit of distinct vocabulary, I would still prefer to see the pronunciation for the actual characters. For the separate vocabulary, you would still need to create another dictionary. Cantonese already has plenty of these dictionaries with their own vocabulary, and I run one side by side with the standard Pleco one. It means you get many words for the same English definition, but you can see the dictionary source on the right-hand side anyway, so it does work.
 

mikelove

皇帝
Staff member
The plan data-wise is to offer existing open-source dictionaries, mostly from the MoE. We aren't currently planning to add Hokkien or Hakka to existing dictionary entries, and I don't think we're likely to add the ability to automatically generate them either, at least not initially - basically, we want to facilitate people putting whatever data they have for other romanizations in Pleco and make it possible to search / test flashcards / etc with them, but that's about all we currently think we can do absent discovering some vast new trove of dictionary data for them.
 