Inconsistencies in HSK3.0 Export

ZellDD

Member
TLDR:

When I exported the HSK 3.0 vocabulary set, I found that some entries are inconsistent or missing specific fields.

For example:
Exported: 骑车[騎車] qi2che1 ride a bike
Pleco Entry:
qiche.jpg


Usually, there should be a "Part of a Sentence" (part of speech) such as noun, verb, etc.

As a Reference:
Exported Row Entry:
孙女[孫女] sun1nü5 noun granddaughter
Pleco Entry:
sunnu.jpg


--------------------------------------------------------------------------------------------------------------------------------------------------------
For the Nerds ;)


Currently, I'm preparing my Anki flashcard set using the definitions provided by Pleco HSK 3.0.
First, I exported ALL HSK3.0 Levels to a text file with the following settings:

Screenshot_Pleco.jpg


By reverse engineering, I found the following structure:

[Hanzi] [Pinyin]
↻ [Part of Sentence] (colloquial) (Enumeration) [Meaning]
↻ ([Hanzi Example] [Pinyin Example] [Translation] (Alternative Translation))

To keep the notation simple: [ ... ] marks a required part of the definition, ( ... ) marks an optional part, and ↻ indicates that the element occurs one or more times. For reference, here is the entry for 折 from the text/csv file, formatted for readability:

折[折] zhe2 noun
1 discount; rebate
打八折 dǎ bā zhé give 20% discount; charge 80% of the original price
2 turning stroke
measure word
an act of zaju (杂剧)
verb
1 break; snap
折断腿 zhéduàn tuǐ fracture (or break) one’s leg
折断一根树枝 zhéduàn yī gēn shùzhī break off a branch
2 suffer the loss of; lose
损兵折将 sǔnbīng-zhéjiàng suffer heavy losses
3 bend; twist
曲折 qūzhé twists and turns
4 turn back; change direction
边界由此折向西南。 Biānjiè yóucǐ zhé xiàng xīnán. From here the boundary turns southwestward.
5 convinced; filled with admiration
心折 xīnzhé deeply convinced; filled with heartfelt admiration
6 convert into; amount to
把市斤折成公斤 bǎ shìjīn zhé chéng gōngjīn convert (or change) jin into kilograms
这笔外币折成人民币是多少? Zhè bǐ wàibì zhé chéng rénmínbì shì duōshao? How much does this foreign exchange convert into Renminbi?
noun
folded booklet with a slipcase, used for keeping accounts, etc.; folder
存折 cúnzhé account book
verb
fold
把信折好 bǎ xìn zhé hǎo fold the letter
把纸对折起来 bǎ zhǐ duìzhé qǐlai fold the sheet of paper in two
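To make the structure above more concrete, here is a rough sketch of how it could be modeled as data. The class names (`Entry`, `PosGroup`, `Sense`, `Example`) are hypothetical names of mine and not anything Pleco uses, and only the first part of the 折 entry is filled in:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Example:
    """One usage example: [Hanzi Example] [Pinyin Example] [Translation]."""
    hanzi: str
    pinyin: str
    translation: str


@dataclass
class Sense:
    """One meaning, with an optional enumeration number and examples."""
    meaning: str
    number: Optional[int] = None
    examples: List[Example] = field(default_factory=list)


@dataclass
class PosGroup:
    """All senses listed under one [Part of Sentence] label."""
    part_of_speech: str
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Entry:
    """A headword: [Hanzi] [Pinyin] followed by one or more groups."""
    hanzi: str
    pinyin: str
    groups: List[PosGroup] = field(default_factory=list)


# The first two groups of the 折 entry, typed in by hand.
zhe = Entry(
    hanzi="折",
    pinyin="zhe2",
    groups=[
        PosGroup("noun", [
            Sense("discount; rebate", 1, [
                Example("打八折", "dǎ bā zhé",
                        "give 20% discount; charge 80% of the original price"),
            ]),
            Sense("turning stroke", 2),
        ]),
        PosGroup("measure word", [Sense("an act of zaju (杂剧)")]),
    ],
)
```

The enumeration number is optional here because senses such as the measure word one appear without a number in the export.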



When I filter for "Part of a Sentence", many entries do not match the pattern [Hanzi] [Pinyin] [Part of Sentence]. That makes it hard to import them correctly into Anki or other tools: when [Part of Sentence] is missing, there should be an extra tab (e.g. 4 space characters) so that the empty field can be identified:

For Instance:
总是[總是] zong3shi4 always

Should be (by my understanding):
总是[總是] zong3shi4     always

This affects not just a small fraction of entries but a large number of them, which would take a long time to correct by hand.
Did I do something wrong? Is this behavior intended?
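A minimal sketch of how such rows could be normalized, assuming each exported row is separated by tabs or spaces in the form shown above and that the part-of-speech labels come from a small known set. The `POS_LABELS` list and the `normalize_row` name are my own assumptions and would need adjusting against the real export:

```python
import re

# Assumed label set; extend it after inspecting the actual export file.
POS_LABELS = (
    "measure word", "noun", "verb", "adjective", "adverb",
    "pronoun", "conjunction", "preposition",
)

# Matches one of the labels at the very start of the definition text.
_POS_RE = re.compile(
    r"^(%s)\b\s*" % "|".join(re.escape(label) for label in POS_LABELS)
)


def normalize_row(line: str) -> str:
    """Return 'headword<TAB>pinyin<TAB>pos<TAB>definition'.

    The pos column is left empty when the row has no recognizable label,
    so every row ends up with the same four columns.
    """
    parts = line.strip().split(None, 2)  # split on any whitespace, twice
    if len(parts) < 3:
        raise ValueError("expected headword, pinyin and definition: %r" % line)
    headword, pinyin, rest = parts

    match = _POS_RE.match(rest)
    if match:
        pos = match.group(1)
        definition = rest[match.end():]
    else:
        pos, definition = "", rest
    return "\t".join((headword, pinyin, pos, definition))


if __name__ == "__main__":
    for row in ("总是[總是] zong3shi4 always",
                "孙女[孫女] sun1nü5 noun granddaughter"):
        print(normalize_row(row))
```

A definition that happens to begin with one of the labels (say, "verb tense") would be misread as having a part of speech, so the output still needs a spot check.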

Thanks in advance :)
 


mikelove

皇帝
Staff member
To be honest, we didn't really build our current export feature around definition export - we only do it from some dictionaries and only with a single minimized line of text - so it sounds like this is pretty much working as expected. The main point of the export feature is to let you share cards with other Pleco users, where they'll be filled in from dictionary definitions on their end. So I'm afraid I don't have any recommendations for how to make this work better for your use case.

We do integrate with AnkiDroid's API directly on Android, but we built that around adding single words from the dictionary/reader rather than exporting premade lists in bulk, since we felt like people would generally be able to find premade Anki decks for any particular word list they were interested in and that the main thing we could contribute would be the live bookmarking of words encountered while using Pleco. In theory if we got a lot of requests for it we could consider adding a bulk export feature too, but we haven't seen much interest in it so far.
 

ZellDD

Member
Hi Mike, thanks for your reply.

Just to share my perspective — the main reason I’d like to have a bulk export option is:
  1. I’ve seen many Anki decks with incorrect pinyin and missing definitions. I’d prefer to start with a reliable source like Pleco.
  2. Even though you’re working on customization for cards/entries (as you mentioned), I prefer editing cards on a PC — it’s much faster than on a phone. Anki offers excellent customization for my needs.
  3. I also think users would want this feature. Unlike exporting single entries repeatedly, bulk export is mostly a one-time task (e.g., exporting the full HSK list once so all vocabulary is available).

My next step will be to write a script to clean it up.
Would it be possible to share my code here in case others encounter the same issue in the future? I’m not sure what your policy is on that.

Stupid Question:

Also, I couldn’t find any recent updates about accessing audio/pronunciation (direct audio file) or dictionary entries (e.g. JSON) via URL/API on the forum, so I’m assuming nothing has changed on that front since this thread.

Thanks,
Daniel
 

mikelove

皇帝
Staff member
I’ve seen many Anki decks with incorrect pinyin and missing definitions. I’d prefer to start with a reliable source like Pleco.
Sure, but for HSK 3.0 specifically there are numerous other reliable decks out there, not to mention that you can always fill in data from an open-source dictionary like CC-CEDICT or Wiktionary.
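As a rough illustration of the CC-CEDICT route: each data line in that file has the shape `Traditional Simplified [pin1 yin1] /definition 1/definition 2/`, and comment lines start with `#`. The sketch below parses one such line; the `CedictEntry` and `parse_cedict_line` names are made up for this example, and the sample line in the usage is written by hand in that format rather than copied from the file:

```python
import re
from typing import List, NamedTuple, Optional


class CedictEntry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    definitions: List[str]


# Traditional, simplified, [pinyin], then slash-delimited definitions.
_LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_cedict_line(line: str) -> Optional[CedictEntry]:
    """Parse one CC-CEDICT data line; return None for comments or bad lines."""
    if line.startswith("#"):
        return None
    match = _LINE_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, definitions = match.groups()
    return CedictEntry(traditional, simplified, pinyin, definitions.split("/"))


if __name__ == "__main__":
    # Hand-written sample line; CC-CEDICT writes ü as "u:".
    print(parse_cedict_line("孫女 孙女 [sun1 nu:3] /granddaughter/"))
```

Entries parsed this way could be keyed by the simplified form to look up definitions for a word list, though homographs with several readings would need extra handling.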

My next step will be to write a script to clean it up.
Would it be possible to share my code here in case others encounter the same issue in the future? I’m not sure what your policy is on that.
As long as posting the code here isn't violating anybody's intellectual property rights, sure.

Also, I couldn’t find any recent updates about accessing audio/pronunciation (direct audio file) or dictionary entries (e.g. JSON) via URL/API on the forum, so I’m assuming nothing has changed on that front since this thread.
No, the constraint there remains that we are - financially - mostly an iOS company and iOS doesn't really allow for any meaningful use of inter-app APIs. Also, in the case of audio at least, AnkiDroid offers easy integration with the third-party TTS of your choice, and in the case of dictionary entries we would not easily be able to offer most of them through an API anyway.

To be honest, a lot of what you're talking about is scope creep, for an app that already tries to do way too many things at once - it's all we can do to do a halfway decent job with our existing functionality.
 