Sentence flashcard generator (Python script)

Dear all,

I am pleased to share a Python script that automatically generates sentence flashcards from the Tatoeba database. I wrote it based on a previous script from @Shun, and with his help. The main features are:
  • Choice of translation language, and automatic download from tatoeba.org
  • Assessment of each sentence's HSK level (the algorithm is based on the number of HSK words and other 'rare' words), with export of the selected levels only
  • Copy of the Chinese sentence into the translation (tapping the Chinese sentence only opens the character translation, instead of looking up multi-character words in the available dictionaries)
  • Option to overwrite the pinyin with the expected Chinese words in shuffled order, to be used as a hint when translating from the foreign language into Chinese. If this option is not selected, the pronunciation is left blank so that Pleco fills it in automatically during the import process
At the moment, it only works with simplified characters.
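To illustrate the shuffled-words hint, here is a minimal sketch. It is not taken from the script itself: the function name `shuffled_hint` and the assumption that the sentence is already split into words are mine.

```python
import random


def shuffled_hint(words, rng=None):
    """Return the words of a sentence in random order, separated by spaces.

    `words` is assumed to be an already segmented sentence (hypothetical input).
    `rng` lets the caller pass a seeded random.Random for reproducible output.
    """
    if rng is None:
        rng = random.Random()
    shuffled = list(words)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return " ".join(shuffled)


# Example: the hint contains exactly the same words, in some random order.
words = ["我", "喜欢", "学习", "中文"]
print(shuffled_hint(words))
```

Note that a shuffle can occasionally return the original order, especially for short sentences, so a real implementation may want to reshuffle in that case.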

The script requires two additional files: hsk new.txt (the list of HSK words) and global_wordfreq.release (Hanzi only).txt (a word frequency list).
To run, the script requires the following packages to be installed: hanziconv and tatoebatools.
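For readers curious how an HSK word list and a frequency list can be combined to rate a sentence, here is a simplified sketch. It is only an illustration of the general idea, not the exact algorithm in the script: the function name, the dictionaries, and the thresholds `rare_rank` and `max_rare` are all assumptions.

```python
def sentence_level(words, hsk_levels, word_rank, rare_rank=20000, max_rare=1):
    """Estimate an HSK level for an already segmented sentence.

    hsk_levels: dict mapping word -> HSK level (hypothetical in-memory form
                of an HSK word list)
    word_rank:  dict mapping word -> frequency rank, 1 = most frequent
                (hypothetical in-memory form of a frequency list)
    Returns the highest HSK level found among the words, or None when the
    sentence contains more than `max_rare` rare non-HSK words.
    """
    level = 1
    rare = 0
    for word in words:
        if word in hsk_levels:
            level = max(level, hsk_levels[word])
        elif word_rank.get(word, rare_rank + 1) > rare_rank:
            # Not an HSK word and either unknown or very infrequent.
            rare += 1
    if rare > max_rare:
        return None
    return level


# Toy data, made up for the example.
hsk_levels = {"我": 1, "喜欢": 1, "学习": 1, "经济": 4}
word_rank = {"我": 2, "喜欢": 300, "学习": 250, "经济": 400, "中文": 3000}

print(sentence_level(["我", "喜欢", "学习", "中文"], hsk_levels, word_rank))  # 1
print(sentence_level(["我", "学习", "经济"], hsk_levels, word_rank))  # 4
print(sentence_level(["我", "喜欢", "饕餮", "耄耋"], hsk_levels, word_rank))  # None
```

With this approach, a sentence is only as easy as its hardest HSK word, and sentences with too many rare words are dropped rather than rated.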

I also attach a few examples of the exports: English (with and without shuffled words hints) and French.
As the files are too large for this forum, please use the following links:
  • The script and associated files --> here
  • The flashcards examples --> here
Any feedback or ideas for improvement are much appreciated!

Pierre
 

Shun

状元
Hi all,

that was some great work by Pierre! Everyone who would like to try studying with sentences suited to their current level, and in their own native language, should try it out. We could also create a standalone app for this, but it would be pretty large, so you have to install Python for now.

Brief instructions: If you're on a Mac, I suggest installing Python through Homebrew (easily found through Google); on Windows, you could use the "Chocolatey" package manager; and on Linux, the package manager of your distribution. All of these keep Python up to date. Then, you only need to install the hanziconv and tatoebatools packages using "pip3 install <package name>". Once that is done, you can run the script using "python3 <script name>". The two associated files above need to reside in the same directory as the script file. We should be able to help out if you run into any trouble.
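If the script does not start, a quick check like the following may help to find out what is missing. This is just a sketch and not part of Pierre's script; the function name `missing_requirements` is made up, and it assumes the two data files keep the names given above.

```python
import importlib.util
from pathlib import Path

REQUIRED_PACKAGES = ["hanziconv", "tatoebatools"]
REQUIRED_FILES = ["hsk new.txt", "global_wordfreq.release (Hanzi only).txt"]


def missing_requirements(script_dir, packages=REQUIRED_PACKAGES, files=REQUIRED_FILES):
    """Return a list describing packages and files that could not be found."""
    missing = [
        f"package: {name}"
        for name in packages
        if importlib.util.find_spec(name) is None  # None means not importable
    ]
    missing += [
        f"file: {name}"
        for name in files
        if not (Path(script_dir) / name).is_file()
    ]
    return missing


# Check the current directory, which should be the one containing the script.
problems = missing_requirements(Path.cwd())
if problems:
    print("Missing:", ", ".join(problems))
else:
    print("Everything needed seems to be in place.")
```

Run it with python3 from the directory that contains the script and the two text files.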

Enjoy,

Shun
 

hugovth

Member
Hi!

Thanks a lot, it works perfectly! (Except that HSK 1 returns some complex sentences, but to be honest, it does not matter.)

I have a small question:

How did you generate global_wordfreq and hsk new.txt? I would like to adapt the script for HSK 3.0, but I don't know where I can find or generate these txt files.

Thanks, and good job!
 

Shun

状元
Hi hugovth,

welcome; it would be wonderful to have even more programmers. If Pierre hasn't already done so, we could soon start an open-source repository on GitHub, perhaps with different forks. As it happens, just yesterday I added an even stronger word segmenter to Pierre's script (for word reordering in the sentence), which uses 350,000 expressions for segmenting and works great.
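For anyone who wants to experiment with dictionary-based segmentation, here is a small sketch of forward maximum matching, one common way to segment Chinese text with an expression list. I am not claiming this is the method used in the script; the function name `segment` and the toy lexicon are assumptions made for the example.

```python
def segment(sentence, lexicon, max_len=None):
    """Split a sentence into words by forward maximum matching.

    At each position, the longest lexicon entry that matches is taken;
    if nothing matches, a single character is emitted.
    """
    if max_len is None:
        # Longest entry in the lexicon; 1 if the lexicon is empty.
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon; a real one would hold many thousands of expressions.
lexicon = {"我", "喜欢", "学习", "中文", "学"}
print(segment("我喜欢学习中文", lexicon))  # ['我', '喜欢', '学习', '中文']
```

A larger lexicon generally gives better results with this approach, since unknown words fall apart into single characters.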

You can get Pleco's clean, built-in HSK 3 vocabulary from here:

[two attached screenshots]

and then export it. I have also attached the same list to this post (the "9levels" one), as well as yet another new "2020" HSK list with four levels ("hsk 3.txt") that I don't remember the origin of. Possibly it comes from @Weyland. The HSK rating should work even more reliably if we include these lists, or perhaps even all three lists at once. But it certainly takes a lot of testing and personal day-to-day usage to make sure that the HSK rating performs optimally.

"hsk new.txt" is the older 5,000 word HSK 2.0, which also comes from Pleco's built-in list. BCC is a frequency list that was obtained from a thread on these forums:


Feel free to come back with questions to both of us.

Have fun,

Shun
 

Attachments

  • hsk3.0-9levels-simplified.txt
    197.7 KB · Views: 65
  • HSK 3.txt
    76.1 KB · Views: 62