TOCFL Levels 1-5 in tab separated text files - Please help me convert

Mer · May 8, 2015

Here are TOCFL levels 1-5 as five tab-separated text files.
As is, they can be used in Anki but need to be reformatted for Pleco.
If someone knows how to convert them to Pleco format, that would be awesome.

I modified the lists to include only bigrams and removed the parts of speech, because the part of speech is usually easy to work out from context. I removed the non-bigrams because I know most of them already. Sorry to anyone who may have wanted them, but there weren't that many anyway.

Thanks

PS:
If you're studying traditional characters in Taiwan, I recommend using the TOCFL lists instead of Practical Audio Visual Chinese, because after book 3 in the series the books start including >50% outliers, which really hinders progress. Books 1, 2 and 3 are quite good though, with book 1 being a great beginner book.

Mer · May 8, 2015

Here they are in Pleco format. I figured out the formatting: just add "// Title Bla bla bla" to the first line of the tab-separated file.

example
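
A minimal sketch of that step as a shell function, assuming the "//" header goes on its own line above the cards. The function name `add_pleco_header` and the file names in the usage line are made up for illustration.

```shell
# Hypothetical helper: print a Pleco category header, then the original cards.
# $1 = category title, $2 = tab-separated input file; output goes to stdout.
add_pleco_header() {
  printf '// %s\n' "$1"   # header line, e.g. "// TOCFL Level 1"
  cat "$2"                # card lines, unchanged
}
```

Usage could look like `add_pleco_header "TOCFL Level 1" TOCFL-L1.txt > Pleco-TOCFL-L1.txt`.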

Shun · May 8, 2015

Hi Mer,

thanks for these useful lists. It's good to see the slight differences in Taiwanese Mandarin. Since Pleco has three fields per card (Hanzi, pinyin and the English definition), you would need another <tab> character after the pinyin instead of a space. This should be possible to correct using regular expressions. Perhaps someone knows how to do it?

Cheers, Shun

alex_hk90 · May 12, 2015

Shun said:
Hi Mer,

thanks for these useful lists. It's good to see the slight differences in Taiwanese Mandarin. Since Pleco has three fields per card (Hanzi, pinyin and the English definition), you would need another <tab> character after the pinyin instead of a space. This should be possible to correct using regular expressions. Perhaps someone knows how to do it?

Cheers, Shun

This should get you fairly close:

Code:

sed 's/\(\w*\)\s\(\w*\)\s\(.*\)/\1\t\2\t\3/g' Input.txt > Output.txt

I tried with one of the files and it looks more or less there with just a bit of manual clean-up required after.

Shun · May 13, 2015

Thanks a lot! I tried running sed on OS X both with and without the -E option. If I run it without the -E option, it doesn't seem to change anything; if I use the -E option, I get this error:

sed -E 's/\(\w*\)\s\(\w*\)\s\(.*\)/\1\t\2\t\3/g' test.txt > test-out.txt
sed: 1: "s/\(\w*\)\s\(\w*\)\s\(. ...": \1 not defined in the RE

alex_hk90 · May 13, 2015

Shun said:
Thanks a lot! I tried running sed on OS X both with and without the -E option. If I run it without the -E option, it doesn't seem to change anything; if I use the -E option, I get this error:

sed -E 's/\(\w*\)\s\(\w*\)\s\(.*\)/\1\t\2\t\3/g' test.txt > test-out.txt
sed: 1: "s/\(\w*\)\s\(\w*\)\s\(. ...": \1 not defined in the RE

Mac has a different (BSD-based) version of sed - have a look at installing GNU sed and that should work.
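
For anyone who would rather not install GNU sed, here is a rough sketch of the same conversion with awk, which should behave the same on OS X and Linux. It assumes every card line looks like `Hanzi<tab>pinyin definition`, with the pinyin written as a single word, and it turns the first space after the tab into a second tab. Lines starting with "//" are passed through untouched. The function name `fix_pinyin_tab` is hypothetical. On the error quoted above: with -E, `\(` and `\)` match literal parentheses instead of forming groups, so `\1` has nothing to refer to. As far as I know, `\w`, `\s` and `\t` are GNU extensions, which would explain why the plain command changes nothing on BSD sed.

```shell
# Hypothetical helper: make "Hanzi<TAB>pinyin definition" into
# "Hanzi<TAB>pinyin<TAB>definition". Reads the named files, or stdin if none.
fix_pinyin_tab() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }     # split and re-join fields on tabs
       /^\/\// { print; next }             # leave "// Title" header lines alone
       NF >= 2 { sub(/ /, "\t", $2) }      # first space in field 2 becomes a tab
       { print }' "$@"
}
```

Usage could look like `fix_pinyin_tab Input.txt > Output.txt`. Lines where the pinyin is split into several words would still need manual clean-up.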

Shun · May 13, 2015

Something to keep in mind.

I'll do it in 3 weeks, when I'll have enough time, on my experimental machine, which has Xcode on it. Or if you like, you could PM the output files to me for cleanup & I'll post them here.

alex_hk90 · May 13, 2015

Shun said:
Something to keep in mind. I'll do it in 3 weeks, when I'll have enough time, on my experimental machine, which has Xcode on it. Or if you like, you could PM the output files to me for cleanup & I'll post them here.

I don't think you can attach files to a PM so I've uploaded them here. I'll remove the attachment (EDIT: now removed) when you've posted the cleaned-up version.

Shun · May 14, 2015

Perfect, thanks! I couldn't replace a "comma + any letter" sequence by "comma + <space> + any letter", but it should definitely be usable now. (EDIT: See last post.)

alex_hk90 · May 14, 2015

Shun said:
Perfect, thanks! I couldn't replace a "comma + any letter" sequence by "comma + <space> + any letter", but it should definitely be usable now.

This should do it:

Code:

sed 's@\([a-zA-Z]\),\([a-zA-Z]\)@\1, \2@g' Pleco\ TOCFL\ bigrams\ L05b.txt > Pleco\ TOCFL\ bigrams\ L05c.txt
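
One limitation of the command above: matches cannot overlap, so in a run like `a,b,c` only the first comma gets a space. A possible variant, sketched here, captures only the letter after the comma, so every "comma + letter" pair is handled. It sticks to basic regular expressions, so it should run on both BSD and GNU sed. The function name `add_comma_space` is hypothetical.

```shell
# Hypothetical helper: insert a space after any comma that is directly
# followed by an ASCII letter. Reads the named files, or stdin if none.
add_comma_space() {
  sed 's/,\([A-Za-z]\)/, \1/g' "$@"
}
```

Commas already followed by a space, and commas between digits such as `1,000`, are left as they are.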

Shun · May 15, 2015

Excellent, thanks for making it work on BSD. Here's the final, cleaned-up version:
