79,000 Chinese-English, French, German, Italian, Japanese, and Spanish sentences

Shun · Sep 22, 2019

Hi lamington,

you’re welcome! That’s exactly it, sorry about that; I will upload a corrected file in about two hours. For now, you could either delete the category or select "Undo last import" in Import Flashcards to correct the error.

Cheers,

Shun

lamington · Sep 22, 2019

The undo function is very useful

But Shun, please don't spoil your afternoon/evening on this - I'm not in a hurry, and I'm just one person asking about it.

Shun · Sep 22, 2019

It is!

Thanks, I’ll easily be able to fit it in, as it only takes 2-3 minutes.

Shun · Sep 22, 2019

I’ve replaced the file with a corrected version; you can download it from the first post:

Post #119

leguan · Oct 20, 2019

Interesting article on algorithmically weeding out poor-quality sentences:
https://blog.dong-chinese.com/2019/10/19/high-quality-Chinese-English-sentences.html

Shun · Oct 20, 2019

Hi leguan,

many thanks!

One could also use a good dictionary and check how many of the words in a sentence of one language match a word in the corresponding sentence of the other language, then divide that number by the sentence length. A low ratio should allow us to spot mismatched sentences. Maybe we could try this with CC-CEDICT or HanDeDict sometime. We could ignore the last 1-3 letters of each word in this comparison to account for inflections. But using TensorFlow would, of course, be more on the cutting edge.
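A rough Python sketch of the dictionary-matching idea above, offered as one possible reading of it rather than a tested method. The names `loosely_equal`, `alignment_score`, and `toy_dictionary` are made up for illustration; the tiny dictionary stands in for CC-CEDICT, which would need its own parser. Headwords are found by simple substring search because Chinese text has no spaces, and "sentence length" is taken to mean the number of English words.

```python
import re

def loosely_equal(a, b, trim=3):
    """Compare two words while ignoring up to `trim` trailing letters.

    Words match if they are identical, or if they share a common prefix of
    at least 3 letters that covers all but the last `trim` letters of the
    longer word (a crude stand-in for handling inflections).
    """
    a, b = a.lower(), b.lower()
    if a == b:
        return True
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common >= max(3, max(len(a), len(b)) - trim)

def alignment_score(zh, en, dictionary, trim=3):
    """Fraction of English words that match a gloss of a headword found in `zh`.

    `dictionary` maps a Chinese headword to a list of English definitions
    (hypothetical format, not the real CC-CEDICT file layout).
    """
    en_words = re.findall(r"[a-z']+", en.lower())
    if not en_words:
        return 0.0
    # Collect every gloss word of every headword that occurs in the sentence.
    glosses = set()
    for headword, definitions in dictionary.items():
        if headword in zh:
            for definition in definitions:
                glosses.update(re.findall(r"[a-z']+", definition.lower()))
    matched = sum(
        1 for word in en_words
        if any(loosely_equal(word, gloss, trim) for gloss in glosses)
    )
    return matched / len(en_words)

# Toy data for illustration only.
toy_dictionary = {
    "我": ["I", "me"],
    "喜欢": ["to like", "to be fond of"],
    "猫": ["cat"],
}

good = alignment_score("我喜欢猫", "I like cats.", toy_dictionary)
bad = alignment_score("我喜欢猫", "The weather is nice today.", toy_dictionary)
```

With the toy data, the matching pair scores far higher than the mismatched one. On real data, function words such as "to" or "the" would distort the ratio, so a stop-word list and a sensible threshold would probably be needed.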

In any case, it's good to see that Tatoeba ranks highest in sentence quality of all his sources.

Cheers,

Shun

shaoguan · Mar 22, 2020

Hello,
Is there a way to add these sentences to the Pleco example sentences database?
For me as a French speaker, it would add some much-needed examples.

Thanks for your work, thanks for sharing it.

Shun · Mar 23, 2020

Hi Shaoguan,

glad you like them! I assume that the goal is to add the sentences to the SENTS tab under Dictionary. It may be possible to add all Chinese-French sentence pairs as example sentences in a user dictionary and have Pleco recognize them as such.

As a first step, I believe you would have to generate the user dictionary entries, linking one or more example sentences to each Chinese dictionary headword. However, I'm not sure which internal tags Pleco needs in order to add them to the example sentences database, or whether Pleco accepts them if they come from a user dictionary.

So for this, we'd clearly need help from @mikelove. It may even be worth a thought for Pleco to cooperate with Tatoeba.org, harnessing their open source example sentence database with translations in all major Western languages, and including them in an upcoming major release for everyone.

Best, cheers,

Shun

shaoguan · Mar 23, 2020

Hi Shun,

Yes, I managed to create user dictionaries with the sentences as headwords, but I still haven't managed to create new entries in the SENTS tab.

Yes, guidance on adding new sentences to SENTS and/or a collaboration with Tatoeba.org would be very nice for everybody.
And ultra-nice for non-native English speakers!

Thanks for your help.

mikelove · Mar 23, 2020

Not supported yet, but implemented (isn't everything?) for 4.0 - user dictionary entries can have different semantic categories and 'sentence' is one of them and that makes them accessible through a sentence search.

sourlearn · May 25, 2020

Hello,

Reading through the thread I just want to thank everyone for their hard work with this.

I was just wondering which version of the deck/sentences is the most up to date. Is it the one in the first post? I just want to make sure I download the correct deck.

Thank you

Shun · May 25, 2020

Hi sourlearn,

you're welcome! I remember it as a very fun project, one that we need more of! It was all thanks to @leguan's idea. I would download the following files: (post #83)

79,000 Chinese-English, French, German, Italian, Japanese, and Spanish sentences

Hi leguan, that sounds excellent, I'm all for a clear thread structure. So I will use another thread to answer your post and edit in a link to the new thread here: HSK Difficulty Thread Best, Shun

www.plecoforums.com

If you're a beginner, I would download the following file for Chinese-English: (Post #88. It also includes an HSK Level 2 category and a few more sentences.)

If you only need Chinese-English sentence pairs, I would only download the latter file.

Enjoy learning, and tell us how you fared,

Shun

sourlearn · May 25, 2020

Thank you very much for the reply! I'm very excited to use this.

Like you, I think we need more of this. Looking forward to contributing in the future

shaoguan · Dec 8, 2020

Shun said:
Hi leguan,

many thanks! Python seems to be quite ideal for this type of task. Thanks also for the excellent "sentence contextual" idea of replacing one Chinese word with its pinyin. As stated previously, it trains your Hanzi writing skills, and at the same time your Chinese reading skills and general passive vocabulary knowledge.

Best, Shun

Hello !

Would you happen to have the French deck without the "sentence contextual" idea?
Just with the sentences in French and Chinese.
If so, could you share it please?

Shun · Dec 8, 2020

Hello shaoguan!

Sure, your wish is my command.

I attach the Chinese-French sentence pairs, separated by one tab character. If you wish to modify the list and import it into Pleco, you need to search and replace every single tab character with a double tab character, so that the pinyin field is skipped and stays empty. If you wish, I could also order the Chinese-French sentence pairs by difficulty level using the rating algorithm @leguan and I have developed. Feel free to tell me if you would like me to do that.
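The tab replacement described above could be done in a few lines of Python instead of a text editor. This is only a sketch: the function names and the output file name are made up for illustration, and it assumes the file is UTF-8 encoded with exactly one tab per line between the Chinese sentence and its translation.

```python
from pathlib import Path

def add_empty_pinyin_field(text):
    """Double every tab so that the pinyin column between the Chinese
    sentence and the translation is left empty on import."""
    return text.replace("\t", "\t\t")

def convert_file(src, dst):
    """Read `src`, double its tabs, and write the result to `dst`."""
    content = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(add_empty_pinyin_field(content), encoding="utf-8")

# Example call (file names are placeholders, adjust to your own files):
# convert_file("sentences_cmn_fra_simplified_folded.txt", "sentences_for_pleco.txt")
```

Running it twice on the same file would turn the double tabs into quadruple tabs, so it is safest to always convert from the original file.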

Attribution: The sentences were taken from Tatoeba.org in October 2020.

Enjoy,

Shun

shaoguan · Dec 8, 2020

I was just starting to read the Python code, but you really sped up the process for me.
Thank you very much !

Greets !

Shun · Dec 8, 2020

You're welcome! Yes, Python is a really nice language for beginners like me.

Greetings, Shun

shaoguan · Dec 8, 2020

I really just shed a little tear realising how much easier it is to mentally process "French <-> Chinese" instead of "French <-> English <-> Chinese".
Thanks again.

EDIT:
How much easier it is for me, of course.

Shun · Dec 8, 2020

Great! My French isn't too shabby: I had eight years of it at school, and I still nurture a personal interest in it and in French culture. I think one of the best foreign speakers of Chinese is also French. I can't remember his name off the top of my head, but he made a lot of appearances on Chinese TV as a curiosity. He studied Chinese non-stop for five years, and now he really is at native level.

shaoguan · Dec 9, 2020

Hi!
A quick question:
Are the sentences in the file "sentences_cmn_fra_simplified_folded.txt" ordered by HSK level?

If not, could you provide or point me to the file "hsk new.txt" mentioned in the Python code, please?
Then I can try to order it myself.

EDIT :

Which do you think would be faster?
Using a regex to remove the pinyin and recreate the original sentences in the file "sentence_contextual_tatoeba_cn_fra_folded by HSK rating - random sentence selection.txt",
or
ordering the sentences in the file "sentences_cmn_fra_simplified_folded.txt" by HSK level?
