79,000 Chinese-English, French, German, Italian, Japanese, and Spanish sentences

Shun

状元
Hi lamington,

you’re welcome! That’s exactly it, sorry about that. I will upload a corrected file in about two hours. In the meantime, you could either delete the category or select "Undo last import" under Import Flashcards to correct the error.

Cheers,

Shun
 

lamington

Member
The undo function is very useful ;) But Shun, please don't spoil your afternoon/evening on this. I'm not in a hurry, and I'm just one person asking about it.
 

Shun

状元
It is! ;) Thanks, I’ll easily be able to fit it in, as it only takes 2-3 minutes.
 

Shun

状元
I’ve replaced the file with a corrected version; you can download it from the first post:

Post #119
 

Shun

状元
Hi leguan,

many thanks! :) One could also use a good dictionary and check how many of the words in a sentence of one language match a word in the corresponding sentence of the other language, then divide that number by the sentence length. A low ratio should allow us to spot mismatched sentences. Maybe we could try this sometime with CC-CEDICT or HanDeDict. We could try removing the last 1-3 letters of each word in this comparison to account for word inflections. Using TensorFlow would, of course, be more cutting-edge.
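The dictionary check described above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: `TOY_DICT` is a hypothetical four-entry stand-in for CC-CEDICT, the greedy longest-match segmentation and the two-letter suffix stripping are simplifications chosen for the example, and none of the names come from the scripts used in this thread.

```python
import re

# Hypothetical mini-dictionary standing in for CC-CEDICT (Chinese word -> English glosses).
TOY_DICT = {
    "我": ["I", "me"],
    "喜欢": ["like"],
    "猫": ["cat"],
    "狗": ["dog"],
}

def stem(word, strip=2, min_len=3):
    """Crude inflection handling: drop up to `strip` trailing letters, keeping at least `min_len`."""
    word = word.lower()
    return word[:max(min_len, len(word) - strip)]

def segment(sentence, dictionary, max_len=4):
    """Greedy longest-match segmentation; anything not in the dictionary becomes a one-character token."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + size]
            if size == 1 or chunk in dictionary:
                tokens.append(chunk)
                i += size
                break
    return tokens

def match_ratio(chinese, english, dictionary):
    """Fraction of Chinese words that have at least one gloss word in the English sentence."""
    english_stems = {stem(w) for w in re.findall(r"[A-Za-z']+", english)}
    # Keep only tokens that start with a Chinese character (drops punctuation).
    tokens = [t for t in segment(chinese, dictionary) if re.match(r"[\u4e00-\u9fff]", t)]
    if not tokens:
        return 0.0
    matched = 0
    for token in tokens:
        gloss_stems = {stem(w) for gloss in dictionary.get(token, []) for w in gloss.split()}
        if gloss_stems & english_stems:
            matched += 1
    return matched / len(tokens)

good = match_ratio("我喜欢猫。", "I like cats.", TOY_DICT)
bad = match_ratio("我喜欢猫。", "The weather is nice today.", TOY_DICT)
print(good, bad)
```

Sentence pairs whose ratio falls below some threshold could then be flagged for manual review; a sensible threshold would have to be found by trial on real data.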

In any case, it's good to see that Tatoeba ranks highest in sentence quality of all his sources.

Cheers,

Shun
 

shaoguan

举人
Hello,
Is there a way to add these sentences to the Pleco example sentences database?
As a French speaker, I would get some much-needed examples that way.

Thanks for your work, thanks for sharing it.
 

Shun

状元
Hi Shaoguan,

glad you like them! I assume that the goal is to add the sentences to the SENTS tab under Dictionary. It may be possible to add all Chinese-French sentence pairs as example sentences in a user dictionary and have Pleco recognize them as such.

As a first step, I believe you would have to generate the user dictionary entries, linking one or more example sentences to each Chinese dictionary headword. However, I'm not sure which internal tags Pleco needs in order to add them to the example sentences database, or whether Pleco accepts them at all if they come from a user dictionary.

So for this, we'd clearly need help from @mikelove. It may even be worth a thought for Pleco to cooperate with Tatoeba.org, harnessing their open source example sentence database with translations in all major Western languages, and including them in an upcoming major release for everyone.

Best, cheers,

Shun
 

shaoguan

举人
Hi Shun,

Yes, I managed to create user dictionaries with the sentences as headwords.
I still haven't managed to create new entries in the SENTS tab.

Yes, guidance on adding new sentences to SENTS and/or a collaboration with tatoeba.org would be very nice for everybody.
And ultra-nice for non-native English speakers!

Thanks for your help.
 

sourlearn

Member
Hello,

Reading through the thread I just want to thank everyone for their hard work with this.

I was just wondering what the most up-to-date version of the deck/sentences is. Is it the one in the OP (main post)? I just want to make sure I download the correct deck.


Thank you
 

Shun

状元
Hi sourlearn,

you're welcome! I remember it as a very fun project, one that we need more of! It was all thanks to @leguan's idea. I would download the following files: (post #83)


If you're a beginner, I would download the following file for Chinese-English: (Post #88. It also includes an HSK Level 2 category and a few more sentences.)


If you only need Chinese-English sentence pairs, I would only download the latter file.

Enjoy learning, and tell us how you fared,

Shun
 

sourlearn

Member
Thank you very much for the reply! I'm very excited to use this.


Like you, I think we need more of this. Looking forward to contributing in the future.
 

shaoguan

举人
Shun said:
Hi leguan,

many thanks! Python seems to be quite ideal for this type of task. Thanks also for the excellent "sentence contextual" idea of replacing one Chinese word with its pinyin. As stated previously, it trains your Hanzi writing skills, and at the same time your Chinese reading skills and general passive vocabulary knowledge.

Best, Shun

Hello!

Would you happen to have the French deck without the "sentence contextual" idea?
Just the sentences in French and Chinese.
If so, could you share it, please?
 

Shun

状元
Hello shaoguan!

Sure, your wish is my command. :)

I attach the Chinese-French sentence pairs, separated by one tab character. If you wish to modify the list and import it into Pleco, you will need to search and replace every single tab character with two tab characters, so that the pinyin field is skipped and stays empty. If you wish, I could also order the Chinese-French sentence pairs by difficulty level using the rating algorithm @leguan and I have developed. Feel free to tell me if you would like me to do that.
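The tab replacement can also be done with a few lines of Python instead of a text editor. This is a minimal sketch, assuming each line of the file holds exactly one Chinese sentence, one tab, and one French sentence; the function names are made up for this example, and lines with any other number of tabs are left untouched so that running it twice does no harm.

```python
def add_empty_pinyin_field(text):
    """Turn 'hanzi<TAB>translation' lines into 'hanzi<TAB><TAB>translation'."""
    converted = []
    for line in text.splitlines():
        # Only touch lines with exactly one tab; already-converted or
        # malformed lines are passed through unchanged.
        if line.count("\t") == 1:
            line = line.replace("\t", "\t\t")
        converted.append(line)
    return "\n".join(converted)

def convert_file(source_path, target_path):
    """Read a UTF-8 sentence file and write the Pleco-ready version."""
    with open(source_path, encoding="utf-8") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8") as target:
        target.write(add_empty_pinyin_field(text) + "\n")

sample = "我喜欢猫。\tJ'aime les chats."
print(repr(add_empty_pinyin_field(sample)))
```

For the attached file, this would be called as `convert_file("sentences_cmn_fra_simplified_folded.txt", "pleco_import.txt")`, where the second file name is just a placeholder.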

Attribution: The sentences were taken from Tatoeba.org in October 2020.

Enjoy,

Shun


 

Attachments

  • sentences_cmn_fra_simplified_folded.txt
    1.2 MB · Views: 550

shaoguan

举人
I was just starting to read the Python code, but you really sped up the process for me.
Thank you very much !

Greets !
 

shaoguan

举人
I really just shed a little tear realising how much easier it is to mentally process "French <-> Chinese" instead of "French <-> English <-> Chinese".
Thanks again.

EDIT:
How much easier it is for me, of course :)
 

Shun

状元
Great! My French isn't too shabby. I had eight years of it at school, and I still nurture a personal interest in it and in French culture. I think one of the best foreign speakers of Chinese is also French. I can't remember his name off the top of my head, but he made a lot of appearances on Chinese TV as a curiosity. He studied Chinese non-stop for five years, and now he really is at native level.
 

shaoguan

举人
Hi!
A quick question:
Are the sentences in the file "sentences_cmn_fra_simplified_folded.txt" ordered by HSK level?

If not, could you provide or point me to the file "hsk new.txt" mentioned in the Python code, please?
Then I can try to order it myself.

EDIT:

Which do you think would be faster?
Using a regex to remove the pinyin and recreate the original sentences in the file "sentence_contextual_tatoeba_cn_fra_folded by HSK rating - random sentence selection.txt"
or
Ordering the sentences in the file "sentences_cmn_fra_simplified_folded.txt" by HSK level
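For the second option, here is a rough idea of what the ordering could look like. It is only a sketch under assumptions: the format of "hsk new.txt" is not shown in this thread, so `HSK_LEVELS` is a hypothetical hand-made word-to-level table, and rating a sentence by its hardest word is a simplification, not the rating algorithm that Shun and leguan developed.

```python
# Hypothetical stand-in for an HSK word list (word -> level).
HSK_LEVELS = {"我": 1, "喜欢": 1, "猫": 1, "因为": 2, "环境": 3}
UNKNOWN_LEVEL = 7  # assumed rating for Chinese characters missing from the list

def sentence_level(sentence, levels, max_len=4):
    """Rate a sentence by its hardest word, using greedy longest-match lookup."""
    hardest, i = 0, 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + size]
            if chunk in levels:
                hardest = max(hardest, levels[chunk])
                i += size
                break
        else:
            # No listed word starts here: unknown hanzi count as hard,
            # punctuation and other characters are ignored.
            if "\u4e00" <= sentence[i] <= "\u9fff":
                hardest = max(hardest, UNKNOWN_LEVEL)
            i += 1
    return hardest

def sort_pairs_by_level(lines, levels):
    """Sort 'chinese<TAB>translation' lines by the rating of the Chinese part."""
    return sorted(lines, key=lambda line: sentence_level(line.split("\t", 1)[0], levels))

pairs = [
    "我喜欢环境。\tJ'aime l'environnement.",
    "我喜欢猫。\tJ'aime les chats.",
    "因为我喜欢猫。\tParce que j'aime les chats.",
]
for line in sort_pairs_by_level(pairs, HSK_LEVELS):
    print(line)
```

Since `sorted` is stable, sentences with the same rating keep their original relative order.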
 