Help Needed with Importing BKRS Chinese-Russian Dictionary

kasim · Apr 20, 2024

Hello Pleco Community,

I am experiencing difficulties with importing the BKRS dictionary (the largest Chinese-Russian dictionary available) into the Pleco app and am hoping for some guidance. Here’s a breakdown of the issues and my setup:

1. Dictionary Structure and Import Settings (attachments: Settings before importing 1; 2.PNG, BKRS Sample Text.PNG, Settings that I have no clue what they change.PNG):

The BKRS file has three fields: Chinese characters, Pinyin, and Russian definitions. From my attempts, I understand that the first field should be set to ‘Simplified’ and the second to ‘Mandarin’. The problem is making the Russian definitions searchable: setting the third field to ‘English’ didn't allow searches in Russian, while changing it to ‘Entry body’ made the definitions searchable but led to display issues across devices.

2. Display Inconsistencies Across Devices (3 screenshots - Display Inconsistencies iPad, iPad 2, iPhone):

When I set the output to ‘Entry body’, the dictionary appears differently on my devices. On my iPhone, dictionary entries show up empty, and on my iPad, the Russian definitions are displayed as if they were Pinyin. It seems the data is being misinterpreted or mishandled.

3. Potential Bugs with Column Settings:

I've noticed potential bugs in how settings are managed within the app. For example, changing the setting 'split to second field' resets previous settings for the columns back to default. Additionally, under 'output to field' set to Mandarin, the 'parse as' options for tone numbers and symbols seem to be reversed: selecting tone marks displays numbers and vice versa.

I am also including a video and screenshots of my settings to better illustrate these issues, along with an extract from the dictionary so you can check whether the file is corrupted or has formatting issues (the whole file is too big to upload). I’ve been working to resolve this for hours with no success and would greatly appreciate any help or suggestions from the community.

Moreover, if these issues can be resolved, I believe adding this Russian dictionary to Pleco by default, alongside the currently supported English, German and French ones, would greatly benefit users from Central Asia, Eastern Europe, and Russia, given the dictionary's extensive entry count of 228,000 and how widely Russian is used in these regions.

Thank you in advance for your assistance!

Best regards,
Kasim

Shun · Apr 20, 2024

Hi Kasim,

you could try the following settings at the top (that worked for me before with that kind of format), if you haven't already:

Kind regards,

Shun

kasim · Apr 20, 2024

Hi Shun, thanks for your swift reply! Could you please also send the full version with the second part of the settings (for fields, columns, etc.)? On my side I can see only one screenshot, which doesn't tell me much, because the settings before importing a dictionary are quite extensive.

Is this on the 4.0 beta version, or are you using Legacy?

Do you think the format of the document is fine? I also wondered whether the file size might be causing trouble (228,000 entries). What do you think?

Shun · Apr 20, 2024

Hi Kasim,

you're welcome! The fine-grained options that you filmed disappear with that setting:

It seems to be a kind of compatibility setting for Pleco 3.2 input files.

If you'd like Russian-Chinese example sentences (a lot of them, compiled before ChatGPT was born), allow me to recommend this thread:

79,000 Chinese-English, French, German, Italian, Japanese, and Spanish sentences

Dear all, here is an archive containing translated Chinese sentences in the following language pairs, ready for importing into Pleco: Chinese-English 41,955 sentences Chinese-French 15,740 sentences Chinese-German 4,566 sentences Chinese-Italian 3,800 sentences...

plecoforums.com

Shun

Shun · Apr 20, 2024

kasim said:
Hi Shun, thanks for your swift reply! Could you please also send the full version with the second part of the settings (for fields, columns, etc.)? On my side I can see only one screenshot, which doesn't tell me much, because the settings before importing a dictionary are quite extensive.

Is this on the 4.0 beta version, or are you using Legacy?

This is also on the Build 25 beta.

kasim said:
Do you think the format of the document is fine?

Yes, it looks great. I once had issues with the third tone on top of the "i" (there was a round one which didn't work), but they look fine on your screenshots.
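A minimal Python sketch of how such accents could be cleaned up before importing. It assumes the "round" accent mentioned above is the breve (as in "ĭ") and that the intended third-tone mark is the caron (as in "ǐ"); the function name `normalize_third_tone` is hypothetical and not part of Pleco or BKRS.

```python
import unicodedata

# Breve vowels (round accent) mapped to caron vowels (pointed accent).
# Assumption: the breve forms are typos for the pinyin third tone.
BREVE_TO_CARON = str.maketrans({
    "\u0103": "\u01ce",  # a with breve -> a with caron
    "\u0115": "\u011b",  # e
    "\u012d": "\u01d0",  # i
    "\u014f": "\u01d2",  # o
    "\u016d": "\u01d4",  # u
    "\u0102": "\u01cd",  # A
    "\u0114": "\u011a",  # E
    "\u012c": "\u01cf",  # I
    "\u014e": "\u01d1",  # O
    "\u016c": "\u01d3",  # U
})


def normalize_third_tone(pinyin: str) -> str:
    """Replace breve-accented vowels with their caron equivalents."""
    # Compose first, so a plain vowel followed by a combining breve
    # becomes a single precomposed character that the table can match.
    composed = unicodedata.normalize("NFC", pinyin)
    return composed.translate(BREVE_TO_CARON)
```

Running the pinyin column through this function leaves correctly accented syllables untouched, since only breve vowels appear in the table.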

kasim said:
I also wondered whether the file size might be causing trouble (228,000 entries). What do you think?

Pleco can handle that very easily, especially if it's a user dictionary. Sorry, my screenshots came from Flashcards, but they look almost the same.

Cheers, Shun

mikelove · Apr 21, 2024

Thanks, we'll investigate the issues here.

On the other point about distributing BKRS, I don't think we can do that ourselves because my understanding from previous discussions with people about BKRS is that the copyright status of its content isn't totally clean / clear, but in principle it should work fine as a user dictionary in 4.0. As far as content officially available in Pleco, we do have a smaller Chinese-Russian dictionary among a new collection of international dictionaries we've licensed that we hope to release with or shortly after 4.0.

kasim · Apr 21, 2024

Thanks a lot Shun, the thread really helped a lot!

Additionally, I've been on the lookout for a particular resource and wondered if you might be able to help. Do you know of any databases or electronic resources for Chinese-Slovak or Chinese-Czech dictionaries? I've been searching for such tools for some time but haven't found anything available for the computer yet. I'm not even sure if electronic versions exist. Any guidance or information you could share would be greatly appreciated.

Thanks again for your invaluable assistance!

Shun · Apr 21, 2024

Hi Kasim,

that's great, you're very welcome! Unfortunately, I don't know of any digital Chinese-Slovak or Chinese-Czech dictionaries, but you can get sentence pairs for these languages from here (I've also attached them; they're under a Creative Commons license):

Download sentences - Tatoeba

tatoeba.org

There are only about 100 (Chinese-Czech) and 800 (Chinese-Slovak) sentence pairs in their database right now, though.

Enjoy,

Shun

kasim · Apr 23, 2024

Thank you Shun for the materials, they are super helpful!

I would also like to ask - I've downloaded several dictionaries that I'd like to import into Pleco as user dictionaries. However, they are currently in DSL format, among others, and need to be in .txt format for compatibility. Has anyone here encountered this issue and found a tool or method to convert these files effectively? Any recommendations or guidance would be greatly appreciated as I navigate this.

Thank you in advance for your help!

Shun · Apr 23, 2024

Hi Kasim,

you're welcome! I've found this Python package, which converts DSL dictionary files to HTML. From HTML, the path to tab-delimited text shouldn't be long (an HTML parser such as Beautiful Soup could be used afterwards):

GitHub - Crissium/python-dsl: Python module for converting DSL dictionary texts into HTML

Python module for converting DSL dictionary texts into HTML - Crissium/python-dsl

github.com
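As an alternative illustration, here is a rough Python sketch that goes from DSL text straight to tab-delimited rows with a regular expression. It does not use the python-dsl package above. It assumes the common DSL layout (header lines starting with "#", headwords at the start of a line, indented body lines, square-bracket tags) and, as a further BKRS-specific assumption that should be checked against the real file, that the first body line of each entry is the pinyin. The names `dsl_to_rows` and `_make_row` are hypothetical. Definition lines are joined with the U+EAB1 newline character mentioned later in this thread.

```python
import re

# Matches DSL tags such as [m1], [/m], [b], [i], [c red].
# Assumption: escaped brackets ("\[") do not occur in the data.
TAG = re.compile(r"\[/?[^\[\]]*\]")


def _make_row(headword, body, newline):
    # Assumption: the first body line holds the pinyin, the rest the definition.
    pinyin = body[0] if body else ""
    definition = newline.join(body[1:])
    return (headword, pinyin, definition)


def dsl_to_rows(lines, newline="\uEAB1"):
    """Yield (headword, pinyin, definition) tuples from DSL text lines."""
    headword, body = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header directives such as #NAME
        if line[0] in " \t":
            # Indented line: part of the current entry's body.
            if headword is not None:
                text = TAG.sub("", line).strip()
                if text:
                    body.append(text)
        else:
            # Unindented line: a new headword starts; emit the previous entry.
            if headword is not None:
                yield _make_row(headword, body, newline)
            headword, body = line.strip(), []
    if headword is not None:
        yield _make_row(headword, body, newline)
```

Each tuple can then be written out as one line with `"\t".join(row)`. DSL files are often encoded as UTF-16, so the input may need to be opened with `encoding="utf-16"`.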

If you wish, you could also send me the DSL dictionary files by private message, and I'll see if I can extract data from them.

Shun

kasim · Apr 23, 2024

Shun said:
If you wish, you could also send me the DSL dictionary files by private message, and I'll see if I can extract data from them.

Thank you for sharing the Python package and offering to help with extracting data from the DSL dictionaries. I appreciate your willingness to assist!

I'm new to this forum and haven’t figured out how to send a private message yet—it seems like when I click on 'Start conversation' I'm just posting on your wall. To facilitate, I will provide a link and some download links where you can access the dictionaries. Would that be okay?

If so, this is the download page: https://bkrs.info/p47
Download link (older version of the dictionary in DSL format): 大БКРС (58 Мб, 3 части)

Full Chinese-Russian BKRS database (up-to-date version): 大БКРС v240423 ; БРуКС v240423 ; Примеры v240423 (examples)

Also, have you used this Python package yourself before? Any insights on its performance or any issues to watch out for would be really helpful.

Thanks again for your help!

Shun · Apr 23, 2024

Hi Kasim,

this looks like a nice open-source dictionary. I've sent you a WeTransfer link by private message ("Start conversation" is private).

You're very welcome!

kasim · May 9, 2024

Thank you so much for your help!

Shun · May 10, 2024

Hi Kasim, hello @mikelove,

I tried to import the 230 MB text file with Markdown into an empty user dictionary, but unfortunately Pleco crashes as it processes the input file (before I could enter the import settings, with the 28 beta). I've sent Mike the detailed crash report and the input text file.

I could think of four possible causes for the crash:

1. The size of the input file overwhelms the format checker.
2. The pinyin uses tone marks instead of tone numbers.
3. The pinyin field is often empty, with an underscore "_" in the field.
4. I used the EAB1 character for newlines, and I've used Markdown to mark italic and boldface text. So this would be a mixture of the Pleco 3.2 and 4.0 formats. I tried replacing the EAB1 character with a backslash "\", which unfortunately didn't keep it from crashing.
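To gauge how often the conditions listed above actually occur in an import file, a small Python scan along these lines could help. It is only a sketch: it assumes one entry per line with three tab-separated fields (headword, pinyin, definition), and the function name `scan_import_file` and the counter keys are made up for illustration.

```python
from collections import Counter


def scan_import_file(lines, expected_fields=3):
    """Count entries and a few potentially problematic patterns."""
    stats = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        stats["entries"] += 1
        fields = line.split("\t")
        if len(fields) != expected_fields:
            # Assumption: every entry has headword, pinyin, definition.
            stats["wrong_field_count"] += 1
            continue
        if fields[1].strip() in ("", "_"):
            stats["empty_pinyin"] += 1  # empty or underscore placeholder
        if "\uEAB1" in fields[2]:
            stats["eab1_newlines"] += 1  # U+EAB1 used as a newline marker
    return stats
```

It can be called on an open file, for example `scan_import_file(open(path, encoding="utf-8"))`, where `path` is the import file. The "entries" count also gives a quick sense of the overall size of the import.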

Thanks, cheers,

Shun

mikelove · May 11, 2024

It's the first one, out-of-memory errors; we'll see if we can finagle a way to make a file like this usable, but 3+ million entries is a lot, and frankly, if we'd optimized around that number it would have cost us a great deal of performance among dictionaries of the smaller sizes we usually traffic in.

Shun · May 11, 2024

I see, thanks! It worked after splitting the import file up into one-million-line pieces.
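The splitting step can be sketched in Python roughly as follows. This is a minimal illustration and not necessarily how the file was split here; the function name `split_file`, the `_partN` naming scheme, and the UTF-8 default are assumptions.

```python
from itertools import islice
from pathlib import Path


def split_file(src, lines_per_part=1_000_000, encoding="utf-8"):
    """Split a text file into numbered parts of at most lines_per_part lines."""
    src = Path(src)
    parts = []
    # newline="" keeps the original line endings untouched on read and write.
    with src.open(encoding=encoding, newline="") as f:
        index = 1
        while True:
            chunk = list(islice(f, lines_per_part))
            if not chunk:
                break
            # Hypothetical naming scheme: dict.txt -> dict_part1.txt, dict_part2.txt, ...
            part = src.with_name(f"{src.stem}_part{index}{src.suffix}")
            with part.open("w", encoding=encoding, newline="") as out:
                out.writelines(chunk)
            parts.append(part)
            index += 1
    return parts
```

Because each entry occupies a single line, cutting at line boundaries keeps every entry intact, and the parts can then be imported one after another.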
