Help Needed with Importing BKRS Chinese-Russian Dictionary

kasim

举人
Hello Pleco Community,


I am experiencing difficulties with importing the BKRS dictionary (the largest Chinese-Russian dictionary available) into the Pleco app and am hoping for some guidance. Here’s a breakdown of the issues and my setup:


1. Dictionary Structure and Import Settings (see Settings before importing 1.PNG and 2.PNG, BKRS Sample Text.PNG, and Settings that I have no clue what they change.PNG):

The BKRS file is structured with fields for Chinese characters, Pinyin, and definitions in Russian. From my attempts, I understand that the first field for Chinese characters should be set to ‘Simplified’, and the second field for Pinyin should be set to ‘Mandarin’. However, I ran into a problem making the Russian definitions searchable. Initially setting the third field to ‘English’ didn't allow searches in Russian. Changing the setting to ‘Entry body’ made it searchable but led to display issues across devices.
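A quick way to rule out formatting problems in the file itself is to check that every line really has the three fields described above. The sketch below assumes the fields are separated by tabs and that each entry sits on one line; the function name `check_entries` and the sample entries are made up for illustration and are not taken from the BKRS extract.

```python
def check_entries(lines, expected_fields=3):
    """Return (line_number, field_count) for every suspicious line.

    Assumption: one entry per line, fields separated by tabs
    (headword, pinyin, definition).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        # Flag lines with the wrong field count or an empty headword.
        if len(fields) != expected_fields or not fields[0].strip():
            problems.append((number, len(fields)))
    return problems


# Illustrative sample: the second line is missing its definition field.
sample = "你好\tnǐhǎo\tпривет; здравствуй(те)\n谢谢\txièxie\n"
print(check_entries(sample.splitlines()))
```

If the list comes back empty for the extract, the field layout is probably not what is causing the display problems.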


2. Display Inconsistencies Across Devices (3 screenshots - Display Inconsistencies iPad, iPad 2, iPhone):

When I configure the output to 'entry body', the dictionary appears differently on my devices. On my iPhone, dictionary entries show up as empty(), and on my iPad, Russian definitions incorrectly appear as Pinyin. It seems there may be a misinterpretation or mishandling of the data.


3. Potential Bugs with Column Settings:


I've noticed potential bugs in how settings are managed within the app. For example, changing the setting 'split to second field' resets previous settings for the columns back to default. Additionally, under 'output to field' set to Mandarin, the 'parse as' options for tone numbers and symbols seem to be reversed: selecting tone marks displays numbers and vice versa.


I am including a video and screenshots of my settings to better illustrate these issues. I am also attaching an extract from the dictionary so you can check whether the file is corrupted or has formatting issues (the whole file is too big to upload here). I’ve been working to resolve this for hours with no success and would greatly appreciate any help or suggestions from the community.


Moreover, if these issues can be resolved, I believe offering this Russian dictionary in Pleco by default, alongside the currently supported English, German and French ones, would greatly benefit users from Central Asia, Eastern Europe, and Russia, given the dictionary's extensive entry count of 228,000 and how widely Russian is understood in these regions.


Thank you in advance for your assistance!

Best regards,
Kasim
 

Attachments

  • BKRS Sample Text.PNG (524.4 KB)
  • Display Inconsistencies - iPad 2.PNG (247.1 KB)
  • Display Inconsistencies - iPad.PNG (608.2 KB)
  • Display Inconsistencies - iPhone.PNG (171 KB)
  • Settings before importing 1.PNG (320.3 KB)
  • Settings before importing 2.PNG (325.5 KB)
  • Settings that I have no clue what they change.PNG (293 KB)
  • extract from BKRS.txt (20.4 KB)

Shun

状元
Hi Kasim,

you could try the following settings at the top (that worked for me before with that kind of format), if you haven't already:

IMG_8569.PNG



Kind regards,

Shun
 

kasim

举人
Hi Shun, thanks for your swift reply! Could you please also send the full version with the second part of the settings (for fields, columns, etc.)? On my side I can see only one screenshot, which doesn't really tell me much, because the settings screen before importing a dictionary is quite extensive.

Is this on the 4.0 beta version, or are you using Legacy?

Do you think the format of the document is fine? I also wondered whether the sheer size of the file (228,000 entries) might be causing trouble. What do you think about that?
 

Shun

状元
Hi Kasim,

you're welcome! The fine-grained options that you filmed disappear with that setting:

IMG_8570.PNG



It seems to be a kind of compatibility setting for Pleco 3.2 input files.

If you'd like Russian-Chinese example sentences (a lot of them, and from before ChatGPT was born), allow me to recommend this thread to you:


Shun
 

Shun

状元
> Hi Shun, thanks for your swift reply! Could you please also send the full version with the second part of the settings (for fields, columns, etc.)? On my side I can see only one screenshot, which doesn't really tell me much, because the settings screen before importing a dictionary is quite extensive.
>
> Is this on the 4.0 beta version, or are you using Legacy?

This is also on the Build 25 beta.

> Do you think the format of the document is fine?

Yes, it looks great. I once had issues with the third tone on top of the "i" (there was a round one which didn't work), but they look fine on your screenshots.

> I also wondered whether the sheer size of the file (228,000 entries) might be causing trouble. What do you think about that?

Pleco can handle that very easily, especially if it's a user dictionary. Sorry, my screenshots came from Flashcards, but they look almost the same.
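On the third-tone issue mentioned above: in case a file does contain the "round" variant, here is a minimal sketch for repairing it. It assumes the round mark is the breve (as in ĭ, U+012D) and that the intended mark is the caron (as in ǐ, U+01D0); `fix_third_tone` is a made-up name, and it is meant to be run on the pinyin field only.

```python
import unicodedata

# Breve vowels (left) mapped to caron vowels (right), lower and upper case.
BREVE_TO_CARON = str.maketrans({
    "\u0103": "\u01ce",  # ă -> ǎ
    "\u0115": "\u011b",  # ĕ -> ě
    "\u012d": "\u01d0",  # ĭ -> ǐ
    "\u014f": "\u01d2",  # ŏ -> ǒ
    "\u016d": "\u01d4",  # ŭ -> ǔ
    "\u0102": "\u01cd",  # Ă -> Ǎ
    "\u0114": "\u011a",  # Ĕ -> Ě
    "\u012c": "\u01cf",  # Ĭ -> Ǐ
    "\u014e": "\u01d1",  # Ŏ -> Ǒ
    "\u016c": "\u01d3",  # Ŭ -> Ǔ
})


def fix_third_tone(pinyin):
    """Replace breve-marked vowels with caron-marked ones.

    NFC normalization first merges "vowel + combining breve" into a single
    code point so that the translation table can catch it.
    """
    composed = unicodedata.normalize("NFC", pinyin)
    return composed.translate(BREVE_TO_CARON)


print(fix_third_tone("n\u012d h\u0103o"))
```

The table only touches Latin vowels, so Cyrillic letters such as й are left alone if some Russian text slips through.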

Cheers, Shun
 

mikelove

皇帝
Staff member
Thanks, we'll investigate the issues here.

On the other point about distributing BKRS, I don't think we can do that ourselves because my understanding from previous discussions with people about BKRS is that the copyright status of its content isn't totally clean / clear, but in principle it should work fine as a user dictionary in 4.0. As far as content officially available in Pleco, we do have a smaller Chinese-Russian dictionary among a new collection of international dictionaries we've licensed that we hope to release with or shortly after 4.0.
 

kasim

举人
Thanks a lot Shun, the thread really helped a lot!

Additionally, I've been on the lookout for a particular resource and wondered if you might be able to help. Do you know of any databases or electronic resources for Chinese-Slovak or Chinese-Czech dictionaries? I've been searching for such tools for some time but haven't found anything available for the computer yet. I'm not even sure if electronic versions exist. Any guidance or information you could share would be greatly appreciated.

Thanks again for your invaluable assistance!
 

Shun

状元
Hi Kasim,

that's great, you're very welcome! Unfortunately, I don't know of any digital Chinese-Slovak or Chinese-Czech dictionaries, but you can get sentence pairs for these languages from here (I've also attached them; they're under a Creative Commons license):


There are only about 100 (Chinese-Czech) and 800 (Chinese-Slovak) sentence pairs in their database right now, though.

Enjoy,

Shun
 

Attachments

  • Sentence pairs in Mandarin Chinese-Czech - 2024-04-21.txt (7.5 KB)
  • Sentence pairs in Mandarin Chinese-Slovak - 2024-04-21.txt (62.5 KB)

kasim

举人
Thank you Shun for the materials, they are super helpful!

I would also like to ask - I've downloaded several dictionaries that I'd like to import into Pleco as user dictionaries. However, they are currently in DSL format, among others, and need to be in .txt format for compatibility. Has anyone here encountered this issue and found a tool or method to convert these files effectively? Any recommendations or guidance would be greatly appreciated as I navigate this.

Thank you in advance for your help!
 

Shun

状元
Hi Kasim,

you're welcome! I've found this Python package, which converts DSL dictionary files to HTML. From HTML, the path to tab-delimited text shouldn't be long (an HTML parser like Beautiful Soup could be used afterwards):


If you wish, you could also send me the DSL dictionary files by private message, and I'll see if I can extract data from them.
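For the HTML-to-text step mentioned above, here is a rough sketch of what it could look like. It uses Python's built-in `html.parser` instead of Beautiful Soup so that it runs without installing anything. The HTML layout the converter produces isn't known here, so the sketch assumes the headword and pinyin have already been pulled out and only the definition is still HTML; `html_to_text` and `to_tab_line` are made-up names.

```python
from html.parser import HTMLParser

# Tags that should act as word boundaries when they are stripped.
BLOCK_TAGS = {"p", "div", "br", "li"}


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


def to_tab_line(headword, pinyin, html_body):
    """Build one tab-delimited line: headword, pinyin, plain-text definition."""
    fields = (headword, pinyin, html_to_text(html_body))
    # A stray tab inside a field would shift the columns, so replace it.
    return "\t".join(field.replace("\t", " ") for field in fields)


print(to_tab_line("你好", "nǐhǎo", "<div><b>привет</b>; <i>здравствуй(те)</i></div>"))
```

This throws away bold and italic formatting; keeping it would need a mapping from the HTML tags to whatever markup the import accepts.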

Shun
 

kasim

举人
> If you wish, you could also send me the DSL dictionary files by private message, and I'll see if I can extract data from them.


Thank you for sharing the Python package and offering to help with extracting data from the DSL dictionaries. I appreciate your willingness to assist!

I'm new to this forum and haven’t figured out how to send a private message yet; it seems that when I click on 'Start conversation' I'm just posting on your wall. To make things easier, I will provide the download page and some download links where you can access the dictionaries. Would that be okay?

If so, this is the download page: https://bkrs.info/p47
Download link (older version of the dictionary in DSL format): 大БКРС (58 MB, 3 parts)

Full Chinese-Russian BKRS database (up-to-date version): 大БКРС v240423 ; БРуКС v240423 ; Примеры v240423 (examples)

Also, have you used this Python package yourself before? Any insights on its performance or any issues to watch out for would be really helpful.

Thanks again for your help!
 

Shun

状元
Hi Kasim,

this looks like a nice open source dictionary. I've sent you a WeTransfer link by private message ("Start conversation" is private).

You're very welcome!
 

Shun

状元
Hi Kasim, hello @mikelove,

I tried to import the 230 MB text file with Markdown into an empty user dictionary, but unfortunately Pleco crashes as it processes the input file (before I could enter the import settings, with the 28 beta). I've sent Mike the detailed crash report and the input text file.

I could think of four possible causes for the crash:
  • The size of the input file overwhelms the format checker.
  • The pinyin uses tone marks instead of tone numbers.
  • The pinyin field is often empty, with an underscore "_" in the field.
  • I used the EAB1 character for newlines, and I've used Markdown to mark italic and boldface text. So this would be a mixture of the Pleco 3.2 and 4.0 formats. I tried replacing the EAB1 character with a backslash "\", which unfortunately didn't keep it from crashing.
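The last three points in the list above can be counted up front with a small scan of the import file. This is only a sketch: it assumes tab-delimited fields with the pinyin in the second column, and it takes "EAB1" to mean the private-use code point U+EAB1. The function name `scan_import_lines` and the sample entries are made up for illustration.

```python
PLECO_NEWLINE = "\uEAB1"  # assumption: "EAB1" refers to code point U+EAB1


def scan_import_lines(lines):
    """Count entries, "_" pinyin placeholders and U+EAB1 characters."""
    stats = {"entries": 0, "placeholder_pinyin": 0, "eab1_newlines": 0}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        stats["entries"] += 1
        fields = line.split("\t")
        # Assumption: the pinyin is the second tab-separated field.
        if len(fields) > 1 and fields[1].strip() == "_":
            stats["placeholder_pinyin"] += 1
        stats["eab1_newlines"] += line.count(PLECO_NEWLINE)
    return stats


# Illustrative sample, not real BKRS data.
sample = [
    "你好\tnǐhǎo\tпривет" + PLECO_NEWLINE + "здравствуй(те)\n",
    "的\t_\tслужебное слово\n",
    "\n",
]
print(scan_import_lines(sample))
```

Since it accepts any iterable of lines, an open file object can be passed in directly, so a 230 MB file does not have to be read into memory at once.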

Thanks, cheers,

Shun
 

mikelove

皇帝
Staff member
It's the first one, out-of-memory errors; we'll see if we can finagle a way to make a file like this usable, but 3+ million entries is a lot, and frankly, if we'd optimized around that number it would have cost us a great deal of performance among dictionaries of the smaller sizes we usually traffic in.
 

Shun

状元
I see, thanks! It worked after splitting the import file up into one-million-line pieces.
 