Can we make a super antonym/synonym dictionary from this?

catusf

举人
Hi guys
In the field of language processing and deep learning, people have been classifying words for years. I have come across this research, which breaks down ~70k words into categories.

Do you guys think we can convert the data in dict_synonym.txt, dict_antonym.txt, and dict_negative.txt into a synonym/antonym dictionary?

Catus
 

Attachments

  • 哈工大信息检索研究中心同义词词林扩展版说明 (1).pdf
    177.1 KB · Views: 10
  • 哈工大信息检索研究中心同义词词林扩展版说明.pdf
    102.2 KB · Views: 18
  • dict_antonym.txt
    360.6 KB · Views: 10
  • dict_negative.txt
    27.1 KB · Views: 11
  • dict_synonym.txt
    889.1 KB · Views: 130

Shun

状元
Hi Catus,

that's a nice idea. The data seems to be of high quality. In this test, I combined only the synonym data into a Pleco 3.2 list using a short Python script (see attachment). The list seems to work great, especially for words with a more specific meaning.
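For readers who want the general idea without opening the attachment, here is a minimal sketch of that kind of conversion. It is not the attached script. It assumes the synonym data has already been parsed into groups of words (the real layout of dict_synonym.txt may differ) and that a tab-separated line of headword, empty pinyin field, and definition is acceptable for import; both points are assumptions, and the function names are made up for illustration.

```python
from collections import defaultdict


def build_synonym_entries(groups):
    """Map every word to the other words that share a group with it."""
    entries = defaultdict(set)
    for group in groups:
        for word in group:
            entries[word].update(w for w in group if w != word)
    return entries


def to_import_lines(entries):
    """Render one line per headword: headword, empty pinyin field, definition.

    The tab-separated layout is an assumption about the import format.
    """
    lines = []
    for word in sorted(entries):
        synonyms = sorted(entries[word])
        if synonyms:
            lines.append(f"{word}\t\t{'; '.join(synonyms)}")
    return lines


# Hypothetical sample groups, not taken from dict_synonym.txt.
sample_groups = [["高兴", "开心", "快乐"], ["开心", "愉快"]]
sample_entries = build_synonym_entries(sample_groups)
sample_lines = to_import_lines(sample_entries)
```

A word that appears in several groups (开心 here) collects the synonyms from all of them, which is probably why words with a narrow meaning give the cleanest entries.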

I've found combining the antonym data with the synonym data to be trickier because the synonym of a word's antonym in most cases doesn't work as an antonym for the first word anymore. As an example, there's the antonym pair 人 and 神 (man and spirit). This makes sense, but if I add all the synonyms for 神 to the list of antonyms of 人, I get 神情 (expression, look) as an extra antonym, for example, whose meaning is of course too distant from 人. It didn't work the other way around, either (first creating the synonyms list and replacing the synonyms with their antonyms where available). Or did you think of a different way of adding antonyms? The original list only has 1:1 antonym pairs.
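The problem can be reproduced in a few lines. This is only a sketch of the naive expansion described above, not the code I actually ran: the 人/神 pair and the stray 神情 are from the example, while the function name and the rest of the synonym set for 神 are hypothetical stand-ins.

```python
def expand_antonyms(word, antonym_pairs, synonyms):
    """Naively extend a word's direct antonyms with the synonyms of each antonym."""
    # Antonym pairs are 1:1 and treated as symmetric.
    direct = {b for a, b in antonym_pairs if a == word}
    direct |= {a for a, b in antonym_pairs if b == word}

    expanded = set(direct)
    for antonym in direct:
        expanded |= synonyms.get(antonym, set())
    expanded.discard(word)
    return direct, expanded


# 人/神 and 神情 come from the example above; 神灵 is a hypothetical extra synonym.
antonym_pairs = [("人", "神")]
synonyms = {"神": {"神灵", "神情"}}

direct, expanded = expand_antonyms("人", antonym_pairs, synonyms)
noise = expanded - direct  # candidates that only arrived through the synonym step
```

Everything in `noise` was never listed as an antonym in the source data, so it would need some extra filter (or manual review) before it could be trusted.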

Here's the dictionary in action in the 3.99.1.31.2 beta:

IMG_8151.PNG

I attach the Python script and the output TXT and PQB files. (Remove the last ".zip" endings from the two split files and then unzip.)

Best, Shun
 

Attachments

  • catus_synonyms_plecodict_userdict.txt.zip
    849 KB · Views: 8
  • Generate Pleco synonym list.py.txt
    830 bytes · Views: 11
  • Catus Synonyms Dictionary 2.zip.001.zip
    1.8 MB · Views: 8
  • Catus Synonyms Dictionary 2.zip.002.zip
    1.6 MB · Views: 8

Shun

状元
Hi Catus,

your solution looks great to me. Nice work. I prefer it without the pinyin, but that's up to the user. Just as a side note to Mike: Catus' PQB didn't load correctly in the 31.2 beta; the entries came out empty:

IMG_8152.png

Cheers,

Shun
 

catusf

举人
Hi Shun

I will make a non-pinyin version.

Also, the negation parts are still very mechanical, but I don't know how to improve them. Any ideas?
 

Shun

状元
Hi Catus,

thanks! I don't think the negations can be improved, except perhaps if you used AI to group them. :)

Cheers, Shun
 