List of characters that have been simplified

daal

探花
Hi, does anyone have a list of characters where the traditional form differs from the simplified form? If not, does anyone know how to use Pleco to generate such a list? Thanks!
 

Shun

状元
Hi daal,

I didn't have one ready, but was happy to create one.

I used the word list of the newest CC-CEDICT file (about 121,600 Chinese words, so all the important Chinese characters should be represented) and obtained a list of 3,635 Traditional-Simplified character pairs where the Simplified and Traditional characters differ.

Because the Traditional and Simplified characters are often in an "n to 1" relationship, I added the list in both Traditional-Simplified and Simplified-Traditional order, with the second one sorted by the Simplified characters, so you can see the Simplified characters with multiple Traditional correspondences at a glance.

I attach the word list taken from CC-CEDICT and the two lists of 3,635 Simplified-Traditional character pairs in both orderings, as well as the Python source.
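
For anyone reading without the attachments, here is a minimal sketch of the kind of filter described above. It is not the attached Different_chars_filter.py, only an illustration. It assumes the usual CC-CEDICT line layout ("Traditional Simplified [pinyin] /gloss/", with comment lines starting with "#"), and the function names and sample lines are made up for the example.

```python
from collections import defaultdict


def differing_pairs(cedict_lines):
    """Collect (traditional, simplified) character pairs that differ.

    Assumes each entry line starts with the traditional form, a space,
    then the simplified form, as in CC-CEDICT.
    """
    pairs = set()
    for line in cedict_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue  # not a well-formed entry
        trad, simp = parts[0], parts[1]
        if len(trad) != len(simp):
            continue  # cannot align characters one to one
        for t, s in zip(trad, simp):
            if t != s:
                pairs.add((t, s))
    return pairs


def group_by_simplified(pairs):
    """Map each simplified character to its traditional counterparts."""
    grouped = defaultdict(list)
    # Sort by simplified character first, then by traditional character.
    for t, s in sorted(pairs, key=lambda p: (p[1], p[0])):
        grouped[s].append(t)
    return dict(grouped)


# Hypothetical sample lines in CC-CEDICT style.
sample = [
    "# CC-CEDICT style sample lines",
    "頭髮 头发 [tou2 fa4] /hair/",
    "發現 发现 [fa1 xian4] /to discover/",
    "中國 中国 [Zhong1 guo2] /China/",
    "你好 你好 [ni3 hao3] /hello/",
]

pairs = differing_pairs(sample)
by_simp = group_by_simplified(pairs)

for simp_char, trad_chars in by_simp.items():
    print(simp_char, "".join(trad_chars))
```

Grouping by the Simplified character is what makes the "n to 1" cases visible: in the sample, 发 ends up with both 發 and 髮 next to it.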

Thanks for the request, you're welcome & enjoy,

Shun
 

Attachments

  • Simplified_Traditional_different_chars_list_sorted_by_Simplified.txt
    28.6 KB · Views: 166
  • Traditional_Simplified_different_chars_list.txt
    28.6 KB · Views: 166
  • CC-CEDICT September 23 - Simplified different from Traditional - Words only.txt
    1.4 MB · Views: 156
  • Different_chars_filter.py.txt
    651 bytes · Views: 128

daal

探花
Shun, that's incredible! Thanks so much! I hope you can make some use of this list as well! Best wishes
 

Shun

状元
Hi daal,

You're welcome! Yeah, I also happen to be studying the Traditional characters to strengthen my general knowledge. It's good to see that there's a lot more regularity in them than I had thought, which also becomes clear from looking at these lists.

Best wishes, Shun
 

alex_hk90

状元
Looks like you already have a (probably better :)) answer, but here is a mapping of traditional to simplified characters I used previously to convert the (traditional) headwords for a Pleco user dictionary a few years back:
 