How can you SORT Chinese characters...single and multiple

Discussion in 'Chinese Language' started by Sy, Oct 3, 2015.

  1. sobriaebritas

    sobriaebritas 探花

    Just as an illustration of another aspect of the problem:

    ABC (Wenlin): entries with identical spelling (including tones) are arranged by order of frequency (per Xiandai Hanyu Pinlü Cidian and Zhongwen Shumianyu Pinlü Cidian):

    淹浸 ¹yānjìn
    烟禁 ²yānjìn
    严谨 ¹yánjǐn
    严紧 ²yánjǐn
    严禁 yánjìn*
    掩襟 yǎnjīn(r)
    演进 yǎnjìn
    演进到 yǎnjìndào

    Microsoft (Word, Excel, etc.): entries with identical spelling (including tones) are arranged by number of strokes:

    烟禁 ²yānjìn
    淹浸 ¹yānjìn
    严紧 ²yánjǐn
    严谨 ¹yánjǐn
    严禁 yánjìn*
    掩襟 yǎnjīn(r)
    演进 yǎnjìn
    演进到 yǎnjìndào
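
    The two orderings above differ only in the tie-breaker applied to entries with identical toned pinyin. The following is a minimal Python sketch of that idea, not a description of how Wenlin or Microsoft actually implement sorting. Everything in it is an assumption made for illustration: the `yan1jin4`-style keys are a simplified stand-in for real pinyin collation, the superscript homophone numbers are taken as frequency ranks, and the stroke counts were entered by hand for each character.

```python
# Each entry: (headword, toned-pinyin key, assumed frequency rank, per-character stroke counts).
# All values are hand-entered for this sketch.
ENTRIES = [
    ("烟禁", "yan1jin4", 2, (10, 13)),
    ("淹浸", "yan1jin4", 1, (11, 10)),
    ("严紧", "yan2jin3", 2, (7, 10)),
    ("严谨", "yan2jin3", 1, (7, 13)),
]


def by_frequency(entries):
    """Sort by pinyin first, then by frequency rank among homophones."""
    ordered = sorted(entries, key=lambda e: (e[1], e[2]))
    return [e[0] for e in ordered]


def by_strokes(entries):
    """Sort by pinyin first, then by stroke count, character by character."""
    ordered = sorted(entries, key=lambda e: (e[1], e[3]))
    return [e[0] for e in ordered]


print(by_frequency(ENTRIES))
print(by_strokes(ENTRIES))
```

    With these inputs, the first call reproduces the ABC order of the four homophones and the second reproduces the Microsoft order; the primary pinyin key is the same in both.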
     
  2. sobriaebritas

    sobriaebritas 探花

    Hi Sy,
    I beg to differ, unless you use "bigram" and "disyllabic word" as synonyms. But "bigram" and "disyllabic word" do not refer to the same thing. To put it another way, all disyllabic words are bigrams, but not all bigrams are disyllabic words. For instance, 我也 is a bigram (as I understand the term), but I wouldn't say it's a word.
     
    Last edited: Dec 3, 2015
  3. sobriaebritas

    sobriaebritas 探花

  4. sobriaebritas

    sobriaebritas 探花

    Do you mean something like this?


    事实
    事实层次
    事实动词
    事实婚姻
    事实俱在
    事实清单
    事实如此
    事实上
    事实上公司
    事实胜于雄辩
    事实问题
    事实修正
    事实意义
    事实昭彰
    事实真相
    澄清事实
    事实
    既成事实
    经验事实
    歪曲事实
    违反事实
    诬捏事实
    隐匿事实
    重要事实
    事实
    施事实词
    依事实宣告无罪
    以事实为根据
    -------------------------------
    and then the same with
    事变
    事畜
    事端
    事儿
    .....
    .....
    .....
     
    Last edited: Dec 4, 2015
  5. sobriaebritas

    sobriaebritas 探花

    Hi Sy,
    Do you know the pdf file I've attached to this message? Have you ever read it? I thought it might be of interest to you. It's already 30 years old, though.
    (The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects)
     


  6. Romanization or Pinyin is fine for Mandarin, but it creates problems for Cantonese or Japanese speakers...
    Moreover, spoken language changes. For now Pinyin is close to the pronunciation of standard Chinese, but some words are commonly pronounced in different ways. Luckily, the gap is nothing like the one between written and spoken English!
    Perhaps the problem cannot have a single solution for printed dictionaries, but for Pleco and digital dictionaries it is an opportunity!
     
  7. Sy

    Sy 进士

    image.jpeg
     
  8. 朱真明

    朱真明 进士

    At the end of the day, I really think that this is just grasping at straws. If I look up the word "realistic" in an English dictionary, I will first have to find the "R" section, then "Re", then "Rea", and finally the "Real" section, and then I will have to scan through all of these words:

    Real
    Real ale
    Real estate
    Re-align
    Realise
    Realism
    Realist

    before reaching "realistic". This order is not set and can vary from dictionary to dictionary depending on how many words each one contains. This scenario is not much different from that of 衣, 依, and 醫. That is, once you have reached the "yi 1" section you will have to scan through a list of varying 字 (characters) before reaching the one you are looking for. It's not really a big deal.
     
  9. Sy

    Sy 进士

    Reply to no. 28:
    If you are scanning, it is not a good ordering system.
    In an English dictionary, I don't have to scan. Thus, it is a good system.
    In a Chinese dictionary, I scan a long list and don't find the entry.
    Then I have to scan again before I realize that the character or term is NOT in the list.
    This causes loss of time and aggravation.
    If I were to design a new system, I would want to avoid this problem for both human and machine search.
     
  10. Sy

    Sy 进士

    I did not go to page 2, so I missed the posts.
    Now I have caught up.

    In the above lists in pinyin order, the Chinese characters stay together.
    I have seen pinyin-ordered lists in which the Chinese characters are separated.
    If one does a machine sort, one will probably see the Chinese characters separated.
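
    One possible reading of this "separation", sketched in Python: if words are sorted purely by the pinyin of the whole word, words that share a first character can be split apart by a homophone, whereas sorting by first syllable, then first character, then the remaining syllables keeps them together. The word list, the tone-numbered keys, and the use of the Unicode code point to tell apart first characters with the same reading are all assumptions made for this illustration.

```python
# (word, syllables with tone numbers); a small hand-picked sample.
WORDS = [
    ("事实", ["shi4", "shi2"]),
    ("市场", ["shi4", "chang3"]),
    ("事变", ["shi4", "bian4"]),
    ("事端", ["shi4", "duan1"]),
]


def whole_word_order(words):
    """Sort by the concatenated pinyin of the whole word."""
    ordered = sorted(words, key=lambda p: "".join(p[1]))
    return [word for word, _ in ordered]


def head_character_order(words):
    """Sort by first syllable, then first character, then remaining syllables."""
    ordered = sorted(
        words,
        key=lambda p: (p[1][0], p[0][0], "".join(p[1][1:])),
    )
    return [word for word, _ in ordered]


print(whole_word_order(WORDS))      # 市场 lands between the 事 words
print(head_character_order(WORDS))  # the 事 words stay adjacent
```

    Both orders are fully determined by their keys; they simply answer different questions about which entries should sit next to each other.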
     
  11. Sy

    Sy 进士


    TRUE,
    I AGREE.
     
  12. 朱真明

    朱真明 进士

    I'm pretty sure that in my comment I showed that you do scan in an English dictionary.
     
  13. Sy

    Sy 进士

    Reply to no. 24:
    This attachment is similar to the Commercial Press 《汉英词典》 (Chinese-English Dictionary), but without the antonym-entry format (反义词条格式).

    image.jpeg
     
  14. Sy

    Sy 进士


    Sobri.....
    I cannot read it now due to my ignorance.
    I wish it were shown directly.
    I will try to decode it later.
    I am sure it will be very interesting. Thanks.
     
  15. Sy

    Sy 进士

    真明: you make me think.
    Thanks so much.
    I admire how romanized languages sort.
    In the best language system, one should not have to scan.
    A word should be in a FIXED LOCATION in a dictionary.
    If one wants to waste time scanning, that is one's choice.
    I like automation. Manual mode is too slow.
    Scanning makes me dizzy (头晕眼花).
     
  16. 朱真明

    朱真明 进士

    A word or character is always in a fixed location inside a dictionary; what varies is how you go about identifying that location and which dictionary you are using. For example, the character 真 is on page 726 of 遠東拼音漢英辭典. Because this dictionary is ordered by pinyin, I can first look for "Z", then "Zh", then "Zhe", and finally "Zhen". At that point I would be at page 725, and from there I just need to scan through 貞, 珍, 針, and 砧 before reaching 真. Alternatively, I could go to the back of the dictionary and look up the word by stroke order or radical, which would then tell me the exact page number.

    Regardless of which method you use, the character 真 is still in a fixed location in this dictionary. You will not find it anywhere else. If I were to use another dictionary, of course it would be in a slightly different location. But this is the same in English. If you were to compare one word in two different dictionaries and check which words are listed before and after it, each dictionary would be different. This is because they contain different numbers of words, and sometimes conjugated and plural forms aren't taken into consideration. In order to find the word you will still have to go through the "pronunciation sections" method to find its fixed location. This isn't even considering accents, regional variants, alternate pronunciations and so forth.

    Natural language is subject to many variations due to numerous cultural influences. Automation is only suitable for logically consistent languages like mathematical language or programming language.
     
    Last edited: Dec 31, 2015
  17. Sy

    Sy 进士


    I just copied and pasted your reference to do a Google search.
    I found it and read the 20-plus pages.
    Mair emphasizes the pinyin approach, with a lot of historical background. 30 years later, pinyin still cannot solve the present problem. 王力…等 (Wang Li and others) wrote the article I posted here.
    They had an opposing view.
     
  18. feng

    feng 榜眼

    I don't understand your point. All the Japanese dictionaries for native speakers that I am aware of are arranged by kana (i.e. by pronunciation). A syllabary is just an alphabet with a different name due to linguists loving to name things ;)
    Aside from Cantonese's lack of standardization, what is the issue with using romanization for Cantonese (a language of which I am ignorant)?

    If one knows standard Mandarin and the basic rules of zhuyin, or of a given system of romanization, it is hard to spell things wrong. There are barely 400 syllables in actual use, not counting tones. What do you mean when saying that some words are commonly pronounced differently? You mean characters? Multi-character words? Taiwan vs PRC pronunciation? Could you give a couple of examples please?

    Sy: Love your posting style; even better with the paper still on the clipboard!
    Frankly, I think your fundamental question has been answered by more than one person on this thread. One cannot expect to go to a restaurant and get a meal one likes without perusing the menu, ordering, and then waiting for the food to be made. One cannot go to a library and get the right book without consulting the catalog and/or browsing the shelves.
    I agree with you that it makes no sense that PRC dictionaries ordered by Pinyin then inexplicably throw the characters in at random (or is there some logic?) under the same tone, rather than ordering them by stroke count which has been the practice of the last 400 years. Of course, "yi" is the most populous syllable in Hanyu pinyin, so that somewhat exaggerates the problem.
    Is your interest in 22,000 characters, more than three quarters of which practically no one has ever seen, theoretical or practical? In other words, what is it you want to do with these uncommon characters? Counting variants, there are well over 100,000 characters, but arguably fewer than 30,000 basic characters (i.e. non-variants), with only 5,000 or so of those known by educated people (I've been testing!), so what is your need for a lightning-fast, all-perfect lookup method for rare characters?


    "The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese" is not worth the time of day, IMHO. Neither is the dictionary it spawned :confused:
     
    Last edited: Jan 1, 2016
  19. Sy

    Sy 进士

    真明 and all,

    In English, if a dictionary has only 3 words, namely
    Cat
    Dog
    Pig
    then Dog is indexed between Cat and Pig, in its fixed position.
    Dog cannot come after Pig.
    In your example, 贞珍针砧真,
    anyone, including me, could index them as 真针珍贞砧.
    Thus, 真 has no fixed position.
    I wish I knew how to express it more clearly.
    Another thing: when you go to the back of the dictionary to use another system, you cause delay by introducing multiple systems for dictionary lookup.
    When I use the English system, I use only one system: alphabetic sort.
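
    Whether 真 has a fixed position among its homophones depends on whether a tie-breaking rule has been agreed on. As a hedged Python sketch: if homophones are ordered by stroke count, with the Unicode code point as a final tie-breaker, the output order is the same no matter how the input is arranged. The stroke counts (for the simplified forms) are hand-entered, and the choice of code point as the last key is an assumption for this example, not a rule taken from any particular dictionary.

```python
from itertools import permutations

# Hand-entered stroke counts for the simplified forms (illustrative data).
STROKES = {"贞": 6, "珍": 9, "针": 7, "砧": 10, "真": 10}


def sort_key(ch):
    """Stroke count first, then Unicode code point to break remaining ties."""
    return (STROKES[ch], ord(ch))


def fixed_order(chars):
    """Return the characters in the order defined by sort_key."""
    return sorted(chars, key=sort_key)


reference = fixed_order("贞珍针砧真")

# Every possible input arrangement yields the same output order.
stable = all(fixed_order(p) == reference for p in permutations("贞珍针砧真"))

print(reference)
print(stable)
```

    Because no two characters share the same key, the ordering is total, which is the property the Cat/Dog/Pig example relies on for the alphabet.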
     
  20. 朱真明

    朱真明 进士

    Maybe you should add the word "Realistic" into your three word list, it might clarify things a little better.

    What I'm getting at, is that you have created an unrealistic scenario that is incapable of honestly reflecting reality.

    If you are including all of the dictionaries in a language, then without a doubt no word ever has a fixed location. Its location is determined by the dictionary it is found in. I have already shown this, and if you own two English dictionaries then I encourage you to test out the method. Furthermore, not all English dictionaries are ordered alphabetically. Have you ever used a specialist dictionary before? Some of them are organised by category, which assumes that you are already familiar with that field of knowledge.

    Chinese is more flexible. It is not a phonetic or syllabary-based writing system, so if you don't know the pronunciation of a word there are other means by which you can look it up in a dictionary. You do not need to use multiple systems to find the word; you just need to use the appropriate one. Having multiple systems to look up words is therefore not a burden but actually a freedom.

    I honestly think that it has already been proven that you will not be able to eliminate scanning or searching for words in paperback dictionaries, if you have any evidence to the contrary then please present it.

    All in all, what you desire has already been achieved in electronic dictionaries. As we move into the electronic age and paper-books are slowly reduced in favour of electronic versions, dictionaries would probably be the first to go. Meaning this type of idealism is relatively pointless.

    Anyway, I did enjoy the conversation but still wonder, in your research have you got anything that can contribute towards the development of systems for organising Chinese characters?
     
