How can you SORT Chinese characters...single and multiple

Discussion in 'Chinese Language' started by Sy, Oct 3, 2015.

  1. Sy

    Sy 进士

    Feng and all,
    I post screenshots of my writing because it is easier for me. When I type on the iPad, it changes my spelling, for example from "in" to "I", and many more.
    I am not a fast typist.
    Another thing: when I type a reply to you, I have to scroll to the top of the page to read your post and then scroll back down to continue answering.
    With a screenshot, I can read your post on the iPad and write my reply on paper without scrolling up and down the page.
    I want to discuss this properly with you, which may take many postings. Likewise, I want to discuss with others the same way.
    I may not have time to do it in one shot or right away.
    In a pinyin arrangement, too many characters share the same sound: as you said, there are about 400 sounds for all those thousands of single characters.
    There are also homophonous words (同音词), which make it difficult for pinyin to become a written language in its own right.
    For now, pinyin is really only for phonetic annotation (注音).
    Till next time.
  2. feng

    feng 榜眼


    I really do like your posting style. I would do it myself, but I am too lazy.

    What I was trying to say in my last post when referring to "yi" is that since 1615 dictionaries have typically (and still do for Taiwan and those I have seen from Hong Kong) listed your three example characters as 衣 依 醫, that is by stroke count. The problem you point to is a PRC problem.
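
    The stroke-count ordering mentioned above can be sketched in a few lines of Python. This is only an illustration: the stroke counts are typed in by hand for the three example characters, and the name `sort_by_strokes` is made up for this sketch.

```python
# Stroke counts entered by hand for this illustration (not read from a database).
STROKES = {"衣": 6, "依": 8, "醫": 18}


def sort_by_strokes(chars):
    """Order characters by stroke count, breaking ties by code point."""
    return sorted(chars, key=lambda ch: (STROKES[ch], ord(ch)))


print(sort_by_strokes(["醫", "依", "衣"]))
```

    Whatever order the three characters start in, they come out as 衣 依 醫, the order quoted above.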

    As for your cat-dog-pig dictionary, see how many Americans (and likely other English speakers) can find voila or onomatopoeia in the dictionary. They know both words, but I bet you they can't spell them -- and in English you have no other recourse if you can't spell it (until Google came along).

    I am of the opinion that 朱真明 has answered your questions fully. As for my own attempts, you make me feel I am in a dialogue with myself. I commented on the putative Li Wang essay, I asked you questions about what you are trying to do . . . and I got little or nothing by way of direct response. I wish you all the best with whatever it is you are doing :)
  3. Sy

    Sy 进士

    [image.jpeg] I would like all members to read this first part of an article.
    Somehow the image is not clear here, but it is clearer in my photo album.
    I don't know why. I hope you can read it OK.

    I will reply to Feng's post later.
  4. Sy

    Sy 进士

    Here is another piece for those who like to read.
  5. Sy

    Sy 进士

    Last reference that may be of interest.

    To post full image
  6. Sy

    Sy 进士

    As I mentioned, I would come back to answer this part of Mr. Feng's post.

  7. Sy

    Sy 进士

    Feng and all:
    When a pinyin index runs into terms with the same sound, one of the following three methods is used to order them in a
    detailed index:
    1/ 札字法, which orders by stroke type: first horizontal, second vertical, third a slash
    to the left, fourth a slash to the right, fifth crooked or bent.
    2/ 江山千古,曲 法: uses the first stroke to index.
    3/ 寒来暑往,曲 法: same as above.
    Confused? There is no standard.

    Years ago, a friend working at the Library of Congress said the magic number of characters is about 22,000, enough to cover the indexing of books, personal names and geographic names.
    Now Unicode has about that number, including Japanese and Korean characters.
    I collected a list published by China that had about 21,000 characters; now they have increased it to 27,000. I stay with a maximum of 22,000, not for my own sake but as a reference for others. I like real-time computer speed; I don't need lightning speed.
    山西 and 陕西 have the same sound, differing only in tone, so their plain spellings are identical. The spelling of 陕西 was changed to make an exception.
  8. Abun

    Abun 探花

    For a print dictionary, I don't think it's possible to ever eliminate the need for some index, for a simple reason: there are two main scenarios where you would look something up in a Chinese dictionary: a) you know the pronunciation but not the character or meaning, or b) you know the character but not the pronunciation or meaning. (Of course there is the scenario where you know both pronunciation and character and just look for the meaning, but then you have the luxury of choosing either a or b for looking the word up.) In a good dictionary it should be possible to find a word in either of those scenarios. However, as far as I can see it's impossible to find an unambiguous way to order words which results in the same order for both lookup methods. So the compiler has to choose ordering either by pronunciation or by character shape. In both cases there would have to be an index to allow the user to look a word up with the respective other method. A hybrid method, such as ordering by pronunciation but ordering homophones by character shape, doesn't solve the problem, because you wouldn't be able to find the list of homophones without knowing the pronunciation of the character you're looking for.
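
    A minimal sketch of that point in Python, assuming a made-up five-entry dictionary: the body is ordered by pronunciation with homophones ordered by stroke count (the hybrid method), and a separate shape index is still needed to reach an entry when only the character is known. The entries, the toneless readings and the stroke counts are typed in by hand; `find_by_shape` is a hypothetical helper.

```python
# Hypothetical mini-dictionary: (character, toneless pinyin, stroke count).
# Readings and stroke counts are entered by hand for illustration only.
ENTRIES = [
    ("醫", "yi", 18),
    ("山", "shan", 3),
    ("衣", "yi", 6),
    ("依", "yi", 8),
    ("西", "xi", 6),
]

# Main body: ordered by pronunciation, homophones ordered by stroke count.
body = sorted(ENTRIES, key=lambda e: (e[1], e[2]))

# Shape index: stroke count -> list of (character, position in the body).
shape_index = {}
for position, (char, _pinyin, strokes) in enumerate(body):
    shape_index.setdefault(strokes, []).append((char, position))


def find_by_shape(char, strokes):
    """Look a character up by shape alone, going through the index."""
    for candidate, position in shape_index.get(strokes, []):
        if candidate == char:
            return body[position]
    return None


print([e[0] for e in body])
print(find_by_shape("西", 6))
```

    Without `shape_index`, a reader who knows only the shape of 西 has no way to tell that it sits between the "shan" and "yi" entries.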

    I also second the argument made before that it is impossible to completely eliminate the need to scan. I think there might have been a misunderstanding caused by different interpretations of the "fixed position" (as far as I understand the OP, s/he means "unambiguous" (relative to other words), while other contributors understood it as fixed in terms of absolute position).
    Even if the position of a word in a dictionary is unambiguous in the ordering system (as it is for example in English dictionaries ordered by spelling, apart from a few homographs), it is still improbable that I find it at first glance. More likely, I open the dictionary at the approximate location where I expect to find it according to the first one or two letters (e.g. towards the beginning for c-, towards the end for u-) and then have to compare with other words to know in which direction and by how much I was off. Unless I'm already very close, I will again take a guess at how many pages I am off and then repeat the process until I've found the right page. And when I do, I still have to compare with adjacent words to find the exact location. I know where to expect to find the word instead of having to scan at random, that's true, but I do have to scan.
    While an absolutely fixed position should in theory indeed be possible to find without scanning, it presupposes that I as the user know that absolute position beforehand. I'm afraid I fail to see how that could be possible.
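
    The page-flipping procedure described above is close to what a binary search does. As a rough sketch (the word list is synthetic and `lookup` is a name made up here), the following counts how many entries have to be compared before the target is found, even though every word has an unambiguous position:

```python
def lookup(sorted_words, target):
    """Binary search that also reports how many entries were compared."""
    low, high = 0, len(sorted_words) - 1
    comparisons = 0
    while low <= high:
        middle = (low + high) // 2
        comparisons += 1
        if sorted_words[middle] == target:
            return middle, comparisons
        if sorted_words[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return None, comparisons


# Synthetic 50,000-entry "dictionary"; the zero padding keeps it in sorted order.
words = sorted(f"word{n:05d}" for n in range(50_000))
position, comparisons = lookup(words, "word31415")
print(position, comparisons)
```

    For 50,000 entries the search never needs more than 16 comparisons, but it does need more than one: an unambiguous order cuts the scanning down, it does not remove it.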
  9. Sy

    Sy 进士

    Feng said: "If one knows standard Mandarin and the basic rules of zhuyin, or a given system of romanization, it is hard to spell things wrong. There are barely 400 syllables in actual use, not counting tones. What do you mean when saying that some words are commonly pronounced differently? You mean characters? Multi-character words? Taiwan vs PRC pronunciation? Could you give a couple of examples?"

    Feng and all,
    Sorry, I did not address the question in the second paragraph, as quoted above.

    The trouble is that Mandarin is not spoken by everyone; therefore, romanization cannot serve as a written language. For example, Cantonese speakers would not be able to read
    romanized (pinyin) writing. I agree with you that the 400 spellings are a plus: this cuts down on
    spelling mistakes. In Wang Li's essay, one of the items he mentioned was the problem
    of homophonous words (同音词). If two words or two multi-character terms have the same sound, one cannot tell which word is meant. With single characters, the problem is worse.
    I cannot easily give you many examples; however, I can look them up in my references.
    Off the top of my head, I can only give SHANXI, which can mean
    山西 or 陕西. So in this case, 陕西 was changed to be spelled SHAANXI.
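
    The SHANXI case can be reproduced with a small Python sketch. It assumes the standard readings shan1xi1 for 山西 and shan3xi1 for 陕西 (written with tone numbers); the third entry and the function names are only there for illustration.

```python
from collections import defaultdict

# Readings with tone numbers, entered by hand for illustration.
READINGS = {"山西": "shan1xi1", "陕西": "shan3xi1", "北京": "bei3jing1"}


def strip_tones(pinyin):
    """Drop the tone digits, as ordinary toneless romanization does."""
    return "".join(ch for ch in pinyin if not ch.isdigit())


def collisions(readings):
    """Group terms by toneless spelling and keep only the ambiguous groups."""
    groups = defaultdict(list)
    for term, pinyin in readings.items():
        groups[strip_tones(pinyin)].append(term)
    return {spelling: terms for spelling, terms in groups.items() if len(terms) > 1}


print(collisions(READINGS))
```

    With the tone digits kept, the two names stay apart; once they are dropped, both collapse to "shanxi", which is the ambiguity the SHAANXI spelling works around.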
    I did not mean to leave you hanging.
    I don't know whether I have replied to you fully; if not, please rephrase the question.
    We will continue the discussion.
  10. Sy

    Sy 进士

    Abun and All

  11. alex_hk90

    alex_hk90 状元

    This phenomenon is inevitable for any language (including both English and Chinese) with homographs (same spelling, multiple meanings) and/or homophones (same pronunciation, multiple meanings).
    To take an example similar to yours: for three words in English, namely set, run and one,
    there are dozens of different meanings and no consistent position across different dictionaries:
    set: Wiktionary, TheFreeDictionary, Merriam-Webster;
    run: Wiktionary, TheFreeDictionary, Merriam-Webster;
    one: Wiktionary, TheFreeDictionary, Merriam-Webster.
    So to find a particular meaning you still have to first look up by spelling, then by type of word, then by meaning.
    Yes it might not be as common in English as in Chinese, but it is still fairly common, and for fairly common words as well.
  12. Abun

    Abun 探花

    @Sy: So your idea is to use a code point system which includes phonetic, graphic and semantic information? Interesting idea; that way not only would different characters have an unambiguous code, but indeed each subentry would. But if the code includes 音 (sound), 形 (shape) and 義 (meaning), how would it be possible to find the desired entry if you know only one of the three (unless there are indices again, of course)? And how would you handle multi-character entries? Those have their own 義, but their 音 and 形 cannot be covered with the same encoding method as single-character entries (unless maybe you use only the 音 and 形 information of the first character). Or is what you envision more a 字典 (character dictionary) than a 辭典 (word dictionary)?
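
    To make the question concrete, here is a rough Python sketch of such a three-part code. Everything in it is hypothetical: the `Entry` layout, the field values and the `build_index` helper are invented for illustration and are not Sy's actual design.

```python
from collections import namedtuple

# Hypothetical entry code combining sound (音), shape (形) and meaning (義).
# The field values are illustrative, not a proposed standard.
Entry = namedtuple("Entry", ["sound", "shape", "meaning"])

ENTRIES = [
    Entry("yi1", "衣", "clothing"),
    Entry("yi1", "依", "to rely on"),
    Entry("yi1", "醫", "medicine; doctor"),
]


def build_index(entries, field):
    """Map each value of one field to the full entries that carry it."""
    index = {}
    for entry in entries:
        index.setdefault(getattr(entry, field), []).append(entry)
    return index


by_sound = build_index(ENTRIES, "sound")
by_shape = build_index(ENTRIES, "shape")

print(len(by_sound["yi1"]))
print(by_shape["醫"][0].meaning)
```

    Each full code is unique, but a reader who knows only one part still has to go through a per-field index such as `by_sound` or `by_shape`, which is the point of the question above.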
  13. Sy

    Sy 进士

  14. Sy

    Sy 进士

    One more thing: if I have the meaning in English in another column, I can pull out the Chinese form and sound.
    In an English-to-Chinese dictionary, there is no index problem.
    In a Chinese-to-English dictionary, I have to use another indexable language for assistance, like English here. Later, there may be a solution.
  15. Sy

    Sy 进士

    I went to the food flash cards and borrowed a list of edible mushrooms from Miguel (ref #3). This is a short list for illustration. Thank you, Miguel.

    草菇[草菇] cao3gu1
    春菇[春菇] chun1gu1
    刺芹菇[刺芹菇] ci4qin2gu1
    金针菇[金針菇] jin1zhen1gu1
    口蘑[口蘑] kou3mo2
    木耳[木耳] mu4er3
    平菇[平菇] ping2gu1
    食菌[食菌] shi2jun4
    松蕈[松蕈] song1xun4
    香菇[香菇] xiang1gu1
    银耳[銀耳] yin2er3
    猪苓[豬苓] zhu1ling2

  16. Sy

    Sy 进士

    In my ref #55 above, I struggled to put up this chart without grid lines.
    I manually edited the mushroom list and deleted some entries,
    leaving just the Chinese names and the pinyin with tones,
    then sorted on the pinyin. The result is shown above.
    I did this to illustrate the sorting of Chinese terms in pinyin order.
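
    The same sort can be sketched in Python. The rows below are a few of the entries from the mushroom list, deliberately shuffled; `by_pinyin` is a name made up for this sketch, and it relies on plain string comparison of the pinyin column, which is an assumption about how the original sort was done.

```python
# (simplified name, pinyin with tone numbers), taken from the mushroom list.
MUSHROOMS = [
    ("香菇", "xiang1gu1"),
    ("木耳", "mu4er3"),
    ("草菇", "cao3gu1"),
    ("银耳", "yin2er3"),
    ("春菇", "chun1gu1"),
    ("口蘑", "kou3mo2"),
]


def by_pinyin(rows):
    """Sort rows on the pinyin column using plain string comparison."""
    return sorted(rows, key=lambda row: row[1])


for name, pinyin in by_pinyin(MUSHROOMS):
    print(name, pinyin)
```

    Plain string comparison works letter by letter across syllable boundaries, so a dictionary that orders syllable by syllable may place a few terms differently.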
    One time, I went to a book sale. I wanted to order a magazine, but the seller could not find the magazine's
    name or the cost of a subscription.
    I keep thinking that if the seller had indexed a master list of magazines, he would not have had such a problem finding the information.
    I wish I knew an easier way to convert Chinese-character magazine names to pinyin in rows, as in Excel, so that I could sort a list of thousands of names. Please advise me of your steps. Many thanks.
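
    One possible route, sketched here in Python under stated assumptions: keep a lookup table from names to pinyin, add the pinyin as a second column, sort on it, and write the result as CSV text that a spreadsheet can open. The table `PINYIN` is a hypothetical hand-typed stand-in; for thousands of names it would have to come from a dictionary file or a conversion tool, which is not shown here.

```python
import csv
import io

# Hypothetical lookup table from names to pinyin; in practice this would come
# from a dictionary file or a conversion tool rather than being typed by hand.
PINYIN = {"香菇": "xiang1gu1", "草菇": "cao3gu1", "木耳": "mu4er3"}


def to_sorted_csv(names, table):
    """Return CSV text with a name column and a pinyin column, sorted by pinyin."""
    rows = sorted(((name, table.get(name, "")) for name in names),
                  key=lambda row: row[1])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "pinyin"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_sorted_csv(["香菇", "木耳", "草菇"], PINYIN))
```

    Names missing from the table get an empty pinyin cell and sort to the top, which makes them easy to spot and fill in by hand.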
  17. Sy

    Sy 进士

    Alex (ref 51) and all,
    In my previous posts I talked about "no fixed position".
    I repeat it here with an example.
    See the front section of the 新华字典, which uses radicals to find characters.
    On the page for the three-dot water radical,
    under 2 strokes:


    I can order the characters as 汇, 汁, 汀, 氿, 氾, 汉, 汈.
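
    One way a computer can force a single order on such a tied group is to break the tie with each character's Unicode code point. This is only a sketch of that idea, not something the 新华字典 does; `fixed_order` is a made-up name.

```python
# The seven characters from the example: water radical plus two more strokes.
CHARS = ["汇", "汁", "汀", "氿", "氾", "汉", "汈"]


def fixed_order(chars):
    """Break the tie with the Unicode code point, which is unique per character."""
    return sorted(chars, key=ord)


for ch in fixed_order(CHARS):
    print(ch, f"U+{ord(ch):04X}")
```

    The result is the same no matter how the input is shuffled, but the order follows the encoding rather than any rule a reader could work out from the character's shape.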
    Also please refer to ref no. 43, where 涂建国 says there is no fixed order (无定序), and so on.
    Also see Feng and 朱真明.
  18. 朱真明

    朱真明 进士

    I think everybody here already understands your issue, but you haven't addressed the counter-argument that this phenomenon you are trying to deal with is actually an inevitable one that cannot be dealt with.

    I made the claim that you will find this phenomenon in English as well as in every so-called "natural language". If you are familiar with linguistics, especially sociolinguistics and historical linguistics, you would know that they are different from computational linguistics. That is, natural language isn't entirely logical, due to its complicated history of cultural influences.

    My argument is (and I think "feng" "abun" and "alex" are in general agreement) that what you desire to achieve is logically unachievable and that you should probably just accept the idiosyncrasies of natural language.

    Are you willing to argue against the claims made? Do you have an argument that shows what you want is logically valid?

    I understand you probably created this discussion thread hoping for some practical advice on how to achieve what you want but I think if we cannot sort out the fundamentals of the issue then practical advice isn't going to be that practical.
    Last edited: Jan 17, 2016
  19. alex_hk90

    alex_hk90 状元

    I agree that you can order those characters in multiple ways - what I was trying to show is that you can order words in multiple ways in English as well, and hence I gave a few examples in a few dictionaries where the particular meanings of words had "no fixed position". For all intents and purposes, these are different 'words' with "no fixed positions". I don't think there are many, if any, languages where every word has a "fixed position" in a dictionary.

    Exactly. :)
  20. Sy

    Sy 进士

    In reality, I never thought of starting a thread to ask for advice.
    Right now I don't have a firm answer either. I am trying to design something that
    may answer some questions about the sorting problem. Please refer to the printed attachment
    I posted. People voiced their problems. These problems have existed for a long time; I did not make them up.
    It is really too complex to reveal the solution in a few postings.
    So I would rather do the easier sorting of pinyin terms, as I did in the mushroom
    post previously.
    I am trying to learn the flash card and OCR system here to see if I can find an easy
    way to compile the pinyin listing.
    When I have a firm solution, I shall be back on the character sort issue.
    I consider sorting characters to be like sorting a can of worms.
