Thanks.
The problem isn't unintentional duplicates; we can (and do) also check against other fields to confirm a match. We're actually worried about the opposite problem: not finding duplicates at all.
For the initial query of the AnkiDroid database to find *potential* matches (for which we can then fetch / compare whatever fields we like to confirm a match), we have to search on something that we can match exactly. That search happens through a mathematical hash, not a text match, so even just an extra space will screw it up. And that hash is based on the first field (and can't be made to use any other instead). So that's why the first field needs to be something that's relatively consistent / reliable, like characters or pinyin.
(Though even pinyin is sub-optimal, since the orthography can vary. The next beta will most likely generate a bunch of different hashes for different combinations of spacing / capitalization, to reduce the likelihood of a missed match.)
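To make that concrete, here's a rough sketch in Python of why an exact hash is so fragile and what generating several hashes could look like. This is not our actual code. It assumes the checksum is the first 8 hex digits of a SHA-1 over the first field's text, which is my understanding of how Anki does it, so treat that as an assumption. The names `field_checksum`, `pinyin_variants` and `candidate_checksums` are made up for illustration.

```python
import hashlib
from itertools import product


def field_checksum(first_field: str) -> int:
    """Assumed scheme: first 8 hex digits of SHA-1 over the UTF-8 text, as an int."""
    digest = hashlib.sha1(first_field.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def pinyin_variants(syllables):
    """Hypothetical helper: spacing / capitalization variants of one pinyin entry."""
    separators = ["", " "]                  # "nǐhǎo" vs "nǐ hǎo"
    casings = [str.lower, str.capitalize]   # "nǐ hǎo" vs "Nǐ hǎo"
    variants = []
    for sep, casing in product(separators, casings):
        text = casing(sep.join(syllables))
        if text not in variants:            # skip duplicates, keep order
            variants.append(text)
    return variants


def candidate_checksums(syllables):
    """Every checksum worth querying for when looking for potential matches."""
    return {field_checksum(v) for v in pinyin_variants(syllables)}


if __name__ == "__main__":
    # One extra space gives an unrelated hash, so an exact lookup misses the card.
    print(field_checksum("nǐ hǎo") == field_checksum("nǐ  hǎo"))
    for variant in pinyin_variants(["nǐ", "hǎo"]):
        print(variant, field_checksum(variant))
```

Anything that comes back from a lookup on those checksums is still only a *potential* match, so the other fields get compared afterwards as usual.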