Outlier Dictionary of Chinese Characters

Thanks for your patient and information-rich comments.

> re: "in that phonetic series have always been one of the main sources for the reconstruction of OC, maybe even the most important of all. " …

I see that this comment comes from a perspective that I did not explain and that you had no way of guessing. In many ways OC is a fairly straightforward projection back from MC with allowance for sound changes which merge and split forms over time. (I’m going to ignore the “dialect” issue and assume linear development.) I see most of the abundant evidence you rightly call out as pertaining to this straightforward projection. This includes many minor changes but also some major ones, including tonogenesis.

But one aspect of OC as reconstructed by Karlgren and others up through Baxter and Sagart and beyond is the positing of initial and final consonant clusters which are not present (and, from the POV of the reconstruction, have been reduced from their earlier forms) in MC and go beyond simple tonogenesis. These clusters are not part of the straightforward rule-based projection from MC back to OC (including tonogenesis as well as rules of limited scope). They are, in my mind, the one truly “interesting” feature of OC, and the one that dominates the way I think and talk about OC. And I believe that, within Chinese (as opposed to cognates and cross-language borrowings and writings), the main evidence for them is the phonetic series. Whence my hyperbolic statement.

> Re: "I therefore regard it as speculative in the specific sense of being not confirmable or disconfirmable to a reasonable degree of confidence. "

> I just saw a paper given at a conference about how well Baxter & Sagart 2014 vs. 鄭張尚芳 can explain rhyming phenomena in excavated texts; B&S did really well. They did have a problem with 侵部. Interestingly, Sagart gave a paper at the same conference on some geographical differences with OC 部, and 侵部 was the main one. I'm pretty sure (though I'd have to confer with the author of the BnS2014 vs. zzsf paper to be 100% sure) that some of the issues he came across were cleared up due to the geographical differences.

> So, to challenge your assertion, here is a case of excavated texts being used to confirm theories based on the 《詩經》、諧聲、通假、異體字 (all of which are independent from each other), and performing rather well. In places where it doesn't perform, it points out areas to fix. That is not circular.

> So, your characterization of OC is not accurate.

Agreed. I have great admiration for B and S as a team and individually. But, as I said above, I distinguish between the prefix/suffix (and esp. consonant cluster) aspect of OC and the straightforward projection aspect. I don’t know enough about early rhyming to know how it’s involved in the two aspects.

That said, I think the fundamental point of vulnerability for the modern OC models such as Baxter-Sagart is the lack of a fleshed-out model of how the language actually worked at a given point in time, particularly as regards morphemic and non/submorphemic prefixes and suffixes and associated connecting vowels. I think that the most effective way to address this weakness is to adduce examples of well-attested and even directly observable languages that actually work the way OC might have worked 2500-3500 years ago.

> Re: "what appear as syllables in MC were segmentable into optional prefixes, roots (always present), and optional suffixes; "

> I'm not sure what you mean. Are you saying that there was suffixation in OC? What "appear as syllables in MC" are syllables. As are the roots plus affixation in OC. I'm not sure what you mean by "appearing as syllables".

All I meant by ‘appearing’ was ‘had become’. And, yes, my understanding is that there was suffixation in OC, though it mostly produced tone alternations in MC (but also qu-ru alternations like those we’ve discussed).

> Re: "My question is, given this situation, why weren’t the prefixes and suffixes represented in the writing system? "

> There are examples of them being reflected in writing. There are cases like, I think it's in 《方言》, where they say that 筆 is pronounced 不律 in some region, which fits exactly BnS2014 "loosely attached" (不律) vs. "tightly attached" (筆) prefix types. I've found other examples where 注釋家 explain some character like:

> 無X,X也. So, "not X" = "X". The reason? The 無 is representing a sound, a loosely attached pre-initial. So, there are instances of your (a).

Thanks for the examples. But the challenge is to find spellings that seem to correspond directly to the prefix-root and root-suffix sequences reconstructed by Baxter and others for OC.

> I don't think your (b) is viable though. How would you know that a given component is representing affixation? I've never seen any marking on a character that would indicate some internal aspect of a character's pronunciation. In fact, native speakers of languages don't analyze word-internal grammar, which is what your (b) is. So, (b) is out, but there are examples of (a). Pre-Qin characters reflect syllables, but not parts of syllables.

> There are things like 合文, where two characters are written together as a single character, but they are usually marked with a = symbol, showing that it should be read as two characters. But having a component represent a prefix doesn't really match how things worked in pre-Qin scripts.

Vietnamese chu nom follows the basic rules of Chinese character composition but is way more dynamic, and includes what can only be described as character annotations. At least one or two of them were added as combining characters in Unicode 13.

> I'm familiar with GONG Xun, but haven't read that particular paper.

I think you’ll like it, esp. if you’re already familiar with Maspero and Haudricourt!


re: "Thanks for the examples. But the challenge is to find spellings that seem to correspond directly to the prefix-root and root-suffix sequences reconstructed by Baxter and others for OC. "
The examples I gave did match B-S prefix-root. Even the very traditionally minded Taiwanese paleographers that I presented those examples to sometime around 2008 said, 那就是表音的成分而已. I'm not sure why they said it like that, since it was my intention to say that they were expressing a sound. You seem to be assuming a uniform one syllable = one character, which is not necessarily a safe bet in the early script. Paleographers who work on or specialize in 甲骨文 and early 金文 seem to be in agreement, quite independently of phonologists, that it is probable that many early individual character forms represented multi-syllable words. They find it unlikely, for instance, that all animal names or plant names were monosyllabic.

Having said that, the paper I gave at the conference I mentioned earlier has to do with when Chinese characters began to be borrowed into the Korean peninsula. There are a ton of papers out there (seriously, you could write a PhD dissertation on this topic alone) on OC words being borrowed into Korean (actually, the language preceding Korean). 潘悟云 gives examples of words borrowed into Korean that retained the *-s post-coda, for instance. However, it would be best to systematically look at all of the claimed OC words borrowed into Korean, establish some kind of unified criteria for determining which ones are more reliable and which aren't, and then judge his examples in light of that. But, like I said, that's a huge project, and not directly within my area of research.

I also find it highly unlikely that there would ever be a reflection of post-codas in the script. Nor of tightly affixed prefixes. It doesn't make sense at all from a script point of view. It only makes sense if the affix is a different syllable or sesquisyllable, like the case with loosely attached prefixes. *-s or *-glottal stop don't fall into that category.

Chu nom is very different. If you look at pre-Qin scripts vs. clerical script, for instance, they are very, very different. So, it's not surprising that Chu nom would be different from early Chinese script, given that it's based on a much later version of the Chinese script, and that they didn't have 1500 years of tradition guiding what they did.

re: "These clusters are not part of the straightforward rule-based projection from MC back to OC"
That's not really how it works. Also, not what I meant. What I meant was that any proposed OC reconstruction has to be able to evolve into MC (which is the opposite direction to what you're saying). It's not the nature of OC reconstruction to just project a bunch of rules backward from MC. For instance, rhyming in the 《詩經》 has quite a few differences with MC rhyming. If you just "project back", your *OC is going to end up with a bunch of rhyme groups that don't match attested OC rhyming. You have to take all of the data for a given character/word into account. I suppose Schuessler's Minimal OC is more similar to what you're saying, though it can't be totally like that for the reasons stated above. It's not what Baxter & Sagart are saying. You'd also be ignoring any and all paleographic evidence if you just "project back" as well as most of the evidence that I listed earlier. Also, MC doesn't always give enough information on how to reconstruct an OC form, which is why B&S have things like "*(r)" meaning "not sure based on MC if there was an *r there".

I'll check out Gong Xun's paper, but it may be a while before I can get to it, given the huge stack of stuff on my plate at the moment.


A note before anybody chimes in: the change of the "system" icons from nice black boxed S's to regular old S's with [] brackets around them on iOS in this release is our fault, not Outlier's; we don't have a reliable way of showing that boxed S on Android at the moment (Android users already had a bracketed S) and it's not really feasible for us to maintain separate versions of the same dictionary for iOS and Android.

But also, we were going to have to change this anyway in 4.0 since we're now using boxed letters to break up parts of speech, so if anybody has any suggestions for an icon / symbol that you feel perfectly captures the essence of Outlier's system data, please feel free to chime in :)

Hi all,

We just released a big update to the dictionary yesterday!

This update adds about 380 new characters, plus 60+ new Expert entries, bringing the total to over 3000 characters with 250 Expert entries.

If you have the dictionary, you should get the update automatically, or you can go to Pleco's Menu > Add-ons > Updated. If you don't have the dictionary yet, you can get it in Pleco, or here: https://www.outlier-linguistics.com/products/outlier-dictionary-of-chinese-characters

Here's the list of new Expert entries for those interested:

Simplified: 一老适丂者包万合三上下而戎尔气帚鸟生大然户夺爻所升乌歌青何教孝非你学火灬山年智窃云五斗林犬辶麻於今燕受这飞知哥雨自可
Traditional: 一老丂者包合三上下而戎气這帚生氣大萬戶然爻爾所升歌青何教孝非號你適奪火灬山年學智云五斗林犬辶麻於麼今竊烏燕受飛鳥知哥雨自可雲出

There's some really interesting stuff in there. Here's a screenshot of the Expert entry for 智: