Outlier Dictionary of Chinese Characters

John Armstrong · Sep 17, 2021

Thanks for your patient and information-rich comments.

> re: "in that phonetic series have always been one of the main sources for the reconstruction of OC, maybe even the most important of all. " …

I see that this comment comes from a perspective that I did not explain and that you had no way of guessing. In many ways OC is a fairly straightforward projection back from MC with allowance for sound changes which merge and split forms over time. (I’m going to ignore the “dialect” issue and assume linear development.) I see most of the abundant evidence you rightly call out as pertaining to this straightforward projection. This includes many minor changes but also some major ones, including tonogenesis.

But one aspect of OC as reconstructed by Karlgren and others up through Baxter and Sagart and beyond is the positing of initial and final consonant clusters which are not present (and, from the POV of the reconstruction, have been reduced from their earlier forms) in MC and go beyond simple tonogenesis. These clusters are not part of the straightforward rule-based projection from MC back to OC (including tonogenesis as well as rules of limited scope). They are, in my mind, the one truly “interesting” feature of OC, and the one that dominates the way I think and talk about OC. And I believe that, within Chinese (as opposed to cognates and cross-language borrowings and writings), the main evidence for them is the phonetic series. Whence my hyperbolic statement.

> Re: "I therefore regard it as speculative in the specific sense of being not confirmable or disconfirmable to a reasonable degree of confidence. "

> I just saw a paper given at a conference about how well Baxter & Sagart 2014 vs. 鄭張尚芳 can explain rhyming phenomena in excavated texts; B&S did really well. They did have a problem with 侵部. Interestingly, Sagart gave a paper at the same conference on some geographical differences with OC部 and 侵部 was the main one. I'm pretty sure (though I'd have to confer with the author of the BnS2014 vs. zzsf paper to be 100% sure) that some of the issues he came across were cleared up due to the geographical differences.

> So, to challenge your assertion, here is a case of excavated texts being used to confirm theories based on the 《詩經》、諧聲、通假、異體字 (all of which are independent from each other), and performing rather well. In places it doesn't perform, it points out areas to fix. That is not circular.

> So, your characterization of OC is not accurate.

Agreed. I have great admiration for B and S as a team and individually. But, as I said above, I distinguish between the prefix/suffix (and esp., consonant cluster) aspect of OC and the straightforward projection aspect. I don’t know enough about early rhyming to know how they’re involved in the two aspects.

That said, I think the fundamental point of vulnerability for the modern OC models such as Baxter-Sagart is lack of a fleshed out model of how the language actually worked at a given point in time, particularly as regards morphemic and non/submorphemic prefixes and suffixes and associated connecting vowels. I think that the most effective way to address this weakness is to adduce examples of well attested and even directly observable languages that actually work the way OC might have worked 2500-3500 years ago.

> Re: "what appear as syllables in MC were segmentable into optional prefixes, roots (always present), and optional suffixes; "

> I'm not sure what you mean. Are you saying that there was suffixation in OC? What "appear as syllables in MC" are syllables. As are the roots plus affixation in OC. I'm not sure what you mean by "appearing as syllables".

All I meant by ‘appearing’ was ‘had become’. And, yes, my understanding is that there was suffixation in OC, though it mostly produced tone alternations in MC (but also qu-ru alternations like those we’ve discussed).

> Re: "My question is, given this situation, why weren’t the prefixes and suffixes represented in the writing system? "

> There are examples of them being reflected in writing. There are cases like, I think it's in 《方言》, where they say that 筆 is pronounced 不律 in some region, which fits exactly BnS2014 "loosely attached" (不律) vs. "tightly attached" (筆) prefix types. I've found other examples where 注釋家 explain some character like:

> 無X，X也. So, "not X" = "X". The reason? The 無 is representing a sound, a loosely attached pre-initial. So, there are instances of your (a).

Thanks for the examples. But the challenge is to find spellings that seem to correspond directly to the prefix-root and root-suffix sequences reconstructed by Baxter and others for OC.

> I don't think your (b) is viable though. How would you know that a given component is representing affixation? I've never seen any marking on a character that would indicate some internal aspect of a character's pronunciation. In fact, native speakers of languages don't analyze word-internal grammar, which is what your (b) is. So, (b) is out, but there are examples of (a). Pre-Qin characters reflect syllables, but not parts of syllables.

> There are things like 合文, where two characters are written together as a single character, but they are usually marked with a = symbol, showing that it should be read as two characters. But having a component represent a prefix doesn't really match how things worked in pre-Qin scripts.

Vietnamese chu nom follows the basic rules of Chinese character composition but is way more dynamic, and includes what can only be described as character annotations. At least one or two of them were added as combining characters in Unicode 13.

> I'm familiar with GONG Xun, but haven't read that particular paper.

I think you’ll like it, esp. if you’re already familiar with Maspero and Haudricourt!

Ash · Sep 17, 2021

re: "Thanks for the examples. But the challenge is to find spellings that seem to correspond directly to the prefix-root and root-suffix sequences reconstructed by Baxter and others for OC. "
The examples I gave did match B-S prefix-root. Even the very traditionally minded Taiwanese paleographers that I presented those examples to sometime around 2008 said, 那就是表音的成分而已. I'm not sure why they said it like that, since it was my intention to say that they were expressing a sound. You seem to be assuming a uniform one syllable = one character, which is not necessarily a safe bet in the early script. Paleographers that work on/specialize in 甲骨文 and early 金文 seem to be in agreement, quite independent from phonologists, that it is probable that many early individual character forms represented multi-syllable words. They find it unlikely, for instance, that all animal names or plant names were monosyllabic.

Having said that, the paper I gave at the conference I mentioned earlier has to do with when Chinese characters began to be borrowed into the Korean peninsula. There are a ton of papers out there (seriously, you could write a PhD dissertation on this topic alone) on OC words being borrowed into Korean (actually, the language preceding Korean). 潘悟云 gives examples of words borrowed into Korean that retained the *-s post-coda for instance. However, it would be best to systematically look at all of the claimed OC words being borrowed into Korean, have some kind of unified criteria for determining which ones are more reliable, and which aren't, and then judge his examples in light of that. But, like I said, that's a huge project, and not directly within my area of research.

I also find it highly unlikely that there would ever be a reflection of post-codas in the script. Nor of tightly affixed prefixes. It doesn't make sense at all from a script point of view. It only makes sense if the affix is a different syllable or sesquisyllable, like the case with loosely attached prefixes. *-s or *-glottal stop don't fall into that category.

Chu nom is very different. If you look at pre-Qin scripts vs. clerical script, for instance, they are very, very different. So, it's not surprising that Chu nom would be different from early Chinese script, given that it's based on a much later version of the Chinese script, and that they didn't have 1500 years of tradition guiding what they did.

re: "These clusters are not part of the straightforward rule-based projection from MC back to OC"
That's not really how it works. Also, not what I meant. What I meant was that any proposed OC reconstruction has to be able to evolve into MC (which is the opposite direction to what you're saying). It's not the nature of OC reconstruction to just project a bunch of rules backward from MC. For instance, rhyming in the 《詩經》 has quite a few differences with MC rhyming. If you just "project back", your *OC is going to end up with a bunch of rhyme groups that don't match attested OC rhyming. You have to take all of the data for a given character/word into account. I suppose Schuessler's Minimal OC is more similar to what you're saying, though it can't be totally like that for the reasons stated above. It's not what Baxter & Sagart are saying. You'd also be ignoring any and all paleographic evidence if you just "project back" as well as most of the evidence that I listed earlier. Also, MC doesn't always give enough information on how to reconstruct an OC form, which is why B&S have things like "*(r)" meaning "not sure based on MC if there was an *r there".

I'll check out Gong Xun's paper, but it may be a while before I can get to it, given the huge stack of stuff on my plate at the moment.

mikelove · Sep 20, 2021

A note before anybody chimes in: the change of the "system" icons from nice black boxed S's to regular old S's with [] brackets around them on iOS in this release is our fault, not Outlier's; we don't have a reliable way of showing that boxed S on Android at the moment (Android users already had a bracketed S) and it's not really feasible for us to maintain separate versions of the same dictionary for iOS and Android.

But also, we were going to have to change this anyway in 4.0 since we're now using boxed letters to break up parts of speech, so if anybody has any suggestions for an icon / symbol that you feel perfectly captures the essence of Outlier's system data, please feel free to chime in

John Renfroe · Sep 21, 2021

Hi all,

We just released a big update to the dictionary yesterday!

This update adds about 380 new characters, plus 60+ new Expert entries, bringing the total to over 3000 characters with 250 Expert entries.

If you have the dictionary, you should get the update automatically, or you can go to Pleco's Menu > Add-ons > Updated. If you don't have the dictionary yet, you can get it in Pleco, or here: https://www.outlier-linguistics.com/products/outlier-dictionary-of-chinese-characters

Here's the list of new Expert entries for those interested:

Simplified: 一老适丂者包万合三上下而戎尔气帚鸟生大然户夺爻所升乌歌青何教孝非你学火灬山年智窃云五斗林犬辶麻於今燕受这飞知哥雨自可
Traditional: 一老丂者包合三上下而戎气這帚生氣大萬戶然爻爾所升歌青何教孝非號你適奪火灬山年學智云五斗林犬辶麻於麼今竊烏燕受飛鳥知哥雨自可雲出

There's some really interesting stuff in there. Here's a screenshot of the Expert entry for 智:

jurgen85 · Oct 28, 2021

I'm trying to make sense of why some original meanings are so oddly specific. It feels like I'm missing some key historic context. Let's take for example 奪 "to seize a bird (sparrow) that s.o. is hiding in their clothing". Were there so many people hiding sparrows that they developed a word for seizing it? Were sparrows used in rituals? Was poaching forbidden? Is it just another case of language being weird and we don't really know?

Ash · Oct 28, 2021

Yeah, I'm not a big fan of that particular one either, but it's the explanation that best fits the most pieces of data. It could also be that the original meaning is "to seize" and that the character form is showing "to seize a bird (sparrow) that s.o. is hiding in their clothing" in order to represent "to seize" (though for that to work, people would still have had to understand the character form, meaning that there needs to be some kind of context for that as you mention). I'm not an expert on ancient ritual, but birds have been used as offerings in other cultures. There were indeed laws about who could hunt birds and who couldn't by the Han dynasty (if I remember correctly), though I don't know if that has anything to do with this situation. You have to also think people back then were familiar with carrier pigeons, hawking, etc., which have to do with control over birds.

One thing is very certain, and that is, the form of the character is a sparrow inside of clothing with a hand reaching in. I've looked at each different piece individually, and combinations of those pieces looking to see if there's a possible sound component, but there isn't. Usually, having a very specific original meaning is a bad sign, but in this case, it's the best available.

And, there definitely are cultural phenomena that are attached to a given time and place. There's something I call the Monkey Paw phenomenon. That is, when you go to some government office in a foreign country (something I've done a lot of), you get in line with your documents, thinking you have everything. Then, you get to the front of your line and step up to the counter, put your documents on the counter, the clerk looks them over, and then says, "Sir, where's your monkey paw?" And you think, "Who in the heck would think to bring a monkey paw to handle something like this?" And you turn around and see that the 30 people in line behind you all have monkey paws around their necks. Of course, the monkey paw could be anything, but that has happened to me a lot. The cause is the different assumptions people have, based upon their particular environment (the bonus lesson to learn here is, if you have to go to a government office in a foreign country, never wait until the last day in case you have to go home and get your monkey paw).

It could also mean, and maybe this is a better explanation, though I'd have to go review everything again to know for sure:
a hand putting a sparrow into a piece of clothing in order to capture/seize it.

John Renfroe · Nov 26, 2021

Quick heads up: our Black Friday sale is going all weekend—40% off anything in the store (dictionary, courses, and posters, for both Chinese and Japanese) with the discount code 'BFCM2021'. We're also in the process of re-filming the courses and adding a bunch of material to them, so we'll likely raise prices on the courses sometime next year, making the current price probably the lowest price they'll be available for again.

LeonardoM · Nov 27, 2021

John Renfroe said:
Quick heads up: our Black Friday sale is going all weekend—40% off anything in the store (dictionary, courses, and posters, for both Chinese and Japanese) with the discount code 'BFCM2021'. We're also in the process of re-filming the courses and adding a bunch of material to them, so we'll likely raise prices on the courses sometime next year, making the current price probably the lowest price they'll be available for again.

Sir, I doubt this is the right place for spam advertising. If every Chinese course posted here to advertise its product, what a mess it would be.

JD · Nov 27, 2021

@LeonardoM This thread’s topic is specific to the Outlier Dictionary that is distributed through Pleco. John Renfroe is one of the owners of Outlier. Pleco rarely has “sales” (only bundles), but if one wanted to purchase the Outlier dictionary for Pleco, John’s “advertisement” is letting you know it’s on sale.

mikelove · Nov 27, 2021

Yeah, in general we take an extremely dim view of advertising here - we don’t even allow *discussion* of competing apps because it’s impossible to draw a line between carefully written advertising and actual discussion - but Outlier is a rare exception since the thing they’re advertising is a Pleco add-on

LeonardoM · Dec 6, 2021

mikelove said:
Yeah, in general we take an extremely dim view of advertising here - we don’t even allow *discussion* of competing apps because it’s impossible to draw a line between carefully written advertising and actual discussion - but Outlier is a rare exception since the thing they’re advertising is a Pleco add-on

Fair enough. I wasn't aware.
I suppose I'm just fed up with ads, these days. My bad.

John Renfroe · Oct 13, 2022

Hi all,

We just released a big update to the dictionary!

This update adds ~180 new characters (Essentials entries), plus ~80 new Expert entries, bringing the total to about 3350 characters with 330 Expert entries!

Here's the list of new Expert entries for those interested (see below for an interesting sample entry):

Simplified: 东丰二些从体八其冉军分动卯卵去否咅囗围基天好如定实尌小少己已巳巾希微徵想旬期本束树正此毋毌母沙法犮理皆相着睘礼箸著衣袁豊贯走起车还进那部都里重长门隹面韦鼓
Traditional: 㐱丰二些从体八其冉分動卯卵去否咅囗圍基天好如定實尌小少己已巳巾希從微徵想旬期本束東樹正此毋毌母沙法犮理皆相着睘禮箸聯著衣袁豊豐貫走起車軍進還那部都里重長門隹面韋體鼓

Also, any time you want to check which characters have Expert entries in the dictionary, you can find a list on this page.

There's some really interesting stuff in there, so be sure to check out the new entries! Here's the expert entry for 微:

anhnha · Oct 13, 2022

I purchased this several years ago for expert as well and then realized that it doesn't help me much. I'm not gonna deny the great effort of the team to do a lot of research to do this. It is just that I don't need this kind of knowledge. I was hoping that it would help me to remember it better but most of them are too deep and relating to etymology which doesn't help me much. It would have been more useful for me if it was some good story (probably fake but okay) to help to remember the words better.

JD · Oct 14, 2022

anhnha said:
I purchased this several years ago for expert as well and then realized that it doesn't help me much. I'm not gonna deny the great effort of the team to do a lot of research to do this. It is just that I don't need this kind of knowledge. I was hoping that it would help me to remember it better but most of them are too deep and relating to etymology which doesn't help me much. It would have been more useful for me if it was some good story (probably fake but okay) to help to remember the words better.

A counter view: I purchased my Outlier bundle through the original Kickstarter campaign. I have thought since the beginning, both on the Kickstarter and through their current website, that it’s very clear that this is a scholarly work. It’s not at all meant to be a mnemonic device to help one remember words, but to give you clear scholarly meaning of the origin of the words. I actually find it invaluable to help me remember words, because I have an engineering brain that enjoys understanding the history and evolution of the words. I can immediately see why some people may not be interested in that view of words, but I find it intriguing and valuable.

anhnha · Oct 15, 2022

You're right. I understand that it takes a lot of work and research to find the etymology for a word. However, I think an add-on with radicals and a story to help remember it from radicals probably suits me better.

ACardiganAndAFrown · Oct 16, 2022

Is the Expert Edition still expected to reach 2000 characters?

Does that mean it is still less than 20% complete?

Kun-it · Nov 5, 2022

I've purchased and downloaded the expert edition, but the entry for 微 in my dictionary is not nearly so detailed as that shown in John Renfroe's post above.
Can anyone that has used the dictionary help?

mikelove · Nov 5, 2022

Check the "Add-ons" screen under "Installed" - is the Essentials version still installed too? It should delete automatically when you install Expert, but if it didn't then that might explain the lack of an Expert link in this entry.

Kun-it · Nov 5, 2022

mikelove said:
Check the "Add-ons" screen under "Installed" - is the Essentials version still installed too? It should delete automatically when you install Expert, but if it didn't then that might explain the lack of an Expert link in this entry.

Oh great, that worked. Even downloaded the Mini version just to be on the safe side.

I guess I must have re-downloaded it thinking that the download was for the Expert version.

Cheers!

sobriaebritas · Nov 12, 2022

ACardiganAndAFrown said:
Is the Expert Edition still expected to reach 2000 characters?

Does that mean it is still less than 20% complete?

I guess the answer to both questions is the same: yes.
