Hello Mike,
to give you a comprehensive answer, I have read some more about decompounding here:
Multilingual search: Create a language-specific lexicon for accurate decompounding using weak supervision, without sacrificing speed.
www.algolia.com
It explains: "Since most ecommerce search is keyword based, once we’ve removed the stop words für (for) and meine (my), breaking the compound Hundehütte into Hunde + hütte should provide similar results as when querying Hütte für meinen Hund."
So decompounding automatically searches for both "Hund" and "Hütte" instead of "Hundehütte" when I enter that compound word.
I understand that decompounding will try to split only non-lexical compounds, i.e. compounds that were formed ad hoc by the person who is searching. So "Fahrkarte" (= ticket in public transport) would not be split, because that compound is lexical and fixed, and the user would have no interest in getting search results for either "fahren" or "Karte".
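To make that concrete, here is a minimal sketch of how I picture it, not how Algolia or Pleco actually does it. The word lists, the `KEEP_WHOLE` exception list, and the function name `decompound` are all made up for illustration, and linking elements such as the "s" in "Kindheitserinnerungen" are ignored entirely.

```python
# Toy word list (an assumption); a real system would need a large,
# language-specific lexicon.
LEXICON = {"hunde", "hütte", "lese", "erfahrungen", "fahr", "karte"}

# Lexical (fixed) compounds that should never be split (hypothetical list).
KEEP_WHOLE = {"fahrkarte"}


def decompound(word, lexicon=LEXICON, keep_whole=KEEP_WHOLE, min_len=3):
    """Split a compound into two known words, or return it unchanged."""
    w = word.lower()
    if w in keep_whole:
        return [w]
    # Try every split point and accept the first one where both halves
    # are known words.
    for i in range(min_len, len(w) - min_len + 1):
        head, tail = w[:i], w[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [w]


print(decompound("Hundehütte"))  # ['hunde', 'hütte']
print(decompound("Fahrkarte"))   # ['fahrkarte']
```

Without the exception list, "Fahrkarte" would be split into "fahr" + "karte" here, which is exactly the kind of result I would not want.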
Perhaps decompounding matters most for e-commerce search engines, because there you want to catch and display every eligible product a user might want, rather than only matching the compound word itself. In Pleco, however, I suspect you'd get unwanted search results in too many cases. I'm not sure about that, though; it's hard to say without trying it out. A knowledgeable user would instinctively enter separate words instead of long compounds, because they know that Pleco may have a hard time separating them. Almost all of the compound words I tried to come up with are lexical; non-lexical compounds are quite rare in my everyday use. A few examples of lexical compounds:
Brieftasche
Taschentuch
Flaschenöffner
Kopfhörer
Bildschirmständer
Decompounding wouldn't do anything with these, which is correct. A few non-lexical, or at least less-lexical compound words:
Leseerfahrungen
Belüftungsschacht
Sanitärinstallation
Beatmungsapparat
Flaschenöffner-Herstellungsprozess / Flaschenöffnerherstellungsprozess (here, you'd be more likely to use the first version with a dash, because that makes clear where it combines two lexical compounds. So no more decompounding would be needed here, either.)
Kindheitserinnerungen
Nachtlektüre
I think the cases where decompounding would produce more useful search results than a plain search are few and far between. If it were implemented, I would definitely make it an opt-in feature.
What would seem more important and useful to me are Boolean searches, or simply multi-word full-text searches where each of the separate words can occur anywhere in the definition (right now, as we know, one can only search for an exact multi-word expression), so I could also enter the components of longer words if I wanted to. I remember you said a long time ago that Boolean searches were on the to-do list. I'm quite confident that's coming.
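Just to illustrate what I mean by a multi-word search where each word can occur anywhere, here is a rough sketch. It has nothing to do with Pleco's actual code; the sample entries and the names `matches_all_terms` and `search` are invented, and it uses plain substring matching, so "hut" would also match "shut".

```python
def matches_all_terms(query, definition):
    """True if every word of the query occurs somewhere in the definition."""
    terms = query.lower().split()
    if not terms:
        return False  # an empty query matches nothing
    text = definition.lower()
    return all(term in text for term in terms)


def search(query, entries):
    """Return the headwords whose definition contains all query words."""
    return [
        headword
        for headword, definition in entries.items()
        if matches_all_terms(query, definition)
    ]


# Made-up sample dictionary entries (headword -> definition).
entries = {
    "Hundehütte": "kennel; small hut for a dog",
    "Fahrkarte": "ticket for public transport",
}

print(search("dog hut", entries))  # ['Hundehütte']
```

An exact-phrase search for "dog hut" would find nothing in these entries, because the two words never appear next to each other in that order.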
On your remark that adjectives are common candidates for compounding, did you mean adjectives formed from nouns? So that if I search for "Wasser" (water), it will also find "wässrig" (watery)? In general, I don't see a lot of compounding happening with adjectives.
Hope this helps,
Shun