We're updating our full-text indexer to support a "don't ignore diacritical marks" option, so that in German, letters with umlauts are treated as distinct from their plain counterparts. While ä / ö / ü are fairly straightforward, we're a bit puzzled about what to do with ß.
Right now, we turn ß internally into 'ss', so that typing an ß also matches words spelled with 'ss' in its place, and vice versa. Our understanding, however, is that ß/ss does serve as a differentiator in some cases ('buße' versus 'busse', for example), so it might be worth adding an option to preserve that distinction. We would probably make it separate from the ä / ö / ü option: as we understand it, ß is not used at all in Switzerland, and German-speaking Swiss users would likely prefer not to have to remember whether a particular word takes an ß or an ss.
My question, though, is whether the ß/ss distinction is actually observed consistently enough in our dictionaries that it would be worth having as an option, or whether it's enough of a muddle that you're basically going to always want them merged anyway. Can anyone provide any insight on that (and on whether this option would be a good idea in general)?
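To make the two options concrete, here is a rough Python sketch of the folding we have in mind. It is not our actual indexer code: `fold_term` and the flags `keep_umlauts` and `keep_eszett` are made-up names for illustration, and it assumes that "ignoring" an umlaut means reducing it to its base letter (ä to a) rather than expanding it (ä to ae).

```python
import unicodedata


def fold_term(term: str, keep_umlauts: bool = False, keep_eszett: bool = False) -> str:
    """Normalize an index or query term (hypothetical helper, for illustration only)."""
    # Lowercase and compose first, so that 'a' + combining diaeresis becomes a single 'ä'.
    term = unicodedata.normalize("NFC", term.lower())
    out = []
    for ch in term:
        if ch == "ß":
            # Current behavior: merge ß with 'ss'. Proposed option: keep it distinct.
            out.append("ß" if keep_eszett else "ss")
        elif ch in "äöü" and keep_umlauts:
            # "Don't ignore diacritical marks": leave the umlaut letter untouched.
            out.append(ch)
        else:
            # Assumption: ignoring diacritics means dropping the combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


# Default: both distinctions are ignored, so these two terms collide.
print(fold_term("Buße") == fold_term("Busse"))
# With the separate ß option switched on, they no longer collide.
print(fold_term("Buße", keep_eszett=True) == fold_term("Busse", keep_eszett=True))
```

The same function would have to be applied to both the indexed text and the query, which is why the two flags are independent: a Swiss user could turn on `keep_umlauts` while leaving ß and ss merged.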