Still OCR enhancement suggestions

etm001

状元
Hi,

I spent some time today using the still OCR functionality to capture text from a variety of newspaper articles. Here are some suggestions/questions:
  • Allow me to append OCR'ed text to the "capture" screen. I really want to be able to OCR a block of text, check and/or correct it in the "capture" screen, jump back to the photo and OCR some more text, review/correct...ad infinitum, until I'm satisfied and I am ready to copy everything to the clipboard.
  • Resizing the OCR box in "block recognizer" mode works great and is easy to use. But resizing in "scroll capture text" mode is super frustrating, especially when I'm trying to align the recognizer box just right in order to capture text. In some cases no amount of resizing - whether resizing the OCR box or the photo - allowed me to get the OCR box exactly where I needed it. Could you make the resize controls in "scroll capture text" mode work the same as "block recognizer" mode?
  • Although the OCR will recognize punctuation, it doesn't appear as such on the screen. That is, for punctuation, there are no "green overlaid" characters that indicate successful recognition.
  • For Chinese text, the OCR does a good job of aligning the "green overlay" characters onto the OCR'ed characters themselves. But at least in my testing, the green overlay text for numbers was super small in comparison to the OCR'ed numbers themselves. So small that it was really, really difficult for me to see what numbers the OCR thought it had recognized without holding my iPhone right up to my face.
  • The OCR did recognize the decimal point in a number (e.g., "2.3萬元"), but in the "capture" screen it inserted a chunk of space after the period - I'm wondering if it thought the period indicated the end of a sentence? Maybe it was just a random fluke.
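If the extra space after the decimal point turns out to be systematic rather than a fluke, one workaround would be a small clean-up pass over the captured text. Here is a minimal sketch in Python, assuming the captured text is available as a plain string; `fix_decimal_spacing` is a hypothetical helper for illustration, not anything Pleco provides:

```python
import re

# Hypothetical helper (not a Pleco API): matches a period that sits between
# two digits and is followed by one or more whitespace characters.
_DECIMAL_GAP = re.compile(r"(?<=\d)\.\s+(?=\d)")


def fix_decimal_spacing(text: str) -> str:
    """Remove whitespace inserted between a decimal point and the next digit."""
    return _DECIMAL_GAP.sub(".", text)


if __name__ == "__main__":
    print(fix_decimal_spacing("2. 3萬元"))
```

The obvious trade-off is that a sentence ending in a digit and followed by another digit (e.g., "Page 2. 3 more") would be joined too, so this is only safe for text where that pattern is unlikely.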
Thanks!
 

mikelove

皇帝
Staff member
Reasonable ideas. On your second point, though - why are you using "scroll capture text" in that case? Why not simply use Block Recognizer?

Also, on your third point, any chance the punctuation might simply be too small to see?
 

etm001

状元
"On your second point, though - why are you using "scroll capture text" in that case? Why not simply use Block Recognizer?"
I would definitely use the block recognizer. I just wanted to pass along that I found resizing the OCR box in "scroll capture text" mode difficult.

"any chance the punctuation might simply be too small to see?"
I've attached a screenshot which I think illustrates the issue. Also, look at the English text and numbers - they are recognized, but the overlaid characters are really, really small (unlike the overlaid Chinese characters, which are properly sized and well aligned with the image).

Another thing I've been thinking about: what's the best workflow for correcting OCR mistakes? IMHO the best method is one where you can compare the image text and the OCR text on the same screen, select incorrect (or missing) OCR characters, then correct them. The first part of this functionality is already available in Pleco: you can view the OCR'ed and image text on the same screen (and even cooler, hide the OCR text and directly tap on characters in the image to display definitions). The only piece missing is functionality that would allow the end user to select/edit the OCR'ed characters. (Note: you can certainly edit the OCR'ed characters on the "capture" screen, but you can't view the OCR'ed and image text on that screen at the same time).

Another thought: what about visually indicating Pleco's confidence (i.e., score) for each OCR'ed character? (I'm assuming Pleco uses an internal score/probability value to select the "correct" character from a set of candidates.) In other words, characters that have a high score are green, medium scores are yellow, and low scores are red. (These scoring thresholds could even be user defined.) This kind of functionality would allow the end user to zero in on the characters that are most likely to need review/correction (perhaps not critical for small amounts of text, but really helpful when scanning larger amounts of text, e.g., multiple pages in a book).
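To make the color idea concrete, here is a rough sketch in Python. It assumes each character comes with a score normalized to the range 0 to 1, which is purely my assumption; I don't know what Pleco's internal scores actually look like. The names `Thresholds`, `confidence_color`, and `chars_to_review` are hypothetical, and the default cut-offs are arbitrary placeholders for the user-defined thresholds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """User-adjustable cut-offs (assumed scale: 0.0 to 1.0)."""
    high: float = 0.9  # at or above this: green
    low: float = 0.6   # below this: red; in between: yellow


def confidence_color(score: float, t: Thresholds = Thresholds()) -> str:
    """Map an assumed normalized score to an overlay color name."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= t.high:
        return "green"
    if score >= t.low:
        return "yellow"
    return "red"


def chars_to_review(chars, scores, t: Thresholds = Thresholds()):
    """Return (index, character) pairs whose color is not green."""
    return [
        (i, ch)
        for i, (ch, s) in enumerate(zip(chars, scores))
        if confidence_color(s, t) != "green"
    ]
```

With something like this, `chars_to_review` would give the list of positions worth jumping to first when checking a long capture.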
 

Attachments

  • photo1.jpg

mikelove

皇帝
Staff member
English text / numbers just aren't something we bothered to optimize for - certainly fixable, though.

Mistake correction is on our to-do list, both in terms of simultaneously editing the text / characters on the page and also in removing extraneous (or boundary-miscalculated) characters / adding new ones.

We do have scores, but they only compare character candidates for a particular spot - there's no absolute confidence rating, and the algorithm doesn't really lend itself to generating one.
 
I just bought the OCR Add-on last week after many years without it and it's already been worth the cost. I also like the idea of being able 'to append OCR'ed text to the "capture" screen'. I just looked up someone's name and then copied it to put it into my phone's contacts. I had to copy and paste three times, whereas if I had been able to append the 2nd and 3rd characters to the clipboard it would have been much faster. While off-topic, this would work great in non-OCR usage too. For example, after copying all three characters and going to the dictionary, I want to copy the Pinyin to put next to the name in contacts. It would be great if I could copy the first character's Pinyin and then append the 2nd and 3rd characters' Pinyin to the clipboard so I can paste it all into the contacts.
 

mikelove

皇帝
Staff member
Why not just stretch the recognition area to cover the entire name? It can certainly do more than one character at a time - if you enclose all three characters then you can copy that to the clipboard very easily.

Automatic contact creation from OCR would be an interesting feature, but honestly I'm not sure if it's worth doing it on our end when there are so many dedicated business card scanner apps (quite a few of which support Chinese) already in the app store.
 