Upcoming OCR enhancements?

etm001

Hi,

I'm just curious what enhancements, if any, are planned for OCR. I think I saw potential enhancements mentioned in another thread, but I don't recall the specifics.

One feature I'd like to see: better handling of text with inline Zhuyin. Right now the OCR scans the Zhuyin and turns it into gibberish. The only workaround is to manually draw the text recognition box so that it excludes the Zhuyin, but this is slow with multiple lines of text compared to the block recognizer.

In my experience, the most common scenario is Zhuyin displayed on the right-hand side of vertically oriented Chinese text. (I can't think of a time when I saw Zhuyin written above or below left-to-right text.) The only other scenario I can think of is Zhuyin written in between Chinese characters; I've only seen this in beginning-level textbook vocabulary, or perhaps in children's books. (I haven't tested how well the OCR handles "in-between" Zhuyin; perhaps it's not an issue.)

If updating the OCR to handle inline Zhuyin isn't realistic (and I admit the number of users who would want or need this is small), perhaps the OCR could let the user draw multiple text recognition boxes on a still image? I don't think this is possible now, but it would be easier and faster than re-drawing the recognition box over each line of text, as I do now.

And thinking about it some more, it would be great if I could draw multiple text recognition boxes on a still image, then save them as a "set" to reuse on subsequent still images. This would save time because within a given book the text layout is fixed across pages, so the spacing and size of the recognition boxes are the same on every page too (more or less; you'd need to tweak them a bit for each still image).
 

mikelove

Staff member
Our big focus for the next round of OCR is better text region detection, in part by taking advantage of the greatly enhanced processing capabilities of newer iPhones; 5S penetration is at a point where we can reasonably start to make some improvements exclusive to 64-bit processors, but even with the older 5/4S there's a lot of room to grow seeing as we originally designed our OCR to work well on a 3GS.

So the sort of things you're likely to see going forward are the system doing a better job of picking out text against complicated backgrounds / of oddly mixed sizes / etc, and in Live Mode probably also some improvements in the areas of jitter / motion detection. We're pretty happy with the actual recognition portion of the engine as is - works very well even with tiny or murky text and often even with neat handwriting; the reason for most OCR failures at the moment is that we're not doing a good enough job of picking out the characters for it to recognize.

More complicated still image layouts are also on our to-do list, though, since we recognize there are a lot of people using our system to digitize documents. So multiple boxes and templates make a lot of sense for that reason, as does making documents save their state in detail (corrected characters etc).
 

etm001

mikelove said:
So the sort of things you're likely to see going forward are the system doing a better job of picking out text against complicated backgrounds / of oddly mixed sizes / etc

This is great to hear, and should go a long way towards my specific use case as well as many others.

mikelove said:
we recognize there are a lot of people using our system to digitize documents. So multiple boxes and templates make a lot of sense for that reason, as does making documents save their state in detail (corrected characters etc).

Just so. Last year I bought a WorldPenScan BT for short scanning jobs, and to be honest, it never occurred to me to use Pleco's OCR. Overall, the WorldPenScan's OCR is quite good: about 99% accurate on cleanly formatted text, and it handles mixed-language text well too. That said, it's a bit pricey, and not the most effective option when you need to scan more than a page or so of text.

For whatever reason, it dawned on me recently that I could use Pleco OCR's still image recognition for longer scanning jobs, and in my testing it did a great job on cleanly formatted text. It was only when I tried to scan text with inline Zhuyin or border/background images that I ran into problems.

With current functionality it's a bit tedious to scan multiple pages in one session, so any future updates that help with this kind of "batch" document scanning would be awesome.

Thanks!
 

etm001

Here are some more wish list items:
  • Provide a workflow to facilitate batch OCR of documents/pages. As reference, see dedicated scanning apps like TurboScanner, etc.
  • Add OCR for the English alphabet and pinyin (based on my testing, OCR seems designed to capture only Chinese text).
  • Allow me to resize the OCR recognition box into a trapezoid, to better capture text on pages that are not perfectly flat (or, see next suggestion).
  • Improve the ability to recognize text on pages that are slightly curved - common when you are scanning text from a book that you can't open perfectly flat.
Regarding OCR for the English alphabet and pinyin: this would be tremendously helpful for students working with textbooks that provide English definitions and pinyin (I'd love to see Zhuyin included too, but I know the market for that is small). English/pinyin OCR would cover a huge amount of Mandarin Chinese learning material.

Finally, I've attached to this message a sample of Chinese text that OCR couldn't recognize. (Actually, there was a brief moment when some of the text was recognized, but when I tried to adjust the recognition box, the recognized characters disappeared. No amount of re-adjustment of the recognition box thereafter helped).
 

Attachments

  • Sample Text.jpg

JD

etm001 said:
Finally, I've attached to this message a sample of Chinese text that OCR couldn't recognize. (Actually, there was a brief moment when some of the text was recognized, but when I tried to adjust the recognition box, the recognized characters disappeared. No amount of re-adjustment of the recognition box thereafter helped).

Just as a test, I downloaded your image and pulled it up in Pleco's Still OCR and it appeared to recognize fine (on iPad Air), unless I'm not understanding your question.
 

Attachments

  • image.jpg

JD

When I first started trying to read Chinese, I thought it would be fun to read some simple comics. I found that the OCR had a real problem moving from one text location to another; the attached image shows an example. I have to manually resize and move the bounding box very precisely to enclose just what I want. Ideally, I could tap a group of characters and have Pleco snap the box around them. A less ideal alternative would be to move the bounding box by tapping the next location and having it snap there, after which I could resize it.
 

Attachments

  • image.jpg

etm001

JD said:
Just as a test, I downloaded your image and pulled it up in Pleco's Still OCR and it appeared to recognize fine (on iPad Air), unless I'm not understanding your question.

I don't have any explanation for why it would work on your iPad Air but not on my iPhone 5s. As I mentioned, there was a brief moment when the characters were recognized, but after I adjusted the recognition box, no characters were recognized. I took several photos, one with flash and one without, and had the same problem with both.

In the end I just used my pen scanner to scan the text.
 

gato

That's strange. It works alright on my iPhone 5s. See attached.
 

Attachments

  • image.jpg

etm001

I took another picture today of the same page.

The first pic shows no recognition of any characters; the recognition box is at its default size/location, i.e., the entire page is selected.

After more testing, I was able to recognize characters in the first three paragraphs (see second pic) by excluding the last paragraph from the recognizer box.
 

Attachments

  • Pic 2.PNG
  • Pic 1.PNG

JD

etm001 said:
I took another picture today of the same page.

The first pic shows no recognition of any characters; the recognition box is at its default size/location, i.e., the entire page is selected.

After more testing, I was able to recognize characters in the first three paragraphs (see second pic) by excluding the last paragraph from the recognizer box.

Hmmm...I tried it on your PIC1.png (pic of a pic of a scan) and the OCR worked well; even the top line where the green was came out really well.

Is there any chance you have a pending update to the OCR that never got installed? Can you delete the OCR component and reinstall it? Have you perhaps played with the OCR settings and tweaked them? (I'm always breaking something because I changed a setting and forgot to undo it!)
 

etm001

JD said:
Hmmm...I tried it on your PIC1.png (pic of a pic of a scan) and the OCR worked well except for the top line where the green was, and it still worked really well.

In all of my previous testing, I selected "Take picture" in Pleco OCR, then attempted character recognition. I tried one more time, but this time I loaded the picture from my camera roll, and recognition worked fine.

I don't have the source material with me at the moment, but I'll run some more tests: one using "Take picture" and one loading the photo from my camera roll, to see if there is a repeatable difference between the two.
 