iPad 3 / Still OCR

character

状元
Re: iPad 3

Mike, I don't know if you want to extract this OCR conversation and create a new thread or put it in a more appropriate thread so perhaps more people will weigh in on the issue.

mikelove said:
I'm not quite sure how it would hurt responsiveness - all we do is register with the system to detect two-tap gestures and do something differently when we actually do detect one of them. Nothing will change if there's only one finger on the screen.
OK, so there's no waiting done by the system to make sure a tap isn't really part of a two-finger tap, or that a pinch zoom isn't a two-finger tap? Great.

mikelove said:
The problem is that if we put it in the picture menu not enough people are going to find it (unless we can come up with a very obvious icon for "center box on screen," maybe a zooming box icon or something). Then again, the same could be said for a two-finger gesture, but once we're dealing with that barrier anyway it makes sense to make it as easy to invoke as possible for people who do discover it.
You can certainly put it somewhere else, but that particular idea of radioman's is quite good and worth making available. Personally I find many of Pleco's icons opaque as to meaning, so I wouldn't stress about this particular one. You might want to adopt the practice of some recent iPad apps and have tooltips appear for all the icons whenever the fan is pressed.

mikelove said:
No, I'm talking about one that's permanently positioned (say) 20 pixels above the top left corner of the box - you drag it around and the box moves with it.
OK, though that only helps when that area is visible, so if prioritizing I would go with the resize to view button first.

I really like the potential workflow the resize button allows: read what is recognized when the whole page is OCR'd, then zoom in on a problem area and hit the resize button to try scanning just that section without a lot of fiddling with corner handles (which are great for precision positioning).
 

mikelove

皇帝
Staff member
character said:
Mike, I don't know if you want to extract this OCR conversation and create a new thread or put it in a more appropriate thread so perhaps more people will weigh in on the issue.

90% of this thread is about OCR, so I just renamed it - easier, and it spares having to put some messages in one thread or the other when they might belong in both.

character said:
OK, so there's no waiting done by the system to make sure a tap isn't really part of a two-finger tap, or that a pinch zoom isn't a two-finger tap? Great.

No, there are a few places where you have to make UI work that way (separating taps from scroll gestures in certain situations, for example), but in a well-designed app you generally just back out / undo whatever the user was in the process of doing with one finger when you detect a second one; that's what we do in handwriting input now.
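
In UIKit terms, a minimal sketch of that "back out on the second finger" pattern might look like the following (hypothetical view and property names, not our actual handwriting code):

```swift
import UIKit

class StrokeCaptureView: UIView {
    private var provisionalStroke: [CGPoint] = []

    override init(frame: CGRect) {
        super.init(frame: frame)
        // Without this, UIKit only ever delivers the first touch to us.
        isMultipleTouchEnabled = true
    }

    required init?(coder: NSCoder) {
        super.init(coder: coder)
        isMultipleTouchEnabled = true
    }

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        if (event?.allTouches?.count ?? 0) > 1 {
            // A second finger landed: back out whatever the first finger
            // started so a pinch or two-finger tap can take over cleanly.
            provisionalStroke.removeAll()
            setNeedsDisplay()
            return
        }
        if let point = touches.first?.location(in: self) {
            provisionalStroke = [point] // begin the one-finger action provisionally
        }
    }

    override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
        guard (event?.allTouches?.count ?? 0) == 1,
              let point = touches.first?.location(in: self) else { return }
        provisionalStroke.append(point)
        setNeedsDisplay()
    }
}
```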

character said:
You can certainly put it somewhere else, but that particular idea of radioman's is quite good and worth making available. Personally I find many of Pleco's icons opaque as to meaning, so I wouldn't stress about this particular one. You might want to adopt the practice of some recent iPad apps and have tooltips appear for all the icons whenever the fan is pressed.

Tooltips on the fan would have been a great idea, but now that we're turning it from a slide-up toolbar into a multi-row popup "bubble" it's unfortunately going to be covering up half of the things it might otherwise give you tips on. (we could maybe add an additional tooltip overlay option in that menu, though)

character said:
OK, though that only helps when that area is visible, so if prioritizing I would go with the resize to view button first.

I see those as serving two different functions, actually - resize to view is more of the "summon" gesture, while the move cursor radioman suggests is useful when you're trying to keep the box the same size but move it around through a vocabulary list or some such.

character said:
I really like the potential workflow the resize button allows: read what is recognized when the whole page is OCR'd, then zoom in on a problem area and hit the resize button to try scanning just that section without a lot of fiddling with corner handles (which are great for precision positioning).

This is the main part of the discussion I'm not getting, honestly - how is this box going to give you more accurate results than a recognition area spread out over the entire document? It's really only through "precision positioning" that you're going to get better results - zooming in on a specific area on those occasions when the recognizer for whatever reason fails to recognize some portion of the entire page. And it's only through "precision positioning" that you're going to be able to capture, say, a specific part of the page that you want to save out the contents of.

So could you possibly explain in more detail when exactly you would find this "center on screen" button useful?
 

character

状元
mikelove said:
So could you possibly explain in more detail when exactly you would find this "center on screen" button useful?

Remember that we're talking about reading the document in the Block Recognizer; that's why we're asking for the shading inside the recognition area to be removed, etc. So resize to view is useful in that context: we are in the middle of reading the document with Hide Characters enabled. We've zoomed in on a particular part of the page, inside the bounds of the recognition area. While reading, we've panned several times down the page. We're in a state of flow. We know where we are in the story, but not on the page. We tap a character and nothing comes up, or something obviously wrong comes up. In that situation, which is better:

a) Tap picture, tap "center on screen", wait for recognition, tap character we wanted to look up, hopefully get back to reading without losing flow.

b) Zoom out to the point we can see the recognition box edges, grab a resizing handle, try to reorient ourselves to where we were in the document, start resizing one or more corners of the box, wait for recognition, zoom in approximately where we were before, try to remember which character we wanted to recognize, flow is long gone...

mikelove said:
This is the main part of the discussion I'm not getting, honestly - how is this box going to give you more accurate results than a recognition area spread out over the entire document? It's really only through "precision positioning" that you're going to get better results - zooming in on a specific area on those occasions when the recognizer for whatever reason fails to recognize some portion of the entire page. And it's only through "precision positioning" that you're going to be able to capture, say, a specific part of the page that you want to save out the contents of.

I have an image (two pages of a Chinese Breeze book) where the scan of characters printed on cheap paper comes through as a mix of black and (various shades of) gray pixels. Full-page recognition worked on almost all of it, but not quite: a yi1 "one/a" was recognized as '...', and the last line and the vocabulary footnotes were not recognized. If I resize the recognition area a bit, then at least the last line is recognized. So if I was zoomed in and reading, I feel confident that the recognition that would happen after "center on screen" would recognize that last line.

I'm not concerned with trying to capture a specific part of the page, I'm reading the document in the Block Recognizer. I don't care if the characters at the edges of the recognition area get misrecognized, I'm not reading them at the moment. ETA: If I wanted to capture a specific section, "center on screen" gives me a quick way to resize the box to roughly the size I want which I can then tweak without first zooming out to be able to manipulate the box.
 

mikelove

皇帝
Staff member
character said:
Remember that we're talking about reading the document in the Block Recognizer; that's why we're asking for the shading inside the recognition area to be removed, etc. So resize to view is useful in that context: we are in the middle of reading the document with Hide Characters enabled. We've zoomed in on a particular part of the page, inside the bounds of the recognition area. While reading, we've panned several times down the page. We're in a state of flow. We know where we are in the story, but not on the page.

Thanks for the clarification. It sounds, however, like this would be equally well solved by radioman's two-finger "summon" gesture - perhaps even better since you don't need to switch out of your normal scrolling / flicking / tapping behavior to get into it. But it definitely makes the case for doing something to make it easier to grab / resize the box while zoomed-in.

One other thing I wonder about is what happens after you've summoned the box - are you likely to then want to switch back to recognizing the whole page? That actually does argue for the button, since it could act as a sort of toggle between the full page and the currently visible area - might be a bit confusing if people didn't realize that they could also move around the handles directly, but it seems like it would make it easier to return to normal reading after resizing the box.
 

character

状元
mikelove said:
It sounds, however, like this would be equally well solved by radioman's two-finger "summon" gesture - perhaps even better since you don't need to switch out of your normal scrolling / flicking / tapping behavior to get into it.
True; I just worry about the detection for that slowing down more common operations, and about how easy the summon gesture would be at the edges of the 9.7" screen when holding the iPad with one hand. The button doesn't depend on having big hands or both hands free.

mikelove said:
One other thing I wonder about is what happens after you've summoned the box - are you likely to then want to switch back to recognizing the whole page? That actually does argue for the button, since it could act as a sort of toggle between the full page and the currently visible area - might be a bit confusing if people didn't realize that they could also move around the handles directly, but it seems like it would make it easier to return to normal reading after resizing the box.
Making it a toggle is interesting; it would be like some other buttons in Pleco then. I just figured it was so easy to zoom all the way out that a pinch gesture followed by the button press wasn't too bad a way to get back to full page recognition.
 

mikelove

皇帝
Staff member
character said:
True; I just worry about the detection for that slowing down more common operations, and about how easy the summon gesture would be at the edges of the 9.7" screen when holding the iPad with one hand. The button doesn't depend on having big hands or both hands free.

I'm confident the detection won't slow anything down, though the difficulty of doing this one-handed is certainly worth considering.

character said:
Making it a toggle is interesting; it would be like some other buttons in Pleco then. I just figured it was so easy to zoom all the way out that a pinch gesture followed by the button press wasn't too bad a way to get back to full page recognition.

Well my worry was that that would interrupt "flow" too - you're already zoomed in when you encounter a particular piece of text that the system misread, so wouldn't it be kind of annoying to have to zoom back out again in order to return to the previous recognition area setting?
 

character

状元
mikelove said:
I'm confident the detection won't slow anything down, though the difficulty of doing this one-handed is certainly worth considering.
So there wouldn't be a problem with having both the gesture and the button.

mikelove said:
Well my worry was that that would interrupt "flow" too - you're already zoomed in when you encounter a particular piece of text that the system misread, so wouldn't it be kind of annoying to have to zoom back out again in order to return to the previous recognition area setting?
You've sold me.

If you allowed the button to be put on the top level, then the only advantage the gesture has is that the box can be initially sized smaller than the screen. Whereas the button is much more easily discoverable, easy to use with one hand, and 'non-destructive' in that a second press returns to the previous recognition area.
 

mikelove

皇帝
Staff member
character said:
If you allowed the button to be put on the top level, then the only advantage the gesture has is that the box can be initially sized smaller than the screen. Whereas the button is much more easily discoverable, easy to use with one hand, and 'non-destructive' in that a second press returns to the previous recognition area.

The gesture is also easier to get to when you're in the mindset of scrolling around the screen tapping on things - moving your finger (and your attention) down to the button bar in that case could be a bit annoying, but tapping on the screen with two fingers is quite a lot more intuitive. (perhaps we could then make a 3-finger tap reset the box to occupy the entire screen)
 

character

状元
You just love to argue. :)

mikelove said:
The gesture is also easier to get to when you're in the mindset of scrolling around the screen tapping on things - moving your finger (and your attention) down to the button bar in that case could be a bit annoying, but tapping on the screen with two fingers is quite a lot more intuitive.
Personally I think it's the opposite. I'm interacting with the screen repeatedly with one finger (and just one hand). The two-finger gesture requires me to decide how big I want the box and try to position two fingers (possibly using two hands). With the one button press, the time required to find/hit the button is much smaller than the recognition time, whereas the time required to figure out and perform the two-finger gesture may equal the time to recognize the entire screen. If I have Hide Characters on (as is likely in this scenario) I might have to perform the two-finger gesture multiple times, or turn Hide Characters off, to figure out the optimal size for the recognition area. I think for all those reasons the two-finger gesture would pull me farther out of reading than pressing the button.

But you could include both the button and the gesture.

mikelove said:
(perhaps we could then make a 3-finger tap reset the box to occupy the entire screen)
But 3-finger tap would slow down recognition of the much more common 2-finger tap. :)
 

mikelove

皇帝
Staff member
character said:
Personally I think it's the opposite. I'm interacting with the screen repeatedly with one finger (and just one hand). The two-finger gesture requires me to decide how big I want the box and try to position two fingers (possibly using two hands). With the one button press, the time required to find/hit the button is much smaller than the recognition time, whereas the time required to figure out and perform the two-finger gesture may equal the time to recognize the entire screen. If I have Hide Characters on (as is likely in this scenario) I might have to perform the two-finger gesture multiple times, or turn Hide Characters off, to figure out the optimal size for the recognition area. I think for all those reasons the two-finger gesture would pull me farther out of reading than pressing the button.

But you could include both the button and the gesture.

As I'm increasingly inclined to do, yes :) (though I'm surprised radioman hasn't chimed in here since he was so big on the two-finger gesture in his previous posts)

character said:
But 3-finger tap would slow down recognition of the much more common 2-finger tap.

Actually no; if anything it's easier to back a two-finger tap out into a three-finger one than it is to back a one-finger tap out into a two-finger one. In both cases we're basically just waiting for there to be no fingers on the screen before we "commit" the current action, and each new finger brings different behavior: tap one finger -> highlight the character we're about to look up; tap a second finger -> clear the highlight and resize the recognition area to those two fingers; tap a third finger -> resize the recognition area to the entire image.
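
Sketched out (assuming UIKit touch handling with isMultipleTouchEnabled on; the hook methods at the bottom are hypothetical stand-ins for the real OCR actions, not actual Pleco API), the escalation looks something like this:

```swift
import UIKit

class RecognitionAreaView: UIView {
    private var maxFingersSeen = 0
    private var touchPoints: [CGPoint] = []

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        let allTouches = event?.allTouches ?? []
        maxFingersSeen = max(maxFingersSeen, allTouches.count)
        touchPoints = allTouches.map { $0.location(in: self) }
        switch maxFingersSeen {
        case 1: highlightCharacter(at: touchPoints[0]) // provisional lookup
        case 2: clearHighlight()                       // a second finger supersedes it
        default: break
        }
    }

    override func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
        // Only commit once no fingers remain on the screen.
        let stillDown = (event?.allTouches ?? []).filter {
            $0.phase != .ended && $0.phase != .cancelled
        }
        guard stillDown.isEmpty else { return }
        switch maxFingersSeen {
        case 1: lookUpHighlightedCharacter()
        case 2: resizeRecognitionArea(to: touchPoints) // box spans the two fingers
        case 3: resizeRecognitionAreaToEntireImage()
        default: break
        }
        maxFingersSeen = 0
    }

    // Hypothetical hooks standing in for the real OCR actions.
    private func highlightCharacter(at point: CGPoint) {}
    private func clearHighlight() {}
    private func lookUpHighlightedCharacter() {}
    private func resizeRecognitionArea(to points: [CGPoint]) {}
    private func resizeRecognitionAreaToEntireImage() {}
}
```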
 

furisas

Member
Hi, I am posting for the first time to this forum, and am not sure if my noob questions are appropriate here in this thread, but it is about the iPad 3 and OCR, so here goes.

My use case is this: I scan Chinese text into Evernote (the original textbook is cut up, but I am prepared to live with that) and then save each image as a photo on the iPad. I use Pleco OCR to block-recognize each image from the photo library. Then I find it very helpful to just tap unknown words to get at the meaning.

Trouble is, quite a few of the OCRed characters are plain wrong, and sometimes, for text that is printed on a fairly clouded or darker background, an entire block of text is not OCRed.

I tried re-scanning at the highest resolution (600 DPI for colour and 1200 for B&W), but it did not help. In fact, it got worse - more blocks of text are not recognised! What am I missing? What are the parameters for accurate Pleco OCR of every character in such a situation? The original text is the usual size.

Will upgrading to the iPad 3 solve the problem, as radioman said earlier? Is there anything else I can do, other than upgrading? Thanks.

Btw, I second the suggestions for the photo-strip solution to easily move back and forth when the teacher jumps about between several pages. Sorry to distract from the preceding discussion on one-two-three fingers.
 

mikelove

皇帝
Staff member
furisas said:
My use case is this: I scan Chinese text into Evernote (the original textbook is cut up, but I am prepared to live with that) and then save each image as a photo on the iPad. I use Pleco OCR to block-recognize each image from the photo library. Then I find it very helpful to just tap unknown words to get at the meaning.

Trouble is, quite a few of the OCRed characters are plain wrong, and sometimes, for text that is printed on a fairly clouded or darker background, an entire block of text is not OCRed.

I tried re-scanning at the highest resolution (600 DPI for colour and 1200 for B&W), but it did not help. In fact, it got worse - more blocks of text are not recognised! What am I missing? What are the parameters for accurate Pleco OCR of every character in such a situation? The original text is the usual size.

The system's not perfect - it sounds like there's simply too much interference around those characters for it to pick them up reliably, or like they're in a font that the system hasn't been programmed for. It may also be that the resolution is actually too high - try lowering it and see if that helps. You could also play around with the images in Photoshop or another image editor (try to make a batch script that will convert them to black and white with a threshold such that the background noise gets removed) and see if that helps.
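
If you'd rather script it than do it by hand in Photoshop, a rough Swift sketch of that kind of threshold pass might look like the following (the function name and the cutoff of 128 are just assumptions to tune per scan, not anything built into Pleco):

```swift
import CoreGraphics

// Render the scan into an 8-bit grayscale buffer, then snap every pixel to
// pure black or white so faint gray background noise drops out before OCR.
func thresholdToBlackAndWhite(_ image: CGImage, cutoff: UInt8 = 128) -> CGImage? {
    let width = image.width, height = image.height
    guard let context = CGContext(data: nil,
                                  width: width,
                                  height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: 0, // let CoreGraphics pick the stride
                                  space: CGColorSpaceCreateDeviceGray(),
                                  bitmapInfo: CGImageAlphaInfo.none.rawValue)
    else { return nil }
    context.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))
    guard let buffer = context.data else { return nil }
    let stride = context.bytesPerRow
    let pixels = buffer.bindMemory(to: UInt8.self, capacity: stride * height)
    for row in 0..<height {
        for col in 0..<width {
            let i = row * stride + col
            pixels[i] = pixels[i] < cutoff ? 0 : 255 // black below cutoff, else white
        }
    }
    return context.makeImage()
}
```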

furisas said:
Will upgrading to the iPad 3 solve the problem, as radioman said earlier? Is there anything else I can do, other than upgrading? Thanks.

Unfortunately no - about the only thing the iPad 3 helps with OCR-wise is taking pictures directly on-device, for which the iPad 3 would be significantly better than the old iPad on account of its autofocus lens. (however, ergonomically it's still not exactly ideal to try accurately aiming an iPad at things you want to take pictures of)

Also, have you been using the system in Block Recognizer mode or Scroll Lookup Words? The latter might work a bit more reliably for you, since it has a smaller / simpler thing to deal with at any given time.
 

furisas

Member
Great. Your suggestion to try Scroll Lookup Words works - it does OCR the blocks of text that were previously un-OCRed.

As for the iPad 3 contributing nothing to the OCR of still images that I have scanned, I am surprised that there is no benefit, and thus no incentive for me to upgrade to the iPad 3, other than the Retina display rendering of the text in the images.

Can you give an idea of when you might implement at least an initial "multiple page view" of several images?
 

mikelove

皇帝
Staff member
furisas said:
As for the iPad 3 contributing nothing to the OCR of still images that I have scanned, I am surprised that there is no benefit, and thus no incentive for me to upgrade to the iPad 3, other than the Retina display rendering of the text in the images.

Not really any benefit, no - OCR is CPU- rather than GPU-bound, so the faster graphics chip in the 3 won't matter at all, and the Retina Display makes everything clearer but doesn't really add any functionality.

furisas said:
Can you give an idea of when you might implement at least an initial "multiple page view" of several images?

Hopefully in the next major update, though we're not getting any more specific than "soon" regarding that at the moment :)
 

radioman

状元
mikelove said:
As I'm increasingly inclined to do, yes (though I'm surprised radioman hasn't chimed in here since he was so big on the two-finger gesture in his previous posts)

Sorry, just seeing this.

With regard to one-finger operation, maybe I'm missing something, but the way that I conceptualized this is:

1) Entry into OCR mode puts the box on the screen (like it does today).

2) If you want to resize the box, you can either move the corners (like you do currently, using one finger), or you have the new option to put down two fingers to define the box size.

3) Once the box size is set on the screen, it stays in that position until moved. If you move to another photo, the box is in the same screen position where you positioned it on the previous photo.

4) The only caveat is that I would want the ability to take a box already on the screen and, by pressing and holding the box (not on text, but anywhere there is no text), slide it with a single finger to another position on the screen (including scrolling down the picture); a rough sketch of this follows below. This is very useful when running lists of words (as an example) on some definition page, where the OCR does a poor job of interpreting text for whatever reason (e.g., text is read vertically when it should be horizontal, and switching vertical and horizontal still does not fix the problem). But there are other applications for this as well.
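
Something like this UIKit sketch could implement that press-and-hold drag, assuming a plain UIView stands in for the recognition box (the boxView name and the press duration are just placeholders, not how Pleco actually does it):

```swift
import UIKit

// Keep a strong reference to this controller for as long as the box is alive.
class BoxDragController: NSObject {
    private let boxView: UIView
    private var dragOffset = CGPoint.zero

    init(boxView: UIView) {
        self.boxView = boxView
        super.init()
        let press = UILongPressGestureRecognizer(target: self,
                                                 action: #selector(handlePress(_:)))
        press.minimumPressDuration = 0.3 // assumption: tune to taste
        boxView.addGestureRecognizer(press)
    }

    @objc private func handlePress(_ gesture: UILongPressGestureRecognizer) {
        guard let superview = boxView.superview else { return }
        let point = gesture.location(in: superview)
        switch gesture.state {
        case .began:
            // Remember where inside the box the finger landed so the box
            // doesn't jump to be centered under the finger.
            dragOffset = CGPoint(x: point.x - boxView.frame.origin.x,
                                 y: point.y - boxView.frame.origin.y)
        case .changed:
            // The long-press recognizer keeps tracking the finger, so the
            // box slides along with it.
            boxView.frame.origin = CGPoint(x: point.x - dragOffset.x,
                                           y: point.y - dragOffset.y)
        default:
            break
        }
    }
}
```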

As for iPad 3, hated it and returned it. I know this is subjective, but the main reason was battery life. You could just feel the power burning away, and battery recharge times take far too long. Yes, Hanzi is amazing on it, but my eyes are just not that good. And, one of the benefits to using these devices is you can zoom in, etc., so I found that small Hanzi just is not that big a deal - my eyes bail out way before the retina display advantage kicks in. As for pictures, OCR on photos taken from the iPad 2 is worthless, but I have an iPhone 4 and that works fine, and the pictures just go to Photo-stream and wind up in the iPad. Most other docs I bring in are scanned @ 300 or 600dpi. And now they have a new iPad 2 that uses a new chip that yields even longer battery life (http://goo.gl/huzqx), but last I checked, it was limited to just the Wi-Fi only 16GB model.
 

radioman

状元
character said:
I'm not concerned with trying to capture a specific part of the page, I'm reading the document in the Block Recognizer. I don't care if the characters at the edges of the recognition area get misrecognized, I'm not reading them at the moment. ETA: If I wanted to capture a specific section, "center on screen" gives me a quick way to resize the box to roughly the size I want which I can then tweak without first zooming out to be able to manipulate the box.

I can see where there could be applicability for this. My first instinct is that I don't need to be OCRing things that are not "visible", though I'm not sure what effect that would have on scrolling documents, etc. But the two-finger box method I talk about addresses the following important scenario, I believe.

You are reading a book or document on the iPad. You put it up on your big iPad screen, which in landscape mode can render a letter-size or A4-size page at basically actual size (you can't do this on 7-inch tablets...). On the page, the text of the first two paragraphs reads fine. In the 3rd paragraph, something on the left side screws up OCR for the text on the right side. Without having to re-zoom and take the page out of its original context (text with pictures), I simply highlight the area that has been "OCR-challenged" and move on.

And maybe there would be a "back to full page OCR" or "Previous OCR Position" button after you look at the problem area.

I believe this underscores the benefits of live OCR using the pages of the book, where, rather than looking at just text that has been cut from the page, everything is in the exact context the document author intended, with no zooming required.

And this "no need to zoom" scenario would certainly be desirable for things like PowerPoint presentations that have been exported as graphics.
 

mikelove

皇帝
Staff member
radioman said:
1) Entry into OCR mode puts the box on the screen (like it does today).

2) If you want to resize the box, you can either move the corners (like you do currently, using one finger), or you have the new option to put down two fingers to define the box size.

3) Once the box size is set on the screen, it stays in that position until moved. If you move to another photo, the box is in the same screen position where you positioned it on the previous photo.

4) The only caveat is that I would want the ability to take a box already on the screen and, by pressing and holding the box (not on text, but anywhere there is no text), slide it with a single finger to another position on the screen (including scrolling down the picture). This is very useful when running lists of words (as an example) on some definition page, where the OCR does a poor job of interpreting text for whatever reason (e.g., text is read vertically when it should be horizontal, and switching vertical and horizontal still does not fix the problem). But there are other applications for this as well.

All makes sense - we've been fiddling around with replacing the library's text detection algorithms with our own in an effort to (hopefully) reduce / eliminate the need to resize the box at all, but since that may not be working correctly for a year or more, some variation on this seems like a good move in the meantime. And I certainly hear you about keeping the position between pages, though we'd probably want to make that optional since some people might prefer the reset.

radioman said:
As for iPad 3, hated it and returned it. I know this is subjective, but the main reason was battery life. You could just feel the power burning away, and battery recharge times take far too long. Yes, Hanzi is amazing on it, but my eyes are just not that good. And, one of the benefits to using these devices is you can zoom in, etc., so I found that small Hanzi just is not that big a deal - my eyes bail out way before the retina display advantage kicks in. As for pictures, OCR on photos taken from the iPad 2 is worthless, but I have an iPhone 4 and that works fine, and the pictures just go to Photo-stream and wind up in the iPad. Most other docs I bring in are scanned @ 300 or 600dpi. And now they have a new iPad 2 that uses a new chip that yields even longer battery life (http://goo.gl/huzqx), but last I checked, it was limited to just the Wi-Fi only 16GB model.

Supposedly there'll be an iPad 4 in the not-too-distant future that's thinner and has better battery life - Apple had to release the 3 because the market was demanding a new iPad, but just as they hastily replaced the original iPad with the iPad 2 to fix its various hardware faults, they'll hastily replace the iPad 3 with the iPad 4 to deal with its own.
 

radioman

状元
mikelove said:
All makes sense - we've been fiddling around with replacing the library's text detection algorithms with our own in an effort to (hopefully) reduce / eliminate the need to resize the box at all.
Well, Pleco does a very good job now. Improving it would be great, but I figure there will always be something that will challenge the OCR. Including full grammatical context in the interpretation might help (not sure if that is currently being done), but I figure that and other complementary analyses can improve the accuracy. Maybe near-100% under most conditions is possible down the road.

mikelove said:
Supposedly there'll be an iPad 4 in the not-too-distant future...
I think they already did some things software-wise to improve the battery situation. But personally, I would be delighted if the iPad 4 was nothing more than Apple shoving the iPad 3 battery into the new iPad 2 hardware. 20-24 hour battery life would be lovely.

In the interim, it looks like it's time to review the virtues of the Nexus 7.
 

mikelove

皇帝
Staff member
radioman said:
Well, Pleco does a very good job now. Improving it would be great, but I figure there will always be something that will challenge the OCR. Including full grammatical context in the interpretation might help (not sure if that is currently being done), but I figure that and other complementary analyses can improve the accuracy. Maybe near-100% under most conditions is possible down the road.

That's something we wouldn't even attempt to hook into OCR until we were confident that it was detecting / isolating text blocks reliably, but it would indeed be a neat addition.

radioman said:
I think they already did some things software-wise to improve the battery situation. But personally, I would be delighted if the iPad 4 was nothing more than Apple shoving the iPad 3 battery into the new iPad 2 hardware. 20-24 hour battery life would be lovely.

Well a processor with lower power consumption would help for heat dissipation too - I suspect that processor will debut in the iPhone 5 and then appear in an iPad update soon after. Though they may start making these iPad updates a little quieter, like the occasional 6-month-later updates to the MacBook Pro - that would explain the "the new iPad" moniker for the 3.

radioman said:
In the interim, it looks like it's time to review the virtues of the Nexus 7.

I actually find the interaction on 7-inch tablets to be quite different; they're better for reading e-books because they're lighter and hence more comfortable to hold, but the screen is small enough that you still feel more like you're on a smartphone than a tablet; the toolbar-driven, multiple-panes-of-information "desktop" doesn't really work on them. The iPad is about the smallest possible device on which you can have something with equivalent complexity to a desktop app UI and not feel like the interface is cramped.
 

alanmd

探花
I think the reason for not using the name "iPad 3" is that the "new iPad" is not an unambiguous improvement over the iPad 2. It's a trade-off, with a few things sacrificed (charging time, battery life, heat, weight) to get the benefit of that amazing screen.

For someone whose sight isn't good enough to see the difference between the old and new screens (as was pointed out earlier), or who doesn't need the higher resolution (for fast-moving movies/games, maybe), there is no reason at all to get the new iPad over the iPad 2.

For reading PDFs of academic papers, however (which is what I spend a lot of my time on the iPad doing), which can't be reformatted to fit the screen like books in iBooks or Kindle, the new iPad's higher-resolution screen makes an enormous difference. On the old iPad I had to mess around zooming in and out, whereas with the new iPad I can read all the text, including equations and footnotes, and if I crop the borders off a letter-size page the display is roughly life size.

I spend much of the rest of my time using Pleco, and it looks wonderful, especially with all the font sizes reduced so the screen is filled with text in which even the most complex hanzi are still readable.
 