Still OCR won't recognize large PDF file

Shun

状元
Hi Mike,

I took photographs of a book, converted the resulting JPG files into a single PDF file using Acrobat Pro (the PDF file size equals the total size of the JPGs), and copied the PDF file to Pleco. Still OCR can recognize the individual JPG files, but strangely enough, when I open the PDF, the Chinese characters aren't tappable. I tried both the 甲 and the 乙 recognizers, but unfortunately neither of them worked. The PDF file size is about 143 MB. I doubt that Pleco needs to read the entire decoded PDF file into memory, so shouldn't Still OCR be able to handle PDFs of any size? I will send you a link to the PDF file by email.

Thanks, Shun
 

mikelove

皇帝
Staff member
Does it help if you downsample the JPGs? They're coming through as pretty enormous in this PDF file. We use slightly different methods to determine how much to downscale pages in PDFs versus standalone images before we render them, and it's possible we're not downscaling them enough in this particular PDF.
 

Shun

状元
Downscaling to a width of 900 pixels produced a 16 MB file, which worked but was a bit too blurry. So I'm resorting to OCR'ing the individual JPEGs for now.

Thanks!
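
For anyone wanting to try widths between 900 pixels and the original size, here is a rough sketch of the arithmetic involved. It only computes the scaled page dimensions and an estimate of the decoded (uncompressed) size of one page; it does not resize any files. The 4000x6000 photo size and the 3 bytes per pixel (plain RGB) figure are assumptions for illustration, and the function names are made up. How Pleco actually decides how much to downscale is not known from this thread.

```python
def scaled_size(width, height, target_width):
    """Scale (width, height) down to target_width, keeping the aspect ratio.

    Never upscales: if the image is already narrower, it is returned unchanged.
    """
    if target_width >= width:
        return width, height
    # Integer arithmetic with rounding to the nearest pixel.
    new_height = max(1, (height * target_width + width // 2) // width)
    return target_width, new_height


def decoded_megabytes(width, height, bytes_per_pixel=3):
    """Estimate the in-memory size of one decoded page, in MiB.

    bytes_per_pixel=3 assumes uncompressed 8-bit RGB (an assumption).
    """
    return width * height * bytes_per_pixel / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical original photo size; substitute your camera's resolution.
    original = (4000, 6000)
    for target in (900, 1200, 1600, 2000, 4000):
        w, h = scaled_size(original[0], original[1], target)
        print(f"{w} x {h}: about {decoded_megabytes(w, h):.1f} MiB decoded per page")
```

The JPEG file size on disk is much smaller than the decoded size, so a PDF that looks moderate on disk can still expand to very large pages once rendered. Trying a few intermediate widths may give a result that is sharp enough to read but still small enough to be recognized.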
 