Partially tappable PDFs in Reader

Shun

状元
Hi,

I'm attaching a PDF that isn't fully tappable in the Document Reader, even though I can copy its text using Preview without any loss of Chinese characters.

I created the PDF using the newest MS Word and the Mac OS built-in Print to PDF function.

Might there be an easy fix for this issue? If not, I'll just use the Word or text file formats. I don't own the full Adobe Acrobat app.

Thanks!
 


Shun

状元
Follow-up: If the only solution is converting the PDF to, say, a 300 dpi PNG file, which can contain multiple pages (when the PDF is exported in Preview), it would be a great time saver if all the pages could be recognized in one go by Pleco's OCR. The resulting text would contain the text of all pages concatenated together. Or, if the conversion is done in another application, and the PDF file is exported into many one-page PNGs, each numbered ...0001, ...0002, ...0003, and so on, Pleco's OCR could recognize this and do a full OCR of all pages.

Would such a capability be easy to add? You could try it out with any Chinese multi-page PDF. Thanks!


EDIT: Of course, if the PDF could be processed by the OCR engine directly (converting to bitmap internally), that would be the easiest solution. Many (Chinese) PDFs seem to be un-tappable.
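
The numbered-pages idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `recognize` callable stands in for whatever OCR engine is used (it is not a Pleco API), and the file names are assumed to end in a page number such as ...0001, ...0002.

```python
import re
from pathlib import Path
from typing import Callable, Iterable


def page_number(path: Path) -> int:
    """Return the trailing number in a file stem such as 'doc0003' (assumed naming)."""
    match = re.search(r"(\d+)$", path.stem)
    return int(match.group(1)) if match else 0


def ocr_pages(paths: Iterable[Path], recognize: Callable[[Path], str]) -> str:
    """Run the supplied OCR function on each page in page order and join the text."""
    ordered = sorted(paths, key=page_number)
    return "\n".join(recognize(p) for p in ordered)
```

Sorting on the parsed number rather than on the raw file name keeps the pages in order even if the exporting application does not zero-pad the numbers.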
 

mikelove

皇帝
Staff member
This is an issue with the Mac OS X PDF converter; we've had a few other reports of it now too. Basically, it's exporting some characters as Kangxi Radicals (which have different Unicode code points) instead of as the actual characters. The best fix at the moment is to generate your PDF some other way, or to use another format. Now that we know this is a problem, we'll probably add some code to automatically detect those and convert them to regular characters in our next major update, but that's a few months away.
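
As a stopgap for text copied out of an affected PDF, the detection and conversion described above could look roughly like this. It is a minimal sketch, not Pleco's code: it assumes the stray characters all fall in the Kangxi Radicals block (U+2F00 to U+2FD5) and relies on Unicode compatibility normalization (NFKC) to map each radical to its ordinary character. The function names are made up for illustration.

```python
import unicodedata

# Kangxi Radicals block: visually identical to ordinary characters,
# but encoded at different code points.
KANGXI_START, KANGXI_END = 0x2F00, 0x2FD5


def is_kangxi_radical(ch: str) -> bool:
    """True if the character lies in the Kangxi Radicals block."""
    return KANGXI_START <= ord(ch) <= KANGXI_END


def count_kangxi_radicals(text: str) -> int:
    """Count radical code points, e.g. to detect an affected document."""
    return sum(1 for ch in text if is_kangxi_radical(ch))


def fix_kangxi_radicals(text: str) -> str:
    """Replace each Kangxi radical with its regular character; leave the rest alone."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_kangxi_radical(ch) else ch
        for ch in text
    )
```

Normalizing only the characters inside the block, rather than the whole string, avoids side effects such as full-width punctuation being turned into its ASCII form.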
 

Shun

状元
Great, thanks! But strangely, a PDF my teacher converted from Word for Windows shows a similar partially-tappable behavior. Would you like me to E-mail it to you?
 

etm001

状元
Shun said: Great, thanks! But strangely, a PDF my teacher converted from Word for Windows shows a similar partially-tappable behavior. Would you like me to E-mail it to you?
I encountered this problem when exporting RTF files from TextEdit to PDF using the "Export as PDF" function. I filed feedback with Apple (it's not really a bug report, I guess), and it could only help if more people filed reports.
 