removing the paragraphing from a document with 4,807 of them

feng

榜眼
Every time you hit 'return' you make a little paragraph mark (usually invisible). I want to erase all paragraph instructions (not simply make the marks invisible) in a document that has 4,807 of them! At the moment I am using Pages, but if you can tell me how to do it in Word or something else, that could be useful to me as well.

In case you are wondering what I am talking about, the two links below contain the list of the 4,808 most common characters from Taiwan's Ministry of Education (old list; whether or not you think they are most common or whatever is not important to me), one as a PDF, one as a text file.
http://stroke-order.learningweb.moe.edu ... d/4808.zip
or
http://stroke-order.learningweb.moe.edu ... 808txt.rar
I've had no luck with this TXT file. It opens as nonsense characters in every application I open it with.

The problem is that when cutting and pasting from the PDF, which is originally in eleven columns (or when opening the TXT), the text ends up one character to a line, because after each character (except the last) there is a paragraph mark. Again, I am asking how to remove this instruction from the entire document, not just hide the paragraph symbol, as that would help me not at all. And no, I am not going to go through the document and delete each one individually. I want these 4,808 characters to all run together as one clump of text, as if they were an essay. Saves me printing and paper shuffling for the project I am working on.

Can anyone help?
 

tsr

秀才
Re: removing the paragraphing from a document with 4,807 of

feng said:
Every time you hit 'return' you make a little paragraph mark (usually invisible). I want to erase all paragraph instructions (not simply make the marks invisible) in a document that has 4,807 of them! At the moment I am using Pages, but if you can tell me how to do it in Word or something else, that could be useful to me as well.

[...]

Can anyone help?

I would use TextEdit. Do a find and replace. Find all the \n characters and leave the replace field blank. That is assuming the "return" characters are simply new line markers.
 

tsr

秀才
Re: removing the paragraphing from a document with 4,807 of

tsr said:
I would use TextEdit. Do a find and replace. Find all the \n characters and leave the replace field blank. That is assuming the "return" characters are simply new line markers.

I'm sorry, I told you the wrong thing. Use TextEdit. Press command-F. Then click the magnifying glass and choose "Insert Pattern". There you will find options for "Paragraph Break" and "Line Break". Click the "Replace" checkmark to enable the replace field. Leave it blank to delete whatever you put in the find box. Hope that helps.
 

feng

榜眼
Re: removing the paragraphing from a document with 4,807 of

Thanks, but it doesn't seem to do a thing :D
 

mikelove

皇帝
Staff member
Re: removing the paragraphing from a document with 4,807 of

feng said:
Thanks, but it doesn't seem to do a thing

TextWrangler has a built-in "Remove Line Breaks" command, I believe. (At the very least, it has a robust regular expression search that should definitely be able to remove those line breaks for you.)
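For anyone who would rather script it than use an editor, here is a minimal Python sketch of the same idea. It is not how TextWrangler does it; the function name `remove_line_breaks` is made up for illustration, and it assumes the pasted text is already in a Python string.

```python
import re

def remove_line_breaks(text: str) -> str:
    """Delete every line or paragraph break so the text runs together."""
    # Covers \r\n (Windows), \r (classic Mac), \n (Unix), and the Unicode
    # line/paragraph separators U+2028 and U+2029 that some apps paste in.
    return re.sub(r"\r\n|[\r\n\u2028\u2029]", "", text)

# One character per line, with mixed break styles, as after a PDF paste.
sample = "一\n二\r\n三\r四"
print(remove_line_breaks(sample))
```

To run it over a whole file, you would read the file into a string, pass it through the function, and write the result back out, using whatever text encoding the file actually has.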
 

feng

榜眼
Re: removing the paragraphing from a document with 4,807 of

Thanks, Mike! You're a genius. It worked perfectly :D
 

feng

榜眼
Re: removing the paragraphing from a document with 4,807 of

Now, supposing you had a document like this:
․ㄅㄚ ba 吧罷 ㄅㄚ bā 八巴叭扒吧芭疤笆 ㄅㄚˊ bá 拔跋鈸 ㄅㄚˇ bǎ 把靶 ㄅㄚˋ bà 伯把爸罷霸壩

This has five different Hanyu pinyin sounds. If you had 1180 in the document, is there a way to tell the computer to get rid of all the roman characters (and their associated diacritic marks) in one fell swoop? I mean, make it look like this:
․ㄅㄚ 吧罷 ㄅㄚ 八巴叭扒吧芭疤笆 ㄅㄚˊ 拔跋鈸 ㄅㄚˇ 把靶 ㄅㄚˋ 伯把爸罷霸壩
That would mean saving both the characters and the 注音符號。
 

mikelove

皇帝
Staff member
Re: removing the paragraphing from a document with 4,807 of

feng said:
Now, supposing you had a document like this:
․ㄅㄚ ba 吧罷 ㄅㄚ bā 八巴叭扒吧芭疤笆 ㄅㄚˊ bá 拔跋鈸 ㄅㄚˇ bǎ 把靶 ㄅㄚˋ bà 伯把爸罷霸壩

This has five different Hanyu pinyin sounds. If you had 1180 in the document, is there a way to tell the computer to get rid of all the roman characters (and their associated diacritic marks) in one fell swoop? I mean, make it look like this:
․ㄅㄚ 吧罷 ㄅㄚ 八巴叭扒吧芭疤笆 ㄅㄚˊ 拔跋鈸 ㄅㄚˇ 把靶 ㄅㄚˋ 伯把爸罷霸壩
That would mean saving both the characters and the 注音符號。

Do a regular expression replace, searching for:

[A-Za-z\x{00c0}-\x{01df}]

And replacing it with nothing - that should get rid of all of them.
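A rough Python equivalent, in case it is useful outside a grep-capable editor. This is only a sketch: the helper name `strip_pinyin` is hypothetical, the character class is the one above rewritten in Python's `\uXXXX` notation, and the extra step that squeezes leftover double spaces is my own addition, assuming the syllables are separated by ordinary spaces and the tone-marked vowels are single precomposed characters.

```python
import re

# ASCII letters plus U+00C0..U+01DF, which includes the precomposed
# tone-marked pinyin vowels (for example U+0101, U+00E1, U+01CE, U+00E0).
LATIN = re.compile(r"[A-Za-z\u00c0-\u01df]+")

def strip_pinyin(text: str) -> str:
    """Remove roman letters and their diacritics, keeping zhuyin and hanzi."""
    without_latin = LATIN.sub("", text)
    # Deleting a syllable that sat between two spaces leaves a double
    # space behind; collapse any run of spaces to a single one.
    return re.sub(r" {2,}", " ", without_latin)

line = "ㄅㄚ ba 吧罷 ㄅㄚˊ bá 拔跋鈸"
print(strip_pinyin(line))
```

The zhuyin tone marks (ˊ ˇ ˋ) sit at U+02C7 to U+02CB, outside the range being deleted, so they survive along with the characters.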
 

feng

榜眼
Re: removing the paragraphing from a document with 4,807 of

Thanks, Mike, but ...

Using the 'find' feature? Didn't find anything. The thing that also causes a problem is the diacritics. If I search for "ba", that gives me "ba", not ba with any of the four tone marks over it. In spell check, with your equation it finds second and fourth tones, but doesn't find first or third tone syllables, but spell check can't delete anything.

I'm just being lazy, but that's why computers were invented :mrgreen:
 

mikelove

皇帝
Staff member
Re: removing the paragraphing from a document with 4,807 of

feng said:
Using the 'find' feature? Didn't find anything. The thing that also causes a problem is the diacritics. If I search for "ba", that gives me "ba", not ba with any of the four tone marks over it. In spell check, with your equation it finds second and fourth tones, but doesn't find first or third tone syllables, but spell check can't delete anything.

Did you turn on grep / regular expression matching? It won't work without that.
 

feng

榜眼
Re: removing the paragraphing from a document with 4,807 of

Thank you! I want to change the title of this thread to "Why Mike is a genius".
 