Discussion in 'Future Products' started by Peter, Mar 5, 2015.
Good idea, thanks.
I’m not sure if this is supported already and I just didn’t find it:
How about an option to strip an entry of all HTML (barring entities) when exporting it to Anki as a flashcard? I have my own card format, so I find myself having to delete a bunch of HTML from every card I export from Pleco. Not that I wouldn’t have to edit them a bit anyway (putting the word class in a separate field, adding examples, etc.), but stripping the HTML sounds like a relatively simple modification which could save me a few very mechanical steps.
Not too tricky, but which formatting are you trying to remove? Unless you've turned on the option to format cards like dictionary entries, there shouldn't be any formatting anywhere except the definition field, and most of that is of a sort that seems unlikely to conflict with your format (e.g. bold tags around sub-definition numbers, gray part-of-speech labels). Is that nonetheless something you'd prefer to have in plaintext?
I have the formatting as Pleco headers off for all fields if that’s what you mean.
When I export a card (from, say, ABC), I get: a <div> element for alignment around the whole thing, multiple <b> elements (around the word class, the numbering for meanings, etc.), <span> elements which adjust the font size of word class labels, as well as a <plecoentry /> tag at the end whose purpose I’m not quite sure of (just an ID maybe?). I’m pretty sure I also saw <br /> tags in other dictionaries (although that may be limited to custom ones, not sure…). Of course, they don’t conflict in the technical sense of the word, but they don’t fit the formatting I use on my flashcards. So it’s mostly a question of consistency (although I imagine that with very long entries, the line breaks, whether caused by <div>’s or <br />’s, could make a card annoying to use because it’s too long vertically). My suggestion would be to provide an option which would first add a space after anything that would cause a line break (i.e. all block elements as well as <br />) and then strip away everything which has <>’s around it. Entities like &sim; (∼) are probably best left alone because most of them stand for characters important to the text. The only exception could be invisible things like non-breaking spaces or maybe the &#9; control character (which I’m not completely sure what it does) after numbering.
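To make the suggestion concrete, here is a rough sketch in Python of the two steps I have in mind. It is only an illustration, not how Pleco does (or would do) it: the function name `strip_html`, the `drop_invisible` flag, and the list of break-causing tags are my own assumptions, and a regex is only good enough for simple, well-formed markup like the export above.

```python
import re

# Assumed list of tags that cause a line break (block elements plus <br>).
# Hypothetical; the tags Pleco actually emits may differ.
_BREAK_TAG = re.compile(
    r"</?(?:div|p|br|li|ul|ol|tr|table|h[1-6])\b[^>]*>", re.IGNORECASE
)
_ANY_TAG = re.compile(r"<[^>]+>")
# "Invisible" entities: non-breaking space and tab.
_INVISIBLE = re.compile(r"&nbsp;|&#160;|&#9;", re.IGNORECASE)
_WHITESPACE = re.compile(r"\s+")


def strip_html(text, drop_invisible=True):
    """Strip tags from an exported definition, leaving entities alone."""
    # Step 1: add a space after anything that would cause a line break,
    # so words on either side do not run together once tags are gone.
    text = _BREAK_TAG.sub(lambda m: m.group(0) + " ", text)
    # Step 2: strip away everything that has <>'s around it.
    text = _ANY_TAG.sub("", text)
    # Optional: turn invisible entities into plain spaces.
    if drop_invisible:
        text = _INVISIBLE.sub(" ", text)
    # Collapse the runs of whitespace left behind.
    return _WHITESPACE.sub(" ", text).strip()


sample = (
    '<div align="left"><b>n.</b> <span style="font-size:80%">noun</span>'
    "<br /><b>1</b>&#9;foo &sim; bar</div><plecoentry />"
)
print(strip_html(sample))
```

Entities such as &sim; pass through untouched, so Anki still renders them as the intended characters; only the non-breaking space and tab entities are replaced, and only when `drop_invisible` is left on.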