iTunes flashcards using Ting-Ting text-to-speech

I'd appreciate hearing from anyone who has used the latest macOS to automate this: turning a list of numbered-pinyin expressions into text-to-speech audio in iTunes and adding album art, so the tracks can be used as audio flashcards.

So far the procedure is:
- choose Ting-Ting as the text-to-speech voice
- open the flashcard data (a list of 'numbered' pinyin expressions) in TextEdit and select it
- use the text-to-speech service that sends the selection to iTunes
- serialise the filenames with a numbered prefix
- build up a batch of MP3s in iTunes
- use a third-party app (e.g. 'Quick tag' or 'ID3 Editor') to give each track a separate album name
- drag and drop 600x600 px artwork onto each track (separate graphics of the characters can be produced with Automator)
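As a possible scripted alternative to the first few steps, here is a minimal sketch that uses macOS's built-in `say` command (`say -v VOICE -o FILE TEXT`) instead of the Services menu. It assumes one pinyin expression per line in a UTF-8 text file and produces zero-padded, numbered AIFF files so they sort in flashcard order. The voice name is an assumption: older systems list it as "Ting-Ting" and newer ones may list "Tingting", so check `say -v '?'`. The helper names (`numbered_filename`, `say_commands`, `main`) are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

# Assumption: exact voice name as shown by `say -v '?'` on your Mac.
VOICE = "Ting-Ting"


def numbered_filename(index: int, pinyin: str, width: int = 3) -> str:
    """Return e.g. '001_ni3hao3.aiff' so tracks sort in flashcard order."""
    slug = "".join(ch for ch in pinyin if ch.isalnum())  # drop spaces/punctuation
    return f"{index:0{width}d}_{slug}.aiff"


def say_commands(lines, out_dir: str, voice: str = VOICE):
    """Build one `say` command per non-blank line of numbered pinyin."""
    cards = [line.strip() for line in lines if line.strip()]
    commands = []
    for i, text in enumerate(cards, start=1):
        out_path = str(Path(out_dir) / numbered_filename(i, text))
        commands.append(["say", "-v", voice, "-o", out_path, text])
    return commands


def main(src: str, out_dir: str) -> None:
    """Run every command (macOS only; `say -o` writes AIFF by default)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with open(src, encoding="utf-8") as f:
        for cmd in say_commands(f, out_dir):
            print(shlex.join(cmd))
            subprocess.run(cmd, check=True)
```

Calling `main("cards.txt", "cards_audio")` would write `001_....aiff`, `002_....aiff`, and so on. These files could then be imported into iTunes and converted to MP3 there, or encoded directly as in the tagging sketch that follows the question. Whether Ting-Ting reads the tone numbers the way you want is worth checking on a few lines first.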

It would be great if someone knew how to automate the last step, or any of the earlier ones, as the result works nicely on the iPod etc. If there were a way to export these as separate MP4s, even better.
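One way the tagging, artwork and MP4 steps might be automated is with the third-party command-line tool `ffmpeg` (e.g. installed via Homebrew), which is an assumption and not part of the workflow above. The sketch below only builds argument lists. For each card it encodes an MP3 with its own album name and embedded cover art, using the ID3v2.3 attached-picture recipe from ffmpeg's mp3 muxer documentation. It also builds a still-image MP4 that shows the card graphic for the length of the audio. The file layout (`001_x.aiff` paired with `001_x.png`) and all function names are hypothetical. Test the MP4 settings on your iPod before converting a whole batch.

```python
import subprocess
from pathlib import Path


def mp3_with_art(audio: str, cover: str, album: str, title: str, out: str):
    """ffmpeg args: encode to MP3, set per-card album/title, embed cover art."""
    return [
        "ffmpeg", "-y",
        "-i", audio, "-i", cover,
        "-map", "0:a", "-map", "1:v",
        "-c:a", "libmp3lame", "-q:a", "2",   # VBR MP3 encode
        "-c:v", "copy",                      # keep the image bytes unchanged
        "-id3v2_version", "3",
        "-metadata", f"album={album}",
        "-metadata", f"title={title}",
        "-metadata:s:v", "title=Album cover",
        "-metadata:s:v", "comment=Cover (front)",
        out,
    ]


def mp4_still_video(audio: str, cover: str, out: str):
    """ffmpeg args: a video showing the card image for the audio's duration."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", cover,           # repeat the still image
        "-i", audio,
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "128k",
        "-shortest",                         # stop when the audio ends
        out,
    ]


def jobs_for(audio_files, cover_dir: str, out_dir: str):
    """Pair each NNN_name.aiff with NNN_name.png (hypothetical layout)."""
    jobs = []
    for audio in sorted(audio_files):
        stem = Path(audio).stem
        cover = str(Path(cover_dir) / f"{stem}.png")
        # One album per card, so each track shows up separately with its art.
        jobs.append(mp3_with_art(audio, cover, stem, stem,
                                 str(Path(out_dir) / f"{stem}.mp3")))
        jobs.append(mp4_still_video(audio, cover,
                                    str(Path(out_dir) / f"{stem}.mp4")))
    return jobs


def run_all(jobs) -> None:
    """Execute the commands; requires ffmpeg on PATH."""
    for cmd in jobs:
        subprocess.run(cmd, check=True)
```

Using the stem as the album name is just one way to get the 'separate' album names that Quick tag or ID3 Editor are used for above. Any per-card label would do.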