Reader text-to-speech

rizen suha

状元 (Top Scholar)
Any plans to incorporate advanced automated agents that can read (Chinese) texts aloud, with proper segmentation and natural intonation according to the "neighbouring" and "local" semantic context? The current Apple voice synthesiser does a fairly good job, and there's no problem understanding what is read, but it's not enjoyable and hardly turns a written book into a sufficiently "listenable" audiobook. I am of course thinking about the alternatives that "deep learning" NLP / Tacotron systems from Google (or Apple itself) may provide, now or in the future. By the way, those systems are sure to all but obliterate the need for human narrators on platforms like Audible or "Du Chinese reader" one day. When that day comes (but until then, lesser improvements are also welcome), Pleco Reader will become an extraordinary tool.
 

mikelove

皇帝 (Emperor)
Staff member
Right now the only way to get access to those systems that I'm aware of is through an online API. Which is doable, but fraught with problems because so many of our users are in highly iffy situations internet-access-wise. I'm hoping that Apple or Google or Amazon will make a neural network based Chinese TTS available through an offline API at some point, but I'm reluctant to forge ahead and do an online one unless/until it becomes clear that an offline one is not forthcoming.
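Mike's point about connectivity suggests the shape any online option would have to take: the neural voice as an optional extra on top of an offline engine that always works. Below is a minimal sketch of that fallback, assuming hypothetical `online` and `offline` callables that stand in for real engines (this is not Pleco's code, and no real TTS API is called):

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical engine interface: text in, audio bytes out.
Synthesizer = Callable[[str], bytes]


def speak(text: str, offline: Synthesizer,
          online: Optional[Synthesizer] = None) -> tuple[bytes, str]:
    """Try the online (neural) voice first, but never depend on it.

    Returns the audio and the name of the engine that produced it.
    """
    if online is not None:
        try:
            return online(text), "online"
        except OSError:
            # ConnectionError and TimeoutError are OSError subclasses in Python 3:
            # no network, a blocked endpoint, or a slow one all land here.
            pass
    return offline(text), "offline"
```

The design choice is that failure stays silent and local: a user on a flaky connection still hears the offline voice, and the returned tag lets the app show which engine is speaking.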
 

rizen suha

状元 (Top Scholar)
OK. I obviously don't know the percentage of Pleco users who are online most of the time. If it's around 50%, would it not make sense to offer it? Perhaps an offline version is not that far off, given improved hardware and dedicated neural-net processors. After all, NLP / speech is going to be used ubiquitously in so many apps and services (it already is today, but people will also use it "all the time", unlike today). But hey, who knows, perhaps the built-in Apple voice synth will upgrade "itself" and the problem will be solved...?
 

mikelove

皇帝 (Emperor)
Staff member
Quoting rizen suha: "OK. I obviously don't know the percentage of Pleco users who are online most of the time. If it's around 50%, would it not make sense to offer it?"
It's not just the work of integrating it; it's also questions like how you charge for it. If we're paying per API call, then we have to bill people based on usage somehow, and then what if you prepay for some number of TTS outputs but can't use them because of the GFW (Great Firewall)? These are not insurmountable problems, but it seems likelier than not that the same functionality will be available in an easy-to-integrate, unmetered, offline form in a few years, so spending a lot of time engineering solutions around all of them doesn't seem like the best use of our already very limited time.
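To make one piece of the billing problem concrete: a minimal sketch, assuming credits are counted in characters of text, of a prepaid balance that is only debited once audio actually comes back, so a request blocked by the GFW costs the user nothing. `PrepaidTTS` and `call_api` are hypothetical names. It does not address the harder case Mike raises (credits that can never be used at all); that is a refund-policy question rather than a coding one.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PrepaidTTS:
    """Hypothetical prepaid balance, measured in characters of text to synthesize."""

    balance: int
    log: list[tuple[str, int]] = field(default_factory=list)

    def synthesize(self, text: str, call_api: Callable[[str], bytes]) -> bytes | None:
        cost = len(text)
        if cost > self.balance:
            raise ValueError("not enough prepaid characters left")
        try:
            audio = call_api(text)
        except OSError:
            # Request blocked or timed out: nothing delivered, so nothing charged.
            self.log.append(("failed", 0))
            return None
        self.balance -= cost  # charge only after audio actually arrives
        self.log.append(("ok", cost))
        return audio
```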
 

Shun

状元 (Top Scholar)
Hello Mike and rizen,

The advanced online-only API services you're describing, are they perhaps also accessible through a website on a pay-per-use basis? Would you know the address? I'm just wondering how they sound. A Google search for "chinese tts machine learning" hasn't turned up anything neat. :) Many thanks!

Shun
 

Shun

状元 (Top Scholar)
Thanks! Then I will wait. I once had negative experiences with Google Cloud Platform's billing department. Installing the Amazon App Store on an Android phone is a somewhat risky proposition because apps from unknown sources would have to be allowed. I guess we have to wait until Android phones have stronger Neural Processing Units and Apple includes lifelike Mandarin Chinese TTS with Siri (their English voices are already quite excellent). Hopefully developers will be able to use the Siri voices to read texts. When programming something on a Mac, I could only access the non-Siri voices, not the Siri ones.

Cheers, Shun
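For anyone who wants to check which Chinese voices their Mac exposes to programs: the macOS `say` command lists installed voices with `say -v '?'`, one per line as name, locale, then `#` and a sample sentence. A minimal sketch, assuming that output format (it may vary between macOS versions), that filters the list to `zh_` locales; the function names are hypothetical. It only shows what `say` exposes, which in Shun's experience did not include the Siri voices.

```python
from __future__ import annotations

import subprocess
import sys


def parse_say_voices(listing: str) -> list[tuple[str, str]]:
    """Parse `say -v '?'` lines of the form 'Name   xx_YY    # sample text'."""
    voices = []
    for line in listing.splitlines():
        left = line.split("#", 1)[0].strip()  # drop the sample sentence
        if not left:
            continue
        # The locale is the last field; voice names may themselves contain spaces.
        name, _, locale = left.rpartition(" ")
        voices.append((name.strip(), locale))
    return voices


def chinese_voices() -> list[tuple[str, str]]:
    """Installed voices with a zh_* locale, or [] when not running on macOS."""
    if sys.platform != "darwin":
        return []
    listing = subprocess.run(
        ["say", "-v", "?"], capture_output=True, text=True, check=True
    ).stdout
    return [v for v in parse_say_voices(listing) if v[1].startswith("zh_")]


if __name__ == "__main__":
    for name, locale in chinese_voices():
        print(f"{locale}\t{name}")
```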
 

rizen suha

状元 (Top Scholar)
Chinese speech example with Amazon Polly (which is used by the audio reader app mentioned above):
https://d1.awsstatic.com/product-marketing/Polly/voices/zhiyu-talk.4a8c0bbfbd1afacca606856c6fe33834e0c81a48.mp3
The Chinese voice is discussed here (2018), together with another audio example:
https://aws.amazon.com/blogs/machine-learning/meet-zhiyu-the-first-mandarin-chinese-voice-for-amazon-polly/
Sounds good, and, if representative, close to perfect.
The general service (text => MP3) is described here:
https://aws.amazon.com/polly/
It seems you can have 5 million characters per month converted for free (you have to sign up for an account).
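A minimal sketch of the text => MP3 step with the AWS SDK for Python (boto3), assuming AWS credentials and a region are already configured. `synthesize_speech` and the `Zhiyu` voice are Polly's own; `chunk_text`, `synthesize_book`, and the 3,000-character chunk size are my assumptions. Polly caps how much text a single request may contain, so a whole book has to be split, ideally at sentence boundaries so intonation isn't cut mid-sentence (check the current limit before relying on the number). Polly also offers an asynchronous `StartSpeechSynthesisTask` API that writes longer output to S3; this sketch sticks to the synchronous call.

```python
from __future__ import annotations

import re

# Split *after* Chinese sentence-final punctuation or a newline, keeping the mark.
SENTENCE_END = re.compile(r"(?<=[。！？；\n])")

# Assumption: roughly Polly's per-request text limit; verify against current quotas.
MAX_CHARS = 3000


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Pack whole sentences into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for sentence in filter(None, SENTENCE_END.split(text)):
        # Last resort: hard-split a single sentence that exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_book(text: str, out_path: str, voice_id: str = "Zhiyu") -> None:
    """Convert `text` to one MP3 file by calling Polly once per chunk."""
    import boto3  # imported here so chunk_text() works without the AWS SDK

    polly = boto3.client("polly")  # uses your configured AWS credentials/region
    with open(out_path, "wb") as out:
        for chunk in chunk_text(text):
            response = polly.synthesize_speech(
                Text=chunk, OutputFormat="mp3", VoiceId=voice_id
            )
            # Naive concatenation of MP3 segments; most players accept it.
            out.write(response["AudioStream"].read())
```

Joining the MP3 responses byte for byte is the simplest option, not a clean one; a proper audiobook pipeline would re-mux the segments.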

I would not mind paying (say) $50, in addition to the Amazon conversion cost, to have a book of my choice on Pleco as a readable and speakable audiobook.
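To put the Amazon side of that cost in numbers: a rough estimator, assuming the free allowance of 5 million characters per month quoted above and that every character sent (punctuation included) is billed. The per-million rate is a parameter on purpose, since it depends on the voice type and may change; `polly_cost` is a hypothetical helper, not an AWS tool.

```python
# Free allowance quoted in the post (5 million characters per month);
# the actual AWS terms may differ or change.
FREE_CHARS_PER_MONTH = 5_000_000


def polly_cost(book_chars: int, usd_per_million: float, used_this_month: int = 0) -> float:
    """Rough USD cost of converting a book of `book_chars` characters this month.

    `usd_per_million` must be looked up for the voice type you use.
    """
    free_left = max(0, FREE_CHARS_PER_MONTH - used_this_month)
    billable = max(0, book_chars - free_left)
    return billable / 1_000_000 * usd_per_million
```

For a novel of, say, 300,000 characters the estimate is zero at any rate, because it fits inside the quoted allowance; the proposed $50 would then be the main cost.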
 