Currently, Facebook’s Spark AR Studio is restrictive with supported audio formats. Unfortunately, only M4A with specific settings is allowed. This short tutorial shows how to convert artificially generated neural voices (in this case, an mp3 file produced by Amazon Polly) to the m4a format accepted by Spark AR. I’m using the free Audacity tool, which integrates the open-source FFmpeg plug-in.

Spark AR has the following requirements on audio files:

Generating Audio using Text-to-Speech (mp3 / PCM)

Neither Amazon Polly nor the Microsoft Azure Text-to-Speech cognitive service can directly produce an m4a audio file. In its additional settings, Polly offers MP3, OGG, PCM and Speech Marks. Choose either PCM for uncompressed sound or go with MP3. MP3 goes up to a sample rate of 24000 Hz, while PCM is limited to 16000 Hz. The choice doesn’t make much difference if you don’t plan many intermediate processing steps, as the final file size should be extremely low anyway to bring your finished project to Facebook or Instagram.

Configure Amazon Polly with a good file format (e.g., mp3), enter the text and choose a neural speech engine. Make sure the AWS region and language support neural voices!

Next, choose the language and the voice to use, and enter the text to speak. Most languages already support neural voices, which I’d recommend, as they sound more natural. Note that they are not yet available in all AWS regions, and not for every language. Neural engines are also more expensive. Still, for a few short static text blocks, the cost savings of the standard voice are not worth the huge quality boost you gain by selecting neural voices.

You can always click on “Listen” to hear a preview. Once you’re happy with the spoken text, click on “Download” to get the mp3 file.
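If you prefer scripting over clicking through the console, the Polly step and the later m4a conversion can be sketched in Python. This is a minimal sketch under stated assumptions, not the workflow this tutorial follows. It assumes boto3 (the AWS SDK for Python) for Polly’s `synthesize_speech` call, and it swaps Audacity for the FFmpeg command-line tool. The helper names (`polly_request`, `synthesize_to_file`, `ffmpeg_to_m4a_cmd`) are hypothetical. The 44100 Hz mono AAC target is also an assumption, so check it against Spark AR’s current audio requirements.

```python
from pathlib import Path
from typing import List, Optional

# Sample rates Polly accepts per output format (passed as strings in the API).
# MP3/OGG go up to 24000 Hz; PCM is limited to 16000 Hz.
POLLY_SAMPLE_RATES = {
    "mp3": ("8000", "16000", "22050", "24000"),
    "ogg_vorbis": ("8000", "16000", "22050", "24000"),
    "pcm": ("8000", "16000"),
}


def polly_request(text: str, voice_id: str, output_format: str = "mp3",
                  sample_rate: str = "24000") -> dict:
    """Build keyword arguments for a Polly synthesize_speech call (hypothetical helper)."""
    allowed = POLLY_SAMPLE_RATES.get(output_format)
    if allowed is None:
        raise ValueError(f"unsupported format: {output_format}")
    if sample_rate not in allowed:
        raise ValueError(f"{output_format} does not support {sample_rate} Hz")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "SampleRate": sample_rate,
        "Engine": "neural",  # the region and language must support neural voices
    }


def synthesize_to_file(polly_client, out_path: Path, **request) -> Path:
    """Call Polly and write the returned audio stream to disk (hypothetical helper)."""
    response = polly_client.synthesize_speech(**request)
    out_path.write_bytes(response["AudioStream"].read())
    return out_path


def ffmpeg_to_m4a_cmd(src: Path, dst: Path, pcm_rate: Optional[str] = None,
                      target_rate: int = 44100) -> List[str]:
    """Build an ffmpeg command that re-encodes Polly output as mono AAC in .m4a.

    ASSUMPTION: 44100 Hz mono AAC; verify against Spark AR's requirements.
    """
    cmd = ["ffmpeg", "-y"]  # -y: overwrite the output file if it exists
    if pcm_rate is not None:
        # Polly's PCM output is raw, headerless signed 16-bit little-endian mono,
        # so ffmpeg must be told how to interpret the input bytes.
        cmd += ["-f", "s16le", "-ar", pcm_rate, "-ac", "1"]
    cmd += ["-i", str(src), "-c:a", "aac", "-ar", str(target_rate),
            "-ac", "1", str(dst)]
    return cmd
```

With boto3 installed and AWS credentials configured, calling `synthesize_to_file(boto3.client("polly"), Path("speech.mp3"), **polly_request("Hello!", "Joanna"))` and then passing `ffmpeg_to_m4a_cmd(Path("speech.mp3"), Path("speech.m4a"))` to `subprocess.run(..., check=True)` would produce the m4a file; the voice name is only an example. Keeping the command builder separate from the call that runs it makes the exact ffmpeg arguments easy to inspect before anything is executed.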