The neural voice synthesized by Azure TTS is trembling, unclear, unemotional, and unnatural.

  • Question

  • The neural voice synthesized by Azure TTS is trembling, unclear, unemotional, and unnatural.
    I listened to the voice on my iPhone 6.
    I also listened to a voice synthesized by IBM Watson TTS on the same iPhone 6. It is emotional and natural.
    So what is the root cause? Can anybody help me check the code and the configuration?
    I have deleted the URL links. The code runs successfully, but the quality of the resulting voice is bad, which is disappointing. I need your help, Microsoft engineers.
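    // Assumed dependencies: `rp` is request-promise and `xmlbuilder` is the xmlbuilder package.
    const rp = require('request-promise');
    const xmlbuilder = require('xmlbuilder');

    var ShortName = 'en-US-JessaNeural'

    var MsKey = ''
    var MsUri = ''
    var BaseUrl = ''

    // Gets an access token.
    function getAccessToken(subscriptionKey) {
        let options = {
            method: 'POST',
            uri: MsUri,
            headers: {
                'Ocp-Apim-Subscription-Key': subscriptionKey
            }
        };
        return rp(options);
    }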

    // Converts text to speech using the input from readline.
    function textToSpeech(accessToken, text) {
        // Create the SSML request. The namespace URLs were deleted by the poster.
        // The element nesting (voice > mstts:express-as > prosody) is inferred
        // from the attribute names that appear in the posted code.
        let xml_body = xmlbuilder.create('speak')
            .att('version', '1.0')
            .att('xmlns', '')
            .att('xmlns:mstts', '')
            .att('xml:lang', 'en-us')
            .ele('voice')
            .att('xml:lang', 'en-us')
            .att('name', ShortName)
            .ele('mstts:express-as')
            .att('type', 'cheerful')
            .ele('prosody')
            .att('pitch', 'default')
            .att('rate', 'slow')
            .att('volume', 'loud')
            .txt(text)
            .end();
        // Convert the XML into a string to send in the TTS request.
        let body = xml_body.toString();
        // console.log('xml_body=' + xml_body)
        let options = {
            method: 'POST',
            baseUrl: BaseUrl,
            url: 'cognitiveservices/v1',
            headers: {
                'Authorization': 'Bearer ' + accessToken,
                'cache-control': 'no-cache',
                'User-Agent': 'YOUR_RESOURCE_NAME',
                'X-Microsoft-OutputFormat': 'audio-24khz-160kbitrate-mono-mp3',
                'Content-Type': 'application/ssml+xml'
            },
            // encoding: null makes the response body a raw Buffer of audio bytes.
            encoding: null,
            body: body
        };

        // RequestOk is assumed to be declared elsewhere in the poster's program.
        let request_ = rp(options).on('response', (response) => {
            if (response.statusCode === 200) {
                console.log(new Date().getTime() + '---Your file is ready.')
                RequestOk = true;
            }
        });
        return request_;
    }

    Wednesday, July 10, 2019 9:01 AM
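
The attributes in the posted code (`name`, `type`, `pitch`, `rate`, `volume`) normally belong to separate nested SSML elements, not to `speak` itself. The following is a minimal sketch of that nesting, written without `xmlbuilder` so it runs on its own. The helper names `escapeXml` and `buildSsml` and the `styleAttr` option are hypothetical. The two namespace URIs are the ones used in Microsoft's published SSML samples and are an assumption here, because the poster removed the originals. The thread uses `type` on `mstts:express-as`; later SSML documentation names this attribute `style`, so check which one your API version expects.

```javascript
// Escape the five XML special characters so input text cannot break the markup.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: builds speak > voice > mstts:express-as > prosody as a string.
function buildSsml(opts) {
  const o = Object.assign({
    lang: 'en-US',
    voiceName: 'en-US-JessaNeural', // voice used in the thread
    style: 'cheerful',
    styleAttr: 'type',              // assumption: 'type' as in the thread, 'style' in later docs
    pitch: 'default',
    rate: 'slow',
    volume: 'loud',
    text: ''
  }, opts);

  return (
    '<speak version="1.0"' +
    ' xmlns="http://www.w3.org/2001/10/synthesis"' +   // assumed namespace URI
    ' xmlns:mstts="https://www.w3.org/2001/mstts"' +   // assumed namespace URI
    ' xml:lang="' + escapeXml(o.lang) + '">' +
    '<voice name="' + escapeXml(o.voiceName) + '">' +
    '<mstts:express-as ' + o.styleAttr + '="' + escapeXml(o.style) + '">' +
    '<prosody pitch="' + escapeXml(o.pitch) + '"' +
    ' rate="' + escapeXml(o.rate) + '"' +
    ' volume="' + escapeXml(o.volume) + '">' +
    escapeXml(o.text) +
    '</prosody></mstts:express-as></voice></speak>'
  );
}

const ssml = buildSsml({ text: 'Hello & welcome' });
console.log(ssml);
```

The resulting string could be sent as the request body with `Content-Type: application/ssml+xml`. If every attribute ends up on the `speak` element instead, the style and prosody settings may not take effect, which could be one reason the output sounds flat.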

All replies

  • The returned voice file plays clearly on Windows 10. I listened to it just now and checked the voice quality again.

    But it plays unclearly and badly on iOS.

    It was generated in the audio-24khz-160kbitrate-mono-mp3 format.

    Why? Can you help me?

    Or do I need to change the audio format when I request the neural voice?

    Which audio format should I choose?

    Saturday, July 13, 2019 9:23 AM
  • I tried many output formats, but the problem still exists. The output voice sounds unclear and trembling when played in WeChat on my iPhone, but it plays clearly in Windows Media Player on Windows 10.
    I also tried different request endpoints, for example West US and Southeast Asia.


    Saturday, July 13, 2019 10:47 AM
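
Since the same file sounds fine in Windows Media Player but not in WeChat on the iPhone, one way to narrow things down is to inspect the bytes returned by the service before they reach any player. The sketch below is only a diagnostic aid under stated assumptions: `sniffAudioContainer` and `wavSampleRate` are hypothetical helpers, the input is assumed to be the raw `Buffer` that the request returns when `encoding: null` is set, and `wavSampleRate` only handles the canonical WAV layout where the `fmt ` chunk directly follows the RIFF header.

```javascript
// Hypothetical helper: guesses the audio container from the first bytes of a Buffer.
function sniffAudioContainer(buf) {
  if (!Buffer.isBuffer(buf) || buf.length < 12) return 'unknown';
  // WAV files start with "RIFF" and carry "WAVE" at byte offset 8.
  if (buf.toString('ascii', 0, 4) === 'RIFF' &&
      buf.toString('ascii', 8, 12) === 'WAVE') return 'wav';
  // MP3 files may begin with an ID3v2 tag...
  if (buf.toString('ascii', 0, 3) === 'ID3') return 'mp3';
  // ...or directly with an MPEG frame header, whose first 11 bits are all set.
  if (buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0) return 'mp3';
  return 'unknown';
}

// Hypothetical helper: reads the sample rate from a canonical WAV header.
// Returns null when the buffer is not WAV or the "fmt " chunk is not at offset 12.
function wavSampleRate(buf) {
  if (sniffAudioContainer(buf) !== 'wav' || buf.length < 28) return null;
  if (buf.toString('ascii', 12, 16) !== 'fmt ') return null;
  return buf.readUInt32LE(24); // sample rate is a little-endian uint32 at offset 24
}

// Usage with synthetic data: a fake 24 kHz WAV header and a fake MP3 frame header.
const fakeWav = Buffer.alloc(44);
fakeWav.write('RIFF', 0, 'ascii');
fakeWav.write('WAVE', 8, 'ascii');
fakeWav.write('fmt ', 12, 'ascii');
fakeWav.writeUInt32LE(24000, 24);

const fakeMp3 = Buffer.alloc(16);
fakeMp3[0] = 0xff;
fakeMp3[1] = 0xfb;

console.log(sniffAudioContainer(fakeWav), wavSampleRate(fakeWav));
console.log(sniffAudioContainer(fakeMp3));
```

If the container and sample rate match the requested `X-Microsoft-OutputFormat`, the file itself is probably as requested, and the difference may come from how the playback app handles or re-encodes the audio. That is a hypothesis to test, not a confirmed cause.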
  • Hi,

    Sorry for the late response. Could you please send us an email with your Azure subscription ID and the URL of this thread? We can help troubleshoot it live.



    Friday, July 19, 2019 12:50 AM
  • Hi,

    We have not received your details. We hope you have solved your problem and everything is working. Please let us know if you still face further challenges.



    Thursday, July 25, 2019 7:57 PM
  • Hi, I only saw this reply just now.

    The voice quality is still not emotional and not natural.

    How should I write the XML code?

    Monday, October 28, 2019 6:48 AM
  • I thought nobody would reply to me in July.

    But I have another question about the Azure service API, so I have asked a new question in the forum.

    Monday, October 28, 2019 6:51 AM
  • Hi,

    Thank you for your response. We will help you in your new post.



    Monday, October 28, 2019 6:51 PM