{"id":83195,"date":"2018-06-15T00:01:27","date_gmt":"2018-06-15T07:01:27","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/?p=83195"},"modified":"2019-02-18T09:09:56","modified_gmt":"2019-02-18T16:09:56","slug":"windows-powershell-and-the-text-to-speech-rest-api-part-4","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/windows-powershell-and-the-text-to-speech-rest-api-part-4\/","title":{"rendered":"Windows PowerShell and the Text-to-Speech REST API (Part 4)"},"content":{"rendered":"<p><strong>Summary<\/strong>: Send and receive content to the Text-to-Speech API with PowerShell.<\/p>\n<p>Q: Hey, Scripting Guy!<\/p>\n<p>I was playing with the Text-to-Speech API. I have it almost figured out, but I\u2019m stumbling over the final steps of formatting the SSML markup language. Could you lend me a hand?<\/p>\n<p>\u2014MD<\/p>\n<p>A: Hello MD,<\/p>\n<p>Glad to lend a hand to a Scripter in need! I remember having that same challenge the first time I worked with it. It\u2019s actually not hard, but I needed a sample to work with.<\/p>\n<p>Let\u2019s first off remember where we were last time. We\u2019ve accomplished the first two pieces for Cognitive Services Text-to-Speech:<\/p>\n<ol>\n<li>The authentication piece, to obtain a temporary token for communicating with Cognitive Services.<\/li>\n<li>Headers containing the audio format and our application\u2019s unique parameters.<\/li>\n<\/ol>\n<p>Next, we need to build the body of content we need to send up to Azure. 
The body contains some key pieces:<\/p>\n<ul>\n<li>Region of the speech (for example, English US, Spanish, or French).<\/li>\n<li>Text we need converted to speech.<\/li>\n<li>Voice of the speaker (male or female).<\/li>\n<\/ul>\n<p>For more information about all this, see the section \u201cSupported locales and voice fonts\u201d in <a target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/speech\/api-reference-rest\/bingvoiceoutput\" rel=\"noopener\">Bing text to speech API<\/a>.<\/p>\n<p>The challenge I ran into was how to create the SSML content that was needed. SSML, which stands for Speech Synthesis Markup Language, is a standard for describing how text should be spoken. It covers things such as:<\/p>\n<ul>\n<li>Content<\/li>\n<li>Language<\/li>\n<li>Speed<\/li>\n<\/ul>\n<p>I could spend a lot of time reading up on it, but Azure gives you a great tool to create sample content without even trying! Check out <a target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-ca\/services\/cognitive-services\/speech\/\" rel=\"noopener\">Bing Speech<\/a>, and look under the heading \u201cText to Speech.\u201d In the text box, type in whatever you would like to hear.<\/p>\n<p>In the sample below, I have entered \u201cHello everyone, this is Azure Text to Speech.\u201d<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/cog-services-12-05282018.png\"><img decoding=\"async\" width=\"1024\" height=\"580\" class=\"alignnone size-large wp-image-83205\" alt=\"Screenshot of Bing Speech\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/cog-services-12-05282018-1024x580.png\" \/><\/a><\/p>\n<p>Now if you select <strong>View SSML<\/strong> (the blue button), you can see the SSML that would be sent to Azure as the request body.<\/p>\n<p><a 
href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/cog-services-13-05282018.png\"><img decoding=\"async\" width=\"1024\" height=\"551\" class=\"alignnone size-large wp-image-83215\" alt=\"Screenshot of SSML code\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/cog-services-13-05282018-1024x551.png\" \/><\/a><\/p>\n<p>You can copy and paste this into your editor of choice. From here, I will try to break down the content from our example.<\/p>\n<p>&lt;speak version=&quot;1.0&quot; xmlns=&quot;http:\/\/www.w3.org\/2001\/10\/synthesis&quot; xmlns:mstts=&quot;http:\/\/www.w3.org\/2001\/mstts&quot; xml:lang=<span style=\"color: #008000\">&quot;en-US&quot;<\/span>&gt;&lt;voice xml:lang=<span style=\"color: #008000\">&quot;en-US&quot;<\/span> name=<span style=\"color: #0000ff\">&quot;Microsoft Server Speech Text to Speech Voice (en-US, JessaRUS)&quot;<\/span>&gt;<span style=\"color: #ff0000\">Hello everyone, this is Azure Text to Speech<\/span>&lt;\/voice&gt;&lt;\/speak&gt;<\/p>\n<p>The section highlighted in <span style=\"color: #008000\">GREEN<\/span> is our locale. The <span style=\"color: #0000ff\">BLUE<\/span> section contains our service name mapping. The locale must always be matched with the same service name mapping from the row it came from. 
The double quotes are equally important.<\/p>\n<p>If you mix them up, Azure will wag its finger at you and give a nasty error back.<\/p>\n<p>The section in <span style=\"color: #ff0000\">RED<\/span> is the actual content that we would like Azure to convert to speech.<\/p>\n<p>Let\u2019s take a sample from the table, and change this to an Australian female voice.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/cog-services-14-05282018.png\"><img decoding=\"async\" width=\"1024\" height=\"168\" class=\"alignnone size-large wp-image-83225\" alt=\"Table with two rows\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/cog-services-14-05282018-1024x168.png\" \/><\/a><\/p>\n<p>We first replace the locale with \u201cen-AU\u201d, and then the service name mapping with \u201cMicrosoft Server Speech Text to Speech Voice (en-AU, Catherine)\u201d.<\/p>\n<p>&lt;speak version=&quot;1.0&quot; xmlns=&quot;http:\/\/www.w3.org\/2001\/10\/synthesis&quot; xmlns:mstts=&quot;http:\/\/www.w3.org\/2001\/mstts&quot; xml:lang=<span style=\"color: #008000\">&quot;en-AU&quot;<\/span>&gt;&lt;voice xml:lang=<span style=\"color: #008000\">&quot;en-AU&quot;<\/span> name=<span style=\"color: #0000ff\">&quot;Microsoft Server Speech Text to Speech Voice (en-AU, Catherine)&quot;<\/span>&gt;<span style=\"color: #ff0000\">Hello everyone, this is Azure Text to Speech<\/span>&lt;\/voice&gt;&lt;\/speak&gt;<\/p>\n<p>Now if we\u2019d like to have her say something different, we just change the content in red.<\/p>\n<p>How does this translate in Windows PowerShell?<\/p>\n<p>We can take the three separate components (locale, service name mapping, and content), and store them as variables.<\/p>\n<p><strong>$Locale=\u2018en-US\u2019<\/strong><\/p>\n<p><strong>$ServiceNameMapping=\u2018Microsoft Server Speech Text to Speech Voice (en-US, JessaRUS)\u2019<\/strong><\/p>\n<p><strong>$Content=\u2018Hello everyone, this is 
Azure Text to Speech\u2019<\/strong><\/p>\n<p>Now you can have a line like this in Windows PowerShell to dynamically build out the SSML content, and change only the pieces you typically need.<\/p>\n<p><strong>$Body='&lt;speak version=&quot;1.0&quot; xmlns=&quot;http:\/\/www.w3.org\/2001\/10\/synthesis&quot; xmlns:mstts=&quot;http:\/\/www.w3.org\/2001\/mstts&quot; xml:lang=&quot;'+$Locale+'&quot;&gt;&lt;voice xml:lang=&quot;'+$Locale+'&quot; name=&quot;'+$ServiceNameMapping+'&quot;&gt;'+$Content+'&lt;\/voice&gt;&lt;\/speak&gt;'<\/strong><\/p>\n<p>At this point, we only need to call up the REST API to have it do the magic. But that is for another post!<\/p>\n<p>See you next time when we finish playing with this cool technology!<\/p>\n<p>I invite you to follow the Scripting Guys on <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguystwitter\" rel=\"noopener\">Twitter<\/a> and <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguysfacebook\" rel=\"noopener\">Facebook<\/a>. 
If you have any questions, send email to them at <u>scripter@microsoft.com<\/u>, or post your questions on the <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingforum\" rel=\"noopener\">Official Scripting Guys Forum<\/a>.<\/p>\n<p><strong>Sean Kearney, Premier Field Engineer, Microsoft<\/strong><\/p>\n<p><strong>Frequent contributor to Hey, Scripting Guy!<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Send and receive content to the Text-to-Speech API with PowerShell.<\/p>\n","protected":false},"author":596,"featured_media":83285,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[568],"tags":[3,154,45],"class_list":["post-83195","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hey-scripting-guy","tag-scripting-guy","tag-sean-kearney","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Send and receive content to the Text-to-Speech API with 
PowerShell.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/83195","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/596"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=83195"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/83195\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/83285"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=83195"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=83195"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=83195"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}