{"id":83265,"date":"2018-07-16T00:01:39","date_gmt":"2018-07-16T07:01:39","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/?p=83265"},"modified":"2019-02-18T09:09:54","modified_gmt":"2019-02-18T16:09:54","slug":"parse-html-and-pass-to-cognitive-services-text-to-speech","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/parse-html-and-pass-to-cognitive-services-text-to-speech\/","title":{"rendered":"Parse HTML and pass to Cognitive Services Text-to-Speech"},"content":{"rendered":"<p><strong>Summary<\/strong>: Having some fun with Abbott and Costello\u2019s \u201cWho\u2019s on first?\u201d comedy routine, and multiple voices with Bing Speech.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<\/p>\n<p>Hello everyone!<\/p>\n<p>Over the last few posts, I showed you how to use the Cognitive Services Text-to-Speech API. You learned how to authenticate to it with Windows PowerShell.<\/p>\n<p>It was also a great showcase for <strong>Invoke-RestMethod<\/strong>, because it demonstrated how an IT professional can reach REST API services with almost no code.<\/p>\n<p>Today, as an IT pro, I\u2019m just going to have some fun. Sometimes that\u2019s the best way to learn how to code.<\/p>\n<p>Initially, all of this came about as a challenge from other members of \u201cHey, Scripting Guy!\u201d I had demonstrated a silly little script I wrote that uses the built-in Windows voices to play Abbott and Costello\u2019s most famous comedy sketch, \u201cWho\u2019s on first?\u201d 
It\u2019s a neat trick that many PowerShell people love to play with, and it starts like this:<\/p>\n<p><strong># Connect to the SAPI voice COM object<\/strong><\/p>\n<p><strong>$voiceAPI=New-Object -ComObject SAPI.SPVoice<\/strong><\/p>\n<p><strong># Speed up the rate of the speaker&#8217;s voice<\/strong><\/p>\n<p><strong>$voiceAPI.Rate=3<\/strong><\/p>\n<p>Next, I retrieved the list of installed voices and, depending on who\u2019s name came up (yes, that\u2019s his name), picked a voice in Windows.<\/p>\n<p><strong># Obtain the list of voices in Windows 10<\/strong><\/p>\n<p><strong>$voiceFont=$voiceAPI.GetVoices()<\/strong><\/p>\n<p><strong># Establish a table to match the Microsoft voices with the names of the comedians<\/strong><\/p>\n<p><strong>$nameMatch=@{'Abbott:' = 'ZIRA'; 'Costello:' = 'DAVID' }<\/strong><\/p>\n<p>So it was neat. I had the text file on the hard drive, and it was all fun and games.<\/p>\n<p>Some people said, \u201cCool, but you should try the same approach with Cognitive Services!\u201d<\/p>\n<p>It was at this point that I learned everything I showed you in the last several posts. Today we\u2019re going to have some fun: \u201cWho\u2019s on first?\u201d portrayed by the \u201cAzure Cognitive Services Players.\u201d<\/p>\n<p><strong>Challenge #1<\/strong> \u2013 Learn how to use Text-to-Speech in Azure. Accomplished, and I built a function to leverage it. 
I\u2019ve prepopulated all of the available audio output formats in an array, so the function can simply pick one by index.<\/p>\n<p style=\"padding-left: 60px\"><code>Function Invoke-AzureTextToSpeech($Region,$Voice,$Content,$Filename)<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>{<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># Obtain Access Token to communicate with Voice API<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># I erased mine, you'll have to get your own ;)<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$APIKey='00000000000000000000000000000000'<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$AccessToken=Invoke-RestMethod -Uri \"https:\/\/api.cognitive.microsoft.com\/sts\/v1.0\/issueToken\" -Method 'POST' -ContentType 'application\/json' -Headers @{'Ocp-Apim-Subscription-Key' = $APIKey }<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># GUIDs (without dashes) that identify the app and the client<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># Just use this cmdlet to generate a new one: (New-Guid).ToString().Replace('-','')<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$XSearchAppId='00000000000000000000000000000000'<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># Just use this cmdlet to generate a new one: (New-Guid).ToString().Replace('-','')<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$XSearchClientId='00000000000000000000000000000000'<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># Current list of audio formats for Azure Text to Speech<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># HTTP header X-Microsoft-OutputFormat<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/speech\/api-reference-rest\/bingvoiceoutput<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>#<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$AudioFormats=( `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'ssml-16khz-16bit-mono-tts', `<\/code><\/p>\n<p style=\"padding-left: 
90px\"><code>'raw-16khz-16bit-mono-pcm', `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'audio-16khz-16kbps-mono-siren', `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'riff-16khz-16kbps-mono-siren', `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'riff-16khz-16bit-mono-pcm', `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'audio-16khz-128kbitrate-mono-mp3', `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'audio-16khz-64kbitrate-mono-mp3', `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'audio-16khz-32kbitrate-mono-mp3' `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>)<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># WAV file format (riff-16khz-16bit-mono-pcm)<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$AudioOutputType=$AudioFormats[4]<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$UserAgent='PowerShellForAzureCognitiveApp'<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$Header=@{ `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'Content-Type' = 'application\/ssml+xml'; `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'X-Microsoft-OutputFormat' = $AudioOutputType; `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'X-Search-AppId' = $XSearchAppId; `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'X-Search-ClientId' = $XSearchClientId; `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>'Authorization' = $AccessToken `<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>}<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># Wrap the content in SSML; $Voice already includes its surrounding double quotes<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$Body='&lt;speak version=\"1.0\" xml:lang=\"'+$Region+'\"&gt;&lt;voice xml:lang=\"'+$Region+'\" name='+$Voice+'&gt;'+$Content+'&lt;\/voice&gt;&lt;\/speak&gt;'<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>Invoke-RestMethod -Uri \"https:\/\/speech.platform.bing.com\/synthesize\" -Method 'POST' -Headers $Header -ContentType 'application\/ssml+xml' -Body $Body -UserAgent $UserAgent -OutFile $Filename<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>}<\/code><\/p>\n<p>I can now call this function in a loop or script and dynamically supply the region, voice, and content!<\/p>\n<p><strong>Challenge #2<\/strong> 
\u2013 Get a nice way to play WAV files synchronously, without launching additional applications.<\/p>\n<p>I used a simple function based on an earlier PowerTip to solve this issue.<\/p>\n<p style=\"padding-left: 60px\"><code>Function Play-MediaFile($Filename)<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>{<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$PlayMedia=New-Object System.Media.SoundPlayer<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># .NET resolves relative paths against the process working directory, not the current PowerShell location, so pass a full path<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$PlayMedia.SoundLocation=(Resolve-Path -LiteralPath $Filename).ProviderPath<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$PlayMedia.PlaySync()<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>}<\/code><\/p>\n<p><strong>Challenge #3<\/strong> \u2013 Get rid of the text file. I want to read the content straight from <a target=\"_blank\" href=\"http:\/\/www.abbottandcostellofanclub.com\/who.html\" rel=\"noopener\">The Abbott and Costello Fan Club<\/a>.<\/p>\n<p>Connecting was easy. Just use <strong>Invoke-WebRequest<\/strong>, and store the content in an object.<\/p>\n<p><strong>$RawSketch=Invoke-WebRequest -Uri 'http:\/\/www.abbottandcostellofanclub.com\/who.html'<\/strong><\/p>\n<p>The challenge was that the returned content was one massive string. I needed to break it up into an array of lines.<\/p>\n<p>I\u2019m sure I could have contacted some friends like Tome Tanasovski or Thomas Rayner for some help with regular expressions, but I like trying alternative approaches sometimes.<\/p>\n<p>The content contained many CRLF (carriage return\/line feed) sequences followed by a tab. I needed to collapse those into single spaces.<\/p>\n<p><strong>$CR=[char][byte]13<\/strong><\/p>\n<p><strong>$LF=[char][byte]10<\/strong><\/p>\n<p><strong>$Tab=[char][byte]9<\/strong><\/p>\n<p><strong>$RawSketchContent=$RawSketch.Content<\/strong><\/p>\n<p><strong>$RawSketchContent=$RawSketchContent.Replace(\"$CR$LF$Tab\",' ')<\/strong><\/p>\n<p>Once I completed this, I just had a nice list of content terminating in carriage returns. 
I could now split it into an array on the carriage returns:<\/p>\n<p><strong>$SketchArray=$RawSketchContent.Split(\"`r\")<\/strong><\/p>\n<p>Looking at the raw HTML, I found that the sketch sits between a &lt;PRE&gt; and a &lt;\/PRE&gt; tag, which gave me a \u201cBefore\u201d and \u201cAfter\u201d marker. I passed the array into <strong>Select-String<\/strong> to find each marker, and used <strong>Select-Object<\/strong> to capture its line number. This gave me a \u201cBegin\u201d parsing point and an \u201cEnd.\u201d<\/p>\n<p><strong>$StartofSketch=$SketchArray | Select-String -SimpleMatch '&lt;PRE&gt;' | Select-Object -ExpandProperty LineNumber<\/strong><\/p>\n<p><strong>$EndofSketch=$SketchArray | Select-String -SimpleMatch '&lt;\/PRE&gt;' | Select-Object -ExpandProperty LineNumber<\/strong><\/p>\n<p>With this achieved, I needed to select two voices in Cognitive Services Text-to-Speech. If you remember Part 4 in the series, we showed the list of voices to choose from. I decided on an Australian female voice for Bud Abbott, and an Irish male voice for Lou Costello.<\/p>\n<p>I used a simple array to store the data, with the speaker tag, region, and voice name separated by semicolons.<\/p>\n<p><strong>$CognitiveSpeakers=@()<\/strong><\/p>\n<p><strong>$CognitiveSpeakers+='BUD:;en-AU;\"Microsoft Server Speech Text to Speech Voice (en-AU, Catherine)\"'<\/strong><\/p>\n<p><strong>$CognitiveSpeakers+='LOU:;en-IE;\"Microsoft Server Speech Text to Speech Voice (en-IE, Shaun)\"'<\/strong><\/p>\n<p>We need to initialize a couple of variables: one to track Who is talking (well yes, of course he is, that\u2019s his job), and one to hold the name of the temporary audio file.<\/p>\n<p><strong>$CurrentSpeaker='Nobody'<\/strong><\/p>\n<p><strong>$TempVoiceFilename='whoisonfirst.wav'<\/strong><\/p>\n<p>Now for the work to begin. 
We loop from the line just after the &lt;PRE&gt; marker to the line just before &lt;\/PRE&gt;, and make sure any temporary WAV file left over from the previous line or a previous run is erased.<\/p>\n<p><strong>For ($a=$StartofSketch+1; $a -lt $EndofSketch; $a++)<\/strong><\/p>\n<p><strong>{<\/strong><\/p>\n<p><strong>Remove-Item $TempVoiceFilename -Force -ErrorAction SilentlyContinue<\/strong><\/p>\n<p>We then identify a line of content to parse. <strong>LineNumber<\/strong> counts from 1, while the array index counts from 0, hence the <strong>-1<\/strong>:<\/p>\n<p><strong>$LinetoSpeak=$SketchArray[$a-1]<\/strong><\/p>\n<p>Each line on the site that has a speaker begins with either BUD: or LOU:, so I used a little RegEx to find where the speaker name ends. Anything after that is the speaker\u2019s dialogue.<\/p>\n<p><strong>$SearchForSpeaker=(($LinetoSpeak | Select-String -Pattern '[a-zA-Z]+(:)').Matches)<\/strong><\/p>\n<p>Next, I had to check whether the line contained a speaker name followed by text, or just text (which means it continues the previous speaker\u2019s lines).<\/p>\n<p>By default, the content starts at position 1, which skips the line feed left at the start of each element after splitting on carriage returns. If a speaker was found, the content naturally starts further along the line.<\/p>\n<p><strong>$LinetoSpeakStart=1<\/strong><\/p>\n<p>Then I had to trap for some \u201cfun situations.\u201d Did the speaker change? 
Is it the same speaker, but they have more lines to speak?<\/p>\n<p style=\"padding-left: 60px\"><code>If ($null -ne $SearchForSpeaker)<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>{<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>$Speaker=$SearchForSpeaker[0].Value<\/code><\/p>\n<p style=\"padding-left: 60px\"><code># Skip the speaker name plus the 5 characters after it (presumably the closing &lt;\/b&gt; tag and a space)<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>$LinetoSpeakStart=$SearchForSpeaker[0].Index + $SearchForSpeaker[0].Length + 5<\/code><\/p>\n<p>Then, of course, if the speaker did change, I needed to reload the speaker-specific values that Azure needs.<\/p>\n<p style=\"padding-left: 60px\"><code>If ($Speaker -ne $CurrentSpeaker)<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>{<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$CurrentSpeaker = $Speaker<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$RawSpeakerData=$CognitiveSpeakers -match $CurrentSpeaker<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$SpeakerData=$RawSpeakerData.Split(';')<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$Region=$SpeakerData[1]<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$Voice=$SpeakerData[2]<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$Name=$SpeakerData[0]<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>}<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>}<\/code><\/p>\n<p>As you can see, I\u2019m pulling in the data needed for Azure, like Voice and Region, by splitting the matching entry in the <strong>CognitiveSpeakers<\/strong> array I created earlier.<\/p>\n<p>Once we\u2019ve identified the speaker and the content, we can call the two key functions, <strong>Invoke-AzureTextToSpeech<\/strong> and <strong>Play-MediaFile<\/strong>. The first line inside the block also echoes the line, minus its bold tags, to the console:<\/p>\n<p style=\"padding-left: 60px\"><code>If ($LinetoSpeak.Length -gt 1)<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>{<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$LinetoSpeak.Replace('&lt;b&gt;','').Replace('&lt;\/b&gt;','')<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>$Content=$LinetoSpeak.Substring($LinetoSpeakStart).Replace('&lt;i&gt;','').Replace('&lt;\/i&gt;','')<\/code><\/p>\n<p style=\"padding-left: 
90px\"><code>Invoke-AzureTextToSpeech -Region $Region -Content $Content -Voice $Voice -Filename $TempVoiceFilename<\/code><\/p>\n<p style=\"padding-left: 90px\"><code># Invoke-RestMethod has finished writing the file when it returns, so skip this line if no audio came back<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>If (-not (Test-Path $TempVoiceFilename)) { continue }<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>Play-MediaFile -Filename $TempVoiceFilename<\/code><\/p>\n<p style=\"padding-left: 90px\"><code>Start-Sleep -Milliseconds 1000<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>}<\/code><\/p>\n<p><strong>}<\/strong><\/p>\n<p>You\u2019ll note that there is a <strong>Start-Sleep<\/strong> in the loop. This is because the REST API limits how many requests it accepts within a given time window, and the pause helps keep the script under that limit.<\/p>\n<p>I thank you for sharing your time with me today. Hopefully you had a little fun, and maybe even learned some ways that you, too, can play with HTML content.<\/p>\n<p>If you see a more efficient way of doing this, I\u2019d love to see the results! It could be a really cool blog post in itself!<\/p>\n<p>Until next time, remember that the Power of Shell is in you!<\/p>\n<p>I invite you to follow the Scripting Guys on <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguystwitter\" rel=\"noopener\">Twitter<\/a> and <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguysfacebook\" rel=\"noopener\">Facebook<\/a>. 
If you have any questions, send email to them at <u>scripter@microsoft.com<\/u>, or post your questions on the <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingforum\" rel=\"noopener\">Official Scripting Guys Forum<\/a>.<\/p>\n<p><strong>Sean Kearney, Premier Field Engineer, Microsoft<\/strong><\/p>\n<p><strong>Frequent contributor to Hey, Scripting Guy!<\/strong><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p> Having some fun with Abbott and Costello\u2019s \u201cWho\u2019s on first?\u201d comedy routine, and multiple voices with Bing Speech.<\/p>\n","protected":false},"author":596,"featured_media":83285,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[568],"tags":[3,154,45],"class_list":["post-83265","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hey-scripting-guy","tag-scripting-guy","tag-sean-kearney","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p> Having some fun with Abbott and Costello\u2019s \u201cWho\u2019s on first?\u201d comedy routine, and multiple voices with Bing 
Speech.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/83265","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/596"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=83265"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/83265\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/83285"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=83265"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=83265"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=83265"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}