{"id":181200,"date":"2023-11-14T09:00:06","date_gmt":"2023-11-14T14:00:06","guid":{"rendered":"https:\/\/blog.logrocket.com\/?p=181200"},"modified":"2024-06-04T16:54:43","modified_gmt":"2024-06-04T20:54:43","slug":"exploring-ai-speech-text-services-python","status":"publish","type":"post","link":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/","title":{"rendered":"Exploring AI speech-to-text services with Python"},"content":{"rendered":"<!DOCTYPE html>\n<html><p>The rapid adoption of AI in recent times proves that software products can take advantage of AI where necessary to create richer experiences for users. In this article, I would like you to grab a coffee, set up your Python playground, and get ready to explore different providers that offer AI speech-to-text (STT) services.<\/p><img loading=\"lazy\" decoding=\"async\" width=\"895\" height=\"597\" src=\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png\" class=\"attachment-full size-full wp-post-image\" alt=\"Exploring AI speech-to-text services with Python\" srcset=\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png 895w, https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python-300x200.png 300w, https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python-768x512.png 768w\" sizes=\"auto, (max-width: 895px) 100vw, 895px\">\n<p>There are a couple of these providers, such as:<\/p>\n<ol>\n<li><a href=\"https:\/\/openai.com\/research\/whisper\">OpenAI<\/a><\/li>\n<li><a href=\"https:\/\/deepgram.com\/product\/transcription\">DeepGram<\/a><\/li>\n<li><a href=\"https:\/\/www.rev.ai\/\">Rev AI<\/a><\/li>\n<li><a href=\"https:\/\/aws.amazon.com\/pm\/transcribe\/\">Amazon Transcribe<\/a><\/li>\n<li><a href=\"https:\/\/cloud.google.com\/speech-to-text\">Google Cloud 
Speech-to-Text<\/a><\/li>\n<\/ol>\n<p>But in this article, I will consider only the first three providers because they do not require proprietary software and make it easy to set up an account.<\/p>\n<h2 id=\"about-sample-project\">About our sample project<\/h2>\n<p>To explore these powerful STT service providers, we will use a 40-second audio recording as input to interact with their APIs. Our comparison will be based on the following metrics:<\/p>\n<ol>\n<li><strong>Speed<\/strong>: How long it takes to respond with a transcription<\/li>\n<li><strong>Accuracy<\/strong>: How well it transcribes audio to text<\/li>\n<\/ol>\n<p>Accuracy becomes important when you consider that many languages, including English, can be spoken with over 160 accents. Commonly, accuracy is measured by the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Word_error_rate\">Word Error Rate (WER)<\/a>. This is the number of word insertions, deletions, and substitutions in the transcribed text, divided by the number of words in the original reference text.<\/p>\n<h2 id=\"getting-started\">Getting started<\/h2>\n<p>To get started, I will transcribe <a href=\"https:\/\/emmanuelenya.bandcamp.com\/track\/stt-audio-sample\">sample audio<\/a> manually to serve as our reference text for calculating WER:<\/p>\n<pre class=\"language-plaintext hljs\">I am proud of a verbal data gathering tool which I built, particularly because, one, the use case, it empowered researchers to collect qualitative data which had richer insights. And secondly, for the architecture we used, we were thinking of reusability, so we approached building UIs and other business logic in a functional style such that functions are pure without side effects. This way our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. We were committed to producing excellent work.\n<\/pre>\n<p>As much as we can perform WER calculations by hand, we will not do that in this article. 
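<\/p>\n<p>For intuition, WER is the word-level edit distance (insertions + deletions + substitutions) between the hypothesis and the reference, divided by the number of words in the reference. Here is a minimal, illustrative sketch of that calculation (the function below is my own, not from any library):<\/p>\n<pre class=\"language-python hljs\">def word_error_rate(reference, hypothesis):\n    # Word-level Levenshtein distance via dynamic programming\n    ref, hyp = reference.split(), hypothesis.split()\n    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]\n    for i in range(len(ref) + 1):\n        d[i][0] = i  # delete every reference word\n    for j in range(len(hyp) + 1):\n        d[0][j] = j  # insert every hypothesis word\n    for i in range(1, len(ref) + 1):\n        for j in range(1, len(hyp) + 1):\n            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])\n            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, substitution)\n    return d[len(ref)][len(hyp)] / len(ref)\n\nprint(word_error_rate('we were committed', 'we are committed'))  # one substitution out of three words\n<\/pre>\n<p>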
Instead, we will use a Python library, <a href=\"https:\/\/pypi.org\/project\/jiwer\/\">JiWER<\/a>, which is optimized to perform WER calculations efficiently, in a fraction of the time the manual approach would take.<\/p>\n<p>Without any further ado, let us dive in and see some Automatic Speech Recognition (ASR) in action. \ud83d\udcaa\ud83d\ude04<\/p>\n<h2 id=\"openai-whisper\">OpenAI and Whisper<\/h2>\n<p>OpenAI is a US company best known for introducing ChatGPT. They offer an audio transcription service based on their public AI audio model, <a href=\"https:\/\/openai.com\/research\/whisper\">Whisper<\/a>. This model is priced at $0.006\/min for pre-recorded audio. As per their <a href=\"https:\/\/platform.openai.com\/docs\/guides\/speech-to-text\/supported-languages\">documentation<\/a>, the Whisper API limits each input audio file to a maximum of 25MB and supports transcription in over 66 languages.<\/p>\n<p>Whisper lets you attach a simple prompt that can guide the model toward higher-quality transcripts, for example when a recording contains custom words and phrases that are not easily recognized. You can also get better punctuation and capitalization with this handy feature.<\/p>\n<p>To use Whisper, install the Python library using pip, like so:<\/p>\n<pre class=\"language-bash hljs\">$ pip install openai\n<\/pre>\n<p>You should note two things:<\/p>\n<ol>\n<li>Running the command will not install the model on our local machine. The library is simply a wrapper that interacts with OpenAI\u2019s API<\/li>\n<li>You will require an API key to access the API via the library. 
<a href=\"https:\/\/platform.openai.com\/signup\">Sign up<\/a> on OpenAI to create one<\/li>\n<\/ol>\n<h3 id=\"transcribing-whisper\">Transcribing with Whisper<\/h3>\n<p>Now, we proceed to perform transcription!<\/p>\n<pre class=\"language-python hljs\">import openai\nimport time\n\nopenai.api_key = \"{OPEN_AI_API_KEY}\"\n\nstart_time = time.time()  # Start measuring the time\n\nwith open(\"audio_sample.m4a\", \"rb\") as audio_file:\n    transcript = openai.Audio.transcribe(\"whisper-1\", audio_file)\n    print(transcript['text'])\n\nduration = time.time() - start_time  # Calculate the duration in seconds\nprint(f\"Request duration: {duration} seconds\")\n<\/pre>\n<p>The code block above is a fairly simple Python program. Line 8 opens an audio file named <code>audio_sample.m4a<\/code> in binary read mode and assigns it to the local variable <code>audio_file<\/code>.<\/p>\n<p>Importantly, the <code>with<\/code> block is used to ensure proper resource management. It guarantees that <code>audio_sample.m4a<\/code> is closed after execution is completed.<\/p>\n<p>We received a result for the above program in 3.9 seconds:<\/p>\n<pre class=\"language-plaintext hljs\">I am proud of Enverbal Data Gathering tool which I built, particularly because one, the use case, it empowered researchers to collect qualitative data which had richer insights. And secondly, for the architecture we used, we were thinking of reusability, so we approached building UIs and other business logic in a functional style such that functions are pure without side effects. This way our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. We are committed to producing excellent work.\n<\/pre>\n<p>After multiple trials with the same input, the result remained same, which is good because it means the library is deterministic.<\/p>\n<p>The same cannot be said for the processing time. 
The API responded within the range of 2-6 seconds for each of our otherwise identical trials.<\/p>\n<h3>Determining word error rate (WER)<\/h3>\n<p>Now it\u2019s time to take the result from Whisper and compare it to the reference transcript I provided earlier to determine the word error rate.<\/p>\n<p>Let\u2019s install the <a href=\"https:\/\/pypi.org\/project\/jiwer\/\">JiWER<\/a> library for this purpose. Open your terminal and execute the command below:<\/p>\n<pre class=\"language-bash hljs\">pip install jiwer\n<\/pre>\n<p>Run the following code to determine the WER:<\/p>\n<pre class=\"language-python hljs\">from jiwer import wer\n\nreference = \"I am proud of a verbal data gathering tool which I built, particularly because one, the use case, it empowered researchers to collect qualitative data which had richer insights. And secondly, for the architecture we used, we were thinking of reusability, so we approached building UIs and other business logic in a functional style such that functions are pure without side effects. This way our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. We were committed to producing excellent work.\"\n\nhypothesis = \"I am proud of Enverbal Data Gathering tool which I built, particularly because one, the use case, it empowered researchers to collect qualitative data which had richer insights. And secondly, for the architecture we used, we were thinking of reusability, so we approached building UIs and other business logic in a functional style such that functions are pure without side effects. This way our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. 
We are committed to producing excellent work.\"\n\nerror = wer(reference, hypothesis)\n\nprint(error)\n<\/pre>\n<p>The result from the above computation is 0.0574, indicating a 5.74% word error rate.<\/p>\n<h2 id=\"deepgram\">Deepgram<\/h2>\n<p>Deepgram is a robust audio transcription service provider that accepts both pre-recorded and live-streamed audio. Setting up a developer account on Deepgram is free, and it comes with a $200 credit for interacting with their different AI models.<\/p>\n<p>Deepgram also has a rich <a href=\"https:\/\/playground.deepgram.com\/\">set of features<\/a> that allow you to influence the output of the transcription, such as transcript quality and text format. There are quite a number of them, so if you are unsure of the various options, leaving them at their defaults works fine.<\/p>\n<h3 id=\"transcribing-deepgram\">Transcribing with Deepgram<\/h3>\n<p>To use Deepgram, install the Python library using pip, like so:<\/p>\n<pre class=\"language-bash hljs\">$ pip install deepgram-sdk\n<\/pre>\n<p>Again, we open an audio file named <code>audio_sample.m4a<\/code> in binary read mode and assign it to the local variable <code>audio_file<\/code>:<\/p>\n<pre class=\"language-python hljs\">from deepgram import Deepgram\nimport time\n\nDEEPGRAM_API_KEY = 'YOUR_SECRET_KEY'\n\ndg_client = Deepgram(DEEPGRAM_API_KEY)\n\nstart_time = time.time()  # Start measuring the time\n\nwith open(\"audio_sample.m4a\", \"rb\") as audio_file:\n    response = dg_client.transcription.sync_prerecorded(\n        {\n            'buffer': audio_file,\n            'mimetype': 'audio\/m4a'\n        },\n        {\n            \"model\": \"nova\",\n            \"language\": \"en\",\n            \"smart_format\": True,\n        },\n    )\n    print(response[\"results\"][\"channels\"][0][\"alternatives\"][0][\"transcript\"])\n    duration = time.time() - start_time  # Calculate the duration in seconds\n    print(f\"Request duration: {duration} seconds\")\n<\/pre>\n<p>It took Deepgram about 3.9 seconds 
to generate the transcript in the block below:<\/p>\n<pre class=\"language-plaintext hljs\">I am proud of a verbal data gathering tool, which I doubt, particularly because when the use case, it empowered researchers to collect qualitative data, which had reached our insights. And secondly, for the architectural use, we're thinking of reusability. So we approached building UIs and other business logic in a functional style such that functions are pure. It outside effect. This way, our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. We are committed to producing excellent work.\n<\/pre>\n<h3>Determining word error rate (WER)<\/h3>\n<p>Now let\u2019s calculate the WER for the above transcription:<\/p>\n<pre class=\"language-python hljs\">from jiwer import wer\n\nreference = \"I am proud of a verbal data gathering tool which I built, particularly because one, the use case, it empowered researchers to collect qualitative data which had richer insights. And secondly, for the architecture we used, we were thinking of reusability, so we approached building UIs and other business logic in a functional style such that functions are pure without side effects. This way our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. We were committed to producing excellent work.\"\n\nhypothesis = \"I am proud of a verbal data gathering tool, which I doubt, particularly because when the use case, it empowered researchers to collect qualitative data, which had reached our insights. And secondly, for the architectural use, we're thinking of reusability. So we approached building UIs and other business logic in a functional style such that functions are pure. It outside effect. This way, our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. 
We are committed to producing excellent work.\"\n\nerror = wer(reference, hypothesis)\n\nprint(error)\n<\/pre>\n<p>The result from the above computation is 0.2184, indicating a 21.84% WER.<\/p>\n<h3 id=\"generating-transcript-summaries\">Generating transcript summaries<\/h3>\n<p>Aside from generating transcripts from audio files, Deepgram provides transcript summaries as well. To enable this feature, we simply specify it as a JSON argument to the Python SDK, like so:<\/p>\n<pre class=\"language-python hljs\">from deepgram import Deepgram\n\nDEEPGRAM_API_KEY = 'YOUR_SECRET_KEY'\n\ndg_client = Deepgram(DEEPGRAM_API_KEY)\n\nwith open(\"audio_sample.m4a\", \"rb\") as audio_file:\n    response = dg_client.transcription.sync_prerecorded(\n        {\n            'buffer': audio_file,\n            'mimetype': 'audio\/m4a'\n        },\n        {\n            \"model\": \"nova\",\n            \"language\": \"en\",\n            \"smart_format\": True,\n            \"summarize\": \"v2\"\n        },\n    )\n    print(response)\n<\/pre>\n<p>The output from the code above looks like this:<\/p>\n<pre class=\"language-python hljs\">{\n  ...\n  \"results\": {\n    \"summary\": {\n      \"result\": \"success\",\n      \"short\": \"The speaker discusses a verbal data collection tool and introduces a architectural use of UIs and business logic in a functional style. They also mention the team's amazing work and commitment to producing high-quality work.\"\n    }\n  }\n}\n<\/pre>\n<p>The summary clearly reflects what the speaker was communicating. In my opinion, this feature is very useful and has a compelling use case for audio-based note-taking apps.<\/p>\n<h2 id=\"rev-ai\">Rev AI<\/h2>\n<p><a href=\"https:\/\/www.rev.com\/\">Rev AI<\/a> combines the intelligence of machines and humans to produce high-quality speech-to-text transcriptions. Its AI transcriptions support 36 languages and can produce results in a matter of minutes. 
Rev\u2019s AI engine supports custom vocabulary, which means that you can add words or phrases that would not be in the average dictionary to help the speech engine identify them.<\/p>\n<p>There is also support for topic extraction and sentiment analysis. These latter features are very handy for businesses that want to draw insights from what their customers are saying.<\/p>\n<h3 id=\"transcribing-rev-ai\">Transcribing with Rev AI<\/h3>\n<p>To use Rev, install the SDK using pip, like so:<\/p>\n<pre class=\"language-bash hljs\">$ pip install rev-ai\n<\/pre>\n<p>Let\u2019s take a look at the code for the transcription now:<\/p>\n<pre class=\"language-python hljs\">from rev_ai import apiclient\n\ntoken = \"API_KEY\"\nfile_path = \"audio_sample.m4a\"\n\n# create your client\nclient = apiclient.RevAiAPIClient(token)\n\n# send a local file\njob = client.submit_job_local_file(file_path)\n\n# retrieve the transcript as text (only available once the job has completed)\ntranscript_text = client.get_transcript_text(job.id)\nprint(transcript_text)\n<\/pre>\n<p>Rev AI executes asynchronously. That is, when you request a transcription, Rev creates a job that is executed off the main program flow. At a later time, you can poll the server with the job ID to check its status and, ultimately, get the result of the transcription when it completes.<\/p>\n<p>Polling the server works, but it is not recommended in a production environment. A better way is to register a webhook that triggers a notification when the task completes.<\/p>\n<p>Due to the asynchronicity of the API, we are not able to use the Python time library to calculate the total execution time. 
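<\/p>\n<p>A basic polling loop looks like the sketch below. To keep it generic and self-contained, <code>fetch_status<\/code> stands in for whatever call returns the job status (with the Rev AI SDK, something along the lines of <code>client.get_job_details(job.id).status<\/code>):<\/p>\n<pre class=\"language-python hljs\">import time\n\ndef wait_for_job(fetch_status, poll_interval=2.0, max_polls=30):\n    # Poll until the job leaves the in-progress state or we give up\n    for _ in range(max_polls):\n        status = fetch_status()\n        if status != 'in_progress':\n            return status\n        time.sleep(poll_interval)\n    raise TimeoutError('transcription job did not finish in time')\n\n# Stand-in: pretend the job finishes on the third poll\nstatuses = iter(['in_progress', 'in_progress', 'transcribed'])\nprint(wait_for_job(lambda: next(statuses), poll_interval=0.0))  # transcribed\n<\/pre>\n<p>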
Luckily for us, we have access to that information from Rev\u2019s dashboard:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-181203\" src=\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/rev-ai-dashboard.png\" alt=\"The Rev AI dashboard\" width=\"895\" height=\"376\" srcset=\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/rev-ai-dashboard.png 895w, https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/rev-ai-dashboard-300x126.png 300w, https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/rev-ai-dashboard-768x323.png 768w\" sizes=\"auto, (max-width: 895px) 100vw, 895px\" \/><\/p>\n<p>The last three jobs took the same input and each completed transcription in about 13 seconds. Below is the result of the transcription:<\/p>\n<pre class=\"language-plaintext hljs\">I am proud of a verbal data gathering tool, which I built particularly because, one, the use case, it empowered researchers to collect qualitative data, which had richer insight. And secondly, for the architectural use, we were thinking of reusability. So we approached building UIs and other business logic in a functional style such that functions are pure without side effects. This way, our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. Were committed to producing excellent work.\n<\/pre>\n<h3>Determining word error rate (WER)<\/h3>\n<p>Now let\u2019s calculate the WER for the transcription:<\/p>\n<pre class=\"language-python hljs\">from jiwer import wer\n\nreference = \"I am proud of a verbal data gathering tool which I built, particularly because one, the use case, it empowered researchers to collect qualitative data which had richer insights. And secondly, for the architecture we used, we were thinking of reusability, so we approached building UIs and other business logic in a functional style such that functions are pure without side effects. 
This way our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. We were committed to producing excellent work.\"\n\nhypothesis = \"I am proud of a verbal data gathering tool, which I built particularly because, one, the use case, it empowered researchers to collect qualitative data, which had richer insight. And secondly, for the architectural use, we were thinking of reusability. So we approached building UIs and other business logic in a functional style such that functions are pure without side effects. This way, our logic were deterministic and reusable. Lastly, the team which I worked with was truly amazing. Were committed to producing excellent work.\"\n\nerror = wer(reference, hypothesis)\n\nprint(error)\n<\/pre>\n<p>The result from the above computation is 0.1494, indicating a 14.94% error rate.<\/p>\n<h2 id=\"comparison-table\">Comparison table<\/h2>\n<p>If you\u2019ve made it this far, well done! We have covered a lot in exploring these different AI speech-to-text providers. Next, let\u2019s see how they perform based on the metrics we set at the outset.<\/p>\n<p>To be fair, I executed the transcription process 20 times for each of the service providers. 
Here\u2019s the output:<\/p>\n<table class=\"center\">\n<thead>\n<tr>\n<th>STT Provider<\/th>\n<th>WER<\/th>\n<th>Avg Execution Time<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>OpenAI<\/td>\n<td>5.74%<\/td>\n<td>3.4s<\/td>\n<\/tr>\n<tr>\n<td>Deepgram<\/td>\n<td>21.84%<\/td>\n<td>3.1s<\/td>\n<\/tr>\n<tr>\n<td>Rev<\/td>\n<td>14.94%<\/td>\n<td>15.1s<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-181204\" src=\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/word-error-rate-chart.png\" alt=\"Word error rate comparison chart\" width=\"895\" height=\"671\" srcset=\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/word-error-rate-chart.png 895w, https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/word-error-rate-chart-300x225.png 300w, https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/word-error-rate-chart-768x576.png 768w\" sizes=\"auto, (max-width: 895px) 100vw, 895px\" \/><\/p>\n<h2>Conclusion<\/h2>\n<p>The speech-to-text (STT) service providers covered in this article deliver diverse solutions. OpenAI excels in accuracy, with a 5.74% Word Error Rate (WER). Deepgram provides rapid results in just 3.1 seconds, making it suitable for time-sensitive tasks. Rev combines human and AI transcription for high-quality outputs.<\/p>\n<p>Choosing the right STT provider depends on your priorities \u2014 be they precision, speed, or a balance of both. Keep an eye on the evolving AI landscape for new features and possibilities.<\/p>\n<\/html>\n","protected":false},"excerpt":{"rendered":"<p>AI speech-to-text services can make it easy to produce audio transcriptions quickly. 
Learn how to leverage them in this post.<\/p>\n","protected":false},"author":156415942,"featured_media":181206,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2147999,1],"tags":[2109833],"class_list":["post-181200","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dev","category-uncategorized","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.1.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Exploring AI speech-to-text services with Python - LogRocket Blog<\/title>\n<meta name=\"description\" content=\"AI speech-to-text services can make it easy to produce audio transcriptions quickly. Learn how to leverage them in this post.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Exploring AI speech-to-text services with Python - LogRocket Blog\" \/>\n<meta property=\"og:description\" content=\"AI speech-to-text services can make it easy to produce audio transcriptions quickly. 
Learn how to leverage them in this post.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/\" \/>\n<meta property=\"og:site_name\" content=\"LogRocket Blog\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-14T14:00:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-04T20:54:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png\" \/>\n\t<meta property=\"og:image:width\" content=\"895\" \/>\n\t<meta property=\"og:image:height\" content=\"597\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Emmanuel Enya\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/enyason95\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Emmanuel Enya\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/\",\"url\":\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/\",\"name\":\"Exploring AI speech-to-text services with Python - LogRocket Blog\",\"isPartOf\":{\"@id\":\"https:\/\/blog.logrocket.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png\",\"datePublished\":\"2023-11-14T14:00:06+00:00\",\"dateModified\":\"2024-06-04T20:54:43+00:00\",\"author\":{\"@id\":\"https:\/\/blog.logrocket.com\/#\/schema\/person\/bed4921375bee03000cfb416558730cf\"},\"description\":\"AI speech-to-text services can make it easy to produce audio transcriptions quickly. 
Learn how to leverage them in this post.\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#primaryimage\",\"url\":\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png\",\"contentUrl\":\"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png\",\"width\":895,\"height\":597,\"caption\":\"Exploring AI speech-to-text services with Python\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.logrocket.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Exploring AI speech-to-text services with Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.logrocket.com\/#website\",\"url\":\"https:\/\/blog.logrocket.com\/\",\"name\":\"LogRocket Blog\",\"description\":\"Resources to Help Product Teams Ship Amazing Digital Experiences\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.logrocket.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.logrocket.com\/#\/schema\/person\/bed4921375bee03000cfb416558730cf\",\"name\":\"Emmanuel 
Enya\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.logrocket.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/16a8c3d7db74d8c310a92982ee365cf4ce0e8f68630c82040fc4332d479f584f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/16a8c3d7db74d8c310a92982ee365cf4ce0e8f68630c82040fc4332d479f584f?s=96&d=mm&r=g\",\"caption\":\"Emmanuel Enya\"},\"description\":\"I am a computer engineering graduate with five years of professional experience building modern Android applications. I am a huge fan of clean code because clarity is King \ud83d\ude04\",\"sameAs\":[\"https:\/\/github.com\/enyason\",\"https:\/\/www.linkedin.com\/in\/enyason\/\",\"https:\/\/x.com\/https:\/\/twitter.com\/enyason95\"],\"url\":\"https:\/\/blog.logrocket.com\/author\/emmanuelenya\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Exploring AI speech-to-text services with Python - LogRocket Blog","description":"AI speech-to-text services can make it easy to produce audio transcriptions quickly. Learn how to leverage them in this post.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/","og_locale":"en_US","og_type":"article","og_title":"Exploring AI speech-to-text services with Python - LogRocket Blog","og_description":"AI speech-to-text services can make it easy to produce audio transcriptions quickly. 
Learn how to leverage them in this post.","og_url":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/","og_site_name":"LogRocket Blog","article_published_time":"2023-11-14T14:00:06+00:00","article_modified_time":"2024-06-04T20:54:43+00:00","og_image":[{"width":895,"height":597,"url":"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png","type":"image\/png"}],"author":"Emmanuel Enya","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/enyason95","twitter_misc":{"Written by":"Emmanuel Enya","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/","url":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/","name":"Exploring AI speech-to-text services with Python - LogRocket Blog","isPartOf":{"@id":"https:\/\/blog.logrocket.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#primaryimage"},"image":{"@id":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png","datePublished":"2023-11-14T14:00:06+00:00","dateModified":"2024-06-04T20:54:43+00:00","author":{"@id":"https:\/\/blog.logrocket.com\/#\/schema\/person\/bed4921375bee03000cfb416558730cf"},"description":"AI speech-to-text services can make it easy to produce audio transcriptions quickly. 
Learn how to leverage them in this post.","breadcrumb":{"@id":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#primaryimage","url":"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png","contentUrl":"https:\/\/blog.logrocket.com\/wp-content\/uploads\/2023\/11\/exploring-ai-speech-text-services-python.png","width":895,"height":597,"caption":"Exploring AI speech-to-text services with Python"},{"@type":"BreadcrumbList","@id":"https:\/\/blog.logrocket.com\/exploring-ai-speech-text-services-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.logrocket.com\/"},{"@type":"ListItem","position":2,"name":"Exploring AI speech-to-text services with Python"}]},{"@type":"WebSite","@id":"https:\/\/blog.logrocket.com\/#website","url":"https:\/\/blog.logrocket.com\/","name":"LogRocket Blog","description":"Resources to Help Product Teams Ship Amazing Digital Experiences","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.logrocket.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.logrocket.com\/#\/schema\/person\/bed4921375bee03000cfb416558730cf","name":"Emmanuel 
Enya","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.logrocket.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/16a8c3d7db74d8c310a92982ee365cf4ce0e8f68630c82040fc4332d479f584f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/16a8c3d7db74d8c310a92982ee365cf4ce0e8f68630c82040fc4332d479f584f?s=96&d=mm&r=g","caption":"Emmanuel Enya"},"description":"I am a computer engineering graduate with five years of professional experience building modern Android applications. I am a huge fan of clean code because clarity is King \ud83d\ude04","sameAs":["https:\/\/github.com\/enyason","https:\/\/www.linkedin.com\/in\/enyason\/","https:\/\/x.com\/https:\/\/twitter.com\/enyason95"],"url":"https:\/\/blog.logrocket.com\/author\/emmanuelenya\/"}]}},"yoast_description":"AI speech-to-text services can make it easy to produce audio transcriptions quickly. Learn how to leverage them in this post.","_links":{"self":[{"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/posts\/181200","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/users\/156415942"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/comments?post=181200"}],"version-history":[{"count":6,"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/posts\/181200\/revisions"}],"predecessor-version":[{"id":181208,"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/posts\/181200\/revisions\/181208"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/media\/181206"}],"wp:attachment":[{"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/media?parent=181200"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.logrocket.com\/wp-jso
n\/wp\/v2\/categories?post=181200"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.logrocket.com\/wp-json\/wp\/v2\/tags?post=181200"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}