By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs (International French, Mandarin Chinese, and Italian) are also available, but they are not included because the files are too large.
To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))").
Here is a simple Bash script to install all of them, assuming you've downloaded all three ZIP files into your current directory:
#!/usr/bin/env bash
# Locate the directory where the speech_recognition module is installed.
SR_LIB=$(python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))")
# Make sure unzip is available.
sudo apt-get install --yes unzip
# Extract each language pack into the module directory, then make it readable by all users.
sudo unzip -o fr-FR.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/fr-FR/"
sudo unzip -o zh-CN.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/zh-CN/"
sudo unzip -o it-IT.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/pocketsphinx-data/it-IT/"

Once installed, you can simply specify the language using the language parameter of recognizer_instance.recognize_sphinx. For example, French would be specified with "fr-FR" and Mandarin with "zh-CN".
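As a rough illustration of the language parameter, the sketch below transcribes an audio file with the French language pack. The helper name transcribe_file is hypothetical and not part of the library, and the sketch assumes that PocketSphinx and the fr-FR pack are already installed.

```python
def transcribe_file(wav_path, language="fr-FR"):
    """Transcribe an audio file with Sphinx using the given language pack.

    `transcribe_file` is a hypothetical helper written for this example.
    """
    # Imported lazily so this sketch can be loaded even on a machine
    # where the SpeechRecognition package is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the entire file

    try:
        # The language tag selects the folder under pocketsphinx-data.
        return recognizer.recognize_sphinx(audio, language=language)
    except sr.UnknownValueError:
        # Sphinx could not make sense of the audio.
        return None
```

Calling transcribe_file with the path of a French recording and language="fr-FR" should return the recognized text, or None if nothing could be recognized. If the language data is missing, recognize_sphinx raises sr.RequestError, which this sketch deliberately lets propagate.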
For Linux and other POSIX systems (like OS X), you'll want to build from source. It should take less than two minutes on a fast machine.
- On any Debian-derived Linux distribution (like Ubuntu and Mint):
  1. For Python 3, run:
     sudo apt-get install python3 python3-all-dev python3-pip build-essential swig git libpulse-dev libasound2-dev
  2. For Python 3, run:
     pip3 install pocketsphinx
- On OS X:
  1. For Python 3, run:
     brew install swig git python3
  2. Install PocketSphinx-Python using Pip:
     pip install pocketsphinx
     - If this gives errors when importing the library in your program, try running:
       brew link --overwrite python
- On Windows:
  1. Install Python, Pip, SWIG, and Git, preferably using a package manager.
  2. Add the folders containing the Python, SWIG, and Git binaries to your PATH environment variable.
     - My PATH environment variable looks something like: C:\Users\Anthony\Desktop\swigwin-3.0.8;C:\Program Files\Git\cmd;(A BUNCH OF OTHER PATHS).
  3. Reboot to apply changes.
  4. Download the full PocketSphinx-Python source code (downloading the ZIP archive from GitHub will not work) by running:
     git clone --recursive --depth 1 https://github.com/cmusphinx/pocketsphinx-python
  5. To compile and install PocketSphinx, run the following in the PocketSphinx-Python source code folder:
     python setup.py install
  6. Side note: when I build the precompiled Wheel packages, I skip step 5 and do the following instead:
     - For Python 3.4:
       C:\Python34\python.exe setup.py bdist_wheel
     - For Python 3.5:
       C:\Users\Anthony\AppData\Local\Programs\Python\Python35\python.exe setup.py bdist_wheel
     - The resulting packages are located in the dist folder of the PocketSphinx-Python project directory.
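A quick way to confirm that the build worked is to try importing the module. The sketch below assumes the installed module is named pocketsphinx (the name used with pip above). The helper try_import is hypothetical and only illustrates the check.

```python
import importlib


def try_import(name):
    """Try to import `name`; return (succeeded, error_message).

    `try_import` is a hypothetical helper written for this example.
    """
    try:
        importlib.import_module(name)
    except ImportError as error:
        # Covers both a missing module and a compiled extension
        # that fails to load.
        return False, str(error)
    return True, ""


ok, message = try_import("pocketsphinx")
if ok:
    print("pocketsphinx imported successfully")
else:
    print("pocketsphinx could not be imported:", message)
```

If the import fails on OS X, the error message is the place to start before trying the brew link suggestion above.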
- Every language has its own folder under /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/, where LANGUAGE_NAME is the IETF language tag, like "en-US" (US English) or "en-GB" (UK English).
  - For example, the US English data is stored in /speech_recognition/pocketsphinx-data/en-US/.
  - The language parameter of recognizer_instance.recognize_sphinx simply chooses the folder with the given name.
- Languages are composed of 3 parts:
  - An acoustic model /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/acoustic-model/, which describes how to interpret audio data.
    - Acoustic models can be downloaded from the CMU Sphinx files. These are pretty disorganized, but instructions for cleaning up specific versions are listed below.
    - All of these should be 16 kHz (broadband) models, since that's what the library will assume is being used.
  - A language model /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/language-model.lm.bin (in CMU binary format).
  - A pronunciation dictionary /speech_recognition/pocketsphinx-data/LANGUAGE_NAME/pronounciation-dictionary.dict, which describes how words in the language are pronounced.
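The folder layout described above can be checked with a few lines of Python. This is only a sketch: the helper names missing_parts and installed_languages are hypothetical, and data_dir is assumed to be the pocketsphinx-data folder inside the module install directory.

```python
import os

# Names of the three parts described above. The dictionary file name is
# spelled this way on disk.
ACOUSTIC_MODEL_DIR = "acoustic-model"
LANGUAGE_MODEL_FILE = "language-model.lm.bin"
DICTIONARY_FILE = "pronounciation-dictionary.dict"


def missing_parts(data_dir, language):
    """Return the names of the required parts missing for `language`."""
    language_dir = os.path.join(data_dir, language)
    missing = []
    if not os.path.isdir(os.path.join(language_dir, ACOUSTIC_MODEL_DIR)):
        missing.append(ACOUSTIC_MODEL_DIR)
    for file_name in (LANGUAGE_MODEL_FILE, DICTIONARY_FILE):
        if not os.path.isfile(os.path.join(language_dir, file_name)):
            missing.append(file_name)
    return missing


def installed_languages(data_dir):
    """List the language tags whose folders contain all three parts."""
    if not os.path.isdir(data_dir):
        return []
    return sorted(
        name
        for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
        and not missing_parts(data_dir, name)
    )
```

For a real installation, data_dir would be the module install directory joined with "pocketsphinx-data". An empty list from missing_parts means the language folder has everything recognize_sphinx looks for.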
- All of the following points assume a Debian-derived Linux distribution (like Ubuntu or Mint).
- To work with any complete, real-world languages, you will need quite a bit of RAM (16 GB recommended) and a fair bit of disk space (20 GB recommended).
- SphinxBase is needed for all language model file format conversions. We use it to convert between *.dmp DMP files (an obsolete Sphinx binary format), *.lm ARPA files, and Sphinx binary *.lm.bin files:
  1. Install all the SphinxBase build dependencies with:
     sudo apt-get install build-essential automake autotools-dev autoconf libtool
  2. Download and extract the SphinxBase source code.
  3. Follow the instructions in the README to install SphinxBase. Basically, run the following in the SphinxBase folder:
     sh autogen.sh --force && ./configure && make && sudo make install
- Pruning (getting rid of less important information) is useful if language model files are too large. We can do this using IRSTLM:
  1. Install all the IRSTLM build dependencies with:
     sudo apt-get install build-essential automake autotools-dev autoconf libtool
  2. Download and extract the IRSTLM source code.
  3. Follow the instructions in the README to install IRSTLM. Basically, run the following in the IRSTLM folder:
     sh regenerate-makefiles.sh --force && ./configure && make && sudo make install
  4. If the language model is not in ARPA format, convert it to the ARPA format. To do this, ensure that SphinxBase is installed and run:
     sphinx_lm_convert -i LANGUAGE_MODEL_FILE_GOES_HERE -o language-model.lm -ofmt arpa
  5. Prune the model using IRSTLM. To prune with a threshold of 0.00000001, run:
     prune-lm --threshold=1e-8 language-model.lm pruned.lm
     The higher the threshold, the smaller the resulting file.
  6. Convert the pruned model back into binary format if it was originally not in ARPA format. To do this, ensure that SphinxBase is installed and run:
     sphinx_lm_convert -i pruned.lm -o LANGUAGE_MODEL_FILE_GOES_HERE
- US English: /speech_recognition/pocketsphinx-data/en-US/ is taken directly from the contents of PocketSphinx's US English model.
- International French: /speech_recognition/pocketsphinx-data/fr-FR/:
  - /speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin is fr-small.lm.bin from the Sphinx French language model.
  - /speech_recognition/pocketsphinx-data/fr-FR/pronounciation-dictionary.dict is fr.dict from the Sphinx French language model.
  - /speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/ contains all of the files extracted from cmusphinx-fr-5.2.tar.gz in the Sphinx French acoustic model.
  - To get better French recognition accuracy at the expense of higher disk space and RAM usage:
    1. Download fr.lm.dmp from the Sphinx French language model.
    2. Convert from DMP (an obsolete Sphinx binary format) to Sphinx binary format:
       sphinx_lm_convert -i fr.lm.dmp -o french.lm.bin
    3. Replace /speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin with french.lm.bin created in the previous step.
- Mandarin Chinese: /speech_recognition/pocketsphinx-data/zh-CN/:
  - /speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin is generated as follows:
    1. Download zh_broadcastnews_64000_utf8.DMP from the Sphinx Mandarin language model.
    2. Convert from DMP (an obsolete Sphinx binary format) to ARPA format:
       sphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o chinese.lm -ofmt arpa
    3. Prune with a threshold of 0.00000004 using:
       prune-lm --threshold=4e-8 chinese.lm chinese.lm
    4. Convert from ARPA format to Sphinx binary format:
       sphinx_lm_convert -i chinese.lm -o chinese.lm.bin
    5. Replace /speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin with chinese.lm.bin created in the previous step.
  - /speech_recognition/pocketsphinx-data/zh-CN/pronounciation-dictionary.dict is zh_broadcastnews_utf8.dic from the Sphinx Mandarin language model.
  - /speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/ contains all of the files extracted from zh_broadcastnews_16k_ptm256_8000.tar.bz2 in the Sphinx Mandarin acoustic model.
  - To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 (pruning) when preparing zh_broadcastnews_64000_utf8.DMP.
- Italian: /speech_recognition/pocketsphinx-data/it-IT/:
  - /speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin is generated as follows:
    1. Download cmusphinx-it-5.2.tar.gz from the Sphinx Italian language model.
    2. Extract /etc/voxforge_it_sphinx.lm from cmusphinx-it-5.2.tar.gz as italian.lm.
    3. Convert from ARPA format to Sphinx binary format:
       sphinx_lm_convert -i italian.lm -o italian.lm.bin
    4. Replace /speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin with italian.lm.bin created in the previous step.
  - /speech_recognition/pocketsphinx-data/it-IT/pronounciation-dictionary.dict is /etc/voxforge_it_sphinx.dic from cmusphinx-it-5.2.tar.gz (from the Sphinx Italian language model).
  - /speech_recognition/pocketsphinx-data/it-IT/acoustic-model/ contains all of the files in /model_parameters extracted from cmusphinx-it-5.2.tar.gz (from the Sphinx Italian language model).
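The convert, prune, and convert-back sequence used for the Mandarin model above can be scripted. The following is a minimal sketch, not the procedure actually used to build the shipped data: the helper names build_prune_commands and run_commands and the intermediate file names are made up for illustration, while the command-line flags are the ones shown in the steps above. It assumes sphinx_lm_convert (SphinxBase) and prune-lm (IRSTLM) are on the PATH.

```python
import subprocess


def build_prune_commands(source_model, work_prefix, output_model, threshold):
    """Return the three commands: to ARPA, prune, back to Sphinx binary.

    `threshold` is passed as a string (for example "4e-8") so that it
    appears on the command line exactly as written.
    """
    # Intermediate file names are hypothetical choices for this sketch.
    arpa_model = work_prefix + ".lm"
    pruned_model = work_prefix + ".pruned.lm"
    return [
        ["sphinx_lm_convert", "-i", source_model, "-o", arpa_model, "-ofmt", "arpa"],
        ["prune-lm", "--threshold=" + threshold, arpa_model, pruned_model],
        ["sphinx_lm_convert", "-i", pruned_model, "-o", output_model],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, run_commands(build_prune_commands("zh_broadcastnews_64000_utf8.DMP", "chinese", "chinese.lm.bin", "4e-8")) would mirror the Mandarin steps, except that the pruned model is written to a separate file instead of overwriting its input. Building the command lists separately from running them makes it easy to print and review the commands first.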