Some basic python functions for working with the Thai language. For example:
import pythai
pythai.split(u"การที่ได้ต้องแสดงว่างานดี")
>>> u"การ ที่ ได้ ต้อง แสดง ว่า งาน ดี"
pythai.word_count(u"การที่ได้ต้องแสดงว่างานดี")
>>> 8
pythai.contains_thai(u"hello")
>>> False
pythai.contains_thai(u"helloการที่ไ")
>>> TrueIt's meant to be fast and efficient enough to handle large documents without breaking a sweat.
Currently the library supports these functions:
- Word segmentation (
split) - Word count (
word_count) (faster than counting the result ofsplit) - Whether a string contains Thai or not (
contains_thai)
PyThai requires libthai-dev to work. You can install it quite easily:
sudo apt-get install libthai-dev
And then you can simply install pythai through pip:
pip install pythai==0.1.3
Special thanks to Vee Satayamas for the original python bindings of libthai from C.
This library was written for use in Gengo. It's free and open-source under the GNU lesser public license. Any contributions are welcome!