Machine learning and statistics with python

Posts

Showing posts with the label python nlp usage

Understanding Readability Score: Implement readability in Python

Introduction: In the vast landscape of written communication, readability score stands as a crucial metric, often overlooked but profoundly impactful. In essence, readability score measures the ease with which a reader can comprehend a piece of text. This score is determined by various linguistic factors such as sentence structure, word choice, and overall complexity. While seemingly technical, readability score plays a vital role in shaping effective communication across diverse contexts, from literature and journalism to academia and business. At its core, readability score serves as a bridge between the writer and the reader, facilitating a smoother flow of information and ideas. Imagine trying to traverse a rugged terrain versus a well-paved road; similarly, text with a high readability score offers a smoother journey for the reader's comprehension. By analyzing factors like sentence length, syllable count, and vocabulary complexity, readability formulas provide a quantita...
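As a rough illustration of how such a formula combines sentence length and syllable count, here is a minimal sketch of the well-known Flesch Reading Ease score. The excerpt does not say which formula the full post implements, so the choice of formula is an assumption. The helper names `count_syllables` and `flesch_reading_ease`, and the vowel-group syllable heuristic, are also assumptions made for this example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' usually adds no syllable ("score", "make").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher values mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple = "The cat sat on the mat. It was a sunny day."
dense = ("Comprehensive quantitative evaluation necessitates "
         "sophisticated methodological consideration.")
print(round(flesch_reading_ease(simple), 1))
print(round(flesch_reading_ease(dense), 1))
```

On this scale, short sentences made of short words score higher, so the first text should come out well above the second. A real implementation would likely use a pronunciation dictionary instead of the vowel-group heuristic.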

pytextrank: module for TextRank for phrase extraction and text summarization

Introduction: We have described spaCy in part 1, part 2, part 3, and part 4. In this post, we will describe the pytextrank project, which builds on spaCy's structure to solve phrase extraction and text summarization. pytextrank is written by Paco Nathan, an American computer scientist based in Texas. pytextrank interests me mainly for two reasons: (1) it implements the TextRank algorithm very nicely as a spaCy extension, and (2) the package is easy to use, hiding its complexity from the user so that it can be used with little to no understanding of the underlying algorithm. Now that I have hopefully given enough motivation to read about and use this package, we will explore the basic usage first and then dive into the inner workings, which will be the more advanced part of this post. How to use pytextrank: pytextrank is listed on PyPI, so it can be installed via pip3 install pytextrank. Now once you install it in that manner; t...
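To give a feel for the TextRank idea mentioned in the excerpt above, here is a minimal pure-Python sketch of keyword ranking: words that co-occur within a small window are linked in a graph, and a PageRank-style score is iterated over that graph. This is not pytextrank's implementation (which works on spaCy documents). The function name `textrank_keywords` and its parameters `window`, `damping`, and `iterations` are assumptions made for illustration.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50):
    """Rank words with a PageRank-style score over a co-occurrence graph."""
    # Link each word to the words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Every node in `neighbors` has at least one edge, so the division is safe.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)


tokens = "graph ranking uses graph edges and graph nodes".split()
print(textrank_keywords(tokens))
```

In this toy input, "graph" co-occurs with every other word, so it ends up with the highest score. A practical version would first filter stopwords and keep only selected parts of speech before building the graph.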

spacy exploration part 3: spacy data structures and pipelines

Introduction: We discussed dependency parsing in part 2 of the spaCy exploration series. Now, in part 3, we will explore some more NLP concepts through spaCy pipelines and utilities. Let's dive in. How does spaCy work internally? spaCy uses many optimizations to make processing as fast as possible. One of the main tricks is to store strings as hash codes and convert them back to strings as late as possible. This helps because integers take a fixed amount of space and can be processed faster than strings in most operations. For this reason, all strings are hashed, and the vocabulary object behaves like a two-way dictionary: given the hash you can find the string, and given the string you can find the hash. See the following examples to get the idea about hashing: Now, let's go over the data structures of the main objects in nlp. First we will see how to create a doc object manually to understand the ...
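The two-way lookup described in the excerpt above can be sketched with a toy class. This is only an illustration of the idea, not spaCy's actual code: the class name `ToyStringStore` is hypothetical, and the use of `hashlib.blake2b` to get a 64-bit integer is an assumption (spaCy uses its own hash function).

```python
import hashlib


class ToyStringStore:
    """Toy two-way mapping between strings and 64-bit integer hashes."""

    def __init__(self):
        self._hash_to_string = {}

    @staticmethod
    def _hash(text: str) -> int:
        # Stable 64-bit hash (assumption: any deterministic hash works here).
        digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, text: str) -> int:
        """Remember the string and return its hash."""
        key = self._hash(text)
        self._hash_to_string[key] = text
        return key

    def __getitem__(self, item):
        # String in, hash out; hash in, string out.
        if isinstance(item, str):
            return self._hash(item)
        return self._hash_to_string[item]


store = ToyStringStore()
coffee_hash = store.add("coffee")
print(coffee_hash)          # an integer
print(store[coffee_hash])   # back to the string
print(store["coffee"] == coffee_hash)
```

Going from string to hash needs no stored state, while going from hash back to string only works for strings that were added earlier. That asymmetry is why the store has to keep the hash-to-string table.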

Dependency parsing using spaCy: spaCy exploration part 2

Introduction: In our previous post, we discussed basic NLP work using spaCy. If you have not read that post, read it now for better understanding. Today we are going to discuss dependency parsing using spaCy. This is the second post of our spaCy exploration series. What is dependency parsing? Dependency parsing is the grammatical analysis of a sentence, to establish the ...

NLP using spacy: spacy exploration part 1

Introduction: spaCy is an open-source software library for advanced natural language processing, written in 2015 by Explosion AI founders Matthew Honnibal and Ines Montani. While NLTK is mainly used for teaching NLP concepts and for research, spaCy is one of the best-known packages used in production by companies worldwide. Before spaCy, the market lacked great production-level packages that people could integrate into their services to get the best NLP available. spaCy did exactly that. To quote Mr. Honnibal from 2015: "spaCy is a new library for text processing in Python and Cython. I wrote it because I think small companies are terrible at natural language processing (NLP). Or rather: small companies are using terrible NLP technology." spaCy is an industrial-strength library written in Python and Cython, and it provides support for TensorFlow, PyTorch, MXNet and other deep learning platforms. In this post, we will explore the dif...