Natural Language Processing with Python & nltk Cheat Sheet
by RJ Murray (murenei) via cheatography.com/58736/cs/15485/
Handling Text

text='Some words'              Assign string
list(text)                     Split text into character tokens
set(text)                      Unique tokens
len(text)                      Number of characters

Part of Speech (POS) Tagging

nltk.help.upenn_tagset('MD')   Lookup definition for a POS tag
nltk.pos_tag(words)            nltk in-built POS tagger
                               <use an alternative tagger to illustrate ambiguity>
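A quick sketch of the Handling Text commands above, using the sheet's own example string (pure standard-library Python, no nltk data needed):

```python
# Character-level views of a string, as in the Handling Text commands.
text = 'Some words'

chars = list(text)    # character tokens
unique = set(text)    # unique characters
n = len(text)         # number of characters

print(chars[:4])      # ['S', 'o', 'm', 'e']
print(n)              # 10
```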
Accessing corpora and lexical resources

from nltk.corpus import brown      Import CorpusReader object
brown.words(text_id)               Returns pretokenised document as list of words
brown.fileids()                    Lists docs in Brown corpus
brown.categories()                 Lists categories in Brown corpus

Sentence Parsing

g=nltk.data.load('grammar.cfg')    Load a grammar from a file
g=nltk.CFG.fromstring("""...""")   Manually define grammar
parser=nltk.ChartParser(g)         Create a parser out of the grammar
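A minimal end-to-end sketch of the Sentence Parsing commands: define a grammar manually, build a chart parser, and parse a sentence. The toy grammar and sentence are illustrative assumptions, not from the sheet; no nltk data download is required.

```python
import nltk

# Manually define a tiny context-free grammar (nltk.CFG.fromstring).
g = nltk.CFG.fromstring("""
S -> NP VP
NP -> DT NN
VP -> VB NP
DT -> 'the'
NN -> 'dog' | 'ball'
VB -> 'chased'
""")

# Create a parser out of the grammar and parse a tokenised sentence.
parser = nltk.ChartParser(g)
text = ['the', 'dog', 'chased', 'the', 'ball']
trees = parser.parse_all(text)
for tree in trees:
    print(tree)
```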
trees=parser.parse_all(text)            Parse all trees for the sentence
for tree in trees: ... print(tree)      Print each parse tree
from nltk.corpus import treebank
treebank.parsed_sents('wsj_0001.mrg')   Treebank parsed sentences

Tokenization

text.split(" ")             Split by space
nltk.word_tokenize(text)    nltk in-built word tokenizer
nltk.sent_tokenize(doc)     nltk in-built sentence tokenizer
Lemmatization & Stemming

input="List listed lists listing listings"   Different suffixes
words=input.lower().split(' ')               Normalize (lowercase) words
porter=nltk.PorterStemmer()                  Initialise Stemmer
[porter.stem(t) for t in words]              Create list of stems
WNL=nltk.WordNetLemmatizer()                 Initialise WordNet lemmatizer
[WNL.lemmatize(t) for t in words]            Use the lemmatizer

Text Classification

from sklearn.feature_extraction.text import CountVectorizer
vect=CountVectorizer().fit(X_train)          Fit bag of words model
vect.get_feature_names()                     Get features
vect.transform(X_train)                      Convert to document-term matrix
By RJ Murray (murenei) Published 28th May, 2018. Sponsored by Readable.com
cheatography.com/murenei/ Last updated 29th May, 2018. Measure your website readability!
tutify.com.au Page 1 of 2. https://readable.com
Entity Recognition (Chunking/Chinking)

g="NP: {<DT>?<JJ>*<NN>}"      Regex chunk grammar
cp=nltk.RegexpParser(g)       Create chunk parser from grammar
ch=cp.parse(pos_sent)         Parse tagged sent. using grammar
print(ch)                     Show chunks
ch.draw()                     Show chunks in IOB tree
cp.evaluate(test_sents)       Evaluate against test doc
sents=nltk.corpus.treebank.tagged_sents()
print(nltk.ne_chunk(sent))    Print chunk tree
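A sketch of NP chunking with the regex grammar above. The POS-tagged sentence is supplied by hand (an illustrative assumption) so that no tagger model or corpus download is needed:

```python
import nltk

# Chunk grammar: an NP is an optional determiner, any number of
# adjectives, then a noun.
g = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(g)

# Hand-tagged sentence standing in for pos_sent.
pos_sent = [('the', 'DT'), ('little', 'JJ'), ('dog', 'NN'),
            ('barked', 'VBD')]
ch = cp.parse(pos_sent)
print(ch)   # (S (NP the/DT little/JJ dog/NN) barked/VBD)
```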
RegEx with Pandas & Named Groups

df=pd.DataFrame(time_sents, columns=['text'])
df['text'].str.split().str.len()
df['text'].str.contains('word')
df['text'].str.count(r'\d')
df['text'].str.findall(r'\d')
df['text'].str.replace(r'\w+day\b', '???')
df['text'].str.replace(r'(\w+day\b)', lambda x: x.groups()[0][:3])
df['text'].str.extract(r'(\d?\d):(\d\d)')
df['text'].str.extractall(r'((\d?\d):(\d\d) ?([ap]m))')
df['text'].str.extractall(r'(?P<digits>\d)')
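A sketch of str.extract with named groups, as in the commands above. The time_sents data is a made-up stand-in; note also that recent pandas versions require regex=True for pattern-based str.replace calls like those listed.

```python
import pandas as pd

# Two sample sentences containing times, standing in for time_sents.
time_sents = ["Monday: meeting at 9:30 am",
              "Tuesday: lunch at 12:15 pm"]
df = pd.DataFrame(time_sents, columns=['text'])

# Named groups become column names in the extracted DataFrame.
times = df['text'].str.extract(r'(?P<hour>\d?\d):(?P<minute>\d\d)')
print(times)   # columns 'hour' and 'minute', one row per sentence
```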