Practical 7 - Notes

The document outlines various functions and techniques from the NLTK library and sklearn for natural language processing (NLP). Key functions include sentence and word tokenization, frequency distribution analysis, stopword removal, stemming, lemmatization, part-of-speech tagging, and TF-IDF feature extraction. These tools are essential for preprocessing and analyzing text data in NLP applications.


Practical 7

DSBDA

1. sent_tokenize function from the nltk.tokenize module

- This function splits raw text into sentences rather than words, which is a key
first step in many NLP pipelines.
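
A minimal sketch (the sample text is my own; 'punkt' is the tokenizer data this function needs, though newer NLTK releases may ask for 'punkt_tab' instead):

    import nltk
    nltk.download('punkt')  # one-time download of the sentence tokenizer models
    from nltk.tokenize import sent_tokenize

    text = "Hello world. NLP is fun! Let's tokenize this text."
    print(sent_tokenize(text))
    # ['Hello world.', 'NLP is fun!', "Let's tokenize this text."]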

2. word_tokenize function from the nltk.tokenize module

- This function splits text into individual word and punctuation tokens, which is
useful when later processing needs to handle words separately from punctuation.
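
A short illustrative example (the sample sentence is my own):

    from nltk.tokenize import word_tokenize

    tokens = word_tokenize("NLTK splits words, punctuation and all.")
    print(tokens)
    # ['NLTK', 'splits', 'words', ',', 'punctuation', 'and', 'all', '.']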

3. FreqDist function from the nltk.probability module

- It counts how often each token occurs; you can use it to identify the most
frequent words, which helps in tasks like text summarization, keyword extraction,
or language modeling.
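
A minimal sketch with made-up text:

    from nltk.probability import FreqDist
    from nltk.tokenize import word_tokenize

    tokens = word_tokenize("the cat sat on the mat near the door")
    fdist = FreqDist(tokens)        # counts occurrences of each token
    print(fdist['the'])             # 3
    print(fdist.most_common(2))     # [('the', 3), ('cat', 1)]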

4. fdist.plot()

- The `fdist.plot()` function is used to visually represent the frequency
distribution of words in a text, helping to identify the most common words and
their occurrence.
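
Continuing from the FreqDist sketch above (plotting requires matplotlib to be installed):

    fdist.plot(10, cumulative=False)  # line plot of the 10 most frequent tokens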

5. stopwords data from the NLTK library

- The `stopwords` corpus is used to retrieve a list of common words (like "the",
"is", "in") that are typically removed from text data during preprocessing to focus
on more meaningful words.
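
A quick look at the list (assumes a one-time download of the corpus):

    import nltk
    nltk.download('stopwords')  # one-time download of the stopword lists
    from nltk.corpus import stopwords

    stop_words = set(stopwords.words('english'))
    print('the' in stop_words)  # True
    print('cat' in stop_words)  # False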

6. word_tokenize with stopword removal

- This code tokenizes a sentence into words and removes common stopwords to produce
a filtered list of meaningful words for further text analysis.
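
A minimal sketch of that filtering step (the sample sentence is my own):

    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = set(stopwords.words('english'))
    tokens = word_tokenize("This is a simple example showing the removal of stopwords")
    filtered = [w for w in tokens if w.lower() not in stop_words]
    print(filtered)
    # ['simple', 'example', 'showing', 'removal', 'stopwords']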

7. PorterStemmer and word_tokenize from NLTK

- This code imports the PorterStemmer class and the word_tokenize function from
NLTK to tokenize text into words and apply stemming, which reduces words to their
root forms (e.g., "running" becomes "run"), helping to standardize different word
variations for text analysis.
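
A minimal stemming sketch (the sample words are my own):

    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    stemmer = PorterStemmer()
    tokens = word_tokenize("running runners ran easily")
    print([stemmer.stem(t) for t in tokens])
    # ['run', 'runner', 'ran', 'easili']  -- note stems need not be real words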

8. WordNet and OMW-1.4 corpora from NLTK, and then imports the WordNetLemmatizer
and the PorterStemmer

- This code downloads the WordNet and OMW-1.4 corpora from NLTK, then imports the
WordNetLemmatizer for lemmatization (which reduces words to their dictionary base
form) and the PorterStemmer for stemming (which reduces words to their root form
by stripping suffixes). Together they enable both word-normalization techniques in
text processing.
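
A small sketch contrasting the two (the sample words are my own):

    import nltk
    nltk.download('wordnet')
    nltk.download('omw-1.4')
    from nltk.stem import WordNetLemmatizer, PorterStemmer

    lemmatizer = WordNetLemmatizer()
    stemmer = PorterStemmer()
    print(stemmer.stem("studies"))          # 'studi' -- crude suffix stripping
    print(lemmatizer.lemmatize("studies"))  # 'study' -- dictionary-based (noun by default)
    print(lemmatizer.lemmatize("better", pos="a"))  # 'good' -- adjective lemma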

9. averaged_perceptron_tagger model from NLTK and then uses nltk.pos_tag()

- This code downloads the averaged_perceptron_tagger model from NLTK and then uses
nltk.pos_tag() to tag each token in the `tokens` list with its corresponding part
of speech (POS), such as noun, verb, or adjective, helping to understand the
grammatical structure of the text.
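
A minimal tagging sketch (the sample sentence is my own; newer NLTK releases may ask for 'averaged_perceptron_tagger_eng' instead):

    import nltk
    nltk.download('averaged_perceptron_tagger')  # one-time download of the tagger model
    from nltk.tokenize import word_tokenize

    tokens = word_tokenize("The quick brown fox jumps")
    print(nltk.pos_tag(tokens))
    # e.g. [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ')]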

10. TfidfVectorizer from sklearn.feature_extraction.text

- This code imports the TfidfVectorizer from sklearn.feature_extraction.text, which
is used to convert a collection of text documents into a matrix of TF-IDF (Term
Frequency-Inverse Document Frequency) features, helping to quantify the importance
of words in a document relative to a corpus for tasks like text classification or
clustering.
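
A minimal sketch with a two-document toy corpus:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the cat sat on the mat",
            "the dog sat on the log"]
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(docs)  # sparse matrix, one row per document
    print(tfidf_matrix.shape)  # (2, 7) -- 2 documents x 7 unique terms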

11. get_feature_names_out() method of the TfidfVectorizer

- This code uses the get_feature_names_out() method of the TfidfVectorizer to
retrieve a list of all the unique words (features) extracted from the input text
corpus, which are then used to represent the documents in the feature matrix. This
helps in understanding which words the model considers when analyzing the text.
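
Continuing from the TfidfVectorizer sketch above:

    features = vectorizer.get_feature_names_out()
    print(features)
    # ['cat' 'dog' 'log' 'mat' 'on' 'sat' 'the']  (a NumPy array, alphabetical order)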
