Natural Language Processing (NLP) Tutorial - GeeksforGeeks

Uploaded by

Kelum Buddhika

Natural Language Processing (NLP) Tutorial

Last Updated : 17 Dec, 2024

Natural Language Processing (NLP) is the branch of Artificial Intelligence
(AI) that gives machines the ability to understand and process human
languages. Human language can take the form of text or audio.

Applications of NLP
The applications of Natural Language Processing are as follows:

Voice Assistants like Alexa, Siri, and Google Assistant use NLP for voice
recognition and interaction.
Tools like Grammarly, Microsoft Word, and Google Docs apply NLP for
grammar checking and text analysis.
Search engines such as Google and DuckDuckGo use NLP for information
extraction.
Website bots and customer support chatbots leverage NLP for
automated conversations and query handling.
Google Translate and similar services use NLP for real-time translation
between languages.
Text summarization

This NLP tutorial is designed for both beginners and professionals. Whether
you are a beginner or a data scientist, this guide will provide you with the
knowledge and skills you need to take your understanding of NLP to the
next level.

Phases of Natural Language Processing

There are two components of Natural Language Processing:

Natural Language Understanding

Natural Language Generation

Libraries for Natural Language Processing

Some natural language processing libraries include:

NLTK (Natural Language Toolkit)

spaCy
Transformers (by Hugging Face)
Gensim

To explore in detail, you can refer to this article: NLP Libraries in Python

Normalizing Textual Data in NLP

Text Normalization transforms text into a consistent format, which improves
its quality and makes it easier to process in NLP tasks.

Key steps in text normalization include:

1. Regular Expressions (RE) are sequences of characters that define search
patterns.

How to write Regular Expressions?

Properties of Regular Expressions
RegEx in Python
Email Extraction using RE
2. Tokenization is a process of splitting text into smaller units called tokens.

How Tokenizing Text, Sentences, and Words Works

Word Tokenization
Rule-based Tokenization
Subword Tokenization
Dictionary-Based Tokenization
Whitespace Tokenization
WordPiece Tokenization

3. Lemmatization reduces words to their base (dictionary) form, called the
lemma.

4. Stemming reduces words to their root by removing suffixes. Types of
stemmers include:

Porter Stemmer
Lancaster Stemmer
Snowball Stemmer
Lovins Stemmer
Rule-based Stemming

5. Stopword removal is a process to remove common words from the document.

6. Parts of Speech (POS) Tagging assigns a part of speech to each word in a
sentence based on its definition and context.
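
The regular expression, tokenization, stopword removal and stemming steps above can be sketched with the Python standard library alone. This is a minimal illustration, not how NLTK or spaCy implement these steps: the stopword set, the simplified email pattern and the `crude_stem` helper are assumptions made for the example, and `crude_stem` is a toy stand-in rather than the Porter algorithm.

```python
import re

# Illustrative stopword set (assumption); libraries ship much longer lists.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "at", "for", "or"}

# Simplified email pattern (assumption); it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every substring that matches the simplified email pattern."""
    return EMAIL_RE.findall(text)


def tokenize(text):
    """Lowercase the text and split it into word tokens with a regex."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]


print(extract_emails("Contact help@example.com or sales@example.org today."))
print(normalize("The cats are chasing the mice."))  # ['cat', 'chas', 'mice']
```

The output `chas` shows why stemming and lemmatization are listed separately: a stemmer only trims suffixes and may produce non-words, while a lemmatizer would map `chasing` to the dictionary form `chase`.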

Text Representation or Text Embedding Techniques in NLP

Text representation converts textual data into numerical vectors using
methods such as the following:

One-Hot Encoding
Bag of Words (BOW)
N-Grams
Term Frequency-Inverse Document Frequency (TF-IDF)
N-Gram Language Modeling with NLTK
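
As a rough sketch of how the count-based methods in this list turn tokens into numbers, the code below builds n-grams, a Bag of Words matrix and TF-IDF weights from scratch. The three-document corpus is invented for illustration, and the TF-IDF formula used (term frequency times `log(N / df)`) is only one common variant; libraries such as scikit-learn apply smoothing and normalization by default, so their values will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (assumption), already tokenized.
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "dog", "barked"]]

vocab, vectors = bag_of_words(docs)
print(vocab)                 # ['barked', 'cat', 'dog', 'sat', 'the']
print(vectors[0])            # [0, 1, 0, 1, 1]
print(ngrams(docs[0], 2))    # [('the', 'cat'), ('cat', 'sat')]
print(tf_idf(docs)[0])
```

In this corpus "the" occurs in every document, so its TF-IDF weight is zero, while "cat" occurs in only one document and receives the highest weight in the first document.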

Text Embedding Techniques refer to the methods and models used to
create these vector representations, including traditional methods (like
TF-IDF and BOW) and more advanced approaches:
1. Word Embedding

Word2Vec (SkipGram, Continuous Bag of Words – CBOW)

GloVe (Global Vectors for Word Representation)
fastText

2. Pre-Trained Embedding

ELMo (Embeddings from Language Models)

BERT (Bidirectional Encoder Representations from Transformers)

3. Document Embedding – Doc2Vec
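
The two Word2Vec variants named above differ in how they frame the prediction task: Skip-Gram predicts surrounding words from a center word, while CBOW predicts the center word from its surrounding words. The sketch below only generates the training examples for each framing; it does not train a network. The function names and the `window` parameter are hypothetical choices for this illustration, not part of any library's API.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, target) examples, one per token position."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


sentence = ["the", "quick", "brown", "fox"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

With `window=1`, the word "quick" yields the Skip-Gram pairs `("quick", "the")` and `("quick", "brown")`, and the single CBOW example `(["the", "brown"], "quick")`.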

Deep Learning Techniques for NLP

Deep learning has revolutionized Natural Language Processing (NLP) by
enabling models to automatically learn complex patterns and
representations from raw text. Below are some of the key deep learning
techniques used in NLP:

Artificial Neural Networks (ANNs)

Recurrent Neural Networks (RNNs)
Long Short-Term Memory (LSTM)
Gated Recurrent Unit (GRU)
Seq2Seq Models
Transformer Models

Pre-Trained Language Models

Pre-trained models capture language patterns, context and semantics.
These models are trained on massive corpora and can be fine-tuned
for specific tasks.

GPT (Generative Pre-trained Transformer)

Transformer-XL
T5 (Text-to-Text Transfer Transformer)
RoBERTa

To learn how to fine-tune a model, refer to this article: Transfer
Learning with Fine-tuning
Natural Language Processing Tasks
1. Text Classification

Dataset for Text Classification

Text Classification using Naive Bayes
Text Classification using Logistic Regression
Text Classification using RNNs
Text Classification using CNNs

2. Information Extraction

Information Extraction
Named Entity Recognition (NER) using SpaCy
Named Entity Recognition (NER) using NLTK
Relationship Extraction
3. Sentiment Analysis

What is Sentiment Analysis?

Sentiment Analysis using VADER
Sentiment Analysis using Recurrent Neural Networks (RNN)

4. Machine Translation

Statistical Machine Translation of Language

Machine Translation with Transformer

5. Text Summarization

What is Text Summarization?

Text Summarization using Hugging Face Model
Text Summarization using Sumy

6. Text Generation

Text Generation using FNet

Text Generation using Recurrent Long Short Term Memory Network
Text2Text Generation using HuggingFace Model

History of NLP
The roots of Natural Language Processing (NLP) are often traced to 1950,
when Alan Turing published his groundbreaking paper titled "Computing
Machinery and Intelligence". Turing’s work laid the foundation for NLP, which is a subset of
Artificial Intelligence (AI) focused on enabling machines to automatically
interpret and generate human language. Over time, NLP technology has
evolved, giving rise to different approaches for solving complex language-
related tasks.

1. Heuristic-Based NLP

The Heuristic-based approach to NLP was one of the earliest methods
used in natural language processing. It relies on predefined rules and
domain-specific knowledge. These rules are typically derived from expert
insights. A classic example of this approach is Regular Expressions
(Regex), which are used for pattern matching and text manipulation tasks.

2. Statistical and Machine Learning-Based NLP

As NLP advanced, Statistical NLP emerged, incorporating machine learning
algorithms to model language patterns. This approach applies statistical
methods and learns from data to tackle various language processing tasks.
Popular machine learning algorithms in this category include:

Naive Bayes
Support Vector Machines (SVM)
Hidden Markov Models (HMM)
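
To make "learns from data" concrete, here is a small multinomial Naive Bayes text classifier written from scratch. It is a sketch under stated assumptions: the class name, the four training documents and their labels are invented for illustration, add-one (Laplace) smoothing is used, and words never seen in training are simply skipped at prediction time.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {tok for counts in self.word_counts.values() for tok in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words unseen during training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumption), already tokenized.
train_docs = [
    ["great", "movie", "loved", "it"],
    ["wonderful", "acting", "great", "plot"],
    ["terrible", "movie", "hated", "it"],
    ["boring", "plot", "terrible", "acting"],
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(model.predict(["great", "acting"]))     # pos
print(model.predict(["terrible", "boring"]))  # neg
```

Working in log space avoids multiplying many small probabilities together, and the smoothing term keeps a word that never appeared with a class from driving that class's probability to zero.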

3. Neural Network-Based NLP (Deep Learning)

The most recent advancement in NLP is the adoption of Deep Learning
techniques. Neural networks, particularly Recurrent Neural Networks
(RNNs), Long Short-Term Memory Networks (LSTMs), and Transformers,
have revolutionized NLP tasks by providing superior accuracy. These models
require large amounts of data and considerable computational power for
training.

FAQs on Natural Language Processing

What is the most difficult part of natural language processing?

Ambiguity is the main challenge of natural language processing
because the same word or sentence can have different meanings
depending on the context, which causes ambiguity at the lexical,
syntactic, and semantic levels.

What are the 4 pillars of NLP?

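The "four pillars" (1. outcomes, 2. sensory acuity, 3. behavioural
flexibility, and 4. rapport) belong to Neuro-Linguistic Programming, a
different field that shares the NLP acronym. They are not part of
natural language processing.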

What language is best for natural language processing?

Python is widely considered the best programming language for NLP because
of its numerous libraries, simple syntax, and ability to easily
integrate with other programming languages.

What is the life cycle of NLP?

There are four stages included in the life cycle of NLP – development,
validation, deployment, and monitoring of the models.
