Basic Text Processing
• Regular Expressions
• Text Normalization
Basic Text Processing: Regular Expressions
Regular Expressions
• Regular expressions are the single most important tool for describing text patterns; we can
use them to specify which strings should be extracted from a text.
– Regular expressions are widely used in many text preprocessing tasks.
– A standard set of such text preprocessing tasks is called text normalization.
• Normalizing text means converting it to a more convenient, standard form.
Tokenization: Separating out or tokenizing words from text.
– English words are often separated from each other by whitespace, but whitespace alone is not enough.
• For processing tweets we’ll need to tokenize emoticons like :) or hashtags like #nlproc.
Lemmatization: the task of determining that words share the same root (lemma).
– The words sings, singing, sang, and sung have the same lemma sing.
– The Turkish words kitabım, kitaplar, … have the same lemma kitap ('book').
– Stemming: a simpler version of lemmatization in which we mainly just strip suffixes from
the end of the word.
Sentence Segmentation: breaking up a text into individual sentences.
Regular Expressions
• Each Regular Expression (RE) represents a set of strings that share a certain pattern.
– In NLP, we can use REs to find strings having certain patterns in a given text.
• Regular Expressions are an algebraic way to describe formal languages.
– Regular Expressions describe exactly the regular languages.
• A regular expression is built up of simpler regular expressions (using defining rules).
• Simple definition of regular expressions over an alphabet Σ
– ε (the empty string) is a regular expression
– If a ∈ Σ, then a is a regular expression
– or (union): If E1 and E2 are regular expressions, then E1 | E2 is a regular expression
– concatenation: If E1 and E2 are regular expressions, then E1E2 is a regular
expression
– Kleene Closure: If E is a regular expression, then E* is a regular expression
– Positive Closure: If E is a regular expression, then E+ is a regular expression
Searching Strings with Regular Expressions
(using Python style REs)
• How can we search for any of the following strings?
– woodchuck
– woodchucks
– Woodchuck
– Woodchucks
• The simplest kind of regular expression is a sequence of simple characters.
– The regular expression b will match with the string “b”.
– The regular expression bc will match with the string “bc”.
– The regular expression woodchuck will match with the string “woodchuck”.
– The regular expression woodchucks will match with the string “woodchucks”.
– The regular expression woodchuck will NOT match with the string “Woodchuck”.
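• A quick check of these literal patterns with Python's re module (a minimal sketch, not part of the original slide); re.search returns a match object, or None if there is no match.

import re
print(re.search(r"woodchuck", "woodchuck"))    # <re.Match object; span=(0, 9), match='woodchuck'>
print(re.search(r"woodchucks", "woodchucks"))  # match found
print(re.search(r"woodchuck", "Woodchuck"))    # None: lowercase 'w' does not match 'W'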
Regular Expressions: Disjunctions
disjunction of characters []
• Disjunction of Characters: The string of characters inside square brackets [ ] specifies a
disjunction of characters to match.
• The regular expression [wW] matches either w or W.
Regular Expression Matches
[wW]oodchuck Woodchuck, woodchuck
[1234567890] Any digit
• Ranges in []: If there is a well-defined sequence associated with a set of characters,
dash (-) in brackets can specify any one character in a range.
Regular Expression Matches
[A-Z] An upper case letter
[a-z] A lower case letter
[0-9] A single digit
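• A small illustration of character classes and ranges with re.findall (the example sentence is my own):

import re
text = "Woodchuck and woodchuck saw 3 woodchucks in 2019."
print(re.findall(r"[wW]oodchuck", text))   # ['Woodchuck', 'woodchuck', 'woodchuck']
print(re.findall(r"[0-9]", text))          # ['3', '2', '0', '1', '9']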
Regular Expressions: Disjunctions
Negations in []
• Negations in []:
– Square brackets can also be used to specify what a single character cannot be, by use of
the caret ^.
– If the caret ^ is the first symbol after the open square brace [, the resulting pattern is
negated.
Regular Expression Matches
[^A-Z] Not an upper case letter
[^a-z] Not a lower case letter
[^Ss] Neither ‘S’ nor ‘s’
[^e^] Neither e nor ^
a^b The pattern a^b
Regular Expressions: Disjunctions
or (disjunction) operator | (pipe symbol)
• If E1 and E2 are regular expressions, then E1 | E2 is a regular expression
Regular Expression Matches
woodchuck|groundhog woodchuck or groundhog
a|b|c a , b or c
[gG]roundhog|[Ww]oodchuck woodchuck , Woodchuck ,
groundhog or Groundhog
fl(y|ies) fly or flies
Regular Expressions: Closure Operators
Kleene * and Kleene +
• Kleene * (closure) operator: The Kleene star means "zero or more occurrences of the
immediately preceding regular expression."
• Kleene + (positive closure) operator: The Kleene plus means "one or more
occurrences of the immediately preceding regular expression."
Regular Expression Matches
ba* b, ba, baa, baaa, ...
ba+ ba, baa, baaa, ...
(ba)* ε (the empty string), ba, baba, bababa, …
(ba)+ ba, baba, bababa, …
(b|a)+ b, a, bb, ba, aa, ab, …
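• The closure operators can be checked with re.fullmatch, which requires the pattern to cover the entire string (a small sketch, not part of the original slide):

import re
print(bool(re.fullmatch(r"ba*", "b")))        # True: zero a's are allowed
print(bool(re.fullmatch(r"ba+", "b")))        # False: + needs at least one a
print(bool(re.fullmatch(r"(ba)*", "")))       # True: zero repetitions of ba
print(bool(re.fullmatch(r"(b|a)+", "abba")))  # True: one or more of a or b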
Regular Expressions: {} . ?
• {m,n} causes the resulting RE to match from m to n repetitions of the preceding RE.
• {m} specifies that exactly m copies of the previous RE should be matched
• The question mark ? marks optionality of the previous expression.
Regular Expression Matches
woodchucks? woodchuck or woodchucks
colou?r color or colour
(a|b)?c ac, bc, c
(ba){2,3} baba, bababa
• A wildcard expression, the dot . , matches any single character (except a newline).
Regular Expression Matches
beg.n begin, begun, begxn, …
a.*b any string that starts with a and ends with b
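• A few quick checks of ?, {m,n}, and the wildcard dot (the example strings are my own):

import re
print(bool(re.fullmatch(r"colou?r", "color")))     # True: the u is optional
print(bool(re.fullmatch(r"(ba){2,3}", "bababa")))  # True: three repetitions
print(bool(re.fullmatch(r"(ba){2,3}", "ba")))      # False: fewer than two repetitions
print(re.findall(r"beg.n", "begin begun began"))   # ['begin', 'begun', 'began']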
Regular Expressions: Anchors ^ $
• Anchors are special characters that anchor regular expressions to particular places in a
string.
• The caret ^ matches the start of a string.
– The regular expression ^The matches the word The only at the start of a string.
• The dollar sign $ matches the end of a string.
Regular Expression Matches
.$ any character at the end of a string
\.$ dot character at the end of a string
^[A-Z] any uppercase character at the
beginning of a string
^The dog\.$ a string that contains only the phrase
The dog.
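• Anchors in action with Python's re (a short sketch; the example strings are my own):

import re
print(re.search(r"^The", "The dog barked"))         # matches at position 0
print(re.search(r"^The", "See The dog"))            # None: 'The' is not at the start
print(re.findall(r"\.$", "The dog."))               # ['.']: a literal dot at the end
print(bool(re.search(r"^The dog\.$", "The dog.")))  # True: the string is exactly 'The dog.'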
Regular Expressions: Precedence of Operators
• The order of RE operator precedence, from highest to lowest, is as follows:
– Parenthesis ()
– Counters * + ? {}
– Sequences and anchors ^ $
– Disjunction |
• The regular expression the* matches theeeee but not thethe
• The regular expression (the)* matches thethe but not theeeee
Regular Expressions: backslashed characters
• Aliases for common sets of characters
RE Expansion Match
\d [0-9] any digit
\D [^0-9] any non-digit
\w [a-zA-Z0-9_] any word character
\W [^a-zA-Z0-9_] any non-word character
\s [ \t\n\r\f\v] any whitespace character
\S [^ \t\n\r\f\v] any non-whitespace character
Regular Expressions: backslashed characters
• Special characters need to be backslashed.
RE Match
\b a word boundary: A word boundary is the position between a word character
(\w) and a non-word character (\W).
\B a non-word boundary: This is the opposite of \b, and it matches any position
that is not a word boundary.
\n a newline character
\t a tab character
escaping other special characters: We can escape almost any special character
(like *, +, ?, (, ), {, }, [, ], |, ^, $, ., and \\) by preceding it with a backslash (\).
\. \+ \* \\ \? \( \) \[ \] \| \^ \$
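• The backslashed classes and \b can be tried directly (a small sketch with my own example sentence):

import re
text = "In 2019 the fee was $12.40."
print(re.findall(r"\d+", text))      # ['2019', '12', '40']: runs of digits
print(re.findall(r"\w+", text))      # ['In', '2019', 'the', 'fee', 'was', '12', '40']
print(re.findall(r"\bthe\b", text))  # ['the']: 'the' only as a whole word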
Regular Expressions: Example
• We want to write an RE to find occurrences of the English article the.
– We can use the findall method of Python's re library.

import re
sentence = "The book and the other book are not theology books. We breathe the air."

re.findall(r"the", sentence)
➔ ['the', 'the', 'the', 'the', 'the']
# misses 'The'; matches inside 'other', 'theology', and 'breathe'

re.findall(r"[tT]he", sentence)
➔ ['The', 'the', 'the', 'the', 'the', 'the']
# finds 'The', but still matches inside 'other', 'theology', and 'breathe'

re.findall(r"[tT]he\b", sentence)
➔ ['The', 'the', 'the', 'the']
# still matches the 'the' at the end of 'breathe'

re.findall(r"\b[tT]he", sentence)
➔ ['The', 'the', 'the', 'the']
# still matches the 'the' at the start of 'theology'

re.findall(r"\b[tT]he\b", sentence)
➔ ['The', 'the', 'the']
# only the article: 'The', 'the' (other), 'the' (air)
Regular Expressions & FSAs
• Any regular expression can be realized as a finite state automaton (FSA)
• There are two kinds of FSAs
– Deterministic Finite State Automata (DFAs)
– Non-deterministic Finite State Automata (NFAs)
• Any NFA can be converted into a corresponding DFA.
• A DFA (and a regular expression) represents a regular language.
– Regular expressions, finite automata, and regular languages are three equivalent
characterizations of the same class of languages.
Regular Expressions: A DFA and A NFA
• A regular language: the set of strings over {0, 1} whose second symbol from the right is 1.
• A regular expression for this language: (0|1)*1(0|1)
• A DFA for this language needs only four states, each remembering the last two symbols read
(state diagram omitted; a simulation sketch follows below).
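• A minimal sketch (not from the slides) of how such a DFA can be simulated in Python; the state encoding, which simply remembers the last two symbols read, is my own:

def second_from_right_is_1(s):
    # Simulate a 4-state DFA for (0|1)*1(0|1): each state is the last two symbols read.
    state = ("0", "0")                 # start state: as if we had read "00"
    for c in s:
        assert c in "01"
        state = (state[1], c)          # shift the new symbol in
    return state[0] == "1"             # accept iff the second symbol from the right is 1

assert second_from_right_is_1("010")        # accepted
assert not second_from_right_is_1("001")    # rejected: second symbol from the right is 0
assert not second_from_right_is_1("1")      # rejected: too short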
Formal Definition of a Finite-State Automaton
• An FSA is a 5-tuple (Q, Σ, q0, F, δ)
• Q: a finite set of N states q0, q1, …, qN−1
• Σ: a finite input alphabet of symbols
• q0: the start state
• F: the set of final (accepting) states, F ⊆ Q
• δ(q, i): the transition function, giving the next state when symbol i is read in state q
– DFA: there is exactly one arc leaving a state q with a symbol a,
and there is no arc labeled with the empty string ε.
Basic Text Processing: Text Normalization
Text Normalization
• Almost every natural language processing task needs to do text normalization.
• Three tasks are commonly applied as part of any normalization process:
1. Segmenting/tokenizing words from the text
2. Normalizing word formats
3. Segmenting sentences in the text.
Words
• Before processing words, we need to decide what counts as a word.
• How many words are in the following sentence?
He stepped out into the hall, was delighted to encounter a water brother.
– If we do NOT count punctuations as words ➔ 13 words
– If we count punctuations as words ➔ 15 words
• Punctuation can be useful for identifying boundaries (of sentences or clauses) and some aspects of
meaning.
• Are capitalized tokens and uncapitalized tokens the same word?
– The and the: possibly the same
– US and us: probably not (US: the United States of America)
Words
• Are the inflected forms like cat and cats the same word?
• They have the same lemma cat, but they have different wordforms.
• A lemma is a set of lexical forms having the same stem, the same major part-of-
speech, and the same word sense.
• The wordform is the full inflected or derived form of the word.
– For morphologically complex languages, we often need to deal with lemmatization.
– For many tasks in English, however, wordforms are sufficient.
Words: How many words are there in English?
• A type is a distinct word in a corpus.
• V: Vocabulary is the set of types.
– |V| is the size of the vocabulary.
• Each word in a corpus is a token.
– N is the number of tokens in the corpus.
Corpus # of Tokens = N # of Types = |V|
Shakespeare 884,000 31 thousand
Switchboard phone conversations 2.4 million 20 thousand
Brown corpus 1 million 38 thousand
Google N-grams 1 trillion 13 million
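• A tiny illustration of tokens versus types, using naive whitespace tokenization (my own toy corpus, not one of the corpora above):

corpus = "the cat sat on the mat and the dog sat too"
tokens = corpus.split()     # naive whitespace tokenization
types = set(tokens)
print(len(tokens))          # N = 11 tokens
print(len(types))           # |V| = 8 types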
Word Tokenization and Normalization
• Tokenization is the task of segmenting the text into words.
• Normalization is the task of putting words in a standard format.
• We can use regular expressions to segment the text into words for tokenization task.
– Since tokenization needs to be run before any other language processing, it is
important for it to be very fast.
– The method for tokenization/normalization is to use deterministic algorithms
based on regular expressions compiled into very efficient finite state automata.
Tokenization
• Normally we want to break off punctuation as separate tokens, but sometimes we
want to keep it inside words.
• Punctuation as separate tokens: He ate apple, orange and banana.
• Punctuations kept internally:
– m.p.h. Ph.D. AT&T Prices: $43.55 Dates: 27/09/2019
– URLs: [Link] Twitter hashtags: #nlproc
• A tokenizer can also expand clitic contractions that are marked by apostrophes.
– what're → two tokens what are
– we're → two tokens we are
• Tokenization algorithms may also tokenize multiword expressions like New York or
rock 'n' roll as a single token.
Tokenization in Python
• We can use methods from the re or nltk libraries for tokenization.
– We can use the findall method of the re library for tokenization.
– We can use the word_tokenize or regexp_tokenize methods of the nltk library for tokenization.
• Token patterns are described with regular expressions.
re.findall(pattern, string)
– Returns all non-overlapping matches of pattern in string, as a list of strings or tuples.
– The string is scanned left-to-right, and matches are returned in the order found.
– The result depends on the number of capturing groups in the pattern.
• If there are no groups, return a list of strings matching the whole pattern.
• If there is exactly one group, return a list of strings matching that group.
• If multiple groups are present, return a list of tuples of strings matching the groups.
nltk.regexp_tokenize(string,pattern)
nltk.word_tokenize(string)
– nltk.word_tokenize may require downloading additional NLTK data first:
• nltk.download('punkt_tab')
Tokenization in Python
• In regular expressions, parentheses are used for two main purposes:
1. Grouping: Parentheses (...) are used to group parts of the pattern together.
2. Non-capturing: Parentheses (?:...) are used to create non-capturing groups.
• This group is still used to group parts of the pattern together (like regular parentheses),
but it does not create a capturing group.
text = "the evening shows start at 7:00pm and 10:15pm. the morning show at 9:00."

# Two capturing groups: findall returns a list of tuples
matches = re.findall(r"(\d\d?:\d\d)(am|pm)?", text)
➔ [('7:00', 'pm'), ('10:15', 'pm'), ('9:00', '')]

# One capturing group (am|pm)?: findall returns only that group
matches = re.findall(r"(?:\d\d?:\d\d)(am|pm)?", text)
➔ ['pm', 'pm', '']

# One capturing group (the time): findall returns only the times
matches = re.findall(r"(\d\d?:\d\d)(?:am|pm)?", text)
➔ ['7:00', '10:15', '9:00']

# No capturing groups: findall returns the whole matches
matches = re.findall(r"(?:\d\d?:\d\d)(?:am|pm)?", text)
➔ ['7:00pm', '10:15pm', '9:00']
Tokenization in Python
import nltk
import re
sentence = 'That U.S.A. poster-print costs $12.40...'

nltk.word_tokenize(sentence)
➔ ['That', 'U.S.A.', 'poster-print', 'costs', '$', '12.40', '...']

pattern = r"\w+"   # tokens are any non-empty sequence of word characters
nltk.regexp_tokenize(sentence, pattern)   # or re.findall(pattern, sentence)
➔ ['That', 'U', 'S', 'A', 'poster', 'print', 'costs', '12', '40']

pattern = r"\S+"   # tokens are whitespace-separated chunks
nltk.regexp_tokenize(sentence, pattern)
➔ ['That', 'U.S.A.', 'poster-print', 'costs', '$12.40...']
pattern = r"""(?x)          # set the verbose flag for regular expressions
      (?:[A-Z]\.)+          # abbreviations, e.g. U.S.A.
    | \w+(?:-\w+)*          # words with optional internal hyphens
    | \$?\d+(?:\.\d+)?%?    # currency, percentages, e.g. $12.40, 45%
    | \.\.\.                # ellipsis (three dots)
    | [][.,;"'?():_-]       # other single-character tokens
    """
nltk.regexp_tokenize(sentence, pattern)
➔ ['That', 'U.S.A.', 'poster-print', 'costs', '$12.40', '...']
Tokenization: Language Issues
• French:
– L'ensemble: one token or two? We want l'ensemble to match with un ensemble.
• German noun compounds are not segmented:
– Lebensversicherungsgesellschaftsangestellter ('life insurance company employee')
– A German tokenizer needs a compound splitter.
• Chinese and Japanese have no spaces between words:
– 莎拉波娃现在居住在美国东南部的佛罗里达。
– 莎拉波娃 现在 居住 在 美国 东南部 的 佛罗里达
– Sharapova now lives in US southeastern Florida
Word Tokenization in Chinese
• Word tokenization is also called Word Segmentation
• Chinese words are composed of characters
– Characters are generally 1 syllable and 1 morpheme.
– Average word is 2.4 characters long.
• Standard baseline segmentation algorithm: Maximum Matching (a sketch implementation follows the steps below)
Given a wordlist of Chinese, and a string.
1. Start a pointer at the beginning of the string
2. Find the longest word in dictionary that matches the string starting at pointer
3. Move the pointer over the word in string
4. Go to 2
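• A minimal Python sketch of this maximum matching baseline (my own simplified implementation; the toy wordlist below is purely illustrative):

def max_match(text, wordlist):
    # Greedy left-to-right maximum matching segmentation.
    words, i = [], 0
    while i < len(text):
        # Take the longest dictionary word starting at i; fall back to one character.
        for j in range(len(text), i, -1):
            if text[i:j] in wordlist or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

wordlist = {"the", "theta", "table", "down", "there", "bled", "own", "cat", "in", "hat"}
print(max_match("thecatinthehat", wordlist))      # ['the', 'cat', 'in', 'the', 'hat']
print(max_match("thetabledownthere", wordlist))   # ['theta', 'bled', 'own', 'there']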
Max-match segmentation
• Thecatinthehat → the cat in the hat
• Thetabledownthere → intended: the table down there
– but max-match produces: theta bled own there
– Doesn't generally work in English!
• But works well in Chinese
– 莎拉波娃现在居住在美国东南部的佛罗里达。
– 莎拉波娃 现在 居住 在 美国 东南部 的 佛罗里达
• Modern probabilistic segmentation algorithms even better
Subword Tokenization
• Another option for text tokenization ➔ Subword Tokenization
• Instead of whitespace-based word segmentation or single-character segmentation,
use the data (a training corpus) to tell us how to tokenize.
• This is called subword tokenization, because tokens can be parts of words as well as whole words.
• To deal with the unknown word problem (words never seen in training), modern tokenizers (used by Large
Language Models) automatically induce sets of tokens that include tokens smaller than
words, called subwords.
– Subwords can be arbitrary substrings, or they can be meaning-bearing units like the
morphemes -est or -er.
– In modern tokenization schemes, most tokens are words, but some tokens are frequently
occurring morphemes or other subwords like -er.
– Every unseen word like lower can thus be represented by some sequence of known subword
units, such as low and er, or even as a sequence of individual letters if necessary.
Subword Tokenization
• Three common algorithms for subword tokenization:
– Byte-Pair Encoding (BPE) (Sennrich et al., 2016)
– Unigram language modeling tokenization (Kudo, 2018)
– WordPiece (Schuster and Nakajima, 2012)
• All algorithms have 2 parts:
– A token learner that takes a raw training corpus and induces a vocabulary (a set
of tokens).
– A token segmenter that takes a raw test sentence and tokenizes it according to
that vocabulary
Byte-Pair Encoding (BPE)
- Token Learner
• Let vocabulary be the set of all individual characters
= {A, B, C, D,…, a, b, c, d….}
• Repeat:
– Choose the two symbols that are most frequently adjacent in the training corpus
(say 'A', 'B')
– Add a new merged symbol 'AB' to the vocabulary
– Replace every adjacent 'A' 'B' in the corpus with 'AB’.
• Until k merges have been done.
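• A compact sketch of this learner loop (my own simplified implementation, not the textbook's code); words are represented as tuples of symbols that already end in the '_' end-of-word marker used on the next slides:

from collections import Counter

def learn_bpe(word_freqs, k):
    # word_freqs maps a tuple of symbols (a word ending in '_') to its corpus count.
    vocab = {sym for word in word_freqs for sym in word}
    merges = []
    for _ in range(k):
        pairs = Counter()                        # count adjacent symbol pairs
        for word, freq in word_freqs.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        a, b = max(pairs, key=pairs.get)         # most frequent adjacent pair
        merges.append((a, b))
        vocab.add(a + b)
        new_freqs = {}                           # replace the pair everywhere in the corpus
        for word, freq in word_freqs.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                    out.append(a + b)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_freqs[tuple(out)] = new_freqs.get(tuple(out), 0) + freq
        word_freqs = new_freqs
    return vocab, merges

# The tiny corpus from the next slides: low x5, lowest x2, newer x6, wider x3, new x2
corpus = {("l","o","w","_"): 5, ("l","o","w","e","s","t","_"): 2,
          ("n","e","w","e","r","_"): 6, ("w","i","d","e","r","_"): 3,
          ("n","e","w","_"): 2}
vocab, merges = learn_bpe(corpus, 8)
print(merges[:3])    # [('e', 'r'), ('er', '_'), ('n', 'e')] for this corpus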
Byte-Pair Encoding (BPE)
- Token Learner: Example
• Most subword algorithms are run inside space-separated tokens.
• So we commonly first append a special end-of-word symbol '_' to each word in the
training corpus (marking where the following space was).
• Next, separate each word into individual letters (characters).
• Tiny Corpus:
low low low low low lowest lowest newer newer newer
newer newer newer wider wider wider new new
• Add end-of-word tokens and separate each word into characters. The corpus, with word counts, becomes:
5   l o w _
2   l o w e s t _
6   n e w e r _
3   w i d e r _
2   n e w _
➔ Initial Vocabulary: _, d, e, i, l, n, o, r, s, t, w
Byte-Pair Encoding (BPE)
- Token Learner: Example
• Merge e r to er (the most frequent pair: 6 in newer + 3 in wider = 9 occurrences)
• Merge er _ to er_
• Merge n e to ne
• Next merges: (ne, w) → new, (l, o) → lo, (lo, w) → low, (new, er_) → newer_, (low, _) → low_
Byte-Pair Encoding (BPE)
- Token Segmenter
• On the test data, run each merge learned from the training data:
– Greedily
– In the order we learned them
• So: merge every e r to er, then merge er _ to er_, etc.
• Result:
– Test set "n e w e r _" would be tokenized as a full word "newer_"
– Test set "l o w e r _" would be two tokens: "low er_"
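• A matching sketch of the segmenter (again my own simplified code): replay the merges learned above, in order, on a new word given as a list of characters ending in '_':

def bpe_segment(word, merges):
    symbols = list(word)
    for a, b in merges:                       # apply merges greedily, in learned order
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]    # merge the adjacent pair in place
            else:
                i += 1
    return symbols

# Using the merges learned from the tiny corpus above:
print(bpe_segment(list("newer") + ["_"], merges))   # ['newer_']
print(bpe_segment(list("lower") + ["_"], merges))   # ['low', 'er_']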
Properties of Byte-Pair Encoding Tokens
• Usually include frequent words
• And frequent subwords
– Which are often morphemes like -est or -er
• A morpheme is the smallest meaning-bearing unit of a language
– unlikeliest has 3 morphemes un-, likely, and -est
Text Normalization
• Tokens can also be normalized, in which a single normalized form is chosen for
words with multiple forms like USA and US.
– This standardization may be valuable, despite the spelling information that is lost
in the normalization process.
– For information retrieval, we want a query for US to match a document that has
USA.
• Case folding is another kind of normalization: Reduce all letters to lower case.
– For most applications (information retrieval), case folding is helpful.
– For some NLP applications (MT, information extraction) case can be helpful.
• US versus us: the distinction is important
Lemmatization
• Lemmatization is the task of determining that two words have the same root, despite
their surface differences.
– am, are, is → be
– car, cars, car's, cars' → car
• Lemmatization: we have to find the correct dictionary headword form of the word.
• The most sophisticated methods for lemmatization involve complete morphological
parsing of the word.
• Morphology is the study of the way words are built up from smaller meaning-bearing
units called morphemes.
• Two broad classes of morphemes can be distinguished:
– Stems : the central morpheme of the word, supplying the main meaning
– Affixes : adding “additional” meanings of various kinds.
Lemmatization
• Lemmatization algorithms can be complex.
• For this reason we sometimes make use of a simpler but cruder method, which mainly
consists of chopping off word-final affixes.
• This naive version of morphological analysis is called stemming.
• One of the most widely used stemming algorithms is Porter Stemmer.
– The algorithm is based on a cascade of rewrite rules, in which the output
of each pass is fed as input to the next pass.
– Some rules are:
• ATIONAL → ATE (e.g., relational → relate)
• ING → ∅ (delete) if the stem contains a vowel (e.g., motoring → motor)
• SSES → SS (e.g., grasses → grass)
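• A toy sketch of the cascade idea using only the three rules above (my own code; this is not the full Porter stemmer, which has many more rules and conditions):

import re

def toy_stem(word):
    word = re.sub(r"ational$", "ate", word)        # ATIONAL -> ATE
    word = re.sub(r"sses$", "ss", word)            # SSES -> SS
    if re.search(r"[aeiou].*ing$", word):          # ING -> (delete) if the stem has a vowel
        word = re.sub(r"ing$", "", word)
    return word

print(toy_stem("relational"))   # relate
print(toy_stem("grasses"))      # grass
print(toy_stem("motoring"))     # motor
print(toy_stem("sing"))         # sing (no vowel before -ing, so it is kept)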
Sentence Segmentation
• Sentence segmentation is another important step in text processing.
• The most useful cues for segmenting a text into sentences are punctuation marks like
periods, question marks, and exclamation points.
• Question marks and exclamation points are relatively unambiguous markers of
sentence boundaries.
• Periods, on the other hand, are more ambiguous.
– Abbreviations like Inc. or Dr.
– Numbers like .02% or 4.3
• Build a binary classifier
– Looks at a “.”
– Decides EndOfSentence/NotEndOfSentence
– Classifiers: hand-written rules, regular expressions, or machine-learning
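• A rough rule-based sketch of such a classifier (my own code, not a production segmenter): treat ., ! or ? as a boundary when followed by whitespace and an uppercase letter, unless the word ending in '.' is a known abbreviation:

import re

ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Inc."}      # illustrative list only

def split_sentences(text):
    sentences, start = [], 0
    for m in re.finditer(r"[.!?](?=\s+[A-Z])", text):
        token = text[start:m.end()].split()[-1]     # the word ending at the candidate boundary
        if token in ABBREVIATIONS:
            continue                                # e.g. 'Dr.' does not end a sentence
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    sentences.append(text[start:].strip())
    return [s for s in sentences if s]

print(split_sentences("She paid $4.30 to Dr. Smith. Then she left."))
# ['She paid $4.30 to Dr. Smith.', 'Then she left.']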
Summary
• The regular expression language is a powerful tool for pattern-matching.
• Word tokenization and normalization are generally done by cascades of simple
regular expression substitutions or finite automata.