
Text Mining and Word Cloud in R

This document shows how to load packages and prepare text data for analysis in R. It loads packages for text mining, stemming, word clouds and colour palettes, reads the text data into a corpus, builds a term-document matrix, and cleans the text by converting it to lower case and removing numbers, stopwords and punctuation. Finally, it generates a word cloud of the most frequent terms and explores frequent terms and their associations.

Uploaded by yashsethea
Copyright © All Rights Reserved

# Install and load the required packages

# for text mining
install.packages("tm")
# for text stemming
install.packages("SnowballC")
# for word-cloud generator
install.packages("wordcloud")
# for colour palettes
install.packages("RColorBrewer")

# Load
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
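Calling `install.packages()` unconditionally reinstalls every package on each run. A minimal sketch, assuming you only want to install what is missing: the helper names `missing_packages()` and `install_and_load()` are hypothetical, built only on base R's `requireNamespace()`, `install.packages()` and `library(character.only = TRUE)`.

```r
# Packages used in this tutorial
pkgs <- c("tm", "SnowballC", "wordcloud", "RColorBrewer")

# Hypothetical helper: return the subset of 'pkgs' that cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical helper: install only the missing packages, then attach all
install_and_load <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}

# Usage (not run here, since it may download packages):
# install_and_load(pkgs)
```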

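# Load the data as a corpus

# 'text' must be a character vector; here it is read from a plain-text file
# (the file name is a placeholder: replace it with your own file)
text <- readLines("your_file.txt")
docs <- Corpus(VectorSource(text))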
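`Corpus(VectorSource(text))` expects `text` to be a character vector, and a vector source treats each element as a separate document. A small base-R sketch, using a made-up temporary file, of how `readLines()` produces that vector: each line of the file becomes one element, and therefore one document.

```r
# Hypothetical sample file standing in for your own text file
path <- tempfile(fileext = ".txt")
writeLines(c("Text mining in R",
             "Word clouds show frequent words",
             "R makes text mining easy"), path)

# readLines() returns a character vector with one element per line
text <- readLines(path)

# VectorSource(text) would treat each element as a separate document,
# so the corpus would hold length(text) documents
n_docs <- length(text)
print(n_docs)  # [1] 3

# To treat the whole file as a single document instead, collapse it first
one_doc <- paste(text, collapse = " ")
```

Whether you keep one document per line or collapse the file matters later: term associations are computed across documents, so a single-document corpus has nothing to correlate.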

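# Build a term-document matrix (from the raw, uncleaned text)

dtm <- TermDocumentMatrix(docs)

# Sum each term's counts across documents and sort by frequency
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 10)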
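To see what `as.matrix()`, `rowSums()` and `sort()` compute, here is a base-R sketch on a small made-up matrix standing in for `as.matrix(dtm)`: rows are terms, columns are documents, and each row sum is a term's total count across the corpus.

```r
# Toy term-document matrix: rows are terms, columns are documents
m <- matrix(c(2, 0, 1,
              1, 1, 0,
              0, 3, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("text", "mining", "cloud"),
                            Docs  = c("d1", "d2", "d3")))

# Total count of each term across all documents, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)

# Frequency table in the same shape as 'd' in the tutorial
d <- data.frame(word = names(v), freq = v)
print(d)
```

Here "cloud" (4) comes first, then "text" (3) and "mining" (2).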

# Cleaning the text

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stopwords, specified as a character vector
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Optional: reduce words to their stem (uses SnowballC, loaded above)
# docs <- tm_map(docs, stemDocument)
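The cleaning steps above can be approximated with base-R string functions, which makes it easier to see what each `tm_map()` call does to a document. This is a sketch of the idea, not tm's exact implementation; the two input strings and the three-word stopword list are hypothetical (tm's `stopwords("english")` is much longer). The last step collapses the gaps left behind, which tm's `stripWhitespace` transformation would do.

```r
# Hypothetical raw documents
raw <- c("The 2 Cats sat on the mat!", "Cats, dogs and 3 birds.")

# Tiny hypothetical stopword list
my_stopwords <- c("the", "on", "and")

clean <- tolower(raw)                                  # lower case
clean <- gsub("[[:digit:]]+", "", clean)               # remove numbers
stop_re <- paste0("\\b(", paste(my_stopwords, collapse = "|"), ")\\b")
clean <- gsub(stop_re, "", clean, perl = TRUE)         # remove whole-word stopwords
clean <- gsub("[[:punct:]]+", "", clean)               # remove punctuation
clean <- trimws(gsub("[[:space:]]+", " ", clean))      # collapse extra spaces
print(clean)  # [1] "cats sat mat"    "cats dogs birds"
```

Lower-casing runs first so that capitalised stopwords such as "The" are also matched.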

# Build a term-document matrix from the cleaned text

dtm <- TermDocumentMatrix(docs)

m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 10)

# Generate the word cloud

# Fix the random seed so the layout is reproducible
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
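In the `wordcloud()` call, `min.freq` drops words below that frequency, `max.words` caps how many words are plotted (the least frequent are dropped), `random.order = FALSE` plots words in decreasing frequency so the most frequent end up near the centre, `rot.per = 0.35` rotates roughly 35% of the words by 90 degrees, and `brewer.pal(8, "Dark2")` supplies an eight-colour palette. A base-R sketch of the word selection implied by `min.freq` and `max.words`, using a hypothetical frequency table and a hypothetical helper `select_cloud_words()`; `wordcloud()` itself may additionally skip words that do not fit on the plot.

```r
# Hypothetical frequency table in the same shape as 'd'
freq_table <- data.frame(word = c("data", "text", "mining", "cloud", "r"),
                         freq = c(9, 7, 4, 1, 1))

# Approximate selection: keep words with freq >= min.freq,
# then at most max.words of the most frequent ones
select_cloud_words <- function(d, min.freq = 1, max.words = 200) {
  kept <- d[d$freq >= min.freq, ]
  kept <- kept[order(kept$freq, decreasing = TRUE), ]
  head(kept, max.words)
}

top <- select_cloud_words(freq_table, min.freq = 2, max.words = 3)
print(top$word)  # [1] "data"   "text"   "mining"
```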

# Explore frequent terms and their associations

# Terms that occur at least four times in total
findFreqTerms(dtm, lowfreq = 4)
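`findFreqTerms()` lists the terms whose total frequency is at least `lowfreq`; to explore associations, tm provides `findAssocs(dtm, terms = "yourterm", corlimit = 0.3)`, where the term is a placeholder for a word in your own corpus. As a base-R approximation (not tm's code) on a made-up matrix, the two ideas look like this: frequent terms are row sums above a threshold, and an association is the correlation between two terms' counts across documents, kept when it reaches `corlimit`. The helper names `freq_terms()` and `term_assocs()` are hypothetical.

```r
# Toy term-document matrix (terms x documents), standing in for as.matrix(dtm)
tdm_toy <- matrix(c(3, 2, 0, 1,
                    2, 2, 0, 1,
                    0, 1, 4, 0,
                    1, 0, 1, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("text", "mining", "cloud", "word"),
                                  paste0("doc", 1:4)))

# Like findFreqTerms(): terms whose total count is at least lowfreq
freq_terms <- function(m, lowfreq = 0) {
  totals <- rowSums(m)
  names(totals)[totals >= lowfreq]
}

# Like findAssocs() for one term: correlate its counts with every other
# term's counts across documents; keep correlations >= corlimit
term_assocs <- function(m, term, corlimit) {
  others <- setdiff(rownames(m), term)
  r <- vapply(others, function(other) cor(m[term, ], m[other, ]), numeric(1))
  r <- round(r[!is.na(r) & r >= corlimit], 2)
  sort(r, decreasing = TRUE)
}

print(freq_terms(tdm_toy, lowfreq = 5))             # "text" "mining" "cloud"
print(term_assocs(tdm_toy, "text", corlimit = 0.3)) # mining 0.94
```

"mining" rises and falls with "text" across the four documents, so it is the only term above the 0.3 limit; "cloud" and "word" correlate negatively.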
