R Studio Commands

The document provides an overview of data manipulation and analysis techniques in R, including commands for viewing data, creating various data structures (vectors, matrices, arrays, lists, factors, and data frames), and performing statistical calculations. It also covers measures of central tendency and dispersion, relationships between variables, and basic data visualization methods such as bar and line charts. Additionally, it discusses simple and multiple regression models, as well as textual analysis techniques for word frequency and sentiment scoring.

Uploaded by

24f2001200

1. To View the First 6 Rows of the Data:
head(DatasetName)

2. To View the First n Rows of the Data:
head(DatasetName, n = 120)
(print(DatasetName, n = 120) works only if the dataset is a tibble.)

3. To View the Last 6 Rows of the Data:
tail(DatasetName)

4. To View the Structure of the Dataset:
str(DatasetName)

5. To View the Dimensions (Number of Rows and Columns):
dim(DatasetName)

6. To View the First 6 Values of a Specific Column:
head(DatasetName$ColumnName)

7. To View Summary Statistics of the Dataset:
summary(DatasetName)

Note: R is case-sensitive, so function names must be typed in lowercase as shown.
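
As a quick check, the viewing commands above can be tried on mtcars, a dataset that ships with R. Using mtcars here is an assumption for illustration only; any data frame behaves the same way.

```r
# mtcars is a built-in R dataset (32 rows, 11 columns)
first_rows <- head(mtcars)          # first 6 rows
first_ten  <- head(mtcars, n = 10)  # first 10 rows
last_rows  <- tail(mtcars)          # last 6 rows
size       <- dim(mtcars)           # c(rows, columns)

str(mtcars)        # structure: column names and types
head(mtcars$mpg)   # first 6 values of the mpg column
summary(mtcars)    # summary statistics for every column
print(size)
```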

Data Structures in R

1. Vector (1-Dimensional Data)
It is a collection of values of the same type
(numeric, character, logical etc.)
- To Create a Numeric Vector
a <- c(10, 20, 30, 40)
print(a)
- To Create a Character Vector
Fruits <- c("Apple", "Cherry", "Kiwi")
print(Fruits)
If numeric and character values are mixed in one
vector, R converts every value to character.

2. Matrix (2-Dimensional Data)
It has the same type of elements in all rows and
columns.
The number of values must equal rows x columns.
Example: for the range 1:15, use rows = 5 and
columns = 3.
Variable <- matrix(RangeOfValues, nrow = Rows, ncol = Columns)
- To Create a 3x3 Numeric Matrix
a <- matrix(1:9, nrow = 3, ncol = 3)
print(a)

3. Array (Multidimensional Data)
It is like a matrix but can have more than two
dimensions.
Arr <- array(1:18, dim = c(3, 3, 2))
print(Arr)

4. List (Collection of Different Data Types)
It can store different types of data (vectors,
matrices, data frames etc.)
Studentlist <- list(name = "John", age = 25, scores = c(90, 85, 88))
print(Studentlist)

5. Factor (Categorical Data)
It is used to store categorical (grouped) data such
as Male/Female, Yes/No, Low/Medium/High etc.
Gender <- factor(c("Male", "Female", "Male", "Male"))
print(Gender)

6. Data Frame (Tabular Data)
It is like a spreadsheet or table that holds
different types of data (numbers, text, logical
values etc.) in columns.
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Score = c(90, 85, 88))
print(df)
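
The short sketch below inspects each of the structures described above. The variable names (mixed, m, student, gender) are illustrative and not part of the original notes.

```r
# Mixing numbers and text in c() coerces every value to character
mixed <- c(10, "Apple", 20)
class(mixed)                 # "character"

# A matrix needs rows x columns values: 1:15 fills 5 rows and 3 columns
m <- matrix(1:15, nrow = 5, ncol = 3)
dim(m)                       # 5 3
m[2, 3]                      # row 2, column 3 (filled column by column): 12

# List elements are reached by name with $
student <- list(name = "John", age = 25, scores = c(90, 85, 88))
mean(student$scores)

# A factor stores the distinct categories as levels
gender <- factor(c("Male", "Female", "Male", "Male"))
levels(gender)               # "Female" "Male"
table(gender)                # count per category

# Data frame rows can be filtered with a condition on a column
df <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30, 35))
older <- df[df$Age > 26, ]
print(older)
```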

Summary: Choosing the Right Data Structure

Data Structure | Definition                            | Example Use Case
Vector         | 1D collection of same type            | Storing test scores (e.g., c(85, 90, 78))
Matrix         | 2D table of same type elements        | Storing sales data for multiple products
Array          | Multidimensional matrix               | Storing weather data over multiple years
List           | Stores different types of data        | Collecting a person's details (age, name, grades)
Factor         | Categorical data with levels          | Gender, survey responses (e.g., Yes/No)
Data Frame     | Table with different types in columns | A dataset with names, ages, and test scores

Data Frame Practice

# Create sample data


data <- data.frame(
age = c(25, 30, 35, 40, 45, 50, 55, 60, 65, 70),
income = c(35000, 40000, 45000, 50000,
55000, 60000, 65000, 70000, 75000, 80000)
)
print(data)

To Print Only One Column of the Table
print(data$ColumnName)
For example, print(data$age) prints the age column.

Measures of Central Tendency

1. To Calculate the Average Age of the Data
mean_age <- mean(data$age)
print(mean_age)

2. To Calculate the Average Income of the Data
mean_income <- mean(data$income)
print(mean_income)

3. To Calculate the Median Age of the Data
median_age <- median(data$age)
print(median_age)

4. To Calculate the Median Income of the Data
median_income <- median(data$income)
print(median_income)

5. To Calculate the Mode of the Data
R has no built-in function for the statistical mode
(the built-in mode() reports an object's storage type),
so define one:
get_mode <- function(x) {
  # Most frequent value, returned as text; ties resolve to the first one
  names(sort(table(x), decreasing = TRUE))[1]
}

# Example usage:
numbers <- c(1, 2, 2, 3, 4, 4, 4, 5)
get_mode(numbers)
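
The mode helper above returns only the first most-frequent value, and returns it as text. One possible variant, sketched under the assumption that every tied value should be reported as a number, is shown below. The name all_modes is hypothetical.

```r
# Hypothetical helper: return every most-frequent value of a numeric vector
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

income <- c(35000, 40000, 40000, 45000, 45000, 50000)
all_modes(income)     # 40000 and 45000 are tied

numbers <- c(1, 2, 2, 3, 4, 4, 4, 5)
all_modes(numbers)    # 4
```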

Measures of Dispersion

Create the following table in RStudio:

   math_scores science_scores
1           75             70
2           80             78
3           68             65
4           90             92
5           85             88
6           72             75
7           88             85
8           72             96
9           88             80
10          95             83

1. To Make the Table:
> StudentScores <- data.frame(math_scores = c(75,80,68,90,85,72,88,72,88,95), science_scores = c(70,78,65,92,88,75,85,96,80,83))
> print(StudentScores)

2. To Calculate the Range (Minimum and Maximum)
> range_maths <- range(StudentScores$math_scores)
> range_science <- range(StudentScores$science_scores)

3. To Calculate the Width of the Range (Maximum - Minimum)
> diff(range_maths)
> diff(range_science)

4. To Calculate Variance
> var_maths <- var(StudentScores$math_scores)
> print(var_maths)
> var_science <- var(StudentScores$science_scores)
> print(var_science)

5. To Calculate Standard Deviation
> sd_maths <- sd(StudentScores$math_scores)
> sd_science <- sd(StudentScores$science_scores)
> print(sd_maths)
[1] 9.177872
> print(sd_science)
[1] 9.647107
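
To see what var() and sd() compute, the sketch below repeats the calculation by hand for the math scores, using the sample formula (sum of squared deviations divided by n - 1). The variable names are illustrative.

```r
math_scores <- c(75, 80, 68, 90, 85, 72, 88, 72, 88, 95)

n <- length(math_scores)
deviations <- math_scores - mean(math_scores)

# Sample variance: sum of squared deviations divided by (n - 1)
var_by_hand <- sum(deviations^2) / (n - 1)

# Standard deviation is the square root of the variance
sd_by_hand <- sqrt(var_by_hand)

# Width of the range: maximum minus minimum
range_width <- max(math_scores) - min(math_scores)

all.equal(var_by_hand, var(math_scores))  # TRUE
all.equal(sd_by_hand, sd(math_scores))    # TRUE
range_width                               # 27
```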

Relationship between Variables


- Covariance
> cov_maths_science <- cov(StudentScores$math_scores, StudentScores$science_scores)
> print(cov_maths_science)
[1] 41.26667
> round(cov_maths_science, 2)
[1] 41.27

- Correlation
> cor_maths_science <- cor(StudentScores$math_scores, StudentScores$science_scores)
> print(cor_maths_science)
[1] 0.4660798

- Coefficient of Determination (R2)
It is simply the square of the correlation coefficient.
> R_squared <- cor_maths_science^2
> print(R_squared)
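
The three measures above are linked: correlation is covariance divided by both standard deviations, and for a regression with one predictor the R-squared reported by lm() equals the squared correlation. A minimal sketch of that check, with illustrative variable names:

```r
math_scores    <- c(75, 80, 68, 90, 85, 72, 88, 72, 88, 95)
science_scores <- c(70, 78, 65, 92, 88, 75, 85, 96, 80, 83)

# Correlation = covariance scaled by both standard deviations
cor_by_hand <- cov(math_scores, science_scores) /
  (sd(math_scores) * sd(science_scores))

r_squared <- cor(math_scores, science_scores)^2

# With a single predictor, lm() reports the same R-squared
fit <- lm(science_scores ~ math_scores)
all.equal(r_squared, summary(fit)$r.squared)  # TRUE
round(r_squared, 4)                           # 0.2172
```

An R-squared of about 0.22 means that roughly 22% of the variation in science scores is accounted for by a straight-line relationship with math scores.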

Data Visualization in R Studio

Bar Chart

Month    | Furniture | Office Supplies | Technology
January  | 4200      | 3000            | 3900
February | 4700      | 3200            | 4100
March    | 4500      | 3100            | 4400


# Step 1: Create the data
month <- c("January", "February", "March")
furniture <- c(4200, 4700, 4500)
office_supplies <- c(3000, 3200, 3100)
technology <- c(3900, 4100, 4400)

# Step 2: Calculate total sales per category


total_sales <- c(sum(furniture),
sum(office_supplies), sum(technology))
categories <- c("Furniture", "Office Supplies",
"Technology")

# Step 3: Create the bar chart


barplot(total_sales,
names.arg = categories,
col = "skyblue",
main = "Total Sales by Category",
xlab = "Category",
ylab = "Total Sales")

Explanation:
- barplot() creates the bar chart.
- names.arg specifies the category labels shown on the x-axis.
- col adds color to the bars.
- main, xlab, ylab set the chart title and axis labels.
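
If the monthly breakdown matters more than the totals, the same data can be drawn as a grouped bar chart. This is a sketch of one option: barplot() accepts a matrix, and beside = TRUE places the bars side by side. The matrix name sales and the colors are assumptions made for illustration.

```r
furniture       <- c(4200, 4700, 4500)
office_supplies <- c(3000, 3200, 3100)
technology      <- c(3900, 4100, 4400)

# One row per category, one column per month
sales <- rbind(furniture, office_supplies, technology)
rownames(sales) <- c("Furniture", "Office Supplies", "Technology")
colnames(sales) <- c("January", "February", "March")

# beside = TRUE draws the categories side by side within each month
barplot(sales,
        beside = TRUE,
        col = c("skyblue", "orange", "darkgreen"),
        legend.text = rownames(sales),
        main = "Monthly Sales by Category",
        xlab = "Month",
        ylab = "Sales")
```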

Line Chart
# Step 1: Create the sales data for Technology
months <- c("January", "February", "March")
tech_sales <- c(3900, 4100, 4400)

# Step 2: Create the line chart without the default x-axis
plot(tech_sales, type = "o", col = "blue", xaxt = "n",
     main = "Technology Sales Trend", xlab = "Month", ylab = "Sales")

# Step 3: Label the x-axis with the month names
axis(1, at = 1:3, labels = months)

📝 Explanation:
- plot() creates the line chart.
- type = "o" means both line and points are shown.
- xaxt = "n" turns off the default x-axis (so we can use custom month labels).
- axis() is used to manually label the x-axis with month names.
- col sets the line color, and main, xlab, ylab provide the chart title and axis labels.

Simple Regression Model:

Step 1: Create your sample data
Hours <- c(1,2,3,4,5,6,7,8,9,10)
Score <- c(50,55,60,65,65,70,75,80,85,90)

Step 2: Fit a linear regression model
model <- lm(Score ~ Hours)

Step 3: Check your regression model summary
summary(model)

Step 4: Make predictions with intervals (for 5 Hours)
new_data <- data.frame(Hours = 5)

# Confidence interval (Average score prediction)
predict(model, newdata = new_data, interval = "confidence", level = 0.95)

# Prediction interval (Individual score prediction)
predict(model, newdata = new_data, interval = "prediction", level = 0.95)

Easy Explanation:
- Confidence interval shows where the average score for students studying 5 hours would likely fall.
- Prediction interval tells you where the score for one specific student studying 5 hours might fall.
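
To connect the fitted model to the prediction, the sketch below pulls out the intercept and slope with coef() and rebuilds the 5-hour prediction by hand. It also compares the widths of the two intervals. Variable names other than Hours, Score and model are illustrative.

```r
Hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Score <- c(50, 55, 60, 65, 65, 70, 75, 80, 85, 90)
model <- lm(Score ~ Hours)

# coef() returns the intercept and the slope of the fitted line
b <- coef(model)
intercept <- b[["(Intercept)"]]
slope     <- b[["Hours"]]

# Point prediction for 5 hours: by hand and with predict()
new_data <- data.frame(Hours = 5)
by_hand  <- intercept + slope * 5
by_model <- unname(predict(model, newdata = new_data))

# Compare interval widths at the 95% level
conf_int <- predict(model, newdata = new_data, interval = "confidence")
pred_int <- predict(model, newdata = new_data, interval = "prediction")
width_conf <- conf_int[, "upr"] - conf_int[, "lwr"]
width_pred <- pred_int[, "upr"] - pred_int[, "lwr"]

all.equal(by_hand, by_model)  # TRUE
width_pred > width_conf       # TRUE: individual scores vary more than the average
```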

Multiple Regression Model:

Here the score is predicted from two variables:
- Hours studied (Hours)
- Number of classes attended (Attendance)

✅ Step 1: Create the Data
# Sample data
Hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Attendance <- c(5, 6, 7, 7, 8, 8, 9, 9, 10, 10)
Score <- c(50, 52, 55, 58, 60, 63, 68, 74, 78, 85)

# Combine into a data frame
data <- data.frame(Hours, Attendance, Score)

✅ Step 2: Build the Multiple Linear Regression Model
# Fit the model
model <- lm(Score ~ Hours + Attendance, data = data)

# View summary
summary(model)
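
Once the model is fitted, it can be used for prediction in the same way as the simple model. The sketch below predicts the score of a hypothetical student (5 hours, 8 classes; these values are assumptions for illustration) and rebuilds the prediction from the coefficients.

```r
data <- data.frame(
  Hours      = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Attendance = c(5, 6, 7, 7, 8, 8, 9, 9, 10, 10),
  Score      = c(50, 52, 55, 58, 60, 63, 68, 74, 78, 85)
)
model <- lm(Score ~ Hours + Attendance, data = data)

# Hypothetical new student
new_student <- data.frame(Hours = 5, Attendance = 8)
predicted <- unname(predict(model, newdata = new_student))

# Same prediction rebuilt from the coefficients
b <- coef(model)
by_hand <- b[["(Intercept)"]] + b[["Hours"]] * 5 + b[["Attendance"]] * 8

all.equal(predicted, by_hand)  # TRUE

# How strongly the two predictors move together
predictor_cor <- cor(data$Hours, data$Attendance)
print(predictor_cor)
```

In this sample the two predictors are strongly correlated, so the individual coefficients in summary(model) should be interpreted with some caution.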

Textual Analysis

Easy Textual Analysis Question in RStudio:

Question:
Given the sentence:
"Data science is easy and fun. Data analysis is interesting."
Find how many times each word occurs (word frequency).

Step-by-step Solution in RStudio:

Step 1: Enter text into R
text <- "Data science is easy and fun. Data analysis is interesting."

Step 2: Split text into words
words <- unlist(strsplit(tolower(text), "\\W+"))
- tolower() converts all letters to lowercase.
- strsplit() splits the text into words at spaces and punctuation (\\W+ matches one or more non-word characters).
- unlist() turns the result into a plain character vector.

Step 3: Calculate frequency
word_freq <- table(words)
word_freq
- table() counts how many times each word appears.

Output:
words
   analysis         and        data        easy         fun interesting          is     science
          1           1           2           1           1           1           2           1

Explanation:
- The word "data" appears 2 times.
- The word "is" appears 2 times.
- All other words appear 1 time.
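
The frequency table is listed alphabetically. A possible next step, sketched here with illustrative names (sorted_freq, freq_df), is to sort it so the most frequent words come first and convert it to a data frame.

```r
text <- "Data science is easy and fun. Data analysis is interesting."
words <- unlist(strsplit(tolower(text), "\\W+"))
word_freq <- table(words)

# Most frequent words first
sorted_freq <- sort(word_freq, decreasing = TRUE)

# Convert to a data frame for easier viewing or export
freq_df <- as.data.frame(sorted_freq, stringsAsFactors = FALSE)
names(freq_df) <- c("word", "count")

head(freq_df, 2)   # "data" and "is", each with count 2
```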

Textual Analysis Question Type 2:

🔹 Step 1: Create Simple Text Data
text <- c("I love R", "R is boring", "Text mining is fun", "I hate this", "R is helpful")

🔹 Step 2: Count Words (Text Mining)
words <- tolower(unlist(strsplit(text, " ")))
table(words)
✅ This shows how often each word appears.

🔹 Step 3: Categorize as Positive or Negative
positive_words <- c("love", "fun", "helpful")
negative_words <- c("boring", "hate")

# grepl() checks each sentence for any word in the list ("|" means "or")
category <- ifelse(grepl(paste(positive_words, collapse = "|"), text), "Positive",
            ifelse(grepl(paste(negative_words, collapse = "|"), text), "Negative", "Neutral"))

data.frame(text, category)
✅ Each sentence is marked as Positive, Negative, or Neutral.

🔹 Step 4: Sentiment Score (Simple)
install.packages("syuzhet") # Only run once
library(syuzhet)

sentiment <- get_sentiment(text)

data.frame(text, sentiment)
✅ This gives a number (score) for each sentence:
- Positive score = positive sentiment (good mood)
- Negative score = negative sentiment (bad mood)
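
If the syuzhet package is not available, a rough score can be computed in base R from the word lists in Step 3. This is a simplified word-count sketch, not the method syuzhet uses, and the function name score_sentence is hypothetical.

```r
text <- c("I love R", "R is boring", "Text mining is fun", "I hate this", "R is helpful")
positive_words <- c("love", "fun", "helpful")
negative_words <- c("boring", "hate")

# Hypothetical helper: +1 for each positive word, -1 for each negative word
score_sentence <- function(sentence) {
  words <- unlist(strsplit(tolower(sentence), "\\W+"))
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

score <- sapply(text, score_sentence, USE.NAMES = FALSE)
data.frame(text, score)   # scores: 1, -1, 1, -1, 1
```

Because this version compares whole words, it avoids the partial matches that grepl() can produce (for example, "fun" inside "funny").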
