Dar Lecture 7

The document provides an overview of various data structures in R, including matrices, factors, lists, and data frames, along with their uses and examples. It explains how to create and manipulate these structures, including plotting scatter and contour plots, accessing elements, and handling categorical data. Additionally, it covers data frame operations such as adding/removing columns, subsetting, and dealing with missing data.

Uploaded by sharmahemant3610
Copyright © All Rights Reserved

Matrix, Factor, List and Data Frames
Compiled and Presented by:
Dr. Chetna Arora
Plotting a Scatter Plot
• A scatter plot is a type of graph used to display and analyze the
relationship between two numerical variables.
• Each point on the plot represents a pair of values, with one variable on the
x-axis (horizontal) and the other on the y-axis (vertical).
• Scatter plots are commonly used to visualize correlations or trends
between variables, where patterns like clusters, linear relationships, or
outliers may appear.

• For example, in R, you can create a scatter plot using the plot() function:

• x <- c(1, 2, 3, 4, 5)
• y <- c(2, 4, 5, 7, 10)
• plot(x, y)
• This will plot the points (1,2), (2,4), (3,5), (4,7), and (5,10) on a graph.
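A slightly fuller sketch of the same plot, offered as an illustration rather than part of the lecture. The title, axis labels, point style and fitted red line are assumptions added for readability. cor() measures the linear trend the plot shows, and for this data it comes out at about 0.985.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 7, 10)

# Scatter plot with a title, axis labels and filled points (pch = 19)
plot(x, y, main = "y against x", xlab = "x", ylab = "y", pch = 19)

# Overlay the least-squares line y = a + b * x to make the trend visible
fit <- lm(y ~ x)
abline(fit, col = "red")

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(x, y)
print(round(r, 3))
```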
Matrix:
• A matrix is a two-dimensional data structure (with rows and columns) where all elements
must be of the same type.
• Use: It is used for mathematical computations
like matrix multiplication, transformations,
and handling tabular data with uniform types.
• Example: m <- matrix(1:9, nrow=3, ncol=3)
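A short sketch of what the example matrix looks like and of the uses listed above. It shows that matrix() fills column by column by default, that mixing types coerces every element to one type, and that %*% is matrix multiplication while * is element-wise. The name `mixed` is only an illustration.

```r
m <- matrix(1:9, nrow = 3, ncol = 3)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# All elements share one type; mixing numbers and text turns everything into text
mixed <- matrix(c(1, 2, "a", "b"), 2, 2)
print(typeof(mixed))   # "character"

# %*% is matrix multiplication: multiplying by a vector of ones gives the row sums
print(m %*% c(1, 1, 1))  # 12 15 18 (as a 3 x 1 matrix)

# * is element-wise multiplication, not matrix multiplication
print(m * m)

# t() transposes: rows become columns
print(t(m))
```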
Matrices
• Let us create a matrix which is 3 rows by 4 columns and set all its
elements to 1.
> matrix(1, 3, 4)
• Use a vector to create a matrix 3 rows high and 3 columns wide.
• Step 1: Begin by creating a vector that has elements from 10 to 90
with an interval of 10.
> a <- seq(10, 90, by = 10)
• Step 2: Validate by printing the value of vector a.
> a
[1] 10 20 30 40 50 60 70 80 90
• Step 3: Call the matrix() function with the vector ‘a’, the number of rows
and the number of columns.
> matrix(a, 3, 3)
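The steps above can be sketched as follows, including the all-ones matrix from the first example. The byrow = TRUE variant is an addition not covered in the lecture. It fills the same values row by row, which produces the transpose of the default layout.

```r
# A single value is recycled into every cell: 3 rows x 4 columns of 1s
ones <- matrix(1, 3, 4)

a <- seq(10, 90, by = 10)

# Default: values fill the matrix column by column
by_col <- matrix(a, 3, 3)
print(by_col)
#      [,1] [,2] [,3]
# [1,]   10   40   70
# [2,]   20   50   80
# [3,]   30   60   90

# byrow = TRUE fills row by row instead
by_row <- matrix(a, nrow = 3, ncol = 3, byrow = TRUE)
print(by_row)
#      [,1] [,2] [,3]
# [1,]   10   20   30
# [2,]   40   50   60
# [3,]   70   80   90
```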
Reshape the vector itself into a matrix using the dim() function.
• Step 1: Begin by creating a vector that has elements from 10 to 90
with an interval of 10.
> a <- seq(10, 90, by = 10)
• Step 2: Validate by printing the value of vector a.
> a
[1] 10 20 30 40 50 60 70 80 90
• Step 3: Assign new dimensions to vector a by passing the vector c(3, 3),
that is, 3 rows and 3 columns.
> dim(a) <- c(3, 3)
• Step 4: Print the values of a. You will notice that the values
have shifted to form 3 rows by 3 columns. The vector is no longer one
dimensional. It has been converted into a two-dimensional matrix that
is 3 rows high and 3 columns wide.
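A minimal sketch confirming what dim() does. The same nine values gain a dim attribute, and the result matches matrix(a, 3, 3). The second part, an addition to the lecture, shows that the new dimensions must multiply to the vector's length. The variable `b` and the message string are illustrative only.

```r
a <- seq(10, 90, by = 10)
print(is.matrix(a))   # FALSE: a plain vector has no dim attribute

dim(a) <- c(3, 3)     # reshape in place: same 9 values, now 3 x 3
print(is.matrix(a))   # TRUE
print(dim(a))         # 3 3

# Same layout as matrix() builds from the original vector (column by column)
print(all(a == matrix(seq(10, 90, by = 10), 3, 3)))  # TRUE

# c(2, 4) asks for 8 cells to hold 9 values, so R raises an error
b <- seq(10, 90, by = 10)
res <- tryCatch({ dim(b) <- c(2, 4); "reshaped" },
                error = function(e) "error: dims do not match length")
print(res)
```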
Matrix Access
Access the elements of a 3 x 4 matrix
Step 1: Create a matrix, ‘mat’, 3 rows high and 4
columns wide using a vector.
> x <- 1:12
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> mat <- matrix(x, 3, 4)
> mat
Step 2: Access the element present in the second
row and third column of the matrix, ‘mat’.
> mat[2, 3]
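A sketch of the other common ways to index the same matrix. The [row, column] form from Step 2 returns 8 here, because matrix() fills column by column. Leaving an index empty selects a whole row or column, and a logical condition selects matching elements. The final assignment line is an addition showing that the same syntax also writes values.

```r
x <- 1:12
mat <- matrix(x, 3, 4)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12

print(mat[2, 3])      # 8: row 2, column 3
print(mat[2, ])       # whole second row: 2 5 8 11
print(mat[, 3])       # whole third column: 7 8 9
print(mat[1:2, 3:4])  # a 2 x 2 sub-matrix
print(mat[mat > 9])   # logical indexing: 10 11 12

mat[2, 3] <- 0L       # the same [row, column] syntax assigns a new value
```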
Contour Plot
• A contour plot is used to represent three-dimensional
data in two dimensions, where lines (called contours)
connect points of equal value.
• In such plots, variations in values are essential because
they help distinguish different levels on the plot.
• If all values in the matrix are the same (as in the example below,
where all values are 1), there is no variation to show, so the
contour plot cannot display meaningful contours.
Contour Plot
• Create a matrix, ‘mat’, which is 9 rows high and 9 columns wide
and assign the value ‘1’ to all its elements.
> mat <- matrix(1, 9, 9)
• (A contour plot of this matrix shows nothing, because all the values are
equal to 1.)
> mat[3, 3] <- 0
(Now there is variation in the matrix, so the contour plot has
something to show.)
• Plot the contour chart using the contour() function.
• The contour() function creates a contour plot or adds contour
lines to an existing plot.
> contour(mat)
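A sketch of a contour plot on a surface with smooth variation, built with outer(). The grid range of -2 to 2 and the surface x^2 + y^2 are illustrative assumptions. Its contours are concentric circles. contourLines() returns the same lines as data, which is one way to check that a matrix has enough variation to produce any contours.

```r
# The lecture's matrix: flat except for one dip at [3, 3]
mat <- matrix(1, 9, 9)
mat[3, 3] <- 0
print(length(unique(as.vector(mat))))  # 2 distinct values, so contours can be drawn

# A smoother surface: z[i, j] = x[i]^2 + y[j]^2 (a bowl shape)
x <- seq(-2, 2, length.out = 41)
y <- seq(-2, 2, length.out = 41)
z <- outer(x, y, function(a, b) a^2 + b^2)

# Contour lines of the bowl are concentric circles around (0, 0)
contour(x, y, z, nlevels = 6, main = "Contours of x^2 + y^2")

# contourLines() computes the lines as a list instead of drawing them
lines_found <- contourLines(x, y, z, nlevels = 6)
print(length(lines_found))
```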
FACTORS
A factor is used to represent categorical data in R. It
stores both the actual values and the levels or
categories.
Factors are particularly useful for statistical modeling
and plotting, where categories are involved.
Use: Helps manage and interpret categorical data
(e.g., gender, color, types of products) efficiently.
• Example:
• f <- factor(c("Male", "Female", "Female", "Male"))
• In R, a factor is a special data structure used to store categorical data.
• Categorical data refers to data that can take on a limited, fixed number
of values, such as "Male" or "Female", "Red", "Green", or "Blue", or
levels like "Low", "Medium", and "High".
• Factors are particularly useful in statistical analysis and data
visualization because R treats the different categories (or levels)
distinctly.
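A minimal sketch of how R treats the categories in the factor example. By default the levels are the unique values sorted alphabetically. The ordered factor is an addition illustrating the "Low", "Medium", "High" case. When the categories have a natural order, you list the levels explicitly and set ordered = TRUE so that comparisons are meaningful.

```r
f <- factor(c("Male", "Female", "Female", "Male"))

print(levels(f))   # "Female" "Male": unique categories, sorted alphabetically
print(nlevels(f))  # 2
print(table(f))    # count per category: Female 2, Male 2

# Ordered categories: give the level order explicitly
sizes <- factor(c("Low", "High", "Medium", "Low"),
                levels = c("Low", "Medium", "High"),
                ordered = TRUE)
print(sizes < "High")  # TRUE FALSE TRUE TRUE
```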

• Why is c() used in creating a factor?
• The c() function (short for "combine" or "concatenate") groups individual
elements into a vector. In R, factors are created by first using c() to
combine the categorical values (like "Male" and "Female") into a vector,
and then wrapping that vector in the factor() function to convert it into
a factor.
Why use factors instead of just vectors?
• Factors handle categorical data more efficiently:
• In a character vector, R treats each element as a separate character
string. A factor instead records the unique categories (levels) once and
stores each element as an integer code pointing to its level, which makes
the data easier to handle and store.
• c() is used to combine individual categorical elements into a
vector.
• factor() is used to convert that vector into a factor, which is
a structure for handling categorical data efficiently.
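The two steps described above can be sketched as follows. The sketch also shows the integer codes a factor stores internally. The variable `v` is only an illustration, and as.character() turns the codes back into the original labels.

```r
# Step 1: c() combines the individual values into a character vector
v <- c("Male", "Female", "Female", "Male")
print(class(v))       # "character"

# Step 2: factor() converts that vector into a factor
f <- factor(v)
print(class(f))       # "factor"

# Internally: integer codes plus a "levels" attribute
print(typeof(f))      # "integer"
print(as.integer(f))  # 2 1 1 2  (1 = "Female", 2 = "Male")

# as.character() recovers the original labels
print(as.character(f))
```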
List
• A list is a collection of elements of different types,
such as numbers, strings, vectors, or even other
lists. It is a more flexible structure than a vector.
• Use: It’s useful when dealing with heterogeneous
data (i.e., data with different types) or when
returning multiple objects from a function.
• Example:
• l <- list(name="Alice", age=25, scores=c(89, 92, 95))
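A sketch of working with the example list. $ and [[ ]] extract an element itself, while a single [ ] returns a smaller list. The helper `score_stats()` is hypothetical. It illustrates the "returning multiple objects from a function" use by bundling several results into one list.

```r
l <- list(name = "Alice", age = 25, scores = c(89, 92, 95))

print(l$name)          # "Alice": $ extracts one element by name
print(l[["scores"]])   # 89 92 95: [[ ]] also extracts the element itself
print(class(l["age"])) # "list": single [ ] returns a sub-list
print(mean(l$scores))  # 92

# Hypothetical helper: returns several results at once as a list
score_stats <- function(s) {
  list(mean = mean(s), best = max(s), n = length(s))
}
stats <- score_stats(l$scores)
print(stats$best)      # 95
```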
Data Frames
• A data frame is one of the most commonly used data structures in R for
storing tabular data. It is similar to a table or spreadsheet, where data is
organized into rows and columns.
• A data frame can store data of different types (numeric, character, logical,
etc.) in each column, making it flexible and essential for data analysis tasks.

• Key Characteristics of Data Frames
• Columns can have different data types: Each column can store data of a
different type (numeric, character, logical, factor, etc.).
• Rows represent observations: Each row represents an individual observation
or record.
• Columns represent variables: Each column corresponds to a variable or
feature.
• Named columns and rows: Columns have names (variable names), and rows
can have row names (although it's not necessary).
Creating a Data Frame
The data.frame() function is used to create a data frame.
Example of creating a data frame
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Gender = c("Female", "Male", "Male"),
  stringsAsFactors = FALSE)
# stringsAsFactors = FALSE is optional: it prevents character vectors from
# being converted to factors (and has been the default since R 4.0.0)
> df
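A short sketch that checks the per-column types of the example data frame. The second part is an addition showing that every column must have one value per row. A column with the wrong number of values makes data.frame() stop with an error, and the message string used here is only illustrative.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Gender = c("Female", "Male", "Male"),
                 stringsAsFactors = FALSE)

# Each column keeps its own type: character, numeric, character
print(sapply(df, class))
str(df)

# Columns of unequal length (3 names, 2 ages) are rejected
res <- tryCatch(data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30)),
                error = function(e) "error: differing number of rows")
print(res)
```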
Accessing Data in a Data Frame

• You can access the elements of a data frame


using different methods:
• Access by column name:
• df$Age :Access the "Age" column
• df["Age"] : Access the "Age" column as a data
frame
• df[, "Age"] : Access the "Age" column as a
vector
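A sketch that checks what each column-access form returns, assuming R 4.0.0 or later, where strings stay character by default. The drop = FALSE and [[ ]] forms are additions to the lecture's list. [[ ]] is useful when the column name is stored in a variable, shown here as the hypothetical `col`.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Gender = c("Female", "Male", "Male"))

print(class(df$Age))                     # "numeric": a plain vector
print(class(df["Age"]))                  # "data.frame" with one column
print(class(df[, "Age"]))                # "numeric": a single column is dropped to a vector
print(class(df[, "Age", drop = FALSE]))  # "data.frame": drop = FALSE keeps the table shape

col <- "Age"
print(df[[col]])                         # 25 30 35: [[ ]] accepts a name held in a variable
```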
Access by row and column index

• Use [] to access elements by their row and


column positions:
df[1, 2] :Access the element at the first row
and second column (Age of Alice)
• df[1, ] : Access the entire first row
• df[, 2] :Access the entire second column (Age)
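A sketch of position-based access on the same example data, assuming R 4.0.0 or later. A single row comes back as a one-row data frame, and a single column comes back as a vector. The name-based column selection and the negative index are additions showing two other common forms.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Gender = c("Female", "Male", "Male"))

print(df[1, 2])                   # 25: first row, second column (Alice's age)
print(df[1, ])                    # first row, as a one-row data frame
print(df[, 2])                    # 25 30 35: second column as a vector
print(df[2:3, c("Name", "Age")])  # rows 2 and 3, columns selected by name
print(df[-1, ])                   # a negative index drops the first row
```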
Adding and Removing Columns

• Adding a column: You can add new columns


by assigning values to a new column name:
df$Salary <- c(50000, 60000, 55000)
• Removing a column: You can remove columns
by assigning NULL to the column:
• df$Gender <- NULL :Remove the Gender
column
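A sketch of adding and removing columns. A new column needs one value per row, or a single value that R recycles to every row. A vector of the wrong length, such as 2 values for 3 rows, raises an error. The Active and AgeNextYear columns are hypothetical and only illustrate recycling and computed columns.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Gender = c("Female", "Male", "Male"))

df$Salary <- c(50000, 60000, 55000)  # one value per row
df$Active <- TRUE                    # hypothetical column: one value recycled to all rows
df$AgeNextYear <- df$Age + 1         # hypothetical column computed from an existing one

df$Gender <- NULL                    # assigning NULL removes the column

print(names(df))  # "Name" "Age" "Salary" "Active" "AgeNextYear"
```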
Modifying Data in a Data Frame

• You can modify existing data by directly


assigning new values.
• # Changing the age of "Alice"
df$Age[1] <- 26
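Position-based assignment such as df$Age[1] only works while Alice stays in row 1. A minimal sketch of the condition-based alternative follows, along with one way to update several matching rows at once. The "+1 for anyone older than 28" rule is purely illustrative.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35))

df$Age[1] <- 26                        # by position
df$Age[df$Name == "Alice"] <- 27       # by condition: finds Alice in any row

# Update every matching row at once (illustrative rule)
older <- df$Age > 28
df$Age[older] <- df$Age[older] + 1

print(df$Age)  # 27 31 36
```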
Subsetting Data Frames

• Subsetting allows you to extract specific rows


or columns based on conditions.
• Extracting specific rows:
• subset(df, Age > 30) :Extract rows where Age is
greater than 30
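A sketch of subset() on the example data, assuming R 4.0.0 or later. Note that Age > 30 excludes Bob, whose age is exactly 30. Combining conditions with & and choosing columns with select, plus the equivalent bracket form, are additions to the lecture's example.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Gender = c("Female", "Male", "Male"))

older <- subset(df, Age > 30)  # only Charlie: 30 is not greater than 30
print(older)

# & (and) / | (or) combine conditions; select keeps chosen columns
print(subset(df, Age >= 30 & Gender == "Male", select = c(Name, Age)))

# Equivalent bracket form of the first subset
print(df[df$Age > 30, ])
```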
Summary statistics
summary() provides a summary of each column; for numeric columns this includes
the minimum, quartiles, median, mean and maximum:
summary(df)
Aggregating data: You can use functions like aggregate() or tapply() to
compute group-level statistics.
aggregate(Salary ~ Age, data = df, FUN = mean): calculate the mean salary for
each age group
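In the lecture's data frame every age is unique, so each "group" holds a single row. The sketch below uses a hypothetical `staff` table with a Dept column, where the names, departments and salaries are invented for illustration, so that the groups actually contain several rows. It shows aggregate() and tapply() computing the same group means.

```r
# Hypothetical data for illustration only
staff <- data.frame(Name   = c("Alice", "Bob", "Charlie", "Dana"),
                    Dept   = c("Sales", "IT", "IT", "Sales"),
                    Salary = c(50000, 60000, 55000, 52000))

print(summary(staff$Salary))  # Min, 1st Qu., Median, Mean, 3rd Qu., Max

# Mean salary per department, returned as a data frame (IT 57500, Sales 51000)
by_dept <- aggregate(Salary ~ Dept, data = staff, FUN = mean)
print(by_dept)

# tapply() gives the same group means as a named vector
dept_means <- tapply(staff$Salary, staff$Dept, mean)
print(dept_means)
```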
Merging Data Frames
You can combine data frames using functions like merge(), rbind(), and
cbind().
Merging by common columns:
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(1, 2, 3), Age = c(25, 30, 35))
merged_df <- merge(df1, df2, by = "ID")
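A sketch that runs the merge above and then shows two variations not in the lecture. By default merge() keeps only IDs found in both tables, and all = TRUE keeps every ID and fills the gaps with NA. rbind() stacks rows with matching column names, and cbind() adds columns side by side. The `df3`, `more` and Score values are hypothetical.

```r
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(1, 2, 3), Age = c(25, 30, 35))

merged_df <- merge(df1, df2, by = "ID")  # columns: ID, Name, Age
print(merged_df)

# Hypothetical table whose IDs only partly overlap with df1
df3 <- data.frame(ID = c(2, 4), Age = c(30, 40))
inner <- merge(df1, df3, by = "ID")              # only ID 2 matches
outer <- merge(df1, df3, by = "ID", all = TRUE)  # IDs 1-4, NA where missing
print(outer)

more <- data.frame(ID = 4, Name = "Dana")          # hypothetical extra row
stacked <- rbind(df1, more)                        # 4 rows
side_by_side <- cbind(df1, Score = c(89, 92, 95))  # hypothetical Score column
```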
Dealing with Missing Data in Data Frames

• R provides functions to handle missing data (NA) in


data frames.
• Checking for missing values:
is.na(df) # Check for missing values
• Removing missing values:
df_clean <- na.omit(df) :Remove rows with missing
values
• Replacing missing values:
df$Age[is.na(df$Age)] <- mean(df$Age, na.rm =
TRUE) :Replace missing Age values with the mean age
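The three techniques above, applied to a small hypothetical data frame that has one missing Age and one missing Salary. colSums(is.na(df)) and complete.cases() are additions that summarize where the NAs are. Mean replacement uses only the observed ages: (25 + 35 + 30) / 3 = 30.

```r
# Hypothetical data with missing values
df <- data.frame(Name   = c("Alice", "Bob", "Charlie", "Dana"),
                 Age    = c(25, NA, 35, 30),
                 Salary = c(50000, 60000, NA, 55000))

print(colSums(is.na(df)))  # NAs per column: Name 0, Age 1, Salary 1
print(complete.cases(df))  # TRUE FALSE FALSE TRUE

df_clean <- na.omit(df)    # drops every row that contains an NA
print(nrow(df_clean))      # 2

# Replace missing ages with the mean of the observed ages
df$Age[is.na(df$Age)] <- mean(df$Age, na.rm = TRUE)
print(df$Age)              # 25 30 35 30
```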
Here are some commonly used functions to manipulate and summarize data frames:
• head(df): View the first few rows of the data frame.
• tail(df): View the last few rows of the data frame.
• nrow(df): Get the number of rows in the data frame.
• ncol(df): Get the number of columns in the data frame.
• names(df): Get or set the column names of the data
frame.
• dim(df): Get the dimensions (rows and columns) of the
data frame.
• str(df): Display the structure of the data frame (data
types and number of rows/columns).
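A sketch of the functions listed above, run on the lecture's three-row example. head() and tail() take an optional row count, with a default of 6. The last lines use names() to rename a column, and the new name "Sex" is only an illustration.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Gender = c("Female", "Male", "Male"))

print(head(df, 2))  # first 2 rows
print(tail(df, 1))  # last row
print(nrow(df))     # 3
print(ncol(df))     # 3
print(dim(df))      # 3 3
str(df)             # column names, types and first values

names(df)[3] <- "Sex"  # names() can also set a column name
print(names(df))       # "Name" "Age" "Sex"
```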
Converting Other Data Structures to Data Frames

• Converting a matrix to a data frame:


• mat <- matrix(1:9, nrow = 3, ncol = 3)
• df_from_matrix <- as.data.frame(mat)
• Converting a list to a data frame:
• lst <- list(Name = c("Alice", "Bob"), Age = c(25,
30))
• df_from_list <- as.data.frame(lst)
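A sketch of the two conversions, assuming R 4.0.0 or later, where strings stay character. An unnamed matrix gets the default column names V1, V2, V3, so setting colnames() first gives meaningful names (the names "a", "b", "c" are illustrative). For the list, each element becomes a column, so all elements should have the same length.

```r
mat <- matrix(1:9, nrow = 3, ncol = 3)
df_from_matrix <- as.data.frame(mat)
print(names(df_from_matrix))      # "V1" "V2" "V3": default names

colnames(mat) <- c("a", "b", "c") # illustrative column names
print(names(as.data.frame(mat)))  # "a" "b" "c"

lst <- list(Name = c("Alice", "Bob"), Age = c(25, 30))
df_from_list <- as.data.frame(lst)
print(dim(df_from_list))          # 2 2: each list element became a column
```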
• A data frame is an essential and flexible data
structure in R for handling tabular data. It
allows for efficient storage and manipulation
of data, providing easy access to rows and
columns. Data frames form the foundation for
many data analysis tasks in R, and
understanding how to create, access, and
manipulate them is crucial for working with
data in R.
