INTRODUCTION TO
DATA SCIENCE &
R PROGRAMMING
K.NAGA JYOTHI MA,M.Tech(DEPARTMENT OF COMPUTER SCIENCE)
UNIT I: Defining Data Science and Big data, Benefits and Uses, facets of Data,
Data Science Process. History and Overview of R, Getting Started with R, R
Nuts and Bolts.
DATA SCIENCE
Data science is the study of data that helps us derive useful insight for business decision making.
Data Science is all about using tools, techniques, and creativity to uncover insights hidden within
data. It combines math, computer science, and domain expertise to tackle real-world challenges in a
variety of fields.
Data science processes raw data to solve business problems and even to make predictions about
future trends or requirements. For example, from the huge volume of raw data held by a company,
data science can help answer the following questions:
What do customers want?
How can we improve our services?
What will the upcoming trend in sales be?
How much stock is needed for the upcoming festival?
In short, data science empowers industries to make smarter, faster, and more informed decisions.
Finding such patterns and insights requires expertise in the relevant domain.
Data science involves these key steps:
Data Collection: Gathering raw data from various sources, such as databases, sensors, or user
interactions.
Data Cleaning: Ensuring the data is accurate, complete, and ready for analysis.
Data Analysis: Applying statistical and computational methods to identify patterns, trends, or
relationships.
Data Visualization: Creating charts, graphs, and dashboards to present findings clearly.
Decision-Making: Using insights to inform strategies, create solutions, or predict outcomes.
IMPORTANCE OF DATA SCIENCE:
In a world flooded with user-data, data science is crucial for driving progress and innovation in
every industry. Here are some key reasons why it is so important:
Helps Business in Decision-Making: By analyzing data, businesses can understand trends and
make informed choices that reduce risks and maximize profits.
Improves Efficiency: Organizations can use data science to identify areas where they can save
time and resources.
Personalizes Experiences: Data science helps create customized recommendations and offers
that improve customer satisfaction.
Predicts the Future: Businesses can use data to forecast trends, demand, and other important
factors.
Drives Innovation: New ideas and products often come from insights discovered through data
science.
Benefits Society: Data science improves public services like healthcare, education, and
transportation by helping allocate resources more effectively.
Real Life Example of Data Science
Examples of data science in use are all around you: social media, medicine, and preparing strategy
for cricket or FIFA tournaments by analyzing past matches. Here are some more real-life examples:
Social Media Recommendation:
Have you ever wondered why the Instagram Reels you see always match your interests? These
platforms use data science to analyze your past activity (likes, comments, watch time, etc.) and
create personalized recommendations that serve content matching your interests.
Early Diagnosis of Disease:
Data science can predict the risk of conditions like diabetes or heart disease by analyzing a
patient’s medical records and lifestyle habits. This allows doctors to act early and improve lives. In
the future, it may help doctors detect diseases before symptoms even start to appear, for example
by predicting a tumor or cancer at a very early stage. Data science uses medical history and image
data for such predictions.
E-commerce recommendation and Demand Forecast:
E-commerce platforms like Amazon or Flipkart use data science to enhance the shopping
experience. By analyzing your browsing history, purchase behavior, and search patterns, they
recommend products based on your preferences. They can also predict demand for products
by studying past sales trends, seasonal patterns, etc.
APPLICATIONS OF DATA SCIENCE:
Data science has a wide range of applications across various industries, transforming how they
operate and deliver results. Here are some examples:
Finance: It helps detect fraudulent transactions, manage risks, and provide personalized financial advice.
Retail: Businesses use data science to understand customer behavior, recommend products, optimize
inventory, and improve supply chains.
Technology: Data science powers innovations like search engines, virtual assistants, and recommendation
systems.
Transportation: It enables route optimization, traffic management, and predictive maintenance for vehicles.
Education: Data science helps in designing personalized learning experiences, tracking student performance,
and improving administrative efficiency.
Important Data Science Skills
Data Scientists need a mix of technical and soft skills to excel in this domain. To start with data
science, it’s important to learn the basics like Mathematics and Basic programming skills. Here are
some essential skills for a successful career in data science:
Programming: Proficiency in programming languages like Python, R, or SQL is crucial for
analyzing and processing data effectively.
Statistics and Mathematics: A strong foundation in statistics and linear algebra helps in
understanding data patterns and building predictive models.
Machine Learning: Knowledge of machine learning algorithms and frameworks is key to
creating intelligent data-driven solutions.
Data Visualization: The ability to present data insights through tools like Tableau, Power BI,
or Matplotlib ensures findings are clear and actionable.
Data Wrangling: Skills in cleaning, transforming, and preparing raw data for analysis are vital
for maintaining data quality.
Big Data Tools: Familiarity with tools like Hadoop, Spark, or cloud platforms helps in handling
large datasets efficiently.
Critical Thinking: Analytical skills to interpret data and solve problems creatively are essential
for uncovering actionable insights.
Communication: The ability to explain complex data findings in simple terms to stakeholders is
a valuable asset.
BIG DATA:
Big Data refers to the vast volumes of data generated at high velocity from a variety of sources. This
data is characterized by the three V’s: Volume, Velocity, and Variety.
1. Volume: Big Data involves large datasets that are too complex for traditional data processing
tools to handle. These datasets can range from terabytes to petabytes of information.
2. Velocity: Big Data is generated in real-time or near real-time, requiring fast processing to extract
meaningful insights.
3. Variety: The data comes in multiple forms, including structured data (like databases), semi-
structured data (like XML files), and unstructured data (like text, images, and videos).
Big Data’s primary role is to collect and store this massive amount of information efficiently.
Technologies such as Hadoop, Apache Spark, and NoSQL databases like MongoDB are commonly
used to manage and process Big Data.
Key Differences between Big Data and Data Science
While Big Data and Data Science are interrelated, they serve different purposes and require
different skill sets.
Aspect | Big Data | Data Science
Definition | Handling and processing vast amounts of data | Extracting insights and knowledge from data
Objective | Efficient storage, processing, and management of data | Analyzing data to inform decisions and predict trends
Focus | Volume, velocity, and variety of data | Analytical methods, models, and algorithms
Primary Tasks | Collection, storage, and processing of data | Data analysis, modeling, and interpretation
Tools/Technologies | Hadoop, Spark, NoSQL databases (e.g., MongoDB) | Python, R, TensorFlow, Scikit-Learn
Data Types | Structured, semi-structured, and unstructured data | Processed and cleaned data for analysis
Outcome | Accessible data repositories for analysis | Actionable insights, predictive models
Skill Set | Data engineering, distributed computing | Statistical analysis, machine learning, programming
Typical Roles | Data Engineers, Big Data Analysts | Data Scientists, Machine Learning Engineers
Applications | Real-time data processing, large-scale data storage | Predictive analytics, data-driven decision making
Key Techniques | Distributed computing, data warehousing | Statistical modeling, machine learning algorithms
How Big Data and Data Science Complement Each Other
Despite their differences, Big Data and Data Science are complementary fields. Big Data provides
the foundation by collecting and storing vast amounts of information. Without this foundational
layer, Data Science would lack the raw material needed for analysis.
Conversely, Data Science adds value to Big Data by analyzing and interpreting the data. The
insights derived from Data Science can help businesses leverage Big Data more effectively,
uncovering trends and patterns that can inform strategic decisions.
Facets of Data
• Big data and data science work with very large amounts of data. This data is of various types,
and the main categories are as follows:
a) Structured
b) Unstructured
c) Natural language
d) Machine-generated
e) Graph-based
f) Audio, video and images
g) Streaming
a) Structured Data
• Structured data is arranged in rows and columns. This makes it easy for applications to retrieve
and process the data. A database management system is used to store structured data.
• The term structured data refers to data that is identifiable because it is organized in a structure.
The most common form of structured data or records is a database where specific information is
stored based on a methodology of columns and rows.
• An Excel table is an example of structured data.
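As a small illustration of row-and-column data, the sketch below builds a structured table in R as a data frame and retrieves records by column value. The table name customers and all of its values are made up for this example.

```r
# Hypothetical customer table: each row is a record, each column a field
customers <- data.frame(
  id    = c(1L, 2L, 3L),
  name  = c("Asha", "Ravi", "Meena"),
  spend = c(250.0, 90.5, 410.0),
  stringsAsFactors = FALSE
)

# Because the structure is fixed, retrieval by column is straightforward
big_spenders <- customers[customers$spend > 200, ]
print(big_spenders$name)

# Column-wise summaries work the same way
total_spend <- sum(customers$spend)
print(total_spend)
```

The same filter-by-column pattern is what a database query does on a structured table.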
b) Unstructured Data
• Unstructured data is data that does not follow a specified format. Rows and columns are not used
for unstructured data, so it is difficult to retrieve required information. Unstructured data
has no identifiable structure.
• The unstructured data can be in the form of Text: (Documents, email messages,
c) Natural Language
• Natural language is a special type of unstructured data.
• Natural language processing enables machines to recognize characters, words and sentences,
then apply meaning and understanding to that information. This helps machines to understand
language as humans do.
• For natural language processing to help machines understand human language, it must go through
speech recognition, natural language understanding and machine translation. It is an iterative
process composed of several layers of text analysis.
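The first layer of text analysis, recognizing sentences and words, can be sketched in base R as follows. This covers only tokenization and word counting, not language understanding, and the sample sentence is invented for illustration.

```r
# Illustrative text (made up for this example)
text <- "Data science turns raw data into insight. Data is everywhere."

# Split into sentences at a period followed by optional spaces
sentences <- strsplit(text, "\\.\\s*")[[1]]

# Lower-case, strip punctuation, then split into words
clean <- gsub("[[:punct:]]", "", tolower(text))
words <- strsplit(clean, "\\s+")[[1]]

# Count how often each word occurs
word_counts <- table(words)
print(length(sentences))
print(word_counts[["data"]])
```

Real natural language processing builds further layers (grammar, meaning, context) on top of tokens like these.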
d) Machine-Generated Data
• Machine-generated data is information that is created without human interaction as a result of
a computer process or application activity. This means that data entered manually by an end-user
is not considered machine-generated.
• Machine data contains a definitive record of all activity and behavior of our customers, users,
transactions, applications, servers, networks, factory machinery and so on.
e) Graph-based or Network Data
• Graphs are data structures that describe relationships and interactions between entities in complex
systems. In general, a graph contains a collection of entities called nodes and another collection of
interactions between pairs of nodes called edges.
• Nodes represent entities, which can be of any object type that is relevant to our problem domain.
By connecting nodes with edges, we will end up with a graph (network) of nodes.
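One simple way to hold such a graph in base R is an edge list plus an adjacency matrix, as in the sketch below. The four nodes and their "follows" relationships are hypothetical, and dedicated graph packages exist for larger problems.

```r
# Hypothetical "follows" relationships between four users (the nodes)
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "D"),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(edges$from, edges$to)))

# Adjacency matrix: adj[i, j] is 1 when there is an edge from node i to node j
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1

# Out-degree: how many edges leave each node
out_degree <- rowSums(adj)
print(adj)
print(out_degree)
```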
f) Audio, Image and Video
• Audio, image and video are data types that pose specific challenges to a data scientist. Tasks that
are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for
computers.
• The terms audio and video commonly refer to the time-based media storage formats for
sound/music and moving-picture information. Digital audio and video recordings, which are encoded
and decoded by audio and video codecs, can be uncompressed, losslessly compressed or lossy
compressed depending on the desired quality and use case.
g) Streaming Data
• Streaming data is data that is generated continuously by thousands of data sources, which typically
send in the data records simultaneously and in small sizes (on the order of kilobytes).
• Streaming data includes a wide variety of data such as log files generated by customers using
your mobile or web applications, ecommerce purchases, in-game player activity, information from
social networks, financial trading floors or geospatial services and telemetry from connected
devices or instrumentation in data centers.
Difference between Structured and Unstructured Data
DATA SCIENCE PROCESS LIFE CYCLE
Some steps are necessary for any of the tasks that are being done in the field of data science to
derive any fruitful results from the data at hand.
Data Collection – After formulating a problem statement, the main task is to collect data
that can help us in our analysis and manipulation. Sometimes data is collected by conducting
a survey, and at other times by web scraping.
Data Cleaning – Most of the real-world data is not structured and requires cleaning and
conversion into structured data before it can be used for any analysis or modeling.
Exploratory Data Analysis – This is the step in which we try to find the hidden patterns in the
data at hand. Also, we try to analyze different factors which affect the target variable and the
extent to which it does so.
Model Building – Many machine learning algorithms and techniques have been developed that
can identify complex patterns in the data, a task that would be very tedious for a human to do
by hand.
Model Deployment – After a model is developed and gives good results on the holdout or
real-world dataset, we deploy it and monitor its performance. This is the main part, where what
we learned from the data is applied in real-world applications and use cases.
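The steps above can be sketched end to end in a few lines of R. This is only a toy walk-through under stated assumptions: R's built-in airquality data set stands in for collected data, a linear model stands in for the machine learning step, and the 80/20 split and seed are arbitrary choices.

```r
# 1. Data collection: use R's built-in airquality data set as a stand-in
raw <- airquality

# 2. Data cleaning: drop rows with missing values
clean <- na.omit(raw)

# 3. Exploratory data analysis: how strongly is ozone related to temperature?
ozone_temp_cor <- cor(clean$Ozone, clean$Temp)

# 4. Model building: fit on a training subset, keep a holdout for evaluation
set.seed(42)  # reproducible split
train_idx <- sample(nrow(clean), size = floor(0.8 * nrow(clean)))
train   <- clean[train_idx, ]
holdout <- clean[-train_idx, ]
model <- lm(Ozone ~ Temp + Wind, data = train)

# 5. Evaluation before deployment: error on data the model has not seen
pred <- predict(model, newdata = holdout)
rmse <- sqrt(mean((holdout$Ozone - pred)^2))
print(ozone_temp_cor)
print(rmse)
```

In practice each step is far more involved, and deployment also requires monitoring the model on new data over time.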
Data Science Process Life Cycle
Key Components of Data Science Process
Data science is a vast field. To get the best out of the data at hand, one has to apply multiple
methodologies and use different tools, while making sure that the integrity of the data remains
intact throughout the process and keeping data privacy in mind. The main components of
data science are:
Data Analysis – Advanced deep learning and other complex methods are not always needed to
derive patterns from the data at hand. For this reason, before moving on to the modeling part,
we first perform an exploratory data analysis to get a basic idea of the data and the patterns
in it. This gives us a direction to work in if we want to apply more complex analysis methods
to the data.
Statistics – Many real-life datasets approximately follow a normal distribution. When we
already know that a particular dataset follows some known distribution, many of its properties
can be derived directly from that distribution.
Data Engineering – When we deal with a large amount of data, we have to make sure that the
data is kept safe from online threats and that it is easy to retrieve and modify. Data engineers
play a crucial role in ensuring that the data is used efficiently.
Advanced Computing
o Machine Learning – Machine learning has opened new horizons, helping us build
advanced applications and methodologies in which machines become more efficient,
provide a personalized experience to each individual, and perform in an instant tasks
that earlier required heavy human labor and a great deal of time.
o Deep Learning – This is a subfield of machine learning, and therefore of artificial
intelligence, that is suited to more complex problems. High computing power and
huge corpora of data have led to the emergence of this field in data science.
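To make the Statistics component above concrete: once a dataset is assumed to follow a normal distribution, its mean and standard deviation are enough to answer many questions directly. The sketch below uses R's pnorm() and qnorm(); the exam-score scenario and its parameter values are hypothetical.

```r
# Assumed example: exam scores that are normally distributed
mu    <- 70   # mean (hypothetical)
sigma <- 10   # standard deviation (hypothetical)

# Share of values within one and two standard deviations of the mean
within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
within_2sd <- pnorm(mu + 2 * sigma, mu, sigma) - pnorm(mu - 2 * sigma, mu, sigma)

# Score below which 90% of values fall
p90 <- qnorm(0.90, mean = mu, sd = sigma)

print(round(within_1sd, 4))
print(round(within_2sd, 4))
print(round(p90, 2))
```

Roughly 68% of values fall within one standard deviation and about 95% within two, without inspecting any individual data point.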
HISTORY AND OVERVIEW OF R
R is an open-source programming language used for statistical computing and data analysis. It is
an important tool for data science and is popular among statisticians and data scientists.
R includes powerful tools for creating aesthetic and insightful visualizations.
Facilitates data extraction, transformation and loading with interfaces for SQL, spreadsheets, and
more.
Provides essential packages for cleaning and transforming data.
Enables the application of ML algorithms to predict future events.
Supports analysis of unstructured data through NoSQL database interfaces.
R PROGRAMMING LANGUAGE
R is a widely used tool for machine learning, statistics, and data analysis, and it makes it easy to
create objects, functions, and packages. Designed by Ross Ihaka and Robert Gentleman at
the University of Auckland and now developed by the R Core Team, R is platform-independent
and open-source, so it can be used across all major operating systems without licensing costs.
R is one of the most sought-after programming languages for data analysis today. It originated as
an implementation of the S programming language with influences from Scheme and has evolved
since its conception in 1992, with its first stable version (1.0.0) released in 2000.
Features of R Programming Language
The R Language is renowned for its extensive features that make it a powerful tool for data
analysis, statistical computing, and visualization. Here are some of the key features of R:
1. Comprehensive Statistical Analysis:
R language provides a wide array of statistical techniques, including linear and nonlinear
modeling, classical statistical tests, time-series analysis, classification, and clustering.
2. Advanced Data Visualization:
With packages like ggplot2, plotly, and lattice, R excels at creating complex and aesthetically
pleasing data visualizations, including plots, graphs, and charts.
3. Extensive Packages and Libraries:
The Comprehensive R Archive Network (CRAN) hosts thousands of packages that extend R’s
capabilities in areas such as machine learning, data manipulation, bioinformatics, and more.
4. Open Source and Free:
R is free to download and use, making it accessible to everyone. Its open-source nature
encourages community contributions and continuous improvement.
5. Platform Independence:
R is platform-independent, running on various operating systems, including Windows, macOS,
and Linux, which ensures flexibility and ease of use across different environments.
6. Integration with Other Languages:
R language can integrate with other programming languages such as C, C++, Python, Java, and
SQL, allowing for seamless interaction with various data sources and computational processes.
7. Powerful Data Handling and Storage:
R efficiently handles and stores data, supporting various data types and structures, including
vectors, matrices, data frames, and lists.
8. Robust Community and Support:
R has a vibrant and active community that provides extensive support through forums, mailing
lists, and online resources, contributing to its rich ecosystem of packages and documentation.
9. Integrated Development Environment (IDE):
RStudio, the most popular IDE for R, offers a user-friendly interface with features like syntax
highlighting, code completion, and integrated tools for plotting, history, and debugging.
10. Reproducible Research:
R supports reproducible research practices with tools like R Markdown and knitr, enabling users
to create dynamic reports, presentations, and documents that combine code, text, and
visualizations.
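The data structures named under feature 7 above (vectors, matrices, data frames, and lists) can be created as in the following minimal sketch. The variable names and values are arbitrary.

```r
# Vector: elements all of one type
v <- c(10, 20, 30)

# Matrix: a two-dimensional arrangement of one type (filled column by column)
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: a table whose columns may have different types
df <- data.frame(name = c("a", "b"), score = c(90, 85))

# List: a container whose elements may be of any type, including the above
lst <- list(numbers = v, table = df, flag = TRUE)

print(length(v))
print(dim(m))
print(names(df))
print(class(lst$table))
```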
Advantages of R language
R is one of the most comprehensive statistical analysis packages, and new statistical techniques
and concepts often appear first in R.
R is open source, so you can run it anywhere and at any time without a license.
R is cross-platform and runs on GNU/Linux, Windows, and macOS.
Everyone is welcome to contribute new packages, bug fixes, and code enhancements.
Disadvantages of R language
The quality of some R packages is less than perfect.
R commands give little thought to memory management, so R may consume all available
memory.
There is no dedicated vendor to complain to if something doesn’t work; support comes from the
community.
R code can be slower than equivalent code in languages such as Python and MATLAB,
particularly when it is not vectorized.
Applications of R language
We use R for data science. It offers a broad variety of libraries related to statistics and
provides an environment for statistical computing and design.
Many quantitative analysts use R as their programming tool, for example for data importing
and cleaning.
R is widely used by data analysts and research programmers, and it is a fundamental tool
in finance.
Companies such as Google, Facebook, Bing, Twitter, Accenture, and Wipro are reported to use R.
Introduction to RStudio
RStudio is an integrated development environment (IDE) for R. An IDE is a GUI where you can write
your code, see the results, and also see the variables that are generated during the course of
programming.
RStudio is available as both open-source and commercial software.
RStudio is available in both Desktop and Server versions.
RStudio is available for various platforms such as Windows, Linux, and macOS.
The open-source edition provides an IDE for the R language, while the commercial edition is
enterprise-ready professional software that lets data science teams develop and share their work.
RStudio can be downloaded from its official website (https://rstudio.com/) and then installed.
After the installation process is over, the RStudio interface looks like:
The Console panel (left panel) is where R waits for you to tell it what to do and where you see
the results of the commands you type.
To the top right, you have the Environment/History panel. It contains 2 tabs:
o Environment tab: It shows the variables that are generated during the course of
programming in a workspace that is temporary.
o History tab: In this tab, you’ll see all the commands that have been used since you
started using RStudio.
To the bottom right, you have another panel, which contains multiple tabs, such as Files,
Plots, Packages, Help, and Viewer.
o The Files tab shows the files and directories that are available within the default
workspace of R.
o The Plots tab shows the plots that are generated during the course of programming.
o The Packages tab shows the packages that are already installed and also gives a user
interface to install new packages.
o The Help tab is one of the most important: it is where you can get help from the R
documentation on R’s built-in functions.
o The last tab is the Viewer tab, which can be used to see local web content
generated using R.
Features of R Studio
A friendly user interface
Writing and storing reusable programs
Easy access to all imported data and newly created objects (such as variables, functions, etc.)
Comprehensive help for any item
Code autocompletion
The ability to organize your work and share it with your partners more effectively by
creating projects
Simple switching between terminal and console
Tracking of operational history
Set the working directory in R Studio
R is always pointed at a directory on our computer. We can find out which directory by running
the getwd() function. Note: this function takes no arguments. We can set the working directory
manually in two ways:
The first way is to use the console and run the command setwd("directorypath").
Pass setwd() the path of the directory that you want to be the working directory for RStudio,
in double quotes.
The second way is to set the working directory from the GUI.
To do so, click the three-dots (...) button in the Files tab. This opens a file browser in which
you can choose your working directory.
Once you have chosen your working directory, click the settings button in the More tab and
select “Set as working directory” from the popup menu. This sets the directory you chose in
the file browser as your working directory. Once the working directory is set, you are ready to
program in RStudio.
CREATE AN R-STUDIO PROJECT
Step 1: Open the File menu, which contains the options for creating a new project.
Step 2: Then select the New Project option.
Step 3: Then choose the path and directory name.
Finally, the project is created in the chosen location.
Navigating directories in R studio
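getwd(): Returns the current working directory.
setwd(): Sets the working directory.
dir(): Returns the names of the files and directories in the working directory.
sessionInfo(): Returns information about the current R session, such as the R version, the
platform, and attached packages.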
date(): Returns the current date and time.
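The functions listed above can be tried together as in the following sketch. It works in a temporary directory so that nothing in your own folders is touched; the file name notes.txt is an arbitrary choice for illustration.

```r
# Remember the current working directory so it can be restored
old_wd <- getwd()

# Switch to a scratch directory; tempdir() is always available in a session
setwd(tempdir())

# Create a file (the name "notes.txt" is arbitrary) and list the directory
file.create("notes.txt")
files_here <- dir()
print("notes.txt" %in% files_here)

# Current date and time, and details of the running R session
print(date())
info <- sessionInfo()
print(info$R.version$version.string)

# Restore the original working directory
setwd(old_wd)
```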
Creating your first R script
Here we are adding two numbers in R studio.
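A minimal script of this kind might look as follows. The variable names and values are chosen only for illustration.

```r
# Store two numbers in variables (the values are arbitrary)
a <- 5
b <- 7

# Add them and print the result
total <- a + b
print(total)
```

Running the script in RStudio shows the sum in the Console panel, and the variables a, b, and total appear in the Environment tab.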
Basic Syntax in R Programming
R is one of the most popular languages for statistical computing and data analysis, supported by
over 10,000 free packages in the CRAN repository. Like any other programming language, R has a
specific syntax which is important to understand if you want to make use of its powerful features.
These notes assume R is already installed on your machine. We will be using RStudio, but we can
also use the R command prompt by typing the following command in the command line.
$ R
This will launch the interpreter. Now let’s write a basic Hello World program to get started.
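A Hello World program can be as small as the sketch below, which shows the two usual ways of getting a value onto the console.

```r
# Typing a string at the prompt echoes it back (auto-printing)
"Hello, World!"

# print() writes the same value explicitly, which is what scripts normally do
print("Hello, World!")
```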
Typing “Hello, World!” at the prompt prints it on the console. We can do the same thing
using print(), which prints its argument to the console. Usually, we will write our code inside
scripts, which are called R scripts. To create one, write the code, for example
print("Hello, World!"), in a file, save it as myFile.R, and then run it from the command line by writing:
Rscript myFile.R
Output:
[1] "Hello, World!"
Syntax of R program
A program in R is made up of three things: Variables, Comments, and Keywords. Variables are used
to store data, Comments are used to improve code readability, and Keywords are reserved words
that hold a specific meaning to the interpreter.
Variables in R: Previously, we wrote our values directly inside print(), which gave us no way to
refer to them later for further operations. This problem is solved by variables, which, as in
any other programming language, are names given to memory locations that can store
data. In R, assignment can be written in three ways:
1. = (Simple Assignment)
2. <- (Leftward Assignment)
3. -> (Rightward Assignment)
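A minimal sketch of the three forms follows. The variable names var1, var2, and var3 are chosen only for illustration.

```r
# Simple assignment with =
var1 = "Simple Assignment"

# Leftward assignment with <-
var2 <- "Leftward Assignment!"

# Rightward assignment with ->
"Rightward Assignment" -> var3

print(var1)
print(var2)
print(var3)
```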
Output:
[1] "Simple Assignment"
[1] "Leftward Assignment!"
[1] "Rightward Assignment"
The rightward assignment is less common and can be confusing for some programmers, so it is
generally recommended to use the <- or = operator for assigning values in R.
Comments in R
Comments improve your code’s readability and are meant only for the human reader, so the
interpreter ignores them. Only single-line comments are available in R, but multi-line
comments can be imitated with a simple trick. Single-line comments are written by
using # at the beginning of the statement.
EXAMPLE:
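The sketch below shows a single-line comment together with one common way of imitating a multi-line comment: placing the text as a string inside a block that never runs. This is a convention rather than a language feature, and other workarounds exist.

```r
# This is a single-line comment; the interpreter ignores it

# A multi-line "comment": the string sits inside a block that never executes
if (FALSE) {
  "This text spans
  several lines and is never evaluated."
}

print("This is fun!")  # a comment can also follow code on the same line
```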
Output:
[1] "This is fun!"
DATA TYPES in R Programming Language
Each variable in R has an associated data type. Each R data type requires a different amount of
memory and has specific operations that can be performed on it. R has the following basic data
types, listed here with example values that each type can take:
1. numeric – (3, 6.7, 121)
2. integer – (2L, 42L; where ‘L’ declares the value as an integer)
3. logical – (TRUE, FALSE)
4. complex – (7 + 5i; where ‘i’ is the imaginary unit)
5. character – (“a”, “B”, “c is third”, “69”)
6. raw – (as.raw(55); as.raw() converts a value to raw bytes)
1. Numeric Data type in R
Decimal values are called numeric in R. It is the default data type for numbers in R. If you assign
a decimal value to a variable x as follows, x will be of numeric type.
Real numbers with a decimal point are represented using this data type, which stores values as
double-precision floating-point numbers.
# A simple R program to illustrate Numeric data type
# Assign a decimal value to x
x = 5.6
# print the class name of variable
print(class(x))
# print the type of variable
print(typeof(x))
Output
[1] "numeric"
[1] "double"
2. Integer Data type in R
R supports an integer data type for whole numbers, stored as 32-bit signed integers. You can
create an integer, or convert a value into one, using the as.integer() function. You can also use
the capital ‘L’ suffix to denote that a particular value is of the integer data type.
# A simple R program to illustrate integer data type
x = as.integer(5)
# print the class name of x
print(class(x))
# print the type of x
print(typeof(x))
# Declare an integer by appending an L suffix.
y = 5L
# print the class name of y
print(class(y))
# print the type of y
print(typeof(y))
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
3. Logical Data type in R
R has a logical data type that takes a value of either TRUE or FALSE. A logical value is often
created via a comparison between variables. Boolean values, which have two possible values,
are represented by this data type.
# A simple R program to illustrate logical data type
# Sample values
x=4
y=3
# Comparing two values
z=x>y
# print the logical value
print(z)
# print the class name of z
print(class(z))
# print the type of z
print(typeof(z))
Output
[1] TRUE
[1] "logical"
[1] "logical"
4. Complex Data type in R
R supports a complex data type, which stores numbers with an imaginary component.
# A simple R program to illustrate complex data type
# Assign a complex value to x
x = 4 + 3i
# print the class name of x
print(class(x))
# print the type of x
print(typeof(x))
Output
[1] "complex"
[1] "complex"
5. Character Data type in R
R supports a character data type, which stores character values or strings. Strings in R can
contain letters, numbers, and symbols. The easiest way to denote that a value is of character
type is to wrap the value inside single or double quotes.
# A simple R program to illustrate character data type
# Assign a character value to char
char = "DataSciences"
# print the class name of char
print(class(char))
# print the type of char
print(typeof(char))
output:
[1] "character"
[1] "character"
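A few quick checks on character values, as a sketch: single and double quotes produce the same string, digits in quotes are text rather than numbers, and nchar() counts the characters.

```r
single <- 'Data'
double <- "Data"
same <- identical(single, double)  # both quote styles give the same string
print(same)                        # [1] TRUE

digits <- "255"
print(class(digits))               # [1] "character"

n <- nchar("DataSciences")
print(n)                           # [1] 12
```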
6. Raw data type in R
To store and work with data at the byte level in R, use the raw data type. A raw vector holds a
series of unprocessed bytes, which enables low-level operations on binary data. Here is an example
of a raw vector:
# Create a raw vector
x <- as.raw(c(0x1, 0x2, 0x3, 0x4, 0x5))
print(x)
Output
[1] 01 02 03 04 05
Five elements make up this raw vector x, each of which represents a raw byte value.
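Raw vectors also appear when text is converted to bytes. The following sketch uses the base functions charToRaw() and rawToChar(); the string "Hi" is just an example.

```r
bytes <- charToRaw("Hi")   # one byte per ASCII character, shown in hexadecimal
print(bytes)               # [1] 48 69
back <- rawToChar(bytes)   # convert the bytes back to text
print(back)                # [1] "Hi"
print(as.integer(bytes))   # [1]  72 105
```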
R Operators
Operators are symbols that direct the R interpreter to perform various kinds of operations on the
operands. They carry out mathematical, logical, and decision operations on complex numbers,
integers, and numerics given as input operands.
R supports four main kinds of binary operators, along with some miscellaneous operators. Here, we
will see the various types of operators in the R programming language and their usage.
Types of the operator in R language
Arithmetic Operators
Logical Operators
Relational Operators
Assignment Operators
Miscellaneous Operators
Arithmetic Operators
Arithmetic operators perform the specified arithmetic operation between operands, which may be
scalar values, complex numbers, or vectors. On vectors, the operations are performed element-wise
at the corresponding positions.
Addition operator (+)
The values at the corresponding positions of both operands are added. Consider the following R
operator snippet to add two vectors:
a <- c (1, 0.1)
b <- c (2.33, 4)
print (a+b)
Output : 3.33 4.10
Subtraction Operator (-)
The second operand values are subtracted from the first. Consider the following R operator snippet
to subtract two variables:
a <- 6
b <- 8.4
print (a-b)
Output : -2.4
Multiplication Operator (*)
The corresponding elements of two vectors, or two single numbers, are multiplied with the use of
the ‘*’ operator.
B= c(4,4)
C= c(5,5)
print (B*C)
Output : 20 20
Division Operator (/)
The first operand is divided by the second operand with the use of the ‘/’ operator.
a <- 10
b <- 5
print (a/b)
Output : 2
Power Operator (^)
The first operand is raised to the power of the second operand.
a <- 4
b <- 5
print(a^b)
Output : 1024
Modulo Operator (%%)
The remainder of the first operand divided by the second operand is returned.
list1<- c(2, 22)
list2<-c(2,4)
print(list1 %% list2)
Output : 0 2
The following R code illustrates the usage of all Arithmetic R operators.
# R program to illustrate the use of Arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)
# Performing operations on Operands
cat ("Addition of vectors :", vec1 + vec2, "\n")
cat ("Subtraction of vectors :", vec1 - vec2, "\n")
cat ("Multiplication of vectors :", vec1 * vec2, "\n")
cat ("Division of vectors :", vec1 / vec2, "\n")
cat ("Modulo of vectors :", vec1 %% vec2, "\n")
cat ("Power operator :", vec1 ^ vec2)
Output
Addition of vectors : 2 5
Subtraction of vectors : -2 -1
Multiplication of vectors : 0 6
Division of vectors : 0 0.6666667
Modulo of vectors : 0 2
Power operator : 0 8
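The modulo operator pairs naturally with integer division, %/%, which is not covered above and is added here only as an illustration. The sketch checks that quotient * divisor + remainder gives back the original values, and shows a single number being reused for every element of a vector.

```r
a <- c(7, 22, 9)
b <- c(2, 4, 3)
quotient  <- a %/% b                 # integer division (not in the notes above)
remainder <- a %% b                  # modulo
rebuilt   <- quotient * b + remainder
print(quotient)    # [1] 3 5 3
print(remainder)   # [1] 1 2 0
print(rebuilt)     # [1]  7 22  9

# A single number is recycled across all elements of the vector
doubled <- a * 2
print(doubled)     # [1] 14 44 18
```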
Logical Operators
Logical operators in R carry out decision operations between the operands and evaluate to a TRUE
or FALSE value. The &, | and ! operators work element-wise on vectors, while && and || work on
single values. Any non-zero numeric value is considered TRUE, be it a complex or real number.
Element-wise Logical AND operator (&)
Returns TRUE if both the operands are TRUE.
list1 <- c(TRUE, 0.1)
list2 <- c(0,4+3i)
print(list1 & list2)
Output : FALSE TRUE
Any non-zero numeric value is considered TRUE, be it a complex or real number.
Element-wise Logical OR operator (|)
Returns TRUE if either of the operands is TRUE.
list1 <- c(TRUE, 0.1)
list2 <- c(0,4+3i)
print(list1|list2)
Output : TRUE TRUE
NOT operator (!)
A unary operator that negates the status of the elements of the operand.
list1 <- c(0,FALSE)
print(!list1)
Output : TRUE TRUE
Logical AND operator (&&)
Returns TRUE if both operands are TRUE. Each operand must be a single value, which is why the
example indexes the first element of each vector.
list1 <- c(TRUE, 0.1)
list2 <- c(0,4+3i)
print(list1[1] && list2[1])
Output : FALSE
Only the first elements of the two vectors are compared here. Since R 4.3.0, passing a vector longer
than one element to && or || is an error.
Logical OR operator (||)
Returns TRUE if either of the operands, here the first elements of the two vectors, is TRUE.
list1 <- c(TRUE, 0.1)
list2 <- c(0,4+3i)
print(list1[1]||list2[1])
Output : TRUE
The following R code illustrates the usage of all Logical Operators in R:
# R program to illustrate the use of Logical operators
vec1 <- c(0,2)
vec2 <- c(TRUE,FALSE)
# Performing operations on Operands
cat ("Element wise AND :", vec1 & vec2, "\n")
cat ("Element wise OR :", vec1 | vec2, "\n")
cat ("Logical AND :", vec1[1] && vec2[1], "\n")
cat ("Logical OR :", vec1[1] || vec2[1], "\n")
cat ("Negation :", !vec1)
Output
Element wise AND : FALSE FALSE
Element wise OR : TRUE TRUE
Logical AND : FALSE
Logical OR : TRUE
Negation : TRUE FALSE
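To make the difference between the element-wise and single-value operators concrete, here is a small sketch with an illustrative vector. It also uses any() and all() to reduce a logical vector to one value, and shows that && does not evaluate its right-hand side when the left-hand side is already FALSE.

```r
x <- c(3, 8, 12)

# & works element by element and returns a vector
in_range <- x > 5 & x < 10
print(in_range)        # [1] FALSE  TRUE FALSE

# any() and all() reduce a logical vector to a single value
any_in <- any(in_range)
all_in <- all(in_range)
print(any_in)          # [1] TRUE
print(all_in)          # [1] FALSE

# && works on single values and stops early:
# x[6] is never evaluated because the left side is FALSE
safe <- length(x) > 5 && x[6] > 0
print(safe)            # [1] FALSE
```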
Relational Operators
The relational operators in R carry out comparison operations between the corresponding elements
of the operands. Each comparison returns TRUE if the element of the first operand satisfies the
relation with respect to the element of the second, and FALSE otherwise. A TRUE value is always
considered to be greater than FALSE.
Less than (<)
Returns TRUE if the corresponding element of the first operand is less than that of the second
operand. Else returns FALSE.
list1 <- c(TRUE, 0.1,"apple")
list2 <- c(0,0.1,"bat")
print(list1<list2)
Output : FALSE FALSE TRUE
Less than equal to (<=)
Returns TRUE if the corresponding element of the first operand is less than or equal to that of the
second operand. Else returns FALSE.
list1 <- c(TRUE, 0.1, "apple")
list2 <- c(TRUE, 0.1, "bat")
# c() has already coerced both vectors to character; as.character() makes this explicit
list1_char <- as.character(list1)
list2_char <- as.character(list2)
# Compare character strings
print(list1_char <= list2_char)
Output : TRUE TRUE TRUE
Greater than (>)
Returns TRUE if the corresponding element of the first operand is greater than that of the second
operand. Else returns FALSE.
list1 <- c(TRUE, 0.1, "apple")
list2 <- c(TRUE, 0.1, "bat")
# Both vectors are character vectors, so the strings are compared
print(list1 > list2)
Output : FALSE FALSE FALSE
Greater than equal to (>=)
Returns TRUE if the corresponding element of the first operand is greater or equal to that of the
second operand. Else returns FALSE.
list1 <- c(TRUE, 0.1, "apple")
list2 <- c(TRUE, 0.1, "bat")
print(list1 >= list2)
Output : TRUE TRUE FALSE
Not equal to (!=)
Returns TRUE if the corresponding element of the first operand is not equal to the second operand.
Else returns FALSE.
list1 <- c(TRUE, 0.1,'apple')
list2 <- c(0,0.1,"bat")
print(list1!=list2)
Output : TRUE FALSE TRUE
The following R code illustrates the usage of all Relational Operators in R:
# R program to illustrate
# the use of Relational operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)
# Performing operations on Operands
cat ("Vector1 less than Vector2 :", vec1 < vec2, "\n")
cat ("Vector1 less than equal to Vector2 :", vec1 <= vec2, "\n")
cat ("Vector1 greater than Vector2 :", vec1 > vec2, "\n")
cat ("Vector1 greater than equal to Vector2 :", vec1 >= vec2, "\n")
cat ("Vector1 not equal to Vector2 :", vec1 != vec2, "\n")
Output
Vector1 less than Vector2 : TRUE TRUE
Vector1 less than equal to Vector2 : TRUE TRUE
Vector1 greater than Vector2 : FALSE FALSE
Vector1 greater than equal to Vector2 : FALSE FALSE
Vector1 not equal to Vector2 : TRUE TRUE
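When a vector mixes numbers and strings, as in some of the examples above, c() converts every element to character, so the relational operators compare text rather than numbers. The sketch below uses made-up values to show how the result can differ from a numeric comparison.

```r
mixed <- c(10, "9")                # the number 10 becomes the string "10"
print(class(mixed))                # [1] "character"

as_text <- mixed[1] < mixed[2]     # string comparison: "1" sorts before "9"
print(as_text)                     # [1] TRUE

nums <- as.numeric(mixed)          # convert back to numbers
as_number <- nums[1] < nums[2]     # numeric comparison: 10 < 9
print(as_number)                   # [1] FALSE
```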
Assignment Operators
Assignment operators in R are used to assign values to various data objects in R. The objects
may be integers, vectors, or functions. These values are then stored under the assigned variable names.
There are two kinds of assignment operators: Left and Right
Left Assignment (<- or <<- or =)
Assigns the value on the right-hand side to the name on the left-hand side.
vec1 = c("ab", TRUE)
print (vec1)
Output : "ab" "TRUE"
Right Assignment (-> or ->>)
Assigns the value on the left-hand side to the name on the right-hand side.
c("ab", TRUE) ->> vec1
print (vec1)
Output : "ab" "TRUE"
The following R code illustrates the usage of all Assignment Operators in R:
# R program to illustrate
# the use of Assignment operators
vec1 <- c(2:5)
c(2:5) ->> vec2
vec3 <<- c(2:5)
vec4 = c(2:5)
c(2:5) -> vec5
# Performing operations on Operands
cat ("vector 1 :", vec1, "\n")
cat("vector 2 :", vec2, "\n")
cat ("vector 3 :", vec3, "\n")
cat("vector 4 :", vec4, "\n")
cat("vector 5 :", vec5)
Output
vector 1 : 2 3 4 5
vector 2 : 2 3 4 5
vector 3 : 2 3 4 5
vector 4 : 2 3 4 5
vector 5 : 2 3 4 5
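At the top level all of these operators behave alike, so the difference between <- and <<- only shows up inside a function. The following is a minimal sketch with hypothetical function names: <- creates a local copy, while <<- changes the variable in the enclosing environment.

```r
counter <- 0

# <- inside a function creates a local variable; the outer counter is untouched
bump_local <- function() {
  counter <- counter + 1
  counter
}

# <<- assigns to the variable found in the enclosing environment
bump_outer <- function() {
  counter <<- counter + 1
  counter
}

r1 <- bump_local()
after_local <- counter     # still 0
r2 <- bump_outer()
after_outer <- counter     # now 1
print(c(r1, after_local, r2, after_outer))   # [1] 1 0 1 1
```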
Taking Input from User in R Programming
Developers often need to interact with users, either to get data or to provide some sort of
result. Many programs today use a dialog box to ask the user for input. As in other programming
languages, it is also possible to take input from the user in R. There are two methods for
doing so:
Using readline() method
Using scan() method
Using readline() method
In R, the readline() method takes input in string format. If one inputs an integer, it is read as a
string: for example, an input of 255 is read as “255”. So one needs to convert the input value to
the required format; in this case, the string “255” is converted to the integer 255. To convert the
input value to the desired data type, R provides functions such as:
as.integer(n): convert to integer
as.numeric(n): convert to numeric type (double)
as.complex(n): convert to complex number (e.g. 3+2i)
as.Date(n): convert to date
and so on.
Syntax:
var = readline();
var = as.integer(var);
Note that one can use "<-" instead of "=".
Example:
# R program to illustrate taking input from the user
# taking input using readline()
# this command will prompt you
# to input a desired value
var = readline();
# convert the inputted value to integer
var = as.integer(var);
# print the value
print(var)
Output:
255
[1] 255
One can also show a message in the console window to tell the user what to input. To do this, use
the argument named prompt inside the readline() function. The prompt argument is optional; when
it is omitted, no message is shown. (It is unrelated to R's prompt() function, which creates
skeleton documentation files.)
Syntax:
var1 = readline(prompt = "Enter any number : ");
or,
var1 = readline("Enter any number : ");
Example:
# R program to illustrate taking input from the user
# taking input with showing the message
var = readline(prompt = "Enter any number : ");
# convert the inputted value to an integer
var = as.integer(var);
# print the value
print(var)
Output:
Enter any number : 255
[1] 255
Taking multiple inputs in R
Taking multiple inputs in R is the same as taking a single input; one just needs to call
readline() once for each input. The readline() calls can be wrapped in braces so that they are
evaluated together as one block.
Syntax:
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
or,
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}
Example:
# R program to illustrate taking input from the user
# taking multiple inputs using braces
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}
# converting each value
var1 = as.integer(var1);
var2 = as.integer(var2);
var3 = as.integer(var3);
var4 = as.integer(var4);
# print the sum of the 4 numbers
print(var1 + var2 + var3 + var4)
Output:
Enter 1st number : 12
Enter 2nd number : 13
Enter 3rd number : 14
Enter 4th number : 15
[1] 54
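Because readline() always returns text, the conversion step can be tried out on plain strings without typing anything. The sketch below uses a hypothetical helper, to_integer(), which is not part of the notes above; it returns NA when the text is not a number instead of printing a warning.

```r
# Hypothetical helper: convert typed text to an integer, or NA if it is not a number
to_integer <- function(txt) {
  suppressWarnings(as.integer(txt))
}

ok   <- to_integer("255")    # text that holds a whole number
frac <- to_integer("12.7")   # the fractional part is dropped
bad  <- to_integer("abc")    # not a number at all
print(ok)     # [1] 255
print(frac)   # [1] 12
print(bad)    # [1] NA
```

In an interactive session the same helper could be applied directly to the value returned by readline().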
Using scan( ) method:
Another way to take user input in R is the scan() method. This method takes input from the
console. It is very handy when inputs need to be entered quickly for a mathematical calculation or
for a dataset. It reads data in the form of a vector or list, and it can also read input from a file.
Syntax:
x = scan()
The scan() method keeps taking input continuously. To terminate the input process, press the Enter
key on an empty line (that is, press Enter twice after the last value).
Example:
This is a simple way to take input using the scan() method: some integers are taken as input, and
those values are printed on the next line of the console.
# R program to illustrate taking input from the user
# taking input using scan()
x = scan()
# print the inputted values
print(x)
Output:
1: 1 2 3 4 5 6
7: 7 8 9 4 5 6
13:
Read 12 items
[1] 1 2 3 4 5 6 7 8 9 4 5 6
Explanation:
A total of 12 integers are taken as input over 2 lines. When the control reaches the 3rd line,
pressing the Enter key on that empty line terminates the input process.
Taking double, string, character type values using scan() method
To take double, string, or character inputs, specify the type of the input value in
the scan() method. The argument called what specifies the data type of the input value.
Syntax:
x = scan(what = double())     # for double
x = scan(what = " ")          # for string
x = scan(what = character())  # for character
Example:
# R program to illustrate taking input from the user
# double input using scan()
d = scan(what = double())
# string input using 'scan()'
s = scan(what = " ")
# character input using 'scan()'
# (named ch rather than c, so that the built-in c() function is not shadowed)
ch = scan(what = character())
# print the inputted values
print(d) # double
print(s) # string
print(ch) # character
Output:
1: 123.321 523.458 632.147
4: 741.25 855.36
6:
Read 5 items
1: geeksfor geeks gfg
4: c++ R java python
8:
Read 7 items
1: g e e k s f o
8: r g e e k s
14:
Read 13 items
[1] 123.321 523.458 632.147 741.250 855.360
[1] "geeksfor" "geeks" "gfg" "c++" "R" "java" "python"
[1] "g" "e" "e" "k" "s" "f" "o" "r" "g" "e" "e" "k" "s"
Explanation:
Here, the count of double items is 5, the count of string items is 7, and the count of character items is 13.
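scan() can also be tried without typing at the console, because it accepts the input as a string through its text argument. The following sketch uses made-up input strings; quiet = TRUE only suppresses the "Read n items" message.

```r
# Numbers are read as doubles by default
nums <- scan(text = "1 2 3 4 5 6", quiet = TRUE)
print(nums)          # [1] 1 2 3 4 5 6
print(sum(nums))     # [1] 21

# what = character() reads whitespace-separated words
words <- scan(text = "R java python", what = character(), quiet = TRUE)
print(words)         # [1] "R"      "java"   "python"
```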
Control Statements in R Programming
Control statements are expressions used to control the execution and flow of the program based on
the conditions provided in the statements. These structures are used to make a decision after
assessing a variable. In this section, we’ll discuss all the control statements with examples.
In R programming, there are 8 types of control statements as follows:
if condition
if-else condition
nested loops
for loop
while loop
repeat and break statement
return statement
next statement
if condition
This control structure checks whether the expression provided in parentheses is true. If it is
true, the statements in braces { } are executed.
Syntax:
if(expression){
  statements
  ....
}
Example:
x <- 100
if(x > 10){
  print(paste(x, "is greater than 10"))
}
Output:
[1] "100 is greater than 10"
if-else condition
It is similar to the if condition, but when the test expression in the if condition fails, the
statements in the else block are executed.
Syntax:
if(expression){
  statements
  ....
}else{
  statements
  ....
}
Example:
x <- 5
# Check value is less than or greater than 10
if(x > 10){
print(paste(x, "is greater than 10"))
}else{
print(paste(x, "is less than 10"))
}
Output:
[1] "5 is less than 10"
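An if-else can be extended with else if to test more than two cases, and the base function ifelse() applies the same kind of test to every element of a vector. The sketch below is illustrative only; classify() is a hypothetical function name.

```r
# Chain of conditions: only the first branch whose test is TRUE runs
classify <- function(x) {
  if (x > 10) {
    "greater than 10"
  } else if (x == 10) {
    "equal to 10"
  } else {
    "less than 10"
  }
}
print(classify(5))     # [1] "less than 10"
print(classify(10))    # [1] "equal to 10"

# ifelse() is the vectorised form: one result per element
labels <- ifelse(c(5, 10, 15) > 10, "big", "small")
print(labels)          # [1] "small" "small" "big"
```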
for loop
A for loop executes a sequence of statements repeatedly, once for each element of a vector, and
exits when the elements are exhausted.
Syntax:
for(value in vector){
  statements
  ....
}
Example:
x <- letters[4:10]
for(i in x){
  print(i)
}
Output:
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
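A for loop can also run over positions instead of values. The following sketch, with made-up numbers, adds up a vector by index using seq_along(), and shows why seq_along() is a safer choice than 1:length(x) when the vector may be empty.

```r
values <- c(4, 8, 15)
total <- 0
for (i in seq_along(values)) {   # i takes the values 1, 2, 3
  total <- total + values[i]
}
print(total)                     # [1] 27

# With an empty vector, seq_along() gives no iterations at all,
# whereas 1:length(empty) would count down through 1 and 0
empty <- c()
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
print(count)                     # [1] 0
```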
Nested loops
Nested loops are similar to simple loops. Nested means a loop inside another loop. Nested loops
are commonly used to work through the elements of a matrix.
Example:
# Defining matrix
m <- matrix(2:15, 2)
# The outer loop walks over the rows, the inner loop over the columns
for (r in seq(nrow(m))) {
  for (c in seq(ncol(m))) {
    print(m[r, c])
  }
}
Output:
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 12
[1] 14
[1] 3
[1] 5
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
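The same nested structure can collect a result instead of printing each element. As a sketch, the loops below add up each row of the matrix used above, and the totals are compared with the built-in rowSums() function.

```r
m <- matrix(2:15, 2)
row_totals <- numeric(nrow(m))          # one running total per row, starting at 0
for (r in seq_len(nrow(m))) {
  for (col in seq_len(ncol(m))) {
    row_totals[r] <- row_totals[r] + m[r, col]
  }
}
print(row_totals)                       # [1] 56 63
print(all(row_totals == rowSums(m)))    # [1] TRUE
```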
while loop
A while loop is another kind of loop, which is iterated as long as a condition remains true. The
test expression is checked before each execution of the loop body.
Syntax:
while(expression){
statement
....
....}
Example:
x=1
# Print 1 to 5
while(x <= 5){
  print(x)
  x=x+1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
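A while loop is useful when the number of iterations is not known in advance. This small sketch, with illustrative values, counts how many times 100 must be halved before it drops to 1 or below, and shows that the body never runs when the condition is false at the start.

```r
n <- 100
steps <- 0
while (n > 1) {        # checked before every pass
  n <- n / 2
  steps <- steps + 1
}
print(steps)           # [1] 7

runs <- 0
while (FALSE) {        # condition is false at once, so the body is skipped
  runs <- runs + 1
}
print(runs)            # [1] 0
```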
repeat loop and break statement
repeat is a loop that can be iterated any number of times, but it has no exit condition of its
own. So, the break statement is used to exit from the loop. The break statement can be used in
any type of loop to exit from it.
Syntax:
repeat {
  statements
  ....
  if(expression) {
    break
  }
}
Example:
x=1
# Print 1 to 5
repeat{
  print(x)
  x=x+1
  if(x > 5){
    break
  }
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
return statement
The return statement returns the result of a function and hands control back to the code that
called the function.
Syntax:
return(expression)
Example:
# Checks whether the value is positive, negative or zero
func <- function(x){
  if(x > 0){
    return("Positive")
  }else if(x < 0){
    return("Negative")
  }else{
    return("Zero")
  }
}
func(1)
func(0)
func(-1)
Output:
[1] "Positive"
[1] "Zero"
[1] "Negative"
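return() is most useful for leaving a function early. The sketch below uses hypothetical function names: first_negative() stops at the first negative element, and square() shows that a function without return() gives back the value of its last expression.

```r
first_negative <- function(v) {
  for (e in v) {
    if (e < 0) {
      return(e)   # leaves the function immediately
    }
  }
  NA              # reached only when no element is negative
}

square <- function(x) {
  x * x           # last expression is returned automatically
}

print(first_negative(c(3, -2, -7)))   # [1] -2
print(first_negative(c(1, 2)))        # [1] NA
print(square(6))                      # [1] 36
```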
next statement
The next statement skips the remaining statements of the current iteration and continues with
the next iteration, without terminating the loop.
Example:
# Defining vector
x <- 1:10
# Print even numbers
for(i in x){
  if(i%%2 != 0){
    next # Jumps to the next iteration
  }
  print(i)
}
Output:
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
Data Structures in R Programming
A data structure is a particular way of organizing data in a computer so that it can be used
effectively. The idea is to reduce the space and time complexities of different tasks. Data structures
in R programming are tools for holding multiple values.
R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and by whether
they are homogeneous (all elements must be of the same type) or heterogeneous (the elements can
be of various types). This gives rise to the data structures that are most frequently used in
data analysis.
The most essential data structures used in R include:
Vectors
Lists
Dataframes
Matrices
Arrays
Factors
Tibbles
Vectors
A vector is an ordered collection of values of a basic data type, with a given length. The key point
is that all the elements of a vector must be of the same data type, i.e. vectors are homogeneous
data structures. Vectors are one-dimensional data structures.
Example:
# R program to illustrate a vector
# Vectors (ordered collection of same data type)
X = c(1, 3, 5, 7, 8)
# Printing those elements in console
print(X)
Output:
[1] 1 3 5 7 8
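Two consequences of this definition are worth trying out. In the sketch below, mixing types in c() forces every element to one common type, and elements are picked out by position (starting at 1) or by a logical condition.

```r
# Mixed input is coerced to a single type, here character
mixed <- c(1, "a", TRUE)
print(mixed)            # [1] "1"    "a"    "TRUE"
print(class(mixed))     # [1] "character"

X <- c(1, 3, 5, 7, 8)
second <- X[2]          # positions start at 1
large  <- X[X > 4]      # keep the elements that satisfy the condition
print(second)           # [1] 3
print(large)            # [1] 5 7 8
print(length(X))        # [1] 5
```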
Lists
A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous data
structures. These are also one-dimensional data structures. A list can be a list of vectors, list of
matrices, a list of characters and a list of functions and so on.
Example:
# R program to illustrate a List
# The first component is a numeric vector
# containing the employee IDs, created with c()
empId = c(1, 2, 3, 4)
# The second component is a character vector
# containing the employee names
empName = c("Debi", "Sandeep", "Subham", "Shiba")
# The third component is the number of employees,
# a single numeric value
numberOfEmp = 4
# Combine these three different data types into
# one list holding the details of the employees
empList = list(empId, empName, numberOfEmp)
print(empList)
Output:
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"
[[3]]
[1] 4
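List components can be given names, which makes them easier to reach than by position. This is a sketch with the same employee data; the component names id, name and count are chosen here only for illustration.

```r
emp <- list(
  id    = c(1, 2, 3, 4),
  name  = c("Debi", "Sandeep", "Subham", "Shiba"),
  count = 4
)

second_name <- emp$name[2]   # $ extracts a component by name
n <- emp[["count"]]          # [[ ]] extracts the component itself
sub <- emp["id"]             # [ ] returns a smaller list
print(second_name)           # [1] "Sandeep"
print(n)                     # [1] 4
print(class(sub))            # [1] "list"
print(class(emp[["id"]]))    # [1] "numeric"
```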
Dataframes
Dataframes are generic data objects of R which are used to store tabular data. Dataframes are
among the most popular data objects in R programming because data is convenient to read in
tabular form. They are two-dimensional, heterogeneous data structures. They are lists of
vectors of equal lengths.
Data frames have the following constraints placed upon them:
A data frame must have column names and every row should have a unique name.
Each column must have the identical number of items.
Each item in a single column must be of the same data type.
Different columns may have different data types.
To create a data frame we use the data.frame() function.
Example:
# R program to illustrate dataframe
# A vector which is a character vector
Name = c("Amiya", "Raj", "Asish")
# A vector which is a character vector
Language = c("R", "Python", "Java")
# A vector which is a numeric vector
Age = c(22, 25, 45)
# To create dataframe use data.frame command
# and then pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
print(df)
Output:
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
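Once a data frame exists, its columns and rows can be inspected and filtered. The sketch below rebuilds the same table; stringsAsFactors = FALSE is passed explicitly so that the text columns stay character in older R versions as well.

```r
df <- data.frame(
  Name     = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age      = c(22, 25, 45),
  stringsAsFactors = FALSE
)

print(dim(df))                 # [1] 3 3   (rows, columns)
print(df$Age)                  # [1] 22 25 45

# Keep only the rows where Age is above 23
older <- df[df$Age > 23, ]
print(older$Name)              # [1] "Raj"   "Asish"
```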
Matrices
A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix, rows run
horizontally and columns run vertically. Matrices are two-dimensional, homogeneous data structures.
Now, let’s see how to create a matrix in R. To create a matrix, use the function called matrix().
Its arguments are a vector holding the set of elements, followed by the number of rows and the
number of columns you want in the matrix. The important point to remember is that, by default,
matrices are filled in column-wise order.
Example:
# R program to illustrate a matrix
A = matrix(
  # Taking sequence of elements
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  # No of rows and columns
  nrow = 3, ncol = 3,
  # By default matrices are filled in column-wise order,
  # so this parameter decides how to arrange the matrix
  byrow = TRUE
)
print(A)
Output:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
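The effect of the byrow argument is easiest to see by building the same elements both ways. This is a small sketch with the numbers 1 to 6; the variable names are illustrative.

```r
by_col <- matrix(1:6, nrow = 2)                 # default: filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

print(by_col[1, ])   # [1] 1 3 5
print(by_row[1, ])   # [1] 1 2 3
print(dim(by_row))   # [1] 2 3

# Transposing the row-wise matrix gives a 3 x 2 matrix
print(dim(t(by_row)))   # [1] 3 2
```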
Arrays
Arrays are the R data objects which store data in more than two dimensions. Arrays are
n-dimensional data structures. For example, if we create an array of dimensions (2, 3, 3), then it
creates 3 rectangular matrices, each with 2 rows and 3 columns. They are homogeneous data structures.
Now, let’s see how to create arrays in R. To create an array, use the function called array(). Its
arguments are a vector holding the set of elements and a vector containing the dimensions of
the array.
Example:
# R program to illustrate an array
A = array(
  # Taking sequence of elements
  c(1, 2, 3, 4, 5, 6, 7, 8),
  # Creating two rectangular matrices
  # each with two rows and two columns
  dim = c(2, 2, 2)
)
print(A)
Output:
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8
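Elements of an array are addressed with one index per dimension. The sketch below uses the same 2 x 2 x 2 array and reads a single element as well as a whole matrix slice.

```r
A <- array(1:8, dim = c(2, 2, 2))

print(dim(A))           # [1] 2 2 2

# Row 1, column 2 of the second matrix
element <- A[1, 2, 2]
print(element)          # [1] 7

# Leaving an index empty keeps that whole dimension: the first 2 x 2 matrix
first_slice <- A[, , 1]
print(dim(first_slice)) # [1] 2 2
```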
Factors
Factors are the data objects which are used to categorize data and store it as levels. They are
useful for storing categorical data, and can be built from both strings and integers. They suit
columns with a limited set of unique values, such as “TRUE” or “FALSE”, or “MALE” or “FEMALE”.
They are useful in data analysis for statistical modeling.
Now, let’s see how to create factors in R. To create a factor, use the function called factor().
Its argument is a vector.
Example:
# R program to illustrate factors
# Creating factor using factor()
fac = factor(c("Male", "Female", "Male",
"Male", "Female", "Male", "Female"))
print(fac)
Output:
[1] Male Female Male Male Female Male Female
Levels: Female Male
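Behind the labels, a factor stores one integer code per element that points into its levels. The following sketch, using the same data, shows the levels, the codes, and a count per category with table().

```r
fac <- factor(c("Male", "Female", "Male",
                "Male", "Female", "Male", "Female"))

print(levels(fac))       # [1] "Female" "Male"   (sorted alphabetically by default)
codes <- as.integer(fac) # position of each value within the levels
print(codes)             # [1] 2 1 2 2 1 2 1

counts <- table(fac)     # number of elements in each level
print(counts[["Female"]])  # [1] 3
print(counts[["Male"]])    # [1] 4
```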
Tibbles
Tibbles are an enhanced version of data frames in R, part of the tidyverse. They offer improved
printing, stricter column types, consistent subsetting behavior, and allow variables to be referred to
as objects. Tibbles provide a modern, user-friendly approach to tabular data in R.
Now, let’s see how we can create a tibble in R. To create tibbles in R we can use the tibble function
from the tibble package, which is part of the tidyverse.
Example:
# Load the tibble package
library(tibble)
# Create a tibble with three columns: name, age, and city
my_data <- tibble(
  name = c("Sandeep", "Amit", "Aman"),
  age = c(25, 30, 35),
  city = c("Pune", "Jaipur", "Delhi")
)
# Print the tibble
print(my_data)
Output:
  name      age city
  <chr>   <dbl> <chr>
1 Sandeep    25 Pune
2 Amit       30 Jaipur
3 Aman       35 Delhi