
Dads301 (U)

R and Python are both widely used programming languages for data analysis and machine learning. While they share properties like conditionals and loops, they differ in aspects like syntax, libraries, and learning curve. R was designed specifically for statistical analysis and has specialized packages and data structures for that purpose. Python has a simpler syntax and is more general-purpose, making it easier for beginners to learn. Both languages have large communities but R is used more commonly in academics and research.

Uploaded by

Thrift Armario

SET-1

Que.-1:- Explain different features of R. Differentiate between R and Python.

Ans.:- R has the following major characteristics:

1. Seamless data processing and storage: One of R's most powerful characteristics is its
ability to manage data of varied sizes and sources. Because R holds data in memory, it is
most comfortable with small data sets, generally less than one gigabyte in size.
2. Fast array operators: Another key aspect of R is the availability of fast, vectorized
operators on array objects. Arrays hold data of a single type in one or more dimensions,
and a user may extract segments of them for additional analysis and visualization.
3. Efficient data wrangling and analysis tools: Another aspect that makes R popular is the
availability of data wrangling and analysis tools. The process of converting raw data into
a consumable format is known as data wrangling.
4. High-quality graphics for display or print: Many of the characteristics stated above are
also available in other high-level programming languages such as Python. One of the
notable advantages of R is its ability to create high-quality graphics output, which makes
it a popular choice for work that conveys information graphically.
5. Traditional programming constructs: In addition to all of these high-level features, R
offers typical programming capabilities such as conditionals (if-statements), loops,
user-defined functions, and input-output facilities. These low-level features enable users
to write basic code for debugging or other uses.
6. Others: R is platform agnostic. It offers support for several tools. It also allows for
smooth connection with a variety of third-party solutions.

R and Python are both widely used programming languages for data analysis, statistical
modeling, and machine learning. While they share certain properties, they also have differences
that make them suited for diverse applications. Here are some important distinctions between R
and Python:
1. Syntax & Purpose:
● R: R is a programming language developed specifically for statistical
analysis and data processing. Its syntax is designed around statistical
operations, and it ships with a large number of built-in statistical functions and
packages.
● Python: Python is a general-purpose programming language with an easy-to-read
syntax. It is well known for its adaptability, as it may be used for purposes other
than statistical analysis, such as web development, scripting, and automation.
2. Libraries & Packages:
● R: The R programming language contains a large ecosystem of specialized
packages for statistical modeling, data visualization, and machine learning. R data
analysis packages such as ggplot2, dplyr, and caret are commonly used.
● Python: Python offers many data analysis packages, including NumPy, pandas,
and Matplotlib. Python has also gained prominence in machine learning with
packages such as scikit-learn, TensorFlow, and PyTorch.
3. Data Structures:
● R: R includes specialized data structures such as data frames that are ideal for
working with tabular data. It also includes factors, which may be used to describe
categorical data.
● Python: Python features data structures such as lists and dictionaries that are
adaptable and can handle a wide range of data types. Python's pandas library provides a
DataFrame object, which is comparable to R's data frame and is commonly used
for data processing.
4. Learning Curve:
● R: R has a steeper learning curve than Python, especially for people with no
programming experience. Its syntax and emphasis on statistical procedures might
be difficult for newcomers at first.
● Python: With a simpler syntax and a strong supporting community, Python is
frequently regarded as more beginner-friendly. It has a gentle learning curve,
making it easy for beginners to get started.
5. Community & Usage:
● R: R has a large community of statisticians, data scientists, and academics who
contribute to the huge library of statistical packages available. It is commonly
used in academia and research.
● Python: Python has a bigger and more diverse community with uses that extend
beyond statistics. It is used in a wide range of industries and fields, including
web development, scientific computing, and artificial intelligence.
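
The data-structure comparison in point 3 can be sketched from the Python side as follows. This is only an illustration: the names record, person, and df are made up for the example, and using the pandas "category" dtype as a rough counterpart to an R factor is an assumption of this sketch, not something stated in the answer above.

```python
import pandas as pd

# A list keeps items in order and may mix types.
record = ["Alice", 28, True]

# A dictionary maps keys to values.
person = {"name": "Alice", "age": 28}

# A pandas DataFrame holds tabular data, one column per key.
df = pd.DataFrame({"name": ["John", "Alice", "Bob"],
                   "dept": ["ops", "dev", "ops"]})

# The "category" dtype stores a column of repeated labels,
# similar in spirit to a factor in R (assumption of this sketch).
df["dept"] = df["dept"].astype("category")
categories = list(df["dept"].cat.categories)

print(record[0], person["age"])
print(categories)
```

The categories are the distinct labels found in the column, which is the same role that levels play for an R factor.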

Que.-2:- Explain Fundamental Data Types in R. Differentiate between List, Matrix and
Data Frames in R.

Ans.:- One of the features of an object is the ‘data type.’ There are 6 basic data types in R:
1. Numeric / Real / Decimal / Floating point
2. Character
3. Integer
4. Logical
5. Complex
6. Raw
Fundamental data types are used to build more complex objects, especially when dealing
with massive amounts of data. Each type is described below.

1. Numeric:
● Real numbers are stored in the numeric data type, which R holds as double-precision
floating-point values. A number typed without a suffix is numeric even if it has no
decimal part. In R, for example, 5, -2, and 3.14 are all numeric values.
2. Character:
● The character data type is used to hold text data such as names, phrases, or any
character sequence. To indicate that text data is a character, it is contained in
quotations (either single or double). In R, for example, a character value is "Hello,
world!"
3. Integer:
● Whole numbers are stored in the integer data type. It is distinguished from the
numeric data type by the absence of decimal places. In R, an integer literal is written
with an L suffix, for example 10L, -3L, and 0L; without the suffix the value is numeric.
4. Logical:
● The logical data type is used to express logical truth values by storing boolean
values. It can take one of two values: TRUE or FALSE. Logical values are
frequently produced via logical operations or comparisons. For example, the
statement 3 > 2 is TRUE.
5. Complex:
● The complex data type is used to hold numbers that have both real and imaginary
portions. Complex numbers are expressed in R as a + bi, where a represents the
real component and b represents the imaginary part. In R, for example, 2 + 3i is a
complex value.
6. Raw:
● The raw data type is used to hold unprocessed bytes of data. It is mostly used for
low-level operations and the manipulation of binary data. A raw value is
displayed as a two-digit hexadecimal number. In R, for example, as.raw(0x41) creates
the raw value 41; the literal 0x41 on its own is numeric.

Lists, matrices, and data frames are three separate data structures in R that are used to organize
and manipulate data. Here are the primary distinctions between them:

1. List:
● In R, a list is a flexible data structure that may hold items of many sorts, including
vectors, matrices, data frames, and even other lists. A list's elements can differ in
length and data type.
● Lists are created with the list() function. The c() function can combine existing
lists into a longer one.
● List elements are accessed by indexing with [[ ]] or, for named elements, with $.
The index can be numeric or character-based.
● Lists are widely used to store and manage large, heterogeneous data structures, or
when several forms of storage are required.
2. Matrix:
● In R, a matrix is a two-dimensional data structure with elements of the same
data type organized in rows and columns. Every row must have the same length, as
must every column.
● Matrices are formed by supplying the data values and the dimensions (number of
rows and columns) of the matrix with the matrix() function.
● Indexing, row and column numbers, and logical expressions can all be used to
access matrix elements.
3. Data Frame:
● In R, a data frame is a two-dimensional tabular data structure that is comparable
to a matrix but has more flexibility. It can store and manipulate heterogeneous
datasets since it may contain columns of multiple data kinds.
● Data frames are commonly constructed by importing data from external sources
such as CSV files or by calling the data.frame() function.
● Columns in a data frame can be accessed by column names or indexing.

Que.-3:- Define Exploratory Data Analysis. List various functions available in R for
Exploratory Data Analysis.

Ans.:- Exploratory Data Analysis entails visualizing, manipulating, and modifying data in order
to answer questions about the data in the dataset. This is a natural process because visualization
and transformation may raise new questions about the data. This process is repeated until the
data is clear enough to be given to the stakeholders.

R has several functions and packages for exploratory data analysis (EDA). Here are some R
functions that are often used in EDA:

1. Summary Statistics:
● summary() generates summary statistics for each variable in a dataset (mean,
median, quartiles, and so on).
● quantile() computes a numeric variable's quantiles (percentiles).
● table(): This function computes the frequency distribution of categorical variables.
● describe(): Returns descriptive statistics for a dataset (mean, standard deviation,
skewness, etc.) using the psych package.
2. Data Visualization:
● plot(): This function generates scatter plots, line plots, bar graphs, histograms, and
other basic visualizations.
● ggplot2: A powerful package for making aesthetically pleasing and
configurable graphs.
● ggpairs(): Generates a scatter plot matrix for several variables in a dataset (from the
GGally package, which builds on ggplot2).
● boxplot(): Creates boxplots to visualize a variable's distribution and variability.
● hist() generates histograms to investigate the distribution of a numeric value.
● pairs(): Generates scatterplot matrices for investigating relationships between
several variables.
3. Missing Data Analysis:
● is.na(): This function finds missing values in a dataset.
● complete.cases(): This function looks for full cases (rows with no missing values).
● na.omit(): Removes rows from a dataset that have missing values.
4. Correlation and Relationships:
● cor() is a function that computes correlation coefficients between variables.
● cor.test(): Runs correlation hypothesis tests.
● scatterplotMatrix() (car package) and pairs.panels() (psych package): Generate
scatterplot matrices that display correlations.
5. Outlier Detection:
● boxplot.stats(): Finds outliers using the boxplot technique.
● outlierTest(): From the car package, tests a fitted model for outlying observations.
● grubbs.test(): From the outliers package, performs Grubbs' test for outlier detection.
6. Data Manipulation:
● subset(): Extracts subsets of data depending on the conditions supplied.
● The dplyr package includes functions for efficient data processing such as filter(),
select(), mutate(), group_by(), and summarize().

These are just a few of the numerous exploratory data analysis functions available in R. R's vast
ecosystem of packages and libraries provides even more specialized functions for specific EDA
tasks including time series analysis, spatial analysis, and more.
SET-2

Que.-4:-Explain type conversion in Python. Explain how sets can be iterated in Python.

Ans.:- Type Conversion in Python:


Type conversion, often known as type casting, is the process of transforming an object's data
type from one type to another. Python includes built-in routines for type conversion. The
following are some of the most widely used type conversion functions in Python:

1. int():
● Converts a number or a string to an integer. For example:

num = int("10")  # Converts the string "10" to the integer 10

2. float():
● Converts a number or a string to a floating-point number. For example:

num = float("3.14")  # Converts the string "3.14" to the float 3.14

3. str():
● Converts an object to a string. For example:

num = 10
num_str = str(num)  # Converts the integer 10 to the string "10"

4. list():
● Converts an iterable object (such as a tuple, set, or string) to a list. For example:

my_tuple = (1, 2, 3)
my_list = list(my_tuple)  # Converts the tuple (1, 2, 3) to the list [1, 2, 3]

5. tuple():
● Converts an iterable object (such as a list, set, or string) to a tuple. For example:

my_list = [1, 2, 3]
my_tuple = tuple(my_list)  # Converts the list [1, 2, 3] to the tuple (1, 2, 3)

6. set():
● Converts an iterable object (such as a list, tuple, or string) to a set. For example:

my_list = [1, 2, 2, 3]
my_set = set(my_list)  # Converts the list [1, 2, 2, 3] to the set {1, 2, 3}

You may use these type conversion methods to convert data between different types, allowing
you to execute various operations and manipulations on the data.
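
Conversions can fail or lose information, which the one-line examples above do not show. The following is a minimal sketch of two common cases; the helper name to_int and its default parameter are hypothetical, introduced only for this illustration.

```python
def to_int(text, default=None):
    """Convert text to an int; return default if the text is not a valid integer."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default

# int() on a float drops the fractional part instead of rounding.
truncated = int(3.9)

# int() cannot parse a decimal string directly, so go through float() first.
via_float = int(float("3.14"))

print(truncated)
print(via_float)
print(to_int("10"))
print(to_int("3.14"))            # not an integer literal, so the default is returned
print(to_int("abc", default=0))
```

Wrapping the conversion in try/except is one design choice among several; validating the string beforehand is another.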

Iterating over Sets in Python: Sets are unordered collections of unique items in Python. Iterating
over a set entails accessing each element one by one. Because a set is unordered, the order in
which elements appear is not guaranteed. In Python, there are several techniques to iterate over
sets:
1. Using a for loop:

my_set = {1, 2, 3, 4}
for element in my_set:
    print(element)

Output (order may vary):
1
2
3
4

2. Using a comprehension (a list comprehension here, which collects the elements into a list):

my_set = {1, 2, 3, 4}
result = [element for element in my_set]
print(result)

Output (order may vary):
[1, 2, 3, 4]

3. Converting the set to a list and iterating over the list:

my_set = {1, 2, 3, 4}
my_list = list(my_set)
for element in my_list:
    print(element)

Output (order may vary):
1
2
3
4

Sets are iterable objects, which means they may be used in various looping structures and have
actions performed on each element. Because a set stores only unique elements, each value is
visited exactly once during iteration, even if it was supplied several times when the set was
created.
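
When a predictable order matters, one option is to iterate over a sorted copy of the set. The sketch below assumes a small set of strings; the names names and ordered are illustrative only.

```python
# The duplicate "apple" is dropped when the set is created.
names = {"pear", "apple", "fig", "apple"}

# sorted() returns a new list, so the loop order is predictable.
ordered = sorted(names)
for name in ordered:
    print(name)

# enumerate() pairs each element with a running counter.
for position, name in enumerate(ordered, start=1):
    print(position, name)
```

Sorting costs extra time for large sets, so it is worth doing only when the order is actually needed.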

Que.-5:- How to use a regular expression using the built-in functions in the ‘re’ package.

Ans.:- Python's 're' module has built-in functions for working with regular expressions. Here's a
step-by-step tutorial on using regular expressions with the 're' package:
1. Import the 're' package:
import re

2. Define the regular expression pattern:

pattern = r"your_pattern_here"

The 'r' preceding the string indicates that it is a raw string, which treats backslashes as literal
characters. It is widely used with regular expressions so that sequences such as \d or \b reach
the 're' module unchanged and backslashes do not have to be doubled.

3. Use the regular expression functions:

● re.search(pattern, string):
● Searches for the first occurrence of the pattern anywhere in the string.
● Returns a match object if a match is found, or None if no match is found.

match = re.search(pattern, string)

● re.match(pattern, string):
● Matches the pattern only at the beginning of the string.
● Returns a match object if the pattern matches at the start of the string, or None if it
doesn't.

match = re.match(pattern, string)

● re.findall(pattern, string):
● Returns all non-overlapping occurrences of the pattern in the string as a list of strings.

matches = re.findall(pattern, string)

● re.finditer(pattern, string):
● Returns an iterator yielding match objects for all non-overlapping occurrences of the
pattern in the string.

matches_iterator = re.finditer(pattern, string)

4. Retrieving matched content:


● match.group():
● Returns the matched substring.
matched_text = match.group()

5. Using flags:

● Flags can be passed to the regular expression functions to modify the matching
behavior. For example:
● re.IGNORECASE: Ignore case when matching.

pattern = r"your_pattern_here"
match = re.search(pattern, string, flags=re.IGNORECASE)

6. Regular expression patterns:
● Regular expressions include special characters and sequences that describe search
patterns. Some frequently used elements include:
● Character classes: [a-z], [0-9], [A-Za-z], etc.
● Metacharacters: . ^ $ * + ? { } [ ] \ | ( )
● Anchors: ^ (start of the string), $ (end of the string)
● Quantifiers: *, +, ?, {n}, {n,m}, etc.
● Groups and capturing: ( ), (?: ), (?P<name> ), etc.
Refer to the Python documentation for the 're' package for more information and examples on
regular expressions and how to use them in Python.
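
The steps above can be put together in one short sketch. The sample text and the simplified e-mail pattern are assumptions made for illustration; the pattern is not a complete e-mail validator.

```python
import re

text = "Contact: Alice <alice@example.com>, BOB <bob@example.org>"

# In a raw string a backslash stays a backslash, so r"\." equals "\\.".
same_pattern_text = r"\." == "\\."

# Two named groups: the part before the @ and the part after it.
pattern = r"(?P<user>[a-z0-9._]+)@(?P<domain>[a-z0-9.-]+\.[a-z]+)"

# re.search finds the first occurrence anywhere in the string.
first = re.search(pattern, text, flags=re.IGNORECASE)
if first is not None:
    print(first.group())        # the whole matched substring
    print(first.group("user"))  # the named group "user"

# re.match only tries at the start of the string, so it finds nothing here.
at_start = re.match(pattern, text, flags=re.IGNORECASE)

# With groups in the pattern, re.findall returns one tuple per match.
pairs = re.findall(pattern, text, flags=re.IGNORECASE)

# re.finditer yields match objects, which give access to named groups.
domains = [m.group("domain")
           for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
print(pairs)
print(domains)
```

Checking the result of re.search against None before calling group() avoids an AttributeError when nothing matches.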

Que.-6:- How Data Frames can be merged using Pandas. Explain with examples.

Ans.:- The merge() function in pandas is used to join data frames based on common columns or
indices. It supports several kinds of joins, similar to SQL joins. Here's a description of how
to combine data frames with pandas, complete with examples:
Consider the following two data frames:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['John', 'Alice', 'Bob', 'Emma'],
                    'Age': [25, 28, 32, 30]})

df2 = pd.DataFrame({'ID': [1, 2, 5, 6],
                    'Salary': [5000, 6000, 5500, 7000]})

1. Inner Join:
● An inner join returns only the records whose key appears in both data frames, discarding
the rest.

merged_inner = pd.merge(df1, df2, on='ID', how='inner')
print(merged_inner)

Output:
   ID   Name  Age  Salary
0   1   John   25    5000
1   2  Alice   28    6000

2. Left Join:
● A left join returns all of the records from the left data frame as well as the records that
match from the right data frame. Columns from the right data frame are filled with NaN
for rows that have no match. A numeric column that receives NaN is shown as floating
point.

merged_left = pd.merge(df1, df2, on='ID', how='left')
print(merged_left)

Output:
   ID   Name  Age  Salary
0   1   John   25  5000.0
1   2  Alice   28  6000.0
2   3    Bob   32     NaN
3   4   Emma   30     NaN

3. Right Join:
● A right join returns all of the records from the right data frame as well as the records
that match from the left data frame. Columns from the left data frame are filled with NaN
for rows that have no match.

merged_right = pd.merge(df1, df2, on='ID', how='right')
print(merged_right)

Output:
   ID   Name   Age  Salary
0   1   John  25.0    5000
1   2  Alice  28.0    6000
2   5    NaN   NaN    5500
3   6    NaN   NaN    7000

4. Outer Join:
● An outer join returns all of the records from both data frames. Where a record has no
match in the other data frame, the missing columns are filled with NaN.

merged_outer = pd.merge(df1, df2, on='ID', how='outer')
print(merged_outer)

Output:
   ID   Name   Age  Salary
0   1   John  25.0  5000.0
1   2  Alice  28.0  6000.0
2   3    Bob  32.0     NaN
3   4   Emma  30.0     NaN
4   5    NaN   NaN  5500.0
5   6    NaN   NaN  7000.0

These are the basic cases of using pandas to combine data frames. The merge() function
handles several kinds of joins and allows you to merge data frames based on common columns
or indices. You may also pass a list to the on parameter to merge on several columns, or use
left_on and right_on when the key columns have different names in each data frame.
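
The two variations mentioned in the closing paragraph can be sketched as follows. The data frames and column names here (employees, salaries, emp_id, q1, q2, and so on) are made up for the illustration and are separate from df1 and df2 above.

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3],
                          "name": ["John", "Alice", "Bob"]})
salaries = pd.DataFrame({"ID": [1, 2, 5],
                         "salary": [5000, 6000, 5500]})

# The key columns have different names, so each side is named explicitly.
# indicator=True adds a _merge column saying where each row came from.
merged = pd.merge(employees, salaries,
                  left_on="emp_id", right_on="ID",
                  how="outer", indicator=True)
print(merged)

q1 = pd.DataFrame({"ID": [1, 2], "year": [2023, 2023], "sales": [10, 20]})
q2 = pd.DataFrame({"ID": [1, 2], "year": [2023, 2024], "returns": [1, 2]})

# A list passed to on means rows must agree on every listed column.
both_keys = pd.merge(q1, q2, on=["ID", "year"], how="inner")
print(both_keys)
```

With left_on and right_on both key columns are kept in the result, so one of them is usually dropped afterwards.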
