What is a vector?
In R, a vector is a basic data structure that holds multiple values of the same type.
Vectors are one-dimensional arrays, and they can contain numeric, character, or logical
data. Vectors are a fundamental aspect of R, and many operations in R are vectorized
for efficiency.
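The idea of a vectorized operation can be illustrated with a short sketch. The variable names (prices, quantities) are hypothetical and chosen only for illustration.

```r
# Example vectors created with c()
prices <- c(10, 20, 30)
quantities <- c(1, 2, 3)

# Vectorized arithmetic: the operation is applied to every element at once
doubled <- prices * 2
print(doubled) # Output: 20, 40, 60

# Element-wise multiplication of two vectors of equal length
totals <- prices * quantities
print(totals) # Output: 10, 40, 90

# A vector holds one type only; mixing types coerces all elements
mixed <- c(1, "a", TRUE)
print(class(mixed)) # Output: "character"
```

No explicit loop is needed; R applies the arithmetic across the whole vector.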
What are the different classes used in R programming?
In R programming, data can be stored in various classes, each suited for different types
of data. Here are some of the most commonly used classes in R:
1. Numeric:
o This class stores real numbers as double-precision values; whole numbers written without an L suffix, such as 42, are also numeric.
o Example: x <- 42, y <- 3.14
2. Integer:
o This class includes whole numbers.
o Example: x <- as.integer(42)
3. Character:
o This class includes strings of text.
o Example: x <- "Hello, R!"
4. Logical:
o This class includes Boolean values: TRUE and FALSE.
o Example: x <- TRUE, y <- FALSE
5. Factor:
o This class is used for categorical data and can store both ordered and
unordered factors.
o Example: x <- factor(c("low", "medium", "high"))
6. Complex:
o This class includes complex numbers with real and imaginary parts.
o Example: x <- 2 + 3i
7. List:
o This class can store elements of different types, including vectors, lists,
and even functions.
o Example: x <- list(a = 1, b = "Hello", c = TRUE)
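8. Matrix:
o This class stores elements of the same type in a two-dimensional layout of rows and columns.
o Example: x <- matrix(1:6, nrow = 2)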
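The classes listed above can be checked with class(). The following is a minimal sketch; the list name values is hypothetical, and each element is named after the class it is expected to report.

```r
# One example value per class, named after the expected class
values <- list(
  numeric   = 3.14,
  integer   = 42L, # the L suffix creates an integer
  character = "Hello, R!",
  logical   = TRUE,
  factor    = factor(c("low", "medium", "high")),
  complex   = 2 + 3i,
  list      = list(a = 1, b = "Hello")
)

# Apply class() to every element; each result matches its name
classes <- sapply(values, class)
print(classes)

# A matrix is a vector with two dimensions
m <- matrix(1:6, nrow = 2)
print(is.matrix(m)) # Output: TRUE
print(dim(m)) # Output: 2, 3
```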
What is a function in R?
In R programming, a function is a named piece of code that performs a specific task.
Functions are used to encapsulate code into reusable blocks that can be executed whenever
needed. Functions can accept arguments (inputs), perform operations, and return a result.
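A small sketch of defining and calling a function is shown here. The function name average and its argument na_rm are hypothetical and exist only for this illustration.

```r
# Define a function with one required argument and one default argument
average <- function(values, na_rm = FALSE) {
  # Optionally drop missing values before computing
  if (na_rm) {
    values <- values[!is.na(values)]
  }
  # The value of the last expression is returned
  sum(values) / length(values)
}

# Call the function
result <- average(c(2, 4, 6, 8))
print(result) # Output: 5

# Call it again, overriding the default argument
result_no_na <- average(c(2, NA, 4), na_rm = TRUE)
print(result_no_na) # Output: 3
```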
What is plotting?
Plotting refers to the process of creating visual representations of data, such as charts
and graphs. Plotting is a fundamental tool in data analysis and visualization, allowing us
to better understand patterns, trends, and relationships within data. In R, plotting is an
essential feature, and there are various functions and packages available to create
different types of plots.
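What do you mean by a normal distribution?
The normal distribution is a continuous, symmetric, bell-shaped probability distribution
defined by its mean and standard deviation. In R, you can generate random numbers from a
normal distribution using the rnorm() function. This function allows you to specify the
number of observations, the mean, and the standard deviation.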
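As a rough sketch, rnorm() can be used as follows. The seed, sample size, mean, and standard deviation are arbitrary choices for illustration.

```r
# Fix the random seed so the draws are reproducible
set.seed(42)

# Draw 1000 observations with mean 50 and standard deviation 10
samples <- rnorm(1000, mean = 50, sd = 10)

# The sample mean and standard deviation should be close to 50 and 10
print(mean(samples))
print(sd(samples))

# Share of observations within one standard deviation of the mean
# (roughly 68% is expected for a normal distribution)
within_one_sd <- mean(samples > 40 & samples < 60)
print(within_one_sd)
```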
Mention two applications of the t-distribution.
The t-distribution (Student's t-distribution) is especially useful when the
sample size is small and the population standard deviation is unknown.
1. Confidence Intervals for the Mean: estimating a range for the population mean from a small sample.
2. Hypothesis Testing: comparing means with t-tests.
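The first application can be sketched by building a 95% confidence interval by hand with qt() and comparing it with the interval reported by t.test(). The scores vector is hypothetical data invented for this illustration.

```r
# Example data (hypothetical small sample)
scores <- c(72, 75, 78, 80, 85)
n <- length(scores)

# Sample mean and standard error of the mean
sample_mean <- mean(scores)
std_error <- sd(scores) / sqrt(n)

# Critical value from the t-distribution with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# 95% confidence interval for the mean
ci <- c(sample_mean - t_crit * std_error, sample_mean + t_crit * std_error)
print(ci)

# t.test() reports the same interval
print(t.test(scores)$conf.int)
```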
What is Hypothesis Testing?
Hypothesis testing is a fundamental method in statistics used to make decisions or
draw conclusions about a population based on sample data. It involves testing an
assumption (the hypothesis) regarding a population parameter.
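Null Hypothesis (H0): the default assumption, typically that there is no effect or no difference.
Alternative Hypothesis (H1): the claim tested against the null, typically that there is an effect or a difference.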
What is linear regression?
Linear regression is a statistical method used to model the relationship between one
dependent variable and one or more independent variables by fitting a linear equation
to the observed data. The purpose of linear regression is to predict the value of the
dependent variable based on the values of the independent variables.
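A minimal sketch of fitting a line with lm() is given here. The data are made up: y is generated as roughly 3 + 2 * x with a small fixed noise term, so the fitted coefficients should land near those values.

```r
# Example data (hypothetical): y is close to 3 + 2 * x
x <- 1:10
noise <- c(0.2, -0.1, 0.1, -0.2, 0.0, 0.1, -0.1, 0.2, -0.2, 0.0)
y <- 3 + 2 * x + noise

# Fit the linear model y = intercept + slope * x
model <- lm(y ~ x)

# Estimated intercept and slope (close to 3 and 2)
print(coef(model))

# Predict the dependent variable for a new value of x
new_data <- data.frame(x = 12)
predicted <- predict(model, newdata = new_data)
print(predicted)
```

Calling summary(model) would additionally show standard errors and the R-squared value.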
Explain the uniform distribution with respect to its probability density function, with an
example.
The uniform distribution is a probability distribution where all outcomes in a given
range are equally likely. It is characterized by two parameters: the minimum value a
and the maximum value b. The Probability Density Function (PDF) for the
continuous uniform distribution is f(x) = 1 / (b - a) for a <= x <= b, and f(x) = 0
otherwise. For example, if a = 0 and b = 10, then f(x) = 1/10 = 0.1 for every x
between 0 and 10.
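The uniform density can be explored in R with dunif(), punif(), and runif(). This is a sketch with a = 0 and b = 10 chosen arbitrarily.

```r
# Parameters of the uniform distribution
a <- 0
b <- 10

# Density inside the range equals 1 / (b - a)
density_inside <- dunif(5, min = a, max = b)
print(density_inside) # Output: 0.1

# Density outside the range is zero
density_outside <- dunif(12, min = a, max = b)
print(density_outside) # Output: 0

# Probability that X is at most 2.5
prob <- punif(2.5, min = a, max = b)
print(prob) # Output: 0.25

# Draw five random values from the distribution
set.seed(1)
draws <- runif(5, min = a, max = b)
print(draws)
```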
What are the cumulative sum, product, minimum, and maximum? Explain with an R program.
Cumulative Sum (cumsum())
The cumulative sum is the running total of elements in a vector. It adds each element
to the sum of the preceding elements.
# Example vector
x <- c(2, 4, 6, 8)
# Compute cumulative sum
cumsum_x <- cumsum(x)
print(cumsum_x) # Output: 2, 6, 12, 20
Cumulative Product (cumprod())
The cumulative product multiplies each element by the product of the preceding
elements.
# Example vector
x <- c(2, 4, 6, 8)
# Compute cumulative product
cumprod_x <- cumprod(x)
print(cumprod_x) # Output: 2, 8, 48, 384
Cumulative Minimum (cummin())
The cumulative minimum tracks the smallest value encountered so far as you move
through the vector.
# Example vector
x <- c(5, 3, 7, 1, 4)
# Compute cumulative minimum
cummin_x <- cummin(x)
print(cummin_x) # Output: 5, 3, 3, 1, 1
Cumulative Maximum (cummax())
The cumulative maximum tracks the largest value encountered so far as you move
through the vector.
# Example vector
x <- c(5, 3, 7, 1, 4)
# Compute cumulative maximum
cummax_x <- cummax(x)
print(cummax_x) # Output: 5, 5, 7, 7, 7
Explain data visualization techniques with a neat diagram.
# Multiple Visualizations in One Layout
par(mfrow = c(2, 2)) # Set layout: 2 rows, 2 columns
# Bar Chart
categories <- c("A", "B", "C", "D")
values <- c(10, 15, 7, 20)
barplot(values, names.arg = categories, col = "blue", main = "Bar Chart")
# Line Chart
time <- c(1, 2, 3, 4, 5)
values <- c(2, 4, 6, 8, 10)
plot(time, values, type = "o", col = "red", main = "Line Chart")
# Histogram
data <- rnorm(1000, mean = 50, sd = 10)
hist(data, col = "green", breaks = 20, main = "Histogram")
# Scatter Plot
x <- rnorm(100)
y <- 2 * x + rnorm(100)
plot(x, y, col = "blue", pch = 19, main = "Scatter Plot")
What is the difference between a bar plot and a histogram?
Bar plots and histograms are two commonly used types of visualizations in data
analysis, but they serve different purposes and have distinct characteristics.
Aspect    | Bar Plot                                                         | Histogram
----------|------------------------------------------------------------------|------------------------------------------------------------------
Purpose   | Displays categorical data.                                       | Displays the distribution of numerical data by dividing it into intervals (bins).
X-Axis    | Represents categories (discrete data).                           | Represents intervals (bins) of continuous data.
Y-Axis    | Represents the value or count for each category.                 | Represents the frequency (count) of data points in each bin.
Spacing   | Bars are spaced apart to emphasize that categories are distinct. | Bars are adjacent, with no gaps, to show continuity of the data.
Data Type | Deals with qualitative data.                                     | Deals with quantitative data.
Order     | Categories can be plotted in any order.                          | Bins are plotted in numerical order.
Examples  | Product sales by category, population by region.                 | Heights of people, test scores, daily temperatures.
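The contrast in the table can be sketched in R: a bar plot starts from counts per category, while a histogram groups numeric values into bins. The fruit and scores vectors are hypothetical data.

```r
# Categorical data: count each category, then draw a bar plot
fruit <- c("apple", "banana", "apple", "cherry", "banana", "apple")
counts <- table(fruit)
print(counts) # Output: apple 3, banana 2, cherry 1
barplot(counts, col = "blue", main = "Bar Plot of Categories")

# Numerical data: group values into bins, then draw a histogram
scores <- c(52, 55, 61, 67, 68, 70, 74, 79, 83, 91)
h <- hist(scores, breaks = seq(50, 100, by = 10), col = "green",
main = "Histogram of Scores")

# Number of values in each bin (50-60, 60-70, 70-80, 80-90, 90-100)
print(h$counts) # Output: 2, 4, 2, 1, 1
```

With the default settings of hist(), each bin includes its upper boundary, so the score 70 falls in the 60-70 bin.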
Discuss the t-test with an example.
T-Test in Statistics
A t-test is a statistical test used to compare the means of two groups, or a sample mean
with a known value, to determine if there is a significant difference between them. It is
commonly used when the sample size is small and the population standard deviation is unknown.
Types of T-Tests
1. One-Sample T-Test: Compares the sample mean to a known value (e.g.,
population mean).
2. Independent Two-Sample T-Test: Compares the means of two independent
groups.
3. Paired Sample T-Test: Compares the means of two related groups (e.g., before
and after treatment).
One-Sample T-Test
# Example data
heights <- c(31, 29, 32, 28, 30, 34, 29)
# Perform one-sample t-test
t_test_result <- t.test(heights, mu = 30)
# Display results
print(t_test_result)
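To show what t.test() computes, the sketch below recalculates the t statistic by hand for the same heights data and then applies a decision rule. The 0.05 significance level and the names mu0 and alpha are assumptions made for illustration.

```r
# Example data (same heights as above)
heights <- c(31, 29, 32, 28, 30, 34, 29)
mu0 <- 30

# t statistic: (sample mean - hypothesized mean) / standard error
t_manual <- (mean(heights) - mu0) / (sd(heights) / sqrt(length(heights)))
print(t_manual)

# The same statistic as reported by t.test()
result <- t.test(heights, mu = mu0)
print(unname(result$statistic))

# Decision at the 5% significance level (assumed threshold)
alpha <- 0.05
if (result$p.value < alpha) {
  decision <- "Reject the null hypothesis"
} else {
  decision <- "Fail to reject the null hypothesis"
}
print(decision) # Output: "Fail to reject the null hypothesis"
```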
Two-Sample T-Test
# Example data
method_A_scores <- c(75, 80, 85, 88, 90, 85)
method_B_scores <- c(60, 65, 70, 72, 68, 75)
# Perform independent two-sample t-test (t.test() runs Welch's test by default, var.equal = FALSE)
t_test_result <- t.test(method_A_scores, method_B_scores)
# Display results
print(t_test_result)
Paired Sample T-Test
# Example data (before and after treatment)
before_diet <- c(80, 85, 90, 78, 92)
after_diet <- c(76, 82, 88, 74, 89)
# Perform paired t-test
t_test_result <- t.test(before_diet, after_diet, paired = TRUE)
# Display results
print(t_test_result)
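A paired t-test is equivalent to a one-sample t-test on the pairwise differences. The sketch below checks this on the same before and after data; the variable names differences, paired_result, and diff_result are hypothetical.

```r
# Example data (same before and after values as above)
before_diet <- c(80, 85, 90, 78, 92)
after_diet <- c(76, 82, 88, 74, 89)

# Difference for each pair
differences <- before_diet - after_diet
print(differences) # Output: 4, 3, 2, 4, 3

# Paired t-test on the two vectors
paired_result <- t.test(before_diet, after_diet, paired = TRUE)

# One-sample t-test on the differences against a mean of 0
diff_result <- t.test(differences, mu = 0)

# Both approaches give the same p-value
print(paired_result$p.value)
print(diff_result$p.value)
```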
Explain probability functions in detail.
In R programming, you can work with different probability functions such as the
probability mass function (PMF) for discrete distributions or the probability density
function (PDF) for continuous distributions using various functions provided by the stats
package and others. Here, we'll explore examples of how to generate and visualize these
probability functions.
Probability Mass Function (PMF) Example
A discrete probability distribution like the Binomial distribution can be used to illustrate
the PMF.
n <- 10
p <- 0.5
x <- 0:n
pmf <- dbinom(x, size = n, prob = p)
plot(x, pmf, type = "h", col = "blue",
main = "Binomial PMF", xlab = "Number of Successes", ylab = "Probability")
Probability Density Function (PDF) Example
A continuous probability distribution like the Normal (Gaussian) distribution can be
used to illustrate the PDF.
x <- seq(-4, 4, by = 0.1)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l", col = "green", lwd = 2,
main = "Normal Distribution PDF", xlab = "X", ylab = "Density")
Cumulative Distribution Function (CDF) Example
To illustrate the cumulative probability, you can use the pnorm function for a normal
distribution.
# Example of Cumulative Distribution Function (CDF) for a Normal Distribution
x <- seq(-4, 4, by = 0.1)
cdf <- pnorm(x, mean = 0, sd = 1)
plot(x, cdf, type = "l", col = "red", lwd = 2,
main = "Normal Distribution CDF", xlab = "X", ylab = "CDF")
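The relationships between these functions can be checked numerically. The following sketch uses the same binomial parameters (n = 10, p = 0.5) and the standard normal distribution; the variable names are hypothetical.

```r
n <- 10
p <- 0.5

# The PMF sums to 1 over all possible outcomes
total <- sum(dbinom(0:n, size = n, prob = p))
print(total) # Output: 1

# The CDF is the running sum of the PMF: P(X <= 3)
cdf_from_pmf <- sum(dbinom(0:3, size = n, prob = p))
cdf_direct <- pbinom(3, size = n, prob = p)
print(cdf_from_pmf)
print(cdf_direct)

# pnorm() gives cumulative probability; qnorm() is its inverse
lower_tail <- pnorm(1.96, mean = 0, sd = 1)
print(lower_tail) # approximately 0.975
quantile_975 <- qnorm(0.975, mean = 0, sd = 1)
print(quantile_975) # approximately 1.96
```

In R, the prefixes d, p, q, and r give the density or mass, the cumulative probability, the quantile, and random draws for a distribution.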