R Notes

Uploaded by

2tmjzgbwdm

Load:
1. library(statsr)
2. library(dplyr)
3. library(ggplot2)

Statistical inference

1. str(): use to return a list of objects and their structure.
2. Side-by-side box plots:
   > boxplot(a ~ b, data = -, xlab = "-", ylab = "-")
   (a: first variable, b: second variable)

   To remove NA values (na.rm = NA remove):
   > sum(x, na.rm = TRUE)
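
The box plot and `na.rm` calls above can be sketched as follows. This is a minimal example that assumes R's built-in `ToothGrowth` data set as a stand-in for the notes' unnamed data; the variable names `len` and `dose` come from that data set, not from the notes.

```r
# Side-by-side box plots: one box of `len` for each level of `dose`.
# ToothGrowth is a built-in data set, used here only as a stand-in.
boxplot(len ~ dose, data = ToothGrowth,
        xlab = "dose", ylab = "len")

# sum() returns NA if any value is missing, unless na.rm = TRUE.
x <- c(1, 2, NA, 4)
total_with_na <- sum(x)                 # NA
total_no_na   <- sum(x, na.rm = TRUE)   # 7
```

In a formula such as `len ~ dose`, the variable on the left is drawn on the y-axis and the grouping variable on the right is drawn on the x-axis.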

To compare the mean of a variable across groups:
   > data %>%
   +   group_by(habit) %>%
   +   summarise(mean_weight = mean(weight))

     habit  mean_weight
   1 -      1.23
   2 -      4.56
   3 NA     7.84

To compare groups without the NA group:
   > by(data$weight, data$habit, summary)
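
A small sketch of both comparisons, assuming a made-up data frame `births` with a numeric `weight` and a grouping column `habit` (the data and its values are hypothetical, chosen only so that one row has a missing group):

```r
library(dplyr)

# Hypothetical data: `weight` is numeric, `habit` is the grouping variable.
births <- data.frame(
  habit  = c("nonsmoker", "nonsmoker", "smoker", "smoker", NA),
  weight = c(7.5, 6.9, 6.8, 7.0, 5.6)
)

# Mean of weight within each group; NA in `habit` forms its own group.
group_means <- births %>%
  group_by(habit) %>%
  summarise(mean_weight = mean(weight))

# Base R alternative: a full summary() per group.
# by() drops the rows whose group is NA.
group_summaries <- by(births$weight, births$habit, summary)
```

This is why the dplyr output can contain an `NA` row while the `by()` output does not.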
3. To conduct hypothesis testing or estimate a confidence interval for a parameter (mean, median, proportion):
   > inference(y = -, data = -, statistic = "mean", type = "ht", null = 0,
   +           alternative = "less", method = "simulation")
   statistic: "mean", "median" or "proportion"
   alternative: "less" or "greater"
   Plots: sample dist, null dist

   > inference(y = -, data = -, statistic = "mean", type = "ci",
   +           conf_level = 0.99, method = "theoretical")
   conf_level: e.g. 0.95 or 0.99
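
The notes use `inference()` from the statsr package. As a rough cross-check that needs no extra package, the sketch below uses base R's `t.test()` instead, which covers only the theoretical method for a mean. The sample `y` is made up for illustration.

```r
# Hypothetical sample; any numeric vector works.
y <- c(-1.2, 0.4, -0.8, -2.1, 0.3, -1.5, -0.6, -0.9)

# Hypothesis test for a mean: H0: mu = 0 against HA: mu < 0.
# Roughly corresponds to type = "ht", null = 0, alternative = "less".
ht <- t.test(y, mu = 0, alternative = "less")
ht$p.value

# 99% confidence interval for the mean.
# Roughly corresponds to type = "ci", conf_level = 0.99.
ci <- t.test(y, conf.level = 0.99)$conf.int
ci
```

`t.test()` has no simulation option, so the `method = "simulation"` case still needs `inference()` or a hand-written bootstrap.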

tidyverse
Install:
> install.packages('dplyr')
> library(dplyr)

Load:
1. library(statsr)
2. library(dplyr)
3. library(ggplot2)

EDA

1. Load data

2. Summarize data
   # summarize dataset
   > summary(dataname)
   > dim(dataname)   # to get the dimensions of the dataset in terms of number of rows and columns

   Terms in the summary() output:
   1. Min: minimum value
   2. 1st Qu: 25th percentile
   3. Median: median value
   4. Mean: mean value
   5. 3rd Qu: 75th percentile
   6. Max: maximum value

3. Visualize the data (using ggplot)
   # create histogram of a certain variable (to check skewness)
   > ggplot(dataname, aes(x = variable)) +
   +   geom_histogram(binwidth = 10)

   # create scatterplot of an x variable and a y variable
   > ggplot(data = dataname) +
   +   geom_point(mapping = aes(x = -, y = -))
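
The EDA steps above, sketched on the built-in `mtcars` data set. The data set and the columns `mpg` and `wt` are stand-ins for `dataname` and its variables, not something the notes specify.

```r
library(ggplot2)

# mtcars is a built-in data set, used as a stand-in for `dataname`.
summary(mtcars$mpg)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
dims <- dim(mtcars)   # c(number of rows, number of columns)

# Histogram of one variable; binwidth is in the units of x.
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)

# Scatterplot of two variables.
p_scatter <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))
```

Printing `p_hist` or `p_scatter` at the console draws the plot.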

# create barplot
> ggplot(dataname, aes(x = -, y = -)) +
+   geom_bar(stat = "identity", fill = "blue")

# create boxplot
> ggplot(data = dataname, aes(x = var1, y = var2)) +
+   geom_boxplot()

# barplot in base R
> barplot(data, main = "", xlab = "", ylab = "", ..., beside = TRUE)

# create correlation matrix (round to 2 d.p.)
> round(cor(dataname[, c('v.1', 'v.2', 'v.3', ...)]), 2)
• -1 indicates a perfectly negative linear correlation between two variables
• 0 indicates no linear correlation between two variables
• 1 indicates a perfectly positive linear correlation between two variables
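
A minimal sketch of the rounded correlation matrix, assuming the built-in `mtcars` data set and an arbitrary choice of three numeric columns:

```r
# mtcars is a built-in stand-in; the column choice is arbitrary.
cor_mat <- round(cor(mtcars[, c("mpg", "wt", "hp")]), 2)
cor_mat
```

The diagonal is always 1 (each variable against itself) and the matrix is symmetric, so only the entries on one side of the diagonal need to be read.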

Using dplyr:
a. mutate(): add new variable
b. select(): select variables
c. filter(): select observations
d. arrange(): ordering of the rows
e. rename(): rename variable name
f. group_by(): groups data
g. summarise(): gives summary

# summarize(): remove NA values with na.rm
> summarize(dataname, delay = mean(dep_delay, na.rm = TRUE))
##   delay
## 1  12.6

# summarize with group_by
> dataname %>%
+   group_by(Var1, V.2, V3) %>%
+   summarize(mean = mean(Var4))

# flights from NYC to RDU
> rdu_flights %>%
+   group_by(origin) %>%
+   summarize(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())
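
The dplyr verbs listed above can be chained in one pipeline. The sketch below assumes the built-in `mtcars` data set; the new column `kpl` and the conversion factor are illustrative only.

```r
library(dplyr)

cars <- mtcars %>%
  mutate(kpl = mpg * 0.425) %>%   # add a new variable
  select(kpl, cyl, hp) %>%        # keep some variables
  filter(hp > 100) %>%            # keep some observations
  arrange(desc(kpl)) %>%          # order the rows
  rename(cylinders = cyl)         # rename a variable

# One summary row per group.
by_cyl <- cars %>%
  group_by(cylinders) %>%
  summarise(mean_kpl = mean(kpl, na.rm = TRUE),
            sd_kpl   = sd(kpl),
            n        = n())
```

`n()` counts the rows in each group, so the `n` column always adds up to the number of rows that went into `group_by()`.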

1. Measure of center
   1. Mean
   2. Median

2. Measure of spread
   1. min & max
   2. Std dev, var
   3. Percentile
   4. IQR
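
Each measure above has a base R function. A short sketch on a made-up vector `x`:

```r
x <- c(2, 4, 4, 5, 7, 9, 11)   # hypothetical values

# Measures of center
x_mean   <- mean(x)
x_median <- median(x)

# Measures of spread
x_range <- range(x)                    # min and max
x_sd    <- sd(x)                       # standard deviation
x_var   <- var(x)                      # variance = sd^2
x_q     <- quantile(x, c(0.25, 0.75))  # 25th and 75th percentiles
x_iqr   <- IQR(x)                      # 75th minus 25th percentile
```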
R codes - Regression

1. Scatterplot
   a. > ggplot(data = -, aes(x = -, y = -)) +
      +   geom_point()

2. Correlation coefficient (dependent var & independent var)
   a. > dataname %>% summarise(cor(y, x))
   b. > cor(data$-, data$-)
   Use when you want to compare which variable is the more significant predictor.

3. Run linear regression
   a. > model_name <- lm(y ~ x, data = -)
   Then to show model output:
      > summary(model_name)
   Intercept: mean value of the response variable when all of the predictor variables in the model = 0,
   i.e. value of y when x is 0.
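
A minimal sketch of the simple regression step and the meaning of the intercept, assuming the built-in `mtcars` data set with `mpg` as the response and `wt` as the predictor:

```r
# mtcars stand-in: response mpg, predictor wt.
model <- lm(mpg ~ wt, data = mtcars)
summary(model)

b <- coef(model)   # named vector: "(Intercept)" and "wt"

# The intercept is the predicted response when the predictor is 0.
pred_at_zero <- predict(model, newdata = data.frame(wt = 0))
```

Here a weight of 0 is outside the observed data, so the intercept is a mathematical anchor for the line and not a realistic prediction.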

4. Plot regression line on scatterplot
   a. > ggplot(data = -, aes(x = x, y = y)) +
      +   geom_point(shape = 1) +
      +   geom_smooth(method = "lm", se = FALSE)
      se = TRUE: to see the standard error of the regression
   b. > plot(- ~ -)
      > abline(lm(- ~ -))

4.1 Jitterplot scatterplot
   a. > ggplot(data = -, aes(x = x, y = y)) +
      +   geom_point(position = position_jitter(w = -, h = -)) +
      +   xlab("-") +
      +   ylab("-")
      w, h: e.g. 0.1; NULL for the default
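
Both plots can be sketched with ggplot2 on the built-in `mtcars` data set. The columns and the jitter amounts are assumptions chosen for illustration.

```r
library(ggplot2)

# Scatterplot with a fitted regression line; se = FALSE hides the band.
p_line <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", se = FALSE)

# Jittered scatterplot: spread overlapping points horizontally only.
p_jitter <- ggplot(data = mtcars, aes(x = cyl, y = mpg)) +
  geom_point(position = position_jitter(width = 0.1, height = 0)) +
  xlab("cyl") +
  ylab("mpg")
```

Jitter is useful when one variable takes only a few distinct values, as `cyl` does here, because many points would otherwise sit on top of each other.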

5. View data of fitted & residual
   a. > new_df_name <- cbind(df, model_name$fitted.values, model_name$residuals)

6. Residual plot to check linearity
   a. > ggplot(data = model_name, aes(x = x, y = .resid)) +
      +   geom_point()

7. Histogram for residual
   a. > hist(model_name$residuals, breaks = ...)

8. QQ plot to test normality of residual
   a. > qqnorm(model_name$residuals)
      > qqline(model_name$residuals)
   Dots away from the line: not normal.
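
The residual checks in steps 5, 7 and 8 can be sketched in base R. The model below is fitted on the built-in `mtcars` data set purely as a stand-in, and the column names `fitted` and `resid` are my own labels.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Fitted values and residuals next to the original data.
diag_df <- cbind(mtcars,
                 fitted = model$fitted.values,
                 resid  = model$residuals)

# Histogram of residuals: look for a roughly symmetric, bell-like shape.
hist(model$residuals, breaks = 10)

# Normal QQ plot: points close to the line suggest normal residuals.
qqnorm(model$residuals)
qqline(model$residuals)
```

For every row, fitted value plus residual gives back the observed response, which is a quick sanity check on the combined data frame.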

9. Checking multicollinearity of the potential predictors
   a. > corrplot(cor(potential predictors), method = "number")

10. Run multiple linear regression
   a. > model_name <- lm(y ~ var1 + var2, data = -)
      > summary(model_name)
   (var1, var2: independent variables)
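
A sketch of steps 9 and 10 on the built-in `mtcars` data set. To avoid an extra package it prints the correlation matrix with `cor()` instead of drawing it with corrplot; the numbers are the same ones corrplot would display with `method = "number"`. The choice of predictors is an assumption.

```r
# Correlations among the candidate predictors.
# Values near -1 or 1 between two predictors hint at multicollinearity.
predictors <- mtcars[, c("wt", "hp", "disp")]
round(cor(predictors), 2)

# Multiple linear regression with two predictors.
model_simple <- lm(mpg ~ wt, data = mtcars)
model_multi  <- lm(mpg ~ wt + hp, data = mtcars)
summary(model_multi)
```

Adding a predictor never lowers R-squared, so adjusted R-squared in the `summary()` output is the fairer number for comparing the two models.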
independent
1. Filter some variable
   > abc <- dataname %>% filter(variable == "-")
   · Histogram
     > hist(abc$variable, breaks = 5)
   · Scatterplot
     > ggplot(data = abc) +
     +   geom_point(mapping = aes(x = -, y = -))
     (if %>% does not work, just use +)

2. Barchart: stacked (to find the largest number of a var.)
   a. > ggplot(data = -) +
      +   geom_bar(mapping = aes(x = -, fill = -))
   geom_col(position = "stack"): use to create bar chart
   · To separate the plot for each different variable, use facet_wrap:
   a. > ggplot(data = -) +
      +   geom_bar(mapping = aes(x = -)) +
      +   facet_wrap(~ variable)
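
The filter, stacked bar chart and facet steps, sketched with the `mpg` data set that ships with ggplot2. The data set and the columns `class`, `drv` and `hwy` are stand-ins for the notes' placeholders.

```r
library(dplyr)
library(ggplot2)

# Keep only the rows where a variable equals a given value (note the ==).
abc <- mpg %>% filter(class == "suv")
hist(abc$hwy, breaks = 5)

# Stacked bar chart: bar height is the count per class, split by drv.
p_stacked <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = drv))

# One panel per level of drv instead of stacking.
p_facets <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class)) +
  facet_wrap(~ drv)
```

dplyr steps are joined with `%>%`, while ggplot2 layers are joined with `+`; mixing the two up is a common source of errors.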
3. Variance test
   > var(abc$-)
   > var(abcd$-)
   > var.test(abc$-, abcd$-)
   p > 0.05: do not reject null
   p < 0.05: reject null, accept H1
   If CI contains 1: no diff between arms of study
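
A minimal sketch of the variance test and its two reading rules, assuming two groups taken from the built-in `iris` data set as stand-ins for the two study arms:

```r
# Two groups from a built-in data set, used only as stand-ins.
setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

var(setosa)
var(versicolor)

# F test of H0: ratio of the two variances = 1.
vt <- var.test(setosa, versicolor)

# Rule 1: p-value against 0.05.
reject_null <- vt$p.value < 0.05

# Rule 2: does the 95% CI for the variance ratio contain 1?
ci_contains_one <- vt$conf.int[1] <= 1 && 1 <= vt$conf.int[2]
```

For the default two-sided test at the 95% level the two rules agree: the null is rejected exactly when the interval excludes 1.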

4. Residual vs fitted plot
   > res <- resid(model_name)
   > plot(fitted(model_name), res)
   > abline(0, 0)

Hypothesis testing

1. T-test (independent two-sample t-test)
   > t.test(data$abc, data$bcd, alternative = "two.sided",
   +        mu = 0, conf.level = 0.95)
   alternative: "two.sided" or "greater"
   > t.test(data$abc, data$bcd)
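
A sketch of the two-sample t-test, assuming the built-in `ToothGrowth` data set split by `supp` as a stand-in for the two columns in the notes:

```r
# Two independent samples from a built-in data set.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Full form; these argument values are also the defaults.
tt <- t.test(oj, vc, alternative = "two.sided",
             mu = 0, conf.level = 0.95)

# Short form gives the same test.
tt_short <- t.test(oj, vc)

tt$p.value
tt$conf.int
```

By default `t.test()` runs Welch's test, which does not assume equal variances; pass `var.equal = TRUE` for the pooled-variance version.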

2. ANOVA
   · One way
     > one_way <- aov(Y ~ A, data = -)
     > summary(one_way)
   · Two way
     > two_way <- aov(Y ~ A + B, data = -)
     > summary(two_way)
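
One-way and two-way ANOVA sketched on built-in data sets (`PlantGrowth` and `ToothGrowth`), which are assumptions standing in for the notes' unnamed data:

```r
# One-way: does mean weight differ across the three groups?
one_way <- aov(weight ~ group, data = PlantGrowth)
one_way_table <- summary(one_way)[[1]]
one_way_table

# Two-way (additive): effects of supp and dose on len.
# dose is numeric in the raw data, so convert it to a factor first.
tg <- transform(ToothGrowth, dose = factor(dose))
two_way <- aov(len ~ supp + dose, data = tg)
two_way_table <- summary(two_way)[[1]]
two_way_table
```

Each table has one row per factor plus a Residuals row; the `Pr(>F)` column holds the p-value for that factor.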

   · Correlation coefficient
     > round(cor(dataname[, c('abc', 'bcd', 'def')]), 2)
