Exploratory Data Analysis (Analisis Eksplorasi Data)

The document shows code for creating histograms of several datasets using base R graphics and ggplot2. It customizes the histograms by changing colors, titles, axis limits, label orientation, and breaks, by switching to the density scale, and by overlaying density curves. The examples use the AirPassengers dataset and the AGE variable of the chol dataset. At the end, the Boston housing dataset is read in and histograms are drawn for the INDUS, NOX, RM, PTRATIO, and B variables.

ANDI FAHIRA ALSA 06211840000004 1/04/2020

AED (Analisis Eksplorasi Data), Class B
> chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"),
+ header = TRUE)
> hist(AirPassengers)

[Figure: Histogram of AirPassengers. x-axis: AirPassengers; y-axis: Frequency.]

> hist(chol$AGE)

[Figure: Histogram of chol$AGE. x-axis: chol$AGE; y-axis: Frequency.]
> hist(AirPassengers,
+ main="Histogram for Air Passengers",
+ xlab="Passengers",
+ border="blue",
+ col="green",
+ xlim=c(100,700),
+ las=1,
+ breaks=5)

[Figure: Histogram for Air Passengers. x-axis: Passengers; y-axis: Frequency.]

> hist(AirPassengers, main="Histogram for Air Passengers") # change the histogram's title

[Figure: Histogram for Air Passengers. x-axis: AirPassengers; y-axis: Frequency.]

> hist(AirPassengers, border="blue", col="green") # set the border and fill colors of the bars

[Figure: Histogram of AirPassengers with blue borders and green bars. x-axis: AirPassengers; y-axis: Frequency.]

> hist(AirPassengers, xlim=c(100,700), ylim=c(0,30)) # set the limits of the x and y axes

[Figure: Histogram of AirPassengers. x-axis: AirPassengers (100 to 700); y-axis: Frequency (0 to 30).]
> # las sets the orientation of the axis tick labels:
> # 0 = default (parallel to the axis), 1 = x and y labels both horizontal,
> # 2 = x labels vertical and y labels horizontal (perpendicular to the axis),
> # 3 = x and y labels both vertical
> hist(AirPassengers, las=1)

[Figure: Histogram of AirPassengers with horizontal tick labels. x-axis: AirPassengers; y-axis: Frequency.]
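
The four las settings described above are easier to compare side by side. The following is a minimal sketch, not part of the original session: it assumes only base R and the built-in AirPassengers dataset, and the names las_values and old_par are introduced here for illustration.

```r
# Draw the same histogram once per las value (0 to 3) in a 2 x 2 grid
# so the tick-label orientations can be compared directly.
las_values <- 0:3
old_par <- par(mfrow = c(2, 2))   # par() returns the previous settings
for (v in las_values) {
  hist(AirPassengers, las = v, main = paste("las =", v))
}
par(old_par)                      # restore the previous plot layout
```

Restoring the saved settings with par(old_par) keeps later plots in the session from being drawn into the 2 x 2 grid.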

> hist(AirPassengers, breaks=5) # breaks changes the bar width and therefore the number of bars; a single number is only a suggestion to R

[Figure: Histogram of AirPassengers. x-axis: AirPassengers (100 to 700); y-axis: Frequency.]
> hist(AirPassengers, breaks=c(100, 300, 500, 700))

[Figure: Histogram of AirPassengers with three bars (100 to 300, 300 to 500, 500 to 700). x-axis: AirPassengers; y-axis: Frequency.]

> # the x-axis starts at 100; from 200 on, the bars are 150 wide (breaks at 200, 350, 500, 650)
> hist(AirPassengers, breaks=c(100, seq(200,700, 150)))

[Figure: Histogram of AirPassengers with unequal bar widths. x-axis: AirPassengers; y-axis: Density.]
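
The behavior of the breaks argument in the calls above can be checked without drawing anything. This is a small sketch under the assumption that only base R is available; the names h_suggested, custom_breaks, and h_custom are introduced here and do not appear in the original session.

```r
# A single number for breaks is only a suggestion: hist() passes it to
# pretty(), which picks "round" break points of equal width.
h_suggested <- hist(AirPassengers, breaks = 5, plot = FALSE)
print(h_suggested$breaks)

# A vector of break points is used exactly as given. seq(200, 700, 150)
# stops at 650 because the next step (800) would exceed 700.
custom_breaks <- c(100, seq(200, 700, 150))
print(custom_breaks)

# Unequal bin widths (100, then 150) make hist() default to the density scale.
h_custom <- hist(AirPassengers, breaks = custom_breaks, plot = FALSE)
print(diff(h_custom$breaks))   # width of each bin
print(h_custom$equidist)       # FALSE when the bins are not equally wide
```

Because the bins are not equally wide, hist() plots densities rather than frequencies by default, which matches the Density axis in the figure for this call.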
> hist(AirPassengers,
+ main="Histogram for Air Passengers",
+ xlab="Passengers",
+ border="blue",
+ col="green",
+ xlim=c(100,700),
+ las=1,
+ breaks=5,
+ prob = TRUE) # draw the histogram on the probability density scale

[Figure: Histogram for Air Passengers. x-axis: Passengers; y-axis: Density.]

> lines(density(AirPassengers)) # overlay a density curve on the histogram

[Figure: Histogram for Air Passengers with the density curve overlaid. x-axis: Passengers; y-axis: Density.]
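
The density curve only lines up with the bars because prob = TRUE puts the histogram on the density scale. The sketch below, which is an illustration rather than part of the original session, checks this numerically with base R; the names h, bar_area, d, and curve_area are introduced here.

```r
# With prob = TRUE (an alias for freq = FALSE) the bar heights are densities,
# so height * width summed over all bars equals 1.
h <- hist(AirPassengers, breaks = 5, plot = FALSE)
bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)

# The kernel density estimate is on the same scale, which is why
# lines(density(...)) matches a prob = TRUE histogram.
d <- density(AirPassengers)
# Approximate the area under the curve with the trapezoid rule.
curve_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(curve_area)
```

On the default frequency scale the bar heights are counts, so the same curve would sit almost flat along the bottom of the plot.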
> install.packages("ggplot2")
Installing package into ‘C:/Users/HP/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
Warning: package ‘ggplot2’ is in use and will not be installed
> library(ggplot2)
> # Load in `chol` data, set `header` to `TRUE`
> chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"),
+ header = TRUE)
> # Inspect first rows of `chol` with `head()`
> head(chol)
> # Summary of `chol` with `summary()`
> summary(chol)
> # Structure of `chol` with `str()`
> str(chol)
> # Compute a histogram of `chol$AGE`
> qplot(chol$AGE, geom="histogram")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[Figure: histogram of chol$AGE drawn with qplot. x-axis: chol$AGE.]
> # Compute a histogram of `chol$AGE`
> ggplot(data=chol, aes(chol$AGE)) +
+ geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
Use of `chol$AGE` is discouraged. Use `AGE` instead.

[Figure: histogram of chol$AGE drawn with ggplot. x-axis: chol$AGE; y-axis: count.]

> # Histogram for `chol$AGE`
> qplot(chol$AGE,
+ geom="histogram",
+ binwidth = 5,
+ main = "Histogram for Age",
+ xlab = "Age",
+ fill=I("blue"),
+ col=I("red"),
+ alpha=I(.2),
+ xlim=c(20,50))

> ggplot(data=chol, aes(x=AGE)) +
+ geom_histogram(breaks=seq(20, 50, by=2),
+ col="red",
+ fill="green",
+ alpha = 0.2) +
+ labs(title="Histogram for Age", x="Age", y="Count") +
+ xlim(c(18,52)) +
+ ylim(c(0,30))

> ggplot(data=chol, aes(AGE)) +
+ geom_histogram(breaks=seq(20, 50, by=2),
+ col="red",
+ aes(fill=..count..)) +
+ scale_fill_gradient("Count", low="green", high="red")

> ggplot(data=chol, aes(AGE)) +
+ geom_histogram(aes(y =..density..),
+ breaks=seq(20, 50, by = 2),
+ col="red",
+ fill="green",
+ alpha=.2) +
+ geom_density(col=2) +
+ labs(title="Histogram for Age", x="Age", y="Density")
> chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"),
+ header = TRUE)
> hist(AirPassengers,
+ main="Histogram for Air Passengers",
+ xlab="Passengers",
+ border="yellow",
+ col="red",
+ xlim=c(0,700),
+ las=3,
+ breaks=c(100, seq(250,700, 200)),
+ prob = TRUE)
> lines(density(AirPassengers))

> install.packages("ggplot2")
> library(ggplot2)
> ggplot(data=chol, aes(AGE)) +
+ geom_histogram(breaks=seq(20, 50, by=2),
+ col="red",
+ aes(fill=..count..)) +
+ scale_fill_gradient("Count", low="green", high="blue")+
+ # scale the density curve to counts (density count * bin width of 2) so it matches the bars
+ geom_density(aes(y=..count..*2), col=7) +
+ labs(title="Histogram for Age of Air Passengers", x="Age",
+ y="Count")+
+ xlim(c(15,55)) +
+ ylim(c(0,30))

> boston = read.csv(file.choose(), header=TRUE, sep = ",")


> # 1. INDUS
> hist(boston$INDUS,
+ main="Histogram of the Proportion of Non-Retail Business Land per Town",
+ xlab="INDUS",
+ border="yellow",
+ col="red",
+ xlim=c(0,35),
+ las=3,
+ breaks=c(seq(0,35,1)),
+ prob = TRUE)
> lines(density(boston$INDUS))

> # 2. NOX
> ggplot(data=boston, aes(NOX)) +
+ geom_histogram(breaks=seq(0.3, 0.9, by=0.05),
+ col="red",
+ aes(fill=..count..)) +
+ scale_fill_gradient("Count", low="pink", high="blue")+
+ labs(title="Histogram of the Nitric Oxides Concentration per Town",
+ x="NOX", y="Count")

> # 3. RM
> ggplot(data=boston, aes(RM)) +
+ geom_histogram(breaks=seq(3,9, by=0.5),
+ col="black",
+ aes(fill=..count..)) +
+ scale_fill_gradient("Count", low="brown", high="grey")+
+ labs(title="Histogram of the Number of Rooms per Dwelling", x="RM",
+ y="Count")

> # 4. PTRATIO
> ggplot(data=boston, aes(PTRATIO)) +
+ geom_histogram(breaks=seq(12,22, by=1),
+ col="black",
+ aes(fill=..count..)) +
+ scale_fill_gradient("Count", low="grey", high="green")+
+ labs(title="Histogram of the Pupil-Teacher Ratio per Town", x="PTRATIO",
+ y="Count")

> # 5. B
> hist(boston$B, main="Histogram of the Proportion of Black Residents",
+ xlab="B", border="yellow", col="red", xlim=c(0,400), las=2,
+ breaks=c(seq(0,400,50)), prob = TRUE)
> lines(density(boston$B))
