0% found this document useful (0 votes)
19 views8 pages

Homework Example

The document outlines a homework assignment focused on data analysis using bootstrap methods to estimate whale abundance. It includes data descriptions, results with tables and figures, and code snippets for implementing the analysis in a programming environment. The assignment emphasizes the importance of including comments in the code for clarity and future reference.

Uploaded by

李其諺
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views8 pages

Homework Example

The document outlines a homework assignment focused on data analysis using bootstrap methods to estimate whale abundance. It includes data descriptions, results with tables and figures, and code snippets for implementing the analysis in a programming environment. The assignment emphasizes the importance of including comments in the code for clarity and future reference.

Uploaded by

李其諺
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Homework example (homework # and title here)

Chih-hao Hsieh (your name here)

Your homework will include:


1. Data description (no need to include all data if the data set is big).
2. Your results (including tables and figures) and interpretation.
3. Your codes (with comments). Also send your codes in a separate file .r or .m or
whatever.

Below is a demo:

Data:
transect sighting n individuals nS Transect length L
1 0 0 650
2 0 0 650
3 4 5 700
4 5 12 700
5 0 0 700
6 8 15 650
7 0 0 650
8 6 13 700
9 0 0 650
10 1 2 700
11 5 11 650
12 6 7 650
13 0 0 650
14 9 20 700
15 5 15 700

Non-parametric bootstrap sample:


nS: #1
10.0000 7.5000 12.0000 0 13.2000 0 13.0000 6.2500
0 0 10.5000 0 9.3750 0 13.0000
nS: #2
1.2500 14.4000 10.8333 12.0000 0 0 19.2000 0
7.5000 0 0 0 9.3750 13.2000 0
L: #1
700 700 700 650 650 700 650 650 700 700 650 650 650 700
700
L: #2
650 650 700 700 650 700 650 650 700 650 700 700 650 650
700

1
Parametric bootstrap sample:
w:
2.5245 1.8453……

g(0):
0.9390 0.9345 ……

AnS
N=
2Lwg (0)
NOTE: here nS=sum(nS) and L=sum(L) of the given data

Bias correction for CI

2
Gˆ (!ˆ) = 0.56
Z0 = 0.1510

Doing both BC and BCa


For BCa, acceleration is obtained based on Jacknife estimator

jacknife_N =

1.0e+003 *

1.3707 1.5389 2.4523 2.1629 2.3534 2.3278 2.8349 1.3827

1.5939 2.7405 2.6961 3.3523 2.2262 3.1122 2.4855

jack_mean =

2309
n

! (#ˆ
i =1
(.) " #ˆ(i ) ) 3
a= n
6(! (#ˆ(.) " #ˆ(i ) ) 2 ) 3 / 2
i =1

a = 0.0068

Comparison of methods

bootstrap mean(N) 2341


bootstrap median(N) 2212
approximate CV(N) 0.3274

90% CI
method normal lognormal bootstrap bootstrap BC bootstrap Bca with acceleration=0.0068
mean 2242 2242 2341 2341 2341
lower CI 1034 1326 1252 1431 1433
upper CI 3449 3789 3752 4194 4237

3
compare boostrap number = 1000 and 10000
simulating 100 times

4
(Your codes: be sure to include comments so that you know what you are doing. And
most importantly, you still recognize what you did after a really long time.)

Bootstrap method (demo, this code will not work because I delete
lines randomly).
%Script: whale2.m by Chih-hao Hsieh 01/25/2003
%Generating confidence interval by bootstrap method
%Estimating whale abundance
%N=AnS/{2Lwg(0)}
%In this formula, nS=sum(nS): total number of individuals,
%L=sum(L): total length of transect survey
%A: size of the study area
%w: effective strip width
%g(0): prob of detecting a whale if on the transect

%import data. Be careful that in the following nS and L have different


%meaning from the above
data=xlsread('hw1_data.xls');
%n: # of sighting
%nS: # of individuals
%S: group size
A=815000;%A: size of the study area (km squar)
n=data(:,2);
nS=data(:,3);
L=data(:,4);

5
%calculate S
m=length(nS);
for i=1:m
end;
Splus=S(find(S~=0));%only keep S when S>0
S_length=length(Splus);%for deciding upper bound of unifrom prob of S

%bootstraping method to simulate N:abundance 1000 times


Nm=1000;%number of bootstrap simulation
for k=1:Nm
%Random generating n,S,L from sample data by non-parametric bootstrap
%method
%sample with replacement following uniform probability U(0,15)
%random sampling n,S, and calculate nS
i=1;
while i<=15
pick_n=ceil(15*rand);
if bootstrap_n(i)==0
bootstrap_nS(i)=0;%Since there is no sighting, there won't be any individuals
else %if bootstrap_n(i)~=0
%if n is not zero, S won't be zero
%random sampling S from non-zero S
pick_S=ceil(S_length*rand);

end;
i=i+1;
end;

%random sampling L
pick_L=ceil(15*rand(1,15));

%calculate sum of bootstraping nS and L


boots_sum_nS=sum(bootstrap_nS);

%Random generating w,g(0) from sample data by parametric bootstrap method


%CV=SE/E and thus SE=E*CV
%Dist normal(mu, var)=randn*sqrt(VAR)+mu

%random generating bootstraping w


%w: mean=2.0, CV=0.2
w_mean=2.0;CV_w=0.2;w_se=w_mean*CV_w;
bootstrap_w=randn*w_se+w_mean;

%random generating bootstraping g(0)


%g: mean=0.9 CV=0.05

6
g_mean=0.9;CV_g=0.05;g_se=g_mean*CV_g;
j=0;
while j==0
p=randn*g_se+g_mean;
%g follows truncated Gaussian, g>1 or g<0 is not allowed
if p<=1 & p>=0
bootstrap_g=p; %used for calculate CV(sum(nS)/sum(L))
j=1;
end;
end;
end;

%Percentile bootstrap to estimate 90% confidence interval


percentile=90/100;
alpha=(1-percentile)/2;
N=sort(N);
bootstrap_mean_N=mean(N)
CI_boots_up=N(round(Nm*(1-alpha)))%upper limit
figure
hist(N,50)

%calculate CI by using approximate formula for the CV


%CV(AnS/{2Lwg(0)})=CV_N
%=sqrt(CV(nS/L)_squar+CV(w)_squar+CV(%g)_squar
CV_nS_L=std(boots_nS_L)/mean(boots_nS_L)
est_N=A*sum(nS)/(2*sum(L)*w_mean*g_mean)%approximate abundance
se_theta=est_N*CV_N;%standard error of expected N , =SE(N_hat)

%CI for normal distribution


%prob(Z)=0.05
Z_alpha=abs(norminv(alpha,0,1));%find the inverse of standard normal CDF
CI_normal=Z_alpha*se_theta;
CI_up_normal=est_N+CI_normal

%CI for lognormal distribution


%SE(ln(theta))=sqrt(ln(1+Var(theta)/theta^2))
Var_N_hat=se_theta^2;%back calculate variance of N
CI_up_lognormal=est_N*CI_lognormal

%emperical CDF plot


vect=1:Nm;
G=vect/Nm;
figure
plot(N,G)
hold on

7
axis([0 max(N) 0 1])
title('bootstrap cumulative distribution function')
ylabel('G(\theta)')
hold off

%find G(theta_hat), here theta_hat=bootstrap_mean_N


G_theta_hat=max(find(N<bootstrap_mean_N))/Nm
Z_0=norminv(G_theta_hat,0,1)%find the inverse of standard normal CDF

%BC method, theta_BC(alpha)=inv(G(cdf(2*Z_0+Z_alpha)))


CI_BC_low=N(round(normcdf(2*Z_0-Z_alpha,0,1)*Nm))
CI_BC_up=N(round(normcdf(2*Z_0+Z_alpha,0,1)*Nm))

%BCa method
%a based on jacknife estimator, call function whale3.m to generate jacknife sampling
[jack_mean, jack_N,a]=whale3
CI_BCa_low=N(round(normcdf(Z_0+(Z_0-Z_alpha)/(1-a*(Z_0-Z_alpha)),0,1)*Nm))
CI_BCa_up=N(round(normcdf(Z_0+(Z_0+Z_alpha)/(1-a*(Z_0+Z_alpha)),0,1)*Nm))

%plot different methods


figure
hold on
plot(1, est_N,'*', 1, CI_low_normal,'o', 1, CI_up_normal,'o')%normal approximation
plot(4,bootstrap_mean_N,'*', 4, CI_BC_low,'o', 4, CI_BC_up,'o')%bootstrap with BC
plot(5,bootstrap_mean_N,'*', 5, CI_BCa_low,'o', 5, CI_BCa_up,'o')%bootstrap with BC
axis([0 6 0 5000])
title('comparison of methods')
x=1:5;
y=500*ones(1,5);
text(x,y,method)
hold off

You might also like