2 - Generalized Linear Models, AIC and Deviance
History
19th century (Legendre, Gauss): multiple linear regression → normal distribution with identity link
1920-1935 (Fisher): ANOVA → normal distribution with identity link
1922 (Fisher): likelihood function
1922 (Fisher): dilution assays → binomial distribution with complementary log-log link
1934 (Fisher): exponential family of distributions
1935 (Bliss): probit analysis → binomial distribution with probit link
1944-1952 (Berkson; Dyke & Patterson): logit for proportions → binomial distribution with logit link
1960 (Rasch): item analysis → Bernoulli distribution with logit link
1963 (Birch): log-linear models for counts → Poisson distribution with log link
1965-1967 (Feigl & Zelen; Zippin & Armitage; Glasser): regression models for survival data → exponential distribution with reciprocal or log link
1966 (Nelder): inverse polynomials → gamma distribution with reciprocal link
Body size-fecundity relationship
[Figure: mean clutch size against mean female carapace length (mm); fitted line y = 0.1571x - 15.26, R² = 0.9554]
Linear model
$y_i = b_0 + b_1 x_i + e_i$
where $y_i$ is the number of eggs and $x_i$ is body size.
Fitted model: $y_i = -15.26 + 0.1571\,x_i + e_i$
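The fitted line can be used directly for prediction. The sketch below plugs a carapace length into the fitted equation; the function name is illustrative, and the 122 mm example is the value marked on the plot.

```python
# Fitted coefficients reported in the text (clutch size vs. carapace length).
B0 = -15.26   # intercept (eggs)
B1 = 0.1571   # slope (eggs per mm of carapace length)


def predict_clutch_size(carapace_mm: float) -> float:
    """Expected mean clutch size for a given carapace length in mm."""
    return B0 + B1 * carapace_mm


if __name__ == "__main__":
    # -15.26 + 0.1571 * 122 = 3.9062, i.e. about 3.91 eggs.
    print(round(predict_clutch_size(122), 2))
```

The error term is left out because a prediction is the expected value of the response.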
[Figure: clutch size against carapace length with the prediction marked: a carapace length of 122 mm gives an expected clutch size of 3.91]
Analysis of variance (ANOVA)
Analysis of experiments
Linear models
Sources of variation in the data
Fixed versus random factors
Parameter estimation by maximum likelihood via least squares
Degrees of freedom
We compute the variance:
$$\mathrm{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Example values on the slide: 11.4 mm; 2.8 mm²
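A minimal sketch of the variance calculation, dividing by the n - 1 degrees of freedom. The measurements are hypothetical, chosen so that the mean is 11.4 mm and the variance 2.8 mm²; they are not the data behind the slide.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical measurements in mm (an assumption, not data from the text).
lengths_mm = [9.0, 11.0, 11.0, 13.0, 13.0]

if __name__ == "__main__":
    print(statistics.mean(lengths_mm))      # mean in mm
    print(sample_variance(lengths_mm))      # variance in mm^2
    print(statistics.variance(lengths_mm))  # same quantity from the standard library
```

One degree of freedom is used up by estimating the mean, which is why the divisor is n - 1 and not n.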
Anaphes nitens
Effect of host quality on body size
[Figure, panel A: number of eggs against wing length (mm); fitted line y = 114.7x - 70.004, R² = 0.4605]
Size/sex and superparasitism
[Figure: wing length (mm) of males and females from parasitized and superparasitized hosts; sample sizes as printed on the plot: 16, 46, 16, 16]
ANALYSIS OF VARIANCE
[Figure: wing length (mm) by host type (superparasitized, parasitized), with linear fits for males and females]
LINEAR REGRESSION
$y_i = b_0 + b_1\,\mathrm{Sex} + b_2\,\mathrm{Trat2} + b_3\,\mathrm{Sex}\cdot\mathrm{Trat2} + e_i$
***** Regression Analysis *****
Response variate: wing_length_mm
Fitted terms: Constant + grupo + sex + [Link]
REGRESSION:
length = intercept + treatment + sex + interaction + error
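Model with interactions
Constraints: $\alpha_1 = \beta_1 = (\alpha\beta)_{1j} = (\alpha\beta)_{i1} = 0$
Row | Column 1 | Column 2 | Column 3
1 | $\mu$ | $\mu + \beta_2$ | $\mu + \beta_3$
2 | $\mu + \alpha_2$ | $\mu + \alpha_2 + \beta_2 + (\alpha\beta)_{22}$ | $\mu + \alpha_2 + \beta_3 + (\alpha\beta)_{23}$
3 | $\mu + \alpha_3$ | $\mu + \alpha_3 + \beta_2 + (\alpha\beta)_{32}$ | $\mu + \alpha_3 + \beta_3 + (\alpha\beta)_{33}$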
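The table of cell means can be generated from the parameters. This is a minimal sketch with hypothetical parameter values; the function name is illustrative, and the constraints from the table (first row and first column effects set to zero) are checked explicitly.

```python
def cell_means(mu, alpha, beta, interaction):
    """Expected cell means for a two-factor model with interaction.

    alpha[i], beta[j] and interaction[i][j] are effects relative to row 1
    and column 1, so alpha[0], beta[0] and every interaction term in the
    first row or first column must be zero.
    """
    assert alpha[0] == 0 and beta[0] == 0
    assert all(value == 0 for value in interaction[0])
    assert all(row[0] == 0 for row in interaction)
    return [[mu + a + b + interaction[i][j] for j, b in enumerate(beta)]
            for i, a in enumerate(alpha)]


# Hypothetical parameter values for a 3 x 3 layout (not from the text).
mu = 10.0
alpha = [0.0, 2.0, 3.0]
beta = [0.0, 1.0, -1.0]
interaction = [[0.0, 0.0, 0.0],
               [0.0, 0.5, -0.5],
               [0.0, 1.0, 0.25]]
means = cell_means(mu, alpha, beta, interaction)

if __name__ == "__main__":
    for row in means:
        print(row)
```

With these constraints the cell in row 1, column 1 is the reference: its mean is mu alone, and every other parameter is a difference from it.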
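FIXED AND RANDOM EFFECTS
TWO-FACTOR ANOVA
Fixed effects | Expected mean square | F
$A_i$ | $\sigma_e^2 + \frac{nb}{a-1}\sum_{i=1}^{a}\alpha_i^2$ | MSA/MSe
$B_j$ | $\sigma_e^2 + \frac{na}{b-1}\sum_{j=1}^{b}\beta_j^2$ | MSB/MSe
$AB_{ij}$ | $\sigma_e^2 + \frac{n}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2$ | MSAB/MSe
Residuals (error) | $\sigma_e^2$ |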
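TWO-FACTOR ANOVA
Random effects | Expected mean square | F
$A_i$ | $\sigma_e^2 + n\sigma_{AB}^2 + nb\,\sigma_A^2$ | MSA/MSAB
$B_j$ | $\sigma_e^2 + n\sigma_{AB}^2 + na\,\sigma_B^2$ | MSB/MSAB
$AB_{ij}$ | $\sigma_e^2 + n\sigma_{AB}^2$ | MSAB/MSe
Residuals (error) | $\sigma_e^2$ |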
Only in the case of balanced designs do fixed and random factors produce the same results.
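The two tables differ in the denominator used to test the main effects. The sketch below computes the F ratios for both cases from a set of mean squares; the function name and the mean squares are hypothetical.

```python
def f_ratios(ms_a, ms_b, ms_ab, ms_e, random_effects=False):
    """F ratios for a two-factor ANOVA.

    Fixed effects: every term is tested against the residual mean square.
    Random effects: main effects are tested against the interaction mean
    square, and the interaction against the residual mean square.
    """
    main_denominator = ms_ab if random_effects else ms_e
    return {
        "A": ms_a / main_denominator,
        "B": ms_b / main_denominator,
        "AB": ms_ab / ms_e,
    }


# Hypothetical mean squares (not from the text).
fixed = f_ratios(40.0, 20.0, 10.0, 2.0)
random_model = f_ratios(40.0, 20.0, 10.0, 2.0, random_effects=True)

if __name__ == "__main__":
    print(fixed)
    print(random_model)
```

The choice of denominator follows from the expected mean squares: under the null hypothesis, numerator and denominator must have the same expectation.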
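Between-subjects factors
Factor | Level | N
SEX | 0 | 17
SEX | 1 | 20
Temperature | 10 | 6
Temperature | 15 | 3
Temperature | 20 | 15
Temperature | 25 | 6
Temperature | 26 | 7
Response variable: length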
Effect of temperature and sex on body size
[Figure: length (cm) against temperature (°C) for males and females]
FIXED FACTORS
$y_{ijk} = \mu + \tau_i + \beta_j + e_{ijk}$
$y_i = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + e_i$
Treatment | Equation
1 | $y_i = b_0(1) + b_1(1) + b_2(0) + b_3(0)$
2 | $y_i = b_0(1) + b_1(0) + b_2(1) + b_3(0)$
3 | $y_i = b_0(1) + b_1(0) + b_2(0) + b_3(1)$
4 | $y_i = b_0(1) + b_1(0) + b_2(0) + b_3(0)$
Linear structure of the model: $y = Xb + e$, where $X$ is the design matrix.
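The 0/1 coefficients in the treatment table are the rows of the design matrix X. A minimal sketch that builds those rows, with treatment 4 as the reference level as in the table; the function names and the coefficient values are hypothetical.

```python
def design_row(treatment, n_treatments=4):
    """Row of X: an intercept plus one 0/1 indicator per non-reference treatment.

    The last treatment is the reference level, so all its indicators are 0.
    """
    indicators = [1 if treatment == k else 0 for k in range(1, n_treatments)]
    return [1] + indicators


def expected_value(row, b):
    """Linear predictor for one row of X: the sum of x_k * b_k."""
    return sum(x * coef for x, coef in zip(row, b))


X = [design_row(t) for t in (1, 2, 3, 4)]

# Hypothetical coefficients b0..b3 (not from the text).
b = [5.0, 1.0, 2.0, -0.5]

if __name__ == "__main__":
    for treatment, row in zip((1, 2, 3, 4), X):
        print(treatment, row, expected_value(row, b))
```

With this coding b0 is the mean of treatment 4, and b1 to b3 are the differences between treatments 1 to 3 and treatment 4.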
Link function
$\eta_i = g_i(\mu_i)$
where $\eta_i$ is the linear predictor and $\mu_i$ is the expected value of $y_i$.
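Likelihood function
$$P(5) = \frac{9!}{5!\,(9-5)!}\,\theta^{5}\,(1-\theta)^{(9-5)}$$
The value of $\theta$ that maximizes the likelihood $\mathcal{L}$ does not depend on the combinatorial part of the formula, so we can ignore it. The derivative of the log-likelihood is therefore:
$$\frac{d \log \mathcal{L}(\theta)}{d\theta} = \frac{5}{\theta} - \frac{9-5}{1-\theta}$$
Setting the derivative to zero gives $\hat\theta = 5/9$.
Source: Crawley, M. J. 1993. GLIM for ecologists, Oxford: Blackwell Science.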
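The binomial example (5 successes in 9 trials) can be checked numerically. This sketch evaluates the log-likelihood without the combinatorial term on a grid of candidate values and confirms that the derivative vanishes at 5/9; the grid search is only an illustration, not the method used in the source.

```python
import math


def log_likelihood(theta, successes=5, trials=9):
    """Binomial log-likelihood without the combinatorial constant."""
    return successes * math.log(theta) + (trials - successes) * math.log(1 - theta)


def score(theta, successes=5, trials=9):
    """Derivative of the log-likelihood with respect to theta."""
    return successes / theta - (trials - successes) / (1 - theta)


# Crude grid search over theta in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_likelihood)

if __name__ == "__main__":
    print(theta_hat)      # grid value closest to 5/9
    print(score(5 / 9))   # numerically zero
```

Dropping the combinatorial term shifts the log-likelihood by a constant, so the location of the maximum is unchanged.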
Kullback-Leibler distance between two functions
$$I(f, g) = \int f(x)\,\log\left(\frac{f(x)}{g(x \mid \theta)}\right) dx$$
where $f$ is the true function and $g$ is the approximating model.
We look for an approximating model that minimizes the information lost; that is, we have to minimize $I(f, g)$ over $g$.
[Figure: the approximating models compared, including Weibull (a = 2, b = 20), Lognormal (θ = 2, σ² = 2) and F(4, 10). Note: the functions g2 and g3 are interchanged in Burnham & Anderson (1998); the error is corrected here.]
Relative distance, for an example in which we know the truth:
Approximating model | $I(f, g_i) - C$ | Rank
g1 Weibull (a = 2, b = 20) | 3.45591 | 1
However, we can rewrite the equation as:
$$\hat{I}(f, g) = E_x[\log f(x)] - E_x[\log g(x \mid \hat\theta(y))]$$
The first term depends only on the truth, so it is a constant:
$$\hat{I}(f, g) = \text{constant} - E_x[\log g(x \mid \hat\theta(y))]$$
$$E_y[\hat{I}(f, g)] = \text{constant} - E_y E_x[\log g(x \mid \hat\theta(y))]$$
Source: K. P. Burnham and D. R. Anderson, 1998. Model selection and inference. A practical information-theoretic approach, New York: Springer. 353 pages.
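The Kullback-Leibler distance can be approximated by numerical integration. The sketch below uses two exponential densities as stand-ins, because their distance has a closed form to compare against; they are an assumption for illustration and are not the distributions of the example above.

```python
import math


def kl_distance(f, g, lower, upper, steps=100_000):
    """Approximate I(f, g), the integral of f(x) * log(f(x) / g(x)) dx,
    with the midpoint rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        fx = f(x)
        if fx > 0.0:
            total += fx * math.log(fx / g(x)) * width
    return total


def exponential_pdf(rate):
    """Density of an exponential distribution with the given rate."""
    return lambda x: rate * math.exp(-rate * x)


truth = exponential_pdf(1.0)   # plays the role of f
model = exponential_pdf(0.5)   # plays the role of g
approx = kl_distance(truth, model, 0.0, 50.0)
# Closed form for two exponentials: log(r_f / r_g) + r_g / r_f - 1.
exact = math.log(1.0 / 0.5) + 0.5 / 1.0 - 1.0

if __name__ == "__main__":
    print(approx, exact)
```

The distance is not symmetric: swapping `truth` and `model` gives a different value, which is why I(f, g) is always read as the information lost when g is used to approximate f.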
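Akaike Information Criterion (AIC)
Hirotsugu Akaike
We have to estimate the value of T:
$$T = E_y E_x\left[\log g(x \mid \hat\theta(y))\right]$$
$$\hat{T} = \log\left(\mathcal{L}(\hat\theta \mid y)\right) - K$$
where $\mathcal{L}(\hat\theta \mid y)$ is the maximized likelihood and $K$, the bias correction, is approximately equal to the number of estimable parameters of the model.
Source: K. P. Burnham and D. R. Anderson, 1998. Model selection and inference. A practical information-theoretic approach, New York: Springer. 353 pages.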
$$\mathrm{AIC} = -2\log\left(\mathcal{L}(\hat\theta \mid y)\right) + 2K$$
[Figure: squared bias and variance as a function of the number of parameters]
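Corrected Akaike Information Criterion (AICc)
$$\mathrm{AIC}_c = -2\log\left(\mathcal{L}(\hat\theta)\right) + 2K\left(\frac{n}{n-K-1}\right)$$
$$\mathrm{AIC}_c = \mathrm{AIC} + \frac{2K(K+1)}{n-K-1}$$
$$\mathrm{QAIC}_c = \mathrm{QAIC} + \frac{2K(K+1)}{n-K-1}$$
where $n$ is the sample size.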
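A minimal sketch of the two criteria, computed from a maximized log-likelihood, the number of parameters K and the sample size n. The function names and the numerical example are hypothetical.

```python
def aic(log_likelihood, k):
    """AIC = -2 log(L) + 2K."""
    return -2.0 * log_likelihood + 2.0 * k


def aicc(log_likelihood, k, n):
    """AICc = AIC + 2K(K + 1) / (n - K - 1), the small-sample correction."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > K + 1")
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical model: log-likelihood -50 with 3 parameters and 20 observations.
example_aic = aic(-50.0, 3)
example_aicc = aicc(-50.0, 3, 20)

if __name__ == "__main__":
    print(example_aic, example_aicc)
```

The correction term shrinks towards zero as n grows relative to K, so AICc and AIC agree for large samples.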
Non-contact guarding in Calopteryx haemorrhoidalis
Marking methods
Thermoregulation of the European pond turtle (Emys orbicularis)
MARKING WITH POSITION CODES
NATURAL MARKS
European pond turtle (Emys orbicularis)
Natural marks (Salamandra salamandra)
Types of experiments
Two-sample (Petersen) vs multi-sample experiments.
Source: N. A. Arnason, C. J. Schwarz, and G. Boyer, 1998. POPAN-5. A data maintenance and analysis system for mark-recapture data, Manitoba, Canada: Department of Computer Science, The University of Manitoba. 318 pages.
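Petersen-Lincoln index
$$\frac{m}{N} = \frac{r}{n} \quad\Rightarrow\quad \hat{N} = \frac{m\,n}{r}$$
where $N$ is the population size, $m$ is the number of animals marked and released in the first sample, $n$ is the number of animals caught in the second sample, and $r$ is the number of marked animals among them.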
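A minimal sketch of the two-sample estimate, assuming the usual reading of the index: the proportion of marked animals in the second sample estimates the proportion marked in the population. The function name, argument names and counts are hypothetical.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Two-sample population estimate N = m * n / r.

    marked_first:  animals marked and released in the first sample (m)
    caught_second: animals caught in the second sample (n)
    recaptured:    marked animals found in the second sample (r)
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample")
    return marked_first * caught_second / recaptured


# Hypothetical counts: 50 marked, 40 caught later, 10 of them marked.
estimate = lincoln_petersen(50, 40, 10)

if __name__ == "__main__":
    print(estimate)
```

Here a quarter of the second sample is marked, so the 50 marked animals are taken to be a quarter of the population.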
Assumptions of the Petersen-Lincoln method
1. All marks are permanent and are recorded without error when the animal is recaptured.
2. Being captured, handled and marked one or more times must not affect an animal's probability of recapture.
[Figure: estimated number (axis from 0 to 100) for each sampling date from 14 August to 15 September]
Source: A. Cordero and J. A. Andrés. 1999. Lifetime mating success, survivorship and synchronized reproduction in the damselfly Ischnura pumilio (Odonata: Coenagrionidae). Int. J. Odonatol. 2 (1): 105-114.
A new paradigm
Source: J. D. Lebreton, K. P. Burnham, J. Clobert, and D. R. Anderson. 1992. Modeling survival and testing biological hypotheses using marked animals: a unified approach with case studies. Ecol. Monographs 62 (1): 67-118.
Multiple mark-recapture
Mark-recapture data
Three recapture occasions
In general, the explicit probability of each recapture history can be written out.
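As an illustration of writing out the probability of each recapture history, the sketch below assumes a model with time-dependent survival (phi) and recapture (p) probabilities, in the spirit of the Cormack-Jolly-Seber framework. The slide's own formulas did not survive, so the parameterization, the function name and the parameter values are assumptions.

```python
from itertools import product


def history_probability(history, phi, p):
    """Probability of a capture history, conditional on the first release.

    history: string of '1' (seen) and '0' (not seen); the first character
             must be '1' (the release occasion).
    phi[i]:  survival probability from occasion i + 1 to occasion i + 2.
    p[i]:    recapture probability at occasion i + 2.
    """
    if history[0] != "1":
        raise ValueError("history must start with the release occasion")
    last_seen = history.rindex("1")
    prob = 1.0
    # Up to the last sighting the animal is known to have survived.
    for i in range(last_seen):
        prob *= phi[i] * (p[i] if history[i + 1] == "1" else 1.0 - p[i])
    # chi: probability of never being seen again after the last sighting
    # (the animal either died, or survived and was missed).
    chi = 1.0
    for i in reversed(range(last_seen, len(history) - 1)):
        chi = (1.0 - phi[i]) + phi[i] * (1.0 - p[i]) * chi
    return prob * chi


# Hypothetical parameters: one release followed by three recapture occasions.
phi = [0.8, 0.7, 0.6]
p = [0.5, 0.4, 0.3]
histories = ["1" + "".join(bits) for bits in product("01", repeat=3)]
probabilities = {h: history_probability(h, phi, p) for h in histories}

if __name__ == "__main__":
    for h, value in probabilities.items():
        print(h, round(value, 5))
```

The eight histories are mutually exclusive and exhaustive, so their probabilities add up to one.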
SURVIVAL PROBABILITY ESTIMATES FOR EMYS ORBICULARIS
Survival and recapture probabilities estimated from the recapture histories of 29 males, 18 females and 13 juveniles (1997-2001) from a population at Centeáns (O Porriño, Pontevedra), using MARK 2.1
SURVIVAL PROBABILITY
Erythromma viridulum
SOME CONSIDERATIONS BEFORE STARTING TO MARK
Fieldwork Planning
If tag loss is likely, then putting two marks on each individual can be a good strategy. Animals that lose one tag can be used to estimate how many have lost both, and this can be taken into account in the calculations.
We should try to keep the intervals between samplings constant. Very irregular intervals increase sampling error and make estimation difficult.
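One way to turn the double-marking idea into numbers is sketched below. It assumes that each tag is lost independently with the same probability, which the text does not state; the function name and the counts are hypothetical.

```python
def tag_loss_estimates(one_tag, two_tags):
    """Estimate the per-tag loss rate and the number of animals that lost both tags.

    Assumption (not from the text): each of the two tags is lost
    independently with the same probability q, so recaptured animals split
    as (1 - q)^2 with both tags, 2q(1 - q) with one tag and q^2 with none.
    """
    if one_tag < 0 or two_tags <= 0:
        raise ValueError("need some animals that still carry both tags")
    q = one_tag / (one_tag + 2 * two_tags)
    identified = one_tag + two_tags
    # Animals with no tags left cannot be recognised; scale up from those seen.
    lost_both = identified * q ** 2 / (1 - q ** 2)
    return q, lost_both


# Hypothetical recaptures: 20 animals with one tag left, 90 with both.
loss_rate, lost_both = tag_loss_estimates(20, 90)

if __name__ == "__main__":
    print(loss_rate, lost_both)
```

With these counts the per-tag loss rate is 0.1, and roughly one additional animal is expected to have lost both tags and gone unrecognised.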
Fieldwork Planning
Determine which phenotypic variables will be recorded. Some of these attributes will be fixed for each individual (sex, colour morph, cohort), but others will change with age. We should try to include all attributes that could affect survival or recapture rates. Some phenotypic variables that might seem irrelevant to our study could be crucial for subdividing the population into groups of individuals with homogeneous survival and recapture rates.
Think about how the selected marking method will affect marked animals, to determine whether violations of assumption iii are likely.
Try to estimate how much time is needed to record all the phenotypic characters of each specimen. Animals should be handled for as short a time as possible, to minimize changes in their behaviour (think about your odour on the animal and how it could affect congeners and predators).
It is crucial that the capture method causes no injuries that would make animals trap-shy. Conversely, the use of baited traps can produce trap-happiness. Try to avoid both effects, for instance by changing trap placements and types of capture method.
Has any research already been done on the selected species (or a very similar taxon)? Many minor details of how to solve sampling problems never appear in the Methods section of papers: if you can consult somebody with previous experience, do it!
Source: N. A. Arnason, C. J. Schwarz, and G. Boyer, 1998. POPAN-5. A data maintenance and analysis system for mark-recapture data, Manitoba, Canada: Department of Computer Science, The University of Manitoba. 318 pages.
Fieldwork Planning
Precision needed
Manpower available
In a small population (100-500 individuals) this means trying to mark 20% or even 50% of specimens in the first 3-4 sampling occasions.
Sampling effort should be higher in the first sampling occasions. Try to get some helpers for the first sampling days.
In large populations (more than 1000 individuals) satisfactory parameter estimates can be obtained with 20% or fewer of the animals marked.
Once a reasonable number of animals are marked, sampling effort can be reduced, but ideally it should be enough to recapture all specimens at least once. Therefore marking should continue for at least the average lifespan.
For instance, if the daily survival rate is 0.9, then the expected lifespan is 10 days (= 1/(1-0.9)), and we should be able to recapture at least 20% of the population.
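The lifespan figure quoted above follows from a constant daily survival rate. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def expected_lifespan(daily_survival):
    """Expected lifespan in days, 1 / (1 - s), for a constant daily survival rate s."""
    if not 0.0 <= daily_survival < 1.0:
        raise ValueError("daily survival must be in [0, 1)")
    return 1.0 / (1.0 - daily_survival)


if __name__ == "__main__":
    # A daily survival rate of 0.9 gives about 10 days.
    print(round(expected_lifespan(0.9), 2))
```

The formula assumes that survival does not change with age, so small changes in the daily rate have large effects when the rate is close to 1.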
marked animals with the rest of the population. The estimates can be
unreliable if the inter-sampling interval is wrong.
For instance, if the survival rate is high (say 0.9 between sampling dates), then many survival estimates would be over 1.0 unless sampling intensity is very high.
In this case, taking samples every 4 sampling intervals would give a survival estimate of 0.66 (= 0.9^4), and estimates over 1.0 would be very unlikely.
The best survival estimates are usually between 0.6 and 0.8. Sampling efforts that give estimates below 0.6 are clearly unreliable.
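The effect of lengthening the sampling interval can be sketched as follows: survival over several intervals is the per-interval rate raised to the number of intervals. The function name is illustrative.

```python
def survival_over_intervals(rate_per_interval, n_intervals):
    """Survival over n consecutive intervals, assuming a constant rate per interval."""
    return rate_per_interval ** n_intervals


if __name__ == "__main__":
    # 0.9 per interval, sampled every 4 intervals: 0.9 ** 4 = 0.6561.
    print(round(survival_over_intervals(0.9, 4), 2))
```

Sampling every 4 intervals moves the expected estimate from 0.9 into the 0.6 to 0.8 range recommended above.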
Remember that one of the assumptions of the models is that marking is instantaneous in time.
This assumption is less critical if the disappearance rate is low and marked animals mix quickly with unmarked ones, but it should certainly not be ignored. A study where marking is done continuously, and that is afterwards artificially divided into days, weeks or months, risks inducing heterogeneity in recapture rates.
Number of Samplings