FUNCTIONAL GENOMICS
DR. VÍCTOR TREVIÑO
VTREVINO@ITESM.MX
A7-421
Microarrays Image Analysis
Microarray - Pre-Processing Purpose
Microarray Image Analysis
TECHNOLOGIES

Two-dye spotted arrays
  Probes: DNA target (cDNA, PCR products, etc.)
  Copies per gene: usually 3
  Organization: sectors (print-tip), ~3 x sectors by ~3 y sectors
  Sectors: i x j spots (18 x 20)
  Controls: empty spots, landing lights

Oligonucleotide arrays
  Probes: oligos, ~20 to 40 nt
  Copies per gene: usually 1
  Organization: n x m probesets (n ~100, m ~100)
  Probeset: perfect match probes (pm) and mismatch probes (mm)
Microarray - Image Analysis
TECHNOLOGIES: RAW DATA
Two dyes: 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values. After image analysis and pre-processing: only 10,000 values.
Oligos: 10,000 genes * 20 oligos * 2 (pm, mm) * ~36 pixels/gene = 14,400,000 values. After image analysis and pre-processing: only 10,000 values.
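The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch that reuses the approximate figures quoted above; the variable names are illustrative.

```python
# Raw pixel-level value counts per array, using the approximate figures quoted above.
genes = 10_000

# Two-dye spotted array: 2 dyes, 3 replicate spots per gene, ~40 pixels per spot.
two_dye_values = genes * 2 * 3 * 40

# Oligonucleotide array: 20 oligos per gene, PM and MM probes, ~36 pixels per probe.
oligo_values = genes * 20 * 2 * 36

print(f"two-dye raw values: {two_dye_values:,}")
print(f"oligo raw values:   {oligo_values:,}")
print(f"reduction factors:  {two_dye_values // genes}x and {oligo_values // genes}x")
```

Image analysis and pre-processing therefore reduce the data by two to three orders of magnitude before any statistical analysis starts.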
Image Analysis
Addressing: estimate the location of spot centers.
Segmentation: classify pixels as foreground or background.
Extraction: for each spot on the array and each dye, compute
  foreground intensities
  background intensities
  quality measures
(Affymetrix GeneChip: done by the Affymetrix software.)
Addressing (by grid, GenePix)
Segmentation
Circular feature
Irregular feature shape
Finally compute Average
Background Reduction
Extraction: Determining Background
Image Analysis: Segmentation (spot detection) -> Background Estimation -> Value
Value = Spot Intensity - Spot Background
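The segmentation and background steps can be sketched on a synthetic image patch. This is a minimal illustration, not what GenePix or SpotFinder actually do: it assumes a circular feature with a known center and radius, uses the mean of the foreground pixels and the median of the surrounding pixels, and the function name `spot_value` is hypothetical.

```python
from statistics import mean, median

def spot_value(image, center, radius):
    """Return (foreground mean, background median, background-subtracted value)."""
    cy, cx = center
    foreground, background = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Segmentation: pixels inside the circle are foreground, the rest background.
            inside = (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
            (foreground if inside else background).append(pixel)
    fg = mean(foreground)
    bg = median(background)
    return fg, bg, fg - bg

# Synthetic 7x7 patch: background of 10, a bright circular spot of 110 in the middle.
patch = [[110 if (y - 3) ** 2 + (x - 3) ** 2 <= 4 else 10 for x in range(7)]
         for y in range(7)]
fg, bg, value = spot_value(patch, center=(3, 3), radius=2)
print(fg, bg, value)
```

Real images have irregular feature shapes and uneven background, which is why dedicated software offers several segmentation methods.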
Gene     Sample1   Sample1
Gene1    100       98
Gene2    209       4209
Gene3    7         2
...
Genek    9882      9711
...
GeneN    2298      28
Data Transformation two dyes
[Figure: scatter plots of R = Sample1 against G = Sample1, on the raw scale and after the Log2 transformation (Log2(R) against Log2(G)). Source: Microarray Bioinformatics - D. Stekel (Cambridge, 2003)]
Data Transformation two dyes
(log2 scale)
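[Figure: scatter plot of R = Sample1 against G = Sample1 (log2 scale) and the corresponding MA-Plot. How to obtain 1 value from R and G?]
M (deviation) = log2(R/G) = log2(R) - log2(G)
A (intensity) = log2(R*G)/2 = (log2(R) + log2(G))/2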
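The M and A values can be computed directly from the two channel intensities. The sketch below assumes background-subtracted, strictly positive R and G values; the function name `ma_values` is illustrative, and the three example pairs reuse the first rows of the example table under the assumption that its first column is G and its second column is R.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from background-subtracted R and G intensities."""
    m = math.log2(red) - math.log2(green)          # M = log2(R/G)
    a = (math.log2(red) + math.log2(green)) / 2    # A = log2(R*G)/2
    return m, a

# Illustrative (G, R) pairs; the assignment of columns to G and R is an assumption.
spots = {"Gene1": (100, 98), "Gene2": (209, 4209), "Gene3": (7, 2)}
for gene, (g, r) in spots.items():
    m, a = ma_values(r, g)
    print(f"{gene}: M = {m:+.2f}, A = {a:.2f}")
```

On this scale a two-fold increase (M = +1) and a two-fold decrease (M = -1) are symmetric around 0, which is the main reason for working in log2.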
Normalization 2 dyes: "Within"
(2 color technologies)
(assumption: the majority of genes do not change)
[Figure: MA-plot; y axis M = log2(R) - log2(G), x axis A = (log2(G) + log2(R)) / 2]
[Figure: MA-plots before and after within-array normalization]
Normalization 2 dyes: "Within", Spatial
(2 color technologies)
[Figure: spatial plots before normalization, after global loess normalization, and after loess normalization by sector (print-tip)]
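The idea behind within-array normalization can be illustrated with a much simpler stand-in for loess: median centering of M, either globally or per sector. This is not the loess procedure shown in the figures; it only relies on the same assumption (most genes do not change, so the typical M should be 0). Function names and data are hypothetical.

```python
from statistics import median

def center_m(m_values):
    """Shift M values so that their median is 0 (global median normalization)."""
    shift = median(m_values)
    return [m - shift for m in m_values]

def center_m_by_sector(m_values, sectors):
    """Apply the same median centering separately within each sector (print-tip)."""
    normalized = list(m_values)
    for sector in set(sectors):
        idx = [i for i, s in enumerate(sectors) if s == sector]
        shift = median(m_values[i] for i in idx)
        for i in idx:
            normalized[i] = m_values[i] - shift
    return normalized

# Hypothetical M values: sector 1 is biased upwards, sector 2 is roughly centered.
m = [0.5, 0.7, 0.6, 2.6, -0.2, 0.0, -0.1, -2.1]
sectors = [1, 1, 1, 1, 2, 2, 2, 2]
print(center_m(m))
print(center_m_by_sector(m, sectors))
```

Loess goes one step further by letting the correction depend on the intensity A, which removes the curved tendency visible in the MA-plot.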
Data Transformation one dye
Gene     Sample1
Gene1    100
Gene2    209
Gene3    7
...
Genek    9882
...
GeneN    2298
Log2
Normalization 1 or 2 dyes: Between-slides
[Figure: density of log2 intensity for one array; plot title: density(x = log2(t[, 15] + 200), adjust = 0.475); N = 3840, Bandwidth = 0.1051]
[Figure: density against log intensity for all arrays, before normalization and after normalization]
Methods: quantile, scale, qspline, invariantset, loess
MAD (median absolute deviation)
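Of the between-slide methods listed, quantile normalization is the easiest to sketch. The version below is a minimal illustration that assumes every array has the same number of genes and no missing values, and breaks ties by position; the function name and the data are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equally sized arrays (one list of log intensities per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of genes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        # Rank of each gene within its array (ties broken by position).
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

arrays = [[5.0, 2.0, 3.0, 4.0],
          [4.0, 1.0, 4.5, 2.0],
          [3.0, 4.0, 6.0, 8.0]]
for row in quantile_normalize(arrays):
    print(row)
```

After this step every array has exactly the same distribution of values, so the density curves of all arrays overlap; only the ranking of genes within each array is preserved.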
Summarization Affymetrix
Oligonucleotide dependent technologies
PM MM
Summarization = "Average"(Intensities)
Usual methods: tukey-biweight, av-diff, median-polish
The "summarization" equivalent in two-dye technologies is the average of gene replicates within the slide.
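As a rough illustration of summarization, the sketch below computes a plain av-diff (mean of the PM - MM differences of a probeset) and the two-dye equivalent (mean of replicate spots). It is a simplification: no outlier trimming or robust weighting is applied, and the intensities are made up.

```python
from statistics import mean

def av_diff(pm, mm):
    """Summarize one probeset as the mean of the PM - MM differences."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM must contain the same number of probes")
    return mean(p - m for p, m in zip(pm, mm))

def replicate_average(spots):
    """Two-dye equivalent: average the replicate spots of a gene within a slide."""
    return mean(spots)

pm = [1200, 1500, 900, 1100]   # hypothetical perfect-match intensities
mm = [200, 300, 100, 300]      # hypothetical mismatch intensities
print(av_diff(pm, mm))
print(replicate_average([2.1, 1.9, 2.0]))
```

Robust methods such as tukey-biweight and median-polish exist because a single bad probe can dominate a plain average.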
Microarrays Filtering / Treating Undefined Values
Problems:
Some spots may be defective in the printing process
Some spots could not be detected
Some spots may be damaged during the assay
Artefacts may be present (bubbles, etc.)
Treatments:
Use replicated spots as averages
Remove unrecoverable genes
Remove problematic spots in all arrays
Infer values using computational methods (warning)
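The first two treatments can be sketched as follows: average whatever replicate spots are defined, and drop genes for which no replicate is usable. Undefined spots are represented by `None`; the data and names are hypothetical.

```python
from statistics import mean

def average_replicates(replicates):
    """Average the defined replicate values of a gene; None if no replicate is usable."""
    defined = [v for v in replicates if v is not None]
    return mean(defined) if defined else None

# Hypothetical log2 ratios for replicated spots; None marks an undefined spot.
data = {
    "Gene1": [1.2, None, 1.0],
    "Gene2": [None, None, None],
    "Gene3": [-0.4, -0.6, -0.5],
}
averaged = {gene: average_replicates(values) for gene, values in data.items()}
kept = {gene: value for gene, value in averaged.items() if value is not None}
print(kept)
```

Here Gene2 is unrecoverable and is removed, while Gene1 is rescued by its two remaining replicates.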
Microarray Data Filtering
More than 10,000 genes
Too much data increases computation time and analysis complexity
Remove:
Genes that do not change significantly
Undefined genes
Low expression
Keeping:
Large signal-to-noise ratio
Large statistical significance
Large variability
Large expression
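A simple filter along these lines might look like the sketch below. The thresholds `min_mean` and `min_sd` are arbitrary illustrative values, not recommendations from the slides, and the standard deviation is used here as one possible measure of variability.

```python
from statistics import mean, pstdev

def filter_genes(expression, min_mean=8.0, min_sd=0.5):
    """Keep genes whose mean log2 expression and variability exceed the thresholds."""
    kept = {}
    for gene, values in expression.items():
        if any(v is None for v in values):
            continue  # undefined gene
        if mean(values) < min_mean:
            continue  # low expression
        if pstdev(values) < min_sd:
            continue  # does not change across arrays
        kept[gene] = values
    return kept

expression = {
    "GeneA": [10.0, 12.0, 9.0, 13.0],   # expressed and variable: kept
    "GeneB": [6.0, 6.5, 5.5, 6.2],      # low expression: removed
    "GeneC": [11.0, 11.1, 10.9, 11.0],  # nearly constant: removed
    "GeneD": [12.0, None, 9.0, 13.0],   # undefined value: removed
}
print(sorted(filter_genes(expression)))
```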
Microarray Pre-Processing Summary
[Figure: pre-processing workflow]
a) Image Analysis and Background Subtraction
   Microarray (two dyes): Image Scanning -> Spot Detection -> Background Detection & Subtraction -> Intensity Value
b) Data Processing (Affymetrix)
c) Transformation: M = log2(R/G), A = log2(R*G)/2
d) Normalization: Within, Between
Image Analysis Exercise
Data processing of Placental Microarrays
Dr. Hugo A. Barrera Saldaña
Paper in Mol. Med. 2007: DNA Microarrays - A Powerful Genomic Tool for Biomedical Research (Trevino - Barrera - Mol Med 2007)
Search PubMed for Trevino V
Experimental Design. Goal: Differential Expression
[Figure: experimental workflow]
mRNA Extraction: Placenta 1, Reference Pool, Placenta 2
Labelling: Green / Red, Green / Red (controls)
Microarray Hybridization (by duplicates)
Scanning & Data Processing: Image Analysis, Within Normalization (per array), Between Normalization (all arrays)
Detection of Differentially Expressed Genes: t-test H0: μ = 0; p-values correction: False Discovery Rate
Validation and Analysis (Dr. Hugo Barrera): Comparison With Known Tissue-Specific Genes
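The p-value correction step can be sketched with the Benjamini-Hochberg procedure, a common way of controlling the False Discovery Rate. The slides do not say which FDR method was used, so this choice is an assumption, and the p-values below are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.20]   # hypothetical per-gene t-test p-values
print(benjamini_hochberg(p))
```

Genes whose adjusted p-value falls below the chosen FDR level would then be reported as differentially expressed.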
Experimental Design - Slides
SLIDES' SCANNINGS

GROUP  SLIDE        CY3 (GREEN)  CY5 (RED)  COMMENTS
1a     52 A    V    Sample       Control
1b     52 B    V    Sample       Control
2a     51 A    V    Sample       Control
2b     51 B    V    Sample       Control
3a     56 A    V    Control      Sample
3b     56 B    V    Control      Sample
4a     A 54    V    Control      Sample
4b     B 54    V    Control      Sample
5a     A 55    V    Control      Control
5b     B 55    V    Control      Control
6a     A 53    V    Control      Control
6b     B 53    V    Control      Control

RIGHT TOP GROUP, RIGHT BOTTOM GROUP
LEFT TOP GROUP, LEFT BOTTOM GROUP
Download images from [Link]
Read Images
Read BOTH Images together using SpotFinder
Mark file 1 as "Cy3" = Green
Mark file 2 as "Cy5" = Red
Adjust Image Brightness and Contrast
Create Grid
Metarows = 12, Metacolumns = 4
Rows = 24, Columns = 24
Pixels = 450 (of the 24 x 24 spots)
Spacing = 18 (between metacolumns and metarows)
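A quick calculation shows what these grid parameters imply. It assumes that "Pixels = 450" is the width of one 24 x 24 grid and that "Spacing = 18" is the gap in pixels between neighbouring grids; both readings are interpretations of the slide, not documented SpotFinder semantics.

```python
metarows, metacolumns = 12, 4
rows, columns = 24, 24
grid_pixels = 450   # assumed: width of one 24 x 24 grid, in pixels
spacing = 18        # assumed: gap between neighbouring grids, in pixels

grids = metarows * metacolumns
spots_per_grid = rows * columns
total_spots = grids * spots_per_grid
spot_pitch = grid_pixels / columns  # approximate centre-to-centre distance
array_width = metacolumns * grid_pixels + (metacolumns - 1) * spacing

print(f"{grids} grids x {spots_per_grid} spots = {total_spots} spots")
print(f"approximate spot pitch: {spot_pitch} pixels")
print(f"approximate width of the gridded area: {array_width} pixels")
```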
Adjust Grid
The created grids are not aligned to the image.
Use Visible All (right click in a blank area)
Use Move All to adjust the overall position
Use Visible All to restore the grid
Adjust each of the 12*4 grids to its correct position
Right mouse button in a grid to move that grid
Arrow keys also work
Right mouse button in a blank section to move all grids
Save Grid
Save the grid frequently to avoid losing your work
Image Analysis
Use Gridding and Processing
Adjust (save the grid first; on Mac, Adjust doesn't work well)
Process
Copy images:
1 from the grid adjust
1 from the RI plot
1 from the data (figure)
2 from the QC view (A and B). What do they represent?
Export to a .mev file
Open the .mev file in Excel
Remove comment lines
Compute signal:
Signal A = Cy3 Green = MNA - MedBkgA = mean of spot A - median of background A
Signal B = Cy5 Red = MNB - MedBkgB = mean of spot B - median of background B
Copy the image into a Word file
Plot Signal A vs Signal B
DO NOT SAVE THE modified .MEV FILE
Execute Process
- Select Gridding Tab
- Use Histogram Segmentation
- Spot Size = 10
- Process All!
Inspect DATA PROCESSED
Select Data Tab
Select a row / spot
See results and interpret output
Inspect MA-PLOT
Select the RI-PLOT tab
Observe the MA-PLOT
You can switch specific grids on/off
A tendency can be observed (it has to be corrected to 0; see the MIDAS exercise)
Quality Control View
Quality view tab
View 2 shows whether each spot had M > 1 (yellow; 0.5 is used in this image) or M < -1
View 1 gives the count of all M values per color (yellow, gray, blue, and green)
Export DATA and VIEW in Excel
Save data to a .mev file
Open the .mev file in Excel
Remove comment lines (important!)
Compute signal:
Signal A = Cy3 Green = MNA - MedBkgA = mean of spot A - median of background A
Signal B = Cy5 Red = MNB - MedBkgB = mean of spot B - median of background B
Copy the image into a Word file
Plot Signal A vs Signal B
The Plot in Excel should be similar to the MA plot (RI-Plot)
DO NOT SAVE THE modified .MEV FILE
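The same signal computation can be sketched in Python instead of Excel. The column names MNA, MNB, MedBkgA and MedBkgB are the ones used above; everything else is an assumption made for illustration: that the file is tab-delimited, that comment lines start with '#', and the made-up `UID` column and values in the example text.

```python
import csv
import io

def read_signals(mev_text):
    """Return (Signal A, Signal B) pairs from the text of a .mev file."""
    # Assumption: comment lines start with '#' and the file is tab-delimited.
    lines = [ln for ln in mev_text.splitlines() if ln.strip() and not ln.startswith("#")]
    signals = []
    for row in csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"):
        signal_a = float(row["MNA"]) - float(row["MedBkgA"])  # Cy3, green
        signal_b = float(row["MNB"]) - float(row["MedBkgB"])  # Cy5, red
        signals.append((signal_a, signal_b))
    return signals

# Tiny made-up example; a real export has many more columns and rows.
example = (
    "# comment line\n"
    "UID\tMNA\tMNB\tMedBkgA\tMedBkgB\n"
    "1\t1200\t900\t200\t100\n"
    "2\t350\t4100\t150\t100\n"
)
print(read_signals(example))
```

Because the text is only read and never written back, the original .mev file stays unmodified.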
Summary of SpotFinder Usage
Image -> Data
We read 2 images (Green = Cy3, Red = Cy5) to generate, for each color, an intensity value with reduced background noise:
We generated a grid with the number of spots and the spatial layout specified for the microarray
We adjusted the positions visually by moving the grids
We computed the signal value and the background noise for each color
We obtained a data file