Most important agronomic and quality traits of crops are quantitative in nature. The genetic variations in such traits are usually controlled by sets of genes called quantitative trait loci (QTLs) and by the interactions between QTLs and the environment. It is crucial to understand the genetic architecture of complex traits to design efficient strategies for plant breeding. In the present study, a new experimental design and the corresponding statistical method are presented for QTL mapping. The proposed mapping population is composed of double backcross populations derived from backcrossing both homozygous parents to DH (doubled haploid) or RI (recombinant inbred) lines separately. Such an immortal mapping population allows for across-environment replications, and can be used to estimate dominance effects, epistatic effects, and QTL-environment interactions, remedying the drawbacks of a single backcross population. In this method, the mixed linear model approach is used to estimate the positions of QTLs and their various effects, including the QTL additive, dominance, and epistatic effects, and QTL-environment interaction effects (QE). Monte Carlo simulations were conducted to investigate the performance of the proposed method and to assess the accuracy and efficiency of its estimations. The results showed that the proposed method could estimate the positions and the genetic effects of QTLs with high efficiency.
Summary: Studies on inheritance of fertility are of great importance in wheat breeding. Although substantial progress has been achieved recently in the molecular characterization of male sterility and fertility restoration, little effort has been devoted to female sterility. To identify the gene(s) controlling female sterility in wheat efficiently, an investigation was conducted for the seed setting ratio using a set of F2 populations derived from the cross between a female sterile line XND126 and an elite cultivar Gaocheng 8901. The bulked segregant analysis (BSA) method and the recessive class approach were adopted to screen for SSR markers potentially linked to female fertility gene loci in 2005. Out of 1080 SSRs in the wheat genome, eight markers on chromosome 2D showed a clear difference between two disparate bulks and small recombination frequency values, suggesting a strong linkage signal to the sterility gene. Based on the candidate linked markers, partial linkage maps were constructed with...
The elusive but ubiquitous multifactor interactions represent a stumbling block that urgently needs to be removed in searching for determinants involved in human complex diseases. Dimensionality reduction approaches are a promising tool for this task. Many complex diseases exhibit composite syndromes that must be measured as a cluster of clinical traits with varying correlations, and/or are inherently longitudinal in nature (changing over time and measured dynamically at multiple time points). A multivariate approach for detecting interactions is thus greatly needed for handling multifaceted phenotypes and longitudinal data, as well as for improving statistical power for multiple significance testing via a two-stage testing procedure that involves a multivariate analysis for grouped phenotypes followed by univariate analysis for the phenotypes in the significant group(s). In this article, we propose a multivariate extension of generalized multifactor dimensionality reduction (GMDR) based on multivariate generalized linear, multivariate quasi-likelihood, and generalized estimating equations models. Simulations and real data analysis for the cohort from the Study of Addiction: Genetics and Environment are performed to investigate the properties and performance of the proposed method, as compared with the univariate method. The results suggest that the proposed multivariate GMDR substantially boosts statistical power.
Nicotine dependence (ND) is costly to societies worldwide, moderately heritable, and genetically complex. Risk loci can be identified with genetic linkage analysis independent of prior physiological hypotheses. We completed a genomewide linkage scan to map loci increasing risk for DSM-IV ND and for a quantitative assessment of ND as measured by the Fagerstrom Test for Nicotine Dependence (FTND) in a set of 634 small nuclear families ascertained on the basis of multiple individuals affected with cocaine or opioid dependence. Of these, 507 had at least two subjects affected with ND. There are two distinct populations within this sample, European-Americans (EAs) and African-Americans (AAs). A region on chromosome 5 was identified as containing a gene that affects risk for ND on the basis of FTND score in the AA part of our sample (logarithm of the odds [lod] score 3.04; empirically determined to be genomewide-significant, p = .0374; point p = .0001). The highest lod score observed in the EA part of the sample was on chromosome 7 (lod score 2.73). Several other "possible" risk loci were identified in either AA or EA subjects, with many of these in proximity to previously suggested risk loci from other clinical samples. Three nominally significant single-nucleotide polymorphism associations were found at the peptidylglycine alpha-amidating monooxygenase (PAM) locus under the chromosome 5 linkage peak, also in the AA part of the sample. These data add to the growing evidence for locations for ND risk loci, add a novel statistically significant locus important in AAs, and suggest a gene that might be contributing to this linkage signal.
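As a reading aid for the lod scores quoted in the abstract above: a lod score is a base-10 log likelihood ratio, so a lod of 3.04 corresponds to odds of roughly 1,100 to 1 in favor of linkage. The sketch below shows that conversion, plus a textbook two-point lod for phase-known meioses. The function names and the counts (1 recombinant in 10 meioses) are hypothetical illustrations, and the study itself may well have used a different linkage statistic.

```python
import math


def lod_to_odds(lod: float) -> float:
    """Convert a lod score (log10 likelihood ratio) to an odds ratio."""
    return 10.0 ** lod


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Textbook two-point lod for phase-known meioses (illustrative only).

    Compares the likelihood at recombination fraction `theta`
    with the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    print(f"lod 3.04 -> odds of about {lod_to_odds(3.04):.0f} : 1")
    print(f"lod 2.73 -> odds of about {lod_to_odds(2.73):.0f} : 1")
    # Hypothetical data: 1 recombinant among 10 informative meioses.
    print(f"two-point lod at theta = 0.1: {two_point_lod(1, 10, 0.1):.3f}")
```

By construction the two-point lod is zero at theta = 0.5, which is a convenient sanity check.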
We proposed a faster pedigree-based generalized multifactor dimensionality reduction algorithm, called PedGMDR II (PII), to detect gene-gene interactions underlying complex traits. Building on our previous PedGMDR framework (PI), PII can handle both dichotomous and continuous traits in pedigree-based designs and allows for covariate adjustment. Compared with PI, this faster version can theoretically halve the computing burden and memory requirement. To evaluate the performance of PII, we performed comprehensive simulations across a wide variety of experimental scenarios, in which we considered two study designs, discordant sib pairs and mixed families of varying size, and, for each study design, we considered five common factors that may potentially affect statistical power: minor allele frequency, missing rate of parental genotypes, covariate effect, gene-gene interaction, and scheme to adjust phenotypic outcomes. Simulations showed that PII gave well-controlled type I error rates against population admixture. Under a total of 4,096 scenarios simulated, PII, in general, had a higher average power than PI for both dichotomous and continuous traits, and the advantage was more pronounced for continuous traits. PII also appeared to be less sensitive than PI to changes in the four factors other than the magnitude of genetic effects considered in this study. Applied to the Mid-South Tobacco Family study, PII detected a significant interaction with a p value of 5.4 × 10⁻⁵ between two taste receptor genes, TAS2R16 and TAS2R38, responsible for nicotine dependence. In conclusion, PII is a faster supplementary version of our previous PI for detecting multifactor interactions.
PURPOSE: ARDS is associated with neutrophil-mediated inflammation in the lungs. Previously, we have shown that a collagen-derived tripeptide, proline-glycine-proline (PGP), a neutrophil chemoattractant, is elevated in patients with chronic diseases with neutrophil-predominant inflammation such as COPD and cystic fibrosis. However, it is unknown whether PGP is elevated in ARDS. We hypothesized that levels of PGP are increased in ARDS as compared to cardiogenic edema and non-lung disease ventilated patients.
With the development of sequencing technology, dense single nucleotide polymorphisms (SNPs) have become available, making it possible to uncover the genetic architecture of complex traits by genome-wide association study (GWAS). However, the current GWAS strategy usually ignores epistatic and gene-environment interactions because of the absence of appropriate methodology and the heavy computational burden. This study proposed a new GWAS strategy that combines the graphics processing unit (GPU)-based generalized multifactor dimensionality reduction (GMDR) algorithm with a mixed linear model approach. The reliability and efficiency of the analytical methods were verified through Monte Carlo simulations, suggesting that a population size of nearly 150 recombinant inbred lines (RILs) had a reasonable resolution for the scenarios considered. Further, a GWAS was conducted with the above two-step strategy to investigate the additive, epistatic, and gene-environment associations between 701,867 SNPs and three important qua...
Previous studies have demonstrated that the γ-aminobutyric acid type B (GABAB) receptor plays an essential role in modulating neurotransmitter release and regulating the activity of ion channels and adenyl cyclase. However, whether the naturally occurring polymorphisms in the two GABAB receptor subunit genes interact with each other to alter susceptibility to nicotine dependence (ND) remains largely unknown. In this study, we genotyped 5 and 33 single nucleotide polymorphisms (SNPs) for GABAB receptor subunit 1 and 2 genes (GABBR1, GABBR2), respectively, in a sample of 2037 individuals from 602 nuclear families of African-American (AA) or European-American (EA) origin. We conducted association analyses to determine (1) the association of each subunit gene with ND at both the individual SNP and haplotype levels and (2) the collective effect(s) of SNPs in both GABAB subunits on the development of ND. Several individual SNPs and haplotypes in GABBR2 were significantly associated with ...
For tightly linked loci, cosegregation may lead to nonrandom associations between alleles in a population. Because of its evolutionary relationship with linkage, this phenomenon is called linkage disequilibrium. Linkage disequilibrium-based mapping has become a major focus of recent genome research into mapping complex traits. In this article, we present a new statistical method for mapping quantitative trait loci (QTL) of additive, dominant, and epistatic effects in equilibrium natural populations. Our method is based on haplotype analysis of multilocus linkage disequilibrium and exhibits two significant advantages over current disequilibrium mapping methods. First, we have derived closed-form solutions for estimating the marker-QTL haplotype frequencies within the maximum-likelihood framework implemented by the EM algorithm. The allele frequencies of putative QTL and their linkage disequilibria with the markers are estimated by solving a system of regular equations. This pr...
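To make the notion of a nonrandom association between alleles concrete, the following minimal sketch computes the standard two-locus disequilibrium coefficient D = p_AB − p_A·p_B and the squared correlation r² from haplotype frequencies. It illustrates only the basic pairwise measure, not the multilocus marker-QTL method of the abstract above, and the haplotype frequencies used are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a_b: float, p_b_a: float, p_none: float):
    """Return (D, r_squared) for two biallelic loci.

    Arguments are the frequencies of the four haplotypes:
    p_ab   -> A-B, p_a_b -> A-b, p_b_a -> a-B, p_none -> a-b.
    """
    total = p_ab + p_a_b + p_b_a + p_none
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = p_ab + p_a_b          # frequency of allele A
    p_b = p_ab + p_b_a          # frequency of allele B
    d = p_ab - p_a * p_b        # deviation from random association
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


if __name__ == "__main__":
    # Hypothetical haplotype frequencies for illustration.
    d, r2 = linkage_disequilibrium(0.5, 0.1, 0.1, 0.3)
    print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

When the alleles are independent (for example, all four haplotypes at frequency 0.25), D is zero, which corresponds to linkage equilibrium.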
A unified GMDR method for detecting gene–gene interactions in family and unrelated samples with application to nicotine dependence
Lipid-related DEGs in Brassica napus and their lipid-related homologous genes in Arabidopsis. (XLS 33 kb)
The top 20 most represented GO terms of DEGs in the biological process category at 15–17 DAF (S2-G2). (XLS 33 kb)
The gi numbers of genes in the NR database homologous with DEGs between varieties in QTL regions. (XLS 25 kb)
To reveal the impacts of smoking on the genetic architecture of human body weight, we conducted a genome-wide association study on 5,336 subjects in four ethnic populations from MESA (The Multi-Ethnic Study of Atherosclerosis) data. A full genetic model was applied to association mapping for analyzing genetic effects of additive, dominance, epistasis, and their ethnicity-specific effects. Both the unconditional model (base) and conditional model including smoking as a cofactor were investigated. There were 10 SNPs involved in 96 significant genetic effects detected by the base model, which accounted for a high heritability (61.78%). Gene ontology analysis revealed that a number of genetic factors are related to the metabolic pathway of benzopyrene, a main compound in cigarettes. Smoking may play important roles in genetic effects of dominance, dominance-related epistasis, and gene-ethnicity interactions on human body weight. Gene effect prediction shows that the genetic effects of smoki...
An organism is a multi-level and modularized complex system that is composed of numerous interwoven metabolic and regulatory networks. Functional associations and random evolutionary events result in elusive molecular, physiological, metabolic, and evolutionary relationships. It is a daunting challenge for biological studies to decipher the complex biological mechanisms and crack the codes of life. Hidden Markov models, and more generally hidden Markov random fields, can capture both random signals and the inherent correlation structure, typically in time and space, and have emerged as a powerful approach to solving many analytical problems in biology. This article will introduce the theory of the hidden Markov model and the computational algorithms for the three fundamental statistical problems, and summarize striking applications of hidden Markov models to biological and medical studies.
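Two of the three fundamental hidden Markov model problems mentioned above, evaluation and decoding, can be sketched in a few lines. The code below is a minimal illustration using the standard forward and Viterbi recursions on a toy two-state model; the state and symbol encodings and all probabilities are hypothetical, and it is not taken from the article itself. The third problem, parameter learning (usually handled with Baum-Welch), is omitted for brevity.

```python
def forward(start, trans, emit, obs):
    """Evaluation problem: probability of the observation sequence."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        # Sum over all predecessor states, then emit the next symbol.
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


def viterbi(start, trans, emit, obs):
    """Decoding problem: most probable state path and its probability."""
    n = len(start)
    delta = [start[s] * emit[s][obs[0]] for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        # Best predecessor for each current state.
        ptr = [max(range(n), key=lambda r: prev[r] * trans[r][s])
               for s in range(n)]
        delta = [prev[ptr[s]] * trans[ptr[s]][s] * emit[s][o]
                 for s in range(n)]
        backpointers.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    best_prob = delta[state]
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob


if __name__ == "__main__":
    # Hypothetical toy model: 2 hidden states, 3 observable symbols.
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    obs = [0, 1, 2]
    print("P(observations) =", forward(start, trans, emit, obs))
    print("best path, probability =", viterbi(start, trans, emit, obs))
```

For long sequences these products underflow, so practical implementations work with log probabilities or rescale the forward variables at each step.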
The manifestation of complex traits is influenced by gene-gene and gene-environment interactions, and the identification of multifactor interactions is an important but challenging undertaking for genetic studies. Many complex phenotypes such as disease severity are measured on an ordinal scale with more than two categories. A proportional odds model can improve statistical power for these outcomes, when compared to a logit model either collapsing the categories into two mutually exclusive groups or limiting the analysis to pairs of categories. In this study, we propose a proportional odds model-based generalized multifactor dimensionality reduction (GMDR) method for detection of interactions underlying polytomous ordinal phenotypes. Computer simulations demonstrated that this new GMDR method has a higher power and more accurate predictive ability than the GMDR methods based on a logit model and a multinomial logit model. We applied this new method to the genetic analysis of low-density lipoprotein (LDL) cholesterol, a causal risk factor for coronary artery disease, in the Multi-Ethnic Study of Atherosclerosis, and identified a significant joint action of the CELSR2, SERPINA12, HPGD, and APOB genes. This finding provides new information to advance the limited knowledge about genetic regulation and gene interactions in metabolic pathways of LDL cholesterol. In conclusion, the proportional odds model-based GMDR is a useful tool that can boost statistical power and prediction accuracy in studying multifactor interactions underlying ordinal traits.
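For readers unfamiliar with the proportional odds model named above, the sketch below shows how it turns one linear predictor into probabilities for every ordinal category, which is why no collapsing into two groups is needed. It assumes the common parameterization logit P(Y ≤ j) = θ_j − η with increasing thresholds θ_j; the abstract does not state which parameterization the authors use, and the thresholds and predictor values here are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(thresholds, eta):
    """Category probabilities under a proportional odds model.

    Assumes logit P(Y <= j) = thresholds[j] - eta, with thresholds
    sorted in increasing order. K - 1 thresholds give K categories.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be increasing")
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probs


if __name__ == "__main__":
    thresholds = [-1.0, 1.0]          # hypothetical: 3 ordered categories
    for eta in (0.0, 1.0):            # hypothetical linear predictor values
        probs = category_probabilities(thresholds, eta)
        print(eta, [round(p, 4) for p in probs])
```

Under this parameterization a larger linear predictor shifts probability mass toward the higher categories, while the same coefficient applies at every threshold.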
Identification of multifactor gene-gene (G × G) and gene-environment (G × E) interactions underlying complex traits poses one of the great challenges to today's genetic study. Development of the generalized multifactor dimensionality reduction (GMDR) method provides a practicable solution to problems in detection of interactions. To exploit the opportunities brought by the availability of diverse data, there is high demand for corresponding GMDR software that can handle a breadth of phenotypes, such as continuous, count, dichotomous, polytomous nominal, ordinal, survival, and multivariate, and various kinds of study designs, such as unrelated case-control, family-based, and pooled unrelated and family samples, and that also allows adjustment for covariates. We developed a versatile GMDR package to implement this series of GMDR analyses for various scenarios (e.g., unified analysis of unrelated and family samples) and large-scale (e.g., genome-wide) data. This package includes other desirable features such as data management and preprocessing. Permutation testing strategies are also built in to evaluate the threshold or empirical p values. In addition, its performance scales with the available computational resources. The software is available at http://www.soph.uab.edu/ssg/software or http://ibi.zju.edu.cn/software.
Papers by Xiang-Yang Lou