
Genomic Selection in Wheat Breeding using Genotyping-by-Sequencing

The Plant Genome

Abstract

Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker-assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum-likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross-validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat (Triticum aestivum L.) grain yield but equivalent for several maize (Zea mays L.) traits.
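The difference between the additive (RR) kernel and a Gaussian kernel can be illustrated with a minimal sketch. It assumes markers coded as -1/0/1, a linear kernel scaled by the number of markers, and a Gaussian kernel of the form exp(-(d/theta)^2) on a marker-normalized Euclidean distance d. The function names, the toy genotypes, and the scale parameter `theta` are illustrative assumptions, not the rrBLUP API.

```python
import math

def additive_kernel(markers):
    """Linear kernel: inner product of marker vectors, divided by marker count.
    Assumes every line has the same number of markers, coded -1/0/1."""
    m = len(markers[0])
    return [[sum(a * b for a, b in zip(x, y)) / m for y in markers]
            for x in markers]

def gaussian_kernel(markers, theta=1.0):
    """Gaussian kernel exp(-(d/theta)^2), where d is the Euclidean distance
    between two lines normalized by the number of markers (an assumption)."""
    m = len(markers[0])

    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / m)

    return [[math.exp(-(dist(x, y) / theta) ** 2) for y in markers]
            for x in markers]

# Hypothetical genotypes: three lines, four markers.
markers = [
    [1, 1, -1, 0],
    [1, 1, -1, 1],
    [-1, -1, 1, 0],
]

K_add = additive_kernel(markers)
K_gau = gaussian_kernel(markers, theta=1.0)
```

In this toy example the first two lines differ at a single marker and the third is the mirror image of the first, so both kernels rank the first pair as more similar. The Gaussian kernel decays nonlinearly with distance, which is what lets it capture similarity beyond the purely additive model.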

Key takeaways

  • Four different imputation methods were evaluated for the GBS data, which had up to 80% missing data per marker: (i) using the marker mean value (mean), (ii) calling missing genotypes as heterozygotes (hets), (iii) using RF regression (Breiman, 2001), and (iv) using a multivariate normal (MVN)-expectation maximization (EM) algorithm.
  • This analysis was repeated for each of the four imputation methods as well as for the relationship matrix based on a reduced set of GBS markers and the relationship matrix from the DArT markers.
  • Compared to the full set of GBS markers, the mean accuracy with the reduced marker set was not significantly different for yield and TKW (0.07).
  • The comparable performance of a limited number of GBS markers relative to the complete GBS data set of 34,749 markers indicates that (i) the population under study has relatively close relationships, so only a limited number of markers are needed for full characterization, (ii) since the true breeding values remain unknown, uncertainty in the phenotypic observations limits the prediction accuracy, which was measured as the correlation between GEBVs and the observed phenotypes rather than the true breeding values, and/or (iii) the addition of GBS markers with higher levels of missing data does little to improve the characterization of kinship among the breeding lines.
  • Here we have shown that GBS can be used to generate markers to characterize wheat breeding lines and develop accurate GS models.
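
The two simplest imputation methods listed above (marker mean and heterozygote calls), together with accuracy measured as the correlation between predictions and observed phenotypes, can be sketched as follows. This is a minimal illustration, assuming genotypes coded -1/0/1 with `None` for missing calls and at least one observed call per marker. The function names and the toy data are hypothetical, and the RF and MVN-EM methods are not shown.

```python
import math

def impute_mean(column):
    """Replace missing calls (None) with the mean of the observed calls.
    Assumes the marker has at least one observed call."""
    observed = [g for g in column if g is not None]
    mean = sum(observed) / len(observed)
    return [mean if g is None else g for g in column]

def impute_het(column, het_code=0):
    """Replace missing calls (None) with the heterozygote code (0 here)."""
    return [het_code if g is None else g for g in column]

def pearson(x, y):
    """Pearson correlation, used here as the prediction-accuracy measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical marker scored on five lines, two calls missing.
marker = [1, None, -1, 1, None]
by_mean = impute_mean(marker)
by_het = impute_het(marker)

# Hypothetical predicted values (GEBVs) and observed phenotypes.
gebv = [0.2, -0.1, 0.4, 0.0]
phenotype = [5.1, 4.6, 5.4, 4.9]
accuracy = pearson(gebv, phenotype)
```

Because accuracy here is a correlation with observed phenotypes rather than true breeding values, noise in the phenotypes caps the value it can reach regardless of how the markers are imputed.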