Papers by Bayazit Yunusbayev

Reconstructing recent population history while mapping rare variants using haplotypes, 2019
Haplotype-based methods are a cost-effective alternative for characterizing unobserved rare variants and mapping disease-associated alleles. Moreover, they can be used to reconstruct recent population history, which shaped the distribution of rare variants and thus can be used to guide gene mapping studies. In this study, we analysed an Illumina 650k genotype dataset on three underrepresented populations from Eastern Europe, where ancestors of Russians came into contact with two indigenous ethnic groups, Bashkirs and Tatars. Using the IBD mapping approach, we identified two rare IBD haplotypes strongly enriched in asthma patients of distinct ethnic background. We reconstructed recent population history using haplotype-based methods to reconcile this contradictory finding. Our ChromoPainter analysis showed that these haplotypes each descend from a single ancestor coming from one of the ethnic groups studied. Next, we used the DoRIS approach and showed that source populations for patients exchanged recent (<60 generations) asymmetric gene flow, which supported the ChromoPainter-based scenario that patients share haplotypes through inter-ethnic admixture. Finally, we show that these IBD haplotypes overlap with asthma-associated genomic regions ascertained in European populations. This finding is consistent with the fact that the two donor populations for the rare IBD haplotypes, Russians and Tatars, have European ancestry. Low-frequency (1% < MAF < 5%) and rare genetic variants (<1%) evolved recently and tend to have more deleterious effects [1]. While such variants may play an important role in the heritability of complex traits [2,3], their effect remains largely uncharacterized. Accurate detection of rare variants requires extremely large samples (>10000) and costly high-coverage resequencing [4].
Therefore, there is a need for cost-effective methods to study rare variants in populations that are underrepresented in large-scale full genome sequencing projects. Chip-genotyped SNP datasets and rare haplotypes (<1%) constructed from them can be used as proxies for rare variants [5,6]. For populations that are not present in large-scale resequencing projects, the use of haplotypes as proxies for rare variants is currently the only available option. The distribution of rare variants has been shaped by more recent demographic events (5000-10000 years ago) in human population history [7]. Therefore, when mapping rare variants, knowledge about the recent demographic history of each studied population is essential [7]. In this regard, haplotype-based approaches offer a rich arsenal of methods designed to reconstruct recent population history [8]. In this study, we focus on underrepresented populations (Table 1) from Eastern Europe, the region that borders Central Asia and Siberia (Fig. 1). This region, denoted as the Volga-Ural region, has been a historical crossroads for human migrations and admixture [9,10]. It represents a useful model for understanding the effect of recent complex population history on the distribution of rare haplotypes, which serve here as a proxy for rare variants. Genome-wide data for our samples were retrieved from a previously published dataset [11], and here we briefly describe background information relevant for our study. Patients and healthy controls were recruited from the Republic of Bashkortostan (Fig. 1), which geographically represents easternmost European Russia. Three ethnic groups currently represent the majority of the population in this region: Bashkirs, Russians, and Tatars, each amounting to roughly 1/3 of the total population (total census size ~4.5 million people). Of these, Bashkirs and

It is broadly accepted that psoriasis is an immune-mediated disease with a heritable component, but it is not clear what causes inflammation in the skin. Previous research suggests that fragments of the keratin 17 (K17) protein, which are constitutively expressed in hair follicles, could act as autoantigens. In this study, we synthesized the K17 protein from mRNA derived from hair follicles and tested whether it elicited T cell responses depending on the patient genotype at the major susceptibility locus HLA-Cw*06:02. We treated peripheral blood-derived cells with the K17 protein and its short fragments to assess the T cell proliferation response using flow cytometry. Our analyses show a significantly stronger increase in cell proliferation among patients but not in healthy controls. We then examined whether the variation in T cell proliferation correlated with the patient HLA-Cw*06:02 risk genotype. Considering the affected status and patient genotype as two independent predictors, we fitted a linear model and showed that the HLA-Cw*06:02 allele dosage strongly predicted the T cell response. Our study findings suggest that the K17 protein likely acts as an autoantigen in psoriasis and that patients' risk genotype is strongly correlated with the magnitude of the response to this putative autoantigen. Psoriasis is an immune-mediated chronic skin disease with a complex aetiology involving both genetic risk factors and environmental triggers. Skin lesions in psoriasis are infiltrated by inflammatory cells, but a marked increase in proliferation and turnover of keratinocytes distinguishes psoriasis from other inflammatory skin diseases.
As the major site of inflammation, the skin is in continuous communication with the systemic immune, neural and endocrine systems and is capable of recognizing, discriminating and integrating various signals within a highly heterogeneous environment [1]. Although various environmental factors have been reported to influence psoriasis [2], only infections with beta-haemolytic streptococci have been convincingly associated with both the initiation and exacerbation of psoriasis [3–5]. There is evidence that a streptococcal throat infection accompanies a considerable fraction of psoriasis cases, ranging from 56% to 60% according to some reports [6,7]. Moreover, blood samples of psoriasis patients often contain antibodies to beta-haemolytic streptococci and the pathogen itself [8]. Therefore, the putative mechanism of inflammation could be similar to those observed in acute rheumatic fever and rheumatic heart disease. In this disease, streptococcal superantigen (GAS) circulating in the infected organism disrupts host immune tolerance and increases infiltration of GAS-activated immune cells into the target tissue [9]. An autoimmune response then develops because GAS-activated immune cells erroneously respond to self-proteins through a process called molecular mimicry [10]. In the case of psoriasis, host antigens must be specific to the skin and joints, and keratin 17 (K17) is the most studied candidate to date [11–14]. This protein shares short

High-coverage whole-genome sequence studies have so far focused on a limited number [1] of geographically restricted populations [2–5], or been targeted at specific diseases, such as cancer [6]. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history [7–9] and refuelled the debate on the mutation rate in humans [10]. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record [11], and admixture between AMHs and Neanderthals predating the main Eurasian expansion [12], our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.

The origin and demographic history of the Turkic peoples are actively debated but remain insufficiently studied at the genomic level. In this work, we analysed 312,524 SNP markers in samples of Turkic-speaking peoples of the Volga region and Central Asia in order to determine the origin of certain components of their genomes. The analysis was performed by computing the principal components of genomic variation in populations of Southeast Asia and Siberia and then projecting the samples of Turkic-speaking peoples onto the resulting axes of variation. This two-step approach allowed us to locate the projected samples with respect to those genomic markers that characterize the components of variation in the populations of Southeast Asia and Siberia. The projection showed tight clustering of the genomes of Turkic-speaking peoples of the Volga region and Central Asia with samples from the peoples of South Siberia and Mongolia. This indicates a predominantly South Siberian origin for part of the genome of the western Turkic peoples. We also show that the share of the South Siberian component in western Turkic peoples decreases in the order: Kyrgyz, Kazakhs, Uzbeks, Turkmens, Bashkirs, Tatars, Chuvash. In addition, we show the role of different populations of Southeast Asia and Siberia in shaping the gene pools of the western Turkic peoples.
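The two-step procedure described in the abstract above (principal components computed on reference populations only, followed by projection of additional samples onto those axes) can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function names `fit_reference_pca` and `project`, the random toy genotype matrices, and the choice of plain mean-centering without per-SNP scaling are all assumptions made for illustration.

```python
import numpy as np


def fit_reference_pca(ref_genotypes, n_components=2):
    """Compute principal axes from the reference samples only.

    ref_genotypes: array of shape (n_samples, n_snps), e.g. allele counts 0/1/2.
    Returns the per-SNP mean and the top principal axes (n_components, n_snps).
    """
    mean = ref_genotypes.mean(axis=0)
    centered = ref_genotypes - mean
    # Rows of vt are orthonormal principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, axes):
    """Project samples onto axes that were fitted on the reference set."""
    return (samples - mean) @ axes.T


# Hypothetical toy data: 20 reference samples and 5 projected samples, 50 SNPs.
rng = np.random.default_rng(0)
ref = rng.integers(0, 3, size=(20, 50)).astype(float)
new = rng.integers(0, 3, size=(5, 50)).astype(float)

mean, axes = fit_reference_pca(ref, n_components=2)
ref_coords = project(ref, mean, axes)  # coordinates of the reference samples
new_coords = project(new, mean, axes)  # projected samples; they do not shape the axes
```

Because the projected samples are centered with the reference mean and never enter the decomposition, their positions reflect only the variation that distinguishes the reference populations, which is the point of the two-step design.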

PLOS Genetics, 2015
The Turkic peoples represent a diverse collection of ethnic groups defined by the Turkic languages. These groups have dispersed across a vast area, including Siberia, Northwest China, Central Asia, East Europe, the Caucasus, Anatolia, the Middle East, and Afghanistan. The origin and early dispersal history of the Turkic peoples is disputed, with candidates for their ancient homeland ranging from the Transcaspian steppe to Manchuria in Northeast Asia. Previous genetic studies have not identified a clear-cut unifying genetic signal for the Turkic peoples, which lends support for language replacement rather than demic diffusion as the model for the Turkic language's expansion. We addressed the genetic origin of 373 individuals from 22 Turkic-speaking populations, representing their current geographic range, by analyzing genome-wide high-density genotype data. In agreement with the elite dominance model of language expansion, most of the Turkic peoples studied genetically resemble their geographic neighbors. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area

The American Journal of Human Genetics, 2011
South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.

PLOS ONE, 2015
The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe in early medieval times. This expansion, mainly to East Europe and the northern Balkans, resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pools. Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: 'central-east European' for West and East Slavs, and 'south-east European' for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.

Biosphere Origin and Evolution, 2008
Populations of the Volga-Ural region and Central Asia are studied by analysing Y-chromosome SNPs and microsatellites, together with the mtDNA hypervariable segment I and coding region. Fundamentally new data are obtained on the relationships, relative positions, and degree of similarity and difference among these populations. Genetic relationships between populations of these regions are investigated.

PLoS ONE, 2014
Contemporary inhabitants of the Balkan Peninsula belong to several ethnic groups of diverse cultural background. In this study, three ethnic groups from Bosnia and Herzegovina (Bosniacs, Bosnian Croats and Bosnian Serbs), as well as the populations of Serbians, Croatians, Macedonians from the former Yugoslav Republic of Macedonia, Montenegrins and Kosovars, have been characterized for the genetic variation of 660 000 genome-wide autosomal single nucleotide polymorphisms and for haploid markers. New autosomal data from 70 individuals, together with previously published data from 20 individuals from the populations of the Western Balkan region, have been analysed in the context of 695 samples of global range. Comparison of the variation data of autosomal and haploid lineages of the studied Western Balkan populations reveals a concordance of the data in both sets and the genetic uniformity of the studied populations, especially of Western South-Slavic speakers. The genetic variation of Western Balkan populations reveals the continuity between the Middle East and Europe via the Balkan region and supports the scenario that one of the major routes of ancient gene flows and admixture went through the Balkan Peninsula.

Human Biology, 2013
The origin and history of the Ashkenazi Jewish population have long been of great interest, and advances in high-throughput genetic analysis have recently provided a new approach for investigating these topics. We and others have argued on the basis of genome-wide data that the Ashkenazi Jewish population derives its ancestry from a combination of sources tracing to both Europe and the Middle East. It has been claimed, however, through a reanalysis of some of our data, that a large part of the ancestry of the Ashkenazi population originates with the Khazars, a Turkic-speaking group that lived to the north of the Caucasus region ~1,000 years ago. Because the Khazar population has left no obvious modern descendants that could enable a clear test for a contribution to Ashkenazi Jewish ancestry, the Khazar hypothesis has been difficult to examine using genetics. Furthermore, because only limited genetic data have been available from the Caucasus region, and because these data have been ...

Molecular Biology and Evolution, 2012
The Caucasus, inhabited by modern humans since the Early Upper Paleolithic and known for its linguistic diversity, is considered to be important for understanding human dispersals and genetic diversity in Eurasia. We report a synthesis of autosomal, Y chromosome, and mitochondrial DNA (mtDNA) variation in populations from all major subregions and linguistic phyla of the area. Autosomal genome variation in the Caucasus reveals significant genetic uniformity among its ethnically and linguistically diverse populations and is consistent with predominantly Near/Middle Eastern origin of the Caucasians, with minor external impacts. In contrast to autosomal and mtDNA variation, signals of regional Y chromosome founder effects distinguish the eastern from western North Caucasians. Genetic discontinuity between the North Caucasus and the East European Plain contrasts with continuity through Anatolia and the Balkans, suggesting major routes of ancient gene flows and admixture.

European Journal of Human Genetics, 2011
The phylogenetic relationships of numerous branches within the core Y-chromosome haplogroup R-M207 support a West Asian origin of haplogroup R1b, its initial differentiation there followed by a rapid spread of one of its sub-clades carrying the M269 mutation to Europe. Here, we present phylogeographically resolved data for 2043 M269-derived Y-chromosomes from 118 West Asian and European populations assessed for the M412 SNP that largely separates the majority of Central and West European R1b lineages from those observed in Eastern Europe, the Circum-Uralic region, the Near East, the Caucasus and Pakistan. Within the M412 dichotomy, the major S116 sub-clade shows a frequency peak in the upper Danube basin and Paris area with declining frequency toward Italy, Iberia, Southern France and the British Isles. Although this frequency pattern closely approximates the spread of the Linearbandkeramik (LBK) Neolithic culture, an advent leading to a number of pre-historic cultural developments during the past ≤10 thousand years, more complex pre-Neolithic scenarios remain possible for the L23(xM412) components in Southeast Europe and elsewhere.

Balkan Journal of Medical Genetics, 2000
... living in four Eurasian regions. Hum Hered 2006; 61(1): 1-9. 18. Yunusbayev B, Kutuev I, Khusainova R, Guseinov G, Khusnutdinova E. Genetic structure of Dagestan populations: a study of 11 Alu insertion polymorphisms. Hum Biol 2006; 78(4): 465-476. 19. Osmanov MO. ...

Human Heredity, 2006
We have analyzed the distribution and patterns of the genetic diversity of eight Alu loci (ACE, ApoA1, PV92,