{"title":"PyVideo.org - genomes","link":[{"@attributes":{"href":"https:\/\/pyvideo.org\/","rel":"alternate"}},{"@attributes":{"href":"https:\/\/pyvideo.org\/feeds\/tag_genomes.atom.xml","rel":"self"}}],"id":"https:\/\/pyvideo.org\/","updated":"2011-03-11T00:00:00+00:00","subtitle":{},"entry":{"title":"Rapid Python used on Big Data to Discover Human Genetic Variation","link":{"@attributes":{"href":"https:\/\/pyvideo.org\/pycon-us-2011\/pycon-2011--rapid-python-used-on-big-data-to-disc.html","rel":"alternate"}},"published":"2011-03-11T00:00:00+00:00","updated":"2011-03-11T00:00:00+00:00","author":{"name":"Deniz Kural"},"id":"tag:pyvideo.org,2011-03-11:\/pycon-us-2011\/pycon-2011--rapid-python-used-on-big-data-to-disc.html","summary":"<h3>Description<\/h3><p>Rapid Python used on Big Data to Discover Human Genetic Variation<\/p>\n<p>Presented by Deniz Kural<\/p>\n<p>Advances in genome sequencing have enabled large-scale projects such as\nthe 1000 Genomes Project to sequence genomes across diverse populations\naround the world, resulting in very large data sets. I use Python for\nrapid \u2026<\/p>","content":"<h3>Description<\/h3><p>Rapid Python used on Big Data to Discover Human Genetic Variation<\/p>\n<p>Presented by Deniz Kural<\/p>\n<p>Advances in genome sequencing have enabled large-scale projects such as\nthe 1000 Genomes Project to sequence genomes across diverse populations\naround the world, resulting in very large data sets. I use Python for\nrapid development of algorithms for processing &amp; analyzing genomes and\ndiscovering thousands of new variants, including &quot;Mobile Elements&quot; that\ncopy &amp; paste themselves across the genome.<\/p>\n<p>Abstract<\/p>\n<p>Recent advances in high-throughput sequencing now enable accurate\nsequencing of human genomes at low cost &amp; high speed. This technology is\nnow used to initiate projects involving large-scale sequencing of many\ngenomes. 
The 1000 Genomes Project aims to sequence 2500 genomes across\n27 world populations, and has initially completed its Pilot phase. The\naim of the project is to discover &amp; characterize novel variants. These\nvariants enable association studies that investigate the link between\ngenomic variation &amp; phenotypes, including disease.<\/p>\n<p>One class of variants, known as &quot;Structural Variants&quot;, is a\nheterogeneous group of larger variants, such as inversions, duplications,\ndeletions, and various kinds of insertions.<\/p>\n<p>I use Python for rapid development of algorithms to process, analyze,\nand annotate very large data sets. In particular, I focus on Mobile\nElements, pieces of DNA that copy &amp; paste themselves across the genome. These\nelements constitute roughly half of the genome, whereas protein-coding\ngenes account for roughly 1.5% of the genome.<\/p>\n<p>I will discuss distributed computing, genomics, and big data within the\ncontext of Python.<\/p>\n","category":[{"@attributes":{"term":"PyCon US 2011"}},{"@attributes":{"term":"bigdata"}},{"@attributes":{"term":"casestudy"}},{"@attributes":{"term":"dna"}},{"@attributes":{"term":"genomes"}},{"@attributes":{"term":"pycon"}},{"@attributes":{"term":"pycon2011"}}]}}