PacktPublishing/Bioinformatics-with-Python-Cookbook-Fourth-Edition

Bioinformatics with Python Cookbook, Fourth Edition

This is the code repository for Bioinformatics with Python Cookbook, Fourth Edition, published by Packt.

Solve advanced computational biology problems and build production pipelines with Python and AI tools

Shane Brubaker

About the book

If you’ve ever felt overwhelmed by the vast number of Python tools available for bioinformatics, you’re not alone. The Bioinformatics with Python Cookbook is a recipe-based guide that explores practical approaches for solving classic bioinformatics challenges, showing you which Python packages work best for each task. You’ll start with the essential Python libraries for data science and bioinformatics, then move through key workflows in sequencing analysis, quality control, alignment, and variant calling. Along the way, you’ll pick up modern coding practices, explore recent advances in bioinformatics research, and gain hands-on experience with libraries such as NumPy, pandas, and scikit-learn. This book walks you through core bioinformatics tasks such as phylogenetic analysis and population genomics while familiarizing you with the wealth of modern public bioinformatics databases. You’ll learn cloud computing approaches used by researchers, set up workflow orchestration systems for controlling bioinformatics pipelines, and see how AI and large language models (LLMs) are reshaping the field, right down to designing proteins and DNA. By the end of this book, you’ll be ready to apply Python to real bioinformatics work and launch bioinformatics pipelines for your research.

Key Learnings

  • Process, analyze, and align sequencing data
  • Call variants and interpret their biological meaning
  • Use modern cloud infrastructure to launch bioinformatics workflows
  • Ingest, clean, and transform data efficiently
  • Explore how AI is shaping the future of bioinformatics
  • Leverage imaging data for biological insights
  • Apply single-cell sequencing to cluster and compare gene expression

Chapters

Each notebook can be opened in Colab, Kaggle, Gradient, or Studio Lab.

Chapter 1: Computer Specifications and Python Setup
  • Welcome.ipynb

Chapter 2: Basics of Data Manipulation
  • Ch02-1-pandas-basic.ipynb
  • Ch02-2-pandas-pitfalls.ipynb
  • Ch02-3-pandas-memory.ipynb

Chapter 3: Modern Coding Practices and AI-Generated Coding
  • Ch02-1-pandas-basic.ipynb
  • Ch03-1-pycodestyle.ipynb
  • Ch03-2-sequence-manipulation.ipynb
  • Ch03-3-read-alignment.ipynb
  • Ch03-4-test-writing.ipynb
  • pycodestyle.ipynb

Chapter 4: Data Science and Graphing
  • Ch04-1-numpy.ipynb
  • Ch04-2-PCA.ipynb
  • Ch04-3-k-means-PCA-animated.ipynb
  • Ch04-3-k-means.ipynb
  • Ch04-4-decision-trees.ipynb
  • Ch04-5-matplotlib.ipynb
  • Ch04-6-seaborn.ipynb

Chapter 5: Alignment and Variant Calling
  • Ch05-1-qc-data.ipynb
  • Ch05-2-sequence-manipulation.ipynb
  • Ch05-3-alignment.ipynb
  • Ch05-4-variant-calling.ipynb

Chapter 6: Annotation and Biological Interpretation
  • Ch06-1-variant-parsing.ipynb
  • Ch06-2-genome-annotation.ipynb
  • Ch06-3-genes-variants.ipynb
  • Ch06-4-protein-domains.ipynb

Chapter 7: Genomes and Genome Assembly
  • Ch07-1-genomes.ipynb
  • Ch07-2-graph-genomes.ipynb
  • Ch07-3-long-read-assembly.ipynb
  • Ch07-4-genome-assessment.ipynb

Chapter 8: Accessing Public Databases
  • Ch08-1-genbank-ncbi.ipynb
  • Ch08-2-using-sra.ipynb
  • Ch08-3-pdb-uniprot.ipynb

Chapter 9: Protein Structure and Proteomics
  • Ch09-1-extracting-from-pdb.ipynb
  • Ch09-2-molecular-distances.ipynb
  • Ch09-3-geometric-operations.ipynb
  • Ch09-4-nglview.ipynb
  • Ch09-4-py3dmol.ipynb
  • Ch09-5-proteomics.ipynb

Chapter 10: Phylogenetics
  • Ch10-1-preparing-dataset-checkpoint.ipynb
  • Ch10-1-preparing-dataset.ipynb
  • Ch10-2-aligning-genetic-data.ipynb
  • Ch10-3-comparing-sequences.ipynb
  • Ch10-4-reconstructing-trees.ipynb
  • Ch10-5-recursive-trees.ipynb
  • Ch10-6-visualizing-phylogenetics.ipynb

Chapter 11: Population Genetics
  • Ch11-1-plink.ipynb
  • Ch11-2-using-sgkit.ipynb
  • Ch11-3-exploring-with-sgkit.ipynb
  • Ch11-4-population-structure.ipynb

Chapter 12: Metabolic Modeling and Other Applications
  • Ch12-1-cobrapy.ipynb
  • Ch12-2-sirna.ipynb
  • Ch12-3-food-properties.ipynb
  • Ch12-4-gene-discovery.ipynb

Chapter 13: Genome Editing
  • Ch13-1-grna-design.ipynb
  • Ch13-2-barcodes.ipynb
  • Ch13-3-genome-editing.ipynb

Chapter 14: Cloud Basics
  • Ch14-2-boto3.ipynb
  • Ch14-3-containers.ipynb

Chapter 15: Workflow Systems
  • Ch15-1-bonus-using-galaxy-apis.ipynb
  • Ch15-1-introducing-galaxy.ipynb
  • Ch15-3-nextflow.ipynb
  • Ch15-2-snakemake.ipynb

Chapter 16: More Workflow Systems

Chapter 17: Deep Learning and LLMs for Nucleic Acid and Protein Design
  • Ch17-1-machine-learning.ipynb
  • Ch17-2-protein-design.ipynb
  • Ch17-3-genome-design-older.ipynb
  • Ch17-3-genome-design.ipynb
  • Ch17-bonus-agent.ipynb

Chapter 18: Single-Cell Technology and Imaging
  • Ch18-1-microfluidics.ipynb
  • Ch18-2-scanpy.ipynb
  • Ch18-3-image-analysis.ipynb
  • Ch18-4-brain-mapping.ipynb

Requirements for this book

Here are a few things to know before you start:

  • You should have a basic understanding of a programming language to use this book.
  • If you need to brush up on a topic to get the most out of a section, take the time to work through the resources provided in the book.
  • The recipes are best run on a modern MacBook or another macOS computer. However, alternatives are provided if you do not have one.

At the top level of the GitHub repository, you will find a README.md file, a Markdown file that can be read with any text editor. It contains updates to information and code in the book. Each chapter directory also has its own README.md file with more detailed information. These files will inform you about important bug fixes and code updates in the recipes.

Get to know the Author

Shane Brubaker is a bioinformatics manager living in California. He believes in the power of bioinformatics as an interdisciplinary science to save lives and transform society. Shane has applied bioinformatics in areas ranging from synthetic biology to human health. Over the years, he has taught courses in computer science and biology, co-founded BayBifx, a leading Bay Area bioinformatics networking event, and mentored many bioinformatics professionals. Shane is passionate about training and providing opportunities for the next generation of scientists.
