Bioinformatics with Python Cookbook, Fourth Edition

This is the code repository for Bioinformatics with Python Cookbook, Fourth Edition, published by Packt.

Solve advanced computational biology problems and build production pipelines with Python and AI tools

Shane Brubaker

About the book

If you've ever felt overwhelmed by the vast number of Python tools available for bioinformatics, you're not alone. The Bioinformatics with Python Cookbook is a recipe-based guide that explores practical approaches for solving classic bioinformatics challenges, showing you which Python packages work best for each task. You’ll start with the essential Python libraries for data science and bioinformatics, then move through key workflows in sequencing analysis, quality control, alignment, and variant calling. Along the way, you’ll pick up modern coding practices, explore recent advances in bioinformatics research, and gain hands-on experience with libraries such as NumPy, pandas, and scikit-learn. This book walks you through core bioinformatics tasks such as phylogenetic analysis and population genomics while familiarizing you with the wealth of modern public bioinformatics databases. You’ll learn cloud computing approaches used by researchers, set up workflow orchestration systems for controlling bioinformatics pipelines, and see how AI and large language models (LLMs) are reshaping the field, right down to designing proteins and DNA. By the end of this book, you’ll be ready to apply Python to real bioinformatics work and launch bioinformatics pipelines for your research.

Key Learnings

Process, analyze, and align sequencing data
Call variants and interpret their biological meaning
Use modern cloud infrastructure to launch bioinformatics workflows
Ingest, clean, and transform data efficiently
Explore how AI is shaping the future of bioinformatics
Leverage imaging data for biological insights
Apply single-cell sequencing to cluster and compare gene expression

Chapters

Chapter 1: Computer Specifications and Python Setup
Welcome.ipynb
Chapter 2: Basics of Data Manipulation
Ch02-1-pandas-basic.ipynb
Ch02-2-pandas-pitfalls.ipynb
Ch02-3-pandas-memory.ipynb
Chapter 3: Modern Coding Practices and AI-Generated Coding
Ch02-1-pandas-basic.ipynb
Ch03-1-pycodestyle.ipynb
Ch03-2-sequence-manipulation.ipynb
Ch03-3-read-alignment.ipynb
Ch03-4-test-writing.ipynb
pycodestyle.ipynb
Chapter 4: Data Science and Graphing
Ch04-1-numpy.ipynb
Ch04-2-PCA.ipynb
Ch04-3-k-means-PCA-animated.ipynb
Ch04-3-k-means.ipynb
Ch04-4-decision-trees.ipynb
Ch04-5-matplotlib.ipynb
Ch04-6-seaborn.ipynb
Chapter 5: Alignment and Variant Calling
Ch05-1-qc-data.ipynb
Ch05-2-sequence-manipulation.ipynb
Ch05-3-alignment.ipynb
Ch05-4-variant-calling.ipynb
Chapter 6: Annotation and Biological Interpretation
Ch06-1-variant-parsing.ipynb
Ch06-2-genome-annotation.ipynb
Ch06-3-genes-variants.ipynb
Ch06-4-protein-domains.ipynb
Chapter 7: Genomes and Genome Assembly
Ch07-1-genomes.ipynb
Ch07-2-graph-genomes.ipynb
Ch07-3-long-read-assembly.ipynb
Ch07-4-genome-assessment.ipynb
Chapter 8: Accessing Public Databases
Ch08-1-genbank-ncbi.ipynb
Ch08-2-using-sra.ipynb
Ch08-3-pdb-uniprot.ipynb
Chapter 9: Protein Structure and Proteomics
Ch09-1-extracting-from-pdb.ipynb
Ch09-2-molecular-distances.ipynb
Ch09-3-geometric-operations.ipynb
Ch09-4-nglview.ipynb
Ch09-4-py3dmol.ipynb
Ch09-5-proteomics.ipynb
Chapter 10: Phylogenetics
Ch10-1-preparing-dataset-checkpoint.ipynb
Ch10-1-preparing-dataset.ipynb
Ch10-2-aligning-genetic-data.ipynb
Ch10-3-comparing-sequences.ipynb
Ch10-4-reconstructing-trees.ipynb
Ch10-5-recursive-trees.ipynb
Ch10-6-visualizing-phylogenetics.ipynb
Chapter 11: Population Genetics
Ch11-1-plink.ipynb
Ch11-2-using-sgkit.ipynb
Ch11-3-exploring-with-sgkit.ipynb
Ch11-4-population-structure.ipynb
Chapter 12: Metabolic Modeling and Other Applications
Ch12-1-cobrapy.ipynb
Ch12-2-sirna.ipynb
Ch12-3-food-properties.ipynb
Ch12-4-gene-discovery.ipynb
Chapter 13: Genome Editing
Ch13-1-grna-design.ipynb
Ch13-2-barcodes.ipynb
Ch13-3-genome-editing.ipynb
Chapter 14: Cloud Basics
Ch14-2-boto3.ipynb
Ch14-3-containers.ipynb
Chapter 15: Workflow Systems
Ch15-1-bonus-using-galaxy-apis.ipynb
Ch15-1-introducing-galaxy.ipynb
Ch15-2-snakemake.ipynb
Ch15-3-nextflow.ipynb
Chapter 16: More Workflow Systems
Chapter 17: Deep Learning and LLMs for Nucleic Acid and Protein Design
Ch17-1-machine-learning.ipynb
Ch17-2-protein-design.ipynb
Ch17-3-genome-design-older.ipynb
Ch17-3-genome-design.ipynb
Ch17-bonus-agent.ipynb
Chapter 18: Single-Cell Technology and Imaging
Ch18-1-microfluidics.ipynb
Ch18-2-scanpy.ipynb
Ch18-3-image-analysis.ipynb
Ch18-4-brain-mapping.ipynb

Requirements for this book

Here are a few things you should know before you start:

You should have a basic understanding of a programming language to use this book.
Take the time to pursue the resources provided in the book if you think you need to brush up on a topic to get the most out of a section.
The recipes in this book are best run on a modern MacBook or another macOS computer. However, alternatives are provided if you do not have one.

At the top level of the GitHub repository, you will find a README.md file. This is a Markdown file that can be read with any text editor. This file will contain updates to information and code in the book. There will also be a README.md file within each chapter directory with more detailed information. These files will inform you about important bug fixes and code updates in the recipes.

Get to know the Author

Shane Brubaker is a bioinformatics manager living in California. He believes in the power of bioinformatics as an interdisciplinary science to save lives and transform society. Shane has applied bioinformatics in areas ranging from synthetic biology to human health. Over the years, he has taught courses in computer science and biology, co-founded BayBifx, a leading Bay Area bioinformatics networking event, and mentored many bioinformatics professionals. Shane is passionate about training and providing opportunities for the next generation of scientists.
