
AI ML - Pradyot

The document provides an overview of Python and R programming languages, highlighting their advantages and libraries relevant to biology and bioinformatics. Python is noted for its versatility and ease of use, while R excels in statistical analysis and data visualization. Key libraries for both languages are discussed, along with recommended books for further learning.

Uploaded by

KP
Copyright
© All Rights Reserved

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN BIOLOGY AND BIOINFORMATICS
INTRODUCTION TO R AND PYTHON
• Python is an interpreted, high-level, object-oriented programming language.
• It comes with built-in data structures, dynamic typing (type checks are
performed at runtime), and dynamic binding (names are mapped to objects at
runtime), which makes it a popular language for application development.
• Python syntax is simple, easy to read, and easy to learn.
• Python is better suited to machine learning, deep learning, and large-scale
web applications.
• R is a statistical language used for the analysis and visual representation of
data.
• R is well suited to statistical learning, with powerful libraries for data
experimentation and exploration.
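The runtime behaviour described in the Python bullets above can be seen in a few lines. This is a minimal sketch using only the standard library; the class names `DNA` and `Protein` are hypothetical and exist only for illustration.

```python
# Dynamic typing: a name has no fixed type; the type belongs to the
# object the name is currently bound to, and is checked at runtime.
value = 42
first_type = type(value).__name__    # "int"

value = "ATGC"                       # rebind the same name to a string
second_type = type(value).__name__   # "str"

# A type error only surfaces when the offending line actually runs.
try:
    result = "ATGC" + 1
except TypeError as exc:
    error_message = str(exc)

# Dynamic binding: the method to call is looked up on the object at call time.
class DNA:  # hypothetical class, for illustration only
    def describe(self):
        return "DNA sequence"

class Protein:  # hypothetical class, for illustration only
    def describe(self):
        return "protein sequence"

descriptions = [molecule.describe() for molecule in (DNA(), Protein())]
print(first_type, second_type, descriptions)
```

The same call, `molecule.describe()`, runs different code depending on the object it receives, which is what lets one loop handle several kinds of biological record.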
ADVANTAGES OF R AND PYTHON
➢ ADVANTAGES OF PYTHON
• Versatility: Python is clean, well-structured, and uncomplicated to use. It is object-oriented, and its flexibility
makes exploratory data analysis straightforward.
• Open Source: Free for anyone to use.
• Libraries: Python has many libraries that cover the major data science tasks.
• Productivity: Its integration and control capabilities improve productivity and save time.

➢ ADVANTAGES OF R
• Open Source: R is an open-source language and is free to download and use.
• Platform independent: R runs on all major operating systems, including UNIX,
Windows, and macOS.
• Data Wrangling: Through packages such as readr and dplyr, R can convert messy data into
a structured form.
• Plots and Graphs: Through ggplot2 and plotly, R creates attractive graphs with annotations and formulas.
• Package Availability: R has numerous packages dedicated to the development of machine learning, data
analysis, and statistical projects.
LIBRARIES AND REPOSITORIES IN PYTHON
➢ USEFUL LIBRARIES IN PYTHON:
• Biopython: Biopython is a large collection of computational biology and bioinformatics tools. Biopython can be used to edit
biological sequences, interpret bioinformatics file formats, and conduct other bioinformatics activities.
• Bioconda: Bioconda is a channel of bioinformatics software for Conda, an open-source package manager. Bioconda
makes it simple to install and administer a wide range of bioinformatics tools and libraries.
• Pandas: Pandas is extremely useful for dealing with and analysing structured data, which is prevalent in biological research.
Pandas can aid in the cleaning, transformation, and analysis of big datasets containing biological data.
• NumPy: NumPy is a core scientific computing package that includes support for huge, multi-dimensional arrays and matrices.
NumPy is essential for performing numerical operations on biological data, such as matrix manipulation and statistical analysis.
• Matplotlib: Matplotlib is a Python-based 2D plotting package that generates static, animated, and interactive visualisations.
Matplotlib can be used to visualise biological data such as gene expression or protein structures.
• Seaborn: Seaborn offers statistical data visualisation techniques such as box plots, violin plots, heat maps, joint plots, and pair
plots, among others. You can use Seaborn to visually explore and present your data, as well as to enrich your Matplotlib plots
with new features and styles.
• Scikit-learn: This package includes machine learning and data mining algorithms for classification, regression, clustering,
dimensionality reduction, feature selection, and more. Scikit-learn allows you to apply k-means, logistic regression, decision
trees, random forests, support vector machines, and other methods and models to your data.
• Scikit-bio: Scikit-bio implements bioinformatics algorithms and data structures. Sequence manipulation, statistics, taxonomy
manipulation, feature extraction from sequences, and other features are available.
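To make "sequence manipulation" concrete, here is a small plain-Python sketch of two common operations, reverse complement and GC content. It deliberately uses only the standard library and does not call Biopython or scikit-bio; the function names `reverse_complement` and `gc_content` are illustrative choices, and the example sequence is made up.

```python
# Translation table mapping each DNA base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)

example = "ATGGCCATTG"  # made-up sequence for illustration
print(reverse_complement(example))
print(gc_content(example))
```

The libraries listed above provide tested implementations of operations like these, along with file parsers and data structures, so in practice they are preferable to hand-written helpers.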
REPOSITORIES
• PyPI
• GitHub
LIBRARIES AND REPOSITORIES IN R
➢ USEFUL LIBRARIES IN R
• dplyr: A grammar of data manipulation that provides a consistent set of verbs for solving the
most common data manipulation problems.
• ggplot2: An implementation of the Grammar of Graphics; a free, open-source, and easy-to-use visualization
package widely used in R.
• Bioconductor: Bioconductor focuses on software tailored for genomic analysis.
• caret: Short for Classification And REgression Training, a set of functions that streamline the
process of creating predictive models.
• Gango: Advanced methods to interpret metagenomic/metatranscriptomic data based on networks and
gene ontologies.
• goProfiles: Statistical analysis of functional profiles. Functions for equivalence-based similarity analysis
of multiple gene lists.
➢ REPOSITORIES
• CRAN
• GitHub
MATPLOTLIB IN PYTHON
• Matplotlib is a comprehensive library for creating static, animated,
and interactive visualizations in Python.
• Matplotlib is a graph plotting library in Python that serves as a
visualization utility.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source, and we can use it freely.
• Matplotlib is mostly written in Python; a few segments are written in
C, Objective-C, and JavaScript for platform compatibility.
• It is capable of plotting pairwise data, statistical distributions, gridded
data, irregularly gridded data, and 3D and volumetric data.
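As an illustration of the "pairwise data" case above, the following minimal sketch draws a line plot with Matplotlib. The expression values are made up for the example, the output file name `expression.png` is an arbitrary choice, and the non-interactive `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up expression values for one gene across five time points (illustrative only).
time_points = [0, 2, 4, 6, 8]
expression = [1.0, 1.8, 3.1, 2.6, 1.4]

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(time_points, expression, marker="o", label="gene A (made-up data)")
ax.set_xlabel("Time (h)")
ax.set_ylabel("Relative expression")
ax.set_title("Pairwise data as a line plot")
ax.legend()
fig.tight_layout()
fig.savefig("expression.png", dpi=150)  # write the figure to a PNG file
```

In an interactive session, `plt.show()` would display the figure instead of saving it.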
GGPLOT IN R
• ggplot2, an implementation of the Grammar of Graphics, is a free, open-source,
and easy-to-use data visualization package widely used in the R programming
language.
• It is a powerful visualization package written by Hadley Wickham.
• It is built from multiple layers:
Data: The data set itself.
Aesthetics: The data is mapped onto aesthetic attributes such as x-axis, y-axis, color, fill,
size, labels, alpha, shape, line width, and line type.
Geometries: How the data is displayed, using points, lines, histograms, bars, or boxplots.
Facets: Displays subsets of the data in columns and rows.
Statistics: Binning, smoothing, descriptive, intermediate.
Coordinates: The space between data and display, using Cartesian, fixed, polar, or limits.
Themes: Non-data ink.
BOOKS FOR PYTHON AND R
• "Bioinformatics Data Skills" by Vince Buffalo: A practical guide to
managing and analyzing biological data.
PYTHON PROGRAMMING BOOKS FOR BIOLOGISTS
• "Python for Biologists" by Martin Jones.
• "Biopython Tutorial and Cookbook" by Jeff Chang et al., 2024. URL:
https://biopython.org/DIST/docs/tutorial/Tutorial.html
R PROGRAMMING BOOKS FOR BIOLOGISTS
• "Practical R for Biologists: An Introduction" by Donald Quicke et al.,
2021.
• "R Programming for Bioinformatics" by Robert Gentleman.
PYTHON AND R PRACTICALS
