Data Science
Session 11
Making data work for you
Data science is a set of methodologies for taking the thousands of
forms of data that are available to us today and using them to draw
meaningful conclusions.
Data is everywhere. Every like, click, email, credit card
swipe and tweet is a new piece of data that can be used
to describe the present or better predict the future.
What can data do?
- Describe the current state of an organization or process
- Detect anomalous events
- Diagnose the causes of events and behaviors
- Predict future events
Data can describe a current state. This can be
accomplished with dashboards or alerts, which
simplify time-intensive reporting
processes using new data technologies.
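As a minimal sketch of describing a current state and detecting an anomalous event, the following uses only Python's standard library. The daily order counts and the two-standard-deviation threshold are hypothetical values chosen for illustration, not figures from this session.

```python
from statistics import mean, stdev

# Hypothetical daily order counts; the last day is unusually high.
daily_orders = [102, 98, 105, 99, 101, 97, 103, 100, 180]

def describe(values):
    """Summarize the current state with a few basic metrics."""
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

summary = describe(daily_orders)
anomalies = find_anomalies(daily_orders)
print(summary)
print(anomalies)
```

A dashboard would typically display the summary, while an alert would fire when the list of anomalies is not empty.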
The data science workflow
- Data collection
- Exploration and visualization
- Experimentation and prediction
First, we collect data from various sources, like customer surveys, web traffic
results, emails between a sales representative and potential clients, and financial
transactions. Next, we explore and visualize the data. This helps us understand
the data and track how it changes over time. Finally, we make
predictions or draw insights from our data.
For example, this can involve building a system that segments customers
into clusters or classifies pictures of different types of cars.
Applications of Data Science
Mari Nazary
VP of Content, DataCamp
Machine Learning
Machine learning is one of the core components within data science that
focuses on creating algorithms and models that enable computers to learn
and make predictions or decisions based on data.
Case study: Fraud detection
Suppose you are in charge of fraud detection at a large
bank. You might like to use data to determine the probability
of a transaction being fraudulent.
You might start by gathering information about each purchase,
such as the amount, date and type of transaction and the
cardholder's address. You need many examples of transactions
that include this information as well as a label that tells you
whether each transaction was fraudulent or valid. These labeled
records, stored in your database, are known as training data and
are used to build an algorithm. Every time a new transaction is
made, you give its data to the algorithm, and it predicts whether
the transaction is valid or fraudulent.
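The idea of learning from labeled transactions can be sketched in a few lines. This example uses a k-nearest-neighbors estimate, which is one simple technique chosen here for illustration and not necessarily what a bank would use. The training records, the two features (amount and hour of day), and the scaling constants are all hypothetical.

```python
import math

# Hypothetical labeled training data: (amount in dollars, hour of day, label).
training_data = [
    (12.50, 13, "valid"),
    (40.00, 18, "valid"),
    (8.75, 9, "valid"),
    (60.20, 20, "valid"),
    (25.00, 12, "valid"),
    (950.00, 3, "fraudulent"),
    (1200.00, 2, "fraudulent"),
    (870.00, 4, "fraudulent"),
]

def scale(amount, hour):
    """Put both features on a roughly 0-1 scale so neither dominates the distance."""
    return (amount / 1000.0, hour / 24.0)

def fraud_probability(amount, hour, k=3):
    """Estimate P(fraudulent) as the share of fraudulent labels
    among the k training examples closest to the new transaction."""
    point = scale(amount, hour)
    by_distance = sorted(
        training_data,
        key=lambda row: math.dist(point, scale(row[0], row[1])),
    )
    nearest = by_distance[:k]
    return sum(1 for row in nearest if row[2] == "fraudulent") / k

p_large = fraud_probability(1000.00, 3)   # large purchase in the middle of the night
p_small = fraud_probability(20.00, 14)    # small purchase in the afternoon
print(p_large, p_small)
```

With this toy data, the large night-time purchase sits among the fraudulent examples and the small afternoon purchase sits among the valid ones, so the two estimates land at opposite ends of the 0 to 1 range.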
Case study: fraud detection
[Table of example transactions with columns: Amount, Date, Type, ...]
What do we need for machine learning?
- A well-defined question
  "What is the probability that this transaction is fraudulent?"
- A set of example data
  Old transactions labeled as "fraudulent" or "valid"
- A new set of data to use our algorithm on
  New credit card transactions
1. Data: High-quality, organized data with clear features and corresponding outcomes (labels) is fundamental for training machine learning models.
2. Algorithms: Mathematical models and techniques that learn from data patterns to make predictions or decisions.
3. Evaluation and refinement: Continuous validation, testing, and refinement of models using metrics, parameter tuning, and domain expertise ensure accurate and effective predictions on new data. In other words, we check that the computer's guesses are good by testing them on new data, and we make improvements if it is not doing well enough.
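The three ingredients above can be sketched together: labeled data, a very simple algorithm, and an evaluation on data the algorithm has not seen. The transactions and the midpoint-threshold rule below are hypothetical and deliberately simplistic. They only illustrate the train-then-test pattern.

```python
# Hypothetical labeled transactions: (amount, is_fraud).
train = [(15, False), (30, False), (45, False), (60, False), (800, True), (950, True)]
test = [(20, False), (55, False), (700, True), (90, False), (1000, True), (400, True)]

def fit_threshold(examples):
    """'Learn' a cutoff midway between the largest valid amount
    and the smallest fraudulent amount in the training data."""
    largest_valid = max(amount for amount, fraud in examples if not fraud)
    smallest_fraud = min(amount for amount, fraud in examples if fraud)
    return (largest_valid + smallest_fraud) / 2

def predict(amount, threshold):
    """Predict fraud when the amount exceeds the learned cutoff."""
    return amount > threshold

def accuracy(examples, threshold):
    """Share of examples whose prediction matches the true label."""
    correct = sum(1 for amount, fraud in examples if predict(amount, threshold) == fraud)
    return correct / len(examples)

threshold = fit_threshold(train)
test_accuracy = accuracy(test, threshold)
print(threshold, test_accuracy)
```

The held-out test set contains a fraudulent transaction of 400 that the learned cutoff misses, which is the kind of result that would prompt refinement, for example by adding more features or more training examples.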
Internet of Things
The Internet of Things (IoT) refers to a network of interconnected devices or
objects that can communicate and share data with each other through the internet.
Case study: smart watch
Suppose you are trying to make a smart watch to monitor physical
activity. You want to be able to detect different physical activities,
like running or walking. A special sensor known as an accelerometer
monitors motion in three dimensions. The data generated by the
sensor is the basis of machine learning in this problem.
You could ask several volunteers to record their activity when they
are running or walking. Then, you can develop an algorithm that
can recognize accelerometer data as representing one of two
states: running or walking.
Internet of Things
- Smart watches
- Internet-connected home security systems
- Electronic toll collection systems
- Building energy management systems
- Much, much more!
This example belongs to a field known as the Internet of Things, which is
also a part of data science. It refers to gadgets that are not
standard computers but still have the ability to transmit data. This
includes smart watches, internet-connected home security systems,
stereo systems and fire alarm systems.
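A minimal sketch of the smart watch idea follows, assuming each recording is a short window of (x, y, z) accelerometer samples. The sample values, the single "intensity" feature, and the nearest-average rule are hypothetical simplifications. A real device would use many more samples and richer features.

```python
import math
from statistics import mean

def intensity(window):
    """Average acceleration magnitude over a window of (x, y, z) samples."""
    return mean(math.sqrt(x * x + y * y + z * z) for x, y, z in window)

# Hypothetical labeled recordings from volunteers (arbitrary units).
labeled_windows = [
    ("walking", [(0.1, 1.0, 0.2), (0.2, 1.1, 0.1), (0.1, 0.9, 0.2)]),
    ("walking", [(0.2, 1.0, 0.1), (0.1, 1.2, 0.2), (0.2, 1.0, 0.3)]),
    ("running", [(0.8, 2.5, 0.6), (1.0, 2.8, 0.7), (0.9, 2.2, 0.5)]),
    ("running", [(1.1, 2.6, 0.8), (0.9, 3.0, 0.6), (1.0, 2.4, 0.7)]),
]

def train(examples):
    """Compute the average intensity for each activity label."""
    by_label = {}
    for label, window in examples:
        by_label.setdefault(label, []).append(intensity(window))
    return {label: mean(values) for label, values in by_label.items()}

def classify(window, centroids):
    """Assign the label whose average intensity is closest to this window's."""
    value = intensity(window)
    return min(centroids, key=lambda label: abs(centroids[label] - value))

centroids = train(labeled_windows)
new_window = [(0.9, 2.7, 0.6), (1.0, 2.5, 0.7), (0.8, 2.9, 0.5)]
print(classify(new_window, centroids))
```

The volunteers' labeled recordings play the role of training data, and `classify` is applied to each new window the watch records.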
Deep learning
Deep learning is a subset of machine learning using neural networks that learn from data to
solve complex tasks like image recognition and language understanding. It involves training
layered networks to recognize patterns and make predictions, requiring significant data and
computational resources for its operations.
Case study: image recognition
Images are broken down into grids of numbers, with each number representing a pixel's color or intensity. Deep learning models analyze these
numerical patterns to identify features like edges or textures. By learning from these patterns, the models can recognize objects or categories in
new images based on similarities to what they've seen during training.
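To make the "grid of numbers" idea concrete, here is a small sketch with a hypothetical 4x6 grayscale image. The hand-written difference filter below is not a deep learning model. It only illustrates the kind of edge feature that a trained network learns to detect on its own.

```python
# A hypothetical 4x6 grayscale image: 0 is black, 255 is white.
# The left half is dark and the right half is bright.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]

def horizontal_differences(pixels):
    """Absolute difference between each pixel and its right-hand neighbor.
    Large values mark a sharp change in intensity, that is, an edge."""
    return [
        [abs(row[col + 1] - row[col]) for col in range(len(row) - 1)]
        for row in pixels
    ]

edges = horizontal_differences(image)
for row in edges:
    print(row)
```

Every row of the result is zero except at the boundary between the dark and bright halves, which is where the edge lies.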
Deep learning
- Many neurons work together
- Requires much more training data
- Used in complex problems
  - Image classification
  - Language learning/understanding
The task for self-driving cars involves identifying images
containing humans. These pictures can be represented as a
matrix of numbers, forming a dataset.
However, when we feed this information into a model, the problem
can be too complex for simpler approaches. More advanced algorithms,
known as deep learning, are required. These algorithms are built from
neurons arranged in multiple layers that work together to derive
intricate conclusions.
Deep learning requires a larger volume of data than
traditional machine learning models. It has the capacity to handle
complex, unstructured data and make sense of it, as in image
classification or language understanding.
Dashboards
Kaelen Medeiros
Product Data Scientist, DataCamp
What is a dashboard?
A dashboard is a set of metrics, usually in the form
of graphs, that are updated on a schedule.
Some dashboards can update in real time as well.
Common dashboard elements
- Time series
- Categorical comparison
- Highlighting a single number
Tracking a value over time
Line chart
Tracking composition over time
Stacked/clustered chart
For example, it can tell us the percentage of users coming through
different platforms.
When the stacked bars show percentages, they always total 100%.
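The percentages behind such a stacked chart can be computed as in the sketch below. The platform names and user counts are hypothetical. The point is only that each period's shares are normalized so they sum to 100.

```python
# Hypothetical user counts per platform for each month.
users_by_month = {
    "Jan": {"desktop": 600, "mobile": 300, "tablet": 100},
    "Feb": {"desktop": 500, "mobile": 400, "tablet": 100},
}

def composition(counts):
    """Convert raw counts into percentages of the period's total."""
    total = sum(counts.values())
    return {platform: 100 * n / total for platform, n in counts.items()}

shares = {month: composition(counts) for month, counts in users_by_month.items()}
for month, percentages in shares.items():
    print(month, percentages)
```

Each month's dictionary corresponds to one stacked bar, with one segment per platform.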
Categorical comparison
Bar/column chart
A bar chart compares different groups
during the same time period.
Define the time range by adding a label,
for example "Data from the past 30 days".
Highlighting a single number
Can be used to make a figure prominent.
Can also be used to compare multiple values over time.
Displaying text
A great way to add qualitative data, such as a few customer comments, to illustrate the situation.
Timestamp               Comment
Oct 9, 2018 12:57:05    Awesome website! Loved the new layout.
Oct 10, 2018 03:16:00   Had trouble getting the website to load. I couldn't buy my favorite product!
Where can we build a dashboard?
- Spreadsheets: Excel or Google Sheets
- BI tools: Power BI, Tableau, Looker
- Customized tools: R Shiny or d3.js