
Lecture-13-Data Science, Database & Big Data

The document provides a comprehensive overview of Data Science, Databases, and Big Data, detailing their definitions, processes, techniques, and tools. It covers the types of data, sources, applications, and the importance of databases in modern computing, along with the characteristics and analytics of Big Data. Key concepts such as the 5 V's of Big Data and various visualization techniques are also discussed.

Data Science, Database, and Big Data
A Fundamental Overview of Techniques and Tools

©Md. Mahedi Hassan


What is Data Science?
Data Science is an interdisciplinary field that combines domain expertise,
programming skills, and knowledge of mathematics and statistics to extract
meaningful insights from data. It uses advanced algorithms and systems to analyze
large amounts of data, both structured and unstructured. The goal is to derive
actionable insights that drive decision-making and innovation.

• Disciplines involved: Statistics, Computer Science, Machine Learning, Domain Expertise

• Key Objective: Transform raw data into meaningful information for decision-making

• Key Techniques: Regression analysis, classification, clustering, deep learning

• Tools Commonly Used: Python, R, SQL, Jupyter, TensorFlow, PyTorch

• Related Fields: Artificial Intelligence (AI), Business Intelligence (BI), Data Engineering, Operations Research



TYPES OF DATA
Structured Data, Unstructured Data, Semi-structured Data,
Time-Series Data, Spatial Data, Graph Data

1. Structured Data: Clearly defined data types stored in relational databases. Example: Employee records in a table with columns like ID, name, and salary.

2. Unstructured Data: Data that doesn’t follow a specific format. Examples: PDFs, videos, images, emails.

3. Semi-structured Data: Data that does not conform to a formal structure but has some organizational properties. Examples: JSON, XML files.

4. Time-Series Data: Data collected at successive points in time. Examples: Stock prices, temperature readings.

5. Spatial Data: Represents the physical location and shape of objects. Examples: Maps, satellite images.

6. Graph Data: Captures relationships and connections between entities. Example: Friend connections on Facebook.
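
As a rough illustration of how several of these data types might look in code, the sketch below uses plain Python structures. All names and values are made up for the example and are not part of the lecture.

```python
import json
from datetime import date

# Structured: every record has the same fixed columns (ID, name, salary).
employees = [
    (1, "Asha", 52000),
    (2, "Rahim", 61000),
]

# Semi-structured: JSON has some organization, but fields may vary per record.
raw_json = '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "r@example.com"}]'
documents = json.loads(raw_json)

# Time-series: values indexed by successive points in time.
temperatures = {
    date(2024, 1, 1): 18.5,
    date(2024, 1, 2): 19.1,
    date(2024, 1, 3): 17.8,
}

# Graph: relationships between entities, stored as an adjacency list.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
}


def mutual_friends(graph, a, b):
    """Return the friends that a and b have in common."""
    return graph[a] & graph[b]


latest_day = max(temperatures)
print(employees[0][1], sorted(documents[1]), temperatures[latest_day])
print(mutual_friends(friends, "bob", "carol"))
```

Note that the two JSON records carry different fields, which is what separates semi-structured data from the fixed columns of the employee table.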
Sources of Data
• Internal Systems: Enterprise databases, HR systems, sales and inventory logs

• Sensors/IoT Devices: Temperature sensors, motion detectors, smart meters

• Web APIs: REST APIs providing structured access to external data (e.g., OpenWeatherMap, Twitter API)

• Public Datasets: Open Government Data, World Bank data, Kaggle competitions

• Web Scraping: Automatically collecting data from websites using tools like BeautifulSoup, Scrapy, Selenium

• Crowdsourced Data: Wikipedia edits, user reviews, citizen science platforms



Data Science Process
1. Problem Definition: Clarify the objective and define the scope.

2. Data Collection: Identify data sources and collect datasets.

3. Data Cleaning: Handle missing values, correct errors, remove duplicates.

4. Exploratory Data Analysis (EDA): Summarize the main characteristics of the data using plots and statistics.

5. Feature Engineering: Create new variables, encode categorical data, scale features.

6. Modeling: Choose and apply machine learning algorithms (e.g., decision trees, logistic regression, neural networks).

7. Evaluation: Use metrics such as accuracy, precision, recall, RMSE to evaluate model performance.

8. Deployment: Integrate the model into the end-user environment (e.g., via web app, dashboard, API).

9. Monitoring & Maintenance: Track model performance and retrain as necessary.
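
A minimal sketch of steps 3, 5, 6, and 7 on a toy dataset may make the process more concrete. The data, the threshold classifier, and the function names are assumptions chosen for illustration. A real project would use a proper learning algorithm and evaluate on held-out data rather than on the training rows.

```python
# Hypothetical toy records: (hours_studied, passed). None marks a missing value.
raw = [
    (1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (6.0, 1),
    (7.0, 1), (7.0, 1), (8.0, 1), (2.5, 1),
]

# Step 3, data cleaning: drop rows with missing values and exact duplicates.
seen, clean = set(), []
for row in raw:
    if row[0] is None or row in seen:
        continue
    seen.add(row)
    clean.append(row)

# Step 5, feature engineering: min-max scale the feature to the range [0, 1].
xs = [x for x, _ in clean]
lo, hi = min(xs), max(xs)
scaled = [((x - lo) / (hi - lo), y) for x, y in clean]


# Step 6, modeling: a deliberately simple threshold rule (an assumption that
# stands in for a real algorithm such as logistic regression).
def predict(x, threshold=0.5):
    return 1 if x >= threshold else 0


# Step 7, evaluation: accuracy, precision, and recall from the confusion counts.
def evaluate(rows):
    tp = sum(1 for x, y in rows if predict(x) == 1 and y == 1)
    fp = sum(1 for x, y in rows if predict(x) == 1 and y == 0)
    fn = sum(1 for x, y in rows if predict(x) == 0 and y == 1)
    tn = sum(1 for x, y in rows if predict(x) == 0 and y == 0)
    return {
        "accuracy": (tp + tn) / len(rows),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = evaluate(scaled)
print(len(clean), metrics)
```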



Applications of Data Science
• Finance: Risk modeling, fraud detection, algorithmic trading

• Retail: Dynamic pricing, customer segmentation, inventory optimization

• Healthcare: Disease prediction, medical image analysis, treatment personalization

• Manufacturing: Predictive maintenance, quality control using computer vision

• Marketing: Customer churn prediction, sentiment analysis, A/B testing

• Transportation: Route optimization, autonomous vehicle algorithms

• Government: Tax fraud detection, smart city planning, election forecasting


Database Overview



Introduction to Databases
A database is an organized collection of data stored electronically. It
allows users and applications to easily access, update, and manipulate
information. The data can include text, numbers, images, videos, and more.
Databases are managed using specialized software known as a Database
Management System (DBMS), which facilitates the storage, retrieval, and
manipulation of data.

Databases are fundamental to modern computing, supporting applications
ranging from online shopping platforms to social media networks. They are
widely used in business, government, healthcare, and many other sectors to
store and manage data, enabling informed decision-making and efficient
operations.



Components of a Database
Databases consist of several critical components that work together to store, organize, and retrieve data effectively. Here’s a detailed
explanation of each component:
1. Data
Data is the core component of any database, representing the actual information stored. It can include numbers, text, images, videos, or
documents, depending on the database's purpose. For instance, a customer database might store customer names, addresses, and purchase
histories.
2. Schema
The schema is the blueprint or structure of the database. It defines how data is organized and includes details like tables, columns, data types, and
relationships between entities. For example, a table in a customer database might have columns like CustomerID, Name, and Email. The schema
ensures consistency and helps users understand how the database is designed.
3. DBMS
The DBMS is the software layer that enables interaction with the database. It manages the storage, retrieval, and manipulation of data while
ensuring security and data integrity. Examples of DBMS software include MySQL, Oracle, and MongoDB. The DBMS also handles tasks like backup,
recovery, and query optimization to maintain the database’s performance.
4. Queries
Queries are commands used to interact with the database, allowing users to retrieve, manipulate, or update data. For relational databases, SQL
(Structured Query Language) is commonly used. For instance, a query like SELECT * FROM Customers WHERE Country = 'USA'; retrieves all
customers from the USA. Queries are vital for extracting actionable insights and managing data effectively.
5. Users
Users are individuals or applications that interact with the database. They can have different levels of access based on their roles, such as
administrators, developers, or end-users. For example, a database administrator might have full control, including the ability to create or delete
tables, while a regular user might only have permission to view specific data.
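
The components above can be sketched with Python's built-in sqlite3 module, where SQLite plays the role of the DBMS. The rows are hypothetical, and the Country column is taken from the example query rather than from the schema example.

```python
import sqlite3

# DBMS: SQLite, here with an in-memory database that vanishes on close.
conn = sqlite3.connect(":memory:")

# Schema: the table, its columns, and their data types.
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Email TEXT,"
    " Country TEXT)"
)

# Data: hypothetical customer rows.
conn.executemany(
    "INSERT INTO Customers (Name, Email, Country) VALUES (?, ?, ?)",
    [
        ("Alice", "alice@example.com", "USA"),
        ("Babul", "babul@example.com", "Bangladesh"),
        ("Carlos", "carlos@example.com", "USA"),
    ],
)

# Query: the SELECT from the text, with the country passed as a parameter.
usa_customers = conn.execute(
    "SELECT * FROM Customers WHERE Country = ? ORDER BY CustomerID",
    ("USA",),
).fetchall()

print(usa_customers)
conn.close()
```

Passing the country as a `?` parameter instead of pasting it into the SQL string is the usual way to avoid SQL injection when the value comes from a user.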
Why do we use Databases?
Learning about databases is essential for anyone looking to understand how information is stored, organized, and accessed in the digital world. Whether you are managing personal data, building applications, or analyzing business metrics, a strong knowledge of databases helps you to handle data efficiently and make informed decisions.

Here are the reasons why we use databases:

• Efficient Data Storage: Databases can store large amounts of data and ensure it remains accessible and organized.

• Facilitate Transactions: They enable smooth transactions in online banking, shopping, social media, and more by processing and retrieving data in real time.

• Data Updates: Databases allow for easy and quick updates to data, ensuring that the most current information is always available.

• Data Analysis: Databases provide an excellent framework for analyzing trends and making data-driven decisions.



Types of Databases
Databases are the backbone of modern applications. They are designed to address specific data
storage, retrieval and management needs. Databases can be classified based on their structure, use
cases or storage methods.

Some major types of databases:

1. Relational Databases
2. NoSQL Databases
3. Distributed Databases
4. Cloud Databases
5. Graph Databases
6. Object-Oriented Databases
7. Hierarchical Databases
8. Centralized Databases



Types of Databases
1. Relational Databases
These databases organize data into tables with rows and columns, much like a spreadsheet. They use SQL to
manage and query data. Relational database technology offers a flexible and efficient way to access
structured data. These databases store data consistently and allow for complex queries, making them
well suited to applications that require structured data and relationships.
• Examples: MySQL, PostgreSQL, Oracle, Microsoft SQL Server.
• Use Cases: E-commerce platforms, banking systems, and HR management.

2. NoSQL Databases
These databases are designed to handle unstructured and semi-structured data. Instead of relying on tables and
SQL, they store data in formats like documents, key-value pairs, or graphs. They are highly scalable and
flexible, making them well suited to real-time applications.
• Examples: MongoDB, Cassandra, DynamoDB.
• Use Cases: Social media platforms, IoT applications and big data analytics.
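
To contrast the two models, the sketch below stores the same information twice: once in relational tables combined with a JOIN, and once as nested JSON documents. The plain dictionary is only a stand-in for a document store, not the API of MongoDB or any other product, and all names are hypothetical.

```python
import json
import sqlite3

# Relational: two tables linked by a key and combined with a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1, 'keyboard');
    INSERT INTO orders VALUES (11, 1, 'mouse');
    """
)
rows = conn.execute(
    "SELECT u.name, o.item FROM users u "
    "JOIN orders o ON o.user_id = u.id ORDER BY o.id"
).fetchall()
conn.close()

# Document style: the same information nested in one self-describing record.
# A dict keyed by ID stands in for a document store (an assumption).
store = {}
store["user:1"] = json.dumps(
    {"name": "Alice", "orders": [{"item": "keyboard"}, {"item": "mouse"}]}
)
# A second document with different fields needs no schema change.
store["user:2"] = json.dumps({"name": "Bob", "nickname": "bobby"})

alice = json.loads(store["user:1"])
items = [order["item"] for order in alice["orders"]]
print(rows, items)
```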



Types of Databases
3. Distributed Databases
These databases are made up of two or more files stored across multiple locations or servers that
work together to provide a unified view of the data. The files may sit on several computers in
one physical place or be spread across many networks. They enhance performance and reliability by
distributing data and workload across multiple servers.
• Examples: Google Spanner, Apache Cassandra.
• Use Cases: Global-scale applications, content delivery networks (CDNs).
4. Cloud Databases
A cloud database is a collection of structured or unstructured data hosted on a private, public, or hybrid cloud
computing platform. It can be a relational or a NoSQL database. Cloud databases offer scalability,
flexibility, and cost-efficiency: users pay for what they use without maintaining physical hardware.
• Examples: Amazon RDS, Google BigQuery, Microsoft Azure SQL Database.
• Use Cases: SaaS applications, startups, and dynamic workloads.
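
One simple way to picture how a distributed database spreads data and workload across servers is hash-based partitioning (sharding). The sketch below is a simplified illustration with hypothetical server names. It does not describe how Google Spanner or Cassandra actually place data.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical node names


def shard_for(key, servers=SERVERS):
    """Pick a server from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Each server holds only its own slice of the data.
cluster = {name: {} for name in SERVERS}


def put(key, value):
    cluster[shard_for(key)][key] = value


def get(key):
    return cluster[shard_for(key)].get(key)


for i in range(100):
    put(f"user:{i}", {"id": i})

print({name: len(data) for name, data in cluster.items()})
print(get("user:42"))
```

Callers use `put` and `get` without knowing which server holds a key, which is the "unified view" in miniature. Plain modulo hashing moves most keys when a server is added or removed, so production systems tend to use schemes such as consistent hashing instead.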



Database Management System (DBMS)
A Database Management System (DBMS) is a software solution designed to efficiently manage, organize,
and retrieve data in a structured manner.
It allows users to create, modify, and query databases while ensuring data integrity, security, and efficient data
access.
Unlike traditional file systems, a DBMS minimizes data redundancy, prevents inconsistencies, and simplifies
data management with features like concurrent access and backup mechanisms. A DBMS plays an important
role in supporting data-driven decision-making and operational efficiency.
• A DBMS is software that allows data to be created, updated, and retrieved in an organized way. It also
provides security for the database.
• Examples of relational DBMSs are MySQL, Oracle, Microsoft SQL Server, PostgreSQL, and Snowflake.
• Examples of NoSQL DBMSs are MongoDB, Cassandra, DynamoDB, and Redis.
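
As a small illustration of how a DBMS protects data integrity, the sketch below uses SQLite through Python's sqlite3 module. A CHECK constraint rejects negative balances, and a transaction ensures that a failed transfer leaves no partial update behind. The table, the account names, and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "Alice", 100), (2, "Bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)         # valid transfer
rejected = transfer(conn, 1, 2, 500)  # would make the source balance negative
balances = dict(conn.execute("SELECT owner, balance FROM accounts"))
print(ok, rejected, balances)
```

In the rejected transfer the credit to the second account runs first and is then undone by the rollback, which is the inconsistency a plain file-based approach would have to guard against by hand.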



Real Life Uses of Databases
• Banking: Store account details, manage transactions, fraud detection
• E-commerce: Manage product inventory, customer profiles, transaction records
• Healthcare: Electronic Health Records (EHR), appointment scheduling, medical billing
• Transportation: Ticket booking, vehicle tracking, logistics
• Education: Student databases, learning management systems (LMS), research databases
• Social Media: User profiles, friend connections, messages and posts
Introduction to Big Data

Overview



What is Big Data?
Big Data describes extremely large datasets that cannot be easily managed, processed, or analyzed using traditional data-processing tools.

It’s not just about the size of the data, but also the variety, velocity, and complexity involved in managing it.



The 5 V's of Big Data
1. Volume: The sheer amount of data. The name "Big Data" itself refers to enormous size, so volume is the first characteristic to consider
when deciding whether a dataset counts as Big Data. Size also plays a crucial role in determining the value of the data. Data is generated in
terabytes or petabytes (e.g., Facebook generates 4 petabytes/day).
2. Velocity: The speed at which new data is generated, accumulated, and moved. Data flows in massively and continuously from sources
such as machines, networks, social media, and mobile phones. Velocity determines how fast data must be generated and processed to meet
demand. Sampling can help in dealing with high velocity.
Example: More than 3.5 billion searches are made on Google per day. Also, the number of Facebook users is increasing by approximately 22%
year over year.
3. Variety: The nature of the data, which can be structured, semi-structured, or unstructured, and the heterogeneous sources it comes from.
Different forms of data include text, images, videos, social media, and sensor data.
4. Veracity: The inconsistency and uncertainty in data. Available data can be messy, and its quality and accuracy are difficult to
control (e.g., noisy social media data).
5. Value: After the first four V's comes one more: Value. A bulk of data with no value is of no use to a company unless it is
turned into something useful. Data in itself has little importance; it must be converted into valuable information.
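
The remark that sampling can help with velocity can be sketched with reservoir sampling, a standard technique for keeping a fixed-size uniform random sample from a stream without storing the whole stream. The lecture does not name a specific sampling method, so this choice and the synthetic event stream are assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)    # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item   # keep the new item with probability k / (i + 1)
    return sample


# Hypothetical stream of events, consumed one at a time and never held in memory.
events = (f"event-{n}" for n in range(100_000))
sample = reservoir_sample(events, k=5, seed=0)
print(sample)
```

Memory use depends only on `k`, not on how many events arrive, so the same code applies to a stream that never ends.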
Big Data Analytics
Big data analytics refers to the processes, techniques, and tools used to examine massive and complex datasets to
uncover hidden patterns, correlations, market trends, and other useful information. This process helps organizations
make more informed decisions, improve efficiency, and gain a competitive edge.

Big data analytics can be broadly categorized into four main types: descriptive, diagnostic, predictive, and prescriptive. These types of analytics help organizations understand past performance, identify root causes of issues, predict future trends, and develop optimal solutions.



Big Data Analytics
1. Descriptive Analytics: This type focuses on summarizing and describing past data to
understand what has happened. It answers questions like "What happened?" or "What is the
current state?". Examples include generating reports on sales, website traffic, or customer
demographics.
2. Diagnostic Analytics: Diagnostic analytics goes a step further than descriptive analytics by
investigating the reasons behind observed patterns or trends. It aims to answer "Why did it
happen?" by identifying the root causes of past events. Techniques like drill-down analysis and
data mining are often used to explore data in more detail.
3. Predictive Analytics: This type utilizes historical data and statistical techniques to forecast
future outcomes and trends. It answers questions like "What is likely to happen?". Predictive
models can be used to predict customer behavior, sales trends, or potential risks.
4. Prescriptive Analytics: Prescriptive analytics focuses on recommending the best course of
action to achieve desired outcomes. It goes beyond prediction to suggest specific solutions and
strategies. This type uses machine learning and optimization algorithms to determine the optimal
path to take.
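
The difference between descriptive and predictive analytics can be shown on a tiny example. The monthly sales figures are hypothetical, and the least-squares line is just one simple forecasting technique among many.

```python
from statistics import mean

# Hypothetical monthly sales (units sold) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: "What happened?"
summary = {
    "total": sum(sales),
    "average": mean(sales),
    "best_month": months[sales.index(max(sales))],
}


# Predictive analytics: "What is likely to happen?"
# Fit an ordinary least-squares line to the historical data.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

print(summary)
print(forecast_month_7)
```

A prescriptive step would go further and compare candidate actions, such as different stock levels for month 7, against this forecast.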
Data Visualization Techniques
• Bar Charts & Histograms: Show counts or frequency distributions
• Pie Charts: Show proportions of a whole
• Line Graphs: Show trends over time
• Scatter Plots: Explore relationships between two variables
• Box Plots: Identify distribution and outliers
• Heatmaps: Correlation matrices or density plots
• Word Clouds: Visualize text data based on word frequency
• Advanced Tools: Tableau (interactive dashboards), Power BI, D3.js for custom visualizations
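
To show what a histogram computes before any plotting library is involved, the sketch below bins a hypothetical list of exam scores and prints a text-only chart. The scores and the bin width are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical exam scores.
scores = [35, 42, 47, 51, 55, 58, 61, 63, 64, 68, 71, 72, 75, 79, 83, 88, 94]
BIN_WIDTH = 20


def histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


bins = histogram(scores, BIN_WIDTH)

# Text-only bar chart: one '#' per value in the bin.
for start, count in bins.items():
    print(f"{start:>3}-{start + BIN_WIDTH - 1:<3} {'#' * count}")
```

A plotting tool draws the same counts as bar heights, so changing the bin width changes the shape of the chart as well as its level of detail.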