Big data notes:
Module 2:
Lec 1: What launched the big data era?
As per the McKinsey report (2013), big data is a torrent: users are generating more and more data day by
day, and the demand for on-demand computing (cloud computing) is also increasing; together these
launched the big data era. Google or Facebook (one of the two) is said to have started it around 2004.
What makes big data valuable?
Question: Describe examples from fields where big data has enabled better models that allow for
higher-precision recommendations or solutions to make the world a better place.
Answer: big data enables better models, which give higher precision.
- Personalized marketing.
- Business development.
- Hearing the voice of each consumer; for example, Walmart uses it for personalized customer
service.
- Better marketing campaigns; for example, recommendation engines on Amazon, Netflix, etc.
- Sentiment analysis of events, customers, or products through reviews (see the sketch after this list).
- Mobile advertising: GPS enables real-time, location-based advertising.
- Collective consumer behavior: using global consumer behavior for product growth.
- Biomedical applications: genome data editing, etc.
- Personalized cancer treatment.
- Smart cities.
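For intuition, a minimal sketch of lexicon-based sentiment scoring over reviews in plain Python; the word lists and example reviews are made up for illustration, and real systems use far richer models:

import re

# toy sentiment lexicons; real lexicons contain thousands of scored words
POSITIVE = {"great", "love", "excellent", "good", "fast"}
NEGATIVE = {"bad", "slow", "terrible", "hate", "broken"}

def sentiment_score(review: str) -> int:
    # tokenize: lowercase and keep alphabetic words only
    words = re.findall(r"[a-z]+", review.lower())
    # positive hits minus negative hits
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Great product, love the fast delivery",
    "Terrible quality, arrived broken",
]
for r in reviews:
    label = "positive" if sentiment_score(r) > 0 else "negative"
    print(f"{label}: {r}")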
Saving lives with big data: wildfire analysis.
Question: Give examples of sensor-, organization-, and people-generated data used in wildfire
analytics.
Break it up into two parameters: prediction and response.
Where does big data come from?
Origin of big data:
- Not new.
- Generated by machines, people, and organizations.
- The Large Hadron Collider generates 40 TB of data every second.
- Organizations: transactions, etc.
How machine-generated data is useful:
Question: Understand how machine-generated big data is being used to enable real-time actions,
and identify what is needed to start creating a big data strategy that includes machine-generated
data.
Answer:
Human-generated data (for example, notes, texts, images, audio, and videos) is unstructured and
does not fit pre-defined data models, so it must be cleaned and shaped to fit a particular
predefined data model.
Still, companies make use of it with technologies like Hadoop, Storm, Spark, and NoSQL to process
unstructured data.
- Hadoop is an open-source big data framework designed to support enormous amounts of data
in a distributed computing environment.
- Real-time data, called high-velocity data, is handled by Apache Storm and Apache Spark (see the sketch below).
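A minimal PySpark sketch of reading unstructured text, doing a basic cleaning pass, and counting words in parallel; this assumes PySpark is installed, and the input file notes.txt is hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unstructured-text-demo").getOrCreate()

# read raw, unstructured text (e.g., human-generated notes) into an RDD
lines = spark.sparkContext.textFile("notes.txt")  # hypothetical input file

# basic cleaning: lowercase, split into words, drop empty tokens
words = (lines.flatMap(lambda line: line.lower().split())
              .filter(lambda w: w != ""))

# count word frequencies in parallel across the cluster
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

print(counts.take(10))
spark.stop()

The same word-count logic also runs on a Hadoop cluster via MapReduce; Spark keeps intermediate data in memory, which is why it is a common choice for high-velocity data.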
Module 4:
Getting value out of big data is teamwork, where people with different domains of expertise sit
together to draw insights from data. Many insights can be drawn from the same piece of
data. An insight, or let's say a prediction, is derived using data as empirical evidence, including
near-real-time data.
Building a big data strategy: a plan of action or policy designed to achieve an overall aim.
- Strategy: aim, policy, plan, action.
5 P's of data science: people, purpose, process, platforms, programmability.
Purpose: The purpose refers to the challenge or set of challenges defined by your big
data strategy. The purpose can be related to a scientific analysis with a hypothesis or a
business metric that needs to be analyzed, often based on big data.
People: Data scientists are often seen as people who possess skills in a variety of
topics, including: science or business domain knowledge; analysis using statistics,
machine learning, and mathematical knowledge; and data management, programming, and
computing. In practice, this is generally a group of researchers composed of people with
complementary skills.
Process: Since there is a predefined team with a purpose, a great place for this team to
start is a process they could iterate on. We can simply say: people with a purpose will
define a process to collaborate and communicate around! The process of data science
includes techniques for statistics, machine learning, programming, computing, and data
management. A process is conceptual in the beginning and defines the coarse set of steps
and how everyone can contribute to it. Note that similar reusable processes can be
applicable to many applications with different purposes when employed within different
workflows. Data science workflows combine such steps in executable graphs. We believe
that process-oriented thinking is a transformative way of conducting data science to
connect people and techniques to applications. Execution of such a data science process
requires access to many datasets, Big and small, bringing new opportunities and
challenges to Data Science. There are many Data Science steps or tasks, such as Data
Collection, Data Cleaning, Data Processing/Analysis, Result Visualization, resulting in a
Data Science Workflow. Data science processes may need user interaction and other
manual operations, or be fully automated. Challenges for the data science process include:
1) how to easily integrate all needed tasks to build such a process; and 2) how to find the best
computing resources and efficiently schedule process executions to the resources based
on process definition, parameter settings, and user preferences.
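As a toy illustration of wiring such steps into an executable workflow (not the course's own code; every function here is a made-up stand-in):

def collect():
    # stand-in for data collection (e.g., reading files or calling an API)
    return ["  3 ", "7", None, "5"]

def clean(raw):
    # data cleaning: drop missing values, normalize types
    return [int(x.strip()) for x in raw if x is not None]

def analyze(values):
    # trivial processing/analysis: mean of the cleaned values
    return sum(values) / len(values)

def report(result):
    # stand-in for result visualization/reporting
    print(f"mean = {result:.2f}")

# the workflow is just the steps wired into an executable sequence
report(analyze(clean(collect())))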
Platforms: Based on the needs of an application-driven purpose and the amount of data
and computing required to perform this application, different computing and data
platforms can be used as a part of the data science process. This scalability should be
made part of any data science solution architecture.
Programmability: Capturing a scalable data science process requires aid from
programming languages, e.g., R, and patterns, e.g., MapReduce. Tools that provide
access to such programming techniques are key to making the data science process
programmable on a variety of platforms.
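Since MapReduce is named as a pattern, here is a minimal single-machine sketch of its map and reduce steps in plain Python; a real deployment (e.g., Hadoop) would distribute these steps across a cluster:

from functools import reduce
from collections import Counter

documents = ["big data needs big tools", "data science is team work"]

# map step: each document independently emits its own word counts
mapped = [Counter(doc.split()) for doc in documents]

# reduce step: partial counts from all mappers are merged
totals = reduce(lambda a, b: a + b, mapped, Counter())
print(totals.most_common(3))

Because the map step needs no coordination between documents, it parallelizes naturally; only the reduce step merges results.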
To summarize, data science can be defined as a craft of using the five pieces identified above.
Having a process between the more business-driven P's (people and purpose) and the more
technically driven P's (platforms and programmability) leads to a streamlined approach that starts
and ends with a defined business value, team accountability, and collaboration in mind.
Asking the right questions:
- Define the problem.
- Assess the current situation.
Data analysis process steps: acquire – prepare – analyze – report – act.
1) Acquiring data: