UNIT:01
INTRODUCTION TO BIG DATA ANALYTICS
Dr. M B Patil
Dr. M B Patil Dept of CSE N K Orchid College Solapur 1
Content
◦ Why Big Data and where did it come from?
◦ Characteristics of Big Data
◦ Applications of Big Data
◦ Enabling Technologies for Big Data
◦ Big Data Stack
◦ Big Data distribution packages
Why Big Data and where did it come from?
◦ What is Data?
The quantities, characters, or symbols on which operations are performed by a computer, which may be
stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical
recording media.
◦ What is Big Data?
Big Data is data, but at a very large scale. The term describes a collection of data that is huge in
volume and keeps growing exponentially over time. In short, such data is so large and complex that
traditional data management tools cannot store or process it efficiently.
Examples of Big Data
The New York Stock Exchange generates about one terabyte of new trade data per day.
Social Media
Statistics show that more than 500 terabytes of new data are ingested into the databases of the social
media site Facebook every day. This data is mainly generated through photo and video uploads, message
exchanges, and comments.
Tabular Representation of various Memory Sizes
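To make the terabyte figures quoted in the examples easier to relate to each other, here is a minimal Python sketch of the common storage units. It assumes the binary convention, where each unit is 1024 times the previous one (many sources use powers of 1000 instead). The helper names `to_bytes` and `humanize` are hypothetical and chosen only for illustration.

```python
# Common storage units, assuming binary multiples (1 KB = 1024 bytes).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a quantity expressed in `unit` to a number of bytes."""
    return value * 1024 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Print each unit as a number of bytes.
    for unit in UNITS:
        print(f"1 {unit} = {to_bytes(1, unit):,} bytes")

    # The 500 TB/day figure quoted for Facebook, as an average rate per second.
    per_second = to_bytes(500, "TB") / (24 * 60 * 60)
    print("Average ingest rate:", humanize(per_second), "per second")
```

Running the script prints the size of each unit in bytes and converts the daily figure into an average per-second rate, which is often a more intuitive way to reason about ingestion load.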
Types of Data
1. Structured.
2. Unstructured.
3. Semi-structured.
Semi-structured
• Semi-structured data can contain elements of both structured and unstructured data.
• Semi-structured data looks structured in form, but it is not defined by a fixed schema such as a
table definition in a relational DBMS.
• An example of semi-structured data is data represented in an XML file.
• Example of semi-structured data: personal records stored in an XML file, each holding a name, gender, and age:
Prashant Rao, Male, 35
Seema R., Female, 41
Satish Mane, Male, 29
Subrato Roy, Male, 26
Jeremiah J., Male, 35
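The personal records in this example can be written as XML and read back with Python's standard library. The following is a minimal sketch, not the original file: the `<rec>` element follows the fragment that survives in the slide, while the `<people>` wrapper and the `<name>`, `<sex>`, and `<age>` tag names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed markup: only the <rec> element name is hinted at by the slide;
# <people>, <name>, <sex>, and <age> are hypothetical tag names.
XML_DATA = """
<people>
  <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
  <rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
  <rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
  <rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
</people>
"""


def parse_records(xml_text):
    """Turn each <rec> element into a dictionary with typed fields."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.findall("rec"):
        records.append({
            "name": rec.findtext("name"),
            "sex": rec.findtext("sex"),
            "age": int(rec.findtext("age")),  # XML stores text; convert explicitly
        })
    return records


if __name__ == "__main__":
    for person in parse_records(XML_DATA):
        print(person)
```

The tags describe the data, but nothing enforces a schema: a record could omit a field or add a new one without any table definition changing, which is what makes the format semi-structured.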
Characteristics Of Big Data
◦ Big Data is commonly described by the following four characteristics, often called the "four Vs".
◦ 1. Volume
◦ Volume refers to how much data is generated.
◦ 2. Velocity
◦ Velocity refers to how fast data is produced.
◦ 3. Variety
◦ Variety refers to the different forms that data takes.
◦ 4. Veracity
◦ Veracity refers to the quality, correctness, or accuracy of the captured data.
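The four characteristics can be made concrete by measuring them on a tiny batch of events. The sketch below is only an illustration under stated assumptions: the `EVENTS` list, its field names, and the `profile` function are hypothetical, and treating a missing payload as a quality problem is one simple way to approximate veracity, not a standard definition.

```python
from datetime import datetime

# Hypothetical event batch; every field name and value is made up for illustration.
EVENTS = [
    {"ts": "2024-01-01T00:00:00", "kind": "photo", "payload": "beach.jpg"},
    {"ts": "2024-01-01T00:00:01", "kind": "comment", "payload": "Nice picture!"},
    {"ts": "2024-01-01T00:00:02", "kind": "video", "payload": "clip.mp4"},
    {"ts": "2024-01-01T00:00:04", "kind": "comment", "payload": None},  # missing data
]


def profile(events):
    """Compute one simple metric for each of the four characteristics."""
    # Volume: total payload size in bytes.
    volume = sum(len(e["payload"].encode("utf-8")) for e in events if e["payload"])

    # Velocity: events per second over the time span covered by the batch.
    times = [datetime.fromisoformat(e["ts"]) for e in events]
    elapsed = (max(times) - min(times)).total_seconds()
    velocity = len(events) / elapsed if elapsed > 0 else 0.0

    # Variety: number of distinct kinds of data.
    variety = len({e["kind"] for e in events})

    # Veracity: fraction of events that carry a usable payload.
    veracity = sum(1 for e in events if e["payload"]) / len(events)

    return {
        "volume_bytes": volume,
        "velocity_per_sec": velocity,
        "variety_kinds": variety,
        "veracity_ratio": veracity,
    }


if __name__ == "__main__":
    print(profile(EVENTS))
```

Real systems measure these properties at a far larger scale, but the same four questions apply: how much, how fast, how varied, and how trustworthy.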
Big Data challenges
◦ Insufficient understanding and acceptance of Big data.
◦ Confusion over which Big Data tools to select.
◦ High costs.
◦ Data integration.
◦ Data security.
◦ Other challenges.
Applications of Big Data
◦ Business Intelligence and Decision Making
◦ Healthcare
◦ Finance
◦ Marketing and Customer Insights
◦ Supply Chain Optimization
◦ Energy Management
◦ Smart Cities and Urban Planning
◦ Manufacturing and Industry 4.0
◦ Agriculture
◦ Transportation and Logistics
◦ Media and Entertainment
◦ Environmental Monitoring
◦ Research and Development
◦ Government and Public Services
Enabling Technologies for Big Data
◦ The key technologies that enable Big Data are:
1. Apache Hadoop
2. NoSQL databases
3. Data Warehousing Solutions
4. Machine Learning and AI
5. Data Streaming and Real-Time Analytics: Apache Kafka and Apache Flink
6. Data Visualization Tools
7. Cloud Computing
8. Data Pre-processing Tools
9. Data Security and Privacy Tools