
Lecture-7

Big Data (KCS-061)


Unit 1: Introduction to Big Data
• Types of digital data
• History of Big Data innovation
• Introduction to Big Data platform, drivers for Big Data
• Big Data architecture and characteristics
• 5 Vs of Big Data
• Big Data technology components
• Big Data importance and applications
• Big Data features – security, compliance, auditing and protection
• Big Data privacy and ethics
• Big Data Analytics
• Challenges of conventional systems
• Intelligent data analysis, nature of data, analytic processes and tools, analysis vs
reporting, modern data analytic tools
Nature of Data

• The data in Big Data can be any of the following:

• Structured

• Unstructured

• Semi-structured
• Usually, data is in an unstructured format, which makes extracting
information from it difficult.
• According to Merrill Lynch, 80–90% of business data is either
unstructured or semi-structured.
• Gartner likewise estimates that unstructured data constitutes 80% of
all enterprise data.
Formats of Digital Data
(Figure: percent distribution of the three forms of data: structured, unstructured, and semi-structured)
• Structured
By structured data, we mean data that can be processed, stored, and retrieved in a fixed
format. It refers to highly organized information that can be readily and seamlessly
stored and accessed from a database by simple search engine algorithms. For instance,
the employee table in a company database will be structured as the employee details,
their job positions, their salaries, etc., will be present in an organized manner.
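
As a minimal sketch of this idea (the table name, columns, and values here are invented for illustration, not from the original), structured data can live in a relational table and be retrieved with a fixed-format query:

```python
# Structured data sketch: a relational employee table with a fixed schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
    "position TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, position, salary) VALUES (?, ?, ?)",
    [("Asha", "Analyst", 52000.0), ("Ravi", "Engineer", 61000.0)],
)
# Because the format is fixed, a simple query retrieves exactly the
# fields asked for, with no parsing needed.
rows = conn.execute(
    "SELECT name, salary FROM employee WHERE salary > 55000"
).fetchall()
print(rows)  # [('Ravi', 61000.0)]
```

The fixed schema is what makes "readily and seamlessly stored and accessed" possible: every record has the same fields in the same types.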

• Unstructured
This refers to data that lacks any specific form or structure, which makes it
very difficult and time-consuming to process and analyze. Email is a common
example of unstructured data.
• Semi-structured
This data contains both of the formats mentioned above, that is, structured and
unstructured data. To be precise, it refers to data that has not been classified
under a particular repository (database) but still contains vital information or
tags that segregate individual elements within the data.
Example: data in an XML file
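
The XML example can be sketched as follows; the fragment and field names are invented, but they show how tags mark out individual elements even when records differ in shape:

```python
# Semi-structured data sketch: an XML fragment has no rigid table schema,
# but its tags still segregate individual elements.
import xml.etree.ElementTree as ET

doc = """
<employees>
  <employee id="1"><name>Asha</name><role>Analyst</role></employee>
  <employee id="2"><name>Ravi</name></employee>
</employees>
"""
root = ET.fromstring(doc)
# Fields may be missing (the second record has no <role>), which is what
# makes the data "semi" structured, yet the tags let us extract values.
names = [e.findtext("name") for e in root.iter("employee")]
print(names)  # ['Asha', 'Ravi']
```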
The Analytic Process
• An analysis process contains all or some of the following phases:

• Business Understanding

• Data Collection and Understanding

• Data Preparation

• Modeling

• Evaluation

• Deployment
1. Business Understanding:

• This step focuses on understanding the business in all its different aspects. It involves the
following steps:

a) Identify the goal and frame the business problem.

b) Gather information on resources, constraints, assumptions, risks, etc.

c) Prepare the analytical goal.

d) Prepare a flow chart of the process.
2. Data Collection:

• Collecting data is an important task in executing a project plan accurately.

• In this phase, data from different data sources is collected first and then described in terms of its
application and the needs of the project.

• This process is also called data exploration.

• Exploration of the data is required to ensure the quality of the collected data.
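
A first quality check during exploration might be sketched like this (the records and field names are invented for illustration):

```python
# Data-exploration sketch: profile each field of the collected records
# for missing values before analysis begins.
records = [
    {"name": "Asha", "age": 29, "city": "Pune"},
    {"name": "Ravi", "age": None, "city": "Delhi"},
    {"name": None, "age": 34, "city": "Delhi"},
]

def missing_counts(rows):
    """Count None values per field: a basic quality check on collected data."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            # bool adds as 0/1, so this tallies missing entries per field
            counts[field] = counts.get(field, 0) + (value is None)
    return counts

print(missing_counts(records))  # {'name': 1, 'age': 1, 'city': 0}
```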


3. Data Preparation:

• In this step, the provided data is prepared and cleaned.

• In other words, unnecessary or unwanted data is removed in this phase.

4. Data Modeling:

• In this phase, a model is created by using a data modeling technique.

• The data model is used to analyze the relationship between different selected objects in the data.

• Test cases are created to assess the applicability of the model, and the data is structured according
to the model.
5. Data Evaluation:

• The results obtained from the different test cases are evaluated and reviewed for errors.

• After validating the results, analysis reports are created for determining the next plan of action.

6. Deployment:

• In this phase, the plan is finalized for deployment.

• The deployed plan is constantly checked for errors and maintained.

• This process is also termed reviewing the project.
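
The phases above can be sketched as a toy pipeline; all function names and data here are invented for illustration, not a prescribed API:

```python
# Toy end-to-end sketch of the analytic phases on invented sales records.

def collect():                      # 2. Data Collection
    return [{"price": 10, "sold": 5}, {"price": 12, "sold": None},
            {"price": 11, "sold": 7}]

def prepare(rows):                  # 3. Data Preparation: drop incomplete rows
    return [r for r in rows if None not in r.values()]

def model(rows):                    # 4. Modeling: average units sold
    return sum(r["sold"] for r in rows) / len(rows)

def evaluate(result, expected_range=(0, 100)):  # 5. Evaluation: sanity-check
    return expected_range[0] <= result <= expected_range[1]

# 1. Business understanding fixed the question ("what is typical demand?");
# 6. deployment would wrap this flow in a scheduled, monitored job.
clean = prepare(collect())
avg_sold = model(clean)
assert evaluate(avg_sold)
print(avg_sold)  # 6.0
```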


• Phases of analysis: (figure omitted)
Analysis vs Reporting
• Sometimes the line between reporting and analysis tends to blur.
• We need to be able to distinguish between these two areas.

• Reporting:
• It is a process in which data is organized and summarized in an easy-to-understand format.
• Reports enable organizations to monitor various performance parameters and improve customer
satisfaction.

• Analysis:
• It is a process in which data and reports are examined to derive insights from them.
• These insights help an organization perform important tasks in a timely manner, such as
planning a strategy, taking important business decisions, introducing a new product, and
improving customer satisfaction.
• In simple words, reporting can be considered a process in which raw data is transformed into
useful information, and analysis a process that transforms information into insights.

• While both draw upon the same collected online data, reporting and analysis are very different in
terms of their purpose, tasks, outputs, delivery, and value.
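
The contrast can be sketched with a few lines of Python on invented sales data: the first step produces a report (information), the second examines that report for an insight:

```python
# Reporting vs. analysis in miniature (all data invented for illustration).
sales = [("north", 120), ("south", 80), ("north", 130), ("south", 70)]

# Reporting: organize and summarize raw data into total sales per region.
report = {}
for region, amount in sales:
    report[region] = report.get(region, 0) + amount
print(report)  # {'north': 250, 'south': 150}

# Analysis: examine the report for an insight that can drive a decision,
# e.g. which region lags and by how much.
best = max(report, key=report.get)
worst = min(report, key=report.get)
gap = report[best] - report[worst]
print(f"{worst} trails {best} by {gap}")  # south trails north by 100
```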
Modern Data Analytic Tools
• Various types of analytical tools are available in the market, but no company can buy and
implement all of them.

• Some of the open-source analytical tools are as follows:


✔ GridGain
✔ HPCC
✔ Storm
✔ Terrastore
✔ Neo4j

• The decision to invest in an analytical tool is a crucial one and needs careful consideration on the
part of a company on various parameters.

• The following are some popular analytical tools:

✔ The R Project for Statistical Computing

✔ IBM SPSS

✔ SAS
Thank You

Common questions

The data collection phase is crucial to the analytics process as it involves gathering data from various sources relevant to the project's goals. This phase is also referred to as data exploration, wherein data is evaluated to ensure its quality and applicability for subsequent analysis. Proper data collection lays the groundwork for accurate data preparation, modeling, and evaluation, directly impacting the quality of analytical outcomes. High-quality, relevant data helps create reliable models and generate actionable insights, whereas poor-quality data can lead to erroneous conclusions and inefficient decisions.

Understanding the nature of data is fundamental in crafting effective Big Data solutions because it determines the methods and tools used for data processing and analysis. With structured data, traditional database management methods suffice, while unstructured and semi-structured data require advanced analytical tools and methodologies. The prevalence of unstructured data in enterprises means that specialized techniques are needed to harness its potential. Recognizing these differences helps in choosing appropriate technologies and developing efficient data models, ensuring that data-driven decisions are well-supported by accurate analysis.

Security, compliance, auditing, and protection are critical components that impact Big Data applications by ensuring data integrity, privacy, and trustworthiness. Security measures protect sensitive information from unauthorized access and breaches. Compliance ensures that data handling meets legal and industry-specific regulations. Auditing provides transparency and traceability in data operations, allowing organizations to monitor and verify compliance with these regulations. Protection involves setting policies and practices that safeguard data throughout its lifecycle. Together, these elements help mitigate risks associated with data storage and processing, fostering trust in Big Data applications and their outputs.

Reporting and analysis differ greatly in their functions despite both utilizing collected data. Reporting organizes and summarizes data into a clear format that allows monitoring of performance parameters, enhancing decision-making by providing factual information at a glance. Analysis, however, involves a deeper examination of data and reports to derive insights, which can guide strategic planning and decision-making. Therefore, while reporting provides the 'what' of business performance, analysis offers the 'why' and 'how', allowing companies to not only track past performance but also predict and prepare for future trends.

Modern data analytic tools are characterized by their ability to handle vast, diverse datasets and perform complex analyses at speed. These tools, such as GridGain, Neo4j, and SAS, provide features like real-time processing, support for multiple data formats, and advanced visualization capabilities. They facilitate every stage of the analytics process, from data preparation to modeling and evaluation, thus allowing businesses to draw insights from both structured and unstructured data efficiently. By leveraging these tools, organizations can enhance their decision-making processes, optimize operations, and innovate by transforming raw data into valuable insights.

Conventional systems often struggle with scalability, data volume, and real-time processing needs, limiting their ability to handle the demands of modern-day data environments. Big Data technologies, however, are specifically designed to address these challenges by offering distributed computing, high storage capacity, and parallel processing. Technologies like Hadoop allow for efficient handling of large datasets across multiple servers. Additionally, Big Data systems can process diverse data formats, from structured to unstructured, more effectively than conventional systems. This capability provides organizations with deeper insights and faster, more efficient data processing solutions.

Ethical considerations in Big Data primarily revolve around privacy and data usage. Data privacy concerns arise from the vast amount of personal information processed in Big Data applications, potentially leading to misuse or unauthorized exposure. Ethical data usage requires organizations to balance the benefits of data analysis with individuals' rights to privacy. This involves adhering to strict data protection regulations, ensuring transparency in data usage practices, and obtaining informed consent from data subjects. Addressing these ethical concerns is vital to maintaining public trust and preventing legal repercussions while leveraging Big Data's capabilities for societal benefits.

The history of Big Data innovation has significantly shaped contemporary Big Data platforms and architectures. Early advancements focused on improving data storage and computing power to manage large datasets. This evolution has led to the development of sophisticated architectures that support distributed computing, fault tolerance, and scalability. Modern Big Data platforms are designed to accommodate the increasing volume, variety, and velocity of data by incorporating technologies such as Hadoop and cloud computing. These innovations have enabled real-time analytics, enhanced data integration, and improved access to insights, driving more informed and timely decision-making across industries.

The '5 Vs of Big Data' (Volume, Velocity, Variety, Veracity, and Value) define the major characteristics and challenges of managing Big Data. Volume refers to the massive amounts of data generated; Velocity is the speed at which data is produced and must be processed; Variety encompasses the different types of data, from structured to unstructured formats; Veracity highlights the uncertainty of data quality; and Value pertains to the insights and business benefits derived from the data. Each 'V' presents unique challenges, such as storage capacity for Volume, real-time processing for Velocity, integration of diverse data sources for Variety, trustworthiness for Veracity, and extraction of actionable insights for Value.

Big Data analytics involves three main categories of data: structured, unstructured, and semi-structured. Structured data refers to highly organized data that can be easily stored and accessed, such as entries in a database. Unstructured data lacks a predefined format, making it challenging to process and analyze; examples include emails and social media posts. Semi-structured data contains elements of both structured and unstructured data, such as XML files. Understanding these categories is essential for effectively extracting valuable insights, as most enterprise data is unstructured or semi-structured.
