Bda Unit-1

Uploaded by

ANSHI RANK

BDA

Unit – 1
Introduction to Big Data
By: Urvi Dhamecha

What’s Big Data?
• Big data is the term for a collection of data sets so large
and complex that it becomes difficult to process using
on-hand database management tools or traditional data
processing applications.
• The challenges include capture, storage, search, sharing,
transfer, analysis, and visualization.
• The trend to larger data sets is due to the additional
information derivable from analysis of a single large set
of related data, as compared to separate smaller sets
with the same total amount of data, allowing
correlations to be found to "spot business trends,
determine quality of research, prevent diseases, link
legal citations, combat crime, and determine real-time
roadway traffic conditions."

Distributed file system
• A distributed file system (DFS) is a file
system whose data is stored on one or more
servers and accessed over a network as if it
were on the local machine.
• The DFS makes it convenient to share
information and files among users on a
network in a controlled and authorized way.

Types of Big Data
• Structured Data
• Semi-Structured Data
• Unstructured Data

• Structured Data:
– Information stored in databases is known as
structured data because it is represented in
a strict format.
– The DBMS checks that all data follows the
structures and constraints specified in the
schema.
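
The schema check described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `customer` table, its columns and the helper `try_insert` are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical "customer" table; the columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " age INTEGER CHECK (age >= 0))"
)


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO customer (id, email, age) VALUES (?, ?, ?)", row
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert((1, "a@example.com", 30))        # fits the schema
rejected_null = try_insert((2, None, 25))              # violates NOT NULL
rejected_check = try_insert((3, "c@example.com", -5))  # violates CHECK
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Only the row that satisfies every constraint is stored; the other two are rejected by the DBMS before they reach the table.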

• Semi-Structured Data:
– In some applications, data is collected in an ad-hoc
manner before it is known how it will be stored and
managed.
– This data may have a certain structure, but not all the
information collected will have identical structure.
This type of data is known as semi-structured data.
– In semi-structured data, the schema information is
mixed in with the data values, since each data object
can have different attributes that are not known in
advance. Hence, this type of data is sometimes
referred to as self-describing data.
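
As a small illustration of self-describing data, the sketch below parses two JSON records whose attributes differ. The records and field names are made up for this example; JSON is used here as one common semi-structured format.

```python
import json

# Two hypothetical records: each carries its own attribute names,
# and the two records do not share the same set of attributes.
raw = """[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "phone": "555-0100", "tags": ["vip"]}
]"""

records = json.loads(raw)

# The "schema" is discovered from the data itself, record by record.
fields_per_record = [sorted(record) for record in records]
all_fields = sorted({key for record in records for key in record})
```

No schema was declared in advance: the attribute names travel with the values, which is what "schema information mixed in with the data values" means.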


• Unstructured Data:
– A third category is known as unstructured data,
because there is very limited indication of the type
of data.
– A typical example would be a text document that
contains information embedded within it. Web
pages in HTML that contain some data are also
considered unstructured data.

Characteristics of big data
The FOUR V’s of Big Data
• Volume
• Velocity
• Variety
• Veracity

Volume
• Big data is always large in volume. It actually doesn't
have to be a certain number of petabytes to qualify.
• If your store of old data and new incoming data has
gotten so large that you are having difficulty handling
it, that's big data.
• Remember that it's going to keep getting bigger. Your
consultant needs to recommend a scalable
solution that can grow with your data.

Variety
• Variety points to the number of sources or incoming
vectors leading to your databases.
• That might be embedded sensor data, phone
conversations, documents, video uploads or feeds,
social media, and much more.
• Variety in data means variety in databases – you'll
almost certainly need to add a non-relational database
if you haven't already done so.

Velocity
• Velocity or speed refers to how fast the data is coming
in, but also to how fast you need to be able to analyze
and utilize it.
• If you have one or more business processes that
require real-time data analysis, you have a velocity
challenge.
• Solving this issue might mean expanding your private
cloud using a hybrid model that allows bursting for
additional compute power as-needed for data analysis.
• Your consultant may need to offer suggestions for
hardware, software, and business process changes to
handle today's high-speed data.
Veracity
• Veracity is probably the toughest nut to crack.
• Veracity refers to the quality, accuracy, integrity and
credibility of data.
• Gathered data could have missing pieces, might be
inaccurate or might not be able to provide real,
valuable insight.
• Veracity, overall, refers to the level of trust there is in
the collected data.
• If you can't trust the data itself, the source of the data,
or the processes you are using to identify which data
points are important, you have a veracity problem.
• One of the biggest problems with big data is the
tendency for errors to snowball.
• User entry errors, redundancy and corruption all
affect the value of data.
• Your consulting firm needs to help you clean your
existing data and put processes in place to reduce
the accumulation of dirty data going forward.
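
A minimal sketch of what "cleaning dirty data" can look like in code, assuming records are plain dictionaries. The function `clean`, the field names and the sample records are hypothetical; real pipelines apply many more rules.

```python
def clean(records, required_fields):
    """Split records into (cleaned, rejected).

    A record is rejected if a required field is missing or empty,
    or if it duplicates an earlier record on the required fields
    (compared case-insensitively, ignoring surrounding whitespace).
    """
    seen = set()
    cleaned, rejected = [], []
    for record in records:
        if any(record.get(field) in (None, "") for field in required_fields):
            rejected.append(record)  # missing piece of data
            continue
        key = tuple(str(record[field]).strip().lower() for field in required_fields)
        if key in seen:
            rejected.append(record)  # redundant copy
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned, rejected


records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "A@Example.com "},  # same address, entered differently
    {"id": 3, "email": ""},                # missing value
    {"id": 4, "email": "b@example.com"},
]
cleaned, rejected = clean(records, required_fields=["email"])
```

Here records 1 and 4 are kept, while the duplicate (2) and the incomplete record (3) are set aside for review instead of accumulating in the data store.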

Big Data vs. Relational Data
Application | Relation-Based Data | Big Data
Data processing | Single-computer platform that scales with better CPUs; centralized processing. | Cluster platforms that scale to thousands of nodes; distributed processing.
Data management | Relational database (SQL); centralized storage. | Non-relational databases that manage varied data types and formats (NoSQL); distributed storage.
Analytics | Batched, descriptive, centralized. | Real-time, predictive and prescriptive; distributed analytics.

Advantages of “Big Data” Analytics
• Scalability – nodes can be added to scale the system
with little administration.
• Unlike a traditional RDBMS, no pre-processing is
required before storing.
• Unstructured data such as text, images and
videos can be stored.
• There is no fixed limit on how much data can be
stored or for how long.
• Protection against hardware failure – if a node
fails, its work is redirected to other nodes. Multiple
copies of the data are stored automatically.

Big Data Analytics
• Big data analytics is the process of examining
large data sets to uncover hidden patterns,
unknown correlations, market trends,
customer preferences and other useful
business information.

Big data applications
• Understanding and targeting users
• Understanding and optimizing business processes
• Performance optimization
• Improving healthcare and public health
• Improving sports performance
• Improving science and research
• Optimizing machine and device performance
• Improving security and law enforcement
• Improving and optimizing cities and countries
• Financial trading
Big Data Architecture
• What is Big Data Architecture?

The term "Big Data architecture" refers to the systems
and software used to manage Big Data. A Big Data
architecture must be able to handle the scale,
complexity, and variety of Big Data. It must also be able
to support the needs of different users, who may want
to access and analyze the data differently.

Big Data Architecture Layers
1. Data Ingestion
2. Data Processing
3. Data Storage
4. Data Visualization

A Big Data architecture has four main layers:
1. Data Ingestion
This layer is responsible for collecting and storing data
from various sources. In Big Data, data ingestion is the
process of extracting data from various sources and
loading it into a data repository. Data ingestion is a key
component of a Big Data architecture because it
determines how data will be ingested, transformed,
and stored.
2. Data Processing
Data processing is the second layer, responsible for
collecting, cleaning, and preparing the data for analysis.
This layer is critical for ensuring that the data is high
quality and ready to be used in the future.

3. Data Storage
Data storage is the third layer, responsible for storing
the data in a format that can be easily accessed and
analyzed. This layer is essential for ensuring that the
data is accessible and available to the other layers.
4. Data Visualization
Data visualization is the fourth layer and is responsible
for creating visualizations of the data that humans can
easily understand. This layer is important for making
the data accessible.
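
The four layers can be pictured as four functions chained together. The sketch below is a toy, in-memory illustration only: the function names, the sample sources and the text "chart" are assumptions made for this example, not a real Big Data stack.

```python
from collections import Counter


def ingest(sources):
    """Layer 1: collect raw records from every source into one list."""
    return [record for source in sources for record in source]


def process(raw_records):
    """Layer 2: clean the data (trim whitespace, lowercase, drop empties)."""
    cleaned = (record.strip().lower() for record in raw_records)
    return [record for record in cleaned if record]


def store(records):
    """Layer 3: keep the data in a form that is easy to query (counts per value)."""
    return Counter(records)


def visualize(table):
    """Layer 4: render a text bar chart that a human can read."""
    return [f"{key:<6} {'#' * count}" for key, count in sorted(table.items())]


# Two hypothetical sources of event labels.
sources = [["Web ", "app", ""], ["web", "IoT", "app", "web"]]
stored = store(process(ingest(sources)))
chart = visualize(stored)
```

Each layer consumes the output of the previous one, which is why problems introduced during ingestion or processing show up in every later layer.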

Big Data Storage
File System and Distributed File System
• Difference between the local file system (LFS) and the
distributed file system (DFS).
Local File System | Distributed File System
LFS stores the data in a single block. | DFS divides data into multiple blocks and stores them on different DataNodes.
LFS uses a tree format to store data. | DFS provides a master-slave architecture for data storage.
Data retrieval in LFS is slow for very large files. | Data retrieval in DFS is fast for very large files.
It is not reliable because LFS does not replicate the data files. | It is reliable because in DFS data blocks are replicated across different DataNodes.
LFS is cheaper because it does not need extra storage for any data file. | DFS is expensive because it needs extra storage to replicate the same data blocks.
Files can be accessed directly in LFS. | Files cannot be accessed directly in DFS because the actual location of the data blocks is known only to the NameNode.
LFS is not appropriate for analysis of very big files because it needs a long time to process them. | DFS is appropriate for analysis of big files because it needs less time to process them than a local file system.
LFS is less complex than DFS. | DFS is more complex than LFS.
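
The DFS column of the comparison (blocks, DataNodes, replication, NameNode) can be sketched in a few lines. This is a toy model under stated assumptions: the round-robin placement, the node names `dn1` to `dn3` and the tiny block size are illustrative and are not the placement policy of any real DFS.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, data_nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    The returned mapping plays the role of the NameNode's metadata:
    block id -> list of DataNodes holding a copy.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {
        block_id: [
            data_nodes[(block_id + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
        for block_id in range(num_blocks)
    }


def readable_blocks(placement, failed_node):
    """Blocks that still have at least one copy on a healthy node."""
    return [
        block_id
        for block_id, nodes in placement.items()
        if any(node != failed_node for node in nodes)
    ]


blocks = split_into_blocks(b"x" * 10, block_size=4)
name_node_map = place_blocks(len(blocks), ["dn1", "dn2", "dn3"], replication=2)
still_readable = readable_blocks(name_node_map, failed_node="dn2")
```

With two copies of every block, losing one DataNode leaves every block readable, which is the reliability argument made in the table. The cost is the extra storage for the second copy.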

NoSQL
What is NoSQL..?
• NoSQL is a class of database management systems that
provide mechanisms for storing and retrieving
massive amounts of unstructured data in a distributed
environment on virtual servers, with a focus on
high scalability, performance and availability.
• NoSQL was developed in response to the large volume
of data stored about users, objects and products that
needs to be frequently accessed and processed.
• Some say the term “NoSQL” stands for “non SQL”
while others say it stands for “not only SQL”.

Features of NoSQL
• NoSQL is a next-generation class of databases that
differs substantially from traditional relational databases.
• NoSQL stands for "Not only SQL": SQL as well as other
query languages can be used with NoSQL databases.
• NoSQL databases are non-relational and schema-free.
• NoSQL databases generally avoid JOINs.
• NoSQL uses a distributed architecture and works on
multiple processors to give high performance.
• NoSQL databases are horizontally scalable.
• Many open-source NoSQL databases are available.
• Data files can be easily replicated.
• NoSQL uses a simple API (Application Programming
Interface).
• NoSQL can manage huge amounts of data.
• NoSQL can be implemented on commodity hardware
in which each node has its own RAM and disk (the
shared-nothing concept).
• 24 × 7 Data availability
• Location transparency
• Schema-less data model
• Modern day transaction analysis
• Architecture that suits big data
• Analytics and business intelligence

Why NoSQL..?
• A relational database product can deal with more
predictable, structured data.
• NoSQL is required because today’s industry needs a
very agile system that can process unstructured and
unpredictable data dynamically.
• NoSQL is known for its high performance with high
availability, rich query language, and easy scalability
as per the need.
• Relational (SQL) databases support the atomicity,
consistency, isolation, durability (ACID) properties.
• NoSQL databases are designed around the trade-offs
described by the CAP theorem.

CAP Theorem
• The Consistency, Availability, Partition tolerance (CAP)
theorem, also called Brewer’s theorem, states that a
distributed data store can guarantee at most two of the
following three properties at the same time.
1. Consistency guarantees that all storage nodes and their
replicas have the same data at the same time.
2. Availability means every request is guaranteed to
receive a success or failure response.
3. Partition tolerance guarantees that the system
continues to operate in spite of arbitrary partitioning
due to network failures.

Sharding
• Database sharding is a technique for horizontal
scaling of databases, where the data is split across
multiple database instances, or shards, to improve
performance and reduce the impact of large
amounts of data on a single database.

Sharding Architectures:
• Key Based Sharding
• Horizontal or Range Based Sharding
• Vertical Sharding
• Directory-Based Sharding

• Key Based Sharding
– This technique is also known as hash-based sharding.
– Here, we take the value of an entity attribute such as
customer ID, customer email, client IP address, or
zip code, and use this value as the input to a
hash function.
– The resulting hash value determines which shard
stores the data.
– The values passed to the hash function should all
come from the same column (the shard key) so that
data is placed consistently.
– A shard key acts much like a primary key or a
unique identifier for individual rows.
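
A minimal sketch of key-based (hash-based) sharding. The shard count, the choice of SHA-256 and the customer IDs are assumptions made for this example; any stable hash function would serve the same purpose.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards for this example


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash the shard key and map the hash value onto a shard number."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route hypothetical customer IDs (the shard key) to their shards.
shards = {shard_id: [] for shard_id in range(NUM_SHARDS)}
for customer_id in ["C1001", "C1002", "C1003", "C1004"]:
    shards[shard_for(customer_id)].append(customer_id)
```

Because the hash is deterministic, the same customer ID always lands on the same shard, so reads can be routed without any lookup table. The trade-off is that changing `NUM_SHARDS` changes where most keys map, which usually means moving data.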

• Horizontal or Range Based Sharding
– In this method, we split the data based on
the ranges of a given value inherent in each entity.
– Let’s say you have a database of your online
customers’ names and email information.
– You can split this information into two shards. In one
shard you can keep the info of customers whose first
name starts with A-P and in another shard, keep the
information of the rest of the customers.
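
The A-P example above can be written as a small routing function. The shard names `shard_a_to_p` and `shard_rest` are made up for this sketch.

```python
def shard_for_name(first_name: str) -> str:
    """Pick a shard from the first letter of the customer's first name."""
    initial = first_name.strip()[:1].upper()
    if "A" <= initial <= "P":
        return "shard_a_to_p"
    return "shard_rest"  # Q-Z, plus anything that does not start with A-P


placement = {name: shard_for_name(name) for name in ["Asha", "peter", "Quinn", "Zoe"]}
```

Range-based routing is easy to reason about, but the shards are only balanced if the names are spread evenly across the ranges.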

• Vertical Sharding
– In this method, we split entire columns out of a
table and put those columns into new, distinct
tables.
– The data in one partition is independent of the
data in the others.
– Each partition holds its own distinct rows and
columns.
– We can split different features of an entity into
different shards on different machines.
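
A rough sketch of vertical sharding, assuming each partition keeps the `id` column so that rows can be matched up again later. That shared key, the function `vertical_split` and the sample columns are assumptions of this example, not something the slides specify.

```python
def vertical_split(rows, key, column_groups):
    """Split each row's columns into separate partitions that all keep `key`."""
    partitions = {}
    for partition_name, columns in column_groups.items():
        partitions[partition_name] = [
            {key: row[key], **{column: row[column] for column in columns}}
            for row in rows
        ]
    return partitions


# Hypothetical customer table with frequently and rarely used columns.
customers = [
    {"id": 1, "name": "Asha", "email": "asha@example.com", "bio": "Long profile text"},
    {"id": 2, "name": "Ravi", "email": "ravi@example.com", "bio": "Another profile"},
]
partitions = vertical_split(
    customers,
    key="id",
    column_groups={"contact": ["name", "email"], "profile": ["bio"]},
)
```

Each partition could then live on a different machine, so a query that needs only contact details never touches the larger profile data.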
• Directory-Based Sharding
– In this method, we create and maintain a lookup
service or lookup table for the original database.
– The lookup table is indexed by shard key and maps
each entity in the database to its shard.
– This way we keep track of which database shards
hold which data.
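
A minimal sketch of a directory (lookup table), using a plain dictionary. The region codes used as shard keys and the shard names are hypothetical.

```python
# Hypothetical lookup table: shard key (here, a region code) -> shard name.
directory = {"EU": "shard_1", "US": "shard_2", "ASIA": "shard_3"}


def locate(shard_key):
    """Return the shard that holds data for this shard key."""
    if shard_key not in directory:
        raise LookupError(f"no shard registered for {shard_key!r}")
    return directory[shard_key]


before = locate("EU")
directory["EU"] = "shard_4"  # moving a key only needs a directory update
after = locate("EU")
```

The directory makes placement flexible, since any key can be reassigned individually. In exchange, every request depends on the lookup service being available and correct.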

Replication
What is Replication?
• Data replication is the process of creating and
maintaining multiple copies of the same data in different
locations or on different storage devices.
• The goal of data replication is to improve data
availability, reliability, and fault tolerance.
• By having multiple copies of data, systems can continue
to function even if one copy becomes unavailable due to
hardware failure, network issues, or other reasons.
• Data replication is commonly used in distributed systems,
databases, and storage systems to ensure that data is
always accessible and to improve system performance
and scalability.
Benefits of data replication:
• Improve the availability of data
• Increase the speed of data access
• Enhance server performance
• Accomplish disaster recovery

Improve the availability of data
• When a particular system experiences a technical glitch due to
malware or a faulty hardware component, the data can still be
accessed from a different site or node.
• Data replication enhances the resilience and reliability of
systems by storing data at multiple nodes across the network.
Increase data access speed
• In organizations where there are multiple branch offices
spread across the globe, users may experience some latency
while accessing data from one country to another.
• Placing replicas on local servers provides users with faster
data access and query execution times.

Enhance server performance
• Database replication effectively reduces the load on the primary server by
dispersing it among other nodes in the distributed system, thereby
improving network performance.
• By routing all read-operations to a replica database, IT administrators can
save the primary server for write-operations that demand more
processing power.
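
The read/write split described above can be sketched as follows. `ReplicatedStore` is a hypothetical in-memory class, and copying each write to the replicas synchronously is a simplification made for this example; real systems often replicate asynchronously.

```python
class ReplicatedStore:
    """Toy primary/replica store: writes go to the primary, reads to replicas."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.reads_served = [0] * replica_count
        self._next_replica = 0

    def write(self, key, value):
        """Apply the write on the primary, then copy it to every replica."""
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value  # synchronous copy (simplification)

    def read(self, key):
        """Serve the read from the next replica in round-robin order."""
        index = self._next_replica
        self._next_replica = (index + 1) % len(self.replicas)
        self.reads_served[index] += 1
        return self.replicas[index][key]


replicated = ReplicatedStore(replica_count=2)
replicated.write("price", 10)
values = [replicated.read("price") for _ in range(4)]
```

All four reads are answered by the replicas, two each, so the primary handles only the single write.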
Accomplish Disaster recovery
• Businesses are often susceptible to data loss due to a data breach or
hardware malfunction.
• During such a catastrophe, the employees' valuable data, along with client
information can be compromised.
• Data replication facilitates the recovery of data which is lost or corrupted
by maintaining accurate backups at well-monitored locations, thereby
contributing to enhanced data protection.

Types of data replication
• Full table replication
• Transactional replication
• Snapshot replication
• Merge replication
• Key-based incremental replication

Full table replication
• Full table replication means that the entire data is
replicated. This includes new, updated as well as existing
data that is copied from source to the destination.
• This method of replication is generally associated with
higher costs since the processing power and network
bandwidth requirements are high.
• However, full table replication can be beneficial when it
comes to the recovery of hard-deleted data.

Transactional replication
• In this method, the data replication software makes full
initial copies of data from origin to destination following
which the subscriber database receives updates
whenever data is modified.
• This is a more efficient mode of replication since fewer
rows are copied each time data is changed.
• Transactional replication is usually found in
server-to-server environments.

Snapshot replication
• In Snapshot replication, data is replicated exactly as it
appears at any given time.
• Unlike other methods, Snapshot replication does not pay
attention to the changes made to data.
• This mode of replication is used when changes made to
data tend to be infrequent, for example when performing
initial synchronizations between publishers and
subscribers.

Merge replication
• This type of replication is commonly found in
server-to-client environments and allows both the
publisher and subscriber to make changes to data dynamically.
• In merge replication, data from two or more databases
is combined to form a single database, which makes
this technique more complex to use.

Key-based incremental replication
• Also called key-based incremental data capture, this
technique only copies data changed since the last
update.
• Keys can be looked at as elements that exist within
databases that trigger data replication.
• Since only a few rows are copied during each update, the
costs are significantly lower.
• However, the drawback lies in the fact that this
replication mode cannot be used to recover hard deleted
data, since the key value is also deleted along with the
record.
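
A sketch of key-based incremental replication, assuming the replication key is an ever-increasing `updated_at` counter on each row. The function `incremental_sync`, the field names and the sample rows are hypothetical.

```python
def incremental_sync(source_rows, destination, last_key):
    """Copy rows whose replication key is newer than `last_key`.

    Returns the new high-water mark to use for the next sync.
    """
    new_last = last_key
    for row in source_rows:
        if row["updated_at"] > last_key:
            destination[row["id"]] = dict(row)
            new_last = max(new_last, row["updated_at"])
    return new_last


source = [
    {"id": 1, "value": "a", "updated_at": 1},
    {"id": 2, "value": "b", "updated_at": 2},
]
destination = {}
mark = incremental_sync(source, destination, last_key=0)  # copies both rows

# Later: row 1 is updated and row 2 is hard-deleted at the source.
source = [{"id": 1, "value": "a2", "updated_at": 3}]
mark = incremental_sync(source, destination, last_key=mark)  # copies only row 1
```

After the second sync the destination has the new value for row 1 but still holds row 2. The deleted row's key disappeared together with the record, so the sync never sees the deletion.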
ACID and BASE Properties
ACID Model:
The ACID model is easiest to understand by breaking
down the acronym:
• Atomicity: A transaction must be treated as an atomic
unit: either all of its operations are executed or none
are. The database must never be left in a state where a
transaction is partially completed; its state is defined
either as before the transaction executed or as after
it executed.
• Consistency: The database must remain in a consistent
state after any transaction. No transaction should have
an adverse effect on the data residing in the database:
if the database was in a consistent state before a
transaction executed, it must remain consistent after
the transaction as well.
• Isolation: When more than one transaction is executed
simultaneously and in parallel, each transaction is
carried out as if it were the only transaction in the
system. No transaction affects the execution of any
other transaction.
• Durability: The database must retain all of its
committed updates even if the system fails or restarts.
In practice, if a transaction updates a chunk of data
and the commit is performed, the database holds the
modified data. If the commit is not performed, no data
is modified, and the update can be applied only by
running the transaction again once the system restarts.
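
Atomicity and consistency can be observed with Python's built-in `sqlite3` module. The `account` table, the balances and the `transfer` helper below are hypothetical. The credit is deliberately executed before the debit so that a failing debit leaves a partial update that must be rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


committed = transfer("alice", "bob", 30)     # both updates applied
rolled_back = transfer("alice", "bob", 500)  # debit violates CHECK
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits bob first and then fails on the debit, yet the final balances show no trace of the credit: the transaction was applied in full or not at all.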

BASE Model:
The acronym BASE stands for:
• Basically Available: Instead of enforcing
immediate consistency, BASE-modelled NoSQL
databases ensure the availability of data by
spreading and replicating it across the nodes of the
database cluster.
• Soft State: Due to the lack of immediate consistency,
data values may change over time. The BASE
model breaks with the concept of a database that
enforces its own consistency, delegating that
responsibility to developers.
• Eventually Consistent: BASE does not enforce
immediate consistency, but that does not mean it
never achieves it. Until it does, data reads are
still possible (even though they might not reflect
reality).
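
Eventual consistency can be illustrated with a toy store in which a write is acknowledged by one replica and copied to the others later. `EventuallyConsistentStore` and its explicit `sync` step are assumptions of this sketch; real systems propagate updates in the background.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store: replica 0 accepts writes, other replicas catch up on sync()."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to other replicas

    def write(self, key, value):
        self.replicas[0][key] = value  # acknowledged immediately
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica, key):
        """Reads always succeed, but may return stale data (or None)."""
        return self.replicas[replica].get(key)

    def sync(self):
        """Apply all pending updates; afterwards every replica agrees."""
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


ec_store = EventuallyConsistentStore(replica_count=3)
ec_store.write("x", 1)
stale_read = ec_store.read(1, "x")  # replica 1 has not caught up yet
ec_store.sync()
fresh_read = ec_store.read(1, "x")
```

Before `sync()` the read is answered but does not reflect the latest write. After it, all replicas return the same value.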

Difference between ACID and BASE:
S. No | Criteria | ACID | BASE
1 | Simplicity | Simple | Complex
2 | Maintenance | High | Low
3 | Consistency of data | Strong | Weak/Loose
4 | Concurrency scheme | Exact answer | Close to answer
5 | Scaling | Vertical | Horizontal
6 | Implementation | Easy to implement | Difficult to implement
7 | Upgrade | Harder to upgrade | Easy to upgrade
8 | Type of database | Robust | Simple
9 | Type of code | Simple | Harder
10 | Time required for completion | Less time | More time
11 | Examples | Oracle, MySQL, SQL Server, etc. | DynamoDB, Cassandra, CouchDB, SimpleDB, etc.

End of Unit – 1
