Big Data Analytics Unit 3
Source: https://www.geeksforgeeks.org/what-is-dfsdistributed-file-system/
A Distributed File System (DFS) is a file system that is spread across multiple file servers or multiple
locations. It allows programs to access and store files as if they were local, so that users can access
files from any computer on the network.
The main purpose of the Distributed File System (DFS) is to allow users of physically distributed
systems to share their data and resources by using a Common File System. A collection of
workstations and mainframes connected by a Local Area Network (LAN) is a typical configuration for a
Distributed File System. A DFS is implemented as a part of the operating system. In DFS, a namespace is
created, and this process is transparent to the clients.
A Windows DFS has two components, the namespace component and the file replication component. In the
case of failure or heavy load, these components together improve data availability by allowing data shared
in different locations to be logically grouped under one folder, which is known as the “DFS root”.
It is not necessary to use both the components of DFS together, it is possible to use the namespace
component without using the file replication component and it is perfectly possible to use the file
replication component without using the namespace component between servers.
Early iterations of DFS made use of Microsoft’s File Replication Service (FRS), which allowed for
straightforward file replication between servers. FRS recognises new or updated files and distributes
the most recent version of the whole file to all servers.
Windows Server 2003 R2 introduced “DFS Replication” (DFSR). It improves on FRS by copying only the
portions of files that have changed and by minimising network traffic with data compression.
Additionally, it provides users with flexible configuration options to manage network traffic on a
configurable schedule.
Desirable features of a DFS:
• Transparency:
o Structure transparency – The client does not need to know the number or locations of
the file servers and storage devices. Multiple file servers should be provided for
performance, adaptability, and dependability.
o Access transparency – Both local and remote files should be accessible in the same
manner. The file system should automatically locate the accessed file and send it to
the client’s side.
o Naming transparency – The name of a file should give no hint of its location. Once a
name is given to a file, it should not change while the file is transferred from one
node to another.
• User mobility: The user can log in from any location. The system automatically brings the
user’s home directory to the node where the user logs in.
• Simplicity and ease of use: The user interface of a file system should be simple and the
number of commands needed to work with files should be minimal.
• High availability: A Distributed File System should be able to continue operations in case of
any partial failures like a link failure, a node failure, or a storage drive crash.
• Scalability: A DFS needs to enable scaling by adding new machines or combining two
networks to meet the scalability requirements over time. Meanwhile, it needs to ensure
availability and performance.
• High Reliability: The likelihood of data loss should be minimized to permissible levels in a
distributed file system. Users should not feel the need to make backup copies of their files in
order to handle the risk of data loss. Rather, a file system should create backup copies of key
files that can be used if the originals are lost. Many file systems employ stable storage as a
high-reliability strategy.
• Data Integrity: Multiple users share and access a file system. The integrity of data saved in a
shared file must be guaranteed by the file system. That is, concurrent access requests from
many users who are competing for access to the same file must be correctly synchronized
using a concurrency control method.
• Security: A distributed file system should be secure. It should safeguard the data from
unwanted & unauthorized access, and incorporate adequate data security mechanisms.
Unix: Network File System (NFS) is a distributed file system protocol originally developed by Sun
Microsystems (Sun) in 1984, allowing a user on a client computer to access files over a computer
network much like local storage is accessed.
Windows: The server component of the Distributed File System was initially introduced as an add-on
feature. It was added to Windows NT 4.0 Server and was known as “DFS 4.1”. Then later on it was
included as a standard component for all editions of Windows 2000 Server. Client-side support has
been included in Windows NT 4.0 and also in later versions of Windows.
Linux: Linux kernels 2.6.14 and later come with an SMB (Server Message Block) client in the VFS
(Virtual File System), known as “cifs” (Common Internet File System), which supports DFS.
Mac OS: Mac OS X 10.7 (Lion) and onwards supports DFS.
A DFS typically provides the following capabilities:
• File Transparency - users can access files without knowing where they are physically stored
on the network.
• Load Balancing - the file system can distribute file access requests across multiple computers
to improve performance and reliability.
• Data Replication - the file system can store copies of files on multiple computers to ensure
that the files are available even if one of the computers fails.
• Security - the file system can enforce access control policies to ensure that only authorized
users can access files.
• Scalability - the file system can support a large number of users and a large number of files.
• Concurrent Access - multiple users can access and modify the same file at the same time.
• Fault Tolerance - the file system can continue to operate even if one or more of its
components fail.
• Data Integrity - the file system can ensure that the data stored in the files is accurate and has
not been corrupted.
• File Migration - the file system can move files from one location to another without
interrupting access to the files.
• Data Consistency - changes made to a file by one user are immediately visible to all other
users.
• Support for different file types - the file system can support a wide range of file types,
including text files, image files, and video files.
Additional Information:
• NFS – NFS stands for Network File System. It is a client-server architecture that allows a
computer user to view, store, and update files remotely. The protocol of NFS is one of the
several distributed file system standards for Network-Attached Storage (NAS).
• CIFS – CIFS stands for Common Internet File System. CIFS is a dialect (implementation) of the
SMB (Server Message Block) protocol, designed by Microsoft.
• SMB – SMB stands for Server Message Block. It is a file-sharing protocol invented by IBM. The
SMB protocol was created to allow computers to perform read and write operations on files on
a remote host over a Local Area Network (LAN). The directories on the remote host that can be
accessed via SMB are called “shares”.
Working of DFS:
Advantages:
Disadvantages:
• In a Distributed File System, the nodes and the connections between them need to be secured,
which is very challenging; security is therefore a constant concern.
• Messages and data may be lost in the network while moving from one node to another.
• Database connectivity in a Distributed File System is complicated.
• Handling a database is also harder in a Distributed File System than in a single-user system.
• The network can become overloaded when all nodes try to send data at once.
One of the challenges of working with big data is that it is too big to manage on a single server
(irrespective of the storage capacity or computing power of the server). This is because it is not
feasible (economically as well as technically) to keep adding capacity to a single server. Instead, the
data needs to be distributed across multiple nodes in a cluster by scaling out, to make use of the
computing power of each node.
A distributed file system (DFS) enables businesses to manage the accessing of big data across
multiple clusters or nodes, allowing them to read big data quickly and perform multiple parallel reads
and writes.
• Transparent Local Access — Data is accessed as if it’s on a user’s own device or computer.
• Location Independence — Users may have no idea where file data physically resides.
• Massive Scaling — Teams can add as many machines as they want to a DFS to scale out.
• Fault Tolerance — A DFS will continue to operate even if some of its servers or disks fail
because machines are connected and the DFS can gracefully failover.
3.2 Google File System
Google File System (GFS or GoogleFS) is a scalable distributed file system (DFS) developed at Google
Inc., to meet the company’s growing data processing needs. GFS offers fault tolerance, dependability,
scalability, availability, and performance across the network and all connected nodes. GFS is made up
of a number of storage systems constructed from inexpensive commodity hardware parts.
GFS was originally designed for Google’s use case of searching and indexing the web. GFS addresses
the following concerns:
• Fault Tolerance: Google uses commodity machines because they are cheap and easy to
acquire, but the software behind GFS needs to be robust to handle failures of machines,
disks, and networks.
• Large Files: It’s assumed most files are large (i.e. ≥ 100 MB). Small files are supported, but
not optimized for.
• Optimized for Reads + Appends: The system is optimized for reading (specifically large
streaming reads) or appending because web crawling and indexing heavily relied on these
operations.
• High and Consistent Bandwidth: It’s acceptable to have slow operations now and then, but
the overall amount of data flowing through the system should be consistent. This aligns with
Google’s crawling and indexing purposes.
GFS manages two types of data namely File Metadata and File Data.
The GFS node cluster consists of a single master and several chunk servers that various client systems
regularly access. Chunk servers store data as Linux files on local disks. The stored data is split into
large (64 MB) chunks, each of which is replicated at least three times across the network. The larger
chunk size reduces network overhead.
GFS meets Google’s huge cluster requirements. Hierarchical directories with path names are used to
store files. The master is responsible for managing metadata, including the namespace, access control
information, and the mapping from files to chunks. The master communicates with each chunk server
through periodic heartbeat messages and keeps track of its status.
The largest GFS clusters comprise more than 1,000 nodes with over 300 TB of disk storage capacity,
available for constant access by hundreds of clients.
Components of GFS
A group of connected computers (also known as a cluster) is what constitutes GFS. There could be
hundreds or even thousands of computers in each cluster. There are three basic entities included in
any GFS cluster as follows:
1. GFS Clients: They are computer programs or applications which are used to request files.
Requests are made to access and modify already-existing files or add new files to the system.
2. GFS Master Server: It serves as the cluster’s coordinator. It preserves a record of the cluster’s
actions in an operation log and also keeps track of the metadata that describes the chunks. This
metadata tells the master which file each chunk belongs to and where it fits within that file.
3. GFS Chunk Servers: They are the GFS’s workhorses, responsible for storing and serving
chunks of data. They store file chunks of 64 MB each. Chunks are never routed through the master
server; instead, chunk servers deliver the requested chunks directly to the client. To ensure
reliability, GFS makes several copies of each chunk (three by default) and stores them on different
chunk servers.
Features of GFS
• Namespace management and locking (applying read or write lock at a node or file or
directory level)
• Fault tolerance
• Optimal network traffic - Reduced client-master interaction
• High availability
• Critical data replication
• Automatic and efficient data recovery
• High aggregate throughput.
Advantages of GFS
1. High accessibility - Data is still accessible even if a few nodes fail (because of the replication
component).
2. High throughput - Many nodes operate concurrently, so the aggregate throughput is very high.
3. Dependability - Corrupted data is detected and re-replicated from good copies, so data retrieval
is highly dependable.
Disadvantages of GFS
1. Not the best fit for small files; performs poorly as a general-purpose file system.
2. The master is a single point of failure: if the master fails, the whole service fails.
3. Performs well with sequential access (I/O) but has very limited support for random access.
4. Suitable mainly for data that is written once and later only read or appended.
Apache Nutch is a highly extensible and scalable open-source web crawler software project. Nutch is
coded entirely in the Java programming language. It stores data in a language-independent format. It
has a highly modular architecture, allowing developers to create plug-ins for parsing, data retrieval,
querying and clustering.
Apache Nutch was initiated or founded by Doug Cutting (creator of both Apache Lucene and Apache
Hadoop) and Mike Cafarella. They wrote the web crawler (also known as “fetcher” or "robot") from
scratch for the Apache Nutch project. In June 2003, they used it to crawl 100 million web pages.
In January, 2005, Nutch joined the Apache Incubator, from which it graduated to become a
subproject of Apache Lucene in June of that same year. Since April, 2010, Nutch has been considered
an independent, top level project of the Apache Software Foundation.
To meet the multi-machine processing needs of the crawl and index tasks, the Nutch project
implemented a MapReduce processing model and a distributed file system. These two were later
grouped under a subproject called Hadoop.
Hadoop is an open-source framework based on Java that manages the storage and processing of
large amounts of data for applications. Hadoop uses distributed storage and parallel processing to
handle big data and analytics jobs, breaking workloads down into smaller workloads that can be run
at the same time.
Four modules comprise the primary Hadoop framework and work collectively to form the Hadoop
ecosystem. These are 1) Hadoop Distributed File System (HDFS), 2) Yet Another Resource Negotiator
(YARN), 3) Hadoop Common, and 4) MapReduce. Hadoop Common is a core component of the
Apache Hadoop framework that provides the shared Java libraries and utilities needed by the other
Hadoop modules. The wider ecosystem builds on these modules with additional tools and applications
such as HBase, Hive, Apache Spark, Sqoop, and Flume.
The design of HDFS is based on the Google File System. It was originally built as infrastructure for the
Apache Nutch web search engine project but has since become a member of the Hadoop Ecosystem.
In the early years of the internet, web crawlers started to appear as a way for people to search for
information on web pages. During this era, search engines such as Yahoo and Google emerged.
Nutch, a web crawler, was developed to distribute its data and processing across multiple
computers simultaneously. Work on Nutch then moved to Yahoo, where its distributed storage and
processing parts were split out and grew into Apache Hadoop. Today, Hadoop is typically used for
batch processing, while Apache Spark is used for real-time and in-memory processing.
Nowadays, Hadoop's structure and framework are managed by the Apache Software Foundation,
a global community of software developers and contributors.
HDFS was born from this and is designed to replace hardware storage solutions with a better, more
efficient method - a virtual filing system. When it first came onto the scene, MapReduce was the only
distributed processing engine that could use HDFS. More recently, alternative Hadoop data services
components like HBase and Solr also utilize HDFS to store data.
What is Hadoop Distributed File System (HDFS) in the world of big data?
The term "big data" refers to all the data that's very huge in volume and very challenging to store,
process and analyze. HDFS can be used to store, process and analyze big data.
As we now know, Hadoop is a framework that works by using parallel processing and distributed
storage. This can be used to sort and store big data, as it can't be stored in traditional ways.
In fact, HDFS is one of the most commonly used systems for handling big data, and is used by companies
such as Netflix, Expedia, and British Airways.
HDFS provides the following core capabilities for organizing and handling big data:
1. Fault tolerance - HDFS has been designed to detect faults and recover quickly and automatically,
ensuring continuity and reliability.
2. Speed – Thanks to its cluster architecture, HDFS can sustain transfer rates of around 2 GB per second.
3. Access to more types of data (specifically, streaming data) - Because of its design to handle
large amounts of data it allows for high data throughput rates making it ideal to support
streaming data (through stream processing) and stored data (through batch processing).
5. Scalable - HDFS enables scaling of resources according to the size of the file system. HDFS
includes vertical and horizontal scalability mechanisms.
6. Data locality – In case of HDFS, the data resides in data nodes. There is no need for the data
to move to where the central processing facility or computational unit is. Computing process
happens where the data is. By shortening the distance between the data and the computing
process, HDFS decreases network congestion and improves performance.
7. Cost efficiency – The data is stored in inexpensive commodity hardware that can be virtual.
HDFS drastically reduces file system metadata and file system namespace data storage costs.
There is no license fee because HDFS is an open-source software.
8. Stores large amounts of data - HDFS can store data of all varieties and sizes. This includes
structured, semi-structured and unstructured data.
9. Flexible – Unlike traditional databases, there is no need to process collected data before
storing it. It is possible to store as much data as required and decide how to use it later.
The four primary Hadoop modules are described below.
1. Hadoop Distributed File System (HDFS): HDFS is the primary component of the Hadoop
ecosystem. It is a distributed file system in which individual Hadoop nodes operate on data
that resides in their local storage. This reduces network latency, providing high-throughput
access to application data. In addition, administrators don’t need to define schemas up front.
2. Yet Another Resource Negotiator (YARN): YARN performs scheduling and resource allocation
across the Hadoop system. It is a resource-management platform responsible for managing
compute resources in clusters and using them to schedule users’ applications.
3. MapReduce: MapReduce is a programming model for large-scale data processing. In the
MapReduce model, subsets of larger datasets and instructions for processing the subsets are
dispatched to multiple different nodes, where each subset is processed by a node in parallel
with other processing jobs. After processing, the results from the individual subsets are
combined (reduced) into a smaller, more manageable dataset.
4. Hadoop Common: Hadoop Common includes the libraries and utilities used and shared by
other Hadoop modules.
Beyond HDFS, YARN, and MapReduce, the entire Hadoop open-source ecosystem continues to grow
and includes many tools and applications to help collect, store, process, analyze, and manage big
data. These include Apache Pig, Apache Hive, Apache HBase, Apache Spark, Presto, and Apache
Zeppelin.
Hadoop allows for the distribution of datasets across a cluster of commodity hardware. Processing is
performed in parallel on multiple servers simultaneously.
Software clients input data into Hadoop. HDFS handles metadata and the distributed file system.
MapReduce then processes and converts the data. YARN divides the jobs across the computing
cluster.
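As a small illustration of this flow, the sketch below runs the stock WordCount example that ships with
Hadoop. It assumes a working installation with HADOOP_HOME set; the local ./books directory and the
HDFS paths /input and /output are hypothetical.
# Copy some local text files into HDFS (illustrative paths)
hdfs dfs -mkdir -p /input
hdfs dfs -put ./books/*.txt /input
# Submit a MapReduce job; YARN schedules it across the cluster
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
# Inspect the combined (reduced) result stored back in HDFS
hdfs dfs -cat /output/part-r-00000 | head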
All Hadoop modules are designed with a fundamental assumption that hardware failures of
individual machines or racks of machines are common and should be automatically handled in
software by the framework.
A cluster is a collection of interconnected machines. A simple computer cluster is a group of
computers connected to each other through a LAN (Local Area Network). The nodes in a cluster share the
data pertaining to a common task, and all nodes in the cluster work together as a single unit.
1. Scalability: Hadoop clusters allow scaling up and scaling down the number of nodes by adding or
removing commodity hardware. This ability to scale the number of servers in the Hadoop cluster up or
down enables Hadoop to store and process different volumes of data on a need basis.
2. Flexibility: A Hadoop cluster is flexible in that it can handle data irrespective of its type and
structure. With this property, Hadoop can process any type of data from online web platforms.
3. Speed: Hadoop clusters are very efficient in processing data distributed among the clusters. This is
because of the architecture of MapReduce that ensures speed of processing.
4. No data loss: The risk of losing data from any node in a Hadoop cluster is very low, because
Hadoop clusters replicate data. The failure of a single node does not result in loss of data because,
with the default replication factor of 3, the same data is also stored as a backup on two other nodes
(a sketch of inspecting and changing the replication factor follows this list).
5. Economical: Hadoop clusters are cost-efficient and economical. This is because Hadoop uses
commodity hardware and the data is distributed in the cluster among all the nodes. To increase
storage, we only need to add more commodity hardware.
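A minimal sketch of inspecting and tuning replication, assuming a running cluster and a hypothetical
file /data/sample.csv already stored in HDFS:
# Show how the file's blocks are replicated across DataNodes
hdfs fsck /data/sample.csv -files -blocks -locations
# Set the replication factor of the file to 3 (-w waits until replication completes)
hdfs dfs -setrep -w 3 /data/sample.csv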
There are two types of Hadoop clusters – 1) Single Node Hadoop Cluster and 2) Multiple Node
Hadoop Cluster.
1. Single Node Hadoop Cluster: Single Node Hadoop Cluster has a single node. All Hadoop Daemons
such as Name Node, Data Node, Secondary Name Node, Resource Manager, and Node Manager run
on the same node. All corresponding Hadoop processes are handled by a single JVM (Java Virtual
Machine) instance.
2. Multiple Node Hadoop Cluster: Multiple Node Hadoop Cluster contains multiple nodes. In this
kind of cluster, Hadoop Daemons can run on different nodes in the same cluster setup. In a multiple
node Hadoop cluster, it is general practice to use nodes with higher processing capability for the
master daemons, i.e. the NameNode and ResourceManager, and nodes with lower processing capability for
the slave daemons, i.e. the NodeManager and DataNode.
Source: https://www.geeksforgeeks.org/difference-between-rdbms-and-hadoop/
RDBMS (Relational Database Management System): RDBMS is a data management system, which is
based on a data model. RDBMS uses tables for data storage. Each row of the table represents a
record and column represents an attribute of data. The objective of RDBMS is to store, manage, and
retrieve structured data as quickly and reliably as possible. The way data is organized and
manipulated in an RDBMS differs from the way data is organized and manipulated in other types of
databases. Transactions in an RDBMS need to adhere to the ACID (atomicity, consistency, isolation,
durability) properties.
Hadoop: It is an open-source software framework used for storing big data. Data processing
applications run on a group of commodity hardware. Hadoop offers capabilities to ensure large
storage capacity, scalability and high processing power. It can manage multiple concurrent processes.
It is used in processing big data for predictive analysis, data mining and machine learning. It can
handle structured, semi-structured and unstructured data. It is more flexible than RDBMS in storing,
processing, and managing data.
RDBMS vs Hadoop:
1. RDBMS: Traditional row-column based databases, basically used for data storage, manipulation and
retrieval. Hadoop: An open-source software framework used for storing big data and running applications
or processes concurrently.
3. RDBMS: Best suited for an OLTP environment. Hadoop: Best suited for a Big Data environment.
6. RDBMS: SQL is used to store and retrieve data. Hadoop: Both SQL and NoSQL are used.
9. RDBMS: The data schema is static. Hadoop: The data schema is dynamic.
10. RDBMS: A high level of data integrity is essential. Hadoop: Data integrity can be at a lower level.
11. RDBMS: Cost applies for the licensed software. Hadoop: Free of cost, as it is open-source software.
3.6 Data integrity in Hadoop
Data integrity refers to the accuracy, completeness, and quality of data. Preserving the integrity of
data is an essential and continuous process.
Data integrity is a broad discipline that influences how data is collected, stored, accessed and used.
The idea of integrity is a central element of many regulatory compliance frameworks, such as the
General Data Protection Regulation (GDPR).
Data integrity is preserved in data that's kept complete, accurate, consistent and safe throughout its
entire lifecycle. Data that has data integrity needs to have the following characteristics.
Completeness – To ensure completeness, data is maintained in its full form and no data elements are
filtered, truncated or lost. For example, if 100 students attend an entrance examination, the
examination results data needs to reflect the results of all 100 students. The results of students who
failed or scored low marks should not be omitted or removed from the results.
Accuracy - Data should not be altered or aggregated in any way that affects the ability to perform
data analytics. For example, examination results should not be rounded up or down at random. Any
criteria or special conditions need to be well-documented and understood. Repeating the same type
of analysis at different points in time should return the same results.
Consistency – Data is consistent when it remains unchanged regardless of how, or how often, it's
accessed and no matter how long it's stored. For example, data accessed a year from now will have
to be the same data that's generated or accessed today. The same holds good for data accessed
through different systems or devices.
Safety - Data should be maintained in a secure manner and can only be accessed and used by
authorized applications or individuals. Further, data should be protected from malicious actors or
hackers. Data security involves considerations such as authentication, authorization, encryption,
backup or other data protection, and access logging.
Data Corruption & Data Integrity - Data corruption can be caused by hardware failures, human error,
a malicious action or failure of data security. Data corruption occurs when any unwanted or
unexpected changes to data take place during storage, access or processing. All such occurrences
can lead to failure or loss of data integrity.
1) Missing or inaccurate data might result in incorrect analysis and ineffective business
decisions or delayed actions.
2) Any issue in data integrity can lead to mishandling of customer requests or customer
interactions. This is because, data integrity ensures that customers are serviced
correctly (E.g., through timely complaint or issue resolution and reporting). Sensitive
data of customers must be protected from data loss or theft.
3) According to certain legal and other compliance requirements, businesses are
required to retain data for a period of time (say five years or ten years) to ensure
that business processes are followed in accordance with prevailing industry
standards and government regulations. Data integrity is vital for complete, accurate
and consistent reporting for all compliance purposes; otherwise, the business may
be out of compliance and subject to huge penalties and other legal actions.
Types of Data Integrity:
Data integrity falls into two broad categories – Physical Integrity and Logical Integrity.
Physical integrity - Physical integrity relates to the storing and retrieving of data -- primarily in
relation to the storage devices, memory components and any associated hardware. For example, if a
hard drive or memory device is damaged, then the stored data is affected as well. Events such as power
failures, hardware faults, natural disasters and physical damage can affect data integrity by damaging
the physical storage hardware.
Organizations can enhance the physical integrity of data by implementing hardware infrastructure,
including redundant storage subsystems such as RAID, with battery-protected write cache, using
advanced error-correcting memory devices, implementing clustered and distributed file systems, and
using error-detecting algorithms to detect data changes in transit. Organizations often adopt a
variety of hardware devices and techniques to enhance data's physical integrity.
Logical integrity - Logical integrity can be affected by poor software design, software bugs, or human
error. There are four major types of logical integrity:
1. Entity integrity - This ensures that no data element is repeated and that no critical data entry
is blank or null. This is a common logical integrity consideration in relational
database systems. For example, there can be only one student with a specific student id, and
student name cannot be null or blank.
2. Referential integrity - These rules ensure that relationships between data in different tables
remain valid, so that only authorized and consistent changes, additions or deletions can occur.
For example, one cannot delete a department from the department table while there are one or
more employees working in that department.
3. Domain integrity - This reflects the format, type, amount and value range or scope of
acceptable data values within a database. For example, marks scored by a student in an
examination are supposed to be numerical. A non-numeric value is rejected because it violates
domain integrity. Such integrity checks can be pre-defined in relational databases when we
create tables.
4. User-defined integrity - These are additional rules and constraints that are implemented in
accordance with the organization's specific needs and aren't otherwise covered by the first
three integrity types. For example, the marks scored by a student have to be between 0 and 100.
Such integrity checks or validations are performed at application level.
An issue in physical integrity can impact logical integrity. For example, a null data stream captured
from an IoT sensor might violate logical or entity integrity when we attempt to store the data in a
database table. In that case, the root cause of the null data can be traced to a failed or
defective IoT sensor.
Data Integrity Risks
• Human errors - Data may be accidentally deleted, entered or altered inaccurately -- such as a
wrong customer address -- or left incomplete.
• Transfer errors - Data may be damaged or lost -- or even stolen -- in transit between two
systems, such as a network failure or incorrect storage destination, or between two physical
locations, such as transporting a storage device filled with data to another location.
• Malicious acts - Malware, hacking and other cyber threats can result in a loss of both data
integrity and data security.
Proper, well-documented and strongly enforced configuration standards are essential in all data
integrity and data security situations.
The consequences of data integrity loss can range from a minor annoyance to a major business
catastrophe -- depending on the amount of loss and the nature of the data involved. Business and
technology leaders invest considerable time and resources to understand and prevent data integrity
loss.
Data integrity is a broad area that involves people, processes, rules and various tools to provide
guardrails and support. While there's no single universal solution for data integrity, there are
numerous tactics that can help to build an environment that supports data integrity. Common tactics
include the following:
2. Establish an integrity culture - Data integrity is more meaningful when business leaders and
managers care about it as a business goal. Collaboration and top-down buy-in to integrity
concepts can drive better data integrity efforts.
3. Validate the data – Data needs to be checked or validated. This can be especially important
when data is acquired from third-party sources. For example, if a business is running
analytics on several different data sets, it is worth checking that all the data sets show
consistent – or at least sensible -- data.
4. Process data sensibly - Check and remove duplicate entries in data sets and ensure that any
data pre-processing -- such as data normalization or aggregation -- doesn't affect the base
data.
5. Protect data - Regular data backups can help to ensure data integrity by copying data to a
second location. This ensures that data is always available in the event of hardware,
infrastructure or security violations.
6. Implement strong security - Ensure that data access is protected using strong authentication
and authorization controls. Log all data access and retain logs to audit activity. Encryption for
data at rest and in-flight can help to protect it from unauthorized access or theft.
The terms integrity, security and quality are sometimes improperly used as interchangeable terms.
Although the three ideas are closely related, they possess unique attributes that distinguish them
from their companion terms.
Data quality refers to the reliability of data. Good quality data must be accurate, complete, unique
with no duplicates and timely enough to be useful.
Data security is the infrastructure, tools and rules used to ensure that only authorized applications
and users can access data; that the data is used in a business-compliant manner; and that data is
preserved or backed up against loss, theft or malfeasance.
Data integrity then provides a broader umbrella that embraces aspects of data quality and security,
ensuring proper retention, appropriate destruction, and adequate compliance with relevant industry
and government regulations.
With reference to Hadoop, data integrity is about making sure that no data is lost or corrupted
during storage or processing of big data.
The occurrence of data corruption is highly likely in Hadoop because the amount of data being
written or read is very large in volume.
To detect and resolve data corruption, a checksum is computed when data is written to disk for the
first time and checked again when the data is read back from disk. A checksum is a small block of data
derived from another block of digital data, used to detect errors that may have been introduced
during transmission or storage.
HDFS computes checksums for each data block and stores them in a separate hidden file in the same
HDFS namespace. HDFS uses the 32-bit Cyclic Redundancy Check (CRC-32) as the default checksum
algorithm because it is only 4 bytes long and adds less than 1% storage overhead.
If the newly computed checksum matches the stored checksum, the data is not corrupted; otherwise, the
data is considered corrupted. It is possible that it is the checksum, rather than the data, that is
corrupt, but this is very unlikely because the checksum is much smaller than the data.
DataNodes are responsible for verifying the data they receive before storing the data and its
checksum. A checksum is computed for the data they receive from clients and from other DataNodes
during replication.
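As a small, hedged illustration (the file /data/sample.csv is a hypothetical path assumed to exist in
HDFS), the checksum that HDFS maintains for a file can be displayed from the command line:
# Ask HDFS for the stored checksum of a file (derived from its per-block CRC checksums)
hdfs dfs -checksum /data/sample.csv
# Download a copy, upload it under a new name, and compare; with the same block size and
# checksum settings, identical contents should normally print identical checksums
hdfs dfs -get /data/sample.csv ./sample_copy.csv
hdfs dfs -put ./sample_copy.csv /data/sample_copy.csv
hdfs dfs -checksum /data/sample_copy.csv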
Hadoop can heal corrupted data by copying one of the good replicas to produce a new, uncorrupted
replica. This is possible because HDFS keeps multiple replicas of each block.
HDFS uses NameNodes and DataNodes. NameNode holds the meta data and DataNodes hold the
actual data. We will learn more about NameNode and DataNode in the next unit.
When a client detects an error while reading a block, it reports the bad block and the DataNodes it
was trying to read from. This information is reported to the NameNode before throwing a Checksum
Exception. The NameNode marks the block replica as corrupt so that it does not direct any more clients
to it or copy this replica to other DataNodes. The NameNode then schedules a copy of a good replica of
the block to another DataNode, to make sure that the replication factor is maintained at the expected
level.
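A minimal sketch, assuming administrator access to the cluster, of how corrupt or missing blocks can be
inspected from the command line:
# Summarise the health of the file system, including under-replicated and corrupt blocks
hdfs fsck /
# List only the files that currently have corrupt blocks
hdfs fsck / -list-corruptfileblocks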
To use HDFS you need to install and set up a Hadoop cluster. This can be a single node set up which is
more appropriate for first-time users, or a cluster set up for large, distributed clusters. You then need
to familiarize yourself with HDFS commands, such as the below, to operate and manage your system.
Command Description
-rm Removes file or directory
-ls Lists files with permissions and other details
-mkdir Creates a directory named path in HDFS
-cat Shows contents of the file
-rmdir Deletes a directory
-put Uploads a file or folder from a local disk to HDFS
-rmr Deletes the file identified by path or folder and subfolders
-get Copies a file or folder from HDFS to the local file system
-count Counts number of files, number of directories, and file size
-df Shows free space
-getmerge Merges multiple files in HDFS into a single local file
-chmod Changes file permissions
-copyToLocal Copies files to the local system
-stat Prints statistics about the file or directory
-head Displays the first kilobyte of a file
-usage Returns the help for an individual command
-chown Changes the owner and group of a file
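A short, hedged usage sketch of a few of these options. The paths and file names are hypothetical, and
the older "hadoop fs" prefix generally works in place of "hdfs dfs":
hdfs dfs -mkdir /user/yourUserName/demo                          # create a directory
hdfs dfs -put localfile.txt /user/yourUserName/demo              # upload a local file
hdfs dfs -ls /user/yourUserName/demo                             # list it with permissions
hdfs dfs -cat /user/yourUserName/demo/localfile.txt              # print its contents
hdfs dfs -get /user/yourUserName/demo/localfile.txt ./copy.txt   # download a copy
hdfs dfs -df -h /                                                # show free space, human readable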
How does HDFS work?
HDFS uses NameNodes and DataNodes. NameNode holds the meta data and DataNodes hold the
actual data. HDFS allows the quick transfer of data between computers or nodes. When HDFS takes
in data, it's able to break down the information into blocks, distributing them to different nodes in a
cluster.
Data is broken down into blocks and distributed among the DataNodes for storage; these blocks are also
replicated across nodes, which allows for efficient parallel processing. You can access, move
around, and view data through various commands. HDFS DFS options such as "-get" and "-put" allow
you to retrieve and move data around as necessary.
HDFS is designed to be highly alert and can detect faults quickly. The file system uses data replication
to ensure every piece of data is saved multiple times and then assigns it across individual nodes,
ensuring at least one copy is on a different rack than the other copies.
When a DataNode stops sending heartbeat signals to the NameNode, the NameNode removes the DataNode from
the cluster and operates without it. If that DataNode later comes back online, it can be allocated to
the cluster again. Plus, since the data blocks are replicated across several DataNodes, removing one
will not lead to file corruption of any kind.
Installation of HDFS
There are two types of clusters - single-node and multi-node. Depending on what you require, you
can either use a single-node or multi-node cluster.
A single node cluster means only one DataNode is running. It will include the NameNode, DataNode,
resource manager, and node manager on one machine.
For some industries, a single node cluster is good enough. For example, in the medical field, if you're
conducting studies and need to collect, sort, and process data in a sequence for specific research
that requires limited storage, you can use a single-node cluster. This can easily handle the data on a
smaller scale, as compared to data distributed across many hundreds of machines. To install a single-
node cluster, follow these steps:
1. Download the Java 8 Package. Save this file in your home directory.
5. Add the Hadoop and Java paths in the bash file (.bashrc).
10. Edit yarn-site.xml and set the required property (an illustrative value is included in the sketch below).
When you complete all these, you should now have a successfully installed HDFS.
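As an illustration of step 5, the entries added to ~/.bashrc typically look like the following. The
exact paths depend on where Java and Hadoop were extracted, so these values are assumptions:
# Assumed install locations; adjust to your own environment
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# For step 10, yarn-site.xml is usually given the property
#   yarn.nodemanager.aux-services = mapreduce_shuffle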
To access HDFS files you can download them from HDFS to your local file system. You can also access
HDFS using its web user interface. Simply open your browser and type "localhost:50070" (on Hadoop 3.x
and later the NameNode web UI defaults to "localhost:9870") into the address bar. From there, you can
see the web user interface of HDFS and move to the utilities tab on the right-hand side. Then click
"browse file system"; this shows you a full list of the files located on your HDFS.
Example A
To delete a directory, you can use the "-rmdir" option (note: this only works if the directory is empty).
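A minimal sketch, assuming a hypothetical empty directory named testDir under your HDFS home directory:
hdfs dfs -rmdir testDir
# or, using the older file system shell prefix
hadoop fs -rmdir testDir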
Example B
When you have multiple files in an HDFS directory, you can use the "-getmerge" command. This merges
the files into a single file, which it writes to your local file system.
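A sketch, assuming a hypothetical HDFS directory logs/ containing several part files to be combined:
hdfs dfs -getmerge logs/ ./merged-logs.txt
# or
hadoop fs -getmerge logs/ ./merged-logs.txt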
Example C
When you want to upload a file from your local file system to HDFS, you can use the "-put" command.
You specify the local file you want to copy and the HDFS destination you want to copy it to.
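A sketch, assuming a hypothetical local file report.txt and your HDFS home directory as the destination:
hdfs dfs -put report.txt /user/yourUserName/
# or
hadoop fs -put report.txt /user/yourUserName/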
Example D
The count command is used to report the number of directories, number of files, and content size under
a given path in HDFS.
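A sketch, counting everything under your (assumed) HDFS home directory:
hdfs dfs -count /user/yourUserName
# or, with human-readable sizes
hadoop fs -count -h /user/yourUserName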
Example E
The "chown" command can be used to change the owner and group of a file. To activate this, use the
below:
Or
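A sketch, assuming a hypothetical file report.txt and a new owner and group newuser:newgroup
(superuser rights are normally required to change ownership):
hdfs dfs -chown newuser:newgroup /user/yourUserName/report.txt
# or
hadoop fs -chown newuser:newgroup /user/yourUserName/report.txt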
Your HDFS home directory should be /user/yourUserName. To view the contents of your HDFS home
directory, use the "-ls" option. As you're just getting started, you won't see anything listed at this
stage. To view the contents of a non-empty directory, list a higher-level path such as /user; you can
then see the names of the home directories of all the other Hadoop users. You can now create a test
directory called 'testHDFS' with the "-mkdir" option and verify that it exists by listing your home
directory again.
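A minimal sketch of this sequence (paths are relative to the assumed home directory /user/yourUserName):
hdfs dfs -ls                      # list your home directory (empty at first)
hdfs dfs -ls /user                # list the home directories of all Hadoop users
hdfs dfs -mkdir testHDFS          # create the test directory
hdfs dfs -ls                      # verify that testHDFS now exists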
Copying a file
To copy a file from your local file system to HDFS, first create a small file you wish to copy (for
example, testFile) and check its contents with:
cat testFile
You will then need to copy the file to HDFS. To copy files from Linux into HDFS, use the
"-copyFromLocal" command; notice that you cannot use "-cp" here, because "-cp" is used to copy files
within HDFS.
Now you just need to confirm that the file has been copied over correctly, by listing your HDFS home
directory.
When you copied testFile, it was put into your HDFS home directory. Now you can move it into the
testHDFS directory you've already created using the "-mv" command, followed by two listings: the first
command moves testFile from the HDFS home directory into testHDFS, the second shows that it is no
longer in the HDFS home directory, and the third confirms that it has been moved into the testHDFS
directory.
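A hedged sketch of the whole sequence (the file contents and yourUserName are illustrative):
echo "This is a test file" > testFile              # create a small local file
cat testFile                                       # check its contents
hdfs dfs -copyFromLocal testFile /user/yourUserName/   # copy it from Linux into HDFS
hdfs dfs -ls                                       # confirm it arrived in the home directory
hdfs dfs -mv testFile testHDFS/                    # move it into testHDFS
hdfs dfs -ls                                       # no longer in the home directory
hdfs dfs -ls testHDFS/                             # now inside testHDFS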
Checking disk usage
The "-du" option shows how much space you are using in your HDFS home directory, while "-df" shows how
much space is used and available across the whole cluster.
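A sketch (the -h flag prints human-readable sizes; the home path is an assumption):
hdfs dfs -du -h /user/yourUserName    # space used by the contents of your home directory
hdfs dfs -df -h /                     # space used and available across the whole cluster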
Removing a file/directory
To remove a file, use the "-rm" option and then list the testHDFS directory to confirm; this will show
that you still have the testHDFS directory and testFile2. If you then try "-rmdir" on testHDFS, you
will see "rmdir: testHDFS: Directory is not empty"; the directory needs to be empty before it can be
deleted. You can use the "-rm -r" command to bypass this and do a recursive delete.
----