Big Data Open Source Implementation and Administration
Hours: 40
Instructor: Ing. Yonogy Curi
1 Introduction
Objectives 1-2
Questions About You 1-3
Course Objectives 1-4
Course Road Map: Module 1 Big Data Management System 1-5
Course Road Map: Module 2 Data Acquisition and Storage 1-6
Course Road Map: Module 3 Data Access and Processing 1-7
Course Road Map: Module 4 Data Unification and Analysis 1-8
The Big Data Virtual Machine (Used in this Course) Home Page 1-10
Connecting to the Practice Environment 1-11
Starting the Big Data Virtual Machine (VM) Used in this Course 1-12
Starting the Big Data (BDLite) Virtual Machine (VM) Used in this Course 1-13
Accessing the Getting Started Page from the BDVM 1-14
Big Data Appliance Documentation 1-19
2 Big Data and the Information Management System
Lesson Objectives 2-3
Big Data: A Strategic IM Perspective 2-4
Big Data 2-5
Characteristics of Big Data 2-6
Importance of Big Data 2-8
Big Data Opportunities: Some Examples 2-9
Big Data Challenges 2-10
Information Management Landscape 2-12
Extending the Boundaries of Information Management 2-13
A Simple Functional Model for Big Data 2-14
Information Management Conceptual Architecture 2-16
Design Patterns to Component Usage Map 2-18
Big Data Adoption and Implementation Patterns 2-20
IM Architecture Data Approaches: Schema-on-Write vs Schema-on-Read 2-22
Course Approach: Big Data Project Phases 2-24
IM System for Big Data 2-26
Additional Resources 2-30
Summary 2-31
3 Using the Big Data Virtual Machine
Objectives 3-3
Lesson Agenda 3-4
Big Data Virtual Machine: Introduction 3-5
Big Data VM Components 3-6
Initializing the Environment for the Big Data VM 3-7
Initializing the Environment 3-8
Lesson Agenda 3-9
MoviePlex Case Study: Introduction 3-10
Big Data Challenge 3-12
Derive Value from Big Data 3-13
MoviePlex: Goal 3-14
MoviePlex: Big Data Challenges 3-15
MoviePlex: Architecture 3-16
MoviePlex: Data Generation 3-17
MoviePlex: Data Generation Format 3-18
MoviePlex Application 3-19
Summary 3-20
4 Introduction to the Big Data Ecosystem
Objectives 4-3
Computer Clusters 4-4
Distributed Computing 4-5
Apache Hadoop 4-6
Types of Analysis That Use Hadoop 4-7
Apache Hadoop Ecosystem 4-8
Apache Hadoop Core Components 4-9
HDFS Key Definitions 4-11
NameNode (NN) & DataNodes 4-12
MapReduce Framework 4-14
Benefits of MapReduce 4-15
MapReduce Job 4-16
MapReduce Versions 4-19
Choosing a Hadoop Distribution and Version 4-20
Additional Resources: Apache Hadoop 4-22
Cloudera’s Distribution Including Apache Hadoop (CDH) 4-23
CDH Architecture 4-24
CDH Components 4-25
CDH Architecture 4-26
CDH Components 4-28
Summary 4-30
5 Introduction to the Hadoop Distributed File System (HDFS)
Objectives 5-3
HDFS: Characteristics 5-5
HDFS Deployments: High Availability (HA) and Non-HA 5-7
HDFS Key Definitions 5-8
Functions of the NameNode 5-10
Secondary NameNode (Non-HA) 5-11
Functions of DataNodes 5-13
NameNode and Secondary NameNodes 5-14
Storing and Accessing Data Files in HDFS 5-15
HDFS Architecture: HA 5-17
Configuring an HA Cluster: Hardware Resources 5-19
Data Replication Process 5-26
Accessing HDFS 5-27
HDFS Commands 5-29
Shell Interface 5-30
Accessing HDFS 5-32
FS Shell Commands 5-33
Sample FS Shell Commands 5-35
HDFS Administration Commands 5-38
Using the hdfs fsck Command: Example 5-39
HDFS Features and Benefits 5-40
Summary 5-41
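
Lesson 5's FS shell material (for example, hdfs dfs -ls) maps directly onto Hadoop's Java FileSystem API. The minimal sketch below lists a directory; the /user/oracle path is a placeholder for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Java equivalent of "hdfs dfs -ls /user/oracle"; the path is a placeholder.
    public class HdfsList {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads core-site.xml/hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);       // connects to the default FS (HDFS)
            for (FileStatus status : fs.listStatus(new Path("/user/oracle"))) {
                System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
            }
            fs.close();
        }
    }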
6 Acquire Data Using CLI, Fuse DFS, and Flume
Objectives 6-3
Reviewing the Command Line Interface (CLI) 6-4
Viewing File System Contents Using the CLI 6-5
Loading Data Using the CLI 6-6
What Is Fuse DFS? 6-7
Enabling Fuse DFS on Big Data 6-8
Using Fuse DFS 6-9
What Is Flume? 6-10
Flume: Architecture 6-11
Flume Sources (Consume Events) 6-12
Flume Channels (Hold Events) 6-13
Flume Sinks (Deliver Events) 6-14
Configuring Flume 6-16
Exploring a flume*.conf File 6-17
Additional Resources 6-18
Summary 6-19
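
Lesson 6's CLI load (hdfs dfs -put) also has a one-call Java equivalent; Flume itself is configured declaratively through a flume*.conf file rather than code. A minimal sketch, with both file paths as placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Java equivalent of "hdfs dfs -put"; source and target paths are placeholders.
    public class HdfsPut {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            fs.copyFromLocalFile(new Path("file:///tmp/movieapp.log"),
                                 new Path("/user/oracle/movielogs/movieapp.log"));
            fs.close();
        }
    }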
7 Acquire and Access Data Using NoSQL Database
Objectives 7-3
What Is a NoSQL Database? 7-4
RDBMS Compared to NoSQL 7-5
HDFS Compared to NoSQL 7-6
NoSQL Database 7-7
Points to Consider Before Choosing NoSQL 7-8
NoSQL Key-Value Data Model 7-9
Acquiring and Accessing Data in a NoSQL DB 7-11
Primary (Parent) Table Data Model 7-12
Table Data Model: Child Tables 7-13
Creating Tables 7-14
Creating Tables: Two Options 7-15
Data Definition Language (DDL) Commands 7-16
CREATE TABLE 7-17
Accessing the CLI 7-19
Executing a DDL Command 7-20
Viewing Table Descriptions 7-21
Recommendation: Using Scripts 7-22
Loading Data Into Tables 7-23
Accessing the KVStore 7-24
Introducing the TableAPI 7-25
Write Operations: put() Methods 7-26
Writing Rows to Tables: Steps 7-27
Constructing a Handle 7-28
Creating Row Object, Adding Fields, and Writing Record 7-29
Reading Data from Tables 7-30
Read Operations: get() Methods 7-31
Retrieving Table Data: Steps 7-32
Retrieving a Single Row 7-33
Retrieving Multiple Rows 7-34
Retrieving Child Tables 7-35
Removing Data From Tables 7-36
Delete Operations: Three TableAPI Methods 7-37
Deleting Row(s) From a Table: Steps 7-38
Additional Resources 7-39
Summary 7-40
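
Lesson 7's write/read flow (construct a store handle, create a Row, put(), then get()) looks roughly like the Java sketch below. The store name, host:port, and the users table (id INTEGER primary key, name STRING) are assumptions for illustration:

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.table.PrimaryKey;
    import oracle.kv.table.Row;
    import oracle.kv.table.Table;
    import oracle.kv.table.TableAPI;

    // Sketch of the TableAPI flow; store name, host:port, and the "users"
    // table definition are hypothetical.
    public class TableApiDemo {
        public static void main(String[] args) {
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("kvstore", "localhost:5000")); // store handle
            TableAPI tableApi = store.getTableAPI();
            Table users = tableApi.getTable("users");

            Row row = users.createRow();      // create Row, add fields, write record
            row.put("id", 1);
            row.put("name", "lucy");
            tableApi.put(row, null, null);    // default ReturnRow/WriteOptions

            PrimaryKey key = users.createPrimaryKey();
            key.put("id", 1);
            Row fetched = tableApi.get(key, null);   // read the row back
            System.out.println(fetched.get("name"));

            store.close();
        }
    }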
8 Primary Administrative Tasks for NoSQL Database
Objectives 8-3
Installation Planning: KVStore Analysis 8-4
InitialCapacityPlanning Spreadsheet 8-5
Planning Spreadsheet Sections 8-6
Next Topic 8-7
Configuration Requirements 8-8
Determine the Number of Shards 8-9
Determine the Number of Partitions and Replication Factor 8-10
Determine the Number of Storage Nodes 8-11
Installation and Configuration Steps 8-12
Step 1: Creating Directories 8-13
Step 2: Extracting Software 8-14
Step 3: Verifying the Installation 8-15
Step 4: Configuring Nodes (Using the makebootconfig Utility) 8-16
Using the makebootconfig Utility 8-18
Starting the Storage Node Agents 8-19
Pinging the Replication Nodes 8-20
Next Topic 8-21
Configuration and Monitoring Tools 8-22
Steps to Deploy a KVStore 8-23
Introducing Plans 8-24
States of a Plan 8-25
Starting the Configuration Tool 8-26
Configuring KVStore 8-27
Creating a Zone 8-28
Deploying Storage and Admin Nodes 8-29
Creating a Storage Pool 8-30
Joining Nodes to the Storage Pool 8-31
Creating a Topology 8-32
Deploying the KVStore 8-33
Testing the KVStore 8-34
Additional Resources 8-35
Summary 8-36
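
Step 4 of Lesson 8 configures each storage node with the makebootconfig utility, which is invoked through kvstore.jar. A minimal single-node sketch that shells out to it from Java follows; the jar path, KVROOT, host name, and port range are placeholders:

    // Hypothetical single-node bootstrap; the jar location, root directory,
    // host, and ports are placeholders for illustration.
    public class BootstrapStorageNode {
        public static void main(String[] args) throws Exception {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-jar", "/u01/nosql/kv/lib/kvstore.jar", "makebootconfig",
                    "-root", "/u02/kvroot",
                    "-host", "node01",
                    "-port", "5000",
                    "-harange", "5010,5020");
            pb.inheritIO();                      // show the utility's output
            System.exit(pb.start().waitFor());
        }
    }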
9 Introduction to MapReduce
Objectives 9-3
MapReduce 9-4
MapReduce Architecture 9-5
MapReduce Version 1 (MRv1) Architecture 9-6
MapReduce Phases 9-7
MapReduce Framework 9-8
Parallel Processing with MapReduce 9-9
MapReduce Jobs 9-10
Interacting with MapReduce 9-11
MapReduce Processing 9-12
MapReduce (MRv1) Daemons 9-13
Hadoop Basic Cluster (MRv1): Example 9-14
MapReduce Application Workflow 9-15
Data Locality Optimization in Hadoop 9-17
MapReduce Mechanics: Deck of Cards Example 9-18
MapReduce Mechanics Example: Assumptions 9-19
MapReduce Mechanics: The Map Phase 9-20
MapReduce Mechanics: The Shuffle and Sort Phase 9-21
MapReduce Mechanics: The Reduce Phase 9-22
Word Count Process: Example 9-23
Submitting a MapReduce Job 9-24
Summary 9-25
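
Lesson 9's word-count walkthrough (slide 9-23) corresponds to the classic Hadoop Java job below: map emits (word, 1) pairs, the framework shuffles and sorts by key, and reduce sums the counts. Input and output paths come in as job arguments:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);       // map phase: (word, 1)
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();               // reduce phase: sum per word
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }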
10 Resource Management Using YARN
Objectives 10-3
Agenda 10-4
Apache Hadoop YARN: Overview 10-5
MapReduce 2.0 or YARN Architecture 10-7
MapReduce 2.0 (MRv2) or YARN Daemons 10-8
Hadoop Basic Cluster YARN (MRv2): Example 10-9
YARN Versus MRv1 Architecture 10-10
YARN (MRv2) Architecture 10-11
MapReduce 2.0 (MRv2) or YARN Daemons 10-13
YARN (MRv2) Daemons 10-14
YARN: Features 10-15
Launching an Application on a YARN Cluster 10-16
MRv1 Versus MRv2 10-18
Job Scheduling in YARN 10-20
YARN Fair Scheduler 10-21
Cloudera Manager Resource Management Features 10-23
Static Service Pools 10-25
Working with the Fair Scheduler 10-26
Cloudera Manager Dynamic Resource Management: Example 10-27
Submitting a Job to hrpool by User lucy from the hr Group 10-33
Monitoring the Status of the Submitted MapReduce Job 10-34
Examining the marketingpool 10-35
Submitting a Job to marketingpool by User lucy from the hr Group 10-36
Monitoring the Status of the Submitted MapReduce Job 10-37
Submitting a Job to marketingpool by User bob from the marketing Group 10-38
Monitoring the Status of the Submitted MapReduce Job 10-39
Delay Scheduling 10-40
Agenda 10-41
YARN application Command 10-42
YARN application Command: Example 10-43
Monitoring an Application Using the UI 10-45
The Scheduler: BDA Example 10-46
Summary 10-47
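
Lesson 10's yarn application command has a straightforward Java counterpart via the YarnClient API; the sketch below is the rough equivalent of "yarn application -list" and also prints each application's queue (the fair-scheduler pool, such as root.hrpool, which a job targets by setting mapreduce.job.queuename at submission):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;

    // Rough Java equivalent of "yarn application -list".
    public class ListYarnApps {
        public static void main(String[] args) throws Exception {
            YarnClient yarn = YarnClient.createYarnClient();
            yarn.init(new Configuration());
            yarn.start();
            for (ApplicationReport app : yarn.getApplications()) {
                System.out.println(app.getApplicationId() + "\t"
                        + app.getQueue() + "\t"      // e.g. root.hrpool
                        + app.getYarnApplicationState());
            }
            yarn.stop();
        }
    }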
11 Overview of Hive and Pig
Objectives 11-3
Hive 11-4
Use Case: Storing Clickstream Data 11-5
Defining Tables over HDFS 11-6
Hive: Data Units 11-8
The Hive Metastore Database 11-9
Hive Framework 11-10
Creating a Hive Database 11-11
Data Manipulation in Hive 11-12
Data Manipulation in Hive: Nested Queries 11-13
Steps in a Hive Query 11-14
Hive-Based Applications 11-15
Hive: Limitations 11-16
Pig: Overview 11-17
Pig Latin 11-18
Pig Applications 11-19
Running Pig Latin Statements 11-20
Pig Latin: Features 11-21
Working with Pig 11-22
Summary 11-23
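
Lesson 11's HiveQL can be issued programmatically through the HiveServer2 JDBC driver as well as from the shell. A minimal sketch; the host, port, user, and movie_log table are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Run a HiveQL aggregation over JDBC; connection details and the
    // movie_log table are hypothetical.
    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default", "oracle", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT custid, COUNT(*) AS clicks "
                       + "FROM movie_log GROUP BY custid")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }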
12 Overview of Cloudera Impala
Objectives 12-3
Hadoop: Some Data Access/Processing Options 12-4
Cloudera Impala 12-5
Cloudera Impala: Key Features 12-6
Cloudera Impala: Supported Data Formats 12-7
Cloudera Impala: Programming Interfaces 12-8
How Impala Fits Into the Hadoop Ecosystem 12-9
How Impala Works with Hive 12-10
How Impala Works with HDFS and HBase 12-11
Summary of Cloudera Impala Benefits 12-12
Impala and Hadoop: Limitations 12-13
Summary 12-14
13 Using XQuery for Hadoop
Objectives 13-3
XML 13-4
XML Elements 13-6
XML Attributes 13-8
XML Path Language 13-9
XPath Terminology: Node Types 13-10
XPath Terminology: Family Relationships 13-11
XPath Expressions 13-12
Location Path Expression: Example 13-13
XQuery: Review 13-14
XQuery Terminology 13-15
XQuery Review: books.xml Document Example 13-16
Oracle XQuery for Hadoop (OXH) 13-18
OXH Features 13-19
XQuery for Hadoop Data Flow 13-20
Using OXH 13-21
OXH Installation 13-22
OXH Functions 13-23
OXH Adapters 13-24
Running a Query: Syntax 13-25
OXH: Configuration Properties 13-26
XQuery Transformation and Basic Filtering: Example 13-27
Viewing the Completed Application in YARN 13-30
Calling Custom Java Functions from XQuery 13-31
Additional Resources 13-32
Summary 13-33
14 Overview of Solr
Objectives 14-3
Apache Solr (Cloudera Search) 14-4
Types of Indexing 14-5
The solrctl Command 14-12
SchemaXML File 14-13
Creating a Solr Collection 14-14
Using OXH with Solr 14-15
Using Solr with Hue 14-16
Summary 14-18
15 Apache Spark
Objectives 15-3
Apache Spark 15-4
Introduction to Spark 15-5
Spark: Components for Distributed Execution 15-6
Resilient Distributed Dataset (RDD) 15-7
RDD Operations 15-8
Characteristics of RDD 15-9
Directed Acyclic Graph Execution Engine 15-10
Scala Language: Overview 15-11
Scala Program: Word Count Example 15-12
Spark Shells 15-13
Summary 15-14
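
Lesson 15 shows word count in Scala; the sketch below is the Java equivalent, assuming the Spark 2.x Java API (the 1.x flatMap signature differs). Each call builds up the RDD lineage (a DAG); nothing runs until the saveAsTextFile action triggers the job. Both HDFS paths are placeholders:

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SparkWordCount");
            JavaSparkContext sc = new JavaSparkContext(conf);
            JavaRDD<String> lines = sc.textFile("hdfs:///user/oracle/input");
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey((a, b) -> a + b);   // transformations: still lazy
            counts.saveAsTextFile("hdfs:///user/oracle/wordcounts"); // action runs the DAG
            sc.stop();
        }
    }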
16 Options for Integrating Your Big Data
Objectives 16-3
Unifying Data: A Typical Requirement 16-4
Introducing Data Unification Options 16-6
Data Unification: Batch Loading 16-7
Sqoop 16-8
Oracle Loader for Hadoop (OLH) 16-9
Copy to BDA 16-10
Data Unification: Batch and Dynamic Loading 16-11
SQL Connector for HDFS 16-12
Data Unification: ETL and Synchronization 16-13
Big Data Heterogeneous Integration with Hadoop Environments 16-14
Data Unification: Dynamic Access 16-16
Big Data SQL: A New Architecture 16-17
When to Use Different Technologies? 16-18
Summary 16-19
17 Overview of Apache Sqoop
Objectives 17-3
Apache Sqoop 17-4
Sqoop Components 17-5
Sqoop Features 17-6
Sqoop: Connectors 17-7
Importing Data into Hive 17-8
Sqoop: Advantages 17-9
Summary 17-10
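
Sqoop is driven from the command line, so Lesson 17's Hive import (slide 17-8) is most simply scripted by shelling out to it, as in this sketch; the JDBC URL, credentials path, and table name are placeholders:

    // Launch "sqoop import --hive-import" from Java; all connection
    // details are hypothetical.
    public class SqoopHiveImport {
        public static void main(String[] args) throws Exception {
            ProcessBuilder pb = new ProcessBuilder(
                    "sqoop", "import",
                    "--connect", "jdbc:oracle:thin:@//dbhost:1521/orcl",
                    "--username", "moviedemo",
                    "--password-file", "/user/oracle/.sqoop.pwd",
                    "--table", "CUSTOMERS",
                    "--hive-import");          // create and load a matching Hive table
            pb.inheritIO();                    // stream Sqoop's output to the console
            System.exit(pb.start().waitFor());
        }
    }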
18 Using Oracle Loader for Hadoop (OLH)
Objectives 18-3
Loader for Hadoop 18-4
Software Prerequisites 18-5
Modes of Operation 18-6
OLH: Online Database Mode 18-7
Running an OLH Job 18-8
OLH Use Cases 18-9
Load Balancing in OLH 18-10
Input Formats 18-11
OLH: Offline Database Mode 18-12
Offline Load Advantages in OLH 18-13
OLH Versus Sqoop 18-14
Summary 18-15
19 Using Copy to BDA
Course Road Map 19-2
Objectives 19-3
Copy to BDA 19-4
Requirements for Using Copy to BDA 19-5
How Does Copy to BDA Work? 19-6
Copy to BDA: Functional Steps 19-7
Querying the Data in Hive 19-13
Summary 19-14
20 Using SQL Connector for HDFS
Objectives 20-3
Oracle SQL Connector for HDFS (OSCH) 20-4
OSCH Architecture 20-5
Using OSCH: Two Simple Steps 20-6
Using OSCH: Creating External Directory 20-7
Using OSCH: Database Objects and Grants 20-8
Using OSCH: Supported Data Formats 20-9
Using OSCH: HDFS Text File Support 20-10
Using OSCH: Hive Table Support 20-12
Using OSCH: Partitioned Hive Table Support 20-14
OSCH: Features 20-15
OSCH: Performance Tuning 20-17
OSCH: Key Benefits 20-18
Summary 20-20
21 Data Integrator with Hadoop
Objectives 21-3
Data Integrator 21-4
Declarative Design 21-5
Big Data Heterogeneous Integration with Hadoop Environments 21-7
Resources for Integration 21-13
Summary 21-14
22 Using Big Data SQL
Objectives 22-3
Barriers to Effective Big Data Adoption 22-4
Overcoming Big Data Barriers 22-5
Goal and Benefits 22-7
Using Big Data SQL 22-8
Configuring Big Data SQL 22-9
Create External Tables Over HDFS Data and Query the Data 22-14
Create External Tables to Leverage the Hive Metastore and Query the Data 22-16
Using Access Parameters with ORACLE_HIVE 22-17
Automating External Table Creation 22-19
Applying Database Security Policies 22-20
Viewing the Results 22-21
Applying Redaction Policies to Data in Hadoop 22-22
Viewing Results from the Hive (Avro) Source 22-23
Viewing the Results from Joined RDBMS and HDFS Data 22-24
Summary 22-25
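
The external-table pattern from slides 22-14 through 22-17 has the general shape sketched below: an ORACLE_HIVE table maps a Hive Metastore definition into the database so ordinary SQL can query, and join against, the Hadoop data. This is only an illustration; the connection details, table names, directory object, and exact access-parameter syntax vary by release:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Create an external table over a Hive source; every name here is a placeholder.
    public class CreateHiveExternalTable {
        public static void main(String[] args) throws Exception {
            String ddl =
                "CREATE TABLE movie_log_ext (custid NUMBER, movieid NUMBER, rating NUMBER) "
              + "ORGANIZATION EXTERNAL ("
              + "  TYPE ORACLE_HIVE"
              + "  DEFAULT DIRECTORY DEFAULT_DIR"
              + "  ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.movie_log)"
              + ") REJECT LIMIT UNLIMITED";
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@//dbhost:1521/orcl", "scott", "tiger");
                 Statement stmt = conn.createStatement()) {
                stmt.execute(ddl);
                // The Hadoop data is now queryable and joinable to RDBMS tables,
                // e.g. SELECT custid, COUNT(*) FROM movie_log_ext GROUP BY custid;
            }
        }
    }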
23 Using Advanced Analytics: Data Mining and R Enterprise
Objectives 23-3
Advanced Analytics 23-4
Data Mining Overview 23-5
What Is Data Mining? 23-6
Common Uses of Data Mining 23-7
Defining Key Data Mining Properties 23-8
Data Mining Categories 23-10
Supervised Data Mining Techniques 23-11
Supervised Data Mining Algorithms 23-12
Unsupervised Data Mining Techniques 23-13
Unsupervised Data Mining Algorithms 23-14
Data Mining: Overview 23-15
Data Miner GUI 23-16
DM SQL Interface 23-17
Data Miner 4.1 Big Data Enhancement 23-18
Example Workflow Using JSON Query Node 23-19
ODM Resources 23-20
What Is R? 23-23
Who Uses R? 23-24
Why Do Statisticians, Data Analysts, and Data Scientists Use R? 23-25
Limitations of R 23-26
Strategy for the R Community 23-27
R Enterprise 23-28
R: Software Features 23-29
R Packages 23-30
Functions for Interacting with Database 23-31
R: Target Environment 23-32
R: Data Sources 23-33
R and Hadoop 23-34
R and HDFS Connectivity and Interaction 23-37
Hadoop Connectivity and Interaction 23-40
Summary 23-45
24 Introducing Big Data Discovery
Course Road Map 24-2
Objectives 24-3
Big Data Discovery 24-4
Find Data 24-5
Explore Data 24-6
Transform and Enrich Data 24-7
Discover Information 24-8
Share Insights 24-9
BDD: Technical Innovation on Hadoop 24-10
Additional Resources 24-11
Summary 24-12
25 Introduction to the Big Data Appliance (BDA)
Objectives 25-3
Big Data Appliance 25-4
Big Data Appliance: Key Component of the Big Data Management System 25-5
Engineered Systems for Big Data 25-6
The Available BDA Configurations 25-7
Using the Mammoth Utility 25-8
Using BDA Configuration Generation Utility 25-10
Configuring Big Data Appliance 25-11
The Generated Configuration Files 25-13
The BDA Configuration Generation Utility Pages 25-15
Big Data Appliance: Software Components 25-16
Big Data Appliance and YARN 25-17
Stopping the YARN Service 25-18
Hardware Failure in NoSQL 25-22
Integrated Lights Out Manager (ILOM): Overview 25-23
ILOM Users 25-24
Connecting to ILOM Using the Network 25-25
ILOM: Integrated View 25-26
Monitoring the Health of BDA: Management Utilities 25-27
Big Data Appliance: Usage Guidelines 25-39
Summary 25-40
26 Managing BDA
Objectives 26-3
Lesson Agenda 26-4
Mammoth Utility 26-5
Installation Types 26-6
Mammoth Code: Examples 26-7
Mammoth Installation Steps 26-8
Lesson Agenda 26-10
Monitoring BDA 26-11
BDA Command-Line Interface 26-12
bdacli 26-13
setup-root-ssh 26-14
Lesson Agenda 26-15
Monitor BDA with Enterprise Manager 26-16
OEM: Web and Command-Line Interfaces 26-17
OEM: Hardware Monitoring 26-18
Hadoop Cluster Monitoring 26-19
Lesson Agenda 26-20
Managing CDH Operations 26-21
Using Cloudera Manager 26-22
Monitoring BDA Status 26-23
Performing Administrative Tasks 26-24
Managing Services 26-25
Lesson Agenda 26-26
Monitoring MapReduce Jobs 26-27
Monitoring the Health of HDFS 26-28
Lesson Agenda 26-29
Cloudera Hue 26-30
Hive Query Editor (Hue) Interface 26-31
Logging in to Hue 26-32
Lesson Agenda 26-33
Starting BDA 26-34
Stopping BDA 26-35
BDA Port Assignments 26-36
Summary 26-37
27 Balancing MapReduce Jobs
Objectives 27-3
Ideal World: Neatly Balanced MapReduce Jobs 27-4
Real World: Skewed Data and Unbalanced Jobs 27-5
Data Skew 27-6
Data Skew Can Slow Down the Entire Hadoop Job 27-7
Perfect Balance 27-8
How Does Perfect Balance Work? 27-9
Using Perfect Balance 27-10
Application Requirements for Using Perfect Balance 27-11
Perfect Balance: Benefits 27-12
Using Job Analyzer 27-13
Getting Started with Perfect Balance 27-14
Using Job Analyzer 27-16
Environmental Setup for Perfect Balance and Job Analyzer 27-17
Using Job Analyzer as a Stand-Alone Utility: Example with a YARN Cluster 27-19
Configuring Perfect Balance 27-20
Using Perfect Balance to Run a Balanced MapReduce Job 27-21
Running a Job Using Perfect Balance: Examples 27-23
Perfect Balance–Generated Reports 27-25
The Job Analyzer Reports: Structure of the Job Output Directory 27-26
Reading the Job Analyzer Reports 27-27
Reading the Job Analyzer Report in HDFS Using a Web Browser 27-28
Reading the Job Analyzer Report in the Local File System in a Web Browser 27-29
Looking for Skew Indicators in the Job Analyzer Reports 27-30
Job Analyzer Sample Reports 27-31
Collecting Additional Metrics with Job Analyzer 27-32
Using Perfect Balance API 27-34
Troubleshooting Jobs Running with Perfect Balance 27-37
Perfect Balance Examples Available with Installation 27-38
Summary 27-40
28 Securing Your Data
Objectives 28-3
Security Trends 28-4
Security Levels 28-5
Outline 28-6
Relaxed Security 28-7
HDFS ACLs 28-10
Changing Access Privileges 28-11
Challenges with Relaxed Security 28-13
Create Databases (in Hive) 28-15
Privileges on Source Data for Tables 28-16
Granting Privileges on Source Data for Tables 28-18
Creating the Table and Loading the Data 28-19
Grant and Revoke Access to Table 28-21
Database Access to HDFS 28-22
Auditing 28-24
Encryption 28-25
Network Encryption 28-26
Data at Rest Encryption 28-28
Summary 28-30
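
Lesson 28's HDFS ACL changes (slides 28-10 and 28-11) can also be made through the Java FileSystem API. The sketch below is the rough equivalent of "hdfs dfs -setfacl -m user:lucy:r-x /data/movie"; the user and path are placeholders:

    import java.util.Collections;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.AclEntry;
    import org.apache.hadoop.fs.permission.AclEntryScope;
    import org.apache.hadoop.fs.permission.AclEntryType;
    import org.apache.hadoop.fs.permission.FsAction;

    // Grant read/execute on an HDFS path to one user via an access ACL entry.
    public class GrantHdfsAcl {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            AclEntry entry = new AclEntry.Builder()
                    .setScope(AclEntryScope.ACCESS)
                    .setType(AclEntryType.USER)
                    .setName("lucy")                       // hypothetical user
                    .setPermission(FsAction.READ_EXECUTE)
                    .build();
            fs.modifyAclEntries(new Path("/data/movie"),   // hypothetical path
                                Collections.singletonList(entry));
            fs.close();
        }
    }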
29 Introduction to Big Data on Cloud
Objectives 29-2
Big Data Cloud Service 29-3
Big Data Cloud Service: Key Features 29-4
Big Data Cloud Service: Benefits 29-5
Elasticity: Dedicated Compute Bursting 29-6
Security Made Easy 29-8
Comprehensive Analytics Toolset Included 29-9
Big Data Deployment Models: Choices 29-12
Resources 29-15
Summary 29-16