Hadoop Spark MongoDB SCALA Notes

The document outlines the Hadoop ecosystem, including components like HDFS, YARN, and various schedulers. It also introduces NoSQL databases, specifically MongoDB, detailing data types and operations. Additionally, it covers Apache Spark's architecture and Scala programming language features.

Uploaded by

ishantjaiswal2004

Hadoop Ecosystem and YARN

1. Hadoop Ecosystem Components:

- HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Flume, Oozie, Zookeeper, Ambari.

2. Schedulers:

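- FIFO Scheduler: Jobs run in submission order (first in, first out).

- Fair Scheduler: Over time, every running job gets a roughly equal share of cluster resources.

- Capacity Scheduler: The cluster is divided into queues, and each queue is guaranteed a configured share of capacity.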
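
To make the difference between the three policies concrete, here is a minimal Scala sketch. It is a toy model, not YARN code: the Job class, the queue names, and the slot counts are all hypothetical, and real schedulers also handle preemption, priorities, and memory/CPU dimensions.

```scala
// Toy model of the three scheduling policies; not YARN code.
// Job, its fields, and every number below are hypothetical.
object SchedulerSketch {
  final case class Job(name: String, queue: String, submittedAt: Int)

  // FIFO: jobs are served strictly in submission order.
  def fifoOrder(jobs: Seq[Job]): Seq[String] =
    jobs.sortBy(_.submittedAt).map(_.name)

  // Fair: every running job gets an equal slice of the cluster.
  def fairShares(jobs: Seq[Job], totalSlots: Int): Map[String, Int] =
    if (jobs.isEmpty) Map.empty
    else jobs.map(j => j.name -> totalSlots / jobs.size).toMap

  // Capacity: each queue is guaranteed a configured percentage of the cluster.
  def capacityShares(queuePercent: Map[String, Int], totalSlots: Int): Map[String, Int] =
    queuePercent.map { case (queue, pct) => queue -> totalSlots * pct / 100 }

  def main(args: Array[String]): Unit = {
    val jobs = Seq(Job("etl", "prod", 2), Job("report", "dev", 1), Job("adhoc", "dev", 3))
    println(fifoOrder(jobs))                                     // report, etl, adhoc
    println(fairShares(jobs, 12))                                // 4 slots per job
    println(capacityShares(Map("prod" -> 75, "dev" -> 25), 12))  // prod 9, dev 3
  }
}
```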

3. Hadoop 2.0 New Features:

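- NameNode High Availability: An active/standby NameNode pair removes the NameNode as a single point of failure.

- HDFS Federation: Supports multiple independent NameNodes, each managing part of the namespace.

- MRv2 (MapReduce Version 2): Splits the JobTracker's two roles, resource management and job scheduling/monitoring, into separate daemons.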

- YARN (Yet Another Resource Negotiator): Resource management layer.

- Running MRv1 on YARN: Existing MRv1 applications keep working on YARN through backward-compatible MapReduce APIs.

NoSQL Databases

1. Introduction to NoSQL:

- Non-relational, distributed, schema-less databases.

- Types: Key-Value, Document, Column-Family, Graph databases.

MongoDB

1. Introduction:

- Document-oriented NoSQL database.

- Stores documents in BSON (Binary JSON) format.

2. Data Types:

- String, Integer, Boolean, Double, Arrays, Objects, Null, Date, ObjectId.

3. Creating, Updating, and Deleting Documents:

- db.collection.insertOne(), insertMany()

- db.collection.updateOne(), updateMany()

- db.collection.deleteOne(), deleteMany()

4. Querying:

- db.collection.find(), findOne()

- Query operators: $gt, $lt, $in, $and, $or, $regex
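
The meaning of a few of these operators can be mimicked over plain Scala collections. The sketch below is only an in-memory illustration of the semantics, not the MongoDB API or driver; the User class, the sample data, and the helper names are hypothetical.

```scala
// In-memory illustration of query-operator semantics; not the MongoDB driver.
// User, the sample data, and the helper names are hypothetical.
object QuerySketch {
  final case class User(name: String, age: Int, city: String)

  val users: Seq[User] = Seq(
    User("Asha", 31, "Pune"),
    User("Ben", 19, "Delhi"),
    User("Chen", 45, "Pune"))

  // Mirrors { age: { $gt: lo, $lt: hi } }: both bounds must hold.
  def ageBetween(lo: Int, hi: Int): Seq[User] =
    users.filter(u => u.age > lo && u.age < hi)

  // Mirrors { city: { $in: [...] } }: the field equals any listed value.
  def cityIn(cities: Set[String]): Seq[User] =
    users.filter(u => cities.contains(u.city))

  // Mirrors { $or: [ { age: { $gt: 40 } }, { name: { $regex: "^B" } } ] }.
  def olderThan40OrNameStartsWithB: Seq[User] =
    users.filter(u => u.age > 40 || "^B".r.findFirstIn(u.name).isDefined)

  def main(args: Array[String]): Unit = {
    println(ageBetween(18, 40).map(_.name))             // Asha, Ben
    println(cityIn(Set("Pune")).map(_.name))            // Asha, Chen
    println(olderThan40OrNameStartsWithB.map(_.name))   // Ben, Chen
  }
}
```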

5. Introduction to Indexing:

- Improves query performance.

- db.collection.createIndex({field: 1}), where 1 means ascending order and -1 descending.

6. Capped Collections:

- Fixed-size collections that preserve insertion order and overwrite the oldest documents once full.
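
The overwrite-oldest behaviour can be sketched with a small bounded queue in Scala. This is an analogy only: real capped collections are bounded by size in bytes (optionally also by document count), while the hypothetical Capped class below counts documents to keep the example short.

```scala
// Analogy for a capped collection; Capped and maxDocs are hypothetical names.
object CappedSketch {
  // Keeps at most maxDocs entries; when full, the oldest entry is dropped.
  final class Capped[A](maxDocs: Int) {
    require(maxDocs > 0, "maxDocs must be positive")
    private val docs = scala.collection.mutable.Queue.empty[A]

    def insert(doc: A): Unit = {
      if (docs.size == maxDocs) docs.dequeue() // evict the oldest document
      docs.enqueue(doc)
    }

    // Insertion order, oldest first.
    def contents: List[A] = docs.toList
  }

  def main(args: Array[String]): Unit = {
    val log = new Capped[String](3)
    Seq("e1", "e2", "e3", "e4").foreach(log.insert)
    println(log.contents) // List(e2, e3, e4): e1 was overwritten
  }
}
```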

Apache Spark

1. Installing Spark:

- Download a prebuilt binary, set environment variables (e.g. SPARK_HOME and PATH), then configure Spark.

2. Spark Applications, Jobs, Stages, and Tasks:

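- Application: User program, consisting of a driver and its executors.

- Job: Parallel computation triggered by an action.

- Stage: Set of tasks that can run without a shuffle; a job is split into stages at shuffle boundaries.

- Task: Smallest unit of work, processing one partition on one executor.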

3. Resilient Distributed Datasets (RDDs):

- Immutable, distributed collection of objects.

- Supports transformations (lazy; each returns a new RDD) and actions (trigger computation and return a result).
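
The lazy-transformation idea can be illustrated without a cluster by using a Scala collection view, which also defers work until a result is demanded. This is an analogy under that assumption, not Spark code; in Spark the corresponding calls would be a transformation such as map on an RDD followed by an action such as reduce or collect.

```scala
// Analogy for lazy transformations vs. actions, using a collection view.
// This is plain Scala, not Spark; run() is a hypothetical helper.
object LazySketch {
  // Returns (calls before the "action", result, calls after the "action").
  def run(): (Int, Int, Int) = {
    var evaluated = 0
    val data = (1 to 5).view                               // lazy source
    val doubled = data.map { x => evaluated += 1; x * 2 }  // "transformation": nothing runs yet
    val before = evaluated
    val total = doubled.sum                                // "action": forces the computation
    (before, total, evaluated)
  }

  def main(args: Array[String]): Unit =
    println(run()) // (0,30,5)
}
```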

4. Anatomy of a Spark Job Run:

- The driver program creates a SparkContext.

- Transformations build up a lineage graph; an action submits a job.

- The DAGScheduler splits the job into stages, and the TaskScheduler distributes each stage's tasks to executors.

5. Spark on YARN:

- Spark runs as a YARN application, with YARN allocating the containers for its executors; both client and cluster deploy modes are supported.

Scala

1. Introduction:

- Functional and object-oriented language.

- Runs on the JVM.

2. Classes and Objects:

- Class: Blueprint for objects.

- Object: Singleton, declared with the object keyword; exactly one instance exists.
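
A short sketch of the distinction, with hypothetical names (Counter, IdGenerator) chosen only for illustration:

```scala
// Counter and IdGenerator are hypothetical names used for illustration.
object ClassObjectSketch {
  // A class is a blueprint: each `new Counter(...)` creates a separate instance.
  class Counter(start: Int) {
    private var value = start
    def increment(): Int = { value += 1; value }
  }

  // An object is a singleton: there is exactly one IdGenerator.
  object IdGenerator {
    private var last = 0
    def next(): Int = { last += 1; last }
  }

  def main(args: Array[String]): Unit = {
    val a = new Counter(0)
    val b = new Counter(10)
    println(a.increment())      // 1
    println(b.increment())      // 11: b has its own state
    println(IdGenerator.next()) // 1
    println(IdGenerator.next()) // 2: same single instance
  }
}
```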

3. Basic Types and Operators:

- Basic types: Byte, Short, Int, Long, Float, Double, Char, Boolean, String.

- Operators: +, -, *, /, %, ==, !=, &&, ||
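
A few of these operators in action. The value names are hypothetical; the comments note the points that most often surprise beginners, such as integer division.

```scala
// Value names are hypothetical; each comment states the resulting value.
object OperatorSketch {
  val a: Int = 7
  val b: Int = 2

  val intQuotient: Int = a / b               // integer division truncates: 3
  val remainder: Int = a % b                 // 1
  val realQuotient: Double = a.toDouble / b  // 3.5
  val bothTrue: Boolean = a == 7 && b != 2   // false, because b is 2
  val viaMethod: Int = a.+(b)                // operators are ordinary method calls: 9
  val letterCode: Int = 'A'.toInt            // a Char is a 16-bit Unicode value: 65

  def main(args: Array[String]): Unit =
    println(Seq(intQuotient, remainder, realQuotient, bothTrue, viaMethod, letterCode))
}
```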

4. Built-in Control Structures:

- if, else, while, for, match-case
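
One small function per control structure, as a sketch; the function names are hypothetical. Note that if/else, for/yield, and match are expressions that produce values.

```scala
// Function names are hypothetical; one function per control structure.
object ControlSketch {
  // if/else is an expression: it yields a value.
  def sign(n: Int): String =
    if (n < 0) "negative" else if (n == 0) "zero" else "positive"

  // while loop with mutable counters.
  def sumTo(n: Int): Int = {
    var i = 1
    var total = 0
    while (i <= n) { total += i; i += 1 }
    total
  }

  // for with a guard and yield builds a new collection.
  def evenSquares(n: Int): Seq[Int] =
    for (i <- 1 to n if i % 2 == 0) yield i * i

  // match tests cases top to bottom and returns the first hit.
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case s: String => s"string of length ${s.length}"
    case _         => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(sign(-3))        // negative
    println(sumTo(4))        // 10
    println(evenSquares(6))  // 4, 16, 36
    println(describe("abc")) // string of length 3
  }
}
```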

5. Functions and Closures:

- def functionName(parameters): returnType = {...}

- Closures: Functions that capture free variables from their enclosing scope.
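
A minimal sketch of both ideas; makeAdder and runningTotal are hypothetical names. The second function shows that a closure captures the variable itself, not a copy of its value.

```scala
// makeAdder and runningTotal are hypothetical names used for illustration.
object ClosureSketch {
  // The returned function closes over the free variable `n`.
  def makeAdder(n: Int): Int => Int = (x: Int) => x + n

  // The closure captures `total` itself, so updates persist between calls.
  def runningTotal(): Int => Int = {
    var total = 0
    (x: Int) => { total += x; total }
  }

  def main(args: Array[String]): Unit = {
    val addThree = makeAdder(3)
    println(addThree(4)) // 7

    val add = runningTotal()
    println(add(5)) // 5
    println(add(2)) // 7
  }
}
```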

6. Inheritance:

- class Subclass extends Superclass

- A class extends one superclass but can mix in several traits (using with), which gives a form of multiple inheritance.
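
A sketch of single-class inheritance combined with two traits; Animal, Swimmer, Walker, and Duck are hypothetical names.

```scala
// Animal, Swimmer, Walker, and Duck are hypothetical names.
object InheritanceSketch {
  class Animal(val name: String) {
    def sound: String = "..."
    def describe: String = s"$name says $sound"
  }

  // Traits can carry concrete methods and be mixed into any class.
  trait Swimmer { def swim: String = "swims" }
  trait Walker { def walk: String = "walks" }

  // One superclass (extends) plus any number of traits (with).
  class Duck(name: String) extends Animal(name) with Swimmer with Walker {
    override def sound: String = "quack"
  }

  def main(args: Array[String]): Unit = {
    val duck = new Duck("Donald")
    println(duck.describe) // Donald says quack
    println(duck.swim)     // swims
    println(duck.walk)     // walks
  }
}
```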

