Data Ingest

This document provides examples of using Sqoop to import and export data between MySQL databases and HDFS. It demonstrates various Sqoop import options like text file format, conditional imports, column selection, and freeform queries. Examples are also given for importing all tables from a database. The document then shows how to export data from HDFS to MySQL tables. It briefly discusses changing import delimiters and formats. Finally, it provides examples of using Flume to ingest real-time and near real-time streaming data into HDFS.

# Import data from a MySQL database into HDFS using Sqoop

#1 - Importing orders table data from retail_db into HDFS
# - textfile format with default delimiter, default mapper
# - even if we don't specify --target-dir, the output will be written to the default HDFS folder (a directory named after the table under the user's home directory)
# - when we use the --query or -e switch, we must specify --target-dir
sqoop import \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders \
--as-textfile
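# - (not in the original) a quick sanity check of the import: list the target dir and peek at one
#   part file; the part file name below is the usual mapper output name and may differ in your run
hdfs dfs -ls /user/cloudera/orders
hdfs dfs -cat /user/cloudera/orders/part-m-00000 | head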
#2 - Importing orders table data from retail_db into HDFS
# - textfile format with default delimiter, default mapper
# - clean up the target dir if it exists
sqoop import \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders \
--as-textfile \
--delete-target-dir
#3 - Importing orders table data from retail_db into HDFS
# - textfile format with only one mapper (so, total files will be 1)
# - custom field delimiter (|) and line delimiter (\n)
# - clean up the target dir if it exists
sqoop import \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders \
--as-textfile \
--delete-target-dir \
--m 1 \
--fields-terminated-by "|" \
--lines-terminated-by "\n"
#4 - Importing orders table data from retail_db into HDFS
# - textfile format with only one mapper (so, total files will be 1)
# - Conditional Import
sqoop import \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders \
--as-textfile \
--delete-target-dir \
--m 1 \
--fields-terminated-by "|" \
--lines-terminated-by "\n" \
--where "order_id < 11"
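# - (not in the original) a sketch of previewing what the --where filter will pull, using sqoop eval
#   to run the query directly against MySQL before importing (same connection details as above)
sqoop eval \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--query "select count(*) from orders where order_id < 11"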
#5 - Importing orders table data from retail_db into HDFS with append
# - textfile format with only one mapper (so, total files will be 1)
# - Conditional Import
sqoop import \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders \
--as-textfile \
--append \
--m 1 \
--fields-terminated-by "|" \
--lines-terminated-by "\n" \
--where "order_id > 10 and order_id < 101"
#6 - Importing orders table data from retail_db into HDFS
# - textfile format with only one mapper (so, total files will be 1)
# - clean up the target dir if it exists
# - only specific columns (order_id, order_date, order_status)
# - when we use the --columns switch, there shouldn't be any whitespace between the column names
sqoop import \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders \
--as-textfile \
--delete-target-dir \
--m 1 \
--fields-terminated-by "|" \
--lines-terminated-by "\n" \
--columns order_id,order_date,order_status
#7 - Importing orders table data from retail_db into HDFS
# - textfile format with a freeform query
# - when using a freeform query, we must specify --target-dir
# - we must also specify --split-by with the split column, unless we pass --m 1
sqoop import \
--connect jdbc:mysql://[Link]/retail_db \
--username retail_dba \
--password cloudera \
--delete-target-dir \
--as-textfile \
--target-dir=/user/cloudera/orders \
--query "select * from orders where \$CONDITIONS" \
--split-by order_customer_id
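# - (not in the original) freeform queries can also join tables; a sketch assuming the standard
#   retail_db order_items table (order_item_order_id, order_item_subtotal) and an illustrative target dir
sqoop import \
--connect jdbc:mysql://[Link]/retail_db \
--username retail_dba \
--password cloudera \
--delete-target-dir \
--as-textfile \
--target-dir=/user/cloudera/orders_joined \
--query "select o.order_id, o.order_status, oi.order_item_subtotal from orders o join order_items oi on o.order_id = oi.order_item_order_id where \$CONDITIONS" \
--m 1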
#8 - Importing orders table data from retail_db into HDFS
# - sequencefile format
# - when using a freeform query, we must specify --target-dir
# - we must also specify --split-by with the split column, unless we pass --m 1
sqoop import \
--connect jdbc:mysql://[Link]/retail_db \
--username retail_dba \
--password cloudera \
--delete-target-dir \
--as-sequencefile \
--target-dir=/user/cloudera/orders \
--query "select * from orders where \$CONDITIONS" \
--split-by order_customer_id
#9 - Importing orders table data from retail_db into HDFS
# - avrodatafile (this option also creates a .avsc file (Avro schema file) in the local working folder)
# - when using a freeform query, we must specify --target-dir
# - we must also specify --split-by with the split column, unless we pass --m 1
sqoop import \
--connect jdbc:mysql://[Link]/retail_db \
--username retail_dba \
--password cloudera \
--delete-target-dir \
--as-avrodatafile \
--target-dir=/user/cloudera/orders \
--query "select * from orders where \$CONDITIONS" \
--split-by order_customer_id
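# - (not in the original) the generated Avro schema can be inspected in the local working folder;
#   the exact .avsc file name depends on the import, so a wildcard is used here
ls -l *.avsc
cat *.avsc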
#10 - Importing all tables from retail_db into HDFS
# - as textfile
sqoop import-all-tables \
--connect "jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--as-textfile
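# - (not in the original) without a directory option, import-all-tables writes one folder per table
#   under the user's HDFS home dir; a sketch using --warehouse-dir to group them under a common parent
sqoop import-all-tables \
--connect "jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--as-textfile \
--warehouse-dir /user/cloudera/retail_db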
################################################################################
# Export data to a MySQL database from HDFS using Sqoop
#1 - Exporting orders data from HDFS to MySQL table orders_export_test
# - first, create an empty table with the same structure as orders (the "where 1=2" predicate copies no rows)
create table orders_export_test select * from orders where 1=2;
sqoop export \
--connect "jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders_export_test \
--export-dir /user/cloudera/orders
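# - (not in the original) a sketch of confirming the export landed, using sqoop eval to count the
#   rows now in the MySQL table
sqoop eval \
--connect "jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--query "select count(*) from orders_export_test"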
################################################################################
# Change the delimiter and file format of data during import using Sqoop
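# - (not in the original) this section has no example in the source; a minimal sketch reusing the
#   switches shown above (target dir name is illustrative): --fields-terminated-by changes the
#   delimiter of a text import, while the file format is changed with --as-sequencefile or
#   --as-avrodatafile instead of --as-textfile
sqoop import \
--connect="jdbc:mysql://[Link]/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--target-dir /user/cloudera/orders_tab \
--delete-target-dir \
--m 1 \
--as-textfile \
--fields-terminated-by "\t" \
--lines-terminated-by "\n"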
################################################################################
# Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
#1 - Ingest real-time data into HDFS
# - Below is the Flume conf file to read the data in real time
# - It will read the output of the "tail" command and ingest it into HDFS
# - Save the conf below into the flume-exec-test.conf file under your local folder
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/gen_logs/logs/[Link]

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://[Link]/user/cloudera/flume/%y-%m-%d
a1.sinks.k1.hdfs.filePrefix = log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# - To start populating logs, we can use the following command
startlogs
# - To start the flume agent
flume-ng agent --conf /home/cloudera --conf-file /home/cloudera/flume-exec-test.conf --name a1
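# - (not in the original) after the agent has run for a while, the ingested files should show up
#   under the dated folders in HDFS
hdfs dfs -ls /user/cloudera/flume/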
#2 - Ingest near real-time data into HDFS
# - Below is the Flume conf file to read the data near real time
# - It will read the telnet inputs and ingest them into HDFS
# - Save the conf below into the flume-netcat-test.conf file under your local folder

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://[Link]/user/cloudera/flume
a1.sinks.k1.hdfs.filePrefix = netcat
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# - To start populating input, we need to launch telnet on localhost:44444 and type some lines
telnet localhost 44444
# - To start the flume agent
flume-ng agent --conf /home/cloudera --conf-file /home/cloudera/flume-netcat-test.conf --name a1
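# - (not in the original) lines typed into the telnet session should appear as files under the HDFS
#   path configured in the sink (files still being written carry a .tmp suffix)
hdfs dfs -ls /user/cloudera/flume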
################################################################################
# Load data into and out of HDFS using the Hadoop File System (FS) commands
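# - (not in the original) this section has no examples in the source; a few common commands as a
#   sketch (the local file names are illustrative)
hdfs dfs -mkdir -p /user/cloudera/data
hdfs dfs -put /home/cloudera/orders.txt /user/cloudera/data/
hdfs dfs -ls /user/cloudera/data
hdfs dfs -cat /user/cloudera/data/orders.txt | head
hdfs dfs -get /user/cloudera/data/orders.txt /home/cloudera/orders_copy.txt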
################################################################################
