Sqoop Practice

Use hn3 as Source Database:

===========================
sudo systemctl status mysqld
sudo systemctl start mysqld
sudo systemctl status mysqld
sudo systemctl enable mysqld

To create sample data in RDBMS:
===============================
Copy Sales2018.txt to the hn3 machine using WinSCP
Run "bash Sales2018.txt" on the hn3 machine

mysql -uroot -pwelcome1

Create Project folder on HDFS and provide permissions:
======================================================
sudo -u hdfs hdfs dfs -mkdir -p /project1/data/sqoop
sudo -u hdfs hdfs dfs -chown -R hdpuser:hdpadmin /project1
sudo -u hdfs hdfs dfs -mkdir /user/hdpuser
sudo -u hdfs hdfs dfs -chown hdpuser:hdpadmin /user/hdpuser
sudo -u hdfs hdfs dfs -chmod -R 777 /user

sqoop help

List all databases in MySQL
===========================
sqoop list-databases --connect jdbc:mysql://hn3.hadoop.com --username root -P

List all tables in MySQL
========================
sqoop list-tables --connect jdbc:mysql://hn3.hadoop.com/sales --username root --password welcome1

Create Connection File
======================
sudo vi /var/lib/SqoopConnection_DB_Sales.txt

--connect
jdbc:mysql://hn3.hadoop.com/sales
--username
root
--password
welcome1

sudo chown hdpuser:hdpadmin /var/lib/SqoopConnection_DB_Sales.txt


sudo chmod 600 /var/lib/SqoopConnection_DB_Sales.txt

Pass parameter file to Sqoop
============================
sqoop list-databases --options-file /var/lib/SqoopConnection_DB_Sales.txt
sqoop list-tables --options-file /var/lib/SqoopConnection_DB_Sales.txt
Import table data to HDFS (output file is delimited text by default)
=====================================================================

sudo -u hdfs hdfs dfs -rm -r -skipTrash /user/hdpuser/sales_data

sqoop import --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data -m 1

hdfs dfs -ls /user/hdpuser/sales_data


hdfs dfs -cat /user/hdpuser/sales_data/part-m-00000

Import table data to HDFS (Import only specific columns & rows)
===============================================================
hdfs dfs -rm -r -skipTrash /project1/data/sqoop/sales_data_NY
sqoop-import --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --columns "TransID,TransDate,Product,Price,State" --where "State='NY'" \
  --fields-terminated-by '$' --lines-terminated-by '#' \
  --target-dir /project1/data/sqoop/sales_data_NY -m 2

hdfs dfs -ls /project1/data/sqoop/sales_data_NY


hdfs dfs -cat /project1/data/sqoop/sales_data_NY/part-m-00000

Import table data to HDFS (Tab-separated file format, split-by and multiple mappers.
Note: multiple mappers will not work if the table has no primary key, unless you supply --split-by)
===================================================================================================
sudo -u hdfs hdfs dfs -rm -r -skipTrash /project1/data/sqoop/sales_data
sqoop-import --options-file /var/lib/SqoopConnection_DB_Sales.txt \
  --fields-terminated-by '\t' --lines-terminated-by '\n' \
  --query 'SELECT * from sales_data where State="NY" AND $CONDITIONS' \
  --split-by "Product" --target-dir /project1/data/sqoop/sales_data -m 3

hdfs dfs -ls /project1/data/sqoop/sales_data


hdfs dfs -cat /project1/data/sqoop/sales_data/part-m-00000

Import table data to HDFS (WHERE condition and --direct)
========================================================
sudo -u hdfs hdfs dfs -rm -r -skipTrash /project1/data/sqoop/sales_data
sqoop-import --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --where "Price >= 7500" --target-dir /project1/data/sqoop/sales_data --direct -m 1

hdfs dfs -ls /project1/data/sqoop/sales_data


hdfs dfs -cat /project1/data/sqoop/sales_data/part-m-00000

The MySQL Direct Connector (--direct) allows faster imports and exports to/from MySQL by using the mysqldump and mysqlimport tools instead of JDBC SELECT and INSERT statements. Data is written directly to disk, without travelling through the buffer cache/RAM.
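
The examples above only use --direct on imports; as a rough sketch, the same flag applies to exports as well, so that mysqlimport is used instead of JDBC INSERT statements (the export directory below is the one created in the export section further down):

sqoop export --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --export-dir /project1/data/sqoop/sales_data_hdfs --direct -m 1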

Incremental Imports:
====================
Two modes: append & lastmodified

Append - used when rows are only added and we know the last imported value of the check column (e.g. the row ID)

Lastmodified - used when we want to import rows based on a last-modified date column (rows updated since the last import)
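
Schematically (the angle-bracket values are placeholders; full, runnable commands follow in the next sections):

sqoop import <connection args> --table <table> --incremental append --check-column <id-column> --last-value <last-imported-id>
sqoop import <connection args> --table <table> --incremental lastmodified --check-column <date-column> --last-value <last-import-date>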

Incremental Import (based on Append)
====================================
hdfs dfs -rm -r -skipTrash /project1/data/sqoop/sales_data
sqoop import --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --target-dir /project1/data/sqoop/sales_data --incremental append \
  --check-column TransID --last-value 0 --direct -m 1

hdfs dfs -ls /project1/data/sqoop/sales_data


hdfs dfs -cat /project1/data/sqoop/sales_data/part-m-00000

Incremental Import (based on Lastmodified)
==========================================
sudo -u hdfs hdfs dfs -rm -r -skipTrash /project1/data/sqoop/sales_data
sqoop import --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --target-dir /project1/data/sqoop/sales_data1 --incremental lastmodified \
  --check-column TransDate --last-value 2017-07-25 --direct -m 1

hdfs dfs -cat /project1/data/sqoop/sales_data1/part-m-00000

Incremental Import (with sqoop job)
===================================
hdfs dfs -rm -r -skipTrash /project1/data/sqoop/sales_data
sqoop job --create DailyImport_sales.sales_data -- import \
  --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --target-dir /project1/data/sqoop/sales_data --incremental append \
  --check-column TransID --last-value 0 --direct -m 1

If you see an error like NoClassDefFoundError: org/json/JSONObject, follow the steps below:
Download the java-json.jar file from
http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm and copy it to the
/usr/hdp/2.6.2.0-205/sqoop/lib/ directory.
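
For example (assuming the jar was downloaded to the hdpuser home directory; adjust the source path to wherever you saved it):

sudo cp /home/hdpuser/java-json.jar /usr/hdp/2.6.2.0-205/sqoop/lib/
ls /usr/hdp/2.6.2.0-205/sqoop/lib/ | grep -i json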

To see and manage Sqoop jobs:
=============================
sqoop job --list
sqoop job --create <job-name> -- <sqoop tool and arguments>    (see the full example above)
sqoop job --exec DailyImport_sales.sales_data
sqoop job --delete DailyImport_sales.sales_data

hdfs dfs -cat /project1/data/sqoop/sales_data/part-m-00000

If it asks for a password when executing the job, enable the option below in sqoop-site.xml:

sudo cp /etc/sqoop/2.6.2.0-205/0/sqoop-site.xml /etc/sqoop/2.6.2.0-205/0/sqoop-site.xml.old
sudo cp /usr/hdp/2.6.2.0-205/etc/sqoop/conf.dist/sqoop-site.xml /etc/sqoop/2.6.2.0-205/0/sqoop-site.xml

sudo vi /etc/sqoop/2.6.2.0-205/0/sqoop-site.xml

<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
  <description>If true, allow saved passwords in the metastore.</description>
</property>

Add a few more rows and test:
=============================
mysql -h 192.168.56.124 -u root -pwelcome1 -e "insert into sales.sales_data (TransDate, Product, Price, PaymentType, CustName, City, State, Country) values ('2017-08-01 11:19', 'Product1', 1200, 'Diners', 'Michael', 'Vaucluse', 'New South Wales', 'Australia');"
mysql -h 192.168.56.124 -u root -pwelcome1 -e "insert into sales.sales_data (TransDate, Product, Price, PaymentType, CustName, City, State, Country) values ('2017-08-01 12:25', 'Product2', 3600, 'Visa', 'Nanci', 'Issaquah', 'WA', 'United States');"
mysql -h 192.168.56.124 -u root -pwelcome1 -e "insert into sales.sales_data (TransDate, Product, Price, PaymentType, CustName, City, State, Country) values ('2017-08-01 16:45', 'Product3', 7500, 'Mastercard', 'Scott', 'Phoenix', 'MD', 'United States');"

hdfs dfs -cat /project1/data/sqoop/sales_data/part-m-00001

All these --last-value checkpoints are stored in the /home/hdpuser/.sqoop folder, as shown below.

[hdpuser@hnn .sqoop]$ ls -ltr /home/hdpuser/.sqoop


total 20
-rw-r--r-- 1 hdpuser hdpadmin 419 Dec 14 20:33 metastore.db.properties
-rw-r--r-- 1 hdpuser hdpadmin 5714 Dec 14 20:33 metastore.db.script

So this metastore is per-user. To share saved jobs, you can configure a common Sqoop metastore (for example, backed by MySQL) in sqoop-site.xml.
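
As a sketch, the client auto-connect URL in sqoop-site.xml can point at a shared metastore instead of the per-user HSQLDB files; the host below is a placeholder for a node running the "sqoop metastore" service (which listens on port 16000 by default):

<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://metastore-host.hadoop.com:16000/sqoop</value>
  <description>Shared Sqoop metastore URL (placeholder host).</description>
</property>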

Export data to MySQL table
==========================
vi sales_data_hdfs
6923,2017-07-31 12:51:00,Product1,1200,Diners,Michael,Vaucluse,New South Wales,Australia
6924,2017-07-31 12:52:00,Product2,3600,Visa,Nanci,Issaquah,WA,United States
6925,2017-07-31 12:53:00,Product3,7500,Mastercard,Scott,Phoenix,MD,United States

sudo -u hdfs hdfs dfs -put sales_data_hdfs /project1/data/sqoop/

sqoop export --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data --export-dir /project1/data/sqoop/sales_data_hdfs

import-all-tables:
==================
This tool imports a set of tables from an RDBMS to HDFS. Data from each table is
stored in a separate directory in HDFS.

For the import-all-tables tool to be useful, the following conditions must be met:
Each table must have a single-column primary key.
You must intend to import all columns of each table.
You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.
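
For example, as a sketch reusing the connection file from above (the warehouse directory below is just an assumed location; each table lands in its own subdirectory under it):

sqoop import-all-tables --options-file /var/lib/SqoopConnection_DB_Sales.txt \
  --warehouse-dir /project1/data/sqoop/all_tables -m 1
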
Export Failure Cases:
=====================
Exports may fail for a number of reasons:
Loss of connectivity from the Hadoop cluster to the database (either due to
hardware fault, or server software crashes)
Attempting to INSERT a row which violates a consistency constraint (for example,
inserting a duplicate primary key value)
Attempting to parse an incomplete or malformed record from the HDFS source data
Attempting to parse records using incorrect delimiters
Capacity issues (such as insufficient RAM or disk space)
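
One way to keep the target table clean when an export fails part-way (not covered in these notes) is Sqoop's --staging-table option: rows are first loaded into an empty staging table and moved into the target table only if the whole export succeeds. A rough sketch, assuming a staging table sales_data_stage with the same schema as sales_data has already been created (note that --staging-table cannot be combined with --direct):

sqoop export --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --staging-table sales_data_stage --clear-staging-table \
  --export-dir /project1/data/sqoop/sales_data_hdfs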

Other Load Options:
===================
Arguments:
--as-sequencefile                          Imports data to SequenceFiles
--as-textfile                              Imports data as plain text (default)
--direct-split-size <n>                    Split the input stream every 'n' bytes when importing in direct mode
--create-hive-table                        Fail if the target Hive table exists
--hive-import                              Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite                           Overwrite existing data in the Hive table
--hive-table <table-name>                  Sets the table name to use when importing to Hive
--hive-partition-key <partition-key>       Sets the partition key to use when importing to Hive
--hive-partition-value <partition-value>   Sets the partition value to use when importing to Hive
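
As a sketch combining a few of the Hive options above (the Hive table name is an assumption, and Hive must be available on the node running Sqoop; if /user/hdpuser/sales_data still exists from the earlier import, remove it first or add a --target-dir):

sqoop import --options-file /var/lib/SqoopConnection_DB_Sales.txt --table sales_data \
  --hive-import --create-hive-table --hive-table sales_data_hive -m 1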
