Apache Sqoop
Hanoi – Autumn 2019
Apache Sqoop
• Sqoop exchanges data between an RDBMS and Hadoop.
• It can import all tables, a single table, or a portion of a table into HDFS.
• Does this very efficiently via a Map-only MapReduce job
• Result is a directory in HDFS containing comma-delimited text files (default)
• Sqoop can export data from HDFS back to RDBMS
Apache Sqoop Architect
Importing Tables with Sqoop
This will imports the customers table from a MySQL database
• Will create /mydata/customers directory in HDFS
• Directory will contain comma-delimited text files.
Importing An Entire Database with Sqoop
Import all tables from database (fields will be tab-delimited)
Importing Partial Tables with Sqoop
Import only specified columns from a table.
Importing Partial Tables with Sqoop
Import only matching rows
Importing With Free-Form Query
Sometimes what we want to import is harder to get than directly extracting
columns from one table. Therefore, Sqoop provides Free-Form Query for us to
create more complex import queries
● Must specify a destination directory with target-dir
Importing With Free-Form Query
● The query must include the token $CONDITIONS
○ Sqoop can use $CONDITIONS to get a table’s metadata by setting the
token to ‘1 = 0’
● Must select a splitting column with split-by
○ Helps Mappers parallelize the query
Importing With Free-Form Query
● We can tell the query to be executed once and imported serially by specifying
a single map task with -m 1
$ sqoop import \
--query ‘SELECT a.*, b.* FROM a JOIN b ON (a.id == b.id) WHERE $CONDITIONS’\
-m 1
-- target-dir /user/foo/joinresults
● $CONDITIONS can be listed along with other constraints
$ sqoop import \
--options-file world-options.txt -m 1 \
--target-dir /user/hdfs/sqoop-my-sql-import/canada-city
--query “SELECT ID, Name FROM City \
WHERE CountryCode=’CAN’ AND \$CONDITIONS”
Importing With Free-Form Query
$ sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop --password sqoop \
--query ‘SELECT normcities.id, countries.country, normcities.city FROM
normcities \
JOIN countries USING(country_id) WHERE $CONDITIONS’ \
--split-by id
--target-dir cities
Importing With Free-Form Query
#
# Options file for Sqoop import
#
# Specifies the tool being invoked
import
# Connect parameter and value
--connect
jdbc:mysql://localhost/db
# Username parameter and value
--username
foo
#
# Remaining options should be specified in the command line.
#
Arguments
Import control arguments
Output formatting arguments
Supported Database
Supports almost every RDBMS
Supports --direct option for MySql and PostgreSQL to increase importing speed
Incremental Imports with Sqoop
Two incremental import modes: append and lastmodified
● Sqoop’s incremental append mode imports only new records based on value
of last record in specified column
Handling Modifications with Incremental Imports
• What if existing records are also modified in the databases?
• Incremental append mode doesn’t handle this.
• Sqoop’s lastmodified incremental import mode adds and updates records
Exporting Data from Hadoop to RDBMS
Sqoop User Guide
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html