0% found this document useful (0 votes)
132 views18 pages

Sqoop Data Transfer Guide

Apache Sqoop allows exchanging data between relational databases and Hadoop. It can import full or partial tables from an RDBMS into HDFS efficiently using MapReduce. Sqoop also enables exporting data from HDFS back to relational databases. It supports importing data using queries, incremental imports to handle new or modified records, and exporting from HDFS to databases.

Uploaded by

Hoàng Chương
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
132 views18 pages

Sqoop Data Transfer Guide

Apache Sqoop allows exchanging data between relational databases and Hadoop. It can import full or partial tables from an RDBMS into HDFS efficiently using MapReduce. Sqoop also enables exporting data from HDFS back to relational databases. It supports importing data using queries, incremental imports to handle new or modified records, and exporting from HDFS to databases.

Uploaded by

Hoàng Chương
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

Apache Sqoop

Hanoi – Autumn 2019


Apache Sqoop

• Sqoop exchanges data between an RDBMS and Hadoop.

• It can import all tables, a single table, or a portion of a table into HDFS.

• Does this very efficiently via a Map-only MapReduce job

• Result is a directory in HDFS containing comma-delimited text files (default)

• Sqoop can export data from HDFS back to RDBMS


Apache Sqoop Architect
Importing Tables with Sqoop

This will imports the customers table from a MySQL database

• Will create /mydata/customers directory in HDFS

• Directory will contain comma-delimited text files.


Importing An Entire Database with Sqoop

Import all tables from database (fields will be tab-delimited)


Importing Partial Tables with Sqoop

Import only specified columns from a table.


Importing Partial Tables with Sqoop

Import only matching rows


Importing With Free-Form Query

Sometimes what we want to import is harder to get than directly extracting


columns from one table. Therefore, Sqoop provides Free-Form Query for us to
create more complex import queries
● Must specify a destination directory with target-dir
Importing With Free-Form Query

● The query must include the token $CONDITIONS


○ Sqoop can use $CONDITIONS to get a table’s metadata by setting the
token to ‘1 = 0’
● Must select a splitting column with split-by
○ Helps Mappers parallelize the query
Importing With Free-Form Query

● We can tell the query to be executed once and imported serially by specifying
a single map task with -m 1
$ sqoop import \
--query ‘SELECT a.*, b.* FROM a JOIN b ON (a.id == b.id) WHERE $CONDITIONS’\
-m 1
-- target-dir /user/foo/joinresults

● $CONDITIONS can be listed along with other constraints


$ sqoop import \
--options-file world-options.txt -m 1 \
--target-dir /user/hdfs/sqoop-my-sql-import/canada-city
--query “SELECT ID, Name FROM City \
WHERE CountryCode=’CAN’ AND \$CONDITIONS”
Importing With Free-Form Query

$ sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop --password sqoop \
--query ‘SELECT normcities.id, countries.country, normcities.city FROM
normcities \
JOIN countries USING(country_id) WHERE $CONDITIONS’ \
--split-by id
--target-dir cities
Importing With Free-Form Query
#
# Options file for Sqoop import
#

# Specifies the tool being invoked


import

# Connect parameter and value


--connect
jdbc:mysql://localhost/db

# Username parameter and value


--username
foo

#
# Remaining options should be specified in the command line.
#
Arguments
Import control arguments

Output formatting arguments


Supported Database

Supports almost every RDBMS

Supports --direct option for MySql and PostgreSQL to increase importing speed
Incremental Imports with Sqoop

Two incremental import modes: append and lastmodified


● Sqoop’s incremental append mode imports only new records based on value
of last record in specified column
Handling Modifications with Incremental Imports

• What if existing records are also modified in the databases?

• Incremental append mode doesn’t handle this.

• Sqoop’s lastmodified incremental import mode adds and updates records


Exporting Data from Hadoop to RDBMS
Sqoop User Guide

https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html

You might also like