SRIKALAHASTEESWARA INSTITUTE OF TECHNOLOGY
Dept. of Computer Science and Engineering
Srikalahasti.
SQOOP – A HADOOP
TECHNOLOGY
By
A. Venkatamuni
15381A0526
C.S.E. IV Year
CONTENTS
INTRODUCTION
ARCHITECTURE
IMPORTING DATA
EXPORTING DATA
LIST DATA
CONCLUSION
INTRODUCTION
Sqoop is a tool designed to transfer data between Hadoop and relational
database servers.
It is used to import data from relational databases such as MySQL and
Oracle into Hadoop HDFS, and to export data from the Hadoop file system
back to relational databases.
It is provided by the Apache Software Foundation.
ARCHITECTURE
The architecture of Sqoop consists of connectors, metadata, and the map-reduce
job controller, as shown in the figure.
Connectors are one of the main components of Sqoop; they are responsible for
ensuring that the database driver supplied by the user is connected with Sqoop.
The metadata component stores table internals such as indexes and partitions.
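If the built-in connector cannot determine the right driver, the JDBC driver class can be supplied explicitly with the --driver option. A minimal sketch (the connect string, driver class, and table name are illustrative assumptions):
# explicitly name the JDBC driver class to use for the connection
$ sqoop import \
  --connect jdbc:mysql://localhost/databaseName \
  --driver com.mysql.jdbc.Driver \
  --username user --password password \
  --table tableName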
ARCHITECTURE (Contd.,)
The importing and exporting of data is handled through a map-reduce job,
where the import statement given by the user is converted to a map-reduce
job and submitted to the HDFS cluster.
The map job launches multiple mappers, depending on the number defined by
the user on the command line.
For a Sqoop import, each mapper task is assigned a part of the data to be
imported, based on the key defined on the command line.
Each mapper then creates a connection to the database using JDBC, fetches
the part of the data assigned to it by Sqoop, and writes it into HDFS, Hive,
or HBase based on the option provided on the command line.
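A minimal sketch of this parallelism (the database, table, and split column are illustrative assumptions): --split-by names the key column used to partition the data among the mappers, and --num-mappers sets how many mappers are launched.
# import with 4 parallel mappers, splitting the rows on the 'id' column
$ sqoop import \
  --connect jdbc:mysql://localhost/databaseName \
  --username user --password password \
  --table tableName \
  --split-by id \
  --num-mappers 4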
IMPORTING DATA
Sqoop enables RDBMS users to import table data into any of the Hadoop
platforms using the import command (Apache sqoop import, 2013). The cases
discussed below are importing a table from MySQL to HDFS, HBase, and Hive.
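Import a Table from MySQL to HDFS:
A table in MySQL can be imported to HDFS using the import command. A minimal sketch (the database, table, and target directory names are illustrative assumptions):
# import the whole table into the given HDFS directory
$ bin/sqoop import \
  --connect jdbc:mysql://localhost/databaseName \
  --username user --password password \
  --table tableName \
  --target-dir /user/hadoop/tableName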
Import a Table from MySQL to HBase:
A table in MySQL can be imported to HBase using the command as follows:
Case 1: If the table has a primary key, import all the columns of the MySQL table:
$ bin/sqoop import \
  --connect jdbc:mysql://localhost/databaseName \
  --username user --password password \
  --table tableName \
  --hbase-table hbase_tableName \
  --column-family hbase_table_col1 \
  --hbase-create-table
Case 2: If the table doesn't have a primary key, choose one column as the
HBase row key and import only a few columns:
$ bin/sqoop import \
  --connect jdbc:mysql://localhost/databaseName \
  --username user --password password \
  --table tableName \
  --hbase-table hbase_tableName \
  --columns column1,column2 \
  --column-family hbase_table_col \
  --hbase-row-key column1 \
  --hbase-create-table
IMPORTING DATA (Contd.,)
Import a Table from MySQL to Hive:
A table in MySQL can be imported to Hive using the command as follows:
Case 1: Import a MySQL table into Hive if the table has a primary key.
$ bin/sqoop-import \
  --connect jdbc:mysql://localhost:3306/databaseName \
  --username user --password password \
  --table tableName \
  --hive-table tableName \
  --create-hive-table \
  --hive-import \
  --hive-home path/to/hive_home
Case 2: Import a MySQL table into Hive if the table doesn't have a primary key.
$ bin/sqoop-import \
  --connect jdbc:mysql://localhost:3306/databaseName \
  --username user --password password \
  --table tableName \
  --hive-table tableName \
  --create-hive-table \
  --hive-import \
  --hive-home path/to/hive_home \
  -m 1
EXPORTING DATA
The export tool transfers data back from HDFS to the RDBMS database.
The files given as input to Sqoop contain records, which are called rows
in the table.
These are read and parsed into a set of records, delimited with a
user-specified delimiter.
Syntax:
$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)
Example:
It is mandatory that the target table is created manually and is present
in the database to which the data is to be exported.
$ mysql
mysql> USE db;
mysql> CREATE TABLE employee (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(20),
    deg VARCHAR(20),
    salary INT,
    dept VARCHAR(10));
EXPORTING DATA (Contd.,)
The following command is used to export the table data (which is in the
emp_data file on HDFS) to the employee table in the db database of the
MySQL database server.
$ sqoop export \
  --connect jdbc:mysql://localhost/db \
  --username root \
  --table employee \
  --export-dir /emp/emp_data
The following command is used to verify the table on the MySQL command
line.
mysql> SELECT * FROM employee;
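If the HDFS files use a non-default field delimiter, it can be declared on export. A sketch assuming the emp_data records are comma-separated:
# tell Sqoop how the input fields are delimited before parsing them into rows
$ sqoop export \
  --connect jdbc:mysql://localhost/db \
  --username root \
  --table employee \
  --export-dir /emp/emp_data \
  --input-fields-terminated-by ','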
LIST DATA
We can list both databases and the tables inside those databases. The Sqoop
list-databases tool parses and executes the ‘SHOW DATABASES’ query against
the database server and then lists out the databases present on the
server.
Syntax:
The following syntax is used for Sqoop list-databases command.
$ sqoop list-databases (generic-args) (list-databases-args)
$ sqoop-list-databases (generic-args) (list-databases-args)
Sample Query:
The following command is used to list all the databases in the MySQL
database server.
$ sqoop list-databases \
  --connect jdbc:mysql://localhost/ \
  --username root
LIST DATA (Contd.,)
The Sqoop list-tables tool lists the tables of a particular database on a
MySQL database server. It parses and executes the ‘SHOW TABLES’ query
against that database and then lists out the tables present in it.
Syntax:
The following syntax is used for Sqoop list-tables command.
$ sqoop list-tables (generic-args) (list-tables-args)
$ sqoop-list-tables (generic-args) (list-tables-args)
Sample Query:
The following command is used to list all the tables in the userdb database
of MySQL database server.
$ sqoop list-tables \
  --connect jdbc:mysql://localhost/userdb \
  --username root
CONCLUSION
Sqoop also provides an “eval” command to run user-defined SQL queries
against the tables (a sample invocation is sketched below). To conclude,
Sqoop transfers bulk data between RDBMS systems and distributed systems
very efficiently. It reduces the unnecessary effort of developers in
coding and maintaining the code. As Sqoop transfers the data in parallel,
the data transfer is also very fast. Because of its contributors and
support, Sqoop is of great help in the Hadoop world. It acts as middleware
between RDBMS and non-relational data stores.
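A minimal sketch of the eval command mentioned above (the query is an illustrative assumption):
# run an ad-hoc SQL query against the database and print the result
$ sqoop eval \
  --connect jdbc:mysql://localhost/db \
  --username root \
  --query 'SELECT * FROM employee LIMIT 3'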
Thank you!
Any queries?