Sqoop: Data Transfer in Hadoop

Sqoop is a tool used to transfer data between Hadoop and relational databases. It allows users to import data from databases such as MySQL into HDFS or Hive, and to export data from HDFS back to databases. The architecture of Sqoop consists of connectors, metadata, and a map-reduce job controller. Importing and exporting are handled through map-reduce jobs: the statements given by the user are converted to jobs and run on the HDFS cluster. Sqoop also provides commands to list databases and tables for easier management of data.


SRIKALAHASTEESWARA INSTITUTE OF TECHNOLOGY

Dept. of Computer Science and Engineering


Srikalahasti.

SQOOP – A HADOOP
TECHNOLOGY

By
A. Venkatamuni
15381A0526
C.S.E IV Year
CONTENTS

INTRODUCTION

ARCHITECTURE

IMPORTING DATA

EXPORTING DATA

LIST DATA

CONCLUSION
INTRODUCTION
 Sqoop is a tool designed to transfer data between Hadoop and relational database servers.

 It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system back to relational databases.

 It is provided by the Apache Software Foundation.


ARCHITECTURE
 The architecture of Sqoop consists of connectors, metadata, and the map-reduce job controller, as shown in the figure.
 Connectors are among the main components of Sqoop; they are responsible for ensuring that the database drivers supplied by the user connect properly with Sqoop.
 The metadata stores table internals such as indexes and partitions.
ARCHITECTURE (Contd.,)

 The importing and exporting of data is handled through a map-reduce job: the import statement given by the user is converted to a map-reduce job and submitted to the HDFS cluster.

 The map job launches multiple mappers, depending on the number defined by the user on the command line.

 For a Sqoop import, each mapper task is assigned a part of the data to be imported, based on the key defined on the command line.

 Each mapper then creates a connection to the database using JDBC, fetches the part of the data assigned by Sqoop, and writes it into HDFS, Hive, or HBase, depending on the option provided on the command line, as illustrated by the sketch below.
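As a minimal sketch of how this parallelism is controlled from the command line (the database db, table employee, and split column id here are hypothetical placeholders), an import that launches four mappers partitioned on the id key might look like this:

$ sqoop import \
--connect jdbc:mysql://localhost/db \
--username root \
--password password \
--table employee \
--split-by id \
-m 4

Here --split-by names the key column used to divide the table among the mappers, and -m (also written --num-mappers) sets how many mappers the map job launches.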
IMPORTING DATA
Sqoop allows RDBMS users to import the data in tables into any of the Hadoop platforms using the import command (Apache Sqoop import, 2013). The cases discussed below cover importing a table from MySQL to HDFS, HBase, and Hive.
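Import a Table from MySQL to HDFS:
A table in MySQL can be imported to HDFS with a command of the following form; this is a minimal sketch, and the database db, table employee, and target directory /emp/emp_data are hypothetical placeholders:
$ bin/sqoop import \
--connect jdbc:mysql://localhost/db \
--username root \
--password password \
--table employee \
--target-dir /emp/emp_data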
Import a Table from MySQL to HBase:
A table in MySQL can be imported to HBase using the command as follows:
Case 1: If the table has a primary key, import all the columns of the MySQL table:
$ bin/sqoop import \
--connect jdbc:mysql://localhost/databaseName \
--username user \
--password password \
--table tableName \
--hbase-table hbase_tableName \
--column-family hbase_table_col1 \
--hbase-create-table
Case 2: If the table doesn't have a primary key, choose one column as the hbase-row-key and import only a few columns:
$ bin/sqoop import \
--connect jdbc:mysql://localhost/databaseName \
--username user \
--password password \
--table tableName \
--hbase-table hbase_tableName \
--columns column1,column2 \
--column-family hbase_table_col \
--hbase-row-key column1 \
--hbase-create-table
IMPORTING DATA (Contd.,)
Import a Table from MySQL to Hive:

A table in MySQL can be imported to Hive using the command as follows:

Case 1: Import a MySQL table into Hive if the table has a primary key:
$ bin/sqoop-import \
--connect jdbc:mysql://localhost:3306/databaseName \
--username user \
--password password \
--table tableName \
--hive-table tableName \
--create-hive-table \
--hive-import \
--hive-home path/to/hive_home
Case 2: Import a MySQL table into Hive if the table doesn't have a primary key:
$ bin/sqoop-import \
--connect jdbc:mysql://localhost:3306/databaseName \
--username user \
--password password \
--table tableName \
--hive-table tableName \
--create-hive-table \
--hive-import \
--hive-home path/to/hive_home \
-m 1
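Once the import completes, the result can be checked from the Hive shell; a hypothetical verification step, using the tableName placeholder from above:
hive> SHOW TABLES;
hive> SELECT * FROM tableName LIMIT 5;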
EXPORTING DATA
 Export describes how to export data back from HDFS to the RDBMS database.
 The files given as input to Sqoop contain records, which are called rows in the table.
 These files are read and parsed into a set of records, delimited with a user-specified delimiter.
Syntax:
$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)
Example:
It is mandatory that the table to be exported is created manually and is present in the database into which the data is to be exported.
$ mysql
mysql> USE db;
mysql> CREATE TABLE employee (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(20),
    deg VARCHAR(20),
    salary INT,
    dept VARCHAR(10));
EXPORTING DATA (Contd.,)

 The following command is used to export the table data (which is in the emp_data file on HDFS) to the employee table in the db database of the MySQL database server.

$ sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
--export-dir /emp/emp_data

 The following command is used to verify the table on the MySQL command line:

mysql> SELECT * FROM employee;
LIST DATA

We can list both the databases and the tables inside those databases. The Sqoop list-databases tool parses and executes the 'SHOW DATABASES' query against the database server; thereafter, it lists the databases present on the server.
Syntax:
The following syntax is used for the Sqoop list-databases command:
$ sqoop list-databases (generic-args) (list-databases-args)
$ sqoop-list-databases (generic-args) (list-databases-args)
Sample Query:
The following command is used to list all the databases on the MySQL database server:
$ sqoop list-databases \
--connect jdbc:mysql://localhost/ \
--username root
LIST DATA (Contd.,)

Sqoop list-tables lists the tables of a particular database on the MySQL database server. The Sqoop list-tables tool parses and executes the 'SHOW TABLES' query against a particular database; thereafter, it lists the tables present in that database.
Syntax:
The following syntax is used for the Sqoop list-tables command:
$ sqoop list-tables (generic-args) (list-tables-args)
$ sqoop-list-tables (generic-args) (list-tables-args)
Sample Query:
The following command is used to list all the tables in the userdb database of the MySQL database server:
$ sqoop list-tables \
--connect jdbc:mysql://localhost/userdb \
--username root
CONCLUSION

 Sqoop provides the "eval" command to run user-defined queries and constraints against the tables (a sketch follows below).
 To conclude, Sqoop helps in transferring bulk data between RDBMS systems and distributed systems very optimally.
 It reduces the unnecessary effort of developers in coding and maintaining code.
 As Sqoop transfers the data in parallel, the data transfer is also very fast.
 Because of its contributors and support, Sqoop helps very much in the Hadoop world. It acts as middleware between RDBMS and non-structured data stores.
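As a minimal sketch of the eval command mentioned above (the database db and the query text are hypothetical placeholders), eval runs a user-defined SQL statement against the database and prints the results to the console:

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
--query "SELECT * FROM employee LIMIT 3"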
Thank you!
Any queries?
