Data Integration with Lumify and Datawrapper
• Lumify is an open-source toolkit developed by Altamira that helps manage big data.
• Eric Schmidt, executive chairman of Google, was quoted as saying that "the biggest disrupter (of
2014) that we're sure about is the arrival of big data and machine intelligence everywhere."
• Lumify addresses the concerns of big data by providing a platform for
• data integration,
• analytics,
• visualization.
• Key features include
• full-text search,
• 2D and 3D graph visualizations,
• link analysis between graph entities,
• geospatial analysis via integration with any OpenLayers-compatible mapping system,
• multimedia analysis for text, images, and video,
• collaboration features.
Datawrapper
• Datawrapper is a data visualization tool that creates charts and maps to help tell your story better.
• Datawrapper was designed for journalists and can be used on mobile or desktop devices during the
design phase.
• Features
• Datawrapper provides common charts and maps such as:
• Bar Charts
• Dot, Arrow, and Scatter Plots
• Column Charts
• Line Graphs
• Donut/Pie Charts
• Tables
• Symbol or Choropleth Maps
• Datawrapper is easy to use and requires no coding knowledge or design background.
• Datawrapper makes it easy to:
1. Copy and paste your data: data can be pasted directly or uploaded as CSV files exported from popular tools such as Excel
and Google Sheets.
2. Visualize: select from multiple chart and map types and insert annotations.
3. Publish: Datawrapper provides an embed code to use on your website, or lets you export your chart
or map as an image file or a PDF.
Sqoop
Introduction
Installation
Import and export
Introduction
• The traditional application management system, that is, the interaction of applications with relational databases using
an RDBMS, is one of the sources that generate Big Data.
• Such Big Data is stored in relational database servers in the relational database structure.
• When Big Data storage and analysis tools such as
• MapReduce,
• Hive,
• HBase,
• Cassandra,
• Pig, etc.
of the Hadoop ecosystem came into the picture, they required a tool to interact with relational database servers to import and export
the Big Data residing in them.
• Sqoop occupies a place in the Hadoop ecosystem to provide feasible interaction between relational database server and
Hadoop’s HDFS.
• Sqoop − “SQL to Hadoop and Hadoop to SQL”
• Sqoop is a tool designed to transfer data between Hadoop and relational database servers.
• It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file
system back to relational databases.
• It is provided by the Apache Software Foundation.
Key Features of Sqoop
• Sqoop provides many salient features, such as the following; a combined command-line example is sketched after this list.
• Full Load: Apache Sqoop can load a whole table with a single command. You can also load all the tables from a database using a single
command.
• Incremental Load: Apache Sqoop also provides the facility of incremental load, where you can load parts of a table whenever it is updated.
• Parallel import/export: Sqoop uses the YARN framework to import and export the data, which provides fault tolerance on top of parallelism.
• Import results of SQL query: You can also import the result returned by an SQL query into HDFS.
• Compression: You can compress your data using the deflate (gzip) algorithm with the --compress argument, or by specifying the --compression-codec
argument. You can also load a compressed table into Apache Hive.
• Connectors for all major RDBMS databases: Apache Sqoop provides connectors for multiple RDBMS databases, covering almost all of the
commonly used relational databases.
• Kerberos Security Integration: Kerberos is a computer network authentication protocol that works on the basis of ‘tickets’ to allow nodes
communicating over a non-secure network to prove their identity to one another in a secure manner. Sqoop supports Kerberos authentication.
• Load data directly into Hive/HBase: You can load data directly into Apache Hive for analysis and also dump your data into HBase, which is
a NoSQL database.
• Support for Accumulo: You can also instruct Sqoop to import a table into Accumulo rather than a directory in HDFS.
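To make a few of these features concrete, the following is a minimal sketch that combines a free-form query import with compression and a direct Hive load. The connection string, credentials, and column names are borrowed from the userdb/emp example used later in this section; the target directory and Hive table name are hypothetical.
# Hedged sketch: import the result of an SQL query, gzip-compressed, directly into a Hive table.
# $CONDITIONS is a placeholder that Sqoop substitutes so the query can be split across parallel mappers.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--query 'SELECT id, name, salary FROM emp WHERE $CONDITIONS' \
--split-by id \
--target-dir /user/hadoop/emp_query \
--compress \
--compression-codec org.apache.hadoop.io.compress.GzipCodec \
--hive-import \
--hive-table emp_query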
How Sqoop Works
(Figure: Sqoop architecture and working — the diagram illustrates the Sqoop import/export workflow between an RDBMS and HDFS.)
Import
• The import tool imports individual tables from RDBMS to HDFS.
• Each row in a table is treated as a record in HDFS.
• When we submit a Sqoop command, the main task gets divided into
subtasks, each of which is handled by an individual map task internally.
• A map task is a subtask that imports a part of the data into the Hadoop
ecosystem.
• Collectively, all map tasks import the whole dataset.
Export
• Export also works in a similar manner.
• The export tool exports a set of files from HDFS back to an RDBMS.
• The files given as input to Sqoop contain records, which are called rows in the table.
• When we submit our job, it is mapped into map tasks, which bring chunks of data from HDFS.
• These chunks are exported to a structured data destination.
• Combining all these exported chunks of data, we receive the whole data at the destination, which in most
cases is an RDBMS (MySQL/Oracle/SQL Server).
• A reduce phase is required only in the case of aggregations.
• But Apache Sqoop just imports and exports the data;
• it does not perform any aggregations.
• The map job launches multiple mappers, depending on the number defined by the user.
• For a Sqoop import, each mapper task is assigned a part of the data to be imported.
• Sqoop distributes the input data among the mappers equally to get high performance.
• Each mapper then creates a connection with the database using JDBC, fetches the part of the data assigned
to it by Sqoop, and writes it into HDFS, Hive, or HBase based on the arguments provided on the CLI (see the sketch below).
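As a hedged illustration of this parallel import (the flags are standard Sqoop options, but the split column, mapper count, and target directory are assumptions), the command below asks for four mappers and tells Sqoop to split the rows on the id column so that each mapper fetches its own range over JDBC.
# Hedged sketch: run the import with 4 parallel mappers, splitting the rows on the
# primary-key column id so each mapper fetches and writes its own slice of the table.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--split-by id \
--num-mappers 4 \
--target-dir /user/hadoop/emp_parallel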
Sqoop Import and Export
• Sqoop Import
• The import tool imports individual tables from RDBMS to HDFS.
• Each row in a table is treated as a record in HDFS.
• All records are stored as text data in text files or as binary data in Avro and Sequence
files (a format-selection sketch follows this list).
• Sqoop Export
• The export tool exports a set of files from HDFS back to an RDBMS.
• The files given as input to Sqoop contain records, which are called rows in the table.
• These are read and parsed into a set of records, delimited with a user-specified delimiter.
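As a sketch of the storage-format choice mentioned above, the command below stores the imported records as a binary Avro data file instead of the default delimited text; --as-avrodatafile (and --as-sequencefile for SequenceFiles) are standard Sqoop flags, while the target directory is an assumption.
# Hedged sketch: import the emp table as an Avro data file (binary) rather than
# the default comma-delimited text; swap in --as-sequencefile for SequenceFile output.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--m 1 \
--as-avrodatafile \
--target-dir /user/hadoop/emp_avro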
Example
Let us take an example of three tables named emp, emp_add, and emp_contact, which are in a database called
userdb on a MySQL database server.
The three tables and their data are as follows.
emp:
id name deg salary dept
1201 gopal manager 50,000 TP
1202 manisha Proof reader 50,000 TP
1203 khalil php dev 30,000 AC
1204 prasanth php dev 30,000 AC
1205 kranthi admin 20,000 TP
emp_add:
id hno street city
1201 288A vgiri jublee
1202 108I aoc sec-bad
1203 144Z pgutta hyd
1204 78B old city sec-bad
1205 720X hitec sec-bad
emp_contact:
id phno email
1201 2356742 [email protected]
1202 1661663 [email protected]
1203 8887776 [email protected]
1204 9988774 [email protected]
1205 1231231 [email protected]
Import Tool
Syntax
$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)
The following command is used to import the emp table from MySQL
database server to HDFS.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--m 1
Here --m 1 specifies that a single mapper is used for the import. The table is downloaded into the default directory in HDFS. The default number of
mappers is 4.
You can change this by appending "-m <number_of_mappers>" to the command.
To verify the imported data in HDFS, use the following command.
$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*
It shows you the emp table data with comma (,) separated fields.
1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP
Importing into Target Directory
We can specify the target directory while importing table data into HDFS
using the Sqoop import tool.
The following is the syntax to specify the target directory as an option to the Sqoop import
command.
--target-dir <new or exist directory in HDFS>
The following command is used to import emp_add table data into
‘/queryresult’ directory.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp_add \
--m 1 \
--target-dir /queryresult
Import
• The following command is used to verify the imported data in the
/queryresult directory from the emp_add table.
• $ $HADOOP_HOME/bin/hadoop fs -cat /queryresult/part-m-*
• It will show you the emp_add table data with comma (,) separated
fields.
1201, 288A, vgiri, jublee
1202, 108I, aoc, sec-bad
1203, 144Z, pgutta, hyd
1204, 78B, oldcity, sec-bad
1205, 720X, hitec, sec-bad
Import Subset of Table Data
We can import a subset of a table using the ‘where’ clause in Sqoop import tool.
It executes the corresponding SQL query in the respective database server and stores the result in a target directory
in HDFS.
The syntax for where clause is as follows.
--where <condition>
Example: The following command is used to import a subset of emp_add table data.
The subset query retrieves the employee id and address of employees who live in Secunderabad (sec-bad) city.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp_add \
--m 1 \
--where "city ='sec-bad'" \
--target-dir /wherequery
• The following command is used to verify the imported data in
/wherequery directory from the emp_add table.
• $ $HADOOP_HOME/bin/hadoop fs -cat /wherequery/part-m-*
• It will show you the emp_add table data with comma (,) separated
fields.
1202, 108I, aoc, sec-bad
1204, 78B, oldcity, sec-bad
1205, 720X, hitec, sec-bad
Incremental Import
• Incremental import is a technique that imports only the newly added rows
in a table.
• It is required to add ‘incremental’, ‘check-column’, and ‘last-value’
options to perform the incremental import.
• The following syntax is used for the incremental option in Sqoop import
command.
--incremental <mode>
--check-column <column name>
--last-value <last check column value>
Example
Let us assume the newly added data into emp table is as follows −
1206, satish p, grp des, 20000, GR
The following command is used to perform the incremental import in the emp table.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--m 1 \
--incremental append \
--check-column id \
--last-value 1205
The following command is used to verify the imported data from emp table to HDFS emp/
directory.
$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*
It shows you the emp table data with comma (,) separated fields.
1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP
1206, satish p, grp des, 20000, GR
The following command is used to see the modified or newly added rows from the emp table.
$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*1
It shows you the newly added rows to the emp table with comma (,) separated fields.
1206, satish p, grp des, 20000, GR
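For repeated incremental imports, Sqoop's saved-job facility can track the last imported value between runs, so --last-value does not have to be updated by hand. The sketch below is a hedged illustration against the same emp table; the job name emp_incremental is arbitrary.
# Hedged sketch: save the incremental import as a Sqoop job; the job metastore
# remembers the last value of the check column between runs.
$ sqoop job --create emp_incremental -- import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--incremental append \
--check-column id \
--last-value 1205
# Run (and re-run) the saved job; each execution imports only rows added since the last run.
$ sqoop job --exec emp_incremental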
Import all tables
Syntax
$ sqoop import-all-tables (generic-args) (import-args)
Example
Let us take an example of importing all tables from the userdb database.
The list of tables that the database userdb contains is as follows.
+--------------------+
| Tables             |
+--------------------+
| emp                |
| emp_add            |
| emp_contact        |
+--------------------+
The following command is used to import all the tables from the userdb database.
$ sqoop import-all-tables \
--connect jdbc:mysql://localhost/userdb \
--username root
The following command is used to verify that all the table data from the userdb database has been imported into HDFS.
$ $HADOOP_HOME/bin/hadoop fs -ls
Output
drwxr-xr-x - hadoop supergroup 0 2014-12-22 22:50 _sqoop
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:46 emp
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:50 emp_add
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:52 emp_contact
Export
The following is the syntax for the export command.
$ sqoop export (generic-args) (export-args)
Example
Let us take an example of employee data in a file in HDFS.
The employee data is available in the emp_data file in the ‘emp/’ directory in HDFS.
The emp_data file is as follows.
1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP
1206, satish p, grp des, 20000, GR
It is mandatory that the table into which the data is to be exported is created manually and is present in the
target database.
The following query is used to create the table ‘employee’ on the MySQL
command line.
$ mysql
mysql> USE db;
mysql> CREATE TABLE employee (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(20),
deg VARCHAR(20),
salary INT,
dept VARCHAR(10));
The following command is used to export the table data (which is in the
emp_data file on HDFS) to the employee table in the db database of the MySQL
database server.
$ sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
--export-dir /emp/emp_data
The following command is used to verify the table on the MySQL command line.
mysql> SELECT * FROM employee;
If the given data is stored successfully, then you can find the following table of given employee
data.
+------+----------+-------------+--------+------+
| Id   | Name     | Designation | Salary | Dept |
+------+----------+-------------+--------+------+
| 1201 | gopal    | manager     | 50000  | TP   |
| 1202 | manisha  | preader     | 50000  | TP   |
| 1203 | kalil    | php dev     | 30000  | AC   |
| 1204 | prasanth | php dev     | 30000  | AC   |
| 1205 | kranthi  | admin       | 20000  | TP   |
| 1206 | satish p | grp des     | 20000  | GR   |
+------+----------+-------------+--------+------+