{"id":104801,"date":"2021-09-20T11:00:00","date_gmt":"2021-09-20T08:00:00","guid":{"rendered":"https:\/\/examples.javacodegeeks.com\/?p=104801"},"modified":"2021-09-17T14:06:52","modified_gmt":"2021-09-17T11:06:52","slug":"apache-hadoop-etl-tutorial","status":"publish","type":"post","link":"https:\/\/examples.javacodegeeks.com\/apache-hadoop-etl-tutorial\/","title":{"rendered":"Apache Hadoop ETL Tutorial"},"content":{"rendered":"<h2 class=\"wp-block-heading\" id=\"h-1-introduction\">1. Introduction<\/h2>\n<p>This is an in-depth article related to the Apache Hadoop ETL Tool &#8211; Hive. Hive is part of the Hadoop Ecosystem. It is used in Big Data solutions with Hadoop. It was developed by Facebook. Hadoop is an Apache Opensource project now. Hive is used as ETL (Extraction-Transformation-Load) tool in the Hadoop system for the execution of queries and handling big data. <\/p>\n<h2 class=\"wp-block-heading\" id=\"h-2-apache-hadoop-etl\">2. Apache Hadoop ETL<\/h2>\n<h3 class=\"wp-block-heading\" id=\"h-2-1-prerequisites\">2.1 Prerequisites<\/h3>\n<p>Java 7 or 8 is required on the Linux, windows, or Mac operating system. Maven 3.6.1 is required. Apache Hadoop 2.9.1 and Hive 3.1.2 are used in this example.<\/p>\n<h3 class=\"wp-block-heading\" id=\"h-2-2-download\">2.2 Download<\/h3>\n<p>You can download Java 8 can be downloaded from the Oracle <a href=\"https:\/\/www.oracle.com\/technetwork\/java\/javase\/downloads\/jdk8-downloads-2133151.html\">website<\/a>.\u00a0Apache Maven 3.6.1 can be downloaded from<a href=\"https:\/\/maven.apache.org\/download.cgi\"> the\u00a0Apache site<\/a>.   Apache Hadoop 2.9.1 can be downloaded from\u00a0<a href=\"https:\/\/hadoop.apache.org\/releases.html\">Hadoop Website<\/a>. You can download Apache Hive 3.1.2 from the\u00a0<a href=\"https:\/\/hive.apache.org\/downloads.html\">Hive website<\/a>.<\/p>\n<h3 class=\"wp-block-heading\" id=\"h-2-3-setup\">2.3 Setup<\/h3>\n<p>You can set the environment variables for JAVA_HOME and PATH. They can be set as shown below:<\/p>\n<p><span style=\"text-decoration: underline\"><em>Setup<\/em><\/span><\/p>\n<pre class=\"brush:plain\">JAVA_HOME=\"\/desktop\/jdk1.8.0_73\"\nexport JAVA_HOME\nPATH=$JAVA_HOME\/bin:$PATH\nexport PATH\n<\/pre>\n<p>The environment variables for maven are set as below:<\/p>\n<p><span style=\"text-decoration: underline\"><em>Maven Environment<\/em><\/span><\/p>\n<pre class=\"brush:plain\">JAVA_HOME=\u201d\/jboss\/jdk1.8.0_73\u2033\nexport M2_HOME=\/users\/bhagvan.kommadi\/Desktop\/apache-maven-3.6.1\nexport M2=$M2_HOME\/bin\nexport PATH=$M2:$PATH\n<\/pre>\n<h3 class=\"wp-block-heading\" id=\"h-2-4-how-to-download-and-install-hadoop-and-etl-tool\">2.4 How to download and install Hadoop and ETL tool<\/h3>\n<p> After downloading the zip files of Hadoop and hive, they can be extracted to different folders.  The libraries in the libs folder are set in the CLASSPATH variable.<\/p>\n<h3 class=\"wp-block-heading\" id=\"h-2-5-apache-hive\">2.5 Apache Hive<\/h3>\n<p>Apache Hive is part of the Hadoop Ecosystem. It is used in Big Data solutions with Hadoop. It was developed by Facebook. Hadoop is an Apache Opensource project now. Apache Hive has features for SQL Access of data, handling multiple data formats, file access from Apache HDFS ad Apache HBase, executing query through Apache Tez, Apache Spark or Map Reduce, HPL-SQL language support, and query retrieval using Hive LLAP (<strong>Low Latency Analytical Processin<\/strong>g), Apache YARN &amp; Apache Slider. 
### 2.5 Apache Hive

Apache Hive is part of the Hadoop ecosystem and is used in Big Data solutions built on Hadoop. Originally developed at Facebook, it is now an Apache open-source project. Apache Hive provides SQL access to data, handles multiple data formats, reads files from Apache HDFS and Apache HBase, executes queries through Apache Tez, Apache Spark, or MapReduce, supports the HPL/SQL language, and offers query retrieval using Hive LLAP (Low Latency Analytical Processing), Apache YARN, and Apache Slider. Hive ships with a command-line tool and a JDBC driver for data operations.

Apache Hive has HCatalog and WebHCat components. HCatalog stores data into Hadoop and provides data-processing capabilities through Pig and MapReduce. WebHCat is used to run Hadoop MapReduce, Pig, and Hive jobs, and metadata operations can be managed through a REST API. Hive handles JDBC data types when performing data transformations.

Apache Hive executes queries on Hadoop as MapReduce jobs, and customizations can be added as scripts. Hive can be used to store data into its database, and the data can contain primitives and collections. The tool has a CLI (Command Line Interface) that is used to execute DDL-based queries. The Hive query language supports CONCAT, SUBSTR, ROUND, SUM, COUNT, MAX, and other operations, as well as GROUP BY and SORT BY clauses; a small example of such a query run over JDBC is sketched below.
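To illustrate these query-language features, the following is a minimal sketch that runs an aggregate HiveQL query over JDBC against the EMPLOYEE table created in the example in section 2.6. It assumes that table already exists and reuses the embedded connection string from that example; the exact query is illustrative only:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveAggregateQueryExample {
    public static void main(String[] args) throws SQLException {
        // Register the Hive JDBC driver, as in the full example in section 2.6.
        try { Class.forName("org.apache.hive.jdbc.HiveDriver"); } catch (ClassNotFoundException e) { throw new SQLException(e); }

        // Embedded-mode connection, as used in the full example in section 2.6.
        Connection connection = DriverManager.getConnection("jdbc:hive2://", "", "");
        Statement statement = connection.createStatement();

        // Aggregate HiveQL over the EMPLOYEE table: COUNT and MAX per address,
        // combined with CONCAT, GROUP BY, and SORT BY.
        String sql = "SELECT ADDR, COUNT(*), CONCAT('emp-', MAX(NAME)) "
                + "FROM EMPLOYEE GROUP BY ADDR SORT BY ADDR";
        ResultSet result = statement.executeQuery(sql);
        while (result.next()) {
            System.out.println("Address=" + result.getString(1)
                    + ", Count=" + result.getString(2)
                    + ", Label=" + result.getString(3));
        }
        result.close();
        statement.close();
        connection.close();
    }
}
```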
### 2.6 Apache Hadoop ETL Example

You need to configure HADOOP_HOME as below:

*Setup*

```
export HADOOP_HOME=/users/bhagvan.kommadi/desktop/hadoop-2.9.1/
```

You need to configure $HADOOP_HOME/etc/hadoop/core-site.xml as below:

*Core Site XML file*

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
 <name>fs.defaultFS</name>
 <value>hdfs://apples-MacBook-Air.local:8020</value>
</property>

</configuration>
```

You can then start Hadoop with the commands below:

*Hadoop Execution*

```
cd hadoop-2.9.1/
cd sbin
./start-dfs.sh
```

The output of the commands is shown below:

*Hadoop Execution*

```
apples-MacBook-Air:sbin bhagvan.kommadi$ ./start-dfs.sh
20/09/14 20:26:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [apples-MacBook-Air.local]
apples-MacBook-Air.local: Warning: Permanently added the ECDSA host key for IP address 'fe80::4e9:963f:5cc3:a000%en0' to the list of known hosts.
Password:
apples-MacBook-Air.local: starting namenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-namenode-apples-MacBook-Air.local.out
Password:
localhost: starting datanode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-datanode-apples-MacBook-Air.local.out
Starting secondary namenodes [0.0.0.0]
Password:
0.0.0.0: starting secondarynamenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-secondarynamenode-apples-MacBook-Air.local.out
20/09/14 20:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```

MySQL is used as the database for the Hive metastore. You need to configure $HIVE_HOME/conf/hive-site.xml as below:

*Hive Site – Hive Configuration*

```xml
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/users/bhagvan.kommadi/hive/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hivedb?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>newuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>newuser</value>
  </property>
</configuration>
```

You need to start Hive (HiveServer2) with the commands below:

*Hive Execution*

```
export HIVE_HOME=/users/bhagvan.kommadi/desktop/apache-hive-3.1.2-bin/
$HIVE_HOME/bin/hiveserver2
```

The output of the commands is shown below:

*Hive Execution*

```
apples-MacBook-Air:hive bhagvan.kommadi$ $HIVE_HOME/bin/hiveserver2
2020-09-14 23:56:26: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/bhagvan.kommadi/Desktop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/bhagvan.kommadi/Desktop/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 28c5134a-d9f7-4ac2-9313-a04386f57ac9
Hive Session ID = 9c2982fa-965d-43e3-9f45-660e899a8958
Hive Session ID = 3000b392-aa68-4db1-ae3f-5b55c0fda19d
Hive Session ID = da06d930-091f-4097-b8b0-cd463e14dc2d
Hive Session ID = be1d5b5a-7f1a-4608-a08e-68f5515a2d90
Hive Session ID = 42f8afa1-3399-490e-8101-3f28d8d30072
Hive Session ID = 17b1f2aa-2c6d-40ff-849b-4c82fd1e38e0
Hive Session ID = d4e82376-f0ee-42e1-b27c-70dd8ce6efdc
Hive Session ID = 1e20ac56-21cc-45ef-9976-48078c6e3a12
Hive Session ID = 5821afdf-696f-46d1-acfe-15f1cf078e4e
Hive Session ID = f67cf1ba-937b-46a3-92b7-9c9efd145ae2
Hive Session ID = 9d8e3c3e-e216-4907-b0ba-08f23ffc8fd4
Hive Session ID = 316e0807-9c55-4bb5-a8da-360396581870
Hive Session ID = cef4c8de-9da8-4617-a053-9e28b40e8d6b
Hive Session ID = 596b7b81-47d1-4b09-9816-e88576c5529c
Hive Session ID = 7b1fe697-77e7-4c19-ac19-b0e0bf942480
Hive Session ID = 3aa7813d-f6a8-4238-a0b4-334106946266
Hive Session ID = e6631200-ee2b-487a-af8f-5d25f2a5e193
```
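Once HiveServer2 is up, you can verify connectivity from Java before running the full example. The sketch below assumes HiveServer2 is listening on its default port 10000 on localhost with an anonymous user; adjust the URL and credentials for your own setup:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class HiveServer2ConnectionCheck {
    public static void main(String[] args) throws SQLException {
        // Register the Hive JDBC driver.
        try { Class.forName("org.apache.hive.jdbc.HiveDriver"); } catch (ClassNotFoundException e) { throw new SQLException(e); }

        // Default HiveServer2 port is 10000; change the host/port/database as needed.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection connection = DriverManager.getConnection(url, "", "")) {
            System.out.println("Connected to HiveServer2: " + !connection.isClosed());
        }
    }
}
```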
To configure the JDBC connection to Apache Hive, you can use the following code:

*Hadoop ETL Example*

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HadoopETLExample {
	private static String driverClass = "org.apache.hive.jdbc.HiveDriver";

	public static void main(String args[]) throws SQLException {
		// Register the Hive JDBC driver.
		try {
			Class.forName(driverClass);
		} catch (ClassNotFoundException exception) {
			exception.printStackTrace();
			System.exit(1);
		}

		// Embedded-mode connection (no host/port, empty user and password).
		Connection connection = DriverManager.getConnection("jdbc:hive2://", "", "");
		Statement statement = connection.createStatement();

		String table = "EMPLOYEE";
		// Drop the table if it already exists.
		try {
			statement.execute("DROP TABLE " + table);
		} catch (Exception exception) {
			exception.printStackTrace();
		}

		// Create the EMPLOYEE table.
		try {
			statement.execute("CREATE TABLE " + table + " (ID INT, NAME STRING, ADDR STRING)");
		} catch (Exception exception) {
			exception.printStackTrace();
		}

		// Confirm the table exists.
		String sql = "SHOW TABLES '" + table + "'";
		System.out.println("Executing Show table: " + sql);
		ResultSet result = statement.executeQuery(sql);
		if (result.next()) {
			System.out.println("Table created is :" + result.getString(1));
		}

		// Load a row into the table.
		sql = "INSERT INTO EMPLOYEE (ID,NAME,ADDR) VALUES (1, 'John', '4 WestDrive SJC' )";
		System.out.println("Inserting table into employee: " + sql);
		try {
			statement.executeUpdate(sql);
		} catch (Exception exception) {
			exception.printStackTrace();
		}

		// Read the rows back.
		sql = "SELECT * FROM " + table;
		System.out.println("Running: " + sql);
		result = statement.executeQuery(sql);
		while (result.next()) {
			System.out.println("Id=" + result.getString(1));
			System.out.println("Name=" + result.getString(2));
			System.out.println("Address=" + result.getString(3));
		}
		result.close();

		statement.close();

		connection.close();
	}
}
```
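The example above covers the extraction and load steps; as a sketch of a transformation step, the loaded table can be reshaped into a derived table with a CREATE TABLE ... AS SELECT statement. The derived table name EMPLOYEE_SUMMARY and the transformation itself are hypothetical and only illustrate the idea:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HadoopETLTransformExample {
    public static void main(String[] args) throws SQLException {
        // Register the Hive JDBC driver and open the same embedded-mode connection
        // used in the main example.
        try { Class.forName("org.apache.hive.jdbc.HiveDriver"); } catch (ClassNotFoundException e) { throw new SQLException(e); }
        Connection connection = DriverManager.getConnection("jdbc:hive2://", "", "");
        Statement statement = connection.createStatement();

        // Transformation step: build a derived table (hypothetical name) from EMPLOYEE,
        // normalising the name to upper case.
        statement.execute("DROP TABLE IF EXISTS EMPLOYEE_SUMMARY");
        statement.execute("CREATE TABLE EMPLOYEE_SUMMARY AS "
                + "SELECT ID, UPPER(NAME) AS NAME, ADDR FROM EMPLOYEE");

        // Verify the transformed rows.
        ResultSet result = statement.executeQuery("SELECT * FROM EMPLOYEE_SUMMARY");
        while (result.next()) {
            System.out.println("Id=" + result.getString(1)
                    + ", Name=" + result.getString(2)
                    + ", Addr=" + result.getString(3));
        }
        result.close();
        statement.close();
        connection.close();
    }
}
```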
In Eclipse, a Java project is configured with the following dependencies:

- hive-jdbc.3.1.2-standalone.jar
- $HIVE_HOME/lib/*.jar files
- $HADOOP_HOME/share/hadoop/mapreduce/*.jar files
- $HADOOP_HOME/share/hadoop/common/*.jar

The Apache Hive JDBC code is executed from Eclipse using the Run command. The output is shown below:

*Hive Execution*

```
Loading data to table default.employee
2020-09-14T23:56:57,782 INFO [HiveServer2-Background-Pool: Thread-42] org.apache.hadoop.hive.ql.exec.Task - Loading data to table default.employee from file:/users/bhagvan.kommadi/hive/warehouse/customer/.hive-staging_hive_2020-09-14_23-56-50_794_3066299632130740540-1/-ext-10000
2020-09-14T23:56:57,784 INFO [HiveServer2-Background-Pool: Thread-42] org.apache.hadoop.hive.metastore.HiveMetaStore - 4: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
Running: SELECT * FROM EMPLOYEE
2020-09-14T23:56:58,584 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 42cd1c1e-dae1-4eb2-932c-57bf6653e77d
2020-09-14T23:56:58,584 INFO [main] org.apache.hadoop.hive.ql.session.SessionState - Updating thread name to 42cd1c1e-dae1-4eb2-932c-57bf6653e77d main
2020-09-14T23:56:58,785 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 42cd1c1e-dae1-4eb2-932c-57bf6653e77d
2020-09-14T23:56:58,786 INFO [main] org.apache.hadoop.hive.ql.session.SessionState - Updating thread name to 42cd1c1e-dae1-4eb2-932c-57bf6653e77d main
2020-09-14T23:56:58,786 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 42cd1c1e-dae1-4eb2-932c-57bf6653e77d
2020-09-14T23:56:58,786 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.hive.ql.session.SessionState - Resetting thread name to  main
2020-09-14T23:56:58,786 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 42cd1c1e-dae1-4eb2-932c-57bf6653e77d
2020-09-14T23:56:58,787 INFO [main] org.apache.hadoop.hive.ql.session.SessionState - Updating thread name to 42cd1c1e-dae1-4eb2-932c-57bf6653e77d main
2020-09-14T23:56:58,833 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.mapred.FileInputFormat - Total input files to process : 1
2020-09-14T23:56:58,837 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.hive.ql.exec.TableScanOperator - RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_TS_0:1,
2020-09-14T23:56:58,838 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.hive.ql.exec.SelectOperator - RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_SEL_1:1,
2020-09-14T23:56:58,838 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.hive.ql.exec.ListSinkOperator - RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_LIST_SINK_3:1,
2020-09-14T23:56:58,838 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 42cd1c1e-dae1-4eb2-932c-57bf6653e77d
2020-09-14T23:56:58,838 INFO [42cd1c1e-dae1-4eb2-932c-57bf6653e77d main] org.apache.hadoop.hive.ql.session.SessionState - Resetting thread name to  main
Id=1
Name=John
Address=4 WestDrive SJC
```

The output above shows only the SELECT query against the EMPLOYEE table; in the code, the EMPLOYEE table is first created and data is then inserted into it. Apache Hive JDBC calls are standard Java JDBC calls that use HiveQL (similar to SQL). Apache Hive can be used for modeling, manipulating, processing, and querying data.
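As a design note, the example closes the Connection, Statement, and ResultSet manually. In modern Java, a try-with-resources block guarantees they are closed even when a query fails; the following is a minimal sketch of the same SELECT written that way, assuming the EMPLOYEE table already exists:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HadoopETLTryWithResources {
    public static void main(String[] args) throws SQLException {
        // Register the Hive JDBC driver.
        try { Class.forName("org.apache.hive.jdbc.HiveDriver"); } catch (ClassNotFoundException e) { throw new SQLException(e); }

        // try-with-resources closes the connection, statement, and result set
        // automatically, even if an exception is thrown while reading rows.
        try (Connection connection = DriverManager.getConnection("jdbc:hive2://", "", "");
             Statement statement = connection.createStatement();
             ResultSet result = statement.executeQuery("SELECT * FROM EMPLOYEE")) {
            while (result.next()) {
                System.out.println("Id=" + result.getString(1)
                        + ", Name=" + result.getString(2)
                        + ", Address=" + result.getString(3));
            }
        }
    }
}
```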
## 3. Download the Source Code

You can download the full source code of this example here: Apache Hadoop ETL Tutorial (https://examples.javacodegeeks.com/wp-content/uploads/2021/09/apachehadoopetl.zip)