{"id":11706,"date":"2019-11-02T17:58:22","date_gmt":"2019-11-02T17:58:22","guid":{"rendered":"https:\/\/ittutorial.org\/?p=11706"},"modified":"2019-11-10T10:07:26","modified_gmt":"2019-11-10T10:07:26","slug":"big-data-import-csv-to-hive","status":"publish","type":"post","link":"https:\/\/ittutorial.org\/big-data-import-csv-to-hive\/","title":{"rendered":"Big Data &#8211; Import .csv to Hive"},"content":{"rendered":"<p>Hi everyone. <span>In this article we will see how to load a dataset downloaded from Kaggle into a Hive table.<\/span><\/p>\n<p><span>Hive is not a database. It lets you use SQL by defining metadata over files stored in HDFS. Long story short, it makes it possible to query HDFS files.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"irc_mi\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/b\/bb\/Apache_Hive_logo.svg\/1200px-Apache_Hive_logo.svg.png\" alt=\"Apache Hive logo\" width=\"576\" height=\"518\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span>I&#8217;m doing this on the virtual machine I downloaded from Cloudera&#8217;s site: <a href=\"https:\/\/www.cloudera.com\/downloads\/quickstart_vms\/5-13.html\">https:\/\/www.cloudera.com\/downloads\/quickstart_vms\/5-13.html<\/a><\/span><\/p>\n<p><span>First, download a dataset from Kaggle.<\/span><\/p>\n<p>Let&#8217;s try this one:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11708\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_1.png\" alt=\"\" width=\"1698\" height=\"908\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_1.png 1698w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_1-300x160.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_1-768x411.png 768w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_1-1024x548.png 1024w\" 
sizes=\"auto, (max-width: 1698px) 100vw, 1698px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span>Copy the downloaded dataset to the virtual machine with a program such as WinSCP or FileZilla.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11709\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_2.png\" alt=\"\" width=\"1325\" height=\"710\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_2.png 1325w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_2-300x161.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_2-768x412.png 768w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_2-1024x549.png 1024w\" sizes=\"auto, (max-width: 1325px) 100vw, 1325px\" \/><\/p>\n<p><span>Connect to the virtual machine and start working. As the screenshot shows, the file we copied to the virtual machine is there.<\/span><\/p>\n<p><span>Let&#8217;s copy this file to the Hadoop file system (HDFS).<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11710\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_3.png\" alt=\"\" width=\"684\" height=\"399\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_3.png 684w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_3-300x175.png 300w\" sizes=\"auto, (max-width: 684px) 100vw, 684px\" \/><\/p>\n<p>&nbsp;<\/p>\n<pre># Copy the CSV from the local disk into the HDFS directory \/data (assumed to exist already)\r\nhadoop fs -copyFromLocal african_crises.csv \/data\/\r\n\r\n# List \/data to confirm that the file arrived\r\nhadoop fs -ls \/data<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-11711\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_4.png\" alt=\"\" width=\"966\" height=\"169\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_4.png 715w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_4-300x52.png 
300w\" sizes=\"auto, (max-width: 966px) 100vw, 966px\" \/><\/p>\n<p><span>Now we will load this CSV file into a table that we will create.<\/span><\/p>\n<p><span>You can do this from the &#8220;hive shell&#8221; or from &#8220;Hue&#8221;<\/span>. <span>The steps are the same in both<\/span>.<\/p>\n<p><span>Because its output is easier to read, let&#8217;s do this in <strong>Hue<\/strong>.<\/span><\/p>\n<p><span>After opening Hue in the web interface, open the location indicated by the arrow.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11712\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_5.png\" alt=\"\" width=\"1108\" height=\"660\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_5.png 1108w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_5-300x179.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_5-768x457.png 768w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_5-1024x610.png 1024w\" sizes=\"auto, (max-width: 1108px) 100vw, 1108px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span>I took the table&#8217;s column names from the CSV file and set the data types.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11713\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_6.png\" alt=\"\" width=\"1469\" height=\"686\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_6.png 1469w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_6-300x140.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_6-768x359.png 768w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_6-1024x478.png 1024w\" sizes=\"auto, (max-width: 1469px) 100vw, 1469px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span>We are now ready to load the data from the CSV file into 
the table.<\/span><\/p>\n<pre>-- LOCAL reads the file from the local disk of the VM, not from HDFS. OVERWRITE replaces any rows already in the table.\r\nLOAD DATA LOCAL INPATH '\/home\/cloudera\/african_crises.csv' OVERWRITE INTO TABLE african_crises;<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11714\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_7.png\" alt=\"\" width=\"964\" height=\"153\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_7.png 964w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_7-300x48.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_7-768x122.png 768w\" sizes=\"auto, (max-width: 964px) 100vw, 964px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Check the table:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-11715\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_8.png\" alt=\"\" width=\"1188\" height=\"406\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_8.png 1188w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_8-300x103.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_8-768x262.png 768w, https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/Screenshot_8-1024x350.png 1024w\" sizes=\"auto, (max-width: 1188px) 100vw, 1188px\" \/><\/p>\n<p>Everything looks fine.<\/p>\n<p>&nbsp;<\/p>\n<p>See you in the next article.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hi everyone. In this article we will see how to load a dataset downloaded from Kaggle into a Hive table. Hive is not a database. It lets you use SQL by defining metadata over files stored in HDFS. Long story short, it makes it possible to query HDFS files. 
&hellip;<\/p>\n","protected":false},"author":67,"featured_media":11707,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6674],"tags":[],"class_list":["post-11706","post","type-post","status-publish","format-standard","has-post-thumbnail","","category-big-data"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/ittutorial.org\/wp-content\/uploads\/2019\/11\/1_BMO8Je_iwGEMQo26m6GhQw.png","jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/11706","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/comments?post=11706"}],"version-history":[{"count":1,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/11706\/revisions"}],"predecessor-version":[{"id":11716,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/11706\/revisions\/11716"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/media\/11707"}],"wp:attachment":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/media?parent=11706"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/categories?post=11706"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/tags?post=11706"}],"curies":[{"name":"wp","href":"https:\/\/ap
i.w.org\/{rel}","templated":true}]}}