
SIC Big Data Chapter 4 HBase

This document provides a comprehensive guide on performing data access operations using HBase, including creating, deleting, and altering tables, as well as executing CRUD operations. It covers the use of HBase shell commands for inserting, retrieving, and managing data within tables, and introduces concepts such as Time-To-Live (TTL) and versioning. Additionally, it details the process of scripting HBase commands for batch execution and demonstrates how to manage column families and their properties.

Uploaded by

Hoàng Nguyễn

3. Data Access with HBase


In this lab, you perform basic table operations in the HBase shell: creating, altering, and deleting a table.
You also use the shell to put and get data in HBase.

1. Start with HBase Shell


1.1. Start the HBase shell, then run the help command to view basic usage
information for the shell.
$ hbase shell
hbase(main):001:0> help
Note: The HBase shell prompt ends with a ">" character.
1.2. Display the version and the cluster status.
hbase(main):001:0> version
hbase(main):001:0> status

The output shows the HBase version (2.3.5) and confirms standalone execution (1 active master, 1 server).
1.3. Use the create command to create a new table. You must specify the table name and the
column family name.
1.3.1. Table name: tbl_authors, Column Family: cf1
hbase(main):001:0> create 'tbl_authors', 'cf1'
Note: Table names, row keys, and column names must all be enclosed in quotation marks.
1.4. Use the list command to verify that table tbl_authors was created. You can run list
on its own or pass it the table name.
hbase(main):002:0> list
hbase(main):003:0> list 'tbl_authors'

1.5. Use the describe command to see details, including configuration defaults.
hbase(main):001:0> describe 'tbl_authors'
hbase(main):002:0> describe

1.6. Try to delete the table you just created. The drop command fails because the table is still enabled.
hbase(main):001:0> drop 'tbl_authors'
Note: Table tbl_authors is still enabled. Disable it first, then delete it.
1.7. Disable the tbl_authors table, drop it, and list all tables to verify that it was deleted.
hbase(main):003:0> disable 'tbl_authors'
hbase(main):004:0> drop 'tbl_authors'
hbase(main):005:0> list
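
The enabled/disabled rule in steps 1.6 and 1.7 can be pictured with a small Python sketch. It is only a toy in-memory model: the ToyCatalog class and its method names are hypothetical and are not part of any HBase API. It shows a drop being refused while the table is enabled and succeeding after a disable.

```python
class ToyCatalog:
    """Tracks table names and whether each one is enabled. Not an HBase API."""

    def __init__(self):
        self._enabled = {}  # table name -> True (enabled) or False (disabled)

    def create(self, name):
        self._enabled[name] = True  # a newly created table starts out enabled

    def disable(self, name):
        self._enabled[name] = False

    def drop(self, name):
        # Mirror the shell's rule: an enabled table cannot be dropped.
        if self._enabled[name]:
            raise RuntimeError(f"Table {name} is enabled. Disable it first.")
        del self._enabled[name]

    def list_tables(self):
        return sorted(self._enabled)


catalog = ToyCatalog()
catalog.create("tbl_authors")

try:
    catalog.drop("tbl_authors")      # step 1.6: refused, table is enabled
except RuntimeError as err:
    first_attempt = str(err)

catalog.disable("tbl_authors")       # step 1.7: disable, then drop
catalog.drop("tbl_authors")

print(first_attempt)
print(catalog.list_tables())
```

The second print shows an empty list, which corresponds to the list command no longer showing tbl_authors.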

1.8. Create a test table and use the put command to insert three values, one at a time.
1.8.1. The input data has the following format.

Table name: temp, Column Family: cf1

Rowkey   cf1:a   cf1:b   cf1:c
rk1      A
rk2              B
rk3                      C

hbase(main):003:0> create 'temp', 'cf1'


hbase(main):004:0> put 'temp', 'rk1', 'cf1:a', 'A'
hbase(main):005:0> put 'temp', 'rk2', 'cf1:b', 'B'
hbase(main):006:0> put 'temp', 'rk3', 'cf1:c', 'C'

1.9. Count the number of rows in the temp table that you created in the previous step.
hbase(main):003:0> count 'temp'
1.10. What is the expected number of rows after running the following commands?
1.10.1. Insert the following data into the temp table.
Column Family: cf1
RowKey: rk4, column descriptor: d, value: D
RowKey: rk1, column descriptor: b, value: 1B
RowKey: rk5, column descriptor: d, value: E

hbase(main):003:0> put 'temp', 'rk4', 'cf1:d', 'D'


hbase(main):003:0> put 'temp', 'rk1', 'cf1:b', '1B'
hbase(main):003:0> put 'temp', 'rk5', 'cf1:d', 'E'
hbase(main):003:0> scan 'temp'
The scan of the temp table lists 6 cells, but the actual number of rows is 5.

Note: Row key rk1 holds two cells in cf1 (cf1:a and cf1:b), so it appears twice in the scan output.
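
The difference between the cell count and the row count can be checked outside HBase. The following Python sketch is a toy in-memory model (the puts list and the helper functions are hypothetical names, not an HBase API). It replays the six put commands above and counts cells and distinct row keys.

```python
# Each put is (row key, column, value), in the order used in this lab.
puts = [
    ("rk1", "cf1:a", "A"),
    ("rk2", "cf1:b", "B"),
    ("rk3", "cf1:c", "C"),
    ("rk4", "cf1:d", "D"),
    ("rk1", "cf1:b", "1B"),
    ("rk5", "cf1:d", "E"),
]


def apply_puts(puts):
    """Build {row key: {column: latest value}} from a sequence of puts."""
    table = {}
    for row_key, column, value in puts:
        table.setdefault(row_key, {})[column] = value
    return table


def count_cells(table):
    """Number of (row, column) cells, which is what scan lists line by line."""
    return sum(len(columns) for columns in table.values())


def count_rows(table):
    """Number of distinct row keys, which is what count reports."""
    return len(table)


table = apply_puts(puts)
print(count_cells(table))  # 6
print(count_rows(table))   # 5
```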


1.11. Change the VERSIONS attribute of cf1 in the temp table to 3, and check that the change is
reflected correctly.
hbase(main):003:0> alter 'temp', {NAME=>'cf1', VERSIONS=>3}
hbase(main):004:0> desc 'temp'

1.12. Alter the temp table to add the cf2 and cf3 column families.
hbase(main):003:0> alter 'temp', 'cf2', 'cf3'
1.13. Run desc to check that cf2 and cf3 have been added.
hbase(main):004:0> desc 'temp'

1.14. Remove the newly added cf3 and check the result.
hbase(main):003:0> alter 'temp', {'delete' => 'cf3'}
hbase(main):004:0> desc 'temp'

1.15. Delete the temp table, then list all tables to verify that it was deleted.
1.16. Exit the HBase shell with the quit command.
hbase(main):004:0> quit
1.17. Run HBase shell commands from a script file.
$ vi rubyscript.rb
disable 'temp'
drop 'temp'

create 'temp', 'cf1'


put 'temp', 'rk1', 'cf1:a', 'A'
put 'temp', 'rk2', 'cf1:b', 'B'
put 'temp', 'rk3', 'cf1:c', 'C'
put 'temp', 'rk4', 'cf1:d', 'D'

alter 'temp', {NAME=>'cf1', VERSIONS=>3}


alter 'temp', 'cf2', 'cf3'

put 'temp', 'rk1', 'cf1:b', '1B'
put 'temp', 'rk5', 'cf1:d', 'E'

scan 'temp'
get 'temp', 'rk1'
count 'temp'

exit
1.18. You can enter HBase shell commands into a text file, one command per line, and pass the
file to the shell. Alternatively, start the shell and load the file with require.
$ hbase shell rubyscript.rb
$ hbase shell
hbase(main):001:0> require './rubyscript.rb'

Note: After rubyscript.rb runs, the shell stays open at the prompt unless the script ends with
an exit statement.

1.19. You can also pass commands to the HBase Shell in non-interactive mode using the echo
command and the | (pipe) operator.
echo "describe 'temp'" | hbase shell -n

4. Data Accessing Using DML commands
In this lab, you will use shell commands to insert, retrieve, scan, and remove rows.

1. CRUD (insert, select, update, delete) operations


1.1. If you have not finished the previous lab, run the rubyscript.rb script first, then proceed with this lab.
1.2. Run the HBase shell:
hbase shell
1.3. Insert the following data and display the result.
RowKey: rk1, column descriptor: b, value: F

hbase(main):003:0> put 'temp', 'rk1', 'cf1:b', 'F'


hbase(main):004:0> scan 'temp'

1.4. Get the row with row key rk1 from the temp table.
hbase(main):003:0> get 'temp', 'rk1'

1.5. Get the two most recent versions of the 'b' column in row rk1, then get the column again without specifying VERSIONS.
hbase(main):003:0> get 'temp', 'rk1', {COLUMNS => 'cf1:b', VERSIONS=>2}
hbase(main):004:0> get 'temp', 'rk1', {COLUMNS => 'cf1:b'}

Note: The first command displays both the earlier value 1B and the latest value F. If VERSIONS is not specified,
only the latest value is displayed.
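
How a versioned cell answers these two get commands can be sketched in Python. This is a simplified toy model, not HBase code: the VersionedCell class is a hypothetical name, timestamps are plain integers, and old versions are trimmed immediately on put (HBase removes surplus versions later, during compaction).

```python
class VersionedCell:
    """Keeps up to max_versions values of one cell, keyed by timestamp."""

    def __init__(self, max_versions=3):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        self._by_timestamp = {}

    def put(self, timestamp, value):
        # A put with an existing timestamp overwrites that version.
        self._by_timestamp[timestamp] = value
        # Drop the oldest versions beyond the configured maximum.
        for old in sorted(self._by_timestamp)[:-self.max_versions]:
            del self._by_timestamp[old]

    def get(self, versions=1):
        """Return up to `versions` values, newest first."""
        newest_first = sorted(self._by_timestamp, reverse=True)
        return [self._by_timestamp[ts] for ts in newest_first[:versions]]


cell = VersionedCell(max_versions=3)   # matches VERSIONS => 3 on cf1
cell.put(1, "1B")                      # earlier put of rk1, cf1:b
cell.put(2, "F")                       # put from step 1.3

print(cell.get(versions=2))  # ['F', '1B']
print(cell.get())            # ['F']
```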
1.6. Scan the whole table, but show only the 'a', 'b', and 'c' columns.
hbase(main):003:0> scan 'temp', {COLUMNS => ['cf1:b', 'cf1:a', 'cf1:c']}
1.7. Delete the 'b' column from the row with row key rk1 in the temp table.
hbase(main):003:0> delete 'temp', 'rk1', 'cf1:b'
1.8. Verify that the 'b' column has been deleted.
hbase(main):004:0> scan 'temp'

1.9. Delete the entire row with row key rk1 from the temp table.
hbase(main):012:0> deleteall 'temp', 'rk1'
1.10. Verify that the row with row key rk1 has been deleted from the temp table.

2. Using MIN_VERSIONS and Time-To-Live

2.1. Describe the temp table and check the TTL value of the cf2 column family. The default TTL
(Time-To-Live) is FOREVER, meaning that versions of a cell never expire.

2.2. Change the TTL from FOREVER to 30 seconds. Each version then expires
30 seconds after it is written to the table.
hbase(main):005:0> alter 'temp', NAME => 'cf2', TTL => 30
2.3. Verify that the TTL changed from FOREVER to 30 seconds.

2.4. Insert the following data and display the result.
RowKey: rk1, Column Family: cf2, column descriptor: ttl, value: Y

hbase(main):007:0> put 'temp', 'rk1', 'cf2:ttl', 'Y'

2.5. After waiting at least 30 seconds, run the scan command again to confirm that the inserted
cell has expired and is no longer returned.

2.6. Set TTL to 10 seconds and MIN_VERSIONS to 1 for cf2.


hbase(main):014:0> alter 'temp', NAME => 'cf2', TTL => 10, MIN_VERSIONS => 1
Note: With MIN_VERSIONS => 1, a version whose TTL has expired is not deleted if it is the only
remaining version of the cell.
2.7. Verify the TTL and MIN_VERSIONS values of cf2.

2.8. Insert the following data and display the result.
RowKey: rk5, Column Family: cf2, column descriptor: ttl_min, value: Z

hbase(main):016:0> put 'temp', 'rk5', 'cf2:ttl_min', 'Z'

2.9. After waiting at least 10 seconds, run the scan command again to confirm that the inserted
cell has passed its TTL but is still retained.
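
The interaction between TTL and MIN_VERSIONS in steps 2.2 through 2.9 can be sketched as a small Python function. It is a toy model under stated assumptions, not HBase code: surviving_versions is a hypothetical helper, timestamps are seconds, and a version counts as expired once its age exceeds the TTL (the exact boundary handling inside HBase is not modeled).

```python
def surviving_versions(versions, now, ttl, min_versions=0):
    """Return the (timestamp, value) pairs that are kept, newest first.

    A version is kept if it is still within its TTL, or if it is among the
    newest `min_versions` versions of the cell.
    """
    newest_first = sorted(versions, key=lambda tv: tv[0], reverse=True)
    kept = []
    for index, (timestamp, value) in enumerate(newest_first):
        expired = now - timestamp > ttl
        if not expired or index < min_versions:
            kept.append((timestamp, value))
    return kept


# Steps 2.2 to 2.5: TTL of 30 seconds, no MIN_VERSIONS; the cell is gone after 31 s.
after_ttl_only = surviving_versions([(0, "Y")], now=31, ttl=30)

# Steps 2.6 to 2.9: TTL of 10 seconds with MIN_VERSIONS => 1; the only version stays.
after_min_versions = surviving_versions([(0, "Z")], now=11, ttl=10, min_versions=1)

print(after_ttl_only)      # []
print(after_min_versions)  # [(0, 'Z')]
```

If a cell has several expired versions, this model keeps only the newest min_versions of them, which matches the idea of always leaving at least that many versions in place.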

5. Working with HBase
In this lab, you will work with HBase.

1. Write a command for creating table that meets the following conditions
1.1. Create a table with the following conditions:
1.1.1. Table name: movie
Column family: info with 3 versions
Column family: media with 3 versions
1.1.2. Table name: ranking
Column family: info with 3 versions
Column family: versions with 3 versions
1.3. Show the number of rows in the movie and user tables you entered.
1.4. Use the alter command to add the stitle column family to the movie table.
1.5. Run the command to verify that the stitle column family has been added.
1.6. Remove the added stitle.
1.7. Check the movie table to ensure that the stitle column family has been removed.
1.8. Modify the media column family in the movie table to keep 4 versions.
1.9. Show the media change in the movie table.
2. Write a command for CRUD that meets the following conditions
2.1. Show the user table to check the column family names.
2.2. Get the row with row key 100 from the user table.
2.3. Add a row with the following properties.
2.3.1. Table name: user
2.3.2. Row key: 100
2.3.3. Column Family: info
2.3.4. Column Descriptor: age and value 20
2.3.5. Column Descriptor: gender and value F

2.3.6. Column Descriptor: zip and value 18730
2.4. Just show the row with row key 100 from the user table.
2.5. In addition, insert a row with the following attributes:
2.5.1. Table name: user
2.5.2. Row key: 100
2.5.3. Column Family: info
2.5.4. Column Descriptor: age and value 30
2.5.5. Table name: user
2.5.6. Row key: 100
2.5.7. Column Family: info
2.5.8. Column Descriptor: age and value 40
2.6. Show the row with row key 100 from the user table.
2.7. In the user table, get all stored versions of the age column in the row with row key
100.
2.8. Show the entire table with scan command, but only display the age column.
2.9. In the user table, delete the info:age column of the row with row key 100.
2.10. Verify that the age column has been deleted.
2.11. Delete the entire row with row key 100 from the user table.
2.12. Verify that the row with row key 100 has been removed from the user table.
