0% found this document useful (0 votes)
204 views14 pages

Hive Builtin Functions

This document outlines various built-in functions in Hive including mathematical, collection, type conversion, and date functions. It provides the name, signature, and description of many common functions.

Uploaded by

trailver24
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
204 views14 pages

Hive Builtin Functions

This document outlines various built-in functions in Hive including mathematical, collection, type conversion, and date functions. It provides the name, signature, and description of many common functions.

Uploaded by

trailver24
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Hive Built-in Functions

1. Mathematical Functions
The following built-in mathematical functions are supported in Hive; most return NULL when the
argument(s) are NULL:
Name (Signature) Description
round(DOUBLE a) Returns the rounded BIGINT value of a.
round(DOUBLE a, INT d) Returns a rounded to d decimal places.
bround(DOUBLE a) Returns the rounded BIGINT value of a using HALF_EVEN rounding
mode (as of Hive 1.3.0, 2.0.0). Also known as Gaussian rounding or
bankers' rounding. Example: bround(2.5) = 2, bround(3.5) = 4.
bround(DOUBLE a, INT d) Returns a rounded to d decimal places using HALF_EVEN rounding
mode (as of Hive 1.3.0, 2.0.0). Example: bround(8.25, 1) = 8.2,
bround(8.35, 1) = 8.4.
floor(DOUBLE a) Returns the maximum BIGINT value that is equal to or less than a.
ceil(DOUBLE a), Returns the minimum BIGINT value that is equal to or greater than a.
ceiling(DOUBLE a)
rand(), rand(INT seed) Returns a random number (that changes from row to row) that is
distributed uniformly from 0 to 1. Specifying the seed will make sure
the generated random number sequence is deterministic.
exp(DOUBLE a), Returns ea where e is the base of the natural logarithm. Decimal
exp(DECIMAL a) version added in Hive 0.13.0.
ln(DOUBLE a), Returns the natural logarithm of the argument a. Decimal version
ln(DECIMAL a) added in Hive 0.13.0.
log10(DOUBLE a), Returns the base-10 logarithm of the argument a. Decimal version
log10(DECIMAL a) added in Hive 0.13.0.
log2(DOUBLE a), Returns the base-2 logarithm of the argument a. Decimal version
log2(DECIMAL a) added in Hive 0.13.0.
log(DOUBLE base, DOUBLE a) Returns the base-base logarithm of the argument a. Decimal versions
log(DECIMAL base, DECIMAL a) added in Hive 0.13.0.
pow(DOUBLE a, DOUBLE p), Returns ap.
power(DOUBLE a, DOUBLE p)
sqrt(DOUBLE a), Returns the square root of a. Decimal version added in Hive 0.13.0.
sqrt(DECIMAL a)
bin(BIGINT a) Returns the number in binary format
hex(BIGINT a) hex(STRING a) If the argument is an INT or binary, hex returns the number as a
hex(BINARY a) STRING in hexadecimal format. Otherwise if the number is a
STRING, it converts each character into its hexadecimal
representation and returns the resulting STRING.
unhex(STRING a) Inverse of hex. Interprets each pair of characters as a hexadecimal
number and converts to the byte representation of the number.
(BINARY version as of Hive 0.12.0, used to return a string.)
conv(BIGINT num, INT Converts a number from a given base to another
from_base, INT to_base),
conv(STRING num, INT
from_base, INT to_base)
abs(DOUBLE a) Returns the absolute value.
pmod(INT a, INT b), Returns the positive value of a mod b.
pmod(DOUBLE a, DOUBLE b)
sin(DOUBLE a), sin(DECIMAL a) Returns the sine of a (a is in radians).
asin(DOUBLE a), Returns the arc sin of a if -1<=a<=1 or NULL otherwise.
asin(DECIMAL a)
cos(DOUBLE a), Returns the cosine of a (a is in radians). Decimal version added in
cos(DECIMAL a) Hive 0.13.0.
acos(DOUBLE a), Returns the arccosine of a if -1<=a<=1 or NULL otherwise. Decimal
acos(DECIMAL a) version added in Hive 0.13.0.
tan(DOUBLE a), Returns the tangent of a (a is in radians). Decimal version added in
tan(DECIMAL a) Hive 0.13.0.
atan(DOUBLE a), Returns the arctangent of a. Decimal version added in Hive 0.13.0.
atan(DECIMAL a)
degrees(DOUBLE a), Converts value of a from radians to degrees. Decimal version added in
degrees(DECIMAL a) Hive 0.13.0.
radians(DOUBLE a), Converts value of a from degrees to radians. Decimal version added in
radians(DOUBLE a) Hive 0.13.0.
positive(INT a), Returns a.
positive(DOUBLE a)
negative(INT a), Returns -a.
negative(DOUBLE a)
sign(DOUBLE a), Returns the sign of a as '1.0' (if a is positive) or '-1.0' (if a is negative),
sign(DECIMAL a) '0.0' otherwise. The decimal version returns INT instead of DOUBLE.
Decimal version added in Hive 0.13.0.
e() Returns the value of e.
pi() Returns the value of pi.
factorial(INT a) Returns the factorial of a (as of Hive 1.2.0). Valid a is [0..20].
cbrt(DOUBLE a) Returns the cube root of a double value (as of Hive 1.2.0).
Shiftleft(TINYINT|SMALLINT|INT Bitwise left shift (as of Hive 1.2.0). Shifts a b positions to the left.
a, INT b) Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
shiftleft(BIGINT a, INT b)
shiftright(TINYINT|SMALLINT| Bitwise right shift (as of Hive 1.2.0). Shifts a b positions to the right.
INT a, INT b) Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
shiftright(BIGINT a, INT b)
shiftrightunsigned(TINYINT| Bitwise unsigned right shift (as of Hive 1.2.0). Shifts a b positions to
SMALLINT|INT a, INT b), the right.
shiftrightunsigned(BIGINT a, INT Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
b)
greatest(T v1, T v2, ...) Returns the greatest value of the list of values (as of Hive 1.1.0). Fixed
to return NULL when one or more arguments are NULL, and strict
type restriction relaxed, consistent with ">" operator (as of Hive
2.0.0).
least(T v1, T v2, ...) Returns the least value of the list of values (as of Hive 1.1.0). Fixed to
return NULL when one or more arguments are NULL, and strict type
restriction relaxed, consistent with "<" operator (as of Hive 2.0.0).
width_bucket(NUMERIC expr, Returns an integer between 0 and num_buckets+1 by mapping expr
NUMERIC min_value, NUMERIC into the ith equally sized bucket. Buckets are made by dividing
max_value, INT num_buckets) [min_value, max_value] into equally sized regions. If expr <
min_value, return 1, if expr > max_value return num_buckets+1. (as
of Hive 3.0.0)
2. Collection Functions

Name(Signature) Description
size(Map<K.V>) Returns the number of elements in the map type.
size(Array<T>) Returns the number of elements in the array type.
map_keys(Map<K.V>) Returns an unordered array containing the keys of the input map.
map_values(Map<K.V>) Returns an unordered array containing the values of the input map.
array_contains(Array<T>, value) Returns TRUE if the array contains value.
sort_array(Array<T>) Sorts the input array in ascending order according to the natural
ordering of the array elements and returns it (as of version 0.9.0).

3. Type Conversion Functions

Name(Signature) Description
binary(string|binary) Casts the parameter into a binary.
cast(expr as <type>) Converts the results of the expression expr to <type>. For example, cast('1'
as BIGINT) will convert the string '1' to its integral representation. A null is
returned if the conversion does not succeed. If cast(expr as boolean) Hive
returns true for a non-empty string.

4. Date Functions

Name(Signature) Description
from_unixtime(bigint Converts the number of seconds from unix epoch (1970-01-01
unixtime[,stringformat ]) 00:00:00 UTC) to a string representing the timestamp of that
moment in the current system time zone in the format of "1970-01-01
00:00:00".
unix_timestamp() Gets current Unix timestamp in seconds. This function is not
deterministic and its value is not fixed for the scope of a query
execution, therefore prevents proper optimization of queries - this has
been deprecated since 2.0 in favour of CURRENT_TIMESTAMP
constant.
unix_timestamp(string date) Converts time string in format yyyy-MM-dd HH:mm:ss to Unix
timestamp (in seconds), using the default timezone and the default
locale, return 0 if fail:
Example : unix_timestamp('2009-03-20 11:30:01') = 1237573801
unix_timestamp(string date, string Convert time string with given pattern to Unix time stamp (in
pattern) seconds), return 0 if fail: unix_timestamp('2009-03-20', 'yyyy-MM-
dd') = 1237532400.
to_date(string timestamp) Returns the date part of a timestamp string (pre-Hive 2.1.0):
to_date("1970-01-01 00:00:00") = "1970-01-01". As of Hive 2.1.0,
returns a date object.
Prior to Hive 2.1.0 the return type was a String because no Date type
existed when the method was created.
year(string date) Returns the year part of a date or a timestamp string: Example :
year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970.
quarter(date/timestamp/string) Returns the quarter of the year for a date, timestamp, or string in the
range 1 to 4 (as of Hive 1.3.0). Example: quarter('2015-04-08') = 2.
month(string date) Returns the month part of a date or a timestamp string: Example :
month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11.
day(string date) dayofmonth(date) Returns the day part of a date or a timestamp string: day("1970-11-01
00:00:00") = 1, day("1970-11-01") = 1.
hour(string date) Returns the hour of the timestamp: hour('2009-07-30 12:58:59') =
12, hour('12:58:59') = 12.
minute(string date) Returns the minute of the timestamp.
second(string date) Returns the second of the timestamp.
weekofyear(string date) Returns the week number of a timestamp string: weekofyear("1970-
11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44.
extract(field FROM source) Retrieve fields such as days or hours from source (as of Hive 2.2.0).
Source must be a date, timestamp, interval or a string that can be
converted into either a date or timestamp. Supported fields include:
day, dayofweek, hour, minute, month, quarter, second, week and
year.
Examples:
1. select extract(month from "2016-10-20") results in 10.
2. select extract(hour from "2016-10-20 05:06:07") results in 5.
3. select extract(dayofweek from "2016-10-20 05:06:07") results in
5.
4. select extract(month from interval '1-3' year to month) results in 3.
5. select extract(minute from interval '3 12:20:30' day to second)
results in 20.
datediff(string enddate, string Returns the number of days from startdate to enddate:
startdate) datediff('2009-03-01', '2009-02-27') = 2.
date_add(date/timestamp/string Adds a number of days to startdate: date_add('2008-12-31', 1) =
startdate, tinyint/smallint/int '2009-01-01'.
days) Prior to Hive 2.1.0 the return type was a String because no Date type
existed when the method was created.
date_sub(date/timestamp/string Subtracts a number of days to startdate: date_sub('2008-12-31', 1) =
startdate, tinyint/smallint/int '2008-12-30'.
days) Prior to Hive 2.1.0 the return type was a String because no Date type
existed when the method was created.
from_utc_timestamp({any Coverts a timestamp* in UTC to a given timezone (as of Hive 0.8.0). *
primitive type}*, string timezone) timestamp is a primitive type, including timestamp/date,
tinyint/smallint/int/bigint, float/double and decimal. Fractional
values are considered as seconds. Integer values are considered as
milliseconds.. E.g from_utc_timestamp(2592000.0,'PST'),
from_utc_timestamp(2592000000,'PST') and
from_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all
return the timestamp 1970-01-30 08:00:00
to_utc_timestamp({any primitive Coverts a timestamp* in a given timezone to UTC (as of Hive 0.8.0). *
type} ts, string timezone) timestamp is a primitive type, including timestamp/date,
tinyint/smallint/int/bigint, float/double and decimal. Fractional
values are considered as seconds. Integer values are considered as
milliseconds.. E.g to_utc_timestamp(2592000.0,'PST'),
to_utc_timestamp(2592000000,'PST') and
to_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all return
the timestamp 1970-01-31 00:00:00
current_date Returns the current date at the start of query evaluation (as of Hive
1.2.0). All calls of current_date within the same query return the same
value.
current_timestamp Returns the current timestamp at the start of query evaluation (as of
Hive 1.2.0). All calls of current_timestamp within the same query
return the same value.
last_day(string date) Returns the last day of the month which the date belongs to (as of
Hive 1.1.0). date is a string in the format 'yyyy-MM-dd HH:mm:ss' or
'yyyy-MM-dd'. The time part of date is ignored.
add_months(string start_date, int Returns the date that is num_months after start_date (as of Hive
num_months) 1.1.0). start_date is a string, date or timestamp. num_months is an
integer. The time part of start_date is ignored. If start_date is the last
day of the month or if the resulting month has fewer days than the
day component of start_date, then the result is the last day of the
resulting month. Otherwise, the result has the same day component
as start_date.
next_day(string start_date, string Returns the first date which is later than start_date and named as
day_of_week) day_of_week (as of Hive 1.2.0). start_date is a
string/date/timestamp. day_of_week is 2 letters, 3 letters or full
name of the day of the week (e.g. Mo, tue, FRIDAY). The time part of
start_date is ignored.
Example: next_day('2015-01-14', 'TU') = 2015-01-20.
trunc(string date, string format) Returns date truncated to the unit specified by the format (as of Hive
1.2.0). Supported formats: MONTH/MON/MM, YEAR/YYYY/YY.
Example: trunc('2015-03-17', 'MM') = 2015-03-01.
months_between(date1, date2) Returns number of months between dates date1 and date2 (as of
Hive 1.2.0). If date1 is later than date2, then the result is positive. If
date1 is earlier than date2, then the result is negative. If date1 and
date2 are either the same days of the month or both last days of
months, then the result is always an integer. Otherwise the UDF
calculates the fractional portion of the result based on a 31-day month
and considers the difference in time components date1 and date2.
date1 and date2 type can be date, timestamp or string in the format
'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8
decimal places.
Example: months_between('1997-02-28 10:30:00', '1996-10-30') =
3.94959677
date_format(date/timestamp/ Converts a date/timestamp/string to a value of string in the format
string ts, string fmt) specified by the date format fmt (as of Hive 1.2.0). Supported formats
are Java SimpleDateFormat formats –
The second argument fmt should be constant. Example:
date_format('2015-04-08', 'y') = '2015'.
date_format can be used to implement other UDFs, e.g.:
• dayname(date) is date_format(date, 'EEEE')
• dayofyear(date) is date_format(date, 'D')
5. Conditional Functions

Name(Signature) Description
if(boolean testCondition, T valueTrue, T Returns valueTrue when testCondition is true, returns
valueFalseOrNull) valueFalseOrNull otherwise.
isnull( a ) Returns true if a is NULL and false otherwise.
isnotnull ( a ) Returns true if a is not NULL and false otherwise.
nvl(T value, T default_value) Returns default value if value is null else returns value (as
of HIve 0.11).
COALESCE(T v1, T v2, ...) Returns the first v that is not NULL, or NULL if all v's are
NULL.
assert_true(boolean condition) Throw an exception if 'condition' is not true, otherwise
return null (as of Hive 0.8.0). For example, select
assert_true (2<1).
CASE a When a = b, returns c; when a = d, returns e; else returns f.
WHEN b THEN c
[WHEN d THEN e]*
[ELSE f] END
CASE WHEN a THEN b [WHEN c THEN d]* When a = true, returns b;
[ELSE e] END when c = true, returns d;
else returns e.
nullif( a, b ) Returns NULL if a=b; otherwise returns a (as of Hive
2.2.0).
Shorthand for: CASE WHEN a = b then NULL else a

6. String Functions

Name(Signature) Description
ascii(string str) Returns the numeric value of the first character of str.
base64(binary bin) Converts the argument from binary to a base 64 string (as
of Hive 0.12.0).
character_length(string Returns the number of UTF-8 characters contained in str
str) (as of Hive 2.2.0). The function char_length is shorthand
for this function.
chr(bigint|double A) Returns the ASCII character having the binary equivalent to
A (as of Hive 1.3.0 and 2.1.0). If A is larger than 256 the
result is equivalent to chr(A % 256).
Example: select chr(88); returns "X".
concat(string|binary A, Returns the string or bytes resulting from concatenating the
string|binary B...) strings or bytes passed in as parameters in order.
Example, concat('foo', 'bar') results in 'foobar'. Note that
this function can take any number of input strings.
context_ngrams(array< Returns the top-k contextual N-grams from a set of
array<string>>, tokenized sentences, given a string of "context".
array<string>, int K, int
pf)
concat_ws(string SEP, Like concat() above, but with custom separator SEP.
string A, string B...)
concat_ws(string SEP, Like concat_ws() above, but taking an array of strings. (as
array<string>) of Hive 0.9.0)
in_file(string str, string Returns true if the string str appears as an entire line in
filename) filename.
decode(binary bin, Decodes the first argument into a String using the provided
string charset) character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8',
'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is
null, the result will also be null. (As of Hive 0.12.0.)
elt(N int,str1 string,str2 Return string at index number. For example
string,str3 string,...) elt(2,'hello','world') returns 'world'. Returns NULL if N is
less than 1 or greater than the number of arguments.
encode(string src, string Encodes the first argument into a BINARY using the
charset) provided character set (one of 'US-ASCII', 'ISO-8859-1',
'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either
argument is null, the result will also be null. (As of Hive
0.12.0.)
trim(string A) Returns the string resulting from trimming spaces from
both ends of A.
Example trim(' foobar ') results in 'foobar'
field(val T,val1 T,val2 Returns the index of val in the val1,val2,val3,... list or 0 if
T,val3 T,...) not found.
Example
field('world','say','hello','world') returns 3.
All primitive types are supported, arguments are compared
using str.equals(x). If val is NULL, the return value is 0.
find_in_set(string str, Returns the first occurance of str in strList where strList is
string strList) a comma-delimited string. Returns null if either argument
is null. Returns 0 if the first argument contains any
commas. For Example: , find_in_set('ab', 'abc,b,ab,c,def')
returns 3.
format_number(numbe Formats the number X to a format like '#,###,###.##',
r x, int d) rounded to D decimal places, and returns the result as a
string. If D is 0, the result has no decimal point or fractional
part. (As of Hive 0.10.0; bug with float types fixed in Hive
0.14.0, decimal type support added in Hive 0.14.0)
get_json_object(string Extracts json object from a json string based on json path
json_string, string path) specified, and returns json string of the extracted json
object. It will return null if the input json string is invalid.
NOTE: The json path can only have the characters [0-9a-
z_], i.e., no upper-case or special characters. Also, the keys
*cannot start with numbers.* This is due to restrictions on
Hive column names.
instr(string str, string Returns the position of the first occurrence of substr in str.
substr) Returns null if either of the arguments are null and returns
0 if substr could not be found in str. Be aware that this is
not zero based. The first character in str has index 1.
length(string A) Returns the length of the string.
locate(string substr, Returns the position of the first occurrence of substr in str
string str[, int pos]) after position pos.
lower(string A) Returns the string resulting from converting all characters
lcase(string A) of B to lower case.
Example : lower('fOoBaR') results in 'foobar'.
lpad(string str, int len, Returns str, left-padded with pad to a length of len. If str is
string pad) longer than len, the return value is shortened to len
characters. In case of empty pad string, the return value is
null.
ltrim(string A) Returns the string resulting from trimming spaces from the
beginning(left hand side) of A. Example, ltrim(' foobar ')
results in 'foobar '.
ngrams(array<array<str Returns the top-k N-grams from a set of tokenized
ing>>, int N, int K, int sentences, such as those returned by the sentences() UDAF.
pf)
printf(String format, Returns the input formatted according do printf-style
Obj... args) format strings (as of Hive 0.9.0).
octet_length(string str) Returns the number of octets required to hold the string str
in UTF-8 encoding (since Hive 2.2.0). Note that
octet_length(str) can be larger than character_length(str).
parse_url(string Returns the specified part from the URL. Valid values for
urlString, string partToExtract include HOST, PATH, QUERY, REF,
partToExtract [, string PROTOCOL, AUTHORITY, FILE, and USERINFO.
keyToExtract]) Example parse_url('http://facebook.com/path1/p.php?
k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. Also a
value of a particular key in QUERY can be extracted by
providing the key as the third argument,
Example parse_url('http://facebook.com/path1/p.php?
k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.
regexp_extract(string Returns the string extracted using the pattern. Example,
subject, string pattern, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.'
int index) Note that some care is necessary in using predefined
character classes: using '\s' as the second argument will
match the letter s; '\\s' is necessary to match whitespace,
etc. The 'index' parameter is the Java regex Matcher
group() method index
regexp_replace(string Returns the string resulting from replacing all substrings in
INITIAL_STRING, INITIAL_STRING that match the java regular expression
string PATTERN, string syntax defined in PATTERN with instances of
REPLACEMENT) REPLACEMENT. For example, regexp_replace("foobar",
"oo|ar", "") returns 'fb.' Note that some care is necessary in
using predefined character classes: using '\s' as the second
argument will match the letter s; '\\s' is necessary to match
whitespace, etc.
repeat(string str, int n) Repeats str n times.
replace(string A, string Returns the string A with all non-overlapping occurrences
OLD, string NEW) of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0).
Example: select replace("ababab", "abab", "Z"); returns
"Zab".
reverse(string A) Returns the reversed string.
rpad(string str, int len, Returns str, right-padded with pad to a length of len. If str
string pad) is longer than len, the return value is shortened to len
characters. In case of empty pad string, the return value is
null.
rtrim(string A) Returns the string resulting from trimming spaces from the
end(right hand side) of A. Example, rtrim(' foobar ') results
in ' foobar'.
str_to_map(text[, Splits text into key-value pairs using two delimiters.
delimiter1, delimiter2]) Delimiter1 separates text into K-V pairs, and Delimiter2
splits each K-V pair. Default delimiters are ',' for delimiter1
and ':' for delimiter2.
split(string str, string Splits str around pat (pat is a regular expression).
pat)
sentences(string str, Tokenizes a string of natural language text into words and
string lang, string sentences, where each sentence is broken at the appropriate
locale) sentence boundary and returned as an array of words. The
'lang' and 'locale' are optional arguments.
Example, sentences('Hello there! How are you?') returns
( ("Hello", "there"), ("How", "are", "you") ).
space(int n) Returns a string of n spaces.
substr(string|binary A, Returns the substring or slice of the byte array of A starting
int start) from start position till the end of string A.
substring(string|binary Example, substr('foobar', 4) results in 'bar' .
A, int start)
substr(string|binary A, Returns the substring or slice of the byte array of A starting
int start, int len) from start position with length len. Example,
substring(string|binary substr('foobar', 4, 1) results in 'b'
A, int start, int len)
substring_index(string Returns the substring from string A before count
A, string delim, int occurrences of the delimiter delim (as of Hive 1.3.0). If
count) count is positive, everything to the left of the final delimiter
(counting from the left) is returned. If count is negative,
everything to the right of the final delimiter (counting from
the right) is returned. Substring_index performs a case-
sensitive match when searching for delim. Example:
substring_index('www.apache.org', '.', 2) = 'www.apache'.
translate(string|char| Translates the input string by replacing the characters
varchar input, string| present in the from string with the corresponding
char|varchar from, characters in the to string. This is similar to the translate
string|char|varchar to) function in PostgreSQL. If any of the parameters to this
UDF are NULL, the result is NULL as well. (Available as of
Hive 0.10.0, for string types)
Char/varchar support added as of Hive 0.14.0.
unbase64(string str) Converts the argument from a base 64 string to BINARY.
(As of Hive 0.12.0.)
upper(string A) Returns the string resulting from converting all characters
ucase(string A) of A to upper case
Example
upper('fOoBaR') results in 'FOOBAR'.
initcap(string A) Returns string, with the first letter of each word in
uppercase, all other letters in lowercase. Words are
delimited by whitespace. (As of Hive 1.1.0.)
levenshtein(string A, Returns the Levenshtein distance between two strings (as of
string B) Hive 1.2.0).
Example levenshtein('kitten', 'sitting') results in 3.
soundex(string A) Returns soundex code of the string (as of Hive 1.2.0).
Example soundex('Miller') results in M460.

7. Data Masking Functions

Name(Signature) Description
mask_hash(string|char| Returns a hashed value based on str (as of Hive 2.1.0).
varchar str) The hash is consistent and can be used to join masked
values together across tables. This function returns null
for non-string types.
mask(string str[, string Returns a masked version of str (as of Hive 2.1.0). By
upper[, string lower[, default, upper case letters are converted to "X", lower case
string number]]]) letters are converted to "x" and numbers are converted to
"n". For example mask("abcd-EFGH-8765-4321") results
in xxxx-XXXX-nnnn-nnnn. You can override the
characters used in the mask by supplying additional
arguments: the second argument controls the mask
character for upper case letters, the third argument for
lower case letters and the fourth argument for numbers.
Example, mask("abcd-EFGH-8765-4321", "U", "l", "#")
results in llll-UUUU-####-####.
mask_first_n(string str[, Returns a masked version of str with the first n values
int n]) masked (as of Hive 2.1.0). Upper case letters are
converted to "X", lower case letters are converted to "x"
and numbers are converted to "n".
Example, mask_first_n("1234-5678-8765-4321", 4)
results in nnnn-5678-8765-4321.
mask_last_n(string str[, Returns a masked version of str with the last n values
int n]) masked (as of Hive 2.1.0). Upper case letters are
converted to "X", lower case letters are converted to "x"
and numbers are converted to "n".
Example, mask_last_n("1234-5678-8765-4321", 4)
results in 1234-5678-8765-nnnn.
mask_show_first_n(strin Returns a masked version of str, showing the first n
g str[, int n]) characters unmasked (as of Hive 2.1.0). Upper case letters
are converted to "X", lower case letters are converted to
"x" and numbers are converted to "n".
Example, mask_show_first_n("1234-5678-8765-4321",
4) results in 1234-nnnn-nnnn-nnnn.
mask_show_last_n(strin Returns a masked version of str, showing the last n
g str[, int n]) characters unmasked (as of Hive 2.1.0). Upper case letters
are converted to "X", lower case letters are converted to
"x" and numbers are converted to "n".
Example, mask_show_last_n("1234-5678-8765-4321",
4) results in nnnn-nnnn-nnnn-4321.
8. Misc. Functions

Name(Signature) Description
java_method(class, method[, arg1[, Synonym for reflect. (As of Hive 0.9.0.)
arg2..]])
reflect(class, method[, arg1[, arg2..]]) Calls a Java method by matching the argument signature, using
reflection. (As of Hive 0.7.0.)
hash(a1[, a2...]) Returns a hash value of the arguments. (As of Hive 0.4.)
logged_in_user() Returns current user name from the session state (as of Hive 2.2.0).
This is the username provided when connecting to Hive.
current_user() Returns current user name from the configured authenticator
manager (as of Hive 1.2.0). Could be the same as the user provided
when connecting, but with some authentication managers (for
example HadoopDefaultAuthenticator) it could be different.
current_database() Returns current database name (as of Hive 0.13.0).
md5(string/binary) Calculates an MD5 128-bit checksum for the string or binary (as of
Hive 1.3.0). The value is returned as a string of 32 hex digits, or
NULL if the argument was NULL. Example: md5('ABC') =
'902fbdd2b1df0c4f70b4a5d23525e932'.
sha1(string/binary) Calculates the SHA-1 digest for string or binary and returns the value
sha(string/binary) as a hex string (as of Hive 1.3.0). Example: sha1('ABC') =
'3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'.
crc32(string/binary) Computes a cyclic redundancy check value for string or binary
argument and returns bigint value (as of Hive 1.3.0).
Example: crc32('ABC') = 2743272264.
sha2(string/binary, int) Calculates the SHA-2 family of hash functions (SHA-224, SHA-256,
SHA-384, and SHA-512) (as of Hive 1.3.0). The first argument is the
string or binary to be hashed. The second argument indicates the
desired bit length of the result, which must have a value of 224, 256,
384, 512, or 0 (which is equivalent to 256). SHA-224 is supported
starting from Java 8. If either argument is NULL or the hash length
is not one of the permitted values, the return value is NULL.
Example: sha2('ABC', 256) =
'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e278
9df78'.
aes_encrypt(input string/binary, key Encrypt input using AES (as of Hive 1.3.0). Key lengths of 128, 192 or
string/binary) 256 bits can be used. 192 and 256 bits keys can be used if Java
Cryptography Extension (JCE) Unlimited Strength Jurisdiction
Policy Files are installed. If either argument is NULL or the key
length is not one of the permitted values, the return value is NULL.
Example: base64(aes_encrypt('ABC', '1234567890123456')) =
'y6Ss+zCYObpCbgfWfyNWTw=='.
aes_decrypt(input binary, key Decrypt input using AES (as of Hive 1.3.0). Key lengths of 128, 192 or
string/binary) 256 bits can be used. 192 and 256 bits keys can be used if Java
Cryptography Extension (JCE) Unlimited Strength Jurisdiction
Policy Files are installed. If either argument is NULL or the key
length is not one of the permitted values, the return value is NULL.
Example: aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='),
'1234567890123456') = 'ABC'.
version() Returns the Hive version (as of Hive 2.1.0). The string contains 2
fields, the first being a build number and the second being a build
hash.
Example: "select version();" might return "2.1.0.2.5.0.0-1245
r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results
will depend on your build.

9. Table-Generating Functions (UDTF)


Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In
contrast, table-generating functions transform a single input row to multiple output rows.
Name(Signature) Description
explode(ARRAY<T> a) Explodes an array to multiple rows. Returns a row-set
with a single column (col), one row for each element
from the array.
explode(MAP<Tkey,Tvalue> m) Explodes a map to multiple rows. Returns a row-set
with a two columns (key,value) , one row for each key-
value pair from the input map. (As of Hive 0.8.0.).
posexplode(ARRAY<T> a) Explodes an array to multiple rows with additional
positional column of int type (position of items in the
original array, starting with 0). Returns a row-set with
two columns (pos,val), one row for each element from
the array.
inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a) Explodes an array of structs to multiple
rows. Returns a row-set with N columns (N = number
of top level elements in the struct), one row per struct
from the array. (As of Hive 0.10.)
stack(int r,T1 V1,...,Tn/r Vn) Breaks up n values V1,...,Vn into r rows. Each row will
have n/r columns. r must be constant.

json_tuple(string jsonStr,string k1,...,string kn) Takes JSON string and a set of n keys, and returns a
tuple of n values. This is a more efficient version of
the get_json_object UDF because it can get multiple
keys with just one call.
parse_url_tuple(string urlStr,string p1,...,string pn Takes URL string and a set of n URL parts, and returns
) a tuple of n values. This is similar to
the parse_url() UDF but can extract multiple parts at
once out of a URL. Valid part names are: HOST,
PATH, QUERY, REF, PROTOCOL, AUTHORITY,
FILE, USERINFO, QUERY:<KEY>.

10. Built-in Aggregate Functions (UDAF)

Name(Signature) Example
count(*), count(*) – Returns the total number of retrieved rows, including rows
count(expr), containing NULL values; count(expr) – Returns the number of rows for
count(DISTINCT expr[, which the supplied expression is non-NULL; count(DISTINCT expr[,
expr_.]) expr]) – Returns the number of rows for which the supplied
expression(s) are unique and non-NULL.
sum(col), Returns the sum of the elements in the group or the sum of the distinct
sum(DISTINCT col) values of the column in the group
avg(col), Returns the average of the elements in the group or the average of the
avg(DISTINCT col) distinct values of the column in the group
min(col) Returns the minimum of the column in the group
max(col) Returns the maximum value of the column in the group
variance(col), Returns the variance of a numeric column in the group
var_pop(col)
var_samp(col) Returns the unbiased sample variance of a numeric column in the
group
stddev_pop(col) Returns the standard deviation of a numeric column in the group
stddev_samp(col) Returns the unbiased sample standard deviation of a numeric column
in the group
covar_pop(col1, col2) Returns the population covariance of a pair of numeric columns in the
group
covar_samp(col1, col2) Returns the sample covariance of a pair of a numeric columns in the
group
corr(col1, col2) Returns the Pearson coefficient of correlation of a pair of a numeric
columns in the group
percentile(BIGINT col, p) Returns the exact pth percentile of a column in the group (does not work
with floating point types). p must be between 0 and 1. NOTE: A true
percentile can only be computed for integer values. Use
PERCENTILE_APPROX if your input is non-integral.
percentile(BIGINT col, Returns the exact percentiles p1, p2, … of a column in the group (does
array(p1 [, p2]…)) not work with floating point types). pi must be between 0 and 1. NOTE:
A true percentile can only be computed for integer values. Use
PERCENTILE_APPROX if your input is non-integral.
percentile_approx(DOUBLE Returns an approximate pth percentile of a numeric column (including
col, p [, B]) floating point types) in the group. The B parameter controls
approximation accuracy at the cost of memory. Higher values yield
better approximations, and the default is 10,000. When the number of
distinct values in col is smaller than B, this gives an exact percentile
value.
percentile_approx(DOUBLE Same as above, but accepts and returns an array of percentile values
col, array(p1 [, p2]…) [, instead of a single one.
B])
histogram_numeric(col, b) Computes a histogram of a numeric column in the group using b non-
uniformly spaced bins. The output is an array of size b of double-valued
(x,y) coordinates that represent the bin centers and heights
collect_set(col) Returns a set of objects with duplicate elements eliminated
collect_list(col) Returns a list of objects with duplicates. (As of Hive 0.13.0.)
ntile(INTEGER x) Divides an ordered partition into x groups called buckets and assigns a
bucket number to each row in the partition. This allows easy calculation
of tertiles, quartiles, deciles, percentiles and other common summary
statistics. (As of Hive 0.11.0.)

HIVE FUNCTION META COMMANDS


 SHOW FUNCTIONS– lists Hive functions and operators
 DESCRIBE FUNCTION [function name]– displays short description of the function
 DESCRIBE FUNCTION EXTENDED [function name]– access extended description of the function

You might also like