# 用来给某个字段赋默认值 nvl(value,default_value) - Returns default value if value is null else returns value Example: > SELECT nvl(null,'bla') FROM src LIMIT 1; bla
concat字符拼接
concat
1 2 3 4 5 6 7 8
# 返回字符串拼接之后的结果,参数为多个字符串 concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data bin1, bin2, ... binN Returns NULL if any argument is NULL. 示例: hive (study_hive)> select concat('1','aa'); OK _c0 1aa
concat_ws
1 2 3 4 5 6 7
# 第一个参数为分隔符,剩余参数可以使字符串也可以是数组 concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator. 示例: hive (study_hive)> select concat_ws(',','aa','bb'); OK _c0 aa,bb
collect_set汇总
1 2 3 4 5 6 7
# 将数据汇总后转为数组输出 collect_set(x) - Returns a setof objects withduplicate elements eliminated 示例: hive (study_hive)> select collect_set('aa'); OK _c0 ["aa"]
split切割
1 2 3 4 5 6 7
# 根据表达式进行切割字符串 split(str, regex) - Splits str around occurances that match regex 示例: hive (study_hive)> select split('a,b,c',','); OK _c0 ["a","b","c"]
explode拆分
1 2 3 4 5 6 7 8 9
# 将数组或者map结构的数据拆分为多行 explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns 示例: hive (study_hive)> select explode(split('a,b,c',',')); OK col a b c
lateral view(侧视图)可以配合explode来进行使用,将单行数据拆解成多行的数据集
1 2 3
-- 这里的category存储的是视频的分类数组 -- lateral view explode(category) temp就相当于一个虚拟表,之后会与原表video的该行数据进行笛卡尔积操作 select category_name from video lateralviewexplode(category) temp as category_name;
hive (study_hive)> select id,sum(money) over() from test_partitioned where `date` = '20210413'; OK id sum_window_0 2 70.0 1 70.0
在over中可以对数据窗口进行限制,如使用partition by 根据字段的值进行限制,sum所对应的也是按照该数据集中的id进行分组然后在进行求和
1 2 3 4 5 6 7
hive (study_hive)> select id,sum(money) over(partition by `date`) from test_partitioned; OK id sum_window_0 3 80.0 6 80.0 2 70.0 1 70.0
高级用法
over中的可用窗口函数,在使用order by时使用
如果order by之后没有使用特殊的窗口函数,默认则是RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(从开始到当前行)
如果没有使用order by也没有使用特殊的窗口函数,默认则是ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING(从头到尾)
1 2 3 4 5 6
CURRENT ROW:当前行 n PRECEDING:往前 n 行数据 n FOLLOWING:往后 n 行数据 UNBOUNDED:起点, UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING 表示到后面的终点
语法
1 2 3
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING) (ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING) (ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
示例:
1 2
-- 按照c分组 按照d排序 取前三行和后三行的平均值 SELECT a, AVG(b) OVER (PARTITIONBY c ORDERBY d ROWSBETWEEN3PRECEDINGAND3FOLLOWING) FROM T;
over前的窗口函数
1 2 3
LAG(col,n,default_val):往前第 n 行数据 LEAD(col,n, default_val):往后第 n 行数据 NTILE(n):把有序窗口的行分发到指定数据的组中,各个组有编号,编号从 1 开始,对于每一行,NTILE 返回此行所属的组的编号。注意:n 必须为 int 类型。
示例:
1
SELECT a, LAG(a, 3, 0) OVER (PARTITIONBY b ORDERBY C)FROM T;