Partitioned Tables

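Because Hive serves as a data warehouse, the tables it holds can be very large, yet a given analysis usually needs only a slice of that data. To avoid full table scans, Hive provides partitioned tables. A partition is simply a subdirectory under the table's directory: a large business table is split into subsets along a business dimension (a date, for example), and each subset is stored in its own directory. When a query names the partitions it needs, Hive reads only those directories instead of the whole table, which can make queries many times faster.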
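To make the "one directory per partition" idea concrete, here is a minimal Python sketch that only models the layout; it does not talk to Hive. It assumes the table lives at `/user/hive/warehouse/test_partitioned` (an assumption: the real location depends on `hive.metastore.warehouse.dir` and the database). It uses Hive's `key=value` naming for partition directories, and the rows are the ones inserted later in this section.

```python
from collections import defaultdict

# Assumed table location: default warehouse dir plus the table name.
TABLE_DIR = "/user/hive/warehouse/test_partitioned"


def layout(rows, partition_col):
    """Group rows into Hive-style partition directories named key=value.

    The partition column is dropped from the stored row: Hive keeps its
    value in the directory name rather than in the data files.
    """
    dirs = defaultdict(list)
    for row in rows:
        path = f"{TABLE_DIR}/{partition_col}={row[partition_col]}"
        data = {k: v for k, v in row.items() if k != partition_col}
        dirs[path].append(data)
    return dict(dirs)


rows = [
    {"id": 1, "money": 20.0, "date": "20210413"},
    {"id": 2, "money": 50.0, "date": "20210413"},
    {"id": 3, "money": 80.0, "date": "20210414"},
]
for path, data in sorted(layout(rows, "date").items()):
    print(path, data)
```

The two rows for 20210413 end up in one directory and the 20210414 row in another. A query that only needs 20210414 therefore never has to open the first directory.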

When we covered the CREATE TABLE statement earlier, its syntax included the PARTITIONED BY keyword. A partitioned table is created with that clause.

Creating a partitioned table

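-- `date` is declared only in PARTITIONED BY, not in the column list;
-- the backticks are needed because date is a Hive keyword
create table if not exists test_partitioned
(id int,money double)
partitioned by (`date` string)
row format delimited fields terminated by '\t';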

Inserting data

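-- Without a PARTITION clause, the last value of each row becomes the `date` partition value
-- (dynamic partitioning; whether the clause may be omitted depends on the Hive version).
-- Hive rejects fully dynamic inserts in strict mode, so this setting may be needed first:
set hive.exec.dynamic.partition.mode=nonstrict;
insert into test_partitioned values (1,20.0,'20210413');
insert into test_partitioned values (2,50.0,'20210413');
insert into test_partitioned values (3,80.0,'20210414');

-- Alternatively, the first row could be written with a static partition and only the data columns
insert into test_partitioned partition(`date`='20210413') values (1,20.0);

-- Or load a file into a given partition; the tab-separated file holds only the id and money columns
load data local inpath 
'/Users/zhanghe/Desktop/user/myself/hive_data/test.txt' into table test_partitioned 
partition(`date`='20210413');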
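The two insert styles differ only in where the partition value comes from. The following Python sketch is a hypothetical model, not Hive code. With a static spec the row carries only `id` and `money`. Without one, the trailing value fills the partition column, and strict mode refuses that. The helper name `target_partition` is invented for illustration, and mixed static/dynamic specs are not modelled.

```python
def target_partition(values, partition_cols, static_spec=None, mode="strict"):
    """Hypothetical model of where one INSERT ... VALUES row lands.

    With a static spec such as {"date": "20210413"}, the row holds only the
    data columns. Without one, the trailing values fill the partition
    columns (dynamic partitioning), which strict mode refuses.
    """
    if static_spec is not None:
        data, spec = list(values), dict(static_spec)
    elif mode == "strict":
        raise ValueError("strict mode needs at least one static partition column")
    else:
        n = len(partition_cols)
        data, spec = list(values[:-n]), dict(zip(partition_cols, values[-n:]))
    path = "/".join(f"{col}={spec[col]}" for col in partition_cols)
    return data, path


# Dynamic: the last value, '20210413', becomes the partition value.
print(target_partition((1, 20.0, "20210413"), ["date"], mode="nonstrict"))
# Static: the partition is named explicitly, so only id and money are given.
print(target_partition((1, 20.0), ["date"], static_spec={"date": "20210413"}))
```

Both calls end in the same place, `([1, 20.0], 'date=20210413')`, which is why the static form is a drop-in alternative when the target partition is known in advance.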

Querying by partition

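Filter on the partition column in the WHERE clause, so that Hive reads only the matching partitions instead of scanning the whole table: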

select * from test_partitioned where `date`='20210414';
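Skipping partitions that cannot match is usually called partition pruning. Hive's planner does it using the partition metadata in the metastore. The Python sketch below only illustrates the idea for equality filters on partition columns; `prune` and `parse_partition` are hypothetical helpers, not Hive APIs.

```python
def parse_partition(name):
    """Turn a partition name such as 'date=20210414' into {'date': '20210414'}."""
    return dict(part.split("=", 1) for part in name.split("/"))


def prune(partitions, **filters):
    """Return the partitions a query must read, given equality filters.

    Partitions that fail any filter are skipped entirely; with no filter,
    every partition has to be read, which amounts to a full table scan.
    """
    return [
        p for p in partitions
        if all(parse_partition(p).get(col) == val for col, val in filters.items())
    ]


partitions = ["date=20210413", "date=20210414"]
print(prune(partitions, date="20210414"))  # only date=20210414 is read
print(prune(partitions))                   # no filter: both partitions are read
```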

Listing partitions

show partitions test_partitioned;
-- Output:
-- partition
-- date=20210413
-- date=20210414

Adding partitions

Besides the partitions created automatically when inserted rows carry a new partition value, partitions can also be added by hand:

-- Add one partition
alter table test_partitioned add partition(`date`='20210410');
-- Add several partitions at once (the partition specs are separated by spaces, no commas)
alter table test_partitioned add partition(`date`='20200411') partition(`date`='20200412');
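Adding a partition by hand registers it in the metastore with an empty directory, and it then shows up in SHOW PARTITIONS like any other. The toy Python model below is only a sketch; `PartitionCatalog` is a hypothetical name. It also mirrors the usual Hive behaviour of failing when the partition already exists, unless IF NOT EXISTS is given. The names are sorted here for readability.

```python
class PartitionCatalog:
    """Toy model of the partition list the metastore keeps for one table."""

    def __init__(self, partitions=()):
        self.partitions = set(partitions)

    def add(self, *names, if_not_exists=False):
        """ADD PARTITION: register each partition (its directory starts out empty)."""
        for name in names:
            if name in self.partitions and not if_not_exists:
                raise ValueError(f"partition already exists: {name}")
        self.partitions.update(names)

    def show(self):
        """SHOW PARTITIONS: partition names, sorted for readability."""
        return sorted(self.partitions)


catalog = PartitionCatalog(["date=20210413", "date=20210414"])
catalog.add("date=20210410")
catalog.add("date=20200411", "date=20200412")
print(catalog.show())
```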

Dropping partitions

-- Drop one partition
alter table test_partitioned drop partition (`date`='20210410');
-- Drop several partitions at once (unlike ADD, the partition specs are separated by commas)
alter table test_partitioned drop partition(`date`='20200411'), partition(`date`='20200412');
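Because test_partitioned is a managed (non-EXTERNAL) table, dropping a partition removes both its metadata and its data directory. For an external table, Hive would leave the files in place. The Python sketch below models only the managed case, with the table as a dict from partition name to rows. `drop_partitions` is a hypothetical helper, and its choice to ignore names that are not present is the sketch's own, not a statement about Hive.

```python
def drop_partitions(table, *names):
    """Toy DROP PARTITION on a managed table.

    `table` maps partition names to their rows; dropping removes the entry
    together with its rows, as Hive deletes a managed table's partition
    directory. Names not present are ignored in this sketch.
    """
    removed = {}
    for name in names:
        if name in table:
            removed[name] = table.pop(name)
    return removed


table = {
    "date=20210413": [(1, 20.0), (2, 50.0)],
    "date=20210414": [(3, 80.0)],
    "date=20200411": [],
    "date=20200412": [],
}
drop_partitions(table, "date=20200411", "date=20200412")
print(sorted(table))
```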

Two-level partitions

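When the data volume is very large, even a single day may hold a lot of data. In that case each day can be partitioned again, giving a two-level partition: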

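-- Use a new table name: test_partitioned already exists, so IF NOT EXISTS would silently skip this statement
-- Each `date` directory will hold one subdirectory per `hour`, e.g. date=20210413/hour=10
create table if not exists test_partitioned2
(id int,money double)
partitioned by (`date` string,`hour` string)
row format delimited fields terminated by '\t';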
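With two partition columns, the directories nest in PARTITIONED BY order. A filter on the first level alone still reads every hour of that day, while filtering on both levels narrows the scan to a single directory. The Python sketch below only models that layout. The hour values are made up for illustration, and `partition_path` and `matching` are hypothetical helpers.

```python
def partition_path(spec, partition_cols):
    """Nest one key=value directory per partition column, in PARTITIONED BY order."""
    return "/".join(f"{col}={spec[col]}" for col in partition_cols)


def matching(paths, **filters):
    """Keep the paths whose key=value segments satisfy every equality filter."""
    result = []
    for path in paths:
        spec = dict(seg.split("=", 1) for seg in path.split("/"))
        if all(spec.get(k) == v for k, v in filters.items()):
            result.append(path)
    return result


cols = ["date", "hour"]
# Hypothetical hours, for illustration only.
paths = [
    partition_path({"date": "20210413", "hour": h}, cols) for h in ("09", "10")
] + [partition_path({"date": "20210414", "hour": "09"}, cols)]

print(matching(paths, date="20210413"))             # both hours of that day
print(matching(paths, date="20210413", hour="10"))  # a single directory
```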