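Sqoop import

A Sqoop import transfers data from a relational database into a big data cluster (HDFS, Hive, or HBase) using the `import` command.

Data can be imported in several ways.

Importing a whole table into HDFS

An entire table can be imported into HDFS: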

sqoop import \
--connect jdbc:mysql://localhost:3306/company \
--username root \
--password 123456 \
--table staff \
--target-dir /user/sqoop/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"

This imports all rows of the staff table into the HDFS directory /user/sqoop/company, with fields separated by \t (tab).
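To check the result of such an import, the target directory can be listed and previewed with the HDFS shell. The following is a minimal sketch, assuming the `hdfs` client is installed and configured; `preview_import` is a hypothetical helper name, not part of Sqoop.

```shell
# Hypothetical helper: list an import directory and print its first rows.
# Assumes the `hdfs` command-line client is available on the PATH.
preview_import() {
  local dir="$1"          # HDFS directory that was passed to --target-dir
  local rows="${2:-5}"    # number of rows to show (default 5)
  hdfs dfs -ls "$dir"
  # Map-only jobs write part-m-NNNNN files; one mapper yields part-m-00000.
  # The glob is quoted so that HDFS expands it, not the local shell.
  hdfs dfs -cat "$dir/part-m-*" | head -n "$rows"
}
```

For example, `preview_import /user/sqoop/company 10` would show the file listing followed by the first ten imported rows.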

Importing the result of an SQL query into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/company \
--username root \
--password 123456 \
--query 'select name,sex,age from staff where sex = 0 and $CONDITIONS;' \
--target-dir /user/sqoop/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"

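This runs the SQL statement select name,sex,age from staff where sex = 0 and imports the result into the HDFS directory /user/sqoop/company.

$CONDITIONS is required. Without it the import fails with the error: Query [select name,sex,age from staff where sex = 0;] must contain '$CONDITIONS' in WHERE clause. Sqoop replaces $CONDITIONS with its own condition expressions when it runs the query.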
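To make the role of $CONDITIONS more concrete: when a query import runs with more than one mapper, a split column must be given with `--split-by`, and each mapper executes its own copy of the query with $CONDITIONS replaced by a range predicate on that column. The sketch below only imitates this idea in plain shell. `split_queries` is a hypothetical function, the `id` column and the range 1 to 100 are made-up values, and the exact predicates Sqoop generates may differ.

```shell
# Hypothetical illustration: print one query per mapper, replacing the
# $CONDITIONS placeholder with a range predicate on a split column.
# Sqoop performs a comparable substitution internally; this is only a model.
split_queries() {
  local query="$1" col="$2" min="$3" max="$4" mappers="$5"
  local step=$(( (max - min + mappers) / mappers ))   # ceil((max-min+1)/mappers)
  local lo="$min" hi cond
  while [ "$lo" -le "$max" ]; do
    hi=$(( lo + step ))
    if [ "$hi" -gt "$max" ]; then
      cond="$col >= $lo AND $col <= $max"   # last split includes the maximum
    else
      cond="$col >= $lo AND $col < $hi"
    fi
    # Replace the literal text $CONDITIONS with the predicate for this split.
    printf '%s\n' "$query" | sed "s/\\\$CONDITIONS/$cond/"
    lo="$hi"
  done
}

split_queries 'select name,sex,age from staff where sex = 0 and $CONDITIONS' id 1 100 4
```

With a single mapper, as in the command above, no splitting is needed and the placeholder is simply replaced by a condition that is always true.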

--query can be used in place of --table, --columns, and --where.

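If the query is wrapped in double quotes, $CONDITIONS must be preceded by a backslash escape so that the shell does not expand it as one of its own variables:

--query "select name,sex,age from staff where sex = 0 and \$CONDITIONS;"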
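The quoting difference can be checked in any shell without running Sqoop. This small demonstration sets an empty shell variable named CONDITIONS (an assumption made only so the output is predictable) and prints what Sqoop would receive in each case.

```shell
# Simulate a shell in which the variable CONDITIONS is empty.
CONDITIONS=""

# Single quotes: the shell never expands $CONDITIONS.
single='select name,sex,age from staff where sex = 0 and $CONDITIONS'
# Double quotes with a backslash: the dollar sign is kept literally.
escaped="select name,sex,age from staff where sex = 0 and \$CONDITIONS"
# Double quotes without a backslash: the shell substitutes its own variable.
unescaped="select name,sex,age from staff where sex = 0 and $CONDITIONS"

printf 'single quotes:    %s\n' "$single"
printf 'escaped double:   %s\n' "$escaped"
printf 'unescaped double: %s\n' "$unescaped"
```

The first two strings are identical and still contain the placeholder. The third has lost it, which is the case that makes Sqoop report the missing $CONDITIONS error.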

Importing selected columns into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/company \
--username root \
--password 123456 \
--columns name,sex \
--table staff \
--target-dir /user/sqoop/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" 

Importing rows that match a condition into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/company \
--username root \
--password 123456 \
--table staff \
--where "sex=0" \
--target-dir /user/sqoop/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
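The imports that use --columns and --where can also be expressed as a single --query. As a rough sketch of that correspondence, the hypothetical helper `build_query` below assembles the query text from a table name, a column list, and an optional filter; it is an illustration, not something Sqoop provides.

```shell
# Hypothetical helper: build a --query string from the values that would
# otherwise be passed to --table, --columns and --where.
build_query() {
  local table="$1" columns="${2:-*}" where="${3:-}"
  if [ -n "$where" ]; then
    # Single-quoted format string keeps $CONDITIONS literal.
    printf 'select %s from %s where (%s) and $CONDITIONS\n' "$columns" "$table" "$where"
  else
    printf 'select %s from %s where $CONDITIONS\n' "$columns" "$table"
  fi
}

build_query staff name,sex          # counterpart of --table staff --columns name,sex
build_query staff '*' 'sex=0'       # counterpart of --table staff --where "sex=0"
```

The result could then be passed on as `--query "$(build_query staff name,sex 'sex=0')"`, together with --target-dir, which a query import requires.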

Importing into Hive

sqoop import \
--connect jdbc:mysql://localhost:3306/company \
--username root \
--password 123456 \
--table staff \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive

This requires the hive-common and hive-exec jars; copy them into Sqoop's lib directory.

A Hive import runs in two steps: the data is first imported into HDFS, and then moved from HDFS into Hive.
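The two steps can be pictured as doing the same work by hand. The sketch below is an approximation for illustration only: `import_staff_to_hive_manually` and the staging path are hypothetical, and it assumes the Hive table staff_hive already exists as tab-delimited text, whereas --hive-import can also create the table.

```shell
# Hypothetical manual equivalent of --hive-import, for illustration only.
# Assumes the `sqoop` and `hive` clients are installed and that the Hive
# table staff_hive already exists with tab-delimited text storage.
import_staff_to_hive_manually() {
  local staging="/user/sqoop/staging/staff"   # assumed temporary HDFS path

  # Step 1: import the table into a staging directory on HDFS.
  sqoop import \
    --connect jdbc:mysql://localhost:3306/company \
    --username root \
    --password 123456 \
    --table staff \
    --target-dir "$staging" \
    --delete-target-dir \
    --num-mappers 1 \
    --fields-terminated-by "\t" || return 1

  # Step 2: move the staged files into the Hive table, replacing old data
  # (the counterpart of --hive-overwrite).
  hive -e "LOAD DATA INPATH '$staging' OVERWRITE INTO TABLE staff_hive;"
}
```

Because LOAD DATA INPATH moves the files rather than copying them, the staging directory no longer holds the data once the second step has finished.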

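Parameter reference

Parameter	Description
`--append`	Append the imported data to an existing dataset in HDFS
`--connect`	JDBC connection URL of the relational database
`--connection-manager`	Connection manager class to use
`--driver`	JDBC driver class to use
`--help`	Print usage information
`--password`	Password for the database connection
`--username`	User name for the database connection
`--verbose`	Print more detailed information to the console
`--target-dir`	Destination path on HDFS
`--table`	Name of the table to import
`--enclosed-by <char>`	Character that encloses every field value, placed before and after it
`--escaped-by <char>`	Escape character used for special characters inside field values
`--fields-terminated-by <char>`	Character that terminates each field; the default is a comma
`--lines-terminated-by <char>`	Character that terminates each record; the default is \n
`--mysql-delimiters`	Use MySQL's default delimiters: fields separated by commas, lines by \n, escape character \, field values optionally enclosed in single quotes
`--optionally-enclosed-by <char>`	Character that encloses a field value only when the value contains delimiter characters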
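As an example of how the formatting parameters combine, the sketch below repeats the staff import with comma-separated, optionally quoted output. The function name `import_staff_csv` is hypothetical; the connection settings are the ones used throughout this page.

```shell
# Sketch: the staff import with explicit output-formatting options.
# Fields are comma-separated, enclosed in double quotes only when needed,
# and special characters inside values are escaped with a backslash.
import_staff_csv() {
  sqoop import \
    --connect jdbc:mysql://localhost:3306/company \
    --username root \
    --password 123456 \
    --table staff \
    --target-dir /user/sqoop/company \
    --delete-target-dir \
    --num-mappers 1 \
    --fields-terminated-by , \
    --lines-terminated-by '\n' \
    --escaped-by \\ \
    --optionally-enclosed-by '\"'
}
```

Replacing --optionally-enclosed-by with --enclosed-by would quote every field, including numeric ones.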