- 浏览: 151022 次
- 性别:
- 来自: 北京
文章分类
- 全部博客 (113)
- Java工具类 (3)
- Hibernate (1)
- Struts>1&2 (0)
- Spring (0)
- Web前端 (1)
- 数据库 (2)
- 中间件 (3)
- 搜索引擎 (11)
- EJB (1)
- Log日志 (1)
- OS操作系统 (7)
- Java文档 (4)
- Security (1)
- Exception&Error (7)
- 日志&心情 (7)
- 开心一下 (3)
- 云计算 (41)
- Cache缓存 (4)
- MongoDB (1)
- OS操作系统 Linux (7)
- jquery (1)
- 机器学习 (2)
- Plugins (2)
- AJAX (1)
- Java工具 (1)
- WebService (1)
最新评论
-
a420144030:
你好,我想计算n篇文章的相似度,用mahout能处理吗,如何做 ...
mahout使用 -
di1984HIT:
OK,真的很牛啊
hadoop常见错误 -
xwl1991:
还真是这个问题 !
quartz报错java.lang.NoSuchMethodError: org.apache.commons.collections.SetUtils.ord
从hive将数据导出到mysql
- 博客分类:
- 云计算
转自:http://abloz.com/2012/07/20/export-data-to-mysql-from-the-hive.html
http://abloz.com
2012.7.20
author:周海汉
在上一篇文章《》中,提到sqoop可以让RDBMS和HDFS之间互导数据,并且也支持从mysql中导入到HBase,但从HBase直接导入mysql则不是直接支持,而是间接支持。要么将HBase导出到HDFS平面文件,要么将其导出到Hive中,再导出到mysql。本篇讲从hive中导出到mysql。
从hive将数据导出到mysql
一、创建mysql表
mysql> create table award (rowkey varchar(255), productid int, matchid varchar(255), rank varchar(255), tourneyid varchar(255), userid bigint, gameid int, gold int, loginid varchar(255), nick varchar(255), plat varchar(255)); Query OK, 0 rows affected (0.01 sec)
二、尝试用hive作为外部数据库连接hbase,导入mysql
hive> CREATE EXTERNAL TABLE hive_award(key string, productid int,matchid string, rank string, tourneyid string, userid bigint,gameid int,gold int,loginid string,nick string,plat string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:MPID,info:MatchID,info:Rank,info:TourneyID,info:UserId,info:gameID,info:gold,info:loginId,info:nickName,info:platform") TBLPROPERTIES("hbase.table.name" = "award");
hive> desc hive_award; key string from deserializer productid int from deserializer matchid string from deserializer rank string from deserializer tourneyid string from deserializer userid bigint from deserializer gameid int from deserializer gold int from deserializer loginid string from deserializer nick string from deserializer plat string from deserializer [zhouhh@Hadoop46 ~]$ hadoop fs -ls /user/hive/warehouse/ Found 3 items drwxr-xr-x - zhouhh supergroup 0 2012-07-16 14:08 /user/hive/warehouse/hive_award drwxr-xr-x - zhouhh supergroup 0 2012-07-16 14:30 /user/hive/warehouse/nnnon drwxr-xr-x - zhouhh supergroup 0 2012-07-16 13:53 /user/hive/warehouse/test222
[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_award --input-fields-terminated-by '\0001' 12/07/19 16:13:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/07/19 16:13:06 INFO tool.CodeGenTool: Beginning code generation 12/07/19 16:13:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `award` AS t LIMIT 1 12/07/19 16:13:06 INFO orm.CompilationManager: HADOOP_HOME is /home/zhouhh/hadoop-1.0.0/libexec/.. 注: /tmp/sqoop-zhouhh/compile/4366149f0b6dd311c5b622594744fbb0/award.java使用或覆盖了已过时的 API。 注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。 12/07/19 16:13:08 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zhouhh/compile/4366149f0b6dd311c5b622594744fbb0/award.jar 12/07/19 16:13:08 INFO mapreduce.ExportJobBase: Beginning export of award 12/07/19 16:13:09 WARN mapreduce.ExportJobBase: Input path hdfs://Hadoop46:9200/user/hive/warehouse/hive_award contains no files 12/07/19 16:13:11 INFO input.FileInputFormat: Total input paths to process : 0 12/07/19 16:13:11 INFO input.FileInputFormat: Total input paths to process : 0 12/07/19 16:13:13 INFO mapred.JobClient: Running job: job_201207191159_0059 12/07/19 16:13:14 INFO mapred.JobClient: map 0% reduce 0% 12/07/19 16:13:26 INFO mapred.JobClient: Job complete: job_201207191159_0059 12/07/19 16:13:26 INFO mapred.JobClient: Counters: 4 12/07/19 16:13:26 INFO mapred.JobClient: Job Counters 12/07/19 16:13:26 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=7993 12/07/19 16:13:26 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/07/19 16:13:26 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/07/19 16:13:26 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/07/19 16:13:26 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 16.9678 seconds (0 bytes/sec) 12/07/19 16:13:26 INFO mapreduce.ExportJobBase: Exported 0 records.
直接导外部表不成功,Input path hdfs://Hadoop46:9200/user/hive/warehouse/hive_award contains no files
三、hive中创建连结hbase的表,在hive中的插入会引起hbase的数据改变:
CREATE TABLE hive_award_data(key string,productid int,matchid string,rank string, tourneyid string,userid bigint,gameid int, gold int,loginid string,nick string,plat string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:MPID,info:MatchID,info:Rank,info:TourneyID,info:UserId,info:gameID,info:gold,info:loginId,info:nickName,info:platform") TBLPROPERTIES("hbase.table.name" = "award_test"); hive> insert overwrite table hive_award_data select * from hive_award limit 2; hbase(main):014:0> scan 'award_test' ROW COLUMN+CELL 2012-04-27 06:55:00:402713629 column=info:MPID, timestamp=1342754799918, value=5947 2012-04-27 06:55:00:402713629 column=info:MatchID, timestamp=1342754799918, value=433203828 2012-04-27 06:55:00:402713629 column=info:Rank, timestamp=1342754799918, value=2 2012-04-27 06:55:00:402713629 column=info:TourneyID, timestamp=1342754799918, value=4027102 2012-04-27 06:55:00:402713629 column=info:UserId, timestamp=1342754799918, value=402713629 2012-04-27 06:55:00:402713629 column=info:gameID, timestamp=1342754799918, value=1001 2012-04-27 06:55:00:402713629 column=info:loginId, timestamp=1342754799918, value=715878221 2012-04-27 06:55:00:402713629 column=info:nickName, timestamp=1342754799918, value=xxx 2012-04-27 06:55:00:402713629 column=info:platform, timestamp=1342754799918, value=ios 2012-04-27 06:55:00:402713629 column=info:userid, timestamp=1342754445451, value=402713629 2012-04-27 06:55:00:406788559 column=info:MPID, timestamp=1342754799918, value=778 2012-04-27 06:55:00:406788559 column=info:MatchID, timestamp=1342754799918, value=433203930 2012-04-27 06:55:00:406788559 column=info:Rank, timestamp=1342754799918, value=19 2012-04-27 06:55:00:406788559 column=info:TourneyID, timestamp=1342754799918, value=4017780 2012-04-27 06:55:00:406788559 column=info:UserId, timestamp=1342754799918, value=406788559 2012-04-27 06:55:00:406788559 column=info:gameID, timestamp=1342754799918, value=1001 2012-04-27 06:55:00:406788559 column=info:gold, timestamp=1342754799918, value=1 2012-04-27 06:55:00:406788559 column=info:loginId, timestamp=1342754799918, value=13835155880 2012-04-27 06:55:00:406788559 column=info:nickName, timestamp=1342754799918, value=xxx 2012-04-27 06:55:00:406788559 column=info:platform, timestamp=1342754799918, value=android 2 row(s) in 0.0280 seconds [zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_award_data --input-fields-terminated-by '\0001' 12/07/20 11:32:01 WARN mapreduce.ExportJobBase: Input path hdfs://Hadoop46:9200/user/hive/warehouse/hive_award_data contains no files
创建连接HBase的表,还是不能导入。
四、创建Hive表,将HBase外部表的数据导入
hive> CREATE TABLE hive_myaward(key string,productid int,matchid string,rank string,tourneyid string,userid bigint,gameid int,gold int,loginid string,nick string,plat string); hive> insert overwrite table hive_myaward select * from hive_award limit 2;
hive> select * from hive_myaward; OK 2012-04-27 06:55:00:402713629 5947 433203828 2 4027102 402713629 1001 NULL 715878221 杀破天A ios 2012-04-27 06:55:00:406788559 778 433203930 19 4017780 406788559 1001 1 13835155880 亲牛牛旦旦 android Time taken: 2.257 seconds [zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward --input-fields-terminated-by '\0001' java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user ''@'Hadoop48' to database 'toplists'
权限问题,再授权一下
mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'Hadoop48'; Query OK, 0 rows affected (0.03 sec) mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'localhost'; Query OK, 0 rows affected (0.00 sec)
五、解决Hive中遇到的空值NULL的问题:
[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward --input-fields-terminated-by '\0001' ... 12/07/20 11:49:25 INFO mapred.JobClient: map 0% reduce 0% 12/07/20 11:49:37 INFO mapred.JobClient: Task Id : attempt_201207191159_0227_m_000000_0, Status : FAILED java.lang.NumberFormatException: For input string: "\N" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
\N是什么东西呢?
[zhouhh@Hadoop46 ~]$ hadoop fs -cat /user/hive/warehouse/hive_myaward/000000_0 2012-04-27 06:55:00:4027136295947433203828240271024027136291001\N715878221杀破天Aios 2012-04-27 06:55:00:4067885597784332039301940177804067885591001113835155880亲牛牛旦旦android
hive> select * from hive_myaward; OK 2012-04-27 06:55:00:402713629 5947 433203828 2 4027102 402713629 1001 NULL 715878221 杀破天A ios 2012-04-27 06:55:00:406788559 778 433203930 19 4017780 406788559 1001 1 13835155880 亲牛牛旦旦 android Time taken: 2.257 seconds
由于Hive的NULL用\N来表示,字段用\01来分割,换行用\n来换行,所以需增加相应的指示,注意转义字符\:
见:https://issues.cloudera.org/browse/SQOOP-188
[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward/000000_0 --input-null-string "\\\\N" --input-null-non-string "\\\\N" --input-fields-terminated-by "\\01" --input-lines-terminated-by "\\n"
12/07/20 12:53:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 12/07/20 12:53:56 INFO tool.CodeGenTool: Beginning code generation 12/07/20 12:53:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `award` AS t LIMIT 1 12/07/20 12:53:56 INFO orm.CompilationManager: HADOOP_HOME is /home/zhouhh/hadoop-1.0.0/libexec/.. 注: /tmp/sqoop-zhouhh/compile/4427d3db678bb145c995073e0924dc0b/award.java使用或覆盖了已过时的 API。 注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。 12/07/20 12:53:57 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-zhouhh/compile/4427d3db678bb145c995073e0924dc0b/award.java to /home/zhouhh/./award.java 12/07/20 12:53:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zhouhh/compile/4427d3db678bb145c995073e0924dc0b/award.jar 12/07/20 12:53:57 INFO mapreduce.ExportJobBase: Beginning export of award 12/07/20 12:53:58 INFO input.FileInputFormat: Total input paths to process : 1 12/07/20 12:53:58 INFO input.FileInputFormat: Total input paths to process : 1 12/07/20 12:53:58 INFO mapred.JobClient: Running job: job_201207191159_0232 12/07/20 12:53:59 INFO mapred.JobClient: map 0% reduce 0% 12/07/20 12:54:12 INFO mapred.JobClient: map 100% reduce 0% 12/07/20 12:54:17 INFO mapred.JobClient: Job complete: job_201207191159_0232 12/07/20 12:54:17 INFO mapred.JobClient: Counters: 18 12/07/20 12:54:17 INFO mapred.JobClient: Job Counters 12/07/20 12:54:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12114 12/07/20 12:54:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/07/20 12:54:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/07/20 12:54:17 INFO mapred.JobClient: Rack-local map tasks=1 12/07/20 12:54:17 INFO mapred.JobClient: Launched map tasks=1 12/07/20 12:54:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/07/20 12:54:17 INFO mapred.JobClient: File Output Format Counters 12/07/20 12:54:17 INFO mapred.JobClient: Bytes Written=0 12/07/20 12:54:17 INFO mapred.JobClient: FileSystemCounters 12/07/20 12:54:17 INFO mapred.JobClient: HDFS_BYTES_READ=335 12/07/20 12:54:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=30172 12/07/20 12:54:17 INFO mapred.JobClient: File Input Format Counters 12/07/20 12:54:17 INFO mapred.JobClient: Bytes Read=0 12/07/20 12:54:17 INFO mapred.JobClient: Map-Reduce Framework 12/07/20 12:54:17 INFO mapred.JobClient: Map input records=2 12/07/20 12:54:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=78696448 12/07/20 12:54:17 INFO mapred.JobClient: Spilled Records=0 12/07/20 12:54:17 INFO mapred.JobClient: CPU time spent (ms)=390 12/07/20 12:54:17 INFO mapred.JobClient: Total committed heap usage (bytes)=56623104 12/07/20 12:54:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=891781120 12/07/20 12:54:17 INFO mapred.JobClient: Map output records=2 12/07/20 12:54:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=123 12/07/20 12:54:17 INFO mapreduce.ExportJobBase: Transferred 335 bytes in 19.6631 seconds (17.037 bytes/sec) 12/07/20 12:54:17 INFO mapreduce.ExportJobBase: Exported 2 records.
导出到mysql成功
mysql> use toplists; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A
Database changed mysql> select * from award; +-------------------------------+-----------+-----------+------+-----------+-----------+--------+------+-------------+-------+---------+ | rowkey | productid | matchid | rank | tourneyid | userid | gameid | gold | loginid | nick | plat | +-------------------------------+-----------+-----------+------+-----------+-----------+--------+------+-------------+-------+---------+ | 2012-04-27 06:55:00:402713629 | 5947 | 433203828 | 2 | 4027102 | 402713629 | 1001 | NULL | 715878221 | ???A | ios | | 2012-04-27 06:55:00:406788559 | 778 | 433203930 | 19 | 4017780 | 406788559 | 1001 | 1 | 13835155880 | ????? | android | +-------------------------------+-----------+-----------+------+-----------+-----------+--------+------+-------------+-------+---------+ 2 rows in set (0.00 sec)
虽然mysql中有了数据,不过,导入的却是乱码
在《Hive导出到Mysql中中文乱码的问题
》这篇文章中继续解决。
发表评论
-
用Sqoop把数据从HDFS导入到MYSQL
2012-10-27 23:30 2315转自:http://www.kaifajie.cn/mySQL ... -
大数据的储存:百度HDFS集群的数据压缩
2012-08-30 17:48 11512012年4月13日,由IT168(ITP ... -
用sqoop进行mysql和hdfs系统间的数据互导
2012-07-31 16:32 1878转自:http://abloz.com/2012/07/19/ ... -
Hive SQL
2012-07-19 13:49 1397转自:http://www.tbdata.org/ ... -
Hive优化总结
2012-07-09 15:18 4263优化时,把hive sql当做ma ... -
hadoop/mapred 优化方法
2012-06-12 11:47 1368从三个方面着手优化 : 1. hadoop配置 2. ... -
hadoop集群dfs.data.dir目录权限问题导致节点无法启动
2012-06-11 18:32 6185安装使用Hive的时候发现文件无法从本地拷贝到hadoo ... -
hadoop中的balancer
2012-06-11 15:49 1101对于HDFS集群,经常长时间的运行,尤其是大量的delete操 ... -
Hadoop集群上使用Lzo压缩
2012-05-28 11:03 900转自:http://www.tech126.com/hadoo ... -
使用Hive读取Hbase中的数据
2012-05-23 13:12 2238转自:http://www.4ucode.com/Study/ ... -
在XP系统下配置eclipse集成hadoop开发
2012-05-05 11:09 1009一、目的:由于hadoop只能在Linux环境下运行,所以对于 ... -
使用Hive读取Hbase中的数据
2012-04-18 14:24 1053转自:http://www.4ucode.com/Study/ ... -
Web日志分析方法概述 让复杂的数据挖掘变得简单可行
2012-04-13 16:31 834志在计算机系统中是一个非常广泛的概念,任何程序都有可能输出日志 ... -
应用sqoop将MySQL数据库中的数据导入Hbase
2012-04-12 11:31 2033转自:http://www.bwxxkj.com/a/jish ... -
hadoop中的trash机制,恢复删除的文件
2012-03-31 13:46 2153Hadoop回收站trash,默认是关闭的。 1.修改con ... -
Hadoop 和DBMS 的互补性
2012-03-29 12:22 642随着Microsoft 也加入Hadoop 阵营,Hadoop ... -
Hadoop 中的两表join
2012-03-29 12:20 823转自:http://www.gemini52013 ... -
各版本编译hadoop eclipse plugin
2012-03-29 12:19 1349转自:http://www.gemini52013 ... -
hadoop常见错误
2012-03-24 13:04 1619错误1:bin/hadoop dfs 不能正常启动,持续提 ... -
hbase运维
2012-03-23 17:33 1281转自:http://blog.bluedavy.c ...
相关推荐
datax数据从hive表导入mysql表,数据缺失解决
1.将Mysql中的数据迁移到Hdfs文件系统中,然后通过Hive加载HDFS文件系统中的数据值 2.将Hive中的数据迁移到指定Mysql数据库中 注意点: 1.数据迁移的过程中,由于hive的Null值存储为"\N",Mysql存储为NULL值,二者...
2、sqoop导入(RMDB-mysql、sybase到HDFS-hive) 网址:https://blog.csdn.net/chenwewi520feng/article/details/130572275 介绍sqoop从关系型数据库mysql、sybase同步到hdfs、hive中
生成hive建表语句
数据同步Sqoop用法之mysql与Hive导入导出.docx
有赞数据平台从2017年上半年...首先介绍一下有赞大数据平台总体架构:如下图所示,底层是数据导入部分,其中DataY区别于开源届的全量导入导出工具alibaba/DataX,是有赞内部研发的离线Mysql增量导入Hive的工具,把Hiv
配置列表若存在,取得最后一次同步的分区名保存到export_pt取到需要导出的分区到folder变量图一情况二:从hdfs://edi/edi_conf中没有取
29.Sqoop将MySQL数据导入到HDFS(一) 30.Sqoop将MySQL数据导入到HDFS(二) 31.Sqoop中的增量导入与Sqoop job 32.Sqoop将MySQL数据导入Hive表中 33.Sqoop的导出及脚本中使用的方式 34.案例分析-动态分区的实现 35....
flume、hive和sqoop的实用案例:flume收集日志hive负责处理数据sqoop负责将数据导出到mysql中供页面展示
Sqoop 将数据从mysql导入HIVE 操作hive中的数据来计算每个股票代码的股票量 将此数据存储到 hive 上的 stock_volume 表中 使用 sqoop 将此数据从 hive 导出回 mysql 进行报告。 待开发。 用于导入、数据操作和...
从hive导出到mysql; Sqoop中的Job任务使用;使用Job任务实现增量导入;从mysql导入hdfs可以指定在hdfs上存储格式等等内容。 希望对有需要的朋友有所帮助,如有疑问,可以私信;整理不易,多多支持!
1、典型的离线流数据分析系统 2、技术分析 - Hadoop - nginx - flume - hive - mysql - springboot + mybatisplus+vcharts nginx + lua 日志文件埋点的 基于Hadoop网站流量日志数据分析系统 1、典型的离线流数据...
使用Spark框架进行网站用户购物分析 ...4、使用Sqoop将数据从Hive导入MySQL 5、使用Sqoop将数据从MySQL导入HBase 6、使用HBase Java API把数据从本地导入到HBase中 7、使用R对MySQL中的数据进行可视化分析
2. 练习使用Sqoop将数据从Mysql中导入到HBase中 3. 练习使用Sqoop将数据在Mysql与Hive之间的导入导出 4. Sqoop上的操作举例
为了避免运营提出无穷无尽的查询需求,我们决定将有查询价值的数据从mysql导入hive中,让他们使用HUE这个开源工具进行查询。想必他们对表结构不甚了解,还需要为之提供一个表结构说明,于是编写了一个脚本,从hive...
sqoop问题处理通用办法,首先详细描绘了一个sqoop实际问题得处理过程,然后延申到完整得hadoop日志得处理过程。非常具体实用。
本次要实践的数据日志来源于国内某技术学习论坛,该论坛由某培训机构主办,汇聚...使用Sqoop把Hive产生的统计结果导出到mysql中; 两个日志文件,一共有200MB,符合大数据量级,可以作为推荐系统数据集和hadoop测试集。
3、Sqoop导出(HDFS到RMDB-mysql) 网址:https://blog.csdn.net/chenwewi520feng/article/details/130573318 介绍通过sqoop将hdfs数据同步到关系型数据库mysql中
包含了idea代码关于使用mapreduce清洗数据,以及上传数据到HDFS。 包含了hive创表以及sqoop导出数据到MySQL。
6.参与编写脚本文件将数据导出到MySQL中,运用Tableau工具进行可视化分析 参与编写部分Oozie任务调度脚本,实现工作流的自动化调控 实时项目:实时气象数据展示大屏搭建 个人职责: 1.与客户对接,确定大屏上需要...