Table creation script in the HBase shell:
create 'HisDiagnose',{ NAME => 'diagnoseFamily'}
Creating an external table in Hive that maps onto an existing HBase table makes the HBase data queryable through Hive QL, giving an otherwise NoSQL store SQL capabilities. The creation script:
CREATE EXTERNAL TABLE HisDiagnose(key string, doctorId int, patientId int, description string, rtime int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,diagnoseFamily:doctorId,diagnoseFamily:patientId,diagnoseFamily:description,diagnoseFamily:rtime")
TBLPROPERTIES("hbase.table.name" = "HisDiagnose");
Problem description:
Data was inserted into the HBase table HisDiagnose through the HBase client API. The doctorId, patientId, and rtime columns are declared as ints in Hive, yet select * from HisDiagnose in Hive returns null for all three. The insert code:
import java.io.IOException;
import java.util.Date;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Insert data
 * @param tablename name of the HBase table to write to
 */
public static void insertData(String tablename) {
    System.out.println("Start inserting data ....");
    HTablePool pool = new HTablePool(conf, 1000); // conf is the class-level HBase Configuration
    HTableInterface table = pool.getTable(tablename);
    try {
        for (int i = 1; i <= 1; i++) {
            // One Put is one row; constructing another Put would start a second row.
            // The row key is the value passed to the Put constructor and must be unique per row.
            Put put = new Put(("2013-03-0" + i).getBytes());
            put.add("diagnoseFamily".getBytes(), "doctorId".getBytes(), new Date().getTime(), Bytes.toBytes(i));
            put.add("diagnoseFamily".getBytes(), "patientId".getBytes(), new Date().getTime(), Bytes.toBytes(i));
            put.add("diagnoseFamily".getBytes(), "description".getBytes(), new Date().getTime(), "描述".getBytes());
            // Note: this writes an 8-byte long, not a 4-byte int.
            put.add("diagnoseFamily".getBytes(), "rtime".getBytes(), new Date().getTime(), Bytes.toBytes(new Date().getTime()));
            table.put(put);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            table.close(); // returns the table to the pool
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    System.out.println("Done inserting data ....");
}
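To see why the string mapping fails, it helps to read a cell back and decode it both ways: Bytes.toBytes(i) stores the four raw bytes of the int (for i = 1, that is 0x00 0x00 0x00 0x01), not the text "1". A minimal sketch using the same 0.94-era client API as above; verifyRow is a hypothetical helper, not part of the original code, and conf is the same class-level Configuration:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Read one row back and decode doctorId both as text and as an int.
 */
public static void verifyRow(String tablename) throws IOException {
    HTablePool pool = new HTablePool(conf, 1000);
    HTableInterface table = pool.getTable(tablename);
    try {
        Result r = table.get(new Get("2013-03-01".getBytes()));
        byte[] raw = r.getValue("diagnoseFamily".getBytes(), "doctorId".getBytes());
        // Hive's string storage type effectively does this: the 4 raw bytes of
        // an int are not a textual number, so the string-to-int cast yields NULL.
        System.out.println("as string: " + Bytes.toString(raw));
        // Decoding the bytes the way they were written recovers the value.
        System.out.println("as int: " + Bytes.toInt(raw));
    } finally {
        table.close();
    }
}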
Solution:
The Column Mapping section of the official wiki, https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration, says the following:
There are two SERDEPROPERTIES that control the mapping of HBase columns to Hive:
(1) hbase.columns.mapping
(2) hbase.table.default.storage.type: can have a value of either string (the default) or binary; this option is only available as of Hive 0.9, and the string behavior is the only one available in earlier versions
The column mapping support currently available is somewhat cumbersome and restrictive:
(1) for each Hive column, the table creator must specify a corresponding entry in the comma-delimited hbase.columns.mapping string (so for a Hive table with n columns, the string should have n entries); whitespace should not be used between entries, since it will be interpreted as part of the column name, which is almost certainly not what you want
(2) a mapping entry must be either :key or of the form column-family-name:[column-name][#(binary|string)] (the type specification delimited by # was added in Hive 0.9.0; earlier versions interpreted everything as strings)
(3) if no type specification is given, the value from hbase.table.default.storage.type will be used
(4) any prefix of the valid values is valid too (e.g. #b instead of #binary)
(5) if you specify a column as binary, the bytes in the corresponding HBase cells are expected to be of the form that HBase's Bytes class yields
(6) there must be exactly one :key mapping (we don't support compound keys yet)
(7) (note that before HIVE-1228 in Hive 0.6, :key was not supported, and the first Hive column implicitly mapped to the key; as of Hive 0.6, it is now strongly recommended that you always specify the key explicitly; we will drop support for implicit key mapping in the future)
(8) if no column-name is given, then the Hive column will map to all columns in the corresponding HBase column family, and the Hive MAP datatype must be used to allow access to these (possibly sparse) columns
(9) there is currently no way to access the HBase timestamp attribute; queries always access data with the latest timestamp
(10) since HBase does not associate datatype information with columns, the serde converts everything to string representation before storing it in HBase; there is currently no way to plug in a custom serde per column
(11) it is not necessary to reference every HBase column family, but those that are not mapped will be inaccessible via the Hive table; it is possible to map multiple Hive tables to the same HBase table
The wiki goes on to give detailed examples of the kinds of column mappings currently possible.
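As an aside, points (2) and (4) above also allow pinning the storage type per column with a #binary suffix (or any prefix of it, such as #b) instead of changing the table-wide default. A sketch of that variant, assuming Hive 0.9.0 or later and a bigint rtime to match the 8-byte long the insert code writes (see the note below):

CREATE EXTERNAL TABLE HisDiagnose(key string, doctorId int, patientId int, description string, rtime bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,diagnoseFamily:doctorId#b,diagnoseFamily:patientId#b,diagnoseFamily:description#s,diagnoseFamily:rtime#b")
TBLPROPERTIES("hbase.table.name" = "HisDiagnose");

The table-wide default used in the fix below is equivalent here, since a Hive string column stores the same UTF-8 bytes under either representation.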
From the above: when the Hive external table was created over the existing HBase table, hbase.table.default.storage.type defaulted to string, while the doctorId, patientId, and rtime values had been written to HBase as raw binary bytes via Bytes.toBytes. The string serde cannot parse those bytes as numbers, so the mapped values come back null. The fix is to drop the external table in Hive, set hbase.table.default.storage.type to binary, and recreate it:
CREATE EXTERNAL TABLE HisDiagnose(key string, doctorId int, patientId int, description string, rtime bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,diagnoseFamily:doctorId,diagnoseFamily:patientId,diagnoseFamily:description,diagnoseFamily:rtime","hbase.table.default.storage.type"="binary")
TBLPROPERTIES("hbase.table.name" = "HisDiagnose");
Note that rtime is now declared bigint rather than int: the insert code writes it with Bytes.toBytes(new Date().getTime()), an 8-byte long, and with binary storage the Hive column's byte width must match what was written; an int (4 bytes) would still read back as null.
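After recreating the table, a quick check in Hive should now return the values written by the insert code instead of NULL for the three numeric columns:

select key, doctorId, patientId, rtime from HisDiagnose;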