Hive 创建表

duming115

浏览: 113176 次
性别:
来自: 北京

最近访客更多访客>>

jis117

xxxyyylll

rabbitbug

mingyangsihai

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

cloudera

Hadoop 正则表达式 Apache F#

1. Hive 创建表时正则表达式与java代码中一致,RegexSerDe 中用\\来表示反转码
create table inst_cn_3 (
ip string,
time string,
mac string,
lan string,
ver string,
lc string,
pn string,
reg string,
vrf string,
line string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "- *([\\d\\.]+) *\\[([\\d]{2}/[\\w]+/[\\d]{4}:[\\d]{2}:[\\d]{2}:[\\d]{2}\\s+\\+[\\d]+)\\] *GET */mx3/inst/([0-9a-f]{12})/ver=([\\.\\d]+),lan=(0x[\\w]+)(?:,lc=([\\w]+))(?:,pn=([\\w]+))(?:,reg=([0-1]))(?:,vrf=([\\w]+))?.*"
)
STORED AS TEXTFILE;

alter table inst_cn_3 add columns(line string);
ALTER TABLE inst_cn_3 SET SERDEPROPERTIES (
"input.regex" = "- ([\\d\\.]+) \\[([\\d]{2}/[\\w]+/[\\d]{4}:[\\d]{2}:[\\d]{2}:[\\d]{2}\\s+\\+[\\d]+)\\] GET /mx3/inst/([0-9a-f]{12})/ver=([\\.\\d]+),lan=(0x[\\w]+)(?:,lc=([\\w]+))(?:,pn=([\\w]+))(?:,reg=([0-1]))(?:,vrf=([\\w]+))?.*|(.*)"
);
select * from inst_cn_3 limit 100;

select
line
from inst_cn_3
where
1=1
   and mac is null
   and line is not null
   and !(line rlike '.*unknowuser00.*')
;

hadoop fs -cp /mnt/nfs/log/statcn/inst/inst_cn_3.*.txt /hive/warehouse/inst_cn_3/

2011.06.09 hive时间处理
1.select
from_unixtime(unix_timestamp('02/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
from_unixtime(unix_timestamp('02/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
from_unixtime(unix_timestamp('03/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
from_unixtime(unix_timestamp('03/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss')
from
stat_c_log
where
1=1
and partkey = '20110503'
and logType = 'inst_cn_3'
and url rlike '/mx3/inst/.*'
limit
10
;

2.
select
from_unixtime(unix_timestamp('02/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
from_unixtime(unix_timestamp('02/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
from_unixtime(unix_timestamp('03/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
from_unixtime(unix_timestamp('03/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') ,
round((unix_timestamp('02/May/2011:00:00:00 +0800','dd/MMMMM/yyyy:HH:mm:ss Z')-4*3600)/(3600*24))*(3600*24*1000),
round((unix_timestamp('02/May/2011:23:59:59 +0800','dd/MMMMM/yyyy:HH:mm:ss Z')-4*3600)/(3600*24))*(3600*24*1000)
from
stat_c_log
where
1=1
and partkey = '20110503'
and logType = 'inst_cn_3'
and url rlike '/mx3/inst/.*'
limit
10
;

2012.03.01
1. Hive 方法注册类 FunctionRegistry
2012.06.14
1. set hive.cli.print.header=true; 可以设置hive shell的输出.
2012.06.26
1. hive cdh4b2 使用arichive 对表归档后，使用select line 对归档后的partition查询时，报FileNotFoundException 异常。
   https://issues.apache.org/jira/browse/MAPREDUCE-2704 是因为CombineFileInputFormat constructs new Path objects by converting an existing path to a URI, and then only pulling out the "path" part of it. This drops the scheme and host, which makes CombineFileInputFormat fail if the paths are on a filesystem other than the default one.
2012.07.16
1. EXPLAIN EXTENDED hive_query; 查看运行.
2012.07.29
1. DESCRIBE FORMATTED mock; 显示columns, location, params等.
2. /src/ql/src/test/queries/clientpositive/

分享到：

HBase 命令 | Chukwa

2011-05-16 18:02
浏览 6955
评论(0)
分类:编程语言
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Hive 创建表

评论

发表评论

相关推荐

最近访客 更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Hive 创建表

评论

发表评论

相关推荐

Hadoop4.3 源码笔记

hadoop 下的debug

最近访客更多访客>>