The biggest differences between HiveQL (Hive SQL) and ordinary SQL

guoyunsky


Blog categories:

Hadoop
Hive

Weibo: http://weibo.com/guoyunwb

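I have been using Pig all along, and now I also need to work with Hive. I found some material online that looked useful, so I have translated it here. The translation is probably not very accurate; I will write up a proper summary once I know Hive better.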

* No true date/time data types, no interval types, and many missing UDFs for manipulating dates (e.g. ADD_MONTH)

* Strict type matching without support for automatic coercion or typed literals (e.g. CASE <bigint expr> WHEN 1 THEN ... END)

* All queries must reference a table (no 'dual' or table-less queries)

* No session-scoped temp tables

* No 'IN' predicate

* No 'FIND' string search function for producing the offset to a match

* No find/replace string functions for plain strings (i.e. not regex)

* XPATH UDFs cannot return a string representing an entire subtree in the DOM, which prevents composition.

* Few mechanisms for collapsing arrays to scalar types (e.g. 'join' complement of string 'split'; aggregations other than 'size' for numeric arrays; etc.)

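Rough translation:

1.HiveQL has no true date/time types, no interval types, and lacks some functions for manipulating dates and times, such as ADD_MONTH.

2.HiveQL has very strict type matching and does not support automatic type conversion (for example, CASE big_int_number WHEN 1 THEN ... END is not supported). My understanding is that a bigint value will not be converted to int for you automatically.

3.Every HiveQL query must reference a table; there is no 'dual' table and no table-less query, unlike ordinary SQL.

4.HiveQL has no session-scoped temporary tables.

5.HiveQL has no IN predicate.

6.HiveQL has no FIND function for strings and no find/replace functions for plain (non-regex) strings.

7.XPATH UDFs in HiveQL cannot return a string representing an entire DOM subtree, which prevents composing them.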

8.Few mechanisms for collapsing arrays to scalar types (e.g. 'join' complement of string 'split'; aggregations other than 'size' for numeric arrays; etc.)
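
Item 1 above mentions missing date helpers such as ADD_MONTH. As a rough illustration of the arithmetic such a helper has to perform, here is a minimal sketch in Python. The function name `add_months` and the choice to clamp the day to the end of the target month are assumptions made for this example, not a description of any Hive function.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return d shifted by a number of months (negative values go back)."""
    # Use a zero-based month index so divmod handles year rollover.
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    # Assumption: clamp the day to the last day of the target month,
    # so that January 31 plus one month lands on the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))
```

For example, `add_months(date(2013, 1, 31), 1)` returns the last day of February 2013 under the clamping assumption above.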

===========================================================================================================================================================

1.No windowing functions, e.g. SUM(sales) OVER (PARTITION BY date). It's difficult to do a lot of things common in warehousing, like a running sum, without having to write custom mappers/reducers or a UDF.

2.No regular UNION, INTERSECT, or MINUS operators.

3.Null values are treated differently from empty strings, and are exported differently. For example, empty strings are exported as '\n' and nulls are exported as nulls. I know this isn't unique to Hive, but it is still annoying when exporting data from Hive into another system.

4.No hierarchical/self-referencing querying. I know most distributed computing solutions can't do this, but it can be very handy.

5.No Update or Delete statements.

6.Haven't been able to find any kind of cost-based explain plans. Running explain plans generally just shows the path of accessing data. Useful to some degree but it would be great if it was more advanced in that it could help the user understand which steps are causing the biggest slowdowns.
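
Item 1 in this list says that a running sum needs a custom reducer or script. A minimal sketch of such a script's core logic in Python might look like the following. It assumes tab-separated input rows of the form key, value (for instance date and sales), already grouped by key and sorted in the desired order. The function name `running_sum` and the column layout are assumptions for illustration only.

```python
from typing import Iterable, Iterator


def running_sum(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'key<TAB>value<TAB>running_total' for each input row.

    Assumes rows arrive grouped by key and already ordered within each key.
    """
    current_key = None
    total = 0.0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t")
        if key != current_key:
            # A new key starts a new partition, so reset the total.
            current_key = key
            total = 0.0
        total += float(value)
        yield f"{key}\t{value}\t{total}"
```

A small wrapper that passes `sys.stdin` to `running_sum` and prints each result would turn this into a streaming script. Such a script could plausibly be plugged into a query through Hive's TRANSFORM ... USING clause, with the rows distributed and sorted by key beforehand.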

=======================================================================================================================================================================

1. For the row format's line-termination delimiter, only '\n' is supported.

2. Hive does not support running a query that selects from tables in more than one database.

3. Hive does not support sub-queries such as those connected by IN/EXISTS in the WHERE clause.

4. Hive does not support the truncation of data from a table.
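
Item 3 in this list concerns IN/EXISTS sub-queries in the WHERE clause. The usual way to express the same filter without a sub-query is a semi-join: keep each row of the left table whose key has at least one match in the right table. The sketch below shows those semantics in Python on in-memory rows. The names `semi_join`, `left`, `right` and `key` are hypothetical, and the sketch assumes the keys are non-null.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def semi_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Yield each left row whose key value appears in at least one right row."""
    # Collect the distinct key values from the right-hand rows once.
    right_keys = {row[key] for row in right}
    # Each left row is emitted at most once, however many matches exist.
    for row in left:
        if row[key] in right_keys:
            yield row
```

Unlike an ordinary join, duplicate matches on the right do not multiply the left rows, which is what makes the semi-join equivalent to an IN or EXISTS filter.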

===========================================================================================================================================================


2013-02-19 18:15
Category: Open source software