PrestoDB Overview and Performance Testing

yugouai

Contents

（1）Introduction

（2）Hive and Prestodb, comparison of functionality

（3）Hive and Prestodb, comparison of performance

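（1）Introduction

Presto is a distributed SQL query engine developed by Facebook, designed specifically for fast, interactive data analysis. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions.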
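To make the SQL features named above concrete, here is a small sketch of one query that combines a subquery, a join, an aggregation, and a window function. It runs against an in-memory SQLite database through Python's `sqlite3` module purely as a stand-in engine (it assumes a SQLite build with window-function support, 3.25 or later); it does not connect to Presto. The `users` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in SQL engine (not Presto).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER, country TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'CN'), (2, 'CN'), (3, 'US');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 20.0), (3, 7.0);
""")

# Inner query: join + aggregation (total spend per user).
# Outer query: window function ranking users within each country.
QUERY = """
    SELECT country,
           user_id,
           total,
           RANK() OVER (PARTITION BY country ORDER BY total DESC) AS rank_in_country
    FROM (
        SELECT u.country AS country,
               u.user_id AS user_id,
               SUM(o.amount) AS total
        FROM users u
        JOIN orders o ON o.user_id = u.user_id
        GROUP BY u.country, u.user_id
    ) AS per_user
    ORDER BY country, rank_in_country
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The query text itself uses only standard constructs, so the same shape of statement is what one would expect to send to an ANSI SQL engine.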

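Presto architecture diagram:

The architecture diagram shows a simplified view of the Presto system. The client sends a SQL query to the Presto coordinator. The coordinator parses the query, analyzes it, and plans its execution. The scheduler wires the execution pipeline together, assigns tasks to the nodes closest to the data, and monitors progress. The client pulls data from the output stage, which in turn pulls data from the lower-level processing stages.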
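The idea of assigning tasks to the nodes closest to the data can be sketched as a simple locality-aware assignment. This is only an illustration of the principle, not Presto's actual scheduler: the function `assign_splits`, the shape of the `splits` mapping, and the tie-breaking rule (least-loaded worker first) are all assumptions made for the example.

```python
from collections import defaultdict

def assign_splits(splits, workers):
    """Assign each split to a worker, preferring workers that hold the data.

    splits:  dict of split id -> list of hosts holding a replica (hypothetical shape)
    workers: list of available worker hosts
    """
    load = defaultdict(int)  # number of splits assigned to each worker so far
    assignment = {}
    for split_id, replica_hosts in sorted(splits.items()):
        # Workers that already hold a replica of this split.
        local = [w for w in workers if w in replica_hosts]
        # Fall back to any worker when no worker is local to the data.
        candidates = local or workers
        # Among the candidates, pick the least-loaded one (name breaks ties).
        chosen = min(candidates, key=lambda w: (load[w], w))
        assignment[split_id] = chosen
        load[chosen] += 1
    return assignment

splits = {
    "s1": ["node-a", "node-b"],
    "s2": ["node-a"],
    "s3": ["node-c"],
    "s4": ["node-d"],  # no available worker holds this split
}
workers = ["node-a", "node-b", "node-c"]

plan = assign_splits(splits, workers)
print(plan)
```

Splits with a local replica stay on a node that holds the data, and only `s4`, which has no local worker, is sent to the least-loaded remote node.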

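Presto's execution model is fundamentally different from that of Hive or MapReduce. Hive translates a query into multiple stages of MapReduce tasks that run one after another. Each task reads its input from disk and writes its intermediate results back to disk. Presto, by contrast, does not use MapReduce. It uses a custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is done in memory, and the processing stages are pipelined across the network. This avoids unnecessary disk I/O and the latency that comes with it. The pipelined execution model runs multiple stages at the same time and streams data from one stage to the next as soon as it becomes available, which significantly reduces the end-to-end response time of many kinds of queries.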
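The difference between staged and pipelined execution can be illustrated with Python generators. This is a single-process sketch of the data flow only, assuming three toy operators (`scan`, `keep_even`, `total`) invented for the example; it says nothing about how Presto implements its operators or moves data over the network. In the staged run each stage is fully materialized before the next one starts, while in the pipelined run a row moves on to the next operator as soon as it is produced.

```python
def scan(rows, log):
    """Source operator: emits rows one at a time."""
    for row in rows:
        log.append(("scan", row))
        yield row

def keep_even(rows, log):
    """Filter operator: passes through even values only."""
    for row in rows:
        log.append(("filter", row))
        if row % 2 == 0:
            yield row

def total(rows, log):
    """Sink operator: sums whatever reaches it."""
    acc = 0
    for row in rows:
        log.append(("sum", row))
        acc += row
    return acc

data = [1, 2, 3, 4]

# Pipelined: operators are chained lazily, so each row flows through
# all stages before the next row is read.
pipelined_log = []
pipelined_sum = total(
    keep_even(scan(data, pipelined_log), pipelined_log), pipelined_log
)

# Staged: every stage is materialized in full (standing in for the
# intermediate results written to disk) before the next stage starts.
staged_log = []
stage1 = list(scan(data, staged_log))
stage2 = list(keep_even(stage1, staged_log))
staged_sum = total(stage2, staged_log)

print(pipelined_log)
print(staged_log)
```

Both runs compute the same sum, but the logs differ: the pipelined log interleaves `scan`, `filter`, and `sum` events, whereas the staged log shows all scans, then all filters, then the sums.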

（2）Hive and Prestodb, comparison of functionality

√: Yes; ×: No. The rows where the two columns differ are the main differences between Hive and Presto.

Feature	Hive 0.11.0	Presto 0.56
Implementation language	Java	Java
DataType
integer	√	√
string	√	√
floating point	√	√
boolean	√	√
map	√	√
list	√	√
struct	√	√
uniontype	√	×
timestamp	√	√
DDL (Data Definition Language)
create/alter/drop table	√	×
create view	√	×
truncate table	√	×
desc	√	√
create index	√	×
DML (Data Manipulation Language)
load data	√	×
insert	√	√
explain	√	√
tablesample (bucketing on a column)	√	√
group by	√	√
order by	√	√
having	√	√
limit	√	√
inner/left/right/full join	√	√
union	√	√
sub queries	√	√
Enhanced Aggregation, Cube, Grouping and Rollup	√	×
lateral view	√	×
Function
UDF	√	×
Mathematical Functions	√	√
String Functions	√	√
Date and Time Functions	√	√
Regex	√	√
Type Conversion Functions	√	×
Conditional Functions	√	√
Aggregate Functions	√	√
Windowing	√	√
Distinct	√	√
Url	√	√
Json	√	√
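To pull the differences out of the table programmatically, the rows can be transcribed into a small data structure and filtered. The sketch below copies the √/× values exactly as listed above for Hive 0.11.0 and Presto 0.56; the names `ROWS` and `hive_only` are just illustrative.

```python
# (category, feature, hive_supported, presto_supported), transcribed from the table above.
ROWS = [
    ("DataType", "integer", True, True),
    ("DataType", "string", True, True),
    ("DataType", "floating point", True, True),
    ("DataType", "boolean", True, True),
    ("DataType", "map", True, True),
    ("DataType", "list", True, True),
    ("DataType", "struct", True, True),
    ("DataType", "uniontype", True, False),
    ("DataType", "timestamp", True, True),
    ("DDL", "create/alter/drop table", True, False),
    ("DDL", "create view", True, False),
    ("DDL", "truncate table", True, False),
    ("DDL", "desc", True, True),
    ("DDL", "create index", True, False),
    ("DML", "load data", True, False),
    ("DML", "insert", True, True),
    ("DML", "explain", True, True),
    ("DML", "tablesample", True, True),
    ("DML", "group by", True, True),
    ("DML", "order by", True, True),
    ("DML", "having", True, True),
    ("DML", "limit", True, True),
    ("DML", "inner/left/right/full join", True, True),
    ("DML", "union", True, True),
    ("DML", "sub queries", True, True),
    ("DML", "Enhanced Aggregation, Cube, Grouping and Rollup", True, False),
    ("DML", "lateral view", True, False),
    ("Function", "UDF", True, False),
    ("Function", "Mathematical Functions", True, True),
    ("Function", "String Functions", True, True),
    ("Function", "Date and Time Functions", True, True),
    ("Function", "Regex", True, True),
    ("Function", "Type Conversion Functions", True, False),
    ("Function", "Conditional Functions", True, True),
    ("Function", "Aggregate Functions", True, True),
    ("Function", "Windowing", True, True),
    ("Function", "Distinct", True, True),
    ("Function", "Url", True, True),
    ("Function", "Json", True, True),
]

# Features marked as supported by Hive but not by Presto.
hive_only = [
    (category, feature)
    for category, feature, hive, presto in ROWS
    if hive and not presto
]

for category, feature in hive_only:
    print(f"{category}: {feature}")
```

According to the table, every gap is on the Presto side: the listed differences are mostly DDL statements, plus `uniontype`, `load data`, enhanced aggregation, `lateral view`, UDFs, and type conversion functions.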