heipark

Understanding the flatten Keyword in Pig


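In English, "flatten" means to make something flat. In Pig, the operator behaves differently depending on whether it is applied to a tuple or to a bag.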

 

1. flatten tuple

flatten unpacks the contents of a tuple, putting its fields directly into the enclosing row. For example:

-- Structure of A: (a, (b, c))
B = FOREACH A GENERATE $0, FLATTEN($1);

B returns (a,b,c)
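
To make the tuple case concrete, here is a minimal Python sketch that models the behavior described above. It is not Pig code: `flatten_tuple` is a hypothetical helper, and a Pig tuple is assumed to map onto a Python tuple.

```python
def flatten_tuple(row, index):
    """Model of FLATTEN on a tuple field (hypothetical helper, not Pig's API).

    The fields of the nested tuple at position `index` replace the tuple itself.
    """
    nested = row[index]
    return row[:index] + tuple(nested) + row[index + 1:]


# A has the structure (a, (b, c)); flattening field $1 splices b and c into the row.
A_row = ("a", ("b", "c"))
B_row = flatten_tuple(A_row, 1)
print(B_row)
```

The row count does not change here: one input row yields exactly one output row, only wider.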

 

2. flatten bag

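flatten unpacks the contents of a bag: each tuple in the bag becomes its own output row, so nested values are turned into rows.

-- Structure of A: ({(b,c),(d,e)})
B = FOREACH A GENERATE FLATTEN($0);

B returns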
(b,c)
(d,e)
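
The bag case can be modeled the same way. The following Python sketch assumes a bag is represented as a list of tuples and a relation as a list of rows; `flatten_single_bag` is a hypothetical name used only for illustration.

```python
def flatten_single_bag(relation):
    """Model of GENERATE FLATTEN($0) on rows whose only field is a bag.

    Each tuple inside the bag is emitted as a separate output row.
    """
    rows = []
    for (bag,) in relation:          # every row has exactly one field: the bag
        for inner in bag:
            rows.append(tuple(inner))
    return rows


# A has the structure ({(b,c),(d,e)}): one row holding a bag of two tuples.
A = [([("b", "c"), ("d", "e")],)]
B = flatten_single_bag(A)
for row in B:
    print(row)
```

Unlike the tuple case, the number of rows changes: one input row with a two-tuple bag produces two output rows.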

 

 

Original text from the official documentation:
The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and result is different for each type of structure.

For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).

For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).
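
The cross-product behavior in the last example can be sketched in Python as follows. This is an illustrative model under the assumption that a bag is a list of tuples; `flatten_bag_field` is a hypothetical helper, not part of Pig.

```python
def flatten_bag_field(relation, index):
    """Model of GENERATE ..., FLATTEN($index) where field `index` is a bag.

    The other fields of each row are repeated once per tuple in the bag,
    which is the cross product the documentation describes.
    """
    out = []
    for row in relation:
        for inner in row[index]:
            out.append(row[:index] + tuple(inner) + row[index + 1:])
    return out


# Shape commonly produced by GROUP: (a, {(b,c), (d,e)})
grouped = [("a", [("b", "c"), ("d", "e")])]
flattened = flatten_bag_field(grouped, 1)
print(flattened)

# In this model a row whose bag is empty contributes no output rows.
print(flatten_bag_field([("x", [])], 1))
```

In this model the key `a` is copied into every output row, and a row with an empty bag disappears from the result entirely, which is worth keeping in mind when flattening after a GROUP or COGROUP.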

 

 
