http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
(namely: making Hadoop support splittable LZO compression)
- Very basic question about Hadoop and compressed input files
- Hadoop gzip input file using only one mapper
- Why can't hadoop split up a large text file and then compress the splits using gzip?
- what is the flow of uploading a gzip file to DFS?
the file is split into HDFS chunks block by block, so every block except the last is full (the last may not be), and the bytes inside these blocks are not changed at all:
hadoop@host-08:~$ hadoop fsck /user/hadoop/mr-test-data.zj.tar.gz -blocks -locations -files
FSCK started by hadoop from /192.168.12.108 for path /user/hadoop/mr-test-data.zj.tar.gz at Mon Oct 26 17:22:24 CST 2015
/user/hadoop/mr-test-data.zj.tar.gz 173826303 bytes, 2 block(s): OK
0. blk_-6142856910439989465_2680086 len=134217728 repl=3 [192.168.12.148:50010, 192.168.12.110:50010, 192.168.12.132:50010]
1. blk_-9182536886628119965_2680086 len=39608575 repl=3 [192.168.12.110:50010, 192.168.12.134:50010, 192.168.12.140:50010]
compared with the raw file:
hadoop@host-08:~$ ls -l
-rw-r--r-- 1 hadoop hadoopgrp 173826303 Apr 23 2014 mr-test-data.zj.tar.gz
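The byte arithmetic above can be checked directly: with a 128 MB block size, the block lengths fall out of the file size alone, which is exactly what the fsck report shows. A minimal sketch (plain Python, numbers taken from the output above):

```python
# HDFS splits a file into fixed-size byte blocks, regardless of whether
# the content is compressed. Block size and file size are from the
# fsck output above.
BLOCK = 134217728          # 128 MB HDFS block size
size = 173826303           # bytes in mr-test-data.zj.tar.gz

lengths = [min(BLOCK, size - off) for off in range(0, size, BLOCK)]
print(lengths)             # [134217728, 39608575] -> matches blk 0 and blk 1
```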
- what is the order of writing a gzip file to DFS? split -> compress, or compress -> split?
TODO: see HBase's source
conclusion:
- a new record does -not- always mean one line of text per split; it may be a key/value pair, etc.
- the HDFS block level is unrelated to whether a format is 'splittable'
- an LZO file is splittable only if an LZO index file has been generated for it; the index records the compressed block offsets
this is similar to HBase's HFile format
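The conclusions above can be illustrated with a small sketch. It is not Hadoop code: Python's gzip stands in for the codec, and the sizes and variable names are made up for illustration. Part 1 shows why a single gzip stream forces one reader; part 2 mimics the LZO-index idea of compressing chunks independently and keeping a side index of offsets.

```python
import gzip
import zlib

data = b"some text line\n" * 20000

# 1) A plain gzip stream cannot be entered mid-file: decompressing from
#    an arbitrary split offset fails the gzip header check, so one
#    mapper must read the whole file.
gz = gzip.compress(data)
try:
    zlib.decompress(gz[len(gz) // 2:], wbits=31)   # wbits=31: expect a gzip wrapper
    splittable = True
except zlib.error:
    splittable = False
# splittable is False

# 2) The LZO-index idea: compress fixed-size chunks independently and
#    keep a side index of their byte offsets, so any worker can seek to
#    a chunk boundary and start decompressing there.
CHUNK = 64 * 1024
offsets, blob = [], b""
for i in range(0, len(data), CHUNK):
    offsets.append(len(blob))                      # the "index file"
    blob += gzip.compress(data[i:i + CHUNK])
offsets.append(len(blob))

k = 2                                              # any chunk, not just the first
piece = gzip.decompress(blob[offsets[k]:offsets[k + 1]])
assert piece == data[k * CHUNK:(k + 1) * CHUNK]
```

The side index is exactly what hadoop-lzo's indexer produces for real LZO files: without it the file is one opaque stream; with it, each compressed block becomes an independent split point.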
gzip is not splittable, so only one map task processes it:
job_201411101612_0397 NORMAL hadoop word count 100.00% 1 0
but for an HBase HFile with Snappy compression there is more than one mapper:
hadoop@host-08:/usr/local/hadoop/hadoop-1.0.3$ hbase hfile -s -f /hbase/archive/f63235f4a6d84c84722f82ffd8122206/fml/b7e2701a60764f9a940912743b55d4e0
15/10/26 17:51:56 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
15/10/26 17:51:56 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 3.7g
15/10/26 17:51:57 WARN snappy.LoadSnappy: Snappy native library is available
job_201411101612_0396 NORMAL hadoop word count 100.00% 51 51
so you can treat an HFile (with Snappy compression) much like a plain file: Snappy only compresses the key/value data bytes streamed into it, block by block, instead of producing a real standalone *.snappy file.
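That per-block layout can be sketched as well. This is a toy model of the HFile idea, not HBase code: zlib stands in for Snappy, and the record framing (2-byte length prefixes) and block size are invented for illustration. Key/value pairs are buffered into fixed-size data blocks, and each block is compressed on its own as it is written out.

```python
import zlib

records = [(f"row{i}".encode(), f"val{i}".encode()) for i in range(1000)]

# Serialize key/value pairs into ~4 KB data blocks, compressing each
# block independently -- the HFile approach, with zlib standing in for
# Snappy. The framing (2-byte big-endian length prefixes) is made up.
blocks, buf = [], b""
for k, v in records:
    buf += len(k).to_bytes(2, "big") + k + len(v).to_bytes(2, "big") + v
    if len(buf) >= 4096:
        blocks.append(zlib.compress(buf))
        buf = b""
if buf:
    blocks.append(zlib.compress(buf))

hfile_body = b"".join(blocks)
# The concatenation is not one valid compressed stream (no *.snappy-style
# whole-file format), but each block decompresses independently:
first = zlib.decompress(blocks[0])     # starts with the "row0"/"val0" record
```

Because every block is self-contained, a reader holding the block offsets can start at any block, which is why more than one mapper can work on the file.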