Wow, LZ4 is fast!
I’ve been doing some experiments with LZ4 recently and I must admit that I am truly impressed. For those not familiar with LZ4, it is a compression format from the LZ77 family. Compared to other similar algorithms (such as Google’s Snappy), LZ4’s file format does not allow for very high compression ratios since:
- you cannot reference sequences that are more than 64 KB backwards in the stream,
- it encodes lengths with a scheme that requires 1 + floor(n / 255) bytes to store an integer n, instead of the 1 + floor(log(n) / log(2^7)) bytes that variable-length encoding would require.
This might sound like a lot of lost space, but fortunately things are not that bad: there are generally plenty of opportunities to find repeated sequences in a 64 KB block, and unless you are working with trivial inputs, you very rarely need to encode lengths greater than 15. In case you still doubt LZ4's ability to achieve high compression ratios, the original implementation includes a high compression algorithm that can easily achieve a 40% compression ratio on common ASCII text.
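To put rough numbers on the two length encodings compared above, here is a minimal sketch of the arithmetic. The helper names are mine, and the LZ4 cost deliberately ignores the 4-bit nibble that already covers small lengths:

```java
public class LengthEncodingCost {

    // LZ4-style length encoding: each additional byte can add up to 255,
    // so storing an integer n costs roughly 1 + floor(n / 255) bytes.
    static int lz4LengthBytes(int n) {
        return 1 + n / 255;
    }

    // 7-bit variable-length encoding: each byte carries 7 payload bits,
    // i.e. 1 + floor(log(n) / log(2^7)) bytes.
    static int varintLengthBytes(int n) {
        int bytes = 1;
        while ((n >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 15, 255, 1_000, 65_535 }) {
            System.out.printf("n=%6d  lz4=%3d bytes  varint=%d bytes%n",
                    n, lz4LengthBytes(n), varintLengthBytes(n));
        }
    }
}
```

For n = 65535 the difference is 258 bytes versus 3, which is why the format only stays compact while lengths remain small, and as noted above, in practice they almost always do.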
But this file format also allows you to write fast compressors and uncompressors, and this is really where LZ4 excels: compression and uncompression speed. To measure how much faster LZ4 is than other well-known compression algorithms, I wrote three Java implementations of LZ4:
- a JNI binding to the original C implementation (including the high compression algorithm),
- a pure Java port, using the standard API,
- a pure Java port that uses the sun.misc.Unsafe API to speed up (un)compression (a sketch of the trick follows this list).
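To give an idea of what the Unsafe variant buys, here is a minimal illustrative sketch (not the actual port) of the kind of raw memory access it relies on:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative only: raw 8-byte reads/writes on byte arrays via
// sun.misc.Unsafe, bypassing per-byte bounds checks.
public class UnsafeBytes {

    private static final Unsafe UNSAFE;
    private static final long BYTE_ARRAY_OFFSET;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            BYTE_ARRAY_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Read 8 bytes in one go rather than 8 checked array loads.
    static long readLong(byte[] src, int offset) {
        return UNSAFE.getLong(src, BYTE_ARRAY_OFFSET + offset);
    }

    // Write 8 bytes in one go; copying matches 8 bytes at a time
    // is the hot path of an LZ4 uncompressor.
    static void writeLong(byte[] dest, int offset, long value) {
        UNSAFE.putLong(dest, BYTE_ARRAY_OFFSET + offset, value);
    }
}
```

Reading and writing 8 bytes at a time instead of one bounds-checked array access per byte is where most of the speedup over the safe port comes from.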
Then I modified Ning’s JVM compressor benchmark (kudos to Ning for sharing it!) to add my compressors and ran the Calgary compression benchmark.
The results are very impressive:
- the JNI default compressor is the fastest one in all cases but one, and the JNI uncompressor is always the fastest one,
- even when compressed with the high compression algorithm, data is still very fast to uncompress, which is great for read-only data,
- the unsafe Java compressor/uncompressor is by far the fastest pure Java compressor/uncompressor,
- the safe Java compressor/uncompressor has comparable performance to some compressors/uncompressors that use the sun.misc.Unsafe API (such as LZF).
[Compression benchmark chart]
[Uncompression benchmark chart]
If you are curious about the compressors whose names start with “LZ4 chunks”: these are compressors implemented on top of the Java streams API, and they compress every 64 KB block of the input data separately.
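As an illustration of the idea, here is a hypothetical sketch of such a chunked stream; the class name is mine, and compressChunk is a placeholder for any LZ4 block compressor:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical sketch of a "LZ4 chunks" compressor: buffer the input
// and compress every 64 KB chunk of it independently.
public class ChunkedCompressorOutputStream extends FilterOutputStream {

    private static final int CHUNK_SIZE = 64 * 1024;

    private final byte[] buffer = new byte[CHUNK_SIZE];
    private int buffered = 0;

    public ChunkedCompressorOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        buffer[buffered++] = (byte) b;
        if (buffered == CHUNK_SIZE) {
            flushChunk();
        }
    }

    @Override
    public void flush() throws IOException {
        flushChunk();
        out.flush();
    }

    private void flushChunk() throws IOException {
        if (buffered == 0) {
            return;
        }
        byte[] compressed = compressChunk(buffer, buffered);
        // A real format would also write the compressed length of the
        // chunk so the uncompressor knows where each chunk ends.
        out.write(compressed);
        buffered = 0;
    }

    // Placeholder: a real implementation would call an LZ4 block
    // compressor here instead of copying the bytes through unchanged.
    private byte[] compressChunk(byte[] src, int len) {
        return Arrays.copyOf(src, len);
    }
}
```

Compressing chunks independently caps memory usage and keeps the streaming API simple, at the cost of never finding matches that span a chunk boundary.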
For the full Japex reports, see people.apache.org/~jpountz/lz4.