Binary optimization and related topics

litaocheng

Blog category: Erlang


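R12B introduced bit strings (bits), which can contain any number of bits. A bit string whose bit count is divisible by 8 is called a binary (bytes).
Using bits
With bits, handling certain protocols becomes simpler and more flexible.
Take the IS 683-PRL protocol as an example: its header contains a 5-bit field giving the number of consecutive 11-bit data items that follow.
Before bits existed, parsing such a packet was quite tedious: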

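%% Pre-R12B style: a binary is a whole number of bytes, so each 11-bit
%% channel is extracted by re-matching the entire binary.
decode(<<NumChans:5, _Pad:3, _Rest/binary>> = Bin) ->
	decode(Bin, NumChans, NumChans, []).
decode(_, _, 0, Acc) ->
	Acc;
decode(Bin, NumChans, N, Acc) ->
	SkipBef = (N - 1) * 11,                       % channels before channel N
	SkipAft = (NumChans - N) * 11,                % channels after channel N
	Pad = (8 - (NumChans * 11 + 5) rem 8) rem 8,  % padding up to a byte boundary
	<<_:5, _:SkipBef, V:11, _:SkipAft, _:Pad>> = Bin,
	decode(Bin, NumChans, N-1, [V | Acc]).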

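Quite complicated, isn't it? The reason is that a binary used to have the byte as its smallest unit of data, so we had to resort to all sorts of tricks to locate and extract the data we wanted.

Now that we have bits, the code above is easy to write.
First approach: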

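%% Rest must be /bits: after the 5-bit header the remainder is not byte-aligned.
decode(<<NumChans:5, Rest/bits>>) ->
	decode(Rest, NumChans, []).

%% The guard stops the loop after NumChans channels, before the padding.
decode(<<V:11, Rest/bits>>, N, Acc) when N > 0 ->
	decode(Rest, N-1, [V | Acc]);
decode(_, 0, Acc) ->
	lists:reverse(Acc).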

Much more concise, isn't it?
There is also a second approach:

%% binary-unit:11, not bits-unit:11 (see the Update at the end).
decode(<<NumChans:5, Chans:NumChans/binary-unit:11, _/bits>>) ->
	[Chan || <<Chan:11>> <= Chans].
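As a quick sanity check, the module below puts both bit-syntax decoders side by side with a hypothetical encode/1 helper (made up for this sketch, not an IS 683-PRL library function) that packs a 5-bit count, the 11-bit channel numbers, and zero padding up to a byte boundary. It assumes R12B or later and uses the binary-unit:11 form discussed in the Update at the end of this post.

```erlang
-module(prl_demo).
-export([encode/1, decode1/1, decode2/1]).

%% Hypothetical helper: 5-bit count, then 11-bit channels, then zero
%% padding to a whole byte. Assumes fewer than 32 channels, each < 2048.
encode(Chans) when length(Chans) < 32 ->
    N = length(Chans),
    Body = << <<C:11>> || C <- Chans >>,
    Bits = <<N:5, Body/bits>>,
    Pad = (8 - bit_size(Bits) rem 8) rem 8,
    <<Bits/bits, 0:Pad>>.

%% First approach: walk the bit string 11 bits at a time.
decode1(<<NumChans:5, Rest/bits>>) ->
    decode1(Rest, NumChans, []).

decode1(<<Chan:11, Rest/bits>>, N, Acc) when N > 0 ->
    decode1(Rest, N - 1, [Chan | Acc]);
decode1(_, 0, Acc) ->
    lists:reverse(Acc).

%% Second approach: one segment takes NumChans * 11 bits, and a binary
%% generator splits it into 11-bit integers.
decode2(<<NumChans:5, Chans:NumChans/binary-unit:11, _/bits>>) ->
    [Chan || <<Chan:11>> <= Chans].
```

Both decoders ignore the trailing padding: decode1 stops once N reaches 0, and decode2 discards it with _/bits.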

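Creating binaries
When Erlang creates a binary Bin1 as the result of an append operation, it reserves extra space (the underlying binary object is allocated at twice the size of the data or 256 bytes, whichever is larger). When we then keep appending Bin2 to the end, Bin2 is simply written into the reserved space: no reallocation is needed and Bin1's data is not copied, which makes repeated appending very efficient.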
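A minimal sketch of the access pattern that profits from the reserved space: an accumulator loop in which each step appends to the binary produced by the previous append. The module append_demo and build/1 are hypothetical names; whether copying is actually avoided is a runtime detail that the code itself cannot observe.

```erlang
-module(append_demo).
-export([build/1]).

%% Builds <<1, 2, ..., N>> one byte at a time. Each step appends to the
%% accumulator returned by the previous step, which is the case the
%% runtime can serve from the reserved space. N is limited to 255 so
%% every value fits in one byte.
build(N) when N >= 0, N =< 255 ->
    build(1, N, <<>>).

build(I, N, Acc) when I > N ->
    Acc;
build(I, N, Acc) ->
    build(I + 1, N, <<Acc/binary, I>>).
```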
For example, suppose Bits holds 1000 bits of data plus 600 bits of unused space. The following expression:

NewBits = <<Bits/bits, 12:32>>,

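NewBits simply points at the storage of Bits, with the 32 bits of new data written into the spare space: NewBits now has 1032 bits of data and 568 unused bits, while Bits itself still sees only its original data.
Next you write: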

NewBits2 = <<Bits/bits, 12:64>>,

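This appends to Bits again, but the space right after Bits is already occupied by the data NewBits appended; writing there would clobber NewBits, so NewBits2 cannot reuse it. No optimization is possible in this case: the runtime has to create a new binary and copy the contents of Bits into it.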
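The scenario above can be reproduced as follows (hypothetical module; Bits here is built by appending 125 bytes, i.e. 1000 bits, and how much spare capacity it actually carries is up to the runtime). Reuse versus copying is invisible from Erlang code; what the sketch checks is that appending to Bits twice never corrupts NewBits.

```erlang
-module(shared_append_demo).
-export([demo/0]).

demo() ->
    %% 125 bytes = 1000 bits, built by appends so it may have spare space.
    Bits = lists:foldl(fun(I, Acc) -> <<Acc/binary, I>> end,
                       <<>>, lists:seq(1, 125)),
    NewBits = <<Bits/bits, 12:32>>,    % may be written into the spare space
    NewBits2 = <<Bits/bits, 12:64>>,   % appends to Bits again: needs a copy
    %% Each result still holds exactly what was appended to it.
    <<_:1000/bits, 12:32>> = NewBits,
    <<_:1000/bits, 12:64>> = NewBits2,
    {bit_size(Bits), bit_size(NewBits), bit_size(NewBits2)}.
```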

Binary matching
How do you write code that benefits from the binary-matching optimization?
1. Memorize the following code skeleton:

f(<<Pattern1,...,Rest/bits>>,...) ->
  ... % Rest is not used here
  f(Rest,...);
f(<<Pattern2,...,Rest/bits>>,...) ->
  ... % Rest is not used here
  f(Rest,...);
...
f(<<>>, ...) ->
  ReturnValue

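In other words, in a clause that matches a binary, never return or otherwise use the Rest binary; only pass it as an argument to the next call of the same function.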

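2. If several clauses of a function match binaries, those clauses must be consecutive (an example follows below).
Code written this way gets the most out of the binary optimization: from the first call until the function returns, only a single match context is created and no sub binaries are created at all.
The key point is that Rest is used for nothing else; it is handed straight to the recursive call.
Note: match contexts and sub binaries are both kinds of binary. Internally there are four kinds of binary; the other two are refc binaries and heap binaries. For details see http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#4. A match context is only created during binary matching and is more efficient than a sub binary.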
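A concrete instance of the skeleton, as a sketch: the hypothetical count_byte/2 below counts how often a byte value occurs in a binary. The binary is the first argument, the binary-matching clauses are consecutive, and Rest is only passed on to the recursive call. Compiling it with bin_opt_info is the way to confirm what the compiler actually does with it.

```erlang
-module(count_demo).
-export([count_byte/2]).

count_byte(Bin, Byte) ->
    count_byte(Bin, Byte, 0).

%% Byte appears twice in the head, so the first clause only matches when
%% the leading byte equals Byte.
count_byte(<<Byte, Rest/binary>>, Byte, N) ->
    count_byte(Rest, Byte, N + 1);
count_byte(<<_, Rest/binary>>, Byte, N) ->
    count_byte(Rest, Byte, N);
count_byte(<<>>, _Byte, N) ->
    N.
```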

How can I check whether my code gets the binary optimization?
Add the bin_opt_info option, either on the command line or in the -compile attribute of the source file:

erlc +bin_opt_info my.erl

or

%% my.erl
-module(my).
-compile([bin_opt_info]).

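Binary matching optimization examples

Using the bin_opt_info compiler option, let's look at a few examples. The compiler's messages are included in the code as comments so you can see why each function was not optimized.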

1. Not optimized because of the argument order: the binary match should come first.

non_opt_eq([H|T1], <<H,T2/binary>>) ->
        %% INFO: matching anything else but a plain variable to
        %%    the left of binary pattern will prevent delayed
        %%    sub binary optimization;
        %%    SUGGEST changing argument order
        %% NOT OPTIMIZED: called function non_opt_eq/2 does not
        %%    begin with a suitable binary matching instruction
    non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
    false;
non_opt_eq([], <<>>) ->
    true.

Optimized:

opt_eq(<<H,T1/binary>>, [H|T2]) ->
       opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
    false;
opt_eq(<<>>, []) ->
    true.

2. Not optimized: the extra match on the whole argument (= Bad) in the clause head prevents the optimization.

not_opt_sum1(<<A, Rest/binary>> = Bad, Acc) ->
	%Warning: NOT OPTIMIZED: called function not_opt_sum1/2
        %does not begin with a suitable binary matching instruction
	not_opt_sum1(Rest, A + Acc);
not_opt_sum1(<<>>, Acc) ->
	Acc.

Optimized:
Remove Bad.
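For completeness, the fixed version of example 2 is simply the same function without the = Bad alias (renamed opt_sum1 here for illustration):

```erlang
-module(sum_demo1).
-export([opt_sum1/2]).

%% Same loop as not_opt_sum1/2, but the first clause now starts with a
%% plain binary match instead of also binding the whole argument.
opt_sum1(<<A, Rest/binary>>, Acc) ->
    opt_sum1(Rest, A + Acc);
opt_sum1(<<>>, Acc) ->
    Acc.
```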

3. Not optimized because the sub binary is used: bit_size(Rest) obviously uses Rest.

not_opt_sum2(<<A, Rest/binary>>, Acc) ->
	bit_size(Rest), %Warning: NOT OPTIMIZED: sub binary is used or returned
	not_opt_sum2(Rest, A + Acc);
not_opt_sum2(<<>>, Acc) ->
	Acc.

Optimized:
Remove the bit_size(Rest) expression.
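Without bit_size(Rest) the loop reduces to the same shape as in example 2. If the size really is needed, one option (a sketch, with a hypothetical sum_and_size/1 wrapper) is to take it from the whole binary before entering the loop, so that no matching clause touches Rest:

```erlang
-module(sum_demo2).
-export([sum_and_size/1]).

%% The size is computed once from the original binary, outside the loop.
sum_and_size(Bin) when is_binary(Bin) ->
    {byte_size(Bin), opt_sum2(Bin, 0)}.

opt_sum2(<<A, Rest/binary>>, Acc) ->
    opt_sum2(Rest, A + Acc);
opt_sum2(<<>>, Acc) ->
    Acc.
```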

4. Not optimized because the sub binary is returned: the first clause returns T.

not_opt_zero(<<0, T/binary>>) ->
	T; %Warning: NOT OPTIMIZED: sub binary is used or returned
not_opt_zero(<<_, T/binary>>) ->
	not_opt_zero(T).

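Optimized:

%% Call as opt_zero(Bin, not_found). The second clause must also require
%% not_found; otherwise it swallows the bytes after the zero and the
%% found clause is never reached.
opt_zero(<<0, T/binary>>, not_found) ->
	opt_zero(T, found);
opt_zero(<<_, T/binary>>, not_found) ->
	opt_zero(T, not_found);
opt_zero(T, found) ->
	T.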

5. Not optimized because the binary-matching clauses are not consecutive (this example is "borrowed" from avindev's blog).

extract_str_end_with_tag(Data, Tag) ->
     extract_str_end_with_tag2(Data, <<>>, Tag, not_found).  

extract_str_end_with_tag2(<<Tag, T/binary>>, Buffer, Tag, _) ->
	 extract_str_end_with_tag2(T, Buffer, Tag, found);
extract_str_end_with_tag2(<<B, T/binary>>, Buffer, Tag, not_found) ->
     extract_str_end_with_tag2(T, <<Buffer/binary, B>>, Tag, not_found);
extract_str_end_with_tag2(Rest, Buffer, _, found) ->
	 {found, Buffer, size(Buffer), Rest};
	 % Warning: INFO: non-consecutive clauses that
	 %match binaries will prevent delayed sub binary optimization
extract_str_end_with_tag2(<<>>, Buffer, _Tag, _) ->
     {not_found, Buffer}.

Optimized:

extract_str_end_with_tag(Data, Tag) ->
	extract_str_end_with_tag2(Data, <<>>, Tag, not_found).

%% All binary-matching clauses come first. The <<>> clause is restricted to
%% not_found so that a tag at the very end of Data still yields {found, ...}.
extract_str_end_with_tag2(<<Tag, T/binary>>, Buffer, Tag, _) ->
	extract_str_end_with_tag2(T, Buffer, Tag, found);
extract_str_end_with_tag2(<<B, T/binary>>, Buffer, Tag, not_found) ->
	extract_str_end_with_tag2(T, <<Buffer/binary, B>>, Tag, not_found);
extract_str_end_with_tag2(<<>>, Buffer, _Tag, not_found) ->
	{not_found, Buffer};
extract_str_end_with_tag2(Rest, Buffer, _, found) ->
	{found, Buffer, size(Buffer), Rest}.

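In practice, optimization is not something to do at the very start; getting the project implemented correctly matters most. But if we know a few optimization techniques and can make the code more efficient along the way, why not?
Also, don't treat bin_opt_info's messages as gospel and twist the program's natural logic just to chase OPTIMIZED.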

Notes:
I ran into two problems:
1. decode(<<NumChans:5, Chans:NumChans/bits-unit:11, _/bits>>) ->
[Chan || <<Chan:11>> <= Chans].
does not compile on R12B-3.
2. avindev's blog reports what may be a bug in the binary optimization: http://avindev.iteye.com/blog/208927

References:
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#4
http://www.erlang.se/euc/07/papers/1700Gustafsson.pdf

I'm also very glad to have met mryufeng, who helped me a great deal!

Update:
1. decode(<<NumChans:5, Chans:NumChans/bits-unit:11, _/bits>>) ->
[Chan || <<Chan:11>> <= Chans].
The code itself is wrong: I had misunderstood binaries. The correct version is:
decode(<<NumChans:5, Chans:NumChans/binary-unit:11, _/bits>>) ->
[Chan || <<Chan:11>> <= Chans].

The unit of bits (bitstring) is fixed at 1, but we specified 11, so the compiler reports: bit type mismatch (unit) between 11 and 8. Changing bits to binary fixes it.
Note: the unit of bits (bitstring) is always 1 and that of bytes is always 8, while for binary the unit can be anything from 1 to 256 (default 8).
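A small sketch of what the unit does, assuming R12B or later (the module unit_demo is hypothetical): with binary-unit:11, a size of 2 selects 2 * 11 = 22 bits, which need not be a whole number of bytes.

```erlang
-module(unit_demo).
-export([split/1]).

%% Size 2 with unit 11 takes 22 bits; the rest goes to Tail. Only the
%% binary type accepts a non-default unit like this.
split(<<Two:2/binary-unit:11, Tail/bits>>) ->
    {bit_size(Two), bit_size(Tail)}.
```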


2009-01-13 10:37