Hadoop Benchmark Testing

rockkyle

Blog category: HADOOP

Step 1: Data preparation. Two data sets are needed: one in key-value form and one in non-key-value form.

For the key-value data, I wrote a Python script:

import random
import string

# NUM_LINES sets how many lines of data are generated
NUM_LINES = 20000000

letters = list(string.ascii_lowercase)
digits = list(string.digits)

# Open in text mode so the formatted strings can be written directly
with open('testdata.txt', 'w') as f:
    for _ in range(NUM_LINES):
        # Key: 5 distinct random letters; value: 5 distinct random digits
        astr = ''.join(random.sample(letters, 5))
        bstr = ''.join(random.sample(digits, 5))
        f.write("%s\t%s\n" % (astr, bstr))
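Before generating the full 20 million lines, the same logic can be wrapped in a function and tried on a handful of rows. This is only a minimal sketch: the names make_line and generate, and the seed argument, are illustrative and not part of the original script.

```python
import random
import string


def make_line(rng):
    """Return one 'key<TAB>value' line: 5 distinct letters, 5 distinct digits."""
    key = ''.join(rng.sample(string.ascii_lowercase, 5))
    value = ''.join(rng.sample(string.digits, 5))
    return "%s\t%s\n" % (key, value)


def generate(num_lines, seed=None):
    """Yield num_lines lines; a fixed seed makes the sample reproducible."""
    rng = random.Random(seed)
    for _ in range(num_lines):
        yield make_line(rng)


if __name__ == "__main__":
    # Print three sample rows instead of writing the full file.
    for line in generate(3, seed=1):
        print(line, end='')
```

Checking a few rows this way confirms the tab-separated layout before committing to a long generation run.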

Import the data into HDFS:

hadoop fs -put testdata.txt /test/input/

The other data set is generated by randomwriter, which ships in hadoop-examples.jar:

cd /usr/lib/hadoop/

hadoop jar hadoop-examples.jar randomwriter /test/input1/

Step 2: Run the tests

MRReliabilityTest:

hadoop jar hadoop-test.jar MRReliabilityTest -libjars hadoop-examples.jar

loadgen:

Usage: [-m <maps>] [-r <reduces>]
       [-keepmap <percent>] [-keepred <percent>]
       [-indir <path>] [-outdir <path>]
       [-inFormat[Indirect] <InputFormat>] [-outFormat <OutputFormat>]
       [-outKey <WritableComparable>] [-outValue <Writable>]

Set the parameters to suit your situation, for example:

hadoop jar hadoop-test.jar loadgen -m 6 -r 3 -indir /test/input/ -outdir /test/output/
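When trying several map and reduce counts, it can help to assemble the loadgen command from parameters rather than retyping it. The sketch below uses only flags listed in the usage text above; the function name loadgen_cmd and the default jar name are assumptions made for illustration.

```python
import shlex


def loadgen_cmd(indir, outdir, maps=None, reduces=None, jar="hadoop-test.jar"):
    """Build the argument list for a loadgen run (nothing is executed here)."""
    cmd = ["hadoop", "jar", jar, "loadgen"]
    if maps is not None:
        cmd += ["-m", str(maps)]
    if reduces is not None:
        cmd += ["-r", str(reduces)]
    cmd += ["-indir", indir, "-outdir", outdir]
    return cmd


if __name__ == "__main__":
    cmd = loadgen_cmd("/test/input/", "/test/output/", maps=6, reduces=3)
    # Print a shell-ready command; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list avoids quoting problems if it is later handed to subprocess.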

mapredtest:

Usage: TestMapRed <range> <counts>

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar mapredtest 2 10

testarrayfile:

Usage: TestArrayFile [-count N] [-nocreate] [-nocheck] file

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testarrayfile -count 4 /test/input/testdata.txt

testsequencefile:

Usage: SequenceFile [-count N] [-seed #] [-check] [-compressType <NONE|RECORD|BLOCK>] -codec <compressionCodec> [[-rwonly] | {[-megabytes M] [-factor F] [-nocreate] [-fast] [-merge]}] file

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testsequencefile -count 4 -check -fast /test/input/testdata.txt

testsetfile:

Usage: TestSetFile [-count N] [-nocreate] [-nocheck] [-compress type] file

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testsetfile -count 4 /test/input/testdata.txt

threadedmapbench:

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar threadedmapbench

testfilesystem:

Usage: TestFileSystem -files N -megaBytes M [-noread] [-nowrite] [-noseek] [-fastcheck]

hadoop jar hadoop-test-1.2.0.1.3.0.0-107.jar testfilesystem -files 1 -megaBytes 1000

testmapredsort:

sortvalidate [-m <maps>] [-r <reduces>] [-deep] -sortInput <sort-input-dir> -sortOutput <sort-output-dir>

hadoop jar hadoop-test.jar testmapredsort -m 10 -r 5 -sortInput /test/input/ -sortOutput /test/output

testbigmapoutput:

BigMapOutput -input <input-dir> -output <output-dir> [-create <filesize in MB>]

hadoop jar hadoop-test.jar testbigmapoutput -input /test/input1/ -output /test/output1/

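TestDFSIO: benchmarking HDFS

Run the write test first, then the read test, because the read test reads the files that the write test creates.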

Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]

Write test:

Use 10 map tasks to write 10 files of 1000 MB each.

hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile /tmp/TestDFSIO_log.txt

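At the end of the run, the results are printed to the console and recorded in /tmp/TestDFSIO_log.txt.

By default, the test data is written under the /benchmarks/TestDFSIO directory.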
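Before running the write test, it is worth checking that HDFS has room for the data. The following is a rough estimate only: it assumes the HDFS default replication factor of 3 (your cluster's dfs.replication may differ) and uses the -nrFiles and -fileSize values from the write command; the function name is illustrative.

```python
def dfsio_footprint_mb(nr_files, file_size_mb, replication=3):
    """Return (logical MB written, raw MB consumed across all replicas)."""
    logical = nr_files * file_size_mb
    return logical, logical * replication


if __name__ == "__main__":
    logical, raw = dfsio_footprint_mb(10, 1000)
    # 10 files x 1000 MB = 10000 MB logical, 30000 MB with 3 replicas
    print("logical: %d MB, raw with replication: %d MB" % (logical, raw))
```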

Read test:

hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -read -nrFiles 10 -fileSize 1000 -resFile /tmp/TestDFSIO_log.txt

Clean up the test data:

hadoop jar $HADOOP_HOME/hadoop-test-1.2.0.1.3.0.0-107.jar TestDFSIO -clean
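The write, read, clean order can be scripted so the steps always run in sequence. This is a minimal sketch: dfsio_plan and run_plan are hypothetical helper names, the fallback path /usr/lib/hadoop is an assumption taken from the cd command earlier, and -resFile is passed explicitly as listed in the usage text. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess


def dfsio_plan(jar, nr_files, file_size_mb, res_file):
    """Return the TestDFSIO commands in the order they should run."""
    base = ["hadoop", "jar", jar, "TestDFSIO"]
    sizing = ["-nrFiles", str(nr_files), "-fileSize", str(file_size_mb),
              "-resFile", res_file]
    return [
        base + ["-write"] + sizing,   # the write test creates the files
        base + ["-read"] + sizing,    # the read test reads those files back
        base + ["-clean"],            # remove the benchmark data
    ]


def run_plan(plan, dry_run=True):
    """Print each command in order; execute it too when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed fallback location; adjust to where your Hadoop is installed.
    hadoop_home = os.environ.get("HADOOP_HOME", "/usr/lib/hadoop")
    jar = os.path.join(hadoop_home, "hadoop-test-1.2.0.1.3.0.0-107.jar")
    run_plan(dfsio_plan(jar, 10, 1000, "/tmp/TestDFSIO_log.txt"))
```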

NameNode benchmark:

Use 12 mappers and 6 reducers to create 1000 files:

hadoop jar hadoop-test.jar nnbench -operation create_write -maps 12 -reduces 6 \
  -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 \
  -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`

MapReduce benchmark:

mrbench runs a small job many times over, to check whether small jobs on the cluster run repeatably and efficiently.

Run a small job 50 times:

hadoop jar hadoop-test.jar mrbench -numRuns 50

testipc and testrpc:

hadoop jar hadoop-test.jar testipc

hadoop jar hadoop-test.jar testrpc

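PS: Choose and tune the command parameters according to your hardware environment.

Fixes for some common errors:

Output directory already exists: delete the target directory, then rerun the command.

Insufficient Java heap size: raise the relevant parameter, or configure more map tasks and reduce tasks before running the job.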
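For the "output directory already exists" error, the cleanup can be folded into the rerun. The sketch below assumes the Hadoop 1.x shell, where hadoop fs -rmr deletes a directory recursively (newer releases use hadoop fs -rm -r instead). The helper rerun_with_clean_output is hypothetical and defaults to a dry run, so nothing is deleted unless dry_run=False is passed.

```python
import subprocess


def rerun_with_clean_output(job_cmd, outdir, dry_run=True):
    """Delete outdir on HDFS, then run job_cmd; return the commands used."""
    steps = [["hadoop", "fs", "-rmr", outdir], job_cmd]
    if not dry_run:
        # The delete may fail if outdir does not exist yet; that is fine.
        subprocess.run(steps[0], check=False)
        subprocess.run(steps[1], check=True)
    return steps


if __name__ == "__main__":
    job = ["hadoop", "jar", "hadoop-test.jar", "loadgen",
           "-m", "6", "-r", "3",
           "-indir", "/test/input/", "-outdir", "/test/output/"]
    for step in rerun_with_clean_output(job, "/test/output/"):
        print(" ".join(step))
```

Double-check the path before disabling the dry run, since the delete is recursive.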


2013-10-31 20:04