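Use the benchmark tools bundled with Hadoop to test cluster performance (a single-node installation can be tested the same way). I am using hadoop-3.1.3, and the test jars are: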
${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar
The following tests use ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
# Run it without arguments to list the available programs
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
gsleep: A sleep job whose mappers create 1MB buffer for every record.
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode w/ MR.
nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
timelineperformance: A job that launches mappers to test timeline service performance.
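In plain terms:
DFSCIOTest: distributed I/O benchmark for libhdfs.
DistributedFSCheck: distributed consistency check of the file system.
JHLogAnalyzer: analyzer for job history logs.
MRReliabilityTest: tests the reliability of the MapReduce framework by injecting faults and failures.
NNdataGenerator: generates the data used by NNloadGenerator.
NNloadGenerator: generates load on the NameNode with the NN load generator, without MapReduce.
NNloadGeneratorMR: generates load on the NameNode with the NN load generator running as a MapReduce job.
NNstructureGenerator: generates the structure used by NNdataGenerator.
SliveTest: HDFS stress test with live data verification.
TestDFSIO: distributed I/O benchmark.
fail: a job that always fails.
filebench: benchmarks SequenceFile(Input|Output)Format (block-compressed, record-compressed and uncompressed) and Text(Input|Output)Format (compressed and uncompressed).
gsleep: a sleep job whose mappers allocate a 1MB buffer for every record.
largesorter: large-sort tester.
loadgen: generic MapReduce load generator.
mapredtest: a MapReduce sanity check.
minicluster: HDFS and MapReduce cluster running in a single process.
mrbench: a MapReduce benchmark that creates many small jobs.
nnbench: a benchmark that stresses the NameNode with MapReduce ("w/" = with).
nnbenchWithoutMR: a benchmark that stresses the NameNode without MapReduce ("w/o" = without).
sleep: a job whose map and reduce tasks just sleep.
testbigmapoutput: a MapReduce program that runs an identity map/reduce over a very large non-splittable file.
testfilesystem: FileSystem read/write test.
testmapredsort: a MapReduce program that validates the framework's sort.
testsequencefile: test for flat files of binary key/value pairs.
testsequencefileinputformat: test for the sequence file input format.
testtextinputformat: test for the text input format.
threadedmapbench: compares the performance of maps that spill several times with maps that spill once.
timelineperformance: a job that launches mappers to test Timeline Service performance.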
1 TestDFSIO
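TestDFSIO measures HDFS I/O performance. It uses a MapReduce job to read or write files concurrently: each map task reads or writes one file, the map output collects statistics about the file it processed, and the reducer accumulates these statistics into a summary. To see its usage: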
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO
2021-08-16 11:40:46,857 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage:
TestDFSIO [genericOptions]
-read [-random |
-backward |
-skip [-skipSize Size]] |
-write |
-append |
-truncate |
-clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-storagePolicy storagePolicyName] [-erasureCodePolicy erasureCodePolicyName]
1.1 Test HDFS write performance
Test: write 10 files of 10KB each to the HDFS cluster:
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
TestDFSIO \
-write \
-nrFiles 10 \
-size 10KB \
-resFile /home/hadoop/tmp/TestDFSIO.log
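Checking the result shows 10 generated files. Each one holds only 10KB of data even though the block size is 128MB: the block size is only an upper limit per block, so a small file occupies a single partially filled block and uses about 10KB per replica on disk, not 128MB.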
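To confirm this yourself, a small sketch like the one below lists the space used by the generated files and their block layout. It assumes the default baseDir /benchmarks/TestDFSIO shown in the clean-up log in 1.3, with the data under its io_data subdirectory, and an hdfs command on the PATH; HDFS_CMD is a hypothetical override added only so the commands can be dry-run (HDFS_CMD=echo).

```shell
#!/bin/sh
# Sketch: inspect the files written by "TestDFSIO -write".
# HDFS_CMD is a hypothetical override (e.g. HDFS_CMD=echo prints the commands).
HDFS_CMD=${HDFS_CMD:-hdfs}

dfsio_inspect() {
  # Assumed default data directory of TestDFSIO
  data_dir=${1:-/benchmarks/TestDFSIO/io_data}
  # Logical size and raw space consumed by all replicas
  $HDFS_CMD dfs -du -s -h "$data_dir" || return 1
  # Per-file block report: each small file sits in one partially filled block
  $HDFS_CMD fsck "$data_dir" -files -blocks
}
```

Call it as dfsio_inspect, or pass another directory if you changed -baseDir.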
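1.2 Test HDFS read performance
Test: read 10 files of 128MB each from the HDFS cluster. -read reads the files left behind by the most recent -write run, so these files must first be written with -write -nrFiles 10 -size 128MB (reading the 10KB files from 1.1 gives a much smaller test):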
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
TestDFSIO \
-read \
-nrFiles 10 \
-size 128MB \
-resFile /home/hadoop/tmp/TestDFSIO.log
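Both runs append their summary to the local file given by -resFile. A minimal sketch for pulling the headline numbers out of that file, assuming the summary layout printed by Hadoop 3.x TestDFSIO (a "----- TestDFSIO ----- : <type>" header followed by "Throughput mb/sec:" and "Average IO rate mb/sec:" lines); check it against your own result file before relying on it:

```shell
#!/bin/sh
# Sketch: print "<test type> <throughput> <average IO rate>" (both MB/s)
# for every run recorded in a TestDFSIO result file.
# Assumes the Hadoop 3.x summary layout; verify against your own file.
dfsio_summary() {
  awk -F': ' '
    /----- TestDFSIO ----- :/  { type = $2 }   # start of a run: write, read, ...
    /Throughput mb\/sec:/      { tp = $2 }     # aggregate throughput
    /Average IO rate mb\/sec:/ { print type, tp, $2 }
  ' "$1"
}
```

For the runs above: dfsio_summary /home/hadoop/tmp/TestDFSIO.log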
1.3 Clean up the test data
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
TestDFSIO -clean
2021-08-16 15:45:25,751 INFO fs.TestDFSIO: TestDFSIO.1.8
2021-08-16 15:45:25,752 INFO fs.TestDFSIO: nrFiles = 10
2021-08-16 15:45:25,768 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
2021-08-16 15:45:25,768 INFO fs.TestDFSIO: bufferSize = 1000000
2021-08-16 15:45:25,768 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
2021-08-16 15:45:27,051 INFO fs.TestDFSIO: Cleaning up test files
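Because -read works on the files left by -write, it can help to run write, read and clean as one unit with identical parameters. A minimal sketch; HADOOP_CMD and TESTS_JAR are hypothetical variables introduced here (point TESTS_JAR at your own jar; HADOOP_CMD=echo prints the commands instead of running them):

```shell
#!/bin/sh
# Sketch: TestDFSIO write -> read -> clean with the same -nrFiles/-size.
# HADOOP_CMD and TESTS_JAR are hypothetical overrides for this example.
HADOOP_CMD=${HADOOP_CMD:-hadoop}
TESTS_JAR=${TESTS_JAR:-${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar}

dfsio_cycle() {
  nr_files=$1   # number of files (one map task per file)
  size=$2       # size of each file, e.g. 128MB
  res_file=$3   # local file the summaries are appended to
  for op in -write -read; do
    $HADOOP_CMD jar "$TESTS_JAR" TestDFSIO "$op" \
      -nrFiles "$nr_files" -size "$size" -resFile "$res_file" || return 1
  done
  # Remove the test data under /benchmarks/TestDFSIO
  $HADOOP_CMD jar "$TESTS_JAR" TestDFSIO -clean
}
```

Example: dfsio_cycle 10 128MB /home/hadoop/tmp/TestDFSIO.log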
2 mrbench
mrbench runs a small job many times in a row, to check whether small jobs run reproducibly and efficiently on the cluster. Use -help to see its options:
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
mrbench -help
MRBenchmark.0.0.2
Usage:
mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>]
[-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>]
[-numRuns <number of times to run the job, default is 1>]
[-maps <number of maps for each run, default is 2>]
[-reduces <number of reduces for each run, default is 1>]
[-inputLines <number of input lines to generate, default is 1>]
[-inputType <type of input to generate, one of ascending (default), descending, random>]
[-verbose]
2.1 Run a job 50 times (10 maps and 5 reduces per run, 10 input lines in descending order):
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
mrbench \
-numRuns 50 \
-maps 10 \
-reduces 5 \
-inputLines 10 \
-inputType descending
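To see how the number of maps affects small-job run time, the same benchmark can be repeated with different -maps values and the average times mrbench reports compared. A minimal sketch; the map counts you pass are up to you, and HADOOP_CMD and TESTS_JAR are hypothetical variables introduced for this sketch:

```shell
#!/bin/sh
# Sketch: repeat mrbench with several map counts.
# HADOOP_CMD and TESTS_JAR are hypothetical overrides for this example.
HADOOP_CMD=${HADOOP_CMD:-hadoop}
TESTS_JAR=${TESTS_JAR:-${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar}

mrbench_sweep() {
  runs=$1; shift          # -numRuns used for every setting
  for maps in "$@"; do    # remaining arguments: map counts to try
    echo "=== maps=$maps ==="
    $HADOOP_CMD jar "$TESTS_JAR" mrbench \
      -numRuns "$runs" -maps "$maps" -reduces 5 \
      -inputLines 10 -inputType descending || return 1
  done
}
```

Example: mrbench_sweep 50 2 5 10 runs the benchmark 50 times each with 2, 5 and 10 maps.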
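3 nnbench
nnbench load-tests the NameNode: it generates a large number of HDFS requests to put heavy pressure on the NameNode, and can simulate creating, reading, renaming and deleting files on HDFS. Use -help to see its options: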
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
nnbench -help
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
-operation <Available operations are create_write open_read rename delete. This option is mandatory>
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time. default is launch time + 2 mins. This is not mandatory>
-blockSize <Block size in bytes. default is 1. This is not mandatory>
-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
-numberOfFiles <number of files to create. default is 1. This is not mandatory>
-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
-baseDir <base DFS path. default is /benchmarks/NNBench. This is not mandatory>
-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
-help: Display the help statement
3.1 Create 1000 files using 10 mappers and 5 reducers
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
nnbench \
-operation create_write \
-maps 10 \
-reduces 5 \
-blockSize 1 \
-bytesToWrite 0 \
-numberOfFiles 1000 \
-replicationFactorPerFile 3 \
-readFileAfterOpen true \
-baseDir /benchmarks/NNBench-`hostname`
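According to the help text, open_read, rename and delete expect the files created by create_write, and -readFileAfterOpen only applies to open_read, so it has no effect in the create_write run above. A minimal sketch that runs the four operations in the order the help text lists them, against the same -baseDir and file count; each run also waits for the default -startTime (launch time + 2 minutes). HADOOP_CMD and TESTS_JAR are hypothetical variables introduced for this sketch:

```shell
#!/bin/sh
# Sketch: run every nnbench operation against the same set of files.
# HADOOP_CMD and TESTS_JAR are hypothetical overrides for this example.
HADOOP_CMD=${HADOOP_CMD:-hadoop}
TESTS_JAR=${TESTS_JAR:-${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar}

nnbench_all() {
  base_dir=$1   # must be identical for all operations
  for op in create_write open_read rename delete; do
    extra=""
    # -readFileAfterOpen is only meaningful for open_read
    if [ "$op" = open_read ]; then extra="-readFileAfterOpen true"; fi
    # $extra is left unquoted on purpose so it expands to zero or two words
    $HADOOP_CMD jar "$TESTS_JAR" nnbench -operation "$op" \
      -maps 10 -reduces 5 -numberOfFiles 1000 \
      -replicationFactorPerFile 3 -baseDir "$base_dir" $extra || return 1
  done
}
```

Example: nnbench_all /benchmarks/NNBench-$(hostname)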
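The following tests use ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar
4 TeraSort
TeraSort is a sorting program commonly used to benchmark Hadoop. With the bundled TeraSort program you can measure how different numbers of map and reduce tasks affect Hadoop performance. The test data is generated by the teragen program in the same jar, in sizes of 1G and 10G.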
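A TeraSort test consists of three steps:
- TeraGen generates random data
- TeraSort sorts the data
- TeraValidate checks whether the TeraSort output is sorted; if it finds problems, it writes the out-of-order keys to an output directory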
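TeraGen takes a number of rows rather than a size, and each row it generates is 100 bytes (a 10-byte key plus a 90-byte value). A small helper for turning a target size into a row count, assuming decimal units (1G = 10^9 bytes); the G/M suffix handling is a convenience added for this sketch. By the same arithmetic, the teragen 10000 used in 4.1 produces only about 1MB.

```shell
#!/bin/sh
# Sketch: convert a target data size into a TeraGen row count.
# Each TeraGen row is 100 bytes. Decimal suffixes are assumed: G = 10^9, M = 10^6.
teragen_rows() {
  size=$1
  case $size in
    *G) bytes=$(( ${size%G} * 1000000000 )) ;;
    *M) bytes=$(( ${size%M} * 1000000 )) ;;
    *)  bytes=$size ;;   # plain number of bytes
  esac
  echo $(( bytes / 100 ))
}
```

teragen_rows 1G gives 10000000 and teragen_rows 10G gives 100000000, the row counts for the 1G and 10G data sets.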
4.1 Generate random data with TeraGen
Write the result to /tmp/examples/terasort-input:
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
teragen 10000 /tmp/examples/terasort-input
4.2 Sort with TeraSort
Write the result to /tmp/examples/terasort-output:
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
terasort /tmp/examples/terasort-input /tmp/examples/terasort-output
4.3 Validate the sort with TeraValidate
If problems are found, the out-of-order keys are written to /tmp/examples/terasort-validate:
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
teravalidate /tmp/examples/terasort-output /tmp/examples/terasort-validate
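To compare different numbers of map and reduce tasks, the three steps can be wrapped with the task counts passed as generic -D options: mapreduce.job.maps sets how many maps TeraGen uses, and mapreduce.job.reduces sets the number of TeraSort reducers. A minimal sketch; HADOOP_CMD, EXAMPLES_JAR and the directory layout are hypothetical, and each round needs a fresh directory because MapReduce refuses to write into an existing output directory:

```shell
#!/bin/sh
# Sketch: one TeraGen -> TeraSort -> TeraValidate round with chosen task counts.
# HADOOP_CMD and EXAMPLES_JAR are hypothetical overrides for this example.
HADOOP_CMD=${HADOOP_CMD:-hadoop}
EXAMPLES_JAR=${EXAMPLES_JAR:-${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar}

terasort_round() {
  rows=$1; maps=$2; reduces=$3; dir=$4   # dir: HDFS directory that does not exist yet
  $HADOOP_CMD jar "$EXAMPLES_JAR" teragen -Dmapreduce.job.maps="$maps" \
    "$rows" "$dir/input" || return 1
  $HADOOP_CMD jar "$EXAMPLES_JAR" terasort -Dmapreduce.job.reduces="$reduces" \
    "$dir/input" "$dir/output" || return 1
  $HADOOP_CMD jar "$EXAMPLES_JAR" teravalidate "$dir/output" "$dir/validate"
}
```

Example: terasort_round 10000000 4 2 /tmp/examples/round-m4-r2 sorts about 1G of data with 4 maps and 2 reducers.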
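5 Benchmarking MapReduce with the sort program
5.1 Generate random data with randomwriter
By default, randomwriter runs 10 map tasks per node, and each map writes about 1GB of random binary data: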
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
randomwriter /tmp/examples/random-data
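On a small or single-node cluster, 10GB per node may be more than you need. RandomWriter reads its sizes from configuration properties that can be passed as -D generic options; the names mapreduce.randomwriter.bytespermap and mapreduce.randomwriter.mapsperhost are taken from the RandomWriter example's source and should be checked against your Hadoop version. HADOOP_CMD and EXAMPLES_JAR are hypothetical variables introduced for this sketch:

```shell
#!/bin/sh
# Sketch: run randomwriter with a smaller data volume.
# HADOOP_CMD and EXAMPLES_JAR are hypothetical overrides; verify the property
# names against the RandomWriter source of your Hadoop version.
HADOOP_CMD=${HADOOP_CMD:-hadoop}
EXAMPLES_JAR=${EXAMPLES_JAR:-${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar}

randomwriter_sized() {
  bytes_per_map=$1   # bytes written by each map (default about 1GB)
  maps_per_host=$2   # map tasks per node (default 10)
  out_dir=$3         # HDFS output directory; must not exist yet
  $HADOOP_CMD jar "$EXAMPLES_JAR" randomwriter \
    -Dmapreduce.randomwriter.bytespermap="$bytes_per_map" \
    -Dmapreduce.randomwriter.mapsperhost="$maps_per_host" \
    "$out_dir"
}
```

Example: randomwriter_sized 104857600 2 /tmp/examples/random-data-small writes 100MB per map with 2 maps per node.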
5.2 Sort with sort
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar sort \
/tmp/examples/random-data /tmp/examples/sorted-data
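5.3 Validate the sort with testmapredsort
testmapredsort is not part of the examples jar; it is one of the programs in hadoop-mapreduce-client-jobclient-3.1.3-tests.jar (see the program list at the top), so run it from that jar:
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
testmapredsort \
-sortInput /tmp/examples/random-data \
-sortOutput /tmp/examples/sorted-data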
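The sort and its check can be chained so that the validation always runs against the matching input and output. The two steps come from different jars: sort from the examples jar and testmapredsort from the jobclient tests jar. A minimal sketch with hypothetical HADOOP_CMD, EXAMPLES_JAR and TESTS_JAR variables:

```shell
#!/bin/sh
# Sketch: sort random data, then validate the result.
# HADOOP_CMD, EXAMPLES_JAR and TESTS_JAR are hypothetical overrides for this example.
HADOOP_CMD=${HADOOP_CMD:-hadoop}
EXAMPLES_JAR=${EXAMPLES_JAR:-${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar}
TESTS_JAR=${TESTS_JAR:-${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar}

sort_and_check() {
  in_dir=$1; out_dir=$2   # out_dir must not exist yet
  $HADOOP_CMD jar "$EXAMPLES_JAR" sort "$in_dir" "$out_dir" || return 1
  $HADOOP_CMD jar "$TESTS_JAR" testmapredsort \
    -sortInput "$in_dir" -sortOutput "$out_dir"
}
```

Example: sort_and_check /tmp/examples/random-data /tmp/examples/sorted-data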