Hadoop [Environment Setup 05] [hadoop-3.1.3 single-node benchmarks: TestDFSIO + mrbench + nnbench + TeraSort + sort examples]

Summary: benchmarking a single-node hadoop-3.1.3 installation with the bundled TestDFSIO, mrbench, nnbench, TeraSort and sort tools.

The benchmark tool packages that ship with Hadoop can be used to test cluster performance (they work on a single-node installation as well). I am using hadoop-3.1.3; the test jars are:

${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar

The test jar used below is: ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar

# Running it without arguments prints the list of available programs
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  gsleep: A sleep job whose mappers create 1MB buffer for every record.
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode w/ MR.
  nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  timelineperformance: A job that launches mappers to test timeline service performance.


1. TestDFSIO

TestDFSIO measures HDFS I/O performance. It uses a MapReduce job to perform reads and writes in parallel: each map task reads or writes one file, the map output collects the per-file statistics, and the reduce accumulates them and produces the summary. View the usage:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO

2021-08-16 11:40:46,857 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: 
TestDFSIO [genericOptions] 
-read [-random | 
-backward | 
-skip [-skipSize Size]] | 
-write | 
-append | 
-truncate | 
-clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-storagePolicy storagePolicyName] [-erasureCodePolicy erasureCodePolicyName]

1.1 Testing HDFS write performance

Test content: write 10 files of 10 KB each to the HDFS cluster:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
TestDFSIO \
-write \
-nrFiles 10 \
-size 10KB \
-resFile /home/hadoop/tmp/TestDFSIO.log

Check the result: 10 files were generated. Although each file only contains 10 KB of data, the block size is still 128 MB.
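
To check what was written, you can list the data directory under the default baseDir (/benchmarks/TestDFSIO, as shown in the -clean log further below) and look at the local result file given by -resFile. The io_data sub-directory name is the usual TestDFSIO layout and is an assumption here:

# list the files TestDFSIO wrote (io_data under the default baseDir is assumed)
[root@tcloud mapreduce]# hdfs dfs -ls /benchmarks/TestDFSIO/io_data
# the summary (throughput, average IO rate, test exec time) is appended to the resFile
[root@tcloud mapreduce]# cat /home/hadoop/tmp/TestDFSIO.log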

1.2 Testing HDFS read performance

Test content: read 10 files of 128 MB from the HDFS cluster (-read operates on the files created by the previous -write run, so -nrFiles and -size should normally match what was written):

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
TestDFSIO \
-read \
-nrFiles 10 \
-size 128MB \
-resFile /home/hadoop/tmp/TestDFSIO.log
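
The read run appends its own summary to the same result file. A quick way to pull out the headline metrics, assuming the usual wording of the TestDFSIO summary lines:

# show only the headline metrics from the write and read runs
[root@tcloud mapreduce]# grep -E "Throughput|Average IO rate|Test exec time" /home/hadoop/tmp/TestDFSIO.log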

1.3 Cleaning up the test data

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
TestDFSIO -clean

2021-08-16 15:45:25,751 INFO fs.TestDFSIO: TestDFSIO.1.8
2021-08-16 15:45:25,752 INFO fs.TestDFSIO: nrFiles = 10
2021-08-16 15:45:25,768 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
2021-08-16 15:45:25,768 INFO fs.TestDFSIO: bufferSize = 1000000
2021-08-16 15:45:25,768 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
2021-08-16 15:45:27,051 INFO fs.TestDFSIO: Cleaning up test files
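
A quick sanity check that the cleanup removed the benchmark directory:

# /benchmarks/TestDFSIO should no longer be listed
[root@tcloud mapreduce]# hdfs dfs -ls /benchmarks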

2. mrbench

mrbench runs a small job repeatedly, to check whether small jobs run on the cluster repeatably and efficiently. Use -help to view the usage:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
mrbench -help

MRBenchmark.0.0.2
Usage: 
mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] 
[-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] 
[-numRuns <number of times to run the job, default is 1>] 
[-maps <number of maps for each run, default is 2>] 
[-reduces <number of reduces for each run, default is 1>] 
[-inputLines <number of input lines to generate, default is 1>] 
[-inputType <type of input to generate, one of ascending (default), descending, random>] 
[-verbose]

2.1 Test: run a job 50 times

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
mrbench \
-numRuns 50 \
-maps 10 \
-reduces 5 \
-inputLines 10 \
-inputType descending
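
mrbench prints a one-line summary at the end of the run (DataLines, Maps, Reduces and the average job time in milliseconds). To keep it for later comparison you can tee the output to a file; this is just a sketch, and the log path is arbitrary:

# save the full run, then pull out the summary
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
mrbench -numRuns 50 -maps 10 -reduces 5 -inputLines 10 -inputType descending 2>&1 | tee /home/hadoop/tmp/mrbench.log
[root@tcloud mapreduce]# grep -A 1 "DataLines" /home/hadoop/tmp/mrbench.log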

3. nnbench

nnbench load-tests the NameNode: it generates a large number of HDFS-related requests to put heavy pressure on the NameNode, and can simulate creating, reading, renaming and deleting files on HDFS. Use -help to view the usage:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
nnbench -help

NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
        -operation <Available operations are create_write open_read rename delete. This option is mandatory>
         * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
        -maps <number of maps. default is 1. This is not mandatory>
        -reduces <number of reduces. default is 1. This is not mandatory>
        -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time. default is launch time + 2 mins. This is not mandatory>
        -blockSize <Block size in bytes. default is 1. This is not mandatory>
        -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
        -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
        -numberOfFiles <number of files to create. default is 1. This is not mandatory>
        -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
        -baseDir <base DFS path. default is /benchmarks/NNBench. This is not mandatory>
        -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
        -help: Display the help statement

3.1 Test: create 1000 files with 10 mappers and 5 reducers

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
nnbench \
-operation create_write \
-maps 10 \
-reduces 5 \
-blockSize 1 \
-bytesToWrite 0 \
-numberOfFiles 1000 \
-replicationFactorPerFile 3 \
-readFileAfterOpen true \
-baseDir /benchmarks/NNBench-`hostname`
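
As the help text notes, the open_read, rename and delete operations expect the files created by create_write to already exist, so a follow-up read test against the same baseDir could look like this (a sketch; keep the parameters consistent with the create_write run):

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
nnbench \
-operation open_read \
-maps 10 \
-reduces 5 \
-numberOfFiles 1000 \
-readFileAfterOpen true \
-baseDir /benchmarks/NNBench-`hostname`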

The test jar used below is: ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar

4. TeraSort

TeraSort is an effective program for benchmarking Hadoop's sort performance. With the bundled TeraSort you can measure how different numbers of map and reduce tasks affect Hadoop's performance. The input data is generated by the teragen program, for example 1 GB or 10 GB.

A complete TeraSort test consists of three steps:

  1. TeraGen generates random data
  2. TeraSort sorts the data
  3. TeraValidate verifies that TeraSort's output is sorted; if any problems are found, the out-of-order keys are written to an output directory

4.1 TeraGen generates random data

Write the output to the directory /tmp/examples/terasort-input:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
teragen 10000 /tmp/examples/terasort-input
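
The first argument to teragen is the number of rows to generate, and each TeraSort row is 100 bytes, so 10000 rows is only about 1 MB. To produce the roughly 1 GB data set mentioned above you would generate 10,000,000 rows (a sketch; the output directory name is arbitrary):

# ~1 GB of input: 10,000,000 rows x 100 bytes
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
teragen 10000000 /tmp/examples/terasort-input-1g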
    

4.2 TeraSort sorts the data

Write the output to the directory /tmp/examples/terasort-output:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
terasort /tmp/examples/terasort-input /tmp/examples/terasort-output
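
Since the point of the test is to see how different map/reduce task counts affect performance, note that terasort accepts the generic -D options, so the number of reduce tasks can be varied per run (a sketch; 4 is an arbitrary value):

# run the same sort with 4 reduce tasks
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
terasort -D mapreduce.job.reduces=4 /tmp/examples/terasort-input /tmp/examples/terasort-output-r4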
    

4.3 TeraValidate validates the sort

If problems are detected, the out-of-order keys are written to the directory /tmp/examples/terasort-validate:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
teravalidate /tmp/examples/terasort-output /tmp/examples/terasort-validate
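
TeraValidate writes its report into the output directory; if the part file contains no error records the sort is considered valid. The exact report contents vary by version, so this is just a quick way to eyeball it:

# an empty or error-free report means TeraSort's output is in order
[root@tcloud mapreduce]# hdfs dfs -cat /tmp/examples/terasort-validate/part-*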
    

5. Evaluating MapReduce with the sort program

5.1 randomwriter generates random data

By default each node runs 10 map tasks, and each map produces roughly 1 GB of random binary data:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
randomwriter /tmp/examples/random-data
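
On a single node the default 10 maps of 1 GB each may be more data than you want. The volume can be reduced through randomwriter's configuration properties, assuming the standard property names of the bundled example (mapreduce.randomwriter.mapsperhost and mapreduce.randomwriter.bytespermap):

# a smaller run: 2 maps of ~100 MB each
[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
randomwriter \
-D mapreduce.randomwriter.mapsperhost=2 \
-D mapreduce.randomwriter.bytespermap=104857600 \
/tmp/examples/random-data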

5.2 Sorting with sort

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-examples-3.1.3.jar \
sort /tmp/examples/random-data /tmp/examples/sorted-data

5.3 testmapredsort validates the sort

Note that testmapredsort lives in the jobclient tests jar rather than the examples jar:

[root@tcloud mapreduce]# hadoop jar ./hadoop-mapreduce-client-jobclient-3.1.3-tests.jar \
testmapredsort \
-sortInput /tmp/examples/random-data \
-sortOutput /tmp/examples/sorted-data
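
Once the runs are finished, the example data can be removed so it does not linger in HDFS (paths as used above):

# remove the TeraSort and sort test data
[root@tcloud mapreduce]# hdfs dfs -rm -r /tmp/examples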