How to Benchmark a Hadoop Cluster
http://answers.oreilly.com/topic/460-how-to-benchmark-a-hadoop-cluster/
Is the cluster set up correctly? The best way to answer this question is empirically: run some jobs and confirm that you get the expected results. Benchmarks make good tests, as you also get numbers that you can compare with other clusters as a sanity check on whether your new cluster is performing roughly as expected. And you can tune a cluster using benchmark results to squeeze the best performance out of it. This is often done with monitoring systems in place, so you can see how resources are being used across the cluster.

To get the best results, you should run benchmarks on a cluster that is not being used by others. In practice, this is just before it is put into service and users start relying on it. Once users have periodically scheduled jobs on a cluster, it is generally impossible to find a time when the cluster is not being used (unless you arrange downtime with users), so you should run benchmarks to your satisfaction before this happens.

Experience has shown that most hardware failures for new systems are hard drive failures. By running I/O-intensive benchmarks, such as the ones described next, you can "burn in" the cluster before it goes live.

Hadoop comes with several benchmarks that you can run very easily with minimal setup cost. Benchmarks are packaged in the test JAR file, and you can get a list of them, with descriptions, by invoking the JAR file with no arguments:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar

Most of the benchmarks show usage instructions when invoked with no arguments. For example:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO
TestFDSIO.0.0.4
Usage: TestFDSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile
resultFileName] [-bufferSize Bytes]

TestDFSIO tests the I/O performance of HDFS. It does this by using a MapReduce job as a convenient way to read or write files in parallel. Each file is read or written in a separate map task, and the output of the map is used for collecting statistics relating to the file just processed. The statistics are accumulated in the reduce to produce a summary.

The following command writes 10 files of 1,000 MB each:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10
-fileSize 1000

At the end of the run, the results are written to the console and also recorded in a local file (which is appended to, so you can rerun the benchmark and not lose old results):

% cat TestDFSIO_results.log
----- TestDFSIO ----- : write
Date & time: Sun Apr 12 07:14:09 EDT 2009
Number of files: 10
Total MBytes processed: 10000
Throughput mb/sec: 7.796340865378244
Average IO rate mb/sec: 7.8862199783325195
IO rate std deviation: 0.9101254683525547
Test exec time sec: 163.387

The files are written under the /benchmarks/TestDFSIO directory by default (this can be changed by setting the test.build.data system property), in a directory called io_data.

To run a read benchmark, use the -read argument. Note that these files must already exist (having been written by TestDFSIO -write):

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -read -nrFiles 10
-fileSize 1000

Here are the results for a real run:

----- TestDFSIO ----- : read
Date & time: Sun Apr 12 07:24:28 EDT 2009
Number of files: 10
Total MBytes processed: 10000
Throughput mb/sec: 80.25553361904304
Average IO rate mb/sec: 98.6801528930664
IO rate std deviation: 36.63507598174921
Test exec time sec: 47.624

When you've finished benchmarking, you can delete all the generated files from HDFS using the -clean argument:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -clean

Hadoop also comes with a MapReduce program that does a partial sort of its input. It is very useful for benchmarking the whole MapReduce system, as the full input dataset is transferred through the shuffle. The three steps are: generate some random data, perform the sort, then validate the results.

First, we generate some random data using RandomWriter. It runs a MapReduce job with 10 maps per node, and each map generates (approximately) 10 GB of random binary data, with keys and values of various sizes. You can change these values if you like by setting the properties test.randomwriter.maps_per_host and test.randomwrite.bytes_per_map. There are also settings for the size ranges of the keys and values; see RandomWriter for details.

Here's how to invoke RandomWriter (found in the example JAR file, not the test one) to write its output to a directory called random-data:

% hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar randomwriter random-data

Next, we can run the Sort program:

% hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar sort random-data sorted-data

The overall execution time of the sort is the metric we are interested in, but it's instructive to watch the job's progress via the web UI (http://jobtracker-host:50030/), where you can get a feel for how long each phase of the job takes.

As a final sanity check, we validate that the data in sorted-data is, in fact, correctly sorted:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar testmapredsort -sortInput random-data \
-sortOutput sorted-data

This command runs the SortValidator program, which performs a series of checks on the unsorted and sorted data to determine whether the sort is accurate. It reports the outcome to the console at the end of its run:

SUCCESS! Validated the MapReduce framework's 'sort' successfully.

There are many more Hadoop benchmarks, but the following are widely used:

- MRBench (invoked with mrbench) runs a small job a number of times. It acts as a good counterpoint to sort, as it checks whether small job runs are responsive.
- NNBench (invoked with nnbench) is useful for load-testing namenode hardware.
- Gridmix is a suite of benchmarks designed to model a realistic cluster workload, by mimicking a variety of data-access patterns seen in practice. See src/benchmarks/gridmix2 in the distribution for further details.

For tuning, it is best to include a few jobs that are representative of the jobs that your users run, so your cluster is tuned for these and not just for the standard benchmarks. If this is your first Hadoop cluster and you don't have any user jobs yet, then Gridmix is a good substitute. When running your own jobs as benchmarks, you should select a dataset that you use each time you run them, to allow comparisons between runs. When you set up a new cluster, or upgrade a cluster, you will be able to use the same dataset to compare the performance with previous runs. [63]

[63] In a similar vein, PigMix is a set of benchmarks for Pig, available from http://wiki.apache.org/pig/PigMix.

(Excerpt from Hadoop: The Definitive Guide.) Apache Hadoop is ideal for organizations with a growing need to process massive application datasets. Hadoop: The Definitive Guide is a comprehensive resource for using Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. The book includes case studies that illustrate how Hadoop is used to solve specific problems.
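Since TestDFSIO_results.log is append-only, it accumulates one block per run, and comparing runs by eye gets tedious. As a small sketch of how you might post-process it (the parse_testdfsio_log helper below is illustrative, not part of Hadoop; the sample text follows the log format shown above):

```python
import re

# Sample TestDFSIO_results.log content, mirroring the runs shown above.
SAMPLE_LOG = """\
----- TestDFSIO ----- : write
Date & time: Sun Apr 12 07:14:09 EDT 2009
Number of files: 10
Total MBytes processed: 10000
Throughput mb/sec: 7.796340865378244
Average IO rate mb/sec: 7.8862199783325195
IO rate std deviation: 0.9101254683525547
Test exec time sec: 163.387

----- TestDFSIO ----- : read
Date & time: Sun Apr 12 07:24:28 EDT 2009
Number of files: 10
Total MBytes processed: 10000
Throughput mb/sec: 80.25553361904304
Average IO rate mb/sec: 98.6801528930664
IO rate std deviation: 36.63507598174921
Test exec time sec: 47.624
"""

def parse_testdfsio_log(text):
    """Split an append-only TestDFSIO results log into one dict per run."""
    runs = []
    current = None
    for line in text.splitlines():
        header = re.match(r"----- TestDFSIO ----- : (\w+)", line)
        if header:
            # A new "----- TestDFSIO ----- : <mode>" header starts a run.
            current = {"mode": header.group(1)}
            runs.append(current)
        elif current is not None and ":" in line:
            # Remaining lines are "key: value"; split on the first colon only,
            # so timestamps containing colons stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return runs

runs = parse_testdfsio_log(SAMPLE_LOG)
for run in runs:
    print(run["mode"], run["Throughput mb/sec"])
```

Keeping the parsed runs around makes it easy to chart throughput across reruns, which is the comparison the append-only log is meant to enable.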