Running the Mahout 0.3 synthetic-control k-means example on Hadoop 0.20.2: the input-preparation job completes, but the canopy job that follows fails on every attempt.

[hadoop@sc706-26 hadoop-0.20.2]$ hadoop jar /home/hadoop/mahout-0.3/mahout-examples-0.3.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -i testdata -o output
10/09/20 14:46:07 INFO kmeans.Job: Preparing Input
10/09/20 14:46:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/20 14:46:08 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/20 14:46:10 INFO mapred.JobClient: Running job: job_201009201429_0001
10/09/20 14:46:11 INFO mapred.JobClient: map 0% reduce 0%
10/09/20 14:46:24 INFO mapred.JobClient: map 50% reduce 0%
10/09/20 14:46:27 INFO mapred.JobClient: map 100% reduce 0%
10/09/20 14:46:29 INFO mapred.JobClient: Job complete: job_201009201429_0001
10/09/20 14:46:29 INFO mapred.JobClient: Counters: 9
10/09/20 14:46:29 INFO mapred.JobClient: Job Counters
10/09/20 14:46:29 INFO mapred.JobClient: Rack-local map tasks=1
10/09/20 14:46:29 INFO mapred.JobClient: Launched map tasks=2
10/09/20 14:46:29 INFO mapred.JobClient: Data-local map tasks=1
10/09/20 14:46:29 INFO mapred.JobClient: FileSystemCounters
10/09/20 14:46:29 INFO mapred.JobClient: HDFS_BYTES_READ=291645
10/09/20 14:46:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=483037
10/09/20 14:46:29 INFO mapred.JobClient: Map-Reduce Framework
10/09/20 14:46:29 INFO mapred.JobClient: Map input records=601
10/09/20 14:46:29 INFO mapred.JobClient: Spilled Records=0
10/09/20 14:46:29 INFO mapred.JobClient: Map input bytes=288375
10/09/20 14:46:29 INFO mapred.JobClient: Map output records=601
10/09/20 14:46:29 INFO kmeans.Job: Running Canopy to get initial clusters
10/09/20 14:46:29 INFO canopy.CanopyDriver: Input: output/data Out: output/canopies Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure t1: 80.0 t2: 55.0
10/09/20 14:46:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/20 14:46:31 INFO mapred.FileInputFormat: Total input paths to process : 2
10/09/20 14:46:32 INFO mapred.JobClient: Running job: job_201009201429_0002
10/09/20 14:46:33 INFO mapred.JobClient: map 0% reduce 0%
10/09/20 14:46:42 INFO mapred.JobClient: map 50% reduce 0%
10/09/20 14:46:48 INFO mapred.JobClient: Task Id : attempt_201009201429_0002_m_000001_0, Status : FAILED
org.apache.mahout.math.CardinalityException: My cardinality is: 0, but the other is: 60
    at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:275)
    at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
    at org.apache.mahout.common.distance.EuclideanDistanceMeasure.distance(EuclideanDistanceMeasure.java:39)
    at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:108)
    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:49)
    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
10/09/20 14:46:51 INFO mapred.JobClient: map 50% reduce 16%
10/09/20 14:46:56 INFO mapred.JobClient: Task Id : attempt_201009201429_0002_m_000001_1, Status : FAILED
org.apache.mahout.math.CardinalityException: My cardinality is: 0, but the other is: 60
    at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:275)
    at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
    at org.apache.mahout.common.distance.EuclideanDistanceMeasure.distance(EuclideanDistanceMeasure.java:39)
    at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:108)
    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:49)
    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
10/09/20 14:47:02 INFO mapred.JobClient: Task Id : attempt_201009201429_0002_m_000001_2, Status : FAILED
org.apache.mahout.math.CardinalityException: My cardinality is: 0, but the other is: 60
    at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:275)
    at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
    at org.apache.mahout.common.distance.EuclideanDistanceMeasure.distance(EuclideanDistanceMeasure.java:39)
    at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:108)
    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:49)
    at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
10/09/20 14:47:11 INFO mapred.JobClient: Job complete: job_201009201429_0002
10/09/20 14:47:11 INFO mapred.JobClient: Counters: 14
10/09/20 14:47:11 INFO mapred.JobClient: Job Counters
10/09/20 14:47:11 INFO mapred.JobClient: Launched reduce tasks=1
10/09/20 14:47:11 INFO mapred.JobClient: Rack-local map tasks=3
10/09/20 14:47:11 INFO mapred.JobClient: Launched map tasks=5
10/09/20 14:47:11 INFO mapred.JobClient: Data-local map tasks=2
10/09/20 14:47:11 INFO mapred.JobClient: Failed map tasks=1
10/09/20 14:47:11 INFO mapred.JobClient: FileSystemCounters
10/09/20 14:47:11 INFO mapred.JobClient: HDFS_BYTES_READ=242288
10/09/20 14:47:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=12038
10/09/20 14:47:11 INFO mapred.JobClient: Map-Reduce Framework
10/09/20 14:47:11 INFO mapred.JobClient: Combine output records=0
10/09/20 14:47:11 INFO mapred.JobClient: Map input records=301
10/09/20 14:47:11 INFO mapred.JobClient: Spilled Records=15
10/09/20 14:47:11 INFO mapred.JobClient: Map output bytes=11940
10/09/20 14:47:11 INFO mapred.JobClient: Map input bytes=242198
10/09/20 14:47:11 INFO mapred.JobClient: Combine input records=0
10/09/20 14:47:11 INFO mapred.JobClient: Map output records=15
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:163)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:152)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Any pointers would be much appreciated!
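From the log itself, one plausible diagnosis: the CardinalityException ("My cardinality is: 0, but the other is: 60") means the canopy mapper hit an empty vector while every other point is 60-dimensional. The UCI synthetic control dataset is 600 rows of 60 values each, yet the preparation job above reports "Map input records=601", so one extra record, most likely a blank line in synthetic_control.data, was converted into a cardinality-0 vector. A minimal sketch of a workaround, assuming the data file sits in the current directory and the HDFS paths match the command above (paths are assumptions, not from the log):

[hadoop@sc706-26 hadoop-0.20.2]$ wc -l synthetic_control.data                       # expect 600 lines
[hadoop@sc706-26 hadoop-0.20.2]$ sed -i '/^[[:space:]]*$/d' synthetic_control.data  # drop blank lines
[hadoop@sc706-26 hadoop-0.20.2]$ hadoop fs -rmr testdata output                     # old 0.20-era fs syntax
[hadoop@sc706-26 hadoop-0.20.2]$ hadoop fs -mkdir testdata
[hadoop@sc706-26 hadoop-0.20.2]$ hadoop fs -put synthetic_control.data testdata/
[hadoop@sc706-26 hadoop-0.20.2]$ hadoop jar /home/hadoop/mahout-0.3/mahout-examples-0.3.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -i testdata -o output

If the line count is already 600 and free of blank lines, then the extra record is coming from somewhere else (for example, a stray file in the testdata directory), and the same cleanup of the input paths would apply.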
分享到:
相关推荐
Java课设基于Hadoop的kmeans实现对NBA球队球风聚类源码.zipJava课设基于Hadoop的kmeans实现对NBA球队球风聚类源码.zipJava课设基于Hadoop的kmeans实现对NBA球队球风聚类源码.zipJava课设基于Hadoop的kmeans实现对NBA...
《基于Hadoop的Kmeans算法实现详解》 Kmeans算法是一种广泛应用的无监督学习方法,主要用于数据聚类,它通过将数据点分配到最近的聚类中心来形成多个紧密聚集的簇。在大数据处理领域,结合Hadoop框架,Kmeans算法...
现在我们来深入探讨如何使用 Hadoop MapReduce 实现 KMeans 算法。 首先,我们需要理解 KMeans 算法的基本原理。KMeans 算法的核心思想是通过迭代找到最优的簇中心,使得每个数据点到所属簇中心的距离最小。算法...
hadoop jar HadoopKMeans.jar com.jgalilee.hadoop.kmeans.driver.Driver \ input/kmeans.state \ input/points.txt \ input/clusters.txt \ 2 \ output/ \ 0.0 \ 5 home - 每次迭代都可以写入文件名迭代...
将KMeans与Hadoop结合,可以处理大规模数据集,极大地提高了算法的效率。接下来,我们将深入探讨Hadoop环境下KMeans算法的实现原理、步骤以及实际应用。 一、Hadoop框架简介 Hadoop是基于Java开发的,其核心组件...
### KMeans算法在Hadoop平台上的大规模中文网页聚类应用 #### 概述 随着互联网技术的迅猛发展,网络上产生了海量的中文网页资源。如何有效地组织这些网页资源,提高用户的检索效率,成为了一个重要的研究课题。聚类...
《Hadoop实现KMeans聚类算法详解》 在大数据处理领域,Hadoop作为一个分布式计算框架,为海量数据的处理提供了高效、可靠的解决方案。而KMeans是广泛应用的一种无监督机器学习算法,常用于数据聚类。本文将深入探讨...
基于云计算平台Hadoop的并行kmeans聚类算法设计研究 本研究主要集中在基于云计算平台Hadoop的并行kmeans聚类算法设计方面,旨在解决实际应用中需要处理的海量数据问题。该研究首先对聚类研究的背景和挑战进行了深入...
Hadoop的运行原理分析深入揭示了其作为分布式处理方案的核心优势,即能够通过简单的编程模型,将复杂的数据处理任务分布到大规模的机器集群上,大幅度提升数据处理和分析的效率。对于刚刚入门的IT人员来说,掌握...
小例子是自己写的,目的是让自己熟悉一下如何在集群上运行一个mapreduce项目,大家可以参考此例子学习hadoop,对入门很有帮助。小例子是自己写的,目的是让自己熟悉一下如何在集群上运行一个mapreduce项目,大家可以...
### Hadoop运行WordCount实例详解 #### 一、Hadoop简介与WordCount程序的重要性 Hadoop 是一个由Apache基金会所开发的分布式系统基础架构。它能够处理非常庞大的数据集,并且能够在集群上运行,通过将大数据分割...
《Hadoop运行原理分析》是深入理解大数据处理框架Hadoop的核心读物,它详细解析了Hadoop如何在大规模数据集上高效运行。本文件主要涵盖了以下几个关键知识点: 1. **Hadoop概述**:Hadoop是Apache软件基金会开发的...
为了运行Hadoop项目,你需要一个配置完善的Hadoop环境,包括安装Hadoop和配置Hadoop的环境变量。同时,为了方便管理和构建项目,通常会使用Maven作为构建工具。Maven是一个项目管理和依赖管理工具,可以帮助我们管理...
步骤2:使用python脚本规范化每个文档的tfidf,将创建一个文件GutenbergBookNorm.csv-python euclidian_normalizer.py 第三步:将标准化文件GutenbergBook.csv复制到hdfs-hadoop fs -mkdir inputKmeans-hadoop fs -...
hadoop简单开发例子源码(含jar包),适用于初学者!
【标题】中的“hadoop scala spark 例子项目,运行了单机wordcount”指的是一个使用Hadoop、Scala和Spark框架实现的简单WordCount程序。在大数据处理领域,WordCount是入门级的经典示例,用于统计文本文件中单词出现...
以下是对"eclipse连接hadoop所需要的hadoop.ddl和eclipse插件和hadoop运行案例"这一主题的详细解释: 首先,让我们了解`hadoop.ddl`。DDL(Data Definition Language)通常指的是数据库中用于定义数据结构的语句,...