- 浏览: 564236 次
- 性别:
- 来自: 济南
文章分类
- 全部博客 (270)
- Ask chenwq (10)
- JSF (2)
- ExtJS (5)
- Life (19)
- jQuery (5)
- ASP (7)
- JavaScript (5)
- SQL Server (1)
- MySQL (4)
- En (1)
- development tools (14)
- Data mining related (35)
- Hadoop (33)
- Oracle (13)
- To Do (2)
- SSO (2)
- work/study diary (10)
- SOA (6)
- Ubuntu (7)
- J2SE (18)
- NetWorks (1)
- Struts2 (2)
- algorithm (9)
- funny (1)
- BMP (1)
- Paper Reading (2)
- MapReduce (23)
- Weka (3)
- web design (1)
- Data visualisation&R (1)
- Mahout (7)
- Social Recommendation (1)
- statistical methods (1)
- Git&GitHub (1)
- Python (1)
- Linux (1)
最新评论
-
brandNewUser:
楼主你好,问个问题,为什么我写的如下的:JobConf pha ...
Hadoop ChainMap -
Molisa:
Molisa 写道mapred.min.split.size指 ...
Hadoop MapReduce Job性能调优——修改Map和Reduce个数 -
Molisa:
mapred.min.split.size指的是block数, ...
Hadoop MapReduce Job性能调优——修改Map和Reduce个数 -
heyongcs:
请问导入之后,那些错误怎么解决?
Eclipse导入Mahout -
a420144030:
看了你的文章深受启发,想请教你几个问题我的数据都放到hbase ...
Mahout clustering Canopy+K-means 源码分析
package cn.edu.xmu.dm.mpdemo.ioformat; import java.io.IOException; import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IOUtils; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.SequenceFile; import org.apache.hadoop.io.SequenceFile.CompressionType; import org.apache.hadoop.io.Text; /** * desc: SequenceFileWriter * <code>SequenceFileWriteDemo</code> * * @author chenwq (irwenqiang@gmail.com) * @version 1.0 2012/05/19 */ public class SequenceFileWriteDemo { private static final String[] DATA = { "One, two, buckle my shoe", "Three, four, shut the door", "Five, six, pick up sticks", "Seven, eight, lay them straight", "Nine, ten, a big fat hen" }; public static void main(String[] args) throws IOException { String uri = args[0]; Configuration conf = new Configuration(); FileSystem fs = FileSystem.get(URI.create(uri), conf); Path path = new Path(uri); IntWritable key = new IntWritable(); Text value = new Text(); SequenceFile.Writer writer = null; try { /** * fs: outputstream * conf: configuration object * key: the key' type * value: the value's type */ writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass()); // writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), // value.getClass(), CompressionType.BLOCK); for (int i = 0; i < 100; i++) { key.set(100 - i); value.set(DATA[i % DATA.length]); System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value); writer.append(key, value); } } finally { IOUtils.closeStream(writer); } } }
package cn.edu.xmu.dm.mpdemo.ioformat; import java.io.IOException; import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IOUtils; import org.apache.hadoop.io.SequenceFile; import org.apache.hadoop.io.Writable; import org.apache.hadoop.util.ReflectionUtils; /** * desc: SequenceFileReader * <code>SequenceFileReadDemo</code> * * @author chenwq (irwenqiang@gmail.com) * @version 1.0 2012/05/19 */ public class SequenceFileReadDemo { public static void main(String[] args) throws IOException { String uri = args[0]; Configuration conf = new Configuration(); FileSystem fs = FileSystem.get(URI.create(uri), conf); Path path = new Path(uri); SequenceFile.Reader reader = null; try { reader = new SequenceFile.Reader(fs, path, conf); Writable key = (Writable) ReflectionUtils.newInstance( reader.getKeyClass(), conf); Writable value = (Writable) ReflectionUtils.newInstance( reader.getValueClass(), conf); long position = reader.getPosition(); while (reader.next(key, value)) { String syncSeen = reader.syncSeen() ? "*" : ""; System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value); position = reader.getPosition(); // beginning of next record } } finally { IOUtils.closeStream(reader); } } }
使用Block压缩后的大小对比:
root@ubuntu:~# hadoop fs -ls mpdemo/ Found 2 items -rw-r--r-- 3 root supergroup 4788 2012-05-19 00:11 /user/root/mpdemo/seqinput -rw-r--r-- 3 root supergroup 484 2012-05-19 00:17 /user/root/mpdemo/seqinputblock
发表评论
-
Parallel K-Means Clustering Based on MapReduce
2012-08-04 20:28 1407K-means is a pleasingly paral ... -
Pagerank在Hadoop上的实现原理
2012-07-19 16:04 1474转自:pagerank 在 hadoop 上的实现原理 ... -
Including external jars in a Hadoop job
2012-06-25 20:24 1222办法1: 把所有的第三方jar和自己的class打成一个大 ... -
[转]BSP模型与实例分析(一)
2012-06-15 22:26 0一、BSP模型概念 BSP(Bulk Synchr ... -
Hadoop中两表JOIN的处理方法
2012-05-29 10:35 9651. 概述 在传统数据库(如:MYSQL)中,JOIN ... -
Hadoop DistributedCache
2012-05-27 23:45 1129Hadoop的DistributedCache,可以把 ... -
MapReduce,组合式,迭代式,链式
2012-05-27 23:27 23931.迭代式mapreduce 一些复杂的任务难以用一 ... -
Hadoop ChainMap
2012-05-27 23:09 1990单一MapReduce对一些非常简单的问题提供了很好的支持。 ... -
广度优先BFS的MapReduce实现
2012-05-25 21:47 4313社交网络中的图模型经常需要构造一棵树型结构:从一个特定的节点出 ... -
HADOOP程序日志
2012-05-23 19:53 1020*.log日志文件和*.out日志文件 进入Hadoo ... -
TFIDF based on MapReduce
2012-05-23 11:58 952Job1: Map: input: ( ... -
个人Hadoop 错误列表
2012-05-23 11:31 1492错误1:Too many fetch-failure ... -
Hadoop Map&Reduce个数优化设置以及JVM重用
2012-05-22 11:29 2435Hadoop与JVM重用对应的参数是map ... -
有空读下
2012-05-20 23:59 0MapReduce: JT默认task scheduli ... -
Hadoop MapReduce Job性能调优——修改Map和Reduce个数
2012-05-20 23:46 26764map task的数量即mapred ... -
Hadoop用于和Map Reduce作业交互的命令
2012-05-20 16:02 1228用法:hadoop job [GENERIC_OPTION ... -
Eclipse:Run on Hadoop 没有反应
2012-05-20 11:46 1283原因: hadoop-0.20.2下自带的eclise ... -
Hadoop0.20+ custom MultipleOutputFormat
2012-05-20 11:46 1545Hadoop0.20.2中无法使用MultipleOutput ... -
Custom KeyValueTextInputFormat
2012-05-19 16:23 1719在看老版的API时,发现旧的KeyValueTextInpu ... -
Hadoop Archive解决海量小文件存储
2012-05-18 21:32 2700单台服务器作为Namenode,当文件数量规 ...
相关推荐
在IT领域,尤其是在大数据处理和图像检索系统的设计中,Hadoop SequenceFile是一种广泛使用的存储格式。这个名为"sequencify-CBIR-on-hadoop"的项目专注于将图像数据转化为SequenceFile格式,以便在基于内容的图像...
Its simple programming model, "code once and deploy at any scale" paradigm, and an ever-growing ecosystem make Hadoop an inclusive platform for programmers with different levels of expertise and ...
这可能涉及到使用`SequenceFile.Writer`和`SequenceFile.Reader`类,以及相关的序列化和反序列化工具。 **MapFile** MapFile是SequenceFile的一种扩展,它提供了一种索引结构来加速查找特定键的数据。MapFile由两...
Hadoop Data Processing and Modelling 英文azw3 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自网络,如有侵权,请联系上传者或csdn删除
SequenceFile.Reader reader = new SequenceFile.Reader(conf, ..., ...); while (reader.next(key, value)) { // 处理键值对 } reader.close(); ``` 3. **Sequence File的合并** 合并多个Sequence Files...
- SequenceFile Writer:使用Hadoop的`SequenceFile.Writer`类来写入键值对到SequenceFile中,需要配置正确的键和值类型,以及文件路径。 4. **META-INF目录**: 这个目录通常包含关于软件包的信息,比如类库的元...
Base SAS methods that are covered include reading and writing raw data with the DATA step and managing the Hadoop file system and executing Map-Reduce and Pig code from SAS via the HADOOP procedure....
Starting with the basics of Apache Hadoop and Solr, this book then dives into advanced topics of optimizing search with some interesting real-world use cases and sample Java code.
标题中的"hadoop.dll-and-winutils.exe-for-hadoop2.7"正指向了这个问题的解决方案。 "winutils.exe"是Hadoop项目的一部分,它在Linux系统中对应的是"bin/hadoop"命令,用于执行各种系统级操作,如设置HDFS的权限、...
SequenceFile的写入是通过SequenceFile.Writer来实现的,並将小文件的名称和内容写入SequenceFile中。 需要注意的是,SequenceFile是一种二进制文件,不能使用普通的文本编辑器来查看其内容。要查看SequenceFile的...
Big Data, MapReduce, Hadoop, and Spark with Python: Master Big Data Analytics and Data Wrangling with MapReduce Fundamentals using Hadoop, Spark, and Python by LazyProgrammer English | 15 Aug 2016 | ...
从所给内容中提到的书名《Hadoop Backup and Recovery Solutions》来看,书中深入探讨了如何从Hadoop备份集群中恢复数据以及如何排查问题。这包括了对Hadoop集群备份过程中可能遇到的问题,及其解决方案的介绍,也...
在IT行业中,Hadoop是一个广泛使用的开源框架,用于存储和处理大数据。Hadoop分布式文件系统(HDFS)和MapReduce是其核心组件,允许数据在集群中的多台服务器上进行分布式计算。标题“pc机连接集群的HADOOP_HOME”指...
Data Algorithms Recipes for Scaling Up with Hadoop and Spark 英文epub 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自网络,如有侵权,请联系上传者或csdn删除
This book jumps into the world of Hadoop ecosystem components and its tools in a simplified manner, and provides you with the skills to utilize them effectively for faster and effective development of...
Hadoop 2.7.3 Windows64位 编译bin(包含winutils.exe, hadoop.dll),自己用的,把压缩包里的winutils.exe, hadoop.dll 放在你的bin 目录 在重启eclipse 就好了