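Only JAVA_HOME needs to be set in hadoop-env.sh.
There is no need to format a NameNode, and no need to copyFromLocal the input to HDFS.
Note that this uses Hadoop in standalone mode, so the job reads its input from and writes its output to the local file system.
The run below took about 5 seconds: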
hadoop@leibnitz-laptop:/cc/hadoop/standalone/hadoop-0.20.2$ ./bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input/cluster output/wordcount
11/02/26 03:19:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/02/26 03:19:40 INFO input.FileInputFormat: Total input paths to process : 3
11/02/26 03:19:40 INFO mapred.JobClient: Running job: job_local_0001
11/02/26 03:19:40 INFO input.FileInputFormat: Total input paths to process : 3
11/02/26 03:19:40 INFO mapred.MapTask: io.sort.mb = 100
11/02/26 03:19:41 INFO mapred.MapTask: data buffer = 79691776/99614720
11/02/26 03:19:41 INFO mapred.MapTask: record buffer = 262144/327680
11/02/26 03:19:41 INFO mapred.JobClient: map 0% reduce 0%
11/02/26 03:19:41 INFO mapred.MapTask: Starting flush of map output
11/02/26 03:19:42 INFO mapred.MapTask: Finished spill 0
11/02/26 03:19:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/02/26 03:19:42 INFO mapred.LocalJobRunner:
11/02/26 03:19:42 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/02/26 03:19:42 INFO mapred.MapTask: io.sort.mb = 100
11/02/26 03:19:42 INFO mapred.MapTask: data buffer = 79691776/99614720
11/02/26 03:19:42 INFO mapred.MapTask: record buffer = 262144/327680
11/02/26 03:19:43 INFO mapred.MapTask: Spilling map output: record full = true
11/02/26 03:19:43 INFO mapred.MapTask: bufstart = 0; bufend = 2546041; bufvoid = 99614720
11/02/26 03:19:43 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
11/02/26 03:19:43 INFO mapred.MapTask: Starting flush of map output
11/02/26 03:19:43 INFO mapred.JobClient: map 100% reduce 0%
11/02/26 03:19:44 INFO mapred.MapTask: Finished spill 0
11/02/26 03:19:44 INFO mapred.MapTask: Finished spill 1
11/02/26 03:19:44 INFO mapred.Merger: Merging 2 sorted segments
11/02/26 03:19:44 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 740450 bytes
11/02/26 03:19:44 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
11/02/26 03:19:44 INFO mapred.LocalJobRunner:
11/02/26 03:19:44 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
11/02/26 03:19:44 INFO mapred.MapTask: io.sort.mb = 100
11/02/26 03:19:44 INFO mapred.MapTask: data buffer = 79691776/99614720
11/02/26 03:19:44 INFO mapred.MapTask: record buffer = 262144/327680
11/02/26 03:19:44 INFO mapred.MapTask: Starting flush of map output
11/02/26 03:19:45 INFO mapred.MapTask: Finished spill 0
11/02/26 03:19:45 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
11/02/26 03:19:45 INFO mapred.LocalJobRunner:
11/02/26 03:19:45 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
11/02/26 03:19:45 INFO mapred.LocalJobRunner:
11/02/26 03:19:45 INFO mapred.Merger: Merging 3 sorted segments
11/02/26 03:19:45 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 1474267 bytes
11/02/26 03:19:45 INFO mapred.LocalJobRunner:
11/02/26 03:19:45 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/02/26 03:19:45 INFO mapred.LocalJobRunner:
11/02/26 03:19:45 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/02/26 03:19:45 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output/wordcount
11/02/26 03:19:45 INFO mapred.LocalJobRunner: reduce > reduce
11/02/26 03:19:45 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/02/26 03:19:45 INFO mapred.JobClient: map 100% reduce 100%
11/02/26 03:19:45 INFO mapred.JobClient: Job complete: job_local_0001
11/02/26 03:19:45 INFO mapred.JobClient: Counters: 12
11/02/26 03:19:45 INFO mapred.JobClient:   FileSystemCounters
11/02/26 03:19:45 INFO mapred.JobClient:     FILE_BYTES_READ=16082737
11/02/26 03:19:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=8416674
11/02/26 03:19:45 INFO mapred.JobClient:   Map-Reduce Framework
11/02/26 03:19:45 INFO mapred.JobClient:     Reduce input groups=82331
11/02/26 03:19:45 INFO mapred.JobClient:     Combine output records=102317
11/02/26 03:19:45 INFO mapred.JobClient:     Map input records=77931
11/02/26 03:19:45 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/02/26 03:19:45 INFO mapred.JobClient:     Reduce output records=82331
11/02/26 03:19:45 INFO mapred.JobClient:     Spilled Records=255947
11/02/26 03:19:45 INFO mapred.JobClient:     Map output bytes=6076039
11/02/26 03:19:45 INFO mapred.JobClient:     Combine input records=629167
11/02/26 03:19:45 INFO mapred.JobClient:     Map output records=629167
11/02/26 03:19:45 INFO mapred.JobClient:     Reduce input records=102317
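Because everything lives on the local file system in standalone mode, the result can be cross-checked with ordinary shell tools. The sketch below is not part of Hadoop: wordcount_local is a hypothetical helper name, and it assumes the example job splits its input on whitespace only (space, tab, newline, carriage return, form feed) and sorts keys in byte order.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): count whitespace-separated
# tokens across all files in a directory and print "word<TAB>count",
# sorted in byte order.
wordcount_local() {
    # $1: input directory, e.g. input/cluster
    cat "$1"/* \
        | tr -s ' \t\n\r\f' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | LC_ALL=C uniq -c \
        | awk '{ print $2 "\t" $1 }'
}
```

For the run above, `wordcount_local input/cluster | wc -l` would be expected to match the Reduce output records counter (82331) if the tokenization assumption holds. The same output could also be diffed against the result file under output/wordcount, whose name is assumed here to be part-r-00000.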