
Hadoop Learning Notes

After startup you can browse:
    * NameNode - http://localhost:50070/
    * JobTracker - http://localhost:50030/
-------- Detailed steps -------------
$./bin/hadoop namenode -format 
$./bin/start-all.sh
$jps -l
$./bin/hadoop dfsadmin -report
$echo "hello hadoopworld." > /tmp/test_file1.txt
$echo "hello world hadoop,I'm test." > /tmp/test_file2.txt
$./bin/hadoop dfs -mkdir test-in
$./bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in
$./bin/hadoop dfs -ls test-in
$./bin/hadoop jar hadoop-0.20.2-examples.jar wordcount test-in test-out
$./bin/hadoop dfs -ls test-out
$./bin/hadoop dfs -cat test-out/part-r-00000
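For reference — assuming the default whitespace tokenization of the bundled WordCount example — the final -cat of part-r-00000 for these two test files should print one word per line, followed by a tab and its count, roughly:
hadoop,I'm      1
hadoopworld.    1
hello   2
test.   1
world   1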

--------- Detailed steps (with output) --------------

$ ./bin/hadoop namenode -format
10/12/29 14:25:57 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = haoning/10.4.125.111
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/12/29 14:25:57 INFO namenode.FSNamesystem: fsOwner=Administrator,None,root,Administrators,Users,Debugger,Users,ora_dba
10/12/29 14:25:57 INFO namenode.FSNamesystem: supergroup=supergroup
10/12/29 14:25:57 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/12/29 14:25:57 INFO common.Storage: Image file of size 103 saved in 0 seconds.
10/12/29 14:25:57 INFO common.Storage: Storage directory \home\Administrator\tmp\dfs\name has been successfully formatted.
10/12/29 14:25:57 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at haoning/10.4.125.111
************************************************************/

$ ./bin/start-all.sh 
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Administrator-namenode-haoning.out
localhost: datanode running as process 352. Stop it first.
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Administrator-secondarynamenode-haoning.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Administrator-jobtracker-haoning.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Administrator-tasktracker-haoning.out


Downloaded from the web: http://blogimg.chinaunix.net/blog/upfile2/100317223114.pdf is simple and easy to follow.
Hadoop doesn't seem to get along with OpenJDK.
Read it alongside http://hadoop.apache.org/common/docs/r0.18.2/cn/quickstart.html.
JDK
Using hadoop-0.20.2.
sudo apt-get install sun-java6-jdk
On Ubuntu this installs under /usr/lib/jvm.
On Red Hat 5, download jdk-6u23-linux-x64-rpm.bin instead;
after installation the JDK lives under /usr/java/jdk1.6.0_23.
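A minimal sketch of the Red Hat 5 install, assuming the self-extracting RPM installer downloaded from Sun:
$ chmod +x jdk-6u23-linux-x64-rpm.bin
$ sudo ./jdk-6u23-linux-x64-rpm.bin
$ ls /usr/java/jdk1.6.0_23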
If ssh on Ubuntu is slow, edit /etc/ssh/ssh_config and uncomment these two lines (remove the leading #) to disable GSSAPI authentication:
GSSAPIAuthentication no
GSSAPIDelegateCredentials no


Users
redhat5:
groupadd hadoop
useradd hadoop -g hadoop
vim /etc/sudoers
and add, below the existing root entry:
root    ALL=(ALL)       ALL
hadoop    ALL=(ALL)       ALL

sudoers is read-only, so you need ! to force the save (:wq! in vim).
[hadoop@122226 .ssh]$ ssh-keygen -t rsa -P ""
cat id_rsa.pub >authorized_keys
This works fine on Ubuntu.
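The full passwordless-ssh setup is roughly the following (assuming the default ~/.ssh paths; the chmod matters on Red Hat, where sshd is stricter about permissions):
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
$ ssh localhost date     # should log in without asking for a password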

Configuration files
The 0.18 release seems to use a single hadoop-site.xml; 0.20 splits it into several *-site.xml files.
hadoop@ubuntu:/usr/local/hadoop/hadoop-0.20.2$ vim conf/hadoop-env.sh
Set export JAVA_HOME=/usr/lib/jvm/java-6-sun
hadoop@ubuntu:/usr/local/hadoop/hadoop-0.20.2$ vim conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://localhost:9000</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/tmp</value>
        </property>
</configuration>

hadoop@ubuntu:/usr/local/hadoop/hadoop-0.20.2$ vim conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>localhost:9001</value>
        </property>
</configuration>
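For this single-node setup, conf/masters and conf/slaves can stay at their defaults, which (as far as I recall for 0.20.2) simply contain localhost:
$ cat conf/masters
localhost
$ cat conf/slaves
localhost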



You can copy test-out out of HDFS:
./bin/hadoop dfs -copyToLocal /user/hadoop/test-out test-out
Looking at it, -copyToLocal and -get appear to mean the same thing.
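A quick way to convince yourself of that (test-out-get is just a hypothetical second destination directory):
$ ./bin/hadoop dfs -get /user/hadoop/test-out test-out-get
$ diff -r test-out test-out-get     # no output expected if the two copies match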


After running ./bin/hadoop namenode -format:


hadoop@ubuntu:~/tmp$ tree
.
└── dfs
    └── name
        ├── current
        │   ├── edits
        │   ├── fsimage
        │   ├── fstime
        │   └── VERSION
        └── image
            └── fsimage

4 directories, 5 files


After ./bin/start-all.sh,
and then running ./bin/hadoop dfs -mkdir test-in:
.
|-- dfs
|   |-- data
|   |   |-- current
|   |   |   |-- blk_-1605603437240955017
|   |   |   |-- blk_-1605603437240955017_1019.meta
|   |   |   |-- dncp_block_verification.log.curr
|   |   |   `-- VERSION
|   |   |-- detach
|   |   |-- in_use.lock
|   |   |-- storage
|   |   `-- tmp
|   |-- name
|   |   |-- current
|   |   |   |-- edits
|   |   |   |-- fsimage
|   |   |   |-- fstime
|   |   |   `-- VERSION
|   |   |-- image
|   |   |   `-- fsimage
|   |   `-- in_use.lock
|   `-- namesecondary
|       |-- current
|       |   |-- edits
|       |   |-- fsimage
|       |   |-- fstime
|       |   `-- VERSION
|       |-- image
|       |   `-- fsimage
|       `-- in_use.lock
`-- mapred
    `-- local
13 directories, 18 files


After startup:
hadoop@ubuntu:/usr/local/hadoop/hadoop-0.20.2$ ./bin/hadoop dfsadmin -report
Configured Capacity: 25538187264 (23.78 GB)
Present Capacity: 8219365391 (7.65 GB)
DFS Remaining: 8219340800 (7.65 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 25538187264 (23.78 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 17318821873 (16.13 GB)
DFS Remaining: 8219340800(7.65 GB)
DFS Used%: 0%
DFS Remaining%: 32.18%
Last contact: Tue Dec 21 11:18:48 CST 2010


Or run jps to confirm the daemons are up.
Then run:
hadoop@zhengxq-desktop:/usr/local/hadoop/hadoop-0.20.1$ echo "hello hadoop
world." > /tmp/test_file1.txt
hadoop@zhengxq-desktop:/usr/local/hadoop/hadoop-0.20.1$ echo "hello world
hadoop,I'm haha." > /tmp/test_file2.txt
hadoop@zhengxq-desktop:/usr/local/hadoop/hadoop-0.20.1$ bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in

hadoop@test-linux:~/tmp$ tree
.
|-- dfs
|   |-- data
|   |   |-- current
|   |   |   |-- blk_-1605603437240955017
|   |   |   |-- blk_-1605603437240955017_1019.meta
|   |   |   |-- blk_-2047199693110071270
|   |   |   |-- blk_-2047199693110071270_1020.meta
|   |   |   |-- blk_-7264292243816045059
|   |   |   |-- blk_-7264292243816045059_1021.meta
|   |   |   |-- dncp_block_verification.log.curr
|   |   |   `-- VERSION
|   |   |-- detach
|   |   |-- in_use.lock
|   |   |-- storage
|   |   `-- tmp
|   |-- name
|   |   |-- current
|   |   |   |-- edits
|   |   |   |-- fsimage
|   |   |   |-- fstime
|   |   |   `-- VERSION
|   |   |-- image
|   |   |   `-- fsimage
|   |   `-- in_use.lock
|   `-- namesecondary
|       |-- current
|       |   |-- edits
|       |   |-- fsimage
|       |   |-- fstime
|       |   `-- VERSION
|       |-- image
|       |   `-- fsimage
|       `-- in_use.lock
`-- mapred
    `-- local

13 directories, 22 files


hadoop@test-linux:/usr/local/hadoop/hadoop-0.20.2$ ./bin/hadoop dfs -ls test-in
Found 2 items
-rw-r--r--   3 hadoop supergroup         21 2010-12-21 23:28 /user/hadoop/test-in/test_file1.txt
-rw-r--r--   3 hadoop supergroup         22 2010-12-21 23:28 /user/hadoop/test-in/test_file2.txt
hadoop@test-linux:/usr/local/hadoop/hadoop-0.20.2$ ./bin/hadoop jar hadoop-0.20.2-examples.jar wordcount test-in test-out
10/12/21 23:36:12 INFO input.FileInputFormat: Total input paths to process : 2
10/12/21 23:36:13 INFO mapred.JobClient: Running job: job_201012212251_0001
10/12/21 23:36:14 INFO mapred.JobClient:  map 0% reduce 0%
10/12/21 23:36:55 INFO mapred.JobClient:  map 100% reduce 0%
10/12/21 23:37:14 INFO mapred.JobClient:  map 100% reduce 100%
10/12/21 23:37:16 INFO mapred.JobClient: Job complete: job_201012212251_0001
10/12/21 23:37:16 INFO mapred.JobClient: Counters: 17
10/12/21 23:37:16 INFO mapred.JobClient:   Job Counters 
10/12/21 23:37:16 INFO mapred.JobClient:     Launched reduce tasks=1
10/12/21 23:37:16 INFO mapred.JobClient:     Launched map tasks=2
10/12/21 23:37:16 INFO mapred.JobClient:     Data-local map tasks=2
10/12/21 23:37:16 INFO mapred.JobClient:   FileSystemCounters
10/12/21 23:37:16 INFO mapred.JobClient:     FILE_BYTES_READ=85
10/12/21 23:37:16 INFO mapred.JobClient:     HDFS_BYTES_READ=43
10/12/21 23:37:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=240
10/12/21 23:37:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=38
10/12/21 23:37:16 INFO mapred.JobClient:   Map-Reduce Framework
10/12/21 23:37:16 INFO mapred.JobClient:     Reduce input groups=4
10/12/21 23:37:16 INFO mapred.JobClient:     Combine output records=6
10/12/21 23:37:16 INFO mapred.JobClient:     Map input records=2
10/12/21 23:37:16 INFO mapred.JobClient:     Reduce shuffle bytes=91
10/12/21 23:37:16 INFO mapred.JobClient:     Reduce output records=4
10/12/21 23:37:16 INFO mapred.JobClient:     Spilled Records=12
10/12/21 23:37:16 INFO mapred.JobClient:     Map output bytes=67
10/12/21 23:37:16 INFO mapred.JobClient:     Combine input records=6
10/12/21 23:37:16 INFO mapred.JobClient:     Map output records=6
10/12/21 23:37:16 INFO mapred.JobClient:     Reduce input records=6


hadoop@test-linux:~/tmp$ tree
.
|-- dfs
|   |-- data
|   |   |-- current
|   |   |   |-- blk_-1605603437240955017
|   |   |   |-- blk_-1605603437240955017_1019.meta
|   |   |   |-- blk_-1792462247745372986
|   |   |   |-- blk_-1792462247745372986_1027.meta
|   |   |   |-- blk_-2047199693110071270
|   |   |   |-- blk_-2047199693110071270_1020.meta
|   |   |   |-- blk_-27635221429411767
|   |   |   |-- blk_-27635221429411767_1027.meta
|   |   |   |-- blk_-7264292243816045059
|   |   |   |-- blk_-7264292243816045059_1021.meta
|   |   |   |-- blk_-8634524858846751168
|   |   |   |-- blk_-8634524858846751168_1026.meta
|   |   |   |-- dncp_block_verification.log.curr
|   |   |   `-- VERSION
|   |   |-- detach
|   |   |-- in_use.lock
|   |   |-- storage
|   |   `-- tmp
|   |-- name
|   |   |-- current
|   |   |   |-- edits
|   |   |   |-- fsimage
|   |   |   |-- fstime
|   |   |   `-- VERSION
|   |   |-- image
|   |   |   `-- fsimage
|   |   `-- in_use.lock
|   `-- namesecondary
|       |-- current
|       |   |-- edits
|       |   |-- fsimage
|       |   |-- fstime
|       |   `-- VERSION
|       |-- image
|       |   `-- fsimage
|       `-- in_use.lock
`-- mapred
    `-- local
        |-- jobTracker
        `-- taskTracker
            `-- jobcache

16 directories, 28 files

hadoop@test-linux:/usr/local/hadoop/hadoop-0.20.2$ ./bin/hadoop dfs -lsr
drwxr-xr-x   - hadoop supergroup          0 2010-12-21 23:28 /user/hadoop/test-in
-rw-r--r--   3 hadoop supergroup         21 2010-12-21 23:28 /user/hadoop/test-in/haoning1.txt
-rw-r--r--   3 hadoop supergroup         22 2010-12-21 23:28 /user/hadoop/test-in/haoning2.txt
drwxr-xr-x   - hadoop supergroup          0 2010-12-21 23:37 /user/hadoop/test-out
drwxr-xr-x   - hadoop supergroup          0 2010-12-21 23:36 /user/hadoop/test-out/_logs
drwxr-xr-x   - hadoop supergroup          0 2010-12-21 23:36 /user/hadoop/test-out/_logs/history
-rw-r--r--   3 hadoop supergroup      16751 2010-12-21 23:36 /user/hadoop/test-out/_logs/history/localhost_1292943083664_job_201012212251_0001_conf.xml
-rw-r--r--   3 hadoop supergroup       8774 2010-12-21 23:36 /user/hadoop/test-out/_logs/history/localhost_1292943083664_job_201012212251_0001_hadoop_word+count
-rw-r--r--   3 hadoop supergroup         38 2010-12-21 23:37 /user/hadoop/test-out/part-r-00000

hadoop@test-linux:/usr/local/hadoop/hadoop-0.20.2$ ./bin/hadoop dfs -cat test-out/part-r-00000
This prints the word-count result.

As you can see,
each file visible in DFS corresponds to two files on the datanode:
blk_-*_*.meta
blk_-*
(the block-verification log doesn't count).
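One way to see which blk_* files belong to which HDFS file is fsck, which (in 0.20, as far as I know) can list blocks and their locations:
$ ./bin/hadoop fsck /user/hadoop/test-in -files -blocks -locations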


To install on Windows, install Cygwin first (install everything — you need ssh at least). Following the Baidu Wenku article "在Windows上安装Hadoop教程", set up a symlink to work around the space in the JDK path:
ln -s /cygdrive/c/Program\ Files/Java/jdk1.6.0_17 \
/usr/local/jdk1.6.0_17


On Windows paths get a bit messy. With Cygwin, as long as ssh is installed, the ssh-keygen keys are trusted, and folder permissions are consistent, it should work. As of 2010-12-29 I have the single-node setup running on Ubuntu 10.10, Red Hat 5, and Windows XP + Cygwin; next I'll try a cluster. On Windows, use tree /F to inspect the results under tmp.
One odd thing:
under D:/cygwin/usr/local/hadoop
I ran
echo "aa" >/tmp/test_file1.txt
$ ./bin/hadoop fs -copyFromLocal /tmp/test_file1.txt /user/Administrator/test-in
copyFromLocal: File /tmp/test_file1.txt does not exist.
Copying test_file1.txt to D:/tmp made it work.
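Presumably the Hadoop JVM resolves /tmp against the current Windows drive (here D:) rather than the Cygwin root, so it is enough to put the file where Java will look for it — a sketch, assuming the install lives on D::
$ cp /tmp/test_file1.txt /cygdrive/d/tmp/
$ ./bin/hadoop fs -copyFromLocal /tmp/test_file1.txt /user/Administrator/test-in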


Cluster
usermod -G group user
vi /etc/group
/etc/passwd

Following http://malixxx.iteye.com/blog/459277,
adapted to hadoop-0.20.2,
core-site.xml is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://192.168.200.12:8888</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/Administrator/tmp</value>
        </property>
</configuration>


conf/masters holds this machine's IP, 192.168.200.12;
conf/slaves holds the slave node's IP, 192.168.200.16 — see the sketch just below.
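Concretely, the two files under conf/ then read:
$ cat conf/masters
192.168.200.12
$ cat conf/slaves
192.168.200.16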
mapred-site.xml is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>192.168.200.12:9999</value>
        </property>
</configuration>

If, after changing mapred-site.xml, it fails with "FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to
/192.168.200.16:9999 : Cannot assign requested address", the jobtracker hasn't started. It must be a configuration error (I didn't solve it at the time); the exception suggests the JobTracker on that machine is being told to bind 192.168.200.16:9999, an address that isn't local to it — i.e. mapred.job.tracker points at the wrong host for that node.
The approach is really just: take the example that runs on a single machine and change the slaves file (a slave seems to be a datanode).
Note that hadoop-env.sh sets JAVA_HOME; if you copy the namenode's hadoop directory over to the datanode, check that the JAVA_HOME path is still correct there.
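A sketch of that copy-and-check step (rsync, the paths and the slave IP are only illustrative):
$ rsync -av /usr/local/hadoop/hadoop-0.20.2/ hadoop@192.168.200.16:/usr/local/hadoop/hadoop-0.20.2/
$ ssh hadoop@192.168.200.16 'grep JAVA_HOME /usr/local/hadoop/hadoop-0.20.2/conf/hadoop-env.sh'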
---------★-----------
The firewall cost me a whole day of grief, damn it.
I found a script online; after running it the errors stopped.
accept-all.sh
#!/bin/sh

IPT='/sbin/iptables'
$IPT -t nat -F
$IPT -t nat -X
$IPT -t nat -P PREROUTING ACCEPT
$IPT -t nat -P POSTROUTING ACCEPT
$IPT -t nat -P OUTPUT ACCEPT
$IPT -t mangle -F
$IPT -t mangle -X
$IPT -t mangle -P PREROUTING ACCEPT
$IPT -t mangle -P INPUT ACCEPT
$IPT -t mangle -P FORWARD ACCEPT
$IPT -t mangle -P OUTPUT ACCEPT
$IPT -t mangle -P POSTROUTING ACCEPT
$IPT -F
$IPT -X
$IPT -P FORWARD ACCEPT
$IPT -P INPUT ACCEPT
$IPT -P OUTPUT ACCEPT
$IPT -t raw -F
$IPT -t raw -X
$IPT -t raw -P PREROUTING ACCEPT
$IPT -t raw -P OUTPUT ACCEPT
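Run it as root on both the master and the slave, then verify that every chain's policy is ACCEPT:
$ sudo sh accept-all.sh
$ sudo /sbin/iptables -L -n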

The culprit behind all the network-connection / NIO errors was the iptables firewall.
After it ran successfully:
On the master (…12), NameNode, SecondaryNameNode and JobTracker are up:
$ jps -l
32416 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
32483 org.apache.hadoop.mapred.JobTracker
1398 sun.tools.jps.Jps
32252 org.apache.hadoop.hdfs.server.namenode.NameNode

After copying a file from the local system into a newly created dir in hadoop:
.
`-- tmp
    `-- dfs
        |-- name
        |   |-- current
        |   |   |-- VERSION
        |   |   |-- edits
        |   |   |-- edits.new
        |   |   |-- fsimage
        |   |   `-- fstime
        |   |-- image
        |   |   `-- fsimage
        |   `-- in_use.lock
        `-- namesecondary
            |-- current
            |-- in_use.lock
            `-- lastcheckpoint.tmp

On the slave (…16), DataNode and TaskTracker are up:
# jps -l
32316 sun.tools.jps.Jps
31068 org.apache.hadoop.mapred.TaskTracker
30949 org.apache.hadoop.hdfs.server.datanode.DataNode
# 

.
`-- tmp
    |-- dfs
    |   `-- data
    |       |-- current
    |       |   |-- VERSION
    |       |   |-- blk_-4054376904853997355
    |       |   |-- blk_-4054376904853997355_1002.meta
    |       |   |-- blk_-8185269915321998969
    |       |   |-- blk_-8185269915321998969_1001.meta
    |       |   `-- dncp_block_verification.log.curr
    |       |-- detach
    |       |-- in_use.lock
    |       |-- storage
    |       `-- tmp
    `-- mapred
        `-- local

Five daemons in total.
The errors I'd been hitting — "FileSystem is not ready yet", "retry", "could only be replicated to 0 nodes" and the like — all went away.
I had a look at the IPC code; it looks a lot like MultiPortEcho.java from IBM's introductory NIO tutorial.
http://bbs.hadoopor.com/thread-329-1-2.html
If you hit connection errors, look for network causes.
On my home Ubuntu single-node setup, localhost worked, but switching to the IP didn't.
Removing virbr0 and bridge-utils didn't help either.
Eventually I configured a static address in /etc/network/interfaces:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.1.118
netmask 255.255.255.0
network 192.168.0.0
broadcast 192.168.1.255
gateway 192.168.1.1

$    /etc/init.d/networking restart
Clear the /tmp directory and the /home/hadoop/tmp directory; with multiple machines, delete tmp on all of them. Check the logs under logs/. Sometimes after stop-all.sh a java process is still running even though jps -l no longer shows it: use ps -ef|grep java to check for leftover java processes and netstat -nltp|grep 9999 to see whether the jobTracker is still running; if so, kill it.
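As a checklist (port 9999 is the mapred.job.tracker port from the config above; the path assumes the hadoop.tmp.dir used earlier):
$ ./bin/stop-all.sh
$ jps -l                          # ideally only sun.tools.jps.Jps remains
$ ps -ef | grep java              # any daemon jps missed?
$ netstat -nltp | grep 9999       # is the JobTracker port still bound? kill that pid if so
$ rm -rf /home/hadoop/tmp/*       # clear hadoop.tmp.dir on every node before re-formatting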
After rebooting and setting core-site.xml, mapred-site.xml, slaves and masters all to 192.168.1.118, it worked. DHCP seems to be the problem — a whole evening of errors, or dfsadmin -report showing all zeros; probably an IP-mapping issue, plus something about 127.0.1.1. Anyway, a fixed IP solved it.


I never quite knew what hdfs-site.xml is for; I found an example online at http://www.javawhat.com/showWebsiteContent.do?id=527440:
<configuration>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>conf/excludes</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>namenodeip:50070</value>
  </property>
  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>12582912</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop1/data/,/hadoop2/data/</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/name/</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>True</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

