When setting up a Hadoop cluster, you'll need to designate one specific node as the master node. This server will typically host the NameNode and JobTracker
daemons. It'll also serve as the base station contacting and activating the DataNode and TaskTracker daemons on all of the slave nodes.
Hadoop uses passphraseless SSH for this purpose. SSH utilizes standard public key cryptography to create a pair of keys for user verification: one public,
one private. The public key is stored locally on every node in the cluster, while the private key stays on the master node. When the master attempts to access a remote machine, it proves that it holds the matching private key (the key itself is never transmitted), and the target machine uses the stored public key to validate the login attempt.
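In concrete terms, and assuming OpenSSH's default file locations, the pieces end up on disk like this once the key pair is generated and distributed (steps 3 and 4 below): the master keeps both halves of the pair under its own ~/.ssh directory, while each target node holds a copy of the public key appended to ~/.ssh/authorized_keys. A quick listing illustrates the layout (output shown here is only illustrative):
[hadoop-user@master]$ ls ~/.ssh
id_rsa  id_rsa.pub
[hadoop-user@target]$ ls ~/.ssh
authorized_keys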
1. Define a common account
This access is from a user account on one node to a user account on the target machine. For Hadoop, the accounts should have the same username on
all of the nodes (we use hadoop-user in this book), and for security purposes we recommend that it be a user-level account. This account is only for managing your
Hadoop cluster. Once the cluster daemons are up and running, you'll be able to run your actual MapReduce jobs from other accounts.
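How you create this account is up to you; as a minimal sketch, on a typical Linux system you might run the following as root on every node, using the same username everywhere (here node stands in for each machine's hostname):
[root@node]# useradd -m hadoop-user
[root@node]# passwd hadoop-user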
2. Verify SSH installation
$ which ssh
$ which sshd
$ which ssh-keygen
If any of these are missing, install OpenSSH.
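The exact install command depends on your distribution's package manager; for example, on Debian/Ubuntu-style or Red Hat-style systems respectively, something like the following should work:
$ sudo apt-get install openssh-client openssh-server
$ sudo yum install openssh-clients openssh-server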
3. Generate SSH key pair
Having verified that SSH is correctly installed on all nodes of the cluster, we use ssh-keygen on the master node to generate an RSA key pair. Be certain to
avoid entering a passphrase, or you'll have to manually enter that phrase every time the master node attempts to access another node.
$ ssh-keygen -t rsa
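If you'd rather skip the interactive prompts, ssh-keygen can also take the (empty) passphrase and the output file directly on the command line; the following should be equivalent to running the command above and pressing Enter at each prompt:
$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa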
4. Distribute public key and validate logins
Albeit a bit tedious, you'll next need to copy the public key to every slave node as well as the master node:
[hadoop-user@master]$ scp ~/.ssh/id_rsa.pub hadoop-user@target:~/master_key
Manually log in to the target node and set the master key as an authorized key (or append to the list of authorized keys if you have others defined).
[hadoop-user@target]$ mkdir ~/.ssh
[hadoop-user@target]$ chmod 700 ~/.ssh
[hadoop-user@target]$ mv ~/master_key ~/.ssh/authorized_keys
[hadoop-user@target]$ chmod 600 ~/.ssh/authorized_keys
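If your OpenSSH installation includes the ssh-copy-id helper, it handles the copy, directory creation, and permission settings in one step, so you could loop over your slaves instead of repeating the commands by hand. The hostnames slave1, slave2, and slave3 below are placeholders for your own node names, and each run will prompt once for hadoop-user's password on the target:
[hadoop-user@master]$ for node in slave1 slave2 slave3; do
>   ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-user@$node
> done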
After installing the public key on a target node, you can verify it's correctly defined by attempting to log in to that node from the master:
[hadoop-user@master]$ ssh target
The authenticity of host 'target (xxx.xxx.xxx.xxx)' can’t be established.
RSA key fingerprint is 72:31:d8:1b:11:36:43:52:56:11:77:a4:ec:82:03:1d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'target' (RSA) to the list of known hosts.
Last login: Sun Jan 4 15:32:22 2009 from master
After confirming the authenticity of a target node to the master node, you won’t be prompted upon subsequent login attempts.
[hadoop-user@master]$ ssh target
Last login: Sun Jan 4 15:32:49 2009 from master
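As a final check, a quick loop from the master can confirm that every node (the hostnames below are again placeholders for your own) now accepts a login without prompting for a password; each node should simply print its hostname:
[hadoop-user@master]$ for node in master slave1 slave2 slave3; do
>   ssh hadoop-user@$node hostname
> done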
We’ve now set the groundwork for running Hadoop on your own cluster.