groupadd hadoop
useradd hadoop -g hadoop
vim /etc/sudoers
root ALL=(ALL) ALL
and add below it:
hadoop ALL=(ALL) ALL
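Editing /etc/sudoers directly works, but visudo is the safer route since it syntax-checks the file before saving (a general precaution, not part of the original notes):
visudo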
[root@localhost sqoop]# mkdir /usr/local/hadoop
[root@localhost sqoop]# chown -R hadoop /usr/local/hadoop
su hadoop    # note: run everything below as the hadoop user ★★★★★★★
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
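If ssh localhost still prompts for a password, the usual culprit is over-permissive key files; tightening them is a general sshd requirement (not specific to these notes):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys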
vim conf/hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.6.0_45/
vim conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
vim conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
./hadoop namenode -format
[hadoop@localhost bin]$ ./hadoop namenode -format
14/03/10 00:57:17 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2-CDH3B4
STARTUP_MSG: build = git://ubuntu-slave02/ on branch -r 3aa7c91592ea1c53f3a913a581dbfcdfebe98bfe; compiled by 'hudson' on Mon Feb 21 11:52:19 PST 2011
************************************************************/
14/03/10 00:57:18 INFO util.GSet: VM type = 64-bit
14/03/10 00:57:18 INFO util.GSet: 2% max memory = 19.33375 MB
14/03/10 00:57:18 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/03/10 00:57:18 INFO util.GSet: recommended=2097152, actual=2097152
14/03/10 00:57:18 INFO namenode.FSNamesystem: fsOwner=hadoop
14/03/10 00:57:18 INFO namenode.FSNamesystem: supergroup=supergroup
14/03/10 00:57:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/03/10 00:57:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000
14/03/10 00:57:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/03/10 00:57:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
14/03/10 00:57:20 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
14/03/10 00:57:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
Observe the result ★★★★
[hadoop@localhost tmp]$ pwd
/home/hadoop/tmp
[hadoop@localhost tmp]$ tree
.
└── dfs
└── name
├── current
│ ├── edits
│ ├── fsimage
│ ├── fstime
│ └── VERSION
└── image
└── fsimage
4 directories, 5 files
[hadoop@localhost tmp]$
./start-all.sh
[hadoop@localhost bin]$ jps
51166 NameNode
51561 TaskTracker
52208 Jps
51378 SecondaryNameNode
51266 DataNode
51453 JobTracker
[hadoop@localhost bin]$
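Besides jps, the daemons can be checked in a browser; on Hadoop 0.20 the NameNode and JobTracker web UIs listen on ports 50070 and 50030 by default (assuming the defaults were not overridden):
http://localhost:50070/   # NameNode / HDFS status
http://localhost:50030/   # JobTracker / MapReduce status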
Reference
http://freewxy.iteye.com/blog/1027569
Run the wordcount example
./hadoop dfs -ls /
Create a directory
./hadoop dfs -mkdir /haotest
[hadoop@localhost bin]$ vim test.txt
hello haoning,eiya haoning this is my first hadoop test ,god bless me
./hadoop dfs -copyFromLocal test.txt /haotest
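To confirm the file actually landed in HDFS before launching the job (just a verification step using the same dfs commands as above):
./hadoop dfs -ls /haotest
./hadoop dfs -cat /haotest/test.txt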
[hadoop@localhost hadoop]$ bin/hadoop jar hadoop-examples-0.20.2-CDH3B4.jar wordcount /haotest /output
14/03/10 01:15:47 INFO input.FileInputFormat: Total input paths to process : 1
14/03/10 01:15:48 INFO mapred.JobClient: Running job: job_201403100100_0002
14/03/10 01:15:49 INFO mapred.JobClient: map 0% reduce 0%
14/03/10 01:15:58 INFO mapred.JobClient: map 100% reduce 0%
14/03/10 01:16:08 INFO mapred.JobClient: map 100% reduce 100%
14/03/10 01:16:09 INFO mapred.JobClient: Job complete: job_201403100100_0002
14/03/10 01:16:09 INFO mapred.JobClient: Counters: 22
14/03/10 01:16:09 INFO mapred.JobClient: Job Counters
14/03/10 01:16:09 INFO mapred.JobClient: Launched reduce tasks=1
14/03/10 01:16:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8844
14/03/10 01:16:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/10 01:16:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/10 01:16:09 INFO mapred.JobClient: Launched map tasks=1
14/03/10 01:16:09 INFO mapred.JobClient: Data-local map tasks=1
14/03/10 01:16:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10370
14/03/10 01:16:09 INFO mapred.JobClient: FileSystemCounters
14/03/10 01:16:09 INFO mapred.JobClient: FILE_BYTES_READ=123
14/03/10 01:16:09 INFO mapred.JobClient: HDFS_BYTES_READ=161
14/03/10 01:16:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=93307
14/03/10 01:16:09 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=77
14/03/10 01:16:09 INFO mapred.JobClient: Map-Reduce Framework
14/03/10 01:16:09 INFO mapred.JobClient: Reduce input groups=10
14/03/10 01:16:09 INFO mapred.JobClient: Combine output records=10
14/03/10 01:16:09 INFO mapred.JobClient: Map input records=2
14/03/10 01:16:09 INFO mapred.JobClient: Reduce shuffle bytes=123
14/03/10 01:16:09 INFO mapred.JobClient: Reduce output records=10
14/03/10 01:16:09 INFO mapred.JobClient: Spilled Records=20
14/03/10 01:16:09 INFO mapred.JobClient: Map output bytes=97
14/03/10 01:16:09 INFO mapred.JobClient: Combine input records=10
14/03/10 01:16:09 INFO mapred.JobClient: Map output records=10
14/03/10 01:16:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=103
14/03/10 01:16:09 INFO mapred.JobClient: Reduce input records=10
[hadoop@localhost hadoop]$
[hadoop@localhost hadoop]$ bin/hadoop dfs -cat /output/part-r-00000
,god 1
bless 1
first 1
hadoop 1
haoning,this 1
hello 1
is 1
me 1
my 1
test 1
[hadoop@localhost hadoop]$
[hadoop@localhost hadoop]$ bin/hadoop dfs -rm /haotest
rm: Cannot remove directory "hdfs://localhost:9000/haotest", use -rmr instead
[hadoop@localhost hadoop]$ bin/hadoop dfs -rmr /haotest
Deleted hdfs://localhost:9000/haotest
[hadoop@localhost hadoop]$
bin/hadoop dfs -copyFromLocal bin/test.txt /haotest
[hadoop@localhost hadoop]$ bin/hadoop dfs -rmr /output
Deleted hdfs://localhost:9000/output
yum install mysql mysql-server mysql-devel
As root:
service mysqld start
chkconfig --list|grep mysql*
mysqladmin -u root password haoning
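A quick login check that the root password took effect (the password haoning comes from the mysqladmin command above):
mysql -u root -phaoning -e 'show databases;'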
openicf
Kettle
/data/hadoop/sqoop/sqoop-1.2.0-CDH3B4/lib
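Sqoop needs the MySQL JDBC driver on its classpath; copying the connector jar (mysql-connector-java-5.1.18.jar, listed in the version summary below) into this lib directory is the usual approach. A sketch, assuming the jar sits in the current directory:
cp mysql-connector-java-5.1.18.jar /data/hadoop/sqoop/sqoop-1.2.0-CDH3B4/lib/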
./sqoop list-tables --connect jdbc:mysql://localhost/mysql --username root --password haoning
sqoop import --connect jdbc:mysql://localhost/mysql --username root --password haoning --table active_uuid --hive-import
★★★★★★★★★
hive
http://www.juziku.com/wiki/6028.htm
export JAVA_HOME=/usr/local/java/jdk1.6.0_45
export HBASE_HOME=/data/hbase/hbase-install/hbase-0.94.13
export HAO=/data/haoning/mygit/mynginxmodule
export hao=/data/haoning/mygit/mynginxmodule/nginx_release/nginx-1.5.6
export mm=/data/haoning/mygit/mynginxmodule/
export nn=/usr/local/nginx_upstream/sbin
export ne=/usr/local/nginx_echo/
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/data/hadoop/hive/hive-0.8.1
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export HIVE_CONF_DIR=$HIVE_HOME/hive-conf
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:/usr/local/java/jdk1.6.0_45/bin:$HBASE_HOME/bin:$PATH
export CLASSPATH=/usr/local/java/jdk1.6.0_45/jre/lib/rt.jar:$HADOOP_HOME:.
Once these variables are configured, Hive works straight out of the extracted tarball.
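These exports are assumed to live in the hadoop user's shell profile (for example ~/.bashrc); after reloading it, hive should resolve from PATH:
source ~/.bashrc
which hive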
[hadoop@localhost bin]$ hive
Logging initialized using configuration in jar:file:/data/hadoop/hive/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201403100348_620639513.txt
hive> show tables;
OK
Time taken: 8.822 seconds
hive>
>
>
> create table abc(id int,name string);
OK
Time taken: 0.476 seconds
hive> select * from abc;
OK
Time taken: 0.297 seconds
hive>
###./sqoop list-tables --connect jdbc:mysql://localhost/mysql --username root --password haoning
###./sqoop import --connect jdbc:mysql://10.230.13.100/mysql --username root --password haoning --table user --hive-import
./sqoop list-tables --connect jdbc:mysql://10.230.13.100/test --username root --password haoning
Note the MySQL permissions:
the hadoop user must be able to reach MySQL from the remote IP.
In MySQL:
grant all privileges on *.* to root@'%' identified by "haoning";
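A quick way to confirm the grant actually allows access from the Hadoop host (IP and password taken from the commands above):
mysql -h 10.230.13.100 -u root -phaoning -e 'show databases;'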
use test;
create table haohao(id int(4) not null primary key auto_increment,name char(20) not null);
insert into haohao values(1,'hao');
./sqoop import --connect jdbc:mysql://10.230.13.100/test --username root --password haoning --table haohao --hive-import
Test result:
[hadoop@localhost bin]$ hive
Logging initialized using configuration in jar:file:/data/hadoop/hive/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201403102018_222421651.txt
hive> show tables;
OK
haoge
Time taken: 6.36 seconds
hive> select * from haoge
> ;
OK
1 hao
Time taken: 1.027 seconds
hive>
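The imported data ultimately sits in HDFS; with a default Hive configuration the warehouse directory is /user/hive/warehouse, so listing it shows the files Sqoop wrote (path assumes the default hive.metastore.warehouse.dir):
hadoop dfs -lsr /user/hive/warehouse/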
Issues to watch out for:
Permissions:
Hadoop, Hive and Sqoop should all be operated as the hadoop user, otherwise permission errors are reported.
The MySQL table must have a primary key.
Versions must match; four versions of Hive were tried.
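For a table without a primary key, Sqoop can usually still import it if forced to a single mapper (or given an explicit --split-by column); -m is a standard Sqoop 1.x flag, though its behavior on this exact 1.2.0-CDH3B4 build is an assumption. Illustrative only, reusing the haohao command from above:
./sqoop import --connect jdbc:mysql://10.230.13.100/test --username root --password haoning --table haohao --hive-import -m 1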
Research goal:
Try using Sqoop to import MySQL tables into Hive.
Versions used:
hadoop: hadoop-0.20.2-CDH3B4.tar.gz
sqoop: sqoop-1.2.0-CDH3B4.tar.gz
mysql jdbc: mysql-connector-java-5.1.18.jar
hive: hive-0.8.1.tar.gz (four Hive versions were tried; only this one worked)
Result:
MySQL tables were imported into Hive via Sqoop and stored in Hadoop (HDFS).
hadoop dfs -lsr /user/hadoop/