Stark_Summer

Integrating Tachyon with HDFS and Spark

Categories: tachyon, spark

Tags: spark, tachyon, hdfs, distributed testing

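Tachyon 0.7.1 pseudo-distributed cluster installation and testing:
http://blog.csdn.net/stark_summer/article/details/48321605
According to the official documentation, Spark 1.4.x is compatible with Tachyon 0.6.4, while the latest Tachyon, 0.7.1, is compatible with Spark 1.5.x. The setup used here is Spark 1.4.1 with Tachyon 0.7.1.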
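The version pairings quoted above can be written down as a small shell check. This is only a sketch: the table contains just the two pairings stated in this post, and `documented_tachyon_for` is a hypothetical helper name, not part of Spark or Tachyon.

```shell
# Pairings as stated in this post (quoted from the official documentation):
#   Spark 1.4.x -> Tachyon 0.6.4, Spark 1.5.x -> Tachyon 0.7.1
# Hypothetical helper: print the Tachyon version listed for a Spark version.
documented_tachyon_for() {
  case "$1" in
    1.4.*) echo "0.6.4" ;;
    1.5.*) echo "0.7.1" ;;
    *)     echo "unknown" ;;
  esac
}

# Versions used in this post.
spark_version=1.4.1
tachyon_version=0.7.1

expected=$(documented_tachyon_for "$spark_version")
if [ "$expected" = "$tachyon_version" ]; then
  echo "documented pairing"
else
  echo "not a documented pairing: Spark $spark_version is listed with Tachyon $expected"
fi
```

For the versions used here the check takes the second branch, which is worth keeping in mind when reading the rest of the test.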

Integrating Tachyon with HDFS

Edit tachyon-env.sh:

export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020
-Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
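To see what the data-folder option resolves to with this under-filesystem address, the variable expansion can be checked in a plain shell. This is a sketch only: it prints a string and does not touch Tachyon, and `data_folder_opt` is a hypothetical helper name.

```shell
# Under-filesystem address as set in tachyon-env.sh in this post.
TACHYON_UNDERFS_ADDRESS=hdfs://master:8020

# Hypothetical helper: print the JVM option exactly as the shell expands it.
data_folder_opt() {
  printf '%s\n' "-Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data"
}

data_folder_opt
# prints: -Dtachyon.data.folder=hdfs://master:8020/tmp/tachyon/data
```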

Upload the files to HDFS:

 hadoop fs -put /home/cluster/data/test/bank/ /data/spark/

 hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv

Read /data/spark/bank/bank-full.csv through Tachyon:

val bankFullFile = sc.textFile("tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv")
2015-09-11 20:08:20,136 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(177384) called with curMem=630803, maxMem=257918238
2015-09-11 20:08:20,137 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3 stored as values in memory (estimated size 173.2 KB, free 245.2 MB)
2015-09-11 20:08:20,154 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(17665) called with curMem=808187, maxMem=257918238
2015-09-11 20:08:20,155 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3_piece0 stored as bytes in memory (estimated size 17.3 KB, free 245.2 MB)
2015-09-11 20:08:20,156 INFO  [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_3_piece0 in memory on localhost:41040 (size: 17.3 KB, free: 245.9 MB)
2015-09-11 20:08:20,157 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 3 from textFile at <console>:21
bankFullFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:21

count

bankFullFile.count()
However, it produces the following warnings:
2015-09-11 21:34:31,494 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,489 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
(the worker-7 warning repeats many more times at 21:34:31,495 and 21:34:31,496)
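When a log is flooded with the same warning, it can help to count how often each executor thread emits it. The following is a minimal sketch using awk; `count_read_nothing` is a hypothetical helper, and the sample input is three of the lines from this post.

```shell
# Hypothetical helper: read a log on stdin and print, per executor thread,
# how many "Read nothing" warnings it logged.
count_read_nothing() {
  awk '/Read nothing/ {
         # Pull out the "worker-N" part of the thread name.
         if (match($0, /worker-[0-9]+/)) n[substr($0, RSTART, RLENGTH)]++
       }
       END { for (w in n) print w, n[w] }' | sort
}

count_read_nothing <<'EOF'
2015-09-11 21:34:31,494 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,489 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
EOF
# prints:
# worker-6 2
# worker-7 1
```

On a real run the same function can be fed the full driver or executor log file instead of the heredoc.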

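This error looks very strange. Does anyone know what causes it? One thing to note: the Spark 1.4.1 and Tachyon 0.7.1 combination used here is not one of the pairings the official documentation lists as compatible, which may be related.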

However, I can see the following in the Tachyon file system:

./bin/tachyon tfs ls /data/spark/bank/bank-full.csv/
4502.29 KB  09-11-2015 20:09:02:078  Not In Memory  /data/spark/bank/bank-full.csv/bank-full.csv

whereas bank-full.csv in HDFS is:

hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv
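The two listings can be cross-checked by size: the HDFS listing reports bytes, while the Tachyon listing reports KB. A minimal sketch, assuming Tachyon's KB means 1024 bytes (the numbers in this post are consistent with that); `bytes_to_kb` is a hypothetical helper.

```shell
# Hypothetical helper: convert a byte count to KB (1 KB = 1024 bytes),
# rounded to two decimals like the Tachyon listing.
bytes_to_kb() {
  awk -v b="$1" 'BEGIN { printf "%.2f KB\n", b / 1024 }'
}

# Size of bank-full.csv as reported by "hadoop fs -ls".
bytes_to_kb 4610348
# prints: 4502.29 KB
```

The result matches the 4502.29 KB shown by `tachyon tfs ls`, so the entry in Tachyon has the same size as the file in HDFS.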

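In fact, Tachyon itself has picked up bank-full.csv and placed it in its own file system at tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv (although the listing above reports it as Not In Memory).
This comes from the setting in Tachyon's conf/tachyon-env.sh: with export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020 configured, tachyon://localhost:19998 can reach the file at the given path in HDFS.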
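The path relationship described here can be sketched as a small helper that turns an absolute HDFS path into a tachyon:// URI. This assumes the path is reused unchanged under the Tachyon master address; note that the listing in this post shows the file under an extra bank-full.csv/ directory, so the result is not guaranteed to match what Tachyon actually holds. `to_tachyon_uri` and `TACHYON_MASTER` are hypothetical names.

```shell
# Tachyon master host:port as used in this post.
TACHYON_MASTER=master:19998

# Hypothetical helper: prefix an absolute path with the tachyon:// scheme
# and master address. Relative paths are rejected.
to_tachyon_uri() {
  case "$1" in
    /*) printf 'tachyon://%s%s\n' "$TACHYON_MASTER" "$1" ;;
    *)  echo "path must be absolute: $1" >&2; return 1 ;;
  esac
}

to_tachyon_uri /data/spark/bank/bank-full.csv
# prints: tachyon://master:19998/data/spark/bank/bank-full.csv
```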

All right, then I will first read the file through HDFS and save it to Tachyon:

scala> val bankfullfile =  sc.textFile("/data/spark/bank/bank-full.csv")
scala> bankfullfile.count
res0: Long = 45212
scala> bankfullfile.saveAsTextFile("tachyon://master:19998/data/spark/bank/newbankfullfile")

Unfinished, to be continued.

2015-09-22 15:16
Comments (1)

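#1 haorengoodman 2015-12-17

Can Tachyon be used to reorganize data?
For example, I have a pile of HDFS files that I load into Tachyon, but the existing directory structure no longer meets current requirements and needs to be re-partitioned. Is that possible?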
