Converting Between Spark RDDs and DataFrames

lingzhi007

RDD -> DF

There are two ways to do this.

1. Inferring the Schema Using Reflection

Map the RDD[T] to an RDD of case class instances, then call toDF().

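// Assumed definition; declare the case class at top level, outside the method that uses it
case class Person(name: String, age: Int)

import spark.implicits._ // enables .toDF() on RDDs

val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()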

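An RDD can also be converted directly to a Dataset. This requires the implicit conversions brought in by import spark.implicits._

If the RDD's elements are tuples, the resulting columns are named by position: _1, _2, and so on.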
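The points above can be sketched as follows. This is a minimal illustration, not code from the original post: the local-mode SparkSession, the sample data, and the Person case class are all assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class for this sketch; defined at top level
case class Person(name: String, age: Int)

object TupleAndDatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-df-sketch")
      .master("local[*]") // assumption: local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._ // brings toDF / toDS into scope

    // Hypothetical sample data
    val tupleRDD = spark.sparkContext.parallelize(Seq(("Michael", 29), ("Andy", 30)))

    // Tuples with no names supplied: columns default to _1, _2
    val defaultDF = tupleRDD.toDF()
    assert(defaultDF.columns.sameElements(Array("_1", "_2")))

    // Passing names to toDF replaces the positional defaults
    val namedDF = tupleRDD.toDF("name", "age")
    assert(namedDF.columns.sameElements(Array("name", "age")))

    // RDD of case class instances -> typed Dataset[Person]
    val peopleDS = tupleRDD.map { case (n, a) => Person(n, a) }.toDS()
    peopleDS.show()

    spark.stop()
  }
}
```

Passing column names to toDF is usually the easier option when the tuple fields have an obvious meaning, since _1 and _2 are awkward to use in SQL.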

2. Programmatically Specifying the Schema

Build the column metadata (a StructType schema) and pass it to createDataFrame together with an RDD of Rows.

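import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")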

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
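The schema in this example types every column as StringType. If a numeric age column is wanted, one possible variation is sketched here. The in-memory sample lines (mirroring the "name, age" layout), the local-mode session, and the query are assumptions for illustration only.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object TypedSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-schema-sketch")
      .master("local[*]") // assumption: local mode
      .getOrCreate()

    // Hypothetical lines in the same "name, age" layout
    val peopleRDD = spark.sparkContext.parallelize(Seq("Michael, 29", "Andy, 30"))

    // Declare age as an integer column instead of a string
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))

    // Each Row value must match the declared column type, so parse age to Int
    val rowRDD = peopleRDD
      .map(_.split(","))
      .map(a => Row(a(0), a(1).trim.toInt))

    val peopleDF = spark.createDataFrame(rowRDD, schema)
    peopleDF.createOrReplaceTempView("people")

    // The temporary view can now be queried with SQL, using numeric comparison
    val adults = spark.sql("SELECT name FROM people WHERE age >= 30")
    assert(adults.collect().map(_.getString(0)).sameElements(Array("Andy")))

    spark.stop()
  }
}
```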

DF -> RDD

// .rdd returns the underlying RDD[Row]
val tt = teenagersDF.rdd
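As a rough sketch of what can be done with the resulting RDD[Row], fields can be read back by column name. The teenagersDF built here is a hypothetical stand-in with assumed columns name and age, and the session runs in local mode.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

object DataFrameToRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("df-to-rdd-sketch")
      .master("local[*]") // assumption: local mode
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for teenagersDF
    val teenagersDF = Seq(("Justin", 19)).toDF("name", "age")

    // DataFrame -> RDD[Row]
    val rowRDD: RDD[Row] = teenagersDF.rdd

    // Read fields by name; the type parameter must match the column type
    val names: RDD[String] = rowRDD.map(row => row.getAs[String]("name"))
    val ages: RDD[Int] = rowRDD.map(row => row.getAs[Int]("age"))

    assert(names.collect().sameElements(Array("Justin")))
    assert(ages.collect().sameElements(Array(19)))

    spark.stop()
  }
}
```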

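Converting an RDD to a Dataset can fail with an error saying that toDS is not available on RDD[object].

A safer approach is to build the schema explicitly: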

val schema = new StructType()
  .add(StructField("client_date", StringType, true))
  .add(StructField("client_time", StringType, true))
  .add(StructField("server_date", StringType, true))
  .add(StructField("server_time", StringType, true))

......

Then:

import spark.implicits._
// cubesRDD must be an RDD[Row] whose fields line up with the schema
val cubesDF = spark.createDataFrame(cubesRDD, schema)
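Put together, the explicit-schema approach might look like the sketch below. It uses only the four fields visible above (the elided fields are not guessed), and the input line, the comma delimiter, and the local-mode session are assumptions for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ExplicitSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explicit-schema-sketch")
      .master("local[*]") // assumption: local mode
      .getOrCreate()

    // Only the four fields shown in the post
    val schema = new StructType()
      .add(StructField("client_date", StringType, true))
      .add(StructField("client_time", StringType, true))
      .add(StructField("server_date", StringType, true))
      .add(StructField("server_time", StringType, true))

    // Hypothetical comma-separated input
    val lines = spark.sparkContext.parallelize(Seq(
      "2016-08-07,18:04:00,2016-08-07,18:04:01"))

    // Build an RDD[Row]; field order must match the schema
    val cubesRDD = lines
      .map(_.split(","))
      .map(p => Row(p(0), p(1), p(2), p(3)))

    val cubesDF = spark.createDataFrame(cubesRDD, schema)

    assert(cubesDF.columns.sameElements(
      Array("client_date", "client_time", "server_date", "server_time")))
    assert(cubesDF.count() == 1L)

    spark.stop()
  }
}
```

Because createDataFrame(rowRDD, schema) takes plain Rows, it does not depend on an implicit encoder being found for the element type, which is presumably why it avoids the toDS problem described above.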

2016-08-07 18:04