DataFrame and Dataset API

lingzhi007

Blog category: Spark learning


DataFrame

scala> teenagersDF

res14: org.apache.spark.sql.DataFrame = [name: string, age: bigint]

scala> teenagersDF.

!= flatMap repartition

## foreach rollup

+ foreachPartition sample

-> formatted schema

== getClass select

agg groupBy selectExpr

alias groupByKey show

apply hashCode sort

as head sortWithinPartitions

asInstanceOf inputFiles sparkSession

cache intersect sqlContext

coalesce isInstanceOf stat

col isLocal synchronized

collect isStreaming take

collectAsList javaRDD takeAsList

columns join toDF

count joinWith toJSON

createOrReplaceTempView limit toJavaRDD

createTempView map toLocalIterator

cube mapPartitions toString

describe na transform

distinct ne union

drop notify unionAll

dropDuplicates notifyAll unpersist

dtypes orderBy wait

ensuring persist where

eq printSchema withColumn

equals queryExecution withColumnRenamed

except randomSplit write

explain randomSplitAsList writeStream

explode rdd →

filter reduce

first registerTempTable

Dataset

In the Scala API, DataFrame is simply a type alias of Dataset[Row].
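Because DataFrame is only an alias, a DataFrame value can be assigned to a Dataset[Row] variable with no conversion. The following is a minimal sketch of that, assuming spark-sql is on the classpath and a local master is acceptable; the Person case class, the sample rows, and the object name AliasSketch are hypothetical and do not come from the original session.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type for the typed view; not part of the original post.
case class Person(name: String, age: Long)

object AliasSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is sufficient for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the same column names as teenagersDF.
    val df: DataFrame = Seq(("Justin", 19L), ("Andy", 30L)).toDF("name", "age")

    // Compiles with no conversion because DataFrame = Dataset[Row].
    val rows: Dataset[Row] = df

    // A typed Dataset, by contrast, needs an explicit conversion via an encoder.
    val people: Dataset[Person] = df.as[Person]

    rows.printSchema()
    people.filter(p => p.age >= 13 && p.age <= 19).show()

    spark.stop()
  }
}
```

The assignment to rows is the point of the sketch: if DataFrame and Dataset[Row] were distinct types, that line would not compile.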

scala> val df = spark.read.json("examples/src/main/resources/people.json")

scala> df

res13: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.

agg foreachPartition sample

alias groupBy schema

apply groupByKey select

as head selectExpr

cache inputFiles show

coalesce intersect sort

col isLocal sortWithinPartitions

collect isStreaming sparkSession

collectAsList javaRDD sqlContext

columns join stat

count joinWith take

createOrReplaceTempView limit takeAsList

createTempView map toDF

cube mapPartitions toJSON

describe na toJavaRDD

distinct orderBy toLocalIterator

drop persist toString

dropDuplicates printSchema transform

dtypes queryExecution union

except randomSplit unionAll

explain randomSplitAsList unpersist

explode rdd where

filter reduce withColumn

first registerTempTable withColumnRenamed

flatMap repartition write

foreach rollup writeStream

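The two objects have the same type, so why are the listed methods not exactly the same? Comparing the two listings, every Dataset method appears in both. The extra entries in the first listing (!=, ##, +, ->, ==, asInstanceOf, ensuring, eq, equals, formatted, getClass, hashCode, isInstanceOf, ne, notify, notifyAll, synchronized, wait, →) are not Dataset methods: they are universal members inherited from Any and AnyRef or added by Predef implicit conversions. The difference most likely comes from how tab completion was invoked in the REPL (for example, a second Tab press shows universal members), not from the objects themselves.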
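As a rough check that such names are available on any Scala object and are not specific to Dataset, here is a small sketch in plain Scala with no Spark dependency. The class Plain and the object UniversalMembers are hypothetical stand-ins introduced only for this illustration.

```scala
object UniversalMembers {
  // Stand-in class with no methods of its own (hypothetical, not a Spark type).
  final class Plain

  // Names of the public methods declared on java.lang.Object, via Java reflection.
  val fromObject: Set[String] =
    classOf[Object].getMethods.map(_.getName).toSet

  // Some of the names that show up only in the longer of the two listings.
  val listedOnlyOnce: Set[String] =
    Set("getClass", "hashCode", "equals", "notify", "notifyAll", "wait")

  def main(args: Array[String]): Unit = {
    // All of these names come from java.lang.Object.
    println(listedOnlyOnce.subsetOf(fromObject))

    val p = new Plain
    // -> and ensuring are added to every value by Predef implicit conversions.
    val pair = p -> "x"
    val same = p.ensuring(_ != null)
    println(pair._2)
    println(same eq p)
  }
}
```

Even though Plain defines nothing, all of these calls compile, which is consistent with the extra entries being inherited or implicit members and not part of the Dataset API.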


2016-08-07 18:12