jiezhu2007
spark introduction

What is Spark?
Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write.

To run programs faster, Spark provides primitives for in-memory cluster computing. Your job can load data into memory and query it repeatedly, much more quickly than with disk-based systems like Hadoop MapReduce.
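To make the load-once, query-many idea concrete, here is a minimal plain-Python sketch that does not use Spark at all. `DiskQueries` and `CachedQueries` are hypothetical classes for illustration: the first re-reads and re-parses the file for every query (roughly how a chain of disk-based jobs treats its input), while the second parses the file once and answers every later query from memory. In Spark, the analogous step would be calling `cache()` on an RDD before querying it repeatedly.

```python
import os
import tempfile


def write_sample_file(path, n):
    """Write the integers 0..n-1, one per line, to stand in for a dataset on disk."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


def load(path):
    """Read and parse the whole file: the expensive 'disk' step."""
    with open(path) as f:
        return [int(line) for line in f]


class DiskQueries:
    """Disk-based style (hypothetical): every query re-reads and re-parses the file."""

    def __init__(self, path):
        self.path = path
        self.loads = 0

    def query(self, predicate):
        self.loads += 1
        return sum(1 for x in load(self.path) if predicate(x))


class CachedQueries:
    """In-memory style (hypothetical): load once, then answer every query from RAM."""

    def __init__(self, path):
        self.data = load(path)
        self.loads = 1

    def query(self, predicate):
        return sum(1 for x in self.data if predicate(x))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    write_sample_file(path, 1000)
    predicates = [lambda x: x % 2 == 0, lambda x: x > 900, lambda x: x < 10]

    disk = DiskQueries(path)
    cached = CachedQueries(path)
    disk_results = [disk.query(p) for p in predicates]
    cached_results = [cached.query(p) for p in predicates]

print(disk_results, disk.loads)      # [500, 99, 10] 3
print(cached_results, cached.loads)  # [500, 99, 10] 1
```

Both versions return the same answers; only the number of file loads differs, and that gap grows with every additional query.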

To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.
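As a rough illustration of the functional style these APIs encourage, the sketch below defines a hypothetical single-machine `MiniRDD` class (not part of Spark) whose method names echo Spark RDD operations such as `flatMap`, `map`, `filter`, `reduceByKey` and `collect`. Unlike real Spark, it evaluates every step eagerly in one process; it only shows how a word count reads as a short chain of transformations.

```python
class MiniRDD:
    """Hypothetical single-machine stand-in for a Spark RDD (illustration only)."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, f):
        # Apply f to every element.
        return MiniRDD(f(x) for x in self.items)

    def flatMap(self, f):
        # Apply f to every element and flatten the resulting iterables.
        return MiniRDD(y for x in self.items for y in f(x))

    def filter(self, pred):
        # Keep only elements for which pred is true.
        return MiniRDD(x for x in self.items if pred(x))

    def reduceByKey(self, f):
        # Combine values that share a key, keeping first-seen key order.
        acc = {}
        for k, v in self.items:
            acc[k] = f(acc[k], v) if k in acc else v
        return MiniRDD(acc.items())

    def collect(self):
        return list(self.items)


lines = ["to be or not to be", "to run fast"]
counts = (MiniRDD(lines)
          .flatMap(lambda line: line.split())
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b)
          .collect())
print(counts)  # [('to', 3), ('be', 2), ('or', 1), ('not', 1), ('run', 1), ('fast', 1)]
```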

What can it do?
Spark was initially developed for two applications where keeping data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can run up to 100x faster than Hadoop MapReduce. However, you can use Spark for general data processing too.
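Iterative algorithms benefit because they scan the same dataset on every pass. The sketch below is a plain-Python 1-D k-means (Lloyd's algorithm), chosen here as an assumed stand-in for the machine-learning workloads mentioned above; it is not taken from Spark or MLlib. Each iteration rescans the in-memory `points` list. In a disk-based MapReduce pipeline, each pass would typically be a separate job that rereads its input, which is the cost that keeping the data in memory avoids.

```python
def kmeans_1d(points, centers, iterations):
    """Lloyd's k-means on 1-D data; every iteration rescans the same in-memory list."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(kmeans_1d(points, centers=[0.0, 5.0], iterations=5))  # [2.0, 11.0]
```

Here the centers settle after two passes; on real data the number of passes is usually larger, which is why rereading the input on each pass becomes expensive.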

Spark is also the engine behind Shark, a fully Apache Hive-compatible data warehousing system that can run 100x faster than Hive.

While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.

Who uses it?
Spark was developed in the UC Berkeley AMPLab. It's used by several groups of researchers at Berkeley to run large-scale applications such as spam filtering and traffic prediction. It's also used to accelerate data analytics at Conviva, Quantifind, and other companies; in total, 14 companies have contributed to Spark! Spark is open source under a BSD license, so you can download it and try it out.