
In-Memory Hadoop Accelerator

 
Read more: https://gridgaintech.wordpress.com/2013/11/07/hadoop-100x-faster-how-we-did-it/

 

Almost two years ago, Dmitriy and I stood in front of a whiteboard at GridGain’s office thinking: “How can we deliver the real-time performance of GridGain’s in-memory technology to Hadoop customers without asking them to rip and replace their systems, and without asking them to move their datasets off Hadoop?”

Given Hadoop’s architecture, the task seemed daunting, and it proved to be one of the more challenging engineering puzzles we have had to solve.

After two years of development, tens of thousands of lines of Java, Scala and C++ code, multiple design iterations, several releases and dozens of benchmarks, we finally built a product that can deliver real-time performance to Hadoop customers with seamless integration and no tedious ETL. Actual customer deployments can now prove our performance claims and validate our product’s architecture.

Here’s how we did it.

The Idea – In-Memory Hadoop Accelerator

Hadoop is based on two primary technologies: HDFS for storing data, and MapReduce for processing that data in parallel. Everything else in Hadoop and the Hadoop ecosystem sits atop these foundation blocks.

Neither HDFS nor MapReduce was originally designed with real-time performance in mind. In order to deliver real-time processing without moving data out of Hadoop onto another platform, we had to improve the performance of both of these subsystems.

We decided to develop a high-performance in-memory file system which would provide 100% compatibility with HDFS and an optimized MapReduce implementation which would take advantage of this real-time file system. By doing so, we could offer all the advantages of our in-memory platform while minimizing the disruption of our customers’ existing Hadoop investments.

There are many projects and products that aim to improve Hadoop performance. Projects like HDFS2, Apache Tez, Cloudera Impala, Hortonworks Stinger, ScaleOut hServer and Apache Spark, to name but a few, all aim to solve Hadoop performance issues in various ways. GridGain puts a new spin on some of these approaches, delivering unmatched performance gains while fanatically maintaining our commitment to letting customers change as little code as possible and quickly get the benefits an in-memory computing platform can bring to their big data installations.

From a technology standpoint, GridGain’s In-Memory Hadoop Accelerator has some similarity to the architecture of Spark (optimized MapReduce), ScaleOut and HDFS2 (in-memory caching without ETL), and some features of Apache Tez (in-process execution). However, GridGain’s In-Memory Accelerator is the only product for Hadoop available today that combines both a high-performance HDFS-compatible file system and an optimized in-memory MapReduce, along with many other features, in one fully integrated product.

In-Memory File System

First, we implemented GridGain’s In-Memory File System (GGFS) to accelerate I/O in the Hadoop stack. The original idea was that GGFS alone would be enough to deliver a significant performance increase. However, while we saw significant performance gains using GGFS, when working with our customers we quickly found some not-so-obvious performance limitations in the way Hadoop performs MapReduce. It became clear to us that GGFS alone would not be enough, but it was a critical piece that we needed to build first.

Note that you shouldn’t confuse GGFS with much slower alternatives like a RAM disk. GGFS is based on our Memory-First architecture and addresses more than just the seek time of the “device”.

From the get-go we designed GGFS to support both Hadoop v1 and YARN-based Hadoop v2. Further, we designed GGFS to work in two modes:

  • Primary (standalone), and
  • Secondary (caching HDFS).

In primary (standalone) mode, GGFS acts as a bona fide Hadoop file system that is plug-and-play compatible with the standard HDFS interface. Our customers use it to deploy a high-performance in-memory Hadoop cluster and use it as any other Hadoop file system, albeit one that trades capacity for maximum performance.
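To make the plug-and-play claim concrete, below is a minimal client-side sketch of using an HDFS-compatible file system through Hadoop’s standard FileSystem API. The `ggfs://` URI scheme and the implementation class name are illustrative assumptions rather than GridGain’s documented values; the point is that application code is identical to what it would be against plain HDFS, and only configuration changes.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GgfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical binding of a "ggfs" scheme to the accelerator's
        // FileSystem implementation; in practice this lives in core-site.xml
        // and the class name comes from the product documentation.
        conf.set("fs.ggfs.impl", "org.gridgain.grid.ggfs.hadoop.GridGgfsHadoopFileSystem");

        // From here on, this is the exact same FileSystem API any Hadoop
        // client or job would use against HDFS.
        FileSystem fs = FileSystem.get(URI.create("ggfs://localhost/"), conf);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/data/input.txt"))))) {
            System.out.println(in.readLine());
        }
    }
}
```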

One of the great added benefits of the primary mode is that it does away with the NameNode in the Hadoop deployment. A standard Hadoop deployment requires shared storage for the primary and secondary NameNodes, usually implemented as a complex NFS setup mounted on each NameNode machine. GGFS, by contrast, seamlessly utilizes GridGain’s In-Memory Database under the hood to provide completely automatic scaling and failover without any need for additional shared storage or risky Single Point Of Failure (SPOF) architectures.

Furthermore, unlike Hadoop’s master-slave design for NameNodes, which prevents Hadoop systems from scaling linearly when adding new nodes, GGFS is built on a highly scalable, natively distributed, partitioned data store which provides linear scalability and auto-discovery of new nodes joining the cluster. Removing the NameNode from the architecture enables dramatically better I/O performance.
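As a rough illustration of why a partitioned namespace scales where a single NameNode does not, consider this simplified sketch (our own illustration, not GridGain code): any node can compute the owner of a path locally from a hash, so metadata capacity and throughput grow with the cluster, and there is no central lookup to saturate or fail.

```java
import java.util.List;

class PartitionedNamespace {
    private final List<String> nodes; // live cluster members (auto-discovered)

    PartitionedNamespace(List<String> nodes) {
        this.nodes = nodes;
    }

    // Hash partitioning: every node computes the same owner for a path
    // without consulting a master, so there is no NameNode-style SPOF.
    // (A real system also rebalances when membership changes.)
    String ownerOf(String path) {
        int bucket = Math.floorMod(path.hashCode(), nodes.size());
        return nodes.get(bucket);
    }

    public static void main(String[] args) {
        PartitionedNamespace ns =
            new PartitionedNamespace(List.of("node-a", "node-b", "node-c"));
        System.out.println(ns.ownerOf("/data/logs/part-00001"));
    }
}
```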

GGFS’s primary mode provides maximum performance for I/O operations but requires moving data from disk-based HDFS to memory-based GGFS (i.e. from one file system to another). While such data movement may be appropriate for some use cases, we support another operating mode in which absolutely no ETL is required and data never has to leave HDFS. In this mode, GGFS works as an intelligent secondary in-memory distributed cache over the primary disk-based HDFS file system.

In this secondary mode, GGFS supports both synchronous and asynchronous read-through and write-through to and from HDFS, providing either strong consistency or better performance in exchange for relaxed consistency, with absolute transparency to the user and to applications running on top of it. Users can manually select which files and/or directories should be stored in GGFS, and which mode – synchronous or asynchronous – should be used for each of them for read-through and write-through to and from HDFS.
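Below is a conceptual sketch of that per-path, sync-or-async read-through/write-through behavior, reduced to a simple key-to-bytes model. It does not reflect GridGain’s actual classes or configuration: synchronous writes block until the backing HDFS acknowledges (strong consistency), while asynchronous writes are flushed in the background (lower latency, relaxed consistency).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WriteThroughCacheSketch {
    enum Mode { SYNC, ASYNC }

    interface BackingStore {            // stands in for disk-based HDFS
        void write(String path, byte[] data);
        byte[] read(String path);
    }

    private final Map<String, byte[]> memory = new ConcurrentHashMap<>();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private final BackingStore hdfs;
    private final Map<String, Mode> pathModes; // per-file/directory mode selection

    WriteThroughCacheSketch(BackingStore hdfs, Map<String, Mode> pathModes) {
        this.hdfs = hdfs;
        this.pathModes = pathModes;
    }

    void write(String path, byte[] data) {
        memory.put(path, data); // data always lands in memory first
        if (pathModes.getOrDefault(path, Mode.SYNC) == Mode.SYNC) {
            hdfs.write(path, data);                       // block: strong consistency
        } else {
            flusher.submit(() -> hdfs.write(path, data)); // background: relaxed consistency
        }
    }

    byte[] read(String path) {
        // Read-through: on a miss, load from HDFS and keep the result in memory.
        return memory.computeIfAbsent(path, hdfs::read);
    }
}
```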

Another interesting feature of GGFS is its smart use of block-level and file-level caching and eviction. When working in primary mode, GGFS utilizes file-level caching to ensure corruption-free storage (a file is either fully in GGFS or not at all). When in secondary mode, GridGain automatically switches to block-level caching and eviction. What we discovered when working with our customers on real-world Hadoop payloads is that files on HDFS are often accessed non-uniformly, i.e. there is significant “locality” in how portions of a file are accessed. Put another way, certain blocks of a file are accessed more frequently than others. That observation led to our block-level caching implementation for the secondary mode, which enables dramatically better memory utilization since GGFS can store only the most frequently used file blocks in memory – not entire files, which in Hadoop can easily measure in the hundreds of gigabytes.

Caching cannot work effectively without a sophisticated eviction mechanism to make sure that memory is used optimally, so we built a new and technically robust eviction mechanism into our platform. Beyond the obvious eviction features, you can, for example, configure certain files to never be evicted, preserving them in memory in all cases for maximum performance.
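The next sketch combines the two ideas from the last two paragraphs in deliberately simplified form (again our own illustration, not GridGain’s implementation): a cache keyed by (file, block index) rather than by whole files, with LRU eviction that exempts pinned files.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class BlockCacheSketch {
    record BlockKey(String path, long blockIndex) {}

    private final Set<String> pinnedPaths; // files configured to never be evicted
    private final int maxBlocks;
    private final Map<BlockKey, byte[]> blocks;

    BlockCacheSketch(int maxBlocks, Set<String> pinnedPaths) {
        this.maxBlocks = maxBlocks;
        this.pinnedPaths = pinnedPaths;
        // An access-ordered LinkedHashMap gives simple LRU semantics: hot
        // blocks of a 100+ GB file stay in memory while cold blocks fall out.
        this.blocks = new LinkedHashMap<BlockKey, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<BlockKey, byte[]> eldest) {
                // Evict the least-recently-used block unless its file is pinned.
                // (A production cache would scan past a pinned eldest entry.)
                return size() > maxBlocks && !pinnedPaths.contains(eldest.getKey().path());
            }
        };
    }

    void put(String path, long blockIndex, byte[] data) {
        blocks.put(new BlockKey(path, blockIndex), data);
    }

    byte[] get(String path, long blockIndex) {
        return blocks.get(new BlockKey(path, blockIndex)); // null on a miss
    }
}
```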

To ensure seamless and continuous performance during MapReduce file scanning, we’ve implemented smart data prefetching: data that is expected to be read in the near future is streamed to the MapReduce task ahead of time. By doing so, GGFS ensures that whenever a MapReduce task finishes reading a file block, the next file block is already available in memory. A significant performance boost was achieved here thanks to our proprietary Inter-Process Communication (IPC) implementation, which allows GGFS to achieve throughput of up to 30 Gbit/s between two processes.
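Here is a minimal sketch of that read-ahead pattern (illustrative only; GGFS’s actual streaming and IPC machinery are proprietary): while the consumer works on block N, block N+1 is already being fetched on a background thread, so `readNextBlock()` rarely has to wait.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

class PrefetchingReaderSketch {
    private final LongFunction<byte[]> fetchBlock; // loads one block by index
    private final ExecutorService prefetcher = Executors.newSingleThreadExecutor();
    private Future<byte[]> inFlight;               // the read-ahead in progress
    private long nextIndex;

    PrefetchingReaderSketch(LongFunction<byte[]> fetchBlock) {
        this.fetchBlock = fetchBlock;
        this.inFlight = prefetcher.submit(() -> fetchBlock.apply(0)); // warm up block 0
    }

    byte[] readNextBlock() throws Exception {
        byte[] current = inFlight.get(); // usually already complete by now
        final long upcoming = ++nextIndex;
        inFlight = prefetcher.submit(() -> fetchBlock.apply(upcoming)); // read ahead
        return current;
    }

    void close() {
        prefetcher.shutdownNow();
    }
}
```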

The table below shows GGFS vs. HDFS (on flash-based SSDs) benchmark results for raw I/O operations:

Benchmark          | GGFS, ms. | HDFS, ms. | Boost, %
-------------------|-----------|-----------|---------
File Scan          |        27 |       667 |    2,470
File Create        |        96 |       961 |    1,001
File Random Access |       413 |     2,931 |      710
File Delete        |       185 |     1,234 |      667

The above tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, a 10GbE network fabric, and a stock, unmodified Apache Hadoop 2.x distribution.

As you can see from these results, the I/O performance difference is quite significant. However, HDFS performance as a file system is only a part of Hadoop’s overhead. Another part, no less significant, is the MapReduce overhead. That is what we addressed with In-Memory MapReduce.

In-Memory MapReduce

Once we had our high performance in-memory file system built and tested, we turned our attention to a MapReduce implementation that would take advantage of in-memory technology.

Hadoop’s MapReduce design is one of the weakest points in Hadoop: it is simply an inefficient design when it comes to distributed processing. GridGain’s In-Memory MapReduce implementation relies heavily on seven years of experience developing our widely deployed In-Memory HPC product. GridGain’s In-Memory MapReduce is designed around a record-based approach rather than the key-value approach of traditional MapReduce, and it enables a much more streamlined parallel execution path over data stored in the in-memory file system.

Furthermore, In-Memory MapReduce eliminates the standard overhead associated with typical Hadoop job tracker polling and task tracker process creation, deployment and provisioning. All in all, GridGain’s In-Memory MapReduce is a highly optimized, HPC-based implementation of the MapReduce concept, enabling true low-latency processing of data stored in GGFS.

The diagram below demonstrates the difference between a standard Hadoop MapReduce execution path and GridGain’s In-Memory MapReduce execution path:

[Diagram: standard Hadoop MapReduce execution path vs. GridGain In-Memory MapReduce execution path]

As seen in this diagram, our MapReduce implementation supports a direct execution path from client to data node. Moreover, all execution in GridGain happens in-process, with deployment handled automatically and transparently by GridGain.
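To make the in-process execution model concrete, here is a deliberately tiny word-count sketch that runs map-style tasks as parallel threads inside the JVM that already holds the data: no job tracker polling, no per-task process creation, and no network shuffle. This illustrates the execution model only; it is not GridGain’s engine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InProcessMapReduceSketch {
    static Map<String, Long> wordCount(List<String> lines) {
        Map<String, Long> counts = new ConcurrentHashMap<>();
        // "Map" runs as lightweight in-process parallel tasks over in-memory
        // records; "reduce" is a merge into a shared concurrent map rather
        // than a sort-and-shuffle across processes and the network.
        lines.parallelStream()
             .flatMap(line -> Arrays.stream(line.split("\\s+")))
             .filter(word -> !word.isEmpty())
             .forEach(word -> counts.merge(word, 1L, Long::sum));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("fast hadoop", "fast in-memory hadoop")));
        // {fast=2, hadoop=2, in-memory=1}
    }
}
```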

In-Memory MapReduce also provides integration capability for MapReduce code written in any Hadoop-supported language, not only native Java or Scala. Developers can easily reuse existing C/C++/Python or other MapReduce code with our In-Memory Accelerator for Hadoop to gain a significant performance boost.

Finally, because we remove job and task tracker polling, out-of-process execution, and the often unnecessary shuffling and sorting from MapReduce, while letting our product work with a high-performance in-memory file system, we see 10x to 100x performance increases on typical MapReduce payloads. This is not just theory: our tests and our customers confirm it.
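As a sketch of what “minimal integration” means in practice, here is the classic Hadoop WordCount job, unchanged. Pointing a job like this at an accelerated file system and MapReduce implementation is a matter of cluster configuration (the specific GridGain property names are product-specific and omitted here), not of code changes.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Configuration is read from the cluster's site files; that is the
        // only place where an accelerated deployment differs.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```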

Below are the results for one of our internal tests that utilizes both the In-Memory File System and In-Memory MapReduce. This test was specifically designed to show the maximum performance of GridGain’s Accelerator vs. a stock Hadoop distribution for I/O-heavy MapReduce jobs:

Nodes | Hadoop, ms. | Hadoop + GridGain Accelerator, ms. | Boost, %
------|-------------|------------------------------------|---------
5     |     298,000 |                             11,622 |    2,564
10    |     201,350 |                              5,537 |    3,636
15    |     158,997 |                              2,385 |    6,667
20    |     122,008 |                              1,647 |    7,407
30    |      97,833 |                              1,174 |    8,333
40    |      82,771 |                                780 |   10,612

[Chart: Hadoop vs. Hadoop + GridGain Accelerator job times across cluster sizes]

Tests were performed on a cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, a 10GbE network fabric, a stock, unmodified Apache Hadoop 2.x distribution, and the GridGain 5.2 release.

Management and Monitoring

No serious distributed system can be used without comprehensive DevOps support, and the In-Memory Accelerator for Hadoop comes with a comprehensive, unified, GUI-based management and monitoring tool called GridGain Visor. Over the last 12 months we’ve added significant Hadoop Accelerator support to Visor.

Visor provides deep DevOps capabilities, including an operations and telemetry dashboard, database and compute grid management, and GGFS management that provides GGFS monitoring and file management across HDFS, local and GGFS file systems.

[Screenshots: Visor file manager and GGFS monitoring views]

As part of GridGain Visor, the In-Memory Accelerator for Hadoop also comes with a GUI-based file system profiler, which allows you to keep track of all operations your GGFS or HDFS file systems perform and identifies potential hot spots.

The GGFS profiler tracks the speed and throughput of reads, writes and various directory operations for all files, and displays these metrics in a convenient view that lets you sort by any profiled criterion, e.g. from slowest write to fastest. The profiler also makes suggestions whenever performance can be gained by loading file data into in-memory GGFS.

[Screenshot: GGFS profiler view]

Conclusion

After almost two years of development, we have a well-rounded product that can help you accelerate Hadoop MapReduce by up to 100x with minimal integration effort. It’s based on our innovative high-performance in-memory file system and in-memory MapReduce implementation, coupled with one of the best management and monitoring tools available.

If you want to be able to say the words “milliseconds” and “Hadoop” in the same sentence, you need to take a serious look at GridGain’s In-Memory Hadoop Accelerator.
