Finding much faster ways to complete Hadoop queries for enterprise users is the aim of "Drill," the latest open-source project being undertaken by the Apache Software Foundation.
Drill has been established as an Apache Incubator project, opening its continued development to software engineers around the world, according to Tomer Shiran, director of product management for Hadoop vendor MapR Technologies, one of the backers of the Apache Drill project.
The Drill project will work to create an open-source version of Google's Dremel, the tool Google uses internally to speed up analysis of its very large data sets.
"We've spent quite a few months talking to lots of organizations and potential users of Drill and to our customer base as well," said Shiran, who is a founding member of the Drill project. "We wanted to put this out there as an open-source project, rather than just keep it within MapR for our use alone."
Drill aids Hadoop users by enabling vastly quicker queries of huge data sets, said Shiran.
"With Drill, you'll be able to get really fast responses," he said. Users will be able to get responses within one second, which is a key difference from other tools that are available today, he added.
As currently designed, Hadoop processes large data sets in batches. Drill will improve on that method by doing "interactive analysis" that can find the required answers in the data more quickly, said Shiran. "Interactive analysis is much faster than batch processing."
The need for tools like Drill has been inspired by always-increasing user requirements, he said. "People have been doing queries in Hadoop, but since it doesn’t return answers to you within a few seconds, it has limitations."
Using Drill, users will be able to do ad hoc analysis and get faster responses, whether they are seeking anomalies, data trends or even network intrusions, according to Shiran. "With all of those things, you're going to have to get a pretty fast response or by the time you do figure it out, it's going to be old news."
The nascent Drill open-source project is under active development, with a variety of companies and individuals contributing. "A broad-based effort will be working on this," said Shiran. "There's quite a few people actively developing on the project now, so I don't think it will be a long time before we have an early version released."
Drill was inspired by Google's Dremel project, which helps Google perform data analyses on its huge data sets such as analyzing crawled Web documents, tracking install data for applications on the Android Market, analyzing spam, analyzing test results on Google’s distributed build system and more, according to Shiran.
By developing Drill as an Apache open-source project, organizers will be able to define Drill's own APIs and build a flexible and robust architecture that will support a broad range of data sources, data formats and query languages, according to the group.
MapR offers two versions of its Hadoop products: MapR M3, which is free; and MapR M5, which is a commercial version of the product with advanced features, including high availability, the ability to make data snapshots and do mirroring of datasets, and 24/7 support.
http://www.eweek.com/c/a/Application-Development/New-Apache-Project-Drill-Aims-to-Speed-Up-Hadoop-Queries-332333/