Hadoop and MongoDB Use Cases
The following are some example deployments that combine MongoDB and Hadoop. The goal is to provide a high-level description of how MongoDB and Hadoop can fit together in a typical Big Data stack. In each of the following examples, MongoDB is used as the “operational” real-time data store and Hadoop is used for offline batch data processing and analysis.
Batch Aggregation
In many scenarios, the built-in aggregation functionality provided by MongoDB is sufficient for analyzing your data. In certain cases, however, significantly more complex data aggregation may be necessary. This is where Hadoop can provide a powerful framework for complex analytics.
In this scenario, data is pulled from MongoDB and processed within Hadoop via one or more MapReduce jobs. Data may also be brought in from additional sources within these MapReduce jobs to develop a multi-datasource solution. Output from these MapReduce jobs can then be written back to MongoDB for later querying and ad-hoc analysis. Applications built on top of MongoDB can then use the results of the batch analytics to present information to the end user or to drive other downstream features.
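As a minimal sketch of this pattern, the following job uses the MongoDB Connector for Hadoop to count documents per type and write the totals back to MongoDB. The database, collection, and field names (demo.events, demo.event_counts, type) are hypothetical placeholders, not part of the connector.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.BasicDBObject;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class EventCountJob {

    // Emits (event type, 1) for each document read from MongoDB.
    public static class TypeMapper
            extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object key, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            Object type = doc.get("type");
            if (type != null) { // skip documents without the field
                ctx.write(new Text(type.toString()), ONE);
            }
        }
    }

    // Sums the counts and writes one result document per type back to MongoDB.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, BSONWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            // The output key becomes the _id of the stored document.
            ctx.write(key, new BSONWritable(new BasicDBObject("count", total)));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/demo.events");
        MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/demo.event_counts");

        Job job = Job.getInstance(conf, "event count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(TypeMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BSONWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Applications can then read the demo.event_counts collection with ordinary MongoDB queries, without touching Hadoop at all.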
Data Warehouse
In a typical production scenario, your application’s data may live in multiple datastores, each with its own query language and functionality. To reduce complexity in these scenarios, Hadoop can be used as a data warehouse and act as a centralized repository for data from the various sources.
In this situation, you could have periodic MapReduce jobs that load data from MongoDB into Hadoop. This could be in the form of “daily” or “weekly” data loads pulled from MongoDB via MapReduce. Once the data from MongoDB is available within Hadoop alongside data from the other sources, the larger combined dataset can be queried. Data analysts then have the option of using either MapReduce or Pig to create jobs that query these larger datasets and incorporate the data from MongoDB.
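One way to implement such a periodic load, sketched below, is a map-only job that reads from MongoDB and writes the raw documents into HDFS as BSON files, where Pig or later MapReduce jobs can query them. The URIs, HDFS path, and the status filter are hypothetical, and the sketch assumes the connector’s MongoConfigUtil.setQuery helper for restricting the pull.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.bson.BSONObject;

import com.mongodb.hadoop.BSONFileOutputFormat;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class MongoToHdfsLoad {

    // Identity mapper: passes each MongoDB document through unchanged.
    public static class PassThroughMapper
            extends Mapper<Object, BSONObject, NullWritable, BSONWritable> {
        @Override
        protected void map(Object key, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(NullWritable.get(), new BSONWritable(doc));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/demo.orders");
        // Restrict the pull to the slice being loaded; a "daily" job would
        // typically use a date-range query here instead.
        MongoConfigUtil.setQuery(conf, "{\"status\": \"complete\"}");

        Job job = Job.getInstance(conf, "mongo to hdfs load");
        job.setJarByClass(MongoToHdfsLoad.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0); // map-only copy, no shuffle needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(BSONWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(BSONFileOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("/warehouse/orders/current"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}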
ETL Data
MongoDB may be the operational datastore for your application, but there may also be other datastores holding your organization’s data. In this scenario it is useful to be able to move data from one datastore to another, either from your application’s database to another datastore or vice versa. Moving the data is often much more complex than simply piping it from one mechanism to another, and this is where Hadoop can be used.
In this scenario, MapReduce jobs are used to extract, transform, and load data from one store to another. Hadoop can act as a complex ETL mechanism to migrate data in various forms via one or more MapReduce jobs that pull the data from one store, apply multiple transformations (applying new data layouts or other aggregations), and load the data into another store. This approach can be used to move data either from or to MongoDB, depending on the desired result.
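The transform step of such a job might look like the mapper below, which renames a field and flattens a nested subdocument before the connector loads the result into the target store. The source layout (name, address.city) and the output field names are hypothetical; the driver is wired like the batch-aggregation example above, with MongoConfigUtil.setOutputURI (or BSONFileOutputFormat) pointing at the destination.

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;

import com.mongodb.BasicDBObject;
import com.mongodb.hadoop.io.BSONWritable;

// Transform step of a map-only ETL job: reshape each source document
// before it is loaded into the target store.
public class ReshapeMapper
        extends Mapper<Object, BSONObject, NullWritable, BSONWritable> {
    @Override
    protected void map(Object key, BSONObject doc, Context ctx)
            throws IOException, InterruptedException {
        BasicDBObject out = new BasicDBObject();
        out.put("_id", doc.get("_id"));        // keep the original identity
        out.put("fullName", doc.get("name"));  // rename a field
        BSONObject address = (BSONObject) doc.get("address");
        if (address != null) {
            out.put("city", address.get("city")); // flatten a nested subdocument
        }
        ctx.write(NullWritable.get(), new BSONWritable(out));
    }
}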
MongoDB Connector for Hadoop
The MongoDB Connector for Hadoop is a plugin for Hadoop that provides the ability to use MongoDB as an input source and/or an output destination.
The source code is available on GitHub, where you can also find a more comprehensive README.
If you have questions, please email the mongodb-user mailing list. For any issues, please file a ticket in Jira.
Installation
The MongoDB Connector for Hadoop uses the Gradle build tool for compilation. To build, simply invoke the jar task, as in the following command:
./gradlew jar
The MongoDB Connector for Hadoop supports a number of Hadoop releases. You can change the Hadoop version supported by passing the hadoop_version parameter to Gradle. For instance, to build against Apache Hadoop 2.2, use the following command:
./gradlew jar -Phadoop_version=2.2
After building, you will need to place the “core” jar and the mongo-java-driver jar in the lib directory of each Hadoop server.
For more complete installation instructions, please see the README.