
hive - a summary of features


  Over the past few days I have been learning the data warehouse framework Hive, mainly from the ebook 'Programming Hive' [1], which covers it in detail across about 23 chapters ;)

  Below is the outline for this topic:

 

1.overview

2.architecture

3.features

4.hive vs pig, hive vs hbase

5.use cases

 

1.overview

  As you might expect, one of the most obvious characteristics of a data warehouse is data scale: GB, TB and even PB are common. In general, we want to load the data into a database, query it with SQL, and finally generate reports that support decisions.

  Like a common RDBMS, Hive provides a SQL-style query language (HiveQL, which translates SQL statements into MapReduce jobs). In addition, Hive can use various storage systems such as the local file system, a DFS, HBase or Cassandra as its underlying storage.

  So you might guess that Hive is just a client layer sitting on top of a DFS, which makes it easy to load and process huge amounts of data with the hive tool.
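  To make the "SQL translated into MapReduce jobs" idea more concrete, here is a minimal Python sketch of how a query such as `SELECT page, COUNT(*) FROM visits GROUP BY page` could be broken into map, shuffle and reduce phases. This is not Hive's actual planner; the table name `visits`, the column `page` and all function names are made up for illustration.

```python
from collections import defaultdict

def map_phase(rows):
    """Emit a (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["page"], 1

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate each key's values; summing the 1s implements COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}

def count_by_page(rows):
    """Run the three phases in order on an in-memory list of rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))

# Hypothetical sample rows standing in for a 'visits' table.
visits = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
print(count_by_page(visits))
```

  On a real cluster the map and reduce phases run on many machines at once; the sketch only shows how the work of a GROUP BY query is divided.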

 

2.architecture

  

   figure 1 from 'programming hive'[1]



 

    figure 2 from 'hive - a warehousing solution from facebook' [2]

 

  Combining the two figures above, we can see that Hive provides several useful interfaces: CLI, JDBC/ODBC and a web GUI. The Thrift server acts as a bridge between JDBC/ODBC clients and Hive's core modules.

  Second, there is a mechanism called the 'metastore', which stores metadata such as table schemas. It uses a Derby db by default, since this information is generally small enough to fit there. However, you should use a separate db solution such as a MySQL cluster if you want to avoid a SPOF (single point of failure).

 

3.features

 Here are some important features, drawn from 'Programming Hive':

  sql-style execution language

  Hive adopts the main SQL grammar, so querying from and loading into Hive involves almost no learning curve for anyone with basic SQL knowledge. A concise SQL statement is also much easier to write than MapReduce code for the same job.

  flexible, controllable

  Hive has several execution modes for running SQL statements. Local mode suits the analysis of small data sets, since it spawns a local job instead of running on the real cluster. In parallel mode, jobs that serve the same goal are allowed to run in parallel when they do not depend on each other. In strict mode, some expensive operations are disallowed, which keeps a query from grabbing large amounts of resources that other jobs need.
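  As a rough illustration, the three modes map onto configuration properties that can be passed to the `hive` CLI with `--hiveconf`. The Python sketch below only builds the command line; it does not run Hive. The property names are the ones I know from classic Hive releases (`hive.mapred.mode` in particular has been deprecated in newer versions), so treat them as assumptions to check against your release. The helper name `hive_command` and the table `logs` are hypothetical.

```python
# Property names as used by classic Hive releases; verify against your version.
MODE_SETTINGS = {
    "local": {"hive.exec.mode.local.auto": "true"},   # let Hive pick local mode for small inputs
    "parallel": {"hive.exec.parallel": "true"},       # run independent stages concurrently
    "strict": {"hive.mapred.mode": "strict"},         # reject some expensive query shapes
}

def hive_command(query, modes=()):
    """Build an argument list for the `hive` CLI with the chosen modes enabled."""
    args = ["hive"]
    for mode in modes:
        for key, value in MODE_SETTINGS[mode].items():
            args += ["--hiveconf", f"{key}={value}"]
    args += ["-e", query]
    return args

# Hypothetical query against a made-up table named 'logs'.
print(hive_command("SELECT COUNT(*) FROM logs", modes=["local", "strict"]))
```

  The returned list can be handed to `subprocess.run` on a machine where the `hive` CLI is installed.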

  static data 

  Unlike a common RDBMS, Hive does not support row-level DML (updates and deletes) on data: once the data has been loaded, it cannot be updated in place. Instead, you can use DDL for schema-related maintenance.

  own table structure/storage, but no rapid query response and no transaction support

  Hive can be integrated with NoSQL solutions such as HBase and Cassandra, but if you are not happy with them, you can use Hive's own tables to store data and query it from the underlying file system.

  But unlike NoSQL stores, Hive does not offer fast query responses, because it is designed to analyze large-scale data rather than to serve as a general-purpose db. This is the reason Hive exists.

  partitioned table

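  With partitioning, a table's data is split by partition column values into separate directories, so the data can scale horizontally across the cluster and a query can read only the partitions it needs.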
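  The directory layout behind a partitioned table can be sketched in a few lines of Python. Hive stores each partition under a `key=value` path segment below the table directory; the sketch imitates that naming and shows how a filter on a partition column narrows the set of directories to scan. The warehouse path, the table `logs`, the columns `dt` and `country`, and both helper functions are hypothetical.

```python
from pathlib import PurePosixPath

def partition_path(table_dir, **partition):
    """Return the directory for one partition, using key=value path segments."""
    path = PurePosixPath(table_dir)
    for key, value in partition.items():
        path = path / f"{key}={value}"
    return str(path)

def prune(partition_dirs, key, value):
    """Keep only the directories whose path contains the wanted key=value segment."""
    wanted = f"{key}={value}"
    return [d for d in partition_dirs if wanted in PurePosixPath(d).parts]

# Made-up partitions of a 'logs' table, partitioned by date and country.
dirs = [
    partition_path("/warehouse/logs", dt="2013-01-01", country="us"),
    partition_path("/warehouse/logs", dt="2013-01-01", country="cn"),
    partition_path("/warehouse/logs", dt="2013-01-02", country="us"),
]
print(prune(dirs, "dt", "2013-01-01"))
```

  A query that filters on `dt` only has to touch the matching directories, which is why partition columns are usually chosen from the fields most often used in filters.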

  supports views and indexes

  Hive uses an additional table to support indexes; as with HBase, this is a common approach.

  file compression

  Similar to Hadoop's compression, Hive's output can also be compressed to reduce I/O.

 

  

4.hive vs pig, hive vs hbase

  hive vs pig

                      hive                                       pig
  execution language  sql                                        pig latin
  own db              yes (so it supports jdbc/odbc connectors)  no
  execution model     compile (to MR), optimize, execute         same as hive

 

  hive vs hbase

 

                      hive            hbase
  support sql         yes             no
  transaction         no              row-level (table-level support above 1.x not confirmed)
  real-time response  no              yes
  target              data warehouse  no-sql
  index               yes             no
  file system         hdfs, mapR..    hdfs

 

 

5.use cases

  Note that Hive jars above 2.x are compiled with JDK 7, so if you deploy them on JDK 6 or below, the JVM will complain with an 'Unsupported major.minor version 51.0' error. In that case you can switch to 1.x on the fly.
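  The '51.0' in that error is the class-file version stored in the first bytes of every compiled `.class` file. The Python sketch below reads that header and maps the major version to a Java release, which is a quick way to check what a jar was compiled for. The function name and the synthetic header are made up for illustration; the header layout and the version numbers come from the JVM class-file format.

```python
import struct

# Class-file major versions for a few Java releases.
MAJOR_TO_JAVA = {50: "Java 6", 51: "Java 7", 52: "Java 8"}

def class_file_version(data):
    """Parse the first 8 bytes of a .class file and return (major, minor)."""
    # Layout: 4-byte magic, 2-byte minor version, 2-byte major version, big-endian.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor

# Synthetic 8-byte header standing in for a class compiled for Java 7.
header = struct.pack(">IHH", 0xCAFEBABE, 0, 51)
major, minor = class_file_version(header)
print(f"{major}.{minor} needs {MAJOR_TO_JAVA.get(major, 'an unknown Java release')} or newer")
```

  To check a real jar, extract one `.class` file from it and pass its bytes to `class_file_version`.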

 

 

 

ref:

[1] Programming Hive

[2] Hive - a warehousing solution from Facebook

 

 
