badxy
HBase suffers terribly because HDFS applications cannot flush file data to storage before the file is closed. As a result, a crash of any part of the HBase servers, or a service-interrupting crash of HDFS, will result in data loss.
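To make the failure mode concrete, here is a toy model of "durable only on close" semantics. It is not HDFS or HBase code: `SimulatedFile` is a hypothetical class whose appends sit in a buffer until `close()` commits them, so a crash before close discards everything written since the file was opened.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not HDFS code) of a file whose contents become
 * durable only when it is closed, the behavior the text describes.
 */
public class CloseOnlyDurability {

    /** Hypothetical file: appends stay pending until close() commits them. */
    static final class SimulatedFile {
        private final List<String> pending = new ArrayList<>();
        private final List<String> durable = new ArrayList<>();
        private boolean closed = false;

        void append(String entry) {
            if (closed) throw new IllegalStateException("file already closed");
            pending.add(entry);
        }

        /** In this model, closing is the only way data reaches storage. */
        void close() {
            durable.addAll(pending);
            pending.clear();
            closed = true;
        }

        /** A crash throws away everything that was never committed. */
        void crash() {
            pending.clear();
            closed = true;
        }

        /** What a restarted server would find on storage. */
        List<String> recoveredRecords() {
            return new ArrayList<>(durable);
        }
    }

    public static void main(String[] args) {
        SimulatedFile log = new SimulatedFile();
        log.append("put row1");
        log.append("put row2");
        log.crash(); // the server dies before the log file is closed
        System.out.println("recovered after crash: " + log.recoveredRecords().size());

        SimulatedFile closedLog = new SimulatedFile();
        closedLog.append("put row3");
        closedLog.close();
        System.out.println("recovered after close: " + closedLog.recoveredRecords().size());
    }
}
```

Running it prints `recovered after crash: 0` followed by `recovered after close: 1`: both edits written before the crash are lost, while the one whose file was closed survives.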
Earlier chapters discussed the problems caused when applications or server processes attempt to exceed the system-imposed limit on the number of open files; HBase has this problem too. It is substantially aggravated because each Hadoop MapFile is actually two files and a directory in HDFS, and each HDFS file also has a hidden checksum file. The HBase servers must therefore be run with a very large per-process open file limit. A new storage file format, HFile, is under development for HBase version 0.20.0 and is expected to solve many of these performance and reliability issues.
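The multiplication behind that warning can be sketched as a back-of-the-envelope estimate. It uses the layout stated above (two files per MapFile, one checksum file per file) and assumes, as a worst case, that every one of those files is open at once; the region and store-file counts are hypothetical inputs, not measured figures. On OpenJDK/HotSpot JVMs on Unix-like systems, `com.sun.management.UnixOperatingSystemMXBean` reports the process's own open-file limit for comparison.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OpenFileEstimate {

    // From the text: a MapFile is two files (data and index) inside a directory.
    // The directory is not counted because it is not held open like a file.
    static final int FILES_PER_MAPFILE = 2;
    // From the text: each HDFS file also has a hidden checksum file.
    static final int CHECKSUMS_PER_FILE = 1;

    /** Data and index files plus their checksum companions for one MapFile. */
    static int filesPerMapFile() {
        return FILES_PER_MAPFILE * (1 + CHECKSUMS_PER_FILE);
    }

    /** Worst case: assumes every file behind every store file is open at once. */
    static long estimatedOpenFiles(int regions, int storeFilesPerRegion) {
        return (long) regions * storeFilesPerRegion * filesPerMapFile();
    }

    public static void main(String[] args) {
        int regions = 200;           // hypothetical regions served by one server
        int storeFilesPerRegion = 5; // hypothetical MapFiles per region
        long needed = estimatedOpenFiles(regions, storeFilesPerRegion);
        System.out.println("files per MapFile: " + filesPerMapFile());
        System.out.println("worst-case open files: " + needed);

        // Only available on JVMs that ship the com.sun.management extension.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            long limit = ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getMaxFileDescriptorCount();
            System.out.println("process open-file limit: " + limit
                    + (limit < needed ? " (too low)" : " (sufficient)"));
        }
    }
}
```

With these made-up inputs the estimate is 4 files per MapFile and 4000 open files in total, well above the 1024 per-process default found on many Linux systems.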
HBase relies utterly on a smoothly performing HDFS; any stalls or DataNode instability will show up as HBase errors. The troubleshooting section of the HBase wiki, /Troubleshooting, suggests HDFS tuning parameters. In particular, HBase is not recommended if the underlying HDFS cluster is affected by the slow block report problem (HADOOP-4584).
HBase servers, particularly the version using memcached, are memory intensive and generally require at least a gigabyte of real memory per server; any paging will drastically affect performance. Stalls caused by Java Virtual Machine (JVM) garbage collection also cause HBase failures.
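A startup sanity check for both concerns might look like the sketch below. The 1 GiB floor comes from the text; comparing it against the JVM's maximum heap (`Runtime.maxMemory()`) is an assumption, since the text speaks of real memory, and the heap must also fit in physical RAM to avoid paging. Cumulative collection time from `GarbageCollectorMXBean` is one coarse way to notice the garbage-collection stalls mentioned above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class MemoryCheck {

    // "At least a gigabyte" from the text, applied here to the max heap (assumption).
    static final long ONE_GIB = 1L << 30;

    static boolean heapMeetsMinimum(long maxHeapBytes) {
        return maxHeapBytes >= ONE_GIB;
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE is returned when the JVM has no inherent heap limit.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap: %d MiB, meets 1 GiB floor: %b%n",
                maxHeap >> 20, heapMeetsMinimum(maxHeap));

        // Collection counts and total time per collector; long or rapidly growing
        // totals hint at the kind of GC stalls the text describes.
        // Either value may be -1 if the collector does not report it.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Starting the JVM with a larger `-Xmx` raises the reported maximum heap; whether the machine has enough physical memory to back it without paging still has to be checked at the operating-system level.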
HBase is not part of the Hadoop Core distribution, but it generally provides downloadable release bundles that track the Hadoop Core releases.