`
Vitas_Wang
  • 浏览: 9910 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类
最新评论

Import data to HBase

阅读更多

There have 3 common usages to import data to HBase:

1>Use ImportTsv

ImportTsv is a utility that will load data in TSV format into HBase.

There have two usages about it:

a.Loading data from TSV format in HDFS into HBase via Puts.

This kind of load is non-bulk loading

Eg:

Bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv –Dimporttsv.columns=HBASE_ROW_KEY,information:c1 table_name /user/hadoop/data

b.Another method is to use the bulk loading

It divided into 2 steps:

1>To generate StoreFiles for bulk-loading

 HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.2.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,information:c1 -Dimporttsv.bulk.output=/user/hadoop/rdc_search_hfile -Dimporttsv.separator=, rdc_search_information /user/hadoop/sourcedata

2>Move the generated StoreFiles into an HBase table using completebulkload utility

bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hadoop/rdc_search_hfile rdc_search_information

in above demo, the table name is rdc_search_information and it has one column family information.

Running ImportTsv with no arguments prints brief usage information:

Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

 

Imports the given input directory of TSV data into the specified table.

 

The column names of the TSV data must be specified using the -Dimporttsv.columns

option. This option takes the form of comma-separated column names, where each

column name is either a simple column family, or a columnfamily:qualifier. The special

column name HBASE_ROW_KEY is used to designate that this column should be used

as the row key for each imported record. You must specify exactly one column

to be the row key, and you must specify a column name for every column that exists in the

input data.

 

By default importtsv will load data directly into HBase. To instead generate

HFiles of data to prepare for a bulk data load, pass the option:

  -Dimporttsv.bulk.output=/path/for/output

  Note: the target table will be created with default column family descriptors if it does not already exist.

 

Other options that may be specified with -D include:

  -Dimporttsv.skip.bad.lines=false - fail if encountering an invalid line

  '-Dimporttsv.separator=|' - eg separate on pipes instead of tabs

  -Dimporttsv.timestamp=currentTimeAsLong - use the specified timestamp for the import

  -Dimporttsv.mapper.class=my.Mapper - A user-defined Mapper to use instead of org.apache.hadoop.hbase.mapreduce.TsvImporterMapper

 

2> We can write a java project and read the file in our OS and use the Put class to put the data into HBase.

3>Use the Pig to read the data from HDFS and write to HBase.

 

Ref: http://hbase.apache.org/book/ops_mgt.html#importtsv

 

分享到:
评论

相关推荐

    java操作Hbase之从Hbase中读取数据写入hdfs中源码

    import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.TableName; import org.apache.hadoop.hbase.client.Connection; import org.apache.hadoop.hbase.client.ConnectionFactory; ...

    java代码将mysql表数据导入HBase表

    import org.apache.hadoop.hbase.client.ConnectionFactory; import org.apache.hadoop.hbase.client.Table; import org.apache.hadoop.hbase.util.Bytes; public class CreateTable { public static void main...

    hbase和hadoop数据块损坏处理

    * hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'snap_test' -copyto /data/huang_test:将快照导出到 HDFS * clone_snapshot 'snap_test', 'test':将快照恢复到 HBase 表中 五、手动修复 ...

    hbase-1.4.10-bin.tar.gz

    &lt;value&gt;/path/to/zookeeper/data ``` 配置完成后,启动HBase。在单机模式下,可以使用`start-hbase.sh`命令;在分布式模式下,需要先启动Hadoop服务,然后启动HBase: ```bash # 单机模式 start-hbase.sh # ...

    springboot集成phoenix+hbase

    import org.springframework.data.jpa.repository.JpaRepository; import org.springframework.stereotype.Repository; @Repository public interface UserRepository extends JpaRepository, String&gt; { } ``` ...

    spark读取hbase数据,并使用spark sql保存到mysql

    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog val catalog = s"""{ "table":{"namespace":"default", "name":"my_table", "tableCoder":"PrimitiveType"}, "rowkey":"key", ...

    python3 hbase 库

    确保你的Python环境已经配置好,然后你可以通过`pip install -e /path/to/hbase`(替换/path/to/hbase为你的实际路径)来安装这个库。安装完成后,你可以在Python脚本中通过`import happybase`来引入HBase库。 2. *...

    hbase启动说明和python脚本

    def connect_to_hbase(): conn = Connection('localhost', port=9090) # 连接到本地HBase return conn def create_table(conn, table_name, families): conn.create_table(table_name, families) # 创建表,...

    hbase存储csv数据的代码实现

    我们创建了一个名为`my_table`的HBase表,并将数据逐行读入,然后根据列名('col1', 'col2', 'col3')和对应的值创建列族('data')的qualifiers。最后,使用`put`方法将这些数据写入表中。 在实际应用中,可能还...

    HBase.The.Definitive.Guide.2nd.Edition

    If you’re looking for a scalable storage solution to accommodate a virtually endless amount of data, this updated edition shows you how Apache HBase can meet your needs. Modeled after Google’s ...

    Python连接Hbase

    import happybase connection = happybase.Connection('localhost', port=9090) ``` 这里,'localhost'是HBase服务器的主机名,9090是Thrift服务的默认端口。 3. **创建表**:使用`happybase`可以方便地创建...

    python操作hbase

    from hbase import Hbase from hbase.ttypes import * try: # 创建socket连接 transport = TSocket.TSocket('127.0.0.1', 9090) # 缓冲传输层 transport = TTransport.TBufferedTransport(transport) # ...

    Python-HBase中文参考指南

    import happybase connection = happybase.Connection('localhost') table = connection.create_table('my_table', {'cf1': dict(max_versions=1)}) ``` ### 四、数据插入与查询 1. **数据插入**:通过`put`方法...

    apache hbase reference guide

    - **HBase as a MapReduce Job DataSource and Data Sink**(HBase作为MapReduce作业的数据源和数据接收器):如何将HBase作为MapReduce作业的输入输出。 - **Writing HFiles Directly During Bulk Import**(批量...

    springboot-hbase:以简单的订单业务示例hbse的使用&redis设计hbase查询分页

    import org.springframework.data.hadoop.hbase.HBasePersistentEntity; import org.springframework.data.hadoop.hbase.HBasePersistentProperty; import javax.persistence.Transient; import java.util.Date; @...

    11-HBase Java API编程实践1

    import org.apache.hadoop.hbase.client.*; import java.io.IOException; public class ExampleForHBase { public static Configuration configuration; public static Connection connection; public static ...

    大数据-数据迁移-hive、hbase、kudu迁移

    本文档详细记录了一次从自建Hadoop集群到华为云MRS(Managed Service for Big Data)的大规模数据迁移项目,涉及到了Hive、Kudu和HBase这三种不同类型的数据存储系统。以下是针对这些系统的迁移策略、流程和解决方案...

    【Python3】Hbase API

    import happybase ``` 2. 创建连接对象,指定HBase服务器地址和端口(默认为9090): ```python connection = happybase.Connection('localhost', port=9090) ``` 3. 打开或创建一个表: ```python table = ...

    python连接hbase的lib(版本0.98.8)

    import happybase # 创建连接 connection = happybase.Connection('localhost', port=9090) # 打开表 table = connection.table('my_table') # 插入数据 row_key = 'my_row' data = {'cf1:col1': 'value1', 'cf1:...

Global site tag (gtag.js) - Google Analytics