源码解析基于HBase-0.20.6。
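在钻进源码之前，先回顾一下客户端发一次Get请求大致的写法（示意代码：表名test_table、行键row1都是随手举的例子，只为说明入口在哪）：

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration(); // 从 classpath 上的 hbase-site.xml 读配置
    HTable table = new HTable(conf, "test_table");      // 表名是随手举的例子
    Get get = new Get(Bytes.toBytes("row1"));           // 行键同样是示意
    Result result = table.get(get);                     // 就是下面要分析的 HTable.get()
    if (!result.isEmpty()) {                            // 对应 javadoc 里说的 Result#isEmpty()
      System.out.println(result);
    }
  }
}

其中table.get(get)就是我们要跟进去的入口。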
先看HTable类get()方法的code:
HTable.java
  /**
   * Extracts certain cells from a given row.
   * @param get The object that specifies what data to fetch and from which row.
   * @return The data coming from the specified row, if it exists.  If the row
   * specified doesn't exist, the {@link Result} instance returned won't
   * contain any {@link KeyValue}, as indicated by {@link Result#isEmpty()}.
   * @throws IOException if a remote or network exception occurs.
   * @since 0.20.0
   */
  public Result get(final Get get) throws IOException {
    return connection.getRegionServerWithRetries(
        new ServerCallable<Result>(connection, tableName, get.getRow()) {
          public Result call() throws IOException {
            return server.get(location.getRegionInfo().getRegionName(), get);
          }
        }
    );
  }
这段code 比较绕,但至少我们知道可以去查connection的getRegionServerWithRetries方法。那么connection是个什么东西呢?
这个玩意是定义在HTable里面的:
private final HConnection connection;
何时实例化的呢?在HTable的构造函数里面:
this.connection = HConnectionManager.getConnection(conf);
这个conf是一个HBaseConfiguration对象，是HTable构造函数的参数。OK，继续到HConnectionManager里面看看这个connection是怎么来的吧：
HConnectionManager.java
  /**
   * Get the connection object for the instance specified by the configuration
   * If no current connection exists, create a new connection for that instance
   * @param conf
   * @return HConnection object for the instance specified by the configuration
   */
  public static HConnection getConnection(HBaseConfiguration conf) {
    TableServers connection;
    synchronized (HBASE_INSTANCES) {
      connection = HBASE_INSTANCES.get(conf);
      if (connection == null) {
        connection = new TableServers(conf);
        HBASE_INSTANCES.put(conf, connection);
      }
    }
    return connection;
  }
现在我们知道每一个conf对应一个connection,具体来说是TableServers类对象(实现了HConnection接口),所有的connections放在一个pool里。那么connection到底干嘛用呢?我们要看看HConnection这个接口的定义。
HConnection.java
/**
 * Cluster connection.
 * {@link HConnectionManager} manages instances of this class.
 */
public interface HConnection {
  /**
   * Retrieve ZooKeeperWrapper used by the connection.
   * @return ZooKeeperWrapper handle being used by the connection.
   * @throws IOException
   */
  public ZooKeeperWrapper getZooKeeperWrapper() throws IOException;

  /**
   * @return proxy connection to master server for this instance
   * @throws MasterNotRunningException
   */
  public HMasterInterface getMaster() throws MasterNotRunningException;

  /** @return - true if the master server is running */
  public boolean isMasterRunning();

  /**
   * Checks if <code>tableName</code> exists.
   * @param tableName Table to check.
   * @return True if table exists already.
   * @throws MasterNotRunningException
   */
  public boolean tableExists(final byte [] tableName)
  throws MasterNotRunningException;

  /**
   * A table that isTableEnabled == false and isTableDisabled == false
   * is possible. This happens when a table has a lot of regions
   * that must be processed.
   * @param tableName
   * @return true if the table is enabled, false otherwise
   * @throws IOException
   */
  public boolean isTableEnabled(byte[] tableName) throws IOException;

  /**
   * @param tableName
   * @return true if the table is disabled, false otherwise
   * @throws IOException
   */
  public boolean isTableDisabled(byte[] tableName) throws IOException;

  /**
   * @param tableName
   * @return true if all regions of the table are available, false otherwise
   * @throws IOException
   */
  public boolean isTableAvailable(byte[] tableName) throws IOException;

  /**
   * List all the userspace tables.  In other words, scan the META table.
   *
   * If we wanted this to be really fast, we could implement a special
   * catalog table that just contains table names and their descriptors.
   * Right now, it only exists as part of the META table's region info.
   *
   * @return - returns an array of HTableDescriptors
   * @throws IOException
   */
  public HTableDescriptor[] listTables() throws IOException;

  /**
   * @param tableName
   * @return table metadata
   * @throws IOException
   */
  public HTableDescriptor getHTableDescriptor(byte[] tableName)
  throws IOException;

  /**
   * Find the location of the region of <i>tableName</i> that <i>row</i>
   * lives in.
   * @param tableName name of the table <i>row</i> is in
   * @param row row key you're trying to find the region of
   * @return HRegionLocation that describes where to find the reigon in
   * question
   * @throws IOException
   */
  public HRegionLocation locateRegion(final byte [] tableName,
      final byte [] row) throws IOException;

  /**
   * Find the location of the region of <i>tableName</i> that <i>row</i>
   * lives in, ignoring any value that might be in the cache.
   * @param tableName name of the table <i>row</i> is in
   * @param row row key you're trying to find the region of
   * @return HRegionLocation that describes where to find the reigon in
   * question
   * @throws IOException
   */
  public HRegionLocation relocateRegion(final byte [] tableName,
      final byte [] row) throws IOException;

  /**
   * Establishes a connection to the region server at the specified address.
   * @param regionServer - the server to connect to
   * @return proxy for HRegionServer
   * @throws IOException
   */
  public HRegionInterface getHRegionConnection(HServerAddress regionServer)
  throws IOException;

  /**
   * Establishes a connection to the region server at the specified address.
   * @param regionServer - the server to connect to
   * @param getMaster - do we check if master is alive
   * @return proxy for HRegionServer
   * @throws IOException
   */
  public HRegionInterface getHRegionConnection(
      HServerAddress regionServer, boolean getMaster) throws IOException;

  /**
   * Find region location hosting passed row
   * @param tableName
   * @param row Row to find.
   * @param reload If true do not use cache, otherwise bypass.
   * @return Location of row.
   * @throws IOException
   */
  HRegionLocation getRegionLocation(byte [] tableName, byte [] row,
      boolean reload) throws IOException;

  /**
   * Pass in a ServerCallable with your particular bit of logic defined and
   * this method will manage the process of doing retries with timed waits
   * and refinds of missing regions.
   *
   * @param <T> the type of the return value
   * @param callable
   * @return an object of type T
   * @throws IOException
   * @throws RuntimeException
   */
  public <T> T getRegionServerWithRetries(ServerCallable<T> callable)
  throws IOException, RuntimeException;

  /**
   * Pass in a ServerCallable with your particular bit of logic defined and
   * this method will pass it to the defined region server.
   * @param <T> the type of the return value
   * @param callable
   * @return an object of type T
   * @throws IOException
   * @throws RuntimeException
   */
  public <T> T getRegionServerForWithoutRetries(ServerCallable<T> callable)
  throws IOException, RuntimeException;

  /**
   * Process a batch of Puts. Does the retries.
   * @param list A batch of Puts to process.
   * @param tableName The name of the table
   * @return Count of committed Puts.  On fault, < list.size().
   * @throws IOException
   */
  public int processBatchOfRows(ArrayList<Put> list, byte[] tableName)
  throws IOException;

  /**
   * Process a batch of Deletes. Does the retries.
   * @param list A batch of Deletes to process.
   * @return Count of committed Deletes. On fault, < list.size().
   * @param tableName The name of the table
   * @throws IOException
   */
  public int processBatchOfDeletes(ArrayList<Delete> list, byte[] tableName)
  throws IOException;
}
上面的code是整个接口的定义。现在我们知道，这个接口封装了客户端的各种查询处理请求：像get、put、delete这些操作都被包在callable里，交给方法
public <T> T getRegionServerWithRetries(ServerCallable<T> callable) 去执行。这也就是我们刚才在HTable.get()里看到的写法。
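举个例子，put、delete在HTable里的封装方式和get基本一样，大致是下面这个样子（示意写法，凭上面的接口推出来的，不是逐字照抄0.20.6源码）：

// HTable.delete() 的大致形态（示意）：真正的 RPC 调用同样包在一个匿名 ServerCallable 里，
// 定位 region、重试这些杂事都交给 connection.getRegionServerWithRetries()
public void delete(final Delete delete) throws IOException {
  connection.getRegionServerWithRetries(
      new ServerCallable<Boolean>(connection, tableName, delete.getRow()) {
        public Boolean call() throws IOException {
          server.delete(location.getRegionInfo().getRegionName(), delete);
          return null;
        }
      }
  );
}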
到这里要看TableServers.getRegionServerWithRetries(ServerCallable<T> callable)了,继续看code
  public <T> T getRegionServerWithRetries(ServerCallable<T> callable)
      throws IOException, RuntimeException {
    List<Throwable> exceptions = new ArrayList<Throwable>();
    for (int tries = 0; tries < numRetries; tries++) {
      try {
        callable.instantiateServer(tries != 0);
        return callable.call();
      } catch (Throwable t) {
        t = translateException(t);
        exceptions.add(t);
        if (tries == numRetries - 1) {
          throw new RetriesExhaustedException(callable.getServerName(),
              callable.getRegionName(), callable.getRow(), tries, exceptions);
        }
      }
      try {
        Thread.sleep(getPauseTime(tries));
      } catch (InterruptedException e) {
        // continue
      }
    }
    return null;
  }
比较核心的code就那两句：先调用callable.instantiateServer()，根据callable对象完成定位RegionServer的工作；然后执行call()发起请求。注意这个call()方法就是最开始HTable.get()里那个匿名内部类重写的方法。看ServerCallable类的一部分code：
public abstract class ServerCallable<T> implements Callable<T> {
  protected final HConnection connection;
  protected final byte [] tableName;
  protected final byte [] row;
  protected HRegionLocation location;
  protected HRegionInterface server;

  /**
   * @param connection
   * @param tableName
   * @param row
   */
  public ServerCallable(HConnection connection, byte [] tableName, byte [] row) {
    this.connection = connection;
    this.tableName = tableName;
    this.row = row;
  }

  /**
   * @param reload set this to true if connection should re-find the region
   * @throws IOException
   */
  public void instantiateServer(boolean reload) throws IOException {
    this.location = connection.getRegionLocation(tableName, row, reload);
    this.server = connection.getHRegionConnection(location.getServerAddress());
  }
  // ...
所以一个ServerCallable对象保存了tableName、row等信息，并通过构造函数拿到一个connection引用；instantiateServer()里先用connection.getRegionLocation()定位region，再调用connection.getHRegionConnection()获取一个跟RegionServer打交道的handle（其实我也不知道该叫它啥，总不能也叫connection吧，那就重复了，所以说HBase代码起的名字容易让人误解）。
具体看怎么获得这个新玩意的:
HConnectionManager.java
  public HRegionInterface getHRegionConnection(
      HServerAddress regionServer, boolean getMaster)
  throws IOException {
    if (getMaster) {
      getMaster();
    }
    HRegionInterface server;
    synchronized (this.servers) {
      // See if we already have a connection
      server = this.servers.get(regionServer.toString());
      if (server == null) { // Get a connection
        try {
          server = (HRegionInterface)HBaseRPC.waitForProxy(
              serverInterfaceClass, HBaseRPCProtocolVersion.versionID,
              regionServer.getInetSocketAddress(), this.conf,
              this.maxRPCAttempts, this.rpcTimeout);
        } catch (RemoteException e) {
          throw RemoteExceptionHandler.decodeRemoteException(e);
        }
        this.servers.put(regionServer.toString(), server);
      }
    }
    return server;
  }
再挖下去看这个server是怎么出来的：上面调用的HBaseRPC.waitForProxy()最终会落到HBaseRPC.getProxy()上（都在HBaseRPC类里面）：
  public static VersionedProtocol getProxy(Class<?> protocol,
      long clientVersion, InetSocketAddress addr, UserGroupInformation ticket,
      Configuration conf, SocketFactory factory)
  throws IOException {
    VersionedProtocol proxy =
        (VersionedProtocol) Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class[] { protocol },
            new Invoker(addr, ticket, conf, factory));
    long serverVersion = proxy.getProtocolVersion(protocol.getName(),
        clientVersion);
    if (serverVersion == clientVersion) {
      return proxy;
    }
    throw new VersionMismatch(protocol.getName(), clientVersion, serverVersion);
  }
从这两段code可以看出，这里用到了Java的动态代理机制：server是一个动态代理对象，实现了变量serverInterfaceClass所指定的接口，在这里就是HRegionInterface，也就是说server实现了该接口的内容。那么该接口定义了哪些方法呢？
public interface HRegionInterface extends HBaseRPCProtocolVersion {
  /**
   * Get metainfo about an HRegion
   *
   * @param regionName name of the region
   * @return HRegionInfo object for region
   * @throws NotServingRegionException
   */
  public HRegionInfo getRegionInfo(final byte [] regionName)
  throws NotServingRegionException;

  /**
   * Return all the data for the row that matches <i>row</i> exactly,
   * or the one that immediately preceeds it.
   *
   * @param regionName region name
   * @param row row key
   * @param family Column family to look for row in.
   * @return map of values
   * @throws IOException
   */
  public Result getClosestRowBefore(final byte [] regionName,
      final byte [] row, final byte [] family) throws IOException;

  /**
   * @return the regions served by this regionserver
   */
  public HRegion [] getOnlineRegionsAsArray();

  /**
   * Perform Get operation.
   * @param regionName name of region to get from
   * @param get Get operation
   * @return Result
   * @throws IOException
   */
  public Result get(byte [] regionName, Get get) throws IOException;

  /**
   * Perform exists operation.
   * @param regionName name of region to get from
   * @param get Get operation describing cell to test
   * @return true if exists
   * @throws IOException
   */
  public boolean exists(byte [] regionName, Get get) throws IOException;

  /**
   * Put data into the specified region
   * @param regionName
   * @param put the data to be put
   * @throws IOException
   */
  public void put(final byte [] regionName, final Put put) throws IOException;

  /**
   * Put an array of puts into the specified region
   *
   * @param regionName
   * @param puts
   * @return The number of processed put's.  Returns -1 if all Puts
   * processed successfully.
   * @throws IOException
   */
  public int put(final byte[] regionName, final Put [] puts)
  throws IOException;

  /**
   * Deletes all the KeyValues that match those found in the Delete object,
   * if their ts <= to the Delete. In case of a delete with a specific ts it
   * only deletes that specific KeyValue.
   * @param regionName
   * @param delete
   * @throws IOException
   */
  public void delete(final byte[] regionName, final Delete delete)
  throws IOException;

  /**
   * Put an array of deletes into the specified region
   *
   * @param regionName
   * @param deletes
   * @return The number of processed deletes.  Returns -1 if all Deletes
   * processed successfully.
   * @throws IOException
   */
  public int delete(final byte[] regionName, final Delete [] deletes)
  throws IOException;

  /**
   * Atomically checks if a row/family/qualifier value match the expectedValue.
   * If it does, it adds the put.
   *
   * @param regionName
   * @param row
   * @param family
   * @param qualifier
   * @param value the expected value
   * @param put
   * @throws IOException
   * @return true if the new put was execute, false otherwise
   */
  public boolean checkAndPut(final byte[] regionName, final byte [] row,
      final byte [] family, final byte [] qualifier, final byte [] value,
      final Put put) throws IOException;

  /**
   * Atomically increments a column value. If the column value isn't long-like,
   * this could throw an exception.
   *
   * @param regionName
   * @param row
   * @param family
   * @param qualifier
   * @param amount
   * @param writeToWAL whether to write the increment to the WAL
   * @return new incremented column value
   * @throws IOException
   */
  public long incrementColumnValue(byte [] regionName, byte [] row,
      byte [] family, byte [] qualifier, long amount, boolean writeToWAL)
  throws IOException;

  //
  // remote scanner interface
  //

  /**
   * Opens a remote scanner with a RowFilter.
   *
   * @param regionName name of region to scan
   * @param scan configured scan object
   * @return scannerId scanner identifier used in other calls
   * @throws IOException
   */
  public long openScanner(final byte [] regionName, final Scan scan)
  throws IOException;

  /**
   * Get the next set of values
   * @param scannerId clientId passed to openScanner
   * @return map of values; returns null if no results.
   * @throws IOException
   */
  public Result next(long scannerId) throws IOException;

  /**
   * Get the next set of values
   * @param scannerId clientId passed to openScanner
   * @param numberOfRows the number of rows to fetch
   * @return Array of Results (map of values); array is empty if done with this
   * region and null if we are NOT to go to the next region (happens when a
   * filter rules that the scan is done).
   * @throws IOException
   */
  public Result [] next(long scannerId, int numberOfRows) throws IOException;

  /**
   * Close a scanner
   *
   * @param scannerId the scanner id returned by openScanner
   * @throws IOException
   */
  public void close(long scannerId) throws IOException;

  /**
   * Opens a remote row lock.
   *
   * @param regionName name of region
   * @param row row to lock
   * @return lockId lock identifier
   * @throws IOException
   */
  public long lockRow(final byte [] regionName, final byte [] row)
  throws IOException;

  /**
   * Releases a remote row lock.
   *
   * @param regionName
   * @param lockId the lock id returned by lockRow
   * @throws IOException
   */
  public void unlockRow(final byte [] regionName, final long lockId)
  throws IOException;

  /**
   * Method used when a master is taking the place of another failed one.
   * @return All regions assigned on this region server
   * @throws IOException
   */
  public HRegionInfo[] getRegionsAssignment() throws IOException;

  /**
   * Method used when a master is taking the place of another failed one.
   * @return The HSI
   * @throws IOException
   */
  public HServerInfo getHServerInfo() throws IOException;
}
可以看出HRegionInterface是定义了具体的向RegionServer查询的方法。
现在回过头来，当server这个动态代理对象实例化后，经过ServerCallable.call()最后会调到server.get()。按照Java的代理机制，这个调用又会传递到我们构造动态代理对象时传进去的那个new Invoker(addr, ticket, conf, factory)对象，由它去执行具体的方法。
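如果对动态代理机制不熟，可以先看一个跟HBase无关的最小示例（Greeter接口是随手编的，仅用于演示机制）：对代理对象任何方法的调用，都会进到InvocationHandler.invoke()里。

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyDemo {
  interface Greeter {
    String greet(String name);
  }

  public static void main(String[] args) {
    // 对 Greeter 任何方法的调用都会转发到这个 handler 的 invoke() 里
    InvocationHandler handler = new InvocationHandler() {
      public Object invoke(Object proxy, Method method, Object[] methodArgs) {
        // 真正的 RPC 客户端会在这里把 method 和 methodArgs 序列化后发给服务端
        return "invoked " + method.getName() + " with " + methodArgs[0];
      }
    };
    Greeter greeter = (Greeter) Proxy.newProxyInstance(
        Greeter.class.getClassLoader(), new Class[] { Greeter.class }, handler);
    System.out.println(greeter.greet("hbase"));  // 输出: invoked greet with hbase
  }
}

HBaseRPC里扮演这个handler角色的，就是Invoker。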
简单的说,这个Invoker对象使用HBase的RPC客户端跟RegionServer通信完成请求以及结果接收等等。
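Invoker实现了InvocationHandler，它的invoke()大致是这样（示意，个别细节可能与0.20.6源码有出入）：把被调的方法和参数打包成一个Invocation（它是个Writable，方便序列化），交给client.call()发出去，拿到返回的Writable后再取出真正的返回值。

// HBaseRPC.Invoker.invoke() 的核心逻辑示意（非逐字源码）
public Object invoke(Object proxy, Method method, Object[] args)
    throws Throwable {
  // Invocation 封装了方法名和参数，序列化后通过 RPC 发给 RegionServer
  HbaseObjectWritable value = (HbaseObjectWritable) client.call(
      new Invocation(method, args), address, ticket);
  return value.get();  // 服务端返回的结果
}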
看看这个RPC客户端长什么样吧:
    public Invoker(InetSocketAddress address, UserGroupInformation ticket,
        Configuration conf, SocketFactory factory) {
      this.address = address;
      this.ticket = ticket;
      this.client = CLIENTS.getClient(conf, factory); // client 就是 RPC 客户端
    }
这个client是HBaseClient类的对象,这个HBaseClient类就是HBase中用来做RPC的客户端类。在这里HBaseClient也做了一个pool机制,不理解。。。code里面的注释如下:
// Construct & cache client. The configuration is only used for timeout,
// and Clients have connection pools. So we can either (a) lose some
// connection pooling and leak sockets, or (b) use the same timeout for all
// configurations. Since the IPC is usually intended globally, not
// per-job, we choose (a).
继续说下去,看这么一个client怎么完成最后的请求:
  public Writable call(Writable param, InetSocketAddress addr,
      UserGroupInformation ticket) throws IOException {
    Call call = new Call(param);
    Connection connection = getConnection(addr, ticket, call);
    connection.sendParam(call);                 // send the parameter
    boolean interrupted = false;
    synchronized (call) {
      while (!call.done) {
        try {
          call.wait();                          // wait for the result
        } catch (InterruptedException ie) {
          // save the fact that we were interrupted
          interrupted = true;
        }
      }
      if (interrupted) {
        // set the interrupt flag now that we are done waiting
        Thread.currentThread().interrupt();
      }
      if (call.error != null) {
        if (call.error instanceof RemoteException) {
          call.error.fillInStackTrace();
          throw call.error;
        }
        // local exception
        throw wrapException(addr, call.error);
      }
      return call.value;
    }
  }
又见connection,这次的connection可是用来发送接收数据用的thread了。从getConnection(addr, ticket, call)推断又是一个pool,果不其然:
  /** Get a connection from the pool, or create a new one and add it to the
   * pool.  Connections to a given host/port are reused. */
  private Connection getConnection(InetSocketAddress addr,
      UserGroupInformation ticket, Call call) throws IOException {
    if (!running.get()) {
      // the client is stopped
      throw new IOException("The client is stopped");
    }
    Connection connection;
    /* we could avoid this allocation for each RPC by having a
     * connectionsId object and with set() method. We need to manage the
     * refs for keys in HashMap properly. For now its ok.
     */
    ConnectionId remoteId = new ConnectionId(addr, ticket);
    do {
      synchronized (connections) {
        connection = connections.get(remoteId);
        if (connection == null) {
          connection = new Connection(remoteId);
          connections.put(remoteId, connection);
        }
      }
    } while (!connection.addCall(call));

    //we don't invoke the method below inside "synchronized (connections)"
    //block above. The reason for that is if the server happens to be slow,
    //it will take longer to establish a connection and that will slow the
    //entire system down.
    connection.setupIOstreams();
    return connection;
  }
也就是说，只要所要查询的RegionServer的addr和用户组信息一样，就会共享同一个connection。connection拿到后会把当前call放进自己内部的一个队列里（其实是call的id => call的一个映射），当call完成后会更新call的状态：主要是Call.done这个完成标志，以及把请求结果填充到Call.value里。
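这里的Call就是把一次请求的参数、结果和完成状态绑在一起的小对象，大致长这样（示意，非逐字源码）。connection的读线程收到响应后调用setValue()/setException()，进而callComplete()把前面HBaseClient.call()里在call.wait()上等着的线程唤醒：

// HBaseClient 内部 Call 类的大致形态（示意）
private class Call {
  int id;              // 每个 call 一个递增 id，响应回来时靠它对上号
  Writable param;      // 请求参数，序列化后发给 RegionServer
  Writable value;      // 响应结果
  IOException error;   // 远端或本地异常
  boolean done;        // 是否已完成

  protected synchronized void callComplete() {
    this.done = true;
    notify();          // 唤醒在 call() 里 wait() 的调用线程
  }

  public synchronized void setValue(Writable value) {
    this.value = value;
    callComplete();
  }

  public synchronized void setException(IOException error) {
    this.error = error;
    callComplete();
  }
}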
好了，接下来看connection如何发送请求数据。
    /** Initiates a call by sending the parameter to the remote server.
     * Note: this is not called from the Connection thread, but by other
     * threads.
     * @param call
     */
    public void sendParam(Call call) {
      if (shouldCloseConnection.get()) {
        return;
      }
      DataOutputBuffer d = null;
      try {
        synchronized (this.out) {
          if (LOG.isDebugEnabled())
            LOG.debug(getName() + " sending #" + call.id);
          //for serializing the
          //data to be written
          d = new DataOutputBuffer();
          d.writeInt(call.id);
          call.param.write(d);
          byte[] data = d.getData();
          int dataLength = d.getLength();
          out.writeInt(dataLength);      //first put the data length
          out.write(data, 0, dataLength);//write the data
          out.flush();
        }
      } catch (IOException e) {
        markClosed(e);
      } finally {
        //the buffer is just an in-memory buffer, but it is still polite to
        // close early
        IOUtils.closeStream(d);
      }
    }
从code里面看出,请求发送是synchronized,所以会有上一篇日志里提到的问题。
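发送讲完了，响应是怎么回来的呢？connection本身是个线程，它在run()里循环读取响应，大致逻辑如下（示意，非逐字源码）：先读出响应对应的call id，在自己维护的id => call映射里找到发起方，把结果填进去并唤醒等待的调用线程。

// Connection 线程接收响应的大致逻辑（示意）
private void receiveResponse() {
  try {
    int id = in.readInt();               // 先读出这次响应对应的 call id
    Call call = calls.get(id);           // 在 id => call 映射里找到发起这次请求的 call
    boolean isError = in.readBoolean();  // 远端是否抛了异常
    if (isError) {
      call.setException(new RemoteException(WritableUtils.readString(in),
          WritableUtils.readString(in)));
    } else {
      Writable value = ReflectionUtils.newInstance(valueClass, conf);
      value.readFields(in);              // 反序列化出返回值
      call.setValue(value);              // 填上结果并 notify() 唤醒调用线程
    }
    calls.remove(id);
  } catch (IOException e) {
    markClosed(e);
  }
}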
HBase客户端的code先看到这里吧。
（原文此处有一张示意图，帮助理解上面提到的各种pool。）