HBase Coprocessor Example

paddy.w

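HBase coprocessors come in two kinds: Observer and Endpoint. An Observer is comparable to a database trigger: its code is deployed on the server side and acts as a proxy around API calls. Plenty of articles already cover Observers, so I won't repeat them here. This post is about how to use an Endpoint.

An Endpoint is comparable to a stored procedure. In 0.94.x and earlier, an Endpoint had to implement the CoprocessorProtocol interface; starting with 0.96.x, Endpoints use protobufs as the RPC protocol instead. A concrete example shows how the new-style Endpoint is used.

Example: count the number of rows in a table.
First, write the protobuf file and compile it.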

option java_package = "linecounter";
option java_outer_classname = "LineCounterServer";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for=SPEED;

message CountRequest {
    required string askWord = 1;
}

message CountResponse {
    required int64 retWord = 1;
}

service LineCounter {
    rpc countLine(CountRequest)
        returns (CountResponse);
}

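Compiling it generates LineCounterServer.java.
CountRequest is the message sent to the server; it defines a string field, askWord, that carries the actual request content. CountResponse is the returned result; because it holds a row count, it uses an int64 field (a long in Java). The LineCounter service defines a single method, countLine, which takes the request and returns the response. See the protobuf documentation for details.
Implementing the Endpoint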

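import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.RegionScanner;

import linecounter.LineCounterServer;

public class LineCounterEndPoint extends LineCounterServer.LineCounter implements Coprocessor, CoprocessorService {

    private RegionCoprocessorEnvironment env;

    @Override
    public void start(CoprocessorEnvironment coprocessorEnvironment) throws IOException {
        // The endpoint only makes sense when it is loaded on a table region
        if (coprocessorEnvironment instanceof RegionCoprocessorEnvironment)
            this.env = (RegionCoprocessorEnvironment) coprocessorEnvironment;
        else throw new CoprocessorException("Must be loaded on a table region!!");
    }

    @Override
    public void stop(CoprocessorEnvironment coprocessorEnvironment) throws IOException {
        // Nothing to clean up
    }

    @Override
    public Service getService() {
        return this;
    }

    @Override
    public void countLine(RpcController controller, LineCounterServer.CountRequest request, RpcCallback<LineCounterServer.CountResponse> done) {
        LineCounterServer.CountResponse response = null;
        if (!"count".equals(request.getAskWord())) {
            // Any other request: return a fixed value so the round trip can be tested
            response = LineCounterServer.CountResponse.newBuilder().setRetWord(23333).build();
        } else {
            RegionScanner scanner = null;
            try {
                Scan scan = new Scan();
                scan.setMaxVersions(1);
                scanner = env.getRegion().getScanner(scan);
                List<Cell> cells = new ArrayList<>();
                long count = 0;
                boolean hasMore;
                do {
                    // next() returns false for the last row but still fills the list
                    hasMore = scanner.next(cells);
                    if (!cells.isEmpty())
                        count += 1;
                    cells.clear();
                } while (hasMore);
                response = LineCounterServer.CountResponse.newBuilder().setRetWord(count).build();
            } catch (IOException e) {
                // Report the failure to the client; the response stays null
                ResponseConverter.setControllerException(controller, e);
            } finally {
                if (scanner != null)
                    try {
                        scanner.close();
                    } catch (IOException ignored) {
                        // Nothing useful to do if close fails
                    }
            }
        }
        done.run(response);
    }
}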

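LineCounterEndPoint must extend the abstract class LineCounter and implement the Coprocessor and CoprocessorService interfaces. LineCounter lives in the Java file generated above.
The start and stop methods handle initialization before the endpoint runs and cleanup after it finishes. The parameter of start is an interface type, so it has to be cast to the concrete environment type that is needed.
The main work is implementing countLine, the method defined in the protobuf file. To make the behavior easy to test, the request is checked: if the request message is not "count", the endpoint returns 23333; otherwise it counts the rows in the region and returns that count.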
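One detail of the counting step deserves care: a region scanner's next(list) reports whether more rows exist after the current one, so the last row arrives together with a false return value. The following is a minimal, self-contained sketch of that contract in plain Java. FakeScanner and ScanLoopSketch are hypothetical names written for this illustration and are not HBase classes; rows are modelled as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a region scanner; NOT an HBase class.
class FakeScanner {
    private final Iterator<String> rows;

    FakeScanner(List<String> rows) {
        this.rows = rows.iterator();
    }

    // Fills 'out' with the current row (if any) and returns true
    // only when more rows remain after it.
    boolean next(List<String> out) {
        if (rows.hasNext()) {
            out.add(rows.next());
        }
        return rows.hasNext();
    }
}

class ScanLoopSketch {
    // Counts every row, including the last one delivered with 'false'.
    static long countRows(FakeScanner scanner) {
        long count = 0;
        List<String> cells = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(cells);
            if (!cells.isEmpty()) {
                count++;
            }
            cells.clear(); // reuse the buffer for the next row
        } while (hasMore);
        return count;
    }

    // Naive loop: drops the final row because next() returned false for it.
    static long countRowsNaive(FakeScanner scanner) {
        long count = 0;
        List<String> cells = new ArrayList<>();
        while (scanner.next(cells)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println(countRows(new FakeScanner(rows)));      // 3
        System.out.println(countRowsNaive(new FakeScanner(rows))); // 2
    }
}
```

With three rows the do/while version counts 3 while the plain while loop counts 2, which is why an endpoint that loops on the return value alone can undercount each region by one.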
Implementing the client

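import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;
import linecounter.LineCounterServer;

public class LineCounterClient {

    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk_host1:2181,zk_host2:2181,zk_host3:2181");
        conf.set("hbase.master", "host_master:60000");
        HTable table = new HTable(conf, "count_test");
        try {
            final LineCounterServer.CountRequest req = LineCounterServer.CountRequest.newBuilder().setAskWord("count").build();
            // null start and end keys: invoke the endpoint on every region of the table
            Map<byte[], Long> tmpRet = table.coprocessorService(LineCounterServer.LineCounter.class, null, null, new Batch.Call<LineCounterServer.LineCounter, Long>() {
                @Override
                public Long call(LineCounterServer.LineCounter instance) throws IOException {
                    ServerRpcController controller = new ServerRpcController();
                    BlockingRpcCallback<LineCounterServer.CountResponse> rpc = new BlockingRpcCallback<>();
                    instance.countLine(controller, req, rpc);
                    LineCounterServer.CountResponse resp = rpc.get();
                    // Surface a server-side failure instead of reading a missing response
                    if (controller.failedOnException())
                        throw controller.getFailedOn();
                    return resp.getRetWord();
                }
            });
            // One entry per region: add them up for the table-wide count
            long ret = 0;
            for (long l : tmpRet.values())
                ret += l;
            System.out.println("lines: " + ret);
        } finally {
            table.close();
        }
    }
}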

First, set the ZooKeeper and master addresses and ports. Then build the request, a CountRequest, with the request message set to "count", and call HTable's coprocessorService method:

public <T extends Service, R> Map<byte[],R> coprocessorService(final Class<T> service,
      byte[] startKey, byte[] endKey, final Batch.Call<T,R> callable)

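The method takes four parameters. The first is the Class object of the protobuf-generated LineCounter service. The second and third are the start and end rowkeys: the endpoint is invoked on every region that holds rowkeys in that range, and passing null for both means it is invoked on all regions. The fourth is an interface (Batch.Call) whose call method must be overridden.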
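To make the effect of the start and end rowkeys concrete, here is a small plain-Java sketch of how a key range selects regions. It is an illustration under stated assumptions, not the HBase implementation: the Region class, the region names A, B and C, and their boundaries are hypothetical; keys are modelled as strings compared lexicographically; an empty string stands for an open region boundary; and the end key is treated as inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RegionRangeSketch {
    // Hypothetical region descriptor covering [startKey, endKey); "" means unbounded.
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Made-up layout: three regions splitting the key space at "g" and "p".
    static List<Region> sampleRegions() {
        return Arrays.asList(
                new Region("A", "", "g"),
                new Region("B", "g", "p"),
                new Region("C", "p", ""));
    }

    // Names of regions that can hold a key in [start, end]; null means unbounded.
    static List<String> regionsInRange(List<Region> regions, String start, String end) {
        List<String> hit = new ArrayList<>();
        for (Region r : regions) {
            boolean endsBeforeStart = start != null
                    && !r.endKey.isEmpty()
                    && r.endKey.compareTo(start) <= 0;
            boolean startsAfterEnd = end != null && r.startKey.compareTo(end) > 0;
            if (!endsBeforeStart && !startsAfterEnd) {
                hit.add(r.name);
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        List<Region> regions = sampleRegions();
        System.out.println(regionsInRange(regions, null, null)); // [A, B, C]
        System.out.println(regionsInRange(regions, "h", "k"));   // [B]
    }
}
```

In this model, passing null for both keys touches every region, while a narrow range such as "h" to "k" touches only the region that contains it.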
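The return value is a Map whose size equals the number of regions that took part in the computation. The last step is therefore to sum the returned values to get the final result.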
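The summing step can be isolated as a small helper. This is a plain-Java sketch; RegionSumSketch, the region names and the per-region counts are hypothetical values made up for illustration. It skips null values on the assumption that a call could return no result for a region.

```java
import java.util.HashMap;
import java.util.Map;

class RegionSumSketch {
    // Adds up the per-region values; skips null in case a region reported nothing.
    static long total(Map<byte[], Long> perRegion) {
        long sum = 0;
        for (Long value : perRegion.values()) {
            if (value != null) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical region names and counts, for illustration only.
        Map<byte[], Long> perRegion = new HashMap<>();
        perRegion.put("region-a".getBytes(), 100L);
        perRegion.put("region-b".getBytes(), 250L);
        perRegion.put("region-c".getBytes(), null);
        System.out.println("lines: " + total(perRegion)); // lines: 350
    }
}
```

Iterating over Long rather than long avoids a NullPointerException from auto-unboxing if any map value is null.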
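This program returns 5782, the number of rows in the table count_test. If the request message is set to "hello", the program returns 23333 (each region returns 23333 and the client sums the values, so this holds when the table has a single region).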

coprocessorService also has a five-parameter overload whose fifth parameter is a Batch.Callback, so the client can also be written like this:

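import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;
import org.apache.hadoop.hbase.util.Bytes;
import linecounter.LineCounterServer;

public class LineCounterClient {

    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk_host1:2181,zk_host2:2181,zk_host3:2181");
        conf.set("hbase.master", "host_master:60000");
        HTable table = new HTable(conf, "count_test");
        try {
            final LineCounterServer.CountRequest req = LineCounterServer.CountRequest.newBuilder().setAskWord("count").build();
            // Running total, updated from the callback as each region reports
            final AtomicLong ret = new AtomicLong();
            table.coprocessorService(LineCounterServer.LineCounter.class, null, null, new Batch.Call<LineCounterServer.LineCounter, Long>() {
                @Override
                public Long call(LineCounterServer.LineCounter instance) throws IOException {
                    ServerRpcController controller = new ServerRpcController();
                    BlockingRpcCallback<LineCounterServer.CountResponse> rpc = new BlockingRpcCallback<>();
                    instance.countLine(controller, req, rpc);
                    LineCounterServer.CountResponse resp = rpc.get();
                    // Surface a server-side failure instead of reading a missing response
                    if (controller.failedOnException())
                        throw controller.getFailedOn();
                    return resp.getRetWord();
                }
            }, new Batch.Callback<Long>() {
                @Override
                public void update(byte[] region, byte[] row, Long result) {
                    ret.getAndAdd(result);
                    System.out.println(Bytes.toString(row) + ": " + result);
                }
            });
            System.out.println("lines: " + ret.get());
        } finally {
            table.close();
        }
    }
}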

After each invocation of call, update is invoked once, so a variable ret defined outside holds the result and each update call adds to it.
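The reason for accumulating into an AtomicLong rather than a plain long can be shown without HBase. The sketch below assumes that per-region callbacks may arrive from several worker threads; ResultCallback and CallbackSumSketch are hypothetical names loosely modelled on the update method, and the per-region counts are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class CallbackSumSketch {
    // Hypothetical callback shape; not the HBase Batch.Callback interface.
    interface ResultCallback {
        void update(String region, long result);
    }

    // Runs one task per simulated region on a thread pool and reports
    // each result through the callback from the worker thread.
    static void runPerRegion(final long[] regionCounts, final ResultCallback callback) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < regionCounts.length; i++) {
                final int idx = i;
                futures.add(pool.submit(new Runnable() {
                    @Override
                    public void run() {
                        callback.update("region-" + idx, regionCounts[idx]);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait until every region has reported
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Accumulates the callback results in an AtomicLong, safe across threads.
    static long sumWithCallback(long[] regionCounts) {
        final AtomicLong total = new AtomicLong();
        runPerRegion(regionCounts, new ResultCallback() {
            @Override
            public void update(String region, long result) {
                total.addAndGet(result);
            }
        });
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumWithCallback(new long[] {10, 20, 30})); // 60
    }
}
```

The anonymous callback can only capture final variables, which is one more reason a mutable holder such as AtomicLong is a natural fit for the running total.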
