UserScan的处理流程分析
前置说明
Userscan是通过client或cp中发起的scanner操作。
在Scan中通过caching属性来返回可以返回多少条数据,每次进行next时。
通过batch属性来设置每次在rs端每次next kv时,可读取多少个kv,(在同一行的情况下)
在生成Scan实例时,最好是把family与column都设置上,这样能保证查询的最高效.
client端通过生成Scan实例,通过HTable下的如下方法得到ClientScanner实例
public ResultScanner getScanner(final Scan scan)
在生成的ClientScanner实例中的callable属性的值为生成的一个ScannerCallable实例。
并通过callable.prepare(tries != 0);方法得到此scan的startkey所在的region的location.在meta表中。
把startkey对应的location中得到此location的HRegionInfo信息。
并设置ClientScanner.currentRegion的值为当前的region.也就是startkey所在的region.
通过ClientScanner.next向rs发起rpc调用操作。调用HRegionServer.scan
public ScanResponse scan(finalRpcControllercontroller, final ScanRequest request)
ClientScanner.next时,首先是发起openScanner操作,得到一个ScannerId
通过ScannerCallable.call方法:
if (scannerId == -1L) {
this.scannerId = openScanner();
} else {
openScanner方法:中发起一个scan操作,通过rpc调用rs.scan
ScanRequest request =
RequestConverter.buildScanRequest(
getLocation().getRegionInfo().getRegionName(),
this.scan, 0, false);
try {
ScanResponse response = getStub().scan(null, request);
longid = response.getScannerId();
if (logScannerActivity) {
LOG.info("Open scanner=" + id + " for scan=" + scan.toString()
+ " on region " + getLocation().toString());
}
returnid;
HregionServer.scan中对openScanner的处理:
public ScanResponse scan(finalRpcControllercontroller, final ScanRequest request)
throws ServiceException {
Leases.Lease lease = null;
String scannerName = null;
........................................很多代码没有显示
requestCount.increment();
intttl = 0;
HRegion region = null;
RegionScannerscanner = null;
RegionScannerHolder rsh = null;
booleanmoreResults = true;
booleancloseScanner = false;
ScanResponse.Builder builder = ScanResponse.newBuilder();
if (request.hasCloseScanner()) {
closeScanner = request.getCloseScanner();
}
introws = 1;
if (request.hasNumberOfRows()) {
rows = request.getNumberOfRows();
}
if (request.hasScannerId()) {
.................................很多代码没有显示
} else {
得到请求的HRegion实例,也就是startkey所在的HRegion
region = getRegion(request.getRegion());
ClientProtos.Scan protoScan = request.getScan();
booleanisLoadingCfsOnDemandSet = protoScan.hasLoadColumnFamiliesOnDemand();
Scan scan = ProtobufUtil.toScan(protoScan);
// if the request doesn't set this, get the default region setting.
if (!isLoadingCfsOnDemandSet) {
scan.setLoadColumnFamiliesOnDemand(region.isLoadingCfsOnDemandDefault());
}
scan.getAttribute(Scan.SCAN_ATTRIBUTES_METRICS_ENABLE);
如果scan没有设置family,把region中所有的family当成scan的family
region.prepareScanner(scan);
if (region.getCoprocessorHost() != null) {
scanner = region.getCoprocessorHost().preScannerOpen(scan);
}
if (scanner == null) {
执行HRegion.getScanner方法。生成HRegion.RegionScannerImpl方法
scanner = region.getScanner(scan);
}
if (region.getCoprocessorHost() != null) {
scanner = region.getCoprocessorHost().postScannerOpen(scan, scanner);
}
把生成的RegionScanner添加到scanners集合容器中。并设置scannerid(一个随机的值),
scannername是scannerid的string版本。添加过期监控处理,
通过hbase.client.scanner.timeout.period配置过期时间,默认值为60000ms
老版本通过hbase.regionserver.lease.period配置。
过期检查线程通过Leases完成。对scanner的过期处理通过一个
HregionServer.ScannerListener.leaseExpired实例来完成。
scannerId = addScanner(scanner, region);
scannerName = String.valueOf(scannerId);
ttl = this.scannerLeaseTimeoutPeriod;
}
............................................很多代码没有显示
Hregion.getScanner方法生成RegionScanner实例流程
publicRegionScannergetScanner(Scan scan) throws IOException {
returngetScanner(scan, null);
}
层次的调用,此时传入的kvscannerlist为null
protectedRegionScannergetScanner(Scan scan,
List<KeyValueScanner> additionalScanners) throws IOException {
startRegionOperation(Operation.SCAN);
try {
// Verify families are all valid
prepareScanner(scan);
if(scan.hasFamilies()) {
for(byte [] family : scan.getFamilyMap().keySet()) {
checkFamily(family);
}
}
returninstantiateRegionScanner(scan, additionalScanners);
} finally {
closeRegionOperation();
}
}
最终生成一个HRegion.RegionScannerImpl实例
protectedRegionScannerinstantiateRegionScanner(Scan scan,
List<KeyValueScanner> additionalScanners) throws IOException {
returnnewRegionScannerImpl(scan, additionalScanners, this);
}
RegionScanner实例的生成构造方法:
RegionScannerImpl(Scan scan, List<KeyValueScanner> additionalScanners, HRegion region)
throws IOException {
this.region = region;
this.maxResultSize = scan.getMaxResultSize();
if (scan.hasFilter()) {
this.filter = newFilterWrapper(scan.getFilter());
} else {
this.filter = null;
}
this.batch = scan.getBatch();
if (Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW) && !scan.isGetScan()) {
this.stopRow = null;
} else {
this.stopRow = scan.getStopRow();
}
// If we are doing a get, we want to be [startRow,endRow] normally
// it is [startRow,endRow) and if startRow=endRow we get nothing.
this.isScan = scan.isGetScan() ? -1 : 0;
// synchronize on scannerReadPoints so that nobody calculates
// getSmallestReadPoint, before scannerReadPoints is updated.
IsolationLevelisolationLevel = scan.getIsolationLevel();
synchronized(scannerReadPoints) {
if (isolationLevel == IsolationLevel.READ_UNCOMMITTED) {
// This scan can read even uncommitted transactions
this.readPt = Long.MAX_VALUE;
MultiVersionConsistencyControl.setThreadReadPoint(this.readPt);
} else {
this.readPt = MultiVersionConsistencyControl.resetThreadReadPoint(mvcc);
}
scannerReadPoints.put(this, this.readPt);
}
// Here we separate all scanners into two lists - scanner that provide data required
// by the filter to operate (scanners list) and all others (joinedScanners list).
List<KeyValueScanner> scanners = newArrayList<KeyValueScanner>();
List<KeyValueScanner> joinedScanners = newArrayList<KeyValueScanner>();
if (additionalScanners != null) {
scanners.addAll(additionalScanners);
}
迭代每一个要进行scan的store。生成具体的StoreScanner实例。通常情况下joinedHead的值为null
for (Map.Entry<byte[], NavigableSet<byte[]>> entry :
scan.getFamilyMap().entrySet()) {
Storestore = stores.get(entry.getKey());
生成StoreScanner实例。通过HStore.getScanner(scan,columns);
KeyValueScannerscanner = store.getScanner(scan, entry.getValue());
if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
|| this.filter.isFamilyEssential(entry.getKey())) {
scanners.add(scanner);
} else {
joinedScanners.add(scanner);
}
}
生成KeyValueHeap实例,把所有的storescanner的开始位置移动到startkey的位置并得到top的StoreScanner,
this.storeHeap = newKeyValueHeap(scanners, comparator);
if (!joinedScanners.isEmpty()) {
this.joinedHeap = newKeyValueHeap(joinedScanners, comparator);
}
}
得到StoreScanner实例的HStore.getScanner(scan,columns)方法
publicKeyValueScannergetScanner(Scan scan,
finalNavigableSet<byte []> targetCols) throws IOException {
lock.readLock().lock();
try {
KeyValueScannerscanner = null;
if (this.getCoprocessorHost() != null) {
scanner = this.getCoprocessorHost().preStoreScannerOpen(this, scan, targetCols);
}
if (scanner == null) {
scanner = newStoreScanner(this, getScanInfo(), scan, targetCols);
}
returnscanner;
} finally {
lock.readLock().unlock();
}
}
生成StoreScanner的构造方法:
publicStoreScanner(Storestore, ScanInfo scanInfo, Scan scan, finalNavigableSet<byte[]> columns)
throws IOException {
this(store, scan.getCacheBlocks(), scan, columns, scanInfo.getTtl(),
scanInfo.getMinVersions());
如果设置有scan的_raw_属性时,columns的值需要为null
if (columns != null && scan.isRaw()) {
thrownewDoNotRetryIOException(
"Cannot specify any column for a raw scan");
}
matcher = newScanQueryMatcher(scan, scanInfo, columns,
ScanType.USER_SCAN, Long.MAX_VALUE, HConstants.LATEST_TIMESTAMP,
oldestUnexpiredTS);
得到StoreFileScanner,StoreFileScanner中引用的StoreFile.Reader中引用HFileReaderV2,
HFileReaderV2的实例在StoreFile.Reader中如果已经存在,不会重新创建,这样会加快scanner的创建时间。
// Pass columns to try to filter out unnecessary StoreFiles.
List<KeyValueScanner> scanners = getScannersNoCompaction();
// Seek all scanners to the start of the Row (or if the exact matching row
// key does not exist, then to the start of the next matching Row).
// Always check bloom filter to optimize the top row seek for delete
// family marker.
if (explicitColumnQuery && lazySeekEnabledGlobally) {
for (KeyValueScannerscanner : scanners) {
scanner.requestSeek(matcher.getStartKey(), false, true);
}
} else {
if (!isParallelSeekEnabled) {
for (KeyValueScannerscanner : scanners) {
scanner.seek(matcher.getStartKey());
}
} else {
parallelSeek(scanners, matcher.getStartKey());
}
}
// set storeLimit
this.storeLimit = scan.getMaxResultsPerColumnFamily();
// set rowOffset
this.storeOffset = scan.getRowOffsetPerColumnFamily();
// Combine all seeked scanners with a heap
heap = newKeyValueHeap(scanners, store.getComparator());
注册,如果有storefile更新时,把更新后的storefile添加到这个StoreScanner中来。
this.store.addChangedReaderObserver(this);
}
发起scan的rpc操作
client端发起openScanner操作后,得到一个scannerId.此时发起scan操作。
通过ScannerCallable.call中发起call的操作,在scannerId不等于-1时,
Result [] rrs = null;
ScanRequest request = null;
try {
incRPCcallsMetrics();
request = RequestConverter.buildScanRequest(scannerId, caching, false, nextCallSeq);
ScanResponse response = null;
PayloadCarryingRpcController controller = newPayloadCarryingRpcController();
try {
controller.setPriority(getTableName());
response = getStub().scan(controller, request);
...................................此处省去一些代码
nextCallSeq++;
longtimestamp = System.currentTimeMillis();
// Results are returned via controller
CellScannercellScanner = controller.cellScanner();
rrs = ResponseConverter.getResults(cellScanner, response);
HregionServer.scan方法中对scan时的处理流程:
得到scan中的caching属性的值,此值主要用来响应client返回的条数。如果一行数据包含多个kv,算一条
introws = 1;
if (request.hasNumberOfRows()) {
rows = request.getNumberOfRows();
}
如果client传入的scannerId有值,也就是不等于-1时,表示不是openScanner操作,检查scannerid是否过期
if (request.hasScannerId()) {
rsh = scanners.get(scannerName);
if (rsh == null) {
LOG.info("Client tried to access missing scanner " + scannerName);
thrownewUnknownScannerException(
"Name: " + scannerName + ", already closed?");
}
此处主要是检查region是否发生过split操作。如果是会出现NotServingRegionException操作。
scanner = rsh.s;
HRegionInfo hri = scanner.getRegionInfo();
region = getRegion(hri.getRegionName());
if (region != rsh.r) { // Yes, should be the same instance
thrownewNotServingRegionException("Region was re-opened after the scanner"
+ scannerName + " was created: " + hri.getRegionNameAsString());
}
} else {
...................................此处省去一些生成Regionscanner的代码
}
表示有设置caching,如果是执行scan,此时的默认值为1,当前scan中设置有caching后,使用scan中设置的值
if (rows > 0) {
// if nextCallSeq does not match throw Exception straight away. This needs to be
// performed even before checking of Lease.
// See HBASE-5974
是否有配置nextCallSeq的值,第一次调用时,此值为0,每调用一次加一,client也一样,每调用一次加一。
if (request.hasNextCallSeq()) {
if (rsh == null) {
rsh = scanners.get(scannerName);
}
if (rsh != null) {
if (request.getNextCallSeq() != rsh.nextCallSeq) {
thrownewOutOfOrderScannerNextException("Expected nextCallSeq: " + rsh.nextCallSeq
+ " But the nextCallSeq got from client: " + request.getNextCallSeq() +
"; request=" + TextFormat.shortDebugString(request));
}
// Increment the nextCallSeq value which is the next expected from client.
rsh.nextCallSeq++;
}
}
try {
先从租约管理中移出此租约,防止查找时间大于过期时间而出现的超时
// Remove lease while its being processed in server; protects against case
// where processing of request takes > lease expiration time.
lease = leases.removeLease(scannerName);
生成要返回的条数的一个列表,scan.caching
List<Result> results = newArrayList<Result>(rows);
longcurrentScanResultSize = 0;
booleandone = false;
调用cp的preScannernext,如果返回为true,表示不在执行scan操作。
// Call coprocessor. Get region info from scanner.
if (region != null && region.getCoprocessorHost() != null) {
Boolean bypass = region.getCoprocessorHost().preScannerNext(
scanner, results, rows);
if (!results.isEmpty()) {
for (Result r : results) {
if (maxScannerResultSize < Long.MAX_VALUE){
for (Cellkv : r.rawCells()) {
// TODO
currentScanResultSize += KeyValueUtil.ensureKeyValue(kv).heapSize();
}
}
}
}
if (bypass != null && bypass.booleanValue()) {
done = true;
}
}
执行scan操作。Cp的preScannerNext返回为false,或没有设置cp(主要是RegionObServer)
返回给client的最大size通过hbase.client.scanner.max.result.size配置,默认为long.maxvalue
如果scan也设置有maxResultSize,使用scan设置的值
if (!done) {
longmaxResultSize = scanner.getMaxResultSize();
if (maxResultSize <= 0) {
maxResultSize = maxScannerResultSize;
}
List<Cell> values = newArrayList<Cell>();
MultiVersionConsistencyControl.setThreadReadPoint(scanner.getMvccReadPoint());
region.startRegionOperation(Operation.SCAN);
try {
inti = 0;
synchronized(scanner) {
此处开始迭代,开始调用regionScanner(HRegion.RegionScannerImpl.nextRaw(List))进行查找,
迭代的长度为scan设置的caching的大小,如果执行RegionScanner.nextRaw(List)返回为false,时也会停止迭代
for (; i < rows
&& currentScanResultSize < maxResultSize; i++) {
返回的true表示还有数据,可以接着查询,否则表示此region中已经没有符合条件的数据了。
// Collect values to be returned here
booleanmoreRows = scanner.nextRaw(values);
if (!values.isEmpty()) {
if (maxScannerResultSize < Long.MAX_VALUE){
for (Cellkv : values) {
currentScanResultSize += KeyValueUtil.ensureKeyValue(kv).heapSize();
}
}
results.add(Result.create(values));
}
if (!moreRows) {
break;
}
values.clear();
}
}
region.readRequestsCount.add(i);
} finally {
region.closeRegionOperation();
}
// coprocessor postNext hook
if (region != null && region.getCoprocessorHost() != null) {
region.getCoprocessorHost().postScannerNext(scanner, results, rows, true);
}
}
如果没有可以再查找的数据时,设置response的moreResults为false
// If the scanner's filter - if any - is done with the scan
// and wants to tell the client to stop the scan. This is done by passing
// a null result, and setting moreResults to false.
if (scanner.isFilterDone() && results.isEmpty()) {
moreResults = false;
results = null;
} else {
添加结果到response中,如果hbase.client.rpc.codec配置有codec的值,
默认取hbase.client.default.rpc.codec配置的值,默认为KeyValueCodec
如果上面说的codec配置不为null时,把results生成为一个iterator,并生成一个匿名的CallScanner实现类
设置到scan时传入的controller中。这样能提升查询数据的读取性能。
如果没有配置codec时,默认直接把results列表设置到response中,这样响应的数据可能会比较大。
addResults(builder, results, controller);
}
} finally {
重新把租约放入到租约检查管理器中,此租约主要来检查client多长时间没有发起过scan的操作。
// We're done. On way out re-add the above removed lease.
// Adding resets expiration time on lease.
if (scanners.containsKey(scannerName)) {
if (lease != null) leases.addLease(lease);
ttl = this.scannerLeaseTimeoutPeriod;
}
}
}
client端获取响应的数据:ScannerCallable.call方法中
rrs = ResponseConverter.getResults(cellScanner, response);
ResponseConverter.getResults方法的实现
publicstatic Result[] getResults(CellScannercellScanner, ScanResponse response)
throws IOException {
if (response == null) returnnull;
// If cellscanner, then the number of Results to return is the count of elements in the
// cellsPerResult list. Otherwise, it is how many results are embedded inside the response.
intnoOfResults = cellScanner != null?
response.getCellsPerResultCount(): response.getResultsCount();
Result[] results = new Result[noOfResults];
for (inti = 0; i < noOfResults; i++) {
cellScanner如果codec配置为有值时,在rs响应时会生成一个匿名的实现
if (cellScanner != null) {
......................................
intnoOfCells = response.getCellsPerResult(i);
List<Cell> cells = newArrayList<Cell>(noOfCells);
for (intj = 0; j < noOfCells; j++) {
try {
if (cellScanner.advance() == false) {
.....................................
String msg = "Results sent from server=" + noOfResults + ". But only got " + i
+ " results completely at client. Resetting the scanner to scan again.";
LOG.error(msg);
thrownewDoNotRetryIOException(msg);
}
} catch (IOException ioe) {
...........................................
LOG.error("Exception while reading cells from result."
+ "Resetting the scanner to scan again.", ioe);
thrownewDoNotRetryIOException("Resetting the scanner.", ioe);
}
cells.add(cellScanner.current());
}
results[i] = Result.create(cells);
} else {
否则,没有设置codec,直接从response中读取出来数据,
// Result is pure pb.
results[i] = ProtobufUtil.toResult(response.getResults(i));
}
}
returnresults;
}
在ClientScanner.next方法中,如果还没有达到scan的caching的值,(默认为1)也就是countdown的值还不等于0
,countdown的值为得到一个Result时减1,通过nextScanner重新得到下一个region,并发起连接去scan数据。
Do{
.........................此处省去一些代码。
if (values != null && values.length > 0) {
for (Result rs : values) {
cache.add(rs);
for (Cellkv : rs.rawCells()) {
// TODO make method in Cell or CellUtil
remainingResultSize -= KeyValueUtil.ensureKeyValue(kv).heapSize();
}
countdown--;
this.lastResult = rs;
}
}
} while (remainingResultSize > 0 && countdown > 0 && nextScanner(countdown, values == null));
对于这种类型的查询操作,可以使用得到一个ClientScanner后,不执行close操作。
在rs的timeout前每次定期去从rs中拿一定量的数据下来。缓存到ClientScanner的cache中。
每次next时从cache中直接拿数据
Hregion.RegionScannerImpl.nextRaw(list)方法分析
RegionScannerImpl是对RegionScanner接口的实现。
Rs的scan在执行时通过regionScanner.nextRaw(list)来获取数据。
通过regionScanner.isFilterDone来检查此region的查找是否完成。
调用nextRaw方法,此方法调用另一个重载方法,batch是scan中设置的每次可查询最大的单行中的多少个kv的kv个数
publicbooleannextRaw(List<Cell> outResults)
throws IOException {
returnnextRaw(outResults, batch);
}
publicbooleannextRaw(List<Cell> outResults, intlimit) throws IOException {
booleanreturnResult;
调用nextInternal方法。
if (outResults.isEmpty()) {
// Usually outResults is empty. This is true when next is called
// to handle scan or get operation.
returnResult = nextInternal(outResults, limit);
} else {
List<Cell> tmpList = newArrayList<Cell>();
returnResult = nextInternal(tmpList, limit);
outResults.addAll(tmpList);
}
调用filter.reset方法,清空当前row的filter的相关信息。
ResetFilters();
如果filter.filterAllRemaining()的返回值为true,时表示当前region的查找条件已经结束,不能在执行查找操作。
没有可以接着查找的需要,也就是没有更多要查找的行了。
if (isFilterDone()) {
returnfalse;
}
................................此处省去一些代码
returnreturnResult;
}
nextInternal方法处理流程:
privatebooleannextInternal(List<Cell> results, intlimit)
throws IOException {
if (!results.isEmpty()) {
thrownewIllegalArgumentException("First parameter should be an empty list");
}
RpcCallContextrpcCall = RpcServer.getCurrentCall();
// The loop here is used only when at some point during the next we determine
// that due to effects of filters or otherwise, we have an empty row in the result.
// Then we loop and try again. Otherwise, we must get out on the first iteration via return,
// "true" if there's more data to read, "false" if there isn't (storeHeap is at a stop row,
// and joinedHeap has no more data to read for the last row (if set, joinedContinuationRow).
while (true) {
if (rpcCall != null) {
// If a user specifies a too-restrictive or too-slow scanner, the
// client might time out and disconnect while the server side
// is still processing the request. We should abort aggressively
// in that case.
longafterTime = rpcCall.disconnectSince();
if (afterTime >= 0) {
thrownewCallerDisconnectedException(
"Aborting on region " + getRegionNameAsString() + ", call " +
this + " after " + afterTime + " ms, since " +
"caller disconnected");
}
}
得到通过startkey seek后当前最小的一个kv。
// Let's see what we have in the storeHeap.
KeyValue current = this.storeHeap.peek();
byte[] currentRow = null;
intoffset = 0;
shortlength = 0;
if (current != null) {
currentRow = current.getBuffer();
offset = current.getRowOffset();
length = current.getRowLength();
}
检查是否到了stopkey,如果是,返回false,joinedContinuationRow是多个cf的关联查找,不用去管它
booleanstopRow = isStopRow(currentRow, offset, length);
// Check if we were getting data from the joinedHeap and hit the limit.
// If not, then it's main path - getting results from storeHeap.
if (joinedContinuationRow == null) {
// First, check if we are at a stop row. If so, there are no more results.
if (stopRow) {
如果是stopRow,同时filter.hasFilterRow返回为true时,
可通过filterRowCells来检查要返回的kvlist,也可以用来修改要返回的kvlist
if (filter != null && filter.hasFilterRow()) {
filter.filterRowCells(results);
}
returnfalse;
}
通过filter.filterRowkey来过滤检查key是否需要排除,如果是排除返回true,否则返回false
// Check if rowkey filter wants to exclude this row. If so, loop to next.
// Technically, if we hit limits before on this row, we don't need this call.
if (filterRowKey(currentRow, offset, length)) {
如果rowkey是需要排除的rowkey,检查是否有下一行数据。如果没有下一行数据,返回flase,表示当前region查找结束
否则清空当前的results,重新进行查找
booleanmoreRows = nextRow(currentRow, offset, length);
if (!moreRows) returnfalse;
results.clear();
continue;
}
开始执行region下此scan需要的所有store的StoreScanner的next进行查找,把查找的结果放到results列表中。
如果一行中包含有多个kv,现在查找这些kv达到传入的limit的大小的时候,返回kv_limit的一个空的kv。
(查找的大小已经达到limit(batch)的一行最大scan的kv个数,返回kv_limit),
否则表示还没有查找到limit的kv个数,但是当前row对应的所有达到条件的kv都已经查找完成,返回最后一个kv。
返回的kv如果不是kv_limit,那么有可能是null或者是下一行的第一个kv.
KeyValue nextKv = populateResult(results, this.storeHeap, limit, currentRow, offset,
length);
如果达到limit的限制时,filter.hasFilterRow的值一定得是false,
否则会throw IncompatibleFilterException
如果达到limit的限制时,返回true,当前row的所有kv查找结束,返回true可以接着向下查找
提示:如果hbase一行数据中可能包含多个kv时,最好是在scan时设置batch的属性,否则会一直查找到所有的kv结束
// Ok, we are good, let's try to get some results from the main heap.
if (nextKv == KV_LIMIT) {
if (this.filter != null && filter.hasFilterRow()) {
thrownewIncompatibleFilterException(
"Filter whose hasFilterRow() returns true is incompatible with scan with limit!");
}
returntrue; // We hit the limit.
}
是否到结束行,从这一行代码中可以看出,stoprow是不包含的,因为nextKv肯定是下一行row中第一个kv的值
stopRow = nextKv == null ||
isStopRow(nextKv.getBuffer(), nextKv.getRowOffset(), nextKv.getRowLength());
// save that the row was empty before filters applied to it.
finalbooleanisEmptyRow = results.isEmpty();
如果是stopRow,同时filter.hasFilterRow返回为true时,
可通过filterRowCells来检查要返回的kvlist,也可以用来修改要返回的kvlist
// We have the part of the row necessary for filtering (all of it, usually).
// First filter with the filterRow(List).
if (filter != null && filter.hasFilterRow()) {
filter.filterRowCells(results);
}
如果当前row的查找没有找到合法的kv,也就是results的列表没有值,检查是否还有下一行,
如果有,重新进行查找,否则表示当前region的查找最结尾处,不能再进行查找,返回fasle
if (isEmptyRow) {
booleanmoreRows = nextRow(currentRow, offset, length);
if (!moreRows) returnfalse;
results.clear();
// This row was totally filtered out, if this is NOT the last row,
// we should continue on. Otherwise, nothing else to do.
if (!stopRow) continue;
returnfalse;
}
// Ok, we are done with storeHeap for this row.
// Now we may need to fetch additional, non-essential data into row.
// These values are not needed for filter to work, so we postpone their
// fetch to (possibly) reduce amount of data loads from disk.
if (this.joinedHeap != null) {
..................................进行关联查找的代码,不显示,也不分析
}
} else {
多个store进行关联查询,不分析,通常情况不会有
// Populating from the joined heap was stopped by limits, populate some more.
populateFromJoinedHeap(results, limit);
}
// We may have just called populateFromJoinedMap and hit the limits. If that is
// the case, we need to call it again on the next next() invocation.
if (joinedContinuationRow != null) {
returntrue;
}
如果这次的查找,results的结果为空,表示没有查找到结果,检查是否还有下一行数据,如果有重新进行查找,
否则返回false表示此region的查找结束
// Finally, we are done with both joinedHeap and storeHeap.
// Double check to prevent empty rows from appearing in result. It could be
// the case when SingleColumnValueExcludeFilter is used.
if (results.isEmpty()) {
booleanmoreRows = nextRow(currentRow, offset, length);
if (!moreRows) returnfalse;
if (!stopRow) continue;
}
非stoprow时,表示还可以有下一行的数据,也就是可以接着进行next操作。否则表示此region的查找结束
// We are done. Return the result.
return !stopRow;
}
}
UserScan时的ScanQueryMatcher.match方法处理
user scan时的ScanQueryMatcher在newRegionScannerImpl(scan, additionalScanners, this);时生成。
在生成StoreScanner时通过如下代码生成matcher实例。
matcher = newScanQueryMatcher(scan, scanInfo, columns,
ScanType.USER_SCAN, Long.MAX_VALUE, HConstants.LATEST_TIMESTAMP,
oldestUnexpiredTS);
matcher.isUserScan的值此时为true.
publicMatchCodematch(KeyValue kv) throws IOException {
检查当前region的查找是否结束。pageFilter就是通过控制此filter中的方法来检查是否需要
if (filter != null && filter.filterAllRemaining()) {
returnMatchCode.DONE_SCAN;
}
byte [] bytes = kv.getBuffer();
intoffset = kv.getOffset();
intkeyLength = Bytes.toInt(bytes, offset, Bytes.SIZEOF_INT);
offset += KeyValue.ROW_OFFSET;
intinitialOffset = offset;
shortrowLength = Bytes.toShort(bytes, offset, Bytes.SIZEOF_SHORT);
offset += Bytes.SIZEOF_SHORT;
检查传入的kv是否是当前行的kv,也就是rowkey是否相同,如果当前的rowkey小于传入的rowkey,
表示现在已经next到下一行了,返回DONE,表示当前行查找结束
intret = this.rowComparator.compareRows(row, this.rowOffset, this.rowLength,
bytes, offset, rowLength);
if (ret <= -1) {
returnMatchCode.DONE;
} elseif (ret >= 1) {
如果当前的rowkey大于传入的rowkey,表示当前next出来的kv比现在的kv要小,执行nextrow操作。
// could optimize this, if necessary?
// Could also be called SEEK_TO_CURRENT_ROW, but this
// should be rare/never happens.
returnMatchCode.SEEK_NEXT_ROW;
}
是否跳过当前行的其它kv比较,这是一个优化项。
// optimize case.
if (this.stickyNextRow)
returnMatchCode.SEEK_NEXT_ROW;
如果当前行的所有要查找的(scan)column都查找完成了,其它的当前行中非要scan的kv,
直接不比较,执行nextrow操作。
if (this.columns.done()) {
stickyNextRow = true;
returnMatchCode.SEEK_NEXT_ROW;
}
//Passing rowLength
offset += rowLength;
//Skipping family
bytefamilyLength = bytes [offset];
offset += familyLength + 1;
intqualLength = keyLength -
(offset - initialOffset) - KeyValue.TIMESTAMP_TYPE_SIZE;
检查当前KV的TTL是否过期,如果过期,检查是否SCAN中还有下一个COLUMN,如果有返回SEEK_NEXT_COL。
否则返回SEEK_NEXT_ROW。
longtimestamp = Bytes.toLong(bytes, initialOffset + keyLength - KeyValue.TIMESTAMP_TYPE_SIZE);
// check for early out based on timestamp alone
if (columns.isDone(timestamp)) {
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
/*
* The delete logic is pretty complicated now.
* This is corroborated by the following:
* 1. The store might be instructed to keep deleted rows around.
* 2. A scan can optionally see past a delete marker now.
* 3. If deleted rows are kept, we have to find out when we can
* remove the delete markers.
* 4. Family delete markers are always first (regardless of their TS)
* 5. Delete markers should not be counted as version
* 6. Delete markers affect puts of the *same* TS
* 7. Delete marker need to be version counted together with puts
* they affect
*/
bytetype = bytes[initialOffset + keyLength – 1];
如果当前KV是删除的KV。
if (kv.isDelete()) {
此处会进入。把删除的KV添加到DeleteTracker中,默认是ScanDeleteTracker
if (!keepDeletedCells) {
// first ignore delete markers if the scanner can do so, and the
// range does not include the marker
//
// during flushes and compactions also ignore delete markers newer
// than the readpoint of any open scanner, this prevents deleted
// rows that could still be seen by a scanner from being collected
booleanincludeDeleteMarker = seePastDeleteMarkers ?
tr.withinTimeRange(timestamp) :
tr.withinOrAfterTimeRange(timestamp);
if (includeDeleteMarker
&& kv.getMvccVersion() <= maxReadPointToTrackVersions) {
this.deletes.add(bytes, offset, qualLength, timestamp, type);
}
// Can't early out now, because DelFam come before any other keys
}
此处的检查不会进入,userscan不保留删除的数据
if (retainDeletesInOutput
|| (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
|| kv.getMvccVersion() > maxReadPointToTrackVersions) {
// always include or it is not time yet to check whether it is OK
// to purge deltes or not
if (!isUserScan) {
// if this is not a user scan (compaction), we can filter this deletemarker right here
// otherwise (i.e. a "raw" scan) we fall through to normal version and timerange checking
returnMatchCode.INCLUDE;
}
} elseif (keepDeletedCells) {
if (timestamp < earliestPutTs) {
// keeping delete rows, but there are no puts older than
// this delete in the store files.
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
// else: fall through and do version counting on the
// delete markers
} else {
returnMatchCode.SKIP;
}
// note the following next else if...
// delete marker are not subject to other delete markers
} elseif (!this.deletes.isEmpty()) {
如果deleteTracker中不为空时,也就是当前行中有删除的KV,检查当前KV是否是删除的KV
提示:删除的KV在compare时,比正常的KV要小,所以在执行next操作时,delete的KV会先被查找出来。
如果是删除的KV,根据KV的删除类型,如果是版本被删除,返回SKIP。
否则如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
DeleteResultdeleteResult = deletes.isDeleted(bytes, offset, qualLength,
timestamp);
switch (deleteResult) {
caseFAMILY_DELETED:
caseCOLUMN_DELETED:
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
caseVERSION_DELETED:
caseFAMILY_VERSION_DELETED:
returnMatchCode.SKIP;
caseNOT_DELETED:
break;
default:
thrownewRuntimeException("UNEXPECTED");
}
}
检查KV的时间是否在SCAN要查找的时间范围内,
inttimestampComparison = tr.compare(timestamp);
如果大于SCAN的最大时间,返回SKIP。
if (timestampComparison >= 1) {
returnMatchCode.SKIP;
} elseif (timestampComparison <= -1) {
如果小于SCAN的最小时间,如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
检查当前KV的column是否是SCAN中指定的column列表中包含的值,如果是INCLUDE。
否则如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
// STEP 1: Check if the column is part of the requested columns
MatchCodecolChecker = columns.checkColumn(bytes, offset, qualLength, type);
如果column是SCAN中要查找的column之一
if (colChecker == MatchCode.INCLUDE) {
ReturnCodefilterResponse = ReturnCode.SKIP;
// STEP 2: Yes, the column is part of the requested columns. Check if filter is present
if (filter != null) {
执行filter.filterKeyValue操作。并返回filter过滤的结果
// STEP 3: Filter the key value and return if it filters out
filterResponse = filter.filterKeyValue(kv);
switch (filterResponse) {
caseSKIP:
returnMatchCode.SKIP;
caseNEXT_COL:
如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
caseNEXT_ROW:
stickyNextRow = true;
returnMatchCode.SEEK_NEXT_ROW;
caseSEEK_NEXT_USING_HINT:
returnMatchCode.SEEK_NEXT_USING_HINT;
default:
//It means it is either include or include and seek next
break;
}
}
/*
* STEP 4: Reaching this step means the column is part of the requested columns and either
* the filter is null or the filter has returned INCLUDE or INCLUDE_AND_NEXT_COL response.
* Now check the number of versions needed. This method call returns SKIP, INCLUDE,
* INCLUDE_AND_SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL.
*
* FilterResponse ColumnChecker Desired behavior
* INCLUDE SKIP row has already been included, SKIP.
* INCLUDE INCLUDE INCLUDE
* INCLUDE INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE INCLUDE_AND_SEEK_NEXT_ROW INCLUDE_AND_SEEK_NEXT_ROW
* INCLUDE_AND_SEEK_NEXT_COL SKIP row has already been included, SKIP.
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_ROW INCLUDE_AND_SEEK_NEXT_ROW
*
* In all the above scenarios, we return the column checker return value except for
* FilterResponse (INCLUDE_AND_SEEK_NEXT_COL) and ColumnChecker(INCLUDE)
*/
此处主要是检查KV的是否是SCAN的最大版本内,到这个地方,除非是KV超过了要SCAN的最大版本,或者KV的TTL过期。
否则肯定是会包含此KV的值。
colChecker =
columns.checkVersions(bytes, offset, qualLength, timestamp, type,
kv.getMvccVersion() > maxReadPointToTrackVersions);
//Optimize with stickyNextRow
stickyNextRow = colChecker == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW ? true : stickyNextRow;
return (filterResponse == ReturnCode.INCLUDE_AND_NEXT_COL &&
colChecker == MatchCode.INCLUDE) ? MatchCode.INCLUDE_AND_SEEK_NEXT_COL
: colChecker;
}
stickyNextRow = (colChecker == MatchCode.SEEK_NEXT_ROW) ? true
: stickyNextRow;
returncolChecker;
}
相关推荐
病人跟踪治疗信息管理系统采用B/S模式,促进了病人跟踪治疗信息管理系统的安全、快捷、高效的发展。传统的管理模式还处于手工处理阶段,管理效率极低,随着病人的不断增多,传统基于手工管理模式已经无法满足当前病人需求,随着信息化时代的到来,使得病人跟踪治疗信息管理系统的开发成了必然。 本网站系统使用动态网页开发SSM框架,Java作为系统的开发语言,MySQL作为后台数据库。设计开发了具有管理员;首页、个人中心、病人管理、病例采集管理、预约管理、医生管理、上传核酸检测报告管理、上传行动轨迹管理、分类管理、病人治疗状况管理、留言板管理、系统管理,病人;首页、个人中心、病例采集管理、预约管理、医生管理、上传核酸检测报告管理、上传行动轨迹管理、病人治疗状况管理,前台首页;首页、医生、医疗资讯、留言反馈、个人中心、后台管理、在线咨询等功能的病人跟踪治疗信息管理系统。在设计过程中,充分保证了系统代码的良好可读性、实用性、易扩展性、通用性、便于后期维护、操作方便以及页面简洁等特点。
liunx project 5
分享课程——PostgreSQL DBA实战视频教程(完整10门课程合集)
计算机科学基础期末考试试题
练习与巩固《C语言程序设计》理论知识,通过实践检验和提高实际能力,进一步培养自己综合分析问题和解决问题的能力。掌握运用C语言独立地编写调试应用程序和进行其它相关设计的技能。
1. 数据集资源 公开低光照数据集:用于模型训练的低光照图像数据集,这些数据集包含了多种低光照条件下的图像,并附有增强后的高质量图像。 合成数据:在不足数据的情况下,可以通过对高亮度图像进行暗化处理生成低光图像对,以增强数据量。 自建数据集:对于特定场景,如安防、医疗等,可以拍摄或收集特定条件下的低光照图像来创建数据集,满足特定应用需求。 2. 硬件资源 GPU:大规模模型训练需要高性能计算,以支持大规模图像处理和神经网络训练。 数据存储:由于图像数据较大,需要大容量的存储设备如HDD或SSD来存储数据集及中间结果。 3. 深度学习框架及工具 PyTorch:支持构建和训练神经网络模型,尤其适合卷积神经网络(CNN)和生成对抗网络(GAN)的实现。 CUDA和cuDNN:为GPU加速库,在模型训练时可显著提升运行效率。
双哥微服务
fb000f5e-12c5-a46b-102a-f08bdfa015f1.json
ASP.NET跑腿服务网站源码 开发环境 :Asp.net + VS2010 + C# + ACCESS 网站介绍: 适合人群:跑腿服务行业公司,服务资讯公司或者其他行业企业、 做服务行业建站的技术人员、技术人员学习参考都行。 技术特点:非常清爽大气的网站,界面华丽,工整,采用全div布局, 含flash图片切换功能,强大的后台信息管理功能。 功能介绍: 后台功能:系统参数设置(网站标题,关键字,内容,站长联系方式等)、系统栏目频道设置、新闻管 理、服务项目管理、公司介绍内容管、系统模版管理(可管理前台页面模版内容,具体到头部页面,底 部页面,首页,内容页,新网页等)、系统日志管理、系统管理员管理、频道管理(频道类型、频道内 容、内容发布以及编辑)。 后台地址:网址/admin/login.aspx 账户:admin 密码:admin888
c语言
环境说明: 开发语言:Java/php JDK版本:JDK1.8 数据库:mysql 5.7及以上 数据库工具:Navicat11及以上 开发软件:eclipse/idea 小程序框架:uniapp/原生小程序 开发工具:HBuilder X/微信开发者
人工智能(Artificial Intelligence,缩写为AI)是一种通过计算机程序模拟人类智能与行为的技术和理论。它可以用于各种领域,例如:自动驾驶、机器翻译、语音识别、图像识别、医疗诊断等。近年来,人工智能逐渐成为了技术界和商业领域的热门话题。
c语言
基于JAVA实现的离散数学题库管理系统
Matlab领域上传的视频均有对应的完整代码,皆可运行,亲测可用,适合小白; 1、代码压缩包内容 主函数:main.m; 调用函数:其他m文件;无需运行 运行结果效果图; 2、代码运行版本 Matlab 2019b;若运行有误,根据提示修改;若不会,私信博主; 3、运行操作步骤 步骤一:将所有文件放到Matlab的当前文件夹中; 步骤二:双击打开main.m文件; 步骤三:点击运行,等程序运行完得到结果; 4、仿真咨询 如需其他服务,可私信博主; 4.1 博客或资源的完整代码提供 4.2 期刊或参考文献复现 4.3 Matlab程序定制 4.4 科研合作
# 基于C++的MiniSQL数据库系统 ## 项目简介 MiniSQL是一个轻量级的关系型数据库管理系统,旨在提供基本的SQL解析和执行功能。该项目参考了CMU15445 BusTub框架,并在其基础上进行了修改和扩展,以兼容原MiniSQL实验指导的要求。MiniSQL支持缓冲池管理、索引管理、记录管理等核心功能,并提供了简单的交互式SQL解析和执行引擎。 ## 项目的主要特性和功能 1. 缓冲池管理实现了一个高效的缓冲池管理器,用于缓存磁盘上的数据页,以提高数据访问速度。 2. 索引管理支持B+树索引,提供高效的插入、删除和查找操作。 3. 记录管理实现了记录的插入、删除、更新和查询功能,支持持久化存储。 4. 元数据管理提供了表和索引的元数据管理功能,支持持久化存储和检索。 5. 并发控制实现了基本的锁管理器,支持事务的并发控制。 6. 查询执行提供了简单的查询执行引擎,支持基本的SQL语句解析和执行。 ## 安装使用步骤
社会科学研究Top 10,000 Papers数据解析被引次数下载次数等 一、数据背景与来源 该数据集来源于SSRN(Social Science Research Network)的社会科学研究Top 10,000 Papers,是根据多种学术影响力指标统计得出的,在其平台上最受关注的前10,000篇学术论文的汇总。这些数据反映了国际研究领域的热点话题和发展趋势,对于国内学者研究者来说,是了解社科领域研究进展的重要窗口。 二、数据内容概览 样本数量:数据集包含10,000条记录,每条记录代表一篇在SSRN平台上具有高影响力的学术论文。 论文范围:涵盖社会科学研究的各个领域,包括但不限于经济学、政治学、社会学、心理学、教育学等。 关键指标: 数据下载次数:反映了论文的受欢迎程度和研究者对其内容的关注度。 引用次数:体现了论文在学术界的认可度和影响力,是评估论文质量的重要指标之一。 Rank Paper Total New Downloads Total # of Downloads Total # of Citations # of Authors
行业研究报告、行业调查报告、研报
【作品名称】:基于 Java+Mysql 实现的企业人事管理系统【课程设计/毕业设计】(源码+设计报告) 【适用人群】:适用于希望学习不同技术领域的小白或进阶学习者。可作为毕设项目、课程设计、大作业、工程实训或初期项目立项。 【项目介绍】: [1]管理员可以对员工的基本信息的增删改查,普通员工仅可查看; [2]对公司里所有员工进行分配工号,并进行基本信息的录入; [3]对新聘用的员工,将其信息加入到员工档案记录中; [4]对于解聘的员工,将其信息从员工档案记录中删除。 [5]当员工信息发生变动时,修改员工档案记录中相应的属性。 (三)员工岗位信息管理 [1]对公司里所有员工的职务及岗位信息(岗位职责)进行记录; [2]记录员工调动前后的具体职务,以及调动时间。 (四)考勤管理 [1]对员工上班刷卡的记录进行统一编号;登记员工上班时间(准时、迟到)。 [2]对员工下班刷卡的记录进行统一编号;登记员工下班时间(准时、早 【资源声明】:本资源作为“参考资料”而不是“定制需求”,代码只能作为参考,不能完全复制照搬。需要有一定的基础看懂代码,自行调试代码并解决报错,能自行添加功能修改代码。
# 基于Arduino编程的冰箱警报系统 ## 项目简介 这是一个基于Arduino编程的项目,通过连接到冰箱门开关的警报系统来提醒用户冰箱门开启时间过长。用户可以在设定的时间内关闭冰箱门,否则警报会响起。项目使用LCD控制器面板来设置和配置警报延迟时间。 ## 项目的主要特性和功能 1. 警报功能在冰箱门开启后,系统会开始计时,如果用户在设定的时间内未关闭冰箱门,警报会响起。 2. LCD配置面板使用LCD控制器面板设置和配置警报延迟时间。 3. 可配置警报时间用户可以根据需要调整警报延迟时间。 4. 状态显示LCD面板显示冰箱门的状态(开启关闭)。 ## 安装使用步骤 1. 下载并解压项目文件。 2. 准备硬件部件根据提供的物料清单(Bill of Materials)准备所需的硬件部件。 3. 连接硬件部件按照项目文档中的连接表(Connection Table)将硬件部件连接到Arduino主板和LCD控制器面板。