UserScan的处理流程分析
前置说明
Userscan是通过client或cp中发起的scanner操作。
在Scan中通过caching属性来返回可以返回多少条数据,每次进行next时。
通过batch属性来设置每次在rs端每次next kv时,可读取多少个kv,(在同一行的情况下)
在生成Scan实例时,最好是把family与column都设置上,这样能保证查询的最高效.
client端通过生成Scan实例,通过HTable下的如下方法得到ClientScanner实例
public ResultScanner getScanner(final Scan scan)
在生成的ClientScanner实例中的callable属性的值为生成的一个ScannerCallable实例。
并通过callable.prepare(tries != 0);方法得到此scan的startkey所在的region的location.在meta表中。
把startkey对应的location中得到此location的HRegionInfo信息。
并设置ClientScanner.currentRegion的值为当前的region.也就是startkey所在的region.
通过ClientScanner.next向rs发起rpc调用操作。调用HRegionServer.scan
public ScanResponse scan(finalRpcControllercontroller, final ScanRequest request)
ClientScanner.next时,首先是发起openScanner操作,得到一个ScannerId
通过ScannerCallable.call方法:
if (scannerId == -1L) {
this.scannerId = openScanner();
} else {
openScanner方法:中发起一个scan操作,通过rpc调用rs.scan
ScanRequest request =
RequestConverter.buildScanRequest(
getLocation().getRegionInfo().getRegionName(),
this.scan, 0, false);
try {
ScanResponse response = getStub().scan(null, request);
longid = response.getScannerId();
if (logScannerActivity) {
LOG.info("Open scanner=" + id + " for scan=" + scan.toString()
+ " on region " + getLocation().toString());
}
returnid;
HregionServer.scan中对openScanner的处理:
public ScanResponse scan(finalRpcControllercontroller, final ScanRequest request)
throws ServiceException {
Leases.Lease lease = null;
String scannerName = null;
........................................很多代码没有显示
requestCount.increment();
intttl = 0;
HRegion region = null;
RegionScannerscanner = null;
RegionScannerHolder rsh = null;
booleanmoreResults = true;
booleancloseScanner = false;
ScanResponse.Builder builder = ScanResponse.newBuilder();
if (request.hasCloseScanner()) {
closeScanner = request.getCloseScanner();
}
introws = 1;
if (request.hasNumberOfRows()) {
rows = request.getNumberOfRows();
}
if (request.hasScannerId()) {
.................................很多代码没有显示
} else {
得到请求的HRegion实例,也就是startkey所在的HRegion
region = getRegion(request.getRegion());
ClientProtos.Scan protoScan = request.getScan();
booleanisLoadingCfsOnDemandSet = protoScan.hasLoadColumnFamiliesOnDemand();
Scan scan = ProtobufUtil.toScan(protoScan);
// if the request doesn't set this, get the default region setting.
if (!isLoadingCfsOnDemandSet) {
scan.setLoadColumnFamiliesOnDemand(region.isLoadingCfsOnDemandDefault());
}
scan.getAttribute(Scan.SCAN_ATTRIBUTES_METRICS_ENABLE);
如果scan没有设置family,把region中所有的family当成scan的family
region.prepareScanner(scan);
if (region.getCoprocessorHost() != null) {
scanner = region.getCoprocessorHost().preScannerOpen(scan);
}
if (scanner == null) {
执行HRegion.getScanner方法。生成HRegion.RegionScannerImpl方法
scanner = region.getScanner(scan);
}
if (region.getCoprocessorHost() != null) {
scanner = region.getCoprocessorHost().postScannerOpen(scan, scanner);
}
把生成的RegionScanner添加到scanners集合容器中。并设置scannerid(一个随机的值),
scannername是scannerid的string版本。添加过期监控处理,
通过hbase.client.scanner.timeout.period配置过期时间,默认值为60000ms
老版本通过hbase.regionserver.lease.period配置。
过期检查线程通过Leases完成。对scanner的过期处理通过一个
HregionServer.ScannerListener.leaseExpired实例来完成。
scannerId = addScanner(scanner, region);
scannerName = String.valueOf(scannerId);
ttl = this.scannerLeaseTimeoutPeriod;
}
............................................很多代码没有显示
Hregion.getScanner方法生成RegionScanner实例流程
publicRegionScannergetScanner(Scan scan) throws IOException {
returngetScanner(scan, null);
}
层次的调用,此时传入的kvscannerlist为null
protectedRegionScannergetScanner(Scan scan,
List<KeyValueScanner> additionalScanners) throws IOException {
startRegionOperation(Operation.SCAN);
try {
// Verify families are all valid
prepareScanner(scan);
if(scan.hasFamilies()) {
for(byte [] family : scan.getFamilyMap().keySet()) {
checkFamily(family);
}
}
returninstantiateRegionScanner(scan, additionalScanners);
} finally {
closeRegionOperation();
}
}
最终生成一个HRegion.RegionScannerImpl实例
protectedRegionScannerinstantiateRegionScanner(Scan scan,
List<KeyValueScanner> additionalScanners) throws IOException {
returnnewRegionScannerImpl(scan, additionalScanners, this);
}
RegionScanner实例的生成构造方法:
RegionScannerImpl(Scan scan, List<KeyValueScanner> additionalScanners, HRegion region)
throws IOException {
this.region = region;
this.maxResultSize = scan.getMaxResultSize();
if (scan.hasFilter()) {
this.filter = newFilterWrapper(scan.getFilter());
} else {
this.filter = null;
}
this.batch = scan.getBatch();
if (Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW) && !scan.isGetScan()) {
this.stopRow = null;
} else {
this.stopRow = scan.getStopRow();
}
// If we are doing a get, we want to be [startRow,endRow] normally
// it is [startRow,endRow) and if startRow=endRow we get nothing.
this.isScan = scan.isGetScan() ? -1 : 0;
// synchronize on scannerReadPoints so that nobody calculates
// getSmallestReadPoint, before scannerReadPoints is updated.
IsolationLevelisolationLevel = scan.getIsolationLevel();
synchronized(scannerReadPoints) {
if (isolationLevel == IsolationLevel.READ_UNCOMMITTED) {
// This scan can read even uncommitted transactions
this.readPt = Long.MAX_VALUE;
MultiVersionConsistencyControl.setThreadReadPoint(this.readPt);
} else {
this.readPt = MultiVersionConsistencyControl.resetThreadReadPoint(mvcc);
}
scannerReadPoints.put(this, this.readPt);
}
// Here we separate all scanners into two lists - scanner that provide data required
// by the filter to operate (scanners list) and all others (joinedScanners list).
List<KeyValueScanner> scanners = newArrayList<KeyValueScanner>();
List<KeyValueScanner> joinedScanners = newArrayList<KeyValueScanner>();
if (additionalScanners != null) {
scanners.addAll(additionalScanners);
}
迭代每一个要进行scan的store。生成具体的StoreScanner实例。通常情况下joinedHead的值为null
for (Map.Entry<byte[], NavigableSet<byte[]>> entry :
scan.getFamilyMap().entrySet()) {
Storestore = stores.get(entry.getKey());
生成StoreScanner实例。通过HStore.getScanner(scan,columns);
KeyValueScannerscanner = store.getScanner(scan, entry.getValue());
if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
|| this.filter.isFamilyEssential(entry.getKey())) {
scanners.add(scanner);
} else {
joinedScanners.add(scanner);
}
}
生成KeyValueHeap实例,把所有的storescanner的开始位置移动到startkey的位置并得到top的StoreScanner,
this.storeHeap = newKeyValueHeap(scanners, comparator);
if (!joinedScanners.isEmpty()) {
this.joinedHeap = newKeyValueHeap(joinedScanners, comparator);
}
}
得到StoreScanner实例的HStore.getScanner(scan,columns)方法
publicKeyValueScannergetScanner(Scan scan,
finalNavigableSet<byte []> targetCols) throws IOException {
lock.readLock().lock();
try {
KeyValueScannerscanner = null;
if (this.getCoprocessorHost() != null) {
scanner = this.getCoprocessorHost().preStoreScannerOpen(this, scan, targetCols);
}
if (scanner == null) {
scanner = newStoreScanner(this, getScanInfo(), scan, targetCols);
}
returnscanner;
} finally {
lock.readLock().unlock();
}
}
生成StoreScanner的构造方法:
publicStoreScanner(Storestore, ScanInfo scanInfo, Scan scan, finalNavigableSet<byte[]> columns)
throws IOException {
this(store, scan.getCacheBlocks(), scan, columns, scanInfo.getTtl(),
scanInfo.getMinVersions());
如果设置有scan的_raw_属性时,columns的值需要为null
if (columns != null && scan.isRaw()) {
thrownewDoNotRetryIOException(
"Cannot specify any column for a raw scan");
}
matcher = newScanQueryMatcher(scan, scanInfo, columns,
ScanType.USER_SCAN, Long.MAX_VALUE, HConstants.LATEST_TIMESTAMP,
oldestUnexpiredTS);
得到StoreFileScanner,StoreFileScanner中引用的StoreFile.Reader中引用HFileReaderV2,
HFileReaderV2的实例在StoreFile.Reader中如果已经存在,不会重新创建,这样会加快scanner的创建时间。
// Pass columns to try to filter out unnecessary StoreFiles.
List<KeyValueScanner> scanners = getScannersNoCompaction();
// Seek all scanners to the start of the Row (or if the exact matching row
// key does not exist, then to the start of the next matching Row).
// Always check bloom filter to optimize the top row seek for delete
// family marker.
if (explicitColumnQuery && lazySeekEnabledGlobally) {
for (KeyValueScannerscanner : scanners) {
scanner.requestSeek(matcher.getStartKey(), false, true);
}
} else {
if (!isParallelSeekEnabled) {
for (KeyValueScannerscanner : scanners) {
scanner.seek(matcher.getStartKey());
}
} else {
parallelSeek(scanners, matcher.getStartKey());
}
}
// set storeLimit
this.storeLimit = scan.getMaxResultsPerColumnFamily();
// set rowOffset
this.storeOffset = scan.getRowOffsetPerColumnFamily();
// Combine all seeked scanners with a heap
heap = newKeyValueHeap(scanners, store.getComparator());
注册,如果有storefile更新时,把更新后的storefile添加到这个StoreScanner中来。
this.store.addChangedReaderObserver(this);
}
发起scan的rpc操作
client端发起openScanner操作后,得到一个scannerId.此时发起scan操作。
通过ScannerCallable.call中发起call的操作,在scannerId不等于-1时,
Result [] rrs = null;
ScanRequest request = null;
try {
incRPCcallsMetrics();
request = RequestConverter.buildScanRequest(scannerId, caching, false, nextCallSeq);
ScanResponse response = null;
PayloadCarryingRpcController controller = newPayloadCarryingRpcController();
try {
controller.setPriority(getTableName());
response = getStub().scan(controller, request);
...................................此处省去一些代码
nextCallSeq++;
longtimestamp = System.currentTimeMillis();
// Results are returned via controller
CellScannercellScanner = controller.cellScanner();
rrs = ResponseConverter.getResults(cellScanner, response);
HregionServer.scan方法中对scan时的处理流程:
得到scan中的caching属性的值,此值主要用来响应client返回的条数。如果一行数据包含多个kv,算一条
introws = 1;
if (request.hasNumberOfRows()) {
rows = request.getNumberOfRows();
}
如果client传入的scannerId有值,也就是不等于-1时,表示不是openScanner操作,检查scannerid是否过期
if (request.hasScannerId()) {
rsh = scanners.get(scannerName);
if (rsh == null) {
LOG.info("Client tried to access missing scanner " + scannerName);
thrownewUnknownScannerException(
"Name: " + scannerName + ", already closed?");
}
此处主要是检查region是否发生过split操作。如果是会出现NotServingRegionException操作。
scanner = rsh.s;
HRegionInfo hri = scanner.getRegionInfo();
region = getRegion(hri.getRegionName());
if (region != rsh.r) { // Yes, should be the same instance
thrownewNotServingRegionException("Region was re-opened after the scanner"
+ scannerName + " was created: " + hri.getRegionNameAsString());
}
} else {
...................................此处省去一些生成Regionscanner的代码
}
表示有设置caching,如果是执行scan,此时的默认值为1,当前scan中设置有caching后,使用scan中设置的值
if (rows > 0) {
// if nextCallSeq does not match throw Exception straight away. This needs to be
// performed even before checking of Lease.
// See HBASE-5974
是否有配置nextCallSeq的值,第一次调用时,此值为0,每调用一次加一,client也一样,每调用一次加一。
if (request.hasNextCallSeq()) {
if (rsh == null) {
rsh = scanners.get(scannerName);
}
if (rsh != null) {
if (request.getNextCallSeq() != rsh.nextCallSeq) {
thrownewOutOfOrderScannerNextException("Expected nextCallSeq: " + rsh.nextCallSeq
+ " But the nextCallSeq got from client: " + request.getNextCallSeq() +
"; request=" + TextFormat.shortDebugString(request));
}
// Increment the nextCallSeq value which is the next expected from client.
rsh.nextCallSeq++;
}
}
try {
先从租约管理中移出此租约,防止查找时间大于过期时间而出现的超时
// Remove lease while its being processed in server; protects against case
// where processing of request takes > lease expiration time.
lease = leases.removeLease(scannerName);
生成要返回的条数的一个列表,scan.caching
List<Result> results = newArrayList<Result>(rows);
longcurrentScanResultSize = 0;
booleandone = false;
调用cp的preScannernext,如果返回为true,表示不在执行scan操作。
// Call coprocessor. Get region info from scanner.
if (region != null && region.getCoprocessorHost() != null) {
Boolean bypass = region.getCoprocessorHost().preScannerNext(
scanner, results, rows);
if (!results.isEmpty()) {
for (Result r : results) {
if (maxScannerResultSize < Long.MAX_VALUE){
for (Cellkv : r.rawCells()) {
// TODO
currentScanResultSize += KeyValueUtil.ensureKeyValue(kv).heapSize();
}
}
}
}
if (bypass != null && bypass.booleanValue()) {
done = true;
}
}
执行scan操作。Cp的preScannerNext返回为false,或没有设置cp(主要是RegionObServer)
返回给client的最大size通过hbase.client.scanner.max.result.size配置,默认为long.maxvalue
如果scan也设置有maxResultSize,使用scan设置的值
if (!done) {
longmaxResultSize = scanner.getMaxResultSize();
if (maxResultSize <= 0) {
maxResultSize = maxScannerResultSize;
}
List<Cell> values = newArrayList<Cell>();
MultiVersionConsistencyControl.setThreadReadPoint(scanner.getMvccReadPoint());
region.startRegionOperation(Operation.SCAN);
try {
inti = 0;
synchronized(scanner) {
此处开始迭代,开始调用regionScanner(HRegion.RegionScannerImpl.nextRaw(List))进行查找,
迭代的长度为scan设置的caching的大小,如果执行RegionScanner.nextRaw(List)返回为false,时也会停止迭代
for (; i < rows
&& currentScanResultSize < maxResultSize; i++) {
返回的true表示还有数据,可以接着查询,否则表示此region中已经没有符合条件的数据了。
// Collect values to be returned here
booleanmoreRows = scanner.nextRaw(values);
if (!values.isEmpty()) {
if (maxScannerResultSize < Long.MAX_VALUE){
for (Cellkv : values) {
currentScanResultSize += KeyValueUtil.ensureKeyValue(kv).heapSize();
}
}
results.add(Result.create(values));
}
if (!moreRows) {
break;
}
values.clear();
}
}
region.readRequestsCount.add(i);
} finally {
region.closeRegionOperation();
}
// coprocessor postNext hook
if (region != null && region.getCoprocessorHost() != null) {
region.getCoprocessorHost().postScannerNext(scanner, results, rows, true);
}
}
如果没有可以再查找的数据时,设置response的moreResults为false
// If the scanner's filter - if any - is done with the scan
// and wants to tell the client to stop the scan. This is done by passing
// a null result, and setting moreResults to false.
if (scanner.isFilterDone() && results.isEmpty()) {
moreResults = false;
results = null;
} else {
添加结果到response中,如果hbase.client.rpc.codec配置有codec的值,
默认取hbase.client.default.rpc.codec配置的值,默认为KeyValueCodec
如果上面说的codec配置不为null时,把results生成为一个iterator,并生成一个匿名的CallScanner实现类
设置到scan时传入的controller中。这样能提升查询数据的读取性能。
如果没有配置codec时,默认直接把results列表设置到response中,这样响应的数据可能会比较大。
addResults(builder, results, controller);
}
} finally {
重新把租约放入到租约检查管理器中,此租约主要来检查client多长时间没有发起过scan的操作。
// We're done. On way out re-add the above removed lease.
// Adding resets expiration time on lease.
if (scanners.containsKey(scannerName)) {
if (lease != null) leases.addLease(lease);
ttl = this.scannerLeaseTimeoutPeriod;
}
}
}
client端获取响应的数据:ScannerCallable.call方法中
rrs = ResponseConverter.getResults(cellScanner, response);
ResponseConverter.getResults方法的实现
publicstatic Result[] getResults(CellScannercellScanner, ScanResponse response)
throws IOException {
if (response == null) returnnull;
// If cellscanner, then the number of Results to return is the count of elements in the
// cellsPerResult list. Otherwise, it is how many results are embedded inside the response.
intnoOfResults = cellScanner != null?
response.getCellsPerResultCount(): response.getResultsCount();
Result[] results = new Result[noOfResults];
for (inti = 0; i < noOfResults; i++) {
cellScanner如果codec配置为有值时,在rs响应时会生成一个匿名的实现
if (cellScanner != null) {
......................................
intnoOfCells = response.getCellsPerResult(i);
List<Cell> cells = newArrayList<Cell>(noOfCells);
for (intj = 0; j < noOfCells; j++) {
try {
if (cellScanner.advance() == false) {
.....................................
String msg = "Results sent from server=" + noOfResults + ". But only got " + i
+ " results completely at client. Resetting the scanner to scan again.";
LOG.error(msg);
thrownewDoNotRetryIOException(msg);
}
} catch (IOException ioe) {
...........................................
LOG.error("Exception while reading cells from result."
+ "Resetting the scanner to scan again.", ioe);
thrownewDoNotRetryIOException("Resetting the scanner.", ioe);
}
cells.add(cellScanner.current());
}
results[i] = Result.create(cells);
} else {
否则,没有设置codec,直接从response中读取出来数据,
// Result is pure pb.
results[i] = ProtobufUtil.toResult(response.getResults(i));
}
}
returnresults;
}
在ClientScanner.next方法中,如果还没有达到scan的caching的值,(默认为1)也就是countdown的值还不等于0
,countdown的值为得到一个Result时减1,通过nextScanner重新得到下一个region,并发起连接去scan数据。
Do{
.........................此处省去一些代码。
if (values != null && values.length > 0) {
for (Result rs : values) {
cache.add(rs);
for (Cellkv : rs.rawCells()) {
// TODO make method in Cell or CellUtil
remainingResultSize -= KeyValueUtil.ensureKeyValue(kv).heapSize();
}
countdown--;
this.lastResult = rs;
}
}
} while (remainingResultSize > 0 && countdown > 0 && nextScanner(countdown, values == null));
对于这种类型的查询操作,可以使用得到一个ClientScanner后,不执行close操作。
在rs的timeout前每次定期去从rs中拿一定量的数据下来。缓存到ClientScanner的cache中。
每次next时从cache中直接拿数据
Hregion.RegionScannerImpl.nextRaw(list)方法分析
RegionScannerImpl是对RegionScanner接口的实现。
Rs的scan在执行时通过regionScanner.nextRaw(list)来获取数据。
通过regionScanner.isFilterDone来检查此region的查找是否完成。
调用nextRaw方法,此方法调用另一个重载方法,batch是scan中设置的每次可查询最大的单行中的多少个kv的kv个数
publicbooleannextRaw(List<Cell> outResults)
throws IOException {
returnnextRaw(outResults, batch);
}
publicbooleannextRaw(List<Cell> outResults, intlimit) throws IOException {
booleanreturnResult;
调用nextInternal方法。
if (outResults.isEmpty()) {
// Usually outResults is empty. This is true when next is called
// to handle scan or get operation.
returnResult = nextInternal(outResults, limit);
} else {
List<Cell> tmpList = newArrayList<Cell>();
returnResult = nextInternal(tmpList, limit);
outResults.addAll(tmpList);
}
调用filter.reset方法,清空当前row的filter的相关信息。
ResetFilters();
如果filter.filterAllRemaining()的返回值为true,时表示当前region的查找条件已经结束,不能在执行查找操作。
没有可以接着查找的需要,也就是没有更多要查找的行了。
if (isFilterDone()) {
returnfalse;
}
................................此处省去一些代码
returnreturnResult;
}
nextInternal方法处理流程:
privatebooleannextInternal(List<Cell> results, intlimit)
throws IOException {
if (!results.isEmpty()) {
thrownewIllegalArgumentException("First parameter should be an empty list");
}
RpcCallContextrpcCall = RpcServer.getCurrentCall();
// The loop here is used only when at some point during the next we determine
// that due to effects of filters or otherwise, we have an empty row in the result.
// Then we loop and try again. Otherwise, we must get out on the first iteration via return,
// "true" if there's more data to read, "false" if there isn't (storeHeap is at a stop row,
// and joinedHeap has no more data to read for the last row (if set, joinedContinuationRow).
while (true) {
if (rpcCall != null) {
// If a user specifies a too-restrictive or too-slow scanner, the
// client might time out and disconnect while the server side
// is still processing the request. We should abort aggressively
// in that case.
longafterTime = rpcCall.disconnectSince();
if (afterTime >= 0) {
thrownewCallerDisconnectedException(
"Aborting on region " + getRegionNameAsString() + ", call " +
this + " after " + afterTime + " ms, since " +
"caller disconnected");
}
}
得到通过startkey seek后当前最小的一个kv。
// Let's see what we have in the storeHeap.
KeyValue current = this.storeHeap.peek();
byte[] currentRow = null;
intoffset = 0;
shortlength = 0;
if (current != null) {
currentRow = current.getBuffer();
offset = current.getRowOffset();
length = current.getRowLength();
}
检查是否到了stopkey,如果是,返回false,joinedContinuationRow是多个cf的关联查找,不用去管它
booleanstopRow = isStopRow(currentRow, offset, length);
// Check if we were getting data from the joinedHeap and hit the limit.
// If not, then it's main path - getting results from storeHeap.
if (joinedContinuationRow == null) {
// First, check if we are at a stop row. If so, there are no more results.
if (stopRow) {
如果是stopRow,同时filter.hasFilterRow返回为true时,
可通过filterRowCells来检查要返回的kvlist,也可以用来修改要返回的kvlist
if (filter != null && filter.hasFilterRow()) {
filter.filterRowCells(results);
}
returnfalse;
}
通过filter.filterRowkey来过滤检查key是否需要排除,如果是排除返回true,否则返回false
// Check if rowkey filter wants to exclude this row. If so, loop to next.
// Technically, if we hit limits before on this row, we don't need this call.
if (filterRowKey(currentRow, offset, length)) {
如果rowkey是需要排除的rowkey,检查是否有下一行数据。如果没有下一行数据,返回flase,表示当前region查找结束
否则清空当前的results,重新进行查找
booleanmoreRows = nextRow(currentRow, offset, length);
if (!moreRows) returnfalse;
results.clear();
continue;
}
开始执行region下此scan需要的所有store的StoreScanner的next进行查找,把查找的结果放到results列表中。
如果一行中包含有多个kv,现在查找这些kv达到传入的limit的大小的时候,返回kv_limit的一个空的kv。
(查找的大小已经达到limit(batch)的一行最大scan的kv个数,返回kv_limit),
否则表示还没有查找到limit的kv个数,但是当前row对应的所有达到条件的kv都已经查找完成,返回最后一个kv。
返回的kv如果不是kv_limit,那么有可能是null或者是下一行的第一个kv.
KeyValue nextKv = populateResult(results, this.storeHeap, limit, currentRow, offset,
length);
如果达到limit的限制时,filter.hasFilterRow的值一定得是false,
否则会throw IncompatibleFilterException
如果达到limit的限制时,返回true,当前row的所有kv查找结束,返回true可以接着向下查找
提示:如果hbase一行数据中可能包含多个kv时,最好是在scan时设置batch的属性,否则会一直查找到所有的kv结束
// Ok, we are good, let's try to get some results from the main heap.
if (nextKv == KV_LIMIT) {
if (this.filter != null && filter.hasFilterRow()) {
thrownewIncompatibleFilterException(
"Filter whose hasFilterRow() returns true is incompatible with scan with limit!");
}
returntrue; // We hit the limit.
}
是否到结束行,从这一行代码中可以看出,stoprow是不包含的,因为nextKv肯定是下一行row中第一个kv的值
stopRow = nextKv == null ||
isStopRow(nextKv.getBuffer(), nextKv.getRowOffset(), nextKv.getRowLength());
// save that the row was empty before filters applied to it.
finalbooleanisEmptyRow = results.isEmpty();
如果是stopRow,同时filter.hasFilterRow返回为true时,
可通过filterRowCells来检查要返回的kvlist,也可以用来修改要返回的kvlist
// We have the part of the row necessary for filtering (all of it, usually).
// First filter with the filterRow(List).
if (filter != null && filter.hasFilterRow()) {
filter.filterRowCells(results);
}
如果当前row的查找没有找到合法的kv,也就是results的列表没有值,检查是否还有下一行,
如果有,重新进行查找,否则表示当前region的查找最结尾处,不能再进行查找,返回fasle
if (isEmptyRow) {
booleanmoreRows = nextRow(currentRow, offset, length);
if (!moreRows) returnfalse;
results.clear();
// This row was totally filtered out, if this is NOT the last row,
// we should continue on. Otherwise, nothing else to do.
if (!stopRow) continue;
returnfalse;
}
// Ok, we are done with storeHeap for this row.
// Now we may need to fetch additional, non-essential data into row.
// These values are not needed for filter to work, so we postpone their
// fetch to (possibly) reduce amount of data loads from disk.
if (this.joinedHeap != null) {
..................................进行关联查找的代码,不显示,也不分析
}
} else {
多个store进行关联查询,不分析,通常情况不会有
// Populating from the joined heap was stopped by limits, populate some more.
populateFromJoinedHeap(results, limit);
}
// We may have just called populateFromJoinedMap and hit the limits. If that is
// the case, we need to call it again on the next next() invocation.
if (joinedContinuationRow != null) {
returntrue;
}
如果这次的查找,results的结果为空,表示没有查找到结果,检查是否还有下一行数据,如果有重新进行查找,
否则返回false表示此region的查找结束
// Finally, we are done with both joinedHeap and storeHeap.
// Double check to prevent empty rows from appearing in result. It could be
// the case when SingleColumnValueExcludeFilter is used.
if (results.isEmpty()) {
booleanmoreRows = nextRow(currentRow, offset, length);
if (!moreRows) returnfalse;
if (!stopRow) continue;
}
非stoprow时,表示还可以有下一行的数据,也就是可以接着进行next操作。否则表示此region的查找结束
// We are done. Return the result.
return !stopRow;
}
}
UserScan时的ScanQueryMatcher.match方法处理
user scan时的ScanQueryMatcher在newRegionScannerImpl(scan, additionalScanners, this);时生成。
在生成StoreScanner时通过如下代码生成matcher实例。
matcher = newScanQueryMatcher(scan, scanInfo, columns,
ScanType.USER_SCAN, Long.MAX_VALUE, HConstants.LATEST_TIMESTAMP,
oldestUnexpiredTS);
matcher.isUserScan的值此时为true.
publicMatchCodematch(KeyValue kv) throws IOException {
检查当前region的查找是否结束。pageFilter就是通过控制此filter中的方法来检查是否需要
if (filter != null && filter.filterAllRemaining()) {
returnMatchCode.DONE_SCAN;
}
byte [] bytes = kv.getBuffer();
intoffset = kv.getOffset();
intkeyLength = Bytes.toInt(bytes, offset, Bytes.SIZEOF_INT);
offset += KeyValue.ROW_OFFSET;
intinitialOffset = offset;
shortrowLength = Bytes.toShort(bytes, offset, Bytes.SIZEOF_SHORT);
offset += Bytes.SIZEOF_SHORT;
检查传入的kv是否是当前行的kv,也就是rowkey是否相同,如果当前的rowkey小于传入的rowkey,
表示现在已经next到下一行了,返回DONE,表示当前行查找结束
intret = this.rowComparator.compareRows(row, this.rowOffset, this.rowLength,
bytes, offset, rowLength);
if (ret <= -1) {
returnMatchCode.DONE;
} elseif (ret >= 1) {
如果当前的rowkey大于传入的rowkey,表示当前next出来的kv比现在的kv要小,执行nextrow操作。
// could optimize this, if necessary?
// Could also be called SEEK_TO_CURRENT_ROW, but this
// should be rare/never happens.
returnMatchCode.SEEK_NEXT_ROW;
}
是否跳过当前行的其它kv比较,这是一个优化项。
// optimize case.
if (this.stickyNextRow)
returnMatchCode.SEEK_NEXT_ROW;
如果当前行的所有要查找的(scan)column都查找完成了,其它的当前行中非要scan的kv,
直接不比较,执行nextrow操作。
if (this.columns.done()) {
stickyNextRow = true;
returnMatchCode.SEEK_NEXT_ROW;
}
//Passing rowLength
offset += rowLength;
//Skipping family
bytefamilyLength = bytes [offset];
offset += familyLength + 1;
intqualLength = keyLength -
(offset - initialOffset) - KeyValue.TIMESTAMP_TYPE_SIZE;
检查当前KV的TTL是否过期,如果过期,检查是否SCAN中还有下一个COLUMN,如果有返回SEEK_NEXT_COL。
否则返回SEEK_NEXT_ROW。
longtimestamp = Bytes.toLong(bytes, initialOffset + keyLength - KeyValue.TIMESTAMP_TYPE_SIZE);
// check for early out based on timestamp alone
if (columns.isDone(timestamp)) {
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
/*
* The delete logic is pretty complicated now.
* This is corroborated by the following:
* 1. The store might be instructed to keep deleted rows around.
* 2. A scan can optionally see past a delete marker now.
* 3. If deleted rows are kept, we have to find out when we can
* remove the delete markers.
* 4. Family delete markers are always first (regardless of their TS)
* 5. Delete markers should not be counted as version
* 6. Delete markers affect puts of the *same* TS
* 7. Delete marker need to be version counted together with puts
* they affect
*/
bytetype = bytes[initialOffset + keyLength – 1];
如果当前KV是删除的KV。
if (kv.isDelete()) {
此处会进入。把删除的KV添加到DeleteTracker中,默认是ScanDeleteTracker
if (!keepDeletedCells) {
// first ignore delete markers if the scanner can do so, and the
// range does not include the marker
//
// during flushes and compactions also ignore delete markers newer
// than the readpoint of any open scanner, this prevents deleted
// rows that could still be seen by a scanner from being collected
booleanincludeDeleteMarker = seePastDeleteMarkers ?
tr.withinTimeRange(timestamp) :
tr.withinOrAfterTimeRange(timestamp);
if (includeDeleteMarker
&& kv.getMvccVersion() <= maxReadPointToTrackVersions) {
this.deletes.add(bytes, offset, qualLength, timestamp, type);
}
// Can't early out now, because DelFam come before any other keys
}
此处的检查不会进入,userscan不保留删除的数据
if (retainDeletesInOutput
|| (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
|| kv.getMvccVersion() > maxReadPointToTrackVersions) {
// always include or it is not time yet to check whether it is OK
// to purge deltes or not
if (!isUserScan) {
// if this is not a user scan (compaction), we can filter this deletemarker right here
// otherwise (i.e. a "raw" scan) we fall through to normal version and timerange checking
returnMatchCode.INCLUDE;
}
} elseif (keepDeletedCells) {
if (timestamp < earliestPutTs) {
// keeping delete rows, but there are no puts older than
// this delete in the store files.
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
// else: fall through and do version counting on the
// delete markers
} else {
returnMatchCode.SKIP;
}
// note the following next else if...
// delete marker are not subject to other delete markers
} elseif (!this.deletes.isEmpty()) {
如果deleteTracker中不为空时,也就是当前行中有删除的KV,检查当前KV是否是删除的KV
提示:删除的KV在compare时,比正常的KV要小,所以在执行next操作时,delete的KV会先被查找出来。
如果是删除的KV,根据KV的删除类型,如果是版本被删除,返回SKIP。
否则如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
DeleteResultdeleteResult = deletes.isDeleted(bytes, offset, qualLength,
timestamp);
switch (deleteResult) {
caseFAMILY_DELETED:
caseCOLUMN_DELETED:
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
caseVERSION_DELETED:
caseFAMILY_VERSION_DELETED:
returnMatchCode.SKIP;
caseNOT_DELETED:
break;
default:
thrownewRuntimeException("UNEXPECTED");
}
}
检查KV的时间是否在SCAN要查找的时间范围内,
inttimestampComparison = tr.compare(timestamp);
如果大于SCAN的最大时间,返回SKIP。
if (timestampComparison >= 1) {
returnMatchCode.SKIP;
} elseif (timestampComparison <= -1) {
如果小于SCAN的最小时间,如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
检查当前KV的column是否是SCAN中指定的column列表中包含的值,如果是INCLUDE。
否则如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
// STEP 1: Check if the column is part of the requested columns
MatchCodecolChecker = columns.checkColumn(bytes, offset, qualLength, type);
如果column是SCAN中要查找的column之一
if (colChecker == MatchCode.INCLUDE) {
ReturnCodefilterResponse = ReturnCode.SKIP;
// STEP 2: Yes, the column is part of the requested columns. Check if filter is present
if (filter != null) {
执行filter.filterKeyValue操作。并返回filter过滤的结果
// STEP 3: Filter the key value and return if it filters out
filterResponse = filter.filterKeyValue(kv);
switch (filterResponse) {
caseSKIP:
returnMatchCode.SKIP;
caseNEXT_COL:
如果SCAN中还有下一个要SCAN的column时,返回SEEK_NEXT_COL。
否则表示当前行没有需要在进行查找的KV,返回SEEK_NEXT_ROW。
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
caseNEXT_ROW:
stickyNextRow = true;
returnMatchCode.SEEK_NEXT_ROW;
caseSEEK_NEXT_USING_HINT:
returnMatchCode.SEEK_NEXT_USING_HINT;
default:
//It means it is either include or include and seek next
break;
}
}
/*
* STEP 4: Reaching this step means the column is part of the requested columns and either
* the filter is null or the filter has returned INCLUDE or INCLUDE_AND_NEXT_COL response.
* Now check the number of versions needed. This method call returns SKIP, INCLUDE,
* INCLUDE_AND_SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL.
*
* FilterResponse ColumnChecker Desired behavior
* INCLUDE SKIP row has already been included, SKIP.
* INCLUDE INCLUDE INCLUDE
* INCLUDE INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE INCLUDE_AND_SEEK_NEXT_ROW INCLUDE_AND_SEEK_NEXT_ROW
* INCLUDE_AND_SEEK_NEXT_COL SKIP row has already been included, SKIP.
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_ROW INCLUDE_AND_SEEK_NEXT_ROW
*
* In all the above scenarios, we return the column checker return value except for
* FilterResponse (INCLUDE_AND_SEEK_NEXT_COL) and ColumnChecker(INCLUDE)
*/
此处主要是检查KV的是否是SCAN的最大版本内,到这个地方,除非是KV超过了要SCAN的最大版本,或者KV的TTL过期。
否则肯定是会包含此KV的值。
colChecker =
columns.checkVersions(bytes, offset, qualLength, timestamp, type,
kv.getMvccVersion() > maxReadPointToTrackVersions);
//Optimize with stickyNextRow
stickyNextRow = colChecker == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW ? true : stickyNextRow;
return (filterResponse == ReturnCode.INCLUDE_AND_NEXT_COL &&
colChecker == MatchCode.INCLUDE) ? MatchCode.INCLUDE_AND_SEEK_NEXT_COL
: colChecker;
}
stickyNextRow = (colChecker == MatchCode.SEEK_NEXT_ROW) ? true
: stickyNextRow;
returncolChecker;
}
相关推荐
基于C++开发的WEB服务器,支持C/C++、Python、Java等多语言混合开发WEB应用
基于STM8单片机的编程实例,可供参考学习使用,希望对你有所帮助
基于STM8单片机的编程实例,可供参考学习使用,希望对你有所帮助
该靶场仅供学习使用!
电影详情链接 图片链接 影片中文名 影片外国名 评分 评价人数 概况 相关信息 https://movie.douban.com/subject/1292052/ https://img2.doubanio.com/view/photo/s_ratio_poster/public/p480747492.jpg 肖申克的救赎 The Shawshank Redemption 9.7 2529468 希望让人自由 导演: 弗兰克·德拉邦特 Frank Darabont 主演: 蒂姆·罗宾斯 Tim Robbins ... 1994 美国 犯罪 剧情 https://movie.douban.com/subject/1291546/ https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2561716440.jpg 霸王别姬 9.6 1880353 风华绝代 导演: 陈凯歌 Kaige Chen 主演: 张国荣 Leslie Cheung 张丰毅 ........
汽车中间件市场调研报告:2023年全球汽车中间件市场销售额达到了78亿美元 在数字化转型的浪潮中,汽车中间件作为连接硬件与软件的关键桥梁,正引领着汽车行业的新一轮变革。随着全球汽车产业的快速发展,中间件市场规模持续扩大,展现出前所未有的增长潜力。然而,面对复杂多变的市场环境和不断涌现的新技术,企业如何精准把握市场脉搏,实现可持续发展?本文将深入探讨全球及中国汽车中间件市场的现状、趋势及竞争格局,为您揭示咨询的重要性。 市场概况: 根据QYResearch(恒州博智)的统计及预测,2023年全球汽车中间件市场销售额达到了78亿美元(约7803百万美元),预计2030年将达到156亿美元(约15630百万美元),年复合增长率(CAGR)为10.3%(2024-2030)。这一数据不仅彰显了中间件市场的强劲增长动力,也预示着未来巨大的市场空间。 技术创新与趋势: 随着自动驾驶、车联网等技术的不断发展,汽车中间件正面临着前所未有的技术挑战与机遇。新一代中间件需要具备更高的实时性、更低的延迟以及更强的数据处理能力,以满足复杂多变的汽车应用场景。同时,云计算、大数据、人工智能等技术的融合应用,将进
python语言mp3pl爬虫程序代码QZQ
# 小语种字体TTF文件转PNG图片的方法 ## 准备工作 1. 下载python3.9,推荐3.6~3.9,几个依赖包在这个版本运行的好。 2. 下载 FontForge-mingw-w64 ,可自行下载,或从文末打包好的工具包直接使用。 3. 下载需要导出的字体ttf文件,最好先装在本机系统上。 ## 导出方法 1. 把 `tts2png2.py` 文件复制到软件的bin目录下。 2. 修改 `tts2png2.py` 文件中的字体路径,注意Windows用双斜杠。 3. 从bin目录打开终端(管理员模式启动)。 4. 运行脚本 `./ffpython .\tts2png2.py`。
在21世纪的科技浪潮中,人工智能(AI)无疑是最为耀眼的明星之一,它以惊人的速度改变着我们的生活、工作乃至整个社会的运行方式。而在人工智能的广阔领域中,大模型(Large Models)的崛起更是开启了智能技术的新纪元,引领着AI向更加复杂、高效、智能的方向发展。本文将深入探讨人工智能大模型的内涵、技术特点、应用领域以及对未来的影响。 一、人工智能大模型的内涵 人工智能大模型,顾名思义,是指具有庞大参数规模和数据处理能力的AI模型。这些模型通过深度学习算法,在海量数据上进行训练,能够学习到丰富的知识表示和复杂的模式识别能力。与传统的小型或中型模型相比,大模型在理解自然语言、生成高质量内容、进行跨模态信息处理等方面展现出前所未有的优势。它们不仅能够执行特定的任务,如图像识别、语音识别,还能进行创造性的工作,如文本生成、音乐创作,甚至在某些情况下展现出接近或超越人类的智能水平。 二、技术特点 海量数据与高效训练:大模型依赖于庞大的数据集进行训练,这些数据涵盖了广泛的主题和情境,使得模型能够学习到丰富的语义信息和上下文理解能力。同时,高效的训练算法和硬件加速技术,如TPU(Tensor Processing Unit)和GPU,使得大规模模型的训练成为可能。 自注意力机制与Transformer架构:许多领先的大模型采用了Transformer架构,特别是其自注意力机制,这种设计使得模型在处理序列数据时能够捕捉到长距离依赖关系,极大地提高了模型的表达能力和泛化能力。 多任务学习与迁移学习:大模型通常具备多任务学习的能力,即在一次训练中同时学习多个任务,这有助于模型学习到更通用的知识表示。此外,迁移学习使得这些模型能够轻松适应新任务,只需少量额外数据或微调即可。
下垂控制-基于T型三电平逆变器的下垂控制,电压电流双闭环,采用LCL滤波,SPWM调制方式 1.提供simulink仿真源文件 2.提供下垂控制原理与下垂系数计算方法 3.中点平衡控制,电压电流双闭环控制 4.提供参考文献
城市选择器一个仿大众点评的城市快速选择器, 最少只需 一行 代码即可启动城市选择器, 支持页面样式修改,多元化自定义截屏 版本日志V0.4.6优化地理位置设置时有时会设置不成功问题修复其他若干问题修改UI默认主题色V0.4.5修改设置位置信息方式,由之前必须在打开页面之前获取位置信息改为允许用户在打开页面后设置位置信息,具体使用方式见 Step3简化配置项,不需要在AndroidManifest中再注册Activity,并默认隐藏titlebarV0.4.3修复更新数据库表结构后第一次进入会闪退问题V0.4.0数据库表结构修改,增加了高德地图citycode设置gps城市的api略有改动见 Step3V0.3.3紧急修复一个可能导致内存泄漏问题优化提高滑动检索效率隐藏下拉刷新labelV0.3.1在搜索框后面添加一个清空搜索框按钮修复搜索框中输入空格会搜索出全部城市问题修复搜索结果弹出框中文字在不同theme下显示不同颜色问题,现在已统一为黑色其他调用时参数合法性校验V0.3.0简化api调用形式,修改为Rx形式,见操作步骤
慢性病大数据分析处理慢性病项目
Multisim单片机资源单片机C语言程序设计实训100例提取方式是百度网盘分享地址
PMSM永磁同步电机最大转矩电流比MTPA控制仿真,弱磁控制仿真,前馈补偿仿真程序,详细解析教程文档。 这是一份非常完美的仿真文件及详细教程,从仿真效果图看转速、电流及转矩跟随非常稳定。 该算法架构包含如下模块: 1)SVPWM矢量控制模块 2)转速环PI调节器、电流环PI调节器; 3)MTPA调节器; 4)弱磁控制器; 5)前馈补偿; 一份该仿真的算法说明文档,每一步都有详细介绍如何搭建,包括环路参数怎么算,拿来做毕设或者学习都很方便; 几篇参考文献; 一篇作者自己写的算法总结,让你少走弯路; 两个视频;
手绘卡通儿童人物幼儿园教学课件模板
Simulink电动汽车仿真模型(包含行驶阻力模型,工作模式切模型,驾驶员模型,PID控制模块等,NEDC,CLTC工况仿真结果)东西很全
是一款全功能的备份软件,支持增量备份、差异备份、完全备份和同步备份,帮助用户轻松保护关键文件。支持多种存储选项,包括本地硬盘、USB驱动器、网络文件夹、云存储和 FTP 服务器等。 【使用方法】: 1. 下载并安装 Perfect Backup。 2. 打开软件,选择备份类型(增量、差异、完全或同步)。 3. 指定源文件和目标存储位置。 4. 设置备份计划,选择备份频率。 5. 点击“开始备份”,执行备份任务。
基于STM8单片机的编程实例,可供参考学习使用,希望对你有所帮助
资源描述: HTML5实现好看的创意房屋设计公司网页源码,好看的创意房屋设计公司网页源码,创意房屋设计公司网页源码模板,HTML创意房屋设计公司网页源码,内置酷炫的动画,界面干净整洁,页面主题,全方位介绍内容,可以拆分多个想要的页面,可以扩展自己想要的,注释完整,代码规范,各种风格都有,代码上手简单,代码独立,可以直接运行使用。也可直接预览效果。 资源使用: 点击 index.html 直接查看效果