lucene 3.5之SimpleStringInterner

deepfuture

浏览: 4438098 次
性别:
来自: 湛江

最近访客更多访客>>

linxl2011

mars36

jccz_zys

zkm0309

博主相关

博客

微博

相册

留言

关于我

博客专栏

: SQLite源码剖析
浏览量：80452

: WIN32汇编语言学习应用...
浏览量：71036

: 神奇的perl
浏览量：104321

: lucene等搜索引擎解析...
浏览量：288021

: 深入lucene3.5源码...
浏览量：15209

: VB.NET并行与分布式编...
浏览量：68635

: silverlight 5...
浏览量：32699

: 算法下午茶系列
浏览量：46375

文章分类

社区版块

存档分类

博客分类：

深入lucene 3.5源码
搜索引擎

lucene

public class SimpleStringInterner extends StringInterner
可看出SimpleStringInterner提供一个简单的字符串引用缓存，节约内存，保证同一值的字符串使用同一段内存空间,因为使用了intern，所以继承自StringInterner

package org.apache.lucene.util;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/**
* Simple lockless and memory barrier free String intern cache that is guaranteed
* to return the same String instance as String.intern()
* does.
*
* @lucene.internal
*/

public class SimpleStringInterner extends StringInterner {

private static class Entry {
    final private String str;
    final private int hash;
    private Entry next;
    private Entry(String str, int hash, Entry next) {
      this.str = str;
      this.hash = hash;
      this.next = next;
    }
}

private final Entry[] cache;
private final int maxChainLength;
/ **

** param tableSize Size of the hash table, should be a power of two. tableSize
* * @param maxChaingenth Maximum length of each bucket, after which the oldest item inserted is dropped.
   */
public SimpleStringInterner(int tableSize, int maxChainLength) {
    cache = new Entry[Math.max(1,BitUtil.nextHighestPowerOfTwo(tableSize))];
    this.maxChainLength = Math.max(2,maxChainLength);
}

@Override
public String intern(String s) {
    int h = s.hashCode();
    // In the future, it may be worth augmenting the string hash
    // if the lower bits need better distribution.
    int slot = h & (cache.length-1);

    Entry first = this.cache[slot];
    Entry nextToLast = null;

    int chainLength = 0;

    for(Entry e=first; e!=null; e=e.next) {
      if (e.hash == h && (e.str == s || e.str.compareTo(s)==0)) {
      // if (e.str == s || (e.hash == h && e.str.compareTo(s)==0)) {
        return e.str;
      }

      chainLength++;
      if (e.next != null) {
        nextToLast = e;
      }
    }

    // insertion-order cache: add new entry at head
    s = s.intern();
    this.cache[slot] = new Entry(s, h, first);
    if (chainLength >= maxChainLength) {
      // prune last entry
      nextToLast.next = null;
    }
    return s;
}
}

分析
1）构造了一个链表，每个元素为Entry对象，存储了元素的字符串格式的值、
哈希值以及下一个元素

private static class Entry {
    final private String str;
    final private int hash;
    private Entry next;
    private Entry(String str, int hash, Entry next) {
      this.str = str;
      this.hash = hash;
      this.next = next;
    }
}
2）定义了一个cache，实质为一个Entry对象组成的数组，同时包括最大的
长度maxChainLength
   private final Entry[] cache;
private final int maxChainLength;

/**
   * @param tableSize Size of the hash table, should be a power of two.
   * @param maxChainLength Maximum length of each bucket, after which the oldest item inserted is dropped.
   */
3) SimpleStringInterner完成缓存区的建立与申请，其中 tableSize是
哈希表的大小，且以2为倍数，maxChainlenth表示元素的最大数量，
且大于或等于2
public SimpleStringInterner(int tableSize, int maxChainLength) {
    cache = new Entry[Math.max(1,BitUtil.nextHighestPowerOfTwo(tableSize))];
    this.maxChainLength = Math.max(2,maxChainLength);
}

4)
A)
关于@Override
@override有注释文档的作用，可有可无有点像鸡肋

但它对于编程粗心的人可是个很人性化的功能

如果想重写父类的方法，比如toString()方法的话，在被重载的方法前面加上@Override ，这样编译的时候系统可以帮你检查方法的正确性

如下

@Override
public String toString(){...}这是正确的

如果将toString写成tostring

@Override
public String tostring(){...}编译器可以检测出这种写法是错误的,提醒你改正

而如果不加@Override
public String tostring(){...}这样编译器是不会报错的，它会认为是你在类中加的新方法

B)重载StringInterner的intern方法，s.hashCode()调用了String类的哈希方法
Java String类提供hashCode() 方法:
public int hashCode()返回int型(32位)
计算方式为：
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
空字符串的hash值为0
例如，"FB" 和"Ea"有相同的hash值:
70 × 31 + 66 = 69 × 31 + 97

注意^表示按位异或

下面代码首先会计算出字符串的hash值，然后根据hash值在缓存中找到相应位置，可能会
发生多个不同字符串具有同一hash值的情况，因此将这些具有相同hash值的字符串各自组成
链表；然后在链表中找到相关位置，如果找到则返回，无法找到，则增加一个新的节点加入到
链表中，一个比较典型hash桶算法
@Override
public String intern(String s) {
    int h = s.hashCode();
    // In the future, it may be worth augmenting the string hash
    // if the lower bits need better distribution.
//slot为在缓存中的位置
    int slot = h & (cache.length-1);
//找到第一个适合的位置first，注意考虑到不同字符串的hash值相同，因此如果位置已经占用，必须通过e=e.next找到下一个可放置的位置
    Entry first = this.cache[slot];
    Entry nextToLast = null;

    int chainLength = 0;
    //将具有该HASH值的位置做为链表头开始，沿着链表寻找，
    for(Entry e=first; e!=null; e=e.next) {
     //如果字符串已经在缓存中存在，直接返回其值
      if (e.hash == h && (e.str == s || e.str.compareTo(s)==0)) {
      // if (e.str == s || (e.hash == h && e.str.compareTo(s)==0)) {
        return e.str;
      }

      chainLength++;
      //下一位置已经被占用，则将nextToLast标记为当前位置
      if (e.next != null) {
        nextToLast = e;
      }
      //沿着链表继续寻找字符串,找到了返回该字符串，到链表尾仍找不到，e将等于null，进入下一步，构造一个新的节点，将该节点加入链表中
    }

// insertion-order cache: add new entry at head
//    构造一个链表中新节点的值，hash和 next(下一个链接):
new Entry(s, h, first);
将找到的由相同hash值组成的链表的首部做为新节点的next，就是说
将新节点插入到链表首部
// insertion-order cache: add new entry at head
    s = s.intern();
    this.cache[slot] = new Entry(s, h, first);
    if (chainLength >= maxChainLength) {
      // prune last entry
      nextToLast.next = null;
    }
    return s;

1
顶

0
踩

分享到：

lucene3.5之SmallFloat | lucene3.5之ToStringUtils

2012-03-23 10:25
浏览 2149
评论(0)
分类:编程语言
查看更多

发表评论

您还没有登录,请您登录后再发表评论