Comparing the Access Speed of an In-Memory Cache and an On-Disk Cache

mozhenghua

Blog categories: java, io speed, io

Tags: memory and disk access speed, lucene, indexreader, randomfileaccess

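Over the past few days I have been customizing a Solr search solution for an application, and the design needs Solr's FieldCache. To estimate how much memory the FieldCache would need: the key is an int, the value is an array of two 64-bit longs, and there are about 81 million entries. The formula 64 × 81,000,000 / 1024 / 1024 alone gives about 4.8 GB. Once Java object overhead is added, the cache needs roughly 10 GB.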

However, the server has only 8 GB of RAM in total, which simply cannot hold that much cached data.
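The sizing arithmetic above is easy to get wrong by hand. Here is a minimal sketch that just evaluates it.

- The class MemoryEstimate and its estimateMb method are hypothetical helpers for illustration.
- The 64-byte figure is the one used in the formula above.
- The 20-byte figure is the raw payload of one entry: a 4-byte int plus two 8-byte longs.
- Neither number is a measured size of a real Java cache.

```java
// Back-of-the-envelope sizing for the cache described above.
// The per-entry byte counts are assumed inputs, not measurements.
public class MemoryEstimate {

    // Convert (entries x bytesPerEntry) to mebibytes, rounding down.
    static long estimateMb(long entries, long bytesPerEntry) {
        return entries * bytesPerEntry / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long entries = 81_000_000L;
        // The formula from the text: 64 x 81,000,000 / 1024 / 1024
        System.out.println("64 bytes/entry: " + estimateMb(entries, 64) + " MB");
        // Raw payload only: a 4-byte int key plus two 8-byte longs = 20 bytes,
        // the same size as one record in the RandomAccessFile layout used later.
        System.out.println("20 bytes/entry: " + estimateMb(entries, 20) + " MB");
    }
}
```

It prints 4943 MB for 64 bytes per entry and 1544 MB for 20 bytes per entry, since integer division rounds down. So the raw payload of 81 million entries is only about 1.5 GB. Most of the heap cost of a Map<Integer, Long[]> like the one in the tests below comes from per-object overhead: the boxed keys, the arrays, and the Long values, not the data itself.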

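So I began to wonder whether an alternative could meet the caching requirements without needing so much memory. One option was to move the FieldCache data from memory onto disk. At lookup time, the key alone is enough to compute the record's offset in the file, and the data can then be read from that offset. Intuitively, reading data from a known file offset ought to be fast.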

But intuition alone is not enough; it has to be tested. How big is the gap between disk access speed and memory access speed, really?

Initializing the test data

I wrote two pieces of code, one that writes the data into memory and one that writes it to disk:

Writing to memory

Map<Integer, Long[]> data = new HashMap<Integer, Long[]>();

for (int i = 0; i < 2000000; i++) {
	data.put(i, new Long[] { (long) (i + 1), (long) (i + 2) });
}

Writing to disk

import java.io.File;
import java.io.RandomAccessFile;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class DocReplication {

	public static Analyzer analyzer;
	static {
		analyzer = new StandardAnalyzer(Version.LUCENE_34);
	}

	public static void main(String[] arg) throws Exception {
		RandomAccessFile randomFile = new RandomAccessFile(new File(
				"DocReplication.text"), "rw");

		Directory dir = new SimpleFSDirectory(new File("indexdir"));

		IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_34,
				analyzer);
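		// CREATE rebuilds the index on every run; CREATE_OR_APPEND would add a
		// second copy of the 2M documents each time the program is rerun
		iwc.setOpenMode(OpenMode.CREATE);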
		IndexWriter writer = new IndexWriter(dir, iwc);

		for (int i = 0; i < 2000000; i++) {
			// Write the record to the random-access file: int key, then two longs
			randomFile.write(PayloadHelper.encodeInt(i));
			randomFile.write(long2Array(i + 1));
			randomFile.write(long2Array(i + 2));
			// Write the same values into a Lucene document
			Document doc = new Document();
			doc.add(new Field("id", String.valueOf(i), Field.Store.YES,
					Field.Index.NOT_ANALYZED_NO_NORMS));
			doc.add(new Field("id2", String.valueOf(i), Field.Store.YES,
					Field.Index.NOT_ANALYZED_NO_NORMS));
			writer.addDocument(doc);
			System.out.println("point:" + randomFile.getFilePointer());
		}
		writer.commit();
		writer.close();
		randomFile.close();
	}

	static byte[] long2Array(long val) {

		int off = 0;
		byte[] b = new byte[8];
		b[off + 7] = (byte) (val >>> 0);
		b[off + 6] = (byte) (val >>> 8);
		b[off + 5] = (byte) (val >>> 16);
		b[off + 4] = (byte) (val >>> 24);
		b[off + 3] = (byte) (val >>> 32);
		b[off + 2] = (byte) (val >>> 40);
		b[off + 1] = (byte) (val >>> 48);
		b[off + 0] = (byte) (val >>> 56);
		return b;

	}
}

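Both the in-memory and the on-disk versions write 2 million records. The on-disk version writes every record to two places: a Lucene index, and a plain file written through RandomAccessFile. The structure of one record in the RandomAccessFile file is shown in the figure. It is a 4-byte int key followed by two 8-byte long values, all stored big-endian.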

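Each record is 20 bytes long. So once you have the docid (that is, the key), the record's offset in the RandomAccessFile is docid × 20, and the two long values start 4 bytes after that.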
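The record layout and the offset formula can be sketched as a small standalone class. This is an illustration, not the article's code:

- The class name RecordFile and its methods are hypothetical.
- It writes through a buffered DataOutputStream instead of calling RandomAccessFile.write once per field.
- DataOutputStream.writeInt and writeLong write big-endian, so the bytes should match what PayloadHelper.encodeInt and long2Array produce.
- The offset is computed as a long, because docid * 20 in int arithmetic overflows once docid exceeds about 107 million.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the 20-byte record layout: [int key][long v1][long v2], big-endian.
public class RecordFile {

    static final int RECORD_SIZE = 20;  // 4 (int) + 8 (long) + 8 (long)
    static final int VALUE_OFFSET = 4;  // the two longs start after the int key

    // Offset of a record, computed in long so large docids cannot overflow int.
    static long offsetOf(int docid) {
        return (long) docid * RECORD_SIZE;
    }

    // Write records 0..count-1 with the article's values: (i, i + 1, i + 2).
    // DataOutputStream writes int/long big-endian, like encodeInt/long2Array.
    static void write(File file, int count) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int i = 0; i < count; i++) {
                out.writeInt(i);
                out.writeLong(i + 1L);
                out.writeLong(i + 2L);
            }
        }
    }

    // Seek straight to a record's values and read both longs.
    static long[] read(RandomAccessFile raf, int docid) throws IOException {
        raf.seek(offsetOf(docid) + VALUE_OFFSET);
        return new long[] { raf.readLong(), raf.readLong() };
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("records", ".bin");
        file.deleteOnExit();
        write(file, 1000);
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long[] v = read(raf, 123);
            System.out.println(v[0] + " " + v[1]); // prints "124 125"
        }
    }
}
```

Reading docid 123 seeks to 123 × 20 + 4 = 2464 and returns 124 and 125.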

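The reason for also writing to a Lucene index is to compare two ways of fetching the values. One is loading the document through Lucene's IndexReader (reader.document(docid)) and reading its field values. The other is reading the cached values directly with RandomAccessFile. Which is faster?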

Writing the read test cases

Reading from the custom random-access file

public static void main(String[] args) throws Exception {

		// Open read-only: this test never writes to the file
		RandomAccessFile randomFile = new RandomAccessFile(new File(
				"DocReplication.text"), "r");

		long current = System.currentTimeMillis();

		for (int i = 0; i < 100000; i++) {

			int docid = (int) (Math.random() * 2000000);
			// record offset = docid * 20; skip the 4-byte int key (long math avoids int overflow)
			randomFile.seek(docid * 20L + 4);

			randomFile.readLong();
			randomFile.readLong();

		}

		// elapsed time in milliseconds, the same unit as the in-memory test
		System.out.println(System.currentTimeMillis() - current);

		randomFile.close();

	}

Reading from memory

public static void main(String[] args) {
		Map<Integer, Long[]> data = new HashMap<Integer, Long[]>();

		for (int i = 0; i < 2000000; i++) {
			data.put(i, new Long[] { (long) (i + 1), (long) (i + 2) });
		}
		long start = System.currentTimeMillis();
		Long[] row = null;
		long tmp = 0;
		for (int i = 0; i < 100000; i++) {
			int doc = (int) (Math.random() * 2000000);
			row = data.get(doc);
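			tmp += row[0];
			tmp += row[1];
		}
		// elapsed milliseconds; printing tmp keeps the reads from being optimized away
		System.out.println((System.currentTimeMillis() - start) + " (tmp=" + tmp + ")");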
	}

Random access through the Lucene index

public static void main(String[] args) throws Exception {
	Directory dir = new SimpleFSDirectory(new File("indexdir"));
	IndexReader reader = IndexReader.open(dir);
	// start timing after the reader is open, so only the lookups are measured
	long start = System.currentTimeMillis();
	Document doc = null;
	for (int i = 0; i < 100000; i++) {
		int docid = (int) (Math.random() * 2000000);
		doc = reader.document(docid);
		doc.get("id");
		doc.get("id2");
	}

	// elapsed time in milliseconds, the same unit as the other tests
	System.out.println("consume:" + (System.currentTimeMillis() - start));
	reader.close();
}

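Each of the three test cases does the same thing:

- Pick a random key from the 2 million entries in the target store.
- Fetch its value by that key.
- Repeat this 100,000 times and see how long it takes in total.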
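One way to make the three measurements more comparable is to generate the random docids once and time only the lookups. The sketch below does that for the in-memory HashMap case.

- The names LookupBench, randomDocids, and run are hypothetical, and the seed 42 is arbitrary.
- It uses System.nanoTime, which is intended for measuring elapsed time.
- It sums the looked-up values so the JIT cannot drop the reads as dead code.
- It is still a single cold run without JIT warm-up, so treat the number as rough.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.IntToLongFunction;

// Sketch of a fairer timing loop for the lookups described above.
public class LookupBench {

    // Pre-generate the random docids once, with a fixed seed, so every store
    // sees the same key sequence and random-number generation is not timed.
    static int[] randomDocids(int count, int bound, long seed) {
        Random rnd = new Random(seed);
        int[] ids = new int[count];
        for (int i = 0; i < count; i++) {
            ids[i] = rnd.nextInt(bound);
        }
        return ids;
    }

    // Run the lookup for every docid and fold the results into a checksum,
    // which keeps the JIT from discarding the reads as dead code.
    static long run(int[] docids, IntToLongFunction lookup) {
        long sum = 0;
        for (int docid : docids) {
            sum += lookup.applyAsLong(docid);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2_000_000;
        Map<Integer, Long[]> data = new HashMap<>();
        for (int i = 0; i < n; i++) {
            data.put(i, new Long[] { (long) (i + 1), (long) (i + 2) });
        }
        int[] docids = randomDocids(100_000, n, 42L);

        long start = System.nanoTime();
        long checksum = run(docids, id -> {
            Long[] row = data.get(id);
            return row[0] + row[1];
        });
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("memory: " + elapsedMs + " ms, checksum " + checksum);
    }
}
```

The RandomAccessFile and IndexReader lookups can be passed to run the same way. Since IntToLongFunction cannot throw checked exceptions, their IOException would need to be wrapped, for example in java.io.UncheckedIOException.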

Test results: