Search Engines: How Index Building Works
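<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 18pt;"><span style="">Suppose we have a table of business information like the one below. What algorithms and data structures (that is, what indexing technique) would let us search it quickly and conveniently?</span></p>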
<p>
</p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sid</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Subject</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Type</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Price</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">id</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Description</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">PostTime</span></span></p>
</td>
</tr>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">0</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sell apple</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-family: Times New Roman;"><span style="font-size: 9pt;" lang="EN-US">Sale</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">50</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">101</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US">We are one of the largest …</span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2002-10-02</span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">1</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy Digital Camera</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">200</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">102</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US">Our company mobile-point is situated in the …</span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-02-10</span></span></p>
</td>
</tr>
<tr style="height: 13.5pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sell dog shoes</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-family: Times New Roman;"><span style="font-size: 9pt;" lang="EN-US">Sale</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">45</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">103</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Dog Shoes Model Number: 02GLPS074 Place of …</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-12-18</span></span></p>
</td>
</tr>
<tr style="height: 17.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">3</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy Toys And Jewelry</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">1000</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">104</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">I am interested in items that can be ordered in small …</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-08-20</span></span></p>
</td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="">(Business information table)</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">The queries we need to support generally fall into two categories:</span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;">a.<span style='font: 7pt "Times New Roman";'><span style="font-size: small;"> </span></span></span></span></span><span style="">Finding the required information through a keyword (keyword search)</span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;">b.<span style='font: 7pt "Times New Roman";'><span style="font-size: small;"> </span></span></span></span></span><span style="">Retrieving records by fields such as </span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">type, price, PostTime, id</span></span><span style=""> (single-field retrieval)</span></p>
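<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">To make the two requirement types concrete, here is a minimal Python sketch that answers both with a plain linear scan over the sample table, which is the baseline an index is meant to improve on. The names ROWS, TEXT_FIELDS, keyword_search and field_search are hypothetical, whole-word matching after lowercasing is an assumption, and the truncated descriptions are kept exactly as they appear in the table.</span></p>

```python
# Sample rows copied from the business information table above.
# Descriptions are truncated in the source, so they stay truncated here.
ROWS = [
    {"sid": 0, "subject": "Sell apple", "type": "Sale", "price": 50, "id": 101,
     "description": "We are one of the largest …", "post_time": "2002-10-02"},
    {"sid": 1, "subject": "Buy Digital Camera", "type": "Buy", "price": 200, "id": 102,
     "description": "Our company mobile-point is situated in the …", "post_time": "2003-02-10"},
    {"sid": 2, "subject": "Sell dog shoes", "type": "Sale", "price": 45, "id": 103,
     "description": "Dog Shoes Model Number: 02GLPS074 Place of …", "post_time": "2003-12-18"},
    {"sid": 3, "subject": "Buy Toys And Jewelry", "type": "Buy", "price": 1000, "id": 104,
     "description": "I am interested in items that can be ordered in small …", "post_time": "2003-08-20"},
]

# Assumption: these are the text fields that keyword queries look at.
TEXT_FIELDS = ("subject", "description")


def keyword_search(rows, keyword):
    """Requirement a: Sids of rows whose text fields contain the keyword as a word."""
    needle = keyword.lower()
    return [r["sid"] for r in rows
            if any(needle in r[f].lower().split() for f in TEXT_FIELDS)]


def field_search(rows, field, predicate):
    """Requirement b: Sids of rows whose single field satisfies the predicate."""
    return [r["sid"] for r in rows if predicate(r[field])]


print(keyword_search(ROWS, "dog"))                        # [2]
print(field_search(ROWS, "type", lambda t: t == "Sale"))  # [0, 2]
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">Both functions touch every row on every query, so their cost grows with the size of the table. Avoiding that full scan is the reason for building an index.</span></p>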
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">1</span></span></strong><strong style=""><span style="">. Building the index</span></strong><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span></span></span></strong></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">For requirement </span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">a</span></span><span style="">:</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">We use a hashtable to make lookups fast: each keyword that may be used in a query becomes a key of the hashtable, and the record information related to that key (which we call TokenInfo) becomes the value stored under it. Keys come from the text of the fields that are searchable by keyword (these fields are generally required to be of a text type). A piece of text can usually be broken down into many keys; we call this process tokenization (word segmentation), and each resulting key is called a Token. A Token is the smallest unit of a keyword query. A TokenInfo typically records the ID of the document containing the Token (Doc Number), the Token's position in the text (Prox), the number of times it occurs in the text (Freq), the field it occurs in, and so on. The detailed structure is as follows:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<p>
</p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; width: 576px; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15.45pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 91.35pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="122" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token Size</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 34.65pt; padding-top: 0cm; border-bottom: #ece9d8; height: 15.45pt; background-color: transparent;" rowspan="2" width="46" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small; font-family: Times New Roman;"></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">DocFreq</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 81pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="108" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">DocList (</span></span><span style="">linked list</span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126.2pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">ProxDelta (</span></span><span style="">file position pointer</span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">)</span></span></span></p>
</td>
</tr>
<tr style="height: 12pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 12pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 12pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 306.2pt; padding-top: 0cm; height: 12pt; background-color: transparent;" colspan="7" width="408" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="">(</span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">TokenInfo</span></span><span style="">)</span></span></p>
</td>
</tr>
<tr style="height: 18.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 18.75pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 18.75pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="background-color: transparent; border: #ece9d8; padding: 0cm;" colspan="8" width="454">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small; font-family: Times New Roman;"></span></p>
</td>
</tr>
<tr style="height: 14.8pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 34.65pt; padding-top: 0cm; border-bottom: #ece9d8; height: 14.8pt; background-color: transparent;" rowspan="2" width="46" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small; font-family: Times New Roman;"></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 72pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="96" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Doc Number</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 63pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token Freq</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Field Bit</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 117.2pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="156" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="">Contents of the next node</span><span style=""><span style="font-family: Times New Roman;"> <span lang="EN-US">…</span></span></span></span></p>
</td>
</tr>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 306.2pt; padding-top: 0cm; height: 15pt; background-color: transparent;" colspan="7" width="408" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(DocList </span></span><span style="">linked list</span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">)</span></span></span></p>
</td>
</tr>
<tr height="0">
<td style="background-color: transparent; border: #ece9d8;" width="49"></td>
<td style="background-color: transparent; border: #ece9d8;" width="73"></td>
<td style="background-color: transparent; border: #ece9d8;" width="46"></td>
<td style="background-color: transparent; border: #ece9d8;" width="60"></td>
<td style="background-color: transparent; border: #ece9d8;" width="36"></td>
<td style="background-color: transparent; border: #ece9d8;" width="36"></td>
<td style="background-color: transparent; border: #ece9d8;" width="48"></td>
<td style="background-color: transparent; border: #ece9d8;" width="60"></td>
<td style="background-color: transparent; border: #ece9d8;" width="12"></td>
<td style="background-color: transparent; border: #ece9d8;" width="156"></td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(Token HashTable)</span></span></p>
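<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">The layout above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the structure of any particular engine: the whitespace tokenizer and the Posting, TokenInfo and build_index names are hypothetical, a set of field names stands in for the Field Bit, an ordinary list stands in for the DocList linked list, and positions are kept in memory rather than behind a file position pointer.</span></p>

```python
from dataclasses import dataclass, field


@dataclass
class Posting:
    """One DocList node: how a token occurs inside a single document."""
    doc_number: int
    fields: set = field(default_factory=set)       # stands in for the Field Bit
    positions: list = field(default_factory=list)  # Prox: token offsets in the doc

    @property
    def token_freq(self):
        # Token Freq: number of occurrences in this document.
        return len(self.positions)


@dataclass
class TokenInfo:
    """Value stored in the hash table under one token."""
    token: str
    doc_list: list = field(default_factory=list)   # list used instead of a linked list

    @property
    def doc_freq(self):
        # DocFreq: number of documents that contain the token.
        return len(self.doc_list)


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def build_index(docs):
    """docs maps doc number -> {field name: text}; returns {token: TokenInfo}."""
    index = {}
    for doc_number in sorted(docs):
        position = 0  # running token offset across the document's fields
        for field_name, text in docs[doc_number].items():
            for token in tokenize(text):
                info = index.setdefault(token, TokenInfo(token))
                # Start a new DocList node the first time this doc is seen.
                if not info.doc_list or info.doc_list[-1].doc_number != doc_number:
                    info.doc_list.append(Posting(doc_number))
                posting = info.doc_list[-1]
                posting.fields.add(field_name)
                posting.positions.append(position)
                position += 1
    return index


# Subjects taken from the business information table, keyed by Sid.
docs = {
    0: {"subject": "Sell apple"},
    1: {"subject": "Buy Digital Camera"},
    2: {"subject": "Sell dog shoes"},
    3: {"subject": "Buy Toys And Jewelry"},
}
index = build_index(docs)
sell = index["sell"]
print(sell.doc_freq, [(p.doc_number, p.token_freq, p.positions) for p in sell.doc_list])
# prints: 2 [(0, 1, [0]), (2, 1, [0])]
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">A keyword query then becomes a single dictionary lookup followed by a walk over that token's DocList, instead of a scan over every record.</span></p>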
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">ProxDelta (</span></span><span style="">file position pointer</span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">)</span></span></p>
<p>
</p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; width: 576px; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 13.5pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">N0</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 38.55pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="51" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,0)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 43.95pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="59" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 25.5pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="34" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,n0-1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">N1</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 38.55pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="51" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,0)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27.75pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="37" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 59.7pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="80" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,n1-1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
</tr>
<tr style="height: 17.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 189pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" colspan="5" width="252" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Positions at which this key occurs in the 1st doc</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 198pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" colspan="5" width="264" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Positions at which this key occurs in the 2nd doc</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(Detailed HashTable structure)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">At query time, the same tokenization algorithm splits the keywords entered by the user into several Tokens. Each Token is looked up in this HashTable, and the result sets found for the individual Tokens are then merged and returned to the user.</span></span></p>
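<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The lookup-and-merge step can be sketched in Python as follows. This is a minimal illustration, not the engine's actual code: the whitespace tokenizer, the function names, the third sample document, and the choice of intersection (AND) as the merge rule are all assumptions. A union (OR) merge would be equally consistent with the description above.</span></span></p>

```python
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build a hash table mapping token -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Tokenize the query the same way, look up each token, merge the hits."""
    result = None
    for token in tokenize(query):
        hits = set(index.get(token, {}))  # doc ids containing this token
        # Assumed merge rule: keep only docs that contain every token.
        result = hits if result is None else result & hits
    return sorted(result or [])


# The first two subjects come from the sample table; the third is made up.
DOCS = ["Sell apple", "Buy Digital Camera", "Buy apple juice"]
INDEX = build_index(DOCS)
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">In this sketch, the length of each position list plays the role of the per-document occurrence count in the table above, and the list entries play the role of the position cells.</span></span></p>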
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">For requirement b:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Fields generally fall into the following types: TEXT, STRING, ENUM, RANGE, NUMBER, BIT, DATE, and so on. TEXT, STRING, and NUMBER fields are usually indexed with the HashTable method described above; they differ only in how they are tokenized. A TEXT field is split into Tokens. A STRING field is not split and is treated as a single Token as a whole. A NUMBER field is converted to a number that serves as the Token, which saves space. ENUM, RANGE, and BIT fields are indexed with a BitMap. The BitMap index structure is described below, using the Type field as an example:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<div>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 7.2pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 7.2pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Type Value</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 7.2pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Bit Map</span></span></span></p>
</td>
</tr>
<tr style="height: 15.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 15.25pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Buy</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 15.25pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 1010</span></span></span></p>
</td>
</tr>
<tr style="height: 14.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 14.75pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="font-family: Times New Roman;"><span style="" lang="EN-US">Sale</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 14.75pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0101</span></span></span></p>
</td>
</tr>
<tr style="height: 15.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 15.75pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 15.75pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … … …</span></span></span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span><span style=""></span>^doc#16<span style=""> </span>^doc#1</span></span></p>
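<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">A bitmap index like the one in the table can be sketched with plain Python integers, where bit i is set when record i holds the value. The function name is hypothetical, and the four Type values are an assumption: only the first two rows (Sale, Buy) are visible in the sample table, and the other two are inferred from the bit patterns shown.</span></span></p>

```python
def build_bitmap_index(values):
    """Return {value: bitmap}; bit i of a bitmap is 1 when record i has that value."""
    bitmaps = {}
    for doc_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << doc_id)
    return bitmaps


# Assumed Type column for records 0..3 (rightmost bit = first record).
TYPES = ["Sale", "Buy", "Sale", "Buy"]
TYPE_INDEX = build_bitmap_index(TYPES)

# Render the low 4 bits the way the table prints them.
BUY_BITS = format(TYPE_INDEX["Buy"], "04b")
SALE_BITS = format(TYPE_INDEX["Sale"], "04b")
```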
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Here is another example (represented as a struct):</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Suppose we have two enum fields, each with a few possible values, and 20 records in total:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Field:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Country - China, Hong Kong, Japan, Korea, USA</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""></span><span style=""> </span>Color - Blue, Red, White</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Records:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-blue, Hong Kong-blue, Japan-red, China-red, China-white,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>0-0<span style=""> </span><span style=""></span>1-0<span style=""> </span>2-1<span style=""> </span>0-1<span style=""> </span><span style=""></span><span style=""></span>0-2</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>USA-red, Korea-white, China-white, China-white, USA-blue,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>4-1<span style=""> </span>3-2<span style=""> </span>0-2<span style=""> </span>0-2<span style=""> </span>4-0</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-red, USA-red, USA-blue, Hong Kong-white, Japan-red,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>0-1<span style=""> </span>4-1<span style=""> </span>4-0<span style=""> </span>1-2<span style=""> </span><span style=""></span><span style=""></span>2-1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-red, China-red, China-blue, China-white, China-red</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>0-1<span style=""> </span><span style=""></span><span style=""></span>0-1<span style=""> </span>0-0<span style=""> </span>0-2<span style=""> </span>0-1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">This data can be represented by the following struct:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>SEnumDesc {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfFields = 2;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pField[0] = "Country", pField[1] = "Color";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pValues[0] = {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfValues = 5;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[0] = "china", pVal[1] = "hong kong", pVal[2] = "japan",</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[3] = "korea", pVal[4] = "usa";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>},</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pValues[1] = {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfValues = 3;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[0] = "blue", pVal[1] = "red", pVal[2] = "white";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>};</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfDocs = 20;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pBitmap[0] {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[0][0] = xxxx xxxx xxxx 1111 1000 0101 1001 1001;<span style=""> </span>// china</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[1][0] = xxxx xxxx xxxx 0000 0010 0000 0000 0010;<span style=""> </span>// hong kong</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[2][0] = xxxx xxxx xxxx 0000 0100 0000 0000 0100;<span style=""> </span>// japan</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[3][0] = xxxx xxxx xxxx 0000 0000 0000 0100 0000;<span style=""> </span>// korea</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>pMap[4][0] = xxxx xxxx xxxx 0000 0001 1010 0010 0000;<span style=""> </span>// usa</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>//<span style=""> </span>^doc#20<span style=""> </span>^doc#1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pBitmap[1] {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[0][0] = xxxx xxxx xxxx 0010 0001 0010 0000 0011;<span style=""> </span>// blue</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[1][0] = xxxx xxxx xxxx 1001 1100 1100 0010 1100;<span style=""> </span>// red</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[2][0] = xxxx xxxx xxxx 0100 0010 0001 1101 0000;<span style=""> </span>// white</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 21pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">//<span style=""> </span>^doc#20<span style=""> </span>^doc#1</span></span></p>
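<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>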
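<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">As a cross-check, the bitmaps in the struct can be rebuilt from the 20 coded records with a short Python sketch. The record codes and value lists are taken from the example above; the helper names (build_maps, matching_docs) are hypothetical. The final line shows how two conditions on different fields could be combined with a single bitwise AND.</span></span></p>

```python
COUNTRIES = ["china", "hong kong", "japan", "korea", "usa"]
COLORS = ["blue", "red", "white"]

# (country code, color code) for doc#1 .. doc#20, as listed in the example.
RECORDS = [
    (0, 0), (1, 0), (2, 1), (0, 1), (0, 2),
    (4, 1), (3, 2), (0, 2), (0, 2), (4, 0),
    (0, 1), (4, 1), (4, 0), (1, 2), (2, 1),
    (0, 1), (0, 1), (0, 0), (0, 2), (0, 1),
]


def build_maps(records, field, num_values):
    """One bitmap per enum value; bit i corresponds to doc#(i + 1)."""
    maps = [0] * num_values
    for bit, record in enumerate(records):
        maps[record[field]] |= 1 << bit
    return maps


def matching_docs(bitmap):
    """Return the 1-based doc numbers whose bit is set."""
    return [i + 1 for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


COUNTRY_MAPS = build_maps(RECORDS, 0, len(COUNTRIES))
COLOR_MAPS = build_maps(RECORDS, 1, len(COLORS))

# Country == china AND Color == red, answered with one bitwise AND.
CHINA_RED = COUNTRY_MAPS[0] & COLOR_MAPS[1]
```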
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 18pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">If a field is of Range type, or is searched by range, it is also indexed with a BitMap structure. Take the price field: every record has a single price value, but searches usually specify a price range. We first define several ranges ourselves; then, for each record, we set to 1 the bit at that record's position in the bitmap of the range its value falls into. The price field is used as the example here.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<div>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Range</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Bit Map</span></span></span></p>
</td>
</tr>
<tr style="height: 6.65pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 6.65pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">&lt;50</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 6.65pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0100</span></span></span></p>
</td>
</tr>
<tr style="height: 13.95pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 13.95pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">[50,100)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 13.95pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0001</span></span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">[100,500)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0010</span></span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">>=500</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 1000</span></span></span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>^doc#16<span style=""> </span>^doc#1</span></span></p>
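<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The range bitmaps above can be sketched in Python as follows. The bucket boundaries come from the table; the helper names are hypothetical, and the price list is an assumption: the first two prices (50 and 200) appear in the sample data table, while the last two are made up so that the result matches the bit patterns shown.</span></span></p>

```python
import bisect

BOUNDS = [50, 100, 500]                             # bucket boundaries from the table
LABELS = ["<50", "[50,100)", "[100,500)", ">=500"]
PRICES = [50, 200, 30, 800]                         # last two values are assumed


def bucket_of(price):
    """Index of the half-open range that contains price."""
    return bisect.bisect_right(BOUNDS, price)


def build_range_bitmaps(prices):
    """One bitmap per range; bit i is 1 when record i falls into that range."""
    bitmaps = [0] * len(LABELS)
    for doc_id, price in enumerate(prices):
        bitmaps[bucket_of(price)] |= 1 << doc_id
    return bitmaps


def docs_in_buckets(bitmaps, first, last):
    """OR together the bitmaps of buckets first..last (inclusive)."""
    combined = 0
    for bitmap in bitmaps[first:last + 1]:
        combined |= bitmap
    return combined


RANGE_MAPS = build_range_bitmaps(PRICES)
# A query for prices in [50, 500) covers buckets 1 and 2.
FROM_50_TO_500 = docs_in_buckets(RANGE_MAPS, 1, 2)
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">A query range that does not line up with the predefined boundaries would still need a second pass over the candidate records to check the actual price; that step is left out of this sketch.</span></span></p>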
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><strong style="">Bit</strong> fields likewise use the <strong style="">BitMap</strong> structure, so they are not described again here. The advantages of the BitMap structure are that it saves space and that logical operations on result sets are simple and fast. Not every Enum field uses a BitMap, however: when a field has more than <strong style="">32</strong> enumerated values, a BitMap becomes inconvenient, and we build the index with the HashTable structure instead.</span></span></p>
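<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The choice between the two structures might look like the following Python sketch. The threshold of 32 is the one quoted above; the function name, the return shape, and the use of plain doc-id lists for the HashTable case are assumptions made for illustration.</span></span></p>

```python
BITMAP_MAX_VALUES = 32  # threshold quoted in the text above


def build_enum_index(values, max_values=BITMAP_MAX_VALUES):
    """Index an enum column, choosing the structure by its number of distinct values."""
    distinct = set(values)
    if len(distinct) <= max_values:
        # Few distinct values: one bitmap per value.
        bitmaps = dict.fromkeys(distinct, 0)
        for doc_id, value in enumerate(values):
            bitmaps[value] |= 1 << doc_id
        return "bitmap", bitmaps
    # Many distinct values: hash table from value to the list of doc ids.
    postings = {}
    for doc_id, value in enumerate(values):
        postings.setdefault(value, []).append(doc_id)
    return "hashtable", postings
```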
<p></p>
</p>
<p></p>
<p></p>
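<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 18pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Suppose we have a business-information data table like the one below. What algorithms and structures (indexing techniques) would let us implement search quickly and conveniently?</span></span></p>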
<p>
</p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sid</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Subject</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Type</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Price</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">id</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Description</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">PostTime</span></span></p>
</td>
</tr>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">0</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sell apple</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-family: Times New Roman;"><span style="font-size: 9pt;" lang="EN-US">Sale</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">50</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">101</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US">We are one of the largest …</span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2002-10-02</span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">1</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy Digital Camera</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">200</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">102</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US">Our company mobile-point is situated in the …</span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-02-10</span></span></p>
</td>
</tr>
<tr style="height: 13.5pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sell dog shoes</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-family: Times New Roman;"><span style="font-size: 9pt;" lang="EN-US">Sale</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">45</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">103</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Dog Shoes Model Number: 02GLPS074 Place of …</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-12-18</span></span></p>
</td>
</tr>
<tr style="height: 17.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">3</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy Toys And Jewelry</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">1000</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">104</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">I am interested in items that can be ordered in small …</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-08-20</span></span></p>
</td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="">(Business information data table)</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">Query requirements generally fall into the following two categories:</span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;">a.<span style='font: 7pt "Times New Roman";'><span style="font-size: small;"> </span></span></span></span></span><span style="">Find the required information by keyword (keyword search).</span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;">b.<span style='font: 7pt "Times New Roman";'><span style="font-size: small;"> </span></span></span></span></span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Retrieve records by fields such as type, price, PostTime, and id (single-field retrieval).</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">1. Building the index</span></span></strong></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">For requirement a:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">We use a hashtable for fast lookup: each keyword that may be used in a query becomes a key of the hashtable, and the record information related to that key (which we call TokenInfo) is stored as the key's value. Keys come from the text of the fields that support keyword search (these fields are generally required to be text fields). A piece of text can usually be broken into many keys; we call this process tokenization (word segmentation), and each resulting key a Token. A Token is the smallest unit of a keyword query. TokenInfo generally records the ID of the document containing the Token (Doc Number), the Token's positions in the text (Prox), the number of times it occurs in the text (Freq), the field it occurs in, and so on. The detailed structure is as follows:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<p>
</p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; width: 576px; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15.45pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 91.35pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="122" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token Size</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 34.65pt; padding-top: 0cm; border-bottom: #ece9d8; height: 15.45pt; background-color: transparent;" rowspan="2" width="46" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small; font-family: Times New Roman;"></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">DocFreq</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 81pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="108" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">DocList (linked list)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126.2pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">ProxDelta (file position pointer)</span></span></span></p>
</td>
</tr>
<tr style="height: 12pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 12pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 12pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 306.2pt; padding-top: 0cm; height: 12pt; background-color: transparent;" colspan="7" width="408" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="">(</span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">TokenInfo</span></span><span style="">)</span></span></p>
</td>
</tr>
<tr style="height: 18.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 18.75pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 18.75pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="background-color: transparent; border: #ece9d8; padding: 0cm;" colspan="8" width="454">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small; font-family: Times New Roman;"></span></p>
</td>
</tr>
<tr style="height: 14.8pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 34.65pt; padding-top: 0cm; border-bottom: #ece9d8; height: 14.8pt; background-color: transparent;" rowspan="2" width="46" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small; font-family: Times New Roman;"></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 72pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="96" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Doc Number</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 63pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token Freq</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Field Bit</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 117.2pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="156" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Next node …</span></span></span></p>
</td>
</tr>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 306.2pt; padding-top: 0cm; height: 15pt; background-color: transparent;" colspan="7" width="408" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(DocList linked list)</span></span></span></p>
</td>
</tr>
<tr height="0">
<td style="background-color: transparent; border: #ece9d8;" width="49"></td>
<td style="background-color: transparent; border: #ece9d8;" width="73"></td>
<td style="background-color: transparent; border: #ece9d8;" width="46"></td>
<td style="background-color: transparent; border: #ece9d8;" width="60"></td>
<td style="background-color: transparent; border: #ece9d8;" width="36"></td>
<td style="background-color: transparent; border: #ece9d8;" width="36"></td>
<td style="background-color: transparent; border: #ece9d8;" width="48"></td>
<td style="background-color: transparent; border: #ece9d8;" width="60"></td>
<td style="background-color: transparent; border: #ece9d8;" width="12"></td>
<td style="background-color: transparent; border: #ece9d8;" width="156"></td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(Token HashTable)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">ProxDelta (file position pointer)</span></span></p>
<p>
</p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; width: 576px; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 13.5pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">N0</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 38.55pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="51" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,0)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 43.95pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="59" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 25.5pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="34" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,n0-1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">N1</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 38.55pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="51" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,0)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27.75pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="37" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 59.7pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="80" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,n1-1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
</tr>
<tr style="height: 17.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 189pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" colspan="5" width="252" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Positions of this key in the 1st doc</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 198pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" colspan="5" width="264" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Positions of this key in the 2nd doc</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(Detailed HashTable structure)</span></span></p>
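<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The structure above can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the names Posting, build_index, and flatten_prox are hypothetical, tokenization is assumed to be lowercasing plus whitespace splitting, the DocList is a plain Python list instead of a linked list, Field Bit is assumed to be a simple bit mask, and the position area stores raw positions in the N0, A(0,0) … A(0,n0-1), N1, … layout shown in the table.</span></span></p>

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Posting:
    """One DocList node (field names follow the table above)."""
    doc_number: int        # Doc Number
    token_freq: int        # Token Freq: occurrences of the token in this doc
    field_bit: int         # Field Bit: assumed to be a bit mask of fields
    positions: List[int]   # token positions inside the document text


@dataclass
class TokenInfo:
    """Value stored in the hashtable for one Token."""
    doc_freq: int = 0                                       # DocFreq
    doc_list: List[Posting] = field(default_factory=list)   # DocList


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_index(docs: Dict[int, str], field_bit: int = 1) -> Dict[str, TokenInfo]:
    """Build the Token -> TokenInfo hashtable for one text field."""
    index: Dict[str, TokenInfo] = defaultdict(TokenInfo)
    for doc_number in sorted(docs):
        positions: Dict[str, List[int]] = defaultdict(list)
        for pos, token in enumerate(tokenize(docs[doc_number])):
            positions[token].append(pos)
        for token, pos_list in positions.items():
            info = index[token]
            info.doc_freq += 1
            info.doc_list.append(
                Posting(doc_number, len(pos_list), field_bit, pos_list)
            )
    return dict(index)


def flatten_prox(info: TokenInfo) -> List[int]:
    """Lay positions out as N0, A(0,0)..A(0,N0-1), N1, A(1,0).. ."""
    flat: List[int] = []
    for posting in info.doc_list:
        flat.append(len(posting.positions))
        flat.extend(posting.positions)
    return flat


# Subject column of the sample table, keyed by Sid.
subjects = {
    0: "Sell apple",
    1: "Buy Digital Camera",
    2: "Sell dog shoes",
    3: "Buy Toys And Jewelry",
}
index = build_index(subjects)
print(index["sell"].doc_freq)        # 2 (documents 0 and 2)
print(flatten_prox(index["sell"]))   # [1, 0, 1, 0]
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Keeping the positions in a separate flat area means the DocList nodes stay small and fixed in size, and position data is only read when a query actually needs it.</span></span></p>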
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">At query time, we use the same tokenization algorithm to break the keywords entered by the user into multiple Tokens, look up each Token in this HashTable, then merge the result sets found for the individual Tokens and return the merged result to the user.</span></span></p>
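<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The query step can be sketched as follows. The text does not say how the per-Token result sets are merged, so this sketch assumes intersection (every Token must match) by default and offers union as an alternative; the search function, the mode parameter, and the simplified index that maps each Token directly to a set of document IDs are all hypothetical.</span></span></p>

```python
def tokenize(text):
    # Must match the tokenizer used at indexing time
    # (assumed here: lowercase, split on whitespace).
    return text.lower().split()


def search(index, query, mode="and"):
    """Look up every query Token and merge the per-Token result sets."""
    result_sets = [index.get(token, set()) for token in tokenize(query)]
    if not result_sets:
        return []
    if mode == "and":
        merged = set.intersection(*result_sets)   # all Tokens must occur
    else:
        merged = set.union(*result_sets)          # any Token may occur
    return sorted(merged)


# Simplified Token -> document-ID sets for the Subject column of the sample table.
index = {
    "sell": {0, 2}, "apple": {0},
    "buy": {1, 3}, "digital": {1}, "camera": {1},
    "dog": {2}, "shoes": {2},
    "toys": {3}, "and": {3}, "jewelry": {3},
}

print(search(index, "sell shoes"))             # [2]
print(search(index, "sell shoes", mode="or"))  # [0, 2]
```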
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">For requirement b:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Fields generally have the following types: TEXT, STRING, ENUM, RANGE, NUMBER, BIT, DATE, and so on. TEXT, STRING, and NUMBER fields are usually indexed with the HashTable method described above; they differ only in how they are tokenized. TEXT fields are tokenized. STRING fields are not tokenized, and the whole value is treated as a single Token. NUMBER fields are converted to a number that serves as the Token, which saves space. ENUM, RANGE, and BIT fields are indexed with a BitMap. The BitMap index structure is described next, using the Type field as an example.</span></span></p>
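<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Before turning to the BitMap structure, the tokenization rules for TEXT, STRING, and NUMBER fields can be sketched as follows. The function name tokens_for_field is hypothetical, and the TEXT tokenizer (lowercasing plus whitespace splitting) and integer conversion for NUMBER are assumptions made for illustration only.</span></span></p>

```python
def tokens_for_field(field_type, value):
    """Return the Tokens that one field value contributes to the hashtable."""
    if field_type == "TEXT":
        # TEXT is tokenized (assumed: lowercase, split on whitespace).
        return value.lower().split()
    if field_type == "STRING":
        # STRING is not tokenized: the whole value is one Token.
        return [value]
    if field_type == "NUMBER":
        # NUMBER is converted to a numeric Token (assumed: integer).
        return [int(value)]
    raise ValueError("field type is not indexed via the hashtable: " + field_type)


print(tokens_for_field("TEXT", "Sell dog shoes"))   # ['sell', 'dog', 'shoes']
print(tokens_for_field("STRING", "02GLPS074"))      # ['02GLPS074']
print(tokens_for_field("NUMBER", "1000"))           # [1000]
```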
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<div>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 7.2pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 7.2pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Type Value</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 7.2pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Bit Map</span></span></span></p>
</td>
</tr>
<tr style="height: 15.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 15.25pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Buy</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 15.25pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 1010</span></span></span></p>
</td>
</tr>
<tr style="height: 14.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 14.75pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="font-family: Times New Roman;"><span style="" lang="EN-US">Sale</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 14.75pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0101</span></span></span></p>
</td>
</tr>
<tr style="height: 15.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 15.75pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 15.75pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … … …</span></span></span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Bit order: the leftmost of the 16 bits shown is doc#16 and the rightmost is doc#1.</span></span></p>
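<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">As a rough illustration of how such a per-value bitmap could be filled, here is a minimal C sketch. It assumes at most 32 documents per 32-bit word and document numbers starting at 1, so that doc#1 is the least significant bit. The names bitmap_set, bitmap_test, build_type_index, and the doc_type enum are hypothetical and do not come from the original system.</span></span></p>

```c
#include <stdint.h>

/* Hypothetical value codes for the Type field. */
enum doc_type { TYPE_BUY = 0, TYPE_SALE = 1, TYPE_COUNT = 2 };

/* Set the bit of document `doc` (1-based) in a 32-bit bitmap. */
static uint32_t bitmap_set(uint32_t map, int doc) {
    return map | (UINT32_C(1) << (doc - 1));
}

/* Return 1 if document `doc` (1-based) is present in the bitmap, else 0. */
static int bitmap_test(uint32_t map, int doc) {
    return (int)((map >> (doc - 1)) & 1u);
}

/* Build one bitmap per Type value.
   types[d - 1] is the Type value of doc#d; num_docs must not exceed 32. */
static void build_type_index(const enum doc_type *types, int num_docs,
                             uint32_t maps[TYPE_COUNT]) {
    for (int v = 0; v < TYPE_COUNT; v++) {
        maps[v] = 0;
    }
    for (int d = 1; d <= num_docs; d++) {
        maps[types[d - 1]] = bitmap_set(maps[types[d - 1]], d);
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Reading the table from the right, documents 1 to 4 have the Type values Sale, Buy, Sale, Buy. Feeding that sequence to build_type_index gives 1010 for Buy and 0101 for Sale in the low four bits, matching the table.</span></span></p>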
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Here is another example (represented as a struct):</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Suppose we have two enum fields, each with a few possible values, and 20 records in total:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Field:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Country - China, Hong Kong, Japan, Korea, USA</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""></span><span style=""> </span>Color - Blue, Red, White</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Records:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-blue, Hong Kong-blue, Japan-red, China-red, China-white,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>0-0<span style=""> </span><span style=""></span>1-0<span style=""> </span>2-1<span style=""> </span>0-1<span style=""> </span><span style=""></span><span style=""></span>0-2</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>USA-red, Korea-white, China-white, China-white, USA-blue,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>4-1<span style=""> </span>3-2<span style=""> </span>0-2<span style=""> </span>0-2<span style=""> </span>4-0</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-red, USA-red, USA-blue, Hong Kong-white, Japan-red,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>0-1<span style=""> </span>4-1<span style=""> </span>4-0<span style=""> </span>1-2<span style=""> </span><span style=""></span><span style=""></span>2-1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-red, China-red, China-blue, China-white, China-red,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>0-1<span style=""> </span><span style=""></span><span style=""></span>0-1<span style=""> </span>0-0<span style=""> </span>0-2<span style=""> </span>0-1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;"> </span></span></span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">This data can be represented with the following struct:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>SEnumDesc {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfFields = 2;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pField[0] = "Country", pField[1] = "Color";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pValues[0] = {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfValues = 5;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[0] = "china", pVal[1] = "hong kong", pVal[2] = "japan",</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[3] = "korea", pVal[4] = "usa";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>},</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pValues[1] = {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfValues = 3;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[0] = "blue", pVal[1] = "red", pVal[2] = "white";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>};</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfDocs = 20;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pBitmap[0] {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[0][0] = xxxx xxxx xxxx 1111 1000 0101 1001 1001;<span style=""> </span>// china</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[1][0] = xxxx xxxx xxxx 0000 0010 0000 0000 0010;<span style=""> </span>// hong kong</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[2][0] = xxxx xxxx xxxx 0000 0100 0000 0000 0100;<span style=""> </span>// japan</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[3][0] = xxxx xxxx xxxx 0000 0000 0000 0100 0000;<span style=""> </span>// korea</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>pMap[4][0] = xxxx xxxx xxxx 0000 0001 1010 0010 0000;<span style=""> </span>// usa</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>// leftmost bit shown = doc#20, rightmost bit = doc#1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pBitmap[1] {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[0][0] = xxxx xxxx xxxx 0010 0001 0010 0000 0011;<span style=""> </span>// blue</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[1][0] = xxxx xxxx xxxx 1001 1100 1100 0010 1100;<span style=""> </span>// red</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[2][0] = xxxx xxxx xxxx 0100 0010 0001 1101 0000;<span style=""> </span>// white</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 21pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">// leftmost bit shown = doc#20, rightmost bit = doc#1</span></span></p>
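<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>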
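<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">To make the benefit of this layout concrete, the following C sketch rebuilds the per-value bitmaps from the 20 record codes listed above and answers a combined query with a single bitwise AND. It is only an illustration: build_enum_bitmaps, count_docs, and the two code arrays are hypothetical names, and one 32-bit word is assumed to be enough because there are only 20 documents.</span></span></p>

```c
#include <stdint.h>

#define NUM_DOCS      20
#define NUM_COUNTRIES 5   /* china, hong kong, japan, korea, usa */
#define NUM_COLORS    3   /* blue, red, white */

/* Value codes of the 20 records in document order (doc#1 first),
   copied from the "country-color" code pairs listed with the records. */
static const int country_of[NUM_DOCS] = {
    0, 1, 2, 0, 0,   4, 3, 0, 0, 4,   0, 4, 4, 1, 2,   0, 0, 0, 0, 0
};
static const int color_of[NUM_DOCS] = {
    0, 0, 1, 1, 2,   1, 2, 2, 2, 0,   1, 1, 0, 2, 1,   1, 1, 0, 2, 1
};

/* Build one bitmap per enum value: bit d (0-based) is set in maps[v]
   when document d + 1 carries value v. num_docs must not exceed 32. */
static void build_enum_bitmaps(const int *codes, int num_docs,
                               uint32_t *maps, int num_values) {
    for (int v = 0; v < num_values; v++) {
        maps[v] = 0;
    }
    for (int d = 0; d < num_docs; d++) {
        maps[codes[d]] |= UINT32_C(1) << d;
    }
}

/* Count the documents contained in a result bitmap. */
static int count_docs(uint32_t map) {
    int n = 0;
    while (map != 0) {
        map &= map - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">After building both sets of bitmaps, the query "Country = china AND Color = red" is simply country[0] &amp; color[1]. For the data above the result has bits set for documents 4, 11, 16, 17, and 20, which are exactly the China-red records.</span></span></p>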
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 18pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">A BitMap is also used when a field is of Range type, or when it is searched by range. Take the price field: every record has a single price value, but a search usually specifies a price range. To index such a field we first define several ranges ourselves; then, for each record, we set its bit to 1 in the bitmap of the range that its value falls into. The price field serves as the example:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<div>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Range</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Bit Map</span></span></span></p>
</td>
</tr>
<tr style="height: 6.65pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 6.65pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">&lt;50</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 6.65pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0100</span></span></span></p>
</td>
</tr>
<tr style="height: 13.95pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 13.95pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">[50,100)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 13.95pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0001</span></span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">[100,500)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0010</span></span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">&gt;=500</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 1000</span></span></span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Bit order: the leftmost of the 16 bits shown is doc#16 and the rightmost is doc#1.</span></span></p>
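<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">A minimal C sketch of the range scheme follows, assuming the four price ranges from the table and at most 32 documents. The function names price_bucket, build_range_bitmaps, and query_buckets are hypothetical, and prices are assumed to be stored as double.</span></span></p>

```c
#include <stdint.h>

#define NUM_RANGES 4   /* <50, [50,100), [100,500), >=500 */

/* Lower bounds of ranges 1..3; range 0 holds every price below 50. */
static const double lower_bound[NUM_RANGES - 1] = { 50.0, 100.0, 500.0 };

/* Map a price to the index of the range it falls into. */
static int price_bucket(double price) {
    int b = 0;
    while (b < NUM_RANGES - 1 && price >= lower_bound[b]) {
        b++;
    }
    return b;
}

/* Build one bitmap per range: bit d (0-based) is set in maps[r]
   when the price of document d + 1 falls into range r. */
static void build_range_bitmaps(const double *prices, int num_docs,
                                uint32_t maps[NUM_RANGES]) {
    for (int r = 0; r < NUM_RANGES; r++) {
        maps[r] = 0;
    }
    for (int d = 0; d < num_docs; d++) {
        maps[price_bucket(prices[d])] |= UINT32_C(1) << d;
    }
}

/* Answer a query covering ranges first..last (inclusive) by OR-ing
   the bitmaps of those ranges. */
static uint32_t query_buckets(const uint32_t maps[NUM_RANGES],
                              int first, int last) {
    uint32_t result = 0;
    for (int r = first; r <= last; r++) {
        result |= maps[r];
    }
    return result;
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">With made-up prices of 80, 120, 30, and 900 for documents 1 to 4, the four bitmaps come out as in the table, and a search for prices in [50,500) is the OR of the second and third bitmaps. Note that this sketch can only answer queries whose bounds coincide with the predefined range boundaries; a query such as [60,200) would need an additional check against the stored price values.</span></span></p>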
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;"> </span></span></span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Fields of type </span></span><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Bit</span></span></strong><span style="" lang="EN-US"><span style="font-family: Times New Roman;"> use the same </span></span><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">BitMap</span></span></strong><span style="" lang="EN-US"><span style="font-family: Times New Roman;"> structure, so they are not discussed further here. The advantages of the BitMap structure are that it saves space and that set operations on result sets are simple and fast. However, not every Enum field uses a BitMap: when a field has more than </span></span><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">32</span></span></strong><span style="" lang="EN-US"><span style="font-family: Times New Roman;"> enum values, a BitMap becomes inconvenient, and in that case we build the index with the HashTable structure instead.</span></span></p>
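<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The selection rules described in this section can be summarized in a small C sketch. The enum names and the function choose_index are hypothetical. The text does not say how DATE fields are indexed, so the sketch deliberately reports them as unspecified rather than guessing.</span></span></p>

```c
/* Field types named in the text. */
enum field_type { FT_TEXT, FT_STRING, FT_NUMBER, FT_ENUM, FT_RANGE, FT_BIT, FT_DATE };

/* Index structures named in the text, plus a marker for cases it leaves open. */
enum index_kind { INDEX_HASHTABLE, INDEX_BITMAP, INDEX_UNSPECIFIED };

/* An Enum field with more than this many values falls back to a HashTable. */
#define MAX_BITMAP_ENUM_VALUES 32

/* Pick the index structure for a field.
   num_enum_values is only consulted for FT_ENUM. */
static enum index_kind choose_index(enum field_type type, int num_enum_values) {
    switch (type) {
    case FT_TEXT:
    case FT_STRING:
    case FT_NUMBER:
        return INDEX_HASHTABLE;
    case FT_ENUM:
        return num_enum_values > MAX_BITMAP_ENUM_VALUES ? INDEX_HASHTABLE
                                                        : INDEX_BITMAP;
    case FT_RANGE:
    case FT_BIT:
        return INDEX_BITMAP;
    default:
        return INDEX_UNSPECIFIED;   /* e.g. FT_DATE: not covered by the text */
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">For example, the Country field with 5 values would get a BitMap, while an Enum field with 33 values would be indexed with a HashTable.</span></span></p>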
<p></p>