- 浏览: 99352 次
- 性别:
- 来自: 深圳
文章分类
最新评论
-
sulanyan29:
您 好,请问下android程序中调用以下两个命令,开启: s ...
linux 防火墙启动、添加规则 -
rainliu:
Can use jrockit monitor for IBM ...
java堆栈溢出 JRockit+Tomcat 实战调试
如果要实现一个能支撑大数据量并发搜索的引擎的关键词匹配,而是需要选择用一种紧凑高效的数据结构来实现,譬如说Trie。下面介绍一下Trie..
Trie,又称单词查找树或键树,是一种树形结构,是一种哈希树的变种。典型应用是用于统计和排序大量的字符串(但不仅限于字符串),所以经常被搜索引擎系统用于文本词频统计。它的优点是:最大限度地减少无谓的字符串比较,查询效率比哈希表高。
散列是一种常见的高效查找方法,它根据数组下标查询,所以速度快。首先根据词表构造散列表,具体来说就是用给定的哈希函数构造词典到数组下标的映射,如果存在冲突,则根据选择的冲突处理方法解决地址冲突。然后可以在哈希表的基础上执行哈希查找。
冲突导致散列性能降低。不存在冲突的散列表叫做完美散列(perfect hash)。整词散列不适合分词的最长匹配查找方式。
数字搜索树采用了逐字散列的方法,可以看成是一种逐字的完美散列。一个数字搜索Trie树(retrieve树)的一个节点只保留一个字符。如果一个单词比一个字符长,则包含第一个字符的节点有指针指向下一个字符的节点,以此类推。这样组成一个层次结构的树,树的第一层包括所有单词的第一个字符,树的第二层包括所有单词的第二个字符,以此类推,数字搜索树的最大高度是词典中最长单词的长度。例如,如下单词序列组成的词典(as at be by he in is it of on or to)会生成如图4-4所示的数字搜索树:
数字搜索树的结构独立于生成数时单词进入的顺序。这里,Trie树的高度是2。因为树的高度很小,在数字搜索Trie树中搜索一个单词的速度很快。但是,这是以内存消耗为代价的,树中的每一个节点都需要很多内存。假设每个词都是由26个小写英文字母中的一个组成的,这个节点中会有26个指针。所以不太可能直接用这样的数字搜索树来存储中文这样的大字符集。
它有3个基本性质:
根节点不包含字符,除根节点外每一个节点都只包含一个字符。
从根节点到某一节点,路径上经过的字符连接起来,为该节点对应的字符串。
每个节点的所有子节点包含的字符都不相同。
这是一个Trie结构的例子: Trie example.svg
在这个Trie结构中,保存了A、to、tea、ted、ten、i、in、inn这8个字符串,仅占用8个字节(不包括指针占用的空间)。
Trie树在实现上有一个树类(SearchTrie)和一个节点类(TrieNode)。SearchTrie的主要方法有两个:
增加单词到搜索树,方法原型是:addWord(String word)。
从文本的指定位置开始匹配单词,方法原型是:matchLong(String text,int offset)。
给定一个固定电话号码,找出这个电话号码对应的区域。固定电话号码都是以0开始的多位数字,可以通过给定电话号码的前缀找出对应的地区,例如:
可以使用数字搜索树算法快速查找电话号码前缀。例如:0 3 7 1 这个区号有四个节点对应,也就是说有4个TrieNode对象。第一个对象中的splitChar属性是0;第二个对象中的splitChar属性是3;第三个对象中的splitChar属性是7;第四个对象中的splitChar属性是1。每个TrieNode对象有10个孩子节点,分别对应孩子节点的splitChar为0到9。
Trie树的节点类定义如下:
查询的过程对于查询词来说,从前往后一个字符一个字符的匹配。对于Trie树来说,是从根节点往下匹配的过程。从给定电话号码搜索前缀的方法如下所示:
考虑把Trie树改成通用的结构,使用散列表存储孩子节点,并使用范型定义值类型。
Java 实现
Trie,又称单词查找树或键树,是一种树形结构,是一种哈希树的变种。典型应用是用于统计和排序大量的字符串(但不仅限于字符串),所以经常被搜索引擎系统用于文本词频统计。它的优点是:最大限度地减少无谓的字符串比较,查询效率比哈希表高。
散列是一种常见的高效查找方法,它根据数组下标查询,所以速度快。首先根据词表构造散列表,具体来说就是用给定的哈希函数构造词典到数组下标的映射,如果存在冲突,则根据选择的冲突处理方法解决地址冲突。然后可以在哈希表的基础上执行哈希查找。
冲突导致散列性能降低。不存在冲突的散列表叫做完美散列(perfect hash)。整词散列不适合分词的最长匹配查找方式。
数字搜索树采用了逐字散列的方法,可以看成是一种逐字的完美散列。一个数字搜索Trie树(retrieve树)的一个节点只保留一个字符。如果一个单词比一个字符长,则包含第一个字符的节点有指针指向下一个字符的节点,以此类推。这样组成一个层次结构的树,树的第一层包括所有单词的第一个字符,树的第二层包括所有单词的第二个字符,以此类推,数字搜索树的最大高度是词典中最长单词的长度。例如,如下单词序列组成的词典(as at be by he in is it of on or to)会生成如图4-4所示的数字搜索树:
数字搜索树的结构独立于生成数时单词进入的顺序。这里,Trie树的高度是2。因为树的高度很小,在数字搜索Trie树中搜索一个单词的速度很快。但是,这是以内存消耗为代价的,树中的每一个节点都需要很多内存。假设每个词都是由26个小写英文字母中的一个组成的,这个节点中会有26个指针。所以不太可能直接用这样的数字搜索树来存储中文这样的大字符集。
它有3个基本性质:
根节点不包含字符,除根节点外每一个节点都只包含一个字符。
从根节点到某一节点,路径上经过的字符连接起来,为该节点对应的字符串。
每个节点的所有子节点包含的字符都不相同。
这是一个Trie结构的例子: Trie example.svg
在这个Trie结构中,保存了A、to、tea、ted、ten、i、in、inn这8个字符串,仅占用8个字节(不包括指针占用的空间)。
Trie树在实现上有一个树类(SearchTrie)和一个节点类(TrieNode)。SearchTrie的主要方法有两个:
增加单词到搜索树,方法原型是:addWord(String word)。
从文本的指定位置开始匹配单词,方法原型是:matchLong(String text,int offset)。
给定一个固定电话号码,找出这个电话号码对应的区域。固定电话号码都是以0开始的多位数字,可以通过给定电话号码的前缀找出对应的地区,例如:
0995:新疆:托克逊县 0856:贵州:铜仁 0996:新疆:焉耆回族自治县
可以使用数字搜索树算法快速查找电话号码前缀。例如:0 3 7 1 这个区号有四个节点对应,也就是说有4个TrieNode对象。第一个对象中的splitChar属性是0;第二个对象中的splitChar属性是3;第三个对象中的splitChar属性是7;第四个对象中的splitChar属性是1。每个TrieNode对象有10个孩子节点,分别对应孩子节点的splitChar为0到9。
Trie树的节点类定义如下:
public static final class TrieNode { protected TrieNode[] children; //孩子节点 protected char splitChar; //分隔字符 protected String area; //电话所属地区信息 /** * 构造方法 * * @param splitchar分隔字符 */ protected TrieNode(char splitchar) { children = new TrieNode[10]; area = null; this.splitChar = splitchar; } } 加载词,形成数字搜索树的方法如下: private void addWord(String string, TSTNode root, String area) { TSTNode tstNode = root; for (int i = 1; i < string.length(); i++) { char c0 = string.charAt(i); int ind = Integer.parseInt(string.substring(i, i + 1)); TSTNode tmpNode = tstNode.children[ind]; if (null == tmpNode) { tmpNode = new TSTNode(c0); } if (i == string.length() - 1) { tmpNode.area = area; } tstNode.children[ind] = tmpNode; tstNode = tmpNode; } }
查询的过程对于查询词来说,从前往后一个字符一个字符的匹配。对于Trie树来说,是从根节点往下匹配的过程。从给定电话号码搜索前缀的方法如下所示:
public String search(String tel) { TrieNode tstNode = root; for (int i = 1; i < tel.length(); i++) {//从前往后一个字符一个字符的匹配 tstNodetstNode = tstNode.children[(tel.charAt(i)-'0')]; if (null != tstNode.area) { return tstNode.area; } } return null;//没找到 }
考虑把Trie树改成通用的结构,使用散列表存储孩子节点,并使用范型定义值类型。
public class TrieNode<T> { private Character splitChar; //分隔字符 private T nodeValue; //值信息 private Map<Character, TrieNode<T>> children = new HashMap <Character, TrieNode<T>>(); //孩子节点 }
Java 实现
/** * Description: This program implements a trie that allows to search words. It * reads words from a file specified by the user in the command line * and constructs the dictionary. It allows user to find, print, add, * and delete words from the trie. It folds all inpput into lower-case * before storing them or looking them up. It also ignores characters * that are not letters and only stores words that are three letters * long. A reference pointer is used, when finding a word, the reference * pointer will point to this word, when printing it will point to the * last printed word, and when deleting it will be reset to the trie root. * * This class implements a dictionary tree in a structure called trie. * We use a representation called "Leftmost-Child-Right-Sibling" * representation. A node points to a list of its children and it points * to the leftmost child in the list, and each child in the list points * to the sibling to the right of it. We add one extra node at the end * of the list to implement a pointer back to the parent, using the child * pointer in that extra node. The root does not represent any letter, and * the last sibling on each list (that points to the parent) has a right * sibling pointer set to null. * * There is a pointer to the current position in the trie, this position * is updated when we search a word to point to the start of the found * word, when printing it points to the starting position of last printed * word, and when deleting it resets position reference to the root. Finding * and adding work in the same way, checking first if child node match letter * if not then checking with siblings, until a match is found. Printing starts * from the starting node position of the word, it goes to the end of the sibling * list and then up to the parent node, gets its letter and continues until it * get to the root. Deleting starts also from the node starting position of the * word and continuous up the tree in the same as printing, but checking on its * way if the nodes are still useful, if not it removes reference to these nodes. * The basic structure of the trie is a node that contains a specific letter of a * word, a child pointer, a sibling pointer and a boolean vallue indicating if * this is the starting position of a word. The construction of the trie is done * by reading the input file one line at a time and using the add word method. * * Input: It receives as input in the command line the name of the filename to * be used as input in constructing the trie. * Later it receives input telling what operation to perform: add, delete * or find a word, or print n number of words * Output: It prints out wether the operation requested by the user was succesful * or not, and the output requested by each command. * * Status: This class works according to program specifications */ import java.io.*; // For console input public class Trie { public Node root; // Trie root public Node current; // Current Position Inside the Trie /** Array of letters that are used in comparison operations in order to * know which letter is currently being stored into the trie */ private String[] letters = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"}; /** Stores the ocurrence of each letter read into the trie, the counter * stored at each index corresponds to the letter stored at the same index * in array letters */ private double[] letterCount = new double[26]; /** Stores the percentage of ocurrence of each letter in the trie, the * precentage stored at each index corresponds to the letter stored at the * same index in array letters */ private double[] percentageOfLetter = new double[26]; // Total number of letters read into the trie private double totalLetterCount; // Numbers used in calculating a random number to find a random letter private final double max = 100.0; private final double min = 0.0; /** * Declares an empty trie */ public Trie() { Node root = new Node(); } /** * Build the dictionary trie reading the input from a given filename * @param An array of strings with the command line arguments * @return None */ public Trie(String[] commandLine) throws IOException { root = new Node(); // Declare a new trie BufferedReader reader; // To read from a file String input; // Store line of input // Check if filename was declared and if it exists checkIfValidFilename( commandLine ); // Declare in which file the input is in reader = new BufferedReader( new FileReader ( commandLine[0] ) ); // Declare in which file the input is in //reader = new BufferedReader( new FileReader ( "words" ) ); // Read each line of the input file while ( (input = reader.readLine() ) != null) { // Add word to trie addWord(input); } // current position in trie is the root current = root; // Calculate percentage of ocurrence of letters calculatePercentages(); } /** * Description: Node * This class implement a node structure which is used as the * structural components of the trie */ public class Node{ private String letter; // Letter hold in the node boolean isAWord; // Indicates wether this node represents starting positionof a word private Node next; // Leftmost child pointer private Node sibling; // Right sibling pointer /** * Creates a new node * @param None * @return None */ Node () { letter = null; isAWord = false; next = sibling = null; } /** * Creates a new node with given letter and boolean value * @param A string for the letter hold in node * A boolean value to check if this the start of a word * @return None */ Node (String s) { letter = s; isAWord = false; next = sibling = null; } } // Check if the given letter is in the next level of the trie from the given node pointer // Returns boolean if it finds it, else returns null public Node findLetter(Node crtPosition, String letter) { Node pointer = crtPosition; // Initialize pointer for position refernce in trie // Word is not in tree if still looking for word letters and there aren't any more if (pointer.next == null) { return null; } else { // There is a child from the current node // Check if next node contains the current letter we're looking for if ( pointer.next.letter.equals(letter) ) { pointer = pointer.next; // go to next child } else { // Check if a sibling contains the same letter pointer = pointer.next; // go to start of node sibling list // Loop until a node with the same letter is found or sibling list ends while ( pointer.sibling.sibling != null && !pointer.sibling.letter.equals(letter) ) { pointer = pointer.sibling; } // If no sibling with same letter is found, word isn't in trie if ( pointer.sibling.sibling == null ) { return null; // String is not in tree } else { // A sibling has the letter we're looking for, continue pointer = pointer.sibling; // One sibling has same letter } } } return pointer; } /** * Find a given word in the given trie * @param A string to look up in the trie * The root of the trie in where to look up the word * @return A pointer to the node with the starting position (letter) of the word found * If word is not found, it returns a null pointer */ public Node findStr(String word) { Node pointer = root; // Initialize pointer for position refernce in trie String input = word; // Word to look up // Start a while lopp searching the chars of the word in trie starting backwards while (input.length() >= 1) { // Get the last letter in the word String firstChar = input.substring( 0, 1 ); input = input.substring( 1, input.length() ); // Update string minus last letter if ( !Character.isLetter(firstChar.charAt(0)) ) { // If char is not a letter skip it continue; } // Word is not in tree if still looking for word letters and there aren't any more if (pointer.next == null) { return null; } else { // There is a child from the current node // Check if next node contains the current letter we're looking for if ( pointer.next.letter.equals(firstChar) ) { pointer = pointer.next; // go to next child } else { // Check if a sibling contains the same letter pointer = pointer.next; // go to start of node sibling list // Loop until a node with the same letter is found or sibling list ends while ( pointer.sibling.sibling != null && !pointer.sibling.letter.equals(firstChar) ) { pointer = pointer.sibling; } // If no sibling with same letter is found, word isn't in trie if ( pointer.sibling.sibling == null ) { return null; // String is not in tree } else { // A sibling has the letter we're looking for, continue pointer = pointer.sibling; // One sibling has same letter } } } } // All letters from word match, check if current pointer reference is start of word if ( pointer.isAWord == true) { current = pointer; return pointer; // Return pointer to start of found word } else { // Letters matched, but it not a word in the tree return null; // String is not in tree } } /** * Print n number of words starting from a given reference in the trie * @param The root of the trie * The current reference pointer in the trie * The number of words to be printed * @return A pointer reference to the last word printed */ public Node printNextWords( int n) { Node pointer = current; if (root.next == null) { // check if Dictinary has no words System.out.println(" Trie is Empty"); return root; } System.out.println(); // Start loop to print n number of words while ( n > 0) { // Check if we are at the end of the trie or there is no child if ( pointer.next == null || pointer.next == root) { if ( pointer.next == root) { // We are at the end of the trie System.out.println(">> End of Trie <<"); pointer = root; // Reset the position reference break; // Stop printing words } else { // Pointer has no child pointer = pointer.sibling; // Goto pointer sibling // If we are at the end of the list, back up to parent node next sibling while ( pointer.sibling == null ) { if ( pointer.next == root ) { // We are at the end of the trie break; // Stop printing words } else { // Back up to parent node next sibling, list in above level pointer = pointer.next.sibling; } } if ( pointer.isAWord == true) { // If pointer is start of word, prin it printWord(pointer); n--; // Decrement words to print counter } } } else { // Child of current reference pointer is not null pointer = pointer.next; if ( pointer.isAWord == true) { // If pointer is start of word, prin it printWord(pointer); n--; // Decrement words to print counter } } } current = pointer; return pointer; // Pointer to last word printed } /** * Delete a given word from given trie * @param String to be found and deleted * The root from the given trie * @return A pointer to the next closest parent from the current nodes deleted * If word to be deleted is not found, returns a null pointer */ public Node deleteWord(String lookUpWord) { Node pointer = findStr( lookUpWord); // Check if word is in tree Node tempPointer = null; // Extra reference pointer if (pointer == null) { // Word is not in tree return null; } else { // Word is in trie, delete it if ( pointer.next != null ) { // Word starting position has children nodes pointer.isAWord = false; // No longer represents a word return pointer; } else { // Word starting position has no children nodes tempPointer = pointer; while (true) { // Loop until all parts of the word not used by others are deleted do { // Go to the parent of the sibling list node of current pointer tempPointer = tempPointer.sibling; } while ( tempPointer.sibling != null ); // Goto last sibling tempPointer = tempPointer.next; // Goto parent // Check if our node is inmmediate child & has no children or siblings if ( tempPointer.next == pointer && pointer.sibling.sibling == null) { tempPointer.next = null; // Delete reference to child if ( tempPointer == root) { // We have arrived to root, no more to delete break; } // if parent is not the start of a word, it needs to be deleted if ( tempPointer.isAWord == false ) { pointer = tempPointer; // Point now to the parent continue; } else { // parent is a word, can not delete anymore nodes break; } } else { // Current pointer has sibling nodes // check if child is pointer first in sibling list, if ( tempPointer.next == pointer ) { // update the child to be the pointer's sibling tempPointer.next = tempPointer.next.sibling; break; } tempPointer = tempPointer.next; // Pointer is not first sibling // Find node previous to our start of word node while ( tempPointer.sibling != pointer ) { tempPointer = tempPointer.sibling; } // Update reference of previous node to pointer sibling tempPointer.sibling = pointer.sibling; break; } } } } return tempPointer; } /** * Add a word to the trie * @param The root of the trie * The string to be added * @return None */ public Node addWord(String input) { Node pointer = root; // Remove characters that are not letters input = removeNonLetterChars(input); // The words in this trie have to be at least 3 characters long if (input.length() < 3 ) { return pointer; } input = input.toLowerCase(); // Convert string to lower case while (input.length() >= 1) { // Get the first letter of the word String firstChar = input.substring( 0, 1 ); // Update string minus first letter input = input.substring( 1, input.length() ); // If char is not a letter skip it if ( !Character.isLetter(firstChar.charAt(0)) ) { continue; } // Keep track of letters read into the trie countLetter(firstChar); if (pointer.next == null) { // No more child nodes, We need to add a new child Node newChild = new Node(firstChar); // Set child to be new node pointer.next = newChild; // Create the last sibling in list pointing to parent Node lastSibling = new Node(); lastSibling.sibling = null; lastSibling.next = pointer; newChild.sibling = lastSibling; pointer = newChild; // Update pointer } else { // There is still children nodes if ( pointer.next.letter.equals(firstChar) ) { // node has same letter, go to next child pointer = pointer.next; } else { // Child has not the same letter we're looking for pointer = pointer.next; // Check if there's a sibling with same letter while ( pointer.sibling.sibling != null && !pointer.sibling.letter.equals( firstChar ) ) { pointer = pointer.sibling; } // Check if we got tothe last sibling, node matching letter was not found if (pointer.sibling.sibling == null) { // Add new node at end of sibling list Node newSibling = new Node(firstChar); newSibling.sibling = pointer.sibling; pointer.sibling = newSibling; pointer = newSibling; } else { //There is a node matching the letter we're looking for // update pointer to this position pointer = pointer.sibling; } } } } if ( pointer == root ) { // All characters in word were not a letter return null; // No word will be added } pointer.isAWord = true; return pointer; } /** * Print a word that that starts at the given pointer reference * @param A node referencce pointer * @return None */ public void printWord(Node pointer) { String word = ""; // Declaration of string to hold word to print do { word = pointer.letter + word; // Store next character of word do { // Go to the end of the sibling list pointer = pointer.sibling; } while ( pointer.sibling != null ); pointer = pointer.next; // Go to the parent of the node } while (pointer.sibling != null ); // Stop when we get to the root System.out.println(word); // Print word } /** * Get a random letter contained inside the trie based on how often it * occurs in the trie * @ param None * @ return A string containing a letter */ public String getRandomLetter() { /** The method used to get arandom letter is based on the percentage of * how many times it was read into the trie. It gets a random number * between 0.0 and 100.00, it adds the percentages of letters starting * from "a" to "z" until a number equal or bigger than the random * number is found, then we take the letter where this occurred as the * random letter. */ String randomString = null; double temp = 0.0; // Get a random number between 0.0 and 100.0 double randomNumber = Math.floor (Math.random() * ( max - min + 1) ) + min; // Add the ocurrence percentages of each letter starting from "a" to "z" for (int i = 0; i < percentageOfLetter.length; i++) { // Store the percentage sum temp = percentageOfLetter[i] + temp; // Stop when the sum is equal or bigger than random number if ( temp >= randomNumber ) { randomString = letters[i]; break; } } // Return letter where the sum isequal or bigger than random number return randomString; } /** * Count the number of times aeach letter is read in the trie * @ param A string containing the letter to be counted * @ return None */ public void countLetter(String letter) { String temp; // Temporary string holder // Compare letter to array containing alphabet letters for (int i = 0; i < letters.length; i++ ) { temp = letters[i]; // If a match is found if ( temp.equals( letter ) ) { // Increase counter of that particular letter letterCount[i]++; // Increase counter of total letters read totalLetterCount++; } } } /** * Calculate the percentage of ocurrance of all letters read into trie * @ param None * @ return None */ public void calculatePercentages() { // Valculate the percentage of ocurrence of each letter for (int i = 0; i < percentageOfLetter.length; i++) { // Math formula: Percentage ox X = times X occurs * 100 / total percentageOfLetter[i] = (letterCount[i] * 100) / totalLetterCount; } } /** * Checks if a file was specified by the user in the inputline to use as input to * build the trie, and if it was declared, it checks it exists. * @param A string array containing the arguments from the command line * @return None */ public static void checkIfValidFilename( String[] commandLine) throws IOException { int numberOfArgs = commandLine.length; // Number of arguments in command line // Check if an input file was actually declared if ( numberOfArgs < 1 ) { System.out.println( " No Input File Declared." ); System.out.println( " Ending Program . . . \n" ); System.exit(0); // End program if no file was declared } try { // Check if the file actually exists BufferedReader reader = new BufferedReader( new FileReader ( commandLine[0] ) ); } catch (FileNotFoundException e) { System.out.println( " Input Filename Not Found" ); System.out.println( " Ending Program . . . \n" ); System.exit(0); // End program if no file was declared } } /** * Removes the non letter characters from the received String and returns that string * @ param A string * @ return A string with the non letter characters removed */ public String removeNonLetterChars(String word) { // Temporary string will hold string with no letter chars String tempWord = ""; // Traverse the entire string while ( word.length() >=1 ) { // Get the first letter of the word String firstChar = word.substring(0, 1); // Update string minus first letter word = word.substring(1, word.length() ); //Add character to the new word only if it is a letter if ( Character.isLetter( firstChar.charAt(0) ) ) { tempWord = tempWord + firstChar; } } // Returned string with no letter chars return tempWord; } /** * Check if number in a given string is really a number and bigger than zero * @param A string containing a numerical value * @return A boolean value, true f string is a number and bigger than zero * false otherwise */ public boolean isValidNumber(String str) { char[]c = str.toCharArray(); // An array of the chrarcter of the string // Check that every character is a number for ( int i = 0; i < str.length(); i ++ ) { if ( !Character.isDigit( c[i] ) ) { System.out.println(" Argument Is Not A Number"); printInvalidCommand(); return false; } } int number = Integer.parseInt( str ); // Check if the number is bigger than zero if ( number < 1 ) { System.out.println(" Number not valid: Smaller than zero"); printInvalidCommand(); return false; } return true; } /** * Print a invalid command message * @param None * @return None */ public void printInvalidCommand() { System.out.println("Not Valid Command. Please Try Again. "); } }
发表评论
-
java堆栈溢出 JRockit+Tomcat 实战调试
2012-07-24 10:19 27991. JRockit简介 Jrockit是Bea开发的符合J ... -
java.lang.OutOfMemoryError: unable to create new native thread
2012-02-12 16:09 3603今天系统突然收到错误日志: Feb 12, 2012 1:28 ... -
Lucene 分词解读(二)--Analyzer
2011-09-19 17:33 1380Lucene中的Analyzer 为了更好地搜索中文,在Lu ... -
Lucene写自己的Analyzer
2011-09-19 17:32 1473实现一个简单的分析器(Analyzer)的例子如下所示:] ... -
Lucene 分词解读(一)
2011-09-19 17:31 1018Lucene中的中文分词 Lucene中处理中文的常用方法有 ... -
三叉Trie树
2011-09-19 17:30 1262在一个三叉搜索树(Tern ... -
三叉Trie树
2011-09-13 16:20 6在一个三叉搜索树(Tern ... -
Lucene写自己的Analyzer
2011-09-13 15:57 44实现一个简单的分析器(Analyzer)的例子如下所示:] ... -
Lucene 分词解读(二)--Analyzer
2011-09-13 15:51 9Lucene中的Analyzer 为了更好地搜索中文,在Lu ... -
Lucene 分词解读(一)
2011-09-13 15:46 1010Lucene中的中文分词 Lucene中处理中文的常用方法有 ... -
大并发搜索下值得考虑的一种数据结构---Trie
2011-09-12 23:42 0如果要实现一个能支撑大数据量并发搜索的引擎,一般不会采用luc ... -
cannot make any changes to the index (it was opened with readOnly = true)
2011-09-10 13:13 1437在调用IndexReader.open(final Di ... -
nginx 301 重定向 包括域名、目录、文件等方法 (二)
2011-09-09 14:24 10609nginx rewrite 伪静态配置参数详细说明 正则表达 ... -
nginx 301 重定向 包括域名、目录、文件等方法 (一)
2011-09-09 14:15 1473在网站建设中需要网页 ... -
查看Oracle表空间大小的方法
2011-09-08 10:16 861Oracle表空间大小的查看方法应该是我们都需要掌握的知识,下 ...
相关推荐
Trie,又称为前缀树或字典树,是一种用于存储动态集合或关联数组的树形数据结构。它的主要特点是通过键的前缀来组织节点,从而快速进行前缀查询和模糊搜索。在Go语言中实现Trie,可以有效地支持字符串集合的增删查改...
Trie,也被称为字典树或前缀树,是一种用于存储字符串的数据结构,尤其适合进行高效的字符串查找。它的主要特点是能通过字典序的方式快速地查找具有相同前缀的字符串。Trie树的每个节点通常包含一个字符,并且有指向...
本项目聚焦于利用Java实现的双数组Trie树,这是一种在字符串处理和搜索中广泛使用的数据结构。 Trie树,又称“前缀树”或“字典树”,是一种用于存储键值对的数据结构,特别适用于字符串。它的主要特点是能以O(1)的...
而Trie树(也称前缀树)是一种用于高效存储和检索具有公共前缀的字符串的数据结构。 描述中的“一个简单而快速的Go库,用于将输入字符串模糊匹配到目标字符串列表”进一步强调了库的设计目标:简洁性和高性能。这...
2. **Trie树(前缀树)**:构建一个Trie树,通过前缀匹配的方式快速判断输入字符串是否包含敏感词。Trie树适合处理大量且有公共前缀的敏感词,查找速度快且节省空间。 3. **动态规划**:可以使用动态规划算法,如...
在IT领域,数据结构是构建高效算法的基础,而Trie(又称字典树或前缀树)作为一种特殊的数据结构,广泛应用于字符串相关的搜索和查找问题。`rust_sequence_trie`项目是一个用Rust编程语言实现的Trie数据结构库,其...
- **堆**:一种特殊的树形数据结构,满足最大堆或最小堆性质,常用于优先队列和堆排序。 - **哈希表**:通过哈希函数快速查找元素,提供O(1)的平均查找时间,用于构建索引和缓存。 - **树**:包括二叉树、平衡树...
3. Trie树(字典树):Trie树是一种高效的前缀搜索数据结构,用于存储大量关键词,通过遍历树结构快速找到与输入字符匹配的关键词。 4. 缓存策略:为了提高响应速度,搜索提示系统通常会缓存近期或高频的搜索词,...
- Trie树(也称为前缀树或字典树)是一种字符串检索数据结构,用于高效地查找具有公共前缀的字符串。 - 在Trie树中,每个节点代表一个字符串前缀,叶子节点代表完整字符串。 - 适用于关键词搜索、自动补全等功能...
1. **Trie树**:Trie树是一种字符串查找的数据结构,每个节点代表一个前缀,通过节点间的连接表示字符串的关联。在Cleo中,Trie树用于存储关键词,当用户输入部分字符时,可以迅速找到所有匹配的完整词汇。Trie树的...
LPM算法的基本思想是通过构建一种数据结构,如字典树(Trie)、AC自动机(Aho-Corasick Automaton)或者Patricia树,来快速查找给定字符串的最长前缀。在字典树中,每个节点代表一个字符串的前缀,而边则表示字符的...
综上所述,TSS算法是一种高效的包分类方法,它利用了Trie数据结构的优势,使得在大量规则下,数据包的分类速度得以显著提高。通过理解和实现TSS算法,我们可以更好地理解网络流量管理的底层机制,以及如何通过编程...
Trie树是一种字符串检索的数据结构,它通过节点之间的链接来表示前缀关系。每个节点包含一个字符,从根节点到任意节点的路径上经过的字符组合构成了一个字符串。在Go-go-wordsfilter中,每个敏感词都会被插入到这个...
**数据结构选择**:考虑到空间和时间效率,这里建议采用**Trie树**(前缀树)来存储字典中的所有单词。Trie树能够有效地支持快速查询以及插入操作。 2. **查询流程**: - 将字典中的每个单词插入到Trie树中。 - ...
在IT领域,自动完成系统是一种常见且至关重要的功能,它被广泛应用于各种场景,如电话簿搜索、字典查询、IP路由、最长前缀匹配以及命令行自动完成等。本项目"Auto-Completion-System-design"专注于这些领域的系统...
最长前缀匹配(Longest Prefix Match, LPM)是路由查找中的核心算法,它在IP网络中用于确定数据包应被转发到哪个下一跳,基于其目的地IP地址的最长匹配前缀。这个存储库,"LPM-Algorithms-Linux-XIA",包含了多种...
它使用了先进的数据结构和算法,如B树和Trie树,确保了数据操作的高效性和一致性。 Go-MemDB的主要组件包括: 1. 数据表(Table):这是存储数据的基本单元,类似于传统数据库的表格。每个数据表都有自己的索引,...
5. **性能优化**:为了提高效率,可以考虑使用多线程处理大文本,或者使用并发数据结构如ConcurrentHashMap来存储敏感词库。 6. **异常处理**:在读取文件或处理用户输入时,应添加适当的错误处理机制,以应对可能...
5. `unittest_mtrie.exe`:这可能是针对M-trie数据结构的单元测试,M-trie是一种优化的前缀树,用于快速查找和匹配前缀相同的元素,例如在网络地址中。 6. `unittest_udp_address.exe`:测试UDP地址处理的单元测试...
这些数据结构能够快速地查找和匹配前缀,以生成实时的搜索建议。 2. **模糊匹配**:为了提供更准确的建议,Cleo可能实现了模糊匹配算法,如Levenshtein距离或Jaccard相似度,允许用户在输入不完全准确的情况下也能...