- 浏览: 753256 次
- 性别:
- 来自: 杭州
文章分类
最新评论
-
lgh1992314:
a offset: 26b offset: 24c offse ...
java jvm字节占用空间分析 -
ls0609:
语音实现在线听书http://blog.csdn.net/ls ...
Android 语音输入API使用 -
wangli61289:
http://viralpatel-net-tutorials ...
Android 语音输入API使用 -
zxjlwt:
学习了素人派http://surenpi.com
velocity宏加载顺序 -
tt5753:
谢啦........
Lucene的IndexWriter初始化时的LockObtainFailedException的解决方法
字典树查找,Trie,又称字典树、单词查找树,是一种树形结构,用于保存大量的字符串。它的优点是:利用字符串的公共前缀来节约存储空间。
package com.jwetherell.algorithms.data_structures; import java.util.Arrays; /** * A trie, or prefix tree, is an ordered tree data structure that is used to * store an associative array where the keys are usually strings. * * == This is NOT a compact Trie. == * * http://en.wikipedia.org/wiki/Trie * * @author Justin Wetherell <phishman3579@gmail.com> */ public class Trie<C extends CharSequence> { private int size = 0; protected INodeCreator creator = null; protected Node root = null; /** * Default constructor. */ public Trie() { } /** * Constructor with external Node creator. */ public Trie(INodeCreator creator) { this.creator = creator; } /** * Create a new node for sequence. * * @param parent node of the new node. * @param character which represents this node. * @param isWord signifies if the node represents a word. * @return Node which was created. */ protected Node createNewNode(Node parent, Character character, boolean isWord) { return (new Node(parent, character, isWord)); } /** * Add sequence to trie. * * @param seq to add to the trie. * @return True if sequence is added to trie or false if it already exists. */ public boolean add(C seq) { return (this.addSequence(seq)!=null); } /** * Add sequence to trie. * * @param seq to add to the trie. * @return Node which was added to trie or null if it already exists. */ protected Node addSequence(C seq) { if (root==null) { if (this.creator==null) root = createNewNode(null, null, false); else root = this.creator.createNewNode(null, null, false); } int length = (seq.length() - 1); Node prev = root; //For each Character in the input, we'll either go to an already define // child or create a child if one does not exist for (int i = 0; i < length; i++) { Node n = null; Character c = seq.charAt(i); int index = prev.childIndex(c); //If 'prev' has a child which starts with Character c if (index >= 0) { //Go to the child n = prev.getChild(index); } else { //Create a new child for the character if (this.creator==null) n = createNewNode(prev, c, false); else n = this.creator.createNewNode(prev, c, false); prev.addChild(n); } prev = n; } //Deal with the first character of the input string not found in the trie Node n = null; Character c = seq.charAt(length); int index = prev.childIndex(c); //If 'prev' already contains a child with the last Character if (index >= 0) { n = prev.getChild(index); //If the node doesn't represent a string already if (n.isWord == false) { //Set the string to equal the full input string n.character = c; n.isWord = true; size++; return n; } else { //String already exists in Trie return null; } } else { //Create a new node for the input string if (this.creator==null) n = createNewNode(prev, c, true); else n = this.creator.createNewNode(prev, c, true); prev.addChild(n); size++; return n; } } /** * Remove sequence from the trie. * * @param sequence to remove from the trie. * @return True if sequence was remove or false if sequence is not found. */ public boolean remove(C sequence) { if (root == null) return false; //Find the key in the Trie Node previous = null; Node node = root; int length = (sequence.length() - 1); for (int i = 0; i <= length; i++) { char c = sequence.charAt(i); int index = node.childIndex(c); if (index >= 0) { previous = node; node = node.getChild(index); } else { return false; } } if (node.childrenSize > 0) { //The node which contains the input string and has children, just NULL out the string node.isWord = false; } else { //The node which contains the input string does NOT have children int index = previous.childIndex(node.character); //Remove node from previous node previous.removeChild(index); //Go back up the trie removing nodes until you find a node which represents a string while (previous != null && previous.isWord==false && previous.childrenSize == 0) { if (previous.parent != null) { int idx = previous.parent.childIndex(previous.character); if (idx >= 0) previous.parent.removeChild(idx); } previous = previous.parent; } } size--; return true; } /** * Get node which represents the sequence in the trie. * * @param seq to find a node for. * @return Node which represents the sequence or NULL if not found. */ protected Node getNode(C seq) { if (root == null) return null; //Find the string in the trie Node n = root; int length = (seq.length() - 1); for (int i = 0; i <= length; i++) { char c = seq.charAt(i); int index = n.childIndex(c); if (index >= 0) { n = n.getChild(index); } else { //string does not exist in trie return null; } } return n; } /** * Does the trie contain the sequence. * * @param seq to locate in the trie. * @return True if sequence is in the trie. */ public boolean contains(C seq) { Node n = this.getNode(seq); if (n==null || !n.isWord) return false; //If the node found in the trie does not have it's string // field defined then input string was not found return n.isWord; } /** * Number of sequences in the trie. * * @return number of sequences in the trie. */ public int size() { return size; } /** * {@inheritDoc} */ @Override public String toString() { return TriePrinter.getString(this); } protected static class Node { private static final int MINIMUM_SIZE = 2; protected Node[] children = new Node[MINIMUM_SIZE]; protected int childrenSize = 0; protected Node parent = null; protected boolean isWord = false; //Signifies this node represents a word protected Character character = null; //First character that is different the parent's string protected Node(Node parent, Character character, boolean isWord) { this.parent = parent; this.character = character; this.isWord = isWord; } protected void addChild(Node node) { if (childrenSize>=children.length) { children = Arrays.copyOf(children, ((children.length*3)/2)+1); } children[childrenSize++] = node; } protected boolean removeChild(int index) { if (index>=childrenSize) return false; children[index] = null; childrenSize--; System.arraycopy(children, index+1, children, index, childrenSize-index); if (childrenSize>=MINIMUM_SIZE && childrenSize<children.length/2) { children = Arrays.copyOf(children, childrenSize); } return true; } protected int childIndex(Character character) { for (int i = 0; i < childrenSize; i++) { Node c = children[i]; if (c.character.equals(character)) return i; } return Integer.MIN_VALUE; } protected Node getChild(int index) { if (index>=childrenSize) return null; return children[index]; } protected int getChildrenSize() { return childrenSize; } /** * {@inheritDoc} */ @Override public String toString() { StringBuilder builder = new StringBuilder(); if (isWord == true) builder.append("Node=").append(isWord).append("\n"); for (int i=0; i<childrenSize; i++) { Node c = children[i]; builder.append(c.toString()); } return builder.toString(); } } protected static interface INodeCreator { /** * Create a new node for sequence. * * @param parent node of the new node. * @param character which represents this node. * @param isWord signifies if the node represents a word. * @return Node which was created. */ public Node createNewNode(Node parent, Character character, boolean type); } protected static class TriePrinter { public static <C extends CharSequence> void print(Trie<C> trie) { System.out.println(getString(trie)); } public static <C extends CharSequence> String getString(Trie<C> tree) { return getString(tree.root, "", null, true); } protected static <C extends CharSequence> String getString(Node node, String prefix, String previousString, boolean isTail) { StringBuilder builder = new StringBuilder(); String string = null; if (node.character!=null) { String temp = String.valueOf(node.character); if (previousString!=null) string = previousString + temp; else string = temp; } builder.append(prefix + (isTail ? "└── " : "├── ") + ((node.isWord == true) ? ("(" + node.character + ") " + string) : node.character) + "\n"); if (node.children != null) { for (int i = 0; i < node.childrenSize - 1; i++) { builder.append(getString(node.children[i], prefix + (isTail ? " " : "│ "), string, false)); } if (node.childrenSize >= 1) { builder.append(getString(node.children[node.childrenSize - 1], prefix + (isTail ? " " : "│ "), string, true)); } } return builder.toString(); } } }
测试代码
private static boolean testTrie() { { long count = 0; long addTime = 0L; long removeTime = 0L; long beforeAddTime = 0L; long afterAddTime = 0L; long beforeRemoveTime = 0L; long afterRemoveTime = 0L; long memory = 0L; long beforeMemory = 0L; long afterMemory = 0L; //Trie. if (debug>1) System.out.println("Trie."); testNames[testIndex] = "Trie"; count++; if (debugMemory) beforeMemory = DataStructures.getMemoryUse(); if (debugTime) beforeAddTime = System.currentTimeMillis(); Trie<String> trie = new Trie<String>(); for (int i=0; i<unsorted.length; i++) { int item = unsorted[i]; String string = String.valueOf(item); trie.add(string); if (validateStructure && !(trie.size()==i+1)) { System.err.println("YIKES!! "+item+" caused a size mismatch."); handleError(trie); return false; } if (validateContents && !trie.contains(string)) { System.err.println("YIKES!! "+string+" doesn't exist."); handleError(trie); return false; } } if (debugTime) { afterAddTime = System.currentTimeMillis(); addTime += afterAddTime-beforeAddTime; if (debug>0) System.out.println("Trie add time = "+addTime/count+" ms"); } if (debugMemory) { afterMemory = DataStructures.getMemoryUse(); memory += afterMemory-beforeMemory; if (debug>0) System.out.println("Trie memory use = "+(memory/count)+" bytes"); } String invalid = INVALID.toString(); boolean contains = trie.contains(invalid); boolean removed = trie.remove(invalid); if (contains || removed) { System.err.println("Trie invalidity check. contains="+contains+" removed="+removed); return false; } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed); if (debug>1) System.out.println(trie.toString()); long lookupTime = 0L; long beforeLookupTime = 0L; long afterLookupTime = 0L; if (debugTime) beforeLookupTime = System.currentTimeMillis(); for (int item : unsorted) { String string = String.valueOf(item); trie.contains(string); } if (debugTime) { afterLookupTime = System.currentTimeMillis(); lookupTime += afterLookupTime-beforeLookupTime; if (debug>0) System.out.println("Trie lookup time = "+lookupTime/count+" ms"); } if (debugTime) beforeRemoveTime = System.currentTimeMillis(); for (int i=0; i<unsorted.length; i++) { int item = unsorted[i]; String string = String.valueOf(item); trie.remove(string); if (validateStructure && !(trie.size()==unsorted.length-(i+1))) { System.err.println("YIKES!! "+item+" caused a size mismatch."); handleError(trie); return false; } if (validateContents && trie.contains(string)) { System.err.println("YIKES!! "+string+" still exists."); handleError(trie); return false; } } if (debugTime) { afterRemoveTime = System.currentTimeMillis(); removeTime += afterRemoveTime-beforeRemoveTime; if (debug>0) System.out.println("Trie remove time = "+removeTime/count+" ms"); } contains = trie.contains(invalid); removed = trie.remove(invalid); if (contains || removed) { System.err.println("Trie invalidity check. contains="+contains+" removed="+removed); return false; } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed); count++; if (debugMemory) beforeMemory = DataStructures.getMemoryUse(); if (debugTime) beforeAddTime = System.currentTimeMillis(); for (int i=unsorted.length-1; i>=0; i--) { int item = unsorted[i]; String string = String.valueOf(item); trie.add(string); if (validateStructure && !(trie.size()==unsorted.length-i)) { System.err.println("YIKES!! "+item+" caused a size mismatch."); handleError(trie); return false; } if (validateContents && !trie.contains(string)) { System.err.println("YIKES!! "+string+" doesn't exists."); handleError(trie); return false; } } if (debugTime) { afterAddTime = System.currentTimeMillis(); addTime += afterAddTime-beforeAddTime; if (debug>0) System.out.println("Trie add time = "+addTime/count+" ms"); } if (debugMemory) { afterMemory = DataStructures.getMemoryUse(); memory += afterMemory-beforeMemory; if (debug>0) System.out.println("Trie memory use = "+(memory/count)+" bytes"); } contains = trie.contains(invalid); removed = trie.remove(invalid); if (contains || removed) { System.err.println("Trie invalidity check. contains="+contains+" removed="+removed); return false; } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed); if (debug>1) System.out.println(trie.toString()); lookupTime = 0L; beforeLookupTime = 0L; afterLookupTime = 0L; if (debugTime) beforeLookupTime = System.currentTimeMillis(); for (int item : unsorted) { String string = String.valueOf(item); trie.contains(string); } if (debugTime) { afterLookupTime = System.currentTimeMillis(); lookupTime += afterLookupTime-beforeLookupTime; if (debug>0) System.out.println("Trie lookup time = "+lookupTime/count+" ms"); } if (debugTime) beforeRemoveTime = System.currentTimeMillis(); for (int i=0; i<unsorted.length; i++) { int item = unsorted[i]; String string = String.valueOf(item); trie.remove(string); if (validateStructure && !(trie.size()==unsorted.length-(i+1))) { System.err.println("YIKES!! "+item+" caused a size mismatch."); handleError(trie); return false; } if (validateContents && trie.contains(string)) { System.err.println("YIKES!! "+string+" still exists."); handleError(trie); return false; } } if (debugTime) { afterRemoveTime = System.currentTimeMillis(); removeTime += afterRemoveTime-beforeRemoveTime; if (debug>0) System.out.println("Trie remove time = "+removeTime/count+" ms"); } contains = trie.contains(invalid); removed = trie.remove(invalid); if (contains || removed) { System.err.println("Trie invalidity check. contains="+contains+" removed="+removed); return false; } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed); //sorted long addSortedTime = 0L; long removeSortedTime = 0L; long beforeAddSortedTime = 0L; long afterAddSortedTime = 0L; long beforeRemoveSortedTime = 0L; long afterRemoveSortedTime = 0L; if (debugMemory) beforeMemory = DataStructures.getMemoryUse(); if (debugTime) beforeAddSortedTime = System.currentTimeMillis(); for (int i=0; i<sorted.length; i++) { int item = sorted[i]; String string = String.valueOf(item); trie.add(string); if (validateStructure && !(trie.size()==(i+1))) { System.err.println("YIKES!! "+item+" caused a size mismatch."); handleError(trie); return false; } if (validateContents && !trie.contains(string)) { System.err.println("YIKES!! "+item+" doesn't exist."); handleError(trie); return false; } } if (debugTime) { afterAddSortedTime = System.currentTimeMillis(); addSortedTime += afterAddSortedTime-beforeAddSortedTime; if (debug>0) System.out.println("Trie add time = "+addSortedTime+" ms"); } if (debugMemory) { afterMemory = DataStructures.getMemoryUse(); memory += afterMemory-beforeMemory; if (debug>0) System.out.println("Trie memory use = "+(memory/(count+1))+" bytes"); } contains = trie.contains(invalid); removed = trie.remove(invalid); if (contains || removed) { System.err.println("Trie invalidity check. contains="+contains+" removed="+removed); return false; } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed); if (debug>1) System.out.println(trie.toString()); lookupTime = 0L; beforeLookupTime = 0L; afterLookupTime = 0L; if (debugTime) beforeLookupTime = System.currentTimeMillis(); for (int item : sorted) { String string = String.valueOf(item); trie.contains(string); } if (debugTime) { afterLookupTime = System.currentTimeMillis(); lookupTime += afterLookupTime-beforeLookupTime; if (debug>0) System.out.println("Trie lookup time = "+lookupTime/(count+1)+" ms"); } if (debugTime) beforeRemoveSortedTime = System.currentTimeMillis(); for (int i=sorted.length-1; i>=0; i--) { int item = sorted[i]; String string = String.valueOf(item); trie.remove(string); if (validateStructure && !(trie.size()==i)) { System.err.println("YIKES!! "+item+" caused a size mismatch."); handleError(trie); return false; } if (validateContents && trie.contains(string)) { System.err.println("YIKES!! "+item+" still exists."); handleError(trie); return false; } } if (debugTime) { afterRemoveSortedTime = System.currentTimeMillis(); removeSortedTime += afterRemoveSortedTime-beforeRemoveSortedTime; if (debug>0) System.out.println("Trie remove time = "+removeSortedTime+" ms"); } contains = trie.contains(invalid); removed = trie.remove(invalid); if (contains || removed) { System.err.println("Trie invalidity check. contains="+contains+" removed="+removed); return false; } else System.out.println("Trie invalidity check. contains="+contains+" removed="+removed); if (testResults[testIndex]==null) testResults[testIndex] = new long[6]; testResults[testIndex][0]+=addTime/count; testResults[testIndex][1]+=removeTime/count; testResults[testIndex][2]+=addSortedTime; testResults[testIndex][3]+=removeSortedTime; testResults[testIndex][4]+=lookupTime/(count+1); testResults[testIndex++][5]+=memory/(count+1); if (debug>1) System.out.println(); } return true; }
发表评论
-
对字符串进行验证之前先进行规范化
2013-09-17 23:18 13947对字符串进行验证之前先进行规范化 应用系统中经常对字 ... -
使用telnet连接到基于spring的应用上执行容器中的bean的任意方法
2013-08-08 09:17 1471使用telnet连接到基于spring的应用上执行容器中 ... -
jdk7和8的一些新特性介绍
2013-07-06 16:07 10110更多ppt内容请查看:htt ... -
java对于接口和抽象类的代理实现,不需要有具体实现类
2013-06-12 09:50 2952原文链接:http://www.javaarch.net/j ... -
Java EE 7中对WebSocket 1.0的支持
2013-06-05 09:27 3837原文链接:http://www.javaarch.n ... -
Java Web使用swfobject调用flex图表
2013-05-28 19:05 1121Java Web使用swfobject调用 ... -
spring使用PropertyPlaceholderConfigurer扩展来满足不同环境的参数配置
2013-05-21 15:57 3332spring使用PropertyPlaceholderCon ... -
java国际化
2013-05-20 20:57 4473java国际化 本文来自:http://www.j ... -
RSS feeds with Java
2013-05-20 20:52 1214RSS feeds with Java 原文来自:htt ... -
使用ibatis将数据库从oracle迁移到mysql的几个修改点
2013-04-29 10:40 1674我们项目在公司的大战略下需要从oracle ... -
线上机器jvm dump分析脚本
2013-04-19 10:48 2905#!/bin/sh DUMP_PIDS=`p ... -
eclipse远程部署,静态文件实时同步插件
2013-04-06 20:18 5460eclipse 远程文件实时同步,eclipse远程 ... -
java价格处理的一个问题
2013-03-26 21:21 1829我们经常会处理一些价格,比如从运营上传的文件中将某 ... -
java 服务降级开关设计思路
2013-03-23 16:35 3764java 服务屏蔽开关系统,可以手工降级服务,关闭服 ... -
poi解析excel内存溢出
2013-03-20 22:21 6397真是悲剧啊,一个破内部使用系统20多个人使用的后 ... -
简单web安全框架
2013-03-16 11:56 1543web安全框架,主要用servlet filter方 ... -
基于servlet的简单的页面缓存框架
2013-03-11 19:27 1216基于servlet的页面级缓存框架的基本用法: 代码参考: ... -
Eclipse使用过程中出现java.lang.NoClassDefFoundError的解决方案
2013-02-01 17:22 1572如果jdk,classpath设置正确,突然在eclipse ... -
jetty对于包的加载顺序的处理
2013-01-28 22:58 41301.问题 今天在本地和测试环境用jet ... -
hsqldb源码分析系列6之事务处理
2013-01-20 15:20 1706在session的 public Result ...
相关推荐
我们将深入探讨Trie树数据结构以及如何使用Java来构建这样的系统。 首先,我们要理解什么是Trie树,也被称为前缀树或字典树。Trie树是一种特殊的树形数据结构,用于存储一个字符串集合。它的特点是每个节点代表一个...
在Java中实现Trie树,首先定义节点类`TrieNode`,包括指向26个子节点的数组(表示26个英文字母),字符数据成员`data`,以及词频计数器`freq`。例如: ```java public class TrieNode { public TrieNode[] ...
本项目聚焦于利用Java实现的双数组Trie树,这是一种在字符串处理和搜索中广泛使用的数据结构。 Trie树,又称“前缀树”或“字典树”,是一种用于存储键值对的数据结构,特别适用于字符串。它的主要特点是能以O(1)的...
一棵用List来存储子结点的字典树——当然,也可以用哈希表等形式存储。 这篇笔记记录了构建思路,文末是源码 一、构建思路 Step1 设计结点——数据结构 Step2 实现相应的操作方法——增删改查 Step1 设计结点 我们...
总的来说,Java实现的字典树TrieTree是一个强大的工具,尤其适用于处理大量字符串数据,如四六级高频词汇的存储和查询。通过理解和运用这种数据结构,我们可以提高算法效率,优化应用程序的性能。
基于Java链表实现的字典树(trie),实现了增删改查等功能,它说摘要必须大于50字我还能说啥啥啥啥
Trie,又称为字典树或前缀树,是一种用于存储关联数组的数据结构,它允许我们高效地进行字符串的查找、插入和删除操作。在Java中实现Trie数据结构,可以帮助我们快速处理大量字符串数据,例如在搜索引擎中进行关键词...
### 关于Trie树 #### 一、Trie树简介 Trie树,又称为前缀树或字典树,是一种用于存储字符串的树形数据结构。Trie树的优势在于可以高效地支持字符串的查找操作,尤其是在处理大量词汇时表现尤为突出。在搜索引擎、...
在IT领域,字典树(Trie,也称为前缀树或字首树)是一种用于存储动态集合或关联数组的数据结构。它以高效的方式处理字符串查询,尤其适用于查找具有共同前缀的单词。字典树的主要优点是它可以快速地通过共享前缀来...
在Java中,Trie树的节点通常由一个枚举类型定义节点种类(如内部节点和叶子节点),并使用一个字符数组来存储指向子节点的指针。以下是一个简化的TrieNode类的示例: ```java enum NodeKind {LN, BN} class ...
可能包含了Trie树的插入、查找、删除等操作的递归版本。 8. **理解递归** - 在学习递归时,重要的是理解递归函数如何工作,包括它的基本情况、递归情况以及如何返回结果。 - 实践是掌握递归的关键,编写递归程序...
总结来说,这个Java敏感词过滤源码是基于DFA和Trie树算法实现的,可以高效地在文本中检测和处理敏感词汇。对于需要在Java环境中进行文本审核或内容过滤的开发者来说,这是一个非常有价值的资源。通过深入理解和修改...
在Java编程中,"制作树TREE"通常是指创建和操作数据结构中的树。树是一种非线性数据结构,由节点...如果需要更深入地学习,可以进一步研究各种特定类型的树,如堆、B树、Trie树等,以及它们在算法和数据处理中的应用。
在这个Java程序中,我们可能会使用Trie树或者KMP算法,因为它们对于大量关键词的查找效率较高。Trie树能够快速定位并匹配关键词,而KMP算法则适用于处理单个字符串的模式匹配问题。 1. **Trie树**:Trie树是一种...
DoubleArrayTrie Java编写的DoubleArrayTrie介绍用法// construct and buildDoubleArrayTrie dat = new DoubleArrayTrie(); for(String word: words) { dat.Insert(word); } System.out.println(dat.Base.length); ...
总结来说,Java实现文章汉字关键词(违禁词)识别需要结合多种数据结构和算法,如哈希表、Trie树、分词库和过滤算法。通过合理的设计和优化,我们可以构建出高效、准确的违禁词检测系统,满足内容审核的需求。
在IT领域,字典树(Trie,也称为前缀树或字典树)是一种用于存储动态集合或关联数组的数据结构。它允许我们快速查找、插入和删除字符串,特别是对于有公共前缀的字符串,效率非常高。这个实例是用Java语言实现的,...
Java字典树实现。 精简优化版处理。适合Java屏蔽字并进行。进行检测并处理的情况
- Trie树(字典树):用于快速查找字符串,常用于自动补全功能。 - B树和B+树:常用于数据库和文件系统的索引结构,能高效地处理大数据量的存储和检索。 4. **Java集合框架中的树结构**: - `java.util.TreeSet`...