Trie树(单词查找树或键树)

Dev|il

浏览: 126888 次
性别:
来自: 成都

最近访客更多访客>>

temfulX

vision_xie

xyzstyle

hvgoei

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

数据结构
树

Trie树

Trie树，又称单词查找树或键树，是一种树形结构，是一种哈希树的变种。典型应用是用于统计和排序大量的字符串（但不仅限于字符串），所以经常被搜索引擎系统用于文本词频统计。它的优点是：最大限度地减少无谓的字符串比较，查询效率比哈希表高。
Trie是一颗存储多个字符串的树。相邻节点间的边代表一个字符，这样树的每条分支代表一则子串，而树的叶节点则代表完整的字符串。和普通树不同的地方是，相同的字符串前缀共享同一条分支。还是例子最清楚。给出一组单词，inn, int, at, age, adv, ant, 我们可以得到下面的Trie：

tire树的查找时间复杂度为O(n)
实现代码(借鉴了某位博主的代码):

#include <iostream>
using namespace std;

const int branch = 26; //定义分支

typedef struct Trie_Node{
	bool isStr; //判断此处是否构成一个串
	struct Trie_Node *next[branch]; //指向各个指数的指针
	Trie_Node(): isStr(false)
	{
		memset(next, NULL, sizeof(next)); //对各个分支初始化
	}
}*pTrie_Node;


class Trie{
private:
	pTrie_Node root; //根
public:
	Trie();
	void insert(const char *word);
	bool search(char *word);
	void destory(pTrie_Node root);
};

Trie::Trie()
{
	this->root = new Trie_Node();
}
//插入结点
void Trie::insert(const char *word)
{
	pTrie_Node loc = this->root;
	while(*word)
	{
		if(loc->next[*word - 'a'] == NULL) //不存在则建立
		{
			pTrie_Node tmp = new Trie_Node();
			loc->next[*word - 'a'] = tmp;
		}
		loc = loc->next[*word - 'a'];
		word++;
	}
	loc->isStr = true; //标识此处有个字符串
}

bool Trie::search(char *word)
{
	pTrie_Node loc = this->root;
	while(*word && loc)
	{
		loc = loc->next[*word - 'a'];
		word++;
	}
	return (loc != NULL && loc->isStr);
}

void Trie::destory(pTrie_Node cur)
{
	for(int i = 0; i < branch; i++)
		if(cur->next[i] != NULL)
			destory(cur->next[i]);
	delete cur;
}

int main()
{
	Trie t;
	t.insert("a");
	t.insert("abc");
	if(t.search("abcd"))
		cout<<"存在"<<endl;
	return 0;
}

下面是tire树的题目:
http://acm.hdu.edu.cn/showproblem.php?pid=1251

统计难题

Time Limit: 4000/2000 MS (Java/Others) Memory Limit: 131070/65535 K (Java/Others)
Total Submission(s): 7502 Accepted Submission(s): 2902

Problem Description
Ignatius最近遇到一个难题,老师交给他很多单词(只有小写字母组成,不会有重复的单词出现),现在老师要他统计出以某个字符串为前缀的单词数量(单词本身也是自己的前缀).

Input
输入数据的第一部分是一张单词表,每行一个单词,单词的长度不超过10,它们代表的是老师交给Ignatius统计的单词,一个空行代表单词表的结束.第二部分是一连串的提问,每行一个提问,每个提问都是一个字符串.

注意:本题只有一组测试数据,处理到文件结束.

Output
对于每个提问,给出以该字符串为前缀的单词的数量.

Sample Input
banana
band
bee
absolute
acm

ba
b
band
abc

Sample Output
2
3
1
0

AC代码:

#include <iostream>
using namespace std;

const int branch = 26;

typedef struct trie_Node{
	bool isStr;
	struct trie_Node *next[branch];
	trie_Node():isStr(false)
	{
		memset(next, NULL, sizeof(next));
	}
}*pTrie_Node;

class Trie{
private:
	pTrie_Node root;
public:
	Trie()
	{
		root = new trie_Node();
	}
	void insert(const char *word);
	int search(const char *prefix);
	void find(pTrie_Node cur, int &cnt);
};

void Trie::insert(const char *word)
{
	pTrie_Node loc = root;
	while(*word)
	{
		if(loc->next[*word - 'a'] == NULL)
		{
			pTrie_Node tmp = new trie_Node();
			loc->next[*word - 'a'] = tmp;
		}
		loc = loc->next[*word - 'a'];
		word++;
	}
	loc->isStr = true;
}

int Trie::search(const char *prefix)
{
	pTrie_Node loc = root;
	int cnt = 0;
	while(*prefix && loc)
	{
		loc = loc->next[*prefix - 'a'];
		prefix++;
	}
	if(loc != NULL)
		find(loc, cnt);
	return cnt;
}

void Trie::find(pTrie_Node cur, int &cnt)
{
	if(cur->isStr)
		cnt++;
	for(int i = 0; i < branch; i++)
	{
		if(cur->next[i] != NULL)
			find(cur->next[i], cnt);
	}
}
int main()
{
	char str[15];
	Trie t;
	while(cin.getline(str, 11))
	{
		if(str[0] == '\0')
			break;
		t.insert(str);
	}
	while(cin>>str)
	{
		cout<<t.search(str)<<endl;
	}
	return 0;
}

http://acm.hdu.edu.cn/showproblem.php?pid=2222

Keywords Search

Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 11008 Accepted Submission(s): 3825

Problem Description
In the modern time, Search engine came into the life of everybody like Google, Baidu, etc.
Wiskey also wants to bring this feature to his image retrieval system.
Every image have a long description, when users type some keywords to find the image, the system will match the keywords with description of image and show the image which the most keywords be matched.
To simplify the problem, giving you a description of image, and some keywords, you should tell me how many keywords will be match.

Input
First line will contain one integer means how many cases will follow by.
Each case will contain two integers N means the number of keywords and N keywords follow. (N <= 10000)
Each keyword will only contains characters 'a'-'z', and the length will be not longer than 50.
The last line is the description, and the length will be not longer than 1000000.

Output
Print how many keywords are contained in the description.

Sample Input
1
5
she
he
say
shr
her
yasherhs

Sample Output
3

Author
Wiskey

题意:输出，以上字符串序列在目标串中匹配的个数

#include <iostream>
using namespace std;

const int branch = 26;
const int _N = 10005;


char desc[1000005];
char keyword[55];
/*注意两点:
第一组数据:
abc
abcabcabcabcabc
输入1
第二组数据
2
a
a
a
上面这组数据是输出2，不是1
*/
typedef struct trie_Node{
	bool isStr;
	bool isMatch; //记录是否匹配过
	int cnt; //记录单词的个数
	struct trie_Node *next[branch];
	trie_Node():isStr(false),isMatch(false), cnt(0)
	{
		memset(next, NULL, sizeof(next));
	}
}*pTrie_Node;

class Trie{
private:
	pTrie_Node root;
public:
	Trie()
	{
		root = new trie_Node();
	}
	void insert(const char *word);
	int search(const char *keyword);
};

void Trie::insert(const char *word)
{
	pTrie_Node loc = root;
	while(*word)
	{
		if(loc->next[*word - 'a'] == NULL)
		{
			pTrie_Node tmp = new trie_Node();
			loc->next[*word - 'a'] = tmp;
		}
		loc = loc->next[*word - 'a'];
		word++;
	}
	loc->cnt++;
	loc->isStr = true;
}

int Trie::search(const char *keyword)
{
	int cnt = 0;
	pTrie_Node loc = root;
	while(*keyword && loc)
	{
		loc = loc->next[*keyword - 'a'];
		if(loc != NULL && loc->isStr && !loc->isMatch)
		{
			cnt += loc->cnt;
			loc->isMatch = true;
		}
		keyword++;
	}
	return cnt;
}


int main()
{
	int cas, n, i, cnt, len;
	cin>>cas;
	while(cas--)
	{
		cin>>n;
		Trie t;
		for(i = 0; i < n; i++)
		{
			cin>>keyword;
			t.insert(keyword);
		}
		cin>>desc;
		len = strlen(desc);
		cnt = 0;
		for(i = 0; i < len; i++)
			if(cnt < n)
				cnt += t.search(&desc[i]);
			else
				break;
		cout<<cnt<<endl;
	}
	return 0;
}

AC自动机代码:

#include <iostream>
#include <queue>
using namespace std;

const int branch = 26;
char desc[1000005];
char keyword[55];

typedef struct trie_node{
	bool isword;
	int words;
	struct trie_node *fail;
	struct trie_node *next[branch];
	trie_node():isword(false), fail(NULL), words(0)
	{
		memset(next, NULL, sizeof(next));
	}
}*pTrie_Node;

class ACTrie{
private:
	pTrie_Node root;
public:
	ACTrie()
	{
		root = new trie_node();
	}
	pTrie_Node getRoot()
	{
		return root;
	}
	void insert(const char *word);
	void build_ac_automation();
	int search(const char *desc);
	void destory(pTrie_Node);
};

void ACTrie::insert(const char *word)
{
	pTrie_Node loc = root;
	while(*word && loc)
	{
		if(loc->next[*word - 'a'] == NULL)
			loc->next[*word - 'a'] = new trie_node();
		loc = loc->next[*word - 'a'];
		word++;
	}
	loc->words++;
	loc->isword = true;
}

void ACTrie::build_ac_automation()
{
	int i;
	queue<pTrie_Node> q;
	pTrie_Node p, temp;
	q.push(root);
	while(!q.empty())
	{
		temp = q.front();
		q.pop();
		for(i = 0; i < branch; i++)
			if(temp->next[i])
			{
				if(temp == root)
					temp->next[i]->fail = root;
				else
				{
					p = temp->fail;
					while(p)
					{
						if(p->next[i] != NULL)
						{
							temp->next[i]->fail = p->next[i];
							break;
						}
						p = p->fail;
					}
					if(!p)
						temp->next[i]->fail = root;
				}
				q.push(temp->next[i]);
			}
	}
}

int ACTrie::search(const char *desc)
{
	pTrie_Node loc = root;
	int index, cnt = 0;
	while(*desc)
	{
		index = *desc - 'a';
		while(!loc->next[index] && loc != root)
			loc = loc->fail;
		loc = loc->next[index];
		loc = !loc ? root : loc;
		pTrie_Node temp = loc;
		while(temp != root && temp->isword)
		{
			cnt += temp->words;
			temp->isword = false;
			temp = temp->fail;
		}
		desc++;
	}
	return cnt;
}
void ACTrie::destory(pTrie_Node cur)
{
	for(int i = 0; i < branch; i++)
		if(cur->next[i] != NULL)
			destory(cur->next[i]);
	delete cur;
}
int main()
{
	int cas, n, i;
	cin>>cas;
	while(cas--)
	{
		ACTrie t;
		cin>>n;
		for(i = 0; i < n; i++)
		{
			cin>>keyword;
			t.insert(keyword);
		}
		t.build_ac_automation();
		cin>>desc;
		cout<<t.search(desc)<<endl;
		t.destory(t.getRoot());
	}
	return 0;
}

http://acm.hdu.edu.cn/showproblem.php?pid=1075

What Are You Talking About

Time Limit: 10000/5000 MS (Java/Others) Memory Limit: 102400/204800 K (Java/Others)
Total Submission(s): 5008 Accepted Submission(s): 1500

Problem Description
Ignatius is so lucky that he met a Martian yesterday. But he didn't know the language the Martians use. The Martian gives him a history book of Mars and a dictionary when it leaves. Now Ignatius want to translate the history book into English. Can you help him?

Input
The problem has only one test case, the test case consists of two parts, the dictionary part and the book part. The dictionary part starts with a single line contains a string "START", this string should be ignored, then some lines follow, each line contains two strings, the first one is a word in English, the second one is the corresponding word in Martian's language. A line with a single string "END" indicates the end of the directory part, and this string should be ignored. The book part starts with a single line contains a string "START", this string should be ignored, then an article written in Martian's language. You should translate the article into English with the dictionary. If you find the word in the dictionary you should translate it and write the new word into your translation, if you can't find the word in the dictionary you do not have to translate it, and just copy the old word to your translation. Space(' '), tab('\t'), enter('\n') and all the punctuation should not be translated. A line with a single string "END" indicates the end of the book part, and that's also the end of the input. All the words are in the lowercase, and each word will contain at most 10 characters, and each line will contain at most 3000 characters.

Output
In this problem, you have to output the translation of the history book.

Sample Input
START
from fiwo
hello difh
mars riwosf
earth fnnvk
like fiiwj
END
START
difh, i'm fiwo riwosf.
i fiiwj fnnvk!
END

Sample Output
hello, i'm from mars.
i like earth!

Hint
Huge input, scanf is recommended.

题意:第一个START到END之间是字典表,前面代表原词,后面代表火星文,现在给你一个句子,里面有火星文,请翻译句子

/*
这题要注意一点,在搜索过程中,被搜索的是否是在字典里面是一个单词
比如 在字典里面
like abcabc
而句子里面有一个abc 这样搜索过程中也会输出空白
if(loc != NULL && loc->isword) 
		cout<<loc->wd;
因为这个问题 wa了一次,注意
*/
#include <iostream>
using namespace std;

const int branch = 26;
char st[3005];
typedef struct trie_Node{
	bool isword;
	char wd[15];
	struct trie_Node *next[branch];
	trie_Node() : isword(false)
	{
		memset(next, NULL, sizeof(next));
	}
}*pTrie_Node;

class Trie{
private:
	pTrie_Node root;
public:
	Trie()
	{
		root = new trie_Node();
	}
	void insert(const char *word, const char *oldword);
	bool search(const char *word);
};

void Trie::insert(const char *word, const char *oldword)
{
	pTrie_Node loc = root;
	while(*word && loc)
	{
		if(loc->next[*word - 'a'] == NULL)
		{
			pTrie_Node tmp = new trie_Node();
			loc->next[*word - 'a'] = tmp;
		}
		loc = loc->next[*word - 'a'];
		word++;
	}
	strcpy(loc->wd, oldword);
	loc->isword = true;
}

bool Trie::search(const char *word)
{
	pTrie_Node loc = root;
	while(*word && loc)
	{
		loc = loc->next[*word - 'a'];
		word++;
	}
	if(loc != NULL && loc->isword) 
		cout<<loc->wd;
	else
		return false;
	return true;
}
int main()
{
	Trie t;
	char oldword[15], word[15];
	int i, j, len;
	cin>>oldword;
	while(cin>>oldword)
	{
		if(!strcmp(oldword, "END"))
			break;
		cin>>word;
		t.insert(word, oldword);
	}
	cin>>oldword;
	getchar();
	while(cin.getline(st, 3005))
	{
		if(!strcmp(st, "END"))
			break;
		len = strlen(st);
		word[0] = '\0';
		for(i = 0, j = 0; i < len; i++)
		{
			if(st[i] >= 'a' && st[i] <= 'z')
				word[j++] = st[i];
			else
			{
				word[j] = '\0';
				if(word[0] != '\0')
					if(!t.search(word))
						cout<<word;
				cout<<st[i];
				j = 0;
			}
		}
		cout<<endl;
	}
	return 0;
}

http://acm.hdu.edu.cn/showproblem.php?pid=1671

Phone List

Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 3374 Accepted Submission(s): 1113

Problem Description
Given a list of phone numbers, determine if it is consistent in the sense that no number is the prefix of another. Let’s say the phone catalogue listed these numbers:
1. Emergency 911
2. Alice 97 625 999
3. Bob 91 12 54 26
In this case, it’s not possible to call Bob, because the central would direct your call to the emergency line as soon as you had dialled the first three digits of Bob’s phone number. So this list would not be consistent.

Input
The first line of input gives a single integer, 1 <= t <= 40, the number of test cases. Each test case starts with n, the number of phone numbers, on a separate line, 1 <= n <= 10000. Then follows n lines with one unique phone number on each line. A phone number is a sequence of at most ten digits.

Output
For each test case, output “YES” if the list is consistent, or “NO” otherwise.

Sample Input
2
3
911
97625999
91125426
5
113
12340
123440
12345
98346

Sample Output
NO
YES

题意:大致含义是说输入的串中，任何一个串不能是另外一个串的字串

#include <iostream>
using namespace std;

const int branch = 10;

//注意点:记得每次释放字典树,因为为多个用例

typedef struct trie_Node{
	bool isword;
	struct trie_Node *next[branch];
	trie_Node() : isword(false)
	{
		memset(next, NULL, sizeof(next));
	}
}*pTrie_Node;

class Trie{
private:
	pTrie_Node root;
public:
	Trie()
	{
		root = new trie_Node();
	}
	pTrie_Node getRoot()
	{
		return root;
	}
	void insert(const char *word);
	bool search(pTrie_Node cur);
	void destory(pTrie_Node cur);
};

void Trie::insert(const char *word)
{
	pTrie_Node loc = root;
	while(*word && loc)
	{
		if(loc->next[*word - '0'] == NULL)
		{
			pTrie_Node tmp = new trie_Node();
			loc->next[*word - '0'] = tmp;
		}
		loc = loc->next[*word - '0'];
		word++;
	}
	loc->isword = true;
}

bool Trie::search(pTrie_Node cur)
{
	int i;
	if(cur->isword)
		for(i = 0; i < branch; i++)
			if(cur->next[i] != NULL)
				return false;
	for(i = 0; i < branch; i++)
		if(cur->next[i] != NULL)
			if(!search(cur->next[i]))
				return false;
	return true;
}

void Trie::destory(pTrie_Node cur)
{
	for(int i = 0; i < branch; i++)
		if(cur->next[i] != NULL)
			destory(cur->next[i]);
	delete cur;
}

int main()
{
	int t, n, i;
	char phone[11];
	cin>>t;
	while(t--)
	{
		cin>>n;
		Trie t;
		for(i = 0; i < n; i++)
		{
			cin>>phone;
			t.insert(phone);
		}
		if(!t.search(t.getRoot()))
			cout<<"NO"<<endl;
		else
			cout<<"YES"<<endl;
		t.destory(t.getRoot()); //这句很重要,不然要超内存
	}
	return 0;
}

查看图片附件

分享到：

HDU 2203(亲和串) | HDU2060(Snooker)

2011-09-21 12:56
浏览 2363
评论(0)
分类:编程语言
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论