Implementing an Anagram Finder (with a File-Based Sort)

CalvinMnakor

Category: Programming Pearls notes

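Problem C in Column 2 of Programming Pearls asks for the anagrams of a word. Generating every permutation of the word and checking each one against the dictionary is far too slow. The book instead labels each class of anagrams with a tag, called a signature. There are many ways to define a signature, and each suits a different purpose. Because anagrams are words made of the same letters in a different order, the signature used here is built from all of the word's letters (in the book, the letters in sorted order). This leads to a three-stage pipeline: sign each word, sort by signature, and squash words with equal signatures onto one line.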
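
As a minimal sketch of the full-letter signature, assuming words are plain ASCII: lowercase the word and sort its letters, so that every anagram maps to the same string. The helper name `MakeSignature` is hypothetical and is not the `Signature` function used in the post.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the signature of a word, i.e. its
// letters lowercased and sorted. Anagrams share the same signature.
std::string MakeSignature(std::string word)
{
	for (char &c : word)
	{
		c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
	}
	std::sort(word.begin(), word.end());
	return word;
}
```

With this helper, `MakeSignature("stop")`, `MakeSignature("tops")` and `MakeSignature("Pots")` all yield `opst`.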
Part 1: generate the dictionary with signatures:

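void ExtendTheDictionary()
{
	//append signed entries to dictionary.txt (the file is created if missing)
	char word[WORDMAX],sig[WORDMAX];
	ofstream outfile("dictionary.txt",ios::out|ios::app);
	assert(outfile);
	cout<<"input the word..."<<endl;
	//setw (from <iomanip>) keeps each read within the buffer
	while (cin>>setw(WORDMAX)>>word)
	{
		UpperToLower(word);
		strcpy(sig,word);
		//Signature is expected to sort the letters of sig in place,
		//for example with the C standard library:
		//qsort(sig,strlen(sig),sizeof(char),charcomp);
		Signature(sig);
		outfile<<sig<<'\t'<<'\t'<<word<<endl;
		cout<<sig<<' '<<word<<endl;
	}
	//clear the end-of-input state so later prompts can read from cin again
	cin.clear();
	outfile.close();
}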

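Part 2: sort by signature. The method is simple, and it is the file-based sort mentioned in the title: on each pass, extract the smallest entry from the file of words not yet sorted, then rewrite that file without it: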

void SortTheDictionary()
{
	char word[WORDMAX],sig[WORDMAX],word1[WORDMAX],sig1[WORDMAX];
	cout<<"be sorting now...";
	//sort by signature and write the result to a new file
	const char *temp = "dictionary.txt";
	ofstream outfile("SortedDictionary.txt",ios::out);
	assert(outfile);
	while (true)
	{
		ifstream tmp1(temp,ios::in);
		//opening for output truncates temp2.txt, discarding the previous pass
		ofstream tmp2("temp2.txt",ios::out);
		assert(tmp1);
		assert(tmp2);
		//find the smallest one in the current text
		if(! (tmp1>>sig>>word))  break;
		while (tmp1>>sig1>>word1)
		{
			//keep the smallest signature seen so far in sig/word;
			//every other entry goes to the temporary file
			if (strcmp(sig1,sig) > 0)
			{
				tmp2<<sig1<<'\t'<<'\t'<<word1<<endl;
			}
			else
			{
				tmp2<<sig<<'\t'<<'\t'<<word<<endl;
				strcpy(sig,sig1);
				strcpy(word,word1);
			}

		}
		outfile<<sig<<'\t'<<'\t'<<word<<endl;
		tmp1.close();
		tmp2.close();
		//copy the remaining unsorted entries: temp2.txt -> temp1.txt
		ifstream f2("temp2.txt",ios::in);
		ofstream f1("temp1.txt",ios::out);
		assert(f1);
		assert(f2);
		while (f2>>sig>>word)
		{
			f1<<sig<<'\t'<<'\t'<<word<<endl;
		}
		f1.close();
		f2.close();
		//read from temp1.txt on the next pass
		temp = "temp1.txt";
	}
	outfile.close();
	cout<<endl<<"√ ok."<<endl;
}

Part 3: squash. This step is simple: compare the signatures of adjacent entries and, if they are equal, merge the words onto one line.

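	//sig, word, sig1 and word1 are char[WORDMAX] buffers as in the functions above
	//needs <set> and <string>
	ifstream infile("SortedDictionary.txt",ios::in);
	ofstream outfile("Result.txt",ios::out);
	assert(infile);
	assert(outfile);
	//words already written for the current signature, used to skip duplicates
	set<string> seen;
	//only write the first entry if the sorted file is not empty
	if (infile>>sig>>word)
	{
		outfile<<sig<<'\t'<<'\t'<<word;
		seen.insert(word);
	}
	while (infile>>sig1>>word1)
	{
		if (strcmp(sig1,sig) == 0)
		{
			//same signature: append the word unless it was already written
			if (seen.insert(word1).second)
			{
				outfile<<'\t'<<'\t'<<word1;
			}
		}
		else
		{
			//new signature: start a new line
			outfile<<endl;
			outfile<<sig1<<'\t'<<'\t'<<word1;
			strcpy(sig,sig1);
			seen.clear();
			seen.insert(word1);
		}
	}
	infile.close();
	outfile.close();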

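For lookups, I also wrote a small function that finds the anagrams of a given word. The implementation still has a problem: the word itself shows up among its own anagrams. For example, entering stop prints: stop tops pots. The cause is that get() reads the rest of the matching line as a single string, so the comparison against the input word never succeeds. This still needs to be improved.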

void FindTheShiftingWord()
{
	bool result1 = false; //has this signature
	bool result2 = false; //has shifting words
	char sig[WORDMAX],word[WORDMAX],sig1[WORDMAX],word1[WORDMAX];
	//define the cache for one line
	char temp[WORDMAX*10];
	cout<<"now input the word whose shifting words u wanna find:";
	//setw (from <iomanip>) keeps the read within the buffer
	cin>>setw(WORDMAX)>>word;
	//dictionary entries are stored in lower case
	UpperToLower(word);
	cout<<"result are:"<<endl;
	strcpy(sig,word);
	Signature(sig);
	ifstream infile("Result.txt",ios::in);
	assert(infile);
	//read the signature at the start of each line
	while (infile>>sig1)
	{
		if (strcmp(sig,sig1) == 0)
		{
			//find the same signature
			result1 = true;
			//NOTE: get() returns the rest of the line, tabs included, in chunks
			//of up to WORDMAX-1 characters, so the comparison below never
			//matches a single word and the input word is printed as well
			while (infile.get(word1,WORDMAX,'\n'))
			{
				//and put out all the words
				if (strcmp(word,word1) == 0) continue;
				else
				{
					cout<<word1<<endl;
					result2 = true;
				}
			}
			break;//get out
		}
		//not the same signature, skip the rest of this line
		infile.getline(temp,WORDMAX*10);
	}
	if (! result1)
	{
		cout<<"？ the dictionary does not have this word!"<<endl;
	}
	if (result1 && !result2)
	{
		cout<<"？ cannot find the same word!"<<endl;
	}
}
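
One possible way to keep the queried word out of its own result list, sketched here under the assumption that each line of Result.txt holds a signature followed by whitespace-separated words: split the matching line into individual words and skip the one equal to the query. The function name `FindAnagrams` is hypothetical, and the caller is assumed to pass in the query's signature already computed.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: returns the anagrams of word found in the squashed
// dictionary read from in, excluding word itself. sig must be the
// signature of word (its sorted letters).
std::vector<std::string> FindAnagrams(std::istream &in,
                                      const std::string &word,
                                      const std::string &sig)
{
	std::vector<std::string> result;
	std::string line;
	while (std::getline(in, line))
	{
		std::istringstream fields(line);
		std::string lineSig, candidate;
		if (!(fields >> lineSig) || lineSig != sig)
		{
			continue; // empty line or a different signature
		}
		// operator>> splits on tabs and spaces, yielding one word at a time
		while (fields >> candidate)
		{
			if (candidate != word)
			{
				result.push_back(candidate);
			}
		}
		break; // assumes each signature appears on at most one line
	}
	return result;
}
```

An empty result then covers both cases reported separately above: the signature is missing, or the word has no anagrams other than itself.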

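All of the routines above are combined into one small program that can add words, extend the dictionary, display the dictionary, find anagrams, and so on.
I also ran into some trouble importing data from other files, and I am still working on that.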

2010-03-21 22:30