2006 Baidu Star Programming Contest problem: Baidu Language Translator (solution)

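zuroc (posts: 138, points: 561, from: Jiangsu)	Posted: 2007-05-13

The problem and my solution, packaged for download: http://www.cppblog.com/Files/zuroc/06_baidustar_translator.zip

Problem: Baidu Language Translator

Baidu's engineers care a great deal about efficiency. Over a long period of development and testing they have gradually created their own distinctive set of abbreviations, which they use heavily in everyday conversation, in meetings, and even in all kinds of technical documents. To help new employees adapt to Baidu's culture faster and read the company's technical documents more easily, the HR department has decided to develop a dedicated translation system that turns the abbreviations and proper nouns in these documents into everyday language.

Input requirements: the input consists of three parts:
1. The first line contains an integer N (N <= 10000), the total number of abbreviation entries;
2. N lines follow, each containing two strings separated by a space. The first string is the abbreviation (uppercase English letters only, at most 10 bytes); the second is the everyday-language phrase (no spaces, at most 255 bytes);
3. From line N+2 to the end of the input is the document containing abbreviations (total length at most 1000000 bytes).

Example:
6
PS 门户搜索部
NLP 自然语言处理
PM 产品市场部
HR 人力资源部
PMD 产品推广部
MD 市场发展部
百度的部门包括PS，PM，HR，PMD，MD等等，其中PS还包括NLP小组。

(The document line reads: "Baidu's departments include PS, PM, HR, PMD, MD and so on; PS also includes the NLP group.")

Sample: in.txt

Output requirements: output the document with the abbreviations converted into everyday language. (Abbreviations are converted; all other characters are kept unchanged.)

Example:
百度的部门包括门户搜索部，产品市场部，人力资源部，产品推广部，市场发展部等等，其中门户搜索部还包括自然语言处理小组。

Sample: out.txt

Scoring rules:
1. The program runs on a Linux machine (memory use is not strictly limited). It must not run longer than 10 seconds on any test case, otherwise that test case scores nothing;
2. The program must read the data file in the format of the sample input and write its result to standard output in the format of the sample output. If it cannot read and write the data correctly, the problem scores nothing;
3. The problem has 4 test cases, each of which is one input file. The test cases account for 25%, 25%, 25%, and 25% of the problem's score;
4. The problem is worth 20 points.

Notes:
1. The input mixes Chinese and English; the Chinese is GBK-encoded. GBK is another Chinese character encoding standard, whose full name is "Chinese Internal Code Extension Specification" (汉字内码扩展规范). It uses two bytes per character. The overall code range is 8140-FEFE, with the lead byte in 81-FE and the trail byte in 40-FE, excluding xx7F. That gives 23940 code points in total, holding 21886 Chinese characters and graphic symbols: 21003 Chinese characters (including radicals and components) and 883 graphic symbols.
2. To make the answer unique, abbreviations are converted by forward maximum matching (left to right is the forward direction). Note how PMD is translated in the example.

cpp code

    /*
    My approach
    1. Abbreviations
         vector<string>  // holds the abbreviations, sorted by string length,
                         // to satisfy "abbreviations are converted by forward maximum matching"
    2. Do all text replacements in one pass, so that replaced content is not replaced again
         map<pair<int,int>,string>  // position range -> abbreviation
         vector<pair<int,int> >     // the position ranges
         map<string,string>         // abbreviation -> everyday-language phrase
    */
    #include <fstream>
    #include <sstream>
    #include <iostream>
    #include <vector>
    #include <map>
    #include <list>
    #include <string>
    #include <algorithm>
    #include <functional>
    using namespace std;

    #define BEG_END(c) (c.begin()),(c.end())

    typedef string::size_type str_size;

    /* Convert a string to the requested type */
    template<typename Target, typename Source>
    Target lexical_cast(const Source& arg)
    {
        Target result = Target();
        istringstream(arg) >> result;
        return result;
    }

    /* Start positions of all non-overlapping occurrences of aim in source */
    vector<str_size> find_all(const string& source, const string& aim)
    {
        vector<str_size> poses;
        str_size pos = 0;
        str_size aim_len = aim.size();
        while ((pos = source.find(aim, pos)) != string::npos) {
            poses.push_back(pos);
            pos += aim_len;
        }
        return poses;
    }

    /* Sort order: longer strings first */
    bool is_long(const string& a, const string& b)
    {
        return a.length() > b.length();
    }

    /* Sort order: smaller start position first */
    bool is_first_small(const pair<str_size,str_size>& a, const pair<str_size,str_size>& b)
    {
        return a.first < b.first;
    }

    /* True if position aim lies inside none of the (start, length) ranges.
       The end of a range is exclusive, so the test is aim < start + length.
       Note: only the start position of a candidate match is tested. */
    template<class T, class I>
    bool not_in_scope(I begin, const I& end, const T& aim)
    {
        for (; begin != end; ++begin) {
            if ((aim >= begin->first) && (aim < begin->first + begin->second))
                return false;
        }
        return true;
    }

    int main()
    {
        string infile_name = "in.txt", outfile_name = "out.txt";

        ofstream outfile(outfile_name.c_str());
        // The problem asks for standard output; use this line instead to write there:
        //ostream& outfile = cout;

        ifstream infile(infile_name.c_str());
        if (!infile) {
            cerr << "Error : can't open input file " << infile_name << " .\n";
            return -1;
        }

        string line;
        vector<string> abbr_dict;
        map<string,string> abbr_word;

        // Read the N dictionary entries
        getline(infile, line);
        for (int i = lexical_cast<int>(line); i > 0; --i) {
            getline(infile, line);
            string abbr, word;
            istringstream entry(line);
            entry >> abbr >> word;
            abbr_dict.push_back(abbr);
            abbr_word[abbr] = word;
            //cout<<abbr<<' '<<word<<'\n';
        }

        // Longest abbreviations first, so they claim their ranges before shorter ones
        sort(BEG_END(abbr_dict), is_long);

        while (getline(infile, line)) {
            typedef vector<pair<str_size,str_size> > replace_scope;
            replace_scope to_replace_scope;
            map<pair<str_size,str_size>,string> to_replace;

            // Collect the ranges to replace on this line
            for (vector<string>::iterator i = abbr_dict.begin(), end = abbr_dict.end();
                 i != end; ++i) {
                vector<str_size> poses = find_all(line, *i);
                str_size aim_len = i->size();
                for (vector<str_size>::iterator j = poses.begin(), jend = poses.end();
                     j != jend; ++j) {
                    pair<str_size,str_size> scope = make_pair(*j, aim_len);
                    if (not_in_scope(BEG_END(to_replace_scope), *j)) {
                        to_replace_scope.push_back(scope);
                        to_replace[scope] = *i;
                    }
                }
            }

            // Apply the replacements from left to right, tracking how far
            // earlier replacements have shifted the remaining text
            sort(BEG_END(to_replace_scope), is_first_small);
            str_size offset = 0;  // unsigned; wraps correctly when a phrase is shorter
            for (replace_scope::iterator i = to_replace_scope.begin(), end = to_replace_scope.end();
                 i != end; ++i) {
                str_size len = i->second;
                string word = abbr_word[to_replace[*i]];
                line.replace(i->first + offset, len, word);
                offset += word.size() - len;
            }
            outfile << line << '\n';
        }
        return 0;
    }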
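One pitfall follows from the GBK note in the problem: a trail byte may fall in 40-FE, which includes the ASCII uppercase letters, so a byte-wise search such as `find_all` can report a match that begins in the middle of a two-byte character. Below is a minimal sketch of a boundary-aware variant. The names `is_gbk_lead` and `find_all_gbk_safe` are hypothetical and not part of the posted solution, and the sketch assumes the input is well-formed GBK and that `aim` is pure ASCII.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if byte b can start a two-byte GBK character (lead bytes are 0x81-0xFE).
inline bool is_gbk_lead(unsigned char b) { return b >= 0x81 && b <= 0xFE; }

// Hypothetical helper: like find_all, but walks the text one character at a
// time, so a match can only start on a character boundary.
// Assumes aim is non-empty and contains only ASCII bytes.
std::vector<std::string::size_type>
find_all_gbk_safe(const std::string& source, const std::string& aim) {
    assert(!aim.empty());
    std::vector<std::string::size_type> poses;
    std::string::size_type pos = 0;
    while (pos < source.size()) {
        if (is_gbk_lead(static_cast<unsigned char>(source[pos]))) {
            pos += 2;                       // skip lead byte and trail byte together
        } else if (source.compare(pos, aim.size(), aim) == 0) {
            poses.push_back(pos);           // match on a character boundary
            pos += aim.size();
        } else {
            ++pos;                          // ordinary single-byte character
        }
    }
    return poses;
}
```

For the byte sequence 0x81 'P' 'S' ' ' 'P' 'S', a plain `std::string::find` for "PS" reports offset 1, inside the two-byte character 0x8150, while this variant reports only the genuine match at offset 4.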

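Eastsun (posts: 790, points: 1329, from: Tianjin)	Posted: 2007-05-15

I really doubt that the time complexity of the original poster's algorithm can meet the requirements. I think handling this with a trie would be much faster.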
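To make the trie suggestion concrete, here is a minimal sketch of forward maximum matching over a trie of uppercase letters. It is an illustration under stated assumptions, not code from either poster: the `AbbrTrie` type and its members are hypothetical names, abbreviations are assumed to be non-empty and to contain only 'A'-'Z' as the problem states, and the text is assumed to be well-formed GBK, so every byte in 0x81-0xFE is copied through together with the byte after it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical trie keyed on 'A'..'Z'; each node may mark the end of an abbreviation.
struct AbbrTrie {
    struct Node {
        int next[26];   // child index per letter, -1 if absent
        int word;       // index into words, -1 if no abbreviation ends here
        Node() : word(-1) { for (int i = 0; i < 26; ++i) next[i] = -1; }
    };
    std::vector<Node> nodes;
    std::vector<std::string> words;

    AbbrTrie() : nodes(1) {}    // node 0 is the root

    void insert(const std::string& abbr, const std::string& word) {
        assert(!abbr.empty());
        int cur = 0;
        for (std::string::size_type i = 0; i < abbr.size(); ++i) {
            assert(abbr[i] >= 'A' && abbr[i] <= 'Z');
            int c = abbr[i] - 'A';
            if (nodes[cur].next[c] == -1) {
                nodes[cur].next[c] = static_cast<int>(nodes.size());
                nodes.push_back(Node());
            }
            cur = nodes[cur].next[c];
        }
        nodes[cur].word = static_cast<int>(words.size());
        words.push_back(word);
    }

    // Scan left to right; at each character boundary take the longest
    // abbreviation that starts there, otherwise copy the character unchanged.
    std::string translate(const std::string& text) const {
        std::string out;
        std::string::size_type pos = 0;
        while (pos < text.size()) {
            unsigned char b = static_cast<unsigned char>(text[pos]);
            if (b >= 0x81 && b <= 0xFE) {   // GBK lead byte: copy both bytes
                out.append(text, pos, 2);
                pos += 2;
                continue;
            }
            int cur = 0, best = -1;
            std::string::size_type best_len = 0;
            for (std::string::size_type i = pos; i < text.size(); ++i) {
                char ch = text[i];
                if (ch < 'A' || ch > 'Z') break;
                cur = nodes[cur].next[ch - 'A'];
                if (cur == -1) break;
                if (nodes[cur].word != -1) {    // remember the longest match so far
                    best = nodes[cur].word;
                    best_len = i - pos + 1;
                }
            }
            if (best != -1) { out += words[best]; pos += best_len; }
            else            { out += text[pos]; ++pos; }
        }
        return out;
    }
};
```

Because an abbreviation is at most 10 bytes long, the inner loop runs at most 10 steps per starting position, so a document is processed in time proportional to its length regardless of how many entries the dictionary holds. The posted solution, by contrast, searches each line once per abbreviation.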