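I read an article called Automated Curse Generator (link: http://thedailywtf.com/Articles/The-Automated-Curse-Generator.aspx), which tells the story of a programmer who fulfilled a client's absurd request. The request was, roughly, for a program that automatically generates words conforming to English word-formation rules.
Well, that looked like a fun requirement, so I made one too. The algorithm is simple: take a word list and, for every 4-letter combination, count how often each letter follows it. For example, the word frequency yields the combinations freq->u, requ->e, eque->n, quen->c, uenc->y, ency->[end]. Once every word has been counted, the result is an overall frequency table, and words are generated at random according to those probabilities.
Since this was just for fun, I kept everything as simple as possible and wrote a quick & dirty program, using an English word list of roughly 250,000 words that I found online as the training material. I did not expect it to produce words as long as some of these, but a few of them do look fairly convincing. The program comes after the sample output, and its listing begins with #include <stdio.h>. Because it uses Unix-specific features it cannot be compiled on Windows as-is, though adapting it is not much trouble. Here are the results:
1 prot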
2 proteinicentiricificallydactyluserdantignonereintrafantablephoracothygrotentously
3 protochrongdomshipsmithericlashiestawa
4 protasimonialisms
5 proteineratophysialogyniasesitetrazolyschriekerchanisostomycetournetsmanshipochil
6 protoxicopathyesisms
7 proteinbushedly
8 protherficianilingulatercouchymatiidaeingnessessionatedly
9 protomyosaurophyllumiformitypennywordlenessessivityscalaimlessnessednessessablenessednessessedly
10 protavationaturaticallywesteepdownsteriorrheartfullyricyclophycephaloscopyic
11 proteinphascallylisms
12 protthalmophytoneagynostegoristomiestcritesisticalnessessibilityphlogotometrosyntaxationshipplicativellikenessedly
13 protomycetoanatoriestereolidineaterallywaymenosaurushlachronisticallyhoodspeaksmanshipeltschidiallyhalicylindrinnerieurshunlessnessestoof
14 protulaneticallyisms
15 protohyperrheadstipperyrootlessnessessessionistickboatingleablenessessorshipmuckwoodguilfanticuloushkashablenessessory
16 protomideposedly
17 protoxinessessimoreconominednessessariallymantablenessellatoryieldtschizognationabilityshirtmakerinjararaturedly
18 protobiastraloguedochondriaceousnesselapodalisticulerdometrysalpinessessionisms
19 protingermilletably
20 protopatiatednessessaryshipcranatotherapinionisticalitypianglieronomenonthisisestereritesistablenessessorelatingularizelyite
21 protencephalimitosphygmometoeducatednessessaryshipshippushbucketerolingspurlerythrowweededuciteratrostwhatll
22 protonealienenignorarielanaryatidaemonumbernessessoralshipsmithracids
23 proteriaceae
24 protoperacreptionallygraphistorilymphonesmit
25 proteinerlesslingnessessionshipseysuringinessessingulphinessessivenessessinessessorialisticklessnessessoriumpherifenesianophilicologyes
26 protomycosinglyfoilstoccasionlessly
27 protalizedekerykeionisticityburnwaressibilitariesteriidaeinscripturesinglymorphobismutterediencephaluropsychodillodiestivitatorrhapsodontoothinnablenessessory
28 protobacteriallymantationshipsmithymenosindigestorlessly
29 protervewishingsailorfongsteroepicosestataxylenialesestibilitypicrystallopiidaeinglyrmaeaninonexcubicularimethyletianalciteablenessessplasticentostoyennegarstoneallerooniestinescentesisticalisticisms
30 proteletteressionlessnessessivenessessoinessedellentialshipersoninflammaticallywhawksbeardosporiumshotgut
31 protoarcissionallegoricisms
32 protoe
33 prothermophoryloninettyfolkslidermatoryctisticallyishnessessoriedlershipstarianthrowablenesthesibilitypennyleticnessessivitysestaffirmanciestdom
34 protographyllusterellingslewayslidinervisioplasmatoderatitenessessionallylthiopilingsettermindercroseconclusionablenessessorshipsmitchetterincisurablenessessivelyarchediverdesillophoresinawakemositrochromantisticallypiditicallegonizesisticatedly
35 protolitisestaticalnessessointellatedly
36 protizenereignnessessivelyarthriftilyergitesisteatomycosidaeingratoninfringidaeantertonsidenecatoriansacrificatedly
37 protusishablenessessitisms
38 prototinguishablenesessionalizederacaroticipationallyishnessessioneducibly
39 proterocuriastantonyingsticednessessaltworkgroupallingesterediscopicallywagonmannelmakeriatingnesseeminglymoidosucculatelytrotholoclasticallyishnessessessinglymorestravagedientifetasomalluvioscopendinglymorphogakarpaxopsicallopoeticallyisms
40 protochromometeriestlabilitypicallylthiobdellieriestainographicallisms
41 protuslikenednessedly
42 protoindehistomacedinglyforwestednessedly
43 protohexylatednessessivelyarchy
44 proteolysise
45 proteinitentumacidnessessessionagentizery
46 proteoscurianteraceously
47 protasserahedralistsertebranderripeworksmittericulatorywellidepredicrocitroporously
48 protednessessessionlithoiceremongermistriesteriaterminatednessessingnessessexitessantly
49 proteolizednessessageer
50 protodiallyingnessessionallyshoploideadlerythrillabioglossatedly
51 protomoustedderpaintimumsiestabilitypygospermonephroneously
52 proteideteromopeximidectoryisms
53 prothongianthrodipheroelectomycetotoxicalloposolepses
54 protalophiliatomeridegrettereliquejuggiestimatesessessiblenessessivenessessedly
55 proteriestlementangerindorsablendustfullyribonumerstrousticismuscartiatednessessibilitypickeressessory
56 protulatorianicalnessessibilitylenessengingnesseriologyniousnessessioneredistulationedomicitypewritednessessory
57 prototransummermographerismolysisteriforicallyhuffiniteshippositationinterpsinornatiformatoiditisesponsolencedentaryasthesianshavuothamitopatenessessivelyk
58 protovewoodednessedly
59 proteidaeinsuer
60 protomycosisms
61 protomitigraduaticosismongeriproconusurpingoliardialysisms
62 proteletochromeristicalicinglymorementaliasinghostomycotylenessedly
63 protomatizationalphatonesquelinesianshaverifyinglyrmaliencephalicylationallylenessessessivitishnessednessessinguinedrickweariacisms
64 protomycosisms
65 proteolyticalnessericretersionabobusterontgometerogilantsomeredly
66 prothinglymosishnessessessoriallicinemakeriodicallyinglefishednessessionistickwisehensionalizingivoroughterillettednessesselgarrowlingulablenesitesistationallithickenioidalitypifyingstockrumpediously
67 protographiscenithageer
68 protozincompactfullyingstantioplegicranealisms
69 protomously
70 protobraggressionshipsheelectroscibilitypedantnessednessessionalgiantly
71 protomyelonincunningscourtesisterinesserincivitously
72 protechnicallylationablessnesseraryl
73 protoceromagnetistriakidaniedly
74 protryosoteriestly
75 protocornereagressinganizemanastrofuscatheleniureiba
76 protoproductorymbiodictiblenesserinessessibilitylisms
77 protessesterceptionallycolumnatenessly
78 protechny
79 proteelyvinylidynamicilityshinglymoreterometerotomachiomagnettinglyn
80 protomyelopianshipbrokeriestlessneridiotypedalferentrisiformaliastocatholeininessessivenesisolysisms
81 proteuraxoneurallyingnessessonsymbolonymalkinspousalingnessessalemakinglyingnessessiblenessessivenessessayessorlingsettsianshirterverspeculturinednessessessiblennialitypewrinkageproofy
82 protomizednessettlebackerediblenessessoriously
83 protecturamentadenotionaliteleorhinoleadeyeseednt
84 protencephalocellulariacetickstrianthismatedly
85 protransibilitypedaliasticatedly
86 protestimaturalisms
87 proteoperidometrylenessessionartitionshipstocreaselledgerygiantivalencheletypinaxonotricitoustibourreacquettershipilitarytosorusophytinglyfoamerallyshockablenessessessometerogenousnessestabaginiumshipshipstonebulorrhinusimalizationalismanshiplankerslidestrydomshoniousnessessionalisms
88 proticallydrargyratednessessionistshippuritessessivenesiarylsulfonatrophyllidanselessessitantly
89 protaxiallywoodbounterturably
90 protomics
91 protemplanthornbrachyde
92 proteacheoneductiously
93 protretchettallups
94 proterythoseneradoxishlyk
95 protavianthanesthenoxalinologistingspillaloonerradiorrhaphyllumismatistivityshiftfisheddablenessessirouerimesiastouresocioliterrespotisms
96 protoko
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
struct rec {
    int freq;
    int __pad;
    struct rec *next;
};
// _ denotes "irrelevant" (not yet implemented);
// in the prediction slot it means 'end of word'
#define _ 26
// N is the slot that holds the sum of freq
#define N 27
struct rec wordfreq[27][27][27][27][28];
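unsigned int myrand(void)
{
    static int fd = -1; // -1 means "not opened yet"; 0 is a valid descriptor
    static int i = 128;
    static unsigned int buf[128];
    if(fd < 0) {
        fd = open("/dev/urandom", O_RDONLY);
    }
    if(i == 128) {
        // refill the pool of random numbers; fall back to rand() if that fails
        if(fd < 0 || read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            return (unsigned int)rand();
        i = 0;
    }
    return buf[i++];
}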
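void learn(char *s)
{
    size_t i, len = strlen(s);
    // skip words that are too short or contain anything other than a-z,
    // since they would index outside the table
    if(len < 4) return;
    for(i=0; i<len; i++)
        if(s[i] < 'a' || s[i] > 'z') return;
    // count every 4-letter window, including the last one,
    // whose successor is 'end of word'
    for(;;) {
        wordfreq[s[0]-'a'][s[1]-'a'][s[2]-'a'][s[3]-'a'][s[4] ? s[4]-'a' : _ ].freq++;
        if(!s[4]) break;
        s++;
    }
}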
void postlearn(void)
{
    int i, j, k, l, m, s;
    for(i=0; i<26; i++)
        for(j=0; j<26; j++)
            for(k=0; k<26; k++)
                for(l=0; l<26; l++) {
                    s = 0;
                    for(m=0; m<27; m++) {
                        s += wordfreq[i][j][k][l][m].freq;
                    }
                    wordfreq[i][j][k][l][N].freq = s;
                    if(wordfreq[i][j][k][l][_].freq)
                        printf("%d N(%c%c%c%c)\n",
                            s,
                            'a' + i,
                            'a' + j,
                            'a' + k,
                            'a' + l
                        );
                }
}
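void generate(char *s)
{
    int r, i, n;
    struct rec *p;
    // the seed must be four lowercase letters, otherwise the table
    // lookup below would go out of bounds
    for(i=0; i<4; i++)
        if(s[i] < 'a' || s[i] > 'z') return;
    // cap on appended letters, so that seed + output + terminator
    // fit in the caller's 10000-byte buffer
    n = 9990;
    while(n--) {
        p = &wordfreq[s[0]-'a'][s[1]-'a'][s[2]-'a'][s[3]-'a'][0];
        if(!p[N].freq) break; // this context never occurred in the word list
        r = myrand() % p[N].freq;
        for(i=0; i<27; i++) { // 27 for [a-z] and 'end of word'
            r -= p[i].freq;
            if(r<0) break;
        }
        if(i >= 26) break; // 'end of word'
        s[4] = 'a' + i;
        s++;
    }
    s[4] = 0;
}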
int main(void)
{
    char buf[10000];
    FILE *f;
    printf(":: Learning wordlist...\n");
    memset(wordfreq, 0, sizeof(wordfreq));
    f = fopen("wlist", "r");
    if(!f) {
        perror("wlist");
        return 1;
    }
    while(fgets(buf, 100, f)) {
        buf[strcspn(buf, "\r\n")] = 0; // strip the line ending
        learn(buf);
    }
    fclose(f);
    printf(":: Thinking...\n");
    postlearn();
    printf(":: OK \n");
    while(fgets(buf, sizeof(buf), stdin)) {
        buf[strcspn(buf, "\r\n")] = 0;
        if(strlen(buf) < 4) continue; // a 4-letter seed is required
        buf[4] = 0;
        generate(buf);
        printf("%s\n", buf);
    }
    return 0;
}