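This program counts how many times each word occurs in an English document or novel. The code below uses the English edition of "Les Miserables" as its example. Two points deserve attention: how to split the text into words with a regular expression, and the fact that a HashMap cannot be sorted by value directly, so the entries are first converted into a sortable list: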
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.regex.Pattern;

public class EnglishWordsStatics {

    public static final String EN_FOLDER_FILE = "C:/resources/Books/English/Les Miserables.txt";
    public static final String OUTPUT = "C:/resources/Books/English/Les Miserables - Words.txt";

    // word -> number of occurrences
    private HashMap<String, Integer> result = new HashMap<String, Integer>();
    private int total = 0;

    /**
     * Count the words of one English text file.
     *
     * @param file the text file to read
     * @throws IOException if the file cannot be read
     */
    public void handleOneFile(File file) throws IOException {
        if (file == null)
            throw new NullPointerException();
        // Split on runs of: space , ? ; . ! " ' | digits : ` - ( ) [ ]
        Pattern pattern = Pattern
                .compile("[ ,?;.!\"'|[0-9]:`\\-\\(\\)\\[\\]]+");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.toLowerCase();
                String[] words = pattern.split(line);
                for (String word : words) {
                    // split() can yield an empty first element when the line starts with a delimiter
                    if (word.length() > 0) {
                        total++;
                        if (!result.containsKey(word)) {
                            result.put(word, 1);
                        } else {
                            Integer i = result.get(word);
                            i++;
                            result.put(word, i);
                        }
                    }
                }
            }
        }
        System.out.println("Total words: " + total);
        System.out.println("Total different words: " + result.size());
    }

    /**
     * Write the word counts to OUTPUT, most frequent word first.
     *
     * @throws IOException if the output file cannot be written
     */
    public void saveResult() throws IOException {
        // A HashMap cannot be sorted by value, so copy each entry into a Node and sort the list
        List<Node> list = new ArrayList<Node>();
        for (String word : result.keySet()) {
            Node p = new Node(word, result.get(word));
            list.add(p);
        }
        Collections.sort(list);
        try (FileWriter fw = new FileWriter(new File(OUTPUT))) {
            for (Node p : list) {
                fw.write(p.getWord() + "\t" + p.getNum() + "\n");
            }
        }
        System.out.println("Done");
    }

    /**
     * @param args not used
     */
    public static void main(String[] args) throws IOException {
        EnglishWordsStatics ews = new EnglishWordsStatics();
        ews.handleOneFile(new File(EN_FOLDER_FILE));
        ews.saveResult();
    }
}

/**
 * For sorting: stores one word together with its count.
 */
class Node implements Comparable<Node> {
    private String word;
    private int num;

    public Node() {
    }

    public Node(String word, int num) {
        super();
        this.word = word;
        this.num = num;
    }

    public String getWord() {
        return word;
    }

    public int getNum() {
        return num;
    }

    @Override
    public int compareTo(Node o) {
        // Descending by count; Integer.compare avoids the subtraction idiom
        return Integer.compare(o.getNum(), num);
    }
}
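As a side note on the sort-by-value step: the program above copies every entry into a `Node` and sorts the list of nodes. On Java 8 or later, a similar result can probably be had without the helper class by sorting the map's entries themselves. The sketch below is only an illustration of that idea, not the approach the program uses. The class name `SortByValueSketch`, the method `sortByCountDesc`, and the alphabetical tie-break are assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original program.
class SortByValueSketch {

    /** Returns the entries ordered by count, highest first; ties are ordered alphabetically. */
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        // A HashMap has no defined order, so copy its entries into a list and sort the list.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // Assumed tie-break: equal counts fall back to alphabetical order.
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : "the cat and the dog and the bird".split(" ")) {
            counts.merge(w, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        for (Map.Entry<String, Integer> e : sortByCountDesc(counts)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the sample sentence this prints `the 3`, `and 2`, then `bird`, `cat` and `dog` with a count of 1 each. The explicit tie-break makes the order of equal counts reproducible, whereas with the `Node` approach it depends on the HashMap's iteration order.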
Running EnglishWordsStatics prints the following:
Total words: 607563
Total different words: 22882
Done
Part of the output file (word followed by its count):
the 43538 of 21107 and 15865 a 15365 to 14663 in 11813 he 10280 was 9251 that 8413 it 7026 his 6813 had 6564 is 6504 which 5506 with 4737 on 4714 at 4292 this 4208 not 3981 i 3910 you 3768 one 3500 as 3447 for 3129 him 3118 have 2919 there 2869 her 2767 who 2676 all 2606 she 2605 by 2604 from 2568 be 2484 are 2258 an 2249 they 2236 but 2187 s 2141 man 2107 no 2058 were 1962 what 1932 said 1879 been 1601 marius 1471 when 1429 we 1407 their 1323 two 1284 jean 1275 so 1262 will 1258 me 1207 my 1206 more 1198 himself 1155 valjean 1154 them 1126 has 1122 would 1114 these 1097 then 1097 into 1058 like 1055 out 1047 did 1046 little 1034 cosette 1033 m 1005 very 976 its 969 up 965 or 955 do 952 other 940 old 939 than 930 day 869 only 837 some 830 good 830 made 823 time 795 nothing 794 those 779 your 765 if 752 without 739 could 727 de 725 rue 720 first 681 about 678 well 665 where 663 father 638 men 638 say 635 here 631 now 608 should 592 moment 591 over 585 come 582 hand 576 see 573 through 571 any 570 eyes 566 am 561 know 560 even 559 same 551 us 549 after 549 still 546 thenardier 544 great 543 just 538 thought 534 must 533 before 530 once 514 under 511 upon 509 door 508 three 499 being 493 people 491 child 490 how 489 book 487 house 487 head 482 let 480 sort 478 again 474 young 474 go 473 every 472 night 471 each 471 longer 469 javert 465 light 463 right 460 name 458 paris 458 woman 455 can 454 such 446 way 445 place 444 long 443 life 443 back 438 went 431 saint 430 seemed 424 never 421 called 420 four 417 took 416 take 400 seen 397 t 395 years 389 something 389 chapter 388 air 384 left 382 love 381 whom 381 make 380 monsieur 377 though 377 god 373 point 371 mother 368 whole 367 might 367 most 367 between 366 may 363 shall 361 does 358 voice 352 street 352 last 352 almost 351 much 350 down 348 our 348 turned 346 own 342 thing 341 having 340 towards 338 passed 336 face 336 everything 334 always 329 poor 329 soul 327 against 327 order 327 felt 322 off 320 hundred 320 
bishop 320 side 318 replied 315 la 314 things 313 certain 312 word 312 away 312 gavroche 312 wall 311 another 308 behind 307 because 307 few 306 hour 306 going 306 room 306 barricade 302 taken 299 five 299 francs 297 too 297 fact 297 black 296 saw 296 fauchelevent 296 put 294 while 291 heard 291 came 290 found 290 heart 284 end 282 enjolras 282 entered 282 madeleine 281 near 281 why 280 themselves 269 madame 268 bed 267 dead 265 sometimes 265 words 265 yes 261 white 260 ah 259 evening 259 girl 253 death 252 six 252 garden 252 le 251 mind 250 itself 249 since 248 thus 248 morning 247 began 246 remained 246 open 245 also 245 gillenormand 244 nor 241 beneath 240 many 240 children 239 half 237 second 237 think 236 table 236 opened 235 set 235 don 233 get 232 terrible 231 hands 231 full 231 done 228 herself 228 large 228 become 228 world 225 anything 224 feet 224 both 223 human 223 person 222 water 219 arms 217 work 217 alone 217 sewer 214 fantine 213 far 211 whose 210 fell 210 idea 210 courfeyrac 209 o 208 police 207 twenty 204 days 204 matter 199 give 199 above 199 already 198 added 198 returned 196 window 194 exclaimed 193 thousand 193 possible 191 corner 190 france 190 earth 190 later 188 however 188 held 187 d 186 knew 186 front 186 louis 186 age 185 less 185 round 183 case 183 speak 182 sir 181 fire 181 tell 180 among 180 yet 180 clock 179 true 179 cold 178 revolution 178 grave 177 lost 176 saying 176 des 174 resumed 173 glance 173 women 173 l 172 part 172 silence 171 look 170 became 170 jondrette 170 rather 169 arm 169 manner 168 new 168 stood 167 sister 167 nevertheless 166 pass 166 iron 165 stone 165 low 164 appeared 164 caught 163 reached 162 oh 162 perhaps 162 raised 162 hair 162 convent 161 read 161 war 160 grand 160 du 159 society 159 beheld 158 fall 158 placed 157 wine 156 shadow 154 happy 154 forth 154 form 154 within 153 making 152 small 152 ground 152 turn 151 state 151 hours 151 nature 151 following 151 grandfather 151 darkness 151 coat 150 joy 149 
chamber 149 presence 148 suddenly 148 find 148 myself 147 road 147 letter 147 shop 147 live 147 eye 146 fine 146 foot 146 law 145 paper 145 sight 145 napoleon 145 close 144 smile 144 closed 144 times 143 trees 143 moreover 142 th 142 walls 142 reader 142 seized 142 neither 141 gave 141 quarter 141 history 140 short 140 ancient 139 battle 139 asked 139 beginning 139 king 139 course 138 red 138 present 138 better 138 told 138 third 138 want 137 brought 137 question 137 ever 137 streets 137 piece 137 others 136 rose 136 lay 136 during 136 continued 136 given 136 looked 136 along 136 century 135 knows 134 sound 134 pocket 134 taking 134 rest 134 force 134 enter 134 money 133 direction 133 understand 132 waterloo 132 formed 132 call 131 perceived 131 necessary 130 able 129 strange 129 around 129 melancholy 129 english 129 return 128 sun 128 thou 128 year 128 seated 128 public 127 daughter 127 single 126 mysterious 126 bottom 126 filled 126 gazed 125 floor 125 dark 125 boulevard 124 ten 124 beside 124 cried 124 bourgeois 123 whether 123 die 123 cast 123 visible 123 past 122 seven 122 convict 122 country 121 impossible 121 mayor 120 cut 120 guard 120 hardly 120 appearance 120 shadows 119 laid 119 charming 119 hole 118 means 118 town 118 probably 118 gloomy 117 drew 117 broken 117 pontmercy 117 disappeared 117 blood 117 profound 117 french 117 galleys 116 morrow 116 mademoiselle 116 nearly 116 epoch 116 makes 116 except 116 doubt 115 happiness 115 sous 115 received 115 often 115 together 114 general 114 living 114 mabeuf 114 post 114 least 114 followed 113 comes 113 cannot 113 outside 113 bad 113 says 113 stones 112 leblanc 112 eight 112 paid 112 arrived 112 beautiful 112 houses 112 movement 112 lived 112 re 111 cross 111 known 111 truth 111 depths 111 step 110 hear 110 carriage 110 flowers 109 immense 109 gone 109 lighted 108 progress 108 ideas 108 bread 107 evil 107 girls 107 mouth 107 brother 106 steps 106 sword 106 quite 106 social 106 escape 106 hideous 106 liberty 
105 recognized 105 carried 105 army 105 caused 105 certainly 105 pretty 104 hold 104 mingled 104 attention 104 spot 104 effect 103 combeferre 103 thirty 103 slang 103 fallen 103 coming 103 future 103 ago 103 wish 102 pay 102 heaven 102 need 102 shot 102 really 102 family 102 struck 101 passing 101 until 101 below 101 midst 101 months 101 horse 101 city 99 wife 99 conscience 99 loved 99 friends 98 line 98 teeth 98 yourself 97 duty 97 soon 97 breath 97 chair 97 served 97 bent 96 enough 96 sign 96 justice 96 unknown 96 grantaire 96 body 95 seems 95 distance 95 frightful 94 thoughts 94 remain 94 candle 94 high 94 sleep 94 although 93 hat 93 produced 93 covered 93 singular 93 fellow 93 forty 93 wind 92 eponine 92 moments 92 instant 92 simple 92 fifteen 92 further 92 fear 92 secret 92 peace 92 understood 91 insurrection 91 presented 91 bit 91 slowly 91 ll 90 walked 90 occasion 90 formidable 90 doctor 90 gaze 90 square 90 top 90 becomes 90 porter 90 allowed 90 brow 90 glass 89 rendered 89 sad 89 blind 89 husband 89 souls 89 montparnasse 89 horrible 89 windows 89 according 88 monseigneur 88 son 88 pale 88 leave 88 halted 87 enormous 87 succeeded 87 minutes 87 stranger 87 dressed 86 vague 86 ran 86 either 86 power 86 serious 86 uttered 86 tone 86 laugh 86 none 86 forest 86 obliged 86 blue 85 spring 85 sombre 85 use 85 heads 85 touched 84 existed 84 home 84 pavement 84 view 84 despair 84 petit 84 forms 84 prison 84 reply 84 june 83 sky 83 sur 83 doing 83 knees 83 middle 82 hope 82 fixed 82 colonel 82 watch 82 haste 81 killed 81 care 81 misery 81 cannon 81 noise 81 real 81 names 81 prisoner 80 eat 80 bossuet 80 several 80 letters 80 burst 80 spoke 80 youth 80 big 80 destiny 80 tree 80 crime 79 church 79 gentleman 79 fashion 79 deal 79 address 79 rich 79 lower 79 entering 79 asleep 79 vast 78 hence 78 perfectly 78 honest 78 composed 78 standing 78 concealed 78 master 78 resembled 78 service 78 whence 78 sure 77 motionless 77 gun 77 stars 77 number 77 winter 77 civilization 77 
terror 77 amid 77 besides 77 magloire 77 chimney 77 honor 77 thrust 77 forced 77 thinking 77 walk 76 baron 76 et 76 chance 76 reason 76 deserted 76 gloom 76 begun 76 school 76 paces 76 neck 76 emperor 76 affair 76 seeing 76 rain 76 ideal 76 speaking 75 latter 75 traversed 75 inn 75 everywhere 75 persons 75 cry 75 lofty 75 beyond 75 march 75 wild 75 feel 75 respect 75 montfermeil 75 got 74 paused 74 holy 74 subject 74 beings 74 else 74 court 74 dawn 74 fault 74 whatever 74 priest 74 aside 74 mass 74 turning 73 peculiar 73 wrong 73 creature 73 rope 73 worthy 73 tholomyes 73 wore 73 shouted 73 race 73 drawing 72 space 72 opening 72 fifty 72 horses 72 sou 72 divine 72 gate 72 shoes 72 double 72 wounded 72 breast 72 spirit 72 free 72 recognize 72 waiting 72 walking 71 change 71 thanks 71 written 71 lines 71 soldier 71 box 71 coffin 71 stared 71 pronounced 70 play 70 account 70 listened 70 bench 70 gentle 70 passage 70 silver 70 evidently 70 memory 70 situation 70 addressed 70 dream 70 kept 70 named 70 possessed 69 key 69 erect 69 behold 69 pity 69 green 69 building 69 cap 69 fresh 69 sainte 68 run 68 bare 68 departure 68 preceding 68 cart 68 mean 68 tried 68 narrow 68 picpus 68 keep 68 soldiers 68 ill 67 obscure 67 angle 67 cloud 67 wellington 67 talking 67 ended 67 finished 67 approached 67 condemned 67 month 67 existence 67 virtue 66 story 66 distant 66 habit 66 quitted 66 attack 66 object 66 wood 66 complete 66 immediately 66 shut 66 sent 66 absolute 65 lightning 65 supreme 65 etc 65 sweet 65 dog 65 dropped 65 noticed 65 revery 65 calm 65 listen 65 believe 65 entrance 65 wrath 65 heavy 65 bore 64 obscurity 64 crowd 64 abyss 64 finally 64 ask 64 rags 64 shoulders 63 pure 63 flight 63 takes 63 goes 63 thither 63 happened 63 died 63 doors 63 emerged 63 advanced 63 twilight 63 fatal 63 gamin 63 deep 63 effort 63 horror 63 stupid 63 committed 63 demanded 63 prioress 63 possession 63 plumet 63 advance 62 sense 62 fifth 62 blow 62 instinct 62 bring 62 best 62 roof 62 
daylight 62 revolt 62 purpose 62 merely 62 questions 62 linen 62 aunt 62 conscious 62 tomb 62 gold 62 note 62 attitude 62 encountered 62 field 61 descended 61 england 61 ourselves 61 talk 61 flung 61 suffering 61 action 61 faubourg 61 rise 61 yellow 61 absolutely 61 lies 61 merry 61 required 61 illuminated 61 cure 61 seem 61 self 61 exists 61 repeated 60 ear 60 across 60 mentioned 60 hall 60 falling 60 occupied 60 infinite 60 straw 60 smoke 60 straight 60 branch 60 philosophy 60 cause 60 observed 60 lips 60 pistol 59 holding 59 horizon 59 knowing 59 violent 59 former 59 maire 59 indescribable 59 hung 59 bridge 58
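The list above contains fragments such as "s", "t", "ll", "re" and "don". These appear because the apostrophe is one of the delimiters in the regular expression, so contractions and possessives are cut in two. The sketch below reproduces that behaviour on one sample sentence and, purely as an assumption, shows a word-matching pattern that keeps apostrophes inside words. The class `TokenizeSketch`, the `WORD` pattern, and both method names are hypothetical and are not part of the original program. The alternative pattern also treats other characters (for example accented letters) differently from the original, so the totals would not be identical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demonstration class, not part of the original program.
class TokenizeSketch {

    // Same delimiter class as the program above: the apostrophe counts as a separator.
    static final Pattern DELIMITERS =
            Pattern.compile("[ ,?;.!\"'|[0-9]:`\\-\\(\\)\\[\\]]+");

    // Assumed alternative: match runs of letters, allowing apostrophes between them.
    static final Pattern WORD = Pattern.compile("[a-z]+(?:'[a-z]+)*");

    /** Splits a line the way the original program does. */
    static List<String> splitOnDelimiters(String line) {
        List<String> words = new ArrayList<>();
        for (String w : DELIMITERS.split(line.toLowerCase())) {
            if (!w.isEmpty()) { // skip the empty element produced by a leading delimiter
                words.add(w);
            }
        }
        return words;
    }

    /** Collects words by matching instead of splitting, so "don't" stays whole. */
    static List<String> matchWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line.toLowerCase());
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "Don't stop; he'll go.";
        System.out.println(splitOnDelimiters(line)); // [don, t, stop, he, ll, go]
        System.out.println(matchWords(line));        // [don't, stop, he'll, go]
    }
}
```

Whether contractions should count as one word or two is a design choice. The original delimiter-based split is simpler and is what produced the figures shown above.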