Problem Statement
|
|
In written languages, some symbols may appear more often than others. Expected frequency tables have been defined for many languages. For each symbol in a language, a frequency table will contain its expected percentage in a typical passage written in that language. For example, if the symbol "a" has an expected percentage of 5, then 5% of the letters in a typical passage will be "a". If a passage contains 350 letters, then 'a' has an expected count of 17.5 for that passage (17.5 = 350 * 5%). Please note that the expected count can be a non-integer value.
The deviation of a text with respect to a language frequency table can be computed in the following manner. For each letter ('a'-'z') determine the difference between the expected count and the actual count in the text. The deviation is the sum of the squares of these differences. Blank spaces (' ') and line breaks (each element of text is a line) are ignored when calculating percentages.
Each frequency table will be described as a concatenation of up to 16 strings of the form "ANN", where A is a lowercase letter ('a'-'z') and NN its expected frequency as a two-digit percentage between "00" (meaning 0%) and "99" (meaning 99%), inclusive. Any letter not appearing in a table is not expected to appear in a typical passage (0%). You are given a String[] frequencies of frequency tables of different languages. Return the lowest deviation the given text has with respect to the frequency tables.
|
Definition
|
|
Class: |
SymbolFrequency |
Method: |
language |
Parameters: |
String[], String[] |
Returns: |
double |
Method signature: |
double language(String[] frequencies, String[] text) |
(be sure your method is public) |
|
|
|
Notes
|
- |
The returned value must be accurate to within a relative or absolute value of 1E-9. |
Constraints
|
- |
frequencies will contain between 1 and 10 elements, inclusive. |
- |
Each element of frequencies will be formatted as described in the statement. |
- |
Each element of frequencies will contain between 6 and 48 characters, inclusive. |
- |
No letter will appear twice in the same element of frequencies. |
- |
The sum of the percentages in each element of frequencies will be equal to 100. |
- |
text will contain between 1 and 10 elements, inclusive. |
- |
Each element of text will contain between 1 and 50 characters, inclusive. |
- |
Each element of text will contain only lowercase letters ('a'-'z') and spaces (' '). |
- |
text will have at least one non-space character. |
Examples
|
0) |
|
|
{"a30b30c40","a20b40c40"}
|
{"aa bbbb cccc"}
|
|
Returns: 0.0
|
The first table indicates that 30% of the letters are expected to be 'a', 30% to be 'b', and 40% to be 'c'. The second table indicates that 20% are expected to be 'a', 40% to be 'b', and 40% to be 'c'. We consider the text to have length 10, as blank spaces are ignored. With respect to the first table, there are 2 'a' where 3 were expected (a difference of 1), one more 'b' than expected (again a difference of 1) and as many 'c' as expected. The sum of the squares of those numbers gives a deviation of 2.0. As for the second table, the text matches expected counts exactly, so its deviation with respect to that language is 0.0. |
|
|
1) |
|
|
{"a30b30c40","a20b40c40"}
|
{"aaa bbbb ccc"}
|
|
Returns: 2.0
|
Here we use the same tables as in the previous example, but with a different text. The counts for 'b' and 'c' each differ by 1 from the expected counts in the first table, and the counts for 'a' and 'c' each differ by 1 from the expected counts in the second table. The text has a deviation of 2.0 with respect to both tables. |
|
|
2) |
|
|
{"a10b10c10d10e10f50"}
|
{"abcde g"}
|
|
Returns: 10.8
|
Here, each of the letters 'a' through 'e' is expected to make up 10% of the letters (0.6 letters). Each of those letters actually appears once, so the difference is 0.4, which becomes 0.16 when squared. 50% of the letters (3 letters) are expected to be 'f', but 'f' does not appear at all. The square of this difference is 9.0. No 'g's are expected to appear, but there is one in the text. This adds 1.0 to the deviation. The final deviation for this table is: 0.16+0.16+0.16+0.16+0.16+9.0+1.0=10.8. |
|
|
3) |
|
|
{"a09b01c03d05e20g01h01i08l06n08o06r07s09t08u07x01" ,"a14b02c05d06e15g01h01i07l05n07o10r08s09t05u04x01"}
|
{"this text is in english" ,"the letter counts should be close to" ,"that in the table"}
|
|
Returns: 130.6578
|
These two frequency tables correspond (roughly) to the frequencies found in the English and Spanish languages, respectively. The English passage, as expected, has a lower deviation in the first table than in the second one. |
|
|
4) |
|
|
{"a09b01c03d05e20g01h01i08l06n08o06r07s09t08u07x01" ,"a14b02c05d06e15g01h01i07l05n07o10r08s09t05u04x01"}
|
{"en esta es una oracion en castellano" ,"las ocurrencias de cada letra" ,"deberian ser cercanas a las dadas en la tabla"}
|
|
Returns: 114.9472
|
The same tables again, but with Spanish passage. This time the second table, which correspond to frequencies in Spanish, gives the lowest deviation. |
|
|
5) |
|
|
{"z99y01", "z99y01", "z99y01", "z99y01", "z99y01", "z99y01", "z99y01", "z99y01", "z99y01", "z99y01"}
|
{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
|
|
Returns: 495050.0
|
|
|
import java.util.*;
public class SymbolFrequency {
static int N = 26;
public double language(String[] freqs, String[] text) {
List<int[]> maps = new ArrayList<int[]>();
for (int i = 0; i < freqs.length; i++) {
int[] map = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
for (int j = 0; j < freqs[i].length(); j += 3)
map[freqs[i].charAt(j)-'a'] = Integer.parseInt(freqs[i].substring(j+1, j+3));
maps.add(map);
}
int count = 0;
int freq[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
for (int i = 0; i < text.length; i++)
for (int j = 0; j < text[i].length(); j++) {
char c = text[i].charAt(j);
if (Character.isLowerCase(c)) {
count++;
freq[c-'a']++;
}
}
double min = Double.MAX_VALUE;
for (int i = 0; i < maps.size(); i++) {
int[] map = maps.get(i);
double now = 0.0;
for (int j = 0; j < N; j++)
now += Math.pow(freq[j] - (count * map[j] / 100.0), 2);
if (now < min)
min = now;
}
return min;
}
}
分享到:
相关推荐
TCHS-SRM-1 SRM - 算法单轮比赛 2. USACO - C++11 礼物1.cpp 骑车.cpp 测试.cpp 3.乌拉尔 - - C++11,Java 1.8 乌拉尔在线法官的可能解决方案 反向Root.cpp 总和文件 求和程序 最终排名.cpp 磁暴.cpp 磁暴.java 寂寞...
1. **3G移动通信技术概述** - WCDMA (宽带码分多址) - TD-SCDMA (时分同步码分多址) - CDMA2000 (码分多址2000) 2. **C语言编程实践** - 文件包含了C语言代码示例 - 涉及到的基本输入输出操作 - 数学函数的...
1. **FXC E1 板**:FXC E1 板提供4个75欧姆的2M接口,可以作为回路网络的主控或从属方,线路接口4的Rx-connector可作为同步接口接收外部时钟信号。但FXC E1板并不是微波传输设备。 2. **VX系列模块**:FXC E1对应的...
4. 配置ULTRASITE传输时,不需要配置EDAP(可能是指电子数据接入点),而需要配置TCHs(时隙信道)、TRXSIG(传输信号)和OMUSIG(操作维护信号)。 5. DE34基站的公共设备直流电源由CSUA模块提供,而非PWSB、PSUA...
BER通常以百分比形式表示,例如1×10^-6表示每百万个比特中有1个比特出错。 #### 支持的测试环境与信道 CMU300支持多种测试环境与信道,包括但不限于: - **上行链路(UL)与下行链路(DL)信号**:包括信道编号...
【TOPCODER算法PPT1】是一份关于2007年TopCoder竞赛算法讲座的机密文件,它揭示了这个全球知名的编程竞赛平台的核心特点和价值。TopCoder社区是其核心,拥有遍布全球的会员,包括众多活跃的参赛者,涵盖了学生和专业...
要告诉技术人员验证元素,请将“tchs”属性添加到元素。 <input type="text" tchs=""></input> 技术人员利用规则来验证元素。 验证是在每个元素的基础上完成的,并且根据所使用的元素进行不同的工作...