chenzugang

A utility class for getting the pinyin of Chinese characters from their GB2312 codes

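Disclaimer:
The prototype of this utility class came from the Internet; I adjusted the algorithm. Strictly speaking it is therefore not original work, only an adaptation.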

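Main idea:
Build a mapping from GB2312 code values to pinyin, relying on the fact that all characters whose codes fall within a given range share the same pinyin. For example, a character whose code lies in (20317, 20319] has the pinyin "a". (This post calls the value the 区位码, the row-cell code; in the code below it is the absolute value of the character's two-byte GB2312 code minus 65536.)
The class therefore first builds two arrays. One holds the boundary codes. A boundary code is a code at which the pinyin changes: 20317 and 20318 are consecutive codes, but one maps to "ai" and the other to "a", so 20317 is a boundary code. The other array holds the pinyin for each boundary code, with the boundary code as the array index and the pinyin as the value.
For a given character, first compute its code, then find the boundary code for its pinyin by the rule "take the upper boundary, not the lower". For example, a character with code 20318 sits between the boundary codes 20317 and 20319; the rule selects 20319, and looking up 20319 gives the pinyin "a".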
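The "take the upper boundary" rule can be sketched on a small excerpt of the table. This is only an illustration, not the author's implementation: the class name `BoundaryLookupSketch` and the method `lookup` are made up here, and a `TreeMap` stands in for the two arrays and the binary search used in the class below. Because only five boundaries are included, results are meaningful only for codes above 20283.

```java
import java.util.Map;
import java.util.TreeMap;

public class BoundaryLookupSketch {
    // A five-entry excerpt of the boundary table (hypothetical holder class).
    private static final TreeMap<Integer, String> BOUNDARIES = new TreeMap<>();
    static {
        BOUNDARIES.put(20319, "a");
        BOUNDARIES.put(20317, "ai");
        BOUNDARIES.put(20304, "an");
        BOUNDARIES.put(20295, "ang");
        BOUNDARIES.put(20292, "ao");
    }

    /** Returns the pinyin of the nearest boundary at or above code, or null if there is none. */
    public static String lookup(int code) {
        // ceilingEntry returns the entry with the least key >= code.
        Map.Entry<Integer, String> upper = BOUNDARIES.ceilingEntry(code);
        return upper == null ? null : upper.getValue();
    }

    public static void main(String[] args) {
        System.out.println(lookup(20318)); // a   (upper boundary 20319)
        System.out.println(lookup(20317)); // ai  (a boundary maps to itself)
        System.out.println(lookup(20310)); // ai  (upper boundary 20317)
        System.out.println(lookup(20320)); // null (above the last boundary)
    }
}
```

A boundary code belongs to its own range, which is why 20317 yields "ai" while 20318 yields "a".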

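Drawbacks: the character set is small, limited to GB2312; and a polyphonic character maps to a single reading, so 行 gets the same pinyin in 银行 (yinhang) and 行走 (xingzou). If all external input is Chinese characters and this tool converts them to pinyin in order to find homophones, that is not a problem; if the external input is itself pinyin, it is.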
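The size limit can be made concrete. As a rough sketch (the class `CoverageSketch` and the method `isCovered` are illustrative names, not part of the original utility), the bounds used by the range check in `getSpellByAscii` correspond to the two-byte codes 0xB0A1 and 0xD7F9, which is the extent of the GB2312 level-1 characters (rows 16 to 55, ordered by pinyin). Level-2 characters (rows 56 to 87) are ordered by radical and stroke count, so a range table of this kind cannot cover them.

```java
public class CoverageSketch {
    // Bounds copied from the range check in getSpellByAscii.
    static final int LOWEST = -20319;  // 0xB0A1 - 65536
    static final int HIGHEST = -10247; // 0xD7F9 - 65536

    /** True if the two-byte GB2312 code (for example 0xB0A1) falls inside the table. */
    public static boolean isCovered(int gb2312Code) {
        int signed = gb2312Code - 65536;
        return signed >= LOWEST && signed <= HIGHEST;
    }

    public static void main(String[] args) {
        System.out.println(isCovered(0xB0A1)); // true:  first level-1 code
        System.out.println(isCovered(0xD7F9)); // true:  last level-1 code
        System.out.println(isCovered(0xD8A1)); // false: first code of row 56 (level 2)
    }
}
```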



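// Assumption: StringUtils.hasText used below is Spring's
// org.springframework.util.StringUtils; the original post does not show its imports.
import org.springframework.util.StringUtils;

/**
 * Helper class that converts GB2312 Chinese characters to their pinyin.
 * <p>
 * Created on 2007-10-12 18:13:03
 * 
 * @author chenzugang@gmail.com
 * 
 */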
public class GB2312ToAlphaUtil {
    private static String[] alphaArray = new String[20320];
    static {
        alphaArray[20319] = "a";
        alphaArray[20317] = "ai";
        alphaArray[20304] = "an";
        alphaArray[20295] = "ang";
        alphaArray[20292] = "ao";
        alphaArray[20283] = "ba";
        alphaArray[20265] = "bai";
        alphaArray[20257] = "ban";
        alphaArray[20242] = "bang";
        alphaArray[20230] = "bao";
        alphaArray[20051] = "bei";
        alphaArray[20036] = "ben";
        alphaArray[20032] = "beng";
        alphaArray[20026] = "bi";
        alphaArray[20002] = "bian";
        alphaArray[19990] = "biao";
        alphaArray[19986] = "bie";
        alphaArray[19982] = "bin";
        alphaArray[19976] = "bing";
        alphaArray[19805] = "bo";
        alphaArray[19784] = "bu";
        alphaArray[19775] = "ca";
        alphaArray[19774] = "cai";
        alphaArray[19763] = "can";
        alphaArray[19756] = "cang";
        alphaArray[19751] = "cao";
        alphaArray[19746] = "ce";
        alphaArray[19741] = "ceng";
        alphaArray[19739] = "cha";
        alphaArray[19728] = "chai";
        alphaArray[19725] = "chan";
        alphaArray[19715] = "chang";
        alphaArray[19540] = "chao";
        alphaArray[19531] = "che";
        alphaArray[19525] = "chen";
        alphaArray[19515] = "cheng";
        alphaArray[19500] = "chi";
        alphaArray[19484] = "chong";
        alphaArray[19479] = "chou";
        alphaArray[19467] = "chu";
        alphaArray[19289] = "chuai";
        alphaArray[19288] = "chuan";
        alphaArray[19281] = "chuang";
        alphaArray[19275] = "chui";
        alphaArray[19270] = "chun";
        alphaArray[19263] = "chuo";
        alphaArray[19261] = "ci";
        alphaArray[19249] = "cong";
        alphaArray[19243] = "cou";
        alphaArray[19242] = "cu";
        alphaArray[19238] = "cuan";
        alphaArray[19235] = "cui";
        alphaArray[19227] = "cun";
        alphaArray[19224] = "cuo";
        alphaArray[19218] = "da";
        alphaArray[19212] = "dai";
        alphaArray[19038] = "dan";
        alphaArray[19023] = "dang";
        alphaArray[19018] = "dao";
        alphaArray[19006] = "de";
        alphaArray[19003] = "deng";
        alphaArray[18996] = "di";
        alphaArray[18977] = "dian";
        alphaArray[18961] = "diao";
        alphaArray[18952] = "die";
        alphaArray[18783] = "ding";
        alphaArray[18774] = "diu";
        alphaArray[18773] = "dong";
        alphaArray[18763] = "dou";
        alphaArray[18756] = "du";
        alphaArray[18741] = "duan";
        alphaArray[18735] = "dui";
        alphaArray[18731] = "dun";
        alphaArray[18722] = "duo";
        alphaArray[18710] = "e";
        alphaArray[18697] = "en";
        alphaArray[18696] = "er";
        alphaArray[18526] = "fa";
        alphaArray[18518] = "fan";
        alphaArray[18501] = "fang";
        alphaArray[18490] = "fei";
        alphaArray[18478] = "fen";
        alphaArray[18463] = "feng";
        alphaArray[18448] = "fo";
        alphaArray[18447] = "fou";
        alphaArray[18446] = "fu";
        alphaArray[18239] = "ga";
        alphaArray[18237] = "gai";
        alphaArray[18231] = "gan";
        alphaArray[18220] = "gang";
        alphaArray[18211] = "gao";
        alphaArray[18201] = "ge";
        alphaArray[18184] = "gei";
        alphaArray[18183] = "gen";
        alphaArray[18181] = "geng";
        alphaArray[18012] = "gong";
        alphaArray[17997] = "gou";
        alphaArray[17988] = "gu";
        alphaArray[17970] = "gua";
        alphaArray[17964] = "guai";
        alphaArray[17961] = "guan";
        alphaArray[17950] = "guang";
        alphaArray[17947] = "gui";
        alphaArray[17931] = "gun";
        alphaArray[17928] = "guo";
        alphaArray[17922] = "ha";
        alphaArray[17759] = "hai";
        alphaArray[17752] = "han";
        alphaArray[17733] = "hang";
        alphaArray[17730] = "hao";
        alphaArray[17721] = "he";
        alphaArray[17703] = "hei";
        alphaArray[17701] = "hen";
        alphaArray[17697] = "heng";
        alphaArray[17692] = "hong";
        alphaArray[17683] = "hou";
        alphaArray[17676] = "hu";
        alphaArray[17496] = "hua";
        alphaArray[17487] = "huai";
        alphaArray[17482] = "huan";
        alphaArray[17468] = "huang";
        alphaArray[17454] = "hui";
        alphaArray[17433] = "hun";
        alphaArray[17427] = "huo";
        alphaArray[17417] = "ji";
        alphaArray[17202] = "jia";
        alphaArray[17185] = "jian";
        alphaArray[16983] = "jiang";
        alphaArray[16970] = "jiao";
        alphaArray[16942] = "jie";
        alphaArray[16915] = "jin";
        alphaArray[16733] = "jing";
        alphaArray[16708] = "jiong";
        alphaArray[16706] = "jiu";
        alphaArray[16689] = "ju";
        alphaArray[16664] = "juan";
        alphaArray[16657] = "jue";
        alphaArray[16647] = "jun";
        alphaArray[16474] = "ka";
        alphaArray[16470] = "kai";
        alphaArray[16465] = "kan";
        alphaArray[16459] = "kang";
        alphaArray[16452] = "kao";
        alphaArray[16448] = "ke";
        alphaArray[16433] = "ken";
        alphaArray[16429] = "keng";
        alphaArray[16427] = "kong";
        alphaArray[16423] = "kou";
        alphaArray[16419] = "ku";
        alphaArray[16412] = "kua";
        alphaArray[16407] = "kuai";
        alphaArray[16403] = "kuan";
        alphaArray[16401] = "kuang";
        alphaArray[16393] = "kui";
        alphaArray[16220] = "kun";
        alphaArray[16216] = "kuo";
        alphaArray[16212] = "la";
        alphaArray[16205] = "lai";
        alphaArray[16202] = "lan";
        alphaArray[16187] = "lang";
        alphaArray[16180] = "lao";
        alphaArray[16171] = "le";
        alphaArray[16169] = "lei";
        alphaArray[16158] = "leng";
        alphaArray[16155] = "li";
        alphaArray[15959] = "lia";
        alphaArray[15958] = "lian";
        alphaArray[15944] = "liang";
        alphaArray[15933] = "liao";
        alphaArray[15920] = "lie";
        alphaArray[15915] = "lin";
        alphaArray[15903] = "ling";
        alphaArray[15889] = "liu";
        alphaArray[15878] = "long";
        alphaArray[15707] = "lou";
        alphaArray[15701] = "lu";
        alphaArray[15681] = "lv";
        alphaArray[15667] = "luan";
        alphaArray[15661] = "lue";
        alphaArray[15659] = "lun";
        alphaArray[15652] = "luo";
        alphaArray[15640] = "ma";
        alphaArray[15631] = "mai";
        alphaArray[15625] = "man";
        alphaArray[15454] = "mang";
        alphaArray[15448] = "mao";
        alphaArray[15436] = "me";
        alphaArray[15435] = "mei";
        alphaArray[15419] = "men";
        alphaArray[15416] = "meng";
        alphaArray[15408] = "mi";
        alphaArray[15394] = "mian";
        alphaArray[15385] = "miao";
        alphaArray[15377] = "mie";
        alphaArray[15375] = "min";
        alphaArray[15369] = "ming";
        alphaArray[15363] = "miu";
        alphaArray[15362] = "mo";
        alphaArray[15183] = "mou";
        alphaArray[15180] = "mu";
        alphaArray[15165] = "na";
        alphaArray[15158] = "nai";
        alphaArray[15153] = "nan";
        alphaArray[15150] = "nang";
        alphaArray[15149] = "nao";
        alphaArray[15144] = "ne";
        alphaArray[15143] = "nei";
        alphaArray[15141] = "nen";
        alphaArray[15140] = "neng";
        alphaArray[15139] = "ni";
        alphaArray[15128] = "nian";
        alphaArray[15121] = "niang";
        alphaArray[15119] = "niao";
        alphaArray[15117] = "nie";
        alphaArray[15110] = "nin";
        alphaArray[15109] = "ning";
        alphaArray[14941] = "niu";
        alphaArray[14937] = "nong";
        alphaArray[14933] = "nu";
        alphaArray[14930] = "nv";
        alphaArray[14929] = "nuan";
        alphaArray[14928] = "nue";
        alphaArray[14926] = "nuo";
        alphaArray[14922] = "o";
        alphaArray[14921] = "ou";
        alphaArray[14914] = "pa";
        alphaArray[14908] = "pai";
        alphaArray[14902] = "pan";
        alphaArray[14894] = "pang";
        alphaArray[14889] = "pao";
        alphaArray[14882] = "pei";
        alphaArray[14873] = "pen";
        alphaArray[14871] = "peng";
        alphaArray[14857] = "pi";
        alphaArray[14678] = "pian";
        alphaArray[14674] = "piao";
        alphaArray[14670] = "pie";
        alphaArray[14668] = "pin";
        alphaArray[14663] = "ping";
        alphaArray[14654] = "po";
        alphaArray[14645] = "pu";
        alphaArray[14630] = "qi";
        alphaArray[14594] = "qia";
        alphaArray[14429] = "qian";
        alphaArray[14407] = "qiang";
        alphaArray[14399] = "qiao";
        alphaArray[14384] = "qie";
        alphaArray[14379] = "qin";
        alphaArray[14368] = "qing";
        alphaArray[14355] = "qiong";
        alphaArray[14353] = "qiu";
        alphaArray[14345] = "qu";
        alphaArray[14170] = "quan";
        alphaArray[14159] = "que";
        alphaArray[14151] = "qun";
        alphaArray[14149] = "ran";
        alphaArray[14145] = "rang";
        alphaArray[14140] = "rao";
        alphaArray[14137] = "re";
        alphaArray[14135] = "ren";
        alphaArray[14125] = "reng";
        alphaArray[14123] = "ri";
        alphaArray[14122] = "rong";
        alphaArray[14112] = "rou";
        alphaArray[14109] = "ru";
        alphaArray[14099] = "ruan";
        alphaArray[14097] = "rui";
        alphaArray[14094] = "run";
        alphaArray[14092] = "ruo";
        alphaArray[14090] = "sa";
        alphaArray[14087] = "sai";
        alphaArray[14083] = "san";
        alphaArray[13917] = "sang";
        alphaArray[13914] = "sao";
        alphaArray[13910] = "se";
        alphaArray[13907] = "sen";
        alphaArray[13906] = "seng";
        alphaArray[13905] = "sha";
        alphaArray[13896] = "shai";
        alphaArray[13894] = "shan";
        alphaArray[13878] = "shang";
        alphaArray[13870] = "shao";
        alphaArray[13859] = "she";
        alphaArray[13847] = "shen";
        alphaArray[13831] = "sheng";
        alphaArray[13658] = "shi";
        alphaArray[13611] = "shou";
        alphaArray[13601] = "shu";
        alphaArray[13406] = "shua";
        alphaArray[13404] = "shuai";
        alphaArray[13400] = "shuan";
        alphaArray[13398] = "shuang";
        alphaArray[13395] = "shui";
        alphaArray[13391] = "shun";
        alphaArray[13387] = "shuo";
        alphaArray[13383] = "si";
        alphaArray[13367] = "song";
        alphaArray[13359] = "sou";
        alphaArray[13356] = "su";
        alphaArray[13343] = "suan";
        alphaArray[13340] = "sui";
        alphaArray[13329] = "sun";
        alphaArray[13326] = "suo";
        alphaArray[13318] = "ta";
        alphaArray[13147] = "tai";
        alphaArray[13138] = "tan";
        alphaArray[13120] = "tang";
        alphaArray[13107] = "tao";
        alphaArray[13096] = "te";
        alphaArray[13095] = "teng";
        alphaArray[13091] = "ti";
        alphaArray[13076] = "tian";
        alphaArray[13068] = "tiao";
        alphaArray[13063] = "tie";
        alphaArray[13060] = "ting";
        alphaArray[12888] = "tong";
        alphaArray[12875] = "tou";
        alphaArray[12871] = "tu";
        alphaArray[12860] = "tuan";
        alphaArray[12858] = "tui";
        alphaArray[12852] = "tun";
        alphaArray[12849] = "tuo";
        alphaArray[12838] = "wa";
        alphaArray[12831] = "wai";
        alphaArray[12829] = "wan";
        alphaArray[12812] = "wang";
        alphaArray[12802] = "wei";
        alphaArray[12607] = "wen";
        alphaArray[12597] = "weng";
        alphaArray[12594] = "wo";
        alphaArray[12585] = "wu";
        alphaArray[12556] = "xi";
        alphaArray[12359] = "xia";
        alphaArray[12346] = "xian";
        alphaArray[12320] = "xiang";
        alphaArray[12300] = "xiao";
        alphaArray[12120] = "xie";
        alphaArray[12099] = "xin";
        alphaArray[12089] = "xing";
        alphaArray[12074] = "xiong";
        alphaArray[12067] = "xiu";
        alphaArray[12058] = "xu";
        alphaArray[12039] = "xuan";
        alphaArray[11867] = "xue";
        alphaArray[11861] = "xun";
        alphaArray[11847] = "ya";
        alphaArray[11831] = "yan";
        alphaArray[11798] = "yang";
        alphaArray[11781] = "yao";
        alphaArray[11604] = "ye";
        alphaArray[11589] = "yi";
        alphaArray[11536] = "yin";
        alphaArray[11358] = "ying";
        alphaArray[11340] = "yo";
        alphaArray[11339] = "yong";
        alphaArray[11324] = "you";
        alphaArray[11303] = "yu";
        alphaArray[11097] = "yuan";
        alphaArray[11077] = "yue";
        alphaArray[11067] = "yun";
        alphaArray[11055] = "za";
        alphaArray[11052] = "zai";
        alphaArray[11045] = "zan";
        alphaArray[11041] = "zang";
        alphaArray[11038] = "zao";
        alphaArray[11024] = "ze";
        alphaArray[11020] = "zei";
        alphaArray[11019] = "zen";
        alphaArray[11018] = "zeng";
        alphaArray[11014] = "zha";
        alphaArray[10838] = "zhai";
        alphaArray[10832] = "zhan";
        alphaArray[10815] = "zhang";
        alphaArray[10800] = "zhao";
        alphaArray[10790] = "zhe";
        alphaArray[10780] = "zhen";
        alphaArray[10764] = "zheng";
        alphaArray[10587] = "zhi";
        alphaArray[10544] = "zhong";
        alphaArray[10533] = "zhou";
        alphaArray[10519] = "zhu";
        alphaArray[10331] = "zhua";
        alphaArray[10329] = "zhuai";
        alphaArray[10328] = "zhuan";
        alphaArray[10322] = "zhuang";
        alphaArray[10315] = "zhui";
        alphaArray[10309] = "zhun";
        alphaArray[10307] = "zhuo";
        alphaArray[10296] = "zi";
        alphaArray[10281] = "zong";
        alphaArray[10274] = "zou";
        alphaArray[10270] = "zu";
        alphaArray[10262] = "zuan";
        alphaArray[10260] = "zui";
        alphaArray[10256] = "zun";
        alphaArray[10254] = "zuo";
    }

    private static int[] positionArray = new int[396];
    static {
        positionArray[395] = 20319;
        positionArray[394] = 20317;
        positionArray[393] = 20304;
        positionArray[392] = 20295;
        positionArray[391] = 20292;
        positionArray[390] = 20283;
        positionArray[389] = 20265;
        positionArray[388] = 20257;
        positionArray[387] = 20242;
        positionArray[386] = 20230;
        positionArray[385] = 20051;
        positionArray[384] = 20036;
        positionArray[383] = 20032;
        positionArray[382] = 20026;
        positionArray[381] = 20002;
        positionArray[380] = 19990;
        positionArray[379] = 19986;
        positionArray[378] = 19982;
        positionArray[377] = 19976;
        positionArray[376] = 19805;
        positionArray[375] = 19784;
        positionArray[374] = 19775;
        positionArray[373] = 19774;
        positionArray[372] = 19763;
        positionArray[371] = 19756;
        positionArray[370] = 19751;
        positionArray[369] = 19746;
        positionArray[368] = 19741;
        positionArray[367] = 19739;
        positionArray[366] = 19728;
        positionArray[365] = 19725;
        positionArray[364] = 19715;
        positionArray[363] = 19540;
        positionArray[362] = 19531;
        positionArray[361] = 19525;
        positionArray[360] = 19515;
        positionArray[359] = 19500;
        positionArray[358] = 19484;
        positionArray[357] = 19479;
        positionArray[356] = 19467;
        positionArray[355] = 19289;
        positionArray[354] = 19288;
        positionArray[353] = 19281;
        positionArray[352] = 19275;
        positionArray[351] = 19270;
        positionArray[350] = 19263;
        positionArray[349] = 19261;
        positionArray[348] = 19249;
        positionArray[347] = 19243;
        positionArray[346] = 19242;
        positionArray[345] = 19238;
        positionArray[344] = 19235;
        positionArray[343] = 19227;
        positionArray[342] = 19224;
        positionArray[341] = 19218;
        positionArray[340] = 19212;
        positionArray[339] = 19038;
        positionArray[338] = 19023;
        positionArray[337] = 19018;
        positionArray[336] = 19006;
        positionArray[335] = 19003;
        positionArray[334] = 18996;
        positionArray[333] = 18977;
        positionArray[332] = 18961;
        positionArray[331] = 18952;
        positionArray[330] = 18783;
        positionArray[329] = 18774;
        positionArray[328] = 18773;
        positionArray[327] = 18763;
        positionArray[326] = 18756;
        positionArray[325] = 18741;
        positionArray[324] = 18735;
        positionArray[323] = 18731;
        positionArray[322] = 18722;
        positionArray[321] = 18710;
        positionArray[320] = 18697;
        positionArray[319] = 18696;
        positionArray[318] = 18526;
        positionArray[317] = 18518;
        positionArray[316] = 18501;
        positionArray[315] = 18490;
        positionArray[314] = 18478;
        positionArray[313] = 18463;
        positionArray[312] = 18448;
        positionArray[311] = 18447;
        positionArray[310] = 18446;
        positionArray[309] = 18239;
        positionArray[308] = 18237;
        positionArray[307] = 18231;
        positionArray[306] = 18220;
        positionArray[305] = 18211;
        positionArray[304] = 18201;
        positionArray[303] = 18184;
        positionArray[302] = 18183;
        positionArray[301] = 18181;
        positionArray[300] = 18012;
        positionArray[299] = 17997;
        positionArray[298] = 17988;
        positionArray[297] = 17970;
        positionArray[296] = 17964;
        positionArray[295] = 17961;
        positionArray[294] = 17950;
        positionArray[293] = 17947;
        positionArray[292] = 17931;
        positionArray[291] = 17928;
        positionArray[290] = 17922;
        positionArray[289] = 17759;
        positionArray[288] = 17752;
        positionArray[287] = 17733;
        positionArray[286] = 17730;
        positionArray[285] = 17721;
        positionArray[284] = 17703;
        positionArray[283] = 17701;
        positionArray[282] = 17697;
        positionArray[281] = 17692;
        positionArray[280] = 17683;
        positionArray[279] = 17676;
        positionArray[278] = 17496;
        positionArray[277] = 17487;
        positionArray[276] = 17482;
        positionArray[275] = 17468;
        positionArray[274] = 17454;
        positionArray[273] = 17433;
        positionArray[272] = 17427;
        positionArray[271] = 17417;
        positionArray[270] = 17202;
        positionArray[269] = 17185;
        positionArray[268] = 16983;
        positionArray[267] = 16970;
        positionArray[266] = 16942;
        positionArray[265] = 16915;
        positionArray[264] = 16733;
        positionArray[263] = 16708;
        positionArray[262] = 16706;
        positionArray[261] = 16689;
        positionArray[260] = 16664;
        positionArray[259] = 16657;
        positionArray[258] = 16647;
        positionArray[257] = 16474;
        positionArray[256] = 16470;
        positionArray[255] = 16465;
        positionArray[254] = 16459;
        positionArray[253] = 16452;
        positionArray[252] = 16448;
        positionArray[251] = 16433;
        positionArray[250] = 16429;
        positionArray[249] = 16427;
        positionArray[248] = 16423;
        positionArray[247] = 16419;
        positionArray[246] = 16412;
        positionArray[245] = 16407;
        positionArray[244] = 16403;
        positionArray[243] = 16401;
        positionArray[242] = 16393;
        positionArray[241] = 16220;
        positionArray[240] = 16216;
        positionArray[239] = 16212;
        positionArray[238] = 16205;
        positionArray[237] = 16202;
        positionArray[236] = 16187;
        positionArray[235] = 16180;
        positionArray[234] = 16171;
        positionArray[233] = 16169;
        positionArray[232] = 16158;
        positionArray[231] = 16155;
        positionArray[230] = 15959;
        positionArray[229] = 15958;
        positionArray[228] = 15944;
        positionArray[227] = 15933;
        positionArray[226] = 15920;
        positionArray[225] = 15915;
        positionArray[224] = 15903;
        positionArray[223] = 15889;
        positionArray[222] = 15878;
        positionArray[221] = 15707;
        positionArray[220] = 15701;
        positionArray[219] = 15681;
        positionArray[218] = 15667;
        positionArray[217] = 15661;
        positionArray[216] = 15659;
        positionArray[215] = 15652;
        positionArray[214] = 15640;
        positionArray[213] = 15631;
        positionArray[212] = 15625;
        positionArray[211] = 15454;
        positionArray[210] = 15448;
        positionArray[209] = 15436;
        positionArray[208] = 15435;
        positionArray[207] = 15419;
        positionArray[206] = 15416;
        positionArray[205] = 15408;
        positionArray[204] = 15394;
        positionArray[203] = 15385;
        positionArray[202] = 15377;
        positionArray[201] = 15375;
        positionArray[200] = 15369;
        positionArray[199] = 15363;
        positionArray[198] = 15362;
        positionArray[197] = 15183;
        positionArray[196] = 15180;
        positionArray[195] = 15165;
        positionArray[194] = 15158;
        positionArray[193] = 15153;
        positionArray[192] = 15150;
        positionArray[191] = 15149;
        positionArray[190] = 15144;
        positionArray[189] = 15143;
        positionArray[188] = 15141;
        positionArray[187] = 15140;
        positionArray[186] = 15139;
        positionArray[185] = 15128;
        positionArray[184] = 15121;
        positionArray[183] = 15119;
        positionArray[182] = 15117;
        positionArray[181] = 15110;
        positionArray[180] = 15109;
        positionArray[179] = 14941;
        positionArray[178] = 14937;
        positionArray[177] = 14933;
        positionArray[176] = 14930;
        positionArray[175] = 14929;
        positionArray[174] = 14928;
        positionArray[173] = 14926;
        positionArray[172] = 14922;
        positionArray[171] = 14921;
        positionArray[170] = 14914;
        positionArray[169] = 14908;
        positionArray[168] = 14902;
        positionArray[167] = 14894;
        positionArray[166] = 14889;
        positionArray[165] = 14882;
        positionArray[164] = 14873;
        positionArray[163] = 14871;
        positionArray[162] = 14857;
        positionArray[161] = 14678;
        positionArray[160] = 14674;
        positionArray[159] = 14670;
        positionArray[158] = 14668;
        positionArray[157] = 14663;
        positionArray[156] = 14654;
        positionArray[155] = 14645;
        positionArray[154] = 14630;
        positionArray[153] = 14594;
        positionArray[152] = 14429;
        positionArray[151] = 14407;
        positionArray[150] = 14399;
        positionArray[149] = 14384;
        positionArray[148] = 14379;
        positionArray[147] = 14368;
        positionArray[146] = 14355;
        positionArray[145] = 14353;
        positionArray[144] = 14345;
        positionArray[143] = 14170;
        positionArray[142] = 14159;
        positionArray[141] = 14151;
        positionArray[140] = 14149;
        positionArray[139] = 14145;
        positionArray[138] = 14140;
        positionArray[137] = 14137;
        positionArray[136] = 14135;
        positionArray[135] = 14125;
        positionArray[134] = 14123;
        positionArray[133] = 14122;
        positionArray[132] = 14112;
        positionArray[131] = 14109;
        positionArray[130] = 14099;
        positionArray[129] = 14097;
        positionArray[128] = 14094;
        positionArray[127] = 14092;
        positionArray[126] = 14090;
        positionArray[125] = 14087;
        positionArray[124] = 14083;
        positionArray[123] = 13917;
        positionArray[122] = 13914;
        positionArray[121] = 13910;
        positionArray[120] = 13907;
        positionArray[119] = 13906;
        positionArray[118] = 13905;
        positionArray[117] = 13896;
        positionArray[116] = 13894;
        positionArray[115] = 13878;
        positionArray[114] = 13870;
        positionArray[113] = 13859;
        positionArray[112] = 13847;
        positionArray[111] = 13831;
        positionArray[110] = 13658;
        positionArray[109] = 13611;
        positionArray[108] = 13601;
        positionArray[107] = 13406;
        positionArray[106] = 13404;
        positionArray[105] = 13400;
        positionArray[104] = 13398;
        positionArray[103] = 13395;
        positionArray[102] = 13391;
        positionArray[101] = 13387;
        positionArray[100] = 13383;
        positionArray[99] = 13367;
        positionArray[98] = 13359;
        positionArray[97] = 13356;
        positionArray[96] = 13343;
        positionArray[95] = 13340;
        positionArray[94] = 13329;
        positionArray[93] = 13326;
        positionArray[92] = 13318;
        positionArray[91] = 13147;
        positionArray[90] = 13138;
        positionArray[89] = 13120;
        positionArray[88] = 13107;
        positionArray[87] = 13096;
        positionArray[86] = 13095;
        positionArray[85] = 13091;
        positionArray[84] = 13076;
        positionArray[83] = 13068;
        positionArray[82] = 13063;
        positionArray[81] = 13060;
        positionArray[80] = 12888;
        positionArray[79] = 12875;
        positionArray[78] = 12871;
        positionArray[77] = 12860;
        positionArray[76] = 12858;
        positionArray[75] = 12852;
        positionArray[74] = 12849;
        positionArray[73] = 12838;
        positionArray[72] = 12831;
        positionArray[71] = 12829;
        positionArray[70] = 12812;
        positionArray[69] = 12802;
        positionArray[68] = 12607;
        positionArray[67] = 12597;
        positionArray[66] = 12594;
        positionArray[65] = 12585;
        positionArray[64] = 12556;
        positionArray[63] = 12359;
        positionArray[62] = 12346;
        positionArray[61] = 12320;
        positionArray[60] = 12300;
        positionArray[59] = 12120;
        positionArray[58] = 12099;
        positionArray[57] = 12089;
        positionArray[56] = 12074;
        positionArray[55] = 12067;
        positionArray[54] = 12058;
        positionArray[53] = 12039;
        positionArray[52] = 11867;
        positionArray[51] = 11861;
        positionArray[50] = 11847;
        positionArray[49] = 11831;
        positionArray[48] = 11798;
        positionArray[47] = 11781;
        positionArray[46] = 11604;
        positionArray[45] = 11589;
        positionArray[44] = 11536;
        positionArray[43] = 11358;
        positionArray[42] = 11340;
        positionArray[41] = 11339;
        positionArray[40] = 11324;
        positionArray[39] = 11303;
        positionArray[38] = 11097;
        positionArray[37] = 11077;
        positionArray[36] = 11067;
        positionArray[35] = 11055;
        positionArray[34] = 11052;
        positionArray[33] = 11045;
        positionArray[32] = 11041;
        positionArray[31] = 11038;
        positionArray[30] = 11024;
        positionArray[29] = 11020;
        positionArray[28] = 11019;
        positionArray[27] = 11018;
        positionArray[26] = 11014;
        positionArray[25] = 10838;
        positionArray[24] = 10832;
        positionArray[23] = 10815;
        positionArray[22] = 10800;
        positionArray[21] = 10790;
        positionArray[20] = 10780;
        positionArray[19] = 10764;
        positionArray[18] = 10587;
        positionArray[17] = 10544;
        positionArray[16] = 10533;
        positionArray[15] = 10519;
        positionArray[14] = 10331;
        positionArray[13] = 10329;
        positionArray[12] = 10328;
        positionArray[11] = 10322;
        positionArray[10] = 10315;
        positionArray[9] = 10309;
        positionArray[8] = 10307;
        positionArray[7] = 10296;
        positionArray[6] = 10281;
        positionArray[5] = 10274;
        positionArray[4] = 10270;
        positionArray[3] = 10262;
        positionArray[2] = 10260;
        positionArray[1] = 10256;
        positionArray[0] = 10254;
    }

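    /**
     * Returns the pinyin of the input string. Symbols and English letters are
     * returned unchanged.
     * <p>Polyphonic characters are not handled yet: for example, 行业 (hangye) is
     * converted to "xingye". Only the GB2312 character set can be resolved.
     * @param gb2312Str the input string
     * @return the pinyin string, or null if the input has no text
     */
    public static String getAlpha(String gb2312Str) {
        if (!StringUtils.hasText(gb2312Str)) {
            return null;
        }
        char[] singleGB2312Chars = gb2312Str.toCharArray();
        // StringBuilder suffices here: the buffer never leaves this method.
        StringBuilder alphaBuffer = new StringBuilder();
        for (int i = 0; i < singleGB2312Chars.length; i++) {
            String alpha = getSpellByAscii(getCnAscii(singleGB2312Chars[i]));
            if (null == alpha) {
                // No pinyin found: keep the original character.
                alphaBuffer.append(singleGB2312Chars[i]);
            } else {
                alphaBuffer.append(alpha);
            }
        }
        return alphaBuffer.toString();
    }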

    /**
     * Returns the uppercase first letter of the pinyin of each character in the
     * given string. Characters with no pinyin in the table are skipped.
     * @param gb2312Str the input string
     * @return the uppercase initials, or null if the input has no text
     */
    public static String getCapitalAlpha(String gb2312Str) {
        if (!StringUtils.hasText(gb2312Str)) {
            return null;
        }
        char[] singleGB2312Chars = gb2312Str.toCharArray();
        StringBuilder alphaBuffer = new StringBuilder();
        for (int i = 0; i < singleGB2312Chars.length; i++) {
            String alpha = getSpellByAscii(getCnAscii(singleGB2312Chars[i]));
            if (null != alpha) {
                alphaBuffer.append(alpha.toUpperCase().charAt(0));
            }
        }
        return alphaBuffer.toString();
    }

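    /**
     * Returns the code of a single character.
     * 
     * @param cn
     *            char the character
     * @return int 0 on error; otherwise the code, which is negative for a
     *         two-byte GB2312 character (two-byte value minus 65536)
     */
    private static int getCnAscii(char cn) {
        if (cn < 0x80) { // plain ASCII: the code is the character itself
            return cn;
        }
        // Encode explicitly as GB2312. The original called getBytes() with no
        // argument, which uses the platform default charset and produces wrong
        // codes on, for example, a UTF-8 system. This relies on the JRE
        // shipping the GB2312 charset.
        byte[] bytes = String.valueOf(cn).getBytes(java.nio.charset.Charset.forName("GB2312"));
        if (bytes.length != 2) { // not representable as a two-byte GB2312 code
            return 0;
        }
        int highByte = bytes[0] & 0xFF; // undo Java's signed byte
        int lowByte = bytes[1] & 0xFF;
        return (256 * highByte + lowByte) - 256 * 256;
    }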

    /**
     * Looks up the pinyin for a character code.
     * 
     * @param ascii
     *            int the code returned by getCnAscii
     * @return String the pinyin, or null if the code is outside the table
     */
    private static String getSpellByAscii(int ascii) {
        if (ascii > 0 && ascii < 160) { // single byte: English or other half-width character
            return String.valueOf((char) ascii);
        }
        if (ascii < -20319 || ascii > -10247) { // unknown character
            return null;
        }
        int key = Math.abs(ascii);
        int startIndex = positionSearch(key);
        if (0 == startIndex && positionArray[startIndex + 1] < key) {
            return null;
        }
        return alphaArray[positionArray[startIndex]];
    }

    /**
     * Binary search for the boundary code that covers the given key.
     * 
     * @param key
     *            absolute value of the character code; the caller guarantees
     *            it does not exceed the largest boundary (20319)
     * @return index of the smallest boundary code that is &gt;= key
     */
    private static int positionSearch(int key) {
        int low = 0;
        int high = positionArray.length - 1;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (positionArray[mid] < key) {
                low = mid + 1; // boundary too small: look in the upper half
            } else {
                high = mid; // candidate found: keep it and look for a smaller one
            }
        }
        return low;
    }
}
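To make the arithmetic in `getCnAscii` concrete, here is a small worked sketch. The class `SignedCodeSketch` and the method `signedCode` are hypothetical names used only for illustration; the bytes 0xB0 0xA1 are the GB2312 encoding of 啊, the first entry of the table.

```java
public class SignedCodeSketch {
    /** Combines two GB2312 bytes into the signed value that getCnAscii computes. */
    public static int signedCode(byte high, byte low) {
        int highByte = high & 0xFF; // undo Java's signed byte: 0xB0 is stored as -80
        int lowByte = low & 0xFF;
        return (256 * highByte + lowByte) - 256 * 256;
    }

    public static void main(String[] args) {
        // 176 * 256 + 161 = 45217, and 45217 - 65536 = -20319.
        int code = signedCode((byte) 0xB0, (byte) 0xA1);
        System.out.println(code);           // -20319
        System.out.println(Math.abs(code)); // 20319, the index of "a" in alphaArray
    }
}
```

Taking the absolute value reverses the ordering, which is why the largest boundary (20319) belongs to the first pinyin, "a", and the smallest (10254) to the last, "zuo".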

Comments
#1 xiaojing4037429 2013-02-28
The GB2312 level-2 character set isn't arranged by pinyin, is it?
