
HDUOJ1053 Entropy


Entropy

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 1096    Accepted Submission(s): 411

Problem Description
An entropy encoder is a data encoding method that achieves lossless data compression by encoding a message with “wasted” or “extra” information removed. In other words, entropy encoding removes information that was not necessary in the first place to accurately encode the message. A high degree of entropy implies a message with a great deal of wasted information; English text encoded in ASCII is an example of a message type that has very high entropy. Already compressed messages, such as JPEG graphics or ZIP archives, have very little entropy and do not benefit from further attempts at entropy encoding.

English text encoded in ASCII has a high degree of entropy because all characters are encoded using the same number of bits, eight. It is a known fact that the letters E, L, N, R, S and T occur at a considerably higher frequency than do most other letters in English text. If a way could be found to encode just these letters with four bits, then the new encoding would be smaller, would contain all the original information, and would have less entropy. ASCII uses a fixed number of bits for a reason, however: it’s easy, since one is always dealing with a fixed number of bits to represent each possible glyph or character. How would an encoding scheme that used four bits for the above letters be able to distinguish between the four-bit codes and eight-bit codes? This seemingly difficult problem is solved using what is known as a “prefix-free variable-length” encoding.

In such an encoding, any number of bits can be used to represent any glyph, and glyphs not present in the message are simply not encoded. However, in order to be able to recover the information, no bit pattern that encodes a glyph is allowed to be the prefix of any other encoding bit pattern. This allows the encoded bitstream to be read bit by bit, and whenever a set of bits is encountered that represents a glyph, that glyph can be decoded. If the prefix-free constraint were not enforced, then such a decoding would be impossible.

Consider the text “AAAAABCD”. Using ASCII, encoding this would require 64 bits. If, instead, we encode “A” with the bit pattern “00”, “B” with “01”, “C” with “10”, and “D” with “11” then we can encode this text in only 16 bits; the resulting bit pattern would be “0000000000011011”. This is still a fixed-length encoding, however; we’re using two bits per glyph instead of eight. Since the glyph “A” occurs with greater frequency, could we do better by encoding it with fewer bits? In fact we can, but in order to maintain a prefix-free encoding, some of the other bit patterns will become longer than two bits. An optimal encoding is to encode “A” with “0”, “B” with “10”, “C” with “110”, and “D” with “111”. (This is clearly not the only optimal encoding, as it is obvious that the encodings for B, C and D could be interchanged freely for any given encoding without increasing the size of the final encoded message.) Using this encoding, the message encodes in only 13 bits to “0000010110111”, a compression ratio of 4.9 to 1 (that is, each bit in the final encoded message represents as much information as did 4.9 bits in the original encoding). Read through this bit pattern from left to right and you’ll see that the prefix-free encoding makes it simple to decode this into the original text even though the codes have varying bit lengths.
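To make the decoding step concrete, here is a minimal C++ sketch (purely illustrative, not part of the original problem, which asks only for lengths) that decodes the 13-bit message above using the example’s code table. Because no code is a prefix of another, the first match while scanning left to right is always the intended glyph:

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // Code table from the example: A=0, B=10, C=110, D=111.
        std::map<std::string, char> table{
            {"0", 'A'}, {"10", 'B'}, {"110", 'C'}, {"111", 'D'}};
        std::string bits = "0000010110111", buf, out;
        for (char b : bits) {
            buf += b;                      // accumulate bits
            auto it = table.find(buf);     // prefix-free: a match is unambiguous
            if (it != table.end()) {
                out += it->second;
                buf.clear();
            }
        }
        std::cout << out << '\n';          // prints AAAAABCD
    }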

As a second example, consider the text “THE CAT IN THE HAT”. In this text, the letter “T” and the space character both occur with the highest frequency, so they will clearly have the shortest encoding bit patterns in an optimal encoding. The letters “C”, “I” and “N” only occur once, however, so they will have the longest codes.

There are many possible sets of prefix-free variable-length bit patterns that would yield the optimal encoding, that is, that would allow the text to be encoded in the fewest number of bits. One such optimal encoding is to encode spaces with “00”, “A” with “100”, “C” with “1110”, “E” with “1111”, “H” with “110”, “I” with “1010”, “N” with “1011” and “T” with “01”. The optimal encoding therefore requires only 51 bits compared to the 144 that would be necessary to encode the message with 8-bit ASCII encoding, a compression ratio of 2.8 to 1.
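As a check, weighting each code length by its character’s frequency in the 18-character message reproduces the total: “T” and the space each occur 4 times at 2 bits (8 + 8), “H” occurs 3 times at 3 bits (9), “A” twice at 3 bits (6), “E” twice at 4 bits (8), and “C”, “I” and “N” once each at 4 bits (12), giving 8 + 8 + 9 + 6 + 8 + 12 = 51 bits; the ratio is 144/51 ≈ 2.8.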
Input
The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word “END” as the text string. This line should not be processed.
Output
For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal place.
Sample Input
AAAAABCD
THE_CAT_IN_THE_HAT
END
Sample Output
64 13 4.9
144 51 2.8
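The optimal prefix-free length asked for here is exactly the cost of a Huffman code, which can be computed without building the tree: repeatedly merge the two smallest frequency counts and add each merged total to a running sum (every merge pushes one extra bit onto each leaf below it). Below is a minimal C++ sketch along those lines; the post itself contains no code, so this is one possible approach rather than the author’s solution:

    #include <cstdio>
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <string>
    #include <vector>

    int main() {
        std::string s;
        while (std::getline(std::cin, s) && s != "END") {
            int freq[256] = {0};
            for (unsigned char c : s) freq[c]++;

            // Min-heap of subtree weights.
            std::priority_queue<long long, std::vector<long long>,
                                std::greater<long long>> pq;
            for (int f : freq)
                if (f) pq.push(f);

            long long huff = 0;
            if (pq.size() == 1) {
                huff = s.size();           // one distinct glyph: 1 bit per character
            } else {
                while (pq.size() > 1) {    // merge the two lightest subtrees
                    long long a = pq.top(); pq.pop();
                    long long b = pq.top(); pq.pop();
                    huff += a + b;         // each merge adds 1 bit to every leaf below
                    pq.push(a + b);
                }
            }
            long long ascii = 8LL * s.size();
            std::printf("%lld %lld %.1f\n", ascii, huff, (double)ascii / huff);
        }
        return 0;
    }

On the sample above this prints “64 13 4.9” and “144 51 2.8”. The single-distinct-glyph branch matters: for a line like “AAAA” the merge loop never runs, so without it the optimal length would come out as 0 and the ratio would divide by zero.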