aigo

Why no programming language can exactly represent floats whose denominators are not powers of 2

    Blog category:
  • Math

Original: http://stackoverflow.com/questions/588004/is-floating-point-math-broken

 

Symptoms:

0.1+0.2==0.3->false
0.1+0.2->0.30000000000000004

Any ideas why this happens?

 

Answer 1:

Binary floating point math is like this. In most programming languages, it is based on the IEEE 754 standard. JavaScript uses 64-bit floating point representation, which is the same as Java's double. The crux of the problem is that numbers are represented in this format as a whole number times a power of two; rational numbers (such as 0.1, which is 1/10) whose denominator is not a power of two cannot be exactly represented.
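Here is a minimal check in Python. Python is used only because its float is the same IEEE 754 binary64 format on mainstream platforms, which is an assumption about the reader's platform. The check reproduces the result from the question and contrasts it with values whose denominators are powers of two:

```python
# Python's float is an IEEE 754 binary64 double, like JavaScript numbers and Java's double.
a = 0.1 + 0.2
print(a == 0.3)  # False: 0.1, 0.2 and 0.3 have denominators that are not powers of two
print(a)         # 0.30000000000000004

# 1/2, 1/4 and 3/4 have power-of-two denominators, so each is stored exactly
# and their sum is exact too.
b = 0.5 + 0.25
print(b == 0.75)  # True
```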

For 0.1 in the standard binary64 format, the representation can be written exactly as

  • 0.1000000000000000055511151231257827021181583404541015625 in decimal, or
  • 0x1.999999999999ap-4 in C99 hexfloat notation.
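Both representations can be printed directly. In this Python sketch, decimal.Decimal converts a float exactly, and float.hex() emits C99-style hexfloat text:

```python
from decimal import Decimal

x = 0.1
# Decimal(float) converts the stored binary64 value exactly, digit for digit.
exact_decimal = Decimal(x)
# float.hex() prints the same stored value in C99 hexfloat notation.
hexfloat = x.hex()

print(exact_decimal)  # 0.1000000000000000055511151231257827021181583404541015625
print(hexfloat)       # 0x1.999999999999ap-4
```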

In contrast, the rational number 0.1, which is 1/10, can be written exactly as

  • 0.1 in decimal, or
  • 0x1.99999999999999...p-4 in an analogue of C99 hexfloat notation, where the ... represents an unending sequence of 9's.
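The unending 9's come from long division in base 16. After the first digit, the remainder of 1/10 is 3/5, and 3/5 * 16 = 48/5 = 9 + 3/5 produces the same remainder again, forever. The sketch below uses exact rational arithmetic; the helper hex_digits is hypothetical and written only for this illustration:

```python
from fractions import Fraction

def hex_digits(q: Fraction, n: int) -> str:
    """Return the first n hexadecimal digits after the point of q, for 0 <= q < 1."""
    digits = []
    for _ in range(n):
        q *= 16
        d = q.numerator // q.denominator  # integer part = next hex digit
        digits.append("0123456789abcdef"[d])
        q -= d                            # keep only the fractional remainder
    return "".join(digits)

tenth = Fraction(1, 10)
print("0x0." + hex_digits(tenth, 16))                   # 0x0.1999999999999999
# Normalized form: 1/10 = 1.6 * 2**-4, and 1.6 - 1 = 3/5 repeats as 9's.
print("0x1." + hex_digits(tenth * 16 - 1, 16) + "p-4")  # 0x1.9999999999999999p-4
```

A binary64 double must cut this sequence off after 52 fraction bits (13 hex digits) and round, which is where the final "a" in 0x1.999999999999ap-4 comes from.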

The constants 0.2 and 0.3 in your program will also be approximations to their true values. It happens that the closest double to 0.2 is larger than the rational number 0.2 but that the closest double to 0.3 is smaller than the rational number 0.3. The sum of 0.1 and 0.2 winds up being larger than the rational number 0.3 and hence disagreeing with the constant in your code.
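Python's fractions.Fraction recovers the exact rational value stored in a double, so it is easy to check the direction of each rounding described in this paragraph. This is a sketch that again assumes binary64 floats:

```python
from fractions import Fraction

# Fraction(float) gives the exact rational value of the stored double.
print(Fraction(0.2) > Fraction(2, 10))    # True: nearest double to 0.2 is slightly larger
print(Fraction(0.3) < Fraction(3, 10))    # True: nearest double to 0.3 is slightly smaller
total = 0.1 + 0.2
print(Fraction(total) > Fraction(3, 10))  # True: the rounded sum lands above 0.3 ...
print(total == 0.3)                       # False: ... on a different double than the constant
```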

A fairly comprehensive treatment of floating-point arithmetic issues is "What Every Computer Scientist Should Know About Floating-Point Arithmetic". For an easier-to-digest explanation, see floating-point-gui.de.

 

Answer 2:

A Hardware Designer's Perspective

I believe I should add a hardware designer’s perspective to this since I design and build floating point hardware. Knowing the origin of the error may help in understanding what is happening in the software, and ultimately, I hope this helps explain why floating point errors happen and seem to accumulate over time.

1. Overview

From an engineering perspective, most floating point operations will have some element of error, since the hardware that does the floating point computations is only required to have an error of less than one half of one unit in the last place. Therefore, much hardware stops at the precision needed to yield an error of less than one half of one unit in the last place for a single operation, which is especially problematic in floating point division. What constitutes a single operation depends on how many operands the unit takes. For most units it is two, but some take 3 or more operands. Because of this, there is no guarantee that repeated operations will keep the error at a desirable level, since the errors add up over time.
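A loop of additions makes this accumulation easy to see. In the Python sketch below, each += is one separately rounded operation. By contrast, the standard-library math.fsum tracks the lost low-order bits in partial sums and rounds only at the end:

```python
import math

values = [0.1] * 10
naive = 0.0
for v in values:
    naive += v            # each += is one rounded operation
print(naive)              # 0.9999999999999999
print(naive == 1.0)       # False: the per-step rounding errors accumulated
print(math.fsum(values))  # 1.0: partial sums are tracked and rounded once at the end
```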

2. Standards

Most processors follow the IEEE-754 standard, but some use denormalized modes or different standards. For example, IEEE-754 has a denormalized mode that allows representation of very small floating point numbers at the expense of precision. The following, however, will cover the normalized mode of IEEE-754, which is the typical mode of operation.
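Python exposes the boundary between normalized and denormalized (subnormal) doubles. That means the loss of precision can be measured as the gap between neighbouring values, relative to the value itself. This sketch assumes Python 3.9+ for math.ulp:

```python
import math
import sys

smallest_normal = sys.float_info.min  # 2**-1022, the smallest normalized double
smallest_subnormal = math.ulp(0.0)    # 2**-1074, only reachable in the denormal range

# Gap to the next double, relative to the value itself:
print(math.ulp(smallest_normal) / smallest_normal)        # 2**-52 (about 2.2e-16): full precision
print(math.ulp(smallest_subnormal) / smallest_subnormal)  # 1.0: a single significant bit left
print(smallest_subnormal.hex())                           # 0x0.0000000000001p-1022
```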

In the IEEE-754 standard, hardware designers are allowed any value of error/epsilon as long as it's less than one half of one unit in the last place, and the result only has to be less than one half of one unit in the last place for one operation. This explains why the errors add up when operations are repeated. For IEEE-754 double precision, this is the 54th bit, since 53 bits (52 stored bits plus an implicit leading 1) are used to represent the numeric part (normalized), also called the mantissa, of the floating point number (e.g. the 5.3 in 5.3e5). The next sections go into more detail on the causes of hardware error in various floating point operations.
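You can inspect the 52 stored bits and the implicit leading 1 by reinterpreting a double's 64 bits as an integer. The helper fields is hypothetical and written for this sketch. struct is Python's standard byte-packing module, and Python 3.9+ is assumed for the type hints:

```python
import struct

def fields(x: float) -> tuple[int, int, int]:
    """Split a binary64 value into (sign, biased exponent, 52-bit stored fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

sign, exponent, fraction = fields(0.1)
print(sign, exponent - 1023, hex(fraction))  # 0 -4 0x999999999999a

# With the implicit leading 1 there are 53 significant bits, so 2**53 + 1
# (which needs a 54th bit) rounds to a neighbouring double:
print(float(2**53) + 1.0 == float(2**53))    # True
```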

3. Cause of Rounding Error in Division

The main cause of the error in floating point division is the division algorithm used to calculate the quotient. Most computer systems calculate division using multiplication by an inverse, i.e. Z = X/Y = X * (1/Y). Division is computed iteratively, i.e. each cycle computes some bits of the quotient until the desired precision is reached, which for IEEE-754 is anything with an error of less than one half of one unit in the last place. The table of reciprocals of Y (1/Y) is known as the quotient selection table (QST) in slow division, and its size in bits is usually the width of the radix (the number of bits of the quotient computed in each iteration) plus a few guard bits. For IEEE-754 double precision (64-bit), it would be the size of the radix of the divider plus a few guard bits k, where k >= 2. So, for example, a typical quotient selection table for a divider that computes 2 bits of the quotient at a time (radix 4) would be 2 + 2 = 4 bits (plus a few optional bits).
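The reciprocal route only matches X/Y when 1/Y is carried with enough extra precision. The Python sketch below rounds 1/Y to a double first and then multiplies, which is two separately rounded operations. It lands one unit in the last place away from the correctly rounded quotient. It models no particular hardware divider:

```python
x, y = 3.0, 10.0
direct = x / y                   # one correctly rounded division
via_reciprocal = x * (1 / y)     # two roundings: 1/y first, then the product
print(direct)                    # 0.3
print(via_reciprocal)            # 0.30000000000000004
print(direct == via_reciprocal)  # False
```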

3.1 Division Rounding Error: Approximation of Reciprocal

Which reciprocals are in the quotient selection table depends on the division method: slow division such as SRT division, or fast division such as Goldschmidt division. Each entry is modified according to the division algorithm in an attempt to yield the lowest possible error. In any case, though, all reciprocals are approximations of the actual reciprocal and introduce some element of error. Both slow and fast division methods calculate the quotient iteratively, i.e. some number of bits of the quotient are calculated each step, then the result is subtracted from the dividend, and the divider repeats the steps until the error is less than one half of one unit in the last place. Slow division methods calculate a fixed number of digits of the quotient in each step and are usually less expensive to build, while fast division methods calculate a variable number of digits per step and are usually more expensive to build. The most important point is that most division methods rely on repeated multiplication by an approximation of a reciprocal, so they are prone to error.
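The sketch below shows how an approximate reciprocal is refined, using Newton-Raphson reciprocal iteration. This is a fast-division technique closely related to the Goldschmidt method named above, but it is not the same method. The linear seed 48/17 - (32/17)*m stands in for a quotient selection table lookup, which is an assumption for illustration. The function name reciprocal_newton is hypothetical:

```python
import math

def reciprocal_newton(y: float, iterations: int) -> float:
    """Approximate 1/y for y > 0 by Newton-Raphson refinement of a crude seed.

    Illustrative only: the seed stands in for a table lookup, and all
    arithmetic is ordinary (rounded) binary64, with no extra guard bits.
    """
    m, e = math.frexp(y)         # y = m * 2**e with 0.5 <= m < 1
    r = 48 / 17 - (32 / 17) * m  # linear seed: relative error at most 1/17 on [0.5, 1]
    for _ in range(iterations):
        r = r * (2 - m * r)      # in exact arithmetic, the error 1 - m*r is squared
    return math.ldexp(r, -e)     # 1/y = (1/m) * 2**-e

y = 10.0
errors = []
for n in range(5):
    approx = reciprocal_newton(y, n)
    errors.append(abs(approx * y - 1))  # relative error of the approximation
    print(n, approx, errors[-1])
```

Each step roughly squares the relative error, so a few iterations suffice. However, every multiply is itself rounded, so the final value is still only an approximation of 1/y.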

4. Rounding Errors in Other Operations: Truncation

Another cause of rounding errors in all operations is the set of modes for truncating the final answer that IEEE-754 allows: truncation (round-towards-zero), round-to-nearest (the default), round-down, and round-up. All of them introduce an error of less than one unit in the last place for a single operation, and round-to-nearest keeps it within one half of one unit in the last place. Over time and repeated operations, truncation also adds cumulatively to the resultant error. This truncation error is especially problematic in exponentiation, which involves some form of repeated multiplication.
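Python floats always use round-to-nearest. This sketch therefore uses the standard decimal module, which does let you choose the mode, to apply each rounding mode to the same value. It rounds in base 10 to three decimal places rather than in base 2, an assumption made for readability:

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

MODES = {
    "truncate (toward zero)": ROUND_DOWN,
    "round-to-nearest-even": ROUND_HALF_EVEN,
    "round-down (toward -inf)": ROUND_FLOOR,
    "round-up (toward +inf)": ROUND_CEILING,
}
last_place = Decimal("0.001")  # keep three decimal places

for value in (Decimal("1.2345678"), Decimal("-1.2345678")):
    for name, mode in MODES.items():
        print(f"{value} -> {value.quantize(last_place, rounding=mode)}  [{name}]")
```

Truncating 1.2345678 to 1.234 is off by 0.0005678. That is more than half of the last place (0.0005) but less than a whole one. Round-to-nearest-even gives 1.235, which is off by only 0.0004322.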

5. Repeated Operations

Since the hardware that does the floating point calculations only needs to yield a result with an error of less than one half of one unit in the last place for a single operation, the error will grow over repeated operations if not watched. This is why, in computations that require a bounded error, mathematicians use methods such as round-to-nearest-even in the last place of IEEE-754, because over time the errors are more likely to cancel each other out. They also use Interval Arithmetic, combined with variations of the IEEE 754 rounding modes, to predict rounding errors and correct them. Because of its low relative error compared to other rounding modes, round to nearest even (in the last place) is the default rounding mode of IEEE-754.
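The benefit of round-to-nearest-even is clearest on exact ties. Python's built-in round() breaks ties to even, while math.floor(x + 0.5) always rounds ties up. In this sketch, rounding to integers stands in for rounding in the last binary place:

```python
import math

ties = [k + 0.5 for k in range(10)]  # 0.5, 1.5, ..., 9.5: every value is a tie
exact_total = sum(ties)              # 50.0 (all values are exactly representable)

half_even_total = sum(round(t) for t in ties)          # ties go to the even neighbour
half_up_total = sum(math.floor(t + 0.5) for t in ties)  # ties always go up

print(exact_total, half_even_total, half_up_total)  # 50.0 50 55
```

Rounding half up drifts by +0.5 on every tie. Ties-to-even goes up half the time and down half the time, so here the errors cancel exactly.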

Note that the default rounding mode, round-to-nearest even digit in the last place, guarantees an error of less than one half of one unit in the last place for one operation. Using truncation, round-up, or round-down alone may result in an error that is greater than one half of one unit in the last place but less than one unit in the last place, so these modes are not recommended unless they are used in Interval Arithmetic.
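Interval arithmetic can still be sketched conservatively without control over the hardware rounding mode. Round-to-nearest returns the double closest to the exact result, so the exact value always lies between that result's two neighbours, and math.nextafter (Python 3.9+) returns those neighbours. The helper interval_add is hypothetical. Note that the interval encloses the exact sum of ten copies of the stored double 0.1, not the real number 1:

```python
import math
from fractions import Fraction

Interval = tuple[float, float]

def interval_add(a: Interval, b: Interval) -> Interval:
    """Add two intervals, stepping each rounded bound one double outward.

    Round-to-nearest returns the double closest to the exact sum, so the
    exact sum lies between that double's two neighbours; nextafter picks
    the neighbour on the safe side for each bound.
    """
    lo = math.nextafter(a[0] + b[0], -math.inf)
    hi = math.nextafter(a[1] + b[1], math.inf)
    return lo, hi

acc = (0.0, 0.0)
for _ in range(10):
    acc = interval_add(acc, (0.1, 0.1))

exact = 10 * Fraction(0.1)  # exact sum of ten copies of the stored double 0.1
print(acc)
print(Fraction(acc[0]) <= exact <= Fraction(acc[1]))  # True: the exact sum is enclosed
```

Stepping one double outward is cruder than true round-down and round-up, so the interval is wider than necessary. It is, however, guaranteed to contain the exact result.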

6. Summary

In short, the fundamental reason for the errors in floating point operations is a combination of truncation in hardware and the truncation of a reciprocal in the case of division. Since the IEEE-754 standard only requires an error of less than one half of one unit in the last place for a single operation, floating point errors over repeated operations will add up unless corrected.

 

 
