Updated 5.17 23:50
Translation updated 5.16 20:30
Real Close to the Machine: Floating Point in D
Introduction
by Don Clugston
Computers were originally conceived as devices for performing mathematics. The earliest computers spent most of their time solving equations. Although the engineering and scientific community now forms only a minuscule part of the computing world, there is a fantastic legacy from those former times: almost all computers now feature superb hardware for performing mathematical calculations accurately and extremely quickly. Sadly, most programming languages make it difficult for programmers to take full advantage of this hardware. An even bigger problem is the lack of documentation; even for many mathematical programmers, aspects of floating-point arithmetic remain shrouded in mystery.
As a systems programming language, the D programming language attempts to remove all barriers between the programmer and the compiler, and between the programmer and the machine. This philosophy is particularly evident in support for floating-point arithmetic. A personal anecdote may illustrate the importance of having an accurate understanding of the hardware.
My first floating-point nightmare occurred in a C++ program which hung once in every hundred runs or so. I eventually traced the problem to a while loop which occasionally failed to terminate. The essence of the code is shown in Listing 1.
double q[8];
...
int x = 0;
while (x < 8) {
if ( q[x] >= 0 ) return true;
if ( q[x] < 0 ) ++x;
}
return false;
Initially, I was completely baffled as to how this harmless-looking loop could fail. But eventually, I discovered that q had not been initialized properly; q[7] contained random garbage. Occasionally, that garbage had every bit set, which meant that q[7] was a Not-a-Number (NaN), a special code which indicates that the value of the variable is nonsense. NaNs were not mentioned in the compiler's documentation - the only information I could find about them was in Intel's assembly instruction set documentation! Any comparison involving a NaN is false, so q[7] was neither >= 0, nor < 0, killing my program. Until that unwelcome discovery, I'd been unaware that NaNs even existed. I had lived in a fool's paradise, mistakenly believing that every floating point number was either positive, negative, or zero.
My experience would have been quite different in D. The "strange" features of floating point have a higher visibility in the language, improving the education of numerical programmers. Uninitialized floating point numbers are initialized to NaN by the compiler, so the problematic loop would fail every time, not intermittently. Numerical programmers in D will generally execute their programs with the 'invalid' floating point exception enabled. Under those circumstances, as soon as the program accessed the uninitialized variable, a hardware exception would occur, summoning the debugger. Easy access to the "strange" features of floating point results in better educated programmers, reduced confusion, faster debugging, better testing, and hopefully, more reliable and correct numerical programs. This article will provide a brief overview of the support for floating point in the D programming language.
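As a minimal sketch of the point about Listing 1, the D program below shows that an array which is never assigned already holds NaN, and that NaN fails both comparisons used in the loop. The helper name neitherSign is made up for this illustration.

```d
import std.math : isNaN;
import std.stdio : writeln;

// True when v is neither >= 0 nor < 0, which only happens for NaN.
// (Hypothetical helper, not part of any library.)
bool neitherSign(double v)
{
    return !(v >= 0) && !(v < 0);
}

void main()
{
    double[8] q;                 // never assigned: D fills it with NaN
    writeln(isNaN(q[7]));        // true
    writeln(neitherSign(q[7]));  // true: both tests in Listing 1 fail
    writeln(neitherSign(-1.5));  // false
}
```

Because every element starts as NaN, the failure shows up on the first run rather than once in a hundred.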
Demystifying Floating-Point
D guarantees that all built-in floating-point types conform to IEEE 754 arithmetic, making behaviour entirely predictable (note that this is not the same as producing identical results on all platforms). IEEE 754-2008 is the latest revision of the IEEE 754 Standard for Floating-Point Arithmetic. D is progressing towards full compliance with 754-2008.
The IEEE standard floating point types currently supported by D are float and double. Additionally, D supports the real type, which is 'IEEE 80-bit extended' if supported by the CPU; otherwise, it is the same as double. In the future, the new types from 754-2008 will be added: quadruple, decimal64, and decimal128.
The characteristics of these types are easily accessible in the language via properties. For example, float.max is the maximum value which can be stored in a float; float.mant_dig is the number of digits (bits) stored in the mantissa.
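A short sketch of how these properties can be read, assuming only the built-in properties named above plus epsilon and dig. The template name describe is invented for the example, and the values printed for real depend on the CPU.

```d
import std.stdio : writefln;

// Print a few built-in properties of a floating-point type F.
// (describe is a hypothetical helper.)
void describe(F)()
{
    writefln("%-6s max=%a epsilon=%a mant_dig=%s dig=%s",
             F.stringof, F.max, F.epsilon, F.mant_dig, F.dig);
}

void main()
{
    describe!float();
    describe!double();
    describe!real();   // 80-bit extended on x87, otherwise platform dependent
}
```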
To make sense of mathematics in D, it's necessary to have a basic understanding of IEEE floating-point arithmetic. Fundamentally, it is a mapping of the infinitely many real numbers onto a small number of bytes. Only 4000 million distinct numbers are representable as an IEEE 32-bit float. Even with such a pathetically small representation space, IEEE floating point arithmetic does a remarkably good job of maintaining the illusion that mathematical real numbers are being used; but it's important to understand when the illusion breaks down.
Most problems arise from the distribution of these representable numbers. The IEEE number line is quite different to the mathematical number line.
    +          +-----------+-----------+ .. + .. +-----------+-----------+          +          #
-infinity -float.max      -1    -float.min  0 float.min      1       float.max  infinity      NaN
Notice that half of the IEEE number line lies between -1 and +1. There are 1000 million representable floats between 0 and 0.5, but only 8 million between 0.5 and 1. This has important implications for accuracy: the effective precision is incredibly high near zero. Several examples will be presented where numbers in the range -1 to +1 are treated separately to take advantage of this.
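These counts can be checked directly. The sketch below relies on the fact that, for non-negative IEEE floats, consecutive values have consecutive bit patterns, so subtracting the bit patterns counts the values in between. The helper names bitsOf and floatsBetween are invented for the example.

```d
import std.stdio : writeln;

// Reinterpret the bits of a float as an unsigned integer.
uint bitsOf(float f)
{
    return *cast(uint*) &f;
}

// Number of representable floats in [lo, hi), assuming 0 <= lo <= hi.
uint floatsBetween(float lo, float hi)
{
    return bitsOf(hi) - bitsOf(lo);
}

void main()
{
    writeln(floatsBetween(0.0f, 0.5f)); // 1056964608, about 1000 million
    writeln(floatsBetween(0.5f, 1.0f)); // 8388608, about 8 million
}
```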
Notice also the special numbers: ±∞; the so-called "subnormals" between ±float.min and 0, which are represented at reduced precision; the fact that there are TWO zeros, +0 and -0; and finally "NaN" ("Not-a-Number"), the nonsense value which caused so much grief in Listing 1.
Why does NaN exist? It serves a valuable role: it eradicates undefined behaviour from floating-point arithmetic. This makes floating-point completely predictable. Unlike the int type, where 3/0 invokes a hardware division by zero trap handler, possibly ending your program, the floating-point division 3.0/0.0 results in ∞. Numeric overflow (eg, real.max*2) also creates ∞. Depending on the application, ∞ may be a perfectly valid result; more typically, it indicates an error. Nonsensical operations, such as 0.0 / 0.0, result in NaN; but your program does not lose control. At first glance, infinity and NaN may appear unnecessary -- why not just make it an error, just as in the integer case? After all, it is easy to avoid division by zero, simply by checking for a zero denominator before every division. The real difficulty comes from overflow: it is extremely difficult to determine in advance whether an overflow will occur in a multiplication.
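The three cases mentioned above can be reproduced with a few lines of D. This is only a sketch; the function name divide is made up, and the values are passed through a function so the divisions happen at run time.

```d
import std.math : isInfinity, isNaN;
import std.stdio : writeln;

// Plain floating-point division; never traps by default.
double divide(double a, double b)
{
    return a / b;
}

void main()
{
    writeln(isInfinity(divide(3.0, 0.0)));  // true: division by zero gives infinity
    writeln(isNaN(divide(0.0, 0.0)));       // true: a nonsensical operation gives NaN
    real big = real.max;
    writeln(isInfinity(big * 2));           // true: overflow gives infinity
}
```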
Subnormals are necessary to prevent certain anomalies, and preserve important relationships such as: "x - y == 0 if and only if x == y".
Since ∞ can be produced by overflow, both +∞ and -∞ are required. Both +0 and -0 are required in order to preserve identities such as: if x>0, then 1/(1/x) > 0. In almost all other cases, however, there is no difference between +0 and -0.
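A small sketch of the signed-zero behaviour, assuming std.math.signbit to inspect the sign. The two zeros compare equal, yet their reciprocals differ, and a negative number too small to invert without overflow still keeps its sign through 1/(1/x). The name reciprocal is hypothetical.

```d
import std.math : signbit;
import std.stdio : writeln;

double reciprocal(double x)
{
    return 1.0 / x;
}

void main()
{
    double pz = 0.0, nz = -0.0;
    writeln(pz == nz);                                   // true: the zeros compare equal
    writeln(reciprocal(pz) == double.infinity);          // true
    writeln(reciprocal(nz) == -double.infinity);         // true
    double tiny = -0x1p-1074;                            // smallest negative subnormal
    writeln(signbit(reciprocal(reciprocal(tiny))) != 0); // true: result is -0
}
```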
It's worth noting that these "special values" are usually not very efficient. On x86 machines, for example, a multiplication involving a NaN, an infinity, or a subnormal can be twenty to fifty times slower than an operation on normal numbers. If your numerical code is unexpectedly slow, it's possible that you are inadvertently creating many of these special values. Enabling floating-point exception trapping, described later, is a quick way to confirm this.
One of the biggest factors obscuring what the machine is doing is the conversion between binary and decimal. You can eliminate this by using the "%a" format when displaying results. This is an invaluable debugging tool, and an enormously helpful aid when developing floating-point algorithms. The 0x1.23Ap+6 hexadecimal floating-point format can also be used in source code for ensuring that your input data is exactly what you intended.
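As an illustration, the sketch below prints the same value in decimal and with "%a", and uses the hexadecimal literal from the text as a constant. The name sample is invented; the exact text produced by "%a" is left to the library, so no output is shown for it.

```d
import std.stdio : writefln;

// 0x1.23A is 4666/4096, and p+6 multiplies by 64, giving exactly 72.90625.
enum double sample = 0x1.23Ap+6;

void main()
{
    double tenth = 0.1;
    writefln("%s", tenth);   // rounded decimal rendering
    writefln("%a", tenth);   // the binary value actually stored, in hexadecimal
    writefln("%a", sample);
}
```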
The Quantized Nature of Floating-Point
The fact that the possible values are limited gives access to some operations which are not possible on mathematical real numbers. Given a number x, nextUp(x) gives the next representable number which is greater than x. nextDown(x) gives the next representable number which is less than x.
Numerical analysts often describe errors in terms of "units in the last place" (ulp), a surprisingly subtle term which is often used rather carelessly. [Footnote: The most formal definition is found in J.-M. Muller, "On the definition of ulp(x)", INRIA Technical Report 5504 (2005): If x is a real number that lies between two finite consecutive floating-point numbers a and b of type F, without being equal to one of them, then ulp(x)=abs(b-a); otherwise ulp(x) = x*F.epsilon. Moreover, ulp(NaN) is NaN, and ulp(±F.infinity) = ±F.max*F.epsilon.] I prefer a far simpler definition: The difference in ulps between two numbers x and y is the number of times which you need to call nextUp() or nextDown() to move from x to y. [Footnote: This will not be an integer if either x or y is a real number, rather than a floating point number.] The D library function feqrel(x, y) gives the number of bits which are equal between x and y; it is an easy way to check for loss-of-precision problems.
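The simpler definition translates almost word for word into code. The sketch below assumes the nextUp, nextDown and feqrel functions from std.math; ulpsBetween is a hypothetical helper that literally counts calls to nextUp, so it is only practical for values that are close together.

```d
import std.math : feqrel, nextDown, nextUp;
import std.stdio : writeln;

// Distance from x up to y, counted in calls to nextUp.
// Assumes finite x <= y; the loop is far too slow for distant values.
ulong ulpsBetween(double x, double y)
{
    ulong steps = 0;
    while (x < y)
    {
        x = nextUp(x);
        ++steps;
    }
    return steps;
}

void main()
{
    double x = 1.0;
    writeln(nextUp(x) == 1.0 + double.epsilon);           // true
    writeln(nextDown(nextUp(x)) == x);                    // true
    writeln(ulpsBetween(1.0, 1.0 + 4 * double.epsilon));  // 4
    writeln(feqrel(x, x) == double.mant_dig);             // true: every bit agrees
}
```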
The quantized nature of floating point has some interesting consequences.
* ANY mathematical range [a,b), (a,b], or (a,b) can be converted into a range of the form [a,b]. (The converse does not apply: there is no (a,b) equivalent to [-∞, ∞]).
* A naive binary chop doesn't work correctly. The fact that there are hundreds or thousands of times as many representable numbers between 0 and 1, as there are between 1 and 2, is problematic for divide-and-conquer algorithms. A naive binary chop would divide the interval [0 .. 2] into [0 .. 1] and [1 .. 2]. Unfortunately, this is not a true binary chop, because the interval [0 .. 1] contains more than 99% of the representable numbers from the original interval!
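One possible way to get a true binary chop, sketched here for non-negative floats only, is to average the bit patterns instead of the values, so that each half holds the same number of representable numbers. This is an illustration of the idea rather than the article's own code, and the names bitsOf, fromBits and ieeeMidpoint are invented.

```d
import std.stdio : writefln;

uint bitsOf(float f) { return *cast(uint*) &f; }
float fromBits(uint u) { return *cast(float*) &u; }

// Midpoint of [a, b] measured in representable values, assuming 0 <= a <= b.
float ieeeMidpoint(float a, float b)
{
    uint lo = bitsOf(a), hi = bitsOf(b);
    return fromBits(lo + (hi - lo) / 2);
}

void main()
{
    // The arithmetic midpoint of [0, 2] is 1, but half of the
    // representable floats in that interval lie below 2^-63.
    writefln("%a", ieeeMidpoint(0.0f, 2.0f));
    writefln("%a", ieeeMidpoint(1.0f, 2.0f)); // within one binade both midpoints agree
}
```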
Condition number
Using nextUp, it's easy to approximately calculate the condition number.
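import std.math : exp, feqrel, nextUp;
real x = 0x1.1p13L;     // 8704.0; exp(x) is only finite if real is wider than double
real u = nextUp(x);     // x with an error of one unit in the last place
// bits of agreement between the inputs, minus bits of agreement between the outputs
int bitslost = feqrel(x, u) - feqrel(exp(x), exp(u));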
This shows that at these huge values of x, a one-bit error in x destroys 12 bits of accuracy in exp(x)! The error has increased by roughly 6000 units in the last place. The condition number is thus 6000 at this value of x.
The semantics of float, double, and real
For the x86 machines which dominate the market, floating point has traditionally been performed on a descendant of the 8087 math coprocessor. These "x87" floating point units were the first to implement IEEE754 arithmetic. The SSE2 instruction set is an alternative for x86-64 processors, but x87 remains the only portable option for floating point on 32-bit x86 machines (no 32-bit AMD processors support SSE2).
The x87 is unusual compared to most other floating-point units. It _only_ supports 80-bit operands, henceforth termed "real80". All double and float operands are first converted to 80-bit, all arithmetic operations are performed at 80-bit precision, and the results are reduced to 64-bit or 32-bit precision if required. This means that the results can be significantly more accurate than on a machine which supports at most 64 bit operations. However, it also poses challenges for writing portable code. (Footnote: The x87 allows you to reduce the mantissa length to be the same as double or float, but it retains the real80 exponent, which means different results are obtained with subnormal numbers. To precisely emulate double arithmetic slows down floating point code by an order of magnitude).
Apart from the x87 family, the Motorola 68K (but not ColdFire) and Itanium processors are the only ones which support 80-bit floating point.
A similar issue relates to the FMA (fused multiply and accumulate) instruction, which is available on an increasing number of processors, including PowerPC, Itanium, Sparc, and Cell. On such processors, when evaluating expressions such as x*y + z, the x*y is performed at twice the normal precision. Some calculations which would otherwise cause a total loss of precision are instead calculated exactly. The challenge for a high-level systems programming language is to create an abstraction which provides predictable behaviour on all platforms, but which nonetheless makes good use of the available hardware.
D's approach to this situation arises from the following observations:
1. It is extremely costly performance-wise to ensure identical behaviour on all processors. In particular, it is crippling for the x87.
2. Very many programs will only run on a particular processor. It would be unfortunate to deny the use of more accurate hardware, for the sake of portability which would never be required.
3. Portability and high precision are never required simultaneously. If double precision is inadequate, increasing the precision on only some processors doesn't help.
4. The language should not be tied to particular features of specific processors.
A key design goal is: it should be possible to write code such that, regardless of the processor which is used, the accuracy is never worse than would be obtained on a system which only supports the double type.
(Footnote: real is close to 'indigenous' in the Borneo proposal for the Java programming language[Ref Borneo]).
Consider evaluating x*y + z*w, where x, y, z and w are double.
1. double r1 = x * y + z * w;
2. double a = x * y; double r2 = a + z * w;
3. real b = x * y; double r3 = b + z * w;
Note that during optimisation, (2) and (3) may be transformed into (1), but this is implementation-dependent. Case (2) is particularly problematic, because it introduces an additional rounding.
On a "simple" CPU, r1==r2==r3. We will call this value r0. On PowerPC, r2==r3, but r1 may be more accurate than the others, since it enables use of FMA. On x86, r1==r3, which may be more accurate than r0, though not as much as for the PowerPC case. r2, however, may be LESS accurate than r0.
By using real for intermediate values, we are guaranteed that our results are never worse than for a simple CPU which only supports double.
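Case (3) can be written as a small function. The sketch below is an illustration with made-up names and inputs; the casts make it explicit that both products are held in real. The inputs are chosen so that the exact result is 2^-60, which survives only when real has at least a 64-bit mantissa; where real is the same as double the function simply returns the ordinary double result.

```d
import std.stdio : writefln;

// x*y + z*w with both products kept at real precision.
double sumOfProducts(double x, double y, double z, double w)
{
    real b = cast(real) x * y;
    return cast(double)(b + cast(real) z * w);
}

void main()
{
    double a = 1 + 0x1p-30;   // a*a = 1 + 2^-29 + 2^-60 needs 61 bits
    double c = 1 + 0x1p-29;
    writefln("%a", sumOfProducts(a, a, -1.0, c)); // depends on real.mant_dig
}
```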
Properties of the Built-in Types
The fundamental floating-point properties are epsilon, min and max. The six integral properties are simply the log2 or log10 of these three.
                float       double      real80       quadruple    decimal64        decimal128
epsilon         0x1p-23     0x1p-52     0x1p-63      0x1p-112     1e-16 (1p-54)    1e-34 (1p-113)
[min            0x1p-126    0x1p-1022   0x1p-16382   0x1p-16382   1e-383           1e-6143
..max)          0x1p+128    0x1p+1024   0x1p+16384   0x1p+16384   1e+385           1e+6145
binary properties
mant_dig        24          53          64           113          53               112
min_exp         -125        -1021       -16381       -16381
max_exp         +128        +1024       +16384       +16384
decimal properties
dig             6           15          18           33           16               34
min_10_exp      -37         -307        -4932        -4932        -382             -6142
max_10_exp      +38         +308        +4932        +4932        +385             +6145
When writing code which should adapt to different CPUs at compile time, use static if with the mant_dig property. For example, static if (real.mant_dig==64) is true if 80-bit reals are available. For binary types, the dig property gives only the minimum number of valid decimal digits. To ensure that every representable number has a unique decimal representation, two additional digits are required. Similarly, for decimal numbers, mant_dig is a lower bound on the number of valid binary digits.
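A minimal sketch of that pattern follows. The constant name realKind and its strings are invented for the example; only the mant_dig test comes from the text.

```d
import std.stdio : writeln;

// Pick a description of real at compile time.
static if (real.mant_dig == 64)
    enum realKind = "80-bit x87 extended";
else static if (real.mant_dig == double.mant_dig)
    enum realKind = "same as double";
else
    enum realKind = "another extended format";

void main()
{
    writeln("real is: ", realKind, " (", real.mant_dig, " mantissa bits)");
}
```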
Useful relations for a floating point type F, where x and y are of type F
* The smallest representable number is F.min * F.epsilon
* Any integer between 0 and (1/F.epsilon) can be stored in F without loss of precision. 1/F.epsilon is always an exact power of the base.
* If a number x is subnormal, x*(1/F.epsilon) is normal, and exponent(x) = exponent(x*(1/F.epsilon)) - (mant_dig-1).
* x>0 if and only if 1/(1/x) > 0; x<0 if and only if 1/(1/x) < 0.
* If x-y==0, then x==y && isFinite(x) && isFinite(y). Note that if x==y==infinity, then isNaN(x-y).
* F.max * F.min = 4.0 for binary types, 10.0 for decimal types.
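A few of these relations can be checked mechanically. The sketch below assumes a current compiler, where the property this article calls F.min is spelled F.min_normal; the helper names are made up.

```d
import std.math : isNaN, nextUp;
import std.stdio : writeln;

// Smallest representable positive number of type F.
F smallestPositive(F)()
{
    return F.min_normal * F.epsilon;
}

// True if the integer n survives a round trip through double.
bool storedExactly(long n)
{
    return cast(long) cast(double) n == n;
}

void main()
{
    writeln(smallestPositive!float() == nextUp(0.0f));       // true
    writeln(storedExactly(cast(long)(1 / double.epsilon)));  // true: 2^52
    double inf = double.infinity;
    writeln(inf == inf && isNaN(inf - inf));                 // true
}
```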
Addition and subtraction
* Some loss of precision occurs with x±y if exponent(x)!=exponent(y). The number of digits of precision which are lost is abs(exponent(x)-exponent(y)).
* x±y has total loss of precision, if and only if (1) abs(x * F.epsilon) > abs(y), in which case x+y == x, x-y == x or (2) abs(y * F.epsilon) > abs(x), in which case x+y == y, x-y == -y
* Addition is commutative: a + b == b + a.
* Subtraction is not quite anti-commutative: a - b == -(b - a), but the two sides produce +0 and -0 respectively if a==b.
* Addition is not associative at all.
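Two of the points in this list are easy to demonstrate. In the sketch below (function names invented), 1.0 is completely absorbed when added to 1e300, so the grouping of the additions changes the result, and a - b and -(b - a) differ in the sign of zero when a == b.

```d
import std.math : signbit;
import std.stdio : writeln;

double leftFirst(double a, double b, double c)  { return (a + b) + c; }
double rightFirst(double a, double b, double c) { return a + (b + c); }

void main()
{
    writeln(leftFirst(1.0, 1e300, -1e300));   // 0: the 1.0 was lost in a + b
    writeln(rightFirst(1.0, 1e300, -1e300));  // 1
    double p = 2.5, q = 2.5;
    writeln(signbit(p - q) != 0);             // false: +0
    writeln(signbit(-(q - p)) != 0);          // true:  -0
}
```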
Multiplication and division
* Multiplication and division are always at risk of overflow or underflow. For any abs(x) > F.epsilon, there is at least one finite y such that x/y will overflow to ∞. For any abs(x) < F.epsilon, there is at least one finite y such that x/y will underflow to zero. For any abs(x) > 1, there is at least one finite y such that x*y will overflow to ∞. For any abs(x) < 1, there is at least one finite y such that x*y will underflow to zero.
* x*x will overflow if abs(x)>sqrt(F.max), and underflow to zero if abs(x) < sqrt(F.min*F.epsilon)
* Multiplication is commutative. a * b == b * a
* Multiplication is not associative in general: a*(b*c) != (a*b)*c, because (1) there is a risk of overflow or underflow and (2) b*c may be an exact calculation, so that a*(b*c) contains only one round-off error, whereas (a*b)*c contains two. The roundoff errors may therefore accumulate at the rate of just under 1 ulp per multiplication.
* However, a limited form of associativity is possible if the type used for intermediate results is larger than any of the operands (which happens on x87 and Itanium machines). If R is the intermediate type, and F is the type being multiplied, up to min(R.max_exp/F.max_exp, R.epsilon/F.epsilon) values of type F can be multiplied together in any order without influencing the result. For example, if R is double, multiplication of 8 floats f1*f2*f3*f4*f5*f6*f7*f8 is completely associative. On x87, 130 floats can be safely multiplied together in any order, and 16 doubles can similarly be multiplied together safely. Strict distributivity does not hold even under these circumstances, as it may destroy the sign of -0.
* The distributive law almost never holds. For example, 4*x + 6*x != 10*x if x==nextDown(1.5). a*x + b*x == (a+b)*x for all x only if the operations a*x, b*x, and (a+b) are all exact operations, which is true only if a and b are exact powers of 2. Even then, if a==-b and x==-0, then a*x+b*x==0.0, (a+b)*x==-0.0.
* Performing a division by multiplication by the reciprocal returns a result which (in round-to-nearest mode) is at most 1.5 ulps from the correctly rounded result. For almost any denominator, the rounding is incorrect (>0.5ulps) for 27% of numerators. [Ref: N. Brisebarre, J-M Muller, and S.K. Raina, "Accelerating Correctly Rounded Floating-Point Division when the Divisor Is Known in Advance", IEEE Trans. on Computers, Vol 53, pp 1069-1072 (2004)].
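The overflow case of non-associativity is the easiest to show. The sketch below uses real, so that no wider intermediate type can hide the overflow; the function names are invented.

```d
import std.math : isInfinity;
import std.stdio : writeln;

real leftFirst(real a, real b, real c)  { return (a * b) * c; }
real rightFirst(real a, real b, real c) { return a * (b * c); }

void main()
{
    real big = real.max;
    writeln(isInfinity(leftFirst(big, 2, 0.5L)));  // true: big * 2 overflows first
    writeln(rightFirst(big, 2, 0.5L) == big);      // true: 2 * 0.5 is exactly 1
}
```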
Powers and logarithms
* F.mant_dig = 1 - log2(F.epsilon) for binary types;
* F.dig = -log10(F.epsilon) for decimal types.
* F.max = exp2(F.max_exp*(1-F.epsilon)) for binary types;
* F.max = exp10(F.max_10_exp*(1-F.epsilon)) for decimal types.
* For any positive finite x, F.min_exp - F.mant_dig <= log2(x) < F.max_exp for binary types, F.min_10_exp - F.dig <= log10(x) < F.max_10_exp for decimal types
* exp2(x) == 0 if x < F.min_exp - F.mant_dig, exp2(x) == infinity if x >= F.max_exp
NaN payloads
According to the IEEE 754 standard, a 'payload' can be stored in the mantissa of a NaN. This payload can contain information about how or why it was generated. Historically, almost no programming languages have ever made use of this potentially powerful feature. In D, this payload consists of a positive integer.
* real NaN(ulong payload) -- create a NaN with a "payload", where the payload is a ulong.
* ulong getNaNPayload(real x) -- returns the integer payload. Note that if narrowing conversions have occurred, the high-order bits may have changed.
Never store a pointer as an integer payload inside a NaN. The garbage collector will not be able to find it!
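A minimal round trip through the two functions, assuming they are imported from std.math. The payload value and the constant name are arbitrary and stand in for whatever error code a program might want to attach.

```d
import std.math : getNaNPayload, isNaN, NaN;
import std.stdio : writeln;

enum ulong missingSample = 1_000_000;  // hypothetical error code

void main()
{
    real tagged = NaN(missingSample);
    writeln(isNaN(tagged));          // true: still an ordinary NaN
    writeln(getNaNPayload(tagged));  // 1000000
}
```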
NCEG comparison operations
As well as the usual <, >, <=, and >= comparison operators, D also supports the "NCEG" operators. Most of them are the direct negation of the ordinary operators. Additionally, <>, <>=, !<>, and !<>= are provided. These 8 new operators are different from the normal operators only when a NaN is involved, so for the most part they are quite obscure. They are useful mainly in eliminating the possibility of NaN before beginning a calculation. The most useful relationships are probably:
* x <>= y is the same as !isNaN(x) && !isNaN(y), (except that signalling NaNs may be triggered by <>=).
* x !<>= y is the same as isNaN(x) || isNaN(y).
If y is any compile-time constant (eg 0), these reduce to !isNaN(x) and isNaN(x). Note that x==x is the same as !isNaN(x), and x!=x is the same as isNaN(x). abs(x) !< x.infinity is the same as isNaN(x) || isInfinity(x). The above relationships are useful primarily because they can be used in compile time functions. Very few uses are known for the remaining NCEG operators.
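To my knowledge, compilers released after this article was written no longer accept the NCEG operators, so the sketch below spells the same relationships with isNaN, fabs and ordinary comparisons. The function names are invented; each comment names the NCEG expression it stands in for.

```d
import std.math : fabs, isNaN;
import std.stdio : writeln;

bool ordered(double x, double y)   { return !isNaN(x) && !isNaN(y); }  // x <>= y
bool unordered(double x, double y) { return isNaN(x) || isNaN(y); }    // x !<>= y
bool selfUnequal(double x)         { return x != x; }                  // same as isNaN(x)
bool notFinite(double x)           { return !(fabs(x) < double.infinity); } // abs(x) !< x.infinity

void main()
{
    writeln(ordered(1.0, 2.0));           // true
    writeln(unordered(1.0, double.nan));  // true
    writeln(selfUnequal(double.nan));     // true
    writeln(notFinite(double.infinity));  // true
}
```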
The IEEE Rounding Modes
The rounding mode is controlled within a scope. Rounding mode will be restored to its previous state at the end of that scope. Four rounding modes can be set. The default mode, Round to nearest, is the most statistically accurate, but the least intuitive. In the event of a tie, the result is rounded to an even number.
Rounding mode      rndint(4.5)  rndint(5.5)  rndint(-4.5)  Notes
Round to nearest   4            6            -4            Ties round to an even number
Round down         4            5            -5
Round up           5            6            -4
Round to zero      4            5            -4
There are very few reasons for changing the rounding mode. The round-up and round-down modes were created specifically to allow fast implementations of interval arithmetic; they are crucial to certain libraries, but rarely used elsewhere. The round-to-zero mode is used for casting floating-point numbers to integers. Since mode switching is slow, especially on Intel machines, it may be useful to switch to round-to-zero mode, in order to exactly duplicate the behaviour of cast(int) in an inner loop.
The only other commonly cited reason for changing the rounding mode is as a simple check for numerical stability: if the calculation produces wildly different results when the rounding mode is changed, it's a clear sign that it is suffering from round-off errors.
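A sketch of scoped rounding-mode control, assuming the FloatingPointControl struct and the lrint function from std.math in place of rndint. The helper roundWith is invented. The results are meant to follow the table above, but an optimizing compiler that evaluates the rounding at compile time could disagree, so this is best tried in an unoptimized build.

```d
import std.math : FloatingPointControl, lrint;
import std.stdio : writeln;

alias Mode = typeof(FloatingPointControl.roundToNearest);

// Round x to an integer under the given mode. The previous mode is
// restored automatically when ctrl goes out of scope.
long roundWith(real x, Mode mode)
{
    FloatingPointControl ctrl;
    ctrl.rounding = mode;
    return lrint(x);
}

void main()
{
    writeln(roundWith(4.5, FloatingPointControl.roundToNearest));
    writeln(roundWith(4.5, FloatingPointControl.roundUp));
    writeln(roundWith(-4.5, FloatingPointControl.roundDown));
    writeln(roundWith(-4.5, FloatingPointControl.roundToZero));
}
```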
The IEEE Exception Status Flags
All IEEE-compliant processors include special status bits that indicate when "weird" things have happened that programs might want to know about. For example, ieeeFlags.divideByZero tells if any infinities have been created by dividing by zero. They are 'sticky' bits: once they have been set, they remain set until explicitly cleared. By only checking this once at the end of a calculation, it may be possible to avoid performing thousands of comparisons that are almost never going to fail.
Here's a list of the weird things that can be detected:
invalid
This is set if any NaNs have been generated. This can happen with ∞ - ∞, ∞ * 0, 0 * ∞, 0/0, ∞/∞, ∞%∞, or x%0, for any number x. Several other operations, such as sqrt(-1), can also generate a NaN. The invalid condition is also set when a 'signalling NaN' is accessed, indicating use of an uninitialized variable. This almost always indicates a programming error.
overflow
Set if ∞ was generated by adding or multiplying two numbers that were so large that the result was greater than real.max. This almost always indicates that the result is incorrect, and corrective action needs to be taken.
divisionByZero
Set if ±∞ was generated by dividing by zero. This usually indicates a programming error, but not always; some types of calculations return correct results even when division by zero occurs. (For example, 1/(1+ 1/x) == 0 if x == 0). Note that division by a tiny, almost-zero number also produces an infinite result, but sets the overflow flag rather than the divisionByZero flag.
underflow
This happens if two numbers are subtracted or divided and are so tiny that the result lost precision because it was subnormal. Extreme underflow produces a zero result. Underflow almost never creates problems, and can usually be ignored.
inexact
This indicates that rounding has occurred. Almost all floating point operations set this flag! It was apparently included in the hardware to support some arcane tricks used in the pioneering days of numerical analysis. It can always be ignored.
Floating-point traps can be enabled for any of the categories listed above. When enabled, a hardware exception will be generated. This can be an invaluable debugging aid. A more advanced usage, not yet supported on any platform(!) is to provide a nested function to be used as a hardware exception handler. This is most useful for the overflow and underflow exceptions.
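The sticky flags can be inspected as sketched below, assuming the ieeeFlags property and resetIeeeFlags function from std.math. Note that the library spells the division flag divByZero rather than the names used in this article. The function name stickyFlagsDemo is invented, and, as with the rounding modes, aggressive optimization may reorder the arithmetic relative to the flag reads.

```d
import std.math : ieeeFlags, isNaN, resetIeeeFlags;
import std.stdio : writeln;

// Returns true if the flags behaved as described in the text.
bool stickyFlagsDemo(real a)
{
    resetIeeeFlags();                 // clear all sticky flags
    bool ok = !ieeeFlags.divByZero;
    a /= 0.0L;                        // finite / 0 sets divByZero
    ok = ok && a == real.infinity && ieeeFlags.divByZero;
    a *= 0.0L;                        // infinity * 0 sets invalid
    ok = ok && isNaN(a) && ieeeFlags.invalid;
    return ok && ieeeFlags.divByZero; // earlier flag is still set
}

void main()
{
    writeln(stickyFlagsDemo(3.5));
}
```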
Floating point and 'pure nothrow'
Every floating point operation, even the most trivial, is affected by the floating-point rounding state, and writes to the sticky flags. The status flags and control state are thus 'hidden variables', potentially affecting every pure function; and if the floating point traps are enabled, any floating point operation can generate a hardware exception. D provides a facility for the floating-point control mode and exception flags to be usable in limited circumstances even when pure and nothrow functions are called.
[TODO: I've made two proposals, but I haven't persuaded Walter yet!].
Conclusion
Although D is a general-purpose programming language and supports many high-level concepts, it gives direct and convenient access to almost all features of modern floating-point hardware. This makes it an excellent language for development of robust, high-performance numerical code. It is also a language which encourages a deep understanding of the machine, making it fertile ground for innovation and for developing new algorithms.
References and Further Reading
1. "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
2. "An Interview with the Old Man of Floating-Point: Reminiscences elicited from William Kahan by Charles Severance"
3. N. Brisebarre, J-M Muller, and S.K. Raina, "Accelerating Correctly Rounded Floating-Point Division when the Divisor Is Known in Advance", IEEE Trans. on Computers, Vol 53, pp 1069-1072 (2004).
4. "The Borneo language"
5.16 20:30 翻译更新
Real Close to the Machine: Floating Point in D
走近真实的机器: D 中的浮点
Introduction 介绍
by Don Clugston
Computers were originally conceived as devices for performing mathematics. The earliest computers spent most of their time solving equations. Although the engineering and scientific community now forms only a miniscule part of the computing world, there is a fantastic legacy from those former times: almost all computers now feature superb hardware for performing mathematical calculations accurately and extremely quickly. Sadly, most programming languages make it difficult for programmers to take full advantage of this hardware. An even bigger problem is the lack of documentation; even for many mathematical programmers, aspects of floating-point arithmetic remain shrouded in mystery.
计算机原先设想的是从事数学计算的设备。最早的电脑大部分时间是求解方程。虽然工程和科学界目前有一小部分是计算机的世界,它从以往的时代继承了一份珍贵的遗产: 所有的电脑都有极好的硬件性能来快速完成精确的数学计算。遗憾的是,大多数编程语言很难使程序员能够充分利用这一硬件。一个更大的问题是缺乏文档;甚至对许多数学程序员,各方面的浮点算法仍然被笼罩在神秘之中。
As a systems programming language, the D programming language attempts to remove all barriers between the programmer and the compiler, and between the programmer and the machine. This philosophy is particularly evident in support for floating-point arithmetic. A personal anecdote may illustrate the importance of having an accurate understanding of the hardware.
作为一个系统级的编程语言,D 语言试图消除程序员和编译器、程序员和机器之间的所有障碍
,这种哲学特别明显的表现在支持浮动点运算。 一个个人的趣闻可能说明对于硬件的准确理解的重要性。
My first floating-point nightmare occurred in a C++ program which hung once in every hundred runs or so. I eventually traced the problem to a while loop which occasionally failed to terminate. The essence of the code is shown in Listing 1.
我的第一个浮点噩梦是发生在每百次运行挂起一次的 C + + 程序。我最终找到问题是一个while循环有时未能终止。
基本代码见列表1.
double q[8];
...
int x = 0;
while (x < {
if ( q[x] >= 0 ) return true;
if ( q[x] < 0 ) ++x;
}
return false;
=========================================
5.17 23:50更新
Initially, I was completely baffled as to how this harmless-looking loop could fail. But eventually, I discovered that q had not been initialized properly; q[7] contained random garbage. Occasionally, that garbage had every bit set, which mean that q[7] was a Not-a-Number (NaN), a special code which indicates that the value of the variable is nonsense. NaNs were not mentioned in the compiler's documentation - the only information I could find about them was in Intel's assembly instruction set documentation! Any comparison involving a NaN is false, so q[7] was neither >= 0, nor < 0, killing my program. Until that unwelcome discovery, I'd been unaware that NaNs even existed. I had lived in a fool's paradise, mistakenly believing that every floating point number was either positive, negative, or zero.
起初,我是完全困惑此看着无害的(harmless-looking)循环如何会失败。但最终我发现 q
没有恰当的初始化;q[7] 是很随意的垃圾,偶尔,这个垃圾拥有所有的 bit set,这意味着
q[7] 是一个 NaN ( Not-a-Number,参见 d 编程手册),一个特殊的值的变量是毫无意义的。NaNs 在编译器的文档中没有提到----唯一的信息我只能在在英特尔的汇编指令集文件中找到!任何比较涉及到 NaN 都是 false,所以q [ 7 ]既不是“> = 0 ,也不是 “< 0” ,
直到那种讨厌的发现,我本人一直不知道 NaN 的存在。我曾经生活在一个傻瓜的天堂中,
错误的以为,每一个浮点数不是正数,就是负数, 或 零。
My experience would have been quite different in D. The "strange" features of floating point have a higher visibility in the language, improving the education of numerical programmers. Uninitialized floating point numbers are initialized to NaN by the compiler, so the problematic loop would fail every time, not intermittently. Numerical programmers in D will generally execute their programs with the 'invalid' floating point exception enabled. Under those circumstances, as soon as the program accessed the uninitialized variable, a hardware exception would occur, summoning the debugger. Easy access to the "strange" features of floating point results in better educated programmers, reduced confusion, faster debugging, better testing, and hopefully, more reliable and correct numerical programs. This article will provide a brief overview of the support for floating point in the D programming language.
Demystifying Floating-Point
D guarantees that all built-in floating-point types conform to IEEE 754 arithmetic, making behaviour entirely predictable (note that this is not the same as producing identical results on all platforms). IEEE 754-2008 is the latest revision of the IEEE 754 Standard for Floating-Point Arithmetic. D is progressing towards full compliance with 754-2008.
The IEEE standard floating point types currently supported by D are float and double. Additionally, D supports the real type, which is either 'IEEE 80-bit extended' if supported by the CPU; otherwise, it is the same as double. In the future, the new types from 754-2008 will be added: quadruple, decimal64, and decimal128.
The characteristics of these types are easily accessible in the language via properties. For example, float.max is the maximum value which can be stored in a float; float.mant_dig is the number of digits (bits) stored in the mantissa.
To make sense of mathematics in D, it's necessary to have a basic understanding of IEEE floating-point arithmetic. Fundamentally, it is a mapping of the infinitely many real numbers onto a small number of bytes. Only 4000 million distinct numbers are representable as an IEEE 32-bit float. Even with such a pathetically small representation space, IEEE floating point arithmetic does a remarkably good job of maintaining the illusion that mathematical real numbers are being used; but it's important to understand when the illusion breaks down.
Most problems arise from the distribution of these representable numbers. The IEEE number line is quite different to the mathematical number line.
    +           +----------+----------+  ..  +  ..  +----------+----------+           +       #
-infinity  -float.max     -1     -float.min  0  float.min      1      float.max   infinity   NaN
Notice that half of the IEEE number line lies between -1 and +1. There are 1000 million representable floats between 0 and 0.5, but only 8 million between 0.5 and 1. This has important implications for accuracy: the effective precision is incredibly high near zero. Several examples will be presented where numbers in the range -1 to +1 are treated separately to take advantage of this.
Notice also the special numbers: ±∞; the so-called "subnormals" between ±float.min and 0, which are represented at reduced precision; the fact that there are TWO zeros, +0 and -0; and finally "NaN" ("Not-a-Number"), the nonsense value which caused so much grief in Listing 1.
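The counts quoted above can be checked with a short sketch. It relies on the fact that non-negative IEEE floats are ordered the same way as their bit patterns, so subtracting the patterns counts the values in between. The helper name floatsBetween is hypothetical, not a library function.

```d
import std.stdio : writeln;

/// Number of representable floats in [lo, hi), assuming 0 <= lo <= hi
/// and both finite. Hypothetical helper for illustration only.
uint floatsBetween(float lo, float hi)
{
    // Reinterpret the 32 bits of each float as an unsigned integer.
    uint a = *cast(uint*) &lo;
    uint b = *cast(uint*) &hi;
    return b - a;
}

void main()
{
    writeln(floatsBetween(0.0f, 0.5f)); // 1056964608, about 1000 million
    writeln(floatsBetween(0.5f, 1.0f)); // 8388608, about 8 million
}
```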
Why does NaN exist? It serves a valuable role: it eradicates undefined behaviour from floating-point arithmetic. This makes floating-point completely predictable. Unlike the int type, where 3/0 invokes a hardware division by zero trap handler, possibly ending your program, the floating-point division 3.0/0.0 results in ∞. Numeric overflow (eg, real.max*2) also creates ∞. Depending on the application, ∞ may be a perfectly valid result; more typically, it indicates an error. Nonsensical operations, such as 0.0 / 0.0, result in NaN; but your program does not lose control. At first glance, infinity and NaN may appear unnecessary -- why not just make it an error, just as in the integer case? After all, it is easy to avoid division by zero, simply by checking for a zero denominator before every division. The real difficulty comes from overflow: it is extremely difficult to determine in advance whether an overflow will occur in a multiplication.
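A small sketch of these cases follows. The divide function is a hypothetical wrapper, used only so that the operands are ordinary run-time values.

```d
import std.math : isNaN;

/// Plain floating-point division; never traps by default.
double divide(double a, double b)
{
    return a / b;
}

void main()
{
    double inf = divide(3.0, 0.0);   // division by zero gives infinity
    double nan = divide(0.0, 0.0);   // a nonsensical operation gives NaN

    assert(inf == double.infinity);
    assert(isNaN(nan));

    // Every ordered comparison involving a NaN is false.
    assert(!(nan >= 0) && !(nan < 0));

    // Overflow also produces infinity.
    real big = real.max;
    assert(big * 2 == real.infinity);
}
```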
Subnormals are necessary to prevent certain anomalies, and preserve important relationships such as: "x - y == 0 if and only if x == y".
Since ∞ can be produced by overflow, both +∞ and -∞ are required. Both +0 and -0 are required in order to preserve identities such as: if x>0, then 1/(1/x) > 0. In almost all other cases, however, there is no difference between +0 and -0.
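The following sketch illustrates the two zeros and the role of subnormals. It assumes a recent D compiler, where the article's F.min property is spelled min_normal.

```d
import std.math : signbit;

void main()
{
    double pz = 0.0, nz = -0.0;

    assert(pz == nz);                     // the two zeros compare equal
    assert(signbit(nz) && !signbit(pz));  // but their sign bits differ

    // The sign of zero decides which infinity a reciprocal produces.
    assert(1 / pz == double.infinity);
    assert(1 / nz == -double.infinity);

    // Subnormals keep "x - y == 0 if and only if x == y" true:
    // the difference below is smaller than min_normal, yet not zero.
    double x = double.min_normal;
    double y = x * 1.5;
    assert(x != y && y - x != 0);
}
```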
It's worth noting that these "special values" are usually not very efficient. On x86 machines, for example, a multiplication involving a NaN, an infinity, or a subnormal can be twenty to fifty times slower than an operation on normal numbers. If your numerical code is unexpectedly slow, it's possible that you are inadvertently creating many of these special values. Enabling floating-point exception trapping, described later, is a quick way to confirm this.
One of the biggest factors obscuring what the machine is doing is the conversion between binary and decimal. You can eliminate this by using the "%a" format when displaying results. This is an invaluable debugging tool, and an enormously helpful aid when developing floating-point algorithms. The 0x1.23Ap+6 hexadecimal floating-point format can also be used in source code to ensure that your input data is exactly what you intended.
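A minimal sketch of both uses, assuming std.stdio's writefln for output:

```d
import std.stdio : writefln;

void main()
{
    // A hexadecimal literal states the exact bits: 0x1.23A times 2^6.
    double x = 0x1.23Ap+6;
    assert(x == 72.90625);

    // "%a" shows the exact binary value; "%g" shows a rounded decimal.
    writefln("%a  %g", x, x);

    // 0.1 has no exact binary representation; "%a" makes that visible.
    writefln("%a", 0.1);
}
```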
The Quantized Nature of Floating-Point
The fact that the possible values are limited gives access to some operations which are not possible on mathematical real numbers. Given a number x, nextUp(x) gives the next representable number which is greater than x. nextDown(x) gives the next representable number which is less than x.
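A few properties of nextUp and nextDown from std.math can be sketched as follows. The min_normal spelling assumes a recent D compiler.

```d
import std.math : nextDown, nextUp;

void main()
{
    double x = 1.0;

    // Just above 1.0, neighbouring doubles are double.epsilon apart.
    assert(nextUp(x) == 1.0 + double.epsilon);

    // nextDown undoes nextUp.
    assert(nextDown(nextUp(x)) == x);

    // Stepping past the largest finite value reaches infinity.
    assert(nextUp(double.max) == double.infinity);

    // Stepping up from zero gives the smallest subnormal.
    assert(nextUp(0.0) == double.min_normal * double.epsilon);

    // Just below 1.0 the spacing is half of epsilon.
    assert(nextDown(1.0f) == 1.0f - float.epsilon / 2);
}
```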
Numerical analysts often describe errors in terms of "units in the last place" (ulp), a surprisingly subtle term which is often used rather carelessly. [Footnote: The most formal definition is found in J.-M. Muller, "On the definition of ulp(x)", INRIA Technical Report 5504 (2005): If x is a real number that lies between two finite consecutive floating-point numbers a and b of type F, without being equal to one of them, then ulp(x)=abs(b-a); otherwise ulp(x) = x*F.epsilon. Moreover, ulp(NaN) is NaN, and ulp(±F.infinity) = ±F.max*F.epsilon.] I prefer a far simpler definition: the difference in ulps between two numbers x and y is the number of times you need to call nextUp() or nextDown() to move from x to y. [Footnote: This will not be an integer if either x or y is a real number, rather than a floating point number.] The D library function feqrel(x, y) gives the number of bits which are equal between x and y; it is an easy way to check for loss-of-precision problems.
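The simpler definition translates directly into code. The sketch below counts nextUp() steps; ulpsBetween is a hypothetical helper that is linear in the distance, so it is only practical for values that are close together.

```d
import std.math : feqrel, nextUp;

/// Counts the nextUp() steps needed to move from a up to b.
/// Assumes a <= b and both finite. Illustrative only.
ulong ulpsBetween(float a, float b)
{
    ulong steps = 0;
    for (float v = a; v < b; v = nextUp(v))
        ++steps;
    return steps;
}

void main()
{
    // Four steps of float.epsilon separate these two values near 1.0.
    assert(ulpsBetween(1.0f, 1.0f + 4 * float.epsilon) == 4);

    double x = 1.0 / 3;
    // Identical values agree in every mantissa bit.
    assert(feqrel(x, x) == double.mant_dig);
    // Adjacent values agree in fewer bits.
    assert(feqrel(x, nextUp(x)) < double.mant_dig);
}
```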
The quantized nature of floating point has some interesting consequences.
* ANY mathematical range [a,b), (a,b], or (a,b) can be converted into a range of the form [a,b]. (The converse does not apply: there is no (a,b) equivalent to [-∞, ∞]).
* A naive binary chop doesn't work correctly. The fact that there are hundreds or thousands of times as many representable numbers between 0 and 1, as there are between 1 and 2, is problematic for divide-and-conquer algorithms. A naive binary chop would divide the interval [0 .. 2] into [0 .. 1] and [1 .. 2]. Unfortunately, this is not a true binary chop, because the interval [0 .. 1] contains more than 99% of the representable numbers from the original interval!
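Both consequences can be sketched in a few lines. The helper representableMidpoint is hypothetical; it bisects in the space of bit patterns rather than in magnitude, and is valid only for finite, non-negative floats.

```d
import std.math : nextDown, nextUp;

/// Midpoint of [lo, hi] measured in representable values.
/// Hypothetical helper; assumes 0 <= lo <= hi, both finite.
float representableMidpoint(float lo, float hi)
{
    uint a = *cast(uint*) &lo;
    uint b = *cast(uint*) &hi;
    uint mid = a + (b - a) / 2;
    return *cast(float*) &mid;
}

void main()
{
    // The open range (1, 2) holds the same floats as the closed range [a, b].
    float a = nextUp(1.0f), b = nextDown(2.0f);
    assert(a > 1.0f && b < 2.0f);

    // Half of the floats in [0, 2] lie below 2^-63, not below 1.
    assert(representableMidpoint(0.0f, 2.0f) == 0x1p-63f);
}
```

A divide-and-conquer algorithm that splits at such a midpoint halves the number of candidate values at every step, which a split at 1.0 does not.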
Condition number
Using nextUp, it's easy to approximately calculate the condition number.
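import std.math : exp, feqrel, nextUp;
real x = 0x1.1p13L;   // 8704.0
real u = nextUp(x);   // the next representable value above x
// Bits of agreement lost when x and u are both passed through exp()
int bitslost = feqrel(x, u) - feqrel(exp(x), exp(u));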
This shows that at these huge values of x, a one-bit error in x destroys 12 bits of accuracy in exp(x)! The error has increased by roughly 6000 units in the last place. The condition number is thus 6000 at this value of x.
The semantics of float, double, and real
For the x86 machines which dominate the market, floating point has traditionally been performed on a descendant of the 8087 math coprocessor. These "x87" floating point units were the first to implement IEEE754 arithmetic. The SSE2 instruction set is an alternative for x86-64 processors, but x87 remains the only portable option for floating point on 32-bit x86 machines (no 32-bit AMD processors support SSE2).
The x87 is unusual compared to most other floating-point units. It _only_ supports 80-bit operands, henceforth termed "real80". All double and float operands are first converted to 80-bit, all arithmetic operations are performed at 80-bit precision, and the results are reduced to 64-bit or 32-bit precision if required. This means that the results can be significantly more accurate than on a machine which supports at most 64-bit operations. However, it also poses challenges for writing portable code. (Footnote: The x87 allows you to reduce the mantissa length to be the same as double or float, but it retains the real80 exponent, which means different results are obtained with subnormal numbers. Precisely emulating double arithmetic slows down floating point code by an order of magnitude.)
Apart from the x87 family, the Motorola 68K (but not ColdFire) and Itanium processors are the only ones which support 80-bit floating point.
A similar issue relates to the FMA (fused multiply and accumulate) instruction, which is available on an increasing number of processors, including PowerPC, Itanium, Sparc, and Cell. On such processors, when evaluating expressions such as x*y + z, the x*y is performed at twice the normal precision. Some calculations which would otherwise cause a total loss of precision are instead calculated exactly. The challenge for a high-level systems programming language is to create an abstraction which provides predictable behaviour on all platforms, but which nonetheless makes good use of the available hardware.
D's approach to this situation arises from the following observations:
1. It is extremely costly performance-wise to ensure identical behaviour on all processors. In particular, it is crippling for the x87.
2. Very many programs will only run on a particular processor. It would be unfortunate to deny the use of more accurate hardware, for the sake of portability which would never be required.
3. Portability and high precision are never required simultaneously. If double precision is inadequate, increasing the precision on only some processors doesn't help.
4. The language should not be tied to particular features of specific processors.
A key design goal is: it should be possible to write code such that, regardless of the processor which is used, the accuracy is never worse than would be obtained on a system which only supports the double type.
(Footnote: real is close to 'indigenous' in the Borneo proposal for the Java programming language[Ref Borneo]).
Consider evaluating x*y + z*w, where x, y, z and w are double.
1. double r1 = x * y + z * w;
2. double a = x * y; double r2 = a + z * w;
3. real b = x * y; double r3 = b + z * w;
Note that during optimisation, (2) and (3) may be transformed into (1), but this is implementation-dependent. Case (2) is particularly problematic, because it introduces an additional rounding.
On a "simple" CPU, r1==r2==r3. We will call this value r0. On PowerPC, r2==r3, but r1 may be more accurate than the others, since it enables use of FMA. On x86, r1==r3, which may be more accurate than r0, though not as much as for the PowerPC case. r2, however, may be LESS accurate than r0.
By using real for intermediate values, we are guaranteed that our results are never worse than for a simple CPU which only supports double.
Properties of the Built-in Types
The fundamental floating-point properties are epsilon, min and max. The six integral properties are simply the log2 or log10 of these three.
              float      double      real80       quadruple    decimal64        decimal128
epsilon       0x1p-23    0x1p-52     0x1p-63      0x1p-112     1e-16 (1p-54)    1e-34 (1p-113)
[min          0x1p-126   0x1p-1022   0x1p-16382   0x1p-16382   1e-383           1e-6143
..max)        0x1p+128   0x1p+1024   0x1p+16384   0x1p+16384   1e+385           1e+6145
binary properties
mant_dig      24         53          64           113          53               112
min_exp       -125       -1021       -16381       -16381
max_exp       +128       +1024       +16384       +16384
decimal properties
dig           6          15          18           33           16               34
min_10_exp    -37        -307        -4932        -4932        -382             -6142
max_10_exp    +38        +308        +4932        +4932        +385             +6145
When writing code which should adapt to different CPUs at compile time, use static if with the mant_dig property. For example, static if (real.mant_dig==64) is true if 80-bit reals are available. For binary types, the dig property gives only the minimum number of valid decimal digits. To ensure that every representable number has a unique decimal representation, two additional digits are required. Similarly, for decimal numbers, mant_dig is a lower bound on the number of valid binary digits.
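A minimal sketch of such compile-time adaptation follows. The constant name realKind and its strings are hypothetical, chosen only for illustration.

```d
import std.stdio : writeln;

// Select a description of the real type at compile time.
static if (real.mant_dig == 64)
    enum realKind = "x87 80-bit extended";
else static if (real.mant_dig == double.mant_dig)
    enum realKind = "same as double";
else
    enum realKind = "another format"; // for example, 113 bits for quadruple

// dig is only the guaranteed minimum; two more digits make a double unique.
static assert(double.dig + 2 == 17);

void main()
{
    writeln("real is: ", realKind);
}
```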
Useful relations for a floating point type F, where x and y are of type F
* The smallest representable number is F.min * F.epsilon
* Any integer between 0 and (1/F.epsilon) can be stored in F without loss of precision. 1/F.epsilon is always an exact power of the base.
* If a number x is subnormal, x*(1/F.epsilon) is normal, and exponent(x) = exponent(x*(1/F.epsilon)) - (mant_dig-1).
* x>0 if and only if 1/(1/x) > 0; x<0 if and only if 1/(1/x) < 0.
* If x-y==0, then x==y && isFinite(x) && isFinite(y). Note that if x==y==infinity, then isNaN(x-y).
* F.max * F.min = 4.0 for binary types, 10.0 for decimal types.
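Some of these relations can be checked directly for float. This is a sketch that assumes a recent D compiler, where the article's F.min is spelled F.min_normal.

```d
import std.math : isNaN, nextUp;

void main()
{
    // The smallest representable positive number is F.min * F.epsilon.
    assert(nextUp(0.0f) == float.min_normal * float.epsilon);

    // 1/epsilon is an exact power of two: 2^23 for float.
    int limit = cast(int)(1 / float.epsilon);
    assert(limit == 8_388_608);

    // Integers below that limit survive a round trip through float.
    float f = limit - 1;
    assert(cast(int) f == limit - 1);

    // If x == y == infinity, then x - y is NaN rather than zero.
    float inf = float.infinity;
    assert(isNaN(inf - inf));
}
```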
Addition and subtraction
* Some loss of precision occurs with x±y if exponent(x)!=exponent(y). The number of digits of precision which are lost is abs(exponent(x)-exponent(y)).
* x±y has total loss of precision, if and only if (1) abs(x * F.epsilon) > abs(y), in which case x+y == x, x-y == x or (2) abs(y * F.epsilon) > abs(x), in which case x+y == y, x-y == -y
* Addition is commutative: a + b == b + a.
* Subtraction is not quite anti-commutative: a - b == -(b - a), but the two sides produce +0 and -0 respectively if a==b.
* Addition is not associative at all.
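The rules above can be sketched as follows. The example deliberately uses real for every value, so that no wider intermediate type can hide the rounding, and it assumes the default round-to-nearest mode.

```d
import std.math : nextUp, signbit;

void main()
{
    real x = 1.0L;

    // Total loss of precision: y is absorbed because x * epsilon > y.
    real y = real.epsilon / 4;
    assert(x + y == x);

    // Addition is not associative: two half-ulp steps are each rounded
    // away, but added together first they survive.
    real h = real.epsilon / 2;
    assert((x + h) + h == x);
    assert(x + (h + h) == nextUp(x));

    // a - b and -(b - a) compare equal, but are +0 and -0 when a == b.
    real a = 2.5L, b = 2.5L;
    assert(a - b == -(b - a));
    assert(!signbit(a - b) && signbit(-(b - a)));
}
```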
Multiplication and division
* Multiplication and division are always at risk of overflow or underflow. For any abs(x) > F.epsilon, there is at least one finite y such that x/y will overflow to ∞. For any abs(x) < F.epsilon, there is at least one finite y such that x/y will underflow to zero. For any abs(x) > 1, there is at least one finite y such that x*y will overflow to ∞. For any abs(x) < 1, there is at least one finite y such that x*y will underflow to zero.
* x*x will overflow if abs(x)>sqrt(F.max), and underflow to zero if abs(x) < sqrt(F.min*F.epsilon)
* Multiplication is commutative. a * b == b * a
* Multiplication is not associative in general: a*(b*c) != (a*b)*c, because (1) there is a risk of overflow or underflow and (2) b*c may be an exact calculation, so that a*(b*c) contains only one round-off error, whereas (a*b)*c contains two. The roundoff errors may therefore accumulate at the rate of just under 1 ulp per multiplication.
* However, a limited form of associativity is possible if the type used for intermediate results is larger than any of the operands (which happens on x87 and Itanium machines). If R is the intermediate type, and F is the type being multiplied, up to min(R.max_exp/F.max_exp, R.epsilon/F.epsilon) values of type F can be multiplied together in any order without influencing the result. For example, if R is double, multiplication of 8 floats f1*f2*f3*f4*f5*f6*f7*f8 is completely associative. On x87, 130 floats can be safely multiplied together in any order, and 16 doubles can similarly be multiplied together safely. Strict distributivity does not hold even under these circumstances, as it may destroy the sign of -0.
* The distributive law almost never holds. For example, 4*x + 6*x != 10*x if x==nextDown(1.5). a*x + b*x == (a+b)*x for all x only if the operations a*x, b*x, and (a+b) are all exact operations, which is true only if a and b are exact powers of 2. Even then, if a==-b and x==-0, then a*x+b*x==0.0, (a+b)*x==-0.0.
* Performing a division by multiplication by the reciprocal returns a result which (in round-to-nearest mode) is at most 1.5 ulps from the correctly rounded result. For almost any denominator, the rounding is incorrect (>0.5ulps) for 27% of numerators. [Ref: N. Brisebarre, J-M Muller, and S.K. Raina, "Accelerating Correctly Rounded Floating-Point Division when the Divisor Is Known in Advance", IEEE Trans. on Computers, Vol 53, pp 1069-1072 (2004)].
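The distributive-law example from the list can be reproduced with a short sketch. It uses real so that no wider intermediate precision is involved, and assumes the default round-to-nearest mode.

```d
import std.math : nextDown;

void main()
{
    real x = nextDown(1.5L);

    // 6*x is rounded, and the sum 4*x + 6*x is rounded again,
    // so the result differs from the singly rounded 10*x.
    assert(4 * x + 6 * x != 10 * x);

    // With powers of two, 2*x and 4*x are exact, so both sides
    // are the same single rounding of the exact value 6*x.
    assert(2 * x + 4 * x == 6 * x);
}
```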
Powers and logarithms
* F.mant_dig = 1 - log2(F.epsilon) for binary types (for float, epsilon is 0x1p-23 and mant_dig is 24);
* F.dig = -log10(F.epsilon) for decimal types.
* F.max = exp2(F.max_exp*(1-F.epsilon)) for binary types;
* F.max = exp10(F.max_10_exp*(1-F.epsilon)) for decimal types.
* For any positive finite x, F.min_exp - F.mant_dig <= log2(x) < F.max_exp for binary types, F.min_10_exp - F.dig <= log10(x) < F.max_10_exp for decimal types
* exp2(x) == 0 if x < F.min_exp - F.mant_dig, exp2(x) == infinity if x >= F.max_exp
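The link between the binary properties can be checked for the built-in types with ldexp, which scales by an exact power of two. This is a sketch under two assumptions: a recent D compiler (min_normal rather than min), and the values in the properties table, where float has epsilon 0x1p-23 and mant_dig 24.

```d
import std.math : ldexp;
import std.meta : AliasSeq;

void main()
{
    foreach (F; AliasSeq!(float, double, real))
    {
        // epsilon is 2^(1 - mant_dig), so log2(epsilon) == 1 - mant_dig.
        assert(ldexp(cast(F) 1, 1 - F.mant_dig) == F.epsilon);

        // The smallest normal number is 2^(min_exp - 1).
        assert(ldexp(cast(F) 1, F.min_exp - 1) == F.min_normal);
    }
}
```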
NaN payloads
According to the IEEE 754 standard, a 'payload' can be stored in the mantissa of a NaN. This payload can contain information about how or why it was generated. Historically, almost no programming languages have ever made use of this potentially powerful feature. In D, this payload consists of a positive integer.
* real NaN(ulong payload) -- create a NaN with a "payload", where the payload is a ulong.
* ulong getNaNPayload(real x) -- returns the integer payload. Note that if narrowing conversions have occurred, the high-order bits may have changed.
Never store a pointer as an integer payload inside a NaN. The garbage collector will not be able to find it!
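A minimal sketch of the two functions, using an arbitrary small payload value:

```d
import std.math : NaN, getNaNPayload, isNaN;

void main()
{
    // Tag a NaN with an integer that identifies where it came from.
    real tagged = NaN(0xBAD);

    assert(isNaN(tagged));
    assert(getNaNPayload(tagged) == 0xBAD);
}
```

Small payloads such as this one are the safest choice, since the note above warns that narrowing conversions can alter the high-order bits.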
NCEG comparison operations
As well as the usual <, >, <=, and >= comparison operators, D also supports the "NCEG" operators. Most of them are the direct negation of the ordinary operators. Additionally, <>, <>=, !<>, and !<>= are provided. These 8 new operators are different from the normal operators only when a NaN is involved, so for the most part they are quite obscure. They are useful mainly in eliminating the possibility of NaN before beginning a calculation. The most useful relationships are probably:
* x <>= y is the same as !isNaN(x) && !isNaN(y), (except that signalling NaNs may be triggered by <>=).
* x !<>= y is the same as isNaN(x) || isNaN(y).
If y is any compile-time constant (eg 0), these reduce to !isNaN(x) and isNaN(x). Note that x==x is the same as !isNaN(x), and x!=x is the same as isNaN(x). Similarly, abs(x) !< x.infinity is the same as isNaN(x) || isInfinity(x). The above relationships are useful primarily because they can be used in compile time functions. Very few uses are known for the remaining NCEG operators.
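The x!=x relationship needs no special operator, so it can be sketched with ordinary comparisons. The function hasNaN is a hypothetical helper; the static asserts show it being evaluated at compile time.

```d
import std.math : isNaN;

/// True if either argument is a NaN. Uses only ==/!= semantics,
/// so it can also run during compile-time function evaluation.
bool hasNaN(double x, double y)
{
    return x != x || y != y;
}

static assert(!hasNaN(1.0, 2.0));
static assert(hasNaN(double.nan, 2.0));

void main()
{
    double q = double.nan;
    assert(hasNaN(q, 0.0) == (isNaN(q) || isNaN(0.0)));
}
```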
The IEEE Rounding Modes
The rounding mode is controlled within a scope. Rounding mode will be restored to its previous state at the end of that scope. Four rounding modes can be set. The default mode, Round to nearest, is the most statistically accurate, but the least intuitive. In the event of a tie, the result is rounded to an even number.
Rounding mode      rndint(4.5)   rndint(5.5)   rndint(-4.5)   Notes
Round to nearest   4             6             -4             Ties round to an even number
Round down         4             5             -5
Round up           5             6             -4
Round to zero      4             5             -4
There are very few reasons for changing the rounding mode. The round-up and round-down modes were created specifically to allow fast implementations of interval arithmetic; they are crucial to certain libraries, but rarely used elsewhere. The round-to-zero mode is used for casting floating-point numbers to integers. Since mode switching is slow, especially on Intel machines, it may be useful to switch to round-to-zero mode, in order to exactly duplicate the behaviour of cast(int) in an inner loop.
The only other commonly cited reason for changing the rounding mode is as a simple check for numerical stability: if the calculation produces wildly different results when the rounding mode is changed, it's a clear sign that it is suffering from round-off errors.
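The table of rounding modes can be reproduced with std.math's FloatingPointControl, which restores the previous mode when it goes out of scope. This is a sketch that uses lrint in place of the rndint shown in the table; it assumes a platform on which FloatingPointControl is supported, and an optimiser that does not move the lrint calls across the mode changes.

```d
import std.math : FloatingPointControl, lrint;

void main()
{
    {
        FloatingPointControl fpctrl; // previous mode is restored at scope exit

        fpctrl.rounding = FloatingPointControl.roundDown;
        assert(lrint(4.5) == 4 && lrint(-4.5) == -5);

        fpctrl.rounding = FloatingPointControl.roundUp;
        assert(lrint(4.5) == 5 && lrint(-4.5) == -4);

        fpctrl.rounding = FloatingPointControl.roundToZero;
        assert(lrint(5.5) == 5 && lrint(-4.5) == -4);
    }

    // Back in the default mode: ties round to an even number.
    assert(lrint(4.5) == 4 && lrint(5.5) == 6);
}
```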
The IEEE Exception Status Flags
All IEEE-compliant processors include special status bits that indicate when "weird" things have happened that programs might want to know about. For example, ieeeFlags.divByZero tells if any infinities have been created by dividing by zero. They are 'sticky' bits: once they have been set, they remain set until explicitly cleared. By checking them only once at the end of a calculation, it may be possible to avoid performing thousands of comparisons that are almost never going to fail.
Here's a list of the weird things that can be detected:
invalid
This is set if any NaN's have been generated. This can happen with ∞ - ∞, ∞ * 0, 0 * ∞, 0/0, ∞/∞, ∞%∞, or x%0, for any number x. Several other operations, such as sqrt(-1), can also generate a NaN. The invalid condition is also set when a 'signalling NaN' is accessed, indicating use of an uninitialized variable. This almost always indicates a programming error.
overflow
Set if ∞ was generated by adding or multiplying two numbers that were so large that the result was greater than real.max. This almost always indicates that the result is incorrect, and corrective action needs to be taken.
divisionByZero
Set if ±∞ was generated by dividing by zero. This usually indicates a programming error, but not always; some types of calculations return correct results even when division by zero occurs. (For example, 1/(1+ 1/x) == 0 if x == 0). Note that division by a tiny, almost-zero number also produces an infinite result, but sets the overflow flag rather than the divisionByZero flag.
underflow
This happens if two numbers are subtracted or divided and are so tiny that the result lost precision because it was subnormal. Extreme underflow produces a zero result. Underflow almost never creates problems, and can usually be ignored.
inexact
This indicates that rounding has occurred. Almost all floating point operations set this flag! It was apparently included in the hardware to support some arcane tricks used in the pioneering days of numerical analysis. It can always be ignored.
Floating-point traps can be enabled for any of the categories listed above. When enabled, a hardware exception will be generated. This can be an invaluable debugging aid. A more advanced usage, not yet supported on any platform(!) is to provide a nested function to be used as a hardware exception handler. This is most useful for the overflow and underflow exceptions.
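The sticky flags can be inspected through std.math's ieeeFlags and cleared with resetIeeeFlags. The sketch below assumes a platform where these are supported. Deriving the zero from args.length is only a trick to keep the compiler from evaluating the divisions at compile time, in which case no hardware flag would be raised.

```d
import std.math : ieeeFlags, isNaN, resetIeeeFlags;

void main(string[] args)
{
    // A run-time zero (assumes the program is run with few arguments).
    real zero = args.length > 100 ? 1.0L : 0.0L;

    resetIeeeFlags();
    assert(!ieeeFlags.divByZero && !ieeeFlags.invalid);

    real a = 3.5L / zero;         // division by zero
    assert(a == real.infinity);
    assert(ieeeFlags.divByZero);  // the flag stays set until cleared

    real b = a * zero;            // infinity * 0 is an invalid operation
    assert(isNaN(b));
    assert(ieeeFlags.invalid);
}
```

To turn such conditions into hardware exceptions instead, std.math's FloatingPointControl offers enableExceptions, for example with FloatingPointControl.severeExceptions.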
Floating point and 'pure nothrow'
Every floating point operation, even the most trivial, is affected by the floating-point rounding state, and writes to the sticky flags. The status flags and control state are thus 'hidden variables', potentially affecting every pure function; and if the floating point traps are enabled, any floating point operation can generate a hardware exception. D provides a facility for the floating-point control mode and exception flags to be usable in limited circumstances even when pure and nothrow functions are called.
[TODO: I've made two proposals, but I haven't persuaded Walter yet!].
Conclusion
Although D is a general-purpose programming language and supports many high-level concepts, it gives direct and convenient access to almost all features of modern floating-point hardware. This makes it an excellent language for development of robust, high-performance numerical code. It is also a language which encourages a deep understanding of the machine, making it fertile ground for innovation and for developing new algorithms.
References and Further Reading
1. "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
2. "An Interview with the Old Man of Floating-Point: Reminiscences elicited from William Kahan by Charles Severance"
3. N. Brisebarre, J-M Muller, and S.K. Raina, "Accelerating Correctly Rounded Floating-Point Division when the Divisor Is Known in Advance", IEEE Trans. on Computers, Vol 53, pp 1069-1072 (2004).
4. "The Borneo language"