Representation of Numbers in the Computer
Latest revision as of 10:29, 20 October 2025
Questions to [David Rotermund](mailto:davrot@uni-bremen.de)
By Jan Wiersig, modified by Udo Ernst and translated into English by Daniel Harnack. David Rotermund replaced the Matlab code with Python code.
Bits and Bytes
In digital computers, information is represented by discrete states of circuit elements. The smallest possible unit of information is one bit (binary digit). It can be in one of two different states, which may be described in different ways:
| true | false |
| yes | no |
| 1 | 0 |
A bit is easily realized physically:
| current flowing | current not flowing |
| voltage there | voltage gone |
| magnetized | not magnetized |
For example, in the communication between integrated circuits, information is traded via changes in voltage, while the bits on the hard disk are written by magnetization of ferromagnetic materials. Between electronic musical instruments, bits are exchanged through current loops (MIDI-Standard). In the stone age of computers, the fifties, bits were even once realized as air bubbles in a liquid medium.
By combining several bits, complex information like numbers, letters (e.g. 8-bit ASCII code), images, color, music, etc., can be represented. Much like in the decimal system, where whole numbers are made up of the digits 0-9, the same numbers can be written in the so-called dual system by sequences of 0 and 1:
| dual system | decimal system |
|---|---|
| 000 | 0 |
| 001 | 1 |
| 010 | 2 |
| 011 | 3 |
| 100 | 4 |
| 101 | 5 |
| 110 | 6 |
| 111 | 7 |
Two further examples for 8-bit numbers:
01010101
10101010
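The table and the two examples above can be checked directly with Python's built-in base-2 conversion:

```python
# Parse dual-system (base-2) strings and convert back with bin()
print(int("01010101", 2))  # -> 85
print(int("10101010", 2))  # -> 170
print(bin(5))              # -> 0b101, matching the table row for decimal 5
```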
To represent negative numbers, a small trick is necessary: one specific bit codes for the sign of the number. In an $n$-bit number, one would for example reserve the $n$-th bit for the sign, and the remaining $n-1$ bits code, as usual, a binary number $z$: If the $n$-th bit is not set, then the result is simply $+z$. If the $n$-th bit is set, the result is $z - 2^{n-1}$. For illustration, again a small table:
| dual system | decimal system |
|---|---|
| 01111111 | +127 |
| 01111110 | +126 |
| 00000010 | +2 |
| 00000001 | +1 |
| 00000000 | +0 |
| 11111111 | -1 |
| 11111110 | -2 |
| 10000010 | -126 |
| 10000001 | -127 |
| 10000000 | -128 |
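The sign convention of this table (two's complement) can be sketched in a few lines of Python; `signed8` is a helper name chosen here for illustration, not a library function:

```python
def signed8(bits: str) -> int:
    # Interpret an 8-bit pattern as a signed number, as in the table:
    # if the sign bit is set (unsigned value >= 128), subtract 2^8.
    z = int(bits, 2)
    return z - 256 if z >= 128 else z

print(signed8("01111111"))  # -> 127
print(signed8("11111111"))  # -> -1
print(signed8("10000000"))  # -> -128
```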
Certain bit lengths have names:
| 1 Byte | 8 Bit |
| 1 Word | 16 Bit |
| 1 Kilobyte | 1024 Byte |
| 1 Megabyte | 1024 Kilobyte |
| 1 Gigabyte | 1024 Megabyte |
| 1 Terabyte | 1024 Gigabyte |
Representation of Real Numbers and Numerical Errors
Real numbers are, by their nature, analogue quantities. Hence we would expect the handling of these numbers on digital computers not to be completely problem-free. Present digital computers usually represent real numbers as floating-point numbers.
floating-point number = mantissa ⋅ basis^exponent
The precision with which a real number can be represented is determined by the number of available bits. "Single precision" (i.e. float32) requires 4 bytes; "double precision" (i.e. float64) requires 8 bytes. The latter is the default in Python and Matlab. The IEEE format for double precision uses 53 bits for the mantissa and 11 bits for the exponent; the basis is 2. One bit of the mantissa and one bit of the exponent are used for the respective signs. Thus, the exponent can vary between $-1024$ and $+1023$. In the IEEE notation, the mantissa always represents a value in the interval $[1, 2[$. Here, the 52 bits are utilized to add up fractions of powers of 2. The value of the mantissa yields $\mathrm{mantissa} = 1 + \sum_{i=1}^{52} b_i 2^{-i}$, with $b_i = 1$ if the $i$-th bit of the mantissa is set.
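This decomposition can be inspected in Python: `float.hex()` prints a double as its 52 mantissa bits and base-2 exponent, and the mantissa formula above reconstructs the value exactly (a minimal sketch, assuming the standard IEEE float64 used by CPython):

```python
# float.hex() shows a double as 0x1.<52 mantissa bits in hex>p<exponent>
x = 0.1
print(x.hex())  # -> 0x1.999999999999ap-4

# Reconstruct 0.1 from its mantissa bits and exponent -4,
# using mantissa = 1 + sum of b_i * 2^-i as in the formula above
mantissa = 1 + 0x999999999999A / 2**52
print(mantissa * 2**-4 == 0.1)  # -> True
```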
Range Error
The maximal range of the floating-point numbers is determined by the number of bits used to code for the exponent. A typical range for single precision is

$2^{\pm 127} \approx 10^{\pm 38}$
and for double precision

$2^{\pm 1023} \approx 10^{\pm 308}$
By applying arithmetic operations to these numbers, the range can be exceeded. The error occurring in that case is named a range error. As an example we consider the Bohr radius in SI units

$a_0 = \frac{4\pi\varepsilon_0\hbar^2}{m_e e^2} \approx 5.3\times 10^{-11}\,\mathrm{m}$
The quantity $\hbar$ is Planck's quantum of action divided by $2\pi$. Bohr's radius is in the range of single-precision floating-point numbers. However, the same does not hold for the numerator $4\pi\varepsilon_0\hbar^2 \approx 1.24\times 10^{-78}\,\mathrm{kg\,C^2\,m}$ and the denominator $m_e e^2 \approx 2.34\times 10^{-68}\,\mathrm{kg\,C^2}$. That is, neither the numerator nor the denominator can be represented as a single-precision floating-point number. Hence, the calculation of Bohr's radius by the given formula can be problematic. A simple solution of this problem lies in the use of natural units, such as Bohr's radius, for distances, etc.
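The underflow can be demonstrated with standard Python by round-tripping a value through 4-byte IEEE single-precision storage (a sketch; `to_float32` is a helper defined here, not a library function):

```python
import struct

def to_float32(x: float) -> float:
    # Store x as a 4-byte IEEE single and read it back
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_float32(1.24e-78))  # numerator: underflows to 0.0 in single precision
print(to_float32(2.34e-68))  # denominator: underflows to 0.0 as well
print(1.24e-78 / 2.34e-68)   # fine in double precision: ~5.3e-11
```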
An even bigger problem can be illustrated by the calculation of the factorial. The factorial is defined as

$n! = n\cdot(n-1)\cdot(n-2)\cdot\ldots\cdot 3\cdot 2\cdot 1$
In Python or Matlab, it can be easily verified by using the function factorial(n) that the factorial for $n > 170$ cannot be represented, even with double-precision numbers. A way out is provided by the use of logarithms, since the logarithm of a big number still gives a moderately small value, e.g. $\log_{10}(10^{100}) = 100$. It ensues that

$\ln(n!) = \ln(n) + \ln(n-1) + \ldots + \ln(3) + \ln(2) + \ln(1)$
For bigger $n$, however, the evaluation of this expression is too laborious. If $n > 30$, one is advised to use Stirling's formula

$\ln(n!) = n\ln(n) - n + \frac{1}{2}\ln(2\pi n) + \ln\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \ldots\right)$
The factorial $n!$ can then be written as

$n! = \mathrm{mantissa}\times 10^{\mathrm{exponent}}$
To get the mantissa and the exponent, we form the logarithm to the basis 10 (reminder: $\log_{10}(x) = \ln(x)/\ln(10)$):

$\log_{10}(n!) = \log_{10}(\mathrm{mantissa}) + \mathrm{exponent}$
We now associate the integer part of $\log_{10}(n!)$ with the exponent. The post-decimal places are associated with the mantissa, i.e. mantissa $= 10^a$ with $a = \log_{10}(n!) - \mathrm{exponent}$.
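This recipe can be sketched with `math.lgamma`, which returns $\ln\Gamma(x)$, so that `lgamma(n + 1)` equals $\ln(n!)$; the function name `factorial_sci` is an illustrative choice:

```python
import math

def factorial_sci(n: int) -> tuple[float, int]:
    # log10(n!) = ln(n!) / ln(10), with ln(n!) = lgamma(n + 1)
    log10_fact = math.lgamma(n + 1) / math.log(10)
    exponent = int(log10_fact)                  # integer part -> exponent
    mantissa = 10.0 ** (log10_fact - exponent)  # fractional part -> mantissa
    return mantissa, exponent

m, e = factorial_sci(1000)
print(f"1000! ~= {m:.4f} x 10^{e}")  # far beyond the double-precision range
```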
From these examples we learn that range errors can usually be circumvented with a little creativity.
Rounding Error
Rounding errors stem from the finite precision of the mantissa. The following program illustrates this fact:
```python
x: float = 1.0
while 1 + x != 1:
    x = x / 2
x = x * 2
print(x)  # -> 2.220446049250313e-16
```
One might think that this constitutes an infinite loop. On the contrary, the loop is left in finite time. The result for double precision is $x \approx 2\times 10^{-16}$, and the same value can be queried directly:
```python
import sys
print(sys.float_info.epsilon)  # -> 2.220446049250313e-16
```

eps is the smallest number with $1 + \epsilon > 1$, and is called the machine accuracy (machine epsilon). Rounding errors of this order of magnitude occur on a regular basis. For example, Python calculates $\sin(\pi) \approx 1.2246\times 10^{-16}$ instead of exactly 0:
```python
import math
print(math.sin(math.pi))  # -> 1.2246467991473532e-16
```
It shall be mentioned that the machine accuracy for double precision is exactly $\epsilon = 2^{-52}$, since 52 bits (plus one bit for the sign) are used for the mantissa. This rounding error might appear to be small and negligible. However, if further calculations are performed with rounded numbers, the rounding errors can accumulate with each calculation and grow to a significant value.
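A classic illustration of such accumulation, assuming nothing beyond standard Python:

```python
import math

# Adding 0.1 ten times does not give exactly 1.0, because each
# addition rounds to the nearest representable double
total = 0.0
for _ in range(10):
    total += 0.1
print(total)         # -> 0.9999999999999999
print(total == 1.0)  # -> False

# math.fsum tracks the lost round-off and recovers the correctly rounded sum
print(math.fsum([0.1] * 10) == 1.0)  # -> True
```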