Consider a computer that uses 20-bit floating point numbers of the form $(-1)^s \times 0.m \times 2^{n-p}$,
with a 1-bit sign indicator $s$, a 7-bit exponent $n$, and a 12-bit mantissa $m$, stored as binary numbers. The most significant bit of the mantissa must be 1. $p$ is a bias subtracted from $n$ to
represent both positive and negative exponents.
Note that $s = 0$ for positive numbers and $s = 1$ for negative numbers, and the maximum value of the 7-bit exponent is $n = 1111111_2$, i.e. $2^7 - 1 = 127$.
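The length of the exponent controls the range of numbers that can be represented. To ensure,
however, that numbers with small magnitude can be represented as accurately as numbers with
large magnitude, we subtract the bias $p = 2^6 - 1 = 63$ from the exponent $n$. Thus, the effective
range of the exponent is not $\{0, \dots, 127\}$ but $\{-63, \dots, 64\}$.
The minimum value of the mantissa $0.m$ is $0.100000000000_2 = 2^{-1}$ and its maximum value is $0.111111111111_2 = 1 - 2^{-12}$. Thus, $2^{-1} \le 0.m \le 1 - 2^{-12}$.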
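The absolute value of the largest floating point number that can
be stored in the computer is $L = (1 - 2^{-12}) \times 2^{64} \approx 1.8 \times 10^{19}$. Computations involving larger numbers, e.g. $L^2$, produce an overflow error.
The smallest absolute number that can be stored is $S = 2^{-1} \times 2^{-63} = 2^{-64} \approx 5.4 \times 10^{-20}$. Similarly, computations involving smaller numbers, e.g. $S^2$, produce an underflow error.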
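The range limits can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the bias is $2^6 - 1 = 63$ and the mantissa is read as the binary fraction $0.m$ (both consistent with the worked example of 836.75); the names `BIAS`, `largest` and `smallest` are illustrative only.

```python
from fractions import Fraction

EXP_BITS, MANT_BITS = 7, 12
BIAS = 2**(EXP_BITS - 1) - 1            # assumed bias: 63

n_max = 2**EXP_BITS - 1                 # largest stored exponent: 127
m_min = Fraction(1, 2)                  # 0.100000000000 in binary
m_max = 1 - Fraction(1, 2**MANT_BITS)   # 0.111111111111 in binary

# Largest and smallest positive machine numbers, kept exact as fractions.
largest = m_max * Fraction(2)**(n_max - BIAS)
smallest = m_min * Fraction(2)**(0 - BIAS)

print(float(largest))    # roughly 1.8e19
print(float(smallest))   # roughly 5.4e-20
```

Using `Fraction` rather than native floats keeps the arithmetic exact, so the printed values are only rounded at the final conversion.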
Consider the number represented by
Sign | Exponent | Mantissa |
0 | 1001001 | 110100010011 |
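That is, $(-1)^0 \times 0.110100010011_2 \times 2^{1001001_2 - 63}$.
The sign indicator is 0, i.e. the number is positive.
The exponent is $n = 1001001_2 = 2^6 + 2^3 + 2^0 = 73$, so the effective exponent, after subtracting the bias of 63, is $73 - 63 = 10$, i.e. the scale factor is $2^{10}$.
The mantissa gives $0.m = 2^{-1} + 2^{-2} + 2^{-4} + 2^{-8} + 2^{-11} + 2^{-12} = 3347/4096$.
So, the machine number represents $(3347/4096) \times 2^{10} = 3347/4 = 836.75$.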
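The decoding steps above can be sketched as a small function. This is an illustration only: the helper name `decode` and the bias value 63 are assumptions consistent with the result 836.75, not part of any real library.

```python
from fractions import Fraction

BIAS = 63  # assumed bias, 2**6 - 1

def decode(sign: str, exponent: str, mantissa: str) -> Fraction:
    """Return the exact value of a 20-bit machine number given as bit strings."""
    if len(sign) != 1 or len(exponent) != 7 or len(mantissa) != 12:
        raise ValueError("expected 1 sign bit, 7 exponent bits, 12 mantissa bits")
    if mantissa[0] != "1":
        raise ValueError("the most significant mantissa bit must be 1")
    s = int(sign, 2)
    n = int(exponent, 2)
    frac = Fraction(int(mantissa, 2), 2**12)   # 0.m as an exact fraction
    return (-1)**s * frac * Fraction(2)**(n - BIAS)

value = decode("0", "1001001", "110100010011")
print(value, float(value))   # 3347/4 836.75
```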
The next floating point number that we can store in this machine is
Sign | Exponent | Mantissa |
0 | 1001001 | 110100010100 |
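The sign and the exponent remain unchanged and we simply add 1 to the least significant bit of the mantissa. The new number is $(3348/4096) \times 2^{10} = 837$, so our primitive computer would be unable to store exactly any number between 836.75 and 837, leading to a relative uncertainty equal to $0.25/836.75 \approx 0.0003$.
At worst, the relative uncertainty in the value of floating point numbers that this primitive computer can store is equal to $2^{-12}/2^{-1} = 2^{-11} \approx 0.0005$, which occurs when the mantissa takes its smallest value $2^{-1}$ while the spacing between neighbouring mantissas is $2^{-12}$.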
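The spacing argument can be verified numerically. The sketch below assumes a bias of 63 and a 12-bit mantissa read as $0.m$; `spacing` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction

BIAS, MANT_BITS = 63, 12   # assumed bias and mantissa length

def spacing(n: int) -> Fraction:
    """Gap between adjacent machine numbers that share the stored exponent n."""
    return Fraction(1, 2**MANT_BITS) * Fraction(2)**(n - BIAS)

x = Fraction(3347, 4)      # 836.75, stored exponent n = 73
gap = spacing(73)          # distance to the next machine number
rel = gap / x              # relative uncertainty at x

# Worst case: one unit in the last mantissa bit relative to the smallest mantissa.
worst = Fraction(1, 2**MANT_BITS) / Fraction(1, 2)

print(float(gap), float(rel), float(worst))
```

The exponent factor cancels in the worst-case ratio, which is why the bound does not depend on the magnitude of the number.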
Suppose that we perform a calculation to which the answer is
There are two ways to approximate this:
1. The most accurate is rounding to the nearest floating point number.
2. Many computers simply chop off the expression at the bit length of the mantissa
and ignore the extra digits, giving an answer of
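The two approaches can be compared with a short sketch. The input value 836.9 is a hypothetical example chosen for illustration, not the value used in the text, and `to_machine` is an assumed helper name; the sketch handles positive values only and ignores the limited exponent range.

```python
from fractions import Fraction
import math

MANT_BITS = 12

def to_machine(x: Fraction, mode: str) -> Fraction:
    """Map a positive x onto a 12-bit mantissa by rounding or by chopping."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find e such that 1/2 <= x / 2**e < 1 (normalised mantissa).
    e = 0
    while x / Fraction(2)**e >= 1:
        e += 1
    while x / Fraction(2)**e < Fraction(1, 2):
        e -= 1
    scaled = x / Fraction(2)**e * 2**MANT_BITS   # mantissa in units of 2**-12
    if mode == "round":
        k = math.floor(scaled + Fraction(1, 2))  # nearest, ties rounded up
    elif mode == "chop":
        k = math.floor(scaled)                   # discard the extra bits
    else:
        raise ValueError("mode must be 'round' or 'chop'")
    return Fraction(k, 2**MANT_BITS) * Fraction(2)**e

x = Fraction(8369, 10)                 # hypothetical exact answer, 836.9
print(float(to_machine(x, "round")))   # 837.0
print(float(to_machine(x, "chop")))    # 836.75
```

Chopping always moves towards zero, so its error can be up to one full spacing, whereas rounding to nearest is off by at most half a spacing.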