# Introduction to Floating Point Numbers in Java

In this tutorial we introduce the technology that Java uses to store floating point numbers. Java implements the 1985 IEEE 754 floating point format.

Two floating point types are supported: single precision and double precision. Here is a summary of each:

| Java type | Size (bytes) | Size (bits) | Approximate range | Precision (decimal digits) |
|-----------|--------------|-------------|-------------------|----------------------------|
| float     | 4            | 32          | +/- 3.4 * 10^38   | 6-7                        |
| double    | 8            | 64          | +/- 1.8 * 10^308  | 15                         |
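The sizes and limits in this summary can be read back from constants on the wrapper classes. The following is a minimal sketch; the class name `FloatLimits` is only illustrative, and `Float.BYTES` / `Double.BYTES` assume Java 8 or later.

```java
class FloatLimits {
    public static void main(String[] args) {
        // Storage size of each type, in bytes and in bits.
        System.out.println("float:  " + Float.BYTES + " bytes, " + Float.SIZE + " bits");
        System.out.println("double: " + Double.BYTES + " bytes, " + Double.SIZE + " bits");

        // Largest finite magnitude each type can hold.
        System.out.println("Float.MAX_VALUE  = " + Float.MAX_VALUE);
        System.out.println("Double.MAX_VALUE = " + Double.MAX_VALUE);
    }
}
```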

As an example, consider the decimal (base 10) value 2.0. Humans prefer base 10, but computers prefer base 2.

When stored in IEEE 754 single precision format it looks like this in binary (base 2):

    seee eeee emmm mmmm mmmm mmmm mmmm mmmm
    0100 0000 0000 0000 0000 0000 0000 0000

Here s is the sign bit, e is the exponent, and m is the mantissa (also called the fraction).

The exponent is 'biased' by +127. In other words, the exponent value stored in the number has 127 added to it. In the decimal 2.0 example above, the stored exponent field is 1000 0000 in binary, which is 128; subtracting 127 gives the true exponent, 1.

The mantissa has an assumed 1 as the leftmost digit. This slick trick provides an additional bit of precision. 24 bits of precision fit into 23 bits!

The radix point of the mantissa is assumed to be to the right of the assumed "1". Referring to our decimal 2.0 example again, the mantissa looks like this:

1.00000000000000000000000 (base 2)

That's 1. followed by 23 zeroes. Remember we are still working with base 2.

Next we apply the exponent. In our example the exponent worked out to be 1, so we shift the radix point 1 place to the right. In other words, we are multiplying by 2. The result is this:

10.000000000000000000000 (base 2)

This number translates to 2.0 in base 10.
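The decoding steps walked through above can be sketched in code. This is an illustrative sketch, not part of the original figures: the class `FloatFields` and its method names are hypothetical, and `rebuild()` handles only ordinary (normalized) values, where the assumed leading 1 applies.

```java
class FloatFields {
    // Sign: the single leftmost bit (bit 31).
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Exponent as stored, still carrying the +127 bias (bits 30..23).
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // The 23 stored mantissa bits (bits 22..0), without the assumed 1.
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    // Rebuilds the value: (1 + fraction / 2^23) * 2^(exponent - 127).
    // Valid for normalized numbers only (not zero, subnormals, infinity, NaN).
    static double rebuild(float f) {
        double significand = 1.0 + fraction(f) / (double) (1 << 23);
        double magnitude = Math.scalb(significand, biasedExponent(f) - 127);
        return sign(f) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        float f = 2.0f;
        System.out.println("sign            = " + sign(f));
        System.out.println("biased exponent = " + biasedExponent(f));
        System.out.println("true exponent   = " + (biasedExponent(f) - 127));
        System.out.println("fraction bits   = " + Integer.toBinaryString(fraction(f)));
        System.out.println("rebuilt value   = " + rebuild(f));
    }
}
```

For 2.0 the sign is 0, the biased exponent is 128, and the fraction bits are all zero, which matches the bit pattern shown earlier.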

Figure 01 illustrates a useful technique for retrieving the internal storage format of a floating point number.

    int x = Float.floatToIntBits((float)2.0);
    System.out.printf("\n hex format of 2.0 = %x", x);

The code snippet in Figure 01 generates the output displayed in Figure 02. It illustrates how to obtain the internal storage format of the floating point number. The method used, floatToIntBits(), is a member of the Float class. It is a static method, so we don't need to create a Float object to call it. A companion method, doubleToLongBits(), exists in the Double class.
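For comparison, here is a minimal sketch of the same technique applied to a double. `Double.doubleToLongBits()` returns a `long` because a double occupies 64 bits; the format uses 1 sign bit, 11 exponent bits (biased by 1023) and 52 mantissa bits. The class name `DoubleBits` and the helper `hex()` are illustrative only.

```java
class DoubleBits {
    // Returns the raw 64-bit storage pattern of a double as a hex string.
    static String hex(double d) {
        return Long.toHexString(Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        System.out.println("hex format of 2.0 (double) = " + hex(2.0));
    }
}
```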

We can easily cobble up a snippet of code that builds properly but carries a potential problem. Figure 03 illustrates the precision problem.

We start with a number that looks mostly harmless: 1236.0007. We store the number in a float data item (IEEE 754 Single Precision format).

Next, we subtract the integer part of the number. Intuitively the result of the subtraction should be .0007. It's not.

Finally, we perform a test for equality to verify that we still have the .0007.

Figure 04 is the output of the code snippet in Figure 03.
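The three steps just described can be sketched as follows. This is a minimal reconstruction of those steps, not necessarily the exact code of Figure 03; the class name `PrecisionDemo`, the helper `fractionalPart()` and the variable `f2` are assumptions made for illustration.

```java
class PrecisionDemo {
    // Stores 1236.0007 in a float, then subtracts the integer part.
    static float fractionalPart() {
        float f1 = 1236.0007f;
        return f1 - 1236;
    }

    public static void main(String[] args) {
        float f2 = fractionalPart();
        System.out.println("f2 = " + f2);

        // Test for equality against the value we intuitively expect.
        if (f2 == 0.0007f) {
            System.out.println("f2 equals 0.0007");
        } else {
            System.out.println("f2 does NOT equal 0.0007");
        }
    }
}
```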

OK, we have seen that the result of 1236.0007 minus 1236 is not .0007.

What went wrong? Is Java broken?

Not at all. When we store 1236.0007 into **f1** (remember that f1 is a float data item), the bit pattern that is stored is 0x449a8006. This bit pattern actually represents 1236.000732421875, which is the closest we can get to 1236.0007 using 24 bits of precision. We simply cannot store it any better.
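Both facts can be checked directly. The sketch below prints the stored bit pattern and the exact decimal value it represents; using `java.math.BigDecimal` for the exact expansion is our choice for illustration, and the class name `StoredValue` with its helpers is hypothetical.

```java
import java.math.BigDecimal;

class StoredValue {
    // Raw 32-bit storage pattern as a hex string.
    static String bits(float f) {
        return Integer.toHexString(Float.floatToIntBits(f));
    }

    // Exact decimal expansion of the value actually stored.
    // Widening float to double is lossless, and BigDecimal(double) is exact.
    static String exact(float f) {
        return new BigDecimal((double) f).toString();
    }

    public static void main(String[] args) {
        float f1 = 1236.0007f;
        System.out.println("bits  = 0x" + bits(f1));
        System.out.println("exact = " + exact(f1));
    }
}
```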

On a side note, Figure 05 illustrates how to reach infinity. The IEEE 754 format provides a special value for representing infinity. The code is a little kludgy, but it does illustrate how to overflow a floating point data item.

Figure 06 is the output of the code in Figure 05.
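One simple way to overflow a float is sketched below. It is not necessarily the approach taken in Figure 05: it starts from `Float.MAX_VALUE` and doubles it, which pushes the result past the largest finite value. The class name `OverflowDemo` and the helper `overflow()` are illustrative.

```java
class OverflowDemo {
    // Doubling the largest finite float exceeds the representable range,
    // so the result is the special value positive infinity.
    static float overflow() {
        float f = Float.MAX_VALUE;
        return f * 2.0f;
    }

    public static void main(String[] args) {
        float result = overflow();
        System.out.println("result      = " + result);
        System.out.println("is infinite = " + Float.isInfinite(result));
    }
}
```

Note that no exception is thrown on overflow; the operation quietly yields infinity, so code that needs to detect it has to check with `Float.isInfinite()`.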

We have seen that floating point numbers can come with precision errors under some circumstances.