Miscellaneous Notes
February 2026
Aathreya Kadambi
This note feels criminal to my mission of organizing my notes in a clean way, but alas, I think it’s necessary.
Computer Science
Floating Point Representation: IEEE 754
The floating point representation system is crucial to scientific computing. The vibe of this topic shouldn’t actually be new: IEEE-754 is really just glorified and slightly modified scientific notation. The implementation, however, utilizes several nuances that are specific to computer representations.
Vibe. Scientific notation (representing real decimal numbers in the form $\pm\, m \times 10^{e}$) is a useful way to represent a broad range of numbers with consistent relative (as opposed to absolute) precision.
Nuance. Computer representations are binary.
Since computers can’t represent “decimal points”, “times”, and “exponentiation” in an easily human-readable format, we need a sort of standardized way to interpret a collection of bits as a real number. That’s where IEEE-754 comes in.
Check out Lukas Kollmer’s visualizer for a good visual on how to interpret IEEE 754.
Another good resource is the table at this link which I reproduce here for ease:
| Single precision (32-bit): E (8 bits) | Single precision (32-bit): M (23 bits) | Double precision (64-bit): E (11 bits) | Double precision (64-bit): M (52 bits) | Object represented |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | true zero (±0) |
| 0 | nonzero | 0 | nonzero | denormalized (subnormal) number |
| 1–254 | anything | 1–2046 | anything | normalized floating point number |
| 255 | 0 | 2047 | 0 | infinity (±∞) |
| 255 | nonzero | 2047 | nonzero | not a number (NaN) |
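The table can be read as a small decision procedure on the two bit fields. Here is a minimal Python sketch of that reading for single precision; the function name `classify_single` is my own, and it assumes the input fits in single-precision range (Python's `struct` raises `OverflowError` otherwise).

```python
import struct

def classify_single(x: float) -> str:
    """Classify x (rounded to single precision) using its E and M fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    E = (bits >> 23) & 0xFF   # the 8 exponent bits
    M = bits & 0x7FFFFF       # the 23 mantissa bits
    if E == 0:
        return "zero" if M == 0 else "denormalized"
    if E == 255:
        return "infinity" if M == 0 else "NaN"
    return "normalized"

for x in [0.0, 1e-40, 1.0, float("inf"), float("nan")]:
    print(x, classify_single(x))
```

The sign bit is ignored here, since the table's categories do not depend on it.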
An introductory experiment to try on Kollmer’s visualizer: flip all the exponent bits to zero. Can you use the chart to explain why the green number in the centered formula is still one, despite it technically being zero in binary?
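The purpose of denormalized or subnormal numbers is to extend the representable range at small magnitudes: by dropping the implicit leading 1, the full length of the mantissa is used to fill the gap between zero and the smallest normalized number with evenly spaced values.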
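To see this gap-filling concretely, here is a small Python sketch using double precision (the format Python's `float` uses on essentially all platforms, which I am assuming here). The variable names are mine.

```python
import math
import sys

# Smallest positive normalized double: E = 1, M = 0.
smallest_normal = sys.float_info.min
# Smallest positive denormalized double: E = 0, M = 1.
smallest_subnormal = math.ldexp(1.0, -1074)

print(smallest_normal == math.ldexp(1.0, -1022))   # the normalized range stops at 2**-1022
print(0.0 < smallest_subnormal < smallest_normal)  # subnormals live below it
print(smallest_subnormal / 2 == 0.0)               # below the subnormals, values round to zero
```

Without subnormals, everything between zero and `smallest_normal` would have to round to one of those two values.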
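To explain the standard simply, there are two pieces. But first, let us introduce a new type of scientific notation alongside the usual one.
First, the usual scientific notation corresponds to normalized representations:
$$\pm\, 1.m \times 2^{e}$$
Second, the new type corresponds to denormalized representations, where the leading digit is 0 rather than 1:
$$\pm\, 0.m \times 2^{e+1}$$
In both forms, $m$ is the string of binary digits after the point.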
For the following, let us say that we are using a floating point representation with a bias of $b$. For our convention, $e = E + b$ (where $E$ is the stored exponent field) and $b$ is a negative number; for example, $b = -127$ in single precision. This is what I’ve seen at UC Berkeley, but different opinions seem to exist. If your bias is positive, it likely means that you should be swapping all the signs in front of $b$ in the formulas below, or you’ve encountered a very strange exam question. 🪦
From Human Representations to IEEE 754 Floating Point Representations
As you might have noticed above, regardless of whether we use the normalized or denormalized representations, there are three components:
- $\pm$: the sign of your number. This takes one bit to represent.
- $m$: the “mantissa” or “significand” of your number.
- $e$: the “exponent” of your number.
Mantissa is a historically loaded word, so the word significand is often slightly preferred, but to be honest at UC Berkeley people seem to say mantissa more frequently in my experience.
We need to put these three pieces into the three pieces of the IEEE 754 representation, which are conveniently named to refer to which part of the number they store information about. $S$ is a single bit representing the sign, $E$ consists of $k$ bits representing the exponent, and $M$ consists of $n$ bits representing the mantissa.
Nuance. Since binary strings are most easily understood as nonnegative integers, we utilize the “bias” $b$ to offset the raw number in $E$ to obtain $e$.
In the end, we will try to compute $S$, $E$, and $M$, which are binary strings representing the pieces of information above. Then, our number can be stored as the concatenation $S\,E\,M$ in memory.
- $S$ is simply 0 if your number is positive and 1 if your number is negative.
- $E$ is:
  - Normalized case: If $e > b$, $E = e - b$.
  - Denormalized case: If $e \le b$, rearrange your number into the denormalized representation above to make $e = b$, and set $E = 0$.
- $M$ is always $m$ (padded with trailing zeros to fill the available bits); however, what $m$ means corresponds to either the normalized or denormalized human representation above.
And boom! If you use these, you should get your IEEE 754 representation.
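As a sanity check on this recipe, here is a minimal Python sketch for single precision, assuming the negative-bias convention with $b = -127$ and 23 mantissa bits. The name `encode_single` is hypothetical, and the sketch assumes a finite input that is exactly representable in single precision (it does no rounding and does not handle infinity or NaN).

```python
import math
import struct

BIAS = -127   # b for single precision, under the negative-bias convention (assumed)
N = 23        # number of mantissa bits

def encode_single(x: float):
    """Return (S, E, M) as integers for a finite, exactly representable x."""
    S = 1 if math.copysign(1.0, x) < 0 else 0
    a = abs(x)
    if a == 0.0:
        return S, 0, 0
    # frexp gives a = f * 2**p with 0.5 <= f < 1, so a = 1.m * 2**(p - 1).
    _, p = math.frexp(a)
    e = p - 1
    if e > BIAS:
        # Normalized: E = e - b, and M holds the digits after the leading 1.
        E = e - BIAS
        M = int(math.ldexp(a, N - e)) - (1 << N)
    else:
        # Denormalized: rewrite a as 0.m * 2**(b + 1) and set E = 0.
        E = 0
        M = int(math.ldexp(a, N - (BIAS + 1)))
    return S, E, M

# Cross-check one value against the bits Python's struct module produces.
(bits,) = struct.unpack(">I", struct.pack(">f", -0.15625))
print(encode_single(-0.15625), (bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF))
```

Both tuples printed by the last line should agree, since $-0.15625 = -1.01_2 \times 2^{-3}$ is exactly representable.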
From IEEE 754 Floating Point Representations to Human Representations
Suppose you are now given an IEEE 754 representation, with binary strings $S$, $E$, and $M$. We now want to recover the sign, $m$, and $e$ of our number.
- If $S$ is 0, our number is positive, and if it’s 1, it’s negative.
- $e$ is always $E + b$.
  - If $E = 0$, you should convert into the denormalized human representation.
  - If $E \neq 0$ (and $E$ is not all ones, which the table above reserves for infinity and NaN), you should convert into the normalized human representation.
- $m$ is always $M$, but be careful to use the normalized or denormalized human representation above.
And that’s it!
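The decoding direction can be sketched the same way. This is a minimal Python illustration for single precision, again assuming $b = -127$ and 23 mantissa bits; `decode_single` is a hypothetical name, and the all-ones exponent case is taken from the table rather than from the steps above.

```python
BIAS = -127   # b for single precision, under the negative-bias convention (assumed)
N = 23        # number of mantissa bits

def decode_single(S: int, E: int, M: int) -> float:
    """Recover the value from the integer fields S (1 bit), E (8 bits), M (23 bits)."""
    sign = -1.0 if S == 1 else 1.0
    if E == 255:
        # Reserved exponent: infinity if M is zero, NaN otherwise (per the table).
        return sign * float("inf") if M == 0 else float("nan")
    e = E + BIAS            # e is always E + b
    frac = M / 2 ** N       # the value of 0.m
    if E == 0:
        # Denormalized: 0.m * 2**(e + 1)
        return sign * frac * 2.0 ** (e + 1)
    # Normalized: 1.m * 2**e
    return sign * (1.0 + frac) * 2.0 ** e

print(decode_single(1, 124, 0x200000))
```

The printed value corresponds to $-1.01_2 \times 2^{-3}$.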
Some Other Useful Formulas
Another interesting point of discussion is the “step size”. Suppose your representation has $n$ mantissa bits. If your exponent is $e$, then the step size is $2^{e-n}$ in the normalized case, and $2^{e+1-n}$ in the denormalized case. Alternatively, since $e = E + b$, the step size is also $2^{E+b-n}$ in the normalized case, and $2^{E+b+1-n}$ in the denormalized case. Since $E = 0$ for denormalized numbers, this is $2^{b+1-n}$.
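These step sizes can be checked numerically by incrementing the raw bit pattern by one and measuring the gap. Below is a minimal Python sketch for single precision, assuming $b = -127$ and $n = 23$; the helper names `from_bits` and `step_size` are mine.

```python
import struct

BIAS = -127   # b for single precision, under the negative-bias convention (assumed)
N = 23        # n, the number of mantissa bits

def from_bits(bits: int) -> float:
    """Interpret a 32-bit unsigned integer as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def step_size(E: int, M: int = 0) -> float:
    """Gap between the positive single with fields (E, M) and the next one up."""
    bits = (E << N) | M
    return from_bits(bits + 1) - from_bits(bits)

# Normalized, E = 127 (so e = 0): expect 2**(E + b - n).
print(step_size(127) == 2.0 ** (127 + BIAS - N))
# Denormalized, E = 0: expect 2**(b + 1 - n).
print(step_size(0) == 2.0 ** (BIAS + 1 - N))
# The smallest normalized exponent E = 1 has the same step as the denormalized range.
print(step_size(1) == step_size(0))
```

The last comparison shows why the denormalized exponent is $e + 1$ rather than $e$: the spacing continues smoothly from the smallest normalized numbers down to zero.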
