# Floating Types

## Introduction

Digital computers are great, but they run into a problem when computing with numbers of arbitrary precision. Not every possible number can be represented, because some numbers would take up an infinite amount of space. Thankfully, clever engineers invented floating point values, which offer a reasonable trade-off between range and precision.

## History of floating point

According to Wikipedia, the invention of floating point values dates back to as early as 1914, when Leonardo Torres y Quevedo proposed an early form of floating point values for a calculator design.
The Z1, built by Konrad Zuse in 1938, was the first freely programmable computer, and it already had full support for 24-bit floating point values!
The Z3, completed in 1941, even had representations for +/-∞.

Because floating point numbers have many subtleties and no single obvious way of doing things, by the 1980s there were many incompatible floating point implementations. This was the source of many problems and led to the standardization of floating point arithmetic in IEEE 754.

## (Binary) floating point numbers - a refresher

Computers store (binary) floating point numbers by breaking them up into a sign, n digits of the fraction, and an exponent which is stored in m bits. A floating point number thus needs 1 sign bit + n fraction bits + m exponent bits of space. Special values that floating point numbers can represent are +/-zero, +/-∞ and NaN (not a number).
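The layout described above can be made concrete with a minimal sketch in Python, assuming the 64-bit IEEE 754 format (1 sign bit, m = 11 exponent bits, n = 52 fraction bits). The helper name `decompose` is hypothetical and only serves this illustration.

```python
import struct

def decompose(x):
    """Split a 64-bit float into (sign, biased exponent, fraction) bit fields.

    Hypothetical helper for illustration; assumes the IEEE 754 binary64 layout.
    """
    # Reinterpret the 8 bytes of the float as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 sign bit
    exponent = (bits >> 52) & 0x7FF     # 11 exponent bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)   # 52 fraction bits
    return sign, exponent, fraction

# -6.5 is -1.625 * 2**2, so the sign bit is set and the stored exponent is 2 + 1023.
sign, exponent, fraction = decompose(-6.5)
print(sign, exponent - 1023, hex(fraction))
```

The fraction field holds only the digits after the leading 1 of the binary significand, which is why 1.625 (binary 1.101) shows up as the bit pattern 101 followed by zeros.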

In IEEE 754, every arithmetic operation (+, -, *, /, ...) is defined for all floating point inputs, even dividing by zero. Dividing a nonzero number by zero gives +/-∞, while dividing zero by zero gives NaN. A NaN is unequal to every number, including itself. If a result gets too big to be represented by the floating point type, it becomes +/-∞.
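The behaviour of the special values can be observed with a short sketch in Python, whose floats are 64-bit IEEE 754 values. One caveat: Python itself raises `ZeroDivisionError` for float division by zero instead of returning ∞ or NaN, so the sketch produces the special values through other operations.

```python
import math

inf = math.inf
nan = inf - inf              # ∞ - ∞ has no meaningful value, so the result is NaN

overflowed = 1e308 * 10.0    # too big for a 64-bit float, so the result is +∞

print(nan == nan)            # False: NaN is unequal even to itself
print(math.isnan(nan))       # True: the reliable way to test for NaN
print(overflowed == inf)     # True
print(-0.0 == 0.0)           # True: the two zeros compare equal ...
print(math.copysign(1.0, -0.0))  # -1.0: ... but the sign is still stored
```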

Because of the way floating point numbers are stored, their density decreases the further you are from zero. As a result, the absolute gap between neighbouring values grows with their magnitude, but within the normal range of a floating point type the relative rounding error always stays below a fixed bound, the machine epsilon.
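A small sketch in Python illustrates both effects for 64-bit floats. It assumes Python 3.9 or later, where `math.ulp(x)` returns the gap between `x` and the next larger representable value.

```python
import math
import sys

eps = sys.float_info.epsilon   # gap between 1.0 and the next float, 2**-52

samples = [1.0, 1e3, 1e16, 1e300]

# Absolute gap to the next representable value: grows with the magnitude.
gaps = [math.ulp(x) for x in samples]

# Gap relative to the value itself: never exceeds the machine epsilon.
relative_gaps = [math.ulp(x) / x for x in samples]

for x, gap, rel in zip(samples, gaps, relative_gaps):
    print(f"x = {x:g}  gap = {gap:g}  relative gap = {rel:g}")
```

Around 1e16 the gap between neighbouring 64-bit floats is already 2.0, so not even every whole number can be represented there.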

## 32-bit floats

If the type of a float cannot be inferred, it is assumed to be 64-bit. That is why we have to call f32 with 4.0 in the example below.