Eye tracking and analysis of the human eye is one of the promising technologies: it could substantially improve how we interact with our smartphones. If you are reading this, you probably have an opinion already. The real question is whether the precision is sufficient – thus how precise it can actually get.
Before we can reason about the precision, let's first establish some facts and assumptions. The eye is a strange natural camera that evolved somewhere around 500 million years ago. By analyzing the eye we mean detecting its different elements, the most important of which is the iris. An iris is a close-to-circular object with a deformable hole in the middle. Well, almost in the middle – surprisingly, the outer and inner boundaries of the iris are rarely fully concentric circles. The pupil is often not circular, but rather elliptical. On average, the diameter of an iris is 12 mm, but it can actually vary anywhere from 10.2 mm to 13 mm – very young babies are, of course, outside this interval. The pupil varies a bit more, typically from 1.5 mm in bright light to 8 mm in a dark environment.
To evaluate how precise the detection of the iris can get, we need to know how big it will be once captured by a camera sensor. For that we need to know the resolution of the camera and its field of view. Let's consider the OnePlus 5T phone. It has a 16-megapixel sensor with a pixel size of 1.12 μm, aperture f/1.7 and focal length 27.22 mm.
Computing the resolution of the camera
What is the field of view?
We can compute the angular field of view using the following relationship:

FoV = 2 · arctan(w / 2f)

where f is the focal length and w is the horizontal size (width) of the sensor in mm. Now, our real horizontal sensor width is 4608 × 1.12 μm ≈ 5.16 mm, but for the purpose of computing the angular FoV we will use the value 35 mm. You ask why? It is because the focal length 27.22 mm is not the real focal length, but a 35mm-equivalent value that would provide the same perspective on a full-frame sensor, so it has to be paired with the full-frame width.
The angular FoV of the smartphone is therefore:

FoV = 2 · arctan(35 / (2 · 27.22)) ≈ 65.5°
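As a quick sanity check, the computation is a one-liner (using the 35 mm full-frame width and the equivalent focal length from the spec):

```python
import math

focal_length_mm = 27.22  # 35mm-equivalent focal length of the OnePlus 5T
frame_width_mm = 35.0    # full-frame ("35mm") width, approximated as 35 mm

# angular field of view: FoV = 2 * arctan(w / 2f)
fov_deg = math.degrees(2 * math.atan(frame_width_mm / (2 * focal_length_mm)))
print(round(fov_deg, 1))  # -> 65.5
```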
We can verify it approximately with a simple protractor by putting the sensor at the center and checking how many degrees are visible:
This is of course only approximate – due to imperfections of my setup and some cropping we actually see a few degrees less.
(Theoretical) Size of the pupil in the image
Since the width of the captured image is 4608 pixels, one pixel captures 65.5° / 4608 ≈ 0.0142°. Of course, we still don't know how much detail that represents with respect to the iris, because we don't know how far away the iris is located. Common sense says that the closer the object is, the more detail is captured. If we consider a 4 mm object (the size of a rather small pupil) and plot its size as a function of distance, we get the following graph:
From the graph we can conclude that even at a larger distance, the size of the pupil in the captured image will be around 20 pixels. This means that we can detect changes in the iris of about 0.2 mm (4 mm / 20 px), right?
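The points of the graph can be reproduced with a short sketch; the per-pixel angle follows from the FoV computed above, and the particular distances sampled here are my own choice:

```python
import math

fov_deg = 65.5                   # horizontal angular FoV computed earlier
width_px = 4608                  # horizontal resolution of the sensor
deg_per_px = fov_deg / width_px  # ~0.0142 degrees per pixel

pupil_mm = 4.0  # diameter of a rather small pupil

for distance_mm in (300, 500, 800):
    # angular size of the pupil at the given distance
    ang_deg = math.degrees(2 * math.atan(pupil_mm / 2 / distance_mm))
    print(f"{distance_mm} mm -> {ang_deg / deg_per_px:.1f} px")
```

At 800 mm the pupil indeed spans roughly 20 pixels, matching the graph.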
In all our computations we considered the size of the sensor and the focal length of the lens, but that is not enough, for two reasons:
- sensors are not ideal
- lenses are not ideal
We considered the resolution of the camera to be 16 megapixels. This is the pixel resolution; however, for image processing we actually don't care about pixels, but rather about spatial resolution. Spatial resolution is a measure of how small a pattern in an image can be resolved with sufficient contrast before it appears as a blur.
Size of the pupil in the image (adjusting for diffraction)
The sensor will have some noise, but what I believe is more important are the imperfections of the lens. Lenses in our smartphones are small, really really small. In fact, they are so small that their aperture limits the spatial resolution due to diffraction. All lenses are limited in their resolving power by diffraction, a physical phenomenon which occurs when light passes through a small aperture before it reaches the sensor.
Luckily, we can easily compute the resolution limit of a lens due to diffraction from the f-number, which is information that many manufacturers proudly provide:

d = 2.44 · λ · N

where λ is the light wavelength and N is the f-number of the lens. Let's consider 700 nm (0.7 μm) as the wavelength, which is the largest in the visible band of light (our worst-case scenario). For our camera and red light, the resolution limit is then:

d = 2.44 · 0.7 μm · 1.7 ≈ 2.9 μm

The manufacturer declares the pixel size to be 1.12 μm, yet the lens can resolve only 2.9 μm. As you can see, the wavelength plays a big role: for the violet/blue end of the spectrum with wavelength 350 nm (0.35 μm), the resolution is 2× better, d ≈ 1.45 μm. So even in the case of 350 nm light, the resolution is limited by the lens (or rather its aperture).
If we recalculate the resolution, we can resolve 1779 points horizontally and 1334 points vertically, which is 2,373,186 points in total. So instead of 16 million points (16 MPx) we can resolve only a bit less than 2.4 million.
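These numbers can be reproduced in a few lines; the sensor dimensions of 4608 × 3456 pixels at 1.12 μm follow from the 16 MPx spec, and the Airy-disk diameter is rounded to 2.9 μm as above:

```python
wavelength_um = 0.7  # red light, worst case in the visible band
f_number = 1.7       # aperture of the OnePlus 5T lens

# diameter of the Airy disk: the smallest spot the lens can resolve
spot_um = round(2.44 * wavelength_um * f_number, 1)  # 2.9 um

pixel_um = 1.12
sensor_w_um = 4608 * pixel_um  # ~5161 um
sensor_h_um = 3456 * pixel_um  # ~3871 um

points_w = int(sensor_w_um / spot_um)  # 1779
points_h = int(sensor_h_um / spot_um)  # 1334
print(points_w, points_h, points_w * points_h)  # -> 1779 1334 2373186
```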
If we update our function of the 4 mm object size depending on the distance, we get the following graph (using the resolution for 700 nm (red) light):
And that is still only a theoretical limit. In practice, there are many imperfections in the optical elements of the lens, compounded by imperfections of the sensor.
The lesson learned is, that the real resolution we can capture from the cameras (details we can resolve), is guaranteed to be lower than the pixel resolution declared by the manufacturer. To get the true resolution, One should better test the camera, by measuring the point spread function of the particular unit and even in particular conditions. A nice tutorial to evaluate the camera can be found here.