Yubo
Introduction
Autofocus is one of the three basic auto-algorithms enabled in most consumer photography cameras, the other two being auto-exposure and auto-white balance. Of the three, autofocus and auto-exposure both require a feedback mechanism to adjust physical characteristics of the camera lens or sensor, respectively. In the case of autofocus, the software evaluates whether or not the captured image is in focus, and adjusts the lens position for subsequent captures if it is not.
This project focuses specifically on simulating the VCM focuser. The VCM focuser implemented in this project uses exclusively contrast detection, not phase detection. In a VCM focuser, the lens is mounted in a VCM barrel with springs on one side of the barrel and a magnet on the other. The magnet is controlled using an electric current. When the magnet is inactive, the springs hold the lens at the rest position, which usually focuses at infinity. When the magnet is activated, it moves the lens to a position within the barrel determined by the current.
This project attempts to provide a virtual focuser implementation and a basic autofocus algorithm to evaluate. There are many different focuser types, many of which have been covered in the lectures. There are many benefits to creating a virtual focuser. First, it allows the tweaking of physical focuser characteristics, without having to rely on an abundance of physical focuser modules. Second, it abstracts away the reliance on hardware when evaluating autofocus algorithms. This enables camera software developers to perform system and regression tests on a cloud environment without having to set up a static and controlled scene.
The GitHub repository for the project can be found at: https://github.com/YuboHan/FocusSimulator
Related Lecture Material
Lecture 1 defines a scene to be in focus when it satisfies the lens maker's equation (thin lens approximation):

1/f = 1/u + 1/s

where f is the focal length, u is the distance from the lens to the object, and s is the distance from the lens to the sensor.
When the object is out of focus, the 'blurriness' of the scene is affected by the aperture size. The area of the blur itself is known as the circle of confusion.

There are many ways to adjust focus. In lecture one, we discussed the light field camera. With this method, the camera is able to adjust focus after the image has been taken, and thus does not require an autofocus algorithm to run while the image is being captured. Lecture two covers two more techniques for adjusting focus by physically moving the lens towards and away from the sensor: VCM and MEMS technology. The lectures also mention a method of autofocus called phase detection. This project, however, implements another method called contrast detection. The main difference is that phase detection utilizes the phase of incoming rays, whereas contrast detection adjusts focus based only on pixel sensor information.
A neat in-class demonstration illustrated the idea of phase detection autofocus.
An image is generally more focused if it contains higher frequency components. As shown in lecture 1, this project utilizes the Discrete Fourier Transform (DFT) of the image. We implement an algorithm to weigh the DFT such that the final 'focus score' is higher if the image contains more high-frequency content.
Definition of Terms
Here is a list of commonly used terminology in this project, and their meaning:
- Convergence: The autofocus algorithm is said to be converged if it confidently believes the current image achieves a sufficiently high level of focus and detail, and the algorithm will exit with success.
- Convergence point: Focuser position where the image is converged.
- Convergence region: Smallest range that the autofocus algorithm can be confident the convergence point lies within.
- Focuser Position: Abstract integer between 0-255 that represents the position of the lens in the focuser barrel. In contrast, lens position references the physical location of the lens in the focuser barrel.
- Lens Position: The physical position of the lens in the focuser barrel. In contrast, focuser position is the abstract integer value of the lens position in the focuser barrel.
Architectural Design
This section describes the high-level design of the project, why certain decisions were made, any assumptions, and the literature sources used. The project consists of a main program and two modules: the focus simulator and the autofocus algorithm.
At initialization, the focus simulator accepts the physical conditions of the scene and focuser. These include parameters such as the distance of the scene from the camera, and camera properties such as aperture size and focal length. It also accepts a focuser position value between 0-255 (8-bit integer). This input value is provided by the autofocus algorithm to adjust the lens position within the focuser barrel. The module then outputs an image of what the captured image would look like at the lens position specified by the focuser position.
The autofocus algorithm accepts the output image from the focus simulator as an input. On the first capture, the focuser position is always 0. The module itself outputs a state and a value between 0-255. The state determines whether the algorithm considers the image in focus; if it does not, the output value determines the next focuser position. The image is considered to be converged when it is sufficiently sharp and in focus on the target object. Similarly, the position of the lens in the focuser barrel that produces the converged image is called the convergence point.
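The interaction between the two modules can be sketched as a simple capture loop. The class and method names below are hypothetical stand-ins (not the project's actual API), with a fake simulator whose focus score peaks at position 78 and a toy sweep-only algorithm:

```python
class FakeSimulator:
    """Stand-in for the focus simulator: sharpness peaks at position 78."""
    def capture(self, pos):
        return 100 - abs(pos - 78)  # pretend this is a focus score

class SweepAF:
    """Toy autofocus: sweep the range, then report the best position seen."""
    def __init__(self, step=32):
        self.step, self.best = step, (None, float("-inf"))
    def update(self, pos, score):
        if score > self.best[1]:
            self.best = (pos, score)
        nxt = pos + self.step
        if nxt > 255:
            return "CONVERGED", self.best[0]
        return "SEARCHING", nxt

sim, af = FakeSimulator(), SweepAF()
pos, state = 0, "SEARCHING"
while state != "CONVERGED":
    state, pos = af.update(pos, sim.capture(pos))
print(state, pos)  # → CONVERGED 64 (closest sweep sample to the peak at 78)
```

The real algorithm (Section 3.4) refines the position after the sweep instead of stopping at the best sampled point.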
See below for a diagram showing the relationships between the units:
3.1 Scene Properties
class SceneProperties:
distance # Distance between scene and camera
3.2 Focuser Properties
class FocuserProperties:
aperture # Aperture of lens
focalLength # Focal length of lens
barrelLength # Length of VCM barrel
maxNumCaptures # Max captures before converging
maxBarrelSpeed # Max distance lens can move within the barrel per capture
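As a sketch, these property containers might be written as Python dataclasses; the field types and the default values (taken from the Results section) are filled in here for illustration:

```python
from dataclasses import dataclass

@dataclass
class SceneProperties:
    distance: float  # Distance between scene and camera (meters)

@dataclass
class FocuserProperties:
    aperture: float        # Aperture of lens (meters)
    focalLength: float     # Focal length of lens (meters)
    barrelLength: float    # Length of VCM barrel (meters)
    maxNumCaptures: int    # Max captures before converging
    maxBarrelSpeed: float  # Max lens travel within the barrel per capture

# Default values from the Results section
scene = SceneProperties(distance=1.0)
focuser = FocuserProperties(aperture=0.028, focalLength=0.05,
                            barrelLength=0.01, maxNumCaptures=20,
                            maxBarrelSpeed=0.002)
```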
3.3 Focuser Simulator
To calculate the blur radius, we use the formula [1]:

σ = ρ · r, where r = (A · s / 2) · |1/f − 1/u − 1/s|
Where:
- σ is the blur radius in pixels
- ρ is the conversion factor from physical units (meters) to pixels
- r is the blur radius in meters
- A is the aperture
- f is the focal length
- u is the distance of scene to camera. We assume this to be >> s
- s is the distance from the lens to sensor
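Under these definitions, the blur radius computation might be sketched as follows, assuming the thin-lens defocus relation σ = ρ·r with r = (A·s/2)·|1/f − 1/u − 1/s| (a reconstruction; the formula in [1] may differ in constant factors):

```python
def blur_sigma(A, f, u, s, rho):
    """Blur radius in pixels for a thin-lens model.

    A: aperture (m), f: focal length (m), u: scene-to-camera distance (m),
    s: lens-to-sensor distance (m), rho: meters-to-pixels conversion factor.
    Assumes the defocus relation stated in the lead-in.
    """
    r = (A * s / 2) * abs(1 / f - 1 / u - 1 / s)  # blur radius in meters
    return rho * r

# When the thin lens equation 1/f = 1/u + 1/s holds, the blur vanishes
f, u = 0.05, 1.0
s = 1 / (1 / f - 1 / u)  # solve the thin lens equation for s
print(blur_sigma(0.028, f, u, s, 1e5))  # → ~0 (image in focus)
```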
For simplicity, we also assume the blur to be a Gaussian blur and that there is no chromatic aberration. These assumptions narrow the scope of the project; chromatic aberration and geometric blur may be considered as future improvements.
The VCM barrel will be modeled as:
Barrel length b is provided in the focuser properties. Recall that the focuser position is an 8-bit integer. We define 0 to be when the lens is closest to the sensor array (the image is focused at infinity), and 255 to be the maximum distance the lens can be from the sensor array. These values are chosen as they are standard in many open source camera APIs, such as the Android camera2 API [2]. We assume the focuser position values to be uniformly distributed across the focuser barrel. The initial position of the lens will always be 0.
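As a minimal sketch, the uniform mapping from an 8-bit focuser position to a physical lens offset could be written as (`lens_position` is an illustrative helper, not the project's actual code):

```python
def lens_position(focuser_position, barrel_length):
    """Map an 8-bit focuser position (0-255) to a physical lens offset (m)
    from the rest position (focused at infinity), assuming positions are
    uniformly distributed across the barrel."""
    if not 0 <= focuser_position <= 255:
        raise ValueError("focuser position must be in [0, 255]")
    return focuser_position / 255 * barrel_length

print(lens_position(0, 0.01))    # → 0.0 (rest position, infinity focus)
print(lens_position(255, 0.01))  # → 0.01 (far end of the barrel)
```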
We define several physical limitations for the VCM focuser. First, the physical lens is bound within the VCM barrel and cannot move outside of its range. Second, we define a maximum speed at which the lens is allowed to move. The speed of the lens for a capture is independent of the previous speed. For example, if the max speed is defined to be 10 µm per capture, and the previous displacement was +10 µm, the maximum negative displacement for this capture is still -10 µm. In other words, between captures, we assume the lens stops moving once it reaches its new position. This decision was made because most physical cameras must keep the lens stationary for the duration of a frame due to their use of rolling shutters.
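These two physical limits (barrel bounds and maximum per-capture travel) can be sketched as a single clamping step; `step_lens` is a hypothetical helper written for illustration:

```python
def step_lens(current, target, max_step, barrel_length):
    """Move the lens toward `target`, honoring both physical limits:
    at most `max_step` of travel per capture, and never outside the
    barrel range [0, barrel_length]. All values in meters."""
    delta = max(-max_step, min(max_step, target - current))
    return max(0.0, min(barrel_length, current + delta))

print(step_lens(0.0, 0.01, 0.002, 0.01))     # → 0.002 (speed-limited)
print(step_lens(0.0095, 0.02, 0.002, 0.01))  # → 0.01 (clamped to barrel end)
```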
3.4 Autofocus Algorithm
The autofocus algorithm accepts an initial image that has been blurred by the focus simulator. It then performs a Fast Fourier Transform (FFT) on the center 50x50 pixels, keeping track of the FFT metrics. It then sweeps the entire length of the VCM barrel by providing different focuser positions, keeping track of the FFT metrics at each position. Using the information from the sweep, it fine-tunes the focuser position to obtain the best focus for the image; at this point, we consider the focuser to be at convergence. The number of frames must not exceed a maximum. Normally, this maximum depends on the camera mode; for example, sports mode will have a smaller max capture count than portrait mode. For the purpose of this project, however, we assume the max number of captures to be a property of the focuser. Additionally, we assume that the calculated new focuser position is applied directly to the next capture. In a real camera pipeline, it typically takes 2-3 frames between a capture and applying the focuser position calculated from it; in other words, for capture X, the earliest the new focuser position calculated from capture X can be applied is at capture X+3. This assumption is made to simplify the project scope.
The autofocus algorithm can be described as two components: the focus score and the autofocus algorithm itself. The focus score is an abstract value computed from the current capture that indicates how in focus the image is. The focus score is then used in the autofocus algorithm to determine autofocus convergence. The autofocus algorithm can additionally be subdivided into three steps: the initial sweep, moving the focuser into the convergence region, and finding the convergence point. The convergence region is defined to be the smallest region of the focuser barrel that we can assume contains the convergence point.
The pseudocode can be found below:
imageList = []       # Contains a history of captures and their focus scores
focuserPosition = 0  # 0-255. Keeps track of the current position

# Step 1: Do an initial sweep of the focuser barrel
while curCapture < maxNumCaptures - 1:
    if not firstRun:
        focuserPosition += maxBarrelSpeed * 255 / barrelLength
    img = blurImage(focuserPosition)
    focusScore = getFocusScore(img)
    imageList.append(img, focusScore)
    if we do not have enough captures left to sweep the entire barrel:
        if current focusScore < last focusScore + threshold:
            # If we cannot sweep the entire barrel, we assume that once the
            # focusScore decreases past a threshold, we have passed the point
            # of convergence
            break
    if focuserPosition >= 255:
        # The entire barrel has been swept, exit
        break

# Select the best score from imageList; let i denote its index. We assume the
# best focus score must lie between indexes i-1 and i+1
upperThreshold = imageList[i+1]  # focuser position (0-255)

# Step 2: Move the barrel to within upperThreshold
while curCapture < maxNumCaptures - 1 and focuserPosition > upperThreshold:
    focuserPosition -= maxBarrelSpeed * 255 / barrelLength
    img = blurImage(focuserPosition)
    imageList.append(img, getFocusScore(img))

# Step 3: Now try to converge
while curCapture < maxNumCaptures - 1:
    # Get the newest best image. Let i denote the index with the best
    # focusScore in imageList. This is recalculated on each loop, in case the
    # previous capture produced a new best.
    closerThreshold = imageList[i-1] or imageList[i+1], whichever is closer
    furtherThreshold = imageList[i-1] or imageList[i+1], whichever is further
    # The next focuser position is 1/3 of the way from closerThreshold to
    # furtherThreshold
    focuserPosition = (closerThreshold*2 + furtherThreshold) / 3
    img = blurImage(focuserPosition)
    imageList.append(img, getFocusScore(img))
    # We consider the focuser converged when the focusScore and position of
    # the previous capture are within a threshold of the current capture
    if converged:
        break

# The final capture always attempts to go to the position with the best focus
# score
return blurImage(best focuser position)
3.4.1 Focus Score
To calculate the focus score, we first take the FFT of the image using NumPy, then sum the weighted values of the spectrum. Images in focus have more high-frequency components than out-of-focus images, so we apply a weight to each spectral bin based on its position. The pseudocode can be found below:
def getFocusScore(img):
    ftransform = numpy.fft.fft2(img)
    focusScore = 0
    for value, x, y in ftransform:  # (x, y) = (0, 0) is the zero frequency
        weight = log(x) + log(y)
        focusScore += value * weight
    return focusScore / (ftransform.height + ftransform.width)
I found that using log(x) + log(y) as the weight, compared to x + y or a constant weight of 1, produced more qualitatively accurate results:
- weight = x + y sometimes produced incorrect focus scores if the extremely high frequency components contain noise
- weight = 1 is very inaccurate and error prone due to the low frequency components being valued the same as higher frequency components
- weight = log(x) + log(y) seems like a good approximation between the above two methods tested
The focus score is calculated not by using the FFT transform of the entire image, but a 50x50 region within the image. In conventional cameras, many autofocus algorithms focus on specific regions, and as the scenes are 3 dimensional, as the specific region comes into focus, the other regions that do not lie on the same plane will move out of focus. Using a 50x50 region enables us to implement touch to focus once depth is introduced to the simulator. Currently, the 50x50 region is defaulted to the center of the image.
3.4.2 Autofocus Algorithm Implementation
The autofocus algorithm can be divided into three steps: the barrel sweep, moving the focuser into the convergence region, and finding the convergence point.
When doing the barrel sweep, we start at focuser position 0, incrementing by the maximum possible focuser step until reaching the maximum focuser position of 255. If it is not possible to fully sweep the barrel and still return to the position with the highest focus score, we assume that any local maximum is the global maximum. In other words, if the current focus score is lower than the previous focus score, and it is not possible for us to return to the previous position after sweeping the rest of the barrel, we cease the barrel sweep and move directly to step three: finding the convergence point.
In the second step, to move the focuser into the convergence region, we must first calculate the convergence region. The convergence region is defined as the region contained by the data points one index above and one index below the index with the highest focus score. We assume the convergence point to be in this region.
At the third step, we can make several assumptions:
- the focuser position is within the convergence region;
- the entire convergence region can be traversed in at most two steps.
We define the smallest and largest values of the convergence region to be the thresholds. The focuser position for the next capture is ⅓ of the way from the closer threshold to the further threshold, relative to the current focuser position. This is done for several reasons:
- The ⅓ point from the closer threshold to the further threshold is always within one step of the focuser. This is because the convergence region is at most two steps wide.
- Due to the nature of the initial sweep of the focuser barrel, all points are equally spaced. If we are to use ½ instead, it runs the risk of getting stuck at a particular point. Suppose we scanned 0, 63, 127, 191, 255. Suppose 78 is the optimal position, and the best position so far is 63. The algorithm will pick the convergence region to be 0 and 127. Because ½ of 0 and 127 is 63, the algorithm will never converge at 78.
- Using anything above ½ does not guarantee the new focuser position to be within one step of the current position, thus wasting frames. Additionally, it will take longer to converge, as the focuser will act like an underdamped system, oscillating around the convergence point.
The one limitation is that the focuser must not be able to traverse the entire barrel in a single step; if it could, the focuser would get stuck at the ⅓ point.
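The ⅓ rule can be illustrated with the stuck-point example above; `next_position` is a small helper mirroring the pseudocode, written here for illustration:

```python
def next_position(closer, further):
    """Next focuser position: 1/3 of the way from the closer threshold
    toward the further one, per the rule described above."""
    return (closer * 2 + further) / 3

# Example from the text: the sweep sampled 0, 63, 127, 191, 255, and 63
# scored best, so the convergence region is (0, 127). A midpoint rule would
# pick (0 + 127) / 2 = 63.5, i.e. keep re-sampling ~63 forever.
print(next_position(0, 127))  # → 42.33..., a genuinely new sample point
print(next_position(127, 0))  # → 84.66..., if 127 were the closer threshold
```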
During the third step, for the current capture, we consider the algorithm converged if the current position and focus score is within a threshold of the previous position and focus score. These values can change, but are currently programmed in as within 5 positions (out of 255), and within a 0.1 focus score.
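With the thresholds quoted above (5 positions and 0.1 focus score), the convergence check might be sketched as:

```python
def is_converged(pos, score, prev_pos, prev_score, pos_eps=5, score_eps=0.1):
    """Converged when the current capture's position and focus score are
    both within a threshold of the previous capture's values."""
    return abs(pos - prev_pos) <= pos_eps and abs(score - prev_score) <= score_eps

print(is_converged(100, 5.00, 103, 4.95))  # → True
print(is_converged(100, 5.00, 110, 4.95))  # → False (position moved too far)
```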
The last capture will always attempt to move the focuser closer to the best measured position, regardless of the state of the autofocus algorithm. If the best measured position is too far, it will move the focuser 1 maximum step towards it. This last capture is only significant if the third step was not able to converge. This usually occurs if the maximum number of captures is too low.
Results
This section summarizes the results of this project, and presents some examples of potential experiments that can be achieved using this focuser simulation.
We define the default focuser as:
focalLength = 0.05      # meters
aperture = 0.028        # meters
fNumber = 1.8
viewAngle = 100         # degrees
barrelLength = 0.01     # meters
maxNumCaptures = 20
maxBarrelSpeed = 0.002  # meters/capture
We define the default scene as:
sceneDistance = 1 # meters
4.1 Robustness Evaluation
We will evaluate how well the algorithm performs to determine its robustness. To do this, we will define a set of operational ranges:
focalLength = {0.01, 0.05}
aperture = {0.01, 0.05}
viewAngle = {60, 160}
barrelLength = {0.005, 0.05}
maxBarrelSpeed = {0.001, 0.01}
sceneDistance = {0.5, 100}
We make the additional restriction that max barrel speed must be less than barrel length.
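The random parameter draw for this test could be sketched as follows (the function and dictionary key names are illustrative, not the project's actual code):

```python
import random

def random_params(rng):
    """Draw one uniformly distributed parameter set from the operational
    ranges above, re-drawing until maxBarrelSpeed < barrelLength."""
    while True:
        p = {
            "focalLength":    rng.uniform(0.01, 0.05),
            "aperture":       rng.uniform(0.01, 0.05),
            "viewAngle":      rng.uniform(60, 160),
            "barrelLength":   rng.uniform(0.005, 0.05),
            "maxBarrelSpeed": rng.uniform(0.001, 0.01),
            "sceneDistance":  rng.uniform(0.5, 100),
        }
        if p["maxBarrelSpeed"] < p["barrelLength"]:
            return p

params = random_params(random.Random(0))
```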
For 1000 runs of random parameters (distributed uniformly), we check if the algorithm has converged. The results are 978 converged out of 1000. See Appendix for list of first 100 results.
4.2 Qualitative Results
This section contains an example of what the focuser interface looks like while performing the sweep. The left image is the entire image after blurring. The center image is the center 50x50 pixels we use to calculate the focus score. The right image is the resulting magnitude spectrum calculated from the center image.
4.3 Barrel Speed vs. Convergence
We can vary the barrel speed from 1/10 of the barrel length to the full barrel length per capture, to see how long the algorithm takes to converge.
From the graph, we can see that there is no direct correlation between captures to converge and barrel speed, although there is a slight downward trend as barrel speed increases. The random spread of the convergence speed is most likely due to chance: if the algorithm happens to hit the highest focus score without having to adjust, it converges quickly; otherwise, it may take time to fine-tune itself. The general downward trend of the graph is expected: the faster the algorithm can finish the sweep, the faster it can converge. Following this logic, points above 0.005 m/capture should converge in a similar number of captures, as all such barrel speeds take the same number of captures (2) to sweep the barrel. However, the data is too noisy to draw concrete conclusions.
4.4 Scene Distance vs. Convergence Speed
We can also see the relationship between the scene distance vs. convergence speed by varying the scene distance from 0.5m to 1000m.
From the information in figure 2, the convergence speed above 100m appears to be the same. We can now zoom in to where the scene distance is closer to the camera.
The results are mostly as expected. Since the focuser always starts sweeping from the position closest to the sensor, the starting position is always focused at infinity (the lens is at the focal point). This means that even if the scene is in focus at infinity, the lens still needs to sweep all the way to the end of the barrel before returning. If the object is at a macro distance, the focuser will already be close to the in-focus position once the sweep is done, thus making convergence faster.
The exception is when scene distance = 0.5. Upon closer examination, this is because the convergence position is close to 255 (253, to be exact), and the system behaves like an overdamped system.
Future Improvements
This section provides a summary of further improvements that can be made to the autofocus simulation. All suggested improvements were introduced in section 2.
Chromatic aberration: Blur the image with chromatic aberration in mind, and see how the autofocus algorithm changes. I suspect the effect will be minimal, as the algorithm operates on grayscale images.
Camera latency: Introduce the 3-frame latency in the camera pipeline for applying new autofocus parameters. This improvement would make the system behave more like a real-time autofocuser, but would require significant revisions to the autofocus algorithm outlined in this report.
Introduce different focus modes: Different focus modes may include sports mode, which requires fast convergence; portrait mode, with lenses introducing significant bokeh; low-light focusing (large aperture); etc.
References
1. Y. Xiong and S. A. Shafer, "Depth from focusing and defocusing," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 1993, pp. 68-73. Available: http://www.cim.mcgill.ca/~langer/MY_PAPERS/MannanLanger-CRV16-DFDCalib.pdf [Accessed: 03-Dec-2019].
2. “android.hardware.camera2.CaptureRequest,” Android Developer Documentation. [Online]. Available: https://developer.android.com/reference/android/hardware/camera2/CaptureRequest.html#CONTROL_AF_MODE. [Accessed: 03-Dec-2019].
Additional references are drawn from internal Nvidia camera architecture designs, brownbags, and presentations, which are not cited here for NDA reasons.
Appendix A - Robustness Test Results
See https://docs.google.com/document/d/1dbuxbXM_8jDIH43FmakniGnJeYNKMcgXWWu7I-0OnQM/edit?usp=sharing for Appendix A