Psych221 Project Suggestions
Below we list project suggestions for Psych 221. We update this page regularly with ideas for projects.
- We describe how you should create the write-up on the Project Guidelines page.
- More than one person or group can work on the same project.
- Just after mid-terms you will be asked to turn in a short paragraph proposing your project.
- We want to make sure you are the right person for your proposed project.
- If you want to work on a project that is not listed, perhaps one that is helpful for your research, ask us.
See the links to past projects in the bar at the left.
Projects Fall 2023
Color science: Simulations of James Clerk Maxwell's experiments
In a series of papers around 1860, James Clerk Maxwell introduced the color matching experiment. His papers contain tables of data and schematics of the instrument that enabled him to quantify the color matching functions in two different people.
Using the tools in ISETBio and ISETCam, I would like to implement a tutorial that simulates the apparatus and simulates the likely experimental outcomes given what we have learned about the human visual system in the 160 years since Maxwell performed his work.
This project would describe Maxwell's approach, simulate the apparatus, and simulate the likely experimental measures. These would be contained in a Matlab LiveScript that could serve as a class tutorial in future years.
- What you will learn: You will understand color matching and the encoding of wavelength by the human retina.
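Below is a minimal Matlab sketch of the linear algebra behind a color match. The Gaussian sensor curves and narrowband primaries are placeholders, not Maxwell's data or real color matching functions (which ISETBio/ISETCam can supply); the point is only that the matching weights solve a small linear system.

```matlab
% Minimal sketch of a color match computed with linear algebra.
% The Gaussian curves below are placeholders standing in for real color
% matching functions (real data are available through ISETBio/ISETCam).
wave = 400:10:700;                          % nm
peaks = [440 540 600]; widths = [30 40 50]; % hypothetical sensor peaks/widths
cmf = zeros(numel(wave), 3);
for ii = 1:3
    cmf(:, ii) = exp(-((wave' - peaks(ii)).^2) / (2 * widths(ii)^2));
end

% Three narrowband primaries, roughly like the red, green, and blue slits
primaries = zeros(numel(wave), 3);
primaries(wave == 630, 1) = 1;
primaries(wave == 530, 2) = 1;
primaries(wave == 460, 3) = 1;

% Test light: equal-energy white
test = ones(numel(wave), 1);

% A match requires equal sensor responses: (cmf' * primaries) * w = cmf' * test
w = (cmf' * primaries) \ (cmf' * test);     % primary intensities for the match
disp(w)
```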
Medical imaging: Simulations of skin reflectance and fluorescence
A number of factors impact the images we can obtain of surface tissue in health and disease. Two factors are the density and oxygenation of the blood, and the presence of fluorophores whose signals may indicate the health of the underlying cells. In some cases these two factors interact.
This project is to simulate the spectral signals we expect for (a) a specific light source, given (b) skin with different amounts of blood, and (c) different ratios of oxygenated to deoxygenated blood. It will also include simulations of fluorescing tissue.
You will be able to draw from previous class projects, open-source software, and our advice to simulate new types of light-based medical imaging devices.
- What you will learn: What are the design decisions involved in making a camera for medical imaging and diagnosis
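As a rough illustration of the spectral calculation involved, here is a Matlab sketch using a simple Beer-Lambert approximation rather than a full tissue model. The extinction curves, blood volume, saturation, path length, and channel sensitivity are all assumed placeholder values.

```matlab
% Minimal sketch of how blood volume and oxygen saturation change the
% spectral signal, using a Beer-Lambert approximation rather than the full
% Monte Carlo tissue model. The extinction curves are smooth placeholders;
% a real project would load tabulated HbO2/Hb extinction coefficients.
wave    = 450:5:700;                                % nm
epsHbO2 = 1 ./ (1 + exp((wave - 600)/20));          % placeholder extinction, oxygenated
epsHb   = 1 ./ (1 + exp((wave - 630)/20));          % placeholder extinction, deoxygenated

bloodVolume = 0.02;      % fraction of tissue occupied by blood (assumed)
saturation  = 0.7;       % fraction of blood that is oxygenated (assumed)
pathLength  = 2;         % effective optical path in mm (assumed)

mua = bloodVolume * (saturation * epsHbO2 + (1 - saturation) * epsHb);
reflectance = exp(-2 * pathLength * mua);           % light passes down and back up

% A camera signal is the reflectance weighted by the light and channel sensitivity
light  = ones(size(wave));                          % flat illuminant placeholder
redChannel = exp(-((wave - 620).^2) / (2 * 30^2));  % placeholder channel sensitivity
signal = sum(light .* reflectance .* redChannel);
fprintf('Red channel signal: %.3f\n', signal);
```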
Physically based simulation: Night time driving
Both autonomous and ADAS driving projects often rely, in part, on simulated scenes. In particular, modern ADAS systems for lane keeping and pedestrian detection rely heavily on cameras and headlights, often first designed and evaluated using simulation. Creating and benchmarking the required physically accurate 3D spectral radiance scenes is possible, but challenging. We are developing tools for simulating cars (with their headlights on!) driving at night. This project involves using computer graphics (PBRT) and Matlab to evaluate camera and headlight design options when used with industry standard benchmark scenarios.
- What you will learn: How to model lights and sensors using Matlab and ISETAuto, as well as how to conduct experiments with them and evaluate the results.
Optics simulation
The pointspread function is a measure of optical quality. In simulation, the wavefront aberration measured in the exit pupil (a complex function) predicts the pointspread function (a real-valued function). But inverting the calculation, from pointspread to wavefront, is underconstrained and subject to different types of errors.
It is possible to use the point spread function (PSF) to estimate the wavefront aberrations of a lens when the PSF is fully sampled and the aberrations are not too severe, and when something is known about the possible solutions. For example, we may know that the wavefront aberrations are a smooth function; or that they are dominated by a few low-order Zernike modes.
We know a great deal about the distribution of the Zernike polynomials for the human eye, and we also know a great deal about many common lenses. Regularization may help a lot, and I would like to know how much and when.
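For reference, here is a minimal Matlab sketch of the forward calculation (wavefront to pointspread). The quarter-wave defocus term is illustrative; a real calculation would build the wavefront from measured Zernike coefficients. The inverse problem discussed above is hard precisely because this calculation discards the pupil phase.

```matlab
% Minimal sketch of the forward calculation (wavefront aberration -> PSF).
n = 256; lambda = 550e-9;                      % samples, wavelength (m)
[x, y] = meshgrid(linspace(-1, 1, n));         % normalized pupil coordinates
r = sqrt(x.^2 + y.^2);
aperture = double(r <= 1);                     % circular pupil

W = 0.25 * lambda * (2 * r.^2 - 1);            % quarter wave of defocus (illustrative)
pupil = aperture .* exp(1i * 2 * pi * W / lambda);

psf = abs(fftshift(fft2(pupil))).^2;           % PSF = |FT of pupil function|^2
psf = psf / sum(psf(:));                       % normalize to unit volume

% Going the other way -- from psf back to W -- loses the phase, which is why
% the inverse problem needs regularization or prior knowledge.
imagesc(psf); axis image; colormap gray;
```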
Here are some other examples of when it may be possible to obtain the PSF and estimate the wavefront aberrations:
- Microscopy: The PSF of a microscope can be used to estimate the wavefront aberrations of the microscope objective lens. This information can be used to correct the aberrations and improve the image quality.
- Telescopy: The PSF of a telescope can be used to estimate the wavefront aberrations of the telescope optics. This information can be used to correct the aberrations and improve the resolution of the telescope.
- Ophthalmology: The PSF of the human eye can be used to estimate the wavefront aberrations of the cornea and lens. This information can be used to correct the aberrations and improve the visual acuity of the patient.
An application for me is this: The pointspread often varies with field height (on-axis to off-axis). It is difficult to interpolate the pointspread from samples at different field heights. I have always wondered whether interpolating the wavefront aberrations at different field heights might be a better approach.
- What you will learn: Basic optics and optical characterization.
Projects Fall 2022
Depth sensing image system
Brightway Vision has loaned us a test system. Their system is a special CMOS sensor coupled with a precisely controlled light source. The system measures the time of flight of the photons from the light source back to the sensor to measure images at a series of distance ranges. We have their software for controlling the system, and we have methods for reading the data. We would like one of the class projects to be a calibration of the system. One of the cool features of the system is that - when properly controlled - it can see through fog. And, we have a fog machine!
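As a back-of-the-envelope sketch of the range-gating idea, the Matlab lines below convert an assumed gate delay and width into the distance range being imaged (round trip, d = c t / 2). The timing values are made up, not Brightway Vision specifications.

```matlab
% Sketch of how a gated time-of-flight system maps a timing window to a
% distance range. The gate delay and width are illustrative values.
c = 3e8;                          % speed of light, m/s
gateDelay = 200e-9;               % time after the pulse when the gate opens (s)
gateWidth = 50e-9;                % how long the gate stays open (s)

rangeNear = c * gateDelay / 2;                 % closest distance imaged
rangeFar  = c * (gateDelay + gateWidth) / 2;   % farthest distance imaged
fprintf('Gate images objects between %.1f m and %.1f m\n', rangeNear, rangeFar);
```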
Designer glasses with color filters
A critical topic we cover in class is the idea of color matching. This topic explains which lights appear the same to the eye, or camera, even though they are physically different. It is possible to use the principles of color matching to create special visual effects. For example, we can design glasses with color filters that make certain objects appear more similar, or make objects that normally appear similar look more different. This project will have you use your knowledge of color vision to design color filters for different purposes and then validate your choices using the software tools in the class.
Ask us about the Pickleball problem.
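Here is a minimal Matlab sketch of the underlying computation: sensor responses are inner products of sensitivity, illuminant, filter transmittance, and surface reflectance, so a filter changes the separation between two surfaces in sensor space. All spectra below are placeholder Gaussians, not measured data.

```matlab
% Minimal sketch of how a color filter changes the separation between two
% surfaces as seen by a three-channel sensor. All spectra are placeholders.
wave = 400:10:700;
gauss = @(mu, sd) exp(-((wave(:) - mu).^2) / (2 * sd^2));

sensors = [gauss(450, 30), gauss(540, 35), gauss(610, 35)];  % placeholder RGB sensitivities
light   = ones(numel(wave), 1);                              % flat illuminant
surfA   = gauss(560, 40);                                    % e.g., a yellow-green ball
surfB   = gauss(580, 40);                                    % a similar-looking surface

filterT = 1 - 0.8 * gauss(570, 15);                          % notch filter near 570 nm (assumed)

rgb = @(T, refl) sensors' * (T .* light .* refl);            % sensor responses
noFilter = ones(numel(wave), 1);
sepNoFilter = norm(rgb(noFilter, surfA) - rgb(noFilter, surfB));
sepFilter   = norm(rgb(filterT,  surfA) - rgb(filterT,  surfB));
fprintf('Separation without filter: %.3f, with filter: %.3f\n', sepNoFilter, sepFilter);
```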
Optics analyses
The software toolboxes in the class include methods for designing and evaluating certain simple lenses. In this project, you will design lenses to magnify and minify images. You will be asked to quantify the lenses you design by illustrating their pointspread functions, optical transfer function, chromatic aberration, and geometric distortion. Some more complex calculations, such as illustrating the light field and the impact of placing small pieces of metal in the light path, may be part of the advanced projects.
Sensor design experiments
Image sensors have a number of different properties that impact how well they can capture light - these include pixel size, well capacity, various noise characteristics, and their color filter arrays. In this project you will be asked to use the ISETCam tools to design different types of sensors as well as an acquisition policy (e.g., burst mode, exposure bracketing) and evaluate how well the sensors with the policy perform in different imaging contexts (high dynamic range, low light levels, daytime).
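A minimal Matlab sketch of the kind of pixel-level calculation these evaluations rest on: SNR from photon shot noise, dark current, and read noise. The parameter values are illustrative, not those of any particular sensor.

```matlab
% Minimal sketch of a pixel SNR calculation from standard noise sources.
photons     = 2000;     % mean photons reaching the pixel during the exposure
qe          = 0.6;      % quantum efficiency (assumed)
darkCurrent = 5;        % electrons per second (assumed)
expTime     = 0.03;     % seconds
readNoise   = 2;        % electrons rms (assumed)

signal  = qe * photons;                                   % electrons
shotVar = signal;                                         % Poisson variance
darkVar = darkCurrent * expTime;
snr     = signal / sqrt(shotVar + darkVar + readNoise^2);
fprintf('SNR = %.1f (%.1f dB)\n', snr, 20*log10(snr));
```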
Sensor calibration
These projects will teach you how to assess various camera properties that are needed to evaluate image quality. We collected the data; your role will be to analyze the data to learn about camera calibration. Click on the links below to see one of four distinct projects about camera calibration for color.
- Color calibration
- Camera noise
- Lens shading
Cornell box lighting estimation
Human vision metrics
A section of the course is devoted to understanding human vision, including the optics, cone absorptions, and spatial pattern sensitivity. In this project you will be asked to use engineering metrics (ssim, s-cielab) that are designed to evaluate certain aspects of image quality. You will explore how well these metrics perform as you characterize them with simple test targets and more complex images.
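As a small example of exercising one of these metrics, the Matlab sketch below compares a synthetic grating target with a noisy copy using ssim (this assumes the Image Processing Toolbox is available; S-CIELAB would be computed with the ISETCam/ISETBio tools instead).

```matlab
% Minimal sketch of exercising an engineering metric on a simple test target.
[x, y] = meshgrid(linspace(0, 1, 256));
target = 0.5 + 0.4 * sin(2 * pi * 8 * x) .* y;     % grating with a contrast ramp
noisy  = target + 0.05 * randn(size(target));      % distorted version
noisy  = min(max(noisy, 0), 1);

score = ssim(noisy, target);                        % structural similarity index
fprintf('SSIM between target and distorted target: %.3f\n', score);
```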
Image processing algorithms including potentially neural networks
Some of you will be interested in image understanding, or computer vision, applications. With the explosion of neural networks to perform various vision tasks, especially convolutional neural networks, there are many opportunities to perform simple experiments. We will list more here in the future - but here are a couple of examples that might interest you now.
Facial Recognition & Deep Fakes
Humans have been obsessed with faces since before we were technically humans. Now, understanding faces has become one of the most-studied problems in AI. The tools and technology are evolving rapidly, so our ideas for projects in this area are very flexible and fluid.
As tools, we have integrated the deepface toolkit and the Faces in The Wild dataset into our isetml repo. It allows experimenting with face detection, face matching, and face verification. Projects could include evaluating how different sensors or lenses affect accuracy or other key metrics. (David and Brian)
Simulating skin reflectance
Light is absorbed and scattered in biological tissue in complex ways. The absorption and scattering are simulated using Monte Carlo methods that depend on the tissue type. The tissue types are defined by parameters that define components such as blood, water, chromophores (tissue components that absorb light only), and fluorophores (tissue components that absorb and emit light). This project will involve using Monte Carlo simulation software packages (e.g. https://inverselight.github.io/ValoMC/ or https://omlc.org/software/mc/ ) to simulate skin reflectance and visualize the effects of changing the parameters. You will vary parameters (such as concentration of melanin and blood oxygen saturation) and visualize how the parameters change the spectral reflectance and how these changes impact typical RGB camera sensors.
References:
- I.V. Meglinski, S.J. Matcher, "Computer simulation of the skin reflectance spectra," Computer Methods and Programs in Biomedicine, Volume 70, Issue 2, 2003, Pages 179-186
- Aleksi A Leino, Aki Pulkkinen, and Tanja Tarvainen, "ValoMC: a Monte Carlo software and MATLAB toolbox for simulating light transport in biological tissue," OSA Continuum 2, 957-972 (2019)

Cameras for dentists (no longer available)
Alternative title: Predicting (and measuring) the visibility of tissue autofluorescence
The VELscope is an “adjunct device” that some dentists use to increase the visibility of oral lesions that may or may not be cancerous. The device is based on the theory that healthy oral mucosal tissue will fluoresce when illuminated with short wavelength light. The tissue fluorescence is attributed to FAD (flavin adenine dinucleotide), an enzyme that plays an important role in cell metabolism. The device includes a short wavelength light (peak energy at 425 nm) that illuminates the oral cavity and an eyepiece containing a longpass filter to block the reflected light. When dentists view the oral cavity through the Velscope eyepiece, they should observe the autofluorescence of healthy oral mucosal tissue. The expectation is that observation of dark spots in the oral mucosal tissue that do not emit FAD fluorescence indicates the presence and location of unhealthy oral mucosal tissue that should be further investigated.
We have a Velscope in the lab and a calibrated digital camera. The goal of this project is to model the Velscope and camera in order to calculate the expected camera sensor values for different concentrations of FAD. The data and ISETcam scripts that are necessary to predict FAD autofluorescence and to model the camera will be provided. The Velscope is also available to collect calibrated camera image data. A goal of the project is to analyze predicted and measured camera sensor data. An added bonus is to model a camera that has the same spectral sensitivities of the human eye and calculate color difference metrics for different concentrations of FAD.
References:
Joyce Farrell, Zheng Lyu, Zhenyi Liu, Henryk Blasinski, Zhihao Xu, Jian Rong, Feng Xiao, Brian Wandell, "Soft-prototyping imaging systems for oral cancer screening" in Proc. IS&T Int’l. Symp. on Electronic Imaging: Imaging Sensors and Systems, 2020, pp 212-1 - 212-7, https://doi.org/10.2352/ISSN.2470-1173.2020.7.ISS-212
Lyu Z, Jiang H, Xiao F, Rong J, Zhang T, Wandell B, Farrell J. Simulations of fluorescence imaging in the oral cavity. Biomed Opt Express. 2021 Jun 21;12(7):4276-4292. doi: 10.1364/BOE.429995. PMID: 34457414; PMCID: PMC8367257.
Projects Fall 2021
Hyperspectral camera projects
IMEC devices as an example (whatever happened to that guy Peter?)
Color cross-talk for tiny, crowded patterned pixels
Recovery of spectral estimation from the device response
Optics
Neural network solution for the RTF - explore alternatives to polynomial fitting
Machine learning
Zheng differentiable color metric
Something related to Andrey?
Human ISETBio
Tadashi illusion simulation
Curation of the Artal, Polans, and Thibos data set - with a nice web interface and explanation
Enchroma [1]
Projects Fall 2020
Overview of Image Systems Simulation Software Tools available for all class projects
Image systems simulation for camera design and evaluation
Several projects below address different parts of the same question: Can we use camera simulation and graphics rendering to create images that are effectively equivalent to real pictures taken by a modern cell phone camera?
There are various aspects of this question we need to evaluate, and the several projects in this section each address some part. Here is a link to a video describing the goal of the projects described below.
Modeling the spatial distribution of light in a 3D scene
Mentors: Zheng and Joyce
Physically-based ray tracing software can be used to model the way light is reflected off surfaces, transmitted through filters and lenses, and impinges on an imaging surface, such as a sensor array in a digital camera or the retina in the human eye. We incorporated PBRT (physically-based ray-tracing) software (developed at Stanford) into an image systems simulation programming environment in order to calculate the optical irradiance image of a 3D scene and predict the sensor images that would be captured by a calibrated digital camera placed in the 3D scene.
In the mid-1980s, researchers at Cornell University compared photographs of a real physical box with computer graphics renderings of the simulated box. The goal of this project is to compare digital camera images of a real physical box with predicted camera images of the simulated box.
A key step in achieving this goal is to model the spectral energy and the spatial distribution of the light illuminating the box and objects in the box.
Students will learn how to use image systems simulation software (specifically, ISET3d) to create different models for the spatial distribution of light within the simulated Cornell Box. These models will be used to predict the amount of light reflected from the walls of the box and from surfaces within the box. Students will compare predicted spectral radiance with measurements of spectral radiance obtained using a spectroradiometer. The digital camera images and the measurement data will be provided to the students.
Building a camera model of the Google Pixel 4A camera
Mentors: Brian, Joyce, Zheng and Dave
In the class we will learn about the electrical characterization of image sensors, as well as means of characterizing the wavelength sensitivity properties. This project will analyze camera images obtained with the Google Pixel 4a to estimate many camera parameters. For example, we will make measurements of relative illumination due to the lens, sensor spectral quantum efficiency, and various types of sensor noise.
The mentors and students will design the experimental measurements. The mentors will acquire the data in the Packard Lab. The students will write scripts to estimate the parameters.
Evaluating the accuracy of the sensor simulation: Color and spatial metrics
Mentors: Joyce, Brian
Suppose you have used image systems engineering methods to model a camera. No model will ever be perfect. What methods would you use to test that the simulated camera is 'accurate enough' for practical use? We will discuss this with you, suggest some approaches, listen to your approaches, and then implement some of them. One set of quantitative experiments will measure how accurately we capture the color of objects. Another set of experiments will measure how accurately we capture the spatial resolution of the camera.
Evaluating the perceptual accuracy: Visual Psychophysical Experiments on the Web
Mentors: Joyce
Quantitative measurements of the sensor values are a good approach to assess the validity of the simulations. Another important approach is to ask whether the real and simulated images look alike. We can only judge this perceptual similarity by having people perform experiments.
Mechanical Turk: The internet provides access to a large and diverse pool of human subjects, but researchers do not typically use the internet to conduct vision experiments due to the inability to calibrate displays, control the stimulus presentation and constrain the viewing conditions. Nonetheless, there have been many attempts to conduct online visual psychophysical experiments, and this project asks you to set up an experiment using Mechanical Turk.
Similarity ratings: On each trial the participant looks at a pair of images. The participant provides a number between 1 and 4 that says how likely it is that the two images were obtained by the same camera. We will provide pairs of images that can be used in the experiment.
As you design this experiment, consider what you might do to learn about the viewing conditions under which people make their perceptual judgements. For example, is there some way in which you can estimate the viewing distance between the display and the person, or the properties of the display (e.g., gamma, resolution)?
The goal of this project is to implement an experiment on Mechanical Turk. You should be able to demonstrate the experiment, but we do not expect you to collect and/or analyze data from the experiment.
This can be a group project that includes a survey of the literature and software that has already been developed for online vision experiments. Below are links to two relevant references, but there are many more papers and software packages to be found on the web.
- Lavin, Silverstein and Zhang (1999), "Visual experiment on the Web," Proc. SPIE 3644, Human Vision and Electronic Imaging IV, (19 May 1999); doi: 10.1117/12.348482
- Li, Jun, Yeatman and Reinecke (2020) “Controlling for Participants’ Viewing Distance in Large-Scale, Psychophysical Online Experiments Using a Virtual Chinrest”, Scientific Reports 10 (1) , doi: 10.1038/s41598-019-57204-1
How does semantic labeling by a convolutional neural network depend on camera parameters?
Link to video describing additional software applications and tools available for this project
Mentors: Dave and Brian
Several famous convolutional neural networks for semantic labeling are implemented in Matlab (including Googlenet and Resnet). David has built tools that enable us to (a) download labeled images from large databases, (b) implement different camera models (varying lenses and sensors) to render these images, and (c) evaluate network performance at labeling the rendered images. These projects ask you to analyze how different camera models influence neural network performance.
We expect you to use the pre-trained networks for this project. But adventurous students - or those of you who are already skilled in networks - can go further and experiment with transfer learning or fine-tuning existing nets to adapt them to a specific camera design and evaluate the results.
As a general principle, you should consider that the camera parameters may have an impact that depends on the task you are trying to achieve.
Spatial variation
Suppose that we change the spatial resolution of the image. Two ways the resolution might change are using diffraction limited lenses with different f-numbers and/or using sensors with different pixel sizes. Can we quantify what types of semantic classification might be influenced as the spatial resolution changes?
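A minimal sketch of the two resolution limits being varied, with illustrative numbers: the diffraction-limited cutoff set by the f-number and the Nyquist limit set by the pixel pitch.

```matlab
% Minimal sketch of the two spatial-resolution limits this project would vary.
lambda     = 550e-9;       % wavelength (m)
fnumber    = 2.8;          % lens f-number (illustrative)
pixelPitch = 1.4e-6;       % pixel size (m), an illustrative value

opticalCutoff = 1 / (lambda * fnumber) / 1000;   % diffraction cutoff, cycles/mm
pixelNyquist  = 1 / (2 * pixelPitch) / 1000;     % sampling limit, cycles/mm

fprintf('Optical cutoff: %.0f cy/mm, pixel Nyquist: %.0f cy/mm\n', ...
    opticalCutoff, pixelNyquist);
% Whichever limit is lower governs the finest detail available to the classifier.
```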
Sensor noise
It is expensive to reduce sensor and pixel noise, and different sensors have different amounts of read noise and fixed pattern noise (DSNU, PRNU). Even for the same sensor, the amount of noise will differ depending on the exposure duration and luminance level. Can you find semantic categories of images for which the noise matters, and some for which the noise does not matter?
Color
The same question applies to color. In some cases classes can be discriminated well without color, using monochrome images. In other cases the three color channels are useful. Can you find examples that illustrate this principle, and can you find ways to quantify the value of color in some cases?
Depth
We can recover a lot about the scene just by looking at a depth map. Can these depth map images be used in classifiers? How well will the classifier trained on radiance images perform if we ask it to identify semantic categories based on the depth image?
As a general point, can you learn something about the network from the mistakes it makes? From what it gets right and wrong, what is your speculation about the information the network is using to make the semantic classification? This will depend on the stimuli you use in your experiment. What if your categories are oranges and lemons? What if your categories are letters of the alphabet?
Modeling the human contrast sensitivity function from fovea to periphery
Mentors: Brian
ISETBio is a software system that is comparable to ISETCam, but focused on the human eye. The current ISETBio software effectively calculates the first stage of vision - cone photoreceptor absorptions. Some aspects of visual sensitivity are determined by the photoreceptors. One of the factors that is not really known is how the changing size of the photoreceptors - they are small in the fovea and much larger in the periphery - impacts visual contrast sensitivity. We will write a calculator relevant for human space-color sensitivity.
If you are interested in the biology of the human eye and perceptual image metrics, I can show you ISETBio and some calculations we would like to implement.
Light field camera modeling
Mentors: Brian and Zheng
If you like to program, and you are interested in light fields, then this could be a good project for you.
Using ISETCam and ISET3d, we can simulate sensor data that we would obtain from light field cameras. There are a number of algorithms that people use to refocus and estimate depth with such sensor data. These algorithms are part of Donald Dansereau's software package, Light Field Toolbox. We use some of Don's functions in ISETCam. It would be nice to integrate the simulated data more closely with Don's toolbox and use more of his algorithms. This project would extend the current light field camera scripts by adding more examples that use Don's toolbox, as well as providing documentation.
One part of this project that could be particularly interesting but somewhat advanced: Can we use ISET3d simulations of a dual-pixel autofocus camera to create a depth map? How well can we do? This is a good, but advanced, project.
Projects Fall 2019
ISETCam whole system validation
Mentors: Brian, Joyce and Zheng
Set up a physical 3D scene based on the Cornell box. Use the spectrophotometer to measure the radiance inside the Ronnie Luo gray box with the diffuse light source. Calibrate a lens and camera that provide raw sensor data. Compare the simulation with the measurements. We could use a calibrated camera (e.g., the Nikon). New features of this project include lens calibration.
Convert scene and oi to Matlab objects/classes.
Cinema 4D experts? Ability to control Cinema 4D programmatically as one can do with Blender
Experimental data collection for simulation comparison
Creating the ISET3D Cornell Box images, large quantities
Optics related, say lens de-centering
ISETBio related
Geometric calibration of a camera
Mentors: Brian and Zheng
There are several online videos and software packages that describe how to measure and correct for camera lens distortion.
This project involves using calibration targets and software (see references below) to estimate the camera’s intrinsic, extrinsic and lens-distortion parameters. In the process of doing this, you will learn what these parameters are and how they are calculated.
References:
- https://www.mathworks.com/help/vision/ug/camera-calibration.html
- https://www.mathworks.com/videos/camera-calibration-with-matlab-81233.html
- http://ksimek.github.io/2013/08/13/intrinsic/
- http://ksimek.github.io/2012/08/22/extrinsic/
- https://www.mathworks.com/help/vision/ug/stereo-camera-calibrator-app.html
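A rough outline of the MathWorks checkerboard workflow linked above is sketched below. The file names are hypothetical, and the exact function signatures should be checked against the current Computer Vision Toolbox documentation.

```matlab
% Rough outline of the checkerboard calibration workflow (Computer Vision
% Toolbox). Treat exact signatures as assumptions and consult the references.
imageFiles = {'calib01.png', 'calib02.png', 'calib03.png'};   % hypothetical file names
[imagePoints, boardSize] = detectCheckerboardPoints(imageFiles);

squareSize  = 25;                                             % checker square size, mm
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

% Intrinsics, extrinsics, and lens distortion are estimated together
cameraParams = estimateCameraParameters(imagePoints, worldPoints);

% Undistort one of the calibration images as a sanity check
I = imread(imageFiles{1});
J = undistortImage(I, cameraParams);
imshowpair(I, J, 'montage');
```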
Image alignment
Mentors: Brian
Basic project: Evaluate different software algorithms for their accuracy on image alignment. We can assess the algorithm performance using test images from the ISET3d software. This software starts with computer graphics models and ends up producing (a) image data, and (b) pixel level labels that define the object location in each image (ground truth). We can use the ground truth data to measure how well the alignment algorithm performs. We can also use the ISET3d software to generate images simulated from cameras with different types of lenses, different sensors, and different types of motion (global or object-level motion).
Useful for students interested in image processing (alignment algorithms) or computer graphics (ISET3d).
Learning the image processing pipeline
There are a number of papers that describe methods of learning how to map from raw sensor data to jpg values. Several are referenced here, including work from Stanford (L3). We can provide you with raw data and image processed data, and you can experiment with training neural networks to perform the transformation. Also, the DeepISP folks have some images from Samsung.
- DeepISP: Towards Learning an End-to-End Image Processing Pipeline, Eli Schwartz, Raja Giryes and Alex M. Bronstein Data
- Learning the Image Processing Pipeline (2017) , H. Jiang, Q. Tian, J. E. Farrell, B. Wandell, IEEE Transactions on Image Processing, Volume 10, pages 5032 - 5042 Code and data
- Papers from Milanfar at Google (e.g. Blade).
Designing and evaluating illumination systems for scientific and industrial imaging applications
Mentors: Joyce and Zheng
There are many scientific and industrial imaging applications that combine an imaging sensor with ring light illumination. Ring lights are designed to surround a camera. Depending on where the camera is placed, the illumination can be non-uniform and can produce the least amount of light at the center of the imaging area. It is sometimes possible to calibrate and correct for the non-uniform illumination, but for many ring light configurations the calibration cannot compensate when the central imaging region does not receive enough light.
This project involves using open-source Matlab code for modeling and comparing different lighting configurations. It may be ideal for someone who is taking the class remotely.
Reference: Fhionnlaoich et al. (2019). Optimising Light Source Positioning for Even and Flux-Efficient Illumination. Journal of Open Source Software, 4(37), 1392. https://doi.org/10.21105/joss.01392
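As a minimal sketch of this kind of modeling, the Matlab lines below compute relative irradiance on a flat target from point sources arranged in a ring (inverse-square falloff plus obliquity). The geometry is assumed for illustration; the cited JOSS package handles this more carefully.

```matlab
% Relative irradiance on a flat target from point sources in a ring.
% The geometry below is an assumed, illustrative configuration.
nLights = 8; ringRadius = 0.10; ringHeight = 0.15;     % meters (assumed)
theta   = 2 * pi * (0:nLights-1) / nLights;
srcPos  = [ringRadius * cos(theta); ringRadius * sin(theta); ringHeight * ones(1, nLights)];

[X, Y] = meshgrid(linspace(-0.1, 0.1, 101));           % target plane at z = 0
E = zeros(size(X));
for k = 1:nLights
    dx = srcPos(1,k) - X; dy = srcPos(2,k) - Y; dz = srcPos(3,k);
    r2 = dx.^2 + dy.^2 + dz^2;
    cosTheta = dz ./ sqrt(r2);                          % obliquity on the target
    E = E + cosTheta ./ r2;                             % inverse-square falloff
end
imagesc(E / max(E(:))); axis image; colorbar;           % visualize uniformity
```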
Designing a multispectral light to excite tissue fluorescence
Mentors: Joyce and Zheng
The project will help us build a better light source system for the fluorescence measurements. We want a system that can (a) easily switch between light sources with different wavelengths, and (b) improve the uniformity of the light illuminating the mouth. This could be a good project for someone with a background in mechanical engineering and optics.
Basic project: Set up the system baseline, design the lens/optics components, and validate system performance. The skills will involve (a) getting hands-on experience working and designing on an optical table, (b) using a spectrophotometer, and (c) learning how to engineer the optical path.
Measuring tissue fluorescence
Mentors: Joyce and Zheng
We are acquiring data about the fluorescence arising from different parts of the human mouth. We measure fluorescence by illuminating the mouth using a short wavelength (blue) light, and then making spectral photometric measurements of the light emitted from different locations. Even though the illuminant contains only, say, 400 nm light, the emitted light contains energy at 450 to 600 nm. These wavelengths are the fluorescent, rather than reflectance, signal.
This project will be to help us acquire more data, annotate the data, and place them in a database. Further, it will involve using software we have recently added to ISETCam to separate the light into the parts that are fluorescence and reflectance. We will also try to use statistical methods to characterize differences between people and differences between measurements made from different parts of the mouth. For example, can we use principal components and k-means clustering algorithms to understand more about the data?
Basic project: Collect spectral photometric measurements, interact with the database, and perform the fluorescence estimation. The skills will involve learning how (a) to use a spectrophotometer, (b) work with human participants to obtain measurements, and (c) design and record data for a reproducible experiment.
Simulating and designing a camera to measure tissue fluorescence
Mentors: Joyce and Zheng
A spectrophotometer is an expensive and challenging instrument to use. Moreover, it does not acquire a full image but only measures the light from a small part of the image. It would be desirable to build a camera that measures an entire image and to estimate the fluorescence from such an image. The goal of this project is to design a camera that can estimate tissue reflectance and fluorescence as well as the spectrophotometer.
There are several different aspects to this project.
Simulation: We will teach you how to simulate cameras with different spectral properties and predict their output for different types of spectral inputs. We will also teach you how to compare the performance of a real and simulated camera. You will have opportunities to improve the simulations and the performance evaluation. There are also opportunities for developing and testing algorithms for estimating tissue reflectance and fluorescence from a camera that has only two or three spectral channels. Simulation has the advantage of having ground truth data. Hence optimization methods and machine learning are both possible.
Software enhancements for a prototype camera: We will show you how to use a prototype camera that includes a UV and a broadband light, customized filters and an imaging sensor. You have opportunities to develop software that improves how the camera captures, transmits and analyzes the camera images.
Mechanical design: There are many opportunities to improve on how the camera is used. For example, one could redesign the form factor so that it is easier to capture images of the mouth. Or, one could design an apparatus that keeps a person’s head fixed while the camera is positioned to capture images of different parts of the mouth.
Eye movements and visual acuity
Mentor: Brian Wandell
ISETBio [2] estimates the photoreceptor current responses in the presence of eye movements. This project uses ISETBio to produce (rectangular grid) photoreceptor responses to a slanted edge pattern. You then run the ISO12233 code on the output to calculate the MTF. The purpose of the project is to vary the amplitude and nature of the eye movements (using the eye movement model in ISETBio) and to show the impact of the eye movements on the MTF.
One group might make this comparison in foveal regions and a second group could do the calculation for peripheral retina, where the cone apertures are much bigger.
Calculate retinal ganglion cell responses
Mentor: EJ Chichilnisky
The ISETBio software is designed to predict responses in the human eye. It works with ISETCam.
Any camera or sensor provides an (imperfect) representation of a scene, from which we often try to reconstruct the scene itself. For example, cameras use three color sensors to represent the spectral properties of the environment: the reconstruction of the spectrum from these three sensors is far from complete, but if the sensors and the reconstruction algorithm are designed well, the reconstruction may be sufficient to represent the scene in a useful way.
The retina is a sensor that transmits to the brain an imperfect representation of the spatial properties of a scene, and a retinal prosthesis provides an even less perfect representation. Recent work attempts to describe how the visual scene may be reconstructed linearly from normal retinal activity (Brackbill et al), how machine learning methods may enhance such a reconstruction (Parthasarathy et al), and how this kind of reconstruction can be helpful in reasoning about how to make an effective retinal prosthesis (Golden et al).
Develop software tools to reconstruct the spatial properties of natural visual scenes from the output of the retina. Simulate retinal output using ISETBio. Perform a linear reconstruction of the scene using the logic of Brackbill et al. Consider whether simple nonlinearities (e.g. a nonlinear lookup table) might improve the reconstruction. Consider whether a different retinal encoding would support more accurate reconstruction. Consider what image metrics should be used to evaluate the quality of the reconstruction. If a retinal prosthesis could activate some of the neurons, but not all, how good would the representation be, and could the reconstruction take into account the missing cells to improve the reconstruction?
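A minimal Matlab sketch of the linear-decoder idea follows, using random stand-ins for the ISETBio-simulated stimuli and retinal responses (the encoding matrix and noise level are assumptions for illustration, not a retina model).

```matlab
% Minimal sketch of linear stimulus reconstruction: learn a least-squares
% decoder from simulated stimulus/response pairs, then reconstruct a
% held-out stimulus. All data here are random stand-ins.
nPixels = 100; nCells = 50; nTrain = 500;

S = rand(nPixels, nTrain);                  % training stimuli (one per column)
A = randn(nCells, nPixels) / sqrt(nPixels); % stand-in retina-like encoding
R = A * S + 0.1 * randn(nCells, nTrain);    % noisy "retinal" responses

W = S * pinv(R);                            % least-squares linear decoder

sTest = rand(nPixels, 1);
rTest = A * sTest + 0.1 * randn(nCells, 1);
sHat  = W * rTest;                          % reconstructed stimulus
fprintf('Relative reconstruction error: %.3f\n', norm(sHat - sTest) / norm(sTest));
```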
Here are some references of people working on this topic.
- Brackbill et al (draft paper, incomplete, but OK to share with just this class)[3]
- Parthasarathy et al [4]
- Golden et al [5]
Wavefront aberrations and point spread functions
Mentor: Brian Wandell
ISETBio (and ISETCam) have functions that calculate the point spread function from the wavefront aberration of an optical system. The calculation forms the pupil function from the aberration and takes the squared magnitude of its Fourier Transform. In addition, these systems often summarize the wavefront aberration using the Zernike polynomial coefficients. These coefficients define the wavefront aberration on a circular support region.
While the forward calculation (from wavefront aberration to PSF) is straightforward, returning from the PSF to the wavefront (in terms of Zernike polynomials) is not. I would like to be able to get a reliable estimate of the wavefront (as expressed in terms of Zernike polynomials) from a point spread. I have started, but not yet succeeded. If you like working on these kinds of calculations, let's do this project. It will involve laying out the math, writing the code, and testing the code with different examples.
It would be best if the math was straightforward. But if you really want to do this by training a neural network to do it, I could help you with that by producing many examples. But, really, ...
Avian vision simulation
Mentor: Henryk Blasinski
Different species have evolved in different environments, and as a consequence their vision systems have adapted to specific properties of those environments. For example, Tedore and Nilsson (Nature Communications, 2019) postulate that birds developed additional photoreceptors sensitive in the UV wavelength range because they improve their perception of leaves. The authors conducted a large number of simulations predicting the appearance of different forest scenes in different light conditions. These simulations use basic models of light propagation and interactions with objects in a scene.
Basic project: The goal of this project is to reproduce, and extend, the results from Tedore and Nilsson using PBRT and ray-tracing. This environment makes it possible to create geometrically complex scenes and model many types of interactions between objects. Some of the extensions could include:
1. Creating a 3D model of a forest
2. Varying the reflectance and transmission properties of leaves.
3. Modifying ambient illumination (time of day, season, etc.)
4. Evaluation of different photoreceptor sensitivity curves, and their impact on a vision task.
Reference: Cynthia Tedore, Dan-Eric Nilsson, "Avian UV vision enhances leaf surface contrasts in forest environments," Nature Communications, 2019; https://www.nature.com/articles/s41467-018-08142-5
Quantifying subjective judgments about image quality
Mentor: Joyce Farrell
The method of pairwise preference judgments generates reliable and informative data about the relative quality of two images. Several subjects compare two images to each other. The percentage of the time one image is preferred over the other is used as an index of the relative quality of the two images. The disadvantage of this method is that it requires many comparisons, typically ten or so for every pair of images.
Silverstein and Farrell (2001) proposed a method to reduce the number of pairwise preference judgments by selecting a subset of pairwise comparisons. Instead of comparing every pair of images (the complete method), a partial method is used that makes more comparisons between images that have similar quality values than between images that have very different quality values. A sorting algorithm is used to efficiently order the images with paired comparisons, and each comparison is recorded. When the sorting is completed, more trials will have been conducted between images that have similar quality values than between images that have very different quality values. Regression is used to scale the resulting comparison matrix into a one dimensional perceptual quality estimate.
The method assumes that the images can be positioned on a one dimensional quality line. Given this assumption, it uses an efficient sorting method to reduce the number of preference judgments necessary to quantify the quality of each image.
This project uses simulations to test the method and the assumptions upon which it relies. An added bonus would be to include a method for testing the assumption that the images can be positioned on a one dimensional quality line (hint: look for violations of transitivity).
All material necessary to accomplish this project is provided in this paper: “Efficient method for paired comparison” by D. A. Silverstein and J. E. Farrell (2001) [6]
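As a starting point, here is a minimal Matlab sketch of turning a pairwise preference matrix into one-dimensional quality values using a Thurstone-style transform. The preference matrix is made up, and the sketch omits the sorting and regression steps described in the paper.

```matlab
% Minimal sketch of scaling pairwise preferences into quality values.
% P(i,j) is the proportion of trials on which image i was preferred to
% image j; the matrix below is made up for illustration.
P = [0.5 0.7 0.9;
     0.3 0.5 0.8;
     0.1 0.2 0.5];

Pc = min(max(P, 0.01), 0.99);            % avoid infinite z-scores at 0 or 1
z  = sqrt(2) * erfinv(2 * Pc - 1);       % inverse normal CDF via erfinv
quality = mean(z, 2);                    % average z-score per image
quality = quality - min(quality);        % anchor the scale at zero
disp(quality')
```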
Projects Fall 2018
Oral health camera design
Mentors: Joyce and Zhenyi
We are acquiring data about the fluorescence arising from different parts of the human mouth. We measure fluorescence by illuminating the mouth using a short wavelength (blue) light, and then making spectral photometric measurements of the light emitted from different locations. Even though the illuminant contains only, say, 400 nm light, the emitted light contains energy at 450 to 600 nm. These wavelengths are the fluorescent, rather than reflectance, signal.
This project will be to help us acquire more data, annotate the data, and place them in a database. Further, it will involve using software we have recently added to ISETCam to separate the light into the parts that are fluorescence and reflectance. We will also try to use statistical methods to characterize differences between people and differences between measurements made from different parts of the mouth. For example, can we use principal components and k-means clustering algorithms to understand more about the data?
Basic project: Collect spectral photometric measurements, interact with the database, and perform the fluorescence estimation. The skills will involve learning how (a) to use a spectrophotometer, (b) work with human participants to obtain measurements, and (c) design and record data for a reproducible experiment.
A more advanced aspect of this project: A spectrophotometer is an expensive and challenging instrument to use. Moreover, it does not acquire a full image but only measures the light from a small part of the image. It would be desirable to build a camera that measures an entire image and to estimate the fluorescence from such an image. If we know a lot about the reflected light (see the Basic project), we might be able to design a calibrated camera that acquires enough data to estimate the fluorescence throughout an entire image. The goal of this project is to design and simulate the amount of light, the spectral character of the light and the camera, and the image processing software to embed in such a camera.
Computer graphics asset creation and rendering
Mentors: Zhenyi and Trisha
A great deal is known about the illuminants and reflectance spectra of typical objects within the visible range. Much of this knowledge was obtained because people needed it to design effective consumer cameras. With the increasing use of cameras for machine vision applications, it is becoming increasingly valuable to learn about the reflectance and illumination beyond the visible wavelengths, extending to the band gap of CMOS imagers (about 1000nm). This data could be used to guide a range of automotive and drone applications.
Basic project: Use a spectral photometer to collect spectral reflectance samples of objects across the wavelength range to 950nm or 1000nm. This project involves creating a methodology for (a) acquiring images, (b) acquiring spectral data from identified image locations, and (c) measuring the illumination and reflectance at these locations. We then need a method for storing and retrieving the data using our database.
Advanced part I: Search the web for existing databases with material reflectance that extends into the long-wavelength (near infrared) regions. Create models of the spectra using principal components methods, k-means clustering algorithms, or other data science tools.
Advanced part II: Create computer graphics renderings of the optical image of driving scenes using reflectance data and a camera that is specified all the way into the near infrared.
Exploring rendering algorithms using machine-learning (e.g., L3)
Mentors: Zheng and Brian
This project would be great for anyone interested in training small neural networks. Using ISETCam, we can create a great many sensor images of approximately natural scenes. We are interested in creating these sensor images using different types of sensors (e.g., standard Bayer and a Bayer with a white pixel rather than two green pixels).
For this project, we would like you to try to train a neural network that converts the data from one type of sensor into another.
Basic project: We will provide you with a set of ISETCam scenes to use for this project. You can use ISETCam methods to predict sensor responses from, say, a simple RGB Bayer camera. Then you use these same methods to calculate the predicted responses from a modified version of that camera. A first modification would be to double the spatial resolution of the camera. Because the images are simulated, they will be pixel-wise aligned, in the sense that a 4x4 region of the lower resolution camera will correspond to an 8x8 region of the high resolution camera. You can use a tool (e.g., PyTorch or TensorFlow) to find a mapping from the low resolution to the high resolution image. Different methods for designing and building the network - such as autoencoder methods [7] - might be applied.
Many versions of this project might be tried, such as predicting a monochrome sensor response from an RGB response, or - if you dare! - an RGB from a monochrome given some environment (fruits). Or predicting the sensor responses under daylight illumination from the sensor responses under tungsten illumination. Or predicting the sensor responses of an image without camera-shake from the sensor responses of an image with camera-shake. Or predicting the sensor responses for a high illumination capture (bright light, 15 ms exposure) from a capture at low illumination (dim light, 15 ms exposure).
Spatial CIELAB vs ISETBio and only front-end physiological optics
Mentors: Trisha and Brian
CIELAB Delta E is a color difference metric that measures the similarity of two colors to a human observer. Although widely used, the CIELAB metric is only suitable for measuring the color difference of large uniform color targets. The Spatial CIELAB metric was therefore created to extend the Delta E metric to color images instead of uniform patches. This is necessary because color discrimination and appearance are a function of spatial pattern, so Spatial CIELAB takes into account the spatial-color sensitivity of the human eye.
We would like someone to use ISETBio to create L,M,S receptor responses to images. We would then calculate ISETBio-CIELAB differences based on these L,M,S values and compare them with Spatial CIELAB differences computed directly from a display screen. Since the L,M,S values calculated through human optics should already take into account part of the spatial-color sensitivity of the human eye, we are interested to see any similarities or differences between these two calculations. The critical aspect of this project is designing test targets.
You will learn: How to use ISETBio to calculate L,M,S values and how to calculate CIELAB values and Spatial CIELAB values.
Advanced Project: Add in eye movements to the calculation and calculate based on the mean response that incorporates eye movements.
Human Optics as a function of eccentricity
Mentors: Trisha
This project would be great for someone who is interested in optics and optical modeling software.
ISET3d is an extension of ISETCam that allows users to simulate 3D scenes and realistic lens prescriptions using ray-tracing and computer graphics. Using ISET3d, we have the ability to simulate a physiological model of human optics, which allows us to predict the optical image after a 3D scene is passed through the optics of the human eye. However, there are many different models of the human eye, which all differ in detail and complexity.
We would be interested in using either ISET3d or other optical modeling software to quantify the off-axis (e.g. wide-angle) performance of several human eye models. We can do this by calculating optical images at different angles away from the center of the retina, and quantifying the modulation transfer function or point spread function at each of these locations. We can then compare their performance with known values in the literature.
You will learn: How to use ISET3d to model the optics of the human eye.
Advanced Project: Some physiological models of the eye have accommodation (focusing) modeling as well. In other words, we can change the lens prescription to model the human eye focusing near and far. Can we quantify the difference in these accommodation models?
Projects Fall 2017
Simulation of Cone Responses for Photosensitive Epilepsy
Patients with photosensitive epilepsy can get seizures from being exposed to flashing lights. Certain frequencies and colors are highly epileptogenic, and in 1997 one Pokémon episode resulted in seizures and hospital visits in over 600 children in Japan. Specifically, red flickering lights can be very provocative (Takahashi and Tsukahara 1976; Binnie et al., 1984). It is hypothesized that the red flashes are highly epileptogenic because they only stimulate the red cones (Harding 1998). A current study that investigated effects of age and gender similarly found that red stimuli are much more likely to induce epileptic activity. Can we simulate how the cone responses differ across different colored filters?
This project will use a toolbox called ISETBIO to simulate cone responses. ISETBIO is analogous to ISET, but specifically simulates the human visual system: from a stimulus, through the optics of the eye, onto the retina and photoreceptors, and eventually into the retinal ganglion cells. We can use ISETBIO (1) to set up a simulation with different colored filters and (2) to analyze cone responses to stimuli that cause epilepsy.
Mentor: Dora Hermes
Speeding up lens simulations in a ray-tracer
Instead of using ISET to simulate the optics of an imaging system, we have the option of using a graphics ray-tracer to trace rays through a full optical lens system. This allows us to model full 3D scenes in our simulations instead of the flat 2D scenes used in ISET.
In our work, we use an open-source ray-tracer called PBRT (Physically Based Ray Tracer) that we've modified to trace rays through a given optical system. We shoot a ray from the camera sensor and use Snell's law to refract the ray through each surface in a lens system. However, this type of ray-tracing can be very slow and would benefit greatly from a speed-up. One possibility is to precompute ray paths and load them in at rendering time.
In this project, we explore various methods to speed up the lens simulation in a ray-tracer. An ideal student would have some working knowledge in C++ and an interest in computer graphics.
Mentor: Trisha Lian
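For orientation, the refraction step performed at each lens surface is just the vector form of Snell's law; the Matlab sketch below shows the textbook formula (assuming no total internal reflection), not the actual PBRT code.

```matlab
% Vector form of Snell's law for a single surface. I is the incident ray
% direction, N the surface normal pointing toward the incoming ray, and
% eta = n1/n2. This sketch ignores total internal reflection.
refract = @(I, N, eta) eta * I + ...
    (eta * (-dot(N, I)) - sqrt(1 - eta^2 * (1 - dot(N, I)^2))) * N;

I = [0.2, 0, -1]; I = I / norm(I);      % incoming ray direction (unit vector)
N = [0, 0, 1];                          % surface normal toward the incoming side
eta = 1.0 / 1.5;                        % air into glass

T = refract(I, N, eta);                 % refracted direction (unit vector)
fprintf('Refracted direction: [%.3f %.3f %.3f]\n', T);
```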
Modeling a cell phone camera pipeline
The good folks at Google wrote a paper describing how they make high quality images on a cell phone camera. The paper is included on our Canvas web-site.
Burst photography for high dynamic range and low-light imaging on mobile cameras. Hasinoff et al., ACM Trans. Graph. Vol. 35, No 6. Article 192 (2016).
For some of the projects, we can divide up different parts of the image processing pipeline described in this paper and simulate the expected results using the ISET tools. The critical simulation concerns the acquisition of many brief images, alignment of these images, and combining the results into a high quality result. Let’s see how far we can get in doing an assessment of their burst photography design with software simulation tools.
Camera properties and machine-learning algorithm performance
There are two thoughts about image sensors and machine-learning algorithms. One group of people thinks that the algorithms will run across any type of camera. Another group thinks that changing the camera optics and sensor may have an impact on the algorithm performance.
It is likely that the truth is somewhere in between. Some optics and sensor changes will have an impact on some types of algorithms. But we are not aware of any systematic studies that have examined how changing camera parameters will influence the performance of convolutional neural nets (CNNs).
We can use the ISET tools in this class to simulate images obtained by cameras with optics and sensors. Those of you who are interested or skilled in machine-learning for image classification or object detection can create a project to evaluate how well a CNN trained for one camera will generalize to images obtained from a different camera.
Cell phone camera variation
Problem: There is a lot of interest in testing the image quality of smartphones, made especially relevant by the DxOMark rankings, which are often cited by the press and by phone manufacturers as a measure of the image quality of a particular model of smartphone. However, out of necessity, only one or a few examples of each phone can be exhaustively tested. That raises the question of how much unit-to-unit variation affects the scores, and if there are correlations in that variance based on sensor model, specific lens, or smartphone brand or price.
Suggested Project: Create a crowd-sourced experiment where volunteers (passers-by?) could take a photo of a test target and send in the result. Then analyze the data to attempt to determine, through some combination of data analysis and machine learning, how much variation there is between multiple samples of a particular model, and whether that varies with brand, price, sensor, optics, or some other potentially surprising factor.
Mentor: David Cardinal, http://www.cardinalphoto.com
Sensor Calibration and Simulation
ISET makes it possible to predict the output of an imaging sensor, given a set of sensor simulation parameters. The simulation parameters are derived from a few fundamental measurements that characterize sensor spectral sensitivity and electrical properties including dark current, read noise, dark signal non-uniformity and photo response non-uniformity. This project will involve making these measurements and deriving the simulation parameters for a camera that is in our lab. Calibration targets, measurement equipment, and software programs will be provided. There are also ISET scripts that describe the measurement methods and calculate the sensor parameters.
- s_sensorEstimation.m illustrates how to measure the spectral response of a digital camera.
- s_sensorAnalyzeDarkVoltage illustrates how to measure dark noise.
- s_sensorPixelReadNoise.m illustrates how to measure pixel read noise.
- s_sensorSpatialNoiseDSNU.m illustrates how to measure the DSNU of a sensor array.
- s_sensorSpatialNoisePRNU.m illustrates how to measure PRNU.
You can use the measured and known sensor parameters to predict the RGB camera values for a color calibration target. You can then compare the predicted RGB camera values to the actual RGB camera values of the target, taken from a specific camera.
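As one small example of the analysis, the Matlab sketch below fits dark-frame levels versus exposure time to estimate dark current and offset; the numbers are synthetic stand-ins for real measurements.

```matlab
% Minimal sketch of one calibration analysis: estimating dark current from
% the mean level of dark frames captured at several exposure times.
% The data below are synthetic stand-ins for real dark-frame measurements.
expTimes  = [0.01 0.05 0.1 0.5 1.0];                              % seconds
darkLevel = 1.5 + 4.2 * expTimes + 0.05 * randn(size(expTimes));  % mean dark values (DN)

p = polyfit(expTimes, darkLevel, 1);             % linear fit: level = offset + slope*t
fprintf('Dark current ~ %.2f DN/s, offset (bias) ~ %.2f DN\n', p(1), p(2));
```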
References: http://scien.stanford.edu/jfsite/Papers/ImageCapture/Farrell_Okincha_Parmar.pdf
Mentor: Joyce Farrell
Geometric calibration of a stereo camera
There are several online videos and software packages that describe how to measure and correct for camera lens distortion, and how to estimate the size and location of objects based on the images the objects project onto two cameras in a stereo configuration.
This project involves using calibration targets and software (see references below) to estimate the camera’s intrinsic, extrinsic and lens-distortion parameters. In the process of doing this, you will learn what these parameters are, how they are calculated, and how the accuracy of the estimated parameter values affect the accuracy of object size and distance predictions.
References:
- https://www.mathworks.com/help/vision/ug/camera-calibration.html
- https://www.mathworks.com/videos/camera-calibration-with-matlab-81233.html
- http://ksimek.github.io/2013/08/13/intrinsic/
- http://ksimek.github.io/2012/08/22/extrinsic/
- https://www.mathworks.com/help/vision/ug/stereo-camera-calibrator-app.html
Mentors: Joyce Farrell and Trisha Lian
Depth from Stereo Images
Database of synthetic stereo images
The Middlebury Stereo dataset is a collection of stereo images with “ground truth” disparities or depth maps. Researchers and students have used datasets that are part of this collection to compare different methods for estimating depth from stereo images. The depth maps are inherently noisy because they are empirically measured using range-sensing devices or structured lighting.
This project will use our lab software to create a new database of synthetic stereo camera images and associated depth maps. You can modify the properties of a scene, position the cameras in the scene, modify the baseline distance separating the two cameras, and modify properties of the optics and sensors in the two cameras.
Mentor: Trisha Lian
Stereo algorithm assessment
As a related project, a cooperating group might run depth estimation algorithms that are already published on the web (see, for example, functions in OpenCV) and learn how camera parameters such as baseline separation, optics, and/or sensor resolution affect the accuracy of the depth estimation algorithms.
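For example, a minimal disparity-estimation sketch using MATLAB's Computer Vision Toolbox (OpenCV offers equivalent functions); it assumes a calibrated stereoParams object from the calibration project above, and the file names are placeholders.

<pre>
% Depth from a rectified stereo pair (MATLAB Computer Vision Toolbox sketch).
I1 = imread('left.png');
I2 = imread('right.png');

[J1, J2]     = rectifyStereoImages(I1, I2, stereoParams);  % align epipolar lines
disparityMap = disparitySGM(rgb2gray(J1), rgb2gray(J2));   % semi-global matching

% Depth (same units as the baseline) ~ focalLengthInPixels * baseline ./ disparityMap;
% accuracy depends on baseline, optics, and sensor resolution.
figure; imshow(disparityMap, [0 128]); title('Estimated disparity');
</pre>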
Projects Fall 2016
RealSense 3D-imaging
The following are some project ideas that involve real-time RGB-D imaging using RealSense dev kits. We will provide both the RealSense SR300 (a short-range depth camera based on a coded-light technique) and the LR200 (a long-range version based on an IR-assisted stereo-3D technique).
Projected Texture Stereo
RealSense LR200 module uses a projected texture stereo system. Measure and model the system’s optical properties and implement techniques for generating high-quality pattern texture projectors, as outlined in published work. Mentor: Leo Keselman
Computational Photography
Using depth maps from either the RealSense SR300 or LR200, create examples of depth-of-field blur, tilt-shift effects, and other post-processing effects. Mentor: Leo Keselman
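A toy sketch of the depth-of-field effect is shown below, assuming you have exported an aligned RGB frame and depth map from the RealSense SDK as image files (the file names and depth scaling are assumptions).

<pre>
% Depth-dependent blur: quantize the depth map into layers and blur each layer.
rgb   = im2double(imread('color.png'));
depth = double(imread('depth.png')) / 1000;      % e.g. 16-bit depth in millimeters

focusDepth = 1.0;                                % plane to keep sharp, meters
maxSigma   = 8;                                  % strongest blur, pixels
sigma      = min(maxSigma, 4 * abs(depth - focusDepth));

out   = zeros(size(rgb));
edges = linspace(0, maxSigma, 6);
for k = 1:numel(edges)-1
    if k < numel(edges)-1
        mask = sigma >= edges(k) & sigma < edges(k+1);
    else
        mask = sigma >= edges(k);                % include the cap in the last layer
    end
    s = max(mean(edges(k:k+1)), 0.1);            % representative blur for this layer
    blurred = rgb;
    for c = 1:3
        blurred(:,:,c) = imgaussfilt(rgb(:,:,c), s);
    end
    out = out + blurred .* repmat(mask, 1, 1, 3);
end
imshow(out);
</pre>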
Stereo Algorithms
The RealSense LR200 hardware produces depth maps using stereo matching algorithms. However, it also provides the left and right images. Design, implement, and test alternative stereo matching algorithms, and compare them with the results of the built-in algorithms in the LR200 ASIC, accessed through the API. Mentor: Leo Keselman
Visual Odometry
There exist many techniques for estimating camera position from an image. The RealSense SR300 and LR200 provide both rectified images and depth maps. With these, a wide range of techniques, from ICP to three-point-pose RANSAC, can be used to implement 3D scanning of large environments. Implement such a system. Mentor: Leo Keselman
Image systems simulation
Autonomous vehicle sensors: Forensic analysis of the fatal Tesla car crash
On May 7, 2016, a 40-year old man was killed when his Tesla crashed in Florida. There are many articles describing the accident and speculating about the cause. For example, Tesla reported that “Neither Autopilot nor the driver noticed the white side of the tractor trailer against a brightly lit sky, so the brake was not applied.”
The Tesla car had a Mobileye system that includes several cameras and an image processing module. There is enough known about the imaging sensors in the Mobileye system to predict the images the sensors would have captured for different types of scenes.
This class project will use the ISET digital camera simulation software to model different scenes and image sensor parameters (e.g. exposure duration and video rate). Extra bonus points if you use machine learning (e.g., an SVM) to determine whether a system can detect the difference between different types of scenes. For example, what type of imaging sensor is required to detect the difference between a “white side of a tractor trailer” and “a brightly lit sky”?
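One way to start is a purely illustrative ISET sketch that sweeps exposure duration and checks whether two bright, nearly identical scenes remain distinguishable after pixel saturation. The luminance values and parameter strings below are assumptions for illustration, not measurements of the actual accident scene.

<pre>
% Exposure sweep for two bright uniform scenes ("sky" vs. "white trailer").
skyScene     = sceneAdjustLuminance(sceneCreate('uniform d65'), 2.0e4);  % cd/m^2, placeholder
trailerScene = sceneAdjustLuminance(sceneCreate('uniform d65'), 1.5e4);  % placeholder

oi     = oiCreate('diffraction limited');
sensor = sensorCreate('bayer (rggb)');

for t = [0.5 1 2 4 8] * 1e-3                      % candidate exposure durations, s
    sensor   = sensorSet(sensor, 'exp time', t);
    vSky     = sensorGet(sensorCompute(sensor, oiCompute(oi, skyScene)),     'volts');
    vTrailer = sensorGet(sensorCompute(sensor, oiCompute(oi, trailerScene)), 'volts');
    swing    = sensorGet(sensor, 'pixel voltage swing');
    fprintf('t = %4.1f ms: sky %.3f V, trailer %.3f V, saturated %.0f%%\n', ...
        t*1e3, mean(vSky(:)), mean(vTrailer(:)), 100*mean(vSky(:) >= swing));
end
</pre>

Once the two classes of simulated sensor images stop being separable (for example, both fully saturate), an SVM trained on the raw responses will drop to chance performance.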
References:
- Inside the Self-Driving Tesla Fatal Accident, by Anjali Singhvi and Karl Russell, NY Times, July 12, 2016
- Tesla faults brakes, but not Autopilot, in fatal crash, by Neal Boudette, Business Day, July 29, 2016
- Mobileye EMP evaluation platform
- Fatal crash prompts federal investigation of Tesla self-driving cars, by Sam Thielman, The Guardian, July 13, 2016
- Autopilot 2.0 adds more sensors to be better than ever, report says, by Chris Mills, BGR, Aug 11, 2016
- Tesla Autopilot 2.0: retrofit to next gen sensors likely to be available for some owners, Fred Lambert, Electrek, August 6, 2016
- Tesla Autopilot 2.0: next gen Autopilot powered by more radar, new triple camera, some equipment already in production, Fred Lambert, Electrek, August 11, 2016
- Researchers trick Tesla Model S Autopilot, Brandon Turkus, Autoblog, Aug 4, 2016
- Another crash on Tesla Autopilot, another driver admits to not paying attention, was cleaning his dash, by Fred Lambert, Electrek, August 19, 2016
- Tesla Model S, Wikipedia
- Understanding the fatal Tesla accident on Autopilot and the NHTSA probe, Fred Lambert, July 1, 2016
- “WTF is the deal with driverless car guru George Hotz’s Comma Points?”, by Joe Carmichael, July 7, 2016
- Uber and Volvo partner up, robot ride-sharing starts this summer, by Jonathan Gitlin, Ars Technica, Aug 18, 2016
Comma.ai startup in SF; Drive.ai startup in SF; Nauto – startup in Palo Alto
Learning a driving simulator, by Eder Santana and George Hotz
Mentor: Joyce Farrell
360 Camera Capture Simulation
The recent popularity of head mounted displays and VR has increased interest in constructing 360 cameras that can capture and render stereo panoramas. A couple of recent examples include Facebook's Surround360 and Nokia's OZO camera.
With a combination of a customized ray-tracing renderer (PBRT-spectral) and a MATLAB toolbox to control it (RenderToolbox3) we have the ability to simulate 360 cameras in a 3D virtual scene created in a modeling program such as Blender. To do this we specify the distribution of cameras, their lenses, focus, FOV, etc. and take a "snapshot" of a virtual scene. For example, we can place 6 virtual cameras in a circle with a 1 foot radius, attach wide angle lenses to all cameras, and take images from each camera. Because the scene is virtual, we also have access to the ground-truth depth map and true panorama.
This project will focus on using these simulation tools to evaluate either the 360 stereo stitching algorithms or the design of the camera itself.
Note: Facebook's stitching code for its Surround360 camera is now open source and on GitHub.
Some potential ideas to start with:
1. Can you design a database of virtual scenes that can help evaluate the effectiveness of 360 stereo stitching algorithms? This would include constructing a variety of scenes with a modeling program (e.g. Blender, Maya), porting the scenes to the simulation software and then taking 360 camera snapshots using the tools described above. Using a combination of the ground truth and the results of a stitching algorithm, can we evaluate how well the algorithm performs?
2. Can you evaluate the design of a 360 camera using the simulation tools above? For example, how would the quality of the panorama change by having 12 cameras in a ring instead of the 16 cameras on the Surround360 camera? This direction may require you to dig into the stitching code and make appropriate adjustments.
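For idea 2, a small geometry helper is often the first step. The sketch below only computes camera centers and outward viewing directions for an N-camera ring; the actual placement syntax for PBRT-spectral/RenderToolbox3 scene files is not shown and depends on your setup.

<pre>
% Positions and look-at directions for N outward-facing cameras on a ring.
N      = 6;                          % try 12 or 16 to compare with Surround360
radius = 0.3048;                     % 1 foot, in meters

theta = (0:N-1) * 2*pi/N;            % azimuth of each camera
pos   = [radius*cos(theta); radius*sin(theta); zeros(1,N)];   % 3xN camera centers
look  = [cos(theta); sin(theta); zeros(1,N)];                 % outward view directions

for k = 1:N
    fprintf('camera %2d: position (%.3f, %.3f, %.3f), direction (%.3f, %.3f, %.3f)\n', ...
        k, pos(:,k), look(:,k));
end
</pre>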
C++ and Python skills are necessary for using Facebook's open source stitching code. An understanding of basic Computer Vision would also be helpful.
Mentor: Trisha Lian (for the simulation software)
Underwater simulations
The advent of the GoPro camera has made underwater photography much more accessible. Unfortunately, images captured underwater rarely look pleasing: they have washed-out colors and low contrast due to scattering. To better understand the impact of water and its different constituents on underwater target appearance, we built a ray-tracing based simulation environment for underwater photography. With this tool we think we can render images of underwater targets that look realistic. Or do they?
To have some notion of how water really influences color appearance we also captured a number of underwater images using a variety of consumer cameras. In this project you will learn about raytracing through water and different mathematical models used to compute the interactions between lights, targets and water. Ultimately your goal will be to improve the simulation environment to make the simulated and captured images as visually close as possible.
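To get a rough feel for the physics, the attenuation-only part of such a model looks like the sketch below. The coefficients are made up for illustration, and the real simulation environment also handles scattering and path radiance.

<pre>
% Toy spectral attenuation model for light travelling through water.
wave = 400:10:700;                            % nm
E    = ones(size(wave));                      % surface illuminant (flat, for illustration)
R    = 0.5 * ones(size(wave));                % target reflectance (placeholder)

c = 0.02 + 0.5 * ((wave - 400)/300).^4;       % made-up attenuation curve, 1/m
d = 3;                                        % total path length in water, m

L = E .* R .* exp(-c * d);                    % radiance reaching the camera
plot(wave, L); xlabel('Wavelength (nm)'); ylabel('Relative radiance');
title('Spectral attenuation after 3 m of water (toy model)');
</pre>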
Henryk, Trisha, Joyce
Model RealSense camera
In the past few years color+depth cameras such as Intel RealSense have become commonly available. Such cameras provide images of the scene together with depth maps i.e. arrays of numbers describing distances between the camera and points in the scene. Very often color and depth modules of a particular camera take advantage of fundamentally different physical processes to produce their images. Consequently substantially different tools are necessary to model how cameras produce color images and depth images.
Color image acquisition can be modelled with computer graphics tools, such as PBRT. PBRT is ray tracing software that accurately simulates how light interacts with different objects in a 3D scene and how the light is projected onto a camera sensor. These rendering tools can be modified to incorporate the behavior of complex lens systems and elaborate camera designs such as light field cameras. In fact we have a modified version of PBRT to perform precisely such simulations (Spectral PBRT).
A different set of simulation tools is necessary to model depth estimation. One such tool is Blensor, which is a plug-in to Blender, an open source 3D editing tool. Blensor has been designed specifically to model how different types of depth cameras capture their data.
Unfortunately, having two different tools is very inconvenient for modelling purposes. It is easy to lose track of simulation parameters, for example, camera poses and positions, scene orientations, etc. Your goal for this project is to create a wrapper around PBRT and Blensor to allow users to easily and seamlessly use both tools. Ideally a user would define a model of a depth and color camera together with a scene mesh that would represent the world. The wrapper would need to handle the different tools, and make sure that the color and depth data is consistent with the scene mesh and camera models.
We hope that the wrapper you create will enable a good model of a RealSense depth camera.
Achin, Henryk, Trisha
Myopia/Hypermetropia VR Experience
Myopia (near-sightedness) and hypermetropia (far-sightedness) are the most common eye problems in the world. With virtual reality, we have the potential to simulate the visual experience of uncorrected myopia or hypermetropia. This project will focus on creating such a VR experience. One possible path is to use Unity to create virtual rooms with interesting features that can highlight the experience of these vision problems. This would involve writing a shader that can blur the scene, as realistically as possible, according to depth and presenting this altered image through the VR goggles. Additional features may include sliders to change severity or to add other effects to increase the realism of the experience.
Students who work on this project may potentially be put in touch with documentary filmmakers interested in creating a piece on myopia.
Trisha
Computer vision and computational photography
Reflectance, Fluorescence, and Color Matching
Fluorescence emission is a common property of biological tissues and materials, and it strongly impacts the appearance of surfaces under different illuminants. Its presence makes any color matching task much more difficult. Teeth are one example of a biological substance for which color matching is important. Natural enamel fluoresces under short-wavelength light, and whenever dentists fill a cavity they need to select a filling whose color matches the tooth. However, what appears similar under the dentist's lamp may look very different in broad daylight.
In this project you will perform a set of measurements of how teeth reflect and fluoresce light and then help design the spectral reflectance properties of a better dental filling that will be less visible under different illuminants.
Henryk, Joyce
Auto-cropping using Deep Learning
One of the most common post-processing tasks in photography is cropping of images for improved visual impact. This has only gotten more important with the widespread adoption of fixed-focal-length smartphones as the most common cameras in use today. There have been a number of very sophisticated attempts to automate this otherwise labor intensive process using adherence to various rules of composition (see References below). However, they suffer from growing complexity, as each attempt to improve the system requires layering yet more specialized knowledge. This seems like an ideal challenge for a deep learning based solution.
There don’t seem to be any (publicly available, at least) frameworks for solving this problem in its entirety, but there have been several attempts to rate the aesthetics of photographs using deep learning (see References below). So, the project is to see if a similar approach can be used to automatically improve images by cropping them in some fashion. It provides some interesting challenges in the design of the deep learning system. For example, should it be designed to evaluate each image and its possible crops independently, or is there a way to directly measure the success of a crop compared to the original image? The total solution space is extraordinarily large, so some simplifying assumptions (for example, a limited number of potential crops for each image) will be needed.
Some references:
http://www.arminsamii.com/research/papers/crop-paper.pdf
https://www.cs.umd.edu/~djacobs/pubs_files/UIST2003.pdf
Optimizing photo composition (refers to above papers) https://people.mpi-inf.mpg.de/~chen/papers/photocompos.pdf
Rating Pictorial Aesthetics using Deep Learning http://infolab.stanford.edu/~wangz/project/imsearch/Aesthetics/ACMMM2014/lu.pdf
Mentor: David Cardinal
Human vision simulations (ISETBIO)
Predicting visual acuity from wavefront aberrations
Andrew B. Watson; Albert J. Ahumada, Jr
Abstract
It is now possible to routinely measure the aberrations of the human eye, but there is as yet no established metric that relates aberrations to visual acuity. A number of metrics have been proposed and evaluated, and some perform well on particular sets of evaluation data. But these metrics are not based on a plausible model of the letter acuity task and may not generalize to other sets of aberrations, other data sets, or to other acuity tasks. Here we provide a model of the acuity task that incorporates optical and neural filtering, neural noise, and an ideal decision rule. The model provides an excellent account of one large set of evaluation data. Several suboptimal rules perform almost as well. A simple metric derived from this model also provides a good account of the data set.
http://jov.arvojournals.org/article.aspx?articleid=2122162
A formula for the mean human optical modulation transfer function as a function of pupil size
Andrew B. Watson
Abstract: We have constructed an analytic formula for the mean radial modulation transfer function of the best-corrected human eye as a function of pupil diameter, based on previously collected wave front aberrations from 200 eyes (Thibos, Hong, Bradley, & Cheng, 2002). This formula will be useful in modeling the early stages of human vision.
http://jov.arvojournals.org/article.aspx?articleid=2121488&resultClick=1
A unified formula for light-adapted pupil size
Andrew B. Watson; John I. Yellott
Abstract The size of the pupil has a large effect on visual function, and pupil size depends mainly on the adapting luminance, modulated by other factors. Over the last century, a number of formulas have been proposed to describe this dependence. Here we review seven published formulas and develop a new unified formula that incorporates the effects of luminance, size of the adapting field, age of the observer, and whether one or both eyes are adapted. We provide interactive demonstrations and software implementations of the unified formula.
http://www.journalofvision.org/content/12/10/12/
Mentor: Wandell
The impact of small eye movements on high frequency resolution of the eye
Simulate the effects described in this paper using ISETBIO
Abstract: Humans and other species explore a visual scene by making rapid eye movements (saccades) two to three times every second. Although the eyes may appear immobile in the brief intervals between saccades, microscopic (fixational) eye movements are always present, even when an observer is attending to a single point. These movements occur during the very periods in which visual information is acquired and processed, and their functions have long been debated. Recent technical advances in controlling retinal stimulation during normal oculomotor activity have shed new light on the visual contributions of fixational eye movements and the degree to which these movements can be controlled. The emerging body of evidence, reviewed in this article, indicates that fixational eye movements are important components of the strategy by which the visual system processes fine spatial details; they enable both precise positioning of the stimulus on the retina and encoding of spatial information into the joint space–time domain.
Control and Functions of Fixational Eye Movements Annual Review of Vision Science Vol. 1: 499-518 (Volume publication date November 2015) First published online as a Review in Advance on October 14, 2015 DOI: 10.1146/annurev-vision-082114-035742
The unsteady eye: an information-processing stage, not a bug [8]
Mentor: Wandell
Effects of age on color appearance
Use ISETBIO to simulate the combined effects of an aging eye - changes in lens opacity, light scatter, pupil size, and so on - on various perceptual phenomena, such as color appearance.
From (Brainard, D. H. & Hurlbert, A. C. (2015). Colour vision: understanding #TheDress. Current Biology, 25, R549–R568, doi: 10.1016/j.cub.2015.05.020).
"There are, in fact, a number of well-documented individual differences in the sensory apparatus that supports colour vision (reviewed in [13,14]). These include differences in pre-retinal filtering of light (for example, by the lens and macular pigment) — which, intriguingly, mostly affect short-wavelength or ‘‘bluish’’ light — differences in the spectral sensitivities of the retina’s cone photoreceptors, and differences in the relative numbers of cones of different classes. This type of front-end difference affects the information extracted from an image by different individuals, and might thus lead to differences in colour constancy. Other individual differences that can be revealed with much simpler stimuli may also be important. For example, as noted above, the stimulus seen as achromatic differs from one person to another, as do the stimuli that are perceived as pure examples of the unique hues (red, green, blue, and yellow) [15]. These differences themselves may be driven by front-end sensory differences, by differences in neural mechanisms that calibrate the colour vision system [16,17], or by an interaction between the two. Lastly, there might be individual differences in higher-order neural processes that specifically mediate colour constancy. A full understanding of the individual differences in how the dress is perceived will ultimately require data that relate, on a person-by-person basis, the perception of the dress to a full set of individual difference measurements of colour vision. The rich dataset of Lafer-Sousa et al. [2] suggests that age and gender do predict, to some extent, the variability in people’s response to the dress. Intriguingly, the density of pre-retinal pigments is also known to vary systematically with age."
Mentor: Wandell
Projects Fall 2015
A new approach to image processing (L3)
We have developed a new image processing pipeline (L3) for a digital camera based on machine learning and high speed processing with GPUs. L3 (Local, Linear, Learned) automates and customizes the image processing pipeline for a given design to speed camera development, leveraging advanced camera simulation and machine learning techniques.
Reference:
[2] Automatically designing an image processing pipeline for a five-band camera prototype using the Local, Linear, Learned (L3) method
Accelerating L3 Processing Pipeline for Cameras with Novel CFAs on NVIDIA® Shield™ Tablets using GPUs
L3 classifies input image patches into categories that are local in space and response, and automatically learns linear operators that transform pixels to the calibrated output space using training data from camera simulation. The local and linear processing of individual pixels makes L3 ideal for parallelization.
This project aims to accelerate the L3 pipeline on NVIDIA® Shield™ Tablets using GPUs for real time rendering of videos. A potential deliverable is a tablet application that demonstrates the fast rendering feature of the L3 method. The learned linear operators and video data captured by a multispectral camera prototype will be provided. The CUDA / C++ (or CUDA / Matlab) code that works on a PC will be provided as a starting point.
Skills preferred: CUDA, Android Programming
Mentor: Haomiao Jiang
High Dynamic Range Video Using the L3 Method
High dynamic range (HDR) imaging has advanced and translated to consumer products during the last decade. The majority of HDR techniques capture and combine multiple exposures to recover details and contrast simultaneously in dark and bright regions. However, this strategy requires the scene to be still during the multiple captures and is therefore inherently not suitable for HDR video acquisition. Altering the exposure settings within the CFA is a promising approach for single-shot HDR image and HDR video acquisition, at the cost of some spatial resolution. These novel HDR CFAs require time and effort to develop tuned image processing pipelines.
This project aims to explore the feasibility of the L3 method for these novel HDR CFAs, particularly for HDR video applications. Various HDR CFAs will be compared through the resultant images from the L3 processing pipeline in order to determine the optimal design.
References:
[3] Cheng, CH. et al., "High Dynamic Range image capturing by Spatial Varying Exposed Color Filter Array with specific Demosaicking Algorithm," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, 2009.
[4] F Yasuma, T Mitsunaga, D Iso, SK Nayar, Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum, Image Processing, IEEE Transactions on 19 (9), 2241-2253
Mentors: Qiyuan Tian and Steve Lansel
Designing L3 Processing Pipeline for a Camera Testkit with an RGB/W CFA
Clear pixels have been introduced into CFAs to transmit much more light for low light photography (e.g. Aptina’s Clarity+ sensor, OmniVision’s Clear Pixel sensor inside Moto X and Sony’s Exmor RS RGB/W sensor). However, it is challenging to develop satisfying image processing pipelines that produce high image quality. In simulation, L3 has been demonstrated as an effective and efficient processing pipeline for an RGB/W sensor (see the movie comparing L3 processing results for a conventional RGB sensor and an RGBW sensor at a series of light levels, link).
This project aims to design an L3 processing pipeline for a camera testkit with an RGB/W CFA following the procedures described in Reference [2]. The camera testkit will first be calibrated for camera simulation. An L3 processing pipeline will then be created from the simulation and tested on the raw images captured by the testkit.
Mentor: Qiyuan Tian, Haomiao Jiang
Color Matching in Dentistry
When dentists fill a cavity, they must select a composite material. When they replace a tooth or place a crown or veneer on an existing tooth, they design or order a porcelain implant. These decisions require the dentist to compare the color of teeth with the color of the composite or porcelain material. Dentists try to select the color or shade of the material that provides the best color match to the surrounding teeth, but they also complain that this is a difficult task.
By now you have learned how to use the CIELAB color difference metric to predict whether two colors will appear to match under a fixed illumination. You have also learned that these predictions are not invariant with changes in illumination. In other words, if you change the lighting, the colors of two different materials may no longer appear to match. Therefore, the smile that looks so perfect in the dentist’s office under fluorescent lighting might have imperfections under daylight.
This project has three components. First, we will make spectrophotometric measurements of 1) the reflectance of teeth in-situ in different individuals, 2) the reflectance of different composite and porcelain material, and 3) the spectral power of the light that falls on teeth in-situ under different lighting conditions. Second, we will use this data and the CIELAB color difference metric to predict whether people will be able to detect the difference between teeth and composite and porcelain material under different lighting conditions. Third, we will use the data in ISET simulations in order to determine the tradeoffs in color matching accuracy, cost and convenience. More specifically, we will simulate an imaging system based on a cell phone camera with flash/no-flash mode that has the potential of providing dentists with an alternative to the more expensive spectrophotometric devices that are currently on the market.
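A hedged sketch of the second component is shown below. ieReadSpectra is the ISET utility for loading spectral data, but the data file names here are placeholders for your own measurements.

<pre>
% Compare a tooth and a filling material under two illuminants with CIELAB Delta E.
wave   = 400:10:700;
xyzCMF = ieReadSpectra('XYZ', wave);               % CIE color matching functions (Nx3)

toothRefl = ieReadSpectra('toothReflectance', wave);      % placeholder file names
fillRefl  = ieReadSpectra('compositeReflectance', wave);
lamp      = ieReadSpectra('fluorescent', wave);           % stand-in for the office lamp
daylight  = ieReadSpectra('D65', wave);

illums = {lamp, daylight};  deltaE = zeros(1,2);
for k = 1:2
    E        = illums{k}(:);
    xyzTooth = xyzCMF' * (E .* toothRefl(:));      % 3x1 tristimulus values
    xyzFill  = xyzCMF' * (E .* fillRefl(:));
    xyzWhite = xyzCMF' * E;                        % white point under this illuminant
    labTooth = xyz2lab(xyzTooth', 'WhitePoint', xyzWhite');
    labFill  = xyz2lab(xyzFill',  'WhitePoint', xyzWhite');
    deltaE(k) = norm(labTooth - labFill);          % CIE 1976 color difference
end
fprintf('Delta E under the lamp: %.2f, under daylight: %.2f\n', deltaE(1), deltaE(2));
</pre>

Small color differences under both illuminants (a few Delta E units or less) suggest the filling will remain inconspicuous when the lighting changes.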
Mentor: Joyce Farrell (joyce_farrell@stanford.edu) and Henryk Blasinski (hblasins@stanford.edu)
Simulation projects using ISETBIO
ISETBIO is an ISET based Matlab toolbox that can simulate human optics and photoreceptor sampling. With ISETBIO, we can accurately compute the optical irradiance image that impinges on the retina and the number of photons absorbed by human photoreceptors (cones) for a given scene. ISETBIO is capable of simulating human individuals with different optics (myopia, astigmatism, etc.) and cone mosaics (colorblind, density difference, etc.).
Reproduce and Compare with Recent Papers
In this project, you are expected to reproduce the results from one recent paper with ISETBIO. You are expected to work with your mentor to rewrite it in ISETBIO and try to explain every difference (if any) from the original code.
Here is a set of papers by Watson that are computational, in Mathematica, and related to Optics and Retina
Modulation Transfer Function and pupil size
Pupil size and light level
Here is a paper related to the human point spread function
Computing human optical point spread functions
Retinal ganglion cell modeling
A formula for human retinal ganglion cell receptive field density as a function of visual field location
Or ganglion cells and behavior
Retina-V1 model of detectability across the visual field. The original code for the paper will be provided.
Skills preferred: Matlab programming
Mentor: Haomiao Jiang, Brian Wandell
Simulate an eccentric camera
Write a simulation of the Foveon sensor.
Or,
Write a simulation of the Light.co camera.
Mentor: Brian Wandell
Monitoring the environment
We have ideas about how to take calibrated underwater images captured by GoPro cameras to monitor the health of coral reefs. There are various components to the project (camera calibration, modeling of light transport through water, and automating image upload, storage and analysis).
Mentor: Henryk Blasinski
An underwater, multispectral light source
Underwater imaging is quickly gaining importance, not only due to its applicability in marine ecosystem monitoring, but also due to the proliferation of inexpensive action cameras such as GoPro. Unfortunately, the colors in images acquired under water are severely distorted by scatter and absorption phenomena. One approach to recover more spectral detail is to use active illumination techniques; this approach has proved to be very useful on the surface. In this project you will design and build an underwater, LED-based multispectral light source that fits a standard GoPro-size underwater housing. With all the hardware in place you will have a chance to evaluate the accuracy and performance of active illumination spectral recovery in underwater scenarios. This is a hardware oriented project; you will be expected to build and integrate the final system, which means that you should be familiar with soldering, PCB design and possibly even some CAD tools.
Skills Preferred: Hardware design experience, OR good with web-site programming.
Mentor: Henryk Blasinski
Oculus
Geometric Camera Calibration
In order to simulate degradations of the human visual system using images captured by a camera, it is necessary to know exactly how those images have been captured. This project uses simple camera models that use efficient and flexible calibration procedures to derive geometric parameters such as focal length, radial distortion and the position and rotation of two cameras. There are well-established techniques that estimate these parameters using a specific calibration target like a checkerboard. The goal of this project is to become familiar with those techniques and use them on real images (OpenCV provides many building blocks which can be used) with an image undistortion procedure and a stereo image rectification procedure.
Skills preferred: Knowledge of C++
Mentor:
Streaming and Augmenting Stereo Camera Images
One of the long term goals is the simulation of certain degradations of the human visual system and the evaluation of computer-aided visual enhancements to counteract those degradations. A crucial ingredient in achieving this is a software pipeline which can stream images from a stereo camera to an augmented or virtual reality device in real-time. Hence, the goal of this project is to build such a pipeline to capture, stream, and feed images from cameras in real-time to an Oculus Rift device.
Skills preferred: Knowledge of C++, Willing to learn Oculus Rift SDK, optionally also OpenGL SL or CUDA
Mentor:
Image Display
Use the Oculus Rift to display images to human subjects that simulate (recreate) visual sensations that a person with a particular visual condition would see. This could be low vision, a type of color blindness, loss of central vision due to macular degeneration, or the effect of a retinal prosthesis in a blind person. We will help you use isetBio to create images to simulate one of these conditions. You will render the images on a calibrated Oculus Rift.
Information Display
The goal of this project is to capture and display information so that people can track their movements and navigate in an environment with only visual input from the Oculus Rift. This will be accomplished by interfacing a Project Tango device with an Oculus Rift display. The Project Tango has sensors and software designed to track the 3D motion of the device and create a map of the environment using simultaneous localization and mapping (SLAM) algorithms. The output of the Project Tango is usually rendered on a laptop display. In this project, you will render the output on an Oculus Rift.
3D Projects
Almost anything with RealSense
Depth Sensing With an Endoscope Using Flashes
Depth sensing has been a recent industry trend for many imaging applications. One less explored route is the use of depth sensing for endoscopes, to help identify tumors or other problems. For this project, initially use simulated Scene3D endoscope images to prototype a depth sensing algorithm involving 2 flashes and 2 captures (other capture procedures could be used as well). Prototyping using simulation is a nice, structured way to try out new algorithms quickly. Next, apply this algorithm using a real endoscope and tackle the real-world challenges involved.
Mentor: Steve Lansel
Curved Sensor Simulation
Sony and other imaging companies have recently unveiled curved sensors to improve image quality. Curved sensors bring imaging improvements because of the physics of geometric optics. For a simple lens, the surface of best focus is usually curved, approximately the surface of a sphere. However, most sensors are planar and therefore capture only a small portion of that focal surface in sharp focus. Lens engineers usually try to account for this problem using many lens elements and aspheric lenses. A curved sensor could potentially be a far simpler and less expensive solution for obtaining high-quality images in a smaller form factor. Instead of using a complex lens to obtain high resolution, imaging engineers could use a simple lens and a curved sensor to obtain the same, or even better, results.
This project involves using Scene3D, a full-pipeline camera simulator, to compare the resolution and chromatic aberration benefits of a curved sensor with a simple lens versus a planar sensor with a complex lens.
Mentor: Brian Wandell
Integration with OpenCV
Do we want to create scenes of some sort (stereo? different illumination? different noise? different optics?) and test OpenCV algorithms for robustness against the range of simulated images?
Integration with Caffe
Simulation environments can be used to produce millions of images with a purpose in mind. We can then use these images to train machine learning algorithms. Is there something we want to ask people to do with, say, RenderToolbox to generate many examples and train on with Caffe?
Multispectral imaging for classification
Image classification is a very hot topic in computer vision. Most algorithms, however, operate on RGB camera channels, as if trying to mimic the human visual system. In reality, spectral information is much more abundant and can possibly be used to enhance classification algorithms. This project aims to investigate how much the accuracy of computer vision tasks can be improved if more spectrally sophisticated cameras are used. Specifically, you will use a five band camera prototype to evaluate fruit and vegetable aging and perform flower classification, and you will compare its performance to the performance of a classical RGB camera.
Skills Preferred: computer vision, machine learning
Gullstrand Eye and ray tracing of human optics
We are building a tool for modeling eyes, including the human eye, from ray tracing fundamentals. There is a famous model eye that we would like to implement.
Gullstrand eye search
We would like you to implement and test the Gullstrand eye with the ray tracing software in the ciset package (a close relative of ISET).
Mentor: Brian Wandell
Projects 2014
Predicting human performance using ISETBIO
ISETBIO is an ISET based Matlab toolbox that can simulate human optics and photoreceptor sampling. With ISETBIO, we can compute the optical irradiance image that impinges on the retina and the number of photons absorbed by human photoreceptors (cones) for a given scene.
For this project, a tutorial script describing how to calculate cone absorptions will be provided and the students will be responsible for trying to answer one of following questions:
- What's the maximum necessary display resolution (ppi) at a given viewing distance for Vernier acuity?
- What's the maximum necessary display resolution (ppi) at a given viewing distance for contrast sensitivity (CSF)?
To answer these kinds of questions, students are encouraged to build two scenes and use their preferred machine learning algorithm (e.g. SVM/Neural Network/Random Forest, etc.) to classify cone absorption data from two identical or two different scenes into "same" or "different" classes. When classification accuracy for cone absorption data is greater than a pre-determined value (say, 75%), we would predict that the observer can tell the difference between the two scenes. You can compare these predictions with published data from real human observers.
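A rough outline of that workflow is sketched below; the ISETBIO class and method names follow the toolbox but should be verified against the tutorial script that will be provided, and the scene choice is only an example.

<pre>
% Classify cone absorptions from two scenes with a linear SVM (ISETBIO sketch).
sceneA = sceneCreate('vernier');                 % example stimulus
sceneB = sceneCreate('vernier');                 % in practice, give this one a different offset

oi = oiCreate('wvf human');                      % human optics
cm = coneMosaic;                                 % default cone mosaic

scenes = {sceneA, sceneB}; nTrials = 100; data = []; labels = [];
for si = 1:2
    oiS = oiCompute(oi, scenes{si});
    for t = 1:nTrials
        cm.compute(oiS);                         % photon noise differs on every trial
        data   = [data; cm.absorptions(:)'];     %#ok<AGROW>
        labels = [labels; si];                   %#ok<AGROW>
    end
end

idx   = randperm(size(data,1));  nTest = round(0.2*numel(idx));
mdl   = fitcsvm(data(idx(nTest+1:end),:), labels(idx(nTest+1:end)));
acc   = mean(predict(mdl, data(idx(1:nTest),:)) == labels(idx(1:nTest)));
fprintf('Held-out accuracy: %.1f%% (compare with the 75%% criterion)\n', 100*acc);
</pre>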
Preferred Knowledge: familiarity with at least one machine learning algorithm
Mentor: Haomiao Jiang
Hardware project: Build a Multispectral Imaging System
Build a multispectral imaging system based on a rotating color filter wheel and monochrome camera.
If you have experience in design and 3D printing, you can build several necessary parts.
If you have an interest in engineering applications for art history, there is an opportunity to use the system to capture images of paintings in the Cantor Arts museum.
Mentors: Henryk Blasinski and Joyce Farrell
Hardware project: Build an inexpensive spectrophotometer
In this project, you will build a simple spectrophotometer using a clean DVD-R, a USB webcam and stiff black card paper
Here's a website introducing how to do it: http://publiclab.org/wiki/spectrometer
After building the device, you need to compare it to the performance of a much more expensive (~$50K) spectrophotometer that we have in the lab
Mentor: Haomiao Jiang
Camera Image Quality Metrics
The International Standards Organization (ISO) is developing a set of camera image quality metrics to quantify the spatial resolution, noise and color accuracy of digital cameras. http://proceedings.spiedigitallibrary.org/data/Conferences/SPIEP/64097/829302_1.pdf
Many of these metrics have been implemented in ISET.
You can use ISET to calculate these metrics for simulated cameras that have different optical properties, numbers of pixels and image processing methods. You can also use ISET to simulate how each camera captures and processes natural scenes (e.g. faces and landscapes). You can then compare the metrics with the appearance of these images as they are rendered on a display.
In this project, you will use ISET and CPIQ to quantify and illustrate how the metrics and the images change when you decrease the size of camera pixels (and inversely increase the number of camera pixels). This method will allow you to analyze how resolution trades off against sensitivity: small pixels make it possible to increase the number of sensor pixels sampling the optical irradiance image, but a small pixel also captures fewer photons. What do you prefer, a high resolution noisy image or a low resolution clear image? How does this depend on the display, viewing distance, etc.?
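A small sketch of the pixel-size sweep is shown below; the ISET parameter strings are written from memory and should be confirmed against the sensorSet documentation before trusting the numbers.

<pre>
% Sweep pixel size and look at signal level and relative noise on a uniform scene.
scene = sceneAdjustLuminance(sceneCreate('uniform d65'), 100);   % 100 cd/m^2
oi    = oiCompute(oiCreate, scene);

for p = [1.1 1.4 2.0 2.8] * 1e-6                 % pixel pitch, meters
    sensor = sensorCreate('monochrome');         % monochrome so std reflects noise, not CFA structure
    sensor = sensorSet(sensor, 'pixel size constant fill factor', p);
    sensor = sensorSet(sensor, 'exp time', 0.015);
    sensor = sensorCompute(sensor, oi);
    v      = sensorGet(sensor, 'volts');
    fprintf('pixel %.1f um: mean %.3f V, relative noise %.3f\n', ...
        p*1e6, mean(v(:)), std(v(:))/mean(v(:)));
end
</pre>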
Mentor: Joyce Farrell
ISET model for real camera
In this project, you will build an accurate ISET model for a physical camera we have. You will take pictures of known scenes, analyze the captured images, and try to build an ISET model.
The goal is for the ISET model of the camera to give approximately the same computational results as the RAW output from the real camera. The similarities could be measured by the noise, color, spatial resolution, etc. Analyzing the errors between the model and the real camera will determine the model's accuracy.
If time permits, you can try to implement an image processing pipeline for the camera and evaluate the performance of the processed images.
Mentors: Qiyuan Tian, Steve Lansel, Joyce Farrell
Analysis and Compression of L3 Filters
The L3 algorithm is a learned image processing pipeline for cameras. The algorithm learns optimal linear filters for a given camera based on training data, light level, illumination color, and optics. For a complete camera this may result in many (possibly hundreds) optimized filters. We believe the filters will be closely related for similar camera settings. The goal is to analyze the filters, store a compressed set of filters, and interpolate the needed filters from this compressed set. This way we only need to store a smaller set of filters and can extrapolate to lots of new camera settings. Here is a recent SPIE paper on L3: https://drive.google.com/file/d/0B0Gw85qGqJxhbXJlcmZjbmhOQ2s/edit?usp=sharing.
Mentors: Qiyuan Tian, Steve Lansel, Brian Wandell
ISET model for underwater imaging
With the proliferation of cameras such as GoPro more and more people have started taking underwater images. These usually have large amounts of distortion, both spectral and spatial, originating from the medium in which the image was taken. Rather than experiment in the real world, the impact of different light transport phenomena on RGB images can be understood via simulation environments. In this project you will implement, enhance and integrate with ISET the underwater image simulation system described in the paper below.
Color image simulation for underwater optics
Mentors: Joyce Farrell and Henryk Blasinski
App for Programmable Camera in iOS / Android
We have a prototype programmable camera to be used with iOS or Android devices. The project's goal is to make an app that will run on iOS or Android and uses the camera. Think of an interesting camera app, and we can work together to build it. Prior experience in iOS or Android is needed.
Mentors: Steve Lansel and Munenori Fukunishi
Image classification with a five band camera
Recently, image classification and object recognition have become very popular topics. The large majority of algorithms, if not all, use images acquired with traditional, three channel (RGB) cameras. The goal of the project is to evaluate the performance of state of the art algorithms applied to images captured with a five band camera. Will the recognition/classification performance change, and if so by how much? To get the flavor of the project you can look at the following paper:
Multispectral SIFT for scene category recognition
Mentors: Henryk Blasinski, Steve Lansel
Analysis of a real camera lens
Can we characterize how a lens blurs a point of light (point spread functions or psfs) by analyzing camera images of test targets that are displayed on a color monitor? This project has many possible variations.
- Illuminate red, green and blue pixels on a display and capture an image of the display with a camera placed on a tripod a far distance away. Vary the pattern of red, green and blue pixels (e.g. noise pattern).
- Estimate the psfs of a real camera with a real lens by analyzing camera images of displayed targets. Use a prosumer digital camera, vary the f/#, and observe how the estimated psfs change.
- Estimate the psfs for different field heights, wavelengths and depths.
- Use the estimated psfs to predict camera images of other displayed "natural" images, such as a face. Compare the predicted camera images to actual camera images.
Here are links to papers that describe a method for empirically estimating the psf of a camera lens. The links include code that you can download
- http://www.cs.ubc.ca/labs/imager/tr/2013/SimpleLensImaging/
- http://www.ipol.im/pub/art/2012/admm-nppsf/
People: Brian Wandell, Andy Lin, Joyce Farrell
Psf analysis and image deblurring using a simulated camera lens
The point spread function (psf) of a lens is an extremely important lens property. One possible application of knowing the psf is image deconvolution (deblurring). Deconvolution can drastically improve image sharpness. The following paper provides a good technique for estimating a psf and deconvolving an image with that psf: http://www.cs.ubc.ca/labs/imager/tr/2013/SimpleLensImaging/
Tasks
- Andy Lin will provide simulated camera images of several different types of spatial test targets. Your task will be to use the code from http://www.cs.ubc.ca/labs/imager/tr/2013/SimpleLensImaging/ to estimate psfs from the simulated camera images.
- To evaluate how well the psf estimation code works, compare the estimated psfs to the known psfs that Andy used to generate the simulated camera images.
- As another evaluation technique, use the estimated and known psfs to "deblur" a blurred image containing a secret message using the deconvolution code downloaded from the same site. The secret message will only be legible after proper deconvolution of the image. Andy will provide this blurred image.
Mentor: Andy Lin
Medical imaging: Super resolution microscopy
http://en.wikipedia.org/wiki/Super_resolution_microscopy
Super resolution microscopy refers to methods that build up a high resolution image of a target by integrating many images of the target, illuminated such that only a small subset of the image points are captured in any one image. Each camera image then samples a subset of the pixels in a high resolution image. The locations of the pixels in many camera images are combined to construct a single full high resolution image of the target. By placing a point at the center of each sampled point, one can get very accurate spatial information about the location (phase) of illuminated points in the target. Because the center of a dot is smaller than the lens psf, some people assert that super-resolution methods beat the limit of lens diffraction. But you know better than that. Diffraction is a limit that no earthly being can beat. Nonetheless, by sampling with stochastic and sparse arrays of pixels, one can do a better job of locating the center of sampled points and hence build up a higher resolution image.
You can write an ISET simulation to test one of these super-resolution methods.
Alternatively, you can test methods for super-resolution imaging using real camera images. For example, take a camera image of a displayed image (such as a face or a high resolution test chart). Then capture a series of images of the display when only a subset of the pixels in the face (or chart) are illuminated. The illuminated pixels in each subset will be far away from each other such that the optical images of the pixels illuminated in each image do not overlap. You can further experiment by taking a blurry image of a face (say, by setting the camera f/# to 12). Then, display subsets of pixels of the face that are widely separated. Find the location of the center of each illuminated pixel and combine the data to create a non-blurred camera image.
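Either route needs the localization-and-accumulation step; here is a toy sketch of it (the file pattern, threshold, and upsampling factor are assumptions).

<pre>
% Localize isolated bright dots in each sparse frame and accumulate their centers
% on a finer grid.
upsample = 4;                                   % high-resolution grid factor
frames   = dir('sparse_frame_*.png');           % placeholder file pattern
accum    = [];

for f = 1:numel(frames)
    I  = im2double(rgb2gray(imread(frames(f).name)));
    bw = imbinarize(I, 0.2);                    % isolate the non-overlapping dots
    stats = regionprops(bw, I, 'WeightedCentroid');
    if isempty(accum), accum = zeros(upsample * size(I)); end
    for s = 1:numel(stats)
        c = round(upsample * stats(s).WeightedCentroid);    % sub-pixel center, scaled up
        c = min(max(c, 1), [size(accum,2), size(accum,1)]); % clamp to the grid (x, y)
        accum(c(2), c(1)) = accum(c(2), c(1)) + 1;          % note (row, col) indexing
    end
end
imshow(accum, []);                              % reconstructed high-resolution point map
</pre>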
Mentor: Brian Wandell, Haomiao Jiang
Eulerian video processing (Bill Freeman thing)
Repeat one of the experiments from Bill Freeman. The published papers can be found at http://people.csail.mit.edu/mrub/vidmag/
Also, you need to compare the results for cameras with 3 color channels (rgb) and with 5 color channels (prototype in our lab).
Scene 3D System
The goal of the Scene3D project is to simulate the complete imaging pipeline for 3D scenes, from the scene to the lens, to the sensor and to the image processing. Simulations of sensor and image processing are implemented in ISET. The novel part of Scene3D involves using a technique in 3D graphics called ray-tracing, which produces a physically accurate simulation of light rays that are refracted through lenses and towards the sensor. We modified the PBRT ray-tracer to simulate the important effects of diffraction and to be able to handle complex lenses and multispectral inputs and outputs. The end goal of the Scene3D project is to provide an infrastructure for rapid image systems prototyping.
[https://github.com/ydnality/Scene3D Scene3D project]
One important aspect of photography involves color balancing. Oftentimes, photographs taken under different illuminant conditions will produce images that don't appear natural. For example, images taken under indoor tungsten lighting will exhibit an unnatural yellow/orange tint. These images must be corrected in order to appear natural.
This class project involves applying the camera pipeline simulation provided by the Scene3D infrastructure for use in designing a color-balancing algorithm.
Tasks
- Start with a 3D radiance scene generated by Andy Lin. Modify the parameters of the scene to make different renderings of the scene under different light conditions.
- Design and implement an intelligent method for "correcting" (color-balancing) the illuminant.
- (Challenge/Optional) Design a color balancing method that is able to correct for scenes with 2 or more different illuminants.
Mentor: Andy Lin
PBRT and Zemax optics modeling
Scene3D uses a combination of PBRT and ISET to simulate the complete imaging processing pipeline of a digital camera. The unique contribution of Scene3D is that it applies a technique in 3D graphics called ray-tracing to produce a physically accurate simulation of light rays as they are refracted through lenses and towards the sensor. We modified the PBRT ray-tracer to simulate the important effects of diffraction and to be able to handle complex lenses and multispectral inputs and outputs. However, we have yet to verify this pipeline empirically.
One way we plan to evaluate our modifications to PBRT is to compare the point spread functions we generate with point spread functions generated by Zemax, a well-established software package used by many optics professionals. We provide a Zemax macro that can be used to generate the PSFs that ISET needs. Although Zemax can produce physically accurate PSF's, it cannot produce rendered physically accurate 3D multispectral images like PBRT.
Tasks
- This project would involve taking several PBRT multi-element lens models, and creating equivalent Zemax models.
- Use the Zemax to ISET interface to generate the data necessary for the ISET simulations.
- PSF's using the PBRT model will be provided as ISET optical images. We provide a Zemax macro that can be used to generate PSF for lenses that are modeled in Zemax.
- Compare and analyze the PSF's produced by these two different methods under different aperture and distances as verification.
Experience with Zemax is preferred.
Mentor: Andy Lin
Gesturing in a Virtual 3D space
The Holografika multi-projector display system creates a 3D light field that people can view without the need for special goggles. Leap Motion is a controller that can sense small finger movements using an infrared LED and camera. We linked these two devices so that users can grasp and move virtual objects in the 3D light space created by the Holografika display. We also linked the Leap Motion to a conventional stereoscopic display that uses an LCD with shutter goggles. The goal of this project is to compare how well users can use the hand-gesture controller to move objects in the virtual 3D spaces created by the two different types of displays.
The project has possible variations.
- You can find a suitable OpenGL app or game from the Leap Motion Airspace app store that measures agility to quantify the learning rates of new users. The objects floating in front of the Holografika display will be aligned to the users' hands in that 3D space, but not so with the flat LCD display. Possibly include a mouse mode in the tests.
- Using a 3D top-down street view map of London, test users' skills at finding a location by panning and zooming a holographic 3D map of London on both kinds of displays, using hand gestures. Does the user's self-reported confidence correlate with measured performance, and how does display type affect that? Use the metrics to predict the actual benefit for different kinds of organizations of transitioning from mouse control to in-air gesture devices with 2D and holographic displays.
The equipment is calibrated and available in Packard 070.
Here is a link to the companies involved: www.holografika.com and www.leapmotion.com
You can watch a video of the talk by the inventor of Holografika (Tibor Balogh) at https://talks.stanford.edu/scien/scien-colloquium-series/
Mentors: Dave Singhal, Harlyn Baker, Peter Kovacs
Projects 2013
Camera Forensics
You are presented with a digital image and asked to determine if it has been manipulated and if so to localize the manipulation in the image. Color filter array (CFA) interpolation generates a tell-tale signature in a digital image that can be used in a forensic setting. CFA interpolation leads to strong correlations between a specific subset of pixels and their spatial and chromatic neighbors. Build a classifier that takes as input a digital image and automatically detects which parts of an image do and do not exhibit the expected CFA correlations. Begin by generating a synthetic set of test images that have undergone your choice of CFA interpolation. Test your forensic analysis on these uncompressed images and then quantify the efficacy of your approach on increasingly more JPEG compressed images. Disputes often erupt over the provenance of photos. Consider how you might use your new forensic technique to distinguish between images taken from different types of cameras (e.g., a Canon PowerShot vs. a Nikon D-series).
References
- A Survey of Image Forgery Detection
- Exposing Digital Forgeries in Color Filter Array Interpolated Images
Tasks
- We provide you with training images
- You develop the classifier based on the papers
- We provide you with test images to see how you did
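As a starting point for the CFA correlation check described above, here is a toy sketch that synthesizes the expected correlation on a stock test image; a real detector would work block-wise and on all three channels.

<pre>
% Bilinear CFA interpolation leaves interpolated green pixels exactly predictable
% from their four measured neighbors; tampered or re-sampled regions break this.
I = im2double(imread('peppers.png'));           % any RGB test image

[h, w, ~] = size(I);
greenMask = false(h, w);                        % RGGB layout: G sites form a quincunx
greenMask(1:2:end, 2:2:end) = true;
greenMask(2:2:end, 1:2:end) = true;

G      = I(:,:,2) .* greenMask;                 % simulated raw green samples
kernel = [0 1 0; 1 0 1; 0 1 0] / 4;             % bilinear estimate from 4 neighbors
Gest   = conv2(G, kernel, 'same');
Ginterp = G;  Ginterp(~greenMask) = Gest(~greenMask);   % simulated demosaicked green

pred     = conv2(Ginterp, kernel, 'same');
residual = abs(Ginterp - pred);
fprintf('mean residual at interpolated sites: %.2e\n', mean(residual(~greenMask)));
fprintf('mean residual at measured sites:     %.2e\n', mean(residual(greenMask)));
</pre>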
Image Forensics
You are presented with a JPEG image and asked to determine if it originated directly from a camera/mobile device, or if it was re-saved one or more times. Multiple compressions at different compression levels leave behind specific statistical artifacts in the distribution of DCT coefficients. These artifacts can be used to distinguish between singly and multiply compressed images. Build a classifier that can distinguish between singly and doubly compressed images (assume that the second compression level is different than the first). Validate your classifier on a large data set of images. Quantify the conditions under which the classifier is effective and not. Extend your classifier to distinguish between one, two, and three compressions. The expert forger becomes aware of your forensic technique and writes a special purpose encoder that will re-save a JPEG image with the same compression quality as the original. Consider how you might counter this by detecting multiple compressions made with the same compression setting.
References
Tasks
- We provide you with training images
- You develop the classifier based on the papers
- We provide you with test images to see how you did
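A starting-point sketch for inspecting the DCT statistics is below. For a real forensic tool you would extract the quantized coefficients directly from the JPEG bitstream rather than re-decoding the image; the file name here is a placeholder.

<pre>
% Histogram of one low-frequency DCT coefficient over all 8x8 blocks; double
% compression typically produces periodic peaks and gaps in such histograms.
I = im2double(rgb2gray(imread('test.jpg')));     % image under analysis (placeholder)
dctBlocks = blockproc(I - 0.5, [8 8], @(b) dct2(b.data));

coef = dctBlocks(2:8:end, 2:8:end);              % the (2,2) AC coefficient of each block
histogram(round(255 * coef(:)), -50:50);         % coarse re-quantization for display
xlabel('DCT (2,2) coefficient'); ylabel('count');
title('Look for periodic artifacts that indicate recompression');
</pre>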
Turbulence removal
X. Zhu and P. Milanfar, "Removing Atmospheric Turbulence via Space-Invariant Deconvolution" IEEE Trans. on Pattern Analysis and Machine Intelligence vol. 35, no. 1, pp. 157-170, Jan. 2013
Also see related talk and Project page
Options
- You obtain by measurement or simulation example images and then use their methods.
- You develop a variant of their method, exploring deconvolution, registration, or some other part of the algorithm more deeply than in the original paper.
- You find another approach and compare that approach to this one.
Photon calculator utility (ISET)
Build a program, perhaps based on the ISET library, that calculates the spectral irradiance at the sensor from the scene radiance and a specification of the optics. Doing this for diffraction-limited optics, specifying only the f/#, is sufficient.
The utility should be backed by a wiki page that illustrates all of the steps in doing that calculation. This project should produce an educational and useful calculator.
- Doing an implementation that can run on a browser on the Internet is best.
- Doing a straight Matlab implementation with a nice GUI is also good.
- Implementing the ISET (Matlab) routines as a Python calculator has value, as well.
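Whatever the implementation, the core step is the camera equation for diffraction-limited (or any well-corrected) optics. A minimal sketch, with placeholder radiance values in photon units:

<pre>
% Sensor-plane spectral irradiance from scene spectral radiance and the f-number:
% E(lambda) = pi * L(lambda) / (4 * N^2 * (1 + |m|)^2), ignoring lens transmission
% and off-axis falloff.
wave     = 400:10:700;                       % nm
radiance = 1e16 * ones(size(wave));          % photons/s/sr/nm/m^2 (placeholder values)
fnumber  = 4;                                % f/#
m        = 0;                                % magnification, ~0 for distant scenes

irradiance = pi * radiance / (4 * fnumber^2 * (1 + abs(m))^2);   % photons/s/nm/m^2

plot(wave, irradiance);
xlabel('Wavelength (nm)'); ylabel('Irradiance (photons/s/nm/m^2)');
</pre>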
Updating Wikipedia
Help us make Wikipedia better. There are surprisingly many Wikipedia entries on imaging and human vision that are just a few sentences long. Look up, for example, 'Troland', 'Stiles-Crawford effect', 'Photopic vision', 'Human PSF' or 'Active Pixel Sensor' to see how poor these entries are. Your mission, should you choose to accept it, is to improve these (or other) entries. Think of your work as a paper that is published online rather than in .pdf format. Of course, just as with writing any research paper, your work should start with a thorough literature review; select the relevant pieces of information and write them up in a way approachable to a non-expert in the field.
Neuroimaging (special approval)
With the opening of Stanford's Center for Cognitive and Neurobiological Imaging (CNI), we now have access to a large number of MR scans of the human brain. We are also closely connected to the MR hardware and image processing algorithms.
While this course is not specifically about neuroimaging, some of the methods in the course might be usefully applied to the data collected at the CNI. For students already working in MR and interested in such signal processing, we might be able to develop some projects that build on your interest.
Two possible projects are algorithms to:
- Identify when two MR images are of the same brain (brainprint), even if they were acquired using different contrasts.
- Evaluate image quality and MR artifacts
Scene database for computer vision testing (special approval)
Build scenes, say using Blender and PBRT, that we can run through the ISET simulation to produce images. Then analyze these calibrated scenes using computer vision algorithms to derive the depth, illumination, and shading. See this example page for folks who created a database from real, rather than simulated, scenes.
Color balancing (special approval)
Color balancing refers to the process of converting camera rgb data into display rgb values. If one simply copies the sensor pixel values into the display values, the resulting image will not generally be a good color representation of the original scene. An important step in the image processing pipeline is to transform camera rgb values to display values such that the display image appears to match the original scene that was captured.
A simple and common approach to color balancing is to make an educated guess about the scene illumination based on an analysis of the camera rgb values. The estimated illuminant is used to select a color transform (typically a 3x3 transform or a look-up table) that maps camera rgb values into human sensor (xyz) values for an ideal illuminant, such as daylight. The goal of this transform is to render the scene that the camera captured as if the scene were illuminated by daylight.
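A minimal sketch of this simple approach, using the gray-world assumption and a diagonal (von Kries style) correction, is shown below; the input file name is a placeholder, and a full pipeline would apply a 3x3 transform as described above.

<pre>
% Gray-world illuminant estimate followed by per-channel gain correction.
raw = im2double(imread('camera_raw_rgb.png'));    % demosaicked camera RGB (placeholder)

illumEstimate = squeeze(mean(mean(raw, 1), 2));   % 3x1 channel means
gains         = mean(illumEstimate) ./ illumEstimate;

balanced = raw;
for c = 1:3
    balanced(:,:,c) = raw(:,:,c) * gains(c);
end
imshow(min(balanced, 1));
% The next step would map the balanced camera RGB to display (or XYZ) values for a
% chosen rendering illuminant such as D65.
</pre>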
Most camera processing pipelines use a standard illuminant called D65 as the ideal rendering illuminant. As far as we know, no one has tested the assumption that people prefer to view objects illuminated by D65. The preferred rendering illuminant may also depend on the objects that are being rendered.
The project will use hyperspectral data of faces, fruit and vegetables and outdoor scenes, and spectral power distributions of different illuminants to generate images that people will view on calibrated displays. People will be asked to indicate which color renderings they prefer. In this way, we will collect preference data about preferred rendering illuminants. The preference data will provide a useful guide for engineers who are designing color balancing methods.
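As a concrete starting point, here is a minimal Python sketch of the simplest version of this pipeline: a gray-world illuminant estimate followed by a diagonal (von Kries-style) correction. The identity camera-to-display matrix and the toy image are placeholders; a real pipeline would use a calibrated 3x3 transform for the chosen rendering illuminant and a better illuminant estimator.
<pre>
import numpy as np

def gray_world_illuminant(rgb):
    """Estimate the illuminant as the mean camera response of the image.
    rgb: (H, W, 3) array of linear camera values."""
    return rgb.reshape(-1, 3).mean(axis=0)

def white_balance(rgb, illuminant):
    """Scale each channel so the estimated illuminant maps to neutral."""
    gains = illuminant.mean() / illuminant
    return rgb * gains

# Placeholder camera-RGB -> display-RGB matrix; a real one is calibrated
# per sensor and per rendering illuminant (e.g., D65).
CAMERA_TO_DISPLAY = np.eye(3)

def simple_color_balance(rgb):
    balanced = white_balance(rgb, gray_world_illuminant(rgb))
    return np.clip(balanced @ CAMERA_TO_DISPLAY.T, 0, 1)

# Toy image with a reddish cast that gray-world should largely remove.
rng = np.random.default_rng(1)
img = rng.random((4, 4, 3)) * np.array([1.0, 0.7, 0.5])
print(simple_color_balance(img).mean(axis=(0, 1)))
</pre>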
Hyperspectral video (special approval)
Help us build and evaluate a hyperspectral video system based on LED lights synced with a monochrome video camera. Capture hyperspectral video images of human faces and estimate pulse rate from the change in color sensor values over time (see http://people.csail.mit.edu/mrub/papers/vidmag.pdf).
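A rough Python sketch of the pulse-rate step is below: average a face region in each frame, restrict to plausible heart-rate frequencies, and take the dominant FFT peak. The frame rate, the 0.7-3 Hz band, and the synthetic signal standing in for camera data are all assumptions.
<pre>
import numpy as np

def estimate_pulse_bpm(mean_signal, fps, low_hz=0.7, high_hz=3.0):
    """mean_signal: 1-D array of per-frame mean pixel values over the face ROI."""
    signal = mean_signal - mean_signal.mean()
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(signal))
    band = (freqs >= low_hz) & (freqs <= high_hz)   # roughly 42-180 beats/min
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz

# Synthetic 30 fps recording: a 1.2 Hz (72 bpm) pulse buried in noise.
rng = np.random.default_rng(0)
fps, seconds = 30, 20
t = np.arange(fps * seconds) / fps
roi_means = 0.5 + 0.01 * np.sin(2 * np.pi * 1.2 * t) + 0.005 * rng.standard_normal(t.size)
print(estimate_pulse_bpm(roi_means, fps))   # should be close to 72
</pre>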
Biology of the mouse eye image formation (special approval)
A huge amount of biology is done in the mouse, and there is a movement to study the mouse retina in particular. To interpret those measurements, we would like to understand how the cornea and lens of the mouse eye blur the retinal image.
Adaptive optics to the rescue: Williams and his colleagues analyzed the optical quality of the mouse eye. Specifically, they measured the wavefront aberrations of 20 wild-type mice, and they provide the data in their article.
Optical properties of the mouse eye
Brainard, Hofer and I have written a wavefront toolbox in Matlab that enables us to specify the wavefront aberration and calculate retinal images in ISET. This project is to use our software to reproduce Figures 10 and 11 from the paper.
You can do this! If you do, many people will cite your project, because a great many researchers work on the mouse.
Active LED-based illumination (special approval)
These days LEDs can produce high-intensity light with well-defined spectral properties. We are interested in a hardware system that lets us control both the on/off times of a set of LEDs and their intensity using a simple Arduino microcontroller. One way to do this, and we have a working prototype (refer to this project), is to use pulse-width-modulated signals to control the duty cycle of each LED: if you switch at a high enough frequency, the rapidly flickering LED is perceived as having a lower or higher intensity. In this project, however, we are interested in controlling the LED intensity more directly, so that even at the micro-time scale the light output is set to a level rather than switched on and off.
Projects 2012
Image processing
Hyperspectral Imaging
Analysis of hyperspectral images of paintings by famous artists
Consumer digital cameras capture electromagnetic energy in three different spectral bands. Multispectral and hyperspectral cameras capture electromagnetic energy in many more spectral bands. We used two different hyperspectral cameras to capture images of several paintings in the Cantor Arts Museum. One of the cameras captures images in 160 different spectral bands ranging between 400 and 1000 nm (visible and near-infrared, or VNIR). The other camera captures images in 256 different spectral bands ranging between 1000 and 2500 nm (short-wave infrared, or SWIR). There is a very large literature on hyperspectral imaging of paintings that we will use to guide our analysis of the data we have already collected. (http://www.springerlink.com/content/80342384844k0r21/fulltext.pdf) In particular, we should be able to determine whether there is a drawing beneath the painting (an “underpainting”) and to characterize the paint pigments. This analysis will allow us to determine the history of the painting and assess its originality. We hope that this project will serve as the groundwork for an exhibit at the Cantor Arts Museum. (JEF and TS) Here is a nice website that describes methods used in art forensics (http://www.webexhibits.org/pigments/intro/look.html)
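One simple first analysis is sketched below in Python: because carbon-based drawing materials stay absorbing in the near infrared while many paint layers become more transparent, averaging the NIR bands of the VNIR cube can reveal a drawing beneath the paint. The array layout (rows x columns x bands), the 900 nm cutoff, and the random toy cube are assumptions, not the format of the Cantor data.
<pre>
import numpy as np

def nir_underdrawing_image(cube, wavelengths_nm, cutoff_nm=900):
    """Average all bands beyond cutoff_nm and rescale to [0, 1] for display."""
    nir = cube[:, :, np.asarray(wavelengths_nm) >= cutoff_nm].mean(axis=2)
    return (nir - nir.min()) / (nir.max() - nir.min() + 1e-12)

# Toy cube: 160 bands from 400 to 1000 nm, random data standing in for a scan.
wavelengths = np.linspace(400, 1000, 160)
cube = np.random.rand(64, 64, 160)
print(nir_underdrawing_image(cube, wavelengths).shape)
</pre>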
Analysis of hyperspectral images of live organs during surgery
Several research labs are investigating the advantages of hyperspectral imaging in robotic surgery. This is because hyperspectral cameras can capture a wider range of spectral data, including electromagnetic energy that the human eye cannot see. One of the challenges is how to map information that is normally invisible to surgeons onto visible images that enhance the ability to discriminate between different tissue types in a meaningful way. We have collected hyperspectral images of organs in a live pig during surgery. This project will analyze this data to determine if information in the invisible regions of the electromagnetic spectrum (> 700nm) can be used to enhance the information that surgeons see during an operation. (see http://www.intechopen.com/source/pdfs/9221/InTech-Hyperspectral_imaging_a_new_modality_in_surgery.pdf ) (JEF and TS)
Colorimetric reproduction of human faces
We collected VNIR (160 narrowband spectral images ranging between 400 and 1000 nm) hyperspectral images of human faces, outdoor scenes, still life (fruit) and paintings. The hyperspectral image data can be used to generate a representation of the spectral reflectance of the objects in a scene and the spectral power of the scene illumination. These representations can, in turn, be used as input to the ISET digital camera simulation software. ISET can then be used to predict the output of digital cameras with different color channels. For example, one can simulate a digital camera with three or more color channels, and vary the spectral sensitivities of each of the color channels. One can also vary the spatial distribution of those channels. Finally, one can vary both the demosaicking and color balancing algorithms in the digital camera. This project provides an excellent tutorial on how a digital camera works and gives you the opportunity to develop your own color image processing algorithms. (JEF and TS)
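The first step in any such rendering is converting spectral radiance to CIE XYZ by integrating against the color matching functions, sketched below in Python. The Gaussian curves are crude placeholders for the tabulated CIE 1931 functions (which ISET provides in its data files), and the test spectrum is made up.
<pre>
import numpy as np

wave = np.arange(400, 701, 10)   # nm, a typical VNIR sampling

def gaussian(mu, sigma):
    return np.exp(-0.5 * ((wave - mu) / sigma) ** 2)

# Placeholder color matching functions -- replace with the tabulated CIE data.
cmf = np.stack([gaussian(600, 45) + 0.35 * gaussian(450, 25),   # rough x-bar
                gaussian(555, 45),                               # rough y-bar
                1.8 * gaussian(450, 25)], axis=1)                # rough z-bar

def radiance_to_xyz(radiance, cmf, dwave=10):
    """Integrate spectral radiance against the color matching functions."""
    xyz = cmf.T @ radiance * dwave
    return 683.0 * xyz   # Y in cd/m^2 when radiance is in W/sr/m^2/nm

sample_radiance = 0.01 * gaussian(550, 80)   # a greenish test spectrum
print(radiance_to_xyz(sample_radiance, cmf))
</pre>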
References and Web Links
Novel detectors for RGB and NIR
Using NIR to enhance visible data
Several papers from the Susstrunk lab, plus some others:
http://infoscience.epfl.ch/record/148419/files/81_susstrunk_v5.pdf
http://infoscience.epfl.ch/record/153994/files/de24567-susstrunk.pdf
http://www.comp.nus.edu.sg/~dfanbo/papers/VisualEnhanceHSI_Kim_PR2011July.pdf
http://gitl.sysu.edu.cn/papers/cvpr-2008-zhang.pdf
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5652900&tag=1
NIR Flash
http://www.comp.nus.edu.sg/~zhuoshao/NIRFlash/nirflash_icip2010_high.pdf
Photo retouching metric (Kee and Farid)
A perceptual metric for photo retouching. Eric Kee and Hany Farid, Department of Computer Science, Dartmouth College, Hanover, NH 03755. October 19, 2011 (received for review July 5, 2011).
In recent years, advertisers and magazine editors have been widely criticized for taking digital photo retouching to an extreme. Impossibly thin, tall, and wrinkle- and blemish-free models are routinely splashed onto billboards, advertisements, and magazine covers. The ubiquity of these unrealistic and highly idealized images has been linked to eating disorders and body image dissatisfaction in men, women, and children. In response, several countries have considered legislating the labeling of retouched photos. We describe a quantitative and perceptually meaningful metric of photo retouching. Photographs are rated on the degree to which they have been digitally altered by explicitly modeling and estimating geometric and photometric changes. This metric correlates well with perceptual judgments of photo retouching and can be used to objectively judge by how much a retouched photo has strayed from reality.
- Watch a video of the SCIEN talk by Farid at Stanford, Jan. 31.
- Implement and analyze the algorithm.
- Perform experiments with existing online pictures
- Suggest improvements
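As a toy stand-in for the kind of summary statistics the metric is built from, the Python sketch below computes a per-block geometric displacement (via phase correlation) and a per-block photometric difference. The block size, the phase-correlation estimator, and the synthetic image pair are our choices; the published metric models local geometric and photometric changes more carefully and maps its summary statistics to perceptual ratings with supervised learning.
<pre>
import numpy as np

def block_phase_shift(a, b):
    """Estimate the translation between two same-size blocks by phase correlation."""
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts into the range [-size/2, size/2)
    dy = dy - a.shape[0] if dy > a.shape[0] // 2 else dy
    dx = dx - a.shape[1] if dx > a.shape[1] // 2 else dx
    return float(np.hypot(dy, dx))

def retouching_summary(before, after, block=32):
    """Return (mean geometric displacement, mean photometric difference)."""
    geo, photo = [], []
    for i in range(0, before.shape[0] - block + 1, block):
        for j in range(0, before.shape[1] - block + 1, block):
            a = before[i:i + block, j:j + block]
            b = after[i:i + block, j:j + block]
            geo.append(block_phase_shift(a, b))
            photo.append(np.mean(np.abs(a - b)))
    return np.mean(geo), np.mean(photo)

# Synthetic "retouching": shift the image 2 pixels and re-tone it slightly.
before = np.random.rand(128, 128)
after = np.roll(before, 2, axis=1) * 0.95 + 0.02
print(retouching_summary(before, after))
</pre>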
Visibility of movie subtitles
A persistent problem in watching foreign movies is that the subtitles are sometimes illegible. Why? Because the assumed contrast between the text and its background is wrong, leaving white characters on a light background. Presumably the rendering is automated because it is too expensive to have people judge frame by frame whether the text is visible. Need I say more? An automated system that assessed the brightness of the region where subtitles are printed and then adjusted the text contrast to keep it legible would be a huge improvement for the industry.
E. Markman, a committed viewer of subtitled films.
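A minimal Python sketch of such a check is below: measure the mean luminance of the band where subtitles are drawn and pick a text style that keeps the contrast high. The band location, the luminance weights, and the 4.5:1 contrast target (borrowed from accessibility guidelines) are assumptions.
<pre>
import numpy as np

def relative_luminance(rgb):
    """Approximate relative luminance of linear RGB values in [0, 1]."""
    return rgb @ np.array([0.2126, 0.7152, 0.0722])

def contrast_ratio(l1, l2):
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

def choose_subtitle_style(frame, band_fraction=0.2, min_contrast=4.5):
    """frame: (H, W, 3) linear RGB.  Returns 'white', 'black', or 'outlined'."""
    band = frame[int(frame.shape[0] * (1 - band_fraction)):, :, :]
    background = relative_luminance(band).mean()
    if contrast_ratio(1.0, background) >= min_contrast:
        return "white"
    if contrast_ratio(0.0, background) >= min_contrast:
        return "black"
    return "outlined"   # fall back to outlined text or a backing box

bright_frame = np.full((480, 720, 3), 0.9)   # snow scene: white text would vanish
print(choose_subtitle_style(bright_frame))   # prints "black"
</pre>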
Image Quality
3D Image Quality Metrics
Develop algorithms for shooting in 3D and displaying in 2D. Explore ways to improve the 2D rendering of 3D content in order to enhance “immersive video”.
Optics
Wavefront Toolbox
(BW)
Advances in adaptive optics now make it possible to measure the wavefront aberrations of the living human eye. Many groups are making these measurements in both control subjects and subjects with different types of optical dysfunctions.
These aberrations are usually specified in a way that is difficult to apply to image processing: The aberrations are specified as the weights on a set of Zernike polynomials. It is a simple matter of programming to convert these polynomial weights to a point spread function that can be applied in image processing algorithms.
We have received software from experts on this topic that implements the conversion. We can probably obtain a large number of samples of measurements from different categories of human eyes. In this project, we would create a website to convert the Zernike polynomials to point spread functions and illustrate how those point spread functions would influence the quality of the optical image falling on the retina.
As we accumulate additional summaries of the human measurements, we might look for statistical patterns that can be explained in terms of the biological properties of the human cornea and lens.
See:
Florent Autrusseau, Larry Thibos, and Steven K. Shevell (2011). Chromatic and wavefront aberrations: L-, M- and S-cone stimulation with typical and extreme retinal image quality. Vision Research 51, 2282–2294.
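To show how small the core computation is, here is a bare-bones Python sketch that builds a pupil function from a single Zernike term (defocus) and takes the squared magnitude of its Fourier transform to get a PSF. The wavelength, sampling, and normalization are arbitrary, and mapping PSF samples to visual angle (which requires the pupil size and the eye's focal length) is omitted; the actual wavefront toolbox handles the full coefficient set and proper units.
<pre>
import numpy as np

def defocus_psf(z_defocus_microns, wavelength_nm=550, n_samples=256):
    """Return a normalized PSF for a pupil with the given defocus coefficient."""
    x = np.linspace(-1, 1, n_samples)
    xx, yy = np.meshgrid(x, x)
    rho = np.hypot(xx, yy)
    pupil_mask = rho <= 1.0
    # Zernike defocus term Z(2,0) = sqrt(3) * (2*rho^2 - 1), coefficient in microns
    wavefront_microns = z_defocus_microns * np.sqrt(3) * (2 * rho**2 - 1)
    phase = 2 * np.pi * wavefront_microns * 1e3 / wavelength_nm   # microns -> nm
    pupil = pupil_mask * np.exp(1j * phase)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(pupil)))**2
    return psf / psf.sum()

psf = defocus_psf(z_defocus_microns=0.25)
print(psf.shape, psf.max())
</pre>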
Integrating 3D Distributed Ray Tracing and Image Quality
(BW), (AL), (JEF)
- PBRT
- Radiance
- RenderToolbox
Implement and test Nayar Generalized Patent
Read the patent and implement tests of the idea.
Reference:
Neuroimaging
(AT), (AM), (RFD), (GS)
With the opening of Stanford's Center for Cognitive and Neurobiological Imaging (CNI), we now have access to a large number of MR scans of the human brain. We are also closely connected to the MR hardware and image processing algorithms.
While this course is not specifically about neuroimaging, some of the methods in the course might be usefully applied to the data collected at the CNI. For students already working in MR and interested in such signal processing, we might be able to develop some projects that build on your interest.
- Intelligent compression algorithm for multi-channel image data stored in frequency space (p-file compression)
- Algorithm to classify volumes that contain brains in a database of MR images that includes phantoms, squash, fruits, etc. (brain detector)
- Algorithm to identify when two MR images are of the same brain (brainprint), even if they were acquired using different contrasts.
- We can also do another one on MR artifact detection (so many artifacts, so few projects...)
Suggestions and projects from previous years
To see projects from previous years, visit the SCIEN Class Projects Page.