Psych221 Project Suggestions

From Psych 221 Image Systems Engineering

Revision as of 04:17, 22 October 2019

Below we list project suggestions for Psych 221. We update this page regularly with ideas for projects.

  • We describe how you should create the write-up on the Project Guidelines page.
  • More than one person or group can work on the same project.
  • Just after mid-terms you will be asked to turn in a short paragraph proposing your project.
  • We want to make sure you are the right person for your proposed project.
  • If you want to work on a project that is not listed, for example one that would help your research, ask us.

There are links to earlier Psych 221 projects.

Projects Fall 2019

ISETCam whole system validation

Mentors: Brian, Joyce and Zheng

Set up a physical 3D scene based on the Cornell box. Use the spectrophotometer to measure the radiance inside the Ronnie Luo gray box illuminated by the diffuse light source. Calibrate a lens and a camera that provide raw sensor data. Compare the ISETCam simulation's prediction with the measured raw sensor data. We could use a calibrated camera (e.g., the Nikon). A new feature of this project is lens calibration.

Oral health camera design

Mentors: Joyce and Zheng

We are acquiring data about the fluorescence arising from different parts of the human mouth. We measure fluorescence by illuminating the mouth using a short wavelength (blue) light, and then making spectral photometric measurements of the light emitted from different locations. Even though the illuminant contains only, say, 400 nm light, the emitted light contains energy at 450 to 600 nm. These wavelengths are the fluorescent, rather than reflectance, signal.
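One simple way to picture the distinction, assuming the illuminant has no energy outside a known narrow excitation band, is to attribute in-band energy to reflectance and out-of-band energy to fluorescence. The function name, band limits, and spectrum below are hypothetical. This is not the ISETCam fluorescence software used in the project, and it ignores any fluorescence emitted inside the excitation band.

```python
import numpy as np


def split_emission(wavelengths, measured, excitation_band):
    """Split a measured emission spectrum into reflected and fluorescent parts.

    Assumes the illuminant has energy only inside excitation_band = (lo, hi) nm,
    so any energy measured outside that band is attributed to fluorescence.
    Fluorescence emitted inside the excitation band is ignored by this sketch.
    """
    wl = np.asarray(wavelengths, dtype=float)
    spectrum = np.asarray(measured, dtype=float)
    lo, hi = excitation_band
    in_band = (wl >= lo) & (wl <= hi)
    reflected = np.where(in_band, spectrum, 0.0)
    fluorescent = np.where(in_band, 0.0, spectrum)
    return reflected, fluorescent


# Hypothetical measurement: 10 nm sampling, 400 nm excitation, 450-600 nm emission
wl = np.arange(380, 701, 10)
measured = np.zeros(wl.size)
measured[wl == 400] = 1.0
measured[(wl >= 450) & (wl <= 600)] = 0.05
reflected, fluorescent = split_emission(wl, measured, (390, 410))
print(reflected.sum(), fluorescent.sum())
```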

In this project you will help us acquire more data, annotate it, and place it in a database. The project will also involve using software we have recently added to ISETCam to separate the measured light into its fluorescence and reflectance components. We will also try to use statistical methods to characterize differences between people and between measurements made from different parts of the mouth. For example, can principal components analysis and k-means clustering help us understand the data?
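A minimal sketch of that statistical analysis: principal components computed with an SVD, followed by k-means on the component scores. The spectra are synthetic stand-ins for mouth measurements, and the deterministic farthest-point initialization is a choice made here so the example is repeatable; a library such as scikit-learn would be a natural alternative.

```python
import numpy as np


def pca(spectra, n_components):
    """Mean spectrum, leading principal components, and per-sample scores (via SVD)."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]                 # (k, n_wavelengths)
    scores = (X - mean) @ components.T             # (n_samples, k)
    return mean, components, scores


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    P = np.asarray(points, dtype=float)
    centers = [P[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(P - c, axis=1) for c in centers], axis=0)
        centers.append(P[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new_centers = np.array([P[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two groups of emission spectra peaking at 500 nm and 550 nm
rng = np.random.default_rng(1)
wl = np.arange(450, 601, 5)
group_a = [np.exp(-0.5 * ((wl - 500) / 20.0) ** 2) for _ in range(10)]
group_b = [np.exp(-0.5 * ((wl - 550) / 20.0) ** 2) for _ in range(10)]
spectra = np.vstack(group_a + group_b) + 0.01 * rng.standard_normal((20, wl.size))
mean, components, scores = pca(spectra, n_components=2)
labels, _ = kmeans(scores, k=2)
print(labels)
```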

Basic project: Collect spectral photometric measurements, interact with the database, and perform the fluorescence estimation. You will learn how to (a) use a spectrophotometer, (b) work with human participants to obtain measurements, and (c) design and record data for a reproducible experiment.

A more advanced aspect of this project: A spectrophotometer is an expensive and challenging instrument to use. Moreover, it does not acquire a full image; it only measures the light from a small part of the scene. It would be desirable to build a camera that measures an entire image and to estimate the fluorescence from such an image. If we know a lot about the reflected light (see the Basic project), we might be able to design a calibrated camera that acquires enough data to estimate the fluorescence throughout an entire image. The goal of this project is to use simulation to design the light level, the spectral properties of the light and the camera, and the image processing software to embed in such a camera.
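If the fluorescence spectra are well described by a few basis functions, a camera with at least as many channels as basis functions can, in the noise-free case, recover the spectrum by linear least squares. The sketch below assumes hypothetical Gaussian basis functions and channel sensitivities; a real design would use measured spectra and would have to account for sensor noise.

```python
import numpy as np


def estimate_spectrum(responses, sensitivities, basis):
    """Estimate a spectrum from camera responses with a low-dimensional linear model.

    responses:     (m,) channel values, modeled as sensitivities.T @ spectrum
    sensitivities: (n_wl, m) channel spectral sensitivities
    basis:         (n_wl, k) basis functions, spectrum ~= basis @ weights, k <= m
    """
    A = sensitivities.T @ basis                    # (m, k): weights -> responses
    weights, *_ = np.linalg.lstsq(A, responses, rcond=None)
    return basis @ weights


def gaussian(wl, peak, width):
    return np.exp(-0.5 * ((wl - peak) / width) ** 2)


# Hypothetical two-function fluorescence basis and four-channel camera
wl = np.arange(450, 601, 10).astype(float)
basis = np.stack([gaussian(wl, 500, 25), gaussian(wl, 560, 30)], axis=1)
sensitivities = np.stack([gaussian(wl, p, 25) for p in (470, 510, 550, 590)], axis=1)
true_spectrum = basis @ np.array([1.0, 0.5])
responses = sensitivities.T @ true_spectrum        # noise-free simulated camera data
estimate = estimate_spectrum(responses, sensitivities, basis)
print(np.max(np.abs(estimate - true_spectrum)))
```

With noise added to `responses`, the same code gives a least-squares estimate, and the choice of channel sensitivities then determines how much that noise is amplified.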

Image alignment

Mentors: Brian

Basic project: Evaluate the accuracy of different image alignment algorithms. Use test images rendered with the ISET3d software, for which the ground truth is known.
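Phase correlation is one candidate algorithm that could be included in such a comparison. This sketch handles only integer, circular translations, and it uses a synthetic shift in place of an ISET3d rendering so that the ground truth is known.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the integer (row, col) circular shift that maps reference onto moved."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12                 # keep only the phase
    surface = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    # Peaks past the midpoint correspond to negative shifts
    return tuple(int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, surface.shape))


# Ground truth is known because we apply the shift ourselves
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moved = np.roll(reference, shift=(3, -5), axis=(0, 1))
print(phase_correlation_shift(reference, moved))
```

Sub-pixel shifts, rotations, and noise would need extensions (for example, interpolating around the correlation peak), which is where differences between algorithms would start to show.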

Projects Fall 2018

Oral health camera design

Mentors: Joyce and Zhenyi

We are acquiring data about the fluorescence arising from different parts of the human mouth. We measure fluorescence by illuminating the mouth using a short wavelength (blue) light, and then making spectral photometric measurements of the light emitted from different locations. Even though the illuminant contains only, say, 400 nm light, the emitted light contains energy at 450 to 600 nm. These wavelengths are the fluorescent, rather than reflectance, signal.

In this project you will help us acquire more data, annotate it, and place it in a database. The project will also involve using software we have recently added to ISETCam to separate the measured light into its fluorescence and reflectance components. We will also try to use statistical methods to characterize differences between people and between measurements made from different parts of the mouth. For example, can principal components analysis and k-means clustering help us understand the data?

Basic project: Collect spectral photometric measurements, interact with the database, and perform the fluorescence estimation. You will learn how to (a) use a spectrophotometer, (b) work with human participants to obtain measurements, and (c) design and record data for a reproducible experiment.

A more advanced aspect of this project: A spectrophotometer is an expensive and challenging instrument to use. Moreover, it does not acquire a full image; it only measures the light from a small part of the scene. It would be desirable to build a camera that measures an entire image and to estimate the fluorescence from such an image. If we know a lot about the reflected light (see the Basic project), we might be able to design a calibrated camera that acquires enough data to estimate the fluorescence throughout an entire image. The goal of this project is to use simulation to design the light level, the spectral properties of the light and the camera, and the image processing software to embed in such a camera.

Computer graphics asset creation and rendering

Mentors: Zhenyi and Trisha

A great deal is known about the illuminants and reflectance spectra of typical objects within the visible range. Much of this knowledge was obtained because people needed it to design effective consumer cameras. With the growing use of cameras for machine vision applications, it is becoming more valuable to learn about reflectance and illumination beyond the visible wavelengths, extending to the band gap of CMOS imagers (about 1000 nm). These data could guide a range of automotive and drone applications.

Basic project: Use a spectrophotometer to collect spectral reflectance samples of objects across the wavelength range up to 950 nm or 1000 nm. This project involves creating a methodology for (a) acquiring images, (b) acquiring spectral data from identified image locations, and (c) measuring the illumination and reflectance at these locations. We then need a method for storing and retrieving the data using our database.

Advanced part I: Search the web for existing databases with material reflectance that extends into the long-wavelength (near infrared) regions. Create models of the spectra using principal components methods, k-means clustering algorithms, or other data science tools.

Advanced part II: Create computer graphics renderings of the optical image of driving scenes using reflectance data and a camera that is specified all the way into the near infrared.
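For a camera specified into the near infrared, each channel's expected response comes down to integrating illuminant × reflectance × channel sensitivity over wavelength. The box-shaped sensitivities, flat illuminant, and step-shaped reflectance below are hypothetical placeholders for measured data.

```python
import numpy as np


def sensor_responses(wavelengths, illuminant, reflectance, qe):
    """Channel responses as a Riemann sum of E(λ) R(λ) Q_c(λ) Δλ.

    Assumes uniformly spaced wavelength samples; qe has shape (n_wl, n_channels).
    """
    wl = np.asarray(wavelengths, dtype=float)
    step = wl[1] - wl[0]
    radiance = np.asarray(illuminant, dtype=float) * np.asarray(reflectance, dtype=float)
    return radiance @ np.asarray(qe, dtype=float) * step


wl = np.arange(400, 1001, 10)
# Hypothetical box-shaped sensitivities: blue, green, red, and a near-infrared channel
bands = [(400, 500), (500, 600), (600, 700), (700, 1000)]
qe = np.stack([(wl >= lo) & (wl < hi) for lo, hi in bands], axis=1).astype(float)
illuminant = np.ones(wl.size)                      # hypothetical flat illuminant
reflectance = np.where(wl < 700, 0.1, 0.6)         # hypothetical low-visible, high-NIR surface
print(sensor_responses(wl, illuminant, reflectance, qe))
```

A surface like this looks dark to the visible channels but bright to the NIR channel, which is the kind of difference the renderings would need to capture.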

Exploring rendering algorithms using machine-learning (e.g., L3)

Mentors: Zheng and Brian

This project would be great for anyone interested in training small neural networks. Using ISETCam, we can create a great many sensor images of approximately natural scenes. We are interested in creating these sensor images using different types of sensors (e.g., standard Bayer and a Bayer with a white pixel rather than two green pixels).

For this project, we would like you to train a neural network that converts the data from one type of sensor into the data from another.

Basic project: We will provide you with a set of ISETCam scenes to use for this project. You can use ISETCam methods to predict the sensor responses of, say, a simple RGB Bayer camera. Then use the same methods to calculate the predicted responses of a modified version of that camera. A first modification would be to double the spatial resolution of the camera. Because the images are simulated, they will be pixel-wise aligned: a 4x4 region of the lower-resolution camera corresponds to an 8x8 region of the higher-resolution camera. You can use a tool (e.g., PyTorch or TensorFlow) to find a mapping from the low-resolution to the high-resolution image. Different methods for designing and building the network, such as autoencoder methods [1], might be applied.
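Before training a network, a linear least-squares mapping from each 4x4 low-resolution patch to its 8x8 high-resolution patch gives a useful baseline. This sketch is a linear model, not a neural network. It assumes monochrome images, and the 2x2-averaging "low-resolution camera" and random test image are hypothetical stand-ins for ISETCam sensor simulations.

```python
import numpy as np


def downsample2(img):
    """Hypothetical low-resolution camera: average each 2x2 block of the high-res image."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def paired_patches(low, high, p=4):
    """Aligned pairs: each p x p low-res patch and its 2p x 2p high-res patch."""
    X, Y = [], []
    for i in range(0, low.shape[0] - p + 1, p):
        for j in range(0, low.shape[1] - p + 1, p):
            X.append(low[i:i + p, j:j + p].ravel())
            Y.append(high[2 * i:2 * i + 2 * p, 2 * j:2 * j + 2 * p].ravel())
    return np.array(X), np.array(Y)


def fit_linear_map(X, Y):
    """Least-squares linear map (with bias) from low-res to high-res patch vectors."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W


def predict(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W


rng = np.random.default_rng(0)
high = rng.random((128, 128))          # stand-in for a simulated high-res sensor image
low = downsample2(high)
X, Y = paired_patches(low, high)
W = fit_linear_map(X, Y)
# Nearest-neighbor upsampling as a trivial comparison
nearest = np.array([np.kron(x.reshape(4, 4), np.ones((2, 2))).ravel() for x in X])
print(np.mean((predict(W, X) - Y) ** 2), np.mean((nearest - Y) ** 2))
```

A trained network should at least beat this baseline on held-out scenes. The random image here has no spatial structure, so the numbers only illustrate the mechanics.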

Many versions of this project might be tried, such as predicting a monochrome sensor response from an RGB response or, if you dare, an RGB response from a monochrome response for a restricted environment (e.g., fruits). Other options include predicting the sensor responses under daylight illumination from the responses under tungsten illumination, predicting the responses to an image without camera shake from the responses to an image with camera shake, or predicting the responses at high illumination (bright light, 15 ms exposure) from a capture at low illumination (dim light, 15 ms exposure).

Spatial CIELAB vs ISETBio and only front-end physiological optics

Mentors: Trisha and Brian

CIELAB Delta E is a color difference metric that predicts how different two colors appear to a human observer. Although widely used, the CIELAB metric is only suitable for measuring the color difference between large, uniform color targets. The spatial CIELAB metric was therefore created to extend the Delta E metric from uniform patches to color images. This extension is necessary because color discrimination and appearance depend on the spatial pattern, so spatial CIELAB was designed to take into account the spatial-color sensitivity of the human eye.
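For reference, the uniform-patch CIELAB computation and the CIE 1976 Delta E are short. The D65 white point is an assumption (use the white of the display being modeled), and spatial CIELAB adds spatial filtering of the image before this step, which the sketch does not include.

```python
import numpy as np

# CIE D65 reference white (Y normalized to 100); an assumption, use your display's white
D65_WHITE = np.array([95.047, 100.0, 108.883])


def xyz_to_lab(xyz, white=D65_WHITE):
    """Convert CIE XYZ to CIELAB (L*, a*, b*)."""
    t = np.asarray(xyz, dtype=float) / white
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def delta_e76(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in CIELAB."""
    return np.linalg.norm(np.asarray(lab1) - np.asarray(lab2), axis=-1)


white_lab = xyz_to_lab(D65_WHITE)                 # L* = 100, a* = b* = 0
gray_lab = xyz_to_lab(0.1842 * D65_WHITE)         # a neutral gray with L* close to 50
print(white_lab, gray_lab, delta_e76(white_lab, gray_lab))
```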

We would like someone to use ISETBio to compute L,M,S receptor responses to images. We would then calculate ISETBio-CIELAB differences based on these L,M,S values and compare them with Spatial CIELAB differences computed directly from the display screen. Since the L,M,S values calculated through the human optics should already account for part of the spatial-color sensitivity of the human eye, we are interested in the similarities and differences between these two calculations. The critical aspect of this project is designing test targets.

You will learn: How to use ISETBio to calculate L,M,S values and how to calculate CIELAB values and Spatial CIELAB values.

Advanced Project: Add eye movements to the calculation and base the metric on the mean response across eye movements.

Human Optics as a function of eccentricity

Mentors: Trisha

This project would be great for someone who is interested in optics and optical modeling software.

ISET3d is an extension of ISETCam that allows users to simulate 3D scenes and realistic lens prescriptions using ray-tracing and computer graphics. Using ISET3d, we have the ability to simulate a physiological model of human optics, which allows us to predict the optical image after a 3D scene is passed through the optics of the human eye. However, there are many different models of the human eye, which all differ in detail and complexity.

We would be interested in using either ISET3d or other optical modeling software to quantify the off-axis (e.g., wide-angle) performance of several human eye models. We can do this by calculating optical images at different angles away from the center of the retina and quantifying the modulation transfer function or point spread function at each of these locations. We can then compare the models' performance with known values in the literature.
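Once a PSF has been computed at a given eccentricity, the MTF follows from the magnitude of its Fourier transform, normalized to 1 at zero frequency. The Gaussian PSF and the 1 µm sample spacing below are hypothetical stand-ins for an ISET3d rendering.

```python
import numpy as np


def mtf_from_psf(psf):
    """2-D MTF: magnitude of the PSF's Fourier transform, normalized to 1 at zero frequency."""
    mtf = np.abs(np.fft.fft2(psf))
    return mtf / mtf[0, 0]


def mtf_slice_x(mtf, sample_spacing_mm):
    """Non-negative horizontal frequencies (cycles/mm) and the matching MTF values."""
    freqs = np.fft.fftfreq(mtf.shape[1], d=sample_spacing_mm)
    keep = freqs >= 0
    return freqs[keep], mtf[0, keep]


# Hypothetical Gaussian PSF (sigma = 2 samples) standing in for a rendered off-axis PSF
n, sigma = 64, 2.0
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
psf = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
psf /= psf.sum()
mtf = mtf_from_psf(psf)
freqs, profile = mtf_slice_x(mtf, sample_spacing_mm=0.001)   # assumed 1 µm sampling
print(freqs[:4], profile[:4])
```

Repeating this for PSFs rendered at several eccentricities gives MTF curves that can be compared across eye models and with published values.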

You will learn: How to use ISET3d to model the optics of the human eye.

Advanced Project: Some physiological models of the eye have accommodation (focusing) modeling as well. In other words, we can change the lens prescription to model the human eye focusing near and far. Can we quantify the difference in these accommodation models?

Projects Fall 2017

Simulation of Cone Responses for Photosensitive Epilepsy

Patients with photosensitive epilepsy can have seizures when exposed to flashing lights. Certain frequencies and colors are highly epileptogenic; in 1997, one Pokémon episode resulted in seizures and hospital visits for over 600 children in Japan. Red flickering lights in particular can be very provocative (Takahashi and Tsukahara 1976; Binnie et al., 1984). It has been hypothesized that red flashes are highly epileptogenic because they stimulate only the red cones (Harding 1998). A current study that investigated the effects of age and gender similarly found that red stimuli are much more likely to induce epileptic activity. Can we simulate how the cone responses differ across the different colored filters?

This project will use a toolbox called ISETBIO to simulate cone responses. ISETBIO is analogous to ISET but specifically simulates the human visual system: from a stimulus, through the optics of the eye, onto the retina and photoreceptors, and eventually into the retinal ganglion cells. We can use ISETBIO (1) to set up a simulation with different colored filters and (2) to analyze cone responses to stimuli that cause epilepsy.

Mentor: Dora Hermes

Speeding up lens simulations in a ray-tracer

Instead of using ISET to simulate the optics of an imaging system, we have the option of using a graphics ray-tracer to trace rays through a full optical lens system. This allows us to model full 3D scenes in our simulations instead of the flat 2D scenes used in ISET.

In our work, we use an open-source ray-tracer called PBRT (Physically Based Ray Tracer) that we've modified to trace rays through a given optical system. We shoot a ray from the camera sensor and use Snell's law to refract the ray at each surface in the lens system. However, this type of ray-tracing can be very slow and would benefit greatly from a speed-up. One possibility is to precompute ray paths and load them at render time.
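The refraction step at each lens surface can be written with the vector form of Snell's law. This is a generic Python sketch, not PBRT's C++ code; the surface normal would come from the ray-surface intersection, which is omitted here.

```python
import numpy as np


def refract(direction, normal, n1, n2):
    """Refract a ray direction at a surface using the vector form of Snell's law.

    Returns the unit transmitted direction, or None for total internal reflection.
    n1 is the refractive index on the incoming side, n2 on the far side.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    cos_i = -np.dot(n, d)
    if cos_i < 0:                                  # make the normal face the incoming ray
        n, cos_i = -n, -cos_i
    eta = n1 / n2
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None                                # total internal reflection
    return eta * d + (eta * cos_i - np.sqrt(1.0 - sin2_t)) * n


theta = np.radians(30.0)
incoming = np.array([np.sin(theta), 0.0, -np.cos(theta)])   # traveling toward -z
t = refract(incoming, [0.0, 0.0, 1.0], n1=1.0, n2=1.5)      # air into glass
print(np.degrees(np.arcsin(t[0])))                           # refraction angle in degrees
```

Because this function is evaluated for every ray at every surface, it is also a natural target when profiling the speed-up ideas above.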

In this project, you will explore various methods to speed up the lens simulation in a ray-tracer. An ideal student would have some working knowledge of C++ and an interest in computer graphics.

Mentor: Trisha Lian

Modeling a cell phone camera pipeline

The good folks at Google wrote a paper describing how they make high quality images on a cell phone camera. The paper is included on our Canvas web-site.

Burst photography for high dynamic range and low-light imaging on mobile cameras. Hasinoff et al., ACM Trans. Graph. Vol. 35, No. 6, Article 192 (2016).

For some of the projects, we can divide the image processing pipeline described in this paper into parts and simulate the expected results using the ISET tools. The critical simulation concerns acquiring many brief exposures, aligning them, and combining them into a high-quality result. Let’s see how far we can get in assessing their burst photography design with software simulation tools.
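The main reason a burst of short exposures helps is noise averaging: merging N aligned frames reduces the noise standard deviation by about √N. This sketch assumes perfectly aligned frames of a static, uniform scene with hypothetical signal and read-noise levels. Hasinoff et al. use tile-based alignment and a robust merge, which are not modeled here.

```python
import numpy as np


def capture_burst(scene_electrons, n_frames, read_noise_e=2.0, seed=0):
    """Simulate n short exposures: Poisson shot noise plus Gaussian read noise."""
    rng = np.random.default_rng(seed)
    shots = rng.poisson(scene_electrons, size=(n_frames,) + scene_electrons.shape)
    return shots + rng.normal(0.0, read_noise_e, size=shots.shape)


def snr(estimate, truth):
    return truth.mean() / np.std(estimate - truth)


scene = np.full((128, 128), 20.0)   # hypothetical dim, uniform patch: 20 electrons per frame
burst = capture_burst(scene, 8)
single_snr = snr(burst[0], scene)
merged_snr = snr(burst.mean(axis=0), scene)
print(f"single frame SNR {single_snr:.1f}, 8-frame mean SNR {merged_snr:.1f}")
```

In a real burst, alignment errors and scene motion limit how close the merged result gets to this ideal, which is what a fuller simulation would quantify.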

Camera properties and machine-learning algorithm performance

There are two schools of thought about image sensors and machine-learning algorithms. One group thinks that the algorithms will work with any type of camera. Another group thinks that changing the camera optics and sensor may affect algorithm performance.

It is likely that the truth is somewhere in between: some optics and sensor changes will affect some types of algorithms. But we are not aware of any systematic studies of how changing camera parameters influences the performance of convolutional neural nets (CNNs).

We can use the ISET tools in this class to simulate images obtained by cameras with optics and sensors. Those of you who are interested or skilled in machine-learning for image classification or object detection can create a project to evaluate how well a CNN trained for one camera will generalize to images obtained from a different camera.

Cell phone camera variation

Problem: There is a lot of interest in testing the image quality of smartphones, made especially relevant by the DxOMark rankings, which are often cited by the press and by phone manufacturers as a measure of the image quality of a particular smartphone model. However, out of necessity, only one or a few units of each phone can be exhaustively tested. That raises the question of how much unit-to-unit variation affects the scores, and whether that variation correlates with sensor model, specific lens, or smartphone brand or price.

Suggested Project: Create a crowd-sourced experiment where volunteers (passers-by?) could take a photo of a test target and send in the result. Then analyze the data to attempt to determine, through some combination of data analysis and machine learning, how much variation there is between multiple samples of a particular model, and whether that varies with brand, price, sensor, optics, or some other potentially surprising factor.

Mentor: David Cardinal

Sensor Calibration and Simulation

ISET makes it possible to predict the output of an imaging sensor, given a set of sensor simulation parameters. The simulation parameters are derived from a few fundamental measurements that characterize the sensor's spectral sensitivity and electrical properties, including dark current, read noise, dark signal non-uniformity (DSNU), and photo response non-uniformity (PRNU). This project will involve making these measurements and deriving the simulation parameters for a camera that is in our lab. Calibration targets, measurement equipment, and software programs will be provided. There are also ISET scripts that describe the measurement methods and calculate the sensor parameters.

  • s_sensorEstimation.m illustrates how to measure the spectral response of a digital camera.
  • s_sensorAnalyzeDarkVoltage illustrates how to measure dark noise.
  • s_sensorPixelReadNoise.m illustrates how to measure pixel read noise.
  • s_sensorSpatialNoiseDSNU.m illustrates how to measure the DSNU of a sensor array.
  • s_sensorSpatialNoisePRNU.m illustrates how to measure PRNU.
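Simplified versions of these estimates are sketched below on synthetic frames from a hypothetical sensor whose parameters are known, so the estimates can be checked. They are not the implementations in the ISET scripts listed above. The simulated frames ignore photon shot noise and dark-current build-up, and the dark-current fit uses made-up mean dark levels.

```python
import numpy as np


def read_noise(bias_frames):
    """Temporal noise: RMS over pixels of each pixel's std across zero-exposure frames."""
    return np.sqrt(np.mean(np.var(bias_frames, axis=0, ddof=1)))


def dsnu(dark_frames):
    """Spatial std of the per-pixel mean dark image (temporal noise averaged down)."""
    return np.std(np.mean(dark_frames, axis=0))


def prnu_percent(flat_frames, dark_frames):
    """Spatial std / mean of the dark-subtracted mean flat-field image, in percent."""
    flat = np.mean(flat_frames, axis=0) - np.mean(dark_frames, axis=0)
    return 100.0 * np.std(flat) / np.mean(flat)


def dark_current(exposure_times_s, mean_dark_levels):
    """Slope of mean dark level versus exposure time (signal units per second)."""
    slope, _ = np.polyfit(exposure_times_s, mean_dark_levels, 1)
    return slope


# Hypothetical sensor with known parameters
rng = np.random.default_rng(0)
shape, n_frames = (64, 64), 200
offset = rng.normal(0.0, 1.0, shape)               # fixed-pattern dark offsets (DSNU = 1)
gain = rng.normal(1.0, 0.01, shape)                # pixel gain variation (PRNU = 1 %)


def capture(signal):
    # Temporal read noise with std 2; photon shot noise is ignored in this sketch
    return signal * gain + offset + rng.normal(0.0, 2.0, (n_frames,) + shape)


darks, flats = capture(0.0), capture(1000.0)
print(read_noise(darks), dsnu(darks), prnu_percent(flats, darks))
print(dark_current([0.1, 0.2, 0.4], [5.3, 5.6, 6.2]))   # hypothetical mean dark levels
```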

You can use the measured and known sensor parameters to predict the RGB camera values for a color calibration target. You can then compare the predicted RGB camera values to the actual RGB camera values of the target, taken from a specific camera.


Mentor: Joyce Farrell

Geometric calibration of a stereo camera

There are several online videos and software packages that describe how to measure and correct for camera lens distortion, and how to estimate the size and location of objects based on the images the objects project onto two cameras in a stereo configuration.

This project involves using calibration targets and software (see references below) to estimate the camera’s intrinsic, extrinsic and lens-distortion parameters. In the process of doing this, you will learn what these parameters are, how they are calculated, and how the accuracy of the estimated parameter values affects the accuracy of object size and distance predictions.
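The parameters being estimated enter a forward model like the one sketched below: the extrinsics map world points into camera coordinates, a radial polynomial (two coefficients, k1 and k2, in this sketch) distorts the normalized coordinates, and the intrinsics map the result to pixels. Calibration tools such as OpenCV's calibrateCamera fit these parameters by minimizing the reprojection error of checkerboard corners; the sketch only shows the projection, with illustrative parameter values.

```python
import numpy as np


def project(points_world, R, t, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Pinhole projection with two-term radial lens distortion.

    Extrinsics (R, t) map world to camera coordinates; intrinsics (fx, fy, cx, cy)
    map distorted normalized coordinates to pixels.
    """
    Pc = np.asarray(points_world, dtype=float) @ R.T + t
    x, y = Pc[:, 0] / Pc[:, 2], Pc[:, 1] / Pc[:, 2]
    r2 = x ** 2 + y ** 2
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2
    return np.column_stack([fx * x * radial + cx, fy * y * radial + cy])


points = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
pixels = project(points, np.eye(3), np.zeros(3), fx=100, fy=100, cx=320, cy=240, k1=0.1)
print(pixels)   # the off-axis point is pushed outward by the positive k1
```

Perturbing one parameter at a time and re-projecting known 3D points is a simple way to see how calibration errors propagate into size and distance estimates.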


Mentors: Joyce Farrell and Trisha Lian

Depth from Stereo Images

Database of synthetic stereo images

The Middlebury Stereo dataset is a collection of stereo images with “ground truth” disparities or depth maps. Researchers and students have used datasets from this collection to compare different methods for estimating depth from stereo images. The depth maps are inherently noisy because they are measured empirically with range-sensing devices or structured lighting.

This project will use our lab software to create a new database of synthetic stereo camera images and associated depth maps. You can modify the properties of a scene, position the cameras in the scene, change the baseline distance separating the two cameras, and modify the properties of the optics and sensors in the two cameras.

Mentor: Trisha Lian

Stereo algorithm assessment

As a related project, a cooperating group might run depth estimation algorithms that are already published on the web (see, for example, functions in OpenCV) and learn how camera parameters such as baseline separation, optics, and sensor resolution affect the accuracy of the depth estimates.
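For a rectified pair, depth follows from disparity as Z = fB/d, and differentiating gives a first-order depth error of about Z²/(fB) times the disparity error. The focal length, baseline, and 0.25-pixel disparity error below are illustrative assumptions. The point is that the error grows with the square of depth and shrinks as the baseline grows.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth Z = f B / d for a rectified stereo pair."""
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~= Z**2 / (f B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


z = depth_from_disparity(35.0, focal_px=700.0, baseline_m=0.1)     # hypothetical rig
for baseline in (0.05, 0.1, 0.2):
    print(baseline, depth_error(z, 700.0, baseline, disparity_error_px=0.25))
```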

Projects Fall 2016

RealSense 3D-imaging

The following project ideas involve real-time RGB-D imaging with RealSense dev kits. We will provide both the RealSense SR300 (a short-range depth camera based on a coded-light technique) and the LR200 (a long-range version based on an IR-assisted stereo-3D technique).

Projected Texture Stereo

The RealSense LR200 module uses a projected-texture stereo system. Measure and model the system’s optical properties and implement techniques for generating high-quality pattern texture projectors, as outlined in published work. Mentor: Leo Keselman

Computational Photography

Using depth maps from either the RealSense SR300 or LR200, create examples of depth-of-field blur, tilt-shift effects, and other post-processing effects. Mentor: Leo Keselman
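A sketch of depth-of-field rendering from a depth map: compute a thin-lens circle of confusion for each pixel, then give each pixel the nearest of a few precomputed blur levels. The lens, focus distance, pixel pitch, box blur, and synthetic depth ramp are assumptions chosen for brevity. A tilt-shift effect could reuse the same compositing with a synthetic blur gradient in place of the measured depth.

```python
import numpy as np


def coc_diameter_mm(depth_mm, focus_mm, focal_mm, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor for each scene depth."""
    depth = np.asarray(depth_mm, dtype=float)
    aperture = focal_mm / f_number
    return aperture * np.abs(depth - focus_mm) / depth * focal_mm / (focus_mm - focal_mm)


def box_blur(img, radius):
    """Separable box blur (zero padding at the borders)."""
    if radius == 0:
        return img.copy()
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")


def synthetic_dof(img, coc_px, radii=(0, 1, 2, 4, 8)):
    """Layered approximation: each pixel takes the blur level closest to its CoC radius."""
    img = np.asarray(img, dtype=float)
    level = np.abs(np.asarray(coc_px)[..., None] / 2.0 - np.array(radii)).argmin(axis=-1)
    out = np.empty_like(img)
    for i, r in enumerate(radii):
        mask = level == i
        if mask.any():
            out[mask] = box_blur(img, r)[mask]
    return out


# Hypothetical scene: depth ramps from 1 m to 5 m; 50 mm f/2 lens focused at 2 m
depth = np.tile(np.linspace(1000.0, 5000.0, 64), (64, 1))
coc_px = coc_diameter_mm(depth, focus_mm=2000.0, focal_mm=50.0, f_number=2.0) / 0.01  # 10 µm pixels
img = np.random.default_rng(0).random((64, 64))
result = synthetic_dof(img, coc_px)
print(coc_px.min(), coc_px.max())
```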

Stereo Algorithms

RealSense LR200 hardware produces depth maps using stereo matching algorithms, but it also provides the left and right images. Design, implement, and test alternative stereo matching algorithms, and compare them with the results of the built-in algorithms in the LR200 ASIC, accessed through the API. Mentor: Leo Keselman
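A simple baseline for such a comparison is winner-take-all block matching with a sum-of-absolute-differences (SAD) cost. The sketch assumes rectified grayscale images and uses a synthetic pair with a known disparity rather than LR200 data; the window radius and disparity range are illustrative.

```python
import numpy as np


def box_sum(a, r):
    """Sum of a over each (2r+1) x (2r+1) window, via a summed-area table (edge padding)."""
    p = np.pad(a, r, mode="edge")
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    k = 2 * r + 1
    return s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]


def block_match(left, right, max_disp, r=3):
    """Winner-take-all SAD block matching for rectified images.

    A left-image pixel at column x is compared with the right-image pixel at x - d.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    no_match = np.abs(left).max() + np.abs(right).max() + 1.0   # cost where x - d < 0
    costs = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        diff = np.full((h, w), no_match)
        diff[:, d:] = np.abs(left[:, d:] - right[:, :w - d])
        costs[d] = box_sum(diff, r)
    return costs.argmin(axis=0)


# Synthetic rectified pair with a known, constant disparity of 7 pixels
rng = np.random.default_rng(0)
right = rng.random((48, 64))
left = np.roll(right, 7, axis=1)
disparity = block_match(left, right, max_disp=16)
print(np.unique(disparity[:, 20:-5]))
```

On real LR200 images, textureless regions and occlusions are where such a baseline fails, which makes them good test cases for alternative algorithms.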

Visual Odometry

There are many techniques for estimating camera position from images. The RealSense SR300 and LR200 provide both rectified images and depth maps. With these, a wide range of techniques, from ICP to three-point-pose RANSAC, can be used to implement 3D scanning of large environments. Implement such a system. Mentor: Leo Keselman
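Both ICP and RANSAC-based pose estimation repeatedly solve one subproblem: finding the rigid rotation and translation that best align matched 3D points. The SVD-based (Kabsch) solution is sketched below. Correspondence search, outlier rejection, and the ICP or RANSAC loops themselves are left out, and the points are synthetic.

```python
import numpy as np


def rigid_transform(src, dst):
    """Least-squares R, t with dst ~= src @ R.T + t (Kabsch / orthogonal Procrustes)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)            # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # avoid reflections
    R = Vt.T @ D @ U.T
    return R, c_dst - R @ c_src


def rotation_z(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])


# Hypothetical matched 3D points (e.g., back-projected from two depth maps)
rng = np.random.default_rng(0)
src = rng.random((20, 3))
R_true, t_true = rotation_z(30.0), np.array([0.5, -0.2, 1.0])
dst = src @ R_true.T + t_true
R_est, t_est = rigid_transform(src, dst)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

In ICP this step alternates with nearest-neighbor matching; in RANSAC it is run on minimal three-point samples and the hypothesis with the most inliers is kept.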

Image systems simulation

Autonomous vehicle sensors: Forensic analysis of the fatal Tesla car crash

On May 7, 2016, a 40-year-old man was killed when his Tesla crashed in Florida. There are many articles describing the accident and speculating about its cause. For example, Tesla reported that “Neither Autopilot nor the driver noticed the white side of the tractor trailer against a brightly lit sky, so the brake was not applied.”

The Tesla car had a Mobileye system that includes several cameras and an image processing module. There is enough known about the imaging sensors in the Mobileye system to predict the images the sensors would have captured for different types of scenes.

This class project will use the ISET digital camera simulation software to model different scenes and image sensor parameters (e.g., exposure duration and video rate). Extra bonus points if you use machine learning (e.g., an SVM) to determine whether a system can distinguish between different types of scenes. For example, what type of imaging sensor is required to detect the difference between a “white side of a tractor trailer” and “a brightly lit sky”?


  • Inside the Self-Driving Tesla Fatal Accident, by Anjali Singhvi and Karl Russell, NYTimes, July 12, 2016
  • Tesla faults brakes, but not autopilot, in fatal crash, by Neal Boudette, Business Day, July 29, 2016
  • Mobileye EMP evaluation platform
  • Fatal crash prompts federal investigation of Tesla self-driving cars, by Sam Thielman, The Guardian, July 13, 2016
  • Autopilot 2.0 adds more sensors to be better than ever, report says, by Chris Mills, BGR, Aug 11, 2016
  • Tesla Autopilot 2.0: retrofit to next gen sensors likely to be available for some owners, by Fred Lambert, electrek, August 6, 2016
  • Tesla Autopilot 2.0: next gen Autopilot powered by more radar, new triple camera, some equipment already in production, by Fred Lambert, electrek, August 11, 2016
  • Researchers trick Tesla Model S Autopilot, by Brandon Turkus, Autoblog, Aug 4, 2016
  • Another crash on Tesla Autopilot, another driver admits to not paying attention, was cleaning his dash, by Fred Lambert, electrek, August 19, 2016
  • Tesla Model S, Wikipedia
  • Understanding the fatal Tesla accident on Autopilot and the NHTSA probe, by Fred Lambert, July 1, 2016
  • “WTF is the deal with driverless car guru George Hotz’s Comma Points?”, by Joe Carmichael, July 7, 2016
  • Uber and Volvo partner up, robot ride-sharing starts this summer, by Jonathan Gitlin, ARS Technica, Aug 18, 2016
  • startup in SF; startup in SF; Nauto, startup in Palo Alto
  • Learning a driving simulator, by Eder Santana and George Hotz

Mentor: Joyce Farrell

360 Camera Capture Simulation

The recent popularity of head-mounted displays and VR has increased interest in building 360 cameras that can capture and render stereo panoramas. Recent examples include Facebook's Surround360 and Nokia's OZO camera.

With a combination of a customized ray-tracing renderer (PBRT-spectral) and a MATLAB toolbox to control it (RenderToolbox3) we have the ability to simulate 360 cameras in a 3D virtual scene created in a modeling program such as Blender. To do this we specify the distribution of cameras, their lenses, focus, FOV, etc. and take a "snapshot" of a virtual scene. For example, we can place 6 virtual cameras in a circle with a 1 foot radius, attach wide angle lenses to all cameras, and take images from each camera. Because the scene is virtual, we also have access to the ground-truth depth map and true panorama.

This project will focus on using these simulation tools to evaluate either the 360 stereo stitching algorithms or the design of the camera itself.

Note: Facebook's stitching code for its Surround360 camera is now open source and on GitHub.

Some potential ideas to start with:

1. Can you design a database of virtual scenes that can help evaluate the effectiveness of 360 stereo stitching algorithms? This would include constructing a variety of scenes with a modeling program (e.g. Blender, Maya), porting the scenes to the simulation software and then taking 360 camera snapshots using the tools described above. Using a combination of the ground truth and the results of a stitching algorithm, can we evaluate how well the algorithm performs?

2. Can you evaluate the design of a 360 camera using the simulation tools above? For example, how would the quality of the panorama change by having 12 cameras in a ring instead of the 16 cameras on the Surround360 camera? This direction may require you to dig into the stitching code and make appropriate adjustments.

C++ and Python skills are necessary for using Facebook's open source stitching code. An understanding of basic Computer Vision would also be helpful.

Mentor: Trisha Lian for using the simulation software

Underwater simulations

The advent of GoPro cameras has made underwater photography much more accessible. Unfortunately, images captured underwater rarely look pleasing: they have washed-out colors and low contrast due to scattering. To better understand how water and its different constituents affect the appearance of underwater targets, we built a ray-tracing-based simulation environment for underwater photography. We think this tool can render realistic images of underwater targets, but does it?

To have some notion of how water really influences color appearance, we also captured a number of underwater images using a variety of consumer cameras. In this project you will learn about ray tracing through water and the different mathematical models used to compute the interactions between light, targets, and water. Ultimately, your goal will be to improve the simulation environment to make the simulated and captured images as visually close as possible.
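As a point of comparison for the ray tracer, a simplified, widely used image-formation model treats each color band as a direct signal attenuated by exp(-c·d) plus backscattered veiling light. The attenuation coefficients and veiling light below are hypothetical values, not measurements, and the model ignores forward scattering.

```python
import numpy as np


def underwater_signal(target, distance_m, attenuation_per_m, veiling_light):
    """Attenuated direct signal plus backscattered veiling light, per color band.

    transmission = exp(-c * d); observed = target * transmission + B * (1 - transmission)
    """
    transmission = np.exp(-np.asarray(attenuation_per_m, dtype=float) * distance_m)
    return (np.asarray(target, dtype=float) * transmission
            + np.asarray(veiling_light, dtype=float) * (1.0 - transmission))


# Hypothetical (R, G, B) attenuation coefficients and veiling light
c = np.array([0.6, 0.1, 0.05])
veil = np.array([0.0, 0.3, 0.4])
white, black = np.ones(3), np.zeros(3)
for d in (0.0, 2.0, 5.0, 10.0):
    contrast = underwater_signal(white, d, c, veil) - underwater_signal(black, d, c, veil)
    print(d, underwater_signal(white, d, c, veil).round(3), contrast.round(3))
```

The white-black difference equals the transmission, so this toy model already reproduces the loss of red and the drop in contrast with distance; the ray tracer should agree with it in simple geometries.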

Mentors: Henryk, Trisha, Joyce

Model RealSense camera

In the past few years, color+depth cameras such as Intel RealSense have become commonly available. Such cameras provide images of the scene together with depth maps, i.e., arrays of numbers describing the distances between the camera and points in the scene. Very often, the color and depth modules of a particular camera rely on fundamentally different physical processes to produce their images. Consequently, substantially different tools are needed to model how cameras produce color images and depth images.

Color image acquisition can be modelled with computer graphics tools such as PBRT. PBRT is ray-tracing software that accurately simulates how light interacts with different objects in a 3D scene and how the light is projected onto a camera sensor. These rendering tools can be modified to incorporate the behavior of complex lens systems and elaborate camera designs such as light field cameras. In fact, we have a modified version of PBRT that performs precisely such simulations (Spectral PBRT).

A different set of simulation tools is necessary to model depth estimation. One such tool is Blensor, which is a plug-in to Blender, an open source 3D editing tool. Blensor has been designed specifically to model how different types of depth cameras capture their data.

Unfortunately, having two different tools is very inconvenient for modelling purposes. It is easy to lose track of simulation parameters, for example camera poses and positions, scene orientations, etc. Your goal for this project is to create a wrapper around PBRT and Blensor that lets users easily and seamlessly use both tools. Ideally, a user would define a model of a depth and color camera together with a scene mesh that represents the world. The wrapper would need to handle the different tools and make sure that the color and depth data are consistent with the scene mesh and camera models.

We hope that the wrapper you create will let you build a good model of a RealSense depth camera.

Mentors: Achin, Henryk, Trisha

Myopia/Hypermetropia VR Experience

Myopia (near-sightedness) and hypermetropia (far-sightedness) are the most common eye problems in the world. With virtual reality, we have the potential to simulate the visual experience of uncorrected myopia or hypermetropia. This project will focus on creating such a VR experience. One possible path is to use Unity to create virtual rooms with interesting features that can highlight the experience of these vision problems. This would involve writing a shader that can blur the scene, as realistically as possible, according to depth and presenting this altered image through the VR goggles. Additional features may include sliders to change severity or to add other effects to increase the realism of the experience.
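The depth-dependent blur such a shader would apply can be estimated from the defocus in diopters: for small blur, the angular diameter of the blur circle is roughly pupil diameter (in meters) times defocus (in diopters), in radians. This Python sketch shows only the math, not Unity shader code. It covers myopia only, assumes a 4 mm pupil and a hypothetical headset resolution of 20 pixels per degree, and treats targets nearer than the far point as in focus.

```python
import numpy as np


def myopic_defocus_d(target_distance_m, myopia_d):
    """Residual defocus (diopters) for an uncorrected myope.

    Assumes targets nearer than the far point (1 / myopia_d meters) can be focused
    by accommodating, so only targets beyond the far point are blurred.
    """
    vergence = 1.0 / np.asarray(target_distance_m, dtype=float)
    return np.maximum(myopia_d - vergence, 0.0)


def blur_angle_deg(defocus_d, pupil_mm=4.0):
    """Geometric blur-circle diameter: pupil (m) x defocus (D) gives radians."""
    return np.degrees(pupil_mm * 1e-3 * np.asarray(defocus_d, dtype=float))


distances = np.array([0.4, 1.0, 3.0, 10.0])       # meters
blur = blur_angle_deg(myopic_defocus_d(distances, myopia_d=2.0))
pixels_per_degree = 20.0                           # hypothetical headset resolution
print((blur * pixels_per_degree).round(1))
```

A severity slider would simply change `myopia_d`, and a per-pixel depth buffer would supply `target_distance_m`.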

Students who work on this project may be put in touch with documentary filmmakers interested in creating a piece on myopia.


Computer vision and computational photography

Reflectance, Fluorescence, and Color Matching

Fluorescence emission is a common property of biological tissues and materials, and it strongly affects the appearance of surfaces under different illuminants. Its presence makes any color matching task much more difficult. One example of a biological substance for which color matching is important is teeth. Natural enamel fluoresces under short-wavelength light, and whenever a dentist fills a cavity he/she needs to select a filling whose color matches the tooth. However, what appears similar under a dentist's lamp may look very different in broad daylight.

In this project you will perform a set of measurements of how teeth reflect and fluoresce light and then help design the spectral reflectance properties of a better dental filling that will be less visible under different illuminants.
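A common way to organize such measurements is a linear model in which the light leaving a surface is the reflected part, reflectance times illuminant at each wavelength, plus a fluorescent part given by an excitation-emission (Donaldson) matrix applied to the illuminant. The sketch below assumes that model. The wavelength sampling and the "enamel-like" matrix entries are made-up numbers for illustration only. It shows why a non-fluorescent filling can match a tooth under one light and not another.

```python
import numpy as np

wavelengths = np.arange(400, 701, 10)  # nm, hypothetical sampling


def emitted_spectrum(reflectance, donaldson, illuminant):
    """Reflected plus fluorescent spectral radiance.

    reflectance: (n,) reflectance R(lambda)
    donaldson:   (n, n) matrix, D[i, j] = light emitted at wavelength i per unit
                 excitation at wavelength j (nonzero only for emission > excitation)
    illuminant:  (n,) spectral power of the light source
    """
    return reflectance * illuminant + donaldson @ illuminant


n = wavelengths.size
# Hypothetical fluorescence: excitation at 400-440 nm re-emitted at 450-600 nm.
excite = (wavelengths >= 400) & (wavelengths <= 440)
emit = (wavelengths >= 450) & (wavelengths <= 600)
donaldson = np.zeros((n, n))
donaldson[np.ix_(emit, excite)] = 0.01
reflectance = np.full(n, 0.6)

blue_light = np.where(wavelengths == 400, 1.0, 0.0)  # narrowband 400 nm source
tooth = emitted_spectrum(reflectance, donaldson, blue_light)
filling = emitted_spectrum(reflectance, np.zeros((n, n)), blue_light)
```

Under the 400 nm source the hypothetical filling returns light only at 400 nm, while the tooth also emits at 450 to 600 nm. Under a broadband source the difference would be much smaller relative to the reflected light.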

Henryk, Joyce

Auto-cropping using Deep Learning

One of the most common post-processing tasks in photography is cropping images for improved visual impact. This has only become more important with the widespread adoption of fixed-focal-length smartphones as the most common cameras in use today. There have been a number of very sophisticated attempts to automate this otherwise labor-intensive process by applying various rules of composition (see References below). However, they suffer from growing complexity, as each attempt to improve the system requires layering on yet more specialized knowledge. This seems like an ideal challenge for a deep-learning-based solution.

There do not seem to be any frameworks (at least publicly available ones) that solve this problem in its entirety, but there have been several attempts to rate the aesthetics of photographs using deep learning (see References below). So the project is to see whether a similar approach can be used to automatically improve images by cropping them. It poses some interesting challenges in the design of the deep learning system. For example, should the system evaluate each image and its possible crops independently, or is there a way to directly measure the success of a crop relative to the original image? The total solution space is extraordinarily large, so some simplifying assumptions will be needed (for example, a limited number of candidate crops for each image).
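A minimal sketch of the "limited number of candidate crops" simplification: enumerate a fixed grid of crop sizes, aspect ratios, and positions, then keep the one a scoring function rates highest. The `score` argument stands in for a trained aesthetic model and is hypothetical, as are the default scales and aspect ratios. This corresponds to the "evaluate each crop independently" design.

```python
import itertools
from typing import Callable, List, Tuple

Crop = Tuple[int, int, int, int]  # (x, y, width, height)


def candidate_crops(width, height, scales=(0.6, 0.75, 0.9),
                    aspect_ratios=(1.0, 4 / 3, 3 / 2, 16 / 9), steps=3) -> List[Crop]:
    """Enumerate a small, fixed set of crops that fit inside a width x height image."""
    crops = []
    for scale, ar in itertools.product(scales, aspect_ratios):
        # Largest crop with aspect ratio ar (= w / h) that fits, shrunk by `scale`.
        h = min(height, width / ar) * scale
        cw, ch = int(round(h * ar)), int(round(h))
        # Slide the crop over a steps x steps grid of positions.
        for i, j in itertools.product(range(steps), repeat=2):
            x = int(round((width - cw) * i / (steps - 1)))
            y = int(round((height - ch) * j / (steps - 1)))
            crops.append((x, y, cw, ch))
    return crops


def best_crop(image_size, score: Callable[[Crop], float]) -> Crop:
    """Return the candidate crop with the highest score (score model is hypothetical)."""
    return max(candidate_crops(*image_size), key=score)
```

The alternative design, comparing a crop directly against the original image, would replace `score(crop)` with a pairwise model that takes both the crop and the full image.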

Some references:

Optimizing photo composition (refers to above papers)

Rating Pictorial Aesthetics using Deep Learning

Mentor: David Cardinal

Human vision simulations (ISETBIO)

Predicting visual acuity from wavefront aberrations

Andrew B. Watson; Albert J. Ahumada, Jr
It is now possible to routinely measure the aberrations of the human eye, but there is as yet no established metric that relates aberrations to visual acuity. A number of metrics have been proposed and evaluated, and some perform well on particular sets of evaluation data. But these metrics are not based on a plausible model of the letter acuity task and may not generalize to other sets of aberrations, other data sets, or to other acuity tasks. Here we provide a model of the acuity task that incorporates optical and neural filtering, neural noise, and an ideal decision rule. The model provides an excellent account of one large set of evaluation data. Several suboptimal rules perform almost as well. A simple metric derived from this model also provides a good account of the data set.

A formula for the mean human optical modulation transfer function as a function of pupil size

Andrew B. Watson

Abstract: We have constructed an analytic formula for the mean radial modulation transfer function of the best-corrected human eye as a function of pupil diameter, based on previously collected wave front aberrations from 200 eyes (Thibos, Hong, Bradley, & Cheng, 2002). This formula will be useful in modeling the early stages of human vision.

A unified formula for light-adapted pupil size

Andrew B. Watson; John I. Yellott

Abstract: The size of the pupil has a large effect on visual function, and pupil size depends mainly on the adapting luminance, modulated by other factors. Over the last century, a number of formulas have been proposed to describe this dependence. Here we review seven published formulas and develop a new unified formula that incorporates the effects of luminance, size of the adapting field, age of the observer, and whether one or both eyes are adapted. We provide interactive demonstrations and software implementations of the unified formula.
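A natural first step toward reproducing this paper in ISETBIO is one of the reviewed formulas. As we understand the paper, the unified formula builds on the Stanley and Davies (1995) formula, which predicts pupil diameter from the product of luminance (cd/m^2) and field area (deg^2). The coefficients below should be checked against the published paper before use. The age and monocular/binocular terms of the unified formula are deliberately not implemented here.

```python
def stanley_davies_pupil_mm(luminance_cd_m2, field_area_deg2):
    """Stanley & Davies pupil diameter (mm) from luminance x field area.

    Coefficients (7.75, 5.75, 846, 0.41) should be verified against the paper.
    """
    f = (luminance_cd_m2 * field_area_deg2 / 846.0) ** 0.41
    return 7.75 - 5.75 * f / (f + 2.0)
```

The formula runs from 7.75 mm in darkness toward 2 mm at very high flux, decreasing monotonically in between.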

Mentor: Wandell

The impact of small eye movements on high frequency resolution of the eye

Simulate the effects described in this paper using ISETBIO

Abstract: Humans and other species explore a visual scene by making rapid eye movements (saccades) two to three times every second. Although the eyes may appear immobile in the brief intervals between saccades, microscopic (fixational) eye movements are always present, even when an observer is attending to a single point. These movements occur during the very periods in which visual information is acquired and processed, and their functions have long been debated. Recent technical advances in controlling retinal stimulation during normal oculomotor activity have shed new light on the visual contributions of fixational eye movements and the degree to which these movements can be controlled. The emerging body of evidence, reviewed in this article, indicates that fixational eye movements are important components of the strategy by which the visual system processes fine spatial details; they enable both precise positioning of the stimulus on the retina and encoding of spatial information into the joint space–time domain.

Control and Functions of Fixational Eye Movements Annual Review of Vision Science Vol. 1: 499-518 (Volume publication date November 2015) First published online as a Review in Advance on October 14, 2015 DOI: 10.1146/annurev-vision-082114-035742

The unsteady eye: an information-processing stage, not a bug [2]

Mentor: Wandell

Effects of age on color appearance

Use ISETBIO to simulate the combined effects of an aging eye - changes in lens opacity, light scatter, pupil size, and so on - on various perceptual phenomena, such as color appearance.

From Brainard, D. H. & Hurlbert, A. C. (2015). Colour vision: understanding #TheDress. Current Biology, 25, R549–R568, doi: 10.1016/j.cub.2015.05.020:

"There are, in fact, a number of well-documented individual differences in the sensory apparatus that supports colour vision (reviewed in [13,14]). These include differences in pre-retinal filtering of light (for example, by the lens and macular pigment) — which, intriguingly, mostly affect short-wavelength or “bluish” light — differences in the spectral sensitivities of the retina’s cone photoreceptors, and differences in the relative numbers of cones of different classes. This type of front-end difference affects the information extracted from an image by different individuals, and might thus lead to differences in colour constancy. Other individual differences that can be revealed with much simpler stimuli may also be important. For example, as noted above, the stimulus seen as achromatic differs from one person to another, as do the stimuli that are perceived as pure examples of the unique hues (red, green, blue, and yellow) [15]. These differences themselves may be driven by front-end sensory differences, by differences in neural mechanisms that calibrate the colour vision system [16,17], or by an interaction between the two. Lastly, there might be individual differences in higher-order neural processes that specifically mediate colour constancy. A full understanding of the individual differences in how the dress is perceived will ultimately require data that relate, on a person-by-person basis, the perception of the dress to a full set of individual difference measurements of colour vision. The rich dataset of Lafer-Sousa et al. [2] suggests that age and gender do predict, to some extent, the variability in people’s response to the dress. Intriguingly, the density of pre-retinal pigments is also known to vary systematically with age."

Mentor: Wandell

Projects Fall 2015

A new approach to image processing (L3)

We have developed a new image processing pipeline (L3) for digital cameras based on machine learning and high-speed processing with GPUs. L3 (Local, Linear, Learned) automates and customizes the image processing pipeline for a given camera design to speed up camera development, leveraging advanced camera simulation and machine learning techniques.


[1] Automating the design of image processing pipelines for novel color filter arrays: local, linear, learned (L3) method

[2] Automatically designing an image processing pipeline for a five-band camera prototype using the Local, Linear, Learned (L3) method

Accelerating L3 Processing Pipeline for Cameras with Novel CFAs on NVIDIA® Shield™ Tablets using GPUs

L3 classifies input image patches into categories that are local in space and response, and automatically learns linear operators that transform pixels to the calibrated output space using training data from camera simulation. The local and linear processing of individual pixels makes L3 ideal for parallelization.
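The "classify, then apply a learned linear operator" structure can be sketched in a few lines of NumPy. This is a simplified illustration, not the lab's L3 code. It classifies patches only by their mean response, whereas a real L3 pipeline also conditions on the pixel's position in the CFA and possibly other features. The function names and bin edges are hypothetical. Because each pixel needs only its own patch and one small matrix product, the apply step maps naturally onto one GPU thread per pixel.

```python
import numpy as np


def learn_l3_filters(patches, targets, n_classes, response_edges):
    """Learn one linear operator per response-level class by least squares.

    patches:        (N, P) flattened raw sensor neighborhoods
    targets:        (N, C) desired calibrated output (e.g., XYZ) for each center pixel
    response_edges: increasing bin edges on mean patch response (n_classes - 1 values)
    """
    labels = np.digitize(patches.mean(axis=1), response_edges)
    filters = np.zeros((n_classes, patches.shape[1], targets.shape[1]))
    for k in range(n_classes):
        idx = labels == k
        if idx.any():  # classes without training data keep an all-zero filter
            filters[k], *_ = np.linalg.lstsq(patches[idx], targets[idx], rcond=None)
    return filters


def apply_l3(patches, filters, response_edges):
    """Classify each patch, then apply its class's filter (independent per pixel)."""
    labels = np.digitize(patches.mean(axis=1), response_edges)
    return np.einsum("np,npc->nc", patches, filters[labels])
```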

This project aims to accelerate the L3 pipeline on NVIDIA® Shield™ Tablets using GPUs for real-time rendering of videos. A possible further goal is a tablet application that demonstrates the fast rendering of the L3 method. The learned linear operators and video data captured by a multispectral camera prototype will be provided, along with CUDA / C++ (or CUDA / Matlab) code that works on a PC as a starting point.

Skills preferred: CUDA, Android Programming

Mentor: Haomiao Jiang

High Dynamic Range Video Using the L3 Method

High dynamic range (HDR) imaging has advanced and made its way into consumer products during the last decade. Most HDR techniques capture and combine multiple exposures to recover detail and contrast simultaneously in dark and bright regions. However, this strategy requires the scene to be still during the multiple captures and is therefore inherently unsuitable for HDR video acquisition. Varying the exposure of pixels within the CFA is a promising approach to single-shot HDR image and HDR video acquisition, at the cost of spatial resolution. These novel HDR CFAs require time and effort to develop tuned image processing pipelines.

This project aims to explore the feasibility of applying the L3 method to these novel HDR CFAs, particularly for HDR video. Various HDR CFAs will be compared using the images produced by the L3 processing pipeline in order to determine the optimal design.
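A useful baseline to compare L3 against is the simplest possible reconstruction for a spatially varying exposure CFA: divide each pixel by its relative exposure, discard saturated or noise-dominated pixels, and fill them from valid neighbors. The thresholds and the 3x3 neighborhood averaging below are our own illustrative choices. This is neither the L3 method nor the algorithms of references [3] and [4].

```python
import numpy as np


def merge_sve(raw, exposure_map, saturation=0.95, noise_floor=0.02):
    """Per-pixel radiance from one spatially varying exposure capture.

    raw:          (H, W) normalized raw values in [0, 1]
    exposure_map: (H, W) relative exposure of each pixel (e.g., 1.0 and 0.25)
    Saturated or too-dark pixels are marked invalid (NaN).
    """
    radiance = raw / exposure_map
    valid = (raw < saturation) & (raw > noise_floor)
    return np.where(valid, radiance, np.nan)


def fill_invalid(radiance):
    """Replace NaNs with the mean of valid 3x3 neighbors (a crude stand-in for a real pipeline)."""
    padded = np.pad(radiance, 1, constant_values=np.nan)
    h, w = radiance.shape
    stack = np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    filled = np.nanmean(stack, axis=0)
    return np.where(np.isnan(radiance), filled, radiance)
```

For example, with a checkerboard of 1.0 and 0.25 exposures, a region bright enough to saturate the long-exposure pixels is still recovered from the short-exposure pixels around them.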


[3] Cheng, CH. et al., "High Dynamic Range image capturing by Spatial Varying Exposed Color Filter Array with specific Demosaicking Algorithm," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, 2009.

[4] Yasuma, F. et al., "Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum," IEEE Transactions on Image Processing, 19(9), 2241-2253.

Mentors: Qiyuan Tian and Steve Lansel

Designing an L3 Processing Pipeline for a Camera Testkit with an RGB/W CFA

Clear pixels have been introduced into CFAs to transmit much more light for low-light photography (e.g., Aptina’s Clarity+ sensor, OmniVision’s Clear Pixel sensor inside the Moto X, and Sony’s Exmor RS RGB/W sensor). However, it is challenging to develop image processing pipelines that produce high image quality with these sensors. In simulation, L3 has been shown to be an effective and efficient processing pipeline for RGB/W sensors (see the movie comparing L3 processing results for a conventional RGB sensor and an RGBW sensor at a series of light levels).

This project aims to design an L3 processing pipeline for a camera testkit with an RGB/W CFA, following the procedures described in Reference [2]. The camera testkit will first be calibrated for camera simulation. The L3 processing pipeline will then be created from the simulation and tested on raw images captured by the testkit.

Mentor: Qiyuan Tian, Haomiao Jiang

Color Matching in Dentistry

When dentists fill a cavity, they must select a composite material. When they replace a tooth or place a crown or veneer on an existing tooth, they design or order a porcelain implant. These decisions require the dentist to compare the color of teeth with the color of the composite or porcelain material. Dentists try to select the color or shade of the material that provides the best color match to the surrounding teeth, but they also complain that this is a difficult task.

By now you have learned how to use the CIELAB color difference metric to predict whether two colors will appear to match under a fixed illumination. You have also learned that these predictions are not invariant with changes in illumination. In other words, if you change the lighting, the colors of two different materials may no longer appear to match. Therefore, the smile that looks so perfect in the dentist’s office under fluorescent lighting might have imperfections under daylight.
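The CIELAB prediction itself takes only a few lines once the XYZ values of the tooth and the material under a given illuminant are known (for example, computed in ISET from measured reflectances). This sketch uses the standard CIE 1976 L*a*b* formulas and the CIE76 color difference. The XYZ values in any usage are up to you, and newer metrics such as CIEDE2000 may be preferable.

```python
import numpy as np


def xyz_to_lab(xyz, white_xyz):
    """CIE 1976 L*a*b* from XYZ, relative to the illuminant's white point."""
    t = np.asarray(xyz, dtype=float) / np.asarray(white_xyz, dtype=float)
    delta = 6.0 / 29.0
    # Cube root above the small-value threshold, linear segment below it.
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def delta_e_ab(xyz1, xyz2, white_xyz):
    """CIE76 color difference between two colors seen under the same illuminant."""
    return float(np.linalg.norm(xyz_to_lab(xyz1, white_xyz) - xyz_to_lab(xyz2, white_xyz)))
```

To check illuminant dependence, compute the pair's XYZ values under office lighting and under daylight, each with its own white point, and compare the two ΔE*ab values: a small difference under one light and a large one under the other is exactly the mismatch described above.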

This project has three components. First, we will make spectrophotometric measurements of 1) the reflectance of teeth in-situ in different individuals, 2) the reflectance of different composite and porcelain material, and 3) the spectral power of the light that falls on teeth in-situ under different lighting conditions. Second, we will use this data and the CIELAB color difference metric to predict whether people will be able to detect the difference between teeth and composite and porcelain material under different lighting conditions. Third, we will use the data in ISET simulations in order to determine the tradeoffs in color matching accuracy, cost and convenience. More specifically, we will simulate an imaging system based on a cell phone camera with flash/no-flash mode that has the potential of providing dentists with an alternative to the more expensive spectrophotometric devices that are currently on the market.

Mentors: Joyce Farrell and Henryk Blasinski

Simulation projects using ISETBIO

ISETBIO is an ISET based Matlab toolbox that can simulate human optics and photoreceptor sampling. With ISETBIO, we can accurately compute the optical irradiance image that impinges on the retina and the number of photons absorbed by human photoreceptors (cones) for a given scene. ISETBIO is capable of simulating human individuals with different optics (myopia, astigmatism, etc.) and cone mosaics (colorblind, density difference, etc.).

Reproduce and Compare with Recent Papers

In this project, you are expected to reproduce the results of a recent paper with ISETBIO. You will work with your mentor to rewrite the paper's computations in ISETBIO and try to explain every difference (if any) from the original code.

Here is a set of computational papers by Watson, implemented in Mathematica, related to optics and the retina:

Modulation Transfer Function and pupil size
Pupil size and light level

Here is a paper related to the human point spread function:

Computing human optical point spread functions

Retinal ganglion cell modeling

A formula for human retinal ganglion cell receptive field density as a function of visual field location

Or ganglion cells and behavior

Retina-V1 model of detectability across the visual field. The original code for the paper will be provided.

Skills preferred: Matlab programming

Mentor: Haomiao Jiang, Brian Wandell

Simulate an eccentric camera

Write a simulation of the Foveon sensor.


Write a simulation of the camera.

Mentor: Brian Wandell

Monitoring the environment

We have ideas about how to take calibrated underwater images captured by GoPro cameras to monitor the health of coral reefs. There are various components to the project (camera calibration, modeling of light transport through water, and automating image upload, storage and analysis).

Mentor: Henryk Blasinski

An underwater, multispectral light source

Underwater imaging is quickly gaining importance, not only because of its applicability to marine ecosystem monitoring but also because of the proliferation of inexpensive action cameras such as the GoPro. Unfortunately, the colors in images acquired underwater are severely distorted by scattering and absorption. One approach to recovering more spectral detail is to use active illumination techniques; this approach has proved very useful above water. In this project you will design and build an underwater, LED-based multispectral light source that fits a standard GoPro-size underwater housing. With all the hardware in place, you will have a chance to evaluate the accuracy and performance of active-illumination spectral recovery in underwater scenarios. This is a hardware-oriented project; you will be expected to build and integrate the final system, which means that you should be familiar with soldering, PCB design, and possibly some CAD tools.

Skills Preferred: Hardware design experience, OR good with web-site programming.

Mentor: Henryk Blasinski


Geometric Camera Calibration

To simulate degradations of the human visual system using images captured by a camera, it is necessary to know exactly how those images were captured. This project uses simple camera models with efficient and flexible calibration procedures to derive geometric parameters such as focal length, radial distortion, and the relative position and rotation of two cameras. There are well-established techniques that estimate these parameters using a specific calibration target such as a checkerboard. The goal of this project is to become familiar with those techniques and apply them to real images, including an image undistortion procedure and a stereo image rectification procedure (OpenCV provides many of the building blocks).
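To understand what an undistortion step does, it helps to write out the radial part of the lens model that OpenCV's calibration estimates (coefficients k1 and k2, applied in normalized image coordinates; OpenCV also supports k3 and tangential terms, omitted here). The sketch below applies that model and inverts it by fixed-point iteration. In the project itself you would use `cv2.findChessboardCorners`, `cv2.calibrateCamera`, `cv2.undistort`, and `cv2.stereoRectify` rather than this code.

```python
import numpy as np


def distort(xy, k1, k2):
    """Apply radial distortion to normalized image coordinates of shape (N, 2)."""
    r2 = np.sum(xy ** 2, axis=1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)


def undistort(xy_d, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration: x = x_d / (1 + k1 r^2 + k2 r^4)."""
    xy = np.array(xy_d, dtype=float)
    for _ in range(iterations):
        r2 = np.sum(xy ** 2, axis=1, keepdims=True)  # r^2 evaluated at the current estimate
        xy = xy_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xy
```

The iteration converges quickly for moderate distortion, which is the regime of typical calibrated lenses; strongly distorting lenses may need more iterations or a different solver.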

Skills preferred: Knowledge of C++


Streaming and Augmenting Stereo Camera Images

One of our long-term goals is to simulate certain degradations of the human visual system and to evaluate computer-aided visual enhancements that counteract them. A crucial ingredient is a software pipeline that can stream images from a stereo camera to an augmented or virtual reality device in real time. The goal of this project is to build such a pipeline to capture, stream, and feed camera images in real time to an Oculus Rift device.

Skills preferred: Knowledge of C++, Willing to learn Oculus Rift SDK, optionally also OpenGL SL or CUDA


Image Display

Use the Oculus Rift to display images to human subjects that simulate (recreate) the visual sensations of a person with a particular visual condition. This could be low vision, a type of color blindness, loss of central vision due to macular degeneration, or the effect of a retinal prosthesis in a blind person. We will help you use ISETBIO to create images that simulate one of these conditions. You will render the images on a calibrated Oculus Rift.

Information Display

The goal of this project is to capture and display information so that people can track their movements and navigate an environment with only visual input from the Oculus Rift. This will be accomplished by interfacing a Project Tango device with an Oculus Rift display. The Project Tango device has sensors and software designed to track its own 3D motion and to create a map of the environment using simultaneous localization and mapping (SLAM) algorithms. Its output is usually rendered on a laptop display; in this project, you will render it on the Oculus Rift.

3D Projects

Almost anything with Real Sense

Depth Sensing With an Endoscope Using Flashes

Depth sensing has been a recent industry trend for many imaging applications. One less explored route is depth sensing for endoscopes, to help identify tumors or other problems. For this project, first use simulated Scene3D endoscope images to prototype a depth sensing algorithm involving 2 flashes and 2 captures (other capture procedures could be used as well). Prototyping in simulation is a structured way to try out new algorithms quickly. Next, apply the algorithm to a real endoscope and tackle the real-world challenges involved.

Mentor: Steve Lansel

Curved Sensor Simulation

Sony and other imaging companies have recently unveiled curved sensors to improve image quality. Curved sensors improve imaging because of the physics of geometric optics: for simple lenses, the surface of best focus is usually curved, approximately shaped like part of a sphere. Most sensors, however, are planar, so only a small portion of the image falls in sharp focus. Lens engineers usually try to correct this using many lens elements and aspheric lenses. A curved sensor could potentially be a far simpler and less expensive way to obtain high-quality images in a smaller form factor. Instead of using a complex lens to obtain high resolution, imaging engineers could simply use a simple lens and a curved sensor to obtain the same, or even better, results.

This project involves using Scene3D, a full-pipeline camera simulator, to compare the resolution and chromatic aberration of a simple lens with a curved sensor versus a complex lens with a planar sensor.

Mentor: Brian Wandell

Integration with OpenCV

Do we want to create scenes of some sort (stereo? different illumination? different noise? different optics?) and test OpenCV algorithms for robustness across the range of simulated images?

Integration with Caffe

Simulation environments can be used to produce millions of images with a purpose in mind. We can then use these images to train machine learning algorithms. Is there something we want to ask people to do with, say, RenderToolbox to generate many examples and train on with Caffe?

Multispectral imaging for classification

Image classification is a very hot topic in computer vision. Most algorithms, however, operate on RGB camera channels, as if trying to mimic the human visual system. In reality, spectral information is much more abundant and can possibly be used to enhance classification algorithms. This project investigates how much the accuracy of computer vision tasks can be improved by using more spectrally sophisticated cameras. Specifically, you will use a five-band camera prototype to evaluate fruit and vegetable aging and to perform flower classification, and you will compare its performance to that of a conventional RGB camera.

Skills Preferred: computer vision, machine learning

Gullstrand Eye and ray tracing of human optics

We are building a tool for modeling eyes, including the human, from ray tracing fundamentals. There is a famous model eye that we would like to implement.

Gullstrand eye search

We would like you to implement and test the Gullstrand eye with the ray tracing software in the ciset package (a close relative of ISET).

Mentor: Brian Wandell

Projects 2014

Predicting human performance using ISETBIO

ISETBIO is an ISET based Matlab toolbox that can simulate human optics and photoreceptor sampling. With ISETBIO, we can compute the optical irradiance image that impinges on the retina and the number of photons absorbed by human photoreceptors (cones) for a given scene.

For this project, a tutorial script describing how to calculate cone absorptions will be provided, and the students will be responsible for trying to answer one of the following questions:

  1. What is the maximum necessary display resolution (ppi) at a given viewing distance for Vernier acuity?
  2. What is the maximum necessary display resolution (ppi) at a given viewing distance for contrast (CSF) resolution?

To answer these kinds of questions, students are encouraged to build two scenes and use their preferred machine learning algorithm (e.g., SVM, neural network, random forest) to classify cone absorption data from pairs of identical or different scenes as "same" or "different". When the classification accuracy for cone absorption data exceeds a predetermined value (say, 75%), we would predict that the observer can tell the difference between the two scenes. You can compare these predictions with published data from real human observers.
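The logic of the threshold test can be sketched without ISETBIO: draw Poisson-distributed absorptions around each scene's mean absorptions, train a classifier, and compare held-out accuracy with 75%. To stay self-contained, this sketch swaps in a nearest-class-mean classifier and a two-alternative "which scene?" task instead of an SVM and the same/different formulation. The mean absorption vectors would come from ISETBIO; the ones in any example are made up.

```python
import numpy as np


def discriminable(mean_a, mean_b, n_train=200, n_test=200, threshold=0.75, seed=0):
    """Predict discriminability of two scenes from Poisson cone absorptions.

    mean_a, mean_b: expected absorption counts per cone for scenes A and B.
    Returns (held-out accuracy, accuracy > threshold).
    """
    rng = np.random.default_rng(seed)
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    # "Training": estimate each class centroid from noisy samples.
    centroid_a = rng.poisson(mean_a, size=(n_train, mean_a.size)).mean(axis=0)
    centroid_b = rng.poisson(mean_b, size=(n_train, mean_b.size)).mean(axis=0)

    def is_a(samples):
        da = np.sum((samples - centroid_a) ** 2, axis=1)
        db = np.sum((samples - centroid_b) ** 2, axis=1)
        return da < db

    test_a = rng.poisson(mean_a, size=(n_test, mean_a.size))
    test_b = rng.poisson(mean_b, size=(n_test, mean_b.size))
    accuracy = (is_a(test_a).sum() + (~is_a(test_b)).sum()) / (2 * n_test)
    return accuracy, accuracy > threshold
```

Sweeping a stimulus parameter (for example, display ppi) and finding where accuracy crosses the threshold gives the predicted limit to compare with human data.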

Preferred Knowledge: familiarity with at least one machine learning algorithm

Mentor: Haomiao Jiang

Hardware project: Build a Multispectral Imaging System

Build a multispectral imaging system based on a rotating color filter wheel and monochrome camera.

If you have experience in design and 3D printing, you can build several necessary parts.

If you have an interest in engineering applications for art history, there is an opportunity to use the system to capture images of paintings in the Cantor Arts museum.

Mentors: Henryk Blasinski and Joyce Farrell

Hardware project: Build an inexpensive spectrophotometer

In this project, you will build a simple spectrophotometer using a clean DVD-R, a USB webcam, and stiff black card paper.

Here's a website introducing how to do it:

After building the device, you will compare its performance to that of a much more expensive (~$50K) spectrophotometer that we have in the lab.
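Before comparing against the lab instrument, the webcam's pixel positions must be mapped to wavelengths. A common trick is to image an ordinary fluorescent lamp, whose mercury lines near 435.8 nm and 546.1 nm give known reference wavelengths. The sketch below locates a line peak with subpixel (parabolic) interpolation and fits a pixel-to-wavelength polynomial. The function names and any pixel positions are hypothetical, and a linear fit is only an approximation for a DVD grating.

```python
import numpy as np


def peak_pixel(profile, lo, hi):
    """Subpixel location of the brightest sample in profile[lo:hi] (parabolic fit).

    Assumes the maximum is not at either end of the full profile.
    """
    i = lo + int(np.argmax(profile[lo:hi]))
    y0, y1, y2 = profile[i - 1], profile[i], profile[i + 1]
    denom = y0 - 2.0 * y1 + y2
    return float(i) if denom == 0 else i + 0.5 * (y0 - y2) / denom


def wavelength_calibration(peak_pixels, known_nm, degree=1):
    """Fit a polynomial mapping pixel position to wavelength (nm)."""
    return np.poly1d(np.polyfit(peak_pixels, known_nm, degree))
```

With more reference lines (or a second lamp), a quadratic fit (`degree=2`) can absorb some of the nonlinearity.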

Mentor: Haomiao Jiang

Camera Image Quality Metrics

The International Organization for Standardization (ISO) is developing a set of camera image quality metrics to quantify the spatial resolution, noise and color accuracy of digital cameras.

Many of these metrics have been implemented in ISET.

You can use ISET to calculate these metrics for simulated cameras that have different optical properties, numbers of pixels and image processing methods. You can also use ISET to simulate how each camera captures and processes natural scenes (e.g. faces and landscapes). You can then compare the metrics with the appearance of these images as they are rendered on a display.

In this project, you will use ISET and CPIQ to quantify and illustrate how the metrics and the images change when you decrease the size of camera pixels (and correspondingly increase the number of camera pixels). This will allow you to analyze how resolution trades off against sensitivity: small pixels make it possible to increase the number of sensor pixels sampling the optical irradiance image, but each small pixel also captures fewer photons. Which do you prefer, a high-resolution noisy image or a low-resolution clean image? How does this depend on the display, viewing distance, etc.?
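The sensitivity side of the tradeoff can be seen with a back-of-the-envelope model before running ISET: the signal in electrons scales with pixel area, shot noise is the square root of the signal, and read noise adds in quadrature. The quantum efficiency and read noise defaults below are illustrative assumptions, and dark current and other noise sources are ignored.

```python
import math


def pixel_snr(pitch_um, photons_per_um2, read_noise_e=2.0, qe=0.6):
    """Single-pixel SNR with shot noise and read noise only.

    Signal electrons scale with pixel area, so halving the pitch quarters the signal.
    """
    signal = qe * photons_per_um2 * pitch_um ** 2
    return signal / math.sqrt(signal + read_noise_e ** 2)
```

When shot noise dominates, doubling the pixel pitch doubles the SNR (the square root of a 4x larger signal). At low light, read noise takes over and small pixels lose even more. ISET models these effects, and many others, in full.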

Mentor: Joyce Farrell

ISET model for real camera

In this project, you will build an accurate ISET model for a physical camera we have. You will take pictures of known scenes, analyze the captured images, and try to build an ISET model.

The goal is for the ISET model of the camera to give approximately the same computational results as the RAW output from the real camera. Similarity can be measured in terms of noise, color, spatial resolution, etc. Analyzing the errors between the model and the real camera will determine the model's accuracy.

If time permits, you can try to implement an image processing pipeline for the camera and evaluate the performance of the processed images.

Mentors: Qiyuan Tian, Steve Lansel, Joyce Farrell

Analysis and Compression of L3 Filters

The L3 algorithm is a learned image processing pipeline for cameras. The algorithm learns optimal linear filters for a given camera based on training data, light level, illumination color, and optics. For a complete camera this may result in many (possibly hundreds of) optimized filters. We believe the filters will be closely related for similar camera settings. The goal is to analyze the filters, store a compressed set of filters, and interpolate the needed filters from this compressed set. This way we only need to store a smaller set of filters and can still produce filters for many new camera settings. Here is a recent SPIE paper on L3:
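One plausible starting point, and only one of many, is principal component analysis: stack each flattened filter as a row, keep the mean plus a few principal components, and interpolate the per-setting coefficients along a camera setting such as log light level. This sketch assumes a single, increasing 1-D setting axis. The function names are hypothetical and not part of the L3 code base.

```python
import numpy as np


def compress_filters(filters, k):
    """Represent M filters (rows of `filters`, shape (M, P)) by a mean, k basis vectors, and coefficients."""
    mean = filters.mean(axis=0)
    _, _, vt = np.linalg.svd(filters - mean, full_matrices=False)
    basis = vt[:k]                        # (k, P) principal filter shapes
    coeffs = (filters - mean) @ basis.T   # (M, k) weights for each stored setting
    return mean, basis, coeffs


def interpolate_filter(mean, basis, coeffs, settings, new_setting):
    """Linearly interpolate coefficients over an increasing 1-D setting (e.g., log light level)."""
    c = np.array([np.interp(new_setting, settings, coeffs[:, j]) for j in range(basis.shape[0])])
    return mean + c @ basis
```

Plotting the singular values shows how many components are needed; if the filters really are closely related, a handful should capture almost all of the variation.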

Mentors: Qiyuan Tian, Steve Lansel, Brian Wandell

ISET model for underwater imaging

With the proliferation of cameras such as GoPro more and more people have started taking underwater images. These usually have large amounts of distortion, both spectral and spatial, originating from the medium in which the image was taken. Rather than experiment in the real world, the impact of different light transport phenomena on RGB images can be understood via simulation environments. In this project you will implement, enhance and integrate with ISET the underwater image simulation system described in the paper below.

Color image simulation for underwater optics

Mentors: Joyce Farrell and Henryk Blasinski

App for Programmable Camera in iOS / Android

We have a prototype programmable camera to be used with iOS or Android devices. The project's goal is to make an app that will run on iOS or Android and uses the camera. Think of an interesting camera app, and we can work together to build it. Prior experience in iOS or Android is needed.

Mentors: Steve Lansel and Munenori Fukunishi

Image classification with a five band camera

Recently, image classification and object recognition have become very popular topics. The large majority of algorithms, if not all, use images acquired with traditional three-channel (RGB) cameras. The goal of the project is to evaluate the performance of state-of-the-art algorithms applied to images captured with a five-band camera. Will the recognition/classification performance change, and if so, by how much? To get a flavor of the project, you can look at the following paper:

Multispectral SIFT for scene category recognition

Mentors: Henryk Blasinski, Steve Lansel

Analysis of a real camera lens

Can we characterize how a lens blurs a point of light (point spread functions or psfs) by analyzing camera images of test targets that are displayed on a color monitor? This project has many possible variations.

  1. Illuminate red, green and blue pixels on a display and capture an image of the display with a camera placed on a tripod a far distance away. Vary the pattern of red, green and blue pixels (e.g. noise pattern).
  2. Estimate the psfs of a real camera with a real lens by analyzing camera images of displayed targets. Use a prosumer digital camera, vary the f-number, and observe how the estimated psfs change.
  3. Estimate the psfs for different field heights, wavelengths and depths.
  4. Use the estimated psfs to predict camera images of other displayed "natural" images, such as a face. Compare the predicted camera images to actual camera images.

Here are links to papers that describe a method for empirically estimating the psf of a camera lens. The links include code that you can download.


People: Brian Wandell, Andy Lin, Joyce Farrell

Psf analysis and image deblurring using a simulated camera lens

The point spread function (psf) of a lens is an extremely important lens property. One application of knowing the psf is image deconvolution (deblurring), which can drastically improve image sharpness. The following paper provides a good technique for estimating a psf and deconvolving an image with that psf:


  1. Andy Lin will provide simulated camera images of several different types of spatial test targets. Your task will be to use the code accompanying the paper to estimate psfs from the simulated camera images.
  2. To evaluate how well the psf estimation code works, compare the estimated psfs to the known psfs that Andy used to generate the simulated camera images.
  3. As another evaluation technique, use the estimated and known psfs to "deblur" a blurred image containing a secret message using the deconvolution code downloaded from the same site. The secret message will only be legible after proper deconvolution of the image. Andy will provide this blurred image.
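As a sanity check alongside the paper's deconvolution code, a classic Wiener filter gives a simple baseline for the deblurring step: with a known (or estimated) psf, divide in the frequency domain while regularizing frequencies where the optical transfer function is weak. This is a swapped-in standard technique, not the method of the paper above. It assumes circular (periodic) blur and a constant noise-to-signal ratio `nsr`, which is a tuning parameter.

```python
import numpy as np


def psf_to_otf(psf, shape):
    """Zero-pad a small, centered psf to `shape` and move its center to pixel (0, 0)."""
    padded = np.zeros(shape)
    ph, pw = psf.shape
    padded[:ph, :pw] = psf
    padded = np.roll(padded, (-(ph // 2), -(pw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Frequency-domain Wiener deconvolution with a constant noise-to-signal ratio."""
    otf = psf_to_otf(psf, blurred.shape)
    restored = np.fft.ifft2(np.fft.fft2(blurred) * np.conj(otf) / (np.abs(otf) ** 2 + nsr))
    return np.real(restored)
```

Running it once with the known psf and once with the estimated psf, and comparing how legible the secret message becomes, is a direct way to judge the estimate. Increase `nsr` if the result shows ringing or amplified noise.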

Mentor: Andy Lin

Medical imaging: Super resolution microscopy

Super-resolution microscopy refers to methods that build up a high-resolution image of a target by integrating multiple images of the target, each illuminated so that only a small subset of the image points is captured. Each camera image thus samples a subset of the pixels in a high-resolution image, and the locations of the sampled points in many camera images are combined to construct a single full high-resolution image of the target. By placing a point at the center of each sampled spot, one can get very accurate spatial information about the location (phase) of illuminated points in the target. Because the center of a spot can be located much more precisely than the width of the lens psf, some people assert that super-resolution methods beat the diffraction limit of the lens. But you know better than that. Diffraction is a limit that no earthly being can beat. Nonetheless, by sampling with stochastic and sparse arrays of pixels, one can do a better job of locating the centers of sampled points and hence build up a higher-resolution image.

You can write an ISET simulation to test one of these super-resolution methods.

Alternatively, you can test methods for super-resolution imaging using real camera images. For example, take a camera image of a displayed image (such as a face or a high resolution test chart). Then capture a series of images of the display when only a subset of the pixels in the face (or chart) is illuminated. The illuminated pixels in each subset should be far enough apart that their optical images do not overlap. You can experiment further by taking a blurry image of the face (say, by setting the camera f/# to 12). Then display subsets of widely separated pixels of the face, find the location of the center of each illuminated pixel, and combine the data to create a sharp camera image.
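One way to generate the sparse display patterns and reassemble the result is sketched below. It assumes a regular lattice of lit pixels with spacing `stride` (so `stride * stride` frames cover every display pixel) and assumes each camera frame has already been reduced to a dictionary of spot centers, mapped into display coordinates by a separate calibration, plus their measured intensities. Both function names are hypothetical.

```python
def sparse_masks(height, width, stride):
    """Split the display grid into stride*stride lattices; lit pixels within a mask are `stride` apart."""
    masks = []
    for oy in range(stride):
        for ox in range(stride):
            masks.append([(y, x) for y in range(oy, height, stride)
                                 for x in range(ox, width, stride)])
    return masks


def reconstruct(height, width, frames):
    """frames: list of dicts mapping an estimated (y, x) spot center, in display pixel
    coordinates, to the intensity measured for that spot. Returns the assembled image."""
    image = [[0.0] * width for _ in range(height)]
    for measurements in frames:
        for (y, x), value in measurements.items():
            iy, ix = int(round(y)), int(round(x))
            if 0 <= iy < height and 0 <= ix < width:
                image[iy][ix] = value
    return image
```

Choose `stride` large enough that the blurred images of neighboring lit pixels do not overlap on the sensor; a blurrier camera (larger f/#) needs a larger stride and therefore more frames.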

Mentor: Brian Wandell, Haomiao Jiang

Eulerian video processing (Bill Freeman thing)

Repeat one of the experiments from Bill Freeman. Their published paper can be found at

Also, compare the results for cameras with 3 color channels (RGB) and with 5 color channels (a prototype in our lab).
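The core of Eulerian processing is to treat each pixel's value over time as a signal, band-pass filter it in time, amplify that band, and add it back. The sketch below uses the difference of two first-order low-pass filters as the temporal band-pass, which is one simple choice; the spatial pyramid decomposition used in the published work is omitted here, and the parameter values are assumptions to tune. For the 3-channel versus 5-channel comparison, the same function can be run on each color channel separately.

```python
def lowpass(series, r):
    """First-order IIR low-pass: y[n] = y[n-1] + r * (x[n] - y[n-1]); larger r passes higher frequencies."""
    out = [series[0]]
    for value in series[1:]:
        out.append(out[-1] + r * (value - out[-1]))
    return out


def magnify_series(series, r_fast=0.5, r_slow=0.05, alpha=10.0):
    """Amplify temporal variations between the two cutoffs by a factor related to alpha."""
    fast = lowpass(series, r_fast)
    slow = lowpass(series, r_slow)
    return [v + alpha * (f - s) for v, f, s in zip(series, fast, slow)]


def magnify_video(frames, **kwargs):
    """frames: list of 2D lists indexed [t][row][col]; each pixel is filtered independently."""
    rows, cols = len(frames[0]), len(frames[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in frames]
    for i in range(rows):
        for j in range(cols):
            series = [frame[i][j] for frame in frames]
            for t, value in enumerate(magnify_series(series, **kwargs)):
                out[t][i][j] = value
    return out
```

A static pixel passes through unchanged, while a small periodic fluctuation inside the pass band comes out several times larger.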

Scene 3D System

The goal of the Scene3D project is to simulate the complete imaging pipeline for 3D scenes: from the scene, through the lens and the sensor, to the image processing. Simulations of sensor and image processing are implemented in ISET. The novel part of Scene3D involves using a technique in 3D graphics called ray-tracing, which produces a physically accurate simulation of light rays that are refracted through lenses and towards the sensor. We modified the PBRT ray-tracer to simulate the important effects of diffraction and to be able to handle complex lenses and multispectral inputs and outputs. The end goal of the Scene3D project is to provide an infrastructure for rapid image systems prototyping.
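The elementary operation a lens ray-tracer repeats at every lens surface is refraction by Snell's law. A minimal vector-form sketch follows; it is not PBRT's code, just the standard formula, and the surface normal is assumed to be a unit vector facing the incoming ray. For multispectral tracing, the refractive indices would be looked up per wavelength, which is what produces chromatic aberration.

```python
import math


def refract(direction, normal, n1, n2):
    """Refract a unit direction vector crossing from index n1 into index n2.

    `normal` is the unit surface normal pointing back toward the incoming ray.
    Returns the refracted unit vector, or None on total internal reflection."""
    eta = n1 / n2
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None                      # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + (eta * cos_i - cos_t) * n for d, n in zip(direction, normal))


# A ray hitting glass (n = 1.5) at 30 degrees bends toward the normal: sin(theta_t) = sin(30°) / 1.5.
d = (math.sin(math.radians(30)), 0.0, -math.cos(math.radians(30)))
print(refract(d, (0.0, 0.0, 1.0), 1.0, 1.5))
```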

[Scene3D project]

One important aspect of photography is color balancing. Photographs taken under different illuminants often do not appear natural. For example, images taken under indoor tungsten lighting exhibit an unnatural yellow/orange tint, and they must be corrected to appear natural.

This class project applies the camera pipeline simulation provided by the Scene3D infrastructure to the design of a color-balancing algorithm.


  1. Start with a 3D radiance scene generated by Andy Lin. Modify the parameters of the scene to render it under different lighting conditions.
  2. Design and implement an intelligent method for "correcting" (color-balancing) the illuminant.
  3. (Challenge/Optional) Design a color balancing method that is able to correct for scenes with 2 or more different illuminants.
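A useful baseline to beat in step 2 is the gray-world method, which assumes the scene averages to neutral and scales each channel so the channel means match. The sketch below is that textbook baseline, not the "intelligent" method the project asks for; it assumes linear (not gamma-encoded) camera RGB values. For step 3, one could apply it region by region instead of globally.

```python
def gray_world_gains(pixels):
    """pixels: list of linear (r, g, b) values. Returns per-channel gains that make the
    red and blue channel means equal to the green channel mean."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [means[1] / means[c] for c in range(3)]


def apply_gains(pixels, gains):
    """Scale every pixel's channels by the given gains."""
    return [tuple(v * g for v, g in zip(p, gains)) for p in pixels]


# Two surfaces with an orange (tungsten-like) cast: red high, blue low.
scene = [(2.0, 1.0, 0.5), (4.0, 2.0, 1.0)]
balanced = apply_gains(scene, gray_world_gains(scene))   # both pixels become neutral grays
```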

Mentor: Andy Lin

PBRT and Zemax optics modeling

Scene3D uses a combination of PBRT and ISET to simulate the complete image processing pipeline of a digital camera. The unique contribution of Scene3D is that it applies a 3D graphics technique called ray-tracing to produce a physically accurate simulation of light rays as they are refracted through lenses and towards the sensor. We modified the PBRT ray-tracer to simulate the important effects of diffraction and to be able to handle complex lenses and multispectral inputs and outputs. However, we have yet to verify this pipeline empirically.

One way we plan to evaluate our modifications to PBRT is to compare the point spread functions we generate with point spread functions generated by Zemax, a well-established software package used by many optics professionals. We provide a Zemax macro that can be used to generate the PSFs that ISET needs. Although Zemax can produce physically accurate PSFs, it cannot render physically accurate 3D multispectral images as PBRT can.


  1. This project involves taking several PBRT multi-element lens models and creating equivalent Zemax models.
  2. Use the Zemax-to-ISET interface to generate the data necessary for the ISET simulations.
  3. PSFs computed with the PBRT models will be provided as ISET optical images. Use the Zemax macro to generate PSFs for the equivalent Zemax lens models.
  4. As verification, compare and analyze the PSFs produced by these two methods for different apertures and object distances.
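For step 4 it helps to reduce each pair of PSFs to a few numbers. The sketch below computes an RMS difference after normalizing both PSFs to unit volume, the shift between their centroids, and the difference in encircled energy within a chosen radius. These particular metrics are a suggested starting point rather than a prescribed protocol, and the code assumes both PSFs have already been resampled onto the same pixel grid and spacing, which is a real step when comparing PBRT and Zemax output.

```python
import math


def normalize(psf):
    """Scale a 2D PSF (list of rows) so that it sums to one."""
    total = sum(map(sum, psf))
    return [[v / total for v in row] for row in psf]


def psf_centroid(psf):
    """Centroid (x, y), in pixels, of a normalized PSF."""
    cx = sum(v * x for row in psf for x, v in enumerate(row))
    cy = sum(v * y for y, row in enumerate(psf) for v in row)
    return cx, cy


def encircled_energy(psf, radius):
    """Fraction of a normalized PSF's energy within `radius` pixels of its centroid."""
    cx, cy = psf_centroid(psf)
    return sum(v for y, row in enumerate(psf) for x, v in enumerate(row)
               if math.hypot(x - cx, y - cy) <= radius)


def compare_psfs(psf_a, psf_b, radius=2.0):
    """Summary differences between two PSFs sampled on the same grid."""
    a, b = normalize(psf_a), normalize(psf_b)
    count = sum(len(row) for row in a)
    rms = math.sqrt(sum((va - vb) ** 2 for ra, rb in zip(a, b)
                        for va, vb in zip(ra, rb)) / count)
    (ax, ay), (bx, by) = psf_centroid(a), psf_centroid(b)
    return {
        "rms_difference": rms,
        "centroid_shift_px": math.hypot(ax - bx, ay - by),
        "encircled_energy_diff": encircled_energy(a, radius) - encircled_energy(b, radius),
    }
```

Tabulating these values across apertures and object distances makes it easy to see where the two methods diverge.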

Experience with Zemax is preferred.

Mentor: Andy Lin

Gesturing in a Virtual 3D space

The Holografika multi-projector display system creates a 3D light field that people can view without the need for special goggles. Leap Motion is a controller that can sense small finger movements using an infrared LED and camera. We linked these two devices so that users can grasp and move virtual objects in the 3D light space created by the Holografika display. We also linked the Leap Motion to a conventional stereoscopic display that uses an LCD with shutter goggles. The goal of this project is to compare how well users can use the hand-gesture controller to move objects in the virtual 3D spaces created by the two different types of displays.

The project has several possible variations:

  • Find a suitable OpenGL app or game in the Leap Motion Airspace app store that measures agility, and use it to quantify the learning rates of new users. The objects floating in front of the Holografika display will be aligned to the user's hands in 3D space, but this is not so with the flat LCD display. Possibly include mouse mode in the tests.
  • Using a 3D top-down street view map of London, test users' skill at finding a location by panning and zooming a holographic 3D map of London on both kinds of displays, using hand gestures. Does the user's self-reported confidence correlate with measured performance, and how does display type affect that?

Use the metrics to predict the actual benefit, for different kinds of organizations, of transitioning from mouse control to (hands-in-air) gesture devices with 2D and holographic displays.
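Two of the measurements above, learning rate and the link between confidence and performance, can be summarized with simple statistics. The sketch below fits a power law of practice, T_n = a * n^(-b), to per-trial completion times by least squares on log-log axes and reports b as the learning rate, and computes a Pearson correlation between confidence ratings and scores. The power-law model is a common choice in skill-learning studies but is an assumption here; other learning-curve models may fit your data better.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def learning_rate(trial_times):
    """Fit T_n = a * n**(-b) to completion times for trials n = 1, 2, ...; return b.
    A larger b means completion times fall faster with practice."""
    xs = [math.log(n) for n in range(1, len(trial_times) + 1)]
    ys = [math.log(t) for t in trial_times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope
```

Comparing the fitted `b` for the same users on the Holografika and LCD setups gives one direct number for the display-type effect.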

The equipment is calibrated and available in Packard 070.

Here is a link to the companies involved: and

You can watch a video of the talk by the inventor of Holografika (Tibor Balogh) at

Mentors: Dave Singhal, Harlyn Baker, Peter Kovacs

Projects 2013

Camera Forensics

You are presented with a digital image and asked to determine if it has been manipulated and if so to localize the manipulation in the image. Color filter array (CFA) interpolation generates a tell-tale signature in a digital image that can be used in a forensic setting. CFA interpolation leads to strong correlations between a specific subset of pixels and their spatial and chromatic neighbors. Build a classifier that takes as input a digital image and automatically detects which parts of an image do and do not exhibit the expected CFA correlations. Begin by generating a synthetic set of test images that have undergone your choice of CFA interpolation. Test your forensic analysis on these uncompressed images and then quantify the efficacy of your approach on increasingly more JPEG compressed images. Disputes often erupt over the provenance of photos. Consider how you might use your new forensic technique to distinguish between images taken from different types of cameras (e.g., a Canon PowerShot vs. a Nikon D-series).


  1. A Survey of Image Forgery Detection
  2. Exposing Digital Forgeries in Color Filter Array Interpolated Images


  1. We provide you with training images
  2. You develop the classifier based on the papers
  3. We provide you with test images to see how you did
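A simplified version of the CFA-correlation idea can be sketched on the green channel of a Bayer image. If green was bilinearly interpolated, every interpolated pixel equals the average of its four neighbors, so the prediction residual on that lattice is near zero while the residual on the sampled lattice is not. This fixed-bilinear test is a stand-in for the EM-based estimation of interpolation weights described in the CFA paper listed above. It assumes sampled green sits where (x + y) is even or odd, which the score handles by taking the ratio of the smaller to the larger residual. To localize tampering, compute the score block by block.

```python
def neighbor_residual(green, parity):
    """Mean |G(y, x) - mean of its 4 neighbors| over interior pixels with (x + y) % 2 == parity."""
    h, w = len(green), len(green[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (x + y) % 2 != parity:
                continue
            avg = (green[y - 1][x] + green[y + 1][x] + green[y][x - 1] + green[y][x + 1]) / 4
            total += abs(green[y][x] - avg)
            count += 1
    return total / count


def cfa_score(green):
    """Ratio of the two lattice residuals: near 0 suggests bilinear CFA interpolation,
    near 1 suggests the expected CFA correlations are absent."""
    r0, r1 = neighbor_residual(green, 0), neighbor_residual(green, 1)
    lo, hi = min(r0, r1), max(r0, r1)
    return lo / hi if hi > 0 else 1.0
```

JPEG compression blurs the distinction between the two lattices, so tracking how the score drifts toward 1 as quality decreases is one way to quantify the effect of compression.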

Image Forensics

You are presented with a JPEG image and asked to determine if it originated directly from a camera/mobile device, or if it was re-saved one or more times. Multiple compressions at different compression levels leave behind specific statistical artifacts in the distribution of DCT coefficients. These artifacts can be used to distinguish between singly and multiply compressed images. Build a classifier that can distinguish between singly and doubly compressed images (assume that the second compression level is different from the first). Validate your classifier on a large data set of images. Quantify the conditions under which the classifier is and is not effective. Extend your classifier to distinguish between one, two, and three compressions. Suppose the expert forger becomes aware of your forensic technique and writes a special purpose encoder that re-saves a JPEG image with the same compression quality as the original. Consider how you might counter this by detecting multiple compressions made with the same compression setting.


  1. A Survey of Image Forgery Detection
  2. Statistical Tools for Digital Forensics


  1. We provide you with training images
  2. You develop the classifier based on the papers
  3. We provide you with test images to see how you did
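The double-compression artifact can be seen in a toy experiment. Quantizing values with step q1, dequantizing, and requantizing with a different step q2 leaves some histogram bins systematically empty (or, with real data, periodically depleted), while a single quantization does not. The sketch uses uniformly spread synthetic integers instead of real DCT coefficients, which follow a peaked distribution and are quantized separately for each DCT frequency. The function names are hypothetical.

```python
def quantize(values, q):
    """Quantize to integer bin indices with step q."""
    return [round(v / q) for v in values]


def double_quantize(values, q1, q2):
    """Quantize with q1, dequantize, then requantize with q2 (a re-saved JPEG, per coefficient)."""
    return [round(k * q1 / q2) for k in quantize(values, q1)]


def empty_bin_fraction(indices):
    """Fraction of histogram bins between the smallest and largest index that are never used."""
    lo, hi = min(indices), max(indices)
    occupied = set(indices)
    return sum(1 for k in range(lo, hi + 1) if k not in occupied) / (hi - lo + 1)


values = list(range(-200, 201))
print(empty_bin_fraction(quantize(values, 3)))            # single compression: no gaps
print(empty_bin_fraction(double_quantize(values, 5, 3)))  # double compression: periodic gaps
```

With real images the gaps become periodic peaks and valleys rather than exact zeros, so a practical classifier looks for that periodicity in each coefficient's histogram.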

Turbulence removal

X. Zhu and P. Milanfar, "Removing Atmospheric Turbulence via Space-Invariant Deconvolution" IEEE Trans. on Pattern Analysis and Machine Intelligence vol. 35, no. 1, pp. 157-170, Jan. 2013

Also see related talk and Project page


  1. You obtain example images, by measurement or simulation, and then apply their method.
  2. You develop a variant of their method, exploring deconvolution, registration, or some other part of the algorithm more deeply than in the original paper.
  3. You find another approach and compare that approach to this one.
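As a baseline for option 2 or 3, the simplest registration-and-fusion scheme aligns each frame to a reference by the integer shift that maximizes cross-correlation and then averages. The sketch below does exactly that with circular shifts. It is deliberately much cruder than Zhu and Milanfar's method, which uses non-rigid registration because turbulence warps different parts of the image differently, followed by deconvolution. Treat it as a point of comparison, not a reimplementation.

```python
def shift_image(image, dy, dx):
    """Circularly shift a 2D list so that content moves down by dy and right by dx."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def best_shift(reference, frame, max_shift=3):
    """Integer (dy, dx) that, applied to `frame`, maximizes its correlation with `reference`."""
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = shift_image(frame, dy, dx)
            score = sum(r * m for rr, mr in zip(reference, moved) for r, m in zip(rr, mr))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def register_and_average(frames, max_shift=3):
    """Align every frame to the first one with a global integer shift, then average."""
    reference = frames[0]
    aligned = [shift_image(f, *best_shift(reference, f, max_shift)) for f in frames]
    h, w = len(reference), len(reference[0])
    return [[sum(a[y][x] for a in aligned) / len(aligned) for x in range(w)] for y in range(h)]
```

Replacing the single global shift with shifts estimated on small patches is a natural first step toward handling the spatially varying warps that turbulence produces.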

Photon calculator utility (ISET)

Build a program, perhaps based on the ISET library, that calculates the spectral irradiance at the sensor from the scene radiance and a specification of the optics. Doing this for diffraction-limited optics, specifying only the f/#, is sufficient.

The utility should be backed by a wiki page that illustrates all of the steps in doing that calculation. This project should produce an educational and useful calculator.

  • Doing an implementation that can run on a browser on the Internet is best.
  • Doing a straight Matlab implementation with a nice GUI is also good.
  • Implementing the ISET (Matlab) routines as a Python calculator has value, as well.
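The heart of the calculation fits in a few lines. For a Lambertian scene of radiance L imaged on axis by an ideal, lossless lens with f-number N and magnification m, the cone of light reaching the image has half-angle θ with tan θ = 1/(2N(1+|m|)), so the image irradiance is E = π L sin²θ = π L / (1 + 4N²(1+|m|)²), close to π L / (4N²) for distant objects and large N. Dividing energy by the photon energy hc/λ converts the result to photons. The sketch ignores lens transmission, off-axis (cos⁴) falloff and vignetting. Diffraction changes how the light is spread spatially, not this mean level, so for diffraction-limited optics it would be applied as a separate blur step.

```python
import math

PLANCK = 6.62607015e-34      # J*s
LIGHT_SPEED = 2.99792458e8   # m/s


def irradiance_from_radiance(radiance, f_number, magnification=0.0):
    """On-axis image irradiance, W/(m^2*nm), from scene radiance in W/(sr*m^2*nm),
    for a Lambertian scene and an ideal, lossless lens."""
    return math.pi * radiance / (1.0 + 4.0 * f_number ** 2 * (1.0 + abs(magnification)) ** 2)


def energy_to_photons(irradiance, wavelength_nm):
    """Convert W/(m^2*nm) to photons/(s*m^2*nm) at the given wavelength."""
    return irradiance * (wavelength_nm * 1e-9) / (PLANCK * LIGHT_SPEED)


# Spectral example: apply both steps wavelength by wavelength.
wavelengths = [450, 550, 650]
radiance = [0.02, 0.03, 0.025]
photons = [energy_to_photons(irradiance_from_radiance(L, f_number=4.0), w)
           for L, w in zip(radiance, wavelengths)]
```

A wiki page for the utility could walk through exactly these two steps, with a plot of the spectral radiance before and the spectral photon irradiance after.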

Updating Wikipedia

Help us make Wikipedia better. There are surprisingly many Wikipedia entries on imaging and human vision that are just a few sentences long. Look up, for example, 'Troland', 'Stiles-Crawford effect', 'Photopic vision', 'Human PSF' or 'Active Pixel Sensor' to see how poor these entries are. Your mission, should you choose to accept it, is to improve these (or other) entries. Think of your work as a paper that is published online rather than as a PDF. Just as with writing any research paper, your work should start with a thorough literature review; then select the relevant information and write it up in a way that is approachable for a non-expert in the field.

Neuroimaging (special approval)

With the opening of Stanford's Center for Cognitive and Neurobiological Imaging (CNI), we now have access to a large number of MR scans of the human brain. We are also closely connected to the MR hardware and image processing algorithms.

While this course is not specifically about neuroimaging, some of the methods in the course might be usefully applied to the data collected at the CNI. For students already working in MR and interested in such signal processing, we might be able to develop some projects that build on your interest.

Two possible projects are algorithms to:

  • Identify when two MR images are of the same brain (brainprint), even if they were acquired using different contrasts.
  • Evaluate image quality and MR artifacts
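For the brainprint idea, images with different contrasts (for example T1 and T2) do not match pixel by pixel, but their intensities remain statistically dependent. Mutual information, widely used for registering multi-contrast images, is one candidate similarity score. The sketch below computes it from a joint histogram. It assumes the two volumes are already registered and scaled to [0, 1]; treating a high score as evidence of "same brain" is an assumption the project would need to test.

```python
import math
from collections import Counter


def mutual_information(image_a, image_b, bins=16):
    """Histogram-based mutual information (nats) between two equally sized 2D images
    whose values lie in [0, 1]."""
    def bin_of(v):
        return min(int(v * bins), bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for ra, rb in zip(image_a, image_b) for a, b in zip(ra, rb)]
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(i for i, _ in pairs)
    pb = Counter(j for _, j in pairs)
    mi = 0.0
    for (i, j), count in joint.items():
        pij = count / n
        mi += pij * math.log(pij / ((pa[i] / n) * (pb[j] / n)))
    return mi
```

An intensity-inverted copy of an image (a crude "different contrast" of the same structure) scores high, while an unrelated image scores near zero; the number of bins trades off sensitivity against histogram noise.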

Scene database for computer vision testing (special approval)

Build scenes, say using Blender and PBRT, that we can run through the ISET simulation to produce images. Then analyze these calibrated scenes using computer vision algorithms to derive the depth, illumination, and shading. See this example page for folks who created a database from real, rather than simulated, scenes.

Color balancing (special approval)

Color balancing refers to the process of converting camera rgb data into display rgb values. If one simply copies the sensor pixel values into the display values, the resulting image will not generally be a good color representation of the original scene. An important step in the image processing pipeline is to transform camera rgb values to display values such that the display image appears to match the original scene that was captured.

A simple and common approach to color balancing is to make an educated guess about the scene illumination based on an analysis of the camera rgb values. The estimated illuminant is used to select a color transform (typically a 3x3 transform or a look-up table) that maps camera rgb values into human sensor (xyz) values for an ideal illuminant, such as daylight. The goal of this transform is to render the scene that the camera captured as if the scene were illuminated by daylight.
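The estimate-then-select approach can be sketched as follows. The illuminant is guessed with the max-RGB ("white patch") heuristic, the nearest candidate illuminant is chosen by its camera chromaticity, and that illuminant's 3x3 matrix maps camera RGB to XYZ. Every number in `TRANSFORMS` and `WHITE_POINTS` is a hypothetical placeholder; real matrices and white points must come from calibrating the specific camera, and max-RGB is only one of many possible illuminant estimators.

```python
# Hypothetical 3x3 camera-RGB-to-XYZ matrices, one per candidate illuminant (placeholders only).
TRANSFORMS = {
    "tungsten": ((0.9, 0.2, 0.1), (0.3, 0.9, 0.1), (0.0, 0.1, 1.6)),
    "daylight": ((0.6, 0.3, 0.1), (0.2, 0.8, 0.0), (0.0, 0.1, 1.0)),
}
# Hypothetical camera chromaticities (r/g, b/g) of a white surface under each illuminant.
WHITE_POINTS = {"tungsten": (1.8, 0.5), "daylight": (1.0, 1.0)}


def estimate_white(pixels):
    """Max-RGB estimate of the illuminant: the largest value in each channel."""
    return tuple(max(p[c] for p in pixels) for c in range(3))


def select_illuminant(white):
    """Pick the candidate illuminant whose white point chromaticity is closest."""
    r, g, b = white
    chroma = (r / g, b / g)
    return min(WHITE_POINTS, key=lambda name: (WHITE_POINTS[name][0] - chroma[0]) ** 2
                                              + (WHITE_POINTS[name][1] - chroma[1]) ** 2)


def apply_transform(matrix, rgb):
    """Multiply a 3x3 matrix (tuple of rows) by an RGB triple."""
    return tuple(sum(m * v for m, v in zip(row, rgb)) for row in matrix)


def camera_to_xyz(pixels):
    """Estimate the illuminant, then map every pixel with that illuminant's transform."""
    name = select_illuminant(estimate_white(pixels))
    return name, [apply_transform(TRANSFORMS[name], p) for p in pixels]
```

For the preference study, the same structure applies with the target illuminant varied: one set of matrices renders to D65, another to each alternative rendering illuminant.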

Most camera processing pipelines use a standard illuminant called D65 as the ideal rendering illuminant. As far as we know, no one has tested the assumption that people prefer to view objects illuminated by D65. The preferred rendering illuminant may also depend on the objects that are being rendered.

The project will use hyperspectral data of faces, fruit and vegetables and outdoor scenes, and spectral power distributions of different illuminants to generate images that people will view on calibrated displays. People will be asked to indicate which color renderings they prefer. In this way, we will collect preference data about preferred rendering illuminants. The preference data will provide a useful guide for engineers who are designing color balancing methods.

Hyperspectral video (special approval)

Help us build and evaluate a hyperspectral video system based on LED lights synced with a monochrome video camera. Capture hyperspectral video images of human faces and estimate pulse rate from the change in color sensor values over time (see
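Once a face region has been reduced to one mean sensor value per frame (for one spectral band), the pulse rate can be estimated from the strongest frequency in a physiologically plausible range. The sketch below uses a plain DFT restricted to 0.7 to 4 Hz (42 to 240 beats per minute). That band, and the absence of detrending or motion compensation, are simplifying assumptions. Which spectral bands carry the strongest pulse signal is exactly what the hyperspectral system could help determine.

```python
import cmath
import math


def pulse_rate_bpm(signal, fps, low_hz=0.7, high_hz=4.0):
    """Estimate heart rate (beats per minute) from a per-frame mean skin value
    using the largest DFT peak between low_hz and high_hz."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not (low_hz <= freq <= high_hz):
            continue
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0
```

The frequency resolution is fps / n, so a 10 second recording resolves about 0.1 Hz (6 beats per minute); longer windows or peak interpolation improve the estimate.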

Biology of the mouse eye image formation (special approval)

There is a huge amount of biology done in mice, and there is a movement to study the mouse retina in particular. To study the retina, we would like to understand how the cornea and lens of the mouse eye blur the retinal image.

Adaptive optics to the rescue: Williams and his colleagues analyzed the optical quality of the mouse eye. Specifically, they measured the wavefront aberrations from 20 wild type mice. They provide the data in their article.

Optical properties of the mouse eye

Brainard, Hofer and I have written a wavefront toolbox in Matlab that enables us to specify the wavefront aberration and calculate retinal images in ISET. This project is to use our software to reproduce Figures 10 and 11 from the paper.

You can do this! If you do, many people will cite your project because there are many people who work on mouse.

Active LED-based illumination (special approval)

These days LEDs can produce high intensity light with well defined spectral properties. We are interested in a hardware system that allows us to control both the on/off times of a set of LEDs and their intensity, using a simple Arduino microcontroller. One way to do this (we have a working prototype; refer to this project) is to use pulse width modulated (PWM) signals to control the duty cycle of an LED. If you operate at a high enough frequency, you will perceive the rapidly flickering LEDs as having lower or higher intensity. In this project, however, we are interested in controlling the LED intensity more directly, so that even at the micro-time scale you control the LED intensity itself rather than switching it on and off.
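The difference between PWM and direct intensity control shows up as soon as the measurement window is short. The sketch below models relative LED intensity over time and averages it over an exposure window. Over a whole number of PWM periods the two drives give the same mean, but a short camera exposure sees PWM as either fully on or fully off, depending on when it starts. The waveform model (ideal square wave, instant switching) and the function names are simplifying assumptions.

```python
def pwm_waveform(duty, period_samples, periods):
    """Relative LED intensity under PWM: fully on for a `duty` fraction of each period."""
    on = round(duty * period_samples)
    return ([1.0] * on + [0.0] * (period_samples - on)) * periods


def direct_waveform(level, total_samples):
    """Relative LED intensity when the drive level itself is set (no switching)."""
    return [level] * total_samples


def exposure_average(waveform, start, length):
    """Mean intensity seen by an exposure covering waveform[start:start + length]."""
    window = waveform[start:start + length]
    return sum(window) / len(window)


pwm = pwm_waveform(0.25, period_samples=100, periods=10)
direct = direct_waveform(0.25, 1000)
# Long exposure: both average 0.25. Short exposure: PWM gives 1.0 or 0.0, direct still gives 0.25.
```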

Projects 2012

Image processing

Hyperspectral Imaging

Analysis of hyperspectral images of paintings by famous artists

Consumer digital cameras capture electromagnetic energy in three different spectral bands. Multispectral and hyperspectral cameras capture electromagnetic energy in many more spectral bands. We used two different hyperspectral cameras to capture images of several paintings in the Cantor Arts Museum. One of the cameras captures images in 160 different spectral bands ranging between 400 and 1000 nm (visible and near-infrared, or VNIR). The other camera captures images in 256 different spectral bands ranging between 1000 and 2500 nm (short-wave infrared, or SWIR). There is a very large literature on hyperspectral imaging of paintings that we will use to guide our analysis of the data we have already collected. In particular, we should be able to determine whether there is a drawing beneath the painting (an “underpainting”) and to characterize the paint pigments. This analysis will allow us to determine the history of the painting and assess its originality. We hope that this project will serve as the groundwork for an exhibit at the Cantor Arts Museum. (JEF and TS) Here is a nice website that describes methods used in art forensics:
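One common way to characterize pigments is to compare each pixel's spectrum against a library of reference spectra using the spectral angle, which ignores overall brightness and so is less sensitive to shading and illumination level. The sketch below shows this spectral angle mapper. The two-entry library is a made-up toy; a real library would use measured reflectance spectra of known pigments sampled at the camera's band centers.

```python
import math


def spectral_angle(a, b):
    """Angle (radians) between two spectra treated as vectors; 0 means identical shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))   # clamp rounding error
    return math.acos(cos)


def classify_pigment(spectrum, library):
    """Name of the library spectrum with the smallest spectral angle to `spectrum`."""
    return min(library, key=lambda name: spectral_angle(spectrum, library[name]))


# Toy four-band library (hypothetical values, not real pigment data).
library = {"pigment_a": [0.1, 0.2, 0.8, 0.9], "pigment_b": [0.9, 0.7, 0.2, 0.1]}
dim_pixel = [0.5 * v for v in library["pigment_a"]]      # same pigment, half as bright
print(classify_pigment(dim_pixel, library))               # pigment_a
```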

Analysis of hyperspectral images of live organs during surgery

Several research labs are investigating the advantages of hyperspectral imaging in robotic surgery. This is because hyperspectral cameras can capture a wider range of spectral data, including electromagnetic energy that the human eye cannot see. One of the challenges is how to map information that is normally invisible to surgeons onto visible images that enhance the ability to discriminate between different tissue types in a meaningful way. We have collected hyperspectral images of organs in a live pig during surgery. This project will analyze these data to determine if information in the invisible regions of the electromagnetic spectrum (> 700 nm) can be used to enhance the information that surgeons see during an operation. (JEF and TS)
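The simplest way to make invisible bands visible is a false-color mapping: pick three bands (for example, near-infrared bands above 700 nm), stretch each to the display range, and show them as red, green and blue. The sketch below does that with a per-band min-max stretch. It is only a baseline; finding a mapping that actually helps surgeons discriminate tissue types is the research question, and the band choice here is left to the user.

```python
def false_color(cube, band_indices):
    """cube: hyperspectral image indexed [row][col][band]. Maps the three chosen bands to
    display (R, G, B) in [0, 1] with a separate min-max stretch per band."""
    channels = []
    for b in band_indices:
        values = [px[b] for row in cube for px in row]
        lo, hi = min(values), max(values)
        scale = 1.0 / (hi - lo) if hi > lo else 0.0   # a constant band maps to 0
        channels.append((b, lo, scale))
    return [[tuple((px[b] - lo) * scale for b, lo, scale in channels) for px in row]
            for row in cube]
```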

Colorimetric reproduction of human faces

We collected VNIR hyperspectral images (160 narrowband spectral images ranging between 400 and 1000 nm) of human faces, outdoor scenes, still life (fruit) and paintings. The hyperspectral image data can be used to generate a representation of the spectral reflectance of the objects in a scene and the spectral power of the scene illumination. These representations can, in turn, be used as input to the ISET digital camera simulation software. ISET can then be used to predict the output of digital cameras with different color channels. For example, one can simulate a digital camera with three or more color channels, and vary the spectral sensitivities of each of the color channels. One can also vary the spatial distribution of those channels. Finally, one can vary both the demosaicking and color balancing algorithms in the digital camera. This project provides an excellent tutorial on how a digital camera works and gives you the opportunity to develop your own color image processing algorithms. (JEF and TS)
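The central spectral step of such a simulation is an inner product: scene radiance is reflectance times illuminant at each wavelength, and each channel's response is the sum of radiance times channel sensitivity, times the wavelength spacing. The sketch shows only that step, with toy box-shaped sensitivities. ISET adds optics, pixel geometry, noise and much more. Adding a fourth or fifth channel is just another dictionary entry.

```python
def scene_radiance(reflectance, illuminant):
    """Radiance spectrum = surface reflectance x illuminant power, wavelength by wavelength."""
    return [r * e for r, e in zip(reflectance, illuminant)]


def channel_responses(radiance, sensitivities, delta_nm):
    """Response of each channel: sum over wavelength of radiance x sensitivity x spacing.
    `sensitivities` maps a channel name to samples on the same wavelength grid."""
    return {name: sum(r * s for r, s in zip(radiance, curve)) * delta_nm
            for name, curve in sensitivities.items()}


# Toy example on a coarse grid (400, 500, 600, 700 nm) with box-shaped channel sensitivities.
radiance = scene_radiance([0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0])
rgb = channel_responses(radiance, {"b": [1, 0, 0, 0], "g": [0, 1, 0, 0], "r": [0, 0, 1, 1]}, 100.0)
```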

References and Web Links

Novel detectors for RGB and NIR

Applications, devices and algorithms for the separation of visible and near infrared signals in monolithic Si sensors


Using NIR to enhance visible data

Several Süsstrunk lab papers. Some others.

SIFT on visible and NIR

NIR Flash

- NIR flash

Photo retouching metric (Kee and Farid)

A perceptual metric for photo retouching Eric Kee and Hany Farid

Department of Computer Science, Dartmouth College, Hanover, NH 03755 October 19, 2011 (received for review July 5, 2011)

In recent years, advertisers and magazine editors have been widely criticized for taking digital photo retouching to an extreme. Impossibly thin, tall, and wrinkle- and blemish-free models are routinely splashed onto billboards, advertisements, and magazine covers. The ubiquity of these unrealistic and highly idealized images has been linked to eating disorders and body image dissatisfaction in men, women, and children. In response, several countries have considered legislating the labeling of retouched photos. We describe a quantitative and perceptually meaningful metric of photo retouching. Photographs are rated on the degree to which they have been digitally altered by explicitly modeling and estimating geometric and photometric changes. This metric correlates well with perceptual judgments of photo retouching and can be used to objectively judge by how much a retouched photo has strayed from reality.

Visibility of movie subtitles

A persistent problem in watching foreign movies is that the subtitles are sometimes illegible. Why? Because the assumed default background contrast is wrong, and you get white characters on a light background. I assume this is done automatically because it is too expensive to have people judge, frame by frame, whether the script is visible. Need I say more. An automated system that assesses the brightness of the background where subtitles are printed and then adjusts the contrast to make them legible would be a huge improvement for the industry.

E. Markman, a committed viewer of subtitled films.
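An automated check could start from the relative luminance and contrast ratio definitions used in the WCAG accessibility guidelines. Convert sRGB values to linear light, compute luminance, and compare (L1 + 0.05) / (L2 + 0.05) against a legibility threshold; WCAG uses 4.5:1 for normal text. The sketch picks white or black text for a subtitle region and flags when neither is legible enough, so a backing box is needed. Averaging the region's sRGB values is a simplification; a more conservative rule might use the brightest part of the region.

```python
def srgb_to_linear(c):
    """Convert an sRGB channel value in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb1, rgb2):
    l1, l2 = sorted((relative_luminance(rgb1), relative_luminance(rgb2)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


def choose_subtitle_style(background_pixels, min_ratio=4.5):
    """Return (text color, contrast ratio, needs_backing_box) for a subtitle region."""
    n = len(background_pixels)
    mean_bg = tuple(sum(p[c] for p in background_pixels) / n for c in range(3))
    options = {"white": (1.0, 1.0, 1.0), "black": (0.0, 0.0, 0.0)}
    color = max(options, key=lambda name: contrast_ratio(options[name], mean_bg))
    ratio = contrast_ratio(options[color], mean_bg)
    return color, ratio, ratio < min_ratio
```

Run per frame (or per shot) on the strip of pixels where subtitles appear, this gives exactly the frame-by-frame judgment that is too expensive to do by hand.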

Image Quality

3D Image Quality Metrics

Develop algorithms for shooting in 3D and displaying in 2D. Explore ways to improve the 2D rendering of 3D content in order to enhance “immersive video”.


Wavefront Toolbox


Advances in adaptive optics now make it possible to measure the wavefront aberrations of the living human eye. Many groups are making these measurements in both control subjects and subjects with different types of optical dysfunctions.

These aberrations are usually specified in a way that is difficult to apply to image processing: The aberrations are specified as the weights on a set of Zernike polynomials. It is a simple matter of programming to convert these polynomial weights to a point spread function that can be applied in image processing algorithms.
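The conversion follows a standard route. Build the pupil function, which is 1 inside the pupil with phase 2πW/λ, where W is the wavefront error given by the weighted Zernike polynomials. The PSF is then the squared magnitude of the pupil function's Fourier transform. The sketch below does this with a small, slow pure-Python DFT and implements only three second-order Zernike terms in the OSA/ANSI normalization. The grid size, pupil sampling and the `psf_from_zernike` name are illustrative assumptions. A real tool would use an FFT, more Zernike terms, and physical units for PSF sample spacing.

```python
import cmath
import math


def zernike(n, m, rho, theta):
    """A few low-order Zernike polynomials (OSA/ANSI normalization)."""
    if (n, m) == (2, 0):
        return math.sqrt(3) * (2 * rho ** 2 - 1)                 # defocus
    if (n, m) == (2, 2):
        return math.sqrt(6) * rho ** 2 * math.cos(2 * theta)     # astigmatism (0/90 deg)
    if (n, m) == (2, -2):
        return math.sqrt(6) * rho ** 2 * math.sin(2 * theta)     # astigmatism (oblique)
    raise ValueError(f"Zernike term {(n, m)} is not implemented in this sketch")


def psf_from_zernike(coeffs_um, wavelength_um, grid=32, pupil_radius_px=8):
    """coeffs_um: {(n, m): coefficient in micrometers}. Returns a PSF normalized to sum 1,
    on a grid x grid array with the zero-frequency (on-axis) sample at [0][0]."""
    half = grid / 2
    pupil = [[0j] * grid for _ in range(grid)]
    for y in range(grid):
        for x in range(grid):
            dx, dy = (x - half) / pupil_radius_px, (y - half) / pupil_radius_px
            rho = math.hypot(dx, dy)
            if rho <= 1.0:
                theta = math.atan2(dy, dx)
                w = sum(c * zernike(n, m, rho, theta) for (n, m), c in coeffs_um.items())
                pupil[y][x] = cmath.exp(2j * math.pi * w / wavelength_um)
    # Separable 2D DFT: transform along x within each row, then along y.
    tw = [[cmath.exp(-2j * math.pi * k * t / grid) for t in range(grid)] for k in range(grid)]
    rows = [[sum(row[t] * tw[k][t] for t in range(grid)) for k in range(grid)] for row in pupil]
    field = [[sum(rows[t][kx] * tw[ky][t] for t in range(grid)) for kx in range(grid)]
             for ky in range(grid)]
    intensity = [[abs(v) ** 2 for v in row] for row in field]
    total = sum(map(sum, intensity))
    return [[v / total for v in row] for row in intensity]
```

Comparing the peak of an aberrated PSF with that of the aberration-free PSF gives the Strehl ratio, a one-number summary that could be useful when looking for statistical patterns across many eyes.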

We have received software from experts on this topic that implements the conversion. We can probably obtain a large number of samples of measurements from different categories of human eyes. In this project, we would create a website that converts the Zernike polynomials to point spread functions and illustrates how those point spread functions would influence the quality of the optical image falling on the retina.

As we accumulate additional summaries of the human measurements, we might look for statistical patterns that might be explained in terms of the biological properties of the human cornea and lens.

See: Chromatic and wavefront aberrations: L-, M- and S-cone stimulation with typical and extreme retinal image quality
Florent Autrusseau, Larry Thibos, Steven K. Shevell
Vision Research 51 (2011) 2282–2294

Integrating 3D Distributed Ray Tracing and Image Quality

(BW), (AL), (JEF)




Implement and test Nayar Generalized Patent

Read the patent and implement tests of the idea.



(AT), (AM), (RFD), (GS)

With the opening of Stanford's Center for Cognitive and Neurobiological Imaging (CNI), we now have access to a large number of MR scans of the human brain. We are also closely connected to the MR hardware and image processing algorithms.

While this course is not specifically about neuroimaging, some of the methods in the course might be usefully applied to the data collected at the CNI. For students already working in MR and interested in such signal processing, we might be able to develop some projects that build on your interest.

  • Intelligent compression algorithm for multi-channel image data stored in frequency space (p-file compression)
  • Algorithm to classify volumes that contain brains in a database of MR images that includes phantoms, squash, fruits, etc. (brain detector)
  • Algorithm to identify when two MR images are of the same brain (brainprint), even if they were acquired using different contrasts.
  • We can also do another one on MR artifact detection (so many artifacts, so few projects...)

Suggestions and projects from previous years

Web page of Project Ideas for 2011
Web page of Project Ideas for 2010
PDF of Project Ideas in 2008

To see projects from previous years, visit the SCIEN Class Projects Page.