Evaluation Pipeline with GenAI-Assisted Algorithm Development for Virtual Image Denoising and Pixel-Defect Correction

From Psych 221 Image Systems Engineering
Sytchang (talk | contribs)
Revision as of 06:40, 9 December 2025

Introduction

Motivation for an Evaluation Pipeline for Image Processing

AI-generated image-processing scripts vary widely in quality, making it difficult to determine which versions are reliable for real-world applications. As large language models (LLMs) become increasingly integrated into algorithm development workflows, the need for systematic evaluation becomes increasingly critical. Different prompts or model versions can produce inconsistent algorithm logic, resulting in reproducibility challenges that undermine confidence in AI-assisted development [1].

Standardized benchmarking improves the efficiency of comparing LLM-generated algorithms across various tasks, including denoising, pixel-defect correction, region of interest (ROI) reconstruction, and enhancement. Without a robust evaluation framework, researchers and engineers must rely on slow, manual inspection to validate algorithmic variants, which is a process that significantly extends development cycles and introduces subjective bias.

To measure true performance across diverse conditions, a robust evaluation pipeline must account for a variety of scenes, defect patterns, noise levels, and lighting conditions. This systematic approach accelerates development cycles by automatically validating and ranking algorithm variants, enabling data-driven decisions to determine which implementations merit further refinement or deployment.

Advantages of Generative Artificial Intelligence for Algorithm Development in Image Processing

Generative Artificial Intelligence (GenAI) may be particularly advantageous for developing algorithms to handle images with challenging noise conditions and complex patterns, as well as for proposing context-aware methods to reconstruct ROIs impacted by defective pixels. GenAI-assisted scientific programming using LLMs can expedite the development of denoising and defect-correction image processing pipelines. The development of sophisticated algorithms in traditional image processing pipelines requires extensive denoising, calibration, and defect-correction capabilities; a variety of factors, such as noise model selection, tuning and filtering parameters, and validation using image quality metrics, must be accounted for. Multiple algorithmic variants can be developed and tested in parallel to speed up the development phase and vet which models are most promising for the desired image processing application [2].

With appropriate prompts, LLM-aided code generation can facilitate sensor characterization by testing various denoising assumptions, simulating images impacted by different forms of defect pixels for a more diverse and larger sample size for testing, as well as executing extensive parameter sweeps to evaluate their influence on image quality metrics. The relatively widespread access to GenAI tools, such as ChatGPT, would enable a broad audience to conveniently use available LLM resources for improving image quality by the GenAI-assisted development of sophisticated image processing algorithms [3].

Applications that Benefit from a Reliable Evaluation Pipeline

In recent years, the photography and imaging industry has undergone a rapid transformation driven by the integration of artificial intelligence (AI) into both camera hardware and post-processing workflows. No longer limited to traditional image-signal-processing (ISP) pipelines or manual editing in desktop software, modern camera systems increasingly leverage neural networks, on-device NPUs, and deep-learning algorithms to enhance image quality, reduce noise, stabilize scenes, and even reconstruct detail; this is often executed in real time or shortly after capture. Camera makers, including Nikon, Canon, and Sony, all utilize AI autofocus systems in their latest mirrorless cameras, with features such as face and eye detection, sports autofocus, subject detection, and scene recognition [4].

The development of a robust evaluation pipeline serves multiple domains where image quality is critical. In consumer photography, reliable algorithms ensure consistent enhancement across diverse shooting conditions. Scientific imaging applications, ranging from microscopy to astronomical observation, require validated processing methods in which accuracy is critical for drawing research conclusions. Moreover, computer vision systems depend on high-quality input images for tasks such as object detection, segmentation, and scene understanding. As AI reshapes the entire imaging pipeline, from sensor readout to final edits, the ability for systematic evaluation and comparison of image processing algorithms is crucial for enabling practitioners to select methods appropriate for their specific requirements, as well as account for balancing factors such as processing speed, accuracy, and robustness to various degradation types.

Background

Significance of Key Scene and Camera Parameters on Image Quality

Luminance

Luminance describes the light emitted or scattered from an extended source per unit area in a given direction; it measures the brightness of a surface from a specific viewpoint as it appears to the human eye. Luminance relates to the number of photons entering the camera and can directly influence the signal-to-noise ratio (SNR), thereby impacting image metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [5].

Exposure Control

Exposure settings must be set such that electron generation in the dark regions of an image is sufficiently above the sensor’s noise floor, while the bright regions in the image do not exceed the full well capacity of the pixel. This is also crucial for ensuring that the dynamic range of the scene fits within the dynamic range of the sensor. Therefore, optimizing exposure settings is crucial to minimize excessive noise or saturation. Both lens aperture and exposure duration are primary factors that influence exposure control. Very short exposure times risk insufficient collection of electrons in dark regions, whose signal can fall beneath the noise level. A long exposure duration allows more time for photon collection and conversion to electrons. However, very long exposure times risk saturating the floating diffusion node, in which case scene intensities are recorded at the same maximum value; this occurs when electron generation in the bright regions’ pixels exceeds the storage capacity of the pixel. The exposure value defined below describes how aperture size and exposure time influence image brightness. This relationship reveals that a shorter exposure time or a narrower aperture reduces the light reaching the sensor. Moreover, increasing the exposure value is necessary to prevent overexposure when scene brightness increases [6].

EV = log2(F² / T) = 2·log2(F) − log2(T) [6]

where EV = exposure value, F = F-number, T = exposure time in seconds
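As a quick sanity check of this relation, the EV for a hypothetical camera setting can be computed directly (an illustrative Python sketch; the project itself used MATLAB/ISETCam):

```python
import math

def exposure_value(f_number: float, exposure_time_s: float) -> float:
    """EV = log2(F^2 / T): a higher EV means less light reaches the sensor."""
    return math.log2(f_number ** 2 / exposure_time_s)

# Hypothetical example: f/2.8 at 1/100 s
ev = exposure_value(2.8, 0.01)   # about 9.61
```

Halving the exposure time (or stopping the aperture down by a factor of √2) raises EV by exactly one stop, consistent with the 2·log2(F) − log2(T) decomposition.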

F-number (f/#)

The F-number, as defined below, is the ratio of the focal length to the diameter of the aperture. It is a crucial parameter for thin lenses. A larger F-number for a given focal length corresponds to a smaller aperture diameter but increased depth of field; this further limits the light that arrives at the sensor, making the system behave more like a pinhole camera. By comparison, a wider aperture enables the sensor to capture more photons [7].

F-number = f/# = focal length / aperture diameter

Fill factor

The fill factor, ranging from 0 to 1, corresponds to the portion of the photosensitive pixel area. This parameter sets the minimum possible noise level and constrains the SNR in low-light conditions. A higher fill factor increases the light sensitivity, since more of the pixel area is used for light detection. The architecture of stacked sensors provides separation of the photodiode layer from the readout and processing circuitry; this not only maximizes the fill factor for each pixel but also allows for advanced, lower-noise circuitry to be integrated into the sensor without the tradeoff in pixel size [6].

Significance of Photon and Read Noise on Image Quality

Photon noise, inherent to light measurements, is considered a dominant source of image noise in most conditions, except in low-light scenes where read noise dominates. Photon noise is also considered the least influenced by camera hardware technology improvement. The arrival of photons at each of the sensor's pixels and photon noise can be described by the Poisson distribution, in which the mean equals the variance, since within each small time interval, the probability of photon arrival is constant. When images are captured at a fixed exposure duration and aperture size, the number of photons collected by the sensor will be directly proportional to the ambient lighting level. Therefore, both the photon noise and SNR will increase proportionally to the square root of the ambient lighting level. At dim illumination, fewer photons are captured by the photodiode. The inherent Poisson noise indicates that the collection of fewer electrons will also result in a lower SNR. Although the absolute noise can be lower in the case of dim illumination, the SNR can worsen when certain pixels collect fewer electrons than the noise level, resulting in a noisy image [6], [8].
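The mean-equals-variance property of Poisson arrivals, and the resulting SNR ∝ √(photon count) scaling, can be checked numerically. The sketch below is illustrative Python (not part of the project's MATLAB pipeline) and uses Knuth's textbook Poisson sampler:

```python
import math
import random

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's algorithm: count uniform draws until their running product
    falls below e^(-lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(221)
mean_photons = 50.0
counts = [poisson_sample(mean_photons, rng) for _ in range(20000)]

m = sum(counts) / len(counts)                        # empirical mean ~ 50
var = sum((c - m) ** 2 for c in counts) / len(counts)  # empirical variance ~ 50
snr = m / math.sqrt(var)                             # ~ sqrt(50) for Poisson light
```

Because mean and variance coincide, SNR = mean/√variance reduces to √mean, so quadrupling the illumination only doubles the SNR.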

Read noise is independent of illumination and is the random electronic noise that arises during the readout process when the pixel’s stored charge is converted to a digital value. It can become the dominant source of noise that constrains image quality, especially in low scene illumination, in which fewer photons are captured by the sensor. There is a negative correlation between read noise and dynamic range, in which lower read noise will decrease the noise floor and correspond to an increased dynamic range [9].

Significance of Pixel Defects on Image Quality

Pixel defects, commonly categorized as dead, hot, or stuck pixels, pose a significant challenge to modern image sensor performance because they introduce localized impulse noise that disrupts both pixel-level accuracy and global perceptual quality. They appear as bright, distinctive dots on the image, and even at low defect densities, these anomalies can degrade key metrics such as PSNR, SSIM, and MTF by introducing intensity discontinuities that propagate through demosaicing, denoising, and compression pipelines; this ultimately leads to a reduction in edge fidelity and scene detail. Defective pixels severely impact sensors regularly exposed to high levels of light, electrical energy, or radiation, leading to high rates of pixel corruption [10]. As pixel sizes continue to shrink in advanced CMOS sensors, the relative influence of individual defective sites increases, making robust defect detection and correction essential for maintaining image quality in consumer and scientific imaging systems.

Evaluation of Image Quality Metrics: PSNR, SSIM, and MTF50

The PSNR is derived from the mean square error (MSE) and measures pixel-level error. It corresponds to the ratio between the maximum pixel intensity and the power of distortion. A higher PSNR value corresponds to higher image quality. As the MSE approaches zero, the PSNR value approaches infinity. By comparison, low PSNR values indicate large numerical differences between the reference and test images. The SSIM metric accounts for factors of luminance, contrast, and local image structure. For SSIM, pixel intensity patterns are considered structures after both luminance and contrast are normalized. It provides a local quality score that better aligns with human visual perception than the PSNR metric. Studies have shown that both PSNR and SSIM are particularly sensitive to noise degradation [11], [12].

MSE = (1/(MN)) · Σ_{m,n} [I1(m,n) − I2(m,n)]² [13]

PSNR = 10 · log10(R² / MSE) [13]

SSIM(x,y) = [(2·μx·μy + C1)(2·σxy + C2)] / [(μx² + μy² + C1)(σx² + σy² + C2)] [14]
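The MSE and PSNR definitions can be made concrete with a small numerical example (illustrative Python; R is the peak intensity, here 255 for 8-bit images):

```python
import math

def mse(img1, img2):
    """Mean squared error over all pixels of two equal-size images (nested lists)."""
    h, w = len(img1), len(img1[0])
    return sum(
        (img1[r][c] - img2[r][c]) ** 2 for r in range(h) for c in range(w)
    ) / (h * w)

def psnr(img1, img2, peak=255.0):
    """PSNR in dB; returns infinity when the images are identical (MSE = 0)."""
    e = mse(img1, img2)
    return math.inf if e == 0 else 10.0 * math.log10(peak ** 2 / e)

ref_img = [[100.0] * 4 for _ in range(4)]
test_img = [[110.0] * 4 for _ in range(4)]   # uniform +10 offset -> MSE = 100
# psnr(ref_img, test_img) -> 10*log10(255^2/100), roughly 28.1 dB
```

As the text notes, PSNR diverges to infinity as MSE approaches zero, which the `math.inf` branch above makes explicit.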

The MTF50 (Modulation Transfer Function at 50% contrast) metric quantifies spatial resolution and sharpness by measuring the spatial frequency at which contrast drops to 50% of its maximum value, as defined in the ISO 12233 standard for electronic still-picture imaging resolution and spatial frequency responses [15].

Methods: Evaluation Pipeline for Image Processing

The evaluation pipeline developed in this project provides a comprehensive framework for systematically assessing image processing algorithms using ISETCam as the supporting package [16]. The pipeline incorporates sweeps of scene and camera parameters, varied noise models, and various pixel defect modes (hot, dead, and mixed) across a range of defect percentages. It generates trend plots for ∆PSNR and ∆SSIM, box plots and percentage-improvement metrics, SNR profiles for each RGB channel, and MTF curves and MTF50 trend plots to quantify spatial resolution performance. The system supports multiple trials per setting for robust statistical analysis and includes visualization capabilities such as SSIM error maps and GIF generation comparing the golden-standard, defective, and corrected images. Additionally, the pipeline offers flexibility by supporting both user-defined scenes through file path selection and built-in standard testing charts, enabling evaluation across diverse image content and standardized targets. The two key phases used in establishing the evaluation pipeline in this investigative study are shown in Fig. 1.

Phase 1: Identification and Evaluation of Key Parameters and Challenges. A systematic study using ISETCam was conducted to assess how sweeping the parameters listed in Table 1 influences image quality across different noise models and pixel defect types. Assessment metrics included PSNR, which compares the rendered image to a noiseless or defect-free reference based on MSE, and SSIM, which evaluates the perceived visual quality and similarity between images. MTF50 was also evaluated as a function of pixel defect percentage.

Phase 2: Testing the Evaluation Pipeline. MATLAB simulation case studies were performed with ISETCam to evaluate the performance of GenAI assistance in algorithm development for virtual image quality enhancement, with the objectives of denoising and pixel-defect removal. Such image quality improvement would traditionally require a camera hardware upgrade or is inherently limited by environmental conditions during scene capture.

Fig. 1 Block diagram of evaluation pipeline development for virtual image enhancement


Table 1. Parameter sweep intervals for image quality evaluation
Parameter Units Sweep Interval [LSL USL] Purpose
Luminance cd/m² [50 5000] Lighting level (brighter or dimmer)
Exposure Time seconds [0.001 0.5] Controls the number of photons captured by the sensor
F-Number (f/#) - [1.4 16] Related to the optical aperture diameter for a given focal length
Fill factor - [0.2 0.8] Photosensitive fraction of pixel area

Evaluation of the Influence of Parameter Sweep for Varied Noise Models and Pixel Defect Types on Image Quality Metrics of PSNR and SSIM

A MATLAB script was developed to perform a parameter sweep for luminance, exposure time, f/#, and fill factor. Each parameter sweep was conducted independently to analyze its effect on PSNR and SSIM for a 600 x 600 pixel Macbeth reflectance chart illuminated by D65. Parameter sweeps under three different noise models were evaluated by setting the noise flag to the following: (1) no noise, (2) photon noise only, and (3) the combination of photon and read noise. This evaluation allowed for a quantitative analysis to determine which imaging parameters have the strongest influence on image quality metrics across the different noise models. Moreover, the same parameter sweep analysis was performed across three pixel defect types (hot, dead, and mixed) at a fixed pixel defect percentage of 0.5%. In addition, MTF50 analysis was conducted to evaluate the impact of pixel defects on the spatial resolution of a slanted-edge scene.
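The sweep structure described above can be outlined as follows (an illustrative Python sketch with hypothetical names; in the real pipeline, each setting is rendered through ISETCam and scored with PSNR/SSIM, which the placeholder `evaluate` callback stands in for):

```python
# Sweep intervals from Table 1; each parameter is swept independently
# while the others stay at a default.
SWEEPS = {
    "luminance_cdm2": (50.0, 5000.0),
    "exposure_time_s": (0.001, 0.5),
    "f_number": (1.4, 16.0),
    "fill_factor": (0.2, 0.8),
}

def sweep_values(lo: float, hi: float, n: int = 8):
    """n evenly spaced settings spanning [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def run_sweeps(evaluate, n: int = 8):
    """evaluate(param, value) -> metric dict; returns results per parameter."""
    return {
        param: [(v, evaluate(param, v)) for v in sweep_values(lo, hi, n)]
        for param, (lo, hi) in SWEEPS.items()
    }

# Placeholder standing in for the ISETCam render + PSNR/SSIM computation
results = run_sweeps(lambda param, v: {"psnr": 0.0, "ssim": 0.0})
```

Sweeping one parameter at a time, as in the study, keeps the PSNR/SSIM trends attributable to a single setting rather than to interactions between settings.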

Denoising Algorithm Development through GenAI-Assisted Script Generation

Non-Local Means and Smoothing Filter Operations with RGB Channel Processing for Maximizing PSNR and SSIM

Prompts were given to GenAI through ChatGPT to generate MATLAB scripts for image enhancement, denoising images impacted by photon noise only or a combination of photon and read noise across varied parameter settings. The image files had file names that specified the parameter sweep condition and noise model type; the script references the specified folder with the original image files for the denoising process. Several iterations of prompt refinement were necessary to create a more robust denoising script that achieved better image quality. One of the refined prompts updated the algorithm’s range of smoothing values for denoising strength to account for edge preservation and prevention of image blurriness. The final denoising script also adapts the algorithm to each image’s unique conditions (e.g., parameter sweep settings and noise model type) to achieve the highest PSNR and SSIM.

As shown in the flow diagram in Fig. 2, non-local means (NL-means) filtering was applied in the denoising algorithm with RGB channel processing, such that each pixel is filtered based on weighted averages of patches across the image with similar patterns. To preserve image fidelity and spatial structure, each color channel is denoised separately to prevent potential cross-channel color artifacts [17]. Moreover, the smoothing parameter is optimized to maximize the PSNR and SSIM relative to the noiseless reference image, ensuring the optimal tradeoff between noise reduction and fine-detail preservation. Increasing the degree of smoothing heightens the aggressiveness of denoising, but at the cost of blurred details in the image. By comparison, decreasing the degree of smoothing allows for better preservation of edges and fine details but can result in poorer noise removal [18].
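A minimal grayscale sketch of the NL-means principle is given below (illustrative Python; the actual GenAI-generated MATLAB script processes each RGB channel separately and tunes the smoothing parameter h against the noiseless reference):

```python
import math
import random

def nl_means(img, h=0.15, pr=1):
    """Simplified NL-means: each interior pixel becomes a weighted average of
    pixels whose surrounding (2*pr+1)^2 patches look similar; weights decay
    exponentially with the mean squared patch difference."""
    H, W = len(img), len(img[0])

    def patch(r, c):
        return [img[r + dr][c + dc] for dr in range(-pr, pr + 1)
                                    for dc in range(-pr, pr + 1)]

    out = [row[:] for row in img]   # borders are copied unchanged
    interior = [(r, c) for r in range(pr, H - pr) for c in range(pr, W - pr)]
    patches = {(r, c): patch(r, c) for r, c in interior}
    for r, c in interior:
        p = patches[(r, c)]
        num = den = 0.0
        for (r2, c2), q in patches.items():
            d = sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)
            w = math.exp(-d / (h * h))   # h controls denoising aggressiveness
            num += w * img[r2][c2]
            den += w
        out[r][c] = num / den
    return out

# Synthetic check: a step edge corrupted with uniform noise
rng = random.Random(0)
clean = [[0.2 if c < 5 else 0.8 for c in range(10)] for r in range(10)]
noisy = [[v + rng.uniform(-0.1, 0.1) for v in row] for row in clean]
denoised = nl_means(noisy)

def mse_interior(a, b, pr=1):
    vals = [(a[r][c] - b[r][c]) ** 2
            for r in range(pr, len(a) - pr) for c in range(pr, len(a[0]) - pr)]
    return sum(vals) / len(vals)
```

Because patches that straddle the step edge receive nearly zero weight, the edge is preserved while the flat regions are averaged, which is exactly the edge-preservation versus noise-removal tradeoff that the smoothing parameter h mediates.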

The denoising algorithm effectively produces a parameter-aligned, denoised image for each of the swept sensor settings. After the denoising procedure is applied, the script takes in the denoised images for each noise model (photon noise only as well as the combination of photon and read noise) and compares them against the noiseless reference for each swept parameter setting. ΔPSNR and ΔSSIM were used to assess the improvement between the ChatGPT-denoised and original noisy images.

Fig. 2. Optimized algorithmic logic and image quality metric evaluation flow with adaptive denoising of Macbeth reflectance chart for different noise models and parameter settings

Laplacian-Based Denoising Algorithm for Read Noise Reduction

Read noise is independent of exposure time and is attributed to the inherent electronic noise of digital cameras; it is amplified and converted to a digital signal during readout of the pixel charge. As shown in the flow diagram in Fig. 3, ChatGPT was used to develop a denoising script that applies a Laplacian filter and estimates the read noise level per L, a, and b channel [19]. The image file of the noisy Macbeth reflectance chart is loaded and converted to the Lab color space. The Laplacian-based noise estimation is then performed, followed by NL-means denoising of the channels. The degree of NL-means smoothing is adapted to the swept read noise conditions, balancing texture preservation against overblurring during noise removal. Prompt refinement also led to automatic patch-grid detection for selecting regions of interest (ROI), enabling patch-level SNR analysis. Following conversion of the denoised Lab image back to RGB, the patch-level SNR metrics are computed, the patch-level SNR comparison for each channel is plotted, and the image files for the noisy versus denoised Macbeth reflectance charts are exported.

Fig. 3. Optimized algorithmic logic and patch-level SNR evaluation flow for Laplacian-based denoising of Macbeth reflectance chart
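A standard Laplacian-based noise estimator (Immerkær's method, sketched here in Python under the assumption that it approximates the script's per-channel estimation step) convolves the image with a difference-of-Laplacians mask that cancels locally smooth structure, leaving mostly noise:

```python
import math
import random

# 3x3 difference-of-Laplacians mask used for fast noise estimation
MASK = [[ 1, -2,  1],
        [-2,  4, -2],
        [ 1, -2,  1]]

def estimate_noise_sigma(img):
    """Estimate additive-noise std dev: the mask response on smooth content is
    ~0, so its mean absolute value, rescaled, tracks the noise level."""
    H, W = len(img), len(img[0])
    acc = 0.0
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            s = sum(MASK[i][j] * img[r - 1 + i][c - 1 + j]
                    for i in range(3) for j in range(3))
            acc += abs(s)
    n = (H - 2) * (W - 2)
    # For zero-mean Gaussian noise the mask response has std 6*sigma,
    # and E|N(0, s)| = s * sqrt(2/pi), hence the rescaling below.
    return math.sqrt(math.pi / 2.0) * acc / (6.0 * n)

# Check on a flat patch with known synthetic noise
rng = random.Random(1)
sigma_true = 0.05
flat = [[0.5 + rng.gauss(0.0, sigma_true) for _ in range(40)] for _ in range(40)]
sigma_est = estimate_noise_sigma(flat)   # close to 0.05
```

An estimate like this is what allows the smoothing strength of the subsequent NL-means stage to be adapted to the swept read-noise condition rather than fixed in advance.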

Adding Dead, Hot, and Mixed Pixel Defects to an Ideal Sensor

A MATLAB script was developed to artificially inject a specified percentage of pixel defects into a defect-free sensor at random locations. There are three modes of pixel defects: in dead mode, all defective pixels have their voltage stuck at 0 V; in hot mode, all defective pixels have their voltage stuck at the highest level (1 V); and in mixed mode, half of the defective pixels are dead and half are hot. The normalized voltage level across pixels is plotted in Fig. 4 to compare the different pixel defect modes with the golden sensor. Some pixels reach either 0 or 1, indicating the different pixel defect types.

Fig. 4. Normalized voltage across pixels for golden, hot mode defect, dead mode defect, and mixed mode defect (left to right)
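The defect-injection step can be sketched as follows (illustrative Python with hypothetical names; the project's script operates on the ISETCam sensor voltage array):

```python
import random

def inject_defects(img, defect_ratio, mode, rng):
    """Set a random fraction of pixels to stuck values on a normalized [0, 1]
    image: 'dead' -> 0.0, 'hot' -> 1.0, 'mixed' -> half dead, half hot."""
    H, W = len(img), len(img[0])
    out = [row[:] for row in img]
    n = max(1, round(defect_ratio * H * W))
    coords = rng.sample([(r, c) for r in range(H) for c in range(W)], n)
    for i, (r, c) in enumerate(coords):
        if mode == "dead":
            out[r][c] = 0.0
        elif mode == "hot":
            out[r][c] = 1.0
        else:  # mixed: alternate dead and hot over the sampled locations
            out[r][c] = 0.0 if i % 2 == 0 else 1.0
    return out, coords

rng = random.Random(42)
golden = [[0.5] * 100 for _ in range(100)]
defective, coords = inject_defects(golden, 0.005, "mixed", rng)  # 0.5% defects
```

Because the defect locations are drawn randomly, repeated trials at the same defect percentage (as in the five-trial statistical analysis later in this study) sample different spatial arrangements.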

Pixel Defect Removal Algorithm Development through GenAI-Assisted Script Generation

Fig. 5 demonstrates the iterative development process of a GenAI-generated pixel-defect correction algorithm for MATLAB. Starting with a basic prompt to "remove pixel defect," successive iterations were generated by providing feedback on specific issues, such as the image appearing darker than the golden standard, excessive remaining pixel defects, and poor spatial resolution and sharpness. Each iteration at a 1.000% defect ratio shows progressive improvement in matching the golden standard reference image, though all versions still exhibit visible artifacts. The final version used in the analysis achieves the best PSNR and SSIM improvements for defect rates below 0.1%. Its algorithm flowchart is annotated in Fig. 6 with key MATLAB functions.

Fig. 5. Golden vs Defect vs Corrected using different iterations of the GenAI Pixel Defect Correction Algorithm

The algorithm first performs per-channel local median and median absolute deviation (MAD) filtering over a 5x5 pixel neighborhood: for each color channel, the median intensity of the 5x5 window is computed, and the MAD is then computed as the median of the absolute deviations from that median. This method is well established in imaging and remote-sensing contexts for bad-pixel detection because the median and MAD are far less sensitive to extreme outliers than the mean and standard deviation [20]. MAD is defined as the median of the absolute deviations of the data points from the median of the dataset, and is calculated using the following formula:

MAD_ij = median(|x_ij − median(x_ij)|) [21]

Next, a per-channel robust z-like score is computed for every pixel in each channel (absolute difference from the median divided by local MAD). Pixels exceeding a tunable threshold are marked as “suspect.”
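The median/MAD detection stage can be sketched as follows (illustrative Python on a single channel; a small eps floor is added here because the MAD can be exactly zero in flat regions, an assumption not spelled out in the original script):

```python
import random
from statistics import median

def detect_defects(img, win=2, z_thresh=8.0, eps=1e-6):
    """Flag pixels whose robust z-score |x - median| / (MAD + eps), computed
    over a (2*win+1)^2 neighborhood (5x5 for win=2), exceeds z_thresh."""
    H, W = len(img), len(img[0])
    flagged = set()
    for r in range(win, H - win):
        for c in range(win, W - win):
            nb = [img[rr][cc]
                  for rr in range(r - win, r + win + 1)
                  for cc in range(c - win, c + win + 1)]
            med = median(nb)
            mad = median(abs(v - med) for v in nb)
            if abs(img[r][c] - med) / (mad + eps) > z_thresh:
                flagged.add((r, c))
    return flagged

# Mildly noisy background with a single hot pixel
rng = random.Random(7)
img = [[0.5 + rng.uniform(-0.02, 0.02) for _ in range(15)] for _ in range(15)]
img[7][7] = 1.0
flagged = detect_defects(img)   # should single out (7, 7)
```

Because both the median and the MAD ignore a single extreme value in a 25-pixel window, the stuck pixel scores an enormous z while its normal neighbors stay well below the threshold, which is the robustness property the text attributes to this detector.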

To reduce false detection of legitimate image features, the algorithm then applies cross-channel consistency checks and magnitude heuristics [22]. First, it counts how many of the R, G, and B channels flagged the pixel and imposes additional absolute-difference thresholds. A “high-confidence” defect is defined as exactly one channel being flagged, with that channel showing a strong absolute deviation (> 0.08). A “medium-confidence” defect is defined as either a moderate deviation (between 0.05 and 0.08) in multiple channels or a moderate deviation in a single channel.

Additionally, the script excludes pixels near the image border and already-repaired candidates, and includes edge protection: it computes luminance and runs a Canny edge detector for any pixels not classified as high-confidence. Only pixels satisfying these combined criteria are accepted as defect candidates, thereby distinguishing sensor-induced defects from genuine image content.

After detection, the final step is to repair the defect candidates. Candidates are grouped by connectivity to decide the repair type: isolated single-pixel defects, or small clusters suitable for inpainting. Isolated pixel defects are replaced with a neighbor-weighted average over a 3x3 pixel neighborhood (excluding other flagged neighbors), while small clusters are filled per channel via inpainting [23]. Image inpainting is the technique of filling in missing regions and removing unwanted objects by diffusing pixel information from neighboring pixels into the affected region.

Fig. 6 Flowchart for ChatGPT generated pixel-defect correction algorithm
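The repair stage for isolated defects can be sketched as follows (illustrative Python; a plain unweighted neighbor mean is used here for simplicity, whereas the actual script applies a neighbor-weighted average):

```python
def repair_isolated(img, flagged):
    """Replace each flagged pixel with the mean of its 3x3 neighbors,
    excluding itself and any other flagged pixels."""
    H, W = len(img), len(img[0])
    out = [row[:] for row in img]
    for r, c in flagged:
        nb = [img[rr][cc]
              for rr in range(max(0, r - 1), min(H, r + 2))
              for cc in range(max(0, c - 1), min(W, c + 2))
              if (rr, cc) != (r, c) and (rr, cc) not in flagged]
        if nb:   # leave the pixel untouched if every neighbor is also flagged
            out[r][c] = sum(nb) / len(nb)
    return out

# Two adjacent stuck pixels on a flat 0.5 background
img = [[0.5] * 5 for _ in range(5)]
img[2][2] = 1.0   # stuck high
img[2][3] = 0.0   # stuck low
repaired = repair_isolated(img, {(2, 2), (2, 3)})
```

Excluding flagged neighbors matters in exactly the adjacent-defect case above: without the exclusion, each stuck pixel would pull its neighbor's repaired value away from the true background level.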

Results

Parameter Sweep Analysis on Image Metrics

Performance Evaluation of ChatGPT Denoising Algorithms

Denoised Rendered Images for Varied Noise Models for Optimal PSNR and SSIM with NL Algorithm

As shown in Table 2, image enhancement through the ChatGPT denoising algorithm was evaluated by calculating ΔPSNR and ΔSSIM, as well as the percent improvement in PSNR and SSIM, between the denoised and noisy images for the swept parameter settings. During this denoising assessment, to isolate the noise-attributed effect, the noiseless reference used for the PSNR and SSIM calculations is rendered at the same swept parameter setting as the denoised and noisy images.

Following the denoised image processing for both noise models, ∆PSNR and ∆SSIM were non-negative for all four parameter sweeps. As shown in Fig. 15 and Fig. 16, the NL-means denoising algorithm improved the original noisy image’s PSNR and SSIM to varying degrees, displaying the widest range of ΔPSNR and ΔSSIM distributions in the case of the exposure time parameter sweep. The general trend revealed that the denoising algorithm resulted in average improvements of 6.5 to 11% in PSNR and a wider range of 1.6 to 9.5% in SSIM for images with the combined photon and read noise model. By comparison, improvements were 4.3 to 6.7% in PSNR and 0.6 to 3.9% in SSIM for images with the photon-noise-only model.

Table 2. Improvement in PSNR and SSIM for Parameter Sweep and Varied Noise Models
Noise Model Parameter ΔPSNR (mean ± std. dev.) (dB) ΔSSIM (mean ± std. dev.) PSNR Percent Improvement (%) SSIM Percent Improvement (%)
Photon & read noise Luminance (cd/m²) 2.639 ± 0.266 0.0171 ± 0.0002 6.92 1.75
Photon & read noise Exposure Time (s) 3.151 ± 0.432 0.0659 ± 0.0683 10.99 9.52
Photon & read noise F-number (f/#) 2.481 ± 0.437 0.0157 ± 0.0029 6.51 1.61
Photon & read noise Fill Factor 2.636 ± 0.380 0.0174 ± 0.0002 6.92 1.79
Photon noise Luminance (cd/m²) 1.758 ± 0.353 0.0058 ± 0.0001 4.44 0.58
Photon noise Exposure Time (s) 2.296 ± 0.797 0.0345 ± 0.0398 6.70 3.94
Photon noise F-number (f/#) 1.767 ± 0.303 0.0055 ± 0.0006 4.45 0.56
Photon noise Fill Factor 1.682 ± 0.274 0.0057 ± 0.0002 4.27 0.57


Fig. 15 ∆PSNR and ∆SSIM comparison across the swept parameters for the noise model accounting for both photon noise and read noise
Fig. 16 ∆PSNR and ∆SSIM comparison across the swept scene and camera parameters for the noise model with photon noise only

Exposure time controls the quantity of photons and therefore electrons collected, in which a higher signal should result from an increased number of collected photons at higher exposure times. At low exposure times, read noise dominates, and there is a low-to-moderate photon noise variance that follows the Poisson distribution; the image appears dimmer and has a grainier texture. As shown in Fig. 17, both noise models showed the strongest trends for ∆PSNR and ∆SSIM improvement across the exposure time sweep, as well as a sharp, monotonic exponential decay-like trend in SSIM improvement at high exposure times. For the noise model accounting for both photon and read noise, the PSNR improvement peaks at 17 to 18% at the lowest exposure time and then steadily declines as the exposure time increases. The lower baseline noise at longer exposures is a likely reason for the smaller PSNR improvement when denoising images with both photon and read noise at increased exposure times.

Amongst the swept luminance levels, the approximated mid-range of 1500 to 2500 cd/m² showed the greatest improvement for PSNR, as shown in Fig. 18. At the extreme ends of the luminance interval, ∆PSNR decreased. There was relatively stable SSIM improvement, trending quite consistently at an average of 1.75% for the noise model accounting for both photon and read noise and 0.58% for the noise model with photon noise only. SSIM remains largely insensitive to the swing in luminance values.

The trends shown in Fig. 19 reveal a non-monotonic concave-down function across the f/# sweep for the noise model accounting for both photon and read noise, in which the highest PSNR and SSIM improvement occurred at moderately wide apertures and a negative correlation between f/# and ∆SSIM was observed for this noise model. A smaller aperture at higher f/# corresponds to less light being collected, which can result in a lower SNR. There is a relatively stable SSIM improvement across the f/# interval of f/4 to f/16 for the noise model accounting only for photon noise.

As shown in Fig. 20, in the noise model case accounting for the combination of photon and read noise, there was an overall gradual increase in ∆PSNR as fill factor increased. For both noise models, their SSIM trends are relatively flat across the parameter sweep, so the structural image quality is less influenced by fill factor.

Fig. 17 Influence of varied exposure time settings on ∆PSNR and ∆SSIM for the noise models of the combination of photon and read noise (left) and photon noise only (right)
Fig. 18 Influence of varied luminance settings on ∆PSNR and ∆SSIM for noise models of the combination of photon and read noise (left) and photon noise only (right)
Fig. 19 Influence of varied F-number settings on ∆PSNR and ∆SSIM for the noise models of the combination of photon and read noise (left) and photon noise only (right)
Fig. 20 Influence of varied fill factor settings on ∆PSNR and ∆SSIM for the noise models of the combination of photon and read noise (left) and photon noise only (right)

Denoised Images at Varied Read Noise Conditions Using Laplacian NL-Means Algorithm

As shown in Fig. 21, a grainier texture and loss of image quality result from the introduction of increased read noise. Read noise can cause signals to be buried within the noisy background, especially in cases of low light or very short exposure times. While read noise is no longer a primary limiting factor for modern-day photography with CMOS sensors, this study underscores how state-of-the-art sensor technology and fabrication have mitigated past challenges such as high read noise.

At the high end of the read noise sweep, 0.1 V, the denoising effect is perceptually visible in the before-and-after comparison in Fig. 22. In this case study, denoising the Macbeth reflectance chart at a read noise of 0.1 V with the Laplacian NL-means algorithm yielded average patch-level SNR improvements of 26.0%, 71.6%, and 7.7% for the R, G, and B channels, respectively, as shown in Fig. 23. Because most cameras use a Bayer filter, green pixels are sampled twice as often as red or blue, giving the green channel an inherently higher SNR; the green channel also contributes most to perceived brightness (luminance). The denoising process therefore produces stronger noise reduction, and a larger SNR improvement, in the green channel. The patch-to-patch SNR variation across the Macbeth reflectance chart can largely be attributed to differences in patch reflectance [26].

Fig. 21 Read noise is swept from 1e-6 V to 0.1 V for a sensor with a focal length of 5e-3 m and an f/# of 1.5. Image quality of the Macbeth reflectance chart degrades as read noise increases.
Fig. 22 Comparison between the (a) original Macbeth reflectance chart at a read noise of 0.1V with zoomed-in 4 by 4 patch view versus (b) ChatGPT-denoised counterparts
Fig. 23 SNR profile per patch for each RGB channel of the noisy vs ChatGPT-denoised Macbeth reflectance chart
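The per-patch SNR profile of Fig. 23 can be reproduced with a simple grid computation. This Python sketch assumes a uniform rows × cols patch grid (e.g., 4 × 6 for a Macbeth chart) and the 20·log10(mean/std) SNR definition; the study's actual MATLAB pipeline may define patch geometry and SNR differently:

```python
import numpy as np

def patch_snr_db(channel, rows, cols):
    """Per-patch SNR in dB for one color channel of a chart image.

    Divides the channel into a rows x cols grid and computes
    20*log10(mean / std) inside each patch.
    """
    H, W = channel.shape
    ph, pw = H // rows, W // cols
    snrs = []
    for r in range(rows):
        for c in range(cols):
            patch = channel[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            mu, sigma = patch.mean(), patch.std()
            if sigma > 0:
                snrs.append(20.0 * np.log10(mu / sigma))
    return snrs
```

Comparing `patch_snr_db` on the noisy and denoised channels gives the patch-level SNR improvement reported per RGB channel.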

Performance Evaluation of ChatGPT Pixel-Defect Correction Algorithms

Macbeth Reflectance Chart

First, a Macbeth reflectance chart scene is used to evaluate the effectiveness of the ChatGPT pixel-defect correction algorithms. Only the defect ratio is swept, from 0.001% to 1%, for the three defect modes, with a fixed luminance of 500 cd/m², exposure time of 0.5 s, f/# of 8, and fill factor of 0.8. Because the locations of the pixel defects are randomized, five trials were run per defect percentage for the statistical analysis. To visually confirm the removal of pixel defects, two examples at mixed-mode defect ratios of 0.1% and 1%, together with their corrected images, are shown in Fig. 24, along with SSIM error maps that highlight where structural similarity is lost. The defect images show bright, localized error around each pixel defect, while the corrected images show fewer bright spots.

Fig. 24 Macbeth reflectance charts for mixed mode 0.1% and 1% pixel defect percentage as well as corresponding SSIM error plots for visual comparison of before and after the execution of the pixel-defect correction algorithm

The first trial per defect percentage for each defect mode is used to generate GIFs of the golden, defect, and corrected versions, as shown in Fig. 25. These GIFs demonstrate how pixel-defect severity evolves from 0.001% to 1% and how the AI-generated correction algorithm improves visual quality. At very low percentages, the artifacts are barely visible, and the correction algorithm sometimes produces a slightly worse PSNR due to unnecessary interpolation. At ≥0.05%, the correction becomes visibly beneficial. At high percentages (0.3 to 1%), the corrected result removes most impulse-like speckles, whereas the defect images degrade quickly. Hot- and mixed-mode defect images appear darker than the golden image, and the pixel-defect correction algorithms are unable to match the golden image's brightness level.
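The interpolation step behind this kind of correction is commonly a median of the valid neighbors [20][22][23]. A hedged Python sketch (it assumes defect locations are already known; a deployed corrector would first detect defects, and the exact interpolation in the ChatGPT-generated script may differ):

```python
import numpy as np

def correct_defects(img, defect_rows, defect_cols):
    """Replace each flagged defective pixel with the median of its
    non-defective 3x3 neighbors, a common impulse-defect repair."""
    out = img.copy()
    H, W = img.shape
    bad = np.zeros((H, W), dtype=bool)
    bad[defect_rows, defect_cols] = True
    for r, c in zip(defect_rows, defect_cols):
        neigh = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < H and 0 <= cc < W \
                        and not bad[rr, cc]:
                    neigh.append(img[rr, cc])
        if neigh:
            out[r, c] = float(np.median(neigh))
    return out
```

Excluding other flagged pixels from the neighborhood keeps clustered defects from contaminating the interpolated value, though very dense clusters may leave no valid neighbors.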

The curves of ∆PSNR and ∆SSIM versus defect percentage, shown in Fig. 26, demonstrate a positive improvement once the defect density reaches a threshold. PSNR is degraded for the dead defect type below 0.05%, whereas the hot and mixed modes show mostly positive improvements. For SSIM, dead pixels show improvement only above ~0.05%, while the hot and mixed modes show mild improvement earlier (~0.03%). Beyond 0.1%, the SSIM curves climb nearly linearly, showing that correction effectiveness increases strongly with defect density.

Fig. 26 Trends of ∆PSNR and ∆SSIM for comparing the original images with pixel defects with the corrected images of Macbeth reflectance charts as a function of defect percentage swept from 0.001% to 1% for the cases of dead, hot, and mixed pixel defects.
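The ∆PSNR values plotted against defect percentage follow directly from the standard PSNR definition [11][13]. A minimal Python sketch, assuming images normalized to a peak of 1.0:

```python
import numpy as np

def psnr_db(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB against a clean reference."""
    ref = np.asarray(reference, dtype=float)
    tst = np.asarray(test, dtype=float)
    mse = np.mean((ref - tst) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def delta_psnr(golden, defect, corrected):
    """dPSNR = PSNR(corrected) - PSNR(defect); a positive value means
    the correction moved the image closer to the golden reference."""
    return psnr_db(golden, corrected) - psnr_db(golden, defect)
```

∆SSIM is computed analogously from SSIM scores against the golden image, for which a windowed implementation such as MATLAB's `ssim` [14] is typically used.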

The box plots in Fig. 27 show the variability across five random trials per defect percentage. Dead pixels show high variance in PSNR improvement, including many cases where PSNR is further degraded after the correction algorithm is applied. This indicates that some dead-pixel placements create more noticeable dark spots and can lead the AI-generated script to over-correct and modify the image; the effect is most pronounced at lower defect percentages. For hot defects, PSNR shows lower variance at all defect percentages. Hot pixels contribute bright, impulsive noise, making them highly visible even when sparsely distributed. Because the algorithm removes these bright outliers reliably, the ΔPSNR distribution remains tight and consistently positive even at defect percentages as low as 0.03%.

SSIM is much more robust than PSNR to sparse random pixel defects, as the majority of trials show positive SSIM improvement. As with PSNR, the hot defect mode shows more noticeable improvements than dead pixels and much less trial-to-trial variance. Hot pixels disrupt local structure more severely, so their removal consistently raises SSIM even at moderate defect percentages (~0.03%), and the variance stays small because the structural distortion from hot pixels is predictable across trials. For both PSNR and SSIM improvement, the mixed mode lies in between. As the box plots show, the spatial distribution and pixel defect type matter in addition to the total defect percentage. The dead defect mode naturally affects image quality the least because dead pixels simply do not react to light; unless the image is overexposed, dead-pixel defects still yield the best PSNR and SSIM compared with hot or mixed defects, as shown in Fig. 11. Since their starting quality is already fairly good, the ChatGPT correction script can end up over-correcting them, resulting in negative ΔPSNR and ΔSSIM for the dead mode, particularly at low defect percentages.

Fig. 27 ∆PSNR and ∆SSIM box plot distribution by defect mode using Macbeth reflectance charts for defect percentage swept from 0.001% to 1%.

The detailed statistical analysis for five samples per defect percentage for each mode is shown in Table 3. Both PSNR and SSIM improvement rates reach 100% across all five samples at defect rates of 0.07% for the dead defect mode and 0.05% for the hot and mixed defect modes. Overall, the pixel defect correction algorithm is most beneficial for addressing hot pixels because their artifacts are more visually disruptive.
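The per-mode summary statistics in Table 3 (mean ± sample standard deviation and the improvement rate across trials) can be sketched as follows; the five-trial protocol is from the study, while the function shape is illustrative:

```python
import numpy as np

def trial_stats(deltas):
    """Summarize per-trial metric deltas (e.g., five dPSNR trials at one
    defect percentage): mean, sample std. dev., and improvement rate,
    i.e., the fraction of trials with a positive delta."""
    arr = np.asarray(deltas, dtype=float)
    mean = float(arr.mean())
    std = float(arr.std(ddof=1))         # sample standard deviation
    rate = float((arr > 0).mean())
    return mean, std, rate
```

A 100% improvement rate at a given defect percentage corresponds to `rate == 1.0` for all five trials.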

{| class="wikitable"
|+ Table 3. Improvement in PSNR and SSIM for different pixel defect modes with defect percentage achieving 100% improvement out of five samples
|-
! Defect Mode !! Defect % !! ΔPSNR [mean ± std. dev.] (dB) !! ΔSSIM [mean ± std. dev.] !! PSNR Percent Improvement (%) !! SSIM Percent Improvement (%) !! PSNR Improvement Rate out of 5 Samples (%) !! SSIM Improvement Rate out of 5 Samples (%)
|-
| Dead || 0.07 || 0.644 ± 0.162 || 0.0017 ± 0.0008 || 1.47 || 0.18 || 100 || 100
|-
| Hot || 0.05 || 0.027 ± 0.008 || 0.0035 ± 0.0006 || 0.08 || 0.36 || 100 || 100
|-
| Mixed || 0.05 || 0.012 ± 0.011 || 0.0024 ± 0.0009 || 0.03 || 0.25 || 100 || 100
|}

Natural Scenes

For further validation of the evaluation pipeline, scene-based testing was conducted to verify that the ChatGPT-based pixel-defect correction algorithm generalizes beyond standardized test targets (e.g., Macbeth charts) to real-world natural images. Five scenes from the ISETCam database (Stuffed Animal, Eagle, Zebra, Camera man, and Face) were included, with the mixed-mode pixel-defect percentage swept from 0.01% to 1%, to cover a variety of spatial-frequency characteristics, textures, and contrast profiles. Table 4 compares the PSNR and SSIM of the original defect images and the corrected images that meet a target of PSNR > 30 dB and SSIM > 0.93. Across all scenes, the corrected images consistently achieve higher PSNR and SSIM than their defective counterparts. Smoother scenes such as Stuffed Animal and Face show the largest gains because isolated pixel defects stand out strongly and are easier for the algorithm to repair. High-frequency scenes such as Zebra and Eagle show smaller numerical improvements, as fine textures limit how aggressively interpolation can be applied, though the PSNR and SSIM improvements remain positive in all cases.
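The per-scene pass criterion can be expressed as a small helper. This is a hypothetical structure for illustration: `results` maps each swept defect percentage to the (corrected PSNR, corrected SSIM) pair measured for that scene:

```python
def max_passing_defect_pct(results, psnr_min=30.0, ssim_min=0.93):
    """Highest defect percentage whose corrected image still meets the
    quality target (PSNR > psnr_min dB and SSIM > ssim_min), as listed
    per scene in Table 4. Returns None if no percentage passes."""
    passing = [pct for pct, (psnr, ssim) in results.items()
               if psnr > psnr_min and ssim > ssim_min]
    return max(passing) if passing else None
```

For example, a scene passing at 0.1% and 0.5% but failing at 1% would report 0.5, matching the "defect percentage passing" column of Table 4.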

Fig. 28 provides a visual comparison to demonstrate that pixel defects introduce bright or dark speckles that disrupt perceptual coherence, and the corrected images remove these artifacts while largely preserving key structures and edges; this confirms that the algorithm consistently restores perceptual quality without oversmoothing important details.

{| class="wikitable"
|+ Table 4. PSNR and SSIM improvement for different natural scenes using ChatGPT pixel-defect correction
|-
! Scene !! Defect PSNR (dB) !! Defect SSIM !! Corrected PSNR (dB) !! Corrected SSIM !! Defect % passing PSNR > 30 dB & SSIM > 0.93
|-
| Stuffed animal || 33.81 || 0.8660 || 34.44 || 0.9487 || 0.5
|-
| Eagle || 32.33 || 0.9015 || 32.39 || 0.9305 || 0.09
|-
| Zebra || 34.42 || 0.9252 || 34.52 || 0.9359 || 0.1
|-
| Camera man || 33.80 || 0.8780 || 33.94 || 0.9433 || 0.1
|-
| Face || 33.20 || 0.8845 || 33.41 || 0.9364 || 0.3
|}
Fig. 28 Different natural scenes tested for ChatGPT pixel-defect correction that pass the requirement of PSNR > 30 dB and SSIM > 0.93

Limitations of the GenAI-Assisted Algorithm Generation for Image Processing

While GenAI is an emerging tool that can expedite the development, testing, and prototyping of image processing algorithms, it should be used in conjunction with domain expertise to ensure that the resulting algorithms are reliable and satisfy all of the desired specifications. A lack of domain-specific understanding of image formation or sensor characteristics can hinder GenAI's ability to generate optimal algorithms; for instance, physical properties of noise or defects may not be fully accounted for during development. As the denoising and pixel-defect correction case studies showed, multiple iterations of refinement and the injection of domain-specific knowledge are necessary to optimize the image processing algorithms. There is also a risk of overfitting and overgeneralization, in which GenAI-generated algorithms enhance the quality of specific example images but degrade performance on new ones. As demonstrated, extensive validation is needed to verify that the logic used for the image processing and simulation is sound and suitable for the intended application, and the evaluation metrics of the simulation study should align with theoretical expectations.

Conclusions

Summary of Investigative Studies

The methodologies and simulation case studies presented in this investigation underscore the importance of establishing an effective evaluation pipeline for GenAI script generation with denoising and pixel-defect correction techniques for optimizing image quality metrics.

ChatGPT, an LLM-based GenAI tool, was used to generate denoising scripts for various case scenarios, accounting for parameter sweeps and varied models. The NL-means algorithm used to denoise images rendered with two different noise models (photon noise only, and the combination of photon and read noise) showed the greatest improvement in image quality at lower exposure times (short shutter time), considerably small f/# (wide aperture), and moderately low luminance. The Laplacian-based algorithm for patch-level read-noise reduction of a noisy Macbeth reflectance chart used channel-independent NL-means denoising; the denoised images demonstrated SNR improvement at a tradeoff of image blurriness. The development of this evaluation pipeline provides thorough assessment, statistical analysis, and visualization of image quality metrics for varied parameter settings and models.

With pixel-defect percentage sweeps and PSNR and SSIM as metrics, the results consistently demonstrate that the AI-generated pixel-defect correction algorithm effectively removes both dead- and hot-pixel artifacts while preserving structural detail and image sharpness. The method yields negligible or slightly negative gains only at extremely low defect densities where correction is unnecessary, especially for the dead-pixel type, but becomes reliably beneficial once defect levels reach ~0.03–0.05%, with near-100% improvement rates in PSNR and SSIM at moderate and high defect percentages. Scene-based testing further validates that the approach generalizes to real-world natural scenes, restoring perceptual quality and meeting the PSNR > 30 dB and SSIM > 0.93 requirements across diverse scenes.

Further Studies

As presented in the denoising and pixel-defect correction case studies, iterative and methodical prompt refinement is necessary to fine-tune the scripts auto-generated by GenAI. While these case studies used ChatGPT, further studies could evaluate denoising and pixel-defect correction algorithms generated by other GenAI tools built on large language models (LLMs), such as Copilot, Claude, and DeepSeek. GenAI LLMs can expedite the algorithm development phase, but stringent validation is necessary to ensure that the generated scripts adhere to all of the intended specifications.

Another important aspect that requires further exploration is how the design of prompts or instructions to these systems can shape the effectiveness of algorithms for defect removal, denoising, or enhancement. For example, recent work on prompt-driven restoration demonstrates that semantic or quantitative instructions can guide restoration models to adapt their behavior more precisely [27]. Furthermore, other critical image quality metrics, such as color accuracy, can be integrated to enhance the scope of this evaluation pipeline. It is crucial to execute the pipeline evaluation for a greater variety of scenes to provide further performance assessment of the virtual image enhancement and enable further optimization.

References

[1] R. Ehsani, S. Pathak and P. Chatterjee, "Towards Detecting Prompt Knowledge Gaps for Improved LLM-guided Issue Resolution," 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), Ottawa, ON, Canada, 2025, pp. 699-711, doi: 10.1109/MSR66628.2025.00107.

[2] J. Sauvola, S. Tarkoma, M. Klemettinen, J. Riekki, and D. Doermann, “Future of software development with Generative AI,” Automated Software Engineering, vol. 31, no. 1, Mar. 2024. doi:10.1007/s10515-024-00426-z

[3] S. Bistarelli, M. Fiore, I. Mercanti, and M. Mongiello, “Usage of large language model for code generation tasks: A Review,” SN Computer Science, vol. 6, no. 6, Jul. 2025. doi:10.1007/s42979-025-04241-5

[4] Y. Gupta, “Game-changing AI in digital mirrorless cameras today,” Shotkit, https://shotkit.com/ai-technology-mirrorless-cameras/

[5] Zhan Jingchun, G. E. Su, and Mohd Shahrizal Sunar, “Low-light image enhancement: A comprehensive review on methods, datasets and evaluation metrics,” Journal of King Saud University - Computer and Information Sciences, pp. 102234–102234, Nov. 2024, doi:10.1016/j.jksuci.2024.102234.

[6] B. A. Wandell, Foundations of Image Systems Engineering, GitHub.io. Accessed: Dec. 08, 2025. [Online]. Available: https://wandell.github.io/FISE-git/

[7] I. Ihrke, “F-Number and Focal Length of Light Field Systems: A Comparative Study of Field of View, Light Efficiency, Signal to Noise Ratio, and Depth of Field,” Optics Continuum, vol. 1, no. 4, Feb. 2022, doi: 10.1364/optcon.445077.

[8] S. Hasinoff, “Photon, Poisson Noise.” [Online]. Available: https://people.csail.mit.edu/hasinoff/pubs/hasinoff-photon-2012-preprint.pdf

[9] A. Boukhayma, A. Peizerat, and C. Enz, “Noise Reduction Techniques and Scaling Effects towards Photon Counting CMOS Image Sensors,” Sensors (Basel), vol. 16, no. 4, p. 514, Apr. 2016, doi: 10.3390/s16040514.

[10] S. Sarkar, X. Ye, G. Datta, and P. Beerel, “FixPix: Fixing Bad Pixels using Deep Learning,” arXiv, 2023. https://arxiv.org/html/2310.11637v2

[11] A. Horé and D. Ziou, "Image Quality Metrics: PSNR vs. SSIM," 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 2010, pp. 2366-2369, doi: 10.1109/ICPR.2010.579.

[12] “Image Quality Metrics - MATLAB & Simulink,” www.mathworks.com. https://www.mathworks.com/help/images/image-quality-metrics.html

[13] “PSNR,” www.mathworks.com, 2006. https://www.mathworks.com/help/vision/ref/psnr.html

[14] “ssim,” www.mathworks.com, 2014. https://www.mathworks.com/help/images/ref/ssim.html

[15] D. Williams, “Benchmarking of the ISO 12233 Slanted-edge Spatial Frequency Response Plug-in,” IS&T, 1998, https://www.imaging.org/common/uploaded%20files/pdfs/Papers/1998/PICS-0-43/613.pdf

[16] J. E. Farrell, F. Xiao, P. B. Catrysse, and B. A. Wandell, “A simulation tool for evaluating digital camera image quality,” SPIE Proceedings, vol. 5294, pp. 124–131, Dec. 2003. doi:10.1117/12.537474

[17] A. Buades, B. Coll, and J. Morel, “Non-Local Means Denoising”, Image Processing On Line, 1 (2011), pp. 208–212. doi: 10.5201/ipol.2011.bcm_nlm

[18] S. Yang et al., "A Review of Image Enhancement Technology Research," 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 2021, pp. 715-720, doi: 10.1109/MLBDBI54094.2021.00141.

[19] A. Ranjbaran, A. H. Hassan, M. Jafarpour, and B. Ranjbaran, “A Laplacian based image filtering using switching noise detector,” SpringerPlus, vol. 4, no. 119, Mar. 2015, doi: 10.1186/s40064-015-0846-5.

[20] V. Crnojevic, V. Senk and Z. Trpovski, "Advanced impulse Detection Based on pixel-wise MAD," in IEEE Signal Processing Letters, vol. 11, no. 7, pp. 589-592, July 2004, doi: 10.1109/LSP.2004.830117.

[21] Bad Pixel Removal — HSpeQ - Hyperspectral Imaging, “HSpeQ - Hyperspectral Imaging,” HSpeQ - Hyperspectral Imaging, Apr. 03, 2023. https://www.idcubes.com/documentation/bad-pixel-removal

[22] N. El-Yamany, "Robust Defect Pixel Detection and Correction for Bayer Imaging Systems,” Proc. IS&T Int’l. Symp. on Electronic Imaging: Digital Photography and Mobile Imaging XIII, 2017, pp 46 - 51, doi: 10.2352/ISSN.2470-1173.2017.15.DPMI-088

[23] R. Biradar and V. V. Kohir, “A novel image inpainting technique based on median diffusion,” Sadhana, vol. 50, 2025. https://www.ias.ac.in/describe/article/sadh/038/04/0621-0644?lang=English

[24] “The Benefits of Bringing BSI to High-Speed Applications,” Phantomhighspeed.com, 2021. https://www.phantomhighspeed.com/news/newsarticles/2021/june/tmxshort2#

[25] N. Morrison, “What is f-stop on a camera?,” www.adobe.com. https://www.adobe.com/creativecloud/photography/discover/f-stop.html

[26] X. Tan, S. Lai, Y. Liu, and M. Zhang, “Green channel guiding denoising on Bayer Image,” The Scientific World Journal, vol. 2014, pp. 1–9, Mar. 2014. doi:10.1155/2014/979081

[27] C. Qi et al., “SPIRE: Semantic Prompt-Driven Image Restoration,” arXiv (Cornell University), Dec. 2023, doi: 10.48550/arxiv.2312.11595.

Appendix I

Appendix II

Stephanie Chang: communicated with mentor and instructors, project proposal and investigation, written report, presentation, MATLAB simulation, parameter sweep with varied noise model case study (read noise, photon noise), development of evaluation pipeline and denoising algorithm

Yulin Deng: communicated with mentor and instructors, written report, project proposal and investigation, presentation, MATLAB simulation, parameter sweep with varied pixel defect case study (hot, dead, mixed), development of evaluation pipeline and pixel defect correction algorithm