On-Device Demo
Real deployment on iPhone 13 Pro Max: no cloud, fully on-device. (Demo videos play at 3x speed.)
Style Mode
Param Mode
Overview
Abstract
Reasoning photo retouching has gained significant traction: models must analyze image defects, explain their reasoning, and execute precise retouching enhancements. However, existing approaches often rely on non-differentiable external software, which creates optimization barriers, and they suffer from high parameter redundancy and limited generalization. To address these challenges, we propose VeraRetouch, a lightweight and fully differentiable framework for multi-task photo retouching. We employ a 0.5B Vision-Language Model (VLM) as the central intelligence to formulate retouching plans based on instructions and scene semantics. We also develop a fully differentiable Retouch Renderer that replaces external tools, enabling direct end-to-end pixel-level training through decoupled control latents for lighting, global color, and specific color adjustments. To overcome data scarcity, we introduce AetherRetouch-1M+, the first million-scale dataset for professional retouching, constructed via a new inverse degradation workflow. Finally, we propose DAPO-AE, a reinforcement learning post-training strategy that enhances autonomous aesthetic cognition. Extensive experiments demonstrate that VeraRetouch achieves state-of-the-art performance across multiple benchmarks while maintaining a significantly smaller footprint, enabling mobile deployment.
Highlights
- Lightweight design for controllable, interpretable mobile deployment
- Free-resolution input for flexible retouching across diverse image sizes
- Fully differentiable renderer for direct pixel-level training
- Unified support for auto, style, and parameter retouching
- AetherRetouch-1M+ for large-scale professional supervision
Visual Results
Comparisons across three retouching modes
The comparisons show how the model reshapes lighting, palette, and local color relationships while preserving detail. Auto mode analyzes image defects and generates reasoning-aware enhancements without any user prompt. Style mode translates stylistic text prompts into visual adjustments by formulating a structured retouching plan and control latents. Param mode executes exact pixel-level modifications based on professional operational parameters. (Note: The full reasoning text produced by the model is truncated in this display for visual clarity.)
Mode 01
Auto Mode
Mode 02
Style Mode
Mode 03
Param Mode
Method
A compact pipeline with explicit control over retouching intent
Reasoning Brain
A 0.5B vision-language model reads the image and optional user request, then produces an interpretable retouching plan.
Disentangled Controls
Internal latents separate lighting, global color, and specific-color adjustments for finer retouch behavior.
Differentiable Rendering
The renderer replaces external editing software so the whole system can be trained end to end at pixel level.
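To make the idea of differentiable rendering concrete, here is a minimal sketch in plain Python. It is not the VeraRetouch Retouch Renderer: the two controls (one exposure value for lighting, three channel gains for global color), the function names, and the hand-derived gradients are all assumptions chosen for illustration. The point is only that when every rendering step is a differentiable function of the controls, a pixel-level loss can drive the controls directly by gradient descent.

```python
import math

# Hypothetical controls (not the paper's latents): one exposure value for
# lighting and three per-channel gains for global color.


def render(pixels, exposure, gains):
    """Apply exposure (in stops) and per-channel gains to RGB pixels in [0, 1]."""
    scale = 2.0 ** exposure
    return [tuple(p[c] * gains[c] * scale for c in range(3)) for p in pixels]


def loss_and_grads(pixels, target, exposure, gains):
    """Mean squared pixel error and its analytic gradients w.r.t. the controls."""
    out = render(pixels, exposure, gains)
    scale = 2.0 ** exposure
    n = 3 * len(pixels)
    loss, g_exposure, g_gains = 0.0, 0.0, [0.0, 0.0, 0.0]
    for p, o, t in zip(pixels, out, target):
        for c in range(3):
            diff = o[c] - t[c]
            loss += diff * diff / n
            dloss_dout = 2.0 * diff / n
            # d(out)/d(exposure) = out * ln(2); d(out)/d(gain_c) = pixel_c * scale
            g_exposure += dloss_dout * o[c] * math.log(2.0)
            g_gains[c] += dloss_dout * p[c] * scale
    return loss, g_exposure, g_gains


def fit_controls(pixels, target, steps=1000, lr=0.5):
    """Plain gradient descent on the controls, starting from the identity edit."""
    exposure, gains = 0.0, [1.0, 1.0, 1.0]
    history = []
    for _ in range(steps):
        loss, g_exposure, g_gains = loss_and_grads(pixels, target, exposure, gains)
        history.append(loss)
        exposure -= lr * g_exposure
        gains = [g - lr * dg for g, dg in zip(gains, g_gains)]
    return exposure, gains, history


# Toy data: four RGB pixels and a target produced by a known edit.
PIXELS = [(0.2, 0.3, 0.4), (0.5, 0.4, 0.3), (0.8, 0.7, 0.6), (0.1, 0.6, 0.2)]
TARGET = render(PIXELS, 0.5, [1.1, 1.0, 0.9])

exposure, gains, history = fit_controls(PIXELS, TARGET)
```

In this toy setup, exposure and a uniform gain are interchangeable, so the fitted controls need not equal the ones used to build `TARGET`; only the rendered pixels are expected to match. A real system would use an autodiff framework and richer operators instead of hand-written gradients.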
Dataset
AetherRetouch Dataset
To support large-scale reasoning photo retouching, VeraRetouch introduces AetherRetouch-1M+, a million-scale dataset designed for professional-quality enhancement. The dataset covers diverse scenes, lighting conditions, portrait and landscape content, and rich retouching targets across auto, style, and parameter-driven workflows.
AetherRetouch is organized into three complementary parts: Auto-Retouch pairs for image-only enhancement, Style-Retouch pairs for prompt-guided stylistic editing, and Param-Retouch examples for explicit parameter-driven control. Together, these three subsets provide broad supervision for reasoning, controllability, and visual generalization across real retouching scenarios.
The data construction pipeline below mainly illustrates how the Auto-Retouch subset is built through an inverse degradation process. Starting from high-quality retouched references, the pipeline synthesizes realistic low-quality inputs to form supervision pairs for differentiable planning and rendering. This strategy enables scalable collection while preserving strong retouch targets and realistic visual degradation patterns.
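The inverse degradation idea can be sketched as follows. This is a hedged illustration, not the AetherRetouch construction code: the degradation operators (an exposure shift and a color cast), their sampling ranges, and the names `degrade` and `make_pair` are assumptions, since the specific degradations are not described here. The sketch only shows the direction of the process: start from a high-quality reference, synthesize a worse input, and keep the reference as the supervision target.

```python
import random


def degrade(pixel, exposure_stops, cast):
    """Apply an exposure shift and a per-channel color cast, clipped to [0, 1]."""
    scale = 2.0 ** exposure_stops
    return tuple(min(1.0, max(0.0, v * scale * g)) for v, g in zip(pixel, cast))


def make_pair(reference, rng):
    """Build one (input, target) pair from a high-quality reference image.

    `reference` is a list of RGB tuples in [0, 1]. The sampling ranges below
    are illustrative placeholders, not values from the dataset.
    """
    exposure = rng.uniform(-1.5, 0.5)
    cast = tuple(rng.uniform(0.8, 1.2) for _ in range(3))
    degraded = [degrade(p, exposure, cast) for p in reference]
    return {
        "input": degraded,      # synthesized low-quality image
        "target": reference,    # original retouched reference
        "params": {"exposure": exposure, "cast": cast},
    }


# Toy reference "image" with three pixels; a fixed seed keeps the pair reproducible.
REFERENCE = [(0.9, 0.8, 0.7), (0.4, 0.5, 0.6), (0.1, 0.2, 0.3)]
pair = make_pair(REFERENCE, random.Random(0))
```

Recording the sampled degradation parameters alongside each pair is a design choice of this sketch; it makes the pairs auditable and lets the same reference be reused with different degradations.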
Citation
Use VeraRetouch in your research
@article{guo2026veraretouch,
title={VeraRetouch: A Lightweight Fully Differentiable Framework for Multi-Task Reasoning Photo Retouching},
author={Guo, Yihong and Lyu, Youwei and Tang, Jiajun and Zhou, Yizhuo and Wang, Hongliang and Chen, Jinwei and Zou, Changqing and Fan, Qingnan},
journal={arXiv preprint arXiv:2604.27375},
year={2026}
}