PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

Dept. ECE, University of Alberta
Huawei Technologies Canada
Huawei Kirin Solution, China
Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25)

In this work, we propose PixelMan, an inversion-free and training-free method for consistent object editing via Pixel Manipulation and generation. PixelMan maintains image consistency by directly creating a duplicate copy of the source object at the target location in pixel space, and we introduce an efficient sampling approach that iteratively harmonizes the manipulated object into the target location and inpaints its original location. The key to ensuring image consistency is anchoring the output image to the pixel-manipulated image, together with various consistency-preserving optimization techniques applied during inference. Moreover, we propose a leak-proof self-attention (SA) manipulation technique that enables cohesive inpainting by addressing attention leakage, a root cause of failed inpainting.
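To make the pixel-manipulation step concrete, below is a minimal sketch of the object-duplication idea, assuming a NumPy image array and a boolean object mask; the function name and the (dy, dx) offset convention are illustrative choices, not taken from the paper.

import numpy as np

def duplicate_object(image: np.ndarray, mask: np.ndarray, dy: int, dx: int):
    """Copy the masked object to a (dy, dx)-shifted target location in pixel space.

    Returns the manipulated image plus the shifted (target) mask and the
    original mask (the vacated region that must later be inpainted).
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)                        # pixel coordinates of the object
    ty = np.clip(ys + dy, 0, h - 1)
    tx = np.clip(xs + dx, 0, w - 1)

    manipulated = image.copy()
    manipulated[ty, tx] = image[ys, xs]              # paste a duplicate of the object

    target_mask = np.zeros_like(mask)
    target_mask[ty, tx] = True
    return manipulated, target_mask, mask

The returned masks mark the pasted region to be harmonized and the vacated region to be inpainted by the subsequent sampling steps.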

Teaser Image
PixelMan achieves consistent object editing for object repositioning with lower latency and fewer inference steps, while better preserving image consistency and achieving cohesive inpainting.

Abstract

Recent research explores the potential of Diffusion Models (DMs) for consistent object editing, which aims to modify object position, size, composition, etc., while preserving the consistency of objects and background without changing their texture and attributes. Current inference-time methods often rely on DDIM inversion, which inherently compromises efficiency and the achievable consistency of edited images. Recent methods also utilize energy guidance, which iteratively updates the predicted noise and can drive the latents away from the original image, resulting in distortions. In this paper, we propose PixelMan, an inversion-free and training-free method for achieving consistent object editing via Pixel Manipulation and generation. We directly create a duplicate copy of the source object at the target location in pixel space and introduce an efficient sampling approach to iteratively harmonize the manipulated object into the target location and inpaint its original location. Image consistency is ensured by anchoring the edited image to the pixel-manipulated image and by introducing various consistency-preserving optimization techniques during inference. Experimental evaluations on benchmark datasets, along with extensive visual comparisons, show that in as few as 16 inference steps PixelMan outperforms a range of state-of-the-art training-based and training-free methods (which usually require 50 steps) on multiple consistent object editing tasks.
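As a rough illustration of the inversion-free, anchored sampling described above, the sketch below blends the evolving latents with noised latents of the pixel-manipulated image at each step. Here encode, denoise_step, and add_noise stand in for the VAE encoder, one sampler update, and the forward-noising schedule; these names and the blending rule are assumptions, not the paper's exact procedure.

import torch

def anchored_sampling(manipulated_image: torch.Tensor, edit_mask: torch.Tensor,
                      steps: int = 16, encode=None, denoise_step=None, add_noise=None):
    """Harmonize edited regions while anchoring the rest to the source latents."""
    anchor = encode(manipulated_image)             # latents of the pixel-manipulated image
    x = add_noise(anchor, t=steps - 1)             # start from the noised anchor; no DDIM inversion
    for t in reversed(range(steps)):
        x = denoise_step(x, t)                     # one sampler update on the edit branch
        keep = add_noise(anchor, t=max(t - 1, 0))  # anchor latents at the matching noise level
        x = edit_mask * x + (1 - edit_mask) * keep # only edited regions evolve freely
    return x

Because sampling starts from the noised anchor rather than an inverted trajectory, unedited regions stay pinned to the source image by construction.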

Method Overview

Overview Image
Overview of PixelMan: an efficient inversion-free sampling approach for consistent image editing that copies the object to the target location in pixel space and ensures image consistency by anchoring the generation to the latents of the pixel-manipulated image. We design a leak-proof self-attention mechanism to achieve complete and cohesive inpainting by mitigating information leakage.
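The leak-proof self-attention idea can be sketched as a standard masked attention in which queries are prevented from attending to source-object tokens that would otherwise leak into the inpainted region. The tensor shapes and masking rule below are illustrative assumptions, not the paper's exact kernel.

import torch
import torch.nn.functional as F

def leak_proof_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                         leak_mask: torch.Tensor) -> torch.Tensor:
    """q, k, v: (B, N, D); leak_mask: (B, N), True where keys must not be attended.

    Masking the source-object tokens before softmax prevents their features
    from leaking into the region being inpainted.
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)       # (B, N, N)
    scores = scores.masked_fill(leak_mask[:, None, :], float("-inf"))
    return F.softmax(scores, dim=-1) @ v

In practice one would apply this masking only to queries inside the inpainting region; the sketch masks the forbidden keys for all queries to keep it short.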

Efficiency Comparison

Efficiency Comparison
PixelMan at 16 steps requires 112 fewer NFEs (number of function evaluations) and is 15 seconds faster than DiffEditor (Mou et al. 2024a) on the COCOEE dataset.

Quantitative Results

COCOEE Dataset

Results Table 1

ReS Dataset

Results Table 2
Quantitative results on the COCOEE and ReS datasets, together with extensive visual comparisons, show that PixelMan achieves superior performance on consistency metrics for object, background, and image semantics, while achieving higher or comparable performance on image quality assessment (IQA) metrics. As a training-free method, PixelMan requires only 16 inference steps, with lower average latency and fewer NFEs than current popular methods.

Other Consistent Object Editing Tasks

Other Consistent Object Editing Tasks
Qualitative examples on other consistent object editing tasks, including object resizing and object pasting.
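For illustration, the same pixel-space manipulation extends naturally to resizing: the sketch below rescales the object's crop with nearest-neighbor indexing before pasting. The scale factor and bounding-box handling are illustrative choices; harmonization and inpainting then proceed as for repositioning.

import numpy as np

def resize_object(image: np.ndarray, mask: np.ndarray, scale: float = 1.5) -> np.ndarray:
    """Rescale the object's bounding-box crop and paste it back in pixel space."""
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1]

    nh, nw = int(crop.shape[0] * scale), int(crop.shape[1] * scale)
    ry = (np.arange(nh) / scale).astype(int)     # nearest-neighbor row indices
    rx = (np.arange(nw) / scale).astype(int)     # nearest-neighbor column indices
    resized = crop[ry][:, rx]

    out = image.copy()
    h, w = image.shape[:2]
    out[y0:min(y0 + nh, h), x0:min(x0 + nw, w)] = resized[:h - y0, :w - x0]
    return out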

Visual Comparisons

BibTeX


@inproceedings{jiang2025pixelman,
  title     = {PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation},
  author    = {Liyao Jiang and Negar Hassanpour and Mohammad Salameh and Mohammadreza Samadi and Jiao He and Fengyu Sun and Di Niu},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2025}
}