PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

Dept. ECE, University of Alberta
Huawei Technologies Canada
Huawei Kirin Solution, China
Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25)

In this work, we propose PixelMan, an inversion-free and training-free method for consistent object editing via Pixel Manipulation and generation. PixelMan maintains image consistency by directly creating a duplicate copy of the source object at the target location in pixel space, and we introduce an efficient sampling approach that iteratively harmonizes the manipulated object into the target location and inpaints its original location. The key to ensuring image consistency is anchoring the output image to the pixel-manipulated image, together with various consistency-preserving optimization techniques applied during inference. Moreover, we propose a leak-proof self-attention (SA) manipulation technique that enables cohesive inpainting by addressing attention leakage, a root cause of failed inpainting.
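To make the pixel-manipulation step concrete, below is a minimal sketch of the object-duplication idea, assuming a NumPy image array and a boolean object mask; the function name and the (dy, dx) offset convention are illustrative choices, not taken from the paper.

import numpy as np

def duplicate_object(image: np.ndarray, mask: np.ndarray, dy: int, dx: int):
    """Copy the masked object to a (dy, dx)-shifted target location in pixel space.

    Returns the manipulated image plus the shifted (target) mask and the
    original mask (the vacated region that must later be inpainted).
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)                        # pixel coordinates of the object
    ty = np.clip(ys + dy, 0, h - 1)
    tx = np.clip(xs + dx, 0, w - 1)

    manipulated = image.copy()
    manipulated[ty, tx] = image[ys, xs]              # paste a duplicate of the object

    target_mask = np.zeros_like(mask)
    target_mask[ty, tx] = True
    return manipulated, target_mask, mask

The returned masks mark the pasted region to be harmonized and the vacated region to be inpainted by the subsequent sampling steps.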

Teaser Image
PixelMan achieves consistent object editing for object repositioning with lower latency and fewer inference steps, while better preserving image consistency and achieving cohesive inpainting.

Abstract

Recent research explores the potential of Diffusion Models (DMs) for consistent object editing, which aims to modify object position, size, composition, etc., while preserving the consistency of objects and background without changing their texture and attributes. Current inference-time methods often rely on DDIM inversion, which inherently compromises efficiency and the achievable consistency of edited images. Recent methods also utilize energy guidance, which iteratively updates the predicted noise and can drive the latents away from the original image, resulting in distortions. In this paper, we propose PixelMan, an inversion-free and training-free method for achieving consistent object editing via Pixel Manipulation and generation. We directly create a duplicate copy of the source object at the target location in pixel space and introduce an efficient sampling approach to iteratively harmonize the manipulated object into the target location and inpaint its original location. Image consistency is ensured by anchoring the edited image to the pixel-manipulated image and by introducing various consistency-preserving optimization techniques during inference. Experimental evaluations on benchmark datasets, along with extensive visual comparisons, show that in as few as 16 inference steps PixelMan outperforms a range of state-of-the-art training-based and training-free methods (which usually require 50 steps) on multiple consistent object editing tasks.
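As a rough illustration of the inversion-free, anchored sampling described above, the sketch below blends the evolving latents with noised latents of the pixel-manipulated image at each step. Here encode, denoise_step, and add_noise stand in for the VAE encoder, one sampler update, and the forward-noising schedule; these names and the blending rule are assumptions, not the paper's exact procedure.

import torch

def anchored_sampling(manipulated_image: torch.Tensor, edit_mask: torch.Tensor,
                      steps: int = 16, encode=None, denoise_step=None, add_noise=None):
    """Harmonize edited regions while anchoring the rest to the source latents."""
    anchor = encode(manipulated_image)             # latents of the pixel-manipulated image
    x = add_noise(anchor, t=steps - 1)             # start from the noised anchor; no DDIM inversion
    for t in reversed(range(steps)):
        x = denoise_step(x, t)                     # one sampler update on the edit branch
        keep = add_noise(anchor, t=max(t - 1, 0))  # anchor latents at the matching noise level
        x = edit_mask * x + (1 - edit_mask) * keep # only edited regions evolve freely
    return x

Because sampling starts from the noised anchor rather than an inverted trajectory, unedited regions stay pinned to the source image by construction.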

Method Overview

Overview Image
Overview of PixelMan: an efficient inversion-free sampling approach for consistent image editing that copies the object to the target location in pixel space and ensures image consistency by anchoring the generation to the latents of the pixel-manipulated image. We design a leak-proof self-attention mechanism to achieve complete and cohesive inpainting by mitigating information leakage.
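The leak-proof self-attention idea can be sketched as a standard masked attention in which queries are prevented from attending to source-object tokens that would otherwise leak into the inpainted region. The tensor shapes and masking rule below are illustrative assumptions, not the paper's exact kernel.

import torch
import torch.nn.functional as F

def leak_proof_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                         leak_mask: torch.Tensor) -> torch.Tensor:
    """q, k, v: (B, N, D); leak_mask: (B, N), True where keys must not be attended.

    Masking the source-object tokens before softmax prevents their features
    from leaking into the region being inpainted.
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)       # (B, N, N)
    scores = scores.masked_fill(leak_mask[:, None, :], float("-inf"))
    return F.softmax(scores, dim=-1) @ v

In practice one would apply this masking only to queries inside the inpainting region; the sketch masks the forbidden keys for all queries to keep it short.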

Efficiency Comparison

Efficiency Comparison
PixelMan at 16 steps requires 112 fewer NFEs (number of function evaluations) and is 15 seconds faster than DiffEditor (Mou et al. 2024a) on the COCOEE dataset.

Quantitative Results

COCOEE Dataset

Results Table 1

ReS Dataset

Results Table 2
Quantitative results on the COCOEE and ReS datasets, together with extensive visual comparisons, show that PixelMan achieves superior performance on consistency metrics for object, background, and image semantics, while achieving higher or comparable performance on image quality assessment (IQA) metrics. As a training-free method, PixelMan requires only 16 inference steps, with lower average latency and fewer NFEs than current popular methods.

Other Consistent Object Editing Tasks

Other Consistent Object Editing Tasks
Qualitative examples on other consistent object editing tasks, including object resizing and object pasting.
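For illustration, the same pixel-space manipulation extends naturally to resizing: the sketch below rescales the object's crop with nearest-neighbor indexing before pasting. The scale factor and bounding-box handling are illustrative choices; harmonization and inpainting then proceed as for repositioning.

import numpy as np

def resize_object(image: np.ndarray, mask: np.ndarray, scale: float = 1.5) -> np.ndarray:
    """Rescale the object's bounding-box crop and paste it back in pixel space."""
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1]

    nh, nw = int(crop.shape[0] * scale), int(crop.shape[1] * scale)
    ry = (np.arange(nh) / scale).astype(int)     # nearest-neighbor row indices
    rx = (np.arange(nw) / scale).astype(int)     # nearest-neighbor column indices
    resized = crop[ry][:, rx]

    out = image.copy()
    h, w = image.shape[:2]
    out[y0:min(y0 + nh, h), x0:min(x0 + nw, w)] = resized[:h - y0, :w - x0]
    return out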

Visual Comparisons

BibTeX


@inproceedings{jiang2025pixelman,
  title     = {PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation},
  author    = {Liyao Jiang and Negar Hassanpour and Mohammad Salameh and Mohammadreza Samadi and Jiao He and Fengyu Sun and Di Niu},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2025}
}