SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

CVPR 2023

{ashkan,jkelly,gilitschenski}@cs.toronto.edu, tristan.a@partner.samsung.com, {kosta,mab}@eecs.yorku.ca, alex.lev@samsung.com
1Samsung AI Centre Toronto, 2University of Toronto, 3York University, 4Vector Institute for AI

Abstract

Neural Radiance Fields (NeRFs) have emerged as a popular approach for novel view synthesis. While NeRFs are quickly being adapted for a wider set of applications, intuitively editing NeRF scenes is still an open challenge. One important editing task is the removal of unwanted objects from a 3D scene, such that the replaced region is visually plausible and consistent with its context. We refer to this task as 3D inpainting. In 3D, solutions must be both consistent across multiple views and geometrically valid. In this paper, we propose a novel 3D inpainting method that addresses these challenges. Given a small set of posed images and sparse annotations in a single input image, our framework first rapidly obtains a 3D segmentation mask for a target object. Using the mask, a perceptual optimization-based approach is then introduced that leverages learned 2D image inpainters, distilling their information into 3D space, while ensuring view consistency. We also address the lack of a diverse benchmark for evaluating 3D scene inpainting methods by introducing a dataset comprised of challenging real-world scenes. In particular, our dataset contains views of the same scene with and without a target object, enabling more principled benchmarking of the 3D inpainting task. We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches. We then evaluate on the task of 3D inpainting, establishing state-of-the-art performance against other NeRF manipulation algorithms, as well as a strong 2D image inpainter baseline.

Video

Inpainting Pipeline


An overview of our proposed inpainting pipeline. Using the posed input images and their corresponding masks (upper and lower left insets), we obtain (i) an initial NeRF with the target object present and (ii) the set of inpainted input RGB images with the target object removed (but with view inconsistencies). The initial NeRF (i) is used to compute depth values, which we inpaint to obtain depth images as geometric priors (upper-right inset). The inpainted RGB images (ii), which act as appearance priors, are used in conjunction with the depth priors to fit a 3D-consistent NeRF to the inpainted scene (lower-right inset).
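To make the flow of the pipeline concrete, here is a minimal Python sketch of the stages described above. The function names (fit_nerf, render_depth, inpaint_2d) are hypothetical placeholders for a NeRF trainer, a depth renderer, and a 2D image/depth inpainter; they are not part of a released API, and the actual implementation is in the paper and code.

# Hypothetical outline of the inpainting pipeline; fit_nerf, render_depth,
# and inpaint_2d are placeholder callables, not a released API.

def spin_nerf_inpaint(images, poses, masks, fit_nerf, render_depth, inpaint_2d):
    """images: posed RGB views; masks: per-view masks of the target object."""
    # (i) Fit an initial NeRF with the target object still present.
    initial_nerf = fit_nerf(images, poses)

    # Render depth from the initial NeRF and inpaint it inside each mask
    # to obtain per-view geometric priors.
    depth_priors = [
        inpaint_2d(render_depth(initial_nerf, pose), mask)
        for pose, mask in zip(poses, masks)
    ]

    # (ii) Inpaint each RGB view independently; these appearance priors are
    # plausible per view but not consistent across views.
    rgb_priors = [inpaint_2d(img, mask) for img, mask in zip(images, masks)]

    # Fit the final NeRF to the inpainted views, supervised by the depth
    # priors and a perceptual loss inside the masks to resolve the
    # remaining view inconsistencies.
    return fit_nerf(rgb_priors, poses, depth=depth_priors, masks=masks,
                    perceptual_loss=True)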




Sample Scene from Our Dataset

Original Scene
Our Inpainting Result

Inpainting Stages

1) Interactive Segmentation

2) Multiview Segmentation

3) Multiview Inpainting



Comparison to Concurrent Work

NeRF
NeRF-In
NeRF-In (Single)
Ours

On the Importance of the Perceptual Loss

Here, we demonstrate the importance of using the perceptual loss instead of direct MSE optimization on a 2D toy example. Consider the following RGB image and the synthetic square mask. Based on these, we create 16 different possible 2D inpaintings. Note that inpainting is an ill-posed problem, and all of the following are plausible answers for the task of inpainting the image:

Sample Image

Masked Image

16 Different Inpaintings

Now, we optimize an image to match these 16 outputs. In the first attempt, we use the Mean Squared Error (MSE) loss; in the alternative approach, we use the perceptual loss proposed in the paper over the masked region:

As evident in the results, even after only a few optimization steps on the 16 input inpaintings, the perceptual loss yields a more detailed and accurate texture. In contrast, the MSE loss struggles with the inconsistent inputs and converges to a blurry inpainted region, close to the average of all of the inputs.
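The following PyTorch sketch reproduces the spirit of this toy experiment, assuming the 16 inpaintings and the square mask described above (random tensors stand in for them here). It uses the lpips package for the perceptual loss; flipping use_perceptual to False gives the MSE baseline, whose gradients average the inconsistent targets and blur the result.

import torch
import lpips  # pip install lpips; learned perceptual similarity metric

# Stand-ins for the 16 plausible inpaintings (values in [-1, 1], as LPIPS
# expects) and the binary square mask of the inpainted region.
inpaintings = torch.rand(16, 3, 64, 64) * 2 - 1
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0

perc = lpips.LPIPS(net='vgg')                         # perceptual loss network
img = torch.zeros(1, 3, 64, 64, requires_grad=True)   # image being optimized
opt = torch.optim.Adam([img], lr=1e-2)

use_perceptual = True  # set False for the MSE baseline

for step in range(200):
    opt.zero_grad()
    if use_perceptual:
        # Composite the optimized content into each inpainting's context,
        # so gradients reach img only through the masked region, and
        # measure perceptual distance to every inpainting.
        composited = img * mask + inpaintings * (1 - mask)
        loss = perc(composited, inpaintings).mean()
    else:
        # Pixelwise MSE against all 16 inconsistent targets inside the
        # mask; the minimizer is their average, hence the blur.
        loss = ((img - inpaintings) ** 2 * mask).mean()
    loss.backward()
    opt.step()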

BibTeX

@inproceedings{spinnerf,
  title={{SPIn-NeRF}: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields},
  author={Ashkan Mirzaei and Tristan Aumentado-Armstrong and Konstantinos G. Derpanis and Jonathan Kelly and Marcus A. Brubaker and Igor Gilitschenski and Alex Levinshtein},
  booktitle={CVPR},
  year={2023},
}