
Text2LIVE: Text-Driven Layered Image and Video Editing

Source: https://text2live.github.io/


ECCV 2022 Oral

Omer Bar-Tal*1, Dolev Ofri-Amar*1, Rafail Fridman*1, Yoni Kasten2, Tali Dekel1

1 Weizmann Institute of Science
 2 NVIDIA Research
 
* Equal contribution

Abstract

We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Specifically, given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., object's texture) or augment the scene with new visual effects (e.g., smoke, fire) in a semantically meaningful manner. Our framework trains a generator using an internal dataset of training examples, extracted from a single input (image or video and target text prompt), while leveraging an external pre-trained CLIP model to establish our losses. Rather than directly generating the edited output, our key idea is to generate an edit layer (color+opacity) that is composited over the original input. This allows us to constrain the generation process and maintain high fidelity to the original input via novel text-driven losses that are applied directly to the edit layer. Our method neither relies on a pre-trained generator nor requires user-provided edit masks. Thus, it can perform localized, semantic edits on high-resolution natural images and videos across a variety of objects and scenes.  
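To make the layered-editing idea concrete, below is a minimal sketch, assuming PyTorch and OpenAI's clip package; the generator network and the full training objective are simplified placeholders, not the authors' released implementation (the paper applies additional text-driven losses directly to the edit layer, plus structure-preservation and regularization terms).

import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
# Load CLIP in fp32 for simplicity (the CUDA default is fp16).
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()

# CLIP's standard image-normalization constants.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device)

def composite(edit_layer, image):
    """Alpha-composite an RGBA edit layer (values in [0, 1]) over the input image."""
    rgb, alpha = edit_layer[:, :3], edit_layer[:, 3:4]
    return alpha * rgb + (1.0 - alpha) * image

def clip_loss(image, text):
    """Cosine distance between CLIP embeddings of an image batch and a target text."""
    image = F.interpolate(image, size=(224, 224), mode="bilinear", align_corners=False)
    image = (image - CLIP_MEAN.view(1, 3, 1, 1)) / CLIP_STD.view(1, 3, 1, 1)
    img_emb = clip_model.encode_image(image)
    txt_emb = clip_model.encode_text(clip.tokenize([text]).to(device))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return 1.0 - (img_emb * txt_emb).sum(dim=-1).mean()

# One hypothetical training step; `generator` stands for any image-to-RGBA network.
# edit_layer = generator(image)         # shape (B, 4, H, W), values in [0, 1]
# output = composite(edit_layer, image)
# loss = clip_loss(output, "fire")      # the paper adds further loss terms on top
# loss.backward()

The point this sketch mirrors is that the loss is applied to the composite (and, in the paper, to the edit layer itself), so the generator can only add an overlay rather than resynthesize the whole image, which is what keeps the output faithful to the input.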

Semi-Transparent Effects

Text2LIVE successfully augments the input scene with complex semi-transparent effects without changing irrelevant content in the image.

[Figure: semi-transparent effects added by Text2LIVE]

Paper


Text2LIVE: Text-Driven Layered Image and Video Editing
Omer Bar-Tal*, Dolev Ofri-Amar*, Rafail Fridman*, Yoni Kasten, Tali Dekel.
(* indicates equal contribution)
arXiv

[paper]

Supplementary Material


[supplementary page]

Bibtex

@article{Text2LIVE2022,
  author  = {Omer Bar-Tal and Dolev Ofri-Amar and Rafail Fridman and Yoni Kasten and Tali Dekel},
  title   = {Text2LIVE: Text-Driven Layered Image and Video Editing},
  journal = {arXiv preprint arXiv:2204.02491},
  year    = {2022},
}

Acknowledgments

We thank Kfir Aberman, Lior Yariv, Shai Bagon, and Narek Tumanayan for their insightful comments. We further thank Narek Tumanayan for his help with the baseline comparisons.

