TEXTure: Text-Guided Texturing of 3D Shapes

Anonymous Authors

TEXTure takes an input mesh and a conditioning text prompt and paints the mesh with high-quality textures, using an iterative diffusion-based process.


In this paper, we present TEXTure, a novel method for text-guided generation, editing, and transfer of textures for 3D shapes. Leveraging a pretrained depth-to-image diffusion model, TEXTure applies an iterative scheme that paints a 3D model from different viewpoints. Yet, while depth-to-image models can create plausible textures from a single viewpoint, the stochastic nature of the generation process can cause many inconsistencies when texturing an entire 3D object. To tackle these problems, we dynamically define a trimap partitioning of the rendered image into three progression states, and present an elaborated diffusion sampling process that uses this trimap representation to generate seamless textures from different views. We then show that one can transfer the generated texture maps to new 3D geometries without requiring explicit surface-to-surface mapping, as well as extract semantic textures from a set of images without requiring any explicit reconstruction. Finally, we show that TEXTure can be used to not only generate new textures but also edit and refine existing textures using either a text prompt or user-provided scribbles. Through extensive evaluation, we demonstrate that our TEXTuring method excels at generating, transferring, and editing textures, further closing the gap between 2D image generation and 3D texturing.

Texturing Results

A desktop Apple iMac

A photo of Napoleon Bonaparte

A goldfish

A puffer fish

A piranha fish

A carved wood elephant

A turtle

A wooden Klein bottle

A 90s boombox

Token-Based Texture Transfer

Teapot Images

Input Images

Teddy Images

Input Images

Input Mesh

Text-Based Refinement

Input Mesh

A plush orangutan toy

A plastic orangutan toy

Results Video

How does it work?

Texturing Pipeline

A mesh is iteratively painted from different viewpoints. In each painting iteration, we render the mesh alongside its depth and normal maps. We then calculate a trimap partitioning of the image into three distinct areas, based on the camera-space normals and a viewpoint cache representing previously viewed angles. These inputs, together with the conditioning text prompt, are then fed into a modified diffusion process that generates an updated image. This image is then projected back onto the texture map for the next iteration.
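The trimap logic above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the diffusion step is stubbed out, the viewpoint cache is simplified to a per-texel "best view cosine", and the `refine_margin` threshold and state names (`GENERATE`, `REFINE`, `KEEP`) are assumptions made for clarity.

```python
import numpy as np

# Illustrative progression states for the trimap (names assumed, not from the paper).
GENERATE, REFINE, KEEP = 0, 1, 2

def trimap_partition(view_cos, cache_cos, refine_margin=0.2):
    """Assign each visible texel one of three progression states.

    view_cos  -- cosine between the surface normal and the current view direction
    cache_cos -- best such cosine from previously painted viewpoints (0 if unseen)
    """
    trimap = np.full(view_cos.shape, KEEP, dtype=np.int32)
    unseen = cache_cos <= 0.0
    trimap[unseen] = GENERATE  # never painted: synthesize from scratch
    # Already painted, but the current view sees the surface at a clearly
    # better (more frontal) angle: refine rather than keep.
    better = (~unseen) & (view_cos > cache_cos + refine_margin)
    trimap[better] = REFINE
    return trimap

def paint_iteration(view_cos, cache_cos):
    """One painting step: partition, (diffusion would run here), update cache."""
    trimap = trimap_partition(view_cos, cache_cos)
    # A depth-conditioned diffusion model would generate/refine the masked
    # pixels here; this sketch only updates the viewpoint cache.
    new_cache = np.maximum(cache_cos, view_cos)
    return trimap, new_cache
```

For example, a texel never seen before is marked `GENERATE`, one already painted at a similar angle is marked `KEEP`, and one previously painted only at a grazing angle is marked `REFINE`:

```python
view_cos  = np.array([0.9, 0.5, 0.30])
cache_cos = np.array([0.0, 0.6, 0.05])
trimap, cache_cos = paint_iteration(view_cos, cache_cos)
# trimap -> [GENERATE, KEEP, REFINE]
```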

Check out the paper to learn more 🤓