LOGAN Is a Deep Learning AI That Transforms 3D Shapes Seamlessly

1 Nov 2019 2:08pm, by

You might have heard the recent buzz around generative adversarial networks (GANs), a machine learning technique that can be used to create eerily convincing “deepfake” videos, to “de-identify” photos in order to protect one’s privacy, or to generate realistic-looking cityscapes in video games. GANs are unsupervised deep learning algorithms that work by pitting a “generator” neural network against a “discriminator” neural network: the generator produces images over and over, while the discriminator tries to tell them apart from real ones, until the generator produces one that fools the discriminator, resulting in a persuasive fake image.
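The tug-of-war between the two networks can be sketched with a pair of loss functions. This is a minimal illustration that assumes the textbook cross-entropy form of the GAN objective, not necessarily the one LOGAN uses; the function names are made up for this example, and the inputs are simply the probabilities a discriminator assigns to samples being real.

```python
import math

EPS = 1e-12  # guards against log(0)

def _mean_neg_log(probs):
    """Average of -log(p) over a list of probabilities."""
    return -sum(math.log(max(p, EPS)) for p in probs) / len(probs)

def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0.

    d_real: discriminator outputs (probability of "real") on real samples
    d_fake: discriminator outputs on generated samples
    """
    return _mean_neg_log(d_real) + _mean_neg_log([1.0 - p for p in d_fake])

def generator_loss(d_fake):
    """Low when the discriminator is fooled, i.e. fakes score near 1."""
    return _mean_neg_log(d_fake)

if __name__ == "__main__":
    # A generator whose fakes are rated 0.9 "real" has a lower loss than
    # one whose fakes are rated 0.1.
    print(generator_loss([0.9, 0.9]), generator_loss([0.1, 0.1]))
```

Training alternates between lowering each loss in turn, which is what pushes the generator toward ever more convincing output.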

As one can see, generative adversarial networks are pretty versatile, and now a team of researchers from Simon Fraser University, Shenzhen University and Tel Aviv University is adapting GANs as a tool for transforming shapes, such as seamlessly morphing a sphere into a pyramid, something that would be supremely useful in the fields of computer graphics and geometric modeling.

Their work, which will be presented at SIGGRAPH Asia later this year, features a deep neural network that can automatically learn how to transform the shape of one “source” object into the shape of another “target” object. This deep neural network, dubbed LOGAN (short for “Latent Overcomplete GAN”), is able to do such a shape transformation without needing to be trained on what the intermediary steps might look like, resulting in a tool that can do more natural-looking and general-purpose transformations of objects. While that might not sound like a big deal, there is something significant here: in contrast to other models, LOGAN can perform transformations between two objects that are “unpaired.” The model doesn’t have to be hand-fed data on how to translate specific pairs of objects (for example, one particular chair transforming into one particular table). Instead, it can generalize more broadly, meaning it can transform any chair into any table in an unsupervised manner.
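The difference between paired and unpaired training data can be sketched in a few lines. This is only an illustration of the idea; the function name and the string placeholders standing in for 3D shapes are hypothetical.

```python
import random

def unpaired_batches(source_set, target_set, batch_size, rng):
    """Draw one batch from each set independently.

    No item in the source batch is matched to any particular item in the
    target batch, and the two sets do not need to be the same size.
    """
    source_batch = rng.sample(source_set, batch_size)
    target_batch = rng.sample(target_set, batch_size)
    return source_batch, target_batch

if __name__ == "__main__":
    chairs = ["chair_%d" % i for i in range(10)]  # placeholder shapes
    tables = ["table_%d" % i for i in range(7)]   # a different-sized set
    rng = random.Random(0)
    print(unpaired_batches(chairs, tables, 3, rng))
```

A paired dataset, by contrast, would have to supply explicit (chair, table) tuples, which is exactly the hand-made correspondence that is often unavailable.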

“In the real world, paired datasets are not always available,” explained Simon Fraser University PhD student and lead author Kangxue Yin to The New Stack. “For example, it is difficult to find one-to-one pairing relationships from a set of chairs and a set of tables. Also, real 3D scans of indoor scenes captured by humans and 3D scenes created by artists are two datasets that cannot be well-paired.”

A ‘Common Latent Space’

Diagram of LOGAN’s network architecture.

To create LOGAN, the team first trained a neural network to work as an “autoencoder” that takes shapes from both the source and target domains and encodes them into what the team calls a “common latent space” shared by both. Rather than having the shape translation happen directly on the input shapes, as is the case with pixel-to-pixel translations of images, the translation here occurs in this common latent space, which provides a correspondence analogous to the one that conventional shape-translating models rely on.

“The encoding allows us to put all the shapes into a common space which provides that correspondence,” said Yin. “In general, autoencoders are trained to reconstruct shapes. The codes learned by our autoencoder are overcomplete, meaning that we do not need the full code to reconstruct a shape; some parts of code can also do a good reconstruction.”
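What Yin means by “overcomplete” can be shown with a toy example. The sketch below assumes, purely for illustration, that the code is a concatenation of sub-vectors and that a partial code is formed by zeroing out all but one of them; the toy encoder, decoder and function names are hypothetical and are not taken from LOGAN itself.

```python
def encode(point):
    """Toy encoder: store the same 2D point in two redundant sub-vectors."""
    x, y = point
    return [x, y, x, y]  # sub-vector A is code[0:2], sub-vector B is code[2:4]

def mask(code, keep):
    """Build a partial code: zero every entry outside the slice `keep`."""
    start, stop = keep
    return [v if start <= i < stop else 0.0 for i, v in enumerate(code)]

def decode(code):
    """Toy decoder: average whichever sub-vectors are not zeroed out."""
    subs = [code[0:2], code[2:4]]
    active = [s for s in subs if any(v != 0.0 for v in s)]
    if not active:
        return (0.0, 0.0)
    n = len(active)
    return (sum(s[0] for s in active) / n, sum(s[1] for s in active) / n)

def squared_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def overcomplete_loss(shape, slices):
    """Reconstruction error from the full code plus from each partial code."""
    code = encode(shape)
    loss = squared_error(decode(code), shape)
    for keep in slices:
        loss += squared_error(decode(mask(code, keep)), shape)
    return loss

if __name__ == "__main__":
    print(overcomplete_loss((1.5, -2.0), [(0, 2), (2, 4)]))
```

In this toy setup either half of the code is enough to recover the input, so the combined loss is zero. A real autoencoder would have to learn that property during training.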

To ensure that the target shape clearly appears to have evolved from the source shape, LOGAN also incorporates a “translator” network that’s based on generative adversarial networks (GANs). The translator is trained with what the team calls a “feature preservation loss,” which means that some of the features that are common to both the source object and the target object are preserved, while other characteristics are changed, depending on the shapes that are given.
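One way such a loss could work is sketched below. It assumes, as an illustration rather than a statement of the team’s exact formulation, that the translator is penalized for changing latent codes that already belong to the target domain, so it learns to leave target-like features alone. All names and the weighting parameter are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length codes."""
    return sum(abs(u - v) for u, v in zip(a, b))

def feature_preservation_loss(translate, target_codes):
    """Mean L1 change the translator makes to codes already in the target domain.

    translate: a function mapping a latent code (list of floats) to a new code
    target_codes: latent codes of shapes from the target domain
    """
    total = sum(l1_distance(translate(z), z) for z in target_codes)
    return total / len(target_codes)

def translator_loss(adversarial_term, preservation_term, weight=1.0):
    """Combine the GAN term with the preservation term (weight is assumed)."""
    return adversarial_term + weight * preservation_term

if __name__ == "__main__":
    codes = [[1.0, 2.0], [3.0, 4.0]]
    identity = lambda z: list(z)           # changes nothing
    shift = lambda z: [v + 1.0 for v in z]  # changes every feature
    print(feature_preservation_loss(identity, codes))
    print(feature_preservation_loss(shift, codes))
```

A translator that leaves target codes untouched incurs no penalty, while one that alters every feature is penalized in proportion to how much it changes.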

Comparison of transformations that preserve the style of the original letters (first row), done by LOGAN and other existing GANs.

LOGAN transforms tables from a lower to a higher height.

LOGAN’s autoencoder takes shapes represented as “point clouds” and encodes that data into latent codes.

The team notes that a tool like LOGAN could have applications in a number of diverse fields. For instance, whether designing fonts or furniture, the network can be given a letter or a piece of furniture done in a particular style, and it will then produce a new letter or piece of furniture in the same style. As one might expect, LOGAN is only the first step of many toward developing a truly general-purpose shape-transform algorithm. For now, technical improvements are still needed before similar systems can understand the deeper relationships between forms more intelligently and expand the boundaries of geometric deep learning further.

Images: Simon Fraser University, Shenzhen University and Tel Aviv University