
Picture This: Google Trains AI to Create Professional-Quality Art Photography

21 Jul 2017 1:25pm, by

An “art” landscape photograph from Interlaken, Switzerland, produced from a Google Earth image by Creatism — Google’s new experimental “deep-learning system for artistic content creation.”

There are many professions where human workers are being replaced by intelligent machines. Cashiers at stores and restaurants, factory workers, even farm laborers are all being swapped out for robots at a dizzying pace. Until now, however, those in the artistic professions felt pretty safe from the threat. After all, how could an algorithm ever replicate the inenarrable process of human creativity?

Enter Hui Fang and Meng Zhang, software engineers at Google Research who specialize in machine perception. On July 11th Fang and Zhang released a white paper announcing Creatism, a “deep-learning system for artistic content creation.”

The pair was interested in moving beyond the objective realm to which machine learning has been limited thus far: an essentially binary universe where tasks have linear, clearly defined parameters and produce an outcome that is either correct or incorrect. Right vs. wrong answers underlie the machine learning “training process,” but how could this work in an area like music or the visual arts, where there are no objective “right” or “wrong” outcomes? Aesthetic values are highly subjective, “eye of the beholder,” utterly unquantifiable entities… or are they?

Fang and Zhang wanted to find out. They drafted an experimental deep-learning system to mimic the workflow of a professional photographer: choosing and framing a composition, then applying a series of post-processing filters and effects to enhance the image. The system was allowed to “roam” a set of 40,000 Google Earth Street View landscape panoramas (drawn from places like Banff National Park, the Interlaken lakes region of Switzerland, and similar drop-dead gorgeous locales) in order to select its own images for processing. The system then cropped the image for optimal composition, applied algorithms to enhance color saturation and HDR strength, and finally applied a content-aware “dramatic mask” to adjust highlights within the photo.

A Google Earth source image (a) is selected by the AI, then cropped into (b), with saturation and HDR strength enhanced in (c), and with a dramatic mask applied in (d). Each step is guided by one learned aspect of aesthetics.
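The stages in the caption can be pictured as a chain of small image functions. The sketch below is a toy illustration only, not Creatism's code: the function names, the nested-list image format, and the hand-picked parameter values are all assumptions made for the example, the HDR step is left out for brevity, and the brightness mask is a crude stand-in for the "dramatic mask." In the system described above, a learned model picks each setting.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (r, g, b), each channel in [0.0, 1.0]
Image = List[List[Pixel]]            # rows of pixels


def clamp(value: float) -> float:
    """Keep a channel value inside [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Step (b): keep only the chosen rectangle."""
    return [row[left:left + width] for row in img[top:top + height]]


def saturate(img: Image, amount: float) -> Image:
    """Step (c), saturation only: push each channel away from the pixel's gray level.

    An amount of 1.0 leaves the image unchanged, up to floating-point rounding.
    """
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            gray = (r + g + b) / 3.0
            new_row.append(tuple(clamp(gray + amount * (c - gray)) for c in (r, g, b)))
        out.append(new_row)
    return out


def apply_mask(img: Image, mask: List[List[float]]) -> Image:
    """Step (d), stand-in only: scale each pixel's brightness by its mask value."""
    return [
        [tuple(clamp(c * m) for c in px) for px, m in zip(row, mask_row)]
        for row, mask_row in zip(img, mask)
    ]


def process(img: Image, crop_box: Tuple[int, int, int, int],
            saturation: float, mask: List[List[float]]) -> Image:
    """Run the stages in the order the caption lists them."""
    cropped = crop(img, *crop_box)          # crop_box = (top, left, height, width)
    vivid = saturate(cropped, saturation)
    return apply_mask(vivid, mask)


# A flat 3x4 test image, cropped to 2x2, then saturated and masked.
source = [[(0.5, 0.4, 0.3)] * 4 for _ in range(3)]
result = process(source, crop_box=(0, 1, 2, 2), saturation=1.5,
                 mask=[[1.0, 0.8], [0.8, 1.0]])
```

Each stage takes an image and returns a new one, so any single stage can be tuned or swapped without touching the others.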

The engineers then applied the ultimate in AI evaluations, the Turing test. They mixed the AI’s output images with other images created by humans and presented them for blind evaluation by a panel of professional photographers. The pros rated the images using a simplified system based on the kind used for judging photo competitions. Each image was given a score ranging from 1 to 4, with the lowest being “point and shoot without consideration for composition, lighting, etc.” and the highest being “pro” for images that achieved visual near-perfection. Forty percent of the AI’s images were deemed “pro” or “semi-pro” in quality — an impressive number for a first-time experiment.

Turing Test scores for image quality ratings of Creatism images mixed randomly with images made by humans, as rated by professional photographers. Up to 40 percent of the AI-produced images were rated on par with pro or semi-pro quality photos made by humans.

How did Fang and Zhang pull off teaching a bot to achieve Ansel Adams-level results in four out of ten photographs? The simplified explanation is by breaking down aesthetics into defined categories, each learned individually “with negative examples generated by a coupled image operation.” In other words, by defining “right” and “wrong” outcomes at each step of Creatism’s image manipulation process.
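One way to picture "negative examples generated by a coupled image operation": start from photos assumed to be good, then manufacture bad counterparts by running an image operation with deliberately poor settings. The following is a minimal sketch under that reading. The function names, the toy grayscale format, the brightness operation, and the "bad" settings are all invented for illustration and do not come from the white paper.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]   # toy grayscale image: flat list of brightness values in [0.0, 1.0]


def scale_brightness(img: Image, factor: float) -> Image:
    """Toy image operation, standing in for a real filter such as saturation."""
    return [max(0.0, min(1.0, value * factor)) for value in img]


def make_training_examples(
    good_images: List[Image],
    operation: Callable[[Image, float], Image],
    bad_settings: List[float],
    rng: random.Random,
) -> List[Tuple[Image, int]]:
    """Pair every good image (label 1) with a distorted copy of itself (label 0)."""
    examples = []
    for img in good_images:
        examples.append((img, 1))
        # The same operation the system will later control produces the "wrong" answer.
        examples.append((operation(img, rng.choice(bad_settings)), 0))
    return examples


examples = make_training_examples(
    good_images=[[0.2, 0.5], [0.4, 0.6]],
    operation=scale_brightness,
    bad_settings=[0.3, 1.8],          # hypothetical: far too dark or far too bright
    rng=random.Random(0),
)
```

A classifier trained on pairs like these only has to judge one aspect of a photo, and no human has to label the bad examples by hand.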

“Aesthetics is treated not as a single quantity, but as a combination of different aspects… By making image operations semi-orthogonal, we can efficiently optimize a photo one aspect at a time,” Fang wrote in the white paper. Meaning there was no need for the traditional before/after image pairing for the algorithm to identify and map the processes necessary to improve a given photo. After all, who needs a map when getting from point A to point B is a short, straight, well-defined line?
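To see why "one aspect at a time" is efficient, consider a toy version: tune each setting in turn while holding the others fixed, rather than trying every combination. The sketch below assumes a made-up scoring function and made-up candidate values; `toy_score` is a hypothetical stand-in for learned aesthetic scorers, not anything from the white paper.

```python
from typing import Callable, Dict, Sequence

Params = Dict[str, float]


def optimize_one_aspect_at_a_time(
    score: Callable[[Params], float],
    start: Params,
    candidates: Dict[str, Sequence[float]],
) -> Params:
    """Pick the best value for each parameter in turn, keeping the others fixed."""
    params = dict(start)
    for name, values in candidates.items():
        params[name] = max(values, key=lambda v: score({**params, name: v}))
    return params


def toy_score(p: Params) -> float:
    # Hypothetical scorer: peaks at saturation 1.2 and hdr 0.6, and the two
    # settings do not interfere with each other (the "orthogonal" ideal).
    return -((p["saturation"] - 1.2) ** 2) - ((p["hdr"] - 0.6) ** 2)


best = optimize_one_aspect_at_a_time(
    toy_score,
    start={"saturation": 1.0, "hdr": 0.0},
    candidates={"saturation": [0.8, 1.0, 1.2, 1.4], "hdr": [0.0, 0.3, 0.6, 0.9]},
)
print(best)  # {'saturation': 1.2, 'hdr': 0.6}
```

With two settings and four candidate values each, this takes 8 score evaluations instead of the 16 a full grid would need, and the gap widens quickly as settings are added. The shortcut is only safe to the extent that the settings really are close to independent.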

There’s a lot more to it than that, obviously. Data nerds can delve into the white paper for all the gory applied mathematics and deep-learning deets, such as:

Assume there exists a universal aesthetics metric, Φ, that gives a higher score for a photo with higher aesthetic quality. A robot is assigned a task to produce the best photograph P, measured by Φ, from its environment E. The robot seeks to maximize Φ by creating P with optimal actions, controlled by parameters {x}: 

arg max_{x} Φ(P(E, {x}))


Professional photographers in fear of losing their jobs to robots, on the other hand, can take heart from the first sentence in the above block quote, which contains the fundamental fallacy these engineers started from: “Assume there exists a universal aesthetics metric.”

That is one Grand Canyon-sized assumption you got there, guys: the very notion that there are any “right” answers when it comes to creating art. The results of the Creatism experiment are fun, and interesting, and even impressive at times. But consider that the AI was working from Google Earth images drawn only from some of the most profoundly beautiful places on the planet: Big Sur. Glacier National Park. Yellowstone. (And, yes, the Grand Canyon).

In places like these, even your Aunt Bertha out on vacation with her drugstore disposable point-and-shoot camera is going to come up with some beautiful images. It’s a combo fish-in-a-barrel, monkeys and typewriters and Shakespeare scenario.

The upshot: Neither Creatism nor Aunt Bertha poses a credible threat to anyone’s creative professional future. Not yet, anyway.
