
A Look Inside TensorFlow, Google’s Open Source Deep Learning Framework

1 Dec 2015 12:02pm, by

Earlier this month, Google open sourced its second generation artificial intelligence engine, TensorFlow, to much fanfare, bringing the world a bit closer to user-friendly machine learning (ML).

TensorFlow is a robust, application-grade software library of ML code for computation, providing both Python and C/C++ APIs that link into a developer’s program. Google hopes that the emerging ML community will eventually extend the tool to other languages like Go, Java and JavaScript, to give programmers more options in building apps.

Although researched in academia for decades, ML is increasingly moving into corporate computing use, thanks to both the proliferation of cheap cloud computing and massive amounts of data. ML moves beyond basic computational data analysis in that it provides the framework for a computer to make guesses about what will happen in the future, given a set of data. In neural nets, which are loosely based on how the brain processes information, the prediction process is refined through feedback loops, allowing the algorithm to improve itself with repeated iterations of testing.

Microsoft offers an ML service on its Azure cloud computing platform and is fostering a community where data scientists can share and purchase algorithms for everything from voice recognition to highlighting groups of products that have been purchased in tandem. IBM offers a set of ML-based predictive analysis services on its Watson Analytics platform. Watson aims to simplify the ML interface as much as possible: a user merely enters a question and uploads data, and the service provides the means to explore the data and even extrapolate from it to predict future outcomes.

With its release under an Apache 2.0 license, TensorFlow can be used by developers and researchers in their projects and products, to teach their systems tasks like recognizing images or translating speech. But it’s not just for machine learning; it can also be used to crunch any large set of complex data. In particular, TensorFlow will allow deep learning researchers to build, train and deploy deep learning neural nets more easily, and it’s but one of the many deep learning frameworks already available to researchers, like Theano, Torch, Caffe, Neon, H2O and more.

The new Google platform took the place of the company’s internal machine learning infrastructure, DistBelief, which was first developed in 2011 and was responsible for training the deep learning neural networks underlying familiar services like Google Photos, Google Translate and YouTube, as well as experimental projects like DeepDream.

But DistBelief had its limitations: it wasn’t very user-friendly, and it wasn’t a candidate for open sourcing because it was too closely linked to Google’s internal infrastructure. According to the company, TensorFlow is smarter and more flexible than its previous system, and “twice as fast,” and it scales from a single smartphone to large, distributed systems with thousands of computers. In contrast to DistBelief, TensorFlow is also designed for other types of AI learning like reinforcement learning and logistic regression.

But before deep learning buffs dig in, it’s worthwhile to note Google isn’t giving everything away. The version available now only works on a single computer, so there is limited ability to analyze data at scale, though this may change in the future. So, at least in the short term, TensorFlow could not serve as a replacement, or even an adjunct, to big data platforms such as Hadoop or Spark.

Also, with TensorFlow, one would need large amounts of data to sufficiently train neural nets. So while Google may be open-sourcing the software, it’s still keeping its competitive edge by not giving away access to its data or its trained neural net models.

Under the Hood

Computations in TensorFlow are represented as stateful dataflow graphs, which allow models to be deployed across different devices in a distributed system without being rewritten. The nodes represent operations (“ops”), and the edges carry “tensors,” or multidimensional arrays, between them. Dataflow graphs describe the computations that happen during a “session” and can be executed on a heterogeneous range of devices, from single or multiple CPUs to GPU-accelerated environments, reportedly without many changes to the code.
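
To make the graph idea concrete, here is a minimal sketch in plain Python. It is not TensorFlow’s API: the names Node, constant and run are hypothetical, and nested lists stand in for tensors. It only illustrates the separation described above, in which the graph is first built from ops and edges and nothing is computed until a session-like run step walks it.

```python
# Toy dataflow graph. NOT TensorFlow's API: Node, constant and run are
# hypothetical names, and nested lists stand in for tensors.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Node:
    """One operation ("op"); `inputs` are the edges that feed it values."""
    name: str
    op: Callable[..., object]
    inputs: Tuple["Node", ...] = ()


def constant(name, value):
    # A source op with no inputs that always emits the same value.
    return Node(name, lambda: value)


def matmul(a, b):
    # Matrix product of two nested-list matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    # Element-wise sum of two equally shaped nested-list matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def run(fetch, cache=None):
    """Evaluate `fetch`, computing each upstream op at most once."""
    cache = {} if cache is None else cache
    if fetch.name not in cache:
        args = [run(dep, cache) for dep in fetch.inputs]
        cache[fetch.name] = fetch.op(*args)
    return cache[fetch.name]


# Building the graph only records the ops and their connections.
x = constant("x", [[1.0, 2.0]])      # 1x2 input
w = constant("w", [[3.0], [4.0]])    # 2x1 weights
b = constant("b", [[0.5]])           # 1x1 bias
xw = Node("xw", matmul, (x, w))
y = Node("y", add, (xw, b))

# Nothing has been computed until the graph is run.
result = run(y)
print(result)
```

Because the graph is only a description, the same structure could in principle be handed to different executors, which is the property the article attributes to TensorFlow’s device portability.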

TensorBoard visualization graph of a convolutional neural network model

According to some TensorFlow testers in the wild, there are some nifty features here. For starters, resource allocation is optimized in a sophisticated way, minimizing communication overhead of data across devices, making it particularly suited to scaling across distributed systems. It also features “queues” that permit segments of the graph to execute asynchronously, so that input data can be pre-fetched from disk files while a previous batch of data is still being processed. Queues can also be used for grouping other kinds of data, either by type or for efficiency.
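
The pre-fetching pattern can be sketched with Python’s standard queue and threading modules. This is an illustration of the general producer/consumer idea, not of TensorFlow’s queue ops; load_batches, producer and consume are hypothetical helpers, and generated integers stand in for data read from disk.

```python
# Producer/consumer prefetching with a bounded queue (standard library only).
# Illustrates the idea, not TensorFlow's queue implementation.
import queue
import threading


def load_batches(num_batches, batch_size):
    """Stand-in for reading batches from disk files (generated data)."""
    for i in range(num_batches):
        start = i * batch_size
        yield list(range(start, start + batch_size))


def producer(q, num_batches, batch_size):
    # Runs in its own thread, so loading overlaps with processing.
    for batch in load_batches(num_batches, batch_size):
        q.put(batch)   # blocks while the queue is full
    q.put(None)        # sentinel: no more data


def consume(q):
    totals = []
    while True:
        batch = q.get()            # blocks until a batch is ready
        if batch is None:
            break
        totals.append(sum(batch))  # stand-in for a training step
    return totals


# At most two batches wait in the queue while earlier ones are processed.
batch_queue = queue.Queue(maxsize=2)
reader = threading.Thread(target=producer, args=(batch_queue, 4, 3))
reader.start()
totals = consume(batch_queue)
reader.join()
print(totals)
```

The bounded queue is the design choice that matters here: it lets the reader run ahead of the consumer, but only by a fixed number of batches, so memory use stays predictable.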

Single machine system structure versus distributed system structure

Another strength is TensorBoard, the platform’s visualization tool for computation graph structures and summary statistics. TensorBoard’s algorithms will collapse nodes into high-level blocks, and highlight groups with the same structures, while also separating out high-degree nodes. TensorBoard is also interactive: users can pan, zoom in, expand and collapse nodes. Some computation graphs for training complex models — like that of Google’s Inception model — can have tens of thousands of nodes, so having a clean and intuitive method of visual organization is imperative to ensure that the development process can flow smoothly and that performance can be more easily inspected.
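
One way to picture the collapsing step is to group nodes by a shared name prefix and then compare the groups. The sketch below assumes slash-separated node names and uses made-up names; collapse and signature are hypothetical helpers, not TensorBoard’s actual algorithm.

```python
# Collapse graph nodes into blocks by name prefix, then find blocks that
# share the same internal structure. An illustrative assumption about how
# such grouping could work, not TensorBoard's implementation.
from collections import defaultdict


def collapse(node_names, depth=1):
    """Group slash-separated names by their first `depth` path components."""
    groups = defaultdict(list)
    for name in node_names:
        parts = name.split("/")
        groups["/".join(parts[:depth])].append(name)
    return dict(groups)


def signature(block_name, members):
    """Describe a block by the sorted names of its members, minus the prefix."""
    return tuple(sorted(m[len(block_name):].lstrip("/") for m in members))


# Made-up node names for a small convolutional model.
nodes = [
    "conv1/weights", "conv1/biases", "conv1/relu",
    "conv2/weights", "conv2/biases", "conv2/relu",
    "softmax",
]

blocks = collapse(nodes)

# Blocks with identical signatures are candidates for shared highlighting.
by_shape = defaultdict(list)
for block_name, members in blocks.items():
    by_shape[signature(block_name, members)].append(block_name)
repeated = [names for names in by_shape.values() if len(names) > 1]
print(sorted(blocks), repeated)
```

With seven nodes the benefit is small, but the same grouping applied to tens of thousands of nodes reduces the top-level view to a handful of blocks.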

TensorBoard graphs showing model summary statistics

Ultimately, TensorFlow is a flexible and adaptable system that will allow researchers and developers to collaborate more seamlessly, whether it’s for training deep neural network models for research, or for developing applications for speech recognition, computer vision, natural language processing, robotics and even studying how drugs interact with complex systems like proteins. Since the open source release isn’t fully featured, some observers have pointed out that this may hobble it somewhat in the open source space. But the available version of TensorFlow still has a number of attractive elements that might make it worthwhile for those who are curious and inclined to check it out.

For more information, read Google’s whitepaper, try out the tutorials or see the sample model architectures.

IBM is a sponsor of The New Stack.
