When it comes to using software frameworks to train models for machine learning tasks, Google’s TensorFlow beats the University of California, Berkeley’s Caffe library in a number of important ways, argued Aaron Schumacher, senior data scientist for Arlington, Virginia-based data science firm Deep Learning Analytics.
In short, TensorFlow is easier to deploy and has a much more flexible API, among other favorable attributes, asserted Schumacher, who made his case at the OSCON open source conference, held last week in Austin. His insight could prove valuable to someone trying to get a foothold in the rapidly expanding world of machine learning.
Here are the ways, according to Schumacher, in which TensorFlow is friendlier to developers than Caffe:
1. Easier Deployment: One of the downsides of Caffe is that it lacks an easy installation mechanism, such as installation through pip, the Python package manager, which TensorFlow supports. “You always need to compile it from source,” Schumacher said.
2. High-Level APIs for Using and Sharing Models: Caffe was one of the first machine learning frameworks to offer a repository of models that developers have built and shared, through the Caffe Model Zoo. One thing Caffe is missing, however, “is the high-level APIs for building models,” something that TensorFlow provides. (In fact, Schumacher will also be giving a webinar on the new TensorFlow APIs on May 24.)
With TensorFlow, a researcher can pull in pre-trained models with a single line of Python (“tf.contrib.keras”). “You specify what you want from a model, and it comes in ready to be trained,” Schumacher said. TensorFlow also offers a scikit-learn-style package for evaluating a range of models (“tf.contrib.learn”).
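For readers on current releases: tf.contrib was removed in TensorFlow 2.0, and Keras now ships inside TensorFlow as tf.keras. The sketch below assumes TensorFlow 2.x is installed and builds one of the stock architectures in tf.keras.applications in a single call. MobileNetV2, the 96×96 input size and the 10-class output are arbitrary, hypothetical choices for illustration. Passing weights="imagenet" instead of weights=None would download ImageNet-pretrained weights (with the default 1,000-class head); the sketch uses None so it runs offline.

```python
import tensorflow as tf

# Build a stock MobileNetV2 classifier in one call.
# weights=None keeps this offline; weights="imagenet" would download
# pretrained weights instead (that option expects the 1,000-class head).
model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),  # hypothetical small input size
    weights=None,
    classes=10,               # hypothetical 10-class task
)

# Run a dummy batch of two blank images through the (untrained) model.
preds = model(tf.zeros((2, 96, 96, 3)), training=False)
print(preds.shape)  # (2, 10)
```

From here, the usual model.compile() and model.fit() calls would train the model on real data.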
3. Lifecycle Management for Developers: “The level of the API matters a lot here. With a high-level API, you can experiment quickly, but you want to have a low-level API sometimes to get at the nuts and bolts to configure stuff in a non-standard way,” Schumacher said.
Caffe’s approach, in Schumacher’s estimate, has been a “middle-to-low API,” which offers little high-level support and somewhat limited deep configurability as well. “It’s not always as low as you want it to get to change things, and if you want to go higher, you have to build your own,” he said.
For instance, Deep Learning Analytics had to build a wrapper for the PyCaffe interface to make it easier to use. Although both TensorFlow and Caffe are written in C++, TensorFlow has a much more suitable Python interface, and Python is increasingly becoming the language of choice for data scientists. Caffe’s interface is much more C++-centric, requiring users to do tasks such as creating configuration files and writing them to disk for each new machine learning job.
4. Better Support for GPUs: Caffe has some support for running jobs on GPUs, whose vector processing capabilities support parallel operations. But Caffe’s documentation for this is hidden away in its GitHub repository. And currently, its multi-GPU support offers no tools for Python: all the training must be done through a C++-based command line interface. Caffe also supports only a single style of multi-GPU configuration. “It isn’t a general multi-GPU support,” he said.
TensorFlow, by contrast, “is so easy it is amazing,” Schumacher said. All the necessary adjustments are made through tf.device(), where one designates which GPU to use. No additional documentation is needed, and no changes to the API are required. TensorFlow is also more flexible in terms of architecture: you can run two copies of a model on two GPUs, or a single big model across two GPUs.
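As a rough illustration of how little code GPU placement takes, here is a minimal sketch assuming TensorFlow 2.x. It pins a small matrix multiplication to the first GPU TensorFlow can see and falls back to the CPU on machines without one; the matrices themselves are arbitrary example values.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; on a CPU-only machine this list is empty.
gpus = tf.config.list_logical_devices("GPU")
device = gpus[0].name if gpus else "/CPU:0"

# Every op created inside this block is placed on the chosen device.
with tf.device(device):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0], [1.0]])
    c = tf.matmul(a, b)  # row sums of a

print(device, c.numpy().tolist())  # e.g. /CPU:0 [[3.0], [7.0]]
```

Splitting one big model across two GPUs amounts to wrapping different layers in different tf.device() blocks. For the other pattern, one copy of the model per GPU, TensorFlow 2.x also provides tf.distribute.MirroredStrategy, which replicates a model across the GPUs on a single machine.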
5. Better Support for Multi-Machine Configurations: Support for multiple machines is similarly easy with TensorFlow, Schumacher asserted. With Caffe, one must use the MPI library, which was originally developed for breaking apart applications to run on massively multi-node supercomputers. As a result, “for a lot of people, implementing an MPI version that is running a Caffe training process is not super easy,” Schumacher said.
TensorFlow, again, offers an easy way to configure multi-node jobs, simply by using tf.device() to assign parts of the job to the machines it can run on.
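In distributed TensorFlow, tf.device() takes a device string that names a job and a task in a cluster, such as "/job:worker/task:1", rather than a count of machines. The sketch below assumes TensorFlow 2.x and a hypothetical two-worker cluster with made-up hostnames. It shows how a tf.train.ClusterSpec describes the machines and how the per-machine device strings are formed. Actually placing operations on those devices also requires a TensorFlow server (tf.distribute.Server) running on each host, which this sketch leaves out.

```python
import tensorflow as tf

# Hypothetical two-machine cluster; replace the host:port pairs with real ones.
cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# One device string per worker task; strings like these are what
# tf.device() accepts when pinning ops to a particular machine.
worker_devices = [
    f"/job:worker/task:{i}" for i in range(cluster.num_tasks("worker"))
]
print(worker_devices)  # ['/job:worker/task:0', '/job:worker/task:1']
```

With servers running, code inside a `with tf.device(worker_devices[1]):` block would execute on the second machine.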
Feature image via Aaron Schumacher.