Deep Learning Dissected: The Role of DevOps Teams and Workflows
This post is part of a series called “Deep Learning Dissected,” contributed by IBM, which explores the challenges of adopting deep learning-based cognitive systems. Adel El-Hallak will be speaking on this topic next Thursday at the Supercomputing 2017 conference in Denver.
Growing up in Montreal, my favorite time of year was when the city hosted the Formula 1 (F1) Grand Prix. On the surface, this single-seat auto race certainly looks like an individual sport: the guy or gal in the car. But there is far more to F1 than the driver.
There is the entrant sponsor, who registers the car and driver and maintains the vehicle. Then there is the constructor, who builds the engine and chassis and owns the intellectual rights to the design. There is also a performance engineering team that precision-tunes the car for optimal performance. Then, of course, there is the pit crew: 10-15 people who prepare and maintain the vehicle before, during, and after the race.
The point is, although the cameras are largely focused on the driver, the real story is one of collaboration: experts working together to make the difference between success and failure. Oftentimes, data scientists are given the broad, general directive to bring Artificial Intelligence (AI) into their organization, but like a driver without a car, data scientists cannot win without leveraging a team of experts.
Assembling Your High-Performance Team
Realizing business value through machine and deep learning requires the same level of teamwork across an organization. Countless stories focus solely on data scientists, the scarcity of their skills amid growing demand, and their ballooning salaries, yet they omit the remaining stakeholders and the roles they play in realizing success, much as we forget about the broader F1 team.
Before initiating any machine learning project, a data scientist needs to work closely with the business unit’s analyst to define the use case that addresses the business need and the corresponding goals of the predictive model. This will often entail multiple interactions between the data scientist and business analyst as the data is reviewed, limitations are discovered, and the model is refined until it is ultimately approved by the business unit. This also needs to be done in the context of the data governance policies adopted across the organization and, more importantly, its industry, further stressing the importance of teaming up.
If governance policies dictate the need for data to remain on-premises, data scientists have to work with their IT department to ensure the appropriate infrastructure is in place to support these compute-intensive workloads. Considering the frequent use of open source frameworks, data scientists also need to ensure their favorite tools are supported by the infrastructure and align with the organization’s service level agreements.
Since deep learning is highly dependent on large amounts of data, understanding the data that is available to an organization — where it resides and how it can be accessed, cleansed, transformed, and subsequently labeled — is also a prerequisite to AI workflows. So a data scientist also needs to partner with the data engineer in parallel with the business analyst during use case derivation. The interactions here are iterative as well, to ensure the data being collected is sufficient and satisfies the requirements of the predictive model.
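To make those cleanse, transform, and label steps concrete, here is a minimal sketch of the kind of pipeline a data scientist and data engineer might iterate on together. Everything in it — the field names, the normalization rule, and the labeling rule — is a hypothetical placeholder, not drawn from any particular dataset.

```python
# Hypothetical cleanse -> transform -> label pipeline. Field names
# ("id", "amount") and the labeling rule are illustrative only.

def cleanse(records):
    """Drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Normalize a numeric field to the 0-1 range (min-max scaling)."""
    values = [r["amount"] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on constant data
    return [{**r, "amount": (r["amount"] - lo) / span} for r in records]

def label(records):
    """Attach a binary label using a simple, made-up business rule."""
    return [{**r, "label": int(r["amount"] > 0.5)} for r in records]

raw = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},   # incomplete record: dropped by cleanse()
    {"id": 3, "amount": 50.0},
]
prepared = label(transform(cleanse(raw)))
```

In practice each stage would be reviewed with the business analyst, and the rules refined, before any model training begins — which is exactly the iteration the text describes.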
Time Trials for Deep Learning
Once the above prerequisites are met, the iterative process of building, training, and optimizing begins. While this is the phase of machine learning with the largest market skills gap, having a great data scientist does not translate to organizational success, no matter how accurate the predictive model is. An application developer, working closely with DevOps, needs to build, deploy, and manage the apps that put the deep learning algorithm to work. Part of this work is ensuring integration with the operational data against which these models will infer or score.
Even at this stage, data scientists are not off the hook. With the model scoring against the operational data, its accuracy needs to be monitored to ensure it is not losing precision, or better yet, the new data is leveraged to improve the model’s accuracy. This usually entails retraining a model with the feedback data before redeploying it into the application again.
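The monitor-and-retrain loop described above can be sketched in a few lines. This is an illustrative toy, not any particular product's API: the model, the feedback format, and the accuracy threshold are all assumptions made up for the example.

```python
# Hypothetical monitoring loop: score feedback data collected from
# production and flag the model for retraining when accuracy drifts
# below a chosen threshold.

def accuracy(model, feedback):
    """Fraction of (input, observed outcome) pairs the model gets right."""
    correct = sum(1 for x, y in feedback if model(x) == y)
    return correct / len(feedback)

def needs_retraining(model, feedback, threshold=0.9):
    """True when measured accuracy falls below the threshold."""
    return accuracy(model, feedback) < threshold

# Toy stand-in for a deployed classifier.
model = lambda x: int(x > 0.5)

# Feedback pairs gathered from the operational data.
feedback = [(0.2, 0), (0.7, 1), (0.9, 1), (0.4, 1)]  # last pair misclassified

if needs_retraining(model, feedback):
    print("accuracy below threshold -- schedule retraining")
```

In a real deployment the retraining itself would fold this feedback data back into the training set — the redeployment step the text describes — but the trigger logic is essentially this simple comparison.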
Winning the Deep Learning Race as a Team
Deriving business value through machine learning is a multidisciplinary team sport. Up-front teaming, during the use case derivation stage, is a fundamental prerequisite for success. And the partnership between these stakeholders needs to be maintained throughout the cyclical and iterative workflows, even once models are monetized.
So for all the attention that data scientists and F1 drivers receive, the reality is, they can only achieve success through collaboration with the broader teams.
Team Up So You Don’t Burn Out and Crash
As IDC recently pointed out, organizations need to plan to avoid the “infrastructure wall” as they progress through their deep learning projects. They are doing this by forming cross-disciplinary teams like the ones I’ve outlined, and you can read more about IDC’s findings.
Stay tuned for my next column, which will discuss how, once data scientists have built their team and cultivated buy-in across the organization, they can prepare data for deep learning while avoiding the common pitfalls of this time-consuming process.
Feature image via Pixabay.