
PyTorch Lightning and the Future of Open Source AI

23 Mar 2022 12:00pm, by William Falcon

This post is one in a series we are running in anticipation of the ScaleUp:AI conference, taking place April 6-7 in New York. See note below for the TNS “early bird” discount code.

William Falcon is the creator of the open source project PyTorch Lightning and the founder and CEO of Grid.ai. He previously co-founded the now-acquired NextGenVest and spent time at Goldman Sachs. His Ph.D. is funded by Google DeepMind and the NSF.

The use of machine learning tools in research, industrial and academic settings has enabled significant leaps in our ability to both ask and answer increasingly complex questions. Those tools, however, are not without their caveats: Managing Machine Learning Operations (MLOps), especially for teams working outside of the machine learning field, can be resource-intensive, prohibitively expensive and time-consuming. We built PyTorch Lightning to begin solving this problem.

When it became clear that Lightning’s user experience for handling complex model interactions was benefiting users across the globe, we built Grid, our in-house platform for training those models on the cloud.

Using Grid, anyone — not just people with access to industrial processing power or deep institutional resources — could train models from their laptop on the cloud. In conjunction with our open source research framework, Grid made it easier than ever before to build, deploy and scale MLOps without managing any additional infrastructure.

In the absence of a fully unified and flexible research framework, integrating machine learning tooling poses a significant challenge for both individual users and large-scale enterprises. In addition to being costly and resource-intensive, building, managing and deploying ML tools often takes time away from the tasks and research questions at hand. Think of it like having to build an internal combustion engine every time you get in the car to buy groceries.

Open source research frameworks such as PyTorch Lightning offer an elegant, powerful and accessible solution to the array of problems that the implementation of ML tooling engenders. As our framework expanded rapidly to users in industry and academia, we became increasingly committed to developing open source AI tooling that would be sustainable, accessible and interoperable.

In addition to alleviating the extant pressures users faced when deploying MLOps, one of our key priorities was ensuring that the solutions we built alongside our community would remain widely available and supported in the long term. Our commitment to open source frameworks grew out of this desire to ensure that AI tooling remained accessible and easily integrated.

That’s always been at the core of our mission: to make groundbreaking advancements in machine learning accessible to users across a wide array of settings, regardless of the resources available to them. We’ve helped neuroscientists, consultants and robotics experts leverage machine learning tools across their operations, enabling them to scale those operations without duplicating work or ballooning computational costs.

We’ve been able to engage and rely on the support of this expert community by both redoubling our commitment to open source technology and responding flexibly to our users’ needs. By leveraging the agility that open source technology affords, we are able to foster an expert community while also developing and providing a product that is user-friendly, intuitive and cutting-edge.

We initially built PyTorch Lightning for the research community, people who were already experts or otherwise had significant working knowledge of ML workflows. As the framework grew, however, it became clear that our combination of open source innovation and active community was an attractive prospect even — and, perhaps, especially — for people working outside of machine learning.

Once we realized that Lightning’s community-driven framework had the potential to pose a significant benefit for a much wider array of users than we had originally intended, we doubled down on our commitment to providing flexible, integrated and open source tools.

Our goal was simple: Find out what problems the members of our community were facing when building or deploying AI tools and work with them to alleviate those pain points. We’ve thus fostered a robust community of expert developers and contributors by responding to and directly engaging with the needs of our users.

Keeping members of this community engaged has (perhaps counterintuitively) not been a challenge, because we work on a foundation of mutual trust built on the desire to solve similar problems. This has allowed us to work seamlessly across the academic and industrial communities: wherever integrating MLOps or AI products has proven to be prohibitively difficult, a solution can be developed using Lightning.

As we consider what the future of developing, building and deploying AI products looks like, our engaged, expert community of developers is top of mind. PyTorch Lightning has already enabled hundreds of thousands of users to focus on the scalability of their MLOps without managing any of the infrastructure.

One of Lightning’s driving principles has been to enable state-of-the-art AI research to happen at scale. We designed it for professional researchers to test out complex ideas on massive compute resources without losing any flexibility, and that flexibility has since enabled users across a large range of settings to leverage AI tooling more easily than ever before. When we talk about successful AI deployment — an overarching theme of this April’s ScaleUp:AI conference — that’s what we have in mind: widening our community-driven framework to make those technologies available to an increasingly wider audience.

Leveraging open source technology means reducing the barrier to entry in deploying state-of-the-art AI tooling. This community-driven framework harnesses the distributed expertise of contributors from across the globe and makes it available to both enterprise and individual users. Lightning and Grid save time, cut back on repetitive infrastructure management and enhance the ability to solve increasingly complex research questions. 

The New Stack’s parent company Insight Partners is hosting the ScaleUp:AI conference, April 6-7 alongside partner Citi and the AI industry’s most transformational leaders. Bringing the visionaries, luminaries, and doers of AI and innovation together, this hybrid conference will unlock ideas, solve real business challenges, and illustrate why we are in the middle of the AI ScaleUp revolution — and how to turn it into commercial reality. RSVP today to access early bird pricing and receive an additional TNS discount on top: Use the code TNS25.

Feature image via Pixabay.