
Streamlit: An App Builder for the Data Science Team

6 Dec 2021 3:00am, by

Not only are companies collecting vast amounts of data, but new types of data, like geographic data and sentiment analysis, are being used not only to chart the past but also, with machine learning, to predict the future.

Yet companies haven’t been able to take full advantage of that data, because sharing it internally meant building applications that took too much time and too many people to develop.

Enter Streamlit, an open source framework that makes it easy for data scientists to quickly build web apps for accessing and exploring machine learning models, advanced algorithms and complex data types.

“There is a completely new class of business intelligence problems that didn’t exist five years ago, and the traditional ways, using Tableau or Microsoft Power BI, just saying, ‘Let’s put up a dashboard, and let’s put up some graphs, and, and graph this data’ just no longer works in this world,” said Adrien Treuille, co-founder and CEO of Streamlit.

He and his co-founders, Amanda Kelly and Thiago Teixeira, met while working at the innovation lab Google X in 2013.

They began with the question: What if we could make building tools as easy as writing Python scripts?

They wanted data scientists and machine learning engineers to be able to build apps that would let them interact with the data without having to call in a tools team or manage backend data engineering tasks.

Today the San Francisco-based company, which open sourced the technology in 2019, has more than 16,000 GitHub stars and a community of more than 30,000 developers around the world. It is used by the likes of Delta Dental, Caterpillar, 7-Eleven, Uber, Ford and Pfizer.


“Building a small web app in Streamlit takes me 10% as long as it’d take to build the same thing with a conventional app-building approach. Streamlit is an even bigger win for data scientists who don’t know JavaScript, since Streamlit lets them build everything in Python,” said former Google data scientist Dan Becker, founder of Kaggle Learn and Decision.ai, now vice president of product, Decision Intelligence at DataRobot.

“Historically I’d have to manage frontend code, backend code and communication between them. With Streamlit, I can specify how I want the page to work in Python, and it takes care of everything. The pages look nice by default, saving me the trouble of writing CSS. Streamlit is uniquely easy to learn. It takes about 10 minutes to learn enough to be productive.”


Streamlit's data visualization

Part of Existing Workflow

Rather than build a one-size-fits-all tool, the idea was to create Lego-like capabilities to let users create their own ways to make sense of their data. That might mean building sliders with different variables or pulling out subsets of data into sidebars to look at it in different ways.
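Here is a minimal sketch of those building blocks, assuming a recent Streamlit release. A range slider placed in the sidebar narrows a small DataFrame to a subset of rows. The sample `sales` data and the `filter_rows` helper are hypothetical. `st.sidebar.slider` and `st.dataframe` are standard Streamlit calls. The filtering logic sits in a plain function so it can be reused and tested without the UI.

```python
import importlib.util

import pandas as pd


def filter_rows(df: pd.DataFrame, column: str, low: int, high: int) -> pd.DataFrame:
    """Keep only the rows whose `column` value falls within [low, high]."""
    return df[df[column].between(low, high)]


def main() -> None:
    import streamlit as st

    # Hypothetical sample data standing in for a real dataset.
    sales = pd.DataFrame(
        {"region": ["north", "south", "east", "west"], "units": [12, 45, 78, 30]}
    )

    # A range slider in the sidebar returns a (low, high) tuple.
    low, high = st.sidebar.slider("Units sold", 0, 100, (20, 80))
    st.dataframe(filter_rows(sales, "units", low, high))


# Launch the UI only when Streamlit is installed (for example via `streamlit run app.py`).
if __name__ == "__main__" and importlib.util.find_spec("streamlit") is not None:
    main()
```

Moving the slider reruns the script, and the table is redrawn with the matching rows.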

These apps are data visualizations written in just a few lines of Python, the mainstay of data scientists’ existing workflow. On the frontend, the React framework renders the data on screen.

Streamlit treats widgets as variables. Every interaction simply reruns the script from top to bottom.
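The sketch below shows what treating widgets as variables looks like in practice, assuming a recent Streamlit release. It reads a number from `st.number_input` and uses the returned value like any other Python variable. Changing the input reruns the whole script, so the converted value is always recomputed from the current input. The `to_fahrenheit` helper is an illustrative stand-in and is not part of Streamlit.

```python
import importlib.util


def to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def main() -> None:
    import streamlit as st

    # The widget call returns the current value as a plain float.
    celsius = st.number_input("Temperature in Celsius", value=20.0)

    # Every interaction reruns the script from the top, so this line always
    # sees the latest widget value; no callbacks or event handlers are needed.
    st.write(f"{celsius:.1f} C is {to_fahrenheit(celsius):.1f} F")


# Launch the UI only when Streamlit is installed (for example via `streamlit run app.py`).
if __name__ == "__main__" and importlib.util.find_spec("streamlit") is not None:
    main()
```

Because there are no event handlers, the control flow reads like an ordinary script. The trade-off is that expensive work runs again on every rerun unless it is cached.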

An app downloads its data only once, thanks to a cache primitive that acts as a persistent, immutable data store from which the app can safely reuse information. That eliminates redundant data fetches and computation.
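Here is a minimal sketch of that caching pattern. It assumes a current Streamlit release, where functions that return data are cached with `st.cache_data`; the 1.0-era releases discussed here used the older `st.cache` decorator. The `load_data` function and its synthetic data are hypothetical stand-ins for a slow download or query. Only the first call for a given `n_rows` pays the cost. Later reruns with the same argument get back a copy of the stored result.

```python
import importlib.util
import time

import pandas as pd


def load_data(n_rows: int) -> pd.DataFrame:
    """Hypothetical slow loader standing in for a download or database query."""
    time.sleep(1)  # simulate an expensive fetch
    xs = list(range(n_rows))
    return pd.DataFrame({"x": xs, "y": [x * x for x in xs]})


def main() -> None:
    import streamlit as st

    # Equivalent to decorating load_data with @st.cache_data (applied here so the
    # module imports without Streamlit). Results are keyed on the function and its
    # arguments, so reruns with the same n_rows skip the slow fetch.
    cached_load = st.cache_data(load_data)

    n_rows = st.slider("Rows", 10, 1000, 100)
    df = cached_load(n_rows)
    st.line_chart(df.set_index("x"))


# Launch the UI only when Streamlit is installed (for example via `streamlit run app.py`).
if __name__ == "__main__" and importlib.util.find_spec("streamlit") is not None:
    main()
```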

Streamlit deploys apps directly from private Git repos, and those apps update instantly on each commit.

It integrates with popular Python data science libraries such as NumPy, pandas, Matplotlib, scikit-learn and others.
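The following sketch shows how those libraries slot in. It assumes NumPy, Matplotlib and a recent Streamlit release are installed. NumPy fits a straight line to noisy synthetic data, Matplotlib draws an ordinary figure, and `st.pyplot` renders that figure in the app. The data and the `fit_line` helper are hypothetical. `np.polyfit` stands in for a scikit-learn model to keep the example small.

```python
import importlib.util

import numpy as np


def fit_line(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept using NumPy."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def main() -> None:
    import matplotlib.pyplot as plt
    import streamlit as st

    # Synthetic data: a known line (slope 2, intercept 1) plus adjustable noise.
    noise = st.slider("Noise level", 0.0, 5.0, 1.0)
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2 * x + 1 + rng.normal(scale=noise, size=x.size)

    slope, intercept = fit_line(x, y)

    # Build a regular Matplotlib figure and hand it to Streamlit to render.
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=10, label="data")
    ax.plot(x, slope * x + intercept, color="red", label="fit")
    ax.legend()
    st.pyplot(fig)
    st.write(f"slope = {slope:.2f}, intercept = {intercept:.2f}")


# Launch the UI only when its dependencies are installed (for example via `streamlit run app.py`).
_UI_DEPS = ("streamlit", "matplotlib")
if __name__ == "__main__" and all(importlib.util.find_spec(m) is not None for m in _UI_DEPS):
    main()
```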

“From my perspective, Streamlit is by far the fastest method to turn an interesting bit of analysis, machine learning model or clever visualization into a data product that you can easily show to other people online,” said Tyler Richards, a data scientist at Facebook who also wrote a book on Streamlit.

“I consistently have this problem where I have an awesome result at work or for a personal project, and am forced between dumbing it down to something I can stick easily into a dashboard or Word doc (a static graph or some basic performance stats on my model), or spending a huge amount of time creating a custom Flask/Django app. Streamlit is the best of both of these worlds, because I can just directly create a fully functioning web app from my already created Python script and use their tools to host it easily.”


More visualizations!

Hours, Not Weeks

Treuille pulled from his experience working with students on machine learning projects as a professor at Carnegie Mellon University and as a vice president of autonomous vehicle startup Zoox.

With Streamlit, a project that previously would take weeks can be done in a few hours, he said.

“[The data science] group has unique challenges the company has never seen before, particularly when it comes to how do we make available the insights that we’re producing, scalably, so that the marketing team can directly benefit from a model that we’ve built that predicts the future, or so that the product team can themselves look through all of this geographic data filtered in ways that are not traditionally possible, and then jump in and see sentiment analysis applied to this or that country,” he said.

“So those are the kinds of like, next-generation challenges that data scientists and machine learning engineers are very good at solving, but which have not been systematically shared more broadly in the company.”

The company’s commercial product, built on the open source technology, adds enterprise-grade data security and authentication, as well as collaboration features for both data scientists and their customers.

“Literally in an afternoon, within the work that you’re already doing, you can go from an analysis that was primarily for yourself … to something that’s interactive and shareable with somebody else,” said Kelly.

“We’ve had people tell us all the time, ‘This would have been 10,000 lines of code, if I had to put this in a different language like Flask, and it was, like 100 lines [in Streamlit].’ Or ‘This took another team three and a half months to build; I replicated the exact same thing in six hours.’”

New Features in 1.0

Though Streamlit can be deployed anywhere, the company recently announced Streamlit Cloud to handle containers, authentication, scaling, security and more.

The company’s physical infrastructure is hosted and managed on Google Cloud Platform (GCP), taking advantage of its built-in security, privacy and redundancy features.

Users’ permission levels mirror those assigned in GitHub. Users with write access to a particular app can make changes, but only those with admin access can deploy or delete an app.
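The permission rule above can be expressed in a few lines. This is a hypothetical model for illustration, not Streamlit Cloud's implementation. It reduces GitHub's repository roles to read, write and admin, ignoring intermediate roles such as triage and maintain. It encodes only the two stated rules: write access allows changes, while deploying or deleting requires admin access.

```python
from enum import IntEnum


class RepoPermission(IntEnum):
    """Simplified GitHub permission levels, ordered from least to most access."""

    READ = 1
    WRITE = 2
    ADMIN = 3


# Hypothetical encoding of the stated rules: minimum level each action requires.
REQUIRED = {
    "edit": RepoPermission.WRITE,
    "deploy": RepoPermission.ADMIN,
    "delete": RepoPermission.ADMIN,
}


def is_allowed(permission: RepoPermission, action: str) -> bool:
    """Return True if a user with `permission` may perform `action` on an app."""
    return permission >= REQUIRED[action]


if __name__ == "__main__":
    for level in RepoPermission:
        allowed = [action for action in REQUIRED if is_allowed(level, action)]
        print(level.name, allowed)
```

Using an ordered enum means a higher level automatically inherits everything a lower level may do, which matches how GitHub's admin role includes write access.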

The technology recently reached the 1.0 milestone.

“We spent basically all of 2020, and a good chunk of 2021, both adding these features, but also hardening, making sure that we were really testing with the community, really figuring out and saying, ‘Is this not just the fastest way to go out and build an app, but the best way to do that in terms of the primitives and ease of use,’” Kelly said.

Among those new features:

  • Improving caching by harnessing Apache Arrow for serialization and memory management, which added speed and responsiveness.
  • Providing more customization with app layout primitives and themes to enable users to match their company brand.
  • Adding statefulness with session state and forms to enable users to create more complex apps.
  • Adding components and integrations so users can write their own components or pull in libraries like spaCy, HiPlot or Folium. New functionality also includes the ability to send and receive video or draw on a canvas.
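Several of the features listed above, namely session state, forms and layout primitives, combine naturally. In the sketch below, which assumes a recent Streamlit release, a form collects a new item and `st.session_state` keeps the growing list across reruns. `st.columns` then places a count next to the list. The `add_item` helper and the to-do-list scenario are hypothetical.

```python
import importlib.util


def add_item(items: list[str], text: str) -> list[str]:
    """Return a new list with `text` appended, ignoring blanks and duplicates."""
    text = text.strip()
    if not text or text in items:
        return items
    return [*items, text]


def main() -> None:
    import streamlit as st

    # Session state survives reruns, so the list is not reset on each interaction.
    if "items" not in st.session_state:
        st.session_state["items"] = []

    # Widgets inside a form do not trigger a rerun until the submit button is pressed.
    with st.form("add_item"):
        text = st.text_input("New item")
        submitted = st.form_submit_button("Add")
    if submitted:
        st.session_state["items"] = add_item(st.session_state["items"], text)

    # Layout primitive: show the count and the list side by side.
    left, right = st.columns(2)
    left.metric("Items", len(st.session_state["items"]))
    right.write(st.session_state["items"])


# Launch the UI only when Streamlit is installed (for example via `streamlit run app.py`).
if __name__ == "__main__" and importlib.util.find_spec("streamlit") is not None:
    main()
```

Without session state, the list would start empty on every rerun. Without the form, each keystroke could trigger a rerun before the user finishes typing.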

Its roadmap includes plans to add to its widget library, improve the developer experience and make sharing of code, components and apps easier.

In a blog post, Crystal Huang, a self-described aspiring data scientist, explained how she used Streamlit to apply face mask detection to photos with deep learning algorithms.

Streamlit has raised $62 million, most recently a $35 million Series B round announced in April from Sequoia and previous investors Gradient Ventures and GGV Capital.