Snowflake Delivers Bevy of Developer Goodies
At its Snowday 2022 event, being held in San Francisco and online today, Snowflake announced a grab bag of cool dev features for its cloud data warehouse platform. A couple of the bigger announcements are specifically for Python developers, and piggyback nicely on Snowflake’s acquisition of Streamlit in March of this year. But SQL developers get some good stuff too, and all of the announced features amplify developers’ prowess and participation in the analytics world.
Torsten Grabs, Snowflake’s Director of Product Management, is pretty proud of this hit parade of announcements, explaining that Snowflake is “…giving builders the data access and tools they need to accelerate their pace of innovation securely under Snowflake’s one unified platform.” Grabs also briefed The New Stack on these features, taking us through them one by one.
To begin with, Snowflake is releasing to general availability (GA) its Snowpark for Python API. This one’s a big deal, as it provides both a client- and server-side coding solution for what seems to be the world’s favorite programming language, especially in the analytics and data science worlds. Snowpark lets developers code imperative data pipelines, client applications, stored procedures, user-defined functions (UDFs) and user-defined table functions (UDTFs).
The Snowpark feature previously worked for code written in SQL, Scala and Java, but today’s announcement puts Python on equal GA footing with those languages. Python developers just love working with DataFrames, the tabular data structures that let you step through a result set and read or manipulate the data within it. And Snowpark for Python lets them do that to their heart’s content, but in an optimized fashion that provides type checking and minimizes server round trips through a technique called lazy execution, which defers work until results are actually needed.
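To make the idea of lazy execution a little more tangible, here is a minimal sketch in plain Python. It is not Snowpark's API: the `LazyFrame` class, the `orders` table and the `fake_execute` function are all hypothetical, invented for illustration. The sketch only shows the general pattern, in which each DataFrame-style call records a step and a single statement goes to the server when results are finally requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, not Snowpark)."""

    table: str
    columns: tuple = ("*",)
    predicates: tuple = ()

    def select(self, *cols):
        # Record the projection; nothing is sent to the server yet.
        return LazyFrame(self.table, tuple(cols), self.predicates)

    def filter(self, predicate):
        # Record the predicate; still no round trip.
        return LazyFrame(self.table, self.columns, self.predicates + (predicate,))

    def to_sql(self):
        # Fold every recorded step into one SQL statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        return sql

    def collect(self, execute):
        # Only here does the (pretend) server call happen, exactly once.
        return execute(self.to_sql())


sent = []  # every statement that would have gone over the wire


def fake_execute(sql):
    # Hypothetical executor: records the statement instead of running it.
    sent.append(sql)
    return []


orders = LazyFrame("orders")
big = orders.filter("amount > 100").filter("region = 'EMEA'").select("id", "amount")

assert sent == []  # three chained calls, zero round trips so far
big.collect(fake_execute)
print(sent[0])
# prints: SELECT id, amount FROM orders WHERE (amount > 100) AND (region = 'EMEA')
```

Three chained calls produce one statement, which is the round-trip saving the lazy approach is after. A real implementation would also have to deal with joins, aggregations and type checking, all of which this toy ignores.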
UDF code executes on the Snowflake cluster, and even for code running on the client, Snowpark supports pushdown operations so much of the heavy lifting is still delegated to Snowflake. In order to stand up to such workloads, Snowflake has another feature called Snowpark-optimized warehouses, a public preview of which (on AWS-based Snowflake clusters) is being announced today.
Even more Python goodness is on the way, as Snowflake works on fully integrating Streamlit inside the Snowflake cloud. The integration will allow full-fledged applications to be hosted on Snowflake and execute in its security context, working especially well for machine learning (ML) applications. The initiative was first disclosed in June, when Snowflake announced its Native Application Framework. It is still under development, but Snowflake officials told The New Stack that a private preview of the feature would launch soon.
Declarative, Materialized and Dynamic
Unlike Python, SQL is a declarative language. Rather than doing things imperatively, row by row, SQL lets developers describe and manipulate whole sets of data with a single verbose command. As a result, it’s a great paradigm for doing all sorts of things with data, beyond typical CRUD (create, read, update and delete) operations. That’s why Snowflake has used SQL to create a declarative streaming data pipeline feature. The feature was initially christened Materialized Tables, but is now launching in public preview form as Dynamic Tables.
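The imperative-versus-declarative contrast can be seen in a few lines. The sketch below uses Python's built-in `sqlite3` module as a stand-in database, not Snowflake, and the `orders` table and its values are made up for illustration. It performs the same update twice: once row by row from Python, and once as a single set-based SQL statement.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders (amount, status) VALUES (?, ?)",
    [(50.0, "new"), (150.0, "new"), (300.0, "new")],
)

# Imperative style: fetch the rows, decide per row, write back one at a time.
rows = conn.execute("SELECT id, amount FROM orders").fetchall()
for order_id, amount in rows:
    if amount > 100:
        conn.execute("UPDATE orders SET status = 'review' WHERE id = ?", (order_id,))
imperative = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()

# Reset every row so the second approach starts from the same state.
conn.execute("UPDATE orders SET status = 'new'")

# Declarative style: describe the affected set once and let the engine do the rest.
conn.execute("UPDATE orders SET status = 'review' WHERE amount > 100")
declarative = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()

assert imperative == declarative
print(declarative)
# prints: [(1, 'new'), (2, 'review'), (3, 'review')]
```

Both routes end in the same place, but the declarative version leaves the question of how to touch each row to the database engine.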
SQL-based declarative pipelines aren’t new, but in the case of Dynamic Tables, they not only make pipelines more straightforward, but they also make it easier to work with streaming data. By creating a table-like abstraction over data streams, and letting them be read and processed via SQL, developers who are not streaming data specialists can nonetheless work with such data in a precise way, using a paradigm familiar to them. For a more concrete sense of how all this works, Snowflake created a very good video on the feature, complete with a code sample, that you may wish to view — it’s only a bit over three minutes long. By the way, Dynamic Tables and Snowpark for Python aren’t mutually exclusive. The two features can be used together, which creates interesting possibilities for implementing real-time predictions with ML models.
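As a loose analogy for a table-like abstraction over arriving data, the sketch below defines a derived result once, in SQL, and then queries it as new events land. It uses an ordinary SQLite view through Python's `sqlite3` module, which is an assumption made purely for illustration: it is not Dynamic Tables syntax, and it says nothing about how Snowflake stores or refreshes results. The `click_events` table, the `clicks_per_user` view and the helper functions are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw events, appended as they arrive (stand-in for a data stream).
conn.execute("CREATE TABLE click_events (user_id TEXT, ts INTEGER)")

# The "pipeline" is declared once, as a query over the raw events.
conn.execute(
    """
    CREATE VIEW clicks_per_user AS
    SELECT user_id, COUNT(*) AS clicks
    FROM click_events
    GROUP BY user_id
    """
)


def arrive(batch):
    # Hypothetical helper: simulate a batch of events landing.
    conn.executemany("INSERT INTO click_events VALUES (?, ?)", batch)


def snapshot():
    # Read the derived result like any other table.
    return conn.execute(
        "SELECT user_id, clicks FROM clicks_per_user ORDER BY user_id"
    ).fetchall()


arrive([("ana", 1), ("bo", 2)])
first = snapshot()

arrive([("ana", 3)])
second = snapshot()

print(first)   # prints: [('ana', 1), ('bo', 1)]
print(second)  # prints: [('ana', 2), ('bo', 1)]
```

The consumer never writes code to handle new events; it just reads the derived result with SQL. That is the developer experience the paragraph above describes, whatever the engine does underneath.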
Speaking of pipelines, authoring them is one thing, but managing them is quite another. Since pipelines run on an unattended basis, things can go wrong, and sometimes it can be hard to detect and address these exceptions. With that in mind, Snowflake is launching a set of features around pipeline observability. One of those features, now launching in public preview, provides a visualization-based monitoring interface for data flow, control flow, and task history, to help developers and DevOps personnel ensure healthy operation of pipelines. Observability also includes alerting, logging, and event tracing, all of which are launching in private preview.
The Snowday event brings other announcements too, around the “Powered by Snowflake” program, partner investments, and joint solutions with partners, along with cross-cloud data governance and business continuity. But we’ve covered the developer goodies extensively here, and there’s a lot to them. We know Snowflake wants to build its ecosystem ever bigger, and the company seems to understand very well that wooing, and empowering, developers is critical to making that ecosystem growth a reality.