Snowflake Pushes Range of New AI, Developer Capabilities
In Las Vegas yesterday, at its annual Summit event, cloud data leader Snowflake made a full slate of product announcements. As you might imagine, several of them are artificial intelligence-related, but Snowflake’s reveals go beyond AI, with a host of goodies for developers as well. All in all, the announcements, Snowflake’s recent acquisitions and its strategy are emblematic of its earnest aspirations to build a complete cloud application platform, albeit a data-centric one.
I’ll provide a roundup of many of Snowflake’s announcements in this post. Note that while some of the new goodies are being announced as generally available (GA), several others are or soon will be in public or private preview. I’ll try to identify which is which.
No technology reveal in the summer of 2023 would be complete without a generative AI announcement, and Snowflake’s Summit is no exception. The company is announcing the private preview of what it calls Document AI, which allows unstructured data stored in documents to be queried in natural language. The search is powered by a large language model (LLM), which Snowflake customers can fine-tune with their own data. Rather than depending on OpenAI or other external LLM providers, Snowflake’s LLM is a first-party offering, derived from its September 2022 acquisition of Warsaw-based generative AI specialist Applica Sp. z o.o.
Snowflake is not limiting itself to the software side of AI in its Summit announcements. The company is also announcing a substantial partnership with GPU powerhouse Nvidia that will allow Snowflake to provide Nvidia GPU infrastructure along with Nvidia’s NeMo framework, an end-to-end platform for building custom LLMs. Again, there is no external LLM dependency here: NeMo itself provides the foundation models, then lets customers conduct supplemental training using their own Snowflake data, all within the context and security boundaries of their own Snowflake accounts.
NeMo was announced almost a full year ago, well before the current LLM/generative AI craze. I reported on it at the time and learned then about the power of the platform. Add in the current mainstream enthusiasm around generative AI, and this offering, which would have been intriguing to begin with, now looks especially compelling.
Get the background:
- Nvidia Shaves up to 30% off Large Language Model Training Times
- Nvidia Intros Large Language Model Customization, Services
Develop Like a Native
On the developer front, Snowflake is announcing that its Native App Framework, first announced a year ago, is now in public preview on AWS. Along with that, the company is announcing Native Apps on Snowflake Marketplace, providing a monetization and delivery platform for developers. And that monetization may be more readily realized than it might at first appear: Snowflake is also announcing that customers can use funds allocated to their Snowflake Capacity commitment to acquire those native apps, via the Marketplace Capacity Drawdown Program, which is being offered in GA.
Snowflake Native Apps allow Snowflake assets to be packaged up and made available as standalone apps. Right now, those assets include stored procedures, user-defined functions and external functions. Full-blown applications, created on the Streamlit platform (which Snowflake acquired in 2022), will eventually be available as Native Apps as well, and Streamlit in Snowflake will enter public preview soon, according to the company. Snowflake has already worked with partners to have Native Apps built, and the Marketplace is launching with over 25 of them, from the likes of Bond Brand Loyalty, Capital One Software, the Depository Trust & Clearing Corporation, and Goldman Sachs.
Important Background: Snowflake Builds out Its Data Cloud
Beyond Native Apps, there’s new stuff in Snowpark, which provides a DataFrame API for working with Snowflake data, using code that runs in the customer’s Snowflake environment. Snowpark’s GA initially supported code written in Scala and Java. Support for Python was added back in November, also in GA. As cool as that was, though, several of Snowflake’s customers told the company they wanted to develop in languages beyond that triad.
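The key idea behind a DataFrame API like Snowpark’s is that chained method calls build up a query plan lazily, which is only rendered to SQL and executed inside the warehouse when results are requested. Here’s a self-contained toy sketch of that pattern (this is an illustration of the technique, not the actual snowflake-snowpark-python library):

```python
# Toy sketch of the DataFrame-pushdown pattern a library like Snowpark uses:
# chained calls accumulate a query description; nothing touches the
# database until the plan is rendered to SQL and executed remotely.
# This is NOT the real Snowpark API -- just an illustration of the idea.

class ToyDataFrame:
    def __init__(self, table, filters=None, columns=None):
        self.table = table
        self.filters = filters or []
        self.columns = columns or ["*"]

    def filter(self, condition):
        # Return a new frame; no query runs yet (lazy evaluation).
        return ToyDataFrame(self.table, self.filters + [condition], self.columns)

    def select(self, *cols):
        return ToyDataFrame(self.table, self.filters, list(cols))

    def to_sql(self):
        # In a real pushdown library, an action like .collect() would
        # send SQL like this to the warehouse for execution.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

df = ToyDataFrame("orders").filter("amount > 100").select("id", "amount")
print(df.to_sql())  # SELECT id, amount FROM orders WHERE amount > 100
```

Because the computation is expressed as a plan rather than executed row by row in the client, the heavy lifting stays in Snowflake’s engine regardless of which client language produced the plan.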
To meet that need, and to avoid having to add support for languages one at a time, Snowflake is announcing the private preview of Snowpark Container Services, a Kubernetes-based platform that lets code written in virtually any language run in the Snowpark environment, in the context of the customer’s Snowflake account, and even be used in apps made available through the Marketplace.
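Kubernetes-based platforms of this sort are typically driven by a declarative service spec that names the container image, resources and exposed ports. A hypothetical sketch of what deploying a workload might involve (the field names here are illustrative assumptions, not Snowflake’s documented schema):

```yaml
# Hypothetical container service spec -- field names are illustrative
# assumptions, not Snowflake's actual schema.
spec:
  containers:
    - name: my-ml-service
      image: my_registry/my_ml_service:latest   # any language, any runtime
      resources:
        gpu: 1                                  # e.g., drawn from an Nvidia GPU pool
  endpoints:
    - name: api
      port: 8080
```

The point is that the unit of deployment becomes an arbitrary container rather than a function in a blessed language, which is what lets Snowflake sidestep per-language runtime support.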
Contain Your Enthusiasm
Snowpark Container Services also allows for use of open source and partner-supplied LLMs (the latter within Snowflake Native Apps). With the combination of Snowflake’s first-party LLM experiences, Snowpark Container Services and Snowpark External Services — a private preview feature that allows code in the Snowflake environment to call external API endpoints — Snowflake is promoting itself as a first-class platform for generative AI.
Snowpark Container Services features a host of standard runtimes and underlying compute infrastructure. The former includes AI and machine learning tools including Dataiku‘s data science platform and Nvidia AI Enterprise. It also includes vector databases like Pinecone and applications like Amplitude customer data analytics and Sailpoint identity security. On the infrastructure side, the offerings include Nvidia GPUs and Astronomer’s Astro orchestration services, based on Apache Airflow. And back in the world of AI, Snowflake is announcing new machine learning APIs (in public preview) and the Snowpark Model Registry (in private preview).
Data Lake Capabilities No Longer on Ice
At last year’s Summit, Snowflake announced that it would soon support database tables stored in the Apache Iceberg format as an alternative native data storage option. While the capability hasn’t shipped yet, Snowflake has apparently improved on its originally envisioned model for Iceberg tables and says the newly revamped feature will enter private preview soon.
Iceberg is one of three major open source table formats (the others being Delta Lake and Apache Hudi) that enhance data stored in data lakes, in the open source Apache Parquet format, with special metadata and additional capabilities that make those files behave more like true relational database tables than simple standalone data files. These capabilities include “time travel” (historical column values are retained for a period of time, so a table can easily be viewed as of a date in the past) as well as fast updates, inserts and deletes, all done with full ACID consistency.
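Mechanically, table formats like Iceberg implement time travel by recording each commit as an immutable snapshot and resolving a historical query against the last snapshot at or before the requested point. A stdlib-only toy sketch of that idea (a simplified illustration, not Iceberg itself, which stores deltas and metadata files rather than full copies):

```python
# Toy illustration of snapshot-based "time travel": every commit produces
# an immutable snapshot, and a historical read picks the last snapshot at
# or before the requested commit id. Real table formats like Iceberg track
# deltas plus metadata files, but the lookup semantics are similar.
import bisect

class VersionedTable:
    def __init__(self):
        self.snapshots = []  # list of (commit_id, rows) -- append-only history

    def commit(self, commit_id, rows):
        # This toy stores a full copy per commit for simplicity.
        self.snapshots.append((commit_id, list(rows)))

    def read(self, as_of=None):
        if not self.snapshots:
            return []
        if as_of is None:
            return self.snapshots[-1][1]            # latest snapshot
        ids = [cid for cid, _ in self.snapshots]
        i = bisect.bisect_right(ids, as_of) - 1     # last commit <= as_of
        return self.snapshots[i][1] if i >= 0 else []

t = VersionedTable()
t.commit(1, [("order-1", 100)])
t.commit(2, [("order-1", 100), ("order-2", 250)])
print(t.read(as_of=1))  # [('order-1', 100)]
print(t.read())         # [('order-1', 100), ('order-2', 250)]
```

Because old snapshots are immutable, readers get a consistent view even while writers commit new data, which is how these formats deliver ACID semantics on top of plain files in object storage.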
Keeping certain tables in Iceberg format allows Snowflake to share the data with other data engines that are compatible with Iceberg or even just with Parquet. That means that professionals using other tools (for example, data scientists using Python) can read and write to the same physical data that Snowflake does.
The tricky part here is that, initially, Snowflake’s concept of treating Iceberg as a native format meant those Iceberg tables would be managed exclusively by the Snowflake data warehouse engine. Customers who wanted a less exclusive arrangement for table management could still use Iceberg tables, but only as so-called external tables, which are not treated as native and don’t perform as well. Snowflake now says the Iceberg table facility has been re-engineered to do away with the distinction between native Iceberg tables and external tables for Iceberg, converging the two into a single Iceberg table object. This means Iceberg tables kept in customers’ own storage can be used natively, with unified governance and, ostensibly, the associated performance gains, without having to be managed exclusively by Snowflake.
That Ain’t All, Folks
We’ve covered a lot here, but I haven’t been encyclopedic, and many other smaller features are being announced at Snowflake Summit, including a bunch around spend management and optimized utilization. But the biggies that I have covered here should provide the gist: Snowflake is not content to be customers’ cloud data warehouse platform. The company wants to supply a full data-driven cloud platform for data warehouse and data lake analytics, machine learning, and generative AI, as well as application development and monetization. It also wants to provide CPU, GPU and container-based compute services, thus providing a one-stop-shop for modern, data-driven cloud technology. And Snowflake wants to offer this across all three major cloud platforms, to companies of all stripes.
Can Snowflake do it? Can it move its multitude of private preview services to GA in a timely fashion? Can it counter the cloud providers’ own analytics and AI services, and successfully go up against independent data platform providers like Cloudera, Starburst and archrival Databricks (which is holding its own Summit this week in San Francisco)? While I’m not certain of the outcome, I am convinced that the ambition is genuine and rigorous, and the resources appear to be vast, a pairing that often makes for a winning combination. Time and the market will determine whether all these initiatives, investments and aspirations can constitute Snowflake’s avalanche.