Microsoft Fabric Defragments Analytics, Enters Public Preview
At Microsoft’s Build conference in Seattle today, the software and cloud giant announced the public preview of Microsoft Fabric, the company’s new end-to-end analytics platform. The product both evolves and unifies heretofore separate cloud services, including Azure Synapse Analytics, Azure Data Factory and Power BI. And though all of the precursor services on the Azure side have been available as separately billed Platform as a Service (PaaS) offerings, Microsoft Fabric unifies them under a single Software as a Service (SaaS) capacity-based pricing structure, with a single pool of compute for all workloads within the scope of the platform.
Fabric’s functionality spans the full data lifecycle, including data ingest and integration, data engineering, real-time analytics, data warehousing, data science, business intelligence and data monitoring/alerting. All such workloads are data “lake-centric” and operate on top of a unified data store called OneLake, in which data is persisted in Apache Parquet/Delta Lake format, a combination that brings relational table-like functionality to data stored in open data lake formats. And since Fabric can also connect to data in Amazon S3 (and, in due time, to Google Cloud Storage, Microsoft promises) via so-called “shortcuts” in OneLake, there’s a multicloud dimension to all of this as well.
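The Parquet/Delta combination deserves a moment of explanation. A Delta Lake table is just a folder of Parquet data files plus a `_delta_log` directory of ordered JSON commit files; replaying those commits tells an engine which files constitute the current table, and that is how table semantics get layered over plain lake storage. Here is a pure-Python sketch of that replay logic (the toy commits and file names are invented for illustration; real Delta logs carry much more metadata, such as schema and statistics):

```python
import json
import tempfile
from pathlib import Path

# A Delta table is Parquet files plus a _delta_log of ordered JSON commits.
# Each commit line holds an action: "add" registers a data file, "remove"
# tombstones one. Replaying commits in version order yields the table state.
log = Path(tempfile.mkdtemp()) / "_delta_log"
log.mkdir()

# Commit 0: initial load adds two Parquet files (names are made up).
(log / "00000000000000000000.json").write_text(
    '{"add": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0001.parquet"}}\n'
)
# Commit 1: a rewrite removes one file and adds its replacement.
(log / "00000000000000000001.json").write_text(
    '{"remove": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0002.parquet"}}\n'
)

def live_files(log_dir: Path) -> set[str]:
    """Replay the commit log in version order to find the active data files."""
    files: set[str] = set()
    for commit in sorted(log_dir.glob("*.json")):  # zero-padded names sort correctly
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files

print(sorted(live_files(log)))  # ['part-0001.parquet', 'part-0002.parquet']
```

Because the log, not any single engine, defines the table, Spark, Fabric’s SQL endpoint and Power BI can all read the same physical files without copying the data.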
Less Assembly Required
The New Stack was briefed last week on Microsoft Fabric by Arun Ulagaratchagan, Microsoft’s Corporate Vice President of Azure Data, who provided important color on why Microsoft created Fabric and what its goals were with it. With the benefit of that insight, it seems the real innovation in Microsoft Fabric may be around the analytics integration it delivers, which has been sorely lacking in the industry at large, and even within Microsoft’s own collection of data and analytics services.
Ulagaratchagan got to the point pretty quickly with one key observation about the analytics landscape today: “if you put yourself in the customers’ shoes, there are literally hundreds, thousands of products out there that they have to figure out what makes sense for them, how do they use it, how do they wire it all up together to be able to take advantage [of] the data, get it to the right shape that they need, and make it work for their business.”
Fabric was designed to solve this very real deficiency in the modern data stack, and the level of integration and rationalization that Fabric has already achieved is unprecedented. I say this as someone who participated in the early adopter program for Fabric and who has used Microsoft data and analytics technology for almost 30 years.
The Microsoft analytics stack was fractured even in the enterprise software days, and the era of the cloud has only made it worse. What Microsoft has done with Fabric is to address these disconnects and arbitrary segregations of functionality, not just technologically, but also in terms of the pricing/billing model and the structure of the engineering organization behind it all. I will explain all of that in more detail later in this post, but first, let’s take inventory of Fabric’s components, capabilities and use cases.
What’s in the Box?
Of Fabric’s seven core workload-specific components, one, called Data Activator, which implements data monitoring/alerting, is built on new technology and is in private preview. The other six, which are based on technologies that existed previously, and are available today in the Fabric public preview, are as follows:
- Data Factory, based on Azure Data Factory and Power Query technology, provides for visual authoring of data transformations and data pipelines.
- Synapse Data Engineering, based on the same technology as the Spark pools in Azure Synapse Analytics, and related technologies for Apache Spark, including notebooks, PySpark and .NET for Apache Spark, provides for code-first data engineering. However, unlike the explicitly provisioned Spark pools in Azure Synapse Analytics, Spark resources in Fabric are provided on a serverless basis using “live-pools.”
- Synapse Data Science, which provides for training, deployment and management of machine learning models. This component is also based largely on Spark, but incorporates elements of Azure Machine Learning, SQL Server Machine Learning Services and the open source MLflow project, as well.
- Synapse Data Warehousing, based on an evolution of the original Azure SQL Data Warehouse (and, ultimately, SQL Server) technology, provides a “converged” lakehouse and data warehouse platform.
- Synapse Real-Time Analytics, which combines Azure Event Hubs, Azure Stream Analytics, Azure Data Explorer and even open source event streaming platforms like Apache Kafka, allows analytics on IoT, telemetry, log and other streaming data sources.
- Power BI, Microsoft’s flagship business intelligence platform, soon to be enhanced with a new large language model AI-based Copilot experience that can generate DAX (Data Analysis eXpressions — Power BI’s native query language). In many ways, Power BI is the “captain” of the Microsoft Fabric team, as its Premium capacities and workspaces are the basis for their counterparts in Fabric.
The inclusion of Power BI in Microsoft Fabric goes beyond its own capabilities and provides integration with Microsoft 365 (read: Office) by extension. Tight integration between Power BI on the one hand, and Excel, PowerPoint, Teams, SharePoint and Dynamics 365, on the other, means the power of Fabric can be propagated outwards, to bona fide business users and not just business data analysts.
All for One
Despite the varied branding, which may have been done for reasons of continuity or politics, Fabric is a single product with a single overarching user interface and user experience. As Ulagaratchagan explained, “even though it is seven workloads running on top of OneLake, it looks and feels and works from an architecture perspective as one integrated product. We wanted to make sure we conveyed how all of these experiences just flow. That’s why the Fabric name seemed appropriate.”
Although persona-specific UIs are provided for different workloads, they are more akin to “skins” or “views” than distinct products. In fact, the “Create” button in Fabric’s navigation bar presents a menu of all artifacts from all workloads, as an alternative to the compartmentalized experiences and, in the process, emphasizes the integrated nature of it all.
Whether the workloads are engaged from their respective user interfaces or the general one, the “artifacts” created in each are kept together in unified Fabric workspaces. And because the basis for each data artifact consists of Delta/Parquet data files stored in OneLake, many of the assets are just thin, workload-specific layers that sit atop those physical data files. For example, a collection of data in OneLake is directly readable and writeable by Spark, as with any data lake, but it can also manifest as tables in a relational data warehouse or a Power BI dataset.
Of course, each artifact type can contain its own unique assets; for example, a warehouse can have views and stored procedures, and a Power BI dataset can have measures and hierarchies. None of the components in Fabric gets dumbed down, but several (and, eventually, all, quite possibly) use Delta/Parquet as a native format, so the data doesn’t need to be replicated in a series of component-specific proprietary formats.
Cooperation, Not Duplication
This means that, in Fabric, a data engineer writing Python code running on Spark, a data scientist training a machine learning model, a business analyst creating sophisticated data visualizations, and an ETL engineer building a data pipeline are all working against the same physical data. And folks using other data platforms — including Databricks or non-Microsoft BI tools — can share this same physical data too, because OneLake is based on, and API-compatible with, Azure Data Lake Storage, to which most modern data stack technologies have connectivity.
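To illustrate that compatibility: Microsoft documents a OneLake URI convention that mirrors ADLS Gen2’s abfss:// addressing, so clients that already speak the ADLS Gen2 API can point at Fabric data without code changes. The minimal helper below sketches that convention; the workspace and lakehouse names are hypothetical, and the exact URI shape is as documented at the time of writing:

```python
# Sketch of OneLake's documented ADLS-Gen2-style addressing. Any client that
# can read abfss:// (ADLS Gen2) paths can target these URIs. The workspace
# and lakehouse names used below are hypothetical placeholders.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def onelake_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an abfss:// URI for a file or folder inside a Fabric item."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{path}"

uri = onelake_uri("SalesWorkspace", "Contoso", "Lakehouse", "Tables/orders")
print(uri)
# abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/Contoso.Lakehouse/Tables/orders
```

A Databricks cluster or any ADLS-aware BI tool could, in principle, read the Delta table at such a path directly, which is the multicloud sharing story in practice.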
In the case of Power BI, the ramifications get even more interesting. When working with data in Microsoft Fabric, a BI engineer doesn’t have to decide whether to import the data into a Power BI model or leave it in OneLake and query it on the fly. With the aid of something called Direct Lake mode, that distinction goes away, because the data in OneLake already is in a Power BI-native format. Power BI already supported composite models, where Import and DirectQuery access methods could be combined in a single model. But with Direct Lake mode, composite models aren’t necessary, as the need to import data from OneLake is simply eliminated.
Eliminating the segregation between services, artifacts and data formats means the economics get simpler, too. Fabric’s capacity-based compute model provides processing power that is fungible between, and usable from, all of its workloads. Ulagaratchagan had this to say on that subject: “We see this as an opportunity for customers to save a ton of money because today, often every analytics product has multiple subsystems. These subsystems typically require different classes of products, often coming from different vendors, and you’re provisioning multiple pools of compute across many different products, and weaving them all together to create one analytics project.”
While Ulagaratchagan identifies this as a problem with a multi-vendor approach, even a Microsoft-only solution has up until now suffered from the same issue. The combination of Power BI and the present-day Azure services needed to create an equivalent to Microsoft Fabric has up until now required separately provisioned compute. This sprawl can even be an issue within an individual service. For example, Azure Synapse requires the management of four different types of compute clusters (Dedicated SQL, Serverless SQL, Spark and Data Explorer), three of which invoke separate infrastructure lifecycles and billing. Fabric eliminates these redundancies and their accompanying complexity and expense.
Take Me to Your (Common) Leader
There’s corporate unification at play here too. Many of the teams at Microsoft that built Fabric’s forerunner technologies — like Azure Synapse, Data Factory and Power BI — worked together to build Fabric and are all part of the same organizational structure under the management of Ulagaratchagan and the technical direction of Technology Fellow Amir Netz, the duo that previously led the standalone Power BI organization. This alignment is rather unprecedented at Microsoft, a company infamous for its internal competition and the sometimes disjoint technologies that result. The challenges here involved geography, too: engineering teams in the US, India, Israel and China, each with their own culture and operating in their own time zones, worked together in a remarkably cohesive fashion to build Fabric.
Building a federated product team like this was a calculated but very big gamble. Frankly, it could have gone horribly wrong. But from my point of view, morale was high, hubris was practically non-existent and top talent at Microsoft was skillfully harnessed to build a very comprehensive platform that changes the analytics game immensely. All three cloud providers have been guilty of creating numerous siloed services and putting the burden of implementing them in combination on the customer. That’s not just an issue of discourtesy or insensitivity — it’s one of expense too, as customers need either to allocate significant human resources to such projects, or else invest an enormous amount of capital in the consulting talent necessary to carry them off.
Some of the technologies that now work together within Fabric have literally decades of history. And those that are newer had to be integrated with the older ones, and each other. Much as PC “defrag” tools tidy and reorganize files on spinning hard drives that have become scattered across the disk, Microsoft has had to defrag itself and its analytics technology stack to get Fabric built. Even if much of the technology itself isn’t new, the harmonious unification of it is a huge breakthrough that will enable new analytics use cases because of new simplicity, ease of use, efficiencies and economies of scale.
How to Get Started
The public preview of Microsoft Fabric begins immediately. Microsoft says Power BI Premium customers can get access to Fabric today by turning on the Fabric tenant setting in the Power BI admin portal, which will have the effect of upgrading Premium capacities to support Fabric. Customers can also enable Fabric in specific capacities instead of their entire tenant. Microsoft says that using new Fabric functionality (versus capabilities that were already available under Power BI) will not incur capacity usage before Aug. 1, 2023, but customers can still use the Capacity Metrics app to monitor how Fabric would impact capacity usage were the meter running.
Non-Power BI Premium customers can get access to a free 60-day Fabric trial.
There’s more work to be done. Not only does Fabric have to move from Public Preview to GA, but more functionality is required. Microsoft is promising Copilot experiences for all workloads, rather than just Power BI. The company says this is in private preview now, so general availability would seem a long way off. Likewise, Data Activator needs to move forward to public preview and full release. And data governance functionality of the type offered by Microsoft Purview will be needed in Fabric to make it a truly complete offering. For now, the lineage, impact analysis and asset endorsement capabilities of Power BI will have to do.
There’s lots of other work ahead, too. Just because the Fabric team was successful with the heavy lift required to get where they are now doesn’t mean the pressure’s off, by any means. As the product moves to public preview, the technology, the pricing model, and the very notion that customers will prefer the end-to-end approach to “a la carte” functionality and procurement will now all be put to the test. There’s more pivoting, more innovation and more risk management required; and if Fabric is successful, competition from AWS and/or Google Cloud is almost sure to follow.
But Microsoft can celebrate significant interim success with Fabric already and betting on its ultimate success seems prudent. In an era when everyone’s going gaga over AI, we need to keep in mind that AI’s models are only as good as their underlying data, and the engineering used to discover, shape and analyze it. AI may get a lot of the “oohs and ahs” at Build, but I’d argue Fabric is the real news.
Disclosure: Post author Andrew Brust is a Microsoft Data Platform MVP and member of Microsoft’s Regional Directors Program for independent influencers. His company, Blue Badge Insights [www.bluebadgeinsights.com], has done work for Microsoft, including the Power BI team.