How Microsoft Fabric aims to beat Amazon and Google in the cloud war

Microsoft unveiled a new cloud data and analytics platform two weeks ago that analysts say gives it an edge over its main rivals, Amazon and Google, in the fiercely competitive cloud market.

The platform, called Microsoft Fabric, is a comprehensive suite of tools that allows enterprise customers to store, manage and analyze the data that drives their most important applications. It also integrates products that cater to all of a company’s data users, from engineers who handle the technical aspects of data processing to analysts who want to derive insights and make decisions from the data. (See our reporting on the announcement). 

Microsoft Fabric, which is currently in public preview mode and will be updated with more features in the coming months, surprised many industry experts who were not briefed by the company beforehand. Some reserved full judgment until they could see it work in practice. But they praised the platform as a significant advancement that could help Microsoft “leapfrog” Amazon and other cloud providers, such as Google, at least when it comes to serving large enterprise companies. Fabric will also pressure other tech providers like Snowflake and Databricks, a close partner of Microsoft, analysts said.

“With all these capabilities coming together,” said Noel Yuhanna, an analyst at Forrester, “Microsoft definitely has a slight advantage over the other hyperscalers at the moment.”

Even before the announcement, Microsoft had already become a leader in data and analytics software, according to Gartner, a research firm. But with Microsoft Fabric, analysts said, the company has taken its offerings to a new level of integration and ease of use that could be hard for its competitors to match anytime soon.

While Fabric gives Microsoft a dominant offering, the key is now in execution, analysts said. Amazon’s AWS cloud service still enjoys a clear lead over Microsoft’s Azure in overall revenue, and will probably continue to do so for some time. But in the area of enterprise analytics and data, Microsoft’s cloud offerings now lead in terms of breadth of capabilities. “But the ability to execute is often defined by sales. So that number is yet to be proven,” said Hyoun Park, analyst at Amalgam Insights.

Fabric’s secret sauce: OneLake

So what makes Microsoft Fabric stand out? According to analysts, it is the way Microsoft has simplified and unified its data architecture with a single data lake, called OneLake, that can store and allow access to all kinds of data from different sources and applications. 

This approach, they said, will offer significant benefits to customers in terms of cost savings, transparency, flexibility, governance and data quality. OneLake is designed to be the central repository for not only the data generated by Microsoft’s own software services, but for data from external sources, such as third-party applications. It also provides a consistent experience and interface for users, regardless of the type or format of the data. This may sound like an obvious idea, but it has been elusive for most cloud providers, including Microsoft, Amazon and Google. 

Over the years, these tech giants have acquired or developed dozens of software tools for various data and analytics tasks, such as business intelligence, data science, machine learning and real-time streaming. But they have largely bolted together these tools in a piecemeal fashion, without creating a coherent and seamless platform.

As a result, customers have to deal with a complex and fragmented landscape of tools and databases, each with its own provisioning, pricing and pooling of data. This creates frustration and inefficiency for customers, who have to spend more time and money on managing their data infrastructure. It also imposes an “integration tax” on customers, who are charged separately for each service’s compute and storage resources.

Microsoft Fabric promises to eliminate this Frankenstein-like complexity by offering true integration — including only one copy of data, and one experience and one interface. “Part of the innovation here is that Microsoft is providing all of these capabilities by themselves as an integrated package,” said Amalgam’s Park. “And as simple as that sounds, it’s not something that the majority of data and analytic vendors are able to provide.”

Jason Medd, an analyst at Gartner, concurs. He said that Gartner’s surveys of chief data officers have shown that only about 30% said they get value from their data and analytics tools. By integrating its tools and lowering its prices, Microsoft is addressing these pain points, Medd said.

How the OneLake data lake works

How does Microsoft achieve this simplicity and unification with OneLake? The key is that OneLake stores a single copy of all the data from Microsoft’s various services in a common format, called Apache Parquet. This is an open-source file format that is widely used in the industry and that organizes data by columns.

Organizing data by columns makes it easier and faster to query and analyze. Whenever customers add or update data in their systems, Fabric automatically saves it in OneLake in the Parquet format, regardless of its original format. This means that customers can access and query their data from OneLake directly, without having to go through multiple sources or services.
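
To make the column-oriented idea concrete, here is a minimal sketch in plain Python. It is not Parquet itself, which adds encoding, compression and file-level metadata. The sample `rows` and the `to_columns` helper are hypothetical names invented for illustration, and the sketch assumes every record has the same fields.

```python
# Illustrative only: a toy column-oriented layout, not the Parquet file format.

def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys, so the columns stay aligned.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical sample records, row-oriented as an application might produce them.
rows = [
    {"region": "east", "product": "widget", "amount": 120},
    {"region": "west", "product": "gadget", "amount": 75},
    {"region": "east", "product": "gadget", "amount": 30},
]

columns = to_columns(rows)

# An aggregate over one field reads only that column, not every full record.
total_amount = sum(columns["amount"])
print(total_amount)
```

The point of the layout is visible in the last step: an analytical query that needs one field can skip the others entirely, which is the usual argument for columnar formats in analytics workloads.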

For example, if a customer wants to use Microsoft’s business intelligence tool, Power BI, to analyze data from Microsoft’s data warehouse, Synapse, they do not have to send a query to Synapse. Power BI simply retrieves the data from OneLake. This reduces the number of queries across services and lowers the cost for customers, who are charged for a single storage and data bucket, instead of multiple ones.

How OneLake pulls in data from external sources

OneLake’s simplicity and unification also extend to data from outside Microsoft’s ecosystem. This is where the technical details matter: OneLake stores its data tables in an open-source format called Delta Lake, which creates a single layer of metadata that converts raw data from various sources, such as CSV or JSON files, into a common format that can be analyzed by any compute engine in the industry. 
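
As a rough illustration of bringing differently formatted sources into one common shape, the sketch below normalizes a CSV input and a JSON input into the same list of typed records and keeps a small metadata record beside them. This is an assumption-laden toy: Delta Lake itself keeps a transaction log alongside Parquet data files, which this example does not reproduce, and the inputs, helper functions and `metadata` layout are all hypothetical.

```python
# Illustrative only: a toy normalization step, not Delta Lake or Fabric code.
import csv
import io
import json

# Hypothetical raw inputs arriving from two different sources.
csv_text = "id,name\n1,Ada\n2,Grace\n"
json_text = '[{"id": 3, "name": "Edsger"}]'


def rows_from_csv(text):
    """Parse CSV text into records, casting id to int (CSV fields arrive as strings)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "name": r["name"]} for r in reader]


def rows_from_json(text):
    """Parse a JSON array into records with the same field names and types."""
    return [{"id": int(r["id"]), "name": str(r["name"])} for r in json.loads(text)]


# One table in one shape, whatever the original format was.
table = rows_from_csv(csv_text) + rows_from_json(json_text)

# A toy metadata record describing the unified table (hypothetical layout).
metadata = {
    "schema": {"id": "int", "name": "str"},
    "sources": ["csv", "json"],
    "num_rows": len(table),
}

print(table)
print(metadata)
```

Once both sources share a schema, any downstream reader can work from `table` and `metadata` without knowing which format a given record started in.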

“Microsoft has done the right thing here,” said Tony Baer, an analyst at DBInsights, of its embrace of open source. 

He said that the competition among vendors is not about file formats, but about achieving a standard of accuracy and consistency, known as ACID, for databases. And Fabric’s integration, through open formats, is a step in that direction. Microsoft makes it easy for customers to transform data from third-party services with its Data Factory, which offers more than 150 pre-built connectors. 

Microsoft is also working on ways to automate the transformation process, instead of relying on the traditional and time-consuming method of extract, transform and load (ETL). 

Microsoft Fabric also supports multicloud scenarios, something that Amazon has been slow to do. With a feature called “Shortcuts,” OneLake can virtualize data storage in Amazon’s S3 and Google’s storage (coming soon). 
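
The idea of virtualizing external storage can be sketched as a namespace that records a pointer to data held elsewhere instead of copying it. The classes below are hypothetical and do not reflect the actual OneLake or Fabric API; `ExternalStore` simply stands in for a bucket in another cloud.

```python
# Illustrative only: hypothetical classes, not the OneLake or Fabric API.

class ExternalStore:
    """Stands in for storage in another cloud, such as an S3 bucket."""

    def __init__(self, objects):
        self._objects = objects

    def read(self, key):
        return self._objects[key]


class Lake:
    """A toy lake namespace holding local files and shortcuts to external data."""

    def __init__(self):
        self._files = {}
        self._shortcuts = {}

    def write(self, path, data):
        self._files[path] = data

    def add_shortcut(self, path, store, key):
        # Only the reference is recorded; no bytes are copied into the lake.
        self._shortcuts[path] = (store, key)

    def read(self, path):
        # Shortcuts are resolved at read time against the external store.
        if path in self._shortcuts:
            store, key = self._shortcuts[path]
            return store.read(key)
        return self._files[path]


s3_like = ExternalStore({"sales/2023.csv": "id,amount\n1,10\n"})

lake = Lake()
lake.write("finance/budget.csv", "dept,budget\nops,100\n")
lake.add_shortcut("external/sales.csv", s3_like, "sales/2023.csv")

# Both paths are read the same way, even though one lives elsewhere.
print(lake.read("finance/budget.csv"))
print(lake.read("external/sales.csv"))
```

The design choice worth noting is that callers see one namespace and one `read` call, while the data behind a shortcut stays where it already is.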

“Now that you’re going to a single open format that’s shared, all these engines can work natively with the data as opposed to getting fragmented,” said Arun Ulagaratchagan, Microsoft’s corporate vice president of Azure Data, in an interview with VentureBeat. He said Microsoft is the first major cloud vendor “that is moving away from completely protected formats to completely open formats.”

Ulagaratchagan said that he had talked with 100 of the Fortune 500 companies over the last few years, and they were most excited by Fabric’s promise of lower cost, ease of use and lack of lock-in.

Fabric’s integration work took years

Microsoft’s Fabric announcement may have seemed sudden, but it was the result of at least four years of work by the company to break down silos and integrate its data services. This also involved overcoming internal politics and turf wars among different executives. 

One of the milestones was Synapse, which combined several services, such as data lake and data warehouse, into a single hub. But Fabric is the ultimate integration, bringing together Synapse, Power BI and other data services as a single software-as-a-service (SaaS) offering.

“I think it’s leapfrogging,” said Andrew Brust, an industry consultant who runs BlueBadge Insights, referring to Microsoft’s move with Fabric. “The functionality is comprehensive and cohesive, and that hasn’t been possible before.”

Brust acknowledged that he is biased. He said Microsoft is a client of his, and that he is a Microsoft Data Platform MVP, which made him part of a small group of consultants, customers and partners who were privy to Fabric before its announcement. Brust also said that Microsoft’s offering of Fabric as a SaaS, rather than a platform-as-a-service (PaaS), was significant. It means that data engineers do not have to deal with provisioning units of compute, which simplifies their work. He said Amazon and Google still have a lot of work to do in this area.

Quality of data is the key to winning the enterprise cloud race

Analysts also emphasized that the main competition among cloud providers is about the quality of data, which is what enables customers to get better insights and make better decisions. 

Noel Yuhanna, an analyst at Forrester, said he talks to three or four enterprise customers every day who complain that moving to the cloud did not solve their problems with data quality. “We get compute, we get storage, we get Kubernetes,” Yuhanna said, summarizing the view of most enterprise executives. “That’s cool. But did we really modernize the system?” He said that’s why system integrators, such as BearingPoint, Capgemini, Infosys and Wipro, have so far made the most money from providing insights from the cloud. They have consultants who write up reports on what they find from the data. 

That’s also why Microsoft is pushing forward with Fabric. By connecting data sources together, Fabric improves the consistency and trustworthiness of data, Yuhanna said. “The biggest challenge with data replication is that data is all over the place, and you can’t get consistent data anymore … Fabric really gives you that consistency of data.”

By providing one place to go, it is like providing a single window to look through data management: “Security, governance, integration, discovery, that’s exactly what this is about,” he said. 

If customers want to apply security rules to their data, they can do much of this at the OneLake level. And all of the Fabric applications downstream that access the data will have to follow those rules, Microsoft said in its announcement. For example, if customers have sensitive salary information in Power BI that they only want a certain team to access, they can set up rules that ensure this. And files will carry the same rules, wherever they are exported — even carrying the same encryption if sent outside of Microsoft’s Fabric.
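
A minimal sketch of the "set the rule once, enforce it everywhere downstream" idea follows. It assumes a simple team-based allow list; the `GovernedLake` class and the two downstream functions are hypothetical and are not Fabric’s security model or API.

```python
# Illustrative only: hypothetical names, not Fabric's security model or API.

class GovernedLake:
    """A toy lake that checks an allow list on every read."""

    def __init__(self):
        self._tables = {}
        self._allowed = {}  # table name -> set of teams permitted to read it

    def put(self, table, rows, allowed_teams):
        self._tables[table] = rows
        self._allowed[table] = set(allowed_teams)

    def read(self, table, team):
        # The rule is enforced here, once, for every downstream caller.
        if team not in self._allowed[table]:
            raise PermissionError(f"{team} may not read {table}")
        return self._tables[table]


lake = GovernedLake()
lake.put("salaries", [{"employee": "A", "salary": 100}], allowed_teams={"hr"})


def bi_report(lake, team):
    """Stands in for a downstream tool such as a BI dashboard."""
    return len(lake.read("salaries", team))


def notebook_query(lake, team):
    """Stands in for a second downstream tool, such as a data science notebook."""
    return [row["salary"] for row in lake.read("salaries", team)]


print(bi_report(lake, "hr"))
try:
    notebook_query(lake, "marketing")
except PermissionError as err:
    print(err)
```

Because neither downstream function implements its own check, a rule changed in the lake applies to both at once, which is the property the announcement describes.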

Microsoft catches up with the ‘lakehouse’ trend

One of the areas where Microsoft has lagged behind some of its competitors is the so-called “lakehouse,” which combines two technologies: a data lake to store a company’s data, and a data warehouse to analyze it.

The lakehouse has become popular because of the rise of applications like artificial intelligence, which require massive amounts of data and analysis. And one company in particular, Databricks, has been a pioneer in creating a secure, open lakehouse that many analysts consider industry-leading. Databricks, after all, created the Delta Lake protocol.

Another company, Snowflake, has also offered a well-integrated lakehouse product. Microsoft’s offerings in this area, under its Synapse brand, have reportedly not performed as well, and Microsoft has compensated for this by forming a close partnership with Databricks and supporting its product on the Azure cloud. So it is no surprise that Microsoft’s Fabric adopted the Delta Lake protocol as well. All customers who use Databricks will continue to be happy using Microsoft’s Fabric.

But Fabric’s integration also narrows the gap with Databricks and Snowflake, and aims to surpass them, analysts said. Fabric extends the open format pioneered by Databricks to the rest of Microsoft’s data stack, which is more comprehensive. While Microsoft’s Ulagaratchagan says Microsoft is happy to give customers choice by working with platforms like Databricks, he also makes clear that Microsoft’s Synapse intends to lead the lakehouse market: “We literally intend to be the best in breed and best of suite,” he said. 

Microsoft’s single experience and its move to a SaaS offering help Fabric’s Synapse leap ahead in some key aspects, analysts said. Databricks remains a PaaS offering, which means that data engineers still have to do more work and specify things like the number of nodes they want for running processing jobs.

Microsoft Fabric combines its strength in business intelligence (Power BI) with data science, and adds other capabilities, such as pattern detection and workflows (Data Activator), and that’s “a big deal,” said Amalgam’s Park. He said that bridging BI to AI continues to be a challenge for the enterprise. “Microsoft is providing a package that solves this to a greater extent than any of its competitors.”

The power of generative AI has yet to be realized

Finally, Microsoft said it is using its new generative AI technology, acquired from its investment in OpenAI, to enhance its Copilot tool. Copilot helps users perform tasks, such as reading and summarizing data reports. With OpenAI’s technology, Copilot can now allow developers and analysts to use natural language to ask questions of data, and to receive answers in natural language as well. Here, Microsoft’s Ulagaratchagan says that while this will improve productivity, the full impact of applying generative AI across the Fabric offerings will take some time to be seen.

After all, Fabric is the first time customers have experienced an end-to-end integration of their data, and they have yet to explore what generative AI can do.

“You can think about it not just accelerating a step in the customer’s journey with generative AI, but the entire journey, so that’s an opportunity that customers haven’t found yet,” Ulagaratchagan said. “It’s critically important that we learn from actual customer usage and get the experience right.”
