
Exploring the VMware Software Defined Datacenter (SDDC) - Part 7 - Private AI

Author: Rob Sims, Hybrid Platforms • Nov 05, 2024

Welcome back to the VMware SDDC series for Part 7. 

The previous parts can be found here and cover the core components of the VMware Cloud Foundation (VCF) architecture around Storage and Networking, and the core elements of Compute and Operations. 

Parts 7, 8, and 9 will cover some of the innovations that VMware by Broadcom is building onto the VCF foundation. These will be Private AI, Data Services Manager and Live Recovery. 

Back at Explore in 2023, VMware previewed a new collection of technologies to simplify the deployment of Generative AI services into private cloud architecture, building on the core capabilities of Cloud Foundation. Throughout the early parts of 2024, we kept track of the evolution, and then at Nvidia GTC in March, we got the initial availability announcement, followed by the General Availability release in May. When we look at the dates of these two milestones (March 18th to May 6th), we have a gap of just over a month: a hopeful glimpse that the new Broadcom strategy will start to deliver faster innovation for customers! 

Why do we need VMware Private AI Foundation with NVIDIA? 

VMware and NVIDIA have created a vision for an enterprise-grade generative AI solution focused on enabling choice, flexibility, performance, and security. That raises two big questions: why is it needed, and what exactly is the vision?  

Why? 

Unless you have been in hibernation, missing the Generative AI hype throughout 2023 and 2024 would be almost impossible. The use cases are vast, and the reported economic impact ranges from around 800 billion in the UK to over 4 trillion globally (with many other numbers available, depending on your favourite analyst).  

Regardless of the numbers you settle on, the reality of its impact cannot be denied, especially in larger enterprise customers. Smaller customers will likely take advantage of packaged solutions as the technology becomes more commoditised and commercially accessible. To understand more about what has happened, look at this Hybrid Platforms Trends Series. 

One thing that is true for any organisation looking at Generative AI is the need for large amounts of high-quality data. The other truth is that the required data is likely to be of a sensitive or critical nature, bringing risks to intellectual property and potential legal ramifications. When we combine this with the increasing stories of GenAI hallucinations hitting the headlines, many organisations are rightly concerned about the reputational impact of getting it 'wrong'.  

This all means that privacy is the key challenge of the generative AI journey, and one that should be top of mind for every organisation. VMware consolidates this into the following three areas, which really boil down to two things. Firstly, I want to control my data so it's not used to train models that competitors can benefit from; and secondly, I want to control the models used, so my reputation is not the next news story! 

The challenges these concerns bring can be grouped into four categories, as shown in the image below: 

Cost: Running large language models requires a lot of GPU, CPU, and storage power, and this can get expensive quickly in a public cloud. Leveraging a cloud PaaS solution could be the place to build a proof of value (PoV), but once you need to scale and operate over an extended period, the cost will likely outstrip the value.  

Compliance: In the simplest case, if you leverage a SaaS AI platform, how can you be sure which models are running behind it? How could your data be used, and therefore, what is the impact of hallucinations?  

Choice: The number of available models and ways of leveraging them is growing daily; you need the option to adopt the latest innovation that could quickly increase accuracy or reduce costs. 

Performance: Training and tuning models or running retrieval-augmented generation (RAG) against an LLM usually requires a lot of computing power. Getting top performance from that investment is critical and will often require specialist storage and networking architectures.  

When looking to solve these issues, we trend towards on-premises deployments, which can bring a world of complexity many enterprise organisations don't want to deal with, from managing the underlying platform to deploying AI-ready workloads that enable your data scientists and developers to execute at speed. This is why VMware and Nvidia created Private AI: it gives you complete control over your data and AI models, making compliance and security easier while removing the operational burden. 

What is VMware Private AI Foundation with NVIDIA?  

So, what is this new architecture, and how does it solve the challenges discussed above? The press release for VMware Private AI Foundation with NVIDIA read as follows:   

"VMware Private AI Foundation with NVIDIA is a joint GenAI platform by Broadcom and NVIDIA, which helps enterprises unlock GenAI & unleash productivity. With this platform, enterprises can deploy AI workloads faster, fine-tune and customise LLM models, deploy retrieval augmented generation (RAG) workflows, and run inference workloads in their data centers, addressing privacy, choice, cost, performance, and compliance concerns." 

When we look at this as a hierarchy of components, we can see how it comes to life in practical terms. First, pick your hardware OEM of choice and deploy the base VMware Cloud Foundation to provide your cloud-like experience. Combine this with your chosen OEM's consumption plan (Dell APEX, HPE GreenLake, Lenovo TruScale), and you can meet changing OpEx demands at the same time.  

Building on this base platform is a suite of new VMware tools (we will look at these later in this article) designed to simplify deployment, enhance day-two operations, and ease consumption for AI practitioners. This is then integrated with the NVIDIA AI Enterprise suite and the open-source communities to allow complete choice in model consumption and access to optimised tooling, unlocking the best value from the underlying infrastructure.  

The VMware automation frameworks combine all the components needed to train, tune, and run any type of AI model. Thus, customers can deploy private infrastructure supporting privacy concerns without the usual complexity and day-two operational overhead.  

Remember that this is not just an Nvidia play; VMware also announced a similar reference architecture with Intel: VMware Private AI with Intel. When we see the full release of the new Gaudi 3 AI accelerator, this framework will provide access to both CPU and GPU-accelerated AI outcomes, which could stir up the economics for specific AI workloads. 

Features - VMware Private AI Foundation with NVIDIA   

Let's examine some of the specific features of this combined architecture and how they relate to the challenges of choice, cost, compliance, and performance. 

Model Store 

Deploying and configuring deep-learning virtual machines can be complicated and time-consuming. Manually building a VM can lead to inconsistencies that, at best, mean a non-optimised environment and, at worst, leave room for a security risk. A curated catalogue of models helps streamline this process and ensure compliance. VMware had the following to say about the new model store in the release blog.   

"With the introduction of the Model Store capability, MLOps teams and data scientists can curate and provide secure LLMs with integrated access control (RBAC). This can help ensure the governance and security of the environment and the privacy of enterprise data and IP." 

GPU Monitoring   

GPUs are expensive resources, and ensuring maximum performance requires focusing on multiple areas of the overall architecture, from storage performance and networking to consideration of environmental factors. Maximising GPU performance will ultimately come down to temperature and maintaining an optimal operating envelope, allowing the best throughput. VMware has brought GPU monitoring capabilities into VCF, providing infra teams with the data needed to optimise workloads and unlock the total value from any investment. 
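
To make the telemetry concrete, here is a minimal sketch of reading the temperature and utilisation signals discussed above via NVIDIA's NVML bindings. This is purely illustrative of the raw counters such tooling builds on; it is not the VCF monitoring integration itself, which surfaces this data through the platform rather than scripts.

```python
# Minimal sketch of reading GPU health signals via NVML
# (pip install nvidia-ml-py). Illustrative only; VCF's GPU monitoring
# exposes similar data natively to infrastructure teams.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # Temperature drives the operating envelope discussed above.
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {temp}C, {util.gpu}% busy, "
              f"{mem.used / mem.total:.0%} memory used")
finally:
    pynvml.nvmlShutdown()
```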

Performance  

As mentioned above, performance is critical in AI training and inferencing, but we must balance this against operational simplicity. Deploying a fleet of bare-metal servers in 2024 brings many challenges around compliance, disaster recovery, cyber resilience and patching. All those reasons we moved to VMware 20 years ago still exist if we take the backward step to bare metal. The good news is that for many AI workloads, inserting the ESXi hypervisor into the mix delivers performance equal to bare metal, and in some inference use cases exceeds it. We can see the results from an MLPerf benchmark below, where bare metal is normalised to 1.00 and the chart shows relative virtualised performance. 

Of course, we must assess the specific workloads, and different use cases yield differing results, but with a range of 95% to 105%, we have to weigh the operational advantages of virtualisation. If you are training a large model, that 5% could be a big deal, but a 2% performance drop at the inference stage could be worthwhile to gain the ability to maintain uptime, security, and scale. Read more about this here. 

Catalogue Setup 

Guided deployment services are designed to reduce the complexity and time required to deploy the services at the platform layer (VCF) and above (Private AI). In many enterprises, data scientists and developers spend too much effort defining the infrastructure they need for AI/ML development. The challenge is that these teams are rarely focused on things like security or compliance and, as such, may take shortcuts to get on with development, or get stuck waiting for another team to design to the required standards. Either way, we have risks or impacts on the speed of innovation, each bringing its own concerns at a business level. 

With new playbooks and automation driven through Aria, we can start to deliver a simple self-service capability for a data scientist looking for new resources, as shown below. 

Vector Databases for Enabling RAG Workflows 

One pivotal advancement in the evolving field of AI has been the integration of vector databases. These databases play a crucial role in enhancing the efficiency and precision of Generative AI. By storing and managing high-dimensional vectors, they can facilitate the rapid retrieval and manipulation of data, which is essential for the complex operations performed by Retrieval-Augmented Generation (RAG) workflows. 
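
To ground the idea, the toy sketch below performs that retrieval step in miniature: documents become vectors, and a question retrieves the closest one by cosine similarity. The three-dimensional vectors here are purely illustrative; real systems use learned embeddings with hundreds of dimensions stored in a vector database rather than an in-memory array.

```python
import numpy as np

# Toy illustration of vector retrieval. In practice the vectors come from an
# embedding model and live in a vector database, not hard-coded arrays.
docs = ["VCF storage overview", "NSX networking guide", "GPU sizing notes"]
doc_vecs = np.array([[0.9, 0.1, 0.0],   # made-up 3-d "embeddings"; real
                     [0.1, 0.9, 0.0],   # embeddings have hundreds of
                     [0.0, 0.2, 0.9]])  # dimensions
query_vec = np.array([0.1, 0.8, 0.1])   # "how do I configure networking?"

# Cosine similarity between the query and every document vector.
sims = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
print(docs[int(np.argmax(sims))])       # -> "NSX networking guide"
```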

VMware has added support for this technology via PostgreSQL and pgvector. When combined with VMware Data Services Manager, we gain automation for rapid deployment and support services to meet enterprise requirements.  
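
At the database layer, a RAG retrieval query looks something like the sketch below, using the psycopg driver against a pgvector-enabled PostgreSQL instance. The connection string, the 768-dimension vector size, and the embed() helper are all placeholders for whatever your environment and embedding model provide; Data Services Manager would handle deploying the database itself.

```python
# Minimal RAG retrieval sketch against PostgreSQL + pgvector
# (pip install psycopg pgvector numpy). DSN, vector dimension, and
# embed() are placeholders, not part of the product.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def embed(text: str) -> np.ndarray:
    """Placeholder: substitute a real embedding model here.
    Returns a 768-dimension vector."""
    rng = np.random.default_rng(abs(hash(text)))
    return rng.random(768, dtype=np.float32)

conn = psycopg.connect("dbname=rag_demo", autocommit=True)  # placeholder DSN
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(768)  -- must match your embedding model's output
    )
""")

# Store a document chunk alongside its embedding.
chunk = "VCF consolidates compute, storage and networking into one platform."
conn.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    (chunk, embed(chunk)),
)

# Retrieve the five chunks nearest the question (<=> is cosine distance);
# these are then passed to the LLM as grounding context.
rows = conn.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (embed("What does VCF provide?"),),
).fetchall()
```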

Futures 

These are just the launch capabilities of this Private AI architecture. We can expect continued innovation to drive down the cost and complexity of Generative AI. We already have a view of a couple of these future enhancements.  

Continuing the GPU visibility and performance track, we expect to see vGPU profile visibility and GPU reservations to further enhance day-two operations and platform flexibility. 

Another future development is the Data Indexing and Retrieval Service. Today, the complex data indexing and vectorisation tasks needed to support Generative AI demand capability that is not readily available; this is the gap these new features will help close. VMware provides the following description: 

"The Data Indexing and Retrieval Service will allow enterprises to chunk, index private data sources (e.g., PDFs, CSVs, PPTs, Microsoft Office docs, internal web or wiki pages) and vectorise the data. This vectorised data will be made available through knowledge bases." 

Summary 

When we combine all the above features with those from the Nvidia AI Enterprise portfolio and the Hardware OEMs, we have a powerful triad of capability to deliver scalable, performant, and compliant AI outcomes at the pace required by the business. 

We can enable the development, data, and IT operations teams to focus on what matters and not get bogged down in the perceived complexity of Private Platforms.  

For those looking for more technical detail, there is a great blog here that’s worth a read.

Contributors
  • Rob Sims

    Chief Technologist - Hybrid Platforms
