
Hybrid Platforms Partner Series - It's All about the Token at Nvidia GTC

Author:

Rob Sims

Hybrid Platforms

•  Apr 02, 2025

What is GTC? 

NVIDIA GTC (GPU Technology Conference) is an annual event hosted by NVIDIA designed to showcase advancements in AI, high-performance computing, data science, and graphics processing. The conference is a platform for researchers, developers, business leaders, and technology enthusiasts to explore the latest innovations in GPU computing, AI, deep learning, and autonomous systems. 

I see GTC as a bridge between cutting-edge research and real-world application of that technology, making it slightly different from your normal technology conference. The marketing overlay is peeled back, and you get a real sense of the potential and energy. While this makes the event far more technical than most (it is a developer conference at heart), I found it fostered a better connection to outcomes, with the kinds of conversations and innovation I have not seen for quite some time at this type of conference. I guess it's like the early virtualisation or container events: exciting, fresh, and full of promise.  

The Keynote  

I am not going to try to unpack the entire 2 hours and 10 minutes of the GTC 2025 keynote; way too much happened for that. If you want, the full stream is on YouTube (GTC March 2025 Keynote with NVIDIA CEO Jensen Huang). I will pull out some highlights, impressions, and thoughts.  

The keynote was held in the SAP Center, home of the San Jose Sharks (National Hockey League), and the scale was hard to miss. The queue for entry was miles long, and I am sure quite a few people will not have made it inside. Luckily, a colleague managed to save me a seat, and we had a great view of proceedings. 

Tokens: how do they help us explore the world, connect the dots, and bravely go where no one has gone before? This was the opening video, calling out that the token is the building block of AI and the output of a new kind of factory, the AI factory. It has one job: manufacturing the tokens needed to drive AI outcomes. Jensen later commented on how AI is helping scientists complete work that previously would have taken multiple generations.  

Everything that followed the intro video (the history lessons, maths lessons, product roadmaps, partner appreciations, jokes and demos) was delivered without a teleprompter, a script or assistance from an army of VPs. Jensen held the packed stadium for over two hours on his own. Some may say he got too technical at times, but for me it was refreshing to ditch the marketing polish and hear from an innovator who is still connected to the technology and the outcomes (even if mainly focussed on the largest and most complex ones). 

GTC was at capacity (25,000 in person, 300,000 virtual), and Jensen joked about the need to grow San Jose as a city to cope in future, and about GTC being the Super Bowl of technology, except everyone wins. He talked about AI really being a 10-year-old technology that started with Perception AI (predictive machine learning) and accelerated quickly through Generative to Agentic AI. His prediction is that Physical AI is the future, and this is wider than just robotics in manufacturing; we are talking about robots from all walks of life (more later). 

The challenges to making this leap are threefold, and NVIDIA is targeting all three: 

  • Solving the data problem: having enough data at the right quality and accessibility 
  • Training without a human in the loop (speed and cost) 
  • Scaling the compute required for the next AI phases (Agentic/Physical) 

Agentic AI is the start of the reasoning AI era, where we move away from a one-shot, typically wrong answer to a chain-of-thought approach, where the model checks and balances its responses to provide an accurate answer. The challenge here is the impact on tokens; take the example Jensen showed: a traditional LLM answer uses ~400 tokens but is inaccurate, while the reasoning model gives the right answer but uses 20x the token count! This is the fundamental reason we need the announcements talked about later, as well as ever-increasing accelerated computing power. 
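The token maths above is worth spelling out, using only the figures from the keynote (the price-per-token value is purely my illustrative assumption):

```python
# Back-of-the-envelope illustration of the keynote's token maths:
# a one-shot answer vs a chain-of-thought (reasoning) answer.
ONE_SHOT_TOKENS = 400       # ~400 tokens, often an inaccurate answer
REASONING_MULTIPLIER = 20   # keynote: the reasoning model used ~20x the tokens

reasoning_tokens = ONE_SHOT_TOKENS * REASONING_MULTIPLIER
print(f"One-shot answer:  {ONE_SHOT_TOKENS:,} tokens")
print(f"Reasoning answer: {reasoning_tokens:,} tokens")

# At any fixed price per token, cost and compute scale the same way;
# the $10-per-million figure below is an assumption for illustration only.
price_per_million = 10.0
print(f"Cost: ${ONE_SHOT_TOKENS / 1e6 * price_per_million:.4f} "
      f"vs ${reasoning_tokens / 1e6 * price_per_million:.4f} per answer")
```

Whatever the exact prices, the 20x multiplier is why inference capacity, not just training capacity, dominated the announcements that follow.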

It was interesting to see Jensen get genuinely excited about the breadth of CUDA-X libraries that NVIDIA is providing; he spent around 10 minutes of the keynote talking through most of them. Skip to around 22 minutes into the video above to get the full view. 

One important part of the entire keynote was when the offering, from small to large, came into view. From DGX Spark (See below), through the newly announced DGX Station to the full DGX Rack solutions, NVIDIA has the accelerated computing to support every organisation regardless of scale. 

The NVIDIA software and new infrastructure combine to solve two of the three challenges we talked about earlier, leaving only the storage problem. Jensen announced they are working with all the major storage providers to make storage GPU-accelerated and integrated into the ecosystem. I'm looking forward to seeing the realities of this.  

As we neared the end of the keynote, Jensen moved on to the topic of robotics and the concept of three computers; he sees robots as a method of interaction with the physical world. We have large labour shortages globally, expected to hit 50 million by the end of the decade! Solving this through robotics is Jensen's vision. He sees a world of three computers: one to simulate, one to train, and one to deploy, with robots acting as the real-world interaction of that deployment. 

We have the same three challenges for robotics as for general AI (data, human and scale); through a series of videos and presentations, the keynote laid out the journey to general robotics and solving these challenges. It's hard to do it justice here, so please watch the keynote at around 1 hour 55 minutes. In summary, three components of the NVIDIA portfolio combine to deliver the future of Physical AI: 

  • Omniverse & Cosmos combine to solve the data problem. 

(https://www.nvidia.com/en-gb/ai/cosmos/) 

  • Reinforcement learning requires a physics engine (to remove the human from the loop): the Newton open-source physics engine for robotics simulation 

(https://developer.nvidia.com/blog/announcing-newton-an-open-source-physics-engine-for-robotics-simulation/) 

To finish the keynote, a little robot joined Jensen on stage. Please go watch the video at 2 hours and 4 minutes. You won't be disappointed, and I won't spoil it here! 

Announcements  

The full list of announcements that NVIDIA shared with me after the event ran to 16 pages of detail when transposed into a Word document! I am not going to unpack it in full in this article, so I have picked the top four that Jensen talked about during the keynote and will give my thoughts and perspective.  

New GPU Chips 

Jensen talked about how AI Applications are different, which means the Operating system and Infrastructure will need to be different to support them. Just looking at the impact reasoning models are having on token generation shows us that in the inference space, there is a lot of evolution to come. It was suggested that the current 1 billion knowledge workers on the planet would be augmented by 10 billion AI Agents, which will require a different level of thinking. 

One element of this change is the knowledge that AI factories require planning, and that planning can mean multi-year projects. Multi-year projects mean that roadmaps are critical, so we can plan architectures and critically assess the impact on facilities. When we know the future may require 600 kW racks, we can make different decisions! 

NVIDIA Blackwell Ultra 

With Blackwell now shipping, attention has turned to the Ultra version slated for H2 2025 release, focused on reasoning inference at scale. The Blackwell Ultra is an air-cooled chip based on the new scale-up NVL72 architecture (that's NVLink, which allows multiple GPUs to act as a single processor), with the Grace Blackwell (GB) being the water-cooled CPU+GPU combined system. Info on the B300 and GB300 is as follows: 

Blackwell Ultra is the newest accelerated computing platform built for the 3 scaling laws and the age of AI inference reasoning. Blackwell Ultra offers 50x more AI factory output over Hopper and is available in two system configurations: GB300 NVL72 and HGX B300 NVL16. Blackwell Ultra GPUs introduce several key technological breakthroughs: 

  • AI Reasoning Inference: Features 1.5x more AI FLOPS compared to Blackwell GPUs and 2x attention acceleration for long-context thinking. 
  • More Memory: Features 1.5x more memory compared to Blackwell GPUs for up to 288 GB of HBM3e per GPU, enabling more inference performance.  
  • Faster Networking: Features 800 Gb/s of network connectivity for each GPU with NVIDIA ConnectX-8 SuperNIC with either NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet for multi-node workloads. 

The NVIDIA GB300 NVL72 features a fully liquid-cooled, rack-scale design that unifies 72 NVIDIA Blackwell Ultra GPUs and 36 Arm-based NVIDIA Grace CPUs in a single platform optimized for AI training and test-time scaling inference.  

  • NVIDIA GB300 Grace Blackwell Ultra Superchip: Key building block for the NVIDIA GB300 NVL72 rack-scale solution.  
  • Increased AI factory output: Offering a 50X increase in AI factory output with optimised inference capabilities. 
  • Specifications: Relative to Hopper, GB300 NVL72 has 70X FP4 dense inference FLOPS, 30X GPU memory, 65X fast GPU + CPU memory, 20X networking bandwidth. 

NVIDIA Vera Rubin 

For 2026 and 2027, we expect to see the next evolution of the chips, with the Blackwell GPU moving to Rubin and the Grace CPU moving to Vera. 2026 is expected to deliver the Rubin/Vera combination, with 2027 bringing the Ultra version, very much like 2024 brought Blackwell/Grace and 2025 brought the Ultra versions. The Rubin GPU will again be paired with an NVIDIA Arm CPU (Vera) to provide a unified rack-scale solution. Details from NVIDIA as follows: 

  • NVIDIA Rubin GPU: The NVIDIA Rubin GPU is our next-generation data center GPU featuring: 2 reticle-sized dies, 50 petaFLOPS of FP4, 288 GB of HBM4 memory  
  • NVIDIA Rubin Ultra GPU: Next-generation of the Rubin platform. Rubin Ultra will feature: 4 reticle-sized dies, 100 petaFLOPS of FP4, 1TB of HBM4e memory  
  • NVIDIA Vera CPU: The NVIDIA Vera CPU is our next-generation data center CPU featuring 88 NVIDIA-designed, high-performance Olympus CPU cores, it delivers up to 1.2 terabytes per second (TB/s) of memory bandwidth while using only 50 watts of memory power. NVIDIA Scalable Coherency Fabric (SCF) maximizes performance and keeps data flowing. And with over 2x the performance of the prior generation, the NVIDIA Vera CPU is ideal for data processing, compute and memory-intensive workloads or pairing with the NVIDIA Rubin GPU to shape the future of high-performance computing (HPC) and AI. 

We are also seeing the continued expansion of NVLink technology to drive the scale-up requirements of larger AI models and inference. Again, details from NVIDIA: 

  • NVIDIA Vera Rubin NVL144: Next generation NVLink liquid cooled rack-scale architecture connecting Arm-based NVIDIA Vera CPUs and 144 Rubin GPUs in a single NVLink domain. Compared to GB300 NVL72, Vera Rubin NVL144 features: 2.5X FP4 inference FLOPS, 3.3X FP8 training FLOPS, 1.6X HBM memory bandwidth, 2X fast memory (GPU + CPU), 2X all-to-all NVLink bandwidth, and 2X networking bandwidth (ConnectX-9) 
  • NVIDIA Vera Rubin Ultra NVL576: Next generation NVLink liquid cooled rack-scale architecture connecting Arm-based NVIDIA Vera CPUs and 576 Rubin Ultra GPUs in a single NVLink domain. Compared to GB300 NVL72, Vera Rubin NVL576 features: 10X FP4 inference FLOPS, 14X FP8 training FLOPS, 8X HBM memory bandwidth, 9X fast memory (GPU + CPU), 12X all-to-all NVLink bandwidth, and 8X networking bandwidth (ConnectX-9) 

All of this is forecast out to 2028 (Feynman did not get much talk time as it’s so far out) and allows us to see how we can easily get to a gigawatt AI Factory. I must stress here that the mass enterprise market is not going to be buying these systems, but as with all tech, innovation flows down, and this roadmap will drive the smaller-scale value for the wider market. 

Dynamo 

Models are growing and are increasingly being integrated into agentic AI workflows that require interaction with multiple other models. Deploying these models and workflows in production involves distributing them across multiple nodes of GPUs, which demands careful orchestration and coordination. This can lead to a wealth of performance and optimisation issues that are costly on expensive GPU platforms. Dynamo is the new inference software system aimed at solving the challenges of deploying reasoning AI solutions. NVIDIA refer to it as a low-latency distributed inference framework for scaling reasoning AI models and provide the following summary: 

NVIDIA unveiled Dynamo, an open-source AI inference-serving software designed to maximize token generation for AI factories deploying reasoning AI models. Dynamo orchestrates and accelerates inference communication across thousands of GPUs. It uses disaggregated serving to separate the processing and generation phases of LLMs on different GPUs, allowing each phase to be optimized independently for its specific needs and ensuring it can maximize GPU resource utilization.  
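To make "disaggregated serving" concrete, here is a toy sketch of the idea, not Dynamo's actual API (that lives in the GitHub repo linked below): prompt processing (prefill) and token generation (decode) run on separate GPU pools, so each phase can be sized and scheduled for its own bottleneck. All names here are hypothetical.

```python
# Toy sketch of disaggregated serving: the compute-bound prefill phase and the
# memory-bandwidth-bound decode phase are routed to separate GPU pools.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # drives prefill cost
    output_tokens: int   # drives decode cost

def route(req: Request, prefill_pool: list[str], decode_pool: list[str]) -> tuple[str, str]:
    """Pick one GPU from each pool. A real scheduler would weigh current load,
    KV-cache locality and request type; this just spreads requests deterministically."""
    prefill_gpu = prefill_pool[req.prompt_tokens % len(prefill_pool)]
    decode_gpu = decode_pool[req.output_tokens % len(decode_pool)]
    return prefill_gpu, decode_gpu

prefill = ["gpu0", "gpu1"]          # fewer GPUs: prefill is bursty but short
decode = ["gpu2", "gpu3", "gpu4"]   # more GPUs: decode dominates total time

p, d = route(Request(prompt_tokens=1024, output_tokens=8000), prefill, decode)
print(f"prefill on {p}, decode on {d}")
```

The design point is that the two phases have different hardware sweet spots, so coupling them on one GPU wastes capacity; splitting them is what lets Dynamo claim the throughput gains listed below.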

NVIDIA Dynamo incorporates features that enable it to reduce costs. It can dynamically add, remove and reallocate GPUs in response to fluctuating request volumes and types. It can pinpoint specific GPUs in large clusters that can minimize response computations and route queries to them. It offloads inference data to more affordable memory and storage devices, quickly retrieving them when needed, minimizing inference costs.  

NVIDIA Dynamo provides: 

  • 30x more throughput running DeepSeek R1 models on NVIDIA GB200 NVL72 
  • 2x more throughput running Llama 70B models on NVIDIA Hopper 

NVIDIA Dynamo is available for developers on the Dynamo GitHub repo. For enterprises looking for faster time to production and enterprise-grade security, support, and stability, Dynamo will be included with NVIDIA AI Enterprise.  

More details can be found here: 

https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/ 

Silicon Photonics 

Cables, optics and networking are the complicated parts of any at-scale datacentre deployment, and now we must consider the power consumption as well! A 'typical' cloud datacentre might have 100K servers, with the transceivers consuming around 2.4 MW of power; those same 100K servers in an AI factory would put the transceiver power at 40 MW! 

Connecting all those GPUs via NVLink requires a lot of connections. The other challenge is managing heat and signal integrity over longer distances as throughputs scale (think 200-800 Gb/s ranges); we may not even be able to support connectivity within the rack, let alone in a facility the size of a football pitch. There is also a massive cost, with Jensen estimating that each GPU requires around $6k of cables, which adds up quickly across 100,000+ GPUs! 
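Those two cost drivers are worth running through with the keynote's own numbers; nothing below is mine except the arithmetic:

```python
# The keynote's networking arithmetic, reproduced.
SERVERS = 100_000

cloud_transceiver_mw = 2.4      # 'typical' cloud DC at 100K servers
ai_factory_transceiver_mw = 40  # same server count in an AI factory
growth = ai_factory_transceiver_mw / cloud_transceiver_mw
print(f"Transceiver power grows ~{growth:.0f}x for the same server count")

# Cabling cost: ~$6k of cables per GPU across a 100,000+ GPU factory.
GPUS = 100_000
cable_cost_per_gpu = 6_000
print(f"Cable bill: ${GPUS * cable_cost_per_gpu / 1e9:.1f}B")
```

A roughly 17x jump in transceiver power and a cable bill in the hundreds of millions of dollars is the scale of the problem co-packaged optics is aimed at.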

Certainly, these issues would be significant barriers to scaling AI factories. Enter silicon photonics, micro-ring modulators (MRMs), and co-packaged optics (CPO) to save the day. In simple terms, we remove the need for the pluggable transceiver and, therefore, remove the cost and power draw of its laser (there is a lot more to it) while also increasing the distance over which we can deliver a consistent signal. 

The Next Platform has a deep dive on the topic for those who like such things: 

https://www.nextplatform.com/2025/03/18/nvidia-weaves-silicon-photonics-into-infiniband-and-ethernet/ 

Info from NVIDIA: 

NVIDIA Spectrum-X and NVIDIA Quantum-X silicon photonics networking switches are the world’s most advanced networking solution for the era of agentic AI, enabling AI factories to connect millions of GPUs across multi-sites. NVIDIA co-packaged optics (CPO) based networks simplify manageability and design while enabling more power for compute infrastructure. These benefits are critical to delivering the scale needed to enter the future of million-GPU AI factories. By replacing pluggable transceivers with silicon photonics on the same package as the ASIC, NVIDIA CPO innovations provide: 

  • 3.5x better power efficiency 
  • 10x higher network resiliency 
  • 1.3x faster time to deploy compared to traditional networks 

DGX Spark 

NVIDIA announced the DGX Spark, the culmination of Project DIGITS; this tiny unit has the same power as the original DGX-1. Consider that the DGX-1 was a 3U rack-mounted server weighing approximately 60 kg and costing over $100K back in 2016. Jensen joked he had applied Pym Particles to the DGX-1 (a reference to the shrinking tech used in the Marvel universe). The Spark is tailored for developers, researchers, and students to prototype, fine-tune, and deploy AI models locally, enabling outcomes like fine-tuning, inference, and prototyping without extensive server infrastructure. Combine this with your AI PC and you're set!  

Being touted as supporting 200-billion-parameter models, this is a great component in the wider AI development chain and, at around $3K, offers an interesting entry point for AI development. 
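A quick sanity check on that 200-billion-parameter claim; the quantisation assumption is mine, not NVIDIA's published maths, but it shows why the figure is plausible on a desktop-class unit:

```python
# Rough memory estimate for a 200B-parameter model, assuming 4-bit
# (FP4/INT4) quantisation, i.e. half a byte per parameter.
params = 200e9         # 200 billion parameters
bytes_per_param = 0.5  # 4-bit quantisation (my assumption)

weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB (before KV cache and activations)")
```

At 4-bit precision the weights land around 100 GB, which is desktop-unified-memory territory rather than rack territory; at FP16 the same model would need roughly four times that.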

The Expo Hall(s) 

Two huge rooms packed with people, technology and demos; what more could you ask for? For me, more time to explore the multitude of options would have been amazing; I think you could easily have spent two full days just discovering this part of the conference! I have attended many tech conferences over the past 20 years, and few have provided this level of innovation, excitement and conversation. Everywhere you look, people are deep in discussion about what is possible; nobody is just milling around killing time as you see at other events. If you want to understand what is possible with AI, robotics, automation and accelerated computing, this is the place to be in 2026! 

The technology on display was mesmerising; the options seemed limitless, from the extraordinary range of accelerated compute and rack-level cooling solutions to robotics and software innovation partners. This level of innovation was amazing to see, and I am sure you could find the answer to almost any problem across the two huge expo rooms. It also makes it even more important to have a strong and structured AI strategy in place, to ensure you are led not by the technology but by the outcomes at a business level.  

Partner Presence  

Another great thing about an event like GTC is that it brings together all the ecosystem partners supporting a hybrid AI infrastructure. Normally, you would not see the likes of Dell, HPE, Cisco, NetApp, Pure, VAST, etc., at the same event, as it would be seen as competitive. This is great on two fronts. Firstly, it allows customers to see all the comparative options in a single location, enabling easy compare and contrast of capabilities; meetings that might have taken months to arrange could be conducted in a few short days.  

Secondly, as a partner of these providers, it allows us to connect and align plans at a global level, ensuring we bring the latest information to our local customers and agree a consistent message. The common thread throughout all the partner meetings I attended was how we create proven ROI in the mass enterprise and mid-market space. It was great to hear that our CDW UK & International AI strategy resonated, and the future roadmaps for partner solutions should complement it in lockstep. While we all like to work on the large NVIDIA SuperPOD-scale opportunities, the reality is that 90% of companies won't need such scale to deliver measurable AI outcomes.  

Customer Conversations  

One great thing about working at CDW is being part of our Global AI Network and being able to align on simple messaging for our customers: let us help you make AI simple. As mentioned in the keynote and expo hall sections above, the complexity and choice could easily lead to option paralysis or wasted investment. Being present at GTC and providing a truly independent view of the AI market is, I believe, critical to ensuring the success seen in the mega enterprises is replicated for all organisations.  

On and off the stand, it was great connecting with customers and understanding the challenges and successes of AI adoption. The conversations ranged from ultra-high-end builds of AI factories expected to consume 20 MW of power (for context, that's about 16,000 GPUs), where even SFP power draw becomes a critical consideration, to the challenges of delivering AI at the departmental level or scaling to thousands of locations, where cost per unit is the biggest hurdle in balancing the investment plan. One thing that is clear to me is that everyone is at a similar stage of their GenAI adoption, and over the next 2-3 years we will see a boom in use cases and access to outcomes at scale.  

Technology in Action 

The conversation about Physical AI has been coming to life in San Francisco (more on why I was in San Fran later); I noticed these heavily technology-laden Jaguar I-PACE cars around the city and at first wrote them off as just another survey vehicle. How wrong I was! Then I spotted something strange: no human was sitting in the driver's seat! It turns out Waymo has been running a pilot service since 2021, which opened to the general public in 2024. An all-electric fleet of cars has delivered an estimated 4 million miles of driverless services with few reported incidents caused by the technology. It was exciting to see technology in action for general consumption. 

Jensen talked about NVIDIA Halos during GTC, a full-stack safety system for autonomous vehicles that includes 7 million lines of security-checked code. I wonder if it's being tested here.  

I asked ChatGPT for some info on the service for a little more context: 

Waymo, a subsidiary of Alphabet, operates a fully autonomous ride-hailing service in San Francisco utilizing a fleet of all-electric Jaguar I-PACE vehicles. This service, known as Waymo One, is available 24/7 and allows users to book rides through the Waymo One app. The vehicles operate without human drivers, providing a seamless and innovative transportation experience. 

Vehicle and Technology 

Each Jaguar I-PACE in Waymo's fleet is equipped with an array of over 50 sensors, including LiDAR, radar, and cameras. These sensors enable the vehicle to navigate complex urban environments safely and efficiently. 

User Experience 

Riders have reported smooth and human-like driving experiences during their trips. The vehicles handle traffic conditions adeptly, providing a comfortable journey without abrupt movements. 

Final Thoughts & Key Messages from GTC 

AI is here, and the move to reasoning AI and robotics is going to accelerate the practical, return-based use cases over the next 24 months. The industry has moved at a tremendous pace over the last two years, but there has never been a better time to build your AI strategy and start the journey: modernise infrastructure, curate data, and deliver transformation.   

I would like to leave you all with the top 5 key messages from GTC. Please do reach out if you want to chat through anything discussed above or accelerate your AI journey in general.  

  • Shift to accelerated computing is accelerating; $1T worth of data centers becoming accelerated and driving demand for AI factories. 
  • AI investment is accelerating across every industry - beyond CSPs, to GPU clouds, enterprise, and robotics. 
  • AI reasoning inference requires significantly more computation than traditional one-shot inference. NVIDIA is the leader in inference with full-stack invention and optimization. 
  • AI is now mainstream - and the entire Enterprise stack has been reinvented with NVIDIA acceleration. 
  • Robots have arrived - a new wave of physical AI drives major data center workload consumption with simulation and synthetic data generation for post-training. 

We managed to get a tour of the new NVIDIA offices, which truly are a cool place to work, if you ever get the chance to visit. The engineering is out of this world, and the views are not bad either! It's also cool to see the meeting rooms named after sci-fi and technology; I noted a few Star Wars-based rooms. 
