
Day 2: AI Infrastructure for Training and Inference


April 21, 2026
Location: Computing and Data Science Building, Simonyi Conference Center 

AI infrastructure is evolving into a tightly integrated computing substrate spanning GPU clusters, accelerators, memory systems, and distributed software for training and inference. Achieving high Model FLOPs Utilization (MFU) and reliability requires co-design across hardware and software layers, along with software-driven innovations in interconnection fabrics.

This conference explores advances in ML accelerators, compilers, data representations, low-latency inference systems, and large-scale AI platforms. A central theme is reconciling performance with robustness in heterogeneous, failure-prone environments, while leveraging workload and fleet automation to sustain efficiency at scale.

Time | Agenda | Videos
8:00am Breakfast & Registration
8:50am Welcome and Opening Remarks
Balaji Prabhakar | VMware Founders Professor of Computer Science, Stanford University
YouTube
9:00am Keynote 1: The Evolution of ML Accelerators from General Purpose to Task-optimized
Nafea Bshara | Vice President and Distinguished Engineer, Amazon 
YouTube
9:40am Voyager: A Compiler and Design-Space Exploration System for AI Accelerators
Priyanka Raina | Associate Professor of Electrical Engineering, Stanford University
YouTube
10:10am Heterogeneous Data and Memory Representations for Efficient AI
Thierry Tambe | Assistant Professor of Electrical Engineering, Stanford University 
YouTube
10:40am Break | Jay Borenstein
YouTube
11:00am SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling
Athinagoras Skiadopoulos | Research Scientist, NVIDIA Research
YouTube
11:30am Michelangelo: Uber’s AI/ML Platform
Viv Keswani | Senior Director of Engineering, Uber
YouTube
12:00pm Lunch Break
1:00pm Keynote 2: Connecting GPUs Worldwide into an AI Platform for the World
Deepak Bansal | General Manager and Corporate Vice President, Microsoft Azure
YouTube
1:40pm Are AI Fabrics and Infrastructure Really That Different?
Joseph L. White | ISG-CTO Fellow, Dell 
YouTube
2:10pm Déjà Vu: Reconciling Fabric Perfection with Network Reality
Murai Sridharan | Senior Vice President of Networking, Oracle Cloud
YouTube
2:40pm AI Building AI: How AI is Accelerating Model Experimentation and Enabling The Flywheel
Animesh Singh | Senior Director, AI Platform and Infrastructure, LinkedIn
YouTube
3:10pm Break
3:30pm Software-Driven Fabrics Using Clocks and Shims
Balaji Prabhakar | VMware Founders Professor of Computer Science, Stanford University
YouTube
4:10pm Fireside Panel: From Packets to Parameters
The panel explores how decades of networking and distributed systems innovation underpin modern AI infrastructure, examining evolving system abstractions, scalability challenges, and recurring design patterns. Discussion highlights interconnect design, job scheduling, low-latency response generation, and fault tolerance in AI systems.
Albert Greenberg | Chief Architect Officer, Uber 
Sachin Katti | Head of Compute Infrastructure, OpenAI
Ion Stoica | Professor of EECS, UC Berkeley
YouTube
5:15pm Close