Mettez-vous hors ligne avec l'application Player FM !
Dataflow Computing for AI Inference with Kunle Olukotun - #751
Manage episode 513616212 series 2355587
In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at Sambanova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the traditional instruction-fetch paradigm of CPUs and GPUs. We explore how this architecture is well-suited for LLM inference, reducing memory bandwidth bottlenecks and improving performance. Kunle reviews how this system also enables efficient multi-model serving and agentic workflows through its large, tiered memory and fast model-switching capabilities. Finally, we discuss his research into future dynamic reconfigurable architectures, and the use of AI agents to build compilers for new hardware.
The complete show notes for this episode can be found at https://twimlai.com/go/751.
777 episodes
Dataflow Computing for AI Inference with Kunle Olukotun - #751
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Manage episode 513616212 series 2355587
In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at Sambanova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the traditional instruction-fetch paradigm of CPUs and GPUs. We explore how this architecture is well-suited for LLM inference, reducing memory bandwidth bottlenecks and improving performance. Kunle reviews how this system also enables efficient multi-model serving and agentic workflows through its large, tiered memory and fast model-switching capabilities. Finally, we discuss his research into future dynamic reconfigurable architectures, and the use of AI agents to build compilers for new hardware.
The complete show notes for this episode can be found at https://twimlai.com/go/751.
777 episodes
Tous les épisodes
×Bienvenue sur Lecteur FM!
Lecteur FM recherche sur Internet des podcasts de haute qualité que vous pourrez apprécier dès maintenant. C'est la meilleure application de podcast et fonctionne sur Android, iPhone et le Web. Inscrivez-vous pour synchroniser les abonnements sur tous les appareils.