Content provided by TWIML and Sam Charrington. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by TWIML and Sam Charrington or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fr.player.fm/legal.

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748

1:03:39
 
Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose multimodal agents that can use both visual and textual data for a variety of tasks. Oliver explains how Nano Banana can generate and iteratively edit images while maintaining consistency, and how its integration with Gemini’s world knowledge expands creative and practical use cases. We discuss the tension between aesthetics and accuracy, the relative maturity of image models compared to text-based LLMs, and scaling as a driver of progress. Oliver also shares surprising emergent behaviors, the challenges of evaluating vision-language models, and the risks of training on AI-generated data. Finally, we look ahead to interactive world models and VLMs that may one day “think” and “reason” in images.

The complete show notes for this episode can be found at https://twimlai.com/go/748.

777 episodes
