Content provided by LessWrong. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by LessWrong or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fr.player.fm/legal.

“0. CAST: Corrigibility as Singular Target” by Max Harms

19:40
 
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
What the heck is up with “corrigibility”? For most of my career, I had a sense that it was a grab-bag of properties that seemed nice in theory but hard to get in practice, perhaps due to being incompatible with agency.
Then, last year, I spent some time revisiting my perspective, and I concluded that I had been deeply confused by what corrigibility even was. I now think that corrigibility is a single, intuitive property, which people can learn to emulate without too much work and which is deeply compatible with agency. Furthermore, I expect that even with prosaic training methods, there's some chance of winding up with an AI agent that's inclined to become more corrigible over time, rather than less (as long as the people who built it understand corrigibility and want that agent [...]
---
Outline:
(07:30) Overview
(07:33) 1. The CAST Strategy
(08:15) 2. Corrigibility Intuition (Coming Saturday)
(08:49) 3a. Towards Formal Corrigibility (Coming Sunday)
(09:27) 3. Formal (Faux) Corrigibility ← the mathy one (Also Sunday)
(10:12) 4. Existing Writing on Corrigibility (Coming Monday)
(10:33) 5. Open Corrigibility Questions (Also Monday)
(10:58) Bibliography and Miscellany
---
First published:
June 7th, 2024
Source:
https://www.lesswrong.com/posts/NQK8KHSrZRF5erTba/0-cast-corrigibility-as-singular-target-1
---
Narrated by TYPE III AUDIO.

335 episodes
