Were you curious to find out why Google DeepMind's new Gemma models use REINFORCE instead of PPO for RLHF to learn human preferences? 🔎
We have been working on this paper for months -- it takes apart PPO and motivates REINFORCE-style optimization as far simpler and more effective. 🎖 Reinforcement learning from human feedback (RLHF) has been widely adopted to ensure models reflect human preferences. 🌍 Approaches like PPO directly borrow assumptions from traditional RL. Here we ask: is this necessary in the LLM setting?
In the LLM setting, we show that thanks to the strength of the initial policy and prompt conditioning, PPO is both unnecessary and computationally costly. Take a look to understand why we suggest the future of RLHF for LLMs looks different from PPO and requires going back to basics.
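For intuition, here is a minimal, illustrative sketch (in PyTorch, with made-up tensors standing in for a real policy and reward model, and hypothetical hyperparameters) of what a REINFORCE-style update strips away relative to PPO's clipped surrogate. It is an assumption-laden toy, not the paper's implementation.

```python
# Toy sketch: REINFORCE-style update vs. PPO-clip for sequence-level RLHF.
# All tensors below are random placeholders, not a real LLM or reward model.
import torch

vocab, seq_len, batch = 32, 12, 4
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)  # stand-in for policy logits
tokens = torch.randint(vocab, (batch, seq_len))                  # sampled completions
reward = torch.randn(batch)                                      # scalar per-sequence reward (e.g. from an RM)

logp = torch.log_softmax(logits, dim=-1)
logp_seq = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1).sum(-1)  # log pi(y|x)

# REINFORCE with a simple mean baseline: one scalar advantage per completion,
# no value network, no clipping, no per-token credit assignment.
advantage = reward - reward.mean()
reinforce_loss = -(advantage * logp_seq).mean()

# PPO-clip, for contrast: needs old log-probs, importance ratios, and clipping
# (and usually a learned value function for advantages) -- extra machinery the
# post argues is unneeded when starting from a strong, prompt-conditioned policy.
old_logp_seq = logp_seq.detach()
ratio = torch.exp(logp_seq - old_logp_seq)
ppo_loss = -torch.min(ratio * advantage, torch.clamp(ratio, 0.8, 1.2) * advantage).mean()

reinforce_loss.backward()  # the simpler estimator is a plain policy-gradient step
```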
I am very proud of this work, led by our Cohere For AI and Cohere teams. A huge congrats to first author Arash Ahmadian and the rest of the authors: Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, and Ahmet Üstün. 🎉 🎊
Learn more: https://lnkd.in/e77W6hec