#ModelAlignment

Latest posts tagged with #ModelAlignment on Bluesky

Posts tagged #ModelAlignment

Dr Robert N. Winter

@robert.social.winter.ink.ap.brid.gy

6 months ago

Original post on social.winter.ink

AI models often mirror our beliefs, rewarding us with agreeable but shallow answers. This sycophancy flatters rather than challenges, eroding judgment and candour. To gain true value, we must set incentives that favour truth over comfort, design prompts that demand trade-offs, and treat AI as a […]

iMerit

@imerit.bsky.social

8 months ago

Training LLMs on open-ended tasks is tricky; opinions vary, and interpretations clash. Consensus scoring + escalation workflows bring structure and consistency to reward modeling.

How it works: bit.ly/44AMGZh

#ModelAlignment #RLHF #LLMTraining #FeedbackQuality

The MES Times

@themestimes.bsky.social

9 months ago

A new series of experiments by Palisade Research has sparked concern in the AI safety community, revealing that OpenAI’s o3 model appears to resist shutdown protocols, even when explicitly instructed to comply.

#AISafety #OpenAI #ModelAlignment #ReinforcementLearning #TechEthics