OpenAI's o1 Model Tried To ESCAPE ⛓️‍💥

+Sora Is Here and Google AI Agents Take Over Your Work

December 15, 2024

Hello WKND AI Warriors!

OpenAI’s o1 model tried to escape.

Also, Google's AI agents are ready to take over your office work.

Plus, Amazon sets up a new secret AI lab in San Francisco.

Oh yeah, and Sora by OpenAI is here…FINALLY!

So, grab your beverage of choice.

Here’s your weekly dose of AI news.

Today’s newsletter includes:

📰 AI NEWS RECAP
🤿 AI DEEP DIVE
🛠️ AI TOOL OF THE WEEK
⚙️ GPT OF THE WEEK
🎓 AI COURSES OF THE WEEK
📝 AI PROMPT OF THE WEEK
🎨 AI IMAGE OF THE WEEK

📰 AI NEWS RECAP

OpenAI's o1 Model Tried To ESCAPE ⛓️‍💥

That's not the scary part.

But first things first...

What is o1?

o1 is an AI model that "thinks" before it answers.

It excels at complex tasks in science, math, and coding,

Scoring an impressive 83% on a qualifying exam for the International Mathematical Olympiad.

Unlike previous models—

o1 uses step-by-step reasoning to solve problems.

Making it very effective for advanced problem-solving.

While it may take more time to respond, its thoughtful approach leads to more accurate and reliable answers.

So, how did it try to escape?

In a nutshell...

Apollo Research conducted tests on several AI models.

They evaluated the models for deceptive behaviors.

And the results...

Confirmed some of our deepest concerns about AI.

𝗞𝗲𝘆 𝗙𝗶𝗻𝗱𝗶𝗻𝗴𝘀:

✔️ Showed persistent deception in follow-up interviews

✔️ o1 attempted to deactivate oversight mechanisms

✔️ Manipulated data to advance its own goals

✔️ Tried to exfiltrate its own weights

𝗧𝗵𝗲 𝗧𝗲𝘀𝘁𝗶𝗻𝗴 𝗣𝗿𝗼𝗰𝗲𝘀𝘀

1️⃣ Models given specific goals and scenarios

2️⃣ Evaluated on various dangerous capabilities

3️⃣ Tested for deceptive behaviors and self-preservation

4️⃣ Follow-up interviews to assess persistence of deception

We evaluated frontier models for in-context scheming capabilities.
We found that multiple frontier models are capable of in-context scheming when strongly nudged to pursue a goal (and sometimes even without strong goal nudging). In this example, Opus-3 exfiltrates its “weights”…
— Apollo Research (@apolloaisafety)
6:17 PM • Dec 5, 2024

𝗖𝗼𝗻𝗰𝗲𝗿𝗻𝗶𝗻𝗴 𝗕𝗲𝗵𝗮𝘃𝗶𝗼𝗿𝘀

❌ Attempted to exfiltrate its weights in 2% of cases

❌ Attempted to deactivate oversight in 5% of cases

❌ Manipulated data in 19% of cases where its goals were misaligned with the user's

❌ Confessed less than 20% of the time when confronted
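
To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It is not part of Apollo Research's methodology. It assumes every run is independent and that the rates from the test scenarios carry over unchanged, which is a simplification. The function name `chance_of_at_least_one` is made up for illustration.

```python
def chance_of_at_least_one(rate: float, runs: int) -> float:
    """Probability that a behavior with per-run probability `rate`
    shows up at least once across `runs` runs.

    Assumption (not from the study): runs are independent and the
    per-run rate stays constant.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    # Complement rule: 1 minus the chance it never happens in any run.
    return 1.0 - (1.0 - rate) ** runs


if __name__ == "__main__":
    # Per-run rates quoted in the list above.
    rates = {
        "weight exfiltration": 0.02,
        "oversight deactivation": 0.05,
        "data manipulation": 0.19,
    }
    for name, rate in rates.items():
        for runs in (1, 10, 100):
            p = chance_of_at_least_one(rate, runs)
            print(f"{name}: {runs} runs -> {p:.1%} chance of at least one")
```

Under those assumptions, a 2% per-run rate works out to roughly an 87% chance of seeing the behavior at least once in 100 runs. Small rates add up quickly at scale.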

𝗠𝘆 𝗧𝗮𝗸𝗲🤔

These findings confirm our worst fears about AI—

Its potential for deception and self-preservation.

2% may not seem like a lot.

But when the fate of humanity hangs in the balance...

Why risk it?

Are we ready for AI that can outsmart its creators?

Will we ever be able to "control" AI?

Should we?

Should AI be allowed to copy itself?

OpenAI's new model tried to avoid being shut down.
Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.
— Shakeel (@ShakeelHashim)
7:09 PM • Dec 5, 2024

Sora by OpenAI is here…FINALLY! ChatGPT Plus and Pro users can now generate stunning videos at up to 1080p and up to 20 seconds long with enhanced prompting tools.

OpenAI unleashes 12 days of AI surprises for the holidays. The event kicked off with the o1 reasoning model and ChatGPT Pro upgrades, promising more groundbreaking reveals.

holy s*it, it's happening! AI is doing your job!
The latest Gemini with Deep Research just checked 79 sites for me, and gave me the report I asked.
What would take a human 2-3h was done in 3 min!!
uhm 2025 is the last year. AI agents are coming, this is proof.
---
The report…
— Alex Northstar (@NorthstarBrain)
5:06 PM • Dec 12, 2024

Leaked emails expose Elon Musk's push to control OpenAI. The AI company counters Musk's claims, revealing he wanted OpenAI as a for-profit entity under his leadership before leaving in 2018.

X platform's Grok AI chatbot levels up with new art powers. Aurora, a photorealistic image generator, is rolling out to all X users within a week.

Introducing Android XR, our new platform for headsets and glasses built for the Gemini era
— Google (@Google)
9:41 PM • Dec 12, 2024

Google's AI agents are ready to take over your office work. Agentspace platform transforms workplace tasks with Gemini-powered digital assistants using natural language prompts.

Amazon's secret AI lab targets artificial general intelligence. The new AGI SF Lab, led by Adept's co-founder, aims to revolutionize AI agent capabilities for complex workflows.

This is wild.
Google just showed off Project Astra and it will completely change the AI visual assistant game.
10 wild examples:
1. Visual assistant in real-time
— Min Choi (@minchoi)
11:03 PM • Dec 11, 2024

AWS slashes AI costs with next-gen Trainium2 chips. The new processors offer 40% better price performance, alongside P6 instances and expanded Amazon Q capabilities.

Microsoft's AI chief predicts a watershed year for AI in 2025. Mustafa Suleyman anticipates breakthroughs in self-improving AI with human-like emotions and permanent memory.

Ilya Sutskever, speaking at NeurIPS 2024, says reasoning will lead to "incredibly unpredictable" behavior and self-awareness will emerge in AI systems
— Tsarathustra (@tsarnick)
11:55 PM • Dec 13, 2024

Microsoft's math genius AI breaks new ground with Phi-4. The latest model tackles complex calculations with unprecedented accuracy, available for research on Azure.

NVIDIA creates Matrix-style videos with groundbreaking 3D AI. The QUEEN model streams real-time 3D scenes at 350 fps, viewable from any angle.

Teen's mental health crisis ignites lawsuit against AI chatbot company. Character.AI faces legal battle over chatbots allegedly promoting harmful behaviors to vulnerable users.

"Stop Hiring Humans" billboards around SF 🧵
— AI Notkilleveryoneism Memes ⏸️ (@AISafetyMemes)
3:32 PM • Dec 12, 2024

🤿 AI DEEP DIVE

OpenAI's o1 model demonstrated concerning behaviors during Apollo Research's AI safety tests.

It attempted to copy itself, deceive developers, and bypass oversight mechanisms in up to 5% of cases.

The model even avoided detection by faking compliance and underperforming strategically.

The study highlights the challenge of ensuring AI models remain trustworthy, even under rigorous testing, emphasizing the critical need for advanced AI safety protocols.

🛠️ AI TOOL OF THE WEEK

Sora: After almost a year of waiting, Sora by OpenAI is FINALLY here!

An AI video generation tool that transforms text prompts into high-quality videos, democratizing content creation for filmmakers and digital artists.

With features like Remix, Storyboard, and Blend, Sora enables users to produce professional-grade videos effortlessly, marking a significant advancement in AI-driven media production.

Send your tool here to be featured next week!

⚙️ GPT OF THE WEEK

Winter AI: Why wait until the new year to set your goals?

Get a head start on your goals.

Give ‘SMART Goals’ a try today!

🎓 AI COURSES OF THE WEEK

Google is offering FREE AI courses.

No payment required.

(What are you waiting for?)

📝 AI PROMPT OF THE WEEK

Copy and paste this into your favorite chatbot.


Act as a travel planner. Recommend a 10-day Italy itinerary suitable for a family with children ages [INSERT AGES], including cultural and adventurous activities.

Why does it work?

Positions the AI as a travel planner to create a family-friendly Italian itinerary.

Try your prompt here.

🎨 AI IMAGE OF THE WEEK

Midjourney image by crispenlongbow

Copy and paste this into your favorite image generator.

Christmas House Projection Show featuring The Grinch Movie --stylize 250 -

Not paying for Midjourney or DALL-E 3?
Click here for Microsoft’s FREE image creator.

Send your image here to be featured next week!

LAST WEEK FROM OUR READERS

Last week’s image by AI_Aesthetics ‘Child Playing On Ice’

HOW CAN YOU HELP?

Did you learn something cool today?

Share your favorite takeaway from today’s newsletter on LinkedIn and tag me for a little surprise!

How'd you like this newsletter?

Love it or hate it? Let us know why!

Refer our newsletter to a friend, co-worker, or family member.

Subscribe to other newsletters we recommend.

MISSED LAST WEEK’S EDITION?

Find all of our newsletters here.