
Dmitrijs Trizna

@dtrizna

Cyber-security and AI research | ex-Microsoft | Founding AI Researcher @ Stealth | Agentic AI and adversarial ML for cyber-threat detection

15
Followers
8
Following
6
Posts
19.11.2024
Joined

Latest posts by Dmitrijs Trizna @dtrizna


Presenting *Ensemble Everything Everywhere* at the NeurIPS AdvML'24 workshop today! 🔥

Come by today at 10.40-12.00 in East Ballroom C to ask me about:
1) 🐰 bio-inspired naturally robust models
2) 🎓 Interpretability & robustness
3) 🖼️ building a generator for free
4) 😵‍💫 attacking GPT-4, Claude & Gemini

14.12.2024 16:04 👍 8 🔁 2 💬 1 📌 0

We are not discussing this research direction enough; we will miss many TTPs that motivated and malevolent adversaries may use against us in the next few years.

Refs:
[1] skylightcyber.com/2019/07/18/c...
[2] open.spotify.com/episode/2xRS...

20.11.2024 10:05 👍 0 🔁 0 💬 0 📌 0

It would be beneficial if the discussion around "AI Red Teaming" evolved to cover these broader, yet equally critical, aspects, rather than "AI Red Teaming" teams staying narrowly focused on the security of LLM-powered projects.

20.11.2024 10:05 👍 0 🔁 0 💬 1 📌 0

I'd say this is a vibrant topic in itself, but it is rarely discussed under the AI Red Teaming umbrella. And even beyond that, consider:

2. Disrupting AI/ML-based defenses: This includes techniques like applying adversarial ML in conventional evasion chains [1] or poisoning defensive models [2], and so much more...

20.11.2024 10:05 👍 0 🔁 0 💬 1 📌 0
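The "adversarial ML in conventional evasion chains" idea from the post above can be illustrated with a minimal FGSM-style sketch. This is my own toy example, not code from the cited references: a linear "detector" with made-up weights, and a perturbation that pushes a flagged sample's score down along the sign of the gradient.

```python
import numpy as np

# Toy sketch (my illustration, not from the thread's refs): evading a
# linear "detector" score w.x + b with an FGSM-style perturbation.
# Weights, inputs, and epsilon are made up for demonstration.

def fgsm_evasion(x, w, eps):
    """For a linear model, d(score)/dx = w; stepping against its sign
    lowers the detection score by eps * sum(|w|)."""
    return x - eps * np.sign(w)

rng = np.random.default_rng(0)
w = rng.normal(size=8)            # toy detector weights
b = 0.0
x = rng.normal(size=8) + w        # a sample the detector scores highly

score_before = w @ x + b
x_adv = fgsm_evasion(x, w, eps=0.5)
score_after = w @ x_adv + b
print(score_after < score_before)  # True: the perturbed sample scores lower
```

Real evasion chains target non-linear models (so the gradient must be computed or estimated, e.g. via a surrogate), and the perturbation must also preserve the payload's functionality, which is the hard part in the malware domain.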

There are several other development directions that should fall under the AI Red Teaming umbrella:

1. Using #AI / #ML as tools for conventional Red Teaming needs: For example, LLMs as co-operators or semi-autonomous agents.

20.11.2024 10:05 👍 0 🔁 0 💬 1 📌 0

After countless discussions at #BlackHat US this summer, I feel that many security experts, especially classical Red Teamers, are disappointed that such a broad concept is often reduced to discussions of text-based attacks like prompt injections.

20.11.2024 10:05 👍 0 🔁 0 💬 1 📌 0

Today the term "AI #redteam" almost exclusively means "security testing of LLM-powered applications". While this focus is important, it seems too narrow, especially when considering the scope of conventional Red Teaming. #infosec

20.11.2024 10:05 👍 2 🔁 0 💬 1 📌 0