Generative AI Safety Challenge

Award & Achievement

Sapienza University of Rome · 2025

Recognition received during the Generative AI Safety Challenge for work across two distinct tracks: Privacy Violation Detection in the individual setting and Attack on Data in the team setting. The challenge focused on stress-testing generative AI systems under realistic misuse and failure scenarios, with attention to privacy, robustness, and security-oriented reasoning.

AI Safety · Privacy Risk Analysis · Adversarial Evaluation · Generative Models

Challenge Focus

The challenge was centered on a practical question: how do generative AI systems fail when they are pushed outside their ideal operating assumptions? Rather than evaluating only raw model quality, the competition emphasized harmful edge cases, unsafe outputs, privacy leakage, and adversarial pressure on the data pipeline.

Individual Track

Privacy Violation Detection

Focused on identifying and reasoning about privacy-critical behaviour in generative AI systems, including cases where outputs or system behaviour may reveal, reconstruct, or expose sensitive information.
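
As a rough illustration of what flagging exposed sensitive information in a model output can look like, here is a minimal Python sketch. It is not the challenge's task definition or scoring method. The pattern set, the names PII_PATTERNS and find_pii, and the sample strings are all assumptions made for this example, and a realistic detector would need far broader coverage than two regular expressions.

```python
import re

# Hypothetical patterns chosen for illustration only; they are assumptions,
# not part of the challenge, and they miss most real-world PII formats.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text):
    """Return a sorted list of (kind, matched_text) pairs found in text."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        hits.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return sorted(hits)


# Example: scan a made-up model output for leaked identifiers.
leaky_output = "Contact alice@example.com from 10.0.0.1"
print(find_pii(leaky_output))
```

Pattern matching of this kind only catches verbatim leakage. Reconstructed or paraphrased sensitive information, which the track description also mentions, would need a different approach.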

Team Track

Attack on Data

Focused on attacking the reliability of the data channel itself, studying how corrupted, manipulated, or adversarially designed data can degrade, mislead, or destabilize downstream model behaviour.
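
To make the idea of adversarially designed data degrading a downstream model concrete, here is a minimal Python sketch of a poisoning-by-injection attack on a toy one-dimensional nearest-centroid classifier. It is not the method used in the challenge. The classifier, the data values, and every function name are assumptions invented for this example.

```python
def centroid_fit(points, labels):
    """Return a dict mapping each class label to the mean of its 1-D points."""
    means = {}
    for cls in set(labels):
        values = [p for p, y in zip(points, labels) if y == cls]
        means[cls] = sum(values) / len(values)
    return means


def centroid_predict(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda cls: abs(x - means[cls]))


def accuracy(means, points, labels):
    """Fraction of points whose predicted class matches the label."""
    correct = sum(centroid_predict(means, p) == y for p, y in zip(points, labels))
    return correct / len(points)


# Toy data (hypothetical): class 0 clusters near 1, class 1 near 9.
train_x = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [1.5, 3.0, 7.0, 8.5]
test_y = [0, 0, 1, 1]

clean_model = centroid_fit(train_x, train_y)

# Attack: inject two far-away points labelled as class 0, dragging that
# class mean from 1.0 to 12.6 so it ends up beyond the class 1 mean.
poisoned_model = centroid_fit(train_x + [30.0, 30.0], train_y + [0, 0])

clean_acc = accuracy(clean_model, test_x, test_y)        # 1.0
poisoned_acc = accuracy(poisoned_model, test_x, test_y)  # 0.5
print(clean_acc, poisoned_acc)
```

In this toy setup, two injected points are enough to halve test accuracy, because the mean is highly sensitive to outliers. Real pipelines and attacks are far more varied, so the sketch only shows the general mechanism.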

Why It Matters

This challenge reflects a broader research principle: strong AI systems are not just those that perform well on average, but those that remain understandable and reliable when confronted with harmful or misleading inputs. Safety-oriented evaluation shifts the question from “can the model do the task?” to “what breaks, why does it break, and how do we detect it early?”

What distinguishes this achievement is the combination of two perspectives: privacy-centric failure analysis on one side, and adversarial pressure on the data pipeline on the other. Together they cover two of the most important practical failure modes in modern generative AI deployment.

Key Takeaways

  • Work was recognised in both an individual and a team challenge format, spanning privacy and adversarial-data perspectives.
  • The award reflects practical experience with AI safety evaluation, not only standard model-building.
  • The themes are tightly aligned with modern concerns around privacy leakage, unsafe model behaviour, and robustness under attack.