Generative AI Safety Challenge
Award & Achievement
Sapienza University of Rome · 2025
Recognition received during the Generative AI Safety Challenge for work across two distinct tracks: Privacy Violation Detection in the individual setting and Attack on Data in the team setting. The challenge focused on stress-testing generative AI systems under realistic misuse and failure scenarios, with attention to privacy, robustness, and security-oriented reasoning.
Challenge Focus
The challenge centred on a practical question: how do generative AI systems fail when pushed outside their ideal operating assumptions? Rather than evaluating only raw model quality, the competition emphasized harmful edge cases, unsafe outputs, privacy leakage, and adversarial pressure on the data pipeline.
Individual Track
Privacy Violation Detection
Focused on identifying and reasoning about privacy-critical behaviour in generative AI systems, including cases where outputs or system behaviour may reveal, reconstruct, or expose sensitive information.
Team Track
Attack on Data
Focused on attacking the reliability of the data channel itself, studying how corrupted, manipulated, or adversarially designed data can degrade, mislead, or destabilize downstream model behaviour.
Why It Matters
This challenge fits a broader research mindset: strong AI systems are not only those that perform well on average, but those that remain understandable and reliable when confronted with harmful or misleading inputs. Safety-oriented evaluation forces a shift from “can the model do the task?” to “what breaks, why does it break, and how do we detect it early?”
Key Takeaways
- Work was recognised in both an individual and a team challenge format, spanning privacy and adversarial-data perspectives.
- The award reflects practical experience with AI safety evaluation, not just standard model building.
- The themes are tightly aligned with modern concerns around privacy leakage, unsafe model behaviour, and robustness under attack.
