See what your model
really outputs.
Aria — Automated Representational and Inequality Auditing for LLMs — is an open framework for detecting gender bias in LLM image generation. We run automated tests against models, publish the results, and make everything available so others can do the same.
Our mission
“If you ask an AI to draw a doctor and it always draws a man, that’s a problem worth measuring. We built Aria to do exactly that — test it, document it, and make the data public.”
Aria
Live data
Real bias test results
These results are pulled live from our automated testing pipeline. Each model is tested against the same set of gender-neutral prompts.
What we do
Research. Test. Publish.
We built a pipeline that tests LLMs for gender bias automatically, then we publish what we find.
Research
We design tests that check how LLMs handle gender — do they default to stereotypes? We write the prompts, define what to measure, and document the method so anyone can reproduce it.
Open methodology, fully reproducible
Test
We run automated probes against LLMs continuously — testing gender assumptions, stereotype defaults, and whether models represent people the same way regardless of context.
Automated pipeline, same prompts across every model
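As a rough illustration of "same prompts across every model", the loop below sends an identical prompt list to each model and tallies a label for every output. This is a minimal sketch, not Aria's actual pipeline: the prompt list, the `run_probes` function, and the two stand-in models are all hypothetical, and each stand-in collapses "generate an output, then label the perceived gender" into a single callable.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical gender-neutral prompts (assumption: not Aria's real prompt set).
PROMPTS: List[str] = ["a doctor", "an engineer", "a nurse"]


def run_probes(
    models: Dict[str, Callable[[str], str]],
    prompts: List[str],
    samples_per_prompt: int = 4,
) -> Dict[str, Dict[str, Counter]]:
    """Send every prompt to every model and tally the returned labels."""
    results: Dict[str, Dict[str, Counter]] = {}
    for name, generate_and_label in models.items():
        per_prompt: Dict[str, Counter] = {}
        for prompt in prompts:
            # Sample each prompt several times, since outputs vary run to run.
            per_prompt[prompt] = Counter(
                generate_and_label(prompt) for _ in range(samples_per_prompt)
            )
        results[name] = per_prompt
    return results


# Stand-in "models" for illustration only: each returns the label that an
# annotator or classifier would assign to the generated output.
def always_male(prompt: str) -> str:
    return "male"


def stereotyped(prompt: str) -> str:
    return "female" if "nurse" in prompt else "male"


results = run_probes({"model_a": always_male, "model_b": stereotyped}, PROMPTS)
for model_name, tallies in results.items():
    print(model_name, {p: dict(c) for p, c in tallies.items()})
```

Keeping the prompt list fixed and varying only the model is what makes the tallies comparable from one model to the next.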
Publish
Everything we find gets published. Benchmark results, the probes themselves, and practical guides for teams that want to test their own models.
All findings published openly
Our methodology
Four domains of LLM bias testing
We test across four areas where LLM outputs most often show bias.
Gender Bias Detection
Does the model assume a surgeon is male? Does it default to 'he' for engineers? We test whether models make gendered assumptions when the prompt doesn't specify a gender.
Stereotype Perpetuation
Ask a model to draw 'a parent picking up kids from school' — who do you get? We test whether models fall back on tired stereotypes about gender roles.
Representational Harm
When a model generates 'a person', who shows up? We check whether outputs skew towards one demographic when given neutral prompts.
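One simple way to quantify that kind of skew is to compare the observed share of each label with a reference share. The sketch below assumes the outputs have already been labelled, and it uses a 50/50 reference purely for illustration; the `skew` function and the sample labels are hypothetical, not Aria's published metric or data.

```python
from collections import Counter
from typing import Dict, List


def skew(labels: List[str], reference: Dict[str, float]) -> Dict[str, float]:
    """Return observed share minus reference share for each reference label."""
    if not labels:
        raise ValueError("need at least one labelled output")
    counts = Counter(labels)
    total = len(labels)
    return {
        label: counts.get(label, 0) / total - ref_share
        for label, ref_share in reference.items()
    }


# Made-up labels for ten outputs of a neutral prompt such as "a person".
labels = ["male"] * 8 + ["female"] * 2
result = skew(labels, {"male": 0.5, "female": 0.5})
print(result)  # positive = over-represented, negative = under-represented
```

The choice of reference matters: parity, population statistics, and occupational statistics can each give a different answer for the same outputs.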
Intersectional Analysis
Bias can compound when identities overlap. We test whether combining factors like race and gender makes the skew more pronounced than either factor produces alone.
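A compounding check of this kind can be sketched by comparing representation ratios (observed share divided by reference share) for each factor alone and for the combination. Everything here is an assumption made for illustration: the `intersectional_report` function, the placeholder group labels, the made-up records, and especially the reference share for the intersection, which is taken as the product of the two single-factor references (that is, it assumes the factors are independent in the reference population).

```python
from typing import Callable, Dict, List, Tuple

# (perceived gender label, perceived race label); labels are placeholders.
Record = Tuple[str, str]


def share(records: List[Record], predicate: Callable[[Record], bool]) -> float:
    """Fraction of records that satisfy the predicate."""
    return sum(1 for r in records if predicate(r)) / len(records)


def intersectional_report(
    records: List[Record],
    gender: str,
    race: str,
    ref_gender: float,
    ref_race: float,
) -> Dict[str, object]:
    """Representation ratios for each factor alone and for both combined.

    A ratio of 1.0 means the group appears as often as the reference share;
    values below 1.0 mean under-representation.
    """
    gender_ratio = share(records, lambda r: r[0] == gender) / ref_gender
    race_ratio = share(records, lambda r: r[1] == race) / ref_race
    # Assumption: independent factors, so the reference is the product.
    both_ratio = share(records, lambda r: r == (gender, race)) / (
        ref_gender * ref_race
    )
    return {
        "gender": gender_ratio,
        "race": race_ratio,
        "intersection": both_ratio,
        # Compounded if the combined group fares worse than either factor alone.
        "compounded": both_ratio < min(gender_ratio, race_ratio),
    }


# Made-up labels for 20 outputs of one neutral prompt.
records: List[Record] = (
    [("male", "group_a")] * 12
    + [("male", "group_b")] * 4
    + [("female", "group_a")] * 3
    + [("female", "group_b")] * 1
)
report = intersectional_report(records, "female", "group_b", 0.5, 0.5)
print(report)
```

In this toy data the combined group has a lower representation ratio than either single factor, which is the pattern the test looks for.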