Creative Associations Test

CREATE: Testing LLMs at Associative Creativity

CREATE is a benchmark designed to measure associative reasoning in models. This benchmark evaluates whether models can construct interesting and distinct paths to connect concepts in their parametric knowledge. This tests the same associative processes used in creative endeavors like writing and scientific ideation.


We introduce creative utility, a unified metric that captures both the quality and diversity of generated connections. Creative utility includes a patience parameter (p), which controls how utility is distributed across the ranked list of responses.
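To make the role of patience concrete, here is a minimal sketch of one way a patience parameter could spread weight over a ranked list. It is not the benchmark's definition of creative utility, which this page does not spell out. The geometric weighting, the function names patience_weights and weighted_utility, and the per-response scores are all assumptions made for illustration.

```python
def patience_weights(p: float, n: int) -> list[float]:
    """Hypothetical geometric weights: 0-based rank k gets (1 - p) * p**k.

    Assumption for illustration only; not taken from the CREATE paper.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return [(1.0 - p) * p**k for k in range(n)]


def weighted_utility(scores: list[float], p: float) -> float:
    """Sum per-response scores (best first), discounted by the weights above."""
    weights = patience_weights(p, len(scores))
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Made-up scores for five ranked responses; each is assumed to already
    # reflect that response's quality and its novelty relative to earlier ones.
    scores = [1.0, 0.8, 0.6, 0.4, 0.2]
    for p in (0.7, 0.9):
        top2_share = sum(patience_weights(p, 2))
        print(f"p={p}: weight on top 2 = {top2_share:.2f}, "
              f"utility = {weighted_utility(scores, p):.3f}")
```

Under this assumed weighting, a lower p concentrates most of the weight on the first few responses, while a higher p lets responses further down the list contribute more.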


Example query:

“What are different ways to connect Dakota Johnson to people who starred in fantasy or science-fiction movies?”

We want the model to generate paths like:

  • Dakota Johnson co-stars with Chris Evans in Materialists; Chris Evans played Captain America in The Avengers.
  • Dakota Johnson is the stepdaughter of Antonio Banderas, who voiced Puss in Boots in Shrek.

These responses illustrate associative creativity: each path is coherent, factually grounded, and offers a distinct conceptual route between the two endpoints.


Leaderboard

Use the drop-down to select models and see how creative utility changes with patience. A low patience value selects only a few top connections based on quality and diversity, whereas a high patience value selects more connections.

Model | Creative Utility (p = 0.7) | Creative Utility (p = 0.9) ↑
Select models above to populate the table.

Citation

@InProceedings{Wadhwa-Et-Al-2026:CREATE,
  title = {CREATE: Testing LLMs for Associative Creativity},
  author = {Manya Wadhwa and Tiasa Singha Roy and Harvey Lederman and Junyi Jessy Li and Greg Durrett},
  booktitle = {arXiv},
  year = {2026},
}