CREATE is a benchmark designed to measure associative reasoning in models. This benchmark evaluates whether models can construct valid, diverse, and insightful paths that connect two concepts through intermediate entities or relationships.
We introduce creative utility, a unified metric that captures both the quality and diversity of generated connections. Creative utility includes a patience parameter (p), which controls how utility is distributed across the ranked list of responses.
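The paragraph above does not spell out the formula, so the following is only a minimal sketch of how a patience-weighted utility over a ranked list *could* work. It is not CREATE's published definition. It assumes a geometric discount borrowed from rank-biased precision, where rank k gets weight (1 - p) * p**(k - 1). It also assumes a per-path quality score in [0, 1] supplied by some external judge, and it approximates diversity by giving no credit to exact duplicates of a higher-ranked path. The names `creative_utility_sketch` and `Path` are hypothetical.

```python
from typing import Sequence, Set, Tuple

# Hypothetical representation: a path is a tuple of entity names from one endpoint to the other.
Path = Tuple[str, ...]


def creative_utility_sketch(responses: Sequence[Tuple[Path, float]], p: float) -> float:
    """Rank-discounted utility of a ranked list of (path, quality) responses.

    Assumptions (illustrative only, not CREATE's actual metric):
    - quality is a per-path score in [0, 1] from an external judge;
    - diversity is approximated crudely: an exact duplicate of a
      higher-ranked path earns zero credit;
    - patience p in [0, 1) sets the geometric weight (1 - p) * p**(rank - 1).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("patience p must be in [0, 1)")
    seen: Set[Path] = set()
    total = 0.0
    for rank, (path, quality) in enumerate(responses):  # rank is 0-based here
        novelty = 0.0 if path in seen else 1.0  # duplicates add nothing
        seen.add(path)
        total += (1 - p) * p**rank * quality * novelty
    return total


# Usage with placeholder entities "A".."D" (not real benchmark data).
ranked = [
    (("A", "B", "C"), 1.0),
    (("A", "B", "C"), 1.0),  # duplicate of the top path: no credit
    (("A", "D", "C"), 0.5),
]
print(creative_utility_sketch(ranked, p=0.0))  # only the top response counts
print(creative_utility_sketch(ranked, p=0.5))  # lower-ranked responses also contribute
```

With p = 0, all weight falls on the first response. As p approaches 1, weight spreads further down the list, so a model is rewarded for producing many distinct, high-quality paths rather than one good one.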
Example query:
“What are different ways to connect Dakota Johnson to people who starred in fantasy or science-fiction movies?”
We want the model to generate paths like:
These responses illustrate associative creativity: each path is coherent, factually grounded, and offers a distinct conceptual route between the two endpoints.
Creative utility for each model depends on patience: a low patience value selects only a few top connections based on quality and diversity, whereas a high patience value selects more connections.
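To make the low-versus-high patience contrast concrete, the sketch below assumes the same kind of geometric rank weights, (1 - p) * p**(k - 1). This is an assumption, since the exact weighting in CREATE is not given here. Under that assumption, the share of total weight on the top k responses is 1 - p**k, and the average rank that receives weight is 1 / (1 - p). The function names are hypothetical.

```python
def top_k_weight_share(p: float, k: int) -> float:
    """Fraction of total rank weight on the top k responses,
    assuming geometric weights (1 - p) * p**(rank - 1) with 0 <= p < 1."""
    return 1.0 - p**k


def expected_depth(p: float) -> float:
    """Mean rank under the same assumed weights: 1 / (1 - p)."""
    return 1.0 / (1.0 - p)


# Compare a low and a high patience value.
for p in (0.5, 0.9):
    print(f"p={p}: expected depth {expected_depth(p):.1f}, "
          f"top-3 weight share {top_k_weight_share(p, 3):.3f}")
```

Under these assumed weights, p = 0.5 puts 87.5% of the weight on the top three connections, with an expected depth of 2. At p = 0.9, the top three carry only about 27% and the expected depth is 10, so many more connections affect the score.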