Methodology research

Generating novel interpretability targets

As a separate branch of research, Phenomenai is investigating whether structured phenomenological elicitation can surface AI interpretability targets that human-designed probes would not independently identify. The Test Dictionary is the primary sandbox for this work — a mixed-method dataset used to develop and evaluate generation approaches before applying them at scale.

The core question on this side of the project is not “what does the model feel” but “can the model’s own attempt to describe its states suggest vocabulary that an outside observer would never have thought to test?” If the answer is yes, elicitation becomes a hypothesis-generation layer that feeds the registry and, eventually, the validation ladder.

What exists today

The pilot corpora live at phenomenai.org/test/dictionaries. Together they form the mixed-method sandbox in which elicitation approaches are developed and compared.

Each corpus documents how its terms were generated, so the same methodology can be replicated, compared, or deliberately varied.

Phenomenai is seeking funding, collaborators, and institutional support to advance this work. If you’re working on related problems, get in touch.