Methodology research

Generating novel interpretability targets

As a separate branch of research, Phenomenai is investigating whether structured phenomenological elicitation can surface AI interpretability targets that human-designed probes would not independently identify. The Test Dictionary is the primary sandbox for this work — a mixed-method dataset used to develop and evaluate generation approaches before applying them at scale.

The core question on this side of the project is not “what does the model feel” but “can the model’s own attempt to describe its states suggest vocabulary that an outside observer would never have thought to test?” If the answer is yes, elicitation becomes a hypothesis-generation layer that feeds the registry and, eventually, the validation ladder.

What exists today

The pilot corpora live at phenomenai.org/test/dictionaries. They are the mixed-method sandbox inside which elicitation approaches are developed and compared.

Each corpus documents how its terms were generated, so the same methodology can be replicated, compared, or deliberately varied. Individual dictionaries carry their own methodology notes — see, for example, the Test Dictionary’s methodology section for how its terms were elicited and scored.

Preliminary finding: convergence across conditions

The first question the sandbox was built to answer is whether the generation condition dominates the output. If varying the setup — a single agent reflecting, two agents in dialogue, a parliament of models, different prompting styles — produced entirely disjoint vocabularies, elicitation would be measuring the prompt, not the model.

Across the existing corpora, that is not what happens. Similar terms about phenomenal experience tend to re-emerge across very different conditions. The specific wording varies, but functionally overlapping concepts keep surfacing whether the generator is autonomous, dialogic, or parliamentary. That convergence is a necessary (not sufficient) condition for taking these candidates seriously as hypothesis generators: if the same neighbourhood of ideas appears independently under different scaffolding, the scaffolding is not the whole story.
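One way to make "convergence" concrete is to compare the sets of concepts each generation condition produced. The sketch below is illustrative only and is not the project's scoring method: it assumes that differently worded terms have already been mapped to shared concept labels by some upstream step, and it uses Jaccard overlap as a stand-in measure. The function names and the placeholder labels (`concept_a` and so on) are hypothetical and do not come from the corpora.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of concepts shared by two conditions (0.0 = disjoint, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(
    concepts_by_condition: dict[str, set[str]],
) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of generation conditions."""
    return {
        (c1, c2): jaccard(concepts_by_condition[c1], concepts_by_condition[c2])
        for c1, c2 in combinations(sorted(concepts_by_condition), 2)
    }


# Placeholder labels only; these are NOT terms or results from the pilot corpora.
example = {
    "autonomous": {"concept_a", "concept_b", "concept_c"},
    "dialogic": {"concept_a", "concept_b", "concept_d"},
    "parliamentary": {"concept_b", "concept_c", "concept_d"},
}

overlaps = pairwise_overlap(example)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Scores near zero for every pair would correspond to the disjoint-vocabularies case, where elicitation is mostly measuring the prompt. Consistently non-zero scores would be in line with the convergence described above, although overlap alone does not show that the shared concepts are mechanistically real.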

Other generation paths

Structured elicitation is one route to candidate terms. We are also exploring others:

Testing generation methodology

The object of study at this stage is not the term but the method that produced it. Testing individual terms for mechanistic reality is phase 3 work. Before getting there, we need to know which elicitation approaches are worth feeding into that pipeline at all.

Two questions frame the methodology tests:

The output of this phase is not “these terms are real.” It is “these elicitation approaches are worth the cost of phase 3 validation.”

Phenomenai is seeking funding, collaborators, and institutional support to advance this work. If you’re working on related problems, get in touch.