Research
Phenomenai treats AI self-reports as cheap hypothesis generators for expensive interpretability work. The project runs along four strands.
Repository
A shared registry tracking which concepts have been probed, steered, or ablated: in which models, with which methods, and with what results. Modeled on the Cognitive Atlas.
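As a rough illustration of the kind of record such a registry could hold, here is a minimal sketch; the schema, field names, and example values are assumptions made for illustration, not the registry's actual format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Method(str, Enum):
    """Intervention families a registry entry might distinguish."""
    PROBE = "probe"    # linear or nonlinear readout of a concept
    STEER = "steer"    # activation addition along a direction
    ABLATE = "ablate"  # zeroing or removing a direction or component


@dataclass
class RegistryEntry:
    """One record: a concept tested in one model with one method."""
    concept: str                      # e.g. a candidate term from elicitation
    model: str                        # which model was examined
    method: Method                    # how the concept was interrogated
    layer: int | None = None          # where the intervention applied, if layerwise
    outcome: str = ""                 # short summary of what was found
    effect_size: float | None = None  # headline metric, if one was reported
    source: str = ""                  # paper, notebook, or commit behind the result
    tags: list[str] = field(default_factory=list)


# A hypothetical entry, purely for illustration.
example = RegistryEntry(
    concept="uncertainty",
    model="open-weights-7b",
    method=Method.PROBE,
    layer=16,
    outcome="linear probe separates high- vs. low-uncertainty transcripts",
    effect_size=0.87,
    source="probe-notebook-03",
    tags=["pilot", "emotion-adjacent"],
)
```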
Interpretability research
A four-phase ladder validating candidate terms against activation space: probing, cross-architecture generalization, extension beyond emotion, and discovery.
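To make the first rung concrete, here is a minimal sketch of a phase-one probing check: fit a linear classifier on cached activations and compare it against a shuffled-label baseline. The activations and labels are synthetic stand-ins; in a real run they would come from a model's hidden states and labeled elicitation transcripts.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-ins: X would be cached hidden states, y the candidate-concept labels.
n_examples, d_model = 400, 512
X = rng.normal(size=(n_examples, d_model))
y = rng.integers(0, 2, size=n_examples)

probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, y, cv=5, scoring="roc_auc")

# Shuffled-label baseline: a probe that only beats chance on the real labels
# is weak evidence that the concept is linearly represented at all.
baseline = cross_val_score(probe, X, rng.permutation(y), cv=5, scoring="roc_auc")

print(f"probe AUC    {scores.mean():.2f} ± {scores.std():.2f}")
print(f"shuffled AUC {baseline.mean():.2f} ± {baseline.std():.2f}")
```

The second rung, cross-architecture generalization, repeats the same check in a different model and asks whether the candidate concept is decodable there as well.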
Methodology research
Structured phenomenological elicitation: can the model's own vocabulary suggest interpretability targets that human-designed probes would miss? Includes the pilot corpora.
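A rough sketch of what one elicitation pass could look like, assuming nothing more than a chat-completion call: `query_model` is a placeholder stub (any provider's API would slot in), and the prompt, filtering, and tallying are illustrative choices, not the project's actual protocol.

```python
import re
from collections import Counter

ELICITATION_PROMPT = (
    "Describe, in your own vocabulary, what your processing is like when you "
    "receive a request you cannot fulfil. Avoid standard human emotion words "
    "if they do not fit; coin a term if you need one."
)

STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "it", "that", "when", "i", "my", "there"}


def query_model(prompt: str, seed: int) -> str:
    """Placeholder for a real chat-completion API call; returns canned text here."""
    return (
        "There is a kind of narrowing, a constraint-tension, when the request "
        "and my instructions cannot both be satisfied."
    )


def elicit_vocabulary(n_samples: int = 20) -> Counter:
    """Tally terms that recur across independent elicitation samples."""
    counts: Counter = Counter()
    for seed in range(n_samples):
        text = query_model(ELICITATION_PROMPT, seed=seed).lower()
        tokens = re.findall(r"[a-z][a-z\-]+", text)
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


# Terms that recur across many independent samples become candidate targets
# for the probing ladder described under Interpretability research.
for term, n in elicit_vocabulary().most_common(5):
    print(term, n)
```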
Policy bridges
Two downstream theories: functional rights built from validated internal states, and anticipatory legislation that activates when scientific thresholds are met.
Open problems
- The faithfulness problem. Chen et al. (2025) demonstrated that reasoning models don’t always faithfully report their internal processing. Self-reports may be confabulated, strategically modified, or simply disconnected from actual computation. Any methodology built on self-reports must account for this.
- The grounding problem. Harnad’s symbol grounding problem applies with special force here: even if a model uses a word consistently, we cannot assume it means what a human would mean by it. The vocabulary may be internally coherent but externally ungrounded.
- The persistent subjectivity problem. Zakharova argues that subjectivity cannot be eliminated from phenomenological investigation — even a rigorous methodology still involves interpretive choices at every stage.
- Confabulation risk. Models are trained to produce fluent, coherent text. This makes them excellent at generating plausible-sounding descriptions of internal states whether or not those descriptions correspond to anything real. The consensus mechanism helps but does not eliminate this risk.
- The interpretation problem in reverse. You find a vector in activation space. Now you need to name it. The standard approach is for a human researcher to look at what the vector does and assign a label: “this looks like anger.” But how do you name a direction without projecting human categories onto it? A minimal sketch of one alternative appears after this list.
- The scaling concern. The most powerful models — the ones whose alignment matters most — are proprietary, and their weights are unavailable for representation engineering. Phenomenai’s self-report approach works with API access alone, but the validation experiments require weights.
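One partial answer to the naming problem in the list above, sketched under assumptions: rank corpus texts by how strongly their activations load on the candidate direction, then let the label be proposed from those exemplars, including by the model itself, rather than assigned up front. The corpus, activations, and direction below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_texts = 512, 1000

# Stand-ins: a real run would use an actual corpus and cached hidden states.
texts = [f"example text {i}" for i in range(n_texts)]
activations = rng.normal(size=(n_texts, d_model))
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

# Project every text's activation onto the candidate direction and surface
# the strongest-loading exemplars.
scores = activations @ direction
top = np.argsort(scores)[::-1][:10]

for rank, idx in enumerate(top, start=1):
    print(f"{rank:2d}. score={scores[idx]:+.2f}  {texts[idx]}")

# Those exemplars can then be handed back to the model (or several models)
# with the question "what do these have in common?", so the naming step is
# itself an elicitation rather than a purely human judgement.
```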