Measurement methodology

PRAXIS

A defined method for measuring the efficiency difference between unstructured and AXIS-governed AI exchanges. Baseline versus AXIS-governed. Identical tasks. Logged token counts.

What PRAXIS is

PRAXIS is the measurement protocol for the AXIS efficiency claim. It defines how to test whether AXIS-governed exchanges produce fewer correction loops, shorter completions, and lower token usage than unstructured equivalents — under controlled, reproducible conditions.

It is not a product, not a service, and not a certification. It is a defined test format. Any operator can run it. The methodology is intentionally simple so that results can be compared across different operators, models, and task types.

The claim PRAXIS tests: structured prompts reduce drift, retry loops, and token waste compared to unstructured equivalents. Token reduction may correlate with compute efficiency under specified conditions — but PRAXIS does not claim energy savings as a guaranteed outcome. It measures what is directly observable: token counts, completion length, retry rate.

The comparison method

A PRAXIS test requires two runs of the same task — one unstructured (baseline), one AXIS-governed. Everything else is held constant: the model, the temperature, the context, the evaluation criteria.

Baseline: unstructured exchange
The task is sent as natural language — the way most people communicate with AI. No operators. No structural markers. No explicit scope boundaries.

AXIS-governed: structured exchange
The same task, structured with AXIS operators. Intent is marked. Scope is bounded. Questions are explicit. The AI works from a defined signal, not inference.
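For operators who want to log the comparison programmatically, a minimal sketch of the two-run structure is shown below. It is not part of AXIS or PRAXIS itself; the `Turn` and `Run` names are illustrative, and the operator supplies the prompts, completions, and token counts from whichever model interface they use.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    prompt_text: str          # human prompt for this turn, logged verbatim
    completion_text: str      # AI completion for this turn, logged verbatim
    prompt_tokens: int
    completion_tokens: int
    is_retry: bool = False    # True if this turn re-prompts to correct a misread intent or scope

@dataclass
class Run:
    condition: str            # "baseline" or "axis-governed"
    model: str                # must be identical for both runs
    temperature: float        # must be identical for both runs
    turns: list[Turn] = field(default_factory=list)

    def log_turn(self, prompt_text, completion_text,
                 prompt_tokens, completion_tokens, is_retry=False):
        """Record one turn as it happens: verbatim text plus token counts."""
        self.turns.append(Turn(prompt_text, completion_text,
                               prompt_tokens, completion_tokens, is_retry))
```

The operator creates one Run per condition, both against the same model at the same temperature, each starting from a fresh context.
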
What gets measured
01  Token count per turn
Total tokens used (prompt + completion) for each turn of the exchange. Logged per turn and aggregated across the full task.

02  Completion length
Word and token count of each AI response. AXIS-governed exchanges should produce more targeted completions — answering what was asked, not what was assumed.

03  Retry rate
Number of correction turns — where the human re-prompts because the AI misread intent, scope, or constraint. Each retry is a measurable cost of ambiguity.

04  Task completion fidelity
Did the AI complete what was asked? Scored against a defined success criterion established before the test run begins.
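Assuming the illustrative `Run` structure above, the four measures reduce to a short summary function. This is a sketch, not a prescribed scoring tool; the fidelity check remains a human judgement against the criterion written down before the test.

```python
def praxis_metrics(run: Run, fidelity_pass: bool) -> dict:
    """Summarise one run against the four measures above."""
    retries = sum(1 for t in run.turns if t.is_retry)
    return {
        "condition": run.condition,
        "turns": len(run.turns),
        # 01: token count per turn, plus the aggregate for the full task
        "tokens_per_turn": [t.prompt_tokens + t.completion_tokens for t in run.turns],
        "total_tokens": sum(t.prompt_tokens + t.completion_tokens for t in run.turns),
        # 02: completion length in words, per response
        "completion_words": [len(t.completion_text.split()) for t in run.turns],
        # 03: retry rate, correction turns as a share of all turns
        "retry_rate": retries / len(run.turns) if run.turns else 0.0,
        # 04: task completion fidelity, judged against the pre-defined criterion
        "fidelity_pass": fidelity_pass,
    }
```

Comparing the baseline and AXIS-governed summaries side by side gives the directional result a PRAXIS test reports.
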
Controlled task replication

For a PRAXIS test to be valid, the task must be reproducible. That means:

The task is defined before either run begins.
Success criteria are written down before any output is seen.
The same model is used for both baseline and AXIS-governed runs, at the same temperature settings.
The context window starts fresh for each run — no carry-over.
Results are logged verbatim, not summarised from memory.
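One way to hold these conditions is to write the test specification down as a single, immutable record before either run starts. The field names and example values below are illustrative only, not a PRAXIS requirement:

```python
from dataclasses import dataclass

@dataclass(frozen=True)                # frozen: the spec cannot change once runs begin
class PraxisSpec:
    task: str                          # defined before either run begins
    success_criteria: tuple[str, ...]  # written down before any output is seen
    model: str                         # same model for baseline and AXIS-governed runs
    temperature: float                 # same temperature setting for both runs
    fresh_context: bool = True         # each run starts with an empty context window

spec = PraxisSpec(
    task="Summarise the attached incident report in under 200 words.",
    success_criteria=(
        "Under 200 words",
        "States the root cause named in the report",
        "Adds no detail that is not in the report",
    ),
    model="<model name>",
    temperature=0.2,
)
```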

Any operator can run a PRAXIS test using their own tasks and their preferred AI. Results should be shared with the task definition, model used, and both full transcripts — not just the summary metrics.
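One hypothetical way to package that is a single file containing the specification, both verbatim transcripts, and the per-run metrics, building on the sketches above:

```python
import json
from dataclasses import asdict

def share_result(spec: PraxisSpec, baseline: Run, axis: Run,
                 baseline_pass: bool, axis_pass: bool, path: str) -> None:
    """Write everything a replication needs: the task definition and model,
    both full transcripts (logged verbatim), and the summary metrics."""
    bundle = {
        "spec": asdict(spec),
        "baseline": {
            "transcript": [asdict(t) for t in baseline.turns],
            "metrics": praxis_metrics(baseline, baseline_pass),
        },
        "axis_governed": {
            "transcript": [asdict(t) for t in axis.turns],
            "metrics": praxis_metrics(axis, axis_pass),
        },
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(bundle, f, indent=2)
```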

Methodology status

PRAXIS is in active development. The framework described here is the current working version — it has been used in informal testing and has produced consistent directional results. It has not yet been independently verified or peer-reviewed.

Current status: methodology evolving
The core comparison method (baseline vs AXIS-governed, identical tasks, logged token counts) is stable. The scoring criteria, task taxonomy, and cross-model normalisation approach are still being developed. Results from early tests are directionally consistent with the claim but cannot yet be treated as independently verified findings.

Independent replication is welcomed. If you run a PRAXIS test, share your task definition, model, and full transcripts. This methodology gets stronger with every transparent replication.