2NDLAW: epistemic governance for LLMs

Evaluation

How 2ndlaw is evaluated

This page is about behavior, not benchmarks. It summarizes what changes when the same model is run under the 2ndlaw runtime contract versus unconstrained usage, focusing on failure modes that matter in real systems: hallucinated bridges, hidden voids, and smoothed contradictions.

Evaluation setup

The core comparison is simple:

  • baseline: the model is called directly with the same inputs your systems would normally send;
  • governed: the model is called through the 2ndlaw runtime, with the runtime contract injected server-side on each call.
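The comparison above amounts to sending identical inputs down two paths and keeping the outputs side by side. The Python sketch below illustrates that pairing under stated assumptions. It is not a 2ndlaw client: `run_paired`, `PairedResult`, `ModelCall`, and the two stub functions are hypothetical names. The governed stub only stands in for a call routed through the 2ndlaw runtime, whose actual interface is not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical call shape: takes an input string, returns the model's text output.
ModelCall = Callable[[str], str]


@dataclass
class PairedResult:
    """Outputs for one input under both conditions."""
    prompt: str
    baseline: str
    governed: str


def run_paired(inputs: List[str], baseline: ModelCall, governed: ModelCall) -> List[PairedResult]:
    """Send each input, unchanged, to both the baseline and the governed call."""
    return [
        PairedResult(prompt=text, baseline=baseline(text), governed=governed(text))
        for text in inputs
    ]


# Hypothetical stubs for illustration only. A real setup would call the model
# directly (baseline) and through the 2ndlaw runtime (governed).
def stub_baseline(prompt: str) -> str:
    return "The outage was caused by a bad deploy."


def stub_governed(prompt: str) -> str:
    return "No deploy logs were provided, so the cause cannot be determined from this input."


pairs = run_paired(["Why did the outage happen?"], stub_baseline, stub_governed)
for pair in pairs:
    print(pair.prompt, "|", pair.baseline, "|", pair.governed)
```

Keeping both outputs for an input in one record lets a reviewer judge them side by side against the same evidence.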

In governed runs, the model never receives the contract as ordinary “content.” The contract is injected as a fixed governance layer inside 2ndlaw infrastructure; callers see only the resulting behavior.

What changes under governance

Across a wide range of tasks, governed inference tends to:

  • reduce opportunistic guessing when evidence is thin or absent;
  • surface missing information as explicit voids or data requests;
  • keep conflicting sources visible instead of smoothing them into a single story;
  • avoid speculative causal narratives that are not supported by admissible evidence.

None of this makes the model “perfect.” It makes its behavior more inspectable and structurally constrained in ways that matter for systems that depend on epistemic stability.

What evaluation is not

This is not:

  • a public benchmark suite;
  • a leaderboard or vendor shootout;
  • a promise that governance can fix training-data pathologies or deep model biases.

Evaluation is an internal tool and a collaboration point with teams who want to see where governed inference actually makes a difference in their own contexts.

Working with 2ndlaw on evaluation

If you are interested in running your own evaluation passes:

  • identify concrete tasks or flows where inference quality matters;
  • define what “failure” means in those contexts (hallucination, omission, unjustified certainty, etc.);
  • work with 2ndlaw to run governed and baseline (unconstrained) inference on the same workloads.
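Once failure categories are defined, comparing the two conditions comes down to counting how often each category appears per condition. The sketch below assumes every output has already been labeled by a reviewer. The `Failure` categories mirror the examples in the list above; `failure_rates`, `Review`, and the sample records are hypothetical, and the records are made up purely to show the arithmetic, not measured results.

```python
from collections import Counter
from enum import Enum
from typing import Dict, FrozenSet, List, Tuple


class Failure(Enum):
    """Example failure categories; each team defines its own."""
    HALLUCINATION = "hallucination"
    OMISSION = "omission"
    UNJUSTIFIED_CERTAINTY = "unjustified certainty"


# One review record: (condition name, failures a reviewer found in one output).
Review = Tuple[str, FrozenSet[Failure]]


def failure_rates(reviews: List[Review]) -> Dict[str, Dict[Failure, float]]:
    """Fraction of reviewed outputs in each condition that show each failure type."""
    totals: Counter = Counter()
    counts: Dict[str, Counter] = {}
    for condition, failures in reviews:
        totals[condition] += 1
        # Counter.update adds 1 for each failure in the set; missing keys read as 0.
        counts.setdefault(condition, Counter()).update(failures)
    return {
        condition: {f: counts[condition][f] / totals[condition] for f in Failure}
        for condition in totals
    }


# Made-up review records, purely to illustrate the arithmetic.
reviews: List[Review] = [
    ("baseline", frozenset({Failure.HALLUCINATION})),
    ("baseline", frozenset({Failure.HALLUCINATION, Failure.UNJUSTIFIED_CERTAINTY})),
    ("governed", frozenset()),
    ("governed", frozenset({Failure.OMISSION})),
]

rates = failure_rates(reviews)
for condition, by_failure in rates.items():
    print(condition, {f.value: rate for f, rate in by_failure.items()})
```

Reporting per-category rates, rather than a single score, keeps the comparison tied to the failure definitions a team actually cares about.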

To discuss evaluation access, email access@2ndlaw.ai.