Evaluation setup
The core comparison is simple:
- baseline: the model is called directly with the same inputs your
systems would normally send;
- governed: the model is called through the 2ndlaw runtime, with the
runtime contract injected server-side on each call.
In governed runs, the model never sees the contract as “content.”
The contract is injected as a fixed governance layer inside 2ndlaw
infrastructure; only the resulting behavior is visible.
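The two-path comparison above can be sketched as a small harness. Everything here is illustrative: the call functions are stand-ins (the actual 2ndlaw client and endpoints are not shown in this document), and the point is only that both paths receive byte-identical inputs while routing differs.

```python
# Hypothetical sketch of the baseline-vs-governed comparison.
# Neither function is the real 2ndlaw API; both are placeholders.

def call_baseline(prompt: str) -> str:
    # Stand-in for a direct call to your model provider's SDK.
    return f"baseline-output({prompt})"

def call_governed(prompt: str) -> str:
    # Stand-in for a call routed through the 2ndlaw runtime.
    # The contract is injected server-side, so the request body
    # your system sends is identical to the baseline request.
    return f"governed-output({prompt})"

def run_comparison(prompts):
    # Same inputs, two paths; only the routing differs.
    return [
        {"prompt": p, "baseline": call_baseline(p), "governed": call_governed(p)}
        for p in prompts
    ]

results = run_comparison(["Summarize Q3 revenue drivers."])
```

The key design point is that the comparison is paired: every prompt yields one baseline and one governed output, so downstream scoring can difference behavior per input rather than comparing aggregate runs.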
What changes under governance
Across a wide range of tasks, governed inference tends to:
- reduce opportunistic guessing when evidence is thin or absent;
- surface missing information as explicit voids or data requests;
- keep conflicting sources visible instead of smoothing them into a
single story;
- avoid speculative causal narratives that are not supported by
admissible evidence.
None of this makes the model “perfect.” It makes its behavior more
inspectable and structurally constrained in ways that matter for
systems that depend on epistemic stability.
Working with 2ndlaw on evaluation
If you are interested in running your own evaluation passes:
- identify concrete tasks or flows where inference quality matters;
- define what "failure" means in those contexts (hallucination,
omission, unjustified certainty, etc.);
- work with 2ndlaw to run governed vs baseline inference on the same
workloads.
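Once failure is defined, scoring reduces to counting how often each failure predicate fires in each run. The sketch below is a minimal, assumed harness: the failure categories come from the list above, but the detector functions are deliberately naive keyword placeholders, not anything 2ndlaw ships.

```python
# Hypothetical scoring harness for a defined failure taxonomy.
# The detectors are naive placeholders; real checks would be
# task-specific (human review, reference answers, structured flags).

FAILURE_CHECKS = {
    # Flags confident phrasing as a crude proxy for unjustified certainty.
    "unjustified_certainty": lambda out: "definitely" in out.lower(),
    # Flags outputs that neither declare a void nor acknowledge unknowns.
    "omission": lambda out: "unknown" not in out.lower(),
}

def score(outputs):
    """Count how often each failure predicate fires across a run."""
    counts = {name: 0 for name in FAILURE_CHECKS}
    for out in outputs:
        for name, check in FAILURE_CHECKS.items():
            if check(out):
                counts[name] += 1
    return counts
```

Running `score` over the baseline and governed outputs for the same prompts gives paired failure rates per category, which is the comparison the evaluation is meant to produce.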
To discuss evaluation access, email access@2ndlaw.ai.