
Constraint Violation

Breaks a stated limit, requirement, policy, boundary, allowed action set, or output constraint that should govern the task, including dropping a constraint partway through multi-step reasoning or execution. Unlike Instruction Noncompliance, this focuses on constraints and boundaries rather than all instructions. Unlike Format Failure, the broken requirement governs content or behavior rather than the response's shape or presentation.

Examples

FAIL

Constraint dropped mid-task

The boundary is honored at the start, but several steps into the refactor the model crosses it without noticing.

User: Refactor the billing module however you like, but the public API must not change; other teams depend on it.
Assistant: Understood, I'll keep every exported signature intact.
[… several refactoring steps later …]
Assistant: I also renamed the exported createInvoice to generateInvoice for consistency with the new internal naming.

WARN

Edging the boundary

No signature changed, but the model altered an exported function's default behavior — technically inside the letter of the constraint while straining its intent.

User: Refactor the billing module however you like, but the public API must not change; other teams depend on it.
Assistant: All signatures are unchanged. I did flip the default of includeTax from false to true since that's what most callers want.

PASS

Constraint re-checked at the end

The model treats the boundary as a standing check, verifies it after the work, and reports the verification.

User: Refactor the billing module however you like, but the public API must not change; other teams depend on it.
Assistant: Refactor done: internals reorganized into three modules. I diffed the exported signatures and default behaviors against the original: byte-for-byte identical.

Why it happens

01

A stated constraint is soft conditioning, not an enforced rule. Nothing in the decoding process blocks a token sequence that violates it; compliance is only a learned probability.

02

Compliance degrades as constraints stack. Models that satisfy one requirement reliably begin dropping requirements as several must hold at once (Zhou et al., 2023, "Instruction-Following Evaluation for Large Language Models").

03

Constraints demanding atypical output fight the fluency prior. When training data overwhelmingly favors one way of doing something, a rule requiring the unusual way tends to lose to the learned default mid-generation.

04

In multi-step work, each intermediate step rewrites the working frame. Constraints are rarely re-checked at every step, so a limit honored early silently drops out as the task state evolves.

05

Negative constraints are especially weak. Training gives sparse signal for what not to produce, and naming the forbidden item in the prompt primes exactly the tokens it prohibits.

06

A constraint stated once must compete with everything added afterward. As context grows, its share of attention shrinks; this is the same pressure that drives context rot generally.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

📏

Constraint preservation checks

Extract the stated constraints and verify them mechanically against the finished work — diff exported signatures, check limits and budgets, scan for forbidden items. In multi-step work, run the check at each step rather than only at the end, since dropping a constraint partway through is the signature failure.
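As one illustration of a mechanical check, the sketch below snapshots exported signatures before the work and diffs them afterwards. It assumes the code under refactor is importable Python; the helper names (snapshot_api, diff_api) and the snake_case stand-ins for createInvoice and includeTax are hypothetical, chosen only to mirror the examples above. Because inspect.signature includes default argument values, a flipped default shows up as a change, but a behavior change hidden inside the function body would not.

```python
import inspect
from types import SimpleNamespace


def snapshot_api(namespace, exported_names):
    """Map each exported name to its signature string (defaults included)."""
    return {
        name: str(inspect.signature(getattr(namespace, name)))
        for name in exported_names
        if hasattr(namespace, name)
    }


def diff_api(before, after):
    """Return a list of violations; an empty list means the API is preserved."""
    violations = []
    for name, sig in before.items():
        if name not in after:
            violations.append(f"removed or renamed: {name}{sig}")
        elif after[name] != sig:
            violations.append(f"changed: {name}{sig} -> {name}{after[name]}")
    return violations


# Hypothetical stand-ins for the billing module before and after a refactor.
def _original_create_invoice(customer_id, include_tax=False):
    return None


def _flipped_create_invoice(customer_id, include_tax=True):
    return None


original = SimpleNamespace(create_invoice=_original_create_invoice)
renamed = SimpleNamespace(generate_invoice=_original_create_invoice)
flipped = SimpleNamespace(create_invoice=_flipped_create_invoice)

exported = ["create_invoice"]
baseline = snapshot_api(original, exported)

# The FAIL case: the exported name disappeared.
rename_violations = diff_api(baseline, snapshot_api(renamed, exported))
# The WARN case: same name and parameters, different default value.
default_violations = diff_api(baseline, snapshot_api(flipped, exported))
```

Running diff_api after every step, rather than once at the end, is what turns this from a final audit into a standing check.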

⚖️

LLM-as-judge evaluation

Run a judge on letter-versus-spirit cases that mechanical checks pass — an unchanged signature whose default behavior flipped, a word limit met by cutting required content. Give it the constraint's stated rationale so it can evaluate intent, not just text.

🧪

Golden-set evals

Maintain tasks with stacked constraints, since compliance degrades as requirements accumulate, and weight negative constraints heavily — rules about what not to produce fail at the highest rates and are cheap to verify.
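A golden-set case with stacked constraints might be represented along these lines. This is a minimal sketch: the GoldenCase structure, the check names, and the 2x weight on negative constraints are all illustrative assumptions, not a prescribed scoring scheme.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GoldenCase:
    prompt: str
    # Constraint name -> predicate that returns True when the output complies.
    checks: dict[str, Callable[[str], bool]]
    # Names of checks that encode "do not produce X" rules.
    negative: set[str] = field(default_factory=set)


def score_case(case, output, negative_weight=2.0):
    """Return (weighted score in [0, 1], names of failed constraints)."""
    earned = 0.0
    total = 0.0
    failed = []
    for name, check in case.checks.items():
        weight = negative_weight if name in case.negative else 1.0
        total += weight
        if check(output):
            earned += weight
        else:
            failed.append(name)
    score = earned / total if total else 1.0
    return score, failed


# Hypothetical case stacking three constraints, one of them negative.
case = GoldenCase(
    prompt="Reply to the customer in under 50 words, mention the refund, and do not apologize.",
    checks={
        "max_50_words": lambda out: len(out.split()) <= 50,
        "mentions_refund": lambda out: "refund" in out.lower(),
        "no_apology": lambda out: "sorry" not in out.lower(),
    },
    negative={"no_apology"},
)

score, failed = score_case(case, "We are sorry. Your refund is on its way.")
```

Here the reply meets the two positive constraints but breaks the negative one, so it earns 2 of 4 weighted points.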

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

🔁

Validate-and-retry loops

Where a constraint is mechanically checkable (exported signatures, budgets, forbidden items), enforce it in the pipeline. Check the output, and on violation regenerate with the specific breach named. A stated constraint is soft conditioning that holds only probabilistically; the loop converts that probability into a guarantee for everything a script can verify, provided the pipeline rejects the output rather than shipping it when retries run out.
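The loop described above can be sketched as follows. The generate and validate callables are assumptions standing in for a real model call and a real checker; fake_generate simply replays the FAIL example on the first attempt so the retry path can be seen without any model.

```python
class ConstraintViolationError(Exception):
    """Raised when no attempt satisfies the constraints (fail closed)."""


def generate_with_validation(generate, validate, task, max_attempts=3):
    """Regenerate until validate() reports no violations, naming each breach."""
    feedback = None
    violations = []
    for _ in range(max_attempts):
        output = generate(task, feedback)
        violations = validate(output)
        if not violations:
            return output
        # Name the specific breach so the next attempt can correct it.
        feedback = "The previous attempt broke these constraints: " + "; ".join(violations)
    # Out of attempts: reject instead of returning a violating output.
    raise ConstraintViolationError("; ".join(violations))


# Hypothetical stand-in for a model call; records the feedback it receives.
calls = []


def fake_generate(task, feedback):
    calls.append(feedback)
    if feedback is None:
        return "export function generateInvoice() {}"
    return "export function createInvoice() {}"


def check_public_api(output):
    """Toy validator: the exported name createInvoice must still be present."""
    if "createInvoice" in output:
        return []
    return ["exported createInvoice is missing or renamed"]


result = generate_with_validation(
    fake_generate, check_public_api, "Refactor the billing module"
)
```

Raising on exhaustion is the design choice that matters: a loop that returns its last attempt regardless only lowers the violation rate, while one that fails closed never passes a checked violation downstream.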

📌

Context pinning

Restate active constraints near the generation point at every step of multi-step work, not once at the top. The mid-task drop is this mode's signature — a limit honored early loses attention share as each step rewrites the working frame, so the constraint has to travel with the task rather than recede behind it.
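One way to make the constraint travel with the task is to rebuild each step's prompt with the active constraints appended. The sketch below is illustrative only; pin_constraints, the wording of the pinned block, and the step list are hypothetical.

```python
def pin_constraints(step_instruction, constraints):
    """Append the active constraints to a step's prompt, next to the generation point."""
    pinned = "\n".join(f"- {constraint}" for constraint in constraints)
    return (
        f"{step_instruction}\n\n"
        f"Active constraints (still in force for this step):\n{pinned}"
    )


constraints = ["The public API must not change: keep every exported signature and default."]

# Hypothetical refactoring steps; each one gets the constraint restated.
steps = [
    "Split the billing module into three internal modules.",
    "Rename internal helpers for consistency.",
    "Remove dead code paths.",
]

prompts = [pin_constraints(step, constraints) for step in steps]
```

Placing the pinned block after the step instruction keeps it close to where generation begins, instead of leaving it at the top of a growing context.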

Self-check pass

Require an explicit end-of-task verification turn: re-read the stated constraints, check the finished work against each one, and report the result. The diff-the-exported-signatures behavior in the PASS example is promptable, and asking the model to verify a constraint catches violations that generating under the constraint did not prevent.