Scope and Framing

This review covers differential privacy (DP) as applied to visible reasoning traces — chain-of-thought outputs, scratchpad contents, and agent trajectories — not to model weights or gradient-level protections during pretraining. The question it addresses: given that reasoning traces can leak sensitive information from their conditioning context, what is the state of the art for bounding that leakage formally or empirically?

As of March 2026, there is no single settled method. The literature has fractured into three adjacent problems: formal DP on emitted text conditioned on private data, DP for training and alignment on private trace logs, and direct reasoning-trace leakage mitigation that is mostly empirical rather than formally private. The direct trace papers establish that the problem is real: reasoning traces frequently contain sensitive user data, additional reasoning budget amplifies leakage, and answer-only evaluation misses leakage that persists in chain-of-thought trajectories (Staab et al., 2026).

Formal DP on Released Traces

Private Decoding

The closest frontier for formal DP on the released trace itself is private decoding — privatizing next-token prediction using a public-model baseline.

PMixED and AdaPMixED privatize token generation by mixing private and public model distributions. AdaPMixED reports a 16x reduction in privacy loss over PMixED while still supporting 100K predictions with reasonable data-dependent privacy loss.

InvisibleInk (Wu et al., 2024) is the strongest practical long-form result in the current literature. It provides DP with respect to sensitive reference texts, reports an 8x-or-greater compute reduction over prior long-form private-generation baselines, and brings high-quality private generation down to under 4–8x the cost of non-private generation.

DP-Fusion is particularly relevant to reasoning traces because it explicitly bounds how much marked sensitive context tokens can influence generated tokens — a direct mechanism for controlling leakage from private context into CoT steps.

Selective Privacy Spending for RAG and Agentic Reasoning

For retrieval-augmented and agentic settings, the clearest design pattern is selective privacy spending: paying privacy cost only on tokens that actually require sensitive information.

Privacy-Preserving RAG with DP (Zhao et al., 2024) frames the core difficulty as generating long accurate answers under a moderate privacy budget, and addresses it by restricting DP expenditure to retrieval-dependent tokens.

PEARL (ICLR 2026) extends this by allocating privacy budget adaptively across tokens and sentences using confidence-gap signals. The selective-allocation principle is the sharpest current algorithmic direction for long private traces, because it avoids the catastrophic budget depletion that results from applying uniform per-token DP to thousands of reasoning steps.

Training on Private Reasoning Traces

When the problem is training or aligning models on private reasoning-trace logs rather than releasing traces at inference time, the more mature toolkit is user-level DP plus private synthetic data.

User-level DP fine-tuning (Xu et al., 2024) scales to hundreds of millions of parameters and hundreds of thousands of users, and is most effective when each user contributes multiple diverse examples.

Privacy-Preserving Instructions replaces real instructions with privately generated synthetic ones, removing the need to expose actual user prompts during fine-tuning.

On alignment specifically, Improved Algorithms for Differentially Private Language Model Alignment finds that DPO with DP-ADAM/DP-ADAMW outperforms PPO and DP-SGD, with moderate privacy budgets providing the best utility trade-off. PrivMedChat extends this to end-to-end DP-RLHF in a medical domain.

For constructing synthetic trace corpora under DP, Aug-PE, DP-RFT, and EPSVec represent the strongest recent text-generation directions.

Direct Trace-Level Defenses

The papers most directly about reasoning traces are mostly not formal-DP. They address the problem empirically and expose its severity.

Leaky Thoughts demonstrates that reasoning traces frequently contain sensitive user data and that more reasoning steps leak more. Chain-of-Sanitized-Thoughts introduces PII-CoT-Bench and finds that baseline models do not reason privately by default; stronger models benefit more from prompt-based privacy controls, while weaker models often require supervised fine-tuning (Chen et al., 2026).

Controllable Reasoning Models Are Private Thinkers reports up to 51.9 percentage points of privacy improvement by fine-tuning models to follow restrictions in their traces, though with measurable task-utility trade-offs.

Safer Reasoning Traces finds CoT consistently raises PII leakage relative to direct answering and that no single gatekeeping method dominates across models or privacy budgets.

On unlearning, R2MU, R-TOFU, and STaR all reinforce the same lesson: answer-level forgetting is insufficient. Evaluation and intervention must target the trace itself.

The Faithfulness Obstacle

The deepest obstacle to DP on reasoning traces is faithfulness — the gap between the visible trace and the model’s actual computation.

Reasoning Models Don’t Always Say What They Think (Arcuschin et al., 2025) finds that when models use hints, they disclose that fact in under 20% of the relevant cases. Reasoning Traces Shape Outputs but Models Won’t Say So finds over 90% non-disclosure for extreme injected hints. A March 2026 CoT-controllability study finds models currently control CoT wording far less reliably than final-output wording.

The implication is stark: even perfect DP on the exposed trace protects the text released, not necessarily the model’s actual latent reasoning or all downstream leakage paths. A formal guarantee on the visible trace is necessary but not sufficient for full privacy of the underlying computation.

Composition and Long-Trace Hardness

From a DP standpoint, long CoTs are intrinsically difficult because privacy composes across token steps. A March 2026 theory paper (Zhang et al., 2026) formalizes token-level and message-level DP for LLM agents and ties privacy loss to message length and temperature. Separately, a 2025 evaluation found that stronger DP text generation can make outputs substantially shorter, less grammatical, and less diverse.

This helps explain why full private CoT release is not a solved problem: the privacy budget required for a useful thousand-token trace at meaningful epsilon is prohibitive under current composition theorems, and the utility degradation from tighter budgets directly undermines the purpose of extended reasoning.

Architectural Alternatives

Outside pure DP, split architectures offer a pragmatic alternative. PPMI (Li et al., 2026) performs generic chain-of-thought reasoning remotely while keeping private retrieval local or over encrypted indexes. This avoids the composition problem entirely for the private data by never exposing it to the reasoning model, at the cost of limiting what the model can reason about.

Summary

ProblemCurrent FrontierMaturity
Formal DP on released tracesDP-Fusion, InvisibleInk, AdaPMixED, selective-budget DP-RAGResearch prototypes; not yet deployed at scale
Training on private trace logsUser-level DP fine-tuning/alignment + DP synthetic trace generationMore mature; scales to hundreds of millions of parameters
Direct trace-specific defensesPrivacy-first reasoning, gatekeeping, trajectory-level unlearningMostly empirical, not formal-DP
Split architecturesPPMI-style local/remote separationPractical but limits reasoning scope

The missing breakthrough is an end-to-end method that provides useful, faithful, long reasoning traces with formal user-level DP under realistic agent workloads. This is an inference from the current literature rather than a claim made by any single paper.


References

  1. Staab et al. (2026). “Leaky Thoughts: Privacy Risks in Reasoning Traces.” arXiv. 2506.15674
  2. Wu et al. (2024). “InvisibleInk: Private Long-Form Text Generation.” arXiv. 2403.15638
  3. Zhao et al. (2024). “Privacy-Preserving RAG with Differential Privacy.” arXiv. 2412.04697
  4. Xu et al. (2024). “User-Level Differentially Private Fine-Tuning.” arXiv. 2407.07737
  5. Arcuschin et al. (2025). “Reasoning Models Don’t Always Say What They Think.” arXiv. 2505.05410
  6. Zhang et al. (2026). “Differential Privacy for LLM Agents.” arXiv. 2603.17902
  7. Anonymous (2026). “PEARL: Adaptive Privacy Budget Allocation for Long-Form Generation.” ICLR 2026.
  8. Chen et al. (2026). “Chain-of-Sanitized-Thoughts.” arXiv. 2601.05076
  9. Li et al. (2026). “PPMI: Privacy-Preserving Model Inference with Split Architectures.” arXiv. 2506.17336