1. Memory pruning protects both quality and cost
Long sessions accumulate irrelevant details, repeated acknowledgments, and outdated assumptions. Without pruning, that history is carried into every request, so each turn gets slower and more expensive, and the model is more likely to attend to the wrong context.
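A rough calculation shows why cost grows faster than the session itself. Assume, for illustration only, that each turn adds about $k$ tokens and the full transcript is resent on every call. Turn $t$ then sends about $tk$ input tokens, so $n$ turns send in total

$$\sum_{t=1}^{n} t\,k = \frac{k\,n(n+1)}{2}$$

With $k = 500$ and $n = 40$ that is $500 \cdot 40 \cdot 41 / 2 = 410{,}000$ input tokens, whereas capping the pruned context at a hypothetical $C = 4{,}000$ tokens bounds the total at $nC = 160{,}000$. The unpruned total grows quadratically in the number of turns; the capped total grows only linearly.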
2. Keep working memory and durable memory separate
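A session should distinguish immediate task context from long-term user facts and policy preferences. That separation makes pruning safer: working memory can be trimmed aggressively once anything that must persist has been promoted to durable memory.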
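The separation can be sketched as a small data structure. This is a minimal sketch, assuming a single in-process session; `SessionMemory` and its methods `note`, `promote`, `end_task`, and `context` are hypothetical names, not a library API. In a real system durable memory would more likely live in a database or vector store than in a dict.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMemory:
    """Hypothetical two-layer session memory (illustrative, not a library API)."""

    working: list[str] = field(default_factory=list)       # immediate task context
    durable: dict[str, str] = field(default_factory=dict)  # long-term facts and preferences

    def note(self, text: str) -> None:
        # Everything observed during the task lands in working memory first.
        self.working.append(text)

    def promote(self, key: str, value: str) -> None:
        # Facts that must outlive the task are copied into durable memory.
        self.durable[key] = value

    def end_task(self) -> None:
        # Working memory can be cleared wholesale; durable facts are untouched.
        self.working.clear()

    def context(self) -> list[str]:
        # Prompt context: durable facts first, then the current task's notes.
        return [f"{k}: {v}" for k, v in self.durable.items()] + self.working


mem = SessionMemory()
mem.note("Plan a three-day trip to Lisbon")
mem.promote("diet", "vegetarian")
mem.end_task()
print(mem.context())  # ['diet: vegetarian']
```

Because the dietary preference was promoted before the task ended, clearing working memory loses nothing the next task needs.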
3. Prune by value, not by age alone
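Some old details still matter because they constrain the task; a budget limit stated in the first turn may still govern the final answer. Some recent details do not matter at all, such as a bare acknowledgment from two turns ago. Good pruning rules weigh task relevance, recency, confidence, and whether the fact already exists in durable memory.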
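One way to turn those criteria into a rule is a single value score per item. The sketch below is an assumption, not a standard rubric: `MemoryItem`, `value_score`, `prune`, the multiplicative weighting, the 10-turn half-life, and the 0.2 threshold are all illustrative choices, and the relevance and confidence inputs are assumed to come from elsewhere (for example an embedding similarity or a model judgment). Recency is bounded so it can at most halve an item's score, which keeps age from deciding on its own.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """One candidate memory entry (hypothetical fields)."""

    text: str
    relevance: float          # 0..1, estimated relevance to the current task (assumed input)
    age_turns: int            # turns since the item was added
    confidence: float         # 0..1, how likely the item is still true
    in_durable: bool = False  # already stored in durable memory?


def value_score(item: MemoryItem, half_life: float = 10.0) -> float:
    """Hypothetical rubric: relevance and confidence dominate; age only discounts."""
    if item.in_durable:
        return 0.0  # redundant in working memory: durable memory already has it
    recency = 0.5 ** (item.age_turns / half_life)  # 1.0 now, 0.5 after half_life turns
    return item.relevance * item.confidence * (0.5 + 0.5 * recency)  # age can at most halve it


def prune(items: list[MemoryItem], threshold: float = 0.2) -> list[MemoryItem]:
    # Keep items whose combined value clears the threshold.
    return [it for it in items if value_score(it) >= threshold]


items = [
    MemoryItem("Budget must stay under $500", relevance=0.9, age_turns=30, confidence=1.0),
    MemoryItem("ok, thanks", relevance=0.05, age_turns=1, confidence=1.0),
    MemoryItem("User is vegetarian", relevance=0.8, age_turns=2, confidence=1.0, in_durable=True),
]
print([it.text for it in prune(items)])  # ['Budget must stay under $500']
```

With these defaults the 30-turn-old budget constraint scores about 0.51 and survives, the recent acknowledgment scores under 0.05 and is dropped, and the vegetarian fact is dropped from working memory only because durable memory already holds it.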
4. Summaries should preserve decisions and unresolved items
Compression works best when it keeps decisions and commitments, active goals, constraints, and open questions, because that is the context the next turn actually depends on.
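A summary that keeps these categories can be represented as a fixed schema rather than free text. The sketch below assumes the fields are already filled in, for example by a model call or extraction rules not shown here; `TurnSummary`, `resolve`, and `render` are hypothetical names. Keeping open questions in their own list makes it easy to move them into decisions once they are answered, so unresolved items are not silently lost during compression.

```python
from dataclasses import dataclass, field


@dataclass
class TurnSummary:
    """Hypothetical summary schema keeping only what the next turn depends on."""

    decisions: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def resolve(self, question: str, decision: str) -> None:
        # An answered question leaves the open list and is recorded as a decision.
        self.open_questions.remove(question)
        self.decisions.append(decision)

    def render(self) -> str:
        # Emit labeled sections for the next prompt; empty sections are skipped.
        parts = []
        for label, entries in (
            ("Decisions", self.decisions),
            ("Active goals", self.goals),
            ("Constraints", self.constraints),
            ("Open questions", self.open_questions),
        ):
            if entries:
                parts.append(label + ":\n" + "\n".join(f"- {e}" for e in entries))
        return "\n".join(parts)


summary = TurnSummary(
    decisions=["Use PostgreSQL"],
    constraints=["Monthly cost under $200"],
    open_questions=["Which region?"],
)
summary.resolve("Which region?", "Deploy in eu-west-1")
print(summary.render())
```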
5. Review pruning failures explicitly
If the system forgets critical context or keeps too much noise, those failures should feed back into the pruning rubric and memory-layer design.
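The feedback loop can start as a simple tally of the two failure kinds named above, used to nudge a pruning threshold. This is a minimal sketch under stated assumptions: `PruningReview`, the failure labels, and the fixed 0.05 step are hypothetical, and the sketch treats forgetting critical context as the more serious failure, so it wins ties. A production system would more likely adjust individual rubric weights or memory-layer rules after a person reviews the logged cases.

```python
from collections import Counter
from dataclasses import dataclass, field

FAILURE_KINDS = ("forgot_critical", "kept_noise")  # hypothetical labels


@dataclass
class PruningReview:
    """Hypothetical feedback loop: tally failures, then nudge the keep threshold."""

    threshold: float = 0.2  # items scoring below this are pruned
    step: float = 0.05
    failures: Counter = field(default_factory=Counter)

    def record(self, kind: str) -> None:
        # Log one reviewed failure: pruned too much, or kept too much.
        if kind not in FAILURE_KINDS:
            raise ValueError(f"unknown failure kind: {kind}")
        self.failures[kind] += 1

    def adjust(self) -> float:
        # Forgetting is treated as the worse failure here, so it wins ties.
        forgot = self.failures["forgot_critical"]
        noise = self.failures["kept_noise"]
        if forgot and forgot >= noise:
            self.threshold = max(0.0, self.threshold - self.step)  # keep more
        elif noise > forgot:
            self.threshold = min(1.0, self.threshold + self.step)  # prune more
        self.failures.clear()  # start a fresh review window
        return self.threshold


review = PruningReview()
review.record("kept_noise")
review.record("kept_noise")
review.record("forgot_critical")
print(round(review.adjust(), 2))  # 0.25
```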
Practical checklist
- Separate working memory from durable memory.
- Prune by task value and relevance, not by age alone.
- Preserve decisions, constraints, and open questions in summaries.
- Review pruning failures and feed them back into the rubric.