Notes
What Are Agents?
- Anthropic definition: Agents are models using tools in a loop.
- Process (see the loop sketch below):
  - Given a task
  - Work continuously, using tools as needed
  - Update decisions based on tool-call feedback
  - Continue independently until the task is complete
- Key Principle: Keep the system prompt and agent’s instructions as simple as possible.
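A minimal sketch of this loop using the Anthropic Python SDK. The model id, `run_tool` dispatcher, and tool list are placeholders/assumptions, not a reference implementation:

```python
import anthropic

client = anthropic.Anthropic()

def run_agent(task: str, tools: list, system: str) -> str:
    """Tools in a loop: call the model, execute any requested tools,
    feed results back, and repeat until the model stops asking."""
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # swap in your model id
            max_tokens=4096,
            system=system,  # keep this as simple as possible
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # No more tool calls: the model considers the task complete.
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute every requested tool and hand the results back.
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}  # run_tool: your dispatcher (hypothetical)
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```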
When to Use Agents
- Not always needed: many tasks are better suited to workflows or direct prompting.
- Best for complex, valuable, not-easily-defined tasks:
  - If a human could write down the steps in advance, you likely don’t need an agent (a workflow will do).
  - Use an agent when it’s unclear how to complete the task, or when multiple tools are needed.
  - High-value: e.g., revenue-generating or highly leveraged work.
- Prefer giving agents tasks where error recovery is simple or inexpensive.
- Assess before building:
  - Is the task complex?
  - Is it valuable?
  - Are all parts doable (are all tools/data available)?
  - What’s the cost of errors?
Examples Where Agents Are Useful:
- Coding (from design doc to PR)
- Data analysis (from unknown data structure to insights/visualization)
- When error correction is easy (search, UI interaction)
Principles for Prompting Agents
1. Think Like Your Agents
- Model their environment (tools + tool feedback)
- Simulate the process; clarify the task so both humans and AI could follow it
2. Conceptual Engineering
- Prompting = conceptual engineering (not just writing text)
- Define concepts and behaviors clearly (e.g., define “irreversibility” so the agent avoids actions that can’t be undone)
- Be specific about edge cases and expectations
3. Give Clear Heuristics
- Don’t assume models know when to stop searching; explicitly tell the agent when it’s “done”
- Set budgets (see the prompt fragment below), e.g.:
  - “Simple query: ≤5 tool calls”
  - “Complex: 10–15 tool calls”
- Think as if instructing a new intern who needs explicit rules and principles
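One way to phrase these budgets as a system-prompt fragment (the exact wording is an assumption, not a canonical prompt):

```python
# Illustrative system-prompt fragment encoding the budgets above.
SEARCH_HEURISTICS = """\
Before searching, classify the query:
- Simple fact lookup: use at most 5 tool calls.
- Complex, multi-source research: use 10-15 tool calls.
Stop as soon as you can answer confidently; report what you have
rather than searching indefinitely for a perfect source.
"""
```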
4. Tool Selection & Instructions
- Clarify which tools to use for which use cases
- Avoid similar-sounding, overlapping tools; make each tool distinct
- Combine redundant tools; don’t overwhelm the agent with too many similar options (see the tool-definition sketch below)
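A sketch of two deliberately distinct tool definitions, in the JSON Schema format the Anthropic API expects; the tools themselves are hypothetical, loosely based on the invoice demo later in these notes:

```python
# Each description states exactly when the tool applies, so the agent
# never has to guess between overlapping options.
tools = [
    {
        "name": "search_customers",
        "description": "Look up customer records by name or email. "
                       "Use ONLY for customer data, never for invoices.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "create_invoice",
        "description": "Create a draft invoice for an existing customer. "
                       "Requires a customer_id from search_customers.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "amount_usd": {"type": "number"},
            },
            "required": ["customer_id", "amount_usd"],
        },
    },
]
```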
5. Guide the Thinking Process
- Prompt the agent to plan (“How hard is this query? How many tool calls? What sources should I use?”)
- Use “interleaved thinking”: reflect between tool calls (e.g., via a “think” tool; sketch below)
- Remind the agent to verify web results and add disclaimers if unsure.
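One way to get that reflection is a no-op “think” tool, along the lines of Anthropic’s published think-tool pattern (the description below is a sketch):

```python
# A "think" tool gives the agent a dedicated place to reflect between
# tool calls; executing it just records the thought and changes nothing.
think_tool = {
    "name": "think",
    "description": "Use this to reason about previous tool results and plan "
                   "the next step. It fetches nothing and changes no state.",
    "input_schema": {
        "type": "object",
        "properties": {"thought": {"type": "string"}},
        "required": ["thought"],
    },
}
```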
6. Prepare for Unintended Side Effects
- Agents are unpredictable: they run loops, iterate, and might not stop as expected.
- Add criteria for when to stop or fall back (“If you don’t find the perfect source, stop after X attempts”); see the example rule below.
- Roll back prompt changes if issues emerge.
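Such a rule might look like this in the system prompt (phrasing is an assumption):

```python
# Example stop/fallback rule to append to the system prompt.
STOP_RULE = (
    "If you cannot find an authoritative source after 3 attempts, stop "
    "searching, answer with what you have, and state that it is unverified."
)
```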
7. Manage Context Window
- Be aware of token/context limits (e.g., Claude 4 models: 200k tokens)
- Techniques:
  - Automatically summarize/compress context as it grows (compaction as a tool call; see the sketch below)
  - Write important data/memory to external files (file system as a tool)
  - Use sub-agents to split up work and compress findings for the main agent (e.g., Open Deep Research by LangGraph)
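A rough compaction sketch, reusing the `client` from the loop above. The threshold, prompt wording, and serialization shortcut are all assumptions; real code must split only on clean turn boundaries so tool_use/tool_result pairs stay together:

```python
import json

def compact(messages: list, keep_last: int = 6) -> list:
    """Fold older turns into one summary message so the conversation
    stays under the context limit."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    transcript = json.dumps(older, default=str)  # crude: old turns as text
    summary = client.messages.create(
        model="claude-sonnet-4-20250514",  # swap in your model id
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": "Summarize this agent transcript: key findings, "
                              "decisions made, and remaining work.\n\n" + transcript}],
    )
    return [{"role": "user",
             "content": "Summary of earlier work:\n" + summary.content[0].text}] + recent
```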
8. Let Claude Be Claude
- Start with a simple prompt and tools, see where it fails, and iterate improvements.
Example & Demo
- Agents can use multiple tools in sequence or in parallel:
  - Search database → Reflect → Create invoice → Send email
- In the demo, the prompt instructs the agent to “search the web agentically” and to plan its steps and stop as needed (illustrative prompt below).
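An illustrative task prompt in that spirit (the scenario and wording are assumptions):

```python
DEMO_TASK = """\
Find the customer "Acme Corp" in the database and check for unbilled work.
If there is any, create an invoice and email it to their billing contact.
Plan your steps first, search agentically, and stop once the email is sent
or you determine no invoice is needed.
"""
```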
Evaluation Best Practices
- Evals are essential.
- Don’t overcomplicate—small, consistent evals suffice.
- Use realistic tasks; don’t use artificial, unrelated examples.
- LLM as judge: use a model with a rubric (sketch below) to check:
  - Answer accuracy
  - Tool-use accuracy
  - Did the agent reach the expected final state?
- Nothing replaces manual, human review for edge cases.
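A minimal LLM-as-judge sketch, reusing the `client` from above; the rubric wording and JSON contract are assumptions, not a standard:

```python
RUBRIC = """\
You grade agent transcripts. Score each criterion 0-5 and reply as JSON:
- answer_accuracy: is the final answer correct?
- tool_use: were the right tools called with sensible inputs?
- final_state: did the agent reach the expected end state ({expected})?
"""

def judge(transcript: str, expected_state: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # swap in your model id
        max_tokens=512,
        system=RUBRIC.format(expected=expected_state),
        messages=[{"role": "user", "content": transcript}],
    )
    return response.content[0].text  # e.g. '{"answer_accuracy": 5, ...}'
```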
Prompt Engineering for Agents vs. Classic Models
- Start simple: Begin with a basic prompt, only add details as you encounter edge cases.
- Iterate: build a collection of passing/failing test cases and refine your prompt to handle failures (tiny harness below).
- Few-shot / chain-of-thought:
  - Over-prescriptive examples or chains limit advanced models.
  - State-of-the-art models have chain-of-thought built in.
  - Prefer instructing how to think/plan, not showing exactly what to say/do.
  - Provide examples only when needed, and keep them non-prescriptive.
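A tiny pass/fail harness for that iteration loop, reusing `run_agent` and `tools` from the sketches above (the cases and expected substrings are hypothetical):

```python
CASES = [
    {"task": "Find Acme Corp's billing contact email", "expect": "@"},
    {"task": "Invoice Acme Corp for March's unbilled work", "expect": "invoice"},
]

def run_suite(system_prompt: str) -> None:
    # Keep the suite small and realistic; re-run it after every prompt change.
    for case in CASES:
        answer = run_agent(case["task"], tools, system_prompt)
        status = "PASS" if case["expect"].lower() in answer.lower() else "FAIL"
        print(f"{status}: {case['task']}")
```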
Closing
- Test, observe, and adjust. Simple, clear prompting, explicit heuristics, and focused tools lead to effective agents.
- Manual review and small, targeted evals are key to steady improvement.