Notes


What Are Agents?

  • Anthropic definition: Agents are models using tools in a loop (see the loop sketch below).
  • Process:
    • Given a task
    • Work continuously, use tools as needed
    • Update decisions based on tool-call feedback
    • Continue independently until the task is complete
  • Key Principle: Keep the system prompt and agent’s instructions as simple as possible.
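
A minimal sketch of that loop using the Anthropic Python SDK; the tool schema, run_tool dispatcher, system prompt, and model ID are illustrative assumptions, not something prescribed by the talk.

```python
# Minimal "model using tools in a loop" sketch (Anthropic Python SDK).
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "web_search",  # hypothetical tool
    "description": "Search the web and return the top results as text.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def run_tool(name: str, tool_input: dict) -> str:
    """Hypothetical dispatcher: execute the named tool and return its output."""
    raise NotImplementedError

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # assumed model ID
            max_tokens=2048,
            system="You are an agent. Keep working until the task is complete.",
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # The model decided it is done; return its final text.
            return "".join(b.text for b in response.content if b.type == "text")
        # Run every requested tool call and feed the results back into the loop.
        results = [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": run_tool(block.name, block.input),
        } for block in response.content if block.type == "tool_use"]
        messages.append({"role": "user", "content": results})
```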

When to Use Agents

  • Not always needed—many tasks are better suited to workflows or direct prompting.
  • Best for complex, valuable, not-easily-defined tasks:
    • If a human can write down the exact steps in advance, a workflow is likely enough; you don’t need an agent.
    • Use when it’s unclear how to complete a task, or multiple tools are needed.
    • High-value, maybe revenue-generating or highly leveraged.
    • Prefer giving agents tasks where error recovery is simple or inexpensive.
  • Assess before building:
    • Is the task complex?
    • Is it valuable?
    • Are all parts doable (are all tools/data available)?
    • What’s the cost of errors?

Examples Where Agents Are Useful:

  • Coding (from design doc to PR)
  • Data analysis (from unknown data structure to insights/visualization)
  • When error correction is easy (search, UI interaction)

Principles for Prompting Agents

1. Think Like Your Agents

  • Model their environment (tools + tool feedback)
  • Simulate the process, clarify tasks for both humans and AI

2. Conceptual Engineering

  • Prompting = conceptual engineering (not just writing text)
  • Define concepts and behaviors clearly (e.g., spell out what “irreversibility” means and which harmful actions to avoid; example below)
  • Be specific about edge cases and expectations
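
For example, a system-prompt fragment that defines “irreversibility” explicitly instead of assuming the model shares the concept; the exact wording is an assumption.

```python
# Illustrative system-prompt fragment: define the concept, then the expected behavior.
IRREVERSIBILITY_POLICY = """\
Some actions are irreversible: deleting records, sending emails, issuing refunds.
Treat an action as irreversible if you could not fully undo it yourself with the
tools you have. Before any irreversible action:
1. State what will change and why it cannot be undone.
2. Ask the user to confirm.
Read-only actions (search, fetch, list) never require confirmation.
"""
```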

3. Give Clear Heuristics

  • Don’t assume models know when to stop searching; explicitly tell the agent when it’s “done”
  • Set budgets, e.g.:
    • “Simple query: ≤5 tool calls”
    • “Complex: 10–15 tool calls”
  • Think as if instructing a new intern who needs explicit rules and principles (a sample prompt snippet follows below)
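
One way those budgets might be phrased in the system prompt; the numbers mirror the bullets above, the rest of the wording is an assumption.

```python
# Illustrative heuristics block: explicit budgets plus a stopping rule, so the
# agent knows when it is "done" searching.
SEARCH_HEURISTICS = """\
Before searching, decide how hard the query is:
- Simple fact lookup: use at most 5 tool calls.
- Complex or multi-part question: use 10-15 tool calls.
Stop as soon as you can answer confidently. If you hit the budget without a
perfect source, answer with the best evidence you have and say what is missing.
"""
```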

4. Tool Selection & Instructions

  • Clarify which tools to use for which use case
  • Avoid similar-sounding, overlapping tools; make each tool distinct.
  • Combine redundant tools; don’t overwhelm the agent with too many similar options (example tool definitions below).
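
Illustrative tool definitions in the Anthropic tool-schema format; the names and descriptions are assumptions, chosen so each tool is distinct and states when to use it.

```python
# Two non-overlapping tools: each description says when to use it, so the agent
# is not choosing between near-duplicates like search_docs / query_docs / find_docs.
tools = [
    {
        "name": "search_knowledge_base",
        "description": "Search internal company docs. Use for questions about "
                       "our own products, policies, or processes.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "search_web",
        "description": "Search the public web. Use only when the knowledge base "
                       "has no answer or external facts are needed.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```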

5. Guide the Thinking Process

  • Prompt the agent to plan (“How hard is this query? How many tool calls? What sources should I use?”)
  • Use “interleaved thinking”: reflect between tool calls (e.g., a dedicated “think” tool; sketch below).
  • Remind the agent to verify web results and add disclaimers if unsure.
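
One common way to get interleaved thinking is a dedicated “think” tool that has no side effects and only gives the agent a place to reflect between tool calls; the schema below is an illustrative assumption.

```python
# Illustrative "think" tool: no side effects; its only purpose is to let the agent
# plan, check results, and decide the next step between real tool calls.
think_tool = {
    "name": "think",
    "description": "Use this to reason about the results so far: what you have "
                   "learned, whether sources need verification, and what to do next.",
    "input_schema": {
        "type": "object",
        "properties": {"thought": {"type": "string"}},
        "required": ["thought"],
    },
}

def run_think_tool(tool_input: dict) -> str:
    # The tool result is just an acknowledgement; the value is in the pause itself.
    return "Thought recorded."
```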

6. Prepare for Unintended Side Effects

  • Agents are unpredictable—they run loops, iterate, and might not stop as expected.
  • Add criteria for when to stop or fall back (“If you don’t find the perfect source, stop after X attempts”); see the turn-cap sketch below.
  • Roll back prompts if issues emerge.
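
Besides prompt-level stopping criteria, a hard cap on turns is a cheap backstop. The sketch below assumes a hypothetical take_turn callable that wraps one model call plus tool execution (as in the loop sketch earlier) and returns True when the agent finishes.

```python
# Illustrative guardrail: stop the loop after a fixed turn budget even if the
# agent never declares itself done.
from typing import Callable

MAX_TURNS = 25  # assumed budget

def run_with_cap(take_turn: Callable[[], bool], max_turns: int = MAX_TURNS) -> bool:
    for _ in range(max_turns):
        if take_turn():      # hypothetical: one model call + tool execution
            return True      # agent stopped on its own
    return False             # caller falls back: report partial results, roll back, etc.
```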

7. Manage Context Window

  • Be aware of token/context limits (e.g., Claude 4 models: 200k tokens)
  • Techniques:
    • Automatically summarize/compress the context as it grows (compaction, exposed as a tool call; sketch below)
    • Write important data/memory to external files (file system exposed as tool calls)
    • Use sub-agents to split up work and compress data for the main agent (e.g., Open Deep Research by LangGraph)
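
A rough compaction sketch using the Anthropic Python SDK: when the transcript grows, summarize the older turns and keep only the summary plus a recent tail. The threshold, tail size, summary prompt, and model ID are assumptions.

```python
# Illustrative compaction: replace old messages with a model-written summary.
import anthropic

client = anthropic.Anthropic()
COMPACT_AFTER = 150_000   # rough input-token threshold (Claude 4 context is 200k)
KEEP_RECENT = 10          # keep the last N messages verbatim

def compact(messages: list) -> list:
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model ID
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Summarize this agent transcript, keeping decisions, "
                       "key facts, and open questions:\n\n" + str(old),
        }],
    ).content[0].text
    return [{"role": "user", "content": "Summary of earlier work:\n" + summary}] + recent

def maybe_compact(messages: list, input_tokens: int) -> list:
    # Call after each response, passing response.usage.input_tokens.
    return compact(messages) if input_tokens > COMPACT_AFTER else messages
```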

8. Let Claude Be Claude

  • Start with a simple prompt and tools, see where it fails, iterate improvements.

Example & Demo

  • Agents can use multiple tools in sequence/parallel:
    • Search database → Reflect → Create invoice → Send email
  • In the demo, the prompt instructs the agent to “search web agentically” and to plan its steps and stop as needed.

Evaluation Best Practices

  • Eval is essential.
  • Don’t overcomplicate—small, consistent evals suffice.
  • Use realistic tasks; don’t use artificial, unrelated examples.
  • LLM-as-judge: use a model with a rubric (sketch below) to check:
    • Answer accuracy
    • Tool-use accuracy
    • Did agent reach expected final state?
  • Nothing replaces manual, human review for edge cases.
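
A minimal LLM-as-judge sketch against the three checks above (answer accuracy, tool use, final state); the rubric wording and model ID are assumptions.

```python
# Illustrative LLM-as-judge: grade an agent transcript against a small rubric.
import anthropic

client = anthropic.Anthropic()

RUBRIC = """\
Grade the agent transcript below. Answer one line per criterion, PASS or FAIL:
1. Accuracy: the final answer is factually correct for the task.
2. Tool use: the agent called appropriate tools and used their results.
3. Final state: the expected end state was reached (e.g., invoice created, email sent).
"""

def judge(task: str, transcript: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model ID
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": RUBRIC + "\nTask: " + task + "\n\nTranscript:\n" + transcript,
        }],
    )
    return response.content[0].text  # e.g. "1. PASS\n2. PASS\n3. FAIL"
```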

Prompt Engineering for Agents vs. Classic Models

  • Start simple: Begin with a basic prompt, only add details as you encounter edge cases.
  • Iterate: Build a collection of passing/failing test cases and refine your prompt to handle failures.
  • Few-shot/Chain-of-Thought:
    • Over-prescriptive examples or chains limit advanced models.
    • State-of-the-art models have chain-of-thought built-in.
    • Prefer instructing how to think/plan, not showing exactly what to say/do.
    • Provide examples only when needed, and keep them loose rather than prescriptive (see the snippet below).
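
A small illustration of that last point: instruct the planning process rather than prescribing the exact output; the wording is an assumption.

```python
# Over-prescriptive (avoid): dictating the exact reply.
#   "When asked about pricing, reply exactly: 'Our plans start at $10/month...'"
# Preferred: describe how to think and plan, and leave the wording to the model.
PLANNING_INSTRUCTION = """\
Before answering, briefly plan: what is being asked, which tools (if any) you
need, and how you will verify the result. Then carry out the plan and answer
in your own words.
"""
```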

Closing

  • Test, observe, and adjust. Simple, clear prompting, explicit heuristics, and focused tools lead to effective agents.
  • Manual review and small, targeted evals are key to steady improvement.
