The “think” tool: Enabling Claude to stop and think

Published: March 20, 2025
Source: Anthropic Engineering Blog


Overview

  • Anthropic introduced a “think” tool to improve Claude’s performance on complex problems, especially tasks that require long chains of tool calls, sequential decisions, or strict policy compliance.
  • The tool creates space for structured thinking during complex tasks, leading to more consistent, reliable agentic tool use.

What is the “Think” Tool?

Purpose:
Adds an explicit “thinking” step in a workflow, letting Claude pause and reflect before moving forward.

Difference from Extended Thinking:

  • Extended thinking: Occurs before Claude generates a response (planning stage).
  • Think tool: Used during response generation, often after receiving tool results or when a decision is needed mid-process.
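
For concreteness, extended thinking is enabled with a request parameter, while the think tool is just another entry in the tools list. A minimal sketch using Anthropic's Python SDK (the model name, token budget, and prompt are illustrative):

import anthropic

client = anthropic.Anthropic()

# Extended thinking: the model spends a dedicated reasoning budget
# *before* it starts writing the response.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Which fare rules apply to this change?"}],
)

By contrast, the think tool (spec below) is passed via the tools parameter, and the model may choose to call it at any point mid-response.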

Best Use Cases:

  • Long chains of tool calls
  • Multi-step conversations with uncertainty or missing information
  • Policy-heavy environments with complex guidelines

Tool Spec Example (from τ-Bench):

{
  "name": "think",
  "description": "Use the tool to think about something. It will not obtain new information or change the database, but just append the thought to the log. Use it when complex reasoning or some cache memory is needed.",
  "input_schema": {
    "type": "object",
    "properties": {
      "thought": {
        "type": "string",
        "description": "A thought to think about."
      }
    },
    "required": ["thought"]
  }
}
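
A minimal sketch of wiring this spec into a request with Anthropic's Python SDK. Because the tool has no side effects, the handler only logs the thought and returns a bare acknowledgment (the model name and user message are illustrative):

import anthropic

client = anthropic.Anthropic()

# The τ-Bench spec shown above, as a Python dict.
think_tool = {
    "name": "think",
    "description": "Use the tool to think about something. It will not obtain "
                   "new information or change the database, but just append the "
                   "thought to the log. Use it when complex reasoning or some "
                   "cache memory is needed.",
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

messages = [{"role": "user", "content": "Rebook me on the next available flight."}]
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=2048,
    tools=[think_tool],
    messages=messages,
)

# If Claude paused to think, acknowledge the call and continue the turn.
if response.stop_reason == "tool_use":
    messages.append({"role": "assistant", "content": response.content})
    results = []
    for block in response.content:
        if block.type == "tool_use" and block.name == "think":
            print("thought:", block.input["thought"])  # log only; no side effects
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": "OK",  # the tool returns nothing meaningful
            })
    if results:
        messages.append({"role": "user", "content": results})
        # ...then call client.messages.create again with the updated messages.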

Performance Benchmarks

Tested on τ-Bench (customer service scenarios):

  • Metric: pass^k, the probability that all k independent trials of the same task succeed (see the estimator after this list).
  • Configurations:
    1. Baseline (no think tool, no extended thinking)
    2. Extended thinking only
    3. Think tool only
    4. Think tool plus optimized prompt (domain-specific guidance)
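
For reference, if each task is run n times and succeeds c times, a standard unbiased estimate of pass^k (our notation, averaged over tasks) is:

$$\text{pass}^k = \mathbb{E}_{\text{tasks}}\left[\frac{\binom{c}{k}}{\binom{n}{k}}\right]$$

Note this is stricter than the pass@k metric common in code-generation benchmarks, which only requires at least one of the k attempts to succeed.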

Results:

Configuration            Airline (pass^1)    Retail (pass^1)
Baseline                 0.370               0.783
Think tool               0.404               0.812
Extended thinking        0.412               0.770
Think tool + prompt      0.570               n/a

  • Best performance in the airline domain: Think tool plus prompt.
  • Best in retail domain: Think tool alone.

Key Insights:

  • Prompting with examples yields big gains in hard domains.
  • Consistency: The think tool maintained improvements even as k (number of retries) increased.
  • SWE-bench: the think tool also contributed to Claude 3.7 Sonnet’s state-of-the-art results on software engineering tasks.

When & How to Use

Best Scenarios:

  • Tool output analysis: Claude needs to carefully process the results of previous tool calls before acting on them.
  • Policy-heavy environments: detailed guidelines must be followed and compliance verified.
  • Sequential decision making: each step builds on previous ones, and mistakes are costly.

Implementation Best Practices:

  • Give clear, domain-specific examples of the reasoning you expect.
  • Place complex guidance in the system prompt rather than in the tool description (see the sketch below).
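
As an illustration of both practices, here is a paraphrased sketch of the kind of domain-specific guidance the post describes for the airline domain, expressed as a system prompt string. The wording and the cancel_reservation tool name are illustrative, not the exact benchmark prompt:

# Paraphrased guidance in the spirit of the post's airline-domain prompt.
SYSTEM_PROMPT = """\
## Using the think tool
Before taking any action or responding to the user after receiving
tool results, use the think tool as a scratchpad to:
- List the specific policy rules that apply to the current request
- Check that all required information has been collected
- Verify that the planned action complies with all policies
- Iterate over tool results for correctness

<think_tool_example>
User wants to cancel flight ABC123
- Need to verify: user ID, reservation ID, reason for cancellation
- Check cancellation rules: was the booking made within the last 24 hours?
- Plan: collect any missing information, then call cancel_reservation
</think_tool_example>
"""

This string would be passed as the system parameter of the request, keeping the complex guidance out of the tool description itself.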

When Not to Use:

  • For simple/parallel tool usage.
  • Straightforward tasks with few constraints or steps.

Getting Started

  1. Test in challenging agentic use cases.
  2. Add the tool definition and prompt guidance for your domain.
  3. Refine prompts and monitor usage for further improvements.

Minimal risk: the tool only adds space for internal reasoning; if Claude never calls it, behavior is unchanged.
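
In an agent loop, the think branch is therefore a deliberate no-op. A sketch of a tool dispatcher, where lookup_reservation and the db object are hypothetical stand-ins for your real domain tools:

def dispatch(name, args, db, thought_log):
    """Route a tool_use block from the model to its handler."""
    if name == "think":
        # No-op by design: record the thought for later inspection
        # (step 3: monitor usage) and acknowledge.
        thought_log.append(args["thought"])
        return "OK"
    if name == "lookup_reservation":  # hypothetical domain tool
        return db.lookup(args["reservation_id"])
    raise ValueError(f"unknown tool: {name}")

Reviewing the accumulated thought_log is a practical way to see where Claude’s reasoning drifts and to refine the prompt guidance accordingly.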


Conclusion

  • The “think” tool boosts Claude’s performance on complex, policy-driven, or multi-step tasks.
  • Works across Claude 3.5 Sonnet, Claude 3.7 Sonnet, and other models.
  • Combine with tailored prompts for best results in challenging domains.
