Understanding the Claude Computer Use Demo

Published: October 24, 2024

I asked Cursor Compose to run through the interesting bits of the Claude Computer Use Demo. The result is below.

The Claude Computer Use Demo shows how Claude, an AI assistant, can operate a virtual computer environment. It combines the Anthropic API with a small set of tools and a web UI so that users can instruct Claude to perform tasks on a containerized Ubuntu machine.

Key Components

The demo consists of several key components:

  1. A Docker container that sets up a virtual Ubuntu environment
  2. A Python-based agent loop that interacts with the Anthropic API
  3. Custom tools for computer interaction (bash, editing, etc.)
  4. A Streamlit web interface for user interaction

Let's dive into how these components work together.

The Agent Loop

The heart of the demo is the sampling_loop function in loop.py. This function manages the back-and-forth communication between the user, Claude, and the virtual computer environment. Here's a simplified version of its structure:

async def sampling_loop(
    *,
    model: str,
    provider: APIProvider,
    system_prompt_suffix: str,
    messages: list[BetaMessageParam],
    output_callback: Callable[[BetaContentBlock], None],
    tool_output_callback: Callable[[ToolResult, str], None],
    api_response_callback: Callable[
        [httpx.Request, httpx.Response | object | None, Exception | None], None
    ],
    api_key: str,
    only_n_most_recent_images: int | None = None,
    max_tokens: int = 4096,
):
    # ... (setup code)

    while True:
        # Call the Anthropic API
        response = client.beta.messages.create(...)

        # Process the response
        for content_block in response.content:
            if content_block.type == "tool_use":
                result = await tool_collection.run(
                    name=content_block.name,
                    tool_input=content_block.input,
                )
                # ... (process tool result)

        # Stop once Claude makes no further tool requests
        if not tool_result_content:
            return messages

        # Otherwise, send the tool results back as the next user turn
        messages.append({"content": tool_result_content, "role": "user"})

The loop sends the conversation to the Anthropic API, runs any tools Claude requests, and feeds the results back as a new user message. It exits when a response contains no tool use requests.
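The control flow described above can be tried without an API key. The following is a minimal sketch, not the demo's code: `fake_model`, `echo_tool`, and `toy_sampling_loop` are hypothetical stand-ins, and messages are plain dicts rather than the SDK's typed params.

```python
import asyncio
from typing import Any


def fake_model(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical scripted model: request one tool call, then answer in text."""
    has_tool_result = any(
        block.get("type") == "tool_result"
        for message in messages
        if isinstance(message["content"], list)
        for block in message["content"]
    )
    if has_tool_result:
        return [{"type": "text", "text": "Done."}]
    return [
        {"type": "tool_use", "id": "call_1", "name": "echo", "input": {"text": "hi"}}
    ]


async def echo_tool(tool_input: dict[str, Any]) -> str:
    """Hypothetical tool that returns its input text unchanged."""
    return tool_input["text"]


async def toy_sampling_loop(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    tools = {"echo": echo_tool}
    while True:
        # 1. Ask the (fake) model for the next assistant turn and record it.
        content = fake_model(messages)
        messages.append({"role": "assistant", "content": content})

        # 2. Run every tool the model asked for and collect the results.
        tool_results = []
        for block in content:
            if block["type"] == "tool_use":
                output = await tools[block["name"]](block["input"])
                tool_results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": block["id"],
                        "content": output,
                    }
                )

        # 3. No tool requests means the model is finished.
        if not tool_results:
            return messages

        # 4. Otherwise the results become the next user turn.
        messages.append({"role": "user", "content": tool_results})


history = asyncio.run(toy_sampling_loop([{"role": "user", "content": "Say hi"}]))
print([message["role"] for message in history])
```

Each tool result carries the `id` of the request it answers, which is how the model can match results to its own tool calls when several are made in one turn.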

Custom Tools

The demo implements custom tools that Claude can use to interact with the virtual environment. These tools live in the tools package (computer_use_demo/tools/, one module per tool) and include:

  1. ComputerTool: For taking screenshots and simulating mouse/keyboard input
  2. BashTool: For executing bash commands
  3. EditTool: For editing text files

Here's a simplified sketch of how ComputerTool dispatches on the requested action (the action and parameter names in the actual implementation differ):

class ComputerTool(Tool):
    name = "computer"

    async def run(self, tool_input: dict[str, Any]) -> ToolResult:
        action = tool_input["action"]
        if action == "screenshot":
            return await self._screenshot()
        elif action == "click":
            return await self._click(tool_input["x"], tool_input["y"])
        # ... (other actions)

Streamlit Interface

The user interface is built using Streamlit, which provides a simple way to create web apps in Python. The streamlit.py file manages the UI and orchestrates the interaction between the user and the agent loop.

One interesting aspect is how the UI handles the asynchronous nature of the agent loop:

with st.spinner("Running Agent..."):
    st.session_state.messages = await sampling_loop(
        system_prompt_suffix=st.session_state.custom_system_prompt,
        model=st.session_state.model,
        provider=st.session_state.provider,
        messages=st.session_state.messages,
        output_callback=partial(_render_message, Sender.BOT),
        tool_output_callback=partial(
            _tool_output_callback, tool_state=st.session_state.tools
        ),
        api_response_callback=partial(
            _api_response_callback,
            tab=http_logs,
            response_state=st.session_state.responses,
        ),
        api_key=st.session_state.api_key,
        only_n_most_recent_images=st.session_state.only_n_most_recent_images,
    )

This code awaits the agent loop and hands it callbacks, so the UI renders each message, tool output, and API exchange as it arrives.
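The call above uses `functools.partial` to pre-bind UI-specific arguments, so the loop only has to pass the value it produced. A minimal sketch of that pattern follows; `render_message` and `run_agent` are hypothetical stand-ins, and a plain list takes the place of the Streamlit UI.

```python
from functools import partial

# Stand-in for the UI: records (sender, message) pairs in render order.
rendered: list[tuple[str, str]] = []


def render_message(sender: str, message: str) -> None:
    """Hypothetical renderer that needs to know who is speaking."""
    rendered.append((sender, message))


def run_agent(output_callback) -> None:
    """Stand-in for the loop: it only knows to call output_callback(block)."""
    for block in ("Taking a screenshot", "Clicking the button"):
        output_callback(block)


# Bind the sender up front; the loop supplies only the message.
run_agent(partial(render_message, "assistant"))
print(rendered)
```

This keeps the loop unaware of the UI: swapping Streamlit for a CLI or a test harness only requires passing different callbacks.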

Interesting Details

  1. Multiple API Providers: The demo supports multiple API providers (Anthropic, Bedrock, and Vertex) through a simple configuration change.

  2. Image Management: To manage token usage, the demo implements a clever system to only send the N most recent images to Claude:

def _maybe_filter_to_n_most_recent_images(
    messages: list[BetaMessageParam],
    images_to_keep: int,
    min_removal_threshold: int = 10,
):
    # ... (implementation details)

  3. Security Considerations: The demo includes several security warnings and considerations, such as:

WARNING_TEXT = "⚠️ Security Alert: Never provide access to sensitive accounts or data, as malicious web content can hijack Claude's behavior"

  4. Custom System Prompt: Users can add a custom suffix to the system prompt to tailor Claude's behavior:

st.text_area(
    "Custom System Prompt Suffix",
    key="custom_system_prompt",
    help="Additional instructions to append to the system prompt. see computer_use_demo/loop.py for the base system prompt.",
    on_change=lambda: save_to_storage(
        "system_prompt", st.session_state.custom_system_prompt
    ),
)

  5. HTTP Exchange Logging: The demo includes a feature to log and display HTTP exchanges with the API, which is invaluable for debugging and understanding the communication flow.
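To make the image management item concrete, here is a minimal sketch of one way to keep only the most recent images. It is not the demo's implementation: `drop_old_images` and the helper functions are hypothetical, it assumes images sit inside `tool_result` blocks, and the `source` strings are placeholders for real image data. Removing in multiples of `min_removal_threshold` means the older part of the conversation changes less often, which should help with caching.

```python
from typing import Any


def drop_old_images(
    messages: list[dict[str, Any]],
    images_to_keep: int,
    min_removal_threshold: int = 1,
) -> list[dict[str, Any]]:
    """Remove the oldest images in place, keeping at least `images_to_keep`."""
    tool_results = [
        block
        for message in messages
        if isinstance(message["content"], list)
        for block in message["content"]
        if block.get("type") == "tool_result"
        and isinstance(block.get("content"), list)
    ]
    total = sum(
        1
        for result in tool_results
        for item in result["content"]
        if item.get("type") == "image"
    )
    to_remove = max(total - images_to_keep, 0)
    # Round down to a whole number of chunks.
    to_remove -= to_remove % min_removal_threshold

    # Walk oldest to newest, dropping images until the budget is used up.
    for result in tool_results:
        kept = []
        for item in result["content"]:
            if item.get("type") == "image" and to_remove > 0:
                to_remove -= 1
                continue
            kept.append(item)
        result["content"] = kept
    return messages


def screenshot_message(index: int) -> dict[str, Any]:
    """Build a user turn holding one tool result with text and one image."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": f"call_{index}",
                "content": [
                    {"type": "text", "text": f"step {index}"},
                    {"type": "image", "source": f"<screenshot {index}>"},
                ],
            }
        ],
    }


def remaining_images(messages: list[dict[str, Any]]) -> list[str]:
    return [
        item["source"]
        for message in messages
        for block in message["content"]
        for item in block["content"]
        if item["type"] == "image"
    ]


history = [screenshot_message(i) for i in range(5)]
drop_old_images(history, images_to_keep=2)
kept_sources = remaining_images(history)
print(kept_sources)
```

Only image items are removed; the text part of each tool result stays, so Claude still sees what each earlier step reported.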

Conclusion

The Claude Computer Use Demo is an impressive showcase of how AI can interact with computer environments. By combining the Anthropic API with custom tools and a user-friendly interface, it opens up exciting possibilities for AI-assisted computer interaction. The modular design and attention to security considerations make it an excellent starting point for developers looking to build similar applications.
