Published: October 24, 2024
I asked Cursor Compose to run through the interesting bits of Claude Computer Use Demo. The result is below.
The Claude Computer Use Demo is a fascinating project that showcases how Claude, an AI assistant, can interact with a virtual computer environment. This demo leverages the Anthropic API, along with some clever engineering, to create an interactive experience where users can instruct Claude to perform various tasks on a simulated computer.
The demo consists of several key components: the agent loop in loop.py, a set of custom tools, and a Streamlit-based user interface.
Let's dive into how these components work together.
The heart of the demo is the sampling_loop function in loop.py. This function manages the back-and-forth communication between the user, Claude, and the virtual computer environment. Here's a simplified version of its structure:
async def sampling_loop(
    *,
    model: str,
    provider: APIProvider,
    system_prompt_suffix: str,
    messages: list[BetaMessageParam],
    output_callback: Callable[[BetaContentBlock], None],
    tool_output_callback: Callable[[ToolResult, str], None],
    api_response_callback: Callable[
        [httpx.Request, httpx.Response | object | None, Exception | None], None
    ],
    api_key: str,
    only_n_most_recent_images: int | None = None,
    max_tokens: int = 4096,
):
    # ... (setup code)
    while True:
        # Call the Anthropic API
        response = client.beta.messages.create(...)
        # Process the response, collecting the result of each tool call
        tool_result_content = []
        for content_block in response.content:
            if content_block.type == "tool_use":
                result = await tool_collection.run(
                    name=content_block.name,
                    tool_input=content_block.input,
                )
                # ... (process tool result)
        # Stop once Claude requests no further tool use
        if not tool_result_content:
            return messages
        # Add tool results to messages
        messages.append({"content": tool_result_content, "role": "user"})
This loop continuously processes messages, sends them to the Anthropic API, and handles any tool use requests from Claude.
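To make the control flow concrete, here is a minimal, self-contained sketch of the same loop shape with the API call replaced by a scripted stand-in. FakeModel, run_tool, and the dict-based block format are hypothetical and exist only for illustration; the real demo calls the Anthropic API and works with typed content blocks.

```python
import asyncio
from typing import Any


class FakeModel:
    """Hypothetical stand-in for the API client: replays scripted responses."""

    def __init__(self, scripted: list[list[dict[str, Any]]]):
        self._scripted = iter(scripted)

    def create(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        return next(self._scripted)


async def run_tool(name: str, tool_input: dict[str, Any]) -> str:
    """Hypothetical tool runner; a real one would act on the environment."""
    return f"{name} ran with {tool_input}"


async def agent_loop(
    model: FakeModel, messages: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    while True:
        content = model.create(messages)
        messages.append({"role": "assistant", "content": content})

        tool_results = []
        for block in content:
            if block["type"] == "tool_use":
                output = await run_tool(block["name"], block["input"])
                tool_results.append(
                    {"type": "tool_result", "tool_use_id": block["id"], "content": output}
                )

        # No tool requests means the model has finished its turn.
        if not tool_results:
            return messages

        # Tool results go back to the model as a user message.
        messages.append({"role": "user", "content": tool_results})


scripted = [
    [{"type": "tool_use", "id": "t1", "name": "bash", "input": {"command": "ls"}}],
    [{"type": "text", "text": "Done."}],
]
history = asyncio.run(
    agent_loop(FakeModel(scripted), [{"role": "user", "content": "List files"}])
)
print([m["role"] for m in history])
```

The conversation alternates between assistant turns and user turns that carry tool results, and it ends on the first assistant turn that contains no tool request.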
The demo implements custom tools that Claude can use to interact with the virtual environment. These tools are defined in the tools.py file and include:
ComputerTool: For taking screenshots and simulating mouse/keyboard input
BashTool: For executing bash commands
EditTool: For editing text files
Here's a snippet of how the ComputerTool is implemented:
class ComputerTool(Tool):
    name = "computer"

    async def run(self, tool_input: dict[str, Any]) -> ToolResult:
        action = tool_input["action"]
        if action == "screenshot":
            return await self._screenshot()
        elif action == "click":
            return await self._click(tool_input["x"], tool_input["y"])
        # ... (other actions)
The user interface is built using Streamlit, which provides a simple way to create web apps in Python. The streamlit.py file manages the UI and orchestrates the interaction between the user and the agent loop.
One interesting aspect is how the UI handles the asynchronous nature of the agent loop:
with st.spinner("Running Agent..."):
    st.session_state.messages = await sampling_loop(
        system_prompt_suffix=st.session_state.custom_system_prompt,
        model=st.session_state.model,
        provider=st.session_state.provider,
        messages=st.session_state.messages,
        output_callback=partial(_render_message, Sender.BOT),
        tool_output_callback=partial(
            _tool_output_callback, tool_state=st.session_state.tools
        ),
        api_response_callback=partial(
            _api_response_callback,
            tab=http_logs,
            response_state=st.session_state.responses,
        ),
        api_key=st.session_state.api_key,
        only_n_most_recent_images=st.session_state.only_n_most_recent_images,
    )
This code runs the agent loop and updates the UI in real-time as Claude responds and performs actions.
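The partial(...) calls in that snippet pre-bind the UI-specific arguments, so the loop can invoke each callback with only the values it knows about. Here is a small standard-library sketch of the pattern; render and this Sender enum are simplified stand-ins, not the demo's actual definitions.

```python
from enum import Enum
from functools import partial


class Sender(Enum):
    USER = "user"
    BOT = "assistant"


rendered: list[tuple[str, str]] = []


def render(sender: Sender, message: str) -> None:
    """Stand-in for a UI render function: records what would be displayed."""
    rendered.append((sender.value, message))


# Bind the sender up front; the resulting callable takes only the message.
output_callback = partial(render, Sender.BOT)

output_callback("Taking a screenshot...")
output_callback("Clicked the button.")
print(rendered)
```

This keeps the loop independent of the UI: it only needs a callable that accepts a message, and the Streamlit layer decides how and where that message is drawn.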
Multiple API Providers: The demo supports multiple API providers (Anthropic, Bedrock, and Vertex) through a simple configuration change.
Image Management: To manage token usage, the demo implements a clever system to only send the N most recent images to Claude:
def _maybe_filter_to_n_most_recent_images(
    messages: list[BetaMessageParam],
    images_to_keep: int,
    min_removal_threshold: int = 10,
):
    # ... (implementation details)
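Since the body is elided above, here is one way such a filter could be written. This is a simplified sketch over plain dicts, not the demo's actual implementation: it assumes screenshots appear as image items inside tool_result blocks, and that removals are rounded down to a multiple of min_removal_threshold, presumably so the message prefix changes less often.

```python
from typing import Any


def _is_image(item: Any) -> bool:
    return isinstance(item, dict) and item.get("type") == "image"


def filter_to_n_most_recent_images(
    messages: list[dict[str, Any]],
    images_to_keep: int,
    min_removal_threshold: int = 10,
) -> None:
    """Drop the oldest screenshots from tool results, modifying messages in place."""
    tool_results = [
        block
        for message in messages
        if isinstance(message.get("content"), list)
        for block in message["content"]
        if isinstance(block, dict)
        and block.get("type") == "tool_result"
        and isinstance(block.get("content"), list)
    ]
    total_images = sum(
        _is_image(item) for result in tool_results for item in result["content"]
    )
    images_to_remove = max(total_images - images_to_keep, 0)
    # Round down to a multiple of the threshold (assumed behavior).
    images_to_remove -= images_to_remove % min_removal_threshold

    # Messages are ordered oldest first, so the earliest images go first.
    for result in tool_results:
        kept = []
        for item in result["content"]:
            if images_to_remove > 0 and _is_image(item):
                images_to_remove -= 1
                continue
            kept.append(item)
        result["content"] = kept


def screenshot_message(i: int) -> dict[str, Any]:
    """Hypothetical helper that builds a tool result holding one screenshot."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": f"t{i}",
                "content": [{"type": "image", "source": f"img{i}"}],
            }
        ],
    }


messages = [screenshot_message(i) for i in range(5)]
filter_to_n_most_recent_images(messages, images_to_keep=2, min_removal_threshold=1)
remaining = [
    item["source"]
    for message in messages
    for block in message["content"]
    for item in block["content"]
]
print(remaining)
```

With the default threshold of 10, the same five-screenshot history would be left untouched, because fewer than ten images are eligible for removal.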
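Security Considerations: The demo defines a warning for users about the risk of malicious web content influencing Claude:
WARNING_TEXT = "⚠️ Security Alert: Never provide access to sensitive accounts or data, as malicious web content can hijack Claude's behavior"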
Customizable System Prompt: Users can append their own instructions to the base system prompt through a text area:
st.text_area(
    "Custom System Prompt Suffix",
    key="custom_system_prompt",
    help="Additional instructions to append to the system prompt. see computer_use_demo/loop.py for the base system prompt.",
    on_change=lambda: save_to_storage(
        "system_prompt", st.session_state.custom_system_prompt
    ),
)
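As a rough illustration of what appending a suffix means in practice, the sketch below joins an optional user-supplied suffix onto a base prompt. BASE_SYSTEM_PROMPT and build_system_prompt are hypothetical; the demo's real base prompt lives in computer_use_demo/loop.py and is not reproduced here.

```python
# Placeholder text, not the demo's actual system prompt.
BASE_SYSTEM_PROMPT = "You are operating a virtual desktop."


def build_system_prompt(suffix: str = "") -> str:
    """Append an optional user-supplied suffix to the base prompt."""
    suffix = suffix.strip()
    return f"{BASE_SYSTEM_PROMPT} {suffix}" if suffix else BASE_SYSTEM_PROMPT


print(build_system_prompt("Prefer keyboard shortcuts."))
```

An empty or whitespace-only suffix leaves the base prompt unchanged, so the text area can safely stay blank.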
The Claude Computer Use Demo is an impressive showcase of how AI can interact with computer environments. By combining the Anthropic API with custom tools and a user-friendly interface, it opens up exciting possibilities for AI-assisted computer interaction. The modular design and attention to security considerations make it an excellent starting point for developers looking to build similar applications.