AUV means Application Use Via ....
- Apple Music Application Use Via (auv-apple-music...)
- macOS Media Control Use Via (auv-media-macos...)
- Balatro (yes, the game Balatro) Application Use Via (auv-game-balatro...)
- ... more, waiting for your implementation.
Think of it as programmable computer use, without agents.
Install Rust first. This workspace uses the Rust 2024 edition and currently requires the
toolchain declared in Cargo.toml.
Install directly from GitHub:
cargo install --git https://github.com/moeru-ai/auv auv-cli
auv --help
After installation, use the auv CLI directly:
auv --help
auv invoke --help
macOS automation needs OS permissions granted to the process that launches AUV, usually your terminal app.
Open System Settings -> Privacy & Security and enable:
| Permission | Needed for |
|---|---|
| Accessibility | AX tree reads, focused element control, keyboard/pointer automation. |
| Screen Recording | Screenshots, OCR, visual inspection, and evidence capture. |
| Automation | AppleScript/System Events activation flows used by app probes and some drivers. |
After changing permissions, restart the terminal process and rerun:
auv permissions check
auv app probe com.apple.TextEdit
For Cua, agent-browser, and
similar computer-use projects, it is common to execute screenshot, read image, click, type,
wait, and follow-up verification steps in sequence, then ask LLMs or agents to judge the next move.
flowchart LR
A[Agent] --> B[screenshot]
B --> C[read image]
C --> D[decide next step]
D --> E[click]
E --> F[wait]
F --> G[type]
G --> H[verify]
H --> D
Many of those repeated sequences can be squashed into reusable GUI operations. Opening an app, waiting for readiness, filling a form, and checking the result should be callable as one command instead of spending tokens on the same step-by-step loop every time.
Modern agents often use skills or project instructions to orchestrate tool calls, CLIs, and scripts. But built-in computer-use surfaces, such as OpenAI Computer Use or Claude Computer Use, are still primarily interactive model-tool loops, not scriptable GUI automation libraries.
Rust scripts in place of a tool-call loop:

    pub fn open_and_fill_form(
        app: &mut AppSession,
        data: FormData,
    ) -> AuvResult<OperationResult> {
        app.open()?;
        app.wait_for_ready()?;
        app.fill(data)?;
        app.verify_submitted()
    }

    pub fn scan_visible_rows(
        region: &mut WindowRegion,
    ) -> AuvResult<ScrollScanArtifact> {
        region.scan_rows_until_stop()
    }

    pub fn verify_and_retry<F>(
        mut operation: F,
    ) -> AuvResult<OperationResult>
    where
        F: FnMut() -> AuvResult<OperationResult>,
    {
        retry_until_verified(&mut operation)
    }

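The verify_and_retry snippet calls a retry_until_verified helper that this README does not show. The following is a minimal sketch of what such a helper could look like, not AUV's actual implementation. The `OperationResult` fields, the `AuvError` variants, and the extra `max_attempts` parameter are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for AUV's `OperationResult`; the real type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct OperationResult {
    /// Whether post-action verification passed.
    pub verified: bool,
    /// 1-based attempt number that produced this result.
    pub attempt: u32,
}

/// Hypothetical error type standing in for AUV's error.
#[derive(Debug, PartialEq)]
pub enum AuvError {
    /// The operation itself failed and should not be retried.
    Operation(String),
    /// Every attempt ran, but none passed verification.
    NotVerified { attempts: u32 },
}

pub type AuvResult<T> = Result<T, AuvError>;

/// Runs `operation` up to `max_attempts` times and returns the first
/// verified result. `max_attempts` is an assumed parameter.
pub fn retry_until_verified<F>(
    operation: &mut F,
    max_attempts: u32,
) -> AuvResult<OperationResult>
where
    F: FnMut() -> AuvResult<OperationResult>,
{
    for _ in 0..max_attempts {
        // A hard error ends the loop immediately via `?`.
        let result = operation()?;
        if result.verified {
            return Ok(result);
        }
    }
    Err(AuvError::NotVerified { attempts: max_attempts })
}
```

Bounding the loop keeps a flow that never verifies from spinning forever, and the distinct `NotVerified` error lets the caller tell "ran but did not verify" apart from "could not run".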
Similar to Playwright, AUV expects agents to write, test, and improve reusable GUI automations for E2E tests and rapid application actions.
AUV is not a computer-use agent. It does not ship an agent or harness. It offers tools, CLIs, drivers, and verifiable observable results so agents can build reusable GUI operations.
AUV is meant to work with coding agents and agent products such as:
- Codex
- Claude Code
- Pi Agent
- LobeHub
- Kimi CLI
- ... bring your own
That means:
- If your agent can call a CLI, AUV can be used as computer use.
- If your agent can write code, AUV can save tokens by moving repeated GUI work into Rust commands today, with JavaScript/TypeScript and Python bindings planned after the contracts settle. Once a GUI flow is finalized as a command, repeated execution can approach zero reasoning-token cost.
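As a rough illustration of the "call a CLI" path, the sketch below wraps the auv binary with Rust's standard `std::process::Command`. The helper names `auv_command` and `run_auv` are hypothetical. The only subcommand used in the usage note, `permissions check`, is taken from this README, and `run_auv` assumes `auv` is installed and on `PATH`.

```rust
use std::io;
use std::process::{Command, Output};

/// Builds a command line for the `auv` binary. Nothing runs until the
/// returned `Command` is spawned. (Hypothetical helper, not part of AUV.)
pub fn auv_command(args: &[&str]) -> Command {
    let mut command = Command::new("auv");
    command.args(args);
    command
}

/// Runs `auv` with the given arguments and captures its exit status,
/// stdout, and stderr. Fails with an `io::Error` if `auv` is not on PATH.
pub fn run_auv(args: &[&str]) -> io::Result<Output> {
    auv_command(args).output()
}
```

For example, `run_auv(&["permissions", "check"])` would run the permission check shown earlier and hand the captured output back to the calling agent or script.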
AUV grew out of what we learned building general gaming agents for Project AIRI. Since 2024, we have been building agents that let LLMs play the following games; you can find how we implemented the agents in the following repos:
The Project AIRI organization hosts more games we implemented, but these four require YOLO, OCR, screen understanding, and computer-use capabilities.
Now you have a framework to build for any application or game.
Since Vercel published agent-browser, we fell in love with it and have used it to help agents build many web projects. But the loop it requires, where the agent calls the agent-browser CLI for every command, is too slow and inefficient, and in the computer-use world many operations are repeated thousands of times. Playwright and Vitest let us write E2E tests for applications, so why not extend the idea of writing code to control applications to computer use?
What AUV can do, compared to other computer-use projects.
| Capability | AUV | Cua | OpenBridge / KWWKComputerUseCore | Playwright |
|---|---|---|---|---|
| Agent model | 🟡 BYOA | 🟡 BYOA + Built-in Agent | 🟡 BYOA + Built-in Agent | β |
| Scriptable | β Rust ⏳ JS/TS/Python | β JS/TS/Python/... | ||
| Multi-driver | β macOS/Windows ⏳ Linux/Android/iOS | β | β | β |
| CLI | β | β | β | |
| MCP | β | β | β | β |
| RL Trajectory | β runs + o11y (OTEL compatible) + artifacts | β | β | |
| Screenshot | β | β | β | β browser only |
| OCR | β BYOK / OS OCR | β | β | |
| Image Match | β | β | β | β user code only |
| AX (Accessibility Tree) | β macOS/Windows | β macOS | β macOS | |
| AX Actions | β | β | β | |
| Mouse / Click | β | β | β | |
| Virtual Mouse / Background | β macOS/Windows | β macOS focused | β macOS focused | |
| Virtual Mouse / Foreground HID | β | β | β | |
| Keyboard | β | β | β | |
| Scroll | β | β | β | |
| Scroll to List | β | β | β | β |
| Bundled for Apps | β | β | β | β |
| Feedback | β Agent understands whether it clicked or typed | |||
| SLM friendly | β Bundled for Apps | β | ||
| YOLO / Custom Models | β | β | β | β |
- Scroll scan is a major reason AUV exists. Most desktop automation stacks can scroll or read a screenshot, but they do not turn a native app's visual list into page records, row candidates, crop artifacts, OCR fragments, and inspectable stop reasons. AUV's current scroll-scan implementation is still contract work, so the old public scan window-region CLI was removed until the reusable API is clear.
- Feedback means the automation returns machine-readable evidence after an attempt: what input path was used, what changed, what artifacts were captured, whether verification passed, and why an operation should retry, stop, or fail.
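To make the scroll-scan idea concrete, here is a minimal sketch of a scan loop that produces page records and an inspectable stop reason. It is not AUV's implementation, which the text above describes as still being contract work. Every name here (`StopReason`, `PageRecord`, `ScanReport`, the `read_page` callback, `max_pages`) is hypothetical, and rows are plain strings standing in for OCR output. Crop artifacts and OCR fragments are left out.

```rust
use std::collections::HashSet;

/// Why a scan ended. Hypothetical names, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum StopReason {
    /// A page contained no rows that had not been seen before.
    NoNewRows,
    /// The page budget ran out before the list stopped yielding rows.
    MaxPagesReached,
}

/// Rows first seen on one scrolled page.
#[derive(Debug, Clone, PartialEq)]
pub struct PageRecord {
    pub index: usize,
    pub new_rows: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ScanReport {
    pub pages: Vec<PageRecord>,
    pub stop_reason: StopReason,
}

/// `read_page(i)` stands in for "scroll to page i, capture it, and read
/// its rows". Overlapping pages are de-duplicated by row text.
pub fn scan_rows_until_stop<F>(mut read_page: F, max_pages: usize) -> ScanReport
where
    F: FnMut(usize) -> Vec<String>,
{
    let mut seen = HashSet::new();
    let mut pages = Vec::new();
    for index in 0..max_pages {
        // `HashSet::insert` returns true only for rows not seen before.
        let new_rows: Vec<String> = read_page(index)
            .into_iter()
            .filter(|row| seen.insert(row.clone()))
            .collect();
        if new_rows.is_empty() {
            return ScanReport { pages, stop_reason: StopReason::NoNewRows };
        }
        pages.push(PageRecord { index, new_rows });
    }
    ScanReport { pages, stop_reason: StopReason::MaxPagesReached }
}
```

Returning the stop reason alongside the pages is one way to give the caller the kind of feedback described above: it can tell a list that ended from a scan that was cut off by its budget.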
cargo fmt --check
cargo check
cargo test
git diff --check
cargo run -- --help
cargo run -- invoke --help
Useful entrypoints:
auv app probe <bundle-id>
auv app analyze .auv/app-probes/<probe>/probe.json
auv invoke <command-id> --help
auv inspect <run-id>
Use docs/TERMS_AND_CONCEPTS.md for shared vocabulary. Durable design and
evidence notes live under docs/ai/references/.
Note
This project is part of the Project AIRI ecosystem.
Special thanks to all contributors for their contributions to auv ❤️