AUV

AUV means Application Use Via ....

Apple Music Application Use Via auv-apple-music...
macOS Media Control Use Via auv-media-macos...
Balatro (yes the game Balatro) Application Use Via auv-game-balatro...
... more, waiting for your implementation.

Think of it as programmable computer use, without agents.

Install

Install Rust first. This workspace uses Rust 2024 and currently requires the toolchain declared in Cargo.toml.

Install directly from GitHub:

cargo install --git https://github.com/moeru-ai/auv auv-cli
auv --help

After installation, use the auv CLI directly:

auv --help
auv invoke --help

Setup

macOS Permissions

macOS automation needs OS permissions granted to the process that launches AUV, usually your terminal app.

Open System Settings -> Privacy & Security and enable:

Permission	Needed for
Accessibility	AX tree reads, focused element control, keyboard/pointer automation.
Screen Recording	Screenshots, OCR, visual inspection, and evidence capture.
Automation	AppleScript/System Events activation flows used by app probes and some drivers.

After changing permissions, restart the terminal process and rerun:

auv permissions check
auv app probe com.apple.TextEdit

Understand AUV

For Cua, agent-browser, and similar computer-use projects, it is common to execute screenshot, read image, click, type, wait, and follow-up verification steps in sequence, then ask LLMs or agents to judge the next move.

flowchart LR
  A[Agent] --> B[screenshot]
  B --> C[read image]
  C --> D[decide next step]
  D --> E[click]
  E --> F[wait]
  F --> G[type]
  G --> H[verify]
  H --> D

Many of those repeated sequences can be squashed into reusable GUI operations. Opening an app, waiting for readiness, filling a form, and checking the result should be callable as one command instead of spending tokens on the same step-by-step loop every time.

Modern agents often use skills or project instructions to orchestrate tool calls, CLIs, and scripts. But built-in computer-use surfaces, such as OpenAI Computer Use or Claude Computer Use, are still primarily interactive model-tool loops, not scriptable GUI automation libraries.

Tool-call loop vs. Rust scripts (each tool-call trace below is followed by its Rust equivalent)

• Ran screenshot
  └ saved screen.png
• Ran read image screen.png
  └ form is visible
• Ran click "Email"
  └ clicked
• Ran type "user@example.com"
  └ typed
• Ran screenshot
  └ saved after.png
• Ran verify form state
  └ ready

pub fn open_and_fill_form(
  app: &mut AppSession,
  data: FormData,
) -> AuvResult<OperationResult> {
  app.open()?;
  app.wait_for_ready()?;
  app.fill(data)?;
  app.verify_submitted()
}

• Ran screenshot
  └ saved page-1.png
• Ran OCR visible rows
  └ 12 rows
• Ran scroll
  └ scrolled down
• Ran OCR visible rows
  └ 10 rows, 4 repeated
• Ran guess when to stop
  └ uncertain

pub fn scan_visible_rows(
  region: &mut WindowRegion,
) -> AuvResult<ScrollScanArtifact> {
  region.scan_rows_until_stop()
}

• Ran click target
  └ clicked
• Ran screenshot
  └ saved after-click.png
• Ran semantic check
  └ mismatch
• Ran retry manually
  └ repeated tool loop

pub fn verify_and_retry<F>(
  mut operation: F,
) -> AuvResult<OperationResult>
where
  F: FnMut() -> AuvResult<OperationResult>,
{
  retry_until_verified(&mut operation)
}
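The retry_until_verified helper called in the last snippet is not defined in this README. As a rough illustration only, it could look something like the sketch below. The AuvError, AuvResult, and OperationResult definitions are stand-ins invented for this example, and the max_attempts parameter is an assumption (the call above takes no such argument); AUV's real types and signature may differ.

```rust
/// Hypothetical error type; stands in for whatever AUV actually returns.
#[derive(Debug, PartialEq)]
pub enum AuvError {
    /// Every attempt ran, but verification never passed.
    NotVerified { attempts: u32 },
    /// The operation itself failed (for example, the target was not found).
    Failed(String),
}

pub type AuvResult<T> = Result<T, AuvError>;

/// Hypothetical result: records whether post-action verification passed.
#[derive(Debug, Clone, PartialEq)]
pub struct OperationResult {
    pub verified: bool,
    pub detail: String,
}

/// Runs `operation` up to `max_attempts` times and returns the first
/// verified result. A hard failure (`Err`) stops the loop immediately.
pub fn retry_until_verified<F>(
    operation: &mut F,
    max_attempts: u32,
) -> AuvResult<OperationResult>
where
    F: FnMut() -> AuvResult<OperationResult>,
{
    for _ in 0..max_attempts {
        let result = operation()?;
        if result.verified {
            return Ok(result);
        }
    }
    Err(AuvError::NotVerified { attempts: max_attempts })
}
```

The point of the sketch is that the retry policy lives in code: an agent calls one command and reads one result instead of re-running the click, screenshot, and check loop by hand.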

Similar to Playwright, AUV expects agents to write, test, and improve reusable GUI automations for E2E tests and rapid application actions.

AUV is not a computer-use agent. It does not ship an agent or harness. It offers tools, CLIs, drivers, and verifiable observable results so agents can build reusable GUI operations.

AUV is meant to work with coding agents and agent products such as:

Codex
Claude Code
Pi Agent
LobeHub
Kimi CLI
... bring your own

That means:

If your agent can call a CLI, AUV can be used as computer use.
If your agent can write code, AUV can save tokens by moving repeated GUI work into Rust commands today, with JavaScript/TypeScript and Python bindings planned after the contracts settle. Once a GUI flow is finalized as a command, repeated execution can approach zero reasoning-token cost.
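To make the "call a CLI" path concrete, here is a minimal sketch of driving the auv binary from Rust with std::process::Command. It uses only the auv app probe <bundle-id> entrypoint listed in this README. The function names are hypothetical, and treating a zero exit status as "probe succeeded" is an assumption about the CLI, not documented behavior.

```rust
use std::process::Command;

/// Builds (but does not run) `auv app probe <bundle-id>`.
/// Assumes the `auv` binary is on PATH after `cargo install`.
pub fn probe_command(bundle_id: &str) -> Command {
    let mut cmd = Command::new("auv");
    cmd.args(["app", "probe", bundle_id]);
    cmd
}

/// Runs the probe and reports whether the process exited successfully.
/// Assumption: a zero exit status means the probe worked.
pub fn probe_succeeded(bundle_id: &str) -> std::io::Result<bool> {
    let status = probe_command(bundle_id).status()?;
    Ok(status.success())
}
```

Calling probe_succeeded("com.apple.TextEdit") would run the same probe as the permissions check earlier in this README, and any agent harness that can spawn a process can do the equivalent.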

Why even build AUV?

AUV was born from the hands-on experience of building general gaming agents for Project AIRI. Since 2024, we have been building agents that let LLMs play the following games; you can find how we implemented the agents in the following repos:

We implemented more games, which you can find in the Project AIRI organization, but these four require YOLO, OCR, screen understanding, and computer-use capabilities.

Now you have a framework for building automations for any application or game.

When Vercel published agent-browser, we fell in love with it and used it to help agents build many web projects. But the loop it requires, in which the agent calls the agent-browser CLI for every command, is too slow and inefficient. In the computer-use world, many operations are repeated thousands of times. Playwright and Vitest already let us write E2E tests for applications as code, so why not extend the idea of writing code to control applications to computer use?

Capability Matrix

What AUV can do, compared to other computer-use projects.

Capability	AUV	Cua	OpenBridge / KWWKComputerUseCore	Playwright
Agent model	💡 BYOA	💡 BYOA + Built-in Agent	💡 BYOA + Built-in Agent	❌
Scriptable	✅ Rust ⏳ JS/TS/Python	⚠️ Tools only	⚠️ Swift only	✅ JS/TS/Python/...
Multi-driver	✅ macOS/Windows ⏳ Linux/Android/iOS	✅	❌	❌
CLI	✅	✅	❌	⚠️ via user scripts
MCP	✅	✅	❌	❌
RL Trajectory	✅ runs + o11y (OTEL compatible) + artifacts	⚠️ recordings	❌	✅
Screenshot	✅	✅	✅	✅ browser only
OCR	✅ BYOK / OS OCR	⚠️ BYOK	❌	❌
Image Match	✅	✅	❌	❌ user code only
AX (Accessibility Tree)	✅ macOS/Windows	✅ macOS	✅ macOS	⚠️ browser only
AX Actions	✅	✅	✅	⚠️ browser only
Mouse / Click	✅	✅	✅	⚠️ browser only
Virtual Mouse / Background	✅ macOS/Windows	✅ macOS focused	✅ macOS focused	⚠️ browser only
Virtual Mouse / Foreground HID	✅	✅	❌	⚠️ browser only
Keyboard	✅	✅	✅	⚠️ browser only
Scroll	✅	✅	✅	⚠️ browser only
Scroll to List	✅	❌	❌	✅
Bundled for Apps	✅	❌	❌	❌
Feedback	✅ Agent understands whether it clicked or typed	⚠️ tool outputs	⚠️ structured metadata	⚠️ assertions/traces
SLM friendly	✅ Bundled for Apps	⚠️ Agent orchestrated	⚠️ Agent orchestrated	✅
YOLO / Custom Models	✅	✅	❌	❌

Scroll scan is a major reason AUV exists. Most desktop automation stacks can scroll or read a screenshot, but they do not turn a native app's visual list into page records, row candidates, crop artifacts, OCR fragments, and inspectable stop reasons. AUV's current scroll-scan implementation is still contract work, so the old public scan window-region CLI was removed until the reusable API is clear.
Feedback means the automation returns machine-readable evidence after an attempt: what input path was used, what changed, what artifacts were captured, whether verification passed, and why an operation should retry, stop, or fail.
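As an illustration of what "inspectable stop reasons" could mean in practice, the sketch below merges OCR'd pages of rows, drops rows already seen on an earlier page, and records why the scan ended. Every name here (StopReason, ScanOutcome, scan_rows) is hypothetical, and the pages are passed in as plain vectors of strings rather than captured from a real window. It does not reflect AUV's actual scroll-scan contract, which this README describes as still being worked out.

```rust
use std::collections::HashSet;

/// Why a scan ended. Hypothetical variants for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum StopReason {
    /// A page contributed no unseen rows, so the end of the list was likely reached.
    NoNewRows,
    /// The page budget ran out while pages were still yielding new rows.
    PageLimit,
    /// The page source had no more pages to offer.
    SourceExhausted,
}

/// Machine-readable summary of a scan.
#[derive(Debug, PartialEq)]
pub struct ScanOutcome {
    pub rows: Vec<String>,
    pub pages_read: usize,
    pub stop_reason: StopReason,
}

/// Collects unique rows page by page, in first-seen order.
pub fn scan_rows<I>(pages: I, max_pages: usize) -> ScanOutcome
where
    I: IntoIterator<Item = Vec<String>>,
{
    let mut seen = HashSet::new();
    let mut rows = Vec::new();
    let mut pages_read = 0;
    let mut pages = pages.into_iter();
    loop {
        if pages_read == max_pages {
            return ScanOutcome { rows, pages_read, stop_reason: StopReason::PageLimit };
        }
        let Some(page) = pages.next() else {
            return ScanOutcome { rows, pages_read, stop_reason: StopReason::SourceExhausted };
        };
        pages_read += 1;
        let before = rows.len();
        for row in page {
            // Rows repeated from the previous page overlap are skipped.
            if seen.insert(row.clone()) {
                rows.push(row);
            }
        }
        if rows.len() == before {
            return ScanOutcome { rows, pages_read, stop_reason: StopReason::NoNewRows };
        }
    }
}
```

Deduplicating by row text is a simplification: a list with two genuinely identical rows would lose one of them, which is one reason a real implementation would likely also track position or crop evidence.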

Development

cargo fmt --check
cargo check
cargo test
git diff --check
cargo run -- --help
cargo run -- invoke --help

Useful entrypoints:

auv app probe <bundle-id>
auv app analyze .auv/app-probes/<probe>/probe.json
auv invoke <command-id> --help
auv inspect <run-id>

Use docs/TERMS_AND_CONCEPTS.md for shared vocabulary. Durable design and evidence notes live under docs/ai/references/.

Acknowledgements

Special Thanks

Special thanks to all contributors for their contributions to auv ❤️

License

Apache License 2.0
