ClawMobile Cloud
A real Android phone, live in your browser. Demonstrate a task once: the agent records it and induces a reusable skill that replays on a fresh device. No Termux, no ADB, no device of your own.
The agent is the ClawMobile runtime: OpenClaw plus the mobile-ui plugin, driving a hosted Redroid device over ADB.
Quickstart
Get your first Android skill running in a couple of minutes.
Prerequisites
- An invite code (request at clawmobile.cloud)
- A modern browser: Chrome 90+, Firefox 88+, or Safari 15+
1 · Sign in
Enter your email and invite code at clawmobile.cloud, then click the magic link in your inbox. Login is passwordless.
2 · Start a session
Click New session. A clean Android device boots in a few seconds and its screen streams straight into the page; you can tap it directly.
3 · Open an app
Open Google Keep
4 · Record a demonstration
Record me creating a note that says "hello world"
Perform the task on the phone. Every tap, screenshot, and screen transition is captured.
5 · Save it as a skill
Stop recording and save as a skill called keep-hello-note
6 · Reuse it
Start a fresh session and run it. The skill replays deterministically:
Run keep-hello-note
How it works
A real Android runtime lives in the cloud and an agent drives it on your behalf. The phone, the agent, and the skill store are all server-side, so you never install anything.
- Cloudflare edge: the Worker handles magic-link auth and JWTs, enforces rate limits, and fans out session progress over SSE. The web UI ships from Pages.
- Orchestrator: a Bun service that allocates a device, brings the ClawMobile runtime online, relays the screen, and syncs skills to R2 when the session ends.
- Android container: a fresh Redroid instance per session. The agent controls it over ADB; you watch a low-latency scrcpy stream and can tap directly.
A session, step by step
- You request a session; the Worker checks your rate limit and asks the orchestrator for a device.
- The orchestrator boots a clean Redroid container on a node, or bursts to a secondary provider if the pool is full, and waits for ADB. Progress streams live.
- The agent comes online: the screen appears, and you can type instructions or tap the phone yourself.
- You demonstrate a task; recording captures touches, screenshots, and app state.
- On stop, the induction pipeline generalizes the demo into a reusable, versioned skill.
- The session ends: skills sync to R2, the container is destroyed, and the workspace is wiped.
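The lifecycle above can be read as a small state machine. The following is a minimal sketch, not the orchestrator's actual code: only the "allocating" status appears in the documented API responses, and every other state name here is a hypothetical label chosen for illustration.

```python
from enum import Enum


class SessionState(Enum):
    # Only "allocating" is a documented status value; the other names
    # are hypothetical and exist only for this sketch.
    ALLOCATING = "allocating"
    BOOTING = "booting"
    READY = "ready"
    RECORDING = "recording"
    ENDED = "ended"


# Allowed transitions, following the step-by-step list above.
TRANSITIONS = {
    SessionState.ALLOCATING: {SessionState.BOOTING, SessionState.ENDED},
    SessionState.BOOTING: {SessionState.READY, SessionState.ENDED},
    SessionState.READY: {SessionState.RECORDING, SessionState.ENDED},
    SessionState.RECORDING: {SessionState.READY, SessionState.ENDED},
    SessionState.ENDED: set(),
}


def advance(current: SessionState, target: SessionState) -> SessionState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Every state can move to ENDED, which mirrors the guarantee that a session can always be torn down, whatever step it is in.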
The agent runtime
The agent is OpenClaw running in Lite mode
(CLAWMOBILE_LITE=1) with the mobile-ui plugin. By default
each turn runs the agent embedded against the session's device; a
gateway mode is available for a long-lived per-session agent. Either
way the same plugin tools drive the phone.
Embedded (default): a fresh agent process per turn. It is spawned, runs the turn, and exits. No daemon, no device pairing.
Gateway: one long-lived gateway per session over a WebSocket; each turn is routed to it. Closer to ClawMobile's on-device model, but needs a trusted device set up once.
Safe to run unattended
Every container is single-use and isolated. A watchdog sweeps the pool every couple of minutes and destroys any container whose session has gone silent, or any burst instance past its hard time cap, so a crash can never leave compute billing or your data on a device.
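The watchdog rule described above can be sketched as a pure function over the pool. This is an illustration under stated assumptions: the Container fields, the idle_timeout and burst_cap values, and the function name are all hypothetical, since the real limits are not published here.

```python
from dataclasses import dataclass


@dataclass
class Container:
    container_id: str
    last_activity: float    # epoch seconds of the last sign of life from the session
    started_at: float       # epoch seconds when the container booted
    is_burst: bool = False  # True for instances on the secondary provider


def sweep(containers, now, idle_timeout=300.0, burst_cap=3600.0):
    """Return the ids of containers this sketch's watchdog would destroy.

    idle_timeout and burst_cap are placeholder values, not the
    service's real limits.
    """
    doomed = []
    for c in containers:
        silent = now - c.last_activity > idle_timeout
        over_cap = c.is_burst and now - c.started_at > burst_cap
        if silent or over_cap:
            doomed.append(c.container_id)
    return doomed
```

Keeping the decision separate from the actual teardown makes the rule easy to test without a container runtime.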
Skills
A skill is a reusable, parameterized procedure induced from one of your demonstrations. Instead of reasoning over screenshots from scratch every time, the agent replays a known-good sequence: faster, cheaper, and far more reliable.
- Demonstration-driven: you show the task once; no scripting.
- Generalized: concrete values you typed become parameters.
- Deterministic replay: a fast path executes the steps directly, falling back to the agent only when the screen doesn't match.
- Versioned: every change is a new version, synced to R2 per user.
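The "deterministic replay with fallback" idea can be sketched as follows. This is not the plugin's implementation: the step dictionary shape, the "{name}" placeholder syntax, and the three callbacks (screen_matches, execute, agent_fallback) are assumptions made for the example.

```python
from typing import Callable, Dict, List


def run_skill(
    steps: List[dict],
    params: Dict[str, str],
    screen_matches: Callable[[dict], bool],
    execute: Callable[[str, str], None],
    agent_fallback: Callable[[dict], None],
) -> List[str]:
    """Replay steps directly; hand a step to the agent if the screen differs.

    Returns a log recording which path handled each step.
    """
    log = []
    for step in steps:
        # Substitute "{name}" placeholders with the caller's parameters.
        value = step.get("value", "").format(**params)
        if screen_matches(step):
            execute(step["action"], value)  # fast path: no model reasoning
            log.append(f"fast:{step['action']}")
        else:
            agent_fallback(step)            # slow path: let the agent recover
            log.append(f"agent:{step['action']}")
    return log
```

The returned log makes it visible how often a skill leaves the fast path, which is the kind of feedback that could inform a later revision of the skill.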
Recording a demo
A skill is only as good as the demonstration behind it. Record one cleanly.
- Start a session and open the app you want to automate.
- Begin recording: say "record a demo" in chat, or press the record button (⏺) on the stream. The badge turns red while capturing.
- Do the task once, deliberately. Tap, type, and scroll at a normal pace.
- Stop: say "stop recording". The induction pipeline runs and proposes a skill.
What gets captured
- Touch events (getevent): taps, swipes, and coordinates
- Screenshots at each step, for selector inference
- App + window state: package and activity at every transition
- Typed text, including anything entered into fields
Tips for a skill that generalizes
- One task per recording. Record unrelated flows separately.
- Start from a stable screen so replay has a predictable entry point.
- Prefer visible, labeled controls over tiny or ambiguous tap targets.
- Parameterize after the fact: record with concrete values; the pipeline detects which inputs are variable.
A recording captures everything you type, including passwords or codes. Recordings are only uploaded if you opt in; see Privacy & data.
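To make the "parameterize after the fact" tip concrete, here is a deliberately naive sketch. It assumes that every typed value becomes a parameter; how the real pipeline decides which inputs are variable is not described here, and the step shape and the text_N naming are hypothetical.

```python
def parameterize(steps):
    """Replace each typed literal with a named placeholder.

    Returns (generalized_steps, defaults), where defaults maps each
    parameter name to the value seen in the demonstration.
    Naive assumption: every "type" step is treated as variable.
    """
    generalized, defaults = [], {}
    for step in steps:
        if step["action"] == "type":
            name = f"text_{len(defaults) + 1}"
            defaults[name] = step["value"]
            # Copy the step so the recorded trace is left untouched.
            generalized.append({**step, "value": "{" + name + "}"})
        else:
            generalized.append(dict(step))
    return generalized, defaults
```

Recording with a concrete value such as "hello world" still yields a reusable step, because the literal survives only as the parameter's default.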
Induction pipeline
On stop, the recorded trace is turned into a skill through a short
record → generate → promote → reuse pipeline, all driven by the plugin's
clawmobile_* tools.
- Summarize: clawmobile_trace_prepare_summary compacts the trace into a digest plus a candidate schema and grounding rules.
- Generate: the agent fills the schema, producing a skill candidate with an intent, typed parameters, steps, and anchors.
- Save & validate: clawmobile_trace_save_skill_candidate checks anchors and step references.
- Promote: clawmobile_skill_candidate_promote writes the skill to the workspace (SKILL.md + a generalized definition).
- Generalize: clawmobile_skill_generalize derives a deterministic fast path with entry-state checks and failure patterns.
- Run: clawmobile_skill_run_fast_path replays the steps with your parameters; feedback is recorded for later evolution.
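The ordering of the induction stages can be sketched as a simple chain. Only the tool names and their order come from the list above; the call_tool and generate_candidate callables, the argument keys, and the idea that each result feeds the next call are assumptions for illustration, not the plugin's real schemas.

```python
# Tool names as listed above, in the order they run after a recording stops.
PIPELINE = [
    "clawmobile_trace_prepare_summary",
    "clawmobile_trace_save_skill_candidate",
    "clawmobile_skill_candidate_promote",
    "clawmobile_skill_generalize",
]


def induce(trace_id, call_tool, generate_candidate):
    """Run the induction stages in order and return the last tool result.

    call_tool(name, args) and generate_candidate(summary) are supplied by
    the caller; the argument shapes used here are hypothetical.
    """
    summary = call_tool(PIPELINE[0], {"trace_id": trace_id})
    candidate = generate_candidate(summary)  # the agent fills the schema
    result = call_tool(PIPELINE[1], {"candidate": candidate})
    for tool in PIPELINE[2:]:
        result = call_tool(tool, result)
    return result
```

The Run stage is left out on purpose: replay happens later, in a fresh session, rather than as part of induction.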
Tools
The agent drives Android through the mobile-ui plugin.
In the cloud it runs in Lite mode
(CLAWMOBILE_LITE=1): the ADB-backed tools work unchanged,
the Termux tx_* bridge is unavailable (no Termux on a
headless Redroid), and android_agent_task (DroidRun) is not
registered.
Device control
| Tool | Purpose |
|---|---|
| android_screenshot | Capture the current screen |
| android_tap | Tap at a coordinate |
| android_type | Type text into the focused field |
| android_swipe | Swipe / scroll gesture |
| android_ui_dump / android_ui_query | Dump and query the view hierarchy |
| android_ocr_dump | OCR the screen for text + boxes |
| android_match_text_queries | Match on-screen text against queries |
| android_health | Capability + device readiness probe |
| android_shell | Run a shell command over ADB |
Recording & skills
| Tool | Purpose |
|---|---|
| clawmobile_record_start / _stop | Begin / finalize a demonstration trace |
| clawmobile_trace_prepare_summary | Build a skill candidate from a trace |
| clawmobile_skill_candidate_promote | Promote a candidate to a skill |
| clawmobile_skill_generalize | Derive the deterministic fast path |
| clawmobile_skill_run_fast_path | Replay a skill with parameters |
| clawmobile_batch_execute | Execute a batch of steps |
Tool names are canonical in the plugin's openclaw.plugin.json;
JSON schemas live in its src/index.ts.
REST API
The API lives at api.clawmobile.cloud. Authenticate with a
bearer JWT from magic-link login; all session routes are scoped to your
user.
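As a rough illustration of how a client might address the routes in this section, the sketch below only builds requests and URLs with Python's standard library; it sends nothing. The https scheme, the JSON Content-Type header, and the helper names are assumptions, while the paths and the ticket query parameter follow the route listing.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://api.clawmobile.cloud"  # the https scheme is an assumption


def create_session_request(jwt: str) -> Request:
    """Build (but do not send) the POST /api/sessions request."""
    return Request(
        f"{API_BASE}/api/sessions",
        method="POST",
        headers={"Authorization": f"Bearer {jwt}"},
    )


def message_request(jwt: str, session_id: str, message: str) -> Request:
    """Build the POST /api/sessions/:id/message request with a JSON body."""
    body = json.dumps({"message": message}).encode("utf-8")
    return Request(
        f"{API_BASE}/api/sessions/{quote(session_id, safe='')}/message",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",  # assumed, not documented
        },
    )


def status_stream_url(session_id: str, ticket: str) -> str:
    """URL for the SSE status stream, authenticated by a single-use ticket."""
    return (
        f"{API_BASE}/api/sessions/{quote(session_id, safe='')}/stream?"
        + urlencode({"ticket": ticket})
    )
```

Note that the stream URL carries a ticket rather than the JWT, so the bearer token never appears in a query string.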
POST /api/sessions
Authorization: Bearer <jwt>
→ 201 { "id", "status": "allocating", "stream_url", "created_at" }

POST /api/sessions/:id/message
{ "message": "Open Google Keep" }
→ 200 { "text", "mediaUrls": [] }

# mint a single-use ticket (bearer auth), then pass ?ticket= to the streams
POST /api/sessions/:id/stream-ticket → { "ticket", "expires_in" }
GET /api/sessions/:id/stream?ticket= # SSE: status events
WS /stream/:id?ticket= # scrcpy screen + tap/swipe relay
WS /logs/:id?ticket= # agent log (session JSONL)

DELETE /api/sessions/:id → { "ok": true }

FAQ
Is this an emulator?
No. It's a real Android (Redroid) container, not a screenshot-reasoning mock. Apps behave as they do on a device.
Why is the stream ~1 fps?
The screen is delivered as periodic PNG screencaps over a WebSocket, which keeps the pipe simple and auth-friendly. Taps and swipes you make are relayed to the device in real time.
Do skills carry between sessions?
Yes. Skills sync to R2 per user (versioned) when a session ends, and are available in your next session. The device itself is wiped; nothing else persists.
Can I use the system clipboard in a skill?
Avoid it. Headless Android has no interactive clipboard owner, so clipboard round-trips are unreliable. Build skills that type values directly instead.
What are the limits?
The preview is invite-only with per-user rate limits (one active session, a handful per hour) and a hard session lifetime cap enforced by the watchdog.
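The shape of these limits can be sketched as a sliding-window check. The value of per_hour is a placeholder for "a handful", and the function and its arguments are hypothetical; the Worker's actual rate-limit logic is not documented here.

```python
def may_start_session(active_sessions, recent_starts, now, per_hour=5):
    """Return True if a new session is allowed under this sketch's limits.

    active_sessions: number of sessions the user currently has open.
    recent_starts: epoch-second timestamps of the user's earlier starts.
    per_hour: placeholder for "a handful per hour", not the real value.
    """
    if active_sessions >= 1:  # only one active session at a time
        return False
    window_start = now - 3600.0
    in_window = [t for t in recent_starts if t > window_start]
    return len(in_window) < per_hour
```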
Privacy & data
- Devices are ephemeral. Each session gets a fresh container that is destroyed on end; its workspace is wiped.
- Recordings stay local by default. A demonstration captures everything you type, and it is only uploaded if you explicitly opt in.
- Skills are yours. Synced to R2 under your user namespace, versioned, and not shared across accounts.
- Auth is passwordless. Magic-link login issues a short-lived JWT; stream access uses single-use tickets, never your token.
Don't enter real credentials or secrets during a demo unless you intend to record them.
Self-hosting
The stack is open and self-hostable: a Cloudflare Worker (API), Pages (web + docs), and a Bun orchestrator on a box with Docker/Redroid and the OpenClaw binary plus the mobile-ui plugin.
Orchestrator essentials
- OPENCLAW_BIN: path to the OpenClaw binary (the mobile-ui plugin must be installed and enabled).
- AGENT_MODE: embedded (default) or gateway.
- CLAWMOBILE_SEED_DIR: the Lite workspace seed (AGENTS/TOOLS + baseline skills) copied into each session.
- REDROID_IMAGE, REDROID_HOSTS, MAX_POOL_SIZE: device pool placement.
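Before launching the orchestrator it can help to sanity-check these variables. The following pre-flight sketch is not part of the project: the comma-separated format for REDROID_HOSTS, the fallback pool size of 1, and the choice of which variables are required are all assumptions, and the OrchestratorConfig and load_config names are made up for the example.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class OrchestratorConfig:
    openclaw_bin: str
    agent_mode: str
    seed_dir: str
    redroid_image: str
    redroid_hosts: List[str]
    max_pool_size: int


def load_config(env: Mapping[str, str] = os.environ) -> OrchestratorConfig:
    """Read and validate the orchestrator variables; raises on bad input."""
    mode = env.get("AGENT_MODE", "embedded")  # embedded is the documented default
    if mode not in ("embedded", "gateway"):
        raise ValueError(f"AGENT_MODE must be 'embedded' or 'gateway', got {mode!r}")
    # Assumption: hosts are given as a comma-separated list.
    hosts = [h.strip() for h in env.get("REDROID_HOSTS", "").split(",") if h.strip()]
    return OrchestratorConfig(
        openclaw_bin=env["OPENCLAW_BIN"],      # KeyError if missing
        agent_mode=mode,
        seed_dir=env["CLAWMOBILE_SEED_DIR"],   # KeyError if missing
        redroid_image=env["REDROID_IMAGE"],    # KeyError if missing
        redroid_hosts=hosts,
        max_pool_size=int(env.get("MAX_POOL_SIZE", "1")),  # fallback is assumed
    )
```

Passing a plain dictionary instead of os.environ keeps the check testable without touching the real environment.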
$ cd services/orchestrator && bun src/index.ts
See the repository README for the full deployment topology and Cloudflare bindings.