ClawMobile Cloud

A real Android phone, live in your browser. Demonstrate a task once; the agent records it and induces a reusable skill that replays on a fresh device. No Termux, no ADB, no device of your own.

The agent is the ClawMobile runtime: OpenClaw plus the mobile-ui plugin, driving a hosted Redroid device over ADB.

Quickstart

Get your first Android skill running in a couple of minutes.

Prerequisites

  • An invite code (request at clawmobile.cloud)
  • A modern browser: Chrome 90+, Firefox 88+, or Safari 15+

1 · Sign in

Enter your email and invite code at clawmobile.cloud, then click the magic link in your inbox. Login is passwordless.

2 · Start a session

Click New session. A clean Android device boots in a few seconds and its screen streams straight into the page, where you can tap it directly.

3 · Open an app

Open Google Keep

4 · Record a demonstration

Record me creating a note that says "hello world"

Perform the task on the phone; every tap, screenshot, and screen transition is captured.

5 · Save it as a skill

Stop recording and save as a skill called keep-hello-note

6 · Reuse it

Start a fresh session and run it. The skill replays deterministically:

Run keep-hello-note

How it works

A real Android runtime lives in the cloud and an agent drives it on your behalf. The phone, the agent, and the skill store are all server-side, so you never install anything.

Browser (web · Pages) ▶ CF Worker (api.clawmobile.cloud) ▶ Orchestrator (Bun · the box) ▶ Android (Redroid + OpenClaw)
◀ scrcpy frames + agent log, proxied back to the browser over the tunnel

Browser: what the user touches
  • ScrcpyPlayer: draws PNG frames to a canvas; relays your taps & swipes
  • Chat panel: your instructions become agent turns
  • useSessionStream: SSE status feed, allocating → active → ended
  • stream-ticket: single-use ticket so the WS can auth without a header

CF Worker: the edge (auth, state, fan-out)
  • Auth: magic-link signup + short-lived JWT
  • Rate limits: KV-backed; 1 active session, a few per hour
  • D1: sessions, invites, skill index
  • R2: skills + recordings buckets
  • WS proxy: /stream/:id, /logs/:id → orchestrator over the tunnel
  • SSE + tickets: status events; mints stream tickets

Orchestrator: sessions, devices, the agent
  • Session lifecycle: reserve → boot → active → teardown
  • Device pool: node placement + ADB / gateway ports
  • Providers: Kamatera (primary), Vast.ai (burst)
  • Agent runtime: embedded (default) or per-session gateway
  • Screen + touch: scrcpy screencap; ADB input relay
  • Workspace + R2 sync: Lite seed in; induced skills out
  • Log tail: streams the session JSONL transcript
  • Watchdog: reaps silent sessions + burst caps

Android container: the ClawMobile runtime
  • Redroid: a real Android, single-use, driven over ADB
  • OpenClaw + mobile-ui: Lite mode (CLAWMOBILE_LITE=1)
  • Mobile tools: tap, type, swipe, screenshot, UI dump, OCR
  • Recording → skills: getevent capture → trace induction

  • Cloudflare edge: the Worker handles magic-link auth + JWT, rate limits, and fans out session progress over SSE. The web UI ships from Pages.
  • Orchestrator: a Bun service that allocates a device, brings the ClawMobile runtime online, relays the screen, and syncs skills to R2 when the session ends.
  • Android container: a fresh Redroid instance per session. The agent controls it over ADB; you watch the scrcpy screen stream and can tap the phone directly.
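Device allocation, including the burst fallback, can be pictured as a simple placement rule: use the least-loaded node that still has room, and go to the burst provider only when every node is full. This is an illustrative sketch, not the orchestrator's code; the `PoolNode` shape, the per-node `capacity`, and the least-loaded policy are all assumptions.

```typescript
// Hypothetical pool entry: the orchestrator's real types are not documented here.
interface PoolNode {
  host: string;     // e.g. one entry of REDROID_HOSTS (format assumed)
  running: number;  // Redroid containers currently on this node
  capacity: number; // max containers this node may hold (assumed)
}

type Placement =
  | { kind: "node"; host: string }
  | { kind: "burst"; provider: string };

// Pick the least-loaded node that still has room; if every node is full,
// fall back to the burst provider (Vast.ai in the hosted service).
function placeSession(nodes: PoolNode[], burstProvider = "vast.ai"): Placement {
  let best: PoolNode | undefined;
  for (const n of nodes) {
    if (n.running >= n.capacity) continue; // full (also skips capacity 0)
    const load = n.running / n.capacity;
    if (best === undefined || load < best.running / best.capacity) best = n;
  }
  return best
    ? { kind: "node", host: best.host }
    : { kind: "burst", provider: burstProvider };
}
```

Keeping placement a pure function of the current pool state makes it easy to test; the caller would then reserve ADB and gateway ports on the chosen node, which this sketch leaves out.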

A session, step by step

  1. You request a session; the Worker checks your rate limit and asks the orchestrator for a device.
  2. The orchestrator boots a clean Redroid container on a node (or bursts to a secondary provider if the pool is full) and waits for ADB. Progress streams live.
  3. The agent comes online: the screen appears, and you can type instructions or tap the phone yourself.
  4. You demonstrate a task; recording captures touches, screenshots, and app state.
  5. On stop, the induction pipeline generalizes the demo into a reusable, versioned skill.
  6. The session ends: skills sync to R2, the container is destroyed, and the workspace is wiped.
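The browser only sees three statuses over SSE (allocating → active → ended), while the orchestrator tracks four phases (reserve → boot → active → teardown). A minimal sketch of that lifecycle as a guarded state machine; the phase-to-status mapping and the early jump to teardown on failure are assumptions, not documented behavior.

```typescript
// Orchestrator phases and SSE statuses, as named in the architecture overview.
type Phase = "reserve" | "boot" | "active" | "teardown";
type PublicStatus = "allocating" | "active" | "ended";

// Allowed moves. Letting any live phase jump straight to teardown (failed boot,
// user DELETE, watchdog reap) is an assumption made for this sketch.
const NEXT: Record<Phase, readonly Phase[]> = {
  reserve: ["boot", "teardown"],
  boot: ["active", "teardown"],
  active: ["teardown"],
  teardown: [],
};

function transition(from: Phase, to: Phase): Phase {
  if (!NEXT[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}

// Assumed mapping from internal phase to the status streamed to the browser.
function publicStatus(phase: Phase): PublicStatus {
  switch (phase) {
    case "reserve":
    case "boot":
      return "allocating";
    case "active":
      return "active";
    case "teardown":
      return "ended";
  }
}
```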

The agent runtime

The agent is OpenClaw running in Lite mode (CLAWMOBILE_LITE=1) with the mobile-ui plugin. By default each turn runs the agent embedded against the session's device; a gateway mode is available for a long-lived per-session agent. Either way the same plugin tools drive the phone.

Orchestrator ▶ openclaw agent --local ▶ ADB ▶ Redroid

A fresh agent process per turn: spawned, runs the turn, exits. No daemon, no device pairing. This is the default.

AGENT_MODE=embedded
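In embedded mode each turn is a short-lived process. A minimal sketch of assembling that command, using only what this page states (`OPENCLAW_BIN`, the `agent --local` subcommand, and `CLAWMOBILE_LITE=1`); the flags that carry the turn's message and session are not documented here, so the sketch takes them as an opaque, hypothetical `turnArgs` list.

```typescript
interface TurnCommand {
  argv: string[];
  env: Record<string, string>;
}

// Build the argv/env for one embedded turn: a fresh `openclaw agent --local`
// process that runs the turn and exits. `turnArgs` is a hypothetical
// pass-through for per-turn flags, which this page does not document.
function embeddedTurnCommand(
  openclawBin: string,
  baseEnv: Record<string, string>,
  turnArgs: string[] = [],
): TurnCommand {
  return {
    argv: [openclawBin, "agent", "--local", ...turnArgs],
    // Lite mode: ADB-backed tools only, no Termux bridge, no DroidRun task tool.
    env: { ...baseEnv, CLAWMOBILE_LITE: "1" },
  };
}
```

Under Bun, the result could be launched with `Bun.spawn(cmd.argv, { env: cmd.env })` and awaited via the returned process's `exited` promise; since the process ends with the turn, there is no daemon to supervise.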

Safe to run unattended

Every container is single-use and isolated. A watchdog sweeps the pool every couple of minutes and destroys any container whose session has gone silent, as well as any burst instance past its hard time cap, so a crash can never leave compute billing or your data on a device.
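One watchdog sweep reduces to a pure decision over the current containers. A minimal sketch, assuming a hypothetical `Container` record with start and last-activity timestamps; the actual silence threshold and burst cap are not given on this page, so they are parameters here.

```typescript
// Hypothetical container record; field names are illustrative.
interface Container {
  id: string;
  burst: boolean;         // running on the burst provider
  startedAt: number;      // ms since epoch
  lastActivityAt: number; // last agent or stream activity, ms since epoch
}

interface WatchdogLimits {
  silentMs: number;   // reap after this much silence (value not documented)
  burstCapMs: number; // hard lifetime for burst instances (value not documented)
}

type Reap = { id: string; reason: "silent" | "burst-cap" };

// One sweep: decide which containers to destroy and why. Actually destroying
// them (and releasing their ports) is left to the caller.
function sweep(containers: Container[], now: number, limits: WatchdogLimits): Reap[] {
  const doomed: Reap[] = [];
  for (const c of containers) {
    if (c.burst && now - c.startedAt > limits.burstCapMs) {
      doomed.push({ id: c.id, reason: "burst-cap" });
    } else if (now - c.lastActivityAt > limits.silentMs) {
      doomed.push({ id: c.id, reason: "silent" });
    }
  }
  return doomed;
}
```

The orchestrator would run this on a timer and destroy each returned container; keeping the decision separate from the side effects makes the reaping rules easy to test.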

Skills

A skill is a reusable, parameterized procedure induced from one of your demonstrations. Instead of reasoning over screenshots from scratch every time, the agent replays a known-good sequence, which is faster, cheaper, and far more reliable.

  • Demonstration-driven: you show the task once, with no scripting.
  • Generalized: concrete values you typed become parameters.
  • Deterministic replay: a fast path executes the steps directly, falling back to the agent only when the screen doesn't match.
  • Versioned: every change is a new version, synced to R2 per user.
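The fast-path-with-fallback idea can be sketched as: check that the device is in the expected entry state, replay the recorded steps with parameters filled in, and hand the task to the agent otherwise. This is a simplified illustration; the `Step` and `FastPathSkill` shapes, the `{{name}}` placeholder syntax, and the single entry-state check (the real fast path also uses failure patterns) are assumptions.

```typescript
// Hypothetical step and skill shapes; the real generalized definition differs.
type Step =
  | { op: "tap"; x: number; y: number }
  | { op: "type"; text: string } // may contain {{param}} placeholders
  | { op: "swipe"; x1: number; y1: number; x2: number; y2: number };

interface FastPathSkill {
  entry: { package: string; activity?: string }; // expected entry state
  steps: Step[];
}

// Device access is injected (in practice: ADB) so the sketch is testable.
interface Device {
  currentWindow(): { package: string; activity: string };
  exec(step: Step): void;
}

function fillParams(text: string, params: Record<string, string>): string {
  return text.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(params, name)) {
      throw new Error(`missing parameter: ${name}`);
    }
    return params[name];
  });
}

// Replay directly when the screen matches the entry state; otherwise let the
// agent reason from screenshots as usual.
function runSkill(
  skill: FastPathSkill,
  params: Record<string, string>,
  device: Device,
  fallbackToAgent: () => string,
): string {
  const w = device.currentWindow();
  const atEntry =
    w.package === skill.entry.package &&
    (skill.entry.activity === undefined || w.activity === skill.entry.activity);
  if (!atEntry) return fallbackToAgent();
  // Resolve every parameter before touching the device.
  const resolved = skill.steps.map((s): Step =>
    s.op === "type" ? { op: "type", text: fillParams(s.text, params) } : s,
  );
  for (const step of resolved) device.exec(step);
  return "fast-path";
}
```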

Recording a demo

A skill is only as good as the demonstration behind it. Record one cleanly.

  1. Start a session and open the app you want to automate.
  2. Begin recording: say "record a demo" in chat, or press the record button (⏺) on the stream. The badge turns red while capturing.
  3. Do the task once, deliberately. Tap, type, and scroll at a normal pace.
  4. Stop: say "stop recording". The induction pipeline runs and proposes a skill.

What gets captured

  • Touch events (getevent): taps, swipes, and coordinates
  • Screenshots at each step, for selector inference
  • App + window state: package and activity at every transition
  • Typed text, including anything entered into fields

Tips for a skill that generalizes

  • One task per recording. Record unrelated flows separately.
  • Start from a stable screen so replay has a predictable entry point.
  • Prefer visible, labeled controls over tiny or ambiguous tap targets.
  • Parameterize after the fact: record with concrete values, and the pipeline detects which inputs are variable.
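To see why concrete values are enough, here is a deliberately naive version of parameter detection: every typed value becomes a parameter, named after the field it was typed into. The real pipeline is agent-driven and decides more carefully; the `TraceEvent` shape and the naming rule are hypothetical.

```typescript
// Simplified trace events modeled on what a recording captures. The field
// names are hypothetical; `field` stands for the target view's resource id.
type TraceEvent =
  | { kind: "tap"; x: number; y: number }
  | { kind: "window"; package: string; activity: string }
  | { kind: "screenshot"; path: string }
  | { kind: "text"; value: string; field?: string };

interface ParamSlot {
  name: string;
  example: string; // the concrete value typed during the demo
}

// Naive heuristic: every typed value becomes a parameter named after its
// field, and the step keeps a {{name}} placeholder instead of the value.
function extractParams(trace: TraceEvent[]): { params: ParamSlot[]; steps: TraceEvent[] } {
  const params: ParamSlot[] = [];
  const steps = trace.map((e): TraceEvent => {
    if (e.kind !== "text") return e;
    const base = (e.field?.split("/").pop() || "text").replace(/\W+/g, "_");
    let name = base;
    // Disambiguate repeated fields: note, note_2, note_3, ...
    for (let i = 2; params.some((p) => p.name === name); i++) name = `${base}_${i}`;
    params.push({ name, example: e.value });
    return { ...e, value: `{{${name}}}` };
  });
  return { params, steps };
}
```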

A recording captures everything you type, including passwords or codes. Recordings are only uploaded if you opt in โ€” see Privacy & data.

Induction pipeline

On stop, the recorded trace is turned into a skill through a short record → generate → promote → reuse pipeline, all driven by the plugin's clawmobile_* tools.

  1. Summarize: clawmobile_trace_prepare_summary compacts the trace into a digest plus a candidate schema and grounding rules.
  2. Generate: the agent fills the schema, producing a skill candidate: an intent, typed parameters, steps, and anchors.
  3. Save & validate: clawmobile_trace_save_skill_candidate checks anchors and step references.
  4. Promote: clawmobile_skill_candidate_promote writes the skill to the workspace (SKILL.md + a generalized definition).
  5. Generalize: clawmobile_skill_generalize derives a deterministic fast path with entry-state checks and failure patterns.
  6. Run: clawmobile_skill_run_fast_path replays the steps with your parameters; feedback is recorded for later evolution.
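The save-and-validate step checks that a candidate is internally consistent. A minimal sketch of such checks, assuming a hypothetical `SkillCandidate` shape built from the four parts named in the generate step (intent, typed parameters, steps, anchors); the real schema is defined by the plugin and may differ.

```typescript
// Hypothetical candidate shape; the plugin's real schema may differ.
interface SkillCandidate {
  intent: string;
  parameters: { name: string; type: "string" | "number" | "boolean" }[];
  anchors: { id: string; text: string }[]; // e.g. on-screen labels to match
  steps: { action: string; anchor?: string; param?: string }[];
}

// Return a list of problems; an empty list means the candidate passes.
function validateCandidate(c: SkillCandidate): string[] {
  const problems: string[] = [];
  if (c.intent.trim() === "") problems.push("intent is empty");
  const anchorIds = new Set(c.anchors.map((a) => a.id));
  const paramNames = new Set(c.parameters.map((p) => p.name));
  if (anchorIds.size !== c.anchors.length) problems.push("duplicate anchor id");
  c.steps.forEach((s, i) => {
    if (s.anchor !== undefined && !anchorIds.has(s.anchor)) {
      problems.push(`step ${i}: unknown anchor "${s.anchor}"`);
    }
    if (s.param !== undefined && !paramNames.has(s.param)) {
      problems.push(`step ${i}: unknown parameter "${s.param}"`);
    }
  });
  return problems;
}
```

Returning every problem rather than failing on the first one gives the agent a complete list to fix before it re-saves the candidate.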

Tools

The agent drives Android through the mobile-ui plugin. In the cloud it runs in Lite mode (CLAWMOBILE_LITE=1): the ADB-backed tools work unchanged, the Termux tx_* bridge is unavailable (no Termux on a headless Redroid), and android_agent_task (DroidRun) is not registered.

Device control

Tool                                  Purpose
android_screenshot                    Capture the current screen
android_tap                           Tap at a coordinate
android_type                          Type text into the focused field
android_swipe                         Swipe / scroll gesture
android_ui_dump / android_ui_query    Dump and query the view hierarchy
android_ocr_dump                      OCR the screen for text + boxes
android_match_text_queries            Match on-screen text against queries
android_health                        Capability + device readiness probe
android_shell                         Run a shell command over ADB

Recording & skills

Tool                                  Purpose
clawmobile_record_start / _stop       Begin / finalize a demonstration trace
clawmobile_trace_prepare_summary      Summarize a trace into a digest + candidate schema
clawmobile_skill_candidate_promote    Promote a candidate to a skill
clawmobile_skill_generalize           Derive the deterministic fast path
clawmobile_skill_run_fast_path        Replay a skill with parameters
clawmobile_batch_execute              Execute a batch of steps

Tool names are canonical in the plugin's openclaw.plugin.json; JSON schemas live in its src/index.ts.

REST API

The API lives at api.clawmobile.cloud. Authenticate with a bearer JWT from magic-link login; all session routes are scoped to your user.

Create a session

POST /api/sessions
Authorization: Bearer <jwt>

→ 201 { "id", "status": "allocating", "stream_url", "created_at" }

Send the agent a message

POST /api/sessions/:id/message
{ "message": "Open Google Keep" }

→ 200 { "text", "mediaUrls": [] }

Live status + streams

# mint a single-use ticket (bearer auth), then pass ?ticket= to the streams
POST /api/sessions/:id/stream-ticket   → { "ticket", "expires_in" }
GET  /api/sessions/:id/stream?ticket=  # SSE: status events
WS   /stream/:id?ticket=               # scrcpy screen + tap/swipe relay
WS   /logs/:id?ticket=                 # agent log (session JSONL)

End a session

DELETE /api/sessions/:id   → { "ok": true }
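The routes above can be wrapped in a small client. This sketch only builds request descriptors and stream URLs, so it runs without network access; the `https`/`wss` schemes, serving the WebSocket paths from api.clawmobile.cloud, and the JSON `Content-Type` on the message body are assumptions.

```typescript
// Request descriptors for the REST routes. Sending them is up to the caller.
const API_ORIGIN = "https://api.clawmobile.cloud"; // scheme assumed
const WS_ORIGIN = "wss://api.clawmobile.cloud";    // assumed: WS paths served by the Worker

interface ApiRequest {
  method: "GET" | "POST" | "DELETE";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

const bearer = (jwt: string) => ({ Authorization: `Bearer ${jwt}` });
const sessionPath = (id: string) => `${API_ORIGIN}/api/sessions/${encodeURIComponent(id)}`;

const api = {
  createSession: (jwt: string): ApiRequest => ({
    method: "POST",
    url: `${API_ORIGIN}/api/sessions`,
    headers: bearer(jwt),
  }),
  sendMessage: (jwt: string, id: string, message: string): ApiRequest => ({
    method: "POST",
    url: `${sessionPath(id)}/message`,
    headers: { ...bearer(jwt), "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  }),
  streamTicket: (jwt: string, id: string): ApiRequest => ({
    method: "POST",
    url: `${sessionPath(id)}/stream-ticket`,
    headers: bearer(jwt),
  }),
  endSession: (jwt: string, id: string): ApiRequest => ({
    method: "DELETE",
    url: sessionPath(id),
    headers: bearer(jwt),
  }),
};

// The streams authenticate with the single-use ticket in the query string.
function streamUrls(id: string, ticket: string) {
  const sid = encodeURIComponent(id);
  const q = `ticket=${encodeURIComponent(ticket)}`;
  return {
    status: `${sessionPath(id)}/stream?${q}`,  // SSE status events
    screen: `${WS_ORIGIN}/stream/${sid}?${q}`, // scrcpy frames + tap/swipe relay
    logs: `${WS_ORIGIN}/logs/${sid}?${q}`,     // agent log (session JSONL)
  };
}
```

In a browser, a descriptor can be sent with `fetch(req.url, req)`, and the stream URLs go to `new EventSource(urls.status)` and `new WebSocket(urls.screen)`. Neither of those two browser APIs can set an `Authorization` header, which is why the streams take a ticket in the query string instead of the JWT.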

FAQ

Is this an emulator?

No. It's a real Android (Redroid) container, not a screenshot-reasoning mock. Apps behave as they do on a device.

Why is the stream ~1 fps?

The screen is delivered as periodic PNG screencaps over a WebSocket, which keeps the pipe simple and auth-friendly. Taps and swipes you make are relayed to the device in real time.
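Since each frame is a full device screenshot drawn into a canvas that the page scales to fit, a tap has to be mapped from page coordinates back to device pixels before it is relayed. A minimal sketch, assuming the frame fills the canvas exactly (no letterboxing); `toDeviceCoords` is a hypothetical helper, not ScrcpyPlayer's API.

```typescript
// Hypothetical helper: map a pointer event on the scaled canvas to device pixels.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

function toDeviceCoords(
  clientX: number,
  clientY: number,
  canvas: Rect,        // e.g. canvasElement.getBoundingClientRect()
  deviceWidth: number, // size of the last PNG frame, in device pixels
  deviceHeight: number,
): { x: number; y: number } | null {
  const rx = (clientX - canvas.left) / canvas.width;
  const ry = (clientY - canvas.top) / canvas.height;
  if (rx < 0 || rx > 1 || ry < 0 || ry > 1) return null; // outside the screen
  // Clamp so a click on the far edge stays inside the last pixel row/column.
  return {
    x: Math.min(deviceWidth - 1, Math.round(rx * deviceWidth)),
    y: Math.min(deviceHeight - 1, Math.round(ry * deviceHeight)),
  };
}
```

Reading the device size from the latest frame, rather than hard-coding it, keeps the mapping correct if the device rotates or the image changes resolution.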

Do skills carry between sessions?

Yes. Skills sync to R2 per user (versioned) when a session ends, and are available in your next session. The device itself is wiped; nothing else persists.

Can I use the system clipboard in a skill?

Avoid it. Headless Android has no interactive clipboard owner, so clipboard round-trips are unreliable. Build skills that type values directly instead.

What are the limits?

The preview is invite-only with per-user rate limits (one active session, a handful per hour) and a hard session lifetime cap enforced by the watchdog.
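The per-user limits amount to two checks at session creation. A minimal sketch, assuming a hypothetical `UsageRecord` read from the rate-limit store; the hourly quota is a parameter because the exact number is not published, and the sliding one-hour window is an assumption.

```typescript
// Hypothetical usage record read from the rate-limit store.
interface UsageRecord {
  activeSessions: number;
  recentStarts: number[]; // session start times, ms since epoch
}

type Verdict = { ok: true } | { ok: false; reason: string; retryAfterMs?: number };

const HOUR_MS = 60 * 60 * 1000;

// `maxPerHour` stands in for the unpublished "handful per hour".
function checkRateLimit(usage: UsageRecord, now: number, maxPerHour: number): Verdict {
  if (maxPerHour < 1) throw new Error("maxPerHour must be at least 1");
  if (usage.activeSessions >= 1) {
    return { ok: false, reason: "a session is already active" };
  }
  // Sliding one-hour window (an assumption), oldest start first.
  const inWindow = usage.recentStarts.filter((t) => now - t < HOUR_MS).sort((a, b) => a - b);
  if (inWindow.length >= maxPerHour) {
    // Enough old starts must age out of the window to drop below the quota.
    const unblockAt = inWindow[inWindow.length - maxPerHour] + HOUR_MS;
    return { ok: false, reason: "hourly limit reached", retryAfterMs: unblockAt - now };
  }
  return { ok: true };
}
```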

Privacy & data

  • Devices are ephemeral. Each session gets a fresh container that is destroyed on end; its workspace is wiped.
  • Recordings stay local by default. A demonstration captures everything you type; it is only uploaded if you explicitly opt in.
  • Skills are yours. Synced to R2 under your user namespace, versioned, and not shared across accounts.
  • Auth is passwordless. Magic-link login issues a short-lived JWT; stream access uses single-use tickets, never your token.
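Single-use stream tickets let the browser open the SSE and WebSocket streams without ever putting the JWT in a URL. A minimal in-memory sketch of mint-and-redeem semantics; the real Worker's storage, TTL, and ticket format are not documented here, and treating `expires_in` as seconds is an assumption.

```typescript
// Hypothetical in-memory ticket store illustrating single-use semantics.
interface TicketGrant {
  sessionId: string;
  userId: string;
}

class TicketStore {
  private readonly tickets = new Map<string, TicketGrant & { expiresAt: number }>();
  private readonly newId: () => string; // use a CSPRNG, e.g. crypto.randomUUID
  private readonly ttlMs: number;

  constructor(newId: () => string, ttlMs: number) {
    this.newId = newId;
    this.ttlMs = ttlMs;
  }

  // Backs POST /api/sessions/:id/stream-ticket, which is bearer-authenticated.
  mint(grant: TicketGrant, now: number): { ticket: string; expires_in: number } {
    const ticket = this.newId();
    this.tickets.set(ticket, { ...grant, expiresAt: now + this.ttlMs });
    return { ticket, expires_in: Math.floor(this.ttlMs / 1000) }; // seconds (assumed)
  }

  // Called when a stream connects with ?ticket=. The ticket is deleted before
  // any check, so it can never be presented twice, valid or not.
  redeem(ticket: string, sessionId: string, now: number): TicketGrant | null {
    const entry = this.tickets.get(ticket);
    this.tickets.delete(ticket);
    if (!entry || now >= entry.expiresAt || entry.sessionId !== sessionId) return null;
    return { sessionId: entry.sessionId, userId: entry.userId };
  }
}
```

Binding each ticket to one session means a leaked ticket cannot be used to watch a different session, and the short TTL limits how long an unused one stays valid.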

Don't enter real credentials or secrets during a demo unless you intend to record them.

Self-hosting

The stack is open and self-hostable: a Cloudflare Worker (API), Pages (web + docs), and a Bun orchestrator on a box with Docker/Redroid and the OpenClaw binary plus the mobile-ui plugin.

Orchestrator essentials

  • OPENCLAW_BIN: path to the OpenClaw binary (the mobile-ui plugin must be installed and enabled).
  • AGENT_MODE: embedded (default) or gateway.
  • CLAWMOBILE_SEED_DIR: the Lite workspace seed (AGENTS/TOOLS + baseline skills) copied into each session.
  • REDROID_IMAGE, REDROID_HOSTS, MAX_POOL_SIZE: device pool placement.
$ cd services/orchestrator && bun src/index.ts
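A startup check for these variables catches misconfiguration before the first session. A minimal sketch, assuming a comma-separated `REDROID_HOSTS` and a positive-integer `MAX_POOL_SIZE`; only the `embedded` default for `AGENT_MODE` comes from this page, and `loadConfig` is a hypothetical name.

```typescript
// Hypothetical typed view of the orchestrator's environment.
interface OrchestratorConfig {
  openclawBin: string;
  agentMode: "embedded" | "gateway";
  seedDir?: string;
  redroidImage?: string;
  redroidHosts: string[];
  maxPoolSize?: number;
}

function loadConfig(env: Record<string, string | undefined>): OrchestratorConfig {
  const openclawBin = env.OPENCLAW_BIN;
  if (!openclawBin) throw new Error("OPENCLAW_BIN is required");

  const mode = env.AGENT_MODE || "embedded"; // documented default
  if (mode !== "embedded" && mode !== "gateway") {
    throw new Error(`AGENT_MODE must be embedded or gateway, got "${mode}"`);
  }

  let maxPoolSize: number | undefined;
  if (env.MAX_POOL_SIZE !== undefined) {
    maxPoolSize = Number(env.MAX_POOL_SIZE);
    if (!Number.isInteger(maxPoolSize) || maxPoolSize < 1) {
      throw new Error("MAX_POOL_SIZE must be a positive integer");
    }
  }

  return {
    openclawBin,
    agentMode: mode,
    seedDir: env.CLAWMOBILE_SEED_DIR,
    redroidImage: env.REDROID_IMAGE,
    // Assumed format: comma-separated host list.
    redroidHosts: (env.REDROID_HOSTS ?? "").split(",").map((h) => h.trim()).filter((h) => h !== ""),
    maxPoolSize,
  };
}
```

Under Bun it would be called as `loadConfig(process.env)` before the server starts, so a missing binary path fails fast instead of on the first agent turn.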

See the repository README for the full deployment topology and Cloudflare bindings.