Dockbox Voice

Press a button.
Talk to your Dockbox.

A small desktop companion that turns your Dockbox into a voice assistant. Speech-to-text and text-to-speech run on your machine — your voice never leaves it. The agent, your data, and your tools stay on the server.

Push-to-Talk · Local Speech · Same Brain Everywhere · Hologram UI · F9 Anywhere

What it does

One button between you and your Dockbox

Press the on-screen orb or hit F9 from anywhere. A beep marks the start; you talk; another beep marks the end. The hologram pulses while it thinks, then speaks the reply back. Press again at any time to interrupt.

Push-to-talk via on-screen button or global F9 hotkey
Replies stream sentence-by-sentence — no waiting for the full answer
Press again to barge in and cut off the reply
Same Dockbox agent that answers on Slack, WhatsApp, email, and the web
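For readers who want to picture the turn-taking, the button behavior described above can be sketched as a small state machine. This is an illustration only, not the shipped code: the `PushToTalk` class, its state names, and the event strings are hypothetical, and it assumes a second press ends recording and that barging in returns straight to listening.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class PushToTalk:
    """Hypothetical turn controller; not the actual Dockbox Voice code."""

    def __init__(self):
        self.state = State.IDLE
        # Log of side effects, standing in for real audio and UI calls.
        self.events = []

    def press(self):
        """Handle a press of the on-screen button or the F9 hotkey."""
        if self.state is State.IDLE:
            self.events.append("beep_start")
            self.state = State.LISTENING
        elif self.state is State.LISTENING:
            # Assumption: a second press ends the recording.
            self.events.append("beep_end")
            self.state = State.THINKING
        else:
            # THINKING or SPEAKING: barge in, drop the reply, listen again.
            self.events.append("cancel_reply")
            self.events.append("beep_start")
            self.state = State.LISTENING

    def reply_started(self):
        """Called when the first sentence of the reply is ready to speak."""
        if self.state is State.THINKING:
            self.state = State.SPEAKING

    def reply_finished(self):
        """Called when the last sentence of the reply has been spoken."""
        if self.state is State.SPEAKING:
            self.state = State.IDLE
```

Keeping the press handler as the single entry point means the on-screen button and the global hotkey can share one code path.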

jarvis · turn 04

You: "Pull the latest revenue from the Q3 spreadsheet and email a summary to David."

Tool: read Q3-financials.xlsx · $2.4M (+18% YoY)

Tool: draft email → [email protected]

Jarvis: "Done. Q3 was 2.4 million, up 18%. The draft to David is ready. Say 'send' if I should release it now."

How it works

Local ears and mouth. Server-side brain.

Whisper transcribes your voice on the device. The transcript — and only the transcript — goes to your Dockbox over the same authenticated session you already use. The reply streams back as text. Kokoro speaks it locally.

No new account — uses your existing email-verified Dockbox session
Audio never leaves your machine
Reply text streams over SSE; speaking starts before the agent finishes
Works fully offline for STT/TTS once models are cached
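One way to start speaking before the agent finishes is to cut the streamed text into sentences as it arrives and hand each one to the speech engine. The sketch below shows that idea under simple assumptions: the `sentences` helper is hypothetical, and the end-of-sentence rule (a period, exclamation mark, or question mark followed by whitespace) is deliberately naive and will mis-split abbreviations such as "e.g. this".

```python
import re

# A sentence ends at '.', '!' or '?' followed by whitespace (naive rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(chunks):
    """Yield complete sentences from an iterable of streamed text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # The last part may be an unfinished sentence; keep it buffered.
        buffer = parts.pop()
        for part in parts:
            if part:
                yield part
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Done. Q3 was 2.4 mil", "lion, up 18%. The draft", " is ready."]
    for sentence in sentences(stream):
        print(sentence)  # each sentence could go to the TTS engine here
```

Because the function is a generator, the first sentence is available as soon as its closing punctuation and the following space have arrived, regardless of how the chunks are split.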

data-flow

Local (Whisper STT) → Dockbox (Agent + Tools) → Local (Kokoro TTS)

Audio in, audio out — only text crosses the wire.
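To make "only text crosses the wire" concrete, here is a minimal sketch of reading reply text from a Server-Sent Events stream. It assumes each event carries a plain text fragment in its `data:` field, which is a guess about the wire format rather than documented Dockbox behavior. The `sse_data` helper is hypothetical and handles only the `data` field and comment lines.

```python
def sse_data(lines):
    """Yield the data payload of each event from an iterable of SSE lines."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            payload = "\n".join(data)
            data = []
            if payload:
                yield payload
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # One leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


if __name__ == "__main__":
    wire = ["data: Done.\n", "\n", ": ping\n", "data: Q3 was 2.4 million.\n", "\n"]
    for text in sse_data(wire):
        print(text)
```

In a real client the lines would come from the HTTP response body of the authenticated session instead of a list.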

What it simplifies

Skip the keyboard for the things that don't need one

Many of the requests we send to chat assistants don't need a typed message. They need a sentence, spoken while you're walking, cooking, holding something, or just thinking out loud. Dockbox Voice closes that gap by reusing the agent and tools you already configured.

Dictate emails, tasks, and notes hands-free
Ask follow-up questions about a document without opening it
Trigger any automation by speaking the same phrasing you'd type
Fits on a small side monitor as a dedicated assistant surface

use cases

While walking

"Take a note: hire two more SDRs in Q1."

Logged to your group's notes. Surfaces in the next standup brief.

Mid-meeting

"Read the latest message from David."

Reads it aloud. No screen-share switch, no glance.

Hands full

"Reply yes and add Friday at 3."

Drafts the reply, schedules the meeting, confirms once.

Thinking out loud

"Take a break for ten minutes."

Goes silent. Wakes you with a beep when time's up.

Privacy

Your voice stays on the device

Local STT

Whisper runs on your CPU or GPU. The audio is decoded, transcribed, and discarded — it never leaves the machine.

Text only over the wire

Only the transcript and the reply text traverse the network — over the same authenticated session your Dockbox already uses.

No new credentials

Uses the email-verified Dockbox session you already created — no extra auth, no API keys, no cloud accounts.

Install

Three clicks to your first conversation

Download the installer, sign in, and talk. No terminal, no dependencies, no configuration files to edit by hand.

Single .exe installer — downloads everything it needs
Configuration wizard opens automatically after install
Runs as a normal desktop app with a system-tray icon

Run the installer

Download and run JarvisSetup.exe. It installs Jarvis and the speech models in one pass.

Sign in

The configuration window opens automatically. Enter your Dockbox URL and log in with your existing account — same credentials as the web app.

Pick your group

Choose which Dockbox group your voice turns land in. You can change this later from the settings.

Talk

Jarvis launches. Press the hologram or hit F9 from anywhere — you're live.

Press a button. Talk to your Dockbox.