GUIDE

The voice dictation guide for knowledge workers

How speech-to-text works in 2026, how to choose a tool, and how to dictate in the apps you already use.

What voice dictation is

Voice dictation is speech-to-text software that turns what you say into written text in the apps you already use. You speak into a microphone, an AI model transcribes the audio in real time, a formatting layer cleans the output, and the text appears in whatever input field you're focused on.

Modern voice dictation tools work in two stages. A transcription model converts speech to raw text. A formatting layer adds punctuation, paragraph breaks, and platform-appropriate structure before the text lands in your app.
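
The two stages can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool is built. The `transcribe` stub, its canned output, and the idea that the model hands back pause-separated segments are all assumptions made so the example runs without audio or a model.

```python
import re

def transcribe(audio: bytes) -> list[str]:
    """Stage 1 stand-in. A real tool would call a speech-to-text model here.

    Hypothetical stub: returns canned, pause-separated raw segments.
    """
    return ["hey sam the report is done", "i will send it tomorrow"]

def format_segments(segments: list[str]) -> str:
    """Stage 2: add basic capitalization and end punctuation to raw segments."""
    sentences = []
    for seg in segments:
        text = seg.strip()
        if not text:
            continue
        text = re.sub(r"\bi\b", "I", text)   # standalone pronoun "i" -> "I"
        text = text[0].upper() + text[1:]    # capitalize the sentence start
        if text[-1] not in ".?!":
            text += "."                      # close the sentence
        sentences.append(text)
    return " ".join(sentences)

def dictate(audio: bytes) -> str:
    """Run both stages: raw transcription, then formatting."""
    return format_segments(transcribe(audio))

print(dictate(b""))  # Hey sam the report is done. I will send it tomorrow.
```

Note that "sam" stays lowercase. Knowing that it is a name is a personalization problem, not a punctuation problem.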

Voice dictation is not the same as meeting transcription. Otter, Fireflies, and Granola record conversations and produce summaries after the meeting ends. Voice dictation produces ready-to-send text as you speak it.

Voice dictation is also not the same as an AI writing assistant. ChatGPT, Claude, and Copilot generate content from a prompt. They write for you. Voice dictation writes what you said.

The category got serious in the past 18 months. Whisper-class transcription models, released by OpenAI and served at low latency by inference providers like Groq, made speech-to-text fast enough and accurate enough to use without thinking about it. The friction shifted from "can the AI hear me" to "what should it do with what I said."

Why voice dictation is changing

Three things changed at once.

First, transcription got fast. Five years ago, voice-to-text had to choose between accuracy and latency. Today, Whisper and its distilled variants, running on fast inference hardware like Groq's, can transcribe a short utterance with high accuracy in under 500 milliseconds. The lag that made dictation feel like talking through a delayed phone line is gone. Speech is now a viable input method for short messages, not just long-form content.

Second, knowledge work got chattier. The average professional now writes across Gmail, Slack, Notion, Teams, Linear, Jira, and four or five other tools in a single day. Most of this writing is short-form. A Slack reply. A two-line Linear comment. An email response that's three sentences long. Typing all of that adds up. Voice dictation collapses the time per message when the formatting is right for the destination.

Third, AI started writing instead of helping you write. ChatGPT, Claude, and a wave of AI writing assistants pushed the category of "AI does the writing for you." This is genuinely useful for some tasks. It also created a backlash. People who want their own voice, cadence, and vocabulary in their writing find AI-generated prose flat. It sounds like AI. The same wave hit voice tools. Wispr Flow's Command Mode rewrites your sentences. Willow Voice's AI Mode transforms rough notes into polished output. Aqua Voice promises to refine your words as you talk.

These features are useful. They are also a different category from voice dictation. They are closer to dictated AI writing. Some users want one, some want the other. The category is splitting.

That split is the most important thing to understand about voice dictation in 2026. Before you pick a tool, decide which side of it you're on.

The two real choices in a voice dictation tool

Once the category split, the choices stopped being about model accuracy or microphone quality. Those are commodities now. The choices are about philosophy and surface coverage.

Formatting vs generating

This is the bigger split. Two camps.

Camp one: the tool formats your speech. You speak, the tool adds punctuation, fixes grammar, breaks paragraphs, adapts the structure for the app you're writing in, and inserts your words at the cursor. The words stay yours. The formatting respects how you talk. Verbal hedges, emphasis, and casualness make it through if you used them. The output reads like you wrote it because you wrote it.

Camp two: the tool generates content from your speech. You speak rough notes or a brief prompt, and the tool's AI layer rewrites the input into something polished. Sentences get restructured. Word choices get swapped. Tone gets adjusted. The output reads cleanly. It also reads more like AI than like you, which is sometimes what you want and sometimes not.

Both camps are valid. The choice depends on what you need voice dictation to do.

If your output goes to colleagues, clients, or anyone who knows your writing, the formatting camp keeps your voice intact. If your output is rough thoughts you want polished into something presentable, the generating camp does that work for you. The two are not interchangeable. Picking the wrong one means either editing AI-flavored output back into something that sounds like you, or sending raw transcription that sounds unstructured.

A quick test: read your last five Slack messages out loud. If you'd be happy seeing those as formatted text from your voice dictation tool, you want camp one. If you'd want the AI to make them sound more "professional," you want camp two.

Browser vs desktop

The other split is about where the tool lives.

Most voice dictation tools are desktop-only. They run as a Mac app or a Windows app and inject text into whichever app has focus. That works well for native desktop apps like Apple Mail, Slack desktop, or Linear's native client.

A few tools are browser-only. They run as a Chrome extension and work in web apps. That works well if your day is mostly Gmail, Notion, Google Docs, and other web-based tools.

Almost no voice tool covers both. That's a real problem for most knowledge workers, who spend their day moving between web and desktop. If you write in Gmail (web) and Slack (often desktop) and Notion (web) and Apple Mail (native), a single-surface tool covers maybe half of where you write.

The fix is dual distribution: a Chrome extension and a Mac app that share the same engine and the same personal vocabulary. Same voice input, same formatting rules, same recognized names and acronyms across both surfaces. This is rare in the market and worth checking before you commit to a tool.

Voice dictation by app

Different apps need different formatting. A Slack message and a Gmail email both start as the same spoken sentence, but they need to look very different when they land. This section covers the apps where voice dictation matters most for knowledge workers.
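
Here is a rough sketch of what "same sentence, different destination" can mean in practice. The function name, the app labels, and the "Hi" and "Best," conventions are illustrative assumptions, not any tool's actual rules.

```python
def format_for_app(sentences: list[str], app: str,
                   recipient: str = "", sender: str = "") -> str:
    """Wrap already-punctuated sentences in the structure the destination expects.

    Hypothetical rules: "gmail" gets a greeting and a sign-off,
    everything else gets the bare message.
    """
    body = " ".join(sentences)
    if app == "gmail":
        greeting = f"Hi {recipient}," if recipient else "Hi,"
        sign_off = f"Best,\n{sender}" if sender else "Best,"
        return f"{greeting}\n\n{body}\n\n{sign_off}"
    # Slack and other chat-style apps: no greeting, no sign-off.
    return body

spoken = ["The report is done.", "I will send it tomorrow."]
print(format_for_app(spoken, "slack"))
print(format_for_app(spoken, "gmail", recipient="Sam", sender="Alex"))
```

The words are identical in both outputs. Only the wrapper changes.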

Gmail

Gmail needs a full email structure: greeting at the top, paragraph breaks, complete sentences, sign-off at the bottom. Voice dictation tools that do this well recognize they're in Gmail and format accordingly. Tools that don't will drop a wall of unpunctuated text into your compose window and leave the formatting to you. See the Gmail voice dictation guide for what to look for in a tool, how to set up names and signatures, and what good dictation output looks like.

Slack

Slack messages are short, direct, and conversational. Greetings and sign-offs feel wrong in Slack. So do five-paragraph blocks. Good Slack dictation produces clean one-to-three sentence messages with the right @mentions, acronyms, and project names handled correctly. Names and @handles matter more here than almost anywhere else. A misspelled name or a missed @mention defeats the time savings. See the Slack voice dictation guide for the details.

Notion, Google Docs, and document-style writing

Document writing needs clean prose with proper paragraph breaks, headers when relevant, and full sentences. The tone is more careful than Slack but less formal than email. Voice tools that handle Gmail and Slack well usually handle Notion and Docs well because document formatting is closer to default behavior.

Microsoft Teams

Teams sits between Slack chat and email. Shorter than email, more structured than Slack. Voice tools that recognize the platform format accordingly. Tools that don't will produce either Slack-flavored output (too casual) or email-flavored output (too formal) depending on the default.

Linear, Jira, GitHub, and engineering tools

Engineering tools have their own conventions. Code references stay in code formatting. Ticket numbers stay intact. Technical terms keep exact spelling. Generic transcription tools mangle all of this. Voice tools that combine a personal Glossary for technical vocabulary with platform-specific formatting do much better here.
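
A small sketch of two of these conventions: keeping ticket numbers intact and putting code references in code formatting. It assumes the tool already knows your project keys and code identifiers. Both sets, and the function itself, are hypothetical.

```python
import re

def format_engineering(text: str, project_keys: set[str],
                       code_terms: set[str]) -> str:
    """Normalize ticket IDs and wrap known identifiers in backticks."""
    if project_keys:
        keys = "|".join(sorted(re.escape(k) for k in project_keys))
        # "eng 412" or "eng-412" -> "ENG-412", only for known project keys.
        text = re.sub(
            rf"\b({keys})[\s-](\d+)\b",
            lambda m: f"{m.group(1).upper()}-{m.group(2)}",
            text,
            flags=re.IGNORECASE,
        )
    # Longest terms first, so "parse_config" is wrapped before "parse".
    for term in sorted(code_terms, key=len, reverse=True):
        text = re.sub(
            rf"(?<![\w`]){re.escape(term)}(?![\w`])",
            lambda m, t=term: "`" + t + "`",
            text,
        )
    return text

print(format_engineering(
    "eng 412 is fixed by the retry in fetch_user",
    project_keys={"ENG"},
    code_terms={"fetch_user"},
))
# ENG-412 is fixed by the retry in `fetch_user`
```

Restricting ticket matching to known keys is deliberate. A looser pattern would turn "in 2026" into a ticket number.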

Apple Mail, Apple Notes, and native Mac apps

Native Mac apps are where browser-only voice tools can't help. If your email lives in Apple Mail and your notes in Apple Notes, you need a voice tool that runs on Mac as a desktop app, not just a Chrome extension.

Claude desktop, ChatGPT, and AI assistants

Voice dictation pairs naturally with AI assistants. You dictate your prompt, the assistant does its work, you read the response. The formatting should respect that the prompt is going to an AI, not to a human reader. Short, direct, no greeting, no sign-off.

How to choose a voice dictation tool

Four questions to answer, in this order.

1. Do you want your words formatted or rewritten?

Re-read your last few messages. If you want voice dictation to produce text that sounds like you, you want a tool in the formatting camp. If you want voice dictation to produce more polished text than you'd write yourself, you want a tool in the generating camp.

Formatting-camp tools include Rubil, Voice In, and the default mode of most other voice dictation apps.

Generating-camp tools include Wispr Flow with Command Mode, Willow Voice with AI Mode, and Aqua Voice with its natural language editing.

The deeper comparisons:

Rubil vs Wispr Flow: the philosophy split in detail
Rubil vs Willow Voice: AI Mode vs ghostwriter approach
Rubil vs Aqua Voice: inline dictation vs floating box, model context

2. Where do you write?

If your day is mostly web apps (Gmail, Slack web, Notion, Google Docs), a Chrome extension covers most of it. If your day includes native Mac apps (Apple Mail, Slack desktop, Linear native, Claude desktop, VS Code), you need desktop coverage too. If it's both, you need dual distribution.

Most voice tools force you to pick one. A few cover both. Worth checking before you install.

3. How important is personalization?

Voice dictation hits the same wall every time: it doesn't know your colleagues' names, your team's acronyms, your project names, or your jargon. Some tools learn this through a flat custom vocabulary list. Some have structured personalization that handles names, @mentions, and platform-specific behavior like an acronym that expands in email but stays short in Slack.
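
To make the difference concrete, here is a sketch of a structured entry that carries per-app overrides, which a flat word list cannot express. The `GlossaryEntry` shape, the app labels, and the sample entries are assumptions for illustration only.

```python
import re
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    spoken: str    # how the term shows up in the raw transcript
    written: str   # default written form
    per_app: dict[str, str] = field(default_factory=dict)  # app-specific overrides

def apply_glossary(text: str, entries: list[GlossaryEntry], app: str) -> str:
    """Replace each spoken form with its written form for the given app."""
    for entry in entries:
        replacement = entry.per_app.get(app, entry.written)
        pattern = rf"\b{re.escape(entry.spoken)}\b"
        # A lambda keeps the replacement literal (no backslash escapes applied).
        text = re.sub(pattern, lambda m, r=replacement: r, text,
                      flags=re.IGNORECASE)
    return text

entries = [
    GlossaryEntry("priya", "Priya", {"slack": "@priya"}),
    GlossaryEntry("q b r", "QBR", {"gmail": "quarterly business review"}),
]
raw = "ask priya about the q b r"
print(apply_glossary(raw, entries, "slack"))  # ask @priya about the QBR
print(apply_glossary(raw, entries, "gmail"))  # ask Priya about the quarterly business review
```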

If you write in a context heavy with internal terminology (any company name longer than a syllable, technical acronyms, team @handles), the depth of the personalization system matters more than model accuracy.

4. What's your privacy posture?

Voice tools handle your audio differently. Some store it. Some retain transcripts for "improving the AI." Some read your screen for context. Some require you to opt out of training data use.

If your dictation contains client work, sensitive internal communication, or anything you wouldn't want stored on a server, check the privacy page of any voice tool before installing. The questions to ask: do they store audio, do they store transcripts, do they read screen content, do they use your data to train models, do they name every subprocessor that touches your data?

A tool that can't answer those clearly is a tool to skip.
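
The five questions work as a simple checklist. The sketch below encodes one strict posture, suited to sensitive work, where the acceptable answers are "no" to storage, screen reading, and training, and "yes" to naming subprocessors. Those acceptable answers and the key names are assumptions. Your own posture may differ.

```python
from typing import Optional

# Acceptable answer to each question under a strict posture (an assumption).
ACCEPTABLE = {
    "stores_audio": False,
    "stores_transcripts": False,
    "reads_screen": False,
    "trains_on_your_data": False,
    "names_all_subprocessors": True,
}

def review_privacy(answers: dict[str, Optional[bool]]) -> str:
    """Return "ok", "review: ...", or "skip: ..." for a tool's published answers.

    A missing or None answer means the tool does not answer clearly.
    """
    unclear = [q for q in ACCEPTABLE if answers.get(q) is None]
    if unclear:
        return "skip: no clear answer for " + ", ".join(unclear)
    concerns = [q for q, ok in ACCEPTABLE.items() if answers[q] != ok]
    if concerns:
        return "review: " + ", ".join(concerns)
    return "ok"

print(review_privacy({"stores_audio": False}))
# skip: no clear answer for stores_transcripts, reads_screen, trains_on_your_data, names_all_subprocessors
```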

Putting it together

Most knowledge workers will end up with:

A formatting-camp tool (your words, not AI rewrites)
Dual distribution (Chrome and Mac coverage)
Structured personalization (names, acronyms, team vocabulary)
A privacy architecture that doesn't store your speech

If those four match, the rest is taste: pricing, free tier, language coverage, UI polish.

How Rubil approaches voice dictation

Rubil is a ghostwriter, not a generator. It formats what you say; it doesn't rewrite it. Verbal hedges, your sentence structure, your vocabulary, your cadence: kept. The formatting adapts to the app you're in. Email structure for Gmail, concise structure for Slack, clean prose for Notion and Docs, conventions for Linear and Jira. The words stay yours.

Two surfaces. A Chrome extension that works in any web app, and a Mac desktop app that works across native Mac apps. They share the same backend, the same formatting rules, and the same personal Glossary, so the names, acronyms, and terms you teach Rubil work everywhere you write.

You teach the Glossary your vocabulary by adding entries as you speak. Rubil can suggest an entry when you correct the same word twice.
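
One way a "suggest after the second correction" rule could work is a simple counter over corrections. This is a guess at the general technique, not Rubil's actual implementation. The class name and the threshold are hypothetical.

```python
from collections import Counter
from typing import Optional

class CorrectionTracker:
    """Count repeated corrections and suggest a glossary entry at a threshold."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.counts = Counter()

    def record(self, heard: str, corrected: str) -> Optional[tuple[str, str]]:
        """Log one correction. Return (heard, corrected) once, when the
        same correction has been made `threshold` times."""
        key = (heard.lower(), corrected)
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            return key
        return None

tracker = CorrectionTracker()
print(tracker.record("ruble", "Rubil"))  # None
print(tracker.record("Ruble", "Rubil"))  # ('ruble', 'Rubil')
```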

Privacy is the architecture, not a marketing claim. Audio is processed transiently and discarded. Transcripts are formatted and discarded. We don't store voice files, transcripts, or formatted output on our servers. Your Glossary is stored in our cloud database (Supabase) so it syncs across your devices. It is isolated to your account with row-level security and encrypted in transit and at rest. Every data processor is named on the Trust page.

Multilingual support comes from Groq's transcription model: 50+ languages including Spanish, Portuguese, French, German, Japanese, Chinese, Korean, Arabic, and Hindi.

Voice dictation FAQ

What's the difference between voice dictation and speech-to-text?

Speech-to-text is the underlying technology: an AI model converts audio into text. Voice dictation is the product category that uses speech-to-text plus a formatting layer to insert ready-to-use text into the apps you write in. Speech-to-text is a component. Voice dictation is the experience.

Does voice dictation work in Gmail and Slack?

Yes, if the tool supports them. Some voice tools are browser-only and work in Gmail web and Slack web. Some are desktop-only and work in Slack desktop and any native Mac email client. A few cover both via dual distribution. Check the supported apps list before you install.

Is voice dictation faster than typing?

For most knowledge workers, yes, by a meaningful margin. People speak around 150 words per minute and type around 40 to 60. The catch is editing time. If your voice tool produces text that needs heavy reformatting, you give back the speed gain. The faster the tool's formatting matches what you need, the bigger the speed-up.
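
The arithmetic is easy to check. This sketch uses the figures above, with 50 words per minute as a midpoint for typing. The 300-word message length is an arbitrary example.

```python
def minutes_to_write(words: int, wpm: float, edit_minutes: float = 0.0) -> float:
    """Time to produce a message: raw input time plus cleanup time."""
    return words / wpm + edit_minutes

words = 300                                    # arbitrary example length
typing = minutes_to_write(words, wpm=50)       # 6.0 minutes
dictating = minutes_to_write(words, wpm=150)   # 2.0 minutes
break_even_edit = typing - dictating           # 4.0 minutes of editing erases the gain

print(typing, dictating, break_even_edit)      # 6.0 2.0 4.0
```

For this message, dictation wins as long as cleanup takes less than four minutes. Every minute of reformatting eats directly into that margin.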

Is voice dictation accurate?

Modern voice dictation is accurate enough for professional work. Whisper-class models hit 95%+ accuracy on clear speech in English and most major languages. Accuracy drops with background noise, accents that are underrepresented in the model's training data, and specialized vocabulary the tool doesn't know. The last one is fixable with a good personal Glossary.

Does voice dictation work in languages other than English?

The leading transcription models, such as OpenAI's Whisper (also available through hosted providers like Groq), cover 50+ languages including Spanish, Portuguese, French, German, Italian, Japanese, Chinese, Korean, Arabic, Hindi, and many more. Tools built on those models inherit the language coverage. Check the tool's privacy or features page for the underlying transcription model if you need a specific language.

Can I use voice dictation in a quiet office?

Some tools support a low-volume or whisper mode optimized for soft speech. If you work in an open office or shared space, look for that feature specifically. Without it, you have to speak at conversational volume for the model to hear you accurately.

What's the difference between voice dictation and AI writing assistants?

Voice dictation turns what you say into text in your own words. AI writing assistants generate content from a prompt in the AI's words. Voice dictation is faster typing. AI writing assistants are content generation. They're complementary, not competing. You might use ChatGPT to brainstorm an email, then use voice dictation to write the response.

Do voice dictation tools store my audio?

It depends on the tool. Some store audio for transcript history. Some discard it after transcription. Some use it to train models. The honest ones name every subprocessor and document retention policies on a public Trust or privacy page. If you can't find clear answers, assume the worst and skip the tool.

What's the difference between voice dictation and meeting transcription?

Meeting transcription tools (Otter, Fireflies, Granola) record conversations between multiple people and produce summaries after the meeting. Voice dictation produces text from a single speaker, formatted for an app you're writing in, while you speak. Different use cases, different tools.

Is voice dictation worth paying for?

If you write more than two hours a day across multiple apps, yes. The time savings compound. If you write less than that, the free tiers of most voice tools will cover your usage. Try the free tier of the tool that fits your workflow before paying.

Try Rubil

A Chrome extension and a Mac app that format your speech for every app you write in. Install free. 1,000 words per day on the free tier. See pricing for Pro details.

Try Rubil free

1,000 words/day. No credit card. No setup.
