I built “LocalVoiceFlow,” a local AI voice input app for Mac: leave monthly subscriptions behind and get highly accurate voice input for almost free

My local AI voice input app for Mac is finished.
I had tried something like this before, but back then I gave up partway through.
This time I rebuilt it from scratch and finally got it to a point where I'm satisfied with it.

The app is called LocalVoiceFlow.
It lives in the macOS menu bar, starts recording with a shortcut key, transcribes locally with AI, and automatically pastes the text back into the original input field, so a single app handles the whole voice input flow.

What made me especially happy was not just that I managed to build it, but that in actual use it sometimes felt more accurate than the monthly paid app I had been using, depending on the situation.

I had been paying around 2,000 yen per month for that service. With my own app, I can keep using a similar voice input setup for almost free from now on.

What is LocalVoiceFlow?

LocalVoiceFlow is a voice input app that lives in the macOS menu bar, starts recording with a key operation of your choice, transcribes locally with WhisperKit, and then automatically returns to the original input field and pastes the result.
Another big feature is that even if pasting fails, the content is automatically backed up to history and the clipboard, so what you said is less likely to be lost.

Speech recognition uses OpenAI Whisper-based models, and the core transcription runs entirely on the device.

macOS's built-in voice input is a convenient standard feature that lets you dictate directly at the cursor.
LocalVoiceFlow, on the other hand, is designed not only to transcribe with local AI, but also to make the output easier to refine afterward: turning spoken language into readable text, reducing unnecessary phrasing, and making it easier to correct proper nouns and inconsistent notation. That focus on “post-input polishability” is one of its defining features.

Rather than simply replacing the built-in feature, this app was built with the goal of creating a voice input environment optimized for my own workflow.

Why I decided to build it

Over the years, I’ve used several paid voice input apps with similar functionality.
In fact, some of them were quite polished and felt very convenient.

At the same time, though, I also had thoughts like:

There’s a monthly fee
It’s hard to fine-tune the app to exactly the behavior I need
Some of them aren’t fully local
I’d like to tweak it even more to suit my preferences

That was part of why I decided to try building one myself and shape a voice input app around a local AI approach.

The result is something I can fine-tune to the way I work, and in some situations it actually feels easier to use and more accurate than the paid service I had been using before.

What LocalVoiceFlow can do

1. Always available in the menu bar

LocalVoiceFlow stays in the macOS menu bar.
You can open it instantly from the icon in the top right, and start/stop recording or open the settings screen from the menu bar.
It’s designed for regular use, with as few steps as possible.

2. Start and stop recording with a shortcut key

You can start recording with a shortcut key you choose in advance, and stop it with the same action again.
For example, you can assign something like pressing the Control key twice, so you can use it right from the keyboard.

Press the shortcut you set to start recording, press it again to stop, and the transcription result is automatically pasted, so you can use it without reaching for the mouse.
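To make the "press Control twice" idea concrete, here is a minimal sketch of double-press detection driving a start/stop toggle. It is written in Python purely as an illustration of the logic and is not LocalVoiceFlow's actual code; the class name `DoubleTapToggle` and the 0.4-second threshold are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoubleTapToggle:
    """Flips a recording flag when the same key is pressed twice in quick succession."""

    max_gap: float = 0.4  # assumed threshold in seconds, not taken from the app
    recording: bool = False
    _last_press: Optional[float] = None

    def key_pressed(self, timestamp: float) -> bool:
        """Feed the time of one key press; returns True if the toggle fired."""
        last = self._last_press
        if last is not None and 0 <= timestamp - last <= self.max_gap:
            # Second press arrived in time: start or stop recording.
            self.recording = not self.recording
            self._last_press = None  # the next press starts a fresh pair
            return True
        # First press of a pair, or the previous press was too long ago.
        self._last_press = timestamp
        return False
```

Feeding it two presses 0.2 seconds apart starts recording, and another quick pair stops it again; a single stray press does nothing.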

3. Transcription with local AI

Transcription is handled locally with WhisperKit.
You can switch between the tiny, base, and small models, and the setup is tuned for a good balance between speed and accuracy for Japanese on an M3 Mac with 8 GB of memory.

4. Easier to clean up spoken language

It’s not just transcription; it also makes it easy to add things like:

Filler removal
Corrections using a replacement dictionary
Standardizing notation
Proper noun correction

With voice input, what matters isn’t only whether it can recognize speech, but whether the result is usable as-is afterward.
LocalVoiceFlow is designed with that in mind.
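As a rough illustration of what filler removal and a replacement dictionary can look like, here is a small Python sketch. The filler list, the dictionary entries, and the function name `clean_transcript` are all hypothetical examples, not the app's real rules (which target Japanese).

```python
import re

# Hypothetical examples; a real setup would hold the user's own entries.
FILLERS = ["um", "uh", "you know"]
REPLACEMENTS = {"whisper kit": "WhisperKit", "mac os": "macOS"}


def clean_transcript(text, fillers=FILLERS, replacements=REPLACEMENTS):
    """Remove filler words, apply dictionary corrections, and tidy whitespace."""
    # 1. Drop each filler as a whole word, along with a trailing comma if any.
    for filler in fillers:
        pattern = r"\b" + re.escape(filler) + r"\b,?\s*"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # 2. Replace known misrecognitions with the preferred spelling.
    for wrong, right in replacements.items():
        text = re.sub(re.escape(wrong), lambda _m, r=right: r, text,
                      flags=re.IGNORECASE)
    # 3. Collapse any leftover runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


print(clean_transcript("Um, so I uh tested whisper kit on mac os"))
# prints: so I tested WhisperKit on macOS
```

Keeping the rules in plain lists and dictionaries is what makes this kind of cleanup easy to adjust to one person's way of speaking.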

5. Automatically return to and paste into the original input field

After recording, it returns to the app where you were typing and automatically pastes the result.
I wanted it to fit naturally into everyday workflows like notes, browsers, and chat fields, so this was something I put a lot of emphasis on.

6. Even if pasting fails, the content isn’t lost

Automatic pasting is convenient, but depending on the environment, it can fail.
So I made sure the result is also backed up to history and the clipboard, so that what you said is less likely to disappear.
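One way to get this kind of safety net is to save the text before attempting the paste, so a failure cannot lose it. The Python sketch below shows that ordering under stated assumptions: `deliver`, `paste`, and `copy_to_clipboard` are hypothetical names, the real app would call macOS APIs instead, and the article does not say whether the app backs up before or after a failed paste.

```python
from typing import Callable, List


def deliver(text: str,
            paste: Callable[[str], None],
            copy_to_clipboard: Callable[[str], None],
            history: List[str]) -> bool:
    """Back up the transcript first, then try to paste it.

    Returns True if the paste succeeded, False if it failed. Either way
    the text is already in the history and on the clipboard.
    """
    history.append(text)       # backup 1: history
    copy_to_clipboard(text)    # backup 2: clipboard
    try:
        paste(text)            # may fail, e.g. if the target field lost focus
    except Exception:
        return False
    return True
```

With this ordering, a failed paste still leaves the user able to press the paste shortcut manually or pick the entry from history.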

Main features

LocalVoiceFlow includes all the basic features you need to make voice input as smooth as possible.

Lives in the menu bar: It stays in the macOS menu bar so you can call it up whenever you need it.
Start/stop recording with a shortcut key: Use a preset key operation to start recording, and the same operation to stop. Because it's centered on the keyboard, it's less likely to interrupt your workflow.
Local AI transcription: Supports local transcription using WhisperKit, so audio can be converted to text on the device.
Correction features that make spoken language easier to polish: It's designed not only to use raw transcription results, but also to make it easy to clean up unnecessary phrasing and correct notation.
Automatic paste into the original input field: It returns to the app you were using before recording and puts the result directly into the input field. It's built to be easy to use in everyday work like notes, browsers, and chats.
History saving: Transcription results can be kept in history, so you can look back on them later or reuse them as needed.
Backup on paste failure: If automatic pasting doesn't work, there's a safeguard so the result is less likely to be lost.
Permission and recording state checks: It makes it easy to check microphone and input-assistance permission status, as well as the recording state, so the cause is easier to track down when something goes wrong.
UI focused on ease of use: Since it's meant for daily use, the UI is designed to make status easy to see and the needed actions easy to reach.

What I liked about this app

After actually building it this time, the three biggest benefits were:

1. Almost no recurring monthly cost

I had been using a paid service costing around 2,000 yen per month, which adds up to about 24,000 yen over a year.
By building something local and tailored to myself, I now have a setup I can keep using for almost free, which is a big win.

2. It can be optimized for the way I use it

Off-the-shelf products are highly polished, but they’re inevitably designed for everyone.
With a custom build, I can tailor it for:

the key operations I use most
the way I speak
proper nouns
how the text is cleaned up
the look of the UI

3. The accuracy was better than I expected

This was the part that made me happiest.
Honestly, at first I thought, “If it works well enough locally, that’s good enough.”
But when I actually used it, there were situations where it felt more accurate than the paid app I had been using, which gave me a lot of confidence.
At least for my own use case, I’m very satisfied so far.

Who this is a good fit for

LocalVoiceFlow is a particularly good match for people like:

People who want more comfortable voice input on Mac
People who want to make use of local AI
People who want to reduce monthly subscription costs
People who want to write blogs or notes quickly by voice
People who want to build an input environment optimized just for themselves
People who want results that are a little easier to clean up than standard voice input

Looking ahead

Going forward, I’m considering adding more features if needed.
That said, even in the form I’ve built now, it’s already quite practical.

Personally, it was a big realization to see just how much can be done with local AI.
I had given up halfway through before, so being able to shape it into something real this time makes me genuinely happy.

Summary

LocalVoiceFlow is a local AI voice input app for Mac.
It starts recording with a shortcut key, performs local transcription with WhisperKit, applies corrections, and then automatically pastes the result back into the original input field.

And even if pasting fails, the content is backed up to history and the clipboard, so what you said is less likely to be lost.

After relying on monthly paid services for so long, realizing that “I can actually build something like this myself” was a huge moment for me.
And in practice, not only can I run it at almost no cost, but in some situations it even feels more accurate than the paid apps I had been using before.

I feel that the possibilities of Mac apps powered by local AI are bigger than I had expected.

Addendum: Adding OpenAI support and second-stage AI correction improved the accuracy quite a bit

At first, I built this app so it could be used as locally as possible, but after using it for a while, I found that with long text and complex context, there were still cases where misrecognitions and unnatural phrasing remained.

So this time, I added a high-accuracy transcription feature using OpenAI, and also made it possible to use a “second-stage AI correction” step that reviews and polishes the text once more afterward.
With this, issues that speech recognition alone couldn't fully fix, such as misrecognitions that don't fit the context, unnatural English insertions, and irregular punctuation, now get smoothed out much more naturally than before.

What's especially significant is that the flow no longer just transcribes the text. It can now look at the entire context and correct the text faithfully while changing the meaning as little as possible.
I feel that this has greatly improved practicality.

Of course, using OpenAI does incur API fees.
However, this feature is optional and can be switched on or off, and you can also choose the model used for the final polishing step.
You can configure it for higher quality, or choose a lighter model if you want to keep costs down.
In other words, you can tune it to fit either accuracy-first or cost-first preferences.
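The on/off switch and model choice can be pictured as a small settings object that decides whether the second stage runs at all. The Python sketch below is only an illustration: `CorrectionSettings`, `run_pipeline`, and the placeholder model name are assumptions, and `transcribe` and `polish` stand in for the real speech-recognition and OpenAI calls, which are not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CorrectionSettings:
    second_stage_enabled: bool = False   # off by default: no API fees
    polish_model: str = "light-model"    # placeholder, not a real model ID


def run_pipeline(audio: bytes,
                 transcribe: Callable[[bytes], str],
                 polish: Callable[[str, str], str],
                 settings: CorrectionSettings) -> str:
    """Transcribe the audio, then optionally polish the draft with a second model."""
    draft = transcribe(audio)
    if not settings.second_stage_enabled:
        return draft  # local-only path
    # Second stage: review the whole draft with the chosen model.
    return polish(draft, settings.polish_model)
```

Because the second stage is just a branch, switching it off leaves the first-stage path untouched, and changing `polish_model` is all it takes to trade quality against cost.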

The app still keeps the benefit of working mainly locally, while now being able to push accuracy further with OpenAI when needed.
As I've continued using it and fixing issues one by one, I think it has become much more practical than before.

Going forward, I plan to keep fixing whatever bothers me as I use it.
Rather than leaving it as a prototype, I want to gradually raise its quality as a voice input app that can truly be used in everyday life.

Written By

菅原隆志 (Sugawara Takashi)