05 - Audio Recording & Transcription Guide

🎤 Capture Meetings with One Click
⏱️ Time Estimate: 15 minutes
📋 What You’ll Learn: Recording audio, transcription methods, file management, troubleshooting



Feature: F011 - Audio Recording & Transcription

Selfoss allows you to:

  • 🎤 Record directly in-app (microphone capture)
  • 🔄 Auto-transcribe recordings (optional)
  • 🔒 Local or cloud transcription (your choice)
  • 💾 Save audio files for future reference
  • 🎯 Link to projects automatically
Click Record → Select Project → Record Audio → Stop Recording
Auto-Transcribe (if enabled) OR Manual Transcribe (later)
Process with AI → View Visualizations

Microphone access: Browser will request permission
Project created: Select where to save recording
Transcription configured: Set up Ollama or a cloud provider (optional; can be done later)

Step 1: Click Microphone Icon

  • Located in header (top-right area)
  • Icon: 🎤 microphone symbol
  • Available on any page

Step 2: Allow Microphone Permission

  • Browser popup: “Allow Selfoss to access your microphone?”
  • Click “Allow”
  • ✅ Permission saved for future recordings

Step 3: Select Project

  • Dropdown appears showing all your projects
  • Choose destination project
  • Recording will be linked to this project

💡 Pro Tip: Create a dedicated “Voice Notes” project for quick recordings.

Step 4: Start Recording

  • Click “Start Recording” button
  • Recording begins immediately
  • Red pulsating indicator shows recording is active

Step 5: Capture Your Audio

  • Speak clearly into microphone
  • Monitor duration timer (MM:SS format)
  • Audio levels indicator shows input (visual feedback)

Step 6: Stop Recording

  • Click “Stop Recording” button
  • Audio file saved automatically
  • Transcript entry appears in project list

⏱️ Processing Time:

  • Save audio: Instant
  • Auto-transcribe (if enabled): 10 seconds - 5 minutes

┌──────────────────────────────────┐
│ 🎤 Recording │
├──────────────────────────────────┤
│ Project: [Engineering Team ▼] │
│ │
│ ⏺️ Recording... 03:24 │
│ │
│ 🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪ Audio Level │
│ │
│ [⏸️ Pause] [⏹️ Stop & Save] │
└──────────────────────────────────┘

Red Pulsating Dot: 🔴 (top-right of app)

  • Visible on all pages while recording
  • Ensures you don’t forget an active recording
  • Click to return to recording interface

Duration Timer: MM:SS

  • Real-time duration display
  • Updates every second
  • Helps track meeting length

Audio Level Meter: 🟢🟢🟢⚪⚪

  • Visual feedback of microphone input
  • Green bars show audio strength
  • Helps verify microphone is working
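To make the two indicators concrete, here is a minimal sketch of how a duration timer and a ten-segment level meter like the ones in the mockup could be rendered. The function names and the 0.0-1.0 level scale are assumptions for illustration, not Selfoss internals.

```python
def format_duration(total_seconds: int) -> str:
    """Format elapsed seconds as MM:SS, like the duration timer."""
    minutes, seconds = divmod(max(0, int(total_seconds)), 60)
    return f"{minutes:02d}:{seconds:02d}"


def render_level_meter(level: float, segments: int = 10) -> str:
    """Render an input level in [0.0, 1.0] as filled and empty segments."""
    clamped = min(1.0, max(0.0, level))
    filled = round(clamped * segments)
    return "🟢" * filled + "⚪" * (segments - filled)


print(format_duration(204))      # 03:24
print(render_level_meter(0.6))   # six filled segments, four empty
```

In this sketch the minutes field simply keeps counting past 59 for recordings longer than an hour.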
| Button | Action | Shortcut (Planned) |
|---|---|---|
| ⏺️ Start | Begin recording | Ctrl+R |
| ⏸️ Pause | Pause recording | Ctrl+P |
| ⏹️ Stop | Save & finish | Ctrl+S |

💡 Pro Tip: Test your microphone with a 5-second recording before important meetings.


Selfoss supports multiple transcription providers with different trade-offs.

Method 1: Whisper.cpp (Local, Privacy-First)

Provider: Ollama Whisper (runs locally)

Pros:

  • 🔒 100% Private - audio never leaves your device
  • Free - no API costs
  • 🌐 Offline - works without internet

Cons:

  • ⏱️ Slower - roughly 2-12 minutes for 1-hour audio, depending on the model
  • 💻 Hardware dependent - requires decent CPU/GPU

Setup:

  1. Install Ollama → 02_PRIVACY_FIRST_SETUP_GUIDE.md
  2. Download model: ollama pull whisper:base
  3. Settings → Transcription Provider → Select “Ollama”
  4. Model: whisper:base

Processing Time:

Model: whisper:tiny
- 15-min meeting: ~30 seconds
- 1-hour meeting: ~2 minutes
Model: whisper:base (recommended)
- 15-min meeting: ~1 minute
- 1-hour meeting: ~4 minutes
Model: whisper:small
- 15-min meeting: ~3 minutes
- 1-hour meeting: ~12 minutes
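
The timings above scale linearly with audio length, so they can be turned into a rough estimator. This is a sketch only: the speed-up factors are derived from the figures listed here (for example, 60 minutes of audio in about 4 minutes is a factor of 15), and real performance depends on your hardware.

```python
# Approximate speed-up over real time, derived from the timings listed above.
# These are illustrative values, not measured benchmarks.
SPEEDUP = {
    "whisper:tiny": 30.0,   # 1-hour meeting in ~2 minutes
    "whisper:base": 15.0,   # 1-hour meeting in ~4 minutes
    "whisper:small": 5.0,   # 1-hour meeting in ~12 minutes
}


def estimate_transcription_minutes(audio_minutes: float, model: str = "whisper:base") -> float:
    """Estimate local transcription time in minutes for a given model."""
    if model not in SPEEDUP:
        raise ValueError(f"No timing data for model: {model}")
    return audio_minutes / SPEEDUP[model]


for name in SPEEDUP:
    print(f"{name}: 90-minute workshop ≈ {estimate_transcription_minutes(90, name):.0f} min")
```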

Method 2: OpenAI Whisper API (Cloud, Fast)

Provider: OpenAI (cloud service)

Pros:

  • Very Fast - 30 seconds for 1-hour audio
  • Excellent Accuracy - state-of-the-art model
  • 🖥️ No Hardware Requirements

Cons:

  • 💰 Paid - $0.006 per minute ($0.36/hour)
  • ☁️ Requires Internet
  • 📤 Audio uploaded to OpenAI

Setup:

  1. Get API key → 03_CLOUD_PROVIDER_SETUP_GUIDE.md
  2. Settings → Transcription Provider → Select “OpenAI”
  3. Model: whisper-1
  4. API Key: sk-proj-...

Processing Time:

  • Any length audio: ~20-40 seconds

Cost Examples:

15-minute meeting: $0.09
1-hour meeting: $0.36
10 hours/month: $3.60
100 hours/month: $36.00
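
The cost examples follow directly from the per-minute rate. A small sketch, assuming the $0.006 per minute price quoted above (check current OpenAI pricing before budgeting); `transcription_cost` is a hypothetical helper name:

```python
from decimal import Decimal

PRICE_PER_MINUTE = Decimal("0.006")  # USD, the rate quoted in this guide


def transcription_cost(audio_minutes: int) -> Decimal:
    """Return the estimated cost in USD, rounded to cents."""
    return (PRICE_PER_MINUTE * audio_minutes).quantize(Decimal("0.01"))


for label, minutes in [("15-minute meeting", 15), ("1-hour meeting", 60), ("10 hours/month", 600)]:
    print(f"{label}: ${transcription_cost(minutes)}")
```

Decimal is used instead of float so the results match the table to the cent.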

Method 3: Gemini Audio (Cloud, Cost-Effective)

Provider: Google Gemini (cloud service)

Pros:

  • 💰 Cheaper than OpenAI (~50% cost)
  • Fast - ~45 seconds for 1-hour audio
  • Good Accuracy

Cons:

  • ☁️ Requires Internet
  • 📤 Audio uploaded to Google
  • 🆕 Newer - less proven than Whisper

Setup:

  1. Get API key → 03_CLOUD_PROVIDER_SETUP_GUIDE.md
  2. Settings → Transcription Provider → Select “Gemini”
  3. API Key: AIza...

Processing Time:

  • Any length audio: ~30-60 seconds

Auto-Transcribe (Recommended):

  • Toggle: Settings → “Auto-transcribe after recording”
  • ✅ Convenient - no extra clicks
  • ✅ Immediate results
  • ❌ Uses API quota automatically

Manual Transcribe:

  • Button: Three-dot menu → “Transcribe Audio”
  • ✅ Control over when to spend quota
  • ✅ Review recording first
  • ❌ Extra step required

💡 Pro Tip: Use auto-transcribe with local Ollama for unlimited free processing.


Recordings appear in the transcript list with indicators:

📄 Q4 Planning Meeting [🎤 Recording]
Uploaded: Oct 16, 2024
Duration: 01:23:45
Status: ✅ Transcribed

Right-click context menu:

  • 🎵 Play Audio - Listen to original recording
  • 📝 Transcribe Audio - Manual transcription trigger
  • ♻️ Re-transcribe - Try different provider/model
  • 📥 Download Audio - Export original file
  • 🗑️ Delete - Remove transcript and audio

You can transcribe the same recording multiple times:

  1. Right-click transcript
  2. Select “Re-transcribe”
  3. Choose different provider/model
  4. Compare results between providers

Use cases:

  • Test accuracy of different models
  • Switch from cloud to local
  • Fix poor transcription results
  • Try higher-quality model for important meetings

Single file:

  1. Right-click transcript
  2. Select “Download Audio”
  3. Choose save location
  4. ✅ Original WebM file downloaded

Bulk export: (Planned)

  • Export all recordings from a project
  • Include in full backup (F013)

Windows:

C:\Users\{Username}\AppData\Roaming\selfoss\audio_recordings\
└── project_14\
    ├── recording_2024-10-16_14-30-25.webm
    └── recording_2024-10-16_15-45-10.webm

macOS:

~/Library/Application Support/selfoss/audio_recordings/
└── project_14/
    ├── recording_2024-10-16_14-30-25.webm
    └── recording_2024-10-16_15-45-10.webm

Linux:

~/.local/share/selfoss/audio_recordings/
└── project_14/
    ├── recording_2024-10-16_14-30-25.webm
    └── recording_2024-10-16_15-45-10.webm

File naming pattern:

recording_YYYY-MM-DD_HH-MM-SS.webm

Examples:
recording_2024-10-16_14-30-25.webm
recording_2024-10-16_15-45-10.webm
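
If you want to locate a recording from a script, the layout and naming pattern above can be reproduced as follows. This is a sketch based only on the paths shown in this guide; the function names and the example home directories are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath, PureWindowsPath


def recording_filename(started_at: datetime) -> str:
    """Build a file name following recording_YYYY-MM-DD_HH-MM-SS.webm."""
    return started_at.strftime("recording_%Y-%m-%d_%H-%M-%S.webm")


def recording_path(platform: str, home: str, project_id: int, started_at: datetime) -> str:
    """Return the expected location of a recording for the given platform."""
    if platform == "windows":
        base = PureWindowsPath(home) / "AppData" / "Roaming" / "selfoss"
    elif platform == "macos":
        base = PurePosixPath(home) / "Library" / "Application Support" / "selfoss"
    elif platform == "linux":
        base = PurePosixPath(home) / ".local" / "share" / "selfoss"
    else:
        raise ValueError(f"Unknown platform: {platform}")
    return str(base / "audio_recordings" / f"project_{project_id}" / recording_filename(started_at))


started = datetime(2024, 10, 16, 14, 30, 25)
print(recording_path("linux", "/home/ana", 14, started))
print(recording_path("windows", r"C:\Users\Ana", 14, started))
```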

WebM Format (default):

  • ~1MB per minute of audio
  • 15-minute meeting: ~15MB
  • 1-hour meeting: ~60MB
  • 10 hours: ~600MB
  • 100 hours: ~6GB

Cleanup Recommendations:

Light usage (< 10 hours/month): No cleanup needed
Medium usage (10-50 hours/month): Review quarterly
Heavy usage (> 50 hours/month): Monthly cleanup
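
The storage figures and cleanup tiers above can be combined into a quick planning helper. A minimal sketch, assuming the ~1MB per minute WebM estimate from this guide; the function names are illustrative.

```python
MB_PER_MINUTE = 1  # approximate WebM size quoted above


def estimated_storage_mb(hours_of_audio: float) -> float:
    """Estimate disk usage in MB for a given number of recorded hours."""
    return hours_of_audio * 60 * MB_PER_MINUTE


def cleanup_recommendation(hours_per_month: float) -> str:
    """Map monthly recording volume to the cleanup tiers listed above."""
    if hours_per_month < 10:
        return "No cleanup needed"
    if hours_per_month <= 50:
        return "Review quarterly"
    return "Monthly cleanup"


monthly_hours = 20
print(f"~{estimated_storage_mb(monthly_hours):.0f}MB per month:",
      cleanup_recommendation(monthly_hours))
```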

💡 Pro Tip: Set calendar reminder to review and delete old recordings every quarter.


1. Clear Audio:

  • ✅ Quiet environment (minimal background noise)
  • ✅ Close to microphone (6-12 inches)
  • ✅ Quality microphone (not laptop built-in)
  • ❌ Avoid: fans, typing, traffic noise

2. Speaking Style:

  • ✅ Clear enunciation
  • ✅ Moderate pace (not too fast)
  • ✅ Avoid crosstalk (one person at a time)
  • ✅ Use full sentences

3. Technical Content:

  • ✅ Spell out acronyms first time: “API - Application Programming Interface”
  • ✅ Use standard pronunciation for technical terms
  • ✅ Consider higher-quality model (small/medium) for jargon-heavy content

4. Multiple Speakers:

  • ✅ Identify speakers: “This is John speaking…”
  • ✅ Allow pauses between speakers
  • ✅ Consider external recording device for better quality

| Content Type | Recommended Model | Reasoning |
|---|---|---|
| Quick voice notes | whisper:tiny | Speed over accuracy |
| General meetings | whisper:base | Best balance |
| Technical discussions | whisper:small | Better with jargon |
| Legal/medical | whisper:medium | Highest accuracy needed |
| Noisy environments | OpenAI Whisper | Best noise handling |
| Multiple accents | OpenAI Whisper | Trained on diverse data |

“Microphone not detected” error

Windows:

✅ Settings → Privacy → Microphone → Allow apps
✅ Check default recording device (Control Panel → Sound)
✅ Test in other apps (Voice Recorder)

macOS:

✅ System Preferences → Security & Privacy → Microphone
✅ Enable for Selfoss
✅ Restart app after granting permission

Linux:

# Check available devices
arecord -l
# Test microphone
arecord -d 5 test.wav
aplay test.wav
# Grant permissions (if needed): add your user to the audio group, then log out and back in
sudo usermod -aG audio "$USER"

“No audio in recording” error

✅ Check microphone is selected (system sound settings)
✅ Test microphone in other apps
✅ Verify audio level meter shows activity during recording
✅ Check mute button on microphone
✅ Try different USB port (for external mics)

“Recording stopped unexpectedly”

Possible causes:

  • Low disk space (need > 100MB free)
  • System sleep/hibernation
  • Browser permission revoked
  • Memory exhaustion

Solutions:

✅ Free up disk space
✅ Disable sleep during recording
✅ Close other applications
✅ Record shorter segments (< 2 hours)

“Empty transcription result”

Common causes:

  • Audio file too short (< 1 second)
  • Audio level too low
  • Incorrect audio format
  • Transcription model not downloaded

Solutions:

✅ Record at least 5 seconds
✅ Check audio playback (can you hear it?)
✅ Try different transcription provider
✅ For Ollama: ollama pull whisper:base
✅ Check Ollama is running: ollama list

“Transcription failed” error

For Ollama:

# Check Ollama status
ollama list
# Restart Ollama
# Windows: Restart from system tray
# Linux: sudo systemctl restart ollama
# Test Ollama endpoint
curl http://localhost:11434/api/version
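
The same endpoint check can be done from a script, which is handy if you want to verify Ollama before starting a long transcription. A sketch using only the Python standard library; `ollama_version` is a hypothetical helper name, and the default URL is the one used in the curl command above.

```python
import json
import urllib.request
from typing import Optional


def ollama_version(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[str]:
    """Return the Ollama version string, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return payload.get("version") if isinstance(payload, dict) else None


version = ollama_version()
print(f"Ollama is running (version {version})" if version else "Ollama is not reachable")
```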

For OpenAI:

✅ Verify API key is valid
✅ Check account has credits
✅ Verify file size < 25MB
✅ Check OpenAI status: status.openai.com

For Gemini:

✅ Verify API key is valid
✅ Check API quota not exceeded
✅ Enable Generative Language API in Google Cloud Console

“Inaccurate transcription”

Improve accuracy:

  1. Use better model:

    • Ollama: Switch from tiny to base or small
    • Cloud: Always uses best model
  2. Improve audio quality:

    • Reduce background noise
    • Speak closer to microphone
    • Use external microphone
  3. Try different provider:

    • OpenAI Whisper often best for accents
    • Ollama better for privacy
    • Gemini good middle ground
  4. Post-process manually:

    • Edit transcript text directly
    • Use interactive editing (F004)

“Audio file too large” error

File size limits:

  • Ollama: No limit (local processing)
  • OpenAI: 25MB maximum
  • Gemini: 20MB maximum

Solutions:

✅ Record in shorter segments (at ~1MB per minute, under about 25 minutes for OpenAI or 20 minutes for Gemini)
✅ Use Ollama for long recordings
✅ Compress audio before upload (future feature)
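
To see which providers can take a given recording, the limits above can be checked against the file size. This is a sketch under the assumptions stated in this guide (25MB for OpenAI, 20MB for Gemini, no limit for Ollama, ~1MB per minute of WebM audio); the names are illustrative.

```python
from typing import List, Optional

MB_PER_MINUTE = 1.0  # approximate WebM size from the storage section

# Upload limits in MB as listed above; None means no limit (local processing).
MAX_UPLOAD_MB = {"Ollama": None, "OpenAI": 25, "Gemini": 20}


def providers_for(file_size_mb: float) -> List[str]:
    """Return the providers whose size limit accommodates the file."""
    return [name for name, limit in MAX_UPLOAD_MB.items()
            if limit is None or file_size_mb <= limit]


def max_recording_minutes(provider: str) -> Optional[float]:
    """Return the longest recording (in minutes) a provider accepts, or None if unlimited."""
    limit = MAX_UPLOAD_MB[provider]
    return None if limit is None else limit / MB_PER_MINUTE


print(providers_for(60))                 # a 1-hour recording fits Ollama only
print(max_recording_minutes("OpenAI"))   # 25.0
```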

“Insufficient disk space” error

Check available space:

# Windows
dir C:\Users\{Username}\AppData\Roaming\selfoss
# macOS
df -h ~/Library/Application\ Support/selfoss
# Linux
df -h ~/.local/share/selfoss

Solutions:

✅ Delete old recordings
✅ Move audio files to external drive
✅ Export and delete processed transcripts
✅ Clean up Whisper model cache
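
A cross-platform way to check free space before a long recording is Python's standard `shutil.disk_usage`. A minimal sketch, assuming the 100MB minimum mentioned earlier in this guide; the helper names are hypothetical, and you would pass your own Selfoss data directory as the path.

```python
import shutil

REQUIRED_FREE_MB = 100  # minimum free space mentioned in this guide


def free_space_mb(path: str) -> float:
    """Return free space in MB on the drive that holds the given path."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def has_room_to_record(path: str = ".", required_mb: float = REQUIRED_FREE_MB) -> bool:
    """Check whether the drive has more than the required free space."""
    return free_space_mb(path) > required_mb


print(f"Free space: {free_space_mb('.'):.0f}MB, OK to record: {has_room_to_record('.')}")
```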


🎉 You’re now an audio recording expert!

  1. 🎤 Practice recording - Test with short samples
  2. 🔊 Optimize audio setup - Quality microphone, quiet environment
  3. ⚙️ Choose transcription method - Local vs cloud
  4. 📊 Process transcripts → 06_TRANSCRIPT_UPLOAD_PROCESSING_GUIDE.md
  5. 💾 Set up cleanup routine → 09_DATA_MANAGEMENT_GUIDE.md
  • System audio capture (Windows WASAPI)
  • External recording devices integration
  • Batch transcription workflows
  • Custom Whisper models for specialized domains

🎤 Capture every insight, never miss a moment.