05 - Audio Recording & Transcription Guide

🎤 Capture Meetings with One Click
⏱️ Time Estimate: 15 minutes
📋 What You’ll Learn: Recording audio, transcription methods, file management, troubleshooting

Audio Recording Overview
Starting Your First Recording
Recording Interface
Transcription Methods
Audio File Management
Storage Locations
Transcription Quality Tips
Troubleshooting

Audio Recording Overview

Feature: F011 - Audio Recording & Transcription

Selfoss allows you to:

🎤 Record directly in-app (microphone capture)
🔄 Auto-transcribe recordings (optional)
🔒 Local or cloud transcription (your choice)
💾 Save audio files for future reference
🎯 Link to projects automatically

Recording Workflow

Click Record → Select Project → Record Audio → Stop Recording
   ↓
Auto-Transcribe (if enabled) OR Manual Transcribe (later)
   ↓
Process with AI → View Visualizations

Starting Your First Recording

Prerequisites

✅ Microphone access: Browser will request permission
✅ Project created: Select where to save recording
✅ Transcription configured: Set up Ollama or a cloud provider (optional; can be done later)

Step-by-Step: First Recording

Step 1: Click Microphone Icon

Located in header (top-right area)
Icon: 🎤 microphone symbol
Available on any page

Step 2: Allow Microphone Permission

Browser popup: “Allow Selfoss to access your microphone?”
Click “Allow”
✅ Permission saved for future recordings

Step 3: Select Project

Dropdown appears showing all your projects
Choose destination project
Recording will be linked to this project

💡 Pro Tip: Create a dedicated “Voice Notes” project for quick recordings.

Step 4: Start Recording

Click “Start Recording” button
Recording begins immediately
Red pulsating indicator shows recording is active

Step 5: Capture Your Audio

Speak clearly into microphone
Monitor duration timer (MM:SS format)
Audio levels indicator shows input (visual feedback)

Step 6: Stop Recording

Click “Stop Recording” button
Audio file saved automatically
Transcript entry appears in project list

⏱️ Processing Time:

Save audio: Instant
Auto-transcribe (if enabled): about 10 seconds to several minutes, depending on provider, model, and recording length

Recording Interface

┌──────────────────────────────────┐
│ 🎤 Recording                     │
├──────────────────────────────────┤
│ Project: [Engineering Team   ▼] │
│                                  │
│ ⏺️  Recording...    03:24       │
│                                  │
│ 🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪  Audio Level  │
│                                  │
│ [⏸️ Pause]    [⏹️ Stop & Save]   │
└──────────────────────────────────┘

Recording Indicators

Red Pulsating Dot: 🔴 (top-right of app)

Visible on all pages while recording
Ensures you don’t forget an active recording
Click to return to recording interface

Duration Timer: MM:SS

Real-time duration display
Updates every second
Helps track meeting length

Audio Level Meter: 🟢🟢🟢⚪⚪

Visual feedback of microphone input
Green bars show audio strength
Helps verify microphone is working
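
As a rough illustration of how the duration timer and the level meter could be rendered, here is a minimal sketch. The function names are hypothetical and not taken from Selfoss. It also assumes that the timer switches to HH:MM:SS after one hour (as in a duration like 01:23:45) and that the meter has 10 segments, as in the interface mock-up.

```python
def format_duration(total_seconds: int) -> str:
    """Format elapsed seconds as MM:SS, or HH:MM:SS once past an hour (assumed)."""
    if total_seconds < 0:
        raise ValueError("duration cannot be negative")
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


def render_level_meter(level: float, segments: int = 10) -> str:
    """Draw a meter for an input level in the range 0.0 to 1.0."""
    level = max(0.0, min(1.0, level))  # clamp out-of-range input
    lit = round(level * segments)      # number of green segments
    return "🟢" * lit + "⚪" * (segments - lit)


print(format_duration(204))      # 03:24
print(render_level_meter(0.6))   # 🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪
```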

Recording Controls

Button	Action	Shortcut (Planned)
⏺️ Start	Begin recording	`Ctrl+R`
⏸️ Pause	Pause recording	`Ctrl+P`
⏹️ Stop	Save & finish	`Ctrl+S`

💡 Pro Tip: Test your microphone with a 5-second recording before important meetings.

Transcription Methods

Selfoss supports multiple transcription providers with different trade-offs.

Method 1: Whisper.cpp (Local, Privacy-First)

Provider: Ollama Whisper (runs locally)

Pros:

🔒 100% Private - audio never leaves your device
✅ Free - no API costs
🌐 Offline - works without internet

Cons:

⏱️ Slower - roughly 2-12 minutes for 1-hour audio, depending on model
💻 Hardware dependent - requires decent CPU/GPU

Setup:

Install Ollama → 02_PRIVACY_FIRST_SETUP_GUIDE.md
Download model: ollama pull whisper:base
Settings → Transcription Provider → Select “Ollama”
Model: whisper:base

Processing Time:

Model: whisper:tiny
- 15-min meeting: ~30 seconds
- 1-hour meeting: ~2 minutes

Model: whisper:base (recommended)
- 15-min meeting: ~1 minute
- 1-hour meeting: ~4 minutes

Model: whisper:small
- 15-min meeting: ~3 minutes
- 1-hour meeting: ~12 minutes
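
The figures above scale linearly with recording length, so a rough estimate for other lengths can be sketched as below. The per-minute rates are derived from the table (for example, whisper:base takes ~1 minute per 15 minutes of audio, or 4 seconds per audio minute). The function name is hypothetical, and real timings depend on your hardware.

```python
# Processing seconds per minute of audio, derived from the estimates above.
SECONDS_PER_AUDIO_MINUTE = {
    "whisper:tiny": 2,    # 15 min -> ~30 s
    "whisper:base": 4,    # 15 min -> ~1 min
    "whisper:small": 12,  # 15 min -> ~3 min
}


def estimate_processing_seconds(model: str, audio_minutes: int) -> int:
    """Rough local transcription time in seconds; hardware will shift this."""
    return SECONDS_PER_AUDIO_MINUTE[model] * audio_minutes


print(estimate_processing_seconds("whisper:base", 60))   # 240
print(estimate_processing_seconds("whisper:small", 15))  # 180
```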

Method 2: OpenAI Whisper API (Cloud, Fast)

Provider: OpenAI (cloud service)

Pros:

⚡ Very Fast - 30 seconds for 1-hour audio
⭐ Excellent Accuracy - state-of-the-art model
🖥️ No Hardware Requirements

Cons:

💰 Paid - $0.006 per minute ($0.36/hour)
☁️ Requires Internet
📤 Audio uploaded to OpenAI

Setup:

Get API key → 03_CLOUD_PROVIDER_SETUP_GUIDE.md
Settings → Transcription Provider → Select “OpenAI”
Model: whisper-1
API Key: sk-proj-...

Processing Time:

Any recording within the 25MB upload limit: ~20-40 seconds

Cost Examples:

15-minute meeting: $0.09
1-hour meeting:    $0.36
10 hours/month:    $3.60
100 hours/month:   $36.00
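
The cost examples follow directly from the $0.006 per minute rate. A small sketch for estimating your own usage is shown below. The function name is hypothetical, and the rate is the one quoted in this guide, which may change.

```python
from decimal import Decimal

# Rate quoted in this guide; check current provider pricing before relying on it.
OPENAI_RATE_PER_MINUTE = Decimal("0.006")


def openai_cost(audio_minutes: int) -> Decimal:
    """Estimated transcription cost in USD, rounded to cents."""
    return (OPENAI_RATE_PER_MINUTE * audio_minutes).quantize(Decimal("0.01"))


print(openai_cost(15))       # 0.09
print(openai_cost(10 * 60))  # 3.60
```

Decimal is used instead of float so that the results round to cents predictably.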

Method 3: Gemini Audio (Cloud, Cost-Effective)

Provider: Google Gemini (cloud service)

Pros:

💰 Cheaper than OpenAI (about half the cost)
⚡ Fast - ~45 seconds for 1-hour audio
⭐ Good Accuracy

Cons:

☁️ Requires Internet
📤 Audio uploaded to Google
🆕 Newer - less proven than Whisper

Setup:

Get API key → 03_CLOUD_PROVIDER_SETUP_GUIDE.md
Settings → Transcription Provider → Select “Gemini”
API Key: AIza...

Processing Time:

Any recording within the 20MB upload limit: ~30-60 seconds

Auto-Transcribe vs Manual

Auto-Transcribe (Recommended):

Toggle: Settings → “Auto-transcribe after recording”
✅ Convenient - no extra clicks
✅ Immediate results
❌ Uses API quota automatically

Manual Transcribe:

Button: Three-dot menu → “Transcribe Audio”
✅ Control over when to spend quota
✅ Review recording first
❌ Extra step required

💡 Pro Tip: Use auto-transcribe with local Ollama for unlimited free processing.

Audio File Management

Viewing Recorded Transcripts

Recordings appear in the transcript list with indicators:

📄 Q4 Planning Meeting           [🎤 Recording]
   Uploaded: Oct 16, 2024
   Duration: 01:23:45
   Status: ✅ Transcribed

Transcript Actions

Right-click context menu:

🎵 Play Audio - Listen to original recording
📝 Transcribe Audio - Manual transcription trigger
♻️ Re-transcribe - Try different provider/model
📥 Download Audio - Export original file
🗑️ Delete - Remove transcript and audio

Re-transcribing Recordings

You can transcribe the same recording multiple times:

Right-click transcript
Select “Re-transcribe”
Choose different provider/model
Compare results between providers

Use cases:

Test accuracy of different models
Switch from cloud to local
Fix poor transcription results
Try higher-quality model for important meetings

Exporting Audio Files

Single file:

Right-click transcript
Select “Download Audio”
Choose save location
✅ Original WebM file downloaded

Bulk export: (Planned)

Export all recordings from a project
Include in full backup (F013)

Storage Locations

Audio File Storage

Windows:

C:\Users\{Username}\AppData\Roaming\selfoss\audio_recordings\
└── project_14\
    ├── recording_2024-10-16_14-30-25.webm
    └── recording_2024-10-16_15-45-10.webm

macOS:

~/Library/Application Support/selfoss/audio_recordings/
└── project_14/
    ├── recording_2024-10-16_14-30-25.webm
    └── recording_2024-10-16_15-45-10.webm

Linux:

~/.local/share/selfoss/audio_recordings/
└── project_14/
    ├── recording_2024-10-16_14-30-25.webm
    └── recording_2024-10-16_15-45-10.webm

File Naming Convention

recording_YYYY-MM-DD_HH-MM-SS.webm

Examples:
recording_2024-10-16_14-30-25.webm
recording_2024-10-16_15-45-10.webm
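
If you script against the recordings folder (for example, to sort or archive files), the naming convention maps onto a single date format string. The sketch below is illustrative only, and the helper names are hypothetical.

```python
from datetime import datetime

# recording_YYYY-MM-DD_HH-MM-SS.webm expressed as a strftime/strptime pattern
FILENAME_FORMAT = "recording_%Y-%m-%d_%H-%M-%S.webm"


def recording_filename(started_at: datetime) -> str:
    """Build a file name from the recording start time."""
    return started_at.strftime(FILENAME_FORMAT)


def parse_recording_filename(name: str) -> datetime:
    """Recover the start time from a file name; raises ValueError on mismatch."""
    return datetime.strptime(name, FILENAME_FORMAT)


print(recording_filename(datetime(2024, 10, 16, 14, 30, 25)))
# recording_2024-10-16_14-30-25.webm
```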

Storage Estimates

WebM Format (default):

~1MB per minute of audio
15-minute meeting: ~15MB
1-hour meeting: ~60MB
10 hours: ~600MB
100 hours: ~6GB
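
These estimates all use the ~1MB per minute rule of thumb. A minimal sketch for other amounts follows. The function name is hypothetical, and actual WebM sizes vary with the audio content.

```python
MB_PER_MINUTE = 1  # approximate WebM size per minute, from the estimate above


def estimate_storage_mb(hours: float) -> float:
    """Approximate disk usage in MB for the given hours of recordings."""
    return hours * 60 * MB_PER_MINUTE


print(estimate_storage_mb(10))   # 600
print(estimate_storage_mb(100))  # 6000
```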

Cleanup Recommendations:

Light usage (< 10 hours/month):  No cleanup needed
Medium usage (10-50 hours/month): Review quarterly
Heavy usage (> 50 hours/month):   Monthly cleanup
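
The same thresholds can be expressed as a small lookup, for example in a script that reminds you when a cleanup is due. This is a sketch with a hypothetical function name. It treats exactly 10 and exactly 50 hours as medium usage.

```python
def cleanup_recommendation(hours_per_month: float) -> str:
    """Map monthly recording hours to the cleanup cadence listed above."""
    if hours_per_month < 10:
        return "No cleanup needed"
    if hours_per_month <= 50:
        return "Review quarterly"
    return "Monthly cleanup"


print(cleanup_recommendation(25))  # Review quarterly
```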

💡 Pro Tip: Set a calendar reminder to review and delete old recordings every quarter.

Transcription Quality Tips

For Best Results

1. Clear Audio:

✅ Quiet environment (minimal background noise)
✅ Close to microphone (6-12 inches)
✅ Quality microphone (not laptop built-in)
❌ Avoid: fans, typing, traffic noise

2. Speaking Style:

✅ Clear enunciation
✅ Moderate pace (not too fast)
✅ Avoid crosstalk (one person at a time)
✅ Use full sentences

3. Technical Content:

✅ Spell out acronyms the first time: “API - Application Programming Interface”
✅ Use standard pronunciation for technical terms
✅ Consider higher-quality model (small/medium) for jargon-heavy content

4. Multiple Speakers:

✅ Identify speakers: “This is John speaking…”
✅ Allow pauses between speakers
✅ Consider external recording device for better quality

Model Selection Guide

Content Type	Recommended Model	Reasoning
Quick voice notes	whisper:tiny	Speed over accuracy
General meetings	whisper:base	Best balance
Technical discussions	whisper:small	Better with jargon
Legal/medical	whisper:medium	Highest accuracy needed
Noisy environments	OpenAI Whisper	Best noise handling
Multiple accents	OpenAI Whisper	Trained on diverse data

Troubleshooting

Recording Issues

“Microphone not detected” error

Windows:

✅ Settings → Privacy → Microphone → Allow apps
✅ Check default recording device (Control Panel → Sound)
✅ Test in other apps (Voice Recorder)

macOS:

✅ System Settings → Privacy & Security → Microphone (on macOS 12 and earlier: System Preferences → Security & Privacy → Microphone)
✅ Enable for Selfoss
✅ Restart app after granting permission

Linux:

# Check available devices
arecord -l

# Test microphone
arecord -d 5 test.wav
aplay test.wav

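# Grant permissions (if needed): add your user to the audio group,
# then log out and back in for the change to take effect
sudo usermod -aG audio "$USER"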

“No audio in recording” error

✅ Check microphone is selected (system sound settings)
✅ Test microphone in other apps
✅ Verify audio level meter shows activity during recording
✅ Check mute button on microphone
✅ Try different USB port (for external mics)

“Recording stopped unexpectedly”

Possible causes:

Low disk space (need > 100MB free)
System sleep/hibernation
Browser permission revoked
Memory exhaustion

Solutions:

✅ Free up disk space
✅ Disable sleep during recording
✅ Close other applications
✅ Record shorter segments (< 2 hours)

Transcription Issues

“Empty transcription result”

Common causes:

Audio file too short (< 1 second)
Audio level too low
Incorrect audio format
Transcription model not downloaded

Solutions:

✅ Record at least 5 seconds
✅ Check audio playback (can you hear it?)
✅ Try different transcription provider
✅ For Ollama: ollama pull whisper:base
✅ Check Ollama is running: ollama list

“Transcription failed” error

For Ollama:

# Check Ollama status
ollama list

# Restart Ollama
# Windows: Restart from system tray
# Linux: sudo systemctl restart ollama

# Test Ollama endpoint
curl http://localhost:11434/api/version
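
The same endpoint check can be scripted, which helps if you want to verify Ollama before starting a long transcription. This is a minimal sketch using only the Python standard library. The function name is hypothetical, and it assumes Ollama is listening on its default address, as in the curl command above.

```python
import json
import urllib.request


def ollama_version(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return the version string reported by Ollama, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError, AttributeError):
        # OSError covers refused connections and timeouts;
        # ValueError/AttributeError cover an unexpected response body.
        return None


version = ollama_version()
print(f"Ollama is running (version {version})" if version else "Ollama is not reachable")
```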

For OpenAI:

✅ Verify API key is valid
✅ Check account has credits
✅ Verify file size < 25MB
✅ Check OpenAI status: status.openai.com

For Gemini:

✅ Verify API key is valid
✅ Check API quota not exceeded
✅ Enable Generative Language API in Google Cloud Console

“Inaccurate transcription”

Improve accuracy:

Use better model:
- Ollama: Switch from tiny to base or small
- Cloud: the provider's hosted model is always used, so there is no model tier to change
Improve audio quality:
- Reduce background noise
- Speak closer to microphone
- Use external microphone
Try different provider:
- OpenAI Whisper often best for accents
- Ollama better for privacy
- Gemini good middle ground
Post-process manually:
- Edit transcript text directly
- Use interactive editing (F004)

File Size Issues

“Audio file too large” error

File size limits:

Ollama: No limit (local processing)
OpenAI: 25MB maximum
Gemini: 20MB maximum
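
To see in advance which providers can take a given recording, the limits above can be checked with a few lines of code. This is an illustrative sketch. The dictionary keys and function name are hypothetical, and the limits are the ones listed in this guide.

```python
# Upload limits in MB as listed above; None means no limit (local processing).
UPLOAD_LIMIT_MB = {"ollama": None, "openai": 25, "gemini": 20}


def providers_accepting(file_size_mb: float) -> list:
    """Return the providers whose size limit allows a file of this size."""
    return [
        name
        for name, limit in UPLOAD_LIMIT_MB.items()
        if limit is None or file_size_mb <= limit
    ]


print(providers_accepting(22))  # ['ollama', 'openai']
print(providers_accepting(60))  # ['ollama']
```

At ~1MB per minute, a 1-hour recording (~60MB) fits only the local option.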

Solutions:

✅ Record in shorter segments for cloud providers (at ~1MB per minute, about 25 minutes for OpenAI and 20 minutes for Gemini)
✅ Use Ollama for long recordings
✅ Compress audio before upload (future feature)

“Insufficient disk space” error

Check available space:

# Windows (free space is reported on the last line of the output)
dir C:\Users\{Username}\AppData\Roaming\selfoss

# macOS
df -h ~/Library/Application\ Support/selfoss

# Linux
df -h ~/.local/share/selfoss
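
A cross-platform way to run the same check is sketched below with Python's standard library. The function name is hypothetical. The 100MB threshold is the minimum free space mentioned under “Recording stopped unexpectedly”, and the path should be your platform's storage location from the Storage Locations section.

```python
import shutil

MIN_FREE_BYTES = 100 * 1024 * 1024  # 100MB minimum mentioned in this guide


def has_room_to_record(path: str = ".") -> bool:
    """True if the drive holding `path` has more than 100MB free."""
    return shutil.disk_usage(path).free > MIN_FREE_BYTES


free_mb = shutil.disk_usage(".").free // (1024 * 1024)
print(f"{free_mb} MB free; enough to record: {has_room_to_record('.')}")
```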

Solutions:

✅ Delete old recordings
✅ Move audio files to external drive
✅ Export and delete processed transcripts
✅ Clean up Whisper model cache

Next Steps

🎉 You’re now an audio recording expert!

Recommended Actions:

🎤 Practice recording - Test with short samples
🔊 Optimize audio setup - Quality microphone, quiet environment
⚙️ Choose transcription method - Local vs cloud
📊 Process transcripts → 06_TRANSCRIPT_UPLOAD_PROCESSING_GUIDE.md
💾 Set up cleanup routine → 09_DATA_MANAGEMENT_GUIDE.md

Advanced Topics:

System audio capture (Windows WASAPI)
External recording devices integration
Batch transcription workflows
Custom Whisper models for specialized domains

🎤 Capture every insight, never miss a moment.
