05 - Audio Recording & Transcription Guide
🎤 Capture Meetings with One Click
⏱️ Time Estimate: 15 minutes
📋 What You’ll Learn: Recording audio, transcription methods, file management, troubleshooting
Table of Contents
- Audio Recording Overview
- Starting Your First Recording
- Recording Interface
- Transcription Methods
- Audio File Management
- Storage Locations
- Transcription Quality Tips
- Troubleshooting
Audio Recording Overview
Feature: F011 - Audio Recording & Transcription
Selfoss allows you to:
- 🎤 Record directly in-app (microphone capture)
- 🔄 Auto-transcribe recordings (optional)
- 🔒 Local or cloud transcription (your choice)
- 💾 Save audio files for future reference
- 🎯 Link to projects automatically
Recording Workflow
Click Record → Select Project → Record Audio → Stop Recording
        ↓
Auto-Transcribe (if enabled) OR Manual Transcribe (later)
        ↓
Process with AI → View Visualizations

Starting Your First Recording
Prerequisites
✅ Microphone access: The browser will request permission
✅ Project created: Select where to save the recording
✅ Transcription configured: Set up Ollama or a cloud provider (optional; can be done later)
Step-by-Step: First Recording
Step 1: Click Microphone Icon
- Located in header (top-right area)
- Icon: 🎤 microphone symbol
- Available on any page
Step 2: Allow Microphone Permission
- Browser popup: “Allow Selfoss to access your microphone?”
- Click “Allow”
- ✅ Permission saved for future recordings
Step 3: Select Project
- Dropdown appears showing all your projects
- Choose destination project
- Recording will be linked to this project
💡 Pro Tip: Create a dedicated “Voice Notes” project for quick recordings.
Step 4: Start Recording
- Click “Start Recording” button
- Recording begins immediately
- Red pulsating indicator shows recording is active
Step 5: Capture Your Audio
- Speak clearly into microphone
- Monitor duration timer (MM:SS format)
- Audio levels indicator shows input (visual feedback)
Step 6: Stop Recording
- Click “Stop Recording” button
- Audio file saved automatically
- Transcript entry appears in project list
⏱️ Processing Time:
- Save audio: Instant
- Auto-transcribe (if enabled): 10 seconds - 5 minutes
Recording Interface
Recording Modal (Simplified in Phase 4)
┌──────────────────────────────────┐
│ 🎤 Recording                     │
├──────────────────────────────────┤
│ Project: [Engineering Team ▼]    │
│                                  │
│ ⏺️ Recording... 03:24            │
│                                  │
│ 🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪ Audio Level  │
│                                  │
│ [⏸️ Pause] [⏹️ Stop & Save]      │
└──────────────────────────────────┘

Recording Indicators
Red Pulsating Dot: 🔴 (top-right of app)
- Visible on all pages while recording
- Ensures you don’t forget an active recording
- Click to return to recording interface
Duration Timer: MM:SS
- Real-time duration display
- Updates every second
- Helps track meeting length
Audio Level Meter: 🟢🟢🟢⚪⚪
- Visual feedback of microphone input
- Green bars show audio strength
- Helps verify microphone is working
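
The two indicators above can be illustrated with a small sketch. It is not Selfoss's actual implementation: the function names `format_duration` and `level_meter`, the 10-segment width, and the use of an RMS level over samples in the range -1.0 to 1.0 are all assumptions chosen to mirror the modal shown earlier.

```python
import math


def format_duration(total_seconds: int) -> str:
    """Format elapsed seconds as MM:SS, the timer format described above."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"


def level_meter(samples: list[float], segments: int = 10) -> str:
    """Render the RMS level of samples (assumed -1.0..1.0) as filled/empty dots."""
    if not samples:
        return "⚪" * segments
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    filled = min(segments, round(rms * segments))
    return "🟢" * filled + "⚪" * (segments - filled)


print(format_duration(204))                   # timer text for 3 min 24 s
print(level_meter([0.6, -0.6, 0.6, -0.6]))    # meter for a moderately loud input
```

A meter that stays fully empty while you speak is the quickest sign that the wrong input device is selected.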
Recording Controls
| Button | Action | Shortcut (Planned) |
|---|---|---|
| ⏺️ Start | Begin recording | Ctrl+R |
| ⏸️ Pause | Pause recording | Ctrl+P |
| ⏹️ Stop | Save & finish | Ctrl+S |
💡 Pro Tip: Test your microphone with a 5-second recording before important meetings.
Transcription Methods
Selfoss supports multiple transcription providers with different trade-offs.
Method 1: Whisper.cpp (Local, Privacy-First)
Provider: Ollama Whisper (runs locally)
Pros:
- 🔒 100% Private - audio never leaves your device
- ✅ Free - no API costs
- 🌐 Offline - works without internet
Cons:
- ⏱️ Slower - roughly 2-12 minutes for 1-hour audio, depending on the model
- 💻 Hardware dependent - requires decent CPU/GPU
Setup:
- Install Ollama → 02_PRIVACY_FIRST_SETUP_GUIDE.md
- Download model: ollama pull whisper:base
- Settings → Transcription Provider → Select “Ollama”
- Model: whisper:base
Processing Time:
Model: whisper:tiny
- 15-min meeting: ~30 seconds
- 1-hour meeting: ~2 minutes
Model: whisper:base (recommended)
- 15-min meeting: ~1 minute
- 1-hour meeting: ~4 minutes
Model: whisper:small
- 15-min meeting: ~3 minutes
- 1-hour meeting: ~12 minutes

Method 2: OpenAI Whisper API (Cloud, Fast)
Provider: OpenAI (cloud service)
Pros:
- ⚡ Very Fast - 30 seconds for 1-hour audio
- ⭐ Excellent Accuracy - state-of-the-art model
- 🖥️ No Hardware Requirements
Cons:
- 💰 Paid - $0.006 per minute ($0.36/hour)
- ☁️ Requires Internet
- 📤 Audio uploaded to OpenAI
Setup:
- Get API key → 03_CLOUD_PROVIDER_SETUP_GUIDE.md
- Settings → Transcription Provider → Select “OpenAI”
- Model: whisper-1
- API Key: sk-proj-...
Processing Time:
- Any length audio: ~20-40 seconds
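
The per-minute price listed under Cons can be turned into a quick estimate. This is a minimal sketch that assumes the $0.006 per minute rate quoted in this guide still applies; the constant and function names are illustrative only, so check current OpenAI pricing before budgeting.

```python
OPENAI_RATE_PER_MINUTE = 0.006  # USD per audio minute, as quoted in this guide


def transcription_cost(minutes: float, rate: float = OPENAI_RATE_PER_MINUTE) -> float:
    """Estimated cost in USD for transcribing the given minutes of audio."""
    return round(minutes * rate, 2)


for label, minutes in [
    ("15-minute meeting", 15),
    ("1-hour meeting", 60),
    ("10 hours/month", 600),
    ("100 hours/month", 6000),
]:
    print(f"{label}: ${transcription_cost(minutes):.2f}")
```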
Cost Examples:
15-minute meeting: $0.09
1-hour meeting: $0.36
10 hours/month: $3.60
100 hours/month: $36.00

Method 3: Gemini Audio (Cloud, Cost-Effective)
Provider: Google Gemini (cloud service)
Pros:
- 💰 Cheaper than OpenAI (~50% cost)
- ⚡ Fast - ~45 seconds for 1-hour audio
- ⭐ Good Accuracy
Cons:
- ☁️ Requires Internet
- 📤 Audio uploaded to Google
- 🆕 Newer - less proven than Whisper
Setup:
- Get API key → 03_CLOUD_PROVIDER_SETUP_GUIDE.md
- Settings → Transcription Provider → Select “Gemini”
- API Key: AIza...
Processing Time:
- Any length audio: ~30-60 seconds
Auto-Transcribe vs Manual
Auto-Transcribe (Recommended):
- Toggle: Settings → “Auto-transcribe after recording”
- ✅ Convenient - no extra clicks
- ✅ Immediate results
- ❌ Uses API quota automatically
Manual Transcribe:
- Button: Three-dot menu → “Transcribe Audio”
- ✅ Control over when to spend quota
- ✅ Review recording first
- ❌ Extra step required
💡 Pro Tip: Use auto-transcribe with local Ollama for unlimited free processing.
Audio File Management
Viewing Recorded Transcripts
Recordings appear in the transcript list with indicators:
📄 Q4 Planning Meeting [🎤 Recording]
   Uploaded: Oct 16, 2024
   Duration: 01:23:45
   Status: ✅ Transcribed

Transcript Actions
Right-click context menu:
- 🎵 Play Audio - Listen to original recording
- 📝 Transcribe Audio - Manual transcription trigger
- ♻️ Re-transcribe - Try different provider/model
- 📥 Download Audio - Export original file
- 🗑️ Delete - Remove transcript and audio
Re-transcribing Recordings
You can transcribe the same recording multiple times:
- Right-click transcript
- Select “Re-transcribe”
- Choose different provider/model
- Compare results between providers
Use cases:
- Test accuracy of different models
- Switch from cloud to local
- Fix poor transcription results
- Try higher-quality model for important meetings
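
When comparing results between providers, a rough similarity score can show how much two transcripts of the same recording diverge. The sketch below uses Python's standard `difflib`; the helper names and the sample sentences are hypothetical, and Selfoss itself may not offer such a comparison.

```python
import difflib


def transcript_similarity(a: str, b: str) -> float:
    """Case-insensitive, word-level similarity between two transcripts (0.0 to 1.0)."""
    matcher = difflib.SequenceMatcher(None, a.lower().split(), b.lower().split())
    return matcher.ratio()


def differing_words(a: str, b: str) -> list[tuple[str, str]]:
    """Return (text_in_a, text_in_b) pairs for every stretch where the two differ."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    return [
        (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


local = "we will ship the app on friday"
cloud = "we will ship the API on friday"
print(transcript_similarity(local, cloud))
print(differing_words(local, cloud))
```

Stretches that differ are usually names, acronyms, or jargon, which is where a larger model tends to help most.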
Exporting Audio Files
Single file:
- Right-click transcript
- Select “Download Audio”
- Choose save location
- ✅ Original WebM file downloaded
Bulk export: (Planned)
- Export all recordings from a project
- Include in full backup (F013)
Storage Locations
Section titled “Storage Locations”Audio File Storage
Section titled “Audio File Storage”Windows:
C:\Users\{Username}\AppData\Roaming\selfoss\audio_recordings\└── project_14\ ├── recording_2024-10-16_14-30-25.webm └── recording_2024-10-16_15-45-10.webmmacOS:
~/Library/Application Support/selfoss/audio_recordings/└── project_14/ ├── recording_2024-10-16_14-30-25.webm └── recording_2024-10-16_15-45-10.webmLinux:
~/.local/share/selfoss/audio_recordings/└── project_14/ ├── recording_2024-10-16_14-30-25.webm └── recording_2024-10-16_15-45-10.webmFile Naming Convention
Section titled “File Naming Convention”recording_YYYY-MM-DD_HH-MM-SS.webm
Examples:recording_2024-10-16_14-30-25.webmrecording_2024-10-16_15-45-10.webmStorage Estimates
Section titled “Storage Estimates”WebM Format (default):
- ~1MB per minute of audio
- 15-minute meeting: ~15MB
- 1-hour meeting: ~60MB
- 10 hours: ~600MB
- 100 hours: ~6GB
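
The estimates above follow from the ~1MB per minute figure. A minimal sketch of the arithmetic, assuming that rate holds for your recordings (actual WebM bitrates vary) and using decimal units as the list above does; the names are illustrative:

```python
MB_PER_MINUTE = 1.0  # approximate WebM rate quoted in this guide


def estimated_storage_mb(hours: float, mb_per_minute: float = MB_PER_MINUTE) -> float:
    """Approximate disk usage in MB for the given hours of recorded audio."""
    return hours * 60 * mb_per_minute


def human_size(mb: float) -> str:
    """Format a size in MB the way the estimates above are written."""
    return f"~{mb / 1000:g}GB" if mb >= 1000 else f"~{mb:g}MB"


for hours in (0.25, 1, 10, 100):
    print(f"{hours:g} hours: {human_size(estimated_storage_mb(hours))}")
```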
Cleanup Recommendations:
Light usage (< 10 hours/month): No cleanup needed
Medium usage (10-50 hours/month): Review quarterly
Heavy usage (> 50 hours/month): Monthly cleanup
💡 Pro Tip: Set a calendar reminder to review and delete old recordings every quarter.
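
For scripting cleanup or backups, the storage layout and naming convention in this section can be reproduced as follows. This is a sketch based only on the paths and filename pattern shown above; the function names are hypothetical, the Windows branch assumes the `APPDATA` environment variable points at `AppData\Roaming`, and the `project_<id>` folder pattern is inferred from the `project_14` example.

```python
import os
import sys
from datetime import datetime
from pathlib import Path

NAME_FORMAT = "recording_%Y-%m-%d_%H-%M-%S.webm"  # pattern documented above


def audio_root(platform: str = sys.platform) -> Path:
    """Return the audio_recordings folder for the given platform string."""
    home = Path.home()
    if platform.startswith("win"):
        base = Path(os.environ.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = home / ".local" / "share"
    return base / "selfoss" / "audio_recordings"


def recording_name(started: datetime) -> str:
    """Build a filename from the recording's start time."""
    return started.strftime(NAME_FORMAT)


def parse_recording_name(name: str) -> datetime:
    """Recover the start time from a recording filename."""
    return datetime.strptime(name, NAME_FORMAT)


project_dir = audio_root() / "project_14"  # folder pattern inferred from the example
print(project_dir / recording_name(datetime(2024, 10, 16, 14, 30, 25)))
```

Parsing the timestamp back out of the filename makes it straightforward to list recordings older than a given date during a quarterly review.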
Transcription Quality Tips
For Best Results
1. Clear Audio:
- ✅ Quiet environment (minimal background noise)
- ✅ Close to microphone (6-12 inches)
- ✅ Quality microphone (not laptop built-in)
- ❌ Avoid: fans, typing, traffic noise
2. Speaking Style:
- ✅ Clear enunciation
- ✅ Moderate pace (not too fast)
- ✅ Avoid crosstalk (one person at a time)
- ✅ Use full sentences
3. Technical Content:
- ✅ Spell out acronyms first time: “API - Application Programming Interface”
- ✅ Use standard pronunciation for technical terms
- ✅ Consider higher-quality model (small/medium) for jargon-heavy content
4. Multiple Speakers:
- ✅ Identify speakers: “This is John speaking…”
- ✅ Allow pauses between speakers
- ✅ Consider external recording device for better quality
Model Selection Guide
| Content Type | Recommended Model | Reasoning |
|---|---|---|
| Quick voice notes | whisper:tiny | Speed over accuracy |
| General meetings | whisper:base | Best balance |
| Technical discussions | whisper:small | Better with jargon |
| Legal/medical | whisper:medium | Highest accuracy needed |
| Noisy environments | OpenAI Whisper | Best noise handling |
| Multiple accents | OpenAI Whisper | Trained on diverse data |
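
If you script your own workflow around these recommendations, the table can be encoded as a simple lookup. The sketch below is illustrative only: the dictionary keys mirror the table rows, the function name is hypothetical, and falling back to `whisper:base` for unknown content types is an assumption based on it being the recommended default.

```python
RECOMMENDED_MODEL = {
    "quick voice notes": "whisper:tiny",
    "general meetings": "whisper:base",
    "technical discussions": "whisper:small",
    "legal/medical": "whisper:medium",
    "noisy environments": "OpenAI Whisper",
    "multiple accents": "OpenAI Whisper",
}


def recommend_model(content_type: str, default: str = "whisper:base") -> str:
    """Look up the recommended model, falling back to the general-purpose default."""
    return RECOMMENDED_MODEL.get(content_type.strip().lower(), default)


print(recommend_model("Technical discussions"))
print(recommend_model("podcast"))  # not in the table, so the default is returned
```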
Troubleshooting
Recording Issues
“Microphone not detected” error
Windows:
✅ Settings → Privacy → Microphone → Allow apps
✅ Check default recording device (Control Panel → Sound)
✅ Test in other apps (Voice Recorder)
macOS:
✅ System Preferences → Security & Privacy → Microphone (System Settings → Privacy & Security on macOS 13 and later)
✅ Enable for Selfoss
✅ Restart app after granting permission
Linux:
# Check available devices
arecord -l

# Test microphone
arecord -d 5 test.wav
aplay test.wav

# Grant permissions (if needed): add your user to the audio group, then log out and back in
sudo usermod -aG audio $USER

“No audio in recording” error
✅ Check microphone is selected (system sound settings)
✅ Test microphone in other apps
✅ Verify audio level meter shows activity during recording
✅ Check mute button on microphone
✅ Try different USB port (for external mics)
“Recording stopped unexpectedly”
Possible causes:
- Low disk space (need > 100MB free)
- System sleep/hibernation
- Browser permission revoked
- Memory exhaustion
Solutions:
✅ Free up disk space
✅ Disable sleep during recording
✅ Close other applications
✅ Record shorter segments (< 2 hours)
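
The low-disk-space cause can be checked before a long recording. A minimal sketch using Python's standard `shutil.disk_usage`, assuming the > 100MB threshold mentioned above is interpreted in binary megabytes; the function name is hypothetical and this is not a check Selfoss is known to expose.

```python
import shutil

MIN_FREE_BYTES = 100 * 1024 * 1024  # the "> 100MB free" threshold from this guide


def has_room_to_record(path: str = ".", min_free: float = MIN_FREE_BYTES) -> bool:
    """Return True if the drive holding `path` has more than `min_free` bytes free."""
    return shutil.disk_usage(path).free > min_free


if has_room_to_record("."):
    print("Enough free space to start recording")
else:
    print("Free up disk space before recording")
```

Pass the audio storage folder for your platform as `path` to check the drive that actually holds the recordings.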
Transcription Issues
“Empty transcription result”
Common causes:
- Audio file too short (< 1 second)
- Audio level too low
- Incorrect audio format
- Transcription model not downloaded
Solutions:
✅ Record at least 5 seconds
✅ Check audio playback (can you hear it?)
✅ Try different transcription provider
✅ For Ollama: ollama pull whisper:base
✅ Check Ollama is running: ollama list

“Transcription failed” error
For Ollama:
# Check Ollama status
ollama list

# Restart Ollama
# Windows: Restart from system tray
# Linux: sudo systemctl restart ollama

# Test Ollama endpoint
curl http://localhost:11434/api/version

For OpenAI:
✅ Verify API key is valid
✅ Check account has credits
✅ Verify file size < 25MB
✅ Check OpenAI status: status.openai.com
For Gemini:
✅ Verify API key is valid
✅ Check API quota not exceeded
✅ Enable Generative Language API in Google Cloud Console

“Inaccurate transcription”
Improve accuracy:
- Use better model:
  - Ollama: Switch from tiny to base or small
  - Cloud: Always uses best model
- Improve audio quality:
  - Reduce background noise
  - Speak closer to microphone
  - Use external microphone
- Try different provider:
  - OpenAI Whisper often best for accents
  - Ollama better for privacy
  - Gemini good middle ground
- Post-process manually:
  - Edit transcript text directly
  - Use interactive editing (F004)
File Size Issues
“Audio file too large” error
File size limits:
- Ollama: No limit (local processing)
- OpenAI: 25MB maximum
- Gemini: 20MB maximum
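
These limits can be checked before uploading. The sketch below encodes the three limits listed above and combines them with the ~1MB per minute storage estimate to show roughly how long a recording can be per provider. The names are hypothetical, and treating 1MB as 1024 × 1024 bytes is an assumption; providers may count sizes differently.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes

# Upload limits in MB as listed above; None means no limit (local processing)
PROVIDER_LIMIT_MB: dict[str, Optional[int]] = {
    "ollama": None,
    "openai": 25,
    "gemini": 20,
}


def fits_provider(size_bytes: int, provider: str) -> bool:
    """Return True if a file of this size is within the provider's limit."""
    limit = PROVIDER_LIMIT_MB[provider]
    return limit is None or size_bytes <= limit * MB


def max_minutes(provider: str, mb_per_minute: float = 1.0) -> Optional[float]:
    """Approximate longest recording that fits, at the guide's ~1MB per minute."""
    limit = PROVIDER_LIMIT_MB[provider]
    return None if limit is None else limit / mb_per_minute


one_hour = 60 * MB  # a 1-hour recording at ~1MB per minute
for provider in PROVIDER_LIMIT_MB:
    print(provider, fits_provider(one_hour, provider), max_minutes(provider))
```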
Solutions:
✅ Record in shorter segments (at ~1MB per minute, roughly under 25 minutes for OpenAI or 20 minutes for Gemini)
✅ Use Ollama for long recordings
✅ Compress audio before upload (future feature)

“Insufficient disk space” error
Check available space:
# Windows
dir C:\Users\{Username}\AppData\Roaming\selfoss

# macOS
df -h ~/Library/Application\ Support/selfoss

# Linux
df -h ~/.local/share/selfoss

Solutions:
✅ Delete old recordings
✅ Move audio files to external drive
✅ Export and delete processed transcripts
✅ Clean up Whisper model cache
Next Steps
🎉 You’re now an audio recording expert!
Recommended Actions:
- 🎤 Practice recording - Test with short samples
- 🔊 Optimize audio setup - Quality microphone, quiet environment
- ⚙️ Choose transcription method - Local vs cloud
- 📊 Process transcripts → 06_TRANSCRIPT_UPLOAD_PROCESSING_GUIDE.md
- 💾 Set up cleanup routine → 09_DATA_MANAGEMENT_GUIDE.md
Advanced Topics:
- System audio capture (Windows WASAPI)
- External recording devices integration
- Batch transcription workflows
- Custom Whisper models for specialized domains
🎤 Capture every insight, never miss a moment.