
02 - Privacy-First Setup Guide

πŸ”’ 100% Local Mode - Zero Cloud Dependencies
⏱️ Time Estimate: 20-30 minutes
πŸ“‹ What You’ll Learn: How to set up Selfoss with complete local processing using Ollama and Whisper.cpp



| Feature | Local Mode | Cloud Mode |
| --- | --- | --- |
| Privacy | πŸ”’ Complete: data never leaves your device | ⚠️ Data sent to AI providers |
| Cost | βœ… Free forever (after initial setup) | πŸ’° Pay-per-use API costs |
| Internet | βœ… Works offline | ❌ Requires connection |
| Data Retention | βœ… No external storage | ⚠️ Stored by providers temporarily |
| Speed | ⏱️ Moderate (depends on hardware) | ⚑ Fast (cloud GPUs) |
| Setup | πŸ”§ Requires installation | ✨ Just need API keys |
Local mode is the right choice for:

  • πŸ’Ό Confidential business meetings
  • πŸ₯ Healthcare discussions (HIPAA compliance)
  • πŸ’° Financial planning sessions
  • πŸ” Security-sensitive environments
  • 🌍 Offline or low-connectivity scenarios
  • πŸ“‰ Cost-conscious users with high volume

Minimum hardware:

  • CPU: Quad-core processor (Intel i5 or AMD equivalent)
  • RAM: 8GB (16GB recommended)
  • Storage: 10GB free space for models
  • GPU: Optional (speeds up processing significantly)

Recommended for Best Performance:

  • CPU: 8+ cores
  • RAM: 16GB+
  • GPU: NVIDIA GPU with 6GB+ VRAM (for GPU acceleration)
  • Storage: SSD with 20GB+ free space

πŸ’‘ Pro Tip: GPU acceleration can reduce transcription time by 5-10x. If you have an NVIDIA GPU, make sure to install CUDA drivers.
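
Not sure whether you have a usable NVIDIA GPU? A quick way to check from a terminal (assuming the NVIDIA drivers are already installed):

```bash
# Lists detected NVIDIA GPUs with driver version and available VRAM
nvidia-smi
```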


Ollama provides the local model runtime used for both transcription and text analysis.

Windows:

  1. Download Ollama:

     Visit https://ollama.com/download and download the Windows installer.

  2. Install:

     Run the downloaded installer and follow the on-screen instructions.

  3. Verify Installation:

     ```bash
     ollama --version
     # Should output: ollama version x.x.x
     ```

  4. Pull Required Models:

     ```bash
     # For text analysis (required)
     ollama pull llama3.1:latest

     # For transcription (required)
     ollama pull whisper:base

     # Optional: larger models for better accuracy
     ollama pull llama3.1:70b
     ollama pull whisper:large
     ```

⏱️ Download Time: 5-15 minutes per model (depending on internet speed)
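
Once the downloads finish, you can confirm the models are available locally:

```bash
# Lists every model Ollama has pulled, with its size on disk
ollama list
```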

macOS:

  1. Download Ollama:

     Visit https://ollama.com/download and download the macOS installer (.dmg).

  2. Install:

     • Open the .dmg file
     • Drag Ollama to Applications
     • Launch Ollama from Applications

  3. Verify Installation:

     ```bash
     ollama --version
     ```

  4. Pull Required Models:

     ```bash
     ollama pull llama3.1:latest
     ollama pull whisper:base
     ```
Linux:

  1. Install via Script:

     ```bash
     curl -fsSL https://ollama.com/install.sh | sh
     ```

  2. Verify Installation:

     ```bash
     ollama --version
     ```

  3. Start the Ollama Service (a health check follows after these steps):

     ```bash
     # Start the service
     sudo systemctl start ollama

     # Enable it on boot
     sudo systemctl enable ollama
     ```

  4. Pull Required Models:

     ```bash
     ollama pull llama3.1:latest
     ollama pull whisper:base
     ```
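
On Linux you can also confirm the service is healthy (the install script registers a systemd unit by default):

```bash
# Check that the Ollama service is running
systemctl status ollama

# Inspect recent service logs if anything looks off
journalctl -u ollama -n 20 --no-pager
```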
Test the installation:

```bash
# Test text generation
ollama run llama3.1:latest "Hello, how are you?"

# Transcription requires an audio file and is tested through the Selfoss interface
```

βœ… Success: If you see a response, Ollama is working!
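
You can also exercise the same HTTP API that Selfoss will use. A minimal non-streaming request against Ollama's generate endpoint (default port assumed):

```bash
# Ask the local server for a one-off completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:latest",
  "prompt": "Reply with one word: ready?",
  "stream": false
}'
```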


Whisper.cpp provides local audio transcription without external APIs.

| Model | Size | Speed | Accuracy | Best For |
| --- | --- | --- | --- | --- |
| tiny.en | ~75MB | ⚑⚑⚑ Very Fast | ⭐⭐ Basic | Quick notes, clear audio |
| base.en | ~140MB | ⚑⚑ Fast | ⭐⭐⭐ Good | General meetings, standard quality |
| small.en | ~460MB | ⚑ Moderate | ⭐⭐⭐⭐ Very Good | Professional meetings, important content |
| medium.en | ~1.5GB | 🐒 Slow | ⭐⭐⭐⭐⭐ Excellent | High-accuracy needs, technical content |
| large-v3 | ~3GB | 🐒🐒 Very Slow | ⭐⭐⭐⭐⭐ Best | Critical transcripts, noisy audio |

πŸ’‘ Recommendation: Start with base.en for the best balance of speed and accuracy.

The easiest way is to use Ollama’s Whisper integration:

```bash
# Download the recommended model
ollama pull whisper:base

# Optional: download other sizes
ollama pull whisper:tiny    # Fastest
ollama pull whisper:small   # Better accuracy
ollama pull whisper:medium  # High accuracy
ollama pull whisper:large   # Best accuracy
```

If you want to use Whisper.cpp directly (without Ollama):

Windows/macOS/Linux:

  1. Download models from Hugging Face
  2. Place in: ~/.cache/whisper/ (Linux/macOS) or %USERPROFILE%\.cache\whisper\ (Windows)
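
If you go the standalone route, the typical build-and-run sequence looks like this (a sketch based on the upstream whisper.cpp repository; binary names and flags can differ between versions, and meeting.wav is a placeholder for your own file):

```bash
# Clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make

# Fetch the recommended English base model
./models/download-ggml-model.sh base.en

# Transcribe a 16 kHz WAV file (placeholder name) and write the transcript next to it
./main -m models/ggml-base.en.bin -f meeting.wav -otxt
```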

Now configure Selfoss to use your local setup.

  1. Launch Selfoss
  2. Click βš™οΈ Settings in the header
  3. Navigate to LLM & Processing section

For Audio β†’ Text (Transcription):

  1. Provider: Select Ollama
  2. Model: Select whisper:base (or your chosen model)
  3. Ollama Endpoint: Leave as http://localhost:11434 (default)
  4. API Key: Leave empty (not needed for local)
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Transcription LLM Settings            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Provider:  [Ollama β–Ό]                 β”‚
β”‚ Model:     [whisper:base β–Ό]           β”‚
β”‚ Endpoint:  http://localhost:11434     β”‚
β”‚ API Key:   (leave empty)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

For Text β†’ Insights (Analysis):

  1. Provider: Select Ollama
  2. Model: Select llama3.1:latest
  3. Ollama Endpoint: Leave as http://localhost:11434
  4. API Key: Leave empty
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Analysis LLM Settings                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Provider:  [Ollama β–Ό]                 β”‚
β”‚ Model:     [llama3.1:latest β–Ό]        β”‚
β”‚ Endpoint:  http://localhost:11434     β”‚
β”‚ API Key:   (leave empty)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

Toggle these settings for convenience:

  • βœ… Auto-transcribe after recording: Automatically process audio
  • βœ… Auto-analyze after transcription: Automatically generate insights

⚠️ Note: Auto-analyze will start immediately after transcription completes.

Click β€œSave Settings” at the bottom of the page.

βœ… Success: You’ll see a confirmation toast notification.


Check the model configuration:

  1. Go to Settings β†’ LLM & Processing
  2. Verify you see your models listed
  3. Endpoint should show http://localhost:11434

βœ… Success: Models are loaded and ready.
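
To double-check from a terminal that the endpoint Selfoss points at is serving your models (default Ollama port assumed):

```bash
# Returns the locally available models as JSON
curl http://localhost:11434/api/tags
```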

Test transcription:

  1. Create a test project
  2. Click the microphone icon 🎀
  3. Record a short message (10-15 seconds)
  4. Stop recording
  5. Wait for auto-transcription (or click β€œTranscribe Audio”)

⏱️ Expected Time:

  • Whisper tiny: ~5-10 seconds
  • Whisper base: ~15-30 seconds
  • Whisper small: ~30-60 seconds

βœ… Success: You see transcribed text in the transcript view.

Test analysis:

  1. Upload a test transcript file (.txt)
  2. Click β€œStart Analysis”
  3. Wait for processing

⏱️ Expected Time:

  • Short transcript (1 page): ~10-20 seconds
  • Medium transcript (5 pages): ~30-60 seconds
  • Long transcript (20+ pages): 2-5 minutes

βœ… Success: You see decisions, actions, and concepts visualized.


Transcription models:

Whisper Tiny (75MB)

  • βœ… Fastest processing (real-time capable)
  • βœ… Minimal disk space
  • ❌ May miss technical terms
  • ❌ Struggles with accents
  • Use for: Quick voice notes, clear audio

Whisper Base (140MB) ⭐ Recommended

  • βœ… Good balance of speed and accuracy
  • βœ… Handles most accents well
  • βœ… Reasonable disk space
  • Use for: General meetings, standard transcription

Whisper Small (460MB)

  • βœ… Excellent accuracy
  • βœ… Better with technical terminology
  • ❌ Slower processing (3-4x base)
  • Use for: Important meetings, professional content

Whisper Large (3GB)

  • βœ… Best possible accuracy
  • βœ… Handles noisy audio well
  • ❌ Very slow (10x base)
  • ❌ Large disk space required
  • Use for: Critical transcripts only

Analysis models:

Llama 3.1 (4GB)

  • βœ… Good general-purpose model
  • βœ… Fast inference
  • βœ… Handles most business content
  • Use for: Standard meeting analysis

Llama 3.1 70B (40GB)

  • βœ… State-of-the-art accuracy
  • βœ… Better reasoning
  • ❌ Requires 48GB+ RAM
  • ❌ Much slower processing
  • Use for: Complex strategic discussions

πŸ’‘ Pro Tip: Use base for transcription and llama3.1 for analysis. This gives you the best performance/quality balance for most use cases.


Disk space by configuration:

Minimal setup:

```
Ollama:           ~500MB
whisper:base:     ~140MB
llama3.1:latest:  ~4GB
─────────────────────────────
Total:            ~5GB
```

Recommended setup:

```
Ollama:           ~500MB
whisper:base:     ~140MB
whisper:small:    ~460MB
llama3.1:latest:  ~4GB
─────────────────────────────
Total:            ~5.5GB
```

Full setup (all models):

```
Ollama:           ~500MB
whisper:tiny:     ~75MB
whisper:base:     ~140MB
whisper:small:    ~460MB
whisper:medium:   ~1.5GB
llama3.1:latest:  ~4GB
llama3.1:70b:     ~40GB
─────────────────────────────
Total:            ~47GB
```
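
To see what your installation actually occupies, measure Ollama's model directory directly (assuming the default storage location):

```bash
# Ollama keeps pulled models under ~/.ollama/models by default
# (on Windows: %USERPROFILE%\.ollama\models)
du -sh ~/.ollama/models
```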

Recordings are stored in:

  • Windows: C:\Users\{Username}\AppData\Roaming\selfoss\audio_recordings\
  • macOS: ~/Library/Application Support/selfoss/audio_recordings/
  • Linux: ~/.local/share/selfoss/audio_recordings/

Estimate: ~1MB per minute of audio (WebM format)

  • 1 hour meeting: ~60MB
  • 10 hours: ~600MB
  • 100 hours: ~6GB

πŸ’‘ Pro Tip: Set up periodic cleanup of old recordings to save space. See 09_DATA_MANAGEMENT_GUIDE.md.
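
As a starting point, here is a sketch of such a cleanup for Linux/macOS (the path comes from the list above; adjust it for your OS, and run the preview line before deleting anything):

```bash
# Report the current size of the recordings folder
du -sh ~/.local/share/selfoss/audio_recordings/

# Preview recordings older than 90 days
find ~/.local/share/selfoss/audio_recordings/ -name '*.webm' -mtime +90 -print

# Delete them once you are happy with the preview
find ~/.local/share/selfoss/audio_recordings/ -name '*.webm' -mtime +90 -delete
```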


β€œCannot connect to Ollama” error

```bash
# Check if Ollama is running
ollama list

# Restart Ollama (Windows): close it from the system tray and relaunch
# Restart Ollama (macOS): quit it from the menu bar icon and relaunch

# Restart Ollama (Linux)
sudo systemctl restart ollama

# Check the endpoint directly
curl http://localhost:11434/api/version
```

Models not appearing in Selfoss

```bash
# List installed models
ollama list

# Pull any missing models
ollama pull whisper:base
ollama pull llama3.1:latest
```

Slow transcription on CPU

  • βœ… Close other applications to free RAM
  • βœ… Use smaller model (tiny or base)
  • βœ… Consider GPU acceleration

β€œModel not found” error

  • βœ… Verify model is downloaded: ollama list
  • βœ… Re-download: ollama pull whisper:base
  • βœ… Restart Ollama service

Empty transcription results

  • βœ… Check the audio file size (must be > 1KB)
  • βœ… Verify the audio duration (minimum 1 second; both file checks can be run from a terminal, as shown below)
  • βœ… Test with a longer recording (30+ seconds)
  • βœ… Try a different model
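
For the file checks (ffprobe ships with ffmpeg, so skip it if you don't have ffmpeg installed; recording.webm is a placeholder for your own file):

```bash
# File size at a glance
ls -lh recording.webm

# Container, codec, and duration details
ffprobe -hide_banner recording.webm
```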

Very slow processing

  • βœ… Use a smaller model (tiny or base)
  • βœ… Close other applications
  • βœ… Check CPU usage in task manager
  • βœ… Consider upgrading hardware

Speed up transcription:

  1. Use whisper:tiny for quick drafts
  2. Enable GPU acceleration (NVIDIA GPUs only)
  3. Close resource-intensive applications
  4. Upgrade RAM if using large models

Speed up analysis:

  1. Use standard llama3.1:latest (not 70B)
  2. Process shorter transcripts
  3. Disable auto-analyze for batch processing
  4. Consider cloud provider for complex analysis

πŸŽ‰ Congratulations! You’ve set up 100% local processing.

  1. πŸ“Š Test with real meetings - Record or upload actual transcripts
  2. ⚑ Optimize models - Experiment with different sizes for your hardware
  3. πŸ’Ύ Set up backups β†’ 09_DATA_MANAGEMENT_GUIDE.md
  4. πŸ”„ Try hybrid mode - Use local transcription + cloud analysis β†’ 13_ADVANCED_WORKFLOWS_GUIDE.md

To go further, explore these advanced topics (a batch-processing sketch follows below):

  • GPU acceleration for faster processing
  • Custom model tuning for domain-specific accuracy
  • Batch processing scripts for multiple files
  • Docker deployment for isolated environments
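
For example, a batch-transcription sketch using whisper.cpp directly (assumes the standalone build from the earlier section; file names are placeholders):

```bash
# Transcribe every WAV file in the current directory,
# writing a .txt transcript alongside each recording
for f in *.wav; do
  ./main -m models/ggml-base.en.bin -f "$f" -otxt
done
```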

πŸ”’ Your data, your device, your control.