Skip to content

03 - Cloud Provider Setup Guide

☁️ Fast AI Processing with OpenAI & Google Gemini
⏱️ Time Estimate: 10-15 minutes
📋 What You’ll Learn: How to configure cloud AI providers, secure API keys, understand the two-stage pipeline, and optimize costs



FeatureCloud ModeLocal Mode
Speed⚡ Very Fast (5-30 seconds)⏱️ Moderate (1-5 minutes)
Setup✨ Just API keys🔧 Install Ollama + models
Hardware☁️ No requirements💻 8GB+ RAM, good CPU
Accuracy⭐⭐⭐⭐⭐ State-of-the-art⭐⭐⭐⭐ Very good
Cost💰 Pay-per-use ($0.01-0.50/transcript)✅ Free (after setup)
Internet❌ Required✅ Works offline
  • ⚡ Users who need fast processing
  • 💼 Low-volume usage (occasional meetings)
  • 🚀 Quick testing and evaluation
  • 🖥️ Users with limited hardware
  • 🌐 Always-online environments

Step 1: Create an Account

  1. Visit https://platform.openai.com/signup
  2. Sign up with email or Google/Microsoft account
  3. Verify your email address

Step 2: Add Payment Method

  1. Go to https://platform.openai.com/account/billing
  2. Click “Add payment method”
  3. Enter credit card details
  4. Set a monthly spending limit (recommended: $10-50)

💡 Pro Tip: Set a low spending limit initially to avoid surprise charges.

Step 3: Generate API Key

  1. Go to https://platform.openai.com/api-keys
  2. Click ”+ Create new secret key”
  3. Give it a name: “Selfoss Desktop App”
  4. Click “Create secret key”
  5. 📋 Copy and save the key (you won’t see it again!)

Key Format: sk-proj-... (starts with sk-)

⚠️ Important: Never share your API key. Treat it like a password.

Step 4: Check Credits


Step 1: Get API Access

  1. Visit https://makersuite.google.com/app/apikey
  2. Sign in with your Google account
  3. Click “Get API key”

Step 2: Create API Key

  1. Click “Create API key”
  2. Select “Create API key in new project” (or use existing)
  3. 📋 Copy your API key

Key Format: AIza... (starts with AIza)

Step 3: Check Quota

  • Free tier: 60 requests per minute
  • Paid tier: Higher limits based on billing

💡 Pro Tip: Gemini 1.5 Flash is significantly cheaper than GPT-4 while maintaining excellent quality.


Selfoss uses encrypted storage to protect your API keys.

  1. Encrypted at rest: Keys stored in SQLite database with encryption (planned: Tauri secure storage)
  2. Never logged: Keys never appear in debug logs
  3. Local only: Keys never leave your device
  4. Secure transmission: HTTPS for all API calls

DO:

  • Use unique API keys per application
  • Set spending limits on provider dashboards
  • Rotate keys periodically (every 3-6 months)
  • Store backup of keys in password manager
  • Delete keys from Selfoss before sharing device

DON’T:

  • Share keys via email or chat
  • Commit keys to version control
  • Use production keys for testing
  • Leave keys on shared computers

💡 Pro Tip: Create separate API keys for different devices so you can revoke access individually.


Selfoss separates AI processing into two independent stages:

Purpose: Convert audio to text

Providers:

  • OpenAI Whisper API (cloud, fast, accurate)
  • Ollama Whisper (local, free, private)

Input: Audio file (.webm, .wav, .mp3)
Output: Plain text transcript

Cost Example (OpenAI):

  • 1 hour audio: ~$0.36 ($0.006/minute)
  • 10 hours: ~$3.60

Purpose: Extract decisions, actions, concepts

Providers:

  • OpenAI GPT (GPT-4o, GPT-4o-mini, GPT-3.5-turbo)
  • Google Gemini (Gemini 1.5 Flash, Gemini 1.5 Pro)
  • Ollama (Llama 3.1, local)

Input: Plain text transcript
Output: Structured JSON with decisions, actions, concepts

Cost Example (OpenAI GPT-4o-mini):

  • 10-page transcript: ~$0.02-0.05
  • 50-page transcript: ~$0.10-0.25

Flexibility: Mix and match providers (e.g., local transcription + cloud analysis)
Cost optimization: Use cheap transcription + expensive analysis
Privacy control: Keep sensitive audio local, send only text to cloud
Reliability: If one stage fails, you can retry independently


  1. Open Settings (⚙️ in header)
  2. Navigate to LLM & Processing section
  3. Configure each stage independently

Transcription Settings:

Provider: OpenAI
Model: whisper-1
API Key: sk-proj-xxxxxxxxxxxxx

Analysis Settings:

Provider: OpenAI
Model: gpt-4o-mini
API Key: sk-proj-xxxxxxxxxxxxx (same key)

⏱️ Processing Time: 10-30 seconds total
💰 Cost: ~$0.05-0.15 per transcript

Transcription Settings:

Provider: Gemini
Model: gemini-1.5-flash
API Key: AIzaxxxxxxxxxxxxxx

Analysis Settings:

Provider: Gemini
Model: gemini-1.5-flash
API Key: AIzaxxxxxxxxxxxxxx (same key)

⏱️ Processing Time: 15-40 seconds total
💰 Cost: ~$0.01-0.05 per transcript (cheaper than OpenAI!)

Transcription Settings:

Provider: Ollama (local)
Model: whisper:base
Endpoint: http://localhost:11434

Analysis Settings:

Provider: OpenAI
Model: gpt-4o
API Key: sk-proj-xxxxxxxxxxxxx

⏱️ Processing Time: 30-90 seconds total
💰 Cost: ~$0.03-0.08 per transcript
🔒 Privacy: Audio stays local, only text sent to cloud


ProviderSpeedAccuracyCost (1hr audio)Privacy
OpenAI Whisper⚡⚡⚡ Fast (30s)⭐⭐⭐⭐⭐ Excellent$0.36☁️ Cloud
Ollama Whisper⏱️ Moderate (2-5min)⭐⭐⭐⭐ Very GoodFree🔒 Local
Gemini Audio⚡⚡ Fast (45s)⭐⭐⭐⭐ Very Good$0.15-0.30☁️ Cloud
ProviderModelSpeedQualityCost (10-page)Context
OpenAIGPT-4o⚡⚡⚡⭐⭐⭐⭐⭐$0.05-0.10128K tokens
OpenAIGPT-4o-mini⚡⚡⚡⭐⭐⭐⭐$0.01-0.03128K tokens
OpenAIGPT-3.5-turbo⚡⚡⚡⭐⭐⭐$0.01-0.0216K tokens
Gemini1.5 Flash⚡⚡⭐⭐⭐⭐$0.005-0.0151M tokens
Gemini1.5 Pro⭐⭐⭐⭐⭐$0.03-0.082M tokens
OllamaLlama 3.1⏱️⭐⭐⭐⭐Free128K tokens

🚀 Fastest (High Volume):

  • Transcription: OpenAI Whisper
  • Analysis: GPT-4o-mini
  • Cost: ~$0.05/transcript
  • Speed: 10-20 seconds

💰 Cheapest (Budget):

  • Transcription: Gemini Audio
  • Analysis: Gemini 1.5 Flash
  • Cost: ~$0.01-0.02/transcript
  • Speed: 20-40 seconds

🔒 Most Private (Hybrid):

  • Transcription: Ollama Whisper (local)
  • Analysis: GPT-4o-mini (cloud)
  • Cost: ~$0.02/transcript
  • Speed: 60-120 seconds

⭐ Best Quality:

  • Transcription: OpenAI Whisper
  • Analysis: GPT-4o or Gemini 1.5 Pro
  • Cost: ~$0.08-0.15/transcript
  • Speed: 15-30 seconds

Transcription Costs:

OpenAI Whisper: $0.006/minute
- 15-minute meeting: $0.09
- 1-hour meeting: $0.36
- 10 hours/month: $3.60

Analysis Costs (GPT-4o-mini):

Input: $0.15 per 1M tokens
Output: $0.60 per 1M tokens
Typical 10-page transcript:
- Input: ~5,000 tokens = $0.00075
- Output: ~2,000 tokens = $0.0012
- Total: = $0.002

Combined Cost Examples:

Short meeting (15 min, 3 pages):
Transcription: $0.09
Analysis: $0.01
Total: $0.10
Long meeting (2 hours, 40 pages):
Transcription: $0.72
Analysis: $0.08
Total: $0.80
  1. Use Ollama for transcription (free, local)

    • Saves ~$0.36 per hour of audio
    • Only pay for analysis (~$0.02-0.05)
  2. Choose cheaper analysis models

    • GPT-4o-mini instead of GPT-4o: 75% savings
    • Gemini Flash instead of Pro: 85% savings
  3. Set spending limits

    • OpenAI: Set monthly budget cap
    • Monitor usage in provider dashboards
  4. Batch process during off-peak (if volume-based pricing)

  5. Pre-process transcripts (clean up noise/filler) to reduce token usage

Selfoss tracks costs automatically:

  1. Go to Settings → Usage Statistics (planned feature)
  2. View:
    • Total tokens used (by provider)
    • Estimated costs
    • Usage by project
    • Cost trends over time

💡 Pro Tip: Export usage data monthly for expense tracking.


Free Tier ($5 credit):

  • RPM: 3 requests per minute
  • TPM: 40,000 tokens per minute
  • Concurrent: 1 request at a time

Paid Tier (Tier 1):

  • RPM: 500 requests per minute
  • TPM: 2,000,000 tokens per minute
  • Concurrent: Multiple

⚠️ What happens if exceeded: HTTP 429 error, Selfoss will retry automatically

Free Tier:

  • RPM: 60 requests per minute
  • TPD: 1,500 requests per day

Paid Tier:

  • RPM: 1,000+ requests per minute
  • TPD: 50,000+ requests per day

Selfoss automatically handles rate limits:

  1. Exponential backoff: Waits longer between retries
  2. Queue system: Processes requests sequentially
  3. User feedback: Shows “Rate limit reached, retrying…”

💡 Pro Tip: If you hit limits frequently, consider upgrading your provider tier or using local processing.


Combine local and cloud for optimal privacy, cost, and performance.

Goal: Keep audio private, use cloud for analysis

Setup:

  • Transcription: Ollama Whisper (local)
  • Analysis: GPT-4o-mini (cloud)

Benefits:

  • 🔒 Audio never leaves device
  • 💰 Save transcription costs
  • ⚡ Fast analysis with cloud models

Trade-off: Slower transcription (2-5 min)

Goal: Minimize API costs

Setup:

  • Transcription: Ollama Whisper (local)
  • Analysis: Gemini 1.5 Flash (cloud)

Benefits:

  • 💰 ~$0.01 per transcript
  • 🔒 Audio stays local
  • ⚡ Fast cloud analysis

Trade-off: Moderate transcription speed

Goal: Fastest possible processing

Setup:

  • Transcription: OpenAI Whisper (cloud)
  • Analysis: GPT-4o (cloud)

Benefits:

  • ⚡ 10-20 second total processing
  • ⭐ Best quality results

Trade-off: Higher cost (~$0.40-0.50 per transcript)

👉 Learn More: See 13_ADVANCED_WORKFLOWS_GUIDE.md


“Invalid API key” error

✅ Check key starts with 'sk-'
✅ Verify no extra spaces
✅ Try creating new key
✅ Check account has credits

“Insufficient quota” error

✅ Add payment method to OpenAI account
✅ Check monthly spending limit
✅ View usage dashboard for current spend

“Rate limit exceeded” error

✅ Wait 60 seconds and retry
✅ Upgrade to paid tier
✅ Process fewer transcripts simultaneously

“Model not found” error

✅ Check model name spelling
✅ Use 'whisper-1' (not 'whisper')
✅ Use 'gpt-4o-mini' (not 'gpt4omini')
✅ Selfoss auto-corrects common typos

“API key not valid” error

✅ Check key starts with 'AIza'
✅ Enable Generative Language API in Google Cloud Console
✅ Verify API key restrictions (if any)

“Resource exhausted” (quota) error

✅ Wait for quota reset (per minute/day)
✅ Upgrade to paid tier
✅ Reduce request frequency

“Permission denied” error

✅ Check API is enabled in Google Cloud Console
✅ Verify billing is set up (for paid tier)

“Connection timeout” error

✅ Check internet connection
✅ Verify firewall/proxy settings
✅ Try different network
✅ Check provider status pages:
- OpenAI: status.openai.com
- Google: status.cloud.google.com

“SSL certificate” error

✅ Update system certificates
✅ Check system date/time is correct
✅ Disable antivirus SSL scanning temporarily

🎉 Congratulations! You’ve configured cloud AI providers.

  1. 🧪 Test with sample transcript - Verify everything works
  2. 📊 Monitor costs - Track usage for first month
  3. ⚙️ Optimize configuration - Adjust based on usage patterns
  4. 🔒 Consider hybrid mode13_ADVANCED_WORKFLOWS_GUIDE.md
  5. 💾 Set up backups09_DATA_MANAGEMENT_GUIDE.md
  • Custom models for specialized domains
  • Batch processing for multiple transcripts
  • Cost forecasting based on usage patterns
  • Provider switching strategies

☁️ Fast, accurate, and flexible AI processing.