03 - Cloud Provider Setup Guide
☁️ Fast AI Processing with OpenAI & Google Gemini
⏱️ Time Estimate: 10-15 minutes
📋 What You’ll Learn: How to configure cloud AI providers, secure API keys, understand the two-stage pipeline, and optimize costs
Table of Contents
- Why Choose Cloud Providers?
- Obtaining API Keys
- Secure API Key Storage
- Two-Stage LLM Pipeline
- Configuring Providers in Selfoss
- Provider Comparison
- Cost Tracking & Optimization
- API Rate Limits
- Hybrid Configurations
- Troubleshooting
Why Choose Cloud Providers?
✅ Benefits
| Feature | Cloud Mode | Local Mode |
|---|---|---|
| Speed | ⚡ Very Fast (5-30 seconds) | ⏱️ Moderate (1-5 minutes) |
| Setup | ✨ Just API keys | 🔧 Install Ollama + models |
| Hardware | ☁️ No requirements | 💻 8GB+ RAM, good CPU |
| Accuracy | ⭐⭐⭐⭐⭐ State-of-the-art | ⭐⭐⭐⭐ Very good |
| Cost | 💰 Pay-per-use ($0.01-0.50/transcript) | ✅ Free (after setup) |
| Internet | ❌ Required | ✅ Works offline |
🎯 Perfect For:
- ⚡ Users who need fast processing
- 💼 Low-volume usage (occasional meetings)
- 🚀 Quick testing and evaluation
- 🖥️ Users with limited hardware
- 🌐 Always-online environments
Obtaining API Keys
OpenAI (Whisper + GPT Models)
Step 1: Create an Account
- Visit https://platform.openai.com/signup
- Sign up with email or Google/Microsoft account
- Verify your email address
Step 2: Add Payment Method
- Go to https://platform.openai.com/account/billing
- Click “Add payment method”
- Enter credit card details
- Set a monthly spending limit (recommended: $10-50)
💡 Pro Tip: Set a low spending limit initially to avoid surprise charges.
Step 3: Generate API Key
- Go to https://platform.openai.com/api-keys
- Click “+ Create new secret key”
- Give it a name: “Selfoss Desktop App”
- Click “Create secret key”
- 📋 Copy and save the key (you won’t see it again!)
Key Format: sk-proj-... (starts with sk-)
⚠️ Important: Never share your API key. Treat it like a password.
Step 4: Check Credits
- New accounts get $5 in free credits (they expire after 3 months)
- View usage at https://platform.openai.com/usage
Google Gemini (Fast & Cost-Effective)
Step 1: Get API Access
- Visit https://makersuite.google.com/app/apikey
- Sign in with your Google account
- Click “Get API key”
Step 2: Create API Key
- Click “Create API key”
- Select “Create API key in new project” (or use existing)
- 📋 Copy your API key
Key Format: AIza... (starts with AIza)
Step 3: Check Quota
- Free tier: 60 requests per minute
- Paid tier: Higher limits based on billing
💡 Pro Tip: Gemini 1.5 Flash is significantly cheaper than GPT-4 while maintaining excellent quality.
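The key formats noted above (`sk-` for OpenAI, `AIza` for Gemini) make a quick sanity check possible before you paste a key into Settings. The following is a minimal sketch, not part of Selfoss: the function name `guess_provider` is hypothetical, and a matching prefix only shows the key looks plausible, not that the provider will accept it.

```python
from typing import Optional


def guess_provider(api_key: str) -> Optional[str]:
    """Guess the provider from the key prefix; return None if unrecognized.

    Hypothetical helper for illustration. It checks only the prefixes
    mentioned in this guide and never contacts the provider.
    """
    key = api_key.strip()  # stray spaces from copy-paste are a common cause of errors
    if key.startswith("sk-"):
        return "openai"
    if key.startswith("AIza"):
        return "gemini"
    return None
```

If the function returns `None`, re-copy the key from the provider dashboard before troubleshooting further.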
Secure API Key Storage
Selfoss uses encrypted storage to protect your API keys.
How Selfoss Protects Your Keys
- Encrypted at rest: Keys are stored encrypted in a SQLite database (planned: Tauri secure storage)
- Never logged: Keys never appear in debug logs
- Local only: Keys are stored only on your device and are sent only to the provider you configured
- Secure transmission: All API calls use HTTPS
Best Practices
✅ DO:
- Use unique API keys per application
- Set spending limits on provider dashboards
- Rotate keys periodically (every 3-6 months)
- Store a backup of your keys in a password manager
- Delete keys from Selfoss before sharing your device
❌ DON’T:
- Share keys via email or chat
- Commit keys to version control
- Use production keys for testing
- Leave keys on shared computers
💡 Pro Tip: Create separate API keys for different devices so you can revoke access individually.
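If you also use these keys in your own scripts outside Selfoss, the same practices apply: keep keys out of source files and never print them in full. The sketch below assumes you store the key in an environment variable. The helper names `load_key` and `mask_key` are hypothetical and do not describe how Selfoss itself stores keys.

```python
import os


def load_key(env_var: str) -> str:
    """Read an API key from an environment variable instead of hard-coding it."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return value


def mask_key(api_key: str, visible: int = 4) -> str:
    """Return a shortened form of the key that is safe to show in logs."""
    key = api_key.strip()
    if len(key) <= visible * 2:
        # Too short to reveal anything safely; hide it entirely.
        return "*" * len(key)
    return f"{key[:visible]}...{key[-visible:]}"
```

Logging `mask_key(key)` rather than `key` lets you confirm which key is in use without exposing it.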
Two-Stage LLM Pipeline
Selfoss separates AI processing into two independent stages:
Stage 1: Transcription (Audio → Text)
Purpose: Convert audio to text
Providers:
- OpenAI Whisper API (cloud, fast, accurate)
- Ollama Whisper (local, free, private)
Input: Audio file (.webm, .wav, .mp3)
Output: Plain text transcript
Cost Example (OpenAI):
- 1 hour audio: ~$0.36 ($0.006/minute)
- 10 hours: ~$3.60
Stage 2: Analysis (Text → Insights)
Purpose: Extract decisions, actions, concepts
Providers:
- OpenAI GPT (GPT-4o, GPT-4o-mini, GPT-3.5-turbo)
- Google Gemini (Gemini 1.5 Flash, Gemini 1.5 Pro)
- Ollama (Llama 3.1, local)
Input: Plain text transcript
Output: Structured JSON with decisions, actions, concepts
Cost Example (OpenAI GPT-4o-mini):
- 10-page transcript: ~$0.02-0.05
- 50-page transcript: ~$0.10-0.25
Why Two Stages?
✅ Flexibility: Mix and match providers (e.g., local transcription + cloud analysis)
✅ Cost optimization: Pair a cheap (or free) transcription provider with a higher-quality analysis model
✅ Privacy control: Keep sensitive audio local, send only text to cloud
✅ Reliability: If one stage fails, you can retry independently
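The two-stage design can be pictured as two interchangeable functions chained together. The sketch below is illustrative only and is not Selfoss's internal code: `run_pipeline`, `PipelineResult`, and the stub providers are hypothetical names, and the stubs stand in for real local or cloud calls so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures, for illustration only.
Transcriber = Callable[[str], str]  # audio file path -> plain-text transcript
Analyzer = Callable[[str], dict]    # transcript -> structured insights


@dataclass
class PipelineResult:
    transcript: str
    insights: dict


def run_pipeline(
    audio_path: str,
    transcribe: Transcriber,
    analyze: Analyzer,
    cached_transcript: Optional[str] = None,
) -> PipelineResult:
    """Run Stage 1 then Stage 2; skip Stage 1 when a transcript already exists."""
    if cached_transcript is None:
        transcript = transcribe(audio_path)
    else:
        transcript = cached_transcript
    return PipelineResult(transcript, analyze(transcript))


# Stub providers so the sketch runs without API keys.
def local_transcriber_stub(audio_path: str) -> str:
    return f"transcript of {audio_path}"


def cloud_analyzer_stub(transcript: str) -> dict:
    return {"decisions": [], "actions": [], "concepts": [],
            "source_chars": len(transcript)}


result = run_pipeline("meeting.webm", local_transcriber_stub, cloud_analyzer_stub)

# Retry only the analysis stage, reusing the transcript from the first run.
retry = run_pipeline("meeting.webm", local_transcriber_stub, cloud_analyzer_stub,
                     cached_transcript=result.transcript)
```

Because each stage is just a function argument, swapping a local transcriber for a cloud one (or retrying analysis alone) does not require touching the other stage.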
Configuring Providers in Selfoss
Configuration UI
- Open Settings (⚙️ in header)
- Navigate to LLM & Processing section
- Configure each stage independently
Example 1: OpenAI for Both Stages
Transcription Settings:
- Provider: OpenAI
- Model: whisper-1
- API Key: sk-proj-xxxxxxxxxxxxx
Analysis Settings:
- Provider: OpenAI
- Model: gpt-4o-mini
- API Key: sk-proj-xxxxxxxxxxxxx (same key)
⏱️ Processing Time: 10-30 seconds total
💰 Cost: ~$0.05-0.15 per transcript
Example 2: Google Gemini (Cost-Optimized)
Transcription Settings:
- Provider: Gemini
- Model: gemini-1.5-flash
- API Key: AIzaxxxxxxxxxxxxxx
Analysis Settings:
- Provider: Gemini
- Model: gemini-1.5-flash
- API Key: AIzaxxxxxxxxxxxxxx (same key)
⏱️ Processing Time: 15-40 seconds total
💰 Cost: ~$0.01-0.05 per transcript (cheaper than OpenAI!)
Example 3: Hybrid (Privacy + Performance)
Transcription Settings:
- Provider: Ollama (local)
- Model: whisper:base
- Endpoint: http://localhost:11434
Analysis Settings:
- Provider: OpenAI
- Model: gpt-4o
- API Key: sk-proj-xxxxxxxxxxxxx
⏱️ Processing Time: 30-90 seconds total
💰 Cost: ~$0.03-0.08 per transcript
🔒 Privacy: Audio stays local, only text sent to cloud
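The three examples above share one shape: each stage has a provider, a model, and either an API key (cloud) or an endpoint (local). The sketch below models that shape to show which fields each kind of provider needs. It is an assumption made for illustration: `StageConfig` and `validate` are hypothetical names, and this is not the format Selfoss uses to save its settings.

```python
from dataclasses import dataclass
from typing import List, Optional

CLOUD_PROVIDERS = {"openai", "gemini"}  # providers that need an API key


@dataclass
class StageConfig:
    provider: str
    model: str
    api_key: Optional[str] = None   # required for cloud providers
    endpoint: Optional[str] = None  # required for local Ollama


def validate(stage: StageConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if stage.provider in CLOUD_PROVIDERS and not stage.api_key:
        problems.append(f"{stage.provider} requires an API key")
    if stage.provider == "ollama" and not stage.endpoint:
        problems.append("ollama requires an endpoint such as http://localhost:11434")
    return problems


# Example 3 (hybrid) expressed with the hypothetical structure above.
hybrid = {
    "transcription": StageConfig("ollama", "whisper:base",
                                 endpoint="http://localhost:11434"),
    "analysis": StageConfig("openai", "gpt-4o", api_key="sk-proj-xxxxxxxxxxxxx"),
}
```

Validating each stage separately mirrors the Settings screen, where transcription and analysis are configured independently.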
Provider Comparison
Transcription Providers
| Provider | Speed | Accuracy | Cost (1hr audio) | Privacy |
|---|---|---|---|---|
| OpenAI Whisper | ⚡⚡⚡ Fast (30s) | ⭐⭐⭐⭐⭐ Excellent | $0.36 | ☁️ Cloud |
| Ollama Whisper | ⏱️ Moderate (2-5min) | ⭐⭐⭐⭐ Very Good | Free | 🔒 Local |
| Gemini Audio | ⚡⚡ Fast (45s) | ⭐⭐⭐⭐ Very Good | $0.15-0.30 | ☁️ Cloud |
Analysis Providers
| Provider | Model | Speed | Quality | Cost (10-page) | Context |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | $0.05-0.10 | 128K tokens |
| OpenAI | GPT-4o-mini | ⚡⚡⚡ | ⭐⭐⭐⭐ | $0.01-0.03 | 128K tokens |
| OpenAI | GPT-3.5-turbo | ⚡⚡⚡ | ⭐⭐⭐ | $0.01-0.02 | 16K tokens |
| Gemini | 1.5 Flash | ⚡⚡ | ⭐⭐⭐⭐ | $0.005-0.015 | 1M tokens |
| Gemini | 1.5 Pro | ⚡ | ⭐⭐⭐⭐⭐ | $0.03-0.08 | 2M tokens |
| Ollama | Llama 3.1 | ⏱️ | ⭐⭐⭐⭐ | Free | 128K tokens |
Recommended Configurations
🚀 Fastest (High Volume):
- Transcription: OpenAI Whisper
- Analysis: GPT-4o-mini
- Cost: ~$0.05/transcript
- Speed: 10-20 seconds
💰 Cheapest (Budget):
- Transcription: Gemini Audio
- Analysis: Gemini 1.5 Flash
- Cost: ~$0.01-0.02/transcript
- Speed: 20-40 seconds
🔒 Most Private (Hybrid):
- Transcription: Ollama Whisper (local)
- Analysis: GPT-4o-mini (cloud)
- Cost: ~$0.02/transcript
- Speed: 60-120 seconds
⭐ Best Quality:
- Transcription: OpenAI Whisper
- Analysis: GPT-4o or Gemini 1.5 Pro
- Cost: ~$0.08-0.15/transcript
- Speed: 15-30 seconds
Cost Tracking & Optimization
Understanding Costs
Transcription Costs:
OpenAI Whisper: $0.006/minute
- 15-minute meeting: $0.09
- 1-hour meeting: $0.36
- 10 hours/month: $3.60
Analysis Costs (GPT-4o-mini):
- Input: $0.15 per 1M tokens
- Output: $0.60 per 1M tokens
Typical 10-page transcript:
- Input: ~5,000 tokens = $0.00075
- Output: ~2,000 tokens = $0.0012
- Total: ≈ $0.002
Combined Cost Examples:
Short meeting (15 min, 3 pages):
- Transcription: $0.09
- Analysis: $0.01
- Total: $0.10
Long meeting (2 hours, 40 pages):
- Transcription: $0.72
- Analysis: $0.08
- Total: $0.80
Cost Optimization Strategies
- Use Ollama for transcription (free, local)
  - Saves ~$0.36 per hour of audio
  - Only pay for analysis (~$0.02-0.05)
- Choose cheaper analysis models
  - GPT-4o-mini instead of GPT-4o: 75% savings
  - Gemini Flash instead of Pro: 85% savings
- Set spending limits
  - OpenAI: Set monthly budget cap
  - Monitor usage in provider dashboards
- Batch process during off-peak hours (only helps if your provider offers volume-based pricing)
- Pre-process transcripts (clean up noise/filler) to reduce token usage
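The per-minute and per-token prices quoted in this section can be combined into a quick estimator. This is a minimal sketch using only the OpenAI Whisper and GPT-4o-mini prices listed in this guide. The function names are hypothetical and provider prices change, so check the current pricing pages before relying on the numbers.

```python
# Prices as quoted in this guide; verify against current provider pricing.
WHISPER_USD_PER_MINUTE = 0.006
GPT4O_MINI_INPUT_USD_PER_MTOK = 0.15   # per 1M input tokens
GPT4O_MINI_OUTPUT_USD_PER_MTOK = 0.60  # per 1M output tokens


def transcription_cost(audio_minutes: float) -> float:
    """Stage 1 cost in USD for OpenAI Whisper."""
    return audio_minutes * WHISPER_USD_PER_MINUTE


def analysis_cost(input_tokens: int, output_tokens: int) -> float:
    """Stage 2 cost in USD for GPT-4o-mini."""
    return (input_tokens * GPT4O_MINI_INPUT_USD_PER_MTOK
            + output_tokens * GPT4O_MINI_OUTPUT_USD_PER_MTOK) / 1_000_000


def meeting_cost(audio_minutes: float, input_tokens: int, output_tokens: int,
                 local_transcription: bool = False) -> float:
    """Total cost; set local_transcription=True when Stage 1 runs on Ollama."""
    stage1 = 0.0 if local_transcription else transcription_cost(audio_minutes)
    return stage1 + analysis_cost(input_tokens, output_tokens)
```

For a one-hour meeting, `transcription_cost(60)` reproduces the $0.36 figure above, and passing `local_transcription=True` shows how much of the total comes from transcription alone.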
Monitoring Usage in Selfoss
Automatic cost tracking in Selfoss is a planned feature. Once it is available:
- Go to Settings → Usage Statistics
- View:
- Total tokens used (by provider)
- Estimated costs
- Usage by project
- Cost trends over time
💡 Pro Tip: Export usage data monthly for expense tracking.
API Rate Limits
OpenAI Limits
Free Tier ($5 credit):
- RPM: 3 requests per minute
- TPM: 40,000 tokens per minute
- Concurrent: 1 request at a time
Paid Tier (Tier 1):
- RPM: 500 requests per minute
- TPM: 2,000,000 tokens per minute
- Concurrent: Multiple
⚠️ If you exceed a limit: the API returns an HTTP 429 error and Selfoss retries automatically.
Google Gemini Limits
Free Tier:
- RPM: 60 requests per minute
- RPD: 1,500 requests per day
Paid Tier:
- RPM: 1,000+ requests per minute
- RPD: 50,000+ requests per day
Handling Rate Limits
Selfoss automatically handles rate limits:
- Exponential backoff: Waits longer between retries
- Queue system: Processes requests sequentially
- User feedback: Shows “Rate limit reached, retrying…”
💡 Pro Tip: If you hit limits frequently, consider upgrading your provider tier or using local processing.
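To make the exponential backoff idea concrete, here is a minimal sketch of a retry loop. It is not Selfoss's implementation: `RateLimitError` is a hypothetical stand-in for an HTTP 429 response, and the retry count, delays, and jitter are assumed values chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from a provider."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), doubling the wait after each rate-limit failure."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            # Small random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Capping the delay with `max_delay` keeps a long outage from producing multi-minute waits, and the `sleep` parameter makes the loop easy to test without actually waiting.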
Hybrid Configurations
Combine local and cloud for optimal privacy, cost, and performance.
Scenario 1: Privacy-Conscious
Goal: Keep audio private, use cloud for analysis
Setup:
- Transcription: Ollama Whisper (local)
- Analysis: GPT-4o-mini (cloud)
Benefits:
- 🔒 Audio never leaves device
- 💰 Save transcription costs
- ⚡ Fast analysis with cloud models
Trade-off: Slower transcription (2-5 min)
Scenario 2: Cost-Optimized
Goal: Minimize API costs
Setup:
- Transcription: Ollama Whisper (local)
- Analysis: Gemini 1.5 Flash (cloud)
Benefits:
- 💰 ~$0.01 per transcript
- 🔒 Audio stays local
- ⚡ Fast cloud analysis
Trade-off: Moderate transcription speed
Scenario 3: Speed-Optimized
Goal: Fastest possible processing
Setup:
- Transcription: OpenAI Whisper (cloud)
- Analysis: GPT-4o (cloud)
Benefits:
- ⚡ 10-20 second total processing
- ⭐ Best quality results
Trade-off: Higher cost (~$0.40-0.50 for a one-hour recording, most of it transcription)
👉 Learn More: See 13_ADVANCED_WORKFLOWS_GUIDE.md
Troubleshooting
OpenAI Issues
“Invalid API key” error
✅ Check key starts with 'sk-'
✅ Verify no extra spaces
✅ Try creating new key
✅ Check account has credits
“Insufficient quota” error
✅ Add payment method to OpenAI account
✅ Check monthly spending limit
✅ View usage dashboard for current spend
“Rate limit exceeded” error
✅ Wait 60 seconds and retry
✅ Upgrade to paid tier
✅ Process fewer transcripts simultaneously
“Model not found” error
✅ Check model name spelling
✅ Use 'whisper-1' (not 'whisper')
✅ Use 'gpt-4o-mini' (not 'gpt4omini')
✅ Selfoss auto-corrects common typos
Gemini Issues
“API key not valid” error
✅ Check key starts with 'AIza'
✅ Enable Generative Language API in Google Cloud Console
✅ Verify API key restrictions (if any)
“Resource exhausted” (quota) error
✅ Wait for quota reset (per minute/day)
✅ Upgrade to paid tier
✅ Reduce request frequency
“Permission denied” error
✅ Check API is enabled in Google Cloud Console
✅ Verify billing is set up (for paid tier)
Network Issues
“Connection timeout” error
✅ Check internet connection
✅ Verify firewall/proxy settings
✅ Try different network
✅ Check provider status pages:
  - OpenAI: status.openai.com
  - Google: status.cloud.google.com
“SSL certificate” error
✅ Update system certificates
✅ Check system date/time is correct
✅ Disable antivirus SSL scanning temporarily
Next Steps
🎉 Congratulations! You’ve configured cloud AI providers.
Recommended Actions:
- 🧪 Test with sample transcript - Verify everything works
- 📊 Monitor costs - Track usage for first month
- ⚙️ Optimize configuration - Adjust based on usage patterns
- 🔒 Consider hybrid mode → 13_ADVANCED_WORKFLOWS_GUIDE.md
- 💾 Set up backups → 09_DATA_MANAGEMENT_GUIDE.md
Advanced Topics:
- Custom models for specialized domains
- Batch processing for multiple transcripts
- Cost forecasting based on usage patterns
- Provider switching strategies
☁️ Fast, accurate, and flexible AI processing.