03 - Cloud Provider Setup Guide

☁️ Fast AI Processing with OpenAI & Google Gemini
⏱️ Time Estimate: 10-15 minutes
📋 What You’ll Learn: How to configure cloud AI providers, secure API keys, understand the two-stage pipeline, and optimize costs

Why Choose Cloud Providers?
Obtaining API Keys
Secure API Key Storage
Two-Stage LLM Pipeline
Configuring Providers in Selfoss
Provider Comparison
Cost Tracking & Optimization
API Rate Limits
Hybrid Configurations
Troubleshooting

Why Choose Cloud Providers?

✅ Benefits

Feature	Cloud Mode	Local Mode
Speed	⚡ Very Fast (5-30 seconds)	⏱️ Moderate (1-5 minutes)
Setup	✨ Just API keys	🔧 Install Ollama + models
Hardware	☁️ No requirements	💻 8GB+ RAM, good CPU
Accuracy	⭐⭐⭐⭐⭐ State-of-the-art	⭐⭐⭐⭐ Very good
Cost	💰 Pay-per-use ($0.01-0.50/transcript)	✅ Free (after setup)
Internet	❌ Required	✅ Works offline

🎯 Perfect For:

⚡ Users who need fast processing
💼 Low-volume usage (occasional meetings)
🚀 Quick testing and evaluation
🖥️ Users with limited hardware
🌐 Always-online environments

Obtaining API Keys

OpenAI (Whisper + GPT Models)

Step 1: Create an Account

Visit https://platform.openai.com/signup
Sign up with email or Google/Microsoft account
Verify your email address

Step 2: Add Payment Method

Go to https://platform.openai.com/account/billing
Click “Add payment method”
Enter credit card details
Set a monthly spending limit (recommended: $10-50)

💡 Pro Tip: Set a low spending limit initially to avoid surprise charges.

Step 3: Generate API Key

Go to https://platform.openai.com/api-keys
Click “+ Create new secret key”
Give it a name: “Selfoss Desktop App”
Click “Create secret key”
📋 Copy and save the key (you won’t see it again!)

Key Format: sk-proj-... (starts with sk-)

⚠️ Important: Never share your API key. Treat it like a password.

Step 4: Check Credits

New accounts may receive $5 in free credits (these expire after 3 months); availability depends on OpenAI's current offer
View usage at https://platform.openai.com/usage

Google Gemini (Fast & Cost-Effective)

Step 1: Get API Access

Visit https://aistudio.google.com/app/apikey (Google AI Studio, formerly MakerSuite at https://makersuite.google.com/app/apikey)
Sign in with your Google account
Click “Get API key”

Step 2: Create API Key

Click “Create API key”
Select “Create API key in new project” (or use existing)
📋 Copy your API key

Key Format: AIza... (starts with AIza)
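
The key formats noted in this guide (sk- for OpenAI, AIza for Gemini) can be checked before a key is pasted into Selfoss. The following is a minimal sketch, not Selfoss's own validation; the function name and provider labels are hypothetical, and a passing check only means the key looks plausible, not that the provider will accept it.

```python
def check_key_format(provider: str, key: str) -> list[str]:
    """Return a list of problems found; an empty list means the key looks plausible."""
    # Prefixes come from the "Key Format" notes in this guide.
    prefixes = {"openai": "sk-", "gemini": "AIza"}
    if provider not in prefixes:
        return [f"unknown provider: {provider}"]
    problems = []
    if key != key.strip():
        problems.append("key has leading or trailing whitespace")
    if not key.strip().startswith(prefixes[provider]):
        problems.append(f"key should start with {prefixes[provider]!r}")
    return problems


# A key copied with stray spaces is flagged, but its prefix is still fine.
print(check_key_format("openai", " sk-proj-example "))
```

Returning a list rather than a boolean lets a caller show every problem at once, which matches the checklist style used in the Troubleshooting section.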

Step 3: Check Quota

Free tier: 60 requests per minute
Paid tier: Higher limits based on billing

💡 Pro Tip: Gemini 1.5 Flash is significantly cheaper than GPT-4 while maintaining excellent quality.

Secure API Key Storage

Selfoss uses encrypted storage to protect your API keys.

How Selfoss Protects Your Keys

Encrypted at rest: Keys stored in SQLite database with encryption (planned: Tauri secure storage)
Never logged: Keys never appear in debug logs
Local only: Keys are stored only on your device and are sent only to the matching provider's API when a request is made
Secure transmission: HTTPS for all API calls
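
To illustrate the "never logged" idea, here is a small sketch of masking a key before it reaches any log or screen. This is an assumed approach for illustration only; the helper name mask_key is hypothetical and this guide does not describe how Selfoss implements it internally.

```python
def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(key) <= visible:
        # Too short to reveal anything safely: hide it completely.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Only the final four characters remain readable.
print(mask_key("sk-proj-abcd1234"))
```

Showing the last few characters is usually enough to tell two keys apart without exposing either one.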

Best Practices

✅ DO:

Use unique API keys per application
Set spending limits on provider dashboards
Rotate keys periodically (every 3-6 months)
Store a backup of your keys in a password manager
Delete keys from Selfoss before sharing your device

❌ DON’T:

Share keys via email or chat
Commit keys to version control
Use production keys for testing
Leave keys on shared computers

💡 Pro Tip: Create separate API keys for different devices so you can revoke access individually.
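
If you also use these keys in your own scripts outside Selfoss, one common way to keep them out of version control is to read them from environment variables. The sketch below assumes the conventional variable name OPENAI_API_KEY; the helper load_key is hypothetical and not part of Selfoss.

```python
import os


def load_key(env_var: str) -> str:
    """Read an API key from the environment; raise if it is missing or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


try:
    key = load_key("OPENAI_API_KEY")
    # Report only the length so the secret itself is never printed.
    print("key loaded, length", len(key))
except RuntimeError as err:
    print("missing key:", err)
```

Because the key lives in the environment rather than in a source file, nothing sensitive ends up in a commit.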

Two-Stage LLM Pipeline

Selfoss separates AI processing into two independent stages:

Stage 1: Transcription (Audio → Text)

Purpose: Convert audio to text

Providers:

OpenAI Whisper API (cloud, fast, accurate)
Ollama Whisper (local, free, private)

Input: Audio file (.webm, .wav, .mp3)
Output: Plain text transcript

Cost Example (OpenAI):

1 hour audio: ~$0.36 ($0.006/minute)
10 hours: ~$3.60

Stage 2: Analysis (Text → Insights)

Purpose: Extract decisions, actions, concepts

Providers:

OpenAI GPT (GPT-4o, GPT-4o-mini, GPT-3.5-turbo)
Google Gemini (Gemini 1.5 Flash, Gemini 1.5 Pro)
Ollama (Llama 3.1, local)

Input: Plain text transcript
Output: Structured JSON with decisions, actions, concepts
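
As a rough illustration of what "structured JSON with decisions, actions, concepts" could look like to a consumer, the sketch below parses such a document into a typed object. The exact schema is not specified in this guide, so the three top-level keys and the Analysis class are assumptions.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Analysis:
    decisions: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    concepts: list[str] = field(default_factory=list)


def parse_analysis(raw: str) -> Analysis:
    """Parse analysis output, treating any missing section as empty."""
    data = json.loads(raw)
    return Analysis(
        decisions=list(data.get("decisions", [])),
        actions=list(data.get("actions", [])),
        concepts=list(data.get("concepts", [])),
    )


# Hypothetical sample output, for illustration only.
sample = (
    '{"decisions": ["Ship v2 in May"], '
    '"actions": ["Alice drafts the release notes"], '
    '"concepts": ["release planning"]}'
)
result = parse_analysis(sample)
print(result.decisions)
```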

Cost Example (OpenAI GPT-4o-mini):

10-page transcript: ~$0.02-0.05
50-page transcript: ~$0.10-0.25

Why Two Stages?

✅ Flexibility: Mix and match providers (e.g., local transcription + cloud analysis)
✅ Cost optimization: Pair low-cost (or free local) transcription with a stronger analysis model, so spending goes where quality matters most
✅ Privacy control: Keep sensitive audio local, send only text to cloud
✅ Reliability: If one stage fails, you can retry independently
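
The independence of the two stages can be sketched as two swappable functions, where a saved transcript lets the analysis stage be retried on its own. This is an illustrative outline under assumed names (run_pipeline, fake_transcribe, fake_analyze), not Selfoss's actual code.

```python
from typing import Callable, Optional

Transcriber = Callable[[str], str]  # audio path -> transcript text
Analyzer = Callable[[str], dict]    # transcript text -> insights


def run_pipeline(
    audio_path: str,
    transcribe: Transcriber,
    analyze: Analyzer,
    cached_transcript: Optional[str] = None,
) -> tuple[str, dict]:
    # Stage 1 is skipped when a transcript from an earlier run is supplied,
    # so a failed analysis can be retried without re-transcribing the audio.
    if cached_transcript is not None:
        transcript = cached_transcript
    else:
        transcript = transcribe(audio_path)
    # Stage 2 only ever sees text, never the audio file.
    insights = analyze(transcript)
    return transcript, insights


# Stand-in stages for illustration; real ones would call a provider.
def fake_transcribe(path: str) -> str:
    return f"transcript of {path}"


def fake_analyze(text: str) -> dict:
    return {"decisions": [], "actions": [], "concepts": [text.split()[0]]}


text, insights = run_pipeline("meeting.webm", fake_transcribe, fake_analyze)
print(text, insights)
```

Because each stage is just a function argument, swapping a local transcriber for a cloud one does not touch the analysis side.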

Configuring Providers in Selfoss

Configuration UI

Open Settings (⚙️ in header)
Navigate to LLM & Processing section
Configure each stage independently

Example 1: OpenAI for Both Stages

Transcription Settings:

Provider:   OpenAI
Model:      whisper-1
API Key:    sk-proj-xxxxxxxxxxxxx

Analysis Settings:

Provider:   OpenAI
Model:      gpt-4o-mini
API Key:    sk-proj-xxxxxxxxxxxxx (same key)

⏱️ Processing Time: 10-30 seconds total
💰 Cost: ~$0.05-0.15 per transcript

Example 2: Google Gemini (Cost-Optimized)

Transcription Settings:

Provider:   Gemini
Model:      gemini-1.5-flash
API Key:    AIzaxxxxxxxxxxxxxx

Analysis Settings:

Provider:   Gemini
Model:      gemini-1.5-flash
API Key:    AIzaxxxxxxxxxxxxxx (same key)

⏱️ Processing Time: 15-40 seconds total
💰 Cost: ~$0.01-0.05 per transcript (cheaper than OpenAI!)

Example 3: Hybrid (Privacy + Performance)

Transcription Settings:

Provider:   Ollama (local)
Model:      whisper:base
Endpoint:   http://localhost:11434

Analysis Settings:

Provider:   OpenAI
Model:      gpt-4o
API Key:    sk-proj-xxxxxxxxxxxxx

⏱️ Processing Time: 30-90 seconds total
💰 Cost: ~$0.03-0.08 per transcript
🔒 Privacy: Audio stays local, only text sent to cloud
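
The three examples above share one shape: each stage has a provider, a model, and either an API key (cloud) or an endpoint (local). A minimal sketch of that shape with a basic completeness check follows; the StageConfig class and its field names are hypothetical and do not describe Selfoss's settings file.

```python
from dataclasses import dataclass
from typing import Optional

CLOUD_PROVIDERS = {"openai", "gemini"}


@dataclass
class StageConfig:
    provider: str
    model: str
    api_key: Optional[str] = None
    endpoint: Optional[str] = None

    def problems(self) -> list[str]:
        """List missing settings; an empty list means the stage is complete."""
        issues = []
        if self.provider in CLOUD_PROVIDERS and not self.api_key:
            issues.append(f"{self.provider} needs an API key")
        if self.provider == "ollama" and not self.endpoint:
            issues.append("ollama needs an endpoint")
        return issues


# Example 3 (hybrid) expressed as data, using the placeholder values above.
hybrid = {
    "transcription": StageConfig("ollama", "whisper:base", endpoint="http://localhost:11434"),
    "analysis": StageConfig("openai", "gpt-4o", api_key="sk-proj-xxxxxxxxxxxxx"),
}
for stage, cfg in hybrid.items():
    print(stage, cfg.problems())
```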

Provider Comparison

Transcription Providers

Provider	Speed	Accuracy	Cost (1hr audio)	Privacy
OpenAI Whisper	⚡⚡⚡ Fast (30s)	⭐⭐⭐⭐⭐ Excellent	$0.36	☁️ Cloud
Ollama Whisper	⏱️ Moderate (2-5min)	⭐⭐⭐⭐ Very Good	Free	🔒 Local
Gemini Audio	⚡⚡ Fast (45s)	⭐⭐⭐⭐ Very Good	$0.15-0.30	☁️ Cloud

Analysis Providers

Provider	Model	Speed	Quality	Cost (10-page)	Context
OpenAI	GPT-4o	⚡⚡⚡	⭐⭐⭐⭐⭐	$0.05-0.10	128K tokens
OpenAI	GPT-4o-mini	⚡⚡⚡	⭐⭐⭐⭐	$0.01-0.03	128K tokens
OpenAI	GPT-3.5-turbo	⚡⚡⚡	⭐⭐⭐	$0.01-0.02	16K tokens
Gemini	1.5 Flash	⚡⚡	⭐⭐⭐⭐	$0.005-0.015	1M tokens
Gemini	1.5 Pro	⚡	⭐⭐⭐⭐⭐	$0.03-0.08	2M tokens
Ollama	Llama 3.1	⏱️	⭐⭐⭐⭐	Free	128K tokens

Recommended Configurations

🚀 Fastest (High Volume):

Transcription: OpenAI Whisper
Analysis: GPT-4o-mini
Cost: ~$0.05/transcript
Speed: 10-20 seconds

💰 Cheapest (Budget):

Transcription: Gemini Audio
Analysis: Gemini 1.5 Flash
Cost: ~$0.01-0.02/transcript
Speed: 20-40 seconds

🔒 Most Private (Hybrid):

Transcription: Ollama Whisper (local)
Analysis: GPT-4o-mini (cloud)
Cost: ~$0.02/transcript
Speed: 60-120 seconds

⭐ Best Quality:

Transcription: OpenAI Whisper
Analysis: GPT-4o or Gemini 1.5 Pro
Cost: ~$0.08-0.15/transcript
Speed: 15-30 seconds

Cost Tracking & Optimization

Understanding Costs

Transcription Costs:

OpenAI Whisper: $0.006/minute
- 15-minute meeting: $0.09
- 1-hour meeting:   $0.36
- 10 hours/month:   $3.60
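
The figures above follow directly from the per-minute rate. A short sketch that reproduces them, assuming the $0.006/minute price quoted in this guide (check the provider's pricing page for the current rate):

```python
WHISPER_USD_PER_MINUTE = 0.006  # rate quoted in this guide


def transcription_cost(minutes: float) -> float:
    """Estimated transcription cost in USD for the given audio length."""
    return round(minutes * WHISPER_USD_PER_MINUTE, 4)


for label, minutes in [("15-minute meeting", 15), ("1-hour meeting", 60), ("10 hours/month", 600)]:
    print(f"{label}: ${transcription_cost(minutes):.2f}")
```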

Analysis Costs (GPT-4o-mini):

Input:  $0.15 per 1M tokens
Output: $0.60 per 1M tokens

Typical 10-page transcript:
- Input:  ~5,000 tokens  = $0.00075
- Output: ~2,000 tokens  = $0.0012
- Total:                 = $0.002
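
The token arithmetic above can be reproduced with a small helper. The default prices are the GPT-4o-mini rates quoted in this guide, and the token counts are the guide's own approximations; actual counts depend on the transcript and prompt.

```python
def analysis_cost(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_million: float = 0.15,
    output_usd_per_million: float = 0.60,
) -> float:
    """Estimated analysis cost in USD from token counts and per-million prices."""
    input_cost = input_tokens / 1_000_000 * input_usd_per_million
    output_cost = output_tokens / 1_000_000 * output_usd_per_million
    return input_cost + output_cost


# 5,000 input tokens + 2,000 output tokens, as in the example above.
print(f"${analysis_cost(5000, 2000):.5f}")
```

The unrounded total is $0.00195, which the breakdown above rounds to $0.002.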

Combined Cost Examples:

Short meeting (15 min, 3 pages):
Transcription: $0.09
Analysis:      $0.01
Total:         $0.10

Long meeting (2 hours, 40 pages):
Transcription: $0.72
Analysis:      $0.08
Total:         $0.80

Cost Optimization Strategies

Use Ollama for transcription (free, local)
- Saves ~$0.36 per hour of audio
- Only pay for analysis (~$0.02-0.05)
Choose cheaper analysis models
- GPT-4o-mini instead of GPT-4o: 75% savings
- Gemini Flash instead of Pro: 85% savings
Set spending limits
- OpenAI: Set monthly budget cap
- Monitor usage in provider dashboards
Batch process during off-peak hours (only helps if your provider offers volume-based or off-peak pricing)
Pre-process transcripts (clean up noise/filler) to reduce token usage

Monitoring Usage in Selfoss

Automatic cost tracking in Selfoss is a planned feature. Once it is available:

Go to Settings → Usage Statistics
View:
- Total tokens used (by provider)
- Estimated costs
- Usage by project
- Cost trends over time

💡 Pro Tip: Export usage data monthly for expense tracking.

API Rate Limits

OpenAI Limits

Free Tier ($5 credit):

RPM: 3 requests per minute
TPM: 40,000 tokens per minute
Concurrent: 1 request at a time

Paid Tier (Tier 1):

RPM: 500 requests per minute
TPM: 2,000,000 tokens per minute
Concurrent: Multiple

⚠️ What happens if you exceed a limit: The provider returns an HTTP 429 error, and Selfoss retries automatically

Google Gemini Limits

Free Tier:

RPM: 60 requests per minute
RPD: 1,500 requests per day

Paid Tier:

RPM: 1,000+ requests per minute
RPD: 50,000+ requests per day
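
A requests-per-minute limit translates into a minimum spacing between requests. The sketch below does that conversion for the limits listed in this section; the function name is hypothetical, and the limits themselves should be checked against each provider's current documentation.

```python
def min_seconds_between_requests(rpm: int) -> float:
    """Smallest even spacing between requests that stays within an RPM limit."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


# Limits as listed in this guide.
for label, rpm in [("OpenAI free tier", 3), ("Gemini free tier", 60), ("OpenAI Tier 1", 500)]:
    print(f"{label}: one request every {min_seconds_between_requests(rpm):.2f}s")
```

At 3 requests per minute, for example, evenly spaced requests must be at least 20 seconds apart.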

Handling Rate Limits

Selfoss automatically handles rate limits:

Exponential backoff: Waits longer between retries
Queue system: Processes requests sequentially
User feedback: Shows “Rate limit reached, retrying…”
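
Exponential backoff, as named above, can be sketched as follows. This is a generic illustration rather than Selfoss's implementation: RateLimitError stands in for an HTTP 429 response, and the retry count, base delay, and 10% jitter are assumed values.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from a provider."""


def call_with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `request`, retrying on rate limits with a doubling delay."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter
            # so that several clients do not all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter is injectable so that the waiting behaviour can be tested without real delays.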

💡 Pro Tip: If you hit limits frequently, consider upgrading your provider tier or using local processing.

Hybrid Configurations

Combine local and cloud for optimal privacy, cost, and performance.

Scenario 1: Privacy-Conscious

Goal: Keep audio private, use cloud for analysis

Setup:

Transcription: Ollama Whisper (local)
Analysis: GPT-4o-mini (cloud)

Benefits:

🔒 Audio never leaves device
💰 Save transcription costs
⚡ Fast analysis with cloud models

Trade-off: Slower transcription (2-5 min)

Scenario 2: Cost-Optimized

Goal: Minimize API costs

Setup:

Transcription: Ollama Whisper (local)
Analysis: Gemini 1.5 Flash (cloud)

Benefits:

💰 ~$0.01 per transcript
🔒 Audio stays local
⚡ Fast cloud analysis

Trade-off: Moderate transcription speed

Scenario 3: Speed-Optimized

Goal: Fastest possible processing

Setup:

Transcription: OpenAI Whisper (cloud)
Analysis: GPT-4o (cloud)

Benefits:

⚡ 10-20 second total processing
⭐ Best quality results

Trade-off: Higher cost (~$0.40-0.50 per one-hour recording: ~$0.36 for transcription plus ~$0.05-0.10 for GPT-4o analysis)

👉 Learn More: See 13_ADVANCED_WORKFLOWS_GUIDE.md

Troubleshooting

OpenAI Issues

“Invalid API key” error

✅ Check key starts with 'sk-'
✅ Verify no extra spaces
✅ Try creating new key
✅ Check account has credits

“Insufficient quota” error

✅ Add payment method to OpenAI account
✅ Check monthly spending limit
✅ View usage dashboard for current spend

“Rate limit exceeded” error

✅ Wait 60 seconds and retry
✅ Upgrade to paid tier
✅ Process fewer transcripts simultaneously

“Model not found” error

✅ Check model name spelling
✅ Use 'whisper-1' (not 'whisper')
✅ Use 'gpt-4o-mini' (not 'gpt4omini')
✅ Selfoss auto-corrects common typos

Gemini Issues

“API key not valid” error

✅ Check key starts with 'AIza'
✅ Enable Generative Language API in Google Cloud Console
✅ Verify API key restrictions (if any)

“Resource exhausted” (quota) error

✅ Wait for quota reset (per minute/day)
✅ Upgrade to paid tier
✅ Reduce request frequency

“Permission denied” error

✅ Check API is enabled in Google Cloud Console
✅ Verify billing is set up (for paid tier)

Network Issues

“Connection timeout” error

✅ Check internet connection
✅ Verify firewall/proxy settings
✅ Try different network
✅ Check provider status pages:
   - OpenAI: status.openai.com
   - Google: status.cloud.google.com

“SSL certificate” error

✅ Update system certificates
✅ Check system date/time is correct
✅ Disable antivirus SSL scanning temporarily

Next Steps

🎉 Congratulations! You’ve configured cloud AI providers.

Recommended Actions:

🧪 Test with sample transcript - Verify everything works
📊 Monitor costs - Track usage for first month
⚙️ Optimize configuration - Adjust based on usage patterns
🔒 Consider hybrid mode → 13_ADVANCED_WORKFLOWS_GUIDE.md
💾 Set up backups → 09_DATA_MANAGEMENT_GUIDE.md

Advanced Topics:

Custom models for specialized domains
Batch processing for multiple transcripts
Cost forecasting based on usage patterns
Provider switching strategies

☁️ Fast, accurate, and flexible AI processing.
