14 - Privacy & Security Guide

🔒 Data Protection & Security Best Practices
⏱️ Time Estimate: 15 minutes
📋 What You’ll Learn: Data handling, security features, privacy compliance, best practices

Privacy Philosophy
Local-Only Storage Architecture
Encrypted API Key Storage
No Telemetry Policy
Cloud Provider Privacy
API Key Security
GDPR Compliance
Enterprise Deployment
Secure Backup Strategies

Privacy Philosophy

Selfoss Core Principles

Privacy by Design:

🔒 Local-first: All data stored on your device
🚫 No tracking: Zero telemetry or analytics
🔐 Encrypted keys: Secure API key storage
📍 Data control: You own all your data
🌐 Offline capable: Works without internet (with local models)

User Control:

Choose your AI providers
Decide what goes to cloud
Full export/delete capabilities
Transparent data handling

Local-Only Storage Architecture

Where Your Data Lives

Everything stays local:

Your Device
├── Selfoss Database (SQLite)
│   ├── Projects
│   ├── Transcripts
│   ├── Processed data
│   └── Settings
│
├── Audio Recordings
│   └── WebM files by project
│
└── Whisper Models
    └── Local AI models

What this means:

✅ No Selfoss cloud servers
✅ No data uploaded to us
✅ Complete data sovereignty
✅ Works offline with local models

Data Storage Locations

Windows:

C:\Users\{Username}\AppData\Roaming\selfoss\

macOS:

~/Library/Application Support/selfoss/

Linux:

~/.local/share/selfoss/

Security features:

User-specific directories (OS-level isolation)
File system permissions (read/write by user only)
No shared storage
No system-wide data
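
When scripting backups or audits, the per-platform locations above can be resolved programmatically. This is a minimal sketch, not Selfoss code: the function name `selfoss_data_dir` is hypothetical, and it assumes the default locations listed in this guide (on Windows it reads `%APPDATA%`, which normally points at `AppData\Roaming`).

```python
import os
import sys
from pathlib import Path


def selfoss_data_dir(platform: str = sys.platform) -> Path:
    """Return the default Selfoss data directory listed in this guide."""
    home = Path.home()
    if platform.startswith("win"):
        # %APPDATA% normally resolves to the user's AppData/Roaming folder
        appdata = os.environ.get("APPDATA", str(home / "AppData" / "Roaming"))
        return Path(appdata) / "selfoss"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "selfoss"
    # Linux and other Unix-like systems
    return home / ".local" / "share" / "selfoss"


print(selfoss_data_dir())
```

Passing the platform as a parameter keeps the function easy to check on a machine other than the one being documented.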

Network Activity

Selfoss ONLY connects to:

1. License server (LemonSqueezy)
   - Purpose: License validation
   - Frequency: On activation only
   - Data sent: License key, device ID

2. AI providers (if configured)
   - OpenAI, Google, Ollama (if remote)
   - Purpose: Transcription/analysis
   - Frequency: Per transcript processing
   - Data sent: Audio/text for processing

Selfoss NEVER connects to:

❌ Analytics servers
❌ Tracking services
❌ Ad networks
❌ Selfoss company servers (except license)

Encrypted API Key Storage

Current Implementation

Storage method:

SQLite database with application-level encryption
Location: selfoss.db (encrypted fields)
Algorithm: AES-256 (migration to OS keyring planned)

How it works:

User enters API key in Settings
Key encrypted before database storage
Decrypted only when needed for API calls
Never logged or exposed

Future Enhancement: OS Keyring

Planned migration (F007):

Windows:

Windows Credential Manager
System-level encryption
Requires user authentication

macOS:

Keychain Access
Secure Enclave (on supported devices)
TouchID/Password required

Linux:

Secret Service API (libsecret)
GNOME Keyring / KWallet
User password required

Benefits:

✅ OS-level security
✅ Separate from app data
✅ Hardware-backed encryption (where available)
✅ Better protection if device stolen

Security Best Practices

For API keys:

DO:

✅ Use unique keys per application
✅ Set spending limits on provider dashboards
✅ Rotate keys every 3 months
✅ Store backup in password manager (encrypted)
✅ Delete keys before device disposal

DON’T:

❌ Share keys via email/chat
❌ Commit to version control
❌ Use same key on shared devices
❌ Screenshot keys
❌ Leave keys on public computers

No Telemetry Policy

What We DON’T Collect

Zero data collection:

❌ Usage statistics
❌ Feature usage
❌ Error reports (unless you submit)
❌ Analytics
❌ Crash reports
❌ Device information (apart from the device ID sent for license activation)
❌ IP addresses

Why This Matters

Your privacy:

No user profiling
No behavior tracking
No data monetization
No third-party sharing (because there’s no data!)

Transparency:

Open source codebase
Auditable network calls
Clear privacy policy
Community verification

Verification

How to verify:

1. Use network monitoring tool:
   - Wireshark (desktop)
   - Charles Proxy (mobile)
   - Browser DevTools (web version)

2. Monitor outbound connections:
   - Should only see AI provider calls
   - License validation on activation
   - No other network activity

3. Review source code:
   - GitHub: shobankr/selfoss
   - Search for analytics calls
   - None found!
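
Step 2 can be partly automated once you have exported the hostnames seen by your monitoring tool. The sketch below is a hypothetical helper (`unexpected_hosts` is not part of Selfoss). Its allowlist uses the hostnames given under Network Configuration in this guide. If you use a remote Ollama server, add its hostname as well.

```python
# Hostnames taken from the firewall rules in this guide.
ALLOWED_HOSTS = {
    "api.openai.com",
    "generativelanguage.googleapis.com",
    "license.lemonsqueezy.com",
}


def unexpected_hosts(observed, allowed=ALLOWED_HOSTS):
    """Return the observed hostnames that are not on the allowlist, sorted."""
    # Normalize case, surrounding whitespace, and a trailing DNS root dot.
    normalized = {host.strip().lower().rstrip(".") for host in observed}
    return sorted(normalized - set(allowed))


# Example with made-up capture data:
print(unexpected_hosts(["api.openai.com", "tracker.example.net"]))
```

Any hostname the function returns is worth investigating before you conclude that no other network activity takes place.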

Cloud Provider Privacy

When You Use Cloud AI

What leaves your device:

For transcription:

🎵 Audio file (OpenAI, Gemini)
⏱️ Duration metadata
🔤 Returned: Text transcript

For analysis:

📄 Text transcript
🤖 Returned: Structured JSON (decisions, actions, concepts)

What doesn’t leave:

❌ Project names
❌ Other transcripts
❌ Database contents
❌ API keys (sent only in the authentication header to the provider, never as request content)
❌ Personal information (unless in transcript)

Provider Data Policies

OpenAI:

Data retention: 30 days (API data)
Training: Not used (as of 2024)
Policy: api.openai.com/data-usage

Google Gemini:

Data retention: Per user agreement
Training: Not used for improvement (standard)
Policy: cloud.google.com/terms/aup

Ollama (Local):

Data retention: N/A (never leaves device)
Training: N/A
Policy: Runs on your machine

Minimizing Cloud Exposure

Strategy 1: Sanitize before sending

Before processing:
1. Remove names (replace with roles)
2. Redact sensitive numbers
3. Generalize places/companies
4. Remove context not needed for analysis

Example:
Before: "John Smith at Acme Corp, account #12345"
After: "Team member at Company A, account redacted"
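
The Before/After example can be reproduced with a small redaction pass over the transcript text. This is a minimal sketch under stated assumptions: the `REPLACEMENTS` mapping and the account-number pattern are illustrative, `sanitize` is a hypothetical helper rather than a Selfoss feature, and real transcripts need a mapping built per project plus a manual review.

```python
import re

# Hypothetical mapping of sensitive terms to neutral placeholders.
REPLACEMENTS = {
    "John Smith": "Team member",
    "Acme Corp": "Company A",
}

# Matches "#" followed by four or more digits, e.g. "#12345".
ACCOUNT_NUMBER = re.compile(r"#\d{4,}")


def sanitize(text: str, replacements: dict[str, str] = REPLACEMENTS) -> str:
    """Replace known names and redact account-like numbers."""
    for original, placeholder in replacements.items():
        text = re.sub(re.escape(original), placeholder, text, flags=re.IGNORECASE)
    return ACCOUNT_NUMBER.sub("redacted", text)


print(sanitize("John Smith at Acme Corp, account #12345"))
# prints: Team member at Company A, account redacted
```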

Strategy 2: Local transcription only

Audio → Ollama Whisper → Text (local)
Text → Review → Redact → Send to cloud analysis

Strategy 3: Fully local

Audio → Ollama Whisper → Text (local)
Text → Ollama Llama → Analysis (local)
Zero cloud exposure
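
Strategy 3 only holds if every configured endpoint really is on your own machine. The following sketch shows one way to check a URL before use. `is_local_endpoint` is a hypothetical helper, not a Selfoss function, and the port in the example is Ollama's port 11434 as listed under Network Configuration.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name), so treat it as non-local.
        return False


print(is_local_endpoint("http://localhost:11434"))
print(is_local_endpoint("https://api.openai.com/v1"))
```

A shared Ollama server on the corporate network fails this check by design: data stays inside the organization but does leave the device.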

API Key Security

Secure Key Management

Best practices:

Generation:

1. Use provider's dashboard (official only)
2. Give descriptive name ("Selfoss Desktop")
3. Set permissions (read-only where possible)
4. Set spending limits
5. Note creation date for rotation

Storage:

Primary: Selfoss app (encrypted)
Backup: Password manager (1Password, Bitwarden)
Never: Plain text files, screenshots, email

Rotation:

Schedule:
- Every 3 months: Routine rotation
- Immediately: If compromised
- Before: Selling/giving away device
- After: Shared device usage

Process:
1. Generate new key (keep old active)
2. Update Selfoss settings
3. Test new key works
4. Revoke old key
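
The schedule above can be turned into a simple reminder check. This is a sketch only: `rotation_due` is a hypothetical helper, and "every 3 months" is approximated as 90 days.

```python
from datetime import date, timedelta

# "Every 3 months", approximated as 90 days (an assumption of this sketch).
ROTATION_INTERVAL = timedelta(days=90)


def rotation_due(created: date, today: date, compromised: bool = False) -> bool:
    """Return True if a key should be rotated now."""
    if compromised:
        return True  # rotate immediately, regardless of age
    return today - created >= ROTATION_INTERVAL


print(rotation_due(date(2024, 1, 1), date(2024, 3, 1)))   # 60 days old
print(rotation_due(date(2024, 1, 1), date(2024, 3, 31)))  # 90 days old
```

Recording the creation date when the key is generated (step 5 under Generation) is what makes this check possible.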

If Key Compromised

Immediate actions:

1. Revoke key on provider dashboard
   - OpenAI: platform.openai.com/api-keys
   - Gemini: console.cloud.google.com

2. Generate new key
   - Different name
   - New permissions

3. Update Selfoss
   - Settings → API keys
   - Test connection

4. Monitor usage
   - Check for unauthorized calls
   - Verify billing

5. Report if fraudulent
   - Contact provider support
   - Dispute unauthorized charges

Key Permissions

Principle of least privilege:

For transcription:

OpenAI:
- Required: Whisper API access
- Not needed: GPT models, DALL-E, etc.

Gemini:
- Required: Generative Language API
- Not needed: Other Google Cloud services

For analysis:

OpenAI:
- Required: GPT API access
- Not needed: Whisper, DALL-E, etc.

Gemini:
- Required: Generative Language API

GDPR Compliance

Self-Hosting Benefits

GDPR principles:

1. Data Minimization:

✅ Only stores necessary data
✅ No telemetry/tracking
✅ User controls what’s processed

2. Purpose Limitation:

✅ Data used only for transcription/analysis
✅ Not shared with third parties
✅ Not used for other purposes

3. Storage Limitation:

✅ User controls retention
✅ Easy deletion (project/transcript level)
✅ Complete data export

4. Right to Access:

✅ All data accessible locally
✅ Full export capabilities
✅ No barriers to data access

5. Right to Erasure:

✅ Delete projects/transcripts
✅ Uninstall = complete removal
✅ No cloud data to delete

6. Data Portability:

✅ Export in standard formats (JSON, PDF, CSV)
✅ No vendor lock-in
✅ Easy migration

For Organizations

GDPR considerations:

Data Controller:

Organization using Selfoss
Controls what data is processed
Responsible for compliance

Data Processor (AI Providers):

OpenAI, Google (if used)
Process data on behalf of controller
Have their own GDPR compliance

Recommendations:

1. Use local-only mode for sensitive data
   - Ollama for transcription
   - Ollama for analysis
   - Zero data to processors

2. Data Processing Agreements (DPA):
   - OpenAI: Available in dashboard
   - Google: Available in Cloud Console
   - Review and sign before use

3. Employee training:
   - What data can be processed
   - Redaction procedures
   - Handling sensitive information

4. Audit logs:
   - Track what was processed
   - Who processed it
   - When and why

5. Regular reviews:
   - Quarterly data audits
   - Provider policy updates
   - Compliance verification
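
Recommendation 4 does not prescribe a log format. One option is an append-only JSON Lines file with one record per processed item. The sketch below is illustrative: the field names and both helper functions are assumptions, not a Selfoss feature.

```python
import json
from datetime import datetime, timezone


def audit_entry(user: str, item: str, provider: str, reason: str) -> str:
    """Build one JSON line recording what was processed, by whom, when and why."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "item": item,
        "provider": provider,
        "reason": reason,
    }
    return json.dumps(record, sort_keys=True)


def append_audit(path: str, line: str) -> None:
    """Append one entry to a JSON Lines audit file."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")


print(audit_entry("alice", "transcript-001", "ollama", "meeting minutes"))
```

Log identifiers rather than transcript content, so the audit file does not itself become a store of sensitive data.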

Enterprise Deployment

Deployment Models

Model 1: Individual Installations

Each user's device:
├── Own Selfoss installation
├── Own database (isolated)
├── Own API keys (or shared)
└── Own backups

Pros:
✅ Maximum isolation
✅ No shared infrastructure
✅ User-specific settings

Cons:
❌ No centralized management
❌ Individual license per user

Model 2: Shared Ollama Server

Corporate network:
├── Central Ollama server
│   └── All models cached
└── User devices
    └── Selfoss pointing to server

Pros:
✅ Shared model downloads
✅ GPU-powered processing
✅ Consistent performance

Cons:
❌ Network dependency
❌ Potential bottleneck

Model 3: Air-Gapped Deployment

Secure environment:
├── No internet access
├── Local Ollama only
└── Manual model transfer

Pros:
✅ Maximum security
✅ No data exfiltration risk
✅ Compliance-friendly

Cons:
❌ No cloud models
❌ Manual updates

Network Configuration

Firewall rules:

Outbound allowed (if using cloud):
- api.openai.com (443)
- generativelanguage.googleapis.com (443)
- license.lemonsqueezy.com (443)

Inbound: None required

For Ollama server:
- Internal network: Port 11434

Proxy configuration:

Selfoss → Settings → Advanced → Proxy
HTTP Proxy: http://proxy.company.com:8080
HTTPS Proxy: https://proxy.company.com:8443

Centralized License Management

Volume licensing:

Contact for enterprise:
- Multiple seat licenses
- Centralized billing
- Admin dashboard (planned)
- SSO integration (future)

Secure Backup Strategies

Backup Security

Threat model:

Device theft
Hardware failure
Accidental deletion
Ransomware

Protection layers:

Layer 1: Encryption

Encrypt backups before storing:

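# Linux/macOS
# Note: zip -e uses legacy ZipCrypto, which is weak; GPG with AES-256 is a safer choice
tar -czf - selfoss_backup/ | gpg --symmetric --cipher-algo AES256 -o backup.tar.gz.gpg
# Enter passphrase when prompted

# Windows (7-Zip; -mhe=on also encrypts file names)
7z a -p -mhe=on backup.7z selfoss_backup\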

Layer 2: Off-site Storage

Store in multiple locations:
1. Local drive (primary)
2. External drive (secondary)
3. Cloud storage (encrypted) (tertiary)

Layer 3: Access Control

Limit backup access:
- Password-protected archives
- Encrypted cloud storage
- OS-level file permissions
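
For the OS-level file permissions item, a backup archive can be limited to its owner on Linux and macOS. This is a minimal sketch with hypothetical helper names. On Windows, `os.chmod` only toggles the read-only flag, so use NTFS ACLs there instead.

```python
import os
import stat


def restrict_to_owner(path: str) -> None:
    """Set permissions to owner read/write only (0o600) on POSIX systems."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path: str) -> bool:
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

Calling `restrict_to_owner("backup.7z")` right after creating the archive, then confirming with `is_owner_only`, covers the archive itself. The directory that holds it needs its own check.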

Cloud Backup Privacy

If using cloud storage (Dropbox, Google Drive, etc.):

DO:

✅ Encrypt locally before upload (see above)
✅ Use strong password (password manager)
✅ Enable 2FA on cloud account
✅ Regularly test restore
✅ Rotate encryption passwords

DON’T:

❌ Upload unencrypted backups
❌ Share backup links
❌ Use weak passwords
❌ Store password with backup

Recommended services:

Tresorit (end-to-end encrypted)
ProtonDrive (privacy-focused)
Cryptomator (client-side encryption)
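
The "Regularly test restore" item can be made concrete by restoring a backup to a scratch directory and comparing file hashes against the live data. The sketch below uses hypothetical helper names. It reads each file fully into memory, which may be slow for large audio recordings.

```python
import hashlib
from pathlib import Path


def tree_digest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digests[path.relative_to(base).as_posix()] = hashlib.sha256(
                path.read_bytes()
            ).hexdigest()
    return digests


def restore_matches(original: str, restored: str) -> bool:
    """Return True if both trees contain the same files with the same content."""
    return tree_digest(original) == tree_digest(restored)
```

Run the comparison while Selfoss is closed, otherwise the database file may change between the backup and the check.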

Security Checklist

Daily Operations

☑ Keep Selfoss updated

Check for updates monthly
Apply security patches promptly

☑ Monitor API usage

Unusual activity = potential compromise
Review monthly

☑ Lock device when away

Password/biometric required
Auto-lock after 5 minutes

Monthly Review

☑ Review API keys

Still needed?
Still secure?
Time to rotate?

☑ Check backups

Backup exists?
Restore test passed?
Stored securely?

☑ Audit processed data

Any sensitive data needs removal?
Old projects to archive?

Quarterly Security Audit

☑ Rotate API keys

Generate new keys
Update Selfoss
Revoke old keys

☑ Review provider policies

Privacy policy changes?
Terms of service updates?
Data retention changes?

☑ Update security practices

New threats?
Better encryption?
Enhanced procedures?

Next Steps

🔒 Your data is now secure!

Implement Security:

🔐 Enable encryption - Encrypted backups now, OS keyring once available
🔄 Rotate API keys - Quarterly schedule
📋 Document procedures - For team/organization
🧪 Test recovery - Verify backups work
📚 Train users - Security best practices

Stay Secure:

Monitor for updates
Review security logs
Test disaster recovery
Keep informed of threats

🔒 Privacy-first, security-focused, user-controlled.
