14 - Privacy & Security Guide

🔒 Data Protection & Security Best Practices
⏱️ Time Estimate: 15 minutes
📋 What You’ll Learn: Data handling, security features, privacy compliance, best practices



Privacy by Design:

  • 🔒 Local-first: All data stored on your device
  • 🚫 No tracking: Zero telemetry or analytics
  • 🔐 Encrypted keys: Secure API key storage
  • 📍 Data control: You own all your data
  • 🌐 Offline capable: Works without internet (with local models)

User Control:

  • Choose your AI providers
  • Decide what goes to cloud
  • Full export/delete capabilities
  • Transparent data handling

Everything stays local:

Your Device
├── Selfoss Database (SQLite)
│   ├── Projects
│   ├── Transcripts
│   ├── Processed data
│   └── Settings
├── Audio Recordings
│   └── WebM files by project
└── Whisper Models
    └── Local AI models

What this means:

  • ✅ No Selfoss cloud servers
  • ✅ No data uploaded to us
  • ✅ Complete data sovereignty
  • ✅ Works offline with local models

Windows:

C:\Users\{Username}\AppData\Roaming\selfoss\

macOS:

~/Library/Application Support/selfoss/

Linux:

~/.local/share/selfoss/
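
When scripting a backup or an audit, the three locations above can be resolved in code. The following is a minimal sketch, not part of Selfoss: `selfoss_data_dir` is a hypothetical helper, and it assumes the Windows directory sits under the user's home folder exactly as shown above.

```python
import sys
from pathlib import Path
from typing import Optional


def selfoss_data_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Return the documented Selfoss data directory for a platform.

    Hypothetical helper: the directory names come from the list above,
    the function itself is not shipped with Selfoss.
    """
    home = home if home is not None else Path.home()
    if platform.startswith("win"):
        return home / "AppData" / "Roaming" / "selfoss"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "selfoss"
    # Any other platform is treated as Linux here.
    return home / ".local" / "share" / "selfoss"
```

Calling `selfoss_data_dir()` with no arguments uses the current platform and the current user's home directory.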

Security features:

  • User-specific directories (OS-level isolation)
  • File system permissions (read/write by user only)
  • No shared storage
  • No system-wide data
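
On Linux and macOS, the "read/write by user only" claim can be spot-checked by inspecting permission bits. This is a sketch under the assumption of a POSIX file system (Windows uses ACLs, which this check does not cover); `is_user_only` is a hypothetical name.

```python
import os
import stat


def is_user_only(path: str) -> bool:
    """Return True if neither group nor others have any permission on path."""
    mode = os.stat(path).st_mode
    # S_IRWXG and S_IRWXO cover read/write/execute for group and others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

If the check returns False for the data directory or the database file, `chmod 700` on the directory (or `chmod 600` on a file) restricts access to the owning user.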

Selfoss ONLY connects to:

  1. License server (LemonSqueezy)

    • Purpose: License validation
    • Frequency: On activation only
    • Data sent: License key, device ID
  2. AI providers (if configured)

    • OpenAI, Google, Ollama (if remote)
    • Purpose: Transcription/analysis
    • Frequency: Per transcript processing
    • Data sent: Audio/text for processing

Selfoss NEVER connects to:

  • ❌ Analytics servers
  • ❌ Tracking services
  • ❌ Ad networks
  • ❌ Selfoss company servers (license validation goes through LemonSqueezy)

Storage method:

SQLite database with application-level encryption
Location: selfoss.db (encrypted fields)
Algorithm: AES-256 (migration to the OS keyring is planned)

How it works:

  1. User enters API key in Settings
  2. Key encrypted before database storage
  3. Decrypted only when needed for API calls
  4. Never logged or exposed

Planned migration (F007):

Windows:

  • Windows Credential Manager
  • System-level encryption
  • Requires user authentication

macOS:

  • Keychain Access
  • Secure Enclave (on supported devices)
  • Touch ID or password required

Linux:

  • Secret Service API (libsecret)
  • GNOME Keyring / KWallet
  • User password required

Benefits:

  • ✅ OS-level security
  • ✅ Separate from app data
  • ✅ Hardware-backed encryption (where available)
  • ✅ Better protection if device stolen

For API keys:

DO:

  • ✅ Use unique keys per application
  • ✅ Set spending limits on provider dashboards
  • ✅ Rotate keys every 3-6 months
  • ✅ Store backup in password manager (encrypted)
  • ✅ Delete keys before device disposal

DON’T:

  • ❌ Share keys via email/chat
  • ❌ Commit to version control
  • ❌ Use same key on shared devices
  • ❌ Screenshot keys
  • ❌ Leave keys on public computers

Zero data collection:

  • ❌ Usage statistics
  • ❌ Feature usage
  • ❌ Error reports (unless you submit)
  • ❌ Analytics
  • ❌ Crash reports
  • ❌ Device information (apart from the device ID sent at license activation)
  • ❌ IP addresses

Your privacy:

  • No user profiling
  • No behavior tracking
  • No data monetization
  • No third-party sharing (because there’s no data!)

Transparency:

  • Open source codebase
  • Auditable network calls
  • Clear privacy policy
  • Community verification

How to verify:

1. Use network monitoring tool:
   - Wireshark (desktop)
   - Charles Proxy (mobile)
   - Browser DevTools (web version)
2. Monitor outbound connections:
   - Should only see AI provider calls
   - License validation on activation
   - No other network activity
3. Review source code:
   - GitHub: shobankr/selfoss
   - Search for analytics calls
   - None found!
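
Once a capture is exported as a list of hostnames, comparing it against the expected destinations can be automated. The sketch below is illustrative only: `unexpected_hosts` is a hypothetical helper, the expected set is taken from the hostnames listed under the firewall rules in this guide, and a remote Ollama host would have to be added by hand.

```python
EXPECTED_HOSTS = {
    "api.openai.com",
    "generativelanguage.googleapis.com",
    "license.lemonsqueezy.com",
}


def unexpected_hosts(observed, expected=EXPECTED_HOSTS):
    """Return the sorted hostnames in `observed` that are not in `expected`."""
    # Normalize case and strip a trailing dot so "Host.com." matches "host.com".
    normalized = {h.strip().lower().rstrip(".") for h in observed if h.strip()}
    return sorted(normalized - set(expected))
```

An empty result means every observed connection went to a documented destination; anything else deserves a closer look in the capture.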

What leaves your device:

For transcription:

  • 🎵 Audio file (OpenAI, Gemini)
  • ⏱️ Duration metadata
  • 🔤 Returned: Text transcript

For analysis:

  • 📄 Text transcript
  • 🤖 Returned: Structured JSON (decisions, actions, concepts)

What doesn’t leave:

  • ❌ Project names
  • ❌ Other transcripts
  • ❌ Database contents
  • ❌ API keys (sent only to their own provider, in the auth header)
  • ❌ Personal information (unless in transcript)

OpenAI:

Data retention: 30 days (API data)
Training: Not used (as of 2024)
Policy: api.openai.com/data-usage

Google Gemini:

Data retention: Per user agreement
Training: Not used for improvement (standard)
Policy: cloud.google.com/terms/aup

Ollama (Local):

Data retention: N/A (never leaves device)
Training: N/A
Policy: Runs on your machine

Strategy 1: Sanitize before sending

Before processing:
1. Remove names (replace with roles)
2. Redact sensitive numbers
3. Replace places and company names with generic labels
4. Remove context not needed for analysis
Example:
Before: "John Smith at Acme Corp, account #12345"
After: "Team member at Company A, account redacted"
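
The redaction steps above can be partly automated. The following is a rough sketch, not a Selfoss feature: `sanitize` is a hypothetical helper, the name-to-label mapping must be supplied by the user, and the default number pattern (an optional `#` followed by four or more digits) is an assumption that will miss other formats. Manual review is still needed.

```python
import re


def sanitize(text, replacements, number_pattern=r"#?\d{4,}"):
    """Replace known names with generic labels and redact long digit runs.

    `replacements` maps each real name or company to its generic label.
    """
    for real, label in replacements.items():
        # re.escape keeps punctuation in names from acting as regex syntax;
        # the lookarounds avoid replacing a name inside a longer word.
        pattern = rf"(?<!\w){re.escape(real)}(?!\w)"
        text = re.sub(pattern, lambda _m, label=label: label, text)
    return re.sub(number_pattern, "redacted", text)


example = sanitize(
    "John Smith at Acme Corp, account #12345",
    {"John Smith": "Team member", "Acme Corp": "Company A"},
)
# example == "Team member at Company A, account redacted"
```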

Strategy 2: Local transcription only

Audio → Ollama Whisper → Text (local)
Text → Review → Redact → Send to cloud analysis

Strategy 3: Fully local

Audio → Ollama Whisper → Text (local)
Text → Ollama Llama → Analysis (local)
Zero cloud exposure

Best practices:

Generation:

1. Use provider's dashboard (official only)
2. Give descriptive name ("Selfoss Desktop")
3. Set permissions (read-only where possible)
4. Set spending limits
5. Note creation date for rotation

Storage:

Primary: Selfoss app (encrypted)
Backup: Password manager (1Password, Bitwarden)
Never: Plain text files, screenshots, email

Rotation:

Schedule:
- Every 3 months: Routine rotation
- Immediately: If compromised
- Before: Selling/giving away device
- After: Shared device usage
Process:
1. Generate new key (keep old active)
2. Update Selfoss settings
3. Test new key works
4. Revoke old key
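
Noting the creation date makes the routine schedule easy to check. This is a minimal sketch that assumes "every 3 months" means 90 days; `rotation_due` is a hypothetical helper and the interval is a parameter so a different policy can be used.

```python
from datetime import date, timedelta
from typing import Optional


def rotation_due(created: date, today: Optional[date] = None,
                 interval_days: int = 90) -> bool:
    """Return True once the key is at least `interval_days` old."""
    today = today or date.today()
    return today - created >= timedelta(days=interval_days)
```

For example, a key created on 1 January 2024 is not yet due on 1 March 2024 (60 days) but is due on 15 April 2024.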

Immediate actions if a key is compromised:

1. Revoke key on provider dashboard
   - OpenAI: platform.openai.com/api-keys
   - Gemini: console.cloud.google.com
2. Generate new key
   - Different name
   - New permissions
3. Update Selfoss
   - Settings → API keys
   - Test connection
4. Monitor usage
   - Check for unauthorized calls
   - Verify billing
5. Report if fraudulent
   - Contact provider support
   - Dispute unauthorized charges

Principle of least privilege:

For transcription:

OpenAI:
- Required: Whisper API access
- Not needed: GPT models, DALL-E, etc.
Gemini:
- Required: Generative Language API
- Not needed: Other Google Cloud services

For analysis:

OpenAI:
- Required: GPT API access
- Not needed: Whisper, DALL-E, etc.
Gemini:
- Required: Generative Language API

GDPR principles:

1. Data Minimization:

  • ✅ Only stores necessary data
  • ✅ No telemetry/tracking
  • ✅ User controls what’s processed

2. Purpose Limitation:

  • ✅ Data used only for transcription/analysis
  • ✅ Not shared with third parties
  • ✅ Not used for other purposes

3. Storage Limitation:

  • ✅ User controls retention
  • ✅ Easy deletion (project/transcript level)
  • ✅ Complete data export

4. Right to Access:

  • ✅ All data accessible locally
  • ✅ Full export capabilities
  • ✅ No barriers to data access

5. Right to Erasure:

  • ✅ Delete projects/transcripts
  • ✅ Uninstall = complete removal
  • ✅ No cloud data to delete

6. Data Portability:

  • ✅ Export in standard formats (JSON, PDF, CSV)
  • ✅ No vendor lock-in
  • ✅ Easy migration

GDPR considerations:

Data Controller:

  • Organization using Selfoss
  • Controls what data is processed
  • Responsible for compliance

Data Processor (AI Providers):

  • OpenAI, Google (if used)
  • Process data on behalf of controller
  • Have their own GDPR compliance

Recommendations:

1. Use local-only mode for sensitive data
   - Ollama for transcription
   - Ollama for analysis
   - Zero data to processors
2. Data Processing Agreements (DPA):
   - OpenAI: Available in dashboard
   - Google: Available in Cloud Console
   - Review and sign before use
3. Employee training:
   - What data can be processed
   - Redaction procedures
   - Handling sensitive information
4. Audit logs:
   - Track what was processed
   - Who processed it
   - When and why
5. Regular reviews:
   - Quarterly data audits
   - Provider policy updates
   - Compliance verification
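
The audit log recommended above (what was processed, by whom, when, and why) can be as simple as an append-only JSON Lines file kept by the organization. The sketch below is one possible shape and is not built into Selfoss; the function names and field names are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_entry(who: str, what: str, why: str,
                when: Optional[datetime] = None) -> str:
    """Build one JSON line recording what was processed, by whom, when, and why."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({"when": when.isoformat(), "who": who,
                       "what": what, "why": why})


def append_audit_entry(log_path: str, entry: str) -> None:
    """Append the entry to a JSON Lines file, one record per line."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry + "\n")
```

Recording only an identifier in `what` (not the transcript text) keeps the log itself from becoming a second copy of sensitive data.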

Model 1: Individual Installations

Each user's device:
├── Own Selfoss installation
├── Own database (isolated)
├── Own API keys (or shared)
└── Own backups
Pros:
✅ Maximum isolation
✅ No shared infrastructure
✅ User-specific settings
Cons:
❌ No centralized management
❌ Individual license per user

Model 2: Shared Ollama Server

Corporate network:
├── Central Ollama server
│   └── All models cached
└── User devices
    └── Selfoss pointing to server
Pros:
✅ Shared model downloads
✅ GPU-powered processing
✅ Consistent performance
Cons:
❌ Network dependency
❌ Potential bottleneck

Model 3: Air-Gapped Deployment

Secure environment:
├── No internet access
├── Local Ollama only
└── Manual model transfer
Pros:
✅ Maximum security
✅ No data exfiltration risk
✅ Compliance-friendly
Cons:
❌ No cloud models
❌ Manual updates

Firewall rules:

Outbound allowed (if using cloud):
- api.openai.com (443)
- generativelanguage.googleapis.com (443)
- license.lemonsqueezy.com (443)
Inbound: None required
For Ollama server:
- Internal network: Port 11434
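
After the rules are in place, a quick TCP check from a user device can confirm that the allowed endpoints are reachable. This is a sketch only: `can_connect` and `check_endpoints` are hypothetical helpers, the endpoint list repeats the hosts above, and a direct connection test will report failure on networks where traffic must go through a proxy.

```python
import socket

CLOUD_ENDPOINTS = [
    ("api.openai.com", 443),
    ("generativelanguage.googleapis.com", 443),
    ("license.lemonsqueezy.com", 443),
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_endpoints(endpoints, timeout: float = 3.0):
    """Map "host:port" to reachability for each (host, port) pair."""
    return {f"{h}:{p}": can_connect(h, p, timeout) for h, p in endpoints}
```

Running `check_endpoints(CLOUD_ENDPOINTS)` from a workstation, or `can_connect("<ollama-server>", 11434)` with the internal server's address, shows which rules are effective.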

Proxy configuration:

Selfoss → Settings → Advanced → Proxy
HTTP Proxy: http://proxy.company.com:8080
HTTPS Proxy: https://proxy.company.com:8443

Volume licensing:

Contact for enterprise:
- Multiple seat licenses
- Centralized billing
- Admin dashboard (planned)
- SSO integration (future)

Threat model:

  • Device theft
  • Hardware failure
  • Accidental deletion
  • Ransomware

Protection layers:

Layer 1: Encryption

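Encrypt backups before storing:
# Linux/macOS
# Note: zip -e uses legacy ZipCrypto, which is weak; prefer 7-Zip (AES-256) for sensitive data
zip -e -r backup.zip selfoss_backup/
# Enter password when prompted
# Windows (7-Zip; -mhe=on also encrypts file names)
7z a -p -mhe=on backup.7z selfoss_backup\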

Layer 2: Off-site Storage

Store in multiple locations:
1. Local drive (primary)
2. External drive (secondary)
3. Cloud storage (encrypted) (tertiary)
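
When the same archive lives in several places, comparing SHA-256 checksums is a cheap way to confirm that every copy is intact. The helpers below are a sketch with hypothetical names; they read files in chunks so large archives do not need to fit in memory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths) -> bool:
    """Return True if all listed files have the same SHA-256 digest."""
    return len({sha256_of(p) for p in paths}) == 1
```

A matching checksum shows the copies are identical, not that the archive can be decrypted and restored, so a periodic restore test is still worthwhile.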

Layer 3: Access Control

Limit backup access:
- Password-protected archives
- Encrypted cloud storage
- OS-level file permissions

If using cloud storage (Dropbox, Google Drive, etc.):

DO:

✅ Encrypt locally before upload (see above)
✅ Use strong password (password manager)
✅ Enable 2FA on cloud account
✅ Regularly test restore
✅ Rotate encryption passwords

DON’T:

❌ Upload unencrypted backups
❌ Share backup links
❌ Use weak passwords
❌ Store password with backup

Recommended services:

  • Tresorit (end-to-end encrypted)
  • ProtonDrive (privacy-focused)
  • Cryptomator (client-side encryption tool that works with any cloud storage provider)

☑ Keep Selfoss updated

  • Check for updates monthly
  • Apply security patches promptly

☑ Monitor API usage

  • Unusual activity = potential compromise
  • Review monthly

☑ Lock device when away

  • Password/biometric required
  • Auto-lock after 5 minutes

☑ Review API keys

  • Still needed?
  • Still secure?
  • Time to rotate?

☑ Check backups

  • Backup exists?
  • Restore test passed?
  • Stored securely?

☑ Audit processed data

  • Any sensitive data needs removal?
  • Old projects to archive?

☑ Rotate API keys

  • Generate new keys
  • Update Selfoss
  • Revoke old keys

☑ Review provider policies

  • Privacy policy changes?
  • Terms of service updates?
  • Data retention changes?

☑ Update security practices

  • New threats?
  • Better encryption?
  • Enhanced procedures?

🔒 Your data is now secure!

  1. 🔐 Enable encryption - OS keyring (once available) or encrypted backups
  2. 🔄 Rotate API keys - Quarterly schedule
  3. 📋 Document procedures - For team/organization
  4. 🧪 Test recovery - Verify backups work
  5. 📚 Train users - Security best practices
  • Monitor for updates
  • Review security logs
  • Test disaster recovery
  • Keep informed of threats

🔒 Privacy-first, security-focused, user-controlled.