06 - Transcript Upload & Processing Guide

📄 Transform Text Files into Visual Insights
⏱️ Time Estimate: 10 minutes
📋 What You’ll Learn: File upload, format support, processing pipeline, validation, reprocessing



Format: UTF-8 encoded text files

Best for:

  • Simple meeting notes
  • Exported transcripts from other tools
  • Voice-to-text outputs

Example:

Meeting: Q4 Planning Session
Date: October 16, 2024
John: We need to finalize the Q4 roadmap.
Sarah: I propose we focus on three key initiatives...
John: That sounds good. Let's decide by Friday.

✅ Pros: Universal, simple, no formatting
❌ Cons: No speaker labels unless manually added
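
Speaker labels in plain text follow the `Name: text` pattern shown in the example above. As a rough sketch of how such labels could be detected, the snippet below scans each line for that pattern. The function name `find_speakers`, the regular expression, and the list of header labels to skip are all illustrative assumptions, not Selfoss internals.

```python
import re

# Hypothetical pattern: a capitalized label of up to ~40 characters, a colon, then text.
SPEAKER_RE = re.compile(r"^([A-Z][\w .'-]{0,40}):\s+\S")
# Header labels that look like speakers but are not (an assumed list).
NON_SPEAKER_LABELS = {"Meeting", "Date", "Attendees", "Decision", "Meeting Topic"}


def find_speakers(text: str) -> list[str]:
    """Return speaker names in order of first appearance."""
    speakers: list[str] = []
    for line in text.splitlines():
        match = SPEAKER_RE.match(line.strip())
        if not match:
            continue
        label = match.group(1).strip()
        if label not in NON_SPEAKER_LABELS and label not in speakers:
            speakers.append(label)
    return speakers
```

A transcript with no `Name:` prefixes yields an empty list, which corresponds to the unlabeled, single-speaker case.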

Format: Microsoft Word documents

Best for:

  • Formatted meeting notes
  • Structured reports
  • Documents with rich formatting

Features:

  • Preserves basic formatting
  • Extracts text content
  • Ignores images/tables

✅ Pros: Common format, preserves structure
❌ Cons: Larger file size, formatting stripped during processing

Format: Web Video Text Tracks

Best for:

  • Video platform exports (YouTube, Zoom)
  • Timestamped transcripts
  • Subtitle files

Example:

WEBVTT

00:00:05.000 --> 00:00:10.000
John: We need to finalize the Q4 roadmap.

00:00:10.500 --> 00:00:18.000
Sarah: I propose we focus on three key initiatives.

✅ Pros: Includes timestamps, speaker tracking
⭐ Special: Enables temporal sentiment analysis (F015)
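
To make the timestamp handling concrete, here is a minimal sketch of turning WebVTT cues like the example above into utterance records. The names `parse_vtt`, `CUE_RE`, and `SPEAKER_RE` are hypothetical, and the sketch only handles `HH:MM:SS.mmm` timings with an optional `Speaker:` prefix; it is not Selfoss's actual parser.

```python
import re

# Cue timing line, e.g. "00:00:05.000 --> 00:00:10.000" (hours required in this sketch).
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)
# Assumed speaker convention: "Name: text" at the start of a cue.
SPEAKER_RE = re.compile(r"^([A-Za-z][\w .'-]{0,40}):\s+(.+)$")


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text: str) -> list[dict]:
    utterances: list[dict] = []
    current = None  # cue currently being filled, if any
    for raw_line in text.splitlines():
        line = raw_line.strip()
        timing = CUE_RE.match(line)
        if timing:
            parts = timing.groups()
            current = {
                "start_time": to_seconds(*parts[:4]),
                "end_time": to_seconds(*parts[4:]),
                "speaker": None,
                "text": "",
            }
            utterances.append(current)
        elif not line:
            current = None  # a blank line ends the cue
        elif current is not None:
            if not current["text"]:
                labeled = SPEAKER_RE.match(line)
                if labeled:
                    current["speaker"], line = labeled.group(1), labeled.group(2)
            current["text"] = f'{current["text"]} {line}'.strip()
    return utterances
```

Lines outside a cue, such as the `WEBVTT` header or cue identifiers, are skipped because no cue is open when they are read.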

Format: SubRip subtitle format

Best for:

  • Video subtitle exports
  • Timestamped meeting transcripts
  • Media player outputs

Example:

1
00:00:05,000 --> 00:00:10,000
John: We need to finalize the Q4 roadmap.

2
00:00:10,500 --> 00:00:18,000
Sarah: I propose we focus on three key initiatives.

✅ Pros: Widely supported, includes timestamps
⭐ Special: Enables temporal sentiment analysis (F015)


| Limit | Value | Reason |
|-------|-------|--------|
| Minimum | 1KB | Ensure meaningful content |
| Maximum | 50MB | Memory/processing constraints |
| Recommended | 1-10MB | Optimal performance |

Typical sizes:

  • 15-minute meeting: 10-50KB (.txt)
  • 1-hour meeting: 50-200KB (.txt)
  • 3-hour workshop: 200KB-1MB (.txt)

Required: UTF-8 encoding

Common issues:

  • ❌ Windows-1252 (Western European)
  • ❌ ISO-8859-1 (Latin-1)
  • ❌ UTF-16 (Microsoft Word legacy)

How to fix:

# Check encoding (Linux; on macOS use `file -I` instead)
file -i transcript.txt
# Convert to UTF-8
iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt
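
If `iconv` is not available, the same check-and-convert step can be sketched in Python. The helper name `ensure_utf8` and the choice of Windows-1252 as the fallback are assumptions for illustration; the sketch does not detect UTF-16 or other encodings.

```python
from pathlib import Path


def ensure_utf8(src: Path, dst: Path, fallback: str = "cp1252") -> str:
    """Write a UTF-8 copy of src to dst and return the encoding that was assumed."""
    raw = src.read_bytes()
    try:
        # "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM if present.
        text = raw.decode("utf-8-sig")
        detected = "utf-8"
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Windows-1252; undecodable bytes become U+FFFD.
        text = raw.decode(fallback, errors="replace")
        detected = fallback
    dst.write_bytes(text.encode("utf-8"))
    return detected
```

The return value tells you whether a conversion actually happened, which is useful when checking a folder of exported transcripts before upload.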

Windows users:

  • Open in Notepad
  • “Save As” → Encoding: “UTF-8”

Minimum length:

  • At least 100 characters
  • At least 3 sentences
  • Meaningful content (not just test text)

Quality indicators:

✅ Clear sentence structure
✅ Speaker labels (if multi-speaker)
✅ Proper punctuation
✅ Complete thoughts/sentences
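
The two measurable minimums above (100 characters, 3 sentences) can be checked before uploading. This is a rough sketch under the assumption that a sentence ends with `.`, `!`, or `?`; the function name `meets_minimum_content` is hypothetical, and "meaningful content" is not something a check like this can judge.

```python
import re

MIN_CHARS = 100      # minimum length stated above
MIN_SENTENCES = 3    # minimum sentence count stated above


def meets_minimum_content(text: str) -> bool:
    stripped = text.strip()
    # Split on runs of sentence-ending punctuation and ignore empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", stripped) if s.strip()]
    return len(stripped) >= MIN_CHARS and len(sentences) >= MIN_SENTENCES
```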


Step 1: Select Project

  1. Click on project in sidebar
  2. Verify correct project is selected
  3. Project name appears in header

Step 2: Open Upload Modal

  1. Click “Upload & Process” button (header)
  2. Modal opens with dropzone area

┌────────────────────────────────┐
│ Upload & Process Transcript    │
├────────────────────────────────┤
│                                │
│ 📄 Drag and drop file here     │
│ or click to browse             │
│                                │
│ Supported: .txt, .docx,        │
│ .vtt, .srt (max 50MB)          │
│                                │
└────────────────────────────────┘

Step 3: Select File

Option A: Drag-and-Drop

  1. Drag file from file explorer
  2. Drop onto dropzone area
  3. File appears in preview

Option B: File Picker

  1. Click dropzone area
  2. File picker dialog opens
  3. Navigate to file
  4. Click “Open”

Step 4: Validation (Automatic)

Selfoss validates:

  • ✅ File format (.txt, .docx, .vtt, .srt)
  • ✅ File size (1KB - 50MB)
  • ✅ MIME type
  • ✅ Content readability

Status messages:

✅ "File validated successfully"
⚠️ "File too large (52MB, max 50MB)"
❌ "Unsupported file format (.pdf)"
❌ "File is empty or too small"
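
As an illustration of how the extension and size checks could produce the status messages above, here is a minimal sketch. The function `validate_upload` and its constants are hypothetical, and the MIME type and content-readability checks are left out.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".docx", ".vtt", ".srt"}
MIN_BYTES = 1 * 1024            # 1KB minimum
MAX_BYTES = 50 * 1024 * 1024    # 50MB maximum


def validate_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (is_valid, status_message) for a candidate upload."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"Unsupported file format ({ext or 'no extension'})"
    if size_bytes < MIN_BYTES:
        return False, "File is empty or too small"
    if size_bytes > MAX_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        return False, f"File too large ({size_mb:.0f}MB, max 50MB)"
    return True, "File validated successfully"
```

Checking the extension first means an unsupported file is reported as such regardless of its size.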

Step 5: Process with LLM

After successful upload:

  1. Click “Process with LLM” button
  2. Configure AI settings (first time only):
    • Select provider (OpenAI/Gemini/Ollama)
    • Enter API key (if cloud)
    • Choose model
  3. Click “Start Analysis”

Step 6: Wait for Processing

Progress indicators show:

  • 📊 “Analyzing transcript structure…”
  • 🤖 “Extracting decisions…”
  • 🎯 “Identifying action items…”
  • 💡 “Mapping concepts…”

⏱️ Processing time: 10-90 seconds (depending on length and provider)

Step 7: View Results

Modal auto-closes and displays:

  • ✅ Transcript added to project list
  • 📊 Visualizations generated
  • 🎯 Decisions, actions, concepts extracted

Each transcript displays its current status:

📄 Q4 Planning Meeting
[⏳ Pending] Not yet processed
📄 Strategy Session
[🔄 Processing] Currently analyzing
📄 Team Sync
[✅ Completed] Successfully processed
📄 Budget Review
[❌ Failed] Processing error

Processing pipeline:

1. Upload → Validation → Parsing
   └─ Extract text from file
2. Text Analysis
   └─ Detect speakers, structure
3. LLM Processing (single pass)
   ├─ Extract decisions
   ├─ Identify action items
   └─ Map concepts
4. Visualization Generation
   ├─ Decision flowchart
   ├─ Concept mind map
   └─ Action matrix
5. Storage
   └─ Save to database

While processing, you’ll see:

  • 🔄 Spinner animation
  • 📊 Current stage indicator
  • ⏱️ Estimated time remaining (for long transcripts)

💡 Pro Tip: You can navigate away during processing; status updates continue in the background.


For .vtt and .srt files, Selfoss extracts:

  • ⏱️ Timestamps for each utterance
  • 👤 Speaker identification (if available)
  • 📊 Time-based emotional flow

Stored as JSON in database:

{
  "utterances": [
    {
      "start_time": 5.0,
      "end_time": 10.0,
      "speaker": "John",
      "text": "We need to finalize the Q4 roadmap."
    },
    {
      "start_time": 10.5,
      "end_time": 18.0,
      "speaker": "Sarah",
      "text": "I propose we focus on three key initiatives."
    }
  ],
  "total_duration": 3600,
  "speaker_count": 5
}

With temporal data, you can:

  • 📈 Sentiment Arc Timeline - Track emotional flow over time
  • 🔥 Tension Indicators - Identify heated discussions
  • 🤝 Agreement Overlays - Visualize consensus moments
  • 📊 Speaker Participation - Time-based contribution analysis
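
To show what time-based contribution analysis can mean in practice, the sketch below sums speaking time per speaker from utterance records shaped like the stored JSON. The function names are hypothetical and this is not necessarily how Selfoss computes its participation view.

```python
from collections import defaultdict


def speaking_time(utterances: list[dict]) -> dict[str, float]:
    """Total seconds spoken per speaker."""
    totals: dict[str, float] = defaultdict(float)
    for utterance in utterances:
        speaker = utterance.get("speaker") or "Unknown"
        totals[speaker] += utterance["end_time"] - utterance["start_time"]
    return dict(totals)


def participation_share(utterances: list[dict]) -> dict[str, float]:
    """Fraction of total speaking time per speaker (values sum to 1)."""
    totals = speaking_time(utterances)
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {speaker: seconds / overall for speaker, seconds in totals.items()}
```

For two utterances of 5 and 7.5 seconds, the shares come out to 40% and 60%.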

👉 Learn More: See 07_VISUALIZATION_DEEP_DIVE_GUIDE.md → Sentiment Analysis section


Selfoss uses an optimized prompt to extract all data in one API call:

Input: Plain text transcript
Output: Structured JSON with:

  • 📋 Meeting metadata (title, date, participants)
  • 🎯 Decisions made
  • ✅ Action items assigned
  • 💡 Key concepts discussed

For audio recordings, the pipeline has two stages:

Stage 1: Transcription (Audio → Text)

  • Provider: Whisper (local or cloud)
  • Output: Plain text

Stage 2: Analysis (Text → Insights)

  • Provider: GPT/Gemini/Llama
  • Output: Structured data

👉 Learn More: See 03_CLOUD_PROVIDER_SETUP_GUIDE.md

Before processing, Selfoss shows:

┌────────────────────────────────┐
│ Processing Cost Estimate       │
├────────────────────────────────┤
│ Provider: OpenAI GPT-4o-mini   │
│ Input: ~5,000 tokens           │
│ Output: ~2,000 tokens          │
│ Cost: ~$0.002                  │
│                                │
│ [Cancel] [Start Analysis]      │
└────────────────────────────────┘
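
The arithmetic behind an estimate like this is tokens multiplied by a per-token price. In the sketch below, the prices ($0.15 per million input tokens and $0.60 per million output tokens) are assumptions for illustration; check your provider's current pricing. The function name `estimate_cost` is also hypothetical.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated cost in dollars for one processing run."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Assumed prices, not taken from Selfoss or guaranteed to be current.
cost = estimate_cost(5_000, 2_000, 0.15, 0.60)
print(f"~${cost:.3f}")
```

With these assumed prices the result is about $0.00195, which rounds to the ~$0.002 shown in the dialog.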

💡 Pro Tip: Use Ollama (local) for free unlimited processing.


Reprocess existing transcripts to:

  • 🔄 Try different AI model (GPT-4o vs GPT-4o-mini)
  • 🆕 Use updated prompt (after Selfoss updates)
  • 🔧 Fix processing errors (retry failed transcripts)
  • 🧪 Compare results (local vs cloud)

Method 1: Context Menu

  1. Right-click transcript in list
  2. Select “Reprocess with LLM”
  3. Choose provider/model
  4. Click β€œStart Analysis”

Method 2: Transcript View

  1. Click transcript to view
  2. Click “Reprocess” button (top-right)
  3. Configure settings
  4. Start processing

Reprocessing behavior:

  • ✅ Overwrites existing processed data
  • ✅ Preserves original file
  • ✅ Tracks which model was used
  • ✅ Maintains metadata (upload date, etc.)

⚠️ Warning: Manual edits will be lost! Export before reprocessing.

Workflow:

  1. Export current results (JSON)
  2. Reprocess with different model
  3. Compare outputs side-by-side
  4. Keep better result
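
For step 3, a side-by-side comparison can be scripted once both results are exported. The sketch below assumes each export is a JSON object with `decisions`, `action_items`, and `concepts` lists of strings; that shape is a guess for illustration, so adjust the keys to match the actual export.

```python
def compare_exports(
    first: dict,
    second: dict,
    keys: tuple[str, ...] = ("decisions", "action_items", "concepts"),
) -> dict:
    """Report which items appear in only one export and which are shared."""
    report = {}
    for key in keys:
        a = set(first.get(key, []))
        b = set(second.get(key, []))
        report[key] = {
            "only_in_first": sorted(a - b),
            "only_in_second": sorted(b - a),
            "shared": sorted(a & b),
        }
    return report
```

Load each exported file with `json.load` and pass the two dictionaries in. Because items are compared as exact strings, differently worded versions of the same decision show up as separate entries.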

“File too large” error

Solutions:

✅ Split transcript into smaller files
✅ Remove unnecessary content (signatures, headers)
✅ Compress text (remove extra whitespace)
✅ Use summarization tool first (if > 50MB)
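
Splitting is the most mechanical of these fixes. As a minimal sketch, the function below cuts a transcript at line boundaries so that each piece stays under a byte limit; the name `split_transcript` is hypothetical, and a single line longer than the limit is kept whole rather than cut mid-sentence.

```python
def split_transcript(text: str, max_bytes: int) -> list[str]:
    """Split text on line boundaries into chunks of at most max_bytes (UTF-8)."""
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and current_size + line_size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Joining the returned chunks reproduces the original text, so nothing is lost by splitting. For the 50MB limit, pass `max_bytes=50 * 1024 * 1024` or a smaller value that leaves some margin.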

“Unsupported file format” error

Common causes:

  • Using .pdf (not supported)
  • Using .rtf (not supported)
  • File has wrong extension

Solutions:

✅ Convert to .txt:
- Open in text editor
- "Save As" → Plain Text (.txt)
✅ Convert to .docx (for example, from .rtf):
- Open in Word
- "Save As" → Word Document (.docx)

“Invalid encoding” error

Solutions:

# Windows (Notepad)
1. Open file
2. "Save As" → Encoding: UTF-8
# macOS (TextEdit)
1. Open file
2. Format → Make Plain Text
3. Save (UTF-8 automatic)
# Linux (command line)
iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt

“File is empty” error

Causes:

  • File has no content
  • File is corrupted
  • Wrong file selected

Solutions:

✅ Open file in text editor to verify content
✅ Check file size (should be > 1KB)
✅ Re-export from original source
✅ Copy-paste content into new .txt file

“Processing failed” error

Common causes:

  1. API key invalid → Verify in Settings
  2. No credits → Add payment method (cloud providers)
  3. Network timeout → Check internet connection
  4. Model not available → Select different model
  5. Content too long → Split into smaller files

“Empty result” error

Causes:

  • Transcript too short (< 100 characters)
  • No meaningful content (test text)
  • Wrong language (non-English)

Solutions:

✅ Verify transcript has actual meeting content
✅ At least 3 sentences required (5 or more recommended)
✅ Check language (English works best)
✅ Try different AI model

“Malformed JSON” error

This is an internal error. Solutions:

✅ Retry processing
✅ Use different provider (GPT-4o-mini → Gemini)
✅ Report bug if persistent
✅ Check if transcript has unusual characters
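
For context, one common way model output fails to parse is extra prose wrapped around the JSON object. The sketch below shows a defensive parse that tries the raw text first and then falls back to the outermost `{...}` span. The function name `parse_llm_json` is hypothetical, and whether Selfoss recovers this way is not stated in this guide.

```python
import json


def parse_llm_json(raw: str) -> dict:
    """Parse model output as JSON, tolerating text before or after the object."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass  # fall through to the lenient attempt below
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in model output")
    # May still raise json.JSONDecodeError if the object itself is malformed.
    return json.loads(raw[start : end + 1])
```

If the second attempt also fails, retrying the request or switching provider, as listed above, is the remaining option.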

Very slow processing

Expected times:

  • Short (1-2 pages): 10-20 seconds
  • Medium (5-10 pages): 20-40 seconds
  • Long (20+ pages): 40-90 seconds

If slower:

✅ Check internet speed (for cloud)
✅ Check CPU usage (for Ollama)
✅ Try a smaller, faster model (for example, GPT-4o-mini)
✅ Close other applications

“No speaker labels detected”

This is informational, not an error:

  • Single-speaker transcripts work fine
  • AI will infer context without labels

To add speaker labels:

Original:
"We need to decide on the budget."
With labels:
"John: We need to decide on the budget."

“Transcript structure unclear”

Improve structure:

❌ Poor:
we talked about stuff and decided things...
✅ Good:
Meeting Topic: Q4 Budget Planning
John: We need to finalize the Q4 budget.
Sarah: I propose allocating 40% to marketing.
Decision: Approved 40% marketing budget.

1. Clean up transcript:

Remove:
- Email signatures
- Legal disclaimers
- Repeated headers/footers
- Excessive line breaks
Keep:
- Actual meeting content
- Speaker names
- Decisions and actions
- Important context
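
Two of the items above, repeated headers/footers and excessive line breaks, can be removed automatically. The sketch below treats any non-empty line that occurs three or more times as a repeated header or footer, which is an assumption (a short reply repeated that often would also be dropped). Email signatures and legal disclaimers still need manual removal. The name `clean_transcript` is hypothetical.

```python
import re
from collections import Counter


def clean_transcript(text: str, repeat_threshold: int = 3) -> str:
    lines = [line.rstrip() for line in text.splitlines()]
    counts = Counter(line for line in lines if line)
    # Drop lines repeated often enough to look like page headers or footers.
    kept = [line for line in lines if not line or counts[line] < repeat_threshold]
    cleaned = "\n".join(kept)
    # Collapse runs of blank lines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip() + "\n"
```

Fewer lines also means fewer tokens sent to the model, which lowers cost on cloud providers.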

2. Add structure:

# Meeting: Q4 Planning
Date: October 16, 2024
Attendees: John, Sarah, Mike
## Discussion
John: ...
Sarah: ...
## Decisions
1. Approved marketing budget
2. Launch date set for November 1
## Action Items
- [ ] John: Finalize contracts by Friday
- [ ] Sarah: Send budget breakdown

3. Verify quality:

  • ✅ Complete sentences
  • ✅ Proper punctuation
  • ✅ No garbled text
  • ✅ Consistent formatting

For faster processing:

  1. Use GPT-4o-mini (fastest cloud)
  2. Remove unnecessary content before upload
  3. Process during off-peak hours
  4. Use Ollama for local processing (no network latency)

For better accuracy:

  1. Use GPT-4o or Gemini 1.5 Pro
  2. Include speaker labels
  3. Add context (meeting type, date)
  4. Proper sentence structure

For cost optimization:

  1. Use Ollama (free, local)
  2. Use Gemini 1.5 Flash (cheapest cloud)
  3. Clean transcript before processing (fewer tokens)
  4. Batch similar transcripts

🎉 You’re now a transcript processing expert!

  1. 📄 Upload test transcript - Try different formats
  2. 🔄 Experiment with reprocessing - Compare different models
  3. 📊 Explore visualizations → 07_VISUALIZATION_DEEP_DIVE_GUIDE.md
  4. ✏️ Try interactive editing → 08_INTERACTIVE_EDITING_GUIDE.md
  5. 💾 Set up backups → 09_DATA_MANAGEMENT_GUIDE.md

  • Batch processing multiple files
  • Custom prompts for specialized analysis
  • API optimization for high volume
  • Sentiment analysis with temporal data

📄 From text to insight in seconds.