06 - Transcript Upload & Processing Guide

📄 Transform Text Files into Visual Insights
⏱️ Time Estimate: 10 minutes
📋 What You’ll Learn: File upload, format support, processing pipeline, validation, reprocessing

Supported File Formats
File Validation Requirements
Upload Workflow
Processing Status Indicators
Temporal Data Extraction
Understanding the Processing Pipeline
Reprocessing Transcripts
Troubleshooting

Supported File Formats

Plain Text Files (.txt)

Format: UTF-8 encoded text files

Best for:

Simple meeting notes
Exported transcripts from other tools
Voice-to-text outputs

Example:

Meeting: Q4 Planning Session
Date: October 16, 2024

John: We need to finalize the Q4 roadmap.
Sarah: I propose we focus on three key initiatives...
John: That sounds good. Let's decide by Friday.

✅ Pros: Universal, simple, no formatting
❌ Cons: No speaker labels unless manually added

Word Documents (.docx)

Format: Microsoft Word documents

Best for:

Formatted meeting notes
Structured reports
Documents with rich formatting

Features:

Preserves basic structure
Extracts text content
Ignores images/tables

✅ Pros: Common format, preserves structure
❌ Cons: Larger file size, formatting stripped during processing

WebVTT Files (.vtt)

Format: Web Video Text Tracks

Best for:

Video platform exports (YouTube, Zoom)
Timestamped transcripts
Subtitle files

Example:

WEBVTT

00:00:05.000 --> 00:00:10.000
John: We need to finalize the Q4 roadmap.

00:00:10.500 --> 00:00:18.000
Sarah: I propose we focus on three key initiatives.

✅ Pros: Includes timestamps, speaker tracking
⭐ Special: Enables temporal sentiment analysis (F015)

SubRip Files (.srt)

Format: SubRip subtitle format

Best for:

Video subtitle exports
Timestamped meeting transcripts
Media player outputs

Example:

1
00:00:05,000 --> 00:00:10,000
John: We need to finalize the Q4 roadmap.

2
00:00:10,500 --> 00:00:18,000
Sarah: I propose we focus on three key initiatives.

✅ Pros: Widely supported, includes timestamps
⭐ Special: Enables temporal sentiment analysis (F015)
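
The two timestamped formats share the HH:MM:SS layout and differ in the fractional separator: WebVTT uses a period, SubRip uses a comma. A minimal Python sketch of converting either style to seconds follows. The name `timestamp_to_seconds` is illustrative and not part of Selfoss, and the sketch does not handle the shorter MM:SS.mmm form that WebVTT also allows.

```python
import re

# Matches "HH:MM:SS.mmm" (WebVTT) and "HH:MM:SS,mmm" (SubRip).
_TIMESTAMP = re.compile(r"^(\d{2,}):(\d{2}):(\d{2})[.,](\d{3})$")


def timestamp_to_seconds(stamp: str) -> float:
    """Convert a cue timestamp to seconds."""
    match = _TIMESTAMP.match(stamp.strip())
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


print(timestamp_to_seconds("00:00:10.500"))  # WebVTT style
print(timestamp_to_seconds("00:00:10,500"))  # SubRip style
```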

File Validation Requirements

File Size Limits

Limit	Value	Reason
Minimum	1KB	Ensure meaningful content
Maximum	50MB	Memory/processing constraints
Recommended	1-10MB	Optimal performance

Typical sizes:

15-minute meeting: 10-50KB (.txt)
1-hour meeting: 50-200KB (.txt)
3-hour workshop: 200KB-1MB (.txt)

Encoding Requirements

Required: UTF-8 encoding

Common issues:

❌ Windows-1252 (Western European)
❌ ISO-8859-1 (Latin-1)
❌ UTF-16 (Microsoft Word legacy)

How to fix:

# Check encoding (Linux)
file -i transcript.txt

# Check encoding (macOS)
file -I transcript.txt

# Convert to UTF-8
iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt

Windows users:

Open in Notepad
“Save As” → Encoding: “UTF-8”
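
Where the command-line tools are unavailable, the same conversion can be sketched in Python. This is an illustration only: the helper names are hypothetical, the fallback assumes the file is Windows-1252, and UTF-16 input is not handled.

```python
from pathlib import Path


def to_utf8(raw: bytes, fallback: str = "windows-1252") -> str:
    """Decode bytes as UTF-8, falling back to a legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy encoding instead.
        return raw.decode(fallback)


def convert_file(source: str, target: str) -> None:
    """Read a transcript in either encoding and write it back as UTF-8."""
    text = to_utf8(Path(source).read_bytes())
    Path(target).write_text(text, encoding="utf-8")
```

Usage: `convert_file("input.txt", "output.txt")` writes a UTF-8 copy and leaves the original untouched.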

Content Requirements

Minimum length:

At least 100 characters
At least 3 sentences
Meaningful content (not just test text)

Quality indicators:

✅ Clear sentence structure
✅ Speaker labels (if multi-speaker)
✅ Proper punctuation
✅ Complete thoughts/sentences
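
To pre-check a transcript against the minimum length rules before uploading, something like the following sketch may help. The sentence counter is a rough heuristic (it counts runs of text ending in ".", "!" or "?"), and the function names are illustrative; how Selfoss itself counts sentences is not documented here.

```python
import re

MIN_CHARACTERS = 100
MIN_SENTENCES = 3


def count_sentences(text: str) -> int:
    """Count runs of text that end in '.', '!' or '?'."""
    return len(re.findall(r"[^.!?]+[.!?]+", text))


def meets_content_requirements(text: str) -> bool:
    """Check the 100-character and 3-sentence minimums."""
    stripped = text.strip()
    return (
        len(stripped) >= MIN_CHARACTERS
        and count_sentences(stripped) >= MIN_SENTENCES
    )
```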

Upload Workflow

Step-by-Step Upload

Step 1: Select Project

Click on project in sidebar
Verify correct project is selected
Project name appears in header

Step 2: Open Upload Modal

Click “Upload & Process” button (header)
Modal opens with dropzone area

┌────────────────────────────────┐
│ Upload & Process Transcript    │
├────────────────────────────────┤
│                                │
│  📄 Drag and drop file here    │
│       or click to browse       │
│                                │
│  Supported: .txt, .docx,       │
│  .vtt, .srt (max 50MB)         │
│                                │
└────────────────────────────────┘

Step 3: Select File

Option A: Drag-and-Drop

Drag file from file explorer
Drop onto dropzone area
File appears in preview

Option B: File Picker

Click dropzone area
File picker dialog opens
Navigate to file
Click “Open”

Step 4: Validation (Automatic)

Selfoss validates:

✅ File format (.txt, .docx, .vtt, .srt)
✅ File size (1KB - 50MB)
✅ MIME type
✅ Content readability

Status messages:

✅ "File validated successfully"
⚠️ "File too large (52MB, max 50MB)"
❌ "Unsupported file format (.pdf)"
❌ "File is empty or too small"
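
The format and size checks can be sketched as follows. This is an illustration under stated assumptions, not the Selfoss implementation: it treats 1KB as 1,024 bytes, skips the MIME type and readability checks, and the name `validate_upload` is hypothetical.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".docx", ".vtt", ".srt"}
MIN_BYTES = 1024               # 1KB (assumed to mean 1,024 bytes)
MAX_BYTES = 50 * 1024 * 1024   # 50MB


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return a status message modeled on the ones listed above."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        return f"Unsupported file format ({extension})"
    if size_bytes < MIN_BYTES:
        return "File is empty or too small"
    if size_bytes > MAX_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        return f"File too large ({size_mb:.0f}MB, max 50MB)"
    return "File validated successfully"
```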

Step 5: Process with LLM

After successful upload:

Click “Process with LLM” button
Configure AI settings (first time only):
- Select provider (OpenAI/Gemini/Ollama)
- Enter API key (if cloud)
- Choose model
Click “Start Analysis”

Step 6: Wait for Processing

Progress indicators show:

📊 “Analyzing transcript structure…”
🤖 “Extracting decisions…”
🎯 “Identifying action items…”
💡 “Mapping concepts…”

⏱️ Processing time: 10-90 seconds (depending on length and provider)

Step 7: View Results

Modal auto-closes and displays:

✅ Transcript added to project list
📊 Visualizations generated
🎯 Decisions, actions, concepts extracted

Processing Status Indicators

Status Badges

Each transcript displays its current status:

📄 Q4 Planning Meeting
   [⏳ Pending]     Not yet processed

📄 Strategy Session
   [🔄 Processing]  Currently analyzing

📄 Team Sync
   [✅ Completed]   Successfully processed

📄 Budget Review
   [❌ Failed]      Processing error
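
One way to picture the four badges is as a small state machine. The transitions below are an assumption inferred from this guide (processing can start from any resting state, including a reprocess of a completed or failed transcript); the names are illustrative.

```python
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Assumed transitions: processing starts from any resting state
# and ends in either Completed or Failed.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: {Status.PROCESSING},
    Status.FAILED: {Status.PROCESSING},
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if the badge may change from current to new."""
    return new in ALLOWED[current]
```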

Processing Stages (F002)

1. Upload → Validation → Parsing
   └─ Extract text from file

2. Text Analysis
   └─ Detect speakers, structure

3. LLM Processing (single pass)
   ├─ Extract decisions
   ├─ Identify action items
   └─ Map concepts

4. Visualization Generation
   ├─ Decision flowchart
   ├─ Concept mind map
   └─ Action matrix

5. Storage
   └─ Save to database

Real-Time Updates

While processing, you will see:

🔄 Spinner animation
📊 Current stage indicator
⏱️ Estimated time remaining (for long transcripts)

💡 Pro Tip: You can navigate away during processing - status updates continue in background.

Temporal Data Extraction

What is Temporal Data?

For .vtt and .srt files, Selfoss extracts:

⏱️ Timestamps for each utterance
👤 Speaker identification (if available)
📊 Time-based emotional flow

Temporal Data Structure

Stored as JSON in database:

{
  "utterances": [
    {
      "start_time": 5.0,
      "end_time": 10.0,
      "speaker": "John",
      "text": "We need to finalize the Q4 roadmap."
    },
    {
      "start_time": 10.5,
      "end_time": 18.0,
      "speaker": "Sarah",
      "text": "I propose we focus on three key initiatives."
    }
  ],
  "total_duration": 3600,
  "speaker_count": 5
}
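
A hedged sketch of how parsed cues could be assembled into this structure. It assumes each cue is already a (start seconds, end seconds, text) tuple, treats anything before the first ": " as the speaker name, and takes `total_duration` to be the latest end time; all three are assumptions, and `build_temporal_data` is a hypothetical name.

```python
import json


def build_temporal_data(cues):
    """Build the utterance structure from (start, end, text) tuples."""
    utterances = []
    for start, end, text in cues:
        # Naive speaker split: "John: Hello" -> ("John", "Hello").
        speaker, separator, remainder = text.partition(": ")
        if not separator:
            speaker, remainder = None, text
        utterances.append({
            "start_time": start,
            "end_time": end,
            "speaker": speaker,
            "text": remainder,
        })
    speakers = {u["speaker"] for u in utterances if u["speaker"]}
    return {
        "utterances": utterances,
        "total_duration": max((u["end_time"] for u in utterances), default=0),
        "speaker_count": len(speakers),
    }


cues = [
    (5.0, 10.0, "John: We need to finalize the Q4 roadmap."),
    (10.5, 18.0, "Sarah: I propose we focus on three key initiatives."),
]
print(json.dumps(build_temporal_data(cues), indent=2))
```

The naive split would also treat a line such as "Note: see slide 4" as a speaker named "Note", so a real parser needs a stricter rule.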

Enabling Advanced Features

With temporal data, you can:

📈 Sentiment Arc Timeline - Track emotional flow over time
🔥 Tension Indicators - Identify heated discussions
🤝 Agreement Overlays - Visualize consensus moments
📊 Speaker Participation - Time-based contribution analysis

👉 Learn More: See 07_VISUALIZATION_DEEP_DIVE_GUIDE.md → Sentiment Analysis section

Understanding the Processing Pipeline

Single-Pass LLM Extraction

Selfoss uses an optimized prompt to extract all data in one API call:

Input: Plain text transcript
Output: Structured JSON with:

📋 Meeting metadata (title, date, participants)
🎯 Decisions made
✅ Action items assigned
💡 Key concepts discussed
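
As an illustration of what handling such a response might involve, the sketch below parses the model output and checks that the four sections are present. The key names (`metadata`, `decisions`, `action_items`, `concepts`) are assumptions made for this example; the actual Selfoss schema is not shown in this guide.

```python
import json

# Assumed key names, one per output section listed above.
REQUIRED_KEYS = ("metadata", "decisions", "action_items", "concepts")


def parse_extraction(raw_response: str) -> dict:
    """Parse the model output and verify the expected sections exist."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError as error:
        raise ValueError(f"Malformed JSON: {error.msg}") from error
    if not isinstance(data, dict):
        raise ValueError("Malformed JSON: expected an object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"Missing keys: {', '.join(missing)}")
    return data
```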

Two-Stage Architecture (F011)

For audio recordings, the pipeline has two stages:

Stage 1: Transcription (Audio → Text)

Provider: Whisper (local or cloud)
Output: Plain text

Stage 2: Analysis (Text → Insights)

Provider: GPT/Gemini/Llama
Output: Structured data

👉 Learn More: See 03_CLOUD_PROVIDER_SETUP_GUIDE.md

Cost Estimation

Before processing, Selfoss shows:

┌────────────────────────────────┐
│ Processing Cost Estimate       │
├────────────────────────────────┤
│ Provider:   OpenAI GPT-4o-mini │
│ Input:      ~5,000 tokens      │
│ Output:     ~2,000 tokens      │
│ Cost:       ~$0.002            │
│                                │
│ [Cancel]    [Start Analysis]   │
└────────────────────────────────┘

💡 Pro Tip: Use Ollama (local) for free unlimited processing.
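
The estimate is simple arithmetic: tokens multiplied by the per-token price, summed over input and output. The sketch below reproduces the figure in the box using illustrative rates of $0.15 (input) and $0.60 (output) per million tokens. These rates are assumptions; check your provider's current pricing.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost in dollars for one processing run."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Assumed rates: $0.15 input, $0.60 output per million tokens.
cost = estimate_cost(5_000, 2_000, 0.15, 0.60)
print(f"~${cost:.3f}")
```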

Reprocessing Transcripts

Why Reprocess?

Reprocess existing transcripts to:

🔄 Try different AI model (GPT-4o vs GPT-4o-mini)
🆕 Use updated prompt (after Selfoss updates)
🔧 Fix processing errors (retry failed transcripts)
🧪 Compare results (local vs cloud)

How to Reprocess

Method 1: Context Menu

Right-click transcript in list
Select “Reprocess with LLM”
Choose provider/model
Click “Start Analysis”

Method 2: Transcript View

Click transcript to view
Click “Reprocess” button (top-right)
Configure settings
Start processing

Reprocessing Behavior

✅ Overwrites existing processed data
✅ Preserves original file
✅ Tracks which model was used
✅ Maintains metadata (upload date, etc.)

⚠️ Warning: Manual edits will be lost! Export before reprocessing.

Comparing Results

Workflow:

Export current results (JSON)
Reprocess with different model
Compare outputs side-by-side
Keep better result
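
For the side-by-side step, a small script can list what each run found. This sketch compares two lists of extracted items, such as decisions, as plain strings. It assumes you have already loaded the relevant lists from the two JSON exports; the export layout itself is not documented here, and `compare_items` is a hypothetical helper.

```python
def compare_items(first, second):
    """Group items by whether one or both runs produced them."""
    first_set, second_set = set(first), set(second)
    return {
        "only_first": sorted(first_set - second_set),
        "only_second": sorted(second_set - first_set),
        "shared": sorted(first_set & second_set),
    }


run_a = ["Approved marketing budget", "Launch date set for November 1"]
run_b = ["Approved marketing budget", "Hire two contractors"]
print(compare_items(run_a, run_b))
```

Exact string matching is strict: two models may phrase the same decision differently, so treat the "only" lists as candidates for manual review.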

Troubleshooting

Upload Issues

“File too large” error

Solutions:

✅ Split transcript into smaller files
✅ Remove unnecessary content (signatures, headers)
✅ Compress text (remove extra whitespace)
✅ Use summarization tool first (if > 50MB)

“Unsupported file format” error

Common causes:

Using .pdf (not supported)
Using .rtf (not supported)
File has wrong extension

Solutions:

✅ Convert to .txt:
   - Open in text editor
   - "Save As" → Plain Text (.txt)

✅ Convert to .docx:
   - Open in Word
   - "Save As" → Word Document (.docx)

“Invalid encoding” error

Solutions:

# Windows (Notepad)
1. Open file
2. "Save As" → Encoding: UTF-8

# macOS (TextEdit)
1. Open file
2. Format → Make Plain Text
3. Save (UTF-8 automatic)

# Linux (command line)
iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt

“File is empty” error

Causes:

File has no content
File is corrupted
Wrong file selected

Solutions:

✅ Open file in text editor to verify content
✅ Check file size (should be > 1KB)
✅ Re-export from original source
✅ Copy-paste content into new .txt file

Processing Issues

“Processing failed” error

Common causes:

API key invalid → Verify in Settings
No credits → Add payment method (cloud providers)
Network timeout → Check internet connection
Model not available → Select different model
Content too long → Split into smaller files

“Empty result” error

Causes:

Transcript too short (< 100 characters)
No meaningful content (test text)
Wrong language (non-English)

Solutions:

✅ Verify transcript has actual meeting content
✅ At least 3 sentences required
✅ Check language (English works best)
✅ Try different AI model

“Malformed JSON” error

This is an internal error. Solutions:

✅ Retry processing
✅ Use different provider (GPT-4o-mini → Gemini)
✅ Report bug if persistent
✅ Check if transcript has unusual characters

Very slow processing

Expected times:

Short (1-2 pages): 10-20 seconds
Medium (5-10 pages): 20-40 seconds
Long (20+ pages): 40-90 seconds

If slower:

✅ Check internet speed (for cloud)
✅ Check CPU usage (for Ollama)
✅ Try a smaller, faster model (GPT-4o-mini)
✅ Close other applications

Validation Issues

“No speaker labels detected”

This is informational, not an error:

Single-speaker transcripts work fine
AI will infer context without labels
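
A rough way to check whether a transcript already carries labels is to look for lines that start with a capitalized name followed by a colon. The pattern below is a heuristic written for this guide, not the detection Selfoss uses, and it will also match header lines such as "Date: October 16, 2024".

```python
import re

# A capitalized name of up to ~40 characters, a colon, then some text.
SPEAKER_LABEL = re.compile(r"^([A-Z][\w .'-]{0,40}?):\s+\S")


def find_speakers(transcript: str) -> list:
    """Return distinct speaker labels in order of first appearance."""
    speakers = []
    for line in transcript.splitlines():
        match = SPEAKER_LABEL.match(line.strip())
        if match and match.group(1) not in speakers:
            speakers.append(match.group(1))
    return speakers
```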

To add speaker labels:

Original:
"We need to decide on the budget."

With labels:
"John: We need to decide on the budget."

“Transcript structure unclear”

Improve structure:

❌ Poor:
we talked about stuff and decided things...

✅ Good:
Meeting Topic: Q4 Budget Planning
John: We need to finalize the Q4 budget.
Sarah: I propose allocating 40% to marketing.
Decision: Approved 40% marketing budget.

Best Practices

File Preparation

1. Clean up transcript:

Remove:
- Email signatures
- Legal disclaimers
- Repeated headers/footers
- Excessive line breaks

Keep:
- Actual meeting content
- Speaker names
- Decisions and actions
- Important context

2. Add structure:

# Meeting: Q4 Planning
Date: October 16, 2024
Attendees: John, Sarah, Mike

## Discussion

John: ...
Sarah: ...

## Decisions

1. Approved marketing budget
2. Launch date set for November 1

## Action Items

- [ ] John: Finalize contracts by Friday
- [ ] Sarah: Send budget breakdown

3. Verify quality:

✅ Complete sentences
✅ Proper punctuation
✅ No garbled text
✅ Consistent formatting
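
Part of the clean-up step can be automated. The sketch below only normalizes whitespace (repeated spaces, tabs, and runs of blank lines); removing signatures or disclaimers still needs a manual pass. `tidy_whitespace` is an illustrative name.

```python
import re


def tidy_whitespace(text: str) -> str:
    """Collapse repeated spaces and excessive blank lines."""
    text = text.replace("\r\n", "\n")
    # Squeeze spaces and tabs inside each line, then trim the line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Allow at most one blank line between paragraphs.
    collapsed = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return collapsed.strip() + "\n"
```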

Optimization Tips

For faster processing:

Use GPT-4o-mini (fastest cloud)
Remove unnecessary content before upload
Process during off-peak hours
Use Ollama for local processing (no network latency)

For better accuracy:

Use GPT-4o or Gemini 1.5 Pro
Include speaker labels
Add context (meeting type, date)
Proper sentence structure

For cost optimization:

Use Ollama (free, local)
Use Gemini 1.5 Flash (cheapest cloud)
Clean transcript before processing (fewer tokens)
Batch similar transcripts

Next Steps

🎉 You’re now a transcript processing expert!

Recommended Actions:

📄 Upload test transcript - Try different formats
🔄 Experiment with reprocessing - Compare different models
📊 Explore visualizations → 07_VISUALIZATION_DEEP_DIVE_GUIDE.md
✏️ Try interactive editing → 08_INTERACTIVE_EDITING_GUIDE.md
💾 Set up backups → 09_DATA_MANAGEMENT_GUIDE.md

Advanced Topics:

Batch processing multiple files
Custom prompts for specialized analysis
API optimization for high volume
Sentiment analysis with temporal data

📄 From text to insight in seconds.
