06 - Transcript Upload & Processing Guide
Transform Text Files into Visual Insights
⏱️ Time Estimate: 10 minutes
What You’ll Learn: File upload, format support, processing pipeline, validation, reprocessing
Table of Contents
- Supported File Formats
- File Validation Requirements
- Upload Workflow
- Processing Status Indicators
- Temporal Data Extraction
- Understanding the Processing Pipeline
- Reprocessing Transcripts
- Troubleshooting
Supported File Formats
Plain Text Files (.txt)
Format: UTF-8 encoded text files
Best for:
- Simple meeting notes
- Exported transcripts from other tools
- Voice-to-text outputs
Example:
    Meeting: Q4 Planning Session
    Date: October 16, 2024

    John: We need to finalize the Q4 roadmap.
    Sarah: I propose we focus on three key initiatives...
    John: That sounds good. Let's decide by Friday.

✅ Pros: Universal, simple, no formatting
❌ Cons: No speaker labels unless manually added
Word Documents (.docx)
Format: Microsoft Word documents
Best for:
- Formatted meeting notes
- Structured reports
- Documents with rich formatting
Features:
- Preserves basic formatting
- Extracts text content
- Ignores images/tables
✅ Pros: Common format, preserves structure
❌ Cons: Larger file size, formatting stripped during processing
WebVTT Files (.vtt)
Format: Web Video Text Tracks
Best for:
- Video platform exports (YouTube, Zoom)
- Timestamped transcripts
- Subtitle files
Example:
    WEBVTT

    00:00:05.000 --> 00:00:10.000
    John: We need to finalize the Q4 roadmap.

    00:00:10.500 --> 00:00:18.000
    Sarah: I propose we focus on three key initiatives.

✅ Pros: Includes timestamps, speaker tracking
Special: Enables temporal sentiment analysis (F015)
SubRip Files (.srt)
Format: SubRip subtitle format
Best for:
- Video subtitle exports
- Timestamped meeting transcripts
- Media player outputs
Example:
    1
    00:00:05,000 --> 00:00:10,000
    John: We need to finalize the Q4 roadmap.

    2
    00:00:10,500 --> 00:00:18,000
    Sarah: I propose we focus on three key initiatives.

✅ Pros: Widely supported, includes timestamps
Special: Enables temporal sentiment analysis (F015)
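Both timestamped formats reduce to the same cue structure: a timing line followed by one or more text lines. The sketch below shows one way to read either format into start/end seconds plus speaker and text. It is not Selfoss's actual parser; the names `to_seconds` and `parse_cues` are hypothetical, it assumes the `Speaker: text` convention used in the examples above, and it does not handle WebVTT cues that omit the hours field.

```python
import re

# Matches both WebVTT (00:00:05.000) and SubRip (00:00:05,000) timestamps.
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[.,](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert timestamp parts (strings) to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(text):
    """Return a list of cues with start/end times, speaker, and text."""
    cues = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = CUE_RE.search(line)
        if not match:
            continue
        parts = match.groups()
        # Cue text runs from the next line to the first blank line.
        body = []
        for following in lines[i + 1:]:
            if not following.strip():
                break
            body.append(following.strip())
        # Assumption: a leading "Name: " prefix is a speaker label.
        speaker, sep, spoken = " ".join(body).partition(": ")
        cues.append({
            "start_time": to_seconds(*parts[:4]),
            "end_time": to_seconds(*parts[4:]),
            "speaker": speaker if sep else None,
            "text": spoken if sep else speaker,
        })
    return cues
```

Because the label is detected with a simple `": "` split, a cue such as `Note: check the budget` would be read as a speaker named "Note". A real parser would likely need a stricter rule.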
File Validation Requirements
File Size Limits
| Limit | Value | Reason |
|---|---|---|
| Minimum | 1KB | Ensure meaningful content |
| Maximum | 50MB | Memory/processing constraints |
| Recommended | 1-10MB | Optimal performance |
Typical sizes:
- 15-minute meeting: 10-50KB (.txt)
- 1-hour meeting: 50-200KB (.txt)
- 3-hour workshop: 200KB-1MB (.txt)
Encoding Requirements
Required: UTF-8 encoding
Common issues:
- Windows-1252 (Western European)
- ISO-8859-1 (Latin-1)
- UTF-16 (Microsoft Word legacy)
How to fix:
    # Check encoding (Linux; on macOS use `file -I` instead)
    file -i transcript.txt

    # Convert to UTF-8
    iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt
Windows users:
- Open in Notepad
- “Save As” → Encoding: “UTF-8”
Content Requirements
Minimum length:
- At least 100 characters
- At least 3 sentences
- Meaningful content (not just test text)
Quality indicators:
✅ Clear sentence structure
✅ Speaker labels (if multi-speaker)
✅ Proper punctuation
✅ Complete thoughts/sentences
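The size, encoding, and content requirements above can be checked before you upload. The following is a minimal sketch, not Selfoss's own validator: `validate_transcript` is a hypothetical name, 1KB is assumed to mean 1024 bytes, the sentence count is a rough punctuation heuristic, and MIME-type checking is left out.

```python
from pathlib import Path
import re

ALLOWED_EXTENSIONS = {".txt", ".docx", ".vtt", ".srt"}
MIN_BYTES = 1 * 1024            # 1KB minimum (assumed to be 1024 bytes)
MAX_BYTES = 50 * 1024 * 1024    # 50MB maximum
MIN_CHARS = 100
MIN_SENTENCES = 3


def validate_transcript(path):
    """Return a list of problems; an empty list means the file passed."""
    path = Path(path)
    problems = []
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"Unsupported file format ({path.suffix})")
        return problems
    size = path.stat().st_size
    if size < MIN_BYTES:
        problems.append("File is empty or too small")
    elif size > MAX_BYTES:
        problems.append("File too large")
    if path.suffix.lower() == ".docx":
        return problems  # .docx is a zip container, not plain text
    try:
        text = path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        problems.append("Invalid encoding (expected UTF-8)")
        return problems
    if len(text.strip()) < MIN_CHARS:
        problems.append("Fewer than 100 characters")
    # Rough heuristic: count runs of sentence-ending punctuation.
    if len(re.findall(r"[.!?]+(?:\s|$)", text)) < MIN_SENTENCES:
        problems.append("Fewer than 3 sentences")
    return problems
```

Returning a list rather than raising on the first failure lets you see every problem with a file in one pass.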
Upload Workflow
Step-by-Step Upload
Step 1: Select Project
- Click on project in sidebar
- Verify correct project is selected
- Project name appears in header
Step 2: Open Upload Modal
- Click “Upload & Process” button (header)
- Modal opens with dropzone area
    ┌────────────────────────────────┐
    │  Upload & Process Transcript   │
    ├────────────────────────────────┤
    │                                │
    │   Drag and drop file here      │
    │   or click to browse           │
    │                                │
    │   Supported: .txt, .docx,      │
    │   .vtt, .srt (max 50MB)        │
    │                                │
    └────────────────────────────────┘
Step 3: Select File
Option A: Drag-and-Drop
- Drag file from file explorer
- Drop onto dropzone area
- File appears in preview
Option B: File Picker
- Click dropzone area
- File picker dialog opens
- Navigate to file
- Click “Open”
Step 4: Validation (Automatic)
Selfoss validates:
- File format (.txt, .docx, .vtt, .srt)
- File size (1KB - 50MB)
- MIME type
- Content readability
Status messages:
    ✅ "File validated successfully"
    ⚠️ "File too large (52MB, max 50MB)"
    ❌ "Unsupported file format (.pdf)"
    ❌ "File is empty or too small"
Step 5: Process with LLM
After successful upload:
- Click “Process with LLM” button
- Configure AI settings (first time only):
  - Select provider (OpenAI/Gemini/Ollama)
  - Enter API key (if cloud)
  - Choose model
- Click “Start Analysis”
Step 6: Wait for Processing
Progress indicators show:
- “Analyzing transcript structure…”
- “Extracting decisions…”
- “Identifying action items…”
- “Mapping concepts…”
⏱️ Processing time: 10-60 seconds (depending on length and provider)
Step 7: View Results
Modal auto-closes and displays:
- Transcript added to project list
- Visualizations generated
- Decisions, actions, concepts extracted
Processing Status Indicators
Status Badges
Each transcript displays its current status:
    Q4 Planning Meeting   [⏳ Pending]      Not yet processed
    Strategy Session      [🔄 Processing]   Currently analyzing
    Team Sync             [✅ Completed]    Successfully processed
    Budget Review         [❌ Failed]       Processing error
Processing Stages (F002)
    1. Upload → Validation → Parsing
       └─ Extract text from file
    2. Text Analysis
       └─ Detect speakers, structure
    3. LLM Processing (single pass)
       ├─ Extract decisions
       ├─ Identify action items
       └─ Map concepts
    4. Visualization Generation
       ├─ Decision flowchart
       ├─ Concept mind map
       └─ Action matrix
    5. Storage
       └─ Save to database
Real-Time Updates
While processing, you will see:
- Spinner animation
- Current stage indicator
- Estimated time remaining (for long transcripts)
💡 Pro Tip: You can navigate away during processing; status updates continue in the background.
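The four status badges behave like a small state machine. The sketch below is one way to model them; the transition table is an assumption based on this guide (in particular, that Completed and Failed transcripts return to Processing when reprocessed), and `Status`, `ALLOWED`, and `advance` are hypothetical names rather than Selfoss internals.

```python
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Assumed transitions: completed or failed transcripts can be reprocessed.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: {Status.PROCESSING},
    Status.FAILED: {Status.PROCESSING},
}


def advance(current, new):
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"Cannot go from {current.value} to {new.value}")
    return new
```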
Temporal Data Extraction
What is Temporal Data?
For .vtt and .srt files, Selfoss extracts:
- Timestamps for each utterance
- Speaker identification (if available)
- Time-based emotional flow
Temporal Data Structure
Stored as JSON in database:
    {
      "utterances": [
        {
          "start_time": 5.0,
          "end_time": 10.5,
          "speaker": "John",
          "text": "We need to finalize the Q4 roadmap."
        },
        {
          "start_time": 10.5,
          "end_time": 18.0,
          "speaker": "Sarah",
          "text": "I propose we focus on three key initiatives."
        }
      ],
      "total_duration": 3600,
      "speaker_count": 5
    }
Enabling Advanced Features
With temporal data, you can:
- Sentiment Arc Timeline - Track emotional flow over time
- Tension Indicators - Identify heated discussions
- Agreement Overlays - Visualize consensus moments
- Speaker Participation - Time-based contribution analysis
Learn More: See 07_VISUALIZATION_DEEP_DIVE_GUIDE.md → Sentiment Analysis section
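To make the speaker participation idea concrete, here is a minimal sketch that sums speaking time per speaker from the temporal JSON shown earlier. The field names (`utterances`, `start_time`, `end_time`, `speaker`) come from that structure; the function names are hypothetical, and how Selfoss actually computes participation is not documented here.

```python
from collections import defaultdict


def speaking_time(temporal):
    """Sum seconds spoken per speaker from the utterance list."""
    totals = defaultdict(float)
    for utt in temporal["utterances"]:
        totals[utt["speaker"]] += utt["end_time"] - utt["start_time"]
    return dict(totals)


def participation_share(temporal):
    """Return each speaker's fraction of the total spoken time."""
    totals = speaking_time(temporal)
    spoken = sum(totals.values())
    if spoken == 0:
        return {speaker: 0.0 for speaker in totals}
    return {speaker: secs / spoken for speaker, secs in totals.items()}
```

Shares are computed against total spoken time rather than `total_duration`, so silence between utterances does not count against anyone.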
Understanding the Processing Pipeline
Single-Pass LLM Extraction
Selfoss uses an optimized prompt to extract all data in one API call:
Input: Plain text transcript
Output: Structured JSON with:
- Meeting metadata (title, date, participants)
- Decisions made
- Action items assigned
- Key concepts discussed
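A single-pass design means one reply has to carry all four result groups, so the reply needs to be checked before anything is stored. The sketch below illustrates that step. The JSON key names (`metadata`, `decisions`, `action_items`, `concepts`) and the `Analysis` and `parse_analysis` names are assumptions made for illustration; the actual schema Selfoss uses is not shown in this guide.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Analysis:
    metadata: dict = field(default_factory=dict)
    decisions: list = field(default_factory=list)
    action_items: list = field(default_factory=list)
    concepts: list = field(default_factory=list)


def parse_analysis(raw):
    """Parse the model's reply; raise ValueError if it is not usable JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Malformed JSON: expected an object at the top level")
    # Missing groups fall back to empty containers instead of failing.
    return Analysis(
        metadata=data.get("metadata", {}),
        decisions=data.get("decisions", []),
        action_items=data.get("action_items", []),
        concepts=data.get("concepts", []),
    )
```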
Two-Stage Architecture (F011)
For audio recordings, the pipeline has two stages:
Stage 1: Transcription (Audio → Text)
- Provider: Whisper (local or cloud)
- Output: Plain text
Stage 2: Analysis (Text → Insights)
- Provider: GPT/Gemini/Llama
- Output: Structured data
Learn More: See 03_CLOUD_PROVIDER_SETUP_GUIDE.md
Cost Estimation
Before processing, Selfoss shows:
    ┌────────────────────────────────┐
    │  Processing Cost Estimate      │
    ├────────────────────────────────┤
    │  Provider: OpenAI GPT-4o-mini  │
    │  Input: ~5,000 tokens          │
    │  Output: ~2,000 tokens         │
    │  Cost: ~$0.002                 │
    │                                │
    │  [Cancel]  [Start Analysis]    │
    └────────────────────────────────┘
💡 Pro Tip: Use Ollama (local) for free unlimited processing.
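The estimate is simple arithmetic: token counts multiplied by the provider's per-token price. Here is a minimal sketch of that calculation. The prices in `PRICES` are assumptions (USD per 1 million tokens) chosen because they reproduce the ~$0.002 figure in the dialog above; always check your provider's current pricing. `estimate_cost` is a hypothetical name.

```python
# Assumed list prices in USD per 1 million tokens; check the provider's
# current pricing page before relying on these numbers.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "ollama": {"input": 0.0, "output": 0.0},  # local models have no per-token fee
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one processing run."""
    price = PRICES[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000
```

With these assumed prices, 5,000 input tokens and 2,000 output tokens come to (5,000 × 0.15 + 2,000 × 0.60) / 1,000,000 = $0.00195, which the dialog shows as roughly $0.002.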
Reprocessing Transcripts
Why Reprocess?
Reprocess existing transcripts to:
- Try a different AI model (GPT-4o vs GPT-4o-mini)
- Use an updated prompt (after Selfoss updates)
- Fix processing errors (retry failed transcripts)
- Compare results (local vs cloud)
How to Reprocess
Method 1: Context Menu
- Right-click transcript in list
- Select “Reprocess with LLM”
- Choose provider/model
- Click “Start Analysis”
Method 2: Transcript View
- Click transcript to view
- Click “Reprocess” button (top-right)
- Configure settings
- Start processing
Reprocessing Behavior
- Overwrites existing processed data
- Preserves original file
- Tracks which model was used
- Maintains metadata (upload date, etc.)
⚠️ Warning: Manual edits will be lost! Export before reprocessing.
Comparing Results
Workflow:
- Export current results (JSON)
- Reprocess with different model
- Compare outputs side-by-side
- Keep better result
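If you have two exported JSON files, the side-by-side comparison can be partly automated. The sketch below is illustrative only: it assumes each export is a JSON object with a top-level list of strings under a key such as `decisions`, which may not match the real export format, and `compare_results` is a hypothetical helper.

```python
def compare_results(old, new, key="decisions"):
    """Report which entries under `key` appear in only one of two exports."""
    # Assumption: entries are plain strings, so they can be placed in sets.
    old_items = set(old.get(key, []))
    new_items = set(new.get(key, []))
    return {
        "only_in_old": sorted(old_items - new_items),
        "only_in_new": sorted(new_items - old_items),
        "in_both": sorted(old_items & new_items),
    }
```

Load each export with `json.load` and pass the resulting dictionaries in. Exact string matching will treat reworded decisions as different entries, so read the output as a starting point for manual review.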
Troubleshooting
Upload Issues
“File too large” error
Solutions:
    ✅ Split transcript into smaller files
    ✅ Remove unnecessary content (signatures, headers)
    ✅ Compress text (remove extra whitespace)
    ✅ Use summarization tool first (if > 50MB)
“Unsupported file format” error
Common causes:
- Using .pdf (not supported)
- Using .rtf (not supported)
- File has wrong extension
Solutions:
    ✅ Convert to .txt:
       - Open in text editor
       - "Save As" → Plain Text (.txt)

    ✅ Convert to .docx:
       - Open in Word
       - "Save As" → Word Document (.docx)
“Invalid encoding” error
Solutions:
    # Windows (Notepad)
    1. Open file
    2. "Save As" → Encoding: UTF-8

    # macOS (TextEdit)
    1. Open file
    2. Format → Make Plain Text
    3. Save (UTF-8 automatic)

    # Linux (command line)
    iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt
“File is empty” error
Causes:
- File has no content
- File is corrupted
- Wrong file selected
Solutions:
    ✅ Open file in text editor to verify content
    ✅ Check file size (should be > 1KB)
    ✅ Re-export from original source
    ✅ Copy-paste content into new .txt file
Processing Issues
“Processing failed” error
Common causes:
- API key invalid → Verify in Settings
- No credits → Add payment method (cloud providers)
- Network timeout → Check internet connection
- Model not available → Select different model
- Content too long → Split into smaller files
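For the "content too long" case, splitting can be scripted instead of done by hand. This is a minimal sketch that breaks a transcript between lines so no utterance is cut in half. The default `max_chars` value is an arbitrary assumption, not a Selfoss or provider limit, and `split_transcript` is a hypothetical helper.

```python
def split_transcript(text, max_chars=20000):
    """Split text into chunks of up to max_chars, breaking between lines.

    A single line longer than max_chars is kept whole in its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Joining the chunks back together reproduces the original text exactly, so nothing is lost by splitting. Each chunk is processed as a separate transcript, which means decisions that span a chunk boundary may be reported in pieces.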
“Empty result” error
Causes:
- Transcript too short (< 100 characters)
- No meaningful content (test text)
- Wrong language (non-English)
Solutions:
    ✅ Verify transcript has actual meeting content
    ✅ Minimum 3-5 sentences required
    ✅ Check language (English works best)
    ✅ Try different AI model
“Malformed JSON” error
This is an internal error. Solutions:
    ✅ Retry processing
    ✅ Use different provider (GPT-4o-mini → Gemini)
    ✅ Report bug if persistent
    ✅ Check if transcript has unusual characters
Very slow processing
Expected times:
- Short (1-2 pages): 10-20 seconds
- Medium (5-10 pages): 20-40 seconds
- Long (20+ pages): 40-90 seconds
If slower:
    ✅ Check internet speed (for cloud)
    ✅ Check CPU usage (for Ollama)
    ✅ Try smaller model (GPT-3.5-turbo)
    ✅ Close other applications
Validation Issues
“No speaker labels detected”
This is informational, not an error:
- Single-speaker transcripts work fine
- AI will infer context without labels
To add speaker labels:
    Original:
    "We need to decide on the budget."

    With labels:
    "John: We need to decide on the budget."
“Transcript structure unclear”
Improve structure:
    ❌ Poor:
    we talked about stuff and decided things...

    ✅ Good:
    Meeting Topic: Q4 Budget Planning
    John: We need to finalize the Q4 budget.
    Sarah: I propose allocating 40% to marketing.
    Decision: Approved 40% marketing budget.
Best Practices
File Preparation
1. Clean up transcript:
    Remove:
    - Email signatures
    - Legal disclaimers
    - Repeated headers/footers
    - Excessive line breaks

    Keep:
    - Actual meeting content
    - Speaker names
    - Decisions and actions
    - Important context
2. Add structure:
    # Meeting: Q4 Planning
    Date: October 16, 2024
    Attendees: John, Sarah, Mike

    ## Discussion

    John: ...
    Sarah: ...

    ## Decisions

    1. Approved marketing budget
    2. Launch date set for November 1

    ## Action Items

    - [ ] John: Finalize contracts by Friday
    - [ ] Sarah: Send budget breakdown
3. Verify quality:
- Complete sentences
- Proper punctuation
- No garbled text
- Consistent formatting
Optimization Tips
For faster processing:
- Use GPT-4o-mini (fastest cloud)
- Remove unnecessary content before upload
- Process during off-peak hours
- Use Ollama for local processing (no network latency)
For better accuracy:
- Use GPT-4o or Gemini 1.5 Pro
- Include speaker labels
- Add context (meeting type, date)
- Proper sentence structure
For cost optimization:
- Use Ollama (free, local)
- Use Gemini 1.5 Flash (cheapest cloud)
- Clean transcript before processing (fewer tokens)
- Batch similar transcripts
Next Steps
You’re now a transcript processing expert!
Recommended Actions:
- Upload test transcript - Try different formats
- Experiment with reprocessing - Compare different models
- Explore visualizations → 07_VISUALIZATION_DEEP_DIVE_GUIDE.md
- Try interactive editing → 08_INTERACTIVE_EDITING_GUIDE.md
- Set up backups → 09_DATA_MANAGEMENT_GUIDE.md
Advanced Topics:
- Batch processing multiple files
- Custom prompts for specialized analysis
- API optimization for high volume
- Sentiment analysis with temporal data
From text to insight in seconds.