1) Executive Summary
Many organizations run critical operations using tribal knowledge: procedures live inside people’s heads, informal videos, screenshots, chat messages, or ad-hoc notes. The result is slow onboarding, inconsistent execution, and high operational risk.
Operation Document Generator solves this by converting videos and images into structured operational documentation, using OCR, scene/frame processing, deduplication, LLM-based refinement, and either manual templates or rule-based structure. It also adds a searchable knowledge layer using vector embeddings so teams can retrieve relevant steps and best practices quickly.
2) The Problem We Solve (Business View)
Common pain points organizations face:
- Operational knowledge is trapped in video recordings (screen recordings, training demos, “how-to” recordings) and is hard to reuse as formal documentation.
- Manual creation of SOPs/runbooks is slow and expensive and often becomes outdated quickly.
- Quality and consistency problems: different teams write procedures differently, steps are missed, and documentation becomes unreliable.
- Duplicate/near-duplicate steps and screenshots bloat documents and confuse users. Operation Document Generator addresses this directly through merging and deduplication.
- Lack of governance and repeatability: without rule-based control, documentation generation becomes inconsistent across teams.
Who experiences this most?
- IT operations & support teams building runbooks and incident procedures
- Shared service centers, BPO, and helpdesks documenting repetitive workflows
- Manufacturing/field operations capturing step-based procedures
- Any organization with compliance-driven procedures requiring standardization and traceability
3) The Solution
Operation Document Generator is a modular application that:
- Accepts operational videos and images via a web interface
- Analyzes video quality and extracts metadata before processing
- Splits video into frames, merges related frames into scenes, then removes duplicates
- Runs OCR on selected frames/images and stores results in PostgreSQL
- Refines steps into a structured manual (manual templates + rule-based structure + LLM refinement)
- Provides a Rule Manager UI to govern dedup/overlap handling and analyze rule usage
- Adds a chatbot / knowledge retrieval layer using embeddings + vector similarity search for fast Q&A
4) How It Works (End-to-End Workflow)
Step 1 — Upload & validate
Users upload a video through the web UI; the system extracts metadata and can run a quality analyzer to confirm the video is suitable for processing.
Step 2 — Frame extraction & scene merging
The video processor's frame extractor pulls frames at intervals, then merges related frames into logical groups/scenes to reduce redundancy and make analysis easier.
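To make the extraction and merging steps concrete, here is a minimal sketch in Python. It is not the product's implementation: the function names, the fixed time interval, and the use of a single numeric "signature" per frame (for example, mean brightness) are all assumptions made for illustration.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> List[int]:
    """Return the indices of frames sampled every `interval_s` seconds (hypothetical helper)."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    step = max(1, round(fps * interval_s))  # number of frames between two samples
    return list(range(0, total_frames, step))


def merge_into_scenes(signatures: List[float], threshold: float) -> List[List[int]]:
    """Group consecutive frames whose signature stays within `threshold` of the previous frame.

    `signatures` holds one number per sampled frame (assumed here to be
    something like mean brightness); a large jump starts a new scene.
    """
    scenes: List[List[int]] = []
    for i, sig in enumerate(signatures):
        if scenes and abs(sig - signatures[i - 1]) <= threshold:
            scenes[-1].append(i)  # still the same scene
        else:
            scenes.append([i])  # first frame, or a visible change
    return scenes


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(300, 30.0, 2.0))
    # Six sampled frames that fall into three visually distinct groups.
    print(merge_into_scenes([10.0, 10.5, 30.0, 30.2, 31.0, 80.0], threshold=1.0))
```

A real scene detector would compare richer features than one number per frame, but the grouping logic follows the same pattern: consecutive frames stay together until the difference crosses a threshold.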
Step 3 — Deduplication
A dedicated deduplication component removes duplicate or nearly identical frames (hash/image similarity), ensuring only meaningful unique content moves forward to documentation.
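As an illustration of hash-based deduplication, the sketch below uses an average hash compared by Hamming distance. This is one common technique, chosen here as an assumption; the document does not state which hash or similarity measure the product uses. Images are assumed to be small grayscale grids that have already been downscaled, and all names are hypothetical.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # grayscale pixel grid, values 0-255 (assumed pre-downscaled)


def average_hash(image: Image) -> int:
    """Build a bit string with one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def deduplicate(images: List[Image], max_distance: int = 1) -> List[int]:
    """Return indices of images to keep, dropping any image within
    `max_distance` bits of an image that was already kept."""
    kept: List[int] = []
    kept_hashes: List[int] = []
    for idx, img in enumerate(images):
        h = average_hash(img)
        if all(hamming_distance(h, k) > max_distance for k in kept_hashes):
            kept.append(idx)
            kept_hashes.append(h)
    return kept


if __name__ == "__main__":
    frame_a = [[10, 200], [10, 200]]
    frame_b = [[12, 198], [11, 201]]  # nearly identical to frame_a
    frame_c = [[200, 10], [200, 10]]  # mirrored layout, clearly different
    print(deduplicate([frame_a, frame_b, frame_c]))
```

The `max_distance` threshold controls how aggressive the filter is: 0 removes only exact hash matches, while larger values also remove near-duplicates.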
Step 4 — OCR + storage
OCR is run on selected frames/images using tools like Tesseract, EasyOCR, or Vision APIs. Results are stored via repository logic into PostgreSQL for later retrieval and auditing.
Step 5 — Convert raw observations into structured steps
A “Chain of Thought” pipeline converts raw extracted content into a clean step-by-step action flow: it extracts candidate steps, validates them, clusters and merges overlaps, removes duplicates, and standardizes output into a manual template format.
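The flow described above (extract candidates, validate, resolve overlaps, standardize) can be sketched roughly as follows. This is a simplified stand-in, not the product's pipeline: it uses Python's standard `difflib.SequenceMatcher` for text similarity instead of an LLM, it keeps the first of two overlapping steps rather than merging their text, and the validation rule and threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def normalize(step: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(step.lower().split())


def is_valid(step: str) -> bool:
    """Hypothetical validation rule: a usable step has at least two words."""
    return len(step.split()) >= 2


def structure_steps(raw: List[str], threshold: float = 0.85) -> List[str]:
    """Validate candidates, drop near-duplicates, and return numbered steps."""
    steps: List[str] = []
    for candidate in raw:
        text = normalize(candidate)
        if not is_valid(text):
            continue  # discard fragments such as stray button labels
        overlaps = any(
            SequenceMatcher(None, text, normalize(kept)).ratio() >= threshold
            for kept in steps
        )
        if overlaps:
            continue  # an equivalent step is already in the list
        steps.append(candidate.strip())
    return [f"{n}. {s}" for n, s in enumerate(steps, start=1)]


if __name__ == "__main__":
    raw_steps = [
        "Click the Save button",
        "click the  save button",
        "OK",
        "Open the Settings menu",
        "Click the Save button.",
    ]
    for line in structure_steps(raw_steps):
        print(line)
```

In this sketch the order of the surviving steps follows the order of the input, which matches the idea of preserving the sequence observed in the video.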
Step 6 — Rules + Manual Structure governance
Two approaches can be applied: Manual structure (template-driven) or Rule-based structure (governed via rules integration + rule UI).
Step 7 — Serve, search, and monitor
Generated outputs and artifacts can be stored in object storage (MinIO), while system behavior is recorded in separate log files and in database tracking tables to support monitoring and troubleshooting.
5) Key Implemented Features
A) Video processing engine
- Video upload handling + metadata capture
- Video quality analyzer to filter low-quality inputs
- Frame extraction service
- Scene/frame merging to reduce noise and improve step grouping
- Image/frame deduplication (hash/similarity-based) to keep only meaningful frames
B) OCR and content extraction
- OCR service layer with repository persistence (auditable results)
- Unified handler concept that can route input to OCR, prompting, and post-processing
C) “Chain-of-Thought” step structuring (core differentiator)
- Converts messy raw video content into clean, ordered operational steps
- Clustering, overlap resolution, merging, cleanup, and template output
D) Rules Integration + Rule Manager UI (governance & control)
- A web UI for CRUD management of dedup/overlap rules
- Rule analysis view: frequency of rule usage + unused active rules
- Document overlap review page to inspect applied rules against document results
- Versioning behavior for rule Excel outputs (auto-increment versions)
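The rule analysis and versioning behavior listed above can be sketched as follows. The rule IDs, the `rules_vN.xlsx` naming scheme, and both function names are hypothetical; the document does not specify how rules are identified or how versioned files are named.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def analyze_rule_usage(
    active_rule_ids: Iterable[str], applied_rule_ids: Iterable[str]
) -> Tuple[Dict[str, int], List[str]]:
    """Return (usage count per applied rule, active rules that were never applied)."""
    usage = Counter(applied_rule_ids)
    unused = sorted(r for r in set(active_rule_ids) if usage[r] == 0)
    return dict(usage), unused


def next_version_name(existing: Iterable[str], stem: str = "rules") -> str:
    """Pick the next file name in an assumed `<stem>_v<N>.xlsx` sequence."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.xlsx$")
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{stem}_v{max(versions, default=0) + 1}.xlsx"


if __name__ == "__main__":
    counts, unused = analyze_rule_usage(["R1", "R2", "R3"], ["R1", "R1", "R3"])
    print(counts, unused)
    print(next_version_name(["rules_v1.xlsx", "rules_v2.xlsx", "notes.txt"]))
```

Listing unused active rules alongside usage counts gives reviewers a quick way to spot rules that may be obsolete or too narrowly written.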
E) Knowledge retrieval (Embeddings + Vector Search)
- Embeddings component supports multiple providers (local models/Ollama/OpenAI/AWS Bedrock)
- PostgreSQL + pgVector used for vector similarity search
- Chatbot repository stores messages + embeddings and retrieves similar messages via vector similarity
- Script for adding “best practice guidelines” into embeddings to improve chatbot guidance
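To show what vector similarity retrieval means in practice, here is a small in-memory sketch using cosine similarity. In the system described above this comparison would run inside PostgreSQL through the vector extension rather than in application code; the three-dimensional vectors, the sample messages, and the function names below are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    stored: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the `k` stored messages most similar to the query embedding."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in stored]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones come from an embedding model and have many more dimensions.
    messages = [
        ("restart service", [1.0, 0.0, 0.0]),
        ("reset password", [0.0, 1.0, 0.0]),
        ("restart server", [0.9, 0.1, 0.0]),
    ]
    for text, score in top_k([1.0, 0.0, 0.0], messages):
        print(f"{text}: {score:.3f}")
```

Pushing the same ranking into the database avoids loading every embedding into memory and lets an index speed up the search as the message history grows.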
F) Enterprise-ready operations
- PostgreSQL DB initialization + reset scripts; structured objects and sequences for consistent IDs
- Object storage integration via MinIO (S3-compatible) for frames/artifacts/log outputs
- Dedicated logs (app log, docgen log, monitoring logs) to isolate issues quickly
- Containerization with Dockerfile and Kubernetes/OpenShift deployment patterns
G) UX and usability
- Multi-language UI support (English/Japanese translations)
- Video upload UI scripts (drag/drop, progress, error handling)
- Admin dashboard templates for monitoring video upload status and user activities
6) Architecture
- Web layer: routing, templates, sessions, language selection, login protection
- Service layer: video upload service, rules integration service, manual structure service, frame services
- Data layer (PostgreSQL + pgVector): stores video metadata, OCR outputs, embeddings, session logs, processing results
- Object storage (MinIO): stores large binary content like extracted frames and artifacts
- Observability: rotating logs + monitoring scripts to restart and maintain uptime
- Deployment: Dockerized app + Kubernetes/OpenShift deployment configurations
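As one possible shape for the rotating, per-component logs mentioned above, the sketch below uses Python's standard `logging.handlers.RotatingFileHandler`. It assumes a Python application; the logger names, file paths, size limit, and backup count are illustrative choices, not values taken from the product.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(
    name: str, path: str, max_bytes: int = 1_000_000, backups: int = 3
) -> logging.Logger:
    """Create (or reuse) a logger that writes to its own size-limited file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # One logger per concern, so an issue in document generation
    # does not have to be dug out of the general application log.
    app_log = build_logger("app", "app.log")
    docgen_log = build_logger("docgen", "docgen.log")
    app_log.info("application started")
    docgen_log.info("document generation started")
```

When a file reaches `max_bytes`, the handler renames it with a numeric suffix and starts a new file, keeping at most `backups` old files.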
7) A Concrete Example Result
Our pipeline shows the practical impact of deduplication and refinement: starting with 6 records, deduplication leaves 4, and clustering, merging, and refinement then produce the final document, for an overall reduction of about 33% (6 records down to 4) by the end of the pipeline. This means less noise, fewer repeated steps, cleaner manuals, and faster review times.
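The reduction figure follows directly from the record counts quoted above. A short check, with a hypothetical helper name:

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of records removed between two pipeline stages."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    # 6 records in, 4 records out: (6 - 4) / 6 = 0.333...
    print(round(reduction_percent(6, 4), 1))  # 33.3
```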
8) Primary Use Cases
Use Case 1 — IT Operations Runbooks from Screen Recordings
Situation: IT teams record incident fixes and maintenance steps as videos.
Pain: Converting them into runbooks takes hours and often misses steps.
How we help: Upload video → extract frames → run OCR on UI text → build step sequence → enforce dedup rules → publish runbook.
Use Case 2 — Standard Operating Procedures (SOP) for Business Process Teams
Situation: Operations teams train staff using videos/screenshots.
Pain: Onboarding is slow; procedures are inconsistent.
How we help: Generates standardized manuals with templates + rule governance + multilingual UI.
Use Case 3 — Customer Support Knowledge Base from Repeated Solutions
Situation: Support agents solve similar issues repeatedly.
Pain: Knowledge is scattered across chats and recordings.
How we help: Store embeddings + vector search to retrieve similar cases and best-practice guidance.
Use Case 4 — Compliance-Driven Procedures & Auditability
Situation: Regulated orgs require traceable and consistent procedures.
Pain: Hard to prove steps were captured consistently and updated.
How we help: Rule manager + DB persistence + logs create governance and traceability for documentation generation.
9) Differentiators (Why This Is Not “Just OCR”)
- Video-to-manual pipeline with scene logic and deduplication
- Rule governance UI to control overlaps/duplicates and measure rule effectiveness
- LLM-powered refinement via prompt handling and step quality improvements
- Enterprise stack readiness: Postgres + pgVector, MinIO storage, Docker/Kubernetes/OpenShift patterns, logs/monitoring scripts
- Multilingual UI (English/Japanese) for global delivery