Illustrates the end-to-end process of image transcription in Joplin, from client upload to server-side processing, job management, and result retrieval.
flowchart LR
subgraph Client
ClientNode((Joplin))
end
subgraph JoplinServer
JS[[REST API]]
end
subgraph Transcribe
API[[Transcribe API :4567]]
Q[(Internal Queue)]
Worker[[Job Processor]]
DB[(PostgreSQL - Jobs)]
Store[(Images Folder)]
Engine[[LlamaCPP + LLM]]
end
ClientNode -- "POST /transcribe" --> JS
JS -- "POST /transcribe?secret=***" --> API
API -- "Persist job (created)" --> DB
API -- "Save image" --> Store
API -- "Enqueue job" --> Q
Worker -- "Dequeue" --> Q
Worker -- "Load image" --> Store
Worker -- "Transcribe" --> Engine
Worker -- "Update status/result" --> DB
Worker -- "Delete image" --> Store
ClientNode -- "POST /transcribe/:job_id" --> JS
JS -- "POST /transcribe/:job_id?secret=***" --> API
API -- "Read status/result" --> DB
API -- "Status/result" --> JS
JS -- "Status/result (text)" --> ClientNode
This diagram details the architectural flow for image transcription within the Joplin ecosystem. The Joplin client uploads an image to Joplin Server, which proxies the request to a dedicated Transcribe service. The Transcribe service persists the job, saves the image, and enqueues the job. A job processor then dequeues it, loads the image, runs it through an LLM engine (LlamaCPP), records the status and result, and deletes the image. Meanwhile, the client polls Joplin Server with the job ID until it can retrieve the final transcribed text.
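The job lifecycle described above can be sketched as a small in-memory simulation. This is a minimal illustration and not Joplin's actual implementation: the `TranscribeService` class, its method names, and the `completed` and `failed` states are assumptions made for the example (only the `created` state appears in the diagram), and plain dictionaries stand in for PostgreSQL and the images folder.

```python
import queue
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    id: str
    state: str = "created"        # "created" is the state named in the diagram
    output: Optional[str] = None  # transcribed text, or an error message


class TranscribeService:
    """Hypothetical in-memory stand-in for the Transcribe subgraph."""

    def __init__(self, engine: Callable[[bytes], str]) -> None:
        self.engine = engine                       # stands in for LlamaCPP + LLM
        self.jobs: Dict[str, Job] = {}             # stands in for PostgreSQL - Jobs
        self.images: Dict[str, bytes] = {}         # stands in for the Images Folder
        self.pending: "queue.Queue[str]" = queue.Queue()  # Internal Queue

    def submit(self, image: bytes) -> str:
        """API side: persist the job, save the image, enqueue the job."""
        job = Job(id=uuid.uuid4().hex)
        self.jobs[job.id] = job
        self.images[job.id] = image
        self.pending.put(job.id)
        return job.id

    def process_next(self) -> None:
        """Worker side: dequeue, load image, transcribe, update job, delete image."""
        job_id = self.pending.get_nowait()
        job = self.jobs[job_id]
        try:
            job.output = self.engine(self.images[job_id])
            job.state = "completed"  # assumed state name
        except Exception as error:
            job.output = str(error)
            job.state = "failed"     # assumed state name
        finally:
            del self.images[job_id]

    def status(self, job_id: str) -> Job:
        """API side: read the status and result that the client polls for."""
        return self.jobs[job_id]
```

A caller would `submit` an image, keep the returned job ID, and call `status` repeatedly until the state leaves `created`. In the real system the worker runs independently of the API, whereas here `process_next` must be called by hand.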
Use this diagram as a reference when you need to understand asynchronous processing of media files in a client-server architecture, design a job-based system for long-running tasks, or implement an image processing pipeline built on AI/ML models. It suits systems that need background processing with status updates.
This design can be adapted by replacing LlamaCPP with other LLMs or transcription services, integrating different queuing systems (e.g., Kafka, RabbitMQ), using object storage instead of a local images folder, or implementing webhooks for result notification instead of polling. The 'secret' query parameter on the calls from Joplin Server to the Transcribe API suggests shared-secret authentication, which could be expanded into a fuller authentication layer.
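One way to keep such substitutions cheap is to have the worker depend on a narrow interface rather than on a concrete backend. The sketch below illustrates this for image storage only. The `ImageStore` protocol, both implementations, and `run_job` are hypothetical names introduced for the example. The in-memory store merely stands in for an object-storage client, and deleting the image even when transcription fails is an assumption (the diagram does not say).

```python
from pathlib import Path
from typing import Callable, Dict, Protocol


class ImageStore(Protocol):
    """Hypothetical storage interface that the worker depends on."""

    def save(self, job_id: str, data: bytes) -> None: ...
    def load(self, job_id: str) -> bytes: ...
    def delete(self, job_id: str) -> None: ...


class FolderImageStore:
    """Local images folder, as in the diagram. Job IDs are assumed path-safe."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, job_id: str) -> Path:
        return self.root / f"{job_id}.img"

    def save(self, job_id: str, data: bytes) -> None:
        self._path(job_id).write_bytes(data)

    def load(self, job_id: str) -> bytes:
        return self._path(job_id).read_bytes()

    def delete(self, job_id: str) -> None:
        self._path(job_id).unlink()


class InMemoryImageStore:
    """Placeholder for an object-storage backend with the same three methods."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def save(self, job_id: str, data: bytes) -> None:
        self._blobs[job_id] = data

    def load(self, job_id: str) -> bytes:
        return self._blobs[job_id]

    def delete(self, job_id: str) -> None:
        del self._blobs[job_id]


def run_job(store: ImageStore, engine: Callable[[bytes], str], job_id: str) -> str:
    """Load the image, transcribe it, and delete the image afterwards."""
    try:
        return engine(store.load(job_id))
    finally:
        store.delete(job_id)  # assumption: clean up even if the engine fails
```

Because `run_job` only sees the three-method interface, swapping the folder for another backend does not touch the worker logic. The same approach would apply to the queue and the engine.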