Joplin Server Image Transcription Flow

System Design · flowchart diagram · NOASSERTION

Illustrates the end-to-end process of image transcription in Joplin, from client upload to server-side processing, job management, and result retrieval.

Source: https://github.com/laurent22/joplin/blob/7f55a566e5a95e24cbfd7792cd7baf07ba46dc10/readme/apps/transcribe/system_architecture.md
Curated by laurent22
Joplin Transcription Image Processing LLM Asynchronous Microservices Job Queue

Mermaid source

```mermaid
flowchart LR
	subgraph Client
		ClientNode((Joplin))
	end

	subgraph JoplinServer
		JS[[REST API]]
	end

	subgraph Transcribe
		API[[Transcribe API :4567]]
		Q[(Internal Queue)]
		Worker[[Job Processor]]
		DB[(PostgreSQL - Jobs)]
		Store[(Images Folder)]
		Engine[[LlamaCPP + LLM]]
	end

	ClientNode -- "POST /transcribe" --> JS
	JS -- "POST /transcribe?secret=***" --> API
	API -- "Persist job (created)" --> DB
	API -- "Save image" --> Store
	API -- "Enqueue job" --> Q

	Worker -- "Dequeue" --> Q
	Worker -- "Load image" --> Store
	Worker -- "Transcribe" --> Engine
	Worker -- "Update status/result" --> DB
	Worker -- "Delete image" --> Store

	ClientNode -- "POST /transcribe/:job_id" --> JS
	JS -- "POST /transcribe/:job_id?secret=***" --> API
	API -- "Read status/result" --> DB
	API -- "Status/result" --> JS
	JS -- "Status/result (text)" --> ClientNode
```
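
The submission half of the flow (POST /transcribe, then persist job, save image, enqueue job) can be sketched as follows. This is a minimal in-memory illustration, not Joplin's implementation: a dict stands in for the PostgreSQL jobs table, another dict for the images folder, and a `queue.Queue` for the internal queue. The names `Job`, `TranscribeApi`, and `create_job`, the image path scheme, and the assumption that `secret` is a shared secret checked on every call are all hypothetical.

```python
import hmac
import queue
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    image_path: str
    status: str = "created"  # the diagram persists new jobs as "created"
    result: Optional[str] = None


@dataclass
class TranscribeApi:
    """In-memory stand-ins: `jobs` for the PostgreSQL jobs table,
    `images` for the images folder, `pending` for the internal queue."""

    secret: str
    jobs: Dict[str, Job] = field(default_factory=dict)
    images: Dict[str, bytes] = field(default_factory=dict)
    pending: queue.Queue = field(default_factory=queue.Queue)

    def create_job(self, secret: str, image: bytes) -> str:
        # Assumed shared-secret check; compare_digest avoids timing leaks.
        if not hmac.compare_digest(secret.encode(), self.secret.encode()):
            raise PermissionError("invalid secret")
        job_id = str(uuid.uuid4())
        image_path = f"images/{job_id}.bin"  # hypothetical naming scheme
        # Same order as the diagram's edges: persist job, save image, enqueue.
        self.jobs[job_id] = Job(job_id, image_path)
        self.images[image_path] = image
        self.pending.put(job_id)
        return job_id
```

Calling `api.create_job("s3cret", image_bytes)` on an `api = TranscribeApi(secret="s3cret")` returns a new job id whose record starts in the "created" state and is already waiting in `api.pending` for a worker, so the HTTP request can return immediately.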

What this diagram shows

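This diagram details the architectural flow for image transcription in the Joplin ecosystem. The Joplin client uploads an image to the Joplin Server REST API, which forwards the request, with a secret query parameter appended, to a dedicated Transcribe service listening on port 4567. The Transcribe service persists a job record in PostgreSQL with the status "created", saves the image to a local images folder, and enqueues the job on an internal queue. A job processor dequeues each job, loads its image, transcribes it with an LLM running on LlamaCPP, writes the status and result back to the database, and deletes the image. The client then polls the job's status through the Joplin Server (POST /transcribe/:job_id) and retrieves the transcribed text once the job is done.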
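
The job processor's steps (Dequeue, Load image, Transcribe, Update status/result, Delete image) might look roughly like the sketch below. It assumes in-memory stand-ins for the database, images folder, and queue; `transcribe` is any callable standing in for LlamaCPP + LLM. The statuses "processing", "completed", and "failed" and the function name `process_next` are hypothetical, since the diagram only names "created".

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    job_id: str
    image_path: str
    status: str = "created"
    result: Optional[str] = None


def process_next(
    pending: queue.Queue,
    jobs: Dict[str, Job],
    images: Dict[str, bytes],
    transcribe: Callable[[bytes], str],
    timeout: float = 1.0,
) -> Optional[str]:
    """Handle one queued job; return its id, or None if the queue stayed empty."""
    try:
        job_id = pending.get(timeout=timeout)  # Dequeue
    except queue.Empty:
        return None
    job = jobs[job_id]
    job.status = "processing"  # hypothetical intermediate status
    try:
        image = images[job.image_path]  # Load image
        job.result = transcribe(image)  # Transcribe (LlamaCPP + LLM in the diagram)
        job.status = "completed"  # Update status/result (a DB write in practice)
    except Exception:
        job.status = "failed"  # a real worker would also log the error
    finally:
        images.pop(job.image_path, None)  # Delete image, even on failure
        pending.task_done()
    return job_id
```

A long-running worker would call `process_next` inside a `while True:` loop, or run several such loops in threads sharing one queue. Deleting the image in the `finally` block is a choice of this sketch so failed jobs do not leave files behind; the diagram does not say how failures are handled.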

When to use it

Use this diagram to understand asynchronous processing of media files in a client-server architecture, to design a job-based system for long-running tasks, or to implement an image processing pipeline built on AI/ML models. The pattern suits systems where work takes too long to finish within a single request, so it runs in the background while clients check back for its status.
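
Because clients only learn about results by checking back, the client side needs a polling loop. The sketch below assumes a hypothetical `fetch_status` callable that wraps POST /transcribe/:job_id on the Joplin Server and returns a dict with `status` and `result` keys, and it assumes hypothetical terminal statuses "completed" and "failed". The exponential backoff and overall timeout are illustrative choices, not documented Joplin behavior.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_result(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Optional[str]]],
    timeout: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reaches a terminal state; return the transcribed text."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        reply = fetch_status(job_id)  # stands in for POST /transcribe/:job_id
        if reply["status"] == "completed":
            return reply["result"] or ""
        if reply["status"] == "failed":
            raise RuntimeError(f"transcription job {job_id} failed")
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, and capping the delay keeps a slow job from being polled too rarely once it finally completes.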

How to adapt it for your project

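This design can be adapted by replacing LlamaCPP with another LLM runtime or transcription service, swapping the internal queue for a dedicated message broker (e.g., Kafka, RabbitMQ), storing images in object storage instead of a local folder, or notifying clients through webhooks instead of having them poll. The 'secret' query parameter on calls from the Joplin Server to the Transcribe API suggests a shared-secret check between the two services; it could be extended into a fuller authentication layer, for example by sending the secret in a request header so it stays out of URLs and access logs.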
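
Most of these swaps become local changes if the worker depends on small interfaces rather than on LlamaCPP or the filesystem directly. The sketch below uses `typing.Protocol` (Python 3.8+) for that; `TranscriptionEngine`, `ImageStore`, `InMemoryStore`, `EchoEngine`, `run_job`, and the `notify` callback are all hypothetical names. `notify` marks where a webhook call could replace client polling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


class TranscriptionEngine(Protocol):
    def transcribe(self, image: bytes) -> str: ...


class ImageStore(Protocol):
    def load(self, key: str) -> bytes: ...

    def delete(self, key: str) -> None: ...


@dataclass
class InMemoryStore:
    """Stand-in for the images folder; an object-storage client could expose the same methods."""

    blobs: Dict[str, bytes]

    def load(self, key: str) -> bytes:
        return self.blobs[key]

    def delete(self, key: str) -> None:
        self.blobs.pop(key, None)


class EchoEngine:
    """Placeholder engine; a LlamaCPP wrapper or a hosted OCR/LLM service would go here."""

    def transcribe(self, image: bytes) -> str:
        return image.decode("utf-8", errors="replace")


def run_job(
    job_id: str,
    image_key: str,
    engine: TranscriptionEngine,
    store: ImageStore,
    notify: Optional[Callable[[str, str, str], None]] = None,
) -> str:
    """Transcribe one image; optionally push the result instead of waiting to be polled."""
    try:
        text = engine.transcribe(store.load(image_key))
    finally:
        store.delete(image_key)  # mirror the diagram's "Delete image" step
    if notify is not None:
        notify(job_id, "completed", text)  # e.g. POST to a client-registered webhook URL
    return text
```

Because `TranscriptionEngine` and `ImageStore` are structural protocols, any class with matching methods satisfies them without inheriting from them, so swapping the engine or the storage backend leaves `run_job` unchanged.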

Key concepts