Vision-LLM · structured extraction

Screen Pipeline

v1.0.0 Apache 2.0 Self-hosted · BYOK

The data you need is on a screen, and the screen has no API. A legacy line-of-business app, a vendor tool with no export, an industrial HMI, a lab instrument, an operations dashboard — it shows you the numbers and gives you no way to get them out. Screen Pipeline reads that screen with a vision-LLM and emits its state as schema-validated JSON.

Clone it — free How it works

What it is

When the source has no API, read the screen.

Point Screen Pipeline at the screen; it captures a frame on a schedule or a trigger, a vision-LLM reads it the way a person does, and the result is validated against a JSON schema you control — ready to flow into monitoring, logging, automation, and integration pipelines. Self-hosted and BYOK: point it at a local Ollama box or your own model key. Client screens never have to leave your network.

01

Survives UI changes

Template-OCR and coordinate-based RPA break the moment a vendor nudges the layout. A vision-LLM reads the screen by content, so a layout update that would shatter a coordinate script just keeps working.

02

Schema-validated emissions

Every emission validates against a JSON schema you own. If the model returns the wrong shape, you find out at the boundary — not three steps downstream when your monitoring stack chokes on a missing field.

03

Self-hosted, BYOK

Point it at a local Ollama server or any OpenAI-compatible endpoint with your own key. Client screens stay inside the trusted network — no SaaS round-trip, no vendor seeing the data.

How it works

From a frame to a row, on a schedule.

Capture, read, validate, emit — with a JSON schema acting as the contract between you and the model.

Capture

Schedule or trigger

A frame on a cron, or on a manual trigger, or on a content-change watcher. Region-bounded so you read just the chunk that matters.

Read

Vision-LLM, your model

The frame goes to a vision-LLM with a prompt and your target schema. Local Ollama for fully-offline; any OpenAI-compatible endpoint if you want the cloud option.

Emit

Validated JSON

Output is validated against your JSON schema before it leaves the pipeline. Pass it to a monitoring stack, a log sink, a workflow engine, or write it to disk.

Who it's for

If your data is trapped on a screen, it will fit.

You are an automation engineer hitting a legacy line-of-business app or a vendor tool that exposes no export and no API.
You are a systems integrator who needs to lift state off an industrial HMI, a lab instrument, or a closed dashboard into a modern monitoring stack.
You are an ops team whose monitoring source is a vendor portal that refuses to talk to anything except a human staring at it.

Plainly

What it does, and what it doesn't.

What it does

Captures a screen region on a schedule or trigger
Sends the frame to a vision-LLM you control (local or BYOK cloud)
Validates the model's output against your JSON schema
Emits validated JSON to a sink of your choice
Ships with a mock-driven demo — no capture hardware or model needed to try it
Keeps client screens inside the trusted network

What it doesn't

It is not a DOM scraper — if your source is a web page, use Playwright instead
It is not for sources that already have an API, database access, or an export — use those
It does not click or type back into the screen — reads only, by design
It is not a managed SaaS — you run the capture host and the model
It does not promise accuracy beyond what the vision-LLM can read; schemas catch shape errors, not perception errors

It's free and open-source. Clone it.

Screen Pipeline is Apache 2.0 — no tiers, no account, no paid version held back. Clone the repo, install with pip install -e ., and run the mock demo to see it work in 60 seconds.

Get Screen Pipeline on GitHub All the tools