📝 Text-LLM Spike — Phase 0.5 Feasibility

Tests small on-device text LLMs (Gemma 3 270M, SmolLM2, Qwen) for structured-field extraction from already-OCR'd document text. Measures load time, inference time, and output quality on the same Motorola G that hit the wall on vision LLMs.

Last completed run (recovered from crash)

—

Environment

Config

Model

Device

Quantization

Idle. Click "Load model" to fetch weights (cached after first load).

Document text (OCR-equivalent)

Paste the raw text content of a doc (or click "Load sample pay stub" for a test fixture). This simulates what native ML Kit OCR would produce.

Extraction prompt

The system prompt and user template that are sent to the LLM. Edit them to test prompt variations.

System

User template (use {TEXT} for doc content)
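The {TEXT} substitution could be sketched as below. This is an illustration only: `ChatMessage` and `buildMessages` are hypothetical names, and the page's actual message format is not shown here.

```typescript
// Hypothetical message shape; the real page may pass prompts differently.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Fill every {TEXT} placeholder in the user template with the document text
// and pair the result with the system prompt.
function buildMessages(
  systemPrompt: string,
  userTemplate: string,
  docText: string
): ChatMessage[] {
  // split/join inserts docText literally; String.replace with a string
  // replacement would interpret sequences such as "$&" found in the text.
  const userContent = userTemplate.split("{TEXT}").join(docText);
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userContent },
  ];
}
```

Using split/join rather than `replace` matters for pay stubs, where OCR text routinely contains dollar signs.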

Run

Load model and paste doc text.

Timings

Run	Tokenize	Generate	Decode	Total	Tokens/s
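One way a row of this table could be derived from per-phase measurements is sketched below. It assumes that Total is the sum of the three phases and that Tokens/s counts generated tokens over the Generate phase only; the page may define either differently. `TimingRow` and `makeTimingRow` are hypothetical names.

```typescript
// Hypothetical row shape mirroring the table columns; durations in milliseconds.
interface TimingRow {
  run: number;
  tokenizeMs: number;
  generateMs: number;
  decodeMs: number;
  totalMs: number;
  tokensPerSec: number;
}

// Build one table row from phase durations and the number of generated tokens.
function makeTimingRow(
  run: number,
  tokenizeMs: number,
  generateMs: number,
  decodeMs: number,
  generatedTokens: number
): TimingRow {
  // Assumption: Total is the sum of the three measured phases.
  const totalMs = tokenizeMs + generateMs + decodeMs;
  // Assumption: throughput is measured over the Generate phase only.
  // Guard against a zero duration to avoid dividing by zero.
  const tokensPerSec =
    generateMs > 0 ? generatedTokens / (generateMs / 1000) : 0;
  return { run, tokenizeMs, generateMs, decodeMs, totalMs, tokensPerSec };
}
```

In a browser, the phase durations would typically come from differences between `performance.now()` readings taken around each step.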

Latest output (raw)

—

Latest output (parsed JSON if extractable)

—
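Small models often wrap their JSON in extra prose or code fences, so "if extractable" implies some tolerant parsing. A minimal sketch of one approach follows: scan for the first balanced `{...}` span, skipping braces inside string literals, then hand it to `JSON.parse`. `extractJson` is a hypothetical helper, not necessarily what this page does.

```typescript
// Return the first parseable JSON object found in raw model output,
// or null if none can be extracted.
function extractJson(raw: string): unknown {
  const start = raw.indexOf("{");
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      // Inside a string literal: only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        // Found the matching close brace; let JSON.parse validate the span.
        try {
          return JSON.parse(raw.slice(start, i + 1));
        } catch {
          return null;
        }
      }
    }
  }
  // Braces never balanced, e.g. generation was cut off mid-object.
  return null;
}
```

A truncated generation returns null here rather than a partial object, which keeps a failed extraction visibly distinct from a successful one when comparing models.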