.sort() · Zhehang

I worked closely with tutors on a problem that costs them real time: building revision packages. Practice-heavy subjects such as maths and the sciences need students to work through challenging past exam questions, so a tutor pulls out every question on a given topic from every year of prelim papers. Done by hand, one question at a time, this is slow and inconsistent.
.sort() does the whole thing for them, from a pile of papers to a finished package.

It is not a general chatbot you dump fifty PDFs into and hope for a clean answer. It is a purpose-built LLM skill, with its prompt and rubric tuned so that each question lands under the right topic reliably.

The shape of the problem

The input is a pile of prelim papers as PDFs. The desired output is structured: for every question in every paper, the syllabus topic it belongs to. That makes it a classification task over messy, scanned, inconsistent documents, at a scale where sorting by hand would be highly inefficient.

The pipeline

input: prelim papers
fan out: 1 agent per paper
classify: JSON records to a fixed schema
review: merge + human check
assemble: crop + typeset
output: revision package

One LLM agent is spawned per paper and all of them run in parallel, so the papers are processed concurrently rather than one after another. The reliability trick is that every question comes back as a JSON record forced to a fixed schema: the model cannot return an arbitrary tag, only a valid one, and every record composes cleanly into a single dataset.
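A minimal Python sketch of that fan-out and schema check, under stated assumptions: the topic list, the paper names, the `QuestionRecord` fields and `classify_paper` are all hypothetical, and `classify_paper` returns canned JSON where the real skill would call Claude. The write-up does not say whether the schema is enforced by the model API or checked afterwards; this sketch validates after the fact, so any off-list tag is rejected before it reaches the merged dataset.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical syllabus; the real topic list would come from the tutor's rubric.
TOPICS = {"circles", "coordinate geometry", "differentiation", "integration"}
FIELDS = {"paper", "question", "topic"}


@dataclass(frozen=True)
class QuestionRecord:
    paper: str
    question: str
    topic: str


def parse_record(raw: dict) -> QuestionRecord:
    """Reject anything that does not match the fixed schema."""
    if set(raw) != FIELDS:
        raise ValueError(f"unexpected fields: {sorted(raw)}")
    if raw["topic"] not in TOPICS:
        raise ValueError(f"unknown topic: {raw['topic']!r}")
    return QuestionRecord(str(raw["paper"]), str(raw["question"]), raw["topic"])


def classify_paper(paper: str) -> str:
    """Hypothetical stand-in for one agent: returns a JSON array of records."""
    canned = {
        "2021_P1": [
            {"paper": "2021_P1", "question": "1", "topic": "circles"},
            {"paper": "2021_P1", "question": "2", "topic": "integration"},
        ],
        "2022_P1": [
            {"paper": "2022_P1", "question": "1", "topic": "coordinate geometry"},
        ],
    }
    return json.dumps(canned[paper])


def run_pipeline(papers: list[str]) -> list[QuestionRecord]:
    # Fan out: one worker per paper, all running concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(papers))) as pool:
        outputs = list(pool.map(classify_paper, papers))
    # Validate every record, then merge everything into one dataset.
    dataset = []
    for output in outputs:
        dataset.extend(parse_record(raw) for raw in json.loads(output))
    return dataset


records = run_pipeline(["2021_P1", "2022_P1"])
print(len(records))  # 3
```

Because `pool.map` yields results in input order, the merged dataset keeps papers in submission order even though the agents finish at different times.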

Try it


Illustration of the pipeline on a hand-built sample paper. The real pipeline runs on Claude with a human-review pass.

Making sure it works

A classifier is only useful if it is reliable, and to make it reliable I drew on my years as a tutor. I first sorted a few papers by hand, tagging each question with its topic exactly as I always had.

These hand-sorted examples show the skill what each topic looks like. The step is crucial: left to itself, the model will file a question on circles under coordinate geometry, for example.
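One way to feed hand-sorted papers to a model is as few-shot examples in the prompt. The sketch below assumes that approach, which the write-up does not spell out; the prompt wording, `GOLD_EXAMPLES` and `build_prompt` are hypothetical, and the two example questions are invented to illustrate the circles versus coordinate geometry boundary.

```python
# Hypothetical hand-sorted examples; the real ones came from tutor-tagged papers.
GOLD_EXAMPLES = [
    ("Find the equation of the circle with centre (2, -1) that passes through (5, 3).",
     "circles"),
    ("Find the gradient of the line joining A(1, 2) and B(4, 8).",
     "coordinate geometry"),
]


def build_prompt(question: str, examples: list[tuple[str, str]], topics: list[str]) -> str:
    """Assemble a few-shot classification prompt from tutor-labelled examples."""
    lines = ["Classify the exam question into exactly one of these topics:"]
    lines += [f"- {topic}" for topic in topics]
    lines += ["", "Examples sorted by a tutor:"]
    for text, topic in examples:
        lines += [f"Question: {text}", f"Topic: {topic}", ""]
    # The new question goes last, leaving the topic for the model to fill in.
    lines += [f"Question: {question}", "Topic:"]
    return "\n".join(lines)


prompt = build_prompt(
    "Show that the point (3, 4) lies on the curve x^2 + y^2 = 25.",
    GOLD_EXAMPLES,
    ["circles", "coordinate geometry"],
)
print(prompt)
```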

From there, the skill runs parallel agents to extract and tag every question in each paper it is given. Where its tags disagreed with mine, the ambiguity usually lay in the rubric rather than the model, so I tightened the wording and ran it again.
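Checking the skill against the hand-sorted set can be as simple as diffing two tag maps. This is a sketch, not the actual tooling: the question IDs and tags are invented, and counting which pairs of topics get confused is an assumed way to spot where the rubric wording is ambiguous.

```python
from collections import Counter


def compare_tags(mine: dict[str, str], model: dict[str, str]):
    """Return the agreement rate and every question whose tags differ."""
    shared = sorted(mine.keys() & model.keys())
    disagreements = [(q, mine[q], model[q]) for q in shared if mine[q] != model[q]]
    rate = 1 - len(disagreements) / len(shared) if shared else 0.0
    return rate, disagreements


def confused_pairs(disagreements) -> Counter:
    """Count (tutor tag, model tag) pairs; frequent pairs hint at ambiguous rubric wording."""
    return Counter((mine, model) for _, mine, model in disagreements)


# Hypothetical tags for four questions from one hand-sorted paper.
mine = {"Q1": "circles", "Q2": "integration", "Q3": "differentiation", "Q4": "circles"}
model = {"Q1": "coordinate geometry", "Q2": "integration", "Q3": "differentiation", "Q4": "circles"}

rate, diffs = compare_tags(mine, model)
print(rate)                   # 0.75
print(diffs)                  # [('Q1', 'circles', 'coordinate geometry')]
print(confused_pairs(diffs))  # Counter({('circles', 'coordinate geometry'): 1})
```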

A final human-review pass then sweeps the full set for the residual misses left by the agents.

The reliability comes from anchoring the whole thing to a tutor’s knowledge and judgement first, not from hoping the model knows the syllabus.

From sorted to built

Sorting is only half the job. Once each question is filed and checked, a builder assembles the finished product. The catch is that the AI tends to redraw the diagrams itself, which introduces a host of errors. To keep it reliable, a cropping skill was developed so the agents lift the real figure straight from the source paper rather than draw their own.
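The build step can be sketched as two small pieces: grouping the checked questions into topic sections, and turning a figure's location on the source page into a pixel crop box so the original diagram is lifted rather than redrawn. Both are assumptions about how such a builder might work, not the actual cropping skill; `TaggedQuestion`, its fields and the bounding boxes are hypothetical. The conversion relies on the default PDF user-space unit being 1/72 inch, and assumes the box uses the same top-left origin as the rendered page image.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

PDF_UNITS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class TaggedQuestion:
    paper: str
    number: int
    topic: str
    figure_bbox: Optional[BBox] = None  # figure location in PDF units, if any


def crop_box_pixels(bbox: BBox, dpi: int) -> Tuple[int, ...]:
    """Scale a PDF-unit box to pixel coordinates on a page rendered at `dpi`."""
    scale = dpi / PDF_UNITS_PER_INCH
    return tuple(round(v * scale) for v in bbox)


def assemble(questions: list[TaggedQuestion]) -> dict[str, list[TaggedQuestion]]:
    """Group questions into topic sections, each ordered by paper then question number."""
    sections = defaultdict(list)
    for q in sorted(questions, key=lambda q: (q.paper, q.number)):
        sections[q.topic].append(q)
    return dict(sorted(sections.items()))


questions = [
    TaggedQuestion("2022_P1", 3, "circles", figure_bbox=(72.0, 144.0, 288.0, 360.0)),
    TaggedQuestion("2021_P1", 5, "circles"),
    TaggedQuestion("2021_P1", 2, "integration"),
]
package = assemble(questions)
print(list(package))  # ['circles', 'integration']
print([(q.paper, q.number) for q in package["circles"]])  # [('2021_P1', 5), ('2022_P1', 3)]
print(crop_box_pixels(questions[0].figure_bbox, dpi=144))  # (144, 288, 576, 720)
```

Rendering the page and cutting out the image would be left to a PDF library; this sketch stops at the coordinates.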

.sort() is the whole package: a stack of prelim papers goes into the funnel, and a well-sorted revision package comes out the other end, without anyone spending days doing it by hand.