Why I Killed the Cursor for Medical Records
The attorneys at one of our client firms use a document called the medical index. It's the backbone of deposition prep -- a compiled PDF of every medical record in a case, grouped by provider, ordered by date of service, with a separate section for diagnostics. Building one takes a medical case manager anywhere from 30 minutes to two hours depending on the case, and they do about seven a day.
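The index's layout is essentially a grouping-and-sorting rule. Here's a minimal sketch of that rule. It assumes each record has already been reduced to a provider, a date of service, and a diagnostic flag. `Record` and `build_index` are hypothetical names. Ordering provider sections by each provider's first visit is also an assumption, not something the firm specified.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical shape of one medical record after metadata extraction.
    title: str
    provider: str
    date_of_service: date
    is_diagnostic: bool  # assumed flag: imaging, labs, etc.


def build_index(records):
    """Lay records out in medical-index order.

    Returns (provider_sections, diagnostics). Assumptions: diagnostic records
    go only in the diagnostics section, and provider sections are ordered by
    each provider's earliest date of service.
    """
    by_provider = defaultdict(list)
    diagnostics = []
    for r in records:
        if r.is_diagnostic:
            diagnostics.append(r)
        else:
            by_provider[r.provider].append(r)

    # Within each provider, order records by date of service.
    for recs in by_provider.values():
        recs.sort(key=lambda r: r.date_of_service)

    # Order the provider sections by each provider's first visit.
    provider_sections = sorted(
        by_provider.items(), key=lambda item: item[1][0].date_of_service
    )
    diagnostics.sort(key=lambda r: r.date_of_service)
    return provider_sections, diagnostics


records = [
    Record("Follow-up visit", "Dr. Lee", date(2023, 3, 10), False),
    Record("MRI lumbar spine", "City Imaging", date(2023, 2, 1), True),
    Record("Initial consult", "Dr. Lee", date(2023, 1, 15), False),
    Record("ER visit", "General Hospital", date(2023, 1, 2), False),
]
sections, diagnostics = build_index(records)
for provider, recs in sections:
    print(provider, [r.title for r in recs])
print("Diagnostics", [r.title for r in diagnostics])
```

With this sample, General Hospital's section comes first because its only visit predates Dr. Lee's first one. The MRI lands in the diagnostics section rather than under City Imaging.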
It's also completely static. Once it's built, it's a PDF. You can Ctrl+F through it and that's about it.
The firm wanted to automate this. The pitch made sense on paper: use AI to generate an initial medical index, then have the medical case managers review and edit it -- a Cursor-style human-in-the-loop approach. I built a prototype, deployed it to a subset of case managers, and ran it for a week.
It didn't work.
The numbers looked fine
This is the part that made the problem hard to see at first. I ran benchmarks across more than 600 files, and the generated index was over 90% accurate on the metadata used to build the PDF -- provider groupings, date ordering, diagnostic classification. By most measures, the AI was doing its job.
Case managers were marginally faster with it. Everyone agreed on that. But "marginally faster" isn't enough when you're asking people to change how they work.
Where the client and I disagreed
The client's instinct was straightforward: make the indexes better. Throw more LLM calls at the problem. Improve accuracy from 90% to 95%, then 98%, and eventually the editing step becomes trivial.
I didn't think this would fix it. The issue wasn't accuracy. It was the workflow itself.
Two problems hiding as one
After watching case managers work with the prototype for a week, I realized there were two distinct problems.
Editing isn't much faster than creating from scratch. When a case manager builds a medical index manually, they're constructing a mental model of the case as they go -- which providers matter, how the treatment timeline flows, what the diagnostic picture looks like. They're building a memory stack. When you hand them a pre-generated index to edit, they don't have that context. They have to reconstruct it by reading through someone else's (or something else's) work. The time saved on assembly gets eaten by the time spent on comprehension.
Case managers aren't the consumers. This is the deeper problem. Medical case managers build the index, but attorneys are the ones who use it at depositions. Case managers are going to over-index on completeness -- they'll err toward including everything because they don't have firsthand knowledge of what actually matters in the room. Attorneys do.
This is where the Cursor analogy breaks down. With Cursor, engineers are both the writers and the consumers of code. They write it, they read it, they build on top of it. The feedback loop is tight. With medical indexes, the creator and the consumer are different people with different incentives and different context. A human-in-the-loop editing flow only works when the human in the loop is the one with the most stake in the output.
Removing the middleman
I brought this to the partners. The medical index itself isn't bad -- 90%+ accuracy is genuinely good. The problem is the verification step in the middle, performed by someone with less stake in the final product. That middle step is what's slow, and making the AI more accurate doesn't eliminate it.
My proposal: skip the editing workflow entirely. Generate the medical index automatically and pair it with an interactive pre-deposition tool.
The pre-depo tool annotates the generated index using the medical entities we'd already been extracting -- providers, diagnoses, procedures, treatment timelines. Instead of scrolling through a static PDF, an attorney can query it. Surface the records from a specific provider. Pull every mention of a diagnosis. Build a timeline of treatment leading up to an injury. The information that used to require flipping through dozens of pages and hitting Ctrl+F is now structured and interactive.
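To make the query step concrete, here's a minimal sketch of those three queries over records already annotated with extracted entities. Everything in it is hypothetical: `AnnotatedRecord`, `records_from_provider`, `mentions_of_diagnosis`, and `timeline_before` are illustrative names, not the actual tool. It also assumes an in-memory list rather than whatever storage a production system would use.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnnotatedRecord:
    # Hypothetical record annotated with entities extracted upstream.
    record_id: str
    provider: str
    date_of_service: date
    diagnoses: frozenset = frozenset()
    procedures: frozenset = frozenset()


def by_date(records):
    # Every query returns results oldest first.
    return sorted(records, key=lambda r: r.date_of_service)


def records_from_provider(records, provider):
    """Surface every record from one provider."""
    return by_date(r for r in records if r.provider == provider)


def mentions_of_diagnosis(records, diagnosis):
    """Pull every record whose extracted diagnoses include `diagnosis`
    (case-insensitive exact match)."""
    target = diagnosis.casefold()
    return by_date(
        r for r in records if any(d.casefold() == target for d in r.diagnoses)
    )


def timeline_before(records, injury_date):
    """Treatment strictly before the injury date."""
    return by_date(r for r in records if r.date_of_service < injury_date)


records = [
    AnnotatedRecord("r1", "Dr. Lee", date(2022, 11, 3), frozenset({"Lumbar strain"})),
    AnnotatedRecord("r2", "General Hospital", date(2023, 1, 2),
                    frozenset({"Lumbar strain", "Concussion"}), frozenset({"CT head"})),
    AnnotatedRecord("r3", "Dr. Lee", date(2023, 2, 14), frozenset({"Concussion"})),
]
injury_date = date(2023, 1, 2)
print([r.record_id for r in records_from_provider(records, "Dr. Lee")])
print([r.record_id for r in mentions_of_diagnosis(records, "lumbar strain")])
print([r.record_id for r in timeline_before(records, injury_date)])
```

Exact matching is the simplest possible query. Matching differently worded diagnoses would need normalization that this sketch doesn't attempt.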
As attorneys work in the tool, their interactions can reshape the final PDF package or generate a deposition brief with the most critical points for that specific proceeding.
The tradeoff that isn't one
Attorneys spend about 15 minutes in the pre-depo tool per deposition. That's new time. But it's not additional time -- they were already spending time interpreting the case manager's static index before every deposition. The difference is that now those 15 minutes produce a meaningfully deeper understanding of the case, with the information structured around what they actually need in the room.
On the other side, medical case managers are completely out of the medical index process. Ten employees, each handling seven cases a day, each case taking 30 minutes to two hours. That time is returned entirely.
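Taking only the low end of that range, the time returned each day has a floor:

$$
10 \text{ employees} \times 7 \,\frac{\text{cases}}{\text{day}} \times 30 \,\frac{\text{min}}{\text{case}} = 2100 \,\frac{\text{min}}{\text{day}} = 35 \,\frac{\text{hours}}{\text{day}}
$$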
What I took from this
The instinct to put a human in the loop is usually right. But which human matters enormously. A verification step performed by someone who doesn't consume the output is just overhead dressed up as quality control. The real question isn't "how do we make AI accurate enough that editing is easy?" It's "who actually has a stake in this output, and how do we put the tool in their hands?"
Sometimes the right move isn't to refine the AI. It's to rethink who's using it.