Every delegation decision a lawyer makes — to an associate, a paralegal, a contract attorney — involves the same threshold question: what am I asking this person to do, and does it require my judgment or theirs? Lawyers delegate judgment to juniors all the time, and it works, because the senior attorney knows what the failure modes look like. She knows where a second-year associate is likely to miss a nuance, where a paralegal might mislabel a document, where a contract attorney will flag too much or too little. She can supervise because she has a mental model of how her delegates fail.
LLMs fail differently — and most lawyers have no model for it at all. The result is that lawyers who would never let a first-year associate decide which risks in a deal are "significant" will ask an LLM to do exactly that, and accept the answer, because the output reads like it was written by someone who knows what they are talking about.
Why this matters more than it seems
The mistake is understandable, because LLMs are extraordinarily good at sounding like they are exercising judgment. But the appearance is a product of mechanics, not cognition. An LLM generates text by predicting the next most probable token in a sequence, drawing on statistical patterns across its training data. It does not reason from principles to conclusions the way a lawyer does. It produces the output that is most likely given the input — which, for well-trodden territory, often looks indistinguishable from expert analysis.
This architecture makes LLMs genuinely excellent at certain tasks. They can manipulate text with high accuracy: reformatting, restructuring, summarizing, translating between registers. They can simulate semantic understanding well enough to compare two documents, extract defined terms, or organize a sprawling set of facts into a coherent narrative. They can generate multiple variations of a clause, a letter, or an argument. These are tasks where pattern-matching across a large corpus produces reliable results.
What the architecture does not support is the exercise of judgment — the weighing of competing considerations against a set of values, priorities, and contextual facts that exist nowhere in the training data. When a lawyer decides that a particular risk is acceptable for this client in this deal, that decision reflects knowledge the model cannot access: the client's risk tolerance, the business relationship between the parties, the practical likelihood of enforcement, the lawyer's own experience with similar transactions. An LLM asked to make that same decision will produce an answer that is fluent, confident, and grounded in nothing.
The most common mistake I see lawyers make with LLMs is not a matter of picking the wrong task. It is giving the model the right task in the wrong way: asking it to exercise professional judgment rather than to surface the information the lawyer needs to exercise that judgment herself.
The judgment words
There is a class of words that should function as a warning light every time they appear in a prompt. They are words that delegate evaluative judgment — the kind that requires contextual understanding, risk tolerance, client knowledge, and professional responsibility. Words like:
Reasonable. "Draft a reasonable indemnification clause." Reasonable according to whom? For what deal size, what risk profile, what industry? The word "reasonable" encodes a judgment call that depends on facts the model does not have and priorities only the lawyer can set.
Appropriate. "Include appropriate representations and warranties." Appropriate for a $500,000 seed round is not appropriate for a $50 million acquisition. The model will produce something that looks right. Whether it is right depends on context the model cannot evaluate.
Significant. "Identify the significant risks in this agreement." Every provision in a commercial contract allocates risk. Which risks are significant depends on the client's business, the deal's economics, and the lawyer's understanding of what could actually go wrong. The model has none of that.
Best. "What is the best approach to structuring this transaction?" Best for tax efficiency? Best for speed to close? Best for preserving the founder's control? The word "best" implies a single right answer where the real question is which tradeoffs the client is willing to accept.
Material. "Flag any material deviations from our standard terms." Materiality is a legal conclusion — one that depends on the governing law, the specific transaction, and often the client's own internal policies. An LLM can compare two documents and identify differences. It cannot determine which differences matter.
The pattern is the same in every case. These words ask the model to make a decision that requires professional judgment, and the model will comply — confidently, fluently, and without any indication that it is producing the statistically most probable answer rather than the correct one.
The fix: options, not conclusions
The solution is not to avoid these concepts. It is to restructure the prompt so the model does the work it is good at — gathering, organizing, comparing, generating variations — while the lawyer retains the decision.
The difference is mechanical, not philosophical. Compare:
Delegating judgment: "Identify the most significant risks in this lease agreement."
Delegating the task: "Identify the ten largest potential financial exposures in this lease agreement, state the contractual basis for each, and estimate the range of liability based on the terms."
The first prompt asks the model to decide what matters. The second asks it to surface options — with enough specificity that the lawyer can decide what matters.
A few more:
Delegating judgment: "Draft an appropriate force majeure clause."
Delegating the task: "Draft three force majeure clauses: one broad (covering any event beyond the parties' reasonable control), one narrow (limited to named events), and one with a mutual termination right if the force majeure exceeds 90 days."
Delegating judgment: "Summarize this contract and flag anything I should worry about."
Delegating the task: "List every obligation this contract imposes on the buyer, organized by section, and note the triggering condition and remedy for breach of each."
Delegating judgment: "Suggest the best structure for this acquisition."
Delegating the task: "Compare asset purchase, stock purchase, and merger structures for this acquisition on the following dimensions: tax treatment to the buyer, successor liability exposure, third-party consent requirements, and estimated closing timeline."
In each pair, the model does roughly the same amount of work. The difference is what happens next. The first version produces a conclusion the lawyer might accept without scrutiny because it sounds authoritative. The second produces a structured set of options the lawyer must evaluate — which is where the lawyer's value lies.
The supervision problem
The options-versus-conclusions framing serves a second purpose: it tests whether the lawyer can evaluate what the model produces. A prompt that asks for the "best" clause generates a single answer that requires little expertise to accept. A prompt that asks for three clauses with different risk profiles generates a choice — and making that choice requires understanding the tradeoffs among them. The structure of the prompt builds in a competence check that the conclusion-seeking prompt lacks.
This matters because the options-based approach will sometimes reveal that the lawyer asking the question cannot readily distinguish among the alternatives. That is useful information. It suggests either that the lawyer needs to develop more familiarity with the subject matter before proceeding, or that the task calls for someone with deeper expertise. Either response is professionally sound. What is not sound — and what the duty of technological competence increasingly requires lawyers to avoid — is accepting the model's conclusion on faith because the prose is fluent and the formatting is clean.
The question worth asking before relying on any AI-generated analysis is whether you are asking the tool to do something you could verify, or something you are trusting it to get right. Where the answer is closer to trust than verification, you have likely delegated the judgment along with the task.
A practical heuristic
Before submitting a prompt that calls for analysis or recommendations, scan it for evaluative language — words like reasonable, appropriate, significant, best, material, sufficient, or adequate. Where you find one, consider whether you could replace it with a concrete criterion. Where you can, do. Where you cannot, that is usually a signal to restructure the prompt so it requests a range of options rather than a single recommendation.
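The scan itself is mechanical enough to automate. Below is a minimal sketch in Python, offered purely as an illustration: the wordlist simply mirrors the examples in this piece, and a version for real use would need a list tuned to your practice area and more careful matching.

```python
import re

# Evaluative words that delegate judgment rather than scope a task.
# Illustrative only; extend the list for your own practice area.
JUDGMENT_WORDS = [
    "reasonable", "appropriate", "significant", "best",
    "material", "sufficient", "adequate",
]

def flag_judgment_words(prompt: str) -> list[str]:
    """Return the judgment words that appear in a draft prompt."""
    found = []
    for word in JUDGMENT_WORDS:
        # Whole-word, case-insensitive match, so "best" does not
        # fire inside a longer word like "asbestos".
        if re.search(rf"\b{word}\b", prompt, flags=re.IGNORECASE):
            found.append(word)
    return found

if __name__ == "__main__":
    draft = "Draft a reasonable indemnification clause with appropriate carve-outs."
    for word in flag_judgment_words(draft):
        print(f'Judgment word: "{word}". Can you replace it with a concrete criterion?')
```

Running this on the example prompt flags "reasonable" and "appropriate"; deciding what to substitute for them is, of course, exactly the part no script can do for you.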
The heuristic is, in substance, the same discipline lawyers apply when delegating to a junior — scoping the assignment so the person doing the work knows what to look for and the person receiving it knows what to do with it. The difference is that the junior will push back or ask clarifying questions when the assignment is ambiguous. The model generally will not.