The Agent Is Not Exercising Judgment

May 8, 2026

Thomson Reuters announced in April that the next generation of CoCounsel Legal will provide “fiduciary-grade AI” and perform “just as well as a senior associate.” Harvey’s platform now processes more than 400,000 agentic queries per day across contract review, document analysis, and legal research. LexisNexis launched Protégé with agentic drafting capabilities in May, promising “review-ready work product in minutes.” Each vendor has adopted the same word to describe what their tools do: they are agents—systems that plan, reason, select tools, and execute multi-step legal workflows with diminishing human involvement.

The marketing is ahead of the architecture. But the more pressing question is whether the architecture, even when it catches up, fits the work that lawyers are professionally obligated to perform themselves.

What “agentic” means and what it doesn’t

The term “agentic AI” describes systems that go beyond single-prompt, single-response interactions to execute multi-step workflows. Rather than answering a question, an agentic system receives a goal, decomposes it into subtasks, selects the tools and data sources for each subtask, evaluates intermediate results, and adjusts its approach before delivering a final output. The shift from chatbot to agent is, in the vendors’ framing, the shift from an AI that assists to an AI that works.

The distinction has some technical basis. A system that can plan a research strategy, retrieve authorities from Westlaw, compare them against uploaded documents, check whether cited cases remain good law, and assemble a structured memo has a different capability profile from a system that responds to a single prompt with a single block of text. The multi-step orchestration represents a genuine capability advance, and the outputs can be useful.

But much of what vendors are selling as “agents” is better described as workflow automation—predefined sequences of steps with conditional logic, executed by a language model rather than a human but following a structure the human (or the vendor) designed in advance. Harvey’s Agent Builder, for instance, evolved from a product called Workflow Builder. Users describe or design workflows, connect blocks in a no-code interface, add branching logic, and choose which model handles each step. The platform has facilitated the creation of more than 25,000 custom workflows. That is sophisticated automation, and it can save substantial time. Calling it an “agent” that “reasons through tasks” borrows language from the AI research literature in ways that obscure what the tool is doing—executing a sequence of operations that a human defined, with a language model providing flexibility at each node rather than rigid rule-following.

The distinction between a genuine agent and a workflow matters, because the professional-responsibility implications differ. A workflow that follows a predefined plan raises questions about whether the plan was well designed and whether the outputs at each step were accurate. An agent that formulates its own plan, makes intermediate judgment calls the user never sees, and adjusts its strategy in response to what it finds raises a different set of questions—ones that go to the core of what it means to supervise.
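
To make the contrast concrete, here is a minimal sketch of the two shapes. It is an illustration only, not any vendor's architecture: the step functions, run_workflow, run_agent, and toy_policy are all hypothetical names, and the "steps" are stubs that tag a string rather than call a model.

```python
from typing import Callable, List, Optional, Tuple

Step = Callable[[str], str]

# Hypothetical stand-ins for model-backed steps; each one just tags the text.
def retrieve(text: str) -> str:
    return text + " | retrieved"

def compare(text: str) -> str:
    return text + " | compared"

def summarize(text: str) -> str:
    return text + " | summarized"

def run_workflow(steps: List[Step], task: str) -> Tuple[str, List[str]]:
    """Predefined plan: a human fixed the steps and their order in advance."""
    result, trace = task, []
    for step in steps:
        result = step(result)
        trace.append(step.__name__)
    return result, trace

def run_agent(
    choose_next: Callable[[str, List[str]], Optional[Step]],
    task: str,
    max_steps: int = 10,
) -> Tuple[str, List[str]]:
    """Self-directed plan: the system picks each next step from what it has seen."""
    result, trace = task, []
    for _ in range(max_steps):
        step = choose_next(result, trace)
        if step is None:  # the policy decides when the work is finished
            break
        result = step(result)
        trace.append(step.__name__)
    return result, trace

def toy_policy(result: str, trace: List[str]) -> Optional[Step]:
    """Assumed decision rule: run the comparison only if the task mentions a redline."""
    if not trace:
        return retrieve
    if "summarize" in trace:
        return None
    if "compare" not in trace and "redline" in result:
        return compare
    return summarize
```

In the workflow, the supervising lawyer can read the plan before anything runs. In the agent loop, the plan exists only as the trace left behind afterward, and only if the tool records one: run_agent(toy_policy, "review lease") skips the comparison step entirely, and nothing in the final text says why.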

The delegation framework, revisited

In an earlier post, I drew a line between delegating tasks and delegating judgment. The argument was that LLMs are well suited for work that can be specified with concrete criteria—comparing documents, extracting defined terms, generating multiple clause variations—and poorly suited for work that requires weighing competing considerations against contextual facts the model cannot access. The practical heuristic was to scan prompts for evaluative language (“reasonable,” “appropriate,” “significant,” “material”) and restructure them to request options rather than conclusions.
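
For readers who want to automate that scan, a rough sketch follows. The four terms come from the heuristic above; the function name and the whole-word matching rule are my assumptions, and a real list would need to be longer and tuned to the practice area.

```python
import re
from typing import List

# The four evaluative terms named in the heuristic; extend as needed (assumption).
EVALUATIVE_TERMS = ("reasonable", "appropriate", "significant", "material")

def flag_evaluative_language(prompt: str) -> List[str]:
    """Return the evaluative terms found in the prompt, in the order listed above."""
    found = []
    for term in EVALUATIVE_TERMS:
        # Whole-word, case-insensitive match: "Material" is flagged,
        # but "materials" and "reasonably" are not.
        if re.search(r"\b" + re.escape(term) + r"\b", prompt, flags=re.IGNORECASE):
            found.append(term)
    return found
```

A prompt such as "Identify any material risks and suggest a reasonable cap" is flagged for "reasonable" and "material", while "List every indemnification clause and quote its cap" passes clean. The check is crude: it cannot tell "material" the adjective from "material" the noun, so a hit is a cue to reread the prompt, not a verdict on it.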

The rise of agentic tools complicates that framework—or so the standard argument goes. If an agent executes a ten-step workflow, and steps three through seven each involve choices about what to prioritize, which documents to retrieve, how to frame a comparison, and what to include or exclude from the final output, then the human who launched the workflow has delegated not just a task but a series of intermediate judgment calls embedded within it. The task/judgment distinction collapses, on this account, because legal tasks are saturated with judgment at every level of decomposition.

I think the standard argument gets the conclusion wrong, though the observation is right. Legal tasks do contain embedded judgment at nearly every step. Research requires deciding which terms to search, which results to pursue, which authorities to read closely, and which to set aside. Drafting requires deciding which provisions to include, how to frame contested terms, and what risks to address or leave unaddressed. Document review requires deciding what counts as responsive, what is privileged, and how to handle the ambiguous material at the margins. Even summarizing a contract—a task that sounds mechanical—requires judgment about which provisions deserve emphasis and which are boilerplate, and that judgment depends on the deal, the client, and the legal context.

The fact that legal tasks are judgment-laden does not mean the delegation framework fails. It means the framework identifies precisely why these tasks are poor candidates for autonomous execution. If you cannot specify a task without delegating the evaluative decisions embedded in it, then handing that task to an agent hands the judgment to the agent—which is what my earlier post argued lawyers should avoid.

Where agents work and where they don’t

The tasks best suited for agentic automation are the ones where the agent’s output serves as an input to the lawyer’s judgment rather than a substitute for it. That category is broader than it might first appear, and it includes work that falls within the scope of legal practice, not just the administrative functions that surround it.

Start with the operational end. Scheduling, calendaring, and deadline tracking are workflow problems with well-defined rules. Document management—organizing files, applying naming conventions, routing documents to the right matter—involves decisions, but they are decisions governed by firm policy rather than professional judgment. Billing and time-entry review, conflict-check data gathering, and CLE tracking all share the same structural feature: the criteria for a correct output can be specified in advance, and deviations from those criteria can be detected without the kind of contextual evaluation that defines legal reasoning.

But agentic tools can also handle certain practice-area tasks well, and acknowledging that is necessary to draw the line in the right place. An agent that monitors a regulatory docket and flags new rulemaking proposals, scans recent decisions in a practice area and summarizes developments, tracks legislative changes across jurisdictions, or gathers and organizes case law updates on a recurring schedule is performing substantive legal work—work that requires familiarity with legal sources and terminology, and that a firm would historically have assigned to an associate or a research librarian. These tasks are appropriate for agentic automation because their output is informational: the agent collects, organizes, and presents material that the lawyer will then evaluate. The lawyer decides which developments require client alerts, which regulatory changes affect pending transactions, and which new authorities alter the analysis in an ongoing matter. The agent feeds the lawyer’s judgment; it does not exercise judgment of its own.

Oregon’s Formal Opinion 2026-208 illustrates how one state bar is thinking about the boundary. The opinion addresses whether law firms may use AI agents that interact autonomously with prospective clients, and concludes “yes, qualified”—subject to extensive conditions. The lawyer remains responsible for supervising the accuracy of all outputs, for ensuring confidentiality, and for preventing the agent from making false or misleading statements about the firm’s services. The opinion treats the AI agent as a nonlawyer assistant under RPC 5.3, subject to the same supervisory obligations that would apply to a human performing the same function. What the opinion permits is bounded, monitored, information-gathering work—and it permits it only on those terms.

The difficulty arises when the agent’s output crosses from informational input to evaluative conclusion. A system that “plans, selects tools, retrieves authoritative content, analyzes the material, verifies citations, and delivers structured work product” (Thomson Reuters’ description of CoCounsel’s workflow) is performing a sequence of actions that, when done by a human, would involve professional judgment at each step. The selection of which authorities to retrieve is a judgment call. The analysis of retrieved material is a judgment call. The decision about how to structure the work product is a judgment call. Automating the sequence does not eliminate the judgment; it transfers the judgment to a system that exercises something that resembles judgment but is not judgment. An agent that scans a regulatory docket and delivers a summary of new filings gives the lawyer material to work with. An agent that scans the same docket and delivers a memo recommending which filings require client action has crossed the line, because the recommendation depends on client-specific context, risk tolerance, and strategic considerations the agent cannot access.

The supervision gap

Model Rule 5.1(a) requires partners and managerial lawyers to “make reasonable efforts to ensure that the firm has in effect measures giving reasonable assurance” that all lawyers in the firm conform to their professional obligations. Rule 5.3 imposes analogous duties for nonlawyer assistants, and ABA Formal Opinion 512 extends that category to encompass AI tools. The question is what “reasonable efforts” means when the assistant is an autonomous workflow that makes intermediate decisions the supervising lawyer cannot observe in real time.

When a partner assigns a research memo to a second-year associate, supervision is possible because the partner can inspect the process. The partner can ask what search terms the associate used, why she focused on certain cases, what she considered and rejected, and how she arrived at her conclusions. The work product reflects a chain of reasoning the supervisor can reconstruct, challenge, and, when necessary, redirect. The associate also exercises a form of professional responsibility herself: she has ethical obligations, she can push back on the assignment, and she can flag problems the assigning partner did not anticipate.

An agentic AI tool does none of those things—it has no professional obligations, it will not push back, and depending on the tool’s design, it may not produce a record of its intermediate decisions that would allow the supervising lawyer to reconstruct why the final output looks the way it does. In most current implementations, the system may log which databases it queried, but it will not explain why it chose one line of authority over another. It may show the documents it retrieved, but it will not account for the documents it considered and discarded. The supervisor receives the output and, in many cases, must evaluate it without meaningful access to the reasoning that produced it.

This is the supervision gap that agentic tools create, and it is structural rather than incidental. The gap does not arise because the tools are poorly designed. It arises because the tools are designed to reduce human involvement—that is their value proposition. Every step the agent handles autonomously is a step the lawyer does not have to perform, but it is also a step the lawyer cannot readily inspect. The efficiency gain and the supervisory loss are two descriptions of the same design choice—and the vendors’ pitch foregrounds only the first.

The ACEDS analysis of agentic AI liability captures the shift precisely: traditional generative AI creates “output risk”—the danger that a single product contains errors. Agentic AI introduces “workflow risk”—failures that propagate across multiple steps and multiple matters before detection. An agent that misapplies a research criterion or a classification rule does not produce one flawed output; it produces systematically flawed results across every matter it touches until someone catches the error. The scale of the efficiency is also the scale of the potential failure.

California’s response

California’s Committee on Professional Responsibility and Conduct approved proposed amendments to six rules of professional conduct in March 2026, after the California Supreme Court itself directed the State Bar to consider guidance addressing agentic AI tools specifically. The proposed amendments, which completed their public comment period on May 4, represent the most detailed AI-specific rule changes any state bar has put forward.

The proposed amendment to Rule 1.1 requires lawyers to “independently review, verify, and exercise professional judgment regarding any output generated by the technology that is used in connection with representing a client.” No exception for low-stakes tasks. No exception for outputs that “look right.” The rule as proposed treats independent verification as a component of competence, not an optional best practice.

The proposed amendments to Rules 5.1 and 5.3 require managerial lawyers to establish functioning AI governance policies and make clear that supervisory obligations extend to staff use of AI tools. These are not new obligations in substance—ABA Formal Opinion 512 already interprets the existing rules to cover AI—but California’s approach makes the obligations textually explicit and, if adopted, enforceable through the disciplinary process rather than through guidance documents with no binding authority.

What the California proposals do not yet address is the specific challenge that agentic workflows pose: the difficulty of supervising a system that makes intermediate decisions the lawyer cannot observe. The verification requirement in the proposed Rule 1.1 amendment is framed around outputs—the lawyer must verify the final product. But a multi-step agentic workflow can produce a final output that appears correct precisely because the intermediate decisions that shaped it are invisible. If the agent chose to emphasize one line of authority over another, or structured a comparison in a way that foregrounded certain risks while underweighting others, the final output will reflect those choices without revealing them. Verifying the output catches the fabricated citation. It does not catch the analysis that was shaped by choices the lawyer never made and cannot reconstruct.

The practical question

None of this means that agentic AI tools lack value in legal practice—even in the practice itself, as opposed to the business operations that support it. The value is substantial for any task where the agent’s output functions as raw material for the lawyer’s own evaluation: gathering regulatory updates, tracking case law developments, scanning for legislative changes, or assembling the factual record the lawyer needs to advise a client. These are tasks where the agent collects and the lawyer decides—where the output is an input, not a conclusion.

The risk is that firms deploy these tools where the output is itself the exercise of judgment, then rely on output verification to catch failures that are, by their nature, invisible in the output. The vendors’ competitive incentives push in exactly that direction—toward positioning their products as substitutes for the evaluative work that defines legal practice rather than as infrastructure for the information-gathering work that supports it. “Fiduciary-grade AI” and “just as well as a senior associate” are marketing claims designed to encourage precisely the kind of delegation the ethics rules counsel against.

The question worth asking before deploying an agentic tool on a legal task is whether you could supervise a human performing the same task using only the information the tool makes available to you. If you could not evaluate an associate’s research memo without knowing what search terms she used, what authorities she considered and rejected, and why she structured the analysis the way she did, you cannot adequately supervise an agent that withholds the same information. And the associate, at least, would tell you when she was in over her head. The malpractice insurers appear to have noticed: the ACEDS analysis reports that carriers are introducing exclusions or limitations for higher-autonomy use cases and warning that over-reliance without verification could affect coverage. The profession’s enthusiasm for agentic AI may, in time, be tempered less by the ethics rules than by the underwriters.

This post draws on product announcements from Thomson Reuters, Harvey, and LexisNexis; the California State Bar’s proposed amendments to the Rules of Professional Conduct; Oregon State Bar Formal Opinion 2026-208; the ACEDS analysis of agentic AI liability; and the ABA Model Rules of Professional Conduct and Formal Opinion 512. It extends the analysis of judgment delegation from a prior post and the supervisory framework discussed in posts on sycophancy and verification.