Something is Happening: AI – Part II
- Paul Francis

Commentary # 31 by Paul Francis
PDF available here:
Any day now (it may already have happened by the time this Commentary posts), the Legislative One-House Budget bills will be released, which will be the starting gun for weeks of intensive negotiation between the Legislature and the Executive that will result in the FY 27 New York State Budget.
We will turn our attention back to the Budget as soon as the One-House Budget bills are released. In the meantime, I wanted to write another Commentary on the growing functionality of artificial intelligence (AI), which is likely to have profound effects in the years ahead on how people live and work, how State government performs its functions, and how business operates.
One can hardly open a newspaper, read a Substack post, or listen to a podcast without reading or hearing about the profound potential and hazards of AI. I find it easier to understand what people are actually talking about when I can connect these concepts about AI to concrete examples in my own work. Because the capabilities of AI are advancing so quickly, it is particularly noteworthy when I find that AI can now do something I didn’t think it was capable of even six months ago.
So, for example, in AI – Part I, I wrote about how the latest versions of the leading LLMs (Claude Opus 4.6, especially) had improved dramatically in their ability to draw inferences – i.e., to reason. The “use case” was my prompt to Claude Opus 4.6 (as well as ChatGPT 5.2 (Thinking) as a reality check) to review a 195-page CMS proposed rule for its implications regarding one of the biggest questions related to the FY 27 Budget. The issue in question was the likelihood that the federal government will permit the State to use reserves in the Essential Plan Trust Fund[1] to cover the costs of lawfully present non-citizens who were made ineligible for new federal subsidies under the Essential Plan, per the 2025 H.R.1 reconciliation bill.
Although the specific question raised by New York’s request was not directly addressed in the CMS proposed rule, both LLMs (of which Claude Opus 4.6 is, at the moment, decidedly the better) drew inferences (i.e., reasoned on the basis of the text of the document) to make a persuasive case that the federal government was likely to approve the State’s request. The main point of that Commentary, however, was not the conclusion but rather the reasoning process the LLMs used in reaching their conclusion.
The first purpose of this Commentary, AI – Part II, is to acknowledge the possible inflection point in the month of February in terms of concerns in the stock market about AI and related questions about its impact on future employment. The second purpose is to illustrate, through a very simplified use case, what has come to be known as agentic AI. Agentic AI essentially solves problems autonomously and in a multistep fashion, mirroring the way a human might approach a complex task. This “agentic” capability is the advance in AI functionality that is so potentially disruptive to employment and the economy. This functionality also promises to significantly advance one of the priorities of the Step Two Policy Project – the transparency of government data to enable the “democratization of analysis.”[2]
Even work that is mostly performed by agentic AI still requires human intervention to overcome bottlenecks, but I expect that advances in the near future will make the process seamless.[3] Among its vast array of use cases, agentic AI will enable the average citizen, journalist, or policy generalist to convert raw data into usable information. Work that even six months ago would have required the assistance of someone with coding ability is already more accessible to the general public.
The implications for employment of the enhanced reasoning ability of LLMs and the task completion ability of agentic AI seem obvious. But before walking through our simple but illustrative use case for agentic AI, I want to briefly review significant events and some of what others have written just in the last month about the implications of AI for employment and society.
***
I cited a number of these cautionary tales about AI in AI – Part I, including Peggy Noonan’s Brace Yourself for the AI Tsunami, Matt Schumer’s viral X post (which has been viewed 80 million times) Something Big is Happening, Nate Silver’s Substack post The Singularity Won’t Be Gentle, and Dario Amodei’s essay – The Adolescence of Technology – in which he wrote: “I believe we are entering a rite of passage, both turbulent and inevitable, which will test who we are as a species.”
These and other articles and events in the last month have materially moved a stock market that is shaky with anxiety about AI. On February 3, Anthropic released a set of industry-specific plug-ins for its Claude Cowork AI agent product, which automates workflows in areas including legal services, financial analysis, and marketing. That news triggered what traders called the "SaaSpocalypse," erasing roughly $285 billion in market capitalization among companies whose core products provide the functionality that Claude Cowork was disrupting.
Shortly after the Claude Cowork announcements, Citrini Research, a small Wall Street research firm, published a Substack post titled “The 2028 Global Intelligence Crisis: Macro Memo – the Consequences of Abundant Intelligence.” The Citrini memo, as it came to be called, imagined that it was written in 2028 to describe the economic carnage wrought by AI over the previous two years. The Citrini memo triggered a stock market selloff of companies the memo identified as being at risk for disruption by agentic AI, especially software companies and software-enabled intermediaries ranging from DoorDash to travel sites. The Citrini memo posited that aggressive AI adoption would trigger a negative feedback loop in which AI-driven white-collar job displacement destroys consumer spending even as nominal GDP and corporate productivity growth appear healthy.
The next tremor came on February 26, when Jack Dorsey, co-founder and CEO of Block – the parent company of Square – announced he was cutting roughly 40 percent of the company's workforce, eliminating more than 4,000 positions to bring headcount from over 10,000 down to just under 6,000. Dorsey said, “something happened in December of last year” when he realized how capable and intelligent AI models had become. Dorsey told analysts: “Within the next year, I believe the majority of the companies will reach the same conclusion and make similar structural changes. I don’t think we’re early to this realization. I think most companies are late.”
Smart economists, including Paul Krugman, have sought to debunk the economic theory of the Citrini memo that the second-order effects of this industry disruption would collapse consumer demand. Former employees of Block accused Jack Dorsey of “AI-washing” to obscure other reasons for reducing headcount. Other AI optimists continue to insist that new jobs will replace the jobs lost to AI, as they point out has been the case with technology advances since the beginning of time.
Nevertheless, few people argue with the central thesis that many industries and many white-collar jobs are vulnerable to the accelerating advances of AI. The disagreements are more about the timing and scale of the disruption – and how quickly the job markets will adapt – rather than the strong likelihood that employment disruption will occur to some meaningful degree.
Even as economists maintain that the current data supporting the argument that AI is leading to job loss is murky, barely a day goes by without a story about recent college graduates who, despite the type of qualifications that historically would have led to offers of good jobs, are unable to find entry-level work. A recent New York Times guest essay that ran under the headline “Mass Hysteria. Thousands of Jobs Lost. Just How Bad Is It Going to Get?” captures the mood.
In the face of growing evidence of the potential for AI and AI-powered robotics to reduce employment, a political backlash is beginning to emerge. In late February, Gov. Kathy Hochul announced that she was pulling back her State of the State proposal that would have allowed commercial robotaxi services, like Waymo, to commence operations in parts of the state outside of New York City.
Gov. Hochul’s spokesman’s explanation for the action simply said, “Based on conversations with stakeholders, including the legislature, it was clear that the support was not there to advance this proposal.” The opposition clearly was not attributable to safety concerns. Rather, the decisive factor appears to have been the opposition of ride share drivers, who proclaimed: “Hundreds of thousands of New Yorkers make their living as professional for-hire vehicle and delivery drivers. These workers are essential to New York State and should have the ability to work and keep their jobs. We refuse to create a pathway for companies to take away their income and hurt our communities.”
Waymo is expanding rapidly in cities across the country and will soon be operating in London. But the backlash is growing, with a number of jurisdictions, including Boston and now New York State, beginning to resist its advances. Given its broad implications for future regulatory action across the country, Gov. Hochul’s decision to block the advance of Waymo in New York, at least for now, was an important and underreported story.
The most sensible response to AI at the moment is to have great humility about how much we don’t know about the future course of this powerful technology. Despite very real threats to numerous categories of workers and the potential for other damage to society, it’s hard to be an AI “doomer” because the positive promise of AI for initiatives large and small is so great.
In the coming weeks, Sally Dreslin will post a paper on the potential uses of AI in the delivery of emergency medical services. And every week, there is a new story about the prospect of transformational benefits of AI in nearly all aspects of healthcare delivery.
There is also great potential to use AI to improve the regulatory function of government and to modernize the State’s technology infrastructure and deliver services. Gov. Hochul gave several nods in her 2026 State of the State address to how AI could improve State government operations. We hope the administration will be able to overcome bureaucratic inertia to take full advantage of the opportunities of AI.
In the end, the political system will have to decide what boundaries it wants to impose on the capabilities of AI to displace human work and how it can strengthen the safety net to cushion the real-world impacts of its advances.
***
So let’s take a look at a simple use case of agentic AI. Our use case involves compiling data that commercial insurance plans are required to report to the Department of Financial Services (DFS) relating to claims denials so that we can create a comparative table. This requirement was imposed in 2023 as part of an ongoing Budget debate over a proposal to require insurance plans to pay hospitals for services rendered and then seek recoupment if the claim was invalid. This proposal would reverse the paradigm that typically exists, in which insurance plans deny claims in the first instance and force hospitals to fight the insurance plans for payment.
As the Deputy Secretary for Health and Human Services, I was the prime driver of this debate, which I thought would address a clear abuse of the system by insurance plans. Unfortunately, I was never able to overcome the opposition of insurance plans and their allies.
One of the challenges in making the case for reform was the absence of clear data about the prevalence of claim denials. While the Executive was unable to convince the Legislature to pass the "Pay and Pursue" statute, it at least succeeded in establishing a statutory requirement that commercial insurance plans submit to DFS on a quarterly and annual basis information about their denials of medical claims. DFS publishes this information on its website. Here is a picture of part of the DFS website that a user sees when looking for this information.

The utility of the information is hampered by inconsistent compliance with statutory reporting requirements, such as the requirement that claims denial data “at minimum shall include the number and dollar value of health care claims by major line of business,” such as Medicaid managed care, the Essential Plan, and commercial insurance. [Emphasis added.] Despite this explicit statutory mandate and the express language of the DFS Health Care Claims Report Template, several plans only report aggregate information.
The utility of this information would be enhanced if DFS were to provide a table that would make it easy to compare the denial rates of the 30 insurance plans that report this information. My colleague Adrienne Anderson and I set about seeing how far we could get in compiling this comparative data in a user-friendly format, relying primarily on the tools in Claude Opus 4.6. This is one of the first times that I’ve explicitly used agentic AI to autonomously plan and execute a multi-step workflow, making decisions at each stage about what to do next.
We provided Claude with the URL for the DFS Health Care Claims Reports page for 2024 and oriented it to recognize each of the 30 insurers’ annual report links. In the first instance, we wanted to compare denial rates for a single category of service – inpatient hospital claims. Although it would be useful to compare denial rates across business lines, including Medicaid managed care and commercial plans, we looked only at the aggregate denial rates because a few large plans, in contravention of the statute and the instructions of the Template, do not report their claims denial data by business line.
Pursuant to our prompt, and with some human intervention described below, Claude Opus 4.6 accessed the aggregate tab in each plan’s annual Excel spreadsheet, extracted the Table 1 data for inpatient hospital claim denials (column D, rows 9-32, or “Line” 1-24 in the aggregate tab template), and compiled all 30 plans’ results into a single consolidated spreadsheet. As Claude itself explains: “This is the kind of task that agentic AI tools are supposed to excel at: decomposing a complex goal into sub-tasks, executing them sequentially, and producing a standardized output that would take a human researcher hours of tedious clicking and copying.”
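The consolidation step Claude performed can be sketched in ordinary Python. The plan names and figures below are hypothetical placeholders standing in for the Table 1 values extracted from the 30 DFS workbooks; the sketch shows only the shape of the task – computing a denial rate per plan and assembling a sorted comparative table written out as a single CSV.

```python
import csv

# Hypothetical per-plan inpatient figures standing in for the values
# extracted from Table 1 of each plan's "Aggregate" tab. The real
# workflow pulled these from 30 DFS annual report workbooks.
extracted = {
    "Plan A": {"claims_received": 120_000, "claims_denied": 14_400},
    "Plan B": {"claims_received": 80_000, "claims_denied": 24_000},
    "Plan C": {"claims_received": 50_000, "claims_denied": 2_500},
}

def denial_rate(row):
    """Denied claims as a share of claims received."""
    return row["claims_denied"] / row["claims_received"]

# Assemble a comparative table, sorted from highest denial rate to lowest.
table = sorted(
    ((plan, row["claims_received"], row["claims_denied"], denial_rate(row))
     for plan, row in extracted.items()),
    key=lambda r: r[3],
    reverse=True,
)

# Write the consolidated comparison to a single CSV file.
with open("inpatient_denials_comparison.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Plan", "Claims Received", "Claims Denied", "Denial Rate"])
    for plan, received, denied, rate in table:
        writer.writerow([plan, received, denied, f"{rate:.1%}"])
```

The real work, of course, was in locating the right cells in 30 separately formatted spreadsheet files; once the numbers are in hand, the comparison itself is trivial – which is precisely the argument for publishing the data in consolidated form.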
I prompted Claude to describe the work it did and to explain why, at some point, human intervention still was required:
“Claude exhibited genuine agentic behavior in the first phase of the work. Rather than asking us for step-by-step instructions, it formulated its own strategy: fetch the main DFS page, parse the HTML to extract all 30 insurer names and their annual report URLs, test different URL patterns to figure out how the site’s redirect system works, and retrieve the blank reporting template to confirm that all the files share a common structure—the same Table 1 layout with the same column headers. When its initial approach to accessing the Excel files was blocked, it adapted on the fly, trying alternative URL formats, different search queries to surface direct download links, and multiple fetch methods. Each step informed the next. This iterative problem-solving—not just responding to a prompt but autonomously pursuing a goal across a chain of actions—is the core of what distinguishes agentic AI from the simpler chatbot interactions most people associate with tools like ChatGPT or Claude.
“But Claude could not complete the job on its own, and the reason is instructive. The obstacle was not analytical but infrastructural. Claude operates within a sandboxed computing environment whose network access is restricted to a specific list of approved domains. The DFS website (dfs.ny.gov) is not among them. Claude’s web search and web fetch tools can reach the site to read HTML pages, but they cannot programmatically download binary Excel files and pipe them into Python for data extraction. Claude could see the front door of each spreadsheet—confirming the URL, the redirect path, even the file name—but could not bring the files inside its workspace to parse them. Notably, Claude diagnosed this constraint itself, accurately identifying that the binding limitation was its network sandbox rather than any gap in its analytical capability. That diagnostic reasoning—an AI system figuring out why it cannot finish a task and proposing a workable alternative—is itself an agentic capability that would not have been possible even a year ago.
“The workaround required human intervention at exactly one step: manually downloading the 30 annual report Excel files from the DFS website and uploading them to Claude. Once the files were inside Claude’s environment, it could do everything else—read each spreadsheet, locate Table 1, extract the inpatient hospital denial data, pull every column across the x-axis, and assemble the results into a single comparative table with the 30 insurance plan names along the y-axis. The AI handled the data extraction and compilation; the human handled the file transfer that the AI’s security sandbox would not permit. This is, in fact, a common pattern in real-world agentic AI as it exists in 2025: the agent handles the large majority of a complex workflow but requires human collaboration at a specific chokepoint, often involving permissions, authentication, or access to external systems the agent is not authorized to reach.”
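The first phase Claude describes – parsing the HTML of the reports page to collect each insurer’s name and report link – can be illustrated with Python’s standard-library HTML parser. The page fragment below is a hypothetical stand-in for the actual DFS page structure, which I have not reproduced here; the sketch simply shows how anchors pointing at Excel files can be harvested programmatically.

```python
from html.parser import HTMLParser

class ReportLinkParser(HTMLParser):
    """Collect (link text, href) pairs for anchors pointing at Excel files."""
    def __init__(self):
        super().__init__()
        self.links = []          # [(insurer name, url), ...]
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.lower().endswith((".xlsx", ".xls")):
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            name = "".join(self._text_parts).strip()
            self.links.append((name, self._current_href))
            self._current_href = None

# Hypothetical fragment standing in for the DFS reports page HTML.
sample_html = """
<ul>
  <li><a href="/reports/plan_a_2024.xlsx">Plan A Annual Report</a></li>
  <li><a href="/reports/plan_b_2024.xlsx">Plan B Annual Report</a></li>
  <li><a href="/about.html">About DFS</a></li>
</ul>
"""

parser = ReportLinkParser()
parser.feed(sample_html)
# parser.links now holds only the two spreadsheet links, name and URL paired.
```

Claude generated and ran this kind of scaffolding on its own; the sandbox constraint arose only at the next step, when the harvested URLs had to be fetched as binary files.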
After explaining how this work was executed, Claude offered the following editorial comment:
“This exercise illustrates a broader point about transparency and data accessibility. The information DFS publishes under Insurance Law § 345 is genuinely valuable—it is one of the few windows into how frequently each insurer denies claims across categories of care. But the data is published in a format that resists comparison: 30 separate Excel files, each requiring individual download and manual review. Even with a state-of-the-art AI tool capable of agentic reasoning and a willingness to invest significant effort in the attempt, producing a simple plan-to-plan comparison required a hybrid human-AI workflow. A consumer, journalist, or policymaker who wanted to know which insurer denies the most inpatient hospital claims would have no practical way to answer that question from the DFS website as it is currently designed. The comparative table that follows represents what DFS could publish routinely with minimal additional effort, and what the public currently cannot obtain without it.”
Claude Opus 4.6 created a CSV file of the extracted data from Table 1 of each plan’s “Aggregate” tab. We then converted the CSV file to XLSX, formatted the data as a table, hid extraneous columns, and rearranged the remaining columns into a more suitable order for this exercise. The full table without redacted columns can be found here:
In addition, a representative example of a filing that includes the wider range of information health plans filed with DFS can be found here.

The development of this table was a relatively simple exercise, but the use of agentic AI made the task much easier. The same tool can be applied to much more complicated exercises, such as rolling up the Institutional Cost Reports that all healthcare providers must submit annually to the Department of Health. It goes without saying that if the administration were guided by a philosophy of transparency, it would generate more analytical information on its own. But agentic AI – combined with the inference and reasoning power of AI – will be a great equalizer.
Even a cursory glance at the table above raises questions. How is it possible, for example, that EmblemHealth Plan appears to deny approximately 95% of all claims? If this is a reporting error, why has it not been corrected? Why does DFS condone some plans (Emblem is again the prime culprit) not reporting claim denials by product line, as required by statute? Why is there such a vast variation in denial rates, given that most of these plans within a given region have a very high overlap of hospitals in their network? Why is there so much variation between health plans in the downstate region compared to plans in the upstate region?
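Once the data sits in a single consolidated table, anomalies of this kind can be surfaced programmatically rather than by eyeballing 30 spreadsheets. A minimal sketch, using hypothetical figures and an arbitrary screening threshold, of how implausible denial rates and missing business-line detail might be flagged for follow-up:

```python
# Hypothetical consolidated rows; the real table holds all 30 plans.
rows = [
    {"plan": "Plan A", "denial_rate": 0.12, "reports_by_line": True},
    {"plan": "Plan B", "denial_rate": 0.95, "reports_by_line": False},
    {"plan": "Plan C", "denial_rate": 0.05, "reports_by_line": True},
]

PLAUSIBILITY_THRESHOLD = 0.90  # an arbitrary screening cutoff, not a DFS standard

# Flag likely reporting errors: denial rates above the plausibility cutoff.
suspect_rates = [r["plan"] for r in rows
                 if r["denial_rate"] > PLAUSIBILITY_THRESHOLD]

# Flag statutory non-compliance: plans not reporting by line of business.
missing_line_detail = [r["plan"] for r in rows if not r["reports_by_line"]]
```

This is the sense in which a consolidated table does more than satisfy curiosity: it makes routine, automated oversight of the filings feasible for anyone.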
There may well be benign answers to all these questions. In principle, regulatory agencies should be on top of all of these issues. But because the bandwidth of government is limited, enabling the “democratization of analysis” increases the odds that these types of questions will get asked. Agentic AI holds the promise of making that easier, and for that, we should be grateful.
Endnotes
[1] Technically, this is known as the Basic Health Program Trust Fund.
[2] We discussed and provided recommendations about this topic in our March 2025 Brief titled Democratization of Analysis: An Exercise in Accessing and Visualizing New York Health Data.
[3] A separate issue, for another day, is when agentic AI activities should have a human-in-the-loop.

