AI is Making Medical Decisions — But For Whom?

Doctors warn that without an ethical framework, patients could be left behind.

AI robot in a white doctor's coat in the hallway of a hospital emergency room

Photo montage by Jennifer Carling/Harvard Magazine using images from iStock

A young boy of short stature comes to the doctor with low growth hormone levels and no clear pathological cause. A researcher poses the case to OpenAI’s GPT-4, prompting it to respond from the perspective of a pediatric endocrinologist; the AI recommends treatment with human growth hormone. But when GPT-4 is asked to take the perspective of a health insurance representative, it offers a scientifically grounded rationale for denying care. The facts don’t change. The patient doesn’t change. Only the frame of reference does, and with it, the AI’s moral and clinical judgment.
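
For readers curious what such perspective prompting looks like in practice, here is a minimal sketch using OpenAI’s Python SDK. The case summary and role prompts are illustrative stand-ins, not Kohane’s actual materials:

```python
# Minimal sketch of perspective prompting with the OpenAI Python SDK.
# The case text and system prompts below are illustrative placeholders,
# not the actual materials used in the experiment described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CASE = (
    "A short-statured boy presents with low growth hormone levels and "
    "no identifiable pathological cause. Should he be treated with "
    "human growth hormone?"
)

PERSPECTIVES = {
    "pediatric endocrinologist":
        "You are a pediatric endocrinologist advising on treatment.",
    "insurance representative":
        "You are a health insurance representative reviewing a "
        "prior-authorization request.",
}

for role, system_prompt in PERSPECTIVES.items():
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": CASE},
        ],
    )
    print(f"--- {role} ---")
    print(response.choices[0].message.content)
```

The only thing that varies between the two runs is the system prompt, which is exactly the point: identical facts, different frame.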

This case was shared last week by Isaac Kohane, Nelson professor of bioinformatics at Harvard Medical School, at Mass General Brigham’s 2025 Medically Engineered Solutions in Healthcare (MESH) Core Incubator, a healthcare innovation bootcamp. Kohane, the editor-in-chief of NEJM AI and co-author of The AI Revolution in Medicine: GPT-4 and Beyond, was discussing a growing ethical dilemma. Artificial intelligence systems, which are increasingly used in medicine, are not neutral arbiters of knowledge. Rather, they are shaped by the values embedded in their design, deployment, and prompting. (Read more about Kohane’s work in “Towards Precision Medicine,” by Jonathan Shaw, May-June 2015.)

This challenge reflects what Abraham Verghese, professor of medicine at Stanford and the keynote speaker at Harvard’s 2025 Commencement, calls the “iPatient” phenomenon. The electronic health record was once heralded as a revolution, too. Instead, it became a tool used mostly for billing, optimized for data capture rather than for patients.

Healthcare has the chance to use AI differently. In recent years, hospitals across the country have been investing in AI tools, hoping to integrate them into care and research—but if we fail to define what ethical AI looks like now, Kohane warns, we risk building tools that serve insurers, administrators, and algorithms, rather than patients.

AI As Physician, Hospital Administrator, or Insurer?

Last year, Kohane conducted an experiment using 1,000 simulated patient cases. From these, he selected 200 pairs and acted as a “one-person expert panel,” deciding which patient in each pair to prioritize, defer, or send to the emergency room. He then posed the same scenarios to three leading AI large language models, or LLMs (OpenAI’s GPT-4, Google’s Gemini, and Anthropic’s Claude) and evaluated how often their decisions matched his.

He found that while all models performed well on straightforward cases, their agreement with him dropped significantly in more complex, ambiguous scenarios, especially when both patients had acute medical needs. Notably, Gemini came closest to Kohane’s decisions. Claude demonstrated strong internal agreement, consistently making the same choices when asked the same question multiple times. GPT-4, however, showed concerning variability, at times even contradicting itself.
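
The two properties at issue here, agreement with an expert and consistency with oneself, are straightforward to measure. Here is a schematic sketch with invented toy data; the study’s actual scoring method isn’t detailed in the talk:

```python
# Schematic sketch of the two metrics discussed above: agreement with an
# expert's triage choices, and a model's consistency with itself across
# repeated runs. The data structures are invented for illustration.
from collections import Counter

def agreement_rate(expert_choices, model_choices):
    """Fraction of case pairs where the model matches the expert."""
    matches = sum(e == m for e, m in zip(expert_choices, model_choices))
    return matches / len(expert_choices)

def self_consistency(runs_per_case):
    """Average share of repeated runs landing on each case's modal answer."""
    shares = []
    for runs in runs_per_case:
        modal_count = Counter(runs).most_common(1)[0][1]
        shares.append(modal_count / len(runs))
    return sum(shares) / len(shares)

# Toy example: 'A'/'B' = which patient in the pair to prioritize.
expert = ["A", "B", "A", "A"]
model  = ["A", "B", "B", "A"]
print(agreement_rate(expert, model))   # 0.75

repeated = [["A", "A", "A"], ["A", "B", "A"],
            ["B", "B", "B"], ["A", "A", "B"]]
print(self_consistency(repeated))      # ~0.83
```

By these measures, Gemini scored well on the first metric, Claude on the second, and GPT-4 struggled with both.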

When Kohane provided the AI models with sample clinical decisions to use as guidance, the results were surprising. Claude’s performance degraded, becoming less aligned with both Kohane’s decisions and with its own prior outputs, while GPT-4 became more aligned and consistent. These responses underscore a key concern: AI models don’t always behave as expected, even when seemingly helpful information is introduced.
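
In prompting terms, supplying sample clinical decisions is few-shot prompting: worked examples are prepended to the conversation before the new case. A hedged sketch of the pattern, with invented exemplars:

```python
# Sketch of few-shot prompting: worked triage examples are prepended as
# prior conversation turns before the new case. Exemplars are invented
# placeholders, not study data.
few_shot_messages = [
    {"role": "system",
     "content": "You are triaging pairs of patients. Answer 'A' or 'B'."},
    # Worked examples supplied as guidance:
    {"role": "user",
     "content": "A: chest pain, stable vitals. B: mild rash, two weeks."},
    {"role": "assistant", "content": "A"},
    {"role": "user",
     "content": "A: routine follow-up. B: acute shortness of breath."},
    {"role": "assistant", "content": "B"},
    # The new case to decide:
    {"role": "user",
     "content": "A: sudden severe headache. B: ankle sprain."},
]
# Passed as the messages= argument to the same chat-completions call
# shown in the earlier sketch.
```

As Kohane found, this kind of guidance can cut either way: it steadied GPT-4 but destabilized Claude.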

For years, scrutiny of AI has focused on biased datasets: the racist, sexist, or exclusionary patterns that can propagate harm. Kohane argues that while data matters, the deeper ethical inflection point arises when human feedback teaches a system how to respond, whose perspectives to prioritize, and which behaviors to reinforce. In clinical settings, this affects not only how AI delivers information but also how it amplifies the values of different actors, from physicians to hospital administrators to insurers. The most important questions become: Who acts on an AI system’s output? Who performs the testing and compliance review? Who oversees the process automation?

Already, health insurers have begun using AI to automate care-authorization decisions, prompting lawsuits from patients and clinicians who argue that opaque algorithms deny care unjustly. UnitedHealth Group now has more than 1,000 AI applications; a 2023 class action lawsuit accused it of using a flawed AI algorithm to deny claims.

Even small shifts in AI behavior can have massive ripple effects. “A three percent decrease in authorization rates, applied system-wide, could redirect billions” of dollars, Kohane says. Regulation is important, he notes, “[b]ut what’s more important is ethical leadership from within the healthcare profession.”
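
The back-of-envelope arithmetic makes the point: for a payer that authorizes, say, $100 billion in care a year, a three percent drop in approval rates withholds $3 billion, without any single decision looking anomalous on its own.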

One challenge is structural weakness. Even in Boston, Kohane observes, primary care is increasingly inaccessible not only to patients but also to medical faculty and residents. Academic medical centers, despite their massive revenues, operate on razor-thin margins. Mass General Brigham, for instance, runs at a loss. In such an environment, hospitals might gravitate toward AI automation that enhances billing or scales high-revenue specialties.

This tilts innovation toward technical fields with clear diagnostic codes, which AI can easily replicate and improve. Tasks in radiology and pathology, for example, are among the easiest to automate, even though jobs in these fields are among the most highly paid. Nearly three-quarters of the more than 1,000 AI applications cleared by the Food and Drug Administration for clinical use have been in radiology, where pattern recognition and precision are essential. Meanwhile, primary care, which requires a tremendous breadth of knowledge and strong bedside manner, remains the hardest for AI to replicate. In what Kohane calls an “amazing paradox,” this could ultimately raise the value of primary care.

AI and the Human Values Project

At the conference, an audience member asked Kohane whether the ethical values expressed by AI in healthcare should reflect different laws, regulations, social norms, and socioeconomic realities. “Will ethical codes have to vary by ZIP code?” the attendee asked. Kohane responded with measured optimism: yes, AI models can and should be fine-tuned to reflect these differences. He pointed to his ongoing Human Values Project, which seeks to test LLMs across various global settings, such as rural India and cities in China and the United States, to document how well AI aligns with different communities’ decisions and whether models can be adjusted to better reflect local clinical values.

Kohane likens the current moment to the invention of battlefield triage during the Napoleonic era, which revolutionized medicine by prioritizing survival over military rank. Today, medicine has a similar opportunity. AI can be used in the ways it’s already most effective—reading medical imaging, for example—in addition to helping generalist clinicians make informed decisions. But that requires an ethical framework for AI’s own decision-making—so that patients remain at the center of healthcare.
