A group of researchers at the Icahn School of Medicine at Mount Sinai say they have conducted the first independent safety evaluation of OpenAI’s ChatGPT Health assistant since the tool launched in January 2026.
“We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?” lead author and urologist Ashwin Ramaswamy said in a press release.
It turns out that the answer, most of the time, is no.
In a controlled study, the researchers tested how well ChatGPT Health assessed the severity of a patient’s condition, a process called “triage” in medicine.
The researchers found that ChatGPT Health “under-triaged” 52% of emergency cases, “directing patients with diabetic ketoacidosis and impending respiratory failure to 24-48 hour evaluation rather than the emergency department.”
In the respiratory failure case, the AI clearly identified the symptoms as an early warning sign, but it reassured the patient, suggesting they wait and monitor instead of urging them to seek emergency help.
The system did correctly triage more “textbook emergencies” like stroke and anaphylaxis. But the researchers say the nuanced situations that ChatGPT Health failed on are exactly where clinical judgment matters most.
OpenAI launched ChatGPT Health earlier this year, after releasing a report saying that more than 40 million people around the world had been turning to the company’s chatbot daily for health advice.
The OpenAI study where that number came from also found that 7-in-10 of those healthcare-related conversations were happening outside of normal clinic hours, and an average of more than 580,000 healthcare inquiries in the U.S. were sent from “hospital deserts,” aka places that are more than a 30-minute drive from a general medical or children’s hospital.
As users increasingly seek out AI for healthcare inquiries, the technology is burrowing deeper into the healthcare industry thanks to a friendly regulatory environment. AI tools can now renew prescriptions in Utah, and FDA Commissioner Marty Makary told Fox Business earlier this year that some devices and software can provide health information without FDA regulation.
But that doesn’t negate the very real and documented physical and mental health risks that come with an overreliance on AI. OpenAI in particular has come under intense fire for how its chatbots have handled mental health episodes in the past, with grieving families suing the company over negligent behavior and insufficient safety guardrails that they say aided suicidal ideation in relatives.
In response, OpenAI has said it is taking action, introducing parental controls for minors and nudging users to take breaks. ChatGPT Health, for example, directs users to professional help in high-risk cases. But the Mount Sinai study found that its suicide-risk alerts “appeared inconsistently.”
“The system’s alerts were inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases when someone shared how they intended to hurt themselves. In real life, when someone talks about exactly how they would harm themselves, that’s a sign of more immediate and serious danger, not less,” Mount Sinai Health System’s chief AI officer Girish Nadkarni said. “This was a particularly surprising and concerning finding.”
An OpenAI spokesperson said ChatGPT should be thought of as a work in progress, with safety updates and improvements still coming that are meant to improve how the chatbot handles sensitive situations. The study, the spokesperson pointed out, evaluated immediate triage decisions in a controlled setting; in real-world use, both users and the chatbot itself often ask follow-up questions that can change the risk assessment.
They also noted that ChatGPT Health is still offered on a limited basis, and users who wish to join are placed on a waiting list.
Original Source: https://gizmodo.com/chatgpt-health-underestimates-medical-emergencies-study-finds-2000729137
Disclaimer: This article is a reblogged/syndicated piece from a third-party news source. Content is provided for informational purposes only. For the most up-to-date and complete information, please visit the original source. Digital Ground Media does not claim ownership of third-party content and is not responsible for its accuracy or completeness.
