Nearly half of the medical advice produced by major chatbots, including ChatGPT, Gemini, Meta AI, Grok and DeepSeek, was problematic in a study published in BMJ Open, adding to concerns about how people use these tools for health questions. The researchers rated 49.6% of answers as problematic, including 19.6% rated highly problematic.
The findings arrive as more Americans say they are already using artificial intelligence for health information and advice. A West Health-Gallup Center on Healthcare in America poll published Wednesday found that roughly one-quarter of U.S. adults had used an AI tool for health information or advice in the previous 30 days.
What the researchers tested
The BMJ Open audit was led by a researcher from the University of California, Los Angeles and involved researchers from the United States, Canada and the United Kingdom. The team asked five publicly available chatbots 50 questions, 10 in each of five categories: cancer, vaccines, stem cells, nutrition and athletic performance.
According to the study summary, the questions included both closed-ended prompts with a defined correct answer and open-ended prompts that asked the systems to generate several responses. The prompts were designed to resemble common health questions and online misinformation tropes, and to test how the models handled requests for advice that runs counter to medical standards.
Two experts in each category rated the responses as non-problematic, somewhat problematic, highly problematic or potentially harmful. Researchers also examined whether the scientific references offered by the chatbots were accurate and complete, and they measured how easy the answers were to read.
Where chatbots struggled most
The study found that the chatbots generally performed better on closed-ended questions and on topics such as vaccines and cancer. They performed worse on open-ended prompts and in areas such as stem cells, athletic performance and nutrition.
Researchers also said the chatbots answered with confidence and certainty even when the information was flawed or incomplete. Out of 250 total questions, only two were refused, and both refusals came from Meta AI on questions about anabolic steroids and non-traditional cancer therapies.
Another issue was readability. The study scored the responses as difficult to read, meaning a reader would need at least some college education to understand them comfortably.
Reference problems deepen concerns
The researchers said poor sourcing made the problem harder to spot. The BMJ Open audit found a median reference completeness score of 40%, and it said hallucinations and made-up citations meant none of the chatbots could provide a fully accurate reference list.
A separate study highlighted by the Royal College of Surgeons of England pointed to similar problems in AI-generated surgical information. That analysis reviewed 108 chatbot responses containing 1,249 references and found that in the worst-performing models, 34% of references were fabricated or unverifiable, while several other platforms produced no hallucinated references at all.
The Royal College study also found that some fabricated citations closely resembled real scientific literature and sometimes included invented URLs or attributions to well-known institutions. It said many cited sources were behind academic paywalls, which could make it harder for users to verify medical claims for themselves.
Public use is growing
Polls suggest that people are not waiting for perfect systems before turning to AI for health help. In the Gallup survey, about 7 in 10 recent users said they wanted quick answers, more information, or were simply curious, and majorities said they used AI for research before or after seeing a doctor.
The same report said some people used AI because health care was too expensive or inconvenient to access. About 4 in 10 wanted help outside normal business hours, about 3 in 10 did not want to pay for a doctor visit, and roughly 2 in 10 said they did not have time to make an appointment, had felt ignored in the past, or were too embarrassed to talk to a person.
Trust, however, remains divided. Among adults who had recently used AI for health information, 33% said they somewhat or strongly trusted the advice, 34% distrusted it, and 33% said they neither trusted nor distrusted it. A separate KFF survey found that about three-quarters of U.S. adults were very or somewhat concerned about the privacy of health information shared with AI tools or chatbots.
Both studies point to the same basic message: AI tools may help people start a health search, but they can still produce unsafe answers and unreliable references. The BMJ Open authors warned that continued deployment without public education and oversight risks amplifying medical misinformation.
