Professor Pless Aids in Harnessing LLMs to Support Community Health Workers in Nigeria


March 13, 2024

Professor Robert Pless

Harnessing the potential of artificial intelligence (AI) can improve the lives and well-being of women, children, and vulnerable communities everywhere. Nigeria, like many low- and middle-income countries (LMICs), faces a serious shortage of qualified health personnel. In the project titled “A Large Language Model (LLM) Tool to Support Frontline Health Workers in Low-Resource Settings,” professor of computer science Robert Pless is supporting research on how AI, and LLMs in particular, may be integrated into the workflow of community health workers in Nigeria and other LMICs to help mitigate this challenge.

As part of its Grand Challenges (GC) initiative, the Bill & Melinda Gates Foundation issued a call for proposals to develop innovative and safe approaches to using LLMs, aiming to build an evidence base on their utilization in LMICs. The project, led by Principal Investigator Nirmal Ravi of eHealth Africa Clinics, was awarded GC funding because it helps catalyze equitable AI use by developing and testing simple, cost-effective ways to use LLMs, such as ChatGPT-4, to provide “second opinions” for community health workers.

“We worked with doctors and health workers at clinics to understand their workflow, and find how the LLM can help without adding extra work,” Pless stated.

Community health workers have only a couple of years of training but hold significant responsibility in Nigeria, from diagnosing patients to prescribing medicine. For this study, eHealth Africa opened a clinic where patients receive a free consultation and the health worker clicks a button to convert the patient’s medical record into a patient interaction note and submit it to ChatGPT-4 for feedback. The feedback arrives in about 30 seconds; the health worker can then ask the patient more questions or suggest additional tests before writing a second interaction note, and the researchers save both notes.
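
The article does not describe the underlying software, but the round trip might look something like the following minimal Python sketch, in which `build_note`, the stubbed `get_llm_feedback`, and the record fields are hypothetical placeholders rather than details of the study's system:

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """What the researchers save for one consultation."""
    first_note: str    # interaction note written before any AI feedback
    llm_feedback: str  # ChatGPT-4's comments, returned in about 30 seconds
    second_note: str   # note revised after follow-up questions or tests


def build_note(medical_record: dict) -> str:
    """Hypothetical stand-in for the clinic system's record-to-note step."""
    return "\n".join(f"{field}: {value}" for field, value in medical_record.items())


def get_llm_feedback(note: str) -> str:
    """Stub for the ChatGPT-4 call; the prompt itself is sketched further below."""
    return "(feedback from ChatGPT-4 on the note)"


# One consultation: the worker clicks the button, reads the feedback,
# optionally asks more questions or orders tests, then writes a second note.
record = {"age": 34, "symptoms": "fever, catarrh", "exam": "temperature 38.2 C"}
first = build_note(record)
feedback = get_llm_feedback(first)
second = first + "\nplan: additional test ordered after feedback"  # worker's revision
saved = StudyRecord(first, feedback, second)
```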

To assess the quality of the feedback provided by ChatGPT-4, and to ensure patient safety, each patient is then seen by a fully trained medical officer, paid by the study, who conducts an independent examination and creates the final treatment plan. Only after deciding on that plan does the doctor review the two notes, written before and after the AI feedback, to evaluate whether there was an improvement.

Pless and his Ph.D. candidate, Grady McPeak, play a crucial role in this research: deciding how to present the patient interaction note to ChatGPT-4, designing the prompt, and interpreting the result. In designing the prompt, they tried countless variations in which GPT is addressed directly, writing, “I am going to send you a patient interaction note, please comment on X, Y, and Z,” and then pasting the note itself. To mitigate risks, they ultimately developed a single prompt with specific questions about the interaction note, ensuring the exchange doesn’t turn into a long conversation that may lead to hallucination.
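
Based on the single-prompt structure Pless describes, the call might be assembled roughly as follows; the specific questions, the `gpt-4` model name, and the use of the OpenAI Python client are illustrative assumptions, not the study's actual prompt:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

# Fixed questions keep the exchange to a single turn rather than an
# open-ended back-and-forth that could drift into hallucination.
QUESTIONS = (
    "1. Is the diagnosis consistent with the symptoms and exam findings?\n"
    "2. Are there important follow-up questions the health worker should ask?\n"
    "3. Is the proposed treatment plan appropriate?"
)


def get_llm_feedback(note: str) -> str:
    """Send the interaction note and all questions in one self-contained prompt."""
    prompt = (
        "I am going to send you a patient interaction note. "
        f"Please comment on the following:\n{QUESTIONS}\n\nNote:\n{note}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the article refers to the model as ChatGPT-4
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Packing everything into one message means each note is evaluated independently, with no conversation history across turns for errors to compound in.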

Other potential risks arise from the biases inherent in LLMs, which Pless said are largely derived from the data used to train them. “That training data is all the text on the internet, and there is much more text on the internet in English than any other language and about the U.S. and the West than any other place in the world.”

One example of how this bias manifested in the study is the number of blood tests suggested in the feedback. Because of cost and the limited availability of tests, it is not feasible for the Nigerian clinic to order as many tests as is common in the U.S.

“The GPT response has almost always been to recommend lots of tests in a way that isn’t really useful feedback. Determining how to prompt the language model to consider that local context more accurately in giving feedback for these notes is one of the big things we’re working on,” said Pless.
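
One plausible way to inject that local context, shown here purely as an illustration of the kind of adjustment the team is exploring rather than their actual wording, is to state the clinic's constraints in a system message ahead of the same single-turn prompt:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative constraint only; the study's real prompt wording is not public.
LOCAL_CONTEXT = (
    "You are reviewing notes from a primary-care clinic in Nigeria. "
    "Laboratory capacity is limited and patients pay out of pocket, so "
    "recommend additional tests sparingly, and only when the result would "
    "clearly change management."
)


def get_feedback_with_local_context(note: str) -> str:
    """Same single-turn call as above, with local constraints stated up front."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": LOCAL_CONTEXT},
            {
                "role": "user",
                "content": f"Please comment on this patient interaction note:\n{note}",
            },
        ],
    )
    return response.choices[0].message.content
```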

Despite these challenges, Pless said he was still surprised in some ways at how effective GPT’s feedback was. For example, he noted that a word he had never heard of before, “catarrh,” appeared in all Nigerian notes, and the LLM knew it meant a build-up of mucus in airways or cavities. This is an ongoing study, so the research team is still exploring the level of feedback ChatGPT-4 can provide and evaluating how positive its impact is.

If ChatGPT-4 can enhance the capabilities of community health workers by providing feedback that mirrors what a reviewing physician might advise, it has the potential to improve patient outcomes, free high-skill providers for other tasks, and mitigate the shortage of qualified health personnel. As a study contributor, Pless is drawing on his 25 years of research experience in machine learning to harness the potential of LLMs in low-resource healthcare settings and support community health workers in Nigeria and beyond.