Systematic and Literature Reviews


Can Artificial Intelligence (AI) tools such as ChatGPT be used to produce systematic reviews?








ChatGPT can certainly produce a convincing-looking review, but there are a few issues:

  • It makes things up. This is often referred to as 'hallucinating': if ChatGPT doesn't know the answer, it will simply present a plausible fabrication. This is to be expected, as it is a language model with no innate understanding, so it produces language that plausibly relates to the question.
    • This includes referencing. ChatGPT will produce convincing-looking references that do not refer to an actual source.
  • It is not clear what literature ChatGPT is trained on, but it is unlikely to include paywalled articles. This means all answers are derived from a restricted evidence base that possibly (probably!) excludes the most authoritative sources.
    • It may also mean that answers do not include recent evidence, depending on the date of the corpus of literature ChatGPT was trained on and how often it is retrained or updated, neither of which we know.
  • For general enquiries these tools may produce 'good enough' answers, but for systematic reviews there is an expectation of transparency of method, rigour of assessment and so forth. These are absent from ChatGPT answers - we don't know what it searched or how, or how it selected the references it chose to use in its answer.
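
Because generated references can look convincing without pointing to a real source, every citation needs checking by hand. As a minimal sketch (not a validated tool), the Python below pulls DOI-like strings out of a block of generated text and turns them into doi.org links to open manually. The function names and the regular expression are illustrative assumptions. A well-formed DOI can still be fabricated, so a match only tells you what to look up, not that the source exists.

```python
import re

# Illustrative pattern for DOI-like strings ("10." + registrant code + "/" + suffix).
# This is an assumption for the sketch, not an official DOI grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text):
    """Return DOI-like strings found in text, in order, without duplicates."""
    found = []
    for match in DOI_PATTERN.findall(text):
        doi = match.rstrip(".,;)")  # drop trailing sentence punctuation
        if doi not in found:
            found.append(doi)
    return found


def doi_links(text):
    """Build doi.org links to open and check by hand."""
    return ["https://doi.org/" + doi for doi in extract_dois(text)]


sample = "Res Syn Meth. 2022;13(3):353-362. doi:10.1002/jrsm.1553"
for link in doi_links(sample):
    print(link)
```

References that carry no DOI still need to be searched for by title in a library database.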


So this all sounds terrible - as quick and plausible as ChatGPT may be, the response it produces may include false information, false sources and a completely opaque methodology. So is there any way it can be used?

  • There may be applications for parts of a systematic review that do not require rigour, e.g. writing an introduction or inclusion criteria, or just brainstorming ideas. But you will still need to check all the output thoroughly.
    • This does not include writing search strategies. ChatGPT can produce a convincing search strategy, but - surprise! - it has been shown to make up components such as MeSH terms that don't exist. While it handles Boolean operators easily, so far it does not seem to make use of functions such as truncation, wildcard characters or proximity searches.
  • Other AI tools combine the language model approach of ChatGPT with search functionality to improve the currency of answers:
    • Elicit does not generate answers; rather, it uses the same Large Language Model (LLM) training as ChatGPT to interpret your question. It then searches the 115M papers from the Semantic Scholar Academic Graph database and shows ranked snippets from the best results. It is better used to find research (especially on difficult-to-search topics) and generate ideas than to produce any form of ready-to-go answer. It can be prompted to extract aspects of interest from the results, such as population, outcomes measured or main findings.
    • Perplexity combines AI with web search to produce ready-made answers. It cites its sources, which are real but tend not to be scholarly. Again, it is possibly better suited to generating ideas and identifying sources than to making any significant contribution to producing a review.
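
The search functions mentioned above (Boolean operators, truncation, proximity) can be illustrated with a small sketch. The Python below joins the synonyms for one concept with OR and then combines the concept blocks with AND. The helper names, the example terms and the Ovid-style `adjN` proximity operator are assumptions for illustration only. Operators differ between databases, and any subject headings should be checked in the database's own thesaurus rather than taken from an AI tool.

```python
def or_block(terms):
    """Join synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = ['"' + t + '"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def and_blocks(blocks):
    """Combine concept blocks with AND."""
    return " AND ".join(blocks)


def proximity(left, right, distance):
    """Ovid-style proximity operator (adjN): terms near each other, in either order."""
    return "(" + left + " adj" + str(distance) + " " + right + ")"


# Hypothetical example terms; "*" is truncation (randomi* matches randomised, randomized, ...)
population = or_block(["child*", "adolescen*", "young people"])
method = or_block(["randomi*", "controlled trial"])
strategy = and_blocks([population, method, proximity("physical", "activit*", 3)])
print(strategy)
```

A string built this way is only a starting point: it still has to be adapted to the syntax of each database and tested against known relevant papers.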


Of course all of this will change. The use of AI for evidence synthesis is a rapidly developing field, but for clinical use it will still be necessary that syntheses meet the underlying standards of transparency and rigour which are so far absent. Keep this in mind when reading the latest tech hype.


Further Reading

Guidance for Authors, Peer Reviewers, and Editors on Use of AI, Language Models, and Chatbots - JAMA July 2023

Systematic Reviewing and ChatGPT - PICO Portal webinar

What academic research is ChatGPT accessing? LinkedIn post

How Q&A systems based on large language models (eg GPT4) will change things if they become the dominant search paradigm - 9 implications for libraries. Blog

Using Large language models like GPT to do Q&A over papers (II) — using (free) over CORE,, Semantic Scholar etc domains. Blog

Academic Publishers Are Missing the Point on ChatGPT. Blog - Scholarly Kitchen.

Using artificial intelligence methods for systematic review in health sciences: A systematic review. Res Syn Meth. 2022;13(3):353-362. doi:10.1002/jrsm.1553