
AI for Research

This guide provides information and recommended resources on the applications of Generative Artificial Intelligence (GenAI) in research.


The banner image was generated by DALL-E 3.

What is AI? 

Artificial Intelligence, or AI, is a broad term commonly used to describe any advanced machine learning system.

Generative AI, or GenAI refers to deep learning models that can generate text, images, and other media in response to user prompts. Behind the scenes, there is an algorithm that has been trained with massive amounts of data. As a result of this training, a GenAI model can associate user input (prompts) with the patterns learned from the training data and generate an output, simulating the data it was trained on.

See Glossary for more definitions.

What can AI do? 

It depends on the type of AI tool you are using. Popular general-purpose AI chatbots and virtual assistants such as ChatGPT, Claude 3 or Google Gemini are based on Large Language Models (LLMs) trained to generate realistic text in natural language. They are therefore particularly good at text-centric tasks (a short code sketch follows this list), for example:

  • spelling and grammar correction; 
  • suggesting synonyms; 
  • rewording or reformulating text; 
  • rewriting text in a particular style; 
  • applying formatting, e.g. LaTeX, to text; 
  • tasks that require generating new text based on a template; 
  • autocomplete (for both text and code);
  • translation.
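
As an illustration, many of these tasks can also be performed programmatically. The sketch below is a minimal example, assuming the OpenAI Python SDK is installed and an API key is configured; the model name is purely illustrative, and other providers offer similar chat interfaces. It asks an LLM to correct spelling and grammar in a draft sentence.

```python
# Minimal sketch: asking a general-purpose LLM to proofread a sentence.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable;
# the model name below is illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

draft = "The results of the experiment was surprising, and suggests further study."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system",
         "content": "You are a copy editor. Correct spelling and grammar only; "
                    "do not change the meaning. Return the corrected text."},
        {"role": "user", "content": draft},
    ],
)

print(response.choices[0].message.content)
```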

Apart from that, there are AI-based tools optimised for particular tasks and connected to relevant knowledge bases, e.g. Elicit or Consensus for research literature search and review. See more examples of AI-based tools in the AI Tools for Research section of this guide. 

Is AI reliable? 

Never assume the information generated by any AI tool is accurate or true! While AI can produce logical and confident responses to most scenarios, it lacks any real ability to analyse information or to produce original thought. It is limited by the data it was trained on and prone to hallucinations: generating false or misleading statements presented as if they were facts. See the sections on AI Limitations and AI Concerns for more information.

Should I declare the use of AI? 

In most circumstances, yes, the use of generative AI tools should be declared. Your publisher may have specific guidance on how to attribute the AI tools you've used; please see the section on Citing AI for more information. For routine tasks such as drafting emails or correcting grammar, there is no need to credit AI unless you are simply copying and pasting its response. Remember that you are still responsible for all your communication, even if it was generated by AI.

Can the use of AI be detected in publications?  

While some platforms have claimed to detect AI usage, none have proved consistently reliable. However, AI-written content often strays noticeably from an author's own writing style, which is easy to spot by eye. Additionally, AI-written content often contains factual errors and non-existent references, resulting from AI hallucinations and a limited knowledge base. An expert on the subject will spot these kinds of errors and immediately understand their nature.

Limitations of GenAI


Text-generating GenAI tools have a Large Language Model (LLM) at their core. Even when combined with other technologies and knowledge bases, LLMs have a range of limitations that you need to keep in mind when using GenAI tools.

  1. LLMs don't actually know things and don't have human-like reasoning abilities. Unlike humans, LLMs "learn" by statistically analysing the distributions of words, pixels or any other elements in the training data and identifying common patterns. They are simply trained to predict the next or missing element in a sequence, which lets them simulate the training data and generate responses that "make sense".
  2. LLMs lack information verification mechanisms and can "hallucinate" (generate incorrect or unrelated answers), especially favouring familiar or frequently seen words. Moreover, an LLM would rather produce an incorrect answer than no answer at all: current evaluation procedures reward guessing over acknowledging uncertainty (Kalai et al. 2025). This is why even the best models showing top results on evaluation benchmarks can be problematic for practical use. 
  3. Even the best LLMs lack consistency. On the surface level, it means that an LLM-based chat-bot will produce a different output in response to the same prompt every time you query it. On a deeper level, LLMs are unable to maintain coherent internal reasoning without contradictions (Lin et al. 2025).
  4. LLM-based chat-bots are prone to confirmation bias by design: they are more likely to agree with the user than not, especially if reinforcement learning with human feedback (RLHF) is used (Nehring et al. 2024). 
  5. LLMs aren't self-aware: they have no way to reflect on their own knowledge or abilities. There is no point in asking an LLM whether it knows or has access to something: in the best-case scenario, you will get an answer hard-coded by its developers, and in the worst case, the model will generate a convincing but meaningless response.
  6. LLMs don't have "core beliefs": they'll generate responses for or against any topic, depending on the prompt, without holding any stance. If certain perspectives appear more often in their training data, the model may echo those more frequently, as it aims to reflect the most common responses. 
  7. LLMs have no sense of truth or morality. While they may often reflect widely accepted facts, they can just as easily generate incorrect information if prompted, as they lack any inherent understanding of right or wrong.
  8. LLMs are auto-regressive, meaning each word guess becomes part of the input for the next guess, allowing errors to accumulate. Even a single mistake can cascade, with the model building further errors on top of it. LLMs can’t self-correct or revise their outputs; they simply follow the sequence as it unfolds (see the toy sketch after this list).
  9. The quality of the output directly depends on the quality of the input: better prompts yield better results. Experiment with different prompts to find what works best, and check the guide on prompt engineering.
  10. LLMs aren't capable of true problem-solving or planning, as they lack goals and the ability to look ahead. Instead, they can generate plans and solutions based on patterns they've learned from training data. While their outputs may resemble structured plans, they are essentially making educated guesses based on previous examples rather than actively evaluating alternatives or considering outcomes. 

Items 6-10 are adapted from Riedl, M. (2023). Introduction to Large Language Models without the Hype. Medium.
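
To make points 1, 3 and 8 more concrete, here is a toy Python sketch of autoregressive generation. It is not how a real LLM works internally (real models use large neural networks over huge vocabularies); the hand-written probability table and example words are invented purely for illustration. It shows the core loop: sample the next word, append it to the context, repeat. Any unlucky choice is carried forward, and repeated runs can produce different, sometimes fluent-but-wrong, outputs.

```python
# Toy illustration of autoregressive generation (NOT a real LLM):
# the "model" is a hand-made table of next-word probabilities.
import random

# P(next word | previous word): purely illustrative numbers.
NEXT_WORD = {
    "<start>": [("the", 0.6), ("a", 0.4)],
    "the":     [("cat", 0.5), ("dog", 0.3), ("moon", 0.2)],
    "a":       [("cat", 0.4), ("dog", 0.4), ("moon", 0.2)],
    "cat":     [("sat", 0.7), ("barked", 0.3)],  # "cat barked" is fluent-looking but wrong
    "dog":     [("barked", 0.8), ("sat", 0.2)],
    "moon":    [("shone", 1.0)],
    "sat":     [("<end>", 1.0)],
    "barked":  [("<end>", 1.0)],
    "shone":   [("<end>", 1.0)],
}

def generate(seed=None):
    """Generate one 'sentence' word by word (autoregressively)."""
    rng = random.Random(seed)
    word, output = "<start>", []
    while word != "<end>":
        candidates, weights = zip(*NEXT_WORD[word])
        word = rng.choices(candidates, weights=weights)[0]  # sample the next word
        if word != "<end>":
            output.append(word)  # the sampled word becomes part of the context
    return " ".join(output)

for _ in range(3):
    print(generate())  # repeated runs can differ; some outputs are fluent but false
```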

Best Practices of Using AI in Research 


  • Check the University’s Research Integrity Policy (QA514) and visit the University’s website section on Research Integrity.
  • Never present AI-generated work as your own. Be transparent about which tools you use and how you have used them in your research. A disclosure statement is often required as a section or appendix in your publication.
  • Do not ask AI to generate experimental data and then present it as the data you have collected, whether in its raw form or after analysis. This is data fabrication and is considered serious misconduct.
  • Keep detailed records of your interactions with GenAI. Ensure you save the prompts, not just the outputs! Note that ChatGPT and some other tools don’t keep a timestamp of when conversations took place; you will need to record this manually (a minimal logging sketch follows this list).
  • Make sure to save separate drafts of your work and keep notes along the way to show that the final product is the result of your own progress and original thought.
  • Keep sensitive data away from GenAI to avoid personal and experimental data leakage. Once your data is fed to a proprietary GenAI tool, there is no guarantee that it stays private and no way to check it.
  • Meticulously fact-check all of the information produced by generative AI and verify the source of all citations the AI uses to support its claims.
  • Critically evaluate all AI output for any possible biases that can skew the presented information.
  • When available, consult developers' notes to find out if the tool's information is up-to-date, and if it has access to a knowledge graph or a database for fact-checking.
  • Choose specialised tools over general-purpose GenAI assistants like ChatGPT. Avoid asking the latter to produce a list of sources on a specific topic: such prompts may result in the tools fabricating false references. Use specialised tools for literature search and mapping instead.
  • Keep in mind that GenAI virtual assistants like ChatGPT can’t function as a calculator, encyclopedia, oracle or expert on any topic by design. They simply use large amounts of data to generate responses that "make sense" according to patterns in their training data, but they lack human-like reasoning abilities and information verification mechanisms.
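
As an example of the record-keeping point above, one lightweight approach is to log every prompt and response yourself, together with a timestamp. The sketch below is only an illustration and assumes you are willing to copy interactions into a script or call a tool from code; the file name and record fields are arbitrary choices.

```python
# Minimal sketch for keeping a timestamped record of GenAI interactions.
# The file name and record fields are illustrative; adapt them to your workflow.
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path("genai_log.jsonl")  # one JSON record per line

def log_interaction(tool, prompt, response, purpose=""):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,        # note the exact tool and, if known, the model version
        "purpose": purpose,  # e.g. "rewording the abstract"
        "prompt": prompt,
        "response": response,
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Example: record a prompt pasted into a chatbot and the reply copied back.
log_interaction(
    tool="ChatGPT",
    purpose="grammar check of the methods section",
    prompt="Please correct the grammar in the following paragraph: ...",
    response="(paste the tool's reply here)",
)
```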

Further Reading