Many generative AI companies, including OpenAI, Anthropic and Google (creators of ChatGPT, Claude and Gemini, respectively), do not disclose information about their training data or methods. This lack of transparency can make it challenging for users to fully understand the biases and limitations of these tools and make informed decisions about their use.
Entering personal or confidential information into generative AI applications carries risks, and the lack of proper data privacy and security controls by developers exacerbates the issue. Personal data shared with these applications may become accessible to other users, potentially leading to identity theft and privacy violations. Similarly, confidential corporate information could be exposed to competitors or third parties.
“The only foolproof solution? Keep sensitive data far away from LLMs.” (Falconer 2023)
Generative AI models are open-ended: they have no inherent constraints on their output, so their responses are not necessarily grounded in fact and can contain misinformation. Some AI tools have been exploited to create fake images, videos, and audio recordings, known as "deepfakes", which can be used to spread disinformation and manipulate public opinion.
AI-generated text can often appear authoritative and confident, making it difficult to distinguish reliable from unreliable information. While AI can produce logical and coherent responses, it cannot genuinely analyse information and context or produce original thought.
Furthermore, generative AI tools may provide outdated information, as they are often trained on historical datasets rather than having access to real-time data. This can result in dated or inaccurate representations of current events.
As the end user of the AI's output, it is your responsibility to critically evaluate the quality of the information and detect potential issues or inaccuracies in the AI's response.
Text or images generated by AI tools have no human author, but the tools are trained on data created by humans, and that data includes human biases. Unlike humans, AI tools are unable to consistently differentiate between biased and unbiased information when formulating responses: they are trained simply to predict the most likely sequence of words in response to a given prompt. As a result, AI systems like ChatGPT have been known to exhibit socio-political biases and, in some cases, generate responses containing sexist, racist, or offensive content.
Generative AI tools that employ reinforcement learning with human feedback (RLHF) face an inherent challenge due to the potential biases of the human testers providing feedback. This kind of training also makes these models susceptible to confirmation bias: an AI chatbot is more likely to agree with the user than not (Nehring et al. 2024).
Using generative AI tools to create derivative works from copyrighted material may infringe on the copyright owner's exclusive rights. Some AI image and text generation tools have been trained on web content without the consent of the content owners, raising legal concerns. Several lawsuits have been filed against AI platforms for copyright infringement due to the use of artists' and writers' content without permission. While the outcome of these cases remains uncertain, some experts have cited fair use arguments in defense of AI training data usage.
To avoid potential legal issues, consider obtaining the necessary permissions before inputting copyrighted content into AI systems.
The carbon footprint of large language models has far-reaching implications for both the environment and the economy. The high energy consumption of these models pushes providers towards privatising and commodifying their platforms, moving them away from open-source accessibility and towards subscriber-based models for corporate profit. This shift has economic implications, as carbon costs are externalised rather than factored into the platforms' operational costs or responsibilities. The carbon footprint spans the entire lifecycle of these models, from design and development to their operational phase and future legacy.
AI ethics is an actively developing multidisciplinary field that aims to address the concerns described above. It studies how to ensure that AI is developed and used responsibly, optimising its beneficial impact while reducing risks and adverse outcomes. This means adopting a safe, secure, unbiased, and environmentally friendly approach to AI.
Thais, S. (2024). Misrepresented Technological Solutions in Imagined Futures: The Origins and Dangers of AI Hype in the Research Community. arXiv:2408.15244.
Slattery, P., Saeri, A. K., Grundy, E. A., Graham, J., Noetel, M., Uuk, R., ... & Thompson, N. (2024). The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence. arXiv:2408.12622.
Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P. S., ... & Gabriel, I. (2021). Ethical and Social Risks of Harm from Language Models. arXiv:2112.04359.
Salvaggio, E. (2024). Challenging The Myths of Generative AI. Tech Policy Press.
Rogers, A., Luccioni, A. S. (2024). Position: Key Claims in LLM Research Have a Long Tail of Footnotes. In Forty-first International Conference on Machine Learning.
Zitron, E. (2024). The Subprime AI Crisis. Where is Your Ed At?
Milmo, D., Hern, A. (2024). What will the EU’s proposed act to regulate AI mean for consumers? The Guardian.
Vakali, A., Tantalaki, N. (2024). Rolling in the deep of cognitive and AI biases. arXiv:2407.21202.
Nehring, J., Gabryszak, A., Jürgens, P., Burchardt, A., Schaffer, S., Spielkamp, M., Stark, B. (2024). Large Language Models Are Echo Chambers. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10117–10123, Torino, Italia. ELRA and ICCL.
Balloccu, S., Schmidtová, P., Lango, M., Dusek, O. (2024). Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 67–93, St. Julian’s, Malta. Association for Computational Linguistics.
Yin, Z., Sun, Q., Guo, Q., Wu, J., Qiu, X., Huang, X. (2023). Do Large Language Models Know What They Don’t Know?. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8653–8665, Toronto, Canada. Association for Computational Linguistics.
Alkaissi, H., McFarlane, S. I. (2023). Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 15(2).
MIT Sloan Teaching & Learning Technologies. When AI Gets It Wrong: Addressing AI Hallucinations and Bias.
Maleki, N., Padmanabhan, B., & Dutta, K. (2024). AI Hallucination: A Misnomer worth clarifying. arXiv:2401.06796.
Marsh, O. (2024). Chatbots are still spreading falsehoods. Algorithm Watch.
Birhane, A., Han, S., Boddeti, V., & Luccioni, S. (2024). Into the LAIONs Den: Investigating Hate in Multimodal Datasets. Advances in Neural Information Processing Systems, 36.
Luccioni, S., Akiki, C., Mitchell, M., & Jernite, Y. (2024). Stable bias: Evaluating societal representations in diffusion models. Advances in Neural Information Processing Systems, 36.
Edwards, B. (2023). Why ChatGPT and Bing Chat are so good at making things up. Ars Technica.
Hicks, M. T., Humphries, J., Slater, J. (2024). ChatGPT is bullshit. Ethics and Information Technology, 26.
European Innovation Council and SMEs Executive Agency. (2024). Artificial intelligence and copyright: use of generative AI tools to develop new content. European Commission News Blog.
Dornis, T. W., Stober, S. (2024). Copyright and training of generative AI models — technological and legal foundations. SSRN.
Marcus, G., Southen, R. (2024). Generative AI Has a Visual Plagiarism Problem. IEEE Spectrum.
Klosek, K. & Blumenthal, M. (2024). Training Generative AI Models on Copyrighted Works Is Fair Use. Association of Research Libraries Blog.
Silveira, L. (2024). The Fall of Z-Library: The “Burning of the Library of Alexandria” or Protection for Authors Against AI Companies. 27 SMU Sci. & Tech. L. Rev. 119.
Estes, A. C. (2024). What, if anything, is AI search good for? Vox.
Wierda, G. (2024). When ChatGPT summarises, it actually does nothing of the kind. R&A IT Strategy & Architecture.
Heikkilä, M. (2023). Three ways AI chatbots are a security disaster. MIT Technology Review.
Falconer, S. (2023). Privacy in the age of generative AI. StackOverflow Blog.
Koerner, K. (2023). Generative AI: Privacy and tech perspectives. International Association of Privacy Professionals.
Varoquaux, G., Luccioni, A. S., Whittaker, M. (2024). Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI. arXiv:2409.14160.
Hao, K. (2024). Microsoft’s Hypocrisy on AI. The Atlantic.
Luccioni, A. S., Hernandez-Garcia, A. (2023). Counting carbon: A survey of factors influencing the emissions of machine learning. arXiv:2302.08476.
Dodge, J., Prewitt, T., Tachet des Combes, R., Odmark, E., Schwartz, R., Strubell, E., ... & Buchanan, W. (2022). Measuring the carbon intensity of AI in cloud instances. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency (pp. 1877-1894).
Luccioni, S., Jernite, Y., & Strubell, E. (2024). Power hungry processing: Watts driving the cost of AI deployment? In The 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 85-99).
O'Brien, I. (2024). Data center emissions probably 662% higher than big tech claims. Can it keep up the ruse? The Guardian.
Saenko, K. (2023). Is generative AI bad for the environment? A computer scientist explains the carbon footprint of ChatGPT and its cousins. The Conversation.
Harding, X. (2023). The Internet’s Invisible Carbon Footprint. Mozilla Blog.
Singh, M. (2023). As the AI industry booms, what toll will it take on the environment? The Guardian.
Hacker, P. (2024). The real existential threat of AI. OUPblog. Oxford University Press.