ChatGPT System Analysis
Inside the Black Box: How ChatGPT’s Design Reflects the Promises and Perils of Everyday AI
Introduction
The arrival of large language models (LLMs) has already changed the ways in which people interact with computers, gather information, conduct research, create art, and teach students. Given the pace of development, LLMs will likely continue to change the world. Indeed, a recent report from Senator Bernie Sanders warns that artificial intelligence (AI) could replace 100 million American jobs in the fast-food and customer-service industries (Fore, 2025). This is just one example of AI’s impact.
OpenAI’s ChatGPT stands out as one of the most influential and controversial AI tools released to date. Built upon the transformer architecture introduced by Vaswani et al. (2017, as cited in Thompson, 2025), ChatGPT represents decades of research in machine learning, computational linguistics, and cognitive simulation. Its public release in 2022 gave people access to powerful generative technology that can understand, interpret, and produce human-like language. Yet ChatGPT is not above scrutiny. Its creation and use, particularly the provenance of its training data, invite critique on several ethical fronts, including biased information, usage policies, and privacy.
This essay offers a system analysis of ChatGPT and is divided into three parts. The first section examines the AI’s development and uses. The second looks at its design: its sources of content, regulations, and norms. Finally, this essay considers ethical issues around its use, including bias, misinformation, privacy, and authorship. This paper argues that ChatGPT gives everyday people unrivaled access to knowledge and creativity, and that this access is an unparalleled good. Yet it also poses complex ethical issues that require transparent governance, human oversight, and a commitment to fairness, accountability, and autonomy.
Part One - System Overview
The development of ChatGPT can be traced to the early days of human-machine dialogue, such as Joseph Weizenbaum’s ELIZA in the 1960s (Wang, 2024). ELIZA used a rule-based program that recognized simple patterns and substitutions to create human-like dialogue. Although ELIZA did not genuinely understand its users, it nevertheless demonstrated chatbots’ potential to communicate in natural language.
ELIZA was followed by a number of AIs, including PARRY in the mid-1970s, a bot designed to mimic paranoid humans for research purposes; SmarterChild in 2001, an interactive digital assistant; and Apple’s Siri in 2011 (Wang, 2024). The real breakthrough, however, came with the 2017 paper Attention Is All You Need by Vaswani et al., which introduced the transformer architecture,
a significant departure from previous approaches to sequence-to-sequence modeling, which relied heavily on recurrent neural networks (RNNs). The transformer’s innovation lies in its exclusive use of attention mechanisms, eliminating the need for recurrence and enabling more efficient parallel processing. (Thompson, 2025, para. 2).
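To make that mechanism concrete, the following is a minimal NumPy sketch of the scaled dot-product attention described by Vaswani et al.; the function name, matrix sizes, and values are illustrative toy choices, not details of ChatGPT’s actual implementation.

```python
import numpy as np

# Minimal sketch of scaled dot-product attention (Vaswani et al., 2017):
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # token-to-token similarity
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values

# Toy example: three tokens with four-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

Because every token attends to every other token in a single matrix operation, the computation parallelizes far better than the step-by-step processing of recurrent networks, which is precisely the efficiency gain Thompson (2025) describes.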
Building on Vaswani et al., OpenAI created GPT-1. Introduced in June 2018, GPT-1 was a basic language-understanding model trained on large text datasets (Wang, 2024). Its ability to judge semantics and the relationships between sentences was an important breakthrough, although it left much to be desired.
GPT-2, released in 2019, trained in part on users’ interactions with its predecessor (Wang, 2024). The main change, however, was a substantial increase in parameters, from 117 million to 1.5 billion, allowing GPT-2 to generate longer and more coherent responses better suited for real-world conversations (Wang, 2024).
GPT-3 increased the parameter count further, to an impressive 175 billion, making GPT-3 the “largest and most powerful natural language generation model at that time” (Wang, 2024, para. 10). From this model OpenAI developed ChatGPT. Trained on an unprecedented amount of data, it could produce responses nearly indistinguishable from a human’s (Wang, 2024). Subsequent iterations, built on GPT-3.5 and GPT-4, further improved on the earlier models. Belcic and Stryker (2024) write, “In May 2024, OpenAI announced the multilingual and multimodal GPT-4o, capable of processing audio, visual and text inputs in real time” (para. 1).
In November 2022, OpenAI released ChatGPT to the public, making this technology accessible to millions of everyday users for the first time. These users could communicate naturally with an AI capable of answering questions, explaining concepts, summarizing content, drafting or rewriting texts, providing creative suggestions, solving problems, and translating languages (OpenAI Help Center, n.d.-a).
While ChatGPT functions as a question-and-answer technology, it can do much more. Using the projects tool, ChatGPT maintains conversational coherence across multiple work sessions while organizing a user’s charts, documents, and other files. It also serves as an information resource, scanning the internet and synthesizing the contents of several sources with citations. Users can search the web, schedule tasks (e.g., setting a reminder), summarize documents, generate stories or poems, and more (OpenAI Help Center, n.d.-a).
In academia, it can support student critical thinking (Abramson, 2023), enhance engagement, facilitate personalized learning, and serve as a teaching assistant (Chen et al., 2023). Although its limitations include a lack of emotional intelligence (for more limitations, see Part Three), Chen et al. (2023) demonstrated that chatbots can support holistic student success.
Part Two - System Design
According to OpenAI, its mission regarding LLMs “is to benefit all of humanity” (OpenAI, 2024a, para. 2). OpenAI continues:
We focus on building tools to help [professionals] create and achieve more. To accomplish this, we listen to and work closely with members of these communities, and look forward to our continued dialogues. (OpenAI, 2024a, para. 4).
Thus, OpenAI has embedded norms in ChatGPT’s programming. Norms in an AI context refer to the values, assumptions, priorities, and constraints either implicitly or explicitly built into the system during its design. When asked what its embedded norms are, ChatGPT responded with the following five:
Helpfulness, safety, and alignment,
Privacy sensitivity and data minimization,
Transparency,
Human oversight, and
Content exclusions (OpenAI, 2025).
First, helpfulness, safety, and alignment. “ChatGPT is designed to be helpful, safe, and aligned with human intentions. This includes refusing inappropriate requests, avoiding output that might be harmful, and trying to provide accurate and useful responses” (OpenAI, 2025).
Second, privacy sensitivity and data minimization. The system is engineered not to solicit or retain personally identifying information and to exclude from training any data known to aggregate personal or sensitive content (OpenAI, 2025). This response is supported by the OpenAI Help Center (n.d.-b; n.d.-c), which also states that users can opt out of having their ChatGPT conversations used in training.
Third, transparency, within limits. According to its Help Center (OpenAI Help Center, n.d.-c), OpenAI discloses high-level information about its data sources and safety procedures, but much remains proprietary. “Although there are limits (not all training data is fully disclosed; not all internal processes are public), transparency is a built-in norm to the degree OpenAI currently commits” (OpenAI, 2025).
Fourth, human oversight. ChatGPT is distinguished from its predecessors by its use of Reinforcement Learning from Human Feedback (RLHF), a method that uses human trainers to fine-tune the technology (OpenAI, 2022). RLHF has “informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs” (OpenAI, 2022, para. 12). In May 2024, OpenAI established a new Safety and Security Committee (Field, 2024b), reaffirming human oversight as a priority.
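At the heart of RLHF is a reward model trained on human preference rankings; the chatbot is then optimized against that learned reward. The sketch below illustrates only the pairwise preference loss behind the reward-model step, with hypothetical scalar rewards; it is a toy illustration of the general technique, not OpenAI’s implementation.

```python
import numpy as np

# Toy illustration of the reward-model objective behind RLHF: human
# labelers rank pairs of responses, and the reward model is trained to
# score the preferred response higher. Per-pair (Bradley-Terry) loss:
#   L = -log(sigmoid(r_chosen - r_rejected))
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def preference_loss(r_chosen, r_rejected):
    """Loss is small when the preferred response already scores higher."""
    return -np.log(sigmoid(r_chosen - r_rejected))

# Hypothetical scalar rewards for two candidate responses.
print(preference_loss(2.0, 0.5))  # ~0.20: ranking respected, small loss
print(preference_loss(0.5, 2.0))  # ~1.70: ranking violated, large loss
```

In full RLHF, this learned reward then guides a reinforcement-learning step that nudges the model toward responses human trainers prefer (OpenAI, 2022).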
Fifth, content exclusions. Data excluded from training includes paywalled material, sites known to disallow use of their copyrighted content, content aggregating personal information, and spam or otherwise irrelevant material (OpenAI, 2025).
The data that trains an AI is as important as its embedded norms. The OpenAI Help Center (n.d.-c) identifies three major sources for its training data:
(1) “information that is publicly available on the internet,”
(2) “information that we partner with third parties to access,” and
(3) “information that our users, human trainers, and researchers provide or generate” (para. 1).
Before use, the data undergoes filtering and preprocessing. OpenAI avoids collecting data from sources that are paywalled or located on the “dark web,” and filters are applied to exclude certain material, including hate speech, explicit content, websites that aggregate personal information, and spam (OpenAI Help Center, n.d.-c). The data that remains is used to train the models.
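To give a sense of what such exclusion filtering might look like, here is a deliberately simplified Python sketch. The domain list, document fields, and spam heuristic are all hypothetical; production pipelines rely on trained classifiers and far richer metadata, and nothing here reflects OpenAI’s actual code.

```python
# Purely illustrative sketch of exclusion filtering for a training corpus.
# BLOCKED_DOMAINS, the document fields, and is_spam are hypothetical.
BLOCKED_DOMAINS = {"paywalled-news.example", "pii-aggregator.example"}

def is_spam(text: str) -> bool:
    # Toy heuristic: flag extremely repetitive text as spam.
    words = text.split()
    return bool(words) and len(set(words)) / len(words) < 0.2

def keep_document(doc: dict) -> bool:
    """Return True if a document passes the simple exclusion filters."""
    if doc["domain"] in BLOCKED_DOMAINS:
        return False
    if doc.get("paywalled", False):
        return False
    if is_spam(doc["text"]):
        return False
    return True

corpus = [
    {"domain": "blog.example", "text": "an essay on the history of chatbots"},
    {"domain": "paywalled-news.example", "text": "premium article", "paywalled": True},
]
training_pool = [doc for doc in corpus if keep_document(doc)]
print(len(training_pool))  # 1: the paywalled document is excluded
```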
In addition to public content, OpenAI licenses some non-public datasets, such as archives and other content that is not freely available. One such partnership, signed with Le Monde in 2024, licensed the French newspaper’s content and marked a shift toward the licensed use of copyrighted material in AI training datasets (Le Monde, 2024).
Internally, OpenAI regulates its training data via its Safety and Security Committee and RLHF. It is also developing Media Manager, “a tool that will enable creators and content owners to tell us what they own and specify how they want their works to be included or excluded from machine learning research and training” (OpenAI, 2024a, para. 8).
Users also have a say in how their own data is used; they can opt out of it being used for training (OpenAI Help Center, n.d.-b; n.d.-c).
Regulatory bodies such as Italy’s Garante per la Protezione dei Dati Personali (the national data-protection authority) have investigated whether OpenAI’s data-handling practices fully comply with European privacy standards (Reuters, 2024). The company claims compliance with relevant privacy laws, including the European Union’s General Data Protection Regulation (GDPR). Transfers to countries the European Commission recognizes as providing adequate data protection are covered under Article 45(1) of the GDPR (OpenAI, 2024b). Transfers to other regions rely on the European Commission’s Standard Contractual Clauses (SCCs), as outlined in Article 46(2)(c) of the GDPR, along with the United Kingdom’s Data Transfer Addendum, for similar safeguards (OpenAI, 2024b).
Finally, there are external pressures for data oversight from concerned academics, journalists, creators, and current and past employees (Field, 2024a). Demands include clearer disclosure of training data, fairness, accountability, transparency, liability, and safety.
Part Three - Ethical Implications
While many norms are explicit, others are implicit and, perhaps more importantly, strained by competing pressures. LLMs are expected to produce fast, accurate responses, pressuring companies to release more powerful models quickly, sometimes at the expense of oversight.
As noted in Part Two, OpenAI discloses some of its data sources and safety procedures, but much remains proprietary (OpenAI Help Center, n.d.-c). While this lack of full transparency is unfortunate, maintaining proprietary information is a necessary part of business. Just as Coca-Cola protects its secret formula to preserve its market viability, OpenAI must safeguard its system design to avoid handing competitors an advantage. Without such protection, competing companies could overtake OpenAI, thereby limiting innovation in AI development.
Similarly, allowing users to opt out of sharing their conversations with OpenAI enhances personal privacy. However, too many privacy restrictions reduce the data available for training, which degrades model quality.
Bias is another valid concern but requires a balanced approach. Cleaning all bias from massive public datasets is extremely difficult. Moreover, efforts to remove biases from the system may unintentionally introduce new biases. For example, censoring historical text with colonialist themes could produce a warped view of history.
ChatGPT, like other AIs, has biases due to its training: its data comes primarily from the internet, which may lack diverse viewpoints and contain biases of its own (Zhou et al., 2023). For example, in April 2023 ChatGPT refused to write a poem about then ex-President Donald Trump but did write one about Joe Biden. “ChatGPT shows a clear political bias,” conclude Zhou et al. (2023, para. 9). This delicate balance between efficiency and ethical responsibility underscores the inherent challenges in developing fair and trustworthy AI systems.
One of the most persistent challenges in AI is hallucination: false but believable statements produced by a model. ChatGPT has limited ability to fact-check its own statements. According to UMATechnology (2025), ChatGPT lacks real-time data and is limited to finding patterns in the data it was trained on. It therefore cannot retrieve up-to-date articles or studies to support or counter a claim.
ChatGPT also has no source-verification method: it cannot assess the reliability of a source (UMATechnology, 2025).
Even though it is adept at understanding and producing natural language, ChatGPT can still misunderstand nuanced conversations, leading to inaccuracies in its responses (UMATechnology, 2025). In high-stakes contexts such as health care or legal advice, these inaccuracies could cause serious complications. Ethical concerns that specifically relate to health care include fairness, bias, non-maleficence, transparency, and privacy (Haltaufderheide & Ranisch, 2024).
Another perennial problem is privacy. While OpenAI has safeguards in place (see Part Two), the company still gathers information on its users from account details, from what they type into the chatbot, and from their devices (Zhou et al., 2023). This information could be leaked if data is misused or breached. In addition, ChatGPT may track and profile users based on their interaction histories, or it may learn sensitive information about users from X and other social media platforms (Zhou et al., 2023).
The ability to generate human-like text also invites abuse. Zhou et al. (2023) write, “phishing email scams, online job hunting and dating scams, and even political propaganda may benefit from human-like text from ChatGPT” (para. 18). In one instance, ChatGPT fabricated a sexual harassment allegation and cited a nonexistent Washington Post article as proof.
Plagiarism and authorship pose additional challenges, though the authorship concern is likely overstated. Because ChatGPT is trained on copyrighted materials, its responses may be considered plagiarism, and when it is used to write an essay or novel, the question of who authored the piece is open for debate. Floridi (2025) calls this “distant writing”: humans as story architects and AI as story constructors. Building on that analogy, there is no debate about who created a Frank Lloyd Wright building; Wright gets the credit as the architect, not the construction workers who physically built it.
A recent MIT study suggested that LLMs may negatively affect learning outcomes among younger users (Chow, 2025). However, the study has not yet undergone peer review, and its author notes that its primary goal is to raise awareness of society’s growing reliance on LLMs for immediate convenience (para. 3). Similar debates arose 25 years ago over the use of computers in the classroom, which has proved an unparalleled good (Murdock et al., 2025). A more appropriate concern regarding AI in education is the gathering of user data to personalize experiences, which raises issues of consent and privacy (Cox & Mazumdar, 2022).
Conclusion
ChatGPT makes a good case study in the promise and peril of AI. While many of the ethical implications examined in this essay are valid concerns, there is no reason to believe that AI researchers, such as those at OpenAI, will not overcome them. Meanwhile, the release of ChatGPT in November 2022 marked an unprecedented democratization of this technology, expanding access to knowledge, creativity, and communication for the masses.
Its design is not neutral; embedded within its architecture are norms like helpfulness and safety that reflect a deliberate moral-engineering effort by OpenAI. The quality of its outputs mirrors the quality of its training data, biases, blind spots, and all. Still, that quality is improving.
In the end, debates accompany the introduction of every new technology, from computers in schools to calculators in math classes. While there will always be bumps in the road of progress, that road has brought us to better places.
References
Abramson, A. (2023, June 1). How to use ChatGPT as a learning tool. Monitor on Psychology, 54(4), 67. American Psychological Association. https://www.apa.org/monitor/2023/06/chatgpt-learning-tool
Belcic, I., & Stryker, C. (2024, September 18). What is GPT (generative pretrained transformer)? IBM. https://www.ibm.com/think/topics/gpt
Chen, P., Huang, R., Wu, J., & Zhang, Y. (2023). Student perceptions of chatbots as educational tools: Benefits, limitations, and ethical considerations. Information Systems Frontiers, 25, 161–182.
Chow, A. R. (2025, June 23). ChatGPT may be eroding critical thinking skills, according to a new MIT study. TIME. https://time.com/7295195/ai-chatgpt-google-learning-school/
Cox, A. M., & Mazumdar, S. (2022). Defining artificial intelligence for librarians. Journal of Librarianship and Information Science. https://doi.org/10.1177/09610006221142029
Field, H. (2024a, June 4). Current and former OpenAI employees warn of AI’s ‘serious risks’ and lack of oversight. CNBC. https://www.cnbc.com/2024/06/04/openai-open-ai-risks-lack-of-oversight.html
Field, H. (2024b, September 16). OpenAI announces new independent board oversight committee for safety. CNBC. https://www.cnbc.com/2024/09/16/openai-announces-new-independent-board-oversight-committee-for-safety.html
Floridi, L. (2025). Distant writing: Literary production in the age of artificial intelligence. Minds and Machines, 35(3). https://doi.org/10.1007/s11023-025-09732-1
Fore, P. (2025, October 7). 100 million jobs could be wiped out from the U.S. alone thanks to AI, warns Senator Bernie Sanders. MSN. https://www.msn.com/en-us/technology/artificial-intelligence/100-million-jobs-could-be-wiped-out-from-the-u-s-alone-thanks-to-ai-warns-senator-bernie-sanders/ar-AA1O1D4I?ocid=BingNewsSerp
Haltaufderheide, J., & Ranisch, R. (2024). The ethics of ChatGPT in medicine and healthcare: A systematic review on large language models (LLMs). NPJ Digital Medicine, 7(1), 183. https://doi.org/10.1038/s41746-024-01157-x
Le Monde. (2024, March 13). Le Monde and OpenAI sign partnership agreement on artificial intelligence. https://www.lemonde.fr/en/about-us/article/2024/03/13/le-monde-signs-artificial-intelligence-partnership-agreement-with-open-ai_6615418_115.html
Murdock, V., Lee, C. J., & Hersh, W. (2025). Designing for the future of information access with generative information retrieval. In R. W. White & C. Shah (Eds.), Information access in the era of generative AI (The Information Retrieval Series, Vol. 51). Springer. https://doi.org/10.1007/978-3-031-73147-1_9
OpenAI. (2022, November 30). Introducing ChatGPT. https://openai.com/index/chatgpt
OpenAI. (2024a, May 7). Our approach to data and AI. https://openai.com/index/approach-to-data-and-ai
OpenAI. (2024b, November 4). EU privacy policy. https://openai.com/policies/eu-privacy-policy/
OpenAI. (2025, October 14). Ethical issues section of ChatGPT system analysis [Large language model conversation]. ChatGPT. https://chat.openai.com/
OpenAI Help Center. (n.d.-a). ChatGPT capabilities overview. https://help.openai.com/en/articles/9260256-chatgpt-capabilities-overview
OpenAI Help Center. (n.d.-b). How your data is used to improve model performance. https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance
OpenAI Help Center. (n.d.-c). How ChatGPT and our foundation models are developed. https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-foundation-models-are-developed
Reuters. (2024, November 29). Italian watchdog warns publisher GEDI against sharing data with OpenAI. https://www.reuters.com/technology/italian-watchdog-warns-publisher-gedi-against-sharing-data-with-openai-2024-11-29/
Thompson. (2025, April 10). Unveiling the architecture of ChatGPT: A deep dive into the transformer’s core. Technical Explore. https://www.technicalexplore.com/ai/unveiling-the-architecture-of-chatgpt-a-deep-dive-into-the-transformers-core
UMATechnology. (2025, May 19). Can ChatGPT fact check. https://umatechnology.org/can-chatgpt-fact-check/
Wang, K. (2024, January 22). From ELIZA to ChatGPT: A brief history of chatbots and their evolution. Applied and Computational Engineering, 39, 57–62. https://doi.org/10.54254/2755-2721/39/20230579
Zhou, J., Müller, H., Holzinger, A., & Chen, F. (2023). Ethical ChatGPT: Concerns, challenges, and commandments. Electronics, 13(17), 3417. https://doi.org/10.3390/electronics13173417

