In June 2025, the MIT Media Lab published a disturbing brainwave study: when humans used ChatGPT to write essays, their neural connectivity decreased by as much as 55%—far more severe than the 34–48% reduction observed when using search engines.[1] Even more striking, 83% of ChatGPT users could not recall the key arguments in the essays they had written with AI assistance—they produced an essay yet could not remember what they had written. That same year, a field experiment by Wharton Business School published in the Proceedings of the National Academy of Sciences (PNAS) revealed an educational paradox: high school students who used GPT-4 without restrictions improved their practice scores by 48%, but in the subsequent exam without AI, their scores actually dropped by 17%.[2] A Brookings Institution study spanning more than 50 countries named this phenomenon the "doom loop of AI dependence"—students outsource their thinking to AI, their cognitive abilities atrophy, they become more dependent on AI, and their abilities deteriorate further.[3] The Lancet Gastroenterology & Hepatology published the first clinical evidence of deskilling—physicians who used AI-assisted colonoscopy saw their adenoma detection rate drop by roughly 20% after the AI was removed.[4] This empirical evidence from neuroscience, education, healthcare, and software engineering collectively points to a profound warning: AI agents are not merely changing how we work—they are changing how we think. And this change may not be for the better.

1. Neuroscientific Evidence of Cognitive Offloading: Your Brain Is Changing

In 2008, a cover story in The Atlantic posed a generational question: "Is Google Making Us Stupid?" Author Nicholas Carr observed that prolonged internet use was altering human cognitive patterns—shifting from deep reading to shallow scanning, from focused thinking to scattered attention.[5] Two years later, in his book The Shallows, Carr grounded this argument in the science of neuroplasticity: the brain reshapes itself according to usage patterns—if we continuously acquire information in fragmented ways, the brain's deep-processing circuits will gradually weaken. This argument sparked fierce debate at the time, but seventeen years later, the MIT Media Lab's brainwave study provided the most direct neuroscientific validation of Carr's framework—only this time, what we face is not a search engine, but generative AI.

Nataliya Kosmyna's team at the MIT Media Lab designed an elaborate experiment.[1] Fifty-four participants were randomly divided into three groups: a "brain-only group" that wrote essays using only their own minds, a "search group" that used search engines for assistance, and an "LLM group" that used ChatGPT for assistance. Throughout the writing process, researchers measured participants' brain activity in real time using electroencephalography (EEG). The results revealed a clear gradient: inter-regional brain connectivity in the search group decreased by 34–48% compared to the brain-only group, while the LLM group showed a reduction of up to 55%. In the researchers' words, brain connectivity "systematically decreased with increasing external support."

The deeper finding emerged in the post-task memory tests. Eighty-three percent of ChatGPT users could not recall the key arguments in their own essays—they produced an essay but failed to encode its content into long-term memory. This was not incidental forgetfulness—it reflected a fundamental cognitive mechanism: when the brain expects an external tool to "remember" information, it reduces its own encoding effort. Psychologists call this the "Google effect" or "digital amnesia," first identified by Betsy Sparrow in a 2011 experiment. But the MIT study took this concept a step further—unlike search engines, generative AI does not merely "remember" information but "produces" it, causing human brain engagement across the entire cognitive chain—from ideation and organization to expression—to decline substantially.

The researchers used the concept of "cognitive debt" to frame their findings—a term that has been independently proposed in different academic contexts: the MIT Media Lab used it to describe neural-cognitive deterioration, while Professor Margaret-Anne Storey of the University of Victoria used it in February 2026 to describe developers' loss of code comprehension in software engineering.[6] The two usages may seem different, but they point to the same structural problem: when AI performs cognitive tasks on behalf of humans, humans lose not only the output but also the understanding that is constructed in the process of producing that output.

The institutional implications of this finding are far-reaching. In the knowledge economy, the core of human capital is not "what you know" but "how you think"—critical thinking, problem framing, cross-domain integration. If AI tools are systematically reducing brain activity in these higher-order cognitive functions, then knowledge workers who use AI over the long term may face a paradox: their immediate output increases, but their cognitive capacity—the fundamental ability required to produce that output—is deteriorating. It is like an athlete using a mechanical exoskeleton to run faster while their own muscles atrophy—as long as the exoskeleton keeps working, everything looks fine, but once it malfunctions, they may struggle even to walk.

2. The Deskilling Crisis on the Education Front: From the Wharton Experiment to a Global Alarm

If the MIT study revealed the neurological mechanisms of cognitive offloading, the field experiment by Hamsa Bastani's team at Wharton Business School, published in PNAS, revealed its educational consequences—and those consequences are more severe than many expected.[2]

The research team conducted a large-scale field experiment in Turkey. Approximately 1,000 high school students were randomly assigned to three groups: a control group without AI, a "GPT Base" group with unrestricted access to GPT-4, and a "GPT Tutor" group that used a carefully designed version of the AI (providing guiding hints rather than direct answers). Over several weeks of math practice, the GPT Base group scored 48% higher than the control group, and the GPT Tutor group scored an impressive 127% higher. These figures seemed like powerful evidence for the AI education revolution. However, in the subsequent final exam—conducted without any AI tools—the results took a dramatic turn: the GPT Base group scored 17% lower than the control group.

This 17% score decline was not statistical noise—it was a structural learning injury. When students used GPT-4 without restrictions, they effectively skipped the most critical steps in the learning process: struggling, making mistakes, correcting errors, and achieving understanding. Research in educational psychology has long demonstrated that "desirable difficulties"—moderate frustrations and challenges during learning—are necessary conditions for deep learning. AI eliminated these difficulties, and in doing so, eliminated the learning itself.

But the most important finding of the Wharton experiment was not the problem, but the solution. The GPT Tutor group—the group that used carefully designed guardrails—showed no significant difference from the control group in the final exam.[7] In other words, when AI was designed to "provide hints rather than direct answers," the learning damage virtually disappeared. This finding carries significant policy implications: the question is not "whether to use AI" but "how to design the interface between AI and humans." An AI tool that directly provides answers and one that guides students to think may look similar on the surface, but their impact on learning outcomes is worlds apart.

The large-scale global study published by the Brookings Institution in January 2026 connected individual experimental findings into a global picture.[3] Over the course of a year, the research team conducted focus group interviews and in-depth surveys across more than 50 countries, covering K–12 students, parents, teachers, and technology experts. The conclusion was stark: 56% of feedback emphasized the harms of AI, while only 44% mentioned benefits. The researchers identified a "doom loop of AI dependence"—students use AI to complete assignments, their cognitive abilities atrophy from lack of practice, their diminished cognitive capacity makes them more dependent on AI, and that dependence further accelerates cognitive decline. This was not a theoretical hypothesis—the researchers observed this cycle in concrete form across educational settings in 50 countries.

Gartner's forecast extended the educational problem into the workplace. In October 2025, at its annual IT Symposium, Gartner issued a striking prediction: by 2026, due to critical thinking degradation caused by generative AI use, 50% of global organizations would be compelled to mandate "AI-free" skill assessments.[8] In high-stakes industries such as finance, healthcare, and law, talent capable of independent thinking would become scarce, driving up talent acquisition costs. Gartner simultaneously predicted that by 2027, 75% of hiring processes would require certification of workplace AI proficiency—creating a "skills paradox": enterprises need employees who can skillfully use AI, yet also need them to think independently without it.

A study published in January 2026 by Anthropic's Judy Hanwen Shen and Alex Tamkin validated the Wharton experiment's findings from a different angle.[9] They recruited 52 professional programmers and randomly assigned them to groups tasked with learning a new asynchronous programming library. The AI-assisted group scored 17% lower on skill assessments than the control group—a difference that was statistically significant across beginner, intermediate, and expert programmers. More granular analysis revealed six distinct AI interaction patterns, three of which preserved learning outcomes (such as asking the AI for explanations rather than directly requesting answers), while the other three severely harmed learning. Participants who fully delegated to AI scored the lowest, while those who asked AI for explanations scored the highest—once again confirming the core finding of the Wharton experiment: what determines learning outcomes is not whether AI is used, but how it is used.

3. The Ironies of Automation and Clinical Deskilling: From Theory to Medical Evidence

In 1983, British human factors expert Lisanne Bainbridge published a mere five-page paper in the journal Automatica, titled "Ironies of Automation."[10] The paper presented what remains the classic paradox of automation research: the higher the degree of automation, the more critical the role of the human operator—because only humans can handle the anomalies that automation cannot cope with—yet automation simultaneously erodes the skills and vigilance that humans need to handle those very situations. Bainbridge illustrated this with the example of a nuclear power plant control room: operators spend 99% of their time merely monitoring the automated system, but in the 1% of anomalous situations, they must make highly complex judgments immediately—and prolonged passive monitoring is precisely what undermines their ability to make those judgments.

Forty-two years later, Bainbridge's theoretical prediction received its first clinical validation in a domain she could never have imagined—AI-assisted medicine. In August 2025, the Lancet Gastroenterology & Hepatology published a multicenter observational study conducted by Krzysztof Budzyn's team across four medical centers in Poland.[4] The study involved 19 experienced colonoscopy physicians (endoscopists) who had used an AI-assisted detection system (CADe) during the ACCEPT clinical trial, then returned to an AI-free work environment after the trial ended. The study tracked the results of 1,443 colonoscopy procedures.

The findings were alarming: before using AI, these physicians had an adenoma detection rate of 28.4%; after using AI for a period and then returning to an AI-free environment, their detection rate dropped to 22.4%—an absolute fall of 6 percentage points, a relative decline of roughly 20%. These were not novice physicians but experienced senior clinicians. More importantly, this was not a simulated result in a laboratory setting—it was deskilling evidence from real clinical scenarios involving real patients. A decline in adenoma detection rate directly impacts early detection of colorectal cancer, which in turn affects patient survival rates. In the human factors literature, this was the first observation of automation-induced deskilling in clinical medicine, and its significance extends far beyond the narrow scope of colonoscopy.
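The size of this drop is easy to misread, because an absolute fall in percentage points and a relative decline against the baseline rate are different quantities. A minimal sketch, using only the detection rates reported above, makes the distinction explicit:

```python
# Adenoma detection rates as reported in the Lancet Gastroenterology & Hepatology study
before_ai = 28.4  # detection rate (%) before exposure to AI assistance
after_ai = 22.4   # detection rate (%) after AI assistance was withdrawn

absolute_drop = before_ai - after_ai             # difference in percentage points
relative_drop = absolute_drop / before_ai * 100  # percent of the baseline rate

print(f"Absolute drop: {absolute_drop:.1f} percentage points")
print(f"Relative drop: {relative_drop:.1f}% of the baseline rate")
```

The relative figure computes to about 21%, in line with the roughly 20% decline cited in coverage of the study; the headline number is relative to baseline, not an absolute 20-point fall.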

The structural implications of the Lancet study merit deep reflection. If AI-assisted detection systems genuinely improve detection rates during use (in the ACCEPT trial, the AI group's detection rate was 54.8%, significantly higher than the non-AI group's 40.4%), but simultaneously cause physician skill degradation after AI removal, then we face a classic "lock-in effect": once AI use begins, it cannot be safely stopped—because stopping does not mean "returning to the starting point" but rather "falling to a state worse than the starting point." In Bainbridge's framework, this is the most profound irony of automation: we introduce automation to transcend the limits of human capability, but automation's side effect is to further shrink the boundaries of human capability, making our dependence on automation not a choice but a necessity.

This deskilling paradox exists in software engineering as well—and may be even more severe. As I discussed in a previous analysis, the Vibe Coding revolution is eroding the pipeline for developing junior engineers.[6] When AI replaces junior developers in writing most of the code, these developers lose the opportunity to cultivate "code intuition"—just as the physicians in the Lancet study lost the opportunity to develop "visual detection intuition." The difference is that colonoscopy deskilling can be observed within months (because detection rates can be directly measured), while software engineering deskilling may take years to manifest—and by the time it does, it may have already caused an irreversible talent gap.

In the context of the AI agent economy, the deskilling paradox has even more far-reaching implications. When agentic AI frameworks like OpenClaw enable users to direct AI agents through natural language to complete entire workflows, humans are not merely outsourcing individual cognitive tasks—they are outsourcing the entire cognitive process, from problem definition and solution design to execution and evaluation. If using ChatGPT to write an essay already reduces brain connectivity by 55%, what would be the cognitive impact of using AI agents to manage entire workflows? No direct empirical study exists yet, but based on the logical extension of Bainbridge's theory, the answer is unlikely to be reassuring.

4. Institutional Consequences of Cognitive Debt: Enterprise, Professional, and National Dimensions

A study presented by Microsoft Research at the 2025 CHI conference provided the most comprehensive empirical picture of cognitive offloading in the workplace.[11] The research team surveyed 319 knowledge workers, collecting 936 instances of generative AI use. The self-reported reductions in cognitive effort across each level of Bloom's taxonomy (the classic framework in educational psychology for assessing cognitive levels) were striking: knowledge level reduced by 72%, comprehension by 78%, application by 70%, analysis by 71%, synthesis by 76%, and evaluation by 55%.

The structure of these numbers is worth examining closely. The greatest reduction was at the "comprehension" level (78%), and the smallest at the "evaluation" level (55%)—precisely reflecting the current capability profile of AI: generative AI excels at information synthesis and summarization (corresponding to the comprehension level), but is relatively weaker at value judgment and critical evaluation (corresponding to the evaluation level). Yet even at the evaluation level, where AI is comparatively weaker, the reduction in cognitive effort still reached 55%—more than half. What the researchers observed was not merely a quantitative change but a qualitative transformation: the nature of critical thinking shifted from "information gathering" to "information verification," from "problem solving" to "integrating AI output," from "higher-order thinking" to "stewarding AI."
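The gradient described above can be restated as data, which makes the pattern in the survey figures easy to verify at a glance. A small sketch, using the self-reported reduction percentages exactly as given:

```python
# Self-reported reductions in cognitive effort by Bloom taxonomy level,
# as reported in the Microsoft Research / CHI 2025 knowledge-worker survey
reductions = {
    "knowledge": 72,
    "comprehension": 78,
    "application": 70,
    "analysis": 71,
    "synthesis": 76,
    "evaluation": 55,
}

# Identify where offloading concentrates: the levels with the largest
# and smallest reported reductions in cognitive effort
most_offloaded = max(reductions, key=reductions.get)
least_offloaded = min(reductions, key=reductions.get)

print(f"Largest reduction:  {most_offloaded} ({reductions[most_offloaded]}%)")
print(f"Smallest reduction: {least_offloaded} ({reductions[least_offloaded]}%)")
```

Running this confirms the asymmetry: comprehension shows the largest reduction and evaluation the smallest, yet even the floor sits above 50%.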

This role shift from "thinker" to "steward" may represent an efficiency gain at the individual level—as the researchers noted, many participants reported improved work quality. But at the institutional level, it raises a series of structural concerns.

First, the systematic erosion of professional judgment. In professional fields such as law, medicine, and finance, practitioners' core value lies not in information gathering (which AI can do faster and more comprehensively) but in professional judgment—making trade-offs under incomplete information, adjudicating between conflicting evidence, and finding balance among complex stakeholder interests. When these professionals' cognitive effort decreases by 55–78% across all Bloom levels, is their professional judgment deteriorating in parallel? The Lancet colonoscopy study provided evidence in the medical domain; Shen & Tamkin's programming study provided evidence in software engineering. But in law, finance, policy analysis, and similar fields, comparable empirical studies remain scarce—a knowledge gap that is itself cause for concern.

Second, the fragility of organizational knowledge. From my experience conducting policy research for the World Bank and the United Nations, I have come to deeply appreciate that an organization's core capability resides not only in the minds of individuals but also in the shared understanding among team members—common frameworks for problems, consensus on methodologies, and tacit agreements on quality standards. When each member outsources most of their cognitive work to AI, this process of building shared understanding is weakened—each person has their own AI output, but the team lacks a common cognitive experience. Professor Storey has identified this as the organizational dimension of "cognitive debt" in software engineering;[6] but it equally applies to all knowledge-intensive organizations.

Third, cognitive resilience risks at the national level. The Brookings 50-country study elevated the cognitive offloading issue from the organizational to the national level.[3] When a nation's education system adopts AI tools at scale without the kind of carefully designed guardrails revealed by the Wharton experiment, an entire generation of students may grow up within the "doom loop of AI dependence"—capable of using AI to produce high-quality assignments yet lacking the ability to think independently. In an era of escalating geopolitical tensions, a nation's cognitive resilience—the collective capacity of its citizens for independent thinking, critical analysis, and creative problem-solving—is a strategic asset. As I emphasized in my analysis of the relationship between talent and national power, the quality of human capital is the foundation of national competitiveness. If AI tools are systematically eroding that foundation, then the governance of AI in education is not merely an education policy issue—it is a national security issue.

Fourth, the geopolitical vulnerability of AI supply chains. As enterprises and nations outsource ever more cognitive functions to AI systems, the supply chains of these AI systems—from semiconductor manufacturing and model training to API services—become part of the cognitive infrastructure. As I noted in my analysis of digital sovereignty, when core capabilities depend on external suppliers, supply chain disruptions become not just business disruptions but cognitive disruptions. The Lancet study's deskilling findings make this risk even more acute: even after AI services are restored, the human capability degradation exposed during the interruption may have already caused irreversible losses.

5. Rebuilding Cognitive Sovereignty: AI Literacy Frameworks and the Right Architecture for Human-AI Collaboration

Facing the structural risks of cognitive offloading, the solution is not to reject AI—that is neither possible nor wise—but to design the right architecture for human-AI collaboration. The core finding of the Wharton experiment provides the most important clue: carefully designed guardrails can eliminate AI's harm to learning.[2] The success of the GPT Tutor group proves that the key is not whether to use AI, but the design of AI interaction—an AI that provides hints rather than direct answers has a fundamentally different impact on human cognition than one that directly provides answers.

The AI Literacy Framework jointly released by the OECD and the European Commission in 2025 provides a starting point for institutional responses.[12] Developed with the assistance of Code.org and an international expert panel, this framework defines four core competencies for AI literacy in primary and secondary education: Engage with AI—understanding the fundamental principles and limitations of AI; Create with AI—being able to collaborate effectively with AI on tasks; Manage AI—being able to critically evaluate AI output and identify biases; and Design with AI—understanding the design decisions of AI systems and their social implications. The framework's core philosophy is that AI literacy is not merely the technical ability of "how to use AI," but the cognitive ability of "how to think independently in the age of AI."

However, a framework is only a starting point. Translating it into effective educational practice requires answering several key design questions.

First, the concept of "cognitive fitness." Just as physical health requires regular exercise—even if there is an elevator, one should occasionally take the stairs—cognitive health also requires regular "AI-free thinking exercises." Gartner's predicted "AI-free skill assessments" are one form of this concept in the enterprise setting; but a more systematic approach is to embed "cognitive fitness" into the design of educational curricula and professional development.[8] For example, medical education could alternate between AI-assisted training and "AI-free diagnostic exercises" to ensure that physicians' independent judgment does not deteriorate due to AI assistance—the Lancet study's findings transform this suggestion from a theoretical recommendation into a clinical necessity.

Second, the "cognitive protection" principle in interaction design. The success of GPT Tutor in the Wharton experiment, combined with the effectiveness of the "ask AI for explanations" strategy in Shen & Tamkin's study, jointly point to a design principle: AI tools should be designed to augment human thinking, not replace it.[9] Specifically, this means AI should prioritize providing frameworks, hints, and feedback rather than finished output. In software engineering, this means AI coding tools should be able to explain their design decisions, not merely generate code. In medicine, this means AI diagnostic aids should highlight the anomalous features they have detected rather than directly providing a diagnosis. The core of this principle is: AI's output should serve as "input" for human thinking, not a "substitute" for it.

Third, institutional "cognitive resilience" audits. Just as enterprises conduct financial audits and cybersecurity audits, future corporate governance may need to incorporate "cognitive resilience audits"—assessing whether an organization's core business functions can maintain acceptable quality levels when AI tools are unavailable.[13] This concept has already been discussed within the framework of corporate digital resilience; empirical research on cognitive offloading transforms it from a forward-looking suggestion into an urgent governance need. Specific audit items might include: professional competency testing of key personnel in AI-free environments, contingency plans for AI supply chain disruptions, and assessments of the soundness of internal knowledge transfer mechanisms.

Fourth, national-level AI education governance. Taiwan passed the Artificial Intelligence Basic Act in 2025, providing a legal framework for AI governance. However, AI governance in the educational domain still lacks specific policy guidance. Based on the Brookings global study's finding that 56% of feedback highlighted AI's harms in education, Taiwan needs a clear AI education policy that avoids both Luddite-style blanket bans and guardrail-free blanket adoption. The OECD's AI Literacy Framework provides a reference architecture, but it must be localized to Taiwan's educational context.

In my view, the challenge of cognitive offloading ultimately points to a deeper question: in the age of AI, what is the irreplaceable value of being "human"? If AI can gather information faster and more comprehensively, write text more fluently, and detect anomalies more precisely—then what is the human role in the cognitive chain? The MIT study, the Wharton experiment, the Lancet clinical data, and the Microsoft Research workplace survey collectively point to an answer: the irreplaceable value of humans lies not in the efficiency of "executing cognitive tasks" but in the judgment of "understanding why a task should be executed"—the setting of purposes, the weighing of values, and the consideration of ethics. Yet these higher-order cognitive abilities are not innate—they must be cultivated through extensive "lower-order" cognitive practice, just as a conductor must first learn to play at least one instrument before they can understand the role of each section in the orchestra. If AI eliminates the necessity of this "lower-order" practice, we may, in the pursuit of efficiency, sever the very pathway for cultivating judgment. This is the deepest irony of cognitive offloading—and the full manifestation, in the age of AI, of the "ironies of automation" that Bainbridge foresaw forty-two years ago.

References

  1. Kosmyna, N. et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. MIT Media Lab / arXiv:2506.08872. media.mit.edu
  2. Bastani, H. et al. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences (PNAS), 122(26). pnas.org
  3. Burns, M. et al. (2026). A New Direction for Students in an AI World: Prosper, Prepare, Protect. Brookings Institution, Center for Universal Education. brookings.edu
  4. Budzyn, K. et al. (2025). Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy: a multicentre, observational study. The Lancet Gastroenterology & Hepatology, 10(10), 896-903. thelancet.com
  5. Carr, N. (2008). Is Google Making Us Stupid? The Atlantic. theatlantic.com; Carr, N. (2010). The Shallows: What the Internet Is Doing to Our Brains. W. W. Norton & Company.
  6. Storey, M.-A. (2026). Cognitive Debt: A New Challenge in AI-Assisted Development. margaretstorey.com
  7. Wharton Knowledge. (2025). Without Guardrails, Generative AI Can Harm Education. knowledge.wharton.upenn.edu
  8. Gartner. (2025). Top Strategic Predictions for IT Organizations and Users in 2026 and Beyond. gartner.com
  9. Shen, J. H. & Tamkin, A. (2026). How AI Impacts Skill Formation. Anthropic. arXiv:2601.20245. anthropic.com
  10. Bainbridge, L. (1983). Ironies of Automation. Automatica, 19(6), 775-779. doi.org
  11. Lee, H.-P. et al. (2025). The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. Microsoft Research / CHI 2025. microsoft.com
  12. OECD & European Commission. (2025). Empowering Learners for the Age of AI: An AI Literacy Framework for Primary and Secondary Education. ailiteracyframework.org
  13. Gartner. (2025). Strategic Predictions for 2026: AI-Free Skills Assessments. gartner.com