In an age where artificial intelligence is becoming increasingly integrated into education, a remarkable experiment has illuminated both the potential and limitations of AI in academic performance. A French teacher recently graded a high school philosophy essay written not by a student, but by OpenAI’s language model ChatGPT. The result? A definitive score that places the AI squarely within the human range—but not at the top of the class. This experiment has sparked vigorous debate about academic integrity, the future of grading, and the ever-blurring lines between human and machine intelligence.
The baccalauréat, or “bac,” is one of France’s most important academic milestones, and its philosophy exam carries particular cultural and intellectual weight. Each year, thousands of high school students face a series of questions requiring original thought, reasoned arguments, and a solid grasp of philosophical texts. When a French educator submitted an AI-generated essay under exam conditions, many were curious: how would an artificial intelligence, a model with billions of parameters trained on countless texts, fare on a test designed to evaluate a human’s cognitive and reasoning abilities?
Overview of the Baccalauréat AI Experiment
| Aspect | Detail |
|---|---|
| Event | Grading of a baccalauréat philosophy essay written by ChatGPT |
| Location | France |
| Exam type | Philosophy section of the French baccalauréat |
| Evaluator | Licensed French high school philosophy teacher |
| Score given | 11 out of 20 |
| Purpose | To assess how well AI performs against real academic standards |
Why this test matters for the future of education
The impact of AI like ChatGPT on education is more than academic curiosity; it taps into major questions facing schools globally. Teachers increasingly encounter students submitting AI-written assignments or relying on AI tools for assistance. By testing ChatGPT against a real academic standard, this experiment probes whether AI can produce coherent, argument-driven essays that satisfy grading criteria based not only on grammar and factual accuracy but also on interpretation, logic, and depth of analysis.
The score ChatGPT received—11 out of 20—places it in the “satisfactory” range. According to French educational standards, a score between 10 and 12 reflects a paper that shows average understanding but lacks richness in thought or originality. In the eyes of the evaluator, the AI’s work demonstrated sufficient structure and clarity but lacked the creative reasoning and critical engagement that characterize stronger submissions.
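For readers unfamiliar with France’s 20-point scale, the sketch below situates an 11/20 among the familiar grade bands. One caveat: the honours “mentions” formally apply to a student’s overall bac average rather than to a single essay, so the function is purely illustrative.

```python
def bac_band(score: float) -> str:
    """Situate a /20 French exam score among the familiar grade bands.

    Illustrative only: the honours "mentions" formally apply to the
    overall bac average, not to an individual essay score.
    """
    if not 0 <= score <= 20:
        raise ValueError("French exam scores run from 0 to 20")
    if score < 10:
        return "fail"
    if score < 12:
        return "passable (satisfactory, no honours)"  # ChatGPT's 11/20
    if score < 14:
        return "mention assez bien (quite good)"
    if score < 16:
        return "mention bien (good)"
    return "mention très bien (very good)"

print(bac_band(11))  # -> passable (satisfactory, no honours)
```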
How the AI essay was constructed and submitted
To ensure authenticity, the teacher fed ChatGPT a real philosophy question from this year’s bac. The prompt, “Is the experience of the past necessary for constructing identity?”, is open-ended and requires nuanced reasoning. ChatGPT generated a complete essay in seconds. To avoid bias, the teacher submitted the AI’s work for evaluation anonymously, without revealing that it was computer-generated. The grader praised the paper’s structure and syntax but flagged its arguments as lacking depth and philosophical engagement, an important shortcoming in this context.
This highlights a significant limitation of current AI language models: while they are highly adept at mimicking structure and form, they often fall short on personal insight and deep critical analysis. That is precisely what was missing from ChatGPT’s response, according to the teacher’s feedback.
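The article does not specify how the essay was produced (most likely through ChatGPT’s web interface), but for readers curious about reproducing the setup, here is a minimal sketch using OpenAI’s official Python client. The model name and prompt wording are assumptions, not details from the experiment.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Model choice and prompt wording are illustrative assumptions;
# the teacher likely used the ChatGPT web interface directly.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "Write as a French lycée student sitting the baccalauréat "
                "philosophy exam: a structured essay with an introduction, "
                "a developed argument, and a conclusion."
            ),
        },
        {
            "role": "user",
            "content": (
                "Is the experience of the past necessary for "
                "constructing identity?"
            ),
        },
    ],
)
print(response.choices[0].message.content)
```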
What educators are saying
The teacher who graded the essay remained anonymous but shared detailed commentary. They noted that while the AI displayed no spelling or grammatical errors, its essay felt impersonal and somewhat formulaic.
“It’s correct, coherent, and even elegant at times. But the absence of a personal voice and intellectual struggle is glaring. There’s no sense of genuine inquiry.”
— Anonymous Bac Grader, French High School Teacher
Other educators have also started weighing in on the broader implications of the experiment:
“This tells us AI can pass as average, but not exceptional. We need to rethink how we evaluate—and teach—originality and critical thought in the classroom.”
— Jean-Marc Duval, Educational Researcher
How this score compares to real student performance
An 11/20 on the philosophy portion of the baccalauréat is far from uncommon among human students. According to the French Ministry of Education, the national average fluctuates between 9.5 and 12 out of 20, depending on the year and stream. That means ChatGPT did better than many real students—and worse than many others. It lands squarely in the middle. While the clear phrasing and structure perhaps gave it an edge, its lack of daring and personal engagement held it back from a higher score.
Strengths shown by ChatGPT in essay writing
- Clear structure: Introduction, argument development, and conclusion were all clearly delineated.
- Excellent grammar: No spelling or syntactical issues noted.
- Consistent tone: The language remained formal and academic throughout.
Where AI still fell short in philosophical composition
- Lack of originality: No unique interpretations or novel approaches to the problem.
- Shallow engagement: The arguments were logical but showed little emotional depth or sense of paradox.
- No reference to specific philosophers: The AI did not explicitly name thinkers such as Locke, Descartes, or Hume, whose ideas are typically expected in such essays.
Possible implications and ethical questions
Since AI can now produce content competent enough to pass a core academic exam, educational systems may have to question what skills truly matter. If general knowledge and writing structure can be automated, perhaps curricula should pivot toward critical thinking, creativity, and oral exams. The ethical question is at the heart of this evolution: if AI can submit a passable essay, how do we differentiate between human learning and machine synthesis?
Digital honesty is also becoming an issue. Teachers have begun reporting an increasing number of students using AI platforms to write assignments. While tools to detect AI-written content exist, they are not foolproof. The French education system may soon need guidelines or policy changes to address AI use directly, particularly in standardized testing environments.
Winners and losers from the AI essay experiment
| Winners | Losers |
|---|---|
| AI researchers showcasing real-world use cases | Students relying solely on AI without understanding |
| Educational policymakers gaining new insight | Traditional written assignments as sole evaluation method |
| Teachers adapting to new challenges and tools | Unmonitored take-home assignments |
The future of academic assessment in the age of AI
With AI now visibly present in students’ lives, academic assessment must evolve. Oral exams may gain importance as a result, or emphasis may shift to project-based learning that demands collaboration and innovation. It’s a pivotal moment for policy frameworks. Students need guidance to use AI responsibly: rather than a shortcut, it could become a co-educator in the learning process.
“This is our chance to redesign how we evaluate intelligence, creativity, and comprehension for a new era.”
— Claire Vannier, Digital Pedagogy Consultant
Frequently asked questions
What question was used for the ChatGPT essay?
The prompt was: “Is the experience of the past necessary for constructing identity?”—a real philosophy question from the French baccalauréat exam.
What score did ChatGPT receive on the essay?
ChatGPT received an 11 out of 20, placing it in the “satisfactory” range according to French grading standards.
Was the evaluator aware the essay was AI-generated?
No. The teacher graded the AI essay anonymously, just like any other student paper.
What weaknesses did the AI exhibit in the essay?
The AI lacked originality, failed to cite philosophers, and showed shallow analysis despite a strong structure.
Can AI-written essays trick teachers?
In many cases, yes. Without tell-tale signs or detection tools, it can be difficult to distinguish AI-generated writing from a student’s own work.
How might schools respond to AI-written assignments?
Some schools may adopt new formats of assessment, such as oral exams or in-class essays, to ensure authenticity.