A technical problem in the AI system used to score Massachusetts MCAS essays led to incorrect results for about 1,400 student responses, prompting the state and its testing contractor to rescore affected work and issue updated scores.
State education officials have described the issue as a temporary glitch, and multiple reports say the impact spread widely across the state—though the sources differ on exactly how many districts or schools were involved.
What went wrong with MCAS essay scoring
The Massachusetts Department of Elementary and Secondary Education (DESE) said the problem came from a temporary technical glitch that caused some essays to be scored incorrectly.
In one example, a student received a zero on an essay that should have scored six out of seven points, and a teacher noticed the mismatch.
A teacher at Reilly Elementary School in Lowell spotted inconsistencies in mid-July when reviewing essays and scores, then alerted school and district leaders.
How many students and districts were affected
DESE said around 1,400 of roughly 750,000 essays, about 0.2 percent, were incorrectly assessed, and that the issue affected an average of one or two students in each of the 145 districts involved.
A second report said nearly 200 schools were impacted and that AI incorrectly graded about 1,400 essays.
A third report said the state notified nearly 200 school districts that at least one student was affected, and it cited records indicating the issue stretched to 192 different districts.
Because these accounts use different totals (145 districts in one report, nearly 200 schools in another, and 192 districts in a third), the exact scope varies depending on the source.
Rescoring, checks, and updated results
After DESE was notified, the testing contractor Cognia investigated the scoring issues and arranged for the affected essays to be rescored, while DESE also reviewed score distributions from randomly selected essay groups.
One report said DESE issued updated scores to impacted students after the error was identified.
Another report said the state’s testing contractor identified more than 1,400 essays that had to be regraded and that the updated scores were higher than the previously released preliminary scores.
DESE said MCAS preliminary results are provided to districts to allow time to flag discrepancies during an annual review period, and the department said the problem was fixed in early August.
On how the scoring system is monitored, DESE said 10% of MCAS essays are checked by humans after AI scoring to confirm accuracy and catch discrepancies.
Other reports similarly said humans read about 10% of essays as a second read to verify that the AI scoring is accurate.
Full MCAS score reports are typically provided to students and families in the fall, consistent with prior schedules.
Concerns about using AI for essay grading
Some educators and parents raised concerns about using AI for high-stakes essay scoring, especially where writing can be creative or varied.
A parent and adjunct professor in Malden said she did not know AI was used to grade MCAS essays and described the situation as “astonishing and concerning.”
In that same report, she expressed worry that the technology could show bias against students learning English as a second language or penalize creative answers that do not fit a certain rubric, a concern other educators echoed.
A Medford resident wrote in a letter to the editor that he opposes using AI to grade MCAS essays and said he learned from an NBC Boston investigation that AI grading had incorrectly scored at least 1,400 essays.
In his letter, he argued that AI systems do not truly understand writing and said the state only looks at 10% of student work after scoring, calling the approach disrespectful to students.
He also wrote that, in his view, if essays can no longer be graded by people, the state should consider removing them from the test and reassessing MCAS, and he urged Massachusetts residents to contact elected officials and school districts.
A Lowell Public Schools administrator described how one essay, initially scored zero out of seven, was later reassessed by the teacher at six out of seven, and noted that while AI can be useful, the current balance between AI scoring and human review may not be ideal for high-stakes assessments.
