Robots May Grade Your Next Essay Exam

I have been noticing a heated debate wandering around the internet that has piqued my interest.  The debate is based on a recently released study and asks the question:  Can computers grade essays as accurately as human graders?  The report starts with a basic explanation of the study:

“This study compared the results from nine automated essay scoring engines on eight essay scoring prompts drawn from six states that annually administer high-stakes writing assessments.” (pg. 2)

The report explains the basic criteria each of the nine computer programs looks for, why each program was developed, who developed it, and other pertinent information.  For example, AutoScore was developed by the American Institutes for Research and is “designed to create a statistical proxy for prompt-specific rubrics” (pg. 10).  LightSIDE is a free and open-source package developed at Carnegie Mellon University, designed to let non-experts quickly apply text mining.  And CTB’s Bookette operates on about “90 text-features classified as structural-, syntactic-, semantic-, and mechanics-based” (pg. 12).
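To make the idea of “text features” a little more concrete, here is a minimal sketch of the kind of structural and mechanics-based features an engine like Bookette might count.  The feature names and the sample essay below are my own invention for illustration, not taken from the report:

```python
import re

def extract_features(essay: str) -> dict:
    """Toy structural/mechanics feature extractor (illustrative only)."""
    # Split into sentences on terminal punctuation, dropping empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Pull out word tokens (letters and apostrophes).
    words = re.findall(r"[A-Za-z']+", essay)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "comma_count": essay.count(","),
    }

sample = "Dogs are loyal companions. They protect homes, comfort owners, and ask little in return."
print(extract_features(sample))
```

A real engine would feed dozens of features like these into a statistical model trained against human-assigned scores; this sketch only shows the counting step.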

The eight essay prompts are also explained:

“Four of the essays were drawn from traditional writing genre (persuasive, expository, narrative) and four essays were “source-based”, that is, the questions asked in the prompt referred to a source document that students read as part of the assessment.”

The study then goes on to explain the results for each essay from each of the nine computer programs and from the human graders.  From these results, the authors were able to show that the programs are just as accurate as human graders while returning results much faster.

“The findings clearly show that in replicating the distributions of the eight essays, the scores of the automated essay scoring engines performed quite well. Most of the mean predictions were within 0.10 of the means of the resolved score.” (pg. 24)
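That “within 0.10 of the means” check is easy to sketch with made-up numbers.  The scores below are invented for illustration and are not data from the study:

```python
# Hypothetical resolved human scores vs. one engine's predictions
# on the same eight essays (invented numbers).
human = [4, 3, 5, 2, 4, 3, 4, 5]
engine = [4, 3, 4, 2, 4, 4, 4, 5]

human_mean = sum(human) / len(human)
engine_mean = sum(engine) / len(engine)

# The distribution check described in the report: is the engine's
# mean prediction within 0.10 of the resolved human mean?
print(abs(engine_mean - human_mean) <= 0.10)
```

Note that an engine can match the human mean while still disagreeing on individual essays, which is why the report looks at more than just the means.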

The report also explains some possible issues that the authors encountered.  The first is the inconsistency of scores between the computerized graders.  This is also an issue with human graders; however, the problem may be easier to correct with computers, making this issue important but possibly irrelevant.  The authors also stated that comparing computer-graded scores to human-graded scores may not be the best measure of how accurately the computers are grading.  Another possible issue was that the essays needed to be typed in by humans, leaving room for human error.

At this point in the post, I am sure you have already formed your own opinion on the matter.  I would encourage you to skim, if not read through, the full report and leave a comment here.  I’m curious to see what the Litmos blog readers think about this.

Perhaps one day we will be adding this technology to Litmos.