@pavlinpolicar.bsky.social
One encouraging takeaway from the study is that, at least on this particular task, open-source LLMs prove just as capable as OpenAI's commercial GPT-4o. This means that universities could run their own LLM graders in-house without fear of compromising student privacy. 7/
January 29, 2025 at 7:16 PM
So, are we human TAs obsolete?

Well, not quite.

First, setting up good grading rubrics takes quite a bit of time and effort. Second, LLMs achieved an accuracy of 90%, which still leaves room for improvement. That said, newer models may well perform even better! 6/
In terms of feedback, students actually seem to slightly *prefer* feedback written by LLMs over human-written feedback. While there is some nuance to this result, the conclusion is clear: students are at least as happy with LLM-generated feedback as with TA-written feedback. 5/
In our setup, LLMs determined whether answers satisfied predefined grading criteria that we, the TAs, painstakingly prepared ahead of time. Here, LLMs achieve roughly 90% accuracy. Small LLMs work well on easier questions but are overly generous on harder, open-ended questions. 4/
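The setup above can be sketched in a few lines: each rubric criterion is posed to the model as a yes/no question about a student answer, the verdict is parsed, and accuracy is measured against the TA's labels. This is a minimal illustrative sketch, not the study's actual code; `build_prompt`, `parse_verdict`, and `accuracy` are hypothetical names, and the chat-completion call (GPT-4o, Llama 3, ...) is left as a plug-in function.

```python
def build_prompt(question: str, criterion: str, answer: str) -> str:
    # One rubric criterion per prompt, phrased as a yes/no question.
    return (
        f"Question: {question}\n"
        f"Grading criterion: {criterion}\n"
        f"Student answer: {answer}\n"
        "Does the answer satisfy the criterion? Reply YES or NO."
    )

def parse_verdict(text: str) -> bool:
    # Take the model's first token; anything other than YES counts as NO.
    stripped = text.strip()
    token = stripped.split()[0].upper().rstrip(".,!") if stripped else ""
    return token == "YES"

def accuracy(predicted: list[bool], ta_labels: list[bool]) -> float:
    # Fraction of criteria where the LLM verdict matches the TA label.
    return sum(p == t for p, t in zip(predicted, ta_labels)) / len(ta_labels)
```

In use, `parse_verdict` would wrap whatever completion function is available, e.g. `parse_verdict(complete(build_prompt(q, c, a)))`, and `accuracy` compares those verdicts against the TA's hand labels per criterion.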
We tested several models, including OpenAI's GPT-4o and several open-source Llama 3 models of varying sizes. So, can LLMs grade student assignments?

The short answer is "mostly yes".

There are two aspects to grading student answers: the grade and the feedback. 3/
We wanted to see whether LLMs could grade short text answers as well as (or better than) human TAs. Over the course of the semester, students answered 36 questions of varying difficulty, and their answers were randomly assigned to be graded by either a human TA or an LLM. 2/