Ardern and Henry, Does digital make a difference?
Abstract
Computer use in schools has grown rapidly in recent decades. Within the educational community, interest in authentic assessment has also increased. To enhance the authenticity of tests of writing, as well as of other knowledge and skills, some assessments require students to respond in written form via paper-and-pencil. However, as increasing numbers of students grow accustomed to writing using digital devices, these assessments may yield underestimates of students' writing abilities. This article presents the findings of a small study examining the effect that mode of administration – computer versus paper-and-pencil – has on middle school students' performance on multiple-choice and written test questions. Findings show that, though multiple-choice results do not differ much by mode of administration, for students accustomed to writing on computer, scores on responses written on computer are substantially higher than on responses written by hand (effect size of 0.9 and relative success rates of 67% versus 30%). Implications are discussed in terms of both future research and test validity.
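For reference, the effect size of 0.9 reported above is the kind of figure typically computed as Cohen's d: the difference between the two group means divided by a pooled standard deviation. The minimal sketch below shows that calculation; the means and standard deviations are placeholders rather than values reported in the article, and only the 50/70 group sizes come from the study design.

```python
import math

def cohens_d(mean_exp, mean_ctrl, sd_exp, sd_ctrl, n_exp, n_ctrl):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    pooled_var = ((n_exp - 1) * sd_exp**2 + (n_ctrl - 1) * sd_ctrl**2) / (n_exp + n_ctrl - 2)
    return (mean_exp - mean_ctrl) / math.sqrt(pooled_var)

# Placeholder means and SDs (not values from the article); group sizes match the 50/70 split.
print(round(cohens_d(mean_exp=2.8, mean_ctrl=2.3, sd_exp=0.55, sd_ctrl=0.55,
                     n_exp=50, n_ctrl=70), 2))
```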
Study Design
To study the effect that the medium of administration – taking assessments on computer versus by hand on paper – has on student performance, two groups of students were formed by random selection from the ALL School Advanced Cluster (grades 6, 7 and 8). For the experimental group, which performed two of the three kinds of assessments on computer, 50 students were selected by random assignment. The control group, which performed all tests via paper-and-pencil, was composed of the 70 students required for the time-trend study described above. The three kinds of assessments performed by both groups were:
- An open-ended (OE) assessment comprising 14 items, which included two writing items, five science items, five math items and two reading items.
- A test composed of NAEP items, which was divided into three sections and included 15 language arts items, 23 science items and 18 math items. The majority of NAEP items were multiple-choice. However, 2 language arts items, 3 science items and 1 math item were open-ended and required students to write a brief response to each item's prompt.
- A performance writing assessment which required an extended written response.
Both groups performed the open-ended (OE) assessment in exactly the same manner, by hand via paper-and-pencil. The experimental group performed the NAEP and writing assessments on computer, whereas the control group performed both in the traditional manner, by hand on paper.
The performance writing assessment consisted of a picture of a mural and two questions. Students formed small groups of 2 or 3 to discuss the mural. After 5 to 10 minutes, students returned to their seats and responded to one of two prompts.
Analysis:
The study is a true experimental design, in that random assignment and random selection were employed to create the sample groups. An independent variable (mode of test administration) was manipulated, and several extraneous variables were controlled in order to strengthen the validity of the comparison between modes.
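To make the sampling step concrete, here is a minimal sketch of how a random draw of 50 students from a class roster might be carried out. The roster size and student identifiers are assumptions for illustration only; note that in the study the control group was the pre-existing set of 70 time-trend students rather than a second random draw.

```python
import random

# Hypothetical roster of Advanced Cluster student IDs (the actual roster is not given in the article).
roster = [f"student_{i:03d}" for i in range(1, 201)]

random.seed(42)                                  # fixed seed so the draw can be reproduced
experimental_group = random.sample(roster, 50)   # 50 students drawn at random to take the
                                                 # NAEP and writing assessments on computer
print(experimental_group[:5])
```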
Impact of gender division in sampling on the study:
Within the control group, females performed only slightly better on PWAvg scores than did males (means of 2.33 and 2.27, respectively). However, within the experimental group females scored considerably better than males (means of 2.92 and 2.60). Thus it appears that the effect of computer administration may have been somewhat larger for females than for males.
Nonetheless, the males who took the extended writing task on computer still performed considerably better than the females who took the writing task on paper (respective means of 2.60 and 2.33). A two-way analysis of variance (PWAvg by gender and group) showed group, but not gender, to be significant (this was the case whether or not an interaction term was included). This general pattern was confirmed by regression analyses of PWAvg scores on OE scores, sex and group: though OE scores and the group variable were significant, the sex variable was not.
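The two-way ANOVA and the follow-up regression described above could be reproduced along the following lines. This is a sketch only: it uses the statsmodels formula interface, and the data frame, its column names (PWAvg, OE, sex, group) and the stand-in values are my own assumptions, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Stand-in data frame: one row per student. In the actual study, PWAvg, OE, sex and
# group would come from the scored assessments and the sampling records.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "PWAvg": rng.normal(2.5, 0.6, n),                     # performance writing average
    "OE": rng.normal(10, 2, n),                           # open-ended assessment score
    "sex": rng.choice(["F", "M"], n),
    "group": rng.choice(["experimental", "control"], n),  # computer vs paper
})

# Two-way ANOVA of PWAvg by sex and group, with an interaction term.
anova_model = smf.ols("PWAvg ~ C(sex) * C(group)", data=df).fit()
print(sm.stats.anova_lm(anova_model, typ=2))

# Regression of PWAvg on OE scores, sex and group.
reg_model = smf.ols("PWAvg ~ OE + C(sex) + C(group)", data=df).fit()
print(reg_model.summary())
```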
Analysis:
Based on Table 10, we can see that the two interacting variables are the gender attribute variable and the independent variable (mode of test administration). Computerized test administration yielded a larger benefit for females than for males, as seen in the mean score of 2.92 for females in the experimental group. The effect of mode of administration itself would be considered the primary or main effect.
Though females in both groups scored higher than males, the fact that males in the experimental group (mean of 2.60) outperformed females in the control group (mean of 2.33) assures us that gender is not a threat to internal validity: even the lower-scoring gender in the experimental group still performed considerably better than the higher-scoring gender in the control group. The comparison can be laid out directly from the four cell means reported above, as in the sketch below.
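The short sketch below simply tabulates the reported cell means (2.27/2.33 on paper and 2.60/2.92 on computer for males/females) and the female-male gap within each group; it is an illustration of the comparison being made, not a reanalysis of the study data.

```python
# PWAvg cell means as reported in the text.
means = {
    ("control", "male"): 2.27,
    ("control", "female"): 2.33,
    ("experimental", "male"): 2.60,
    ("experimental", "female"): 2.92,
}

for group in ("control", "experimental"):
    gap = means[(group, "female")] - means[(group, "male")]
    print(f"{group:>12}: female - male gap = {gap:+.2f}")

# The larger gap in the experimental group (+0.32 vs +0.06) is the apparent interaction;
# note also that experimental males (2.60) still outscore control females (2.33).
```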