Given that the VALUE rubrics were not designed as standardized instruments, how confident can we be that they measure what is intended (validity) and do so consistently (reliability)?

First, the process by which the VALUE rubrics were developed should give educators confidence in their validity. The rubrics were developed by teams composed of faculty members, academic and student affairs professionals, and other experts from public and private, two-year and four-year higher education institutions across the United States. These teams included national experts who were able to confirm that the rubrics covered the key aspects of the learning outcomes under consideration, which provides a high degree of confidence in the rubrics' content validity. Since then, 3,000 campuses have adopted the rubrics, which suggests they also hold a high degree of face validity.

The VALUE rubrics have also been tested for reliability. In the publication "We Have a Rubric for That: The VALUE Approach to Assessment," McConnell et al. (2019) make the case for the validity and reliability of the VALUE rubrics. Using 2015-2016 data in which artifacts were double-scored by trained scorers, they found moderate-to-strong inter-rater reliability for the critical thinking, written communication, and quantitative literacy rubrics, as measured by percent agreement and by the Brennan-Prediger and Gwet's AC coefficients. Similarly, in the fall of 2012, AAC&U conducted a national inter-rater reliability study examining three VALUE rubrics: critical thinking, integrative learning, and civic engagement. Forty-four faculty members participated, each scoring three samples of student work with each of the three rubrics. Even without calibration, the faculty scorers were in exact agreement one-third of the time, and in approximate agreement 80% of the time. Previously published case studies on the VALUE rubrics have also reported favorable reliability findings. For example, following calibration training, Carroll Community College, DePaul University, Midland College, and Texas A&M University all reported high inter-rater reliability among faculty scorers.
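
To make these agreement statistics concrete, the sketch below computes exact and adjacent percent agreement, the Brennan-Prediger coefficient, and Gwet's AC1 (one of the AC family of coefficients mentioned above) for a small set of double-scored artifacts. All scores are invented for illustration, the 0-4 scale simply mirrors the VALUE performance levels, and treating "approximate agreement" as scores within one level is an assumption, not a detail taken from the studies above.

```python
# Minimal sketch: agreement statistics for double-scored artifacts.
# The scores below are hypothetical, not data from the VALUE studies.
from collections import Counter

rater_a = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2]   # hypothetical scores, rater A
rater_b = [3, 2, 3, 1, 4, 2, 1, 4, 3, 3]   # hypothetical scores, rater B
levels = [0, 1, 2, 3, 4]                   # Q = 5 rubric performance levels

n = len(rater_a)
q = len(levels)

# Observed (exact) agreement: proportion of artifacts with identical scores.
p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Adjacent ("approximate") agreement: scores within one performance level.
p_adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n

# Brennan-Prediger: chance agreement assumes all Q levels are equally likely.
bp = (p_a - 1 / q) / (1 - 1 / q)

# Gwet's AC1: chance agreement based on the raters' average propensity
# to use each performance level.
counts = Counter(rater_a) + Counter(rater_b)
pi = {lvl: counts[lvl] / (2 * n) for lvl in levels}
p_e = sum(pi[lvl] * (1 - pi[lvl]) for lvl in levels) / (q - 1)
ac1 = (p_a - p_e) / (1 - p_e)

print(f"Exact agreement:    {p_a:.2f}")
print(f"Adjacent agreement: {p_adjacent:.2f}")
print(f"Brennan-Prediger:   {bp:.2f}")
print(f"Gwet's AC1:         {ac1:.2f}")
```

The Brennan-Prediger and AC1 coefficients differ only in how they estimate chance agreement, which is why both are typically reported alongside raw percent agreement.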

Another study examined rater biases in data collected during the 2018-19 academic year, in which 6,610 student artifacts were scored by 221 raters using the critical thinking and written communication rubrics (Shapovalov, 2021). In general, these data suggested that most certified VALUE scorers did not demonstrate common rater effects when scoring student artifacts. Raters tended to use the full scoring range of each rubric across all dimensions and artifacts, though a small number stood out for the leniency or severity of their scoring. Overall, only 17 scorers (8%) were flagged for demonstrating any of the three common rater effects examined. In response, AAC&U revised its scorer calibration training to minimize such rater effects in the future.
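
Shapovalov (2021) identified leniency and severity effects with the many-facets Rasch model, which disentangles rater severity from artifact quality. As a much simpler illustration of the idea, the sketch below flags raters whose average score drifts well away from the group average; the rater IDs, scores, and one-standard-deviation threshold are all hypothetical, and this crude check does not adjust for differences in the artifacts each rater happened to score.

```python
# Simplified illustration of flagging lenient/severe raters.
# This is NOT the many-facets Rasch analysis used in Shapovalov (2021);
# it only conveys the idea. All names and scores are hypothetical.
from statistics import mean, stdev

# Hypothetical rubric scores (0-4) grouped by rater id.
scores_by_rater = {
    "rater_01": [3, 3, 4, 2, 3, 4, 3],
    "rater_02": [1, 0, 1, 2, 1, 0, 1],   # unusually severe
    "rater_03": [2, 3, 2, 3, 2, 3, 2],
    "rater_04": [4, 4, 3, 4, 4, 4, 4],   # unusually lenient
}

rater_means = {r: mean(s) for r, s in scores_by_rater.items()}
overall_mean = mean(rater_means.values())
spread = stdev(rater_means.values())

# Flag raters whose average score sits more than one standard deviation
# from the group average (a loose screening rule, not a formal test).
for rater, m in rater_means.items():
    z = (m - overall_mean) / spread
    if abs(z) > 1.0:
        label = "lenient" if z > 0 else "severe"
        print(f"{rater}: mean={m:.2f}, z={z:+.2f} -> flag as {label}")
```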

McConnell et al. (2019) also reviewed research showing that VALUE rubric dimension scores correlate significantly with other measures of student knowledge and skills, such as course grades, Collegiate Learning Assessment scores, and scores from other rubrics.
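
As a hedged sketch of what such a convergent-validity check might look like, the example below correlates rubric dimension scores with course grades for the same students. The paired values are invented; real analyses would draw on actual institutional records.

```python
# Hypothetical convergent-validity check: Pearson correlation between
# rubric dimension scores and course grades. Numbers are invented.
from statistics import correlation  # Pearson's r; requires Python 3.10+

rubric_scores = [2, 3, 1, 4, 3, 2, 4, 1, 3, 2]                        # 0-4 rubric levels
course_grades = [2.7, 3.3, 2.0, 3.9, 3.4, 2.5, 3.7, 2.3, 3.1, 2.8]    # GPA-style grades

r = correlation(rubric_scores, course_grades)
print(f"Pearson r between rubric scores and course grades: {r:.2f}")
```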

References

Finley, A. (2012). How reliable are the VALUE rubrics? Peer Review, 13/14(4/1), 31-33.

McConnell, K. D., Horan, E. M., Zimmerman, B., & Rhodes, T. L. (2019). We have a rubric for that: The VALUE approach to assessment. American Association of Colleges and Universities.

Shapovalov, Y. (2021). Identifying rater effects for writing and critical thinking: Applying the many-facets Rasch model to the VALUE Institute [Master's thesis, James Madison University]. JMU Scholarly Commons.

Discussions

Have you ever scored student work with rubrics as part of a collaborative process? If so, what were some of the challenges you encountered?

Please share your thoughts and questions in the comments section below.