News and Events Archives
- Monday, April 18, 2016 - 1:00pm
On behalf of the BEAR Center, please join us for a series of workshops presented by visiting faculty Peter Halpin of New York University. Professor Halpin's research focuses on psychometrics (confirmatory factor analysis, item response theory, latent class analysis), as well as statistical methods for complex and technology-enhanced educational assessments. His work has been published in methodological journals including Psychometrika, Structural Equation Modeling, and Multivariate Behavioral Research, as well as general interest journals including Educational Researcher.
The workshop schedule is below:
Monday, April 18
1 - 4pm
Item Design for Assessments Involving Collaboration
Peter F. Halpin (joint work with Yoav Bergner)
Overview: The use of collaboration and group work for assessment purposes has a relatively long history, and also features prominently in current initiatives concerning the measurement of "21st century skills." However, fundamental questions remain about how to design assessments involving collaboration. In this workshop I'll discuss item design, focusing on various strategies for deriving collaborative "two-player" items from conventional "one-player" items. Then I'll demo an XBlock for Open edX (currently in beta!) that allows small groups to collaborate using online chat while writing an assessment. The demo will involve workshop participants teaming up to write a short collaborative assessment, which will provide some of the data that we will analyze in later workshops.
Details: One central issue in the assessment of collaborative problem solving (CPS) is whether and how to measure student performance in a traditional content domain, such as math or science, in conjunction with CPS. For example, the PISA 2015 CPS assessment did not evaluate content domain knowledge, but conceptualized collaboration in a general problem solving context. On the other hand, recent reforms to educational standards have often called for the incorporation of collaboration and other "non-cognitive skills" within existing curricula. In this workshop, I'll address the problem of designing assessments that involve CPS but are anchored in a content domain, specifically mathematics.
To make things concrete, I'll consider the following question: How can a conventional "one-player" mathematics test question be adapted to a "two-player" collaborative context? In pragmatic terms, the goal is to arrive at a number of recipes for creating collaborative tasks from existing assessment materials. Clearly, this is not an ideal approach to designing group-worthy tasks. However, the reason for starting with existing assessment materials is to retain their strengths (e.g., established psychometric methods), while building towards more authentic and meaningful assessment contexts. The resulting tasks will necessarily represent a compromise between genuine group work and what we can measure well.
For example, one easy way to build a collaborative component into an existing mathematics assessment is just to change the instructions (e.g., "work with a partner") while retaining the assessment materials. A slightly more interesting task might incorporate elements of a jigsaw or hidden profile. The basic idea is that each student sees some incomplete portion of the item stimulus, and the students must share the information that they each possess to arrive at a solution. A third task type involves students collaborating to request the information that they want to use to answer a question (e.g., in the form of hints). When hints are devised well, this invites students to co-construct the solution path. A fourth type of item involves questions with vector-valued answers. For example, instead of asking students to determine whether a line with a given slope and intercept intersects a certain point, we can turn the problem on its head by providing the point and asking one student to provide the slope and another the intercept. This type of task requires students to collaborate while providing an answer, not simply while obtaining the information used to provide an answer.
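The vector-valued item type above can be made concrete with a small sketch. This is a minimal illustration of the scoring logic, not code from any actual assessment platform; the function name and tolerance parameter are hypothetical.

```python
# Illustrative check for a "vector-valued answer" item: one student supplies
# the slope, the partner supplies the intercept, and the dyad is correct only
# if the jointly specified line passes through the target point.

def line_passes_through(slope: float, intercept: float,
                        point: tuple, tol: float = 1e-9) -> bool:
    """Return True if y = slope * x + intercept passes through `point`."""
    x, y = point
    return abs(slope * x + intercept - y) <= tol

# Target point (2, 7): the dyad succeeds only if the two inputs are jointly
# consistent, e.g. slope 3 from one student and intercept 1 from the other.
print(line_passes_through(3.0, 1.0, (2.0, 7.0)))   # True: 3*2 + 1 == 7
print(line_passes_through(3.0, 2.0, (2.0, 7.0)))   # False: 3*2 + 2 == 8
```

Because neither student alone determines correctness, the task forces coordination at the moment of answering, which is the point of this item type.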
After reviewing the theory and practical implementation of each item type, I'll provide a demo of a web-based platform built on Open edX that allows for synchronous chat among small groups. Workshop participants will be invited to author their own items, and will be asked to team up to write a short collaborative assessment, which will provide some of the data that we will analyze in later workshops.
Tuesday, April 19
9am - noon
Modeling the Effects of Collaboration on Student Performance
Peter F. Halpin (joint work with Yoav Bergner)
Overview: This workshop addresses the analysis of "outcome data" in assessments that involve collaboration among students. I'll start with the following question: When pairs of individuals work together on a conventional educational assessment, how does their collective performance differ from what would be expected of the two individuals working independently? I'll review past work on the study of small groups, build on this to develop an IRT-based approach, and then consider extensions to assessments that involve non-conventional item types (see workshop: "Item Design for Assessments Involving Collaboration"). After the review, we will analyze our data from the previous workshop.
Details: Perhaps the simplest method for incorporating collaboration into an existing assessment is just to change the instructions while retaining the rest of the assessment materials. Concretely, two students could be presented with one copy of a math test and instructed that their performance will be evaluated based only on what they record on the test form. From the perspective of group-worthy tasks, this is a worst case scenario. From the perspective of psychometric modeling, this is the easiest case to deal with. So I'll start with this situation and consider the implications in a standard IRT framework.
First we need to define what it means for a dyad to get a test item correct. I'll cover various scoring rules and their precedents in the literature on small groups and teamwork. Then I'll provide some definitions of successful and unsuccessful collaborative outcomes, and also some specific models of successful collaboration. Next I'll translate these models into a standard IRT framework, which allows for a consideration of the implications of the different models for assessment design. Here we will be concerned with questions such as the following: In order to assess whether team A is collaborating according to model B, what types of questions should we ask them? Finally, I'll talk about how to test the various models using a likelihood ratio approach, estimate the size of the effect of collaboration on test performance, and review some empirical results.
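The general shape of this approach can be sketched in a few lines. This is a minimal illustration under a 2PL model, with made-up item parameters and simplified candidate dyad models (an "at least one member succeeds" independence baseline, and weaker-member and stronger-member models); it is not the actual model set from the workshop.

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Candidate models for the probability that a dyad answers an item correctly,
# given member abilities theta1 and theta2 (illustrative forms only):
def p_independence(theta1, theta2, a, b):
    """At least one member would have answered correctly working alone."""
    p1, p2 = p_2pl(theta1, a, b), p_2pl(theta2, a, b)
    return 1.0 - (1.0 - p1) * (1.0 - p2)

def p_minimum(theta1, theta2, a, b):
    """Dyad performs like its weaker member."""
    return p_2pl(min(theta1, theta2), a, b)

def p_maximum(theta1, theta2, a, b):
    """Dyad performs like its stronger member."""
    return p_2pl(max(theta1, theta2), a, b)

def log_likelihood(responses, items, theta1, theta2, dyad_model):
    """Bernoulli log-likelihood of 0/1 dyad responses under a dyad model."""
    ll = 0.0
    for x, (a, b) in zip(responses, items):
        p = dyad_model(theta1, theta2, a, b)
        ll += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return ll

# A likelihood ratio statistic comparing two accounts of one dyad's responses
# (hypothetical item parameters: (discrimination, difficulty)):
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
responses = [1, 1, 0]
ll_ind = log_likelihood(responses, items, 0.5, -0.5, p_independence)
ll_max = log_likelihood(responses, items, 0.5, -0.5, p_maximum)
lr = 2.0 * (ll_ind - ll_max)
```

Note that the independence baseline always predicts at least as high a success probability as the stronger-member model, which is what makes items of intermediate difficulty informative for distinguishing genuine collaborative gains from pooled individual ability.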
In the second part of the workshop, we will go over the finer details of the analyses using the data from our previous workshop on item design.
Wednesday, April 20
1 - 4pm
Measuring Student Engagement During Collaboration
Peter F. Halpin (joint work with Alina A. von Davier)
Overview: This workshop addresses the analysis of "process data" in assessments that involve collaboration among students. I'll start by talking about how to interpret process data from a psychometric perspective, then I'll give an overview of a particular modeling framework that I've been working on, based on the Hawkes process. After considering the statistical set-up, we'll do some data analyses with an R package I'm developing for the estimation of Hawkes processes. Many of the details are in the attached paper, which is currently under review at JEM.
Details: I'll begin by talking about alternatives to the assumption of local independence that can be useful for defining temporally complex tasks. Then I'll decompose the statistical dependence in a collaborative performance assessment into a) a part that depends on interactions between students (inter-individual dependence), and b) an additional part that depends only on the actions of individual students considered in isolation (intra-individual dependence). This provides a general set-up for modeling inter-individual dependence in performance assessments that involve collaboration.
Next I'll provide an overview of temporal point processes, and specifically the Hawkes process as a parametric modeling framework that captures these two sources of dependence. I'll provide a review of some basic results on specification, estimation, and goodness-of-fit for the Hawkes process, but I'll keep the focus on application of the model.
In the present application, the Hawkes process is useful for inferring whether the actions of one student are associated with an increased probability of further actions by their partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators. I'll present a model-based index that can be used to quantify the level of engagement exhibited by individual team members, and show how this can be aggregated to the team level. I'll also present some preliminary results on the standard error of the proposed engagement index, which allows one to consider how to design tasks so that engagement can be measured reliably. I'll also summarize some empirical results from pilot data.
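The two sources of dependence can be sketched with a bivariate Hawkes intensity using exponential kernels. This is a Python illustration (not the R package mentioned in the workshop), the parameter values are made up, and the engagement index shown, the share of a student's current intensity attributable to the partner's past actions, is an illustrative stand-in rather than the exact index from the paper.

```python
import math

def hawkes_intensity(t, own_events, partner_events,
                     mu=0.1, alpha_self=0.3, alpha_cross=0.5, beta=1.0):
    """Conditional intensity for one student in a bivariate Hawkes process:
    baseline rate mu, self-excitation from the student's own past actions
    (intra-individual dependence), and cross-excitation from the partner's
    past actions (inter-individual dependence). Exponential kernels decay
    at rate beta. Returns (total intensity, cross-excitation part)."""
    self_part = sum(alpha_self * math.exp(-beta * (t - s))
                    for s in own_events if s < t)
    cross_part = sum(alpha_cross * math.exp(-beta * (t - s))
                     for s in partner_events if s < t)
    return mu + self_part + cross_part, cross_part

def engagement_index(t, own_events, partner_events, **kw):
    """Share of current intensity attributable to the partner's actions:
    an illustrative, model-based notion of momentary engagement."""
    total, cross = hawkes_intensity(t, own_events, partner_events, **kw)
    return cross / total

# Student A's chat actions at t = 1, 2; partner B acted at t = 2.5 and 2.8.
# B's recent actions dominate A's intensity at t = 3, so the index is high.
idx = engagement_index(3.0, own_events=[1.0, 2.0], partner_events=[2.5, 2.8])
```

With no partner events the index is zero, matching the intuition that a student typing into a void, however actively, is not engaging with a collaborator.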
After all that, I'll introduce an R-package that I am working on, and we can go through the source code together, talk about issues in estimation, and run some analyses.
- Monday, September 15, 2014 - 12:00pm
The BEAR Center has concluded both pilot and field testing of the updated DRDP instruments! Through the state's management bulletin, the California Department of Education's Early Education & Support Division invited all EESD-funded programs to participate in the early implementation of the Desired Results Developmental Profile 2015 (DRDP(2015)). The suite of developmental observational assessments is valid and reliable for use with all children from early infancy to kindergarten entry. This year will be used to collect data for calibration and scaling of the new instruments. All agencies will be required to use the DRDP(2015) assessment beginning in the 2015–16 program year.
- Sunday, December 1, 2013 - 2:56am
- Wednesday, November 20, 2013 - 12:00pm
Professor Luca Mari will conduct four workshops between Wednesday, November 20 and Friday, November 22.
Luca Mari (M.Sc. in physics; Ph.D. in measurement science) is a professor at Cattaneo University – LIUC, Castellanza (VA), Italy. He teaches measurement science, statistical data analysis, and system theory. At the International Electrotechnical Commission (IEC), he is currently the chairman of the Terminology Technical Committee (TC1), the secretary of the Technical Committee on Quantities and Units (TC25), and an expert in Working Group 2 (VIM) of the Joint Committee for Guides in Metrology (JCGM). He has been the chairman of Technical Committee 7 (Measurement Science) of the International Measurement Confederation (IMEKO). He is the author or coauthor of several scientific papers published in international journals and conference proceedings. His research interests include measurement science and system theory.
Models of measurement: the general structure
Wednesday, November 20, 2013, 12:00 PM to 2:00 PM
Tolman Hall - Room 2515
Measurement is laden with stereotypes, rooted in its long history and diverse fields of adoption. The consequence is that even the basic terminology (e.g., quantity, scale, accuracy, calibration, ...) is often ambiguous, or at least context-dependent. The workshop introduces a background ontology of measurement, from which a basic epistemological characterization is proposed: measurement as a conceptual and experimental process implementing a property value assignment able to produce information on a predefined property with a specified and provable level of objectivity and intersubjectivity.
Models of measurement: measuring systems and metrological infrastructure
Thursday, November 21, 2013, 9:00 AM to 11:00 AM
Tolman Hall - Room 5634
Building upon the proposed epistemological characterization, the workshop focuses on the structural features of measuring systems, the front-ends of a metrological infrastructure: tools designed and operated so as to guarantee a required minimum level of objectivity and intersubjectivity for the conveyed information. This highlights the twofold nature of measurement as an information acquisition and representation process in which the role of models is unavoidable, even if possibly left implicit in the simplest cases.
An overview of measurement uncertainty: from the standpoint of the Guide to the Expression of Uncertainty in Measurement (GUM)
Thursday, November 21, 2013, 2:00 PM to 4:00 PM
Tolman Hall - Room 5634
The concept of measurement uncertainty offers some new connotations with respect to the traditional way the quality of measurement results has been represented, in an increasingly encompassing path from ontology (true value and error), to epistemology (degree of belief), to pragmatics (target measurement uncertainty). The workshop presents a conceptual framework in which measurement uncertainty is interpreted as an overall property, synthesizing both instrumental and definitional contributions.
Is the body of knowledge on measurement worthy of being a 'science', and what may be the scope of a measurement science?
Friday, November 22, 2013, 10:00 AM to 12:00 PM
Tolman Hall - Room 5634
Measurement is commonly considered a critical but merely instrumental process: the body of knowledge related to measurement appears to be a juxtaposition of multiple contributions, from physics (or chemistry, biology, psychology, economics, ...), to systems theory and control theory, signal theory and statistics, as well as information theory and computer science, philosophy of science and ontology. In time, political science and ethics might also take a progressive interest in measurement and its social implications. Is there a distinctive, common ground for a science of measurement in the diversity of these topics? The workshop aims to introduce the discussion and propose some reflections on the actual scientific status of this body of knowledge.
- Monday, October 7, 2013 - 1:00pm
Ronli Diakow (New York University) will conduct a workshop on Longitudinal Item Response Models. Abstract: Item response theory provides sophisticated machinery for analyzing assessment data, but the emphasis is usually on individual assessments. Modern methods of longitudinal data analysis provide flexible and powerful ways to model change over time, but these models often treat the measured variables of interest as known. In this workshop, we will discuss the analysis of data from assessments given at multiple occasions using models that lie at the intersection of these traditions. We will address issues from both traditions, such as the flexible treatment of time and measurement invariance. The primary statistical framework will be item response modeling. Connections will be made to ideas and equivalent models from the larger statistical framework of generalized latent variable modeling, in particular developments from hierarchical linear modeling and structural equation modeling. The first two hours will consist of lecture and discussion, while the final hour will focus on an empirical application of the models in ConQuest, Stata, and Mplus.
Room: 2515 Tolman
- Thursday, June 13, 2013 - 10:25pm
The Graduate School of Education (GSE) at the University of California, Berkeley, is seeking two full-time postdoctoral fellows to work with Sophia Rabe-Hesketh and Mark Wilson in the Quantitative Methods and Evaluation (QME) Program. The fellowships are funded by the Institute of Education Sciences (IES) of the U.S. Department of Education.
Fellows will advance their methodological expertise and conduct research in critical areas of educational practice. They can participate in a range of existing projects associated with the QME Program and the Berkeley Evaluation and Assessment Research (BEAR) Center, and they may also initiate their own research. We currently have several grants to develop and evaluate assessment systems, including assessments of the statistical reasoning and modeling abilities of middle school students, science knowledge among upper elementary school students, mathematics skills among special education students, and school-readiness skills among kindergarteners. We also have several methodological projects to develop new models and estimation methods for applications in education, such as the evaluation of educational interventions. The research conducted by the postdoctoral fellows can either have a relatively stronger substantive focus or be more methodological in nature, depending on the fellows' interests and backgrounds. Postdoctoral fellows will be able to take courses on measurement, evaluation, and statistics within the GSE and across the campus. During the fellowship period, we will invite experts in research methodology to hold workshops and interact directly with the postdoctoral fellows.
For details on the required qualifications and application process, please see the full announcement (PDF document).
- Friday, May 24, 2013 - 10:45am
Featuring a focus article by Edward Haertel and commentaries by Derek Briggs, Daniel Koretz, Robert Mislevy, Lorrie Shepard, Dylan Wiliam, Andrew Ho, George Engelhard & Stefanie Wind, Lyle Bachman, Mary Garner, Suzanne Lane, and Kadriye Ercikan.
From the publisher:
Testing is a ubiquitous tool for day-to-day decision making in schools, communicating learning goals and evaluating progress. Testing also brings unintended consequences. This topical, FREE* special issue of Measurement: Interdisciplinary Research and Perspectives explores the ramifications of testing in the classroom, with a view to maximizing the benefits and minimizing the possible drawbacks of current educational testing applications.
- Saturday, November 10, 2012 - 1:00pm
Revised instructions for generating the figures in Chapters 5, 6, and 7 of Constructing Measures are now available.
- Monday, October 1, 2012 - 10:03am
New research report: Estimating the Revised SAT Score and its Marginal Predictive Validity by Maria Veronica Santelices and Mark Wilson
This paper explores the predictive validity of the Revised SAT (R-SAT) score as an alternative to the student's SAT score. Freedle proposed this score for students who may be harmed by the relationship between item difficulty and ethnic DIF observed in the tests they take in order to apply to college. The R-SAT score is defined as the score a minority student would have received if only the hardest questions from the test had been considered; it was computed using a formula score and an inverse regression approach. Predictive validity for short- and long-term academic outcomes is considered, as well as the potential effect on the overprediction and underprediction of grades among minorities. The predictive power of the R-SAT score was compared to the predictive capacity of the SAT score and to that of alternative Item Response Theory (IRT) ability estimates based on models that explicitly considered DIF and/or were based on the hardest test questions. We found no evidence of incremental validity in favor of the R-SAT score or the IRT ability estimates.
- Tuesday, April 27, 2010 - 10:09am
Responses to "Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning" by Santelices & Wilson
In reaction to the Santelices & Wilson (2010) article on the analysis of DIF effects used in the development of the SAT, the College Board has published a statement on its website criticizing the original article. Santelices & Wilson have made available their response to the College Board's criticisms, reiterating the conclusions of their original paper.
College Board Response: Read the College Board Response.
Response to the College Board by Santelices & Wilson: Read the Santelices & Wilson Response.
References:
Dorans, N. J. (2010). Misrepresentations in Unfair Treatment by Santelices and Wilson. Harvard Educational Review, 80(3), 404–413.
Santelices, M. V., & Wilson, M. (2010a). Responding to Claims of Misrepresentation. Harvard Educational Review, 80(3), 413–417.
Santelices, M. V., & Wilson, M. (2010b). Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning. Harvard Educational Review, 80(1), 106–134.