Findings from the TIMSS 2019 Problem Solving and Inquiry Tasks

Ina V.S. Mullis, Michael O. Martin, Bethany Fishbein, Pierre Foy, and Sebastian Moncaleano


The TIMSS 2019 Problem Solving and Inquiry (PSI) tasks were developed to gain insights into how digitally based, interactive assessment items that capture students’ responses could be incorporated into TIMSS. The goal was not to define new problem solving and inquiry constructs, but to collect information that would help enhance and extend the breadth of the TIMSS assessment, providing more comprehensive coverage of problem solving and inquiry as already described in the assessment frameworks.

TIMSS 2019 Transition to Digital Assessment

In 2019, TIMSS transitioned to digital mathematics and science assessments at both fourth and eighth grades. Half of the nearly 70 countries participating in TIMSS 2019 administered the new digitally based eTIMSS assessment, contributing to its development through pilot studies and field testing, while the other half continued with paperTIMSS. Because the computerization of items was introduced carefully, making some obvious improvements (e.g., clicking a response to a multiple-choice item rather than filling in a circle) while taking great care to mirror paperTIMSS, most of the items in eTIMSS and paperTIMSS had similar psychometric properties. As documented in Methods and Procedures: TIMSS 2019 Technical Report,1 a step-by-step scaling process enabled linking eTIMSS and paperTIMSS to the TIMSS mathematics and science achievement scales. The publication of TIMSS 2019 International Results in Mathematics and Science,2 reporting the results for all participating countries on the TIMSS 2019 mathematics and science achievement scales, signaled that the transition was complete.

As an important feature of the transition, eTIMSS created the opportunity to develop innovative assessment measures that would enhance coverage of problem solving and inquiry processes. It was evident that a computer-based TIMSS had the potential for improving the quality of the TIMSS measures of higher-order skills (e.g., more depth of concepts, dynamic features, and process data), while at the same time making the data collection of complex tasks feasible. Assessing the TIMSS 2019 frameworks with engaging, computerized assessment tasks benefitting from the most current research became an explicit TIMSS 2019 development goal. Beginning in 2017, the TIMSS & PIRLS International Study Center began developing the “TIMSS 2019 Problem Solving and Inquiry” tasks. Eventually, eight tasks were developed—two for mathematics and two for science each at fourth grade and eighth grade. The eight tasks were assembled into two special eBooklets per grade that were assessed together with eTIMSS in the eTIMSS countries according to a rotated design (see Appendix A). Thus, all the eTIMSS countries (but no paperTIMSS countries) participated in assessing the TIMSS 2019 Problem Solving and Inquiry tasks, including 30 countries and 6 benchmarking systems with about 22,000 students at the fourth grade, and 22 countries and 5 benchmarking systems with about 20,000 students at the eighth grade.

This report presents four of the Problem Solving and Inquiry tasks together with the achievement results across the countries, focusing on the strengths and weaknesses of the tasks themselves.

  • School Party (fourth grade mathematics): Students plan a party for their school (ticket sales, decorations, food, and drinks).
  • Farm Investigation (fourth grade science): A boy investigates which farm animal ate the plants in his garden.
  • Building (eighth grade mathematics): Students construct a storage shed with a rain barrel.
  • Pepper Plants (eighth grade science): Students conduct an experiment to determine the most effective fertilizer.

eTIMSS 2019 also made it possible to collect valuable process data about the ways students proceed through the assessment sessions, including extensive data on event timing, navigation from screen to screen, scrolling, and the use of calculators and rulers. These data make it possible to recreate a student’s progress through the tasks. Upon discovering considerable non-response to some of the PSI tasks, especially compared to the nearly negligible non-response for the “regular” eTIMSS items, the timing data were used to investigate the low completion rates, distinguishing between students who ran out of time and those who stopped responding before time was up. Before erroneously assuming that students needed additional assessment time for the PSI tasks, it was important to learn that “running out of time” was less common than “stopping” with plenty of time remaining (see Appendix B). Understanding students’ reasons for stopping requires further research, although some were probably tired or frustrated. Further highlighting its research potential, the TIMSS 2019 process data also were used to analyze incorrect responses and to learn more about how students dealt with the interactive features, helping to explain why performance was sometimes lower than expected.
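The kind of analysis described above can be illustrated with a minimal sketch. The classification rule, field names (last_event_min, items_unanswered), session length, and cutoff below are all hypothetical assumptions for illustration, not the actual eTIMSS process data schema or scoring rules:

```python
# Hypothetical sketch: classifying PSI non-response from screen-timing logs.
# All field names and thresholds are illustrative assumptions.

SESSION_LIMIT_MIN = 36   # assumed session length, not the actual eTIMSS timing
CUTOFF_MIN = 2           # "ran out of time" if last activity fell this close to the limit

def classify(record):
    """Label a student's session as completed, out of time, or stopped early."""
    if record["items_unanswered"] == 0:
        return "completed"
    if record["last_event_min"] >= SESSION_LIMIT_MIN - CUTOFF_MIN:
        return "ran_out_of_time"
    return "stopped_early"   # quit with plenty of time remaining

# Illustrative log records for three students
logs = [
    {"student_id": 1, "last_event_min": 35.5, "items_unanswered": 3},
    {"student_id": 2, "last_event_min": 20.0, "items_unanswered": 5},
    {"student_id": 3, "last_event_min": 28.0, "items_unanswered": 0},
]

# Tally how many students fall into each category
counts = {}
for rec in logs:
    label = classify(rec)
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {'ran_out_of_time': 1, 'stopped_early': 1, 'completed': 1}
```

A rule of this shape, applied to the real event-timing logs, is one way the “running out of time” versus “stopping early” distinction could be operationalized.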

Finally, as a byproduct of the PSI tasks, one item in Building asked students to show how they would cut the walls out of a board; these responses were used to study the feasibility of TIMSS using automated scoring in the future (see Appendix C). Looking back, the TIMSS 2019 decision to move forward and take advantage of technology and new psychometric research will likely be recognized as the start of a sea change in TIMSS assessment methods and procedures.

Brief History of TIMSS and Problem Solving and Inquiry Tasks

Innovative assessments of higher-order skills have been part of TIMSS since its inception. The inaugural TIMSS 1995 included what was at the time considered a “state-of-the-art” performance assessment that was given to fourth grade students in 10 countries and eighth grade students in 21 countries. As explained in the TIMSS 1995 report of the results, the performance assessment was based on integrated, practical tasks involving instruments and equipment as a means of assessing students’ content and procedural knowledge, as well as their ability to use that knowledge in reasoning and problem solving (see TIMSS 1995 Performance Assessment3). Performance assessment was considered particularly useful for assessing science as a process of inquiry (beyond just a body of knowledge). Of the 12 tasks given to the fourth and eighth grade students, 11 were similar across grades and one was unique. There were five mathematics tasks—Dice, Calculator, Folding and Cutting, Around the Bend, and Packaging; and five science tasks—Pulse, Magnets, Batteries, Rubber Band, and Containers (fourth grade) or Solutions (eighth grade). Considerable effort was expended in assessing a framework of “performance expectations” that included problem solving, designing an investigation, analyzing and interpreting findings, as well as formulating conclusions.

The performance assessment was administered in a “circus-ring” format where students visited three of five stations located around a room, each consisting of the assembled equipment for one or two tasks. The equipment for the tasks weighed about 100 lbs and needed to be set up in a large room. Thus, it was only feasible to give this labor- and resource-intensive assessment to subsamples of students who had participated in the main assessment.

When TIMSS 2003 established regularly administered assessments at the fourth and eighth grades every four years to monitor trends, the U.S. National Science Foundation (NSF) awarded Boston College a grant to support framework and assessment development. The idea was to develop extended problem solving and inquiry tasks, but using only paper-and-pencil instruments. Progress was made on developing content assessment goals tailored specifically to fourth or eighth grade, but the mathematicians, scientists, and measurement specialists involved struggled to make the paper-and-pencil tasks accessible and engaging for students. The performance assessment had been different and “fun”; for example, in a task about the effects of exercise on the body, students got to jump up and down to raise their heart rates. By contrast, students in the TIMSS 2003 participating countries faced an unfamiliar format: a test that allowed a long time to work through a series of items on a single topic (e.g., an ocean food chain, or why different colors of light can change the color of your shirt). In general, the early paper-and-pencil PSI tasks of 2003 were not very motivating, so these longer tasks were eventually phased out of subsequent assessments.

Nevertheless, it was widely agreed that problem solving and inquiry skills were fundamental to the TIMSS assessment frameworks. For TIMSS 2007, the U.S. National Center for Education Statistics (NCES) organized an initiative for countries to contribute funding for TIMSS to develop cognitive as well as content assessment goals. This resulted in three cognitive domains—knowing, applying, and reasoning—becoming a permanent dimension of the mathematics and science assessments at both fourth and eighth grades. Once again for TIMSS 2015, the TIMSS & PIRLS International Study Center at Boston College worked with the National Center for Education Statistics to obtain additional funding from NSF for innovative item development, especially since TIMSS 2015 also included assessing trends in TIMSS Advanced. However, this effort to secure funding was unsuccessful, so the reasoning skills associated with problem solving and inquiry remained in the assessment frameworks, with little further attention paid to developing longer assessment tasks. The problem solving and inquiry assessment goals shown below are excerpted from the TIMSS 2019 Assessment Frameworks.4

  • Mathematics Frameworks
    • Reasoning mathematically involves logical, systematic thinking. It includes intuitive and inductive reasoning based on patterns and regularities that can be used to arrive at solutions to problems set in…real life settings.
    • Determine efficient/appropriate operations, strategies, and tools for solving problems for which there are commonly used methods of solution.
    • Implement strategies and operations to solve problems involving familiar mathematical concepts and procedures.
    • Link different elements of knowledge, related representations, and procedures to solve problems.
  • Science Frameworks
    • Scientists engage in scientific inquiry by following key science practices that enable them to investigate the natural world and answer questions about it. Students of science must become proficient at these practices…
    • Use a diagram or other model to demonstrate knowledge of science concepts, to illustrate a process, cycle, relationship, or system, or to find solutions to science problems.
    • Provide or identify an explanation for an observation or a natural phenomenon using a science concept or principle.
    • Plan investigations or procedures appropriate for answering scientific questions or testing hypotheses; and describe or recognize the characteristics of well-designed investigations in terms of variables to be measured and controlled and cause-and-effect relationships.


The TIMSS 2019 Problem Solving and Inquiry (PSI) Tasks

Re-imagined for TIMSS 2019, PSI tasks are visually attractive, interactive scenarios that present students with adaptive and responsive ways to follow a series of steps (assessment items) toward a solution or goal. Students respond through a mixture of selection and constructed response items, as well as through various innovative response-capture formats (e.g., number pad, drag and drop, graphing tools, and free drawings).

There are many different ways of instantiating a PSI task. For example, a PSI task can be:

  • An interactive science experiment, where students set up and run the experiment, adjusting settings and observing the results (see Pepper Plants—Science Eighth Grade).
  • A mathematics problem, where students work from a visualization to a finished product involving multiple steps and evaluation of interim results (see Building—Mathematics Eighth Grade).
  • A mathematical or scientific model that can be manipulated by the students (e.g., predator-prey relationships, solutions, or forces and motion).
  • A systematic investigation of the attributes of an object, place, or living organism, implementing a process, or considering cause and effect relationships embedded in a scenario that is compelling and targets topics in the framework (see School Party and Farm Investigation, Mathematics and Science, respectively, Fourth Grade).


  • Each PSI task should be situated in a real-world problem, investigation, or activity that provides an underlying narrative or theme for the items. The problem or situation must be sufficiently broad to encompass a number of content and cognitive areas in the Mathematics or Science Frameworks. As much as possible, PSI tasks should include items addressing various content topics and a range of cognitive demands.
  • The narrative should provide a logical or chronological progression from the first item to the ending.
  • Because PSI tasks with a single narrative from start to finish can be hard to achieve, PSI tasks also can be written that do not have much narrative, provided there is a common theme to link the items together. The thematic type of PSI task gives students an opportunity to interact with various aspects of a scenario without the order of the interactions having an impact. The items can be independent, while still being coherent and engaging.

In any PSI task, it is important that the items are independent of each other. Whether or not a student gets one item correct should not affect whether the student gets another item correct. That is, in general, an answer to an item should not give students a clue that could lead them to go back and change their answer to a previous item. Nor should an item be based on a correct answer to the previous item, because not all students will have provided the correct answer; the various incorrect answers can affect the difficulty of the second item or even make it impossible to answer. On the other hand, if designed properly, process data can be used to research “looking back” behaviors as part of students’ test-taking strategies.

Developing the Problem Solving and Inquiry Tasks for eTIMSS 2019

TIMSS 2019 PSI task development at fourth and eighth grades adhered to standard TIMSS procedures for ensuring valid measures of the mathematics and science achievement described in the TIMSS 2019 Assessment Frameworks.5 However, developing new and engaging problem contexts with cohesive sets of achievement items necessitated many more rounds of expert review than usual, so staff at the TIMSS & PIRLS International Study Center collaborated with members of the TIMSS 2019 Science and Mathematics Item Review Committee (SMIRC) in August 2015 to begin developing the PSI tasks. This was nearly two years before item writing began for the rest of the TIMSS 2019 field test items (April 2017), and involved five additional in-person meetings at Boston College and numerous online reviews.

Cognitive laboratories involving 34 students in the United States (August 2015) provided critical information about the usability of the eTIMSS interface and various innovative item types. SMIRC as a whole focused its first in-depth review of the PSI tasks on the alignment between the tasks and the frameworks, the extent to which the technology in the tasks supported the intended response processes, and the cross-cultural appropriateness of the problem scenarios. Small pilot tests in several eTIMSS countries provided key information at different points in the development process.

The eTIMSS prePilot, which included a total of 12 PSI tasks, was conducted in September 2016 in three English-speaking countries with experience in conducting digital assessments: Australia, Canada, and Singapore. Each country included students with a range of mathematics and science ability in the prePilot, yielding approximately 100 responses per item at both the fourth and eighth grades. The prePilot provided further information about the usability of newly developed item types and students’ success in using the eTIMSS interface, as well as estimates of the amount of time it took students to complete each task and the task’s approximate difficulty.

National Research Coordinators (NRCs) reviewed the PSI tasks at their third TIMSS 2019 NRC meeting, held prior to the field test (March 2017), and then reviewed them again after the field test (August 2018) to select the tasks to be included in the eTIMSS 2019 assessment. The NRCs selected eight PSI tasks (four at fourth grade with 50 items and four at eighth grade with 55 items) for the main data collection. The eight tasks covered a range of mathematics and science content domain topics, and consistent with the goal of the PSI tasks to assess higher-order skills, the majority of the items in the PSIs involved applying and reasoning.

Appendix A provides an overview of the parallel assessment designs for paperTIMSS 2019 and for eTIMSS 2019. The eTIMSS design also specifies the rotated arrangement of the eight PSI tasks—two for mathematics and two for science at each grade. Both fourth and eighth grades included two separate booklets of PSI items.

Including the PSI items in the TIMSS 2019 Mathematics and Science Achievement Scales at Fourth and Eighth Grades

Exhibits 1 through 4 compare TIMSS 2019 achievement estimated with and without the PSI data for the eTIMSS countries (one exhibit each for mathematics at fourth grade, science at fourth grade, mathematics at eighth grade, and science at eighth grade, respectively). The first column in each exhibit is a reproduction of the average achievement results published in TIMSS 2019 International Results in Mathematics and Science6 for countries that administered the digital version of TIMSS (eTIMSS). The second column presents the average achievement results for eTIMSS including the TIMSS 2019 PSIs (for details of the scaling procedures, see Chapter 17 in Methods and Procedures: TIMSS 2019 Technical Report7). For each grade, there essentially was no difference (0 scale score points on average) between eTIMSS average achievement excluding the PSI students compared to average achievement including the PSI students for either mathematics or science.

[Exhibits 1–4: Grade 4 Mathematics, Grade 4 Science, Grade 8 Mathematics, Grade 8 Science]

Important Information for Future Development

It should be noted that the concept of an effective PSI task will continue to evolve because, following publication of this report at the end of October, IEA will release the process data for the TIMSS 2019 PSI tasks, enabling a series of further in-depth analyses. Basic criteria that were important in TIMSS 2019 remain; however, additional considerations have emerged:

  • The PSI task must address topics in the TIMSS mathematics or science frameworks.
  • PSI tasks can be full length at eighth grade (a block of 10 to 15 items) or “mini” (about 5 to 8 items). At fourth grade, there will be only mini-PSI tasks in TIMSS 2023.
    • The completion rates presented in Appendix A for the items in the eTIMSS assessment compared to the PSI tasks show that the fourth grade PSI tasks had comparatively low completion rates.
  • No PSI task or items should require excessive reading, perseverance, or specialized knowledge.
  • Typically, the first screen introduces the topic, and the following screens present the items (no ending screen).
  • Each PSI task should include a range of item difficulty. Typically a task should start with easier items and end with more difficult items.
  • PSI items should not be dependent on other items (unless it is for planned research purposes).
  • PSI items should take advantage of the digital environment, using interactive or adaptive features, but not gratuitously.
  • The mode of capturing the students’ responses should assist the students in displaying their mathematics or science understanding, not create a distraction.
  • PSI tasks and items should be designed to capitalize on the potential of process data.
  • PSI items must adhere to the TIMSS 2019 Item Writing Guidelines.8
  • A scoring guide needs to accompany each human scored PSI item. (Partial credit may be awarded if warranted. Process data may be used for this purpose.)



1  Foy, P., Fishbein, B., von Davier, M., & Yin, L. (2020). Implementing the TIMSS 2019 scaling methodology. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and Procedures: TIMSS 2019 Technical Report (pp. 12.1–12.146). Retrieved from Boston College, TIMSS & PIRLS International Study Center website:

2  Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 International Results in Mathematics and Science. Retrieved from Boston College, TIMSS & PIRLS International Study Center website:

3  Harmon, E., Smith, T. A., Martin, M. O., Kelly, D. L., Beaton, A. E., Mullis, I. V. S., Gonzalez, E. J., & Orpwood, G. (1997). Performance Assessment in IEA’s Third International Mathematics and Science Study (TIMSS). Retrieved from Boston College, TIMSS & PIRLS International Study Center website:

4  Mullis, I. V. S., & Martin, M. O. (Eds.). (2017). TIMSS 2019 Assessment Frameworks. Retrieved from Boston College, TIMSS & PIRLS International Study Center website:

5  Ibid.

6  Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 International Results in Mathematics and Science. Retrieved from Boston College, TIMSS & PIRLS International Study Center website:

7  Fishbein, B., & Foy, P. (2021). Scaling the TIMSS 2019 problem solving and inquiry data. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and Procedures: TIMSS 2019 Technical Report (pp. 17.1–17.51). Retrieved from Boston College, TIMSS & PIRLS International Study Center website:

8  Mullis, I. V. S., Martin, M. O., Cotter, K. E., & Centurino, V. A. S. (2020). TIMSS 2019 Item Writing Guidelines. Retrieved from Boston College, TIMSS & PIRLS International Study Center website: