Oslo Sports Trauma Research Center

How to handle missing data

Project status: Published
Project manager: Lena Kristin Bache-Mathiesen
Supervisor(s): Morten Wang Fagerland, Thor Einar Andersen
Coworker(s): Ben Clarsen


Background: In recent years, researchers have attempted to determine the effect of training load on sports injury risk. A sufficient sample size is necessary to estimate this effect with acceptable certainty. Missing data in training load measures reduce sample sizes and, in the worst-case scenario, introduce selection bias. So far, no study has provided a solution to missing data in training load measures.

Aim: Determine how missing data should be handled in training load and injury risk research.

Methods: First, we mapped the current practice of handling missing data in the training load and injury field with a systematic review. Second, we simulated a relationship between training load and injury risk in a Norwegian Premier League men’s football dataset (n = 39). Methods for imputing or deleting missing observations in training load were compared by their ability to uncover the simulated relationship.

Results: Only 37 (34%) of 108 studies reported whether training load had any missing observations. Multiple Imputation using Predicted Mean Matching was the best method of handling missing data across multiple scenarios.

Conclusion: Studies of training load and injury risk should report the extent of missing data, and how they are handled. Multiple Imputation with Predicted Mean Matching should be used when imputing athlete-reported training load and GPS variables.
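
Illustration: The recommended approach can be sketched roughly as follows. This is a minimal, simplified sketch and not the authors' implementation. It assumes a single hypothetical predictor (session duration in minutes) for a hypothetical athlete-reported load variable, and it approximates parameter uncertainty by bootstrapping the observed rows. The data, function names and parameter values (m, k, seed) are invented for illustration. The method is more widely known as predictive mean matching, and real analyses would normally rely on an established implementation such as the R package mice.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Guard against a degenerate bootstrap sample with no spread in x.
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=3, rng=None):
    """Return a copy of y in which None entries are filled by mean matching."""
    if rng is None:
        rng = random.Random()
    observed = [i for i, yi in enumerate(y) if yi is not None]
    # Assumption: a bootstrap of the observed rows stands in for drawing
    # regression coefficients from their posterior distribution.
    boot = [rng.choice(observed) for _ in observed]
    intercept, slope = fit_line([x[i] for i in boot], [y[i] for i in boot])
    predicted = [intercept + slope * xi for xi in x]

    completed = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            # Donors: the k observed rows whose predicted value is closest.
            donors = sorted(
                observed, key=lambda j: abs(predicted[j] - predicted[i])
            )[:k]
            # Impute a real observed value borrowed from a random donor.
            completed[i] = y[rng.choice(donors)]
    return completed


def multiple_impute(x, y, m=5, k=3, seed=0):
    """Create m completed copies of y, each from an independent PMM run."""
    rng = random.Random(seed)
    return [pmm_impute(x, y, k, rng) for _ in range(m)]


# Hypothetical data: session duration (minutes) and reported load (None = missing).
duration = [30, 45, 60, 60, 75, 90, 90, 105, 120, 120]
load = [150, 200, None, 310, 380, None, 470, 500, 610, None]

imputations = multiple_impute(duration, load, m=5, k=3, seed=1)
for completed in imputations:
    print(completed)
```

Because each imputed value is borrowed from an observed donor, the completed variable never takes values outside the observed range. In a full analysis, the injury model would be fitted to each of the m completed datasets and the estimates pooled afterwards.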