The Guyot algorithm

Digitized Kaplan–Meier coordinates are not individual patient data. The Guyot algorithm bridges that gap by combining the extracted curve with the publication’s at-risk table to reconstruct approximate patient-level event and censoring times, suitable as input to standard survival model fitting.

From digitized coordinates to analysis-ready data

A digitized KM curve gives a set of (time, survival) coordinates. On its own, that is a re-tracing of the published figure: it preserves the step pattern but contains no information about how many patients were at risk, how many events occurred in each interval, or how censoring was distributed over follow-up. Survival model fitting needs that information.

Reconstruction closes the gap by combining the digitized coordinates with summary statistics that are usually published alongside the curve, most importantly the numbers at risk at reported time points, and where available the total number of events.

What the algorithm does

The Guyot algorithm (Guyot et al., 2012) takes the digitized KM coordinates together with the published at-risk table and estimates the underlying event and censoring pattern over time. The output is an approximate individual patient dataset (IPD), with one row per inferred patient and columns for time and event indicator, that can be analysed with standard survival methods such as parametric model fitting, Cox regression, or restricted mean survival time (RMST) estimation.
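
The core of this inference can be illustrated with a single Kaplan–Meier step: if n patients are at risk and the curve drops from S_before to S_after, the implied number of events is roughly n * (1 - S_after / S_before). The Python sketch below is not the published algorithm. It assumes no censoring before the last digitized time, whereas the real method iterates on an estimate of censoring within each at-risk interval. The function names events_from_drop and reconstruct_no_censoring and the example coordinates are hypothetical.

```python
def events_from_drop(n_at_risk, s_before, s_after):
    """Invert one KM step, S_after = S_before * (1 - d / n), for d."""
    if s_before <= 0.0:
        return 0
    d = round(n_at_risk * (1.0 - s_after / s_before))
    return max(0, min(n_at_risk, d))  # keep d within [0, n_at_risk]


def reconstruct_no_censoring(n_start, coords):
    """Turn digitized (time, survival) steps into (time, event) rows.

    Simplifying assumption: nobody is censored before the last
    digitized time, so the at-risk count only falls through events.
    """
    rows = []
    n = n_start
    for (_, s_prev), (t, s) in zip(coords, coords[1:]):
        d = events_from_drop(n, s_prev, s)
        rows.extend([(t, 1)] * d)  # d inferred events at time t
        n -= d
    last_time = coords[-1][0]
    rows.extend([(last_time, 0)] * n)  # survivors censored at end of curve
    return rows


# Made-up example: 10 patients, curve digitized at four time points.
coords = [(0, 1.0), (1, 0.9), (2, 0.8), (3, 0.8)]
ipd = reconstruct_no_censoring(10, coords)
print(len(ipd), sum(event for _, event in ipd))
```

Each row of ipd is one inferred patient, matching the time and event-indicator layout described above.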

The reconstruction is approximate by construction: it recovers a dataset that is consistent with the published curve and at-risk table, not the original trial data. Where the publication reports total events, the algorithm can be constrained to match that total, reducing the uncertainty in inferred censoring.

Implementation in EasyHTA

EasyHTA performs reconstruction with the IPDfromKM R package (Liu et al., 2021), which extends the original Guyot approach via a modified iterative Kaplan–Meier estimation step. After reconstruction the project view overlays the reconstructed KM curve on the digitized input so the fidelity of the reconstruction can be visually checked before moving on to model fitting.
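
A visual check of this kind can be complemented by a numeric one. As a minimal sketch (this is not EasyHTA or IPDfromKM code, and the helper names km_curve, survival_at and max_abs_gap are hypothetical), the Kaplan–Meier estimate can be recomputed from reconstructed (time, event) rows and compared with the digitized coordinates:

```python
def km_curve(rows):
    """Kaplan-Meier estimate from (time, event) rows.

    Returns [(time, survival)] starting at (0.0, 1.0), with one
    entry per distinct event time.
    """
    rows = sorted(rows)
    n = len(rows)  # number currently at risk
    s = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(rows):
        t = rows[i][0]
        d = c = 0  # events and censorings at time t
        while i < len(rows) and rows[i][0] == t:
            d += rows[i][1]
            c += 1 - rows[i][1]
            i += 1
        if d:
            s *= 1.0 - d / n
            curve.append((t, s))
        n -= d + c
    return curve


def survival_at(curve, t):
    """Value of the step function `curve` at time t."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


def max_abs_gap(rows, digitized):
    """Largest vertical distance between reconstructed KM and digitized points."""
    curve = km_curve(rows)
    return max(abs(survival_at(curve, t) - s) for t, s in digitized)


# Made-up reconstructed rows and digitized points for illustration.
rows = [(1, 1), (2, 1)] + [(3, 0)] * 8
digitized = [(0, 1.0), (1, 0.9), (2, 0.8), (3, 0.8)]
print(max_abs_gap(rows, digitized))
```

A gap close to zero indicates that the two curves coincide at the digitized points. What counts as acceptably small is a judgment call and is not specified here.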

Time step matters. The reconstruction takes the at-risk numbers at the time points reported in the publication. Mis-specifying the interval between them (entering yearly counts as if they were quarterly, for example) compresses or stretches the at-risk timeline relative to the curve, and the reconstructed curve will drift from the digitized one. Use the overlay to confirm that the two curves track each other closely.
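
To make the effect of a wrong interval concrete, the sketch below places the same at-risk counts on a time axis under two assumed intervals and applies a rough plausibility rule: the at-risk timeline should cover a reasonable share of the digitized follow-up without running past it. The rule, the 0.5 threshold, the function names and the counts are all illustrative assumptions, not EasyHTA behaviour.

```python
def at_risk_times(counts, interval):
    """Time points implied by reading `counts` at a fixed interval."""
    return [i * interval for i in range(len(counts))]


def interval_looks_plausible(counts, interval, last_curve_time, min_coverage=0.5):
    """Heuristic: the at-risk table should span a good share of the curve."""
    span = at_risk_times(counts, interval)[-1]
    return min_coverage * last_curve_time <= span <= last_curve_time


counts = [120, 95, 70, 41, 18]  # made-up yearly at-risk numbers
last_curve_time = 50            # last digitized time, in months

# Correct entry: yearly counts, so one count every 12 months.
print(interval_looks_plausible(counts, 12, last_curve_time))
# Mis-specified entry: the same counts treated as quarterly (every 3 months).
print(interval_looks_plausible(counts, 3, last_curve_time))
```

With the quarterly interval the five counts cover only the first 12 months of a 50-month curve, which is the kind of mismatch that shows up as drift in the overlay.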

What this enables downstream

Once reconstructed IPD is available, it is treated as the input to the standard survival analysis pipeline: parametric model fitting, AIC/BIC comparison, visual inspection of fitted curves against the KM, and extrapolation. See Parametric distributions for the distributions supported and the selection criteria, and Quickstart for the end-to-end workflow.
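
As a minimal illustration of the fitting and comparison step, the sketch below fits the simplest parametric model, the exponential, to (time, event) rows using its closed-form maximum likelihood estimate and computes AIC and BIC. It is not EasyHTA's fitting code. The function name fit_exponential and the data are hypothetical, and BIC is computed here with the number of rows as the sample size, which is one of several conventions.

```python
import math


def fit_exponential(rows):
    """Fit an exponential survival model to right-censored (time, event) rows.

    The MLE of the hazard rate is events / total follow-up time, and the
    log-likelihood is events * log(rate) - rate * total follow-up time.
    """
    events = sum(event for _, event in rows)
    exposure = sum(time for time, _ in rows)
    if events == 0 or exposure <= 0:
        raise ValueError("need at least one event and positive follow-up")
    rate = events / exposure
    loglik = events * math.log(rate) - rate * exposure
    k = 1  # one free parameter: the rate
    return {
        "rate": rate,
        "loglik": loglik,
        "aic": 2 * k - 2 * loglik,
        "bic": k * math.log(len(rows)) - 2 * loglik,
    }


# Made-up reconstructed rows: 2 events, 8 patients censored at time 3.
rows = [(1, 1), (2, 1)] + [(3, 0)] * 8
fit = fit_exponential(rows)
print(fit["rate"], fit["aic"], fit["bic"])
```

Other candidate distributions would be fitted to the same rows and compared on the same criteria, with lower AIC or BIC indicating a better trade-off between fit and complexity.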

References

Guyot P, Ades AE, Ouwens MJNM, Welton NJ (2012). Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan–Meier survival curves. BMC Medical Research Methodology, 12, 9. doi:10.1186/1471-2288-12-9

Liu N, Zhou Y, Lee JJ (2021). IPDfromKM: reconstruct individual patient data from published Kaplan–Meier survival curves. BMC Medical Research Methodology, 21, 111. doi:10.1186/s12874-021-01308-8
