After Kaplan-Meier curve digitization: IPD reconstruction and survival extrapolation
After extracting survival data from a published Kaplan-Meier curve, teams still need to reconstruct usable patient-level data or other analysis-ready survival inputs, compare survival models, and generate extrapolation outputs that are practical for HEOR work.
This page explains that workflow, and where EasyHTA fits into it.
Why teams digitize Kaplan-Meier curves
In many published time-to-event studies, the Kaplan-Meier curve is available, but the underlying patient-level data is not. For HEOR and biostatistics teams, digitizing the curve is often the first step toward building inputs for survival extrapolation, evidence synthesis, or indirect treatment comparison.
Digitization turns a figure into usable coordinate data, but it does not complete the analysis. In many HEOR workflows, digitized curve coordinates are combined with published summary information to create approximate IPD or other analysis-ready survival inputs for downstream survival modeling. The gap between digitization and analysis-ready survival modeling is where many manual workflows become time-consuming.
How most teams handle this workflow today
Most HEOR and market access teams handle this workflow through R scripts, typically using packages like flexsurv for parametric model fitting, digitize for extracting curve coordinates, and IPDfromKM for data reconstruction. Some teams use survHE for a more structured interface. Others maintain internal templates or bespoke scripts that have been adapted over successive projects.
Where scripted workflows create friction
Reproducibility across analysts. When two analysts write independent scripts for the same task, subtle differences in implementation can produce materially different extrapolation results. Unless the team maintains a shared, version-controlled codebase, these discrepancies tend to surface late.
The comparison bottleneck. Fitting a single parametric model is straightforward. Generating a structured comparison across candidate distributions typically requires additional scripting effort. When a reviewer asks to add a new scenario, the turnaround depends on coding time, not analytical judgement.
The handoff to the economic model. The health economist building the cost-effectiveness model needs survival inputs in a specific format: typically fitted distribution parameters with their variance-covariance matrices, or extrapolated survival probabilities at defined time points. Exporting these cleanly from R fitted objects involves custom code, manual formatting, or both. Errors in this handoff are common and difficult to detect.
Reviewer access. Clinical experts and non-programming stakeholders who need to review the extrapolation choices often cannot engage meaningfully with an R script. This creates an additional reporting step where the analyst generates static outputs for review, then iterates based on feedback – sometimes several times per analysis.
How reconstructed IPD is created from digitized KM data
Depending on what information is available alongside the digitized coordinates, this step may involve combining the extracted curve data with numbers at risk, event totals, or other summary statistics from the publication.
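As an illustration of what "extracted curve data" usually needs before it can be combined with anything else, here is a minimal Python sketch of a cleaning pass over digitized points. The function name `clean_km_points` and the sample coordinates are hypothetical, and the rules applied (clamp to the plot range, sort by time, force the curve to be non-increasing) are common-sense assumptions rather than the behaviour of any specific package.

```python
def clean_km_points(points):
    """Return digitized (time, survival) pairs as a valid KM-shaped sequence.

    Assumptions (illustrative only): survival is on a 0-1 scale, points at or
    before time zero are replaced by the anchor S(0) = 1, and small upward
    wobbles from manual clicking are flattened.
    """
    # Clamp survival to [0, 1], drop points at or before time zero, sort by time.
    pts = sorted((t, min(max(s, 0.0), 1.0)) for t, s in points if t > 0)

    cleaned = [(0.0, 1.0)]  # a KM curve starts at S(0) = 1
    for t, s in pts:
        last_t, last_s = cleaned[-1]
        s = min(s, last_s)  # survival cannot increase over time
        if t == last_t:
            cleaned[-1] = (t, s)  # duplicated time: keep the lowest value
        else:
            cleaned.append((t, s))
    return cleaned


# Hypothetical digitized points with typical defects:
# a value above 1, an out-of-order pair, and a small upward wobble at t = 9.
raw = [(0.1, 1.02), (3.0, 0.91), (6.1, 0.80), (6.0, 0.82), (9.0, 0.83), (12.0, 0.64)]
cleaned = clean_km_points(raw)
for t, s in cleaned:
    print(f"t = {t:5.1f}  S(t) = {s:.2f}")
```

Whatever tool performs this step, the output worth checking is the same: times strictly increasing and survival never rising.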
The Guyot algorithm
The foundational reconstruction method is the Guyot algorithm. It uses digitized Kaplan-Meier coordinates together with published at-risk tables to estimate the event and censoring pattern over time, producing approximate individual patient data suitable for standard survival model fitting.
Some teams implement this through the original published R code. Others use the IPDfromKM R package, which extends the approach of Guyot et al. through a modified iterative KM estimation algorithm.
EasyHTA Survival Studio uses IPDfromKM in its backend – the same established R package used by many HEOR teams for IPD reconstruction, wrapped in a guided interface so teams do not need to write or maintain separate reconstruction scripts.
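To make the idea of "estimating the event and censoring pattern" concrete, the sketch below inverts the Kaplan-Meier relationship at the level of whole at-risk intervals. It is a deliberately coarse illustration in Python, not the Guyot algorithm and not what IPDfromKM does: the published methods work at the level of individual curve steps and iterate on the censoring counts. The function name `interval_events` and all numbers are hypothetical.

```python
def interval_events(n_risk, surv):
    """Estimate events and censoring between consecutive at-risk time points.

    n_risk: published numbers at risk at each reported time point.
    surv:   digitized survival probabilities at the same time points.

    Simplifying assumption: all patients at risk at the start of an interval
    stay at risk until events occur, so events ~= n_start * (1 - S_next / S_start).
    Ignoring within-interval censoring in this way tends to overstate events.
    """
    rows = []
    for i in range(len(n_risk) - 1):
        n_start, n_next = n_risk[i], n_risk[i + 1]
        drop_fraction = max(0.0, 1.0 - surv[i + 1] / surv[i])
        events = round(n_start * drop_fraction)
        # Events cannot exceed the total number leaving the risk set.
        events = min(events, n_start - n_next)
        censored = n_start - n_next - events
        rows.append({"interval": i, "events": events, "censored": censored})
    return rows


# Hypothetical at-risk table (months 0, 6, 12) and digitized survival values.
n_risk = [100, 80, 55]
surv = [1.00, 0.85, 0.64]
rows = interval_events(n_risk, surv)
for row in rows:
    print(row)
```

The accounting identity is the useful part: everyone who leaves the risk set between two reported time points is either an event or a censoring, and the curve's drop determines the split.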
References: Guyot, P., Ades, A. E., Ouwens, M. J., & Welton, N. J. (2012). Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Medical Research Methodology, 12, 9. https://doi.org/10.1186/1471-2288-12-9
Liu, N., Zhou, Y., & Lee, J. J. (2021). IPDfromKM: reconstruct individual patient data from published Kaplan-Meier survival curves. BMC Medical Research Methodology, 21(1), 111. https://doi.org/10.1186/s12874-021-01308-8
From reconstructed data to survival extrapolation
After reconstruction, the analytical focus shifts to fitting candidate parametric models and evaluating how they behave both within and beyond the observed data.
The standard approach, consistent with NICE Technical Support Document 14 and similar methodological guidance, is to fit a range of candidate distributions to the reconstructed data, assess statistical fit in the observed period, and then evaluate whether the long-term extrapolated survival is clinically and externally plausible.
The main considerations at this stage are: which candidate models provide an acceptable fit in the observed period (assessed through AIC, BIC, and visual inspection); how different models behave when extrapolated beyond the trial follow-up; and whether the implied long-term survival is clinically plausible.
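The statistical-fit part of that assessment can be sketched in a few lines. The Python example below fits an exponential and a Weibull model to a small, made-up right-censored dataset by maximum likelihood and reports AIC and BIC. It is a minimal sketch under stated assumptions: the data are hypothetical, only two of the candidate distributions are shown, the Weibull shape is found by simple bisection, and BIC uses the number of patients as the sample size (some tools use the number of events instead).

```python
import math


def fit_exponential(times, events):
    """Closed-form MLE for a constant hazard with right censoring."""
    d, total_time = sum(events), sum(times)
    rate = d / total_time
    loglik = d * math.log(rate) - rate * total_time
    return {"rate": rate}, loglik, 1


def fit_weibull(times, events, lo=0.05, hi=20.0, tol=1e-10):
    """Weibull MLE: solve the profile score equation for the shape by bisection."""
    d = sum(events)
    mean_log_event = sum(math.log(t) for t, e in zip(times, events) if e) / d

    def score(k):
        tk = [t ** k for t in times]
        weighted = sum(w * math.log(t) for w, t in zip(tk, times)) / sum(tk)
        return weighted - 1.0 / k - mean_log_event

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid  # shape too large
        else:
            lo = mid  # shape too small
    shape = 0.5 * (lo + hi)
    scale = (sum(t ** shape for t in times) / d) ** (1.0 / shape)
    loglik = sum(
        math.log(shape) - shape * math.log(scale) + (shape - 1) * math.log(t)
        for t, e in zip(times, events) if e
    ) - sum((t / scale) ** shape for t in times)
    return {"shape": shape, "scale": scale}, loglik, 2


def information_criteria(loglik, n_params, n_obs):
    aic = 2 * n_params - 2 * loglik
    bic = n_params * math.log(n_obs) - 2 * loglik
    return aic, bic


# Hypothetical reconstructed IPD: time in months, event = 1, censored = 0.
times = [2.1, 3.4, 5.0, 6.2, 7.7, 9.5, 11.0, 12.0, 14.3, 18.0, 24.0, 24.0]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0]

comparison = {}
for name, fit in [("exponential", fit_exponential), ("weibull", fit_weibull)]:
    params, loglik, k = fit(times, events)
    aic, bic = information_criteria(loglik, k, len(times))
    comparison[name] = {"params": params, "loglik": loglik, "aic": aic, "bic": bic}
    print(f"{name:12s} logLik = {loglik:8.3f}  AIC = {aic:7.3f}  BIC = {bic:7.3f}")
```

Lower AIC or BIC indicates better fit to the observed period only. Neither statistic says anything about whether the extrapolated tail is plausible, which is why the visual and clinical checks remain separate steps.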
Where EasyHTA supports model comparison
In a scripted workflow, generating a structured comparison across the standard candidate distributions (exponential, Weibull, log-normal, log-logistic, Gompertz, generalized gamma) requires writing or maintaining code, extracting fit statistics, and producing overlaid plots. EasyHTA automates this step: from digitized inputs, it reconstructs IPD and fits the full set of candidate models, generating:
- AIC and BIC comparison tables across all fitted distributions
- Overlaid survival function plots for visual assessment
- Fitted parameter values and extrapolated survival probabilities, ready for export
This does not replace clinical or statistical judgement about which model is appropriate. It removes the coding and formatting overhead that sits between judgement and output.
From fitted models to usable outputs
The survival analysis does not end with a fitted model. The result needs to reach two key audiences: the health economists building the downstream cost-effectiveness model, and the clinical or methodological assessors reviewing the extrapolation choices. In a scripted workflow, serving both audiences involves custom export code, manual formatting, and a separate reporting step, each introducing delay and potential error.
Export for downstream modelling
EasyHTA Survival Studio includes direct Excel export of extrapolation outputs. This includes shape and scale parameters, variance-covariance matrices, and model comparison summaries formatted for direct use in downstream models.
For many Excel-based cost-effectiveness models, the key inputs are the fitted distribution parameters and their variance-covariance structure, which allow the modeller to recreate the extrapolations and run probabilistic sensitivity analyses within the economic model itself. Extracting these from R fitted objects and formatting them for Excel is a common source of friction and transcription error. EasyHTA exports them in a structured, labelled format ready for use.
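To show why the parameters and their variance-covariance matrix are sufficient, the Python sketch below recreates a Weibull extrapolation at fixed time points and draws correlated parameter sets for a probabilistic sensitivity analysis. Everything numeric here is hypothetical, and the sketch assumes the covariance matrix is expressed on the log scale of the shape and scale parameters. That assumption must be checked against the labelling of whatever export is actually used, because applying a matrix on the wrong scale silently distorts the PSA.

```python
import math
import random


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def cholesky_2x2(cov):
    """Lower-triangular Cholesky factor of a 2x2 covariance matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)
    return l11, l21, l22


def sample_params(log_means, cov, rng):
    """Draw one correlated (shape, scale) pair via the Cholesky factor."""
    l11, l21, l22 = cholesky_2x2(cov)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    log_shape = log_means[0] + l11 * z1
    log_scale = log_means[1] + l21 * z1 + l22 * z2
    return math.exp(log_shape), math.exp(log_scale)


# Hypothetical exported values (time unit: months).
shape, scale = 1.2, 20.0
cov_log = [[0.010, 0.002], [0.002, 0.015]]  # assumed to be on the log scale
time_points = [12, 24, 60, 120]

# Deterministic base case at the requested time points.
base_case = [weibull_survival(t, shape, scale) for t in time_points]

# PSA: 1,000 correlated draws, summarised at 60 months.
rng = random.Random(2024)
psa_draws = [sample_params((math.log(shape), math.log(scale)), cov_log, rng)
             for _ in range(1000)]
surv_60 = sorted(weibull_survival(60, k, b) for k, b in psa_draws)
lower, upper = surv_60[25], surv_60[975]  # approximate 2.5th and 97.5th percentiles

for t, s in zip(time_points, base_case):
    print(f"S({t:3d}) = {s:.4f}")
print(f"Approximate 95% interval at 60 months: {lower:.4f} to {upper:.4f}")
```

Sampling on the log scale keeps every drawn shape and scale positive, and the Cholesky step is what preserves the correlation between them. The same calculation can be reproduced in Excel from the exported parameters.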
Where EasyHTA fits in the workflow
EasyHTA Survival Studio sits after the KM digitization step and before the final decision outputs. It covers IPD reconstruction, parametric model fitting across standard candidate distributions, structured model comparison, and formatted export for downstream use.
Survival Studio is not a replacement for custom statistical analysis where that is warranted. It is designed for the common, repeated workflow pattern where a team needs to move from digitized time and survival data to practical extrapolation outputs without rebuilding the analysis pipeline each time.
Try the workflow on a real survival analysis task
If your team is working from published Kaplan-Meier curves and the path from digitization to extrapolation outputs currently runs through manual scripts and custom formatting, EasyHTA is built for exactly that workflow.