Simulation models
PySurvival can generate random survival times from the following commonly used distributions:
- Exponential
- Weibull
- Gompertz
- Log-Logistic
- Lognormal
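One standard way to draw such times is inverse-transform sampling under proportional hazards. With S(t|x) = exp(-H0(t) * exp(r(x))) and U uniform on (0, 1), solving S(T|x) = U gives T = H0^-1(-log(U) * exp(-r(x))). The sketch below applies this to the three distributions whose baseline cumulative hazard H0 inverts in closed form. It is a minimal illustration, not PySurvival's implementation: the parametrizations (exponential H0(t) = alpha*t, Weibull H0(t) = alpha*t^beta, Gompertz H0(t) = (alpha/beta)*(exp(beta*t) - 1)) and the helper names `baseline_cumulative_hazard` and `sample_time` are assumptions.

```python
import math
import random


def baseline_cumulative_hazard(dist, t, alpha, beta):
    """H0(t) under an ASSUMED parametrization (not necessarily PySurvival's)."""
    if dist == "exponential":
        return alpha * t
    if dist == "weibull":
        return alpha * t ** beta
    if dist == "gompertz":
        return alpha / beta * (math.exp(beta * t) - 1.0)
    raise ValueError(f"unsupported distribution: {dist}")


def sample_time(dist, alpha, beta, risk, u):
    """Invert S(t|x) = exp(-H0(t) * exp(risk)) at a uniform draw u in (0, 1]."""
    target = -math.log(u) * math.exp(-risk)  # value that H0(T) must reach
    if dist == "exponential":
        return target / alpha
    if dist == "weibull":
        return (target / alpha) ** (1.0 / beta)
    if dist == "gompertz":
        return math.log(1.0 + beta * target / alpha) / beta
    raise ValueError(f"unsupported distribution: {dist}")


rng = random.Random(0)
# 1.0 - rng.random() lies in (0, 1], which avoids log(0)
times = [sample_time("weibull", alpha=1.0, beta=1.5, risk=0.3, u=1.0 - rng.random())
         for _ in range(5)]
print(times)
```

Log-Logistic and Log-Normal are left out of this sketch; how PySurvival handles them is not shown here.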
Instance
To create an instance, use pysurvival.models.simulations.SimulationModel.
Attributes
- alpha : double -- the scale parameter
- beta : double -- the shape parameter
- censored_parameter : double -- coefficient used to calculate the censored distribution
- risk_type : string -- defines the type of risk function
- risk_parameter : double -- scaling coefficient of the risk score
- survival_distribution : string -- defines a known survival distribution
- times : array-like -- representation of the time axis of the model
- time_buckets : array-like -- representation of the time axis of the model using time bins, which are represented by [t_{i-1}, t_i)
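To illustrate how `times` and `time_buckets` relate, the sketch below pairs consecutive grid points into half-open bins [t_{i-1}, t_i). The evenly spaced grid of `bins` points and the helper names `make_time_axis` and `make_time_buckets` are hypothetical; PySurvival may build its time axis differently.

```python
def make_time_axis(max_time, bins):
    """Hypothetical helper: an evenly spaced grid of `bins` points on [0, max_time]."""
    step = max_time / (bins - 1)
    return [i * step for i in range(bins)]


def make_time_buckets(times):
    """Pair consecutive grid points into half-open bins [t_{i-1}, t_i)."""
    return [(times[i - 1], times[i]) for i in range(1, len(times))]


times = make_time_axis(max_time=10.0, bins=5)
time_buckets = make_time_buckets(times)
print(times)         # [0.0, 2.5, 5.0, 7.5, 10.0]
print(time_buckets)  # [(0.0, 2.5), (2.5, 5.0), (5.0, 7.5), (7.5, 10.0)]
```

A grid of n points always yields n - 1 buckets.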
API
__init__
- Initialization
SimulationModel( survival_distribution = 'exponential', risk_type = 'linear', censored_parameter = 1., alpha = 1, beta = 1., bins = 100, risk_parameter = 1.)
Parameters:
- survival_distribution : string (default = 'exponential') -- defines a known survival distribution. The available distributions are:
    - Exponential
    - Weibull
    - Gompertz
    - Log-Logistic
    - Log-Normal
- risk_type : string (default = 'linear') -- defines the type of risk function:
    - Linear
    - Square
    - Gaussian
- censored_parameter : double (default = 1.) -- coefficient used to calculate the censored distribution. The censoring times follow a normal distribution N(loc=censored_parameter, scale=5).
- alpha : double (default = 1.) -- the scale parameter
- beta : double (default = 1.) -- the shape parameter
- bins : int (default = 100) -- the number of bins of the time axis
- risk_parameter : double (default = 1.) -- scaling coefficient of the risk score; how it enters the risk function depends on risk_type (linear, square or gaussian).
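The exact formula for each risk_type did not survive on this page. As a hedged illustration only, the sketch below assumes common choices built on the dot product s = <x, W>: linear r = s, square r = (risk_parameter * s)^2, gaussian r = exp(-s^2 * risk_parameter). The helper `risk_function` is hypothetical and may not match PySurvival's formulas.

```python
import math


def dot(x, w):
    """Dot product <x, W> between one sample and the feature weights."""
    return sum(xi * wi for xi, wi in zip(x, w))


def risk_function(x, weights, risk_type="linear", risk_parameter=1.0):
    """ASSUMED risk transforms; the exact PySurvival formulas may differ."""
    s = dot(x, weights)
    if risk_type == "linear":
        return s
    if risk_type == "square":
        return (risk_parameter * s) ** 2
    if risk_type == "gaussian":
        return math.exp(-(s ** 2) * risk_parameter)
    raise ValueError(f"unknown risk_type: {risk_type}")


x, w = [1.0, -2.0, 0.5], [0.5, 0.25, 1.0]  # <x, W> = 0.5
for kind in ("linear", "square", "gaussian"):
    r = risk_function(x, w, kind)
    # Under a Cox model, exp(r) multiplies the baseline hazard
    print(kind, r, math.exp(r))
```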
generate_data
- Generates a dataset of simulated survival times from the chosen distribution, with the features acting on the hazard function through a Cox model
generate_data(num_samples = 100, num_features = 3, feature_weights=None)
Parameters:
- num_samples : int (default = 100) -- number of samples to generate
- num_features : int (default = 3) -- number of features to generate
- feature_weights : array-like (default = None) -- list of the coefficients of the underlying Cox model. Each feature linked to a coefficient is generated from a distribution chosen at random from the following list:
    - binomial
    - chisquare
    - exponential
    - gamma
    - normal
    - uniform
    - laplace

  If None, then feature_weights = [1.]*num_features.
Returns:
- dataset : pandas.DataFrame -- dataset of simulated survival times, event status and features
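The sketch below assembles the pieces described above into one pipeline: default weights, a linear risk score, exponential event times via inverse-transform sampling, censoring times drawn from N(censored_parameter, 5), the observed time min(T, C), and the event flag. It is a simplified, hypothetical stand-in (`simulate` is not a PySurvival function): features are standard normal rather than drawn from the listed distributions, and clipping negative censoring draws at 0 is an assumption.

```python
import math
import random


def simulate(num_samples=100, num_features=3, feature_weights=None,
             alpha=1.0, censored_parameter=1.0, seed=0):
    """Hypothetical simulation pipeline (exponential baseline, linear risk)."""
    rng = random.Random(seed)
    if feature_weights is None:
        feature_weights = [1.0] * num_features  # documented default
    rows = []
    for _ in range(num_samples):
        # Simplification: standard-normal features instead of a random mix
        x = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
        risk = sum(xi * wi for xi, wi in zip(x, feature_weights))
        u = 1.0 - rng.random()  # uniform draw in (0, 1], avoids log(0)
        event_time = -math.log(u) / (alpha * math.exp(risk))
        # Censoring times ~ N(censored_parameter, 5); clipping at 0 is ASSUMED
        censor_time = max(rng.gauss(censored_parameter, 5.0), 0.0)
        observed = min(event_time, censor_time)
        event = 1.0 if event_time <= censor_time else 0.0
        row = {f"x_{j + 1}": v for j, v in enumerate(x)}
        row.update({"time": observed, "event": event})
        rows.append(row)
    return rows


rows = simulate(num_samples=5)
print(rows[0])
```

The column names mirror the x_1, ..., time, event layout of the example table further down; wrapping `rows` in `pandas.DataFrame(rows)` would give a similar shape.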
predict_hazard
- Predicts the hazard function
predict_hazard(x, t = None)
Parameters:
- x : array-like -- input samples, where the rows correspond to individual samples and the columns represent the features (shape=[n_samples, n_features]). x should not be standardized beforehand; the model takes care of it.
- t : double (default = None) -- time at which the prediction should be performed. If None, the function is returned for all available t.
Returns:
- hazard : numpy.ndarray -- array-like representing the prediction of the hazard function
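Under the Cox model, the hazard of a sample is the baseline hazard scaled by exp(risk): h(t|x) = h0(t) * exp(r(x)). The sketch below mirrors the t / None behaviour described above, assuming a Weibull baseline h0(t) = alpha * beta * t^(beta - 1) (the derivative of H0(t) = alpha * t^beta). The function names are hypothetical, not PySurvival's API.

```python
import math


def weibull_baseline_hazard(t, alpha, beta):
    """ASSUMED baseline: h0(t) = alpha * beta * t**(beta - 1)."""
    return alpha * beta * t ** (beta - 1.0)


def hazard(risk, times, alpha=1.0, beta=1.5, t=None):
    """h(t|x) = h0(t) * exp(risk); the whole curve over `times` when t is None."""
    scale = math.exp(risk)
    if t is None:
        return [weibull_baseline_hazard(s, alpha, beta) * scale for s in times]
    return weibull_baseline_hazard(t, alpha, beta) * scale


times = [0.5, 1.0, 2.0]
curve = hazard(risk=0.0, times=times)                    # risk 0 gives the baseline
single = hazard(risk=math.log(2.0), times=times, t=1.0)  # doubled hazard at t = 1
```

Because the risk only rescales h0, the ratio between two samples' hazards is constant over time, which is the proportional-hazards property.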
predict_risk
- Predicts the risk score
predict_risk(x)
Parameters:
- x : array-like -- input samples, where the rows correspond to individual samples and the columns represent the features (shape=[n_samples, n_features]). x should not be standardized beforehand; the model takes care of it.
Returns:
- risk_score : numpy.ndarray -- array-like representing the prediction of the risk score
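The prediction methods expect raw, unstandardized x. A minimal sketch of what "the model takes care of it" usually means, assuming the model keeps the per-feature mean and (population) standard deviation of the data it generated and applies them before computing the risk score. The helpers `fit_standardizer` and `standardize` are hypothetical.

```python
import statistics


def fit_standardizer(columns):
    """Hypothetical: store per-feature (mean, population std) from generated data."""
    return [(statistics.mean(c), statistics.pstdev(c)) for c in columns]


def standardize(x, stats):
    """Apply stored statistics to one raw sample; a constant feature maps to 0.0."""
    return [(xi - m) / s if s > 0 else 0.0 for xi, (m, s) in zip(x, stats)]


data = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]  # rows = samples
stats = fit_standardizer(list(zip(*data)))      # transpose to feature columns
z = standardize([5.0, 50.0], stats)
risk = sum(zi * wi for zi, wi in zip(z, [1.0, 1.0]))  # linear score on scaled x
```

Both features sit at the same relative position in their columns, so they standardize to the same value even though their raw scales differ by a factor of 10.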
predict_survival
- Predicts the survival function
predict_survival(x, t = None)
Parameters:
- x : array-like -- input samples, where the rows correspond to individual samples and the columns represent the features (shape=[n_samples, n_features]). x should not be standardized beforehand; the model takes care of it.
- t : double (default = None) -- time at which the prediction should be performed. If None, the function is returned for all available t.
Returns:
- survival : numpy.ndarray -- array-like representing the prediction of the survival function
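The survival and hazard functions are linked by S(t|x) = exp(-integral from 0 to t of h(s|x) ds). The sketch below checks this numerically by integrating the hazard with the midpoint rule and comparing against the closed form. It assumes the same Weibull baseline H0(t) = alpha * t^beta as a stand-in; the function names are hypothetical.

```python
import math


def survival_closed_form(t, risk, alpha=1.0, beta=1.5):
    """S(t|x) = exp(-alpha * t**beta * exp(risk)) for an ASSUMED Weibull baseline."""
    return math.exp(-alpha * t ** beta * math.exp(risk))


def survival_from_hazard(t, risk, alpha=1.0, beta=1.5, steps=10_000):
    """Integrate h(s|x) = alpha*beta*s**(beta-1)*exp(risk) with the midpoint rule."""
    dt = t / steps
    cum_hazard = sum(alpha * beta * ((i + 0.5) * dt) ** (beta - 1.0) * dt
                     for i in range(steps)) * math.exp(risk)
    return math.exp(-cum_hazard)


times = [0.5, 1.0, 2.0]
curve = [survival_closed_form(s, 0.3) for s in times]  # whole curve, like t=None
```

The survival curve starts at 1 and can only decrease, since the cumulative hazard is non-decreasing.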
Example
Let's now see how to generate a dataset designed for survival analysis.
```python
import pandas as pd
from pysurvival.models.simulations import SimulationModel

# In a Jupyter notebook, run `%matplotlib inline` to display plots inline

# Initializing the simulation model
sim = SimulationModel(
    survival_distribution='gompertz',
    risk_type='linear',
    censored_parameter=5.0,
    alpha=0.01,
    beta=5.,
)

# Generating N random samples
N = 1000
dataset = sim.generate_data(num_samples=N, num_features=5)

# Showing a few data points
print(dataset.head(2))
```
We can now see an overview of the data:
x_1 | x_2 | x_3 | x_4 | x_5 | time | event |
---|---|---|---|---|---|---|
3.956896 | 124.0 | 0.018274 | 57.480199 | -5.42258 | 0.024329 | 1.0 |
4.106100 | 117.0 | 0.111276 | 51.770875 | 4.105588 | 0.175530 | 1.0 |
PySurvival can also display the baseline survival function of the simulation model:
```python
from pysurvival.utils.display import display_baseline_simulations

# Plot the baseline survival function of the simulation model
display_baseline_simulations(sim, figure_size=(20, 6))
```