When to use
PRIM can be used for scenario discovery when the scenarios identified are to be represented as bounds on parameter values within which scenarios meet some pre-defined criteria [1].
Technically, PRIM identifies regions (boxes) within the input space in which the mean of the output is significantly higher than the mean of the output over the entire dataset (also described as regions of high density and high coverage) [1].
To be operationalised, PRIM requires three main features [1]:
- The generation of an input database consisting of results of a target performance objective obtained by running a simulation model for different input parameters (predictors), i.e. scenarios are defined by a set of parameter values.
- User selection of boxes that best balance criteria of coverage (i.e., the fraction of all relevant cases that fall inside the box) and density (i.e., the fraction of cases inside the box that are relevant) for 'pasting'
- User selection of pasted boxes for the 'covering' stage
How
The PRIM iterative process consists of a peeling/pasting stage and a covering stage [5]:
- a primary 'peeling' stage in which the PRIM algorithm searches the input space for regions where the mean of the output is significantly different from the mean value over the entire dataset. Increasingly smaller and denser boxes are generated by successively removing portions of the input space, forming a 'peeling trajectory' [5].
- a 'pasting' stage (i.e., expansion of box boundaries along dimensions that were unnecessarily restricted) in which a user selects the peeling box that best balances the objectives of high density and high coverage to be expanded [5]. The user typically selects boxes from graphical representations of the peeling trajectory, plotting the density of each box against its coverage [1].
- a 'covering' stage in which data points within a chosen pasted box are removed from the dataset and the peeling/pasting process is repeated with the remaining data [5].
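The peeling stage described above can be illustrated with a minimal sketch. It is not the reference PRIM implementation: the function names `quantile` and `peel`, the peeling fraction `alpha`, the stopping threshold `min_support`, and the synthetic two-parameter dataset are all assumptions made for illustration, and the pasting and covering stages are omitted.

```python
import random

def quantile(values, q):
    """Nearest-rank empirical quantile (a simple choice made for this sketch)."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]

def peel(X, y, alpha=0.05, min_support=0.05):
    """Greedy peeling: return a trajectory of (box, mean of y in box, support).

    X is a list of rows (one value per parameter), y a list of outputs.
    alpha and min_support are hypothetical tuning parameters.
    """
    n, d = len(X), len(X[0])
    inside = list(range(n))  # indices of points still inside the box
    box = [[min(r[j] for r in X), max(r[j] for r in X)] for j in range(d)]

    def mean(idx):
        return sum(y[i] for i in idx) / len(idx)

    trajectory = [([b[:] for b in box], mean(inside), 1.0)]
    while True:
        best = None
        for j in range(d):
            vals = [X[i][j] for i in inside]
            lo_cut, hi_cut = quantile(vals, alpha), quantile(vals, 1 - alpha)
            candidates = (
                (0, lo_cut, [i for i in inside if X[i][j] >= lo_cut]),  # peel low side
                (1, hi_cut, [i for i in inside if X[i][j] <= hi_cut]),  # peel high side
            )
            for side, cut, keep in candidates:
                # Skip peels that remove nothing or leave too few points.
                if len(keep) == len(inside) or len(keep) < min_support * n:
                    continue
                m = mean(keep)
                if best is None or m > best[0]:
                    best = (m, j, side, cut, keep)
        if best is None:
            break  # no admissible peel left
        m, j, side, cut, inside = best
        box[j][side] = cut
        trajectory.append(([b[:] for b in box], m, len(inside) / n))
    return trajectory

# Synthetic example: cases are 'interesting' (y = 1) in one corner of the unit square.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(500)]
y = [1 if (x0 > 0.6 and x1 < 0.4) else 0 for x0, x1 in X]
trajectory = peel(X, y)
```

Each entry of `trajectory` is one candidate box. With a binary output such as this one, the mean of the output inside a box equals its density, so plotting that mean against coverage for every entry gives the kind of trade-off curve from which a user would select a box for pasting.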
At the end of the process, the relevance of each parameter defining a specific box is evaluated by users based on coverage, density and interpretability of results [1].
High coverage means that the box includes a high proportion of all the ‘interesting’ model scenarios found across the parameter space. High density means that the box includes a high proportion of ‘interesting’ cases relative to non-interesting ones [1]. Interpretability is judged separately, since some bounds (expanded or not) may be easier to interpret than others [1].
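The two measures can be made concrete with a small sketch. The function name `coverage_and_density`, the representation of a box as a mapping from parameter index to `(lower, upper)` bounds, and the six-point dataset are assumptions made for illustration only.

```python
def coverage_and_density(X, interesting, box):
    """Compute (coverage, density) of a box.

    X: list of rows of parameter values.
    interesting: list of booleans, True where the case is of interest.
    box: dict mapping parameter index -> (lower, upper); parameters not
         listed are treated as unrestricted (an assumption of this sketch).
    """
    in_box = [all(lo <= row[j] <= hi for j, (lo, hi) in box.items()) for row in X]
    hits = sum(1 for b, flag in zip(in_box, interesting) if b and flag)
    n_box = sum(in_box)
    n_interesting = sum(interesting)
    coverage = hits / n_interesting if n_interesting else 0.0  # share of interesting cases captured
    density = hits / n_box if n_box else 0.0                   # share of box cases that are interesting
    return coverage, density

# Hypothetical data: six cases, two parameters.
X = [[0.1, 0.9], [0.7, 0.2], [0.8, 0.3], [0.9, 0.8], [0.65, 0.1], [0.3, 0.3]]
interesting = [False, True, True, False, True, False]

wide_box = {0: (0.6, 1.0)}                    # restricts the first parameter only
tight_box = {0: (0.6, 1.0), 1: (0.0, 0.25)}   # also restricts the second parameter

print(coverage_and_density(X, interesting, wide_box))   # (1.0, 0.75)
print(coverage_and_density(X, interesting, tight_box))  # coverage 2/3, density 1.0
```

The wide box captures every interesting case but also one uninteresting one, while the tight box contains only interesting cases at the cost of missing one of them. This is the trade-off that users weigh when choosing a box.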
Variants of PRIM for identification of candidate boxes include:
- PCA-PRIM and CPCA-PRIM [6] don't use the original parameters, but instead identify boxes using orthogonal rotations obtained through principal component analysis (PCA). This may improve density and coverage of boxes when parameters are correlated. Constrained PCA (CPCA) limits rotations to groups of similar parameters to improve interpretability.
- Application of a meta-model using rule extraction can reduce run times [7].
- Consideration of more than two classes in the identification of boxes, and of integer or categorical parameters, based on modified objective functions [9].