Robustness metrics measure how the performance of a decision alternative changes across different scenarios.

When to use

Alternatives can be ranked according to their robustness, including as part of Efficiency-Robustness trade-offs. This is a quantitative form of stress testing.

Evaluating these metrics requires performance measures across multiple scenarios, e.g. produced by multiple model runs with different settings.

How

A wide range of robustness metrics are possible, depending on 1) how performance measures are transformed, 2) how scenarios are subsetted, and 3) how the subsetted performance measures are aggregated [1]. See Resources for further details.

  • Performance measures can, for example, be transformed by calculating regret from the best alternative, or by evaluating whether performance satisfies constraints
  • Subsetting scenarios might involve taking the best or worst case, or another percentile
  • Aggregation might involve taking the mean, sum, variance or skew
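The transform–subset–aggregate steps above can be sketched in code. This is a minimal illustration, assuming a hypothetical matrix of performance values (alternatives by scenarios, higher is better); the regret transform, worst-case subset, and values are all made up for the example:

```python
import numpy as np

# Hypothetical performance of 3 alternatives (rows) across 5 scenarios (columns);
# higher is better. Values are illustrative only.
performance = np.array([
    [5.0, 7.0, 6.0, 4.0, 8.0],
    [6.0, 6.0, 6.0, 6.0, 6.0],
    [9.0, 3.0, 8.0, 2.0, 9.0],
])

# 1) Transform: regret = shortfall from the best alternative in each scenario.
regret = performance.max(axis=0) - performance

# 2) Subset: keep only the worst case (largest regret) for each alternative.
worst_case_regret = regret.max(axis=1)

# 3) Aggregate: the subset here is a single value per alternative, so aggregation
# is trivial; with a larger subset one might take the mean or variance instead.
print(worst_case_regret)  # lower is more robust
```

Different choices at each of the three steps yield different metrics from the same performance matrix.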
Common robustness metrics include:
  • Maximin: Maximising worst-case performance
  • Minimax regret: Minimising worst-case regret from best decision alternative
  • Starr's domain criterion: Maximising proportion of scenarios where a performance threshold is satisfied
Some robustness metrics specifically target adaptive decisions, e.g. by measuring the flexibility of an initial management action.
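The three common metrics listed above can be computed from the same performance matrix. A minimal sketch, assuming the hypothetical alternatives-by-scenarios values and the threshold of 5.5 are illustrative choices, not from the source:

```python
import numpy as np

# Hypothetical performance of 3 alternatives (rows) across 5 scenarios (columns);
# higher is better. Values are illustrative only.
performance = np.array([
    [5.0, 7.0, 6.0, 4.0, 8.0],
    [6.0, 6.0, 6.0, 6.0, 6.0],
    [9.0, 3.0, 8.0, 2.0, 9.0],
])

# Maximin: rank alternatives by worst-case performance (higher is better).
maximin = performance.min(axis=1)

# Minimax regret: rank by worst-case regret from the best alternative
# in each scenario (lower is better).
regret = performance.max(axis=0) - performance
minimax_regret = regret.max(axis=1)

# Starr's domain criterion: proportion of scenarios in which performance
# satisfies a threshold (higher is better). Threshold is an assumed example.
threshold = 5.5
domain = (performance >= threshold).mean(axis=1)
```

Note that the metrics can disagree: here the steady alternative (row 1) wins on all three, but a risk-tolerant alternative like row 2 would look better under a mean-performance criterion, which is why McPhail et al. (2018) stress that metric choice matters.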

Resources

  • McPhail C, Maier HR, Kwakkel JH, Giuliani M, Castelletti A, Westra S (2018) Robustness metrics: How are they calculated, when should they be used and why do they give different results? Earth's Future. doi:10.1002/2017EF000649
  • Herman JD, Reed PM, Zeff HB, Characklis GW (2015) How should robustness be defined for water systems planning under change? Journal of Water Resources Planning and Management. 141 (10): 04015012. doi:10.1061/(ASCE)WR.1943-5452.0000509
Notes

  • [1] McPhail et al. (2018)