Robustness metrics measure how the performance of a decision alternative changes across different scenarios.

When to use

Alternatives can be ranked according to their robustness, including as part of Efficiency-Robustness trade-offs. This is a quantitative form of stress testing.

Evaluating these metrics requires performance measures across multiple scenarios, e.g. produced by multiple model runs with different settings.

How

A wide range of robustness metrics are possible, depending on 1) how performance measures are transformed, 2) how scenarios are subsetted, and 3) how the subsetted performance measures are aggregated [1]. See Resources for further details.

  • Performance measures can, for example, be transformed by calculating regret from the best alternative, or by evaluating whether performance satisfies constraints
  • Subsetting scenarios might involve taking the best or worst case, or another percentile
  • Aggregation might involve taking the mean, sum, variance or skew
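The transform–subset–aggregate steps above can be sketched in code. This is a minimal illustration, assuming a hypothetical matrix of performance values (alternatives by scenarios, higher is better); the regret transform, worst-case subset, and values are all made up for the example:

```python
import numpy as np

# Hypothetical performance of 3 alternatives (rows) across 5 scenarios (columns);
# higher is better. Values are illustrative only.
performance = np.array([
    [5.0, 7.0, 6.0, 4.0, 8.0],
    [6.0, 6.0, 6.0, 6.0, 6.0],
    [9.0, 3.0, 8.0, 2.0, 9.0],
])

# 1) Transform: regret = shortfall from the best alternative in each scenario.
regret = performance.max(axis=0) - performance

# 2) Subset: keep only the worst case (largest regret) for each alternative.
worst_case_regret = regret.max(axis=1)

# 3) Aggregate: the subset here is a single value per alternative, so aggregation
# is trivial; with a larger subset one might take the mean or variance instead.
print(worst_case_regret)  # lower is more robust
```

Different choices at each of the three steps yield different metrics from the same performance matrix.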
Common robustness metrics include:
  • Maximin: Maximising worst-case performance
  • Minimax regret: Minimising worst-case regret from best decision alternative
  • Starr's domain criterion: Maximising proportion of scenarios where a performance threshold is satisfied
Some robustness metrics specifically target adaptive decisions, e.g. by measuring the flexibility of an initial management action.
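The three common metrics listed above can be computed from the same performance matrix. A minimal sketch, assuming the hypothetical alternatives-by-scenarios values and the threshold of 5.5 are illustrative choices, not from the source:

```python
import numpy as np

# Hypothetical performance of 3 alternatives (rows) across 5 scenarios (columns);
# higher is better. Values are illustrative only.
performance = np.array([
    [5.0, 7.0, 6.0, 4.0, 8.0],
    [6.0, 6.0, 6.0, 6.0, 6.0],
    [9.0, 3.0, 8.0, 2.0, 9.0],
])

# Maximin: rank alternatives by worst-case performance (higher is better).
maximin = performance.min(axis=1)

# Minimax regret: rank by worst-case regret from the best alternative
# in each scenario (lower is better).
regret = performance.max(axis=0) - performance
minimax_regret = regret.max(axis=1)

# Starr's domain criterion: proportion of scenarios in which performance
# satisfies a threshold (higher is better). Threshold is an assumed example.
threshold = 5.5
domain = (performance >= threshold).mean(axis=1)
```

Note that the metrics can disagree: here the steady alternative (row 1) wins on all three, but a risk-tolerant alternative like row 2 would look better under a mean-performance criterion, which is why McPhail et al. (2018) stress that metric choice matters.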

Resources

  • McPhail C, Maier HR, Kwakkel JH, Giuliani M, Castelletti A, Westra S (2018) Robustness metrics: How are they calculated, when should they be used and why do they give different results? Earth's Future. doi:10.1002/2017EF000649
  • Herman JD, Reed PM, Zeff HB, Characklis GW (2015) How should robustness be defined for water systems planning under change? Journal of Water Resources Planning and Management. 141 (10): 04015012. doi:10.1061/(ASCE)WR.1943-5452.0000509
Notes

  • [1] McPhail et al. (2018)