The accuracy of face recognition algorithms has improved rapidly with the advent of deep learning and the widespread availability
of training data. Although tests of face recognition algorithms indicate year-over-year accuracy gains, error rates for many of these systems differ based on the demographic composition of the test set. These demographic differentials have raised concerns about the fairness of these systems. However, no international standard for measuring fairness in biometric systems yet exists. This paper characterizes two proposed measures of face recognition algorithm fairness (fairness measures) from scientists in the U.S. and Europe, using face recognition error rates disaggregated across race and gender from 126 distinct face recognition algorithms. We find that both methods have mathematical characteristics that make them challenging to interpret when applied to these error rates. To address this, we propose a set of interpretability criteria, termed the Functional Fairness Measure Criteria (FFMC), that outlines properties desirable in a face recognition algorithm fairness measure. We further develop a new fairness measure, the Gini Aggregation Rate for Biometric Equitability (GARBE), and show how, in conjunction with Pareto optimization, this measure can be used to select among alternative algorithms based on the accuracy/fairness trade-space. Finally, to facilitate the development of fairness measures in the face recognition domain, we have open-sourced our dataset of machine-readable, demographically disaggregated error rates. We believe this is currently the largest open-source dataset of its kind.
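The abstract does not spell out how GARBE or the Pareto selection step is computed; the sketch below illustrates one plausible reading consistent with the measure's name: a Gini coefficient taken over per-demographic-group error rates, a weighted blend of the false match rate (FMR) and false non-match rate (FNMR) Gini values, and a standard Pareto front over (overall error, fairness score) pairs. The function names (`gini`, `garbe`, `pareto_front`), the blending weight `alpha`, and the n/(n-1) small-sample correction are illustrative assumptions, not the authors' definitions, which appear in the paper body.

```python
import numpy as np

def gini(rates) -> float:
    """Sample Gini coefficient of a vector of per-group error rates.

    ASSUMPTION: mean-absolute-difference form with an n/(n-1)
    small-sample correction. 0 means identical rates across groups
    (most equitable); values near 1 mean errors are concentrated
    in a few groups. Requires at least two groups and a nonzero mean.
    """
    r = np.asarray(rates, dtype=float)
    n = r.size
    mad = np.abs(r[:, None] - r[None, :]).mean()  # mean of |r_i - r_j|
    return (n / (n - 1)) * mad / (2.0 * r.mean())

def garbe(fmr_by_group, fnmr_by_group, alpha=0.5) -> float:
    """GARBE-style score (illustrative): weighted blend of the Gini
    coefficients of demographically disaggregated FMR and FNMR.
    alpha trades off the two error types."""
    return alpha * gini(fmr_by_group) + (1.0 - alpha) * gini(fnmr_by_group)

def pareto_front(points):
    """Indices of Pareto-optimal algorithms, minimizing both coordinates
    (overall error rate, fairness score). An algorithm survives if no
    other algorithm is at least as good on both axes and strictly
    better on at least one."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical per-group error rates for three algorithms (4 groups each).
algorithms = [
    {"fmr": [1e-4, 3e-4, 2e-4, 9e-4], "fnmr": [0.010, 0.014, 0.011, 0.025]},
    {"fmr": [2e-4, 2e-4, 3e-4, 2e-4], "fnmr": [0.020, 0.021, 0.019, 0.022]},
    {"fmr": [1e-4, 8e-4, 1e-4, 7e-4], "fnmr": [0.009, 0.030, 0.008, 0.028]},
]
trade_space = [
    (np.mean(a["fmr"]) + np.mean(a["fnmr"]), garbe(a["fmr"], a["fnmr"]))
    for a in algorithms
]
print(pareto_front(trade_space))  # algorithms worth considering
```

Under this reading, the Pareto step retains only those algorithms for which no alternative is simultaneously more accurate and more equitable, which is the kind of accuracy/fairness trade-space selection the abstract describes.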