International biometric testing standards distinguish scenario evaluations from technology evaluations. Scenario evaluations measure the performance of an end-to-end system, in a simulated real-world environment, using live human participants. Executing biometric scenario evaluations is challenging, but they provide important insights that technology evaluations cannot, such as the simulated performance of the full system and the ability to attribute errors to specific system components. These insights are crucial for assessing which systems should be selected for an operational biometric deployment. The U.S. Department of Homeland Security Biometric Technology Rallies are a series of scenario evaluations of commercial biometric systems designed to operate in high-throughput environments. They are among the only large-scale scenario evaluations of complete, commercially available biometric systems. The 2019 Biometric Technology Rally tested the performance of ten face acquisition systems and eight face matching systems with a sample of 430 diverse human subjects. The 2019 Rally found that most (6/10) face acquisition systems maintained average transaction times under five seconds and that half (5/10) received satisfaction ratings in excess of 95% positive. However, fewer than half (4/10) of the acquisition systems reliably acquired images from 99% of the tested participants, and only a single (1/10) system produced images suitable for identifying all 430 participants. Commercial providers of these acquisition systems did not anticipate these levels of effectiveness: had system owners planned an operational deployment using vendor-provided performance estimates, they could have faced serious deficiencies, potentially requiring costly rework or program cancellation. Results from the 2019 Rally also led to two additional findings.
First, the most prominent source of errors in high-throughput face biometric systems was the acquisition of a suitable face biometric sample, not the matching of two suitable face biometric samples. A renewed focus on user interaction during image acquisition (camera placement, camera adjustment, informative signage, etc.) therefore offers significant room to improve the performance of high-throughput face biometric systems. Second, when matching systems were tested in combination with acquisition systems, half showed statistically significant variation in performance across acquisition systems: these systems worked well (> 95% true identification rate) only on some acquisition systems, and one matching system worked well only on images from a single acquisition system. We propose a matching system taxonomy (robust, brittle, and specialist) to describe this variation and discuss the impact of matching system choice on operational error rates.
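The taxonomy above can be sketched as a simple classification rule over per-acquisition-system true identification rates (TIRs). This is a hypothetical illustration, not the paper's method: the function name, the class boundaries for "brittle" versus "specialist", and all TIR values below are assumptions; only the 95% "works well" threshold comes from the text.

```python
def classify_matcher(tirs, threshold=0.95):
    """Classify a matching system (hypothetical rule) by counting the
    acquisition systems on which it works well, i.e. TIR >= threshold."""
    good = sum(1 for t in tirs if t >= threshold)
    if good == len(tirs):
        return "robust"      # works well on images from every acquisition system
    if good == 1:
        return "specialist"  # works well on images from a single acquisition system
    return "brittle"         # works well only on some acquisition systems

# One TIR per acquisition system; all values are invented for illustration.
matchers = {
    "matcher_A": [0.97, 0.98, 0.96, 0.97],  # consistent across all systems
    "matcher_B": [0.97, 0.91, 0.96, 0.88],  # good on some systems only
    "matcher_C": [0.97, 0.62, 0.55, 0.48],  # good on exactly one system
}

for name, tirs in matchers.items():
    print(name, classify_matcher(tirs))
```

Under this sketch, an operational deployment pairing a brittle or specialist matcher with an untested acquisition system risks the large, acquisition-dependent error rates the Rally observed.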