Today, engineers are building deep learning (DL) systems for high-risk scenarios whose performance requirements are difficult to quantify, such as autonomous driving. This talk highlights the pressing need to incorporate domain knowledge into more rigorous and deterministic testing methodologies for DL models, so that these black-box solutions encapsulate the intended behaviors (in line with prior beliefs) and respond as expected to data from their operational domain. Because of the oracle problem in DL systems, we need more flexible assertion tests, such as invariance and directional-expectation tests, which can verify a model's behavior even without ground truth.

The presentation introduces domain-aware DL model testing, which complements the usual statistical assessments by addressing the challenges posed by limited training samples and pipeline underspecification. It includes results from prior research on implementing invariance and directional-expectation tests. It also introduces semantically-preserving data transformations and their utility for generating valid inputs from complex domains out of existing data. Finally, it discusses how breakthroughs in generative AI will enable us to create such complex inputs through statistical learning, just as conventional software tests are written in code.
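To make the two assertion styles concrete, here is a minimal sketch in Python/NumPy, assuming a hypothetical image classifier that exposes a scikit-learn-style predict_proba(batch) returning an (n_samples, n_classes) probability array over images with float values in [0, 1]; adjust_brightness and add_fog are likewise hypothetical stand-ins for a domain-specific semantically-preserving transformation and a degrading one:

    import numpy as np

    def adjust_brightness(images, delta):
        """Semantically-preserving transformation: a small brightness
        shift should not change what an image depicts."""
        return np.clip(images + delta, 0.0, 1.0)

    def add_fog(images, intensity):
        """Degrading transformation: blend the image toward mid-grey
        haze, which should only make recognition harder."""
        return (1.0 - intensity) * images + intensity * 0.5

    def test_invariance(model, images, delta=0.05, tol=0.02):
        """Invariance test: predictions should be (nearly) unchanged
        under a semantically-preserving transformation."""
        p_orig = model.predict_proba(images)
        p_tran = model.predict_proba(adjust_brightness(images, delta))
        assert np.abs(p_orig - p_tran).max() <= tol, \
            "Model is not invariant to a small brightness shift"

    def test_directional_expectation(model, images, intensity=0.3):
        """Directional-expectation test: degrading the input should not
        increase the model's confidence in its original top prediction."""
        p_orig = model.predict_proba(images)
        top = p_orig.argmax(axis=1)
        p_fog = model.predict_proba(add_fog(images, intensity))
        idx = np.arange(len(images))
        assert np.all(p_fog[idx, top] <= p_orig[idx, top] + 1e-6), \
            "Confidence increased on degraded inputs"

Note that neither test references ground-truth labels: the invariance test compares the model to itself under a meaning-preserving input change, and the directional test anchors on the model's own top prediction, which is how such assertions sidestep the oracle problem.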