

Xiaomeng Peng, Xiaoning Jin, Shiming Duan, and Chaitanya Sankavaram
Data-driven Fault Detection and Diagnostics (FDD) methods often assume that sufficient labeled samples are available, that the classes are balanced, and that the fault classes encountered in testing were seen during model training. When monitoring a large fleet of assets at scale, these assumptions may be violated:
(I) only a limited number of samples can be manually labeled due to time and/or cost constraints; (II) most samples collected from engineering systems represent normal conditions, leading to a highly imbalanced class distribution and a biased prediction model. This work presents a robust and cost-effective FDD framework that integrates active learning and semi-supervised learning to iteratively detect both known and unknown failure modes. The framework strategically selects the samples to be annotated from a fully unlabeled dataset while keeping the labeling cost minimal. We tested the framework and algorithms on three synthetic datasets and one real-world dataset of vehicle air intake systems, and demonstrated superior performance compared with state-of-the-art methods for fleet-level FDD.