New Paper - The Shape of Explanations: A Topological Account of Rule-Based Explanations in Machine Learning
I have a new paper out that will be presented at the AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI. The paper introduces a formal model for studying rule-based explanations of classifiers. Explanations of this sort justify a classification by providing a simple sufficient condition. For example, if an applicant's loan is rejected by a predictive model, a rule-based explanation might be that the application belongs to the group with outstanding debt greater than $X$, missed payments greater than $Y$, etc., and every application (or close to every application) in that group is rejected.

The key observation is that such a rule completely describes a region of the feature space. One way to think about a topology on a space is that it provides a language for describing subsets of the space, with some subsets being more descriptively complex than others (a distinction we can make precise using known hierarchy results such as the Borel hierarchy). Using this, we prove that a classifier is explainable if and only if the inverse image of each label is the union of a descriptively simple set and a small set.
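To make the idea concrete, here is a minimal sketch (not from the paper) of a rule-based explanation as a conjunction of threshold conditions. The feature names, thresholds, toy classifier, and data below are all illustrative assumptions; the point is only that the rule picks out a region of feature space, and the rule "explains" a rejection to the extent that (nearly) every application in that region is rejected.

```python
# Sketch: a rule as a conjunction of threshold conditions, defining a
# region of feature space. All names and numbers here are hypothetical.

def in_region(app, debt_threshold, missed_threshold):
    """Membership test for the region described by the rule:
    debt > debt_threshold AND missed_payments > missed_threshold."""
    return (app["debt"] > debt_threshold
            and app["missed_payments"] > missed_threshold)

def rule_precision(apps, classify, debt_threshold, missed_threshold):
    """Fraction of applications in the region that the classifier
    rejects; 1.0 means the rule is a strict sufficient condition."""
    region = [a for a in apps
              if in_region(a, debt_threshold, missed_threshold)]
    if not region:
        return None
    return sum(classify(a) == "reject" for a in region) / len(region)

# Toy classifier and data, purely for illustration.
def classify(app):
    score = app["debt"] + 1000 * app["missed_payments"]
    return "reject" if score > 20000 else "accept"

apps = [
    {"debt": 25000, "missed_payments": 3},
    {"debt": 18000, "missed_payments": 5},
    {"debt": 5000,  "missed_payments": 0},
    {"debt": 30000, "missed_payments": 1},
]

precision = rule_precision(apps, classify,
                           debt_threshold=15000, missed_threshold=0)
print(precision)  # 1.0: every application in the region is rejected
```

A precision of exactly 1.0 corresponds to a strict sufficient condition; allowing "close to every application" in the region to be rejected corresponds to tolerating a small exceptional set, which is where the "union of a descriptively simple set and a small set" characterization comes in.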