Publications

A Self-explaining Neural Architecture for Generalizable Concept Learning

Published in IJCAI-2024, 2024

With the wide proliferation of Deep Neural Networks (DNNs) in high-stakes applications, there is a growing demand for explainability behind their decision-making process. Concept learning models attempt to learn high-level ‘concepts’ - abstract entities that align with human understanding - and thus provide interpretability to DNN architectures. However, in this paper we demonstrate that present SOTA concept learning approaches suffer from two major problems: lack of concept fidelity, wherein the models fail to learn consistent concepts among similar classes, and limited concept interoperability, wherein the models fail to generalize learned concepts to new domains for the same task. Keeping these in mind, we propose a novel self-explaining architecture for concept learning across domains which i) incorporates a new concept saliency network for representative concept selection, ii) utilizes contrastive learning to capture representative domain-invariant concepts, and iii) uses a novel prototype-based concept grounding regularization to improve concept alignment across domains. We demonstrate the efficacy of our proposed approach over current SOTA concept learning approaches on four widely used real-world datasets. Empirical results show that our method improves both concept fidelity, measured through concept overlap, and concept interoperability, measured through domain adaptation performance.
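The contrastive objective and the prototype-based grounding term mentioned above can be pictured with a minimal PyTorch-style sketch. This is an illustrative approximation rather than the paper's exact formulation: the function names, the temperature, the loss weights, and the use of fixed per-class prototypes are all assumptions made for readability.

```python
# Illustrative sketch (not the paper's exact losses): a supervised contrastive
# term over concept activations plus a prototype-based grounding term that
# pulls each sample's concepts toward its class prototype.
import torch
import torch.nn.functional as F

def contrastive_concept_loss(concepts, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized concept vectors."""
    z = F.normalize(concepts, dim=1)                  # (N, C)
    sim = z @ z.t() / temperature                     # pairwise similarities
    mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                            # exclude self-pairs
    logits = sim - torch.eye(len(z), device=z.device) * 1e9
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_per_row = mask.sum(1).clamp(min=1)
    return -(mask * log_prob).sum(1).div(pos_per_row).mean()

def prototype_grounding_loss(concepts, labels, prototypes):
    """Pull each concept vector toward its class prototype,
    encouraging concept alignment across domains."""
    return F.mse_loss(concepts, prototypes[labels])

# Hypothetical combined objective (weights are placeholders):
# total = task_loss \
#         + lambda_con * contrastive_concept_loss(c, y) \
#         + lambda_proto * prototype_grounding_loss(c, y, prototypes)
```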

Download here

Understanding and Enhancing Robustness of Concept-Based Models

Published in AAAI-2023, 2023

The rising usage of deep neural networks for decision making in critical applications like medical diagnosis and financial analysis has raised concerns regarding their reliability and trustworthiness. As automated systems become more mainstream, it is important that their decisions be transparent, reliable, and understandable by humans for better trust and confidence. To this effect, concept-based models such as Concept Bottleneck Models (CBMs) and Self-Explaining Neural Networks (SENN) have been proposed, which constrain the latent space of a model to represent high-level concepts easily understood by domain experts in the field. Although concept-based models promise a good approach to increasing both explainability and reliability, it has yet to be shown whether they demonstrate robustness and output consistent concepts under systematic perturbations to their inputs. To better understand the performance of concept-based models on curated malicious samples, in this paper we study their robustness to adversarial perturbations, i.e., imperceptible changes to the input data crafted by an attacker to fool a well-trained concept-based model. Specifically, we first propose and analyze different malicious attacks to evaluate the security vulnerability of concept-based models. Subsequently, we propose a potential general adversarial training-based defense mechanism to increase the robustness of these systems to the proposed malicious attacks. Extensive experiments on one synthetic and two real-world datasets demonstrate the effectiveness of the proposed attacks and the defense approach.
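To make the attack and defense setting concrete, the following is a minimal sketch of a PGD-style perturbation targeting the concept predictions of a concept bottleneck model, followed by a standard adversarial-training update. It is an assumed, generic construction rather than the specific attacks proposed in the paper; the interface `model(x) -> (concept_logits, class_logits)`, the step sizes, and the loss weighting are illustrative.

```python
# Illustrative sketch (assumed, not the paper's exact attack): PGD on the
# concept predictions of a concept bottleneck model, plus adversarial training.
import torch
import torch.nn.functional as F

def pgd_concept_attack(model, x, concepts, eps=8/255, alpha=2/255, steps=10):
    """Find a small perturbation that maximizes the concept-prediction loss.
    `concepts` are binary concept annotations as floats."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        concept_logits, _ = model(x_adv)
        loss = F.binary_cross_entropy_with_logits(concept_logits, concepts)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, concepts, labels):
    """One adversarial-training step: train on attacked inputs."""
    x_adv = pgd_concept_attack(model, x, concepts)
    concept_logits, class_logits = model(x_adv)
    loss = (F.binary_cross_entropy_with_logits(concept_logits, concepts)
            + F.cross_entropy(class_logits, labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```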

Download here

Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing

Published in EMNLP-BlackboxNLP, 2021

Interpretability methods like Integrated Gradients and LIME are popular choices for explaining natural language model predictions with relative word importance scores. These interpretations need to be robust for trustworthy NLP applications in high-stakes areas like medicine or finance. Our paper demonstrates how interpretations can be manipulated by making simple word perturbations on an input text. Via a small fraction of word-level swaps, these adversarial perturbations aim to make the resulting text semantically and spatially similar to its seed input (and therefore share similar interpretations). Simultaneously, the generated examples achieve the same prediction label as the seed yet are given a substantially different explanation by the interpretation methods. Our experiments generate fragile interpretations to attack two SOTA interpretation methods, across three popular Transformer models and on two different NLP datasets. We observe that the rank-order correlation drops by over 20% when less than 10% of words are perturbed on average, and it keeps decreasing as more words get perturbed. Furthermore, we demonstrate that the candidates generated by our method score well on quality metrics.
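The rank-order correlation reported above can be computed with a short sketch like the one below. The attribution method is abstracted behind a hypothetical `explain(model, tokens)` helper that returns one importance score per word (standing in for Integrated Gradients, LIME, etc.); the helper name and interface are assumptions for illustration.

```python
# Illustrative sketch (assumed helper names): measure how much an explanation
# changes under a word-level perturbation using Spearman rank correlation.
from scipy.stats import spearmanr

def interpretation_shift(model, explain, tokens, perturbed_tokens):
    """Rank-order correlation between explanations of a seed input and a
    perturbed input that keeps the same prediction label."""
    seed_scores = explain(model, tokens)              # importance per word
    pert_scores = explain(model, perturbed_tokens)    # same length after word swaps
    rho, _ = spearmanr(seed_scores, pert_scores)
    # A low rho despite an unchanged prediction label signals a fragile interpretation.
    return rho
```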

Download here

Triplet Transform Learning for Automated Primate Face Recognition

Published in IEEE International Conference on Image Processing (ICIP), 2019

Automated primate face recognition has enormous potential for the effective conservation of species facing endangerment or extinction. The task is characterized by a lack of training data, low inter-class variations, and large intra-class differences. Owing to the challenging nature of the problem, limited research has been performed to automate the process of primate face recognition. In this research, we propose a novel Triplet Transform Learning (TTL) model for learning discriminative representations of primate faces. The proposed model reduces the intra-class variations and increases the inter-class variations to obtain robust sparse representations of primate faces. It is utilized to present a novel framework for primate face recognition, which is evaluated on a primate dataset comprising 80 identities, including monkeys, gorillas, and chimpanzees. Experimental results demonstrate the efficacy of the proposed approach, where it outperforms existing approaches and attains state-of-the-art performance on the primates database.
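The core idea of reducing intra-class distance while increasing inter-class distance is the standard triplet margin objective, sketched below. Note this shows only the generic triplet term over learned representations; TTL additionally couples it with transform (sparse-coding) learning, which is not reproduced here, and the `encoder` and margin are placeholders.

```python
# Illustrative sketch (not the exact TTL objective): triplet margin loss that
# pulls same-identity representations together and pushes different identities apart.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on representation vectors."""
    d_pos = F.pairwise_distance(anchor, positive)     # intra-class distance
    d_neg = F.pairwise_distance(anchor, negative)     # inter-class distance
    return F.relu(d_pos - d_neg + margin).mean()

# Hypothetical usage with a placeholder encoder producing sparse codes:
# codes_a, codes_p, codes_n = encoder(img_a), encoder(img_p), encoder(img_n)
# loss = triplet_loss(codes_a, codes_p, codes_n)
```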

Download here

Exploring Bias in Primate Face Detection and Recognition

Published in ECCV-BEFA, 2018

Deforestation and loss of habitat have resulted in a rapid decline of certain species of primates in forests. On the other hand, uncontrolled growth of a few species of primates in urban areas has led to safety issues and nuisance for local residents. Hence, identifying individual primates has become the need of the hour, not only for conservation and effective mitigation in the wild but also in zoological parks and wildlife sanctuaries. Primate and human faces share many common features, such as the position and shape of the eyes, nose, and mouth. It is therefore worth exploring whether knowledge and recent methods from human face detection and recognition can be extended to primate faces. However, challenges relating to bias that arise with human faces will also occur with primates. The quality and orientation of primate images, along with the range of species from monkeys to gorillas and chimpanzees, will contribute to bias in effective detection and recognition. Experimental results on a primate dataset of over 80 identities show the effect of bias on this research problem.

Download here