Skip to main content

Giedrius Burachas

Followers

68

Following

80

Public Views

Address: United States

less

Jose Garcia-rodriguez

University of Alicante / Universidad de Alicante

Gabriel Kreiman

University of NSW at Canberra

Hochschule Bonn-Rhein-Sieg

Nestor Arana Arexolaleiba

Mondragon University

Lhassane Idoumghar

Université de Haute-Alsace (UHA)

Alexandra Carlson

Interests

Uploads

Papers by Giedrius Burachas

Learning Invariant World State Representations with Predictive Coding

Cornell University - arXiv, Jul 6, 2022

Self-supervised learning methods overcome the key bottleneck for building more capable AI: limite... more Self-supervised learning methods overcome the key bottleneck for building more capable AI: limited availability of labeled data. However, one of the drawbacks of self-supervised architectures is that the representations that they learn are implicit and it is hard to extract meaningful information about the encoded world states, such as 3D structure of the visual scene encoded in a depth map. Moreover, in the visual domain such representations only rarely undergo evaluations that may be critical for downstream tasks, such as vision for autonomous cars. Herein, we propose a framework for evaluating visual representations for illumination invariance in the context of depth perception. We develop a new predictive coding-based architecture and a hybrid fully-supervised/self-supervised learning method. We propose a novel architecture that extends the predictive coding approach: PRedictive Lateral bottom-Up and top-Down Encoder-decoder Network (PreludeNet), which explicitly learns to infer and predict depth from video frames. In PreludeNet, the encoder's stack of predictive coding layers is trained in a self-supervised manner, while the predictive decoder is trained in a supervised manner to infer or predict the depth. We evaluate the robustness of our model on a new synthetic dataset, in which lighting conditions (such as overall illumination, and effect of shadows) can be be parametrically adjusted while keeping all other aspects of the world constant. PreludeNet achieves both competitive depth inference performance and next frame prediction accuracy. We also show how this new network architecture, coupled with the hybrid fully-supervised/self-supervised learning method, achieves balance between the said performance and invariance to changes in lighting. The proposed framework for evaluating visual representations can be extended to diverse task domains and invariance tests. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Modular Adaptation for Cross-Domain Few-Shot Learning

ArXiv, 2021

Adapting pre-trained representations has become the goto recipe for learning new downstream tasks... more Adapting pre-trained representations has become the goto recipe for learning new downstream tasks with limited examples. While literature has demonstrated great successes via representation learning, in this work, we show that substantial performance improvement of downstream tasks can also be achieved by appropriate designs of the adaptation process. Specifically, we propose a modular adaptation method that selectively performs multiple state-of-the-art (SOTA) adaptation methods in sequence. As different downstream tasks may require different types of adaptation, our modular adaptation enables the dynamic configuration of the most suitable modules based on the downstream task. Moreover, as an extension to existing cross-domain 5-way k-shot benchmarks (e.g., miniImageNet→ CUB), we create a new high-way ( ̃100) k-shot benchmark with data from 10 different datasets. This benchmark provides a diverse set of domains and allows the use of stronger representations learned from ImageNet. E...

Metacognitive Mechanisms for Novelty Processing: Lessons for AI

Novelty is central for survival of biological and design of artificial agents. On one hand, cogni... more Novelty is central for survival of biological and design of artificial agents. On one hand, cognitive and neurosciences accumulated large corpus of experimental data addressing diverse mechanisms of novelty detection, response and adaptation. Increasing evidence supporting the Predictive Coding Theory suggests an approach for integrating these diverse empirical findings of novelty research into coherent framework. On the other hand, AI and deeplearning-based machine learning systems in particular, have been mostly developed under the closed world assumption: Their performance is routinely tested using data that is in-distribution relative to training data, which resulted in fragility of these systems in face of open-world novelty. We propose an integrated approach to novelty processing in biological and AI systems, review supporting neurocognitive research and sketch a roadmap for designing novelty-aware AI systems based on Predictive Coding Theory.

A Study on Multimodal and Interactive Explanations for Visual Question Answering

Explainability and interpretability of AI models is an essential factor affecting the safety of A... more Explainability and interpretability of AI models is an essential factor affecting the safety of AI. While various explainable AI (XAI) approaches aim at mitigating the lack of transparency in deep networks, the evidence of the effectiveness of these approaches in improving usability, trust, and understanding of AI systems are still missing. We evaluate multimodal explanations in the setting of a Visual Question Answering (VQA) task, by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors. The results indicate that the explanations help improve human prediction accuracy, especially in trials when the VQA system's answer is inaccurate. Furthermore, we introduce active attention, a novel method for evaluating causal attentional effects through intervention by ed...

Lucid Explanations Help: Using a Human-AI Image-Guessing Game to Evaluate Machine Explanation Helpfulness

While there have been many proposals on how to make AI algorithms more transparent, few have atte... more While there have been many proposals on how to make AI algorithms more transparent, few have attempted to evaluate the impact of AI explanations on human performance on a task using AI. We propose a Twenty-Questions style collaborative image guessing game, Explanation-assisted Guess Which (ExAG) as a method of evaluating the efficacy of explanations in the context of Visual Question Answering (VQA) - the task of answering natural language questions on images. We study the effect of VQA agent explanations on the game performance as a function of explanation type and quality. We observe that "helpful" explanations are conducive to game performance (by almost 22% for "excellent" rated explanation games), and having at least one "correct" explanation is significantly helpful when VQA system answers are mostly noisy (by almost 30% compared to no explanation games). We also see that players develop a preference for explanations even when penalized and that th...

Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval

arXiv: Computers and Society, 2019

While there have been many proposals on making AI algorithms explainable, few have attempted to e... more While there have been many proposals on making AI algorithms explainable, few have attempted to evaluate the impact of AI-generated explanations on human performance in conducting human-AI collaborative tasks. To bridge the gap, we propose a Twenty-Questions style collaborative image retrieval game, Explanation-assisted Guess Which (ExAG), as a method of evaluating the efficacy of explanations (visual evidence or textual justification) in the context of Visual Question Answering (VQA). In our proposed ExAG, a human user needs to guess a secret image picked by the VQA agent by asking natural language questions to it. We show that overall, when AI explains its answers, users succeed more often in guessing the secret image correctly. Notably, a few correct explanations can readily improve human performance when VQA answers are mostly incorrect as compared to no-explanation games. Furthermore, we also show that while explanations rated as "helpful" significantly improve human ...

Lucid Explanations Help: Using a Human-AI Image-Guessing Game to Evaluate Machine Explanation Helpfulness

ArXiv, 2019

While there have been many proposals on how to make AI algorithms more transparent, few have atte... more While there have been many proposals on how to make AI algorithms more transparent, few have attempted to evaluate the impact of AI explanations on human performance on a task using AI. We propose a Twenty-Questions style collaborative image guessing game, Explanation-assisted Guess Which (ExAG) as a method of evaluating the efficacy of explanations in the context of Visual Question Answering (VQA) - the task of answering natural language questions on images. We study the effect of VQA agent explanations on the game performance as a function of explanation type and quality. We observe that "helpful" explanations are conducive to game performance (by almost 22% for "excellent" rated explanation games), and having at least one "correct" explanation is significantly helpful when VQA system answers are mostly noisy (by almost 30% compared to no explanation games). We also see that players develop a preference for explanations even when penalized and that th...

Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention

ArXiv, 2019

In this paper, we present a novel approach for the task of eXplainable Question Answering (XQA), ... more In this paper, we present a novel approach for the task of eXplainable Question Answering (XQA), i.e., generating natural language (NL) explanations for the Visual Question Answering (VQA) problem. We generate NL explanations comprising of the evidence to support the answer to a question asked to an image using two sources of information: (a) annotations of entities in an image (e.g., object labels, region descriptions, relation phrases) generated from the scene graph of the image, and (b) the attention map generated by a VQA model when answering the question. We show how combining the visual attention map with the NL representation of relevant scene graph entities, carefully selected using a language model, can give reasonable textual explanations without the need of any additional collected data (explanation captions, etc). We run our algorithms on the Visual Genome (VG) dataset and conduct internal user-studies to demonstrate the efficacy of our approach over a strong baseline. W...

A Study on Multimodal and Interactive Explanations for Visual Question Answering

ArXiv, 2020

Explainability and interpretability of AI models is an essential factor affecting the safety of A... more Explainability and interpretability of AI models is an essential factor affecting the safety of AI. While various explainable AI (XAI) approaches aim at mitigating the lack of transparency in deep networks, the evidence of the effectiveness of these approaches in improving usability, trust, and understanding of AI systems are still missing. We evaluate multimodal explanations in the setting of a Visual Question Answering (VQA) task, by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors. The results indicate that the explanations help improve human prediction accuracy, especially in trials when the VQA system's answer is inaccurate. Furthermore, we introduce active attention, a novel method for evaluating causal attentional effects through intervention by ed...

Training Deep Spiking Neural Networks

Computation using brain-inspired spiking neural networks (SNNs) with neuromorphic hardware may of... more Computation using brain-inspired spiking neural networks (SNNs) with neuromorphic hardware may offer orders of magnitude higher energy efficiency compared to the current analog neural networks (ANNs). Unfortunately, training SNNs with the same number of layers as state of the art ANNs remains a challenge. To our knowledge the only method which is successful in this regard is supervised training of ANN and then converting it to SNN. In this work we directly train deep SNNs using backpropagation with surrogate gradient and find that due to implicitly recurrent nature of feed forward SNN's the exploding or vanishing gradient problem severely hinders their training. We show that this problem can be solved by tuning the surrogate gradient function. We also propose using batch normalization from ANN literature on input currents of SNN neurons. Using these improvements we show that is is possible to train SNN with ResNet50 architecture on CIFAR100 and Imagenette object recognition data...

Knowing What VQA Does Not: Pointing to Error-Inducing Regions to Improve Explanation Helpfulness

ArXiv, 2021

Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), a... more Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that users are often misled by current attention map visualizations that point to relevant regions despite the model producing an incorrect answer. Hence, we propose Error Maps that clarify the error by highlighting image regions where the model is prone to err. Error maps can indicate when a correctly attended region may be processed incorrectly leading to an incorrect answer, and hence, improve users’ understanding of those cases. To evaluate our new explanations, we further introduce a metric that simulates users’ interpretation of explanations to evaluate their potential helpfulness to understand model correctness. We finally conduct user studies to see that our new explanations help users understand model correctness better than baselines ...

Make Up Your Mind: Towards Consistent Answer Predictions in VQA Models

Visual Question Answering (VQA) involves answering natural language questions on images [1]. Whil... more Visual Question Answering (VQA) involves answering natural language questions on images [1]. While state-of-the-art models can answer such questions satisfactorily well on a standard VQA dataset [1], we observe that it still often makes blatant mistakes while answering questions from slightly different perspectives. This reduces their perceived trust and raises concerns as to whether they can represent the semantics of concepts in questions and images accurately. We introduce the task of belief consistency in VQA models, i.e. the task of answering different semantically-grounded questions about a certain concept consistently. For example, if the answer to “is it a vegetarian pizza?” is “yes”, the answer to “is there pepperoni on the pizza?” should be “no”. Current VQA datasets do not have such multiple consistent questions about a single concept to test the consistency of a model. We propose a simple approach to auto-generate a consistent VQA dataset (ConVQA) on top of Visual Genome...

Generating and Evaluating Explanations of Attended and Error‐Inducing Input Regions for VQA Models

Applied AI Letters, 2021

Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), a... more Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that users are often misled by current attention map visualizations that point to relevant regions despite the model producing an incorrect answer. Hence, we propose Error Maps that clarify the error by highlighting image regions where the model is prone to err. Error maps can indicate when a correctly attended region may be processed incorrectly leading to an incorrect answer, and hence, improve users' understanding of those cases. To evaluate our new explanations, we further introduce a metric that simulates users' interpretation of explanations to evaluate their potential helpfulness to understand model correctness. We finally conduct user studies to see that our new explanations help users understand model correctness better than baselines by an expected 30% and that our proxy helpfulness metrics correlate strongly ( > 0.97) with how well users can predict model correctness.

Improving Users’ Mental Model with Attention‐directed Counterfactual Edits

Applied AI Letters, 2021

In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental... more In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs. In this work, we show that showing controlled counterfactual image-question examples are more effective at improving the mental model of users as compared to simply showing random examples. We compare a generative approach and a retrieval-based approach to show

The Impact of Explanations on AI Competency Prediction in VQA

2020 IEEE International Conference on Humanized Computing and Communication with Artificial Intelligence (HCCAI), 2020

Explainability is one of the key elements for building trust in AI systems. Among numerous attemp... more Explainability is one of the key elements for building trust in AI systems. Among numerous attempts to make AI explainable, quantifying the effect of explanations remains a challenge in conducting human-AI collaborative tasks. Aside from the ability to predict the overall behavior of AI, in many applications, users need to understand an AI agents competency in different aspects of the task domain. In this paper, we evaluate the impact of explanations on the users mental model of AI agent competency within the task of visual question answering (VQA). We quantify users understanding of competency, based on the correlation between the actual system performance and user rankings. We introduce an explainable VQA system that uses spatial and object features and is powered by the BERT language model. Each group of users sees only one kind of explanation to rank the competencies of the VQA model. The proposed model is evaluated through between-subject experiments to probe explanations' impact on the users perception of competency. The comparison between two VQA models shows BERT based explanations and the use of object features improve the users prediction of the models competencies.

Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

While models for Visual Question Answering (VQA) have steadily improved over the years, interacti... more While models for Visual Question Answering (VQA) have steadily improved over the years, interacting with one quickly reveals that these models lack consistency. For instance, if a model answers "red" to "What color is the balloon?", it might answer "no" if asked, "Is the balloon red?". These responses violate simple notions of entailment and raise questions about how effectively VQA models ground language. In this work, we introduce a dataset, ConVQA, and metrics that enable quantitative evaluation of consistency in VQA. For a given observable fact in an image (e.g. the balloon's color), we generate a set of logically consistent question-answer (QA) pairs (e.g. Is the balloon red?) and also collect a humanannotated set of common-sense based consistent QA pairs (e.g. Is the balloon the same color as tomato sauce?). Further, we propose a consistency-improving data augmentation module, a Consistency Teacher Module (CTM). CTM automatically generates entailed (or similar-intent) questions for a source QA pair and fine-tunes the VQA model if the VQA's answer to the entailed question is consistent with the source QA pair. We demonstrate that our CTM-based training improves the consistency of VQA models on the Con-VQA datasets and is a strong baseline for further research.

User centric interface for interaction with visual display that recognizes user intentions

Gauging sensory representations in the brain

Trends in Neurosciences, 1999

The neural substrates of spoken idiom comprehension

Language and Cognitive Processes, 2009

Page 1. The neural substrates of spoken idiom comprehension Dieter G. Hillert and Giedrius T. Bur... more

Modulation of neuronal responses: implications for active vision(Nida, 12-21 July 2000)

Learning Invariant World State Representations with Predictive Coding

Cornell University - arXiv, Jul 6, 2022

Self-supervised learning methods overcome the key bottleneck for building more capable AI: limite... more Self-supervised learning methods overcome the key bottleneck for building more capable AI: limited availability of labeled data. However, one of the drawbacks of self-supervised architectures is that the representations that they learn are implicit and it is hard to extract meaningful information about the encoded world states, such as 3D structure of the visual scene encoded in a depth map. Moreover, in the visual domain such representations only rarely undergo evaluations that may be critical for downstream tasks, such as vision for autonomous cars. Herein, we propose a framework for evaluating visual representations for illumination invariance in the context of depth perception. We develop a new predictive coding-based architecture and a hybrid fully-supervised/self-supervised learning method. We propose a novel architecture that extends the predictive coding approach: PRedictive Lateral bottom-Up and top-Down Encoder-decoder Network (PreludeNet), which explicitly learns to infer and predict depth from video frames. In PreludeNet, the encoder's stack of predictive coding layers is trained in a self-supervised manner, while the predictive decoder is trained in a supervised manner to infer or predict the depth. We evaluate the robustness of our model on a new synthetic dataset, in which lighting conditions (such as overall illumination, and effect of shadows) can be be parametrically adjusted while keeping all other aspects of the world constant. PreludeNet achieves both competitive depth inference performance and next frame prediction accuracy. We also show how this new network architecture, coupled with the hybrid fully-supervised/self-supervised learning method, achieves balance between the said performance and invariance to changes in lighting. The proposed framework for evaluating visual representations can be extended to diverse task domains and invariance tests. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Modular Adaptation for Cross-Domain Few-Shot Learning

ArXiv, 2021

Adapting pre-trained representations has become the goto recipe for learning new downstream tasks... more Adapting pre-trained representations has become the goto recipe for learning new downstream tasks with limited examples. While literature has demonstrated great successes via representation learning, in this work, we show that substantial performance improvement of downstream tasks can also be achieved by appropriate designs of the adaptation process. Specifically, we propose a modular adaptation method that selectively performs multiple state-of-the-art (SOTA) adaptation methods in sequence. As different downstream tasks may require different types of adaptation, our modular adaptation enables the dynamic configuration of the most suitable modules based on the downstream task. Moreover, as an extension to existing cross-domain 5-way k-shot benchmarks (e.g., miniImageNet→ CUB), we create a new high-way ( ̃100) k-shot benchmark with data from 10 different datasets. This benchmark provides a diverse set of domains and allows the use of stronger representations learned from ImageNet. E...

Metacognitive Mechanisms for Novelty Processing: Lessons for AI

Novelty is central for survival of biological and design of artificial agents. On one hand, cogni... more Novelty is central for survival of biological and design of artificial agents. On one hand, cognitive and neurosciences accumulated large corpus of experimental data addressing diverse mechanisms of novelty detection, response and adaptation. Increasing evidence supporting the Predictive Coding Theory suggests an approach for integrating these diverse empirical findings of novelty research into coherent framework. On the other hand, AI and deeplearning-based machine learning systems in particular, have been mostly developed under the closed world assumption: Their performance is routinely tested using data that is in-distribution relative to training data, which resulted in fragility of these systems in face of open-world novelty. We propose an integrated approach to novelty processing in biological and AI systems, review supporting neurocognitive research and sketch a roadmap for designing novelty-aware AI systems based on Predictive Coding Theory.

A Study on Multimodal and Interactive Explanations for Visual Question Answering

Explainability and interpretability of AI models is an essential factor affecting the safety of A... more Explainability and interpretability of AI models is an essential factor affecting the safety of AI. While various explainable AI (XAI) approaches aim at mitigating the lack of transparency in deep networks, the evidence of the effectiveness of these approaches in improving usability, trust, and understanding of AI systems are still missing. We evaluate multimodal explanations in the setting of a Visual Question Answering (VQA) task, by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors. The results indicate that the explanations help improve human prediction accuracy, especially in trials when the VQA system's answer is inaccurate. Furthermore, we introduce active attention, a novel method for evaluating causal attentional effects through intervention by ed...

Lucid Explanations Help: Using a Human-AI Image-Guessing Game to Evaluate Machine Explanation Helpfulness

While there have been many proposals on how to make AI algorithms more transparent, few have atte... more While there have been many proposals on how to make AI algorithms more transparent, few have attempted to evaluate the impact of AI explanations on human performance on a task using AI. We propose a Twenty-Questions style collaborative image guessing game, Explanation-assisted Guess Which (ExAG) as a method of evaluating the efficacy of explanations in the context of Visual Question Answering (VQA) - the task of answering natural language questions on images. We study the effect of VQA agent explanations on the game performance as a function of explanation type and quality. We observe that "helpful" explanations are conducive to game performance (by almost 22% for "excellent" rated explanation games), and having at least one "correct" explanation is significantly helpful when VQA system answers are mostly noisy (by almost 30% compared to no explanation games). We also see that players develop a preference for explanations even when penalized and that th...

Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval

arXiv: Computers and Society, 2019

While there have been many proposals on making AI algorithms explainable, few have attempted to e... more While there have been many proposals on making AI algorithms explainable, few have attempted to evaluate the impact of AI-generated explanations on human performance in conducting human-AI collaborative tasks. To bridge the gap, we propose a Twenty-Questions style collaborative image retrieval game, Explanation-assisted Guess Which (ExAG), as a method of evaluating the efficacy of explanations (visual evidence or textual justification) in the context of Visual Question Answering (VQA). In our proposed ExAG, a human user needs to guess a secret image picked by the VQA agent by asking natural language questions to it. We show that overall, when AI explains its answers, users succeed more often in guessing the secret image correctly. Notably, a few correct explanations can readily improve human performance when VQA answers are mostly incorrect as compared to no-explanation games. Furthermore, we also show that while explanations rated as "helpful" significantly improve human ...

Lucid Explanations Help: Using a Human-AI Image-Guessing Game to Evaluate Machine Explanation Helpfulness

ArXiv, 2019

While there have been many proposals on how to make AI algorithms more transparent, few have atte... more While there have been many proposals on how to make AI algorithms more transparent, few have attempted to evaluate the impact of AI explanations on human performance on a task using AI. We propose a Twenty-Questions style collaborative image guessing game, Explanation-assisted Guess Which (ExAG) as a method of evaluating the efficacy of explanations in the context of Visual Question Answering (VQA) - the task of answering natural language questions on images. We study the effect of VQA agent explanations on the game performance as a function of explanation type and quality. We observe that "helpful" explanations are conducive to game performance (by almost 22% for "excellent" rated explanation games), and having at least one "correct" explanation is significantly helpful when VQA system answers are mostly noisy (by almost 30% compared to no explanation games). We also see that players develop a preference for explanations even when penalized and that th...

Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention

ArXiv, 2019

In this paper, we present a novel approach for the task of eXplainable Question Answering (XQA), ... more In this paper, we present a novel approach for the task of eXplainable Question Answering (XQA), i.e., generating natural language (NL) explanations for the Visual Question Answering (VQA) problem. We generate NL explanations comprising of the evidence to support the answer to a question asked to an image using two sources of information: (a) annotations of entities in an image (e.g., object labels, region descriptions, relation phrases) generated from the scene graph of the image, and (b) the attention map generated by a VQA model when answering the question. We show how combining the visual attention map with the NL representation of relevant scene graph entities, carefully selected using a language model, can give reasonable textual explanations without the need of any additional collected data (explanation captions, etc). We run our algorithms on the Visual Genome (VG) dataset and conduct internal user-studies to demonstrate the efficacy of our approach over a strong baseline. W...

A Study on Multimodal and Interactive Explanations for Visual Question Answering

ArXiv, 2020

Explainability and interpretability of AI models is an essential factor affecting the safety of A... more Explainability and interpretability of AI models is an essential factor affecting the safety of AI. While various explainable AI (XAI) approaches aim at mitigating the lack of transparency in deep networks, the evidence of the effectiveness of these approaches in improving usability, trust, and understanding of AI systems are still missing. We evaluate multimodal explanations in the setting of a Visual Question Answering (VQA) task, by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors. The results indicate that the explanations help improve human prediction accuracy, especially in trials when the VQA system's answer is inaccurate. Furthermore, we introduce active attention, a novel method for evaluating causal attentional effects through intervention by ed...

Training Deep Spiking Neural Networks

Computation using brain-inspired spiking neural networks (SNNs) with neuromorphic hardware may of... more Computation using brain-inspired spiking neural networks (SNNs) with neuromorphic hardware may offer orders of magnitude higher energy efficiency compared to the current analog neural networks (ANNs). Unfortunately, training SNNs with the same number of layers as state of the art ANNs remains a challenge. To our knowledge the only method which is successful in this regard is supervised training of ANN and then converting it to SNN. In this work we directly train deep SNNs using backpropagation with surrogate gradient and find that due to implicitly recurrent nature of feed forward SNN's the exploding or vanishing gradient problem severely hinders their training. We show that this problem can be solved by tuning the surrogate gradient function. We also propose using batch normalization from ANN literature on input currents of SNN neurons. Using these improvements we show that is is possible to train SNN with ResNet50 architecture on CIFAR100 and Imagenette object recognition data...

Knowing What VQA Does Not: Pointing to Error-Inducing Regions to Improve Explanation Helpfulness

ArXiv, 2021

Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), a... more Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that users are often misled by current attention map visualizations that point to relevant regions despite the model producing an incorrect answer. Hence, we propose Error Maps that clarify the error by highlighting image regions where the model is prone to err. Error maps can indicate when a correctly attended region may be processed incorrectly leading to an incorrect answer, and hence, improve users’ understanding of those cases. To evaluate our new explanations, we further introduce a metric that simulates users’ interpretation of explanations to evaluate their potential helpfulness to understand model correctness. We finally conduct user studies to see that our new explanations help users understand model correctness better than baselines ...

Make Up Your Mind: Towards Consistent Answer Predictions in VQA Models

Visual Question Answering (VQA) involves answering natural language questions on images [1]. Whil... more Visual Question Answering (VQA) involves answering natural language questions on images [1]. While state-of-the-art models can answer such questions satisfactorily well on a standard VQA dataset [1], we observe that it still often makes blatant mistakes while answering questions from slightly different perspectives. This reduces their perceived trust and raises concerns as to whether they can represent the semantics of concepts in questions and images accurately. We introduce the task of belief consistency in VQA models, i.e. the task of answering different semantically-grounded questions about a certain concept consistently. For example, if the answer to “is it a vegetarian pizza?” is “yes”, the answer to “is there pepperoni on the pizza?” should be “no”. Current VQA datasets do not have such multiple consistent questions about a single concept to test the consistency of a model. We propose a simple approach to auto-generate a consistent VQA dataset (ConVQA) on top of Visual Genome...

Generating and Evaluating Explanations of Attended and Error‐Inducing Input Regions for VQA Models

Applied AI Letters, 2021

Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), a... more Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that users are often misled by current attention map visualizations that point to relevant regions despite the model producing an incorrect answer. Hence, we propose Error Maps that clarify the error by highlighting image regions where the model is prone to err. Error maps can indicate when a correctly attended region may be processed incorrectly leading to an incorrect answer, and hence, improve users' understanding of those cases. To evaluate our new explanations, we further introduce a metric that simulates users' interpretation of explanations to evaluate their potential helpfulness to understand model correctness. We finally conduct user studies to see that our new explanations help users understand model correctness better than baselines by an expected 30% and that our proxy helpfulness metrics correlate strongly ( > 0.97) with how well users can predict model correctness.

Improving Users’ Mental Model with Attention‐directed Counterfactual Edits

Applied AI Letters, 2021

In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental... more In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs. In this work, we show that showing controlled counterfactual image-question examples are more effective at improving the mental model of users as compared to simply showing random examples. We compare a generative approach and a retrieval-based approach to show

The Impact of Explanations on AI Competency Prediction in VQA

2020 IEEE International Conference on Humanized Computing and Communication with Artificial Intelligence (HCCAI), 2020

Explainability is one of the key elements for building trust in AI systems. Among numerous attemp... more Explainability is one of the key elements for building trust in AI systems. Among numerous attempts to make AI explainable, quantifying the effect of explanations remains a challenge in conducting human-AI collaborative tasks. Aside from the ability to predict the overall behavior of AI, in many applications, users need to understand an AI agents competency in different aspects of the task domain. In this paper, we evaluate the impact of explanations on the users mental model of AI agent competency within the task of visual question answering (VQA). We quantify users understanding of competency, based on the correlation between the actual system performance and user rankings. We introduce an explainable VQA system that uses spatial and object features and is powered by the BERT language model. Each group of users sees only one kind of explanation to rank the competencies of the VQA model. The proposed model is evaluated through between-subject experiments to probe explanations' impact on the users perception of competency. The comparison between two VQA models shows BERT based explanations and the use of object features improve the users prediction of the models competencies.

Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

While models for Visual Question Answering (VQA) have steadily improved over the years, interacti... more While models for Visual Question Answering (VQA) have steadily improved over the years, interacting with one quickly reveals that these models lack consistency. For instance, if a model answers "red" to "What color is the balloon?", it might answer "no" if asked, "Is the balloon red?". These responses violate simple notions of entailment and raise questions about how effectively VQA models ground language. In this work, we introduce a dataset, ConVQA, and metrics that enable quantitative evaluation of consistency in VQA. For a given observable fact in an image (e.g. the balloon's color), we generate a set of logically consistent question-answer (QA) pairs (e.g. Is the balloon red?) and also collect a humanannotated set of common-sense based consistent QA pairs (e.g. Is the balloon the same color as tomato sauce?). Further, we propose a consistency-improving data augmentation module, a Consistency Teacher Module (CTM). CTM automatically generates entailed (or similar-intent) questions for a source QA pair and fine-tunes the VQA model if the VQA's answer to the entailed question is consistent with the source QA pair. We demonstrate that our CTM-based training improves the consistency of VQA models on the Con-VQA datasets and is a strong baseline for further research.

User centric interface for interaction with visual display that recognizes user intentions

Gauging sensory representations in the brain

Trends in Neurosciences, 1999

The neural substrates of spoken idiom comprehension

Language and Cognitive Processes, 2009

Page 1. The neural substrates of spoken idiom comprehension Dieter G. Hillert and Giedrius T. Bur... more

Modulation of neuronal responses: implications for active vision(Nida, 12-21 July 2000)

Modulation of Neuronal Responses: Implications for Active Vision