AI systems sometimes exhibit harmful unintended behaviors post-deployment, often despite extensive diagnostics and debugging by developers. Minimizing risks from such models is challenging because the attack surface is large: it is not tractable to exhaustively search for inputs that may cause a model to fail. Red-teaming and adversarial training (AT) are commonly used to make AI systems more robust, but they have not been sufficient to avoid many real-world failure modes that differ from those adversarially trained on. In this work, we utilize latent adversarial training (LAT) to defend against vulnerabilities without generating inputs that elicit them. LAT leverages the compressed, abstract, and structured latent representations of concepts that the network actually uses for prediction. We use LAT to remove trojans and to defend against held-out classes of adversarial attacks. We show in image classification, text classification, and text generation tasks that LAT usually improves both robustness and performance on clean data relative to AT. This suggests that LAT can be a promising tool for defending against failure modes that developers do not explicitly identify.
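A minimal sketch of the core idea, assuming the model is split into an encoder and a head so that adversarial perturbations can be applied to latent activations rather than inputs; the function and parameter names (encoder, head, epsilon, alpha, pgd_steps) are illustrative, not the paper's exact formulation.

```python
# Sketch of one latent adversarial training (LAT) step in PyTorch.
import torch
import torch.nn.functional as F

def lat_step(encoder, head, x, y, optimizer, epsilon=0.1, alpha=0.05, pgd_steps=3):
    # Clean latent representations; the attack does not update model weights.
    with torch.no_grad():
        z = encoder(x)

    # PGD in latent space: find a bounded perturbation that maximizes the loss.
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(pgd_steps):
        loss = F.cross_entropy(head(z + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)

    # Train encoder and head on the adversarially perturbed latents.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(head(encoder(x) + delta.detach()), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```

The difference from standard AT is only where the perturbation lives: instead of searching over pixels or tokens, the inner loop searches over the latent space the network uses for prediction.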
2023
ICRA; ICCV NeRF
High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning
A robot self-model is a task-agnostic representation of the robot’s physical morphology that can be used for motion planning tasks in absence of classical geometric kinematic models. In particular, when the latter are hard to engineer or the robot’s kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot’s workspace dimension. We demonstrate the capabilities of this model on a motion planning task as an exemplary downstream application.
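A minimal sketch of a configuration-conditioned neural density field, assuming a simple MLP over a 3D query point and the robot's joint configuration; the layer sizes and Fourier positional encoding are illustrative choices, not the paper's exact encoder-based architecture.

```python
# Sketch of a density field conditioned on a high-DOF robot configuration.
import torch
import torch.nn as nn

class ConditionedDensityField(nn.Module):
    def __init__(self, num_dofs=7, hidden=256, num_freqs=6):
        super().__init__()
        self.num_freqs = num_freqs
        point_dim = 3 * 2 * num_freqs            # sin/cos encoding of (x, y, z)
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + num_dofs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                 # scalar density at the query point
        )

    def encode(self, p):
        # NeRF-style Fourier positional encoding of the query points.
        freqs = 2.0 ** torch.arange(self.num_freqs, device=p.device)
        angles = p[..., None] * freqs             # (..., 3, num_freqs)
        return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

    def forward(self, points, config):
        # points: (N, 3) query locations; config: (N, num_dofs) joint angles.
        return self.mlp(torch.cat([self.encode(points), config], dim=-1))
```

Querying such a field over a 3D grid for a given joint configuration yields the occupied volume of the robot in that pose, which is what a motion planner needs for collision checking.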
Neural networks are at the core of AI systems that have recently seen accelerated adoption in high-stakes environments. Consequently, understanding their black-box predictive behavior is paramount. Current explainable AI techniques, however, are limited to explaining a single prediction rather than characterizing the model's inherent ability to be explained, reducing their usefulness to manual inspection of samples. In this work, we offer a conceptual distinction between explanation methods and explainability. Motivated by this distinction, we propose Object-based Explainability (ObEy), a novel model explainability metric that collectively assesses model-produced saliency maps relative to the objects in images, inspired by humans’ perception of scenes. To render ObEy independent of the prediction task, we use full-image instance segmentations obtained from a foundation model, making the metric applicable to existing models in any setting. We demonstrate ObEy’s immediate applicability to use cases in model inspection and comparison. As a result, we present new insights into the explainability of adversarially trained models from a quantitative perspective.
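A minimal sketch of an object-based saliency score in the spirit of ObEy, assuming per-image saliency maps and instance masks from a segmentation model; the aggregation below (fraction of saliency mass inside objects) is an illustrative assumption, not the paper's exact metric.

```python
# Sketch: score how much of a saliency map falls on segmented objects.
import numpy as np

def object_saliency_score(saliency, instance_masks):
    """Fraction of total saliency mass that lies inside segmented objects.

    saliency: (H, W) non-negative saliency map for one image.
    instance_masks: list of (H, W) boolean instance segmentation masks.
    """
    saliency = np.clip(saliency, 0.0, None)
    total = saliency.sum()
    if total == 0 or not instance_masks:
        return 0.0
    objects = np.any(np.stack(instance_masks), axis=0)   # union of all object masks
    return float(saliency[objects].sum() / total)
```

Averaging such a score over a dataset characterizes the model itself rather than any single prediction, which is the distinction between an explanation method and explainability drawn above.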
2021
INFO
Evaluating Error Mitigation Strategies for Entangled Quantum States on Near-Term Quantum Computers
Entanglement is one of the quantum mechanical properties to which the exponential increase in computing power of recently emerging quantum computers is attributed. However, these systems are subject to a set of noise-inducing physical processes and hardware-level imperfections that render the results of quantum circuits erroneous. To bridge the time until sufficient qubits are available to compensate for these effects, quantum error mitigation algorithms aim to improve result accuracy on near-term quantum devices. This empirical investigation describes and compares customary fundamental approaches to error mitigation for entangled quantum states on real quantum computers. It is demonstrated that two readily implementable techniques, concerning circuit design and measurement error mitigation, can lead to a considerable increase in the quality of results.
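A minimal sketch of measurement error mitigation via calibration-matrix inversion, one standard, readily implementable technique of the kind compared here; the non-negative least-squares solve and the toy single-qubit example are illustrative assumptions, not tied to any specific device or library.

```python
# Sketch: correct measured outcome probabilities for readout errors.
import numpy as np
from scipy.optimize import nnls

def mitigate_counts(raw_probs, calibration_matrix):
    """raw_probs: length-2^n vector of measured outcome probabilities.
    calibration_matrix: (2^n, 2^n) matrix whose column j holds the measured
    outcome distribution when basis state |j> was prepared."""
    # Solve calibration_matrix @ p_true ~= raw_probs with p_true >= 0,
    # then renormalize to a valid probability distribution.
    p_true, _ = nnls(calibration_matrix, raw_probs)
    return p_true / p_true.sum()

# Example: single-qubit readout with a 5% chance of reading |1> as |0>.
cal = np.array([[1.00, 0.05],
                [0.00, 0.95]])
measured = np.array([0.525, 0.475])       # noisy estimate of a balanced state
print(mitigate_counts(measured, cal))     # ~ [0.5, 0.5]
```

The calibration matrix is estimated by preparing and measuring each computational basis state; applying its (constrained) inverse to the measured distribution removes much of the readout error without requiring additional qubits.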