Secure and Trustworthy AI

Adversarial Robustness

While AI has made remarkable progress in solving many real-world problems, AI models remain vulnerable to so-called adversarial examples: modifications to valid model inputs that cause the models to produce erroneous or unexpected results. The existence of adversarial examples in state-of-the-art AI models raises serious concerns about their reliability in real-world applications and prevents their use in high-stakes and security-sensitive domains.

At INSAIT, we work both on uncovering new vulnerabilities in state-of-the-art AI models and on enhancing their resilience to such attacks. Our work builds on techniques such as adversarial training, where AI models are trained on adversarial examples to improve their resilience; neural network certification, which verifies a model's resilience and bounds the maximum perturbation that can be applied to an input without causing the model to misbehave; and randomized smoothing, which relies on statistical methods to bound the probability of model errors.
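
To make the first of these techniques concrete, below is a minimal sketch of adversarial training with FGSM perturbations in PyTorch. The model, data loader, optimizer, and the perturbation budget epsilon are illustrative assumptions, not a description of a specific INSAIT system.

```python
# Minimal sketch of adversarial training with FGSM perturbations (PyTorch).
# Model, loader, optimizer, and epsilon are illustrative assumptions.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Generate an FGSM adversarial example within an L-infinity ball of radius epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction of the sign of the loss gradient, then clamp to the valid input range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255):
    model.train()
    for x, y in loader:
        # Craft adversarial examples on the fly and train on them.
        x_adv = fgsm_perturb(model, x, y, epsilon)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

In practice, stronger multi-step attacks (e.g. PGD) are typically used to craft the training-time perturbations; FGSM is shown here only because it keeps the sketch short.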

Researchers involved in this area:

Privacy

The remarkable advancements in AI across various domains can largely be attributed to the rapid increase in the availability of training data. However, in many other areas where AI holds significant potential, the sensitive nature of the required data often discourages parties from sharing it due to privacy concerns, thereby hindering AI progress. Consequently, developing methods to enable AI training while preserving the privacy of individuals’ data presents a highly promising and critical research direction.

At INSAIT, we develop novel methods for training models in privacy-preserving ways and test the privacy protections provided by existing methods. This encompasses developing new Federated Learning protocols and testing the privacy of existing ones, as well as work on differential privacy and on incentivizing data sharing in AI applications.
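
As a simplified illustration of how Federated Learning and differential-privacy-style mechanisms fit together, the sketch below aggregates client model updates after clipping each update and adding Gaussian noise. The clip norm and noise scale are illustrative assumptions and are not calibrated to any formal privacy guarantee.

```python
# Minimal sketch of federated averaging with clipped, noised client updates
# (a simplified, differential-privacy-style aggregation). The clip norm and
# noise scale are illustrative assumptions, not calibrated privacy parameters.
import numpy as np

def aggregate_updates(client_updates, clip_norm=1.0, noise_std=0.1, rng=None):
    """Average client updates after clipping each to a maximum L2 norm
    and adding Gaussian noise to their sum."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append(update * scale)
    total = np.sum(clipped, axis=0)
    noised = total + rng.normal(0.0, noise_std * clip_norm, size=total.shape)
    return noised / len(client_updates)

# Usage: the server applies the aggregated update to the global model parameters.
updates = [np.random.randn(10) for _ in range(5)]  # placeholder client updates
global_step = aggregate_updates(updates)
```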

Researchers involved in this area:


Fairness

Modern AI models are trained on large quantities of real-world data, often scraped from the internet. While this has given them unprecedented knowledge across a large number of domains, it also causes them to implicitly inherit the biases present in their training data. In practice, this has led to AI models that perpetuate existing social inequalities, particularly with respect to sensitive attributes such as race, gender, and socioeconomic status. When deployed, such models have had serious consequences, including discriminatory treatment of people in key areas like healthcare, law enforcement, and finance.

Fairness research aims to alleviate such risks by first quantifying and then explicitly reducing the bias present in AI model decisions. At INSAIT, we work on identifying new sources of bias and bias amplification, on training less biased models by filtering and generating synthetic training data, and on explicit de-biasing during training using alignment and neural network certification techniques.
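
To illustrate the quantification step, the sketch below computes one common bias measure, the demographic parity gap: the difference in positive-prediction rates between two groups defined by a sensitive attribute. The variable names and example data are illustrative only.

```python
# Minimal sketch of measuring bias via the demographic parity gap: the
# difference in positive-prediction rates between two groups defined by a
# binary sensitive attribute. Data and names are illustrative assumptions.
import numpy as np

def demographic_parity_gap(predictions, sensitive_attribute):
    """Absolute difference in positive-prediction rate across two groups."""
    predictions = np.asarray(predictions)
    sensitive_attribute = np.asarray(sensitive_attribute)
    rate_a = predictions[sensitive_attribute == 0].mean()
    rate_b = predictions[sensitive_attribute == 1].mean()
    return abs(rate_a - rate_b)

# Example: binary classifier predictions and a binary sensitive attribute.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, group))  # 0.5 -> substantial disparity
```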

Researchers involved in this area:

Fact Checking

The rapid advancement of Large Language Models (LLMs) has revolutionized the way we interact with technology, enabling machines to generate human-like text and converse with users in a more natural and intuitive way. However, as LLMs become increasingly ubiquitous, concerns about the accuracy and reliability of the information they provide have grown. Due to the so-called hallucination problem, LLMs can generate and spread false or misleading information if not properly validated, which makes fact-checking a crucial aspect of ensuring their trustworthiness. By fact-checking LLMs, we can promote transparency, accountability, and responsible AI development, ultimately helping to build trust in these powerful technologies.

At INSAIT, we are developing technology to detect and prevent the generation of misinformation by LLMs. This includes analyzing the models' decisions during generation to identify parts of the generated text that potentially contain wrong information, comparing the generated information against sources with knowledge authority, such as textbooks, conducting large-scale evaluations of existing models for factuality, and researching reliable ways of augmenting generation with factual information.
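
One simple signal used when analyzing a model's decisions during generation is the per-token probability the model assigns to its own output: spans of consecutive low-probability tokens are candidates for further verification. The sketch below flags such spans; the token list, probabilities, and threshold are illustrative assumptions, and real systems combine this signal with external evidence.

```python
# Minimal sketch of flagging potentially unreliable spans of generated text
# based on the model's own per-token probabilities. Tokens, probabilities, and
# the threshold are illustrative assumptions.

def flag_low_confidence_spans(tokens, token_probs, threshold=0.2):
    """Return (start, end) index ranges of consecutive low-probability tokens."""
    spans, start = [], None
    for i, p in enumerate(token_probs):
        if p < threshold and start is None:
            start = i
        elif p >= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(token_probs)))
    return spans

tokens = ["The", "Eiffel", "Tower", "was", "built", "in", "1922"]
probs  = [0.95,  0.90,    0.92,    0.88,  0.85,   0.80, 0.05]
for s, e in flag_low_confidence_spans(tokens, probs):
    print("verify against a trusted source:", " ".join(tokens[s:e]))  # -> 1922
```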

Researchers involved in this area:


Watermarking

The increasing capabilities of AI models, as well as their recent popularity and accessibility, have raised concerns about their potential misuse for generating human-looking text and images at scale for malicious purposes such as misinformation campaigns or fraudulent academic articles. To mitigate these risks, researchers have been exploring watermarking, which embeds a hidden signature or identifier into a model's outputs. This watermark can be used to track the origin of generated text or images and to detect misuse such as plagiarism or disinformation. Reliable watermarking methods are crucial for the safe and accountable deployment of AI models.

At INSAIT, we work on various aspects of watermarking. These include evaluating and increasing the robustness of watermarks to attacks that aim to remove them from the generated output through common perturbations such as text rephrasing and image compression, as well as their imperceptibility, which requires that watermarked models retain their high utility and their natural-looking and natural-sounding outputs.
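
As a rough illustration of how a text watermark can be detected, the sketch below follows the spirit of hash-based "green list" LLM watermarking schemes: each token's predecessor seeds a pseudo-random split of the vocabulary, and watermarked text over-uses the "green" half, which a z-score test can reveal. The hash construction, split ratio, and decision threshold here are illustrative assumptions, not a specific published scheme.

```python
# Minimal sketch of detecting a "green list" text watermark. The previous token
# pseudo-randomly determines whether the next token counts as "green"; text
# generated with the watermark contains unusually many green tokens.
# Hash choice, split ratio, and threshold are illustrative assumptions.
import hashlib
import math

GREEN_FRACTION = 0.5  # fraction of the vocabulary treated as "green" at each step

def is_green(prev_token, token):
    """Pseudo-randomly assign `token` to the green list based on `prev_token`."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (digest[0] / 255.0) < GREEN_FRACTION

def watermark_z_score(tokens):
    """Z-score of the observed green-token count versus the no-watermark baseline."""
    n = len(tokens) - 1
    greens = sum(is_green(tokens[i], tokens[i + 1]) for i in range(n))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (greens - expected) / std

# A high z-score (e.g. above 4) indicates the text was likely watermarked.
```

Robustness work then asks how far perturbations such as rephrasing can lower this score before detection fails, while imperceptibility work asks how strongly the generator can favor green tokens without degrading output quality.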

Researchers involved in this area: