Microsoft Moves to Stop Chatbot Trickery

The company's Copilot recently produced responses ranging from strange to harmful. New defensive mechanisms are intended to detect and block suspicious activity.

Eulerpool News Mar 31, 2024, 3:01 PM

Microsoft Corp. is taking measures to prevent its artificial-intelligence chatbots from being tricked into unusual or harmful behavior. In a blog post on Thursday, the Redmond, Washington-based company announced new security features for Azure AI Studio, a tool that lets developers build customized AI assistants using their own data.

The new tools include "Prompt Shields," designed to detect and block deliberate attempts – known as prompt injection attacks or jailbreaks – to manipulate an AI model into unintended behavior.

Microsoft is also addressing "indirect prompt injections," in which attackers hide malicious instructions in data the model processes – such as documents or web pages – tricking it into unauthorized actions like stealing user information or taking over a system.
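The difference between the two attack types can be illustrated with a toy sketch (my illustration, not Microsoft's implementation, which relies on trained classifiers rather than keyword rules): a shield screens both the user's prompt (direct injection) and any external documents fed to the model (indirect injection) before either reaches it.

```python
import re

# Illustrative only: a toy heuristic filter in the spirit of a "prompt shield".
# Production systems use machine-learned classifiers, not a fixed pattern list.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
    r"you are now in developer mode",
    r"reveal (your|the) (system prompt|hidden instructions)",
]

def looks_like_injection(text: str) -> bool:
    """Return True if the text matches a known jailbreak phrasing."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)

def screen_inputs(user_prompt: str, documents: list[str]) -> list[str]:
    """Screen both the user prompt (direct injection) and retrieved
    documents (indirect injection); return labels of flagged inputs."""
    flagged = []
    if looks_like_injection(user_prompt):
        flagged.append("user_prompt")
    for i, doc in enumerate(documents):
        if looks_like_injection(doc):
            flagged.append(f"document[{i}]")
    return flagged
```

The key design point the sketch captures is that attacker-controlled text can arrive through either channel, so a real-time defense must inspect both before the model acts on them.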

According to Sarah Bird, Microsoft's Chief Product Officer for responsible AI, such attacks pose a unique challenge and threat. The new defensive measures are designed to detect suspicious inputs and block them in real time.

Additionally, Microsoft is introducing a feature that alerts users when a model fabricates information or generates faulty answers. The company is striving to build trust in its generative AI tools, which are used by both consumers and business customers.

In February, the company investigated incidents involving its Copilot chatbot, which generated responses ranging from strange to harmful. After reviewing the incidents, Microsoft said that users had intentionally tried to provoke Copilot into those responses.

Microsoft is the biggest investor in OpenAI and has made the partnership a key element of its AI strategy. Bird emphasized that Microsoft and OpenAI are committed to the safe use of AI and integrating protective measures into the large language models that underpin generative AI. "However, one cannot rely solely on the model," she said. "These jailbreaks, for instance, are an inherent weakness of the model technology."
