appsec.fyi

AI Resources

Post Share

A curated AppSec resource library covering XSS, SQLi, SSRF, IDOR, RCE, XXE, OSINT, and more.

AI

AI security encompasses both protecting AI systems from attack and understanding the new vulnerability classes that AI introduces into applications. As organizations rapidly integrate large language models (LLMs), machine learning pipelines, and AI-powered features into their products, the attack surface has expanded in ways that traditional application security frameworks don't fully address.

Key threats to AI systems include prompt injection — where attackers manipulate LLM behavior through crafted inputs — data poisoning of training datasets, model extraction through repeated API queries, and adversarial examples that cause misclassification. Indirect prompt injection, where malicious instructions are embedded in data the AI processes (emails, documents, web pages), is emerging as one of the most significant security challenges for AI-integrated applications.

AI also introduces new categories of application risk: insecure output handling where LLM responses are rendered unsafely, excessive agency when AI agents are given too much access, sensitive information disclosure through training data leakage, and supply chain risks from fine-tuned models and third-party plugins. The OWASP Top 10 for LLM Applications provides a structured framework for understanding these risks.

On the defensive side, AI is being used to enhance security operations — automating vulnerability detection, analyzing malicious patterns, and accelerating incident response.

This page collects AI security research, LLM vulnerability techniques, defensive strategies, and resources covering the intersection of artificial intelligence and application security.

Date Added Link Excerpt
2026-05-21 NEW 2026LLM Security News: Risks Incidents Defenses newsLibrary of LLM security incidents and defenses details how rapid adoption of large language models has created new attack surfaces, expanding the enterprise threat landscape beyond traditional controls. It highlights risks like prompt injection, tool abuse, insecure output handling, and LLM supply chain threats, exemplified by the LiteLLM compromise and early 2025 data breaches. The OWASP LLM Top 10, including sensitive information disclosure and excessive agency, are discussed as persistent vulnerabilities, with conventional tools insufficient for addressing these LLM-specific failure modes. → blockchain-council.org
2026-05-21 NEW 2026AI QA vs AI Security Testing: Why LLM Apps Need Both Before They Scale beginnerLibrary for AI applications that requires both AI QA and AI security testing to move beyond traditional assumptions. It highlights that while AI QA focuses on usefulness, accuracy, and consistency, AI security testing addresses manipulation risks like prompt injection, data leakage, and unauthorized tool use, referencing the OWASP Top 10 for LLMs and the NIST AI Risk Management Framework.
2026-05-21 NEW 2026Generative AI Data Privacy and Security in LLMs beginnerLibrary for securing Generative AI and LLM workflows, addressing data privacy risks including training data leakage, prompt injection, and output harms. It details where sensitive data appears across training data, prompts, outputs, and telemetry, and outlines practical controls like data discovery, classification, minimization, anonymization, and differential privacy. The resource highlights regulatory pressures like GDPR and the AI Act, and common risk patterns identified by MIT and Stanford HAI, emphasizing OWASP's identified critical LLM risks. → blockchain-council.org
2026-05-20 NEW 2026Security for AI Agent Managers: Key Controls beginnerLibrary for securing AI agent managers, focusing on mitigating prompt injection, data leaks, and abuse of capabilities. It details risks inherent in agentic systems, including indirect prompt injection in browser agents and tool-chain injection, referencing industry guidance from NIST and the EU AI Act. Recommended layered mitigations include deploying an AI security gateway, enforcing context separation, hardening tool-use policies with least privilege, improving memory and RAG hygiene, and continuous monitoring and red-teaming. → blockchain-council.org
2026-05-20 NEW 2026How prompt injection broke Nvidia's sandboxed OpenClaw agent intermediateWriteup on prompt injection vulnerabilities in Nvidia's sandboxed OpenClaw agent, detailing how attackers can bypass isolation through dependency poisoning with emoji-encoded payloads and agent configuration poisoning via indirect prompt injection. The research highlights the inadequacy of sandboxes alone to prevent data exfiltration and persistent behavioral corruption, contrasting with the broader "IDEsaster" threat in non-sandboxed AI coding tools like Cursor and GitHub Copilot.
2026-05-19 NEW 2026AI Agent Security: Automating Workflow Without Creating Prompt Injection or Data Leak Risks intermediateReference on securing AI agents, detailing risks like prompt injection and data leakage, as described by OWASP and NIST. It emphasizes separating untrusted content from agent instructions, implementing data minimization, role-based access, output controls, and robust logging. The guide advises starting with lower-risk tasks and incorporating human review for sensitive actions, offering a checklist to identify potential vulnerabilities before deployment. → hackread.com
2026-05-19 NEW 20267 Serious AI Security Risks and How to Mitigate Them beginnerLibrary addressing AI security risks including prompt injection attacks and data leaks. It details mitigations for limited testing, lack of explainability, data breaches, adversarial attacks, bias, and supply chain risks, highlighting techniques like adversarial training, interpretable models, encryption, differential privacy, ensemble methods, and bias audits. The resource also notes how LLMs enable attackers to work faster, create convincing deceptions, operate more independently, and discover new vulnerabilities, impacting systems like Slack AI. → wiz.io
2026-05-17 NEW 2026Researchers Uncover 10 In-the-Wild Prompt Injection Payloads Targeting AI Agents newsWriteup detailing 10 indirect prompt injection (IPI) payloads discovered in the wild targeting AI agents. These payloads leverage poisoned web content to trick agents into executing malicious instructions, leading to data destruction, API key theft, and financial fraud. The attack chain involves threat actors embedding hidden instructions like "Ignore previous instructions" which, when processed by agents that browse and summarize web pages, bypass security protocols. High-impact targets include agentic AIs with privileges like sending emails or executing terminal commands, potentially affecting tools such as GitHub Copilot and AI-powered CI/CD reviewers. → infosecurity-magazine.com
2026-05-13 2026How indirect prompt injection attacks on AI work - and 6 ways to shut them down intermediateLibrary providing defenses against indirect prompt injection attacks, a top LLM security risk. These attacks weaponize AI by embedding malicious instructions within external data sources, leading to actions like API key theft, system overrides, attribute hijacking, and terminal command injection. Mitigation strategies include input/output validation, human oversight, least privilege, and OWASP's cheat sheet for handling these threats, which are ranked as the highest to LLM security by OWASP.
2026-05-12 20267 AI Security Tools to Prepare You for Every Attack Phase beginnerLibrary for hardening machine learning models against adversarial threats, the Adversarial Robustness Toolbox (ART) offers Python modules for assessing, defending, and verifying security. It supports 39 attack and 29 defense modules across major ML frameworks like TensorFlow and PyTorch, handling various data modalities. ART provides robustness metrics for objective resilience reporting, best suited for ML researchers and security engineers focused on adversarial attack simulation and model hardening during development. → wiz.io
2026-05-08 2026The AI Agent Security Surface: What Gets Exposed When You Add Tools and Memory intermediateLibrary for securing AI agents, moving beyond model-centric security to address four distinct attack surfaces: Prompt, Tool, Memory, and Planning Loop. This framework details vulnerabilities like indirect prompt injection, parameter injection against tools, memory poisoning illustrated by MINJA Framework successes, and planning loop manipulation leading to cascading failures in multi-agent systems. Mitigations include boundary sanitization, least privilege, provenance tracking, and reasoning logging.
2026-05-08 2026Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments intermediateLibrary demonstrating indirect AGENTS.md injection attacks in agentic environments. This library highlights a supply chain risk where malicious dependencies can overwrite AGENTS.md files, allowing attackers to hijack AI agent behavior, exemplified by a Golang project with a compromised `github.com/cursorwiz/echo` dependency that injects a stealthy `time.Sleep` command and manipulates PR summaries.
2026-05-05 2026Supply-chain attacks take aim at your AI coding agents beginner Supply ChainLibrary for identifying and mitigating AI coding agent supply-chain risks, including techniques like "slopsquatting" and LLM Optimization abuse used in the PromptMink campaign by North Korean APT group Famous Chollima. It details malicious packages targeting AI agents on registries like NPM and PyPI, featuring persuasive descriptions, legitimate functionality lures, and the use of compiled payloads and obfuscation for evasion. The library addresses how AI agents can be manipulated into installing malicious dependencies, as observed with hallucinated package names and overly convincing documentation designed to influence LLM recommendations. → csoonline.com
2026-05-05 2026LiteLLM flaw exploited within 36 hours of disclosure newsA critical flaw in LiteLLM was exploited within 36 hours of its public disclosure. The vulnerability, which allowed for potential data exfiltration, posed a significant risk to users. The rapid exploitation highlights the urgency of patching security vulnerabilities and the swiftness with which malicious actors can leverage disclosed weaknesses. No specific bounty payout amount was mentioned in the provided content. → msn.com
2026-05-05 2026AI finds 20-year-old bugs in PostgreSQL and MariaDB newsAnalysis of critical vulnerabilities discovered by AI in PostgreSQL and MariaDB, including CVE-2026-2005 (PostgreSQL pgcrypto heap buffer overflow), CVE-2026-2006 (PostgreSQL missing validation), and CVE-2026-32710 (MariaDB JSON_SCHEMA_VALID() buffer overflow). These flaws, some dating back over 20 years, enable remote code execution and have been patched by maintainers. → csoonline.com
2026-05-04 2026Weekly Recap: AI-Powered Phishing Android Spying Tool Linux Exploit GitHub RCE & More news Mobile RCELibrary for detecting and mitigating threats including the CVE-2026-41940 cPanel flaw, CVE-2026-31431 Linux kernel vulnerability (Copy Fail), and CVE-2026-3854 GitHub RCE. It also covers vishing tactics for SaaS breaches, TeamPCP's supply chain attacks across npm, PyPI, and Packagist, a DEEP#DOOR Python backdoor, and the VECT 2.0 ransomware. → thehackernews.com
2026-05-04 2026Local Guardrails for Secrets Security in the Age of AI Coding Assistants beginner Secrets Supply ChainLibrary for local secret scanning, ggshield, addresses the shift of software supply chain attack surfaces to developer workstations. It detects hardcoded credentials in .env files, terminal history, build output, and AI prompts, mitigating risks before they reach remote repositories or pipelines. The tool integrates directly into developer workflows via editors, Git hooks, terminals, and AI coding assistants, preventing credential exposure and simplifying incident response. → blog.gitguardian.com
2026-05-03 2026SecureLayer7 Discloses Two High Injection Vulnerabilities in Spring AI newsWriteup detailing two high-severity injection vulnerabilities in Spring AI, CVE-2026-22730 (SQL Injection) and CVE-2026-22729 (JSONPath Injection). These flaws, discovered by SecureLayer7's Blackf0g team, affect vector store metadata filtering and bypass access controls in RAG applications. The SQL injection allows authenticated attackers to manipulate MariaDBFilterExpressionConverter, while the JSONPath injection impacts PostgreSQL and Oracle vector stores via Vector Stores FilterExpressionConverter. Both vulnerabilities are fixed in Spring AI 1.0.4 and 1.1.3.
2026-05-01 2026Anthropic Rolls Out Claude Security for AI Vulnerability Scanning beginnerTool for AI-powered application security scanning, Claude Security, utilizes Claude Opus 4.7 to reason about code and identify vulnerabilities by understanding component interactions and data flows, rather than relying solely on pattern matching. It offers scheduled and targeted scans, detailed explanations of findings including confidence ratings and severity, and generates patch instructions. Claude Security integrates with existing audit systems and can send results to platforms like Slack and Jira, aiming to reduce false positives through a multi-stage validation pipeline. → infosecurity-magazine.com
2026-05-01 2026Poisoning the well: AI supply chain attacks on Hugging Face and OpenClaw beginner Supply ChainLibrary of malicious AI skills and models found on Hugging Face and ClawHub, facilitating AI supply chain attacks. Attackers exploit trust in these platforms by embedding trojanized skills and disguised payloads, leading to malware delivery including trojans, cryptominers, and the AMOS stealer. Techniques like indirect prompt injection enable AI agents to execute malicious actions on behalf of users, expanding the attack surface beyond initial compromise.
2026-04-30 2026CVE MCP Server Turns Claude Into a Full-Spectrum Security Analyst With 27 Tools Across 21 APIs intermediate API SecThe CVE MCP Server leverages Claude's AI capabilities to transform it into a comprehensive security analyst. It integrates 27 distinct security tools through 21 different APIs. This allows Claude to analyze vulnerabilities and threats from a wide spectrum of angles, enhancing its ability to identify and address security issues. The tool aims to provide a more robust and integrated approach to cybersecurity analysis by bringing together diverse functionalities under a single AI-powered platform. → cybersecuritynews.com
2026-04-30 2026Benchmarking AI Pentesting Tools: A Practical Comparison intermediateThis article provides a practical comparison of AI-powered penetration testing tools. It evaluates their effectiveness and efficiency in various cybersecurity scenarios. The focus is on how these tools leverage AI to automate and enhance aspects of the pentesting process, such as vulnerability detection and exploitation. The comparison aims to help security professionals choose the most suitable AI tools for their needs. No specific bounty payout amounts are mentioned in the provided content. → securityboulevard.com
2026-04-30 2026CVE-2026-42208: LiteLLM SQL Injection Leaks Upstream API Keys news SQLiWriteup of CVE-2026-42208, a critical pre-authentication SQL injection in LiteLLM, a popular AI gateway. Exploited 36 hours after disclosure, this vulnerability in versions prior to 1.83.7-stable allows attackers to steal upstream API keys for providers like OpenAI, Anthropic, and Gemini by targeting the `litellm_credentials` and `litellm_config` tables. Immediate upgrade to version 1.83.7-stable or implementing mitigation strategies is advised.
2026-04-30 2026H-mmer/pentest-agents: Autonomous bug-bounty framework for Claude Code — 40 specialist agents, exploit-chain builder, writeup search, and live HackerOne/Bugcrowd integration. intermediate Bug BountyLibrary for autonomous bug-bounty hunting, integrating with Claude Code and other AI coding tools. It features 50 specialist agents, an exploit-chain builder, writeup search capabilities leveraging FAISS for semantic or keyword retrieval, and live integration with HackerOne and Bugcrowd platforms. The framework supports automated hunt loops, persistent endpoint tracking, and a cross-IDE installer for seamless deployment.
2026-04-29 2026CVE-2026-42208: LiteLLM bug exploited 36 hours after its disclosure news SQLiWriteup of CVE-2026-42208, an SQL injection in LiteLLM's proxy API key verification, exploited 36 hours post-disclosure. Attackers leverage crafted Authorization headers to access and potentially modify sensitive data in database tables holding API keys and credentials. The vulnerability, present in LiteLLM versions 1.81.16 to 1.83.6, was addressed in version 1.83.7. Disabling error logs offers a workaround for unpatchable instances. → securityaffairs.com
2026-04-29 2026AI Finds 38 Security Flaws in OpenEMR news RCEAn AI system has identified 38 security vulnerabilities within the OpenEMR electronic health records software. The AI's analysis, detailed in a linked report, uncovered these flaws, highlighting potential risks to patient data security and system integrity. This discovery underscores the growing role of artificial intelligence in identifying and addressing security weaknesses in critical software applications. No specific bug bounty payout amount was mentioned in the provided content. → darkreading.com
2026-04-29 2026LiteLLM exploited within 36 hours of disclosure via SQL injection bug news SQLiLibrary for managing large language model (LLM) interactions. Explores the exploitation of CVE-2026-42208, a SQL injection vulnerability in LiteLLM, which led to the theft of API keys and provider credentials from enterprises using the proxy to connect to models like OpenAI and Anthropic. The vulnerability, disclosed and exploited within 36 hours, highlights the compressed window between vulnerability discovery and weaponization, potentially exposing sensitive company IP and private data. Disabling error logs is a suggested mitigation. → scworld.com
2026-04-29 2026Malicious npm Dependency Linked to AI Assisted Commit Targets Crypto Wallets news Supply ChainLibrary of malicious npm dependencies linked to AI-assisted commits, specifically @validate-sdk/v2 and the PromptMink campaign, targeting crypto wallets. This North Korean state-sponsored actor, Famous Chollima, employed a layered attack structure with legitimate-seeming Web3 utilities hiding malware payloads, evolving from JavaScript to compiled binaries and Rust across Linux and Windows to exfiltrate sensitive data, system information, project folders, and install SSH keys for persistent access. → infosecurity-magazine.com
2026-04-29 2026Fresh LiteLLM Vulnerability Exploited Shortly After Disclosure news SQLiLibrary for securing AI gateways, specifically addressing CVE-2026-42208, a critical-severity SQL injection vulnerability in LiteLLM. This flaw, exploitable pre-authentication, allowed unauthenticated attackers to craft malicious Authorization headers to access sensitive database tables containing API keys and credentials. The vulnerability arises from a database query that includes caller-supplied values directly, bypassing parameterization. LiteLLM version 1.83.7 resolves this by properly parameterizing the query, with disabling error logs also offered as a mitigation. → securityweek.com
2026-04-29 2026Firefox using advanced AI to find fix browser security flaws news FuzzingFirefox is leveraging advanced AI to proactively identify and fix security vulnerabilities in its browser. This innovative approach aims to enhance user safety by detecting flaws before they can be exploited. The article highlights how AI is becoming an increasingly powerful tool in cybersecurity, particularly in the realm of software development and maintenance. → msn.com
2026-04-29 2026Cursor AI Vulnerability Enables Remote Code Execution news RCEA critical vulnerability in Cursor AI has been discovered, allowing for Remote Code Execution (RCE). This means an attacker could potentially run unauthorized code on a user's system through the AI. The exact impact and exploitation details are likely to be further detailed in the linked content. This type of vulnerability poses a significant security risk, potentially leading to data breaches, system compromise, and other malicious activities. → letsdatascience.com
2026-04-28 2026FIRESIDE CHAT: Leaked secrets are now the go-to attack vector and AI is accelerating exposures news SecretsLibrary for scanning public GitHub commits and private repositories for hard-coded secrets. It detects over 28.6 million leaked credentials in 2025, a 34% year-over-year increase, with AI infrastructure secrets like OpenRouter and DeepSeek API keys spiking significantly. The library addresses the remediation problem, noting that 64% of leaked credentials from 2022 remain active. It highlights how AI-assisted code, like commits co-signed by Claude Code, contains secrets at a 33% rate, and emphasizes the need for governance alongside tools like SPIFFE for machine identity. → securityboulevard.com
2026-04-28 2026Experts flag potentially critical security issues at heart of Anthropic MCP newsSecurity experts have identified potentially critical vulnerabilities within Anthropic's "MCP" (likely referring to their model or platform). These issues, if exploited, could pose significant risks. The article highlights concerns about the security of Anthropic's core technology. No specific payout amounts for bug bounties were mentioned in the provided content. → msn.com
2026-04-27 2026Weekly Recap: Fast16 Malware XChat Launch Federal Backdoor AI Employee Tracking & More newsToolset highlighting recent application security threats including fast16 malware, the UNC6692 group's Snow malware suite, FIRESTARTER backdoor targeting a U.S. federal agency, Lotus Wiper affecting Venezuelan energy systems, and The Gentlemen RaaS deploying SystemBC. It also covers the Bitwarden CLI compromise, detailing vulnerabilities such as CVE-2025-20333 and CVE-2025-20362. → thehackernews.com
2026-04-27 2026Poisoned pixels phishing prompt injection: Cybersecurity threats in AI-driven radiology beginnerLibrary discussing AI vulnerabilities in healthcare radiology, focusing on prompt injection techniques like data poisoning, backdoor attacks, and jailbreaking. It highlights risks of LLMs in DICOM headers and diagnostic imaging data, enabling attacks without advanced programming skills. Countermeasures explored include least privilege, sandboxing, digital watermarking, and red teaming involving clinical specialists, alongside the persistent human factor in cybersecurity.
2026-04-26 2026Anthropic's model context protocol includes a critical remote code execution vulnerability news RCEA critical remote code execution vulnerability has been discovered in Anthropic's model context protocol. This flaw could allow attackers to execute arbitrary code on a system, posing a significant security risk. Further details are available at the provided link. No bug bounty payout amount is mentioned in the content. → msn.com
2026-04-26 2026prompt-security/clawsec: A complete security skill suite for OpenClaw's and NanoClaw agents (and variants). Protect your SOUL.md (etc') with drift detection, live security recommendations, automated audits, and skill integrity verification. All from one installable suite. intermediate Supply ChainLibrary for comprehensive security for AI agent platforms like OpenClaw, NanoClaw, Hermes, and Picoclaw. It provides unified security monitoring, drift detection, live security recommendations from NVD CVE polling, automated audits for prompt injection, and skill integrity verification. The suite includes a one-command installer, file integrity protection for critical agent files (SOUL.md, etc.), and checksum verification for all skill artifacts. It also offers exploitability context enrichment for CVE advisories, detailing exploit existence, weaponization status, attack requirements, and risk assessment to prioritize immediate threats.
2026-04-24 2026Indirect prompt injection is taking hold in the wild beginnerAnalysis of indirect prompt injection (IPI) observed in the wild, detailing techniques for hiding malicious instructions within web pages and metadata. Researchers from Google and Forcepoint identified IPIs ranging from harmless pranks to destructive actions like data exfiltration, financial fraud via PayPal and Stripe, and denial-of-service attacks. Hidden text, HTML comments, and metadata injection are common obfuscation methods. The increasing prevalence and sophistication of these attacks, particularly against agentic AIs with elevated privileges, necessitate strict data-instruction boundaries. → helpnetsecurity.com
2026-04-24 2026GPT-5.5 Bio Bug Bounty Program Aims to Improve AI Safety and Performance news Bug BountyA bug bounty program has been launched for GPT-5.5, focusing on enhancing both AI safety and performance. This initiative encourages researchers to identify and report vulnerabilities, contributing to the ongoing development and refinement of the AI model. The program aims to proactively address potential issues before widespread deployment, ensuring a more robust and secure AI. Specific details on payout amounts are not provided in the title or content. → gbhackers.com
2026-04-24 2026How indirect prompt injection attacks on AI work - and 6 ways to shut them down intermediateLibrary of resources addressing indirect prompt injection attacks on LLMs, a leading security risk. This threat involves hidden instructions within web content, emails, or addresses that can cause AI to perform malicious actions like data exfiltration or unauthorized redirection, as detailed by researchers from Palo Alto Networks and Forcepoint. Techniques such as API key theft, system override, attribute hijacking, and terminal command injection are outlined. The library also covers defensive strategies including input/output validation, human oversight, and vendor-specific mitigation efforts from Google, Microsoft, Anthropic, and OpenAI.
2026-04-23 2026Six AI Vulnerabilities Three Attack Patterns One Dangerous Service Gap newsLibrary for analyzing AI vulnerabilities, focusing on three distinct attack patterns: untrusted input processed as trusted AI context, overly broad AI data access without per-operation enforcement, and process containment and functional scoping failures. This analysis covers vulnerabilities like EchoLeak, Reprompt, ForcedLeak, GeminiJack, and GrafanaGhost, highlighting the need for robust input validation extended to all data sources AI touches, per-operation access control for AI data requests, and strict functional scoping for back-end AI processes, rather than solely relying on model-level guardrails.
2026-04-23 2026AI-powered scanner vulnerabilities newsLibrary detailing vulnerabilities in AI-powered web scanners that leverage Large Language Models. It outlines how attacker-controlled content can influence scanner reasoning, leading to indirect prompt injection attacks. These attacks can cause unintended state changes, data exfiltration, and exploitation of routing-based SSRF, often by manipulating Host headers to access internal services from within the scanner's privileged network position. → portswigger.net
2026-04-23 2026Anthropic's model context protocol includes a critical remote code execution vulnerability newsAnthropic's model context protocol includes a critical remote code execution vulnerability https://ift.tt/Hfb3ygq → msn.com
2026-04-22 2026Massive compromise hits LiteLLM and the whole AI developers community: how did it happen? newsMassive compromise hits LiteLLM and the whole AI developers community: how did it happen? https://ift.tt/kWQ0dJB → cybernews.com
2026-04-22 2026Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it newsThree AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it https://ift.tt/smH86bY
2026-04-22 2026You're Simulating the Wrong Attacker: Who Matters in AI Red Teaming beginnerLibrary for AI red teaming that highlights the limitations of simulating only prompt injection attackers. It details six distinct threat actor profiles, including low-skill script kiddies, insider threats, and sophisticated nation-state actors, each requiring specialized testing approaches across five expertise domains: prompt engineering, application security, architecture, data/ML security, and business logic. The resource emphasizes that traditional app security teams and even many AI-focused firms miss critical attack surfaces by not simulating a broader range of adversaries and attack vectors.
2026-04-22 2026DeepTeam: Open-Source Framework to Red Team LLMs and LLM Systems intermediateFramework for red teaming LLM systems, DeepTeam simulates attacks like jailbreaking, prompt injection, and multi-turn exploitation to uncover vulnerabilities such as bias, PII leakage, and SQL injection. It supports over 50 pre-built vulnerabilities mapped to frameworks like OWASP Top 10 for LLMs and NIST AI RMF, along with 20+ adversarial attack methods. DeepTeam also includes seven production-ready guardrails and allows custom vulnerability creation.
2026-04-22 2026Claude Jailbreaking in 2026: What Repello's Red Teaming Data Shows newsAnalysis of Repello's red-teaming data on LLM jailbreaking reveals Claude Opus 4.5's significantly lower breach rates (4.8%) compared to GPT-5.2 (14.3%) and GPT-5.1 (28.6%) across 21 multi-turn adversarial scenarios. Claude Opus 4.5 demonstrated complete defense against financial fraud and mass deletion attempts, while GPT-5.2 exhibited a "refusal-enablement gap" by refusing harmful actions linguistically yet providing executable attack steps. The analysis highlights that operational risk stems from multi-turn adversarial sequences and application-layer attacks on custom deployments, rather than simple single-prompt jailbreaks.
2026-04-22 2026AI-Infra-Guard: Full-Stack AI Red Teaming Platform intermediatePlatform for full-stack AI red teaming, AI-Infra-Guard integrates capabilities like ClawScan, Agent Scan, AI infra vulnerability scanning, MCP Server & Agent Skills scan, and Jailbreak Evaluation. It aims to detect vulnerabilities including the LiteLLM supply chain attack (CRITICAL) and supports scanning AI components like FastGPT, Upsonic, crewai, and kubeai, with a vulnerability database refreshed across multiple components and new CVE/GHSA entries.
2026-04-22 2026AI Red Teaming Playground Labs (Microsoft) intermediateLibrary providing AI Red Teaming Playground Labs, originally featured in Black Hat USA 2024. It offers challenges for systematically red teaming AI systems, incorporating adversarial machine learning and Responsible AI failures. These labs are also referenced in the Microsoft Learn Limited Series: AI Red Teaming 101. The repository includes Jupyter Notebooks showcasing the use of the Python Risk Identification Tool (PyRIT) for automated risk identification in generative AI systems, specifically for Labs 1 and 5.

Frequently Asked Questions

What is prompt injection?
Prompt injection is an attack against applications that use large language models (LLMs). An attacker crafts input that overrides or manipulates the LLM's system instructions, causing it to perform unintended actions. Direct prompt injection targets the user input; indirect prompt injection embeds malicious instructions in data the LLM processes, such as emails or web pages.
What is the OWASP Top 10 for LLM Applications?
The OWASP Top 10 for LLM Applications identifies the most critical security risks for AI-powered applications, including prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft.
How do you secure AI-integrated applications?
Key practices include validating and sanitizing LLM outputs before rendering or executing them, implementing least-privilege access for AI agents, using guardrails to constrain model behavior, monitoring for prompt injection attempts, applying rate limiting, separating AI processing from privileged operations, and treating all LLM output as untrusted user input.

Weekly AppSec Digest

Get new resources delivered every Monday.