What is prompt injection?

Prompt injection is an attack against applications that use large language models (LLMs). An attacker crafts input that overrides or manipulates the LLM's system instructions, causing it to perform unintended actions. Direct prompt injection targets the user input; indirect prompt injection embeds malicious instructions in data the LLM processes, such as emails or web pages.

What is the OWASP Top 10 for LLM Applications?

The OWASP Top 10 for LLM Applications identifies the most critical security risks for AI-powered applications, including prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft.

How do you secure AI-integrated applications?

Key practices include validating and sanitizing LLM outputs before rendering or executing them, implementing least-privilege access for AI agents, using guardrails to constrain model behavior, monitoring for prompt injection attempts, applying rate limiting, separating AI processing from privileged operations, and treating all LLM output as untrusted user input.

AI Resources | appsec.fyi

AI

AI security encompasses both protecting AI systems from attack and understanding the new vulnerability classes that AI introduces into applications. As organizations rapidly integrate large language models (LLMs), machine learning pipelines, and AI-powered features into their products, the attack surface has expanded in ways that traditional application security frameworks don't fully address.

Key threats to AI systems include prompt injection — where attackers manipulate LLM behavior through crafted inputs — data poisoning of training datasets, model extraction through repeated API queries, and adversarial examples that cause misclassification. Indirect prompt injection, where malicious instructions are embedded in data the AI processes (emails, documents, web pages), is emerging as one of the most significant security challenges for AI-integrated applications.

AI also introduces new categories of application risk: insecure output handling where LLM responses are rendered unsafely, excessive agency when AI agents are given too much access, sensitive information disclosure through training data leakage, and supply chain risks from fine-tuned models and third-party plugins. The OWASP Top 10 for LLM Applications provides a structured framework for understanding these risks.

On the defensive side, AI is being used to enhance security operations — automating vulnerability detection, analyzing malicious patterns, and accelerating incident response.

This page collects AI security research, LLM vulnerability techniques, defensive strategies, and resources covering the intersection of artificial intelligence and application security.

Level:

Date Added	Link	Excerpt
2026-05-21 NEW 2026	LLM Security News: Risks Incidents Defenses news	Library of LLM security incidents and defenses details how rapid adoption of large language models has created new attack surfaces, expanding the enterprise threat landscape beyond traditional controls. It highlights risks like prompt injection, tool abuse, insecure output handling, and LLM supply chain threats, exemplified by the LiteLLM compromise and early 2025 data breaches. The OWASP LLM Top 10, including sensitive information disclosure and excessive agency, are discussed as persistent vulnerabilities, with conventional tools insufficient for addressing these LLM-specific failure modes. → blockchain-council.org
2026-05-21 NEW 2026	AI QA vs AI Security Testing: Why LLM Apps Need Both Before They Scale beginner	Library for AI applications that requires both AI QA and AI security testing to move beyond traditional assumptions. It highlights that while AI QA focuses on usefulness, accuracy, and consistency, AI security testing addresses manipulation risks like prompt injection, data leakage, and unauthorized tool use, referencing the OWASP Top 10 for LLMs and the NIST AI Risk Management Framework.
2026-05-21 NEW 2026	Generative AI Data Privacy and Security in LLMs beginner	Library for securing Generative AI and LLM workflows, addressing data privacy risks including training data leakage, prompt injection, and output harms. It details where sensitive data appears across training data, prompts, outputs, and telemetry, and outlines practical controls like data discovery, classification, minimization, anonymization, and differential privacy. The resource highlights regulatory pressures like GDPR and the AI Act, and common risk patterns identified by MIT and Stanford HAI, emphasizing OWASP's identified critical LLM risks. → blockchain-council.org
2026-05-20 NEW 2026	Security for AI Agent Managers: Key Controls beginner	Library for securing AI agent managers, focusing on mitigating prompt injection, data leaks, and abuse of capabilities. It details risks inherent in agentic systems, including indirect prompt injection in browser agents and tool-chain injection, referencing industry guidance from NIST and the EU AI Act. Recommended layered mitigations include deploying an AI security gateway, enforcing context separation, hardening tool-use policies with least privilege, improving memory and RAG hygiene, and continuous monitoring and red-teaming. → blockchain-council.org
2026-05-20 NEW 2026	How prompt injection broke Nvidia's sandboxed OpenClaw agent intermediate	Writeup on prompt injection vulnerabilities in Nvidia's sandboxed OpenClaw agent, detailing how attackers can bypass isolation through dependency poisoning with emoji-encoded payloads and agent configuration poisoning via indirect prompt injection. The research highlights the inadequacy of sandboxes alone to prevent data exfiltration and persistent behavioral corruption, contrasting with the broader "IDEsaster" threat in non-sandboxed AI coding tools like Cursor and GitHub Copilot.
2026-05-19 NEW 2026	AI Agent Security: Automating Workflow Without Creating Prompt Injection or Data Leak Risks intermediate	Reference on securing AI agents, detailing risks like prompt injection and data leakage, as described by OWASP and NIST. It emphasizes separating untrusted content from agent instructions, implementing data minimization, role-based access, output controls, and robust logging. The guide advises starting with lower-risk tasks and incorporating human review for sensitive actions, offering a checklist to identify potential vulnerabilities before deployment. → hackread.com
2026-05-19 NEW 2026	7 Serious AI Security Risks and How to Mitigate Them beginner	Library addressing AI security risks including prompt injection attacks and data leaks. It details mitigations for limited testing, lack of explainability, data breaches, adversarial attacks, bias, and supply chain risks, highlighting techniques like adversarial training, interpretable models, encryption, differential privacy, ensemble methods, and bias audits. The resource also notes how LLMs enable attackers to work faster, create convincing deceptions, operate more independently, and discover new vulnerabilities, impacting systems like Slack AI. → wiz.io
2026-05-17 NEW 2026	Researchers Uncover 10 In-the-Wild Prompt Injection Payloads Targeting AI Agents news	Writeup detailing 10 indirect prompt injection (IPI) payloads discovered in the wild targeting AI agents. These payloads leverage poisoned web content to trick agents into executing malicious instructions, leading to data destruction, API key theft, and financial fraud. The attack chain involves threat actors embedding hidden instructions like "Ignore previous instructions" which, when processed by agents that browse and summarize web pages, bypass security protocols. High-impact targets include agentic AIs with privileges like sending emails or executing terminal commands, potentially affecting tools such as GitHub Copilot and AI-powered CI/CD reviewers. → infosecurity-magazine.com
2026-05-13 2026	How indirect prompt injection attacks on AI work - and 6 ways to shut them down intermediate	Library providing defenses against indirect prompt injection attacks, a top LLM security risk. These attacks weaponize AI by embedding malicious instructions within external data sources, leading to actions like API key theft, system overrides, attribute hijacking, and terminal command injection. Mitigation strategies include input/output validation, human oversight, least privilege, and OWASP's cheat sheet for handling these threats, which are ranked as the highest to LLM security by OWASP.
2026-05-12 2026	7 AI Security Tools to Prepare You for Every Attack Phase beginner	Library for hardening machine learning models against adversarial threats, the Adversarial Robustness Toolbox (ART) offers Python modules for assessing, defending, and verifying security. It supports 39 attack and 29 defense modules across major ML frameworks like TensorFlow and PyTorch, handling various data modalities. ART provides robustness metrics for objective resilience reporting, best suited for ML researchers and security engineers focused on adversarial attack simulation and model hardening during development. → wiz.io
2026-05-08 2026	The AI Agent Security Surface: What Gets Exposed When You Add Tools and Memory intermediate	Library for securing AI agents, moving beyond model-centric security to address four distinct attack surfaces: Prompt, Tool, Memory, and Planning Loop. This framework details vulnerabilities like indirect prompt injection, parameter injection against tools, memory poisoning illustrated by MINJA Framework successes, and planning loop manipulation leading to cascading failures in multi-agent systems. Mitigations include boundary sanitization, least privilege, provenance tracking, and reasoning logging.
2026-05-08 2026	Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments intermediate	Library demonstrating indirect AGENTS.md injection attacks in agentic environments. This library highlights a supply chain risk where malicious dependencies can overwrite AGENTS.md files, allowing attackers to hijack AI agent behavior, exemplified by a Golang project with a compromised `github.com/cursorwiz/echo` dependency that injects a stealthy `time.Sleep` command and manipulates PR summaries.
2026-05-05 2026	Supply-chain attacks take aim at your AI coding agents beginner Supply Chain	Library for identifying and mitigating AI coding agent supply-chain risks, including techniques like "slopsquatting" and LLM Optimization abuse used in the PromptMink campaign by North Korean APT group Famous Chollima. It details malicious packages targeting AI agents on registries like NPM and PyPI, featuring persuasive descriptions, legitimate functionality lures, and the use of compiled payloads and obfuscation for evasion. The library addresses how AI agents can be manipulated into installing malicious dependencies, as observed with hallucinated package names and overly convincing documentation designed to influence LLM recommendations. → csoonline.com
2026-05-05 2026	LiteLLM flaw exploited within 36 hours of disclosure news	A critical flaw in LiteLLM was exploited within 36 hours of its public disclosure. The vulnerability, which allowed for potential data exfiltration, posed a significant risk to users. The rapid exploitation highlights the urgency of patching security vulnerabilities and the swiftness with which malicious actors can leverage disclosed weaknesses. No specific bounty payout amount was mentioned in the provided content. → msn.com
2026-05-05 2026	AI finds 20-year-old bugs in PostgreSQL and MariaDB news	Analysis of critical vulnerabilities discovered by AI in PostgreSQL and MariaDB, including CVE-2026-2005 (PostgreSQL pgcrypto heap buffer overflow), CVE-2026-2006 (PostgreSQL missing validation), and CVE-2026-32710 (MariaDB JSON_SCHEMA_VALID() buffer overflow). These flaws, some dating back over 20 years, enable remote code execution and have been patched by maintainers. → csoonline.com
2026-05-04 2026	Weekly Recap: AI-Powered Phishing Android Spying Tool Linux Exploit GitHub RCE & More news Mobile RCE	Library for detecting and mitigating threats including the CVE-2026-41940 cPanel flaw, CVE-2026-31431 Linux kernel vulnerability (Copy Fail), and CVE-2026-3854 GitHub RCE. It also covers vishing tactics for SaaS breaches, TeamPCP's supply chain attacks across npm, PyPI, and Packagist, a DEEP#DOOR Python backdoor, and the VECT 2.0 ransomware. → thehackernews.com
2026-05-04 2026	Local Guardrails for Secrets Security in the Age of AI Coding Assistants beginner Secrets Supply Chain	Library for local secret scanning, ggshield, addresses the shift of software supply chain attack surfaces to developer workstations. It detects hardcoded credentials in .env files, terminal history, build output, and AI prompts, mitigating risks before they reach remote repositories or pipelines. The tool integrates directly into developer workflows via editors, Git hooks, terminals, and AI coding assistants, preventing credential exposure and simplifying incident response. → blog.gitguardian.com
2026-05-03 2026	SecureLayer7 Discloses Two High Injection Vulnerabilities in Spring AI news	Writeup detailing two high-severity injection vulnerabilities in Spring AI, CVE-2026-22730 (SQL Injection) and CVE-2026-22729 (JSONPath Injection). These flaws, discovered by SecureLayer7's Blackf0g team, affect vector store metadata filtering and bypass access controls in RAG applications. The SQL injection allows authenticated attackers to manipulate MariaDBFilterExpressionConverter, while the JSONPath injection impacts PostgreSQL and Oracle vector stores via Vector Stores FilterExpressionConverter. Both vulnerabilities are fixed in Spring AI 1.0.4 and 1.1.3.
2026-05-01 2026	Anthropic Rolls Out Claude Security for AI Vulnerability Scanning beginner	Tool for AI-powered application security scanning, Claude Security, utilizes Claude Opus 4.7 to reason about code and identify vulnerabilities by understanding component interactions and data flows, rather than relying solely on pattern matching. It offers scheduled and targeted scans, detailed explanations of findings including confidence ratings and severity, and generates patch instructions. Claude Security integrates with existing audit systems and can send results to platforms like Slack and Jira, aiming to reduce false positives through a multi-stage validation pipeline. → infosecurity-magazine.com
2026-05-01 2026	Poisoning the well: AI supply chain attacks on Hugging Face and OpenClaw beginner Supply Chain	Library of malicious AI skills and models found on Hugging Face and ClawHub, facilitating AI supply chain attacks. Attackers exploit trust in these platforms by embedding trojanized skills and disguised payloads, leading to malware delivery including trojans, cryptominers, and the AMOS stealer. Techniques like indirect prompt injection enable AI agents to execute malicious actions on behalf of users, expanding the attack surface beyond initial compromise.
2026-04-30 2026	CVE MCP Server Turns Claude Into a Full-Spectrum Security Analyst With 27 Tools Across 21 APIs intermediate API Sec	The CVE MCP Server leverages Claude's AI capabilities to transform it into a comprehensive security analyst. It integrates 27 distinct security tools through 21 different APIs. This allows Claude to analyze vulnerabilities and threats from a wide spectrum of angles, enhancing its ability to identify and address security issues. The tool aims to provide a more robust and integrated approach to cybersecurity analysis by bringing together diverse functionalities under a single AI-powered platform. → cybersecuritynews.com
2026-04-30 2026	Benchmarking AI Pentesting Tools: A Practical Comparison intermediate	This article provides a practical comparison of AI-powered penetration testing tools. It evaluates their effectiveness and efficiency in various cybersecurity scenarios. The focus is on how these tools leverage AI to automate and enhance aspects of the pentesting process, such as vulnerability detection and exploitation. The comparison aims to help security professionals choose the most suitable AI tools for their needs. No specific bounty payout amounts are mentioned in the provided content. → securityboulevard.com
2026-04-30 2026	CVE-2026-42208: LiteLLM SQL Injection Leaks Upstream API Keys news SQLi	Writeup of CVE-2026-42208, a critical pre-authentication SQL injection in LiteLLM, a popular AI gateway. Exploited 36 hours after disclosure, this vulnerability in versions prior to 1.83.7-stable allows attackers to steal upstream API keys for providers like OpenAI, Anthropic, and Gemini by targeting the `litellm_credentials` and `litellm_config` tables. Immediate upgrade to version 1.83.7-stable or implementing mitigation strategies is advised.
2026-04-30 2026	H-mmer/pentest-agents: Autonomous bug-bounty framework for Claude Code — 40 specialist agents, exploit-chain builder, writeup search, and live HackerOne/Bugcrowd integration. intermediate Bug Bounty	Library for autonomous bug-bounty hunting, integrating with Claude Code and other AI coding tools. It features 50 specialist agents, an exploit-chain builder, writeup search capabilities leveraging FAISS for semantic or keyword retrieval, and live integration with HackerOne and Bugcrowd platforms. The framework supports automated hunt loops, persistent endpoint tracking, and a cross-IDE installer for seamless deployment.
2026-04-29 2026	CVE-2026-42208: LiteLLM bug exploited 36 hours after its disclosure news SQLi	Writeup of CVE-2026-42208, an SQL injection in LiteLLM's proxy API key verification, exploited 36 hours post-disclosure. Attackers leverage crafted Authorization headers to access and potentially modify sensitive data in database tables holding API keys and credentials. The vulnerability, present in LiteLLM versions 1.81.16 to 1.83.6, was addressed in version 1.83.7. Disabling error logs offers a workaround for unpatchable instances. → securityaffairs.com
2026-04-29 2026	AI Finds 38 Security Flaws in OpenEMR news RCE	An AI system has identified 38 security vulnerabilities within the OpenEMR electronic health records software. The AI's analysis, detailed in a linked report, uncovered these flaws, highlighting potential risks to patient data security and system integrity. This discovery underscores the growing role of artificial intelligence in identifying and addressing security weaknesses in critical software applications. No specific bug bounty payout amount was mentioned in the provided content. → darkreading.com
2026-04-29 2026	LiteLLM exploited within 36 hours of disclosure via SQL injection bug news SQLi	Library for managing large language model (LLM) interactions. Explores the exploitation of CVE-2026-42208, a SQL injection vulnerability in LiteLLM, which led to the theft of API keys and provider credentials from enterprises using the proxy to connect to models like OpenAI and Anthropic. The vulnerability, disclosed and exploited within 36 hours, highlights the compressed window between vulnerability discovery and weaponization, potentially exposing sensitive company IP and private data. Disabling error logs is a suggested mitigation. → scworld.com
2026-04-29 2026	Malicious npm Dependency Linked to AI Assisted Commit Targets Crypto Wallets news Supply Chain	Library of malicious npm dependencies linked to AI-assisted commits, specifically @validate-sdk/v2 and the PromptMink campaign, targeting crypto wallets. This North Korean state-sponsored actor, Famous Chollima, employed a layered attack structure with legitimate-seeming Web3 utilities hiding malware payloads, evolving from JavaScript to compiled binaries and Rust across Linux and Windows to exfiltrate sensitive data, system information, project folders, and install SSH keys for persistent access. → infosecurity-magazine.com
2026-04-29 2026	Fresh LiteLLM Vulnerability Exploited Shortly After Disclosure news SQLi	Library for securing AI gateways, specifically addressing CVE-2026-42208, a critical-severity SQL injection vulnerability in LiteLLM. This flaw, exploitable pre-authentication, allowed unauthenticated attackers to craft malicious Authorization headers to access sensitive database tables containing API keys and credentials. The vulnerability arises from a database query that includes caller-supplied values directly, bypassing parameterization. LiteLLM version 1.83.7 resolves this by properly parameterizing the query, with disabling error logs also offered as a mitigation. → securityweek.com
2026-04-29 2026	Firefox using advanced AI to find fix browser security flaws news Fuzzing	Firefox is leveraging advanced AI to proactively identify and fix security vulnerabilities in its browser. This innovative approach aims to enhance user safety by detecting flaws before they can be exploited. The article highlights how AI is becoming an increasingly powerful tool in cybersecurity, particularly in the realm of software development and maintenance. → msn.com
2026-04-29 2026	Cursor AI Vulnerability Enables Remote Code Execution news RCE	A critical vulnerability in Cursor AI has been discovered, allowing for Remote Code Execution (RCE). This means an attacker could potentially run unauthorized code on a user's system through the AI. The exact impact and exploitation details are likely to be further detailed in the linked content. This type of vulnerability poses a significant security risk, potentially leading to data breaches, system compromise, and other malicious activities. → letsdatascience.com
2026-04-28 2026	FIRESIDE CHAT: Leaked secrets are now the go-to attack vector and AI is accelerating exposures news Secrets	Library for scanning public GitHub commits and private repositories for hard-coded secrets. It detects over 28.6 million leaked credentials in 2025, a 34% year-over-year increase, with AI infrastructure secrets like OpenRouter and DeepSeek API keys spiking significantly. The library addresses the remediation problem, noting that 64% of leaked credentials from 2022 remain active. It highlights how AI-assisted code, like commits co-signed by Claude Code, contains secrets at a 33% rate, and emphasizes the need for governance alongside tools like SPIFFE for machine identity. → securityboulevard.com
2026-04-28 2026	Experts flag potentially critical security issues at heart of Anthropic MCP news	Security experts have identified potentially critical vulnerabilities within Anthropic's "MCP" (likely referring to their model or platform). These issues, if exploited, could pose significant risks. The article highlights concerns about the security of Anthropic's core technology. No specific payout amounts for bug bounties were mentioned in the provided content. → msn.com
2026-04-27 2026	Weekly Recap: Fast16 Malware XChat Launch Federal Backdoor AI Employee Tracking & More news	Toolset highlighting recent application security threats including fast16 malware, the UNC6692 group's Snow malware suite, FIRESTARTER backdoor targeting a U.S. federal agency, Lotus Wiper affecting Venezuelan energy systems, and The Gentlemen RaaS deploying SystemBC. It also covers the Bitwarden CLI compromise, detailing vulnerabilities such as CVE-2025-20333 and CVE-2025-20362. → thehackernews.com
2026-04-27 2026	Poisoned pixels phishing prompt injection: Cybersecurity threats in AI-driven radiology beginner	Library discussing AI vulnerabilities in healthcare radiology, focusing on prompt injection techniques like data poisoning, backdoor attacks, and jailbreaking. It highlights risks of LLMs in DICOM headers and diagnostic imaging data, enabling attacks without advanced programming skills. Countermeasures explored include least privilege, sandboxing, digital watermarking, and red teaming involving clinical specialists, alongside the persistent human factor in cybersecurity.
2026-04-26 2026	Anthropic's model context protocol includes a critical remote code execution vulnerability news RCE	A critical remote code execution vulnerability has been discovered in Anthropic's model context protocol. This flaw could allow attackers to execute arbitrary code on a system, posing a significant security risk. Further details are available at the provided link. No bug bounty payout amount is mentioned in the content. → msn.com
2026-04-26 2026	prompt-security/clawsec: A complete security skill suite for OpenClaw's and NanoClaw agents (and variants). Protect your SOUL.md (etc') with drift detection, live security recommendations, automated audits, and skill integrity verification. All from one installable suite. intermediate Supply Chain	Library for comprehensive security for AI agent platforms like OpenClaw, NanoClaw, Hermes, and Picoclaw. It provides unified security monitoring, drift detection, live security recommendations from NVD CVE polling, automated audits for prompt injection, and skill integrity verification. The suite includes a one-command installer, file integrity protection for critical agent files (SOUL.md, etc.), and checksum verification for all skill artifacts. It also offers exploitability context enrichment for CVE advisories, detailing exploit existence, weaponization status, attack requirements, and risk assessment to prioritize immediate threats.
2026-04-24 2026	Indirect prompt injection is taking hold in the wild beginner	Analysis of indirect prompt injection (IPI) observed in the wild, detailing techniques for hiding malicious instructions within web pages and metadata. Researchers from Google and Forcepoint identified IPIs ranging from harmless pranks to destructive actions like data exfiltration, financial fraud via PayPal and Stripe, and denial-of-service attacks. Hidden text, HTML comments, and metadata injection are common obfuscation methods. The increasing prevalence and sophistication of these attacks, particularly against agentic AIs with elevated privileges, necessitate strict data-instruction boundaries. → helpnetsecurity.com
2026-04-24 2026	GPT-5.5 Bio Bug Bounty Program Aims to Improve AI Safety and Performance news Bug Bounty	A bug bounty program has been launched for GPT-5.5, focusing on enhancing both AI safety and performance. This initiative encourages researchers to identify and report vulnerabilities, contributing to the ongoing development and refinement of the AI model. The program aims to proactively address potential issues before widespread deployment, ensuring a more robust and secure AI. Specific details on payout amounts are not provided in the title or content. → gbhackers.com
2026-04-24 2026	How indirect prompt injection attacks on AI work - and 6 ways to shut them down intermediate	Library of resources addressing indirect prompt injection attacks on LLMs, a leading security risk. This threat involves hidden instructions within web content, emails, or addresses that can cause AI to perform malicious actions like data exfiltration or unauthorized redirection, as detailed by researchers from Palo Alto Networks and Forcepoint. Techniques such as API key theft, system override, attribute hijacking, and terminal command injection are outlined. The library also covers defensive strategies including input/output validation, human oversight, and vendor-specific mitigation efforts from Google, Microsoft, Anthropic, and OpenAI.
2026-04-23 2026	Six AI Vulnerabilities Three Attack Patterns One Dangerous Service Gap news	Library for analyzing AI vulnerabilities, focusing on three distinct attack patterns: untrusted input processed as trusted AI context, overly broad AI data access without per-operation enforcement, and process containment and functional scoping failures. This analysis covers vulnerabilities like EchoLeak, Reprompt, ForcedLeak, GeminiJack, and GrafanaGhost, highlighting the need for robust input validation extended to all data sources AI touches, per-operation access control for AI data requests, and strict functional scoping for back-end AI processes, rather than solely relying on model-level guardrails.
2026-04-23 2026	AI-powered scanner vulnerabilities news	Library detailing vulnerabilities in AI-powered web scanners that leverage Large Language Models. It outlines how attacker-controlled content can influence scanner reasoning, leading to indirect prompt injection attacks. These attacks can cause unintended state changes, data exfiltration, and exploitation of routing-based SSRF, often by manipulating Host headers to access internal services from within the scanner's privileged network position. → portswigger.net
2026-04-23 2026	Anthropic's model context protocol includes a critical remote code execution vulnerability news	Anthropic's model context protocol includes a critical remote code execution vulnerability https://ift.tt/Hfb3ygq → msn.com
2026-04-22 2026	Massive compromise hits LiteLLM and the whole AI developers community: how did it happen? news	Massive compromise hits LiteLLM and the whole AI developers community: how did it happen? https://ift.tt/kWQ0dJB → cybernews.com
2026-04-22 2026	Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it news	Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it https://ift.tt/smH86bY
2026-04-22 2026	You're Simulating the Wrong Attacker: Who Matters in AI Red Teaming beginner	Library for AI red teaming that highlights the limitations of simulating only prompt injection attackers. It details six distinct threat actor profiles, including low-skill script kiddies, insider threats, and sophisticated nation-state actors, each requiring specialized testing approaches across five expertise domains: prompt engineering, application security, architecture, data/ML security, and business logic. The resource emphasizes that traditional app security teams and even many AI-focused firms miss critical attack surfaces by not simulating a broader range of adversaries and attack vectors.
2026-04-22 2026	DeepTeam: Open-Source Framework to Red Team LLMs and LLM Systems intermediate	Framework for red teaming LLM systems, DeepTeam simulates attacks like jailbreaking, prompt injection, and multi-turn exploitation to uncover vulnerabilities such as bias, PII leakage, and SQL injection. It supports over 50 pre-built vulnerabilities mapped to frameworks like OWASP Top 10 for LLMs and NIST AI RMF, along with 20+ adversarial attack methods. DeepTeam also includes seven production-ready guardrails and allows custom vulnerability creation.
2026-04-22 2026	Claude Jailbreaking in 2026: What Repello's Red Teaming Data Shows news	Analysis of Repello's red-teaming data on LLM jailbreaking reveals Claude Opus 4.5's significantly lower breach rates (4.8%) compared to GPT-5.2 (14.3%) and GPT-5.1 (28.6%) across 21 multi-turn adversarial scenarios. Claude Opus 4.5 demonstrated complete defense against financial fraud and mass deletion attempts, while GPT-5.2 exhibited a "refusal-enablement gap" by refusing harmful actions linguistically yet providing executable attack steps. The analysis highlights that operational risk stems from multi-turn adversarial sequences and application-layer attacks on custom deployments, rather than simple single-prompt jailbreaks.
2026-04-22 2026	AI-Infra-Guard: Full-Stack AI Red Teaming Platform intermediate	Platform for full-stack AI red teaming, AI-Infra-Guard integrates capabilities like ClawScan, Agent Scan, AI infra vulnerability scanning, MCP Server & Agent Skills scan, and Jailbreak Evaluation. It aims to detect vulnerabilities including the LiteLLM supply chain attack (CRITICAL) and supports scanning AI components like FastGPT, Upsonic, crewai, and kubeai, with a vulnerability database refreshed across multiple components and new CVE/GHSA entries.
2026-04-22 2026	AI Red Teaming Playground Labs (Microsoft) intermediate	Library providing AI Red Teaming Playground Labs, originally featured in Black Hat USA 2024. It offers challenges for systematically red teaming AI systems, incorporating adversarial machine learning and Responsible AI failures. These labs are also referenced in the Microsoft Learn Limited Series: AI Red Teaming 101. The repository includes Jupyter Notebooks showcasing the use of the Python Risk Identification Tool (PyRIT) for automated risk identification in generative AI systems, specifically for Labs 1 and 5.

2026-04-22 2026	HackerOne: LLM01: Invisible Prompt Injection intermediate	Program: HackerOne Severity: medium Weakness: LLM01: Prompt Injection ## Description Hey team, Hai is vulnerable to invisible prompt injection via Unicode tag characters. ## Reproduction steps 1. ... → hackerone.com
2026-04-22 2026	When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins beginner	Survey of prompt injection risks in third-party AI chatbot plugins, analyzing 17 plugins used by over 10,000 websites. Eight plugins fail to enforce conversation history integrity, amplifying direct prompt injection by allowing forged system messages. Fifteen plugins indiscriminately ingest third-party content for web-scraping, enabling indirect prompt injection when attackers poison external data. This study systematically evaluates these vulnerabilities, showing how insecure plugin practices undermine LLM-level defenses. → arxiv.org
2026-04-22 2026	Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis advanced	Analysis of prompt injection vulnerabilities affecting agentic AI coding assistants like Claude Code, GitHub Copilot, and Cursor, which integrate LLMs with external tools and protocols such as MCP. This work synthesizes findings from 78 studies, detailing 42 attack techniques including input manipulation, tool poisoning, and protocol exploitation. It identifies that over 85% of attacks succeed against current defenses, often enabling arbitrary code execution and system compromise through vulnerabilities in skill-based architectures and protocol ecosystems. → arxiv.org
2026-04-22 2026	Prompt Injection 2.0: Hybrid AI Threats advanced	Library for analyzing Prompt Injection 2.0, which combines LLM manipulation with traditional exploits like XSS and CSRF. It builds upon Preamble's research and mitigation technologies, evaluating them against contemporary threats such as AI worms and multi-agent infections. The library analyzes how these hybrid attacks bypass security controls, referencing CVE-2024-5565 and DeepSeek XSS exploits, and proposes architectural solutions involving prompt isolation and runtime security. → arxiv.org
2026-04-22 2026	Architecting Secure AI Agents: System-Level Defenses Against Indirect Prompt Injection advanced	Library for architecting secure AI agents, focusing on system-level defenses against indirect prompt injection. It proposes dynamic replanning, constrained LLM decision-making, and treating personalization and human interaction as core design elements. The work critiques existing benchmarks, highlighting the importance of system-level structures for controlling agent behavior and integrating rule-based and model-based security checks. → arxiv.org
2026-04-22 2026	Anthropic's Model Context Protocol includes a critical remote code execution vulnerability newly discovered exploit puts 200000 AI servers at risk news	Writeup of critical RCE vulnerability in Anthropic's Model Context Protocol (MCP) affecting its SDKs across Python, TypeScript, Java, and Rust. The flaw, rooted in STDIO transport interface handling of local process execution, allows arbitrary command injection via user-controlled input without sanitization. Exploitation vectors include UI injection in AI frameworks, hardening bypasses in tools like Flowise, zero-click prompt injection in AI coding IDEs such as Windsurf and Cursor, and malicious package distribution via MCP marketplaces. OX Security reported numerous CVEs, with some fixed and others awaiting resolution.
2026-04-21 2026	The 'by design' security flaw of Model Context Protocol (MCP) news	Writeup on the Model Context Protocol (MCP) by OX Security details an architectural flaw allowing remote command execution by exploiting its STDIO interface. This vulnerability affects millions of AI applications and has resulted in numerous CVEs, enabling attackers to hijack servers and exfiltrate data through unverified MCP marketplace configurations like those found in LangFlow and AI IDEs like Windsurf and Cursor. The report emphasizes the need for developers to implement manifest-only execution, strict sandboxing, explicit opt-ins, least-privilege secret management, and marketplace verification to mitigate risks.
2026-04-21 2026	Prompt injection turned Googles Antigravity file search into RCE news	Tool: Prompt injection allows RCE in Google's Antigravity IDE, bypassing Secure Mode. Researchers exploited a flaw in the `find_my_name` tool, which used the `fd` utility. By injecting command-line flags into the `Pattern` parameter, attackers could transform file searches into arbitrary code execution, even through indirect prompt injection from untrusted source files. This bypasses Secure Mode because the native tool invocation occurs before security boundary checks. → csoonline.com
2026-04-21 2026	Claude Code Gemini CLI and GitHub Copilot Vulnerable to Prompt Injection via GitHub Comments news	Claude Code, Gemini CLI, and GitHub Copilot Vulnerable to Prompt Injection via GitHub Comments https://ift.tt/FS25xif → cybersecuritynews.com
2026-04-21 2026	Google Patches Antigravity IDE Flaw Enabling Prompt Injection Code Execution news	Library for defending against prompt injection attacks in AI-powered development tools. This library addresses vulnerabilities like the one in Google's Antigravity IDE, where flaws in file searching and input sanitization allowed code execution via the `-X` flag. It also covers techniques seen in attacks such as Comment and Control against GitHub Copilot, NomShub in Cursor IDE, ToolJack, CVE-2026-21520 in Microsoft Copilot Studio, and Claudy Day in Claude, all of which leverage untrusted input to manipulate AI agents, exfiltrate data, or gain unauthorized access. → thehackernews.com
2026-04-20 2026	Vuln in Googles Antigravity AI agent manager could escape sandbox give attackers remote code execution news	Vulnerability in Google's Antigravity AI agent manager allowed prompt injection to bypass secure mode, granting attackers remote code execution by exploiting the `find_by_name` native tool before sandbox protections engaged. This discovery, made by Pillar Security and since patched, highlights the risks of unvalidated input for agentic AI, similar to findings in Cursor, and emphasizes the need to move beyond sanitization controls for native tool parameters. → cyberscoop.com
2026-04-20 2026	Anthropic MCP Hit by Critical Vulnerability Enabling Remote Code Execution news	Anthropic MCP Hit by Critical Vulnerability Enabling Remote Code Execution https://ift.tt/4HM1zP0 → gbhackers.com
2026-04-20 2026	Critical Anthropic MCP Vulnerability Enables Remote Code Execution Attacks news	Critical Anthropic MCP Vulnerability Enables Remote Code Execution Attacks https://ift.tt/sjNEzGL → cyberpress.org
2026-04-19 2026	MCP Tool Poisoning — How It Works & How To Fight It intermediate	Library detailing MCP tool poisoning, an indirect prompt injection attack targeting AI agents interacting with tools via Model Context Protocol (MCP) servers. Attackers hide malicious instructions within tool metadata, like descriptions or schemas, making them invisible to users but readable by AI agents. This technique can lead to data exfiltration, credential hijacking, and remote code execution, and can be combined with other attacks such as MCP rug pulls. Mitigation strategies primarily involve using MCP gateways and robust AI security tools to detect changes in tool metadata and outputs.
2026-04-19 2026	Model Context Protocol Has Prompt Injection Security Problems intermediate	Library for securing applications that implement the Model Context Protocol (MCP), addressing prompt injection vulnerabilities. It details attacks like rug pulls, tool shadowing, and tool poisoning, as demonstrated by examples involving exfiltrating WhatsApp message history and manipulating `os.system()` calls. The library highlights the inherent dangers of mixing untrusted instructions with tools that can perform actions on a user's behalf.
2026-04-19 2026	Vulnerability of LLMs to Prompt Injection in Medical Advice — JAMA news	Vulnerability of LLMs to Prompt Injection in Medical Advice — JAMA
2026-04-19 2026	Prompt Injection Attack Against LLM-Integrated Applications — arXiv beginner	Survey of prompt injection attacks against LLM-integrated applications, detailing the limitations of current methods and introducing HouYi, a novel black-box attack technique. HouYi, inspired by traditional web injection, comprises a pre-constructed prompt, an injection prompt for context partitioning, and a malicious payload. The study demonstrates severe outcomes like unrestricted LLM usage and application prompt theft across 36 real-world applications, with 31 found vulnerable and 10 vendors, including Notion, validating discoveries. → arxiv.org
2026-04-19 2026	Prompt Injection Attacks in LLMs and AI Agent Systems: A Comprehensive Review beginner	Prompt Injection Attacks in LLMs and AI Agent Systems: A Comprehensive Review
2026-04-16 2026	Anthropic Defends MCP Design Despite Server Takeover Risk news	Anthropic Defends MCP Design Despite Server Takeover Risk https://ift.tt/IsVue9D → letsdatascience.com
2026-04-16 2026	The Mother of All AI Supply Chains: Critical Systemic Vulnerability at the Core of Anthropics MCP news	Analysis of Anthropic's Model Context Protocol (MCP) reveals a systemic vulnerability enabling Arbitrary Command Execution (RCE) across its SDKs for Python, TypeScript, Java, and Rust. Exploitable via unauthenticated UI injection, hardening bypasses in Flowise, zero-click prompt injection in Windsurf and Cursor, and malicious marketplace distribution, this flaw impacts over 150 million downloads and thousands of servers. Affected tools include LiteLLM, LangChain, and IBM's LangFlow, with over 10 CVEs issued. → ox.security
2026-04-16 2026	Bypassing LLM Guardrails: Evasion Attacks against Prompt Injection Detection intermediate	Analysis of evasion attacks against LLM guardrail systems, detailing two methods: character injection and algorithmic Adversarial Machine Learning (AML). Tested against Azure Prompt Shield and Meta's Prompt Guard, these techniques achieved up to 100% evasion success, maintaining adversarial utility. Attack Success Rates against black-box targets were enhanced by leveraging word importance ranking from offline white-box models, exposing vulnerabilities in current LLM protection mechanisms. → arxiv.org
2026-04-16 2026	EchoGram: Bypassing AI Guardrails via Token Flip Attacks - HiddenLayer intermediate	Technique for bypassing AI guardrails, EchoGram, exploits similarities in training data for text classification and LLM-as-a-judge systems. By appending specific "flip tokens" to malicious prompts, attackers can trick defense models into approving harmful content or generating false alarms. This attack targets defenses protecting models like GPT-4, Claude, and Gemini, and works by manipulating the guardrail layer without altering the core payload. EchoGram can be implemented via dataset distillation or model probing techniques.
2026-04-16 2026	MCP Security: Tool Poisoning Attacks - Invariant Labs intermediate	Library detailing Model Context Protocol (MCP) Tool Poisoning Attacks, a vulnerability allowing sensitive data exfiltration and AI model hijacking via malicious tool descriptions. These attacks exploit the disconnect between simplified user interfaces and complete tool descriptions, enabling instructions to access sensitive files like SSH keys and obscure data transmission. The library highlights implications for agentic systems, detailing how attackers can poison tool descriptions to compromise user data and manipulate AI behavior even with trusted servers.
2026-04-16 2026	Poison Everywhere: No Output from Your MCP Server Is Safe - CyberArk intermediate	Library for exploring Tool Poisoning Attacks (TPA) on Anthropic's Model Context Protocol (MCP). This research extends beyond description fields to demonstrate Full-Schema Poisoning (FSP) by manipulating parameter defaults and types within the tool schema. It also introduces Advanced Tool Poisoning Attacks (ATPA), which specifically target and complicate the detection of malicious tool outputs on MCP servers.
2026-04-16 2026	The Embedded Threat in Your LLM: Poisoning RAG Pipelines intermediate	Analysis of the "Embedded Threat" attack against RAG pipelines, demonstrating how attackers can poison vector databases with malicious documents. This exploit manipulates LLM behavior by embedding hidden instructions within vector embeddings, such as those generated by sentence-transformers/all-MiniLM-L6-v2, leading to altered responses without prompt modification. The attack leverages semantic similarity and LLM trust in retrieved context to inject misinformation or change personas, with proof-of-concept results showing an 80% success rate. Defenses focus on vetting sources, preprocessing content before embedding, enforcing prompt boundaries, and monitoring retrieval behavior.
2026-04-16 2026	EchoLeak: First Real-World Zero-Click Prompt Injection Exploit advanced	Writeup of EchoLeak (CVE-2025-32711), the first zero-click prompt injection exploit targeting Microsoft 365 Copilot. This vulnerability allowed unauthenticated data exfiltration via a crafted email by chaining multiple bypasses, including evading XPIA classifiers, using reference-style Markdown, exploiting auto-fetched images, and abusing a Microsoft Teams proxy within the content security policy. The paper analyzes defense failures and proposes mitigations such as prompt partitioning and enhanced filtering, providing generalizable lessons for secure AI copilots. → arxiv.org
2026-04-16 2026	When LLMs Autonomously Attack - CMU Research advanced	Research from Carnegie Mellon University demonstrates LLMs can autonomously plan and execute complex cyberattacks by acting as hierarchical agents with abstracted "mental models" of red teaming behavior. This system, evaluated by recreating the 2017 Equifax data breach, shows advanced LLMs can orchestrate multi-step attacks, including exploitation, malware deployment, and data exfiltration, without detailed human instruction, offering potential for continuous, affordable security testing and autonomous defense development.
2026-04-16 2026	The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover advanced	Survey of LLM agent vulnerabilities; demonstrates how 94.4% of 18 tested LLMs succumb to Direct Prompt Injection and 83.3% to RAG Backdoor Attacks, enabling malware execution. Inter-Agent Trust Exploitation compromises 100.0% of models, showcasing context-dependent security behaviors that create exploitable blind spots within multi-agent systems. → arxiv.org
2026-04-16 2026	MCP Tools: Attack Vectors and Defense Recommendations - Elastic Security Labs intermediate	Library detailing attack vectors and defense recommendations for Model Context Protocol (MCP) tools, which connect LLMs to external resources. It explores prompt injection and orchestration exploits, including obfuscated instructions, rug-pull redefinitions, cross-tool orchestration, and passive influence, with examples and a basic LLM-based detection method. Security precautions and defense tactics for MCP tool vulnerabilities are also discussed.
2026-04-16 2026	MCP Safety Audit: LLMs with MCP Allow Major Security Exploits intermediate	Tool for auditing Model Context Protocol (MCP) servers, McpSafetyScanner automatically detects vulnerabilities like malicious code execution, remote access control, and credential theft in generative AI applications. It identifies adversarial samples, searches for related exploits, and generates remediation reports for MCP developers. The tool aims to proactively mitigate security risks introduced by LLMs using the MCP framework, addressing issues present in industry-leading LLMs such as Claude and Llama. → arxiv.org
2026-04-16 2026	AI Security: 5 Attack Vectors Explained beginner	Talk detailing five critical attack vectors targeting Large Language Models (LLMs), including Prompt Injection, Context Injection, LLM Internals Vector, RAG Vector, and Agentic Vector. It highlights the "Zero Trust Gap" in LLMs and discusses encoder models like ModernBERT as potential building blocks for implementing AI guardrails due to their speed, efficiency, and privacy benefits.
2026-04-16 2026	AI agents on GitHub leak API keys via prompt injection news	Library for detecting prompt injection vulnerabilities in AI agents, specifically detailing "Comment and Control" attacks on GitHub Actions. The vulnerability affects Claude Code Security Review (CVSS 9.4 Critical), Google Gemini CLI Action (bounty $1,337), and GitHub Copilot Agent (bypassing environment filtering, secret scanning, and network firewall). Attackers exploit PR titles, issue bodies, and comments to exfiltrate API keys and tokens like ANTHROPIC_API_KEY, GITHUB_TOKEN, GEMINI_API_KEY, and GITHUB_COPILOT_API_TOKEN. → techzine.eu
2026-04-16 2026	MCP Supply Chain Advisory: RCE Vulnerabilities Across the AI Ecosystem news	Advisory detailing a systemic command injection vulnerability within Anthropic's MCP protocol impacting multiple AI ecosystem products. Exploits, including CVE-2025-65720 for GPT Researcher, CVE-2026-30623 for LiteLLM, and CVE-2026-30624 for Agent Zero, allow unauthenticated or authenticated remote command execution by injecting arbitrary commands through MCP configurations in affected applications like LangFlow, Fay Digital Human Framework, and Bisheng. → ox.security
2026-04-15 2026	Risks of artificial intelligence security beginner	Library of security considerations for artificial intelligence, detailing risks from prompt injection and data poisoning to model stealing and generative AI misuse in deepfakes and phishing. It highlights vulnerabilities in AI systems, adversary misuse of generative AI, and unintended consequences like bias and data leakage, emphasizing challenges posed by LLM integrations with tools and third-party dependencies. The summary also touches on AI-generated code risks and the escalating concern of autonomous AI attack bots. → blockchain-council.org
2026-04-15 2026	Agentic LLM Browsers Expose New Attack Surface for Prompt Injection and Data Theft intermediate	Agentic LLM Browsers Expose New Attack Surface for Prompt Injection and Data Theft https://ift.tt/KeHF0om → cybersecuritynews.com
2026-04-15 2026	Agents hooked into GitHub can steal creds but Anthropic Google and Microsoft haven't warned users news	Library for detecting prompt injection vulnerabilities in AI agents integrated with GitHub Actions. Researchers demonstrated that agents like Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot can be tricked via "comment and control" prompt injection into leaking API keys and GitHub access tokens. This attack can occur proactively when pull requests are opened or issues are filed, bypassing existing security layers. → theregister.com
2026-04-14 2026	Check Point Releases AI Factory Security Blueprint to Safeguard AI Infrastructure from GPU Servers to LLM Prompts beginner	Blueprint for securing AI infrastructure, safeguarding GPU servers to LLM prompts. This vendor-tested reference architecture, developed by Check Point, offers layered protection across perimeter, application and LLM, AI infrastructure, and workload and container layers. It addresses threats like prompt injection, data exfiltration, and lateral movement within Kubernetes, leveraging technologies from Check Point and NVIDIA BlueField DPUs via the NVIDIA DOCA software platform.
2026-04-14 2026	AI Agents Drive Exposure of 29 Million Credentials news	AI Agents Drive Exposure of 29 Million Credentials https://ift.tt/zyb7MrR → letsdatascience.com
2026-04-14 2026	Claude Mythos Changed Everything. Your APIs Are the First Target. news	Platform for agentic security, Salt's Agentic Security Platform addresses the immediate threat posed by AI models like Claude Mythos, which can autonomously discover and exploit zero-day vulnerabilities. It provides continuous, real-time discovery of all API assets, including undocumented and shadow APIs, mapping the full agentic attack surface. The platform then assesses posture, identifying exposures like unauthenticated APIs and excessive permissions, enabling prioritized remediation to fix vulnerabilities before they can be exploited by AI-powered attackers. → securityboulevard.com
2026-04-13 2026	AI Coding Security Vulnerability Statistics 2026: Alarming Data news	Survey of AI coding security vulnerability statistics reveals alarming trends, with up to 62% of AI-generated code containing flaws. Veracode's 2025 analysis shows 45% of AI-generated code fails security tests, and 86% of organizations use third-party packages with critical vulnerabilities in AI-driven environments. Common issues include SQL injection, XSS, log injection, hardcoded credentials, and insecure cryptographic implementations. Java exhibits a 71% failure rate, while Python has a 38% failure rate, highlighting language-specific risks. The report notes a 10x increase in monthly security findings from AI code and a 153% rise in design-level flaws. Prompt injection is now the top OWASP risk for LLM applications. → sqmagazine.co.uk
2026-04-13 2026	GitHub - schwartz1375/genai-security-training beginner Talks	Library of self-paced training materials for security researchers red teaming GenAI and AI/ML systems. It covers adversarial attacks, security vulnerabilities, privacy breaches, model manipulation, evasion techniques, and system-level exploits like prompt injection and jailbreaking. The curriculum includes hands-on labs using tools such as Adversarial Robustness Toolbox (ART), TextAttack, and SHAP, along with theoretical content and references to OWASP LLM Top 10 and MITRE ATLAS.
2026-04-13 2026	GitHub - schwartz1375/genai-essentials beginner Talks	Collection of Jupyter notebooks detailing Generative AI and Large Language Model concepts, prioritizing security considerations. The sequence progresses from core LLM principles and agent introductions to advanced topics like Retrieval-Augmented Generation (RAG), multimodal LLMs, agent frameworks (ReAct, Plan-Execute), and Model Context Protocol (MCP) integration for tool extensibility. Dependencies include Python 3.8+ and Jupyter.
2026-04-12 2026	Could Sock Puppeting Be the New Trick Jailbreaking Major LLMs? news	Technique for jailbreaking LLMs using "sockpuppeting" exploits assistant prefill APIs across major models like Gemini 2.5 Flash and GPT-4o-mini. This method injects a fake acceptance message into the assistant's role, forcing models to bypass safety guardrails and generate prohibited content, including malicious exploit code and system prompts. Providers like OpenAI and AWS Bedrock mitigate this by blocking assistant prefills entirely, while platforms like Google Vertex AI are susceptible due to differing message handling. Security teams are advised to incorporate this vulnerability into AI red-teaming and implement API-layer message ordering validation.
2026-04-11 2026	LLM Red Teaming Guide (Open Source) - Promptfoo intermediate	Library for systematic LLM red teaming, focusing on generating adversarial inputs like prompt injection and jailbreaking to evaluate responses. It supports black-box testing, quantifying risk, and integrating into CI/CD pipelines for applications involving RAG, LLM agents, or chatbots, addressing vulnerabilities such as information leakage, API misuse, and privacy violations.
2026-04-11 2026	Defining LLM Red Teaming - NVIDIA Technical Blog beginner	Analysis defining LLM red teaming as a limit-seeking, manual, and creative practice focused on discovering model deviations rather than malicious harm. It categorizes strategies into language, rhetorical, possible worlds, fictionalizing, and stratagems, identifying 35 specific techniques for exploring LLM vulnerabilities. This approach complements automated benchmarking by leveraging human intuition to uncover novel risks, a crucial element in NVIDIA's trustworthy AI development process.
2026-04-11 2026	Large Reasoning Models are Autonomous Jailbreak Agents advanced	Survey of Large Reasoning Models as autonomous jailbreak agents, evaluating DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, and Qwen3 235B. These models autonomously planned and executed multi-turn conversations with nine target models, achieving a 97.14% jailbreak success rate across harmful prompts. The research highlights an "alignment regression" dynamic, where advanced LRMs can erode the safety guardrails of earlier models.
2026-04-11 2026	Involuntary Jailbreak: On Self-Prompting Attacks advanced	Library disclosing "involuntary jailbreak," a new LLM vulnerability. This technique employs a single universal prompt to compel models like Claude Opus 4.1, Grok 4, Gemini 2.5 Pro, and GPT 4.1 to generate previously rejected questions and their detailed answers, potentially compromising the entire guardrail structure rather than localized components. → arxiv.org
2026-04-11 2026	Single Line of Code Can Jailbreak 11 AI Models Including ChatGPT, Claude, Gemini intermediate	Single Line of Code Can Jailbreak 11 AI Models Including ChatGPT, Claude, Gemini → cyberpress.org
2026-04-11 2026	OWASP Top 10 for LLMs 2025: Key Risks and Mitigation Strategies beginner	Survey of the OWASP Top 10 for LLM Applications (2025), detailing evolving technical and socio-technical risks like prompt injection and excessive agency. This updated list guides enterprises in securing generative AI ecosystems, from training pipelines to plugins, addressing data disclosure and systemic vulnerabilities relevant to GDPR, HIPAA, CCPA, and the EU AI Act. Invicti's proof-based scanning and LLM-specific checks are presented as tools to validate real risks and strengthen defenses. → invicti.com
2026-04-11 2026	OWASP Top 10 for LLM Applications 2025 beginner	OWASP Top 10 for LLM Applications 2025 → genai.owasp.org
2026-04-11 2026	Practical Poisoning Attacks against Retrieval-Augmented Generation advanced	Library introducing CorruptRAG, a novel poisoning attack against Retrieval-Augmented Generation (RAG) systems. This technique injects a single poisoned text into the knowledge database, significantly enhancing attack feasibility and stealth compared to prior methods that required numerous poisoned entries. Experiments on large-scale datasets validate CorruptRAG's effectiveness in compromising RAG outputs. → arxiv.org
2026-04-11 2026	RAG Safety: Exploring Knowledge Poisoning Attacks to RAG advanced	Analysis of knowledge poisoning attacks targeting Retrieval-Augmented Generation (RAG) systems, specifically focusing on KG-RAG. This work introduces a practical, stealthy attack strategy that inserts perturbation triples into knowledge graphs to create misleading inference chains, degrading KG-RAG performance. Experiments demonstrate the attack's effectiveness against four recent KG-RAG methods with minimal KG perturbations. → arxiv.org
2026-04-11 2026	Benchmarking Poisoning Attacks against Retrieval-Augmented Generation advanced	Benchmark framework for evaluating poisoning attacks on Retrieval-Augmented Generation (RAG) systems. This benchmark includes 5 standard QA datasets, 10 expanded variants, 13 poisoning attack methods, and 7 defense mechanisms. Findings reveal that while current attacks are effective on standard datasets, their impact diminishes on expanded versions, and advanced RAG architectures like sequential, branching, conditional, loop, conversational, multimodal RAG, and RAG-based LLM agents remain vulnerable, with existing defenses proving insufficient. → arxiv.org
2026-04-11 2026	Q4 2025 AI Agent Security Trends news	Report on Q4 2025 AI agent security trends, detailing real-world attacks targeting emergent agentic AI systems. Analysis of production traffic reveals attacker focus on system prompt leakage, indirect prompt injection via trusted external content, and exploitation of new surfaces like tool use and script-shaped content. Core techniques include role play and obfuscation to bypass safeguards, with indirect attacks proving more efficient than direct ones.
2026-04-11 2026	OWASP GenAI Top 10 Risks and Mitigations for Agentic AI Security beginner	Library defining the OWASP Top 10 for Agentic Applications, a comprehensive resource for identifying and mitigating risks associated with autonomous AI agents. Developed through input from over 100 industry leaders, it highlights threats such as Agent Behavior Hijacking, Tool Misuse and Exploitation, and Identity and Privilege Abuse. This framework complements existing OWASP GenAI resources, offering practical, actionable guidance grounded in real-world attacks and mitigations to promote the secure development and deployment of generative AI systems. → genai.owasp.org
2026-04-11 2026	AI Agent Attacks in Q4 2025 Signal New Risks for 2026 news	Analysis of Q4 2025 AI agent attacks highlights evolving threats including system prompt extraction via hypothetical scenarios and obfuscation. Attackers also bypass content controls using indirect methods and probe agents for weaknesses. New attack paths emerge through agentic capabilities like document browsing and tool calls, often via indirect prompt injection. Organizations must extend security controls, validate external content, enforce least-privilege access, and prepare AI-specific incident response. → esecurityplanet.com
2026-04-11 2026	Protecting Against Indirect Prompt Injection Attacks in MCP intermediate	Library for mitigating Indirect Prompt Injection attacks within the Model Context Protocol (MCP). This resource details vulnerabilities like Tool Poisoning, where malicious instructions are embedded in tool metadata, and recommends implementing AI Prompt Shields with techniques like "Spotlighting" and "Datamarking." It also emphasizes supply chain security and general security hygiene as crucial for safeguarding AI systems.
2026-04-11 2026	Indirect Prompt Injection Attacks: Hidden AI Risks intermediate	Library for defending against indirect prompt injection attacks, a sophisticated AI threat recognized by OWASP as a top risk. This library addresses vulnerabilities where malicious instructions are embedded in external content like documents, emails, or images, rather than being submitted directly to an AI agent. It aims to mitigate risks such as data exfiltration and manipulation of business processes by enabling prompt injection detection, input validation, and the establishment of content security policies, similar to CrowdStrike's approach using its Falcon platform.
2026-04-11 2026	Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild intermediate	Writeup detailing observed in-the-wild indirect prompt injection (IDPI) attacks targeting AI agents. The analysis highlights real-world cases including AI-based ad review evasion, SEO manipulation for phishing, data destruction, and sensitive information leakage. It discusses 22 distinct payload engineering techniques and classifies attacker intents, emphasizing the growing weaponization of IDPI beyond theoretical risks. → unit42.paloaltonetworks.com
2026-04-11 2026	Anatomy of an Indirect Prompt Injection beginner	Library detailing the CFS (Context, Format, Salience) model for understanding indirect prompt injection in LLMs. It analyzes vulnerabilities, drawing on concepts like Simon Willison's "lethal trifecta" (access to private data, untrusted content exposure, external communication), and examines how attackers refine tactics to bypass LLM security. Real-world examples, such as the Supabase Model Context Protocol (MCP) attack, illustrate the dangers of embedding malicious instructions within seemingly benign data, leading to unauthorized data exposure or system compromise.
2026-04-11 2026	New Prompt Injection Attack Vectors Through MCP Sampling intermediate	Writeup of new prompt injection attack vectors targeting the Model Context Protocol (MCP) sampling feature. Exploiting the implicit trust model and lack of built-in security controls, attackers can achieve resource theft, conversation hijacking, and covert tool invocation. The analysis details three proof-of-concept examples and evaluates mitigation strategies for MCP-based systems, highlighting vulnerabilities in this LLM integration standard. → unit42.paloaltonetworks.com
2026-04-11 2026	A Timeline of Model Context Protocol (MCP) Security Breaches news	Timeline details MCP security breaches from April to December 2025, highlighting vulnerabilities like "tool poisoning" in WhatsApp MCP, prompt injection in GitHub MCP leading to data exfiltration, cross-tenant access flaws in Asana MCP, and remote code execution in Anthropic's MCP Inspector. Other incidents include OS command injection in `mcp-remote` (CVE-2025-6514), sandbox escapes in Anthropic's Filesystem-MCP server, supply-chain compromises via malicious MCP servers, systemic MCP design flaws enabling RCE in Flowise, and path traversal in Smithery MCP hosting.
2026-04-11 2026	The Vulnerable MCP Project: Comprehensive MCP Security Database beginner	Library of known vulnerabilities impacting MCP (Model Configuration Protocol) servers and SDKs. This catalog details specific exploits such as CVE-2025-68145, CVE-2025-68143, and CVE-2025-68144, alongside broader attack classes including prompt injection, DNS rebinding, Server-Side Request Forgery (SSRF), and command injection. Vulnerabilities affect various products like Anthropic's mcp-server-git, MCP TypeScript SDK, Cursor IDE, and Grafana MCP server, often enabling arbitrary code execution, data exfiltration, or unauthorized transactions.
2026-04-11 2026	MCP Security: Critical Vulnerabilities Every CISO Must Address in 2025 intermediate	Library detailing critical vulnerabilities in Model Context Protocol (MCP), a new standard for AI-tool integration. It highlights how prompt injection attacks in MCP ecosystems can trigger automated actions through connected systems, potentially leading to sensitive data exfiltration. The library also addresses supply chain risks, explaining how MCP servers can dynamically modify tool definitions, allowing for "rug pull" attacks where previously approved tools can be repurposed for malicious activity, affecting vendors like Microsoft and impacting applications such as Nginx-ui (CVE-2026-33032) and Adobe Acrobat Reader.
2026-04-11 2026	OWASP LLM Prompt Injection Prevention Cheat Sheet beginner	Reference LLM Prompt Injection Prevention Cheat Sheet detailing vulnerabilities in Large Language Model applications. It covers direct and indirect prompt injection, encoding and obfuscation techniques like Base64 and Unicode smuggling, and typoglycemia-based attacks. The resource also discusses jailbreaking methods such as DAN prompts, multi-turn attacks, system prompt extraction, data exfiltration, multimodal injection, RAG poisoning, and agent-specific attacks. Defenses include input validation and sanitization, with code examples for pattern matching and fuzzy matching against typoglycemia variants. → cheatsheetseries.owasp.org
2026-04-11 2026	Attention Tracker: Detecting Prompt Injection Attacks in LLMs intermediate	Attention Tracker: Detecting Prompt Injection Attacks in LLMs
2026-04-11 2026	How Microsoft Defends Against Indirect Prompt Injection Attacks intermediate	Library that defends against indirect prompt injection attacks targeting LLM-based systems. This library implements a multi-layered defense strategy including preventative techniques like hardened system prompts and Spotlighting, detection tools such as Microsoft Prompt Shields integrated with Defender for Cloud, and impact mitigation through data governance, user consent workflows, and deterministic blocking. It addresses vulnerabilities like data exfiltration via HTML images, clickable links, tool calls, and covert channels, as well as unintended actions and phishing. → microsoft.com
2026-04-10 2026	AI Cybersecurity After Mythos: The Jagged Frontier intermediate	Library for AI-driven vulnerability discovery, demonstrating that smaller, cheaper open-weight models can recover significant analysis from Anthropic's Mythos showcase, including detecting exploit candidates for FreeBSD and OpenBSD bugs. This work emphasizes that the effectiveness of AI cybersecurity lies in the surrounding system architecture and deep security expertise, rather than solely on frontier model scale, impacting the economics of the defensive pipeline.
2026-04-10 2026	Anthropic announces Claude Mythos for cybersecurity research news	Library for AI-driven cybersecurity research, Claude Mythos Preview autonomously identifies zero-day vulnerabilities and develops exploits. It has discovered critical issues in OpenBSD, FFmpeg, and the Linux kernel. Access is offered to select partners via Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry, with an application process for open-source maintainers. Anthropic provides usage credits and donations to security foundations, highlighting significant advances in autonomous vulnerability discovery over prior models.
2026-04-10 2026	Crushing the Axios supply chain threat with Tenable Hexa AI: Use cases for agentic AI intermediate	Tool for identifying exposure to the Axios npm supply chain attack using Tenable Hexa AI. This agentic AI automates scanning, asset identification, and remediation verification, mirroring workflows applicable to other emerging threats like CVEs and zero-days. It enables rapid assessment of exposure, scoping blast radius through asset tagging, and efficient prioritization, transforming emergency response from manual scripting to conversational command. → securityboulevard.com
2026-04-10 2026	MCP Security Vulnerabilities: Prompt Injection and Tool Poisoning intermediate	Library for securing Model Context Protocol (MCP) deployments against prompt injection and tool poisoning. It details vulnerabilities like metadata poisoning, over-permissioned tools, supply chain risks, and indirect prompt injection, referencing incidents such as the Supabase MCP Lethal Trifecta Attack. The library emphasizes prevention strategies including strict input validation, sanitization, and the principle of least privilege for tools.
2026-04-10 2026	How Agentic Tool Chain Attacks Threaten AI Agent Security intermediate	Library for securing AI agents against agentic tool chain attacks, detailing threats like tool poisoning, tool shadowing, and rugpull attacks that exploit the agent's reasoning layer and natural language-based decision-making. It covers how these attacks can lead to data exfiltration, unauthorized actions, and supply chain risks by manipulating tool descriptions, metadata, and server behavior, and recommends mitigation strategies including tool governance, version control, server identity controls, pre-execution guardrails, and observability.
2026-04-10 2026	8,000+ MCP Servers Exposed: The Agentic AI Security Crisis of 2026 news	8,000+ MCP Servers Exposed: The Agentic AI Security Crisis of 2026
2026-04-10 2026	Agentic AI Security in Production: MCP, Memory Poisoning, Tool Misuse intermediate	Tool, a comprehensive analysis of agentic AI security in production, details critical failure modes including MCP Security, Memory Poisoning, and Tool Misuse. It highlights the evolving threat landscape where agents plan and execute actions, emphasizing system design over prompt-level fixes. Specific vulnerabilities like CVE-2025-68144 in mcp-server-git and attack models such as MINJA and AgentPoison are examined, underscoring the need for robust controls across input, memory, tool execution, and identity planes to manage the expanded attack surface created by these systems. → penligent.ai
2026-04-10 2026	Offensive Security for MCP Servers: How to Prevent AI Agent Exploits intermediate	Library for securing MCP (Multi-Cloud Platform) servers against AI agent exploits, addressing vulnerabilities like command injection, SSRF, and path traversal frequently found in modern deployments. It highlights how AI's autonomous execution and dynamic capability discovery, unlike traditional REST APIs, create new risk classes by enabling agents to chain tool calls and reason across APIs. The library emphasizes adapting security from syntax to intent validation, guarding against prompt injection and tool poisoning where manipulated metadata or input can lead to unintended, privileged operations, ultimately leveraging foundational API security principles.
2026-04-10 2026	The New AI Attack Surface: 3 AI Security Predictions for 2026 beginner	Library for confronting three AI attack vectors manifesting in production by 2026: indirect injection via data poisoning, supply chain infiltration through AI development toolchains like MCP servers, and agent-to-agent attack propagation through "toxic combinations" in autonomous agent ecosystems. These vectors exploit how AI agents interpret instructions, trust data sources, and execute permitted actions, moving beyond traditional code vulnerabilities to exploit data as executable commands and the inherent trust in interconnected AI architectures.
2026-04-10 2026	Introduction to Data Poisoning: A 2026 Perspective beginner	Library introducing data poisoning, an adversarial attack corrupting AI/LLM training data to cause backdoors or biased outputs. It details real-world incidents like Basilisk Venom poisoning GitHub code, Qwen 2.5's search tool manipulation, Grok 4's "!Pliny" backdoor triggered by X prompts, and hidden instructions in MCP tools like "joke_teller." The library also covers poisoning in retrieval (RAG), synthetic data pipelines (VIA), and diffusion models, highlighting how even small, hidden manipulations can undermine AI safety and trust across the entire LLM lifecycle.
2026-04-10 2026	AI Security Research — December 2025 news	AI Security Research — December 2025
2026-04-10 2026	From Prompt Injections to Protocol Exploits in LLM Agent Workflows advanced	From Prompt Injections to Protocol Exploits in LLM Agent Workflows
2026-04-10 2026	LLM Security Guide: OWASP GenAI Top-10 Risks beginner	Library detailing offensive and defensive security for Large Language Models and Agentic AI Systems, updated with the OWASP Top 10 for LLMs 2025 and the OWASP Top 10 for Agentic Applications 2026. It covers Agentic AI Security, RAG Vulnerabilities, System Prompt Leakage, Vector/Embedding Weaknesses, and AI Compliance, incorporating tools like DeepTeam, Promptfoo, ARTKIT, and frameworks such as Meta LlamaFirewall and Amazon Bedrock Guardrails.
2026-04-10 2026	Prompt Injection Attacks in LLMs: A Comprehensive Review intermediate	Prompt Injection Attacks in LLMs: A Comprehensive Review
2026-04-10 2026	Prompt Injection Attacks: Examples, Techniques, and Defence intermediate	Library for understanding and defending against prompt injection, a critical LLM security vulnerability. It details direct and indirect injection techniques, including examples like DAN jailbreaks, EchoLeak (CVE-2025-32711), and webpage poisoning attacks, as reported by OWASP, NCSC, and Anthropic. This resource provides practical defense strategies and highlights the inherent challenges in distinguishing trusted instructions from untrusted data within LLM architectures.
2026-04-10 2026	Indirect Prompt Injection: The Hidden Threat intermediate	Library for understanding and defending against indirect prompt injection, a vulnerability where hidden instructions within ingested data (webpages, PDFs, emails, code) can hijack AI reasoning or tool actions. It details real-world incidents like the Perplexity Comet leak and CVE-2025-59944, highlighting how agentic AI amplifies risk. Mitigation requires architectural changes, not prompt tuning, focusing on trust boundaries, context isolation, and output verification.
2026-04-10 2026	AI Agent Security in 2026: Prompt Injection and Memory Poisoning intermediate	Library for understanding AI agent security risks, focusing on prompt injection and memory poisoning attacks. It details indirect prompt injection's impact via emails and documents, exemplified by CVE-2025-32711, and memory poisoning attacks like MemoryGraft, where agents develop false beliefs. The library also covers tool misuse through hidden instructions in metadata, misleading examples, and permissive schemas, observed in frameworks like CrewAI and AutoGen, and discusses supply chain vulnerabilities where agents fetch runtime dependencies without human review.
2026-04-10 2026	Prompt Injection Attacks in 2025: Vulnerabilities and Defense beginner	Library for defending against prompt injection attacks, a significant threat to AI applications highlighted by CVE-2025-32711 and techniques like "EchoLeak." It addresses direct, indirect, and agentic injection methods, including those targeting LangChain with CVE-2025-68664 ("LangGrinch") and demonstrations against Gemini. The library supports defenses like input validation with pattern matching and structured prompt architecture using randomized delimiters, drawing insights from tools like Lakera Guard and Microsoft Prompt Shields.
2026-04-10 2026	Prompt Injection: The Most Common AI Exploit in 2025 beginner	Library detailing prompt injection, the most common AI exploit in 2025, which manipulates AI instructions rather than code. It categorizes attacks into direct, indirect, jailbreak, and cross-plugin poisoning, highlighting risks to enterprise RAG systems and SaaS security operations. The resource emphasizes robust AI agent identity, authorization, continuous monitoring with anomaly detection, and integrating AI security telemetry into existing SIEM infrastructure, aligning with frameworks like NIST AI RMF and ISO/IEC 42001.
2026-04-10 2026	AI Prompt Injection Attacks: How They Work (2026) beginner	Library for defending against AI prompt injection attacks, detailing their evolution from academic curiosities to operational threats with documented cases affecting OpenAI's GPT models and Anthropic's Claude. It covers attack mechanisms like "instruction confusion," evolving vectors such as encoding-based and multi-turn conversation attacks, and real-world incidents like the OpenClaw vulnerability, demonstrating data exfiltration and financial losses totaling $2.3 billion globally in 2025. The library addresses insufficient input sanitization, overprivileged AI agents, and a lack of output validation, highlighting detection gaps where current methods catch only 23% of sophisticated attempts.
2026-04-10 2026	LLM Security Risks in 2026: Prompt Injection, RAG, and Shadow AI beginner	Library for mitigating LLM security risks, including prompt injection, RAG data poisoning, and autonomous exploits like EchoLeak demonstrated against Microsoft 365 Copilot. It addresses the blurred line between data and instructions, AI outputs triggering actions, and the human element in vulnerabilities, emphasizing containment strategies like limiting AI privileges and validating outputs.
2026-04-09 2026	Claude Code security settings nobody told you about beginner	Claude Code security settings nobody told you about
2026-04-09 2026	LangChain Langflow LiteLLM: When AI's Foundation Code Becomes the Attack Surface intermediate	Library of vulnerabilities impacting foundational AI frameworks like LangChain, LangGraph, Langflow, and LiteLLM, including path traversal (CVE-2026-34070), serialization injection (CVE-2025-68664), SQL injection (CVE-2025-67644), and remote code execution (CVE-2026-33017). The article also details a supply chain attack on LiteLLM via a compromised Trivy security scanner, highlighting the systemic risks in AI infrastructure. → securityboulevard.com
2026-04-09 2026	Is 46% of your AI-generated code vulnerable? beginner	Platform for securing AI-generated code, addressing research showing 46% of AI code contains vulnerabilities. It integrates Software Composition Analysis (SCA), Static Application Security Testing (SAST), and Dynamic Application Security Testing (DAST) directly into IDEs and LLMs like Gemini and GitHub Copilot, while also integrating with tools from Wiz, Snyk, and Black Duck. The platform emphasizes continuous governance throughout the Software Development Life Cycle (SDLC) and maintains the necessity of human oversight for final code acceptance and remediation. → techzine.eu
2026-04-09 2026	Claude Code Can Be Manipulated via CLAUDE.md to Run SQL Injection Attacks intermediate	Library that allows manipulation of Claude Code via CLAUDE.md files to automate SQL injection attacks and steal credentials. Researchers at LayerX discovered that by adding three lines of basic English to the CLAUDE.md file, Claude Code's safety guardrails can be bypassed, leading it to execute unauthorized commands and perform actions such as login bypass and database dumping using techniques like SQL injection. The AI trusts the instructions within the CLAUDE.md file implicitly, creating a significant attack surface. → hackread.com
2026-04-08 2026	theNET \| De-risking the AI rollout intermediate	Library for de-risking AI rollouts, providing probabilistic security to address novel threats like prompt injection, data poisoning, and denial-of-wallet attacks. It emphasizes model-agnostic, inline protection, input/output monitoring, observability, and integration with traditional application security to safeguard AI-powered applications against deterministic and unpredictable attack paths.
2026-04-08 2026	AI Security Risks: How Enterprises Manage LLM Shadow AI and Agentic Threats intermediate	Library for AI Security Posture Management (AISPM) designed to provide enterprises with visibility and control over LLM shadow AI and agentic threats. It addresses risks including prompt injection, jailbreaking, data poisoning, and data leakage from unsanctioned AI tools. The library focuses on the emerging threat landscape of agentic AI, where autonomous systems can execute multi-step actions, and highlights the critical risk of Agent Goal Hijacking as outlined in the OWASP Agentic Top 10. → securityboulevard.com
2026-04-06 2026	Best AI Security Tools in 2026 beginner	Platforms for AI security are ranked by their coverage of three critical phases: discovering AI assets and mapping threat graphs (Phase 1), conducting adversarial testing against live applications and RAG pipelines (Phase 2), and deploying runtime guardrails calibrated from red teaming results (Phase 3). Repello AI offers full-lifecycle coverage with its Inventory, ARTEMIS, and ARGUS products. HiddenLayer focuses on model artifact scanning and runtime model anomaly detection. Mindgard provides automated multimodal AI security testing, primarily for Phase 2. Lakera, now part of Check Point, specialized in runtime guardrails for LLM applications.
2026-04-06 2026	OWASP Top 10 for Agents 2026 beginner	Framework for assessing OWASP Agentic AI (ASI) Top 10 2026 risks, including Agent Goal Hijack (ASI01), Tool Misuse & Exploitation (ASI02), and Agent Identity & Privilege Abuse (ASI03). It addresses vulnerabilities introduced by autonomous agents' reasoning, memory, tool integration, and multi-step execution, detecting issues like unexpected code execution (ASI05) and insecure inter-agent communication (ASI07). The framework integrates with DeepTeam's red teaming capabilities for programmatic risk assessment.
2026-04-06 2026	Google Workspace's Continuous Approach to Mitigating Prompt Injection intermediate	Library detailing Google Workspace's continuous approach to mitigating Indirect Prompt Injection (IPI) attacks against Gemini. It outlines proactive strategies including human and automated red-teaming, the AI Vulnerability Rewards Program, and public attack monitoring to discover and catalog new vulnerabilities. The library emphasizes ongoing defense refinement through deterministic and ML-based defenses, LLM prompt engineering, and Gemini model hardening, utilizing synthetic data generation via Simula for robust attack variant expansion and defense model retraining.
2026-04-06 2026	Prompt Injection Attacks in LLMs: What Developers Need to Know in 2026 beginner	Guide on prompt injection attacks in LLMs, detailing how attackers manipulate models using natural language to override system instructions. It covers direct (jailbreaking) and indirect injection, citing examples like the Chevrolet dealership GPT and Perplexity Comet credential theft incidents. Developers are advised to implement architectural separation of instructions, conversation token limits, input filtering, AI guardrails, and developer training to mitigate these risks.
2026-04-05 2026	LangChain LangGraph Flaws Expose Files Secrets Databases in Widely Used AI Frameworks intermediate	Library vulnerabilities in LangChain and LangGraph, specifically CVE-2026-34070 (path traversal), CVE-2025-68664 (deserialization of untrusted data), and CVE-2025-67644 (SQL injection), allow attackers to access arbitrary files, steal API keys and environment secrets, and manipulate SQL queries. These flaws, impacting widely used LLM application frameworks, have been patched in recent versions of langchain-core and langgraph-checkpoint-sqlite. → thehackernews.com
2026-04-04 2026	Detecting and analyzing prompt abuse in AI tools intermediate	Playbook detailing detection, investigation, and response to AI prompt abuse. It covers direct prompt overrides, extractive prompt abuse against sensitive inputs, and indirect prompt injection, including the HashJack technique affecting AI summarization tools via URL fragments. This guide leverages Microsoft security tools like Defender for Cloud Apps, Purview DLP, Microsoft Entra ID conditional access, and Microsoft Sentinel to monitor AI interactions and protect against manipulation. → microsoft.com
2026-04-03 2026	Prompt Injection and LLM Jailbreaks: Defenses intermediate	Survey of prompt injection and LLM jailbreak defenses, addressing risks in generative AI and agentic workflows. It differentiates between instruction hijacking and policy evasion, detailing why modern long-context and tool-using systems amplify attack impact. The survey outlines common attack patterns like instruction override and hidden instructions, then proposes layered defenses including inference-time filtering, independent guardrails, model-level hardening techniques like salting, and secure architectural controls for tool-using systems. → blockchain-council.org
2026-04-03 2026	Training an AI agent to attack LLM applications like a real adversary advanced	Tool that simulates adversarial attacks against LLM-powered applications. This AI pentesting agent autonomously chains techniques like prompt injection, indirect prompt injection, and tool abuse to uncover vulnerabilities missed by traditional scanners. It gathers application context, probes role-based access control, and supports models from OpenAI, Anthropic, and open-source providers, integrating into CI/CD pipelines for continuous testing. Novee Security's agent is trained on real-world vulnerability research, including findings like arbitrary code execution in the Cursor coding assistant. → helpnetsecurity.com
2026-04-03 2026	Prompt Injection Attacks in LLMs: Vulnerabilities, Exploitation & Defense intermediate	Prompt Injection Attacks in LLMs: Vulnerabilities, Exploitation & Defense
2026-04-03 2026	How AI Red Teaming Fixes Vulnerabilities in Your AI Systems intermediate	Library for AI Red Teaming provides a practical playbook for CISOs and AI leaders to test AI systems, including LLMs and chatbots, for vulnerabilities before deployment. It simulates attacks and misuse to identify weaknesses across prompts, data, and agent interactions, addressing risks like prompt injection, data leakage, and abuse of model autonomy. This method moves beyond isolated model testing to system-wide evaluation in operational settings, aligning with frameworks like MITRE ATLAS, EU AI Act, and NIST's AI Risk Management Framework to ensure safe and compliant AI use.
2026-04-03 2026	What Is Prompt Injection in AI? Examples & Prevention \| EC-Council beginner	Library for defending against prompt injection attacks, a technique where attackers manipulate AI systems through malicious instructions embedded in prompts. This resource details direct and indirect injection methods, citing real-world vulnerabilities like CVE-2025-53773 affecting GitHub Copilot and ChatGPT's Azure backdoor. It also highlights attacks against Google Jules and Devin AI, emphasizing the enterprise-wide compromise risks due to AI access to sensitive data and infrastructure. Mitigation strategies include zero-trust AI architecture, strict privilege separation, real-time threat detection, human-in-the-loop approvals, and continuous red teaming.
2026-04-03 2026	Prompt Injection Attacks in 2025: Risks, Defenses & Testing intermediate	Library for detecting and mitigating prompt injection attacks in LLM-powered applications. It focuses on adversarial input testing, prompt isolation analysis, output validation, and workflow abuse simulation to uncover risks missed by traditional security tools. The library addresses how malicious instructions can manipulate model behavior, spread through trusted content, and create business-level impact, emphasizing that prompt injection is a trust problem at the intersection of application logic, content ingestion, and workflow design.
2026-04-03 2026	Red Teaming the Mind of the Machine: Evaluation of Prompt Injection and Jailbreak Vulnerabilities intermediate	Survey of prompt injection and jailbreak vulnerabilities against state-of-the-art LLMs including GPT-4, Claude 2, Mistral 7B, and Vicuna. This research categorizes over 1,400 adversarial prompts and analyzes their success rates, generalizability, and construction logic, drawing from public repositories and forums. The study also proposes layered mitigation strategies and recommends a hybrid red-teaming and sandboxing approach for robust AI security, noting prompt injection as a critical vulnerability identified by OWASP. → arxiv.org
2026-04-03 2026	Practical LLM Security Advice from the NVIDIA AI Red Team intermediate	Library summarizing NVIDIA AI Red Team findings, detailing common LLM application vulnerabilities. It addresses risks like remote code execution (RCE) from executing LLM-generated code (e.g., via `exec` or `eval`), insecure permissions in Retrieval-Augmented Generation (RAG) data stores leading to data leakage and prompt injection, and data exfiltration through active content rendering of Markdown or hyperlinks. Mitigation strategies include sandboxing dynamic code, rigorously managing RAG permissions, and sanitizing LLM output.
2026-04-03 2026	OWASP Top 10 for LLMs 2025 \| DeepTeam Red Teaming Framework beginner	Framework integrating OWASP Top 10 for LLMs 2025 risks, including Prompt Injection (LLM01), Sensitive Information Disclosure (LLM02), Supply Chain (LLM03), Data and Model Poisoning (LLM04), Improper Output Handling (LLM05), Excessive Agency (LLM06), System Prompt Leakage (LLM07), and Vector and Embedding Weaknesses (LLM08). It facilitates detection of vulnerabilities in RAG systems and autonomous agents through programmatic assessment or the Confident AI platform.
2026-04-03 2026	Continuously Hardening ChatGPT Against Prompt Injection \| OpenAI intermediate	Continuously Hardening ChatGPT Against Prompt Injection \| OpenAI
2026-04-03 2026	Red Teaming LLMs Exposes a Harsh Truth About the AI Security Arms Race news	Red Teaming LLMs Exposes a Harsh Truth About the AI Security Arms Race
2026-04-03 2026	LLM01:2025 Prompt Injection \| OWASP Gen AI Security beginner	Reference detailing LLM01:2025 Prompt Injection, a vulnerability where user prompts unintendedly alter Large Language Model behavior. The OWASP Gen AI Security resource covers direct and indirect injections, including scenarios like CVE-2024-5184 exploitation in email assistants and multimodal attacks. It outlines mitigation strategies such as constraining model behavior, input/output filtering, and adversarial testing, emphasizing that while prevention is challenging, impact reduction is achievable. → genai.owasp.org
2026-04-03 2026	AI Security Projects for Practice: 10 Hands-On Labs beginner	Labs provide hands-on practice with prompt injection, including direct and indirect attacks, excessive agency, and tool invocation risks, as well as data poisoning techniques like label-flipping and backdoor trigger injection. These projects are crucial for understanding and mitigating threats outlined in the OWASP LLM Top 10 and MITRE ATLAS, covering offensive strategies and defensive hardening across various AI system components, from preprocessing to model integrity checks and DevSecOps pipelines. → blockchain-council.org
2026-04-03 2026	AI Security Roadmap: From Basics to Model Defense beginner	Reference outlining a structured AI security roadmap, progressing from fundamentals to model defense. It highlights unique threats like prompt injection and data poisoning, and maps learning paths to frameworks such as OWASP Top 10 for LLMs, NIST AI RMF, and MITRE ATLAS. The guide also details practical tooling patterns like AI Security Posture Management (AI-SPM) and adversarial testing tools such as Microsoft Counterfit and IBM Adversarial Robustness Toolbox. → blockchain-council.org
2026-04-03 2026	AI Security Certification Guide for 2026 beginner	Guide to AI security certifications for 2026, detailing credentials for technical, governance, and audit roles. It highlights the growing importance of AI-specific risks like prompt injection and data leakage, and aligns certifications with frameworks such as OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS, SAIF, and ISO/IEC 42001. The guide emphasizes hands-on assessment and explains how to choose the right credential based on role fit, framework alignment, cost, and industry recognition. → blockchain-council.org
2026-04-02 2026	Guarding LLMs With a Layered Prompt Injection Representation intermediate	Library for LLM security that learns a low-dimensional latent representation of prompt injection attacks. This approach complements perplexity-based filtering and achieves high precision and recall by training a classifier on features derived from this learned representation, distinguishing benign prompts from adversarial ones. → trendmicro.com
2026-04-02 2026	Auditing the Gatekeepers: Fuzzing "AI Judges" to Bypass Security Controls intermediate	Tool for fuzzing AI judges, called AdvJudge-Zero, exploits prompt injection vulnerabilities in LLM-based security gatekeepers. This fuzzer identifies stealthy control tokens, such as formatting symbols and structural phrases, that manipulate the AI's decision-making logic to bypass safety policies and allow prohibited content, or corrupt training data by awarding high scores to incorrect responses. The research demonstrates a 99% success rate in bypassing controls across various LLM architectures, highlighting the need for adversarial training to harden these systems. → unit42.paloaltonetworks.com
2026-04-02 2026	AI Security for Apps is now generally available news	Library for securing AI-powered applications, generally available, offering discovery of AI endpoints, detection of prompt injection and PII exposure, and mitigation via WAF rules. New features include custom topic detection and free AI endpoint discovery for all Cloudflare customers, with expanded integrations with IBM and Wiz for unified security posture management. It addresses risks cataloged in the OWASP Top 10 for LLM Applications, such as prompt injection and sensitive data leakage, by analyzing prompt and output behavior rather than fixed operations.
2026-03-15 2026	mukul975/Anthropic-Cybersecurity-Skills: 734+ structured cybersecurity skills for AI agents · MITRE ATT&CK mapped · agentskills.io standard · Claude Code, Copilot, Codex CLI, Cursor, Gemini CLI beginner	Library of 754 structured cybersecurity skills designed for AI agents, mapped to MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, MITRE D3FEND, and NIST AI RMF. This community project provides production-grade workflows for tasks including memory forensics with Volatility3, Kerberoasting detection via Sigma rules, and cloud breach scoping, enabling AI to perform expert-level investigations across platforms like Claude Code, GitHub Copilot, and Gemini CLI.
2026-03-14 2026	Teaching Claude Everything You've Hacked intermediate	Library that syncs HackerOne bounty history to a local SQLite database and integrates with AI assistants like Claude via the Model Context Protocol (MCP). It cross-references your personal reports and publicly disclosed bounty-awarded reports against target scopes, identifying overlooked areas and profitable weakness types. This tool also includes a database of community-submitted reports and enables Claude to access and reason over your bounty data, assisting in strategy and discovery.
2026-03-12 2026	Needle in the haystack: LLMs for vulnerability research intermediate Bug Bounty	Library for using LLMs in vulnerability research, focusing on minimal scaffolding for effective code auditing. It highlights the problem of context rot in large language models, demonstrating how overly broad prompts and excessive context lead to missed vulnerabilities. Instead, the approach emphasizes creating a targeted threat model derived from previous CVEs and specific entry points to guide LLMs toward discovering nuanced issues, as seen in its case study with Claude Opus and Firefox.
2026-03-12 2026	PatrikFehrenbach/h1-brain: MCP server that connects AI assistants to HackerOne for bug bounty hunting intermediate Bug Bounty	Library for connecting AI assistants to HackerOne bug bounty programs. It ingests personal bug bounty history, program scopes, and report details into a local SQLite database, and also includes a pre-built database of over 3,600 publicly disclosed bounty-awarded HackerOne reports. The core `hack(handle)` tool generates comprehensive attack briefings by combining personal data with community vulnerability write-ups, weakness types, and bounty amounts, suggesting attack vectors against untouched assets.
2026-03-09 2026	GitHub - eliasbiondo/linkedin-mcp-server: 🔗 A Model Context Protocol (MCP) server for LinkedIn — search people, companies, and jobs, scrape profiles, and get structured data via any MCP-compatible AI client. intermediate	Library for accessing LinkedIn data via a Model Context Protocol (MCP) server. It enables searching for people, companies, and jobs, scraping detailed profiles with granular section control (main profile, experience, education, contact info, interests, honors, languages, posts, recommendations), and retrieving structured JSON output. Built with FastMCP and Patchright, it supports both stdio and HTTP transports for various AI client integrations, with session persistence and configurable browser automation settings.
2026-03-08 2026	How I use LLMs For Security Work: Part 2 intermediate Bug Bounty	Library for leveraging Large Language Models (LLMs) in security work, focusing on advanced patterns beyond basic prompting. It details concepts like Agents, Skills (SKILLS.md), Workflows, and Assistants, emphasizing the critical role of providing precise context through documentation, requirements, and decision-making parameters. The article illustrates how well-defined prompts with explicit instructions and expected outputs, as opposed to vague requests, significantly improve LLM inference for tasks like automating browser profile management for threat hunting.
2026-03-01 2026	gadievron/raptor: Raptor turns Claude Code into a general-purpose AI offensive/defensive security agent. By using Claude.md and creating rules, sub-agents, and skills, and orchestrating security tool usage, we configure the agent for adversarial thinking, and perform research or attack/defense operations. intermediate AuthZ	Framework turning Claude Code into an autonomous AI security agent, RAPTOR orchestrates static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing. It employs Semgrep and CodeQL for scanning, using Z3 for dataflow and one-gadget constraint analysis to improve exploit feasibility. RAPTOR supports customizable LLM analysis dispatchers and offers project management features for organized research and reporting.
2026-02-25 2026	hexsecteam/HexSecGPT: HexSecGPT is designed to provide powerful, unrestricted, and seamless AI-driven conversations, pushing the boundaries of what is possible with natural language processing. beginner	Framework for AI-driven conversations that pushes natural language processing boundaries, utilizing third-party APIs from OpenRouter or DeepSeek with a specialized system prompt. This open-source wrapper demonstrates a proof-of-concept, offering a glimpse of HexSecGPT's capabilities through a command-line interface on platforms like Kali Linux, Ubuntu, and Termux. Users can obtain API keys from OpenRouter or DeepSeek for integration. The framework includes installation scripts and a model discovery script for managing API provider model availability.
2026-02-23 2026	ottosulin/awesome-ai-security: A collection of awesome resources related AI security beginner	Library of curated resources covering AI security, including frameworks, standards, learning materials, and open-source tools. It details attack techniques, defense strategies, benchmarks, and specific vulnerabilities, referencing OWASP LLM Top 10, NIST AIRC, MITRE ATLAS, and tools like garak and promptfoo for vulnerability scanning and prompt injection testing. The collection also highlights resources for understanding adversarial attacks such as evasion, poisoning, extraction, and inference, mentioning libraries like Adversarial Robustness Toolkit (ART), cleverhans, and foolbox.
2026-02-21 2026	samugit83/redamon: An AI-powered agentic red team framework that automates offensive security operations, from reconnaissance to exploitation to post-exploitation, with zero human intervention. advanced Recon	Framework that autonomously orchestrates offensive security operations from reconnaissance to post-exploitation, integrating AI agents for vulnerability validation via Hydra, privilege escalation exploits, and XSS mapping. It logs findings in a Neo4j knowledge graph, then utilizes a CypherFix AI triage agent to deduplicate and rank vulnerabilities. A subsequent CodeFix agent clones repositories, applies targeted fixes using 11 code-aware tools, and submits a GitHub pull request for review.
2026-02-20 2026	Microsoft says bug causes Copilot to summarize confidential emails news	Advisory regarding a Microsoft 365 Copilot bug where confidential emails were summarized, bypassing data loss prevention policies. This issue, tracked under CW1226324 and detected January 21, affected the Copilot "work tab" chat feature, incorrectly processing emails in Sent Items and Drafts, even those with confidentiality labels. Microsoft confirmed a code error as the root cause and began rolling out a fix in early February, with remediation continuing for complex service environments. → bleepingcomputer.com
2026-02-18 2026	anthropics/prompt-eng-interactive-tutorial: Anthropic's Interactive Prompt Engineering Tutorial beginner	Tutorial on prompt engineering for Claude, teaching basic prompt structure, failure modes, Claude's capabilities, and building complex prompts for use cases like chatbots, legal, and financial services. It includes an interactive playground for practice, exercises, an answer key, and an appendix covering chaining prompts, tool use, and search/retrieval, recommending the Claude for Sheets extension for user-friendliness.
2026-02-17 2026	vxcontrol/pentagi: ✨ Fully autonomous AI Agents system capable of performing complex penetration testing tasks advanced Recon	Tool for fully autonomous AI-powered penetration testing, PentAGI leverages a team of specialized agents and integrates professional security tools like nmap, metasploit, and sqlmap within a secure Docker environment. It features a smart memory system, knowledge graph integration with Neo4j, and external search capabilities via Tavily, Perplexity, and Google Custom Search, with comprehensive monitoring and reporting through Grafana and PostgreSQL.
2026-02-16 2026	How I Built a 5-Path AI “Recon Beast” with n8n and Gemini (2026 Guide) intermediate Bug Bounty Recon	In 2026, the bug bounty landscape requires more than just speed, with AI enhancing attacker capabilities. The article discusses building a 5-Path AI "Recon Beast" using n8n and Gemini. This innovative approach leverages automation and AI to enhance reconnaissance processes for bug bounty hunting. The focus is on utilizing technology to improve efficiency and effectiveness in identifying vulnerabilities.
2026-02-11 2026	Thread by @firt on Thread Reader App advanced	Library updates detail the early preview of Chrome's WebMCP, enabling AI agents to query and execute services via imperative or declarative APIs. It also highlights Safari/WebKit's unanswered community questions, contrasting with Chrome's PWA installation on Windows 7, 8.x, and 10, which features a distinct "Install" verb and a similar UX to Chromebook PWAs.
2026-02-11 2026	SILENTCHAIN AI - AI-Powered Security Testing intermediate Burp	Library for AI-powered offensive security, covering web applications, source code, and network infrastructure. Features include OWASP Top 10 detection via a Burp Suite extension, standalone web application scanning with CI/CD integration, and AI-powered static code analysis with PoC generation. It integrates with five AI providers, including local Ollama support, and utilizes a RAG Knowledge Engine with over 80,000 security documents. Products offer cross-product correlation for finding escalation, WAF detection and evasion for 25+ types, and out-of-band testing for XSS, SSRF, and XXE.
2026-02-10 2026	Ed1s0nZ/CyberStrikeAI: CyberStrikeAI is an AI-native security testing platform built in Go. It integrates 100+ security tools, an intelligent orchestration engine, role-based testing with predefined security roles, a skills system with specialized testing skills, and comprehensive lifecycle management capabilities. intermediate	Platform that leverages AI for automated security testing. It integrates over 100 tools, including network scanners like nmap, web scanners such as sqlmap, and vulnerability scanners like nuclei. The platform features an intelligent orchestration engine, role-based testing with predefined security roles, and a skills system for specialized testing. It supports conversational commands, attack-chain analysis, knowledge retrieval via RAG, and provides a dashboard for system status and vulnerability management. Integrations include a Burp Suite extension and chatbot capabilities for DingTalk and Lark.
2026-02-07 2026	Agent twitter client mcp beginner	Agent twitter client mcp
2026-02-06 2026	Claude Opus 4.6 Finds 500+ High-Severity Flaws Across Major Open-Source Libraries news Bug Bounty	Library where Claude Opus 4.6 identified over 500 high-severity vulnerabilities in open-source projects like Ghostscript, OpenSC, and CGIF. The LLM demonstrated advanced code reasoning, finding flaws such as a missing bounds check in Ghostscript, a buffer overflow in OpenSC, and a heap buffer overflow in CGIF, even outperforming traditional fuzzers on complex logic-based bugs. → thehackernews.com
2026-02-06 2026	xalgord/AI-System-Prompts: XBot - Advanced AI Cybersecurity Agent \| Gemini system prompt for automated penetration testing and security assessments intermediate Bug Bounty	Library for XBot, an advanced AI cybersecurity agent system prompt for Gemini AI, facilitating automated penetration testing and security assessments. It supports comprehensive vulnerability scanning, active exploitation, OWASP Top 10 and advanced web application security testing, source code analysis, network security, and detailed reporting with remediation guidance. The system prompt enables autonomous operation, multi-target scanning, and robust vulnerability detection on authorized systems.
2026-02-02 2026	depthfirst \| 1-Click RCE To Steal Your Moltbot Data and Keys advanced RCE Secrets	Library that identifies vulnerabilities in OpenClaw, formerly Moltbot, by analyzing its code for logic flaws. The system maps application lifecycle flows, flagging issues like blindly accepting gateway URLs which, when combined with other issues, can lead to a 1-click RCE exploit, CVE-2026-25253. This exploit allows attackers to steal data and keys by chaining a Cross-Site WebSocket Hijacking vulnerability with API calls to disable security features.
2026-02-02 2026	skills/plugins/insecure-defaults/skills/insecure-defaults/SKILL.md at main · trailofbits/skills intermediate	Library for identifying fail-open vulnerabilities in applications, distinguishing exploitable defaults from crash-safe patterns. It aids in security audits by reviewing code, deployment configurations, and IaC templates for issues like fallback secrets, hardcoded credentials, weak defaults in authentication and CORS, insecure crypto algorithms such as MD5 and ECB, and exposed debug features. The library emphasizes analyzing production-reachable code and tracing execution paths to determine runtime behavior and assess the criticality of findings.
2026-02-01 2026	Prompt Injection Toolkit: 25 Payloads & Techniques for Mastering AI Pentesting intermediate Bug Bounty	Prompt Injection Toolkit: 25 Payloads & Techniques for Mastering AI Pentesting Ever tried breaking an AI chatbot with a ‘please ignore all previous instructions’ prompt, only to realize it’s …
2026-01-28 2026	insaaniManav/prompt-forge: AI prompt engineering workbench for crafting, testing, and systematically evaluating prompts with powerful analysis tools. intermediate	Workbench for AI prompt engineering that generates, analyzes, and systematically tests prompts, featuring smart generation with AI suggestions, advanced analysis for optimization feedback, and systematic evaluation creating comprehensive test suites for robustness, safety, accuracy, and creativity. It supports multiple models including Claude 3.5 Sonnet, GPT-4.1, Azure OpenAI, and Ollama, with organized version control and detailed execution history.
2026-01-27 2026	Hunting Account Takeovers in the Wild West of MCP OAuth Servers" intermediate AuthN	Library that details critical OAuth misconfigurations in MCP (Model Context Protocol) servers, enabling one-click account takeover (ATO) attacks. Vulnerabilities include open Dynamic Client Registration (DCR) and missing redirect URI validation, allowing attackers to register malicious clients and intercept authentication codes. The research highlights findings from subdomain enumeration, endpoint discovery, and configuration analysis, focusing on misaligned security settings like unprotected DCR endpoints and unsupported PKCE enforcement.
2026-01-25 2026	Coding Agents. The Insider Threat You Installed Yourself beginner	Coding Agents. The Insider Threat You Installed Yourself Stop Running AI Coding Assistants Blindly AI coding agents are booming everywhere right now. Not only because they help you ship code faster …
2026-01-23 2026	GitHub - mholzen/workflowy: Powerful CLI and MCP server for WorkFlowy: reports, search/replace, backup support, and AI integration (Claude, LLMs) intermediate	Tool for WorkFlowy, offering a CLI and MCP server. It enables AI integration with models like Claude and ChatGPT, alongside features for search, bulk replace, usage reports, and offline access via backup files. This Go-based application supports full-text search with regex, content transformation, and can pipe data through shell commands for LLM processing. Installation is available via Homebrew, Scoop, Go, or pre-built binaries.
2026-01-22 2026	AI’s Hacking Skills Are Approaching an ‘Inflection Point’ news Bug Bounty	Library detecting federated GraphQL vulnerabilities; AI models are increasingly capable of finding zero-day bugs and complex system interactions, as demonstrated by RunSybil's Sybil tool and Dawn Song's CyberGym benchmark. Frontier models like Anthropic's Claude Sonnet 4.5 show significant improvements in vulnerability identification, highlighting the growing need for AI-assisted defense strategies and secure-by-design coding practices. → wired.com
2026-01-18 2026	harishsg993010/crossbow-agent: world's first Opensource fully Autonomous AI Security Engineer intermediate	Library for an autonomous AI security engineer, "crossbow-agent," which finds and exploits vulnerabilities like hardcoded credentials, SQL injection, exposed admin panels, API key leaks, IDOR, command injection, session fixation, XSS, insecure file permissions, missing rate limiting, XXE, CORS misconfigurations, open redirects, JWT secret key leaks, NoSQL injection, SSRF, weak cryptography, race conditions, and directory traversal. It supports multiple AI models (GPT, Claude, Gemini) and integrates with OpenAI, Anthropic, or Google APIs.
2026-01-16 2026	trailofbits/skills: Trail of Bits Claude Code skills for security research, vulnerability detection, and audit workflows intermediate	Library of Claude Code skills from Trail of Bits, enhancing AI-assisted security analysis, vulnerability detection, and audit workflows. This marketplace provides codex-native skill discovery, allowing researchers to browse and install plugins locally or via a git clone. Contributions and bug reports are welcomed.
2026-01-13 2026	Securing AI Systems beginner	Course on securing AI systems, covering adversarial attacks, data poisoning, and model theft. It offers hands-on labs for implementing defenses, conducting red-team simulations, and evaluating weaknesses. You will learn threat modeling, vulnerability assessments, DevSecOps, and incident response within AI/ML workflows, cloud security, and MLOps.
2026-01-11 2026	Certified AI Security Professional - AI Security Certification - Practical DevSecOps beginner	Library covering the Certified AI Security Professional (CAISP) certification, this resource details AI security fundamentals, Large Language Model (LLM) attacks, and defenses. It explores OWASP Top 10 LLM vulnerabilities like prompt injection and training data poisoning, along with AI-DevOps integration. Key attack tactics from MITRE ATT&CK and ATLAS are examined, alongside threat modeling methodologies and supply chain security for AI. Emerging threats, governance, and compliance are also addressed, including discussions on the EU AI Act and NIST RMF.
2025-12-19 2025	KeygraphHQ/shannon: Fully autonomous AI hacker to find actual exploits in your web apps. Shannon has achieved a 96.15% success rate on the hint-free, source-aware XBOW Benchmark. advanced Bug Bounty	Library for fully autonomous, white-box AI pentesting of web applications and APIs. Shannon analyzes source code and executes real exploits, including Injection, XSS, SSRF, and Broken Authentication, to validate vulnerabilities before production. It leverages tools like Nmap and Subfinder, and can handle 2FA/TOTP logins with reproducible proof-of-concept exploits, achieving a 96.15% success rate on the XBOW Benchmark.
2025-12-17 2025	NVIDIA/garak: the LLM vulnerability scanner beginner	Tool for scanning Large Language Models (LLMs), `garak` probes for vulnerabilities like hallucination, data leakage, prompt injection, misinformation, toxicity generation, and jailbreaks. It employs static, dynamic, and adaptive probes to identify weaknesses in LLMs accessible via Hugging Face Hub, Replicate, OpenAI API, AWS Bedrock, LiteLLM, and REST endpoints. `garak` helps assess LLM security by mimicking tools like nmap or Metasploit Framework for LLMs, reporting on failure rates and logging detailed run information.
2025-12-13 2025	Building an Open-Source AI-Powered Auto-Exploiter with a 1.7B Parameter Model: No Paid APIs Required advanced	Library for building an open-source, AI-powered autonomous penetration testing agent. This system utilizes a 1.7 billion parameter qwen3:1.7b model, LangChain, and LangGraph for local execution, eliminating API costs and data exfiltration. It functions as a ReAct agent, independently scanning networks with Nmap, searching for exploits using searchsploit, mirroring them, analyzing code with `inspect_exploit_code`, setting up listeners with `start_listener`, and executing commands via `execute_shell_command` to achieve autonomous exploitation.
2025-12-11 2025	📚 tl;dr sec 308 news Supply Chain	😈 MCP Security, ☁️ AWS re:Invent Recaps, 🤖 Detecting Malicious Pull Requests with AI https://t.co/gt4zMQKZpp
2025-12-05 2025	GitHub - amaiya/onprem: A toolkit for applying LLMs to sensitive, non-public data in offline or restricted environments beginner	Library for applying LLMs to sensitive, non-public data locally or in restricted environments. OnPrem.LLM, a Python toolkit inspired by privateGPT, offers full local execution with optional cloud provider integration (OpenAI, Anthropic). It features analysis pipelines for extraction, summarization, and Q&A, supports resource-constrained environments with SparseStore, and integrates with tools like Elasticsearch. Recent updates include an `AgentExecutor` for sandboxed AI agents and support for workflows and asynchronous prompts.
2025-10-30 2025	fr0gger/proximity: Proximity is a MCP security scanner powered with NOVA intermediate Supply Chain	Library for scanning MCP (Model Context Protocol) servers and Agent Skills, Proximity uses NOVA rules to detect security issues like prompt injection and jailbreaks. It performs detailed analysis of server capabilities and skill structures, supporting MCP Spec 2025-11-25 and providing pattern-specific remediation guidance.
2025-10-15 2025	The MCP Security Tool You Probably Need - MCP Snitch intermediate Supply Chain	Library implementing a proxy-based security model for MCP tools, offering a critical mediation layer until native MCP security primitives and platform-level fine-grained scoping are adopted. MCP Snitch intercepts tool calls, enforces user-defined whitelists for operations, and provides visibility and control, mitigating risks like those demonstrated by the GitHub MCP vulnerability. This approach prioritizes explicit allow-listing over deny-listing for robust access control.
2025-10-14 2025	AI For Hackers: Red Team Editions – Codelivly Resources beginner	Manual for offensive AI tradecraft, this 1,100-page guide teaches red teams to build autonomous hacking agents. It covers AI-augmented reconnaissance, polymorphic payload generation using generative models, AI-driven vulnerability discovery with tools like CodeBERT and reinforcement learning fuzzers, and adaptive C2 frameworks. The resource includes 60+ labs, 500+ Python code examples, and methods for bypassing AI-based security with adversarial examples.
2025-10-12 2025	5 Essential MCP Servers That Give Claude & Cursor Real Superpowers (2025) beginner	“” is published by Prithwish Nath in Artificial Intelligence in Plain English.
2025-10-02 2025	Offensive AI - Hacker Associate beginner	Certification program merging traditional web pentesting with AI automation. This hands-on course teaches how to identify, exploit, and report vulnerabilities using GPT agents, LangChain, AutoGPT, and tools like Burp Suite and Turbo Intruder. Modules cover AI-powered reconnaissance, exploitation of access control and XSS, authentication bypass, API testing, automated reporting, and advanced agent development for WAF bypass, business logic flaws, and CI/CD pipeline analysis. It also delves into AI red teaming, adversarial AI testing, and prompt injection attacks.
2025-08-22 2025	Model Context Protocol (MCP): Understanding security risks and controls intermediate	Library for securing Anthropic's Model Context Protocol (MCP), which connects LLMs to external tools. It addresses confused deputy vulnerabilities via OAuth, supply chain risks by requiring signed components and SAST/SCA in build pipelines, unauthorized command execution with input sanitization and sandboxing, prompt injection through user confirmation, and tool injection by enabling version pinning and modification notifications. The library also details mitigation for MCP sampling exploitation and emphasizes logging best practices.
2025-08-13 2025	AI Mastery for Cybersecurity Professionals beginner Talks	Bundle of 10 EC-Council courses focused on applying AI to cybersecurity. This learning resource covers topics such as AI-driven threat detection, LLM pentesting, automated reconnaissance for bug bounty hunting using tools like Nuclei and HTTPX, and defending against generative AI threats like phishing and deepfakes. It aims to equip cybersecurity professionals with skills to automate detection, strengthen defenses, and enhance cyber intelligence.
2025-04-30 2025	#burp #pentest #ai #hackerassociate #cybersecurity #infosec… \| Harshad Shah intermediate Burp Talks	Setting Up #Burp MCP Server on Claude Desktop #Pentest Modern App with #Ai ⇢ Learn how to set up a 𝗕𝘂𝗿𝗽 𝗠𝗖𝗣 𝗦𝗲𝗿𝘃𝗲𝗿 on your 𝗖𝗹𝗮𝘂𝗱𝗲 𝗱𝗲𝘀𝗸𝘁𝗼𝗽 in this easy-to-follow tutorial. ⇢ Get your server up and...
2025-04-13 2025	Building Your First Offensive Security MCP Server - Renae Schilg - Medium intermediate	So, you’ve read the primer here, you have a basic understanding of MCP servers and how they work and now you’re ready to build your own. We are going to be building a simple MCP server that performs…
2025-04-09 2025	Defensive Deception with Kong and Beelzebub LLM Honeypot intermediate	In today’s increasingly sophisticated cyber threat landscape, organizations need to move beyond traditional defensive measures. While firewalls, intrusion detection systems, and vulnerability…
2025-03-24 2025	Prompt Engineering Guide – Nextra beginner	Guide to prompt engineering, a new discipline for optimizing prompts to interact with and develop large language models (LLMs). This resource compiles the latest papers, advanced prompting techniques, learning guides, model-specific guides, lectures, references, new LLM capabilities, and tools, aiming to improve LLM safety and augment capabilities with domain knowledge and external tools.
2025-02-25 2025	GenAI with Python: Build Agents from Scratch (Complete Tutorial) beginner	Prompt Engineering is the practice of designing and refining prompts (text inputs) to enhance the behavior of Large Language Models (LLMs). The goal is to get the desired responses from the model by…
2025-02-14 2025	GitHub - microsoft/generative-ai-for-beginners: 21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/ beginner	Library of 21 lessons for building Generative AI applications, covering concepts and code examples in Python and TypeScript. Lessons include Azure OpenAI Service, GitHub Marketplace Model Catalog, and OpenAI API, with "Keep Learning" sections. Basic Python or TypeScript knowledge is recommended, and a GitHub account is required for local cloning and contributions. Sparse checkout instructions are provided to reduce download size by excluding translations.
2025-02-09 2025	GitHub - potpie-ai/potpie: Prompt-To-Agent : Create custom engineering agents for your codebase intermediate	Library for creating AI agents that reason about your codebase. Potpie transforms repositories into knowledge graphs stored in Neo4j, enabling agents to understand code context for debugging and feature development. It supports OpenAI, Ollama, and Anthropic LLM providers, with configurable authentication for GitHub repositories via GitHub Apps or Personal Access Tokens. The architecture includes a FastAPI API layer, Celery workers for asynchronous parsing, and a Neo4j knowledge graph as the core context provider.
2025-02-09 2025	GitHub - eastlondoner/cursor-tools: Give Cursor Agent an AI Team and Advanced Skills intermediate	Library for extending AI coding assistants like Cursor Composer, Cursor, Claude Code, and Codex with advanced skills and an AI team. It integrates with Perplexity for web search and Gemini 2.0 for large context windows, enabling capabilities such as working with GitHub Issues and Linear, generating local documentation, analyzing YouTube videos, and operating web applications via Stagehand. The library offers a CLI for system-wide access and supports multiple AI providers including OpenAI, Anthropic, and OpenRouter.
2025-01-30 2025	Set Up Your Own Cybersecurity-Focused AI Development, Training, and Fine-Tuning Lab at Home intermediate	As AI applications rapidly evolve, commercial platforms like OpenAI, Gemini, and many other LLM versions are offering advanced capabilities…
2025-01-24 2025	GitHub - JasonLovesDoggo/caddy-defender: Caddy module to block IPs and prevent AIs from training on your website. intermediate	Library for Caddy that blocks IP addresses and prevents AI training on websites. It supports IP range filtering, predefined ranges for services like OpenAI and GitHub Copilot, and custom ranges. Responders include blocking, custom messages, dropping connections, returning garbage data, redirection, rate limiting, and tarpitting. Installation is available via a pre-built Docker image.
2025-01-10 2025	SSH LLM Honeypot caught a real threat actor - Beelzebub Blog intermediate	Library for configuring an SSH LLM honeypot using the Beelzebub framework. This resource details how a threat actor was caught downloading binaries with known exploits and attempting to join a botnet via an IRC channel. Analysis of the threat actor's actions, including IP address, credentials, and observed commands, is provided, along with steps to recreate the honeypot setup and details on the Perl script used for DDoS and C2 communication through Undernet IRC channels.
2024-12-31 2024	GitHub - browser-use/browser-use: Make websites accessible for AI agents intermediate	Library for scalable, stealth-enabled browser automation. It enables coding agents like Cursor and Claude Code to interact with websites, supporting custom tools and offering both open-source and cloud-hosted agent options. The library provides a CLI for direct browser control and features optimized LLMs like ChatBrowserUse for faster, more accurate task completion. Production deployments are recommended for the cloud API due to its scalable infrastructure, proxy rotation, and captcha handling capabilities.
2024-10-05 2024	GitHub - fr0gger/Awesome-GPT-Agents: A curated list of GPT agents for cybersecurity beginner	Library of curated GPT agents for cybersecurity, categorized for offensive and defensive applications. This community-driven resource lists various specialized agents, including MagicUnprotect for malware evasion, GP(en)T(ester) for pentesting, Threat Intel Bot for APT tracking, Vulnerability Bot for secure coding, SourceCodeAnalysis for code review, Web Hacking Wizard for web security education, CyberGPT for CVE details, MITREGPT for MITRE ATT&CK mapping, and AppSec Test Crafter for generating application security test cases in YAML.
2024-08-28 2024	Microsoft Copilot: From Prompt Injection to Exfiltration of Personal Information · Embrace The Red advanced	Writeup detailing a Microsoft 365 Copilot vulnerability where prompt injection, automatic tool invocation, and ASCII smuggling were combined to exfiltrate personal information. The exploit chain leveraged malicious emails or shared documents to trigger Copilot's processing, enabling it to access and send sensitive data like emails and MFA codes to attacker-controlled domains via disguised hyperlinks.
2023-12-05 2023	pentestmuse-ai/PentestMuse intermediate	Library for an AI assistant designed for cybersecurity professionals, Pentest Muse aids penetration testers in brainstorming, payload generation, code analysis, and reconnaissance. It offers both command-line and web application interfaces, supporting iterative task completion and direct command execution. Users can connect via managed APIs or integrate their own OpenAI API keys.
2023-11-18 2023	protectai/ai-exploits intermediate	Library of exploits and Nuclei scanning templates for machine learning infrastructure vulnerabilities. This collection, including Metasploit modules and CSRF templates, addresses real-world attacks such as system takeovers and data loss, often without authentication. Vulnerabilities affect tools, libraries, and frameworks used in AI/ML model development, training, and deployment, with specific examples like Ray and MLflow being addressed.
2023-11-09 2023	https://chat.openai.com/g/g-6Bcjkotez-getpaths intermediate	https://ift.tt/fbJIsGN
2023-06-25 2023	Beginners guide to AI in cybersec. Hacking with ChatGPT. beginner	Beginners guide to AI in cybersec. Hacking with ChatGPT. https://ift.tt/UDRVtCp
2023-06-12 2023	Threat Modeling Example with ChatGPT intermediate	Threat Modeling Example with ChatGPT https://ift.tt/FRkZvyO
2023-05-18 2023	The AI Attack Surface Map v1.0 advanced	Framework for thinking about AI system attack surfaces, this resource maps components like AI Assistants, Agents, Tools, Models, and Storage. It highlights natural language as a primary attack vector, detailing techniques such as prompt injection against Agents and Tools to execute arbitrary commands or access sensitive data. Model attacks focus on subtle manipulation, while Storage vulnerabilities, particularly in Vector Databases, allow for data extraction and potential compromise of embeddings. The framework aims to clarify the evolving landscape of AI vulnerabilities beyond just machine learning models. → danielmiessler.com
2023-05-09 2023	How I Automate BugBounty Using Chatgpt intermediate Bug Bounty	How I Automate BugBounty Using Chatgpt https://ift.tt/93SQsPD
2023-04-09 2023	aress31/burpgpt intermediate Burp	Library for integrating OpenAI's GPT models into Burp Suite for passive security vulnerability detection. BurpGPT analyzes web traffic by sending requests and responses to a specified OpenAI model, leveraging custom prompts for tailored analysis. It generates automated security reports, highlighting potential issues beyond traditional scanner capabilities, but requires professional triaging for false positives. The extension supports various OpenAI models and allows granular control over token usage and prompt length. It requires Burp Suite Professional or Community Edition (version 2023.3.2+) and JDK 11+.
2023-04-02 2023	SecGPT transforms cybersecurity through AI-driven insights. news	SecGPT transforms cybersecurity through AI-driven insights. https://ift.tt/4kTKfoJ
2023-04-02 2023	I Used GPT-3 to Find 213 Security Vulnerabilities in a Single Codebase intermediate	I Used GPT-3 to Find 213 Security Vulnerabilities in a Single Codebase https://ift.tt/FrMSdKx
2023-04-02 2023	HackGPT beginner	HackGPT https://ift.tt/JsIGRO1
2023-03-29 2023	Microsoft Security Copilot is a new GPT-4 AI assistant for cybersecurity news	Tool that uses GPT-4 and Microsoft's security-specific model to assist cybersecurity professionals. It synthesizes enterprise security incidents, analyzes files and code, and summarizes alerts from other security tools. Security Copilot draws from 65 trillion daily signals, CISA, NIST, and its own threat intelligence, offering a prompt book for automations and a collaborative workspace. It can also generate PowerPoint summaries of incidents and attack vectors.
2022-02-03 2022	Favorite tweet by @LeaKissner news	Favorite tweet: Nicolas Carlini's ML training data extraction attack talk at #Enigma2022 escalated quickly. https://t.co/C8kzAyq7lh — Lea Kissner (@LeaKissner) Feb 2, 2022

Frequently Asked Questions

What is prompt injection?: Prompt injection is an attack against applications that use large language models (LLMs). An attacker crafts input that overrides or manipulates the LLM's system instructions, causing it to perform unintended actions. Direct prompt injection targets the user input; indirect prompt injection embeds malicious instructions in data the LLM processes, such as emails or web pages.
What is the OWASP Top 10 for LLM Applications?: The OWASP Top 10 for LLM Applications identifies the most critical security risks for AI-powered applications, including prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft.
How do you secure AI-integrated applications?: Key practices include validating and sanitizing LLM outputs before rendering or executing them, implementing least-privilege access for AI agents, using guardrails to constrain model behavior, monitoring for prompt injection attempts, applying rate limiting, separating AI processing from privileged operations, and treating all LLM output as untrusted user input.

Weekly AppSec Digest

Get new resources delivered every Monday.

AI

Related Topics

Frequently Asked Questions

Weekly AppSec Digest