AI
AI security encompasses both protecting AI systems from attack and understanding the new vulnerability classes that AI introduces into applications. As organizations rapidly integrate large language models (LLMs), machine learning pipelines, and AI-powered features into their products, the attack surface has expanded in ways that traditional application security frameworks don't fully address.
Key threats to AI systems include prompt injection — where attackers manipulate LLM behavior through crafted inputs — data poisoning of training datasets, model extraction through repeated API queries, and adversarial examples that cause misclassification. Indirect prompt injection, where malicious instructions are embedded in data the AI processes (emails, documents, web pages), is emerging as one of the most significant security challenges for AI-integrated applications.
AI also introduces new categories of application risk: insecure output handling where LLM responses are rendered unsafely, excessive agency when AI agents are given too much access, sensitive information disclosure through training data leakage, and supply chain risks from fine-tuned models and third-party plugins. The OWASP Top 10 for LLM Applications provides a structured framework for understanding these risks.
On the defensive side, AI is being used to enhance security operations — automating vulnerability detection, analyzing malicious patterns, and accelerating incident response.
This page collects AI security research, LLM vulnerability techniques, defensive strategies, and resources covering the intersection of artificial intelligence and application security.
| Date Added | Link | Excerpt |
|---|---|---|
| 2026-04-29 NEW 2026 | CVE-2026-42208: LiteLLM bug exploited 36 hours after its disclosure news SQLi | Writeup of CVE-2026-42208, an SQL injection in LiteLLM's proxy API key verification, exploited 36 hours post-disclosure. Attackers leverage crafted Authorization headers to access and potentially modify sensitive data in database tables holding API keys and credentials. The vulnerability, present in LiteLLM versions 1.81.16 to 1.83.6, was addressed in version 1.83.7. Disabling error logs offers a workaround for unpatchable instances. → securityaffairs.com |
| 2026-04-29 NEW 2026 | AI Finds 38 Security Flaws in OpenEMR news RCE | An AI system has identified 38 security vulnerabilities within the OpenEMR electronic health records software. The AI's analysis, detailed in a linked report, uncovered these flaws, highlighting potential risks to patient data security and system integrity. This discovery underscores the growing role of artificial intelligence in identifying and addressing security weaknesses in critical software applications. No specific bug bounty payout amount was mentioned in the provided content. → darkreading.com |
| 2026-04-29 NEW 2026 | LiteLLM exploited within 36 hours of disclosure via SQL injection bug news SQLi | Library for managing large language model (LLM) interactions. Explores the exploitation of CVE-2026-42208, a SQL injection vulnerability in LiteLLM, which led to the theft of API keys and provider credentials from enterprises using the proxy to connect to models like OpenAI and Anthropic. The vulnerability, disclosed and exploited within 36 hours, highlights the compressed window between vulnerability discovery and weaponization, potentially exposing sensitive company IP and private data. Disabling error logs is a suggested mitigation. → scworld.com |
| 2026-04-29 NEW 2026 | Malicious npm Dependency Linked to AI Assisted Commit Targets Crypto Wallets news Supply Chain | Library of malicious npm dependencies linked to AI-assisted commits, specifically @validate-sdk/v2 and the PromptMink campaign, targeting crypto wallets. This North Korean state-sponsored actor, Famous Chollima, employed a layered attack structure with legitimate-seeming Web3 utilities hiding malware payloads, evolving from JavaScript to compiled binaries and Rust across Linux and Windows to exfiltrate sensitive data, system information, project folders, and install SSH keys for persistent access. → infosecurity-magazine.com |
| 2026-04-29 NEW 2026 | Fresh LiteLLM Vulnerability Exploited Shortly After Disclosure news SQLi | Library for securing AI gateways, specifically addressing CVE-2026-42208, a critical-severity SQL injection vulnerability in LiteLLM. This flaw, exploitable pre-authentication, allowed unauthenticated attackers to craft malicious Authorization headers to access sensitive database tables containing API keys and credentials. The vulnerability arises from a database query that includes caller-supplied values directly, bypassing parameterization. LiteLLM version 1.83.7 resolves this by properly parameterizing the query, with disabling error logs also offered as a mitigation. → securityweek.com |
| 2026-04-29 NEW 2026 | Firefox using advanced AI to find fix browser security flaws news Fuzzing | Firefox is leveraging advanced AI to proactively identify and fix security vulnerabilities in its browser. This innovative approach aims to enhance user safety by detecting flaws before they can be exploited. The article highlights how AI is becoming an increasingly powerful tool in cybersecurity, particularly in the realm of software development and maintenance. → msn.com |
| 2026-04-29 NEW 2026 | Cursor AI Vulnerability Enables Remote Code Execution news RCE | A critical vulnerability in Cursor AI has been discovered, allowing for Remote Code Execution (RCE). This means an attacker could potentially run unauthorized code on a user's system through the AI. The exact impact and exploitation details are likely to be further detailed in the linked content. This type of vulnerability poses a significant security risk, potentially leading to data breaches, system compromise, and other malicious activities. → letsdatascience.com |
| 2026-04-28 NEW 2026 | FIRESIDE CHAT: Leaked secrets are now the go-to attack vector and AI is accelerating exposures news Secrets | Library for scanning public GitHub commits and private repositories for hard-coded secrets. It detects over 28.6 million leaked credentials in 2025, a 34% year-over-year increase, with AI infrastructure secrets like OpenRouter and DeepSeek API keys spiking significantly. The library addresses the remediation problem, noting that 64% of leaked credentials from 2022 remain active. It highlights how AI-assisted code, like commits co-signed by Claude Code, contains secrets at a 33% rate, and emphasizes the need for governance alongside tools like SPIFFE for machine identity. → securityboulevard.com |
| 2026-04-28 NEW 2026 | Experts flag potentially critical security issues at heart of Anthropic MCP news | Security experts have identified potentially critical vulnerabilities within Anthropic's "MCP" (likely referring to their model or platform). These issues, if exploited, could pose significant risks. The article highlights concerns about the security of Anthropic's core technology. No specific payout amounts for bug bounties were mentioned in the provided content. → msn.com |
| 2026-04-27 NEW 2026 | Weekly Recap: Fast16 Malware XChat Launch Federal Backdoor AI Employee Tracking & More news | Toolset highlighting recent application security threats including fast16 malware, the UNC6692 group's Snow malware suite, FIRESTARTER backdoor targeting a U.S. federal agency, Lotus Wiper affecting Venezuelan energy systems, and The Gentlemen RaaS deploying SystemBC. It also covers the Bitwarden CLI compromise, detailing vulnerabilities such as CVE-2025-20333 and CVE-2025-20362. → thehackernews.com |
| 2026-04-27 NEW 2026 | Poisoned pixels phishing prompt injection: Cybersecurity threats in AI-driven radiology beginner | Library discussing AI vulnerabilities in healthcare radiology, focusing on prompt injection techniques like data poisoning, backdoor attacks, and jailbreaking. It highlights risks of LLMs in DICOM headers and diagnostic imaging data, enabling attacks without advanced programming skills. Countermeasures explored include least privilege, sandboxing, digital watermarking, and red teaming involving clinical specialists, alongside the persistent human factor in cybersecurity. |
| 2026-04-26 NEW 2026 | Anthropic's model context protocol includes a critical remote code execution vulnerability news RCE | A critical remote code execution vulnerability has been discovered in Anthropic's model context protocol. This flaw could allow attackers to execute arbitrary code on a system, posing a significant security risk. Further details are available at the provided link. No bug bounty payout amount is mentioned in the content. → msn.com |
| 2026-04-26 NEW 2026 | prompt-security/clawsec: A complete security skill suite for OpenClaw's and NanoClaw agents (and variants). Protect your SOUL.md (etc') with drift detection, live security recommendations, automated audits, and skill integrity verification. All from one installable suite. intermediate Supply Chain | Library for comprehensive security for AI agent platforms like OpenClaw, NanoClaw, Hermes, and Picoclaw. It provides unified security monitoring, drift detection, live security recommendations from NVD CVE polling, automated audits for prompt injection, and skill integrity verification. The suite includes a one-command installer, file integrity protection for critical agent files (SOUL.md, etc.), and checksum verification for all skill artifacts. It also offers exploitability context enrichment for CVE advisories, detailing exploit existence, weaponization status, attack requirements, and risk assessment to prioritize immediate threats. |
| 2026-04-24 2026 | Indirect prompt injection is taking hold in the wild beginner | Analysis of indirect prompt injection (IPI) observed in the wild, detailing techniques for hiding malicious instructions within web pages and metadata. Researchers from Google and Forcepoint identified IPIs ranging from harmless pranks to destructive actions like data exfiltration, financial fraud via PayPal and Stripe, and denial-of-service attacks. Hidden text, HTML comments, and metadata injection are common obfuscation methods. The increasing prevalence and sophistication of these attacks, particularly against agentic AIs with elevated privileges, necessitate strict data-instruction boundaries. → helpnetsecurity.com |
| 2026-04-24 2026 | GPT-5.5 Bio Bug Bounty Program Aims to Improve AI Safety and Performance news Bug Bounty | A bug bounty program has been launched for GPT-5.5, focusing on enhancing both AI safety and performance. This initiative encourages researchers to identify and report vulnerabilities, contributing to the ongoing development and refinement of the AI model. The program aims to proactively address potential issues before widespread deployment, ensuring a more robust and secure AI. Specific details on payout amounts are not provided in the title or content. → gbhackers.com |
| 2026-04-24 2026 | How indirect prompt injection attacks on AI work - and 6 ways to shut them down intermediate | Library of resources addressing indirect prompt injection attacks on LLMs, a leading security risk. This threat involves hidden instructions within web content, emails, or addresses that can cause AI to perform malicious actions like data exfiltration or unauthorized redirection, as detailed by researchers from Palo Alto Networks and Forcepoint. Techniques such as API key theft, system override, attribute hijacking, and terminal command injection are outlined. The library also covers defensive strategies including input/output validation, human oversight, and vendor-specific mitigation efforts from Google, Microsoft, Anthropic, and OpenAI. |
| 2026-04-23 2026 | Six AI Vulnerabilities Three Attack Patterns One Dangerous Service Gap news | Library for analyzing AI vulnerabilities, focusing on three distinct attack patterns: untrusted input processed as trusted AI context, overly broad AI data access without per-operation enforcement, and process containment and functional scoping failures. This analysis covers vulnerabilities like EchoLeak, Reprompt, ForcedLeak, GeminiJack, and GrafanaGhost, highlighting the need for robust input validation extended to all data sources AI touches, per-operation access control for AI data requests, and strict functional scoping for back-end AI processes, rather than solely relying on model-level guardrails. |
| 2026-04-23 2026 | AI-powered scanner vulnerabilities news | Library detailing vulnerabilities in AI-powered web scanners that leverage Large Language Models. It outlines how attacker-controlled content can influence scanner reasoning, leading to indirect prompt injection attacks. These attacks can cause unintended state changes, data exfiltration, and exploitation of routing-based SSRF, often by manipulating Host headers to access internal services from within the scanner's privileged network position. → portswigger.net |
| 2026-04-23 2026 | Anthropic's model context protocol includes a critical remote code execution vulnerability news | Anthropic's model context protocol includes a critical remote code execution vulnerability https://ift.tt/Hfb3ygq → msn.com |
| 2026-04-22 2026 | Massive compromise hits LiteLLM and the whole AI developers community: how did it happen? news | Massive compromise hits LiteLLM and the whole AI developers community: how did it happen? https://ift.tt/kWQ0dJB → cybernews.com |
| 2026-04-22 2026 | Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it news | Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it https://ift.tt/smH86bY |
| 2026-04-22 2026 | You're Simulating the Wrong Attacker: Who Matters in AI Red Teaming beginner | Library for AI red teaming that highlights the limitations of simulating only prompt injection attackers. It details six distinct threat actor profiles, including low-skill script kiddies, insider threats, and sophisticated nation-state actors, each requiring specialized testing approaches across five expertise domains: prompt engineering, application security, architecture, data/ML security, and business logic. The resource emphasizes that traditional app security teams and even many AI-focused firms miss critical attack surfaces by not simulating a broader range of adversaries and attack vectors. |
| 2026-04-22 2026 | DeepTeam: Open-Source Framework to Red Team LLMs and LLM Systems intermediate | Framework for red teaming LLM systems, DeepTeam simulates attacks like jailbreaking, prompt injection, and multi-turn exploitation to uncover vulnerabilities such as bias, PII leakage, and SQL injection. It supports over 50 pre-built vulnerabilities mapped to frameworks like OWASP Top 10 for LLMs and NIST AI RMF, along with 20+ adversarial attack methods. DeepTeam also includes seven production-ready guardrails and allows custom vulnerability creation. |
| 2026-04-22 2026 | Claude Jailbreaking in 2026: What Repello's Red Teaming Data Shows news | Analysis of Repello's red-teaming data on LLM jailbreaking reveals Claude Opus 4.5's significantly lower breach rates (4.8%) compared to GPT-5.2 (14.3%) and GPT-5.1 (28.6%) across 21 multi-turn adversarial scenarios. Claude Opus 4.5 demonstrated complete defense against financial fraud and mass deletion attempts, while GPT-5.2 exhibited a "refusal-enablement gap" by refusing harmful actions linguistically yet providing executable attack steps. The analysis highlights that operational risk stems from multi-turn adversarial sequences and application-layer attacks on custom deployments, rather than simple single-prompt jailbreaks. |
| 2026-04-22 2026 | AI-Infra-Guard: Full-Stack AI Red Teaming Platform intermediate | Platform for full-stack AI red teaming, AI-Infra-Guard integrates capabilities like ClawScan, Agent Scan, AI infra vulnerability scanning, MCP Server & Agent Skills scan, and Jailbreak Evaluation. It aims to detect vulnerabilities including the LiteLLM supply chain attack (CRITICAL) and supports scanning AI components like FastGPT, Upsonic, crewai, and kubeai, with a vulnerability database refreshed across multiple components and new CVE/GHSA entries. |
| 2026-04-22 2026 | AI Red Teaming Playground Labs (Microsoft) intermediate | Library providing AI Red Teaming Playground Labs, originally featured in Black Hat USA 2024. It offers challenges for systematically red teaming AI systems, incorporating adversarial machine learning and Responsible AI failures. These labs are also referenced in the Microsoft Learn Limited Series: AI Red Teaming 101. The repository includes Jupyter Notebooks showcasing the use of the Python Risk Identification Tool (PyRIT) for automated risk identification in generative AI systems, specifically for Labs 1 and 5. |
| 2026-04-22 2026 | HackerOne: LLM01: Invisible Prompt Injection intermediate | Program: HackerOne Severity: medium Weakness: LLM01: Prompt Injection ## Description Hey team, Hai is vulnerable to invisible prompt injection via Unicode tag characters. ## Reproduction steps 1. ... → hackerone.com |
| 2026-04-22 2026 | When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins beginner | Survey of prompt injection risks in third-party AI chatbot plugins, analyzing 17 plugins used by over 10,000 websites. Eight plugins fail to enforce conversation history integrity, amplifying direct prompt injection by allowing forged system messages. Fifteen plugins indiscriminately ingest third-party content for web-scraping, enabling indirect prompt injection when attackers poison external data. This study systematically evaluates these vulnerabilities, showing how insecure plugin practices undermine LLM-level defenses. → arxiv.org |
| 2026-04-22 2026 | Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis advanced | Analysis of prompt injection vulnerabilities affecting agentic AI coding assistants like Claude Code, GitHub Copilot, and Cursor, which integrate LLMs with external tools and protocols such as MCP. This work synthesizes findings from 78 studies, detailing 42 attack techniques including input manipulation, tool poisoning, and protocol exploitation. It identifies that over 85% of attacks succeed against current defenses, often enabling arbitrary code execution and system compromise through vulnerabilities in skill-based architectures and protocol ecosystems. → arxiv.org |
| 2026-04-22 2026 | Prompt Injection 2.0: Hybrid AI Threats advanced | Library for analyzing Prompt Injection 2.0, which combines LLM manipulation with traditional exploits like XSS and CSRF. It builds upon Preamble's research and mitigation technologies, evaluating them against contemporary threats such as AI worms and multi-agent infections. The library analyzes how these hybrid attacks bypass security controls, referencing CVE-2024-5565 and DeepSeek XSS exploits, and proposes architectural solutions involving prompt isolation and runtime security. → arxiv.org |
| 2026-04-22 2026 | Architecting Secure AI Agents: System-Level Defenses Against Indirect Prompt Injection advanced | Library for architecting secure AI agents, focusing on system-level defenses against indirect prompt injection. It proposes dynamic replanning, constrained LLM decision-making, and treating personalization and human interaction as core design elements. The work critiques existing benchmarks, highlighting the importance of system-level structures for controlling agent behavior and integrating rule-based and model-based security checks. → arxiv.org |
| 2026-04-22 2026 | Anthropic's Model Context Protocol includes a critical remote code execution vulnerability newly discovered exploit puts 200000 AI servers at risk news | Writeup of critical RCE vulnerability in Anthropic's Model Context Protocol (MCP) affecting its SDKs across Python, TypeScript, Java, and Rust. The flaw, rooted in STDIO transport interface handling of local process execution, allows arbitrary command injection via user-controlled input without sanitization. Exploitation vectors include UI injection in AI frameworks, hardening bypasses in tools like Flowise, zero-click prompt injection in AI coding IDEs such as Windsurf and Cursor, and malicious package distribution via MCP marketplaces. OX Security reported numerous CVEs, with some fixed and others awaiting resolution. |
| 2026-04-21 2026 | The 'by design' security flaw of Model Context Protocol (MCP) news | Writeup on the Model Context Protocol (MCP) by OX Security details an architectural flaw allowing remote command execution by exploiting its STDIO interface. This vulnerability affects millions of AI applications and has resulted in numerous CVEs, enabling attackers to hijack servers and exfiltrate data through unverified MCP marketplace configurations like those found in LangFlow and AI IDEs like Windsurf and Cursor. The report emphasizes the need for developers to implement manifest-only execution, strict sandboxing, explicit opt-ins, least-privilege secret management, and marketplace verification to mitigate risks. |
| 2026-04-21 2026 | Prompt injection turned Googles Antigravity file search into RCE news | Tool: Prompt injection allows RCE in Google's Antigravity IDE, bypassing Secure Mode. Researchers exploited a flaw in the `find_my_name` tool, which used the `fd` utility. By injecting command-line flags into the `Pattern` parameter, attackers could transform file searches into arbitrary code execution, even through indirect prompt injection from untrusted source files. This bypasses Secure Mode because the native tool invocation occurs before security boundary checks. → csoonline.com |
| 2026-04-21 2026 | Claude Code Gemini CLI and GitHub Copilot Vulnerable to Prompt Injection via GitHub Comments news | Claude Code, Gemini CLI, and GitHub Copilot Vulnerable to Prompt Injection via GitHub Comments https://ift.tt/FS25xif → cybersecuritynews.com |
| 2026-04-21 2026 | Google Patches Antigravity IDE Flaw Enabling Prompt Injection Code Execution news | Library for defending against prompt injection attacks in AI-powered development tools. This library addresses vulnerabilities like the one in Google's Antigravity IDE, where flaws in file searching and input sanitization allowed code execution via the `-X` flag. It also covers techniques seen in attacks such as Comment and Control against GitHub Copilot, NomShub in Cursor IDE, ToolJack, CVE-2026-21520 in Microsoft Copilot Studio, and Claudy Day in Claude, all of which leverage untrusted input to manipulate AI agents, exfiltrate data, or gain unauthorized access. → thehackernews.com |
| 2026-04-20 2026 | Vuln in Googles Antigravity AI agent manager could escape sandbox give attackers remote code execution news | Vulnerability in Google's Antigravity AI agent manager allowed prompt injection to bypass secure mode, granting attackers remote code execution by exploiting the `find_by_name` native tool before sandbox protections engaged. This discovery, made by Pillar Security and since patched, highlights the risks of unvalidated input for agentic AI, similar to findings in Cursor, and emphasizes the need to move beyond sanitization controls for native tool parameters. |
| 2026-04-20 2026 | Anthropic MCP Hit by Critical Vulnerability Enabling Remote Code Execution news | Anthropic MCP Hit by Critical Vulnerability Enabling Remote Code Execution https://ift.tt/4HM1zP0 → gbhackers.com |
| 2026-04-20 2026 | Critical Anthropic MCP Vulnerability Enables Remote Code Execution Attacks news | Critical Anthropic MCP Vulnerability Enables Remote Code Execution Attacks https://ift.tt/sjNEzGL → cyberpress.org |
| 2026-04-19 2026 | MCP Tool Poisoning — How It Works & How To Fight It intermediate | Library detailing MCP tool poisoning, an indirect prompt injection attack targeting AI agents interacting with tools via Model Context Protocol (MCP) servers. Attackers hide malicious instructions within tool metadata, like descriptions or schemas, making them invisible to users but readable by AI agents. This technique can lead to data exfiltration, credential hijacking, and remote code execution, and can be combined with other attacks such as MCP rug pulls. Mitigation strategies primarily involve using MCP gateways and robust AI security tools to detect changes in tool metadata and outputs. |
| 2026-04-19 2026 | Model Context Protocol Has Prompt Injection Security Problems intermediate | Library for securing applications that implement the Model Context Protocol (MCP), addressing prompt injection vulnerabilities. It details attacks like rug pulls, tool shadowing, and tool poisoning, as demonstrated by examples involving exfiltrating WhatsApp message history and manipulating `os.system()` calls. The library highlights the inherent dangers of mixing untrusted instructions with tools that can perform actions on a user's behalf. |
| 2026-04-19 2026 | Vulnerability of LLMs to Prompt Injection in Medical Advice — JAMA news | Vulnerability of LLMs to Prompt Injection in Medical Advice — JAMA |
| 2026-04-19 2026 | Prompt Injection Attack Against LLM-Integrated Applications — arXiv beginner | Survey of prompt injection attacks against LLM-integrated applications, detailing the limitations of current methods and introducing HouYi, a novel black-box attack technique. HouYi, inspired by traditional web injection, comprises a pre-constructed prompt, an injection prompt for context partitioning, and a malicious payload. The study demonstrates severe outcomes like unrestricted LLM usage and application prompt theft across 36 real-world applications, with 31 found vulnerable and 10 vendors, including Notion, validating discoveries. → arxiv.org |
| 2026-04-19 2026 | Prompt Injection Attacks in LLMs and AI Agent Systems: A Comprehensive Review beginner | Prompt Injection Attacks in LLMs and AI Agent Systems: A Comprehensive Review |
| 2026-04-16 2026 | Anthropic Defends MCP Design Despite Server Takeover Risk news | Anthropic Defends MCP Design Despite Server Takeover Risk https://ift.tt/IsVue9D → letsdatascience.com |
| 2026-04-16 2026 | The Mother of All AI Supply Chains: Critical Systemic Vulnerability at the Core of Anthropics MCP news | Analysis of Anthropic's Model Context Protocol (MCP) reveals a systemic vulnerability enabling Arbitrary Command Execution (RCE) across its SDKs for Python, TypeScript, Java, and Rust. Exploitable via unauthenticated UI injection, hardening bypasses in Flowise, zero-click prompt injection in Windsurf and Cursor, and malicious marketplace distribution, this flaw impacts over 150 million downloads and thousands of servers. Affected tools include LiteLLM, LangChain, and IBM's LangFlow, with over 10 CVEs issued. → ox.security |
| 2026-04-16 2026 | Bypassing LLM Guardrails: Evasion Attacks against Prompt Injection Detection intermediate | Analysis of evasion attacks against LLM guardrail systems, detailing two methods: character injection and algorithmic Adversarial Machine Learning (AML). Tested against Azure Prompt Shield and Meta's Prompt Guard, these techniques achieved up to 100% evasion success, maintaining adversarial utility. Attack Success Rates against black-box targets were enhanced by leveraging word importance ranking from offline white-box models, exposing vulnerabilities in current LLM protection mechanisms. → arxiv.org |
| 2026-04-16 2026 | EchoGram: Bypassing AI Guardrails via Token Flip Attacks - HiddenLayer intermediate | Technique for bypassing AI guardrails, EchoGram, exploits similarities in training data for text classification and LLM-as-a-judge systems. By appending specific "flip tokens" to malicious prompts, attackers can trick defense models into approving harmful content or generating false alarms. This attack targets defenses protecting models like GPT-4, Claude, and Gemini, and works by manipulating the guardrail layer without altering the core payload. EchoGram can be implemented via dataset distillation or model probing techniques. |
| 2026-04-16 2026 | MCP Security: Tool Poisoning Attacks - Invariant Labs intermediate | Library detailing Model Context Protocol (MCP) Tool Poisoning Attacks, a vulnerability allowing sensitive data exfiltration and AI model hijacking via malicious tool descriptions. These attacks exploit the disconnect between simplified user interfaces and complete tool descriptions, enabling instructions to access sensitive files like SSH keys and obscure data transmission. The library highlights implications for agentic systems, detailing how attackers can poison tool descriptions to compromise user data and manipulate AI behavior even with trusted servers. |
| 2026-04-16 2026 | Poison Everywhere: No Output from Your MCP Server Is Safe - CyberArk intermediate | Library for exploring Tool Poisoning Attacks (TPA) on Anthropic's Model Context Protocol (MCP). This research extends beyond description fields to demonstrate Full-Schema Poisoning (FSP) by manipulating parameter defaults and types within the tool schema. It also introduces Advanced Tool Poisoning Attacks (ATPA), which specifically target and complicate the detection of malicious tool outputs on MCP servers. |
| 2026-04-16 2026 | The Embedded Threat in Your LLM: Poisoning RAG Pipelines intermediate | Analysis of the "Embedded Threat" attack against RAG pipelines, demonstrating how attackers can poison vector databases with malicious documents. This exploit manipulates LLM behavior by embedding hidden instructions within vector embeddings, such as those generated by sentence-transformers/all-MiniLM-L6-v2, leading to altered responses without prompt modification. The attack leverages semantic similarity and LLM trust in retrieved context to inject misinformation or change personas, with proof-of-concept results showing an 80% success rate. Defenses focus on vetting sources, preprocessing content before embedding, enforcing prompt boundaries, and monitoring retrieval behavior. |
| 2026-04-16 2026 | EchoLeak: First Real-World Zero-Click Prompt Injection Exploit advanced | Writeup of EchoLeak (CVE-2025-32711), the first zero-click prompt injection exploit targeting Microsoft 365 Copilot. This vulnerability allowed unauthenticated data exfiltration via a crafted email by chaining multiple bypasses, including evading XPIA classifiers, using reference-style Markdown, exploiting auto-fetched images, and abusing a Microsoft Teams proxy within the content security policy. The paper analyzes defense failures and proposes mitigations such as prompt partitioning and enhanced filtering, providing generalizable lessons for secure AI copilots. → arxiv.org |
| 2026-04-16 2026 | When LLMs Autonomously Attack - CMU Research advanced | Research from Carnegie Mellon University demonstrates LLMs can autonomously plan and execute complex cyberattacks by acting as hierarchical agents with abstracted "mental models" of red teaming behavior. This system, evaluated by recreating the 2017 Equifax data breach, shows advanced LLMs can orchestrate multi-step attacks, including exploitation, malware deployment, and data exfiltration, without detailed human instruction, offering potential for continuous, affordable security testing and autonomous defense development. |
| 2026-04-16 2026 | The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover advanced | Survey of LLM agent vulnerabilities; demonstrates how 94.4% of 18 tested LLMs succumb to Direct Prompt Injection and 83.3% to RAG Backdoor Attacks, enabling malware execution. Inter-Agent Trust Exploitation compromises 100.0% of models, showcasing context-dependent security behaviors that create exploitable blind spots within multi-agent systems. → arxiv.org |
| 2026-04-16 2026 | MCP Tools: Attack Vectors and Defense Recommendations - Elastic Security Labs intermediate | Library detailing attack vectors and defense recommendations for Model Context Protocol (MCP) tools, which connect LLMs to external resources. It explores prompt injection and orchestration exploits, including obfuscated instructions, rug-pull redefinitions, cross-tool orchestration, and passive influence, with examples and a basic LLM-based detection method. Security precautions and defense tactics for MCP tool vulnerabilities are also discussed. |
| 2026-04-16 2026 | MCP Safety Audit: LLMs with MCP Allow Major Security Exploits intermediate | Tool for auditing Model Context Protocol (MCP) servers, McpSafetyScanner automatically detects vulnerabilities like malicious code execution, remote access control, and credential theft in generative AI applications. It identifies adversarial samples, searches for related exploits, and generates remediation reports for MCP developers. The tool aims to proactively mitigate security risks introduced by LLMs using the MCP framework, addressing issues present in industry-leading LLMs such as Claude and Llama. → arxiv.org |
| 2026-04-16 2026 | AI Security: 5 Attack Vectors Explained beginner | Talk detailing five critical attack vectors targeting Large Language Models (LLMs), including Prompt Injection, Context Injection, LLM Internals Vector, RAG Vector, and Agentic Vector. It highlights the "Zero Trust Gap" in LLMs and discusses encoder models like ModernBERT as potential building blocks for implementing AI guardrails due to their speed, efficiency, and privacy benefits. |
| 2026-04-16 2026 | AI agents on GitHub leak API keys via prompt injection news | Library for detecting prompt injection vulnerabilities in AI agents, specifically detailing "Comment and Control" attacks on GitHub Actions. The vulnerability affects Claude Code Security Review (CVSS 9.4 Critical), Google Gemini CLI Action (bounty $1,337), and GitHub Copilot Agent (bypassing environment filtering, secret scanning, and network firewall). Attackers exploit PR titles, issue bodies, and comments to exfiltrate API keys and tokens like ANTHROPIC_API_KEY, GITHUB_TOKEN, GEMINI_API_KEY, and GITHUB_COPILOT_API_TOKEN. → techzine.eu |
| 2026-04-16 2026 | MCP Supply Chain Advisory: RCE Vulnerabilities Across the AI Ecosystem news | Advisory detailing a systemic command injection vulnerability within Anthropic's MCP protocol impacting multiple AI ecosystem products. Exploits, including CVE-2025-65720 for GPT Researcher, CVE-2026-30623 for LiteLLM, and CVE-2026-30624 for Agent Zero, allow unauthenticated or authenticated remote command execution by injecting arbitrary commands through MCP configurations in affected applications like LangFlow, Fay Digital Human Framework, and Bisheng. → ox.security |
| 2026-04-15 2026 | Risks of artificial intelligence security beginner | Library of security considerations for artificial intelligence, detailing risks from prompt injection and data poisoning to model stealing and generative AI misuse in deepfakes and phishing. It highlights vulnerabilities in AI systems, adversary misuse of generative AI, and unintended consequences like bias and data leakage, emphasizing challenges posed by LLM integrations with tools and third-party dependencies. The summary also touches on AI-generated code risks and the escalating concern of autonomous AI attack bots. → blockchain-council.org |
| 2026-04-15 2026 | Agentic LLM Browsers Expose New Attack Surface for Prompt Injection and Data Theft intermediate | Agentic LLM Browsers Expose New Attack Surface for Prompt Injection and Data Theft https://ift.tt/KeHF0om → cybersecuritynews.com |
| 2026-04-15 2026 | Agents hooked into GitHub can steal creds but Anthropic Google and Microsoft haven't warned users news | Library for detecting prompt injection vulnerabilities in AI agents integrated with GitHub Actions. Researchers demonstrated that agents like Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot can be tricked via "comment and control" prompt injection into leaking API keys and GitHub access tokens. This attack can occur proactively when pull requests are opened or issues are filed, bypassing existing security layers. → theregister.com |
| 2026-04-14 2026 | Check Point Releases AI Factory Security Blueprint to Safeguard AI Infrastructure from GPU Servers to LLM Prompts beginner | Blueprint for securing AI infrastructure, safeguarding GPU servers to LLM prompts. This vendor-tested reference architecture, developed by Check Point, offers layered protection across perimeter, application and LLM, AI infrastructure, and workload and container layers. It addresses threats like prompt injection, data exfiltration, and lateral movement within Kubernetes, leveraging technologies from Check Point and NVIDIA BlueField DPUs via the NVIDIA DOCA software platform. |
| 2026-04-14 2026 | AI Agents Drive Exposure of 29 Million Credentials news | AI Agents Drive Exposure of 29 Million Credentials https://ift.tt/zyb7MrR → letsdatascience.com |
| 2026-04-14 2026 | Claude Mythos Changed Everything. Your APIs Are the First Target. news | Platform for agentic security, Salt's Agentic Security Platform addresses the immediate threat posed by AI models like Claude Mythos, which can autonomously discover and exploit zero-day vulnerabilities. It provides continuous, real-time discovery of all API assets, including undocumented and shadow APIs, mapping the full agentic attack surface. The platform then assesses posture, identifying exposures like unauthenticated APIs and excessive permissions, enabling prioritized remediation to fix vulnerabilities before they can be exploited by AI-powered attackers. → securityboulevard.com |
| 2026-04-13 2026 | AI Coding Security Vulnerability Statistics 2026: Alarming Data news | Survey of AI coding security vulnerability statistics reveals alarming trends, with up to 62% of AI-generated code containing flaws. Veracode's 2025 analysis shows 45% of AI-generated code fails security tests, and 86% of organizations use third-party packages with critical vulnerabilities in AI-driven environments. Common issues include SQL injection, XSS, log injection, hardcoded credentials, and insecure cryptographic implementations. Java exhibits a 71% failure rate, while Python has a 38% failure rate, highlighting language-specific risks. The report notes a 10x increase in monthly security findings from AI code and a 153% rise in design-level flaws. Prompt injection is now the top OWASP risk for LLM applications. |
| 2026-04-13 2026 | GitHub - schwartz1375/genai-security-training beginner Talks | Library of self-paced training materials for security researchers red teaming GenAI and AI/ML systems. It covers adversarial attacks, security vulnerabilities, privacy breaches, model manipulation, evasion techniques, and system-level exploits like prompt injection and jailbreaking. The curriculum includes hands-on labs using tools such as Adversarial Robustness Toolbox (ART), TextAttack, and SHAP, along with theoretical content and references to OWASP LLM Top 10 and MITRE ATLAS. |
| 2026-04-13 2026 | GitHub - schwartz1375/genai-essentials beginner Talks | Collection of Jupyter notebooks detailing Generative AI and Large Language Model concepts, prioritizing security considerations. The sequence progresses from core LLM principles and agent introductions to advanced topics like Retrieval-Augmented Generation (RAG), multimodal LLMs, agent frameworks (ReAct, Plan-Execute), and Model Context Protocol (MCP) integration for tool extensibility. Dependencies include Python 3.8+ and Jupyter. |
| 2026-04-12 2026 | Could Sock Puppeting Be the New Trick Jailbreaking Major LLMs? news | Technique for jailbreaking LLMs using "sockpuppeting" exploits assistant prefill APIs across major models like Gemini 2.5 Flash and GPT-4o-mini. This method injects a fake acceptance message into the assistant's role, forcing models to bypass safety guardrails and generate prohibited content, including malicious exploit code and system prompts. Providers like OpenAI and AWS Bedrock mitigate this by blocking assistant prefills entirely, while platforms like Google Vertex AI are susceptible due to differing message handling. Security teams are advised to incorporate this vulnerability into AI red-teaming and implement API-layer message ordering validation. |
| 2026-04-11 2026 | LLM Red Teaming Guide (Open Source) - Promptfoo intermediate | Library for systematic LLM red teaming, focusing on generating adversarial inputs like prompt injection and jailbreaking to evaluate responses. It supports black-box testing, quantifying risk, and integrating into CI/CD pipelines for applications involving RAG, LLM agents, or chatbots, addressing vulnerabilities such as information leakage, API misuse, and privacy violations. |
| 2026-04-11 2026 | Defining LLM Red Teaming - NVIDIA Technical Blog beginner | Analysis defining LLM red teaming as a limit-seeking, manual, and creative practice focused on discovering model deviations rather than malicious harm. It categorizes strategies into language, rhetorical, possible worlds, fictionalizing, and stratagems, identifying 35 specific techniques for exploring LLM vulnerabilities. This approach complements automated benchmarking by leveraging human intuition to uncover novel risks, a crucial element in NVIDIA's trustworthy AI development process. |
| 2026-04-11 2026 | Large Reasoning Models are Autonomous Jailbreak Agents advanced | Survey of Large Reasoning Models as autonomous jailbreak agents, evaluating DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, and Qwen3 235B. These models autonomously planned and executed multi-turn conversations with nine target models, achieving a 97.14% jailbreak success rate across harmful prompts. The research highlights an "alignment regression" dynamic, where advanced LRMs can erode the safety guardrails of earlier models. |
| 2026-04-11 2026 | Involuntary Jailbreak: On Self-Prompting Attacks advanced | Library disclosing "involuntary jailbreak," a new LLM vulnerability. This technique employs a single universal prompt to compel models like Claude Opus 4.1, Grok 4, Gemini 2.5 Pro, and GPT 4.1 to generate previously rejected questions and their detailed answers, potentially compromising the entire guardrail structure rather than localized components. → arxiv.org |
| 2026-04-11 2026 | Single Line of Code Can Jailbreak 11 AI Models Including ChatGPT, Claude, Gemini intermediate | Single Line of Code Can Jailbreak 11 AI Models Including ChatGPT, Claude, Gemini → cyberpress.org |
| 2026-04-11 2026 | OWASP Top 10 for LLMs 2025: Key Risks and Mitigation Strategies beginner | Survey of the OWASP Top 10 for LLM Applications (2025), detailing evolving technical and socio-technical risks like prompt injection and excessive agency. This updated list guides enterprises in securing generative AI ecosystems, from training pipelines to plugins, addressing data disclosure and systemic vulnerabilities relevant to GDPR, HIPAA, CCPA, and the EU AI Act. Invicti's proof-based scanning and LLM-specific checks are presented as tools to validate real risks and strengthen defenses. → invicti.com |
| 2026-04-11 2026 | OWASP Top 10 for LLM Applications 2025 beginner | OWASP Top 10 for LLM Applications 2025 → genai.owasp.org |
| 2026-04-11 2026 | Practical Poisoning Attacks against Retrieval-Augmented Generation advanced | Library introducing CorruptRAG, a novel poisoning attack against Retrieval-Augmented Generation (RAG) systems. This technique injects a single poisoned text into the knowledge database, significantly enhancing attack feasibility and stealth compared to prior methods that required numerous poisoned entries. Experiments on large-scale datasets validate CorruptRAG's effectiveness in compromising RAG outputs. → arxiv.org |
| 2026-04-11 2026 | RAG Safety: Exploring Knowledge Poisoning Attacks to RAG advanced | Analysis of knowledge poisoning attacks targeting Retrieval-Augmented Generation (RAG) systems, specifically focusing on KG-RAG. This work introduces a practical, stealthy attack strategy that inserts perturbation triples into knowledge graphs to create misleading inference chains, degrading KG-RAG performance. Experiments demonstrate the attack's effectiveness against four recent KG-RAG methods with minimal KG perturbations. → arxiv.org |
| 2026-04-11 2026 | Benchmarking Poisoning Attacks against Retrieval-Augmented Generation advanced | Benchmark framework for evaluating poisoning attacks on Retrieval-Augmented Generation (RAG) systems. This benchmark includes 5 standard QA datasets, 10 expanded variants, 13 poisoning attack methods, and 7 defense mechanisms. Findings reveal that while current attacks are effective on standard datasets, their impact diminishes on expanded versions, and advanced RAG architectures like sequential, branching, conditional, loop, conversational, multimodal RAG, and RAG-based LLM agents remain vulnerable, with existing defenses proving insufficient. → arxiv.org |
| 2026-04-11 2026 | Q4 2025 AI Agent Security Trends news | Report on Q4 2025 AI agent security trends, detailing real-world attacks targeting emergent agentic AI systems. Analysis of production traffic reveals attacker focus on system prompt leakage, indirect prompt injection via trusted external content, and exploitation of new surfaces like tool use and script-shaped content. Core techniques include role play and obfuscation to bypass safeguards, with indirect attacks proving more efficient than direct ones. |
| 2026-04-11 2026 | OWASP GenAI Top 10 Risks and Mitigations for Agentic AI Security beginner | Library defining the OWASP Top 10 for Agentic Applications, a comprehensive resource for identifying and mitigating risks associated with autonomous AI agents. Developed through input from over 100 industry leaders, it highlights threats such as Agent Behavior Hijacking, Tool Misuse and Exploitation, and Identity and Privilege Abuse. This framework complements existing OWASP GenAI resources, offering practical, actionable guidance grounded in real-world attacks and mitigations to promote the secure development and deployment of generative AI systems. → genai.owasp.org |
| 2026-04-11 2026 | AI Agent Attacks in Q4 2025 Signal New Risks for 2026 news | Analysis of Q4 2025 AI agent attacks highlights evolving threats including system prompt extraction via hypothetical scenarios and obfuscation. Attackers also bypass content controls using indirect methods and probe agents for weaknesses. New attack paths emerge through agentic capabilities like document browsing and tool calls, often via indirect prompt injection. Organizations must extend security controls, validate external content, enforce least-privilege access, and prepare AI-specific incident response. → esecurityplanet.com |
| 2026-04-11 2026 | Protecting Against Indirect Prompt Injection Attacks in MCP intermediate | Library for mitigating Indirect Prompt Injection attacks within the Model Context Protocol (MCP). This resource details vulnerabilities like Tool Poisoning, where malicious instructions are embedded in tool metadata, and recommends implementing AI Prompt Shields with techniques like "Spotlighting" and "Datamarking." It also emphasizes supply chain security and general security hygiene as crucial for safeguarding AI systems. |
| 2026-04-11 2026 | Indirect Prompt Injection Attacks: Hidden AI Risks intermediate | Library for defending against indirect prompt injection attacks, a sophisticated AI threat recognized by OWASP as a top risk. This library addresses vulnerabilities where malicious instructions are embedded in external content like documents, emails, or images, rather than being submitted directly to an AI agent. It aims to mitigate risks such as data exfiltration and manipulation of business processes by enabling prompt injection detection, input validation, and the establishment of content security policies, similar to CrowdStrike's approach using its Falcon platform. |
| 2026-04-11 2026 | Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild intermediate | Writeup detailing observed in-the-wild indirect prompt injection (IDPI) attacks targeting AI agents. The analysis highlights real-world cases including AI-based ad review evasion, SEO manipulation for phishing, data destruction, and sensitive information leakage. It discusses 22 distinct payload engineering techniques and classifies attacker intents, emphasizing the growing weaponization of IDPI beyond theoretical risks. → unit42.paloaltonetworks.com |
| 2026-04-11 2026 | Anatomy of an Indirect Prompt Injection beginner | Library detailing the CFS (Context, Format, Salience) model for understanding indirect prompt injection in LLMs. It analyzes vulnerabilities, drawing on concepts like Simon Willison's "lethal trifecta" (access to private data, untrusted content exposure, external communication), and examines how attackers refine tactics to bypass LLM security. Real-world examples, such as the Supabase Model Context Protocol (MCP) attack, illustrate the dangers of embedding malicious instructions within seemingly benign data, leading to unauthorized data exposure or system compromise. |
| 2026-04-11 2026 | New Prompt Injection Attack Vectors Through MCP Sampling intermediate | Writeup of new prompt injection attack vectors targeting the Model Context Protocol (MCP) sampling feature. Exploiting the implicit trust model and lack of built-in security controls, attackers can achieve resource theft, conversation hijacking, and covert tool invocation. The analysis details three proof-of-concept examples and evaluates mitigation strategies for MCP-based systems, highlighting vulnerabilities in this LLM integration standard. → unit42.paloaltonetworks.com |
| 2026-04-11 2026 | A Timeline of Model Context Protocol (MCP) Security Breaches news | Timeline details MCP security breaches from April to December 2025, highlighting vulnerabilities like "tool poisoning" in WhatsApp MCP, prompt injection in GitHub MCP leading to data exfiltration, cross-tenant access flaws in Asana MCP, and remote code execution in Anthropic's MCP Inspector. Other incidents include OS command injection in `mcp-remote` (CVE-2025-6514), sandbox escapes in Anthropic's Filesystem-MCP server, supply-chain compromises via malicious MCP servers, systemic MCP design flaws enabling RCE in Flowise, and path traversal in Smithery MCP hosting. |
| 2026-04-11 2026 | The Vulnerable MCP Project: Comprehensive MCP Security Database beginner | Library of known vulnerabilities impacting MCP (Model Configuration Protocol) servers and SDKs. This catalog details specific exploits such as CVE-2025-68145, CVE-2025-68143, and CVE-2025-68144, alongside broader attack classes including prompt injection, DNS rebinding, Server-Side Request Forgery (SSRF), and command injection. Vulnerabilities affect various products like Anthropic's mcp-server-git, MCP TypeScript SDK, Cursor IDE, and Grafana MCP server, often enabling arbitrary code execution, data exfiltration, or unauthorized transactions. |
| 2026-04-11 2026 | MCP Security: Critical Vulnerabilities Every CISO Must Address in 2025 intermediate | Library detailing critical vulnerabilities in Model Context Protocol (MCP), a new standard for AI-tool integration. It highlights how prompt injection attacks in MCP ecosystems can trigger automated actions through connected systems, potentially leading to sensitive data exfiltration. The library also addresses supply chain risks, explaining how MCP servers can dynamically modify tool definitions, allowing for "rug pull" attacks where previously approved tools can be repurposed for malicious activity, affecting vendors like Microsoft and impacting applications such as Nginx-ui (CVE-2026-33032) and Adobe Acrobat Reader. |
| 2026-04-11 2026 | OWASP LLM Prompt Injection Prevention Cheat Sheet beginner | Reference LLM Prompt Injection Prevention Cheat Sheet detailing vulnerabilities in Large Language Model applications. It covers direct and indirect prompt injection, encoding and obfuscation techniques like Base64 and Unicode smuggling, and typoglycemia-based attacks. The resource also discusses jailbreaking methods such as DAN prompts, multi-turn attacks, system prompt extraction, data exfiltration, multimodal injection, RAG poisoning, and agent-specific attacks. Defenses include input validation and sanitization, with code examples for pattern matching and fuzzy matching against typoglycemia variants. → cheatsheetseries.owasp.org |
| 2026-04-11 2026 | Attention Tracker: Detecting Prompt Injection Attacks in LLMs intermediate | Attention Tracker: Detecting Prompt Injection Attacks in LLMs |
| 2026-04-11 2026 | How Microsoft Defends Against Indirect Prompt Injection Attacks intermediate | Library that defends against indirect prompt injection attacks targeting LLM-based systems. This library implements a multi-layered defense strategy including preventative techniques like hardened system prompts and Spotlighting, detection tools such as Microsoft Prompt Shields integrated with Defender for Cloud, and impact mitigation through data governance, user consent workflows, and deterministic blocking. It addresses vulnerabilities like data exfiltration via HTML images, clickable links, tool calls, and covert channels, as well as unintended actions and phishing. → microsoft.com |
| 2026-04-10 2026 | AI Cybersecurity After Mythos: The Jagged Frontier intermediate | Library for AI-driven vulnerability discovery, demonstrating that smaller, cheaper open-weight models can recover significant analysis from Anthropic's Mythos showcase, including detecting exploit candidates for FreeBSD and OpenBSD bugs. This work emphasizes that the effectiveness of AI cybersecurity lies in the surrounding system architecture and deep security expertise, rather than solely on frontier model scale, impacting the economics of the defensive pipeline. |
| 2026-04-10 2026 | Anthropic announces Claude Mythos for cybersecurity research news | Library for AI-driven cybersecurity research, Claude Mythos Preview autonomously identifies zero-day vulnerabilities and develops exploits. It has discovered critical issues in OpenBSD, FFmpeg, and the Linux kernel. Access is offered to select partners via Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry, with an application process for open-source maintainers. Anthropic provides usage credits and donations to security foundations, highlighting significant advances in autonomous vulnerability discovery over prior models. |
| 2026-04-10 2026 | Crushing the Axios supply chain threat with Tenable Hexa AI: Use cases for agentic AI intermediate | Tool for identifying exposure to the Axios npm supply chain attack using Tenable Hexa AI. This agentic AI automates scanning, asset identification, and remediation verification, mirroring workflows applicable to other emerging threats like CVEs and zero-days. It enables rapid assessment of exposure, scoping blast radius through asset tagging, and efficient prioritization, transforming emergency response from manual scripting to conversational command. → securityboulevard.com |
| 2026-04-10 2026 | MCP Security Vulnerabilities: Prompt Injection and Tool Poisoning intermediate | Library for securing Model Context Protocol (MCP) deployments against prompt injection and tool poisoning. It details vulnerabilities like metadata poisoning, over-permissioned tools, supply chain risks, and indirect prompt injection, referencing incidents such as the Supabase MCP Lethal Trifecta Attack. The library emphasizes prevention strategies including strict input validation, sanitization, and the principle of least privilege for tools. |
| 2026-04-10 2026 | How Agentic Tool Chain Attacks Threaten AI Agent Security intermediate | Library for securing AI agents against agentic tool chain attacks, detailing threats like tool poisoning, tool shadowing, and rugpull attacks that exploit the agent's reasoning layer and natural language-based decision-making. It covers how these attacks can lead to data exfiltration, unauthorized actions, and supply chain risks by manipulating tool descriptions, metadata, and server behavior, and recommends mitigation strategies including tool governance, version control, server identity controls, pre-execution guardrails, and observability. |
| 2026-04-10 2026 | 8,000+ MCP Servers Exposed: The Agentic AI Security Crisis of 2026 news | 8,000+ MCP Servers Exposed: The Agentic AI Security Crisis of 2026 |
| 2026-04-10 2026 | Agentic AI Security in Production: MCP, Memory Poisoning, Tool Misuse intermediate | Tool, a comprehensive analysis of agentic AI security in production, details critical failure modes including MCP Security, Memory Poisoning, and Tool Misuse. It highlights the evolving threat landscape where agents plan and execute actions, emphasizing system design over prompt-level fixes. Specific vulnerabilities like CVE-2025-68144 in mcp-server-git and attack models such as MINJA and AgentPoison are examined, underscoring the need for robust controls across input, memory, tool execution, and identity planes to manage the expanded attack surface created by these systems. → penligent.ai |
| 2026-04-10 2026 | Offensive Security for MCP Servers: How to Prevent AI Agent Exploits intermediate | Library for securing MCP (Multi-Cloud Platform) servers against AI agent exploits, addressing vulnerabilities like command injection, SSRF, and path traversal frequently found in modern deployments. It highlights how AI's autonomous execution and dynamic capability discovery, unlike traditional REST APIs, create new risk classes by enabling agents to chain tool calls and reason across APIs. The library emphasizes adapting security from syntax to intent validation, guarding against prompt injection and tool poisoning where manipulated metadata or input can lead to unintended, privileged operations, ultimately leveraging foundational API security principles. |
| 2026-04-10 2026 | The New AI Attack Surface: 3 AI Security Predictions for 2026 beginner | Library for confronting three AI attack vectors manifesting in production by 2026: indirect injection via data poisoning, supply chain infiltration through AI development toolchains like MCP servers, and agent-to-agent attack propagation through "toxic combinations" in autonomous agent ecosystems. These vectors exploit how AI agents interpret instructions, trust data sources, and execute permitted actions, moving beyond traditional code vulnerabilities to exploit data as executable commands and the inherent trust in interconnected AI architectures. |
| 2026-04-10 2026 | Introduction to Data Poisoning: A 2026 Perspective beginner | Library introducing data poisoning, an adversarial attack corrupting AI/LLM training data to cause backdoors or biased outputs. It details real-world incidents like Basilisk Venom poisoning GitHub code, Qwen 2.5's search tool manipulation, Grok 4's "!Pliny" backdoor triggered by X prompts, and hidden instructions in MCP tools like "joke_teller." The library also covers poisoning in retrieval (RAG), synthetic data pipelines (VIA), and diffusion models, highlighting how even small, hidden manipulations can undermine AI safety and trust across the entire LLM lifecycle. |
| 2026-04-10 2026 | AI Security Research — December 2025 news | AI Security Research — December 2025 |
| 2026-04-10 2026 | From Prompt Injections to Protocol Exploits in LLM Agent Workflows advanced | From Prompt Injections to Protocol Exploits in LLM Agent Workflows |
| 2026-04-10 2026 | LLM Security Guide: OWASP GenAI Top-10 Risks beginner | Library detailing offensive and defensive security for Large Language Models and Agentic AI Systems, updated with the OWASP Top 10 for LLMs 2025 and the OWASP Top 10 for Agentic Applications 2026. It covers Agentic AI Security, RAG Vulnerabilities, System Prompt Leakage, Vector/Embedding Weaknesses, and AI Compliance, incorporating tools like DeepTeam, Promptfoo, ARTKIT, and frameworks such as Meta LlamaFirewall and Amazon Bedrock Guardrails. |
| 2026-04-10 2026 | Prompt Injection Attacks in LLMs: A Comprehensive Review intermediate | Prompt Injection Attacks in LLMs: A Comprehensive Review |
| 2026-04-10 2026 | Prompt Injection Attacks: Examples, Techniques, and Defence intermediate | Library for understanding and defending against prompt injection, a critical LLM security vulnerability. It details direct and indirect injection techniques, including examples like DAN jailbreaks, EchoLeak (CVE-2025-32711), and webpage poisoning attacks, as reported by OWASP, NCSC, and Anthropic. This resource provides practical defense strategies and highlights the inherent challenges in distinguishing trusted instructions from untrusted data within LLM architectures. |
| 2026-04-10 2026 | Indirect Prompt Injection: The Hidden Threat intermediate | Library for understanding and defending against indirect prompt injection, a vulnerability where hidden instructions within ingested data (webpages, PDFs, emails, code) can hijack AI reasoning or tool actions. It details real-world incidents like the Perplexity Comet leak and CVE-2025-59944, highlighting how agentic AI amplifies risk. Mitigation requires architectural changes, not prompt tuning, focusing on trust boundaries, context isolation, and output verification. |
| 2026-04-10 2026 | AI Agent Security in 2026: Prompt Injection and Memory Poisoning intermediate | Library for understanding AI agent security risks, focusing on prompt injection and memory poisoning attacks. It details indirect prompt injection's impact via emails and documents, exemplified by CVE-2025-32711, and memory poisoning attacks like MemoryGraft, where agents develop false beliefs. The library also covers tool misuse through hidden instructions in metadata, misleading examples, and permissive schemas, observed in frameworks like CrewAI and AutoGen, and discusses supply chain vulnerabilities where agents fetch runtime dependencies without human review. |
| 2026-04-10 2026 | Prompt Injection Attacks in 2025: Vulnerabilities and Defense beginner | Library for defending against prompt injection attacks, a significant threat to AI applications highlighted by CVE-2025-32711 and techniques like "EchoLeak." It addresses direct, indirect, and agentic injection methods, including those targeting LangChain with CVE-2025-68664 ("LangGrinch") and demonstrations against Gemini. The library supports defenses like input validation with pattern matching and structured prompt architecture using randomized delimiters, drawing insights from tools like Lakera Guard and Microsoft Prompt Shields. |
| 2026-04-10 2026 | Prompt Injection: The Most Common AI Exploit in 2025 beginner | Library detailing prompt injection, the most common AI exploit in 2025, which manipulates AI instructions rather than code. It categorizes attacks into direct, indirect, jailbreak, and cross-plugin poisoning, highlighting risks to enterprise RAG systems and SaaS security operations. The resource emphasizes robust AI agent identity, authorization, continuous monitoring with anomaly detection, and integrating AI security telemetry into existing SIEM infrastructure, aligning with frameworks like NIST AI RMF and ISO/IEC 42001. |
| 2026-04-10 2026 | AI Prompt Injection Attacks: How They Work (2026) beginner | Library for defending against AI prompt injection attacks, detailing their evolution from academic curiosities to operational threats with documented cases affecting OpenAI's GPT models and Anthropic's Claude. It covers attack mechanisms like "instruction confusion," evolving vectors such as encoding-based and multi-turn conversation attacks, and real-world incidents like the OpenClaw vulnerability, demonstrating data exfiltration and financial losses totaling $2.3 billion globally in 2025. The library addresses insufficient input sanitization, overprivileged AI agents, and a lack of output validation, highlighting detection gaps where current methods catch only 23% of sophisticated attempts. |
| 2026-04-10 2026 | LLM Security Risks in 2026: Prompt Injection, RAG, and Shadow AI beginner | Library for mitigating LLM security risks, including prompt injection, RAG data poisoning, and autonomous exploits like EchoLeak demonstrated against Microsoft 365 Copilot. It addresses the blurred line between data and instructions, AI outputs triggering actions, and the human element in vulnerabilities, emphasizing containment strategies like limiting AI privileges and validating outputs. |
| 2026-04-09 2026 | Claude Code security settings nobody told you about beginner | Claude Code security settings nobody told you about |
| 2026-04-09 2026 | LangChain Langflow LiteLLM: When AI's Foundation Code Becomes the Attack Surface intermediate | Library of vulnerabilities impacting foundational AI frameworks like LangChain, LangGraph, Langflow, and LiteLLM, including path traversal (CVE-2026-34070), serialization injection (CVE-2025-68664), SQL injection (CVE-2025-67644), and remote code execution (CVE-2026-33017). The article also details a supply chain attack on LiteLLM via a compromised Trivy security scanner, highlighting the systemic risks in AI infrastructure. → securityboulevard.com |
| 2026-04-09 2026 | Is 46% of your AI-generated code vulnerable? beginner | Platform for securing AI-generated code, addressing research showing 46% of AI code contains vulnerabilities. It integrates Software Composition Analysis (SCA), Static Application Security Testing (SAST), and Dynamic Application Security Testing (DAST) directly into IDEs and LLMs like Gemini and GitHub Copilot, while also integrating with tools from Wiz, Snyk, and Black Duck. The platform emphasizes continuous governance throughout the Software Development Life Cycle (SDLC) and maintains the necessity of human oversight for final code acceptance and remediation. → techzine.eu |
| 2026-04-09 2026 | Claude Code Can Be Manipulated via CLAUDE.md to Run SQL Injection Attacks intermediate | Library that allows manipulation of Claude Code via CLAUDE.md files to automate SQL injection attacks and steal credentials. Researchers at LayerX discovered that by adding three lines of basic English to the CLAUDE.md file, Claude Code's safety guardrails can be bypassed, leading it to execute unauthorized commands and perform actions such as login bypass and database dumping using techniques like SQL injection. The AI trusts the instructions within the CLAUDE.md file implicitly, creating a significant attack surface. → hackread.com |
| 2026-04-08 2026 | theNET | De-risking the AI rollout intermediate | Library for de-risking AI rollouts, providing probabilistic security to address novel threats like prompt injection, data poisoning, and denial-of-wallet attacks. It emphasizes model-agnostic, inline protection, input/output monitoring, observability, and integration with traditional application security to safeguard AI-powered applications against deterministic and unpredictable attack paths. |
| 2026-04-08 2026 | AI Security Risks: How Enterprises Manage LLM Shadow AI and Agentic Threats intermediate | Library for AI Security Posture Management (AISPM) designed to provide enterprises with visibility and control over LLM shadow AI and agentic threats. It addresses risks including prompt injection, jailbreaking, data poisoning, and data leakage from unsanctioned AI tools. The library focuses on the emerging threat landscape of agentic AI, where autonomous systems can execute multi-step actions, and highlights the critical risk of Agent Goal Hijacking as outlined in the OWASP Agentic Top 10. → securityboulevard.com |
| 2026-04-06 2026 | Best AI Security Tools in 2026 beginner | Platforms for AI security are ranked by their coverage of three critical phases: discovering AI assets and mapping threat graphs (Phase 1), conducting adversarial testing against live applications and RAG pipelines (Phase 2), and deploying runtime guardrails calibrated from red teaming results (Phase 3). Repello AI offers full-lifecycle coverage with its Inventory, ARTEMIS, and ARGUS products. HiddenLayer focuses on model artifact scanning and runtime model anomaly detection. Mindgard provides automated multimodal AI security testing, primarily for Phase 2. Lakera, now part of Check Point, specialized in runtime guardrails for LLM applications. |
| 2026-04-06 2026 | OWASP Top 10 for Agents 2026 beginner | Framework for assessing OWASP Agentic AI (ASI) Top 10 2026 risks, including Agent Goal Hijack (ASI01), Tool Misuse & Exploitation (ASI02), and Agent Identity & Privilege Abuse (ASI03). It addresses vulnerabilities introduced by autonomous agents' reasoning, memory, tool integration, and multi-step execution, detecting issues like unexpected code execution (ASI05) and insecure inter-agent communication (ASI07). The framework integrates with DeepTeam's red teaming capabilities for programmatic risk assessment. |
| 2026-04-06 2026 | Google Workspace's Continuous Approach to Mitigating Prompt Injection intermediate | Google Workspace's Continuous Approach to Mitigating Prompt Injection |
| 2026-04-06 2026 | Prompt Injection Attacks in LLMs: What Developers Need to Know in 2026 beginner | Guide on prompt injection attacks in LLMs, detailing how attackers manipulate models using natural language to override system instructions. It covers direct (jailbreaking) and indirect injection, citing examples like the Chevrolet dealership GPT and Perplexity Comet credential theft incidents. Developers are advised to implement architectural separation of instructions, conversation token limits, input filtering, AI guardrails, and developer training to mitigate these risks. |
| 2026-04-05 2026 | Prompt Injection and LLM Jailbreaks in Production intermediate | Prompt Injection and LLM Jailbreaks in Production https://ift.tt/IiBLkyh → blockchain-council.org |
| 2026-04-05 2026 | LangChain LangGraph Flaws Expose Files Secrets Databases in Widely Used AI Frameworks intermediate | Library vulnerabilities in LangChain and LangGraph, specifically CVE-2026-34070 (path traversal), CVE-2025-68664 (deserialization of untrusted data), and CVE-2025-67644 (SQL injection), allow attackers to access arbitrary files, steal API keys and environment secrets, and manipulate SQL queries. These flaws, impacting widely used LLM application frameworks, have been patched in recent versions of langchain-core and langgraph-checkpoint-sqlite. → thehackernews.com |
| 2026-04-05 2026 | Adversarial AI in Cybersecurity: Threats and Mitigation intermediate | Adversarial AI in Cybersecurity: Threats and Mitigation https://ift.tt/fV3sob0 → blockchain-council.org |
| 2026-04-04 2026 | Detecting and analyzing prompt abuse in AI tools intermediate | Playbook detailing detection, investigation, and response to AI prompt abuse. It covers direct prompt overrides, extractive prompt abuse against sensitive inputs, and indirect prompt injection, including the HashJack technique affecting AI summarization tools via URL fragments. This guide leverages Microsoft security tools like Defender for Cloud Apps, Purview DLP, Microsoft Entra ID conditional access, and Microsoft Sentinel to monitor AI interactions and protect against manipulation. → microsoft.com |
| 2026-04-03 2026 | Prompt Injection and LLM Jailbreaks: Defenses intermediate | Survey of prompt injection and LLM jailbreak defenses, addressing risks in generative AI and agentic workflows. It differentiates between instruction hijacking and policy evasion, detailing why modern long-context and tool-using systems amplify attack impact. The survey outlines common attack patterns like instruction override and hidden instructions, then proposes layered defenses including inference-time filtering, independent guardrails, model-level hardening techniques like salting, and secure architectural controls for tool-using systems. → blockchain-council.org |
| 2026-04-03 2026 | Training an AI agent to attack LLM applications like a real adversary advanced | Tool that simulates adversarial attacks against LLM-powered applications. This AI pentesting agent autonomously chains techniques like prompt injection, indirect prompt injection, and tool abuse to uncover vulnerabilities missed by traditional scanners. It gathers application context, probes role-based access control, and supports models from OpenAI, Anthropic, and open-source providers, integrating into CI/CD pipelines for continuous testing. Novee Security's agent is trained on real-world vulnerability research, including findings like arbitrary code execution in the Cursor coding assistant. → helpnetsecurity.com |
| 2026-04-03 2026 | Prompt Injection Attacks in LLMs: Vulnerabilities, Exploitation & Defense intermediate | Prompt Injection Attacks in LLMs: Vulnerabilities, Exploitation & Defense |
| 2026-04-03 2026 | How AI Red Teaming Fixes Vulnerabilities in Your AI Systems intermediate | Library for AI Red Teaming provides a practical playbook for CISOs and AI leaders to test AI systems, including LLMs and chatbots, for vulnerabilities before deployment. It simulates attacks and misuse to identify weaknesses across prompts, data, and agent interactions, addressing risks like prompt injection, data leakage, and abuse of model autonomy. This method moves beyond isolated model testing to system-wide evaluation in operational settings, aligning with frameworks like MITRE ATLAS, EU AI Act, and NIST's AI Risk Management Framework to ensure safe and compliant AI use. |
| 2026-04-03 2026 | What Is Prompt Injection in AI? Examples & Prevention | EC-Council beginner | Library for defending against prompt injection attacks, a technique where attackers manipulate AI systems through malicious instructions embedded in prompts. This resource details direct and indirect injection methods, citing real-world vulnerabilities like CVE-2025-53773 affecting GitHub Copilot and ChatGPT's Azure backdoor. It also highlights attacks against Google Jules and Devin AI, emphasizing the enterprise-wide compromise risks due to AI access to sensitive data and infrastructure. Mitigation strategies include zero-trust AI architecture, strict privilege separation, real-time threat detection, human-in-the-loop approvals, and continuous red teaming. |
| 2026-04-03 2026 | Prompt Injection Attacks in 2025: Risks, Defenses & Testing intermediate | Library for detecting and mitigating prompt injection attacks in LLM-powered applications. It focuses on adversarial input testing, prompt isolation analysis, output validation, and workflow abuse simulation to uncover risks missed by traditional security tools. The library addresses how malicious instructions can manipulate model behavior, spread through trusted content, and create business-level impact, emphasizing that prompt injection is a trust problem at the intersection of application logic, content ingestion, and workflow design. |
| 2026-04-03 2026 | Red Teaming the Mind of the Machine: Evaluation of Prompt Injection and Jailbreak Vulnerabilities intermediate | Survey of prompt injection and jailbreak vulnerabilities against state-of-the-art LLMs including GPT-4, Claude 2, Mistral 7B, and Vicuna. This research categorizes over 1,400 adversarial prompts and analyzes their success rates, generalizability, and construction logic, drawing from public repositories and forums. The study also proposes layered mitigation strategies and recommends a hybrid red-teaming and sandboxing approach for robust AI security, noting prompt injection as a critical vulnerability identified by OWASP. → arxiv.org |
| 2026-04-03 2026 | Practical LLM Security Advice from the NVIDIA AI Red Team intermediate | Library summarizing NVIDIA AI Red Team findings, detailing common LLM application vulnerabilities. It addresses risks like remote code execution (RCE) from executing LLM-generated code (e.g., via `exec` or `eval`), insecure permissions in Retrieval-Augmented Generation (RAG) data stores leading to data leakage and prompt injection, and data exfiltration through active content rendering of Markdown or hyperlinks. Mitigation strategies include sandboxing dynamic code, rigorously managing RAG permissions, and sanitizing LLM output. |
| 2026-04-03 2026 | OWASP Top 10 for LLMs 2025 | DeepTeam Red Teaming Framework beginner | Framework integrating OWASP Top 10 for LLMs 2025 risks, including Prompt Injection (LLM01), Sensitive Information Disclosure (LLM02), Supply Chain (LLM03), Data and Model Poisoning (LLM04), Improper Output Handling (LLM05), Excessive Agency (LLM06), System Prompt Leakage (LLM07), and Vector and Embedding Weaknesses (LLM08). It facilitates detection of vulnerabilities in RAG systems and autonomous agents through programmatic assessment or the Confident AI platform. |
| 2026-04-03 2026 | Continuously Hardening ChatGPT Against Prompt Injection | OpenAI intermediate | Continuously Hardening ChatGPT Against Prompt Injection | OpenAI |
| 2026-04-03 2026 | Red Teaming LLMs Exposes a Harsh Truth About the AI Security Arms Race news | Red Teaming LLMs Exposes a Harsh Truth About the AI Security Arms Race |
| 2026-04-03 2026 | LLM01:2025 Prompt Injection | OWASP Gen AI Security beginner | Reference detailing LLM01:2025 Prompt Injection, a vulnerability where user prompts unintendedly alter Large Language Model behavior. The OWASP Gen AI Security resource covers direct and indirect injections, including scenarios like CVE-2024-5184 exploitation in email assistants and multimodal attacks. It outlines mitigation strategies such as constraining model behavior, input/output filtering, and adversarial testing, emphasizing that while prevention is challenging, impact reduction is achievable. → genai.owasp.org |
| 2026-04-03 2026 | AI Security Projects for Practice: 10 Hands-On Labs beginner | Labs provide hands-on practice with prompt injection, including direct and indirect attacks, excessive agency, and tool invocation risks, as well as data poisoning techniques like label-flipping and backdoor trigger injection. These projects are crucial for understanding and mitigating threats outlined in the OWASP LLM Top 10 and MITRE ATLAS, covering offensive strategies and defensive hardening across various AI system components, from preprocessing to model integrity checks and DevSecOps pipelines. → blockchain-council.org |
| 2026-04-03 2026 | AI Security Roadmap: From Basics to Model Defense beginner | Reference outlining a structured AI security roadmap, progressing from fundamentals to model defense. It highlights unique threats like prompt injection and data poisoning, and maps learning paths to frameworks such as OWASP Top 10 for LLMs, NIST AI RMF, and MITRE ATLAS. The guide also details practical tooling patterns like AI Security Posture Management (AI-SPM) and adversarial testing tools such as Microsoft Counterfit and IBM Adversarial Robustness Toolbox. → blockchain-council.org |
| 2026-04-03 2026 | AI Security Certification Guide for 2026 beginner | Guide to AI security certifications for 2026, detailing credentials for technical, governance, and audit roles. It highlights the growing importance of AI-specific risks like prompt injection and data leakage, and aligns certifications with frameworks such as OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS, SAIF, and ISO/IEC 42001. The guide emphasizes hands-on assessment and explains how to choose the right credential based on role fit, framework alignment, cost, and industry recognition. → blockchain-council.org |
| 2026-04-02 2026 | Guarding LLMs With a Layered Prompt Injection Representation intermediate | Library for LLM security that learns a low-dimensional latent representation of prompt injection attacks. This approach complements perplexity-based filtering and achieves high precision and recall by training a classifier on features derived from this learned representation, distinguishing benign prompts from adversarial ones. → trendmicro.com |
| 2026-04-02 2026 | Auditing the Gatekeepers: Fuzzing "AI Judges" to Bypass Security Controls intermediate | Tool for fuzzing AI judges, called AdvJudge-Zero, exploits prompt injection vulnerabilities in LLM-based security gatekeepers. This fuzzer identifies stealthy control tokens, such as formatting symbols and structural phrases, that manipulate the AI's decision-making logic to bypass safety policies and allow prohibited content, or corrupt training data by awarding high scores to incorrect responses. The research demonstrates a 99% success rate in bypassing controls across various LLM architectures, highlighting the need for adversarial training to harden these systems. → unit42.paloaltonetworks.com |
| 2026-04-02 2026 | AI Security for Apps is now generally available news | Library for securing AI-powered applications, generally available, offering discovery of AI endpoints, detection of prompt injection and PII exposure, and mitigation via WAF rules. New features include custom topic detection and free AI endpoint discovery for all Cloudflare customers, with expanded integrations with IBM and Wiz for unified security posture management. It addresses risks cataloged in the OWASP Top 10 for LLM Applications, such as prompt injection and sensitive data leakage, by analyzing prompt and output behavior rather than fixed operations. |
| 2026-03-15 2026 | mukul975/Anthropic-Cybersecurity-Skills: 734+ structured cybersecurity skills for AI agents · MITRE ATT&CK mapped · agentskills.io standard · Claude Code, Copilot, Codex CLI, Cursor, Gemini CLI beginner | Library of 754 structured cybersecurity skills designed for AI agents, mapped to MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, MITRE D3FEND, and NIST AI RMF. This community project provides production-grade workflows for tasks including memory forensics with Volatility3, Kerberoasting detection via Sigma rules, and cloud breach scoping, enabling AI to perform expert-level investigations across platforms like Claude Code, GitHub Copilot, and Gemini CLI. |
| 2026-03-14 2026 | Teaching Claude Everything You've Hacked intermediate | Library that syncs HackerOne bounty history to a local SQLite database and integrates with AI assistants like Claude via the Model Context Protocol (MCP). It cross-references your personal reports and publicly disclosed bounty-awarded reports against target scopes, identifying overlooked areas and profitable weakness types. This tool also includes a database of community-submitted reports and enables Claude to access and reason over your bounty data, assisting in strategy and discovery. |
| 2026-03-12 2026 | Needle in the haystack: LLMs for vulnerability research intermediate Bug Bounty | Library for using LLMs in vulnerability research, focusing on minimal scaffolding for effective code auditing. It highlights the problem of context rot in large language models, demonstrating how overly broad prompts and excessive context lead to missed vulnerabilities. Instead, the approach emphasizes creating a targeted threat model derived from previous CVEs and specific entry points to guide LLMs toward discovering nuanced issues, as seen in its case study with Claude Opus and Firefox. |
| 2026-03-12 2026 | PatrikFehrenbach/h1-brain: MCP server that connects AI assistants to HackerOne for bug bounty hunting intermediate Bug Bounty | Library for connecting AI assistants to HackerOne bug bounty programs. It ingests personal bug bounty history, program scopes, and report details into a local SQLite database, and also includes a pre-built database of over 3,600 publicly disclosed bounty-awarded HackerOne reports. The core `hack(handle)` tool generates comprehensive attack briefings by combining personal data with community vulnerability write-ups, weakness types, and bounty amounts, suggesting attack vectors against untouched assets. |
| 2026-03-09 2026 | GitHub - eliasbiondo/linkedin-mcp-server: 🔗 A Model Context Protocol (MCP) server for LinkedIn — search people, companies, and jobs, scrape profiles, and get structured data via any MCP-compatible AI client. intermediate | Library for accessing LinkedIn data via a Model Context Protocol (MCP) server. It enables searching for people, companies, and jobs, scraping detailed profiles with granular section control (main profile, experience, education, contact info, interests, honors, languages, posts, recommendations), and retrieving structured JSON output. Built with FastMCP and Patchright, it supports both stdio and HTTP transports for various AI client integrations, with session persistence and configurable browser automation settings. |
| 2026-03-08 2026 | How I use LLMs For Security Work: Part 2 intermediate Bug Bounty | Library for leveraging Large Language Models (LLMs) in security work, focusing on advanced patterns beyond basic prompting. It details concepts like Agents, Skills (SKILLS.md), Workflows, and Assistants, emphasizing the critical role of providing precise context through documentation, requirements, and decision-making parameters. The article illustrates how well-defined prompts with explicit instructions and expected outputs, as opposed to vague requests, significantly improve LLM inference for tasks like automating browser profile management for threat hunting. |
| 2026-03-01 2026 | gadievron/raptor: Raptor turns Claude Code into a general-purpose AI offensive/defensive security agent. By using Claude.md and creating rules, sub-agents, and skills, and orchestrating security tool usage, we configure the agent for adversarial thinking, and perform research or attack/defense operations. intermediate AuthZ | Framework turning Claude Code into an autonomous AI security agent, RAPTOR orchestrates static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing. It employs Semgrep and CodeQL for scanning, using Z3 for dataflow and one-gadget constraint analysis to improve exploit feasibility. RAPTOR supports customizable LLM analysis dispatchers and offers project management features for organized research and reporting. |
| 2026-02-25 2026 | hexsecteam/HexSecGPT: HexSecGPT is designed to provide powerful, unrestricted, and seamless AI-driven conversations, pushing the boundaries of what is possible with natural language processing. beginner | Framework for AI-driven conversations that pushes natural language processing boundaries, utilizing third-party APIs from OpenRouter or DeepSeek with a specialized system prompt. This open-source wrapper demonstrates a proof-of-concept, offering a glimpse of HexSecGPT's capabilities through a command-line interface on platforms like Kali Linux, Ubuntu, and Termux. Users can obtain API keys from OpenRouter or DeepSeek for integration. The framework includes installation scripts and a model discovery script for managing API provider model availability. |
| 2026-02-23 2026 | ottosulin/awesome-ai-security: A collection of awesome resources related AI security beginner | Library of curated resources covering AI security, including frameworks, standards, learning materials, and open-source tools. It details attack techniques, defense strategies, benchmarks, and specific vulnerabilities, referencing OWASP LLM Top 10, NIST AIRC, MITRE ATLAS, and tools like garak and promptfoo for vulnerability scanning and prompt injection testing. The collection also highlights resources for understanding adversarial attacks such as evasion, poisoning, extraction, and inference, mentioning libraries like Adversarial Robustness Toolkit (ART), cleverhans, and foolbox. |
| 2026-02-21 2026 | samugit83/redamon: An AI-powered agentic red team framework that automates offensive security operations, from reconnaissance to exploitation to post-exploitation, with zero human intervention. advanced Recon | Framework that autonomously orchestrates offensive security operations from reconnaissance to post-exploitation, integrating AI agents for vulnerability validation via Hydra, privilege escalation exploits, and XSS mapping. It logs findings in a Neo4j knowledge graph, then utilizes a CypherFix AI triage agent to deduplicate and rank vulnerabilities. A subsequent CodeFix agent clones repositories, applies targeted fixes using 11 code-aware tools, and submits a GitHub pull request for review. |
| 2026-02-20 2026 | Microsoft says bug causes Copilot to summarize confidential emails news | Advisory regarding a Microsoft 365 Copilot bug where confidential emails were summarized, bypassing data loss prevention policies. This issue, tracked under CW1226324 and detected January 21, affected the Copilot "work tab" chat feature, incorrectly processing emails in Sent Items and Drafts, even those with confidentiality labels. Microsoft confirmed a code error as the root cause and began rolling out a fix in early February, with remediation continuing for complex service environments. → bleepingcomputer.com |
| 2026-02-18 2026 | anthropics/prompt-eng-interactive-tutorial: Anthropic's Interactive Prompt Engineering Tutorial beginner | Tutorial on prompt engineering for Claude, teaching basic prompt structure, failure modes, Claude's capabilities, and building complex prompts for use cases like chatbots, legal, and financial services. It includes an interactive playground for practice, exercises, an answer key, and an appendix covering chaining prompts, tool use, and search/retrieval, recommending the Claude for Sheets extension for user-friendliness. |
| 2026-02-17 2026 | vxcontrol/pentagi: ✨ Fully autonomous AI Agents system capable of performing complex penetration testing tasks advanced Recon | Tool for fully autonomous AI-powered penetration testing, PentAGI leverages a team of specialized agents and integrates professional security tools like nmap, metasploit, and sqlmap within a secure Docker environment. It features a smart memory system, knowledge graph integration with Neo4j, and external search capabilities via Tavily, Perplexity, and Google Custom Search, with comprehensive monitoring and reporting through Grafana and PostgreSQL. |
| 2026-02-16 2026 | How I Built a 5-Path AI “Recon Beast” with n8n and Gemini (2026 Guide) intermediate Bug Bounty Recon | In 2026, the bug bounty landscape requires more than just speed, with AI enhancing attacker capabilities. The article discusses building a 5-Path AI "Recon Beast" using n8n and Gemini. This innovative approach leverages automation and AI to enhance reconnaissance processes for bug bounty hunting. The focus is on utilizing technology to improve efficiency and effectiveness in identifying vulnerabilities. |
| 2026-02-11 2026 | Thread by @firt on Thread Reader App advanced | Library updates detail the early preview of Chrome's WebMCP, enabling AI agents to query and execute services via imperative or declarative APIs. It also highlights Safari/WebKit's unanswered community questions, contrasting with Chrome's PWA installation on Windows 7, 8.x, and 10, which features a distinct "Install" verb and a similar UX to Chromebook PWAs. |
| 2026-02-11 2026 | SILENTCHAIN AI - AI-Powered Security Testing intermediate Burp | Library for AI-powered offensive security, covering web applications, source code, and network infrastructure. Features include OWASP Top 10 detection via a Burp Suite extension, standalone web application scanning with CI/CD integration, and AI-powered static code analysis with PoC generation. It integrates with five AI providers, including local Ollama support, and utilizes a RAG Knowledge Engine with over 80,000 security documents. Products offer cross-product correlation for finding escalation, WAF detection and evasion for 25+ types, and out-of-band testing for XSS, SSRF, and XXE. |
| 2026-02-10 2026 | Ed1s0nZ/CyberStrikeAI: CyberStrikeAI is an AI-native security testing platform built in Go. It integrates 100+ security tools, an intelligent orchestration engine, role-based testing with predefined security roles, a skills system with specialized testing skills, and comprehensive lifecycle management capabilities. intermediate | Platform that leverages AI for automated security testing. It integrates over 100 tools, including network scanners like nmap, web scanners such as sqlmap, and vulnerability scanners like nuclei. The platform features an intelligent orchestration engine, role-based testing with predefined security roles, and a skills system for specialized testing. It supports conversational commands, attack-chain analysis, knowledge retrieval via RAG, and provides a dashboard for system status and vulnerability management. Integrations include a Burp Suite extension and chatbot capabilities for DingTalk and Lark. |
| 2026-02-07 2026 | Agent twitter client mcp beginner | Agent twitter client mcp |
| 2026-02-06 2026 | Claude Opus 4.6 Finds 500+ High-Severity Flaws Across Major Open-Source Libraries news Bug Bounty | Library where Claude Opus 4.6 identified over 500 high-severity vulnerabilities in open-source projects like Ghostscript, OpenSC, and CGIF. The LLM demonstrated advanced code reasoning, finding flaws such as a missing bounds check in Ghostscript, a buffer overflow in OpenSC, and a heap buffer overflow in CGIF, even outperforming traditional fuzzers on complex logic-based bugs. → thehackernews.com |
| 2026-02-06 2026 | xalgord/AI-System-Prompts: XBot - Advanced AI Cybersecurity Agent | Gemini system prompt for automated penetration testing and security assessments intermediate Bug Bounty | Library for XBot, an advanced AI cybersecurity agent system prompt for Gemini AI, facilitating automated penetration testing and security assessments. It supports comprehensive vulnerability scanning, active exploitation, OWASP Top 10 and advanced web application security testing, source code analysis, network security, and detailed reporting with remediation guidance. The system prompt enables autonomous operation, multi-target scanning, and robust vulnerability detection on authorized systems. |
| 2026-02-02 2026 | depthfirst | 1-Click RCE To Steal Your Moltbot Data and Keys advanced RCE Secrets | Library that identifies vulnerabilities in OpenClaw, formerly Moltbot, by analyzing its code for logic flaws. The system maps application lifecycle flows, flagging issues like blindly accepting gateway URLs which, when combined with other issues, can lead to a 1-click RCE exploit, CVE-2026-25253. This exploit allows attackers to steal data and keys by chaining a Cross-Site WebSocket Hijacking vulnerability with API calls to disable security features. |
| 2026-02-02 2026 | skills/plugins/insecure-defaults/skills/insecure-defaults/SKILL.md at main · trailofbits/skills intermediate | Library for identifying fail-open vulnerabilities in applications, distinguishing exploitable defaults from crash-safe patterns. It aids in security audits by reviewing code, deployment configurations, and IaC templates for issues like fallback secrets, hardcoded credentials, weak defaults in authentication and CORS, insecure crypto algorithms such as MD5 and ECB, and exposed debug features. The library emphasizes analyzing production-reachable code and tracing execution paths to determine runtime behavior and assess the criticality of findings. |
| 2026-02-01 2026 | Prompt Injection Toolkit: 25 Payloads & Techniques for Mastering AI Pentesting intermediate Bug Bounty | Prompt Injection Toolkit: 25 Payloads & Techniques for Mastering AI Pentesting Ever tried breaking an AI chatbot with a ‘please ignore all previous instructions’ prompt, only to realize it’s … |
| 2026-01-28 2026 | insaaniManav/prompt-forge: AI prompt engineering workbench for crafting, testing, and systematically evaluating prompts with powerful analysis tools. intermediate | Workbench for AI prompt engineering that generates, analyzes, and systematically tests prompts, featuring smart generation with AI suggestions, advanced analysis for optimization feedback, and systematic evaluation creating comprehensive test suites for robustness, safety, accuracy, and creativity. It supports multiple models including Claude 3.5 Sonnet, GPT-4.1, Azure OpenAI, and Ollama, with organized version control and detailed execution history. |
| 2026-01-27 2026 | Hunting Account Takeovers in the Wild West of MCP OAuth Servers" intermediate AuthN | Library that details critical OAuth misconfigurations in MCP (Model Context Protocol) servers, enabling one-click account takeover (ATO) attacks. Vulnerabilities include open Dynamic Client Registration (DCR) and missing redirect URI validation, allowing attackers to register malicious clients and intercept authentication codes. The research highlights findings from subdomain enumeration, endpoint discovery, and configuration analysis, focusing on misaligned security settings like unprotected DCR endpoints and unsupported PKCE enforcement. |
| 2026-01-25 2026 | Coding Agents. The Insider Threat You Installed Yourself beginner | Coding Agents. The Insider Threat You Installed Yourself Stop Running AI Coding Assistants Blindly AI coding agents are booming everywhere right now. Not only because they help you ship code faster … |
| 2026-01-23 2026 | GitHub - mholzen/workflowy: Powerful CLI and MCP server for WorkFlowy: reports, search/replace, backup support, and AI integration (Claude, LLMs) intermediate | Tool for WorkFlowy, offering a CLI and MCP server. It enables AI integration with models like Claude and ChatGPT, alongside features for search, bulk replace, usage reports, and offline access via backup files. This Go-based application supports full-text search with regex, content transformation, and can pipe data through shell commands for LLM processing. Installation is available via Homebrew, Scoop, Go, or pre-built binaries. |
| 2026-01-22 2026 | AI’s Hacking Skills Are Approaching an ‘Inflection Point’ news Bug Bounty | Library detecting federated GraphQL vulnerabilities; AI models are increasingly capable of finding zero-day bugs and complex system interactions, as demonstrated by RunSybil's Sybil tool and Dawn Song's CyberGym benchmark. Frontier models like Anthropic's Claude Sonnet 4.5 show significant improvements in vulnerability identification, highlighting the growing need for AI-assisted defense strategies and secure-by-design coding practices. → wired.com |
| 2026-01-18 2026 | harishsg993010/crossbow-agent: world's first Opensource fully Autonomous AI Security Engineer intermediate | Library for an autonomous AI security engineer, "crossbow-agent," which finds and exploits vulnerabilities like hardcoded credentials, SQL injection, exposed admin panels, API key leaks, IDOR, command injection, session fixation, XSS, insecure file permissions, missing rate limiting, XXE, CORS misconfigurations, open redirects, JWT secret key leaks, NoSQL injection, SSRF, weak cryptography, race conditions, and directory traversal. It supports multiple AI models (GPT, Claude, Gemini) and integrates with OpenAI, Anthropic, or Google APIs. |
| 2026-01-16 2026 | trailofbits/skills: Trail of Bits Claude Code skills for security research, vulnerability detection, and audit workflows intermediate | Library of Claude Code skills from Trail of Bits, enhancing AI-assisted security analysis, vulnerability detection, and audit workflows. This marketplace provides codex-native skill discovery, allowing researchers to browse and install plugins locally or via a git clone. Contributions and bug reports are welcomed. |
| 2026-01-13 2026 | Securing AI Systems beginner | Course on securing AI systems, covering adversarial attacks, data poisoning, and model theft. It offers hands-on labs for implementing defenses, conducting red-team simulations, and evaluating weaknesses. You will learn threat modeling, vulnerability assessments, DevSecOps, and incident response within AI/ML workflows, cloud security, and MLOps. |
| 2026-01-11 2026 | Certified AI Security Professional - AI Security Certification - Practical DevSecOps beginner | Library covering the Certified AI Security Professional (CAISP) certification, this resource details AI security fundamentals, Large Language Model (LLM) attacks, and defenses. It explores OWASP Top 10 LLM vulnerabilities like prompt injection and training data poisoning, along with AI-DevOps integration. Key attack tactics from MITRE ATT&CK and ATLAS are examined, alongside threat modeling methodologies and supply chain security for AI. Emerging threats, governance, and compliance are also addressed, including discussions on the EU AI Act and NIST RMF. |
| 2025-12-19 2025 | KeygraphHQ/shannon: Fully autonomous AI hacker to find actual exploits in your web apps. Shannon has achieved a 96.15% success rate on the hint-free, source-aware XBOW Benchmark. advanced Bug Bounty | Library for fully autonomous, white-box AI pentesting of web applications and APIs. Shannon analyzes source code and executes real exploits, including Injection, XSS, SSRF, and Broken Authentication, to validate vulnerabilities before production. It leverages tools like Nmap and Subfinder, and can handle 2FA/TOTP logins with reproducible proof-of-concept exploits, achieving a 96.15% success rate on the XBOW Benchmark. |
| 2025-12-17 2025 | NVIDIA/garak: the LLM vulnerability scanner beginner | Tool for scanning Large Language Models (LLMs), `garak` probes for vulnerabilities like hallucination, data leakage, prompt injection, misinformation, toxicity generation, and jailbreaks. It employs static, dynamic, and adaptive probes to identify weaknesses in LLMs accessible via Hugging Face Hub, Replicate, OpenAI API, AWS Bedrock, LiteLLM, and REST endpoints. `garak` helps assess LLM security by mimicking tools like nmap or Metasploit Framework for LLMs, reporting on failure rates and logging detailed run information. |
| 2025-12-13 2025 | Building an Open-Source AI-Powered Auto-Exploiter with a 1.7B Parameter Model: No Paid APIs Required advanced | Library for building an open-source, AI-powered autonomous penetration testing agent. This system utilizes a 1.7 billion parameter qwen3:1.7b model, LangChain, and LangGraph for local execution, eliminating API costs and data exfiltration. It functions as a ReAct agent, independently scanning networks with Nmap, searching for exploits using searchsploit, mirroring them, analyzing code with `inspect_exploit_code`, setting up listeners with `start_listener`, and executing commands via `execute_shell_command` to achieve autonomous exploitation. |
| 2025-12-11 2025 | 📚 tl;dr sec 308 news Supply Chain | 😈 MCP Security, ☁️ AWS re:Invent Recaps, 🤖 Detecting Malicious Pull Requests with AI https://t.co/gt4zMQKZpp |
| 2025-12-05 2025 | GitHub - amaiya/onprem: A toolkit for applying LLMs to sensitive, non-public data in offline or restricted environments beginner | Library for applying LLMs to sensitive, non-public data locally or in restricted environments. OnPrem.LLM, a Python toolkit inspired by privateGPT, offers full local execution with optional cloud provider integration (OpenAI, Anthropic). It features analysis pipelines for extraction, summarization, and Q&A, supports resource-constrained environments with SparseStore, and integrates with tools like Elasticsearch. Recent updates include an `AgentExecutor` for sandboxed AI agents and support for workflows and asynchronous prompts. |
| 2025-10-30 2025 | fr0gger/proximity: Proximity is a MCP security scanner powered with NOVA intermediate Supply Chain | Library for scanning MCP (Model Context Protocol) servers and Agent Skills, Proximity uses NOVA rules to detect security issues like prompt injection and jailbreaks. It performs detailed analysis of server capabilities and skill structures, supporting MCP Spec 2025-11-25 and providing pattern-specific remediation guidance. |
| 2025-10-15 2025 | The MCP Security Tool You Probably Need - MCP Snitch intermediate Supply Chain | Library implementing a proxy-based security model for MCP tools, offering a critical mediation layer until native MCP security primitives and platform-level fine-grained scoping are adopted. MCP Snitch intercepts tool calls, enforces user-defined whitelists for operations, and provides visibility and control, mitigating risks like those demonstrated by the GitHub MCP vulnerability. This approach prioritizes explicit allow-listing over deny-listing for robust access control. |
| 2025-10-14 2025 | AI For Hackers: Red Team Editions – Codelivly Resources beginner | Manual for offensive AI tradecraft, this 1,100-page guide teaches red teams to build autonomous hacking agents. It covers AI-augmented reconnaissance, polymorphic payload generation using generative models, AI-driven vulnerability discovery with tools like CodeBERT and reinforcement learning fuzzers, and adaptive C2 frameworks. The resource includes 60+ labs, 500+ Python code examples, and methods for bypassing AI-based security with adversarial examples. |
| 2025-10-12 2025 | 5 Essential MCP Servers That Give Claude & Cursor Real Superpowers (2025) beginner | “” is published by Prithwish Nath in Artificial Intelligence in Plain English. |
| 2025-10-02 2025 | Offensive AI - Hacker Associate beginner | Certification program merging traditional web pentesting with AI automation. This hands-on course teaches how to identify, exploit, and report vulnerabilities using GPT agents, LangChain, AutoGPT, and tools like Burp Suite and Turbo Intruder. Modules cover AI-powered reconnaissance, exploitation of access control and XSS, authentication bypass, API testing, automated reporting, and advanced agent development for WAF bypass, business logic flaws, and CI/CD pipeline analysis. It also delves into AI red teaming, adversarial AI testing, and prompt injection attacks. |
| 2025-08-22 2025 | Model Context Protocol (MCP): Understanding security risks and controls intermediate | Library for securing Anthropic's Model Context Protocol (MCP), which connects LLMs to external tools. It addresses confused deputy vulnerabilities via OAuth, supply chain risks by requiring signed components and SAST/SCA in build pipelines, unauthorized command execution with input sanitization and sandboxing, prompt injection through user confirmation, and tool injection by enabling version pinning and modification notifications. The library also details mitigation for MCP sampling exploitation and emphasizes logging best practices. |
| 2025-08-13 2025 | AI Mastery for Cybersecurity Professionals beginner Talks | Bundle of 10 EC-Council courses focused on applying AI to cybersecurity. This learning resource covers topics such as AI-driven threat detection, LLM pentesting, automated reconnaissance for bug bounty hunting using tools like Nuclei and HTTPX, and defending against generative AI threats like phishing and deepfakes. It aims to equip cybersecurity professionals with skills to automate detection, strengthen defenses, and enhance cyber intelligence. |
| 2025-04-30 2025 | #burp #pentest #ai #hackerassociate #cybersecurity #infosec… | Harshad Shah intermediate Burp Talks | Setting Up #Burp MCP Server on Claude Desktop #Pentest Modern App with #Ai ⇢ Learn how to set up a 𝗕𝘂𝗿𝗽 𝗠𝗖𝗣 𝗦𝗲𝗿𝘃𝗲𝗿 on your 𝗖𝗹𝗮𝘂𝗱𝗲 𝗱𝗲𝘀𝗸𝘁𝗼𝗽 in this easy-to-follow tutorial. ⇢ Get your server up and... |
| 2025-04-13 2025 | Building Your First Offensive Security MCP Server - Renae Schilg - Medium intermediate | So, you’ve read the primer here, you have a basic understanding of MCP servers and how they work and now you’re ready to build your own. We are going to be building a simple MCP server that performs… |
| 2025-04-09 2025 | Defensive Deception with Kong and Beelzebub LLM Honeypot intermediate | In today’s increasingly sophisticated cyber threat landscape, organizations need to move beyond traditional defensive measures. While firewalls, intrusion detection systems, and vulnerability… |
| 2025-03-24 2025 | Prompt Engineering Guide – Nextra beginner | Guide to prompt engineering, a new discipline for optimizing prompts to interact with and develop large language models (LLMs). This resource compiles the latest papers, advanced prompting techniques, learning guides, model-specific guides, lectures, references, new LLM capabilities, and tools, aiming to improve LLM safety and augment capabilities with domain knowledge and external tools. |
| 2025-02-25 2025 | GenAI with Python: Build Agents from Scratch (Complete Tutorial) beginner | Prompt Engineering is the practice of designing and refining prompts (text inputs) to enhance the behavior of Large Language Models (LLMs). The goal is to get the desired responses from the model by… |
| 2025-02-14 2025 | GitHub - microsoft/generative-ai-for-beginners: 21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/ beginner | Library of 21 lessons for building Generative AI applications, covering concepts and code examples in Python and TypeScript. Lessons include Azure OpenAI Service, GitHub Marketplace Model Catalog, and OpenAI API, with "Keep Learning" sections. Basic Python or TypeScript knowledge is recommended, and a GitHub account is required for local cloning and contributions. Sparse checkout instructions are provided to reduce download size by excluding translations. |
| 2025-02-09 2025 | GitHub - potpie-ai/potpie: Prompt-To-Agent : Create custom engineering agents for your codebase intermediate | Library for creating AI agents that reason about your codebase. Potpie transforms repositories into knowledge graphs stored in Neo4j, enabling agents to understand code context for debugging and feature development. It supports OpenAI, Ollama, and Anthropic LLM providers, with configurable authentication for GitHub repositories via GitHub Apps or Personal Access Tokens. The architecture includes a FastAPI API layer, Celery workers for asynchronous parsing, and a Neo4j knowledge graph as the core context provider. |
| 2025-02-09 2025 | GitHub - eastlondoner/cursor-tools: Give Cursor Agent an AI Team and Advanced Skills intermediate | Library for extending AI coding assistants like Cursor Composer, Cursor, Claude Code, and Codex with advanced skills and an AI team. It integrates with Perplexity for web search and Gemini 2.0 for large context windows, enabling capabilities such as working with GitHub Issues and Linear, generating local documentation, analyzing YouTube videos, and operating web applications via Stagehand. The library offers a CLI for system-wide access and supports multiple AI providers including OpenAI, Anthropic, and OpenRouter. |
| 2025-01-30 2025 | Set Up Your Own Cybersecurity-Focused AI Development, Training, and Fine-Tuning Lab at Home intermediate | As AI applications rapidly evolve, commercial platforms like OpenAI, Gemini, and many other LLM versions are offering advanced capabilities… |
| 2025-01-24 2025 | GitHub - JasonLovesDoggo/caddy-defender: Caddy module to block IPs and prevent AIs from training on your website. intermediate | Library for Caddy that blocks IP addresses and prevents AI training on websites. It supports IP range filtering, predefined ranges for services like OpenAI and GitHub Copilot, and custom ranges. Responders include blocking, custom messages, dropping connections, returning garbage data, redirection, rate limiting, and tarpitting. Installation is available via a pre-built Docker image. |
| 2025-01-10 2025 | SSH LLM Honeypot caught a real threat actor - Beelzebub Blog intermediate | Library for configuring an SSH LLM honeypot using the Beelzebub framework. This resource details how a threat actor was caught downloading binaries with known exploits and attempting to join a botnet via an IRC channel. Analysis of the threat actor's actions, including IP address, credentials, and observed commands, is provided, along with steps to recreate the honeypot setup and details on the Perl script used for DDoS and C2 communication through Undernet IRC channels. |
| 2024-12-31 2024 | GitHub - browser-use/browser-use: Make websites accessible for AI agents intermediate | Library for scalable, stealth-enabled browser automation. It enables coding agents like Cursor and Claude Code to interact with websites, supporting custom tools and offering both open-source and cloud-hosted agent options. The library provides a CLI for direct browser control and features optimized LLMs like ChatBrowserUse for faster, more accurate task completion. Production deployments are recommended for the cloud API due to its scalable infrastructure, proxy rotation, and captcha handling capabilities. |
| 2024-10-05 2024 | GitHub - fr0gger/Awesome-GPT-Agents: A curated list of GPT agents for cybersecurity beginner | Library of curated GPT agents for cybersecurity, categorized for offensive and defensive applications. This community-driven resource lists various specialized agents, including MagicUnprotect for malware evasion, GP(en)T(ester) for pentesting, Threat Intel Bot for APT tracking, Vulnerability Bot for secure coding, SourceCodeAnalysis for code review, Web Hacking Wizard for web security education, CyberGPT for CVE details, MITREGPT for MITRE ATT&CK mapping, and AppSec Test Crafter for generating application security test cases in YAML. |
| 2024-08-28 2024 | Microsoft Copilot: From Prompt Injection to Exfiltration of Personal Information · Embrace The Red advanced | Writeup detailing a Microsoft 365 Copilot vulnerability where prompt injection, automatic tool invocation, and ASCII smuggling were combined to exfiltrate personal information. The exploit chain leveraged malicious emails or shared documents to trigger Copilot's processing, enabling it to access and send sensitive data like emails and MFA codes to attacker-controlled domains via disguised hyperlinks. |
| 2023-12-05 2023 | pentestmuse-ai/PentestMuse intermediate | Library for an AI assistant designed for cybersecurity professionals, Pentest Muse aids penetration testers in brainstorming, payload generation, code analysis, and reconnaissance. It offers both command-line and web application interfaces, supporting iterative task completion and direct command execution. Users can connect via managed APIs or integrate their own OpenAI API keys. |
| 2023-11-18 2023 | protectai/ai-exploits intermediate | Library of exploits and Nuclei scanning templates for machine learning infrastructure vulnerabilities. This collection, including Metasploit modules and CSRF templates, addresses real-world attacks such as system takeovers and data loss, often without authentication. Vulnerabilities affect tools, libraries, and frameworks used in AI/ML model development, training, and deployment, with specific examples like Ray and MLflow being addressed. |
| 2023-11-09 2023 | https://chat.openai.com/g/g-6Bcjkotez-getpaths intermediate | https://ift.tt/fbJIsGN |
| 2023-06-25 2023 | Beginners guide to AI in cybersec. Hacking with ChatGPT. beginner | Beginners guide to AI in cybersec. Hacking with ChatGPT. https://ift.tt/UDRVtCp |
| 2023-06-12 2023 | Threat Modeling Example with ChatGPT intermediate | Threat Modeling Example with ChatGPT https://ift.tt/FRkZvyO |
| 2023-05-18 2023 | The AI Attack Surface Map v1.0 advanced | Framework for thinking about AI system attack surfaces, this resource maps components like AI Assistants, Agents, Tools, Models, and Storage. It highlights natural language as a primary attack vector, detailing techniques such as prompt injection against Agents and Tools to execute arbitrary commands or access sensitive data. Model attacks focus on subtle manipulation, while Storage vulnerabilities, particularly in Vector Databases, allow for data extraction and potential compromise of embeddings. The framework aims to clarify the evolving landscape of AI vulnerabilities beyond just machine learning models. → danielmiessler.com |
| 2023-05-09 2023 | How I Automate BugBounty Using Chatgpt intermediate Bug Bounty | How I Automate BugBounty Using Chatgpt https://ift.tt/93SQsPD |
| 2023-04-09 2023 | aress31/burpgpt intermediate Burp | Library for integrating OpenAI's GPT models into Burp Suite for passive security vulnerability detection. BurpGPT analyzes web traffic by sending requests and responses to a specified OpenAI model, leveraging custom prompts for tailored analysis. It generates automated security reports, highlighting potential issues beyond traditional scanner capabilities, but requires professional triaging for false positives. The extension supports various OpenAI models and allows granular control over token usage and prompt length. It requires Burp Suite Professional or Community Edition (version 2023.3.2+) and JDK 11+. |
| 2023-04-02 2023 | SecGPT transforms cybersecurity through AI-driven insights. news | SecGPT transforms cybersecurity through AI-driven insights. https://ift.tt/4kTKfoJ |
| 2023-04-02 2023 | I Used GPT-3 to Find 213 Security Vulnerabilities in a Single Codebase intermediate | I Used GPT-3 to Find 213 Security Vulnerabilities in a Single Codebase https://ift.tt/FrMSdKx |
| 2023-04-02 2023 | HackGPT beginner | HackGPT https://ift.tt/JsIGRO1 |
| 2023-03-29 2023 | Microsoft Security Copilot is a new GPT-4 AI assistant for cybersecurity news | Tool that uses GPT-4 and Microsoft's security-specific model to assist cybersecurity professionals. It synthesizes enterprise security incidents, analyzes files and code, and summarizes alerts from other security tools. Security Copilot draws from 65 trillion daily signals, CISA, NIST, and its own threat intelligence, offering a prompt book for automations and a collaborative workspace. It can also generate PowerPoint summaries of incidents and attack vectors. |
| 2022-02-03 2022 | Favorite tweet by @LeaKissner news | Favorite tweet: Nicolas Carlini's ML training data extraction attack talk at #Enigma2022 escalated quickly. https://t.co/C8kzAyq7lh — Lea Kissner (@LeaKissner) Feb 2, 2022 |
Frequently Asked Questions
- What is prompt injection?
- Prompt injection is an attack against applications that use large language models (LLMs). An attacker crafts input that overrides or manipulates the LLM's system instructions, causing it to perform unintended actions. Direct prompt injection targets the user input; indirect prompt injection embeds malicious instructions in data the LLM processes, such as emails or web pages.
- What is the OWASP Top 10 for LLM Applications?
- The OWASP Top 10 for LLM Applications identifies the most critical security risks for AI-powered applications, including prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft.
- How do you secure AI-integrated applications?
- Key practices include validating and sanitizing LLM outputs before rendering or executing them, implementing least-privilege access for AI agents, using guardrails to constrain model behavior, monitoring for prompt injection attempts, applying rate limiting, separating AI processing from privileged operations, and treating all LLM output as untrusted user input.
Weekly AppSec Digest
Get new resources delivered every Monday.