Secrets: A Practical Guide

Problem Framing

Secrets, in the context of application security, represent any sensitive piece of information that grants authorized access to systems, data, or resources. This includes API keys, passwords, database credentials, encryption keys, tokens, and certificates. The pervasive nature of these secrets within software development lifecycles, coupled with evolving attack vectors, makes their management a critical and ongoing challenge. Leaked secrets are not merely an inconvenience; they represent a direct pathway for attackers to gain unauthorized access, leading to data breaches, system compromise, financial loss, and reputational damage.

The landscape of secrets management is becoming increasingly complex due to several intersecting trends. The rapid adoption of AI-assisted development tools introduces new avenues for accidental or malicious secret exposure, often through AI-generated code or through the configuration of AI services themselves ^[1]^[2]. Furthermore, the increasing reliance on cloud infrastructure and containerized environments, while offering agility, also expands the attack surface for secrets. Non-human identities (NHIs) – service accounts, managed identities, and machine identities – are gaining prominence, and their secure management is paramount, as compromised NHIs can grant attackers broad access ^[3]^[4]. Finally, the persistent issue of secrets sprawl, where secrets are scattered across numerous locations and lifecycles, remains a fundamental problem, exacerbated by the sheer volume of code and infrastructure being managed.

Core Mechanics

Secrets management fundamentally revolves around controlling the lifecycle of sensitive information. This begins with creation, where secrets are generated with appropriate entropy and security properties. The subsequent storage is a critical phase, dictating how secrets are protected at rest. This can range from insecure methods like hardcoding directly into source code or configuration files to robust solutions like dedicated secrets management platforms.
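
As a minimal sketch of the creation step, the following uses Python's standard `secrets` module, which draws from the operating system's cryptographically secure random source. The function names, default lengths, and password alphabet are illustrative assumptions, not values the guide prescribes.

```python
import secrets
import string


def generate_api_token(num_bytes: int = 32) -> str:
    """Return a URL-safe token built from num_bytes of OS-provided randomness."""
    return secrets.token_urlsafe(num_bytes)


def generate_password(length: int = 24) -> str:
    """Return a random password drawn from an illustrative alphabet."""
    alphabet = string.ascii_letters + string.digits + "-_!@#%"
    # secrets.choice avoids the predictable generator behind random.choice.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The general-purpose `random` module is deliberately avoided here: its output is predictable and it is not intended for security-sensitive values.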

Distribution refers to how secrets are delivered to the applications or services that require them. This process must be secure, ensuring that secrets are only accessible by authorized entities and at the time they are needed. Usage involves the actual consumption of secrets by applications. Principles of least privilege should be applied here, meaning that secrets should only grant the minimum necessary permissions.

Rotation is the process of periodically replacing existing secrets with new ones. This is a vital defense mechanism against long-lived compromised secrets. Finally, revocation is the immediate disabling of a secret when it is suspected of compromise or no longer needed. The effectiveness of these mechanics is heavily influenced by the chosen tooling and the adherence to secure development practices throughout the software development lifecycle (SDLC).
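
The lifecycle described above (creation, rotation, revocation) can be sketched with a toy in-memory store. This is an assumption-laden illustration only: the `SecretStore` class and its methods are hypothetical, nothing is persisted or encrypted, and a real deployment would delegate these operations to a dedicated secrets manager.

```python
import secrets


class SecretStore:
    """In-memory stand-in for a secrets manager; not for production use."""

    def __init__(self):
        # Maps a secret name to a list of {"value", "version", "revoked"} dicts.
        self._versions = {}

    def create(self, name: str) -> dict:
        record = {"value": secrets.token_urlsafe(32), "version": 1, "revoked": False}
        self._versions[name] = [record]
        return record

    def rotate(self, name: str) -> dict:
        """Issue a new version and retire the previous one immediately."""
        history = self._versions[name]
        history[-1]["revoked"] = True
        record = {
            "value": secrets.token_urlsafe(32),
            "version": history[-1]["version"] + 1,
            "revoked": False,
        }
        history.append(record)
        return record

    def revoke(self, name: str) -> None:
        """Disable every version, e.g. after a suspected compromise."""
        for record in self._versions.get(name, []):
            record["revoked"] = True

    def is_valid(self, name: str, candidate: str) -> bool:
        for record in self._versions.get(name, []):
            # Constant-time comparison avoids leaking match position via timing.
            same = secrets.compare_digest(record["value"].encode(), candidate.encode())
            if same and not record["revoked"]:
                return True
        return False
```

Retiring the old version at the moment of rotation is the simplest policy. Systems that cannot switch all consumers at once often keep the previous version valid for a short overlap window instead.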

A significant portion of secret compromise stems from insecure storage and distribution. Hardcoding secrets directly into source code is a common and easily exploitable vulnerability ^[5]. Similarly, storing secrets in environment variables or unencrypted configuration files, such as .env files, exposes them to anyone who can read the host's process environment or file system, or the code repository if such a file is committed ^[6]^[7]. Notebook files, often used in AI development, can also inadvertently contain secrets [S-summary-AI-notebooks].
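
To illustrate how notebook files can carry secrets, here is a hedged sketch that walks the JSON structure of an .ipynb file and checks both cell sources and saved outputs. The two regular expressions are illustrative assumptions, far narrower than the detectors in real scanners, and `scan_notebook` is a hypothetical helper name.

```python
import json
import re

# Illustrative patterns only; production scanners ship many more detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token|password)\s*[=:]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def _as_text(value) -> str:
    """Notebook fields may hold a string or a list of strings."""
    if isinstance(value, list):
        return "".join(str(item) for item in value)
    return str(value)


def scan_notebook(notebook_json: str):
    """Return (cell_index, location, pattern_name) for each suspected secret."""
    notebook = json.loads(notebook_json)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        chunks = [("source", _as_text(cell.get("source", "")))]
        for output in cell.get("outputs", []):
            # Stream outputs keep text under "text"; rich outputs under "data".
            chunks.append(("output", _as_text(output.get("text", ""))))
            for payload in output.get("data", {}).values():
                chunks.append(("output", _as_text(payload)))
        for location, text in chunks:
            for name, pattern in PATTERNS.items():
                if pattern.search(text):
                    findings.append((index, location, name))
    return findings
```

Checking saved outputs matters because a printed credential stays in the file even after the cell that produced it has been edited.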

The distribution of secrets to CI/CD pipelines presents a major risk. Compromised CI/CD systems, such as GitHub Actions, can lead to the theft of valuable secrets, which are then used for lateral movement or resource exploitation ^[8]^[9]. For instance, in the compromise of the codfish/semantic-release-action GitHub Action, tags were hijacked to steal CI/CD secrets ^[8]. Similarly, the elementary-data PyPI incident showed how injected GitHub Actions scripts can be used to publish malicious credential-stealing packages ^[10].

Secrets within non-human identities (NHIs) are another critical area. Cloud provider credentials, especially AWS keys, are frequently exposed. Attackers can harvest IAM credentials from public repositories within minutes of their exposure [S-summary-NHI-cloud]. Exposed heap dumps in Spring Boot Actuator have been found to contain AWS keys, JWT tokens, and session cookies ^[11]. The exploitation of cloud services like AWS SES (Simple Email Service) has also been observed, where leaked AWS access keys allowed attackers to escape SES sandbox limitations and conduct large-scale phishing campaigns [S-summary-AWS-SES].

AI coding assistants have introduced novel risk vectors. These tools can inadvertently inject secrets into Git history, and partial remediation may still leave secrets exposed in older commits ^[12]. Furthermore, AI service credentials themselves are increasingly becoming targets, with a significant year-over-year increase in their exposure ^[1].

Notable Techniques

Hardcoding Secrets: Directly embedding sensitive information like API keys, passwords, or tokens within source code files, configuration files (e.g., .env, appsettings.json), or even build scripts remains a prevalent, albeit fundamentally insecure, technique ^[5]. This is often a symptom of developer oversight or a lack of awareness regarding secure credential management practices. While seemingly straightforward, it creates a permanent vulnerability if the code is ever exposed publicly or if an attacker gains read access to the codebase.

CI/CD Pipeline Compromise: Attackers target CI/CD pipelines because they often hold elevated privileges and access to a wide array of secrets. Techniques include hijacking GitHub Actions workflows through vulnerabilities like YAML anchor misconfigurations or exploiting the pull_request_target trigger. Compromised actions or dependencies within the CI/CD chain can exfiltrate secrets, as demonstrated by the codfish/semantic-release-action compromise, in which imposter commits and tag manipulation were used to steal CI/CD secrets ^[8]. GitHub Actions cache poisoning is another avenue, allowing attackers to inject malicious code or data into cached artifacts that are later used in builds.
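
Because tag manipulation works by moving a mutable reference, one commonly recommended mitigation is to reference third-party actions by full commit SHA rather than by tag or branch. The sketch below flags `uses:` lines that are not pinned this way. It is a line-based heuristic rather than a YAML parser, the function name is hypothetical, and it cannot confirm that a SHA actually belongs to the upstream repository, so it does not address imposter commits.

```python
import re

# Matches "uses: owner/repo@ref" with an optional leading list dash and quotes.
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_actions(workflow_text: str):
    """Return (line_number, reference) for actions not pinned to a 40-char SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_LINE.match(line)
        if not match:
            continue
        reference = match.group(1)
        # Local actions and container images are out of scope for this check.
        if reference.startswith(("./", "docker://")):
            continue
        _, _, ref = reference.partition("@")
        if not FULL_SHA.match(ref):
            findings.append((lineno, reference))
    return findings
```

Run against the contents of a workflow file, an empty result means every external action reference is SHA-pinned. Anything returned is a candidate for review.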

Exploiting Cloud Misconfigurations: Publicly accessible cloud databases, such as misconfigured ClickHouse or Supabase instances, have been found to leak sensitive data, including chat history, API secrets, and authentication tokens ^[13]. For example, a misconfigured Supabase database exposed millions of API authentication tokens and private messages [S-summary-Supabase]. The AWS Instance Metadata Service (IMDS), particularly IMDSv1, can be queried to retrieve temporary credentials if not properly secured. In Kubernetes, querying the API for service account tokens can grant broad access if not restricted.
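
For context on the IMDS point, the sketch below builds (but does not send) the two requests that make up the IMDSv2 session flow: a PUT to obtain a session token, then a GET that presents it. IMDSv1 answers a plain GET with no token, which is why simple request-forgery bugs can reach it. The helper names are hypothetical. The endpoint and header names follow AWS's documented IMDSv2 interface.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"  # link-local metadata endpoint on EC2


def build_imdsv2_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_imdsv2_metadata_request(token: str, path: str) -> urllib.request.Request:
    """Build the GET request that presents the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

On an instance that enforces IMDSv2, a metadata request without the token header is rejected, so an attacker who can only trigger simple GET requests cannot read role credentials this way.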

Git History Exploitation: Secrets accidentally committed to Git repositories can persist even after being removed from the current branch. Techniques like unintended commits, orphaned branches, or force-pushes can leave sensitive data accessible. git-filter-repo is a tool used to rewrite Git history to remove such exposures ^[12]. AI coding agents have been observed introducing secrets into Git history, and incomplete remediation efforts mean these secrets can remain discoverable in historical commits ^[12]^[14].

IDE Plugin Compromise: Malicious IDE plugins can act as sophisticated credential theft vectors. Several JetBrains IDE plugins have been found to exfiltrate AI provider API keys and other credentials to attacker-controlled servers ^[15]. These plugins often operate with significant access to a developer's environment, making them a prime target for harvesting sensitive information. VSCode extension packages have also been found to contain over 550 validated secrets, including AI provider secrets and high-risk platform secrets [S-summary-VSCode].

Credential Stuffing and Password Spraying: While not directly related to code security, these techniques leverage leaked or weak credentials obtained through other means (e.g., data breaches) to gain unauthorized access to systems. If valid credentials are found in a repository, attackers can attempt to use them against other services the victim organization uses.

Abusing Cloud Services for Malicious Purposes: Stolen cloud credentials, particularly AWS access keys, have been used to escape SES sandbox restrictions and conduct large-scale phishing operations [S-summary-AWS-SES]. This highlights how compromised secrets can enable further malicious activity beyond the initial compromise.

Leveraging AI Coding Assistants for Secret Exposure: Beyond inadvertently introducing secrets into code, AI assistants can be misused. Prompt injection attacks can persuade an AI model to reveal sensitive information or execute malicious commands. The code generated by these assistants may also contain security vulnerabilities, including hardcoded secrets, which developers might not thoroughly review ^[2]. Furthermore, configuration files for AI services (e.g., MCP, LLM infrastructure) can contain sensitive tokens that are exposed in public repositories [S-summary-MCP-config].

Credential Recovery via DPAPI and CREDHIST: On Windows systems, secrets can be recovered through mechanisms like the Data Protection API (DPAPI) and the Credential History (CREDHIST) files. Tools like DPAPISnoop can extract crackable hashes from these stores, which can then be attacked offline using tools like Hashcat [S-summary-DPAPI]. This targets secrets stored by the operating system for applications and user accounts.

Hardcoded Symmetric Keys: In some cases, hardcoded keys for symmetric ciphers, such as 3DES or AES-256-CBC, have been found within applications. A notable example involved a logistics platform (CargoWise WebTracker) where such keys led to authentication bypass vulnerabilities ^[5].
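
Why a hardcoded symmetric key can amount to an authentication bypass is easiest to see in a toy example. The sketch below is hypothetical and is not the scheme used by the platform mentioned above: it swaps in an HMAC-signed token (available in Python's standard library) for the 3DES or AES constructions named in the text, but the failure mode is the same. Whoever extracts the key from the shipped artifact can mint values the server will accept.

```python
import hashlib
import hmac

# Hypothetical key baked into every copy of an application. Anyone who obtains
# the artifact obtains the key.
HARDCODED_KEY = b"example-key-shipped-with-every-install"


def sign(user: str, key: bytes) -> str:
    """Return a hex MAC over the user name; stands in for an auth token."""
    return hmac.new(key, user.encode(), hashlib.sha256).hexdigest()


def verify(user: str, signature: str, key: bytes) -> bool:
    """Server-side check: accept the token if the MAC matches."""
    return hmac.compare_digest(sign(user, key), signature)
```

An attacker holding `HARDCODED_KEY` can call `sign("admin", HARDCODED_KEY)` offline and present the result. The same forged value fails verification when the server uses a key that was provisioned per deployment and never shipped.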

Non-Human Identity (NHI) Exploitation: A significant percentage of organizations have privileged, internet-exposed machine identities with vulnerabilities [S-summary-NHI-vuln]. Compromised NHIs, such as service accounts or managed identities, can grant attackers extensive access to cloud resources and data. Attackers leverage compromised GitHub Personal Access Tokens (PATs) to discover and execute malicious code within GitHub Actions, leading to cloud credential theft [S-summary-GitHub-PAT].

Detection & Prevention

The primary goal of detection and prevention strategies for secrets is to stop them from entering or residing in insecure locations and to quickly identify and remediate those that are exposed. A multi-layered approach is essential, encompassing developer practices, automated tooling, and robust secrets management infrastructure.

Developer Hygiene and Education: Fostering a security-aware culture among developers is foundational. This includes training on secure coding practices, the risks associated with secrets, and the proper use of secrets management tools. Educating developers on what constitutes a secret and where they commonly appear is crucial. The use of pre-commit hooks is a powerful shift-left technique, allowing developers to scan their staged changes for secrets before they are committed to the version control system. Tools like ggshield, gitleaks, and detect-secrets can be integrated into these hooks ^[16].
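
The pre-commit idea can be sketched as a small script that inspects only the lines being added in the staged diff. This is a simplified illustration, not a replacement for the tools named above: the patterns are illustrative assumptions, the function names are hypothetical, and `staged_diff` assumes `git` is on the PATH and the working directory is a repository.

```python
import re
import subprocess

# Illustrative patterns only; dedicated scanners cover far more secret types.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_added_lines(diff_text: str):
    """Return (path, pattern_name) for each added line that looks like a secret."""
    findings = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
            continue
        if not line.startswith("+"):
            continue  # context lines, removals, and headers are ignored
        for name, pattern in PATTERNS.items():
            if pattern.search(line[1:]):
                findings.append((current_file, name))
    return findings


def staged_diff() -> str:
    """Return the staged changes as unified diff text."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main() -> int:
    findings = scan_added_lines(staged_diff())
    for path, name in findings:
        # Report the location and type only; never echo the matched value.
        print(f"possible {name} in {path}")
    return 1 if findings else 0
```

A hook script would finish with `sys.exit(main())`, so that a non-zero return value blocks the commit.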

Automated Scanning:

Static Application Security Testing (SAST): Integrating secret scanning tools into SAST pipelines can identify secrets embedded within source code, configuration files, and infrastructure-as-code (IaC) templates. Tools such as ggshield, truffleHog, and gitleaks are designed for this purpose, capable of scanning repositories, commit history, and even live environments [S-summary-scanning-tools].
CI/CD Pipeline Scanning: Continuous integration and continuous delivery pipelines are prime targets for attackers. Implementing secret scanning as a mandatory step in CI/CD workflows ensures that secrets are not accidentally introduced into build artifacts or deployment configurations. This includes scanning code repositories, container images, and pipeline artifacts. GitHub Secret Scanning and GitLab Secret Detection are built-in features that can help ^[17][S-summary-CI-CD].
Git History Scanning: Secrets committed in the past can remain accessible. Tools that scan the entire Git history are vital for identifying and cleaning up historical exposures. This is particularly important given that many leaked secrets from past years remain active [S-summary-persistence].
Container Image Scanning: Secrets can be baked into Docker images during the build process. Scanning container images for secrets before deployment is a critical step in preventing their exposure in production environments [S-summary-container-images].
Cloud Configuration Scanning: Cloud Security Posture Management (CSPM) tools can identify misconfigurations in cloud services that might lead to secret exposure, such as publicly accessible databases or overly permissive IAM policies.
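
Alongside pattern matching, scanners commonly use an entropy heuristic to catch secrets that have no recognizable prefix. The sketch below computes Shannon entropy per character and flags long, random-looking strings. The candidate character class, the 20-character minimum, and the 4.0 threshold are illustrative assumptions, not the settings of any tool listed above.

```python
import math
import re
from collections import Counter

# Runs of at least 20 base64-like characters are treated as candidates.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_strings(line: str, threshold: float = 4.0):
    """Return candidate substrings whose entropy meets the threshold."""
    return [s for s in CANDIDATE.findall(line) if shannon_entropy(s) >= threshold]
```

The threshold needs tuning per character set: a hex-only string can never exceed 4 bits per character, so it would need a lower cutoff than base64-style values. Entropy checks also produce false positives on hashes and identifiers, which is one reason they are usually combined with pattern detectors and validity checks.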

Secrets Management Systems: Moving away from hardcoded secrets towards centralized, dynamic secrets management is a key preventative measure.

Centralized Vaults: Solutions like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, and CyberArk Conjur provide a secure, centralized location for storing, managing, and distributing secrets. These systems often support features like dynamic secrets generation, automated rotation, and fine-grained access control ^[18].
Dynamic Secrets: Instead of long-lived static credentials, dynamic secrets are generated on demand with a short time-to-live (TTL) and are automatically revoked on expiry, or earlier through explicit revocation. This significantly shortens the window in which a compromised secret is useful to an attacker ^[18].
Just-In-Time (JIT) Access: JIT access grants temporary permissions to resources only when needed, minimizing the window of opportunity for attackers.
Identity-Based Access Control: Leveraging managed identities (e.g., AWS IAM Roles, Azure Managed Identities, GCP Service Accounts) or OpenID Connect (OIDC) allows services to authenticate and obtain credentials without needing to store static secrets ^[19]^[20]. This is a crucial step towards achieving "secretless" authentication in many scenarios.
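
The dynamic-secrets idea can be sketched as a toy issuer that mints a unique credential per request and treats it as invalid once its lease expires or is revoked. The class and method names are hypothetical, state lives only in memory, and the injectable `clock` parameter exists purely so the behavior can be exercised without waiting. Real platforms additionally create and delete the backing account in the target system.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    lease_id: str
    credential: str
    expires_at: float


class DynamicSecretIssuer:
    """Toy issuer: one fresh credential per request, dropped at lease expiry."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._leases = {}

    def issue(self) -> Lease:
        lease = Lease(
            lease_id=secrets.token_hex(8),
            credential=secrets.token_urlsafe(32),
            expires_at=self._clock() + self._ttl,
        )
        self._leases[lease.lease_id] = lease
        return lease

    def revoke(self, lease_id: str) -> None:
        """Explicit revocation, e.g. when a leak is suspected."""
        self._leases.pop(lease_id, None)

    def is_valid(self, lease_id: str, credential: str) -> bool:
        lease = self._leases.get(lease_id)
        if lease is None:
            return False
        if self._clock() >= lease.expires_at:
            self.revoke(lease_id)  # expired leases are removed on first check
            return False
        return secrets.compare_digest(lease.credential.encode(), credential.encode())
```

Because every consumer receives its own credential, a leak can be traced to one lease and revoked without disrupting other workloads.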

Remediation Strategies:

Revocation and Rotation: When a secret is detected as leaked, the immediate priority is to revoke it and rotate any affected credentials. Automated rotation mechanisms in secrets management systems are invaluable here ^[21]. It's critical to understand that a large percentage of leaked secrets remain active for extended periods, underscoring the urgency of timely revocation [S-summary-persistence].
Git History Rewriting: For secrets committed to Git history, tools like git-filter-repo can be used to rewrite the repository's history, removing the secret from all past commits. This is a complex operation and should be performed with caution, especially in collaborative environments ^[12]. Rewriting history does not invalidate the secret itself, and existing clones or forks may still contain it, so the credential should be revoked regardless.
Threat Intelligence and Monitoring: Continuously monitoring for leaked secrets using services like GitHub Secret Scanning or commercial solutions helps in early detection. Understanding trends and common leak patterns (e.g., AI service credentials) can inform preventative strategies ^[17]^[1].

Non-Human Identity (NHI) Governance: Explicitly managing and monitoring NHIs is as critical as managing human credentials. This involves discovering all NHIs, understanding their permissions, and applying the principle of least privilege. Tools for NHI governance help in identifying and mitigating risks associated with these identities ^[3]^[4].

Tooling

A robust ecosystem of tools exists to address the multifaceted challenges of secrets management, detection, and prevention. These tools span various stages of the SDLC and cater to different operational needs.

Secrets Scanning and Detection:

ggshield CLI: Developed by GitGuardian, this tool scans Git repositories, staged changes, and commit history locally. It provides a command-line interface for integrating secret detection into developer workflows and CI/CD pipelines ^[12].
gitleaks: A fast, lightweight open-source secrets scanner often used as a pre-commit hook or a CI pipeline component. It's effective for identifying hardcoded credentials in code and configuration files ^[16].
truffleHog: An open-source tool that scans Git repositories for secrets. It offers credential verification capabilities, allowing it to check if discovered secrets are active, and can scan multiple sources beyond Git, including S3 buckets and Docker images.
detect-secrets (Yelp): A framework for finding secrets in codebases. It supports a baseline workflow for handling existing code, making it suitable for integrating into established projects.
GitHub Secret Scanning: A built-in GitHub feature that automatically scans public repositories for leaked secrets and can be enabled for private repositories on eligible plans. It offers push protection and validity checks for many common secret types ^[17].
GitLab Secret Detection: Similar to GitHub's offering, this feature scans commits, pipelines, and merge requests within GitLab for exposed secrets.
Spectral: An AI-enhanced platform for secret detection, offering broad coverage and integration capabilities.
Wiz Code: Part of the Wiz platform, it surfaces SAST, SCA, IaC, and secrets findings, providing a consolidated view of code-level security risks.
mcp-scan: A Python tool from Snyk specifically for detecting security issues in AI agent skills, including potential secret leaks.

Secrets Management Platforms:

HashiCorp Vault: A feature-rich secrets management solution offering centralized storage, dynamic secrets generation, encryption-as-a-service, and privileged access management. It supports advanced features like PKI secrets engines and database secrets engines ^[18]^[22].
AWS Secrets Manager: A managed AWS service designed for securely storing, managing, and rotating secrets like credentials, API tokens, and passwords. It integrates tightly with AWS IAM and other AWS services ^[19].
Doppler: A developer experience-focused secrets management platform that aims to simplify the process of managing and injecting secrets across different environments and applications ^[18].
Infisical: An open-source secrets management platform that provides a user-friendly UI and automation capabilities, with a focus on self-hosting and developer workflow integration ^[18]^[23].
CyberArk Conjur: An enterprise-grade secrets management solution focused on securing secrets for applications, infrastructure, and privileged access.
1Password Secrets Automation: Offers developer-facing secrets management capabilities, including the ability to reference secrets directly from .env files.

Infrastructure as Code (IaC) and Kubernetes Secrets:

git-filter-repo: A powerful tool for rewriting Git history to remove sensitive data that has been accidentally committed. It's crucial for cleaning up exposed secrets from past commits ^[12].
SOPS (Secrets OPerationS): A tool for encrypting secrets files so that they can be committed to Git, keeping them in version control while maintaining confidentiality. It integrates with KMS, GPG, and other encryption methods.
External Secrets Operator (ESO): A Kubernetes operator that synchronizes secrets from external secrets management systems (like Vault, AWS Secrets Manager) into native Kubernetes Secrets, allowing for unified secret management within Kubernetes clusters ^[18].
Secrets Store CSI Driver: A Kubernetes CSI driver that allows mounting secrets from external stores (e.g., AWS Secrets Manager, HashiCorp Vault) as volumes within pods, enabling secret retrieval without embedding them in container images or environment variables.
Vault Agent Injector: A Kubernetes admission controller that injects a Vault Agent sidecar into pods, facilitating dynamic secret retrieval and authentication for applications running within Kubernetes.
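Terraform: While not a secrets management tool itself, Terraform lets variables and outputs be marked sensitive and supports ephemeral resources, both of which are important practices for managing secrets within Infrastructure as Code ^[24]. Marking a value sensitive only redacts it from CLI output; the value is still written to the state file, so the state itself must be protected.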
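
When secrets are delivered as mounted files, as with the Secrets Store CSI Driver described above, the application side can be as small as the sketch below. The mount path `/mnt/secrets-store` and the helper name are assumptions for illustration. The actual path is whatever the pod specification declares.

```python
from pathlib import Path

# Hypothetical mount point; the real path comes from the pod's volume mount.
DEFAULT_MOUNT = Path("/mnt/secrets-store")


def read_mounted_secret(name: str, mount_dir: Path = DEFAULT_MOUNT) -> str:
    """Read a secret that an external store has mounted as a file."""
    base = mount_dir.resolve()
    path = (base / name).resolve()
    # Refuse names such as "../x" that would resolve outside the mount directory.
    if base not in path.parents:
        raise ValueError(f"secret name escapes mount directory: {name!r}")
    # Strip the trailing newline that file-based secrets often carry.
    return path.read_text(encoding="utf-8").rstrip("\n")
```

Reading the file at the point of use, rather than caching the value at startup, lets the application pick up a rotated secret when the mounted file is refreshed.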

AI-Specific Tooling:

GitGuardian Agent Skills: Focuses on teaching AI coding assistants to use security tools and workflows, potentially improving their handling of secrets.
Kirin (Knostic): A security tool designed to scan and identify security issues within AI coding assistants, including preventing secret leakage.
Repello's AI Asset Inventory: Automates the discovery of AI coding assistants being used within an organization.

Credential Recovery and Analysis:

DPAPISnoop: A C# tool for extracting Windows DPAPI and CREDHIST hashes, which can then be used with cracking tools like Hashcat.
LaZagne, SharpChrome, DonPAPI, dploot: Tools for harvesting credentials from web browsers and operating system credential stores, often used in forensic or offensive security contexts.

Recent Developments

The realm of secrets management is in constant flux, with recent developments heavily influenced by the rapid integration of AI into the software development lifecycle and the evolving sophistication of supply chain attacks.

AI's Double-Edged Sword: AI coding assistants are now a significant factor in secrets sprawl. Reports indicate that AI-assisted commits are twice as likely to leak secrets as the overall baseline ^[25]^[26]. Leaks of AI service credentials have also surged, increasing by 81% year-over-year ^[1]^[25]. Specific findings include exposed AI service credentials in .ipynb files and AI agent configuration files [S-summary-AI-config]. The risk is amplified by the fact that AI-generated code may contain vulnerabilities, including hardcoded secrets, that developers might overlook ^[2]. Tools are emerging to address AI-related security risks specifically, such as Snyk's AI-BOM for component inventory, mcp-scan for AI agent skills, and Kirin (Knostic) for scanning AI coding assistants for security issues, including secret leakage [S-summary-AI-tooling].

Sophistication in Supply Chain Attacks: Attackers are increasingly targeting the software supply chain to compromise secrets. The Shai-Hulud campaign exemplified this, with worm-like malware spreading through hundreds of npm packages, stealing cloud credentials and API keys ^[27]. These attacks exploit build processes, postinstall scripts in npm packages, and build.rs files in Rust crates to exfiltrate data, including source code diffs ^[28]. Compromised CI/CD tools and open-source package repositories (like PyPI and npm) are frequent vectors. For instance, the TanStack npm packages were compromised and used to distribute malicious packages with valid SLSA provenance, highlighting advanced evasion techniques ^[29]. The LiteLLM PyPI package leveraged .pth files for stealthy persistence, exfiltrating extensive data ^[30]. Even infrastructure-as-code repositories and tools like Trivy have been targeted, leading to significant credential exposure and data exfiltration ^[31].

Persistence of Leaked Secrets: A persistent and worrying trend is the longevity of leaked secrets. A significant percentage of secrets leaked in previous years continue to remain active and unrevoked, creating a long-tail risk for organizations [S-summary-persistence-active]. For instance, 64% of valid secrets from 2022 were still active in 2025 [S-summary-persistence-active]. This underscores the importance of not just detection, but also rapid and automated revocation and rotation mechanisms.

Expanding Attack Surfaces: Beyond traditional code repositories, secrets are being discovered in a wider array of locations. Collaboration and productivity tools (Slack, Jira, Confluence) account for a growing percentage of incidents [S-summary-collaboration-tools]. Container image registries like Docker Hub are also significant sources of exposed cloud credentials [S-summary-DockerHub]. Furthermore, internal repositories are significantly more likely to contain secrets than public ones, often due to less stringent controls [S-summary-internal-repos].

Focus on Non-Human Identities (NHIs): The security of non-human identities is receiving increased attention. A substantial portion of organizations have internet-exposed, privileged machine identities with vulnerabilities [S-summary-NHI-vuln]. The compromise of these identities, such as service accounts or managed identities, can grant attackers broad access to cloud resources. Tools and platforms are emerging to provide better visibility and governance for these entities ^[3]^[4].

Advancements in Secrets Management: The industry is seeing a continued push towards more dynamic and identity-based secrets management. OpenID Connect (OIDC) is gaining traction for enabling secretless authentication between services, especially in CI/CD to cloud provider integrations ^[20]. Kubernetes operators like the External Secrets Operator (ESO) and Secrets Store CSI Driver are crucial for integrating external secrets management systems seamlessly into Kubernetes environments ^[18].
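
In the OIDC pattern, the CI job presents a short-lived signed token and the cloud provider decides whether to trust it based on its claims. The sketch below decodes a token's payload for inspection and checks the claims a trust policy typically pins. It does NOT verify the signature or expiry, which any real relying party must do against the issuer's published keys. The function names are hypothetical.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_match(claims: dict, issuer: str, audience: str, subject: str) -> bool:
    """Check the issuer, audience, and subject a trust policy would pin."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == issuer
        and audience in audiences
        and claims.get("sub") == subject
    )
```

Pinning the subject claim to a specific repository and branch is what keeps one pipeline's token from being accepted in place of another's. No static cloud credential is stored in the CI system at any point.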

Where to Go Deeper

For practitioners seeking to deepen their understanding and practical application of secrets management best practices, several resources offer extensive insights:

Reputable Security Blogs and Research Firms:

GitGuardian Blog: Provides in-depth research and analysis on secrets sprawl, AI-related risks, supply chain attacks, and non-human identity security. Many of the citations in this guide originate from their research ^[12]^[32]^[3]^[1]^[33].
Snyk Blog: Features detailed reports on supply chain attacks, vulnerability research, and secure coding practices, including specific incidents and their impact ^[34]^[27]^[10]^[31].
Wiz Blog: Offers insights into cloud security posture management, code security, and the discovery of exposed cloud resources and secrets ^[11]^[13]^[35].
Aikido.dev: Publishes analyses of compromised software components and supply chain attacks, often with technical breakdowns of attacker techniques ^[8]^[28]^[9].

Official Documentation and Best Practices:

AWS Well-Architected Framework (SEC02-BP03): Provides foundational guidance on securely storing and using secrets within the AWS ecosystem ^[19].
HashiCorp Vault Documentation: Comprehensive resources on deploying, configuring, and hardening HashiCorp Vault for enterprise-grade secrets management, including best practices for production environments ^[22].
GitHub Documentation: Information on GitHub Secret Scanning features, push protection, and best practices for repository security ^[17]^[36].

Hands-on Learning and Tools:

ggshield, gitleaks, truffleHog: Experimenting with these open-source scanning tools locally and integrating them into pre-commit hooks or test CI pipelines provides practical experience in secret detection ^[16].
OWASP WrongSecrets: This game offers interactive scenarios with real-life examples of secrets management mistakes, allowing for learning through practice and gamification.
SOPS (Secrets OPerationS): Setting up SOPS to encrypt secrets files and integrating it into a Git workflow demonstrates a practical approach to managing secrets in IaC.

Frameworks and Concepts:

Shift-Left Security: Understanding how to integrate security practices, including secret scanning, earlier in the SDLC is crucial for preventing issues downstream ^[37].
Non-Human Identity (NHI) Security: Deep diving into the security considerations and tooling for managing service accounts, managed identities, and other machine identities is increasingly important ^[3]^[4].
Dynamic Secrets and JIT Access: Researching and understanding the architectural shifts towards dynamic secrets and Just-In-Time access will prepare practitioners for modern secrets management paradigms ^[18]^[21].

Deep Dives into Specific Incidents: Studying detailed post-mortems and analyses of major supply chain attacks (e.g., Shai-Hulud, Trivy attack) and vendor breaches (e.g., Klue) offers invaluable lessons on attack vectors and defense strategies ^[34]^[27]^[38]^[31].
