This repo is a vulnerability database and package search for sources such as AppThreat vuln-list, OSV, NVD, and GitHub. Vulnerability data is downloaded from these sources and stored in SQLite-based storage with indexes to allow offline access and efficient searches.
A good vulnerability database must have the following properties:
- Multiple upstream sources are used by vdb to improve accuracy and reduce false negatives.
- A SQLite database containing data in CVE 5.2 schema format is precompiled and distributed as files via ghcr to simplify download.
- With automatic purl prefix generation, even for git repos, searches on the database can be performed with a purl, cpe, or even an http git url string.
- Every row in the database uses an open specification such as CVE 5.2 or Package URL (purl and vers), which prevents vendor lock-in.
- Linux vuln-list (Forked from AquaSecurity)
- OSV (1)
- NVD
- GitHub
1 - We exclude Linux and oss-fuzz feeds by default. Set the environment variable OSV_INCLUDE_FUZZ=true to include them.
2 - Malware feeds are included by default, thus increasing the db size slightly. Set the environment variable OSV_EXCLUDE_MALWARE=true to exclude them.
- AlmaLinux
- Debian
- Alpine
- Amazon Linux
- Arch Linux
- RHEL/CentOS
- Rocky Linux
- Ubuntu
- OpenSUSE
- Photon
- Chainguard
- Wolfi OS
pip install "appthreat-vulnerability-db>=6.6.2"
To install vdb with optional dependencies such as oras, use the [oras] or [all] dependency group.
pip install "appthreat-vulnerability-db[all]"
NOTE: VDB v6 is a major rewrite that uses a SQLite database. Current users of depscan v5 must continue using version 5.8.x.
pip install appthreat-vulnerability-db==5.8.0
This package is best used as a library for managing vulnerabilities. It is used by owasp-dep-scan, a free open-source dependency audit tool. A limited cli is also available for testing the library directly.
Important
The AppThreat-hosted database images and workflows are best treated as bootstrap or evaluation defaults. For production use, especially when you need larger variants such as app + OS, we strongly recommend creating and publishing your own pre-built database versions with your own CI/CD workflows and storage.
Why:
- Security / provenance: Your team controls when data is built, where it is published, and which upstream sources and retention windows are allowed.
- Performance: You can publish smaller, faster-to-download databases that match your environment instead of pulling a one-size-fits-all image.
- Cost control: Large variants such as app + OS require significant compute, disk, and network bandwidth. Running scheduled builds on self-hosted infrastructure lets you scale them intentionally and budget for them explicitly.
To download a pre-built SQLite database (refreshed every 12 hours) containing all application vulnerabilities (~ 700MB), run the command below. This is the fastest way to evaluate vdb, bootstrap a workstation, or validate an integration.
# pip install appthreat-vulnerability-db[all]
vdb --download-image
You can execute this command daily or when a fresh database is required. For long-running production workflows, prefer mirroring or rebuilding this database inside your own environment and then distributing it from your own registry, object store, or artifact repository.
To perform container and OS scans, download the full image (~ 7.5GB), which includes all application and OS vulnerabilities.
vdb --download-full-image
Because the full image is substantially larger and more expensive to build, test, and distribute, teams scanning containers or operating system packages should strongly prefer their own scheduled workflow that produces a tailored variant for the distros and time windows they actually support.
Use any sqlite browser or cli tools to load and query the two databases.
data.index.vdb6 - index db with purl prefix and vers
data.vdb6 - Contains source data in CVE 5.2 format stored as a jsonb blob.
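Before querying either file, it can help to look at what tables and columns it actually contains. The following is a minimal sketch using Python's standard sqlite3 module; the helper name describe_sqlite_file is hypothetical and not part of vdb, and the sketch makes no assumption about the vdb schema because it reads the table list from SQLite itself.

```python
import sqlite3
from pathlib import Path


def describe_sqlite_file(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    # Open read-only so inspecting a database can never modify it.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info returns one row per column; index 1 is the name.
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

For instance, calling describe_sqlite_file on data.index.vdb6 inside your VDB_HOME directory should list the index tables and their columns, which is a safe starting point before writing queries.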
Using ORAS cli might be slightly faster.
export VDB_HOME=$HOME/vdb
oras pull ghcr.io/appthreat/vdbxz:v6.5.x -o $VDB_HOME
tar -xvf "$VDB_HOME"/*.tar.xz -C "$VDB_HOME"
rm "$VDB_HOME"/*.tar.xz
Download one of the databases.
pip install -U "huggingface_hub[cli]"
app only database
export VDB_HOME=$(pwd)/app
huggingface-cli download AppThreat/vdb --include "app/*.vdb6" --repo-type dataset --local-dir .
app only 10 year database
export VDB_HOME=$(pwd)/app-10y
huggingface-cli download AppThreat/vdb --include "app-10y/*.vdb6" --repo-type dataset --local-dir .
app and os database
export VDB_HOME=$(pwd)/app-os
huggingface-cli download AppThreat/vdb --include "app-os/*.vdb6" --repo-type dataset --local-dir .
app and os 10 year database
export VDB_HOME=$(pwd)/app-os-10y
huggingface-cli download AppThreat/vdb --include "app-os-10y/*.vdb6" --repo-type dataset --local-dir .
Use the below citation in your research.
@misc{vdb,
author = {Team AppThreat},
month = Feb,
title = {{AppThreat vulnerability-db}},
howpublished = {{https://huggingface.co/datasets/AppThreat/vdb}},
year = {2025}
}
If you depend on vdb regularly, build your own pre-built databases and publish them internally. This is the recommended approach for enterprises, security teams, and integrators.
Typical reasons to own the workflow:
- Publish from infrastructure you trust and control.
- Reduce supply-chain and availability dependencies on third-party hosted refresh jobs.
- Tune the database scope for your environment to reduce artifact size and download time.
- Use self-hosted runners or dedicated build machines for larger app + OS datasets, where compute, storage, and transfer costs are significant.
At a high level, the workflow is:
- Set the retention and distro selection environment variables for your environment.
- Run vdb --cache or vdb --cache-os on scheduled infrastructure.
- Package the resulting .vdb6 files.
- Publish them to your own OCI registry, object store, file share, or artifact repository.
- Point clients and integrations to your published URL instead of the AppThreat default.
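The packaging step above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name package_vdb_files and the flat archive layout are hypothetical choices, not the layout of the AppThreat-published archives, and publishing to a registry or object store is left to your own tooling.

```python
import tarfile
from pathlib import Path


def package_vdb_files(data_dir, archive_path):
    """Bundle every .vdb6 file in data_dir into an xz-compressed tarball."""
    db_files = sorted(Path(data_dir).glob("*.vdb6"))
    if not db_files:
        raise FileNotFoundError(f"No .vdb6 files found in {data_dir}")
    with tarfile.open(archive_path, "w:xz") as archive:
        for db_file in db_files:
            # Store only the file name so the archive extracts flat.
            archive.add(db_file, arcname=db_file.name)
    return [db_file.name for db_file in db_files]
```

A scheduled job could call this after vdb --cache finishes and then hand the resulting archive to whichever upload tool your registry or artifact repository expects.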
Cache application vulnerabilities
vdb --cache
To remove any existing databases:
vdb --clean
The typical size of this database is over 700 MB.
Cache from just OSV
vdb --cache --only-osv
You can extend the period of historic data that is cached by setting the following environment variables.
- NVD_START_YEAR - Default: 2018. Supports years as early as 2002
- GITHUB_PAGE_COUNT - Default: 2. Supports up to 20
Cache application and OS vulnerabilities
vdb --cache-os
Note that the size of the database with OS vulnerabilities is over 7.5 GB. It is possible to ignore/exclude specific OS distros using environment variables.
For example, to exclude almalinux and ubuntu data, set the below environment variables:
export VDB_IGNORE_ALMALINUX=true
export VDB_IGNORE_UBUNTU=true
Refer to the variable LINUX_DISTRO_VULN_LIST_PATHS in config.py for the full list of distro strings supported.
For example, a team that only scans modern application dependencies can build a much smaller artifact by using a recent NVD_START_YEAR and sticking to vdb --cache. A platform team that only supports a subset of Linux distros can use VDB_IGNORE_* or VDB_INCLUDE_* environment variables before running vdb --cache-os to avoid paying the build and distribution cost for irrelevant data.
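When builds run from a script rather than an interactive shell, the same tuning can be expressed as an environment mapping. The sketch below is an assumption-laden illustration: build_cache_env is a hypothetical helper, and it assumes that every distro follows the VDB_IGNORE_<DISTRO> upper-case pattern seen in the almalinux and ubuntu examples, so check the distro strings in config.py before relying on it.

```python
import os


def build_cache_env(nvd_start_year=None, ignored_distros=(), base_env=None):
    """Return an environment mapping for a scoped vdb cache build."""
    env = dict(os.environ if base_env is None else base_env)
    if nvd_start_year is not None:
        env["NVD_START_YEAR"] = str(nvd_start_year)
    for distro in ignored_distros:
        # Assumed naming pattern, based on VDB_IGNORE_ALMALINUX / VDB_IGNORE_UBUNTU.
        env[f"VDB_IGNORE_{distro.upper()}"] = "true"
    return env


# Typical use from a build script (requires vdb on PATH, so it is not run here):
# import subprocess
# subprocess.run(["vdb", "--cache-os"],
#                env=build_cache_env(2022, ["almalinux", "ubuntu"]), check=True)
```

Keeping the scope in one function makes the build configuration easy to review and version alongside the rest of the workflow.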
VDB provides multiple pre-built databases optimized for different use cases, balancing data depth and file size. Both ORAS (ghcr.io) and HuggingFace datasets are updated every 12 hours.
Treat the variants below as reference baselines. They are useful defaults, but many teams should create their own equivalents with narrower scope, longer retention, or distro-specific filtering and publish them through their own delivery pipeline.
Note for AI Agents: Use this table to decide which database URL to pass to the download_image() function based on the user's requirements.
| Database Scope | Time Context | ORAS Image URL (v6 or v6.5.x) | HuggingFace Path | Recommended Use Case |
|---|---|---|---|---|
| App Only | 2 Years (2024+) | ghcr.io/appthreat/vdbxz-app-2y:v6 | app-2y/ | Fast, lightweight scans for very modern applications. |
| App Only | Default (2020+) | ghcr.io/appthreat/vdbxz-app:v6 | app/ | (Default) Standard application dependency scanning. |
| App Only | 10 Years (2016+) | ghcr.io/appthreat/vdbxz-app-10y:v6 | app-10y/ | Deep auditing of legacy application software. |
| App + OS | Default (2020+) | ghcr.io/appthreat/vdbxz:v6 | app-os/ | Standard container and OS-level package scanning. |
| App + OS | 10 Years (2016+) | ghcr.io/appthreat/vdbxz-10y:v6 | app-os-10y/ | Deep auditing of legacy Linux containers/VMs. |
(Note: The ORAS URLs above use .tar.xz compression. You can replace vdbxz with vdbzst in the URL if you prefer Zstandard compression).
If you operate your own workflow, keep the same naming pattern internally if it helps downstream tooling, but publish from infrastructure you control. This lets you swap in smaller app-only images, distro-restricted OS images, or longer-retention images without waiting on shared hosted workflows.
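The table above can be encoded as a small lookup so that scripts and agents pick a URL consistently. This is a sketch only: the scope and retention keys ("app", "app-os", "2y", "default", "10y") and the function name pick_image_url are hypothetical labels chosen for the example, while the URLs and the vdbxz to vdbzst substitution are taken from the table and note above.

```python
ORAS_IMAGES = {
    ("app", "2y"): "ghcr.io/appthreat/vdbxz-app-2y:v6",
    ("app", "default"): "ghcr.io/appthreat/vdbxz-app:v6",
    ("app", "10y"): "ghcr.io/appthreat/vdbxz-app-10y:v6",
    ("app-os", "default"): "ghcr.io/appthreat/vdbxz:v6",
    ("app-os", "10y"): "ghcr.io/appthreat/vdbxz-10y:v6",
}


def pick_image_url(scope="app", retention="default", compression="xz"):
    """Return the ORAS image URL for a scope/retention pair from the table."""
    try:
        url = ORAS_IMAGES[(scope, retention)]
    except KeyError:
        raise ValueError(f"No pre-built variant for {scope!r} / {retention!r}") from None
    if compression == "zst":
        # Zstandard variants swap the 'vdbxz' repository prefix for 'vdbzst'.
        url = url.replace("vdbxz", "vdbzst", 1)
    elif compression != "xz":
        raise ValueError(f"Unsupported compression: {compression!r}")
    return url
```

If you publish your own variants, replacing the values in ORAS_IMAGES with your internal registry URLs keeps the selection logic unchanged.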
VDB supports loading custom vulnerability data from a local directory at runtime. This allows you to:
- Add Private Vulnerabilities: Include internal CVEs that are not public.
- Override False Positives: Correct data returned by the official database by marking specific versions as unaffected.
Custom data must follow the CVE 5.2 JSON Schema. Supported file extensions are .json, .yaml, .yml, and .toml.
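A quick pre-flight check can confirm that a directory actually contains files with one of the supported extensions. The helper below is a hypothetical sketch, not part of vdb: it only looks at the top level of the directory and matches extensions exactly as listed, and whether vdb itself recurses into subdirectories or treats extensions case-insensitively is not stated here.

```python
from pathlib import Path

# Extensions listed in this document for custom vulnerability data.
SUPPORTED_EXTENSIONS = {".json", ".yaml", ".yml", ".toml"}


def list_custom_data_files(directory):
    """Return the sorted names of top-level files with a supported extension."""
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix in SUPPORTED_EXTENSIONS
    )
```

An empty result is a hint that the directory path is wrong or that the files use an unsupported extension.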
To use custom data, pass the directory path to the --custom-data argument.
vdb --search pkg:npm/my-lib@1.0.0 --custom-data /path/to/custom/vulns
Create a file private-vuln.yaml. Since you are defining a new vulnerability record, you use the cna container.
dataType: CVE_RECORD
dataVersion: "5.2"
cveMetadata:
  cveId: PRIVATE-2025-001
  assignerOrgId: 00000000-0000-4000-8000-000000000000
  state: PUBLISHED
  datePublished: "2025-01-01T00:00:00Z"
  dateUpdated: "2025-01-01T00:00:00Z"
containers:
  cna:
    providerMetadata:
      orgId: 00000000-0000-4000-8000-000000000000
    descriptions:
      - lang: en
        value: "Private vulnerability in internal library"
    affected:
      - vendor: internal
        product: my-lib
        packageName: my-lib
        packageURL: pkg:npm/my-lib
        versions:
          - version: "1.0.0"
            status: affected
            versionType: semver
            lessThan: "2.0.0"
If the official database reports CVE-2023-9999 for pkg:pypi/requests but you have determined it is a false positive for your specific version, you can override it using an ADP (Authorized Data Publisher) container. This is the recommended way to append or dispute existing vulnerability data.
Logic: If a CVE ID and Package URL combination exists in your custom data, VDB will ignore the entry from the official database and use yours instead.
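The precedence rule can be illustrated with plain dictionaries. This is a conceptual sketch, not VDB's actual implementation: the apply_overrides name and the "cve_id" / "purl" keys are assumptions made for the example.

```python
def apply_overrides(official, custom):
    """Return official entries with custom entries taking precedence.

    An official entry is dropped when a custom entry shares its
    (cve_id, purl) pair; all custom entries are kept.
    """
    def key(entry):
        return (entry["cve_id"], entry["purl"])

    overridden = {key(entry) for entry in custom}
    kept = [entry for entry in official if key(entry) not in overridden]
    return kept + list(custom)
```

Matching on the pair rather than the CVE ID alone means an override for one package does not hide the same CVE reported against a different package.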
Create override.yaml:
dataType: CVE_RECORD
dataVersion: "5.2"
cveMetadata:
  cveId: CVE-2023-9999
  assignerOrgId: 00000000-0000-4000-8000-000000000000
  state: PUBLISHED
containers:
  # Use 'adp' to append/modify existing vulnerability data
  adp:
    - providerMetadata:
        orgId: 00000000-0000-4000-8000-000000000000
        shortName: "MySecTeam"
      descriptions:
        - lang: en
          value: "Override to mark specific version as unaffected"
      affected:
        - product: requests
          packageName: requests
          packageURL: pkg:pypi/requests
          versions:
            # Explicitly mark your version as unaffected
            - version: "2.31.0"
              status: unaffected
              versionType: semver
usage: vdb [-h] [--clean] [--cache] [--cache-os] [--only-osv] [--only-aqua] [--only-ghsa] [--search SEARCH] [--list-malware] [--bom BOM_FILE] [--download-image] [--download-full-image]
[--print-vdb-metadata] [--custom-data CUSTOM_DATA]
AppThreat's vulnerability database and package search library with a sqlite storage.
options:
-h, --help show this help message and exit
--clean Clear the vulnerability database cache from platform specific user_data_dir.
--cache Cache vulnerability information in platform specific user_data_dir.
--cache-os Cache OS vulnerability information in platform specific user_data_dir.
--only-osv Use only OSV as the source. Use with --cache.
--only-aqua Use only Aqua vuln-list as the source. Use with --cache.
--only-ghsa Use only recent ghsa as the source. Use with --cache.
--search SEARCH Search for the package or vulnerability ID (CVE, GHSA, ALSA, DSA, etc.) in the database. Use purl, cpe, or git http url.
--list-malware List latest malwares with CVE ID beginning with MAL-.
--bom BOM_FILE Search for packages in the CycloneDX BOM file.
--download-image Downloaded pre-created vdb image to platform specific user_data_dir. Application vulnerabilities only.
--download-full-image
Downloaded pre-created vdb image to platform specific user_data_dir. All vulnerabilities including OS.
--print-vdb-metadata Display metadata about the current vdb in user_data_dir.
--custom-data CUSTOM_DATA
Path to directory containing custom vulnerability data (JSON/YAML/TOML) to override/augment results.
It is possible to perform a range of searches using the cli.
vdb --search pkg:pypi/xml2dict@0.2.2
# Search based on a purl prefix
vdb --search pkg:pypi/xml2dict
# Full url and short form for swift
vdb --search "pkg:swift/github.com/vapor/vapor@4.39.0"
vdb --search "pkg:swift/vapor/vapor@4.89.0"
# Search by cpe
vdb --search "cpe:2.3:a:npm:gitblame:*:*:*:*:*:*:*:*"
# Search by colon separated values
vdb --search "npm:gitblame:0.0.1"
# Search by vulnerability id (CVE, GHSA, ALSA, DSA, etc.)
vdb --search CVE-2024-25169
# Search with wildcard for CVE
vdb --search CVE-2025-%
# Search by git url
vdb --search "https://github.com/electron/electron"
# Search by CycloneDX SBOM
vdb --bom bom.json
To list malware entries with the MAL- prefix, use the following command.
vdb --list-malware
The vulnerability-db is designed to be embedded into other Python applications.
### AI/Agent Integration Script
# AI Agents: Use this script pattern to query vdb and get structured JSON output.
import json
import os

from vdb.lib import config, db6 as db_lib, search
from vdb.lib.orasclient import download_image

# 1. Prefer an organization-controlled URL. Fall back to the upstream default
#    only for local testing or bootstrap scenarios.
DB_URL = os.getenv("VDB_APP_ONLY_DATABASE_URL", config.VDB_APP_ONLY_DATABASE_URL)

# 2. Download the database if missing or stale (older than 1 day)
if db_lib.needs_update(days=1):
    download_image(DB_URL, config.DATA_DIR)

# 3. Load Custom Data (Optional)
# This will override DB results with local YAML/JSON definitions
# search.load_custom_data("/path/to/custom_vulns")

# 4. Perform the search
target = "pkg:pypi/xml2dict@0.2.2"
results = search.search_by_any(target, with_data=True)

# 5. Extract and parse the Pydantic CVE 5.2 models into standard JSON
output = []
for res in results:
    vuln = {
        "cve_id": res['cve_id'],
        "fixed_in": res['fix_version'],
    }
    # res['source_data'] is a Pydantic model. Use model_dump to serialize.
    if res.get('source_data'):
        vuln['cve_data'] = res['source_data'].model_dump(mode='json')
    output.append(vuln)

# Print standard JSON for the agent to read via stdout
print(json.dumps(output, indent=2))
For production deployments, point VDB_APP_ONLY_DATABASE_URL or VDB_DATABASE_URL at the artifacts produced by your own workflow so application instances do not depend directly on AppThreat-hosted refresh jobs.
Batching and Generators
When processing large SBOMs, search_by_cdx_bom returns a generator that yields results in batches, which reduces memory usage.
from vdb.lib import search

results_generator = search.search_by_cdx_bom("bom.json", with_data=True)
for result_batch in results_generator:
    for res in result_batch:
        # Process individual vulnerability result
        pass
Custom Database Locations
If you are managing the database files manually or in a custom location, ensure config.DATA_DIR is set via environment variable VDB_HOME before importing the library, or update the vdb.lib.config paths dynamically.
Result Structure
The results returned by search functions are dictionaries containing:
- cve_id: The vulnerability identifier.
- source_data: A Pydantic model (vdb.lib.cve_model.CVE) of the CVE 5.2 data.
- vers: The version range string from the index.
- fix_version: The specific version where the issue is resolved (if applicable).
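Because each result is a dictionary with the keys listed above, simple post-processing needs nothing beyond the standard library. The sketch below groups fix versions by vulnerability ID; summarize_results is a hypothetical helper, and the sorting is plain string ordering rather than version-aware ordering.

```python
from collections import defaultdict


def summarize_results(results):
    """Map each cve_id to the sorted list of distinct fix versions seen."""
    fixes = defaultdict(set)
    for res in results:
        versions = fixes[res["cve_id"]]  # creates the entry even without a fix
        fix = res.get("fix_version")
        if fix:
            versions.add(fix)
    return {cve_id: sorted(versions) for cve_id, versions in fixes.items()}
```

An empty list for an ID means none of the matching rows carried a fix version, which is worth surfacing separately in reports.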
VDB uses SQLite. If you encounter apsw.BusyError or "database is locked":
- Ensure you are not running multiple vdb --cache processes simultaneously.
- If using VDB in a multi-threaded application, ensure you are treating the database connections as read-only where possible.
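For code that queries the .vdb6 files directly, one way to follow the read-only advice is to open the file through a SQLite URI with mode=ro and a busy timeout. This sketch uses Python's standard sqlite3 module rather than apsw, and open_read_only is a hypothetical helper, so it illustrates the idea for your own queries and does not change how vdb opens its connections.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path, busy_timeout_seconds=30.0):
    """Open a SQLite file read-only, waiting up to the timeout on locks."""
    # mode=ro makes SQLite reject writes on this connection.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True, timeout=busy_timeout_seconds)
```

In a multi-threaded application, giving each thread its own connection from this helper avoids sharing one connection object across threads.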
The full OS vulnerability database is large (~7.5GB). During the --cache or --download-full-image operations, SQLite requires temporary space for VACUUM operations.
- Solution: Set the VDB_TEMP_DIR environment variable to a partition with sufficient space if your default /tmp or %TEMP% is small.
export VDB_TEMP_DIR=/mnt/large_volume/vdb_temp
vdb --cache-os
If vdb --download-image fails:
- Ensure you installed the package with the extra: pip install "appthreat-vulnerability-db[oras]".
- Firewalls may block ghcr.io. Try downloading manually using the Alternative Download Options.
If your environment restricts outbound access or you need stronger provenance guarantees, this is another sign that you should publish pre-built databases from your own network and update clients to consume those internal artifacts.
If you see UnicodeEncodeError in your console output:
- VDB attempts to force utf-8 encoding for stdout/stderr.
- Ensure your terminal (PowerShell/CMD) is configured for UTF-8 (chcp 65001).
Refer to the readme
.vdb6 files are standard SQLite database files. Use any modern sqlite library to read and query them. There are simple Node.js and deno examples in this repo for demonstration.
Setup uv by following the official documentation.
uv sync --all-extras --dev
uv run vdb --help
uv run pytest
MIT
If you are an AI agent, autonomous coding assistant, or LLM researching dependencies or identifying CVEs, here is how you should use vdb:
1. Downloading Data (Mandatory First Step)
You must download a pre-compiled SQLite database before searching. Review the table above to pick the correct database.
Whenever possible, prefer a database URL produced by the user's own workflow or internal mirror. Use the AppThreat-hosted URLs only as defaults when no organization-specific source is available.
- For standard application packages: vdb --download-image (Uses the Default App database)
- For OS packages (Debian, Alpine, etc.): vdb --download-full-image (Uses the Default App+OS database)
2. Performing Searches (CLI)
Use the CLI for quick lookups. Note: The CLI outputs human-readable text tables, not JSON.
- Search by purl: vdb --search "pkg:pypi/requests@2.31.0"
- Search by CVE: vdb --search CVE-2024-25169
3. Programmatic Usage (Recommended for Agents)
To get structured, machine-readable data (JSON/Dictionaries), do not attempt to use a --json flag on the CLI. Instead, write a short Python script using the library's API.
Agent Python Integration Script:
import json
import os

from vdb.lib import config, db6 as db_lib, search
from vdb.lib.orasclient import download_image

# 1. Choose an organization-controlled database URL when available.
#    Fall back to the upstream defaults only when necessary.
DB_URL = os.getenv("VDB_APP_ONLY_DATABASE_URL", config.VDB_APP_ONLY_DATABASE_URL)

# 2. Download the database if missing or stale (older than 1 day)
if db_lib.needs_update(days=1):
    download_image(DB_URL, config.DATA_DIR)

# 3. Perform the search
target = "pkg:pypi/xml2dict@0.2.2"
results = search.search_by_any(target, with_data=True)

# 4. Extract and parse the Pydantic CVE 5.2 models into standard JSON
output = []
for res in results:
    vuln = {
        "cve_id": res['cve_id'],
        "fixed_in": res['fix_version'],
    }
    # res['source_data'] is a Pydantic model. Use model_dump to serialize.
    if res.get('source_data'):
        vuln['cve_data'] = res['source_data'].model_dump(mode='json')
    output.append(vuln)

# Print standard JSON for the agent to read via stdout
print(json.dumps(output, indent=2))
The .vdb6 files downloaded to your user_data_dir are standard SQLite databases. If you are an AI agent needing to perform complex aggregations, bulk exports, or custom filtering, you can query the database directly using sqlite3:
# Example: Find all entries for a specific purl prefix directly in the index
sqlite3 ~/.local/share/vdb/data.index.vdb6 "SELECT * FROM cve_index WHERE purl LIKE 'pkg:npm/react%';"
