Production-ready examples for ingesting large numbers of repositories into Moderne using the Moderne CLI.
This repository provides three progressive deployment examples. Each stage is completely independent and self-contained - you can start at any stage based on your needs.
Best for:
- Quick proof of concept
- Small repository counts (< 1,000 repos)
- Development and testing
- Learning how mass-ingest works
What's included:
- Single Docker container
- Manual docker commands
- Basic monitoring via CLI metrics endpoint
Resources needed:
- 2 CPU cores
- 16 GB RAM
- 32+ GB disk
Best for:
- Production use on a single host
- Small repository counts (< 1,000 repos)
- Medium repository counts with manual scaling (< 10,000 repos)
- Need for operational visibility
- Continuous ingestion workflows
What's included:
- Docker Compose orchestration
- Integrated Grafana dashboards
- Prometheus metrics collection
- Automated restarts and scheduling
Resources needed:
- 3 CPU cores (2 for mass-ingest, 1 for monitoring)
- 18 GB RAM (16 for mass-ingest, 2 for monitoring)
- 50+ GB disk
Best for:
- Large repository counts (> 10,000 repos)
- Parallel processing requirements
- Production deployment with automatic scaling
- Enterprise environments
What's included:
- Cloud-native batch services (AWS Batch, GCP Batch)
- Terraform infrastructure as code
- Scheduled automation (daily/weekly)
- Auto-scaling compute — scales to zero when idle
- Production monitoring and cost optimization
Resources needed:
- Cloud account (AWS or GCP)
- Terraform >= 1.0
- VPC with internet access
- Configurable compute (scales from 0 to 256+ vCPUs)
mass-ingest-example/
├── Dockerfile # Container image definition (used by all stages)
├── Dockerfile.fips # FIPS 140-2/140-3 compliant variant (UBI 9)
├── publish.sh # Main ingestion script
├── publish.ps1 # PowerShell version
├── repos.csv # Example repository list
│
├── 1-quickstart/ # Single container deployment
│ └── README.md
│
├── 2-observability/ # Docker Compose with monitoring
│ ├── docker-compose.yml
│ ├── .env.example
│ ├── observability/ # Grafana and Prometheus configs
│ └── README.md
│
├── 3-scalability/ # Cloud-native batch deployment (multi-cloud)
│ ├── README.md # Platform comparison and architecture overview
│ ├── aws-batch/ # AWS Batch + EventBridge + Secrets Manager
│ │ ├── chunk.sh
│ │ ├── terraform/
│ │ └── README.md
│ ├── gcp-batch/ # GCP Batch + Cloud Scheduler + Secret Manager
│ │ ├── task.sh
│ │ ├── terraform/
│ │ └── README.md
│
└── diagnostics/ # Comprehensive diagnostic system
    ├── diagnose.sh # Main orchestration script
    ├── lib/ # Shared libraries
    │ ├── core.sh # Colors, output formatting, utilities
    │ └── latency.sh # Latency and throughput testing
    └── checks/ # Modular check scripts
        ├── system.sh # CPUs, memory, disk space
        ├── tools.sh # git, curl, jq, etc.
        ├── docker.sh # Container detection, CPU arch, emulation
        ├── threads.sh # Cgroup PID limits, ulimit, kernel threads-max
        ├── java.sh # JDKs, JAVA_HOME
        ├── cli.sh # mod CLI version, config
        ├── config.sh # Env vars, credentials
        ├── repos-csv.sh # File validation, columns, origins
        ├── network.sh # Connectivity to all hosts
        ├── ssl.sh # SSL handshakes, cert expiry
        ├── auth-publish.sh # Write/read/delete test
        ├── auth-scm.sh # .git-credentials validation
        ├── publish-latency.sh # Publish URL latency and throttling
        ├── maven-repos.sh # Maven repos from settings.xml
        ├── dependency-repos.sh # User-specified repos (Gradle, etc.)
        └── scm-repos.sh # SCM connectivity per origin
Before starting with any stage, you'll need:
- Repository list: Create repos.csv with repositories to ingest

      cloneUrl,branch,origin,path
      https://github.com/org/repo1,main,github.com,org/repo1
      https://github.com/org/repo2,main,github.com,org/repo2

- Artifact repository: Maven-formatted repository for publishing LSTs
  - Artifactory, Nexus, or similar
  - Dedicated repository recommended (separate from other artifacts)
  - Credentials with publish permissions
- Source control access: If repositories require authentication
  - Service account with read access to all repositories
  - Personal access token or credentials
- Docker: Installed and running (for stages 1 and 2)
- Bash: Required in the container image (Alpine users: apk add bash)
- Cloud account: AWS or GCP account (required only for stage 3)
| Feature | 1-quickstart | 2-observability | 3-scalability |
|---|---|---|---|
| Deployment | Single container | Docker Compose | Cloud-native batch + Terraform |
| Monitoring | CLI metrics endpoint | Grafana + Prometheus | Cloud-native logging + optional Grafana |
| Scaling | Manual | Single host | Auto-scaling parallel workers |
| Scheduling | Manual/cron | Docker restart policy | Cloud-native scheduler |
| Cost | Lowest | Low | Scales with usage |
| Setup time | 15 minutes | 30 minutes | 1-2 hours |
| Ideal repo count | < 100 | 100-1000 | 1000+ |
| Parallel processing | No | No | Yes |
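As a rough illustration of the "Ideal repo count" row, the following sketch counts the data rows in a repos.csv and maps the count to a stage directory. The function names `count_repos` and `recommend_stage` are hypothetical and not part of this repository, and placing exactly 1000 repositories in stage 2 is an assumption, since the table leaves that boundary open.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository).

# Count data rows in a repos.csv: every non-empty line after the header.
count_repos() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Map a repository count to a stage directory using the thresholds in the
# "Ideal repo count" row. Treating exactly 1000 as stage 2 is an assumption.
recommend_stage() {
  count=$1
  if [ "$count" -lt 100 ]; then
    echo "1-quickstart"
  elif [ "$count" -le 1000 ]; then
    echo "2-observability"
  else
    echo "3-scalability"
  fi
}
```

For example, `recommend_stage "$(count_repos repos.csv)"` prints the suggested directory for the list in the current folder. Treat the result as a starting point only; the other rows of the table (monitoring, scheduling, cost) may matter more than the raw count.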
All stages share the same core configuration needs:
- PUBLISH_URL - Artifact repository URL (e.g., https://artifactory.example.com/artifactory/moderne-ingest/)
- PUBLISH_USER - Repository username
- PUBLISH_PASSWORD - Repository password
- PUBLISH_TOKEN - Alternative to user/password for JFrog
- MODERNE_TENANT - Your Moderne tenant URL (optional)
- MODERNE_TOKEN - Moderne API token (optional)
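The publish settings can be sanity-checked before a container is started. The sketch below assumes that PUBLISH_URL is always required and that either PUBLISH_TOKEN or the PUBLISH_USER/PUBLISH_PASSWORD pair is sufficient. The function name `check_publish_config` is hypothetical, and the actual checks in diagnostics/checks/config.sh may differ.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of this repository).
# Returns 0 when PUBLISH_URL is set together with either a token or a
# username/password pair; prints a reason to stderr and returns 1 otherwise.
check_publish_config() {
  if [ -z "${PUBLISH_URL:-}" ]; then
    echo "PUBLISH_URL is not set" >&2
    return 1
  fi
  if [ -n "${PUBLISH_TOKEN:-}" ]; then
    return 0
  fi
  if [ -n "${PUBLISH_USER:-}" ] && [ -n "${PUBLISH_PASSWORD:-}" ]; then
    return 0
  fi
  echo "Set PUBLISH_TOKEN, or both PUBLISH_USER and PUBLISH_PASSWORD" >&2
  return 1
}
```

Calling `check_publish_config || exit 1` at the top of a wrapper script fails fast instead of discovering missing credentials partway through an ingestion run.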
For private repositories, credentials are mounted at runtime (never baked into images):
- .git-credentials file for HTTPS
- .ssh directory for SSH
See each stage's README for specific mounting instructions.
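To illustrate runtime mounting, the sketch below prints read-only `-v` flags for whichever credential sources exist in a given directory. The function name `credential_mount_args` is hypothetical, and the container paths under /root are assumptions that hold only if the image runs as root; the stage READMEs remain the authority on the exact paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Prints one docker "-v" flag per credential source found under the given
# directory. The /root/... container paths are assumptions; check the stage
# README you are following. Paths containing spaces are not handled.
credential_mount_args() {
  home_dir=$1
  if [ -f "$home_dir/.git-credentials" ]; then
    printf '%s\n' "-v $home_dir/.git-credentials:/root/.git-credentials:ro"
  fi
  if [ -d "$home_dir/.ssh" ]; then
    printf '%s\n' "-v $home_dir/.ssh:/root/.ssh:ro"
  fi
}
```

The `:ro` suffix mounts each source read-only, so the container can use the credentials without being able to modify them, and nothing is copied into an image layer.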
The repos.csv file columns:
- cloneUrl (required) - Full git clone URL
- origin (required) - Source identifier (e.g., github.com)
- path (required) - Repository path/identifier
- branch (optional) - Branch to build (uses remote default if not specified)
- gradleVersion (optional) - Selects a specific Gradle version for repos without a wrapper (must match an installation registered via mod config build gradle installation edit)
See repos.csv documentation for advanced options.
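A quick way to catch a malformed repos.csv early is to confirm that the header row names the three required columns. The sketch below is a simplified stand-in for that idea, not the logic in diagnostics/checks/repos-csv.sh. It assumes an unquoted, comma-separated header without surrounding whitespace, and the function name `check_repos_csv_header` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check (not part of this repository).
# Verifies that the first line of the CSV contains the required columns
# cloneUrl, origin, and path. Assumes a plain comma-separated header with no
# quoting or padding; Windows line endings are tolerated.
check_repos_csv_header() {
  file=$1
  header=$(head -n 1 "$file" | tr -d '\r')
  missing=""
  for col in cloneUrl origin path; do
    case ",$header," in
      *",$col,"*) ;;                      # column present
      *) missing="$missing $col" ;;       # remember what is absent
    esac
  done
  if [ -n "$missing" ]; then
    echo "repos.csv is missing required column(s):$missing" >&2
    return 1
  fi
}
```

Column order does not matter to this check, so a header such as cloneUrl,branch,origin,path passes, while one without origin is rejected with a message naming the missing column.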
Create dependency-repos.csv to test connectivity to Maven/Gradle dependency repositories during diagnostics:
url,username,password,token
https://nexus.example.com/releases,${NEXUS_USER},${NEXUS_PASSWORD},
https://artifactory.example.com/libs,,,${ARTIFACTORY_TOKEN}
https://repo.spring.io/release,,,

- Use username + password for basic auth
- Use token for bearer auth (leave username/password empty)
- Leave all auth fields empty for anonymous access
- Use ${ENV_VAR} syntax to reference environment variables
See dependency-repos.csv.example for a template.
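The ${ENV_VAR} references keep secrets out of the CSV file itself. One way such a reference could be resolved is sketched below. It assumes the reference occupies the whole field; whether the diagnostics also expand references embedded in longer strings is not stated here. The function name `expand_env_ref` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# If the field is exactly ${NAME}, print the value of the environment
# variable NAME (empty if unset); otherwise print the field unchanged.
expand_env_ref() {
  field=$1
  # Extract NAME only when the whole field has the form ${NAME}.
  name=$(printf '%s\n' "$field" | sed -n 's/^\${\([A-Za-z_][A-Za-z0-9_]*\)}$/\1/p')
  if [ -z "$name" ]; then
    printf '%s\n' "$field"
    return 0
  fi
  # name is limited to identifier characters by the pattern above,
  # so using it inside eval cannot inject other commands.
  eval "value=\${$name-}"
  printf '%s\n' "$value"
}
```

With NEXUS_USER set in the environment, `expand_env_ref '${NEXUS_USER}'` prints its value, while a literal username passes through untouched.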
All Dockerfiles support:
- MODERNE_CLI_VERSION - Specific CLI version (defaults to latest release)
- MODERNE_CLI_STAGE - release (default) for latest release from Maven Central, snapshot for latest snapshot
- MODERNE_CLI_RELEASES_REPO - Maven repository for release CLI artifacts (defaults to https://repo1.maven.org/maven2)
- MODERNE_CLI_SNAPSHOTS_REPO - Maven repository for snapshot CLI artifacts (defaults to https://central.sonatype.com/repository/maven-snapshots)
A separate Dockerfile.fips is provided for environments that require FIPS 140-2/140-3 compliance. It uses Red Hat UBI 9 with the FIPS crypto policy enabled, which restricts all cryptographic operations to FIPS-approved algorithms.
Build:
docker build -f Dockerfile.fips -t mass-ingest:fips .

Build arguments (in addition to MODERNE_CLI_VERSION):

| Argument | Default | Description |
|---|---|---|
| MAVEN_REPO_URL | https://repo1.maven.org/maven2 | Maven repository for CLI and Maven |
| GRADLE_DIST_URL | https://services.gradle.org/distributions | Gradle distribution download URL |
| GRADLE_VERSION | 8.14 | Primary Gradle version to install |
| GRADLE_EXTRA_VERSIONS | (empty) | Comma-separated additional Gradle versions (e.g., 6.9.4,5.6.4) |
| MAVEN_VERSION | 3.9.11 | Maven version to install |
Using internal mirrors:
Public download servers (Maven Central, Gradle services) may not support FIPS-compliant TLS cipher suites. The Dockerfile uses a separate download stage without FIPS restrictions to handle this. To make the entire build FIPS-compliant end to end, point the download URLs at internal mirrors that support FIPS-compliant TLS:
docker build -f Dockerfile.fips \
--build-arg MAVEN_REPO_URL=https://nexus.internal/repository/maven-central \
--build-arg GRADLE_DIST_URL=https://nexus.internal/repository/gradle-dist \
-t mass-ingest:fips .

When using internal mirrors, you can remove the downloader stage from the Dockerfile and move its ARG and RUN commands into the base stage (after the dnf install that provides curl). This makes the entire build FIPS-compliant.
Run: All docker run commands from the stage READMEs work unchanged — just substitute the image name:
docker run --rm \
-p 8080:8080 \
-v $(pwd)/data:/var/moderne \
-e PUBLISH_URL=https://your-artifactory.com/artifactory/moderne-ingest/ \
-e PUBLISH_USER=your-username \
-e PUBLISH_PASSWORD=your-password \
mass-ingest:fips

JDK 8 and 11 TLS 1.3 workaround:

RHEL 9 backported TLS 1.3 into JDK 8 and 11, but the backported P11AEADCipher has a bug in AES-GCM decryption that causes TLS 1.3 handshakes to fail with CKR_ENCRYPTED_DATA_INVALID when running through NSS in FIPS mode. JDK 17+ has the fix. The Dockerfile disables TLS 1.3 for JDK 8 and 11, forcing them to use TLS 1.2, which works correctly. This is strictly more restrictive than stock FIPS: the same algorithm restrictions apply, and TLS 1.3 is additionally disabled. JDK 17+ is unaffected and uses TLS 1.3 normally.
Key differences from the standard image:
| Aspect | Standard (Dockerfile) | FIPS (Dockerfile.fips) |
|---|---|---|
| Base image | Eclipse Temurin (Ubuntu) | Red Hat UBI 9 |
| JDK provider | Adoptium Temurin | Red Hat OpenJDK |
| JDK versions | 8, 11, 17, 21, 25 | 8, 11, 17, 21, 25 |
| Crypto policy | Default (unrestricted) | FIPS (update-crypto-policies --set) |
| Certificate mgmt | Per-JDK keytool | System trust store (update-ca-trust) |
| Package manager | apt-get | dnf |
Note
For full kernel-level FIPS compliance, the host OS must also be running in FIPS mode. The container enforces FIPS-approved algorithms at the userspace level (OpenSSL, Java security providers) regardless of host configuration.
We provide scripts to generate repos.csv from various sources:
- Repository Fetchers - Scripts for GitHub, GitLab, Bitbucket, and more
The diagnostics/ directory contains a comprehensive diagnostic system to validate your mass-ingest setup before starting ingestion.
Run comprehensive diagnostics without starting ingestion:
DIAGNOSE=true docker compose up

This validates the entire setup and produces a detailed report:
- System (CPUs, memory, disk space)
- Required tools (git, curl, jq, unzip, tar)
- Runtime environment (container detection, CPU architecture, emulation)
- Thread/process limits (cgroup PID limits, ulimit, kernel threads-max)
- Java/JDKs (available JDKs, JAVA_HOME)
- Moderne CLI (version, build config, proxy, trust store, tenant)
- Configuration (env vars, credentials, git credentials)
- repos.csv (file validation, columns, origins, sample entries)
- Network (Maven Central, Gradle plugins, publish URL, SCM hosts)
- SSL/Certificates (handshakes, expiry warnings)
- Authentication (publish write/read/delete test, SCM credentials validation)
- Publish latency (throughput testing, rate limit detection)
- Maven repositories (dependency repo connectivity from settings.xml)
- Dependency repositories (user-specified repos from dependency-repos.csv)
- SCM repositories (connectivity testing per origin from repos.csv)
The container exits with code 0 if all checks pass, or 1 if any failures are detected.
Use cases:
- Initial setup validation before first real run
- After configuration changes before deploying
- Troubleshooting when something stops working
- Generating diagnostic output to send to Moderne support
Set DIAGNOSE_ON_START=true to run diagnostics before ingestion starts:
docker run -e DIAGNOSE_ON_START=true ...

This runs all diagnostic checks and then proceeds to normal ingestion regardless of the results. Use this to capture diagnostic output in your logs while still attempting ingestion.
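The difference between the two variables can be summarized as a small dispatch routine. This is a sketch of the documented behavior only, not the actual entrypoint: `start`, `run_diagnostics`, and `run_ingestion` are hypothetical names, and the two placeholder functions stand in for diagnose.sh and the publish script.

```shell
#!/bin/sh
# Sketch of the two diagnostic modes described in this README.
# run_diagnostics and run_ingestion are placeholders for illustration.
run_diagnostics() { echo "running diagnostics"; }
run_ingestion()   { echo "running ingestion"; }

start() {
  # DIAGNOSE=true: run the checks only and report their status as the
  # exit code (0 when all checks pass, non-zero otherwise).
  if [ "${DIAGNOSE:-false}" = "true" ]; then
    run_diagnostics
    return $?
  fi

  # DIAGNOSE_ON_START=true: run the checks for the logs, then continue
  # with ingestion whether or not they passed.
  if [ "${DIAGNOSE_ON_START:-false}" = "true" ]; then
    run_diagnostics || echo "diagnostics reported failures; continuing" >&2
  fi

  run_ingestion
}
```

In practice this means DIAGNOSE=true suits a pre-deployment gate in CI, while DIAGNOSE_ON_START=true suits scheduled runs where a record of the environment is useful but a failed check should not block ingestion.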
You can run the main diagnostic script or individual checks:
# Full diagnostics
./diagnostics/diagnose.sh
# Individual checks can be run directly
./diagnostics/checks/docker.sh
./diagnostics/checks/network.sh
./diagnostics/checks/auth-publish.sh

Mass-ingest Diagnostics
Generated: 2025-01-20 14:32 UTC
=== System ===
[PASS] CPUs: 4
[PASS] Memory: 12.5GB / 16.0GB available
[PASS] Disk (data): 45.2GB / 100.0GB available
=== Required tools ===
[PASS] git: 2.39.3
[PASS] curl: 8.4.0
[PASS] jq: 1.7
[PASS] unzip: 6.00
[PASS] tar: 1.35
=== Runtime environment ===
[PASS] Running inside Docker
Base image: Ubuntu 24.04.1 LTS
[PASS] Architecture: x86_64 (no emulation detected)
=== Thread and process limits ===
Java builds use many threads. Low PID/thread limits cause 'pthread_create' errors.
Expect: unlimited or 8192+ for cgroup PID limit and ulimit.
[PASS] Cgroup PID limit: unlimited (3 currently used)
[PASS] Max user processes (ulimit -u): unlimited
Kernel threads-max: 127733
=== Java/JDKs ===
[PASS] JAVA_HOME: /opt/java/openjdk
Detected JDKs (mod config java jdk list):
21.0.1-tem $JAVA_HOME /opt/java/openjdk
17.0.9-tem OS directory /usr/lib/jvm/temurin-17
[PASS] 5 JDK(s) available in /usr/lib/jvm/
=== Moderne CLI ===
[PASS] CLI installed: v3.56.0
Configuration:
Trust store: default JVM
Proxy: not configured
LST artifacts: Maven (https://artifactory.company.com/moderne)
Build timeouts: default
=== Configuration ===
[PASS] DATA_DIR: /var/moderne (writable)
[PASS] PUBLISH_URL: https://artifactory.company.com/moderne
[PASS] Publish credentials: PUBLISH_USER/PASSWORD set
Git credentials:
[PASS] HTTPS credentials: /root/.git-credentials (2 entries)
=== repos.csv ===
[PASS] File: /app/repos.csv (exists)
[PASS] Repositories: 427
[PASS] Required columns: cloneUrl, origin, path (present)
[PASS] Additional column: branch (present)
Repositories by origin:
github.com: 412 repos
gitlab.internal.com: 15 repos
Sample entries (first 3):
https://github.com/company/repo-one (main)
https://github.com/company/repo-two (main)
=== Network ===
[PASS] Maven Central: reachable (45ms)
[PASS] Gradle plugins: reachable (52ms)
[PASS] PUBLISH_URL: reachable (23ms)
[PASS] github.com: reachable (31ms)
[FAIL] gitlab.internal.com: unreachable
=== SSL/Certificates ===
[PASS] artifactory.company.com: SSL OK (expires in 285 days)
[PASS] github.com: SSL OK (expires in 180 days)
[PASS] repo1.maven.org: SSL OK (expires in 340 days)
=== Authentication - Publish ===
[PASS] Write test: succeeded (HTTP 201)
[PASS] Read test: succeeded (HTTP 200)
[PASS] Overwrite test: succeeded (HTTP 201)
[PASS] Delete test: succeeded (HTTP 204)
=== SCM credentials ===
[PASS] .git-credentials: found 2 credential(s)
[PASS] .git-credentials: file is read-only (mode 400)
=== Publish latency ===
Testing PUBLISH_URL (10 sequential requests)...
Sequential: min=23ms avg=45ms max=89ms
[PASS] PUBLISH_URL: average latency 45ms
Testing PUBLISH_URL (3 × 20 concurrent)...
Parallel batches: 850ms, 820ms, 890ms
[PASS] PUBLISH_URL: parallel throughput 42ms/request
=== Maven repositories ===
Using: /root/.m2/settings.xml
Testing central (10 sequential requests)...
Sequential: min=38ms avg=42ms max=67ms
[PASS] central: average latency 42ms
Testing central (3 × 20 concurrent)...
Parallel batches: 920ms, 880ms, 950ms
[PASS] central: parallel throughput 45ms/request
Testing internal-nexus (via mirror: nexus-mirror) (10 sequential requests)...
Sequential: min=15ms avg=18ms max=24ms
[PASS] internal-nexus (via mirror: nexus-mirror): average latency 18ms
Testing internal-nexus (via mirror: nexus-mirror) (3 × 20 concurrent)...
Parallel batches: 380ms, 350ms, 390ms
[PASS] internal-nexus (via mirror: nexus-mirror): parallel throughput 18ms/request
=== Dependency repositories ===
Using: ./dependency-repos.csv
Testing nexus.example.com (10 sequential requests)...
Sequential: min=19ms avg=23ms max=31ms
[PASS] nexus.example.com: average latency 23ms
Testing nexus.example.com (3 × 20 concurrent)...
Parallel batches: 480ms, 450ms, 510ms
[PASS] nexus.example.com: parallel throughput 24ms/request
========================================
RESULT: 1 failure(s), 0 warning(s), 24 passed
========================================
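When a report like the one above has been saved to a file, the status markers make it easy to summarize. The sketch below counts lines by marker and treats any [FAIL] line as an overall failure. It assumes the [PASS]/[FAIL] prefixes shown in the sample output, and the function names `count_status` and `report_ok` are hypothetical. For a live run, the process exit code remains the simpler signal.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository).

# Count report lines that start with the given marker, e.g. PASS or FAIL.
# Leading whitespace is allowed; "|| true" keeps a zero count from being
# reported as a command failure, since grep exits 1 when nothing matches.
count_status() {
  status=$1
  file=$2
  grep -c "^[[:space:]]*\[$status]" "$file" || true
}

# Succeeds only when the saved report contains no [FAIL] lines.
report_ok() {
  [ "$(count_status FAIL "$1")" -eq 0 ]
}
```

For instance, `count_status FAIL report.txt` prints the number of failed checks in a saved report, which can be attached to a support request alongside the full output.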
This example code is provided as-is for use with Moderne products.