Mass ingest

Production-ready examples for ingesting large numbers of repositories into Moderne using the Moderne CLI.

Choose your deployment stage

This repository provides three progressive deployment examples. Each stage is independent and self-contained, so you can start at whichever stage fits your needs.

1-quickstart: Get started quickly

Best for:

Quick proof of concept
Small repository counts (< 1,000 repos)
Development and testing
Learning how mass-ingest works

What's included:

Single Docker container
Manual docker commands
Basic monitoring via CLI metrics endpoint

Resources needed:

2 CPU cores
16 GB RAM
32+ GB disk

→ Start with 1-quickstart

2-observability: Add monitoring and visibility

Best for:

Production use on a single host
Small repository counts (< 1,000 repos)
Medium repository counts with manual scaling (< 10,000 repos)
Need for operational visibility
Continuous ingestion workflows

What's included:

Docker Compose orchestration
Integrated Grafana dashboards
Prometheus metrics collection
Automated restarts and scheduling

Resources needed:

3 CPU cores (2 for mass-ingest, 1 for monitoring)
18 GB RAM (16 for mass-ingest, 2 for monitoring)
50+ GB disk

→ Start with 2-observability

3-scalability: Scale to production

Best for:

Large repository counts (> 10,000 repos)
Parallel processing requirements
Production deployment with automatic scaling
Enterprise environments

What's included:

Cloud-native batch services (AWS Batch, GCP Batch)
Terraform infrastructure as code
Scheduled automation (daily/weekly)
Auto-scaling compute — scales to zero when idle
Production monitoring and cost optimization

Resources needed:

Cloud account (AWS or GCP)
Terraform >= 1.0
VPC with internet access
Configurable compute (scales from 0 to 256+ vCPUs)

→ Start with 3-scalability

Repository structure

mass-ingest-example/
├── Dockerfile            # Container image definition (used by all stages)
├── Dockerfile.fips       # FIPS 140-2/140-3 compliant variant (UBI 9)
├── publish.sh            # Main ingestion script
├── publish.ps1           # PowerShell version
├── repos.csv             # Example repository list
│
├── 1-quickstart/         # Single container deployment
│   └── README.md
│
├── 2-observability/      # Docker Compose with monitoring
│   ├── docker-compose.yml
│   ├── .env.example
│   ├── observability/    # Grafana and Prometheus configs
│   └── README.md
│
├── 3-scalability/        # Cloud-native batch deployment (multi-cloud)
│   ├── README.md          # Platform comparison and architecture overview
│   ├── aws-batch/         # AWS Batch + EventBridge + Secrets Manager
│   │   ├── chunk.sh
│   │   ├── terraform/
│   │   └── README.md
│   └── gcp-batch/         # GCP Batch + Cloud Scheduler + Secret Manager
│       ├── task.sh
│       ├── terraform/
│       └── README.md
│
└── diagnostics/          # Comprehensive diagnostic system
    ├── diagnose.sh       # Main orchestration script
    ├── lib/              # Shared libraries
    │   ├── core.sh       # Colors, output formatting, utilities
    │   └── latency.sh    # Latency and throughput testing
    └── checks/           # Modular check scripts
        ├── system.sh     # CPUs, memory, disk space
        ├── tools.sh      # git, curl, jq, etc.
        ├── docker.sh     # Container detection, CPU arch, emulation
        ├── threads.sh    # Cgroup PID limits, ulimit, kernel threads-max
        ├── java.sh       # JDKs, JAVA_HOME
        ├── cli.sh        # mod CLI version, config
        ├── config.sh     # Env vars, credentials
        ├── repos-csv.sh  # File validation, columns, origins
        ├── network.sh    # Connectivity to all hosts
        ├── ssl.sh        # SSL handshakes, cert expiry
        ├── auth-publish.sh # Write/read/delete test
        ├── auth-scm.sh   # .git-credentials validation
        ├── publish-latency.sh # Publish URL latency and throttling
        ├── maven-repos.sh # Maven repos from settings.xml
        ├── dependency-repos.sh # User-specified repos (Gradle, etc.)
        └── scm-repos.sh  # SCM connectivity per origin

Prerequisites (all stages)

Before starting with any stage, you'll need:

Repository list: Create repos.csv with repositories to ingest

cloneUrl,branch,origin,path
https://github.com/org/repo1,main,github.com,org/repo1
https://github.com/org/repo2,main,github.com,org/repo2

Artifact repository: Maven-formatted repository for publishing LSTs
- Artifactory, Nexus, or similar
- Dedicated repository recommended (separate from other artifacts)
- Credentials with publish permissions

Source control access: If repositories require authentication
- Service account with read access to all repositories
- Personal access token or credentials

Docker: Installed and running (for stages 1 and 2)
Bash: Required in the container image (Alpine users: apk add bash)
Cloud account: AWS or GCP account (required only for stage 3)

Quick comparison

Feature	1-quickstart	2-observability	3-scalability
Deployment	Single container	Docker Compose	Cloud-native batch + Terraform
Monitoring	CLI metrics endpoint	Grafana + Prometheus	Cloud-native logging + optional Grafana
Scaling	Manual	Single host	Auto-scaling parallel workers
Scheduling	Manual/cron	Docker restart policy	Cloud-native scheduler
Cost	Lowest	Low	Scales with usage
Setup time	15 minutes	30 minutes	1-2 hours
Ideal repo count	< 100	100-1000	1000+
Parallel processing	No	No	Yes

Common configuration

All stages share the same core configuration needs:

Environment variables

PUBLISH_URL - Artifact repository URL (e.g., https://artifactory.example.com/artifactory/moderne-ingest/)
PUBLISH_USER - Repository username
PUBLISH_PASSWORD - Repository password
PUBLISH_TOKEN - Alternative to user/password for JFrog
MODERNE_TENANT - Your Moderne tenant URL (optional)
MODERNE_TOKEN - Moderne API token (optional)
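
A pre-flight check for these variables can catch a missing value before a container starts. The sketch below is a hypothetical helper, not part of this repository's scripts. It assumes that PUBLISH_URL is always needed and that either PUBLISH_TOKEN or the PUBLISH_USER/PUBLISH_PASSWORD pair is enough for authentication.

```shell
# Hypothetical pre-flight helper (not shipped with this repository).
# Returns 0 when the publish settings look usable, 1 otherwise.
check_publish_env() {
  local ok=0
  if [ -z "${PUBLISH_URL:-}" ]; then
    echo "PUBLISH_URL is not set" >&2
    ok=1
  fi
  # Assumption: a token alone is sufficient; otherwise both user and password are needed.
  if [ -z "${PUBLISH_TOKEN:-}" ]; then
    if [ -z "${PUBLISH_USER:-}" ] || [ -z "${PUBLISH_PASSWORD:-}" ]; then
      echo "Set PUBLISH_TOKEN, or both PUBLISH_USER and PUBLISH_PASSWORD" >&2
      ok=1
    fi
  fi
  return "$ok"
}
```

Calling check_publish_env before docker run or docker compose up lets a wrapper script stop early with a readable message.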

Repository authentication

For private repositories, credentials are mounted at runtime (never baked into images):

.git-credentials file for HTTPS
.ssh directory for SSH

See each stage's README for specific mounting instructions.
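
As a rough illustration of preparing the HTTPS credentials file on the host, the sketch below writes entries in the format Git's credential store reads (one https://user:secret@host URL per line) and leaves the file read-only. The function name and the sample hosts are hypothetical. Special characters in the user name or token would need percent-encoding, which this sketch does not do.

```shell
# Hypothetical helper: append one credential-store entry and lock the file down.
# Usage: write_git_credentials <file> <host> <user> <token>
write_git_credentials() {
  local file="$1" host="$2" user="$3" token="$4"
  # Make an existing read-only file writable again before appending.
  if [ -e "$file" ]; then
    chmod 600 "$file"
  fi
  # Git's credential store expects one URL per line: https://user:secret@host
  printf 'https://%s:%s@%s\n' "$user" "$token" "$host" >> "$file"
  # Leave the file readable by the owner only (mode 400).
  chmod 400 "$file"
}
```

For example, write_git_credentials "$HOME/.git-credentials" github.com svc-user "$GITHUB_TOKEN" adds one entry, where svc-user and GITHUB_TOKEN are placeholders for your own service account and token.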

Repository list format

The repos.csv file columns:

cloneUrl (required) - Full git clone URL
origin (required) - Source identifier (e.g., github.com)
path (required) - Repository path/identifier
branch (optional) - Branch to build (uses remote default if not specified)
gradleVersion (optional) - Selects a specific Gradle version for repos without a wrapper (must match an installation registered via mod config build gradle installation edit)

See repos.csv documentation for advanced options.
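
A quick local check of the header can save a failed run. The following is a minimal sketch, not the logic used by diagnostics/checks/repos-csv.sh. It assumes a simple CSV with no quoted fields or embedded commas, and the function name is hypothetical.

```shell
# Hypothetical check: verify the required columns exist, then print the data row count.
check_repos_csv() {
  local file="$1" col header
  # Read the header line and drop a trailing carriage return, if any.
  header=$(head -n 1 "$file" | tr -d '\r')
  for col in cloneUrl origin path; do
    case ",$header," in
      *",$col,"*) ;;
      *) echo "missing required column: $col" >&2; return 1 ;;
    esac
  done
  # Count non-empty lines after the header.
  awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$file"
}
```

Running check_repos_csv repos.csv against the two-row example from the prerequisites section prints 2.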

Dependency repositories (optional)

Create dependency-repos.csv to test connectivity to Maven/Gradle dependency repositories during diagnostics:

url,username,password,token
https://nexus.example.com/releases,${NEXUS_USER},${NEXUS_PASSWORD},
https://artifactory.example.com/libs,,,${ARTIFACTORY_TOKEN}
https://repo.spring.io/release,,,

Use username + password for basic auth
Use token for bearer auth (leave username/password empty)
Leave all auth fields empty for anonymous access
Use ${ENV_VAR} syntax to reference environment variables

See dependency-repos.csv.example for a template.
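
The three authentication rules above can be summarized in a few lines of shell. This is only a sketch of the decision, under the assumption that fields contain no commas or quotes. It is not taken from diagnostics/checks/dependency-repos.sh, and the function name is hypothetical.

```shell
# Hypothetical classifier for one dependency-repos.csv row (url,username,password,token).
auth_mode() {
  local url username password token
  # Split the row on commas; empty fields stay empty.
  IFS=, read -r url username password token <<EOF
$1
EOF
  if [ -n "$token" ]; then
    echo "bearer"
  elif [ -n "$username" ] && [ -n "$password" ]; then
    echo "basic"
  else
    echo "anonymous"
  fi
}
```

The function only inspects which fields are filled in. It does not expand ${ENV_VAR} references or contact the repository.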

Build arguments

All Dockerfiles support:

MODERNE_CLI_VERSION - Specific CLI version (defaults to latest release)
MODERNE_CLI_STAGE - release (default) for latest release from Maven Central, snapshot for latest snapshot
MODERNE_CLI_RELEASES_REPO - Maven repository for release CLI artifacts (defaults to https://repo1.maven.org/maven2)
MODERNE_CLI_SNAPSHOTS_REPO - Maven repository for snapshot CLI artifacts (defaults to https://central.sonatype.com/repository/maven-snapshots)
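
To show how these arguments fit together, the sketch below assembles a docker build command and prints it rather than running it. The function name, the image tag mass-ingest:local, and the sample version are placeholders chosen for illustration.

```shell
# Hypothetical wrapper: print a docker build command that pins the CLI version.
# Usage: build_command <cli-version> [releases-repo-url]
build_command() {
  local version="$1" releases_repo="${2:-}"
  # Reuse the positional parameters as an argument list.
  set -- docker build --build-arg "MODERNE_CLI_VERSION=$version"
  if [ -n "$releases_repo" ]; then
    set -- "$@" --build-arg "MODERNE_CLI_RELEASES_REPO=$releases_repo"
  fi
  # "mass-ingest:local" is a placeholder tag; "." is the build context.
  set -- "$@" -t mass-ingest:local .
  echo "$*"
}
```

For example, build_command 3.56.0 prints a command that can be reviewed and then run from the repository root.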

FIPS-compliant image

A separate Dockerfile.fips is provided for environments that require FIPS 140-2/140-3 compliance. It uses Red Hat UBI 9 with the FIPS crypto policy enabled, which restricts all cryptographic operations to FIPS-approved algorithms.

Build:

docker build -f Dockerfile.fips -t mass-ingest:fips .

Build arguments (in addition to MODERNE_CLI_VERSION):

Argument	Default	Description
`MAVEN_REPO_URL`	`https://repo1.maven.org/maven2`	Maven repository for CLI and Maven
`GRADLE_DIST_URL`	`https://services.gradle.org/distributions`	Gradle distribution download URL
`GRADLE_VERSION`	`8.14`	Primary Gradle version to install
`GRADLE_EXTRA_VERSIONS`	(empty)	Comma-separated additional Gradle versions (e.g., `6.9.4,5.6.4`)
`MAVEN_VERSION`	`3.9.11`	Maven version to install

Using internal mirrors:

Public download servers (Maven Central, Gradle services) may not support FIPS-compliant TLS cipher suites. The Dockerfile uses a separate download stage without FIPS restrictions to handle this. To make the entire build FIPS-compliant end to end, point the download URLs at internal mirrors that support FIPS-compliant TLS:

docker build -f Dockerfile.fips \
  --build-arg MAVEN_REPO_URL=https://nexus.internal/repository/maven-central \
  --build-arg GRADLE_DIST_URL=https://nexus.internal/repository/gradle-dist \
  -t mass-ingest:fips .

When using internal mirrors, you can remove the downloader stage from the Dockerfile and move its ARG and RUN commands into the base stage (after the dnf install that provides curl). This makes the entire build FIPS-compliant.

Run: All docker run commands from the stage READMEs work unchanged — just substitute the image name:

docker run --rm \
  -p 8080:8080 \
  -v $(pwd)/data:/var/moderne \
  -e PUBLISH_URL=https://your-artifactory.com/artifactory/moderne-ingest/ \
  -e PUBLISH_USER=your-username \
  -e PUBLISH_PASSWORD=your-password \
  mass-ingest:fips

JDK 8 and 11 TLS 1.3 workaround:

RHEL 9 backported TLS 1.3 into JDK 8 and 11, but the backported P11AEADCipher has a bug in AES-GCM decryption that causes TLS 1.3 handshakes to fail with CKR_ENCRYPTED_DATA_INVALID when running through NSS in FIPS mode. JDK 17+ has the fix. The Dockerfile therefore disables TLS 1.3 for JDK 8 and 11, forcing them to use TLS 1.2, which works correctly. The result is strictly more restrictive than stock FIPS: the same algorithm restrictions apply, and TLS 1.3 is disabled as well. JDK 17+ is unaffected and uses TLS 1.3 normally.

Key differences from the standard image:

Aspect	Standard (`Dockerfile`)	FIPS (`Dockerfile.fips`)
Base image	Eclipse Temurin (Ubuntu)	Red Hat UBI 9
JDK provider	Adoptium Temurin	Red Hat OpenJDK
JDK versions	8, 11, 17, 21, 25	8, 11, 17, 21, 25
Crypto policy	Default (unrestricted)	FIPS (`update-crypto-policies --set`)
Certificate mgmt	Per-JDK keytool	System trust store (`update-ca-trust`)
Package manager	apt-get	dnf

Note

For full kernel-level FIPS compliance, the host OS must also be running in FIPS mode. The container enforces FIPS-approved algorithms at the userspace level (OpenSSL, Java security providers) regardless of host configuration.
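
On Linux hosts, the kernel reports its FIPS state through /proc/sys/crypto/fips_enabled (1 when enabled). The sketch below reads that flag. The function name is hypothetical, and the optional path argument exists only so the function can be exercised against a test file.

```shell
# Hypothetical helper: report whether the Linux kernel is running in FIPS mode.
host_fips_mode() {
  local flag_file="${1:-/proc/sys/crypto/fips_enabled}"
  # A missing or unreadable flag file is treated as "disabled".
  if [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}
```

Running host_fips_mode on the Docker host gives a quick indication of whether the kernel-level requirement in the note above is met.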

Generating repository lists

We provide scripts to generate repos.csv from various sources:

Repository Fetchers - Scripts for GitHub, GitLab, Bitbucket, and more

Diagnostics

The diagnostics/ directory contains a comprehensive diagnostic system to validate your mass-ingest setup before starting ingestion.

Diagnostic mode (full validation)

Run comprehensive diagnostics without starting ingestion:

DIAGNOSE=true docker compose up

This validates the entire setup and produces a detailed report:

System (CPUs, memory, disk space)
Required tools (git, curl, jq, unzip, tar)
Runtime environment (container detection, CPU architecture, emulation)
Thread/process limits (cgroup PID limits, ulimit, kernel threads-max)
Java/JDKs (available JDKs, JAVA_HOME)
Moderne CLI (version, build config, proxy, trust store, tenant)
Configuration (env vars, credentials, git credentials)
repos.csv (file validation, columns, origins, sample entries)
Network (Maven Central, Gradle plugins, publish URL, SCM hosts)
SSL/Certificates (handshakes, expiry warnings)
Authentication (publish write/read/delete test, SCM credentials validation)
Publish latency (throughput testing, rate limit detection)
Maven repositories (dependency repo connectivity from settings.xml)
Dependency repositories (user-specified repos from dependency-repos.csv)
SCM repositories (connectivity testing per origin from repos.csv)

The container exits with code 0 if all checks pass, or 1 if any failures are detected.

Use cases:

Initial setup validation before first real run
After configuration changes before deploying
Troubleshooting when something stops working
Generating diagnostic output to send to Moderne support

Diagnostics at startup

Set DIAGNOSE_ON_START=true to run diagnostics before ingestion starts:

docker run -e DIAGNOSE_ON_START=true ...

This runs all diagnostic checks and then proceeds to normal ingestion regardless of the results. Use this to capture diagnostic output in your logs while still attempting ingestion.

Running diagnostics directly

You can run the main diagnostic script or individual checks:

# Full diagnostics
./diagnostics/diagnose.sh

# Individual checks can be run directly
./diagnostics/checks/docker.sh
./diagnostics/checks/network.sh
./diagnostics/checks/auth-publish.sh

Example output

Mass-ingest Diagnostics
Generated: 2025-01-20 14:32 UTC

=== System ===
[PASS] CPUs: 4
[PASS] Memory: 12.5GB / 16.0GB available
[PASS] Disk (data): 45.2GB / 100.0GB available

=== Required tools ===
[PASS] git: 2.39.3
[PASS] curl: 8.4.0
[PASS] jq: 1.7
[PASS] unzip: 6.00
[PASS] tar: 1.35

=== Runtime environment ===
[PASS] Running inside Docker
       Base image: Ubuntu 24.04.1 LTS
[PASS] Architecture: x86_64 (no emulation detected)

=== Thread and process limits ===
       Java builds use many threads. Low PID/thread limits cause 'pthread_create' errors.
       Expect: unlimited or 8192+ for cgroup PID limit and ulimit.
[PASS] Cgroup PID limit: unlimited (3 currently used)
[PASS] Max user processes (ulimit -u): unlimited
       Kernel threads-max: 127733

=== Java/JDKs ===
[PASS] JAVA_HOME: /opt/java/openjdk
       Detected JDKs (mod config java jdk list):
         21.0.1-tem   $JAVA_HOME     /opt/java/openjdk
         17.0.9-tem   OS directory   /usr/lib/jvm/temurin-17
[PASS] 5 JDK(s) available in /usr/lib/jvm/

=== Moderne CLI ===
[PASS] CLI installed: v3.56.0
       Configuration:
         Trust store: default JVM
         Proxy: not configured
         LST artifacts: Maven (https://artifactory.company.com/moderne)
         Build timeouts: default

=== Configuration ===
[PASS] DATA_DIR: /var/moderne (writable)
[PASS] PUBLISH_URL: https://artifactory.company.com/moderne
[PASS] Publish credentials: PUBLISH_USER/PASSWORD set
       Git credentials:
[PASS] HTTPS credentials: /root/.git-credentials (2 entries)

=== repos.csv ===
[PASS] File: /app/repos.csv (exists)
[PASS] Repositories: 427
[PASS] Required columns: cloneUrl, origin, path (present)
[PASS] Additional column: branch (present)
       Repositories by origin:
         github.com: 412 repos
         gitlab.internal.com: 15 repos
       Sample entries (first 3):
         https://github.com/company/repo-one (main)
         https://github.com/company/repo-two (main)

=== Network ===
[PASS] Maven Central: reachable (45ms)
[PASS] Gradle plugins: reachable (52ms)
[PASS] PUBLISH_URL: reachable (23ms)
[PASS] github.com: reachable (31ms)
[FAIL] gitlab.internal.com: unreachable

=== SSL/Certificates ===
[PASS] artifactory.company.com: SSL OK (expires in 285 days)
[PASS] github.com: SSL OK (expires in 180 days)
[PASS] repo1.maven.org: SSL OK (expires in 340 days)

=== Authentication - Publish ===
[PASS] Write test: succeeded (HTTP 201)
[PASS] Read test: succeeded (HTTP 200)
[PASS] Overwrite test: succeeded (HTTP 201)
[PASS] Delete test: succeeded (HTTP 204)

=== SCM credentials ===
[PASS] .git-credentials: found 2 credential(s)
[PASS] .git-credentials: file is read-only (mode 400)

=== Publish latency ===
       Testing PUBLISH_URL (10 sequential requests)...
       Sequential: min=23ms avg=45ms max=89ms
[PASS] PUBLISH_URL: average latency 45ms
       Testing PUBLISH_URL (3 × 20 concurrent)...
       Parallel batches: 850ms, 820ms, 890ms
[PASS] PUBLISH_URL: parallel throughput 42ms/request

=== Maven repositories ===
       Using: /root/.m2/settings.xml
       Testing central (10 sequential requests)...
       Sequential: min=38ms avg=42ms max=67ms
[PASS] central: average latency 42ms
       Testing central (3 × 20 concurrent)...
       Parallel batches: 920ms, 880ms, 950ms
[PASS] central: parallel throughput 45ms/request
       Testing internal-nexus (via mirror: nexus-mirror) (10 sequential requests)...
       Sequential: min=15ms avg=18ms max=24ms
[PASS] internal-nexus (via mirror: nexus-mirror): average latency 18ms
       Testing internal-nexus (via mirror: nexus-mirror) (3 × 20 concurrent)...
       Parallel batches: 380ms, 350ms, 390ms
[PASS] internal-nexus (via mirror: nexus-mirror): parallel throughput 18ms/request

=== Dependency repositories ===
       Using: ./dependency-repos.csv
       Testing nexus.example.com (10 sequential requests)...
       Sequential: min=19ms avg=23ms max=31ms
[PASS] nexus.example.com: average latency 23ms
       Testing nexus.example.com (3 × 20 concurrent)...
       Parallel batches: 480ms, 450ms, 510ms
[PASS] nexus.example.com: parallel throughput 24ms/request

========================================
RESULT: 1 failure(s), 0 warning(s), 24 passed
========================================
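
When a report like the one above has been saved to a file, the failing checks can be pulled out with a one-line filter. This sketch relies only on the [FAIL] line prefix shown in the example and assumes the saved output contains no color escape sequences. The function name and the file name report.txt are hypothetical.

```shell
# Hypothetical helper: list failed checks from a saved diagnostics report.
# Usage: failed_checks <report-file>
failed_checks() {
  # Print each line that starts with "[FAIL] ", with the marker removed.
  sed -n 's/^\[FAIL\] //p' "$1"
}
```

For the example report, failed_checks report.txt would print the single gitlab.internal.com line.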

Support and documentation

License

This example code is provided as-is for use with Moderne products.
