Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
- Tier 1 ships in this release. Tiers 2 (manual click), 3 (auto with grace window) and 4 (autonomous in maintenance window) are designed and will land in subsequent releases.
- See `doc/admin/updates.md` for full configuration.
- **Pad compaction.** New `compactPad` HTTP API plus `bin/compactPad` and `bin/compactAllPads` CLIs to reclaim database space on long-lived pads with heavy edit history (issue #6194). `--keep N` retains the last N revisions; `--dry-run` previews per-pad rev counts before writing. Per-pad failures don't stop the bulk run.
- `bin/compactStalePads` (issue #7642) targets only pads not edited in the last `--older-than N` days, so hot pads in active timeslider use are left alone. Same `--keep` / `--dry-run` shape as `bin/compactAllPads`. Targeting is deliberately a CLI concern — the `compactPad` API surface stays unchanged.
- **New packaging targets.**
- Etherpad is now published as a **Snap** package.
- **Debian (.deb)** packages are built via nfpm with a systemd unit, and a signed apt repository is published to `etherpad.org/apt`.
Expand Down
328 changes: 328 additions & 0 deletions bin/compactStalePads.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,328 @@
'use strict';

/*
* Compact every pad on the instance that has not been edited recently.
*
* Usage:
* node bin/compactStalePads.js --older-than 90 # collapse history on pads not edited in 90 days
* node bin/compactStalePads.js --older-than 90 --keep 50 # keep last 50 revisions
* node bin/compactStalePads.js --older-than 90 --dry-run # list, don't write
*
* Composes `listAllPads` → `getLastEdited` → `compactPad`. Same shape as
* `bin/compactAllPads` (per-pad error tolerance, dry-run, tally), but
* filters by edit-recency before touching anything. Targeting which pads
* to compact is deliberately a CLI concern and not a `compactPad` API
* param — staleness changes from one run to the next, the compaction
* primitive does not.
Comment on lines +1 to +16
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. No docs for compactstalepads 📎 Requirement gap ⚙ Maintainability

This PR adds a new operator-facing CLI (bin/compactStalePads) but does not update the admin/CLI
documentation under doc/ to describe its usage and flags. This makes the feature undiscoverable
and violates the documentation update requirement.
Agent Prompt
## Issue description
A new CLI tool (`bin/compactStalePads`) was introduced, but the admin/CLI documentation was not updated to describe how to use it (flags `--older-than`, `--keep`, `--dry-run`, and behavior).

## Issue Context
Compliance requires that operator-facing CLI functionality be documented in the appropriate `doc/` admin/CLI documentation so administrators can discover and safely use it.

## Fix Focus Areas
- doc/cli.md[1-200]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

*
* Destructive — `getEtherpad`-export anything you can't afford to lose
* before running.
*
* Issue #7642: long-lived instances accumulate cold pads whose history
* nobody is navigating any more. Hot pads should be left alone; this
* tool is the brick for reclaiming space on the cold tail.
*/
import path from 'node:path';
import fs from 'node:fs';
import process from 'node:process';

export type CompactStaleOpts = {
olderThanDays: number;
keepRevisions: number | null;
dryRun: boolean;
};

// Minimal interface mirroring the API endpoints the script needs. Tests
// substitute their own implementation that goes through supertest+JWT
// instead of fetch+APIKEY, so the loop logic is exercised against a real
// running server without dragging in apikey-file or fetch setup.
export type CompactStaleApi = {
listAllPads(): Promise<string[]>;
getLastEdited(padId: string): Promise<number>;
getRevisionsCount(padId: string): Promise<number>;
compactPad(padId: string, keepRevisions: number | null): Promise<void>;
};

export type CompactStaleReport = {
total: number;
stale: number;
ok: number;
failed: number;
skippedFresh: number;
totalRevsBefore: number;
totalRevsAfter: number;
};

export type CompactStaleLogger = {
info(msg: string): void;
error(msg: string): void;
};

const defaultLogger: CompactStaleLogger = {
info: (m) => console.log(m),
error: (m) => console.error(m),
};

const DAY_MS = 24 * 60 * 60 * 1000;

// Pure-ish core: compose listAllPads → getLastEdited → compactPad with
// the same per-pad error tolerance + dry-run + tally as compactAllPads.
// `now` is injected so tests can pin the wall clock.
export const runCompactStale = async (
api: CompactStaleApi, opts: CompactStaleOpts,
logger: CompactStaleLogger = defaultLogger,
now: () => number = Date.now,
): Promise<CompactStaleReport> => {
const cutoff = now() - opts.olderThanDays * DAY_MS;

let padIds: string[];
try {
padIds = await api.listAllPads();
} catch (e: any) {
logger.error(`listAllPads failed: ${e.message ?? e}`);
return {
total: 0, stale: 0, ok: 0, failed: 1, skippedFresh: 0,
totalRevsBefore: 0, totalRevsAfter: 0,
};
}

if (padIds.length === 0) {
logger.info('No pads on this instance.');
return {
total: 0, stale: 0, ok: 0, failed: 0, skippedFresh: 0,
totalRevsBefore: 0, totalRevsAfter: 0,
};
}

const strategy = opts.keepRevisions == null
? 'collapse all history'
: `keep last ${opts.keepRevisions} revisions`;
logger.info(
`Found ${padIds.length} pad(s). Filter: not edited in ` +
`${opts.olderThanDays} day(s). Strategy: ${strategy}` +
`${opts.dryRun ? ' (dry run — no writes)' : ''}.`);

const report: CompactStaleReport = {
total: padIds.length, stale: 0, ok: 0, failed: 0, skippedFresh: 0,
totalRevsBefore: 0, totalRevsAfter: 0,
};

// First pass: figure out which pads are actually stale. A getLastEdited
// failure on a pad is counted as a failure (we can't decide), but does
// not stop the run.
const stalePads: string[] = [];
for (const padId of padIds) {
let lastEdited: number;
try {
lastEdited = await api.getLastEdited(padId);
} catch (e: any) {
logger.error(`${padId}: getLastEdited failed: ${e.message ?? e}`);
report.failed++;
continue;
}
if (lastEdited > cutoff) {
report.skippedFresh++;
continue;
}
stalePads.push(padId);
}
report.stale = stalePads.length;

if (stalePads.length === 0) {
logger.info(
`No stale pads (${report.skippedFresh} fresh, ${report.failed} unreadable).`);
return report;
}

logger.info(
`${stalePads.length} stale pad(s) to process ` +
`(${report.skippedFresh} fresh skipped).`);

for (let i = 0; i < stalePads.length; i++) {
const padId = stalePads[i];
const idx = `[${i + 1}/${stalePads.length}]`;

let before: number;
try {
before = await api.getRevisionsCount(padId);
} catch (e: any) {
logger.error(`${idx} ${padId}: getRevisionsCount failed: ${e.message ?? e}`);
report.failed++;
continue;
}

if (opts.dryRun) {
logger.info(`${idx} ${padId}: ${before + 1} revision(s) — would compact`);
report.totalRevsBefore += before + 1;
continue;
}

// Re-check staleness right before compacting. Without this the
// first-pass selection is a TOCTOU window: on a long bulk run a
// pad can become active between selection and compaction, and
// compactPad would then kick those sessions. Re-checking here
// shrinks the window to one round-trip and treats the pad as
// freshened (skipped, not failed).
let lastEditedNow: number;
try {
lastEditedNow = await api.getLastEdited(padId);
} catch (e: any) {
logger.error(`${idx} ${padId}: getLastEdited recheck failed: ${e.message ?? e}`);
report.failed++;
continue;
}
if (lastEditedNow > cutoff) {
logger.info(`${idx} ${padId}: edited during run — skipping (now fresh)`);
report.skippedFresh++;
report.stale--;
continue;
}

try {
await api.compactPad(padId, opts.keepRevisions);
} catch (e: any) {
logger.error(`${idx} ${padId}: compactPad failed: ${e.message ?? e}`);
report.failed++;
continue;
}

let after: number | undefined;
try { after = await api.getRevisionsCount(padId); }
catch { /* main op already succeeded; post-count is informational */ }

if (after != null) {
logger.info(`${idx} ${padId}: ${before + 1} → ${after + 1} revision(s)`);
report.totalRevsBefore += before + 1;
report.totalRevsAfter += after + 1;
} else {
logger.info(`${idx} ${padId}: compacted (post-count unavailable)`);
}
report.ok++;
}

if (opts.dryRun) {
logger.info('');
logger.info(
`Dry run complete. ${stalePads.length} stale pad(s), ` +
`${report.totalRevsBefore} total revision(s) — re-run ` +
'without --dry-run to compact.');
} else {
logger.info('');
logger.info(
`Done. ${report.ok} pad(s) compacted, ${report.failed} failed, ` +
`${report.skippedFresh} fresh skipped. ` +
`Revisions: ${report.totalRevsBefore} → ${report.totalRevsAfter} ` +
`(reclaimed ${report.totalRevsBefore - report.totalRevsAfter}).`);
}

return report;
};

export const parseArgs = (argv: string[]): CompactStaleOpts | null => {
const opts: CompactStaleOpts = {
olderThanDays: NaN, keepRevisions: null, dryRun: false,
};
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === '--dry-run') {
opts.dryRun = true;
} else if (a === '--older-than') {
const v = argv[++i];
const n = Number(v);
if (!Number.isInteger(n) || n < 0) {
console.error(`--older-than expects a non-negative integer; got ${v}`);
return null;
}
opts.olderThanDays = n;
} else if (a === '--keep') {
const v = argv[++i];
const n = Number(v);
if (!Number.isInteger(n) || n < 0) {
console.error(`--keep expects a non-negative integer; got ${v}`);
return null;
}
opts.keepRevisions = n;
} else {
return null;
}
}
if (!Number.isFinite(opts.olderThanDays)) {
console.error('--older-than is required');
return null;
}
return opts;
};

const usage = () => {
console.error('Usage:');
console.error(' node bin/compactStalePads.js --older-than <days>');
console.error(' node bin/compactStalePads.js --older-than <days> --keep <N>');
console.error(' node bin/compactStalePads.js --older-than <days> --dry-run');
process.exit(2);
};

const isMain = require.main === module;
if (isMain) {
process.on('unhandledRejection', (err) => { throw err; });

const settings = require('ep_etherpad-lite/tests/container/loadSettings').loadSettings();
const baseURL = `${settings.ssl ? 'https' : 'http'}://${settings.ip}:${settings.port}`;

const apiGet = async (p: string): Promise<any> => {
const r = await fetch(baseURL + p);
if (!r.ok) throw new Error(`HTTP ${r.status} ${r.statusText}`);
return r.json();
};
const apiPost = async (p: string): Promise<any> => {
const r = await fetch(baseURL + p, {method: 'POST'});
if (!r.ok) throw new Error(`HTTP ${r.status} ${r.statusText}`);
return r.json();
};

const opts = parseArgs(process.argv.slice(2));
if (!opts) usage();

const apikey = fs.readFileSync(
path.join(__dirname, '../APIKEY.txt'), {encoding: 'utf-8'}).trim();

// Bind the abstract API to fetch + APIKEY auth for the CLI shell.
const cliApi: CompactStaleApi = {
async listAllPads() {
const apiInfo = await apiGet('/api/');
const apiVersion: string | undefined = apiInfo.currentVersion;
if (!apiVersion) throw new Error('No version set in API');
(cliApi as any)._apiVersion = apiVersion;
const r = await apiGet(`/api/${apiVersion}/listAllPads?apikey=${apikey}`);
if (r.code !== 0) throw new Error(JSON.stringify(r));
return r.data.padIDs ?? [];
},
async getLastEdited(padId: string) {
const v = (cliApi as any)._apiVersion;
const r = await apiGet(
`/api/${v}/getLastEdited?apikey=${apikey}` +
`&padID=${encodeURIComponent(padId)}`);
if (r.code !== 0) throw new Error(JSON.stringify(r));
return r.data.lastEdited;
},
async getRevisionsCount(padId: string) {
const v = (cliApi as any)._apiVersion;
const r = await apiGet(
`/api/${v}/getRevisionsCount?apikey=${apikey}` +
`&padID=${encodeURIComponent(padId)}`);
if (r.code !== 0) throw new Error(JSON.stringify(r));
return r.data.revisions;
},
async compactPad(padId: string, keepRevisions: number | null) {
const v = (cliApi as any)._apiVersion;
const params = new URLSearchParams({apikey, padID: padId});
if (keepRevisions != null) params.set('keepRevisions', String(keepRevisions));
const r = await apiPost(`/api/${v}/compactPad?${params.toString()}`);
if (r.code !== 0) throw new Error(JSON.stringify(r));
},
};

(async () => {
const report = await runCompactStale(cliApi, opts!);
if (report.failed > 0) process.exit(1);
})();
}
39 changes: 39 additions & 0 deletions doc/cli.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,42 @@ In this example we migrate from the old dirty db to the new rustydb engine. So w

After that we need to move the data from dirty to rustydb.
Therefore, we call `pnpm run --filter bin migrateDB --file1 test1.json --file2 test2.json` with these two files in our root directories. After some time the data should be copied over to the new database.

## Pad compaction

Long-lived pads with heavy edit history accumulate revisions in the database. Three CLIs reclaim that space, in increasing scope:

| Tool | Targets | When to use |
| --- | --- | --- |
| `bin/compactPad.js <padID>` | one pad | you know which pad is fat |
| `bin/compactAllPads.js` | every pad | bulk reclaim across the whole instance |
| `bin/compactStalePads.js --older-than N` | pads not edited in N days | reclaim the cold tail without touching pads still in active use |

All three are gated on `cleanup.enabled = true` in `settings.json` and are **destructive**: history is collapsed (or trimmed). Export anything you can't afford to lose with `getEtherpad` first.

Common flags:

- `--keep N` — retain the last N revisions instead of collapsing all history.
- `--dry-run` — list pads and revision counts without writing.

### Examples

````
# Compact a specific pad, collapsing all history.
node bin/compactPad.js my-pad

# Keep only the last 50 revisions of one pad.
node bin/compactPad.js my-pad --keep 50

# Compact every pad on the instance (per-pad failures don't stop the run).
node bin/compactAllPads.js
node bin/compactAllPads.js --dry-run

# Compact only pads not edited in the last 90 days, keeping the last 50 revisions.
node bin/compactStalePads.js --older-than 90 --keep 50
node bin/compactStalePads.js --older-than 90 --dry-run
````

`bin/compactStalePads.js` is the right tool for periodic operator runs on long-lived instances — hot pads that users are still navigating in timeslider stay untouched, and only the cold tail is rewritten. Per-pad failures (including a `getLastEdited` fault) are counted but do not abort the bulk run; the exit code reflects whether anything failed.

See the `compactPad` HTTP API in `doc/api/http_api.md` for the same primitive over the wire (issues #6194, #7642).
Loading
Loading