
Separate saveDate and fetchDate in the Record model #1243

@Ndpnt

Context

Currently, a Record (Version or Snapshot) carries a single temporal field: fetchDate, representing when the source content was captured. On the Git side, this fetchDate is used as both GIT_AUTHOR_DATE and GIT_COMMITTER_DATE of the commit, overwriting Git's native distinction between these two notions. On the Mongo side, a created_at field is written but never surfaced in the domain model.

This asymmetry becomes problematic as soon as we want to distinguish two separate chronological questions:

  1. "When was the service's legal content in effect?" → fetchDate. Useful for chronological navigation of terms, querying at a given date, semantic diff between two versions.

  2. "When did the engine record this interpretation?" → information currently lost (which could be called saveDate). Useful for engine traceability, audit, activity dashboards, and pagination by actual recording order.

The two notions diverge as soon as a commit is backdated, which happens systematically with applyTechnicalUpgrades: existing snapshots are re-rendered today (T_now) with current extraction rules, but the fetchDate of the original snapshot (T_old) is semantically preserved. The commit is created at T_now but dated at T_old.
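To make the loss of information concrete, here is a minimal sketch of the current behaviour. `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` are Git's own environment variables; the helper name `buildCurrentCommitEnv` and the record shape are assumptions made for illustration, not the engine's actual code.

```javascript
// Hypothetical helper: builds the environment passed to `git commit`.
// Both Git dates are set from fetchDate, as described above.
function buildCurrentCommitEnv(record) {
  const isoDate = record.fetchDate.toISOString();

  return {
    GIT_AUTHOR_DATE: isoDate, // when the source content was captured
    GIT_COMMITTER_DATE: isoDate, // overwritten too, so the real recording time is lost
  };
}

// A technical upgrade re-renders an old snapshot today (T_now)...
const upgradedRecord = { fetchDate: new Date('2021-03-15T10:00:00Z') }; // T_old
const currentEnv = buildCurrentCommitEnv(upgradedRecord);

// ...but both dates in the resulting commit say T_old.
console.log(currentEnv);
```

Nothing in the resulting commit metadata allows T_now to be recovered afterwards.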

Initial problem that surfaced the topic

While adding the Atom feed endpoints (PR #1242), it became apparent that paginating findAll/findByService/findByServiceAndTermsType with --max-count on the Git side can return incorrect results after a batch of technical upgrades: recently created but backdated commits occupy the first topological positions, pushing chronologically more recent commits (by fetchDate) out of the window. The JavaScript sort at the end doesn't fix this: it only reorders what Git already returned.

For the feed, the decision was made that technical upgrades don't belong (a re-render is not a change event for subscribers), so they are filtered out via an option (see PR #1242). This solves the feed problem without touching the data model.
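The pagination problem can be reproduced with a small in-memory simulation. This is only a sketch: the array stands in for the order in which Git lists a linear history (most recently created first), `slice` stands in for `--max-count`, and all names and dates are made up for illustration.

```javascript
// Commits as Git would list them: most recently created first.
// The two technical upgrades were created last but are backdated to 2020.
const gitLogOrder = [
  { id: 'upgrade-2', fetchDate: new Date('2020-02-01T00:00:00Z') },
  { id: 'upgrade-1', fetchDate: new Date('2020-01-01T00:00:00Z') },
  { id: 'update-3', fetchDate: new Date('2024-03-01T00:00:00Z') },
  { id: 'update-2', fetchDate: new Date('2024-02-01T00:00:00Z') },
  { id: 'update-1', fetchDate: new Date('2024-01-01T00:00:00Z') },
];

const byFetchDateDesc = (a, b) => b.fetchDate - a.fetchDate;

// Stand-in for `git log --max-count=<limit>` followed by the JavaScript sort.
function paginateLikeGit(commits, limit) {
  return commits.slice(0, limit).sort(byFetchDateDesc);
}

// What a caller asking for "the latest records by fetchDate" expects.
function paginateByFetchDate(commits, limit) {
  return [...commits].sort(byFetchDateDesc).slice(0, limit);
}

const actualIds = paginateLikeGit(gitLogOrder, 3).map(commit => commit.id);
const expectedIds = paginateByFetchDate(gitLogOrder, 3).map(commit => commit.id);

console.log(actualIds); // [ 'update-3', 'upgrade-2', 'upgrade-1' ]
console.log(expectedIds); // [ 'update-3', 'update-2', 'update-1' ]
```

The sort runs after the window has already been cut, so `update-2` and `update-1` never reach it.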

However, any future use case that exposes versions to end users (e.g. a navigation/exploration UI with diffs between versions) would need to include technical upgrades, otherwise two versions separated by a technical upgrade would show a false diff. Such a use case would require both chronological dimensions to be properly exposed.

Solutions considered

Option A: Filter out technical upgrades from all chronological queries

Adopted for the feed PR (#1242), but insufficient for the navigation UI, which needs to see them.

Option B: Split by prefix inside #getCommits

Run two Git queries (one for "real change" commits, one for "technical upgrade" commits) each with --max-count=X, merge them, JS-sort, then slice to X.

This is mostly correct because startTracking/update commits are never backdated, so their topological order matches their chronological order; topo-pagination of that subset is exact. Backdated commits are isolated in the second query, where we accept that the X most recently created are returned regardless of their fetchDate. The merge + sort + slice then surfaces the truly recent commits at the top, and backdated ones land at their semantic position.

Residual edge case: a single applyTechnicalUpgrades run on more than X services with heterogeneous snapshot ages, where the iteration order happens to put the recently-dated snapshots first (so their commits sit at lower topological positions). The --max-count=X window for the technical upgrade query then captures the older-dated upgrades, and the recently-dated ones are missed.
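Option B and its residual edge case can be sketched as follows. This is an in-memory approximation, not the actual #getCommits code: the `isTechnicalUpgrade` flag stands in for the commit message prefix match, `queryByKind` stands in for one Git query capped with `--max-count`, and all data is invented.

```javascript
// Hypothetical stand-in for one Git query restricted to a kind of commit.
// `history` is ordered most recently created first.
function queryByKind(history, isTechnicalUpgrade, limit) {
  return history
    .filter(commit => commit.isTechnicalUpgrade === isTechnicalUpgrade)
    .slice(0, limit);
}

// Two capped queries, then merge, sort by fetchDate and slice.
function getCommitsSplitByKind(history, limit) {
  const realChanges = queryByKind(history, false, limit);
  const technicalUpgrades = queryByKind(history, true, limit);

  return [ ...realChanges, ...technicalUpgrades ]
    .sort((a, b) => b.fetchDate - a.fetchDate)
    .slice(0, limit)
    .map(commit => commit.id);
}

const commit = (id, fetchDate, isTechnicalUpgrade) =>
  ({ id, fetchDate: new Date(fetchDate), isTechnicalUpgrade });

// Usual case: the upgrades are all older (by fetchDate) than the real changes.
const usualHistory = [
  commit('upgrade-2', '2020-02-01T00:00:00Z', true),
  commit('upgrade-1', '2020-01-01T00:00:00Z', true),
  commit('update-3', '2024-03-01T00:00:00Z', false),
  commit('update-2', '2024-02-01T00:00:00Z', false),
  commit('update-1', '2024-01-01T00:00:00Z', false),
];
const usualPage = getCommitsSplitByKind(usualHistory, 3);
console.log(usualPage); // [ 'update-3', 'update-2', 'update-1' ]

// Edge case: one run upgrades 3 services with a limit of 2, and the recently
// dated snapshot was processed first, so its commit sits deepest in the history.
const edgeCaseHistory = [
  commit('upgrade-old-2', '2018-01-01T00:00:00Z', true),
  commit('upgrade-old-1', '2019-01-01T00:00:00Z', true),
  commit('upgrade-recent', '2024-02-15T00:00:00Z', true),
  commit('update-2', '2024-02-01T00:00:00Z', false),
  commit('update-1', '2024-01-01T00:00:00Z', false),
];
const edgeCasePage = getCommitsSplitByKind(edgeCaseHistory, 2);
console.log(edgeCasePage); // [ 'update-2', 'update-1' ], 'upgrade-recent' is missed
```

In the edge case the correct page by fetchDate would start with `upgrade-recent`, but the capped technical upgrade query never returns it.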

Option C: Decouple authorDate = fetchDate from commitDate = saveDate

Use Git's native distinction, expose saveDate as a field in the Record model, and stop overwriting GIT_COMMITTER_DATE at commit time. On the Mongo side, surface created_at as saveDate.
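A minimal sketch of what this decoupling could look like, under stated assumptions: the helper names, the parsed commit shape (`hash`, `authorDate`, `committerDate`) and the Mongo document's `fetchDate` field name are hypothetical; only `GIT_AUTHOR_DATE` and `created_at` come from the description above.

```javascript
// Git side, write path: only the author date is forced.
// GIT_COMMITTER_DATE is left unset so Git records the actual commit time.
function buildDecoupledCommitEnv(record) {
  return { GIT_AUTHOR_DATE: record.fetchDate.toISOString() };
}

// Git side, read path: map an already-parsed commit to a Record-like object.
function recordFromCommit(parsedCommit) {
  return {
    id: parsedCommit.hash,
    fetchDate: new Date(parsedCommit.authorDate), // content validity date
    saveDate: new Date(parsedCommit.committerDate), // when the engine recorded it
  };
}

// Mongo side, read path: surface the existing created_at as saveDate.
function recordFromMongoDocument(document) {
  return {
    id: String(document._id),
    fetchDate: document.fetchDate,
    saveDate: document.created_at,
  };
}

// The same backdated technical upgrade, read from each storage.
const fromGit = recordFromCommit({
  hash: 'abc123',
  authorDate: '2021-03-15T10:00:00Z',
  committerDate: '2024-06-01T08:30:00Z',
});
const fromMongo = recordFromMongoDocument({
  _id: 'abc123',
  fetchDate: new Date('2021-03-15T10:00:00Z'),
  created_at: new Date('2024-06-01T08:30:00Z'),
});

console.log(fromGit.saveDate > fromGit.fetchDate); // true: saved after the content was fetched
```

With both storages mapped this way, consumers see the same two fields regardless of the backend.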

Benefits:

  • Aligns the Git model with the Mongo model (which already stores both dates internally, via fetchDate and created_at).
  • Gives consumers a clean semantic axis for each question.
  • Enables pagination by saveDate (= topological order in a linear history) without JS sort and without incorrectness.
  • Lets the navigation UI display both the content validity date AND the date of the last re-render.
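The pagination benefit can be illustrated with a short sketch. It assumes a linear history in which the committer date is no longer overwritten, so the listing order already equals descending saveDate; `slice` stands in for Git's `--skip` and `--max-count` options, and the data is invented.

```javascript
// Records in listing order: most recently saved first.
const linearHistory = [
  { id: 'upgrade-of-2020-snapshot', fetchDate: new Date('2020-01-01T00:00:00Z'), saveDate: new Date('2024-05-02T00:00:00Z') },
  { id: 'update-b', fetchDate: new Date('2024-05-01T00:00:00Z'), saveDate: new Date('2024-05-01T00:00:00Z') },
  { id: 'update-a', fetchDate: new Date('2024-04-01T00:00:00Z'), saveDate: new Date('2024-04-01T00:00:00Z') },
];

// Paginating by saveDate is a plain window over the listing: no sort needed.
function pageBySaveDate(history, limit, offset = 0) {
  return history.slice(offset, offset + limit);
}

const firstPage = pageBySaveDate(linearHistory, 2);
const secondPage = pageBySaveDate(linearHistory, 2, 2);

console.log(firstPage.map(record => record.id)); // [ 'upgrade-of-2020-snapshot', 'update-b' ]
console.log(secondPage.map(record => record.id)); // [ 'update-a' ]
```

Each record still carries its fetchDate, so a consumer that needs the semantic position of the backdated upgrade can read it from the record itself.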
