
Spike: Velocity macro library (VM_global_library.vm) not loaded before request serving on pod restart — #renderMarks renders as literal text #35329

@dotCMS-Machine-User


Summary

After a pod restart (triggered by JVM memory pressure, a Kubernetes node change, or a dotEvergreen update), the Velocity macro library (VM_global_library.vm) is sometimes not fully loaded before the application begins serving HTTP requests. When this happens, macros defined in that library — primarily #renderMarks($element) and #editContentlet — render as literal text on all public-facing pages. The issue self-resolves once the engine fully initializes, typically within 20–30 minutes, or after a targeted node restart.

This is a recurring platform-level bug that has affected multiple cloud customers across at least five support tickets spanning February–April 2026.


Observed Behavior

  • Velocity macros (#renderMarks($element), #editContentlet) appear as raw, unrendered text in LIVE/rendering mode
  • EDIT mode is unaffected (uses a different rendering path)
  • Issue is intermittent and self-resolves after ~20–30 minutes, consistent with engine initialization completing
  • In multi-node environments, the issue may affect only one node at a time, making it appear random to end users
  • Cache flush and manual page republishing do not resolve it — the issue is in the engine state, not the content cache
  • Errors are not always visible in the dotCMS application logs (at least one case had no log errors at all)

Root Cause Hypothesis

VelocityEngine is lazily initialized

VelocityUtil.java:59 uses Lazy.of() to initialize the VelocityEngine. The engine loads VM_global_library.vm, dotCMS_library.vm, and dotCMS_library_ext.vm at init time via the velocimacro.library config property.

If the first rendering request arrives before Lazy.of() has resolved (i.e., the engine is not yet initialized), the macro library is not available. With velocimacro.library.autoreload=false, there is no self-correction mechanism — macros remain unresolvable until the engine is fully re-initialized or the process restarts.

Relevant config:

velocimacro.library=VM_global_library.vm,dotCMS_library.vm,dotCMS_library_ext.vm
velocimacro.library.autoreload=false
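
The lazy-versus-eager distinction in this hypothesis can be illustrated with a minimal, self-contained sketch. `LazyHolder` is a hypothetical stand-in written for this example; it is not the `Lazy` class used in VelocityUtil.java, and `buildEngine()` is a placeholder for the real engine setup. The point is only that an explicit `get()` during startup moves the initialization cost ahead of the first request.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical memoizing holder for illustration; not the dotCMS Lazy class. */
final class LazyHolder<T> implements Supplier<T> {
    private final Supplier<T> initializer;
    private volatile T value; // assumes the initializer never returns null

    LazyHolder(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    /** True once the initializer has completed and its result is published. */
    boolean isInitialized() {
        return value != null;
    }

    @Override
    public T get() {
        T local = value;
        if (local == null) {
            synchronized (this) {
                local = value;
                if (local == null) {
                    local = initializer.get(); // runs at most once
                    value = local;
                }
            }
        }
        return local;
    }
}

final class LazyVsEagerDemo {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    /** Placeholder for expensive engine setup, including macro library loading. */
    static String buildEngine() {
        INIT_COUNT.incrementAndGet();
        return "engine-with-macros";
    }

    public static void main(String[] args) {
        LazyHolder<String> engine = new LazyHolder<>(LazyVsEagerDemo::buildEngine);
        System.out.println("initialized before warm-up: " + engine.isInitialized());

        // Eager warm-up: force initialization during startup, before serving traffic.
        engine.get();
        System.out.println("initialized after warm-up: " + engine.isInitialized());

        // Later request-time calls reuse the same instance.
        engine.get();
        System.out.println("initializer runs: " + INIT_COUNT.get());
    }
}
```

Whether a startup-time warm-up is enough depends on how the real holder behaves under concurrent first access, which is part of what the spike needs to confirm.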

No readiness probe gates traffic on Velocity engine readiness

The Kubernetes deployment spec does not appear to include a readiness probe that verifies Velocity engine initialization before routing traffic to a pod. This means a pod can receive live traffic while still in the startup phase, before the macro library is loaded.
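
As a rough sketch of what such a gate could look like on the application side, the example below exposes an HTTP endpoint that returns 503 until a readiness flag is set and 200 afterwards. It uses the JDK's built-in `com.sun.net.httpserver.HttpServer` so that it runs standalone. The `/ready` path, the `ReadinessEndpoint` class, and `markEngineReady()` are assumptions made for illustration; none of them exist in dotCMS, and a real probe would more likely hook into the existing servlet container.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical readiness endpoint; names and path are illustrative only. */
final class ReadinessEndpoint {
    private final AtomicBoolean engineReady = new AtomicBoolean(false);
    private final HttpServer server;

    ReadinessEndpoint(int port) throws IOException {
        // Port 0 asks the OS for any free port, which is convenient for local testing.
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ready", exchange -> {
            boolean ready = engineReady.get();
            byte[] body = (ready ? "READY" : "NOT_READY").getBytes(StandardCharsets.UTF_8);
            // 503 tells the orchestrator to keep the pod out of rotation.
            exchange.sendResponseHeaders(ready ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    void start() {
        server.start();
    }

    void stop() {
        server.stop(0);
    }

    int port() {
        return server.getAddress().getPort();
    }

    /** To be called once the engine and macro libraries are confirmed loaded. */
    void markEngineReady() {
        engineReady.set(true);
    }
}
```

A Kubernetes readiness probe configured with an HTTP GET against this kind of endpoint would keep traffic away from the pod until the flag flips. What "confirmed loaded" means for the Velocity engine is exactly the open question in this spike.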

Apache Velocity upstream behavior

Apache VELOCITY-86 documents that Velocity logs an error if VM_global_library.vm is not found during initialization. If initialization is incomplete (rather than failing outright), this error may not always appear in logs — consistent with the cases where customers reported no log errors.


Affected Customer Tickets

Ticket | Date | Environment | Notes
#36283 | Apr 8, 2026 | Prod (Cloud) | JVM pressure → pod restart → #renderMarks + #editContentlet literal text. Self-resolved in ~30 min. Cloud confirmed K8s spot node change.
#36427 | Apr 13, 2026 | Staging (Cloud) | Two-node cluster; one node affected post-restart, other fine, which explains the intermittency. Resolved after targeted node restart.
#35580 | Feb 23, 2026 | Prod (Cloud) | After 2/18 update; #renderMarks($element) across ALL sites. No log errors. Self-resolved. Same instance as #36283 (recurrence).
#35332 | Feb 3, 2026 | Prod (Cloud) | #renderMarks($element) after recent update, version dotcms-core 26.01.16-01. Intermittent; affected both preview and live. Resolved after restart.
#35551 | Feb 18, 2026 | QA (Cloud) | Block editor rendering failure in upgraded QA env. Also showed Velocity parse errors in a customer .vtl file. #renderMarks issue resolved after node restart.

Pattern: 5 tickets across 3+ customers, Feb–Apr 2026, all on dotEvergreen Cloud, all triggered by restarts.


Spike Goals

This is an investigation spike, not a fix. The goal is to answer the following questions so an implementation ticket can be scoped:

  1. Lazy vs. eager initialization — Is the lazy initialization in VelocityUtil.java intentional? What is the cost of switching to eager initialization so the engine is guaranteed ready before the first request is served?

  2. Readiness probe — What would a Kubernetes readiness probe look like that validates Velocity engine and macro library initialization? Is this feasible to add to the Cloud deployment spec? Who owns this (dotCMS core vs. Cloud infra)?

  3. velocimacro.library.autoreload — What is the impact of setting autoreload=true in production? Does it introduce performance overhead significant enough to rule it out as a mitigation?

  4. Startup sequencing — Is there an existing mechanism (e.g., a startup event, an InitServlet, or a @PostConstruct) that could be used to block the HTTP listener until the Velocity engine reports ready? Or does the engine need to expose a readiness check method?

  5. Log coverage — Why do some cases produce no Velocity errors in the logs when macros fail to render? Is the failure mode silent in certain startup states? If so, can we add explicit logging for VM_global_library.vm load success/failure?

  6. Self-hosted impact — Does this race condition exist in self-hosted deployments, or is it specific to the Kubernetes rolling-restart behavior in Cloud?
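
For the readiness-probe and log-coverage questions, one possible building block is a smoke check that scans rendered output for macro names that should never appear literally. The sketch below is a plain string scan and assumes nothing about Velocity internals. `UnrenderedMacroCheck` and `findLiteralMacros` are hypothetical names, and the macro list is taken from this issue. Note that it would report false positives on pages that intentionally display these macro names as text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical smoke check: detects macro calls that were emitted as literal text. */
final class UnrenderedMacroCheck {
    // Macro names mentioned in this issue; a real check would need a maintained list.
    static final List<String> MACRO_NAMES = List.of("renderMarks", "editContentlet");

    /** Returns the names of macros that appear unrendered in the given output. */
    static List<String> findLiteralMacros(String renderedOutput) {
        List<String> found = new ArrayList<>();
        for (String name : MACRO_NAMES) {
            if (renderedOutput.contains("#" + name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String broken = "<p>#renderMarks($element)</p>";
        String healthy = "<p><strong>Hello</strong></p>";

        // A startup hook could log this explicitly instead of failing silently.
        System.out.println("broken page  -> " + findLiteralMacros(broken));
        System.out.println("healthy page -> " + findLiteralMacros(healthy));
    }
}
```

Run against a known test template after startup, a non-empty result could drive both an explicit log line and the readiness state.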


Expected Output

A follow-up implementation issue (or issues) scoped to one or more of:

  • Eager initialization of VelocityEngine with a guaranteed-loaded state before HTTP serving begins
  • A Kubernetes readiness probe tied to Velocity engine initialization
  • Explicit startup logging for macro library load state
  • Documentation of the known gap for self-hosted operators

Supporting Code References

  • VelocityUtil.java:59 (Lazy.of() engine initialization)
  • velocimacro.library config — VM_global_library.vm, dotCMS_library.vm, dotCMS_library_ext.vm
  • VM_global_library.vm — defines #renderMarks, #editContentlet, and other core rendering macros
  • Apache VELOCITY-86 — upstream issue documenting failure behavior when the library file is not found at init time
