
cove: add check for large number of XLSX sheet rows #174

Merged
chrisarridge merged 1 commit into main from ca/xlsx-robustness-fix
May 12, 2026

@chrisarridge
Contributor

This PR fixes an issue where XLSX spreadsheets with a malformed number of rows generate very large unflattened JSON files. The fix applies only to files uploaded through the DQT web interface, not to command-line tools such as cove_checks.py. It also makes no changes to lib360dataquality (as used by, for example, the datagetter), because the datagetter unflattens first and then validates. A similar check could, however, be implemented in the datagetter.

The maximum number of rows is controlled by the MAX_XLSX_ROWS environment variable. Its default of 50000 was chosen by sampling existing files in the registry and adding a substantial margin.
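The PR does not show how the variable is read, so the following is only a minimal sketch, assuming a Django-style settings module evaluates MAX_XLSX_ROWS once at startup. The helper name `get_max_xlsx_rows` is hypothetical, and rejecting non-numeric or non-positive values is an assumption rather than documented behaviour.

```python
import os

# Default taken from the PR description (50000 rows).
DEFAULT_MAX_XLSX_ROWS = 50000


def get_max_xlsx_rows(environ=os.environ):
    """Return the row limit from MAX_XLSX_ROWS, falling back to the default.

    Hypothetical helper: empty or unset values use the default; non-numeric
    values raise ValueError via int(); zero or negative limits are rejected.
    """
    raw = environ.get("MAX_XLSX_ROWS")
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_XLSX_ROWS
    value = int(raw)  # int() tolerates surrounding whitespace
    if value <= 0:
        raise ValueError(f"MAX_XLSX_ROWS must be positive, got {value}")
    return value


# In a settings module this would be evaluated once, at import time.
MAX_XLSX_ROWS = get_max_xlsx_rows()
```

Failing loudly on a bad value, rather than silently falling back to 50000, is a design choice: a typo in deployment configuration then surfaces at startup instead of quietly disabling the intended limit.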

Note: a more fundamental fix would be to add this as an optional check in flattentool and then enable that check across the various tools.

This commit fixes an issue where XLSX spreadsheets with a malformed
number of rows generate very large unflattened JSON files.  The fix
simply checks for an excessive number of rows before passing off
to cove/flattentool.  The number of rows considered excessive is
controllable through the MAX_XLSX_ROWS environment variable and set
by default to 50000.
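The commit does not say how rows are counted before the file reaches cove/flattentool. One stdlib-only way to do it, sketched here under stated assumptions, is to stream each worksheet's XML out of the .xlsx zip archive and take the larger of the `<dimension ref="...">` end row and the highest `<row>` index actually present. The function names are hypothetical, and the sketch assumes the conventional `xl/worksheets/sheetN.xml` layout instead of resolving sheet paths through the workbook relationships.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by worksheet XML inside an .xlsx archive.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
# Trailing row number of a dimension ref such as "A1:Z1048576" or "A1".
_LAST_ROW = re.compile(r"(\d+)$")


def sheet_row_extent(xml_file, limit=None):
    """Return the highest row index a worksheet declares or contains.

    If ``limit`` is given, parsing stops as soon as the extent exceeds it,
    so the returned value is then only a lower bound.
    """
    declared = 0  # end row from <dimension>, if present
    highest = 0   # highest row index seen inside <sheetData>
    sheet_data = None
    for event, elem in ET.iterparse(xml_file, events=("start", "end")):
        if event == "start":
            if elem.tag == NS + "sheetData":
                sheet_data = elem
            continue
        if elem.tag == NS + "dimension":
            match = _LAST_ROW.search(elem.get("ref", ""))
            if match:
                declared = int(match.group(1))
        elif elem.tag == NS + "row":
            # The r attribute is optional; without it rows are sequential.
            r = elem.get("r")
            highest = int(r) if r is not None else highest + 1
            if sheet_data is not None:
                sheet_data.clear()  # drop finished rows to bound memory use
        extent = max(declared, highest)
        if limit is not None and extent > limit:
            return extent
    return max(declared, highest)


def find_oversized_sheets(xlsx_file, max_rows):
    """Map each worksheet part whose row extent exceeds max_rows to that extent."""
    oversized = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                with archive.open(name) as xml_file:
                    extent = sheet_row_extent(xml_file, limit=max_rows)
                if extent > max_rows:
                    oversized[name] = extent
    return oversized
```

An upload handler could then reject the file with a user-facing error whenever `find_oversized_sheets(path, MAX_XLSX_ROWS)` returns a non-empty dict. Stopping early at the limit keeps the check cheap on exactly the malformed files it is meant to catch.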
@chrisarridge chrisarridge requested review from BibianaC and R2ZER0 March 25, 2026 18:03
@coveralls

Pull Request Test Coverage Report for Build 23556356187

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 6 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 81.013%

Files with coverage reduction:
  • cove_project/settings.py: 6 new missed lines, 83.64% covered

Totals:
  • Change from base Build 23540491931: 0.2%
  • Covered Lines: 64
  • Relevant Lines: 79

💛 - Coveralls
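The overall figure in the report is consistent with the totals, assuming line coverage is computed as covered lines over relevant lines:

```latex
\text{coverage} = \frac{\text{covered lines}}{\text{relevant lines}} = \frac{64}{79} \approx 0.81013 = 81.013\%
```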

@chrisarridge chrisarridge merged commit 783b746 into main May 12, 2026
4 checks passed
@chrisarridge chrisarridge deleted the ca/xlsx-robustness-fix branch May 12, 2026 09:46