Skip to content

Bump chardet from 5.2.0 to 7.4.1#1670

Open
dependabot[bot] wants to merge 1 commit intomainfrom
dependabot/pip/chardet-7.4.1
Open

Bump chardet from 5.2.0 to 7.4.1#1670
dependabot[bot] wants to merge 1 commit intomainfrom
dependabot/pip/chardet-7.4.1

Conversation

@dependabot
Copy link
Copy Markdown
Contributor

@dependabot dependabot bot commented on behalf of github Apr 8, 2026

Bumps chardet from 5.2.0 to 7.4.1.

Release notes

Sourced from chardet's releases.

7.4.1

Bug Fixes

  • BOM-prefixed UTF-16/32 input now returns utf-16/utf-32 instead of utf-16-le/utf-16-be/utf-32-le/utf-32-be. The endian-specific codecs don't strip the BOM on decode, so callers were getting a stray U+FEFF at the start of their text. BOM-less detection is unchanged. (#364, #365)

Full Changelog: chardet/chardet@7.4.0...7.4.1

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and significantly faster cold start thanks to a new dense model format.

What's New

Performance:

  • New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

  • Eliminated train/test data overlap via content fingerprinting
  • Added MADLAD-400 and Wikipedia as supplemental training sources
  • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF
  • Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)
  • Increased training samples from 15K to 25K per language/encoding pair

Bug fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

chardet 7.4.0 (mypyc) chardet 6.0.0 charset-normalizer 3.4.6
Accuracy (2,517 files) 99.3% 88.2% 85.4%
Speed 551 files/s 12 files/s 376 files/s
Language detection 95.7% 40.0% 59.2%

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.3.0

License

  • 0BSD license — the project license has been changed from MIT to 0BSD, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release.

Features

  • Added mime_type field to detection results — identifies file types for both binary (via magic number matching) and text content. Returned in all detect(), detect_all(), and UniversalDetector results. (#350)
  • New pipeline/magic.py module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (#350)

Bug Fixes

  • Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing — these are distinct encodings with different byte order, not interchangeable

Performance

... (truncated)

Changelog

Sourced from chardet's changelog.

7.4.1 (2026-04-07)

Bug Fixes:

  • BOM-prefixed UTF-16 and UTF-32 input now reports utf-16 and utf-32 instead of the endian-specific variants. Python's utf-16-le/utf-16-be/utf-32-le/utf-32-be codecs keep the BOM as a U+FEFF in the decoded string, while utf-16/utf-32 strip it, so callers passing the detection result directly to .decode() were getting a stray BOM at the start of their text. BOM-less UTF-16/32 detection (via null-byte patterns) is unchanged and still returns the endian-specific name. (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude, [#364](https://github.com/chardet/chardet/issues/364) <https://github.com/chardet/chardet/issues/364>, [#365](https://github.com/chardet/chardet/issues/365) <https://github.com/chardet/chardet/pull/365>)

7.4.0 (2026-03-26)

Performance:

  • Switched to dense zlib-compressed model format (v2): models are now stored as contiguous memoryview slices of a single decompressed blob, eliminating per-model struct.unpack overhead. Cold start (import + first detect) dropped from ~75ms to ~13ms with mypyc. (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude, [#354](https://github.com/chardet/chardet/issues/354) <https://github.com/chardet/chardet/pull/354>_)

Accuracy:

  • Accuracy improved from 98.6% to 99.3% (2499/2517 files) through a combination of training and scoring improvements:

    • Eliminated train/test data overlap by content-fingerprinting test suite articles and excluding them from training data ([#351](https://github.com/chardet/chardet/issues/351) <https://github.com/chardet/chardet/pull/351>_)
    • Added MADLAD-400 and Wikipedia as supplemental training sources to fill gaps left by exclusion filtering ([#351](https://github.com/chardet/chardet/issues/351) <https://github.com/chardet/chardet/pull/351>_)
    • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training (instead of being crushed by global normalization), and weighted by per-bigram IDF so encoding-specific byte patterns contribute proportionally to how discriminative they are ([#352](https://github.com/chardet/chardet/issues/352) <https://github.com/chardet/chardet/pull/352>_)
    • Added encoding-aware substitution filtering: character substitutions during training now only apply for characters the target encoding cannot represent
    • Increased training samples from 15K to 25K per language/encoding pair (Dan Blanchard <https://github.com/dan-blanchard>_ via Claude)

... (truncated)

Commits
  • d9ae78d docs: changelog for 7.4.1
  • 2a54c68 Return utf-16/utf-32 (not -le/-be) when a BOM is present (#365)
  • c63c632 Address GitHub code quality findings and add missing test coverage
  • 1ad8e6a Revert "Add PyInstaller hook to collect mypyc shared runtime library (#359)" ...
  • 7fb0563 Add PyInstaller hook to collect mypyc shared runtime library (#359)
  • 2d75e6d Link to blogpost in README
  • e37cf3c fix: prevent dirty-tree version in Windows mypyc wheel builds
  • f9f5af2 Fix a couple errors in the changelog
  • 53755de chore: add .superpowers/ to .gitignore
  • 3a20df6 docs: update README examples with correct outputs
  • Additional commits viewable in compare view

@dependabot dependabot bot added dependencies Pull requests that update a dependency file python Pull requests that update python code labels Apr 8, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.69%. Comparing base (efaab69) to head (38d3ac1).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1670   +/-   ##
=======================================
  Coverage   54.69%   54.69%           
=======================================
  Files         336      336           
  Lines       27440    27440           
=======================================
  Hits        15007    15007           
  Misses      12433    12433           
Flag Coverage Δ
functionaltests 0.00% <ø> (ø)
unittests 54.69% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.4.1.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.4.1)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/pip/chardet-7.4.1 branch from d2b3a1f to 38d3ac1 Compare April 8, 2026 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file python Pull requests that update python code

Development

Successfully merging this pull request may close these issues.

0 participants