Skip to content

Sanitize credential-like URLs in telemetry events to avoid False CredScan detection#340

Open
yashnap wants to merge 16 commits into
masterfrom
37515045lpe-uri-credential-redaction
Open

Sanitize credential-like URLs in telemetry events to avoid False CredScan detection#340
yashnap wants to merge 16 commits into
masterfrom
37515045lpe-uri-credential-redaction

Conversation

@yashnap
Copy link
Copy Markdown
Contributor

@yashnap yashnap commented Apr 21, 2026

Problem Statement :

An automated Credential Scan runtime telemetry scan raised an ICM against Azure Guest Patching Service indicating the presence of credential‑like values in extension status telemetry uploaded to Geneva.

CredScan periodically scans Microsoft‑owned telemetry storage and raises incidents when credential‑like patterns are detected.

Analysis indicates that package manager stdout/stderr (e.g., apt, yum, dnf) emits customer‑configured repository authentication values in URI userinfo format (e.g., username:secret@host). When included in LPE extension payloads, these values get uploaded to Geneva telemetry, where CredScan runtime scanning subsequently classifies them as potential credential leaks. Certain format :password that seemed to have been deprecated but the validations surrounding the previous checks are still present flagging the values that are in similar format.

While these values originate from customer‑owned VM configuration and are required for local package installation and diagnostics, their propagation into Microsoft‑owned telemetry systems results in recurring automated CredScan incidents which are false positives.

SOLUTION

This change introduces targeted sanitization at the telemetry export boundary:

  • Added a shared URI sanitization helper in Utility.py
  • Applied sanitization before telemetry JSON writes in both telemetry writers (extension and core paths)
  • Local logs are unchanged since these events are directly written to events folders which is read by CredScan.
  • Added UT's to support the change.

TESTING STRATEGY

  • Created VM ( RHEL in my case) and verified the linux agent is running on it. ( systemctl status waagent)
  • Created Zip from my local code changes and followed steps to run it on the VM
  • Added more verbose logging in the yum update, install commands because it was redacting the URL's itself before emitting locally. (Temporary local setup)
  • Simulated a failure using Wrong URL with credentials exposed that runs when the extension is enabled
  • Verified in /events and /logs ( My handlerEnvironment.json had events located at /tmp/x/events and logs at tmp/x/log) that the log folder ihas the credentials still present vs events folder *.json have them redacted.

Logs Folder Output

logsFolder

Events Folder Output

eventsFolder

APPROACH

  • Created Sanitization class in both core and extension package and injected the Dependency in respective Telemetry class.

  • Good if we want to add more patterns in the futurem, no consumer impact and backward compatible.

  • I have tested out the changes on my local using both RHEL/Linux VM machine.

  • I have considered approach where logic can be stay in one place (i.e Utility class) but Utility approach is discouraged.

  • I have considered adding the sanitizer logic in individual classes instead of creating separate class for Sanitization but from extensibility perspective the current approach I have is better.

Copilot AI review requested due to automatic review settings April 21, 2026 22:56
@yashnap
Copy link
Copy Markdown
Contributor Author

yashnap commented Apr 21, 2026

@microsoft-github-policy-service agree company="Microsoft"

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds URI-userinfo credential redaction to telemetry event messages to prevent false-positive CredScan detections when package manager outputs include username:secret@host URLs.

Changes:

  • Added Utility.sanitize_credentials_from_uri() and applied it at the telemetry event creation boundary.
  • Updated both extension and core telemetry writers to sanitize messages before writing event JSON.
  • Added unit tests covering URI credential sanitization scenarios.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/extension/src/Utility.py Introduces a shared sanitization helper to remove password/token from URI userinfo.
src/extension/src/TelemetryWriter.py Applies sanitization to extension telemetry messages before writing events.
src/core/src/service_interfaces/TelemetryWriter.py Applies sanitization to core telemetry messages before writing events.
src/extension/tests/Test_Utility.py Adds unit tests for the new sanitization helper.
src/extension/tests/Test_TelemetryWriter.py Adds tests validating extension telemetry writes are sanitized.
src/core/tests/Test_TelemetryWriter.py Adds tests validating core telemetry writes are sanitized.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/extension/src/TelemetryWriter.py Outdated
Comment on lines +77 to +78
# ...existing code...

Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The placeholder line # ...existing code... was added in the middle of the class and should be removed; it adds noise and can confuse future readers/tools since it looks like an unfinished merge artifact.

Suggested change
# ...existing code...

Copilot uses AI. Check for mistakes.
Comment on lines +173 to +205
if self.runtime.is_github_runner:
return None

# Write event to telemetry
self.telemetry_writer.write_event(message)

# Load the event file
event_files = os.listdir(self.telemetry_writer.events_folder_path)
with open(os.path.join(self.telemetry_writer.events_folder_path, event_files[0]), 'r+') as f:
events = json.load(f)
sanitized_message = events[0]["Message"]
f.close()
return sanitized_message

# ==================== Test cases for credential sanitization in telemetry messages ====================
def test_sanitize_credentials_multiple_repos(self):
"""Test 2: Failed repo sync with multiple repo URLs containing different credentials"""
message = "Failed repo sync: https://user1:token1@repo1.example.com https://user2:token2@repo2.example.com/path"

sanitized_message = self._load_sanitized_event(message)
expected_message = "Failed repo sync: https://user1@repo1.example.com https://user2@repo2.example.com/path"
self.assertEqual(sanitized_message, expected_message)

def test_sanitize_credentials_username_only_no_password(self):
"""Test 3: Using mirror with username only (no password)"""
message = "Using mirror https://testuser@repo.example.com/path"

sanitized_message = self._load_sanitized_event(message)
self.assertIn("testuser@repo.example.com", sanitized_message)

def test_sanitize_credentials_special_characters_in_password(self):
"""Test 4: Downloading from repo with special characters in password"""
message = "Downloading from https://svc-user:AbC_123-.$%!@repo.contoso.com/rpm"
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_load_sanitized_event() returns None on GitHub runners, but the new tests immediately assert on that return value (e.g., assertEqual / assertIn), which will fail (or raise TypeError) in CI. Prefer skipping these tests with unittest.skipIf(self.runtime.is_github_runner, ...) (or an early return at the start of each test) rather than returning None from the helper.

Copilot uses AI. Check for mistakes.
Comment on lines 25 to 28
from core.src.bootstrap.Constants import Constants
from extension.src.Utility import Utility


Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core/src/.../TelemetryWriter.py now imports extension.src.Utility. This introduces a cross-package dependency from core -> extension (and will break consumers that run core without the extension package on PYTHONPATH). Consider moving sanitize_credentials_from_uri into a core/shared module (or duplicating a small helper in core) and have both telemetry writers depend on that shared location instead.

Suggested change
from core.src.bootstrap.Constants import Constants
from extension.src.Utility import Utility
try:
from urllib.parse import urlsplit, urlunsplit
except ImportError:
from urlparse import urlsplit, urlunsplit
from core.src.bootstrap.Constants import Constants
class Utility(object):
@staticmethod
def sanitize_credentials_from_uri(uri):
if uri is None:
return uri
uri = str(uri)
try:
split_uri = urlsplit(uri)
except Exception:
return uri
netloc = split_uri.netloc
if "@" not in netloc:
return uri
credentials, host = netloc.rsplit("@", 1)
if ":" in credentials:
username = credentials.split(":", 1)[0]
sanitized_netloc = "{0}:***@{1}".format(username, host)
else:
sanitized_netloc = "***@{0}".format(host)
return urlunsplit((
split_uri.scheme,
sanitized_netloc,
split_uri.path,
split_uri.query,
split_uri.fragment))

Copilot uses AI. Check for mistakes.
Comment thread src/extension/src/Utility.py Outdated
Comment on lines +85 to +93
# Pattern matches: scheme://user:password@host → scheme://user@host
# Handles credentials containing special characters including @
# Groups:
# (1) scheme: https://, http://, or ftp://
# (2) username: one or more non-whitespace, non-slash, non-colon, non-@ characters
# (3) password: zero or more non-whitespace, non-slash, non-@ characters
sanitized_message = re.sub(
r'(https?://|ftp://)([^:/@\s]+):([^@/\s]*)@',
r'\1\2@',
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comment says the regex "Handles credentials containing special characters including @", but the pattern ([^@/\s]*) explicitly excludes @ from the password group. Either adjust the comment to match the actual behavior or update the pattern if @ in the password is truly a supported case.

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 69.59459% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.57%. Comparing base (525a32f) to head (31746d9).

Files with missing lines Patch % Lines
src/extension/tests/Test_TelemetryWriter.py 21.42% 33 Missing ⚠️
src/extension/src/CredentialSanitizer.py 41.66% 7 Missing ⚠️
src/core/tests/Test_TelemetryWriter.py 95.52% 3 Missing ⚠️
src/extension/src/TelemetryWriter.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #340      +/-   ##
==========================================
- Coverage   93.82%   86.57%   -7.25%     
==========================================
  Files         105      107       +2     
  Lines       18172    18312     +140     
==========================================
- Hits        17049    15853    -1196     
- Misses       1123     2459    +1336     
Flag Coverage Δ
python27 86.57% <69.59%> (-7.25%) ⬇️
python312 86.57% <69.59%> (-7.25%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rane-rajasi
Copy link
Copy Markdown
Contributor

@yashnap code coverage checks are failing for this one. We will need those to succeed for a review process to begin. Fyi, we check code coverage in python 3 and python 2. Run the test suites for both of these in your local, you can use any minor versions within python 2 and 3

Copy link
Copy Markdown
Contributor

@rane-rajasi rane-rajasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment here: #340 (comment)

@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from dc5ce77 to e88327a Compare April 23, 2026 15:44
Comment thread src/core/src/Utility.py Outdated
import re


class Utility(object):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We moved away from a Utility class as it becomes a dumping ground for functions. Ask this question to the engineering agent: "What are the drawbacks of having a Utility class in code? From a best practices perspective."

Also review SOLID principles in programming, with S = Single-responsibility principle.

Copy link
Copy Markdown
Collaborator

@kjohn-msft kjohn-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment inline. Separate: Always get @rane-rajasi's sign off first.

@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from d72bf5e to ac91bd5 Compare April 24, 2026 15:47
@yashnap yashnap requested a review from rane-rajasi April 24, 2026 15:57
@rane-rajasi
Copy link
Copy Markdown
Contributor

#340 (comment)
@yashnap Remove the email and ADO item links from PR description. This is an open-source code base, no confidential MS sources should be shared. If you feel these are needed, add these references in your PR review request post, never share them in here

Comment thread src/core/src/service_interfaces/CredentialSanitizer.py Outdated
Comment thread src/core/src/service_interfaces/CredentialSanitizer.py
Comment thread src/core/src/service_interfaces/CredentialSanitizer.py Outdated
Comment thread src/core/src/service_interfaces/TelemetryWriter.py Outdated
Comment thread src/core/src/service_interfaces/TelemetryWriter.py Outdated
Comment thread src/core/tests/Test_TelemetryWriter.py Outdated
Comment thread src/core/src/service_interfaces/CredentialSanitizer.py Outdated
Comment thread src/core/tests/Test_TelemetryWriter.py Outdated
Comment thread src/extension/src/CredentialSanitizer.py
Comment thread src/extension/tests/Test_TelemetryWriter.py Outdated
@yashnap yashnap requested a review from rane-rajasi April 28, 2026 17:05
Copy link
Copy Markdown
Contributor

@rane-rajasi rane-rajasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the comments in core\ apply to extension\ too

Comment thread src/core/src/service_interfaces/TelemetryWriter.py Outdated
Comment thread src/core/src/service_interfaces/CredentialSanitizer.py Outdated
Comment thread src/core/src/bootstrap/ConfigurationFactory.py Outdated
Comment thread src/core/src/service_interfaces/TelemetryWriter.py Outdated
Comment thread src/core/src/service_interfaces/TelemetryWriter.py Outdated
Comment thread src/extension/src/TelemetryWriter.py Outdated
Comment thread src/extension/tests/Test_TelemetryWriter.py
Comment thread src/core/tests/Test_TelemetryWriter.py Outdated
Comment thread src/core/tests/Test_TelemetryWriter.py Outdated
Comment thread src/core/tests/Test_TelemetryWriter.py Outdated
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 231ca7f to ae48d7a Compare April 30, 2026 16:57
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from ae48d7a to e0edee2 Compare April 30, 2026 17:27
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from fbde095 to 292547e Compare April 30, 2026 19:13
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from fc9c179 to 485020a Compare April 30, 2026 21:06
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 485020a to e489e9f Compare April 30, 2026 21:23
@yashnap
Copy link
Copy Markdown
Contributor Author

yashnap commented May 1, 2026

Working on fixing coverage

Copy link
Copy Markdown
Contributor

@rane-rajasi rane-rajasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback comments inline. I've closed all of my comments from previous reviews that were addressed, the ones still pending are left open/unresolved. Make sure you address all the open comments from previous reviews too.

And codecoverage checks are failing, please address those

Comment thread src/core/src/service_interfaces/CredentialSanitizer.py Outdated
Comment thread src/core/src/service_interfaces/CredentialSanitizer.py Outdated
Comment thread src/core/tests/Test_TelemetryWriter.py Outdated
Comment thread src/extension/src/TelemetryWriter.py Outdated
Comment thread src/core/tests/Test_TelemetryWriter.py Outdated
self.assertEqual(events[-1]["TaskName"], "Test Task")
message_without_tc = events[-1]["Message"][:events[-1]["Message"].rfind(" [TC=")]
self.assertEqual("Error connecting to https://testuser@invalid.repo.example/rpm/repodata/repomd.xml", message_without_tc)
f.close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract out common code reused in multiple tests to their own functions.

For eg: one function to open and read event from events folder, one to validate one event file and expected vs actual event (this internally calls the previous one), one for any setup process (such as clearing events folder and the assert), etc

Comment thread src/extension/tests/Test_TelemetryWriter.py Outdated
Comment thread src/extension/tests/Test_TelemetryWriter.py Outdated
Returns: The sanitized message from the event
"""
if self.runtime.is_github_runner:
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this check if you want this test to run always (for CI coverage), so you won't need to set and reset it in tests

self.assertEqual(events[-1]["TaskName"], "Test Task")
self.assertEqual("Failed to fetch from https://user1@host1.com/api and http://user2@host2.com/data",
events[-1]["Message"])
f.close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use _load_sanitized_event() here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer my comment in core/Test_TelemetryWriter on extracting code reused across tests

# Restore
self.runtime.is_github_runner = original_is_github_runner

def test_sanitize_credentials_multiple_urls_with_credentials_leak_in_input(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the function naming format you've used in core TelemetryWriter tests

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applies for all function names in this file

Comment thread src/core/src/bootstrap/Bootstrapper.py
Comment thread src/extension/tests/Test_TelemetryWriter.py Fixed
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 2c19857 to 440e33c Compare May 7, 2026 13:23
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from 440e33c to 3ddb799 Compare May 7, 2026 13:53
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from d1569e7 to da276da Compare May 8, 2026 18:01
@yashnap yashnap force-pushed the 37515045lpe-uri-credential-redaction branch from da276da to 31746d9 Compare May 8, 2026 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants