
[patch] vllm amzn2023 CVEs #5818

Closed
Yadan-Wei wants to merge 11 commits into main from vllm-cves

Conversation

@Yadan-Wei
Contributor

Purpose

Test Plan

Test Result


Toggle if you are merging into master Branch

By default, docker image builds and tests are disabled. Two ways to run builds and tests:

  1. Using dlc_developer_config.toml
  2. Using this PR description (currently only supported for PyTorch, TensorFlow, vllm, and base images)
How to use the helper utility for updating dlc_developer_config.toml

Assuming your remote is called origin (you can find out more with git remote -v)...

  • Run default builds and tests for a particular buildspec - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin

  • Enable specific tests for a buildspec or set of buildspecs - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin

  • Restore TOML file when ready to merge

python src/prepare_dlc_dev_environment.py -rcp origin

NOTE: If you are creating a PR for a new framework version, please ensure success of the local, standard, rc, and efa sagemaker tests by updating the dlc_developer_config.toml file:

  • sagemaker_remote_tests = true
  • sagemaker_efa_tests = true
  • sagemaker_rc_tests = true
  • sagemaker_local_tests = true
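
Taken together, the flags above correspond to a dlc_developer_config.toml fragment along these lines (a sketch only; the enclosing table name `[test]` is an assumption, so check the real file for the correct section):

```toml
# Hypothetical fragment of dlc_developer_config.toml.
# The [test] table name is illustrative; the flag names come from the
# checklist above.
[test]
sagemaker_remote_tests = true
sagemaker_efa_tests = true
sagemaker_rc_tests = true
sagemaker_local_tests = true
```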
How to use PR description

Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:
  • # /buildspec <buildspec_path>
    • e.g.: # /buildspec pytorch/training/buildspec.yml
    • If this line is commented out, dlc_developer_config.toml will be used.
  • # /tests <test_list>
    • e.g.: # /tests sanity security ec2
    • If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.
# /buildspec <buildspec_path>
# /tests <test_list>
Toggle if you are merging into main Branch

PR Checklist

  • [ ] I ran pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details.)

@aws-deep-learning-containers-ci aws-deep-learning-containers-ci Bot added the authorized and Size:XL (Determines the size of the PR) labels Mar 24, 2026
Yadan Wei added 3 commits March 23, 2026 21:05
---
X-AI-Tool: Human
X-AI-Prompt: can you summerize this PR #5763 so I can add discription in the pr

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
---
X-AI-Tool: Kiro-cli
X-AI-Handle-Time-Seconds: 74
X-AI-Prompt: can you look at this dockerfile sample https://github.com/aws/deep-learning-containers/pull/5808/changes#diff-aff16f8c535417fcf020bc2184ab09935e6c66cf46842f6ccee6d2022f4077ff to modify my dockerfile for oss setup /Volumes/workplace/kiro-workplace/AsimovBuilderCoreContext/src/AsimovBuilderCoreContext/workspace/2week/deep-learning-containers/docker/vllm/Dockerfile.amzn2023

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
---
X-AI-Tool: Human
X-AI-Prompt: can you look at this dockerfile sample https://github.com/aws/deep-learning-containers/pull/5808/changes#diff-aff16f8c535417fcf020bc2184ab09935e6c66cf46842f6ccee6d2022f4077ff to modify my dockerfile for oss setup /Volumes/workplace/kiro-workplace/AsimovBuilderCoreContext/src/AsimovBuilderCoreContext/workspace/2week/deep-learning-containers/docker/vllm/Dockerfile.amzn2023

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
Yadan Wei added 8 commits March 23, 2026 21:50
---
X-AI-Tool: Kiro-cli
X-AI-Handle-Time-Seconds: 183
X-AI-Prompt: for my build vllm container,how can I add benchmark test with popular models

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
---
X-AI-Tool: Kiro-cli
X-AI-Handle-Time-Seconds: 142
X-AI-Prompt: okay could you implement for me and could you find which s3 bucket sample pr is using, we can use the same one

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
---
X-AI-Tool: Kiro-cli
X-AI-Handle-Time-Seconds: 28
X-AI-Prompt: how the cache will be saved bucket/hash/**.o?

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
feat(vllm): add benchmark tests and sccache build acceleration

Add vLLM benchmark test infrastructure with configurable throughput
and latency thresholds per model, integrated into the PR workflow.
Benchmark tests run against gpt-oss-20b, llama-3.3-70b, and qwen3-32b
using both CodeBuild fleet and runner-scale-sets runners.

Add sccache with S3 backend to the vLLM build stage to cache compiled
object files across CI runs. This replaces the ineffective local ccache
mount (lost on ephemeral CodeBuild runners) and enables incremental
recompilation when upstream cherry-picks or patches change only a
subset of source files. sccache is conditionally enabled via the
SCCACHE_BUCKET build arg, reusing the existing WHEEL_CACHE_BUCKET
repository variable from the PyTorch workflow.
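
The conditional enablement described above can be sketched as a Dockerfile fragment (illustrative only, not the exact Dockerfile in this PR; the region value and the sccache install step are assumptions):

```dockerfile
# Sketch: conditionally enable sccache in the vLLM builder stage.
# SCCACHE_BUCKET defaults to empty, so builds without an S3 bucket
# fall back to plain compilation.
ARG SCCACHE_BUCKET=""
ENV SCCACHE_BUCKET=${SCCACHE_BUCKET} \
    SCCACHE_REGION=us-west-2

RUN python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38 \
    # Print cache hit/miss statistics only when caching was active.
    && if [ -n "${SCCACHE_BUCKET}" ]; then sccache --show-stats; fi
```

When SCCACHE_BUCKET is set, vLLM's build passes sccache as the CMake compiler launcher (visible in the failed-build log later in this PR as -DCMAKE_CXX_COMPILER_LAUNCHER=sccache), so object files are looked up in and written back to the S3 bucket.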

ai-dev-branch commit IDs:
  7da43b3
  d49c6f8
  23c5e1a
  1147845
  eb2f2f5
  8ea8588
  3535d41

The prompts used are captured in the footers of those commits.
The initial prompt was: can you summerize this PR
  #5763 so I
  can add discription in the pr

---
X-AI-Handle-Time-Seconds: 353
X-AI-Line-Changes: New:414, Altered:1, Deleted:0
X-Human-Line-Changes: New:0, Altered:0, Deleted:0
X-AI-Line-Changes-Kiro-cli: New:414, Altered:1, Deleted:0
X-AI-Handle-Time-Seconds-Kiro-cli: 353
X-AI-Change-Count: 3
X-Human-Change-Count: 0
X-AI-Change-Count-Kiro-cli: 3
X-CR-Amendment: false
---
X-AI-Tool: Kiro-cli
X-AI-Handle-Time-Seconds: 75
X-AI-Prompt: how my sample PR access s3 bucket, I think we do not need to do above things

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
fix(vllm): pass AWS credentials to sccache inside docker build

sccache cannot reach the EC2 instance metadata service (IMDS) from
inside a docker build container, causing S3 cache lookups to fail.
Fix by passing the CodeBuild runner's temporary AWS credentials as
build args so sccache can authenticate via the standard env var
credential chain. Credentials only exist in the discarded builder
stage and never appear in the final multi-stage image.
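
The multi-stage pattern this commit relies on can be sketched as follows (stage and path names are illustrative):

```dockerfile
FROM base AS builder
# Temporary CodeBuild credentials arrive as build args. They exist only
# in this builder stage; sccache picks them up via the standard
# environment-variable credential chain.
ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
ARG AWS_SESSION_TOKEN
RUN python3 setup.py bdist_wheel --dist-dir=dist

FROM base AS final
# Only build artifacts are copied forward; no credential ARG or ENV is
# declared in this stage, so the shipped image never contains them.
COPY --from=builder /workspace/vllm/dist /tmp/dist
```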

ai-dev-branch commit IDs:
  215775d

The prompts used are captured in the footers of those commits.
The initial prompt was: how my sample PR access s3 bucket, I think
  we do not need to do above things

---
X-AI-Handle-Time-Seconds: 75
X-AI-Line-Changes: New:19, Altered:0, Deleted:0
X-Human-Line-Changes: New:0, Altered:0, Deleted:0
X-AI-Line-Changes-Kiro-cli: New:19, Altered:0, Deleted:0
X-AI-Handle-Time-Seconds-Kiro-cli: 75
X-AI-Change-Count: 1
X-Human-Change-Count: 0
X-AI-Change-Count-Kiro-cli: 1
X-CR-Amendment: false
---
X-AI-Tool: Kiro-cli
X-AI-Handle-Time-Seconds: 50
X-AI-Prompt: #24 6.697 Using MAX_JOBS=32 as the number of jobs.
#24 6.699 Using NVCC_THREADS=16 as the number of nvcc threads.
#24 6.940 -- The CXX compiler identification is GNU 11.5.0
#24 6.951 -- Detecting CXX compiler ABI info
#24 7.024 -- Detecting CXX compiler ABI info - failed
#24 7.024 -- Check for working CXX compiler: /usr/bin/c++
#24 7.089 -- Check for working CXX compiler: /usr/bin/c++ - broken
#24 7.089 CMake Error at /opt/venv/lib/python3.12/site-packages/cmake/data/share/cmake-4.3/Modules/CMakeTestCXXCompiler.cmake:73 (message):
#24 7.089   The C++ compiler
#24 7.089
#24 7.089     "/usr/bin/c++"
#24 7.089
#24 7.089   is not able to compile a simple test program.
#24 7.089
#24 7.089   It fails with the following output:
#24 7.089
#24 7.089     Change Dir: '/workspace/vllm/build/temp.linux-x86_64-cpython-312/CMakeFiles/CMakeScratch/TryCompile-Y7AumQ'
#24 7.089
#24 7.089     Run Build Command(s): /opt/venv/bin/ninja -v cmTC_ff516
#24 7.089     [1/2] sccache /usr/bin/c++    -o CMakeFiles/cmTC_ff516.dir/testCXXCompiler.cxx.o -c /workspace/vllm/build/temp.linux-x86_64-cpython-312/CMakeFiles/CMakeScratch/TryCompile-Y7AumQ/testCXXCompiler.cxx
#24 7.089     FAILED: [code=2] CMakeFiles/cmTC_ff516.dir/testCXXCompiler.cxx.o
#24 7.089     sccache /usr/bin/c++    -o CMakeFiles/cmTC_ff516.dir/testCXXCompiler.cxx.o -c /workspace/vllm/build/temp.linux-x86_64-cpython-312/CMakeFiles/CMakeScratch/TryCompile-Y7AumQ/testCXXCompiler.cxx
#24 7.089     sccache: error: Server startup failed: cache storage failed to read: Unexpected (permanent) at read => S3Error { code: "AuthorizationHeaderMalformed", message: "The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential.", resource: "", request_id: "9JNZ99SMVCR9235F" }
#24 7.089
#24 7.089     Context:
#24 7.089        uri: https://s3.us-west-2.amazonaws.com/dlc-cicd-models/sccache/vllm/.sccache_check
#24 7.089        response: Parts { status: 400, version: HTTP/1.1, headers: {"x-amz-request-id": "9JNZ99SMVCR9235F", "x-amz-id-2": "xP77wFtCDnopxg4jLe8wBmqfAYAk3v+fP16A7xtV1fsZueOgmrd/cCc7CZRjMMLMk+FfKnUhh5c=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Tue, 24 Mar 2026 05:57:07 GMT", "connection": "close", "server": "AmazonS3"} }
#24 7.089        service: s3
#24 7.089        path: .sccache_check
#24 7.089        range: 0-
#24 7.089
#24 7.089     Backtrace:
#24 7.089        0: <unknown>
#24 7.089        1: <unknown>
#24 7.089        2: <unknown>
#24 7.089        3: <unknown>
#24 7.089        4: <unknown>
#24 7.089        5: <unknown>
#24 7.089        6: <unknown>
#24 7.089        7: <unknown>
#24 7.089        8: <unknown>
#24 7.089        9: <unknown>
#24 7.089       10: <unknown>
#24 7.089       11: <unknown>
#24 7.089       12: <unknown>
#24 7.089
#24 7.089
#24 7.089     Run with SCCACHE_LOG=debug SCCACHE_NO_DAEMON=1 to get more information
#24 7.089     ninja: build stopped: subcommand failed.
#24 7.089
#24 7.089
#24 7.089
#24 7.089
#24 7.089
#24 7.089   CMake will not be able to correctly generate this project.
#24 7.089 Call Stack (most recent call first):
#24 7.089   CMakeLists.txt:14 (project)
#24 7.089
#24 7.089
#24 7.090 -- Configuring incomplete, errors occurred!
#24 7.093 Traceback (most recent call last):
#24 7.093   File "/workspace/vllm/setup.py", line 1044, in <module>
#24 7.093     setup(
#24 7.093   File "/opt/venv/lib64/python3.12/site-packages/setuptools/__init__.py", line 117, in setup
#24 7.093     return distutils.core.setup(**attrs)  # type: ignore[return-value]
#24 7.093            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#24 7.093   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/core.py", line 186, in setup
#24 7.093     return run_commands(dist)
#24 7.093            ^^^^^^^^^^^^^^^^^^
#24 7.093   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
#24 7.093     dist.run_commands()
#24 7.093   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
#24 7.093     self.run_command(cmd)
#24 7.093   File "/opt/venv/lib64/python3.12/site-packages/setuptools/dist.py", line 1107, in run_command
#24 7.094     super().run_command(command)
#24 7.094   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
#24 7.094     cmd_obj.run()
#24 7.094   File "/opt/venv/lib64/python3.12/site-packages/setuptools/command/bdist_wheel.py", line 370, in run
#24 7.094     self.run_command("build")
#24 7.094   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
#24 7.094     self.distribution.run_command(command)
#24 7.094   File "/opt/venv/lib64/python3.12/site-packages/setuptools/dist.py", line 1107, in run_command
#24 7.094     super().run_command(command)
#24 7.094   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
#24 7.094     cmd_obj.run()
#24 7.094   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/command/build.py", line 135, in run
#24 7.094     self.run_command(cmd_name)
#24 7.094   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
#24 7.094     self.distribution.run_command(command)
#24 7.095   File "/opt/venv/lib64/python3.12/site-packages/setuptools/dist.py", line 1107, in run_command
#24 7.095     super().run_command(command)
#24 7.095   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
#24 7.095     cmd_obj.run()
#24 7.095   File "/workspace/vllm/setup.py", line 360, in run
#24 7.095     super().run()
#24 7.095   File "/opt/venv/lib64/python3.12/site-packages/setuptools/command/build_ext.py", line 97, in run
#24 7.095     _build_ext.run(self)
#24 7.095   File "/opt/venv/lib64/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
#24 7.095     self.build_extensions()
#24 7.095   File "/workspace/vllm/setup.py", line 317, in build_extensions
#24 7.095     self.configure(ext)
#24 7.095   File "/workspace/vllm/setup.py", line 294, in configure
#24 7.095     subprocess.check_call(
#24 7.095   File "/usr/lib64/python3.12/subprocess.py", line 413, in check_call
#24 7.095     raise CalledProcessError(retcode, cmd)
#24 7.095 subprocess.CalledProcessError: Command '['cmake', '/workspace/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cuda', '-DCMAKE_C_COMPILER_LAUNCHER=sccache', '-DCMAKE_CXX_COMPILER_LAUNCHER=sccache', '-DCMAKE_CUDA_COMPILER_LAUNCHER=sccache', '-DCMAKE_HIP_COMPILER_LAUNCHER=sccache', '-DVLLM_PYTHON_EXECUTABLE=/opt/venv/bin/python3', '-DVLLM_PYTHON_PATH=/workspace/vllm:/usr/lib64/python312.zip:/usr/lib64/python3.12:/usr/lib64/python3.12/lib-dynload:/opt/venv/lib64/python3.12/site-packages:/opt/venv/lib64/python3.12/site-packages/nvidia_cutlass_dsl/python_packages:/opt/venv/lib/python3.12/site-packages:/opt/venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages:/opt/venv/lib64/python3.12/site-packages/setuptools/_vendor:/opt/venv/lib/python3.12/site-packages/grpc_tools/_proto', '-DFETCHCONTENT_BASE_DIR=/workspace/vllm/.deps', '-DNVCC_THREADS=16', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=2', '-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc']' returned non-zero exit status 1.
#24 ERROR: process "/bin/sh -c python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38   && if [ -n \"${SCCACHE_BUCKET}\" ]; then sccache --show-stats; fi" did not complete successfully: exit code: 1

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
fix(vllm): resolve AWS credentials for sccache inside docker build

The CodeBuild runner's IAM credentials are not exposed as environment
variables by default. Use `aws configure export-credentials` to
resolve them from the SDK chain before passing as --build-arg to
docker build, so sccache can authenticate to S3.
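
aws configure export-credentials --format env is a real AWS CLI v2 command that prints export KEY=value lines resolved from the SDK credential chain. A minimal sketch of turning that output shape into docker build --build-arg flags (an illustrative helper, not the PR's actual script):

```python
def env_exports_to_build_args(exported: str) -> list[str]:
    """Turn `export KEY=value` lines (the shape emitted by
    `aws configure export-credentials --format env`) into a list of
    --build-arg flags for docker build."""
    args = []
    for line in exported.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue
        # Keep KEY=value intact; docker build accepts it verbatim.
        args += ["--build-arg", line[len("export "):]]
    return args

# Example with dummy values (never hard-code real credentials):
sample = """\
export AWS_ACCESS_KEY_ID=AKIDEXAMPLE
export AWS_SECRET_ACCESS_KEY=secretEXAMPLE
export AWS_SESSION_TOKEN=tokenEXAMPLE
"""
flags = env_exports_to_build_args(sample)
# flags would then be spliced into: docker build <flags> ...
```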

ai-dev-branch commit IDs:
  c8835eb

The prompts used are captured in the footers of those commits.
The initial prompt was: (build error log showing sccache S3
  AuthorizationHeaderMalformed failure)

---
X-AI-Handle-Time-Seconds: 50
X-AI-Line-Changes: New:4, Altered:0, Deleted:0
X-Human-Line-Changes: New:0, Altered:0, Deleted:0
X-AI-Line-Changes-Kiro-cli: New:4, Altered:0, Deleted:0
X-AI-Handle-Time-Seconds-Kiro-cli: 50
X-AI-Change-Count: 1
X-Human-Change-Count: 0
X-AI-Change-Count-Kiro-cli: 1
X-CR-Amendment: false
@Yadan-Wei Yadan-Wei closed this Mar 25, 2026
@Yadan-Wei Yadan-Wei deleted the vllm-cves branch March 25, 2026 22:07
