
chore: use dedicated OSS AWS account #6442

Merged
jeevb merged 3 commits into main from jeev/daft-oss
Mar 26, 2026

Conversation

@jeevb
Contributor

@jeevb jeevb commented Mar 20, 2026

Changes Made

Update to use new dedicated OSS AWS account:

  • S3 buckets for GitHub artifacts and public datasets
  • CloudFront distribution

Requires:

  • Update ACTIONS_AWS_IAM_ROLE GitHub secret

Related Issues

jeevb added 2 commits March 19, 2026 08:50
Update all S3 bucket names, CloudFront distribution ID, and domain to
point to the new daft-oss AWS account resources.

- github-actions-artifacts-bucket -> daft-oss-github-actions-artifacts
- daft-public-data -> daft-oss-public-data
- daft-public-datasets -> daft-oss-public-datasets
- CloudFront: E3H8WN738AJ1D4 -> E1QUPAB4XXQ64R
- CloudFront domain: d1p3klp2t5517h.cloudfront.net -> ds0gqyebztuyf.cloudfront.net
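
The renames listed above amount to a fixed old-to-new substitution. A minimal sketch of that mapping in Python, assuming a plain text rewrite is enough; the `RENAMES` table and the `rewrite_refs` helper are hypothetical names used for illustration and are not part of this PR:

```python
import re

# Old -> new identifiers, copied from the commit message above.
RENAMES = {
    "github-actions-artifacts-bucket": "daft-oss-github-actions-artifacts",
    "daft-public-datasets": "daft-oss-public-datasets",
    "daft-public-data": "daft-oss-public-data",
    "E3H8WN738AJ1D4": "E1QUPAB4XXQ64R",
    "d1p3klp2t5517h.cloudfront.net": "ds0gqyebztuyf.cloudfront.net",
}

# Try longer names first so "daft-public-datasets" is matched before its
# prefix "daft-public-data".
_PATTERN = re.compile(
    "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
)


def rewrite_refs(text: str) -> str:
    """Replace every old identifier in `text` with its new counterpart."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(0)], text)
```

None of the new names contains an old name as a substring, so applying this sketch twice leaves the text unchanged.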
@jeevb jeevb changed the title from "Jeev/daft oss" to "Use dedicated OSS AWS account" Mar 20, 2026
@jeevb jeevb changed the title from "Use dedicated OSS AWS account" to "chore: use dedicated OSS AWS account" Mar 20, 2026
@github-actions github-actions Bot added the chore label Mar 20, 2026
@jeevb jeevb marked this pull request as ready for review March 20, 2026 16:01
@jeevb jeevb requested a review from a team as a code owner March 20, 2026 16:01
@greptile-apps
Contributor

greptile-apps Bot commented Mar 20, 2026

Greptile Summary

This PR migrates all references to a new dedicated OSS AWS account, updating three S3 buckets (github-actions-artifacts-bucket → daft-oss-github-actions-artifacts, daft-public-data → daft-oss-public-data, daft-public-datasets → daft-oss-public-datasets) and a CloudFront distribution (E3H8WN738AJ1D4 / d1p3klp2t5517h.cloudfront.net → E1QUPAB4XXQ64R / ds0gqyebztuyf.cloudfront.net) across 36 files spanning CI workflows, benchmarking scripts, tests, documentation, and tutorials.

Key observations:

  • The rename is consistent and thorough across all changed files, including HTTPS virtual-hosted-style URLs (daft-public-data.s3.us-west-2.amazonaws.com → daft-oss-public-data.s3.us-west-2.amazonaws.com), s3:// and s3a:// protocol paths, and even commented-out references in tests.
  • The OUTPUT_PATH values pointing to s3://eventual-dev-benchmarking-results/ are intentionally left unchanged, as that bucket belongs to a separate internal results account.
  • Three example terminal-output tables in docs/modalities/files.md and docs/modalities/videos.md have truncated path strings that are now longer than their column widths due to the longer bucket name — these need minor re-truncation to keep the documentation tables visually correct.
  • The PR description notes that the ACTIONS_AWS_IAM_ROLE GitHub secret must be updated separately; this is a required out-of-band step that is not tracked in code.
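
The first observation mentions several reference styles for the same bucket. A rough sketch of how they relate, assuming the us-west-2 region that appears in the URL quoted above; the `bucket_ref_styles` helper and the object key are hypothetical and only illustrate the three forms:

```python
def bucket_ref_styles(bucket: str, key: str, region: str = "us-west-2") -> dict[str, str]:
    """Return the s3://, s3a://, and HTTPS virtual-hosted-style forms of one object."""
    return {
        "s3": f"s3://{bucket}/{key}",
        "s3a": f"s3a://{bucket}/{key}",
        "https": f"https://{bucket}.s3.{region}.amazonaws.com/{key}",
    }


# "example/file.parquet" is a placeholder key, not a real object in the bucket.
for style, url in bucket_ref_styles("daft-oss-public-data", "example/file.parquet").items():
    print(f"{style:>5}: {url}")
```

In every style the bucket name appears verbatim, which is why a rename has to touch all three kinds of reference.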

Confidence Score: 5/5

  • This PR is safe to merge; all changes are mechanical bucket/distribution renames with no logic changes, contingent on the ACTIONS_AWS_IAM_ROLE secret being updated.
  • The changes are a consistent find-and-replace of AWS resource identifiers across 36 files. No logic, algorithms, or interfaces are modified. The only risk is that the new S3 buckets and CloudFront distribution must already be provisioned and the IAM role secret updated before these workflows and tests run — both are noted as prerequisites in the PR description. The minor documentation table formatting issues are cosmetic only.
  • docs/modalities/files.md and docs/modalities/videos.md have minor table formatting issues due to the longer new bucket name.
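
Since the change is described as a find-and-replace, one way to check it for completeness is to scan the working tree for leftover old identifiers. A minimal sketch, assuming only UTF-8 text files matter; `find_stale_refs` is a hypothetical helper and is not part of this PR or of the repository's CI:

```python
import re
from pathlib import Path

# Identifiers from the old account, as listed in the PR's commit message.
OLD_REFS = re.compile(
    r"github-actions-artifacts-bucket"
    r"|daft-public-data(?:sets)?"
    r"|E3H8WN738AJ1D4"
    r"|d1p3klp2t5517h\.cloudfront\.net"
)


def find_stale_refs(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, matched identifier) for each leftover old reference."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in OLD_REFS.finditer(line):
                hits.append((str(path), lineno, match.group(0)))
    return hits
```

The pattern deliberately omits eventual-dev-benchmarking-results, which the summary above says is intentionally left unchanged.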

Important Files Changed

| Filename | Overview |
|---|---|
| .github/workflows/nightly-publish-s3.yml | Updates S3 bucket name, CloudFront distribution ID, and CloudFront domain to use the new dedicated OSS AWS account. Changes are consistent across all three occurrences in the file. |
| .github/workflows/publish-dev-s3.yml | Updates S3 bucket, CloudFront distribution ID, and CloudFront domain consistently. Also updates the inline aws s3api head-object bucket name reference. |
| docs/modalities/files.md | S3 bucket path updated in code examples and in example terminal output tables, but the truncated path string in the table output is now longer than the column width, causing misaligned table borders in the documentation. |
| docs/modalities/videos.md | S3 bucket path updated in code examples and example terminal output tables, but the truncated path string s3://daft-oss-public-data/videos/… (34 chars) now exceeds the 30-char column width in the rendered table output. |
| tests/integration/io/parquet/test_reads_public_data.py | All HTTPS and s3:// / s3a:// URLs, including commented-out references, are consistently updated to the new daft-oss-public-data bucket. |
| daft/io/delta_lake/_deltalake.py | Docstring example S3 path updated from daft-public-data to daft-oss-public-data. |
| daft/io/lance/_lance.py | Docstring example S3 path updated from daft-public-data to daft-oss-public-data. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GitHub Actions Workflows] -->|publish artifacts| B[S3: daft-oss-github-actions-artifacts]
    B -->|invalidate & serve| C[CloudFront: E1QUPAB4XXQ64R\nds0gqyebztuyf.cloudfront.net]
    C -->|pip install --extra-index-url| D[End Users]

    E[Tests / Benchmarks / Docs / Tutorials] -->|anonymous read| F[S3: daft-oss-public-data]
    E -->|anonymous read| G[S3: daft-oss-public-datasets]

    subgraph Old OSS Account
        H[github-actions-artifacts-bucket]
        I[daft-public-data]
        J[daft-public-datasets]
        K[CloudFront: E3H8WN738AJ1D4\nd1p3klp2t5517h.cloudfront.net]
    end

    subgraph New Dedicated OSS Account
        B
        F
        G
        C
    end

Comments Outside Diff (3)

  1. docs/modalities/files.md, line 519 (link)

    P2 Table output truncation overflows column width

    The new bucket name daft-oss-public-data is 4 characters longer than daft-public-data, so the truncated display string s3://daft-oss-public-data/open-im… (34 chars) now exceeds the table's 30-character column width (as established by the ╞════════════════════════════════╪ border). This causes visually misaligned table borders in the rendered documentation.

    The truncation point should be adjusted so the cell content still fits within the column.

  2. docs/modalities/files.md, line 537 (link)

    P2 Table output truncation overflows column width

    Same overflow as above: the truncated path s3://daft-oss-public-data/open-im… (34 chars) is wider than the column and should be re-truncated to fit.

  3. docs/modalities/videos.md, line 590-609 (link)

    P2 Table output truncation overflows column width

    All seven data rows in this table now show s3://daft-oss-public-data/videos/… (34 chars), which is wider than the table's 30-character first column (established by │ --- ┆). Each truncated path should be shortened to fit, e.g. s3://daft-oss-public-data/vid… to maintain correct table alignment in the rendered documentation.
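
The re-truncation suggested in these comments is simple string arithmetic. A minimal sketch, assuming a 30-character cell and a single-character ellipsis; `truncate_cell` is a hypothetical helper and does not reproduce Daft's own table rendering:

```python
def truncate_cell(value: str, width: int = 30, ellipsis: str = "…") -> str:
    """Shorten `value` to at most `width` characters, ending in `ellipsis` if cut."""
    if len(value) <= width:
        return value
    return value[: width - len(ellipsis)] + ellipsis


# "clip.mp4" is a placeholder file name used only for illustration.
cell = truncate_cell("s3://daft-oss-public-data/videos/clip.mp4")
print(cell, len(cell))  # s3://daft-oss-public-data/vid… 30
```

With the longer bucket name, the 26-character prefix s3://daft-oss-public-data/ leaves room for only three more characters before the ellipsis.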

Last reviewed commit: "style: apply ruff fo..."

@codecov

codecov Bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.83%. Comparing base (6448008) to head (335d81a).
⚠️ Report is 33 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6442      +/-   ##
==========================================
- Coverage   74.85%   74.83%   -0.02%     
==========================================
  Files        1024     1024              
  Lines      137716   137697      -19     
==========================================
- Hits       103083   103042      -41     
- Misses      34633    34655      +22     
| Files with missing lines | Coverage Δ |
|---|---|
| daft/io/delta_lake/_deltalake.py | 33.33% <ø> (ø) |
| daft/io/lance/_lance.py | 92.10% <ø> (ø) |

... and 13 files with indirect coverage changes
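
The percentages in the coverage diff follow from the hit and line counts it reports. A short check in Python, assuming coverage is simply hits divided by lines; `coverage_pct` is a hypothetical helper, not a Codecov API:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage."""
    return 100.0 * hits / lines


base = coverage_pct(103083, 137716)  # main
head = coverage_pct(103042, 137697)  # this PR
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f})")
```

The small drop comes from the indirect coverage changes noted above, since the two modified files themselves show no coverage delta.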


Contributor

@madvart madvart left a comment


LGTM

@rchowell
Contributor

@jeevb are you able to confirm all the gh actions ran with these updated refs? I will merge this ASAP

@jeevb
Contributor Author

jeevb commented Mar 26, 2026

@rchowell Not yet. Was planning on testing after merge since it requires a secret change at the repo level anyway. If there is someone else around, I can iterate on fixes quickly. Biggest risk is missing permissions, which we can fix at infra level.

@jeevb jeevb merged commit 5086240 into main Mar 26, 2026
61 of 62 checks passed
@jeevb jeevb deleted the jeev/daft-oss branch March 26, 2026 02:23
gavin9402 pushed a commit to gavin9402/Daft that referenced this pull request Apr 7, 2026