Skip to content

Refactor setup.py to improve GitHub API error handling and configuration#532

Merged
jrhoads merged 4 commits into
devfrom
datadump-errors
May 6, 2026
Merged

Refactor setup.py to improve GitHub API error handling and configuration#532
jrhoads merged 4 commits into
devfrom
datadump-errors

Conversation

@jrhoads
Copy link
Copy Markdown
Member

@jrhoads jrhoads commented May 6, 2026

Purpose

This PR refactors the setup.py management command to improve error handling and configuration management when interacting with the GitHub API to fetch ROR data dumps.

Approach

The changes focus on making the setup.py command more robust by:

  • Adding explicit error handling for network requests and API responses.
  • Centralizing configuration access and validation.
  • Improving logging for better debugging.

Key Modifications

  • Error Handling for GitHub API Requests: Introduced try-except blocks to catch specific requests.exceptions like Timeout, ConnectionError, and general RequestException.
  • HTTP Error Handling: Added response.raise_for_status() with specific checks for common GitHub API error codes (401, 403, 404) to provide more informative error messages.
  • Configuration Validation: Moved GitHub token and repository URL retrieval into dedicated functions (build_github_headers, get_contents_url) with checks for missing configuration values, raising CommandError if necessary.
  • Improved JSON Parsing Error Handling: Added a try-except ValueError block for JSON decoding to handle cases where the GitHub API returns invalid JSON.
  • Response Structure Validation: Added a check to ensure the repo_contents is a list as expected from the GitHub API.
  • Centralized Logging: Introduced a logger for better tracking of execution flow and errors.
  • CommandError Usage: Replaced SystemExit with CommandError for more graceful error reporting within Django management commands.
  • Timeout Configuration: Added a REQUEST_TIMEOUT_SECONDS constant for controlling request timeouts.

Important Technical Details

  • The get_ror_dump_sha function now explicitly handles various network and HTTP errors from the GitHub API, providing clearer feedback to the user.
  • Authentication and repository URL fetching are now validated before making API calls, preventing potential issues down the line.
  • The use of CommandError ensures that errors are reported appropriately by Django's management command system.
  • The build_github_headers function ensures that the GitHub token is present and correctly formatted.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

@adambuttrick
Copy link
Copy Markdown
Contributor

@jrhoads Thanks for getting this up so quickly! I'm a bit wary of the inline env vars, as (I think) they would override anything in .env, so then local stuff gets silently ignored. Kind of non-obvious. I think it also opens us up a bit towards the "oopsie" path of committing real creds (whereas in the setup now .env is gitignored). Could we just keep .env as the source of truth and add a .env.example with the dummy values?

Separately, I think swapping the rorapi bind mount for develop.watch means docker compose up will no longer hot-reload? Is that correct? If so, we should probably update the README/docs so folks know to run with docker compose watch.

@jrhoads
Copy link
Copy Markdown
Member Author

jrhoads commented May 6, 2026

I don't have strong opinions on the docker compose stuff. I was encountering some friction when running the app with docker compose and this cleared it up.

I was not getting the hot reload behavior in passenger and this is why I switched to the develop.watch methods.

For Env variable I just looked up the precedence for container runtime:

  • environment: in docker-compose.yaml (highest)
  • env_file: (if multiple files are listed, later files override earlier ones for duplicate keys)
  • ENV instructions in the Dockerfile (lowest)

If I need it I can do this stuff with a local docker-compose.override.yaml file. I can go ahead and revert most of the stuff in docker-compose.yaml. It's not the real focus of this work.

@jrhoads jrhoads changed the title Feat: Improve GitHub API error handling and local dev setup Refactor setup.py to improve GitHub API error handling and configuration May 6, 2026
@jrhoads jrhoads requested a review from adambuttrick May 6, 2026 15:56
@adambuttrick
Copy link
Copy Markdown
Contributor

@jrhoads Thanks for doing so. Approved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants