Rails API that ingests GitHub public PushEvent records, enriches actors and repositories, and stores structured data in PostgreSQL.
Requires Docker and Docker Compose v2.
From the repository root:
docker compose up --build
This builds the app image from Dockerfile.dev, starts PostgreSQL, runs migrations via the entrypoint (db:prepare), and starts the Rails server on port 3000.
Follow logs with:
docker compose logs -f app db
Compose sets RAILS_LOG_TO_STDOUT=true for the app and ingest services, so Rails logs go to stdout (visible with docker compose logs -f).
In another terminal (with docker compose up running so db is available):
docker compose run --rm ingest
This runs bin/rails github:ingest and fetches from the GitHub public events API.
Run the test suite with:
docker compose run --rm test
This sets RAILS_ENV=test, prepares the test database, and runs bin/rails test.
After docker compose run --rm ingest, you should see lines similar to:
- At the start: EventIngester: starting ingestion
- On successful imports (log level debug): EventIngester: stored push event id=...
- A completion line: EventIngester: ingest complete — X imported, Y skipped, Z failed (...)
The completion line includes [rate limit remaining: ...] when the client still has a last HTTP response.
If GitHub rate-limits the events fetch or enrichment, you may see: EventIngester: rate limit reached (resets at ...), aborting.
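The tally-then-abort behaviour behind these log lines can be pictured with a small plain-Ruby sketch. It is an illustration under assumptions, not the app's EventIngester: `RateLimitReached` and `tally_ingest` are hypothetical names, and the real task presumably rescues the API client's own rate-limit error (with Octokit that would likely be `Octokit::TooManyRequests`).

```ruby
# Hypothetical stand-in for the rate-limit error raised by the API client.
class RateLimitReached < StandardError; end

# Calls the block once per event. The block is expected to return
# :imported, :skipped, or :failed. A RateLimitReached error stops the
# run early; events after that point are not processed.
def tally_ingest(events)
  counts = { imported: 0, skipped: 0, failed: 0 }
  aborted = false

  events.each do |event|
    begin
      outcome = yield(event)
      counts[outcome] += 1
    rescue RateLimitReached
      aborted = true
      break
    end
  end

  counts.merge(aborted: aborted)
end
```

In this sketch, events handled before the rate limit hit keep their tallies, which matches the idea that an aborted run can still leave some rows behind.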
With the stack up (docker compose up), inspect counts:
docker compose exec app bin/rails runner "puts({ push_events: PushEvent.count, actors: Actor.count, repositories: Repository.count })"
Or use a one-off container:
docker compose run --rm app bin/rails runner "puts PushEvent.order(:created_at).last&.attributes"
You should see rows in push_events, actors, and repositories after a successful ingest (exact counts depend on how many PushEvent items GitHub returned and how many were new).
Ingestion is a single rake run: it usually finishes within seconds to a minute, depending on network and rate limits. Rows appear as soon as the task completes without error.
To run without Docker, you need Ruby (see .ruby-version), PostgreSQL, and the gems installed with bundle install. Make sure your PostgreSQL setup matches config/database.yml, create the databases if needed, then run:
bin/rails db:prepare
bin/rails github:ingest
bin/rails test
The actors table has one row per GitHub user seen in ingested events. github_id is the GitHub user id (unique). login is the GitHub username from the event. raw_payload is the JSON from GET /users/:login (Octokit client.user(login)).
The repositories table has one row per GitHub repository seen in ingested events. github_id is the GitHub repository id (unique). name stores the full name (owner/repo) from the enriched repo. raw_payload is the JSON from GET /repositories/:id (Octokit client.repository(id)).
New actors and repositories trigger a single API fetch each; repeats in the same or later runs reuse rows and skip extra HTTP calls.
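The fetch-once rule above can be sketched in plain Ruby. This is a minimal illustration, not the app's code: `CachedEnricher` and its block-based fetcher are hypothetical, the hash stands in for the actors or repositories table, and the block stands in for the Octokit call.

```ruby
# Hypothetical sketch: one remote fetch per previously unseen github_id.
class CachedEnricher
  attr_reader :fetch_count

  # The block plays the role of the HTTP lookup (e.g. a user or repo fetch).
  def initialize(&fetcher)
    @fetcher = fetcher
    @rows = {}          # stands in for the table, keyed by github_id
    @fetch_count = 0
  end

  # Returns the stored row for github_id; calls the fetcher only when
  # no row exists yet.
  def find_or_enrich(github_id, lookup_key)
    @rows.fetch(github_id) do
      @fetch_count += 1
      @rows[github_id] = {
        github_id: github_id,
        raw_payload: @fetcher.call(lookup_key)
      }
    end
  end
end
```

For example, calling `find_or_enrich(1, "octocat")` twice on the same instance invokes the fetcher once and returns the same row both times. In the real app the lookup would go through the database rather than an in-memory hash, so the reuse also holds across runs.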
Each push_events row is one GitHub PushEvent with:
- raw_payload: full event.to_h from Octokit (JSONB) for audit and debugging.
- actor_id / repository_id: foreign keys to enriched actors and repositories.
- Structured columns (queryable without JSON parsing), mapped from the event payload:
  - push_id: payload.push_id
  - ref: payload.ref
  - head: payload.head (commit SHA)
  - before_sha: payload.before (prior commit SHA; the column is named before_sha rather than before)
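The column mapping listed above could look roughly like the following. This is a sketch under assumptions: `push_event_columns` is a hypothetical helper, and the event is taken to be a symbol-keyed hash shaped like `event.to_h`, which may differ from what the app actually receives.

```ruby
# Hypothetical helper: maps an event hash to the structured push_events
# columns. Assumes symbol keys and a nested :payload hash.
def push_event_columns(event)
  payload = event.fetch(:payload)

  {
    github_event_id: event.fetch(:id).to_s, # event id kept as a string
    push_id: payload[:push_id],
    ref: payload[:ref],
    head: payload[:head],                   # commit SHA after the push
    before_sha: payload[:before],           # commit SHA before the push
    raw_payload: event                      # full event kept for auditing
  }
end
```

Missing payload fields come through as nil here rather than raising, so a partially filled event would still produce a row in this sketch.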
github_event_id is the GitHub event id string and remains the natural unique key for idempotent ingestion.
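To illustrate why a unique github_event_id makes ingestion idempotent, here is a small in-memory sketch. `IdempotentStore` is hypothetical; in the app the same guarantee would presumably come from a unique database index on the column rather than from a Ruby hash.

```ruby
# Hypothetical in-memory stand-in for a table with a unique github_event_id.
class IdempotentStore
  def initialize
    @rows = {}
  end

  # Stores the event unless its id has been seen before.
  # Returns :imported for a new id and :skipped for a repeat.
  def import(github_event_id, attributes = {})
    key = github_event_id.to_s
    return :skipped if @rows.key?(key)

    @rows[key] = attributes.merge(github_event_id: key)
    :imported
  end

  def count
    @rows.size
  end
end
```

Importing the same event id twice leaves a single row: the first call returns `:imported` and the second returns `:skipped`, so rerunning the ingest over overlapping API results does not create duplicates.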