This guide explains how to migrate data from the old production database (`geoportal_production_20251030`) to the new `resources` table in the current application. The migration uses a materialized view bridge approach to transform and map data between the two schemas.
The migration process consists of two phases:
- Bridge Creation: Creates a materialized view in the old database that transforms `kithe_models` records into the new schema format
- Data Import: Imports the transformed data from the materialized view into the new database
- Old production database accessible: The `geoportal_production_20251030` database must be accessible on the same ParadeDB container
- Environment configuration: Ensure database connection parameters are set in your `.env` file (see the loading sketch after this list):
```
# Database connection (existing)
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=localhost
DB_PORT=2345
DB_NAME=btaa_geospatial_api

# Old database name (add this)
OLD_DB_NAME=geoportal_production_20251030
```
- Database backups: Always back up both databases before starting the migration
  - New database: `make db-export`
  - Old database: Connect and export if needed
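The migration scripts presumably consume these settings along the following lines. This is a minimal sketch assuming `python-dotenv` is available; the actual variable handling lives in the scripts themselves:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # pull DB_* and OLD_DB_NAME from .env into the environment

# Shared connection parameters; the database name differs per target.
DB_PARAMS = {
    "user": os.environ["DB_USER"],
    "password": os.environ["DB_PASSWORD"],
    "host": os.environ["DB_HOST"],      # localhost, not the container name
    "port": int(os.environ["DB_PORT"]),
}
NEW_DB = os.environ["DB_NAME"]          # btaa_geospatial_api
OLD_DB = os.environ["OLD_DB_NAME"]      # geoportal_production_20251030
```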
The old database uses a `kithe_models` table with:

- `id`: Record ID
- `title`: Record title
- `type`: Record type (we filter for `'Document'`)
- `json_attributes`: JSONB column containing all metadata fields

The new database uses a `resources` table with all OGM Aardvark and BTAA-specific fields as separate columns.
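For orientation, a `kithe_models` row has roughly this shape (all values below are invented for illustration):

```python
# Hypothetical kithe_models row; values are invented for illustration.
row = {
    "id": "abc123",
    "title": "Hydrography of Minnesota",
    "type": "Document",              # only 'Document' rows are migrated
    "publication_state": "published",
    "json_attributes": {             # JSONB column holding all other metadata
        "dct_format_s": "Shapefile",
        "dct_creator_sm": ["Minnesota DNR"],
        "gbl_indexYear_im": [2015],
    },
}
```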
Most fields map directly from `json_attributes` to the new column names. The field names are designed to match OGM Aardvark standards:
| Old Schema (JSON key) | New Schema (Column) | Notes |
|---|---|---|
| `id` | `id` | Direct mapping |
| `title` | `dct_title_s` | Direct mapping |
| `publication_state` | `publication_state` | Carried over from `kithe_models` column |
| `import_id` | `import_id` | Carried over from `kithe_models` column |
| All other fields | Same name | Mapped from `json_attributes` JSON |
Fields ending with `_sm` or `_im` are treated as arrays and cast appropriately (see the sketch after this list):

- `_sm`: String arrays (`text[]`)
- `_im`: Integer arrays (`integer[]`)
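A casting rule of this shape could drive the bridge view's column expressions. The helper name and exact SQL here are illustrative assumptions, not the script's actual code:

```python
def cast_expression(key: str) -> str:
    """Build a SQL expression that extracts `key` from json_attributes
    with a cast based on the field-name suffix (hypothetical helper)."""
    if key.endswith("_sm"):
        # JSONB string array -> text[]
        return (f"ARRAY(SELECT jsonb_array_elements_text("
                f"json_attributes->'{key}')) AS {key}")
    if key.endswith("_im"):
        # JSONB number array -> integer[]
        return (f"ARRAY(SELECT jsonb_array_elements_text("
                f"json_attributes->'{key}')::integer) AS {key}")
    # Default: scalar text field
    return f"json_attributes->>'{key}' AS {key}"
```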
First, check how fields map between the old and new schemas:
```
python db/migrations/bridge_old_production.py --verify
```

This will:
- Sample records from the old database
- Compare JSON keys with the new schema fields
- Report matching fields, unmapped fields, and new fields
Expected output:

```
Found X unique JSON keys in sampled records
Keys matching new schema: Y
Unmapped keys: Z
```
Review any unmapped fields to determine if they should be included in the migration.
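A manual version of this comparison can also be run directly against both databases. The following is a sketch under the connection assumptions above, not the script's exact logic:

```python
import psycopg2  # assumption: psycopg2 is installed

# Sample JSON keys from the old database and diff them against the new
# schema's column names (sketch; adjust connection parameters as needed).
old = psycopg2.connect(dbname="geoportal_production_20251030",
                       host="localhost", port=2345,
                       user="postgres", password="postgres")
new = psycopg2.connect(dbname="btaa_geospatial_api",
                       host="localhost", port=2345,
                       user="postgres", password="postgres")

with old.cursor() as cur:
    cur.execute("""
        SELECT DISTINCT k
        FROM (SELECT json_attributes FROM kithe_models
              WHERE type = 'Document' LIMIT 1000) t,
             LATERAL jsonb_object_keys(t.json_attributes) AS k
    """)
    json_keys = {r[0] for r in cur.fetchall()}

with new.cursor() as cur:
    cur.execute("""
        SELECT column_name FROM information_schema.columns
        WHERE table_name = 'resources'
    """)
    columns = {r[0] for r in cur.fetchall()}

print("matching:", len(json_keys & columns))
print("unmapped:", sorted(json_keys - columns))
```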
Create the materialized view bridge in the old database:
```
python db/migrations/bridge_old_production.py --create-view
```

This will:
- Create the `kithe_to_resources_bridge` materialized view
- Extract fields from `json_attributes` JSON
- Apply proper type casting (arrays, dates, booleans, JSON, etc.)
- Filter for `type = 'Document'` records with `publication_state = 'published'`
- Create an index on the `id` field
- Populate the view with data
Expected output:

```
✓ Materialized view created successfully
✓ Index created
✓ Materialized view refreshed
✓ Materialized view contains X records
```
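For reference, the generated DDL plausibly has the following shape. The real column list and expressions are produced by `bridge_old_production.py`, so treat this as an assumption-laden sketch with only a few columns shown:

```python
# Sketch of the bridge DDL (illustrative; only a few columns shown).
BRIDGE_DDL = """
CREATE MATERIALIZED VIEW kithe_to_resources_bridge AS
SELECT
  id,
  title AS dct_title_s,
  publication_state,
  import_id,
  json_attributes->>'dct_format_s' AS dct_format_s,
  ARRAY(SELECT jsonb_array_elements_text(json_attributes->'dct_creator_sm'))
    AS dct_creator_sm
  -- ... one expression per mapped field ...
FROM kithe_models
WHERE type = 'Document'
  AND publication_state = 'published';
CREATE INDEX ON kithe_to_resources_bridge (id);
"""
```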
Get a summary of the materialized view:
```
python db/migrations/bridge_old_production.py --summary
```

This displays:
- Total record count
- Sample IDs and titles
- Verification of view contents
Test the import process without writing data:
```
python db/migrations/import_from_old_production.py --dry-run --batch-size 1000
```

This will:
- Connect to both databases
- Process records in batches
- Simulate the import without writing
- Report how many records would be imported
Review the output to ensure everything looks correct before running the actual import.
Run the actual import:
```
python db/migrations/import_from_old_production.py --batch-size 1000 --conflict update
```

Conflict handling options:

- `update`: Update existing records with incoming data (recommended so the old production values using `friendlier_id` take precedence)
- `skip`: Skip records with duplicate IDs (useful for incremental loads when no overwrite is desired)
- `fail`: Stop on first conflict
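These modes plausibly translate to PostgreSQL upsert clauses along the following lines. The function below is a hypothetical sketch, not the script's actual implementation:

```python
def conflict_clause(mode: str, columns: list[str]) -> str:
    """Map a --conflict mode to an INSERT ... ON CONFLICT suffix (sketch)."""
    if mode == "update":
        # Overwrite existing rows with the incoming values.
        sets = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != "id")
        return f"ON CONFLICT (id) DO UPDATE SET {sets}"
    if mode == "skip":
        # Leave existing rows untouched.
        return "ON CONFLICT (id) DO NOTHING"
    # "fail": plain INSERT, so a duplicate id raises an integrity error.
    return ""
```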
Expected output:

```
Processing X records in batches of 1000...
Progress: 1000/X (Y%)
...
✓ Import complete
Total records: X
Imported: Y
Skipped: Z
```
Verify the imported data and spot-check a sample:
```
python db/migrations/import_from_old_production.py --verify
```

This will:
- Compare record counts between old and new databases
- Sample records and verify they match
- Report any discrepancies
- Log how many rows were updated versus newly inserted based on the conflict strategy
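A manual sanity check of the counts is also straightforward. A sketch, assuming the same connection parameters as above:

```python
import psycopg2  # assumption: psycopg2 is installed

# Compare row counts between the bridge view and the resources table (sketch).
targets = [
    ("geoportal_production_20251030", "kithe_to_resources_bridge"),
    ("btaa_geospatial_api", "resources"),
]
for dbname, table in targets:
    conn = psycopg2.connect(dbname=dbname, host="localhost", port=2345,
                            user="postgres", password="postgres")
    with conn, conn.cursor() as cur:
        cur.execute(f"SELECT count(*) FROM {table}")
        print(f"{dbname}.{table}: {cur.fetchone()[0]} rows")
    conn.close()
```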
If you need to update the materialized view (e.g., after data changes in old DB):
```
python db/migrations/bridge_old_production.py --refresh
```

Export the transformed data for backup or analysis:

```
python db/migrations/bridge_old_production.py --export /path/to/output.json
```

The export is in JSON Lines format, one record per line.
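JSON Lines files can be consumed one record at a time, for example (the path and field name mirror the examples above):

```python
import json

# Load the JSON Lines export produced by --export.
with open("/path/to/output.json") as fh:
    records = [json.loads(line) for line in fh if line.strip()]

print(f"{len(records)} records")
print("first title:", records[0].get("dct_title_s"))
```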
Sample and inspect the old database structure:
```
python db/migrations/bridge_old_production.py --sample 10
```

This helps you understand the `json_attributes` structure.
Error: "Materialized view does not exist"
Solution: Run the bridge creation step:
```
python db/migrations/bridge_old_production.py --create-view
```

Error: "Could not connect to database"
Solution:
- Verify Docker containers are running: `docker compose ps`
- Check environment variables in the `.env` file
- Ensure `DB_HOST` is set to `localhost` (not `paradedb`)
Error: Unexpected field names or types
Solution:
- Run `--verify` to see field mapping details
- Inspect sample data: `--sample`
- Check the `kithe_models.json_attributes` structure in the old database
- Update the bridge script if field names differ
Error: "Integrity error" or "duplicate key"
Solution:
- Use `--conflict skip` to skip duplicates
- Or use `--conflict update` to update existing records
- Check for ID duplicates in source data
Symptom: Import is very slow
Solution:
- Increase batch size: `--batch-size 5000`
- Check that database indexes are properly created
- Monitor resource usage: `docker stats`
- Consider running during a low-traffic period
If you need to roll back the migration:

- Restore from backup:

  ```
  make db-import
  ```

  `make db-import` now preserves destination-local `api_service_tiers`, `api_keys`, `analytics_api_usage_logs`, `analytics_searches`, `analytics_search_impressions`, and `analytics_events` by default, along with their owned `*_id_seq` sequences. If you need a full overwrite during rollback, rerun `make db-export DB_SYNC_PRESERVE_LOCAL_TABLES=false` and `make db-import DB_SYNC_PRESERVE_LOCAL_TABLES=false`.

- Or manually clean:

  ```sql
  -- Connect to new database
  TRUNCATE TABLE resources CASCADE;
  ```

- Restore from backup file:

  ```
  gunzip -c tmp/btaa_geospatial_api_export.sql.gz | docker exec -i btaa-geospatial-api-paradedb psql -U postgres -d btaa_geospatial_api
  ```
After successful migration:
- Update Elasticsearch index:

  ```
  python run_index.py
  ```

- Refresh cached data:
  - Clear Redis cache if needed
  - Restart application services

- Verify API endpoints:
  - Test search functionality
  - Check individual resource views
  - Verify facets and filters

- Monitor:
  - Check application logs
  - Monitor database performance
  - Verify API response times
Complete list of fields in the `resources` table:

- `id`, `dct_title_s`
- `dct_alternative_sm`, `dct_description_sm`, `dct_language_sm`, `gbl_displayNote_sm`
- `dct_creator_sm`, `dct_publisher_sm`, `schema_provider_s`
- `gbl_resourceClass_sm`, `gbl_resourceType_sm`
- `dct_subject_sm`, `dcat_theme_sm`, `dcat_keyword_sm`
- `dct_temporal_sm`, `dct_issued_s`, `gbl_indexYear_im`, `gbl_dateRange_drsim`
- `dct_spatial_sm`, `locn_geometry`, `dcat_bbox`, `dcat_centroid`
- `dct_relation_sm`, `pcdm_memberOf_sm`, `dct_isPartOf_sm`, `dct_source_sm`, `dct_isVersionOf_sm`, `dct_replaces_sm`, `dct_isReplacedBy_sm`
- `dct_rights_sm`, `dct_rightsHolder_sm`, `dct_license_sm`, `dct_accessRights_s`
- `dct_format_s`, `gbl_fileSize_s`, `gbl_wxsIdentifier_s`, `dct_references_s`
- `dct_identifier_sm`, `gbl_mdModified_dt`, `gbl_mdVersion_s`, `gbl_suppressed_b`, `gbl_georeferenced_b`
- `b1g_code_s`, `b1g_status_s`, `b1g_dct_accrualMethod_s`, `b1g_dct_accrualPeriodicity_s`, `b1g_dateAccessioned_s`, `b1g_dateAccessioned_sm`, `b1g_dateRetired_s`, `b1g_child_record_b`, `b1g_dct_mediator_sm`, `b1g_access_s`, `b1g_image_ss`, `b1g_geonames_sm`, `b1g_publication_state_s`, `b1g_language_sm`, `b1g_creatorID_sm`, `b1g_dct_conformsTo_sm`, `b1g_dcat_spatialResolutionInMeters_sm`, `b1g_geodcat_spatialResolutionAsText_sm`, `b1g_dct_provenanceStatement_sm`, `b1g_adminTags_sm`, `b1g_adms_supportedSchema_sm`, `b1g_dcat_endpointDescription_s`, `b1g_dcat_endpointURL_s`, `b1g_dcat_inSeries_sm`, `b1g_localCollectionLabel_sm`, `b1g_prov_softwareAgent_sm`, `b1g_prov_wasGeneratedBy_sm`, `date_created_dtsi`, `date_modified_dtsi`, `geomg_id_s`