This is a brief guide for migrating PosgreSQL repositories to UUID dataset IDs based on experience obtained when we migrated few repositories at NCSA (see this post).
Migration from integer IDs to UUIDs is not strictly necessary but it is advisable to switch repository to UUID if you plan to import contents of other repositories. Depending on size of the repository it can take significant time to migrate, for example migration of largest NCSA repository (
dc2 with ~30 million records) took more than three hours. Migration process generally requires exclusive access to database, it is safest to ask DBAs to limit access to a database to just a single user account which is going to perform this migration (doing
ALTER ROLE user NOLOGIN for all other accounts).
This particular migration executes a series of
ALTER TABLE ... statements which in PostgreSQL require user account executing them to be the owner of the modified table (or PostgreSQL superuser). Butler registry contains multiple tables created on demand by users adding new data which leads to a mixed ownership. For successful migration all tables that undergo migration need to be transferred to an account doing migration. The list of tables that are updated by this migration includes:
- all table names starting with
- all table names starting with
The command to change ownership of a table is
ALTER TABLE table_name OWNER TO new_role. Alternatively the role that is executing migration commands can be given superuser privileges with command
ALTER ROLE role SUPERUSER , all of the
ALTER commands obviously need to be executed by DBA or any other superuser role.
Actual migration is performed by a tool which is a part of the
daf_butler_migrate package which is not a part of LSST stack so it needs to be cloned separately:
git clone email@example.com:lsst-dm/daf_butler_migrate.git cd daf_butler_migrate source some-path/loadLSST.bash setup -r .
The command that performs all migration-related operations is
butler migrate. To check current status of the repository versions as recorded in the registry you can run this command,
repo is the location of the
$ repo=/path/to/butler.yaml $ butler --log-level=warning migrate show-current $repo --butler attributes: lsst.daf.butler.registry.attributes.DefaultButlerAttributeManager 1.0.0 -> f22a777cf382 collections: lsst.daf.butler.registry.collections.nameKey.NameKeyCollectionManager 2.0.0 -> 93341d68b814 datasets: lsst.daf.butler.registry.datasets.byDimensions._manager.ByDimensionsDatasetRecordStorageManager 1.0.0 -> 635083325f20 datastores: lsst.daf.butler.registry.bridge.monolithic.MonolithicDatastoreRegistryBridgeManager 0.2.0 -> a07b3b60e369 dimensions: lsst.daf.butler.registry.dimensions.static.StaticDimensionRecordStorageManager 6.0.0 -> 87a30df8c8c5 opaque: lsst.daf.butler.registry.opaque.ByNameOpaqueTableStorageManager 0.2.0 -> 77e5b803ad3f
The scope of this UUID migration is
datasets manager, and its name and version has to be as shown above
ByDimensionsDatasetRecordStorageManager 1.0.0. Other managers may have different versions from shown above, this should not matter for UUID migration.
Migration tool is based on
Alembic which requires presence of one special table in the database. If this is the first ever migration performed on a database then this special table needs to be created by
$ butler --log-level=warning migrate stamp $repo
After this command one can check revision numbers in alembic table:
$ butler --log-level=warning migrate show-current $repo a07b3b60e369 (head) 77e5b803ad3f (head) 93341d68b814 (head) 635083325f20 (head) f22a777cf382 (head) 87a30df8c8c5 (head)
(the output will match revisions from previous command with
This UUID migration is a special what is called “one-shot” migration which changes type of the
datasets manager from
ByDimensionsDatasetRecordStorageManager 1.0.0 to
ByDimensionsDatasetRecordStorageManagerUUID 1.0.0, revision number for latter is
2101fbf51ad3. The command that performs migration is (redirect its output to a log file in case something goes wrong):
$ butler --log-level=debug --long-log migrate upgrade --one-shot-tree datasets/int_1.0.0_to_uuid_1.0.0 $repo 2101fbf51ad3 |& tee migration.log
The migration can print some warning messages, please ignore. This command can take significant time if the database is large. It is executed in a single database transaction, if anything fails during the migration then the state is reverted to original.
After this command completes one can check the new version numbers:
$ butler --log-level=warning migrate show-current $repo --butler attributes: lsst.daf.butler.registry.attributes.DefaultButlerAttributeManager 1.0.0 -> f22a777cf382 collections: lsst.daf.butler.registry.collections.nameKey.NameKeyCollectionManager 2.0.0 -> 93341d68b814 datasets: lsst.daf.butler.registry.datasets.byDimensions._manager.ByDimensionsDatasetRecordStorageManagerUUID 1.0.0 -> 2101fbf51ad3 datastores: lsst.daf.butler.registry.bridge.monolithic.MonolithicDatastoreRegistryBridgeManager 0.2.0 -> a07b3b60e369 dimensions: lsst.daf.butler.registry.dimensions.static.StaticDimensionRecordStorageManager 6.0.0 -> 87a30df8c8c5 opaque: lsst.daf.butler.registry.opaque.ByNameOpaqueTableStorageManager 0.2.0 -> 77e5b803ad3f
which shows that
datasets manager is recorded as
ByDimensionsDatasetRecordStorageManagerUUID 1.0.0 in the database. This change also needs to be reflected in
butler.yaml file. Edit the file and replace
$ vi $repo :1,$s/ByDimensionsDatasetRecordStorageManager/ByDimensionsDatasetRecordStorageManagerUUID/ :wq
To verify that everything works after migration you can run regular butler commands, e.g.:
$ butler query-datasets $repo --collections <collection> <dataset-type>