Add clustered migration sync for shared disks (SYNCING barrier) #403
fabi200123 wants to merge 1 commit into cloudbase:master
Conversation
    shared disks. Once the owner replicate task completes, any
    waiting (SYNCING) replicate tasks are moved back to SCHEDULED
    so they can continue their normal flow.
    """
So as far as I understand this, this will basically block the rest of the cluster while the main one gets transferred. That is fine for shared disks, but what I proposed initially was to have them all running in parallel once the syncing tasks (which were not meant to be REPLICATE_DISKS, btw) are done. You could add SYNCING on this task alone while waiting for the rest of the tasks to complete (tasks like DEPLOY_REPLICA_DISKS, SHUTDOWN_INSTANCE); that's also feasible, but please set the volumes_info accordingly instead of blocking all of them.
What I originally envisioned was something like this:
instance1 has: root_disk1, shared_disk1;
instance2 has: root_disk2, shared_disk1.
What's the point of blocking instance2 while instance1 is replicating? When you can have the following:
instance1 replicates root_disk1, shared_disk1;
instance2 replicates root_disk2, skips shared_disk1, based on the volumes_info that you can set up before launching the replicate_disks task.
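The parallel scheme sketched above could be set up before launching the replicate_disks tasks. A minimal sketch, assuming one owner per shared disk and a hypothetical `shared` flag and `replicate_disk_data` key on each volumes_info entry (all names here are illustrative, not the actual Coriolis API):

```python
def assign_shared_disk_owners(volumes_info_per_instance):
    """Mark exactly one instance as the transfer owner of each
    shared disk; every other instance skips the data transfer."""
    seen_shared = set()
    for instance, volumes_info in volumes_info_per_instance.items():
        for vol in volumes_info:
            if not vol.get("shared"):
                continue
            disk_id = vol["disk_id"]
            # First instance listing the shared disk becomes its owner.
            vol["replicate_disk_data"] = disk_id not in seen_shared
            seen_shared.add(disk_id)
    return volumes_info_per_instance

volumes = {
    "instance1": [{"disk_id": "root_disk1"},
                  {"disk_id": "shared_disk1", "shared": True}],
    "instance2": [{"disk_id": "root_disk2"},
                  {"disk_id": "shared_disk1", "shared": True}],
}
assign_shared_disk_owners(volumes)
# instance1 replicates shared_disk1; instance2 skips it,
# while both transfer their root disks in parallel.
```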
Please let me know if there's anything preventing us from doing a parallel sync, with shared_disks being transferred by only one of the instances.
Force-pushed from 8152e8e to 9813d2b
Introduce TASK_STATUS_SYNCING and TASK_TYPES_TO_SYNC (GET_INSTANCE_INFO, DEPLOY_TRANSFER_DISKS, REPLICATE_DISKS) so multi-instance transfers with base_transfer_action.clustered=True wait for all peer tasks of the same type before leaving SYNCING for COMPLETED and advancing dependents.
- clustered is set as len(instances) > 1 on transfer create
- On task_completed: enter SYNCING when the barrier applies; then, when every peer is SYNCING, run sync hooks (GET_INSTANCE_INFO: promote shareable on export disks, DEPLOY_TRANSFER_DISKS: shared-disk volumes_info, REPLICATE_DISKS: sync change_id)
- ReplicateDisksTask: skip provider replicate for replicate_disk_data=False
- On task error: abort peers stuck in SYNCING for the same task type
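The barrier described in the commit message can be sketched as follows. This is a hypothetical simplification (plain dicts standing in for task objects, and the sync hooks stubbed out), not the actual conductor code:

```python
SCHEDULED, SYNCING, COMPLETED = "SCHEDULED", "SYNCING", "COMPLETED"

def on_task_completed(task, peers):
    """SYNCING barrier: hold a finished task in SYNCING until every
    peer task of the same type across the clustered transfer has
    also reached SYNCING, then promote all of them to COMPLETED.

    `peers` includes `task` itself."""
    task["status"] = SYNCING
    if all(p["status"] == SYNCING for p in peers):
        # run_sync_hooks(peers)  # e.g. merge shared-disk volumes_info,
        #                        # sync change_id from the owner, etc.
        for p in peers:
            p["status"] = COMPLETED

tasks = [{"id": i, "status": SCHEDULED} for i in range(3)]
on_task_completed(tasks[0], tasks)   # tasks[0] waits in SYNCING
on_task_completed(tasks[1], tasks)   # still waiting
on_task_completed(tasks[2], tasks)   # barrier released: all COMPLETED
```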
    constants.TASK_STATUS_PENDING,
    constants.TASK_STATUS_STARTING):
    constants.TASK_STATUS_STARTING,
    constants.TASK_STATUS_SYNCING):
Doesn't this already handle whatever _abort_peer_sync_barrier_tasks_on_error does? If one of the tasks gets an error, the rest of the tasks get cancelled, even if the error comes from another instance.
    return s if s else None


def cluster_disk_identity(disk_id_or_obj):
If this is just some kind of workaround for the fact that we use disk_keys instead of IDs, I think you should fix this provider-side. The provider should always return a good identifier as the disk_id (one that's the same amongst clustered instances). In this case, for VMWare, you should use the disk's FCD ID instead of VM device key (which is arbitrary).
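To make the reviewer's point concrete, here is a guess at what such a normalisation helper looks like (field names like `fcd_id` are illustrative assumptions, not the actual provider schema):

```python
def cluster_disk_identity(disk_id_or_obj):
    """Normalise a provider-returned disk identifier into something
    comparable across clustered instances.

    The reviewer's argument: this should not be needed at all; the
    provider should emit a cluster-stable identifier (e.g. the FCD ID
    on VMware) as disk_id, instead of an arbitrary VM device key."""
    if isinstance(disk_id_or_obj, dict):
        # Prefer a cluster-stable identifier when the provider has one.
        return (disk_id_or_obj.get("fcd_id")
                or disk_id_or_obj.get("disk_id"))
    return disk_id_or_obj
```

With a stable disk_id from the provider, both instances would report the same identity for shared_disk1 and the conductor could match them directly.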
        or not v.get(
            constants.VOLUME_INFO_REPLICATE_DISK_DATA, True))
        else 1))
    if sorted_volumes != volumes_info:
Why are volumes re-ordered?
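For reference, a guess at what the quoted sort key does (data and field names mirror the diff but are simplified): entries that are not actually transferred by this instance (replicate_disk_data=False) sort to the front, grouping the real transfers at the end.

```python
volumes_info = [
    {"disk_id": "root_disk1", "replicate_disk_data": True},
    {"disk_id": "shared_disk1", "replicate_disk_data": False},
]

# Key 0 for skipped volumes, 1 for transferred ones; sorted() is
# stable, so relative order within each group is preserved.
sorted_volumes = sorted(
    volumes_info,
    key=lambda v: 0 if not v.get("replicate_disk_data", True) else 1)
```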
    owner_vol = owner_volumes_by_disk.get(owner_id, {}).get(ident)
    if not owner_vol:
        continue
    if "change_id" in owner_vol and (
- change_id is not the only incremental key we use/might use;
- should we update this key for all of the instances? As long as we keep the owner or replicate_disk_data the same between syncs, I don't think so. It's not like users are able to remove the "owner" from the clustered migration anyway; that stays the same, always.
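The change_id propagation under discussion can be illustrated roughly like this (hypothetical data shapes; the real code keys volumes by the cluster disk identity, and other incremental keys may exist besides change_id):

```python
def sync_owner_change_ids(owner_volumes, peer_volumes_list):
    """After the owner's replicate task finishes, copy its per-disk
    incremental-sync baseline (change_id) onto the matching
    shared-disk entries of the other instances, so later incremental
    runs agree on a common baseline."""
    owner_by_disk = {v["disk_id"]: v for v in owner_volumes}
    for peer_volumes in peer_volumes_list:
        for vol in peer_volumes:
            owner_vol = owner_by_disk.get(vol["disk_id"])
            if owner_vol and "change_id" in owner_vol:
                vol["change_id"] = owner_vol["change_id"]

owner = [{"disk_id": "shared_disk1", "change_id": "52 ab/1"}]
peer = [{"disk_id": "shared_disk1"}]
sync_owner_change_ids(owner, [peer])
# peer's shared_disk1 entry now carries the owner's change_id
```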
        }
    },
    "required": ["disk_id", "volume_dev"],
    "required": ["disk_id"],
Why was volume_dev removed?
The replicate_disks task heavily relies on it, even when there are no shared disks (the most common case).
I think we'd be better off just setting an empty volume_dev for the volumes we don't actually transfer (non-owners). It's better to enforce it for the most common case than to remove it entirely, imo.
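The suggestion above could be implemented roughly like this (a sketch with illustrative names, keeping volume_dev required in the schema while filling a harmless empty value for volumes this instance does not transfer):

```python
def fill_non_owner_volume_devs(volumes_info):
    """For shared-disk entries this instance does not transfer
    (replicate_disk_data=False), set an empty volume_dev so the
    schema can keep the field required for everyone."""
    for vol in volumes_info:
        if not vol.get("replicate_disk_data", True):
            # No real device path: this copy is skipped during
            # replication, the owner instance transfers the data.
            vol.setdefault("volume_dev", "")
    return volumes_info

vols = [{"disk_id": "shared_disk1", "replicate_disk_data": False},
        {"disk_id": "root_disk2", "volume_dev": "/dev/sdb",
         "replicate_disk_data": True}]
fill_non_owner_volume_devs(vols)
```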
    volumes_info = _get_volumes_info(task_info)
    # NOTE: parallel minion attach may run before conductor merges
    # shareable.
    utils.apply_export_disk_shareable_metadata_to_volumes_info(
Is this really necessary?
How can the minion make attachments if the deploy replica disks tasks are synced?
Introduce TASK_STATUS_SYNCING and TASK_TYPES_REQUIRING_CLUSTER_SYNC (DEPLOY_TRANSFER_DISKS, SHUTDOWN_INSTANCE) so multi-instance transfers with base_transfer_action.clustered=True wait for all peer tasks before marking COMPLETED and advancing dependents.
Volumes schema already allows extra properties; replicate_disk_data is consumed by replication only (default True preserves behavior).