BaseLib: Strengthen X86MemoryFence#12274
Conversation
(force-pushed from 3ed1b3f to d80569d)
Please add some real examples where the observed order is incorrect. The _MemoryBarrier() is already used in IoLib where needed to prevent compiler optimizations from reordering operations that break real use cases. That is mainly around the use of intrinsics for IoRead/Write operations to I/O ports. For registers that are in uncached memory ranges, the Mmio*() functions use
I am not sure of real-world cases, but please hear me out a bit. Agreed, other instances of _ReadWriteBarrier and volatile help alleviate this. We have shared memory between VM and host, so the I/O functions are not used. If this MemoryFence function really can safely be completely empty, why do we have the Gcc form? Presumably these three implementations should be more similar than they currently are? I realize the comments say the Gcc versions are overkill, but why? This is what motivates this PR as-is, with just _ReadWriteBarrier. And then, of further concern, as I noted and you mentioned:
This is not quite true. Largely true, but not entirely.
https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
https://preshing.com/20120515/memory-reordering-caught-in-the-act/
That is what people might expect MemoryFence to inhibit, and what in fact MemoryBarrier does. I don't think there is any technical difference between "barrier" and "fence"; they are just similar words people chose. Oh, and the reason I got here is that I have this existing code, and I was wondering about removing it. Thank you for your consideration here.
Sorry, I see: I read your "uncached" as "cached". Right. 
Yes. This is a different use case: the "mailbox" model for sending data between two subsystems. One example of this in the UEFI Specification is PCI DMA host controllers. Some of them use one-directional read or write transactions. Others use mailboxes with concurrent access from the CPU and the PCI controller. When the mailbox model is used, a "CommonBuffer" must be allocated with properties that support simultaneous access from both. This could be a limited resource such as static RAM in the chipset, or regular memory with specific cache-setting requirements. This is why there is a clearly documented set of services and algorithms in the UEFI Spec and the UEFI Driver Writer's Guide for this more complex mailbox type of interaction.

For your specific use case, you are describing a mailbox in a memory buffer between a VM host and a VM guest. If the VM host and VM guest can access that mailbox at the same time from different logical processors, then the mailbox interface has to be carefully designed so that it is known who the current owner is when setting up a transaction and ringing the doorbell to release the mailbox to the other agent (ownership transfer). That transfer of ownership from one agent to the other is where memory ordering is very important: you need to make sure all transactions to the mailbox are visible in the coherency domain of the other agent before the ownership transfer is triggered.

Of course, using a non-cached buffer for the mailbox helps resolve a lot of issues, but it may have performance impacts. Using a cached buffer involves cache-line ownership assumptions, and that is where CPU-specific instructions related to cache-line ownership must be used. It is also where visibility of dirty cache lines (also CPU-specific) must be handled. The edk2 SynchronizationLib and the edk2 BaseLib MemoryFence() service are the CPU-specific services required for the use case you are describing if cached buffers are used.
The _ReadWriteBarrier() compiler hint should really only apply to the ownership-transfer point, and it applies to both cached and uncached mappings of the mailbox. Using _ReadWriteBarrier() before and after the write that signals ownership transfer is required. If you look at the implementations in the SynchronizationLib, you will see the use of _ReadWriteBarrier() is already there. So using SynchronizationLib for ownership transfer is the correct edk2 mechanism to apply for this use case. The same approach would be used if shared-memory mailboxes are created to communicate between CPUs that have been started using the MP Services Protocol. This use case is not specific to VMs.
I think we are agreeing? I mean, should the UEFI library offer a portable name? I mean, you want to greatly minimize the use of intrinsics, and hide them. I suggest UEFI borrow from standard C here:
https://en.cppreference.com/w/cpp/header/stdatomic.h
atomic_signal_fence
atomic_thread_fence
In our UEFI downstream, I use both of these for portability. Come up with names for these, then replace each MemoryFence
Description
Under LTCG, X86MemoryFence is completely broken: it does nothing at all.
Under NoLTCG it was reasonable.
How This Was Tested
Build.
Integration Instructions
Very easy to integrate.
While this is progress, I still feel uneasy and I think this merits further discussion.
In our code we have MemoryBarrier. That is the intrinsic __faststorefence, which is `lock or [rsp], 0`: a "full" memory barrier that no loads or stores will cross, in either direction.
I think that might be what people expect, and it is effectively what ARM64 does (dmb). But I am also afraid of slowing things down too much.
FullMemoryFence?