32 changes: 23 additions & 9 deletions doc/content/lib/xenctrl/xc_domain_node_setaffinity.md
@@ -62,16 +62,30 @@ https://github.com/xen-project/xen/blob/master/xen/common/domain.c#L943-L970"

This function implements the functionality of `xc_domain_node_setaffinity`
to set the NUMA affinity of a domain as described above.

- If `new_affinity` does not intersect the `node_online_map`,
  it returns `-EINVAL`. Otherwise, it returns `0` on success.
- When `new_affinity` is a specific set of NUMA nodes,
  it sets `d->node_affinity` of the domain to these nodes
  and disables `auto_node_affinity` for this domain.
- If `new_affinity` has all bits set, it re-enables `auto_node_affinity`
  for this domain and calls
  [domain_update_node_aff()](https://github.com/xen-project/xen/blob/e16acd80/xen/common/sched/core.c#L1809-L1876)
  to recompute the domain's `node_affinity` mask from the current
  hard and soft affinities of the domain's online vCPUs.

Changing the domain's node affinity updates the preference of the
memory allocator to the new NUMA nodes.

Currently, the only scheduling change is that, if set before vCPU creation,
the initial pCPU of a new vCPU is the first pCPU of the first NUMA node
in the domain's `node_affinity`. This is further changed when one or more
`cpupools` are set up.

When done early, before vCPU creation, domain-related data structures
could be allocated using the domain's `node_affinity` NUMA node mask.
With further changes in Xen, the vCPU struct could also be allocated using it.
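
As an illustration, a minimal `libxenctrl` caller could look like the sketch
below. It assumes the `xc_nodemap_alloc()` helper declared in `xenctrl.h` and
uses an arbitrary example domain ID; it is not the toolstack's actual code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

int main(void)
{
    uint32_t domid = 1;                 /* illustrative domain ID */
    int rc = 1;

    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    if (!xch)
        return 1;

    /* Allocate a zeroed node map and request NUMA node 0 only */
    xc_nodemap_t nodemap = xc_nodemap_alloc(xch);
    if (nodemap) {
        nodemap[0] |= 1 << 0;           /* bit 0 = NUMA node 0 */

        /* Fails with EINVAL if the map misses node_online_map;
           a map with all bits set re-enables auto_node_affinity */
        rc = xc_domain_node_setaffinity(xch, domid, nodemap);
        if (rc)
            perror("xc_domain_node_setaffinity");
        free(nodemap);
    }

    xc_interface_close(xch);
    return rc ? 1 : 0;
}
```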

## Notes on future design improvements

71 changes: 40 additions & 31 deletions doc/content/xenopsd/walkthroughs/VM.build/xenguest.md
@@ -2,6 +2,8 @@
title: xenguest
description:
"Perform building VMs: Allocate and populate the domain's system memory."
mermaid:
force: true
---
As part of starting a new domain in VM_build, `xenopsd` calls `xenguest`.
When multiple domain build threads run in parallel,
@@ -83,38 +85,30 @@ Xenstore[Xenstore platform data] --> xenguest

When called to build a domain, `xenguest` reads those and builds the VM accordingly.
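
As an illustration, reading one of these keys with `libxenstore` could look
like the sketch below; the exact key layout under the domain's `platform`
path is an assumption here, and `get_flags()` remains the authoritative
reader:

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

int main(void)
{
    unsigned int len, domid = 1;        /* illustrative domain ID */
    char path[64];

    struct xs_handle *xsh = xs_open(0);
    if (!xsh)
        return 1;

    /* Hypothetical key; see get_flags() for the real layout */
    snprintf(path, sizeof(path),
             "/local/domain/%u/platform/vcpu/weight", domid);

    char *weight = xs_read(xsh, XBT_NULL, path, &len);
    if (weight) {
        printf("vcpu weight: %s\n", weight);
        free(weight);
    }

    xs_close(xsh);
    return 0;
}
```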

## Walk-through of the xenguest build mode

{{% include "xenguest/mode_vm_build.md" %}}

Based on the given domain type, the `xenguest` program calls a dedicated
function for the build process of that domain type.
The domain build functions
[stub_xc_hvm_build()](https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436)
and stub_xc_pv_build() call these functions:

1. `get_flags()` to get the platform data from the Xenstore
   for filling out the fields of `struct flags` and `struct xc_dom_image`.
2. `configure_vcpus()` which uses the platform data from the Xenstore to configure:
   - If `platform/vcpu/<vcpu-num>/affinity` is set, the vCPU affinity.
     By default, this sets the domain's `node_affinity` mask (NUMA nodes) as well.
     This configures
     [`get_free_buddy()`](https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L855-L958)
     to prefer memory allocations from this NUMA node_affinity mask.
   - If `platform/vcpu/weight` is set, the domain's scheduling weight.
   - If `platform/vcpu/cap` is set, the domain's scheduling cap (% CPU time);
     see the sketch after this list.
3. The `<domain_type>_build_setup_mem` function for the given domain type.
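
A condensed sketch of the two hypercall paths that `configure_vcpus()` can
take (step 2 above), assuming the `libxenctrl` calls named in the call graph
below; the real function parses the Xenstore platform values first:

```c
#include <stdlib.h>
#include <xenctrl.h>

/* Sketch only: error handling trimmed, values hard-coded instead of
   being parsed from the Xenstore platform data */
static int set_affinity_and_credit(xc_interface *xch, uint32_t domid)
{
    /* Pin vCPU 0 to pCPU 0 (hard and soft); while auto_node_affinity
       is enabled, this also updates the domain's node_affinity */
    xc_cpumap_t cpumap = xc_cpumap_alloc(xch);
    if (!cpumap)
        return -1;
    cpumap[0] |= 1 << 0;                /* bit 0 = pCPU 0 */

    int rc = xc_vcpu_setaffinity(xch, domid, 0, cpumap, cpumap,
                                 XEN_VCPUAFFINITY_HARD |
                                 XEN_VCPUAFFINITY_SOFT);
    free(cpumap);
    if (rc)
        return rc;

    /* Credit scheduler parameters, as for platform/vcpu/weight|cap */
    struct xen_domctl_sched_credit sdom = {
        .weight = 512,                  /* twice the default of 256 */
        .cap = 100,                     /* at most 100% of one pCPU */
    };
    return xc_sched_credit_domain_set(xch, domid, &sdom);
}
```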

Call graph of `do_hvm_build()` with emphasis on information flow:

{{% include "xenguest/do_hvm_build.md" %}}

## The function hvm_build_setup_mem()

@@ -129,26 +123,41 @@ new domain. It must:
4. Call the `libxenguest` function `xc_dom_boot_mem_init()` (see below)
5. Call `construct_cpuid_policy()` to apply the CPUID `featureset` policy

It starts this by:
- Getting `struct xc_dom_image`, `max_mem_mib`, and `max_start_mib`.
- Calculating the start and size of the lower ranges of the domain's memory maps,
  taking memory holes for I/O into account, e.g. `mmio_size` and `mmio_start`.
- Calculating `lowmem_end` and `highmem_end`.
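
The low/high split can be sketched as follows; the variable names mirror the
list above, while the exact arithmetic of `hvm_build_setup_mem()` may differ:

```c
#include <stdint.h>

/* Names mirror the list above; the real calculation also accounts
   for firmware regions and other holes */
static void split_memory_map(uint64_t ram_size, uint64_t mmio_size,
                             uint64_t *lowmem_end, uint64_t *highmem_end)
{
    const uint64_t four_gib = 1ULL << 32;
    uint64_t mmio_start = four_gib - mmio_size;

    *lowmem_end = ram_size;
    *highmem_end = 0;

    if (*lowmem_end > mmio_start) {
        /* Relocate RAM overlapping the I/O hole to above 4 GiB */
        *highmem_end = four_gib + (*lowmem_end - mmio_start);
        *lowmem_end = mmio_start;
    }
}
```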

It then calls `xc_dom_boot_mem_init()`:

## The function xc_dom_boot_mem_init()

`hvm_build_setup_mem()` calls
[xc_dom_boot_mem_init()](https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_boot.c#L110-L126)
to allocate and populate the domain's system memory:

```mermaid
flowchart LR
subgraph xenguest
hvm_build_setup_mem[hvm_build_setup_mem#40;#41;]
end
subgraph libxenguest
hvm_build_setup_mem --vmemranges--> xc_dom_boot_mem_init[xc_dom_boot_mem_init#40;#41;]
xc_dom_boot_mem_init -->|vmemranges| meminit_hvm[meminit_hvm#40;#41;]
click xc_dom_boot_mem_init "https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_boot.c#L110-L126" _blank
click meminit_hvm "https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_x86.c#L1348-L1648" _blank
end
```

Except for error handling and tracing, it is only a wrapper that calls the
architecture-specific `meminit()` hook for the domain type:

```c
rc = dom->arch_hooks->meminit(dom);
```
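
For context, the whole function is roughly (condensed from the linked revision):

```c
int xc_dom_boot_mem_init(struct xc_dom_image *dom)
{
    long rc;

    DOMPRINTF_CALLED(dom->xch);

    /* Dispatch to e.g. meminit_hvm() or meminit_pv() */
    rc = dom->arch_hooks->meminit(dom);
    if ( rc != 0 )
    {
        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
                     "%s: can't allocate low memory for domain",
                     __FUNCTION__);
        return rc;
    }

    return 0;
}
```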

For HVM domains, it calls
[meminit_hvm()](https://github.com/xen-project/xen/blob/39c45c/tools/libs/guest/xg_dom_x86.c#L1348-L1648)
to loop over the `vmemranges` of the domain for mapping the system RAM
of the guest from the Xen hypervisor heap. Its goals are:
62 changes: 62 additions & 0 deletions doc/content/xenopsd/walkthroughs/VM.build/xenguest/do_hvm_build.md
@@ -0,0 +1,62 @@
---
title: Call graph of xenguest/do_hvm_build()
description: Call graph of xenguest/do_hvm_build() with emphasis on information flow
---
```mermaid
flowchart TD
do_hvm_build("<tt>do_hvm_build()</tt> for HVM")
--> stub_xc_hvm_build
get_flags("<tt>get_flags()</tt>") --"VM platform_data from XenStore"
--> stub_xc_hvm_build("<tt>stub_xc_hvm_build()</tt>")
stub_xc_hvm_build --> configure_vcpus(configure_vcpus#40;#41;)
configure_vcpus --"When<br><tt>platform/
vcpu/%d/affinity</tt><br>is set"--> xc_vcpu_setaffinity
configure_vcpus --"When<br><tt>platform/
vcpu/cap</tt><br>or<tt>
vcpu/weight</tt><br>is set"--> xc_sched_credit_domain_set
stub_xc_hvm_build --"struct xc_dom_image, mem_start_mib, mem_max_mib"
--> hvm_build_setup_mem("hvm_build_setup_mem()")
--"struct xc_dom_image
with
optional vmemranges"--> xc_dom_boot_mem_init
subgraph libxenguest
xc_dom_boot_mem_init("xc_dom_boot_mem_init()")
-- "struct xc_dom_image
with
optional vmemranges" -->
meminit_hvm("meminit_hvm()") -- page_size(1GB,2M,4k, memflags: e.g. exact) -->
xc_domain_populate_physmap("xc_domain_populate_physmap()")
end
subgraph set_affinity[XenCtrl Hypercalls]
direction TB
xc_sched_credit_domain_set("xc_sched_credit_domain_set()")
xc_vcpu_setaffinity("xc_vcpu_setaffinity()")
--> vcpu_set_affinity("vcpu_set_affinity()")
--> domain_update_node_aff("domain_update_node_aff()")
-- "if auto_node_affinity
is on (default)"--> auto_node_affinity(Update dom->node_affinity)
end
subgraph xen[Xen hypervisor]
xc_domain_populate_physmap("xc_domain_populate_physmap()")
--hypercall: XENMEM_populate_physmap
-->memory_op_populate_physmap("memory_op(populate_physmap)")
end
click vcpu_set_affinity
"https://github.com/xen-project/xen/blob/e16acd806/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff
"https://github.com/xen-project/xen/blob/e16acd806/xen/common/sched/core.c#L1809-L1876" _blank
click stub_xc_hvm_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
click hvm_build_setup_mem
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2002-L2219" _blank
click get_flags
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1164-L1288" _blank
click configure_vcpus
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1297" _blank
click xc_dom_boot_mem_init
"https://github.com/xen-project/xen/blob/e16acd806/tools/libs/guest/xg_dom_boot.c#L110-L125"
click meminit_hvm
"https://github.com/xen-project/xen/blob/e16acd806/tools/libs/guest/xg_dom_x86.c#L1348-L1648"
click xc_domain_populate_physmap
"../../../../lib/xenctrl/xc_domain_populate_physmap/index.html" _blank
```
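
To illustrate the final arrow, the sketch below shows a minimal, hypothetical
standalone use of `xc_domain_populate_physmap_exact()`; `meminit_hvm()` issues
such calls in batches over all `vmemranges` and supported page orders:

```c
#include <xenctrl.h>

/* Populates a single 2 MiB superpage at an illustrative,
   2 MiB-aligned guest pfn */
static int populate_one_superpage(xc_interface *xch, uint32_t domid)
{
    xen_pfn_t base_pfn = 0x200;         /* illustrative guest pfn */

    /* The "exact" variant fails rather than splintering the extent */
    return xc_domain_populate_physmap_exact(xch, domid,
                                            1,  /* nr_extents */
                                            9,  /* order 9 = 2 MiB */
                                            0,  /* mem_flags */
                                            &base_pfn);
}
```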
@@ -0,0 +1,36 @@
---
hidden: true
title: Call graph to the xenguest hvm/pvh/pv build functions
description: Call graph of xenguest for calling the hvm/pvh/pv build functions
---
```mermaid
flowchart LR
xenguest_main["
<tt>xenguest
--mode hvm_build
/
--mode pvh_build
/
--mode pv_build
"] --> do_hvm_build["
<tt>do_hvm_build()</tt> for HVM
"] & do_pvh_build["<tt>do_pvh_build()</tt> for PVH"] -- "`**Arguments:**
domid
mem_max_mib
mem_start_mib
image
store_port
store_domid
console_port
console_domid`" --> stub_xc_hvm_build["<tt>stub_xc_hvm_build()"]
xenguest_main --> do_pv_build[<tt>do_pvh_build</tt> for PV] -->
stub_xc_pv_build["<tt>stub_xc_pv_build()"]
click do_pv_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L575-L594" _blank
click do_hvm_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L596-L615" _blank
click do_pvh_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L617-L640" _blank
click stub_xc_hvm_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
```