Mike Joseph
2018-11-12 02:25:10 UTC
Hi folks,
It appears that the numa_policy attribute of a PCI alias is ignored for
flavors referencing that alias if the flavor also has
hw:cpu_policy=dedicated set. The alias config is:
alias = { "name": "mlx", "device_type": "type-VF", "vendor_id": "15b3",
"product_id": "1004", "numa_policy": "preferred" }
And the flavor config is:
{
"OS-FLV-DISABLED:disabled": false,
"OS-FLV-EXT-DATA:ephemeral": 0,
"access_project_ids": null,
"disk": 10,
"id": "221e1bcd-2dde-48e6-bd09-820012198908",
"name": "vm-2",
"os-flavor-access:is_public": true,
"properties": "hw:cpu_policy='dedicated', pci_passthrough:alias='mlx:1'",
"ram": 8192,
"rxtx_factor": 1.0,
"swap": "",
"vcpus": 2
}
In short, our compute nodes have an SR-IOV Mellanox NIC (ConnectX-3) with
16 VFs configured. We wish to expose these VFs to VMs that schedule on the
host. However, the NIC is in NUMA region 0, which means that only half of
the compute node's CPU cores would be usable if we required VM affinity to
the NIC's NUMA region. But we don't need that, since we are okay with
cross-region access to the PCI device.
However, we do need CPU pinning to work, in order to have efficient cache
hits on our VM processes. Therefore, we still want to pin our vCPUs to
pCPUs, even if the pins end up on a NUMA region opposite the NIC.
The spec for numa_policy seems to indicate that this is exactly the intent
of the option:
https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/share-pci-between-numa-nodes.html
But, with the above config, we still get PCI affinity scheduling errors:
'Insufficient compute resources: Requested instance NUMA topology together
with requested PCI devices cannot fit the given host NUMA topology.'
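To make the expectation concrete, here is a toy model of how I read the three
numa_policy values described in the spec. It is only a sketch of the expected
semantics, not nova's actual code; the function name and arguments are made up
for illustration.

```python
from typing import Optional, Set


def pci_device_acceptable(policy: str,
                          device_node: Optional[int],
                          instance_nodes: Set[int]) -> bool:
    """Hypothetical helper: may a PCI device on `device_node` be assigned to
    an instance whose pinned CPUs live on `instance_nodes`?

    `device_node` is None when the device reports no NUMA affinity.
    """
    same_node = device_node is not None and device_node in instance_nodes
    if policy == "required":
        # Strict: the device must share a NUMA node with the instance.
        return same_node
    if policy == "legacy":
        # Strict only when the device actually reports a NUMA node.
        return device_node is None or same_node
    if policy == "preferred":
        # Best effort: a same-node device is nicer, but any device will do.
        return True
    raise ValueError("unknown numa_policy: %s" % policy)


# Our case: the NIC is on node 0 and the vCPUs end up pinned on node 1.
print(pci_device_acceptable("preferred", 0, {1}))
print(pci_device_acceptable("required", 0, {1}))
```

Under this reading, "preferred" should allow the VM to land on either NUMA
region, and only "required" (or "legacy" with a device that reports its node)
should produce the error above.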
This strikes me as a bug, but perhaps I am missing something here?
Thanks,
MJ