mbind

NAME

mbind - Set memory policy for a memory range

SYNOPSIS

 #include <numaif.h>
 
 int mbind(void *start, unsigned long len, int policy,
           unsigned long *nodemask, unsigned long maxnode,
           unsigned flags);
 
 cc ... -lnuma
 

DESCRIPTION

mbind() sets the NUMA memory policy for the memory range starting at start and continuing for len bytes. The memory of a NUMA machine is divided into multiple nodes. The memory policy defines in which node memory is allocated. mbind() only has an effect for new allocations; if the pages inside the range have already been touched before the policy is set, then the policy has no effect on them.

Available policies are MPOL_DEFAULT, MPOL_BIND, MPOL_INTERLEAVE, and MPOL_PREFERRED. All policies except MPOL_DEFAULT require the caller to specify the nodes to which the policy applies in the nodemask parameter. nodemask is a bit mask of nodes containing up to maxnode bits. The actual number of bytes transferred via this argument is rounded up to the next multiple of sizeof(unsigned long), but the kernel will only use bits up to maxnode. A NULL argument means an empty set of nodes.
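The mask layout described above can be sketched with a few helpers. This is a minimal illustration, not part of libnuma: the names nodemask_words(), nodemask_clear(), and nodemask_set() are hypothetical, and the sketch assumes that node n is represented by bit n, counting from the least significant bit of the first word.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memset */

/* Number of bits held by one word of the mask. */
#define NODEMASK_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: words needed for maxnode bits, rounded up to
 * a whole number of unsigned longs. */
static size_t nodemask_words(unsigned long maxnode)
{
    return (maxnode + NODEMASK_WORD_BITS - 1) / NODEMASK_WORD_BITS;
}

/* Hypothetical helper: clear every word of a mask sized for maxnode bits. */
static void nodemask_clear(unsigned long *mask, unsigned long maxnode)
{
    assert(mask != NULL);
    memset(mask, 0, nodemask_words(maxnode) * sizeof(unsigned long));
}

/* Hypothetical helper: set the bit for one node.
 * Returns 0 on success, -1 if node does not fit in maxnode bits.
 * Assumes node n maps to bit n (an assumption of this sketch). */
static int nodemask_set(unsigned long *mask, unsigned long maxnode,
                        unsigned long node)
{
    assert(mask != NULL);
    if (node >= maxnode)
        return -1;
    mask[node / NODEMASK_WORD_BITS] |= 1UL << (node % NODEMASK_WORD_BITS);
    return 0;
}
```

A caller would size an array with nodemask_words(maxnode), clear it, set the wanted nodes, and then pass the array and maxnode to mbind().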

The MPOL_DEFAULT policy is the default and means to use the underlying process policy (which can be modified with set_mempolicy(2)). Unless the process policy has been changed this means to allocate memory on the node of the CPU that triggered the allocation. nodemask should be specified as NULL.

The MPOL_BIND policy is a strict policy that restricts memory allocation to the nodes specified in nodemask. There won't be allocations on other nodes.

MPOL_INTERLEAVE interleaves allocations across the nodes specified in nodemask. This optimizes for bandwidth instead of latency. To be effective, the memory area should be fairly large, at least 1 MB.

MPOL_PREFERRED sets the preferred node for allocation. The kernel will try to allocate in this node first and fall back to other nodes if the preferred node is low on free memory. Only the first node in the nodemask is used. If no node is set in the mask, then the memory is allocated on the node of the CPU that triggered the allocation.
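Because only one node of the mask is honored under MPOL_PREFERRED, it can help to check which node a mask would select before passing it in. The sketch below uses a hypothetical helper, nodemask_first(), and assumes that "first" means the lowest-numbered set bit, with node n stored in bit n; neither assumption is stated by this page.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* NULL */

/* Hypothetical helper: return the lowest-numbered node set in mask,
 * looking only at the first maxnode bits, or -1 if none is set.
 * A result of -1 corresponds to the empty-mask case, where memory is
 * allocated on the node of the CPU that triggered the allocation. */
static long nodemask_first(const unsigned long *mask, unsigned long maxnode)
{
    const unsigned long word_bits = sizeof(unsigned long) * CHAR_BIT;
    unsigned long node;

    assert(mask != NULL || maxnode == 0);
    for (node = 0; node < maxnode; node++) {
        if (mask[node / word_bits] & (1UL << (node % word_bits)))
            return (long)node;
    }
    return -1;
}
```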

If MPOL_MF_STRICT is passed in flags and policy is not MPOL_DEFAULT, then the call will fail with the error EIO if the existing pages in the mapping don't follow the policy. In 2.6.16 or later the kernel will also try to move pages to the requested node with this flag.

If MPOL_MF_MOVE is passed in flags, then an attempt will be made to move all the pages in the mapping so that they follow the policy. Pages that are shared with other processes are not moved. If MPOL_MF_STRICT is also specified, then the call will fail with the error EIO if some pages could not be moved.

If MPOL_MF_MOVE_ALL is passed in flags, then all pages in the mapping will be moved regardless of whether other processes use the pages. The calling process must be privileged (CAP_SYS_NICE) to use this flag. If MPOL_MF_STRICT is also specified, then the call will fail with the error EIO if some pages could not be moved.

RETURN VALUE

On success, mbind() returns 0; on error, -1 is returned and errno is set to indicate the error.

ERRORS

EFAULT
There was an unmapped hole in the specified memory range or a passed pointer was not valid.
EINVAL
An invalid value was specified for flags or policy; or start + len was less than start; or policy was MPOL_DEFAULT and nodemask pointed to a non-empty set; or policy was MPOL_BIND or MPOL_INTERLEAVE and nodemask pointed to an empty set.
ENOMEM
System out of memory.
EIO
MPOL_MF_STRICT was specified and an existing page was already on a node that does not follow the policy.
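
When mbind() returns -1, the errno values listed above can be turned into short diagnostics. The following is a minimal sketch; mbind_error_hint() is a hypothetical helper, and its messages only paraphrase the descriptions in this section.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: map an errno value left by a failed mbind()
 * call to a short hint based on the ERRORS list in this page. */
static const char *mbind_error_hint(int err)
{
    /* errno values are positive; pass errno itself, not -errno. */
    assert(err > 0);

    switch (err) {
    case EFAULT:
        return "EFAULT: unmapped hole in the range, or an invalid pointer";
    case EINVAL:
        return "EINVAL: invalid flags, policy, range, or nodemask";
    case ENOMEM:
        return "ENOMEM: system out of memory";
    case EIO:
        return "EIO: MPOL_MF_STRICT set and a page does not follow the policy";
    default:
        return "not listed in the ERRORS section for mbind()";
    }
}
```

A typical caller would test the return value of mbind() for -1 and then pass errno to the helper when logging the failure.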

CONFORMING TO

This system call is Linux specific.

NOTES

NUMA policy is not supported on file mappings.

MPOL_MF_STRICT is ignored on huge page mappings right now.

It is unfortunate that the same policy value, MPOL_DEFAULT, has different effects for mbind(2) and set_mempolicy(2). To select "allocation on the node of the CPU that triggered the allocation" (like set_mempolicy(2) MPOL_DEFAULT) when calling mbind(), specify a policy of MPOL_PREFERRED with an empty nodemask.
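
The MPOL_PREFERRED idiom with an empty nodemask can be sketched as follows. To stay buildable without libnuma, this sketch invokes the system call through syscall(2) and defines MPOL_PREFERRED locally when <numaif.h> has not supplied it; a program linked with -lnuma would include <numaif.h> and call mbind() directly. The helper name map_with_local_policy() is hypothetical. The policy is set before the pages are touched, since mbind() only affects new allocations.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>     /* mmap */
#include <sys/syscall.h>  /* SYS_mbind */
#include <unistd.h>       /* syscall */

/* Value used by the Linux mempolicy header; defined here only if
 * <numaif.h> has not already provided it. */
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif

/* Hypothetical helper: map len bytes of anonymous memory and ask for
 * "allocate on the node of the CPU that triggered the allocation" by
 * using MPOL_PREFERRED with an empty (NULL) nodemask.
 * Returns the mapping, or NULL if mmap() failed. *policy_rc receives
 * the result of the mbind call: 0 on success, -1 with errno set. */
static void *map_with_local_policy(size_t len, int *policy_rc)
{
    void *p;

    assert(policy_rc != NULL);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* NULL nodemask and maxnode 0 describe an empty set of nodes. */
    *policy_rc = (int)syscall(SYS_mbind, p, (unsigned long)len,
                              (unsigned long)MPOL_PREFERRED,
                              (unsigned long *)NULL, 0UL, 0UL);
    return p;
}
```

The mapping stays usable even if the policy call fails, for example on a kernel built without CONFIG_NUMA, so a caller can treat a -1 result as a missed optimization and continue.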

Versions and Library Support

The mbind(), get_mempolicy(2), and set_mempolicy(2) system calls were added to the Linux kernel with version 2.6.7. They are only available on kernels compiled with CONFIG_NUMA.

Support for huge page policy was added with 2.6.16. For interleave policy to be effective on huge page mappings the policied memory needs to be tens of megabytes or larger.

MPOL_MF_MOVE and MPOL_MF_MOVE_ALL are only available on Linux 2.6.16 and later.

These system calls should not be used directly. Instead, the higher level interface provided by the numa(3) functions in the numactl package is recommended. The numactl package is available at ftp://ftp.suse.com/pub/people/ak/numa/.

You can link with -lnuma to get system call definitions. libnuma is available in the numactl package. This package also has the numaif.h header.

SEE ALSO

numa(3), numactl(8), set_mempolicy(2), get_mempolicy(2), mmap(2)