Large Page Considerations

Compiler Methodology for Intel® MIC Architecture

Use THP, which is enabled by default in the MPSS operating system

MPSS versions later than 2.1.4982-15 support “Transparent Huge Pages (THP)”, which automatically promotes 4KB pages to 2MB pages for stack and heap allocated data. This means that, for static and dynamic data, the coprocessor operating system (uOS) automatically converts 4KB pages to 2MB pages when the data is accessed in a contiguous pattern. You can find more details here: http://software.intel.com/en-us/blogs/2013/07/09/transparent-huge-pages-on-intel-xeon-phi-coprocessors

“Transparent huge pages” is a Linux kernel feature introduced in kernel version 2.6.38. The article at http://lwn.net/Articles/423584/ gives a general picture of how Linux allocates huge pages where they are useful without starving the application of available pages.
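To check whether THP is active, a program can read the mode the kernel reports. The following is a minimal sketch, assuming the standard Linux sysfs file /sys/kernel/mm/transparent_hugepage/enabled is also exposed by the uOS (not confirmed here for every MPSS release). The function names thp_selected_mode and thp_read_mode are hypothetical helpers introduced for illustration.

```c
#include <stdio.h>
#include <string.h>

/* The sysfs file lists all modes and brackets the active one, for example
 * "[always] madvise never". Copy the bracketed token into out.
 * Returns 0 on success, -1 if no bracketed token is found or it does not fit. */
int thp_selected_mode(const char *line, char *out, size_t out_len)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!open || !close)
        return -1;
    size_t len = (size_t)(close - open - 1);
    if (len + 1 > out_len)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the active THP mode from sysfs (path assumed, see above).
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int thp_read_mode(char *out, size_t out_len)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? thp_selected_mode(line, out, out_len) : -1;
}
```

Calling thp_read_mode with a small buffer (16 bytes is enough) on the coprocessor yields "always", "madvise", or "never" on kernels that provide this file. A return value of -1 suggests the kernel was built without THP or the path differs.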

User programs can use mmap with special arguments to allocate data directly in 2MB pages

User programs can allocate dynamic data directly in 2MB pages by calling the mmap system call with the MAP_HUGETLB flag instead of using malloc/new. This may be useful when the data access pattern is such that the program can still benefit from 2MB pages even though THP may not be triggered in the uOS. The following macros show how to get 2MB pages using mmap:

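#include <sys/mman.h>
/* Allocate size bytes backed by 2MB pages; size should be a multiple of 2MB. */
/* Evaluates to MAP_FAILED (not NULL) if the allocation fails. */
#define my_malloc(size) \
            mmap(NULL, (size), PROT_READ | PROT_WRITE, \
            MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0)
#define my_free(addr, size) munmap((addr), (size))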
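A huge-page mmap request fails when no 2MB pages are available, so callers usually need an error check and a fallback. The following is a minimal sketch of one way to do that, not part of the original guide. The names round_up_2mb and alloc_prefer_huge are hypothetical, and the 2MB page size is hard-coded as an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE ((size_t)2 << 20)   /* 2MB, assumed huge page size */

/* Round size up to the next multiple of 2MB. */
size_t round_up_2mb(size_t size)
{
    return (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Try to map the request in 2MB pages; if that fails, fall back to
 * ordinary pages. On success, *mapped_len receives the length that must
 * later be passed to munmap. Returns NULL if both attempts fail. */
void *alloc_prefer_huge(size_t size, size_t *mapped_len)
{
    size_t len = round_up_2mb(size);
    void *p = MAP_FAILED;
#ifdef MAP_HUGETLB
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped_len = len;
    return p;
}
```

Release the memory with munmap(p, mapped_len). Keeping the rounded length matters because munmap on a huge-page mapping expects a length that is a multiple of the huge page size.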

Use library solutions such as libhugetlbfs

Another alternative is to use a library such as libhugetlbfs to automatically allocate all malloc-ed data and static data in 2MB pages. This also works for Fortran. See the tips in this article on using libhugetlbfs: http://software.intel.com/en-us/articles/optimizing-memory-bandwidth-on-stream-triad

Huge Pages in offload programs

In offload programs, automatic THP promotion applies to static data defined on the MIC side and to dynamic data allocated inside an offload region with a malloc or new call.

THP does not apply to data that #pragma offload allocates for pointer variables in in/out/nocopy clauses. For such data, you can set the environment variable MIC_USE_2MB_BUFFERS on the host to a threshold size beyond which allocations are made in 2MB pages. See this article for more details: http://software.intel.com/en-us/articles/effective-use-of-the-intel-compilers-offload-features
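The two cases can be seen side by side in a small sketch. This is an illustration under stated assumptions, not code from the guide. The function sum_squares is hypothetical. The pragma is guarded by the __INTEL_OFFLOAD macro so that the file also builds and runs on the host with compilers that have no offload support.

```c
#include <stdlib.h>

/* Returns the sum of a[i]*a[i] for i in [0, n), or -1.0 if the
 * temporary buffer cannot be allocated. */
double sum_squares(double *a, int n)
{
    double total = 0.0;
    if (n <= 0)
        return total;

    /* 'a' is a pointer in an in clause: its coprocessor buffer is created
     * by the offload runtime, so THP does not apply to it. Whether it lands
     * in 2MB pages depends on MIC_USE_2MB_BUFFERS set on the host. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(a : length(n)) inout(total)
#endif
    {
        /* 'tmp' is allocated with malloc inside the offload region, so it
         * lives on the coprocessor heap and is eligible for THP promotion. */
        double *tmp = (double *)malloc((size_t)n * sizeof *tmp);
        if (tmp) {
            for (int i = 0; i < n; i++)
                tmp[i] = a[i] * a[i];
            for (int i = 0; i < n; i++)
                total += tmp[i];
            free(tmp);
        } else {
            total = -1.0;   /* allocation failure marker */
        }
    }
    return total;
}
```

Before launching the host program, the threshold would be exported in the host environment, for instance with a value such as 64K (an illustrative value; the linked article describes the accepted settings).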

NEXT STEPS

It is essential that you read this guide from start to finish using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on Intel® Xeon Phi™ coprocessors. The paths provided in this guide reflect the steps necessary to get best possible application performance.
