Linux Transparent HugePages and the impact to InterSystems IRIS

** Revised Feb-12, 2018

While this article is about InterSystems IRIS, it also applies to Caché, Ensemble, and HealthShare distributions.

 

Introduction

Memory is managed in pages. The default page size is 4KB on Linux systems. Red Hat Enterprise Linux 6, SUSE Linux Enterprise Server 11, and Oracle Linux 6 introduced a method to provide an increased page size of 2MB or 1GB, depending on system configuration, known as HugePages.
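
To see both sizes on a running system, the base page size can be read with getconf, and the configured huge page size is reported on the Hugepagesize line of /proc/meminfo. The following is a minimal sketch; the helper name hugepage_size_kb is hypothetical, and its optional file argument exists only so it can be pointed at a saved copy of meminfo.

```shell
# Base page size in bytes (typically 4096 on x86_64 Linux).
getconf PAGESIZE

# hugepage_size_kb (hypothetical helper): print the Hugepagesize value, in kB,
# from a meminfo-style file. Defaults to /proc/meminfo.
hugepage_size_kb() {
    awk '$1 == "Hugepagesize:" { print $2 }' "${1:-/proc/meminfo}"
}

hugepage_size_kb
```

On a system configured for 2MB huge pages, the second command prints 2048.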

At first, HugePages had to be reserved at boot time, and if they were not managed or sized appropriately, the result could be wasted resources. As a result, Transparent HugePages were introduced as a means to automate creating, managing, and using HugePages, and various Linux distributions ship them enabled by default starting with the 2.6.38 kernel. Earlier kernel versions, including some vendor kernels, may have this feature as well; however, it may not be set to [always] and may instead be set to [madvise]. Check your Linux distribution's documentation to confirm whether THP is available and enabled by default on your kernel.

Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages. However, in current Linux releases THP can only map anonymous memory, such as individual process heap and stack space.
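
The TLB saving is easiest to see with simple arithmetic: each TLB entry translates one page, so the number of pages needed to map a region is a rough indicator of TLB pressure. The sketch below (the helper name pages_needed is hypothetical) counts the pages needed to map 1GB with 4KB pages and with 2MB pages. It illustrates the ratio only and is not a measurement of TLB behavior.

```shell
# pages_needed (hypothetical helper): number of pages of PAGE_SIZE bytes
# required to map BYTES, rounding up for a partial final page.
pages_needed() {
    local bytes=$1 page_size=$2
    echo $(( (bytes + page_size - 1) / page_size ))
}

one_gb=$((1024 * 1024 * 1024))
echo "4KB pages to map 1GB: $(pages_needed "$one_gb" 4096)"
echo "2MB pages to map 1GB: $(pages_needed "$one_gb" $((2 * 1024 * 1024)))"
```

This prints 262144 for 4KB pages and 512 for 2MB pages, so each 2MB page stands in for 512 separate 4KB translations.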

 

The Problem

The majority of memory allocated on any Caché system is in the shared memory segments (the global and routine buffer pools), and THP does not handle these shared memory segments. As a result, THP are not used for shared memory and are only used for each individual process. This can be confirmed using a simple shell command.

The following is an example from a test system at InterSystems, which shows 2MB THP allocated to Caché processes:

# grep -e AnonHugePages  /proc/*/smaps | awk  '{ if($2>4) print $0} ' |  awk -F "/"  '{print $0; system("ps -fp " $3)} '

/proc/2945/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root       2945      1  0  2015 ?        01:35:41 /usr/sbin/rsyslogd -n
/proc/70937/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70937  70897  0 Jan27 pts/0    00:01:58 /bench/EJR/ycsb161b641/bin/cache WD
/proc/70938/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70938  70897  0 Jan27 pts/0    00:00:00 /bench/EJR/ycsb161b641/bin/cache GC
/proc/70939/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70939  70897  0 Jan27 pts/0    00:00:39 /bench/EJR/ycsb161b641/bin/cache JD
/proc/70939/smaps:AnonHugePages:      4096 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70939  70897  0 Jan27 pts/0    00:00:39 /bench/EJR/ycsb161b641/bin/cache JD
/proc/70940/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70940  70897  0 Jan27 pts/0    00:00:29 /bench/EJR/ycsb161b641/bin/cache SWD 1
/proc/70941/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70941  70897  0 Jan27 pts/0    00:00:29 /bench/EJR/ycsb161b641/bin/cache SWD 2
/proc/70942/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70942  70897  0 Jan27 pts/0    00:00:29 /bench/EJR/ycsb161b641/bin/cache SWD 3
/proc/70943/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70943  70897  0 Jan27 pts/0    00:00:33 /bench/EJR/ycsb161b641/bin/cache SWD 7
/proc/70944/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70944  70897  0 Jan27 pts/0    00:00:29 /bench/EJR/ycsb161b641/bin/cache SWD 4
/proc/70945/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70945  70897  0 Jan27 pts/0    00:00:30 /bench/EJR/ycsb161b641/bin/cache SWD 5
/proc/70946/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
root      70946  70897  0 Jan27 pts/0    00:00:30 /bench/EJR/ycsb161b641/bin/cache SWD 6
/proc/70947/smaps:AnonHugePages:      4096 kB
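
Because this listing is per memory mapping, a process with several THP-backed mappings (such as PID 70939 above) appears more than once. The following minimal sketch totals AnonHugePages per PID from the same kind of grep output; the helper name thp_kb_by_pid is hypothetical. For a single system-wide figure, grep AnonHugePages /proc/meminfo reports the overall total.

```shell
# thp_kb_by_pid (hypothetical helper): read lines such as
#   /proc/70939/smaps:AnonHugePages:      4096 kB
# on stdin and print "PID total_kB" for every PID with a non-zero total,
# sorted by PID. Splitting on "/", ":" and spaces makes $3 the PID and $6 the size.
thp_kb_by_pid() {
    awk -F'[/: ]+' '$5 == "AnonHugePages" && $6 > 0 { kb[$3] += $6 }
        END { for (pid in kb) print pid, kb[pid] }' | sort -n
}

# Usage on a live system (-H forces the file name prefix; errors from
# processes that exit during the scan are discarded):
#   grep -H AnonHugePages /proc/[0-9]*/smaps 2>/dev/null | thp_kb_by_pid
```

Each PID reported can then be passed to ps -fp, as the original command does, to identify the process.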

In addition, there are potential performance penalties in the form of memory allocation delays at runtime, especially for applications that may have a high rate of job or process creation.

 

The Recommendation

For the time being, InterSystems recommends disabling THP, because the intended performance gain does not apply to Caché shared memory segments and there is potential for a negative performance impact in some applications.

Check whether your Linux system has Transparent HugePages enabled by running one of the following commands:

For Red Hat Enterprise Linux kernels:

# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled

For other kernels:

# cat /sys/kernel/mm/transparent_hugepage/enabled

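The output lists the three possible values, always, madvise, and never; the value shown in square brackets is the one currently in effect. For example, "[always] madvise never" means THP is enabled for all processes. If THP has been removed from the kernel, the /sys/kernel/mm/redhat_transparent_hugepage or /sys/kernel/mm/transparent_hugepage directories do not exist.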
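
Because the active value is the bracketed one, a short script can report it directly. The sketch below uses hypothetical helper names: it looks for the upstream control file first, then the Red Hat Enterprise Linux 6 location, and prints the bracketed word. If neither file exists, THP is not present in the kernel.

```shell
# thp_enabled_file (hypothetical helper): print the path of the THP "enabled"
# control file, trying the upstream location first and then the RHEL 6 one.
# BASE defaults to /sys/kernel/mm; returns non-zero if neither file exists.
thp_enabled_file() {
    local base f
    base=${1:-/sys/kernel/mm}
    for f in "$base/transparent_hugepage/enabled" \
             "$base/redhat_transparent_hugepage/enabled"; do
        if [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# thp_active_mode (hypothetical helper): extract the bracketed word from a
# line such as "[always] madvise never" read on stdin.
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

if f=$(thp_enabled_file); then
    echo "THP mode: $(thp_active_mode < "$f") ($f)"
else
    echo "THP control file not found; THP is not available in this kernel"
fi
```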

To disable Transparent HugePages at boot time, perform the following steps:

1. Add the following entry to the kernel boot line in the /etc/grub.conf file:

transparent_hugepage=never

2. Reboot the operating system
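
The /etc/grub.conf file applies to GRUB Legacy systems such as Red Hat Enterprise Linux 6. On GRUB 2 systems such as Red Hat Enterprise Linux 7, the same parameter is normally added to GRUB_CMDLINE_LINUX in /etc/default/grub, and the GRUB configuration is then regenerated with grub2-mkconfig; the output path varies by distribution and firmware (BIOS or UEFI), so confirm it in your distribution's documentation. The sketch below uses hypothetical helper names: one appends the parameter to GRUB_CMDLINE_LINUX, and the other reads /proc/cmdline after the reboot to confirm the kernel was booted with it.

```shell
# add_thp_never (hypothetical helper): append transparent_hugepage=never to
# GRUB_CMDLINE_LINUX in a GRUB 2 defaults file. Leaves the file unchanged if
# any transparent_hugepage= setting is already present.
add_thp_never() {
    local file=${1:-/etc/default/grub}
    grep -q 'transparent_hugepage=' "$file" && return 0
    sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/GRUB_CMDLINE_LINUX="\1 transparent_hugepage=never"/' "$file"
}

# boot_thp_param (hypothetical helper): print the transparent_hugepage= value
# the running kernel was booted with (empty if the parameter was not given).
boot_thp_param() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^transparent_hugepage=//p'
}

# Example (as root, on a GRUB 2 system; confirm the grub.cfg path first):
#   add_thp_never /etc/default/grub && grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot, then run boot_thp_param; it should print "never"
```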

THP can also be disabled on the fly; however, this may not provide the desired result, because that method only stops the creation and use of new THP. THP that have already been created are not broken up into regular memory pages. It is therefore advised to reboot the system completely so that THP is disabled from boot time.
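
For completeness, the on-the-fly method writes never to the THP control files in sysfs. A minimal sketch follows, assuming the upstream /sys/kernel/mm/transparent_hugepage directory (on Red Hat Enterprise Linux 6, pass /sys/kernel/mm/redhat_transparent_hugepage instead); the helper name is hypothetical. The setting does not survive a reboot unless the boot parameter is also set.

```shell
# disable_thp_now (hypothetical helper): switch THP to "never" at runtime.
# DIR defaults to the upstream sysfs location; must be run as root on a live system.
# Huge pages that already exist are not split back into regular pages.
disable_thp_now() {
    local dir=${1:-/sys/kernel/mm/transparent_hugepage}
    [ -w "$dir/enabled" ] || return 1
    echo never > "$dir/enabled"
    # Many guides also set defrag to never; only written if the file exists.
    if [ -w "$dir/defrag" ]; then
        echo never > "$dir/defrag"
    fi
}

# Example (as root):
#   disable_thp_now && cat /sys/kernel/mm/transparent_hugepage/enabled
```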

*Note: Confirm the methods used for disabling THP with your respective Linux distributor.

 

Comments

I think further clarification is also needed. You mention that various Linux distributions introduced this with the 2.6.38 kernel. However, it starts with the RHEL 6.0/CentOS 6.0 General Availability release. 6.8 is currently only kernel 2.6.32-642, and it has this available. Additional information about its availability in version 6.0 can be found on page 2 of the RHEL slideshow http://www.slideshare.net/raghusiddarth/transparent-hugepages-in-rhel-6 and on page 102 of the Red Hat 6.0 technical documentation https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/pdf/6.0_Technical_Notes/Red_Hat_Enterprise_Linux-6-6.0_Technical_Notes-en-US.pdf . I have not researched when this was rolled into Fedora prior to 2.6.38, but as Fedora tends to be a precursor to RHEL, it might also have been before kernel 2.6.38.

It might be better to suggest that people run the check to see whether it is enabled, and that they should not be surprised if they are running a Linux with a kernel older than 2.6.38 that does not support it.

Hi Alexander,

Thank you for your post. We are only relying on what RH documentation states as to when THP was introduced to the mainstream kernel (2.6.38) and enabled by default, as noted in the RH post you referenced. The option may have existed in previous kernels (although I would not recommend trying it), but it may not have been enabled by default. All the documentation I can find on THP support in RH references the 2.6.38 kernel, where it was merged as a feature.

If you are finding it in previous kernels, check whether THP is enabled by default. That would be interesting to know. Unfortunately, there isn't much we can do other than to run the enablement checks mentioned in the post. As the ultimate confirmation, RH and the other Linux distributions would need to update their documentation to confirm when this behavior was enacted in the respective kernel versions.

As I mentioned in other comments, the use of THP is not necessarily a bad thing and won't cause "harm" to a system, but there may be performance impacts for applications that have a large amount of process creation as part of their application.

Kind regards,

Mark B-

Mark, may I ask you for some clarification? You wrote:

As a result THP are not used for shared memory, and are only used for each individual process. 

What's the problem here? Shared memory can use "normal" huge pages, while individual processes use THP. The memory layout on our developers' server shows that it's possible.

# uname -a

Linux ubtst 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

# tail -11 /proc/meminfo
AnonHugePages:    131072 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:    1890
HugePages_Free:     1546
HugePages_Rsvd:      898
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      243808 kB
DirectMap2M:    19580928 kB
DirectMap1G:    49283072 kB

# ccontrol list

Configuration 'CACHE1'
        directory: /opt/cache1
        versionid: 2015.1.4.803.0.16768
        ...
# cat /opt/cache1/mgr/cconsole.log | grep Allocated
...
01/27/17-16:41:57:276 (1425) 0 Allocated 1242MB shared memory using Huge Pages: 1024MB global buffers, 64MB routine buffers

# grep -e AnonHugePages  /proc/*/smaps | awk  '{ if($2>4) print $0} ' |  awk -F "/"  '{print $0; system("ps -fp " $3)} '
...
/proc/165553/smaps:AnonHugePages:      2048 kB
UID         PID   PPID  C STIME TTY          TIME CMD
cacheusr 165553   1524  0 Feb07 ?     00:00:00 cache -s/opt/cache1/mgr -cj -p18 SuperServer^%SYS.SERVER
...

Hi Alexey,

Thank you for your comment. Yes, both THP and traditional/reserved huge pages can be used at the same time; however, there is no benefit, and in fact systems with many (thousands of) Caché processes, especially where there is a lot of process creation, have shown a performance penalty in testing. The overhead of instantiating THP for those processes at a high rate can be noticeable. Your application may not exhibit this scenario and may be OK.

The goal of this article is to provide guidance for those that may not know which is the best option to choose and/or point out that this is a change in recent Linux distributions.  You may find that THP usage is perfectly fine for your application.  There is no replacement for actual testing and benchmarking your application.  :)

Kind regards,

Mark B-

Of course there is no replacement for actual testing. What I am trying to say is that, had I read the article straight through instead of skimming and jumping to how to check whether it was on, I probably would have read at the top "various Linux distributions introduced Transparent HugePages with the 2.6.38 kernel" and stopped, because my kernel is older than that. I really think that the current wording will lead people who work at shops that are still rolling out new builds on RHEL or CentOS 6 not to use the ideal settings. Maybe rearranging the first three paragraphs into two or three paragraphs that lead with RHEL 6 and the others would make this clearer, with a sentence that reads something like: "This was first introduced in Red Hat Enterprise Linux 6, SUSE Linux Enterprise Server 11, and Oracle Linux 6, and then later introduced in many other Linux variants with the 2.6.38 kernel."

Additionally, it might make things clearer if you were to mention that, for this setting, the item in brackets is the current setting; on my Red Hat system the line reads: [always] madvise never.

It might also be useful to people reading this to mention what to do when the Transparent HugePages enabled setting is madvise.

I will revise the post to be more clear that THP is enabled by default in 2.6.38 kernel but may be available in prior kernels and to reference your respective Linux distributions documentation for confirming and changing the setting.  Thanks for your comments.

I do not claim to be a  Huge Pages expert, but I have been doing some more reading on Transparent Huge pages and the madvise option.  The following is untested and un-verified.

It seems that if you are running kernel 2.6.38 or newer, you may be able to use madvise instead of never for the Transparent Huge Pages setting. According to http://manpages.ubuntu.com/manpages/trusty/man2/madvise.2.html, the 2.6.38 kernel's madvise has a MADV_HUGEPAGE option that allows applications to enable Transparent Huge Pages; if no MADV_* flag is passed, it defaults to MADV_NORMAL, or no special treatment. I believe this means that transparent huge pages should be off by default.

If you are using RHEL 6, or probably most of its derivatives, even though they have a madvise value for their Transparent Huge Pages setting, it appears RHEL did not backport the MADV_HUGEPAGE option to their madvise/kernel (at least in 2.6.32-504.8.1 and lower), so you have to set the box's Transparent Huge Pages to never. (Sources: the madvise man page on RHEL 6 with kernel 2.6.32-504.8.1, which lacks MADV_HUGEPAGE, and Bradley Kuszmaul's 5/8/13 post at https://groups.google.com/forum/#!topic/tokudb-dev/_1YNBMlHftU)

RHEL 7 and its derivatives run the 3.x kernel, and their man pages show a MADV_HUGEPAGE option, so it looks like you can set the box to madvise and it will not use transparent huge pages unless an application requests them.

Once again I am not a Transparent Huge Pages expert and have not done any testing to verify the validity of this.