Pivotal Greenplum all versions
This article will explain memory configuration in Pivotal Greenplum (GPDB).
- Segment Level Memory Configuration
Pivotal Greenplum consists of multiple segments. When the query is executed, first, the query plan is prepared on the master segment and then the query plan is sent and executed on the segments. Every segment executes the query plan by itself (with certain communications happening among segments, of course). When each segment executes the query plan, it will do this with its own memory allocation, its own local transactions, and its own data files.
The total amount of memory a segment is allowed to allocate at any time is controlled by the gp_vmem_protect_limit. This means that all sessions (session processes) executing at a certain time on a specific segment should not use more than this amount of memory. The next session that crosses the limit will be terminated with the Out of Memory (OOM) VM Protect error. Note that this may not be the session that is hogging most of the memory, but the session that crosses the limit.
gp_vmem_protect_limit=8GB, Session1: 4GB (running), Session2: 3GB (running) Session3: 1.5GB (terminated with OOM error, crossed the 8GB limit)
To arrive at a gp_vmem_protect_limit for a particular cluster please use the following link:
Hint: Add 2 to # of Primary segments per host to account for any failovers Physical RAM and swap can be found from command "free -g", shows in GB
To change gp_vmem_protect_limit, a Greenplum Instance restart is required; follow these instructions to change:
gpconfig -s gp_vmem_protect_limit
gpconfig -c gp_vmem_protect_limit -v <value-from-vmem-calc-in-MB>gpconfig -c gp_vmem_protect_limit -v 10922
It is necessary to do a gpstop/gpstart after this, so there would be a downtime for the change to take effect:
gpstop -a gpstart -a gpconfig -s gp_vmem_protect_limit -->> Should return 10922
As root cp -p /etc/sysctl.conf /etc/sysctl.conf.<date> vi /etc/sysctl.conf below vm.overcommit_memory add below line vm.overcommit_ratio = <value-from-vmem-calc>, say 90 save and exit file sysctl -p to verify changes do below cat /proc/sys/vm/overcommit_ratio -->> should return 90
Allowing GPDB segments to allocate up to the full amount of the physical memory will possibly lead to memory usage which is (a little bit) more than the physical memory. This is because kernel, mirror segments, and other applications can use some memory too, but it should not be too much more. This may lead to a little bit of paging, but in general, will not lead to excessive swapping.
If there are other memory intensive applications/processes running on the server, the gp_vmem_protect_limit should be adjusted accordingly by lowering the amount of available server memory by the amount used by the memory intensive processes.
If the preference is towards swapping, but no OOM errors, then the gp_vmem_protect_limit can be configured so that some of the swap memory is used in the calculation. This will make OOM errors fewer, but some sessions could be slowed down because of the swap in/swap out activity. Also, make sure you understand vm.overcommit_* parameters so the total available server memory on OS level is not exhausted (hard OOM error).
On DCA: RAM=48GB, swap=48GB, vm.overcommit_memory=2, vm.overcommit_ratio=50, the total memory available before OS returns OOM is swap+50%ofRAM=48+24=72GB. The total memory all processes on this server can use should be less than 72GB. Adjust gp_vmem_protect_limit accordingly.
- Statement Level Memory Configuration
Memory configuration policy is dictated by parameter "gp_resqueue_memory_policy". "gp_resqueue_memory_policy=manual" means using parameter "work_mem" per operator node (this is obsolete, left for compatibility only). The new options "auto"/"eager_free" rely on the "statement_mem" parameter. "statement_mem" parameter sets the amount of memory the statement can use. This memory is then distributed among the query plan operators according to the operator need for memory. In "auto" mode the memory is distributed among query plan operators statically and not changed during plan execution. In "eager_free" mode, the memory is used and reused within the plan execution (if a specific node or set of nodes are finished with data processing and not needed anymore, their memory is freed immediately and given to other nodes that are running currently).
Each session can control the "statement_mem" - the parameter can be changed in the session between queries. This means that you can set statement_mem=100MB if you have a small query to run and after that set statement_mem=2GB if you have a "huge" query to run, all within the same session. Every user can change "statement_mem", up to the value of "max_statement_mem" which can be set only by superuser in the postgresql.conf file.
Another way to limit statement memory is on resource queue level. Resource queues have active connections limit and memory limit. Each session gets "memory_limit"/"active_connections_limit" memory. This former is true unless "statement_mem" is configured higher than that value, in this case, the session will get "statement_mem" amount of memory, but the session that goes over the RQ memory limit will have to wait until memory is freed.
Resource queue or RQ active sessions = 10, RQ memory limit=8GB, statement_mem=125MB -> each session on the RQ gets 800MB (> 125MB), all sessions will use in total 8GB.
RQ active sessions = 10, RQ memory limit=8GB, statement_mem=125MB -> each session on the RQ should use 800MB. If the first session sets the statement_mem=4GB manually (4GB > 125MB), second session sets statement_mem=3GB (3GB > 125MB), then 7GB of RQ memory limit is already used. Then the third session can run (7GB + 800MB=7.8GB), but Session 4 (needs 800MB more) will be held by the resource manager until memory is freed by the other sessions running on this queue.
- OOM errors
There are many different OOM errors that come from different places of the database. There are two general groups of errors:
- "VM Protect" errors ("VM Protect failed to allocate %d bytes, %d MB available"): these are logical OOM errors. They mean that the database is stopping a session from allocating more memory, because it has reached the VM Protect Limit value. This means that sessions executing on this segment are attempting together to use more than configured limit. The appropriate action, in this case, is to attempt to determine whether memory parameters are set properly and whether they can support the concurrency in the system.
gp_vmem_protect_limit=8GB, statement_mem=512MB -> max concurrency of 16 can be supported (16 * 512MB = 8GB).
gp_vmem_protect_limit=8GB, statement_mem=256MB -> max concurrency of 32 can be supported (32 * 256MB = 8GB).
- OS level OOM errors ("Out of memory. Failed on request of size %lu bytes."): these are OS OOM errors. They mean that the database allows sessions to allocate memory, but the OS does not have more memory left. Basically, this means an overcommitted gp_vmem_protect_limit.
RAM=48GB, SWAP=48GB, 6 primary segments, gp_vmem_protect_limit=8GB > total memory available per server = 72GB (48+50%of48), total memory used by database = 6x8=48GB, total memory used by the server = 50GB (approximately, including kernel, mirror segments, etc.), 50GB < 72GB (no OS OOM errors).
RAM=48GB, SWAP=48GB, 6 primary segments, gp_vmem_protect_limit=8GB, 10JVMs using 4GB each > total memory available per server = 72GB (48+50%of48), total memory used by database = 6x8=48GB, total memory used by the server = 50GB (approximately, including kernel, mirror segments, etc.) + 10x4GB (JVMs) = 90GB, 90GB > 72GB > some process (may be GPDB) will get OS OOM error.