Solve ‘MemoryError: Unable to allocate XXXGiB for an array’ by setting Overcommit and Swap
In a recent RL project, I needed to create a large multidimensional NumPy array for a Q-table:
self.qtable = np.zeros((2,2,2,2,2,2,2,2,2,2,10,2,10,2,100000))
However, because the array is so large, Python raises an error:
MemoryError: Unable to allocate 305. GiB for an array with shape (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 10, 2, 10, 2, 100000) and data type float64
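The 305 GiB figure can be reproduced from the shape alone: a float64 entry takes 8 bytes, so the request is the product of all dimensions times 8. A quick check (the variable names are only illustrative):

```python
import math

# Shape of the Q-table from the np.zeros call above.
shape = (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 10, 2, 10, 2, 100000)

n_elements = math.prod(shape)   # total number of entries
n_bytes = n_elements * 8        # float64 uses 8 bytes per entry
size_gib = n_bytes / 2**30      # bytes -> GiB

print(f"{n_elements:,} elements, {size_gib:.1f} GiB")
# prints: 40,960,000,000 elements, 305.2 GiB
```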
Step 1: Overcommit handling
First, the error comes from my Ubuntu system's overcommit handling mode.
The default overcommit handling mode is 0, which means:
Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. The root is allowed to allocate slightly more memory in this mode. This is the default.
305 GiB is much greater than my actual physical memory of 32 GiB, so this obvious overcommit of address space is refused.
To check the current overcommit mode, run:
$ cat /proc/sys/vm/overcommit_memory
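The same check can be done from Python, which is handy if the training script should warn before attempting the allocation. This is a minimal sketch: `describe_overcommit` and the label texts are my own naming, paraphrasing the kernel documentation for modes 0, 1 and 2.

```python
from pathlib import Path

# Meanings of the three modes, paraphrased from the kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "never overcommit",
}

def describe_overcommit(raw: str) -> str:
    """Turn the file's contents (for example the string '1') into a readable label."""
    mode = int(raw.strip())
    return f"{mode}: {OVERCOMMIT_MODES.get(mode, 'unknown mode')}"

proc_file = Path("/proc/sys/vm/overcommit_memory")
if proc_file.exists():  # the file is only present on Linux
    print(describe_overcommit(proc_file.read_text()))
```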
In my situation the array is sparse (mostly zeros), so it will not actually occupy the 305 GiB of memory it requests. It is therefore fine to allow the overcommit.
To enable the overcommit, run:
$ sudo -i
$ echo 1 > /proc/sys/vm/overcommit_memory
Changing this to 1 means:
Always overcommit. Appropriate for some scientific applications. A classic example is a code using sparse arrays and just relying on the virtual memory consisting almost entirely of zero pages.
Now the system will let you declare a large array regardless of its size. Physical memory pages are only allocated for the parts of the sparse array that you actually write to.
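The on-demand behaviour can be observed without NumPy. The sketch below reserves an anonymous memory mapping and writes to only a few pages of it; the 256 MiB size, the helper `rss_kib`, and the page count are illustrative choices of mine, and the resident-memory readout only works on Linux. Large zero-filled NumPy arrays typically rely on the same mechanism, though that depends on the allocator.

```python
import mmap

PAGE = mmap.PAGESIZE
SIZE = 256 * 1024 * 1024   # reserve 256 MiB of address space (illustrative size)
TOUCHED_PAGES = 10

def rss_kib():
    """Resident memory of this process in KiB (Linux only), else None."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None

before = rss_kib()
buf = mmap.mmap(-1, SIZE)      # anonymous mapping, zero-filled on demand
for i in range(TOUCHED_PAGES):
    buf[i * PAGE] = 1          # the first write to a page makes the kernel back it
after = rss_kib()

untouched = buf[SIZE - 1]      # pages never written still read as zero
if before is not None and after is not None:
    print(f"reserved {SIZE // 2**20} MiB, resident memory grew by {after - before} KiB")
```

On a typical Linux machine the reported growth should be far smaller than the reserved size, since only the written pages need physical memory.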
Step 2: Swap (Virtual memory)
After setting the overcommit handling mode to 1, I can start training my Q-learning model.
However, the "sparse" array becomes denser and denser as training progresses. Eventually it uses up my 32 GiB of physical memory at around epoch 4,750,000 (out of 8,000,000 in total).
As a result, the system has to terminate the training before it is finished. :(
I need more available memory! I realized that increasing the swap size could help.
Swap is space on your hard disk (here, a special file) that the system can use as additional virtual memory.
The default swap size on my Ubuntu 20.04 system is 2 GiB, which is insufficient.
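A rough way to size the extra memory is to extrapolate from the point where training stopped. The sketch below assumes memory use grows linearly with the epoch count, which is only an assumption and was not measured; it also treats all 32 GiB of RAM as available to the process.

```python
# Figures reported in this post.
physical_gib = 32             # RAM exhausted when training stopped
epochs_at_crash = 4_750_000
total_epochs = 8_000_000

# ASSUMPTION: memory use grows linearly with the number of epochs.
projected_gib = physical_gib * total_epochs / epochs_at_crash
extra_gib = projected_gib - physical_gib

print(f"projected total: {projected_gib:.1f} GiB, extra needed: {extra_gib:.1f} GiB")
# prints: projected total: 53.9 GiB, extra needed: 21.9 GiB
```

Under that assumption, roughly 22 GiB beyond physical memory would be needed, so a swap size somewhat above that leaves headroom.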
To extend the swap:
- Turn off all swap:
$ sudo swapoff -a
- Resize the swap file (in my situation, I estimate that an additional 32 GiB of virtual memory should be sufficient):
$ sudo dd if=/dev/zero of=/swapfile bs=1G count=32
(if = input file, of = output file, bs = block size, count = number of blocks)
- Change its permission:
$ sudo chmod 600 /swapfile
(600 means that the owner has full read and write access to the file, while no other user can access it.)
- Set the file up as swap and activate it:
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
- Check the file /etc/fstab:
$ sudo vi /etc/fstab
Check whether the following line is there. If not, add it at the bottom so the swap file is activated again after a reboot.
/swapfile none swap sw 0 0
- Check the amount of swap available:
$ grep SwapTotal /proc/meminfo
$ free -t -m
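The final check can also be scripted, for example to confirm the swap size before launching a long training run. This is a minimal sketch; `meminfo_gib` is a hypothetical helper of mine that parses the `Name:   value kB` lines of /proc/meminfo.

```python
from pathlib import Path

def meminfo_gib(text: str, key: str) -> float:
    """Return the value of a /proc/meminfo field (reported in kB) in GiB."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0]) / 2**20   # kB -> GiB
    raise KeyError(key)

meminfo = Path("/proc/meminfo")
if meminfo.exists():  # the file is only present on Linux
    contents = meminfo.read_text()
    print(f"SwapTotal: {meminfo_gib(contents, 'SwapTotal'):.1f} GiB")
```

After the steps above, the reported value should be close to 32 GiB.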
After finishing these two steps, I can run my RL model without any error related to the memory space.
- “Unable to allocate array with shape and data type.” https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type
- “Change swap size in Ubuntu 18.04 or newer.” https://bogdancornianu.com/change-swap-size-in-ubuntu/