diff -r ed765f557481 pypy/doc/gc_info.rst --- a/pypy/doc/gc_info.rst Thu Feb 15 14:05:41 2018 +0100 +++ b/pypy/doc/gc_info.rst Thu Feb 15 15:52:14 2018 +0100 @@ -15,14 +15,41 @@ processes) and cache sizes you might want to experiment with it via *PYPY_GC_NURSERY* environment variable. When the nursery is full, there is performed a minor collection. Freed objects are no longer referencable and -just die, without any effort, while surviving objects from the nursery -are copied to the old generation. Either to arenas, which are collections -of objects of the same size, or directly allocated with malloc if they're big -enough. +just die, just by not being referenced any more; on the other hand, objects +found to still be alive must survive and are copied from the nursery +to the old generation. Either to arenas, which are collections +of objects of the same size, or directly allocated with malloc if they're +larger. (A third category, the very large objects, are initially allocated +outside the nursery and never move.) Since Incminimark is an incremental GC, the major collection is incremental, meaning there should not be any pauses longer than 1ms. + +Fragmentation +------------- + +Before we discuss issues of "fragmentation", we need a bit of precision. +There are two kinds of related but distinct issues: + +* If the program allocates a lot of memory, and then frees it all by + dropping all references to it, then we might expect to see the RSS + to drop. (RSS = Resident Set Size on Linux, as seen by "top"; it is an + approximation of the actual memory usage from the OS's point of view.) + This might not occur: the RSS may remain at its highest value. This + issue is more precisely caused by the process not returning "free" + memory to the OS. We call this case "unreturned memory". + +* After doing the above, if the RSS didn't go down, then at least future + allocations should not cause the RSS to grow more. That is, the process + should reuse unreturned memory as long as it has got some left. If this + does not occur, the RSS grows even larger and we have real fragmentation + issues. + + +gc.get_stats +------------ + There is a special function in the ``gc`` module called ``get_stats(memory_pressure=False)``. @@ -56,19 +83,30 @@ In this particular case, which is just at startup, GC consumes relatively little memory and there is even less unused, but allocated memory. In case -there is a high memory fragmentation, the "allocated" can be much higher -than "used". Generally speaking, "peak" will more resemble the actual -memory consumed as reported by RSS, since returning memory to the OS is a hard -and not solved problem. +there is a lot of unreturned memory or actual fragmentation, the "allocated" +can be much higher than "used". Generally speaking, "peak" will more closely +resemble the actual memory consumed as reported by RSS. Indeed, returning +memory to the OS is a hard and not solved problem. In PyPy, it occurs only if +an arena is entirely free---a contiguous block of 64 pages of 4 or 8 KB each. +It is also rare for the "rawmalloced" category, at least for common system +implementations of ``malloc()``. The details of various fields: -* GC in arenas - small old objects held in arenas. If the amount of allocated - is much higher than the amount of used, we have large fragmentation issue +* GC in arenas - small old objects held in arenas. If the amount "allocated" + is much higher than the amount "used", we have unreturned memory. It is + possible but unlikely that we have internal fragmentation here. However, + this unreturned memory cannot be reused for any ``malloc()``, including the + memory from the "rawmalloced" section. -* GC rawmalloced - large objects allocated with malloc. If this does not - correspond to the amount of RSS very well, consider using jemalloc as opposed - to system malloc +* GC rawmalloced - large objects allocated with malloc. This is gives the + current (first block of text) and peak (second block of text) memory + allocated with ``malloc()``. The amount of unreturned memory or + fragmentation caused by ``malloc()`` cannot easily be reported. Usually + you can guess there is some if the RSS is much larger than the total + memory reported for "GC allocated", but do keep in mind that this total + does not include malloc'ed memory not known to PyPy's GC at all. If you + guess there is some, consider using jemalloc as opposed to system malloc. * nursery - amount of memory allocated for nursery, fixed at startup, controlled via an environment variable @@ -91,7 +129,7 @@ ``PYPY_GC_NURSERY`` The nursery size. - Defaults to 1/2 of your cache or ``4M``. + Defaults to 1/2 of your last-level cache, or ``4M`` if unknown. Small values (like 1 or 1KB) are useful for debugging. ``PYPY_GC_NURSERY_DEBUG``