Tuesday, August 18, 2015

How does garbage collection work in JVM (Question excerpt from Quora)

Answered by Li Pi

https://www.quora.com/How-does-garbage-collection-work-in-the-JVM

This actually varies depending on the JVM implementation, but I'm assuming you're talking about Oracle (Sun) Version 6. Java itself does not specify a particular method of garbage collection.

The JVM uses a form of garbage collector called a tracing collector, which essentially operates by first stopping the world, marking all root objects, or objects that are referenced directly by running threads, and following references, marking each object it hits along the way.

Java 6 implements something called a generational garbage collector—based upon the generational hypothesis assumption, which states that the majority of objects that are created are quickly discarded, and that objects that are not quickly collected are likely to be around for some time.

Based upon these assumptions, Java therefore partitions objects into two different generations, and then operates differently upon them.

Visually, the generations look like this:

(Not quite to scale)
Young Generation: This is where objects start out. It has two subgenerations:
  • Eden - Objects start out here.
  • Survivor - Objects that survive Eden end up here. There are two of these, and only one is in use at any given time. One is designated as empty, and the other as live. This switched every GC cycle.

Tenured Generation: Older objects with longer lifetimes end up here.

Java is smart enough to apply different garbage collection methods to each generation. The young generation is handled using a tracingcopyingcollector called the Parallel New Collector. This collector stops the world, but because the young generation is generally small, the pause is short.

For the young generation:

When Eden fills up, the garbage collector stops the world, then traces through the objects in the young generation, starting with those referenced immediately by a running thread.

Those that are marked or "alive" are copied over to the empty survivor space. This survivor space is then marked as "live", and Eden, along with the other survivor space, is marked as empty. This has the side effect of compacting all the objects into a single survivor space, allowing for rather efficient memory usage. If an object has been copied between the two survivor places a certain amount of times, its designated as tenured, and moved to the tenured section.

Eden will now be overwritten by new objects, and the next garbage collection cycle will proceed to use the other survivor space.

This usage of a copying collector for the young generation is fast because the vast majority of objects are very quickly destroyed, and generally, very few objects must be moved around.

For the tenured generation:

I'm going to steal this section from this blog post by Todd Lipcon,http://www.cloudera.com/blog/201... 

Tenured Generation – Concurrent Mark-Sweep

Every time the parallel new collector runs, it will tenure some objects into the tenured generation. So, of course, the old generation will eventually fill up, and we need a strategy for collecting it as well. The Concurrent-Mark-Sweep collector (CMS) is responsible for clearing dead objects in this generation.

The CMS collector operates in a series of phases. Some phases stop the world, and others run concurrently with the Java application. The major phases are:
  1. initial-mark (stops the world). In this phase, the CMS collector places amark on the rootobjects. A root object is something directly referenced from a live Thread – for example, the local variables in use by that thread. This phase is short because the number of roots is very small.
  2. concurrent-mark (concurrent). The collector now follows every pointer starting from the root objects until it has marked all live objects in the system.
  3. remark (stops the world). Since objects might have had references changed, and new objects might have been created during concurrent-mark, we need to go back and take those into account in this phase. This is short because a special data structure allows us to only inspect those objects that were modified during the prior phase.
  4. concurrent-sweep (concurrent). Now, we proceed through all objects in the heap. Any object without a mark is collected and considered free space. New objects allocated during this time are marked as they are created so that they aren’t accidentally collected.

The important things to note here are:
  • The stop-the-world phases are made to be very short. The long work of scanning the whole heap and sweeping up the dead objects happens concurrently.
  • This collector does not relocate the live objects, so free space can be spread in different chunks throughout the heap. We’ll come back to this later!

No comments:

Post a Comment