Sunday, February 19, 2012

BigMemory: Scaling vertically

Until recently, Moore's Law translated into ever-faster CPUs, but physical constraints - heat dissipation, for example - now force manufacturers to place multiple cores on a single CPU die instead. Increases in memory, however, are not constrained in the same way. Today you can purchase standard Von Neumann servers from Oracle, Dell and HP with up to 2 TB of physical RAM and 64 cores. Servers with 32 cores and 512 GB of RAM are certainly more typical, but it's clear that today's commodity servers are “big iron” in their own right.

The following table shows the random access times for different storage technologies:

Storage Technology               Latency
Registers                        1-3 ns
CPU L1 Cache                     2-8 ns
CPU L2 Cache                     5-12 ns
Memory (RAM)                     10-60 ns
High-speed network gear          10,000-30,000 ns
Solid State Disk (SSD) Drives    70,000-120,000 ns
Hard Disk Drives                 3,000,000-10,000,000 ns

Since most enterprise applications tend to be I/O bound (i.e. they spend too much time waiting for data stored on disk), it follows that these applications would benefit greatly from the use of the lower-latency forms of storage at the top of this hierarchy. Specifically, today's time-sensitive enterprise applications would speed up significantly without much modification if they could replace all disk access with memory usage.

To drive this point home further, note that with modern network technology, latencies are at worst around 10,000-30,000 ns, with even lower latencies and higher speeds possible. This means that with the right equipment, accessing memory on other servers over the network is still much faster than reading from a local hard disk drive. All of this points to a simple goal for the enterprise architect or developer: use as much memory as possible in your applications.

The original Java language and platform design took into account the problems developers had when manually managing memory in other languages. When memory management goes wrong, developers experience memory leaks (lack of memory de-allocation) or memory access violations from accessing memory that has already been de-allocated or from de-allocating the same memory more than once. To relieve developers of these problems, Java implemented automatic memory management of the Java heap with a garbage collector (GC). When a running program no longer requires specific objects, the garbage collector reclaims their memory within the Java heap. Memory management is no longer an issue for Java developers, which results in greater productivity overall.

Garbage collection works reasonably well, but it becomes increasingly stressed as the size of the Java heap and the number of live objects within it increase. Today, GC works well with an occupied Java heap of around 3-4 GB, which also happens to coincide with the 32-bit memory limit.

The size limits imposed by Java garbage collection explain why 64-bit Java use remains a minority despite the availability of commodity 64-bit CPUs, operating systems and Java for half a decade. Attempts to use a Java heap larger than 3-4 GB can result in long garbage collection pauses (where application threads are stopped so that the GC can reclaim dead objects), unpredictable application response times and large latencies that can violate your application's service level agreements. With large occupied Java heaps, it's not uncommon to experience multi-second pauses, often at the most inopportune moments.

Solving the Java Heap/GC Problem

Large heaps are desirable in use cases such as in-process caching and session storage. Both of these use cases rely on a map-like API, where the framework allocates and de-allocates entries programmatically with puts and removes - which opens up a way to constrain and solve the garbage collection problem.
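As a rough illustration of that contract, the API looks like a plain key-value map. The interface below is a hypothetical sketch, not the actual API of any particular product:

// Hypothetical sketch of the map-like contract such a cache exposes.
// The names (OffHeapCache, put, get, remove) are illustrative only.
public interface OffHeapCache {
    void put(String key, byte[] value);   // framework allocates off-heap space
    byte[] get(String key);               // copies the value back on-heap
    void remove(String key);              // framework frees the off-heap space
}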

The BigMemory implementation from Terracotta, and a similar one incubated at Apache, are all-Java and built on Java's NIO technology. BigMemory is just a hair slower than the Java heap. It runs in-process with the JVM, so there is no management complexity, and it is pure Java, so there is no deployment complexity or sensitivity to JVM version. It creates a cache store in memory, but outside the Java heap, using direct ByteBuffers. Because the data is stored off-heap, the garbage collector does not know about it and therefore does not collect it. Instead, BigMemory responds to put and remove requests by allocating and freeing memory in its managed byte buffers.
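The key enabler is ByteBuffer.allocateDirect, which reserves memory outside the Java heap. The snippet below is a minimal sketch of the idea - it is not BigMemory's internal code - showing that the buffer's contents live in native memory and are never scanned by the garbage collector:

import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Reserve 256 MB of native (off-heap) memory; capped by -XX:MaxDirectMemorySize.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(256 * 1024 * 1024);

        // Write some serialized bytes at an offset we manage ourselves.
        byte[] value = "some cached value".getBytes();
        offHeap.position(0);
        offHeap.putInt(value.length);   // length prefix
        offHeap.put(value);             // payload

        // The GC only sees the small ByteBuffer object on the heap,
        // not the 256 MB of data behind it.
        System.out.println("Stored " + value.length + " bytes off-heap");
    }
}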

This lets you keep the Java heap relatively small (1-2 GB in size), while keeping the bulk of your objects in physical memory outside the heap. As a result, BigMemory can create in-memory caches that scale to physical RAM limits (i.e. 2 TB today, and more in the future) without the garbage collection penalties that usually come with a Java heap of that size. By storing your application's data outside the Java heap but still in RAM, inside your Java process, you get all the benefits of in-memory storage without the traditional Java costs.

How does ByteBuffer help?

Prior to JDK 1.4, Java programmers had limited options: they could read data into a byte[] and use explicit offsets (along with bitwise operators) to combine bytes into larger entities, or they could wrap the byte[] in a DataInputStream and get automatic conversion without random access.
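As a small illustration of those pre-NIO options, reading a 4-byte int meant either shifting bytes by hand or going through a stream sequentially:

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class PreNioDemo {
    // Option 1: explicit offsets and bitwise operators on a byte[].
    static int readIntAt(byte[] buf, int offset) {
        return ((buf[offset]     & 0xFF) << 24)
             | ((buf[offset + 1] & 0xFF) << 16)
             | ((buf[offset + 2] & 0xFF) << 8)
             |  (buf[offset + 3] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 0, 42, 0, 0, 1, 0};

        System.out.println(readIntAt(data, 4));   // random access: prints 256

        // Option 2: DataInputStream converts types for you, but only sequentially.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(in.readInt());         // prints 42 - must read from the front
    }
}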

The ByteBuffer class arrived in JDK 1.4 as part of the java.nio package, and combines larger-than-byte data operations with random access. To construct a simple cache using ByteBuffer, see this article.
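To give a rough idea of what such a cache looks like, here is my own minimal sketch (not the code from the linked article) of a fixed-slot off-heap store: only small index entries stay on the heap, while the values live in a direct buffer:

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of an off-heap store. A production implementation adds
// free-space management, eviction, compaction and thread safety.
public class SimpleOffHeapCache {
    private final ByteBuffer store;
    private final Map<String, int[]> index = new HashMap<String, int[]>(); // key -> {offset, length}
    private int writePos = 0;

    public SimpleOffHeapCache(int capacityBytes) {
        store = ByteBuffer.allocateDirect(capacityBytes);
    }

    public void put(String key, byte[] value) {
        if (writePos + value.length > store.capacity()) {
            throw new IllegalStateException("cache full"); // no eviction in this sketch
        }
        store.position(writePos);
        store.put(value);
        index.put(key, new int[] {writePos, value.length});
        writePos += value.length;
    }

    public byte[] get(String key) {
        int[] slot = index.get(key);
        if (slot == null) return null;
        byte[] copy = new byte[slot[1]];
        // Read through a duplicate view so the write position is left untouched.
        ByteBuffer view = store.duplicate();
        view.position(slot[0]);
        view.get(copy);
        return copy;
    }

    public void remove(String key) {
        index.remove(key); // the space is leaked here; a real store would reuse it
    }

    public static void main(String[] args) {
        SimpleOffHeapCache cache = new SimpleOffHeapCache(1024 * 1024);
        cache.put("greeting", "hello off-heap".getBytes());
        System.out.println(new String(cache.get("greeting")));
    }
}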

For those looking for an in-depth explanation of the topic, the article linked here is a long read, but worth the information gain.

Note: Essentially, both of these products manage a contiguous region of memory. Even though the approach described above avoids GC, fragmentation in any contiguous region eventually has a cost: the region must occasionally be compacted. The compaction cycle happens far less often than a JVM garbage collection cycle would, so while it severely affects performance while it runs, it occurs fairly rarely.

That brings up another topic: how does the non-heap memory for direct buffers get released? After all, there's no method to explicitly close or release them. The answer is that they are garbage collected like any other object, but with one twist: if there isn't enough virtual memory space or commit charge to allocate a new direct buffer, the allocation triggers a full collection even if there's plenty of heap memory available.
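One rough way to observe this behaviour (a sketch; the exact threshold depends on the -XX:MaxDirectMemorySize setting and JVM defaults) is to allocate direct buffers in a loop and drop the references - the native memory only comes back once the buffer objects themselves are collected:

import java.nio.ByteBuffer;

public class DirectBufferReleaseDemo {
    public static void main(String[] args) {
        // Run with e.g. -XX:MaxDirectMemorySize=256m to see the effect sooner.
        for (int i = 0; i < 1000; i++) {
            // Each buffer's native memory is freed only after the ByteBuffer
            // object itself is garbage collected. When the direct-memory limit
            // is reached, the JVM triggers a collection - and may still throw
            // OutOfMemoryError: Direct buffer memory if nothing can be freed.
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            buf.put(0, (byte) 1); // write a byte into the off-heap region
        }
        System.out.println("done");
    }
}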

Conclusion

It still makes sense to scale horizontally. Even so, you can also leverage vertical scalability with BigMemory, which makes each node of a distributed cache faster and able to hold more data.

Further reading
http://raffaeleguidi.wordpress.com/
