`

检测Java对象所占内存大小

    博客分类:
  • java
阅读更多
Don't pay the price for hidden class fields
By Vladimir Roubtsov, JavaWorld.com, 08/16/02

Recently, I helped design a Java server application that resembled an in-memory database. That is, we biased the design toward caching tons of data in memory to provide super-fast query performance.

Once we got the prototype running, we naturally decided to profile the data memory footprint after it had been parsed and loaded from disk. The unsatisfactory initial results, however, prompted me to search for explanations.

Since Java purposefully hides many aspects of memory management, discovering how much memory your objects consume takes some work. You could use the Runtime.freeMemory() method to measure heap size differences before and after several objects have been allocated. Several articles, such as Ramchander Varadarajan's "Question of the Week No. 107" (Sun Microsystems, September 2000) and Tony Sintes's "Memory Matters" (JavaWorld, December 2001), detail that idea. Unfortunately, the former article's solution fails because the implementation employs a wrong Runtime method, while the latter article's solution has its own imperfections:

   
    A single call to Runtime.freeMemory() proves insufficient because a JVM may decide to increase its current heap size at any time (especially when it runs garbage collection). Unless the total heap size is already at the -Xmx maximum size, we should use Runtime.totalMemory()-Runtime.freeMemory() as the used heap size.
   
    Executing a single Runtime.gc() call may not prove sufficiently aggressive for requesting garbage collection. We could, for example, request object finalizers to run as well. And since Runtime.gc() is not documented to block until collection completes, it is a good idea to wait until the perceived heap size stabilizes.

   
    If the profiled class creates any static data as part of its per-class class initialization (including static class and field initializers), the heap memory used for the first class instance may include that data. We should ignore heap space consumed by the first class instance.


Considering those problems, I present Sizeof, a tool with which I snoop at various Java core and application classes:

public class Sizeof
{
    public static void main (String [] args) throws Exception
    {
        // Warm up all classes/methods we will use
        runGC ();
        usedMemory ();
        // Array to keep strong references to allocated objects
        final int count = 100000;
        Object [] objects = new Object [count];
        
        long heap1 = 0;
        // Allocate count+1 objects, discard the first one
        for (int i = -1; i < count; ++ i)
        {
            Object object = null;
            
            // Instantiate your data here and assign it to object
            
            object = new Object ();
            //object = new Integer (i);
            //object = new Long (i);
            //object = new String ();
            //object = new byte [128][1]
            
            if (i >= 0)
                objects [i] = object;
            else
            {
                object = null; // Discard the warm up object
                runGC ();
                heap1 = usedMemory (); // Take a before heap snapshot
            }
        }
        runGC ();
        long heap2 = usedMemory (); // Take an after heap snapshot:
        
        final int size = Math.round (((float)(heap2 - heap1))/count);
        System.out.println ("'before' heap: " + heap1 +
                            ", 'after' heap: " + heap2);
        System.out.println ("heap delta: " + (heap2 - heap1) +
            ", {" + objects [0].getClass () + "} size = " + size + " bytes");
        for (int i = 0; i < count; ++ i) objects [i] = null;
        objects = null;
    }
    private static void runGC () throws Exception
    {
        // It helps to call Runtime.gc()
        // using several method calls:
        for (int r = 0; r < 4; ++ r) _runGC ();
    }
    private static void _runGC () throws Exception
    {
        long usedMem1 = usedMemory (), usedMem2 = Long.MAX_VALUE;
        for (int i = 0; (usedMem1 < usedMem2) && (i < 500); ++ i)
        {
            s_runtime.runFinalization ();
            s_runtime.gc ();
            Thread.currentThread ().yield ();
            
            usedMem2 = usedMem1;
            usedMem1 = usedMemory ();
        }
    }
    private static long usedMemory ()
    {
        return s_runtime.totalMemory () - s_runtime.freeMemory ();
    }
    
    private static final Runtime s_runtime = Runtime.getRuntime ();
} // End of class



Sizeof's key methods are runGC() and usedMemory(). I use a runGC() wrapper method to call _runGC() several times because it appears to make the method more aggressive. (I am not sure why, but it's possible creating and destroying a method call-stack frame causes a change in the reachability root set and prompts the garbage collector to work harder. Moreover, consuming a large fraction of the heap space to create enough work for the garbage collector to kick in also helps. In general, it is hard to ensure everything is collected. The exact details depend on the JVM and garbage collection algorithm.)

Note carefully the places where I invoke runGC(). You can edit the code between the heap1 and heap2 declarations to instantiate anything of interest.

Also note how Sizeof prints the object size: the transitive closure of data required by all count class instances, divided by count. For most classes, the result will be memory consumed by a single class instance, including all of its owned fields. That memory footprint value differs from data provided by many commercial profilers that report shallow memory footprints (for example, if an object has an int[] field, its memory consumption will appear separately).
The results

Let's apply this simple tool to a few classes, then see if the results match our expectations.

Note: The following results are based on Sun's JDK 1.3.1 for Windows. Due to what is and is not guaranteed by the Java language and JVM specifications, you cannot apply these specific results to other platforms or other Java implementations.
java.lang.Object

Well, the root of all objects just had to be my first case. For java.lang.Object, I get:

'before' heap: 510696, 'after' heap: 1310696
heap delta: 800000, {class java.lang.Object} size = 8 bytes



So, a plain Object takes 8 bytes; of course, no one should expect the size to be 0, as every instance must carry around fields that support base operations like equals(), hashCode(), wait()/notify(), and so on.
java.lang.Integer

My colleagues and I frequently wrap native ints into Integer instances so we can store them in Java collections. How much does it cost us in memory?

'before' heap: 510696, 'after' heap: 2110696
heap delta: 1600000, {class java.lang.Integer} size = 16 bytes




The 16-byte result is a little worse than I expected because an int value can fit into just 4 extra bytes. Using an Integer costs me a 300 percent memory overhead compared to when I can store the value as a primitive type.
java.lang.Long

Long should take more memory than Integer, but it does not:

'before' heap: 510696, 'after' heap: 2110696
heap delta: 1600000, {class java.lang.Long} size = 16 bytes


Clearly, actual object size on the heap is subject to low-level memory alignment done by a particular JVM implementation for a particular CPU type. It looks like a Long is 8 bytes of Object overhead, plus 8 bytes more for the actual long value. In contrast, Integer had an unused 4-byte hole, most likely because the JVM I use forces object alignment on an 8-byte word boundary.
Arrays

Playing with primitive type arrays proves instructive, partly to discover any hidden overhead and partly to justify another popular trick: wrapping primitive values in a size-1 array to use them as objects. By modifying Sizeof.main() to have a loop that increments the created array length on every iteration, I get for int arrays:

length: 0, {class [I} size = 16 bytes
length: 1, {class [I} size = 16 bytes
length: 2, {class [I} size = 24 bytes
length: 3, {class [I} size = 24 bytes
length: 4, {class [I} size = 32 bytes
length: 5, {class [I} size = 32 bytes
length: 6, {class [I} size = 40 bytes
length: 7, {class [I} size = 40 bytes
length: 8, {class [I} size = 48 bytes
length: 9, {class [I} size = 48 bytes
length: 10, {class [I} size = 56 bytes



and for char arrays:

length: 0, {class [C} size = 16 bytes
length: 1, {class [C} size = 16 bytes
length: 2, {class [C} size = 16 bytes
length: 3, {class [C} size = 24 bytes
length: 4, {class [C} size = 24 bytes
length: 5, {class [C} size = 24 bytes
length: 6, {class [C} size = 24 bytes
length: 7, {class [C} size = 32 bytes
length: 8, {class [C} size = 32 bytes
length: 9, {class [C} size = 32 bytes
length: 10, {class [C} size = 32 bytes




Above, the evidence of 8-byte alignment pops up again. Also, in addition to the inevitable Object 8-byte overhead, a primitive array adds another 8 bytes (out of which at least 4 bytes support the length field). And using int[1] appears to not offer any memory advantages over an Integer instance, except maybe as a mutable version of the same data.
Multidimensional arrays

Multidimensional arrays offer another surprise. Developers commonly employ constructs like int[dim1][dim2] in numerical and scientific computing. In an int[dim1][dim2] array instance, every nested int[dim2] array is an Object in its own right. Each adds the usual 16-byte array overhead. When I don't need a triangular or ragged array, that represents pure overhead. The impact grows when array dimensions greatly differ. For example, a int[128][2] instance takes 3,600 bytes. Compared to the 1,040 bytes an int[256] instance uses (which has the same capacity), 3,600 bytes represent a 246 percent overhead. In the extreme case of byte[256][1], the overhead factor is almost 19! Compare that to the C/C++ situation in which the same syntax does not add any storage overhead.
java.lang.String

Let's try an empty String, first constructed as new String():

'before' heap: 510696, 'after' heap: 4510696
heap delta: 4000000, {class java.lang.String} size = 40 bytes




The result proves quite depressing. An empty String takes 40 bytes—enough memory to fit 20 Java characters.

Before I try Strings with content, I need a helper method to create Strings guaranteed not to get interned. Merely using literals as in:

    object = "string with 20 chars";




will not work because all such object handles will end up pointing to the same String instance. The language specification dictates such behavior (see also the java.lang.String.intern() method). Therefore, to continue our memory snooping, try:

    public static String createString (final int length)
    {
        char [] result = new char [length];
        for (int i = 0; i < length; ++ i) result [i] = (char) i;
        
        return new String (result);
    }




After arming myself with this String creator method, I get the following results:

length: 0, {class java.lang.String} size = 40 bytes
length: 1, {class java.lang.String} size = 40 bytes
length: 2, {class java.lang.String} size = 40 bytes
length: 3, {class java.lang.String} size = 48 bytes
length: 4, {class java.lang.String} size = 48 bytes
length: 5, {class java.lang.String} size = 48 bytes
length: 6, {class java.lang.String} size = 48 bytes
length: 7, {class java.lang.String} size = 56 bytes
length: 8, {class java.lang.String} size = 56 bytes
length: 9, {class java.lang.String} size = 56 bytes
length: 10, {class java.lang.String} size = 56 bytes




The results clearly show that a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead. For a nonempty String of size 10 characters or less, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length), ranges from 100 to 400 percent.

Of course, the penalty depends on your application's data distribution. Somehow I suspected that 10 characters represents the typical String length for a variety of applications. To get a concrete data point, I instrumented the SwingSet2 demo (by modifying the String class implementation directly) that came with JDK 1.3.x to track the lengths of the Strings it creates. After a few minutes playing with the demo, a data dump showed that about 180,000 Strings were instantiated. Sorting them into size buckets confirmed my expectations:
[0-10]:  96481
[10-20]: 27279
[20-30]: 31949
[30-40]: 7917
[40-50]: 7344
[50-60]: 3545
[60-70]: 1581
[70-80]: 1247
[80-90]: 874
...



That's right, more than 50 percent of all String lengths fell into the 0-10 bucket, the very hot spot of String class inefficiency!

In reality, Strings can consume even more memory than their lengths suggest: Strings generated out of StringBuffers (either explicitly or via the '+' concatenation operator) likely have char arrays with lengths larger than the reported String lengths because StringBuffers typically start with a capacity of 16, then double it on append() operations. So, for example, createString(1) + ' ' ends up with a char array of size 16, not 2.
What do we do?

"This is all very well, but we don't have any choice but to use Strings and other types provided by Java, do we?" I hear you ask. Let's find out.
Wrapper classes

Wrapper classes like java.lang.Integer seem a bad choice for storing large data amounts in memory. If you strive to be memory-economic, avoid them altogether. Rolling your own vector class for primitive ints isn't difficult. Of course, it would be great if the Java core API already contained such libraries. Perhaps the situation will improve when Java has generic types.
Multidimensional arrays

For large data structures built with multidimensional arrays, you can oftentimes reduce the extra dimension overhead by an easy indexing change: convert every int[dim1][dim2] instance to an int[dim1*dim2] instance and change all expressions like a[i][j] to a[i*dim1 + j]. Of course, you pay a price from the lack of index-range checking on dim1 dimension (which also boosts performance).
java.lang.String

You can try a few simple tricks to reduce your application's String static memory size.

First, you can try one common technique when an application loads and caches many Strings from a data file or a network connection, and the String value range proves limited. For example, if you want to parse an XML file in which you frequently encounter a certain attribute, but the attribute is limited to just two possible values. Your goal: filter all Strings through a hash map and reduce all equal but distinct Strings to identical object references:

    public String internString (String s)
    {
        if (s == null) return null;
        
        String is = (String) m_strings.get (s);
        if (is != null)
            return is;
        else
        {
            m_strings.put (s, s);
            return s;
        }
    }
    
    private Map m_strings = new HashMap ();


When applicable, that trick can decrease your static memory requirements by hundreds of percent. An experienced reader may observe that the trick duplicates java.lang.String.intern()'s functionality. Numerous reasons exist to avoid the String.intern() method. One is that few modern JVMs can intern large amounts of data.

What if your Strings are all different? For the second trick, recollect that for small Strings the underlying char array takes half the memory occupied by the String that wraps it. Thus, when my application caches many distinct String values, I can just keep the arrays in memory and convert them to Strings as needed. That works well if each such String then serves as a transient, quickly discarded object. A simple experiment with caching 90,000 words taken from a sample dictionary file shows that this data takes about 5.6 MB in String form and only 3.4 MB in char[] form, a 65 percent reduction.

The second trick contains one obvious disadvantage: you cannot convert a char[] back to a String through a constructor that would take ownership of the array without cloning it. Why? Because the entire public String API ensures that every String is immutable, so every String constructor defensively clones input data passed through its parameters.

Still, you can try a third trick when the cost of converting from char arrays to Strings proves too high. The trick exploits java.lang.String.substr()'s ability to avoid data copying: the method implementation exploits String immutability and creates a shallow String object that shares the char content array with the original String but has its internal start and end indices adjusted correspondingly. To make an example, new String("smiles").substring(1,5) is a String configured to start at index 1 and end at index 4 within a char buffer "smiles" shared by reference with the originally constructed String. You can exploit that fact as follows: given a large String set, you can merge its char content into one large char array, create a String out of it, and recreate the original Strings as subStrings of this master String, as the following method illustrates:
    public static String [] compactStrings (String [] strings)
    {
        String [] result = new String [strings.length];
        int offset = 0;
        
        for (int i = 0; i < strings.length; ++ i)
            offset += strings [i].length ();
        
        // Can't use StringBuffer due to how it manages capacity
        char [] allchars = new char [offset];
        
        offset = 0;
        for (int i = 0; i < strings.length; ++ i)
        {
            strings [i].getChars (0, strings [i].length (), allchars, offset);
            offset += strings [i].length ();
        }
        
        String allstrings = new String (allchars);
        
        offset = 0;
        for (int i = 0; i < strings.length; ++ i)
            result [i] = allstrings.substring (offset,
                                             offset += strings [i].length ());
        
        return result;
    }


The above method returns a new set of Strings equivalent to the input set but more compact in memory. Recollect from earlier measurements that every char[] adds 16 bytes of overhead; effectively removed by this method. The savings could be significant when cached data comprises mostly short Strings. When you apply this trick to the same 90,000-word dictionary mentioned above, the memory size drops from 5.6 MB to 4.2 MB, a 30 percent reduction. (An astute reader will observe in that particular example the Strings tend to share many prefixes and the compactString() method could be further optimized to reduce the merged char array's size.)

As a side effect, compactString() also removes StringBuffer-related inefficiencies mentioned earlier.
Is it worth the effort?

To many, the techniques I presented may seem like micro-optimizations not worth the time it takes to implement them. However, remember the applications I had in mind: server-side applications that cache massive amounts of data in memory to achieve performance impossible when data comes from a disk or database. Several hundred megabytes of cached data represents a noticeable fraction of maximum heap sizes of today's 32-bit JVMs. Shaving 30 percent or more off is nothing to scoff at; it could push an application's scalability limits quite noticeably. Of course, these tricks cannot substitute for beginning with well-designed data structures and profiling your application to determine its actual hot spots. In any case, you're now more aware of how much memory your objects consume.
Author Bio
Vladimir Roubtsov has programmed in a variety of languages for more than 12 years, including Java since 1995. Currently, he develops enterprise software as a senior developer for Trilogy in Austin, Texas. When coding for fun, Vladimir develops software tools based on Java byte code or source code instrumentation.
分享到:
评论
2 楼 qyongkang 2011-09-06  
纯属转载。。。
1 楼 lan861698789 2011-07-27  
如果我想求出基本数据类型的长度怎么办?就像C中的sizeof

相关推荐

    java内存泄露、溢出检查方法和工具

    解决内存溢出问题通常需要调整JVM的内存参数,如`-Xms`和`-Xmx`用于设置堆的初始大小和最大大小,以及`-XX:MaxPermSize`(对于较旧的JVM版本)或`-XX:MaxMetaspaceSize`(对于Java 8及以上版本)来控制方法区的大小...

    java内存机制及异常处理

    解决这些问题的方法包括但不限于调整JVM参数以增大内存分配、优化代码以减少内存占用、及时关闭不再使用的资源(如数据库连接)以及使用内存分析工具检测和修复内存泄漏。正确理解和运用Java内存机制以及异常处理...

    Java+内存分析工具+MAT

    MAT会展示出所有对象的统计信息,包括对象数量、占用内存大小等,这对于找出消耗内存最多的对象非常有帮助。 其次,MAT的"Leak Suspects"报告是其特色之一。这个报告会自动分析heap dump,找出可能导致内存泄漏的...

    Java对象池实现源码

    本篇文章将深入探讨Java对象池的实现原理,以及如何借鉴"Jakarta Commons Pool"组件来设计一个轻量级的对象池。 一、对象池的基本概念 对象池的基本工作流程包括以下几个步骤: 1. 初始化:预创建一定数量的对象并...

    jProfiler7 java内存分析 linux版本

    - jProfiler7提供了详细的内存分配和存活周期视图,帮助开发者定位内存占用大的对象和可能导致问题的代码片段。 2. **jProfiler7的核心功能** - **对象分配追踪**:实时监控对象创建过程,找出哪个类或方法导致了...

    java内存管理精彩概述

    Java对象在内存中的分配主要发生在堆上,这是所有类实例和数组的内存来源。对象的内存布局包括实例变量、方法引用等,遵循Java对象模型。 2. **浅层大小、保留大小与弱引用** - **浅层大小**(Shallow Size):一...

    Cloud_Foundry中Java应用集合类内存泄漏检测_叶瑞浩.caj

    然后通过收集每次垃圾回收事件后对象之间的引用关系和对象内存大小, 计算得到集合类对象的内存影响值。接着通过修改字节码的方式获取到集合类 对象使用集合内元素的数据,计算出相应的元素使用影响值。最后...

    如何解决Java内存泄漏.pdf

    3. JDK自带的工具,如jmap和jvisualvm,也可以用来检测Java应用程序的内存使用情况。jmap可以生成堆转储文件,jvisualvm可以分析堆转储文件,从而诊断内存泄漏问题。 文档中还提到了一些内存泄漏的典型例子和解决...

    Java 内存分析工具

    8. **阈值过滤**:设置内存大小阈值,快速定位超过特定大小的对象。 在使用MAT进行内存分析时,通常需要遵循以下步骤: 1. **获取heap dump**:当应用出现性能问题时,使用JVM的JMap或VisualVM等工具生成heap dump...

    JAVA内存溢出详解.doc

    Java的垃圾收集器负责自动回收不再使用的对象所占用的内存。了解并优化垃圾收集器的工作方式(如新生代、老年代、CMS、G1等),可以帮助避免或减少内存溢出。 5. **内存溢出的预防** - 编程时遵循良好的内存管理...

    java主要反射和内存机制

    Java是一种广泛使用的面向对象的编程语言,其强大的特性之一就是反射和内存管理机制。这篇文档将深入探讨这两个关键概念。 **一、Java反射机制** Java反射机制是Java提供的一种能够在运行时检查类、接口、字段和...

    java 内存泄漏

    堆内存主要用于存储Java对象实例。这些对象可以是任何类型的Java对象,包括数组、自定义对象等。当我们在代码中创建一个新的对象时,实际上是在堆内存中为该对象分配了一块内存空间。 #### 类变量与实例变量 - **...

    Java堆栈内存分析笔记

    GC会定期检测不再使用的对象并释放其占用的内存。理解垃圾回收机制有助于优化内存使用,避免内存泄漏。 3. **内存碎片**:长时间运行的Java应用可能导致堆内存碎片。碎片化会影响内存效率,因为可用的连续内存块...

    Java系统中内存泄漏测试方法的研究

    Java虚拟机(JVM)有自动垃圾回收机制(Garbage Collection, GC),用于回收不再使用的对象所占用的内存,但并不能保证完全避免内存泄漏。 三、内存泄漏的常见原因 1. 静态集合类引用:静态字段可能导致对象长时间...

    java内存泄露分析工具 eclipse3.5插件

    此外,MAT的"Shallow Heap"和"Retained Heap"指标可以帮助你理解对象本身的大小以及它间接保留的内存大小。 通过MAT插件,你可以定位到那些持有大量内存但本应被释放的对象,从而修复内存泄露问题。同时,MAT还会...

    IBM内存分析工具(java)

    - **内存泄漏探测器**:MAT可以检测出对象实例的异常引用链,从而发现潜在的内存泄露。 - ** dominator树**:显示了对象占用内存的支配关系,帮助识别哪些对象占用了大量内存。 - ** 堆概述**:提供堆的总体视图...

    ibm-java-堆内存分析工具-heapanalyzer

    - **JVM调优**:包括内存大小设置、垃圾回收器选择、性能监控等技巧。 6. **最佳实践** - 定期进行内存分析,尤其是在遇到性能瓶颈时。 - 使用适当的内存分析策略,如全量分析和增量分析。 - 结合代码审查和...

    java 内存监控

    2. **分析堆转储**:使用MAT或YourKit等工具分析dump文件,找出占用内存大的对象和可能的内存泄漏。 3. **定位问题**:根据分析结果,定位代码中的问题,例如过度创建对象、未正确关闭资源等。 4. **优化调整**:...

    JAVA 为什么不会内存泄漏

    2. **垃圾回收机制**:Java虚拟机内置的垃圾回收器能够定期检测并回收不再使用的对象所占用的内存。这个过程对于开发者来说是透明的,不需要手动干预。 3. **软引用、弱引用和虚引用**:Java提供了软引用、弱引用和...

    Java学习之内存泄漏检测及相关问题

    检测Java内存泄漏的方法有多种,其中包括: 1. VisualVM:这是Java自带的性能分析工具,可以显示内存分配情况,发现可能的内存泄漏。 2. MAT (Memory Analyzer Tool):Eclipse的一款插件,能提供详细的内存分析报告...

Global site tag (gtag.js) - Google Analytics