線程堆棧分析

hopechart 6年前發布 | 35K 次閱讀 線程 JRE Java開發

本文為讀書筆記,關于 Java 線程堆棧分析,在閱讀并進行實驗的基礎上進行整理,如果有問題歡迎反饋

Java 虛擬機提供了線程轉儲(Thread dump)的后門,通過這個后門,可以將線程堆棧打印出來。這個后門就是通過向 Java 進程發送一個 QUIT 信號,Java 虛擬機收到該信號之后,將系統當前的 Java 線程調用堆棧打印出來。

kill -3 <java_pid> /jstack <java_pid>

1. Demo

public class MyTest {
        Object obj1 = new Object();
        Object obj2 = new Object();
        public void fun1() {
            synchronized(obj1) {
                fun2();
            }
        }
        public void fun2() {
            synchronized(obj2) {
                while(true) {
                    System.out.print("");
                }
            }
        }
        public static void main(String[] args) {
            MyTest aa = new MyTest();
            aa.fun1();
        }
}

1.1 使用 kill -3 得到的結果

2017-12-26 10:59:58
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode):
    "Attach Listener" daemon prio=10 tid=0x00007fd9d4001000 nid=0x6993 runnable [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "Service Thread" daemon prio=10 tid=0x00007fda3c0a8800 nid=0x695f runnable [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "C2 CompilerThread1" daemon prio=10 tid=0x00007fda3c0a6000 nid=0x695e waiting on condition [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "C2 CompilerThread0" daemon prio=10 tid=0x00007fda3c0a3000 nid=0x695d waiting on condition [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "Signal Dispatcher" daemon prio=10 tid=0x00007fda3c0a0800 nid=0x695c waiting on condition [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "Finalizer" daemon prio=10 tid=0x00007fda3c078000 nid=0x695b in Object.wait() [0x00007fda32e06000]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
    "Reference Handler" daemon prio=10 tid=0x00007fda3c076000 nid=0x695a in Object.wait() [0x00007fd9ec28e000]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000758c04470> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x0000000758c04470> (a java.lang.ref.Reference$Lock)
        "main" prio=10 tid=0x00007fda3c008800 nid=0x6950 runnable [0x00007fda4588a000]
        java.lang.Thread.State: RUNNABLE
        at MyTest.fun2(MyTest.java:12)
        - locked <0x0000000758c4a7f8> (a java.lang.Object)
        at MyTest.fun1(MyTest.java:6)
        - locked <0x0000000758c4a7e8> (a java.lang.Object)
        at MyTest.main(MyTest.java:18)
    "VM Thread" prio=10 tid=0x00007fda3c071800 nid=0x6959 runnable
    "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fda3c01e000 nid=0x6951 runnable
    "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fda3c020000 nid=0x6952 runnable
    "GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fda3c022000 nid=0x6953 runnable
    "GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fda3c024000 nid=0x6954 runnable
    "GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fda3c026000 nid=0x6955 runnable
    "GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fda3c027800 nid=0x6956 runnable
    "GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fda3c029800 nid=0x6957 runnable
    "GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fda3c02b800 nid=0x6958 runnable
    "VM Periodic Task Thread" prio=10 tid=0x00007fda3c0b3000 nid=0x6960 waiting on condition
    JNI global references: 123
    Heap
        PSYoungGen total 150528K, used 7772K [0x0000000758c00000, 0x0000000763380000, 0x0000000800000000)
        eden space 129536K, 6% used [0x0000000758c00000,0x0000000759397200,0x0000000760a80000)
        from space 20992K, 0% used [0x0000000761f00000,0x0000000761f00000,0x0000000763380000)
        to space 20992K, 0% used [0x0000000760a80000,0x0000000760a80000,0x0000000761f00000)
        ParOldGen total 342528K, used 0K [0x000000060a400000, 0x000000061f280000, 0x0000000758c00000)
        object space 342528K, 0% used [0x000000060a400000,0x000000060a400000,0x000000061f280000)
        PSPermGen total 21504K, used 2429K [0x0000000605200000, 0x0000000606700000, 0x000000060a400000)
        object space 21504K, 11% used [0x0000000605200000,0x000000060545f738,0x0000000606700000)

1.2 jstack 的結果

2017-12-26 11:04:54
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode):
    "Attach Listener" daemon prio=10 tid=0x00007fd9d4001000 nid=0x6993 waiting on condition [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "Service Thread" daemon prio=10 tid=0x00007fda3c0a8800 nid=0x695f runnable [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "C2 CompilerThread1" daemon prio=10 tid=0x00007fda3c0a6000 nid=0x695e waiting on condition [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "C2 CompilerThread0" daemon prio=10 tid=0x00007fda3c0a3000 nid=0x695d waiting on condition [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "Signal Dispatcher" daemon prio=10 tid=0x00007fda3c0a0800 nid=0x695c runnable [0x0000000000000000]
        java.lang.Thread.State: RUNNABLE
    "Finalizer" daemon prio=10 tid=0x00007fda3c078000 nid=0x695b in Object.wait() [0x00007fda32e06000]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
    "Reference Handler" daemon prio=10 tid=0x00007fda3c076000 nid=0x695a in Object.wait() [0x00007fd9ec28e000]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000758c04470> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x0000000758c04470> (a java.lang.ref.Reference$Lock)
    "main" prio=10 tid=0x00007fda3c008800 nid=0x6950 runnable [0x00007fda4588a000]
        java.lang.Thread.State: RUNNABLE
        at MyTest.fun2(MyTest.java:12)
        - locked <0x0000000758c4a7f8> (a java.lang.Object)
        at MyTest.fun1(MyTest.java:6)
        - locked <0x0000000758c4a7e8> (a java.lang.Object)
        at MyTest.main(MyTest.java:18)
    "VM Thread" prio=10 tid=0x00007fda3c071800 nid=0x6959 runnable
    "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fda3c01e000 nid=0x6951 runnable
    "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fda3c020000 nid=0x6952 runnable
    "GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fda3c022000 nid=0x6953 runnable
    "GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fda3c024000 nid=0x6954 runnable
    "GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fda3c026000 nid=0x6955 runnable
    "GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fda3c027800 nid=0x6956 runnable
    "GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fda3c029800 nid=0x6957 runnable
    "GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fda3c02b800 nid=0x6958 runnable
    "VM Periodic Task Thread" prio=10 tid=0x00007fda3c0b3000 nid=0x6960 waiting on condition
    JNI global references: 123

1.3 分析

用戶線程為

"main" prio=10 tid=0x00007fda3c008800 nid=0x6950 runnable [0x00007fda4588a000] //main 表示線程名稱,proi=1 表示線程優先級,tid=0x00007fda3c008800 表示線程 id,nid=0x6950 表示本地線程 id(這個查問題的時候會用到),runnable 表示線程狀態,0x00007fda4588a000 表示內存地址
 java.lang.Thread.State: RUNNABLE
  at MyTest.fun2(MyTest.java:12) //第一個MyTest 表示類名,fun2 表示方法名,第二個 MyTest.java 表示正在調用的函數所在的源代碼文件,12 表示行號
     - locked <0x0000000758c4a7f8> (a java.lang.Object) //locked 表示持有鎖,后面的 0x0000000758c4a7f8 表示鎖的 ID
     at MyTest.fun1(MyTest.java:6)
     - locked <0x0000000758c4a7e8> (a java.lang.Object)
     at MyTest.main(MyTest.java:18)

如何獲得 Java 虛擬機對應的本地線程

1. jps //獲取 Java 進程 ID
2. pstack <java_pid> 獲得 Java 虛擬機的本地線程的堆棧

獲得的本地線程棧如下所示

Thread 19 (Thread 0x7fda4588b700 (LWP 26960)): // Thread 19 表示線程名稱,0x7fda4588b700 表示本地線程 id,LWP 26960 本地線程的另一種表示,LWP-light weight process(和上面的 nid 會一一對應,這里是十進制,上面的是十六進制)
#0 0x00007fda401b2c47 in ?? ()
#1 0x00007fda401ab8fc in ?? ()
#2 0x0000000758c4a7f8 in ?? ()
#3 0x00007fda4588a8a0 in ?? ()
#4 0x0000000000000001 in ?? ()
#5 0x00007fda4014f058 in ?? ()
#6 0x00007fda4588a8f8 in ?? ()
#7 0x00007fda4014f058 in ?? ()
#8 0x0000000758c4a7d0 in ?? ()
#9 0x0000000000000001 in ?? ()
#10 0x0000000758c4a7e8 in ?? ()
#11 0x00007fda4588a8a8 in ?? ()
#12 0x000000060545d7b0 in ?? ()
#13 0x00007fda4588a918 in ?? ()
#14 0x000000060545dc80 in ?? ()
#15 0x0000000000000000 in ?? ()
    Thread 18 (Thread 0x7fda40148700 (LWP 26961)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
        Thread 17 (Thread 0x7fda3a5b3700 (LWP 26962)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
            Thread 16 (Thread 0x7fda3a4b2700 (LWP 26963)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                Thread 15 (Thread 0x7fda3a3b1700 (LWP 26964)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                    Thread 14 (Thread 0x7fda3a2b0700 (LWP 26965)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                        Thread 13 (Thread 0x7fda3a1af700 (LWP 26966)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                            Thread 12 (Thread 0x7fda3a0ae700 (LWP 26967)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                Thread 11 (Thread 0x7fda39fad700 (LWP 26968)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                    Thread 10 (Thread 0x7fda32f08700 (LWP 26969)):
#0 0x00007fda4546ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44657c0f in os::PlatformEvent::park(long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda44619b2e in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda447e1899 in VMThread::loop() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda447e1ba0 in VMThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                        Thread 9 (Thread 0x7fd9ec28f700 (LWP 26970)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda44646ddd in ObjectMonitor::wait(long, bool, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda444b6298 in JVM_MonitorWait () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4015bd98 in ?? ()
#5 0x00007fda3c0768d8 in ?? ()
#6 0x00007fd900000000 in ?? ()
#7 0x00007fda3c076000 in ?? ()
#8 0x00007fd9ec28e5d8 in ?? ()
#9 0x0000000605202798 in ?? ()
#10 0x00007fd9ec28e648 in ?? ()
#11 0x00000006052a9e68 in ?? ()
#12 0x0000000000000000 in ?? ()
                                            Thread 8 (Thread 0x7fda32e07700 (LWP 26971)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda44646ddd in ObjectMonitor::wait(long, bool, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda444b6298 in JVM_MonitorWait () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4015bd98 in ?? ()
#5 0x00007fda3c078a08 in ?? ()
#6 0x00007fda00000000 in ?? ()
#7 0x00007fda3c078000 in ?? ()
#8 0x00007fda32e065e8 in ?? ()
#9 0x0000000000000000 in ?? ()
                                                Thread 7 (Thread 0x7fda32d06700 (LWP 26972)):
#0 0x00007fda4546da00 in sem_wait () from /lib64/libpthread.so.0
#1 0x00007fda446578ea in check_pending_signals(bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda44650b35 in signal_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#7 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                                    Thread 6 (Thread 0x7fda32c05700 (LWP 26973)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a086 in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda44282648 in CompileQueue::get() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4428674a in CompileBroker::compiler_thread_loop() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#8 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#9 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                                        Thread 5 (Thread 0x7fda32b04700 (LWP 26974)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a086 in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda44282648 in CompileQueue::get() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4428674a in CompileBroker::compiler_thread_loop() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#8 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#9 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                                            Thread 4 (Thread 0x7fda32a03700 (LWP 26975)):
#0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda446e95a8 in ServiceThread::service_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#8 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                                                Thread 3 (Thread 0x7fda32902700 (LWP 26976)):
#0 0x00007fda4546ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fda44657c0f in os::PlatformEvent::park(long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda44619b2e in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda4478f121 in WatcherThread::sleep() const () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda4478f63e in WatcherThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                                                    Thread 2 (Thread 0x7fda32801700 (LWP 27027)):
#0 0x00007fda4546eb2d in accept () from /lib64/libpthread.so.0
#1 0x00007fda44136af1 in LinuxAttachListener::dequeue() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#2 0x00007fda44136c8b in AttachListener::dequeue() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#3 0x00007fda441357da in attach_listener_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#4 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#5 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so
#7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fda44d99aad in clone () from /lib64/libc.so.6
                                                                        Thread 1 (Thread 0x7fda4588d700 (LWP 26959)):
#0 0x00007fda454682fd in pthread_join () from /lib64/libpthread.so.0
#1 0x00007fda45256505 in ContinueInNewThread0 () from /usr/local/jdk1.7.0_76/bin/../lib/amd64/jli/libjli.so
#2 0x00007fda4524b58a in ContinueInNewThread () from /usr/local/jdk1.7.0_76/bin/../lib/amd64/jli/libjli.so
#3 0x00007fda4524e0e0 in JLI_Launch () from /usr/local/jdk1.7.0_76/bin/../lib/amd64/jli/libjli.so
#4 0x0000000000400686 in main ()

線程狀態中的 runnable 從虛擬機的角度來看,表示這個線程正在運行。但是出于 runnable 的線程不一定真正消耗 CPU。出于 runnable 的線程只能說明該線程沒有阻塞在 java 的 wait 或者 sleep 方法上,同時也沒有等待在鎖上面。但是如果該線程調用了本地方法(比如 java.net.SocketInputStream.socketRead0) ,而本地方法出于等待狀態,這個時候虛擬機是不知道本地代碼中發生了什么,此時盡管當前線程實際上也是阻塞狀態,但實際上顯示出來的還是 runnable 狀態,這種情況下是不消耗 CPU 的。如下線程堆棧所示

"Thread-243" prio=1 tid=0xa58f2048 nid=0x7ac2 runnable [0xaeedb000..0xaeedc480]
        at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.read(SocketInputStream.java:129)
            at oracle.net.ns.Packet.receive(Unknown Source)
            at oracle.net.ns.DataPacket.receive(Unknown Source)
            at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
            at oracle.net.ns.NetInputStream.read(Unknown Source)
            at oracle.jdbc.driver.T4CMAREngine.getNBytes(T4CMAREngine.java:1520)
            at oracle.jdbc.driver.T4CMAREngine.unmarshalNBytes()
            at oracle.jdbc.driver.T4CLongRawAccessor.readStreamFromWire()
            at oracle.jdbc.driver.T4CLongRawAccessor.readStream()
            at oracle.jdbc.driver.T4CInputStream.getBytes(T4CInputStream.java:70)
            - locked <0x934f4258> (a oracle.jdbc.driver.T4CInputStream)
            - locked <0x6b0dd600> (a oracle.jdbc.driver.T4CConnection)
            at oracle.jdbc.driver.OracleInputStream.needBytes()
            ... ...
                    at org.hibernate.loader.Loader.list(Loader.java:1577)
            at org.hibernate.loader.hql.QueryLoader.list()
            at com.wes.timer.TimerTaskImpl.execute(TimerTaskImpl.java:627)
            - locked <0x80df8ce8> (a com.wes.timer.TimerTaskImpl)
            at com.wes.threadpool.RunnableWrapper.run(RunnableWrapper.java:209)
            at com.wes.threadpool.PooledExecutorEx$Worker.run()
            at java.lang.Thread.run(Thread.java:595)

那么 “. “ 和 “. “ 各表示什么含義呢? “. “ 表示當前正在執行類的初始化,”. “ 正在執行對象的構造函數。

除接口外,初始化一個類之前必須保證其直接父類已被初始化,并且該初始化過程是由 JVM 保證線程安全的。并非所有的類都擁有一個 方法,在以下條件中該類不會擁有 方法

  • 該類既沒有聲明任何類變量,也沒有靜態初始化語句
  • 該類聲明了類變量,但沒有明確使用類變量初始化語句或靜態初始化語句初始化
  • 該類僅包含靜態 final 變量的類變量初始化語句,并且類變量初始化語句是編譯時常量表達式

2 鎖的解讀

多線程中,wait() 和 sleep() 的區別。wait() 和 sleep() 有一個共同點,就是二者都會把當前的線程阻塞住,我們稱之為睡眠或者等待。但二者實際上是完全不同的兩個函數,有著本質的區別:

  • 當線程執行到 wait() 方法上,當前線程會釋放監視鎖,此時其他線程可以占有該鎖,一旦 wait() 方法執行完成,當前線程又繼續持有該鎖,直到執行完該鎖的作用域。
    wait() 方法退出的原因:
    • 達到了等待時間,自動退出。如 wati(5000), 5秒后 wait 方法退出
    • 其他的線程調用了該鎖的 notify() 方法。當如果多個線程在等待同一個鎖,只有一個線程會被通知到
  • sleep() 與鎖操作無關,如果該方法恰好在一個鎖的保護范圍之內,當前線程及時在執行 sleep() 的時候,仍然繼續保持監視鎖。該方法實際上僅僅完成等待或者睡眠的語義

與鎖相關的三個重要信息如下:

  • 當一個線程占有一個鎖的時候,線程堆棧中會打印 – locked<0x22bffb60>
  • 當一個線程正在等待其他線程釋放該鎖,線程堆棧中會打印 – waiting to lock<0x22bffb60>
  • 當一個線程占有一個鎖,但有執行到該鎖的 wait() 上,線程堆棧中首先打印 locked, 然后又會打印 – waiting on<0x22c03c60>

3 線程狀態的解讀

3.1 RUNNABLE

RUNNABLE 從虛擬機的角度看,線程處于正在運行的狀態,正在運行不一定消耗 CPU,比如下面的線程棧,在等待 IO

Thread-39" daemon prio=1 tid=0x08646590 nid=0x666d runnable [5beb7000..5beb88b8]
  java.lang.Thread.State: RUNNABLE
      at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:201)
        - locked <0x47bfb940> (a java.io.BufferedInputStream)
        at org.postgresql.PG_Stream.ReceiveChar(PG_Stream.java:141)
        at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:68)
        - locked <0x47bfb758> (a org.postgresql.PG_Stream)
        at org.postgresql.Connection.ExecSQL(Connection.java:398)

下面的線程棧則消耗 CPU

"Thread-444" prio=1 tid=0xa4853568 nid=0x7ade runnable [0xafcf7000..0xafcf8680]
  java.lang.Thread.State: RUNNABLE
      //實實在在對應 CPU 指令
     at org.apache.commons.collections.ReferenceMap.getEntry(Unknown Source)
       at org.apache.commons.collections.ReferenceMap.get(Unknown Source)
       at org.hibernate.util.SoftLimitMRUCache.get(SoftLimitMRUCache.java:51)
       at org.hibernate.engine.query.QueryPlanCache.getNativeSQLQueryPlan()
       at org.hibernate.impl.AbstractSessionImpl.getNativeSQLQueryPlan()
       at org.hibernate.impl.AbstractSessionImpl.list()
       at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:164)
       at com.mogoko.struts.logic.user.LeaveMesManager.getCommentByShopId()
       at com.mogoko.struts.action.shop.ShopIndexBaseInfoAction.execute()
       ......

下面的代碼正在進行 JNI 本地方法調用,具體是否消耗 CPU,要看 TcpRecvExt 的實現,如果 TcpRecvExt 是純運算代碼,那么實實在在的消耗 CPU,如果 TcpRecvExt() 中存在掛起的代碼,那么該線程盡管顯示 RUNNABLE,實際也不消耗 CPU 的

"ClientReceiveThread" daemon prio=1 tid=0x99dbacf8 nid=0x7988 runnable [...]
  java.lang.Thread.State: RUNNABLE
      at com.pangu.network.icdcomm.htcpapijni.TcpRecvExt(Native Method)
        at com.pangu.network.icdcomm.IcdComm.receive(IcdComm.java:60)
        at com.msp.client.MspFactory$ClientReceiveThread.task(MspFactory.java:333)
        at com.msp.system.TaskThread.run(TaskThread.java:94)

3.2 TIMED_WAITING(on object monitor)

TIMED_WAITING(on object monitor) 表示當前線程被掛起一段時間,說明該線程正在執行 object.wait(int time) 方法,不消耗 CPU

"JMX server" daemon prio=6 tid=0x0ad2c800 nid=0xdec in Object.wait() [...]
  java.lang.Thread.State: TIMED_WAITING (on  object  monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x03129da0> (a [I)
       at com.sun.jmx.remote.internal.ServerComm$Timeout.run(ServerComm.java:150)
       - locked <0x03129da0> (a [I)
       at java.lang.Thread.run(Thread.java:620)

3.3 TIME_WAITING(sleeping)

TIME_WAITING(sleeping) 表示當前線程被掛起一段時間,正在執行 Thread.sleep(int time) 方法,不消耗 CPU

"Comm thread" daemon prio=10 tid=0x00002aaad4107400 nid=0x649f waiting on condition
    [0x000000004133b000..0x000000004133ba00]
            java.lang.Thread.State: TIMED_WAITING (sleeping)
                at java.lang.Thread.sleep(Native Method)
                at org.apache.hadoop.mapred.Task$1.run(Task.java:282)
                at java.lang.Thread.run(Thread.java:619)

3.4 TIME_WAITING(parking)

TIME_WAITING(parking) 表示線程被掛起,正在執行 Thread.sleep(int time) 方法,不消耗 CPU

"RMI TCP" daemon prio=6 tid=0x0ae3b800 nid=0x958 waiting on condition [0x17eff000..0x17effa94]
    java.lang.Thread.State: TIMED_WAITING (parking)
           at sun.misc.Unsafe.park(Native Method)
           - parking to wait for  <0x02f49f58> (a java.util.concurrent.SynchronousQueue$TransferStack)
           at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:179)
           at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
           at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
           at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:871)
           at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:495)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:693)
           at java.lang.Thread.run(Thread.java:620)

3.5 WAITING(on object monitor)

WAITING(on object monitor) 表示當前線程被掛起,正在執行 obj.wait() 只能通過 notify() 喚醒,不消耗 CPU

"IPC Client" daemon prio=10 tid=0x00002aaad4129800 nid=0x649d in Object.wait() [0x039000..0x039d00]
       java.lang.Thread.State: WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                - waiting on <0x00002aaab3acad18>; (aorg.apache.hadoop.ipc.Client$Connection)
                at java.lang.Object.wait(Object.java:485)
                at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:234)
                - locked <0x00002aaab3acad18> (aorg.apache.hadoop.ipc.Client$Connection)
                at org.apache.hadoop.ipc.Client$Connection.run(Client.java:273)

3.6 小結

處于 TIME_WAITING、WAITING 狀態的線程不消耗 CPU,處于 RUNNABLE 狀態的線程要結合線程代碼性質判斷,是否消耗 CPU

  • 如果是純 Java 運算代碼,消耗 CPU
  • 如果是網絡 IO,很少消耗 CPU
  • 如果是本地代碼,結合本地代碼性質進行判斷(通過 pstack/gstack 獲取本地線程棧),如果是純運算代碼,消耗 CPU,如果被掛起,不消耗 CPU,如果是 IO,則不怎么消耗 CPU

4 接觸線程堆棧進行問題分析

線程堆棧翻譯系統當前時間正在執行什么代碼。根據這些信息就可以知道系統當前到底在做什么。看堆棧一般從三個視角來分析:堆棧的局部信息、一次堆棧的統計信息(全局信息)、多個堆棧的對比信息

  1. 視角一 從一次的堆棧信息中,我們能直接獲取以下直接的信息:
    a. 當前每一個線程的調用層次關系(調用上下文),即每個線程當前正在調用哪些函數
    b. 當前每個線程當前的狀態:持有了哪些鎖?在等待哪些鎖?
  2. 視角二 從一次的堆棧信息中,我們還可以獲取下面的統計方面信息:
    a. 當前鎖的爭用情況:
    i. 是不是很多線程在等待同一個鎖,如果很多線程等待同一個鎖,說明這個系統已經出現了性能瓶頸,并導致了鎖競爭。還可能是某個線程長時間持有一個鎖不釋放(比如這個線程正陷入了死循環的代碼或者正在請求一個資源,很長時間得不到喚醒)
    ii. 是否有死鎖,哪些線程形成了死鎖
    b. 當前大多數線程在干什么,即正在執行什么代碼
    c. 當前線程總數量
  3. 視角三 從多次(即前后打印多次堆棧進行對比)的堆棧信息中,我們還可以獲得下面的統計對比方面的信息:
    a. 一個線程是否在長期執行。如果每次打印的堆棧,某一個線程一直處于同樣的調用上下文中,那么說明這個線程一直在執行這段代碼,此時就要根據代碼邏輯檢查,這種長期執行是否合理?
    b. 某個線程是否存在長期獲取不到鎖的情況?線程是不是永遠得不到喚醒?如果每次打印的堆棧,某一個線程一直在等待一個鎖,那么就需要檢查占有這個鎖的線程為什么不釋放鎖?

通過多個視角進行觀察,線程堆棧在定位如下類型的問題上非常有幫助:

  • 線程死鎖分析(視角一)
  • Java 代碼導致的 CPU 過高分析(視角三)
  • 死循環分析(視角三)
  • 資源不足分析(視角二)
  • 性能瓶頸分析(視角二和視角三)

4.1 線程死鎖分析

死鎖的原因:當兩個或多個線程正在等待被對方占有的鎖,死鎖就會發生。死鎖會導致兩個線程無法繼續運行,被永遠掛起。下圖中在時間點0,線程 0 占有了 lock0,線程1占有了 lock1,然后時間點 2 線程0 企圖獲取 lock 1,線程 1 企圖獲取 lock 0.由于這兩個線程互相要等待被對方占有的鎖,自己才能繼續,因此造成死鎖,二者永遠沒有機會繼續運行下去。

一個簡單的例子

package MyPackage;
public class Main {
    public static void main(String[] args) {
        Object lockobj1 = new Object();
        Object lockobj2 = new Object();
        TestThread1 thread1 = new TestThread1(lockobj1,lockobj2);
        thread1.start();
        TestThread2 thread2 = new TestThread2(lockobj1,lockobj2);
        thread2.start();
    }
}


package MyPackage;
public class TestThread1  extends Thread{
    Object lock1 = null;
    Object lock2 = null;
    public TestThread1(Object lock1_,Object lock2_)
    {
        lock1 = lock1_;
        lock2 = lock2_;
        this.setName(this.getClass().getName());
    }
    public void run()
    {
        fun();
    }
    public void fun(){
        synchronized(lock1){
            try{
                Thread.sleep(2);
            }
            catch(Exception e){
                e.printStackTrace();
            }
            synchronized(lock2){
            }
        }
    }
}


package MyPackage;
public class TestThread2  extends Thread{
    Object lock1 = null;
    Object lock2 = null;
    public TestThread2(Object lock1_,Object lock2_)
    {
        lock1 = lock1_;
        lock2 = lock2_;
        this.setName(this.getClass().getName());
    }
    public void run()
    {
        fun();
    }
    public void fun(){
        synchronized(lock2){
            try{
                Thread.sleep(2);
            }
            catch(Exception e){
                e.printStackTrace();
            }
            synchronized(lock1){
            }
        }
    }
}

jstack 得到結果如下(截取后的堆棧信息)

Found one Java-level deadlock:
=============================
"MyPackage.TestThread2":
  waiting to lock monitor 0x00007fa1b00062c8 (object 0x0000000758c4aa78, a java.lang.Object),
  which is held by "MyPackage.TestThread1"
"MyPackage.TestThread1":
  waiting to lock monitor 0x00007fa1b0004ed8 (object 0x0000000758c4aa88, a java.lang.Object),
  which is held by "MyPackage.TestThread2"

Java stack information for the threads listed above:
===================================================
"MyPackage.TestThread2":
  at MyPackage.TestThread2.fun(TestThread2.java:24)
    - waiting to lock <0x0000000758c4aa78> (a java.lang.Object)
    - locked <0x0000000758c4aa88> (a java.lang.Object)
    at MyPackage.TestThread2.run(TestThread2.java:13)
"MyPackage.TestThread1":
   at MyPackage.TestThread1.fun(TestThread1.java:24)
    - waiting to lock <0x0000000758c4aa88> (a java.lang.Object)
    - locked <0x0000000758c4aa78> (a java.lang.Object)
   at MyPackage.TestThread1.run(TestThread1.java:13)

Found 1 deadlock.

要避免死鎖的問題,唯一的辦法就是修改代碼。一個可靠的并發系統可以說是設計出來的,而不是通過改 Bug 改出來的,這一點與其他類型的 Bug 有很大的不同。另外,死鎖的兩個或多個線程是不消耗 CPU 的,有的人認為 CPU 100% 的使用率是線程死鎖造成的,這個說法是不對的。無限循環(死循環),并且循環中代碼都是 CPU 密集型,才有可能導致 CPU 的 100% 使用率,像 socket 或者數據庫等 IO 操作是不怎么消耗 CPU 的。

4.2 Java 代碼死循環等導致的 CPU 過高分析

當系統負載大的時候,CPU 的使用率會較高,但是不正確的代碼也會導致 CPU 過高,比如死循環。當發生 CPU 過高的問題,我們需要能夠分析 CPU 高的真正原因。通過多次打印堆棧,前后堆棧對比找到一直在運行的線程,這些線程都是可疑的線程,具體的步驟如下:

  • 通過 jstack 等獲取第一次的堆棧信息
  • 等待一段時間,再獲取第二次堆棧信息
  • 預處理兩次堆棧信息,首先去掉處于 sleeping 或者 waiting 狀態的線程,因為這種線程是不消耗 CPU 的
  • 比較第一次堆棧和第二次堆棧預處理后的線程,找出這段時間一直活躍的線程,如果兩次堆棧中同一個線程處于同樣的調用上下文,那么就應該列為重點懷疑對象。結合代碼進行檢查

如果通過堆棧定位,沒有發現熱點代碼段,那么 CPU 過高可能是不恰當的內存設置導致的頻繁 GC,而從導致 CPU 過高

第一次 jstack 的輸出如下:

"Thread-444" prio=1 tid=0xa4853568 nid=0x7ade runnable [0xafcf7000..0xafcf8680]
        at org.apache.commons.collections.ReferenceMap.getEntry(Unknown Source)
            at org.apache.commons.collections.ReferenceMap.get(Unknown Source)
            at org.hibernate.util.SoftLimitMRUCache.get(SoftLimitMRUCache.java:51)
            at org.hibernate.engine.query.QueryPlanCache.getNativeSQLQueryPlan()
            at org.hibernate.impl.AbstractSessionImpl.getNativeSQLQueryPlan()
            at org.hibernate.impl.AbstractSessionImpl.list()
            at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:164)
            at com.mogoko.struts.logic.user.LeaveMesManager.getCommentByShopId()
            at com.mogoko.struts.action.shop.ShopIndexBaseInfoAction.execute()
            ......

第二次 jstack 的輸出如下:

"Thread-444" prio=1 tid=0xa4853568 nid=0x7ade runnable [0xafcf7000..0xafcf8680]
        at org.apache.commons.collections.ReferenceMap.getEntry(Unknown Source)
            at org.apache.commons.collections.ReferenceMap.get(Unknown Source)
            at org.hibernate.util.SoftLimitMRUCache.get(SoftLimitMRUCache.java:51)
            at org.hibernate.engine.query.QueryPlanCache.getNativeSQLQueryPlan()
            at org.hibernate.impl.AbstractSessionImpl.getNativeSQLQueryPlan()
            at org.hibernate.impl.AbstractSessionImpl.list()
            at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:164)
            at com.mogoko.struts.logic.user.LeaveMesManager.getCommentByShopId()
            at com.mogoko.struts.action.shop.ShopIndexBaseInfoAction.execute()
            ... ...

在長達 5 分鐘的時間里,一直在執行 org.apache.commons.collections.ReferenceMap.getEntry() 方法,說明這個方法一直沒有結束。具體屬于正常還是 Bug,需要結合源碼進行判斷。

導致死循環的代碼屬于代碼的 Bug,這種類型的問題,重現比較難,但一旦重現,就比較容易解決。一般通過分析代碼可以發現問題。導致死循環的原因大致有如下幾個:

  • HashMap 等線程不安全的容器,用在多線程讀/寫的場合,導致 HashMap 的方法調用形成死循環(可以參考2015-08-19 Hadoop升級過程出現異常問題casestudy)
  • 多線程場合,對共享變量沒有進行保護,導致數據混亂,從而使循環退出條件永遠不滿足,導致死循環的發生,比如:
    • for, while 循環中的退出條件永遠不滿足導致的死循環
    • 鏈表等數據結構首尾相連,導致遍歷永遠無法停止
  • 其他編碼錯誤

4.3 高消耗 CPU 代碼的常用分析方法

借助操作系統提供的性能分析工具進行 CPU 消耗分析。上面介紹了針對死循環導致 CPU 遍告的分析方法,但是有些場景沒有死循環也會導致 CPU 變高,我們需要借助系統提供的一些性能分析工具進行分析。在 Linux 中使用 top -p $pid 進行,步驟如下

  1. top -p $pid
  2. 輸入 “H” 查看該進程所有線程的統計情況(CPU 等)如下所示(對于某些操作系統版本不支持 top 命令進行線程統計的
下面 PID 表示線程 ID,即 LWPID
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  25059 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25060 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.04 java
  25061 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25062 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25063 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25064 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25065 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25066 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25067 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25068 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25069 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:01.19 java
  25070 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25071 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25072 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25073 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25074 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25075 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java
  25076 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:24.31 java
  25077 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:01.00 java
  25078 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:01.05 java
  25141 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java

具體導致問題的代碼可能是:

  1. 純 Java 代碼導致的 CPU 過高
  2. Java 代碼中調用的 JNI 代碼導致的 CPU 過高
  3. 虛擬機自身的代碼導致的 CPU 過高,比如 GC 的 bug 等

可以如下進行分析:

  1. top -p $pid 獲取最消耗 CPU 的本地線程 ID
  2. jstack 打印出 Java 堆棧
  3. pstack 打印本地線程堆棧
  4. 在 Java 線程堆棧中查找 nid=<第一步中最消耗 CPU 時間的線程 id>
    a. 如果找到的線程是純 Java 代碼,則該 Java 代碼導致 CPU 過高
    b. 如果找到的是 Native code,說明導致 CPU 過高的問題在 JNI 調用中
    c. 如果找不到對于的線程,則有如下兩種可能:
    i. JNI 調用中重新創建的線程來執行,在 Java 線程堆棧中不存在該線程的信息
    ii. 虛擬機自身的代碼導致的 CPU 過高,如堆內存不夠導致的頻繁 fullgc,或虛擬機的 bug 等

4.4 資源不足等導致的性能下降分析

大多數時候資源不足和性能瓶頸是統一類問題。

  • 大量的線程停在同樣的調用上下文上
  • 資源數量配置太少,系統當前壓力比較大,資源不足導致某些線程不能及時獲得資源而等待
  • 獲得資源的線程把持資源時間太久(比如獲取數據庫連接),導致資源不足
  • 設計不合理導致資源占用時間過久,如 SQL 語句設計不恰當,或者沒有索引導致的數據庫訪問太慢等
  • 資源用完后,某些異常情況下,沒有關閉或回收,導致可用資源泄漏或減少

4.5 線程不退出導致的系統掛死分析

線程掛死,表現為每次打印線程堆棧,該線程都在同一個調用上下文,該類問題可以通過多次打印堆棧進行分析,導致線程掛死的原因有很多,如:

  • 線程正在執行死循環代碼
  • 資源不足或資源泄漏,造成當前線程阻塞在鎖對象上,長期得不到喚醒
  • 如果當前程序和外部通信,當外部程序掛起無返回時,也會導致當前線程掛起

4.6 其他問題分析

  • 多個鎖導致的鎖鏈分析 – 多個線程等待同一個鎖
  • 進行性能瓶頸分析

4.7 線程堆棧不能分析什么問題

線程堆棧定位問題,只能定位在當前線程上留下痕跡的問題,如線程死鎖,線程掛死等,另外,定位由于鎖的設計不恰當導致的性能問題,線程堆棧也是最有效的工具,因為性能問題時時刻刻反映在當前的線程統計狀況上。但線程堆棧對于不留痕跡的問題,就無能為力了。例如:

  • 線程為什么跑飛
  • 并發的 Bug 導致的數據混亂
  • 數據庫梭鏢的問題
  • 在實際的系統中,系統的問題分為幾種類型:

堆棧中能夠表現出問題的,使用線程堆棧進行定位

  • 無法在線程中留下痕跡的問題定位,需要依賴一個好的日志設計
  • 非常隱蔽的問題,只能依賴于豐富的代碼經驗,如多線程導致的數據混亂以及幽靈代碼等

 

來自:http://www.klion26.com/2018/01/06/線程堆棧分析/

 

 本文由用戶 hopechart 自行上傳分享,僅供網友學習交流。所有權歸原作者,若您的權利被侵害,請聯系管理員。
 轉載本站原創文章,請注明出處,并保留原始鏈接、圖片水印。
 本站是一個以用戶分享為主的開源技術平臺,歡迎各類分享!