Linux Kernel Crash Analysis

Work in progress...

The Crash Scene

[@bjsjs_42_92 ~]# crash /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux /opt/var/crash/127.0.0.1-2019-08-06-10\:16\:35/vmcore

crash 7.1.2-3.el7_2.1
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
DUMPFILE: /opt/var/crash/127.0.0.1-2019-08-06-10:16:35/vmcore [PARTIAL DUMP]
CPUS: 40
DATE: Tue Aug 6 10:16:27 2019
UPTIME: 390 days, 14:40:34
LOAD AVERAGE: 81.55, 29.79, 16.87
TASKS: 2448
NODENAME: bjsjs_42_92
RELEASE: 3.10.0-327.el7.x86_64
VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
MACHINE: x86_64 (2199 Mhz)
MEMORY: 127.9 GB
PANIC: "kernel BUG at fs/buffer.c:1280!"
PID: 57502
COMMAND: "java"
TASK: ffff88201a2eb980 [THREAD_INFO: ffff880f7566c000]
CPU: 26
STATE: (PANIC)

crash> bt
PID: 57502 TASK: ffff88201a2eb980 CPU: 26 COMMAND: "java"
#0 [ffff880f7566f5a8] machine_kexec at ffffffff81051beb
#1 [ffff880f7566f608] crash_kexec at ffffffff810f2542
#2 [ffff880f7566f6d8] oops_end at ffffffff8163e1a8
#3 [ffff880f7566f700] die at ffffffff8101859b
#4 [ffff880f7566f730] do_trap at ffffffff8163d860
#5 [ffff880f7566f780] do_invalid_op at ffffffff81015204
#6 [ffff880f7566f830] invalid_op at ffffffff8164701e
[exception RIP: check_irqs_on+4]
RIP: ffffffff816326d1 RSP: ffff880f7566f8e8 RFLAGS: 00010046
RAX: 0000000000000096 RBX: ffff8810092d9110 RCX: ffff880ee7d05800
RDX: 0000000000001000 RSI: 000000000090014a RDI: ffff880f8cf4b740
RBP: ffff880f7566f8e8 R8: 0000000000000004 R9: 0000000000000004
R10: ffff880ee7d05800 R11: ffff880f7566f9e8 R12: ffff880f8cf4b740
R13: 0000000000001000 R14: ffff8801df370800 R15: 0000000000000010
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880f7566f8f0] __find_get_block at ffffffff81212c85
#8 [ffff880f7566f910] __getblk at ffffffff81212cb5
#9 [ffff880f7566f970] __ext4_get_inode_loc at ffffffffa0373030 [ext4]
#10 [ffff880f7566f9d8] ext4_get_inode_loc at ffffffffa0374dbd [ext4]
#11 [ffff880f7566f9e8] ext4_reserve_inode_write at ffffffffa0376906 [ext4]
#12 [ffff880f7566fa18] ext4_mark_inode_dirty at ffffffffa03769d3 [ext4]
#13 [ffff880f7566fa70] ext4_writepages at ffffffffa0377567 [ext4]
#14 [ffff880f7566fba8] do_writepages at ffffffff811758fe
#15 [ffff880f7566fbb8] __filemap_fdatawrite_range at ffffffff8116a685
#16 [ffff880f7566fc08] filemap_flush at ffffffff8116a74c
#17 [ffff880f7566fc18] ext4_alloc_da_blocks at ffffffffa0374a8c [ext4]
#18 [ffff880f7566fc38] ext4_release_file at ffffffffa036df89 [ext4]
#19 [ffff880f7566fc60] __fput at ffffffff811e0329
#20 [ffff880f7566fca8] ____fput at ffffffff811e05ee
#21 [ffff880f7566fcb8] task_work_run at ffffffff810a22f4
#22 [ffff880f7566fce8] do_exit at ffffffff810815eb
#23 [ffff880f7566fd78] do_group_exit at ffffffff81081dff
#24 [ffff880f7566fda8] get_signal_to_deliver at ffffffff81092c10
#25 [ffff880f7566fe40] do_signal at ffffffff81014417
#26 [ffff880f7566ff30] do_notify_resume at ffffffff81014adf
#27 [ffff880f7566ff50] retint_signal at ffffffff8163d1fc
RIP: 00007f3db495ee17 RSP: 00007f3cbc8a20f0 RFLAGS: 00010246
RAX: 0000000000000006 RBX: 00007f3cbc8a2430 RCX: 0000000000000001
RDX: 00007f3cbc99f700 RSI: 0000000000000008 RDI: 0000000000000000
RBP: 00007f3cbc8a2220 R8: 00007f3cbc8a25f0 R9: 00007f3cbc8a24c0
R10: 00000000fffffe00 R11: 0000000000000000 R12: 000000000000000b
R13: 00007f3cbc8a24c0 R14: 00007f3cbc8a2430 R15: 00007f3db48ee2c5
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
crash>
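One detail in the backtrace above can be checked by hand. The exception RIP sits in check_irqs_on, which in the upstream 3.10 source is a BUG_ON(irqs_disabled()) assertion (this assumes the RHEL kernel matches upstream here). On x86 the interrupt-enable flag (IF) is bit 9 of RFLAGS, mask 0x200, so the saved RFLAGS values tell us whether interrupts were on. The sketch below decodes that bit; if_state is a helper name made up for this post, not a crash command.

```shell
#!/bin/sh
# Decode the interrupt-enable flag (IF, bit 9, mask 0x200) from an RFLAGS
# value as printed by crash. if_state is a hypothetical helper name.
if_state() {
    local rflags="$1"    # hex digits without the 0x prefix, e.g. 00010046
    if [ $(( 0x$rflags & 0x200 )) -ne 0 ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

if_state 00010046   # RFLAGS at the exception RIP    -> disabled
if_state 00010246   # RFLAGS of the user-mode frame  -> enabled
```

Interrupts being disabled at the exception is consistent with that assertion firing, which suggests the next question is who disabled them on this path.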

Field Descriptions

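KERNEL: the kernel image the system was running when it crashed

DUMPFILE: the kernel dump file (vmcore)

CPUS: number of CPUs on the machine

DATE: time of the crash

TASKS: number of tasks in memory at the time of the crash

NODENAME: hostname of the crashed system

RELEASE and VERSION: kernel release and build version

MACHINE: CPU architecture and clock speed

MEMORY: physical memory of the crashed host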

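PANIC: the type of crash. Common types include:

SysRq (System Request): a crash triggered through the magic SysRq key combination, normally used for testing. Running echo c > /proc/sysrq-trigger triggers one on demand.

oops: can be thought of as a kernel-level segmentation fault. An application that performs an illegal memory access receives SIGSEGV (an illegal instruction raises SIGILL instead); the default action is to terminate with a core dump, although the application may catch the signal and handle it itself. When the kernel itself makes this kind of mistake, it prints an oops message.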
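Before triggering a test crash through SysRq, it helps to confirm that a crash kernel is actually loaded, otherwise the machine panics without leaving a vmcore. A minimal sketch, assuming the kernel exposes /sys/kernel/kexec_crash_loaded (it reads 1 when a kdump kernel is loaded); kdump_ready and its optional root argument are names invented for this example.

```shell
#!/bin/sh
# Report whether a kdump crash kernel is loaded.
# kdump_ready is a hypothetical helper; $1 is an optional sysfs root that
# exists only so the check can be exercised against a fake directory tree.
kdump_ready() {
    local flag="${1:-/sys}/kernel/kexec_crash_loaded"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

if kdump_ready; then
    echo "crash kernel loaded: a SysRq test crash should leave a vmcore"
else
    echo "no crash kernel loaded: a SysRq test crash would not leave a vmcore"
fi
```

The script only reports the state and never writes to /proc/sysrq-trigger, so it is safe to run on a production host.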

Setting Up the Environment

cat >> /etc/yum.repos.d/CentOS-Debug.repo  <<EOF
[CentOS-Debug]
name=CentOS-\$releasever - DebugInfo
baseurl=http://debuginfo.centos.org/\$releasever/\$basearch/
gpgcheck=0
enabled=1
protect=1
priority=1
EOF

yum --enablerepo=CentOS-Debug install -y kernel-debuginfo-$(uname -r)

debuginfo-install glibc

# Then open the dump with: crash <vmlinux> <vmcore>
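After the install, a quick check that the debug vmlinux is where crash expects it saves a confusing error later. A small sketch, assuming the path layout from the crash invocation at the top of this post; debug_vmlinux is a helper name made up here.

```shell
#!/bin/sh
# Build the debuginfo vmlinux path for a kernel release and check it exists.
# debug_vmlinux is a hypothetical helper; the path layout is the one used
# in the crash command line earlier in this post.
debug_vmlinux() {
    local release="${1:-$(uname -r)}"   # defaults to the running kernel
    echo "/usr/lib/debug/lib/modules/${release}/vmlinux"
}

vmlinux="$(debug_vmlinux)"
if [ -f "$vmlinux" ]; then
    echo "found $vmlinux"
else
    echo "missing $vmlinux: is kernel-debuginfo installed for this release?"
fi
```

When the dump is analyzed on a different machine, pass the RELEASE value from the crash header, for example debug_vmlinux 3.10.0-327.el7.x86_64, because the debuginfo has to match the kernel that crashed rather than the one currently running.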

Related Resources