当前位置:首页 >> IT/计算机 >>

MySQL线上常见故障剖析_图文

MySQL线上常见故障剖析
俊达

各种故障
? 应用获取不到连接池 ? 数据库响应慢 ? SQL慢 ? 服务器load高 ? SWAP ? 表不见了

? MySQL crash
? 主机Hung ? …

观察你的系统
? MySQL
– – – – – – 活动进程(Process list) 日志文件(slow log, alert log, general query log, binlog) Status variables(com_select, com_insert,.etc ) InnoDB(物理读、逻辑读、innodb status) 参数配置 Stack trace(plus source code) 执行计划,explain

? ?

SQL


OS

– –

内存, SWAP, /proc/meminfo
CPU, load, ps IO(磁盘、网络)
? Iostat

?

Profile
– – Oprofile gprof

Case 1: XXX系统报连接池满
iostat

orzdba

slowlog
? What’s in slow log?

Mk-query-digest
? mk-query-digest 全面分析slow log

explain
? 查看执行计划
– 选择了不好的索引

哪些SQL在执行
? Slow log
– Set global long_query_time=0

? General log ? Binlog
– For DML, mysqlbinlog binlog解析

? Processlist
– If some query is really slow

? Tcpdump
– Tcpdump + mk-query-digest

Case 2: 很多MySQL线程都卡住了
? Processlist
Id: 1842782 User: provide Host: 192.168.0.1:59068 db: provide Command: Query Time: 2326 State: Waiting for table Info: update table_xxxx set sold=sold+1, money=money+39800, Gmt_create=now() where xxxx_id=1 and day='2011-10-07 00:00:00 Id: 1657130 User: provide Host: 192.168.0.2:40093 db: provide Command: Query Time: 184551 State: Sending data Info: select xxxx_id, sum(sold) as sold from table_xxxx where xxxx_id in (select xxxx_id from table_xxxx where Gmt_create >= '2011-10-05 08:59:00') group by xxxx_id 1044 system user Connect 27406 Flushing tables FLUSH TABLES

? Pstack
#0 0x0000003b4380ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000005aac4a in wait_for_refresh () #2 0x00000000005b2857 in open_table () #3 0x00000000005b312f in open_tables () #4 0x00000000005b3440 in open_and_lock_tables () #5 0x00000000005817a4 in mysql_execute_command () #6 0x0000000000586516 in mysql_parse ()

#7 0x0000000000586a65 in dispatch_command ()
#8 0x0000000000588923 in handle_one_connection () #9 0x0000003b438064a7 in start_thread () from /lib64/libpthread.so.0

? Processlist分析
– 谁是因,谁是果?

? System user execute flush tables
– System user是谁, mysql主从复制(io thread, sql thread) – Binlog

? 谁最先执行了flush tables
– 人工执行? – App? 没有权限 – 定时任务,备份
? Xtrabackup 会执行flush tables with read lock, 不记录到binlog ? Mysqldump理论上不会执行flush tables,但如果有bug呢 ( http://bugs.mysql.com/bug.php?id=35157)

Case 3: 服务器load高
? 调查问题
– SQL层面未见明显异常 – 业务没有变动,没有发布 – 调用量没有明显变化

? Iostat
– r/s, w/s – await, svctm – avgrq-sz

? Blktrace, btt

? IO调度算法
– cfq -> deadline

Case 4: DDL lost table
? alert.log大量报错
– 持续10几分钟后,Table lost。

? 几百个进程都block在”opening tables”,这些表都不是DDL的那个表

丢表时的alert.log
110803 2:15:02 InnoDB: Warning: problems renaming 'feel_23/#sql-2635_23da0d' to 'feel_23/feed_send_1476', 24998 iterations InnoDB: Warning: tablespace './feel_23/#sql-2635_23da0d.ibd' has i/o ops stopped for a long time 24998 110803 2:15:02 InnoDB: Warning: problems renaming 'feel_xx/#sql-2635_23da0d' to 'feel_xx/feed_send_xxxx', 24999 iterations InnoDB: Warning: tablespace './feel_23/#sql-2635_23da0d.ibd' has i/o ops stopped for a long time 24999 110803 2:15:02 InnoDB: Warning: problems renaming 'feel_xx/#sql-2635_23da0d' to 'feel_xx/feed_send_xxxx', 25000 iterations InnoDB: Warning: tablespace './feel_23/#sql-2635_23da0d.ibd' has i/o ops stopped for a long time 25000 110803 110803 2:15:02 InnoDB: Warning: problems renaming 'feel_xx/#sql-2635_23da0d' to 'feel_xx/feed_send_xxxx', 25001 iterations 2:15:02 [ERROR] Cannot find or open table feel_23/feed_send_1476 from

the internal data dictionary of InnoDB though the .frm file for the table exists. Maybe you have deleted and recreated InnoDB data files but have forgotten to delete the corresponding .frm files of InnoDB tables, or you have moved .frm files to another database? or, the table contains indexes that this version of the engine doesn't support. See http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html how you can resolve the problem.

Pstack - master thread
#0 #1 #2 #3 #4 #5 #6 #7 #8 0x000000364aacced2 in select () from /lib64/libc.so.6 0x00002aaab2e595fb in os_thread_sleep () 0x00002aaab2e18838 in fil_mutex_enter_and_prepare_for_io () 0x00002aaab2e18aa5 in fil_io () 0x00002aaab2df5b63 in buf_flush_buffered_writes () 0x00002aaab2df6048 in buf_flush_batch () 0x00002aaab2ea13d8 in srv_master_thread () 0x000000364b6064a7 in start_thread () from /lib64/libpthread.so.0 0x000000364aad3c2d in clone () from /lib64/libc.so.6

Pstack – alter table
#0 #1 #2 #3 #4 #5 #6 #7 #8 #9 0x000000364aacced2 in select () from /lib64/libc.so.6 0x00002aaab2e595fb in os_thread_sleep () 0x00002aaab2e1a3e2 in fil_rename_tablespace () 0x00002aaab2e0672b in dict_table_rename_in_cache () 0x00002aaab2e86af5 in row_rename_table_for_mysql () 0x00002aaab2e316db in ha_innodb::rename_table () 0x00000000006bea6c in mysql_rename_table () 0x00000000006c77ff in mysql_alter_table () 0x00000000005c6a8e in mysql_execute_command () 0x00000000005cd371 in mysql_parse ()

#10 0x00000000005cd773 in dispatch_command () #11 0x00000000005cea04 in do_command () #12 0x00000000005bf0d7 in handle_one_connection ()

retry:

(fil_rename_tablespace)

... Some lines ommitted here(print warning,...) space->stop_ios = TRUE; if (node->n_pending > 0 || node->n_pending_flushes > 0) {

/* There are pending i/o's or flushes, sleep for a while and
retry */ mutex_exit(&fil_system->mutex); os_thread_sleep(20000); goto retry; } else if (node->modification_counter > node->flush_counter) { /* Flush the space */ mutex_exit(&fil_system->mutex); os_thread_sleep(20000); fil_flush(id); goto retry; } else if (node->open) { /* Close the file */ fil_node_close_file(node, fil_system); } space->stop_ios = FALSE; mutex_exit(&fil_system->mutex);

retry:(fil_mutex_enter_and_prepare_for_io) mutex_enter(&fil_system->mutex); if (space_id == 0 || space_id >= SRV_LOG_SPACE_FIRST_ID) { return; } if (fil_system->n_open < fil_system->max_n_open) { return; } space = fil_space_get_by_id(space_id); if (space != NULL && space->stop_ios) { /* We are going to do a rename file and want to stop new i/o's for a while */ if (count2 > 20000) { fputs("InnoDB: Warning: tablespace ", stderr); ut_print_filename(stderr, space->name); fprintf(stderr,

" has i/o ops stopped for a long time %lu\n",
(ulong) count2); } mutex_exit(&fil_system->mutex); os_thread_sleep(20000); count2++; goto retry;

mysql_alter_table() { error= copy_data_between_tables(table, new_table, ...

VOID(pthread_mutex_lock(&LOCK_open));

if (mysql_rename_table(old_db_type, db, table_name, db, old_name, FN_TO_IS_TMP)){ error=1; VOID(quick_rm_table(new_db_type, new_db, tmp_name, FN_IS_TMP)); } else if (mysql_rename_table(new_db_type, new_db, tmp_name, new_db, new_alias, FN_FROM_IS_TMP) || ((new_name != table_name || new_db != db) && (need_copy_table != ALTER_TABLE_METADATA_ONLY || mysql_rename_table(save_old_db_type, db, table_name, new_db, new_alias, NO_FRM_RENAME)) && Table_triggers_list::change_table_name(thd, db, table_name, new_db, new_alias))) { error=1; VOID(quick_rm_table(new_db_type,new_db,new_alias, 0)); VOID(quick_rm_table(new_db_type, new_db, tmp_name, FN_IS_TMP));

VOID(mysql_rename_table(old_db_type, db, old_name, db, alias,
FN_FROM_IS_TMP)); } VOID(pthread_mutex_unlock(&LOCK_open));

Case 5: MyISAM
? SQL
select count(*) as num from vid_film left join vid_class_map

on vid_film.id=vid_class_map.v_id
and vid_class_map.s_type=1 and vid_class_map.class_id=1; +-------+ | num | +-------+ | 13536 |

Orzdba
? Qps不到50,cpu耗尽(user 25%, sys 75%)
-----load-avg---- ---cpu-usage--- ---swap--- -QPS- -TPS- -Hit%time | 1m 5m 15m |usr sys idl iow| si so| ins upd del sel iud lor hit | 15:39:20|11.09 5.59 5.07| 26 74 0 0| 0 0| 0 0 0 4 0 0 100.00| 15:39:22|11.09 5.59 5.07| 25 75 0 0| 0 0| 0 0 0 47 0 0 100.00| 15:39:24|14.20 6.33 5.31| 25 74 0 0| 0 0| 0 0 0 8 0 0 100.00| 15:39:25|14.20 6.33 5.31| 26 74 0 0| 0 0| 0 0 0 44 0 0 100.00| 15:39:27|14.20 6.33 5.31| 26 74 0 0| 0 0| 0 0 0 8 0 0 100.00| 15:39:28|14.20 6.33 5.31| 26 74 0 0| 0 0| 0 0 0 36 0 0 100.00|

vmstat
? CS每秒25万次
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu-----r b swpd free buff cache si so bi bo in cs us sy id wa st 49 0 256 2139532 1623780 19942492 0 0 0 68 1027 256677 25 75 0 0 0 48 0 256 2139532 1623784 19942492 0 0 0 16 1007 251610 25 75 0 0 0 47 0 256 2139548 1623784 19942500 0 0 0 0 1003 251741 25 75 0 0 0 33 0 256 2139548 1623784 19942504 0 0 0 0 1003 251605 25 75 0 0 0 49 0 256 2139424 1623792 19942500 0 0 0 36 1007 251313 25 74 0 0 0

50 0 256 2139424 1623792 19942508 0 0 0 52 1006 251436 25 74 0 0 0

strace mysqld
? 大量futex调用
% time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- ---------

84.93 53.100058 145 365451 127938 futex
14.65 9.159573 277563 33 select 0.16 0.099231 149 668 read 0.13 0.084391 126 668 668 sched_setparam 0.13 0.078847 159 496 write

------ ----------- ----------- --------- --------100.00 62.522100 367316 128606 total

Oprofile global
CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % linenr info app name symbol name 2200675 40.3334 processor_idle.c:222 vmlinux acpi_processor_idle 1131206 20.7324 futex.c:1973 vmlinux do_futex 893066 16.3679 futex.c:727 vmlinux futex_wake 489614 8.9735 (no location information) libpthread-2.5.so pthread_mutex_lock 244503 4.4812 thread_info.h:63 vmlinux acpi_safe_halt 66355 1.2161 sched.c:2056 vmlinux thread_return 66173 1.2128 futex.c:603 vmlinux wake_futex 50622 0.9278 (no location information) libpthread-2.5.so pthread_mutex_unlock 39782 0.7291 (no location information) libpthread-2.5.so __lll_lock_wait 27363 0.5015 mf_keycache.c:1666 mysqld find_key_block 25902 0.4747 my_handler.c:124 mysqld ha_key_cmp 25624 0.4696 mf_keycache.c:2542 mysqld key_cache_read 22179 0.4065 bmove512.c:40 mysqld bmove512 21092 0.3866 mf_keycache.c:1349 mysqld unreg_request 14228 0.2608 mi_search.c:184 mysqld _mi_bin_search 11051 0.2025 futex.c:2020 vmlinux sys_futex

Oprofile mysqld
CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % linenr info symbol name 27363 14.8909 mf_keycache.c:1666 find_key_block 25902 14.0958 my_handler.c:124 ha_key_cmp 25624 13.9445 mf_keycache.c:2542 key_cache_read 22179 12.0697 bmove512.c:40 bmove512 21092 11.4782 mf_keycache.c:1349 unreg_request 14228 7.7428 mi_search.c:184 _mi_bin_search 4927 2.6813 mi_search.c:64 _mi_search 4069 2.2143 mi_key.c:220 _mi_pack_key 4046 2.2018 sql_select.cc:11896 join_read_key(st_join_table*) 3116 1.6957 sql_select.cc:11345 sub_select(JOIN*, st_join_table*, bool) 2995 1.6299 sql_select.cc:11413 evaluate_join_record(JOIN*, st_join_table*, int) 2748 1.4955 mi_rkey.c:26 mi_rkey 2674 1.4552 mi_key.c:468 _mi_read_key_record 2170 1.1809 mi_page.c:25 _mi_fetch_keypage 1648 0.8968 item_sum.cc:1193 Item_sum_count::add() 1588 0.8642 mi_rnext.c:28 mi_rnext 1564 0.8511 sql_select.cc:12360 end_send_group(JOIN*, st_join_table*, bool)

pstack
典型的调用栈 #0 0x000000321220d174 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003212208b00 in _L_lock_1233 () from /lib64/libpthread.so.0 #2 0x0000003212208a83 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00000000008889db in key_cache_read () #4 0x000000000083fe28 in _mi_fetch_keypage () #5 0x000000000083efa1 in _mi_search () #6 0x000000000083f14d in _mi_search () #7 0x000000000083f692 in _mi_search_next () #8 0x000000000083bbf0 in mi_rnext () #9 0x00000000008332b5 in ha_myisam::index_next () #10 0x00000000006393eb in join_read_next () #11 0x00000000006243ba in sub_select () #12 0x0000000000636e2d in do_select () #13 0x00000000006442f7 in JOIN::exec () #14 0x0000000000645ff8 in mysql_select () #15 0x00000000006469a7 in handle_select ()

Summary
? 了解系统各个部件的工作原理
– MySQL和InnoDB – SQL和索引 – OS 进程、线程、内存 – IO、文件操作 – 硬件的能力(响应时间、吞度量)

?

数据采集和观测
– 熟悉工具 – 收集各个部件的数据,数据的采集和保存 – 汇总,对比,分析 – 重视证据

?

问题的解决
– 应用层优化(设计、SQL优化、索引调整) – 参数调整(MySQL, OS)

– patch

谢谢大家


相关文章:
MySQL线上常见故障剖析_图文.pdf
MySQL线上常见故障剖析 - MySQL线上常见故障剖析 俊达 各种故障 ?
吴炳锡_MySQL线上SQL捕获及分析_图文.pdf
吴炳锡_MySQL线上SQL捕获及分析 - MySQL线上SQL捕获及分析 吴炳
MySQL基础技能与原理——高级应用_图文.ppt
MySQL管理基础 9. 网站常见错误分析 10. MySQL故障案例 MySQL数据库同步方案 ?...SQL未经审核上线,导致没有建立索引,执行计划为全表扫描。 MySQL故障案例 ? BR...
Mysql数据库操作基础剖析_图文.ppt
Mysql数据库操作基础剖析_中职中专_职业教育_教育专区。Mysql 一、Mysql概述特点: ? MySQL是一款从Linux系统上移植过来的优秀的开源软件,适应 于所有的平台 (在网上...
PHP管理MySQL数据库中的数据剖析_图文.ppt
PHP管理MySQL数据库中的数据剖析_中职中专_职业教育_教育专区。PHP ...使用PHP操作MySQL数据库 PHP管理MySQL数据库中的数据 常见问题与解决方法 综合...
MySQL数据库基础与实例教程第7章剖析_图文.ppt
MySQL数据库基础与实例教程第7章剖析_中职中专_职业教育_教育专区。MySQL MySQL数据库基础与实例教程 之视图触发器 内容一览本章首先讲解了 视图以及触发器的 管理...
MySQL源码分析_图文.ppt
MySQL源码分析 代码结构与基本流程彭立勋 Alibaba DBA Team Topics ? ? ? ? ? MySQL基本架构 源码目录结构 核心类库与函数 主要模块 数据流 MySQL基本架构 ...
第17章Mysql数据库分析解析_图文.ppt
第17章Mysql数据库分析解析_幼儿读物_幼儿教育_教育专区。第17章Mysql数据库分析解析 1 Linux Advanced Enterprise Server Management LINUX高级企业服务器管理 2 《 ...
MySQL案例分析与使用优化技巧_图文.ppt
(在线点播系统)、歌华有线(视 频系统)、联想(网盘系统)、淘米网(在线游戏)、...MySQL数据水平拆分自动扩展 商业监控器高效及时分析问题、处 理问题 定期专项的...
MySQL深入分析篇线上捕获慢sql【博森瑞】.ppt
MySQL深入分析线上捕获慢sql【博森瑞】_互联网_IT/计算机_专业资料。...常见问题的体现 ? 数据库压力太大,如何下手 ? 读写分离,从哪些sql下手 ? ...
MySQL数据库的备份与恢复剖析_图文.ppt
MySQL数据库的备份与恢复剖析_中职中专_职业教育_教育专区。MySQL ...相关知识:备份需要考虑的问题 ? 可以容忍丢失多长时间的数据。 ? 恢复数据要在...
MySQL权限系统分析_图文.doc
MySQL权限系统分析_计算机软件及应用_IT/计算机_专业资料。MySQL 有一套先进的但...修改授权表增加、删除或修改用 户权限,它也允许你在检查这些表时诊断权限问题。...
MySQL数据库疑难杂症排查_图文.ppt
提示代码吗,是404还是 其他 问题追踪 问题追踪 常 见瓶 颈 确认是MySQL存在...Creating tmp table 常见于group by没有索引的情况 需要拷?数据到临时...
管中窥豹MySQL(InnoDB)死锁分析之道_图文.ppt
? 但更重要的是,死锁 “血泪史” 公司内,帮助多个业 务线解决死锁问题 公司外部,很多朋友 也在找我讨论MySQL 死锁 死锁分析,是MySQL DBA的一个不...
Mysql内存管理分析_图文.doc
Mysql内存管理分析_IT/计算机_专业资料。介绍mysql的内存管理,包括内存架构、刷盘策略等 Mysql 内存管理一、 Mysql 架构 Mysql 数据库的体系架构如上图所示 数据库...
基于MySql的日志分析系统设计_图文.ppt
主要问题 优化方式:硬件升级后期: 操作系统及文件系统调优 raid0 或 lvm条带化 修改相关mysql参数 sql语句的慢查询分析及索引调优 数据节点的优化单D点性能分析...
MySQL2介绍及性能优化分析_图文.ppt
MySQL2介绍及性能优化分析_自考_成人教育_教育专区。MySQL2 MySQL介绍及性能优化...? 由于进程调度策略问题,这个“每秒执行一次 flush(刷到磁盘)操作”并 不是...
MySQL基础技能与原理——高级应用_图文.ppt
MySQL管理基础 9. 网站常见错误分析 10. MySQL故障案例 MySQL数据库同步方案 ?...SQL未经审核上线,导致没有建立索引,执行计划为全表扫描。 MySQL故障案例 ? BR...
数据库集群中间件MySQL+Proxy研究与分析_图文.pdf
数据库集群中间件MySQL+Proxy研究与分析_计算机软件及应用_IT/计算机_专业资料。...MySQLProxy就 是这样的一种数据库集群中间件,可以很好的解决这些问题。程序员只...
MySQL源码分析_图文.ppt
MySQL源码分析_计算机软件及应用_IT/计算机_专业资料。MySQL源码分析 代码结构与基本流程 彭立勋 Alibaba DBA Team Topics MySQL基本架构 MySQL目录结构(1) ? ...