当前位置:首页 >> IT/计算机 >>

MySQL线上常见故障剖析_图文

MySQL线上常见故障剖析
俊达

各种故障
? 应用获取不到连接池 ? 数据库响应慢 ? SQL慢 ? 服务器load高 ? SWAP ? 表不见了

? MySQL crash
? 主机Hung ? …

观察你的系统
? MySQL
– – – – – – 活动进程(Process list) 日志文件(slow log, alert log, general query log, binlog) Status variables(com_select, com_insert,.etc ) InnoDB(物理读、逻辑读、innodb status) 参数配置 Stack trace(plus source code) 执行计划,explain

? ?

SQL


OS

– –

内存, SWAP, /proc/meminfo
CPU, load, ps IO(磁盘、网络)
? Iostat

?

Profile
– – Oprofile gprof

Case 1: XXX系统报连接池满
iostat

orzdba

slowlog
? What’s in slow log?

Mk-query-digest
? mk-query-digest 全面分析slow log

explain
? 查看执行计划
– 选择了不好的索引

哪些SQL在执行
? Slow log
– Set global long_query_time=0

? General log ? Binlog
– For DML, mysqlbinlog binlog解析

? Processlist
– If some query is really slow

? Tcpdump
– Tcpdump + mk-query-digest

Case 2: 很多MySQL线程都卡住了
? Processlist
Id: 1842782 User: provide Host: 192.168.0.1:59068 db: provide Command: Query Time: 2326 State: Waiting for table Info: update table_xxxx set sold=sold+1, money=money+39800, Gmt_create=now() where xxxx_id=1 and day='2011-10-07 00:00:00 Id: 1657130 User: provide Host: 192.168.0.2:40093 db: provide Command: Query Time: 184551 State: Sending data Info: select xxxx_id, sum(sold) as sold from table_xxxx where xxxx_id in (select xxxx_id from table_xxxx where Gmt_create >= '2011-10-05 08:59:00') group by xxxx_id 1044 system user Connect 27406 Flushing tables FLUSH TABLES

? Pstack
#0 0x0000003b4380ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00000000005aac4a in wait_for_refresh () #2 0x00000000005b2857 in open_table () #3 0x00000000005b312f in open_tables () #4 0x00000000005b3440 in open_and_lock_tables () #5 0x00000000005817a4 in mysql_execute_command () #6 0x0000000000586516 in mysql_parse ()

#7 0x0000000000586a65 in dispatch_command ()
#8 0x0000000000588923 in handle_one_connection () #9 0x0000003b438064a7 in start_thread () from /lib64/libpthread.so.0

? Processlist分析
– 谁是因,谁是果?

? System user execute flush tables
– System user是谁, mysql主从复制(io thread, sql thread) – Binlog

? 谁最先执行了flush tables
– 人工执行? – App? 没有权限 – 定时任务,备份
? Xtrabackup 会执行flush tables with read lock, 不记录到binlog ? Mysqldump理论上不会执行flush tables,但如果有bug呢 ( http://bugs.mysql.com/bug.php?id=35157)

Case 3: 服务器load高
? 调查问题
– SQL层面未见明显异常 – 业务没有变动,没有发布 – 调用量没有明显变化

? Iostat
– r/s, w/s – await, svctm – avgrq-sz

? Blktrace, btt

? IO调度算法
– cfq -> deadline

Case 4: DDL lost table
? alert.log大量报错
– 持续10几分钟后,Table lost。

? 几百个进程都block在”opening tables”,这些表都不是DDL的那个表

丢表时的alert.log
110803 2:15:02 InnoDB: Warning: problems renaming 'feel_23/#sql-2635_23da0d' to 'feel_23/feed_send_1476', 24998 iterations InnoDB: Warning: tablespace './feel_23/#sql-2635_23da0d.ibd' has i/o ops stopped for a long time 24998 110803 2:15:02 InnoDB: Warning: problems renaming 'feel_xx/#sql-2635_23da0d' to 'feel_xx/feed_send_xxxx', 24999 iterations InnoDB: Warning: tablespace './feel_23/#sql-2635_23da0d.ibd' has i/o ops stopped for a long time 24999 110803 2:15:02 InnoDB: Warning: problems renaming 'feel_xx/#sql-2635_23da0d' to 'feel_xx/feed_send_xxxx', 25000 iterations InnoDB: Warning: tablespace './feel_23/#sql-2635_23da0d.ibd' has i/o ops stopped for a long time 25000 110803 110803 2:15:02 InnoDB: Warning: problems renaming 'feel_xx/#sql-2635_23da0d' to 'feel_xx/feed_send_xxxx', 25001 iterations 2:15:02 [ERROR] Cannot find or open table feel_23/feed_send_1476 from

the internal data dictionary of InnoDB though the .frm file for the table exists. Maybe you have deleted and recreated InnoDB data files but have forgotten to delete the corresponding .frm files of InnoDB tables, or you have moved .frm files to another database? or, the table contains indexes that this version of the engine doesn't support. See http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html how you can resolve the problem.

Pstack - master thread
#0 #1 #2 #3 #4 #5 #6 #7 #8 0x000000364aacced2 in select () from /lib64/libc.so.6 0x00002aaab2e595fb in os_thread_sleep () 0x00002aaab2e18838 in fil_mutex_enter_and_prepare_for_io () 0x00002aaab2e18aa5 in fil_io () 0x00002aaab2df5b63 in buf_flush_buffered_writes () 0x00002aaab2df6048 in buf_flush_batch () 0x00002aaab2ea13d8 in srv_master_thread () 0x000000364b6064a7 in start_thread () from /lib64/libpthread.so.0 0x000000364aad3c2d in clone () from /lib64/libc.so.6

Pstack – alter table
#0 #1 #2 #3 #4 #5 #6 #7 #8 #9 0x000000364aacced2 in select () from /lib64/libc.so.6 0x00002aaab2e595fb in os_thread_sleep () 0x00002aaab2e1a3e2 in fil_rename_tablespace () 0x00002aaab2e0672b in dict_table_rename_in_cache () 0x00002aaab2e86af5 in row_rename_table_for_mysql () 0x00002aaab2e316db in ha_innodb::rename_table () 0x00000000006bea6c in mysql_rename_table () 0x00000000006c77ff in mysql_alter_table () 0x00000000005c6a8e in mysql_execute_command () 0x00000000005cd371 in mysql_parse ()

#10 0x00000000005cd773 in dispatch_command () #11 0x00000000005cea04 in do_command () #12 0x00000000005bf0d7 in handle_one_connection ()

retry:

(fil_rename_tablespace)

... Some lines ommitted here(print warning,...) space->stop_ios = TRUE; if (node->n_pending > 0 || node->n_pending_flushes > 0) {

/* There are pending i/o's or flushes, sleep for a while and
retry */ mutex_exit(&fil_system->mutex); os_thread_sleep(20000); goto retry; } else if (node->modification_counter > node->flush_counter) { /* Flush the space */ mutex_exit(&fil_system->mutex); os_thread_sleep(20000); fil_flush(id); goto retry; } else if (node->open) { /* Close the file */ fil_node_close_file(node, fil_system); } space->stop_ios = FALSE; mutex_exit(&fil_system->mutex);

retry:(fil_mutex_enter_and_prepare_for_io) mutex_enter(&fil_system->mutex); if (space_id == 0 || space_id >= SRV_LOG_SPACE_FIRST_ID) { return; } if (fil_system->n_open < fil_system->max_n_open) { return; } space = fil_space_get_by_id(space_id); if (space != NULL && space->stop_ios) { /* We are going to do a rename file and want to stop new i/o's for a while */ if (count2 > 20000) { fputs("InnoDB: Warning: tablespace ", stderr); ut_print_filename(stderr, space->name); fprintf(stderr,

" has i/o ops stopped for a long time %lu\n",
(ulong) count2); } mutex_exit(&fil_system->mutex); os_thread_sleep(20000); count2++; goto retry;

mysql_alter_table() { error= copy_data_between_tables(table, new_table, ...

VOID(pthread_mutex_lock(&LOCK_open));

if (mysql_rename_table(old_db_type, db, table_name, db, old_name, FN_TO_IS_TMP)){ error=1; VOID(quick_rm_table(new_db_type, new_db, tmp_name, FN_IS_TMP)); } else if (mysql_rename_table(new_db_type, new_db, tmp_name, new_db, new_alias, FN_FROM_IS_TMP) || ((new_name != table_name || new_db != db) && (need_copy_table != ALTER_TABLE_METADATA_ONLY || mysql_rename_table(save_old_db_type, db, table_name, new_db, new_alias, NO_FRM_RENAME)) && Table_triggers_list::change_table_name(thd, db, table_name, new_db, new_alias))) { error=1; VOID(quick_rm_table(new_db_type,new_db,new_alias, 0)); VOID(quick_rm_table(new_db_type, new_db, tmp_name, FN_IS_TMP));

VOID(mysql_rename_table(old_db_type, db, old_name, db, alias,
FN_FROM_IS_TMP)); } VOID(pthread_mutex_unlock(&LOCK_open));

Case 5: MyISAM
? SQL
select count(*) as num from vid_film left join vid_class_map

on vid_film.id=vid_class_map.v_id
and vid_class_map.s_type=1 and vid_class_map.class_id=1; +-------+ | num | +-------+ | 13536 |

Orzdba
? Qps不到50,cpu耗尽(user 25%, sys 75%)
-----load-avg---- ---cpu-usage--- ---swap--- -QPS- -TPS- -Hit%time | 1m 5m 15m |usr sys idl iow| si so| ins upd del sel iud lor hit | 15:39:20|11.09 5.59 5.07| 26 74 0 0| 0 0| 0 0 0 4 0 0 100.00| 15:39:22|11.09 5.59 5.07| 25 75 0 0| 0 0| 0 0 0 47 0 0 100.00| 15:39:24|14.20 6.33 5.31| 25 74 0 0| 0 0| 0 0 0 8 0 0 100.00| 15:39:25|14.20 6.33 5.31| 26 74 0 0| 0 0| 0 0 0 44 0 0 100.00| 15:39:27|14.20 6.33 5.31| 26 74 0 0| 0 0| 0 0 0 8 0 0 100.00| 15:39:28|14.20 6.33 5.31| 26 74 0 0| 0 0| 0 0 0 36 0 0 100.00|

vmstat
? CS每秒25万次
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu-----r b swpd free buff cache si so bi bo in cs us sy id wa st 49 0 256 2139532 1623780 19942492 0 0 0 68 1027 256677 25 75 0 0 0 48 0 256 2139532 1623784 19942492 0 0 0 16 1007 251610 25 75 0 0 0 47 0 256 2139548 1623784 19942500 0 0 0 0 1003 251741 25 75 0 0 0 33 0 256 2139548 1623784 19942504 0 0 0 0 1003 251605 25 75 0 0 0 49 0 256 2139424 1623792 19942500 0 0 0 36 1007 251313 25 74 0 0 0

50 0 256 2139424 1623792 19942508 0 0 0 52 1006 251436 25 74 0 0 0

strace mysqld
? 大量futex调用
% time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- ---------

84.93 53.100058 145 365451 127938 futex
14.65 9.159573 277563 33 select 0.16 0.099231 149 668 read 0.13 0.084391 126 668 668 sched_setparam 0.13 0.078847 159 496 write

------ ----------- ----------- --------- --------100.00 62.522100 367316 128606 total

Oprofile global
CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % linenr info app name symbol name 2200675 40.3334 processor_idle.c:222 vmlinux acpi_processor_idle 1131206 20.7324 futex.c:1973 vmlinux do_futex 893066 16.3679 futex.c:727 vmlinux futex_wake 489614 8.9735 (no location information) libpthread-2.5.so pthread_mutex_lock 244503 4.4812 thread_info.h:63 vmlinux acpi_safe_halt 66355 1.2161 sched.c:2056 vmlinux thread_return 66173 1.2128 futex.c:603 vmlinux wake_futex 50622 0.9278 (no location information) libpthread-2.5.so pthread_mutex_unlock 39782 0.7291 (no location information) libpthread-2.5.so __lll_lock_wait 27363 0.5015 mf_keycache.c:1666 mysqld find_key_block 25902 0.4747 my_handler.c:124 mysqld ha_key_cmp 25624 0.4696 mf_keycache.c:2542 mysqld key_cache_read 22179 0.4065 bmove512.c:40 mysqld bmove512 21092 0.3866 mf_keycache.c:1349 mysqld unreg_request 14228 0.2608 mi_search.c:184 mysqld _mi_bin_search 11051 0.2025 futex.c:2020 vmlinux sys_futex

Oprofile mysqld
CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % linenr info symbol name 27363 14.8909 mf_keycache.c:1666 find_key_block 25902 14.0958 my_handler.c:124 ha_key_cmp 25624 13.9445 mf_keycache.c:2542 key_cache_read 22179 12.0697 bmove512.c:40 bmove512 21092 11.4782 mf_keycache.c:1349 unreg_request 14228 7.7428 mi_search.c:184 _mi_bin_search 4927 2.6813 mi_search.c:64 _mi_search 4069 2.2143 mi_key.c:220 _mi_pack_key 4046 2.2018 sql_select.cc:11896 join_read_key(st_join_table*) 3116 1.6957 sql_select.cc:11345 sub_select(JOIN*, st_join_table*, bool) 2995 1.6299 sql_select.cc:11413 evaluate_join_record(JOIN*, st_join_table*, int) 2748 1.4955 mi_rkey.c:26 mi_rkey 2674 1.4552 mi_key.c:468 _mi_read_key_record 2170 1.1809 mi_page.c:25 _mi_fetch_keypage 1648 0.8968 item_sum.cc:1193 Item_sum_count::add() 1588 0.8642 mi_rnext.c:28 mi_rnext 1564 0.8511 sql_select.cc:12360 end_send_group(JOIN*, st_join_table*, bool)

pstack
典型的调用栈 #0 0x000000321220d174 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x0000003212208b00 in _L_lock_1233 () from /lib64/libpthread.so.0 #2 0x0000003212208a83 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00000000008889db in key_cache_read () #4 0x000000000083fe28 in _mi_fetch_keypage () #5 0x000000000083efa1 in _mi_search () #6 0x000000000083f14d in _mi_search () #7 0x000000000083f692 in _mi_search_next () #8 0x000000000083bbf0 in mi_rnext () #9 0x00000000008332b5 in ha_myisam::index_next () #10 0x00000000006393eb in join_read_next () #11 0x00000000006243ba in sub_select () #12 0x0000000000636e2d in do_select () #13 0x00000000006442f7 in JOIN::exec () #14 0x0000000000645ff8 in mysql_select () #15 0x00000000006469a7 in handle_select ()

Summary
? 了解系统各个部件的工作原理
– MySQL和InnoDB – SQL和索引 – OS 进程、线程、内存 – IO、文件操作 – 硬件的能力(响应时间、吞度量)

?

数据采集和观测
– 熟悉工具 – 收集各个部件的数据,数据的采集和保存 – 汇总,对比,分析 – 重视证据

?

问题的解决
– 应用层优化(设计、SQL优化、索引调整) – 参数调整(MySQL, OS)

– patch

谢谢大家


相关文章:
MySQL线上常见故障剖析_图文.pdf
MySQL线上常见故障剖析 - MySQL线上常见故障剖析 俊达 各种故障 ?
吴炳锡_MySQL线上SQL捕获及分析_图文.pdf
吴炳锡_MySQL线上SQL捕获及分析 - MySQL线上SQL捕获及分析 吴炳
管中窥豹MySQL(InnoDB)死锁分析之道_图文.ppt
? 但更重要的是,死锁 “血泪史” 公司内,帮助多个业 务线解决死锁问题 公司外部,很多朋友 也在找我讨论MySQL 死锁 死锁分析,是MySQL DBA的一个不...
Mysql数据库操作基础剖析_图文.ppt
Mysql数据库操作基础剖析_中职中专_职业教育_教育专区。Mysql 一、Mysql概述特点: ? MySQL是一款从Linux系统上移植过来的优秀的开源软件,适应 于所有的平台 (在网上...
第17章Mysql数据库分析解析_图文.ppt
第17章Mysql数据库分析解析_幼儿读物_幼儿教育_教育专区。第17章Mysql数据库分析解析 1 Linux Advanced Enterprise Server Management LINUX高级企业服务器管理 2 《 ...
MySQL数据库基础与实例教程第7章剖析_图文.ppt
MySQL数据库基础与实例教程第7章剖析_中职中专_职业教育_教育专区。MySQL MySQL数据库基础与实例教程 之视图触发器 内容一览本章首先讲解了 视图以及触发器的 管理...
室内分布常见故障分析和处理剖析_图文.ppt
室内分布常见故障分析和处理剖析_自考_成人教育_教育专区。室内分布常见故障分析和处理剖析 室分系统常见故障及处理 室分系统主要设备连接结构图 基站控 制器 馈线 ...
MySQL基础技能与原理——高级应用_图文.ppt
MySQL管理基础 9. 网站常见错误分析 10. MySQL故障案例 MySQL数据库同步方案 ?...SQL未经审核上线,导致没有建立索引,执行计划为全表扫描。 MySQL故障案例 ? BR...
自动化设备常见故障分析与经验剖析_图文.ppt
自动化设备常见故障分析与经验剖析 - 自动化设备电气及维修知识培训 ? 随着磁头
PHP管理MySQL数据库中的数据剖析_图文.ppt
PHP管理MySQL数据库中的数据剖析_中职中专_职业教育_教育专区。PHP ...使用PHP操作MySQL数据库 PHP管理MySQL数据库中的数据 常见问题与解决方法 综合...
MySQL源码分析_图文.ppt
MySQL源码分析 代码结构与基本流程彭立勋 Alibaba DBA Team Topics ? ? ? ? ? MySQL基本架构 源码目录结构 核心类库与函数 主要模块 数据流 MySQL基本架构 ...
MySQL数据库的备份与恢复剖析_图文.ppt
MySQL数据库的备份与恢复剖析_中职中专_职业教育_教育专区。MySQL ...相关知识:备份需要考虑的问题 ? 可以容忍丢失多长时间的数据。 ? 恢复数据要在...
POLYCOM高清产品常见故障排查剖析_图文.ppt
POLYCOM高清产品常见故障排查剖析_中职中专_职业教育_教育专区。POLYCOM Polycom 产品常见故障排查金昊 2013年7月 呼叫中断原因:网络中断;到达最长通话时间; 措施:检查...
MySQL深入分析篇线上捕获慢sql【博森瑞】.ppt
MySQL深入分析线上捕获慢sql【博森瑞】_互联网_IT/计算机_专业资料。...常见问题的体现 ? 数据库压力太大,如何下手 ? 读写分离,从哪些sql下手 ? ...
MySQL主从同步--原理、问题、解决方案和应用@淘宝丁奇_....ppt
MySQL主从同步--原理、问题、解决方案和应用 @淘宝丁奇 2009-8-22 讲师介绍...MySQL主从同步解决方案解决方案分析: 1、一定要多线程!为什么? 2、多线程又会...
MYSQL查询缓存性能问题分析与优化案例(绝对精华)_图文.pdf
MYSQL查询缓存性能问题分析与优化案例(绝对精华)_IT/计算机_专业资料。逐步定位mysql性能问题到查询缓存大小设置并进行了优化。过程最重要,定位出原因,解决问题是顺理...
mysql_wp_enterprise_queryanalyzer.zhMySQL企业版监控....pdf
MySQL 企业版监控器与查询分析器监控应用程序的查询问题 MySQL 白皮书 2010 年 ...常见的查询监控和调优方法即使是上面提到了这些限制,MySQL 提供了几种帮助 DBA ...
MySQL案例分析与使用优化技巧_图文.ppt
(在线点播系统)、歌华有线(视 频系统)、联想(网盘系统)、淘米网(在线游戏)、...MySQL数据水平拆分自动扩展 商业监控器高效及时分析问题、处 理问题 定期专项的...
基于MySql的日志分析系统设计_图文.ppt
主要问题 优化方式:硬件升级后期: 操作系统及文件系统调优 raid0 或 lvm条带化 修改相关mysql参数 sql语句的慢查询分析及索引调优 数据节点的优化单D点性能分析...
基于日志分析的MySQL数据库取证算法研究_图文.pdf
University 万方数据 上海交通大学硕士学位论文 基于日志分析MySQL 数据库取证算法研究 摘要 MySQL 数据库的使用越来越广泛, MySQL 数据库取证的问题也变得 越来越...
更多相关标签: