DBA Morning Check List

By Bill Richards, 2010/08/27 (first published: 2008/04/14)

Database Administrators can sometimes have one of the most stressful jobs in the company. If you have been a DBA for long, you know the scenario. You have just sat in your chair with your cup of coffee, and your phone starts ringing off the hook. The voice on the other end states that they can't pull up their data or they are getting timeouts, or the system is running slow. Okay, time to dig in; it's going to be one of those days! Is it Friday yet?

In this article, I will present ways to minimize those stressful days by having a pre-defined DBA morning checklist. A morning DBA checklist is a document of pre-defined administrative checks that are performed every morning to ensure that your server is at optimal performance. By having a standard list of items to check, you are more likely to catch and fix issues before there is a real problem.

The finished morning DBA checklist should have three sections. Section one contains the list of items that need to be checked, drawn from the following categories: performance, job failures, disk space, backups, connectivity, and anything specific to your environment, such as replication, mirroring, or clustering. Section two provides a place to write down issues and how they were resolved. Section three is a confirmation section where the checklist is signed and dated. This third section is very important: without it, it is difficult to enforce and guarantee that the checks were performed.

The first step to create an effective morning checklist is to meet with all the DBAs and ask them these questions: 
1. What do you check in the morning?
2. How do you check it?
3. What do you do when there is a problem?
4. Is there anyone you notify in the event of a failure?

In my experience, every DBA has his or her own mental checklist and his or her own way of fixing issues. It is important to get these items written down in a document. By combining the ideas of every DBA, you will come up with a more thorough checklist and a standardized way to fix issues, and problems will be less likely to fall through the cracks.

After the DBA morning checklist is created, completed checklists should be archived in a notebook to ensure that each check was performed every day. This also serves as a history of fixes for past issues, and an audit trail for the DBA.
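
If the archive is kept electronically instead of, or alongside, a paper notebook, the three sections of a completed checklist can be stored as one record per day. The sketch below is only an illustration: the `CompletedChecklist` class, its field names, and the JSON layout are assumptions, not part of the original checklist.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CompletedChecklist:
    """One day's checklist (hypothetical layout mirroring the three sections)."""
    checks: dict                                  # section 1: check name -> True if it passed
    issues: list = field(default_factory=list)    # section 2: issues and how they were resolved
    completed_by: str = ""                        # section 3: who signed off
    completed_on: str = ""                        # section 3: ISO date, e.g. "2010-08-27"

    def is_signed(self):
        """A checklist counts as confirmed only if it is both signed and dated."""
        return bool(self.completed_by and self.completed_on)


def to_archive_json(checklist):
    """Serialize a signed checklist for the archive; refuse unsigned ones."""
    if not checklist.is_signed():
        raise ValueError("checklist must be signed and dated before archiving")
    return json.dumps(asdict(checklist), indent=2, sort_keys=True)
```

Refusing to archive an unsigned record is one way to enforce the confirmation section in software; a paper process would rely on the signature line instead.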

Since every database environment is different, and every IS shop has its own tools, every DBA's checklist will be different. The end goal is to create a checklist that is customized to your environment, in which issues can be found and fixed quickly, so that you can avoid having one of those difficult days.

With this in mind, listed below is a sample checklist. Your checklist should be unique to your environment and should help find and fix issues as quickly as possible.

Section 1: DBA Morning Checklist

Backups

- Verify that the Network Backups are good by checking the backup emails. If a backup did not complete, contact _____ in the networking group, and send an email to the DBA group.

- Check the SQL Server backups. If a backup failed, research the cause of the failure and ensure that it is scheduled to run tonight.

- Check the database backup run duration on all production servers. Verify that the average time is within the normal range. Email the networking group about any significant increase in backup duration and request an explanation. The reason for this is that networking starts copying database backups to tape at set times, and if the copy starts before the DBAs' backups have finished, the tape copy will be bad.

- Verify that all databases were backed up. If any new databases were not backed up, create a backup maintenance plan for them and check the current schedule to determine a backup time.
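
The duration and coverage checks above can be partly automated. The following is a minimal sketch, assuming the durations and database names have already been collected into plain Python structures (on SQL Server they could come from `msdb.dbo.backupset` and `sys.databases`). The function names and the 1.5x threshold are illustrative assumptions; "significant increase" should be defined for your own environment.

```python
from statistics import mean


def backup_duration_alerts(history, latest, threshold=1.5):
    """Return servers whose latest backup ran longer than threshold x their average.

    history: dict of server name -> list of past backup durations in minutes
    latest:  dict of server name -> last night's backup duration in minutes
    """
    alerts = {}
    for server, minutes in latest.items():
        past = history.get(server)
        if not past:
            continue  # no baseline yet, so there is nothing to compare against
        baseline = mean(past)
        if minutes > baseline * threshold:
            alerts[server] = (minutes, baseline)
    return alerts


def databases_without_backup(all_databases, backed_up):
    """Return databases that exist on the server but have no backup on record."""
    return sorted(set(all_databases) - set(backed_up))
```

Any server returned by `backup_duration_alerts` is a candidate for the email to the networking group, and any name returned by `databases_without_backup` needs a backup plan.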

Disk Space
- Verify the free space on each drive of the servers. If there is significant variance in free space from the day before, research the cause of the fluctuation and resolve it if necessary. Log files often grow because of monthly jobs.
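
A day-over-day comparison like this is easy to script. The sketch below assumes yesterday's readings were saved somewhere and that a 10% swing counts as "significant"; both the function names and that threshold are assumptions. `free_gb` reads the drive that contains the given path on the local machine, so it would need to run on each server being checked.

```python
import shutil


def free_gb(path):
    """Free space, in gigabytes, on the drive that contains path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def free_space_changes(yesterday, today, max_change_pct=10.0):
    """Return drives whose free space moved more than max_change_pct since yesterday.

    yesterday, today: dict of drive label -> free space in GB
    The result maps each flagged drive to its percentage change (negative = less free).
    """
    flagged = {}
    for drive, now in today.items():
        before = yesterday.get(drive)
        if not before:
            continue  # new drive or zero baseline: no meaningful percentage
        change_pct = (now - before) / before * 100
        if abs(change_pct) > max_change_pct:
            flagged[drive] = round(change_pct, 1)
    return flagged
```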

Job Failures
- Check for failed jobs by connecting to each SQL Server, selecting "job activity", and filtering on failed jobs. If a job failed, resolve the issue, contacting the owner of the job if necessary.
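
With many servers, clicking through job activity on each one takes time. SQL Server Agent records job history in `msdb.dbo.sysjobhistory`, where `run_status = 0` means the run failed and the row with `step_id = 0` holds the overall job outcome. The sketch below assumes those rows have already been fetched from every server into a list of dictionaries; the `server` key and the function name are hypothetical additions made by whatever collects the rows.

```python
FAILED = 0  # sysjobhistory.run_status value for a failed run


def failed_jobs(history_rows, since_run_date):
    """Return sorted (server, job_name) pairs whose job outcome failed.

    history_rows: dicts with keys server, job_name, step_id, run_status, run_date
    since_run_date: integer in YYYYMMDD form, the format msdb uses for run_date
    """
    failures = set()
    for row in history_rows:
        is_job_outcome = row["step_id"] == 0  # skip per-step rows
        if (is_job_outcome
                and row["run_status"] == FAILED
                and row["run_date"] >= since_run_date):
            failures.add((row["server"], row["job_name"]))
    return sorted(failures)
```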

System Checks

- Check the SQL Server logs on each server. In the event of a critical error, notify the DBA group and come to an agreement on how to resolve the problem.

- Check the Application log on each server. In the event of a critical or unusual error, notify the DBA group and the networking group to determine what needs to be done to fix the error.

Performance

- Check performance statistics for all servers using the monitoring tool, and research and resolve any issues.

- Check Performance Monitor on all production servers and verify that all counters are within the normal range.

Connectivity
- Log into the Customer application and verify that it can connect to the database and pull up data. Verify that it is performing at an acceptable speed. In the event of a failure, email the Customer Support Group, DBA group, and the DBA manager, before proceeding to resolve the issue.

- Log into the Billing application and verify that it can connect to the database and pull up data. Verify that it is performing at an acceptable speed. In the event of a failure, email the Billing Support Group, DBA group, and the DBA manager, before proceeding to resolve the issue.

Replication

- Check replication on each server by checking each publication to make sure the distributor is running for each subscription.

- When replication is stopped, or changes to replication are made, send an email to the DBA group. For example, if the DBA stops the distributor, let the other DBAs know when it is stopped and then when it is restarted again.

- Check for any emails from the SQL jobs that monitor row counts on major tables on the publisher and subscriber. If a wide variance occurs, send an email message to the DBAs and any appropriate IS personnel.
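
The row-count comparison those monitoring jobs perform can be sketched as follows. This is only an illustration under assumptions: the counts are taken as already collected per table, and a difference above 1% of the publisher's count is treated as a "wide variance". The function name and threshold are not from the original jobs.

```python
def row_count_variances(publisher, subscriber, max_diff_pct=1.0):
    """Return tables whose subscriber row count differs too much from the publisher.

    publisher, subscriber: dict of table name -> row count
    The result maps each flagged table to (publisher_rows, subscriber_rows).
    """
    flagged = {}
    for table, pub_rows in publisher.items():
        sub_rows = subscriber.get(table, 0)  # a table missing on the subscriber counts as 0 rows
        base = max(pub_rows, 1)              # avoid dividing by zero for empty tables
        diff_pct = abs(pub_rows - sub_rows) / base * 100
        if diff_pct > max_diff_pct:
            flagged[table] = (pub_rows, sub_rows)
    return flagged
```

A small lag between publisher and subscriber is normal while replication catches up, which is why the check uses a tolerance rather than requiring identical counts.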

Section 2: Write down any issues and how they were resolved

This space is reserved for writing down issues and how they were fixed.

Section 3: Confirmation

Completed By __________________________ Date: ___________________

Conclusion

Creating a morning DBA checklist has helped me many times in the past. I have often found CPU usage near 100%, broken replication, connectivity problems, and space issues that I was able to resolve before most of the workforce arrived and the issue could escalate. A standard DBA checklist document ensures that nothing is forgotten that could later turn into a problem. It also minimizes downtime for a company or department, provides an archive of past issues and how they were fixed, and helps ensure that the DBA will have a less stressful day!
