Monday, 29 June 2015

SQL DBA's Roles & Responsibilities

  In Brief DBA's Roles & Responsibilities


Key Responsibilities of DBA's

We can categorize SQL Server DBA Responsibilities into 7 types:

  • Capacity Management
  • Security Management
  • Backup and Recovery Management
  • High Availability Management
  • Performance Tuning
  • Process Improvements
  • Daily, Weekly and Monthly maintenance
  • Installations / Upgrades / Patching
  • Health Checks / Report Analysis

DBA's Daily Responsibilities

First always check following things, after coming to shift.. :-)
Request Handling:

  • Check the escalated issues first
  • Check the current queue for requests and identify requests to be processed and work on the issue.
  • We usually process the requests based on the SLA.

On daily basis DBA's should keep an eye on below areas:

1. Backups: We can automate this backup process using auto backup Scripts (will share this in later posts).
  • Confirm that backups have been made and successfully saved to a secure location. 
  • Review the average duration of backup, any significant changes occurred investigates on this. Most of the time it happens due to networking low bandwidth 
  • Validate the backup files using restore verify only. We can create jobs to take care of the task and to send a notification if it fails to verify any backup.
  • Check the backup failure alerts, correct the errors and rerun the backups
2. Disk Space

  • Verify the free space on each drive on all servers. If there is significant variance in free space from the day before, research the cause of the free space fluctuation and resolve if necessary. Often times, log files will grow because of monthly jobs.
  • Automate through a job. The job runs for every one hour and reports any drive which is having less than 15 % of free space. We can design a SSRS report to showcase and review the delta values (I'll show you in later post how to create such reports).
3. Jobs / Maintenance plans

  • Check for the failed jobs, investigate the route cause and resolve the issue. Native SQL notification alert sends a mail to DBA team and also we can design a customized SSRS report which reports all failed/success job details with delta value.
  • Check for the job execution duration and if any significant changes find the root cause and resolve the issue.
4. Servers/Databases

  • Confirm all servers/databases are up and running fine.
  • Usually in an Enterprise Database Environment Third Party Tools are used to monitor Servers (Ex: “What’s Up”)
  • For database monitoring we can design a native SQL Server solution using T-SQL code and a maintenance plan, it run min by min and send an email to DBA team if it is not able to connect to any of the database in the instance.
5. Performance

  • Regularly monitor and identify blocking issues. We can design a procedure that continuously run on all PROD servers and notifies DBA team if any blockings, long running quires/transactions.
  • Check Performance counters on all production servers and verify that all counters are within the normal range. We can design a SSRS metrics report to review the counters for every one hour.
  • Throughout the day, periodically monitor performance using both System Monitor and DMV.
  • Check the fragmentation and rebuild/ reorganize the indexes. We can use a native procedure which takes care of the task.
  • Make sure all Stats Update / nightly_checkalloc / Index_Rebuild jobs are completed without any issue.
6. Logs

  • Have a look at both SQL Server logs and Windows logs. If you find any strange issues notify the network or storage teams. Most of the times we find Network related or I/O related issues.
  • Check the centralized error logs if any.
7. Security

  • Check the error logs for failed logins and notify the audit team if necessary
  • Security Logs – Review the security logs from a third party solution or from the SQL Server Error Logs to determine if you had a breach or a violation in one of your policies.
8. High-Availability

  • High Availability or Disaster Recovery Logs – Check your high availability and/or disaster recovery process logs.  Depending on the solution (Log Shipping, Clustering, Replication, Database Mirroring) that you are using dictates what needs to be checked.
  • We can design a native scripts using T-SQL to monitor Replication, DB-Mirroring, Log-shipping
  • Monitor log-shipping and mirroring using the customized stored procs.
  • In most of the environments we see third party tools in monitoring Clusters or we can design our own native scripts using Windows Batch Programming , Powershell and T-SQL.

Weekly / Monthly Check-list


  • Backup Verification (Comprehensive)- Verify your backups and test on a regular basis to ensure the overall process works as expected. Contact your off site tape vendor and validate the type does not have any restore errors
  • Fragmentation – Review the fragmentation for your databases to determine if you particular indexes must be rebuilt based on analysis from a backup SQL Server.
  • Maintenance – Schedule an official maintenance, do all required health checks on all premium databases and servers.
  • Security – Remove unneeded logins and users for individuals that have left the organization, had a change in position, etc.
  • Check the logins, service accounts for expire dates
  • Windows, SQL Server or Application Updates – Check for service packs/patches that need to be installed on your SQL Server from either a hardware, OS, DBMS or application perspective
  • Capacity Planning – Perform capacity planning to ensure you will have sufficient storage for a specific period of time such as for 3, 6, 12 or 18 months.

Monitoring Infrastructure


  • We need to work with the other teams to make sure that all servers are at health condition and to balance the infrastructure.
  • Usually we approach other teams for below (CR/IR/SR – Change Request/ Incident / Service)
  • Found I/O errors
  • Networking issues
  • Adding space / drives – SAN (Storage Area Network)
  • Starting a Server – When a manual interaction needed
  • Backup Tapes – Archived backups
  • Escalating an issue

Documentation


  • Document all changes you make to the environment that includes:
  • Installations / Upgrades
  • Service packs / HotFixes applied
  • New databases added
  • Logins \ Roles added / removed
  • Check lists
  • Infrastructure changes
  • Process improvements
  • Maintain a centralized inventory with all details for below items
  • Database Servers
  • Application Servers
  • Web Servers
  • Logins (Both SQL Server and Windows)
  • Database Instances
  • Databases (Dev / Stag / QA / PROD)

Note: Document as much as information. It really helps in critical situations. Try to add owner at each level, for example application owners, server owners etc., We can easily reach them in-case of any emergencies / Maintenance.
 

No comments:

Post a Comment