Thursday 22 December 2016

SQL Server : PowerShell Commands for Basic Health Check

Here are a few PowerShell commands for a basic health check of an MS SQL Server instance (replace "YourInstanceName" in the examples with your own instance name):

1) SQL Database Status
Invoke-Sqlcmd -Query "SELECT SERVERPROPERTY('ServerName') as SQLInstance, name as DBName,
state_desc as State from sys.databases;" -ServerInstance "YourInstanceName" | Format-Table -AutoSize;



2) SQL Database File Size
Invoke-Sqlcmd -Query "DECLARE @command varchar(1000); SELECT @command = 
'USE ? SELECT SERVERPROPERTY(''ServerName'') as SQLInstance, name as DB_FileName,
convert(varchar(100), FileName) as FileName, cast(Size as decimal(15,2))*8/1024 as ''SizeMB'',
FILEPROPERTY (name, ''spaceused'')*8/1024 as ''UsedMB'', Convert(decimal(10,2),convert(decimal(15,2),
(FILEPROPERTY (name, ''spaceused'')))/Size*100) as PercentFull,filegroup_name(groupid) as ''FileGroup'',
FILEPROPERTY (name, ''IsLogFile'') as ''IsLogFile''  FROM dbo.sysfiles order by IsLogFile'
EXEC sp_MSforeachdb @command" -ServerInstance "YourInstanceName" | Format-Table -AutoSize;


3) SQL Instance Blocking & Session Details
Invoke-Sqlcmd -Query "SELECT SERVERPROPERTY('ServerName') as SQLInstance,s.session_id,
r.blocking_session_id,db_name(r.database_id) as [Database],r.wait_time, s.login_name,
s.cpu_time/60000 as [CPU_Time(mins)],s.memory_usage, r.[status], r.percent_complete,r.nest_level, [Text]
from sys.dm_exec_sessions as s 
inner join 
sys.dm_exec_requests as r 
on s.session_id = r.session_id cross apply sys.dm_exec_sql_text (sql_handle) 
where s.status = 'running' order by s.cpu_time desc;" -ServerInstance "YourInstanceName" | Format-Table -AutoSize;


Here are a few PowerShell commands for a basic health check at the Windows level:

1) Drive Free Space
Get-WmiObject -Class win32_Volume | Select-object Name, Label, 
@{Name="Size(GB)";Expression={[decimal]("{0:N0}" -f($_.capacity/1gb))}}, 
@{Name="Free Space(GB)";Expression={[decimal]("{0:N0}" -f($_.freespace/1gb))}}, 
@{Name="Free (%)";Expression={"{0,6:P0}" -f(($_.freespace/1gb) / ($_.capacity/1gb))}} 
| Format-Table -AutoSize;


2) Top 5 CPU Utilization
Get-Process | Select-Object -Property @{ Name = "TimeStamp"; Expression = {Get-Date}}, Name, CPU, 
@{Name = "CPUPercent"; Expression = {$TotalSec = (New-TimeSpan -Start $_.StartTime).TotalSeconds; 
[Math]::Round( ($_.CPU * 100 / $TotalSec), 2)}}, Description `
| Sort-Object -Property CPUPercent -Descending | Select-Object -First 5 | Format-Table -AutoSize;


3) Top 5 Memory Utilization
Get-WmiObject -Class WIN32_PROCESS | Sort-Object -Property ws -Descending | Select-Object -First 5 
| Select-Object -Property @{ Name = "TimeStamp"; Expression = {Get-Date}}, processname, 
@{Name="Mem Usage(MB)";Expression={[math]::round($_.ws / 1mb)}},
@{Name="ProcessID";Expression={[String]$_.ProcessID}},
@{Name="UserID";Expression={$_.getowner().user}} | Format-Table -AutoSize;



Friday 25 November 2016

MSBI : Data Warehouse Design Concept

Data warehouse design is most commonly done by one of two approaches, associated with the two "gods of BI/DW": Ralph Kimball and William Inmon.
These two laid down important data warehouse concepts that are followed as standards for BI/DW.

- Ralph Kimball provided the "Bottom-Up" approach
- Bill Inmon provided the "Top-Down" approach

In simple words, a data warehouse is a central repository for all of an organization's data. The goal of a data mart is to meet the particular demands of a specific group of users within the organization, such as human resource management (HRM). Generally, an organization's data marts are subsets of its data warehouse.

Data warehouse (DW) Vs Data Mart (DM)

Data Warehouse:
- Holds multiple subject areas
- Holds very detailed information
- Works to integrate all data sources
- Does not necessarily use a dimensional model but feeds dimensional models.
Data Mart:
- Often holds only one subject area, e.g. Finance, HR, IT or Sales
- May hold more summarised data (although many hold full detail)
- Concentrates on integrating information from a given subject area or set of source systems
- Is built around a dimensional model, typically a star schema.



Bill Inmon

says “A Data warehouse is a Subject-oriented, Integrated, Time-variant and Non-volatile collection of Data in support of Management’s decision making process”.

It means: first build a normalized data model, then split it into data marts based on subject areas. Dimensional data marts are created only after the complete data warehouse has been created.
Building the data warehouse takes more time with this approach than with Ralph Kimball's.



Ralph Kimball

says "The data warehouse is nothing more than the union of all the data marts".

It means: first create the data marts, then relate them through a schema. Dimensional data marts (dimension and fact tables) provide a narrow view into the organizational data; they are then related by a star or snowflake schema, depending on requirements.
Building the data warehouse takes less time with this approach than with Bill Inmon's. A sketch of the dimensional model follows.
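
As a quick illustration of Kimball's dimensional model, here is a minimal star schema sketch; the table and column names are hypothetical, chosen only to show the shape of a fact table surrounded by dimensions:

CREATE TABLE DimCustomer
(
    CustomerKey  INT PRIMARY KEY,   -- surrogate key
    CustomerName VARCHAR(100),
    City         VARCHAR(50)
);

CREATE TABLE DimDate
(
    DateKey      INT PRIMARY KEY,   -- e.g. 20161125
    CalendarDate DATE,
    CalendarYear INT
);

CREATE TABLE FactSales
(
    DateKey     INT REFERENCES DimDate(DateKey),
    CustomerKey INT REFERENCES DimCustomer(CustomerKey),
    Quantity    INT,
    SalesAmount DECIMAL(15,2)       -- the measures being analysed
);

Each data mart is built around fact and dimension tables like these; the marts then share conformed dimensions (DimDate, for example) so that together they form the warehouse.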


MSBI : Data Warehouse

What is a Data Warehouse?


A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources.

There are two people regarded as the "gods of BI/DW": Ralph Kimball and William Inmon.
They laid down important data warehouse concepts that are followed as standards for BI/DW.
- Ralph Kimball provided the "Bottom-Up" approach
- Bill Inmon provided the "Top-Down" approach

In the world of BI/DW, William Inmon provided the most popular definition: "A Data warehouse is a Subject-oriented, Integrated, Time-variant and Non-volatile collection of Data in support of Management's decision making process".

What makes a Data warehouse?


1. Subject-Oriented : A data warehouse can be used to analyse a specific subject area. In an organization, subject areas can be different departments such as HR, IT, Sales and many more, so the warehouse can be used to analyse a particular department like HR or IT.
2. Integrated : By integrated we mean that data from heterogeneous data sources is integrated in the data warehouse. For example, an Excel sheet source stores dates in one format and a MySQL DB source stores them in another, but in the data warehouse dates are stored in a single format only.
3. Non-volatile : Non-volatile means that, once entered into the data warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyse what has occurred.
4. Time-Variant : A data warehouse deals with historical data. It acts as a pit into which transactional data is dumped every day, and we can retrieve data that is 6 months, 12 months or even older from a data warehouse. For example, you hold an SBI bank account and update your contact details: the transaction system can only give you the current contact details, whereas the data warehouse holds all the contact details associated with your account (current and previous).
This is very much in contrast to On-Line Transaction Processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A concrete sketch of time-variant storage follows.
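
A minimal sketch of how a warehouse table might keep every version of the contact details that the OLTP system overwrites; the table and column names are assumptions for illustration:

CREATE TABLE DimCustomerContact
(
    CustomerID    INT,
    ContactNumber VARCHAR(20),
    ValidFrom     DATE,
    ValidTo       DATE             -- NULL marks the current record
);

-- The OLTP system returns only the latest row; the warehouse returns the full history:
SELECT CustomerID, ContactNumber, ValidFrom, ValidTo
FROM DimCustomerContact
WHERE CustomerID = 1001
ORDER BY ValidFrom;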

In addition to a relational database, a data warehouse environment can include an Extraction, Transformation, and Loading (ETL) solution, statistical analysis, reporting, data mining capabilities, client analysis tools, and other applications that manage the process of gathering data, transforming it into useful, actionable information, and delivering it to business users.

Friday 18 November 2016

SQL Server Services : Access is denied ( Event ID : 7000 )

Hello everyone. A week ago I was working on a fresh MS SQL Server 2012 SP3 installation on a newly built Microsoft Windows Server 2012 Standard on a VMware virtual machine. I installed the default instance with the SQL services running as "Local System", and the installation went smoothly. However, once installation completed and I tried to run the SQL services under a domain account (i.e. Domain\User), the services stopped and there were errors in the OS system log:
Log Name:      System
Source:        Service Control Manager
Date:          10/11/2016 12:10:01 PM
Event ID:      7000
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      ******
Description:
The SQL Server (MSSQLSERVER) service failed to start due to the following error: 
Access is denied.

Case - I

I also tried to run the SQL services from SQL Server Configuration Manager under a newly created local server user, but again no success: the services would not start and gave the same error.
Then I called my most technical friend, Google, but unfortunately that didn't get me very far; it took a bit more digging. I found a few more errors in the Application log, the System log, and the bootstrap summary (%ProgramFiles%\Microsoft SQL Server\110\Setup Bootstrap\Log\Summary.txt):

- Attempted to perform an unauthorized operation
- error: 40 - Could not open a connection to SQL Server

Googling the more detailed errors from the bootstrap summary log is what finally put me on the right path towards the solution.

That is how I learned that when you build MS Windows Server 2012 on VMware, you need to disable the HotAdd/HotPlug capability.
This was a difficult one to track down, but thanks to MSDN forums and the community I now have a new item on my build checklist.

Case - II

The same solution applies when SQL Server fails to install because one of the folders was set to a volume other than C: – the installation logs and Process Monitor traces pointed to "access denied", but it was not clear why this should be failing.

It turns out that the problem is specific to Windows Server 2012 when running under VMware with the hot-plug capability enabled – under Hyper-V, VirtualBox or on physical hardware the problem does not occur.
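
If you hit the same problem, the fix is to disable the HotAdd/HotPlug capability for the VM. One way, with the VM powered off, is to add the line below to the virtual machine's .vmx configuration file; treat this as a sketch and verify the exact procedure against the VMware KB article for your version:

devices.hotplug = "false"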



Friday 28 October 2016

NoSQL Vs RDBMS

Relational Database Management System (RDBMS)

An RDBMS is a relational database management system: data is stored in the form of rows and columns, and SQL is the standard language for querying it. Some common relational database management systems that use SQL are Oracle, Sybase, Microsoft SQL Server, Access, etc. All these RDBMSs follow the ACID properties.

NoSQL

NoSQL is commonly referred to as "Not Only SQL". With NoSQL, unstructured, schema-less data can be stored in multiple collections and nodes; it does not require fixed table schemas, it supports only limited join queries, and it scales horizontally. Some common NoSQL databases are MongoDB, Cassandra, Neo4j, etc., and these NoSQL databases follow the CAP theorem.

How NoSQL Differs from Relational Databases

NoSQL databases and RDBMSs are designed to support different application requirements, and they typically co-exist in most enterprises.

Where to use RDBMS and NoSQL:

The main decision points on when to use an RDBMS versus NoSQL follow from the differences above: the need for ACID transactions and fixed schemas favours an RDBMS, while flexible schemas, horizontal scaling and high availability favour NoSQL.



NoSQL Database Overview

What is NoSQL Database?
NoSQL generally refers to "Not Only SQL" or "non-relational" databases. A NoSQL database provides a mechanism for storage and retrieval of data which is modelled in means other than the tabular relations used in relational databases.

Basic Concept and Technique Used in NoSQL

1. Distribution Model: How to distribute data when scaling out; there are basically two techniques: sharding and replication.
2. Consistency: NoSQL databases use the CAP theorem to reason about consistency.
3. Data Model: NoSQL databases are categorized by how data is actually stored: column-wise, document, graph, or key/value store.

1. Distribution Model :
Because of their architecture differences, NoSQL databases differ on how they support the reading, writing, and distribution of data. Some NoSQL platforms like Cassandra support writes and reads on every node in a cluster and can replicate / synchronize data between many data centers and cloud providers.
1.1. Sharding: Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data.
1.2. Replication: Replication copies data across multiple servers, so each bit of data can be found in multiple places.
Replication comes in two forms:
- Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.
- Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.

2. CAP Theorem :
In a distributed system, consistency, availability and partition tolerance exist in a mutually dependent relationship. The CAP theorem states that in any distributed system we can choose only two of consistency, availability and partition tolerance. Relational databases favour consistency and availability.

- Consistency : All nodes have the same data at the same time.
- Availability : Every request gets a response on success/failure. Achieving availability in a distributed system requires that the system remains operational 100% of the time.
- Partition Tolerance : The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn't result in a failure of the entire network.

All the three combinations can be defined as:
CA – data should be consistent between all nodes. As long as all nodes are online, users can read/write from any node and be sure that the data is the same on all nodes.
CP – data is consistent between all nodes and maintains partition tolerance by becoming unavailable when a node goes down.
AP - nodes remain online even if they can’t communicate with each other and will re-sync data once the partition is resolved, but you aren’t guaranteed that all nodes will have the same data (either during or after the partition)


3. Data Model :
Types of NoSQL Databases
There have been various approaches to classify NoSQL databases, each with different categories and subcategories. Here is a basic classification by data model, with examples:

3.1. Wide Column Stores:
A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key-value pair) consisting of three elements:
- Unique name: Used to reference the column
- Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
- Timestamp: The system timestamp used to determine the valid content.
Accumulo, Cassandra, Druid, HBase and Vertica are examples of column databases.

3.2. Document Stores:
The central concept of a document store is the notion of a "document". While each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON as well as binary forms like BSON.
Apache CouchDB, Clusterpoint, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx and RethinkDB are examples of document databases.

3.3. Key-value Stores:
Key-value (KV) stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection.
Aerospike, CouchDB, Dynamo, FairCom c-treeACE, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak and Berkeley DB are examples of key-value databases.
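
For readers coming from SQL, the associative-array model is roughly what a two-column table with a unique key gives you. A hypothetical relational analogue:

CREATE TABLE kv_store
(
    k VARCHAR(255) PRIMARY KEY,   -- each possible key appears at most once
    v BLOB                        -- opaque value; the store does not interpret it
);

Real key-value stores skip the relational layer entirely, which is part of what makes them easy to shard.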

3.4. Graph Databases:
A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.
AllegroGraph, InfiniteGraph, MarkLogic, Neo4j, OrientDB, Virtuoso and Stardog are examples of graph databases.


Tuesday 11 October 2016

MSBI Introduction

Microsoft Business Intelligence Introduction


BI stands for Business Intelligence, a set of techniques for gathering, storing and analysing huge/big data and extracting the best information from it to support business decision making.
In other words, BI is the integration of data from different data sources, the analysis of that integrated data, and the representation of the analysed results in the form of reports, dashboards, charts, etc., which helps in understanding the data easily.

MSBI stands for Microsoft Business Intelligence. The MSBI suite is composed of tools which help in providing best-in-class solutions for Business Intelligence and Data Mining queries. It uses Visual Studio along with MS SQL Server. It empowers users to gain access to accurate and up-to-date information for better decision making in an organization, and it offers different tools for the different processes required in Business Intelligence (BI) solutions.

MSBI is divided into 3 categories:-

1. SSIS – SQL Server Integration Services – Integration tool
2. SSAS – SQL Server Analytical Services – Analysis tool
3. SSRS – SQL Server Reporting Services – Reporting tool



Microsoft Business Intelligence (MSBI) Architecture




Example: let's take an example to understand how BI benefits an organization, and use it to walk through the picture above. Take Adidas: there are Adidas outlets in most parts of India. Every outlet stores its customer data in its own database, and it is not mandatory that every outlet uses the same database: some outlets may have an Access database, some might use Excel sheets, and some stores prefer storing their data in simple text files, Oracle, etc.

Before proceeding with the explanation, we should know what OLTP is. It stands for Online Transaction Processing: basically, the online transactions, heavy in DML operations (Insert, Update, Delete), performed on the database at every Adidas outlet.

Phase 1 – "SSIS": After storing the daily data of the customers who visited Adidas outlets at different stores, the data is integrated and saved in a centralized database. This is done with the help of the SSIS component of MS SQL Server. Integration means merging data from different data stores (flat files, Excel sheets, Access, Oracle, etc.), refreshing data in data warehouses, and cleansing data before loading to remove errors (e.g. the date format may differ between outlets' databases, so a single format is applied to make the data uniform). Now you should be clear on the integration concept.

Phase 2 – "SSAS": The next step is to analyse the stored centralized data. This huge data set is divided into data marts on which the analytic process is carried out. Analysis Services uses the OLAP (Online Analytical Processing) component and data mining capabilities. It allows you to build multi-dimensional structures called cubes to pre-calculate and store complex aggregations, and also to build mining models to perform data analysis, which helps to identify valuable information such as recent trends, patterns and customer dislikes. Business analysts then perform data mining on the multi-dimensional cube structure to look at the data from different perspectives. This completes the multi-dimensional analysis of the huge data set.

Phase 3 – "SSRS": Finally, the analysis is represented graphically so that the organization (Adidas) can make effective decisions to enhance revenue, maximize profit and reduce wasted time. This is done in the form of reports, scorecards, dashboards, Excel workbooks, etc. These reports tell the organization Adidas's revenue at a specific time and place, where it has captured the market, where it is lagging and needs a boost, and whatever else end users wish to look into. This reporting is done with SQL Server Reporting Services, which completes the cycle.

SQL Server : How to Change Service Account of SSRS

Change Service Account for SSRS 2012 (SQL Server Reporting Services)



To view and reconfigure service account information, always use the Reporting Services Configuration Manager. Service identity information is stored internally in multiple locations. Using the tool ensures that all references are updated accordingly whenever you change the account or password. The Reporting Services Configuration Manager performs the following additional steps to ensure the report server remains available:

- Automatically adds the new account to the report server group created on the local computer. This group is specified in the access control lists (ACLs) that secure Reporting Services files.
- Automatically updates the login permissions on the SQL Server Database Engine instance used to host the report server database. The new account will be added to the RSExecRole.
- Automatically updates the encryption key to include the profile information of the new account.

Note: If the report server is part of the scale-out deployment, only the report server that you are updating is affected. The encryption keys for other report servers in the deployment are unaffected by the service account change.

Steps to change the account



1. Open the Reporting Services Configuration Manager from Start > Programs
2. Navigate to the Service Account tab and choose one of the available system accounts, or provide a domain account as the new service account, and click Apply


3. It will prompt you to back up the Encryption Key. Specify a location to store the Encryption Key file, provide the password and click OK


4. You will get a prompt to choose the administrator account for granting access rights to the new service account. Enter the account and click OK


5. The service account will now be changed to the new account and the list of activities done during the account change will be displayed on the Results window




Thursday 15 September 2016

MySQL Backup and Restore Commands

Backup Commands


How to Backup MySQL Database?

To back up a MySQL database (or several), the database must exist on the MySQL instance, and the ID used to take the backup must have access to it.
The format of the command is:
# mysqldump -u [username] -p[password] [database_name] > [dump_file.sql]
The parameters of the said command as follows.
[username] : A valid MySQL username.
[password] : A valid MySQL password for the user.
[database_name] : A valid Database name you want to take backup.
[dump_file.sql] : The name of backup dump file you want to generate.
How to Backup a Single MySQL Database?

To take a backup of a single database, use the command as follows. The command will dump the database [testdb] structure with data into a single dump file called testdb_file.sql.
# mysqldump -uroot -p testdb > testdb_file.sql
How to Backup Multiple MySQL Databases?

If you want to take a backup of multiple databases, run the following command. This example takes a backup of the databases [testdb, sysdb], structure and data, into a single file called testdb_sysdb.sql.
# mysqldump -uroot -p --databases testdb sysdb > testdb_sysdb.sql
How to Backup All MySQL Databases?

If you want to take a backup of all databases, use the --all-databases option. The following command backs up all databases, with their structure and data, into a file called all-databases.sql.
# mysqldump -u root -p --all-databases > all-databases.sql
How to Backup MySQL Database Structure Only?

If you only want a backup of the database structure, without data, use the --no-data option. The command below exports the database [testdb] structure into the file testdb_structure.sql.
# mysqldump -u root -p --no-data testdb > testdb_structure.sql
How to Backup MySQL Database Data Only?

To back up database data only, without structure, use the --no-create-info option. This command dumps the database [testdb] data into the file testdb_data.sql.
# mysqldump -u root -p --no-create-db --no-create-info testdb > testdb_data.sql
How to Backup Single Table of Database?

With the command below you can take a backup of a single table (or certain tables) of your database. For example, the following command only backs up the tbl_test table from the database testdb.
# mysqldump -u root -p testdb tbl_test > testdb_test.sql

How to Backup Multiple Tables of Database?

If you want to take a backup of multiple tables from the database, separate each table name with a space.
# mysqldump -u root -p testdb tbl_test tbl_emp > testdb_tbl.sql
How to Backup Remote MySQL Database?

The command below takes a backup of the remote server's [192.10.92.6] database [testdb] onto the local server.
# mysqldump -h 192.10.92.6 -u root -p testdb > testdb.sql
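
Dump files can get large; mysqldump output can be piped straight through gzip, and restored the same way. A sketch (file names are placeholders):

# mysqldump -u root -p testdb | gzip > testdb.sql.gz
# gunzip < testdb.sql.gz | mysql -u root -p testdb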

Restore Commands


How to Restore MySQL Database?

Above we saw how to take backups of databases, tables, structure-only and data-only; now we will see how to restore them, using the following format.
# mysql -u [username] -p[password] [database_name] < [dump_file.sql]
How to Restore Single MySQL Database?

To restore a database, create an empty database on the target machine and restore the dump using the mysql command. For example, the following command restores the testdb.sql file into the testdb database.
# mysql -u root -p testdb < testdb.sql
The mysql command also works for a database that already exists on the target machine. To load plain-text data files (one tab-delimited file per table) into existing tables, use the mysqlimport command instead; the file name must match the table name:
# mysqlimport -u root -p testdb tbl_test.txt

In the same way you can also restore database tables, structure-only and data-only dumps.

SQL Server Delete Duplicate Rows from Table

In this post, I will demonstrate techniques to delete duplicate rows from a SQL Server database table.

One day I was working on a freelancing project and found duplicate records in some of the client's tables.
I worked out a solution to delete the duplicate rows.
Let's create a sample table and insert data into it:

CREATE TABLE tbl_DuplicateData
( 
 ID INTEGER PRIMARY KEY
 ,Name VARCHAR(150)
)
GO
 
INSERT INTO tbl_DuplicateData VALUES 
(1,'Roy'),(2,'Ashish'),(3,'Tarun'),
(4,'Rahul'),(5,'Kapoor'),
(6,'Nadun'),(7,'Roy'),(8,'Rahul'),(9,'Kapoor'),
(10,'Rahul'),(11,'Roy')
GO
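
Before deleting anything, it is worth checking which values are actually duplicated and how often:

SELECT Name, COUNT(*) AS DuplicateCount
FROM tbl_DuplicateData
GROUP BY Name
HAVING COUNT(*) > 1
GO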

There can be multiple solutions and different ways to get out of this issue:
The 1st solution uses a CTE:
;WITH cte_FindDuplicateRows AS 
(
    SELECT 
       Name
       ,ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) RowNumber -- keep the lowest ID per Name
    FROM  tbl_DuplicateData
)
DELETE FROM cte_FindDuplicateRows
WHERE RowNumber > 1
GO

Another way is a SELF JOIN, which also keeps the row with the lowest ID per name:
DELETE FROM A
FROM tbl_DuplicateData AS A
INNER JOIN tbl_DuplicateData AS B
  ON A.Name = B.Name AND A.ID > B.ID
GO

Check the table after deleting duplicate rows:
SELECT * FROM tbl_DuplicateData ORDER BY ID

You can choose either solution as per your requirement; before executing, be sure to check the query plan and pick the better-performing option for deleting duplicate rows.

Wednesday 31 August 2016

MySQL : Basic Useful Commands Part 2

11. How to reload/refresh MySQL Privileges?

# mysqladmin -u root -p reload;
# mysqladmin -u root -p refresh

12. How to shutdown MySQL server Safely?

# mysqladmin -u root -p shutdown
# /etc/init.d/mysqld stop
# /etc/init.d/mysqld start

13. Some useful MySQL Flush commands

# mysqladmin -u root -p flush-hosts
# mysqladmin -u root -p flush-tables
# mysqladmin -u root -p flush-threads
# mysqladmin -u root -p flush-logs
# mysqladmin -u root -p flush-privileges
# mysqladmin -u root -p flush-status
  1. flush-hosts: Flush all host information from host cache.
  2. flush-tables: Flush all tables.
  3. flush-threads: Flush all threads cache.
  4. flush-logs: Flush all information logs.
  5. flush-privileges: Reload the grant tables (same as reload).
  6. flush-status: Clear status variables

14. How to kill Sleeping MySQL Client Process?

 # mysqladmin -u root -p processlist

Enter password:
+----+------+-----------+----+---------+------+-------+------------------+
| Id | User | Host      | db | Command | Time | State | Info             |
+----+------+-----------+----+---------+------+-------+------------------+
| 5  | root | localhost |    | Sleep   | 14   |       |                  |
| 8  | root | localhost |    | Query   | 0    |       | show processlist |
+----+------+-----------+----+---------+------+-------+------------------+
# mysqladmin -u root -p kill 5

Enter password:
+----+------+-----------+----+---------+------+-------+------------------+
| Id | User | Host      | db | Command | Time | State | Info             |
+----+------+-----------+----+---------+------+-------+------------------+
| 12 | root | localhost |    | Query   | 0    |       | show processlist |
+----+------+-----------+----+---------+------+-------+------------------+
To Kill multiple:
# mysqladmin -u root -p kill 5,10

15. How to run multiple mysqladmin commands together?

# mysqladmin  -u root -p processlist status version

Enter password:
+----+------+-----------+----+---------+------+-------+------------------+
| Id | User | Host      | db | Command | Time | State | Info             |
+----+------+-----------+----+---------+------+-------+------------------+
| 8  | root | localhost |    | Query   | 0    |       | show processlist |
+----+------+-----------+----+---------+------+-------+------------------+
Uptime: 3801  Threads: 1  Questions: 15  Slow queries: 0
Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 0.003

mysqladmin  Ver 8.42 Distrib 5.5.28, for Linux on i686
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Server version          5.5.28
Protocol version        10
Connection              Localhost via UNIX socket
UNIX socket             /var/lib/mysql/mysql.sock
Uptime:                 1 hour 3 min 21 sec

16. How to Connect remote mysql server ?

# mysqladmin  -h 19x.1x.x5.22x -u root -p

17. How to execute command on remote MySQL server?

# mysqladmin  -h 19x.1x.x5.22x -u root -p status

18. How to start/stop MySQL replication on a slave server?

 # mysqladmin  -u root -p start-slave
# mysqladmin  -u root -p stop-slave

19. How to store MySQL server Debug Information to logs?

# mysqladmin  -u root -p debug

20. How to view mysqladmin options and usage ?

# mysqladmin --help

Monday 22 August 2016

MySQL : Basic Useful Commands Part 1

1. How to set MySQL Root password?

# mysqladmin -u root password YOURNEWPASSWORD

2. How to Change MySQL Root password?

# mysqladmin -u root -p123456 password 'xyz123'

3. How to check MySQL Server is running?

# mysqladmin -u root -p ping

Enter password:
mysqld is alive

4. How to Check which MySQL version I am running?

# mysqladmin -u root -p version

Enter password:
mysqladmin  Ver 8.42 Distrib 5.5.28, for Linux on i686
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Server version          5.5.28
Protocol version        10
Connection              Localhost via UNIX socket
UNIX socket             /var/lib/mysql/mysql.sock
Uptime:                 7 days 14 min 45 sec
 

5. How to Find out current Status of MySQL server?

# mysqladmin -u root -ptmppassword status

Enter password:
Uptime: 606704  Threads: 2  Questions: 36003  Slow queries: 0
Opens: 15 Flush tables: 1 Open tables: 8 Queries per second avg: 0.059

6. How to check status of all MySQL Server Variable’s and value’s?

# mysqladmin -u root -p extended-status

Enter password:
+------------------------------------------+-------------+
| Variable_name                            | Value       |
+------------------------------------------+-------------+
| Aborted_clients                          | 3           |
| Aborted_connects                         | 3           |
| Binlog_cache_disk_use                    | 0           |
| Binlog_cache_use                         | 0           |
| Binlog_stmt_cache_disk_use               | 0           |
| Binlog_stmt_cache_use                    | 0           |
| Bytes_received                           | 6400357     |
| Bytes_sent                               | 2610105     |
| Com_admin_commands                       | 3           |
| Com_assign_to_keycache                   | 0           |
| Com_alter_db                             | 0           |
| Com_alter_db_upgrade                     | 0           |
| Com_alter_event                          | 0           |
| Com_alter_function                       | 0           |
| Com_alter_procedure                      | 0           |
| Com_alter_server                         | 0           |
| Com_alter_table                          | 0           |
| Com_alter_tablespace                     | 0           |
+------------------------------------------+-------------+

7. How to see all MySQL server Variables and Values?

To see all the running variables and values of MySQL server, use the command as follows.

# mysqladmin  -u root -p variables

Enter password:
+---------------------------------------------------+------------------------------+
| Variable_name                                     | Value                        |
+---------------------------------------------------+------------------------------+
| auto_increment_increment                          | 1                            |
| auto_increment_offset                             | 1                            |
| autocommit                                        | ON                           |
| automatic_sp_privileges                           | ON                           |
| back_log                                          | 50                           |
| basedir                                           | /usr                         |
| big_tables                                        | OFF                          |
| binlog_cache_size                                 | 32768                        |
| binlog_direct_non_transactional_updates           | OFF                          |
| binlog_format                                     | STATEMENT                    |
| binlog_stmt_cache_size                            | 32768                        |
| bulk_insert_buffer_size                           | 8388608                      |
| character_set_client                              | latin1                       |
| character_set_connection                          | latin1                       |
| character_set_database                            | latin1                       |
| character_set_filesystem                          | binary                       |
| character_set_results                             | latin1                       |
| character_set_server                              | latin1                       |
| character_set_system                              | utf8                         |
| character_sets_dir                                | /usr/share/mysql/charsets/   |
| collation_connection                              | latin1_swedish_ci            |
+---------------------------------------------------+------------------------------+

8. How to check all the running Process of MySQL server?

# mysqladmin -u root -p processlist

Enter password:
+-------+---------+-----------------+---------+---------+------+
| Id    | User    | Host            | db      | Command | Time |
+-------+---------+-----------------+---------+---------+------+
| 18001 | rsyslog | localhost:38307 | rsyslog | Sleep   | 5590 |
| 18020 | root    | localhost       |         | Query   | 0    |
+-------+---------+-----------------+---------+---------+------+

9. How to create a Database in MySQL server?

# mysqladmin -u root -p create mydatabasename

Enter password:
# mysql -u root -p

Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 18027
Server version: 5.5.28 MySQL Community Server (GPL) by Remi

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mydatabasename     |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.01 sec)

mysql>

10. How to drop a Database in MySQL server?

# mysqladmin -u root -p drop databasename

Enter password:
Dropping the database is potentially a very bad thing to do.
Any data stored in the database will be destroyed.

Do you really want to drop the 'databasename' database [y/N] y
Database "databasename" dropped

MySQL InnoDB Vs MyISAM Storage Engine

InnoDB vs MyISAM

InnoDB stores data in a file(s) called a tablespace. In newer versions of MySQL, you can store the data and indexes in separate files. MyISAM stores the data and indexes in two files (.MYD, .MYI). InnoDB is extremely critical about its tablespace and log files (They normally should be backed up together). You can't just backup your data by copying files like you would with MyISAM. In some situations you can only restore a table to the server from which you backed it up!
InnoDB is built on clustered indexes and uses MVCC to achieve high concurrency. This provides very fast primary key lookups. MyISAM doesn't support transactions, FK constraints, or row-level locks. MyISAM uses shared and exclusive locks on the entire table. However, concurrent reads & inserts of new rows are allowed.
InnoDB is crash safe (assuming your flushes are truly durable on disk and not on some volatile cache). MyISAM is nowhere close to being crash safe. If you care about your data, use InnoDB. It might be OK to use MyISAM for read-only workloads.
InnoDB will support full-text indexes in MySQL 5.6. MyISAM currently supports this feature. Normally these type of things should be offloaded to something like a Sphinx server.
InnoDB repairs are reasonably fast. MyISAM repairs are slow and you might not get all your data back. Eek.
InnoDB requires a lot of memory (buffer pool). The data and indexes are cached in memory. Changes are written to the log buffer (physical memory) and are flushed every second to the log files (method depends on innodb_flush_log_at_trx_commit value). Having the data in memory is a huge performance boost. MyISAM only caches indexes (key_buffer_size) so that's where you would allocate most of your memory if you're only using MyISAM.
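
To see which engine each of your tables uses, and to convert a table from one engine to the other, something like the following works (database and table names are placeholders):

SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'testdb';

ALTER TABLE testdb.mytable ENGINE=InnoDB;   -- rebuilds the table under InnoDB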

Tuesday 16 August 2016

SQL Server Database Growth Rates


Script to find the Growth Size for All files in All Databases:
SELECT   'Database Name' = DB_NAME(database_id)
,'FileName' = NAME
,FILE_ID
,'size' = CONVERT(NVARCHAR(15), CONVERT(BIGINT, size) * 8) + N' KB'
,'maxsize' = (CASE max_size 
WHEN - 1 THEN N'Unlimited'
ELSE CONVERT(NVARCHAR(15), CONVERT(BIGINT, max_size) * 8) + N' KB'
END)
,'growth' = (CASE is_percent_growth
WHEN 1 THEN CONVERT(NVARCHAR(15), growth) + N'%'
ELSE CONVERT(NVARCHAR(15), CONVERT(BIGINT, growth) * 8) + N' KB'
END)
,'type_desc' = type_desc
FROM sys.master_files
ORDER BY database_id
The SQL Server catalog stores information about every single database backup in msdb..backupset.
If you don't have another mechanism for collecting historical database sizes, they can be derived by calculating the backup size from the msdb..backupset table. Backup history is a good starting point for capacity planning.
A few points to remember before starting:
– msdb..backupset stores historical information about backup size, NOT database file size. This is fine if you need to understand how your real stored data grows day by day, but obviously the data files are typically larger: there is empty space inside for future data.
– Every full database backup contains a small part of the log, used for recovery. For this reason the size reported is not always exactly your data size, but usually the difference is not relevant.

Script to find out the current and previous size (in megabytes), as well as the increase in size, for all databases:
SELECT
s.[database_name]
,s.[backup_start_date]
,CAST( ( s.[backup_size] / 1024 / 1024 ) AS INT ) AS [BackupSize(MB)]
,CAST( ( LAG( s.[backup_size] ) 
OVER (PARTITION BY s.[database_name] ORDER BY s.[backup_start_date])/1024/1024) AS INT ) 
AS [Previous_BackupSize(MB)]
,(CAST( ( s.[backup_size] / 1024 / 1024 ) AS INT )-CAST( ( LAG( s.[backup_size] ) 
OVER ( PARTITION BY s.[database_name] ORDER BY s.[backup_start_date] ) / 1024 / 1024 ) AS INT ))
AS [Increased_Size(MB)]
FROM [msdb]..[backupset] s
WHERE s.[type] = 'D' 
ORDER BY s.[database_name],s.[backup_start_date];
GO

To calculate the same for an individual database, limited to a chosen number of days:
Declare @dbname nvarchar(1024)    
Declare @days int              
          
--Configure HERE database name  
set @dbname ='YourDB_Name'  
--and number of days to analyse  
set @days   =365;  
  
--Daily Report  
WITH TempTable(Row,database_name,backup_start_date,Mb)   
as   
( SELECT   
ROW_NUMBER() OVER(order by backup_start_date) as Row, 
database_name,
backup_start_date,
cast(backup_size/1024/1024 as decimal(10,2)) MB
from msdb..backupset
where
type='D' and
database_name=@dbname and
backup_start_date>getdate()-@days )
select
A.database_name,
A.backup_start_date,
A.Mb as daily_backup,
A.Mb - B.Mb as increment_mb
from TempTable A left join TempTable B on A.Row=B.Row+1
order by database_name,backup_start_date

Friday 5 August 2016

MySQL Memory & CSV Storage Engine

MEMORY storage engine

The MEMORY storage engine (formerly known as HEAP) creates special-purpose tables with contents that are stored in memory. Because the data is vulnerable to crashes, hardware issues, or power outages, only use these tables as temporary work areas or read-only caches for data pulled from other tables.

The MEMORY storage engine creates a table in memory only. This allows for extremely fast data access; the drawback is that the data is not persisted across reboots. Also, if you replicate such a table to a slave server, the slave table will be truncated as well when the master restarts, even though the slave itself did not restart. If rows are deleted from a MEMORY table you must use either ALTER TABLE or OPTIMIZE TABLE to defragment the table and reduce the amount of memory used.

The MEMORY storage has the following features:
- All data is stored in memory
- Non-transactional
- No foreign key support
- very fast read and write activity due to being entirely in memory
- table-level locking
- A MEMORY table can include up to 32 indexes per table
- implements both hash and b-tree indexes
- Tables use a fixed-length row storage format and thus cannot be defined with BLOB or TEXT data types

The MEMORY storage engine has a few parameters that you can tweak (a sketch follows the list):

max_heap_table_size - the maximum size of MEMORY tables is limited by the system variables, default is 16MB
init_file - this can be used to specify a file to use to populate a MEMORY table when mysqld starts
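
As a minimal sketch, here is a MEMORY table used as a lookup cache; the names are illustrative:

CREATE TABLE lookup_cache
(
    id    INT NOT NULL,
    label VARCHAR(100),             -- BLOB/TEXT columns are not allowed in MEMORY tables
    PRIMARY KEY (id) USING HASH     -- MEMORY supports both HASH and BTREE indexes
) ENGINE=MEMORY;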

CSV Storage Engine

The CSV storage engine stores data in text files using comma-separated values format. The CSV storage engine is always compiled into the MySQL server. To examine the source for the CSV engine, look in the storage/csv directory of a MySQL source distribution.

When you create a CSV table, the server creates a table format file in the database directory. The file begins with the table name and has an .frm extension. The storage engine also creates a data file. Its name begins with the table name and has a .CSV extension. The data file is a plain text file. When you store data into the table, the storage engine saves it into the data file in comma-separated values format.

CSV storage has the following features:

- Plain text data in CSV format
- Easily imported into programs such as Microsoft Excel
- Table-level locking
- No foreign key support
- Non-transactional
- Trivial backups and restores (just copy the files)
- Does not support indexes or partitioning

Three files are created:
- .frm, which contains the table format
- .CSM, which contains metadata
- .CSV, which contains the data
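
A minimal sketch of a CSV table (the name is illustrative); note that every column must be declared NOT NULL, and indexes are not supported:

CREATE TABLE csv_export
(
    id   INT NOT NULL,
    name VARCHAR(100) NOT NULL
) ENGINE=CSV;

-- the rows land as plain text in csv_export.CSV inside the database directory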

MySQL InnoDB Storage Engine


InnoDB :- InnoDB is a general-purpose storage engine that balances high reliability and high performance. In MySQL 5.7, InnoDB is the default MySQL storage engine. Unless you have configured a different default storage engine, issuing a CREATE TABLE statement without an ENGINE= clause creates an InnoDB table.

With the Innodb storage engine you have control of the format and the location of the tablespace, a tablespace is a logical group of one or more data files in a database, using parameters you can control the path of the home directory and if you want to use separate files or a shared tablespace.

innodb_data_file_path - Determine both the path to individual centralized data files (shared tablespace) and the size of the files
innodb_data_home_dir - The common part of the directory path for all InnoDB data files
innodb_file_per_table - If enabled, new InnoDB tables use their own .ibd file for both data and indexes rather than the shared tablespace.
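
Put together, a my.cnf fragment using these options might look like the following; the paths and sizes are illustrative, not recommendations:

[mysqld]
innodb_data_home_dir  = /var/lib/mysql
innodb_data_file_path = ibdata1:1024M:autoextend
innodb_file_per_table = 1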


It has the following key features:
- Transactional support provided by MVCC (Multi Version Concurrency Control)
- Row-level locking
- Foreign key support
- Indexing using clustered B-tree indexes
- Configurable buffer caching of both data and indexes
- Online non-blocking backup through separate commercial backup program

The full path to each shared tablespace data file is formed by adding innodb_data_home_dir to each path specified in innodb_data_file_path; the file sizes can be specified in KB, MB or GB. By default, if innodb_data_file_path is not specified, a 10MB ibdata1 file is created in the data directory (datadir). You cannot just move the data files from one system to another like you can with the MyISAM engine.

There are a number of configuration parameters that you can use with the InnoDB storage engine:

innodb_buffer_pool_size - Determines the size of the buffer that the InnoDB storage engine uses to cache both data and indexes
innodb_flush_log_at_trx_commit - Configures how often the log buffer is flushed to disk
0 - the log buffer is written and flushed to disk once per second
1 - every commit writes and flushes the log buffer to disk
2 - the log buffer is written out at each commit but flushed to disk only about once per second
innodb_log_file_size - Determines the size (in bytes) of each of the InnoDB log files; by default they are 5MB. Remember: the bigger the file, the slower crash recovery will be.
The larger the buffer, the more data you can hold in memory, which in turn increases performance, so try to keep your most frequently used data in memory. Don't go too mad though, as too much memory could cause swapping by the operating system, which in turn will degrade performance.
You can get detailed information about the InnoDB storage engine using the show command; there are a number of status sections, highlighted below:

InnoDB status sections (displayed by the command: show engine innodb status)
- semaphores: Reports threads waiting for semaphores and statistics on how many times threads have been forced to wait for an OS call, a spin wait, or a mutex or rw-lock semaphore. A large number of waiting threads indicates a disk I/O problem or contention issues within InnoDB.
- foreign key errors: Displays information about foreign key problems
- deadlocks: Displays information about the last deadlock that occurred
- transactions: Reports lock waits, which may highlight lock contention within your application; it can also help in detecting transaction deadlocks
- file I/O: Shows activity of the threads used for I/O
- insert buffer and adaptive hash index: Displays information about the insert buffer, including size and amount of free space
- log: Shows information about the log files
- buffer pool and memory: Shows buffer pool activity, including hit rates, etc.
- row operations: Shows activity of the main thread

InnoDB uses a shared tablespace for all tables and indexes that can consist of one or more files; they are generally located in the datadir directory. The files contain metadata and are referred to as ibdata files. When the tablespace becomes fragmented the files are not shrunk, but the free space is still available to the database; you can view this free space in the data_free field of the information_schema.tables system view and the data_free field of show table status.

You can have separate files instead by using the innodb_file_per_table parameter as mentioned above; however, a small amount of metadata will remain in the shared tablespace. To defragment a tablespace use the optimize table command, which will be discussed in another section entitled SQL Extensions.

When working with ibdata files you can add additional files, and these can be split over different disks; with SANs replacing old disk technology this is less of a requirement today. Here are some examples of creating the ibdata files:

ibdata files
## Create a 1GB ibdata file that will autoextend if needed
innodb_data_file_path=ibdata1:1024M:autoextend
innodb_data_file_path=ibdata1:1G:autoextend

## create a 1GB ibdata file and a second 1GB ibdata file that will autoextend
innodb_data_file_path=ibdata1:1024M;ibdata2:1024M:autoextend


MySQL will not stripe across disks; the first data file must be full before the second data file is used. Some tables may be split across the two files once you start using the second file: for example, say you have 1MB free in the first file and you create a 2MB table, then 1MB will be in the first file and the remaining 1MB in the second.

If you wish to change the size of the data files there is no easy way; the following steps must be taken:
- perform a export of all the InnoDB tables
- shutdown MySQL
- change the configuration options as desired to remove or reorganize a tablespace
- move the existing ibdata file and log files to a backup location
- restart MySQL
- import the InnoDB tables
- once happy you can delete the old database files to recoup some space back

Due to the log files, InnoDB automatic crash recovery is possible: just like in other databases, all changes are recorded in the log files and replayed if the server crashes. There are a number of parameters that affect the log files:

innodb_log_files_in_group - the log files are written in a circular manner; they are initialized to their full size when MySQL starts
innodb_log_file_size - sets the log file size in each group; the total size of all the log files must be less than 4GB, but this may be due to O/S limitations, so check with the latest MySQL release
innodb_fast_shutdown - determines whether log files are purged during a shutdown, which means it may take longer to completely shut down MySQL
  • 0 - log files are purged after shutdown
  • 1 - log files are not purged after shutdown (default)
  • 2 - simulates a crash, which causes the log buffer to be written after every commit and flushes the log files to disk once per second
innodb_flush_log_at_trx_commit - controls how often the log buffer is written to the log files
  • 0 - the log buffer is written and flushed to disk once per second
  • 1 - the log buffer is written and flushed after every transaction commit (default)
  • 2 - the log buffer is written at each commit but flushed to disk only about once per second
innodb_flush_method - changes how InnoDB opens and flushes data and log files; this is a trade-off between performance and data consistency during a crash. I will discuss this further in tuning.
innodb_log_buffer_size - this is a write buffer for the log files; the larger the buffer, the less often the log files are written to, thus saving I/O. You can inspect the current log settings as shown below.
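
You can check the current values of these log settings from any client session:

mysql> SHOW VARIABLES LIKE 'innodb_log%';
mysql> SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';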

InnoDB Size Limits

• Max # of tables: 4 G
• Max size of a table: 32TB
• Columns per table: 1000
• Max row size: n*4 GB
• 8 kB if stored on the same page
• n*4 GB with n BLOBs
• Max key length: 3500
• Maximum tablespace size: 64 TB
• Max # of concurrent trxs: 1023

MySQL MyISAM Storage Engine

MyISAM Storage Engine

MyISAM has three files associated with it. Because the three files fully represent a table, they can simply be copied to another server and used; however, to avoid corruption you should take the MySQL server down before copying. The other server must have the same endian format as the source server, so you cannot copy from a Linux x86 server to a SPARC server, for instance.

.frm - The table format file
.MYD - The data file
.MYI - The index file


MyISAM has the following features:

- Non-transactional
- No foreign key support
- FULLTEXT indexes for text matching
- No data cache
- Index caches can be specified by name
- Implements BTREE indexes (plus FULLTEXT and RTREE spatial indexes)
- Table level locking
- Very fast read activity, suitable for data warehouses
- Compressed data (with myisampack)
- Online backup with mysqlhotcopy
- Maximum of 64 indexes per table

You can set a number of parameters that relate to the MyISAM engine in the my.cnf configuration file (a combined example follows the list):

key_buffer_size
Determines the size of the memory cache used for storing MyISAM indexes, the default is 8MB and the maximum is 4GB

concurrent_insert
Determines the behavior of concurrent inserts
0 - disables concurrent inserts
1 (default) - concurrent inserts with no data gaps are enabled
2 - concurrent inserts with data gaps are enabled

delay_key_write
Delays updating indexes for MyISAM tables until tables are closed.
ON (default) - MyISAM tables created with the delay_key_write option will delay index updates
OFF - disable delayed index writes entirely
ALL - enable delayed index writes for all MyISAM tables

max_write_lock_count
Determines how many writes to a table take precedence over reads; this helps with read starvation caused by constant writes to a table. The default is 4294967295. I will cover this more in the SQL Extensions section.

preload_buffer_size
Determines the size of the buffer used for index preloading of the key cache, the default is 32KB.
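
Put together, a my.cnf fragment for a MyISAM-heavy server might look like the following; the values are illustrative, not recommendations:

[mysqld]
key_buffer_size     = 256M
concurrent_insert   = 2
delay_key_write     = ON
preload_buffer_size = 32K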

There are three utility programs that can be used with MyISAM tables:

myisamchk - used to analyze, optimize and repair tables; to avoid data corruption you should shut down mysql before performing any of the actions below (db_name and table_name are placeholders).
# check a table
myisamchk /var/lib/mysql/db_name/table_name.MYI

# Repair a table
myisamchk -r /var/lib/mysql/db_name/table_name.MYI

myisampack - used to create compressed, read-only tables; to avoid data corruption you should shut down mysql before performing any of the actions below (table_name is a placeholder).
# compress a table
myisampack table_name.MYI

# Rebuild the indexes for optimal performance (done with myisamchk)
myisamchk -rq --sort-index --analyze table_name.MYI

myisam_ftdump - used to display information about FULLTEXT fields in tables
# First find the number of the FULLTEXT index to analyze, using the show create table command
show create table employees\G
# Now you can dump the index; note the index number at the end
myisam_ftdump employees 1

The Merge storage engine is worth mentioning here as it is related to MyISAM: it is essentially a wrapper around MyISAM tables with identical schemas. All underlying tables can be queried at once by querying the merge table. This is one way to implement partitioning in MySQL; merge tables use the same buffers and configuration as the underlying MyISAM tables.
Two files are created: a .frm file containing the table format, and a .MRG file containing the names of the underlying MyISAM tables. As a side note, you cannot use the REPLACE statement on merge tables.
The benefits of merge tables are easier management and better performance: using a merge table over smaller underlying tables not only speeds up operations because of the smaller table sizes, it also lets you rotate a table out of use by modifying the merge table definition to exclude it while maintenance is occurring. A sketch follows.
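
A minimal sketch of a merge table over two identically defined MyISAM tables (all names are illustrative):

CREATE TABLE log_2016_07 (id INT, msg VARCHAR(255)) ENGINE=MyISAM;
CREATE TABLE log_2016_08 (id INT, msg VARCHAR(255)) ENGINE=MyISAM;

CREATE TABLE log_all (id INT, msg VARCHAR(255))
ENGINE=MERGE UNION=(log_2016_07, log_2016_08) INSERT_METHOD=LAST;

Dropping log_2016_07 from the UNION list later takes it out of service without touching the other table.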

MySQL Pluggable Storage Engine Architecture

MySQL is very different from other databases in that its storage engine is pluggable: the MySQL instance core code is separate from the storage engine, which means we can choose a pluggable storage engine that fits the application.

MySQL has many storage engines; here is a list of a few common ones:

Storage Engine      Transactional Support   Locking Level   Online Backup   Version(s) available
InnoDB (default)    Yes                     Row             Yes             5.1, 5.5, 5.6
MyISAM/Merge        No                      Table           No              5.1, 5.5, 5.6
Memory              No                      Table           No              5.1, 5.5, 5.6
Maria               Yes                     Row             No              5.1, 5.5, 5.6
Falcon              Yes                     Row             Yes             5.6
PBXT                Yes                     Row             Yes             5.1, 5.5, 5.6
FEDERATED           No                      n/a             n/a             5.1, 5.5, 5.6
NDB                 Yes                     Row             Yes             5.1 and in MySQL Cluster
Archive             No                      Row             No              5.1, 5.5, 5.6
CSV                 No                      Table           No              5.1, 5.5, 5.6

Choosing the correct storage engine can vastly improve database performance. Below is a list of considerations for deciding whether to change from the default InnoDB storage engine:

- Transactional Support - If your application does not require transactional support, you can reduce resource usage by using a non-transactional engine.

- Table-Level features - If you need specific features for an example the checksum feature then you would have to use the MyISAM engine.

- Locking - Depending on your application you may want row or table locking.

- Indexing - Different storage engines use different indexing strategies that meet different application types, OLAP, Data Warehouse, etc.

- Foreign Keys - A number of storage engines do not support foreign keys, so double check when you select a particular engine.

- Buffering - Some engines use buffering others don't, for example the MyISAM engine does not buffer data but supports multiple buffers for indexes.

- File Storage - Some engines store their data/indexes in self-contained files, others use centralized metadata this has a impact if you want to move the data files to another server.

- Backup - some engines allow you to perform hot backups, others you need to take the database down before performing a backup.
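
As a sketch of how these criteria play out in practice, the engine is chosen per table at creation time and can be changed later; the table name orders is hypothetical:

-- CHECKSUM=1 (a live table checksum) is a MyISAM-only table option
CREATE TABLE orders (
  id INT PRIMARY KEY,
  total DECIMAL(10,2)
) ENGINE=MyISAM CHECKSUM=1;

CHECKSUM TABLE orders;   -- fast when the live checksum is maintained

-- Convert to InnoDB later if transactions or row locking are needed;
-- this rebuilds the whole table, so it can be slow for large tables
ALTER TABLE orders ENGINE=InnoDB;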

Although the idea is great, you can get too bogged down deciding which engine to choose, which is why most MySQL tables end up as either MyISAM or InnoDB. Here we will cover only MyISAM, InnoDB, MEMORY and CSV in depth; the rest you can read about in the MySQL documentation.

MyISAM was the default engine until recently; it has been in use since MySQL version 3.23. It is a non-transactional engine and implements only table-level locking, which may cause problems when you have a large number of concurrent writes to a table. If your application has a lot of write activity, the writes will end up blocking reads and cause performance problems, in which case you should use the InnoDB or Falcon engine instead.
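
One rough way to tell whether table-level locking is hurting a workload is to compare the server's lock counters; a Table_locks_waited value that climbs quickly relative to Table_locks_immediate suggests contention:

-- How often a table lock was granted immediately vs. had to wait
SHOW GLOBAL STATUS LIKE 'Table_locks%';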

Wednesday 3 August 2016

MySQL Database Architecture


MySQL is based on a tiered architecture, consisting of both subsystems and support components that interact with each other to read, parse and execute queries, and to cache and return query results; this is broadly similar to other RDBMSs.
The MySQL architecture consists of five primary subsystems that work together to respond to a request made to the MySQL database server.
Let's have a look at each of these components:


1) Query Engine

SQL Interface
The SQL interface provides the mechanisms to receive commands and transmit results to the user. The MySQL SQL interface was built to the ANSI SQL standard and accepts the same basic SQL statements as most ANSI-compliant database servers. Although many of the SQL commands supported in MySQL have options that are not ANSI standard, the MySQL developers have stayed very close to the ANSI SQL standard.
Connections to the database server are received from the network communication pathways and a thread is created for each. The threaded process is the heart of the executable pathway in the MySQL server. MySQL is built as a true multithreaded application whereby each thread executes independently of the other threads (except for certain helper threads). The incoming SQL command is stored in a class structure and the results are transmitted to the client by writing the results out to the network communication protocols. Once a thread has been created, the MySQL server attempts to parse the SQL command and store the parts in the internal data structure.

Parser
When a client issues a query, a new thread is created and the SQL statement is forwarded to the parser for syntactic validation (or rejection due to errors). The MySQL parser is implemented using a large Lex-YACC script that is compiled with Bison. The parser constructs a query structure used to represent the query statement (SQL) in memory as a tree structure (also called an abstract syntax tree) that can be used to execute the query.

Query Optimizer
The MySQL query optimizer subsystem is considered by some to be misnamed. The optimizer used is a SELECT-PROJECT-JOIN strategy that attempts to restructure the query by first doing any restrictions (SELECT) to narrow the number of tuples to work with, then performs the projections to reduce the number of attributes (fields) in the resulting tuples, and finally evaluates any join conditions. While not considered a member of the extremely complicated query optimizer category, the SELECT-PROJECT-JOIN strategy falls into the category of heuristic optimizers. In this case, the heuristics (rules) are simply
• Horizontally eliminate extra data by evaluating the expressions in the WHERE (HAVING) clause.
• Vertically eliminate extra data by limiting the data to the attributes specified in the attribute list. The exception is the storage of the attributes used in the join clause that may not be kept in the final query.
• Evaluate join expressions.

This results in a strategy that ensures a known-good access method to retrieve data in an efficient manner. Despite critical reviews, the SELECT-PROJECT-JOIN strategy has proven effective at executing the typical queries found in transaction processing.

1. The first step in the optimizer is to check for the existence of tables and access control by the user. If there are errors, the appropriate error message is returned and control returns to the thread manager, or listener. Once the correct tables have been identified, they are opened and the appropriate locks are applied for concurrency control.

2. Once all of the maintenance and setup tasks are complete, the optimizer uses the internal query structure and evaluates the WHERE conditions (a restrict operation) of the query. Results are returned as temporary tables to prepare for the next step. If UNION operators are present, the optimizer executes the SELECT portions of all statements in a loop before continuing. The next step in the optimizer is to execute the projections. These are executed in a similar manner as the restrict portions, again storing the intermediate results as temporary tables and saving only those attributes specified in the column specification in the SELECT statement.

3. Lastly, the structure is analysed for any JOIN conditions that are built using the join class, and then the join::optimize() method is called. At this stage the query is optimized by evaluating the expressions and eliminating any conditions that result in dead branches or always true or always false conditions (as well as many other similar optimizations). The optimizer is attempting to eliminate any known-bad conditions in the query before executing the join. This is done because joins are the most expensive and time consuming of all of the relational operators. It is also important to note that the join optimization step is performed for all queries that have a WHERE or HAVING clause regardless of whether there are any join conditions. This enables developers to concentrate all of the expression evaluation code in one place. Once the join optimization is complete, the optimizer uses a series of conditional statements to route the query to the appropriate library method for execution.
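
You cannot watch these internal steps directly, but EXPLAIN shows the plan the optimizer finally settles on; a minimal sketch using hypothetical employees and departments tables:

-- Shows the chosen access method, join order and key usage per table
EXPLAIN SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON d.dept_id = e.dept_id
WHERE e.salary > 50000;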

Query Execution
Execution of the query is handled by a set of library methods designed to implement a particular query. For example, the mysql_insert() method is designed to insert data. Likewise, there is a mysql_select() method designed to find and return data matching the WHERE clause. This library of execution methods is located in a variety of source code files under a file of a similar name (e.g., sql_insert.cc or sql_select.cc). All of these methods have as a parameter a thread object that permits the method to access the internal query structure and eases execution. Results from each of the execution methods are returned using the network communication pathways library. The query execution library methods are clearly implemented using the interpretative model of query execution.

Query Cache
While not its own subsystem, the query cache should be considered a vital part of the query optimization and execution subsystem. The query cache is a marvelous invention that caches not only the query structure but also the query results themselves. This enables the system to recognize frequently used queries and shortcut the entire query optimization and execution stages altogether. This is another of the technologies that is unique to MySQL: other database systems cache queries, but no others cache the actual results. As you can appreciate, the query cache must also allow for situations where the results are “dirty” in the sense that something has changed since the last time the query was run (e.g., an INSERT, UPDATE, or DELETE was run against the base table), so the cached queries may occasionally need to be purged.
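
The query cache can be inspected and maintained through standard server variables and status counters; a minimal sketch:

-- Is the cache enabled, and how large is it?
SHOW VARIABLES LIKE 'query_cache%';

-- Hit, miss and purge counters
SHOW GLOBAL STATUS LIKE 'Qcache%';

FLUSH QUERY CACHE;    -- defragments the cache
RESET QUERY CACHE;    -- empties it entirely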

2) Buffer Manager/Cache and Buffers
The caching and buffers subsystem is responsible for ensuring that the most frequently used data (or structures, as you will see) are available in the most efficient manner possible. In other words, the data must be resident or ready to read at all times. The caches dramatically increase the response time for requests for that data because the data is in memory and thus no additional disk access is necessary to retrieve it. The cache subsystem was created to encapsulate all of the caching and buffering into a loosely coupled set of library functions. Although you will find the caches implemented in several different source code files, they are considered part of the same subsystem.
A number of caches are implemented in this subsystem. Most of the cache mechanisms use the same or similar concept of storing data as structures in a linked list. The caches are implemented in different portions of the code to tailor the implementation to the type of data that is being cached.

Let’s look at each of the caches:
Table Cache
The table cache was created to minimize the overhead in opening, reading, and closing tables (the .FRM files on disk). For this reason, the table cache is designed to store metadata about the tables in memory. This makes it much faster for a thread to read the schema of the table without having to reopen the file every time. Each thread has its own list of table cache structures. This permits the threads to maintain their own views of the tables so that if one thread is altering the schema of a table (but has not committed the changes) another thread may use that table with the original schema. The structure used is a simple one that includes all of the metadata information for a table. The structures are stored in a linked list in memory and associated with each thread.
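
Whether the table cache is large enough can be judged from how often the server must open table files from scratch; a steadily climbing Opened_tables counter suggests raising table_open_cache (named table_cache in older releases):

SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL STATUS LIKE 'Opened_tables';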

Record Cache
The record cache was created to enhance sequential reads from the storage engines. Thus the record cache is usually only used during table scans. It works like a read-ahead buffer by retrieving a block of data at a time, thus resulting in fewer disk accesses during the scan. Fewer disk accesses generally equates to improved performance. Interestingly, the record cache is also used in writing data sequentially by writing the new (or altered) data to the cache first and then writing the cache to disk when full. In this way write performance is improved as well. This sequential behavior (called locality of reference) is the main reason the record cache is most often used with the MyISAM storage engine, although it is not limited to MyISAM. The record cache is implemented in an agnostic manner that doesn’t interfere with the code used to access the storage engine API. Developers don’t have to do anything to take advantage of the record cache as it is implemented within the layers of the API.
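
In terms of tuning, the sequential read-ahead behaviour described here is governed in practice by per-session read buffers; assuming a reasonably standard MySQL configuration, their sizes can be checked as follows:

-- Buffer for sequential table scans, and for reading rows in sorted order
SHOW VARIABLES LIKE 'read_buffer_size';
SHOW VARIABLES LIKE 'read_rnd_buffer_size';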

Key Cache
The key cache is a buffer for frequently used index data. In this case, it is a block of data for the index file (B-tree) and is used exclusively for MyISAM tables (the .MYI files on disk). The indexes themselves are stored as linked lists within the key cache structure. A key cache is created when a MyISAM table is opened for the first time. The key cache is accessed on every index read: if an index block is found in the cache, it is read from there; otherwise, a new index block must be read from disk and placed into the cache. However, the cache has a limited size, tunable via the key_buffer_size configuration variable (key_cache_block_size controls the size of the individual cache blocks). Thus not all blocks of the index file will necessarily fit into memory.
So how does the system keep track of which blocks have been used?
The cache implements a monitoring system to keep track of how frequently the index blocks are used. The key cache has been implemented to keep track of how “warm” the index blocks are. Warm in this case refers to how many times the index block has been accessed over time. Values for warm include BLOCK_COLD, BLOCK_WARM, and BLOCK_HOT. As the blocks cool off and new blocks become warm, the cold blocks are purged and the warm blocks added. This strategy is a least recently used (LRU) page-replacement strategy (the same algorithm used for virtual memory management and disk buffering in operating systems) that has been proven to be remarkably efficient even in the face of much more sophisticated page-replacement algorithms. In a similar way, the key cache keeps track of the index blocks that have changed (called getting “dirty”). When a dirty block is purged, its data is written back to the index file on disk before being replaced. Conversely, when a clean block is purged it is simply removed from memory.
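
In practice the total key cache size is set with key_buffer_size, and its effectiveness can be estimated from the ratio of Key_reads (disk reads) to Key_read_requests (cache lookups). Indexes can also be preloaded sequentially, which is faster than loading blocks on demand; the table name employees here is hypothetical and must be a MyISAM table:

SHOW VARIABLES LIKE 'key_buffer_size';
SHOW GLOBAL STATUS LIKE 'Key_read%';   -- Key_reads / Key_read_requests = miss ratio

-- Preload the table's indexes into the key cache
LOAD INDEX INTO CACHE employees;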

Privilege Cache
The privilege cache is used to store grant data on a user account. This data is stored in the same manner as an access control list (ACL), which lists all of the privileges a user has for an object in the system. The privilege cache is implemented as a structure stored in a first in, last out (FILO) hash table. Data for the cache is gathered when the grant tables are read during user authentication and initialization. It is important to store this data in memory as it saves a lot of time reading the grant tables.
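
One practical consequence: GRANT and REVOKE update this cache automatically, but if the grant tables are ever modified directly with INSERT or UPDATE statements, the in-memory privilege data must be reloaded by hand:

-- Reload the grant tables into the privilege cache
FLUSH PRIVILEGES;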

Hostname Cache
The hostname cache is another of the helper caches, like the privilege cache. It too is implemented as a stack of structures. It contains the hostnames of all the connections to the server. It may seem surprising, but this data is frequently requested, and is therefore in high demand and a candidate for a dedicated cache.
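
The hostname cache can likewise be cleared manually, which is useful after DNS changes or when a host has been blocked after too many connection errors:

FLUSH HOSTS;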

Miscellaneous
A number of other small cache mechanisms are implemented throughout the MySQL source code. One example is the join buffer cache used during complex join operations. For example, some join operations require comparing one tuple to all the tuples in the second table. A cache in this case can store the tuples read so that the join can be implemented without having to reread the second table into memory multiple times.
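
The size of that join buffer is controlled per session by join_buffer_size; it is allocated once per joined table when no index can be used for the join. A minimal sketch (the 4MB figure is an arbitrary example):

SHOW VARIABLES LIKE 'join_buffer_size';
SET SESSION join_buffer_size = 4 * 1024 * 1024;   -- raise it for one large unindexed join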

3) The Storage Manager
The storage manager interfaces with the operating system to write data to disk efficiently. Because the storage functions reside in a separate subsystem, the MySQL storage engine operates at a level of abstraction away from the operating system. The storage manager writes to disk all of the data in the user tables, indexes, and logs, as well as the internal system data.

4) The Transaction Manager
The function of the transaction manager is to facilitate concurrency in data access. This subsystem provides a locking facility to ensure that multiple simultaneous users access the data in a consistent way, without corrupting or damaging it. Transaction control takes place via the lock manager subcomponent, which places and releases locks on the various objects being used in a transaction.

5) The Recovery Manager
The recovery manager's job is to keep copies of data for later retrieval, in case of data loss. It also logs commands that modify the data, and other significant events inside the database. So far, only the InnoDB and BDB table handlers provide recovery capability.

Subsystem Interaction and control Flow
The query engine requests data to be read or written via the buffer manager in order to satisfy a user's query. It depends on the transaction manager for locking of the data so that concurrency is ensured. To perform table creation and drop operations, the query engine accesses the storage manager directly, bypassing the buffer manager, to create or delete the files in the file system.
The buffer manager caches data from the storage manager for efficient retrieval by the query engine. It depends on the transaction manager to check the locking status of the data before it performs any modification.
The transaction manager depends on the query cache and the storage manager to place locks on data in memory and in the file system.
The recovery manager uses the storage manager to store command/event logs and backups of the data in the file system. It depends on the transaction manager to obtain locks on the log files being written. The recovery manager also needs to use the buffer manager during recovery from crashes.
The storage manager depends on the operating system's file system for persistent storage and retrieval of data. It depends on the transaction manager to obtain locking status information.