Tuesday, 30 June 2015

SQL Server Architecture

All the information is consolidated from technet, msdn & different blogs.

In brief about MS SQL Server Architecture


MS SQL Server Architecture Diagram


First will explain about the each components and then about Communication between client and database Server-Network Flow in SQL Server.

Pages and Extents

Database Pages:
The fundamental unit of data storage in SQL Server is the page.

Page size is 8kb means 128 pages = 1 MB.
Page starts with the header of 96 bytes that is used to store page number, page type, the amount of free space on the page, and the object id that owns the page. The maximum size of a single row on a page is 8060 bytes. But this restriction is relaxed for tables which are having varchar, nvarchar, Varbinary, Text or Image columns.

ROW_OVERFLOW_DATA allocation unit:
When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit.
This is done whenever an insert or update operation increases the total size of the row beyond the 8060 byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page.

EXTENT in a database system:
An extent is eight physically contiguous pages, or 64 KB means 16 Extents= 1 MB.
To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data.

SQL Server has two types of extents

- Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
- Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.
Different Types of Pages available:

- Data: Stores all data except those data types stored by text and image pages.
- Index: Stores index data.
- Text and Image: Stores TEXT, NTEXT, IMAGE, VARCHAR(MAX), VARBINARY(MAX) and XML data types. In addition these pages also store variable length columns (VARCHAR, VARBINARY, SQL_VARIANT data types) once they exceed 8KB in size.
- IAM (Index Allocation Map): Information about extents used by a table or index per allocation unit.
- GAM (Global Allocation Map):
- Extents have been allocated: 1 – Free space 0 – No space
- SGAM (Shared Global Allocation Map ):
Mixed Extents have been allocated: 1 – Free Space + Mixed Extent and 0 – No space
Each GAM / SGAM covers 64000 extents – 4 GB
- PFS (Page Free Space): Percentage of free space available in each page in an extent.
- DCM (Differential Changed Map): This tracks the extents that have changed since the last BACKUP DATABASE statement. 1 – Modified, 0 – Not modified
- BCM (Bulk Changed Map): This tracks the extents that have been modified by bulk logged operations since the last BACKUP LOG statement. 1 – Modified, 0 – Not modified (Used only in bulk logged Recovery model)

In each data file pages are arranged like below:

Along with that we have three different data pages
Data
Index
Text/ Image (LOB, ROW_OVERFLOE, XML)
Note: Log files do not contain pages; they contain a series of log records. Files and File Groups

About “Files” or “File Groups” in SQL Server Database:

Files: There are three types:

Primary - The recommended file name extn for primary data files is .mdf.The primary data file contains the startup information for the database and points to the other files in the database. User data and objects can be stored in this file or in secondary data files. Every database has one primary data file.
Secondary – The recommended file name extn for secondary data files is .ndf.Secondary data files are optional, are user-defined, and store user data. Secondary files can be used to spread data across multiple disks by putting each file on a different disk drive. Additionally, if a database exceeds the maximum size for a single Windows file, you can use secondary data files so the database can continue to grow.
Log files – recommended file name extn for transaction logs is .ldf.
The transaction log files hold the log information that is used to recover the database. There must be at least one log file for each database.
File Groups: There are two:

Primary File Group – All system tables
User Defined – Depends
All secondary files and user defined file groups are created to optimize the data access and for partitioning the tables.


Advantages:
- Using filegroups, you can explicitly place database objects into a particular set of database files. For example, you can separate tables and their nonclustered indexes into separate filegroups. This can improve performance, because modifications to the table can be written to both the table and the index at the same time. This can be especially useful if you are not using striping with parity (RAID-5).
- Another advantage of filegroups is the ability to back up only a single filegroup at a time. This can be extremely useful for a VLDB, because the sheer size of the database could make backing up an extremely time-consuming process.
- Yet another advantage is the ability to mark the filegroup and all data in the files that are part of it as either read-only or read-write.

Disadvantages
- The first is the administration that is involved in keeping track of the files in the filegroup and the database objects that are placed in them.
- The other is that if you are working with a smaller database and have RAID-5 implemented, you may not be improving performance.
Transaction log Architecture

Transaction log Architecture is divided into two types,
Logical Architecture and Physical Architecture.
Transaction log Logical Architecture:

Each log record is identified by a unique number called LSN (Log Sequence Number). A log record contains the LSN, TransactionID to which it belongs and data modification record.
Log records for data modifications record either the logical operation performed or they record the before and after images of the modified data. The before image is a copy of the data before the operation is performed; the after image is a copy of the data after the operation has been performed. The steps to recover an operation depend on the type of log record:

- Logical operation logged
- To roll the logical operation forward, the operation is performed again.
- To roll the logical operation back, the reverse logical operation is performed.
- Before and after image logged
- To roll the operation forward, the after image is applied.
- To roll the operation back, the before image is applied.
Many types of operations are recorded in the transaction log. These operations include:

- The start and end of each transaction.
- Every data modification (insert, update, or delete). This includes changes by system stored procedures or data definition language (DDL) statements to any table, including system tables.
- Every extent and page allocation or deallocation.
- Creating or dropping a table or index.
Rollback operations are also logged. Each transaction reserves space on the transaction log to make sure that enough log space exists to support a rollback that is caused by either an explicit rollback statement or if an error is encountered. The amount of space reserved depends on the operations performed in the transaction, but generally it is equal to the amount of space used to log each operation. This reserved space is freed when the transaction is completed. The section of the log file from the first log record that must be present for a successful database-wide rollback to the last-written log record is called the active part of the log, or the active log. This is the section of the log required to a full recovery of the database. No part of the active log can ever be truncated. The log sequence number (LSN) of this first log record is known as the minimum recovery LSN (MinLSN).

Transaction Log Physical Architecture:
The transaction log is used to guarantee the data integrity of the database and for data recovery.
The SQL Server Database Engine divides each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file.
The only time virtual log files affect system performance is if the log files are defined by small size and growth_increment values. If these log files grow to a large size because of many small increments, they will have lots of virtual log files. This can slow down database startup and also log backup and restore operations.

The transaction log is a wrap-around file. For example, consider a database with one physical log file divided into four virtual log files. When the database is created, the logical log file begins at the start of the physical log file. New log records are added at the end of the logical log and expand toward the end of the physical log. Log truncation frees any virtual logs whose records all appear in front of the minimum recovery log sequence number (MinLSN). The MinLSN is the log sequence number of the oldest log record that is required for a successful database-wide rollback. The transaction log in the example database would look similar to the one in the following illustration.



When the end of the logical log reaches the end of the physical log file, the new log records wrap around to the start of the physical log file.


This cycle repeats endlessly, as long as the end of the logical log never reaches the beginning of the logical log. If the old log records are truncated frequently enough to always leave sufficient room for all the new log records created through the next checkpoint, the log never fills. However, if the end of the logical log does reach the start of the logical log, one of two things occurs:
- If the FILEGROWTH setting is enabled for the log and space is available on the disk, the file is extended by the amount specified in the growth_increment parameter and the new log records are added to the extension. For more information about the FILEGROWTH setting, see ALTER DATABASE File and Filegroup Options (Transact-SQL).
- If the FILEGROWTH setting is not enabled, or the disk that is holding the log file has less free space than the amount specified in growth_increment, an 9002 error is generated.
If the log contains multiple physical log files, the logical log will move through all the physical log files before it wraps back to the start of the first physical log file.

Checkpoints in SQL Server database:
Checkpoints flush dirty data pages from the buffer cache of the current database to disk. This minimizes the active portion of the log that must be processed during a full recovery of a database. During a full recovery, the following types of actions are performed:
- The log records of modifications not flushed to disk before the system stopped are rolled forward.
- All modifications associated with incomplete transactions, such as transactions for which there is no COMMIT or ROLLBACK log record, are rolled back.

Before a database backup, the Database Engine automatically performs a checkpoint so that all changes to the database pages are contained in the backup. Also, stopping a server issues a checkpoint in each database on the server.

Active Log:
The section of the log file from the MinLSN to the last-written log record is called the active portion of the log, or the active log. This is the section of the log required to do a full recovery of the database. No part of the active log can ever be truncated. All log records must be truncated from the parts of the log before the MinLSN.

Write Ahead Transaction Log:
SQL Server uses a write-ahead log (WAL), which guarantees that no data modifications are written to disk before the associated log record is written to disk.
Data modifications are not made directly to disk, but are made to the copy of the page in the buffer cache. The modification is not written to disk until a checkpoint occurs in the database. A page modified in the cache, but not yet written to disk, is called a dirty page. The internal process that actually goes on:

- Copy of the data pages are pulled and placed in buffer cache
- Applied the operation on the pages that are on buffer cache
- Write the log record details (Pages modified) to Disk
- Write / flush /apply the page to the disk
If step 4 happens before the step 3 then rollback is impossible. SQL Server takes the responsibility of writing the log details to disk before flushing the dirty pages.

Log Truncation
Log truncation is essential to keep the log from filling. Log truncation deletes inactive virtual log files from the logical transaction log of a SQL Server database, freeing space in the logical log for reuse by the physical transaction log. If a transaction log were never truncated, it would eventually fill all the disk space that is allocated to its physical log files. However, before the log can be truncated, a checkpoint operation must occur. A checkpoint writes the current in-memory modified pages (known as dirty pages) and transaction log information from memory to disk. When the checkpoint is performed, the inactive portion of the transaction log is marked as reusable. Thereafter, the inactive portion can be freed by log truncation. The following illustrations show a transaction log before and after truncation. The first illustration shows a transaction log that has never been truncated. Currently, four virtual log files are in use by the logical log. The logical log starts at the front of the first virtual log file and ends at virtual log 4. The MinLSN record is in virtual log 3. Virtual log 1 and virtual log 2 contain only inactive log records. These records can be truncated. Virtual log 5 is still unused and is not part of the current logical log.



The second illustration shows how the log appears after being truncated. Virtual log 1 and virtual log 2 have been freed for reuse. The logical log now starts at the beginning of virtual log 3. Virtual log 5 is still unused, and it is not part of the current logical log.



Log truncation occurs automatically after the following events, except when delayed for some reason:

- Under the simple recovery model, after a checkpoint.
- Under the full recovery model or bulk-logged recovery model, after a log backup, if a checkpoint has occurred since the previous backup.
Log truncation can be delayed by a variety of factors. In the event of a long delay in log truncation, the transaction log can fill up. For information, see Factors That Can Delay Log Truncation and Troubleshoot a Full Transaction Log (SQL Server Error 9002)

Memory Management Architecture

SQL Server Memory Architecture:
SQL Server dynamically acquires and frees memory as required. Typically, an administrator need not have to specify how much memory should be allocated to SQL Server, although the option still exists and is required in some environments.
SQL Server supports Address Windowing Extensions (AWE) allowing use of physical memory over 4 gigabytes (GB) on 32-bit versions of Microsoft Windows operating systems. This feature is deprecated from Dinali 2012.
SQL Server tries to reach a balance between two goals:

- Keep the buffer pool from becoming so big that the entire system is low on memory.
- Minimize physical I/O to the database files by maximizing the size of the buffer pool.
Buffer Management:

A buffer is a 8kb size in memory. To reduce the I/O operations from database to disk buffer manager use the buffer cache. BM gets the data from database to buffer cache and modifies the data and the modified page is sent back to the disk The buffer manager only performs reads and writes to the database. Other file and database operations such as open, close, extend, and shrink are performed by the database manager and file manager components.
Min and Max memory configuration options:
The min server memory and max server memory configuration options establish upper and lower limits to the amount of memory used by the buffer pool of the Microsoft SQL Server Database Engine. The buffer pool starts with only the memory required to initialize. As the Database Engine workload increases, it keeps acquiring the memory required to support the workload. The buffer pool does not free any of the acquired memory until it reaches the amount specified in min server memory. Once min server memory is reached, the buffer pool then uses the standard algorithm to acquire and free memory as needed. The only difference is that the buffer pool never drops its memory allocation below the level specified in min server memory, and never acquires more memory than the level specified in max server memory.

Thread and Task Architecture
How SQL Server handles Batch or Task Scheduling? Each instance must handle potentially thousands of concurrent requests from users. Instances of SQL Server use Microsoft Windows threads, or if configured, they use fibers, to manage these concurrent tasks efficiently. This includes one or more threads for each server Net-Library, a network thread to handle network I/O, and a signal thread for communicating with the Service Control Manager.

Understanding Scheduling: Each instance of SQL Server has an internal layer (SQL OS/Kernel) that implements an environment similar to an operating system. This internal layer is used for scheduling and synchronizing concurrent tasks without having to call the Windows kernel.
Connection: A connection is established when the user is successfully logged in. The user can then submit one or more Transact-SQL statements for execution. A connection is closed when the user explicitly logs out, or the connection is terminated.
Batch: An SQL batch is a set of one or more Transact-SQL statements sent from a client to an instance of SQL Server for execution.
Task: A task represents a unit of work that is scheduled by SQL Server. A batch can map to one or more tasks.
Windows thread: Each Windows thread represents an independent execution mechanism.
Fiber: A fiber is a lightweight thread that requires fewer resources than a Windows thread. One Windows thread can be mapped to many fibers.
Worker thread: The worker thread represents a logical thread (Task) in SQL Server that is internally mapped (1:1) to either a Windows thread or, if lightweight pooling is turned ON, to a fiber. The mapping can be done till the free worker threads available. (Parameter: Max worker Threads)
Thread and Fiber Execution: Microsoft Windows uses a numeric priority system that ranges from 1 through 31 to schedule threads for execution. Zero is reserved for operating system use. When several threads are waiting to execute, Windows dispatches the thread with the highest priority. By default, each instance of SQL Server is a priority of 7, which is referred to as the normal priority. The priority boost configuration option can be used to increase the priority of the threads from an instance of SQL Server to 13. This is referred to as high priority. The performance of any instances running at normal priority can be adversely affected. Also, the performance of other applications and components on the server can decline if priority boost is turned on.

Query Processing Architecture
Network Flow- SQL Server Architecture:
Summary Level
These steps are defined as follows:
1. The user selects an option in a client application. This option calls a function in the client application that generates a query that is sent to SQL Server. The application uses a database access library to send the query in a way SQL Server can understand.
2. The database library transforms the original request into a sequence of one or more Transact-SQL statements to be sent to SQL Server. These statements are encapsulated in one or more Tabular Data Stream (TDS) packets and passed to the database network library to be transferred to the server computer.
3. The database network library uses the network library available in the client computer to repackage the TDS packets as network protocol packets.
4. The network protocol packets are sent to the server computer network library across the network, where they are unwrapped from their network protocol.
5. The extracted TDS packets are sent to Open Data Services (ODS), where the original query is extracted.
6. ODS sends the query to the relational engine, where the query is optimized and executed in collaboration with the storage engine.
7. The relational engine creates a result set with the final data and sends it to ODS.
8. ODS builds one or more TDS packets to be sent to the client application, and sends them to the server database network library.
9. The server database network library repackages the TDS packets as network protocol packets and sends them across the network to the client computer.
10. The client computer receives the network protocol packets and forwards them to the network libraries where the TDS packets are extracted.
11. The network library sends the TDS packets to the database access library, where these packets are reassembled and exposed as a client result set to the client application.
12. The client application displays information contained in the result sets to the user.


Detailed Level explanation:
When you submit a query to a SQL Server database, a number of processes on the server go to work on that query. The purpose of all these processes is to manage the system such that it will provide your data back to you, or store it, in as timely a manner as possible, whilst maintaining the integrity of the data.

All these processes go through two stages:
1. Relational Engine
2. Storage Engine
At Client:
1. User enter data and click on submit
2. The client database library transforms the original request into a sequence of one or more Transact-SQL statements to be sent to SQL Server. These statements are encapsulated in one or more Tabular Data Stream (TDS) packets and passed to the database network library
3. The database network library uses the network library available in the client computer to repackage the TDS packets as network protocol packets.
4. The network protocol packets are sent to the server computer network library across the network

At Server:
5. The extracted TDS packets are sent to Open Data Services (ODS), where the original query is extracted.
6. ODS sends the query to the relational engine
7. A connection established to the relational engine and assign a SID to the connection

At Relational Engine:
8. Check permissions and determines if the query can be executed by the user associated with the request

9. Query sends to Query Parser

- It checks that the T-SQL is written correctly
- Build a Parse Tree \ Sequence Tree

10. Parse Tree sends to Algebrizer

- Verifies all the columns, objects and data types
- Aggregate Binding (determines the location of aggregates such as GROUP BY, and MAX)
- Builds a Query Processor Tree in Binary Format


11. Query Processor Tree sends to Optimizer

- Based on the query processor tree and Histogram (Statistics) builds an optimized execution plan
- Stores the execution plan into cache and send it to the database engine

At Database Engine:

12. Database engine map a batch into different tasks
13. Each task associated with a process
14. Each process assigned with a Windows Thread or a Windows Fiber. The worker thread takes care of this.
15. The Thread/Fiber send to the execution queue and wait for the CPU time.
16. The Thread/Fiber identifies the table location where the data need to be stored
17. Go to the file header, checks the PFS, GAM and GSAM and go to the correct page
18. Verifies the page is not corrupted using Torn page Detection / Check SUM and writes the data
19. If require allocates new pages and stores data on it. Once the data is stored/updated/added in a page, it updates the below locations
- PFS
- Page Header – Checksum / Torn Page Detection (Sector info)
- BCM
- DCM

20. In this process the
- Memory manager take care of allocating buffers, new pages etc,
- Lock manager take care of allocating appropriate locks on the objects/pages and releasing them when task completed
- Thread Scheduler: schedules the threads for CPU time
- I/O manager: Establish memory bus for read/write operations from memory to disk and vice versa
- Deadlock\Resource\Scheduler Monitor: Monitors the processes

21. If that is a DML operation, it picks the appropriate page from disk and put the page in Memory.
22. While the page is available on Memory based on the ISOLATION LEVEL an shared / exclusive / update / Schema lock issued on that page.
23. Once the page is modified at Memory, that means once the transaction completed the transactional operation logged into log file (.ldf) to the concerned VLF.
24. Here we should understand that only the operation (T-SQL statements) logged into ldf file. The modified page waits in memory till check point happens. These pages are know as dirty pages as the page data is differ in between the page on Disk and Memory.
25. Once the checkpoint happens the page will be written back to the disk.
26. Once the process is completed the result set is submitted to the relational engine and follow the same process for sending back the result set to client application.
27. The connection will be closed and the SID is removed


Network Protocols

Types of network protocols:
SQL Server can be configured to support multiple protocols simultaneously, coming from different clients. Each client connects to SQL Server with a single protocol. If the client program does not know which protocols SQL Server is listening on, you can configure the client to attempt multiple protocols sequentially. In Chapter 3, I'll discuss how you can configure your machine to use one or more of the available protocols. The following protocols are available:
- Shared Memory: The simplest protocol to use, with no configurable settings. Clients using the Shared Memory protocol can connect only to a SQL Server instance running on the same computer, so this protocol is not useful for most database activity. Use this protocol for troubleshooting when you suspect that the other protocols are configured incorrectly. Clients using MDAC 2.8 or earlier cannot use the Shared Memory protocol. If such a connection is attempted, the client is switched to the Named Pipes protocol.
- Named Pipes: A protocol developed for local area networks (LANs). A portion of memory is used by one process to pass information to another process, so that the output of one is the input of the other. The second process can be local (on the same computer as the first) or remote (on a networked computer).
- TCP/IP: The most widely used protocol over the Internet. TCP/IP can communicate across interconnected networks of computers with diverse hardware architectures and operating systems. It includes standards for routing network traffic and offers advanced security features. Enabling SQL Server to use TCP/IP requires the most configuration effort, but most networked computers are already properly configured.
- Virtual Interface Adapter (VIA): A protocol that works with VIA hardware. This is a specialized protocol; configuration details are available from your hardware vendor.

7 comments:

  1. Wһat's up it's mе, I am alsⲟ visiting this web site
    daily, thos web site іs tгuly ցood and tһe visitors are in fact sharibg faastidious thoughts.

    ReplyDelete
  2. I like the valuable information you provide in your articles.
    I'll bookmark your blog and check again here regularly.
    I am quite certain I'll learn many new stuff right here! Good luck for the next!

    ReplyDelete
  3. Thanks for sharing this Informative content. Well explained. Got to learn new things from your Blog on. SQL server dba Online Training

    ReplyDelete
  4. Really good information to show through this blog. I really appreciate you for all the valuable information that you are providing us through your blog. SQL server dba Online Course

    ReplyDelete
  5. Simple and Clear Explanation !

    ReplyDelete
  6. I truly appreciate the time and work you put into sharing your knowledge. I found this topic to be quite effective and beneficial to me. Thank you very much for sharing. Continue to blog.

    Data Engineering Services 

    AI & ML Solutions

    Data Analytics Services

    Data Modernization Services

    ReplyDelete