Thursday 17 March 2016

The Dangers of Concurrency

First of all, it is important to understand concurrency. Database concurrency ensures that when multiple operations are occurring at once, the final result is still in agreement — that they concur. This agreement typically depends on a set of rules and constraints that coordinate the behaviour of transactions, making sure that different operations play nicely together.

Concurrency control theory has two classifications for the methods of instituting concurrency control:

- Pessimistic concurrency control
A system of locks prevents users from modifying data in a way that affects other users. After a user performs an action that causes a lock to be applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This is called pessimistic control because it is mainly used in environments where there is high contention for data, where the cost of protecting data with locks is less than the cost of rolling back transactions if concurrency conflicts occur.

- Optimistic concurrency control
In optimistic concurrency control, users do not lock data when they read it. When a user updates data, the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction and starts over. This is called optimistic because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction is lower than the cost of locking data when read.
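
As a minimal sketch of the optimistic approach in T-SQL, the update below only succeeds if the row still holds the value that was originally read; the dbo.Accounts table, its columns, and the values are hypothetical and only serve to illustrate the pattern:

-- Optimistic concurrency sketch: re-check the originally read value before writing
DECLARE @OriginalBalance MONEY, @RowsAffected INT;

SELECT @OriginalBalance = Balance
FROM dbo.Accounts
WHERE AccountID = 42;

UPDATE dbo.Accounts
SET Balance = @OriginalBalance - 100
WHERE AccountID = 42
  AND Balance = @OriginalBalance;   -- affects no rows if someone changed the data in between

SET @RowsAffected = @@ROWCOUNT;
IF @RowsAffected = 0
    RAISERROR('The row was changed by another user; roll back and retry.', 16, 1);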

Concurrency Effects
Users modifying data can affect other users who are reading or modifying the same data at the same time. These users are said to be accessing the data concurrently. If a data storage system has no concurrency control, users could see the following side effects:
- Lost updates
- Uncommitted dependency (dirty read)
- Inconsistent analysis (nonrepeatable read)
- Phantom reads

Lost Updates
Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of the other transactions. The last update overwrites updates made by the other transactions, which results in lost data.

In other words, a lost update occurs when two processes read the same data, both manipulate it, and then both try to write the new value back. The second process may overwrite the first update completely.
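
A small T-SQL sketch of the vulnerable read-then-write pattern, again using the hypothetical dbo.Accounts table; if two sessions run this at the same time, the second UPDATE silently overwrites the first one's change:

-- Read the current value ...
DECLARE @Balance MONEY;
SELECT @Balance = Balance FROM dbo.Accounts WHERE AccountID = 42;

-- ... another session may read and update the same row right here ...

-- ... and write back a value based on the stale read: the other session's update is lost
UPDATE dbo.Accounts
SET Balance = @Balance - 100
WHERE AccountID = 42;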

Uncommitted Dependency/Dirty Reads
Uncommitted dependency (Dirty read) occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row.
A dirty read takes no notice of any lock taken by another process. The read is officially “dirty” when it reads data that is uncommitted. This can become problematic if the uncommitted transaction fails or for some other reason is rolled back.

In other words, a dirty read occurs when a process reads uncommitted data. If one process has changed data but not yet committed the change, another process reading that data will see it in an inconsistent state.
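
A small sketch of a dirty read across two sessions, using the hypothetical dbo.Accounts table; the second session reads a value that is later rolled back:

-- Session 1: modify data but leave the transaction open
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 42;

-- Session 2: read the uncommitted value (a dirty read)
SELECT Balance
FROM dbo.Accounts WITH (NOLOCK)   -- or run under READ UNCOMMITTED isolation
WHERE AccountID = 42;

-- Session 1: roll back; Session 2 has seen a value that never officially existed
ROLLBACK TRANSACTION;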

Inconsistent Analysis (Non-repeatable Read)
Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time. Inconsistent analysis is similar to uncommitted dependency in that another transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, the data read by the second transaction was committed by the transaction that made the change. Also, inconsistent analysis involves multiple reads (two or more) of the same row, and each time the information is changed by another transaction; thus, the term non-repeatable read.

In other words, a read is non-repeatable if a process can get different values when reading the same data twice within the same transaction. This can happen when another process changes the data in between the two reads.
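
A sketch of a non-repeatable read under the default READ COMMITTED isolation level, again with the hypothetical dbo.Accounts table:

-- Session 1:
BEGIN TRANSACTION;
SELECT Balance FROM dbo.Accounts WHERE AccountID = 42;   -- first read

-- Session 2 commits a change in between:
-- UPDATE dbo.Accounts SET Balance = Balance + 500 WHERE AccountID = 42;

SELECT Balance FROM dbo.Accounts WHERE AccountID = 42;   -- second read returns a different value
COMMIT TRANSACTION;
-- Under REPEATABLE READ (or stricter) isolation the Shared Locks are held,
-- so Session 2 would be blocked until Session 1 finishes.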

Phantom Reads
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read as a result of a deletion by a different transaction. Similarly, the transaction's second or succeeding read shows a row that did not exist in the original read as the result of an insertion by a different transaction.

In other words, a phantom read occurs when membership in a set changes: two SELECT statements using the same predicate within the same transaction return a different number of rows.
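
A sketch of a phantom read with the same hypothetical table; the second count sees a row that did not exist during the first count:

-- Session 1:
BEGIN TRANSACTION;
SELECT COUNT(*) FROM dbo.Accounts WHERE Balance > 1000;   -- e.g. returns 10

-- Session 2 inserts a matching row and commits:
-- INSERT INTO dbo.Accounts (AccountID, Balance) VALUES (99, 5000);

SELECT COUNT(*) FROM dbo.Accounts WHERE Balance > 1000;   -- now returns 11: a phantom row
COMMIT TRANSACTION;
-- The SERIALIZABLE isolation level (or the HOLDLOCK hint) locks the key range and prevents phantoms.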

Wednesday 16 March 2016

Lock Hints in SQL Server

Locking Hints and Examples are as follows:

ROWLOCK
Use row-level locks when reading or modifying data.

PAGLOCK
Use page-level locks when reading or modifying data.

TABLOCK
Use a table lock when reading or modifying data.

TABLOCKX
Use an exclusive table lock when reading or modifying data.

UPDLOCK
UPDLOCK reads data without blocking other readers, and lets you update it later with the assurance that the data has not changed since you last read it.

XLOCK
Use exclusive locks instead of shared locks while reading a table, and hold the locks until the end of the statement or transaction.

HOLDLOCK
Hold locks until the completion of the transaction, instead of releasing them as soon as the required table, row, or data page is no longer needed. HOLDLOCK is equivalent to SERIALIZABLE.

NOLOCK
This takes no shared locks and ignores exclusive locks held by other transactions, which means uncommitted (dirty) rows can be read. It applies only to SELECT statements and cannot be used with INSERT, UPDATE, or DELETE.
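
Lock hints are supplied in the WITH clause of a statement. A couple of illustrative statements against a hypothetical dbo.Accounts table might look like this:

-- Read without taking shared locks (dirty reads possible)
SELECT Balance FROM dbo.Accounts WITH (NOLOCK) WHERE AccountID = 42;

-- Read with an update lock held until the end of the transaction,
-- so the row cannot change before the later UPDATE
BEGIN TRANSACTION;
SELECT Balance FROM dbo.Accounts WITH (UPDLOCK, HOLDLOCK) WHERE AccountID = 42;
UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 42;
COMMIT TRANSACTION;

-- Force row-level locks for an update
UPDATE dbo.Accounts WITH (ROWLOCK) SET Balance = Balance + 10 WHERE AccountID = 42;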

Monday 7 March 2016

Lock Escalation in SQL Server

Today let's talk about Lock Escalations in SQL Server. Lock Escalation is an optimization technique used by SQL Server to control the number of locks that are held within its Lock Manager. Let's start with a description of the so-called Lock Hierarchy in SQL Server, because that hierarchy is the reason why the concept of Lock Escalation exists in a relational database like SQL Server.

Lock Hierarchy
The following picture shows you the lock hierarchy used by SQL Server:

As you can see from the picture, the lock hierarchy starts at the database level and goes down to the row level. You always have a Shared Lock (S) at the database level itself. When your query is connected to a database (e.g. USE MyDatabase), that Shared Lock prevents the database from being dropped or a backup from being restored over it. Underneath the database level, you take locks on the table, on the pages, and on the records when you are performing an operation.
When you execute a SELECT statement, you take an Intent Shared Lock (IS) at the table and page level, and a Shared Lock (S) on the record itself. When you perform a data modification statement (INSERT, UPDATE, DELETE), you take an Intent Exclusive or Intent Update Lock (IX or IU) at the table and page level, and an Exclusive or Update Lock (X or U) on the changed records. SQL Server always acquires locks from top to bottom to prevent so-called race conditions, where multiple threads try to acquire locks concurrently within the locking hierarchy. Now imagine how the lock hierarchy would look when you perform a DELETE operation against 20,000 rows of a table. Let's assume that a row is 400 bytes long, which means that 20 records fit onto one 8 KB page:

You have one S lock on the database, 1 IX lock on the table, 1,000 IX locks on the pages (the 20,000 records are spread across 1,000 pages), and finally 20,000 X locks on the records themselves. In sum you have acquired 21,002 locks for the DELETE operation. Every lock in SQL Server needs 96 bytes of memory, so we are looking at roughly 1.9 MB of locks just for one simple query. This will not scale indefinitely when you run multiple queries in parallel. For that reason SQL Server implements the so-called Lock Escalation.
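
If you want to see those locks for yourself, you can count them with the sys.dm_tran_locks DMV while the transaction is still open; the dbo.Orders table and the date predicate below are just placeholders:

-- Count the locks held by the current session during a large DELETE
BEGIN TRANSACTION;

DELETE FROM dbo.Orders
WHERE OrderDate < '20150101';

SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode;

-- With more than 5,000 affected rows you will typically see a single table-level X lock here,
-- which is the Lock Escalation described in the next section.
ROLLBACK TRANSACTION;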

Lock Escalations
As soon as you have more than 5,000 locks on one level in your locking hierarchy, SQL Server escalates these many fine-granularity locks into one simple coarse-granularity lock. By default SQL Server *always* escalates to the table level. This means that your locking hierarchy from the previous example looks like the following after the Lock Escalation has been successfully performed.

As you can see, you have only one big lock on the table itself. In the case of the DELETE operation, you have one Exclusive Lock (X) at the table level. This hurts the concurrency of your database in a very negative way! Holding an Exclusive Lock at the table level means that no other session is able to access that table any more – every other query will simply block. And when you run your SELECT statement in the Repeatable Read isolation level, you also hold your Shared Locks until the end of the transaction, which means you will get a Lock Escalation as soon as you have read more than 5,000 rows. The result is one Shared Lock on the table itself! Your table is temporarily read-only, because every other data modification on that table will be blocked!
There is also a misconception that SQL Server will escalate from the row level to the page level, and finally to the table level. Wrong! Such a code path doesn't exist in SQL Server! By default SQL Server will *always* escalate directly to the table level; an escalation policy to the page level simply doesn't exist. If your table is partitioned (Enterprise Edition only!), you can configure an escalation to the partition level. But you have to test your data access pattern very carefully here, because a Lock Escalation to the partition level can cause deadlocks. Therefore this option is also not enabled by default.
Since SQL Server 2008 you can also control how SQL Server performs the Lock Escalation – through the ALTER TABLE statement and the property LOCK_ESCALATION. There are 3 different options available:
- TABLE
- AUTO
- DISABLE
-- Controlling Lock Escalation
ALTER TABLE Table_name
SET (LOCK_ESCALATION = AUTO); -- or TABLE or DISABLE
GO
The default option is TABLE, which means that SQL Server *always* performs the Lock Escalation to the table level – even when the table is partitioned. If your table is partitioned, and you want a Partition Level Lock Escalation (because you have tested your data access pattern and it doesn't cause deadlocks), then you can change the option to AUTO. AUTO means that the Lock Escalation is performed to the partition level if the table is partitioned, and otherwise to the table level. And with the option DISABLE you can completely disable Lock Escalation for that specific table. But disabling Lock Escalation is not the best option, because the Lock Manager of SQL Server can then consume a huge amount of memory if you are not very careful with your queries and your indexing strategy.
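
To verify which option is currently in effect, you can query the lock_escalation_desc column of sys.tables; the table name is just the placeholder from the statement above:

-- Check the current Lock Escalation setting of a table
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'Table_name';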

Lock Escalation Thresholds
Lock escalation is triggered when lock escalation is not disabled on the table by using the ALTER TABLE SET LOCK_ESCALATION option, and when any of the following conditions exists:

- A single Transact-SQL statement acquires at least 5,000 locks on a single nonpartitioned table or index.
- A single Transact-SQL statement acquires at least 5,000 locks on a single partition of a partitioned table and the ALTER TABLE SET LOCK_ESCALATION option is set to AUTO.
- The number of locks in an instance of the Database Engine exceeds memory or configuration thresholds.
If locks cannot be escalated because of lock conflicts, the Database Engine periodically triggers lock escalation at every 1,250 new locks acquired.

Conclusion
Lock Escalation in SQL Server is mainly a nightmare. How can you delete more than 5,000 rows from a table without running into Lock Escalations? You can disable Lock Escalation temporarily, but you have to be very careful with that. Another option (the one I'm suggesting) is to perform your DELETE/UPDATE statements in a loop as separate, smaller transactions: DELETE/UPDATE fewer than 5,000 rows per iteration, so that you prevent Lock Escalations. As a very nice side effect your huge transaction will be split into multiple smaller ones, which will also help with Auto Growth issues that you may have with your transaction log.
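
One possible shape of such a loop, sketched against a hypothetical dbo.Orders table, deletes fewer than 5,000 rows per transaction:

-- Delete in small batches so no single statement reaches the escalation threshold
DECLARE @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (4000) FROM dbo.Orders
    WHERE OrderDate < '20150101';

    SET @RowsAffected = @@ROWCOUNT;

    COMMIT TRANSACTION;
END;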