57 posts from: Craig Freedmans WebLog
More on Implicit Conversions
(Indexed 2010-01-20):
Yesterday, a reader posted a question asking me to comment on SQL Server's algorithm for choosing implicit conversions. When SQL Server encounters an expression with mismatched types, it has two options. It can execute the query with an implicit conversion or it can fail with an error. Before digging into the implicit conversion scenario, let's briefly consider the error case.If an implicit conversion is not possible, there are two possible errors that SQL Server can generate. If the two dat...(truncated)...
Correction to my prior post on sys.dm_db_index_operational_stats
(Indexed 2009-07-29):
In this post about the sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats DMVs, I wrote: Another less important difference between these DMVs is that sys.dm_db_index_usage_stats only reports on indexes that have been used at least once since the server was last restarted while sys.dm_db_index_operational_stats reports on all indexes regardless of whether they have been used.The Books Online page for sys.dm_db_index_operational_stats similarly states about the object_id parameter: ...(truncated)...
Maximum Row Size and Query Hints
(Indexed 2009-06-24):
In my last post (yes, that was two months ago), I gave an example of how a query hint could cause a query to fail. In this post, I'll give another example of how query hints can cause problems. As with my last post, this post was inspired by a question submitted by a reader.SQL Server has a documented row size limit of 8060 bytes. This row size limit applies specifically to the fixed width or in-row portion of variable width columns. A web search yields plenty of articles and discussions abo...(truncated)...
Implied Predicates and Query Hints
(Indexed 2009-04-28):
In this post, I want to take a look at how two seemingly unrelated features of SQL Server can interact to cause a problem. The idea for this post came from a question submitted by a reader. Let's begin. Consider the following trivial schema and query:CREATE TABLE T1 (A INT, B INT)CREATE TABLE T2 (A INT, B INT)SELECT *FROM T1 INNER JOIN T2 ON T1.A = T2.AWHERE T1.B = 0OPTION (HASH JOIN)Not surprisingly, this query yields the following plan: |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[A]),...(truncated)...
OPTIMIZED Nested Loops Joins
(Indexed 2009-03-18):
In my past two posts, I explained how SQL Server may add a sort to the outer side of a nested loops join and showed how this sort can significantly improve performance. In an earlier post, I discussed how SQL Server can use random prefetching to improve the performance of a nested loops join. In this post, I'm going to explore one more nested loops join performance feature. I'll use the same database that I used in my two prior posts. Let's start with the following simple query: SELECT SUM(D...(truncated)...
Optimizing I/O Performance by Sorting Part 2
(Indexed 2009-03-04):
In my last post, I discussed how SQL Server can use sorts to transform random I/Os into sequential I/Os. In this post, I'll demonstrate directly how such a sort can impact performance. For the following experiments, I'll use the same 3 Gbyte database that I created last week.The system I'm using to run this test has 8 GBytes of memory. To exaggerate the performance effects and simulate an even larger table that does not fit in main memory, I'm going to adjust the MAX SERVER MEMORY' SP_CONFIGU...(truncated)...
Optimizing I/O Performance by Sorting Part 1
(Indexed 2009-02-25):
In this post from last year, I discussed how random I/Os are slower than sequential I/Os (particularly for conventional rotating hard drives). For this reason, SQL Server often favors query plans that perform sequential scans of an entire table over plans that perform random lookups of only a portion of a table. (See the last example in this post for a simple demonstration.) In other cases, instead of performing a sequential scan, SQL Server introduces a sort operator whose sole purpose is to...(truncated)...
What is the difference between sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats?
(Indexed 2008-10-30):
SQL Server includes two DMVs - sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats - that are extremely useful for monitoring which indexes are used as well as how and when they are used. Both DMVs report similar statistics on information such as the number of scans, seeks, and updates to different indexes. These DMVs are documented in Books Online (see here and here) and a simple Web search reveals numerous other postings about these DMVs. However, in my own search, I did not f...(truncated)...
Random Prefetching
(Indexed 2008-10-07):
In my last post, I explained the importance of asynchronous I/O and described how SQL Server uses sequential read ahead to boost the performance of scans. In this post, I'll discuss how SQL Server uses random prefetching. Let's begin with a simple example of a query plan that performs many random I/Os. As in my prior post, all of the examples in this post use a 1GB scale factor TPC-H database. The following query returns the number of line items associated with each order placed on March 15...(truncated)...
Sequential Read Ahead
(Indexed 2008-09-23):
Balancing CPU and I/O throughput is essential to achieve good overall performance and to maximize hardware utilization. SQL Server includes two asynchronous I/O mechanisms - sequential read ahead and random prefetching - that are designed to address this challenge.To understand why asynchronous I/O is so important, consider the CPU to I/O performance gap. The memory subsystem on a modern CPU can deliver data sequentially at roughly 5 Gbytes per second per core and (depending on how you measure...(truncated)...
Dynamic Partition Elimination Performance
(Indexed 2008-08-22):
In this post on partitioned tables, I mentioned that SQL Server 2008 has a much more efficient implementation of dynamic partition elimination as compared to SQL Server 2005. In response, a reader posted this comment asking how much dynamic partition elimination really costs in SQL Server 2005. While I was sure that the answer is not much, I nonetheless decided to measure it and find out. I wrote the following stored procedure which creates a table with the specified number of partitions:CREAT...(truncated)...
Partitioned Indexes in SQL Server 2008
(Indexed 2008-08-05):
In my last post, I looked at how SQL Server 2008 handles scans on partitioned tables. I explained that SQL Server 2008 treats partitioned tables as tables with a logical index on the partition id column and that SQL Server 2008 implements partition elimination by performing a logical index seek on the partition id column. Specifically, I showed some examples using a heap. In this post, I'll continue this discussion and explore how SQL Server 2008 handles scans and seek on partitioned indexes....(truncated)...
Partitioned Tables in SQL Server 2008
(Indexed 2008-07-15):
In this post, I introduced how SQL Server 2005 implements query plans on partitioned tables. If you've read that post or used partitioned tables, you may recall that SQL Server 2005 uses a constant scan operator to enumerate the list of partition ids that need to be scanned. As a refresher, here is the example from that post showing the plan for scanning a table with four partitions:CREATE PARTITION FUNCTION PF(INT) AS RANGE FOR VALUES (0, 10, 100)CREATE PARTITION SCHEME PS AS PARTITION PF ALL...(truncated)...
Subqueries in BETWEEN and CASE Statements
(Indexed 2008-06-27):
Consider the following query:CREATE TABLE T1 (A INT, B1 INT, B2 INT)CREATE TABLE T2 (A INT, B INT)SELECT *FROM T1WHERE (SELECT SUM(T2.B) FROM T2 WHERE T2.A = T1.A) BETWEEN T1.B1 AND T1.B2Observe that the subquery in this query only needs to be evaluated once for each row of T1. Indeed running on SQL Server 2000, we get the following plan (with the portion of the plan corresponding to the subquery in bold): |--Nested Loops(Inner Join, OUTER REFERENCES:([T2].[A], [Expr1004])) |--Compute Sc...(truncated)...
Implicit Conversions
(Indexed 2008-06-05):
In my last couple of posts, I wrote about how explicit conversions can lead to errors. In this post, I'm going to take a look at some issues involving implicit conversions. SQL Server adds implicit conversions whenever you mix columns, variables, and/or parameters with different (but compatible) data types in a single expression. For example, if you try to compare INT and FLOAT columns, the INT must be converted to a FLOAT. If you write "C_INT = C_FLOAT", SQL Server rewrites this expressions...(truncated)...
Query Processing Presentation
(Indexed 2008-05-15):
Last week, I had the opportunity to talk to the New England SQL Server Users Group. I would like to thank the group for inviting me, Adam Machanic for organizing the event, and Red Gate for sponsoring it. My talk was an introduction to query processing, query execution, and query plans in SQL Server. I've had a request for the slides, so here they are. Enjoy![Image]
Conversion and Arithmetic Errors: Change between SQL Server 2000 and 2005
(Indexed 2008-05-06):
In this post from last week, I gave an example of a query with a conversion where the optimizer pushes the conversion below a join. The result is that the conversion may be run on rows that do not join which could lead to avoidable failures. I ran this query on SQL Server 2005. After I published that post, a reader pointed out to me that my example query generates a different plan on SQL Server 2000:CREATE TABLE T1 (A INT, B CHAR(8))INSERT T1 VALUES (0, '0')INSERT T1 VALUES (1, '1')INSERT T1 ...(truncated)...
Conversion and Arithmetic Errors
(Indexed 2008-04-28):
Let's take a look at a simple query:CREATE TABLE T1 (A INT, B CHAR(8))INSERT T1 VALUES (0, '0')INSERT T1 VALUES (1, '1')INSERT T1 VALUES (99, 'Error')SELECT T1.A, CONVERT(INT, T1.B) AS B_INT FROM T1There is no way to convert the string "Error" into an integer, so it should come as no surprise that this query fails with a conversion error:A B_INT----------- -----------1 1Msg 245, Level 16, State 1, Line 1Conversion failed when converting the varchar value 'Error ' to data ty...(truncated)...
Ranking Functions: RANK, DENSE_RANK, and NTILE
(Indexed 2008-03-31):
In my previous post, I discussed the ROW_NUMBER ranking function which was introduced in SQL Server 2005. In this post, I'll take a look at the other ranking functions - RANK, DENSE_RANK, and NTILE. Let's begin with RANK and DENSE_RANK. These functions are similar - both in functionality and implementation - to ROW_NUMBER. The difference is that while the ROW_NUMBER function assigns a unique (and ascending) value to each row without regard for ties in the ORDER BY values, the RANK and DENSE_...(truncated)...
Ranking Functions: ROW_NUMBER
(Indexed 2008-03-19):
SQL Server 2005 introduced four new functions, ROW_NUMBER, RANK, DENSE_RANK, and NTILE that are collectively referred to as ranking functions. These functions differ from ordinary scalar functions in that the result that they produce for a given row depends on the other rows in the result set. They also differ from aggregate functions in that they produce exactly one output row for each input row. Unlike aggregates they do not collapse a set of rows into a single row. In this post, I'll take...(truncated)...
Halloween Protection
(Indexed 2008-02-27):
In a prior post, I introduced the notion that update plans consist of two parts: a read cursor that identifies the rows to be updated and a write cursor that actually performs the updates. Logically speaking, SQL Server must execute the read cursor and write cursor of an update plan in two separate steps or phases. To put it another way, the actual update of rows must not affect the selection of which rows to update. This problem of ensuring that the write cursor of an update plan does not af...(truncated)...
Maintaining Unique Indexes with IGNORE_DUP_KEY
(Indexed 2008-01-30):
A few months ago, I wrote a post describing how SQL Server maintains unique indexes while avoiding false uniqueness violations. In this post, I'm going to look at how SQL Server maintains unique indexes that were created with the WITH IGNORE_DUP_KEY clause. Normally, if we attempt to insert a duplicate key into a unique index, SQL Server fails the entire statement, rolls back any changes made by that statement, and returns an error. However, if we attempt to insert a duplicate key into a uniq...(truncated)...
Partial Aggregation
(Indexed 2008-01-18):
In some of my past posts, I've discussed how SQL Server implements aggregation including the stream aggregate and hash aggregate operators. I also used hash aggregation as an initial example in my introductory post on parallel query execution. In this post, I'll look at a partial aggregation. Partial aggregation is a technique that SQL Server uses to optimize parallel aggregation. Before I begin, I just want to note that I also discuss partial aggregation in Inside Microsoft SQL Server 2005...(truncated)...
Recursive CTEs continued ...
(Indexed 2007-11-07):
In this post, I will finish the discussion of recursive CTEs that I began in my last post. I will continue to use the CTE examples from Books Online. To run these examples, you'll need to install the Adventure Works Cycles OLTP sample database. In my last post, I explained that all recursive queries follow the same pattern of one or more anchor sub-selects and one or more recursive sub-selects combined by a UNION ALL. Similarly, all recursive query plans also follow the same pattern which loo...(truncated)...
Recursive CTEs
(Indexed 2007-10-25):
One of the most important uses of CTEs is to write recursive queries. In fact, CTEs provide the only means to write recursive queries. As I noted last week, there are several excellent CTE examples, including recursive CTE examples, in Books Online. I'll once again start with the examples from Books Online. To run these examples, you'll need to install the Adventure Works Cycles OLTP sample database. Recursive CTEs always follow the same pattern. The body of the CTE is a UNION ALL query tha...(truncated)...
CTEs (Common Table Expressions)
(Indexed 2007-10-18):
CTEs or common table expressions, which are new in SQL Server 2005, provide an easy way to break a complex SQL statement down into smaller more manageable queries. CTEs are is some ways very much like views. Unlike a view which can be created once and used by many SQL statements, a CTE is associated with a single SQL statement. There are several excellent CTE examples, including recursive CTE examples, in Books Online. Rather than construct entirely new examples for this post, I've started w...(truncated)...
GROUPING SETS in SQL Server 2008
(Indexed 2007-10-11):
In my last two posts, I gave examples of aggregation WITH ROLLUP and CUBE. SQL Server 2008 continues to support this syntax, but also introduces new more powerful ANSI SQL 2006 compliant syntax. In this post, I'll give an overview of the changes. First, let's see how we rewrite simple WITH ROLLUP and CUBE queries using the new syntax. I'll use the same schema and queries as in my previous posts:CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)INSERT Sales VALUES(1, 2005, 12000)INSERT Sales ...(truncated)...
Aggregation WITH CUBE
(Indexed 2007-09-27):
In my last post, I wrote about how aggregation WITH ROLLUP works. In this post, I will discuss how aggregation WITH CUBE works. Like the WITH ROLLUP clause, the WITH CUBE clause permits us to compute multiple "levels" of aggregation in a single statement. To understand the difference between these two clauses, let's look at an example. We'll use the same fictitious sales data from last week's example.CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)INSERT Sales VALUES(1, 2005, 12000)INSERT...(truncated)...
Aggregation WITH ROLLUP
(Indexed 2007-09-21):
In this post, I'm going to discuss how aggregation WITH ROLLUP works. The WITH ROLLUP clause permits us to execute multiple "levels" of aggregation in a single statement. For example, suppose we have the following fictitious sales data. (This is the same data that I used for my series of posts on the PIVOT operator.)CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)INSERT Sales VALUES(1, 2005, 12000)INSERT Sales VALUES(1, 2006, 18000)INSERT Sales VALUES(1, 2007, 25000)INSERT Sales VALUES(2, ...(truncated)...
Maintaining Unique Indexes
(Indexed 2007-09-06):
Consider the following schema:CREATE TABLE T (PK INT PRIMARY KEY, A INT, B INT)CREATE INDEX TA ON T(A)CREATE UNIQUE INDEX TB ON T(B)INSERT T VALUES (0, 0, 0)INSERT T VALUES (1, 1, 1)Now suppose we run the following update statement:UPDATE T SET A = 1 - AThis update statement affects the clustered index and the non-clustered index TA. The plan is pretty much what you might expect: |--Clustered Index Update(OBJECT:([T].[PK__T__15502E78]), OBJECT:([T].[TA]), SET:([T].[A] = [Expr1003])) |--C...(truncated)...
Optimized Non-clustered Index Maintenance in Per-Index Plans
(Indexed 2007-08-22):
In my last post, I showed how SQL Server 2005 only updates non-clustered indexes when the data in the index actually changes. For my example, I used a simple update statement that results in a per-row or narrow plan. In this post, I'll show how this optimization works in a per-index or wide update plan.Let's use the same schema and update statement from last week. To get a per-index plan, we can trick the optimizer into believing that the table is much larger than it really is by running some...(truncated)...
Optimized Non-clustered Index Maintenance
(Indexed 2007-08-15):
Insert, update, and delete plans consist of two parts. The first part or read cursor identifies the list of rows to be inserted, update, or deleted. The second part or write cursor performs the actual insert, update, or delete. Let's look at a simple example:CREATE TABLE T (PK INT, A INT, B INT, C INT, D INT, E INT)CREATE UNIQUE CLUSTERED INDEX TPK ON T(PK)CREATE INDEX TB ON T(B)CREATE INDEX TCD ON T(C,D)CREATE INDEX TE ON T(E)INSERT T VALUES(0, 10, 20, 30, 40, 50)UPDATE T SET A = 19Here is t...(truncated)...
More on TOP
(Indexed 2007-08-01):
Last week I wrote about a special case of the TOP operator known as ROWCOUNT TOP. This week I'll take a look at some other interesting TOP scenarios. In general, TOP is a fairly mundane operator. It simply counts and returns the specified number of rows. SQL Server 2005 does include two enhancements to TOP that were not present in SQL Server 2000. First, in SQL Server 2000, we can only specify an integer constant for the number of rows to return. In SQL Server 2005, we can specify an arbit...(truncated)...
ROWCOUNT Top
(Indexed 2007-07-25):
If you've looked at any insert, update, or delete plans, including those used in some of my posts, you've probably noticed that nearly all such plans include a top operator. For example, the following update statement yields the following plan: CREATE TABLE T (A INT)INSERT T VALUES (0)INSERT T VALUES (1)INSERT T VALUES (2)UPDATE T SET A = A + 1Rows Executes3 1 UPDATE [T] set [A] = [A]+@13 1 |--Table Update(OBJECT:([T]), SET:([T].[A] = [Expr1004]))0 0 |--C...(truncated)...
The UNPIVOT Operator
(Indexed 2007-07-17):
The UNPIVOT operator is the opposite of the PIVOT operator. As I explained in my earlier post, the PIVOT operator takes a normalized table and transforms it into a new table where the columns of the new table are derived from the values in the original table. The UNPIVOT operator takes a pivoted table and transforms it back into a normalized form with one row per data point using the column names as values in the result. For example, suppose we have the following data:CREATE TABLE PIVOT_Sales...(truncated)...
PIVOT Query Plans
(Indexed 2007-07-09):
In my last post, I gave an overview of the PIVOT operator. In this post, I'm going to take a look at the query plans generated by the PIVOT operator. As we'll see, SQL Server generates a surprisingly simple query plan that is essentially just a fancy aggregation query plan.Let's use the same schema and queries from my previous post:CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)INSERT Sales VALUES(1, 2005, 12000)INSERT Sales VALUES(1, 2006, 18000)INSERT Sales VALUES(1, 2007, 25000)INSERT S...(truncated)...
The PIVOT Operator
(Indexed 2007-07-03):
In my next few posts, I'm going to look at how SQL Server implements the PIVOT and UNPIVOT operators. Let's begin with the PIVOT operator. The PIVOT operator takes a normalized table and transforms it into a new table where the columns of the new table are derived from the values in the original table. For example, suppose we want to store annual sales data by employee. We might create a schema such as the following:CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)INSERT Sales VALUES(1, 20...(truncated)...
Query Failure with Read Uncommitted
(Indexed 2007-06-12):
Over the past month or so, I've looked at pretty much every isolation level except for read uncommitted or nolock. Today I'm going to wrap up this series of posts with a discussion of read uncommitted. Plenty has already been written about the dangers of nolock. For example, see these excellent posts by Lubor Kollar of the SQL Server Development Customer Advisory Team and by Tony Rogerson.I'd like to demonstrate just one additional hazard of nolock. Begin by creating two tables as follows:cr...(truncated)...
Read Committed and Bookmark Lookup
(Indexed 2007-06-07):
In my last two posts, I discussed two scenarios - one involving updates and another involving large objects - where SQL Server extends the duration of read committed locks until the end of a statement instead of releasing the locks as soon as each row is released. In this post - which I promise will be the last in this series on read committed locks - I will discuss a final scenario involving bookmark lookups where SQL Server holds read committed locks longer than expected. As you might expect...(truncated)...
Read Committed and Large Objects
(Indexed 2007-05-31):
In my last post, I explained that SQL Server holds read committed locks until the end of an update statement (instead of releasing the locks as soon as each row is released) if there is a blocking operator between the scan or seek of the rows to be updated and the update itself. In this post, I'll take a look at a similar result involving large objects.Normally, when SQL Server moves data through a blocking operator such as a sort, it makes a copy of the data. Once SQL Server makes a copy, the...(truncated)...
Read Committed and Updates
(Indexed 2007-05-22):
Let's try an experiment. Begin by creating the following simple schema:create table t1 (a int, b int)create clustered index t1a on t1(a)insert t1 values (1, 1)insert t1 values (2, 2)insert t1 values (3, 3)create table t2 (a int)insert t2 values (9)In session 1, lock the third row of table t1:begin tranupdate t1 set b = b where a = 3Now, in session 2 check the spid (you'll need it later) and run the following update at the default read committed isolation level:select @@spidupdate t1 set t1.b = ...(truncated)...
Serializable vs. Snapshot Isolation Level
(Indexed 2007-05-16):
Both the serializable and snapshot isolation levels provide a read consistent view of the database to all transactions. In either of these isolation levels, a transaction can only read data that has been committed. Moreover, a transaction can read the same data multiple times without ever observing any concurrent transactions making changes to this data. The unexpected read committed and repeatable read results that I demonstrated in my prior few posts are not possible in serializable or snap...(truncated)...
Repeatable Read Isolation Level
(Indexed 2007-05-09):
In my last two posts, I showed how queries running at read committed isolation level may generate unexpected results in the presence of concurrent updates. Many but not all of these results can be avoided by running at repeatable read isolation level. In this post, I'll explore how concurrent updates may affect queries running at repeatable read.Unlike a read committed scan, a repeatable read scan retains locks on every row it touches until the end of the transaction. Even rows that do not qu...(truncated)...
Query Plans and Read Committed Isolation Level
(Indexed 2007-05-02):
Last week I looked at how concurrent updates may cause a scan running at read committed isolation level to return the same row multiple times or to miss a row entirely. This week I'm going to take a look at how concurrent updates may affect slightly more complex query plans.Nested Loops JoinLet's begin by considering this simple query:create table Customers (CustId int primary key, LastName varchar(30))insert Customers values (11, 'Doe')create table Orders (OrderId int primary key, CustId int f...(truncated)...
Read Committed Isolation Level
(Indexed 2007-04-25):
SQL Server 2000 supports four different isolation levels: read uncommitted (or nolock), read committed, repeatable read, and serializable. SQL Server 2005 adds two new isolation levels: read committed snapshot and snapshot. These isolation levels determine what locks SQL Server takes when accessing data and, therefore, by extension they determine the level of concurrency and consistency that statements and transactions experience. All of these isolation levels are described in Books Online.In...(truncated)...
Parallel Query Execution Presentation
(Indexed 2007-04-17):
For those of you readers who've been wondering whatever happened to me, I've been rather busy. Among other activities, I've been writing a chapter for Kalen Delaney's upcoming fourth book in the Inside SQL Server 2005 series: Query Tuning and Optimization.I am hoping to resurrect my blog and to get things started, I'm posting a presentation that I recently delivered to the Pacific Northwest SQL Server User's Group. Several attendees asked whether they could have copies of this presentation, so...(truncated)...
Semi-join Transformation
(Indexed 2006-12-04):
In several of my prior posts, Ive given examples of semi-joins. Recall that semi-joins essentially return a row from one input if we can find at least one matching row from the other input. Here is a simple example:create table T1 (a int, b int)create table T2 (a int, b int) set nocount ondeclare @i intset @i = 0while @i < 10000 begin insert T1 values(@i, @i) set @i = @i + 1 end set nocount ondeclare @i intset @i = 0while @i < 100 begin insert T2 values(@i, @i) set @i = @i + 1 ...(truncated)...
Introduction to Partitioned Tables
(Indexed 2006-11-27):
In this post, Im going to take a look at how query plans involving partitioned tables work. Note that there is a big difference between partitioned tables (available only in SQL Server 2005) and partitioned views (available both in SQL Server 2000 and in SQL Server 2005). I will look at the query plans for partitioned views in a future post.Table ScanLets begin by creating a simple partitioned table:create partition function pf(int) as range for values (0, 10, 100)create partition scheme ps as...(truncated)...
Parallel Hash Join
(Indexed 2006-11-17):
SQL Server uses one of two different strategies to parallelize a hash join. The more common strategy uses hash partitioning. In some cases, we use broadcast partitioning; this strategy is often called a broadcast hash join.Hash PartitioningThe more common strategy for parallelizing a hash join involves distributing the build rows (i.e., the rows from the first input) and the probe rows (i.e., the rows from the second input) among the individual hash join threads using hash partitioning. If a ...(truncated)...
Parallel Nested Loops Join
(Indexed 2006-11-09):
SQL Server parallelizes a nested loops join by distributing the outer rows (i.e., the rows from the first input) randomly among the nested loops threads. For example, if we have two threads running a nested loops join, we send about half of the rows to each thread. Each thread then runs the inner side (i.e., the second input) of the loop join for its set of rows as if it were running serially. That is, for each outer row assigned to it, the thread executes its inner input using that row as th...(truncated)...
Parallel Scan
(Indexed 2006-11-02):
In this post, Im going to take a look at how SQL Server parallelizes scans. The scan operator is one of the few operators that is parallel aware. Most operators neither need to know nor care whether they are executing in parallel; the scan is an exception.How does parallel scan work?The threads that compose a parallel scan work together to scan all of the rows in a table. There is no a-priori assignment or rows or pages to a particular thread. Instead, the storage engine dynamically hands ou...(truncated)...
The Parallelism Operator (aka Exchange)
(Indexed 2006-10-25):
As I noted in my Introduction to Parallel Query Execution post, the parallelism (or exchange) iterator actually implements parallelism in query execution. The optimizer places exchanges at the boundaries between threads; the exchange moves the rows between the threads.The iterator thats really two iteratorsThe exchange iterator is unique in that it is really two iterators: a producer and a consumer. We place the producer at the root of a query sub-tree (often called a branch). The producer re...(truncated)...
Index Union
(Indexed 2006-10-18):
I was planning to continue writing about parallelism this week (and I will continue another time in another post), but I received an interesting question and chose to write about it instead.Lets begin by considering the following query:create table T (a int, b int, c int, x char(200))create unique clustered index Ta on T(a)create index Tb on T(b)create index Tc on T(c) insert T values (1, 1, 1, 1)insert T values (2, 2, 2, 2)insert T values (3, 3, 3, 3) select a from T where b = 1 or b = 3 |--I...(truncated)...
Introduction to Parallel Query Execution
(Indexed 2006-10-11):
SQL Server has the ability to execute queries using multiple CPUs simultaneously. We refer to this capability as parallel query execution. Parallel query execution can be used to reduce the response time of (i.e., speed up) a large query. It can also be used to a run a bigger query (one that processes more data) in about the same amount of time as a smaller query (i.e., scale up) by increasing the number of CPUs used in processing the query. For most large queries SQL Server generally scales...(truncated)...
Decorrelating Subqueries
(Indexed 2006-10-04):
In my previous post, we saw some examples where the optimizer is able to take a query with a correlated subquery and rewrite it as a join. For instance, we saw how this simple in subquery:create table T1 (a int, b int)create table T2 (a int, b int, c int) select *from T1where T1.a in (select T2.a from T2 where T2.b = T1.b)Can be rewritten as a (left semi-join) join: |--Nested Loops(Left Semi Join, WHERE:([T2].[b]=[T1].[b] AND [T1].[a]=[T2].[a])) |--Table Scan(OBJECT:([T1])) |--Ta...(truncated)...
Scalar Subqueries
(Indexed 2006-09-27):
A scalar subquery is a subquery that returns a single row. Some scalar subqueries are obvious. For instance:create table T1 (a int, b int)create table T2 (a int, b int) select *from T1where T1.a > (select max(T2.a) from T2 where T2.b < T1.b)The subquery in this example uses an aggregate that ensures that it produces only a single row on each execution. (Of course, due to the correlation, this single row may be different for each for of T1.) Here is the query plan: |--Filter(WHERE:([T1].[a]>...(truncated)...
Hash Aggregate
(Indexed 2006-09-20):
In my prior two posts, I wrote about the stream aggregate operator. Stream aggregate is great for scalar aggregates and for aggregations where we have an index to provide a sort order on the group by column(s) or where we need to sort anyhow (e.g., due to an order by clause).The other aggregation operator, hash aggregate, is similar to hash join. It does not require (or preserve) sort order, requires memory, and is blocking (i.e., it does not produce any results until it has consumed its entir...(truncated)...
