The Black Art Of Spatial Index Tuning In SQL Server

For my inaugural blog post I have decided to discuss some investigations I have been making recently into tuning spatial indexes in SQL Server.

I have found that while there is a lot of information on teh interwebs about the theory of SQL Server spatial indexes, and the utilities that can be used to tune them, there is very little guidance on what spatial index settings are best suited for different data scenarios, and there is only limited information about the characteristics to consider when tuning these indexes.

As an attempt to provide some guidance I have investigated configuring and tuning spatial indexes on three different spatial data sets which I have discussed as case studies, but before jumping straight into these I think it would be wise to discuss some background aspects to spatial indexing in SQL Server.

SQL Server Spatial Index Theory

I'm not going to get too much into the theory of spatial indexes in SQL server, or the details of creating spatial indexes because there is a lot of documentation on this subject, but I will be referring to aspects of spatial index theory throughout this post. The best starting point for delving into this area is the SQL Server Spatial Indexing Overview documentation. Another great source of information is Isaac Kunen's Blog.

A SQL Server spatial index is basically a grid of cells that cover the extent of the spatial data in a table which allows spatial queries to easily filter out prospective records and lookup smaller subsets of records to see if they match the spatial criteria. e.g. If a spatial query is trying to find all the records that intersect a particular rectangle, then a spatial index can improve the performance by only executing the intersection criteria on records that are contained in a grid cell(s) that the criteria rectangle resides in.

Rather than just being a single layer of cells over the extent of the spatial data, the index is actually a layered grid which is four levels deep (see the image from the SQL Server documentation below). Each cell in level 1 of the grid is divided into cells that make up level 2, and so on for level 3 and 4.

Each level of the grid can be divided into a different number of cells based on a setting of Low (4x4 or 16 cells), Medium (8x8 or 64 cells), or High (16x16 or 256 cells). This is what controls the granularity of the spatial index.

Geography vs. Geometry

An interesting thing to note about the differences between Geography and Geometry data types is that spatial indexes for Geography types do not have a bounding box specified. This means that Geography indexes span the entire globe from -180 to 180 longitude and -90 to 90 latitude. This means that the granularity specified for a Geography index divides the entire globe into the desired number of cells.

Geometry indexes have a bounding box specified, so the administrator can define the grid granularity in conjunction with a known data extent, meaning that the administrator has a lot more control over the equivalent "ground" size of each grid cell.

Because of this limitation I see the Geometry index as having a major advantage over the Geography index. I'm not entirely sure of the reason for this constraint on Geography indexes, perhaps there is an assumption that data sets using geographic coordinates (longitude, latitude) are usually large scale data sets spanning large extents around the earth, but in practice I don't think this is always the case, which will become evident in my last case study.

Index Distribution

In SQL Server a primary key index has high selectivity because each indexed value is unique, which means that performance is relatively similar across the distribution of keys in the index regardless of the selection criteria.

It is a different story with non-unique indexes, because there may not be the same level of selectivity across the entire distribution of values in the index, e.g. A column containing surnames of users would undoubtedly have low selectivity when searching for names such as "smith" or "jones", but potentially have higher selectivity when searching for names such as "kerouac" or "zimmerman". These indexes show us that the performance of the index has a high correlation with the criteria we are using to search with.

This same consideration needs to be applied when we create spatial indexes. When calculating the desired settings for a spatial index, not only is it important to make sure that the index is granular enough to efficiently index our records, it is also important that we have a rough idea on the range of spatial selections that users will be performing on our data set.

Based on the four levels of index grid, and three settings of granularity of each grid, it can be seen that the resulting number of cells in the grid will be the same if the index is defined as High, Medium, Medium, Medium (HMMM) as it would if the index was defined as MMMH, but the difference in performance will be dependent on the spatial criteria being applied, i.e. A large sized rectangular criteria filter may perform better using the MMMH index over the HMMM index because it may be able to select fewer grid cells in level 1 to analyse before drilling into the level 2 grid cells.

Spatial Index Tuning

I found a great source of information on spatial index tuning on Bob Beauchemin's Blog which highlights the following stored procedures for investigating spatial index characteristics:

sp_help_spatial_geography_index
sp_help_spatial_geometry_index

This stored procedures accept as input parameters

Table Name
Index Name
Flag indicating whether to return verbose output
A geography/geometry instance representing the selection extent (refer to the index distribution discussion above)

In Bob's Blog he discusses the interesting characteristics being the primary and internal filter efficiency, as well as the statistics on the overall number of rows in the index, broken down as an average number of index rows per row of data.

The primary filter is the spatial index, and the internal filter seems to be an optimised version of the spatial index. The spatial filter's purpose is to reduce the number of times the query's spatial criteria is compared against records in the table, by executing it against cells in the index and looking up the data that resides in these cells.

In this post on Bob Beauchemin's Blog he discusses the output values from the tuning stored procedures, showing that the primary index alone will result in selection of a records that may meet the spatial criteria, including a number of false positives, while the optimised internal filter will perform a conservative selection of the bare minimum of records that meet the spatial criteria, and it is the difference between these selections that the query engine will use to compare the spatial criteria against the data records to determine whether they meet the query criteria.

So the most efficient spatial indexes will have a large percentage of the records being determined from the primary and internal filters (i.e. high efficiency percentages), with only a small number of surplus records needing to be checked individually.

Keep in mind that if we make the index quite granular, i.e. have a large number of small grid cells, we may be able to improve the efficiency percentage of the primary and internal filters, but the index itself will need to analyse a larger number of grid cells to see if those match the query criteria, and once the matching cells are determined, the index will possibly need to look-up the data records from more index leaf nodes.

From these observations we can see that the perfect spatial index will be defined such that the number of grid cells in the index is minimised, while still providing a granularity of the index grid to allow selections to be made across the entire data set, in such a way that the index filter efficiency will be maximised across the broad range of spatial queries being performed.

Wow, that's a mouthful, breaking that down, basically we want the index to be as efficient as possible while trying to minimise the number of rows in the index.

Index Granularity

As discussed in the Index Distribution section above, the main consideration for the spatial index settings comes down to having a good idea of the types of spatial queries that will be getting performed. e.g. If the majority of spatial selections are for view extents, and they are mostly higher scale selections, then it might be appropriate to have an index grid density that is relatively sparse compared to the spatial density of the data in the table. If on the other hand you are doing a lot of selections of small numbers of records, perhaps within a buffer of another feature, then you may want to have an index grid density that is fairly similar to the density of your data.

This leads to us needing to know a little more about the statistics of our data. Typical indexes in SQL Server use statistics in their indexes to determine how selective particular data values are across the range of data in the table, so in a sense we are doing something similar in getting to know the distribution of our data to make an informed decision on the settings for our spatial index.

In my case studies below I performed spatial analysis of my data sets to determine the maximum density of data, and made decisions on my spatial indexes using that information. I did this using a T-SQL script that splits the data extent into a number of blocks, and counts the number of records in each block looking for the maximum number. It then calculates a relative density compared to the extent size using the number of records in the block with the maximum number.

Spatial Index Hints

Just briefly I wanted to mention that when performing my analysis of spatial indexes in SQL Server I found that I had to always define an index hint to force the spatial queries to use the index I had defined. I have read that this may be due to the version of SQL Server I am using, but I thought it would be useful to make a note of so that anyone having problems with the spatial index not being used can be aware that this might be a cause.

The image below shows my query plan without an index hint, showing that a table scan (in this case the clustered primary key) will be performed to check each record to see if it matches the criteria expression.

The image below shows the same query with an index hint defined, showing part of the query plan that indicates a spatial index seek will be employed.

Case Studies

I have a background in cadastral surveying, so I have used three data sets that each represent a layer of cadastral land boundaries, as land parcel polygons:

Data Set 1 - Local Municipality - approximately 15km x 15km, with 14000 records
Data Set 2 - Local Municipality - approximately 40km x 40km, with 66000 records
Data Set 3 - State - approximately 1700km x 2200km, with 2.5 million records

For each of the data sets I created four separate selections in different locations within the data set, and then wrote a script that would execute the four selections a hundred times and then average the time taken to select the records. The number of features selected is not exactly the same for each data set, but is relatively similar enough to make comparisons.

For each of the data sets, the scenarios that I had in mind were spatial selections of urban or semi-urban land parcels, so three of the four spatial selections in the tests select a relatively small number of records, i.e. akin to selecting the land parcels in a street, or subdivision etc. The fourth selection is like a selection for a view extent, of a number of thousand parcels, i.e. around 1:10000 scale.

Data Set 1

This data set represents a small regional local government authority. The extent of the data is about 15km x 15km (10 miles). Regardless of the small size of the area, the majority of land parcels are typical residential sized allotments.

The data set was based on a projected map grid datum, so I was able to create each land parcel as a geometry data type. This meant that I was able to define my spatial indexes with a bounding box, and be in more control of the size of the resulting index grid cells.

The data set has an average parcel size of 10000sqm, with 80% of parcels being under 2000sqm (residential land parcel size), with the total area of those 80% of parcels being 0.64% of the area of the bounding box of the data set, and being 6.78% of the total area of all parcels.

This gives you an indication of the relative density of the data, and that the majority of the small spatial selections that will be performed on the data will be on parcels that are very small compared to the extent of the data set.

I calculated the highest area of spatial density to be 1:238400 (comparison to the size of the overall extent of the data set), which is a lot higher than the number of records in the data set (approx 14000).

I made a decision that I wanted an index grid that would be similar to the highest record density, so I figured that a grid of MLLL, or 64x16x16x16 (equating to 262144 grid cells) would be the ideal, or a permutation of those settings, i.e. LMLL, LLML, LLLM. I also executed my test script against other index configurations as well to test my hypothesis, e.g. LLLL, HLLL, etc.

My test selections were performed with an STIntersects criteria expression with different sized rectangles in different locations in the area. The results being:

Test 1 Count: 56 features

Test 2 Count: 124 features

Test 3 Count: 110 features

Test 4 Count: 7016 features

The results for each index are shown below, with the average time taken to perform each query, and the filter efficiency of each selection, along with the number of rows in the index.

Index Settings

Index Performance

No Index

Test 1	0.1081s
Test 2	0.1091s
Test 3	0.1080s
Test 4	0.1266s

MLLL

Total_Primary_Index_Rows 87980

Total_Primary_Index_Pages 264

Average_Number_Of_Index_Rows_Per_Base_Row 6

Test 1	0.0049s	Internal_Filter_Efficiency 94.64% Primary_Filter_Efficiency 67.46%
Test 2	0.0072s	Internal_Filter_Efficiency 82.25% Primary_Filter_Efficiency 92.53%
Test 3	0.0049s	Internal_Filter_Efficiency 88.18% Primary_Filter_Efficiency 88.70%
Test 4	0.0433s	Internal_Filter_Efficiency 88.18% Primary_Filter_Efficiency 88.70%

LMLL

Total_Primary_Index_Rows 87955

Total_Primary_Index_Pages 263

Average_Number_Of_Index_Rows_Per_Base_Row 6

Test 1	0.0039s	Internal_Filter_Efficiency 94.64% Primary_Filter_Efficiency 67.46%
Test 2	0.0082s	Internal_Filter_Efficiency 82.25% Primary_Filter_Efficiency 92.53%
Test 3	0.0040s	Internal_Filter_Efficiency 88.18% Primary_Filter_Efficiency 88.70%
Test 4	0.0417s	Internal_Filter_Efficiency 88.54% Primary_Filter_Efficiency 98.63%

LLML

Total_Primary_Index_Rows 87020

Total_Primary_Index_Pages 261

Average_Number_Of_Index_Rows_Per_Base_Row 6

Test 1	0.0051s	Internal_Filter_Efficiency 94.64% Primary_Filter_Efficiency 67.46%
Test 2	0.0074s	Internal_Filter_Efficiency 78.22% Primary_Filter_Efficiency 89.20%
Test 3	0.0050s	Internal_Filter_Efficiency 87.27% Primary_Filter_Efficiency 82.08%
Test 4	0.0416s	Internal_Filter_Efficiency 87.10% Primary_Filter_Efficiency 98.24%

LLLL

Total_Primary_Index_Rows 53354

Total_Primary_Index_Pages 161

Average_Number_Of_Index_Rows_Per_Base_Row 3

Test 1	0.0047s	Internal_Filter_Efficiency 73.21% Primary_Filter_Efficiency 58.33%
Test 2	0.0070s	Internal_Filter_Efficiency 79.83% Primary_Filter_Efficiency 93.23%
Test 3	0.0042s	Internal_Filter_Efficiency 79.09% Primary_Filter_Efficiency 69.18%
Test 4	0.0506s	Internal_Filter_Efficiency 73.07% Primary_Filter_Efficiency 97.55%

HLLL

Total_Primary_Index_Rows 157928

Total_Primary_Index_Pages 473

Average_Number_Of_Index_Rows_Per_Base_Row 11

Test 1	0.0048s	Internal_Filter_Efficiency 100% Primary_Filter_Efficiency 71.79%
Test 2	0.0103s	Internal_Filter_Efficiency 85.48% Primary_Filter_Efficiency 96.12%
Test 3	0.0053s	Internal_Filter_Efficiency 90% Primary_Filter_Efficiency 94.01%
Test 4	0.0496s	Internal_Filter_Efficiency 95.89% Primary_Filter_Efficiency 98.40%

The results show that creating an index with a granularity equivalent to the most dense area of my data resulted in the best performance. The LMLL index provided the best filter efficiency across all four selections, but the timed performance was slightly worse for one of the smaller selections, but this time difference is negligible in the context of the entire query time.

You can see that the HLLL index actually provided a much better primary and internal filter efficiency than the MLLL permutations, but because this index results in twice the number of rows of the MLLL indices, the performance time is longer, as I discussed in the Spatial Index Tuning section.

Conversely, you can see from the LLLL index that the number of index rows is significantly less, but the primary and internal filter efficiency is worse, because the granularity of the grid cells does not support the queries that we are performing, so the query engine is having to resort to comparing more of the data against the spatial criteria, and hence the efficiency is poor.

One thing to note about this data set is that it is so small (14000 records) that the performance with no index is still quite good, with the table scan only taking around 0.1s to check each record in the table to see if it matches the spatial criteria (this is why the time is very similar for each test). It is worth noting though that by adding the spatial index the large selection query performance is 3 times faster, and the smaller selections are 20 times faster.

Data Set 2

This data set represents a medium sized suburban local government authority. The extent of the data is about 40km x 40km (25 miles). The data has a number of medium to high density suburban areas, and there are large portions within the extent of data set that are not utilised as land parcels, leading to the data set having areas of higher density parcels relative to the size of the extent of the data set.

Like the previous data set, this data was based on a projected map grid datum, so I was able to create each land parcel as a geometry data type. This meant that I was able to define my index with a bounding box, and be in more control of the size of the resulting index grid cells.

This data set has an average parcel size of 8000sqm, with 90% of parcels being under 2000sqm, with the total area of those 90% of parcels being 2.78% of the area of the extents of the data set, and being 7.73% of total area of all parcels.

I calculated the highest area of spatial density to be 1:1189200, which is a lot higher than the number of records in the data set (approx 66000).

Again, basing my decision for the index grid size to be similar to the highest record density, I decided on a grid size of MMLL (64x64x16x16) or HLLL (256x16x16x16) to equate to 1048576 grid cells.

The spatial selection test results are shown below:

Test 1 Count: 82 features

Test 2 Count: 167 features

Test 3 Count: 116 features

Test 4 Count: 6226 features

Index Settings

Index Performance

No Index

Test 1	0.4770s
Test 2	0.4755s
Test 3	0.4784s
Test 4	0.4958s

MMLL

Total_Primary_Index_Rows 274349

Total_Primary_Index_Pages 820

Average_Number_Of_Index_Rows_Per_Base_Row 4

Test 1	0.0046s	Internal_Filter_Efficiency 50% Primary_Filter_Efficiency 84.53%
Test 2	0.0072s	Internal_Filter_Efficiency 91.01% Primary_Filter_Efficiency 97.66%
Test 3	0.0048s	Internal_Filter_Efficiency 77.58% Primary_Filter_Efficiency 76.31%
Test 4	0.0417s	Internal_Filter_Efficiency 84.90% Primary_Filter_Efficiency 96.94%

MLML

Total_Primary_Index_Rows 273834

Total_Primary_Index_Pages 818

Average_Number_Of_Index_Rows_Per_Base_Row 4

Test 1	0.0043s	Internal_Filter_Efficiency 50% Primary_Filter_Efficiency 84.53%
Test 2	0.0079s	Internal_Filter_Efficiency 91.01% Primary_Filter_Efficiency 97.66%
Test 3	0.0047s	Internal_Filter_Efficiency 77.58% Primary_Filter_Efficiency 76.31%
Test 4	0.0421s	Internal_Filter_Efficiency 84.86% Primary_Filter_Efficiency 96.90%

HLLL

Total_Primary_Index_Rows 274443

Total_Primary_Index_Pages 820

Average_Number_Of_Index_Rows_Per_Base_Row 4

Test 1	0.0051s	Internal_Filter_Efficiency 50% Primary_Filter_Efficiency 84.53%
Test 2	0.0076s	Internal_Filter_Efficiency 91.01% Primary_Filter_Efficiency 97.66%
Test 3	0.0053s	Internal_Filter_Efficiency 77.58% Primary_Filter_Efficiency 76.31%
Test 4	0.0403s	Internal_Filter_Efficiency 84.90% Primary_Filter_Efficiency 96.94%

MMML

Total_Primary_Index_Rows 489475

Total_Primary_Index_Pages 1459

Average_Number_Of_Index_Rows_Per_Base_Row 7

Test 1	0.0051s	Internal_Filter_Efficiency 92.41% Primary_Filter_Efficiency 97.14%
Test 2	0.0100s	Internal_Filter_Efficiency 86.82% Primary_Filter_Efficiency 95.42%
Test 3	0.0054s	Internal_Filter_Efficiency 82.92% Primary_Filter_Efficiency 88.17%
Test 4	0.0386s	Internal_Filter_Efficiency 91.37% Primary_Filter_Efficiency 87.87%

As we can see again, by making the index granularity similar to the density of the most dense data in the data set, we have defined the best performing spatial index - MMLL. As I mentioned above, the grid cell granularity of MMLL is the same as HLLL. What is interesting to note is that the primary and internal filter efficiency is exactly the same between MMLL and HLLL, and the number of index rows is very similar as well, but the time taken to perform the larger selections is slightly faster for the HLLL index. It may be that more of the solution may be getting determined by grid levels 1 or 2 and not having to resort to lower grid levels. When using the verbose output setting on the stored procedure, there are results such as Total_Number_Of_ObjectCells_In_Level1_In_Index that will give a breakdown of each grid level, and can be analysed to see what the object/index cell distribution is across each of the levels.

One of the interesting results in indexes MMLL and HLLL is the efficiency values in test 1. The internal and primary filter efficiency is quite poor. I checked this selection and found that the selection rectangle actually passed through most of the features that were being selected. This means that a high percentage of features needed to be analysed to see if they fit the criteria. One thing we may be able to read into this is that the granularity of the index grid may not be small enough, given than our optimal choice in the first data set resulted in an average index rows per data row of 6, and this choice was 4, perhaps we don't have the granularity we need. Since the selection was so small, and the performance time quite fast anyway, I ignored this result as an outlier, considering it an anomaly of my selection choice.

Another interesting result was the use of the MMML index, which results in twice the number of index rows, but results in an average number of index rows per data row that is closer to our choice from data set 1 - 7. Looking at the filter efficiency we can see that there has been a big improvement for test 1, although the performance time has not improved. Again we can see this as the trade-off between efficiency based on grid granularity, and number of index rows that need to be analysed. If over time we found the MMLL index being inefficient for other selections, we may need to think about using the MMML index, depending on how it was performing.

It is interesting to note with this data set having 66000 records, that the time to perform a table scan has increased significantly on the time encountered in data set 1, with indexed results being between 10 and 100 times faster.

Data Set 3

This data set represents the cadastral boundaries for the Australian state of Queensland. For overseas readers, you may recall Queensland being in the news earlier in the year due to severe flooding in the south east, and Cyclone Yasi in the north. Queensland is the 3rd most populated state of Australia, and 2nd in area to Western Australia. The state is 1700km (1000 miles) wide, by 2200km (1400 miles) in length. Most of the population is centred around the south east corner of the state (containing the vast majority of land parcels in the state), with a number of populated regional centres located along the coast.

This data set has an average parcel size of 654150sqm, with 80% of parcels being under 2000sqm, with the total area of those 90% of parcels being 0.033% of the area of the extents of the data set, and being 0.071% of total area of all parcels.

To create a spatial index that will cater for the diversity of spatial density for this data set is quite challenging, but as I mentioned in the discussion on Geography vs Geometry, I struck a limitation with this data set that constrained what I could achieve generally. This data set is in geographic coordinates (longitude, latitude), so I am not able to define a bounding box for the spatial index. Instead I was only able to try to make the grid as granular as I could to improve the types of selections I desired.

As with the other two data sets, I calculated the highest area of spatial density to be 1:508659200, which is a lot higher than the number of records in the data set (approx 2.5 million), which if using geometry data I could have created a HHHM (256x256x256x64) 1073741824 grid cell index, but using this granularity for the geography index provided poor efficiency (see below). I was only able to define one level higher in the number of grid cells - HHHH (256x256x256x256) with 4294967296 grid cells, and while this did improve upon the previous settings, was still quite poor performing in terms of efficiency.

The spatial selection test results are shown below:

Test 1 Count: 79 features

Test 2 Count: 264 features

Test 3 Count: 117 features

Test 4 Count: 6297 features

Index Settings

Index Performance

No Index

Test 1	34.8497s
Test 2	34.9102s
Test 3	34.8761s
Test 4	34.9194s

HHHL

Total_Primary_Index_Rows 3378393

Total_Primary_Index_Pages 10054

Average_Number_Of_Index_Rows_Per_Base_Row 0

Test 1	0.0397s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 3.97%
Test 2	0.0565s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 8.96%
Test 3	0.0343s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 6.92%
Test 4	0.3844s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 31.26%

HHHM

Total_Primary_Index_Rows 3882914

Total_Primary_Index_Pages 11554

Average_Number_Of_Index_Rows_Per_Base_Row 0

Test 1	0.0228s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 6.86%
Test 2	0.0197s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 29.01%
Test 3	0.0207s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 11.83%
Test 4	0.2183s	Internal_Filter_Efficiency 2.79% Primary_Filter_Efficiency 49.22%

HHHH

Total_Primary_Index_Rows 4582733

Total_Primary_Index_Pages 13637

Average_Number_Of_Index_Rows_Per_Base_Row 0

Test 1	0.0162s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 11.70%
Test 2	0.0167s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 38.59%
Test 3	0.0144s	Internal_Filter_Efficiency 0% Primary_Filter_Efficiency 19.05%
Test 4	0.1785s	Internal_Filter_Efficiency 9.30% Primary_Filter_Efficiency 61.30%

As discussed above, the use of the Geography data type has constrained our choices with this data set, so HHHH is the natural choice for our index, based on it having the best filter efficiency, and unfortunately not really having any more granular settings at out disposal, i.e. HHHH is the most detailed index grid we can define.

When performing the selections on this data set with no index, the time taken to perform a table scan was around 35s, so the improvement in using the index was from 200 to 2000 times faster.

Unfortunately the lack of control over the granularity of the grid means that a lot of processing would be occurring in the database to perform each query, so I would expect performance to degrade quickly for this data set in a multi user environment.

While this data set was interesting to use from an analysis perspective, in an enterprise situation I would expect that the data would be tiled into logical regions, with queries incorporating one or more tiles depending on the size of the query. In this way a tile would act as a kind of preliminary index.

Conclusion

Like any system implementation I think that applying a spatial index is something that needs to be reviewed and tweaked over the lifetime of the data set. Depending on the type of data being kept, the data may incrementally change over time, possibly requiring tweaking of the index, but regardless of this, you will also learn more about the queries that users are performing against the data as time goes by, and this knowledge will help you design you spatial index.

Remember that your spatial index will have a profound effect on the performance of your database, especially in a multi user scenario, so the index should be designed well to meet the needs of the queries being performed.

Another suggestion that I haven't spoken of so far is that like typical SQL Server indexes, it is possible to have multiple spatial indexes on the same table, so if necessary you could have an index that is utilised by larger queries, such as view extents at a high scale, and another for granular spatial queries, and then use index hints to swap between the two when performing different operations. This would split the processing and potentially improve the performance in a multi user environment.

The following are a summary of my main tips from what I have learned:

Always define a spatial index, no matter what the settings - don't get hung up on needing the perfect index from the start, as I have shown having any index improves spatial queries over no index at all. As I said you can improve your index over time as you get to understand your users, and your data.
Make sure that your index is being used by checking the query plan.
Determine the density of your data, or at least be able to estimate the density of your desired index grid cells.
Get some information about the types of spatial selections users will be performing - perhaps log all user selections so that you can test your spatial indexes with these queries.
Use the tuning stored procedures - be objective about filter efficiency vs. number index rows, because an imbalance in either will affect your query performance, and therefore load on the server in a multi user environment.

My final comment would be that were you have the choice to use Geometry or Geography data, choose Geometry, if only for the control over the spatial index. I see the index limitation with the Geography data type to be a major stumbling block, especially in situations such as my third case study. I have heard that future versions of SQL Server will have the ability to have indexes with deeper grid levels, so this may alleviate the situation I have, so I look forward to seeing this new functionality.

4 comments:

AnonymousSeptember 12, 2013 at 8:03 PM
Amazing post, learned a lot about something i never really new about in SQL server (always something new to learn right?). Theres a nice page here too:

SQL Index tuning tips
AnonymousMarch 23, 2015 at 1:32 PM
Great Info
AnonymousNovember 19, 2015 at 12:27 AM
Hi
Awesome post. I have a question though, I didnt get why you couldn't use geometry on the dcdb? I have to maintain a points dataset with geotagged photos in Qld, am I doing something ridiculously wrong by loading them up into a geometry feature class (they're photos geotagged from the gps in their exif)
Regards

Note: Only a member of this blog may post a comment.

boomphisto

Tuesday, April 26, 2011

The Black Art Of Spatial Index Tuning In SQL Server

4 comments: