Focusworks AI for Microsoft Teams - Chat with your business data https://focusworks.ai/
Fresh - Bring your SharePoint intranet to life https://freshintranet.com/
Tuesday 6 November 2012
Saturday 3 November 2012
Querying List Items from Large Number of Sites in SharePoint
When scouting the web for working with SharePoint Large
Lists, you can find many articles which deal with fetching a huge number of
items from one particular list. But very little data when you want to fetch
items from a large number of sub sites. So after a little bit of poking around,
I decided to blog about some of my findings here:
The Scenario:
Here are the conditions on which I was testing:
- 1 Site Collection
- 500 Sub sites
- 1 Task List in Each sub site - > 500 Lists
- 10 items in each List -> 5000 List Items
The Tools:
The tools I was using for measuring the performance were
nothing extraordinary:
1) I was
using the StopWatch
Class from the System.Diagnostics namespace. This class provides a fairly
simple and easy mechanism for recording the time a particular operation took to
execute.
This MSDN link has excellent examples on
how to use the StopWatch class for performance measuring
2) The Developer Dashboard has always been my goto
tool for performance measuring. I don’t know how I used to get by before I
started using it. It provides a wealth of information about the page load. It
can provide you with the time taken, the database calls made, the stack trace
and a whole lot of other very useful information. A good tutorial on the
Developer Dashboard can be found here.
SPSiteDataQuery:
The SPSiteDataQuery class is the heart of architecture when
you want to get data from multiple sites. This class by itself does not use any
form of caching and always returns data based on the real time queries. So even
if it takes a bit longer to fetch the data, it is guaranteed that you will get
all the current results and your users will never have to wait to see their new
items to be returned by the query.
Here is the code for doing a simple query with the
SPSiteDataQuery class:
Here is a stack trace of the internal methods which are
called by the SharePoint framework when a SPSiteDataQuery is used:
So as you can see, it calls the SPRequest.CrossListQuery method which internally makes queries to
the Database to fetch the relevant results.
When querying the database the procedure proc_EnumListsWithMetadata is used. You
can have a look at this procedure in your Content DB. It queries several tables
such as the dbo.AllLists, dbo.AllWebs
etc. to fetch the relevant results.
Time taken to query
5000 items in 500 sub sites and return 1200 matching items:
650ms average on each load.
CrossListQueryInfo:
The CrossListQueryInfo class is another mechanism you can
use to fetch the List Items from multiple sites. This class internally uses the
SPSiteDataQuery class to actually fetch the items from the database and when the
items are returned, it stores them in the object cache of the Publishing
Infrastructure. When any more calls to the same data are made subsequently,
then the data is returned from the cache itself without making any more trips
to the database.
The working of the CrossListQueryInfo class
largely depends on the object cache of the Publishing Features of SharePoint
server. So you cannot use this class in SharePoint 2010 Foundation or in
sandbox solutions. Also, the default expiry time of the object cache is set to
60 seconds. So you might want to change that time depending upon your environment
requirements.
Here is the same code for using the CrossListQueryInfo
class:
Make sure to set the CrossListQueryInfo.UseCache
as true if you want to use the caching features. Another very important thing
to mention is that there are 4 overloads of the CrossListQueryCache.GetSiteData
method and only 2 of them support caching.
So only use the methods which accepts the SPSite object as one of the parameters
if you want to use caching in your code.
The Stack Trace of the CrossListQueryInfo class looks like
this:
So as you can see, the Publishing.CachedArea is queried
first to check whether the items exist in the cache. If they don’t exist, then
a call to the SPSiteDataQuery is made which fetches the values from the
database and stores it in the cache. All the next subsequent calls will find
that the items are present in the cache so no more calls with the SPSiteDataQuery
class will be made.
As a result, the very first call will take longer than a
vanilla SPSiteDataQuery call as under the hood, the CrossListQueryInfo is not
only fetching the items but also building a cache with them.
Time taken to query
5000 items in 500 sub sites and return 1200 matching items:
2000ms on first load and 30ms average on each
subsequent load until the object cache expires.
PortalSiteMapProvider:
The PortalSiteMapProvider is a class which can used to
generate the navigation on SharePoint Publishing sites. The Global navigation,
the Quick Launch and the Breadcrumb navigation can all be generated with help
of the PortalSiteMapProvider. It also provides methods to query sub sites,
lists and list items with help of caching.
The main advantage of the PSMP is that it queries the SharePoint
change log to check whether any changes have happened to the data being queried.
If yes, then only the incremental changes are fetched and thus the cache is
updated accordingly.
However, my tests showed that the
PortalSiteMapProvider.GetCachedSiteDataQuery method which is used to get items
from multiple sub sites does not maintain an incremental cache and it only
fetches the new or updated items when the object cache has expired.
So essentially when querying for items from multiple sites,
the CrossListQueryInfo and the PortalSiteMapProvider behave almost similarly.
Here is the sample code for the PortalSiteMapProvider:
The stack trace for the PortalSiteMapProvider:
You can see that it’s very similar to the
CrossListQueryInfo.
Time taken to query
5000 items and return 1200 matching items:
2000ms on first load and 30ms average on each
subsequent load until the object cache expires
So these are some of the methods you can use to query multiple List Items in multiple sites. Hope you had a good time reading through the post.
Happy SharePointing!
SharePoint List Indexes : Under the Hood
You must be already aware that SharePoint provides the
functionality to index columns so querying on them will be faster and that
Throttling will not occur. Let us look at how SharePoint
maintains this index and how exactly is it stored in the Content Database.
SharePoint maintains multiple tables in the content database
with names starting with dbo.NameValuePair
and followed by Culture Names where the Site ID, Web ID, List Id, Item ID
and Value of the Indexed fields are stored.
If the value of the field is not Culture Dependent, e.g. the
DateTime Fields, the Person or Group Fields etc. then the Value is stored in
the table named dbo.NameValuePair.
If the value of the field is Culture Dependent e.g. the Text
fields, then the value is stored in a table named dbo.NameValuePair_and followed by Culture Name. E.g. If the current language
being used is English, then the value is stored in the table called dbo.NameValuePair_Latin1_General_CI_AS.
When a query is made to any of the lists which has an
indexed field, then a JOIN is performed on the dbo.AllLists table and
the relevant dbo.NameValuePair table and the joined data is presented.
Since the data in the indexed fields is stored completely in
a different table, list throttling does not occur even if the query is made to
more than 5000 rows for a normal user.
To test this out, I created a brand new WebApplication and
new root level site collection under that with the Team Site template. After that, I created an index in my “Tasks” list on the
Assigned To field which is a Person or Group field and added a sample task to
the list. Then I opened the Content Database of the new WebApplication and had
a look at the dbo.NameValuePair table:
The row which is highlighted with red contains the SiteId,
WebId, ListId, ItemID and the value which I entered in the Assigned To field of
my Tasks List. It is showing as 9 because it is a Person field and that is the
ID of the User I assigned the Task to.
The 2 rows below the highlighted rows are the values of the Modified field from the “Site Pages”
list. The values belong to the Home.aspx
and the How To Use This Library.aspx as
these 2 items are created by default in a Team Site. This field is added to the Index by default
and since it is a field of type Date and Time (which is not culture dependent)
we can see it in the dbo.NameValuePair
table.
After that I opened the dbo.NameValuePair_Latin1_General_CI_AS
table and had a look in there:
So just like we saw earlier, all the data related to the
Title field and its value (which is culture dependent) is stored in this table.
Composite Indexes:
SharePoint also allows composite indexes to be created on a
list. However there are limitations on which type of columns can be included as
the Primary and the Secondary Columns in the composite index.
If the Primary Column in a composite index is selected as
column whose value is stored in the dbo.NameValuePair table, then the secondary
column must also be selected which will be stored in the same table. So in
short, if the primary column is language independent, then the secondary column
in the index should also be language independent.
If the primary column in the list is language dependent, then you cannot specify a secondary column in the composite index.
Subscribe to:
Posts (Atom)