Microsoft SharePoint: Most Extreme SharePoint Search (MESS) solved with a little CAML
Most Extreme SharePoint Search (MESS)
Purpose
For about the last month I've been working on a task where the objective of the MESS was to search an external database and then show the results matched with data from a SharePoint 2010 Document Library. This is a compilation of the issues I encountered, written down so that if it has to be done again, I won't have forgotten the lessons learned. Hopefully you'll find value as well.
Alpha
The connection method to the external database involved a SharePoint 2010 External Content Type (ECT), which is the basis of a SharePoint External List. (It was set up by a coworker and built within Visual Studio, as we're using DB2 and it couldn't be built in SharePoint Designer. I'm not sure of all the exact details; I just know that it works.)
Based on a post by Amit Kumawat, 'Using KeywordQuery Class in SharePoint Query Object Model' (<-- Link gone, sorry), I decided to try using Keyword Query Language (KQL). When his technique and my modified version of his code were used against the External List, they returned the values I needed, but that's where the data stopped. To understand what I mean by "stopped", I'll give you some more specs about my infrastructure.
MESS Infrastructure
The External Content was an Assets table in DB2, and it contains all the needed metadata about the Asset. As this data is used regularly for other systems, the decision to not move nor duplicate it in SharePoint meant that the search would need to hit this source first.
A document library, called 'Shared Documents' (original, huh?), is where users can upload documents regarding the Assets. This is relatively straightforward, except that any document can be related to any Asset, so a Many-To-Many list is needed to match the Assets to the Documents.
To handle the Many-To-Many relationship, a SharePoint Custom list, called xrefAssetsToDocs, was employed to simply hold the AssetID from the External list and the Document ID from the SharePoint Document Library.
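To make the shape of that relationship concrete, here is a minimal in-memory sketch of the three data sources and the lookup the search ultimately has to perform. The class names (Asset, Document, XrefRow, AssetDocModel) and their fields are hypothetical stand-ins I made up for illustration; they are not SharePoint types, and the real lists carry more columns than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the DB2 Assets table, the document library,
// and the xrefAssetsToDocs list. None of these are SharePoint classes.
public sealed class Asset { public string AssetId; public string Description; }
public sealed class Document { public int Id; public string FileName; }
public sealed class XrefRow { public string AssetId; public int DocId; }

public static class AssetDocModel
{
    // Returns the file names of every document linked to the given asset,
    // walking Asset -> xref -> Document, ordered by document ID.
    public static List<string> DocsForAsset(
        string assetId, IEnumerable<XrefRow> xref, IEnumerable<Document> docs)
    {
        return (from x in xref
                where x.AssetId == assetId
                join d in docs on x.DocId equals d.Id
                orderby d.Id
                select d.FileName).ToList();
    }
}
```

One asset can appear in many xref rows, and one document can appear in many xref rows, which is all "Many-To-Many" means here.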
KQL Strike Out
Like I said, the KQL executed flawlessly, and returned a result set of Assets without issue. But, to continue the linking through xref and over to Documents would require iterating through each returned Item, and then figuring out how to retrieve the join of the xref to Docs. At the time of doing this, I was only vaguely aware of the abilities of LINQ and CAML, so the hockey-stick learning curve began.
I painted myself into a corner with the GridView concept, which I plagiarized from Amit. The GridView control can be bound using a DataTable, and with the snap of the fingers the data is displayed. So, with the assistance of a patient and diligent co-worker, we began exploring the use of LINQ to perform the needed matchup of the Assets to Docs, but quickly found that returning data from LINQ into a DataTable required a wiggling of Samantha's nose, and we soon abandoned the effort.
From there, we considered simply populating DataTables with both the xref and DocLib data and then looping through them, but the thought of a possible 100,000+ documents being iterated numerous times was a performance nightmare for which we didn't want to be responsible. Thus, we abandoned both LINQ and KQL.
Pushing the CAML through the eye of the needle
Upon further research, it appeared that CAML was designed to solve the problems of all mankind. And it is, but it's only about as wide as the eye of a needle. The limitations, complications, and ambiguity surrounding the use of CAML were enough to cause great consternation while crossing the desert of newbie development. (I have a comp sci degree, but trying to learn under pressure is never good.)
Below is a list of lessons learned over about a month's time. So as lackluster as this appears to be, please know that the many hours of confusion and suffering have been removed.
1. A CAML Join must be based on the child of a relationship, not the parent.
2. A CAML Join can only be made between data sources where a Lookup Field exists in the Child. And the Parent of the relation CANNOT be an external source (i.e. an ECT).
3. CAML uses the internal names of fields, not the display names. These can be different. SharePoint Manager is your friend when it comes to identifying Internal field names.
4. If you copy and paste code from the internet into Visual Studio, the double-quotes and apostrophes may arrive as the cartoony, curly versions of themselves rather than the straight characters that VS expects.
5. CAML Query Editors, like U2U's, are fantastic improvements over trying to write CAML yourself, but unfortunately many of them DON'T support Joins.
6. Certain fields, like FileLeafRef or any of the other internal field names for a DocLib's Name field, CANNOT be used in the CAML ProjectedFields, as they cause an unexplained error. And if they can't be projected, then they can't be added to the ViewFields.
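Lesson 4 is the easiest one to guard against in code. The following is a small sketch of a helper that swaps curly quotes for straight ones before a pasted CAML fragment is used. QuoteFixer and NormalizeQuotes are names I invented for this illustration, and the helper only covers the four most common typographic quote characters.

```csharp
using System;

public static class QuoteFixer
{
    // Replaces the four common typographic quote characters with the plain
    // ASCII apostrophe and double-quote. (Hypothetical helper, not part of
    // SharePoint or Visual Studio.)
    public static string NormalizeQuotes(string text)
    {
        if (text == null) throw new ArgumentNullException("text");
        return text
            .Replace('\u2018', '\'')  // left single quotation mark
            .Replace('\u2019', '\'')  // right single quotation mark
            .Replace('\u201C', '"')   // left double quotation mark
            .Replace('\u201D', '"');  // right double quotation mark
    }
}
```

Running a pasted snippet through something like this, or simply retyping the quotes, is usually quicker than hunting for the one character that breaks the query.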
Armed with these fair warnings, I present to you an excerpt of the code:
    // Excerpt from the search page's code-behind. It assumes the usual
    // Microsoft.SharePoint and Microsoft.SharePoint.Utilities usings, and that
    // tblresults is a string field declared elsewhere in the class.
    void btnSearch_Click(object sender, EventArgs e)
    {
        using (SPSite site = new SPSite(Constants.Namespace))
        using (SPWeb web = site.OpenWeb())
        {
            SPQuery qryAsset = new SPQuery();
            SPListItemCollection licXref;
            SPListItemCollection licAssets;
            SPListItemCollection licDocs;
            DateTime dtSeized;

            SPList lstXref = web.Lists["xrefAssetsToDocs"];
            SPList lstAsset = web.Lists["Assets"];
            SPList lstDocs = web.Lists["Shared Documents"];

            if (String.IsNullOrEmpty(txtSeized.Text))
            {
                lblResults.Text = "Please enter/select a Seizure Date.";
                return;
            }

            try
            {
                //Parse the user's input. It is converted to the
                //ISO8601 DateTime format when the query is built.
                dtSeized = Convert.ToDateTime(txtSeized.Text);
            }
            catch
            {
                lblResults.Text = "Please enter/select a Valid Seizure Date.";
                return;
            }

            //Search via the SZ_DT field, which is in the ECT based list
            qryAsset.Query = string.Concat(
                "<Where><Eq><FieldRef Name='SZ_DT' /><Value IncludeTimeValue='FALSE' Type='DateTime'>",
                SPUtility.CreateISO8601DateTimeFromSystemDateTime(dtSeized),
                "</Value></Eq></Where>",
                "<OrderBy><FieldRef Name='ASSET_ID'/></OrderBy>"
            );

            qryAsset.ViewFields = string.Concat(
                "<Field Name='ASSET_ID' Type='Lookup' List='Assets' ShowField='ASSET_ID' />",
                "<Field Name='ASSET_DESC' Type='Lookup' List='Assets' ShowField='ASSET_DESC' />",
                "<Field Name='ASSET_TYP' Type='Lookup' List='Assets' ShowField='ASSET_TYP' />",
                "<Field Name='SZ_DT' Type='Lookup' List='Assets' ShowField='SZ_DT' />",
                "<Field Name='CA_ID_AGCY' Type='Lookup' List='Assets' ShowField='CA_ID_AGCY' />"
            );

            licAssets = lstAsset.GetItems(qryAsset);
            if (licAssets.Count > 0)
            {
                tblresults = string.Concat(
                    "Assets Found: ", licAssets.Count,
                    "<table width='100%'>",
                    "<tr>",
                    "<td>", "Asset ID", "</td>",
                    "<td>", "Asset Seized", "</td>",
                    "<td>", "Document ID", "</td>",
                    "<td>", "Document Title", "</td>",
                    "<td>", "Document Link", "</td>",
                    "</tr>"
                );

                //For Each AssetID found, Loop through to gather the other data
                foreach (SPListItem itemAsset in licAssets)
                {
                    SPQuery qryXref = new SPQuery();
                    qryXref.Query = string.Concat(
                        "<Where><Eq><FieldRef Name='AssetID'/><Value Type='Text'>",
                        itemAsset["ASSET_ID"],
                        "</Value></Eq></Where>"
                    );

                    //Base on the Child list of the relationship
                    //and join in the parent
                    //DocID is the internal name of the DocID field in xref
                    //ID is the internal name of the ID field in Shared Docs
                    qryXref.Joins = string.Concat(
                        "<Join Type='INNER' ListAlias='Shared Documents'><Eq>",
                        "<FieldRef Name='DocID' RefType='Id' />",
                        "<FieldRef List='Shared Documents' Name='ID' />",
                        "</Eq></Join>"
                    );

                    //Attempted to add LinkFileName here, but it doesn't work!!
                    qryXref.ProjectedFields = string.Concat(
                        "<Field Name='DocTitle' Type='Lookup' List='Shared Documents' ShowField='Title' />"
                    );

                    qryXref.ViewFields = string.Concat(
                        "<FieldRef Name='DocTitle' />",
                        "<FieldRef Name='DocID' />"
                    );

                    //Because LinkFileName can NOT be a projected field, added
                    //the following loop to get it. (Ugh!)
                    licXref = lstXref.GetItems(qryXref);
                    foreach (SPListItem itemXref in licXref)
                    {
                        SPQuery qryDocs = new SPQuery();

                        //DocID comes back as lookup text, e.g. 2;#2
                        //Thus, need to parse the string before trying to query with it.
                        string strDocID = string.Concat(itemXref["DocID"]);
                        int intPos = strDocID.IndexOf(";");
                        if (intPos > -1)
                        {
                            strDocID = strDocID.Substring(0, intPos);
                        }

                        qryDocs.Query = string.Concat(
                            "<Where><Eq><FieldRef Name='ID'/><Value Type='Text'>",
                            strDocID,
                            "</Value></Eq></Where>"
                        );

                        //Made this a variable, in case I decided to return
                        //a different internal name field, like FileLeafRef
                        string FileLink = "LinkFilename";
                        qryDocs.ViewFields = string.Concat(
                            "<FieldRef Name='", FileLink, "' />"
                        );

                        licDocs = lstDocs.GetItems(qryDocs);
                        foreach (SPListItem itemDoc in licDocs)
                        {
                            tblresults += string.Concat(
                                "<tr>",
                                "<td>", itemAsset["ASSET_ID"], "</td>",
                                "<td>", itemAsset["SZ_DT"], "</td>",
                                "<td>", strDocID, "</td>",
                                "<td>", itemXref["DocTitle"], "</td>",
                                "<td><a href='http://dev-sps-21/sites/EDOCSdevsite/Shared%20Documents/",
                                itemDoc[FileLink], "'>", itemDoc[FileLink], "</a></td>",
                                "</tr>"
                            );
                        }
                    }
                }

                tblresults += "</table>";
                lblResults.Text = tblresults;
            }
            else
            {
                lblResults.Text = "No Records Found";
            }
        }
    }
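The lookup-text parsing in the listing above (turning "2;#2" into "2") can be pulled out into a small helper that also rejects malformed input. This is only a sketch: LookupText and TryGetLookupId are names I made up, and the helper works on plain strings so it can be tried outside SharePoint. Inside the server object model, the SPFieldLookupValue class exposes the same information through its LookupId property, which is probably the tidier option.

```csharp
using System;
using System.Globalization;

public static class LookupText
{
    // Extracts the numeric ID from lookup text such as "2;#2" or "17;#Report".
    // Returns false (and sets id to 0) when no leading integer can be parsed.
    // Hypothetical helper for illustration; not a SharePoint API.
    public static bool TryGetLookupId(string raw, out int id)
    {
        id = 0;
        if (string.IsNullOrEmpty(raw)) return false;

        // The ID is everything before the ";#" separator, if one is present.
        int sep = raw.IndexOf(";#", StringComparison.Ordinal);
        string idPart = sep >= 0 ? raw.Substring(0, sep) : raw;

        return int.TryParse(idPart, NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out id);
    }
}
```

Parsing to an int rather than passing the substring straight into the CAML also means nothing unexpected from the list data ends up inside the query text.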
Omega
I don't like that I have to iterate through two result sets, but my estimate is that they'll be small, and the fact that I can query the Document Library for one specific ID at a time gives me confidence that this solution will scale effectively.
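If the per-document queries ever do become a bottleneck, one option I have not tried in this project is to collect the document IDs first and fetch them with a single query using the CAML In element, which was introduced with SharePoint 2010. The sketch below only builds the query text; CamlBatch and BuildIdInQuery are hypothetical names, the Counter value type for the ID field is my assumption, and a very long ID list would likely need to be split into smaller batches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CamlBatch
{
    // Builds a <Where><In>...</In></Where> fragment that matches any of the
    // given item IDs. Duplicate IDs are dropped. (Hypothetical helper.)
    public static string BuildIdInQuery(IEnumerable<int> ids)
    {
        List<int> distinct = ids.Distinct().ToList();
        if (distinct.Count == 0)
            throw new ArgumentException("At least one ID is required.", "ids");

        XElement where = new XElement("Where",
            new XElement("In",
                new XElement("FieldRef", new XAttribute("Name", "ID")),
                new XElement("Values",
                    distinct.Select(id =>
                        new XElement("Value", new XAttribute("Type", "Counter"), id)))));

        // Emit the fragment on one line, without added whitespace.
        return where.ToString(SaveOptions.DisableFormatting);
    }
}
```

The resulting string would be assigned to the Query property of an SPQuery against the document library, replacing the innermost loop's one-query-per-document pattern with one query per Asset.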
Thanks to all the other bloggers out there that taught me what I now know, and I hope this helps the next developer.