Comments on: How FS4SP primary keys work

By: Basant

Basant — Thu, 03 Oct 2013 10:14:34 +0000

Thanks for explaining the FS4SP primary key concept in depth with respect to internalId and contentid.

By: Mikael Svenson

Mikael Svenson — Wed, 11 Jan 2012 19:20:10 +0000

Christian,

This is somewhat correct but also wrong.

If you add another Content SSA and point it to a different collection than the first one, for example “sp2” instead of “sp”, this works just fine, because the collection name is appended to the internal ID in FS4SP, so you will not get ID collisions in FS4SP. Yes, the same ID will appear in two Content SSAs, but that is not a problem.
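To illustrate the point about collections, here is a minimal sketch in Python. It assumes the key is simply the item ID combined with the collection name. The `scoped_key` helper and the `<id>_<collection>` layout are hypothetical and only meant to show the idea, not the actual format FS4SP uses.

```python
def scoped_key(collection: str, item_id: int) -> str:
    """Hypothetical key layout: item ID with the collection name appended."""
    return f"{item_id}_{collection}"


# Two Content SSAs, each with its own counter, hand out the same item IDs.
ids_from_first_ssa = [1, 2, 3]
ids_from_second_ssa = [1, 2, 3]

# Each SSA feeds a different collection ("sp" and "sp2" as in the comment).
keys_sp = {scoped_key("sp", i) for i in ids_from_first_ssa}
keys_sp2 = {scoped_key("sp2", i) for i in ids_from_second_ssa}

# The raw IDs overlap, but the collection-scoped keys do not.
raw_overlap = set(ids_from_first_ssa) & set(ids_from_second_ssa)
key_overlap = keys_sp & keys_sp2

print("overlapping raw IDs:", sorted(raw_overlap))
print("overlapping scoped keys:", sorted(key_overlap))
```

Pointing both SSAs at the same collection would make the scoped keys collide as well, which is the situation Christian describes.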

By: Christian Marshall Rieck

Christian Marshall Rieck — Wed, 11 Jan 2012 13:07:58 +0000

Just to elaborate on “Luckily, SharePoint makes sure to assign the Item IDs so that they’re unique across all collections, hence creating unique internalid:s even though the items are in the same collection.”

SharePoint generates this unique ID with a counter. The counter is stored in the Content SSA, which is why you cannot have more than one Content SSA: two of them would generate the same ID for different documents.

By: Marcus Johansson

Marcus Johansson — Fri, 06 Jan 2012 13:53:35 +0000

Hi Ben,

Glad it was useful!

I guess you’re thinking of this tool http://gallery.technet.microsoft.com/scriptcenter/14105abb-29da-43fd-90f4-ac12f1a0233a ?

It asks for the internalid and the contentid, so in your case the contentid should be the full URL that was crawled, and the internalid is derived from the contentid as explained in the post above.

By: Ben Liang

Ben Liang — Thu, 05 Jan 2012 16:42:57 +0000

Timely tip indeed. I was trying to figure out why some items in my index have a contentid that is not an integer. Microsoft has a PowerShell script (GetFiXML) that seems to require an integer contentid. So I guess I am out of luck when it comes to getting FiXML for content indexed by the FAST Web Crawler.

By: Mikael Svenson

Mikael Svenson — Sun, 18 Dec 2011 19:48:55 +0000

Good explanation Marcus!

The good thing about using MD5s is that the index can be independent of any crawler framework and still generate an internal ID to represent the item.

The bad part, however, is that there is a chance of ID overlap, although a minuscule one, as it’s a checksum.
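To put a number on “minuscule”, here is a rough sketch using the standard birthday-bound approximation. It assumes MD5 output behaves like uniformly random 128-bit values for ordinary, non-adversarial content IDs. The function name and the item counts are only for illustration.

```python
import math

MD5_BITS = 128  # an MD5 digest is 128 bits long


def collision_probability(n_items: int, bits: int = MD5_BITS) -> float:
    """Approximate chance that at least two of n_items random
    `bits`-bit values coincide (birthday bound)."""
    if n_items < 2:
        return 0.0
    pairs = n_items * (n_items - 1) // 2  # number of distinct item pairs
    x = pairs / 2**bits                   # expected number of colliding pairs
    # 1 - exp(-x), computed with expm1 so tiny values are not rounded to 0
    return -math.expm1(-x)


for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,d} items -> p = {collision_probability(n):.3e}")
```

Under that assumption, an index with a billion items has a collision probability on the order of 1e-21.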

Storing an integer in the search index also takes less space than an MD5 and will in most cases be more efficient. Time will tell if we can still use multiple crawler frameworks in the future, or if MS optimizes it by forcing everything through the SP crawler framework. Having one crawler framework makes maintenance a bit easier imo.
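A quick way to see the size difference is to compare the raw byte lengths. This sketch only compares the values themselves and says nothing about how FS4SP lays them out on disk. The URL and the integer below are made-up examples.

```python
import hashlib
import struct

# Hypothetical contentid for an item fetched by a web crawler.
content_id = "http://example.com/docs/page.html"

md5_raw = hashlib.md5(content_id.encode("utf-8")).digest()  # binary digest
md5_hex = md5_raw.hex()                                     # text form

# The same kind of key as a fixed-width integer (little-endian).
int32 = struct.pack("<i", 123456)
int64 = struct.pack("<q", 123456)

sizes = {
    "md5 (raw bytes)": len(md5_raw),
    "md5 (hex string)": len(md5_hex),
    "32-bit integer": len(int32),
    "64-bit integer": len(int64),
}

for label, size in sizes.items():
    print(f"{label:<18} {size:>2} bytes")
```

An MD5 digest is 16 bytes in binary form, or 32 characters as hex text, against 4 or 8 bytes for a fixed-width integer.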