Fast ESP 5.3 index profile oddities
The following describes a few nuisances in the Fast ESP 5.3 index profile.
Composite field context weight sums
Weights are not relative to their containing element. That is, the field weights within the context part of a composite-rank do not sum to 100%, which is what most people would find intuitive. If you have two fields weighted 100 and 100 and get a hit in both fields, the hit in the composite becomes 200 instead of the 100 one might expect from a “100% hit”.
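To make the arithmetic concrete, here is a small Python sketch. It is a toy model and not the actual ESP rank formula: the function names and the field names “title” and “body” are made up for illustration, and only the 100 + 100 = 200 behaviour comes from the observation above.

```python
def additive_context_score(weights, hit_fields):
    """Toy model of the observed behaviour: weights of all hit fields are summed."""
    return sum(w for name, w in weights.items() if name in hit_fields)


def normalized_context_score(weights, hit_fields):
    """Toy model of the intuitive expectation: all fields together give 100."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return 100.0 * additive_context_score(weights, hit_fields) / total


# Hypothetical field names, both weighted 100 as in the example above.
weights = {"title": 100, "body": 100}
both = {"title", "body"}

print(additive_context_score(weights, both))    # 200
print(normalized_context_score(weights, both))  # 100.0
```

If you want the weights to behave like percentages, you have to pick them yourself so that they add up to 100.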
The single-field-composite warning
A search on a non-composite field does not generate dynamic rank. To get dynamic rank you need to wrap the field in a composite field, which is perfectly OK. However, when you bliss (upload) the index profile, ESP will spew out warnings. These can be ignored. (But remember to add the composite field reference to the rank profile, as usual, or it will have no effect.)
The context/occurrence oddity
A frequently encountered problem is a composite whose fields have values that contain the same tokens (words). A search will then get context hits in multiple fields and a rank contribution from each of them. In most cases, however, you only want a rank contribution from the hit in the highest weighted field. I have tried to turn off whatever I could find in the config files, but have not been able to solve this problem. There are, however, a few ways around it.
One involves ripping out duplicate words from the lower weighted fields during document processing, but that makes those fields useless for other purposes.
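As a rough illustration of this first workaround, the sketch below removes words from lower weighted fields when they already occur in a higher weighted field. It is a standalone function, not an ESP document processor: the function name is made up, and splitting on whitespace with case-insensitive matching is an assumption that will differ from ESP's real tokenization.

```python
def strip_duplicate_tokens(fields_by_weight):
    """Remove words from lower weighted fields that occur in higher weighted ones.

    fields_by_weight: list of (field_name, text), ordered from the highest
    to the lowest weight.
    """
    seen = set()
    result = {}
    for name, text in fields_by_weight:
        tokens = text.split()
        # Keep only the words that no higher weighted field contains.
        result[name] = " ".join(t for t in tokens if t.lower() not in seen)
        # Register this field's words only after it has been processed,
        # so repeated words inside a single field are left alone.
        seen.update(t.lower() for t in tokens)
    return result


print(strip_duplicate_tokens([
    ("title", "Fast ESP rank"),
    ("body", "Tuning rank in fast ESP profiles"),
]))
# {'title': 'Fast ESP rank', 'body': 'Tuning in profiles'}
```

The output also shows the drawback: the stripped body field is no longer readable text.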
Another solution involves splitting the fields into separate single-field composite fields and rewriting the query so that it contains one part per field, each searching that field with the same words, joined with the ANY operator.
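A minimal sketch of such a query rewrite could look like the following. It assumes the query is written in FQL and uses the any() operator with field-scoped string() terms; the helper name and the composite field names are hypothetical, and the exact syntax should be checked against the FQL reference for your ESP version.

```python
def build_any_query(words, composite_fields):
    """Build one query part per composite field and join the parts with any()."""
    for word in words:
        if '"' in word:
            # Quote escaping is deliberately left out of this sketch.
            raise ValueError("double quotes in terms are not handled here")
    terms = " ".join(words)
    parts = [
        '{}:string("{}", mode="and")'.format(field, terms)
        for field in composite_fields
    ]
    if len(parts) == 1:
        return parts[0]  # nothing to join for a single field
    return "any({})".format(", ".join(parts))


# Hypothetical single-field composites wrapping a title and a body field.
print(build_any_query(["fast", "esp"], ["titlecomp", "bodycomp"]))
```

The query grows with the number of fields, so this is only practical for a handful of them.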
A third solution is to join the fields in question in a field-ref-group in the composite field. Hits in several of the grouped fields are then counted as a single hit in the group, and the rank contribution is assigned according to the field-ref-group’s weight. But you will no longer be able to assign individual weights to each field.
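The effect of grouping can be pictured with another toy model. This is not ESP's implementation; the function, the group names and the weights are invented purely to show that a group contributes its weight once, no matter how many of its fields are hit.

```python
def grouped_context_score(groups, hit_fields):
    """Toy model: each group adds its weight once if any of its fields is hit.

    groups: {group_name: (weight, [field names])}
    """
    return sum(
        weight
        for weight, fields in groups.values()
        if any(field in hit_fields for field in fields)
    )


# Hypothetical grouping: the two heading-like fields share one weight.
groups = {
    "headings": (100, ["title", "subtitle"]),
    "text": (50, ["body"]),
}

print(grouped_context_score(groups, {"title", "subtitle"}))          # 100
print(grouped_context_score(groups, {"title", "subtitle", "body"}))  # 150
```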
The default oddity
One composite field must be tagged with
default="yes"
If you have no default composite field, ESP can start to act funny, such as swapping int32 and string types in the result(!)
The quality oddity
This refers to the static boost contribution to the result defined by the “quality” element of the rank-profile.
If the quality element is not present, the default weight is 50, and the default quality field is “hwboost”. This is a magic field that is hard coded and not defined in the index profile. However, try to specify
<quality weight="50" field-ref="hwboost"/>
and you will get an error saying that the field is not defined. It seems safe to define the field explicitly in the index profile as
<field name="hwboost" type="uint32" index="yes" sort="yes"/>
The default start value of hwboost is 10000, and this can be added to or subtracted from during document processing.
The quality weight is limited to steps of 50 (0, 50, 100, 150, …). These values are actually transformed to the multipliers 0, 1, 2, … So a weight value of “50” does not mean half or 50%. With a default hwboost, a weight of “50” transforms to multiplier 1, i.e. 1*10000=10000.
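The mapping from quality weight to static boost can be written out as a short Python sketch. The function names are made up, and rejecting weights that are not multiples of 50 is an assumption of this sketch; what ESP itself does with such values is not covered here.

```python
def quality_multiplier(weight):
    """Map a quality weight (0, 50, 100, ...) to its multiplier (0, 1, 2, ...)."""
    if weight < 0 or weight % 50 != 0:
        # Assumption of this sketch: only whole steps of 50 are accepted.
        raise ValueError("quality weight must be a non-negative multiple of 50")
    return weight // 50


def static_boost(weight, hwboost=10000):
    """Static rank contribution: multiplier times the document's hwboost value."""
    return quality_multiplier(weight) * hwboost


print(static_boost(50))          # 10000 (default hwboost, multiplier 1)
print(static_boost(100))         # 20000
print(static_boost(50, 25000))   # 25000 (document with hwboost raised to 25000)
```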
You can specify your own quality field. It must be of type uint32 (not int32!), with index and sort set to “yes”.
After changing the values, run
bliss-core -C index-profile.xml
view-admin -m refresh
and then wait a minute or so for the views to refresh.
Hardcore FAST ESP rank-tuning insights. Great way to introduce yourself, Hans Terje :-)
I know from my own experience that hwboost can be activated for empty queries by adding the directive “expandbitvector” to fsearch.addon. By default, empty queries get zero rank.
Hopefully someone will write a blog post about the other gory details of the FAST ESP rank model, like freshness boosting and the entire rank computation algorithm itself.
Great work, Hans Terje. The issues you pinpoint will be of great value to our colleagues and other search specialists, helping them save time and get it right the first time!
Hello,
I have an index profile with accented characters.
I’m using the export command to export the index profile. Then I try to import the XML that has just been exported, without any modification, and yet I get an error because the file does not match the expected encoding (UTF-8, ANSI…).
I’ve tried changing the file’s charset, but it still doesn’t work! Any ideas?
ERRATA:
My problem is in the search profile, not in the index profile.
Hi exstyle,
Just to clarify: are you saying the name of your search profile contains a letter with a diacritic? And you’re using the exportsearchprofile / importsearchprofile tools?
Hello,
Yes, I’m using importsearchprofile/exportsearchprofile.
I get a MySearchProfile.zip from exportsearchprofile. In it there is a file PreviewView.xml which contains some French letters such as é à è ù…
When I try to import it back (with importsearchprofile and without any modification) I get an encoding error:
View validation errors:
Fatal : Invalid byte 2 of 3-byte UTF-8 sequence.
I’m sure the problem is in PreviewView.xml, because I tried importing it without the French letters and it worked.
That sounds like a bug. If you haven’t already, make sure to apply all relevant ESP patches (for this particular problem look for the “adminserver” patches), or contact MS technical support.
Thanks, it seems that I don’t have the latest patch.
I will see if it works better with it.
IDocument doc = DocumentFactory.newDocument(uniqueid);
doc.addElement(DocumentFactory.newInteger("hwboost", 25000));
With the Java API, you can boost the ranking with the code above.