1
votes

I'm doing a typical full-text search using CONTAINSTABLE with 'ISABOUT(term1,term2,term3)', and although it supports term weighting, that's not what I need. I need the ability to boost the relevancy of terms contained in certain portions of text. For example, it is customary for meta tags or the page title to be weighted differently than body text when searching web pages. Although I'm not dealing with web pages, I do seek the same functionality. In Lucene it's called Document Field Level Boosting. How would one do this natively in SQL Server Full-Text Search?

2
I don't think you are going to find a native solution for this in SQL Server FTS; even the NEAR function is limited. You might be better off with Lucene, dtSearch, or maybe even SharePoint, taking advantage of its meta-tagging features. - u07ch
If someone hasn't already done it, I might write a SQL Server CLR table-valued function that calls Lucene.NET. I'd be curious to see how it measures up. - Snives

2 Answers

1
votes

This is just a thought -- is it possible to isolate the part that needs boosting, rank it separately, and then add the two ranks together? I haven't had time to put it together properly, but let's say you have a 'document' column and a computed 'header' column; you could do something like this:

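declare @term nvarchar(200) = N'search terms';  -- example search string

with compoundResults([KEY], [RANK]) as
(
 -- weighted rank: 70% from the body match, 30% from the header match;
 -- the full outer join keeps rows that match in only one of the two columns
 select
     coalesce(a.[KEY], b.[KEY]),
     coalesce(a.[RANK], 0) * 0.7 + coalesce(b.[RANK], 0) * 0.3
 from FREETEXTTABLE(dbo.Docs, document, @term) a
 full outer join FREETEXTTABLE(dbo.Docs, header, @term) b
  on a.[KEY] = b.[KEY]
)
-- return only the matching documents, best combined rank first
select c.*, d.[RANK]
from dbo.Docs c
 inner join compoundResults d
  on c.TermId = d.[KEY]
order by d.[RANK] desc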

This example uses FREETEXTTABLE rather than CONTAINSTABLE, but the thing to note is the CTE, which selects a weighted rank: seven tenths from the document body and three tenths from the header.
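
For the CONTAINSTABLE case in the question, the same idea might look like the sketch below. It assumes the dbo.Docs table, the TermId key, and the header/document columns from the example above; the @search string and the two weight values are made up for illustration, and here the header gets the larger share since it is the part being boosted. One caveat: full-text rank values are only meaningful relative to other rows in the same query, so adding ranks from two separate queries is a heuristic, not a calibrated score.

```sql
-- Hypothetical inputs: the search condition and both weights are assumptions.
DECLARE @search nvarchar(4000) = N'ISABOUT(term1, term2, term3)';
DECLARE @headerWeight float = 0.7;  -- share taken from matches in the header column
DECLARE @bodyWeight   float = 0.3;  -- share taken from matches in the document body

SELECT d.*,
       COALESCE(h.[RANK], 0) * @headerWeight
     + COALESCE(b.[RANK], 0) * @bodyWeight AS BoostedRank
FROM dbo.Docs AS d
LEFT JOIN CONTAINSTABLE(dbo.Docs, header,   @search) AS h ON h.[KEY] = d.TermId
LEFT JOIN CONTAINSTABLE(dbo.Docs, document, @search) AS b ON b.[KEY] = d.TermId
-- keep only rows that matched in at least one of the two columns
WHERE h.[KEY] IS NOT NULL OR b.[KEY] IS NOT NULL
ORDER BY BoostedRank DESC;
```

Tuning the two weights against a handful of known-good queries is probably the only way to find values that suit your data.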

0
votes

The native functionality you're looking for doesn't exist in SQL Server FTS.

What does your data look like? Would it work to extend the keyword patterns in some way, so that they match the corresponding parts of the document? Something like:

ISABOUT(title ~ keyword ~ title WEIGHT(0.8), keyword WEIGHT(0.2))