[TYPO3-Solr]  Explanation of scoring wanted
Jigal van Hemert
jigal.van.hemert at typo3.org
Fri Mar  6 13:25:12 CET 2015

Hi,
For a simple query for "afval" (Dutch for "garbage"), run directly in the
Solr server admin interface without any further boosting, etc., there
is a huge difference between the scores of Nutch items and TYPO3 pages.
Nutch records score between a few thousand and a few hundred thousand,
while the pages have a maximum score of a little over 2.0.
I know the score has no meaning as an absolute number, but a difference
of this size can hardly be compensated for by any boosting settings.
Here is the debugQuery output ("domain" is a placeholder for the actual
domain name):
[... lots of nutch records skipped ...]
<str name="c293c0a7c8d3311249d309c91f39e5e5b192b6c0/tx_nutch_external/https://domain/Loket/prodcat/products/getProductDetailsAction.do?name=Asbestverwijdering+bedrijfsmatig">
14760.001 = (MATCH) sum of:
   14760.001 = (MATCH) max of:
     14760.001 = (MATCH) weight(content:afval^40.0 in 6617), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       14760.002 = (MATCH) fieldWeight(content:afval in 6617), product of:
         1.0 = tf(termFreq(content:afval)=1)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         3072.0 = fieldNorm(field=content, doc=6617)
</str>
<str name="c293c0a7c8d3311249d309c91f39e5e5b192b6c0/tx_nutch_external/https://domain/Loket/knowledgebase/faqs/getFaqContentAction.do?id=725">
6150.0 = (MATCH) sum of:
   6150.0 = (MATCH) max of:
     6150.0 = (MATCH) weight(content:afval^40.0 in 5877), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       6150.0005 = (MATCH) fieldWeight(content:afval in 5877), product of:
         1.0 = tf(termFreq(content:afval)=1)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         1280.0 = fieldNorm(field=content, doc=5877)
</str>
<str name="102b19e401862068820dd53b4a1beccb286f03a7/pages/27363/0/0/0">
2.1233919 = (MATCH) sum of:
   2.1233919 = (MATCH) max of:
     2.1233919 = (MATCH) weight(content:afval^40.0 in 493), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       2.123392 = (MATCH) fieldWeight(content:afval in 493), product of:
         1.4142135 = tf(termFreq(content:afval)=2)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.3125 = fieldNorm(field=content, doc=493)
     1.1733533 = (MATCH) weight(title:afval^5.0 in 493), product of:
       0.17471766 = queryWeight(title:afval^5.0), product of:
         5.0 = boost
         6.715711 = idf(docFreq=24, maxDocs=7590)
         0.0052032513 = queryNorm
       6.715711 = (MATCH) fieldWeight(title:afval in 493), product of:
         1.0 = tf(termFreq(title:afval)=1)
         6.715711 = idf(docFreq=24, maxDocs=7590)
         1.0 = fieldNorm(field=title, doc=493)
     1.500486 = (MATCH) weight(tagsH2H3:afval^3.0 in 493), product of:
       0.11628768 = queryWeight(tagsH2H3:afval^3.0), product of:
         3.0 = boost
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         0.0052032513 = queryNorm
       12.903225 = (MATCH) fieldWeight(tagsH2H3:afval in 493), product of:
         1.7320508 = tf(termFreq(tagsH2H3:afval)=3)
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         1.0 = fieldNorm(field=tagsH2H3, doc=493)
</str>
<str name="102b19e401862068820dd53b4a1beccb286f03a7/pages/7844/0/0/0">
1.7667065 = (MATCH) sum of:
   1.7667065 = (MATCH) max of:
     1.1917508 = (MATCH) weight(content:afval^40.0 in 3750), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       1.1917509 = (MATCH) fieldWeight(content:afval in 3750), product of:
         2.6457512 = tf(termFreq(content:afval)=7)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.09375 = fieldNorm(field=content, doc=3750)
     1.1733533 = (MATCH) weight(title:afval^5.0 in 3750), product of:
       0.17471766 = queryWeight(title:afval^5.0), product of:
         5.0 = boost
         6.715711 = idf(docFreq=24, maxDocs=7590)
         0.0052032513 = queryNorm
       6.715711 = (MATCH) fieldWeight(title:afval in 3750), product of:
         1.0 = tf(termFreq(title:afval)=1)
         6.715711 = idf(docFreq=24, maxDocs=7590)
         1.0 = fieldNorm(field=title, doc=3750)
     1.7667065 = (MATCH) weight(keywords:afval^2.0 in 3750), product of:
       0.08663568 = queryWeight(keywords:afval^2.0), product of:
         2.0 = boost
         8.325149 = idf(docFreq=4, maxDocs=7590)
         0.0052032513 = queryNorm
       20.392366 = (MATCH) fieldWeight(keywords:afval in 3750), product of:
         2.4494898 = tf(termFreq(keywords:afval)=6)
         8.325149 = idf(docFreq=4, maxDocs=7590)
         1.0 = fieldNorm(field=keywords, doc=3750)
     1.500486 = (MATCH) weight(tagsH2H3:afval^3.0 in 3750), product of:
       0.11628768 = queryWeight(tagsH2H3:afval^3.0), product of:
         3.0 = boost
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         0.0052032513 = queryNorm
       12.903225 = (MATCH) fieldWeight(tagsH2H3:afval in 3750), product of:
         1.7320508 = tf(termFreq(tagsH2H3:afval)=3)
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         1.0 = fieldNorm(field=tagsH2H3, doc=3750)
</str>
[... lots of page documents skipped ...]
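To make the numbers easier to compare, here is a minimal Python sketch that multiplies out the factors from the explain output above. It assumes what the output itself suggests: queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm with tf = sqrt(termFreq), and the final score is the maximum over the matching field clauses. The function names and the dictionary layout are made up for illustration only; they are not part of Solr.

```python
import math

# Values copied from the debugQuery output above.
QUERY_NORM = 0.0052032513
IDF_CONTENT = 4.804688      # idf(docFreq=168, maxDocs=7590)
CONTENT_BOOST = 40.0


def query_weight(boost, idf, query_norm):
    """queryWeight as listed in the output: boost * idf * queryNorm."""
    return boost * idf * query_norm


def field_weight(term_freq, idf, field_norm):
    """fieldWeight as listed in the output: sqrt(termFreq) * idf * fieldNorm."""
    return math.sqrt(term_freq) * idf * field_norm


# (termFreq, fieldNorm) of the content field for each document shown.
content_factors = {
    "nutch doc 6617": (1, 3072.0),
    "nutch doc 5877": (1, 1280.0),
    "page doc 493": (2, 0.3125),
    "page doc 3750": (7, 0.09375),
}

qw = query_weight(CONTENT_BOOST, IDF_CONTENT, QUERY_NORM)
content_scores = {
    name: qw * field_weight(term_freq, IDF_CONTENT, field_norm)
    for name, (term_freq, field_norm) in content_factors.items()
}

# For page 7844 (doc 3750) the "max of" line picks the best field clause;
# the per-clause weights are taken as-is from the output.
clauses_doc_3750 = {
    "content": 1.1917508,
    "title": 1.1733533,
    "keywords": 1.7667065,
    "tagsH2H3": 1.500486,
}
best_field_3750 = max(clauses_doc_3750, key=clauses_doc_3750.get)

if __name__ == "__main__":
    for name, score in content_scores.items():
        print(f"{name}: content clause score = {score:.4f}")
    print("best clause for doc 3750:", best_field_3750)
```

In this calculation boost, idf and queryNorm are identical for all four documents and tf only ranges from 1.0 to about 2.65, so the only factor that differs by orders of magnitude is fieldNorm on the content field (3072.0 and 1280.0 for the Nutch records versus 0.3125 and 0.09375 for the pages).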
Can anyone explain the huge differences a bit? Thanks in advance!
-- 
Jigal van Hemert
TYPO3 CMS Active Contributor
TYPO3 .... inspiring people to share!
Get involved: typo3.org