[TYPO3-Solr] Explanation of scoring wanted

Jigal van Hemert jigal.van.hemert at typo3.org
Fri Mar 6 13:25:12 CET 2015


Hi,

A simple query for "afval" ("garbage" in Dutch), run directly in the
Solr server admin interface without any further boosting, shows a huge
difference between the scores of Nutch items and TYPO3 pages. Nutch
records score between a few thousand and a few hundred thousand, while
the pages have a maximum score of a little over 2.0.

I know the score value has no meaning as an absolute number, but a gap
this large can hardly be bridged by any boosting settings.

Excerpt of the debugQuery output ("domain" is a placeholder for the
actual domain name):

[... lots of nutch records skipped ...]
<str name="c293c0a7c8d3311249d309c91f39e5e5b192b6c0/tx_nutch_external/https://domain/Loket/prodcat/products/getProductDetailsAction.do?name=Asbestverwijdering+bedrijfsmatig">
14760.001 = (MATCH) sum of:
   14760.001 = (MATCH) max of:
     14760.001 = (MATCH) weight(content:afval^40.0 in 6617), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       14760.002 = (MATCH) fieldWeight(content:afval in 6617), product of:
         1.0 = tf(termFreq(content:afval)=1)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         3072.0 = fieldNorm(field=content, doc=6617)
</str>
<str name="c293c0a7c8d3311249d309c91f39e5e5b192b6c0/tx_nutch_external/https://domain/Loket/knowledgebase/faqs/getFaqContentAction.do?id=725">
6150.0 = (MATCH) sum of:
   6150.0 = (MATCH) max of:
     6150.0 = (MATCH) weight(content:afval^40.0 in 5877), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       6150.0005 = (MATCH) fieldWeight(content:afval in 5877), product of:
         1.0 = tf(termFreq(content:afval)=1)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         1280.0 = fieldNorm(field=content, doc=5877)
</str>
<str name="102b19e401862068820dd53b4a1beccb286f03a7/pages/27363/0/0/0">
2.1233919 = (MATCH) sum of:
   2.1233919 = (MATCH) max of:
     2.1233919 = (MATCH) weight(content:afval^40.0 in 493), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       2.123392 = (MATCH) fieldWeight(content:afval in 493), product of:
         1.4142135 = tf(termFreq(content:afval)=2)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.3125 = fieldNorm(field=content, doc=493)
     1.1733533 = (MATCH) weight(title:afval^5.0 in 493), product of:
       0.17471766 = queryWeight(title:afval^5.0), product of:
         5.0 = boost
         6.715711 = idf(docFreq=24, maxDocs=7590)
         0.0052032513 = queryNorm
       6.715711 = (MATCH) fieldWeight(title:afval in 493), product of:
         1.0 = tf(termFreq(title:afval)=1)
         6.715711 = idf(docFreq=24, maxDocs=7590)
         1.0 = fieldNorm(field=title, doc=493)
     1.500486 = (MATCH) weight(tagsH2H3:afval^3.0 in 493), product of:
       0.11628768 = queryWeight(tagsH2H3:afval^3.0), product of:
         3.0 = boost
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         0.0052032513 = queryNorm
       12.903225 = (MATCH) fieldWeight(tagsH2H3:afval in 493), product of:
         1.7320508 = tf(termFreq(tagsH2H3:afval)=3)
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         1.0 = fieldNorm(field=tagsH2H3, doc=493)
</str>
<str name="102b19e401862068820dd53b4a1beccb286f03a7/pages/7844/0/0/0">
1.7667065 = (MATCH) sum of:
   1.7667065 = (MATCH) max of:
     1.1917508 = (MATCH) weight(content:afval^40.0 in 3750), product of:
       0.99999994 = queryWeight(content:afval^40.0), product of:
         40.0 = boost
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.0052032513 = queryNorm
       1.1917509 = (MATCH) fieldWeight(content:afval in 3750), product of:
         2.6457512 = tf(termFreq(content:afval)=7)
         4.804688 = idf(docFreq=168, maxDocs=7590)
         0.09375 = fieldNorm(field=content, doc=3750)
     1.1733533 = (MATCH) weight(title:afval^5.0 in 3750), product of:
       0.17471766 = queryWeight(title:afval^5.0), product of:
         5.0 = boost
         6.715711 = idf(docFreq=24, maxDocs=7590)
         0.0052032513 = queryNorm
       6.715711 = (MATCH) fieldWeight(title:afval in 3750), product of:
         1.0 = tf(termFreq(title:afval)=1)
         6.715711 = idf(docFreq=24, maxDocs=7590)
         1.0 = fieldNorm(field=title, doc=3750)
     1.7667065 = (MATCH) weight(keywords:afval^2.0 in 3750), product of:
       0.08663568 = queryWeight(keywords:afval^2.0), product of:
         2.0 = boost
         8.325149 = idf(docFreq=4, maxDocs=7590)
         0.0052032513 = queryNorm
       20.392366 = (MATCH) fieldWeight(keywords:afval in 3750), product of:
         2.4494898 = tf(termFreq(keywords:afval)=6)
         8.325149 = idf(docFreq=4, maxDocs=7590)
         1.0 = fieldNorm(field=keywords, doc=3750)
     1.500486 = (MATCH) weight(tagsH2H3:afval^3.0 in 3750), product of:
       0.11628768 = queryWeight(tagsH2H3:afval^3.0), product of:
         3.0 = boost
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         0.0052032513 = queryNorm
       12.903225 = (MATCH) fieldWeight(tagsH2H3:afval in 3750), product of:
         1.7320508 = tf(termFreq(tagsH2H3:afval)=3)
         7.4496803 = idf(docFreq=11, maxDocs=7590)
         1.0 = fieldNorm(field=tagsH2H3, doc=3750)
</str>
[... lots of page documents skipped ...]
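
For reference, the figures in the output above can be recomputed by
hand. The following is only a minimal sketch, assuming the index uses
Lucene's classic TF-IDF similarity (tf = sqrt(termFreq),
idf = 1 + ln(maxDocs / (docFreq + 1))); the helper names idf and
field_score are made up for this illustration, and all input numbers are
copied from the explain output.

```python
import math

QUERY_NORM = 0.0052032513  # queryNorm from the explain output
MAX_DOCS = 7590            # maxDocs from the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Classic Lucene idf (assumed): 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_score(boost, term_freq, doc_freq, field_norm):
    """queryWeight * fieldWeight for one field, as in the explain tree."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(term_freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# Content field of the two Nutch records and of page 27363 (doc 493).
nutch_6617 = field_score(40.0, 1, 168, 3072.0)
nutch_5877 = field_score(40.0, 1, 168, 1280.0)
page_493 = field_score(40.0, 2, 168, 0.3125)

# Page 7844 (doc 3750) matches in four fields; "max of" keeps the best.
page_7844 = max(
    field_score(40.0, 7, 168, 0.09375),  # content
    field_score(5.0, 1, 24, 1.0),        # title
    field_score(2.0, 6, 4, 1.0),         # keywords
    field_score(3.0, 3, 11, 1.0),        # tagsH2H3
)

for label, score in [
    ("nutch doc 6617", nutch_6617),
    ("nutch doc 5877", nutch_5877),
    ("page doc 493", page_493),
    ("page doc 3750", page_7844),
]:
    print(f"{label}: {score:.4f}")
```

Within the content field, boost, idf and queryNorm are identical for
all documents, and tf only varies between 1.0 and about 2.6. The one
factor that differs by orders of magnitude is fieldNorm: 3072.0 and
1280.0 for the Nutch records versus 0.3125 and 0.09375 for the pages.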

Can anyone explain the huge differences a bit? Thanks in advance!

-- 
Jigal van Hemert
TYPO3 CMS Active Contributor

TYPO3 .... inspiring people to share!
Get involved: typo3.org

