[TYPO3-Solr] Explanation of scoring wanted
Jigal van Hemert
jigal.van.hemert at typo3.org
Fri Mar 6 13:25:12 CET 2015
Hi,
On a simple query for "afval" ("garbage" in Dutch), run directly in the
Solr server admin interface without any further boosting or other tweaks,
there is a huge difference between the scores of Nutch items and TYPO3
pages. Nutch records score between a few thousand and a few hundred
thousand, while the pages have a maximum score of a little over 2.0.
I know the score has no meaning as an absolute number, but a difference
this large can hardly be compensated for by any boosting settings.
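To make the point about boosting concrete: in Lucene's classic TF-IDF scoring, which the debugQuery output below follows, each field clause scores queryWeight × fieldWeight. Here queryWeight = boost × idf × queryNorm is the same for every document, while fieldWeight = tf × idf × fieldNorm is per document. A query-time boost on the content field therefore rescales every document's content clause by the same factor and cannot change the ratio between two documents. The following Python sketch is for illustration only; the helper names are hypothetical and the per-document numbers (termFreq, fieldNorm, docFreq, maxDocs, queryNorm) are copied from the output below.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Classic Lucene TF-IDF: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def content_clause(boost: float, query_norm: float, freq: int, field_norm: float) -> float:
    """Hypothetical helper: score of the content:afval clause = queryWeight * fieldWeight."""
    i = idf(168, 7590)                               # docFreq / maxDocs from the output
    query_weight = boost * i * query_norm            # identical for every document
    field_weight = math.sqrt(freq) * i * field_norm  # tf = sqrt(termFreq)
    return query_weight * field_weight


QUERY_NORM = 0.0052032513  # reported queryNorm; a per-query constant


def nutch_to_page_ratio(boost: float) -> float:
    # Nutch doc 6617: termFreq=1, fieldNorm=3072.0
    # TYPO3 page doc 493: termFreq=2, fieldNorm=0.3125
    nutch = content_clause(boost, QUERY_NORM, 1, 3072.0)
    page = content_clause(boost, QUERY_NORM, 2, 0.3125)
    return nutch / page


for boost in (1.0, 40.0, 400.0):
    print(f"content boost {boost:>5}: Nutch/page ratio = {nutch_to_page_ratio(boost):.1f}")
```

Every boost yields the same ratio (roughly 6951), because both boost and queryNorm cancel out; only the per-document fieldWeight factors remain.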
Here is the debugQuery output ("domain" is a placeholder for the actual
domain name):
[... lots of Nutch records skipped ...]
<str name="c293c0a7c8d3311249d309c91f39e5e5b192b6c0/tx_nutch_external/https://domain/Loket/prodcat/products/getProductDetailsAction.do?name=Asbestverwijdering+bedrijfsmatig">
14760.001 = (MATCH) sum of:
  14760.001 = (MATCH) max of:
    14760.001 = (MATCH) weight(content:afval^40.0 in 6617), product of:
      0.99999994 = queryWeight(content:afval^40.0), product of:
        40.0 = boost
        4.804688 = idf(docFreq=168, maxDocs=7590)
        0.0052032513 = queryNorm
      14760.002 = (MATCH) fieldWeight(content:afval in 6617), product of:
        1.0 = tf(termFreq(content:afval)=1)
        4.804688 = idf(docFreq=168, maxDocs=7590)
        3072.0 = fieldNorm(field=content, doc=6617)
</str>
<str name="c293c0a7c8d3311249d309c91f39e5e5b192b6c0/tx_nutch_external/https://domain/Loket/knowledgebase/faqs/getFaqContentAction.do?id=725">
6150.0 = (MATCH) sum of:
  6150.0 = (MATCH) max of:
    6150.0 = (MATCH) weight(content:afval^40.0 in 5877), product of:
      0.99999994 = queryWeight(content:afval^40.0), product of:
        40.0 = boost
        4.804688 = idf(docFreq=168, maxDocs=7590)
        0.0052032513 = queryNorm
      6150.0005 = (MATCH) fieldWeight(content:afval in 5877), product of:
        1.0 = tf(termFreq(content:afval)=1)
        4.804688 = idf(docFreq=168, maxDocs=7590)
        1280.0 = fieldNorm(field=content, doc=5877)
</str>
<str name="102b19e401862068820dd53b4a1beccb286f03a7/pages/27363/0/0/0">
2.1233919 = (MATCH) sum of:
  2.1233919 = (MATCH) max of:
    2.1233919 = (MATCH) weight(content:afval^40.0 in 493), product of:
      0.99999994 = queryWeight(content:afval^40.0), product of:
        40.0 = boost
        4.804688 = idf(docFreq=168, maxDocs=7590)
        0.0052032513 = queryNorm
      2.123392 = (MATCH) fieldWeight(content:afval in 493), product of:
        1.4142135 = tf(termFreq(content:afval)=2)
        4.804688 = idf(docFreq=168, maxDocs=7590)
        0.3125 = fieldNorm(field=content, doc=493)
    1.1733533 = (MATCH) weight(title:afval^5.0 in 493), product of:
      0.17471766 = queryWeight(title:afval^5.0), product of:
        5.0 = boost
        6.715711 = idf(docFreq=24, maxDocs=7590)
        0.0052032513 = queryNorm
      6.715711 = (MATCH) fieldWeight(title:afval in 493), product of:
        1.0 = tf(termFreq(title:afval)=1)
        6.715711 = idf(docFreq=24, maxDocs=7590)
        1.0 = fieldNorm(field=title, doc=493)
    1.500486 = (MATCH) weight(tagsH2H3:afval^3.0 in 493), product of:
      0.11628768 = queryWeight(tagsH2H3:afval^3.0), product of:
        3.0 = boost
        7.4496803 = idf(docFreq=11, maxDocs=7590)
        0.0052032513 = queryNorm
      12.903225 = (MATCH) fieldWeight(tagsH2H3:afval in 493), product of:
        1.7320508 = tf(termFreq(tagsH2H3:afval)=3)
        7.4496803 = idf(docFreq=11, maxDocs=7590)
        1.0 = fieldNorm(field=tagsH2H3, doc=493)
</str>
<str name="102b19e401862068820dd53b4a1beccb286f03a7/pages/7844/0/0/0">
1.7667065 = (MATCH) sum of:
  1.7667065 = (MATCH) max of:
    1.1917508 = (MATCH) weight(content:afval^40.0 in 3750), product of:
      0.99999994 = queryWeight(content:afval^40.0), product of:
        40.0 = boost
        4.804688 = idf(docFreq=168, maxDocs=7590)
        0.0052032513 = queryNorm
      1.1917509 = (MATCH) fieldWeight(content:afval in 3750), product of:
        2.6457512 = tf(termFreq(content:afval)=7)
        4.804688 = idf(docFreq=168, maxDocs=7590)
        0.09375 = fieldNorm(field=content, doc=3750)
    1.1733533 = (MATCH) weight(title:afval^5.0 in 3750), product of:
      0.17471766 = queryWeight(title:afval^5.0), product of:
        5.0 = boost
        6.715711 = idf(docFreq=24, maxDocs=7590)
        0.0052032513 = queryNorm
      6.715711 = (MATCH) fieldWeight(title:afval in 3750), product of:
        1.0 = tf(termFreq(title:afval)=1)
        6.715711 = idf(docFreq=24, maxDocs=7590)
        1.0 = fieldNorm(field=title, doc=3750)
    1.7667065 = (MATCH) weight(keywords:afval^2.0 in 3750), product of:
      0.08663568 = queryWeight(keywords:afval^2.0), product of:
        2.0 = boost
        8.325149 = idf(docFreq=4, maxDocs=7590)
        0.0052032513 = queryNorm
      20.392366 = (MATCH) fieldWeight(keywords:afval in 3750), product of:
        2.4494898 = tf(termFreq(keywords:afval)=6)
        8.325149 = idf(docFreq=4, maxDocs=7590)
        1.0 = fieldNorm(field=keywords, doc=3750)
    1.500486 = (MATCH) weight(tagsH2H3:afval^3.0 in 3750), product of:
      0.11628768 = queryWeight(tagsH2H3:afval^3.0), product of:
        3.0 = boost
        7.4496803 = idf(docFreq=11, maxDocs=7590)
        0.0052032513 = queryNorm
      12.903225 = (MATCH) fieldWeight(tagsH2H3:afval in 3750), product of:
        1.7320508 = tf(termFreq(tagsH2H3:afval)=3)
        7.4496803 = idf(docFreq=11, maxDocs=7590)
        1.0 = fieldNorm(field=tagsH2H3, doc=3750)
</str>
[... lots of page documents skipped ...]
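As a sanity check on the output above, the following Python sketch recomputes each document's score from the factors listed in its explanation. It assumes Lucene's classic TF-IDF formulas (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), clause = queryWeight × fieldWeight) and a DisMax combination with a tie breaker of 0, which is what the "max of" lines suggest. The helper names are hypothetical; all numbers are copied from the output.

```python
import math

QUERY_NORM = 0.0052032513
MAX_DOCS = 7590

# Query-time boost and docFreq per field, as reported in the debugQuery output
FIELDS = {
    "content": (40.0, 168),
    "title": (5.0, 24),
    "tagsH2H3": (3.0, 11),
    "keywords": (2.0, 4),
}


def idf(doc_freq: int) -> float:
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def clause(field: str, freq: int, field_norm: float) -> float:
    """One weight(...) node: queryWeight * fieldWeight."""
    boost, doc_freq = FIELDS[field]
    i = idf(doc_freq)
    return (boost * i * QUERY_NORM) * (math.sqrt(freq) * i * field_norm)


def dismax(matches, tie: float = 0.0) -> float:
    """The "max of" node: best clause plus tie * the others (tie assumed 0)."""
    scores = [clause(f, freq, norm) for f, freq, norm in matches]
    best = max(scores)
    return best + tie * (sum(scores) - best)


# (field, termFreq, fieldNorm) per matching clause, copied from the output
DOCS = {
    "nutch 6617": [("content", 1, 3072.0)],
    "nutch 5877": [("content", 1, 1280.0)],
    "page 493": [("content", 2, 0.3125), ("title", 1, 1.0), ("tagsH2H3", 3, 1.0)],
    "page 3750": [("content", 7, 0.09375), ("title", 1, 1.0),
                  ("keywords", 6, 1.0), ("tagsH2H3", 3, 1.0)],
}

# The outer "sum of" node has a single child here, so it equals the dismax score
scores = {name: dismax(matches) for name, matches in DOCS.items()}
for name, score in scores.items():
    print(f"{name:>10}: {score:.4f}")
```

The recomputed scores match the reported ones. The query-side factors are identical for all four documents; the factor that differs by orders of magnitude is fieldNorm (3072.0 and 1280.0 for the Nutch records versus 0.3125 and 0.09375 for the pages).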
Can anyone explain the huge differences a bit? Thanks in advance!
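The fieldNorm values can be read as follows, assuming Lucene 4.x's DefaultSimilarity. At index time, the length norm 1/sqrt(numTerms) is multiplied by any index-time boosts and then stored in a single byte via SmallFloat's byte315 encoding, which keeps only a few mantissa bits. Because the length norm alone is at most 1.0 for a field with at least one term, values such as 3072.0 and 1280.0 would point to an index-time boost on the Nutch documents. This is an inference from the formula, not something the output above states. The Python port below of the byte encoding is for illustration; the function names are hypothetical.

```python
import math
import struct

# Port of Lucene's SmallFloat.floatToByte315 / byte315ToFloat (norm encoding)
_FZERO = (63 - 15) << 3  # 384


def _float_bits(f: float) -> int:
    # Reinterpret a float32 as a signed 32-bit int (like Java's Float.floatToRawIntBits)
    return struct.unpack(">i", struct.pack(">f", f))[0]


def _bits_float(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def float_to_byte315(f: float) -> int:
    """Encode a norm as one unsigned byte (0..255), truncating the mantissa."""
    bits = _float_bits(f)
    small = bits >> (24 - 3)
    if small <= _FZERO:
        return 0 if bits <= 0 else 1
    if small >= _FZERO + 0x100:
        return 255
    return small - _FZERO


def byte315_to_float(b: int) -> float:
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return _bits_float(bits)


def stored_norm(num_terms: int, boost: float = 1.0) -> float:
    """Hypothetical helper: boost * lengthNorm as it would come back from the index."""
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))


for n in (1, 10, 100):
    print(n, stored_norm(n))
```

For example, `stored_norm(10)` comes back as 0.3125 and `stored_norm(100)` as 0.09375, the same values that appear as fieldNorm for the two pages. Each stored value covers a range of lengths, so this does not reveal the pages' actual term counts.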
--
Jigal van Hemert
TYPO3 CMS Active Contributor
TYPO3 .... inspiring people to share!
Get involved: typo3.org