[TYPO3-Solr] Stemming and config
Jigal van Hemert
jigal.van.hemert at typo3.org
Wed Jun 12 11:58:08 CEST 2013
Hi,
On 12-6-2013 8:54, Rik Willems wrote:
> In typo3cores/conf/dutch/dutch-common-nouns.txt I have the following:
> reiskosten
> reiskostenaftrek
> reiskostenforfait
> reiskostenregeling
> reiskostenvergoeding
The whole words should already be produced as tokens by Solr's standard
tokenizer, so they do not need to be in the dictionary. Normally I would
expect the dictionary to contain the real sub-words of the words in your list:
reis
kosten
reiskosten
aftrek
forfait
regeling
vergoeding
> <!-- split subwords dutch nouns -->
> <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
> dictionary="dutch/dutch-common-nouns.txt"
> minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
> onlyLongestMatch="true"/>
With onlyLongestMatch="true" the filter would match "reiskostenvergoeding"
(which is in the dictionary) as the longest entry, and none of the shorter
subwords would be included in the index (as far as I understand this filter
factory).
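
To make the effect of the dictionary contents and of onlyLongestMatch easier to reason about, here is a rough Python sketch of dictionary-based compound splitting. It is a simplified model and not the Lucene/Solr code: the function name `decompose` and the assumption that the filter scans every start offset and keeps at most one (the longest) hit per offset when onlyLongestMatch is set are mine, so the real filter may behave differently depending on the Solr version. The parameter defaults mirror the values in the quoted config, and the dictionary is the sub-word list suggested above. As far as I know the real filter also keeps the original token; the sketch only returns the extra sub-words.

```python
def decompose(word, dictionary, min_word_size=5, min_subword_size=4,
              max_subword_size=15, only_longest_match=True):
    """Return the dictionary sub-words found inside `word` (simplified model)."""
    # Words shorter than min_word_size are not decomposed at all.
    if len(word) < min_word_size:
        return []
    subwords = []
    # Try every start offset that still leaves room for a minimal sub-word.
    for start in range(len(word) - min_subword_size + 1):
        longest = None
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(word):
                break
            candidate = word[start:start + size]
            if candidate in dictionary:
                if only_longest_match:
                    # Sizes increase, so the last hit at this offset is the longest.
                    longest = candidate
                else:
                    subwords.append(candidate)
        if only_longest_match and longest is not None:
            subwords.append(longest)
    return subwords


# Hypothetical dictionary: the sub-words suggested in this reply.
SUBWORDS = {"reis", "kosten", "reiskosten", "aftrek",
            "forfait", "regeling", "vergoeding"}

longest_only = decompose("reiskostenvergoeding", SUBWORDS, only_longest_match=True)
all_matches = decompose("reiskostenvergoeding", SUBWORDS, only_longest_match=False)
print(longest_only)
print(all_matches)
```

Under this model the sub-word dictionary yields "reiskosten", "kosten" and "vergoeding" with only_longest_match=True, and additionally "reis" with only_longest_match=False, because "reis" and "reiskosten" start at the same offset and only the longer one survives.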
--
Jigal van Hemert
TYPO3 CMS Active Contributor
TYPO3 .... inspiring people to share!
Get involved: typo3.org