[TYPO3-Solr] Stemming and config

Jigal van Hemert jigal.van.hemert at typo3.org
Wed Jun 12 11:58:08 CEST 2013


Hi,

On 12-6-2013 8:54, Rik Willems wrote:
> In typo3cores/conf/dutch/dutch-common-nouns.txt I have the following:
> reiskosten
> reiskostenaftrek
> reiskostenforfait
> reiskostenregeling
> reiskostenvergoeding

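The whole words should already be emitted as tokens by the standard
Solr tokenizer (solr.StandardTokenizerFactory), so they do not need to
be in the dictionary. Normally I would expect the dictionary to contain
the actual sub-words instead: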
reis
kosten
reiskosten
aftrek
forfait
regeling
vergoeding

> <!-- split subwords dutch nouns -->
> <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
> dictionary="dutch/dutch-common-nouns.txt"
> minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
> onlyLongestMatch="true"/>

With onlyLongestMatch="true" the filter would match the longest
dictionary entry, "reiskostenvergoeding" (which is in the dictionary),
and none of the shorter subwords would be added to the index (as far as
I understand this filter factory).
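
A minimal sketch of how the filter could be configured to index the
sub-words, assuming dutch/dutch-common-nouns.txt is changed to hold the
sub-words listed above (reis, kosten, aftrek, ...) and that every
dictionary match should be emitted rather than only the longest one.
All attribute values except onlyLongestMatch are copied from the quoted
configuration. Whether "false" is the right choice for this index is an
assumption, best checked on the analysis page of the Solr admin
interface:

```xml
<!-- split subwords dutch nouns -->
<!-- assumption: the dictionary file lists sub-words, one per line -->
<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
        dictionary="dutch/dutch-common-nouns.txt"
        minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
        onlyLongestMatch="false"/>
```

Note that minSubwordSize="4" only just admits "reis"; any shorter
sub-word in the dictionary would be skipped.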

-- 
Jigal van Hemert
TYPO3 CMS Active Contributor

TYPO3 .... inspiring people to share!
Get involved: typo3.org

