[TYPO3-Solr] Stemming and config
Rik Willems
rik at metmeer.nl
Wed Jun 12 13:41:21 CEST 2013
On 12-06-13 11:58, Jigal van Hemert wrote:
> Hi,
>
> On 12-6-2013 8:54, Rik Willems wrote:
>> In typo3cores/conf/dutch/dutch-common-nouns.txt I have the following:
>> reiskosten
>> reiskostenaftrek
>> reiskostenforfait
>> reiskostenregeling
>> reiskostenvergoeding
>
> The whole words should already be produced as tokens by the
> StandardTokenizer. Normally I would expect the real sub-words in
> your list:
> reis
> kosten
> reiskosten
> aftrek
> forfait
> regeling
> vergoeding
>
>> <!-- split subwords dutch nouns -->
>> <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
>> dictionary="dutch/dutch-common-nouns.txt"
>> minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
>> onlyLongestMatch="true"/>
>
> With onlyLongestMatch, the filter would match "reiskostenvergoeding"
> (which is in the dictionary), and none of the sub-words would be
> included in the index (as far as I understand this filter factory).
>
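To illustrate the decompounding discussed above, here is a minimal Python sketch of dictionary-based compound splitting. It is a simplified approximation, not Lucene's DictionaryCompoundWordTokenFilter itself. It assumes that the original token is always kept, that tokens shorter than minWordSize are not split, and that dictionary entries between minSubwordSize and maxSubwordSize characters are looked up at every start position, with onlyLongestMatch keeping only the longest entry per position. The function name decompound is hypothetical, and the real filter's behaviour may differ between Lucene versions.

```python
from typing import Iterable, List


def decompound(
    token: str,
    dictionary: Iterable[str],
    min_word_size: int = 5,
    min_subword_size: int = 4,
    max_subword_size: int = 15,
    only_longest_match: bool = True,
) -> List[str]:
    """Hypothetical, simplified model of dictionary-based decompounding.

    Returns the original token followed by the dictionary entries found
    inside it. Parameter defaults mirror the schema.xml snippet above.
    """
    words = set(dictionary)
    result = [token]  # assumption: the original token is always kept
    if len(token) < min_word_size:
        return result  # too short to be split into sub-words
    for start in range(len(token) - min_subword_size + 1):
        # Collect every dictionary entry starting at this position,
        # ordered from shortest to longest.
        matches = [
            token[start:start + length]
            for length in range(min_subword_size, max_subword_size + 1)
            if start + length <= len(token)
            and token[start:start + length] in words
        ]
        if only_longest_match and matches:
            # Assumption: "longest" is decided per start position.
            matches = [max(matches, key=len)]
        result.extend(matches)
    return result


if __name__ == "__main__":
    subwords = ["reis", "kosten", "reiskosten", "aftrek",
                "forfait", "regeling", "vergoeding"]
    print(decompound("reiskostenvergoeding", subwords))
```

Run as a script, this prints ['reiskostenvergoeding', 'reiskosten', 'kosten', 'vergoeding']: under these assumptions, a dictionary of real sub-words yields sub-word tokens alongside the original token. With only_longest_match=False, "reis" is emitted as well.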
Hi Jigal,
None of my changes or attempts has changed the search results. So far I
have used the standard TYPO3 Solr schema.xml and added these changes to it.
Is that the correct place to make them?
Should I restart Tomcat after changing schema.xml? Doing so made no
difference, by the way, but it would be good to know.
Cheers!
More information about the TYPO3-project-solr mailing list