[TYPO3-Solr] SOLR_CONTENT and HTML output
Jigal van Hemert
jigal.van.hemert at typo3.org
Sun Aug 10 13:54:01 CEST 2014
Hi,
SOLR_CONTENT is meant to clean up content; it removes tags, entities,
incorrect UTF-8 characters and so on. There is however a small problem
with the resulting text:
If it's used in a field of a Solr document and result highlighting is on,
you may end up with a piece of text that is not valid HTML:
Original: [...] the department R&amp;D; HRM is [...]
SOLR_CONTENT: [...] the department R&D; HRM is [...]
Match "department": [...] the <span class="highlight">department</span>
R&D; HRM is [...]
The validator says that &D; is not a valid entity. htmlSpecialChars cannot
be used on the highlighted result because it would also escape the
highlighting tags. The same problem might occur for other characters that
should be encoded for use in HTML.
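To make the ordering problem concrete, here is a minimal sketch in Python rather than TYPO3 code. The highlight() function is a hypothetical stand-in for Solr's highlighter, and html.escape plays the role of htmlSpecialChars; neither is part of EXT:solr.

```python
import html


def highlight(text: str, term: str) -> str:
    """Hypothetical stand-in for Solr highlighting: wrap each match in a span."""
    return text.replace(term, f'<span class="highlight">{term}</span>')


# Text as SOLR_CONTENT would leave it: entities decoded, tags removed.
cleaned = "the department R&D; HRM is"

# Escaping AFTER highlighting also escapes the highlight markup itself.
escaped_late = html.escape(highlight(cleaned, "department"), quote=False)

# Escaping BEFORE highlighting (i.e. before indexing) keeps the markup intact
# and turns the bare ampersand into a valid entity.
escaped_early = highlight(html.escape(cleaned, quote=False), "department")

print(escaped_late)
print(escaped_early)
```

The second variant is what the workaround below amounts to: the text is encoded before it reaches the index, so the highlighter only ever adds tags around text that is already safe for HTML.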
Solution?
At the moment the workaround could be to use the SOLR_CONTENT object
inside a COA and apply htmlSpecialChars to it.
Maybe it would be useful for SOLR_CONTENT to get a property that sets the
target context (HTML / JS / Text / ...) so that the proper encoding is
applied before the text is sent off to the Solr index.
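As a rough illustration of what such a property could do, here is a sketch in Python. The function name encode_for_context and the context values "html", "js" and "text" are assumptions made up for this example; nothing like this exists in SOLR_CONTENT today.

```python
import html
import json


def encode_for_context(text: str, context: str = "text") -> str:
    """Encode already-cleaned text for the context it will be rendered in.

    Hypothetical helper; the context names are assumptions for illustration.
    """
    if context == "html":
        # Same idea as htmlSpecialChars: escape &, <, > and quotes.
        return html.escape(text, quote=True)
    if context == "js":
        # Produce a quoted JavaScript/JSON string literal. This is a
        # simplification: it does not guard against "</script>" in the text.
        return json.dumps(text)
    if context == "text":
        # Plain text needs no further encoding.
        return text
    raise ValueError(f"unknown target context: {context!r}")


print(encode_for_context("R&D; HRM", "html"))
print(encode_for_context("R&D; HRM", "js"))
```

The encoding would run once at indexing time, so the choice of context would have to match the place where the field is eventually output.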
What do you think?
--
Jigal van Hemert
TYPO3 CMS Active Contributor
TYPO3 .... inspiring people to share!
Get involved: typo3.org