[TYPO3-Solr] Crop viewhelper not cropping Tika PDF abstract in results

Thu May 24 21:36:24 CEST 2012

Hello newsgroup,

we are having trouble getting the crop viewhelper to work on teaser text 
in the result list when that text comes from PDF content extracted by Tika.

Our config:

viewhelpers {
	crop {
		maxLength = 300
		cropIndicator = ...
		cropFullWords = 0
	}
}

That works fine with regular indexed TYPO3 DB content.

However, it has no effect at all on some (not all!) of the indexed PDF 
file results - the abstract text will not be cropped.

The indexed file text seems to include all kinds of charset garbage, 
which is not ideal for display in search result pages anyway.

I think this might be related to tslib_cObj::cropHTML(), which the 
helper uses and which has problems with newlines (and probably other 
non-printable characters) - see http://forge.typo3.org/issues/28741

We tried preg-removing all non printable chars before the cropHTML() 
call in the helper class, but to no avail.
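
To make clear what kind of clean-up is meant, here is a minimal sketch. 
It assumes the troublesome characters are plain ASCII control characters 
(newlines, tabs, form feeds and so on). The function name 
stripNonPrintable() is made up for this mail and is not part of the 
extension.

```php
<?php
// Hypothetical helper for illustration only, not part of EXT:solr.
function stripNonPrintable($text)
{
    // Replace runs of ASCII control characters (0x00-0x1F and 0x7F),
    // including newlines and tabs, with a single space. The pattern works
    // on bytes, so multi-byte UTF-8 characters are left untouched.
    $text = preg_replace('/[\x00-\x1F\x7F]+/', ' ', $text);

    // Collapse the repeated spaces left over from the replacement above.
    $text = preg_replace('/ {2,}/', ' ', $text);

    return trim($text);
}
```

The cleaned string would then be passed to cropHTML() in place of the 
raw field value.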

After all, we only want indexed PDF abstract text to be cropped just the 
way it works with TYPO3 content.

Is anyone else seeing similar issues?

Thanks and cheers
Chris