[TYPO3-core] RFC: html cleaner corrupts RDF data in HTML comments

Dmitry Dulepov [typo3] dmitry at typo3.org
Mon Oct 1 13:31:42 CEST 2007


Hi!

This is an SVN patch request.

Branches: trunk, 4.1

Problem: if config.xhtml_cleaning = all is set in the TS setup, the HTML cleaner converts all tag names to lower case, including tags inside comments. RDF uses several tag names containing upper-case letters (for example rdf:RDF), and these names are case-sensitive. As a result, the HTML cleaner corrupts the RDF data.
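To make the symptom concrete, here is a minimal sketch of a cleaner that lower-cases every tag name without knowing about comments. The function `naive_lowercase_tags` and its regular expression are hypothetical illustrations, not TYPO3's cleaner. They only reproduce the effect described above.

```python
import re

# Hypothetical, simplified model of a cleaner that lower-cases tag names.
# It is NOT the TYPO3 implementation; it only reproduces the symptom.
TAG_NAME = re.compile(r"(</?)([A-Za-z][\w:.-]*)")

def naive_lowercase_tags(html: str) -> str:
    """Lower-case every tag name, including tags that sit inside comments."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), html)

sample = '<!-- <rdf:RDF xmlns:rdf="..."></rdf:RDF> --><P>Text</P>'
print(naive_lowercase_tags(sample))
# <!-- <rdf:rdf xmlns:rdf="..."></rdf:rdf> --><p>Text</p>
```

The comment's RDF markup comes out as `rdf:rdf`, which is a different element name, so any consumer of the RDF no longer recognizes it.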

Solution: the problem was first discovered by Ingo (see http://www.ingo-renner.com/index.php?id=13), who intended to use a regular expression to remove comments and then put them back afterwards. I am not sure whether that approach worked, but I wrote my own solution, which simply ignores the content of comments while cleaning HTML. I think this is compatible with the existing description of xhtml_cleaning in TSRef. One note to prevent possible questions about nesting: nested comments are not allowed in HTML by the specification ;)
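A minimal sketch of the "ignore comment content" idea follows. The actual patch (htmlcleaner_comments.patch) targets TYPO3's PHP code and is not reproduced here. The names `lowercase_tags` and `clean_outside_comments` are hypothetical. The sketch splits the input into comment and non-comment segments, cleans only the non-comment segments, and copies comments through verbatim. Because HTML does not allow nested comments, the first `-->` after `<!--` can safely be taken as the end of the comment. In this sketch, an unterminated `<!--` is treated as ordinary text.

```python
import re

# Non-greedy match: the first "-->" closes the comment (no nesting in HTML).
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_NAME = re.compile(r"(</?)([A-Za-z][\w:.-]*)")

def lowercase_tags(fragment: str) -> str:
    """Hypothetical cleaning step: lower-case tag names in a fragment."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), fragment)

def clean_outside_comments(html: str) -> str:
    """Clean everything except comments, which are copied unchanged."""
    out = []
    pos = 0
    for m in COMMENT.finditer(html):
        out.append(lowercase_tags(html[pos:m.start()]))  # text before the comment
        out.append(m.group(0))                           # the comment, untouched
        pos = m.end()
    out.append(lowercase_tags(html[pos:]))               # text after the last comment
    return "".join(out)

print(clean_outside_comments("<P><!-- <rdf:RDF></rdf:RDF> -->After</P>"))
# <p><!-- <rdf:RDF></rdf:RDF> -->After</p>
```

Skipping comments during the scan avoids the extract-and-restore round trip of the regular-expression approach mentioned above. The cleaner simply never touches comment content.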

How to test: add the following block to the main TS template and compare the rdf:RDF output before and after applying the patch.

----------------------
config.xhtml_cleaning = all

page.100 = TEXT
page.100.value (
<!--
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
xmlns:cd="http://www.recshop.fake/cd#"> 

<rdf:Description
 rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
  <cd:artist>Bob Dylan</cd:artist>
  <cd:country>USA</cd:country>
  <cd:company>Columbia</cd:company>
  <cd:price>10.90</cd:price>
  <cd:year>1985</cd:year>
</rdf:Description>

</rdf:RDF>
-->This text goes immediately after comment and outside any tag
----------------------

-- 
Dmitry Dulepov
TYPO3 freelancer / TYPO3 core team member
Web: http://typo3bloke.net/
Skype: callto:liels_bugs
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: htmlcleaner_comments.patch
Url: http://lists.netfielders.de/pipermail/typo3-team-core/attachments/20071001/8233c80a/attachment.txt 

