[TYPO3-Solr] Search request "spam", partly by Google

Peter Kraume usenet at kraume.de
Tue Jul 3 11:47:09 CEST 2012


We see a lot of nonsensical search requests in our TYPO3 projects that 
use the Solr extension. Most of these requests come from China and 
look similar to these examples:

lbqurss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/

A-40-D40-USB\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\*rss/no_cache/rss/no_cache

位相変調器 ln-10.1radar/case/rush/case

33333333333333333333333333333333333333333x3

When the "last searches" feature reads from the database, these requests 
produce ugly lists. We temporarily fixed this by pointing the 
getLastSearchesFromDatabase() method at the tx_solr_statistics table 
and adding num_found > 0 to the query.
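
To illustrate the workaround, here is a minimal sketch of such a query. 
It is written in Python against an in-memory SQLite database, not the 
extension's actual PHP code. The column names keywords and tstamp, the 
function names and the sample rows are assumptions made for the example; 
only the table name and the num_found > 0 condition come from the fix 
described above.

```python
import sqlite3


def get_last_searches(conn, limit=10):
    """Return recent distinct search terms that produced at least one hit."""
    rows = conn.execute(
        """
        SELECT keywords, MAX(tstamp) AS last_used
        FROM tx_solr_statistics
        WHERE num_found > 0          -- skip searches without results
        GROUP BY keywords            -- list each term only once
        ORDER BY last_used DESC      -- newest first
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection():
    """Build a throwaway database with assumed columns and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tx_solr_statistics "
        "(keywords TEXT, num_found INTEGER, tstamp INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tx_solr_statistics VALUES (?, ?, ?)",
        [
            ("news", 12, 100),
            ("rss/no_cache/rss/no_cache", 0, 200),
            ("typo3 solr", 3, 300),
            ("news", 12, 150),
            ("3333x3", 0, 400),
        ],
    )
    return conn
```

With the sample rows, only "typo3 solr" and "news" are returned; the two 
requests without results never reach the list.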

But I think it would be better to stop this kind of request right at 
the start by writing only those searches to the 
tx_solr_last_searches table which have at least one result.

Furthermore, the search terms should be checked for excessive use of 
slashes or backslashes.
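
A rough sketch of what such a check could look like, again in Python 
rather than the extension's PHP. The function names and the threshold of 
three slashes are hypothetical and would need tuning against real 
search terms.

```python
MAX_SLASHES = 3  # hypothetical threshold, not a value from the extension


def has_excessive_slashes(term, max_slashes=MAX_SLASHES):
    """True if the term contains more slashes and backslashes than allowed."""
    slash_count = term.count("/") + term.count("\\")
    return slash_count > max_slashes


def should_record_search(term, num_found):
    """Record a search only if it had results and does not look like spam."""
    return num_found > 0 and not has_excessive_slashes(term)
```

For example, should_record_search("typo3/solr", 5) is true, while a term 
such as "rss/no_cache/rss/no_cache/rss" is rejected regardless of its 
number of results.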


Another thing is the Googlebot hitting the search result page. We have 
checked the statistics table and there are several hundred thousand hits 
from Google IP addresses.
I think this can be stopped by adding a rel="nofollow" attribute to the 
last searches and frequent searches links on the search result page.
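
As an illustration of the markup I have in mind, here is a small Python 
sketch that renders such a link. The function name, the /search/ path 
and the q parameter are assumptions for the example; the extension 
builds its links differently.

```python
from html import escape
from urllib.parse import urlencode


def render_search_link(term, search_page="/search/"):
    """Render a search link that asks crawlers not to follow it."""
    query = urlencode({"q": term})  # "q" is an assumed parameter name
    href = escape(search_page + "?" + query, quote=True)
    return '<a href="{}" rel="nofollow">{}</a>'.format(href, escape(term))
```

Calling render_search_link("typo3 solr") gives 
<a href="/search/?q=typo3+solr" rel="nofollow">typo3 solr</a>.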


I'd like to know whether others have similar problems and what you do to 
prevent this kind of problem.

Cheers
Peter

