[TYPO3-Solr] Search request "spam", partly by Google
Peter Kraume
usenet at kraume.de
Tue Jul 3 11:47:09 CEST 2012
We see a lot of nonsensical search requests in our TYPO3 projects which
use the Solr extension. Most of these requests are coming from China and
look similar to these examples:
lbqurss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/rss/no_cache/
A-40-D40-USB\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\*rss/no_cache/rss/no_cache
位相変調器　ln-10.1radar/case/rush/case
33333333333333333333333333333333333333333x3
When the "last searches" feature reads from the database, this leads to
ugly lists. As a temporary fix, we changed getLastSearchesFromDatabase()
to read from the tx_solr_statistics table instead and added
num_found > 0 to the query.
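To make the workaround concrete, here is a rough sketch of such a query. It is written in Python against an in-memory SQLite table, not the extension's PHP code. The column names (keywords, num_found, tstamp) and the function name last_searches are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for the statistics table. The column names
# (keywords, num_found, tstamp) are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx_solr_statistics "
    "(keywords TEXT, num_found INTEGER, tstamp INTEGER)"
)
conn.executemany(
    "INSERT INTO tx_solr_statistics VALUES (?, ?, ?)",
    [
        ("typo3 solr", 12, 100),
        ("lbqurss/no_cache/rss/no_cache", 0, 101),  # spam request, no hits
        ("facets", 3, 102),
        ("typo3 solr", 12, 103),
    ],
)


def last_searches(conn, limit=10):
    """Return the most recent distinct search terms with at least one hit."""
    rows = conn.execute(
        "SELECT keywords, MAX(tstamp) AS latest "
        "FROM tx_solr_statistics "
        "WHERE num_found > 0 "
        "GROUP BY keywords "
        "ORDER BY latest DESC "
        "LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]
```

With the sample rows above, the spam request never shows up in the list because it found nothing.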
But I think it would be better to stop this kind of request right at
the start by writing only those searches to the
tx_solr_last_searches table that have at least one result.
Furthermore, the search terms should be checked for excessive use of
slashes or backslashes.
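A minimal sketch of what such a check could look like, again in Python for illustration and not the extension's PHP. The function names and the threshold of three separators are assumptions that would need tuning against real search logs.

```python
# Assumed threshold: more slashes/backslashes than this marks a term as spam.
MAX_SEPARATORS = 3


def looks_like_spam(term):
    """Flag terms with an excessive number of slashes or backslashes."""
    separators = term.count("/") + term.count("\\")
    return separators > MAX_SEPARATORS


def should_store(term, num_found):
    """Only keep searches that found something and do not look like spam."""
    return num_found > 0 and not looks_like_spam(term)
```

A legitimate term such as "client/server" still passes, while the no_cache/rss chains from the examples above are rejected before they reach the table.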
Another issue is Googlebot hitting the search result page. We have
checked the statistics table and found several hundred thousand hits
from Google IP addresses.
I think this can be stopped by adding a rel="nofollow" attribute to the
last searches and frequent searches links in the search result page.
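As an illustration of the markup I mean, here is a small Python sketch that renders such a link. The function name, the /search/ path and the q parameter are hypothetical and not what the extension actually generates. Note that nofollow is only a hint to crawlers.

```python
from html import escape
from urllib.parse import urlencode


def search_link(term, base_url="/search/"):
    """Render a "last searches" link that asks crawlers not to follow it."""
    # base_url and the "q" parameter name are placeholders for this sketch.
    query = urlencode({"q": term})
    return '<a href="{}?{}" rel="nofollow">{}</a>'.format(
        base_url, escape(query, quote=True), escape(term)
    )
```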
I'd like to know whether others have similar problems and what you do
to prevent them.
Cheers
Peter