[TYPO3-performance] Cache priming

Stephan Schuler Stephan.Schuler at netlogix.de
Tue Dec 23 14:39:02 CET 2014


Hey there.

Cached content depends on various things.

You can have, e.g., one page that is public for every visitor but contains one content element that is visible only to a certain user group, and another content element that is shown only to users who are *not* logged in, telling them about the additional content they would get by registering.

In addition to this very obvious case: think about every condition you can write in TypoScript.
You can serve different content to different IPs by using the "IP" condition.
You can provide different static layouts for mobile devices with the "device" condition, although I would suggest serving one layout and using media queries instead.
You can provide additional "fixup" CSS for legacy browsers with the "device" and "version" conditions, or maybe with the "useragent" condition.
You can introduce a "show this page in your language" message by utilizing the "language" condition.
There used to be clumsy pages back in the 1990s that provided dark layouts during evening hours and light colors during daylight hours. I guess that is what the "hour" condition was meant for.

You see, there are tons of conditions matching request data. And all of those will force you to create distinct content cache entries for each and every *condition combination*.
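
To make that concrete, here is a minimal sketch in the classic condition syntax of that time (the IP range and the content are made up). TYPO3 renders and caches a separate variant of the page for each branch that matches:

page = PAGE
page.10 = TEXT

[IP = 192.168.*]
page.10.value = Intranet variant of the page
[ELSE]
page.10.value = Public variant of the page
[GLOBAL]

Add a second independent condition and you are already at four possible variants; every further one multiplies the number of cache entries again.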

Filling the cache can easily become a very heavy task, in terms of both covering every edge case and sheer workload.

I'm not completely sure this shapes your performance the way you want.
Did you measure how long it takes to crawl your whole site once?
You *think* you fill the cache to spare fe_users from hitting uncached pages. But on heavily loaded sites you might just increase the overall load on your server, which makes rendering slower for everybody. And it's likely that your crawling script only fills a small portion of your pages -- the portion that gets nearly no traffic itself and is therefore not kept warm by regular fe_users. The single pages that do carry heavy load are likely to be hit by regular fe_users *before* your crawler reaches them, so for those the crawler only adds load by fetching data that is already cached.

Depending on your server setup, chances are you limit the number of PHP worker processes. Usually you take the maximum amount of memory a single PHP process is allowed to use, assume every PHP process actually uses that much, and calculate how many PHP processes your server can run before it is forced to swap. That gives you a certain number of PHP processes, which equals the number of simultaneous requests for non-static data your server can handle. Anybody from the outside world can determine this number by firing up "ab" with different concurrency values (see the sketch below).
If there are times your server hits that number and goes to 100% load, having a crawler just to fill caches does more harm than good.
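
As a back-of-the-envelope sketch of that calculation (all numbers here are assumptions):

# say 4096 MB of RAM are reserved for PHP and memory_limit is 256M:
#   4096 MB / 256 MB = 16 worker processes before the server must swap
# probe from the outside with ApacheBench at and above that concurrency
# and watch where the response times start to collapse:
ab -n 500 -c 16 https://www.example.com/
ab -n 500 -c 32 https://www.example.com/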

I would suggest *not* relying on cache filling mechanisms from the outside. Better build a strong concept of cache tags and avoid "clear all caches" commands entirely. When adjusting e.g. a news record, there are only two pages to be cleared: the list view and the detail view. Done. This results in nearly 100% of your data *being* cached and *staying* cached during daily business. Only those rare situations where you deploy new code require full cache clearing and thus full cache renewal. But as long as you don't do that twice a day, you're just fine.
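
For the news example, the simplest built-in way to express this is Page TSconfig on the storage folder of the news records (a sketch; the page UIDs 42 and 43 are made up, and a real setup would rather tag cache entries per record and flush by tag):

# saving a record inside this folder clears exactly the cache of the
# list page (uid 42) and the detail page (uid 43), not the whole cache:
TCEMAIN.clearCacheCmd = 42,43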

Regards,




Stephan Schuler
Web Developer | netlogix Media

Phone: +49 (911) 539909 - 0
E-Mail: Stephan.Schuler at netlogix.de
Web: media.netlogix.de




netlogix GmbH & Co. KG
IT-Services | IT-Training | Media
Neuwieder Straße 10 | 90411 Nürnberg
Phone: +49 (911) 539909 - 0 | Fax: +49 (911) 539909 - 99
E-Mail: info at netlogix.de | Web: http://www.netlogix.de

netlogix GmbH & Co. KG is registered at the district court of Nürnberg (Amtsgericht Nürnberg, HRA 13338)
General partner: netlogix Verwaltungs GmbH (HRB 20634)
VAT identification number: DE 233472254
Managing directors: Stefan Buchta, Matthias Schmidt



-----Original Message-----
From: typo3-performance-bounces at lists.typo3.org [mailto:typo3-performance-bounces at lists.typo3.org] On behalf of Jonas Eberle
Sent: Thursday, December 18, 2014 10:52 AM
To: typo3-performance at lists.typo3.org
Subject: [TYPO3-performance] Cache priming

Hi list,

Just in case someone is interested in 'priming' the page cache of a site, here is a script I use.

I put it up here as open for discussion. I think the goal should always be to not let the frontend user regenerate the cache. With this script there is at least a reasonable possibility that it regenerates the page cache instead of the user having to wait.

Regards,
Jonas


#! /bin/bash
# @author jonas.eberle at d-mind.de
# Crawl a site to 'prime' the frontend cache. It will log the total time taken for one run.
# It will not spawn multiple times (locking mechanism), so it is safe to call it from an often-running cronjob.
# Keep in mind that it might affect your stats counter if that does not ignore the wget user-agent.
#
# @param URL $1
# @param User $2 (HTTP-Auth)
# @param Password $3 (HTTP-Auth)

URL="$1"
if [ -z "$URL" ]; then
        echo 'ERROR: no URL given' >&2
        exit 1
fi

THIS=$(readlink -f "$0")
THISDIR=$(dirname "$THIS")
THISDIRBASE=$THISDIR/$(basename "$0")
URL_sanitized=$(echo "$URL" | sed -r 's|[:/]|_|g')

LOG=${THISDIRBASE}_${URL_sanitized}.log
TMP_LOG=${LOG}.temp

LOCKFILE=/var/lock/$(basename "$0")_${URL_sanitized}

{
        if ! flock -n 9; then
                echo "Unable to lock $LOCKFILE, exiting" >&2
                exit 1
        fi

        # remove the lockfile when the script exits (KILL cannot be trapped)
        trap 'rm -f "$LOCKFILE" 2> /dev/null' EXIT INT TERM

        echo STARTED $(date -Is -u) >> "$LOG"
        rm "$TMP_LOG" 2> /dev/null
        { time wget -o "$TMP_LOG" --no-check-certificate -nv -e robots=off \
                -R.css,.js,.jpg,.png,.pdf -nd -r --delete-after -l 3 \
                --user="$2" --password="$3" "$URL" ; } 2>> "$LOG"
        printf "files: %d, thereof %d HTML\n" "$(wc -l < "$TMP_LOG")" \
                "$(grep -Ec '[-]>.*\.html?"' "$TMP_LOG")" >> "$LOG"

        rm "$TMP_LOG" 2> /dev/null
} 9>"$LOCKFILE"
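
For completeness, this is how I would call it (the path, schedule and URL are made up); thanks to the flock, overlapping runs are harmless:

# crontab entry: try to prime the cache every 15 minutes
*/15 * * * * /usr/local/bin/prime-cache.sh https://www.example.com/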



--
New: Portal Region Stuttgart - www.region-stuttgart.de
New: emobil-in-bw.de
New client: Statistisches Landesamt BaWü
--------------------------------
d-mind on Facebook: http://www.facebook.com/werbeagentur.internetagentur.stuttgart
--------------------------------
d-mind
Fuchs/Weiß/Strobel GbR
Mörikestraße 69
70199 Stuttgart
Phone: +49 711 280481-1 (extension: -18)
Fax: +49 711 2804813
Owners: Jens Fuchs, Michael Weiß, Jens Strobel
Web: www.d-mind.de



