[TYPO3-performance] Cache priming

Jonas Eberle jonas.eberle at d-mind.de
Wed Jan 21 17:05:13 CET 2015


Hi Stephan,

thank you for your input. I think you are right for a large share of 
cases, and I fully agree that this is merely a work-around. Still, in 
our situation it helps.

I would say the constraints for it to be useful are:
* special configuration cases (TypoScript): the crawler can only 
succeed if it is configured to mimic the user. On the sites where I use 
it, 99% of the traffic is public and not logged in
* it would be perfect if I could tell TYPO3 to do a 'cache refresh' 
instead of a mere 'page view'. The worst case is when the crawled page 
is just served from the cache: I want to generate cache entries, not 
just read them <-- I do not have a solution for that and am eager to 
learn more about it!
* it is a kind of 'black box' cache generation. As you mentioned, it 
would be far more clever to regenerate the cache *when* something 
changes <-- This can be partly addressed with TSConfig, which allows 
clearing (but not *regenerating*, so this is bad for performance!) the 
page cache of specific page(s) when an object in a system folder 
changes.
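The TSConfig mechanism mentioned above can be sketched like this (the 
page IDs are made-up examples; this goes into the Page TSConfig of the 
system folder that holds the records):

```typoscript
# Page TSConfig on the sysfolder holding the records.
# When a record in this folder is saved, TYPO3 clears the page cache
# of the listed pages (IDs 12 and 13 are hypothetical examples,
# e.g. a list view and a detail view).
TCEMAIN.clearCacheCmd = 12,13
```

Note that this only *clears* the cache; the pages are regenerated 
lazily on the next visit, not proactively.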


What would greatly help would be a way to tell TYPO3 to regenerate the 
page cache for a specific page, or page + parameters! I am very eager 
to learn more clever ways to handle the TYPO3 cache.

Best regards,
Jonas



Am 23.12.2014 um 14:39 schrieb Stephan Schuler:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hey there.
>
> Cached content depends on various things.
>
> You can have, e.g., one page that is public for every visitor but contains one content element which is only available to one user group, and another content element that is *not* shown to logged-in users, to notify them about additional content they can benefit from after registering.
>
> In addition to this very obvious thing: Think about every condition you can make in TypoScript.
> You can serve different content based on different IPs by using the "IP" condition.
> You can provide different static layouts for mobile devices. Although I would suggest serving one layout and using media queries instead, you could use the "device" condition.
> You can provide additional "fixup" CSS for legacy browsers with the "device" and "version" conditions, or match the "useragent" directly.
> You can introduce "show this page in your language" messages by utilizing the "language" condition.
> There used to be clumsy pages back in the 1990s that provided dark layouts in the evening hours and light colors during daylight hours. I guess that is what the "hour" condition was meant for.
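The conditions listed above use TYPO3's classic TypoScript condition 
syntax; a sketch (all values are illustrative, and the exact matching 
syntax varies slightly between TYPO3 versions):

```typoscript
# Classic TypoScript conditions (TYPO3 4.x/6.x style). Each matching
# combination produces a distinct page cache variant.
[IP = 192.168.*]
  page.10.value = Internal network layout
[useragent = *MSIE 8*]
  # hypothetical fix-up stylesheet for a legacy browser
  page.includeCSS.ie8fix = fileadmin/css/ie8fix.css
[language = *de*]
  page.20.value = Diese Seite gibt es auch auf Deutsch
[hour = > 18]
  page.30.value = Good evening!
[global]
```

Every independent condition multiplies the number of cache variants, 
which is exactly why priming "the" cache is harder than it looks.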
>
> You see, there are tons of conditions matching request data. And all of those will force you to create distinct content cache entries for each and every *condition combination*.
>
> Filling the cache can easily become a very heavy task, in terms of both taking care of every edge case and the pure workload.
>
> I'm not completely sure this shapes your performance the way you want.
> Did you measure how long it takes to crawl your whole site once?
> You *think* you fill the cache to avoid fe_users hitting uncached pages. But on heavily loaded sites you might just increase the overall load on your server, which makes rendering slower for everybody. And on heavily loaded sites your crawling script will likely only fill the caches of a small portion of your pages -- the portion that has nearly no traffic itself and thus doesn't get visited by regular fe_users. The pages with heavy traffic will probably be triggered by regular fe_users *before* your crawler reaches them, so there the crawler only adds load by fetching already-cached data.
>
> Depending on your server setup, chances are you limit the number of PHP worker processes. Usually you take the maximum amount of memory a single PHP process is allowed to consume, assume every PHP process actually consumes that much, and then calculate how many PHP processes your server can run before it is forced to swap. That gives you a certain number of PHP processes, which equals the number of simultaneous requests for non-static data. Anyone from the outside world can usually determine this number by firing up "ab" with different concurrency values.
> If there are times your server hits that number and goes to 100% load, having a crawler just to fill caches does more harm than good.
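The sizing rule of thumb described above can be sketched like this 
(all numbers are made-up example values, not recommendations):

```shell
# Rough PHP worker sizing: RAM you can spare for PHP, divided by the
# per-process memory_limit, gives the maximum concurrent PHP workers.
TOTAL_MB=4096        # RAM available for PHP (example value)
PER_PROC_MB=256      # PHP memory_limit per process (example value)
echo $(( TOTAL_MB / PER_PROC_MB ))   # prints 16

# That is roughly the number you can probe from the outside, e.g.:
#   ab -n 1000 -c 16 http://www.example.com/
```

If a cache-priming crawler runs while real traffic already saturates 
those workers, it competes with fe_users for the same slots.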
>
> I would suggest *not* relying on cache-filling mechanisms from the outside. Better create a strong concept of cache tags and avoid "clear all" cache commands entirely. When adjusting e.g. a news record, there are only two pages to be cleared: the list view and the detail view. Done. This results in nearly 100% of your data *being* cached and *staying* cached during daily business. Only those rare situations where you deploy new code require full cache clearing and thus full cache renewal. But as long as you don't do that twice a day, you're just fine.
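A cache-tag setup along those lines can be sketched in 
TypoScript/TSConfig. This is only a sketch: `addPageCacheTags` is a 
stdWrap property in TYPO3 6.x+, the `cacheTag:` prefix for 
`clearCacheCmd` is only available in newer TYPO3 versions, and the tag 
name `news_list` is a made-up example:

```typoscript
# TypoScript setup: tag every page that renders the (hypothetical)
# news list, so those pages can be flushed selectively.
page.10 = TEXT
page.10.value = ...news list rendering here...
page.10.stdWrap.addPageCacheTags = news_list

# Page TSConfig on the news sysfolder: saving a news record flushes
# only the pages carrying that tag, instead of "clear all".
TCEMAIN.clearCacheCmd = cacheTag:news_list
```

The effect is the targeted clearing Stephan describes: editing a 
record invalidates exactly the affected pages, and everything else 
stays cached.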
>
> Regards,
>
>
>
>
> Stephan Schuler
> Web-Entwickler | netlogix Media
>
> Telefon: +49 (911) 539909 - 0
> E-Mail: Stephan.Schuler at netlogix.de
> Web: media.netlogix.de
>
>
>
>
> netlogix GmbH & Co. KG
> IT-Services | IT-Training | Media
> Neuwieder Straße 10 | 90411 Nürnberg
> Telefon: +49 (911) 539909 - 0 | Fax: +49 (911) 539909 - 99
> E-Mail: info at netlogix.de | Web: http://www.netlogix.de
>
> netlogix GmbH & Co. KG ist eingetragen am Amtsgericht Nürnberg (HRA 13338)
> Persönlich haftende Gesellschafterin: netlogix Verwaltungs GmbH (HRB 20634)
> Umsatzsteuer-Identifikationsnummer: DE 233472254
> Geschäftsführer: Stefan Buchta, Matthias Schmidt
>
>



More information about the TYPO3-performance mailing list