[TYPO3-core] Caches and Locking

Sat Mar 8 16:21:46 CET 2014

Hi Markus,

Markus Klein wrote:

> The access to the caches still needs proper locking for concurrency,
> though. As outlined in the blueprint, we have one more major problem here:
> Any implementation of the readers-writers-problem requires at least a
> shared counter variable for the number of active readers. Such a shared
> counter may be stored in a file, but this is rather slow and I would not
> consider this an appropriate IPC utility for resources with high access
> frequency like caches. Therefore we need to use shared memory. The problem
> here is that no PHP library for such a purpose is compiled in by default.
> (There are two of them, but only one being Win compatible as well.)

Isn't that what a semaphore is supposed to do? A shared counter that can be 
incremented or decremented.

Keep in mind that many shared hosters do not provide shared memory between 
two requests.
We should not lock out shared hosting sites, as they are the majority of 
TYPO3 users.

IMHO we face several problems that have existed for a long time but are 
surfacing here now:

1) Many operations are not performed as atomically as they need to be.
That happens a lot with SQL queries that should actually run as a single 
statement (e.g. deleting cache entries in the DB backend or creating a new 
record in the backend).
We can either rewrite large parts of the core (the current implementation of 
the DB class does not easily support combined statements, unless you build 
the query yourself) or add transaction support, which would move the whole 
concurrency handling into the DB. The latter might be easy to implement (we 
just need to wrap the relevant code parts with start and commit 
transaction) and should work on many shared hosting providers as well.
https://dev.mysql.com/doc/refman/5.0/en/commit.html
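
A rough sketch of the wrapping idea, using plain PDO rather than the TYPO3 
DB class. The helper name, the table and its columns are made up for 
illustration, and an in-memory SQLite database stands in for MySQL so that 
the snippet runs on its own:

```php
<?php
// Hypothetical helper: run $work inside a transaction and roll back
// if anything inside it throws.
function runInTransaction(PDO $db, $work)
{
    $db->beginTransaction();
    try {
        $result = call_user_func($work, $db);
        $db->commit();
        return $result;
    } catch (Exception $e) {
        $db->rollBack();
        throw $e;
    }
}

// Demo setup; this table layout is an assumption, not the real cache schema.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cache_demo (identifier TEXT, content TEXT)');
$db->exec("INSERT INTO cache_demo VALUES ('page_1', 'old')");

// Delete and re-insert as one unit: other connections see either the old
// row or the new one, never the gap in between.
runInTransaction($db, function (PDO $db) {
    $db->prepare('DELETE FROM cache_demo WHERE identifier = ?')
        ->execute(array('page_1'));
    $db->prepare('INSERT INTO cache_demo VALUES (?, ?)')
        ->execute(array('page_1', 'new'));
});
```

Note that on MySQL this only helps with a transactional storage engine such 
as InnoDB; MyISAM tables do not support transactions.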

2) Write-always vs DOS and the readers-writers-problem
Old problem, new skin. The whole problem comes from the need to be able to 
selectively delete individual cache entries instead of dropping the whole 
cache if it is not needed any more.
Even worse, certain caches, like the class cache, need to modify the data 
while keeping the cache entry as such in place.

Normally one would let the last write win. However, for expensive operations 
(such as image manipulation) one does not want to trigger the expensive 
processes again, in order to prevent DOS vectors.
We should decide on a case-by-case basis whether TYPO3 actually needs to 
provide synchronization or whether it makes sense to allow skipping the 
whole generation (e.g. via a config option) and rely on external processes 
to generate the data for us. I guess this is already done for media 
manipulation with FAL, but it might need some fine-tuning of the API 
throughout the core. I am not sure about the current state.

The other problem here is the original stumbling block, the core code 
caches, especially the class loader.

I suggest using a semi-transaction based on exclusive read-write locks, 
either via flock (which has process synchronization issues) or, even better, 
via a lock file. For this kind of cache, processes that fail to acquire the 
lock should continue without writing (unless the file is older than xx 
seconds).
This should be fairly robust with only a small overhead.
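
A minimal sketch of the lock file variant. The function name and the 
30 second default are placeholders, and it reads "the file" above as the 
lock file, i.e. a lock older than the timeout is treated as left over from 
a crashed process. That reading is an assumption:

```php
<?php
// Hypothetical helper, not an existing TYPO3 API.
// Returns TRUE if this process wrote the cache file, FALSE if another
// process holds the lock and the write was skipped.
function writeCacheIfUnlocked($cacheFile, $generator, $staleAfter = 30)
{
    $lockFile = $cacheFile . '.lock';

    // Remove a lock that is older than $staleAfter seconds.
    clearstatcache(true, $lockFile);
    if (is_file($lockFile) && time() - filemtime($lockFile) > $staleAfter) {
        @unlink($lockFile);
    }

    // Mode 'x' fails if the file already exists, so only one process
    // can create the lock file.
    $handle = @fopen($lockFile, 'x');
    if ($handle === false) {
        return false; // someone else is writing: continue without writing
    }

    // Write to a temporary file first and rename it into place, so readers
    // never see a half-written cache file (atomic on POSIX filesystems).
    // If the generator throws, the lock stays until it becomes stale.
    $tmpFile = $cacheFile . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmpFile, call_user_func($generator));
    rename($tmpFile, $cacheFile);

    fclose($handle);
    unlink($lockFile);
    return true;
}
```

A caller that gets FALSE back simply uses the data it generated itself and 
moves on. The stale-lock cleanup is itself not race-free, so this is only 
meant to show the shape of the approach.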

Thus we have three cases here:
a) data is not used to calculate data and calculation is cheap
--> always write on cache miss, concurrency does not matter
b) operation is expensive (e.g. media manipulation), data not used in 
calculation process
--> try locking if available, fallback to write always (for shared hosting) 
or allow to skip generation (for custom solutions)
c) data is required for calculation of the data (readers-writers problem)
--> rely on OS or database where possible (use transactions), otherwise 
fallback to lock files or shared memory locks (if supported)

3) Missing stacked cache concept
You already described this very well. We need to deal with the different 
levels of persistence of caches, and with the fact that users might choose a 
non-persistent cache for performance reasons to hold should-be-persistent 
data.
The question here is to what extent implementations can expect cache data to 
exist. IMHO they should always work with the NULL backend, meaning they 
should never assume that a cache entry they just wrote can be read 
immediately afterwards or at any later time.
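
To make the idea concrete, here is a small sketch of what a stacked cache 
could look like. All class and interface names are invented for this 
example and are not the Caching Framework classes; get() returning FALSE on 
a miss is an assumption of the sketch:

```php
<?php
// Illustrative interface only.
interface DemoBackend
{
    public function get($id);        // returns FALSE on a miss
    public function set($id, $data);
}

// Stores nothing: every read is a miss.
class DemoNullBackend implements DemoBackend
{
    public function get($id) { return false; }
    public function set($id, $data) {}
}

// Keeps entries for the lifetime of the request only.
class DemoArrayBackend implements DemoBackend
{
    private $entries = array();
    public function get($id)
    {
        return isset($this->entries[$id]) ? $this->entries[$id] : false;
    }
    public function set($id, $data) { $this->entries[$id] = $data; }
}

// Asks each level in order (fastest first) and copies a hit into the
// faster levels that missed.
class DemoStackedCache implements DemoBackend
{
    private $levels;

    public function __construct(array $levels)
    {
        $this->levels = array_values($levels);
    }

    public function get($id)
    {
        foreach ($this->levels as $i => $level) {
            $data = $level->get($id);
            if ($data !== false) {
                for ($j = 0; $j < $i; $j++) {
                    $this->levels[$j]->set($id, $data);
                }
                return $data;
            }
        }
        return false;
    }

    public function set($id, $data)
    {
        foreach ($this->levels as $level) {
            $level->set($id, $data);
        }
    }
}
```

With only a DemoNullBackend in the stack, a set() followed by a get() still 
returns FALSE, which is exactly the situation calling code has to tolerate.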

4) Boilerplate code instead of central function with callback
Currently all CF-using code does something like in the documentation:
http://docs.typo3.org/typo3cms/CoreApiReference/CachingFramework/Developer/Index.html#caching-developer-access
The problem here is that this puts too much logic in the hands of the 
developer and that more advanced functions like stacked caches cannot be 
easily implemented.
I agree with the concept of a high-level API that reads a cache entry and 
takes a generator function to generate and (possibly) store a missing cache 
entry, reducing the boilerplate to:
$data = $highLvl->getCache('cache')->get($identifier, 'generator');
...
function generator($identifier) {...}
That way the code can work on the assumption that a cache entry *always* 
exists (either coming from cache or being generated on-the-fly).
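
A minimal sketch of such a wrapper, to show the control flow. The class is 
hypothetical and keeps entries in a plain array; the $persist flag is an 
invented stand-in for a backend that may or may not actually store data:

```php
<?php
// Hypothetical wrapper; nothing like this exists in the core yet.
class DemoGeneratingCache
{
    private $entries = array();
    private $persist;

    public function __construct($persist = true)
    {
        $this->persist = $persist;
    }

    // Returns the cached entry, or calls $generator to build it.
    // The caller gets data either way and never checks for a miss.
    public function get($identifier, $generator)
    {
        if (array_key_exists($identifier, $this->entries)) {
            return $this->entries[$identifier];
        }
        $data = call_user_func($generator, $identifier);
        if ($this->persist) {
            $this->entries[$identifier] = $data;
        }
        return $data;
    }
}

function generator($identifier)
{
    return 'generated for ' . $identifier;
}

$cache = new DemoGeneratingCache();
$data = $cache->get('page_42', 'generator');
```

Because has/get/set live in one place, locking or stacked lookups could 
later be added inside get() without touching the calling code.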

I suggest fixing only the classloader/code caches and either leaving the 
high-level synchronization issues for 6.2+1 or shipping them as a major 
update for 6.2 (in the sense of a service pack or a RHEL X Update Y).
I fear we are running short on time otherwise.

Best regards
-- 
Philipp Gampe – PGP-Key 0AD96065 – TYPO3 UG Bonn/Köln
Documentation – Active contributor TYPO3 CMS
TYPO3 .... inspiring people to share!