[TYPO3-core] Caches and Locking
Philipp Gampe
philipp.gampe at typo3.org
Sat Mar 8 16:21:46 CET 2014
Hi Markus,
Markus Klein wrote:
> The access to the caches still needs proper locking for concurrency,
> though. As outlined in the blueprint, we have one more major problem here:
> Any implementation of the readers-writers-problem requires at least a
> shared counter variable for the number of active readers. Such a shared
> counter may be stored in a file, but this is rather slow and I would not
> consider this an appropriate IPC utility for resources with high access
> frequency like caches. Therefore we need to use shared memory. The problem
> here is that no PHP library for such a purpose is compiled in by default.
> (There are two of them, but only one being Win compatible as well.)
Isn't that what a semaphore is supposed to do? A shared counter that can be
incremented or decremented.
Keep in mind that many shared hosters do not provide shared memory between
two requests.
We should not drop support for shared hosting sites, as they account for the
majority of TYPO3 users.
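Note that on a single host with a local filesystem, advisory file locks already give reader-writer semantics without an explicit reader counter. Any number of processes can hold `LOCK_SH` at once, while `LOCK_EX` waits until no other lock is held. Below is a rough sketch. The helper names and the `.lock` sidecar file are hypothetical. Also, `flock()` is known to be unreliable on some network filesystems and makes no fairness promise between readers and writers:

```php
<?php
// Hypothetical helpers: reader-writer locking with flock() on a sidecar
// "<file>.lock". Many readers may hold LOCK_SH at the same time; LOCK_EX
// waits until no other lock is held on the lock file.

function readCacheFile(string $path): ?string
{
    $lock = fopen($path . '.lock', 'c'); // create if missing, never truncate
    if ($lock === false) {
        return null;
    }
    try {
        if (!flock($lock, LOCK_SH)) { // shared lock: readers don't block each other
            return null;
        }
        clearstatcache(true, $path); // another process may have written the file
        $data = is_file($path) ? file_get_contents($path) : false;
        return $data === false ? null : $data; // null means "cache miss"
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}

function writeCacheFile(string $path, string $data): bool
{
    $lock = fopen($path . '.lock', 'c');
    if ($lock === false) {
        return false;
    }
    try {
        if (!flock($lock, LOCK_EX)) { // exclusive: waits for readers and writers
            return false;
        }
        return file_put_contents($path, $data) !== false;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

This needs no extension beyond what PHP ships by default. However, it only coordinates processes on the same machine and the same filesystem.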
IMHO we face several problems that have existed for a long time but are
surfacing here now:
1) Many operations are not performed as atomically as they need to be.
This happens a lot with SQL queries that should actually run as a single
statement (e.g. deleting cache entries in the DB backend or creating a new
record in the backend).
We can either rewrite large parts of the core (the current implementation of
the DB class does not easily support combined statements, unless you build
the query yourself) or we can add transaction support, which solves the
whole concurrency issue inside the DB. The latter might be easy to implement
(we just need to wrap the relevant code parts in START TRANSACTION and
COMMIT) and should work on many shared hosting providers as well, provided
the tables use a transactional storage engine such as InnoDB.
https://dev.mysql.com/doc/refman/5.0/en/commit.html
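Here is a minimal sketch of the transaction approach for the "delete cache entries" case, using PDO. It uses an in-memory SQLite database only so that it runs standalone. With MySQL the same PDO calls apply, but only tables on a transactional engine honour them. The tables `cache_data`/`cache_tags` and the function `flushByTag()` are hypothetical and simplified. For example, other tag rows of a deleted entry are left alone:

```php
<?php
// Hypothetical schema: cache entries plus a separate tag table. Flushing by
// tag touches both tables, so both DELETEs must succeed or fail together.

function flushByTag(PDO $db, string $tag): void
{
    $db->beginTransaction();
    try {
        $db->prepare(
            'DELETE FROM cache_data WHERE identifier IN
                (SELECT identifier FROM cache_tags WHERE tag = ?)'
        )->execute([$tag]);
        $db->prepare('DELETE FROM cache_tags WHERE tag = ?')->execute([$tag]);
        $db->commit(); // both deletes become visible at once
    } catch (Throwable $e) {
        $db->rollBack(); // on any error, neither delete is applied
        throw $e;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cache_data (identifier TEXT PRIMARY KEY, content TEXT)');
$db->exec('CREATE TABLE cache_tags (identifier TEXT, tag TEXT)');
$db->exec("INSERT INTO cache_data VALUES ('a', '1'), ('b', '2')");
$db->exec("INSERT INTO cache_tags VALUES ('a', 'pages'), ('b', 'news')");

flushByTag($db, 'pages'); // removes entry 'a' and its tag row, keeps 'b'
```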
2) Write-always vs. DoS and the readers-writers problem
Old problem, new skin. The whole problem stems from the need to selectively
delete individual cache entries instead of dropping the whole cache when it
is no longer needed.
Even worse, certain caches, like the class cache, need to modify their data
while the cache entry as such has to be kept.
Normally one would let the last write win. However, for expensive operations
(such as image manipulation) one does not want to trigger the expensive
process again, since that would open a DoS vector.
We should decide on a case-by-case basis whether TYPO3 actually needs to
provide synchronization or whether it makes sense to allow skipping the whole
generation (e.g. via a config option) and relying on external processes to
generate the data for us. I guess this is already done for media
manipulation with FAL, but it might need some fine-tuning of the API
throughout the core. I am not sure about the current state.
The other problem here is the original stumbling block, the core code
caches, especially the class loader.
I suggest using a semi-transaction based on exclusive read-write locks,
either via flock (which has process synchronization issues) or, even better,
via a lock file. Processes that fail to acquire the lock should continue
without writing to those kinds of caches (unless the lock file is older than
xx seconds).
This should be fairly robust with only a small overhead.
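The lock-file variant could look roughly like this. It is a sketch under assumptions. `fopen()` mode `'x'` (exclusive create, fails if the file exists) serves as the atomic test-and-set. The `$staleAfter` parameter stands in for the unspecified "xx seconds", and the function name is hypothetical:

```php
<?php
// Hypothetical helper for code caches: persist an entry only if this process
// wins the lock file. Losers do not wait; they keep using the data they just
// generated and skip writing it. A lock file older than $staleAfter seconds is
// treated as left over by a crashed process and taken over once.

function writeIfLockAcquired(string $cacheFile, string $data, int $staleAfter): bool
{
    $lockFile = $cacheFile . '.lock';
    $handle = @fopen($lockFile, 'x'); // 'x' = exclusive create, fails if it exists
    if ($handle === false) {
        clearstatcache(true, $lockFile); // another process may have touched it
        $mtime = @filemtime($lockFile);
        if ($mtime === false || time() - $mtime <= $staleAfter) {
            return false; // lock held (or just released): continue without writing
        }
        @unlink($lockFile); // stale lock: remove it and retry exactly once
        $handle = @fopen($lockFile, 'x');
        if ($handle === false) {
            return false;
        }
    }
    try {
        // Write to a temporary file first; on POSIX, rename() then replaces
        // the entry atomically, so readers never see a half-written file.
        $tmp = $cacheFile . '.' . getmypid() . '.tmp';
        return file_put_contents($tmp, $data) !== false && rename($tmp, $cacheFile);
    } finally {
        fclose($handle);
        @unlink($lockFile);
    }
}
```

The stale-lock takeover is not race-free itself: two processes that both see an old lock may both proceed. That seems acceptable for caches where a duplicate write of the same entry is harmless.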
Thus we have three cases here:
a) the cached data is not used to calculate new data and the calculation is
cheap
--> always write on cache miss, concurrency does not matter
b) the operation is expensive (e.g. media manipulation), cached data is not
used in the calculation process
--> try locking if available, fall back to write-always (for shared hosting)
or allow skipping generation (for custom solutions)
c) the cached data is required to calculate the new data (readers-writers
problem)
--> rely on the OS or database where possible (use transactions), otherwise
fall back to lock files or shared memory locks (if supported)
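To compare the three cases side by side, here is the decision written out as a small PHP function. The function name, its boolean inputs and the returned strategy labels are all hypothetical. They only restate a) to c) and are not an existing TYPO3 API:

```php
<?php
// Hypothetical restatement of cases a), b), c) as a decision function.
// The returned strings are illustrative labels, not TYPO3 constants.

function chooseWriteStrategy(
    bool $readsOwnCacheData,        // case c): cached data feeds the calculation
    bool $expensive,                // case b): e.g. media manipulation
    bool $dbOrOsLockingAvailable,   // transactions or OS-level locks usable
    bool $skipGenerationConfigured  // config option: external process fills cache
): string {
    if ($readsOwnCacheData) {
        // c) prefer DB transactions / OS locks, else lock files or shm locks
        return $dbOrOsLockingAvailable ? 'transaction-or-os-lock' : 'lock-file-or-shm';
    }
    if ($expensive) {
        if ($skipGenerationConfigured) {
            return 'skip-generation'; // b) custom setups generate the data elsewhere
        }
        // b) lock if possible, otherwise write-always (shared hosting)
        return $dbOrOsLockingAvailable ? 'lock-then-write' : 'write-always';
    }
    return 'write-always'; // a) cheap: last write wins, concurrency irrelevant
}
```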
3) Missing stacked cache concept
You already described this very well. We need to deal with the different
levels of persistence of caches, and with the fact that users might choose a
non-persistent cache for performance reasons even for data that should be
persistent.
The question here is to what extent implementations can expect cache data to
exist. IMHO they should always work with the NULL backend, meaning they
should never assume that a cache entry they just wrote can be read
immediately afterwards or at any later time.
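Below is a sketch of both ideas: a stacked cache that checks a fast, non-persistent level before a slower one, and calling code that stays correct even on a backend that stores nothing. The interface and class names are hypothetical. They are deliberately minimal and are not the classes of the TYPO3 caching framework:

```php
<?php
// Hypothetical minimal backend interface, NOT the TYPO3 caching framework API.
interface Backend
{
    public function get(string $id): ?string; // null means "miss"
    public function set(string $id, string $data): void;
}

// Keeps entries for the current request only (non-persistent).
final class MemoryBackend implements Backend
{
    private array $entries = [];

    public function get(string $id): ?string
    {
        return $this->entries[$id] ?? null;
    }

    public function set(string $id, string $data): void
    {
        $this->entries[$id] = $data;
    }
}

// Stores nothing: every get() is a miss, even directly after set().
final class NullBackend implements Backend
{
    public function get(string $id): ?string
    {
        return null;
    }

    public function set(string $id, string $data): void
    {
    }
}

// Stacked cache: ask the fastest level first; on a hit in a slower level,
// copy the entry into all faster levels. Writes go to every level.
final class StackedCache implements Backend
{
    /** @var Backend[] */
    private array $levels;

    /** @param Backend[] $levels ordered fastest first */
    public function __construct(array $levels)
    {
        $this->levels = array_values($levels);
    }

    public function get(string $id): ?string
    {
        foreach ($this->levels as $i => $level) {
            $data = $level->get($id);
            if ($data !== null) {
                for ($j = 0; $j < $i; $j++) {
                    $this->levels[$j]->set($id, $data);
                }
                return $data;
            }
        }
        return null;
    }

    public function set(string $id, string $data): void
    {
        foreach ($this->levels as $level) {
            $level->set($id, $data);
        }
    }
}

// Caller written for the NULL backend: it never re-reads what it just wrote.
function loadConfig(Backend $cache): string
{
    $data = $cache->get('config');
    if ($data === null) {
        $data = 'computed config'; // stands in for the expensive calculation
        $cache->set('config', $data);
    }
    return $data;
}
```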
4) Boilerplate code instead of central function with callback
Currently all CF-using code does something like the example in the
documentation:
http://docs.typo3.org/typo3cms/CoreApiReference/CachingFramework/Developer/Index.html#caching-developer-access
The problem here is that this puts too much logic in the hands of the
developer and that more advanced features like stacked caches cannot be
implemented easily.
I agree with the concept of a high-level API that reads a cache entry and
takes a generator function to generate and (possibly) store a missing cache
entry, reducing the boilerplate to:
$data = $highLvl->getCache('cache')->get($identifier, 'generator');
...
function generator($identifier) {...}
This way the code can work on the assumption that a cache entry *always*
exists (either coming from the cache or being generated on the fly).
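Here is a rough sketch of such an API that keeps the `get($identifier, 'generator')` call shape from the snippet. The class `GeneratingCache` and its `$persist` switch, which mimics a NULL backend, are hypothetical. The point is that the decision whether and where to store the entry is made in one place instead of in every caller:

```php
<?php
// Hypothetical high-level cache front: get() takes a generator callback, so
// callers can assume an entry *always* exists. Storage policy (stacking,
// skip-write-on-lock-failure, ...) would live here, not in every caller.

final class GeneratingCache
{
    private array $entries = [];

    public function __construct(private bool $persist = true) {}

    public function get(string $identifier, callable $generator): mixed
    {
        if (array_key_exists($identifier, $this->entries)) {
            return $this->entries[$identifier]; // hit
        }
        $data = $generator($identifier); // miss: generate on the fly
        if ($this->persist) {
            $this->entries[$identifier] = $data; // storing is optional
        }
        return $data;
    }
}

function generator(string $identifier): string
{
    return 'generated:' . $identifier;
}

$cache = new GeneratingCache();
$data = $cache->get('foo', 'generator'); // 'generated:foo'
```

With `new GeneratingCache(false)` nothing is ever stored, yet callers still receive their data. This is the NULL-backend guarantee from point 3.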
I suggest fixing only the class loader/code caches and leaving the
high-level synchronization issues either for 6.2+1 or for a major update of
6.2 (in the sense of a service pack or a RHEL X Update Y).
Otherwise I fear we will run short on time.
Best regards
--
Philipp Gampe – PGP-Key 0AD96065 – TYPO3 UG Bonn/Köln
Documentation – Active contributor TYPO3 CMS
TYPO3 .... inspiring people to share!