[TYPO3-dev] Missing feature in t3lib_cache_backend_MemcachedBackend

Christian Kuhn lolli at schwarzbu.ch
Wed Aug 11 22:00:59 CEST 2010


Hey.

Ugh, pretty long post, hope it's still interesting :)

Chris Zepernick {SwiftLizard} wrote:
> There is no way to check if a server is connected, so we can not check
> on initialization if connect really worked.

This doesn't really fit the interface, and such a change should go to 
FLOW3 first.


FYI, the current state of the memcache backend:
- There is a pending FLOW3 issue which simplifies and speeds up the 
implementation a bit, see [0] for details. This patch will be backported 
to v4 as soon as it has been applied to FLOW3.

- There is a systematic problem with the memcache backend: memcache is 
just a key-value store, there are no relations between keys. But we need 
to put some structure in it to store the identifier-data-tags relations. 
So, for each cache entry, there is an identifier->data entry, an 
identifier->tags entry and a tags->identifier entry. This is by 
principle a *bad* idea with memcache for the following reasons:
-- If memcache runs out of memory but must store new entries, it will 
toss *some* other entry out of the cache (this is called an eviction in 
memcache-speak).
-- If you are running a cluster of memcache servers and one server 
fails, the key-value entries on that system will just vanish from the 
cache.

The above situations will both lead to a corrupt cache: If, e.g., a 
tags->identifier entry is lost, dropByTag() will not be able to find the 
corresponding identifier->data entries that should be removed, so they 
will not be deleted. Your cache might then deliver stale data on get().
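To make the failure mode concrete, here is a rough sketch with a plain 
Python dict standing in for memcache (key prefixes and function names 
are made up, not the actual backend API):

```python
# Simulate the memcache layout: one flat key-value store, three kinds of entries.
store = {}

def set_entry(identifier, data, tags):
    store['data:' + identifier] = data           # identifier -> data
    store['tags:' + identifier] = list(tags)     # identifier -> tags
    for tag in tags:
        ids = store.setdefault('ident:' + tag, [])   # tag -> identifiers
        if identifier not in ids:
            ids.append(identifier)

def drop_by_tag(tag):
    # Relies entirely on the tag -> identifiers entry:
    # if that key was evicted, nothing at all gets removed.
    for identifier in store.pop('ident:' + tag, []):
        store.pop('data:' + identifier, None)
        store.pop('tags:' + identifier, None)

set_entry('page_1', '<html>old</html>', ['pageId_1'])
del store['ident:pageId_1']       # simulate an eviction of the tag entry
drop_by_tag('pageId_1')
print('data:page_1' in store)     # True: the stale entry survives the flush
```

The data entry is still served on get() even though it was supposed to 
be flushed, which is exactly the corruption described above.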

I have an implementation for collectGarbage() in mind, to at least find 
and clean up the state of the cache if such things happen (didn't really 
implement that, so I'm unsure if it actually works).
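Since memcache itself cannot enumerate its keys, a real collectGarbage() 
would need some extra bookkeeping to know which identifiers exist. 
Assuming such a list is available, the cleanup could look roughly like 
this (key prefixes and names are hypothetical, matching the sketch 
layout above, not the actual backend code):

```python
# A store left inconsistent by an eviction: the reverse tag index is gone.
store = {
    'data:page_1': '<html>old</html>',
    'tags:page_1': ['pageId_1'],
    # 'ident:pageId_1' was evicted
}

def collect_garbage(known_identifiers):
    # Drop every entry whose reverse tag index no longer lists it,
    # so a later dropByTag() cannot miss it silently.
    for identifier in known_identifiers:
        for tag in store.get('tags:' + identifier, []):
            if identifier not in store.get('ident:' + tag, []):
                store.pop('data:' + identifier, None)
                store.pop('tags:' + identifier, None)
                break

collect_garbage(['page_1'])
print('data:page_1' in store)   # False: the inconsistent entry is cleaned up
```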
But the important thing is: If you are running a cluster, your cache 
*will* be corrupt if some server fails, and if one of the memcache 
cluster systems begins to evict data, your cache *will* be corrupt as 
well.

Keep these things in mind if you choose to use memcache. BTW: There is 
an extension called memcached_reports in TER [4], which shows some 
memcache server stats within the reports module.


An implementation without those problems is the redis backend; it's 
already pending in FLOW3 as issue [1], and it scales *very* well, even 
better than memcache. redis [2] is a young project and as such a bit 
experimental, though.

Best bet is currently still the db backend, maybe with the newly added 
compression (very useful for bigger data sets like the page cache). 
We're running several multi-gigabyte caches without problems, it just 
won't scale *much* further: The db backend typically slows down if you 
are unable to give MySQL enough RAM to keep the cache tables fully in 
memory. Be sure to tune your MySQL InnoDB settings if you have big 
tables!
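For illustration, the relevant InnoDB knobs in my.cnf might look like 
this (the values are placeholders, size them to your actual cache 
tables):

```ini
[mysqld]
# Large enough that the cache tables fit fully in the buffer pool.
innodb_buffer_pool_size = 4G
# Relax per-transaction disk flushes; acceptable for pure cache data.
innodb_flush_log_at_trx_commit = 2
```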


If you are really interested in performance of the different backends, 
you could also give enetcacheanalytics [3] a shot, it comes with a BE 
module to run performance test cases against different backends.


I have started writing documentation about the caching framework, but 
it's not finished yet. As a start, here is a sum up of your current 
backend alternatives:

* apcBackend: Lightning fast with get() and set(), but doesn't fit for 
bigger caches; only usable if you're using APC anyway. I've seen heavy 
memory leaks with PHP 5.2.
* dbBackend: Mature. Best bet for all usual things. Scales well until 
you run out of memory. For 4.3 the insertMultipleRows() patch is 
recommended if you're adding many tags to an entry (delivered with 4.4).
* fileBackend: Very fast with get() and set(), but scales only O(n) with 
the number of cache entries on flushByTag(). This makes it pretty much 
unusable for page caches. FLOW3 uses it for AOP caches, where it fits 
perfectly well.
* pdoBackend: Alternative for dbBackend, *might* be neat with a db like 
Oracle, but currently untested by me in this regard (I just tested with 
sqlite, where it sucks, but that is because of sqlite).
* memcachedBackend: OK performance-wise, but has the drawbacks 
mentioned above.
* redisBackend: Experimental, but architecture fits perfectly to our 
needs. Pretty much every operation scales O(1) with the number of cache 
entries. I'm able to give O-notations for every operation, depending on 
number of input parameters and number of cache entries.
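To illustrate the scaling difference between the file and redis 
backends: with a native set type per tag (which redis offers), a tag 
flush only touches the entries actually carrying that tag, while the 
file backend has to scan every single cache entry. A rough Python 
sketch of the set-based index (names are made up for illustration):

```python
# Set-based tag index, as a redis-style backend could maintain it.
data = {}           # identifier -> data
tag_index = {}      # tag -> set of identifiers

def set_entry(identifier, payload, tags):
    data[identifier] = payload
    for tag in tags:
        tag_index.setdefault(tag, set()).add(identifier)

def drop_by_tag(tag):
    # Work is proportional to the number of entries carrying the tag,
    # independent of the total number of cache entries.
    for identifier in tag_index.pop(tag, set()):
        data.pop(identifier, None)

for i in range(1000):
    set_entry('page_%d' % i, 'content', ['pageId_%d' % i])
set_entry('special', 'content', ['flushme'])
drop_by_tag('flushme')
print(len(data))    # 1000: only the tagged entry was removed
```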


Regards
Christian

[0] http://forge.typo3.org/issues/8918
[1] http://forge.typo3.org/issues/9017
[2] http://code.google.com/p/redis/
[3] http://forge.typo3.org/projects/show/extension-enetcacheanalytics
[4] http://typo3.org/extensions/repository/view/memcached_reports/current/
