[TYPO3-Solr] How to handle changes on related values

Stephan Schuler Stephan.Schuler at netlogix.de
Thu Dec 27 11:44:03 CET 2012


Hi Ingo.


Thank you for answering my question. For a minute, I thought you would just ignore it because my "workaround" is the way to go.


On the one hand, I think the ref index is rather unreliable. I don't know whether things have changed since Extbase came up, because I haven't touched the ref index since 4.2 or so. But the last time I tried to use it, I had several extensions that modified database tables without updating the ref index table. So the ref index should work when records are modified through the backend only, but everything else feels unpredictable to me. I know this situation is caused by bad programming in third-party extensions, and we should not have to care about code that works against the stable API. But if it means that a certain percentage of use cases conflict and behave unpredictably, we should do our best to avoid that. A badly maintained ref index will be really hard to track down when the only error message is "some records are only indexed partially".

And on the other hand: I don't know whether we should restrict the "foreign record tracking" behavior to records that have TCA relations to a record.
Think about a relation from Tx_Myext_Record to tt_content. That's something the ref index will cover.
But think about the related tt_content record being either a TemplaVoila record or of type "core records" (I don't know its exact name, but there definitely is a default core tt_content type which collects a couple of UIDs and passes them to the cObject "RECORDS" when being rendered). Then you keep track of relations to the tt_content container record, but that container doesn't necessarily change when its child records change.
As soon as the child record has relations of its own and gets passed to the indexer by the default core rendering mechanisms (RECORDS/CONTENT), you completely lose the child-of-child records that might be important for the rendering output. They influence the rendering output, but you don't know about them and therefore cannot track them. I guess that if we rely on the ref index, we will very often end up collecting almost every record in the database.

A third thing would be: what if a relation gets removed from the ref index? A relation can be lost because the child record gets deleted, which removes the relation from the ref index as well. Or the relation is not parent->child but child->parent: when the child gets updated by unselecting the parent relation, the ref index gets updated as well. OK, this can be covered by the ref index's "deleted" flag. But... I'm not really happy with this :).

Another thing: not all child record updates really influence the rendering output. I think most of the relations a record has don't go into Solr.

And the last thing is very technical: managing an indexing queue that is filled with both index records and relation records. The indexing queue currently holds the lastIndexTime and decides with a simple SQL query whether a record is newer than that lastIndexTime. When we add relations to the tracking mechanism, we either add them to the indexing_queue table as well and introduce a type field, or we create a dedicated relations table. In both cases we need some place to track the lastIndexTime for the related record. And because a record can have relations to several different Index Queue items, we end up with an additional table filled with "relations", which means: parentTable, parentUid, childTable, childUid and childLastIndexTime. And I'm not completely sure how this (multi-parent child records in particular) works out for nesting and recursion. I guess the existing sys_refindex table is a bit too inflexible to cover all this information.
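
To illustrate the relations table described above, here is a minimal sketch. It assumes a plain PHP array instead of a real database table. The column names are the ones listed above; the function name findParentsToRequeue, the sample rows and the timestamps are made up for illustration and are not part of the Solr extension.

```php
<?php
// Sketch only: a plain array stands in for the dedicated relations table.
// Column names come from the text above; rows and timestamps are made up.
$relations = array(
    array('parentTable' => 'tx_myext_record', 'parentUid' => 1,
          'childTable' => 'tt_content', 'childUid' => 22, 'childLastIndexTime' => 100),
    array('parentTable' => 'tx_myext_record', 'parentUid' => 2,
          'childTable' => 'tt_content', 'childUid' => 22, 'childLastIndexTime' => 120),
    array('parentTable' => 'tx_myext_record', 'parentUid' => 2,
          'childTable' => 'pages', 'childUid' => 24, 'childLastIndexTime' => 120),
);

/**
 * Hypothetical helper: returns "table_uid" keys of all parents whose copy of
 * the given child is older than the child's change time.
 */
function findParentsToRequeue(array $relations, $childTable, $childUid, $childChangedAt)
{
    $parents = array();
    foreach ($relations as $row) {
        $isSameChild = $row['childTable'] === $childTable
            && (int) $row['childUid'] === (int) $childUid;
        if ($isSameChild && $row['childLastIndexTime'] < $childChangedAt) {
            // Keyed by parent so a parent is listed only once.
            $parents[$row['parentTable'] . '_' . $row['parentUid']] = true;
        }
    }
    return array_keys($parents);
}
```

With these rows, a change to tt_content 22 at time 150 would re-queue both parents, while a change at time 110 would only re-queue the first one, because the second parent indexed that child later.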

That's why I thought about a "relationWatcherService".

Tx_Solr_RelationWatcherService implements t3lib_Singleton
$relationWatcherService->reset();
$relationWatcherService->setParentRecord($parentRecordTableName, $parentRecordUid);
$relationWatcherService->addChildRecord($childRecordTableName, $childRecordUid);
$relationWatcherService->commit();

This service has only one task: Create a proper "solr ref index" database table which collects only those relations that are needed for an index item.
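
A rough sketch of how such a service could behave, under a few assumptions: the class name RelationWatcherSketch and the getChildren() method are invented for this example, an in-memory array stands in for the "solr ref index" table, and t3lib_Singleton is left out so the code runs outside TYPO3.

```php
<?php
// Sketch only: not the real service. An array replaces the database table.
class RelationWatcherSketch
{
    private $parentKey = null;
    private $pendingChildren = array();
    private $storedRelations = array(); // parent key => list of child keys

    public function reset()
    {
        $this->parentKey = null;
        $this->pendingChildren = array();
    }

    public function setParentRecord($tableName, $uid)
    {
        $this->parentKey = $tableName . ':' . (int) $uid;
    }

    public function addChildRecord($tableName, $uid)
    {
        // Keyed by child so the same child is stored only once per parent.
        $this->pendingChildren[$tableName . ':' . (int) $uid] = true;
    }

    public function commit()
    {
        if ($this->parentKey === null) {
            throw new LogicException('setParentRecord() must be called before commit().');
        }
        // Replace instead of merge: children not seen during this rendering
        // pass are dropped from the parent's relation list.
        $this->storedRelations[$this->parentKey] = array_keys($this->pendingChildren);
        $this->reset();
    }

    // Hypothetical read accessor, added only to make the sketch inspectable.
    public function getChildren($tableName, $uid)
    {
        $key = $tableName . ':' . (int) $uid;
        return isset($this->storedRelations[$key]) ? $this->storedRelations[$key] : array();
    }
}
```

Replacing the stored list on every commit() is one possible way to deal with removed relations: whatever was not collected while rendering the parent simply no longer appears.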

I would enhance this by adding two custom cObject types:
* SOLR_RECORD_RELATION_WATCHER which has the same interface as RECORDS and takes source and tables. It passes the automatically fetched records to the relationWatcherService.
* SOLR_CUSTOM_RELATION_WATCHER which takes a userFunc. The userFunc has to return something like "tt_content_22, pages_24". This is also passed to the relationWatcherService.
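
The userFunc return value mentioned above could be split into table/uid pairs roughly like this. This is only a sketch: the function name parseRecordList is hypothetical, and it assumes that the uid is always the part after the last underscore, since table names such as tt_content contain underscores themselves.

```php
<?php
/**
 * Hypothetical helper: turns "tt_content_22, pages_24" into a list of
 * array('table' => ..., 'uid' => ...) entries. Malformed items are skipped.
 */
function parseRecordList($list)
{
    $records = array();
    foreach (explode(',', $list) as $item) {
        $item = trim($item);
        // Greedy (.+) makes the split happen at the LAST underscore.
        if (preg_match('/^(.+)_(\d+)$/', $item, $matches) === 1) {
            $records[] = array('table' => $matches[1], 'uid' => (int) $matches[2]);
        }
    }
    return $records;
}
```

Each resulting pair could then be handed to addChildRecord() of the relationWatcherService.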

Through a hopefully existing hook, we could extend the core RECORDS element to automatically pass its source to the relationWatcherService. Your SOLR_RELATION object should do the same.
This should cover all used records.

The main difference is: I would collect the relations myself, at rendering time, when a parent record gets passed to the index.
And compared to sys_refindex: I would not create a nesting or hierarchy structure but a very flat "[parent] to [child or child/child]". If A->B->C->D and only A is an indexed item, then my relation index would show "A->B, A->C, A->D". This should be enough for change tracking and makes the hierarchy situation a lot easier.
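
The flattening can be sketched as follows. The function name flattenChildren and the input format (record key => list of direct child keys) are assumptions made for this example. A "seen" list guards against circular relations.

```php
<?php
/**
 * Hypothetical helper: collects every record reachable from $root, so that
 * A->B->C->D is stored as the flat list B, C, D for the indexed item A.
 */
function flattenChildren(array $directChildren, $root)
{
    $seen = array($root => true); // the root itself is never its own child
    $flat = array();
    $stack = array($root);
    while ($stack) {
        $current = array_pop($stack);
        $children = isset($directChildren[$current]) ? $directChildren[$current] : array();
        foreach ($children as $child) {
            if (isset($seen[$child])) {
                continue; // already collected, or a cycle back to an ancestor
            }
            $seen[$child] = true;
            $flat[] = $child;
            $stack[] = $child;
        }
    }
    return $flat;
}
```

For the chain A->B->C->D this yields B, C and D as children of A, which corresponds to the rows "A->B, A->C, A->D" in the relation index.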

So, it's up to you what you want to do. But as you can see, I really would not go with the ref index but with a custom relation tracker.


Kind regards,
Stephan.



Stephan Schuler
Web Developer

Phone: +49 (911) 539909 - 0
E-Mail: Stephan.Schuler at netlogix.de
Website: media.netlogix.de


--
netlogix GmbH & Co. KG
IT-Services | IT-Training | Media
Andernacher Straße 53 | 90411 Nürnberg
Phone: +49 (911) 539909 - 0 | Fax: +49 (911) 539909 - 99
E-Mail: info at netlogix.de | Internet: http://www.netlogix.de

netlogix GmbH & Co. KG is registered at the Nürnberg district court (Amtsgericht Nürnberg, HRA 13338)
General partner: netlogix Verwaltungs GmbH (HRB 20634)
VAT identification number: DE 233472254
Managing directors: Stefan Buchta, Matthias Schmidt



-----Original Message-----
From: typo3-project-solr-bounces at lists.typo3.org [mailto:typo3-project-solr-bounces at lists.typo3.org] On Behalf Of Ingo Renner
Sent: Thursday, December 27, 2012 06:13
To: typo3-project-solr at lists.typo3.org
Subject: Re: [TYPO3-Solr] How to handle changes on related values

On 10.10.12 09:12, Stephan Schuler wrote:

Hi Stephan,

> We have several index queue configurations where a single Solr document is related not to a single TYPO3 record but to a couple of records. And we have some index queue configurations with relations to other stuff than only files or records.

> Is there a solution for that problem?

It's a known issue, and maybe even a bit advanced already. I've come across this too already but couldn't think of an easy and transparent solution yet...

Maybe the reference index could be used here? Let's say with the indexing configuration you could specify other tables to watch too. Then, when a record of that related table is changed, the record monitor would use the ref index to find records of the original table that could be affected by the change and update the Index Queue accordingly...

What do you think?


Ingo

--
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google Summer of Code

TYPO3 - Open Source Enterprise Content Management System http://typo3.org

Apache Solr for TYPO3 -
Open Source Enterprise Search meets Open Source Enterprise CMS http://www.typo3-solr.com
_______________________________________________
TYPO3-project-solr mailing list
TYPO3-project-solr at lists.typo3.org
http://lists.typo3.org/cgi-bin/mailman/listinfo/typo3-project-solr

