[Typo3-dev] New feature in 4.0: Reference Index Table

Kasper Skårhøj kasper2005 at typo3.com
Sat Dec 17 15:39:55 CET 2005


Hi TYPO3 Developers.

From the saturday-pre-christmas-lab of TYPO3, I have used my electronic paper 
and scissors the last week to cut you some cool features for christmas fun:


Databases like MySQL allows you to specify "an index" over a number of rows. 
The point of the index is to allow quick look ups which doesn't involve 
scanning the whole table. Essentially, the information in a database index 
table is a condensed redundant reflection of selected "real" content in the 
actual table rows. The index is automatically kept up-to-date during any 
change made to the database so the user has nothing to worry about in terms 
of consistency.

Now, TYPO3 also have an index table which is a redundant reflections of all 
relations (db / file / softrefs + traversing flexforms) found in tables/field 
registered in the $TCA. The main motivation to introduce this table is to 
provide an information base for the up-coming integrity check functions of 
version 4.0. However it quickly shows its usefulness in many other cases as 
well. For instance: In TemplaVoila you can "unlink" content elements from the 
page structure but you can't safely delete them because what if some other 
page in the system also refers to that element? With the new reference index 
its quick to detect any such references and if none are found the system 
could delete the element right away. Example:

mysql> SELECT tablename,recuid FROM sys_refindex WHERE ref_table="tt_content" 
AND ref_uid=39;
+-----------+--------+
| tablename | recuid |
+-----------+--------+
| pages     |     20 |
| pages     |      8 |
+-----------+--------+
2 rows in set (0.00 sec)

This query quickly tells us that the element "tt_content:39" is referenced on 
two pages, namely 20 and 8. Since page 8 is the parent page of the element 
TemplaVoila now knows that another reference is found on page 20! This had 
been completely unrealistic to discover otherwise, except if a full search 
through the page tree had been done.


VARIOUS EXAMPLES:
Another application of the reference table can remedy the typical rants over 
the comma separated reference lists in TYPO3. For example, backend users list 
references to their usergroups by "1,3,4". This makes it very hard to create 
a reverse lookup to find which backend users are members of usergroup "1". 
That is now possible (not taking subgroups into account though!):

mysql> SELECT recuid FROM sys_refindex WHERE tablename="be_users" AND 
ref_table="be_groups" AND ref_uid=1;
+--------+
| recuid |
+--------+
|      2 |
|     10 |
+--------+
2 rows in set (0.00 sec)

It turns out that two records from the "be_users" table has a reference to 
"be_groups:1".



Want to know which backend users has this group as their "main" (first listed) 
usergroup? Just use the sorting field:

mysql> SELECT recuid FROM sys_refindex WHERE tablename="be_users" AND 
ref_table="be_groups" AND ref_uid=1 AND sorting=0;  



Wanna know which records might contain a reference to the HTML template file 
"fileadmin/templates/cmf2005.html"?

mysql> SELECT tablename,recuid FROM sys_refindex WHERE ref_table="_FILE" AND 
ref_string="fileadmin/templates/cmf2005.html";
+------------------------+--------+
| tablename              | recuid |
+------------------------+--------+
| tx_templavoila_tmplobj |      3 |
| tx_templavoila_tmplobj |      4 |
| tx_templavoila_tmplobj |      5 |
| tx_templavoila_tmplobj |      7 |
| tx_templavoila_tmplobj |      9 |
| tx_templavoila_tmplobj |     21 |
| tx_templavoila_tmplobj |     23 |
| tx_templavoila_tmplobj |     26 |
| tx_templavoila_tmplobj |     31 |
| tx_templavoila_tmplobj |     30 |
| tx_templavoila_tmplobj |     28 |
| tx_templavoila_tmplobj |     24 |
| tx_templavoila_tmplobj |     32 |
| tx_templavoila_tmplobj |     34 |
| tx_templavoila_tmplobj |     36 |
+------------------------+--------+
15 rows in set (0.00 sec)

.... 15 TemplaVoila Template Objects did! Now you know you should NOT delete 
this file!



Wanna know if any TypoScript records contain references in their setup or 
constants fields to "fileadmin/menuoftypes.php"?

mysql> SELECT recuid,field,softref_key FROM sys_refindex WHERE 
tablename="sys_template" AND ref_table="_FILE" AND 
ref_string="fileadmin/menuoftypes.php";
+--------+--------+-------------+
| recuid | field  | softref_key |
+--------+--------+-------------+
|     32 | config | TStemplate  |
|     37 | config | TStemplate  |
|     56 | config | TStemplate  |
|     57 | config | TStemplate  |
+--------+--------+-------------+
4 rows in set (0.00 sec)

In this case you can see the references were found in the "config" field 
(TypoScript Setup field) and it was found by the softreference parser named 
"TStemplate". Conclusion: The reference index even contains "soft references" 
- references to files and records located in a plain text field!




Now, find all tt_content elements where a "<LINK 1>" tag is found with a link 
to a page with id "1":

mysql> SELECT tablename,recuid,field FROM sys_refindex WHERE 
tablename="tt_content" AND ref_table="pages" AND ref_uid=1 AND 
softref_key='typolink_tag' ;
+------------+--------+----------+
| tablename  | recuid | field    |
+------------+--------+----------+
| tt_content |      8 | bodytext |
| tt_content |    110 | bodytext |
| tt_content |    128 | bodytext |
| tt_content |    128 | bodytext |
| tt_content |    128 | bodytext |
| tt_content |    128 | bodytext |
| tt_content |    128 | bodytext |
| tt_content |    372 | bodytext |
| tt_content |    437 | bodytext |
+------------+--------+----------+
9 rows in set (0.00 sec)



Making a true JOIN based sql query between backend users and backend groups 
(something otherwise impossible due to the comma separated reference 
principle)?

mysql> SELECT be_users.username, be_groups.uid, 
be_groups.title,sys_refindex.sorting FROM be_users,be_groups,sys_refindex 
WHERE sys_refindex.tablename="be_users" AND 
sys_refindex.ref_table="be_groups" AND be_users.uid=sys_refindex.recuid AND 
be_groups.uid=sys_refindex.ref_uid ORDER BY be_users.username;
+------------------+-----+---------------------+---------+
| username         | uid | title               | sorting |
+------------------+-----+---------------------+---------+
| be_test          |   1 | be_testgroup        |       0 |
| news_article     |   6 | Common editor group |       0 |
| news_article     |   4 | Article Writers     |       1 |
| news_editor      |   3 | Editor              |       1 |
| news_editor      |   6 | Common editor group |       0 |
| news_news        |   2 | News Writers        |       1 |
| news_news        |   6 | Common editor group |       0 |
| reviewer         |   6 | Common editor group |       0 |
| reviewer         |   2 | News Writers        |       1 |
| reviewer         |   4 | Article Writers     |       2 |
| template_be_test |   1 | be_testgroup        |       0 |
| tony             |   7 | tony                |       0 |
+------------------+-----+---------------------+---------+
12 rows in set (0.00 sec)

There you go!


What about finding an email address? How do I find all occurencies on the 
site?

mysql> SELECT tablename,recuid,field FROM sys_refindex WHERE softref_key = 
"email" AND ref_string="mail at foo.bar.com";
+------------+--------+--------------+
| tablename  | recuid | field        |
+------------+--------+--------------+
| tt_content |    457 | bodytext     |
| tt_content |    457 | imagecaption |
| tt_content |    467 | bodytext     |
| tt_content |    475 | bodytext     |
| tt_content |    475 | bodytext     |
| tt_content |    558 | bodytext     |
| tt_content |    558 | imagecaption |
| tt_content |    568 | bodytext     |
| tt_content |    576 | bodytext     |
| tt_content |    576 | bodytext     |
| tt_content |    650 | bodytext     |
| tt_content |    650 | imagecaption |
| tt_content |    660 | bodytext     |
| tt_content |    668 | bodytext     |
| tt_content |    668 | bodytext     |
| tt_content |    735 | bodytext     |
| tt_content |    735 | imagecaption |
| tt_content |    740 | bodytext     |
| tt_content |    744 | bodytext     |
| tt_content |    744 | bodytext     |
+------------+--------+--------------+
20 rows in set (0.00 sec)

Like that!




POSSIBLE APPLICATIONS:
OK, so we can proceed like this for a long time. This should give you all 
ideas for a bunch of nice applications. Here are some ideas I suggest:

- Global Data Integrity Check [I'm going to do this in January for version 
4.0]
- TemplaVoila clean-up [Robert is going to do that for 4.0]
- In a context menu for a database record or file from Web > Filelist, imagine 
a function that will show us all references to this element! Could be very 
useful to know but until now has been impossible to get an exact answer.
- Web > Filelist: Show "flag" for all files and directories in the list if 
they are referenced somewhere in the system. Click the "flag" and see where 
(or let it be a mouse-over function?). Would also make it easy to remove 
unused files.
- A search (and replace?) function using the reference index: Could search for 
typolink-tags pointing to some page and let you change the value across the 
whole system. Even more useful: For email addresses!
- Delete file or record: Displays a warning if there are references to the 
element! Or it could even reject the deletion unless those references were 
removed?
- One could also imagine a hook in the t3lib_refindex class that lets other 
application provide indexing services; That could be the maintenance of a 
word index used for searching.
- Another usage of such a hook could be generating a similar reference table 
for all values in flexform XML: This could enable exact look-ups of 
information inside flexform XML which is currently impossible, almost like a 
sort of "XPath" surrogate.



TECHNICAL INFORMATION:

1) Creation, validation and correction of the Index is done in the extension 
"lowlevel", accessed through "Tools > DB check", selecting "Manage Reference 
Index".
2) Runtime updates are created with TCEmain and as long as TCEmain is used for 
ANY manipulation of relations to database records and files the index 
integrity should be intact.
3) t3lib_refindex is used to maintain the index. 

Table structure:

mysql> show columns from sys_refindex;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| hash        | varchar(32) |      | PRI |         |       |
| tablename   | varchar(40) |      | MUL |         |       |
| recuid      | int(11)     |      |     | 0       |       |
| field       | varchar(40) |      |     |         |       |
| flexpointer | tinytext    |      |     |         |       |
| softref_key | varchar(30) |      |     |         |       |
| softref_id  | varchar(40) |      |     |         |       |
| sorting     | int(11)     |      |     | 0       |       |
| ref_table   | varchar(40) |      | MUL |         |       |
| ref_uid     | int(11)     |      |     | 0       |       |
| ref_string  | varchar(40) |      |     |         |       |
+-------------+-------------+------+-----+---------+-------+
11 rows in set (0.00 sec)


TODO:
- (For 4.0) In "Help > About Modules" screen we might want to look if the 
index is empty and if so, present a link to "Tools > DB check" tool so the 
index can be created initially for updated sites?
- (For 4.0) Improve the "Tools > DB check" tool?
- Improve MySQL database indexes on the "sys_refindex" table if needed for 
various applications?
- Make t3lib_refindex available in the frontend scope for integration in 
plugins that manipulate the database on their own and need to update the 
reference index. Actually, I'm inclined to think we should rather think about 
how to provide t3lib_tcemain for the frontend instead...!
	


Merry Christmas everyone!

-- 
- kasper

-----------------
"A 'please' would be nice", John Travolta, Pulp Fiction




More information about the TYPO3-dev mailing list