[TYPO3-english] mnogosearch and tt_news indexing

Dmitry Dulepov dmitry at typo3.org
Sun Jan 25 18:55:29 CET 2009


Hi!

Steffen Gebert wrote:
> 1. Found a few mistakes in the following paragraph (section: users manual):
> To exclude parts of the web site from indexing, create an indexing configuration record as described above _bit_ set method to “Disallow”. It will prohibit any pages starting from the current path from indexing. The _the_ screenshot above (“Real” record). _...?_

Can you add this as a bug report to Forge? I will fix it in my next round of mnogosearch work.

> 2. Indexing of normal pages works like a charm (disallowing /news/.* works too). But I can't get tt_news records indexed.
> In which order should these three indexing configurations be?
> - Record: tt_news
> - Realm: disallow .../news/.*
> - Server: http://...

A "Disallow" for any 'http://' URL must come before the first "allow" (such as the "Server" record). Think of it as "first match wins": you need to disallow news indexing before you allow the rest of the site.

The position of the "Records" configuration does not matter; it can be anywhere.
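To make the "first match wins" rule concrete, here is a small Python model of the ordering question. It is a sketch under assumptions, not mnoGoSearch's actual matching code: the `Server` entry is modelled as a prefix match that allows, the `Realm Disallow Regex` entry as a regular expression that disallows, and a URL matched by no rule is assumed not to be indexed. `Rule` and `is_allowed` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    action: str   # "allow" or "disallow"
    pattern: str  # regular expression, matched from the start of the URL


def is_allowed(url: str, rules: list[Rule]) -> bool:
    """Apply the first rule whose pattern matches; later rules are never consulted."""
    for rule in rules:
        if re.match(rule.pattern, url):
            return rule.action == "allow"
    # Assumption: a URL that no rule matches is not indexed.
    return False


# Order from the thread: the news realm is disallowed before the Server entry allows the site.
rules = [
    Rule("disallow", r"http://www.dmm.local/news/.*"),     # Realm Disallow Regex ...
    Rule("allow", re.escape("http://www.dmm.local/")),     # Server http://www.dmm.local/ (prefix)
]
```

With this order a news URL is rejected by the first rule; if the two rules are swapped, the broad `Server` prefix matches first and the news pages slip through, which is why the "Disallow" has to come first.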

> I also tried tt_news as the last one. The indexing records are set up as described in the manual.
> 
> I use tt_news 3.0.0, but this shouldn't matter IMHO. I'm also running TYPO3 4.3-trunk and mnogosearch extension 2.2.0 with a mnogosearch 3.3.8 snapshot.

I did not try tt_news 3.0.0, but there should be no difference. Mnogosearch indexes database table fields, not extensions.

> mnogosearch setup ends with:
>> # uid=10
>> Period 10y
>> Realm Disallow Regex http://www.dmm.local/news/.*
>> # uid=7
>> Period 1h
>> Server http://www.dmm.local/

This is ok.

>> # uid=11
>> HTDBAddr mysql://typo3_dmm:123456@localhost/typo3_dmm/?setnames=utf-8
>> HTDBList "SELECT CONCAT('htdb:/tt_news/11/', uid) FROM tt_news WHERE 1=1 AND tt_news.hidden=0 AND (tt_news.starttime<=1232817840) AND (tt_news.endtime=0 OR tt_news.endtime>1232817840) AND tt_news.deleted=0"
>> HTDBLimit 4096
>> HTDBDoc "SELECT title AS title,datetime AS last_mod_time,bodytext AS body FROM tt_news WHERE uid=$3"
>> Server htdb:/tt_news/

This also looks ok.
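As an illustration of what the HTDBList query above hands to the indexer, the sketch below runs the same visibility conditions (hidden, deleted, starttime, endtime) against a toy in-memory SQLite table. It is only an approximation, assuming SQLite stands in for MySQL: `CONCAT()` becomes the `||` operator, the always-true `1=1` is dropped, and an `ORDER BY` is added so the output is predictable. The table rows and the helper names `make_demo_db` and `htdb_list_urls` are hypothetical; `11` is the value that appears in the generated URLs.

```python
import sqlite3

NOW = 1232817840  # timestamp embedded in the generated HTDBList query

# Same conditions as the HTDBList query, translated to SQLite syntax.
HTDB_LIST_SQL = (
    "SELECT 'htdb:/tt_news/' || ? || '/' || uid FROM tt_news "
    "WHERE hidden=0 AND starttime<=? AND (endtime=0 OR endtime>?) AND deleted=0 "
    "ORDER BY uid"
)


def make_demo_db() -> sqlite3.Connection:
    """Build a toy tt_news table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tt_news (uid INTEGER PRIMARY KEY, title TEXT, "
        "hidden INTEGER, deleted INTEGER, starttime INTEGER, endtime INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tt_news VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Visible", 0, 0, 0, 0),
            (2, "Hidden", 1, 0, 0, 0),
            (3, "Deleted", 0, 1, 0, 0),
            (4, "Starts later", 0, 0, NOW + 3600, 0),
            (5, "Already expired", 0, 0, 0, NOW - 1),
            (6, "Expires later", 0, 0, 0, NOW + 86400),
        ],
    )
    return conn


def htdb_list_urls(conn: sqlite3.Connection, config_uid: int = 11, now: int = NOW) -> list[str]:
    """Return the htdb: URLs the list query would produce."""
    rows = conn.execute(HTDB_LIST_SQL, (config_uid, now, now))
    return [row[0] for row in rows]
```

Only the visible record and the one whose end time is still in the future become URLs; each of those URLs is then fetched separately with the HTDBDoc query.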

> If I manually run this query
>> SELECT CONCAT('htdb:/tt_news/11/', uid) FROM tt_news WHERE 1=1 AND
>> tt_news.hidden=0 AND (tt_news.starttime<=1232817840) AND
>> (tt_news.endtime=0 OR tt_news.endtime>1232817840) AND tt_news.deleted=0
> I get 4096 results. Also,
>> SELECT title AS title,datetime AS last_mod_time,bodytext AS body FROM
>> tt_news WHERE uid=xyz
> works (with xyz = valid uid).
> 
> Running crawler gives the following output:
>> st at st:/var/www/vhosts/dmm$ typo3/cli_dispatch.php mnogosearch -n -v 3
>> indexer from mnogosearch-3.3.8-mysql-DB2-solid-SAPDB-ibase-ctlib-freetds-oracle8-oracle8i started with '/tmp/mnogosearch-Qzj3Ky'
>> [20851]{01} Writing words (0 words, 64 bytes, final).
>> [20851]{01} The words are written successfully. (final)
>> [20851]{01} Done (0 seconds, 0 documents, 0 bytes,  0.00 Kbytes/sec.)
>> indexer from mnogosearch-3.3.8-mysql-DB2-solid-SAPDB-ibase-ctlib-freetds-oracle8-oracle8i started with '/tmp/mnogosearch-Qzj3Ky'
>> [20854]{01} URL: htdb:/tt_news/11/18670
>> {htdb.c:314} Query: SELECT title AS title,datetime AS last_mod_time,bodytext AS body FROM tt_news WHERE uid=18670
>>
>> [20854]{01} No data received

Hmmm. It says "no data received". It appears that it cannot fetch data. What happens if you run this query manually?
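To run exactly the query the indexer ran, you can rebuild it from the URL. Judging by the log above, `$3` in HTDBDoc is replaced by the third path segment of `htdb:/tt_news/11/18670`, i.e. the uid. The sketch below assumes that `$N` generally means the N-th path segment; it is modelled on this log line, not taken from mnoGoSearch's source, and `path_parts`/`expand_htdb_doc` are hypothetical helpers.

```python
from urllib.parse import urlsplit

HTDB_DOC = ("SELECT title AS title,datetime AS last_mod_time,bodytext AS body "
            "FROM tt_news WHERE uid=$3")


def path_parts(url: str) -> list[str]:
    """Split an htdb: URL into its non-empty path segments."""
    return [part for part in urlsplit(url).path.split("/") if part]


def expand_htdb_doc(template: str, url: str) -> str:
    """Replace $1, $2, ... with the matching path segment (assumed convention)."""
    parts = path_parts(url)
    # Substitute the highest numbers first so that "$1" cannot clobber "$10".
    for n in range(len(parts), 0, -1):
        template = template.replace(f"${n}", parts[n - 1])
    return template
```

The expanded string matches the `{htdb.c:314}` line, so it can be pasted into a MySQL client connected to the database named in HTDBAddr to check whether that uid really returns a row.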

> As I don't know anything about mnogosearch internals, I turned on MySQL logging. While indexing, there are no SELECT queries on the tt_news table. If I understood the Records principle right, they should be retrieved directly from the database.

This is correct.

> 3. When are Records indexed? I see that after saving a tt_news record, an entry in tx_mnogosearch_urllog is created and processed at indexing time.
> Are news records only indexed after they are saved (so should I create jobs there for all existing news entries)?
> Even so, newly saved or created news records do not show up in the search results :(

News records are indexed when you run "cli_dispatch.php mnogosearch". The URL log just records which pages/records were modified, so that only those are reindexed (not the whole web site).
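The role of the URL log can be pictured with a toy model. This is a hypothetical sketch, not the extension's code: `ToyIndexer`, `record_change` and `run` are invented names, a plain list stands in for tx_mnogosearch_urllog, and mnoGoSearch's Period-based expiry is ignored. It shows the point made above: records never seen before are picked up when the indexer runs, and the log only decides which already indexed records get refreshed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyIndexer:
    """Hypothetical model of incremental indexing driven by a URL log."""
    index: dict[str, str] = field(default_factory=dict)  # url -> indexed text
    url_log: list[str] = field(default_factory=list)     # stands in for tx_mnogosearch_urllog

    def record_change(self, url: str) -> None:
        # Called when a page or record is saved in the backend.
        if url not in self.url_log:
            self.url_log.append(url)

    def run(self, listed_urls: list[str], fetch: Callable[[str], str]) -> list[str]:
        # New URLs from the list query are always indexed...
        targets = [u for u in listed_urls if u not in self.index]
        # ...while known URLs are reindexed only if they were logged as modified.
        targets += [u for u in self.url_log if u in self.index and u not in targets]
        for url in targets:
            self.index[url] = fetch(url)
        self.url_log.clear()
        return targets
```

In this model there is no need to create log entries for existing news: the first run indexes everything the list query returns, and later runs touch only what was saved in between.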

> Can you give me any hints, where to search?

I think the problem is that "SELECT title AS title,datetime AS last_mod_time,bodytext AS body FROM tt_news WHERE uid=18670" does not return anything. Try running the indexer with "-v 5" to get more information.

-- 
Dmitry Dulepov
TYPO3 core team
"Sometimes they go bad. No one knows why" (Cameron, TSCC, "Dungeons&Dragons")

