[TYPO3-english] Tesseract project and the Google Query data provider

Søren Madsen sma at science.au.dk
Thu Oct 21 15:18:43 CEST 2010


Dear mailing list,

I promised the Tesseract guys to continue our discussion of various Tesseract questions on the mailing list, so that I, and the TYPO3 community as well, might learn something about this cool suite of extensions. The discussion and questions below are about using the Google Query Data Provider in particular, and if anyone besides Francois and Roberto can help me, please feel free to do so. Any help and enlightenment is highly appreciated! :)

--

On 20/10/2010, at 16.00, Roberto Presedo wrote:

> Hi Søren,
> 
> Here are some answers to your questions...
> 
> 1) For the caching issue, this is indeed related to session. Tesseract
> keeps filter's information in the session in order to reuse the filter
> when browsing page results. If you want to clear the session's
> information, you can use the clear_cache parameter in the url (either
> by GET or POST) example : &clear_cache=1

OK, that's clear. I tried your other suggestions, but as I wrote to you, they clashed with pagebrowse: the query parameter is not carried over into the page browser's URL parameters, so no q parameter is sent and the cache is cleared.
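
To spell out what I mean, here is a minimal Python sketch (it is not Tesseract or pagebrowse code). The /search path and the search_url helper are made up for illustration; only q, clear_cache and tx_displaycontroller[page] come from this thread. It adds clear_cache=1 only when a new search is submitted, so page-browser links would leave the session filter alone.

```python
from urllib.parse import urlencode


def search_url(base, query, page=None, new_search=False):
    """Build a search URL (hypothetical helper, for illustration only)."""
    params = {"q": query}
    if page is not None:
        # Page parameter name as given for the pagebrowse extension.
        params["tx_displaycontroller[page]"] = page
    if new_search:
        # Reset the filter stored in the session only for a fresh search.
        params["clear_cache"] = 1
    return base + "?" + urlencode(params)


# A new search resets the session filter ...
first = search_url("/search", "physics", new_search=True)
# ... while a page-browser link keeps both the query and the filter.
next_page = search_url("/search", "physics", page=1)
```

In this sketch the page-browser link still carries q, which is exactly what I cannot get pagebrowse to do at the moment.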

> 2) Synonyms and Keymatches are available by mapping the googleSynonymes
> or googleKeymatches, which both contain a "label" and a "link" field.
> This information is only available if the query returns a synonym or a
> keymatch.

Yes, I get that, but I need to place that inside the googleInfos loop, right? But then it is output on every iteration of the loop.
I tried looping googleKeymatches (both inside and outside the googleInfos loop), but that returns nothing.

I obviously have some HTML elements containing my keymatches and/or synonyms, and I only want to output them if there are matching results. How do I do this?


Here is my current template for reference:

<div class="au_gsa">
<!--LOOP(googleInfos)-->
<div class="au_gsa_keymatch"><a href="###FIELD.keylink###">###FIELD.keylabel###</a></div>
<!--IF ( ###COUNTER(googleInfos)### == 0 ) -->
<p>Side ###RECORD_OFFSET### af <strong>###TOTAL_RECORDS###</strong> fundne resultater</p>
<p>###PAGE_BROWSER###</p>
<ul>
<!--ENDIF-->
  <li>
      <span class="title"><a href="###FIELD.url###">###FIELD.title###</a></span><br />
      <span class="snippet">###FIELD.snippet###</span><br/>
      <span class="url"><a href="###FIELD.url###">FUNCTION:parse_url(###FIELD.lit_url###,1)</a> (###FIELD.pagelang###) - ###FIELD.Author### - ###FIELD.DC-date###</span><br/>
  </li>
<!--IF ( ###COUNTER(googleInfos)### == ( ###TOTAL_RECORDS### - 1 ) ) -->
</ul>
<!--ENDIF-->
<!--ENDLOOP-->
###PAGE_BROWSER###
</div>
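
The behaviour I am after, sketched in Python rather than in the template language, since I don't know how to express it there. The render function and the plain dictionaries are assumptions for illustration; the keys (keylink, keylabel, url, title, snippet) simply mirror the markers in my template. Keymatches are output once, before the result list, and only if there are any; the snippet span is skipped when the snippet is empty.

```python
from html import escape


def render(keymatches, results):
    """Render the result block (illustrative only, not Tesseract code)."""
    parts = ['<div class="au_gsa">']
    # Keymatches go outside the result loop, so each one appears only once.
    for km in keymatches:
        parts.append(
            '<div class="au_gsa_keymatch"><a href="%s">%s</a></div>'
            % (escape(km["keylink"], quote=True), escape(km["keylabel"]))
        )
    if results:
        parts.append("<ul>")
        for r in results:
            item = '<li><span class="title"><a href="%s">%s</a></span>' % (
                escape(r["url"], quote=True),
                escape(r["title"]),
            )
            # Only emit the snippet markup when there is a snippet.
            if r.get("snippet"):
                item += '<br /><span class="snippet">%s</span>' % escape(r["snippet"])
            item += "</li>"
            parts.append(item)
        parts.append("</ul>")
    parts.append("</div>")
    return "\n".join(parts)
```

Calling render([], results) leaves out the keymatch div entirely, which is what I cannot achieve with the template above.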

> 3) To set up proper paging of results, you must first have the
> "pagebrowse" extension installed, and then configure the datafilter. A
> typical configuration would be
> "Max items per view" = 10
> and
> "Start at page (offset)" = "vars:page" (this is the value of the
> "tx_displaycontroller[page]" parameter provided by the "pagebrowse"
> extension).

vars:page - Thanks! 

This re-introduces an old problem we have with the pagebrowse extension, though. We have a lot of websites for which we want to provide search individually via collections, but on the same page we also wish to show results from the whole University. Therefore we have at least two search listings on the same page, but the pagebrowse extension can't distinguish between the two, so when I hit "Next" for one listing, the other listing also gets "Nexted" :)
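
A small Python sketch of the clash, only to make the problem concrete. The page_for helper and the per-listing parameter names local[page] and university[page] are hypothetical; as far as I can tell pagebrowse offers nothing like them today. I also assume that a missing page parameter means page 0.

```python
from urllib.parse import parse_qs


def page_for(query_string, param):
    """Return the page number stored under param, or 0 if it is absent."""
    values = parse_qs(query_string).get(param)
    return int(values[0]) if values else 0


# Today: both listings read the same parameter, so they move together.
shared = "tx_displaycontroller[page]=2"
local_page = page_for(shared, "tx_displaycontroller[page]")
university_page = page_for(shared, "tx_displaycontroller[page]")

# Wished for: one (hypothetical) parameter per listing.
separate = "local[page]=2&university[page]=0"
local_only = page_for(separate, "local[page]")
university_only = page_for(separate, "university[page]")
```

With a shared parameter both listings end up on page 2; with one parameter per listing, only the listing I clicked "Next" on would move.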

> 4) and 5) By passing a "debug=1" parameter to a page containing a
> tesseract element, a table containing data structure information will
> be displayed (you must be logged in BE to see that table). Using this,
> you'll be able to see what kind of information is available for each
> loop, and also how metadata is stored in the googlequery provider
> (TIP: take a close look at the record table)

I still don't get how to output a given marker only if it contains data. Some snippets, e.g., are empty, so I don't want to output the empty containing HTML.

I also don't get how to map the meta tags.

The record table (and the XML) reveals the following example meta tags:
DC.Title persons
DC.Language da
DC.Date 2010-09-28T20:41:40+02:00
viewport width=1000;
rating general
DC.Type text/html

So I've tried to enter Author, generator, rating etc. in "Selected meta tags" in the DP, and these are available to me in the template controller, but they don't output anything.
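
To check my own understanding, here is a minimal Python sketch of how I believe the meta tags arrive from the GSA. The assumption, which I have not verified against the googlequery provider, is that each result carries MT elements with an N (name) and a V (value) attribute. The sample XML is hand-written from the record table above, the URL is invented, and meta_tags and select are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; assumes meta tags come as <MT N="name" V="value"/>.
SAMPLE_RESULT = """<R N="1">
  <U>http://www.example.org/persons</U>
  <T>persons</T>
  <MT N="DC.Title" V="persons"/>
  <MT N="DC.Language" V="da"/>
  <MT N="DC.Date" V="2010-09-28T20:41:40+02:00"/>
  <MT N="rating" V="general"/>
  <MT N="DC.Type" V="text/html"/>
</R>"""


def meta_tags(result_xml):
    """Map meta tag names to values for a single result element."""
    root = ET.fromstring(result_xml)
    return {mt.get("N"): mt.get("V") for mt in root.findall("MT")}


def select(tags, wanted):
    """Pick the wanted tags; a name that is absent yields an empty string."""
    return {name: tags.get(name, "") for name in wanted}


tags = meta_tags(SAMPLE_RESULT)
selected = select(tags, ["Author", "generator", "rating"])
```

In this sketch Author and generator come back empty simply because the result does not carry them, and a name such as DC-date would not match DC.Date. That would not explain everything, though: rating is present in the record table and still gives me no output.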


> Then, regarding the features provided by GSA, here is what I can say :
> 
> 1) Clusters (The "Narrow your search" part)
> This is provided by a Javascript included in the GSA. I didn't find a
> way to get that information in the XML output provided by the GSA.
> Maybe there is, but I couldn't find it.

Both clusters and dynamic search suggestions (which I haven't mentioned earlier) are provided by JavaScript, so perhaps we could figure out a way to interact with the GSA directly?

> 2) Spelling suggestions for searches, 3) Sort by date / sort by
> relevance, 4) Display file types and 5) Cached version link
> All those features are currently not supported by the googlequery
> provider, but this can easily be added in a future version.

So they are possible, great. Are these features on a roadmap, or is this something we would need to sponsor the development of?

> I hope this helps...

It most certainly did. Thanks for your time!

Regards,
Søren Madsen

> 
> _______________________________
> Roberto PRESEDO
> 
> 
> 2010/10/19 Søren Madsen <sma at science.au.dk>
>> 
>> Dear Roberto,
>> Thank you so much for helping me explore how we could use our Google Search Appliance box with your Tesseract concept. Having complete control over the look and feel of search results, and having these natively in our CMS, is very, very appealing.
>> As I told you earlier in our correspondence, this may very well be something that we'd like to use in production at Aarhus University. But before I can recommend to my superiors and coworkers whether we should pursue this direction, I have some more questions and inquiries.
>> Questions
>> 1) I still haven't heard from you regarding the caching issues I'm experiencing (is this session related?). I'm experiencing this both in my local 4.5-dev installation, and on a 4.3.7 installation.
>> 2) How are Synonyms and Keymatches mapped? In separate loops as well? I tried different things, but to no avail.
>> 3) Could you point me in the right direction with regard to setting up proper paging of results?
>> 4) I tried mapping meta tags, but I don't get any output from these. How are they supposed to work?
>> 5) Generally: how do I check whether a given marker contains data, and only output it if data is available? If loops cannot be nested, I don't see how I can do this.
>> Inquiries
>> The GSA provides us with several helpful tools - such as spelling suggestions etc. – and we'd obviously like to provide these tools to our users.
>> So – are the following GSA concepts accessible with the Tesseract concept (or could this be developed)?
>> 1) Clusters (The "Narrow your search" part)
>> 2) Spelling suggestions for searches
>> 3) Sort by date / sort by relevance
>> 4) Display file types
>> 5) Cached version link
>> Thanks in advance for your time – and for a very interesting project!
>> Kind regards,
>> 
>> 
>> Søren Madsen
>> Communications Officer





