[TYPO3-dam] does Indexed search index pdf files with DAM enabled

michael miousse michael.miousse at tc2l.ca
Wed Aug 30 17:06:14 CEST 2006


Hello everybody,

We had the same problem here: we needed a way to fully index a PDF for
extended search, so I've developed an extension that links DAM with
indexed_search.

I haven't finished it yet. I've got some authentication problems with
users in multiple groups when I try to use the macmade loginbox with it.
But if you work with a crawler and without multiple-group users,
everything works fine.

If you want to test it, I could give it to you. That way you could tell me
what you think of it and track bugs for me as well.
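
To picture the idea of full-text indexing DAM files, here is a minimal
sketch in Python. It is not the extension's actual code, only an
illustration under stated assumptions: the text is pulled out of the PDF
with the pdftotext command-line tool (which must be installed), and the
function names, the in-memory index and the numeric record ids standing in
for DAM records are all hypothetical.

```python
import re
import subprocess
from collections import defaultdict


def extract_pdf_text(path):
    """Return the text of a PDF using the pdftotext command-line tool.

    Assumes pdftotext is installed; "-" as the output file sends the
    extracted text to stdout instead of writing a .txt file.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def add_to_index(index, record_id, text):
    """Store every word of `text` under the (hypothetical) record id."""
    for word in tokenize(text):
        index[word].add(record_id)


def search(index, query):
    """Return the ids of records that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Plain strings stand in for the output of extract_pdf_text() here,
# so the example runs without any PDF file.
index = defaultdict(set)
add_to_index(index, 1, "Annual report 2006: budget and forecast")
add_to_index(index, 2, "Budget meeting minutes")
print(sorted(search(index, "budget")))         # [1, 2]
print(sorted(search(index, "budget report")))  # [1]
```

With a real file one would call something like
add_to_index(index, uid, extract_pdf_text(path)) for each document, where
uid and path come from wherever the file records are stored.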

Pascal Voitot wrote:

> Hello,
> 
> Daniel, you are right when you say the indexed_search is not meant for
> what I want to do!
> In fact, I'm trying to use TYPO3 and its DAM as a basic, simple and free
> DMS, without the need to put documents in a DB managed by a full DMS
> architecture (such as Sharepoint, Alfresco etc...), and with very powerful
> features to create FE presentations of the documents. This is not the
> original role of TYPO3, but it appears not to be such a bad idea when I see
> how expensive and complex document management systems are and how basic
> people's needs can be.
> 
> My problem is that the DAM indexes documents with metadata and even
> extracts of the doc, but I also need full-text indexing and search for
> DOC/PDF/PPT/XLS/OO etc...
> 
> The indexed_search is not meant for this at all, for the reason you give:
> it is linked to the FE. And you are right: putting DAM doc information in
> the indexed_search tables would only bring rubbish into the search results!
> 
> So my idea is to create an "indexed_search" specifically linked to the DAM
> and also to think about a real versioning system etc...
> 
> here we are :)
> 
> br
> Pascal
> 
> 
> Quoting Daniel Thomas <dt at dpool.net>:
> 
>> Hi Pascal,
>>
>> > As far as I know, there is no relationship between the DAM and the
>> > indexed search... The DAM only tries to retrieve some metadata from
>> > the files with tools such as catdoc, pdftotext etc...
>> > So you can generally get the header of the file, the language and an
>> > extract of the content. But for PDF, files are often protected against
>> > reading content and the DAM can only retrieve basic information.
>> >
>> > BUT there is no way for the DAM to index a document in the
>> > indexed_search. But I think this should be the logical evolution!!
>> >
>>
>> Could you outline your idea here?
>> What exactly should indexed_search do?
>> So far it is closely wedded to the TYPO3 page or page-hash paradigm.
>> Both the indexing mechanism and the frontend-search functionality are
>> built to support an output format and not a data storage format.
>>
>> Personally, I see no use in fulltext-indexing of DAM records if this
>> is not combined with a major concept change in the way the
>> indexed_search operates or, if it comes to that, a different search
>> engine. In the end the indexed_search plugin would have to evolve,
>> not the DAM.
>>
>> However, I am not sure if this is a good solution for TYPO3. The
>> primary value of the indexed_search is that with its
>> "format-blindness" it integrates easily with virtually every output a
>> plugin could produce. The downside is of course lack of precision and
>> redundancy in search results. If you wanted to have a search engine
>> which is not format-blind, but which would connect indexed
>> information with unique occurrences in the database or file system,
>> both the indexer and the search engine would be much more complex and
>> harder to use.
>>
>> Regards
>>
>> Daniel
>>
>> > br
>> > Pascal
>> >
>> > Quoting Sacha Vorbeck <Vorbeck at moduleBox.com>:
>> >
>> >> Hi,
>> >>
>> >> does anybody know if the indexed search will index PDF files when
>> >> using DAM? Would be interesting to know.
>> >>
>> >> --
>> >> thank you - all the best,
>> >> Sacha
>> >> _______________________________________________
>> >> TYPO3-project-dam mailing list
>> >> TYPO3-project-dam at lists.netfielders.de
>> >> http://lists.netfielders.de/cgi-bin/mailman/listinfo/typo3-project-dam
>> >>
>> >
>> >
>> > --
>> > Pascal Voitot
>> > computer engineer, graduate of ISMRA ENSI de Caen
>> >
>> >
>>
>> --/
>>
>> Daniel Thomas dpool
>>
>> Hinderink und Thomas Partnerschaft IT-Berater und Projektmanager
>>
>> Eduard-Schmid-Str. 9 | D-81541 München
>> t 08945227582 | m 01793918781 | fax 08945227583
>>
>> http://www.dpool.net | http://www.typergy.com
>> http://typo3partner.net
>>
>> /--
>>
>>
> 
> 
> --
> Pascal Voitot
> computer engineer, graduate of ISMRA ENSI de Caen



