Hi Olivier,

> how do you extract the pdf? tika or cc_text?

EXT:Tika (latest forge rev) with the standalone jar (latest as of last week), pushing content to a remote Solr instance.

The site and filesystem run in UTF-8; the PDF/Office documents use mixed encodings.

Chris