How to Do Full-Text File Searches on Lightweight Linux Distros

As some of you probably know, KDE comes with Nepomuk, which among other things lets us find files or programs very easily: just start typing the name and they appear. Unity and GNOME offer something similar. With a few adjustments, some of them can even search inside the files (a "full-text search"). Anyone who has used Windows 7 will also know what I'm talking about: just start typing a word to bring up related files or programs.

On lighter distributions this is a bit harder to achieve. But the method I am going to show you is VERY light (as befits distros of this type) and effective.

Choose the launcher: dmenu

My first choice was to try launchers that don't depend on a particular environment or distro: Synapse (which is in fashion right now), Gnome-Do, Kupfer, etc. They all share one limitation: they cannot perform full-text searches (that is, searches inside files). In addition, they come with a lot of plugins that are of little use to me, and they are not minimalist or light enough.

Those who use Openbox, Enlightenment or similar probably know dmenu. If you have never used it, I suggest you visit this old post, where its main characteristics are explained. In short, it is an ultra-minimalist and super-light application launcher. But that is not all: what I did not know is that, configured correctly, it can also display the items of any list we pass to it. This discovery opens the door to many possibilities...
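To make the "any list" idea concrete, here is a minimal sketch. It relies only on the fact that dmenu reads one item per line from standard input and prints the selected one; the function name pick_from_list and the MENU override variable are made up for this example.

```shell
#!/bin/sh
# Sketch: show arbitrary items in a menu and print the one the user picks.
# MENU is a hypothetical override for this example; it defaults to dmenu.
pick_from_list() {
    # dmenu only needs one item per line on its standard input.
    printf '%s\n' "$@" | ${MENU:-dmenu}
}
```

Calling, for instance, pick_from_list Documents Music Videos opens dmenu with those three entries and prints whichever one you select.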

To install dmenu on Arch, just open a terminal and run:

sudo pacman -S dmenu

Install Recoll

The second discovery was Recoll. Our friend Fico wrote about it a few months ago, in an article I recommend reading.

Recoll is a very light tool, independent of any desktop environment, that lets you run full-text searches. Obviously, it needs to index the files first, which may take a while, but once the initial indexing is done, later updates don't take long.

Recoll has a graphical interface that is easy to use and has many options. It is built with Qt and based on the Xapian search engine.

Are you still using locate, find or catfish? Ha! Keep reading...

To install Recoll on Arch and derivatives:

yaourt -S recoll

You will notice that recoll has a bunch of packages as optional dependencies:

  • libxslt: for XML based formats (fb2, etc)
  • unzip: for the OpenOffice.org documents
  • xpdf: for pdf
  • pstotext: for postscript
  • antiword: for msword
  • catdoc: for ms excel and powerpoint
  • unrtf: for RTF
  • untex: for dvi support with dvips
  • djvulibre: for djvu
  • id3lib: for mp3 tags support with id3info
  • python2: for using some filters
  • mutagen: Audio metadata
  • python2-pychm: CHM files
  • perl-image-exiftool: EXIF data from raw files
  • aspell-en: English stemming support

Installing these packages lets Recoll index the contents of the corresponding file types. For example, antiword lets Recoll index the contents of .doc files.

Which additional components to install depends on your needs and on the variety of file types stored on your computer. But don't despair: after indexing your files, Recoll will recommend which components to install to make the index more complete.
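If you would rather check from a terminal which helper programs are already present, something like the following sketch may help. The function name missing_helpers is invented for this example, and the command names are the executables I would expect those packages to provide (pdftotext from xpdf, djvutxt from djvulibre, id3info from id3lib); adjust the list to your distro.

```shell
#!/bin/sh
# missing_helpers NAME...: print each command name that is not found on PATH.
missing_helpers() {
    for helper in "$@"; do
        command -v "$helper" >/dev/null 2>&1 || printf '%s\n' "$helper"
    done
}

# Executables expected from some of the optional packages listed above.
missing_helpers antiword catdoc unrtf untex pstotext unzip pdftotext djvutxt id3info
```

Every name it prints is a helper that is not installed yet; an empty output means all of them were found.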

How to use Recoll

When you start Recoll for the first time, the screen shown below will appear. If you only want to index your whole HOME, just click on Start indexing now.

Recoll home screen

Recoll has powerful search facilities. Besides plain word searches, it supports Boolean searches with proximity clauses and filtering by file type or location. It also supports Xesam-compatible searches, searches by field and filtering by date.

The program is also surprisingly fast at running searches and presenting results. The way it presents them is interesting too: it ranks the most relevant documents for the given search terms and includes a preview.

In the image below, I decided to show the results in a table, although Recoll comes by default with another style to show the results, much more complete and descriptive.

Results of a search in Recoll

To see which packages are missing for Recoll to fully index your files, just go to File > Show Missing Helpers.

Missing additional components

In Preferences > Indexing Schedule you can configure the file indexing schedule. Obviously, for Recoll to work well it needs to index all your files (or at least those in the folder you care about, usually your HOME). There are 3 alternatives: indexing by hand (my preferred one), indexing through cron, or indexing at system boot.
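For the cron alternative, the entry could look roughly like the line this snippet prints. It assumes the indexer is invoked as recollindex with no arguments (an incremental update) and that a daily run at 03:00 suits you; cron_entry is a made-up helper name used only to show the layout of the line.

```shell
#!/bin/sh
# cron_entry HOUR MINUTE: print a crontab line that runs recollindex once a day.
cron_entry() {
    # crontab fields: minute hour day-of-month month day-of-week command
    printf '%s %s * * * recollindex\n' "$2" "$1"
}

cron_entry 3 0
# prints: 0 3 * * * recollindex
```

The printed line can be pasted into your personal crontab with crontab -e. For indexing by hand, running recollindex in a terminal should have the same effect as starting the update from the GUI.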

Indexing schedule in Recoll

Magic: combining Recoll and dmenu… is it possible?

Yes, yes it is. The trick is knowing that dmenu can list not only applications but anything we pass to it. You just have to figure out how to search Recoll from a terminal and pass the results to dmenu.
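As a rough illustration of the idea (this is my own sketch, not the script by Massimo Lauria discussed in this post), the pieces could fit together like this. It assumes that recoll -t -b -q runs a query in text mode and prints the result URLs one per line, and that those URLs are plain file:// paths without percent-encoding. MENU, RECOLL and OPENER are hypothetical override variables, and the function names are invented.

```shell
#!/bin/sh
# Sketch of a Recoll + dmenu search. MENU, RECOLL and OPENER are hypothetical
# override variables added for this example.

# strip_file_scheme: keep only "file://" lines and turn them into plain paths.
strip_file_scheme() {
    sed -n 's|^file://||p'
}

# search_recoll QUERY: print one matching path per line.
# Assumes "recoll -t -b -q" prints bare result URLs in text mode.
search_recoll() {
    ${RECOLL:-recoll} -t -b -q "$1" 2>/dev/null | strip_file_scheme
}

# search_and_open: ask for a query, list the hits, open the chosen file.
search_and_open() {
    # An empty dmenu (no items on stdin) works as a simple text prompt.
    query=$(${MENU:-dmenu} -p "Search:" </dev/null) || return 0
    [ -n "$query" ] || return 0
    choice=$(search_recoll "$query" | ${MENU:-dmenu} -l 15 -p "Results:") || return 0
    [ -n "$choice" ] && ${OPENER:-xdg-open} "$choice"
}
```

Adding a final line that calls search_and_open turns this into a script you can bind to a key. The -l 15 option asks dmenu for a vertical list, which tends to be easier to read for long file paths.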

The magic is achieved, how could it be otherwise, through a simple script written by Massimo Lauria, which I dared to modify slightly to translate it into Spanish.

Download script

Save the file (say, as search-recoll.sh). Grant it execute permissions (sudo chmod +x search-recoll.sh) and assign it an appropriate key combination. In Openbox, this is done by editing the file ~/.config/openbox/rc.xml or through the graphical tool obkey.

The bottom line: full-text searches using very few resources. As the Bambino Veira would say: "Beauty!"

dmenu when entering search text

dmenu, showing the results returned by recoll

Bonus

Those using Ubuntu can get similar results through Recoll's Lens. To do this, it is necessary to add the corresponding PPA and install the following packages:

sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
sudo apt-get update
sudo apt-get install recoll
sudo apt-get install recoll-lens

19 comments

  1.   elav said

    Simply great U_U

  2.   let's use linux said

    That's right… 🙂 And it's MUCH faster and lighter than your beloved KDE… haha!

    1.    elav said

      It may be, but I wouldn't trade Dolphin's integrated search for anything. 😉

      1.    let's use linux said

        Ah ... yes ... Dolphin is something else ... that's serious stuff.

  3.   AlonsoSanti14 said

    And in GNOME, how do I configure it so that it does the "full-text search" you describe?
    I hope you can help me, because I would really like to be able to search for documents that way.

    1.    let's use linux said

      I don't use GNOME, but if I remember correctly it comes with a tool called Tracker that can be used to do full-text searches.
      Cheers! Paul.

      1.    AlonsoSanti14 said

        ok thanks, right now I'm looking for info about Tracker.

  4.   gonzalezmd (# Bik'it Bolom #) said

    It is good to know these solutions. Thank you.

    1.    let's use linux said

      To you, for commenting. 😉

  5.   maximi89 said

    In my case I know something very simple that is in all distros: run «updatedb» and then «locate file». It is very easy and very light, hahaha

    1.    eliotime3000 said

      Yes, but sometimes going the long way is fun.

    2.    let's use linux said

      That is not right. With locate and updatedb it is not possible to search full text.
      Cheers! Paul

  6.   gonza_212 said

    Very good post, interesting information ... I'm testing it.
    I went to the page to download the script you linked at the end, but I get an error. I would appreciate it if you could upload it again.

    Thank you very much, greetings!

    : )

    1.    let's use linux said

      It works well. Try again ...

      1.    gonza_212 said

        Thank you very much, now I can download it.

        Regards!

        : )

  7.   gonza_212 said

    Sorry, but I've tried it on my computer and it doesn't work for me. I have ArchLinux with the PekWM window manager (without a desktop environment)… but it seems that the script does not work. Could somebody help me?

    Thank you very much, greetings!

  8.   let's use linux said

    Could you specify a little better what is not working for you?

    1.    gonza_212 said

      The truth is that I don't know if it will be running ... in the PekWM manager there is a file called «keys» (found in the /home/usuario/.pekwm/ directory) where the hotkeys (or keyboard shortcuts) are configured, so I assigned the script to the combination Ctrl + F, but I don't know if the syntax of the command to execute it will be correct.

      I show you how the line corresponding to that combination of keys is written:

      KeyPress = "Ctrl F" { Actions = "Exec `sh search-recoll.sh`" }

      note: the search-recoll.sh script is in my home, that is, in /home/myuser/

      But when pressing Ctrl + F nothing happens ... I tried modifying the line so that it executes dmenu instead of the script and it works.

      Another thing I did was run said script in the terminal, and when I did it it showed me the following:

      $ sh search-recoll.sh
      search-recoll.sh: line 39: syntax error: unexpected end of file

  9.   gonza_212 said

    Sorry for the trouble, I have already solved the problem. What happened was that I downloaded the script as a file from the paste, and the file you get that way has an encoding problem. So what you need to do is copy all the content and paste it into an empty file to avoid the issue.

    A thousand apologies, thank you very much anyway.

    Regards!