
Set up Solr with Moodle to search inside file content

The following is a step-by-step guide to setting up Solr with Moodle so that search also covers the content of files. It involves installing Solr, installing the PHP Solr extension, and configuring the Solr core.

Moodle has a global search feature that lets users search across courses, activities, and other areas. It is a unified search that retrieves data from the database based on the user's search query and context.

Context here means searching either only inside enrolled courses or across all courses.

This topic is divided into the following parts.

What is the need for Solr

Solr offers many benefits, such as

  • fast searching
  • ranking
  • search stats

but the biggest benefit is searching inside files.

Most files are uploaded to the system as PDF, DOC, TXT, PPT, and similar formats, whether as part of an activity, an assignment submission, or elsewhere.

These files are of little use if we cannot search within their content: a 100-page PDF is effectively invisible if it never comes up for a relevant search query.

This is where Solr does its job, returning results quickly by searching inside the file content.

There are alternatives to Solr, such as Elasticsearch, but Solr is available and integrated directly in Moodle.

More about Solr features: https://solr.apache.org/features.html

What is Solr

Solr is an open-source search engine that provides a REST-like API to ingest content and query against it, with results available as JSON.

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called “indexing”) via JSON, XML, CSV or binary over HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary results.
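
To make the index-then-query cycle concrete, here is a minimal Python sketch, not an official client. It only builds the HTTP requests that Solr expects. The address http://localhost:8983/solr, the core name moodle, and the helper names build_update_request and build_select_url are assumptions for illustration. In a real Moodle setup, Moodle itself sends the documents, so you would not normally index by hand.

```python
import json
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address
CORE = "moodle"                           # assumption: the core/collection name


def build_update_request(docs):
    """Return (url, body, headers) for indexing a list of JSON documents."""
    # commit=true asks Solr to make the documents searchable right away.
    url = f"{SOLR_BASE}/{CORE}/update?" + urlencode({"commit": "true"})
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers


def build_select_url(query, rows=10):
    """Return the HTTP GET URL for a search, asking for JSON output."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{CORE}/select?" + urlencode(params)
```

The update tuple can be sent as a POST with urllib.request.Request(url, data=body, headers=headers), and the select URL can be opened with urllib.request.urlopen or pasted into a browser.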

More about Solr: https://solr.apache.org/

How to set up Solr

Solr can be set up on any Linux, Windows, or macOS server.

It runs independently, either on the same server as Moodle or on a separate one.

There are many tutorials for installing Solr on specific platforms, for example:

https://www.vultr.com/docs/install-apache-solr-on-ubuntu-20-04/

Remember this:

  • During setup on Linux, you will create a solr user, and Solr should be run as that user only. Switch to that user with the sudo su command before doing Solr-related operations, to avoid permission issues.
  • Once Solr is set up and you can access the Solr home page from the web, create a collection. A collection is a logical area where data is ingested and operations are performed.
su - solr -c "/opt/solr/bin/solr create -c moodle"
  • Apart from Solr, you need the PHP Solr extension on the Moodle server. It is not available for direct installation. You can install it through PEAR or PECL, and depending on the version you may need to build the extension from a zip archive.
  • The following commands can help on a CentOS-based system:
yum install php-pear php-pecl php-devel curl-devel zlib-devel pcre-devel gcc libxml2-devel
pecl install solr

Remember:
You need to create a solr.ini file under /etc/php.d and add this line:
extension=solr.so
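
Before moving on to Moodle, it can help to confirm that the moodle core created above is actually reachable. The following Python sketch is one way to do that through Solr's CoreAdmin STATUS action. It assumes Solr runs in standalone (non-cloud) mode at http://localhost:8983/solr, and the function names are hypothetical helpers, not part of Solr or Moodle.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DEFAULT_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def core_status_url(core, base=DEFAULT_BASE):
    """Build the CoreAdmin STATUS URL for a single core."""
    params = {"action": "STATUS", "core": core, "wt": "json"}
    return f"{base}/admin/cores?" + urlencode(params)


def core_exists(status_response, core):
    """True when the STATUS response holds a non-empty entry for the core."""
    # For an unknown core, Solr returns an empty object under its name.
    return bool(status_response.get("status", {}).get(core))


def check_core(core="moodle", base=DEFAULT_BASE):
    """Ask the running Solr server whether the core is loaded (needs network access)."""
    with urlopen(core_status_url(core, base), timeout=10) as resp:
        return core_exists(json.load(resp), core)
```

Calling check_core() on the Solr server should return True once the create command has succeeded. If it raises a connection error, Solr is not running or not listening on the assumed address.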

How to set it up with Moodle

By now, you have

  • set up Solr
  • created a basic collection named moodle
  • set up the PHP Solr extension

Next comes the integration part:

  • Log in to Moodle as a site administrator
  • Enable global search under Site administration > Advanced features
  • Go to Administration > Plugins > Search > Manage global search
  • Select Solr as the search engine
  • Configure Solr under Administration > Plugins > Search > Solr
  • Enable the file indexing option and set the maximum file size; 0 means unlimited.

If the dependencies are met, it will show a screen like this

  • Once the first three items show Yes, click Index data. This ingests the data into Solr and shows a screen like this.
  • You can then enable the global search block and try a query.

If you do not get any results, try querying Solr directly.
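
A direct query against Solr could look like the following Python sketch. It assumes Solr listens on http://localhost:8983/solr and the core is named moodle. The id and title field names are assumptions about the schema Moodle creates, so adjust them to whatever your documents actually contain. search_core and summarize are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_core(query="*:*", core="moodle", base="http://localhost:8983/solr", rows=5):
    """Run a query against the core's /select handler (needs network access)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    with urlopen(f"{base}/{core}/select?{params}", timeout=10) as resp:
        return json.load(resp)


def summarize(result):
    """Return (total hits, [(id, title), ...]) from a parsed /select response."""
    response = result.get("response", {})
    docs = response.get("docs", [])
    # "id" and "title" are assumed field names; missing fields become None.
    return response.get("numFound", 0), [(d.get("id"), d.get("title")) for d in docs]
```

Running summarize(search_core("*:*")) shows how many documents the core holds. A count of 0 suggests that indexing from Moodle has not reached Solr at all, while a non-zero count with no hits for your search term points at the query or the indexed fields instead.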

Even at this point, file content will not be searchable yet.

Making file content searchable

File content is not searchable if you created the core from the default configuration. To make it searchable, we need to apply some changes on the Solr side.

Add the following to the core's solrconfig.xml, at around line 70:

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar" />

<lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />

Add the following to the same file, at around line 850:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
  • Restart the Solr service
  • Re-index the data from Moodle using the Index data page
  • Now you can query from Solr or from global search and, if all goes well, the results will be displayed
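
If file content still does not show up, one way to check the /update/extract handler on its own is to send it a file with extractOnly=true, which asks Solr to return the extracted text without indexing anything. The Python sketch below assumes Solr at http://localhost:8983/solr, a core named moodle, and a local PDF file. extract_url and extract_text are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DEFAULT_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def extract_url(core="moodle", base=DEFAULT_BASE):
    """Build the extraction URL; extractOnly=true returns text without indexing."""
    params = urlencode({"extractOnly": "true", "wt": "json"})
    return f"{base}/{core}/update/extract?{params}"


def extract_text(path, content_type="application/pdf", core="moodle", base=DEFAULT_BASE):
    """POST a local file to the extract handler and return the parsed JSON reply."""
    with open(path, "rb") as fh:
        data = fh.read()
    req = Request(
        extract_url(core, base),
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Extraction of large files can be slow, hence the generous timeout.
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

A successful call to extract_text("sample.pdf") suggests that the extraction libraries and the request handler are loaded correctly. An HTTP 404 or 500 error usually means the handler or the lib entries in solrconfig.xml were not picked up, or that Solr was not restarted.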

Moodle Search result

Solr Search Result