It sounds painful and it kind of was, but progress has been made in the quest to open our digital repository data and that progress involves Solr. To wit, there are a few things I have learned recently:
- I can actually figure out a lot more with Solr than I thought
- Trying to implement a Solr plugin when you are somewhat challenged to make Solr run and index records in the first place is a clear indication of one’s willfulness
- The following movies are really good for taking breaks: Fierce Creatures, The American, the new Muppets movie, and Monuments Men (breaks are important and George Clooney seems to help too!)
The latest chapter of my story begins in Bloomington and ends in Amsterdam, with a few trips around the sun in between. Let’s begin by talking about Solr because we all know what happened when I contemplated opening up the data directly from Fedora.
If you want to grab a version of Solr and give it a whirl, nothing is easier than hitting the Solr download page, unzipping/un-tarring the latest version, and running the magic “java -jar start.jar” command in the magic “example” directory. On a Mac or Linux box, of course (who knows what happens on Windows). You can even index some sample records with a different java command, and the admin interface gives you search results and it just works. Magic! Everything’s great. Then, you want to do something more. Like try this Solr install with a giant index of actual data and then see if you can add a plugin for OAI feeds. What could go wrong?
Problem: Giant index of data is for a different Solr version than the one you installed
Problem: OAI plugin requires a different Solr version (3.4.0) than the one you installed (4.7.0) and the one where you got the giant index of data (1.4.1)
Problem: Even if you could get that giant index of data working on a Solr version that would also work for the OAI plugin, you can’t figure out where to put the OAI plugin’s binary (.jar file) because the magic install you used works with Jetty, a self-contained Java server that mixes everything up between the Java code and the Solr code
We next embark on installing Solr to run on Tomcat, because that’s how I give up on things.
I had a giant index of data from Solr 1.4.1 and the oai4solr plugin needed Solr 3.4.0 at least, so 3.4.0 is what I chose and I backed off from using the giant index of data. Instead, I found the pieces we use to index certain types of MODS records from our Fedora digital repository into Solr 1.4.1 and, after copying the important pieces from the Solr 1.4.1 config (I hope), I’ve managed to recreate what could be our Solr index from Fedora using 3.4.0 – with 4 records of data.
Essentially, for those who like pictures, I was dealing with one version of Solr that had an index, like a piece of bread with a pat of butter on it.
Then I had another version of Solr that also had an index, like a broom and a dustpan.
What I was initially trying to do was butter my broom, or apply an index from one version of Solr to a different version of Solr. This is not a thing, apparently.
Once Solr was installed on Tomcat, all .jar files had a single place to go and I knew where to put the oai4solr.jar file. But OAI feeds don’t just happen. They are a specialized XML wrapper around records in more commonly recognizable metadata formats (Dublin Core, for example), and they are called up using a specific URL with specific parameters. So stuff has to be programmed (.jar file) but stuff also has to be configured, and I wasn’t configuring anything well in the configuration files that accompanied the .jar file.
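For anyone wondering what “a specific URL with specific parameters” looks like, here is a minimal sketch in Python. The `verb` and `metadataPrefix` arguments and the `oai_dc` prefix come from the OAI-PMH protocol itself; the base URL and the `oai_url` helper are my own made-up placeholders, and where the OAI handler actually lives depends on how your Solr and plugin are configured.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real path depends on your Tomcat/Solr/plugin setup.
BASE_URL = "http://localhost:8080/solr/oai"


def oai_url(verb, **params):
    """Build an OAI-PMH request URL for a verb plus any extra arguments."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Ask the feed which sets (collections) it offers.
list_sets = oai_url("ListSets")

# Ask for records in Dublin Core; "oai_dc" is the prefix every OAI-PMH feed must support.
list_records = oai_url("ListRecords", metadataPrefix="oai_dc")

print(list_sets)
print(list_records)
```

Paste either URL into a browser and, if the plugin is configured correctly, you get XML back instead of a Java error message.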
Spending your Sunday afternoon with Java error messages on your localhost server sucks. In the end, clearer minds prevailed (Cliffster) and I was convinced that it was reasonable to email the guy who wrote the oai4solr plugin and put it on GitHub in the first place. In Amsterdam. On a Sunday night.
He answered. From Amsterdam. On a Sunday night. Lucien van Wouw from the International Institute of Social History, the author of oai4solr, helped me fix up my plugin configuration and I had a working oai4solr plugin on Monday morning. I now have OAI feeds from my Solr index returning sets and record lists in DC and MODS. (I’m not exactly sure how to say his last name, but I’m going with Wow! – including exclamation point – because that was awesome.)
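To give a sense of what “returning sets” means, here is a rough sketch of reading a ListSets response with Python’s standard library. The XML namespace and the `ListSets`/`set`/`setSpec`/`setName` element names come from the OAI-PMH specification; the sample response, its set names, and the `list_sets` helper are invented for illustration and are not output from my actual feed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented, trimmed-down sample response (a real one also carries
# responseDate and request elements).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>photos</setSpec><setName>Photographs</setName></set>
    <set><setSpec>letters</setSpec><setName>Letters</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text):
    """Return (setSpec, setName) pairs from an OAI-PMH ListSets response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.findall("oai:ListSets/oai:set", NS):
        spec = node.findtext("oai:setSpec", namespaces=NS)
        name = node.findtext("oai:setName", namespaces=NS)
        pairs.append((spec, name))
    return pairs


for spec, name in list_sets(SAMPLE):
    print(spec, "-", name)
```

A ListRecords response has the same outer wrapper, with each record’s Dublin Core or MODS metadata tucked inside it.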
So now I have a different Solr version with a super-small number of records, possibly configured the same as the original Solr index, definitely with a couple new things that need to be included when indexing Fedora records into Solr, and absolutely with other collections in Fedora that still need to receive this mapping treatment so they can have descriptive information and be included in our Solr index. But most importantly, there is a way forward to open our data with OAI feeds. I banged my head against the Sun and kinda made something happen!