Discoveries about Discovery – moving panes, feeding solr, and going public about google

A couple of reactions have come forward on jamun, our “search the web site and everything else” project. For the most part, the comments have been quite positive, but some significant modifications have also been suggested.

[Image: jamun - sample search]

The “pane” approach generally seems to work well, but jamun may overdo it. Although we adjust for the number of results in a pane, we don’t attempt to calculate font metrics or other measures that would keep the panes proportional to each other. This often leads to the kind of displays shown at the left. Even with large monitors, the reality is that the columns stretch in a way that sometimes clobbers the panes at the bottom.
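One way to keep the panes in a column proportional, without going as far as real font metrics, is to estimate a height per pane from its result count and scale the whole column down when it overflows. This is only a rough sketch and not what jamun currently does; the function name and the pixel values (`row_height`, `header_height`) are assumptions for illustration.

```python
def pane_heights(result_counts, column_height, row_height=40, header_height=30):
    """Estimate a pixel height for each pane in one column.

    result_counts: number of results shown in each pane, top to bottom.
    column_height: pixels available for the whole column.
    row_height / header_height: guessed pixel sizes, standing in for font metrics.
    """
    # Height each pane would like: a header plus one row per result
    # (an empty pane still gets one row for a "no results" message).
    wanted = [header_height + max(n, 1) * row_height for n in result_counts]
    total = sum(wanted)
    if total <= column_height:
        return wanted
    # Too tall: shrink every pane by the same factor so they stay
    # proportional and the bottom pane is not pushed off the column.
    scale = column_height / total
    return [int(h * scale) for h in wanted]


print(pane_heights([2, 3], 1000))
print(pane_heights([10, 10], 430))
```

A pane that has been scaled down would need its own scrolling or a “more” link, which this sketch leaves out.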

There is an interesting discussion on the NGC Mailing List about the next generation of discovery tools, and David Walker made a plug for the NCSU model that we have adopted. Steven Morris offered a rationale that hit a lot of bases for what we want to accomplish:

“Although the search tool can often provide immediate gratification by way of the abbreviated result sets for each silo, another objective is stealth instruction:  using taste results to help the user navigate our silos and decide which to dive into for further searching, to then take advantage of whatever advantages (functionality, etc.) the silo-specific discovery environment has to offer.”

Our next steps on the layout are to change the sizing when panes are closed and revamp the columns by content type. We will stay with a three-column design for now, but each column will attempt to reflect the type of material it brings forward.

[Image: jamun layout]

This means giving article searching the coveted left-hand column, with the most pixels and paging options, and putting a mish-mash of content into the middle. With solr indexes, we can combine some of the results so that WinSpace (our ETD repository) and SWODA (our historical collection) can share a pane. We anticipate that WinSpace will become a more full-fledged repository in Islandora and that we can try to blend in research data, so that the “other” column might be better characterized as “unpublished” content. A pane for the web site itself would be on the top, and the idea would be for each column to have no more than two panes if possible.
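As a rough illustration of the shared pane, one query could cover both collections if they lived in a single solr index with a field naming the source. That is an assumption, not a description of our setup: the `collection` field and the values `winspace` and `swoda` below are hypothetical, while `q`, `fq`, `rows` and `wt` are standard solr request parameters.

```python
from urllib.parse import urlencode


def shared_pane_params(terms, collections, rows=5):
    """Build solr parameters for one pane that draws on several collections.

    Assumes a single index where each document carries a `collection`
    field (a hypothetical field name).
    """
    # A filter query restricts the results to the listed collections
    # without affecting how the matching documents are scored.
    fq = "collection:(" + " OR ".join(collections) + ")"
    return {"q": terms, "fq": fq, "rows": rows, "wt": "json"}


params = shared_pane_params("windsor history", ["winspace", "swoda"])
print(urlencode(params))
```

Because both collections are scored in the same index, the results come back already interleaved by relevance rather than needing to be merged after the fact.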

That leaves conifer (our catalogue), Scholars Portal E-Books, and the option to “search inside the book” via Google Books in the right-hand column. Dan Scott came up with a brilliant strategy for keeping a table in evergreen up to date with the necessary field content. The field representations to be used in a solr index are then in one easy-to-use place with negligible overhead. In turn, solr’s data import handler can take care of keeping the index current.
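To make the last step concrete: the data import handler is driven by plain HTTP requests, so a scheduled job can ask it for a delta import that picks up only the rows changed since the last run. The sketch below just builds that request. The host, port and core name are made up, and it assumes the handler is registered at `/dataimport` and that its configuration already defines a delta query against the evergreen table.

```python
from urllib.parse import urlencode


def delta_import_url(solr_base, core):
    """Return the URL that asks the data import handler for a delta import.

    solr_base and core are placeholders for a local install; the handler
    path /dataimport is assumed to match the solr configuration.
    """
    params = {
        "command": "delta-import",  # only rows changed since the last import
        "clean": "false",           # keep the documents already in the index
        "commit": "true",           # make the changes searchable when done
    }
    return f"{solr_base}/{core}/dataimport?{urlencode(params)}"


print(delta_import_url("http://localhost:8983/solr", "catalogue"))
```

Fetching that URL from cron (for example with `urllib.request.urlopen`) would be enough to keep the index trailing the table by a few minutes.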

This combination seems like a great option for leveraging the agility of evergreen and the indexing power of solr. For example, we could adjust relevancy based on date and possibly blend in full text content. Mixing full text content with metadata in particular requires a lot of configuration options, and solr seems to be the best option out there for uniting the two types of materials. (BTW, solr already has a replication feature, and the folks at Scholars Portal are already maintaining a solr index. Maybe this combo could be achieved without even requiring much local indexing?)
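For the date adjustment, one common solr idiom is a reciprocal function of a document’s age, applied as a multiplicative boost. The sketch below assumes the extended dismax parser is available and that the index has a date field, here called `pub_date`; that name, like the `qf` field list, is hypothetical.

```python
def date_boosted_params(terms, date_field="pub_date"):
    """Build solr parameters that favour recent documents.

    date_field and the qf field list are placeholders; the `boost`
    parameter assumes the edismax query parser.
    """
    # recip(x, m, a, b) = a / (m*x + b). With m close to 1 / (milliseconds
    # in a year), a new document gets a boost near 1.0 and a year-old one
    # about 0.5, so older items fade gradually rather than disappearing.
    boost = f"recip(ms(NOW/DAY,{date_field}),3.16e-11,1,1)"
    return {
        "defType": "edismax",
        "q": terms,
        "qf": "title^2 author subject",  # hypothetical field weights
        "boost": boost,
    }


print(date_boosted_params("local history"))
```

With the older dismax parser the same function could go in `bf` instead, where it is added to the score rather than multiplied.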

Using a bookshelf in Google Books for searching inside the book may be the most peculiar part of the puzzle, and we have gone to the source itself to ask about hidden limitations. This Google forum question hasn’t received much attention yet, but the potential benefit could be substantial if allowed. The code for building a large shelf is also on github, and jamun itself is soon to follow.

Finally, we posted before about using a Google CSE for article content, but one interesting development in this space comes from Microsoft, which seems to be very close to releasing the details about its API. Add in the already formidable API from Scholars Portal, and we may soon need a discovery layer just to navigate among the discovery options.

