Designing a faceted search for solar data

Ian Ruotsala

Abstract

In library, information and computer science, an ontology is a network of relationships. A well-known subset of an ontology is a “taxonomy,” which is an ontology consisting of “is a” relationships. Biological taxonomies such as “a human is a primate is a mammal is an animal” provide a concrete example. While taxonomies have often been used by software developers as a basis for categorizing a set of items, more general ontologies can provide a richer description of relationships within a set.

Faceted searches provide a way to navigate through instances of a more general ontology and have been increasingly used by Internet merchants such as Amazon to provide intuitive ways of searching and browsing through their products. For this project, an implementation of faceted search was attempted for solar physics data. Using an Internet browser as a platform, a JavaScript program was to be developed in which a database of solar events (e.g. flares, sunspots) and observations (e.g. which instrument was used, which astronomer cataloged the event) would be queried so that data could be accessed in a dynamic, intuitive way.

Keywords:
faceted search, ontology, search, JavaScript, software development, web application, helioinformatics, solar physics

Background: ontologies and faceted search

Taxonomies are constrained ontologies

Imagine one were to enumerate a set of animal species. One could classify the species according to their types, e.g.:
[Figure: biological taxonomy]
This is an example of a taxonomy, a network of “is a” relationships.

This is indeed useful, yet one could imagine many relationships among species other than type. One could describe “lion preys upon gazelle” or “hyena competes with lion”. This more thorough network of relationships is an example of an ontology.
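As a minimal sketch of the distinction, an ontology can be held as a list of (subject, relation, object) triples, and the taxonomy is then whatever remains after keeping only the “is a” triples. The species, relation names and function names below are illustrative assumptions, not part of any real data set.

```javascript
// An ontology sketched as (subject, relation, object) triples.
// All names here are illustrative only.
const triples = [
  { subject: "lion", relation: "is a", object: "felid" },
  { subject: "felid", relation: "is a", object: "mammal" },
  { subject: "gazelle", relation: "is a", object: "bovid" },
  { subject: "bovid", relation: "is a", object: "mammal" },
  { subject: "hyena", relation: "is a", object: "mammal" },
  { subject: "lion", relation: "preys upon", object: "gazelle" },
  { subject: "hyena", relation: "competes with", object: "lion" },
];

// A taxonomy is the subset of the ontology restricted to "is a" edges.
function taxonomyOf(ontology) {
  return ontology.filter((t) => t.relation === "is a");
}

// Follow "is a" edges upward from one species to list its ancestors.
function ancestors(ontology, name) {
  const chain = [];
  let current = name;
  for (;;) {
    const edge = ontology.find(
      (t) => t.relation === "is a" && t.subject === current
    );
    if (!edge) return chain;
    chain.push(edge.object);
    current = edge.object;
  }
}

// All relations touching a species, whatever their kind.
function relationsOf(ontology, name) {
  return ontology.filter((t) => t.subject === name || t.object === name);
}
```

Here `ancestors(triples, "lion")` walks only the taxonomy, while `relationsOf(triples, "lion")` also picks up the predation and competition edges that a pure taxonomy cannot express.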

Faceted search is implemented by Internet merchants

It is common for Internet merchants to categorize their merchandise in an ontology, so that a faceted search may be implemented. Whereas traditional type classification encouraged hierarchical browsing according to a single relationship (“is a”), faceted search is a way of browsing through an ontology laterally, using many different relationships.
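A rough illustration of the idea, assuming a made-up product catalog in which every property of an item is treated as a facet: the user narrows the set by picking facet values in any order, and the interface shows how many items remain under each remaining choice. The catalog, `applyFacets` and `facetCounts` are hypothetical names for this sketch.

```javascript
// Hypothetical catalog items; each property acts as one facet.
const items = [
  { name: "trail shoe", color: "red", brand: "Acme", size: 42 },
  { name: "road shoe", color: "blue", brand: "Acme", size: 42 },
  { name: "sandal", color: "red", brand: "Bolt", size: 40 },
  { name: "boot", color: "black", brand: "Bolt", size: 42 },
];

// Keep the items that match every selected facet value.
// `selection` maps a facet name to the required value, e.g. { color: "red" }.
function applyFacets(list, selection) {
  return list.filter((item) =>
    Object.keys(selection).every((facet) => item[facet] === selection[facet])
  );
}

// Count how many items carry each value of a facet, so the interface
// can list the choices that remain after narrowing.
function facetCounts(list, facet) {
  const counts = {};
  for (const item of list) {
    const value = String(item[facet]);
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

// Narrow by color first, then see which brands are still available.
const red = applyFacets(items, { color: "red" });
const brandsAmongRed = facetCounts(red, "brand");
```

Nothing forces the user to start with color; selecting `{ size: 42 }` first and color second reaches the same items by a different path, which is the lateral browsing that a single hierarchy does not offer.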

Helioinformatics

With increasingly cheap computer storage and processing, scientific studies generate ever larger amounts of data. In solar physics, the field known as “helioinformatics” uses various tools of information technology, including ontologies, to organize and make sense of that data.

Ontology of solar data

During the summer of 2009, I interned at the Lockheed Martin Solar & Astrophysics Laboratory (LMSAL). LMSAL has a database of solar data which is accessible via a web API (application programming interface). My task over that summer and into this school year has been to implement a faceted search for their solar data using that API.

Implementation challenges and achievements

Background: description of current system

LMSAL maintains a database of solar events and observations, the Heliophysics Event Registry (HER). The HER can be queried from a web API, returning a JSON file that contains information on events and observations within specified parameters, as well as the relationships of those events and observations to other events and observations. JSON is a data-interchange format (like XML) and also a subset of JavaScript, which makes it especially easy to work with in that language. My faceted search was to be incorporated into their present search interface at http://www.lmsal.com/helio-informatics/hpkb/

Initial approach

The initial idea was to structure the search as a tree. The root would describe the set currently being searched, and each leaf would hold a description simple enough to be expressed as a query to the more limited web API. Intermediary nodes would mediate between the leaves and the root.

Changing approach

I think that I did not fully understand how faceted searching worked when I decided on the initial approach. Fortunately, I decided on a better approach; unfortunately, it was not until mid-way through my project. With my new approach, the user would be presented with every event and observation within a timespan. The user could then narrow down by selecting for specific features.

A challenge arose because the API returns at most 200 events per query. My way around this was to have each of the roughly twenty event types issue its own query; so, e.g., the “solar flare” type would search only for solar flares within a time span, the sunspot type only for sunspots, and so on. Once returned, the various events would be stored together so that events of different types but with the same properties (spots and flares both have grid locations, for example) could be found in the appropriate search.
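The workaround can be sketched roughly as follows. The 200-event limit comes from the text; everything else is an assumption made for illustration: `queryOneType` stands in for the real API call (which would be asynchronous in a browser), and the `type` and `grid` field names and the stub data are invented.

```javascript
// Limit taken from the text: at most 200 events come back per query.
const MAX_EVENTS_PER_QUERY = 200;

// Run one query per event type and pool everything that comes back.
// `queryOneType(type, start, end)` is a hypothetical stand-in for the
// real API call and returns an array of event objects.
function collectEvents(queryOneType, types, start, end) {
  const pool = [];
  const possiblyTruncated = [];
  for (const type of types) {
    const events = queryOneType(type, start, end);
    // A full page suggests the server may have cut the result short.
    if (events.length >= MAX_EVENTS_PER_QUERY) possiblyTruncated.push(type);
    pool.push(...events);
  }
  return { pool, possiblyTruncated };
}

// Find pooled events of any type that share a property value.
function withProperty(pool, key, value) {
  return pool.filter((event) => event[key] === value);
}

// Stub server with made-up events, so the sketch runs offline.
function stubQuery(type) {
  const data = {
    flare: [{ id: "f1", type: "flare", grid: "N10E20" }],
    sunspot: [
      { id: "s1", type: "sunspot", grid: "N10E20" },
      { id: "s2", type: "sunspot", grid: "S05W30" },
    ],
  };
  return data[type] || [];
}

const collected = collectEvents(
  stubQuery, ["flare", "sunspot"], "2010-05-01", "2010-05-02"
);
const sameGrid = withProperty(collected.pool, "grid", "N10E20");
```

Splitting by type raises the effective ceiling to 200 events per type rather than 200 overall; the `possiblyTruncated` list is one way to notice when even that is not enough for a given time span.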

Obstacles to completion

There were a variety of hindrances that kept this project behind schedule. Most significantly, I underestimated the steepness of JavaScript’s learning curve. JavaScript is described as having “Java-like syntax,” and I foolishly thought that, since I could already program in Java, JavaScript would come to me easily. Despite the surface similarity of the two languages, they are, in fact, quite dissimilar: JavaScript is intended to be run in web browsers, while Java is run on a JVM; unlike Java, JavaScript has no classes, only instantiated objects; and so on.

Coda

Sadly, this project is not near completion at the time of writing. It was, however, a valuable learning experience. My plan is to continue working on this project even after the end of the school quarter, with the hope that, with a modicum of extra effort, I can complete it.

Thanks

I would like to thank the many people whose insight, assistance and understanding brought this project along, including:

EJ Zita (The Evergreen State College) for providing the opportunity and preparation for the solar physics internship

Neal Hurlburt (Lockheed Martin Solar & Astrophysics Laboratory) for his guidance on the direction of this project

Neal Nelson, Sherri Shulman and Richard Weiss (The Evergreen State College), my professors, for helping me mature as a programmer and student of computer science.

Posted in Uncategorized | Leave a comment

Spring Quarter, week 7 presentation

* end of year is fast approaching, yet still so much I need to do

* I have been all-around too ambitious about what I wanted to do this year

* I am poor at context switching while doing homework

* try to do once-a-week, day-long coding marathons for next big project

data_handling.js

AJAX

So, I think I should create a script specifically to query the server; call it, say, query_server.js. It would be called every time the user does something that requires the server to be queried.

query_server.js will then change some global data in another script, hopefully updating the solar data, but also communicating errors such as server timeouts, etc.
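One possible shape for this, as a sketch only: the query function writes both results and errors into a shared state object and then notifies the caller. The names `solarState`, `queryServer`, `send` and `onChange` are all hypothetical; `send(params, callback)` stands in for whatever transport is finally used and is expected to call `callback(error, events)`.

```javascript
// Shared state that the rest of the program reads.
const solarState = { events: [], lastError: null, pending: 0 };

// Sketch of what query_server.js might expose.
// `send(params, callback)` is a hypothetical transport that eventually
// calls callback(error, events).
function queryServer(send, params, onChange) {
  solarState.pending += 1;
  send(params, (error, events) => {
    solarState.pending -= 1;
    if (error) {
      // Record the failure (for example a timeout) without touching
      // the events that were already loaded.
      solarState.lastError = error.message;
    } else {
      solarState.lastError = null;
      solarState.events = solarState.events.concat(events);
    }
    onChange(solarState);
  });
}
```

Keeping the error next to the data means the display code has a single place to look, and a failed query leaves the previously loaded events untouched.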

asynchronous JS

Here is my first (semi) professional faceted search JS file.

I say “semi” professional because it is still a work in progress. What I am especially uncertain about is how to handle asynchronicity: how should I handle the queries to the HEK server? Mainly, I don’t want the program to hang if the server times out. Also, I want the code to continue doing other work even if it has to wait a few seconds to receive a reply from the HEK server. I was initially planning to have the code “hang” on the XMLHttpRequest send call for a few hundred milliseconds; then, if no reply arrived, have it continue on to the other queries and later return to query for the event again.
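One way to get this behaviour without blocking is sketched below with promises rather than a blocking send call. It is an assumption-laden sketch, not the project's code: `queryOneType` is a hypothetical function returning a promise of events, and `withTimeout` only stops waiting for a slow request; it does not cancel it. In a browser, XMLHttpRequest also has its own `timeout` property and `ontimeout` handler that could serve the same purpose.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Query every type at once; after each pass, retry only the types that
// failed or timed out. `queryOneType(type)` is a hypothetical function
// returning a promise of an array of events.
async function queryAll(queryOneType, types, ms, maxPasses) {
  const results = {};
  let remaining = types.slice();
  for (let pass = 0; pass < maxPasses && remaining.length > 0; pass++) {
    const settled = await Promise.allSettled(
      remaining.map((type) => withTimeout(queryOneType(type), ms))
    );
    const failed = [];
    settled.forEach((outcome, i) => {
      if (outcome.status === "fulfilled") {
        results[remaining[i]] = outcome.value;
      } else {
        failed.push(remaining[i]);
      }
    });
    remaining = failed;
  }
  // `unanswered` lists the types that never replied in time.
  return { results, unanswered: remaining };
}
```

Because the requests within a pass run concurrently, one slow event type delays the pass by at most `ms` milliseconds, and the page stays responsive while waiting.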

JavaScript reference semantics

I’ve decided to have the program grab the events from the returned array, sort them by their unique ID, and put them into cumulative data structures which will be kept for the lifetime of the program.

Each time the API is queried, the newly returned JSON objects will again be sorted and checked for duplicates against the persistent structures.
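A minimal sketch of the duplicate check, using a `Map` keyed by ID in place of the sorted arrays described above (a swapped-in technique, chosen because lookup by key avoids re-sorting on every query). The field name `id` is an assumption; the real registry may name its unique identifier differently.

```javascript
// Persistent store for the whole session, keyed by each event's unique ID.
const eventsById = new Map();

// Merge a freshly returned batch into the store and return only the
// events that were not already present.
function mergeBatch(store, batch) {
  const added = [];
  for (const event of batch) {
    if (!store.has(event.id)) {
      store.set(event.id, event);
      added.push(event);
    }
  }
  return added;
}

// Sorted view by ID, built on demand for display.
function sortedById(store) {
  return Array.from(store.values()).sort((a, b) =>
    a.id < b.id ? -1 : a.id > b.id ? 1 : 0
  );
}
```

Calling `mergeBatch(eventsById, batch)` after every query keeps the store free of duplicates, and the returned list tells the display code which events are new.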

Questions:
* when I grab the objects from their original JSON and place them in the persistent structure, how should I do it so as to not have the JSON stick around? JS Objects are passed via reference semantics. I worry that if I grab some but not all the objects from their original JSON, the entire JSON structure will persist, consuming resources. I think perhaps something like “clone” would solve this problem.
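On the question above: in JavaScript an object stays alive only while something still references it, and an event taken out of a parsed result holds no reference back to the array or wrapper it came from. So once the program drops its reference to the parsed structure, the wrapper and any events that were not kept become eligible for garbage collection, and no clone is needed for that purpose. A deep copy is still useful when the kept object must not share nested state with the original. The sketch below illustrates both points; the response text and field names are made up.

```javascript
// Stand-in for a response from the API; the field names are hypothetical.
const text =
  '{"result":[{"id":"a1","type":"flare","pos":{"x":1,"y":2}},' +
  '{"id":"b2","type":"sunspot","pos":{"x":3,"y":4}}]}';

let parsed = JSON.parse(text);

// Taking one event out copies a reference, not the object itself.
const kept = parsed.result[0];
const sameObject = kept === parsed.result[0]; // true

// Dropping the only reference to the container makes the container and
// the event we did not keep unreachable, so they can be collected.
// `kept` has no pointer back to its former parent.
parsed = null;

// A deep copy serves a different purpose: the copy shares no nested
// state with the original. This approach works for plain JSON data.
function deepCopy(value) {
  return JSON.parse(JSON.stringify(value));
}

const copy = deepCopy(kept);
copy.pos.x = 99; // kept.pos.x is still 1
```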

Spring Quarter, week 3 presentation

* realized there was a much better way to do this

* if the project is getting cluttered way fast, there is probably a simpler solution

* how to organize all the event objects? Probably I will put them in some arrays and sort by their ID

hacktest01.js

hacktest02.js

professional code

I think I am actually, finally, starting to grasp JavaScript at a pretty competent level. Time to start being more professional with my code, i.e. I should do more documentation and versioning.

more backend details

The program will start by displaying all events within the past 24 hours. The API will be queried once for each event type. An appropriate data structure (probably a list) will hold the events that are returned, with one list for each event type. The events will be sorted by start time.
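A small sketch of these two steps, computing the initial 24-hour window and building one start-time-sorted list per event type. The field names `type` and `startTime`, and the assumption that start times arrive as ISO 8601 strings, are illustrative and not taken from the API.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Time window covering the 24 hours before `now` (a Date).
function lastDay(now) {
  return { start: new Date(now.getTime() - DAY_MS), end: now };
}

// Build one list per event type, each sorted by start time.
// `type` and `startTime` are assumed field names.
function groupAndSort(events) {
  const lists = {};
  for (const event of events) {
    if (!lists[event.type]) lists[event.type] = [];
    lists[event.type].push(event);
  }
  for (const type of Object.keys(lists)) {
    lists[type].sort(
      (a, b) => Date.parse(a.startTime) - Date.parse(b.startTime)
    );
  }
  return lists;
}
```

On page load, `lastDay(new Date())` would supply the start and end times for the first round of queries.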

backend details

Querying the HEK API will be fairly straightforward: I will do one query for each event type. The returned JSON files will then be parsed into JavaScript objects. These will be stored on the user’s system for the remainder of the time the system is in use. POTENTIAL PROBLEM: what if the JSON data gets into the 100’s of MB range? I don’t want to start bogging down the user’s browser.

Thus, the only times the server will be queried will be when the extrema of end time or start time change.
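The rule can be sketched as a comparison between the time range already held locally and the range the user now wants; only the parts that extend beyond the cached range need to be fetched. The function names and the use of millisecond timestamps are assumptions for this sketch.

```javascript
// `cached` is the range already fetched, as millisecond timestamps, or
// null if nothing has been fetched yet. Returns the ranges that still
// have to be requested from the server.
function missingRanges(cached, wanted) {
  if (cached === null) return [wanted];
  const gaps = [];
  // Earlier start than before: fetch up to the old start.
  if (wanted.start < cached.start) {
    gaps.push({ start: wanted.start, end: cached.start });
  }
  // Later end than before: fetch from the old end onward.
  if (wanted.end > cached.end) {
    gaps.push({ start: cached.end, end: wanted.end });
  }
  return gaps;
}

// Grow the cached range once the gaps have been fetched.
function extend(cached, wanted) {
  if (cached === null) return wanted;
  return {
    start: Math.min(cached.start, wanted.start),
    end: Math.max(cached.end, wanted.end),
  };
}
```

Narrowing the window returns an empty list, so no query is sent; the events already in memory are simply filtered. This does not address the memory concern above, since the cache only grows, so a cap on the total span held may still be needed.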
