
1 Comment

  • Kat says:

    I’m thrilled that our paper was a topic for discussion! I thought I’d provide some comments, more explanatory than anything else.

    We chose the High-Level Browse because it is a local instantiation of a classification, as you noted. One simple classification used locally means we can integrate and share more easily in-house. That was the hope at least – that if we used this classification, we’d be able to integrate our system better inside the library. We haven’t gotten a chance to test this to date for multiple reasons, one of which is that HLB will likely be modified soon.

    The quality checking we did was not in fact an enormous task. There were 500 clusters generated, and many of these were very clearly “junk” topics, i.e., topics for which we couldn’t generate a clear word/phrase label. I pulled together 5 folks from our digital library department to help me “junk” topics and come up with labels for the rest. In point of fact, this was so much fun for my colleagues, I didn’t even have to nudge them once to finish! My colleagues all had different subject backgrounds, so we covered the basic large subject disciplines. We didn’t have a chance to talk to catalogers, much as we would have wanted to (and if we didn’t say that in the paper explicitly, we should have) because our time limit was extraordinarily short. We performed everything you read in literally 2 months’ time.

    We did label the clusters, plus we connected the clusters to both the top-level HLB categories and the second-level HLB categories. Apologies if this wasn’t clear – see Figure 9 for more description.

    We thought about using creator as a field for use in the experiment, but the clustering around creator, as you mentioned, was going to be problematic for us so we decided against it.

    I believe that “simple interface” was our vain hope that OAIster could someday become that, whether that is Google-like or not. (Perhaps we’re on our way now, but more on that later…) Clustering was an attempt to create a simple way into a complex system. In terms of our user base, I believe we’ve always assumed it is for researchers and scholars, but that those researchers and scholars could be tweens. It has been extremely difficult to gather this information to date.

    And last, but not least, we just did some experiments to determine how much of our data is in the Google search index. Not as much as you’d expect – about 30%. We’re refining our experiments before we publish (likely in DLib this fall), but I think this points to the need for aggregations like OAIster.

    Cheers, -Kat
