I just attended the NISO/DCMI webinar “Metadata for Managing Scientific Research Data,” presented by Jane Greenberg, and thought I’d share some takeaways:
- If you think the bibliographic world is a mess, wait until you get an eyeful of the research data world.
- Data sets need metadata and boy, are there a ton of options. Among those mentioned: Dublin Core, MARC, DataCite, Darwin Core, Access to Biological Collections Data (ABCD), Ecological Metadata Language (EML), Content Standard for Digital Geospatial Metadata (FGDC), and Data Documentation Initiative (DDI).
- When picking and implementing a metadata schema for big data, focus on shareability and try to avoid creating silos.
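To make the shareability point concrete, here is a minimal sketch of what describing a data set in simple Dublin Core might look like. It was not shown in the webinar. The data set, its creators, and the `dublin_core_record` helper are all made up for illustration; only the Dublin Core element names and namespace are real.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple Dublin Core XML record (hypothetical helper).

    `fields` maps a Dublin Core element name to a string or list of strings.
    """
    root = ET.Element("record")
    for element, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical data set description; every value here is invented.
example = {
    "title": "Stream temperature readings, 2010-2011",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "subject": "hydrology",
    "type": "Dataset",
    "format": "text/csv",
    "rights": "CC0",
}

record_xml = dublin_core_record(example)
print(record_xml)
```

Because the elements come from a widely used schema rather than a home-grown one, another repository could harvest a record like this without a custom crosswalk, which is the opposite of a silo.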
For more information on the topic, see the paper Greenberg recently co-authored:
Willis, C., Greenberg, J. and White, H. (2012). “Analysis and synthesis of metadata goals for scientific data.” Journal of the American Society for Information Science and Technology, 63: 1505–1520. doi: 10.1002/asi.22683
IU affiliates will be able to access the archived webinar in the near future.
Bob Noel recently pointed me to this example of user-contributed metadata: citizen astronomers have classified galaxies via the Galaxy Zoo project. Information about the project can be viewed at the archived website.
A ‘Galaxy Zoo 2’ project is currently underway here. Galaxy Zoo: Hubble enlists citizen astronomers in the classification of galaxies found in the archives of NASA’s Hubble Space Telescope.
An article in Market Watch alerted me to this really cool resource: The Cell: an Image Library http://www.cellimagelibrary.org/. My love for cellular biology aside, this project is exciting for a number of reasons. First, it’s clear that robust ontologies are driving powerful search and browse features. Poke around the advanced search options to view the controlled vocabularies.
Second, anyone can contribute media files AND raw data. A team of annotators checks each submission and enriches its metadata.
Third, while this NIH-funded project describes itself as serving primary research first, the site is also well suited for classroom use.
A summary of the technical infrastructure can be found in this article in Scientific Computing.
The California Digital Library announced that they are developing an extension for Microsoft Excel that allows science researchers to manage and share their data sets. Find out more about the project at the Digital Curation for Excel Project (DCXL) blog: http://dcxl.cdlib.org/. Project developers are currently taking input from researchers via Twitter: http://twitter.com/dcxlCDL.
Why is data curation important? This page on the DataCite website has a good answer.
What incredible potential for data reuse. What new discoveries might we enable? What new services will be created if we facilitate open data exchange?