Ithaka’s December Big Data Infrastructure Report

Last month, Ithaka S&R published “Big Data Infrastructure at the Crossroads: Support Needs and Challenges for Universities”. CADRE exemplifies the kind of big data infrastructure that this report celebrates and encourages. What direction or inspiration does the report offer for CADRE’s future work? Here are some of our key takeaways.

1 Human networks

“Human networks are as essential to big data research as computing networks.” Cultivating and managing collaborative projects across disciplines, departments, and institutions is incredibly difficult. CADRE has a talented, hard-working outreach team that consists of:

  • Outreach Coordinator Maks Szostalo
  • Research Scientist Filipi Silva
  • Data Librarian & Network Scientist Ethan Fridmanski

This interdisciplinary group carries out outreach under the direction of our Executive Director, Jaci Wilkinson, a librarian whose background is in user experience and digital content strategy. Our top strategic priority for the next year is to identify new CADRE user groups, particularly at current member institutions. This requires extensive research to locate the labs, institutes, libraries, and other networks that sometimes sit in isolated pockets within their institutions. CADRE has untapped potential for big data researchers in the following areas:

  1. Innovation science
  2. Computational social science
  3. Digital humanities/history of science
  4. And more!

Elsewhere in the report, Ithaka notes, “Many researchers expressed hopes that more repositories would become available, or that their libraries would purchase more subscription databases.” Here at CADRE, we’re certain there are researchers who would benefit from the datasets available in CADRE, such as Microsoft Academic Graph, USPTO patent data, and Web of Science, but who simply don’t know we exist yet.

2 “It honestly hadn’t occurred to me to think of the library as a resource for big data tools”

We need to change the perception that libraries are simply beautiful buildings full of books. Libraries are hubs of resources and expertise that can meet researchers’ sophisticated big data needs! CADRE is rooted in the innovation and values of academic libraries: membership is administered by libraries, and our leaders are librarians. Unfortunately, Ithaka reports “uneven awareness and modest use” of the data management staff and programming that libraries have invested in heavily. Here at CADRE, we are recalibrating our outreach efforts in two important ways:

  • Helping our library partners reach the rest of their campus community to ensure maximum CADRE use
  • Extending our own outreach to other institutes, labs, and centers, particularly within current member institutions

3 Cloud computing… We know… it’s expensive

For financial and technical reasons, many university labs build and maintain their own computing infrastructure. This decentralization leads to duplicated effort across academia. Here at CADRE, we host all of our datasets in the cloud using Azure and Amazon Web Services. This is incredibly useful for our researchers, but it is also expensive and difficult to budget for, since charges are based on usage. Over 50% of our budget goes to cloud computing. We’re exploring tools to reduce our cloud computing costs, as well as changes to our membership model, so that we don’t have to pass cloud computing surcharges on to our members. More recent data enclaves, such as OpenAlex, have adopted an open access, self-hosting model: each user or institution sets up and pays for its own cloud computing and storage. Is the OpenAlex model a more sustainable path for administering big datasets? CADRE is paying close attention to OpenAlex and our other peers in this quickly changing environment.

4 Help us fuel the future of CADRE

A key recommendation to funders in this report is to “Continue to support the robust development of data repositories.” We couldn’t agree more! We’re seeking grant and private funding to build on the vital work CADRE has done over the past three years to centralize the infrastructure for hosting the USPTO, MAG, and Web of Science datasets. There’s more we can do to increase our value to our current and future member community. However, our budget comes solely from membership fees and barely covers the cost of hosting CADRE. Reach out to us at cadre@iu.edu if you would like to discuss new funding and strategic directions.
