One year closer to democratizing access to big bibliometric data

It’s been one year since CADRE began its mission of democratizing access to big bibliometric data for academic libraries and researchers.

Our two-year, IMLS-funded project set out to develop, seed, and maintain a cloud-based, extendable cyberinfrastructure for sharing large academic library data resources. This sustainable and affordable solution would facilitate collaboration and reproducibility among researchers and allow any researcher to mine enormous datasets and analyze and visualize results.

In the past year, we have laid the foundation for creating a platform that can provide these much-needed resources.

We began by developing a cloud-based infrastructure of virtual machines and copying the Web of Science and Microsoft Academic Graph datasets into the cloud. We then expanded our data platform to allow secure access for researchers at our BTAA partner institutions.

CADRE's logo; a blue and purple owl wearing a birthday hat. Illustration.

The CADRE IT team also created CADRE’s website and a preliminary version of the Research Asset Commons and its web interface. All aspects of CADRE’s Research Asset Commons interface will continue to evolve to reflect changes in the platform’s development.

Plus, there were a few accomplishments that exceeded our expectations, including the CADRE Fellowship Program and our decision to broaden the scope of CADRE’s datasets beyond WoS and MAG in the future. Changes like these are often prompted by the user stories we receive about how to improve the platform.

And of critical importance, we introduced you to our mascot, the CADRE Owl, who has dressed up for the festivities and will be featured in the first CADRE newsletter, launching today.

What’s in store for year two?

By the beginning of 2020, CADRE plans to release a stable “pre-alpha” version of the platform. We still have a few important tasks to complete before we get there.

Our next steps will be dedicated to finishing CADRE’s core technology, the Research Asset Commons, and everything that comes with it. That includes improving packages that allow users to reproduce another researcher’s work, benchmarking graph databases for the GUI query builder, and refining the querying process until we consider it the best solution out there, and one that can run in near real time.

Aside from building a platform, one of the promises of CADRE is empowering researchers by educating them about how to work with these technologies. Early next year, we’ll launch a series of webinars about CADRE Fellow projects, platform and tool tutorials, and training on leveraging WoS and MAG datasets.

It has been a busy year indeed–and we still have features beyond these that we will continue to develop in line with our mission.

If you are interested in taking part in our platform’s success, let us know. And if you don’t want to miss another year of CADRE’s progress, be sure to follow us on Twitter and subscribe to our newsletter.

CADRE: a collaborative gateway for large academic library resources

Earlier this week, the NSF-funded Science Gateways Community Institute hosted its annual Gateways 2019 conference.

The institute, in partnership with seven universities including Indiana University, provides resources and support for the development of science gateways. The Gateways 2019 conference gives gateway creators and users an opportunity to connect and share resources.

So what’s a science gateway, and what does it have to do with CADRE?

A science gateway gives access to shared scientific resources through simplified, user-friendly interfaces. While science gateways offer researchers accessible and affordable options to work with data and advance their research, constructing and maintaining these gateways is not easy.

Three-dimensional cubes floating in the darkness, with shattered glass and light beams surrounding the cubes. Illustration.

Take CADRE’s GUI query builder. Even if an institution can afford to purchase big bibliometric datasets, it often lacks the resources to provide standardized access to the data. Researchers who aren’t proficient coders are then unable to utilize these large, unwieldy datasets.

At the same time, building a GUI query builder is a costly and time-consuming task. It requires buying, seeding, and maintaining enormous datasets; figuring out how to clean the data and fit it into a combination of database technologies; refining the querying process to be more efficient; and continuously maintaining it all.

The final product, however, will offer researchers with no programming or data science experience an affordable way to effortlessly query millions of scientific publications and save results.

The importance of gateways

So why does CADRE go through all the trouble? The CADRE platform was created to help the IMLS-funded Shared BigData Gateway for Research Libraries achieve its mission of developing and maintaining a cloud-based, extendable cyberinfrastructure for sharing large academic library data resources.

We believe libraries’ ability to provision large datasets is the modern equivalent of the collection building and stewardship roles libraries have always been entrusted with. However, libraries are struggling to do this.

Creating and curating infrastructure to contain these large, unwieldy bibliometric datasets and providing analysis, visualization, and machine-learning services to work with these data are costly and time-consuming, as we said before.

Our affordable, open solution, which is built upon shared resources and ideas, will help libraries work more effectively with big data and make the data more accessible to all researchers.

Our science gateway will also facilitate collaboration between researchers through reproducibility. When researchers can share research and results within a common infrastructure, it will become easier to compare, collaborate, and build on each other’s work and advance research.

Recap: CADRE at ISSI 2019


Last week, the CADRE team presented our workshop and tutorial at the 2019 International Conference on Scientometrics & Informetrics (ISSI).

Our workshop showed researchers how the platform can benefit science of science research and included discussions on the importance of big data reproducibility and accessibility. CADRE Fellows Michael Park, Chao Min, and Samuel Hansen all spoke about their research projects and the role CADRE would play in them.

CADRE Fellow Yi Bu spoke at the tutorial, where attendees built a CADRE profile, queried the Microsoft Academic Graph dataset with our GUI query builder, performed analysis with simple code, reproduced their work, and even created a network visualization.

If you weren’t at ISSI, there’s still an opportunity to engage with our presentations! You can visit our events pages to look at slides and recordings from both the workshop and the tutorial.

Top tweets

If you want to dive further into the action, you can check out our Twitter feed, where the workshop and tutorial were live-tweeted–and many attendees joined in.

CADRE Fellow Michael Witt enjoyed our creative spin on loading logos:
Follow Twitter link.

Attendee Cynthia Vitale found out how CADRE facilitates Nobel-worthy research about Nobel-prize papers:
Follow Twitter link.

Another CADRE Fellow, Scout Calvert, wants you to know that being a CADRE Fellow is pretty great–and that our owl is adorable. We can’t disagree:
Follow Twitter link.

To make sure you don’t miss another exciting event like our presentation at ISSI, follow us on Twitter and subscribe to our newsletter.

CADRE team heads to ISSI 2019

The CADRE team is heading to the 2019 International Conference on Scientometrics & Informetrics (ISSI) next week to present a workshop and tutorial to science of science researchers.

Both events will take place on Monday, Sept. 2, in the Calasso Room at Sapienza University in Rome.

Red and white logo that reads: ISSI.

These presentations will serve as CADRE’s introduction to an international community of researchers, who will be able to learn about and test the platform throughout the conference.

Our team will use the workshop to show science of science researchers how they can utilize CADRE to collaborate with other researchers and reproduce work. Researchers will also hear from our CADRE Fellows on how they’ll be using the platform for their fellowship research projects.

In our tutorial, attendees will learn how to use all the fundamental CADRE functions. Researchers will start by building a CADRE profile and will then have the opportunity to query the Microsoft Academic Graph dataset with our GUI query builder, perform analysis with simple code, reproduce their work, and more.

Website launch

Do you want more detailed information on what our presentations will include? Visit the event pages on our newly launched website. You can find a summary of our presentations, a list of speakers, a detailed program itinerary, and more. To learn more about what our ISSI workshop will contain, click here. To get an in-depth look at how our tutorial will work, click here.

Our new website also offers a comprehensive view of how CADRE will work, including documentation, helpful demonstrations, and updates about what we’ve been up to. Best of all, you can finally meet all of us.

As we button up our platform and presentations before ISSI, be sure to follow us on Twitter and scan our site for any updates. We’ll be tweeting throughout ISSI and our presentation.

And don’t forget to subscribe to our newsletter if you don’t want to miss any more CADRE news or updates.

Latest CADRE Updates


The CADRE team is hard at work developing a platform that will do what academic libraries have long been trying to achieve. 

We are gearing up for ISSI 2019 in September, where CADRE will hold a workshop and tutorial. Our hands-on CADRE tutorial at ISSI will offer an option to use assisted programming to access Microsoft Academic Graph (MAG), as well as a second option that will allow access to the dataset using the CADRE Query Builder, which uses a graphical user interface.

A bulletin board mapping next web development steps with photos, post-it notes and string.

But building a platform that allows novice coders to easily query massive datasets with a GUI is a lengthy process of trial and error—and that is only one component of CADRE.

Building CADRE is a complex and fluid task: Along with the Web of Science (WoS) and MAG datasets, CADRE will include U.S. patent and trademark data. And more datasets will be added to the platform as different types of researchers request access.

IUNI Lead Software Engineer Ben Serrette says that because of the potential to take on more datasets, software solutions must be as generic and adaptable as possible. Like fitting and refitting the pieces of a complex puzzle that keeps changing shape, the IUNI IT team is solving multi-faceted problems with a flexible approach on an enormous scale.

Find out how they’re doing it below. 

Latest updates
  • Ruling out what doesn’t work: The IT team narrowed down the many serverless technology options that cloud-computing platform Amazon Web Services (AWS) offers by eliminating the ones that don’t fit the bill in terms of cost or ability to interface with other CADRE technical components.
  • Designing fundamental cloud architecture: CADRE’s infrastructure of cloud-based virtual machines and AWS services has been developed.
  • Integrating Jupyter Notebooks & file storage: Advanced users can write their own code to create data-analysis tools in CADRE’s notebook. Jupyter Notebooks is up and running with a working file system for storing code.
  • Testing the query builder: One service essential to CADRE is the GUI users can use to easily query massive datasets. The IT team is testing the combined powers of a relational database and various graph databases with MAG data to create a more efficient query-builder. 
  • Building CADRE’s website: The IT team is finishing the front-end of some pages of the CADRE website, including the homepage and the event page for ISSI 2019. They are preparing to make the website live in a couple of weeks.
Decorative screenshot of website showing three CADRE Fellows and two school logos.

If you want to stay updated on what CADRE is doing, be sure to follow us on Twitter.

Meet the CADRE Owl logo


CADRE's logo. It is purple and blue with an owl on the left and text on the right that reads: CADRE, Collaborative Archive & Data Research Environment.

You met the CADRE Fellows. Now, meet the CADRE Owl! That’s right, CADRE has a logo. More than a picture, our logo represents who we are.

For Jessie Ma, the UI/UX designer at the Indiana University Network Science Institute, who has taken the lead on designing CADRE’s visuals, an owl aligned perfectly with CADRE’s mission.

Five versions of an illustrated owl head to show the design development of the CADRE logo.

Ma says CADRE is a project designed to make valuable and complex large library data resources approachable to any academic researcher, regardless of their skill level in handling such data.

An owl is not just beautiful; it is a symbol of wisdom and a powerful predator. Ma says this is exactly how CADRE resonates with her: a powerful system filled with complex big data wrapped in a user-friendly interface that allows even inexperienced coders to tame the data.

But so as not to go too deep, she adds that owls are pretty cute. A friendly mascot like the CADRE Owl helps balance a project that seems very technical and complicated to those who aren’t familiar with it.

Owl Evolution

The CADRE Owl has come a long way. The original version was a complex illustration that included the owl with glasses, hard drives to perch on, and a network backdrop. But after months of brainstorming and refining, Ma’s end product became a sharp, playful owl profile.

An owl perched on a database with a network in the background. Illustration.

Even though she streamlined the logo to make it cleaner and easier to stamp across any outreach materials, Ma included details that would make the CADRE Owl feel a bit more connected to the project, including the owl’s network necklace and her creative take on an owl’s facial disc. All in all, Ma and the rest of the CADRE team are excited to be represented by the CADRE Owl.

July has been a busy month for CADRE: not only have we created our visual identity, but we also selected our first class of CADRE Fellows. We have much more news to come, including the launch of our new website.

We’ll also be presenting a webinar about CADRE on Aug. 8 for the OCLC Research Works in Progress Webinar Series. Don’t forget to register.

Keep your head on a swivel: Follow us on Twitter so you don’t miss a thing!

Decide how CADRE advances your research with User Stories

Research needs are varied and dynamic. While CADRE can ease the many technical and financial obstacles researchers face when working with large data resources, our platform also has the flexibility to fit more specific research needs. To better understand those needs, we collect User Stories.

Whether you are an academic researcher, librarian, or technical provider, CADRE’s ongoing call for User Stories allows you to tell us what you want from CADRE.

The corner of a laptop screen with graphs on the screen.

Maybe you are a researcher who requires the CADRE platform to apply multiple graph visualization tools, such as Gephi or VOSviewer, to the same query result so you can explore your data more easily.

Or you might be a researcher with multiple affiliations, who wants to ensure the code you store in your repository will live in a cloud and not become lost if you change affiliations.

Perhaps you like that CADRE allows you to save space by running queries on the cloud server, but you also want the ability to download data locally to share findings with outside collaborators.

Then tell us—the CADRE team took each of the stories above into serious consideration when we began developing our platform.

User Stories give insight into the functionality requirements of different types of researchers, which will help us build a platform that better suits the needs of every researcher who uses CADRE. Any request ranging from specific interface design elements to a general computational environment will assist in our future development decisions.

How to submit stories

No user story is too trivial to share, and you can submit as many stories as you want.

Simply visit our User Story Collection page and tell us about who you are. Explain your project goals and what CADRE functionalities could help you achieve those goals.

If you are also interested in taking advantage of CADRE’s current resources and capabilities, consider applying to the CADRE Fellowship Program. The deadline for the first round of CADRE Fellows is June 25. Similar to User Stories, this fellowship will help CADRE better understand the needs of its researchers.

How will CADRE advance your research? Tell us your story today.

Deadline extended to become a CADRE Fellow

The Collaborative Archive & Data Research Environment (CADRE) is extending the deadline to apply for the CADRE Fellowship Program to June 25. Academic researchers and librarians from any institution are invited to apply.

If you are not familiar with CADRE, we are an IMLS-funded project that provides sustainable, affordable, and standardized data- and text-mining services for licensed big datasets, as well as open and non-consumptive datasets too large or unwieldy to work with in existing research library environments. CADRE offers academic researchers access to these data in a secure cloud-based platform.

A bar graph that compares Web of Science (lower) and Microsoft Academic Graph (higher). Illustration.

The benefits of being a CADRE Fellow include:

  • Full travel support to present your work at ISSI 2019 in Rome this fall,
  • Free and early access to our cloud-computing resources,
  • Access to big bibliometric datasets, including the Web of Science and Microsoft Academic Graph,
  • And training and technical support for the CADRE platform and for your project.

Fellowship Requirements

This fellowship program will help the CADRE team form expansive relationships with researchers, librarians, and data providers to gain critical feedback on developing the CADRE platform. As such, you do not need to have extensive programming experience to use CADRE. The platform will provide a user-friendly graphical user interface for data querying.

Applicants can form research teams consisting of graduate students, staff, and faculty from any U.S. or non-U.S. university—and teams can span any discipline and institution. You may also submit a research proposal without a team.

Sound interesting? Submit your CADRE Fellowship proposal here by June 25.

You can find more information about the CADRE Fellowship Program here. Fellows will be selected the first week of July.

Contact us at cadre@iu.edu with any questions and follow us at @CADRE_Project for the latest news.

Introducing the SBD-Gateway Project & CADRE Platform

The Shared BigData Gateway for Research Libraries (SBD-G) is a two-year IMLS-funded project to develop, seed, and maintain a cloud-based, extendable cyberinfrastructure for sharing large academic library data resources with a growing community of scholars.

SBD-G will achieve this through its platform, called The Collaborative Archive & Data Research Environment (CADRE).

CADRE will initially be seeded with a combination of open and licensed bibliometric datasets, including Microsoft Academic and Web of Science data.

The project is led by Indiana University Libraries in collaboration with the Indiana University Network Science Institute, the Pervasive Technology Institute, and the Big Ten Academic Alliance. Additionally, the SBD-Gateway is proud to be partnering with:

  • Michigan State University
  • Microsoft Research
  • Midwest Big Data Hub
  • Ohio State University
  • Penn State University
  • Purdue University
  • Rutgers University
  • South Big Data Hub
  • University of Iowa
  • University of Michigan
  • University of Minnesota
  • Web of Science Group
  • West Big Data Hub

To understand more about the value and complexity of these datasets, we encourage you to watch our short video on the subject: