CADRE creates fellowship program for those working on coronavirus-related research

April 6, 2020
UPDATE: We have received a lot of interest in this program and have recently taken on four new fellowship teams. While we want to support as many RCSC research projects as possible, we have put a pause on accepting new fellows until we have the capacity to provide intensive support for new teams. You are welcome to submit a proposal to be considered at a future date, and you can contact us if you have questions.

In response to the White House’s call to action for the scientific community to help solve important COVID-19 questions, the CADRE project has created the Research Cohort for the Study of Coronaviruses (RCSC) fellowship program.

The RCSC Program gives any researcher working on COVID-19 or coronavirus-related research the opportunity to work in CADRE’s cloud-based platform and take advantage of a special tier of service. Research teams can consist of graduate students, staff, and faculty from any U.S. or non-U.S. university—teams can span any discipline and institution.

RCSC researchers will be able to:

  • Query across the COVID-19 Open Research Dataset (CORD-19) of scholarly literature in its raw format and parsed into a relational database, which will be updated weekly
  • Query across CADRE’s Web of Science and Microsoft Academic Graph datasets
  • Use CADRE’s cloud-computing resources, GUI query-builder, and Jupyter Notebook coding environment
  • Receive intensive technical support for their work
  • Present their research in the CADRE Fellows Webinar Series

Currently, there is no deadline for proposal submissions, and we plan to accept RCSC research teams as we are able to provide support.

If you are interested in submitting a proposal, you can do so here. You can also read more about the RCSC Program here.

About CADRE

CADRE is an IMLS-funded, cloud-based platform that provides academic libraries and researchers with sustainable, affordable, and standardized text- and data-mining services for licensed big datasets, as well as open and non-consumptive datasets too large or unwieldy to work with in existing research library environments. The platform is currently in its alpha phase and any researchers may test it.

CADRE is led by the Indiana University Libraries in partnership with the Indiana University Network Science Institute and the Big Ten Academic Alliance. The project is also supported by eight Big Ten Academic Alliance library partners, the Web of Science Group, Microsoft Research, and the Midwest, South, and West Big Data Hubs.

You can learn more about the CADRE platform here.

CADRE launches Alpha Version of Open Research Platform

Feb. 10, 2020
The Collaborative Archive & Data Research Environment (CADRE) will release the alpha version of its platform today in tandem with its major workshop that will show researchers from any institution how to access and test the free tier of the platform during alpha.

CADRE’s cloud-based platform provides sustainable, affordable, and standardized text- and data-mining services for licensed big datasets, as well as open and non-consumptive datasets too large or unwieldy to work with in existing research library environments. CADRE is currently seeded with the Web of Science (paid tier), Microsoft Academic Graph, and U.S. Patent and Trademark Office datasets.

The $2 million project began in 2018 with a two-year National Leadership grant from the Institute of Museum and Library Services. CADRE is led by Indiana University Libraries, in partnership with the Indiana University Network Science Institute and the Big Ten Academic Alliance.

A collection of headshots of the CADRE leadership team.

CADRE’s core leadership team includes CADRE Director Jamie V. Wittenberg (IU Libraries) and CADRE Co-Directors Patricia L. Mabry (HealthPartners Institute), Valentin Pentchev (IU Network Science Institute), Xiaoran Yan (IU Network Science Institute), and Robert Van Rennes (Big Ten Academic Alliance).

CADRE’s Alpha Release (Version 0.1.0-Alpha) means users can begin working with Microsoft Academic Graph and using CADRE’s many features at no cost. CADRE’s sponsoring partners will additionally have access to the Web of Science.

“Over the past year, we have dedicated much of our resources to the creation of the CADRE platform and worked hand-in-hand with the CADRE Fellows,” said Pentchev, who leads the technical team building CADRE. “With this alpha release, we are unveiling the first pre-production version of the product. While a lot of components are still under development, we are confident that the alpha release presents a strong foundation with a variety of stable tools and useful features.”

“The Big Ten Academic Alliance is delighted with the progress being made with CADRE, as it’s another great example of what we can achieve when we work collaboratively,” added Van Rennes of the Big Ten Academic Alliance.

At today’s CADRE workshop, CADRE: A One-Stop Shop for Scholarly Data Access, Sharing, and Reproducible Computation, attendees will learn how to work with all of the CADRE features available to them in the alpha version of the platform, such as how to:

  • Access open Microsoft Academic Graph datasets (or Web of Science for certain users)
  • Query datasets with the CADRE Query Builder, CADRE’s user-friendly GUI query builder
  • Code data-analysis and visualization tools in Jupyter Notebook
  • Create a private space to store query outputs, data-analysis tools, and research results
  • Reproduce queries, data-analysis tools, derived data, research results, workflows, and visualizations in the Marketplace
  • Reproduce analysis or visualizations shared by other researchers in the Marketplace

“Those who attended our CADRE tutorial at the ISSI conference in Rome last fall will see a lot of familiar content, as well as some new features and product enhancements, such as package creation,” noted Mabry, one of CADRE’s Co-Directors.

The CADRE logo: A blue owl. Illustration.

CADRE’s alpha testers will be asked to provide feedback on the platform’s functionality, which is crucial for informing CADRE’s continued development and ensuring the platform meets the needs of the broad community of users it is intended to serve.

“While we learned a lot from our close collaboration with the fellows, we are eager for feedback from a broader audience, and that’s why we are doing an open release with the alpha version of the platform,” said Yan, who oversees the team’s interactions with CADRE Fellows.

CADRE will continue to provide intensive support to its CADRE Fellows—the original, interdisciplinary group of researchers who provided use cases to the CADRE team by using the platform to advance their research. Fellows will begin presenting webinars on their work in March.

To learn more about how to sign up for CADRE, visit the Getting Started page and attend the CADRE workshop.

CADRE Partnerships

This project is funded with IMLS award LG-70-18-0202 and is additionally supported by a unique group of cross-industry partners.

CADRE partner institutions include:

  • Midwest Big Data Hub
  • South Big Data Hub
  • West Big Data Hub
  • Microsoft Research
  • Web of Science Group
  • Michigan State University Libraries
  • Ohio State University Libraries
  • Penn State University Libraries
  • Purdue University Libraries
  • Rutgers University Libraries
  • University of Iowa Libraries
  • University of Michigan Libraries
  • University of Minnesota Libraries

Contact:

Stephanie Hernandez McGavin
Email: smcgavin @ iu . edu

Meet CADRE’s first class of fellows

July 18, 2019
The Collaborative Archive & Data Research Environment (CADRE) accepted its first class of CADRE Fellows.

These seven fellowship teams span disciplines and offer compelling research that incorporates big data and bibliometrics. Each fellowship team will access CADRE’s Web of Science (WoS) and Microsoft Academic Graph (MAG) datasets to achieve its research goals.

Our fellows will present their research at the International Society for Scientometrics and Informetrics (ISSI) 2019 Conference in Rome at either the workshop or tutorial that CADRE is hosting on Sept. 2.

Not only will these fellows show how CADRE helped advance their work, but they will also serve as integral use cases for how we develop our platform to suit the needs of every type of academic researcher.

Now, let’s meet the research teams!

Utilizing Data Citation for Aggregating, Contextualizing, and Engaging with Research Data in STEM Education Research from Purdue University
Collection of three headshots in the order of researchers listed below.

Researchers:

  • Michael Witt, associate professor of library science, Purdue Libraries and School of Information Studies, Purdue University
  • Loran Carleton Parker, associate director & senior evaluation and research associate, Evaluation Learning Research Center, Purdue University
  • Ann Bessenbacher, research associate and data scientist, STEMEd HUB, Purdue University

Researchers will characterize citation of data from the literature in the field of STEM education research. A sample of relevant publication venues in the field will be identified from WoS and MAG. Digital Object Identifiers (DOIs) of datasets registered with DataCite will be used to query and associate datasets with publications. The team will assess rates of citation for datasets that are cited using DataCite DOIs for each publication venue and analyze a sample of data citations and publications to determine suitability for providing an initial context to help a researcher who is unfamiliar with the data determine whether to use the dataset.

Understanding citation impact of scientific publications through ego-centered citation networks from Indiana University Bloomington, Nanjing University
Collection of three headshots in the order of researchers listed below.

Researchers:

  • Yi Bu, Ph.D. candidate in informatics, Indiana University Bloomington
  • Chao Min, research assistant professor in information management, Nanjing University in China
  • Ying Ding, professor of informatics, Indiana University

The research team seeks to find the “deeper” and “broader” impact of network-based citation measurements in the scientific community. This project will determine the citation impact of scientific publications using an ego-centered citation network, which contains the citing relationships between a publication and its citing publications, as well as the relationships within its citing publications. Researchers will use the entirety of the WoS and MAG data to establish empirical evidence in this project.
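As a rough illustration of the structure described above, the following Python sketch builds an ego-centered citation network from a plain list of citation pairs. It is a minimal sketch under stated assumptions, not the research team's method: the function name `ego_citation_network`, the `(citing_id, cited_id)` edge format, and the toy identifiers are all hypothetical, and real WoS or MAG records would first need to be exported into this shape.

```python
def ego_citation_network(citations, focal):
    """Return the citing publications and edges of the ego network of `focal`.

    `citations` is an iterable of (citing_id, cited_id) pairs (assumed format).
    """
    citations = set(citations)
    # Publications that cite the focal publication.
    citers = {src for src, dst in citations if dst == focal}
    # Citing relationships between the focal publication and its citers.
    ego_edges = {(src, focal) for src in citers}
    # Citing relationships within the set of citing publications.
    inner_edges = {
        (src, dst) for src, dst in citations
        if src in citers and dst in citers
    }
    return citers, ego_edges | inner_edges


# Hypothetical toy data: B, C, and D cite A; E cites only B.
toy = [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B"), ("D", "C"), ("E", "B")]
citers, edges = ego_citation_network(toy, "A")
print(sorted(citers))
print(sorted(edges))
```

In this toy example, publication E is left out because it does not cite the focal publication A, even though it cites one of A's citing publications.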

MCAP: Mapping Collaborations and Partnerships in SDG Research from Michigan State University
Collection of four headshots in the order of researchers listed below.

Researchers:

  • Jane Payumo, academic specialist and research and data evaluation manager, MSU AgBioResearch, Michigan State University
  • Devin Higgins, digital library programmer, MSU Libraries, Michigan State University
  • Scout Calvert, data librarian, MSU Libraries, Michigan State University
  • Guangming He, information management analyst, MSU Innovation Center, Michigan State University

This project will build on the WoS report “Navigating the Structure of Research on Sustainable Development Goals (SDG),” as the researchers search for patterns of global collaboration and support the United Nations’ SDG call for action. Researchers will design a prototype to analyze and visualize the input-output of partnerships over time in SDG-supportive research. They also plan to create a scoring measure or partnership index that defines and conducts partnership analytics for SDGs by using data sourced from WoS and MAG.

The global network of air links and scientific collaboration – a quasi-experimental analysis from Indiana University Bloomington and University of Warsaw
Collection of four headshots in the order of researchers listed below.

Researchers:

  • Katy Börner, Victor H. Yngve distinguished professor of engineering & information science, Indiana University Bloomington
  • Adam Ploszaj, assistant professor at the Centre for European Regional and Local Studies, University of Warsaw
  • Lisel Record, associate director, Cyberinfrastructure for Network Science Center
  • Bruce Herr II, senior system architect and project manager, Cyberinfrastructure for Network Science Center

Researchers plan to determine the impact of the introduction and availability of long-distance flights on international scientific collaboration. The team will measure collaboration through co-authorship and co-affiliation. They will also geocode publication affiliations from WoS and MAG from 1998 through 2017. This quasi-experimental research will apply state-of-the-art causal modeling techniques and explore how data-driven causality can enhance science of science policy relevance.

Measuring and Modeling the Dynamics of Science Using the CADRE Platform from University of Minnesota, New York University, Boston University, University of Pennsylvania, University of Arizona
Collection of five headshots in the order of the first five researchers listed below.
Collection of four headshots in the order of the last four researchers listed below.

Researchers:

  • Russell Funk, assistant professor of strategic management & entrepreneurship, University of Minnesota
  • Michael Park, Ph.D. student in strategic management and entrepreneurship, University of Minnesota
  • Thomas Gebhart, Ph.D. student in computer science and engineering, University of Minnesota
  • Britta Glennon, assistant professor at Wharton School, University of Pennsylvania
  • Julia Lane, professor at Wagner Graduate School of Public Service, New York University
  • Raviv Murciano-Goroff, assistant professor at Questrom School of Business, Boston University
  • Matthew Ross, research assistant professor at Wagner Graduate School of Public Service, New York University
  • Erin Leahey, professor and director of sociology, University of Arizona
  • Jina Lee, Ph.D. student in sociology, University of Arizona

This research team wants to better characterize scientific influence of papers, typically measured by how many times papers are cited, by distinguishing between papers that destabilize existing knowledge with novel concepts and papers that consolidate existing knowledge. In a separate but closely related aim, the researchers also plan to create a novel unsupervised machine learning technique for author-name disambiguation by pulling abstract, title, and citation data from WoS and MAG. For both aims, the CADRE platform will provide essential infrastructure in terms of large-scale data storage and high performance computational resources.
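One way to picture the distinction between destabilizing and consolidating papers is a disruption-style score computed from citation pairs. The announcement does not specify the team's actual measure, so the Python sketch below is only an assumed illustration: later papers that cite the focal paper but none of its references count toward destabilization, while later papers that cite both count toward consolidation. The function name, the edge format, and the toy identifiers are hypothetical, and the sketch assumes every citing paper in the input was published after the focal paper.

```python
def disruption_score(focal, references, citations):
    """Score in [-1, 1]: positive leans destabilizing, negative consolidating.

    `references` holds the ids cited by `focal`; `citations` is an iterable of
    (citing_id, cited_id) pairs (assumed format). Returns None when no later
    paper cites the focal paper or its references.
    """
    refs = set(references)
    cites_focal = set()
    cites_refs = set()
    for src, dst in citations:
        # Skip the focal paper's own citations and citations among its references.
        if src == focal or src in refs:
            continue
        if dst == focal:
            cites_focal.add(src)
        elif dst in refs:
            cites_refs.add(src)
    only_focal = cites_focal - cites_refs   # cite the focal paper alone
    both = cites_focal & cites_refs         # cite the focal paper and its references
    total = len(cites_focal | cites_refs)
    if total == 0:
        return None
    return (len(only_focal) - len(both)) / total


# Hypothetical toy data: F cites R1 and R2; P1..P4 are later papers.
toy = [
    ("F", "R1"), ("F", "R2"),
    ("P1", "F"), ("P2", "F"), ("P2", "R1"),
    ("P3", "R2"), ("P4", "F"),
]
score = disruption_score("F", {"R1", "R2"}, toy)
print(score)
```

Papers that cite only the focal paper's references (P3 here) count in the denominator but not the numerator, so they pull the score toward zero.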

Comparative analysis of legacy and emerging journals in mathematical biology from University of Michigan and University of Michigan Medical School
Collection of four headshots in the order of the researchers listed below.

Researchers:

  • Marisa Conte, assistant director of research & informatics, Taubman Health Sciences Library, University of Michigan
  • Samuel Hansen, mathematics and statistics librarian, Shapiro Science Library, University of Michigan
  • Scott Martin, biological sciences librarian, Shapiro Science Library, University of Michigan
  • Santiago Schnell, John A. Jacquez collegiate professor of physiology, University of Michigan Medical School

Researchers will perform a comparative analysis on papers published in four mathematical biology legacy journals and on newer journals with different publication models and disciplinary scope. The team will use the CADRE datasets to develop methodologies for comparative bibliometrics and content analyses; provide insight into publication trends in theoretical and applied domains; give authors new factors to consider when trying to publish; and help editors in similar disciplines use informatics to distinguish their journals.

Systematic over-time study of the similarities and differences in research across mathematics and the sciences from University of Michigan
Headshot of the researcher listed below.

Researcher:

  • Samuel Hansen, mathematics and statistics librarian, Shapiro Science Library, University of Michigan

Samuel’s project uses reference and citation aging, bibliographic coupling, and network breadth and depth to find similarities and differences between research fields in mathematics and the sciences. Specifically, they will find how information ages differently across disciplines, generate data about changes in the development of these research fields, and study how actively collaborative the disciplines are. Samuel will use WoS data from 1900 to 2017 to perform these analyses, which have typically only been done on a smaller scale in a single discipline.
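Bibliographic coupling, one of the measures named above, treats two papers as related when their reference lists overlap, and the number of shared references is the coupling strength. The Python sketch below shows that calculation on toy data. It is a minimal sketch rather than the project's implementation: the function name, the dictionary input format, and the paper identifiers are hypothetical.

```python
from itertools import combinations


def bibliographic_coupling(reference_lists):
    """Map each coupled pair of papers to its number of shared references.

    `reference_lists` maps a paper id to the ids it cites (assumed format).
    Pairs with no shared references are omitted.
    """
    refs = {paper: set(cited) for paper, cited in reference_lists.items()}
    strengths = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


# Hypothetical toy data: P1 and P2 share two references; P3 shares none.
toy = {"P1": ["X", "Y", "Z"], "P2": ["Y", "Z"], "P3": ["W"]}
coupling = bibliographic_coupling(toy)
print(coupling)
```

Comparing every pair of papers is fine for a small sample, but at the scale of the full WoS dataset an inverted index from each reference to the papers citing it would likely be needed.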

Purdue Research Team Among First Class of Fellows for Collaborative Archive Data Research Environment (CADRE)

July 18, 2019
A team of Purdue University researchers is among the seven fellowship teams selected for the first class of the Collaborative Archive Data Research Environment (CADRE) Fellows.

These seven fellowship teams span disciplines and offer compelling research that incorporates big data and bibliometrics. Each fellowship team will access CADRE’s Web of Science (WoS) and Microsoft Academic Graph (MAG) datasets to achieve its research goals.

Purdue University members of the first class of CADRE Fellows, L to R: Michael Witt, Loran Carleton Parker, and Ann Bessenbacher

The three-member Purdue University team will work on the project, “Utilizing Data Citation for Aggregating, Contextualizing, and Engaging with Research Data in STEM Education Research.” The researchers are:

  • Michael Witt, associate professor of library science, Purdue Libraries and School of Information Studies, Purdue University
  • Loran Carleton Parker, associate director and senior evaluation and research associate, Evaluation Learning Research Center (ELRC), College of Education, Purdue University
  • Ann Bessenbacher, research associate and data scientist (ELRC), STEMEd HUB, Purdue University

Per the description of their project: “Researchers will characterize citation of data from the literature in the field of STEM education research. A sample of relevant publication venues in the field will be identified from WoS and MAG. Digital Object Identifiers (DOIs) of datasets registered with DataCite will be used to query and associate datasets with publications. The team will assess rates of citation for datasets that are cited using DataCite DOIs for each publication venue and analyze a sample of data citations and publications to determine suitability for providing an initial context to help a researcher who is unfamiliar with the data determine whether to use the dataset.”

The other six teams and their CADRE research projects are listed at https://blogs.libraries.indiana.edu/sbd-gateway/2019/07/18/cadre-first-fellows/.

The Fellows will present their research at the International Society for Scientometrics and Informetrics (ISSI) 2019 Conference in Rome at either the workshop or tutorial that CADRE is hosting on Sept. 2.

Not only will these fellows show how CADRE helped advance their work, but they will also serve as integral use cases for how the CADRE platform is developed to suit the needs of every type of academic researcher.

Made Possible in Part by IMLS

The Shared BigData Gateway for Research Libraries (SBD-G) is a two-year Institute of Museum and Library Services-funded project to develop, seed, and maintain a cloud-based, extendable cyberinfrastructure for sharing large academic library data resources with a growing community of scholars.

SBD-G will achieve this through its platform, the Collaborative Archive & Data Research Environment (CADRE).

Source: Purdue University Libraries

Collaborative Archive & Data Research Environment Fellowship Program Announced by IU

May 6, 2019
Indiana University’s Network Science Institute has recently announced its CADRE Fellowship Program.

CADRE, or the Collaborative Archive & Data Research Environment, is ready to provide early access to a limited set of researchers. We invite you to collaborate with our team of experts and use our state-of-the-art big bibliometric data and cloud computing environment for your next research project!

CADRE aims to provide sustainable, affordable, and standardized data and analytic services for open and licensed big bibliometric data. To better understand the needs of our users, including but not limited to Scientometrics/Science of Science researchers, librarians, and other research technical service providers, we would like to take the opportunity to announce our CADRE fellowship program.

As a CADRE Fellow, you will

  • Gain access to the latest bibliometric data sets, including Web of Science and Microsoft Academic Graph
  • Receive data and technical support for your project, including training webinars on CADRE tools and data sets
  • Join the CADRE community with other fellows, and share your ideas and feedback with the CADRE team on Slack channels and in GitHub repositories
  • Have early access to free cloud computing resources as we update and test different components of the CADRE platform
  • Receive travel scholarships to present your work at premier venues

Applications are due May 31.

Rutgers University Libraries are part of the Shared BigData Gateway for Research Libraries, a public-private partnership led by Indiana University Libraries and the IU Network Science Institute. To learn more, visit the IU news site.

Source: Rutgers University Library News

IU Leading Partnership to Create Research Database

Oct. 18, 2018
An effort to create a secure online database for academic resources has received a boost after being awarded nearly $850,000 from the Institute of Museum and Library Services.

The funding will support the $2 million Shared BigData Gateway for Research Libraries, which will enable researchers to gain access to data through the cloud-based platform being created by a partnership led by the IU Libraries and IU Network Science Institute.

Additional funding is coming from eight other universities in the Big Ten, the Big Ten Academic Alliance, and the National Science Foundation’s Big Data Regional Innovation Hubs program, along with two private companies, Microsoft Research and Clarivate Analytics.

“This project exemplifies the role of libraries in the information age,” said Jamie Wittenberg, research data management librarian and head of scholarly communication at IU Libraries, who will direct the project. “Our mission is to efficiently and effectively connect researchers with the materials they need to advance innovation and discovery. The Shared BigData Gateway for Research Libraries will open up the power of data mining to everyone, not only people with specialized expertise.”

The Shared BigData Gateway will also offer a “front door” to allow members to ask for bibliometric data analysis through an online form. The project aims to automate many of the tasks required to complete the research. Another feature allows the sharing of data, including software code, algorithms, methods and workflows.

Source: Inside INdiana Business

IU will lead $2 million partnership to expand access to research data

Oct. 18, 2018
IU Libraries and IU Network Science Institute are leading a public-private partnership to create the Shared BigData Gateway for Research Libraries.

Students, faculty and researchers across the Midwest and beyond will gain critical access to new research data through a cloud-based platform whose construction has been made possible under a large-scale partnership led by the IU Libraries and IU Network Science Institute.

A $2 million project to create a secure online database for academic resources, the Shared BigData Gateway for Research Libraries has been awarded nearly $850,000 from the Institute of Museum and Library Services, the primary federal funding agency supporting the nation’s libraries and museums. Additional support comes from eight other universities in the Big Ten; the Big Ten Academic Alliance; the National Science Foundation’s Big Data Regional Innovation Hubs program; and two private companies: Clarivate Analytics and Microsoft Research.

“This project exemplifies the role of libraries in the information age,” said Jamie Wittenberg, research data management librarian and head of scholarly communication at IU Libraries, who will direct the project. “Our mission is to efficiently and effectively connect researchers with the materials they need to advance innovation and discovery. The Shared BigData Gateway for Research Libraries will open up the power of data mining to everyone, not only people with specialized expertise.”

“The combination of technical expertise and investments represented under this partnership will support a cyberinfrastructure that advances research across the Midwest and beyond,” added Patricia Mabry, a senior research scientist at the IU Network Science Institute and a co-director on the project. “We’re also taking steps to support the effort through workshops that cultivate a community of researchers and librarians who will cooperatively play a role in the project’s future development and growth.”

The university partners are Michigan State University, Purdue University, University of Iowa, University of Michigan, University of Minnesota, The Ohio State University, Pennsylvania State University and Rutgers University. Additional project co-directors are Valentin Pentchev, director of information technology at the IU Network Science Institute, and Xiaoran Yan, an assistant research scientist at the institute.

IU currently offers access to some of the resources that will open up to new partners through a system developed by the IT team at the IU Network Science Institute. Led by Pentchev, the Secure Enclave for Critical Data is the nation’s first universitywide implementation of the entire Clarivate Analytics Web of Science, a private database with over 68 million records spanning more than 100 years.

A groundbreaking work of software engineering, the strength of the university’s secure system was a key factor in garnering grant support from IMLS. The award will make possible the use of cloud technology to scale out the enclave with additional open data sets, extending access to every research library in the country.

The first new materials to be added to the Shared BigData Gateway are a copy of the records of the U.S. Patent and Trademark Office, which contain data on publicly available patents and intellectual property, and the Microsoft Academic Graph, a public database of 160 million scientific records.

Access to these resources will be based on a federated security system that will enable users from multiple organizations to access the system with their institutional usernames and passwords. Members of the Big Ten Academic Alliance will use the gateway to access and mine shared Clarivate XML citation data, purchased cooperatively in 2017. Some data sets will be accessible to anyone with a .edu email address.

The ability to deeply analyze connections between these texts will support bibliometric research, a growing field that plumbs the world’s increasingly large and complex databases to reveal the underlying structural forces that affect the production of scientific knowledge. This work — often called the “science of science” — has shed light on a wide range of subjects. For example, bibliometric analysis has helped reveal the depth of women’s historical contributions to science and the influence of large-scale historical events on research activity.

In addition to data access, the Shared BigData Gateway will provide a user-friendly “front door” through which the partner institution members can request bibliometric analysis of data in the system through an online form. The project will automate many complex and time-consuming tasks that were previously required to conduct this research.

Another important feature of the system is the power to share data. Individuals who use the platform will not only be able to share the results of their analyses, but also the software code, algorithms, workflows, methods, and the specific software versions and configurations used to run their analyses. This is critical for making the work reproducible — as well as helping the original researchers refine their methods for other projects.

Also contributing expertise to the project will be IT experts at Microsoft, Clarivate and several units at IU, including the Research Data Services group; Science Gateways Research Center; Pervasive Technology Institute; and University Information Technology Services, or UITS. UITS will also contribute to the Shared BigData Gateway through access to the university’s supercomputing resources and cloud-computing platform, Jetstream.

Other organizations supporting the project include the Greater Western Library Alliance and the Private Academic Library Network of Indiana.

Additional quotes

“As centers of learning and catalysts of community change, libraries and museums connect people with programs, services, collections, information and new ideas in the arts, sciences and humanities,” said Kathryn K. Matthew, director of the IMLS. “IMLS is proud to support their work through our grant-making as they inform and inspire all in their communities.”

“Co-investment in infrastructure accelerates our ability to recognize and support new forms of inquiry in scholarship,” said Kimberly Armstrong, director of the libraries initiative at the Big Ten Academic Alliance. “Given faculty interest in text mining of the cooperatively purchased citation data, we are glad to support delivery of a tested access solution across the Alliance.”

Source: News at IU

Ithaka’s December Big Data Infrastructure Report

Last month, Ithaka S&R published “Big Data Infrastructure at the Crossroads: Support Needs and Challenges for Universities.” CADRE exemplifies the kind of big data infrastructure that this report celebrates and encourages. What direction or inspiration does this report give to CADRE’s future work? Here are some of our key takeaways.

1 Human networks

“Human networks are as essential to big data research as computing networks.” Cultivating and managing collaborative projects is incredibly difficult across borders of discipline, department, and institution. CADRE has a talented, hard-working outreach team that consists of:

  • Outreach Coordinator Maks Szostalo
  • Research Scientist Filipi Silva
  • Data Librarian & Network Scientist Ethan Fridmanski

This interdisciplinary group completes outreach tasks under the direction of our Executive Director, Jaci Wilkinson, a librarian whose background is in user experience and digital content strategy. Our number one strategic priority in the next year is to identify new CADRE user groups, particularly at current member institutions. This requires extensive research to locate labs, institutes, libraries, and other networks that live in occasionally isolated pockets within their respective institutions. CADRE has untapped potential for big data researchers in the following areas:

  1. Innovation science
  2. Computational social science
  3. Digital humanities/history of science
  4. And more!

In another portion of the Ithaka report, they write, “Many researchers expressed hopes that more repositories would become available, or that their libraries would purchase more subscription databases.” Here at CADRE, we’re certain there are researchers who would find value in using datasets available in CADRE, whether it be Microsoft Academic Graph, USPTO patent data, or Web of Science, who just don’t know we exist yet.

2 “It honestly hadn’t occurred to me to think of the library as a resource for big data tools”

We need to change the perception that libraries are simply beautiful book buildings. Libraries are hubs of resources and expertise that can meet researchers’ sophisticated big data needs! CADRE is rooted in the innovation and values of academic libraries. Membership is administered by libraries and our leaders are librarians. Unfortunately, Ithaka reports that there is “uneven awareness and modest use” of the data management staff and programming that libraries have been heavily investing in. Here at CADRE, we are re-calibrating our outreach efforts in two important ways:

  • Helping our library partners conduct outreach with the rest of their campus community to ensure maximum CADRE use
  • Extending our own outreach to other institutes, labs, and centers, particularly within current member institutions

3 Cloud computing… We know… it’s expensive

For financial and technical reasons, many university labs create and maintain their own computing infrastructure. This decentralization results in duplicative effort across academia. Here at CADRE, we host all of our datasets in the cloud using Azure and Amazon Web Services. This is incredibly useful for our researchers, but it is also expensive and difficult to budget for because charges are based on use. Over 50% of our budget goes to cloud computing. We’re exploring tools to reduce our cloud computing costs and membership modifications so we don’t have to pass cloud computing surcharges to our members. More recent data enclaves, such as OpenAlex, have adopted an open access, self-hosting model. That means each user or institution sets up and pays for its own cloud computing and storage. Is the OpenAlex model a more sustainable future for big dataset administration? CADRE is paying close attention to OpenAlex and our other peers in this quickly changing environment.

4 Help us fuel the future of CADRE

A key recommendation to funders in this report is, “Continue to support the robust development of data repositories.” We couldn’t agree more! We’re seeking grant and private funding to build on the vital work CADRE has done over the past three years to centralize the infrastructure that hosts the USPTO, MAG, and Web of Science datasets. There’s more we can do to optimize our value to our current and future member community, but our current budget comes only from membership fees and barely covers the cost of hosting CADRE. Reach out to us at cadre@iu.edu if you want to discuss new funding and strategic directions.

Final Draft of the CADRE Roadmap, October 2021–January 2023

8/30/21

The CADRE Leadership Team has developed a stopgap strategic plan, the CADRE Roadmap, that will govern CADRE’s work from October 2021 – January 2023. This document is the culmination of extensive research, focus groups, and interviews with CADRE stakeholders. The Roadmap is available as a five-page PDF.

CADRE wants to continue to add datasets that are useful for our researchers and partner institutions. Do you have a dataset you’d like to be considered for CADRE? Let us know.

USPTO Patent Data Coming to CADRE

4/9/21

The latest dataset is making its way to CADRE: the U.S. Patent and Trademark Office’s (USPTO) patent dataset will be available for easy querying on the CADRE Gateway on April 15, 2021.

The dataset is part of CADRE’s free tier, so anyone, regardless of institutional affiliation, can access 7.4 million patent application documents from 1976 to 2020, which include titles, abstracts, and referenced works.

Researchers focused on intellectual property, innovation, and collaboration, among other topics, may be interested in this dataset. We originally identified the USPTO patent dataset because of the known needs of the Big Ten Academic Alliance researcher community. Open and accessible government datasets also contribute to better transparency and accountability in government agencies.

Once the USPTO patent data are available on the platform, users will be able to run CADRE’s GUI query builder and data-analysis and -visualization tools on the data. We’ll show researchers how to use CADRE to mine USPTO data at our webinar on April 22.

Coming soon

Next up, the CADRE team will be working on functionality that allows users to attach Digital Object Identifiers (DOIs) to their packages, making the research citable and reproducible. We’ll also be updating our Web of Science and Microsoft Academic Graph datasets and finalizing a querying enhancement that allows users to perform an address search against datasets to track publications from their own institutions’ authors. Stay tuned!

CADRE wants to continue to add datasets that are useful for our researchers and partner institutions. Do you have a dataset you’d like to be considered for CADRE? Let us know.