The CADRE team is hard at work developing a platform that will do what academic libraries have long been trying to achieve.
We are gearing up for ISSI 2019 in September, where CADRE will hold a workshop and tutorial. Our hands-on tutorial will offer two ways to access Microsoft Academic Graph (MAG): assisted programming, or the CADRE Query Builder, which provides a graphical user interface (GUI).
But building a platform that allows novice coders to easily query massive datasets with a GUI is a lengthy process of trial and error—and that is only one component of CADRE.
Building CADRE is a complex and fluid task: Along with the Web of Science (WoS) and MAG datasets, CADRE will include U.S. patent and trademark data. And more datasets will be added to the platform as different types of researchers request access.
IUNI Lead Software Engineer Ben Serrette says that because CADRE may take on more datasets, its software solutions must be as generic and adaptable as possible. Like someone fitting and refitting the pieces of a complex puzzle that keeps changing shape, the IUNI IT team is solving multi-faceted problems with a flexible approach on an enormous scale.
Find out how they’re doing it below.
Ruling out what doesn’t work: The IT team narrowed down the many serverless technology options that cloud-computing platform Amazon Web Services (AWS) offers by eliminating the ones that don’t fit the bill in terms of cost or ability to interface with other CADRE technical components.
Designing fundamental cloud architecture: CADRE’s infrastructure of cloud-based virtual machines and AWS services has been developed.
Integrating Jupyter Notebooks & file storage: Advanced users can write their own code to create data-analysis tools in CADRE’s notebook. Jupyter Notebooks is up and running with a working file system for storing code.
Testing the query builder: One service essential to CADRE is the GUI that lets users easily query massive datasets. The IT team is testing the combined powers of a relational database and various graph databases with MAG data to create a more efficient query builder.
Building CADRE’s website: The IT team is finishing the front end of some pages of the CADRE website, including the homepage and the event page for ISSI 2019. They are preparing to make the website live in a couple of weeks.
If you want to stay updated on what CADRE is doing, be sure to follow us on Twitter.
The Collaborative Archive & Data Research Environment (CADRE) accepted its first class of CADRE Fellows.
These seven fellowship teams span disciplines and offer compelling research that incorporates big data and bibliometrics. Each fellowship team will access CADRE’s Web of Science (WoS) and Microsoft Academic Graph (MAG) datasets to achieve its research goals.
Our fellows will present their research at the International Society for Scientometrics and Informetrics (ISSI) 2019 Conference in Rome at either the workshop or tutorial that CADRE is hosting on Sept. 2.
Not only will these fellows show how CADRE helped advance their work, they will also serve as integral use cases for how we develop our platform to suit the needs of every type of academic researcher.
We plan to accept fellows on a rolling basis in the future, as spots become available. If you are interested in applying, email us at email@example.com.
Now, let’s meet the research teams!
Utilizing Data Citation for Aggregating, Contextualizing, and Engaging with Research Data in STEM Education Research from Purdue University
Michael Witt, associate professor of library science, Purdue Libraries and School of Information Studies, Purdue University
Loran Carleton Parker, associate director & senior evaluation and research associate, Evaluation Learning Research Center, Purdue University
Ann Bessenbacher, research associate and data scientist, STEMEd HUB, Purdue University
Researchers will characterize citation of data from the literature in the field of STEM education research. A sample of relevant publication venues in the field will be identified from WoS and MAG. Digital Object Identifiers (DOIs) of datasets registered with DataCite will be used to query and associate datasets with publications. The team will assess citation rates for datasets cited with DataCite DOIs in each publication venue. They will also analyze a sample of data citations and publications to determine whether these can provide enough initial context for a researcher who is unfamiliar with the data to decide whether to use the dataset.
Understanding citation impact of scientific publications through ego-centered citation networks from Indiana University Bloomington, Nanjing University
Yi Bu, Ph.D. candidate in informatics, Indiana University Bloomington
Chao Min, research assistant professor in information management, Nanjing University in China
Ying Ding, professor of informatics, Indiana University
The research team seeks to find the “deeper” and “broader” impact of network-based citation measurements in the scientific community. This project will determine the citation impact of scientific publications using an ego-centered citation network, which contains the citing relationships between a publication and its citing publications, as well as the relationships within its citing publications. Researchers will use the entirety of the WoS and MAG data to establish empirical evidence in this project.
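To make the idea of an ego-centered citation network concrete, here is a minimal Python sketch. It is only an illustration of the definition given above, not the fellows' actual code: the function name, the data layout (a dictionary mapping each paper to the set of papers it cites), and the toy paper IDs are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple


def ego_citation_network(focal: str, cites: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return directed (citing, cited) edges of the ego-centered network of `focal`.

    `cites` maps a paper ID to the set of paper IDs it cites (hypothetical layout).
    """
    # Papers that cite the focal publication.
    citers = {paper for paper, refs in cites.items() if focal in refs}

    # Edges from each citing paper to the focal publication.
    edges = {(paper, focal) for paper in citers}

    # Edges among the citing papers themselves.
    for paper in citers:
        for cited in cites[paper] & citers:
            edges.add((paper, cited))
    return edges


# Toy data (hypothetical): F is the focal paper.
cites = {
    "A": {"F"},
    "B": {"F", "A"},
    "C": {"F", "A", "B"},
    "D": {"A"},  # cites a citing paper but not F, so it stays outside the network
}
edges = ego_citation_network("F", cites)
```

In this toy example the network has six edges: three from A, B, and C to the focal paper, plus three among the citing papers (B to A, C to A, and C to B).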
MCAP: Mapping Collaborations and Partnerships in SDG Research from Michigan State University
Jane Payumo, academic specialist and research and data evaluation manager, MSU AgBioResearch, Michigan State University
Devin Higgins, digital library programmer, MSU Libraries, Michigan State University
Scout Calvert, data librarian, MSU Libraries, Michigan State University
Guangming He, information management analyst, MSU Innovation Center, Michigan State University
This project will build on the WoS report “Navigating the Structure of Research on Sustainable Development Goals (SDG),” as the researchers search for patterns of global collaboration and support the United Nations’ SDG call for action. Researchers will design a prototype to analyze and visualize the input-output of partnerships over time in SDG-supportive research. They also plan to create a scoring measure or partnership index that defines and conducts partnership analytics for SDGs by using data sourced from WoS and MAG.
The global network of air links and scientific collaboration – a quasi-experimental analysis from Indiana University Bloomington and University of Warsaw
Katy Börner, Victor H. Yngve distinguished professor of engineering & information science, Indiana University Bloomington
Adam Ploszaj, assistant professor at the Centre for European Regional and Local Studies, University of Warsaw
Lisel Record, associate director, Cyberinfrastructure for Network Science Center
Bruce Herr II, senior system architect and project manager, Cyberinfrastructure for Network Science Center
Researchers plan to determine the impact of the introduction and availability of long-distance flights on international scientific collaboration. The team will measure collaboration through co-authorship and co-affiliation. They will also geocode publication affiliations from WoS and MAG from 1998 through 2017. This quasi-experimental research will apply state-of-the-art causal modeling techniques and explore how data-driven causality can enhance science of science policy relevance.
Measuring and Modeling the Dynamics of Science Using the CADRE Platform from University of Minnesota, New York University, Boston University, University of Pennsylvania, University of Arizona
Russell Funk, assistant professor of strategic management & entrepreneurship, University of Minnesota
Thomas Gebhart, Ph.D. student in computer science and engineering, University of Minnesota
Michael Park, Ph.D. student in strategic management and entrepreneurship, University of Minnesota
Julia Lane, professor at Wagner Graduate School of Public Service, New York University
Raviv Murciano-Goroff, assistant professor at Questrom School of Business, Boston University
Matthew Ross, research assistant professor at Wagner Graduate School of Public Service, New York University
Britta Glennon, assistant professor at Wharton School, University of Pennsylvania
Erin Leahey, professor and director of sociology, University of Arizona
Jina Lee, Ph.D. student in sociology, University of Arizona
This research team wants to better characterize the scientific influence of papers, typically measured by how many times papers are cited, by distinguishing between papers that destabilize existing knowledge with novel concepts and papers that consolidate existing knowledge. In a separate but closely related aim, the researchers also plan to create a novel unsupervised machine learning technique for author-name disambiguation by pulling abstract, title, and citation data from WoS and MAG. For both aims, the CADRE platform will provide essential infrastructure in terms of large-scale data storage and high-performance computational resources.
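One way to picture the difference between destabilizing and consolidating papers is a disruption-style score. The Python sketch below is an assumption on our part, not necessarily the measure this team will use. It compares later papers that cite only the focal paper (count `n_i`) with later papers that cite both the focal paper and its references (`n_j`), and also counts later papers that cite only the references (`n_k`). The score is (n_i - n_j) / (n_i + n_j + n_k). The function name, data layout, and toy IDs are hypothetical.

```python
from typing import Dict, Set


def disruption_score(focal: str, cites: Dict[str, Set[str]]) -> float:
    """Disruption-style score in [-1, 1] for `focal`.

    `cites` maps a paper ID to the set of paper IDs it cites. This sketch
    assumes every paper other than `focal` in `cites` was published after it.
    """
    focal_refs = cites.get(focal, set())
    n_i = n_j = n_k = 0
    for paper, cited in cites.items():
        if paper == focal:
            continue
        cites_focal = focal in cited
        cites_refs = bool(cited & focal_refs)
        if cites_focal and not cites_refs:
            n_i += 1  # builds on the focal paper alone
        elif cites_focal and cites_refs:
            n_j += 1  # builds on the focal paper and its predecessors
        elif cites_refs:
            n_k += 1  # bypasses the focal paper
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Toy data (hypothetical): F cites R1 and R2; A through D are later papers.
cites = {
    "F": {"R1", "R2"},
    "A": {"F"},
    "B": {"F"},
    "C": {"F", "R1"},
    "D": {"R2"},
}
score = disruption_score("F", cites)
```

Here `n_i` is 2, `n_j` is 1, and `n_k` is 1, so the score is 0.25. Positive values lean toward destabilizing, negative values toward consolidating.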
Comparative analysis of legacy and emerging journals in mathematical biology from University of Michigan and University of Michigan Medical School
Marisa Conte, assistant director of research & informatics, Taubman Health Sciences Library, University of Michigan
Samuel Hansen, mathematics and statistics librarian, Shapiro Science Library, University of Michigan
Scott Martin, biological sciences librarian, Shapiro Science Library, University of Michigan
Santiago Schnell, John A. Jacquez collegiate professor of physiology, University of Michigan Medical School
Researchers will perform a comparative analysis on papers published in four mathematical biology legacy journals and on newer journals with different publication models and disciplinary scope. The team will use the CADRE datasets to develop methodologies for comparative bibliometrics and content analyses; provide insight into publication trends in theoretical and applied domains; give authors new factors to consider when trying to publish; and help editors in similar disciplines use informatics to distinguish their journals.
Systematic over-time study of the similarities and differences in research across mathematics and the sciences from University of Michigan
Samuel Hansen, mathematics and statistics librarian, Shapiro Science Library, University of Michigan
Samuel’s project uses reference and citation aging, bibliographic coupling, and network breadth and depth to find similarities and differences between research fields in mathematics and the sciences. Specifically, they will find how information ages differently across disciplines, generate data about changes in the development of these research fields, and study how actively collaborative the disciplines are. Samuel will use WoS data from 1900 to 2017 to perform these analyses, which have typically only been done on a smaller scale in a single discipline.
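For readers new to bibliographic coupling, here is a minimal Python sketch of the standard idea: two papers are coupled when their reference lists overlap, and the coupling strength is the number of references they share. This is an illustration only, not Samuel's code. The function name, data layout, and toy IDs are assumptions.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def coupling_strengths(refs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Map each coupled pair of papers to the number of references they share.

    `refs` maps a paper ID to the set of IDs in its reference list (hypothetical layout).
    """
    strengths: Dict[Tuple[str, str], int] = {}
    # Sorting makes the pair order deterministic, so (a, b) always has a < b.
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


# Toy data (hypothetical paper and reference IDs).
refs = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3", "R4"},
    "P3": {"R5"},
}
strengths = coupling_strengths(refs)
```

In this toy example only P1 and P2 are coupled, with a strength of 2. Comparing every pair is fine for a small sample, but at the scale of WoS it would likely need an inverted index from references to citing papers instead.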
The Collaborative Archive & Data Research Environment (CADRE) is extending the deadline to apply for the CADRE Fellowship Program to June 25. Academic researchers and librarians from any institution are invited to apply.
If you are not familiar with CADRE, we are an IMLS-funded project that provides sustainable, affordable, and standardized data- and text-mining services for licensed big datasets, as well as open and non-consumptive datasets too large or unwieldy to work with in existing research library environments. CADRE offers academic researchers access to these data in a secure cloud-based platform.
The benefits of being a CADRE Fellow include:
Full travel support to present your work at ISSI 2019 in Rome this fall,
Free and early access to our cloud-computing resources,
Access to big bibliometric datasets, including the Web of Science and Microsoft Academic Graph,
And training and technical support for the CADRE platform and for your project.
Please note, your affiliated institution must license the Web of Science data for you to access it—you can contact firstname.lastname@example.org to check if you have access. Microsoft Academic Graph is available to everyone.
This fellowship program will help the CADRE team form expansive relationships with researchers, librarians, and data providers to gain critical feedback on developing the CADRE platform. You do not need extensive programming experience to use CADRE: the platform will provide a user-friendly graphical user interface for data querying.
Applicants can form research teams consisting of graduate students, staff, and faculty from any U.S. or non-U.S. university—and teams can span any discipline and institution. You may also submit a research proposal without a team.
Sound interesting? Submit your CADRE Fellowship proposal here by June 25.