Copyright and Data Curation

Digital technologies have engendered new research methodologies that can render mass collections or assemblages of things as data and analyze them as such. Things such as images, the millions of books on Google Books, or commercial databases of scholarly research articles that were originally created to be viewed or read can now be mined for data, coded, and analyzed statistically.

These new technologies and research methods, like many technologies before them, raise concomitant copyright issues and questions. In addition, the advent of open data policies from the U.S. government, foundations, and other grant funders has prompted researchers to ask who owns data; what, if any, legal protections exist for data; and how other researchers may use such data. These questions arise throughout the life cycle of data, from its creation, to its archiving, to its possible licensing for use by other researchers.

Data and its curation clearly raise other legal issues as well, including privacy, cybersecurity, trade secrets, and patent law. In the context of copyright law, data raises two questions: the subject matter of copyright (what is copyrightable) and its ownership (who owns the copyright in a copyrightable work).

Data v. Databases

By data, I mean the raw content of assembled, collected, or generated material that is to be subjected to statistical analysis and interpretation. Illustrations or representations of the analyzed data in tables, charts, or graphs present related but separate copyright issues.

By databases, I mean the organization of the data: how data elements relate to one another and how they are arranged in a structured set, typically stored in a computer and made accessible and manipulable by means of software applications.

Copyrightability of Data and Databases

U.S. copyright law has very little to say, at least directly, about either data or databases. Instead, it provides a framework for establishing the subject matter of copyright (what is copyrightable) and who owns copyrightable intellectual property once it has been created. It then protects that intellectual property through specifically enumerated exclusive rights granted to copyright owners.

Under the law, copyright protection is granted to “…original works of authorship fixed in any tangible medium of expression…” A lot of data will not be copyrightable because it does not meet the first requirement for copyright protection, namely, originality. While many sources of data, such as images or texts in a database, are of course copyrightable, the data generated from those sources, as well as other data sets generally, does not constitute an “original work of authorship,” as described by the Copyright Act and litigated in numerous cases. This might not make sense to a lot of researchers: if a researcher designs an experiment or study, runs experiments or conducts surveys, collects and compiles the data, isn’t that original, and aren’t they the author of it? Yes, in a certain sense, but not in the sense that is important for copyright. Copyright is intended to incentivize the publication and distribution of creative works. Facts and data aren’t considered original works of authorship because they are not “created” so much as they are “engendered” by or are a result of a researcher’s methods. They are discovered and compiled, and copyright does not reward that effort.

Moreover, data is typically factual or informational, and U.S. copyright does not protect facts or information. It is not possible to copyright facts, ideas, procedures, processes, methods, systems, concepts, formulas, algorithms, principles or discoveries, although such things might be protectable by patent law.

Similarly, while U.S. copyright law does protect compilations, Congress has not seen fit to extend copyright protection to databases themselves. There could nevertheless be a thin layer of copyright protection in a database, premised on choices about which data to include, how to organize the data, or how to define the relationships between different data elements. Such creative decisions potentially meet the requirements for copyrightability and copyright protection.

Ownership and Protection of Data and Databases

Because the copyrightability of databases and data content varies, and because copyright protects only copyrightable works, different strategies are required to manage the ownership and protection of data and databases. Copyright can govern the use of databases and of any data content that is “an original work of authorship.” Other mechanisms must be relied on to regulate access to, and use of, everything else. Typically these are access controls enforced through authentication, together with contracts and licensing agreements that restrict the extraction and reuse of the data or other contents of a database.

Data Curation and Licensing

Ideally, repository collections of data will provide information regarding the terms of use for the database and its data content. The Open Data Commons group (http://opendatacommons.org) has developed three standard licenses based on copyright and contract principles. They are:
1. Public Domain Dedication and License (PDDL): This dedicates the database and its content to the public domain, free for everyone to use as they see fit.
2. Attribution License (ODC-By): Users are free to use the database and its content in new and different ways, provided they attribute the source of the data and/or the database.
3. Open Database License (ODC-ODbL): ODbL stipulates that any subsequent use of the database must provide attribution, an unrestricted version of the new product must always be accessible, and any new products made using ODbL material must be distributed using the same terms. It is the most restrictive of all ODC licenses.
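
For a repository that records terms of use alongside each data set, the three licenses above can be sketched as a small lookup table. This is only an illustration of the summaries given here, not of the license texts themselves: the class name `OdcLicense`, its field names, and the `obligations` helper are hypothetical, and the obligation wording is a paraphrase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OdcLicense:
    """Hypothetical record summarizing one Open Data Commons license."""
    short_name: str
    attribution: bool   # must credit the source of the data and/or database
    share_alike: bool   # new products must be distributed on the same terms
    keep_open: bool     # an unrestricted version must remain accessible


# Flags follow the plain-language summaries in the list above (an assumption,
# not a reading of the full license texts).
ODC_LICENSES = {
    "PDDL": OdcLicense("PDDL", False, False, False),
    "ODC-By": OdcLicense("ODC-By", True, False, False),
    "ODC-ODbL": OdcLicense("ODC-ODbL", True, True, True),
}


def obligations(short_name: str) -> list[str]:
    """Return the reuse obligations summarized for the given license."""
    lic = ODC_LICENSES[short_name]
    duties = []
    if lic.attribution:
        duties.append("attribute the source")
    if lic.share_alike:
        duties.append("distribute new products under the same terms")
    if lic.keep_open:
        duties.append("keep an unrestricted version accessible")
    return duties


if __name__ == "__main__":
    for name in ODC_LICENSES:
        print(name, obligations(name))
```

A curator could store the short name in a data set's metadata and use a table like this to display reuse terms, while still linking to the full license for the authoritative wording.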

Leveraging the License: Part II

From 2015 to 2017, I served as co-chair of the Bloomington Faculty Council (BFC) Library Committee. The committee worked for two years to pass, by a unanimous vote of the BFC, the IU Bloomington Open Access Policy.

During my time as co-chair, I spoke with dozens of faculty members, including department chairs and administrators, about the policy. In addition to touting the benefits of Open Access, such as more exposure and potential impact for the scholarship of faculty authors achieved by means of free access and long-term preservation, I routinely described the Open Access policy as ‘symbolic’ and ‘heuristic’.

By symbolic, I wanted to suggest that adoption of the policy would add the moral authority of another large public research university, Indiana University – Bloomington, to the list of U.S. colleges and universities that have adopted such policies.

By heuristic, I meant to express my view that the policy would – and now does – provide an impetus for faculty to think about how they might like to be able to reuse their work in other ways that could be professionally beneficial to them, besides simply transferring their copyright to a journal publisher in return for publication of their scholarly articles. Such uses could include freely distributing their publications through their own professional website, via social media, by means of an institutional or discipline-specific Open Access repository, or simply making them available to students in their classes. The IUB Open Access Policy fosters this goal by providing an institutional mechanism for retaining at least some of a faculty member’s copyrights in their scholarly work.

The policy is not a mandate. Faculty are not required to make their work Open Access. Under the policy, each IUB faculty member grants for themselves, at their discretion, the non-exclusive license articulated by the policy, which permits the university to make their scholarly “articles freely and widely available in an open access repository, provided that the articles are not sold, and appropriate attribution is given to authors.” Because authors can only license their work to the university in keeping with the Open Access policy if they retain enough of their rights to do so, the prior license granted in the policy provides leverage for a faculty member to use when negotiating publishing agreements with journal publishers. This is why Open Access policies, like IUB’s, which are modeled on Harvard University’s policy, are also often referred to as rights-retention policies.

While many publishers now have self-archiving policies that are consistent with the requirements of institutional and government-mandated Open Access policies (see http://www.sherpa.ac.uk/romeo/index.php), it might still be necessary to negotiate with publishers to achieve those ends. If you choose to negotiate your copyright with your publisher, the following suggested statement can be used to begin the discussion:

“Journal acknowledges that Author retains the right to provide a copy of the final manuscript, upon acceptance for Journal publication or thereafter, for compliance with the Indiana University Open Access Policy and for public archiving in IUScholarWorks as soon as possible after publication by Journal.”

This language can be added to amend a journal publishing agreement. Alternatively, IU provides a suitable form of addendum used in copyright negotiations at Big Ten Academic Alliance (formerly CIC) institutions. SPARC, the Scholarly Publishing and Academic Resources Coalition, also offers an author addendum with supporting documentation. Whether or not you use one of these addenda, the license to IUB will have force unless you complete the opt-out process. For information about opting out or obtaining a waiver letter, visit https://openscholarship.indiana.edu/.

A faculty author could have legitimate reasons to elect to opt out of the Open Access policy. One of the most prevalent reasons is the inclusion of third-party intellectual property quoted or included in a scholarly article under license from a copyright owner. Some common examples include an image or a musical excerpt. Licensing such content can be prohibitively expensive if the article is to be published in an Open Access repository. And while it is possible to deposit a faculty author’s final edited version of a scholarly article without any third-party content that exceeds fair use or is covered by a licensing agreement, an author might legitimately be concerned that the value of their article would be undermined by doing so. If an author cannot secure a license to make third-party intellectual property included in their work available with their article in an Open Access repository, they should opt out of the policy when reporting their work in their annual review on DMAI.

For help with author addenda or other intellectual property issues related to the IUB Open Access policy, please refer to the policy FAQ, or email nazapant@indiana.edu.