20 Jun National Emergency Library Closes, But Open Library Continues to Infringe Copyrights
In mid-June, the Internet Archive terminated their National Emergency Library (NEL) two weeks prematurely in response to a lawsuit brought by publishers Hachette, Penguin Random House, Wiley, and HarperCollins. Publishers were not the only parties concerned with the copyright issues raised by NEL. When the NEL launched in March, the Authors Guild issued a statement saying that they were “shocked that the Internet Archive would use the Covid-19 epidemic as an excuse to push copyright law further out to the edges, and in doing so, harm authors, many of whom are already struggling.” However, a more significant issue for copyright holders remains unresolved: the Open Library, and its “controlled digital lending” model. For visual artists, Open Library poses a particular threat.
The NEL was an extension of the Open Library, and permitted borrowers immediate access to digital copies of millions of books in Open Library’s collection. NEL was intended to give students, academics, and basically anyone able to set up an Open Library account unfettered access to books during a period when public health measures had shuttered brick-and-mortar libraries. Open Library permits registered users to “borrow” up to five digitized books or ebooks for a two-week period. In pivoting to the NEL, they did away with the waiting period, meaning that users wanting to access books which were already “borrowed” no longer needed to wait until the ebook had been “returned.”
Open Library is an initiative of the Internet Archive, and seeks to “make all the published works of humankind available to everyone in the world.” Their efforts towards that end have been prodigious: by their own account, they have scanned over 1.4 million books. As an example of their reach, their website received between 11 and 12 million visitors in a recent two-week period, during which over 400,000 books were “borrowed” from their collection.
The Internet Archive has made entire works of visual artists – complete photographs and illustrations published within scanned Open Library offerings – available to bad actors in a usable JPEG format.
Controlled Digital Lending: Cutting Out the Copyright Holder
The process by which the library lends books is “controlled digital lending” (CDL), which resembles the licensed ebook lending model libraries currently employ. Under CDL, the Open Library scans donated, bought, or lent books and uploads them to their server. That ebook is copied for each online borrower requesting the book (up to five, according to the Open Library). The difference between CDL and how traditional libraries operate is that traditional libraries purchase licenses for ebook lending. Those licenses emulate the experience of borrowing physical books: only one digital copy at a time can be borrowed, and the borrower can’t access that digital copy after their borrowing time has ended (unless they borrow the book again). More importantly, the licenses negotiated by traditional libraries ensure that publishers and copyright holders control where and how their works are distributed, and that they are remunerated.
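The licensed lending rules described above (a fixed number of copies on loan at once, and access that ends when the loan period does) can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this article, not a description of any library’s or vendor’s actual system: the class name `LicensedTitle`, its methods, and the 14-day default loan period are all assumptions made for the example.

```python
from datetime import datetime, timedelta


class LicensedTitle:
    """Hypothetical model of one licensed ebook title (not a real library system)."""

    def __init__(self, title, licensed_copies=1, loan_days=14):
        self.title = title
        self.licensed_copies = licensed_copies          # copies the license allows on loan
        self.loan_period = timedelta(days=loan_days)    # assumed loan length
        self.loans = {}                                 # borrower -> loan expiry time

    def _drop_expired(self, now):
        # A loan that has reached its expiry no longer occupies a licensed copy.
        self.loans = {b: due for b, due in self.loans.items() if due > now}

    def borrow(self, borrower, now):
        """Return True if the borrower holds a loan after this call."""
        self._drop_expired(now)
        if borrower in self.loans:
            return True
        if len(self.loans) >= self.licensed_copies:
            return False  # every licensed copy is out; the borrower must wait
        self.loans[borrower] = now + self.loan_period
        return True

    def can_read(self, borrower, now):
        """Access is allowed only while the borrower's loan is still running."""
        due = self.loans.get(borrower)
        return due is not None and now < due


start = datetime(2020, 6, 1)
book = LicensedTitle("Example Title")
first = book.borrow("alice", start)                             # copy is free
second = book.borrow("bob", start)                              # copy is already out
expired = book.can_read("alice", start + timedelta(days=15))    # loan has ended
later = book.borrow("bob", start + timedelta(days=15))          # copy is free again
```

In this sketch, a second borrower is turned away while the single licensed copy is out, and the first borrower loses access once the loan period has passed, which is the behavior the licenses are said to enforce.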
CDL, however, has written the copyright holders out of the equation. The books which Open Library scans have been donated or lent by libraries and users, acquired from closed libraries or library discards, or purchased. No effort appears to have been made to limit the contributions to books which are out of copyright. These books are scanned and uploaded to the Library’s servers, and converted to audiobooks, without obtaining permission from the copyright holders. In contrast, in ebook lending as exercised by traditional libraries, publishers and authors retain control of their works. They decide whether their books will be digitized and when, and whether they will be converted to audiobooks. Copyright holders derive an income from those additional licenses for ebook and audiobook publications. Publishers and copyright holders also negotiate the licenses for digital lending of those ebooks and audiobooks with libraries.
About half of the material being sorted at the Internet Archive’s physical archive warehouse for the software collection.
Jason Scott / CC BY
Neither Controlled nor Lending
The National Writers Union (NWU) has written a comprehensive article on how CDL operates. Their conclusion is that CDL is neither controlled nor lending. CDL is in fact only one of five ways that Open Library distributes books:
- Ebooks assembled from page images: each book is scanned, page by page, and the page images are assembled into an epublication, which is then copied and distributed to borrowers, using digital rights management to limit the number of borrowers. When “borrowed,” the books are downloaded as complete JPEG collections onto the borrower’s computer, and are not automatically deleted when the borrowing period is ended.
- Optical character recognition software is used to convert the scanned ebooks to a text file, which is then processed with text-to-voice technology to create an audio work, which can be streamed by any registered Open Library user.
- The books can be viewed as page images on Openlibrary.org by any registered user, using a “viewer” tool which actually downloads the JPEGs of the page images to the user’s hard drive.
- The books can be viewed as page images on internetarchive.org, and since Internet Archive doesn’t require users to create an account, there is no limit on the number of downloads the user can make of each page. The pages are made available on the Internet Archive as a means of providing a “preview” of the book. However, since each individual page image has a unique URL, savvy users can access additional pages by simply substituting page numbers in the URL. Internet Archive also instructs Wikipedia users on how to link directly to individual pages. NWU speculates that Wikipedia constitutes the majority of traffic to (and least controlled use of) the book page images on Internet Archive.
- Internet Archive has created an API (application programming interface) which permits programmers to automate searches, discover URLs, and download book page images.
Of Particular Concern to Visual Artists
The API created by Internet Archive has already been used to selectively search out and download all book page images which contain illustrations, graphics, or photographs. In 2014, Python programmer Stephen Krewson published an article on how to extract only book page images containing pictures from the digitized books on Internet Archive. (Krewson falsely identifies the books as “public domain” – most of the volumes on Open Library are still under copyright.) Essentially, Internet Archive’s API provides determined programmers a way to scan through the millions of book pages and snarf up all pages containing pictures and graphics. This would enable bad actors to create and monetize libraries of digital images. The copyright holders and creators of these works for the most part wouldn’t even be aware that their works have been made vulnerable by the Internet Archive’s practices.
Even without the API, visual artists are compromised by Open Library’s lending model. When a user accesses a book, the entire book – the JPEGs of each page – is downloaded to the user’s hard drive. “Returning” the book doesn’t delete that collection of JPEGs, which remains on the borrower’s hard drive, and can be accessed by anyone capable of accessing recent files. In fact, from NWU’s tests, Open Library deems a book “returned” at the end of the two-week borrow period, whether or not the borrower clicks the “return” button.
This means that visual artists – illustrators, photographers, graphic designers, etc. – are particularly vulnerable. The Internet Archive has made their entire work, published within a scanned Open Library offering, available to bad actors in a usable JPEG format. There is little to stop someone from using a script as described above, or scanning through the JPEGs stashed on their computer, to create entire libraries of photographs and illustrations.
A book scanner at the Internet Archive headquarters in San Francisco, California.
Reaction from Authors
Authors, led by associations representing writers, have publicized their dismay with Internet Archive and the Open Library Association. In January of 2019, the Authors Guild issued an open letter to the Internet Archive protesting the Controlled Digital Lending model. The letter described how writers who contacted Open Library with takedown notices for their work received a reply referencing a white paper and position statement written by legal scholars in support of CDL. It’s an argument the Authors Guild (and artist advocates) believes is misguided, based on a misunderstanding of fair use and an outdated understanding of the book market. In the UK, the Society of Authors issued their own open letter to the Internet Archive, echoing many of the Authors Guild’s concerns.
In February 2019, NWU coordinated a group letter to librarians and library associations titled “An appeal to readers and librarians from the victims of CDL.” Forty other organizations representing writers, illustrators, graphic artists, and photographers signed the letter, including the Graphic Artists Guild. NWU also pulled together the comprehensive FAQs on CDL, explaining how the CDL process works and why creators have issues with it. In April of this year, the Science Fiction & Fantasy Writers of America issued a copyright infringement alert to their members. The Copyright Alliance published an article on the harm the Internet Archive is doing to authors, and issued a statement in support of the publishers’ lawsuit against NEL.
Both the Authors Guild and NWU have worked with international associations to raise awareness of Open Library’s copyright-infringing activities. The International Federation of Reprographic Rights Organizations (IFRRO) joined NWU’s appeal letter and issued a public rejection of the Internet Archive’s CDL distribution model. The International Authors Forum (IAF) published an article calling for authors to sign the Authors Guild letter. (The Graphic Artists Guild, Authors Guild, and NWU are all members of IFRRO and IAF.)
To learn more about CDL and how it damages authors, including visual artists, view NWU’s webinar on the topic. Hosted by Edward Hasbrouck, the webinar provides an in-depth analysis of CDL. The Copyright Alliance’s article includes a linked list of numerous calls to action, blog articles, media coverage and op-eds, and statements.
Featured photograph: Book scanning stations at the Internet Archive’s San Francisco headquarters.
Steven Walling / CC BY-SA