Collections as Data

A portal to Williams Libraries collections as datasets. Explore pre-packaged data or request bespoke datasets for your research.

What is Collections as Data?

Collections as Data is the practice of making digital collections, digital objects, and their descriptive metadata available as datasets so that they can be used for computational analysis. Examples include:

  • A text file or corpus comprising all unrestricted Williams theses for a particular academic department
  • All digital issues of the Williams Record
  • Metadata describing translated books in the catalog along with which languages they were translated from/to and how many times they were checked out

Collections as Data might consist of the content itself (library materials, archival resources, or unrestricted records of the College) or of information about those resources. From this data, a researcher may be able to derive quantitative measures, new datasets, or even predictive models.
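As a rough illustration of deriving a quantitative measure from metadata, the sketch below counts titles and checkouts per language pair in a small table of translated books. The sample rows, the column names (title, source_language, target_language, checkouts), and the function name are all assumptions made for this example; they do not describe an actual Williams Libraries export.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; column names are assumptions, not a real export format.
SAMPLE_CSV = """title,source_language,target_language,checkouts
Book A,French,English,12
Book B,Spanish,English,7
Book C,French,English,3
Book D,English,Japanese,5
"""


def summarize_translations(csv_text):
    """Return (titles per language pair, checkouts per language pair)."""
    pair_counts = Counter()
    checkouts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        pair = (row["source_language"], row["target_language"])
        pair_counts[pair] += 1                      # one more title for this pair
        checkouts[pair] += int(row["checkouts"])    # accumulate circulation
    return pair_counts, checkouts


pair_counts, checkouts = summarize_translations(SAMPLE_CSV)
for pair, n in pair_counts.most_common():
    print(f"{pair[0]} -> {pair[1]}: {n} titles, {checkouts[pair]} checkouts")
```

The same pattern could be applied to a real metadata file by reading it from disk instead of from the in-memory sample, once the actual column names are known.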

Often these datasets need to be extracted from our systems and cleaned or reformatted to make them usable for different types of analysis. If you are interested in data that is not already available on demand, please see the "Request Data" page for ways to contact us. We can discuss with you whether that data is available, how extensive it is, and the format(s) in which it can be provided. You might be surprised by just how much interesting data we can offer!

Collections as Data Values

  • Collections as data aims to encourage computational use of digitized and born-digital collections
  • Collections as data stewards are guided by ongoing ethical commitments
  • Collections as data aims to respect the rights and needs of content creators, those represented in collections, and the communities that use them
  • Collections as data development values interoperability
  • Developing collections as data is an ongoing, iterative process