How to Find and Work with Data

A collection of resources on finding, accessing, evaluating, and working with data responsibly, critically, and ethically.

Hi! I'm here to help!

Régan Schwartz
Research and Instruction Librarian

rms8@williams.edu
413-597-3085

Pronouns: she/her

Evaluating Data

The ability to assess a data set's quality and its fitness for purpose is a key component of data literacy. Evaluating the data and our own use of it is one step towards not perpetuating systemic injustice in our work.

Evaluating data is similar to evaluating other sources of information. We ask many of the same initial questions about how that data came to exist that we would ask of any piece of scholarship. However, we first look for the answers to these questions in the data's documentation.

  • Good documentation can be the first sign of good-quality data
  • Data documentation can be a:
    • guide
    • codebook
    • readme.txt file
  • Documentation is published alongside the data and should include:
    • information about the research project
    • methodology
    • limitations of the data
Source
  • How was the data collected?
    • Counts, estimates, surveys?
    • Who was the intended audience?
  • Who collected the data?
    • Individual researcher?
    • Nonprofit organization?
    • Government agency?
  • Why was the data collected?
    • How was it originally used?
    • What was the purpose of the collection?
    • Is the measure widely accepted?
  • Has the data been repackaged or republished?
    • Are you looking at the original source? If not, can you locate it?
Bias
  • Who was financially responsible for the collection and publication?
  • Who benefited the most from the collection and publication?
  • Was the data collected expressly for advocacy?
  • How could the measure introduce bias?
    • Sample size
    • Point of view
    • Response rate
    • Technology
    • Included/excluded populations
    • Terminology
Authority
  • Is the source of the data well-known?
    • How has their work been used?
    • Who is using their data?
  • Is the source an expert in the field?
  • How do they come by that expertise?
    • Practical experience?
    • Cultural knowledge?
    • Education?
Currency
  • Is this data the latest collected? Are there more recent figures?
  • Was there a lag between collection and publication? How long? What may have changed in the meantime?
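
The question about the lag between collection and publication can be answered with simple date arithmetic once the documentation gives you both dates. The following is a minimal sketch in Python; the function name and the two dates are hypothetical and chosen only for illustration.

```python
from datetime import date


def lag_in_days(collection_ended: date, published: date) -> int:
    """Return the number of days between the end of collection and publication."""
    if published < collection_ended:
        # A publication date before collection ended suggests a documentation error.
        raise ValueError("publication date is earlier than the end of collection")
    return (published - collection_ended).days


# Hypothetical dates, as they might be read from a codebook or readme file.
collection_ended = date(2019, 12, 31)
published_on = date(2021, 6, 30)

lag = lag_in_days(collection_ended, published_on)
print(f"Lag between collection and publication: {lag} days")
```

The number alone does not tell you whether the data is too old for your purpose. It is a prompt to ask what may have changed in the meantime.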

Citing Data

Properly citing data allows research to be more easily discovered, reproduced, and interrogated. Good citations also make sure that research data's impact is measured and that researchers are credited for their work and contribution to the scholarly conversation.

There are no universal standards for data citation (yet!). In general, include as much information as you can so that another researcher can replicate your process.
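
One way to make sure nothing is left out is to record the same set of elements for every data set you use and build the citation from them. The sketch below, in Python, assumes a set of commonly requested elements (creator, year, title, version, publisher, persistent identifier, access date). Both the field names and the output order are illustrative assumptions, not a standard, and every value in the example is a made-up placeholder. Follow the style guide required by your discipline or publisher for the final format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Elements often requested in a data citation (an illustrative set, not a standard)."""
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str                  # DOI or other persistent link
    version: Optional[str] = None    # include when the data set is versioned
    accessed: Optional[str] = None   # date you retrieved the data


def format_citation(c: DataCitation) -> str:
    """Assemble the elements in one possible order; adapt to your style guide."""
    title = c.title if c.version is None else f"{c.title} (Version {c.version})"
    text = f"{c.creator} ({c.year}). {title} [Data set]. {c.publisher}. {c.identifier}"
    if c.accessed is not None:
        text += f" (accessed {c.accessed})"
    return text


# Placeholder values only; this does not describe a real data set.
example = DataCitation(
    creator="Example Survey Group",
    year=2020,
    title="Example Household Survey",
    publisher="Example Data Archive",
    identifier="https://doi.org/10.0000/example",
    version="2.1",
    accessed="2024-01-15",
)

citation = format_citation(example)
print(citation)
```

Recording the version and access date matters most for data that is revised after publication, because a later reader may otherwise retrieve different figures from the same source.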