Week02: Discovery and documentation

Data discovery and data and code documentation with AI

Published

March 20, 2025


Objectives

Summary

Sometimes data is large and discovery is hard. Sometimes you need to write data documentation. LLMs can help with both. You will learn how to write a clear and professional README, using a cleaned subset of the 7th wave of the World Values Survey (WVS). We will also cover some technical aspects of documentation.

Learning Objectives:

  • Understand how to document a new dataset, using the WVS 7th wave data as an example.
  • Create a README that describes data.
  • Learn to refine documentation by incorporating iterative feedback from peers and AI tools.

Preparation BEFORE class

Reading and review

Get data and info:

Access the WVS dataset:

  1. Data: WVS_random_subset.csv, a random subset (N=2000) covering all countries
  2. Download its official codebook documentation

If you prefer, the datasets are also available on OSF: Gabors Data Analysis / World Values Survey

Class plan

Review Assignment 01

  • Follow instructions.
  • How to get close to the original: different ways
  • Why do an app? What to expect from an app
    • streamlit
    • shinyapps

I. Background

About Markdown

What is a good readme?

Some examples of reproduction packages

Key ingredients

  • Overview of project
  • license
  • All datasets (data tables) separately discussed
  • All key variables described: name, content, type, coverage (% share missing)
    • maybe also: source, extension (csv / xlsx / parquet)
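
The variable description above (name, type, share missing) can be drafted automatically before you write the prose. Below is a minimal sketch using only the Python standard library. The toy data, the column names and the helper functions (`infer_type`, `variable_overview`, `to_markdown_table`) are made up for illustration and are not part of the WVS files; for the real data you would pass an opened CSV file to `csv.DictReader` instead of the sample string.

```python
import csv
import io

# Hypothetical toy data standing in for a real CSV file; column names are made up.
SAMPLE_CSV = """country,age,trust
HU,34,1
AT,,2
DE,51,
HU,29,2
"""


def infer_type(values):
    """Guess a simple type label from the non-missing string values."""
    if not values:
        return "empty"
    try:
        for v in values:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "float"
    except ValueError:
        return "text"


def variable_overview(rows):
    """Return one dict per column: name, inferred type, % share missing."""
    overview = []
    for name in rows[0].keys():
        raw = [row[name] for row in rows]
        # Assumption: an empty cell means "missing"; real surveys may use other codes.
        present = [v for v in raw if v not in ("", None)]
        pct_missing = 100 * (len(raw) - len(present)) / len(raw)
        overview.append(
            {"name": name, "type": infer_type(present), "pct_missing": round(pct_missing, 1)}
        )
    return overview


def to_markdown_table(overview):
    """Format the overview as a Markdown table that can be pasted into a README."""
    lines = ["| Variable | Type | Missing (%) |", "|---|---|---|"]
    for item in overview:
        lines.append(f"| {item['name']} | {item['type']} | {item['pct_missing']} |")
    return "\n".join(lines)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
overview = variable_overview(rows)
print(to_markdown_table(overview))
```

The "content" column still has to be written by a person who has read the codebook; the script only covers what can be read off the data itself.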

What is a variable dictionary (also called codebook)

  • more details of a dataset, often as xlsx
  • metric (euro, %), meaning of values if categorical
  • maybe even mean, min, max
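
As an illustration of the points above, here is a small sketch that builds variable dictionary rows with a unit, value labels and min / mean / max. Everything in it is an assumption for demonstration: the variable names, the values, the labels and the `dictionary_entry` helper are invented and do not come from the WVS codebook. It writes CSV because that needs no extra packages; an xlsx export would need an additional library.

```python
import csv
import io
from statistics import mean


def dictionary_entry(name, values, unit="", value_labels=None):
    """Build one variable dictionary row from a list of numeric values."""
    labels = "; ".join(f"{code} = {label}" for code, label in (value_labels or {}).items())
    return {
        "variable": name,
        "unit": unit,
        "value_labels": labels,
        "min": min(values),
        "mean": round(mean(values), 2),
        "max": max(values),
    }


# Hypothetical variables and values, for illustration only.
entries = [
    dictionary_entry("income_eur", [1200, 1850, 990, 2400], unit="euro"),
    dictionary_entry("employed", [1, 0, 1, 1], value_labels={0: "no", 1: "yes"}),
]

# Write the dictionary as CSV into an in-memory buffer; use open(...) for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(entries[0].keys()))
writer.writeheader()
writer.writerows(entries)
print(buffer.getvalue())
```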

Examples

II. Work on data

No AI

  • Download and look at the Random Subset data
  • Start collecting some info on the data without AI
  • Start thinking about an interesting research question (find \(y\) and \(x\))

AI: let AI teach you also about

  • Start by asking for a skeleton README, and ask for advice
  • Discussion

AI: Learning and idea generation

  • Tell AI about your plan and need for a readme
    • experiment with one-shot vs interaction
  • Discussion

Cyborg mode: create a readme with AI

  • Upload the codebook + random subset data
  • Get AI to design a README TEMPLATE for this task.
  • Get a draft
  • Understand and edit draft

III. Additional idea

  • Sometimes, complicated projects have an extensive folder structure. Use AI to design a folder structure.
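
Once a folder structure has been agreed on, it can be created with a few lines of code so that every project starts the same way. The sketch below is one possible layout, not a recommendation from the course: the folder names in `FOLDERS` and the `create_structure` helper are assumptions chosen for illustration. It runs inside a temporary directory so nothing is left on disk.

```python
from pathlib import Path
import tempfile

# Hypothetical layout; replace it with the structure you designed.
FOLDERS = ["data/raw", "data/clean", "code", "output", "docs"]


def create_structure(root, folders=FOLDERS):
    """Create the folders under root and add a placeholder README.md."""
    root = Path(root)
    created = []
    for folder in folders:
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Project title\n", encoding="utf-8")
    return created


# Demonstrate in a temporary directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    create_structure(tmp)
    tree = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))

print("\n".join(tree))
```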

End of Week Discussion points

  • What was the biggest contribution of AI?
  • First result vs after iterations – what improved?
  • How do you feel about learning from AI vs a human instructor? Pros and cons?

Assignment

See suggested assignment for week 02