The seven steps of data analysis
Data Analysis is a Process: Doing real life empirical projects
The Seven Steps of Data Analysis lecture discusses the process of an empirical project: formulating a research question, collecting data, cleaning and wrangling, exploration, modeling, communicating results, and finally answering the question and discussing validity. It also touches on a range of issues related to working with data: aspects of data collection and wrangling, with an emphasis on the role of coding, data engineering, and reproducible research. The lecture is based on Békés-Kézdi: Data Analysis for Business, Economics, and Policy (Cambridge UP 2021).
The talk is 60-90 mins.
Target audience
The target audience is final-year undergraduate (BA, BSc) and applied Master's (MA/MSc) students in economics, finance, business, and other social sciences who intend to write a dissertation (thesis) with an empirical focus.
Topics
In particular, to present data analysis as a process, we’ll cover seven topics:
- First comes a research topic and a specific research question
- Data collection is the foundation for all empirical work
- Cleaning and organizing the data is a necessary and time-consuming part
- Exploratory data analysis helps both data preparation and analysis
- Analytical work tests hypotheses and estimates model(s)
- Results should be presented in a user-friendly way
- Finally, we answer the original question and discuss generality
A case study
Throughout the talk I will use a case study from my textbook on family ownership of firms and management quality. The case study is based on data from the World Management Survey (WMS).
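To make the seven steps concrete, here is a minimal Python sketch that walks through them on a toy version of the family ownership question. Everything in it is an assumption made for illustration: the records are made up (they are not World Management Survey data), the field names `ownership` and `management_score` are hypothetical, and the "model" is just a difference in group means rather than the analysis used in the textbook.

```python
from statistics import mean

# Step 1: the research question.
QUESTION = "Do family-owned firms have lower management scores than other firms?"

# Step 2: data collection. Made-up toy records, NOT World Management Survey data.
raw_records = [
    {"firm": "A", "ownership": "family", "management_score": "2.5"},
    {"firm": "B", "ownership": "Family ", "management_score": "2.9"},  # messy label
    {"firm": "C", "ownership": "other", "management_score": "3.4"},
    {"firm": "D", "ownership": "other", "management_score": ""},       # missing score
    {"firm": "E", "ownership": "other", "management_score": "3.0"},
    {"firm": "F", "ownership": "family", "management_score": "2.8"},
]


def clean(records):
    """Step 3: normalise labels, convert types, drop rows with a missing score."""
    cleaned = []
    for row in records:
        score = row["management_score"].strip()
        if not score:
            continue
        cleaned.append({
            "firm": row["firm"],
            "ownership": row["ownership"].strip().lower(),
            "management_score": float(score),
        })
    return cleaned


def explore(records):
    """Step 4: group sizes and mean score by ownership type."""
    groups = {}
    for row in records:
        groups.setdefault(row["ownership"], []).append(row["management_score"])
    return {k: {"n": len(v), "mean": mean(v)} for k, v in groups.items()}


def estimate_gap(summary):
    """Step 5: a deliberately simple 'model', the difference in group means."""
    return summary["family"]["mean"] - summary["other"]["mean"]


def report(summary, gap):
    """Step 6: present the results in a readable form."""
    lines = [f"{k:<8} n={v['n']} mean={v['mean']:.2f}" for k, v in sorted(summary.items())]
    lines.append(f"family - other gap: {gap:+.2f}")
    return "\n".join(lines)


data = clean(raw_records)
summary = explore(data)
gap = estimate_gap(summary)
print(QUESTION)
print(report(summary, gap))
# Step 7: answering the question and judging generality is an interpretive step;
# six toy rows support no real conclusion.
```

Keeping each step in its own function mirrors the idea of analysis as a process: the cleaning rules, the exploratory summary, and the estimate can each be rerun and checked separately, which is what makes the work reproducible.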
Tools
I’ll talk about tools for all seven steps as well:
- Read up on your topic with Google Scholar and RePEc; manage references with Paperpile, Zotero, or similar tools
- Run online surveys with Google Forms or SurveyMonkey
- Coding environments for reproducible research: RStudio for R and Jupyter Notebook for Python
- Data exploration and visualization with ggplot2 in R and plotnine in Python
- Doing reproducible research with Git and GitHub (for case studies)
- Writing up a thesis and presentation in LaTeX with Overleaf
- Benefit from AI: ChatGPT, Claude.ai, and GitHub Copilot. Other tools: scite.ai and consensus.app
Talks AY 2021/22
- FEP, University of Porto, Portugal: 18 November 2021
- CAED conference, University of Coimbra, Portugal: 20 November 2021
- UCL University College London, UK: 7 December 2021
- Corvinus University Budapest, Hungary: 3 March 2022
Talks AY 2022/23
- Middlesex University, UK
Talks AY 2024/25
- HUN-REN KRTK
Others
- Ecommerce Hungary, online: 30 November 2021
Ping me if interested in hosting an event