Gábor Békés and Gábor Kézdi
CEU and U Michigan
Data Analysis for Business, Economics, and Policy
This textbook
This textbook provides future data analysts with the tools, methods, and skills needed to answer data-focused, real-life questions, to choose and apply appropriate methods to answer those questions, and to visualize and interpret results to support better decisions in business, economics, and public policy. It comprehensively covers data wrangling and exploration, regression analysis, prediction with machine learning, and causal analysis, explaining when, why, and how the methods work and how they relate to each other.
Running case studies, the most effective way to communicate data analysis, play a central role in this textbook. Each case starts with an industry-relevant question and answers it using real-world data and the tools and methods covered in the textbook. Learning is then consolidated by over 360 practice questions and 120 data exercises. Extensive online resources are available on this site, including raw and cleaned data and code for all analyses in Stata, R, and Python.
Endorsements
“This exciting new text covers everything today’s aspiring data scientist needs to know, managing to be comprehensive as well as accessible. Like a good confidence interval, the Gabors have got you almost completely covered!” – Joshua Angrist, MIT, Nobel laureate in Economics 2021
“A beautiful integration of Econometrics and Data Science that provides a direct path from data collection and exploratory analysis to conventional regression modeling, then on to prediction and causal modeling. Exactly what is needed to equip the next generation of students with the tools and insights from the two fields.” – David Card, University of California, Berkeley, Nobel laureate in Economics 2021
Reviews by instructors
Buy the book: Amazon.com, or a wide range of global options. Request an examination copy.
Key information materials
- Front matter
- Table of contents
- Sample chapters 10 and 15
- Short summary of why to use this book
- A one-page summary, also available as PDF
You can watch the video recording of the launch webinar or view the slideshow presentation.
Why use this book?
Data analysis is a process. It starts with formulating a question and collecting appropriate data, or assessing whether the available data can help answer the question. Then comes cleaning and organizing the data. These tasks are tedious but essential, and they affect the results of the analysis as much as any other step in the process. Exploratory data analysis gives context to the eventual results and helps decide the details of the analytical method to be applied. The main analysis consists of choosing and implementing the method to answer the question, with potential robustness checks. Along the way, correct interpretation and effective presentation of the results are crucial. Carefully crafted data visualizations help summarize our findings and convey key messages. The final task is to answer the original question, with potential qualifications and directions for future inquiries.
Our textbook equips future data analysts with the most important tools, methods, and skills they need throughout the entire process of data analysis to answer data-focused, real-life questions. We cover all the fundamental methods that help along the way. The textbook is divided into four parts covering data wrangling and exploration, regression analysis, prediction with machine learning, and causal analysis. We explain when, why, and how the various methods work, and how they are related to each other.
A cornerstone of this textbook is its 47 case studies, which take up one-third of our material. This reflects our view that working through case studies is the best way to learn data analysis. Each case study starts with a relevant question and answers it at the end, using real-life data and applying the tools and methods covered in the particular chapter.
We share all the raw and cleaned data we use in the case studies. We also share the code that cleans the data and produces all results, tables, and graphs in Stata, R, and Python. This lets students tinker with our code and compare the solutions across the different software.
This textbook was written to be a complete course in data analysis. It can serve as the core text in graduate programs in applied statistics and econometrics, quantitative methods, or data analysis. It may also complement online courses that teach specific methods by giving more context and explanation. Undergraduate courses can also make use of this textbook, even though the workload it places on students exceeds the typical undergraduate workload. Finally, the textbook can serve as a handbook for practitioners, guiding them through all steps of real-life data analysis.
About the authors
Gábor Békés
Gábor Békés is an Assistant Professor at the Department of Economics and Business of the Central European University and director of the MS in Business Analytics program. He is a senior fellow at KRTK and a research affiliate at the Centre for Economic Policy Research (CEPR). He has published in top economics journals on multinational firm activities and productivity, business clusters, and innovation spillovers. He has managed international data collection projects on firm performance and supply chains. He has done both policy advising (for the European Commission and the ECB) and private-sector consultancy (in finance, business intelligence, and real estate). He has taught graduate-level data analysis and economic geography courses since 2012. Personal website
Balatonudvari, Hungary, July 2018. Photo by Anna Fetter.
Gábor Kézdi
Gábor Kézdi is a Research Associate Professor at the University of Michigan’s Institute for Social Research. He has published in top journals in economics, statistics, and political science on topics including household finances, health, education, demography, and ethnic disadvantage and prejudice. He has managed several data collection projects in Europe; currently, he is co-investigator of the Health and Retirement Study in the U.S. He has advised various governmental and non-governmental organizations on the disadvantages faced by the Roma minority and on the evaluation of social interventions. He has taught data analysis, econometrics, and labor economics from undergraduate to PhD level since 2002 and has supervised a number of MA and PhD students. Personal website
We could not have done this alone. Far from it. So we are grateful, really.
We provide all the code we used, in R, Stata, and Python.
For the code that reproduces all the tables and graphs in the textbook, visit the GitHub page github.com/gabors-data-analysis/da_case_studies, where the live version of the code is available.
You may download the latest release, v0.8.1, as a zip file.
Learning to code for data analysis
Learning to code for data analysis – free fully online courses now available!
For R, Stata and Python!
We provide all the data we used; see our dataset summaries.
The data are shared via an OSF project repository. You can download and use them; see Data and code for more information.
Teaching material for instructors
We have prepared several materials for instructors:
- This textbook may be used for a variety of courses, and it has been used in a Management PhD, an Applied Economics MA, a Data Science MSc, and even an Executive MBA. We offer some experience and advice on how to teach with this textbook in different courses.
- Frequently asked questions and answers for a variety of course and program types
- Answers to all 360 practice questions, available to instructors from Cambridge University Press Instructor Resources
- Slideshows, one for each of the 24 chapters, available through Cambridge University Press Instructor Resources
- Adopting instructors may get access to the slides in LaTeX. Contact us for access.
Coding help and info
Users can find:
- A review of Data and code, with information on how to set up folders
- A brief summary of the languages used
- Some advice on learning to code
- How to set up Stata
- How to set up R
- How to set up Python
Many applications
The book has many applications.
Summary from JEL
Textbook for graduate students discusses the most important tools, methods, and skills necessary for carrying out a data analysis project, presenting case studies from around the world linking business or policy questions to decisions in data selection and the application of methods. Covers data collection and quality, exploratory data analysis and visualization, generalizing from data, and hypothesis testing. Provides an overview of regression analysis, including probability models and time series regressions. Explores predictive analytics, cross-validation, tree-based machine learning methods, classification, and forecasting from time series data. Focuses on causal analysis, the potential outcomes framework and causal maps, difference-in-differences analysis, various panel data methods, and the event study approach.
Introduction to Statistics and Data Analysis – A Case-Based Approach
Suggested citation:
Ziller, Conrad (2024). Introduction to Statistics and Data Analysis – A Case-Based Approach. Available online at https://bookdown.org/conradziller/introstatistics
To download the R-Scripts and data used in this book, go HERE.
A PDF version of the book can be downloaded HERE.
Motivation for this Book
This short book is a complete introduction to statistics and data analysis using R and RStudio. It contains hands-on exercises with real data—mostly from social sciences. In addition, this book presents four key ingredients of statistical data analysis (univariate statistics, bivariate statistics, statistical inference, and regression analysis) as brief case studies. The motivation for this was to provide students with practical cases that help them navigate new concepts and serve as an anchor for recalling the acquired knowledge in exams or while conducting their own data analysis.
The case study logic is expected to increase motivation for engaging with the materials. As we all know, academic teaching is not the same as it was before the pandemic. Students are (rightfully) increasingly reluctant to accept chalk-and-talk teaching. We have also all developed dopamine-related addictions to social media content, which have considerably shortened our attention spans. This poses challenges to academic teaching in general and to complex content such as statistics and data science in particular.
How to Use the Book
This book consists of four case studies that provide a short yet comprehensive introduction to statistics and data analysis. The examples used in the book are based on real data from official statistics and publicly available surveys. While each case study follows its own logic, I advise reading them consecutively. The goal is to give readers an opportunity to learn independently and to build a solid foundation of hands-on knowledge of statistics and data analysis.

Each case study contains questions that can be answered in the boxes provided. The solutions can be viewed below the boxes by clicking on the arrow next to the word “solution”. I advise saving your answers to a separate document. Answers entered in the boxes are not saved and cannot be accessed after the book page is reloaded.
A working sheet with questions, answer boxes, and solutions can be downloaded together with the R-Scripts HERE. You can read this book online for free. Copies in printable format may be ordered from the author.
University instructors can use this book for teaching. They may use the data examples and analyses provided in it as illustrations in lectures, with acknowledgment of the source. It can also be used for self-study by anyone who wants to acquire foundational knowledge of basic statistics and practical skills in data analysis. The materials can also serve as a refresher on statistical foundations.
Beginners in R and RStudio are advised to install the programs via the following link https://posit.co/download/rstudio-desktop/ and to download the materials from HERE. The scripts from this material can then be executed while reading the book. This helps you get familiar with statistical analysis, and it is just an awesome feeling to get your own script running! (On the downside, it is completely normal, and part of the process, that code for statistical analysis does not work at first. This is what help forums across the web and, more recently, ChatGPT are for. Just google your problem and keep trying; it is, as always, 20% inspiration and 80% consistency.)
Organization of the Book
The book contains four case studies, each showcasing unique statistical and data-analysis-related techniques.
- Section 2: Univariate Statistics – Case Study Socio-Demographic Reporting
Section 2 contains material on the analysis of one variable. It presents measures of typical values (e.g., the mean) and the distribution of data.
- Section 3: Bivariate Statistics – Case Study 2020 United States Presidential Election
Section 3 contains material on the analysis of the relationship between two variables, including cross tabs and correlations.
- Section 4: Statistical Inference – Case Study Satisfaction with Government
Section 4 introduces the concept of statistical inference, which refers to inferring population characteristics from a random sample. It also covers the concepts of hypothesis testing, confidence intervals, and statistical significance.
- Section 5: Regression Analysis – Case Study Attitudes Toward Justice
Section 5 covers how to conduct multiple regression analysis and interpret the corresponding results. Multiple regression investigates the relationship between an outcome variable (e.g., beliefs about justice) and multiple variables that represent different competing explanations for the outcome.
Acknowledgments
Thank you to Paul Gies, Phillip Kemper, Jonas Verlande, Teresa Hummler, Paul Vierus, and Felix Diehl for helpful feedback on previous versions of this book. I want to thank Achim Goerres for his feedback early on and for granting me maximal freedom in revising and updating the materials of his introductory lectures on Methods and Statistics, which led to the writing of this book. Earlier versions of this book have been used in teaching courses on statistics in the Political Science undergraduate program at the University of Duisburg-Essen.
About the Author
Conrad Ziller is a Senior Researcher in the Department of Political Science at the University of Duisburg-Essen. His research interests focus on the role of immigration in politics and society, immigrant integration, policy effects on citizens, and quantitative methods. He is the principal investigator of research projects funded by the German Research Foundation and the Fritz Thyssen Foundation. More information about his research can be found here: https://conradziller.com/ .
The final part of the book is about linear regression analysis, which is the natural endpoint for a course on introductory statistics. However, “ordinary” regression is also the starting point for many further useful techniques, most of which can be subsumed under the label “Advanced Regression Models”. You will need them when analyzing, for example, panel data, where the same respondents were interviewed multiple times, or spatially clustered data from cross-national surveys.
I will extend this introduction with case studies on advanced regression techniques soon. If you want to get notified when this material is online, please sign up with your email address here: https://forms.gle/T8Hvhq3EmcywkTdFA .
In the meantime, I have a chapter on “Multiple Regression with Non-Independent Observations: Random-Effects and Fixed-Effects” that can be downloaded via https://ssrn.com/abstract=4747607 .
For feedback on the usefulness of the introduction and/or reports of errors and misspellings, I would be most grateful if you would send me a short note at [email protected].
Thanks much for engaging with this introduction!
The online version of this book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License .