
Descriptive Statistics | Definitions, Types, Examples

Published on July 9, 2020 by Pritha Bhandari. Revised on June 21, 2023.

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research, after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics, which help you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.

Table of contents

  • Types of descriptive statistics
  • Frequency distribution
  • Measures of central tendency
  • Measures of variability
  • Univariate descriptive statistics
  • Bivariate descriptive statistics
  • Frequently asked questions about descriptive statistics

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

[Image: The three main types of descriptive statistics]

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

For example, suppose you ask survey participants how many times in the past year they each did the following:

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park


A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution.

Simple frequency distribution table:

Gender Number
Male 182
Female 235
Other 27

From this table, you can see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.

Grouped frequency distribution table:

Library visits in the past year Percent
0–4 6%
5–8 20%
9–12 42%
13–16 24%
17+ 8%
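If you are working in Python rather than a spreadsheet, a grouped frequency table like the one above can be built in a few lines. This is a minimal sketch: the raw visit counts are hypothetical, and the bin edges mirror the table above.

```python
# Build a grouped frequency distribution (counts and percentages) in Python.
from collections import Counter

visits = [0, 3, 3, 12, 15, 24, 7, 9, 11, 13, 5, 10]  # hypothetical responses

def bin_label(v):
    # Bin edges follow the grouped table: 0-4, 5-8, 9-12, 13-16, 17+
    if v <= 4: return "0-4"
    if v <= 8: return "5-8"
    if v <= 12: return "9-12"
    if v <= 16: return "13-16"
    return "17+"

counts = Counter(bin_label(v) for v in visits)
total = sum(counts.values())
for label in ["0-4", "5-8", "9-12", "13-16", "17+"]:
    pct = 100 * counts[label] / total
    print(f"{label:>6}: {counts[label]:2d} ({pct:.0f}%)")
```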

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean , or M , is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N .

Mean number of library visits
Data set: 15, 3, 12, 0, 24, 3
Sum of all values: 15 + 3 + 12 + 0 + 24 + 3 = 57
Total number of responses: N = 6
Mean: Divide the sum of values by N to find M: 57/6 = 9.5
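The same calculation is a one-liner in Python; a quick sketch using the six responses from the table:

```python
# Mean: sum of all responses divided by the number of responses (N).
data = [15, 3, 12, 0, 24, 3]
mean = sum(data) / len(data)  # 57 / 6
print(mean)                   # 9.5
```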

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

Median number of library visits
Ordered data set: 0, 3, 3, 12, 15, 24
Middle numbers: 3, 12
Median: Find the mean of the two middle numbers: (3 + 12)/2 = 7.5

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

Mode number of library visits
Ordered data set: 0, 3, 3, 12, 15, 24
Mode: Find the most frequently occurring response: 3
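Python’s built-in statistics module can verify both results for the same six responses; a minimal sketch:

```python
import statistics

data = [15, 3, 12, 0, 24, 3]
print(statistics.median(data))  # 7.5 -> mean of the two middle values (3 and 12)
print(statistics.mode(data))    # 3   -> the most frequent response
```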

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range, simply subtract the lowest value from the highest value. For the library data set, the range is 24 – 0 = 24.

Standard deviation

The standard deviation ( s or SD ) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  1. List each score and find their mean.
  2. Subtract the mean from each score to get the deviation from the mean.
  3. Square each of these deviations.
  4. Add up all of the squared deviations.
  5. Divide the sum of the squared deviations by N – 1.
  6. Find the square root of the number you found.
Raw data Deviation from mean Squared deviation
15 15 – 9.5 = 5.5 30.25
3 3 – 9.5 = -6.5 42.25
12 12 – 9.5 = 2.5 6.25
0 0 – 9.5 = -9.5 90.25
24 24 – 9.5 = 14.5 210.25
3 3 – 9.5 = -6.5 42.25
M = 9.5 Sum = 0 Sum of squares = 421.5

Step 5: 421.5/5 = 84.3

Step 6: √84.3 = 9.18
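The six steps translate directly into code; this sketch follows them one by one for the same six responses:

```python
import math

data = [15, 3, 12, 0, 24, 3]
mean = sum(data) / len(data)            # Step 1: mean = 9.5
deviations = [x - mean for x in data]   # Step 2: deviations from the mean
squared = [d ** 2 for d in deviations]  # Step 3: squared deviations
sum_sq = sum(squared)                   # Step 4: 421.5
variance = sum_sq / (len(data) - 1)     # Step 5: divide by N - 1 -> 84.3
sd = math.sqrt(variance)                # Step 6: square root -> ~9.18
print(round(sd, 2))                     # 9.18
```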

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s². For the example data set, s² ≈ 9.18² ≈ 84.3.

Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

Visits to the library
N 6
Mean 9.5
Median 7.5
Mode 3
Standard deviation 9.18
Variance 84.3
Range 24
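If SPSS or Excel is not at hand, the whole summary table can be reproduced with Python’s built-in statistics module; a minimal sketch:

```python
import statistics

data = [15, 3, 12, 0, 24, 3]
summary = {
    "N": len(data),                                # 6
    "Mean": statistics.mean(data),                 # 9.5
    "Median": statistics.median(data),             # 7.5
    "Mode": statistics.mode(data),                 # 3
    "Standard deviation": statistics.stdev(data),  # ~9.18 (sample, N - 1)
    "Variance": statistics.variance(data),         # 84.3 (sample)
    "Range": max(data) - min(data),                # 24
}
for name, value in summary.items():
    print(f"{name}: {value}")
```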

If you were to only consider the mean as a measure of central tendency, your impression of the “middle” of the data set can be skewed by outliers, unlike the median or mode.

Likewise, while the range is sensitive to outliers, you should also consider the standard deviation and variance to get easily comparable measures of spread.

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.

Number of visits to the library in the past year
Group 0–4 5–8 9–12 13–16 17+
Children 32 68 37 23 22
Adults 36 48 43 83 25

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.

Visits to the library in the past year (Percentages)
Group 0–4 5–8 9–12 13–16 17+ N
Children 18% 37% 20% 13% 12% 182
Adults 15% 20% 18% 35% 11% 235
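Converting a raw contingency table to row percentages takes one line with pandas. A sketch, assuming pandas is installed; the counts come from the raw table above:

```python
import pandas as pd

counts = pd.DataFrame(
    {"0-4": [32, 36], "5-8": [68, 48], "9-12": [37, 43],
     "13-16": [23, 83], "17+": [22, 25]},
    index=["Children", "Adults"],
)
row_totals = counts.sum(axis=1)                     # Children: 182, Adults: 235
percentages = counts.div(row_totals, axis=0) * 100  # each row now sums to ~100%
print(percentages.round(0).astype(int))
```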

From this table, it is clearer that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.
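A minimal matplotlib sketch of such a plot; the paired values below are hypothetical, chosen to show the kind of negative relationship described next:

```python
import matplotlib.pyplot as plt

# Hypothetical paired observations: (movie theater visits, library visits).
movie_visits = [1, 2, 4, 5, 7, 8, 10, 12]
library_visits = [24, 20, 15, 13, 9, 7, 4, 2]

plt.scatter(movie_visits, library_visits)
plt.xlabel("Movies watched at a theater (past year)")
plt.ylabel("Library visits (past year)")
plt.title("Library visits vs. movie theater visits")
plt.show()
```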

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.

[Image: scatter plot of library visits against movie theater visits]

Frequently asked questions about descriptive statistics

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarize only one variable at a time.
  • Bivariate statistics compare two variables.
  • Multivariate statistics compare more than two variables.


Quant Analysis 101: Descriptive Statistics

Everything You Need To Get Started (With Examples)

By: Derek Jansen (MBA) | Reviewers: Kerryn Warren (PhD) | October 2023

If you’re new to quantitative data analysis, one of the first terms you’re likely to hear being thrown around is descriptive statistics. In this post, we’ll unpack the basics of descriptive statistics, using straightforward language and loads of examples. So grab a cup of coffee and let’s crunch some numbers!

Overview: Descriptive Statistics

  • What are descriptive statistics
  • Descriptive vs inferential statistics
  • Why the descriptives matter
  • The “Big 7” descriptive statistics
  • Key takeaways

At the simplest level, descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset – for example, a set of survey responses. They provide a snapshot of the characteristics of your dataset and allow you to better understand, roughly, how the data are “shaped” (more on this later). For example, a descriptive statistic could include the proportion of males and females within a sample or the percentages of different age groups within a population.

Another common descriptive statistic is the humble average (which in statistics-talk is called the mean ). For example, if you undertook a survey and asked people to rate their satisfaction with a particular product on a scale of 1 to 10, you could then calculate the average rating. This is a very basic statistic, but as you can see, it gives you some idea of how the data are shaped.


What about inferential statistics?

Now, you may have also heard the term inferential statistics being thrown around, and you’re probably wondering how that’s different from descriptive statistics. Simply put, descriptive statistics describe and summarise the sample itself, while inferential statistics use the data from a sample to make inferences or predictions about a population.

Put another way, descriptive statistics help you understand your dataset, while inferential statistics help you make broader statements about the population, based on what you observe within the sample. If you’re keen to learn more, we cover inferential stats in another post.

Why do descriptive statistics matter?

While descriptive statistics are relatively simple from a mathematical perspective, they play a very important role in any research project. All too often, students skim over the descriptives and run ahead to the seemingly more exciting inferential statistics, but this can be a costly mistake.

The reason for this is that descriptive statistics help you, as the researcher, comprehend the key characteristics of your sample without getting lost in vast amounts of raw data. In doing so, they provide a foundation for your quantitative analysis. Additionally, they enable you to quickly identify potential issues within your dataset – for example, suspicious outliers, missing responses and so on. Just as importantly, descriptive statistics inform the decision-making process when it comes to choosing which inferential statistics you’ll run, as each inferential test has specific requirements regarding the shape of the data.

Long story short, it’s essential that you take the time to dig into your descriptive statistics before looking at more “advanced” inferentials. It’s also worth noting that, depending on your research aims and questions, descriptive stats may be all that you need in any case. So, don’t discount the descriptives!


The “Big 7” descriptive statistics

With the what and why out of the way, let’s take a look at the most common descriptive statistics. Beyond the counts, proportions and percentages we mentioned earlier, we have what we call the “Big 7” descriptives. These can be divided into two categories – measures of central tendency and measures of dispersion.

Measures of central tendency

True to the name, measures of central tendency describe the centre or “middle section” of a dataset. In other words, they provide some indication of what a “typical” data point looks like within a given dataset. The three most common measures are:

The mean, which is the mathematical average of a set of numbers – in other words, the sum of all numbers divided by the count of all numbers.
The median, which is the middlemost number in a set of numbers, when those numbers are ordered from lowest to highest.
The mode, which is the most frequently occurring number in a set of numbers (in any order). Naturally, a dataset can have one mode, no mode (no number occurs more than once) or multiple modes.

To make this a little more tangible, let’s look at a sample dataset, along with the corresponding mean, median and mode. This dataset reflects the service ratings (on a scale of 1 – 10) from 15 customers.

[Image: example dataset with its mean, median and mode]
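Since the example image is not reproduced here, the sketch below uses a hypothetical set of 15 service ratings constructed to match the article’s mean (5.8), median (6), and mode (5); the individual values, and therefore the spread, are not the article’s actual data.

```python
import statistics

ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

print(statistics.mean(ratings))    # 5.8
print(statistics.median(ratings))  # 6 -> the 8th of the 15 ordered values
print(statistics.mode(ratings))    # 5 -> appears three times
```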

As you can see, the mean of 5.8 is the average rating across all 15 customers. Meanwhile, 6 is the median. In other words, if you were to list all the responses in order from low to high, Customer 8 would be in the middle (with their service rating being 6). Lastly, the number 5 is the most frequent rating (appearing 3 times), making it the mode.

Together, these three descriptive statistics give us a quick overview of how these customers feel about the service levels at this business. In other words, most customers feel rather lukewarm and there’s certainly room for improvement. From a more statistical perspective, this also means that the data tend to cluster around the 5-6 mark, since the mean and the median are fairly close to each other.

To take this a step further, let’s look at the frequency distribution of the responses. In other words, let’s count how many times each rating was received, and then plot these counts onto a bar chart.

[Image: bar chart of the rating frequencies]

As you can see, the responses tend to cluster toward the centre of the chart, creating something of a bell-shaped curve. In statistical terms, this is called a normal distribution.

As you delve into quantitative data analysis, you’ll find that normal distributions are very common, but they’re certainly not the only type of distribution. In some cases, the data can lean toward the left or the right of the chart (i.e., toward the low end or high end). This lean is reflected by a measure called skewness, and it’s important to pay attention to this when you’re analysing your data, as this will have an impact on what types of inferential statistics you can use on your dataset.
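In practice, skewness is a single function call. A sketch using pandas (scipy.stats.skew behaves similarly), applied to the hypothetical ratings from the earlier sketch:

```python
import pandas as pd

ratings = pd.Series([2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10])
# Near 0 -> roughly symmetric; positive -> right tail; negative -> left tail.
print(ratings.skew())
```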

[Image: example of skewness]

Measures of dispersion

While the measures of central tendency provide insight into how “centred” the dataset is, it’s also important to understand how dispersed that dataset is. In other words, to what extent the data cluster toward the centre – specifically, the mean. In some cases, the majority of the data points will sit very close to the centre, while in other cases, they’ll be scattered all over the place. Enter the measures of dispersion, of which there are three:

Range, which measures the difference between the largest and smallest number in the dataset. In other words, it indicates how spread out the dataset really is.

Variance, which measures how much each number in a dataset varies from the mean (average). More technically, it calculates the average of the squared differences between each number and the mean. A higher variance indicates that the data points are more spread out, while a lower variance suggests that the data points are closer to the mean.

Standard deviation, which is the square root of the variance. It serves the same purposes as the variance, but is a bit easier to interpret as it presents a figure that is in the same unit as the original data. You’ll typically present this statistic alongside the means when describing the data in your research.

Again, let’s look at our sample dataset to make this all a little more tangible.

[Image: dispersion statistics for the example ratings]

As you can see, the range of 8 reflects the difference between the highest rating (10) and the lowest rating (2). The standard deviation of 2.18 tells us that on average, results within the dataset are 2.18 away from the mean (of 5.8), reflecting a relatively dispersed set of data.

For the sake of comparison, let’s look at another much more tightly grouped (less dispersed) dataset.

[Image: a more tightly grouped set of ratings]

As you can see, all the ratings lie between 5 and 8 in this dataset, resulting in a much smaller range, variance and standard deviation. You might also notice that the data are clustered toward the right side of the graph – in other words, the data are skewed. If we calculate the skewness for this dataset, we get a result of -0.12; the negative value confirms this lean, with the bulk of the data sitting to the right and a short tail trailing off to the left.

In summary, range, variance and standard deviation all provide an indication of how dispersed the data are. These measures are important because they help you interpret the measures of central tendency within context. In other words, if your measures of dispersion are all fairly high numbers, you need to interpret your measures of central tendency with some caution, as the results are not particularly centred. Conversely, if the data are all tightly grouped around the mean (i.e., low dispersion), the mean becomes a much more “meaningful” statistic.

Key Takeaways

We’ve covered quite a bit of ground in this post. Here are the key takeaways:

  • Descriptive statistics, although relatively simple, are a critically important part of any quantitative data analysis.
  • Measures of central tendency include the mean (average), median and mode.
  • Skewness indicates whether a dataset leans to one side or another.
  • Measures of dispersion include the range, variance and standard deviation.


Descriptive Statistics – Types, Methods and Examples

Descriptive Statistics

Descriptive statistics is a branch of statistics that deals with the summarization and description of collected data. This type of statistics is used to simplify and present data in a manner that is easy to understand, often through visual or numerical methods. Descriptive statistics is primarily concerned with measures of central tendency, variability, and distribution, as well as graphical representations of data.

Here are the main components of descriptive statistics:

  • Measures of Central Tendency : These provide a summary statistic that represents the center point or typical value of a dataset. The most common measures of central tendency are the mean (average), median (middle value), and mode (most frequent value).
  • Measures of Dispersion or Variability : These provide a summary statistic that represents the spread of values in a dataset. Common measures of dispersion include the range (difference between the highest and lowest values), variance (average of the squared differences from the mean), standard deviation (square root of the variance), and interquartile range (difference between the upper and lower quartiles).
  • Measures of Position : These are used to understand the distribution of values within a dataset. They include percentiles and quartiles.
  • Graphical Representations : Data can be visually represented using various methods like bar graphs, histograms, pie charts, box plots, and scatter plots. These visuals provide a clear, intuitive way to understand the data.
  • Measures of Association : These measures provide insight into the relationships between variables in the dataset, such as correlation and covariance.

Descriptive Statistics Types

Descriptive statistics can be classified into two types:

Measures of Central Tendency

These measures help describe the center point or average of a data set. There are three main types:

  • Mean : The average value of the dataset, obtained by adding all the data points and dividing by the number of data points.
  • Median : The middle value of the dataset, obtained by ordering all data points and picking out the one in the middle (or the average of the two middle numbers if the dataset has an even number of observations).
  • Mode : The most frequently occurring value in the dataset.

Measures of Variability (or Dispersion)

These measures describe the spread or variability of the data points in the dataset. There are four main types:

  • Range : The difference between the largest and smallest values in the dataset.
  • Variance : The average of the squared differences from the mean.
  • Standard Deviation : The square root of the variance, giving a measure of dispersion that is in the same units as the original dataset.
  • Interquartile Range (IQR) : The range between the first quartile (25th percentile) and the third quartile (75th percentile), which provides a measure of variability that is resistant to outliers.

Descriptive Statistics Formulas

Here are some of the most commonly used formulas in descriptive statistics:

Mean (μ or x̄) :

The average of all the numbers in the dataset. It is computed by summing all the observations and dividing by the number of observations.

Formula : μ = Σx/n or x̄ = Σx/n (where Σx is the sum of all observations and n is the number of observations)

Median :

The middle value in the dataset when the observations are arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle numbers.

Mode :

The most frequently occurring number in the dataset. There’s no formula for this as it’s determined by observation.

Range :

The difference between the highest (max) and lowest (min) values in the dataset.

Formula : Range = max – min

Variance (σ² or s²) :

The average of the squared differences from the mean. Variance is a measure of how spread out the numbers in the dataset are.

Population Variance formula : σ² = Σ(x – μ)² / N

Sample Variance formula : s² = Σ(x – x̄)² / (n – 1)

(where x is each individual observation, μ is the population mean, x̄ is the sample mean, N is the size of the population, and n is the size of the sample)

Standard Deviation (σ or s) :

The square root of the variance. It measures the amount of variability or dispersion for a set of data.

Population Standard Deviation formula : σ = √σ²

Sample Standard Deviation formula : s = √s²

Interquartile Range (IQR) :

The range between the first quartile (Q1, 25th percentile) and the third quartile (Q3, 75th percentile). It measures statistical dispersion, or how far apart the data points are.

Formula : IQR = Q3 – Q1
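The formulas above translate directly into code. A sketch using a hypothetical sample, with numpy assumed only for the percentile-based IQR:

```python
import math
import numpy as np

data = [4, 8, 6, 5, 3, 7, 9, 5]  # hypothetical sample
n = len(data)

mean = sum(data) / n                                       # x̄ = Σx / n
rng = max(data) - min(data)                                # Range = max - min
pop_var = sum((x - mean) ** 2 for x in data) / n           # σ² = Σ(x - μ)² / N
samp_var = sum((x - mean) ** 2 for x in data) / (n - 1)    # s² = Σ(x - x̄)² / (n - 1)
pop_sd, samp_sd = math.sqrt(pop_var), math.sqrt(samp_var)  # σ = √σ², s = √s²
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                                              # IQR = Q3 - Q1

print(f"mean={mean}, range={rng}, IQR={iqr}")
print(f"population SD={pop_sd:.2f}, sample SD={samp_sd:.2f}")
```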

Descriptive Statistics Methods

Here are some of the key methods used in descriptive statistics:

Tabulation

This method involves arranging data into a table format, making it easier to understand and interpret. Tables often show the frequency distribution of variables.

Graphical Representation

This method involves presenting data visually to help reveal patterns, trends, outliers, or relationships between variables. There are many types of graphs used, such as bar graphs, histograms, pie charts, line graphs, box plots, and scatter plots.

Calculation of Central Tendency Measures

This involves determining the mean, median, and mode of a dataset. These measures indicate where the center of the dataset lies.

Calculation of Dispersion Measures

This involves calculating the range, variance, standard deviation, and interquartile range. These measures indicate how spread out the data is.

Calculation of Position Measures

This involves determining percentiles and quartiles, which tell us about the position of particular data points within the overall data distribution.

Calculation of Association Measures

This involves calculating statistics like correlation and covariance to understand relationships between variables.

Summary Statistics

Often, a collection of several descriptive statistics is presented together in what’s known as a “summary statistics” table. This provides a comprehensive snapshot of the data at a glance.

Descriptive Statistics Examples

Descriptive Statistics Examples are as follows:

Example 1: Student Grades

Let’s say a teacher has the following set of grades for 7 students: 85, 90, 88, 92, 78, 88, and 94. The teacher could use descriptive statistics to summarize this data:

  • Mean (average) : (85 + 90 + 88 + 92 + 78 + 88 + 94)/7 = 615/7 ≈ 87.86
  • Median (middle value) : First, rearrange the grades in ascending order (78, 85, 88, 88, 90, 92, 94). The median grade is 88.
  • Mode (most frequent value) : The grade 88 appears twice, more frequently than any other grade, so it’s the mode.
  • Range (difference between highest and lowest) : 94 (highest) – 78 (lowest) = 16
  • Variance and Standard Deviation : These would be calculated using the appropriate formulas, providing a measure of the dispersion of the grades.
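A quick check of these figures with Python’s statistics module:

```python
import statistics

grades = [85, 90, 88, 92, 78, 88, 94]
print(statistics.mean(grades))    # 615 / 7 ≈ 87.86
print(statistics.median(grades))  # 88
print(statistics.mode(grades))    # 88
print(max(grades) - min(grades))  # range = 16
```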

Example 2: Survey Data

A researcher conducts a survey on the number of hours of TV watched per day by people in a particular city. They collect data from 1,000 respondents and can use descriptive statistics to summarize this data:

  • Mean : Calculate the average hours of TV watched by adding all the responses and dividing by the total number of respondents.
  • Median : Sort the data and find the middle value.
  • Mode : Identify the most frequently reported number of hours watched.
  • Histogram : Create a histogram to visually display the frequency of responses. This could show, for example, that the majority of people watch 2-3 hours of TV per day.
  • Standard Deviation : Calculate this to find out how much variation there is from the average.
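A minimal matplotlib sketch of such a histogram; the hours below are a small hypothetical sample, not actual survey responses:

```python
import matplotlib.pyplot as plt

hours_tv = [1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 1, 2, 3, 4, 2, 3, 3, 5, 6]
plt.hist(hours_tv, bins=range(0, 8))  # one bin per hour of TV watched
plt.xlabel("Hours of TV per day")
plt.ylabel("Number of respondents")
plt.show()
```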

Importance of Descriptive Statistics

Descriptive statistics are fundamental in the field of data analysis and interpretation, as they provide the first step in understanding a dataset. Here are a few reasons why descriptive statistics are important:

  • Data Summarization : Descriptive statistics provide simple summaries about the measures and samples you have collected. With a large dataset, it’s often difficult to identify patterns or tendencies just by looking at the raw data. Descriptive statistics provide numerical and graphical summaries that can highlight important aspects of the data.
  • Data Simplification : They simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary, making it easier to understand and interpret the dataset.
  • Identification of Patterns and Trends : Descriptive statistics can help identify patterns and trends in the data, providing valuable insights. Measures like the mean and median can tell you about the central tendency of your data, while measures like the range and standard deviation tell you about the dispersion.
  • Data Comparison : By summarizing data into measures such as the mean and standard deviation, it’s easier to compare different datasets or different groups within a dataset.
  • Data Quality Assessment : Descriptive statistics can help identify errors or outliers in the data, which might indicate issues with data collection or entry.
  • Foundation for Further Analysis : Descriptive statistics are typically the first step in data analysis. They help create a foundation for further statistical or inferential analysis. In fact, advanced statistical techniques often assume that one has first examined their data using descriptive methods.

When to use Descriptive Statistics

They can be used in a wide range of situations, including:

  • Understanding a New Dataset : When you first encounter a new dataset, using descriptive statistics is a useful first step to understand the main characteristics of the data, such as the central tendency, dispersion, and distribution.
  • Data Exploration in Research : In the initial stages of a research project, descriptive statistics can help to explore the data, identify trends and patterns, and generate hypotheses for further testing.
  • Presenting Research Findings : Descriptive statistics can be used to present research findings in a clear and understandable way, often using visual aids like graphs or charts.
  • Monitoring and Quality Control : In fields like business or manufacturing, descriptive statistics are often used to monitor processes, track performance over time, and identify any deviations from expected standards.
  • Comparing Groups : Descriptive statistics can be used to compare different groups or categories within your data. For example, you might want to compare the average scores of two groups of students, or the variance in sales between different regions.
  • Reporting Survey Results : If you conduct a survey, you would use descriptive statistics to summarize the responses, such as calculating the percentage of respondents who agree with a certain statement.

Applications of Descriptive Statistics

Descriptive statistics are widely used in a variety of fields to summarize, represent, and analyze data. Here are some applications:

  • Business : Businesses use descriptive statistics to summarize and interpret data such as sales figures, customer feedback, or employee performance. For instance, they might calculate the mean sales for each month to understand trends, or use graphical representations like bar charts to present sales data.
  • Healthcare : In healthcare, descriptive statistics are used to summarize patient data, such as age, weight, blood pressure, or cholesterol levels. They are also used to describe the incidence and prevalence of diseases in a population.
  • Education : Educators use descriptive statistics to summarize student performance, like average test scores or grade distribution. This information can help identify areas where students are struggling and inform instructional decisions.
  • Social Sciences : Social scientists use descriptive statistics to summarize data collected from surveys, experiments, and observational studies. This can involve describing demographic characteristics of participants, response frequencies to survey items, and more.
  • Psychology : Psychologists use descriptive statistics to describe the characteristics of their study participants and the main findings of their research, such as the average score on a psychological test.
  • Sports : Sports analysts use descriptive statistics to summarize athlete and team performance, such as batting averages in baseball or points per game in basketball.
  • Government : Government agencies use descriptive statistics to summarize data about the population, such as census data on population size and demographics.
  • Finance and Economics : In finance, descriptive statistics can be used to summarize past investment performance or economic data, such as changes in stock prices or GDP growth rates.
  • Quality Control : In manufacturing, descriptive statistics can be used to summarize measures of product quality, such as the average dimensions of a product or the frequency of defects.

Limitations of Descriptive Statistics

While descriptive statistics are a crucial part of data analysis and provide valuable insights about a dataset, they do have certain limitations:

  • Lack of Depth : Descriptive statistics provide a summary of your data, but they can oversimplify the data, resulting in a loss of detail and potentially significant nuances.
  • Vulnerability to Outliers : Some descriptive measures, like the mean, are sensitive to outliers. A single extreme value can significantly skew your mean, making it less representative of your data.
  • Inability to Make Predictions : Descriptive statistics describe what has been observed in a dataset. They don’t allow you to make predictions or generalizations about unobserved data or larger populations.
  • No Insight into Correlations : While some descriptive statistics can hint at potential relationships between variables, they don’t provide detailed insights into the nature or strength of these relationships.
  • No Causality or Hypothesis Testing : Descriptive statistics cannot be used to determine cause and effect relationships or to test hypotheses. For these purposes, inferential statistics are needed.
  • Can Mislead : When used improperly, descriptive statistics can be used to present a misleading picture of the data. For instance, choosing to only report the mean without also reporting the standard deviation or range can hide a large amount of variability in the data.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer



Introduction to Descriptive Statistics

Submitted: 04 July 2023 Reviewed: 20 July 2023 Published: 07 September 2023

DOI: 10.5772/intechopen.1002475


From the Edited Volume

Recent Advances in Biostatistics

B. Santhosh Kumar


This chapter offers a comprehensive exploration of descriptive statistics, tracing its historical development from Condorcet’s “average” concept to Galton and Pearson’s contributions. Emphasizing its pivotal role in academia, descriptive statistics serve as a fundamental tool for summarizing and analyzing data across disciplines. The chapter underscores how descriptive statistics drive research inspiration and guide analysis, and provide a foundation for advanced statistical techniques. It delves into their historical context, highlighting their organizational and presentational significance. Furthermore, the chapter accentuates the advantages of descriptive statistics in academia, including their ability to succinctly represent complex data, aid decision-making, and enhance research communication. It highlights the potency of visualization in discerning data patterns and explores emerging trends like large dataset analysis, Bayesian statistics, and nonparametric methods. Sources of variance intrinsic to descriptive statistics, such as sampling fluctuations, measurement errors, and outliers, are discussed, stressing the importance of considering these factors in data interpretation.

Keywords:

  • academic research
  • data analysis
  • data visualization
  • decision-making
  • research methodology
  • data summarization

Author Information

Olubunmi Alabi*

  • African University of Science and Technology, Abuja, Nigeria

Tosin Bukola

  • University of Greenwich, London, United Kingdom

*Address all correspondence to: [email protected]

1. Introduction

Descriptive statistics trace their origins to the French mathematician and philosopher Condorcet, who established the idea of the “average” as a means to summarize data. Yet the widespread use of descriptive statistics in academic study did not begin until the 19th century. Francis Galton, who was interested in the examination of human features and attributes, was one of the major forerunners of descriptive statistics. Galton created various statistical methods that are still frequently applied in academic research today, such as the concepts of correlation and regression analysis. In the early 20th century, the English mathematician and statistician Karl Pearson popularized the “normal distribution,” the bell-shaped curve that characterizes the distribution of many natural occurrences. Moreover, Pearson created a number of correlational measures and popularized the chi-square test, which evaluates the significance of variations between observed and predicted frequencies. With the advent of new methods like multivariate analysis and factor analysis in the middle of the 20th century, the development of electronic computers sparked a revolution in statistical analysis. Descriptive statistics is the analysis and summarization of data to gain insights into its characteristics and distribution [ 1 ].

Descriptive statistics help researchers generate study ideas and guide further analysis by allowing them to explore data patterns and trends [ 2 ]. Descriptive statistics came to be used more often in academic research because they helped researchers better comprehend their datasets and served as a basis for more sophisticated statistical techniques. Similarly, descriptive statistics are used to summarize and analyze data in a variety of academic areas, including psychology, sociology, economics, education, and epidemiology [ 3 ]. Descriptive statistics continue to be a crucial research tool in academia today, giving researchers a method to compile and analyze data from many fields. Thanks to the development of new statistical techniques and computer tools, it is now simpler than ever to analyze and understand data, enabling researchers to make better informed judgments based on their results. Descriptive statistics can benefit researchers in hypothesis creation and exploratory analysis by identifying trends, patterns, and correlations between variables in huge datasets [ 4 ]. Descriptive statistics are important in data-driven decision-making processes because they allow stakeholders to make educated decisions based on reliable data [ 5 ].

2. Background

The history of descriptive statistics may be traced back to the 17th century, when early pioneers like John Graunt and William Petty laid the groundwork for statistical analysis [ 6 ]. Descriptive statistics is a fundamental concept in academia that is widely used across many disciplines, including social sciences, economics, medicine, engineering, and business. Descriptive statistics provides a comprehensive background for understanding data by organizing, summarizing, and presenting information effectively [ 7 ]. In academia, descriptive statistics is used to summarize and analyze data, providing insights into the patterns, trends, and characteristics of a dataset. Similarly, in academic research, descriptive statistics are often used as a preliminary analysis technique to gain a better understanding of the dataset before applying more complex statistical methods. Descriptive statistics lay the groundwork for inferential statistics by assisting researchers in drawing inferences about a population based on observed sample data [ 8 ]. Descriptive statistics aid in the identification and analysis of outliers, which can give useful insights into unusual observations or data collecting problems [ 9 ].

Descriptive statistics enable researchers to synthesize both quantitative and qualitative data, allowing for a thorough examination of factors [ 10 ]. Descriptive statistics can provide valuable information about the central tendency, variability, and distribution of the data, allowing researchers to make informed decisions about the appropriate statistical techniques to use. Descriptive statistics are an essential component of survey research technique, allowing researchers to efficiently summarize and display survey results [ 11 ]. Descriptive statistics may be used to summarize data as well as spot outliers, or observations that dramatically depart from the trend of the data as a whole. Finding outliers can help researchers spot any issues or abnormalities in the data so they can make the necessary modifications or repairs. In academic research, descriptive statistics are frequently employed to address research issues and evaluate hypotheses. Descriptive statistics, for instance, can be used to compare the average scores of two groups to see if there is a significant difference between them. In order to create new hypotheses or validate preexisting ideas, descriptive statistics may also be used to find patterns and correlations in the data.

There are several sources of variation that can affect the descriptive statistics of a data set, including:

  • Sampling variation : Descriptive statistics are often calculated using a sample of data rather than the entire population, so they can vary depending on the particular sample that is selected.
  • Measurement variation : Different measurement methods can produce different results. For example, if a scale is used to measure the weight of objects, slight differences in how the scale is used can produce slightly different measurements.
  • Data entry errors : Mistakes made during the data entry process can lead to variation in descriptive statistics. Even small errors, such as transposing two digits, can significantly impact the results.
  • Outliers : Extreme values that fall outside of the expected range can skew the descriptive statistics, making them appear more or less extreme than they actually are.
  • Natural variation : The inherent variability in the data itself. For example, a data set containing measurements of tree heights will naturally contain variation in those heights.

It is important to understand these sources of variation when interpreting and using descriptive statistics in academia. Properly accounting for them can help ensure that the descriptive statistics accurately reflect the underlying data.

Some emerging patterns in descriptive statistics in academia include:

  • Big data analysis : With the increasing availability of large data sets, researchers are using descriptive statistics to identify patterns and trends in the data. Big data techniques, such as machine learning and data mining, are becoming more common in academic research.
  • Visualization techniques : Advances in data visualization are enabling researchers to more easily identify patterns in data sets. For example, heat maps and scatter plots can be used to visualize the relationship between different variables.
  • Bayesian statistics : An emerging area of research that uses probability theory to make inferences about data. Bayesian statistics can provide more accurate estimates of descriptive statistics, particularly when dealing with complex data sets.
  • Non-parametric statistics : Increasingly popular, particularly for data sets that do not meet the assumptions of traditional parametric statistical tests. Non-parametric tests do not require the data to be normally distributed and can be more robust to outliers.
  • Open science practices : Practices such as pre-registration and data sharing are becoming more common, enabling researchers to more easily replicate and verify the results of descriptive statistical analyses, which can improve the quality and reliability of research findings.

Overall, these emerging patterns reflect the increasing availability of data, the need for more accurate and robust statistical techniques, and a growing emphasis on transparency and openness in research practices.

3. Benefits of descriptive statistics

The advantages of descriptive statistics extend beyond research and academia, with applications in commercial decision-making, public policy, and strategic planning [ 12 ]. The benefits of descriptive statistics include providing a clear and concise summary of data, aiding in decision-making processes, and facilitating effective communication of findings [ 13 ]. Descriptive statistics provide numerous benefits to academia, some of which include:

  • Summarization of data : Descriptive statistics allow researchers to quickly and efficiently summarize large data sets, providing a snapshot of the key characteristics of the data. This can help researchers identify patterns and trends and can also simplify complex data sets.
  • Better decision making : Descriptive statistics can help researchers make data-driven decisions. For example, if a researcher is comparing the effectiveness of two different treatments, descriptive statistics can be used to identify which treatment is more effective based on the data.
  • Visualization of data : Descriptive statistics can be used to create visualizations of data, which can make it easier to communicate research findings to others. Histograms, bar charts, and scatterplots are examples of data visualization techniques that may be used to graphically depict data in order to detect trends, outliers, and correlations [ 14 ]. Visualizations can also reveal patterns and trends in the data that might not be immediately apparent from raw data.
  • Hypothesis testing : Descriptive statistics are often used in hypothesis testing, which allows researchers to determine whether a particular hypothesis about a data set is supported by the data. This can help validate research findings and increase confidence in the conclusions drawn from the data.
  • Improved data quality : Descriptive statistics can help to identify errors or inconsistencies in the data, which can help researchers improve the quality of the data. This can lead to more accurate research findings and a better understanding of the underlying phenomena.

Overall, descriptive statistics help researchers summarize large data sets, make data-driven decisions, visualize data, validate research findings, and improve the quality of the data. By using descriptive statistics, researchers can gain valuable insights into complex data sets and make more informed decisions based on the data.

4. Practical applications of descriptive statistics

Descriptive statistics has practical applications in disciplines such as business, social sciences, healthcare, finance, and market research [ 15 ]. In academia, practical applications include:

  • Data summarization : Descriptive statistics can be used to summarize large data sets, making it easier for researchers to understand the key characteristics of the data. This is particularly useful when dealing with complex data sets that contain many variables.
  • Hypothesis testing : Descriptive statistics can be used to test hypotheses about a data set. For example, researchers can test whether the mean value of a particular variable is significantly different from a hypothesized value.
  • Data visualization : Descriptive statistics can be used to create visualizations that make it easier to identify patterns and trends in the data. For example, a histogram or boxplot can be used to visualize the distribution of a variable.
  • Comparing groups : Descriptive statistics can be used to compare different groups within a data set. For example, researchers may compare the mean values of a particular variable between different demographic groups, such as age or gender.
  • Predictive modeling : Descriptive statistics can be used to build predictive models, which can forecast future trends or outcomes. For example, a researcher might use descriptive statistics to identify the key variables that predict student performance in a particular course.

These applications span many fields, including psychology, economics, sociology, and biology, among others, providing insights into complex data sets and helping researchers make data-driven decisions ( Figure 1 ).

Figure 1. Types of descriptive statistics. Ref: https://www.analyticssteps.com/blogs/types-descriptive-analysis-examples-steps

Descriptive statistics is a useful tool for researchers in a variety of sectors, since it allows them to express the major characteristics of a dataset, such as its frequency, central tendency, variability, and distribution.

4.1 Central tendency measurements

Central tendency metrics, such as mean, median, and mode, are essential descriptive statistics that offer information about the average or typical value in a collection [ 16 ]. One of the primary purposes of descriptive statistics is to summarize data in a succinct and useful manner. Measures of central tendency, such as the median, are resistant to outliers and offer a more representative assessment of the average value in a skewed distribution [ 17 ]. The mean, median, and mode are measures of central tendency that are used to characterize the typical or center value of a dataset. The mean of a dataset is the arithmetic average, while the median is the midway number when the data is ordered by magnitude, and the mode is the most often occurring value in the collection. Central tendency measurements are one of the most important aspects of descriptive statistics, as they provide a summary of the “typical” value of a data set.

The three most commonly used measures of central tendency are:

  • Mean : The mean is calculated by adding up all the values in a data set and dividing by the total number of values. The mean is sensitive to outliers, as even one extreme value can greatly affect it.
  • Median : The median is the middle value in a data set when the values are ordered from smallest to largest. If the data set has an odd number of values, the median is the middle value; if it has an even number of values, the median is the average of the two middle values. The median is more robust to outliers than the mean.
  • Mode : The mode is the most common value in a data set. In some cases, there may be multiple modes (i.e., bimodal or multimodal distributions). The mode is useful for identifying the most frequently occurring value.

Each of these measures provides a different perspective on the “typical” value of a data set, and the most appropriate measure depends on the nature of the data and the research question being addressed. For example, if the data set contains extreme outliers, the median may be a better measure of central tendency than the mean. Conversely, if the data set is symmetrical and normally distributed, the mean may provide the best measure of central tendency.

4.2 Variability indices

Another key part of descriptive statistics is determining data variability. The spread or dispersion of data points about the central tendency readings is quantified by variability indices such as range, variance, and standard deviation [ 18 ]. Variability measures reveal information about the spread or dispersion of the data. Variability indices, such as the coefficient of variation, allow you to compare variability across various datasets with different scales or units of measurement [ 19 ]. The range is the distance between the dataset’s greatest and lowest values, while the variance and standard deviation measure how much the data values depart from the mean. Variability indices, such as the interquartile range, give insights into data distribution while being less impacted by extreme values than the standard deviation [ 20 ]. Some commonly used variability indices include:

  • Range: the difference between the largest and smallest values in a data set. It provides a simple measure of the spread of the data, but is sensitive to outliers.
  • Interquartile range (IQR): the range of the middle 50% of the data, calculated by subtracting the 25th percentile (lower quartile) from the 75th percentile (upper quartile). The IQR is more robust to outliers than the range.
  • Variance: a measure of how spread out the data is around the mean, calculated by taking the average of the squared differences between each data point and the mean. The variance is sensitive to outliers.
  • Standard deviation: the square root of the variance. It measures how much the data varies from the mean, and is more commonly used than the variance because it has the same units as the original data.

  • Coefficient of variation (CV): a measure of relative variability, expressed as a percentage. It is calculated by dividing the standard deviation by the mean and multiplying by 100. The CV is useful for comparing variability across data sets that have different units or scales.

Together, these variability indices describe the spread of the data, which can help researchers better understand its characteristics and draw meaningful conclusions from it.
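To make these definitions concrete, here is a small Python sketch, again using only the standard-library statistics module on hypothetical data; note that statistics.variance and statistics.stdev apply the sample (n - 1) formulas.

```python
import statistics

data = [12, 7, 3, 7, 21, 7, 10, 9]  # hypothetical measurements

value_range = max(data) - min(data)          # largest minus smallest value
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points (Python 3.8+)
iqr = q3 - q1                                # spread of the middle 50% of the data
var = statistics.variance(data)              # sample variance (n - 1 denominator)
sd = statistics.stdev(data)                  # square root of the variance
cv = 100 * sd / statistics.mean(data)        # relative variability, in percent

print(value_range, iqr, round(var, 2), round(sd, 2), round(cv, 1))
```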

4.3 Data visualization

Data may be represented visually using graphical approaches in addition to numerical metrics. Graphs and charts allow researchers to investigate data patterns and correlations, and to detect outliers, i.e., data points that deviate dramatically from the rest of the data. Box plots and violin plots are efficient visualization approaches for showing data distribution and spotting potential outliers [ 21 ], and the application of graphical approaches such as scatterplots and heat maps improves comprehension of correlations and patterns in large datasets [ 22 ]. Data visualization is an important aspect of descriptive statistics, as it allows researchers to communicate complex data in a visual and easily understandable format. Some common types of data visualization used in descriptive statistics include:

  • Histograms: histograms are used to display the distribution of a continuous variable. The data is divided into intervals (or “bins”), and the number of observations falling into each bin is displayed on the vertical axis. Histograms provide a visual representation of the shape of the distribution, and can help to identify outliers or skewness.
  • Box plots: box plots provide a graphical representation of the distribution of a continuous variable. The box represents the middle 50% of the data, with the median displayed as a horizontal line inside the box. The whiskers extend to the minimum and maximum values in the data set, and any outliers are displayed as points outside the whiskers. Box plots are useful for comparing distributions across different groups or for identifying outliers.
  • Scatter plots: scatter plots are used to display the relationship between two continuous variables. Each data point is represented as a point on the graph, with one variable displayed on the horizontal axis and the other on the vertical axis. Scatter plots can help to identify patterns or relationships in the data, such as a positive or negative correlation.
  • Bar charts: bar charts are used to display the distribution of a categorical variable. The categories are displayed on the horizontal axis, and the frequency or percentage of observations falling into each category is displayed on the vertical axis. Bar charts can help to compare the frequency of different categories or to display the results of a survey or questionnaire.
  • Heat maps: heat maps are used to display the relationship between two categorical variables. The categories are displayed on both the horizontal and vertical axes, and the frequency or percentage of observations falling into each combination of categories is displayed using a color scale. Heat maps can help to identify patterns or relationships in the data, such as a higher frequency of observations in certain combinations of categories.

These types of data visualization help researchers communicate complex data in a clear and understandable format, and can also provide insights into characteristics of the data that may not be immediately apparent from the raw values.
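As a rough sketch of how two of these plots could be produced, the snippet below uses matplotlib on randomly generated, hypothetical values; any real analysis would of course substitute its own variable.

```python
import random
import matplotlib.pyplot as plt

random.seed(1)
values = [random.gauss(50, 10) for _ in range(200)]  # hypothetical continuous variable

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.hist(values, bins=15)  # histogram: shape of the distribution
ax1.set_title("Histogram")

ax2.boxplot(values)        # box plot: median, middle 50%, whiskers, outliers
ax2.set_title("Box plot")

plt.tight_layout()
plt.show()
```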

4.4 Data cleaning and preprocessing

Data cleaning and preprocessing procedures, such as imputation methods for missing data, aid in the preservation of data integrity and the reduction of bias in descriptive analysis [ 23 ]. Before beginning any statistical analysis, researchers should make certain that the data is clean and well organized. Data cleaning is the process of discovering and fixing flaws or inconsistencies in the data, such as missing values, outliers, and other inconsistencies, and is required to assure the correctness and dependability of descriptive statistics [ 25 ]. Data preprocessing is the process of putting data into an appropriate format for analysis, such as scaling or normalizing the data. Both are essential steps in descriptive analysis, as they help to ensure that the data is accurate, complete, and ready for analysis. Some common data cleaning and preprocessing steps include:

  • Handling missing data: missing data is a common problem in datasets and can impact the accuracy of the analysis. Depending on the amount of missing data, researchers may choose to remove incomplete cases or impute missing values using techniques such as mean imputation, regression imputation, or multiple imputation.
  • Handling outliers: outliers are extreme values that differ from the majority of the data points and can distort the analysis. Outlier identification and removal procedures help increase the accuracy and reliability of descriptive statistics [ 24 ], and researchers may choose to remove or transform outliers to better reflect the characteristics of the data.
  • Data transformation: data transformation is used to normalize the data or to make it easier to analyze. Common transformations include logarithmic, square root, or Box-Cox transformations.
  • Handling categorical data: categorical data, such as nominal or ordinal data, may need to be recoded into numerical form before analysis. Researchers may also need to handle missing or inconsistent categories within the data.
  • Standardizing data: standardizing data involves scaling the data to have a mean of zero and a standard deviation of one. This can be useful for comparing variables with different units or scales.
  • Data integration: data integration involves merging or linking multiple datasets to create a single, comprehensive dataset for analysis. This may involve matching or merging datasets based on common variables or identifiers.

By performing these data cleaning and preprocessing steps, researchers can ensure that the data is accurate and ready for analysis, which can lead to more reliable and meaningful insights from the data.
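A brief sketch of three of these steps (mean imputation, IQR-based outlier flagging, and standardization) using pandas follows; the column name and values are hypothetical, chosen only to illustrate the operations.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one extreme outlier
df = pd.DataFrame({"income": [32_000, 41_000, np.nan, 38_000, 250_000, 35_000]})

# Mean imputation: replace missing values with the column mean
df["income"] = df["income"].fillna(df["income"].mean())

# Flag outliers with the common 1.5 * IQR rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

# Standardize: rescale to mean 0 and standard deviation 1
df["income_z"] = (df["income"] - df["income"].mean()) / df["income"].std()

print(df)
```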

5. Descriptive statistics in academic methodology

Descriptive statistics are important in academic methodology because they enable researchers to synthesize and describe data collected for research purposes [ 26 ]. Descriptive statistics is often used in combination with other statistical techniques, such as inferential statistics, to draw conclusions and make predictions from the data. In academic research, descriptive statistics is used in a variety of ways:

  • Describing sample characteristics: descriptive statistics is used to describe the characteristics of a sample, such as the mean, median, and standard deviation of a variable. This information can be used to identify patterns, trends, or differences within the sample.
  • Identifying data outliers: descriptive statistics can help researchers identify potential outliers or anomalies in the data, which can affect the validity of the results. For example, identifying extreme values in a dataset can help researchers investigate whether these values are due to measurement error or a true characteristic of the population.

  • Communicating research findings: descriptive statistics is used to summarize and communicate research findings in a clear and concise manner. Graphs, charts, and tables can display descriptive statistics in a way that is easy to understand and interpret.
  • Testing assumptions: descriptive statistics can be used to test assumptions about the data, such as normality or homogeneity of variance, which are important for selecting appropriate statistical tests and interpreting the results.

Overall, descriptive statistics is a critical methodology in academic research that helps researchers describe and understand the characteristics of their data. By using descriptive statistics, researchers can draw meaningful insights and conclusions from their data, and communicate these findings to others in a clear and concise manner.

6. Pitfalls of descriptive statistics

The main disadvantages of descriptive statistics are the potential for misinterpretation, reliance on summary measures alone, and susceptibility to extreme values or outliers [ 27 ]. While descriptive statistics is an essential tool in academic research, there are several potential pitfalls that researchers should be aware of:

  • Limited scope: descriptive statistics can provide a useful summary of the characteristics of a dataset, but it is limited in its ability to provide insights into the underlying causes or mechanisms that drive the data. Descriptive statistics alone cannot establish causal relationships or test hypotheses.
  • Misleading interpretations: descriptive statistics can be misleading if not interpreted correctly. For example, a small sample may not accurately represent the population, and summary statistics such as the mean may not be meaningful if the data is not normally distributed.

  • Incomplete analysis: descriptive statistics can only provide a limited view of the data, and researchers may need additional statistical techniques to analyze it fully. For example, hypothesis testing and regression analysis may be needed to establish relationships between variables and make predictions.
  • Biased data: descriptive statistics can be biased if the data is not representative of the population of interest. Sampling bias, measurement bias, or non-response bias can all undermine the validity of descriptive statistics.
  • Over-reliance on summary statistics: summary statistics such as the mean or median may not provide a complete picture of the data. Visualizations and other descriptive statistics, such as measures of variability, can provide additional insight.

To avoid these pitfalls, researchers should carefully consider the scope and limitations of descriptive statistics and use additional statistical techniques as needed. They should also ensure that their data is representative of the population of interest and interpret their descriptive statistics in a thoughtful and nuanced manner.

7. Conclusion

Descriptive statistics has become a fundamental methodology in academic research, used to summarize and describe the characteristics of a dataset, such as the central tendency, variability, and distribution of the data. It is applied in a wide range of disciplines, including the social sciences, natural sciences, engineering, and business. Descriptive statistics can be used to describe sample characteristics, identify data outliers, communicate research findings, and test assumptions; for instance, researchers can check the normality assumptions of their data using measures of skewness and kurtosis [ 28 ]. The kind of data, the research topic, and the particular aims of the study all influence the appropriate choice and implementation of descriptive statistical approaches [ 29 ].

However, there are several potential pitfalls of descriptive statistics, including limited scope, misleading interpretations, incomplete analysis, biased data, and over-reliance on summary statistics. The use of descriptive statistics in data presentation can improve the interpretability of study findings, making complicated material more accessible to a larger audience [ 30 ]. To use descriptive statistics effectively in academic research, researchers should carefully consider the limitations and scope of the methodology, use additional statistical techniques as needed, ensure that their data is representative of the population of interest, and interpret their descriptive statistics in a thoughtful and nuanced manner.

Conflict of interest

The authors declare no conflict of interest.

  • 1. Agresti A, Franklin C. Statistics: The Art and Science of Learning from Data. Upper Saddle River, NJ: Pearson; 2009
  • 2. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 4th ed. Shelton (CT): PMPH-USA; 2014
  • 3. Cohen J, Cohen P, West SG, Aiken LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New York: Routledge; 2013
  • 4. Osborne J. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation. 2005; 10 (7):1-9
  • 5. Field A, Hole G. How to Design and Report Experiments. London: Sage; 2003
  • 6. Hald A. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley; 1998. p. xvii+795. ISBN 0-471-17912-4
  • 7. Warner RM. Applied Statistics: From Bivariate Through Multivariate Techniques. 2nd ed. Thousand Oaks, CA: SAGE Publications; 2012
  • 8. Sullivan LM, Artino AR Jr. Analyzing and interpreting continuous data using ordinal regression. Journal of Graduate Medical Education. 2013; 5 (4):542-543
  • 9. Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; 2011
  • 10. Maxwell SE, Delaney HD, Kelley K. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Routledge; 2017
  • 11. De Leeuw ED, Hox JJ. International Handbook of Survey Methodology. Routledge; 2008
  • 12. Chatfield C. The Analysis of Time Series: An Introduction. CRC Press; 2016
  • 13. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Pearson; 2013
  • 14. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016
  • 15. Field A, Miles J, Field Z. Discovering Statistics Using R. Sage; 2012
  • 16. Howell DC. Statistical Methods for Psychology. Cengage Learning; 2013
  • 17. Wilcox RR. Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. CRC Press; 2017
  • 18. Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. Pearson; 2019
  • 19. Beasley TM, Schumacker RE. Multiple regression approach to analyzing contingency tables: Post hoc and planned comparison procedures. Journal of Experimental Education. 2013; 81 (3):310-312
  • 20. Dodge Y. The Concise Encyclopedia of Statistics. Springer Science & Business Media; 2008
  • 21. Krzywinski M, Altman N. Points of significance: Visualizing samples with box plots. Nature Methods. 2014; 11 (2):119-120
  • 22. Cleveland WS. Visualizing data. Hobart Press; 1993
  • 23. Little RJ, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2019
  • 24. Filzmoser P, Maronna R, Werner M. Outlier identification in high dimensions. Computational Statistics & Data Analysis. 2008; 52 (3):1694-1711
  • 25. Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr, Desarbo WS. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons; 2017
  • 26. Aguinis H, Gottfredson RK. Statistical power analysis in HRM research. Organizational Research Methods. 2013; 16 (2):289-324
  • 27. Stevens JP. Applied Multivariate Statistics for the Social Sciences. Routledge; 2012
  • 28. Byrne BM. Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming. Routledge; 2016
  • 29. Everitt BS, Hothorn T. An Introduction to Applied Multivariate Analysis with R. Springer; 2011
  • 30. Kosslyn SM. Graph Design for the Eye and Mind. Oxford University Press; 2006

© The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


14 Quantitative analysis: Descriptive statistics

Numeric data collected in a research project can be analysed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarise themselves with one of these programs for understanding the concepts described in this chapter.

Data preparation

In research projects, data may be collected from a variety of sources: postal surveys, interviews, pretest or posttest experimental data, observational data, and so forth. This data must be converted into a machine-readable, numeric format, such as in a spreadsheet or a text file, so that they can be analysed by computer programs like SPSS or SAS. Data preparation usually follows the following steps:

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale, and whether this scale is a five-point, seven-point scale, etc.), and how to code each value into a numeric format. For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from ‘strongly disagree’ to ‘strongly agree’, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, nominal data cannot be analysed statistically). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, if a survey measuring a construct such as ‘benefits of computers’ provided respondents with a checklist of benefits that they could select from, and respondents were encouraged to choose as many of those benefits as they wanted, then the total number of checked items could be used as an aggregate measure of benefits. Note that many other forms of data—such as interview transcripts—cannot be converted into a numeric format for statistical analysis. Codebooks are especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.
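To illustrate what such a coding scheme might look like in practice, here is a small Python sketch mirroring the Likert and industry examples above; the dictionaries and helper function are hypothetical, not part of any codebook standard.

```python
# Hypothetical coding schemes, as a codebook might specify them
LIKERT_CODES = {"strongly disagree": 1, "disagree": 2, "somewhat disagree": 3,
                "neutral": 4, "somewhat agree": 5, "agree": 6, "strongly agree": 7}
INDUSTRY_CODES = {"manufacturing": 1, "retailing": 2, "financial": 3, "healthcare": 4}

def code_response(raw: str, scheme: dict) -> int:
    """Convert a text response into its numeric code."""
    return scheme[raw.strip().lower()]

print(code_response("Strongly Agree", LIKERT_CODES))  # 7
print(code_response("Healthcare", INDUSTRY_CODES))    # 4
```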

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format—e.g., SPSS stores data as .sav files—which makes it difficult to share that data with other statistical programs. Hence, it is often better to enter data into a spreadsheet or database where it can be reorganised as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller data sets with less than 65,000 observations and 256 items can be stored in a spreadsheet created using a program such as Microsoft Excel, while larger datasets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet, and each measurement item can be represented as one column. Data should be checked for accuracy during and after entry via occasional spot checks on a set of items or observations. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the ‘strongly agree’ response to all items irrespective of content, including reverse-coded items. If so, such data can be entered but should be excluded from subsequent analysis.


Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse coded items—where items convey the opposite meaning of that of their underlying construct—should be reversed (e.g., in a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
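As a sketch of the transformations just described, the two small Python functions below reverse-code a 1-7 item and collapse incomes into ranges; the function names and income cut points are illustrative assumptions, not values from the text.

```python
def reverse_code(value: int, scale_max: int = 7) -> int:
    """Reverse a 1..scale_max item: on a 1-7 scale, 8 minus the observed value."""
    return (scale_max + 1) - value

def income_bracket(income: float) -> str:
    """Collapse a raw income into a coarser category (hypothetical cut points)."""
    if income < 25_000:
        return "low"
    elif income < 75_000:
        return "middle"
    return "high"

print(reverse_code(2))           # 6
print(income_bracket(48_500.0))  # middle
```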

Univariate analysis

Univariate analysis—or analysis of a single variable—refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: frequency distribution, central tendency, and dispersion. The frequency distribution of a variable is a summary of the frequency—or percentages—of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services—as a gauge of their ‘religiosity’—using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for ‘did not answer’. If we count the number or percentage of observations within each category—except ‘did not answer’ which is really a missing value rather than a category—and display it in the form of a table, as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.

Frequency distribution of religiosity

With very large samples, where observations are independent and random, the frequency distribution tends to follow a plot that looks like a bell-shaped curve—a smoothed bar chart of the frequency distribution—similar to that shown in Figure 14.2. Here most observations are clustered toward the centre of the range of values, with fewer and fewer observations clustered toward the extreme ends of the range. Such a curve is called a normal distribution .

For instance, consider a set of eight test scores: 15, 20, 21, 20, 36, 15, 25, and 15. The mean is the sum of all the values divided by the number of values:

\[(15 + 20 + 21 + 20 + 36 + 15 + 25 + 15)/8 = 20.875\]

The median is the value in the middle of the ordered scores; here, the two middle scores are both 20, so the median is 20.

Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value that is estimated from a sample, such as mean, median, mode, or any of the later estimates are called a statistic .

Dispersion refers to how spread out the values are around the central tendency. The simplest measure of dispersion is the range, the highest value minus the lowest value, which for these scores is 36 - 15 = 21.

Bivariate analysis

Bivariate analysis examines how two variables are related to one another. The most common bivariate statistic is the bivariate correlation —often, simply called ‘correlation’—which is a number between -1 and +1 denoting the strength of the relationship between two variables. Say that we wish to study how age is related to self-esteem in a sample of 20 respondents—i.e., as age increases, does self-esteem increase, decrease, or remain unchanged? If self-esteem increases, then we have a positive correlation between the two variables, if self-esteem decreases, then we have a negative correlation, and if it remains the same, we have a zero correlation. To calculate the value of this correlation, consider the hypothetical dataset shown in Table 14.1.


After computing bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question would require testing the following hypothesis:

\[H_0:\quad r = 0 \]
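One standard way to carry out this test is to compute the Pearson correlation together with its p-value. The sketch below uses scipy.stats.pearsonr on hypothetical age and self-esteem values; these numbers are illustrative and are not the data of Table 14.1.

```python
from scipy import stats

# Hypothetical paired observations for ten respondents
age = [21, 25, 30, 34, 38, 42, 47, 51, 56, 60]
self_esteem = [3.2, 3.4, 3.9, 3.8, 4.1, 4.0, 4.4, 4.3, 4.6, 4.8]

r, p_value = stats.pearsonr(age, self_esteem)
print(f"r = {r:.3f}, p = {p_value:.4f}")
# A small p-value is evidence against H0: r = 0 (no correlation)
```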

Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Descriptive statistics in research: a critical component of data analysis

With any data, the objective is to describe the population at large. But what does that mean, and what processes, methods and measures are used to uncover insights from that data? In this short guide, we explore descriptive statistics and how they are applied to research.

What do we mean by descriptive statistics?

With any kind of data, the main objective is to describe a population at large — and using descriptive statistics, researchers can quantify and describe the basic characteristics of a given data set.

For example, researchers can condense large data sets, which may contain thousands of individual data points or observations, into a series of statistics that provide useful information on the population of interest. We call this process “describing data”.

In the process of producing summaries of the sample, we use measures like mean, median, variance, graphs, charts, frequencies, histograms, box and whisker plots, and percentages. For datasets with just one variable, we use univariate descriptive statistics. For datasets with multiple variables, we use bivariate correlation and multivariate descriptive statistics.

Want to find out the definitions?

Univariate descriptive statistics: this is when you want to describe data with only one characteristic or attribute

Bivariate correlation: this is when you simultaneously analyze (compare) two variables to see if there is a relationship between them

Multivariate descriptive statistics: this is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable

Then, after describing and summarizing the data, as well as using simple graphical analyses, we can start to draw meaningful insights from it to help guide specific strategies. It’s also important to note that descriptive statistics can draw on both quantitative and qualitative research.

Describing data is undoubtedly the most critical first step in research as it enables the subsequent organization, simplification and summarization of information — and every survey question and population has summary statistics. Let’s take a look at a few examples.

Examples of descriptive statistics

Consider for a moment a number used to summarize how well a striker is performing in football: shot conversion rate. This number is simply the number of goals scored divided by the number of shots taken (reported to three significant digits). If a striker is converting at 0.333, that’s one goal for every three shots. If they’re scoring one in four, that’s 0.250.

A classic example is a student’s grade point average (GPA). This single number describes the general performance of a student across a range of course experiences and classes. It doesn’t tell us anything about the difficulty of the courses the student is taking, or what those courses are, but it does provide a summary that enables a degree of comparison with people or other units of data.

Ultimately, descriptive statistics make it incredibly easy for people to understand complex (or data intensive) quantitative or qualitative insights across large data sets.


Types of descriptive statistics

To quantitatively summarize the characteristics of raw, ungrouped data, we use the following types of descriptive statistics:

  • Measures of Central Tendency,
  • Measures of Dispersion, and
  • Measures of Frequency Distribution.

Following the application of any of these approaches, the raw data then becomes ‘grouped’ data that’s logically organized and easy to understand. To visually represent the data, we then use graphs, charts, tables etc.

Let’s look at the different types of measurement and the statistical methods that belong to each:

Measures of Central Tendency are used to describe data by determining a single representative central value, for example the mean, median or mode.

Measures of Dispersion are used to determine how spread out a data distribution is with respect to the central value, e.g. the mean, median or mode. While central tendency gives you the average or central value, it doesn’t describe how the data is distributed within the set.

Measures of Frequency Distribution are used to describe the occurrence of data within the data set (count).

The methods of each measure are summarized in the table below:

Measures of Central Tendency: mean, median, mode.
Measures of Dispersion: range, standard deviation, quartile deviation, variance, absolute deviation.
Measures of Frequency Distribution: count.

Mean: The most popular and well-known measure of central tendency. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

Median: The median is the middle score for a set of data that has been arranged in order of magnitude. If you have an even number of data, e.g. 10 data points, take the two middle scores and average the result.

Mode: The mode is the most frequently occurring observation in the data set.  

Range: The difference between the highest and lowest value.

Standard deviation: Standard deviation measures the dispersion of a data set relative to its mean and is calculated as the square root of the variance.

Quartile deviation: Quartile deviation measures the deviation in the middle of the data, defined as half the interquartile range.

Variance: Variance measures the variability from the average, or mean.

Absolute deviation: The absolute deviation of a dataset is the average distance between each data point and the mean.

Count: How often each value occurs.
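To make the less familiar of these measures concrete, the sketch below computes the quartile deviation, the mean absolute deviation, and the counts for a small set of hypothetical ratings using Python's standard library.

```python
import statistics
from collections import Counter

ratings = [4, 8, 6, 5, 3, 8, 9, 7, 8, 6]  # hypothetical ratings

q1, _, q3 = statistics.quantiles(ratings, n=4)
quartile_deviation = (q3 - q1) / 2  # half the spread of the middle of the data

mean = statistics.mean(ratings)
abs_dev = sum(abs(x - mean) for x in ratings) / len(ratings)  # avg distance from mean

counts = Counter(ratings)  # how often each value occurs

print(quartile_deviation, round(abs_dev, 2), counts.most_common(3))
```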

Scope of descriptive statistics in research

Descriptive statistics (or analysis) is broader in scope than many other quantitative and qualitative methods, as it provides a much fuller picture of an event, phenomenon or population.

But that’s not all: it can use any number of variables, and as it collects data and describes it as it is, it’s also far more representative of the world as it exists.

However, it’s also important to consider that descriptive analyses lay the foundation for further methods of study. By summarizing and condensing the data into easily understandable segments, researchers can further analyze the data to uncover new variables or hypotheses.

Mostly, this practice is all about the ease of data visualization. With data presented in a meaningful way, researchers have a simplified interpretation of the data set in question. That said, while descriptive statistics helps to summarize information, it only provides a general view of the variables in question.

It is, therefore, up to the researchers to probe further and use other methods of analysis to discover deeper insights.

Things you can do with descriptive statistics

Define subject characteristics

If a marketing team wanted to build out accurate buyer personas for specific products and industry verticals, they could use descriptive analyses on customer datasets (procured via a survey) to identify consistent traits and behaviors.

They could then ‘describe’ the data to build a clear picture and understanding of who their buyers are, including things like preferences, business challenges, income and so on.

Measure data trends

Let’s say you wanted to assess propensity to buy over several months or years for a specific target market and product. With descriptive statistics, you could quickly summarize the data and extract the precise data points you need to understand the trends in product purchase behavior.

Compare events, populations or phenomena

How do different demographics respond to certain variables? For example, you might want to run a customer study to see how buyers in different job functions respond to new product features or price changes. Are all groups as enthusiastic about the new features and likely to buy? Or do they have reservations? This kind of data will help inform your overall product strategy and potentially how you tier solutions.

Validate existing conditions

When you have a belief or hypothesis but need to prove it, you can use descriptive techniques to ascertain underlying patterns or assumptions.

Form new hypotheses

With the data presented and surmised in a way that everyone can understand (and infer connections from), you can delve deeper into specific data points to uncover deeper and more meaningful insights — or run more comprehensive research.

Guiding your survey design to improve the data collected

To use your surveys as an effective tool for customer engagement and understanding, every survey goal and item should answer one simple, yet highly important question:

What am I really asking?

It might seem trivial, but by having this question frame survey research, it becomes significantly easier for researchers to develop the right questions that uncover useful, meaningful and actionable insights.

Planning becomes easier, questions clearer and perspective far wider and yet nuanced.

Hypothesize – what’s the problem that you’re trying to solve? Far too often, organizations collect data without understanding what they’re asking, and why they’re asking it.

Finally, focus on the end result. What kind of data do you need to answer your question? Also, are you asking a quantitative or qualitative question? Here are a few things to consider:

  • Clear questions are clear for everyone. It takes time to make a concept clear.
  • Ask about measurable, evident and noticeable activities or behaviors.
  • Make rating scales easy. Avoid long lists, confusing scales or “don’t know” or “not applicable” options.
  • Ensure your survey makes sense and flows well. Reduce the cognitive load on respondents by making it easy for them to complete the survey.
  • Read your questions aloud to see how they sound.
  • Pretest by asking a few uninvolved individuals to answer.

Furthermore…

As well as understanding what you’re really asking, there are several other considerations for your data:

Keep it random

How you select your sample is what makes your research replicable and meaningful. Having a truly random sample helps prevent bias, increasing the quality of the evidence you find.

Plan for and avoid sample error

Before starting your research project, have a clear plan for avoiding sample error. Use larger sample sizes, and apply random sampling to minimize the potential for bias.

Don’t over sample

Remember, you can sample 500 respondents selected randomly from a population and they will closely reflect the actual population 95% of the time.

Think about the mode

Match your survey methods to the sample you select. For example, how do your current customers prefer communicating? Do they have any shared characteristics or preferences? A mixed-method approach is critical if you want to drive action across different customer segments.

Use a survey tool that supports you with the whole process

Survey research software can support researchers at every stage of the process.


These considerations have been included in Qualtrics’ survey software , which summarizes and creates visualizations of data, making it easy to access insights, measure trends, and examine results without complexity or jumping between systems.

Uncover your next breakthrough idea with Stats iQ™

What makes Qualtrics so different from other survey providers is that it is built in consultation with trained research professionals and includes high-tech statistical software like Qualtrics Stats iQ .

With just a click, the software can run specific analyses or automate statistical testing and data visualization. Testing parameters are automatically chosen based on how your data is structured (e.g. categorical data will run a statistical test like Chi-squared), and the results are translated into plain language that anyone can understand and put into action.

Get more meaningful insights from your data

Stats iQ includes a variety of statistical analyses, including: describe, relate, regression, cluster, factor, TURF, and pivot tables — all in one place!

Confidently analyze complex data

Built-in artificial intelligence and advanced algorithms automatically choose and apply the right statistical analyses and return the insights in plain English so everyone can take action.

Integrate existing statistical workflows

For more experienced stats users, built-in R code templates allow you to run even more sophisticated analyses by adding R code snippets directly in your survey analysis.

Advanced statistical analysis methods available in Stats iQ

Regression analysis – Measures the degree of influence of independent variables on a dependent variable (the relationship between two or multiple variables).

Analysis of Variance (ANOVA) test – Commonly used alongside a regression study to find out what effect independent variables have on the dependent variable. It can compare multiple groups simultaneously to see whether their means differ.

Conjoint analysis – Asks people to make trade-offs when making decisions, then analyses the results to give the most popular outcome. Helps you understand why people make the complex choices they do.

T-Test – Helps you compare whether two data groups have different mean values and allows the user to interpret whether differences are meaningful or merely coincidental.

Crosstab analysis – Used in quantitative market research to analyze categorical data – that is, variables that are different and mutually exclusive, and allows you to compare the relationship between two variables in contingency tables.
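Outside of any particular product, a basic crosstab of two categorical variables can be sketched in a few lines with pandas.crosstab; the survey data below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "job_function": ["IT", "Sales", "IT", "HR", "Sales", "IT", "HR", "Sales"],
    "response":     ["yes", "no", "yes", "yes", "yes", "no", "no", "yes"],
})

# Contingency table: counts of each job_function / response combination
print(pd.crosstab(df["job_function"], df["response"]))
```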

Go from insights to action

Now that you have a better understanding of descriptive statistics in research and how to apply statistical analysis methods correctly, it’s time to utilize a tool that can take your research and subsequent analysis to the next level.

Try out a Qualtrics survey software demo so you can see how it can take you through descriptive research and further research projects from start to finish.


Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.

Descriptive statistics are typically distinguished from inferential statistics . With descriptive statistics you are simply describing what is or what the data shows. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what’s going on in our data.

Descriptive Statistics are used to present quantitative descriptions in a manageable form. In a research study we may have lots of measures. Or we may measure a large number of people on any measure. Descriptive statistics help us to simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary. For instance, consider a simple number used to summarize how well a batter is performing in baseball, the batting average. This single number is simply the number of hits divided by the number of times at bat (reported to three significant digits). A batter who is hitting .333 is getting a hit one time in every three at bats. One batting .250 is hitting one time in four. The single number describes a large number of discrete events. Or, consider the scourge of many students, the Grade Point Average (GPA). This single number describes the general performance of a student across a potentially wide range of course experiences.

Every time you try to describe a large set of observations with a single indicator you run the risk of distorting the original data or losing important detail. The batting average doesn’t tell you whether the batter is hitting home runs or singles. It doesn’t tell whether she’s been in a slump or on a streak. The GPA doesn’t tell you whether the student was in difficult courses or easy ones, or whether they were courses in their major field or in other disciplines. Even given these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.

Univariate Analysis

Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at:

  • the distribution
  • the central tendency
  • the dispersion

In most situations, we would describe all three of these characteristics for each of the variables in our study.

The Distribution

The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value. For instance, a typical way to describe the distribution of college students is by year in college, listing the number or percent of students at each of the four years. Or, we describe gender by listing the number or percent of males and females. In these cases, the variable has few enough values that we can list each one and summarize how many sample cases had the value. But what do we do for a variable like income or GPA? With these variables there can be a large number of possible values, with relatively few people having each one. In this case, we group the raw scores into categories according to ranges of values. For instance, we might look at GPA according to the letter grade ranges. Or, we might group income into four or five ranges of income values.

Category Percent
Under 35 years old 9%
36–45 21%
46–55 45%
56–65 19%
66+ 6%

One of the most common ways to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories first (e.g. with age, price, or temperature variables, it would usually not be sensible to determine the frequencies for each value; rather, the values are grouped into ranges and the frequencies determined). Frequency distributions can be depicted in two ways, as a table or as a graph. The table above shows an age frequency distribution with five categories of age ranges defined. The same frequency distribution can be depicted in a graph as shown in Figure 1. This type of graph is often referred to as a histogram or bar chart.

Distributions may also be displayed using percentages. For example, you could use percentages to describe the:

  • percentage of people in different income levels
  • percentage of people in different age ranges
  • percentage of people in different ranges of standardized test scores
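A minimal sketch of how such a frequency distribution could be tabulated, with counts and percentages, using Python's collections.Counter on hypothetical year-in-college responses:

```python
from collections import Counter

years = ["freshman", "sophomore", "junior", "senior", "junior",
         "freshman", "senior", "junior", "sophomore", "junior"]  # hypothetical

counts = Counter(years)
total = sum(counts.values())

for category, n in counts.most_common():
    print(f"{category:<10} {n:>3}  {100 * n / total:.0f}%")
```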

Central Tendency

The central tendency of a distribution is an estimate of the “center” of a distribution of values. There are three major types of estimates of central tendency:

The Mean or average is probably the most commonly used method of describing central tendency. To compute the mean all you do is add up all the values and divide by the number of values. For example, the mean or average quiz score is determined by summing all the scores and dividing by the number of students taking the exam. For example, consider the test score values: 15, 20, 21, 20, 36, 15, 25, 15.

The sum of these 8 values is 167, so the mean is 167/8 = 20.875.

The Median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order, and then locate the score in the center of the sample. For example, if there are 500 scores in the list, score #250 would be the median. If we order the 8 scores shown above, we would get: 15, 15, 15, 20, 20, 21, 25, 36.

There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median.

The Mode is the most frequently occurring value in the set of scores. To determine the mode, you might again order the scores as shown above, and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode. In some distributions there is more than one modal value. For instance, in a bimodal distribution there are two values that occur most frequently.

Notice that for the same set of 8 scores we got three different values (20.875, 20, and 15) for the mean, median and mode respectively. If the distribution is truly normal (i.e. bell-shaped), the mean, median and mode are all equal to each other.

Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 - 15 = 21.

The Standard Deviation is a more accurate and detailed estimate of dispersion, because an outlier can greatly exaggerate the range (as was true in this example, where the single outlier value of 36 stands apart from the rest of the values). The Standard Deviation shows the relation that a set of scores has to the mean of the sample. Again, let’s take the set of scores: 15, 20, 21, 20, 36, 15, 25, 15.

To compute the standard deviation, we first find the distance between each value and the mean. We know from above that the mean is 20.875. So, the differences from the mean are: -5.875, -0.875, 0.125, -0.875, 15.125, -5.875, 4.125, -5.875.

Notice that values that are below the mean have negative discrepancies and values above it have positive ones. Next, we square each discrepancy: 34.515625, 0.765625, 0.015625, 0.765625, 228.765625, 34.515625, 17.015625, 34.515625.

Now, we take these “squares” and sum them to get the Sum of Squares (SS) value. Here, the sum is 350.875. Next, we divide this sum by the number of scores minus 1. Here, the result is 350.875 / 7 = 50.125. This value is known as the variance. To get the standard deviation, we take the square root of the variance (remember that we squared the deviations earlier). This would be SQRT(50.125) = 7.079901129253.

Although this computation may seem convoluted, it’s actually quite simple. To see this, consider the formula for the standard deviation:

\[s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}\]

where:

  • X is each score,
  • X̄ is the mean (or average),
  • n is the number of values,
  • Σ means we sum across the values.

In the top part of the ratio, the numerator, we see that each score has the mean subtracted from it, the difference is squared, and the squares are summed. In the bottom part, we take the number of scores minus 1. The ratio is the variance and the square root is the standard deviation. In English, we can describe the standard deviation as:

the square root of the sum of the squared deviations from the mean divided by the number of scores minus one.
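The same computation can be verified in a few lines of Python with the standard-library statistics module, which applies exactly this n minus one formula; the snippet also computes the two-standard-deviation range used below.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)          # 20.875
variance = statistics.variance(scores)  # 350.875 / 7 = 50.125
sd = statistics.stdev(scores)           # sqrt(50.125), about 7.0799

# Roughly 95% of a normal distribution lies within two standard deviations
print(mean - 2 * sd, mean + 2 * sd)     # about 6.7152 and 35.0348
```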

Although we can calculate these univariate statistics by hand, it gets quite tedious when you have more than a few values and variables. Every statistics program is capable of calculating them easily for you. For instance, I put the eight scores into SPSS and got the following table as a result:

Metric Value
N 8
Mean 20.8750
Median 20.0000
Mode 15.00
Standard Deviation 7.0799
Variance 50.1250
Range 21.00

which confirms the calculations I did by hand above.

The standard deviation allows us to reach some conclusions about specific scores in our distribution. Assuming that the distribution of scores is normal or bell-shaped (or close to it!), the following conclusions can be reached:

  • approximately 68% of the scores in the sample fall within one standard deviation of the mean
  • approximately 95% of the scores in the sample fall within two standard deviations of the mean
  • approximately 99% of the scores in the sample fall within three standard deviations of the mean

For instance, since the mean in our example is 20.875 and the standard deviation is 7.0799, we can estimate from the above statement that approximately 95% of the scores will fall in the range of 20.875 - (2*7.0799) to 20.875 + (2*7.0799), or between 6.7152 and 35.0348. This kind of information is a critical stepping stone to enabling us to compare the performance of an individual on one variable with their performance on another, even when the variables are measured on entirely different scales.


Which descriptive statistics tool should you choose?

This article will help you choose the right descriptive statistics tool for your data. Each tool is available in Excel using the XLSTAT software.

The purpose of descriptive statistics

Describing data is an essential part of statistical analysis, aiming to provide a complete picture of the data before moving to exploratory analysis or predictive modeling. The statistical methods used for this purpose are called descriptive statistics. They include both numerical tools (e.g. central tendency measures such as the mean, mode, and median, or measures of variability) and graphical tools (e.g. histograms, box plots, scatter plots) which summarize the dataset and extract important information such as central tendencies and variability. Moreover, we can use descriptive statistics to explore the association between two or several variables (bivariate or multivariate analysis).

For example, let’s say we have a data table which represents the results of a survey on the amount of money people spend on online shopping on a monthly average basis. Rows correspond to respondents and columns to the amount of money spent as well as the age group they belong to. Our goal is to extract important information from the survey and detect potential differences between the age groups. For this, we can simply summarize the results per group using common descriptive statistics, such as:

The mean and the median, which reflect the central tendency.

The standard deviation, the variance, and the coefficient of variation, which reflect the dispersion.

In another example, using qualitative data, we consider a survey on commuting. Rows correspond to respondents and columns to the mode of transportation as well as to the city they live in. Our goal is to describe transportation preferences when commuting to work per city using:

The mode, reflecting the most frequent mode of commuting (the most frequent category).

The frequencies, reflecting how many times each mode of commuting appears as an answer.

The relative frequencies (percentages), which are the frequencies divided by the total number of answers.

Bar charts and stacked bars, which graphically illustrate the relative frequencies by category.

A guide to choosing a descriptive statistics tool according to the situation

In order to choose the right descriptive statistics tool, we need to consider the types and the number of variables we have as well as the objective of the study. Based on these three criteria we have generated a grid that will help you decide which tool to use according to your situation. The first column of the grid refers to data types:

Quantitative dataset: containing variables that describe quantities of the objects of interest. The values are numbers. The weight of an infant is an example of a quantitative variable.

Qualitative dataset: containing variables that describe qualities of the objects of interest. These values are called categories, also referred to as levels or modalities. The gender of an infant is an example of a qualitative variable, with the possible categories male and female. Qualitative variables are also referred to as nominal or categorical.

Mixed dataset: containing both types of variables.

The second column indicates the number of variables. The proposed tools can handle either the description of one variable (univariate analysis) or the description of the relationships between two variables (bivariate analysis) or several variables (multivariate analysis). The grid provides an intuitive example for each situation, as well as a link to a tutorial explaining how to apply each XLSTAT tool using a demo file.

Descriptive Statistics grid

Please note that the list below is not exhaustive. However, it contains the most commonly used descriptive statistics, all available in Excel using the XLSTAT add-on.

Quantitative data, one variable (univariate analysis):

  • Estimate a frequency distribution. Example: how many people per age class attended this event? (Here the investigated variable is age, in quantitative form.)
  • Measure the central tendency of one sample (scattergram, strip plot). Example: what is the average grade in a classroom?
  • Measure the dispersion of one sample (quartiles; scattergram, strip plot). Example: how widely or narrowly are the grade scores dispersed around the mean score in a classroom?
  • Characterize the shape of a distribution. Example: is the employee wage distribution in a company symmetric?
  • Visually check whether a sample follows a given distribution. Example: what is the theoretical percentage of students who obtained a grade above a given threshold?
  • Measure the position of a value within a sample. Example: what data point can be used to split the sample into 95% of low values and 5% of high values?
  • Detect extreme values. Example: is a height of 184 cm an extreme value in this group of students?

Quantitative data, two variables (bivariate analysis):

  • Describe the association between two variables. Example: does plant biomass increase or decrease with soil Pb content?

Quantitative data, several variables (multivariate analysis):

  • Describe the association between multiple variables (up to 3 variables to describe, possibly over time). Example: what is the evolution of life expectancy, the fertility rate, and the size of the population over the last 10 years in this country?
  • Describe the association between three variables under specific conditions. Example: how to visualize the proportions of three ice cream ingredients in several ice cream samples?

Quantitative data, two matrices of several variables:

  • Describe the association between two matrices. Example: does the evaluation of a series of products differ from one panel to another?

Qualitative data, one variable (univariate analysis):

  • Compute the frequencies of different categories. Example: how many clients said they were satisfied with the service, and how many said they were not?
  • Detect the most frequent category (the mode). Example: which is the most frequent hair color in this country?

Qualitative data, two variables (bivariate analysis):

  • Measure the association between two variables (stacked or clustered bars). Example: does the presence of a trace element change according to the presence of another trace element?

Mixed data (quantitative & qualitative), two variables (bivariate analysis):

  • Describe the relationship between a binary and a continuous variable. Example: is the concentration of a molecule in rats linked to the rats' sex (M/F)?
  • Describe the relationship between a categorical and a continuous variable. Example: does sepal length differ between three flower species?

Mixed data, several variables (multivariate analysis):

  • Describe the relationship between one categorical and two quantitative variables. Example: does the amount of money spent on this commercial website change according to the age class and the salary of the customers?

How to run descriptive statistics in XLSTAT?

In XLSTAT, you will find a large variety of descriptive statistics tools in the Describing data menu. The most popular feature is Descriptive Statistics. All you have to do is select your data on the Excel sheet, set up the dialog box, and click OK. It's simple and quick. If you do not have XLSTAT, you can download our free 14-day trial version.

XLSTAT dialog box for Descriptive Statistics-General tab

Outputs for quantitative data

Statistics: Min./max. value, 1st quartile, median, 3rd quartile, range, sum, mean, geometric mean, harmonic mean, kurtosis (Pearson), skewness (Pearson), kurtosis, skewness, CV (standard deviation/mean), sample variance, estimated variance, standard deviation of a sample, estimated standard deviation, mean absolute deviation, standard deviation of the mean.

Graphs: box plots, scattergrams, strip plots, Q-Q plots, P-P plots, stem-and-leaf plots. It is possible to group the various box plots, scattergrams, and strip plots on the same chart, sort them by mean, and color them by group to compare them.

Outputs for qualitative data

Statistics: No. of categories, mode, mode frequency, mode weight, % mode, relative frequency of the mode, frequency, weight of the category, percentage of the category, relative frequency of the category

Graphs: Bar charts, pie charts, double pie charts, doughnuts, stacked bars, multiple bars

XLSTAT has developed a series of statistics tutorials that will provide you with a theoretical background on inferential statistics, data modeling, clustering, multivariate data analysis, and more. These guides will also help you choose an appropriate statistical method to investigate the question you are asking.

Which statistical test to use?

Which statistical model should you use?

Which multivariate data analysis method to choose?

Which clustering method should you choose?

Choosing an appropriate time series analysis method

Comparison of supervised machine learning algorithms

Source: Introductory Statistics: Exploring the World Through Data, by Robert Gould and Colleen Ryan.


Chapter 14 Quantitative Analysis: Descriptive Statistics

Numeric data collected in a research project can be analyzed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarize themselves with one of these programs for understanding the concepts described in this chapter.

Data Preparation

In research projects, data may be collected from a variety of sources: mail-in surveys, interviews, pretest or posttest experimental data, observational data, and so forth. This data must be converted into a machine-readable, numeric format, such as a spreadsheet or a text file, so that it can be analyzed by computer programs like SPSS or SAS. Data preparation usually involves the following steps.

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale; whether such scale is a five-point, seven-point, or some other type of scale), and how to code each value into a numeric format. For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from “strongly disagree” to “strongly agree”, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, such numeric codes are arbitrary labels, and nominal data cannot be treated as quantities in statistical analysis). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, for measuring a construct such as “benefits of computers,” if a survey provided respondents with a checklist of benefits that they could select from (i.e., they could choose as many of those benefits as they wanted), then the total number of checked items can be used as an aggregate measure of benefits. Note that many other forms of data, such as interview transcripts, cannot be converted into a numeric format for statistical analysis. Coding is especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.
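As a small illustration of coding, here is a minimal Python sketch; the variable names, code values, and checklist entries are hypothetical, not part of any standard codebook:

```python
# Hypothetical codebook entries mapping response labels to numeric codes
likert_7 = {
    "strongly disagree": 1, "disagree": 2, "somewhat disagree": 3,
    "neutral": 4, "somewhat agree": 5, "agree": 6, "strongly agree": 7,
}
industry = {"manufacturing": 1, "retailing": 2, "financial": 3, "healthcare": 4}

# Coding raw survey responses into numeric format
responses = ["agree", "neutral", "strongly agree"]
coded = [likert_7[r] for r in responses]   # [6, 4, 7]

# Aggregating a checklist: the count of checked benefits is the measure
checklist = {"saves time": True, "reduces cost": False, "improves quality": True}
benefit_score = sum(checklist.values())    # 2
```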

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format (e.g., SPSS stores data as .sav files), which makes it difficult to share that data with other statistical programs. Hence, it is often better to enter data into a spreadsheet or database, where they can be reorganized as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller data sets with fewer than 65,000 observations and 256 items can be stored in a spreadsheet such as Microsoft Excel, while larger datasets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet and each measurement item can be represented as one column. The entered data should be frequently checked for accuracy, via occasional spot checks on a set of items or observations, during and after entry. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the “strongly agree” response to all items irrespective of content, including reverse-coded items. If so, such data can be entered but should be excluded from subsequent analysis.

Missing values. Missing data is an inevitable part of any empirical data set. Respondents may not answer certain questions if they are ambiguously worded or too sensitive. Such problems should be detected early during pretests and corrected before the main data collection process begins. During data entry, some statistical programs automatically treat blank entries as missing values, while others require a specific numeric value such as -1 or 999 to be entered to denote a missing value. During data analysis, the default mode of handling missing values in most software programs is to simply drop the entire observation containing even a single missing value, in a technique called listwise deletion. Such deletion can significantly shrink the sample size and make it extremely difficult to detect small effects. Hence, some software programs allow the option of replacing missing values with an estimated value via a process called imputation. For instance, if the missing value is one item in a multi-item scale, the imputed value may be the average of the respondent’s responses to the remaining items on that scale. If the missing value belongs to a single-item scale, many researchers use the average of other respondents’ responses to that item as the imputed value. Such imputation may be biased if the missing value is of a systematic nature rather than a random nature. Two methods that can produce relatively unbiased estimates for imputation are maximum likelihood procedures and multiple imputation methods, both of which are supported in popular software programs such as SPSS and SAS.
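A minimal pandas sketch of the scale-based imputation described above (the items and values are hypothetical; serious analyses would prefer maximum likelihood or multiple imputation):

```python
import numpy as np
import pandas as pd

# Hypothetical three-item scale; respondent 2 skipped item1
scale = pd.DataFrame({
    "item1": [5.0, 4.0, np.nan],
    "item2": [6.0, 4.0, 5.0],
    "item3": [5.0, 3.0, 6.0],
})

# Impute each missing item with the mean of the respondent's remaining items
row_means = scale.mean(axis=1)                     # NaNs skipped by default
imputed = scale.apply(lambda col: col.fillna(row_means))
print(imputed)   # respondent 2's item1 becomes (5 + 6) / 2 = 5.5
```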

Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse coded items, where items convey the opposite meaning of that of their underlying construct, should be reversed (e.g., in a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
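For instance, the reverse coding and value collapsing described above take only a few lines of pandas; the column names and cut points below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"item_rev": [1, 4, 7, 6],
                   "income": [25_000, 48_000, 90_000, 150_000]})

# Reverse a 1-7 scale item: 8 minus the observed value
df["item"] = 8 - df["item_rev"]          # [7, 4, 1, 2]

# Collapse incomes into income ranges
df["income_range"] = pd.cut(df["income"],
                            bins=[0, 50_000, 100_000, float("inf")],
                            labels=["low", "middle", "high"])
```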

Univariate Analysis

Univariate analysis, or analysis of a single variable, refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: (1) frequency distribution, (2) central tendency, and (3) dispersion. The frequency distribution of a variable is a summary of the frequency (or percentages) of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services (as a measure of their “religiosity”) using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for “did not answer.” If we count the number (or percentage) of observations within each category (except “did not answer” which is really a missing value rather than a category), and display it in the form of a table as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.


Figure 14.1. Frequency distribution of religiosity.

With very large samples of independent, random observations, the frequency distribution tends to follow a plot that looks like a bell-shaped curve (a smoothed bar chart of the frequency distribution), similar to that shown in Figure 14.2, where most observations are clustered toward the center of the range of values, and there are fewer and fewer observations toward the extreme ends of the range. Such a curve is called a normal distribution.

Central tendency is an estimate of the center of a distribution of values. There are three major estimates of central tendency: mean, median, and mode. The arithmetic mean (often simply called the “mean”) is the simple average of all values in a given distribution. Consider a set of eight test scores: 15, 20, 21, 20, 36, 15, 25, 15. The arithmetic mean of these values is (15 + 20 + 21 + 20 + 36 + 15 + 25 + 15)/8 = 20.875. Other types of means include the geometric mean (the n-th root of the product of n numbers in a distribution) and the harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of the values in a distribution), but these means are not very popular for statistical analysis of social research data.

The second measure of central tendency, the median, is the middle value within a range of values in a distribution. This is computed by sorting all values in a distribution in increasing order and selecting the middle value. In case there are two middle values (if there is an even number of values in a distribution), the average of the two middle values represents the median. In the above example, the sorted values are: 15, 15, 15, 20, 20, 21, 25, 36. The two middle values are 20 and 20, and hence the median is (20 + 20)/2 = 20.

Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value that is estimated from a sample, such as the mean, median, mode, or any of the later estimates, is called a statistic.

Dispersion refers to the way values are spread around the central tendency, for example, how tightly or how widely are the values clustered around the mean. Two common measures of dispersion are the range and standard deviation. The range is the difference between the highest and lowest values in a distribution. The range in our previous example is 36-15 = 21.

The range is particularly sensitive to the presence of outliers. For instance, if the highest value in the above distribution was 85 and the other values remained the same, the range would be 85-15 = 70. Standard deviation, the second measure of dispersion, corrects for such outliers by using a formula that takes into account how close or how far each value is from the distribution mean:

s = √[ Σ (xᵢ - x̄)² / (n - 1) ]

where x̄ is the sample mean and n is the number of observations.

Figure 14.2. Normal distribution.


Table 14.1. Hypothetical data on age and self-esteem.

The two variables in this dataset are age (x) and self-esteem (y). Age is a ratio-scale variable, while self-esteem is an average score computed from a multi-item self-esteem scale measured using a 7-point Likert scale, ranging from “strongly disagree” to “strongly agree.” The histogram of each variable is shown on the left side of Figure 14.3. The formula for calculating bivariate correlation is:

r = Σ (xᵢ - x̄)(yᵢ - ȳ) / [ (n - 1) sx sy ]

where x̄ and ȳ are the sample means, and sx and sy are the standard deviations of x and y.

Figure 14.3. Histogram and correlation plot of age and self-esteem.

After computing bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question would require testing the following hypothesis:

H₀: r = 0

H₁: r ≠ 0

H₀ is called the null hypothesis, and H₁ is called the alternative hypothesis (sometimes also represented as Hₐ). Although they may seem like two hypotheses, H₀ and H₁ actually represent a single hypothesis, since they are direct opposites of each other. We are interested in testing H₁ rather than H₀. Also note that H₁ is a non-directional hypothesis, since it does not specify whether r is greater than or less than zero. A directional hypothesis would be specified as H₀: r ≤ 0; H₁: r > 0 (if we are testing for a positive correlation). Significance testing of a directional hypothesis is done using a one-tailed t-test, while that of a non-directional hypothesis is done using a two-tailed t-test.

In statistical testing, the alternative hypothesis cannot be tested directly. Rather, it is tested indirectly by rejecting the null hypothesis with a certain level of probability. Statistical testing is always probabilistic, because we are never sure whether our inferences, based on sample data, apply to the population, since our sample never equals the population. The probability that a statistical inference is caused by pure chance is called the p-value. The p-value is compared with the significance level (α), which represents the maximum level of risk that we are willing to take that our inference is incorrect. For most statistical analyses, α is set to 0.05. A p-value less than α = 0.05 indicates that we have enough statistical evidence to reject the null hypothesis, and thereby indirectly accept the alternative hypothesis. If p > 0.05, then we do not have adequate statistical evidence to reject the null hypothesis or accept the alternative hypothesis.

The easiest way to test the above hypothesis is to look up critical values of r from statistical tables available in any standard statistics textbook or on the Internet (most software programs also perform significance testing). The critical value of r depends on our desired significance level (α = 0.05), the degrees of freedom (df), and whether the test is one-tailed or two-tailed. The degrees of freedom are the number of values that can vary freely in the calculation of a statistic. In the case of correlation, df simply equals n - 2, or for the data in Table 14.1, df = 20 - 2 = 18. There are two different statistical tables for one-tailed and two-tailed tests. In the two-tailed table, the critical value of r for α = 0.05 and df = 18 is 0.44. For our computed correlation of 0.79 to be significant, it must be larger than the critical value of 0.44 or less than -0.44. Since our computed value of 0.79 is greater than 0.44, we conclude that there is a significant correlation between age and self-esteem in our data set; in other words, the odds are less than 5% that this correlation is a chance occurrence. Therefore, we can reject the null hypothesis that r = 0, which is an indirect way of saying that the alternative hypothesis is probably correct.
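In practice, software computes both r and its p-value directly. Here is a minimal sketch with scipy; the age and self-esteem values below are hypothetical, not the data of Table 14.1:

```python
from scipy import stats

age         = [21, 25, 30, 34, 38, 42, 45, 50, 55, 60]
self_esteem = [3.2, 3.5, 3.9, 4.0, 4.4, 4.8, 4.9, 5.3, 5.6, 6.0]

r, p = stats.pearsonr(age, self_esteem)  # two-tailed test of H0: r = 0
print(f"r = {r:.2f}, p = {p:.4f}")       # p < 0.05 => reject H0
```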

Most research studies involve more than two variables. If there are n variables, then we will have a total of n*(n-1)/2 possible correlations between these n variables. Such correlations are easily computed using a software program like SPSS, rather than manually using the formula for correlation (as we did in Table 14.1), and represented using a correlation matrix, as shown in Table 14.2. A correlation matrix is a matrix that lists the variable names along the first row and the first column, and depicts bivariate correlations between pairs of variables in the appropriate cell in the matrix. The values along the principal diagonal (from the top left to the bottom right corner) of this matrix are always 1, because any variable is always perfectly correlated with itself. Further, since correlations are non-directional, the correlation between variables V1 and V2 is the same as that between V2 and V1. Hence, the lower triangular matrix (values below the principal diagonal) is a mirror reflection of the upper triangular matrix (values above the principal diagonal), and therefore, we often list only the lower triangular matrix for simplicity. If the correlations involve variables measured on interval scales, this specific type of correlation is called a Pearson product moment correlation.
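A correlation matrix is a single call in pandas; a short sketch with three hypothetical variables:

```python
import pandas as pd

df = pd.DataFrame({
    "age":         [21, 25, 30, 34, 38],
    "income":      [30, 38, 52, 60, 75],
    "self_esteem": [3.2, 3.6, 4.1, 4.0, 4.7],
})

print(df.corr())   # Pearson correlations; the principal diagonal is all 1s
```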

Another useful way of presenting bivariate data is cross-tabulation (often abbreviated to cross-tab, and sometimes called more formally a contingency table). A cross-tab is a table that describes the frequency (or percentage) of all combinations of two or more nominal or categorical variables. As an example, let us assume that we have the following observations of gender and grade for a sample of 20 students, as shown in Table 14.3. Gender is a nominal variable (male/female or M/F), and grade is a categorical variable with three levels (A, B, and C). A simple cross-tabulation of the data may display the joint distribution of gender and grades (i.e., how many students of each gender are in each grade category, as a raw frequency count or as a percentage) in a 2 x 3 matrix. This matrix will help us see if A, B, and C grades are equally distributed across male and female students. The cross-tab data in Table 14.3 shows that the distribution of A grades is biased heavily toward female students: in a sample of 10 male and 10 female students, five female students received the A grade compared to only one male student. In contrast, the distribution of C grades is biased toward male students: three male students received a C grade, compared to only one female student. However, the distribution of B grades was somewhat uniform, with six male students and four female students. The last row and the last column of this table are called marginal totals because they indicate the totals across each category and are displayed along the margins of the table.


Table 14.2. A hypothetical correlation matrix for eight variables.


Table 14.3. Example of cross-tab analysis.

Although we can see a distinct pattern of grade distribution between male and female students in Table 14.3, is this pattern real or “statistically significant”? In other words, do the above frequency counts differ from what may be expected from pure chance? To answer this question, we should compute the expected count of observations in each cell of the 2 x 3 cross-tab matrix. This is done by multiplying the marginal column total and the marginal row total for each cell and dividing it by the total number of observations. For example, for the male/A grade cell, the expected count = 6 * 10 / 20 = 3. In other words, we would expect three male students to receive an A grade, but in reality, only one did. Whether this difference between expected and actual counts is significant can be tested using a chi-square test. The chi-square statistic is computed by summing, across all cells, the squared difference between the observed and expected counts divided by the expected count. We can then compare this number to the critical value associated with a desired probability level (p < 0.05) and the degrees of freedom, which is simply (m-1)*(n-1), where m and n are the number of rows and columns respectively. In this example, df = (2 - 1) * (3 - 1) = 2. From standard chi-square tables in any statistics book, the critical chi-square value for p = 0.05 and df = 2 is 5.99. The computed chi-square value, based on our observed data, is about 4.07, which is less than the critical value. Hence, we must conclude that the observed grade pattern is not statistically different from the pattern that would be expected by pure chance.
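The same test is a one-liner with scipy, using the observed counts described above:

```python
from scipy.stats import chi2_contingency

# Rows: male, female; columns: A, B, C grades
observed = [[1, 6, 3],
            [5, 4, 1]]

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, dof, p)   # chi2 ~ 4.07 with df = 2; p ~ 0.13 > 0.05, not significant
print(expected)       # expected counts, e.g. 3 expected male A grades
```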

  • Social Science Research: Principles, Methods, and Practices. Authored by: Anol Bhattacherjee. Provided by: University of South Florida. Located at: http://scholarcommons.usf.edu/oa_textbooks/3/. License: CC BY-NC-SA (Attribution-NonCommercial-ShareAlike).

What is Descriptive Statistics? Definition, Types, Examples

Appinio Research · 23.11.2023 · 37 min read


Have you ever wondered how we make sense of the vast sea of data surrounding us? In a world overflowing with information, the ability to distill complex datasets into meaningful insights is a skill of immense importance.

This guide will equip you with the knowledge and tools to unravel the stories hidden within data. Whether you're a data analyst, a researcher, a business professional, or simply curious about the art of data interpretation, this guide will demystify the fundamental concepts and techniques of descriptive statistics, empowering you to explore, understand, and communicate data like a seasoned expert.

What is Descriptive Statistics?

Descriptive statistics refers to a set of mathematical and graphical tools used to summarize and describe essential features of a dataset. These statistics provide a clear and concise representation of data, enabling researchers, analysts, and decision-makers to gain valuable insights, identify patterns, and understand the characteristics of the information at hand.

Purpose of Descriptive Statistics

The primary purpose of descriptive statistics is to simplify and condense complex data into manageable, interpretable summaries. Descriptive statistics serve several key objectives:

  • Data Summarization:  They provide a compact summary of the main characteristics of a dataset, allowing individuals to grasp the essential features quickly.
  • Data Visualization:  Descriptive statistics often accompany visual representations, such as histograms, box plots, and bar charts, making it easier to interpret and communicate data trends and distributions.
  • Data Exploration:  They facilitate the exploration of data to identify outliers, patterns, and potential areas of interest or concern.
  • Data Comparison:  Descriptive statistics enable the comparison of datasets, groups, or variables, aiding in decision-making and hypothesis testing.
  • Informed Decision-Making:  By providing a clear understanding of data, descriptive statistics support informed decision-making across various domains, including business, healthcare, social sciences, and more.

Importance of Descriptive Statistics in Data Analysis

Descriptive statistics play a pivotal role in data analysis by providing a foundation for understanding, summarizing, and interpreting data. Their importance is underscored by their widespread use in diverse fields and industries.

Here are key reasons why descriptive statistics are crucial in data analysis:

  • Data Simplification:  Descriptive statistics simplify complex datasets, making them more accessible to analysts and decision-makers. They condense extensive information into concise metrics and visual representations.
  • Initial Data Assessment:  Descriptive statistics are often the first step in data analysis. They help analysts gain a preliminary understanding of the data's characteristics and identify potential areas for further investigation.
  • Data Visualization:  Descriptive statistics are often paired with visualizations, enhancing data interpretation. Visual representations, such as histograms and scatter plots, provide intuitive insights into data patterns.
  • Communication and Reporting:  Descriptive statistics serve as a common language for conveying data insights to a broader audience. They are instrumental in research reports, presentations, and data-driven decision-making.
  • Quality Control:  In manufacturing and quality control processes, descriptive statistics help monitor and maintain product quality by identifying deviations from desired standards.
  • Risk Assessment:  In finance and insurance, descriptive statistics, such as standard deviation and variance, are used to assess and manage risk associated with investments and policies.
  • Healthcare Decision-Making:  Descriptive statistics inform healthcare professionals about patient demographics, treatment outcomes, and disease prevalence, aiding in clinical decision-making and healthcare policy formulation.
  • Market Analysis:  In marketing and consumer research, descriptive statistics reveal customer preferences, market trends, and product performance, guiding marketing strategies and product development.
  • Scientific Research:  In scientific research, descriptive statistics are fundamental for summarizing experimental results, comparing groups, and identifying meaningful patterns in data.
  • Government and Policy:  Government agencies use descriptive statistics to collect and analyze data on demographics, economics, and social trends to inform policy decisions and resource allocation.

Descriptive statistics serve as a critical foundation for effective data analysis and decision-making across a wide range of disciplines. They empower individuals and organizations to extract meaningful insights from data, enabling more informed and evidence-based choices.

Data Collection and Preparation

First, let's delve deeper into the crucial initial data collection and preparation steps. These initial stages lay the foundation for effective descriptive statistics.

Data Sources

When embarking on a data analysis journey, you must first identify your data sources. These sources can be categorized into two main types:

  • Primary Data:  This data is collected directly from original sources. It includes surveys, experiments, and observations tailored to your specific research objectives. Primary data offers high relevance and control over the data collection process.
  • Secondary Data:  Secondary data, on the other hand, is data that already exists and has been collected by someone else for a different purpose. It can include publicly available datasets, reports, and databases. Secondary data can save time and resources but may not always align perfectly with your research needs.

Understanding the nature of your data is fundamental. Data can be classified into two primary types:

  • Quantitative Data:  Quantitative data consists of numeric values and is often used for measurements and calculations. Examples include age, income, temperature, and test scores. Quantitative data can further be categorized as discrete (countable) or continuous (measurable).
  • Qualitative Data:  Qualitative data, also known as categorical data, represents categories or labels and cannot be measured numerically. Examples include gender, color, and product categories. Qualitative data can be nominal (categories with no specific order) or ordinal (categories with a meaningful order).

Data Cleaning and Preprocessing

Once you have your data in hand, preparing it for analysis is essential. Data cleaning and preprocessing involve several critical steps:

Handling Missing Data

Missing data can significantly impact your analysis. There are various approaches to address missing values:

  • Deletion:  You can remove rows or columns with missing data, but this may lead to a loss of valuable information.
  • Imputation:  Imputing missing values involves estimating or filling in the missing data using methods such as mean imputation, median imputation, or advanced techniques like regression imputation.
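A minimal pandas sketch of both approaches (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25.0, 32.0, np.nan, 41.0],
                   "income": [40.0, np.nan, 55.0, 60.0]})

complete = df.dropna()   # deletion: drop any row with a missing value

filled = df.fillna({"age": df["age"].mean(),         # mean imputation
                    "income": df["income"].median()})  # median imputation
```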

Outlier Detection

Outliers are data points that deviate significantly from the rest of the data. Detecting and handling outliers is crucial to prevent them from skewing your results. Popular methods for identifying outliers include box plots and z-scores.
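As an illustration, here is a simple z-score screen in numpy; the data and the |z| > 2 cutoff are illustrative (|z| > 3 is also common):

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 45])   # 45 looks suspicious

z = (data - data.mean()) / data.std()
print(data[np.abs(z) > 2])                      # [45]
```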

Data Transformation

Data transformation aims to normalize or standardize the data to make it more suitable for analysis. Common transformations include:

  • Normalization:  Scaling data to a standard range, often between 0 and 1.
  • Standardization:  Transforming data to have a mean of 0 and a standard deviation of 1.
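Both transformations are one line each in numpy; a sketch on illustrative data:

```python
import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

normalized = (data - data.min()) / (data.max() - data.min())  # [0, 1] range
standardized = (data - data.mean()) / data.std()              # mean 0, sd 1
```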

Data Organization and Presentation

Organizing and presenting your data effectively is essential for meaningful analysis and communication. Here's how you can achieve this:

Data Tables

Data tables are a straightforward way to present your data, especially when dealing with smaller datasets. They allow you to list data in rows and columns, making it easy to review and perform basic calculations.

Graphs and Charts

Visualizations play a pivotal role in conveying the message hidden within your data. Some common types of graphs and charts include:

  • Histograms:  Histograms display the distribution of continuous data by dividing it into intervals or bins and showing the frequency of data points within each bin.
  • Bar Charts:  Bar charts are excellent for representing categorical or discrete data. They display categories on one axis and corresponding values on the other.
  • Line Charts:  Line charts are useful for identifying trends over time, making them suitable for time series data.
  • Scatter Plots:  Scatter plots help visualize the relationship between two variables, making them valuable for identifying correlations.
  • Pie Charts:  Pie charts are suitable for displaying the composition of a whole in terms of its parts, often as percentages.

Summary Statistics

Calculating summary statistics, such as the mean, median, and standard deviation, provides a quick snapshot of your data's central tendencies and variability.

When it comes to data collection and visualization, Appinio offers a seamless solution that simplifies the process. In Appinio, creating interactive visualizations is the easiest way to understand and present your data effectively. These visuals help you uncover insights and patterns within your data, making it a valuable tool for anyone seeking to make data-driven decisions.

Book a demo today to explore how Appinio can enhance your data collection and visualization efforts, ultimately empowering your decision-making process!


Measures of Central Tendency

Measures of central tendency are statistics that provide insight into the central or typical value of a dataset. They help you understand where the data tends to cluster, which is crucial for drawing meaningful conclusions.

The mean, also known as the average, is the most widely used measure of central tendency. It is calculated by summing all the values in a dataset and then dividing by the total number of values. The formula for the mean (μ) is:

μ = (Σx) / N
  • μ represents the mean.
  • Σx represents the sum of all individual data points.
  • N is the total number of data points.

The mean is highly sensitive to outliers and extreme values in the dataset. It's an appropriate choice for normally distributed data.

The median is another measure of central tendency that is less influenced by outliers compared to the mean. To find the median, you first arrange the data in ascending or descending order and then locate the middle value. If there's an even number of data points, the median is the average of the two middle values.

For example, in the dataset [3, 5, 7, 8, 10], the median is 7.

The mode is the value that appears most frequently in a dataset. Unlike the mean and median, which are influenced by the actual values, the mode represents the data point with the highest frequency of occurrence.

In the dataset [3, 5, 7, 8, 8], the mode is 8.
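Python's standard library computes all three measures; a quick sketch reusing the mode example's data:

```python
import statistics

data = [3, 5, 7, 8, 8]

print(statistics.mean(data))    # 6.2
print(statistics.median(data))  # 7
print(statistics.mode(data))    # 8
```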

Choosing the Right Measure

Selecting the appropriate measure of central tendency depends on the nature of your data and your research objectives:

  • Use the  mean  for normally distributed data without significant outliers.
  • Choose the  median  when dealing with skewed data or data with outliers.
  • The mode is most useful for categorical or nominal data.

Understanding these measures and when to apply them is crucial for accurate data analysis and interpretation.

Measures of Variability

The measures of variability provide insights into how spread out or dispersed your data is. These measures complement the central tendency measures discussed earlier and are essential for a comprehensive understanding of your dataset.

The range is the simplest measure of variability and is calculated as the difference between the maximum and minimum values in your dataset. It offers a quick assessment of the spread of your data.

Range = Maximum Value - Minimum Value

For example, consider a dataset of daily temperatures in Celsius for a month:

  • Maximum temperature: 30°C
  • Minimum temperature: 10°C

The range would be 30°C - 10°C = 20°C, indicating a 20-degree Celsius spread in temperature over the month.

Variance measures the average squared deviation of each data point from the mean. It quantifies the overall dispersion of data points. The formula for variance (σ²) is as follows:

σ² = Σ(x - μ)² / N
  • σ² represents the variance.
  • Σ represents the summation symbol.
  • x represents each individual data point.
  • μ is the mean of the dataset.
  • N is the total number of data points.

Calculating the variance involves the following:

  • Find the mean (μ) of the dataset.
  • For each data point, subtract the mean (x - μ).
  • Square the result for each data point [(x - μ)²].
  • Sum up all the squared differences [(Σ(x - μ)²)].
  • Divide by the total number of data points (N) to get the variance.

A higher variance indicates greater variability among data points, while a lower variance suggests data points are closer to the mean.

Standard Deviation

The standard deviation is a widely used measure of variability and is simply the square root of the variance. It provides a more interpretable value and is often preferred for reporting. The formula for the standard deviation (σ) is:

σ = √(σ²) = √(Σ(x - μ)² / N)

Calculating the standard deviation follows the same process as the variance, with the additional step of taking the square root. It represents the average deviation of data points from the mean, in the same units as the data.

For example, if the variance is calculated as 16 (square units), the standard deviation would be 4 (the same units as the data). A smaller standard deviation indicates data points are closer to the mean, while a larger standard deviation indicates greater variability.
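In numpy, both the population formulas used above (dividing by N) and the sample versions (dividing by N - 1) are available; a short sketch on illustrative data:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.var(data))          # population variance (divide by N): 4.0
print(np.std(data))          # population standard deviation: 2.0
print(np.var(data, ddof=1))  # sample variance (divide by N - 1): ~4.57
print(np.std(data, ddof=1))  # sample standard deviation: ~2.14
```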

Interquartile Range (IQR)

The interquartile range (IQR) is a robust measure of variability that is less influenced by extreme values (outliers) than the range, variance, or standard deviation. It is based on the quartiles of the dataset. To calculate the IQR:

  • Arrange the data in ascending order.
  • Calculate the first quartile (Q1), which is the median of the lower half of the data.
  • Calculate the third quartile (Q3), which is the median of the upper half of the data.
  • Subtract Q1 from Q3 to find the IQR.
IQR = Q3 - Q1

The IQR represents the range within which the central 50% of your data falls. It provides valuable information about the middle spread of your dataset, making it a useful measure for skewed or non-normally distributed data.
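A quick numpy sketch of the IQR on illustrative data (note that numpy's default quartile interpolation can differ slightly from the by-hand median-of-halves method):

```python
import numpy as np

data = np.array([1, 3, 5, 7, 9, 11, 13, 15])

q1, q3 = np.percentile(data, [25, 75])
print(q1, q3, q3 - q1)   # 4.5, 11.5, IQR = 7.0
```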

Data Distribution

Understanding the distribution of your data is essential for making meaningful inferences and choosing appropriate statistical methods. In this section, we will explore different aspects of data distribution.

Normal Distribution

The normal distribution, also known as the Gaussian distribution or bell curve, is a fundamental concept in statistics. It is characterized by a symmetric, bell-shaped curve. In a normal distribution:

  • The mean, median, and mode are all equal and located at the center of the distribution.
  • Data points are evenly spread around the mean.
  • The distribution is defined by two parameters: mean (μ) and standard deviation (σ).

The normal distribution is essential in various statistical tests and modeling techniques. Many natural phenomena, such as heights and IQ scores, closely follow a normal distribution. It serves as a reference point for understanding other distributions and statistical analyses.

Skewness and Kurtosis

Skewness and kurtosis are measures that provide insights into the shape of a data distribution:

Skewness quantifies the asymmetry of a distribution. A distribution can be:

  • Positively Skewed (Right-skewed):  In a positively skewed distribution, the tail extends to the right, and the majority of data points are concentrated on the left side of the distribution. The mean is typically greater than the median.
  • Negatively Skewed (Left-skewed):  In a negatively skewed distribution, the tail extends to the left, and the majority of data points are concentrated on the right side of the distribution. The mean is typically less than the median.

Skewness is calculated using various formulas, including Pearson's first coefficient of skewness.

Kurtosis measures the "tailedness" of a distribution, indicating whether the distribution has heavy or light tails compared to a normal distribution. Kurtosis can be:

  • Leptokurtic:  A distribution with positive kurtosis has heavier tails and a more peaked central region than a normal distribution.
  • Mesokurtic:  A distribution with kurtosis equal to that of a normal distribution.
  • Platykurtic:  A distribution with negative kurtosis has lighter tails and a flatter central region than a normal distribution.

Kurtosis is calculated using different formulas, including the fourth standardized moment.

Understanding skewness and kurtosis helps you assess the departure of your data from normality and choose appropriate statistical methods.
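Both measures are available in scipy; a sketch on a simulated right-skewed sample:

```python
import numpy as np
from scipy.stats import kurtosis, skew

data = np.random.default_rng(0).exponential(size=1000)  # right-skewed

print(skew(data))      # > 0: positively skewed
print(kurtosis(data))  # excess kurtosis (0 for a normal distribution)
```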

Other Types of Distributions

While the normal distribution is prevalent, real-world data often follows different distributions. Some other types of distributions you may encounter include:

  • Exponential Distribution:  Commonly used for modeling the time between events in a Poisson process, such as arrival times in a queue.
  • Poisson Distribution:  Used for counting the number of events in a fixed interval of time or space, such as the number of phone calls received in an hour.
  • Binomial Distribution:  Suitable for modeling the number of successes in a fixed number of independent Bernoulli trials.
  • Lognormal Distribution:  Often used for data that is the product of many small, independent, positive factors, such as stock prices.
  • Uniform Distribution:  Represents a constant probability over a specified range of values, making all outcomes equally likely.

Understanding the characteristics and properties of these distributions is crucial for selecting appropriate statistical techniques and making accurate interpretations in various fields of study and data analysis.

Visualizing Data

Visualizing data is a powerful way to gain insights and understand the patterns and characteristics of your dataset. Below are several standard methods of data visualization.

Histograms are a widely used graphical representation of the distribution of continuous data. They are particularly useful for understanding the shape of the data's frequency distribution. Here's how they work:

  • Data is divided into intervals, or "bins."
  • The number of data points falling into each bin is represented by the height of bars on a graph.
  • The bars are typically adjacent and do not have gaps between them.

Histograms help you visualize the central tendency, spread, and skewness of your data. They can reveal whether your data is normally distributed, skewed to the left or right, or exhibits multiple peaks.

Histograms are especially useful when you have a large dataset and want to quickly assess its distribution. They are commonly used in fields like finance to analyze stock returns, biology to study species distribution, and quality control to monitor manufacturing processes.
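A minimal matplotlib histogram on simulated data:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.default_rng(42).normal(loc=50, scale=10, size=500)

plt.hist(data, bins=20, edgecolor="black")  # 20 adjacent bins
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```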

Box plots, also known as box-and-whisker plots, are excellent tools for visualizing the distribution of data, particularly for identifying outliers and comparing multiple datasets. Here's how they are constructed:

  • The box represents the interquartile range (IQR), with the lower edge of the box at the first quartile (Q1) and the upper edge at the third quartile (Q3).
  • A vertical line inside the box indicates the median (Q2).
  • Whiskers extend from the edges of the box to the minimum and maximum values within a certain range.
  • Outliers, which are data points significantly outside the whiskers, are often shown as individual points.

Box plots provide a concise summary of data distribution, including central tendency and variability. They are beneficial when comparing data distribution across different categories or groups.

Box plots are commonly used in fields like healthcare to compare patient outcomes by treatment, in education to assess student performance across schools, and in market research to analyze customer ratings for different products.
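A sketch comparing two illustrative groups with matplotlib box plots:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
group_a = rng.normal(50, 5, size=100)   # tightly clustered group
group_b = rng.normal(55, 10, size=100)  # more dispersed group

plt.boxplot([group_a, group_b])
plt.xticks([1, 2], ["Group A", "Group B"])
plt.ylabel("Score")
plt.show()
```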

Scatter Plots

Scatter plots are a valuable tool for visualizing the relationship between two continuous variables. They are handy for identifying patterns, trends, and correlations in data. Here's how they work:

  • Each data point is represented as a point on the graph, with one variable on the x-axis and the other on the y-axis.
  • The resulting plot shows the dispersion and clustering of data points, allowing you to assess the strength and direction of the relationship.

Scatter plots help you determine whether there is a positive, negative, or no correlation between the variables. Additionally, they can reveal outliers and influential data points that may affect the relationship.

Scatter plots are commonly used in fields like economics to analyze the relationship between income and education, environmental science to study the correlation between temperature and plant growth, and marketing to understand the relationship between advertising spend and sales.
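A minimal scatter plot sketch with hypothetical spend and sales figures:

```python
import matplotlib.pyplot as plt

spend = [1, 2, 3, 4, 5, 6]        # hypothetical advertising spend
sales = [12, 15, 19, 24, 26, 31]  # hypothetical sales

plt.scatter(spend, sales)
plt.xlabel("Advertising spend")
plt.ylabel("Sales")
plt.show()
```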

Frequency Distributions

Frequency distributions are a tabular way to organize and display categorical or discrete data. They show the count or frequency of each category within a dataset. Here's how to create a frequency distribution:

  • Identify the distinct categories or values in your dataset.
  • Count the number of occurrences of each category.
  • Organize the results in a table, with categories in one column and their respective frequencies in another.
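These steps map directly onto pandas' value_counts; a sketch with hypothetical survey answers:

```python
import pandas as pd

flavors = pd.Series(["vanilla", "chocolate", "vanilla",
                     "strawberry", "chocolate", "vanilla"])

print(flavors.value_counts())                # frequency of each category
print(flavors.value_counts(normalize=True))  # relative frequencies
```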

Frequency distributions help you understand the distribution of categorical data, identify dominant categories, and detect any rare or uncommon values. They are commonly used in fields like marketing to analyze customer demographics, in education to assess student grades, and in social sciences to study survey responses.

Descriptive Statistics for Categorical Data

Categorical data requires its own set of descriptive statistics to gain insights into the distribution and characteristics of these non-numeric variables. There are various methods for describing categorical data.

Frequency Tables

Frequency tables, also known as contingency tables, summarize categorical data by displaying the count or frequency of each category within one or more variables. Here's how they are created:

  • List the categories or values of the categorical variable(s) in rows or columns.
  • Count the occurrences of each category and record the frequencies.

Frequency tables are best used for summarizing and comparing categorical data across different groups or dimensions. They provide a straightforward way to understand data distribution and identify patterns or associations.

For example, in a survey about favorite ice cream flavors, a frequency table might show how many respondents prefer vanilla, chocolate, strawberry, and other flavors.
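For two categorical variables, pandas builds the table in one call; a sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M"],
    "flavor": ["vanilla", "chocolate", "vanilla",
               "vanilla", "strawberry", "chocolate"],
})

print(pd.crosstab(df["gender"], df["flavor"]))  # counts per combination
```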

Bar charts are a common graphical representation of categorical data. They are similar to histograms but are used for displaying categorical variables. Here's how they work:

  • Categories are listed on one axis (usually the x-axis), while the corresponding frequencies or counts are shown on the other axis (usually the y-axis).
  • Bars are drawn for each category, with the height of each bar representing the frequency or count of that category.

Bar charts make it easy to compare the frequencies of different categories visually. They are especially helpful for presenting categorical data in a visually appealing and understandable way.

Bar charts are commonly used in fields like market research to display survey results, in social sciences to illustrate demographic information, and in business to show product sales by category.

Pie charts are circular graphs that represent the distribution of categorical data as "slices of a pie." Here's how they are constructed:

  • Categories or values are represented as segments or slices of the pie, with each segment's size proportional to its frequency or count.

Pie charts are effective for showing the relative proportions of different categories within a dataset. They are instrumental when you want to emphasize the composition of a whole in terms of its parts.

Pie charts are commonly used in areas such as marketing to display market share, in finance to show budget allocations, and in demographics to illustrate the distribution of ethnic groups within a population.

These methods for visualizing and summarizing categorical data are essential for gaining insights into non-numeric variables and making informed decisions based on the distribution of categories within a dataset.

Descriptive Statistics Summary and Interpretation

Summarizing and interpreting descriptive statistics gives you the skills to extract meaningful insights from your data and apply them to real-world scenarios.

Summarizing Descriptive Statistics

Once you've collected and analyzed your data using descriptive statistics, the next step is to summarize the findings. This involves condensing the wealth of information into a few key points:

  • Central Tendency:  Summarize the central tendency of your data. If it's a numeric dataset, mention the mean, median, and mode as appropriate. For categorical data, highlight the most frequent categories.
  • Variability:  Describe the spread of the data using measures like range, variance, and standard deviation. Discuss whether the data is tightly clustered or widely dispersed.
  • Distribution:  Mention the shape of the data distribution. Is it normal, skewed, or bimodal? Use histograms or box plots to illustrate the distribution visually.
  • Outliers:  Identify any outliers and discuss their potential impact on the analysis. Consider whether outliers should be treated or investigated further.
  • Key Observations: Highlight any notable observations or patterns that emerged during your analysis. Are there clear trends or interesting findings in the data?

Interpreting Descriptive Statistics

Interpreting descriptive statistics involves making sense of the numbers and metrics you've calculated. It's about understanding what the data is telling you about the underlying phenomenon. Here are some steps to guide your interpretation:

  • Context Matters:  Always consider the context of your data. What does a specific value or pattern mean in the real-world context of your study? For example, a mean salary value may vary significantly depending on the industry.
  • Comparisons:  If you have multiple datasets or groups, compare their descriptive statistics. Are there meaningful differences or similarities between them? Statistical tests may be needed for formal comparisons.
  • Correlations:  If you've used scatter plots to visualize relationships, interpret the direction and strength of correlations. Are variables positively or negatively correlated, or is there no clear relationship?
  • Causation:  Be cautious about inferring causation from descriptive statistics alone. Correlation does not imply causation, so consider additional research or experimentation to establish causal relationships.
  • Consider Outliers:  If you have outliers, assess their impact on the overall interpretation. Do they represent genuine data points or measurement errors?

Descriptive Statistics Examples

To better understand how descriptive statistics are applied in real-world scenarios, let's explore a range of practical examples across various fields and industries. These examples illustrate how descriptive statistics provide valuable insights and inform decision-making processes.

Financial Analysis

Example:  Investment Portfolio Analysis

Description:  An investment analyst is tasked with evaluating the performance of a portfolio of stocks over the past year. They collect daily returns for each stock and want to provide a comprehensive summary of the portfolio's performance.

Use of Descriptive Statistics:

  • Central Tendency:  Calculate the portfolio's average daily return (mean) to assess its overall performance during the year.
  • Variability:  Compute the portfolio's standard deviation to measure the risk or volatility associated with the investment.
  • Distribution:  Create a histogram to visualize the distribution of daily returns, helping the analyst understand the nature of the portfolio's gains and losses.
  • Outliers:  Identify any outliers in daily returns that may require further investigation.

The resulting descriptive statistics will guide the analyst in making recommendations to investors, such as adjusting the portfolio composition to manage risk or improve returns.
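A compact sketch of how such a summary might be computed with pandas, on simulated daily returns (all numbers hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
returns = pd.Series(rng.normal(0.0005, 0.01, size=252))  # one trading year

print(returns.mean())      # average daily return (central tendency)
print(returns.std())       # volatility, i.e. risk (variability)
print(returns.describe())  # min, quartiles, max, and more
```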

Marketing Research

Example:  Product Sales Analysis

Description:  A marketing team wants to evaluate the sales performance of different products in their product line. They have monthly sales data for the past two years.

  • Central Tendency:  Calculate the mean monthly sales for each product to determine their average performance.
  • Variability:  Compute the standard deviation of monthly sales to identify products with the most variable sales.
  • Distribution:  Create box plots to visualize the sales distribution for each product, helping to understand the range and variability.
  • Comparisons:  Compare sales trends over the two years for each product to identify growth or decline patterns.

Descriptive statistics allow the marketing team to make informed decisions about product marketing strategies, inventory management, and product development.
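A minimal pandas sketch of this workflow, using made-up monthly sales figures for three hypothetical products:

```python
import pandas as pd

# Hypothetical monthly unit sales for three products over 24 months
sales = pd.DataFrame({
    "product": ["A"] * 24 + ["B"] * 24 + ["C"] * 24,
    "units":   list(range(100, 124)) + [150] * 24 + [80, 200] * 12,
})

# Mean and standard deviation of monthly sales per product
summary = sales.groupby("product")["units"].agg(["mean", "std", "min", "max"])
print(summary.sort_values("std", ascending=False))  # most variable products first
```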

Social Sciences

Example:  Survey Analysis on Happiness Levels

Description:  A sociologist conducts a survey to assess the happiness levels of residents in different neighborhoods within a city. Respondents rate their happiness on a scale of 1 to 10.

  • Central Tendency:  Calculate the mean happiness score for each neighborhood to identify areas with higher or lower average happiness levels.
  • Variability:  Compute the standard deviation of happiness scores to understand the degree of variation within each neighborhood.
  • Distribution:  Create histograms to visualize the distribution of happiness scores, revealing whether happiness levels are normally distributed or skewed.
  • Comparisons:  Compare the happiness levels across neighborhoods to identify potential factors influencing happiness disparities.

Descriptive statistics help sociologists pinpoint areas that may require interventions to improve residents' overall well-being and identify potential research directions.
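One possible sketch of the per-neighborhood summaries, using SciPy's describe (which returns the mean, variance, skewness, and kurtosis in one call) on made-up ratings:

```python
from scipy import stats

# Hypothetical happiness ratings (1-10) from two neighborhoods
north = [7, 8, 6, 9, 7, 8, 5, 7, 8, 6]
south = [4, 6, 5, 3, 7, 5, 4, 6, 5, 4]

for name, scores in [("North", north), ("South", south)]:
    d = stats.describe(scores)  # n, min/max, mean, variance, skewness, kurtosis
    print(f"{name}: mean={d.mean:.1f}, variance={d.variance:.1f}, "
          f"skewness={d.skewness:.2f}")
```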

These examples demonstrate how descriptive statistics play a vital role in summarizing and interpreting data across diverse domains. By applying these statistical techniques, professionals can make data-driven decisions, identify trends and patterns, and gain valuable insights into various aspects of their work.

Common Descriptive Statistics Mistakes and Pitfalls

While descriptive statistics are valuable tools, they can be misused or misinterpreted if not handled carefully. Here are some common mistakes and pitfalls to avoid when working with descriptive statistics.

Misinterpretation of Descriptive Statistics

  • Assuming Causation:  One of the most common mistakes is inferring causation from correlation. Just because two variables are correlated does not mean that one causes the other. Always be cautious about drawing causal relationships from descriptive statistics alone.
  • Ignoring Context:  Failing to consider the context of the data can lead to misinterpretation. A descriptive statistic may seem significant, but it might not have practical relevance in the specific context of your study.
  • Neglecting Outliers:  Ignoring outliers or treating them as errors without investigation can lead to incomplete and inaccurate conclusions. Outliers may hold valuable information or reveal unusual phenomena.
  • Overlooking Distribution Assumptions:  When applying statistical tests or methods, it's important to check whether your data meets the assumptions of those techniques. For example, using methods designed for normally distributed data on skewed data can yield misleading results.

Data Reporting Errors

  • Inadequate Data Documentation:  Failing to provide clear documentation about data sources, collection methods, and preprocessing steps can make it challenging for others to replicate your analysis or verify your findings.
  • Mislabeling Variables:  Accurate labeling of variables and units is crucial. Mislabeling or using inconsistent units can lead to erroneous calculations and interpretations.
  • Failure to Report Measures of Uncertainty:  Descriptive statistics provide point estimates of central tendency and variability. It's crucial to report measures of uncertainty, such as confidence intervals or standard errors, to convey the range of possible values; a short sketch of how to compute these follows this list.
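For instance, a 95% confidence interval for a sample mean can be derived from the standard error. A minimal sketch, assuming SciPy is available and using made-up measurements:

```python
import numpy as np
from scipy import stats

data = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7])  # placeholder sample

mean = data.mean()
sem = stats.sem(data)  # standard error of the mean
ci = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=sem)

print(f"Mean = {mean:.2f}, SE = {sem:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```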

Avoiding Biases in Descriptive Statistics

  • Sampling Bias:  Ensure that your sample is representative of the population you intend to study. Sampling bias can occur when certain groups or characteristics are over- or underrepresented in the sample, leading to biased results.
  • Selection Bias:  Be cautious of selection bias, where specific data points are systematically included or excluded based on criteria that are unrelated to the research question. This can distort the analysis.
  • Confirmation Bias:  Avoid the tendency to seek, interpret, or remember information in a way that confirms preexisting beliefs or hypotheses. This bias can lead to selective attention and misinterpretation of data.
  • Reporting Bias:  Be transparent in reporting all relevant data, even if the results do not support your hypothesis or are inconclusive. Omitting such data can create a biased view of the overall picture.

Awareness of these common mistakes and pitfalls can help you conduct more robust and accurate analyses using descriptive statistics, leading to more reliable and meaningful conclusions in your research and decision-making processes.

Conclusion for Descriptive Statistics

Descriptive statistics are the essential building blocks of data analysis. They provide us with the means to summarize, visualize, and comprehend the often intricate world of data. By mastering these techniques, you have gained a valuable skill that can be applied across a multitude of fields and industries. From making informed business decisions to advancing scientific research, from understanding market trends to improving healthcare outcomes, descriptive statistics serve as our trusted guides in the realm of data.

You've learned how to calculate measures of central tendency, assess variability, explore data distributions, and employ powerful visualization tools. You've seen how descriptive statistics bring clarity to the chaos of data, revealing patterns and outliers, guiding your decisions, and enabling you to communicate insights effectively. As you continue to work with data, remember that descriptive statistics are your steadfast companions, ready to help you navigate the data landscape, extract valuable insights, and make informed choices based on evidence rather than guesswork.


Descriptive Statistics: Reporting the Answers to the 5 Basic Questions of Who, What, Why, When, Where, and a Sixth, So What?

Affiliation: Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas.

PMID: 28891910 | DOI: 10.1213/ANE.0000000000002471

Descriptive statistics are specific methods basically used to calculate, describe, and summarize collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the manuscript text and/or in its tables, or graphically in its figures. This basic statistical tutorial discusses a series of fundamental concepts about descriptive statistics and their reporting. The mean, median, and mode are 3 measures of the center or central tendency of a set of data. In addition to a measure of its central tendency (mean, median, or mode), another important characteristic of a research data set is its variability or dispersion (ie, spread). In simplest terms, variability is how much the individual recorded scores or observed values differ from one another. The range, standard deviation, and interquartile range are 3 measures of variability or dispersion. The standard deviation is typically reported for a mean, and the interquartile range for a median. Testing for statistical significance, along with calculating the observed treatment effect (or the strength of the association between an exposure and an outcome), and generating a corresponding confidence interval are 3 tools commonly used by researchers (and their collaborating biostatistician or epidemiologist) to validly make inferences and more generalized conclusions from their collected data and descriptive statistics. A number of journals, including Anesthesia & Analgesia, strongly encourage or require the reporting of pertinent confidence intervals. A confidence interval can be calculated for virtually any variable or outcome measure in an experimental, quasi-experimental, or observational research study design. Generally speaking, in a clinical trial, the confidence interval is the range of values within which the true treatment effect in the population likely resides. In an observational study, the confidence interval is the range of values within which the true strength of the association between the exposure and the outcome (eg, the risk ratio or odds ratio) in the population likely resides. There are many possible ways to graphically display or illustrate different types of data. While there is often latitude as to the choice of format, ultimately, the simplest and most comprehensible format is preferred. Common examples include a histogram, bar chart, line chart or line graph, pie chart, scatterplot, and box-and-whisker plot. Valid and reliable descriptive statistics can answer basic yet important questions about a research data set, namely: "Who, What, Why, When, Where, How, How Much?"


Descriptive Statistics: Detailed Analysis and Practical Implementation


Statistical methods are the backbone of rigorous empirical research across various disciplines. This blog post seeks to demystify the concept of descriptive statistics, providing a detailed analysis of its role in data interpretation and emphasizing its practical implementation across a multitude of fields. By exploring the fundamental components, readers should expect to gain a comprehensive understanding of how descriptive statistics serve as a tool for summarization and simplification in an increasingly data-driven world.

The significance of statistics in research cannot be overstated, as it is often through the application of these methods that complex data becomes interpretable and insightful. Our discussion will introduce these principles and prepare the ground for deeper exploration throughout this blog.

Understanding Descriptive Statistics

Definition and Purpose of Descriptive Statistics

Descriptive statistics form the cornerstone of quantitative analysis, offering a snapshot of data by producing concise summaries. The intrinsic value of these statistical methods lies in their ability to simplify complex datasets, allowing for a more digestible presentation of research findings. Courses on problem-solving skills frequently emphasize the ability to comprehend and apply these techniques effectively. Despite their essential role, it is imperative to recognize the limitations of descriptive statistics, which are often misunderstood as providing inferential judgments when they simply describe a data set.

Benefits of using descriptive statistics

The utility of descriptive statistics extends to numerous aspects of data analysis - from providing a foundational understanding of the dataset’s shape to aiding in detecting errors or anomalies. These statistical tools allow researchers to convey their findings in a manner that is both accessible and informative to a range of audiences, regardless of their prior statistical knowledge.

Acknowledging limitations in its use

However, descriptive statistics are not without their constraints. They provide a summarized glimpse but may obfuscate the nuances and underlying patterns present within the data. It is essential to use these methods as part of a broader analytical strategy, ensuring that the interpretation of data does not halt at the descriptive level, but extends to more complex analyses where appropriate.

Fundamental Components of Descriptive Statistics

Measures of Central Tendency

Grasping the concept of central tendency is fundamental when engaging with descriptive statistics. This involves understanding measures such as mean, median, and mode, each offering a unique lens through which to view the "center" of the data. The mean calculates the average value which can be influenced by outliers, while the median provides the middle value when data are ranked, proving robust against skewed distributions. The mode, identifying the most frequent value, offers insight into the data's modality.

Calculating these measures and interpreting results

Computing these measures may seem straightforward, but interpreting them requires a discerning eye. It is through careful comparison and analysis of these measures that one can infer the data’s symmetry, skewness, and even identify outlier effects. Online certificate courses often include training on utilizing software for these calculations, though understanding their implications is just as critical.

Real-life examples for better understanding

Consider a classroom where students' test scores could be analyzed to evaluate performance. The mean score would provide an average performance metric, the median can represent a typical score unaffected by extreme grades, and the mode can highlight the most common score bracket. Such analysis assists in curriculum adjustments and targeted educational interventions.
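A quick sketch of those three calculations with Python's built-in statistics module, using made-up test scores:

```python
import statistics

scores = [72, 85, 85, 90, 60, 78, 85, 92, 66, 74]  # hypothetical test scores

print("Mean:  ", statistics.mean(scores))    # average performance
print("Median:", statistics.median(scores))  # typical score, robust to extremes
print("Mode:  ", statistics.mode(scores))    # most common score
```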

The Role of Graphs and Charts in Descriptive Statistics

Importance of Visual Representation

Visualizing data is a powerful aspect of descriptive statistics, translating numerical findings into imagery that can be rapidly interpreted. Graphs and charts provide a narrative to the data, effectively communicating results and revealing trends that might go unnoticed in tabular data. Different data calls for different types of visual tools, making it imperative to choose the appropriate format when conveying statistical information.

Advantage of using graphs and charts

The effectiveness of data representation can significantly affect the audience's comprehension. A well-designed chart can facilitate comparison, highlight statistical relationships, and become an integral part of storytelling with data. The decision to use a bar chart over a line graph, for example, could make the difference in how successfully the data's message is communicated.

Type of data suitable for different graphs

Selecting the correct graph or chart is closely related to the type of data at hand. For categorical data, pie charts can effectively showcase proportions, whereas histograms are suitable for continuous data, revealing distribution patterns. Time-series data would benefit from a line graph, highlighting trends over a period, while scatter plots are ideal for examining the relationship between two quantitative variables.

Practical Application of Descriptive Statistics

Use of Descriptive Statistics in Various Industries

Descriptive statistics find applications in virtually every industry, affecting decision-making processes and strategy formulation. In business, these statistics can illustrate sales trends, customer demographics, or performance measures. The healthcare sector relies on descriptive analyses to track disease prevalence or patient outcomes. Social scientists use descriptive statistics to map patterns in social behavior, while psychologists could use them to summarize survey data.

Application in business and finance

For example, a retail store might apply descriptive statistics to identify the average customer spend or to determine the most popular product categories during different seasons. By leveraging these insights, a store can optimize its inventory, tailor marketing strategies, and improve overall operational efficiency.

Relevance in health and medical research

In the medical field, descriptive statistics can summarize patient blood pressure readings in a research study, offering a straightforward depiction of the study population's cardiovascular health status. It allows for an initial assessment before delving into more sophisticated statistical tests that might compare the efficacy of different treatments.

Recap of importance and application of descriptive statistics

Descriptive statistics act as interpreters in a world saturated with data. They serve an indispensable role in research and decision-making by providing clear, concise summaries of data. This blog has covered the importance and application of these statistics, from the computation of central measures to the visual impact of graphs and charts.

Encouragement for further study and usage

As with any scientific method, the journey of learning and applying descriptive statistics is ongoing. Researchers, students, and professionals across industries are encouraged to further their understanding of these methods, recognizing their value and potential when combined with other forms of data analysis.

Concluding thoughts on the power and limitations of descriptive statistics

While powerful, descriptive statistics provide the foundation upon which further, more complex analysis can be built. An awareness of their limitations is just as critical as recognizing their utility. By appreciating the nuances of descriptive statistics and the insights they can offer, users can contribute to a more informed, data-driven approach in their respective fields.

How do descriptive statistics contribute to the understanding and interpretation of data in research studies?

Introduction to Descriptive Statistics

Descriptive statistics play a key role in research. They offer a simple summary of large datasets. Imagine analyzing raw data without any summarization. It would be a daunting task. Descriptive statistics make complex data understandable. Researchers rely on them heavily.

Simplification of Complex Data

Researchers gather vast amounts of data. These can overwhelm anyone. Descriptive statistics help simplify. They break down complex data into understandable pieces. Think of them as a translator. They translate data into a language everyone understands.

Measures of Central Tendency

Three central measures exist:

- Mean
- Median
- Mode

Each measure offers a different insight. The mean provides an average. It adds all values and divides by their count. Skewness can, however, affect the mean. The median is the middle value. It's unaffected by outliers. The mode is the most frequent value. It helps in understanding common trends.

Measures of Variability

Consistency matters in data. Descriptive statistics show this through variability measures.

Four main measures exist:

- Range
- Interquartile Range
- Variance
- Standard Deviation

The range reveals the spread of data. It is the difference between the highest and lowest values. The interquartile range looks at the middle spread. It reflects a typical range, excluding extremes. Variance measures the average degree to which each point differs from the mean. The standard deviation is the square root of the variance. It expresses average distance from the mean. These measures help us understand data diversity.
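A short NumPy sketch of these four measures on placeholder observations (ddof=1 yields the sample variance and standard deviation):

```python
import numpy as np

data = np.array([15, 3, 12, 0, 24, 3, 9, 11])  # placeholder observations

data_range = data.max() - data.min()
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                   # spread of the middle 50%, excluding extremes
variance = data.var(ddof=1)     # sample variance
std_dev = data.std(ddof=1)      # sample standard deviation

print(f"Range={data_range}, IQR={iqr}, Variance={variance:.1f}, SD={std_dev:.1f}")
```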

Data Distribution Insights

Descriptive statistics also address distribution. They answer questions like:

- Is the data skewed?

- Is it peaked or flat?

Shapes of distribution influence analysis. Skewness tells us if data tilts to one side. Is it to the right or left? Kurtosis speaks to data peaks. Are they high, low, or moderate? Understanding these aspects is crucial. They shape the interpretation of the dataset.
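Both shape measures are simple to compute; a small SciPy sketch on hypothetical right-skewed income data:

```python
from scipy.stats import skew, kurtosis

incomes = [28, 32, 35, 37, 40, 42, 45, 48, 52, 250]  # hypothetical, in $1000s

print(f"Skewness: {skew(incomes):.2f}")             # > 0 means a right (positive) skew
print(f"Excess kurtosis: {kurtosis(incomes):.2f}")  # > 0 means more peaked than normal
```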

Relationship Between Variables

Descriptive statistics can reveal relationships. Correlation coefficients measure this. They range from -1 to +1. A positive coefficient indicates a direct relationship. As one variable increases, so does the other. A negative one shows an inverse relationship. As one goes up, the other goes down. Coefficients near zero imply no relationship. Knowing these relationships guides researchers' understanding.

Conclusion to Descriptive Statistics

In essence, descriptive statistics serve as tools. They bring clarity and insight to research data. They transform raw information into usable knowledge. Without them, making sense of data becomes a near-impossible task. Researchers depend on descriptive statistics to convey findings succinctly. Thus, they form the foundation of data analysis in research studies. They do not provide all the answers. Yet, they are the critical first step in data interpretation.


What are the practical applications of measures of central tendency in real-world data analysis?

Understanding Basics

In data analysis, measures of central tendency form a cornerstone. They allow us to understand "typical" values. Mean, median, and mode are the most common. Each has its unique applications in various fields. These measures help simplify complex data sets. They provide a single value to represent a collection of data points.

Business and Economics

Businesses make extensive use of these measures. They apply them to analyze sales, customer behavior, and market trends. Companies use the mean to calculate average sales. This gives them a gauge of overall performance. The median income provides insights into the economic well-being of a customer base. It is less affected by outliers than the mean. Employers often refer to the mode when determining the most common salary.

Education and Testing

Educators analyze test scores using central tendency. They determine the average score using the mean. This offers insight into overall class performance. The median helps identify the middle-performing student. For educators, the mode might show the most frequently occurring score. It helps in understanding the distribution of grades.

Health and Medicine

Healthcare professionals rely on these measures. They look at patient ages, recovery times, and blood pressure readings. The mean age of patients can inform potential risk for diseases. The median recovery time gives a clear picture of treatment effectiveness. The mode aids in identifying the most common symptoms or conditions.

Real Estate

Real estate agents analyze property values. The mean property price offers a quick market overview. The median gives a better sense of typical home values. It mitigates the distortion from extremely high or low prices. Agents use the mode to understand the most common price points or features.

Public Policy

Policy makers use measures of central tendency. They determine average income, housing costs, or educational attainment. Such analysis influences policy decisions. The mean income informs tax policy and welfare programs. The median home cost helps in assessing housing affordability. The mode might indicate the most frequent level of education. This shapes education policy.

Limitations and Considerations

While vital, these measures have limitations. Each can be skewed by outliers. The mean is particularly sensitive to extreme values. Thus, data analysts must choose the appropriate measure. They must understand the data distribution. Only then will they derive accurate and insightful conclusions.

In summary, measures of central tendency offer practical tools for real-world data analysis. They boil down intricate data into actionable insights. Their applications cross countless domains. They offer powerful snapshots of datasets. However, analysts must apply them judiciously to unlock their full potential.


What are the challenges encountered during the implementation of descriptive statistical analysis in large data sets?

Handling Massive Volumes

Volume matters in large data sets. Statisticians face significant challenges when working with extensive volumes of data. Sometimes, traditional computing resources lack the necessary power. Therefore, advanced hardware or cloud-based solutions often become a necessity. These improve processing speed and efficiency.

Data Quality and Consistency

Data quality cannot be overlooked. High volumes often bring inconsistency and missing values. Ensuring clean, uniform data can be daunting. Analysts must verify data quality before starting descriptive analysis. This involves data cleaning and preprocessing, which itself can become complex.

Computational Complexity

Computational resources strain under heavy loads. Descriptive statistics involve computations like mean, median, mode, variance, and standard deviation. When data sets become large, these calculations stress memory and processing power. Efficient algorithms and software optimization become crucial.
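One standard remedy is a one-pass streaming algorithm such as Welford's method, which updates the mean and variance incrementally so the full data set never has to sit in memory. A minimal sketch (the function name is ours):

```python
def streaming_mean_variance(stream):
    """Welford's one-pass algorithm: mean and variance without storing the data."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0  # sample variance
    return mean, variance

# Works on any iterable, e.g. a generator reading a huge file line by line
mean, var = streaming_mean_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, var)
```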

Visualization Constraints

Visualizing data helps understanding. Yet, large data sets complicate this. Standard tools often do not scale well or may oversimplify the nuances in the data. Employing specialized visualization tools that handle large volumes effectively becomes imperative.
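A common workaround is to pre-aggregate before plotting, for example binning millions of points into a fixed number of histogram buckets. A small NumPy sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
big = rng.normal(size=10_000_000)  # hypothetical large sample

counts, edges = np.histogram(big, bins=100)  # 100 bins summarize 10M points
# A plotting library then only needs to draw 100 bars, not 10 million markers
print(counts[:5], edges[:5])
```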

Time Management

Time becomes a crucial factor. Time efficiency in data processing and analysis affects project deadlines. With vast data sets, compression and parallel processing techniques may assist. Nevertheless, these approaches need careful implementation.

Statistical Significance and Noise

Every data point counts. In large data sets, even minor variations can appear significant. Analysts must distinguish noise from true variability. This discernment affects the conclusions drawn and actions taken.

Scaling Analysis

Analyses must scale with data. As data grows, methods and approaches used for small sets may not apply. Approaches must fit the volume, velocity, and variety of big data. Scaling descriptive statistics requires innovative methodologies and robust systems.

Overcoming Learning Curve

New tools require new skills. Utilizing advanced analytics tools requires knowledge and experience. Teams might lack expertise in cutting-edge statistical software. Investment in training becomes crucial to harness full data potential.

In conclusion, large data sets bring a unique set of challenges to descriptive statistical analysis. Addressing such challenges requires thought, planning, and the implementation of appropriate strategies and tools. A comprehensive approach enables data professionals to provide accurate, meaningful insights despite the inherent difficulties of working with big data.



Child Care and Early Education Research Connections

Descriptive Statistics

This page describes graphical and pictorial methods of descriptive statistics and the three most common measures of descriptive statistics (central tendency, dispersion, and association).

Descriptive statistics can be useful for two purposes: 1) to provide basic information about variables in a dataset and 2) to highlight potential relationships between variables. The three most common descriptive statistics can be displayed graphically or pictorially and are measures of:

- Central tendency
- Dispersion
- Association

Graphical/Pictorial Methods

There are several graphical and pictorial methods that enhance researchers' understanding of individual variables and the relationships between variables. Graphical and pictorial methods provide a visual representation of the data. Some of these methods include histograms, scatter plots, geographic information systems (GIS), and sociograms.

Histograms

- Visually represent the frequencies with which values of variables occur
- Each value of a variable is displayed along the bottom of a histogram, and a bar is drawn for each value
- The height of the bar corresponds to the frequency with which that value occurs

Scatter plots

- Display the relationship between two quantitative or numeric variables by plotting one variable against the value of another variable
- For example, one axis of a scatter plot could represent height and the other could represent weight. Each person in the data would receive one data point on the scatter plot that corresponds to his or her height and weight

Geographic Information Systems (GIS)

- A GIS is a computer system capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location
- Using a GIS program, a researcher can create a map to represent data relationships visually

Sociograms

- Display networks of relationships among variables, enabling researchers to identify the nature of relationships that would otherwise be too complex to conceptualize


Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics. They describe the "average" member of the population of interest. There are three measures of central tendency:

Mean -- the sum of a variable's values divided by the total number of values
Median -- the middle value of a variable
Mode -- the value that occurs most often

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000.

Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = $225,000
Median Income = $45,000
Modal Income = $10,000

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between $0 and $200,000, a handful of individuals earn millions.


Measures of dispersion provide information about the spread of a variable's values. There are four key measures of dispersion:

- Range (and the related interquartile range)
- Variance
- Standard Deviation
- Skew

Range  is simply the difference between the smallest and largest values in the data. The interquartile range is the difference between the values at the 75th percentile and the 25th percentile of the data.

Variance  is the most commonly used measure of dispersion. It is calculated by taking the average of the squared differences between each value and the mean.

Standard deviation , another commonly used statistic, is the square root of the variance.

Skew  is a measure of whether some values of a variable are extremely different from the majority of the values. For example, income is skewed because most people make between $0 and $200,000, but a handful of people earn millions. A variable is positively skewed if the extreme values are higher than the majority of values. A variable is negatively skewed if the extreme values are lower than the majority of values.

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000:

Range = 1,000,000 - 10,000 = 990,000
Variance = [(10,000 - 225,000)² + (10,000 - 225,000)² + (45,000 - 225,000)² + (60,000 - 225,000)² + (1,000,000 - 225,000)²] / 5 = 150,540,000,000
Standard Deviation = √150,540,000,000 ≈ 387,995
Skew = Income is positively skewed
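The same figures can be verified in a few lines of NumPy; since this example divides by N, the population setting (ddof=0) is the one that matches:

```python
import numpy as np

incomes = np.array([10_000, 10_000, 45_000, 60_000, 1_000_000])

print(incomes.max() - incomes.min())  # Range: 990000
print(incomes.var(ddof=0))            # Population variance: 150540000000.0
print(incomes.std(ddof=0))            # Standard deviation: ~387994.85
```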


Measures of association indicate whether two variables are related. Two measures are commonly used:

- Chi-square
- Correlation

As a measure of association between variables, chi-square tests are used on nominal data (i.e., data that are put into classes: e.g., gender [male, female] and type of job [unskilled, semi-skilled, skilled]) to determine whether they are associated*

A chi-square is called significant if there is an association between two variables, and nonsignificant if there is not an association

To test for associations, a chi-square is calculated in the following way: Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts up the number of female administrative assistants, the number of female construction workers, the number of male administrative assistants, and the number of male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (this expected count is based on statistical calculations). If there is a large difference between the observed values and the expected values, the chi-square test is significant, which indicates there is an association between the two variables.

*The chi-square test can also be used as a measure of goodness of fit, to test if data from a sample come from a population with a specific distribution, as an alternative to Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. As such, the chi-square test is not restricted to nominal data; with non-binned data, however, the results depend on how the bins or classes are created and the size of the sample.
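A small sketch of this observed-versus-expected comparison with SciPy, using made-up counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: rows = gender, columns = job type
#                  construction  admin assistant
observed = np.array([[40, 10],    # male
                     [ 5, 45]])   # female

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.1f}, p={p:.4f}, dof={dof}")
print("Expected counts under no association:\n", expected)
```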

A correlation coefficient is used to measure the strength of the relationship between numeric variables (e.g., weight and height)

The most common correlation coefficient is  Pearson's r , which can range from -1 to +1.

If the coefficient is between 0 and 1, as one variable increases, the other also increases. This is called a positive correlation. For example, height and weight are positively correlated because taller people usually weigh more

If the correlation coefficient is between -1 and 0, as one variable increases the other decreases. This is called a negative correlation. For example, age and hours slept per night are negatively correlated because older people usually sleep fewer hours per night
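A brief sketch computing Pearson's r with SciPy on made-up height and weight values:

```python
from scipy.stats import pearsonr

heights = [160, 165, 170, 175, 180, 185]  # cm (hypothetical)
weights = [55, 60, 66, 72, 77, 85]        # kg (hypothetical)

r, p_value = pearsonr(heights, weights)
print(f"Pearson's r = {r:.2f} (p = {p_value:.4f})")  # near +1: strong positive link
```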


Descriptive Research in Psychology: Methods, Applications, and Importance

Picture a psychologist’s toolkit, brimming with an array of methods designed to unravel the mysteries of the human mind—among them, the unsung hero of descriptive research, a powerful lens through which we can observe, understand, and illuminate the vast landscape of human behavior and cognition. This versatile approach to psychological inquiry serves as a cornerstone in our quest to comprehend the intricacies of the human experience, offering insights that shape our understanding of everything from child development to social interactions.

Descriptive research in psychology is like a skilled artist’s sketch, capturing the essence of human behavior and mental processes with precision and depth. It’s the foundation upon which many psychological theories are built, providing a rich tapestry of observations that inform more complex studies. Unlike experimental methods that manipulate variables to establish cause-and-effect relationships, descriptive research aims to paint a vivid picture of what is, rather than what could be.

Defining Descriptive Research in Psychology: More Than Meets the Eye

At its core, descriptive research in psychology is a systematic approach to observing and cataloging human behavior, thoughts, and emotions in their natural context. It’s the scientific equivalent of people-watching, but with a structured methodology and a keen eye for detail. This type of research doesn’t just scratch the surface; it dives deep into the nuances of human experience, capturing the subtleties that might otherwise go unnoticed.

The beauty of descriptive research lies in its versatility. It can take many forms, each offering a unique perspective on the human psyche. From participant observation in psychology, where researchers immerse themselves in the world they're studying, to meticulous case studies that explore individual experiences in depth, descriptive research adapts to the questions at hand.

One of the primary goals of descriptive research is to provide a comprehensive account of a phenomenon. It’s not about proving or disproving hypotheses; instead, it’s about gathering rich, detailed information that can later inform more targeted inquiries. This approach is particularly valuable when exploring new or understudied areas of psychology, serving as a springboard for future research.

Methods and Techniques: The Descriptive Researcher’s Toolkit

The methods employed in descriptive research are as diverse as the questions they seek to answer. Let’s take a closer look at some of the key tools in the descriptive researcher’s arsenal:

1. Observational methods: Picture a researcher sitting quietly in a playground, noting how children interact. This direct observation can yield invaluable insights into social development and behavior patterns.

2. Case studies: These in-depth explorations of individual experiences can shed light on rare psychological phenomena or provide detailed accounts of therapeutic interventions.

3. Surveys and questionnaires: By tapping into the thoughts and opinions of large groups, researchers can identify trends and patterns in attitudes and behaviors.

4. Archival research in psychology: Delving into historical records and existing datasets can uncover long-term trends and provide context for current psychological phenomena.

5. Naturalistic observation: This method involves studying behavior in its natural environment, without interference from the researcher. It’s like being a fly on the wall, capturing authentic human interactions.

Each of these methods has its strengths and limitations, and skilled researchers often combine multiple approaches to gain a more comprehensive understanding of their subject matter.

Applications: Descriptive Research in Action

The applications of descriptive research in psychology are as varied as human behavior itself. Let’s explore how this approach illuminates different areas of psychological study:

In developmental psychology, descriptive research plays a crucial role in understanding how children grow and change over time. Longitudinal studies in psychology, which follow the same group of individuals over an extended period, provide invaluable insights into the trajectory of human development.

Social psychology relies heavily on descriptive methods to explore how people interact and influence one another. For instance, observational studies in public spaces can reveal patterns of nonverbal communication or group dynamics that might be difficult to capture in a laboratory setting.

Clinical psychology often employs case studies to delve into the complexities of mental health disorders. These detailed accounts can provide rich, contextual information about the lived experiences of individuals dealing with psychological challenges.

In educational psychology, descriptive research helps identify effective teaching strategies and learning patterns. Classroom observations and student surveys can inform educational policies and practices, ultimately improving learning outcomes.

Real-world examples of descriptive studies abound. Consider the famous “Bobo doll” experiments by Albert Bandura, which used observational methods to explore how children learn aggressive behaviors. While not strictly descriptive in nature, these studies incorporated descriptive elements that provided crucial insights into social learning theory.

Strengths and Limitations: A Balanced View

Like any research method, descriptive research has its strengths and limitations. On the plus side, it offers a level of ecological validity that’s hard to match in controlled experiments. By studying behavior in natural settings, researchers can capture the complexity and nuance of real-world phenomena.

Descriptive research is also particularly adept at identifying patterns and generating hypotheses. It’s often the first step in a longer research process, providing the foundation for more targeted experimental studies. This approach can be especially valuable when dealing with sensitive topics or populations that might be difficult to study in more controlled settings.

However, it’s important to acknowledge the limitations of descriptive research. One of the primary challenges is the directionality problem in psychology . While descriptive studies can identify relationships between variables, they can’t establish causation. This limitation can sometimes lead to misinterpretation of results or overreaching conclusions.

Another potential pitfall is researcher bias. The subjective nature of some descriptive methods, particularly observational studies, can introduce unintended biases into the data collection and interpretation process. Researchers must be vigilant in maintaining objectivity and employing strategies to minimize bias.

When compared to experimental research, descriptive studies may seem less rigorous or definitive. However, this perception overlooks the unique value that descriptive research brings to the table. While experiments are excellent for testing specific hypotheses and establishing causal relationships, they often lack the richness and contextual detail that descriptive methods provide.

Conducting a Descriptive Study: From Planning to Publication

Embarking on a descriptive research project requires careful planning and execution. Here’s a roadmap for aspiring researchers:

1. Define your research question: Start with a clear, focused question that guides your inquiry. What specific aspect of human behavior or cognition do you want to explore?

2. Choose your method: Select the descriptive technique(s) best suited to answer your research question. Will you be conducting surveys, observing behavior, or delving into case studies?

3. Develop your data collection tools: Create robust instruments for gathering information, whether it’s a well-designed questionnaire or a structured observation protocol.

4. Recruit participants: If your study involves human subjects, ensure you have a representative sample and obtain proper informed consent.

5. Collect data: Implement your chosen method(s) with consistency and attention to detail. Remember, the quality of your data will directly impact the value of your findings.

6. Analyze and interpret: Once you’ve gathered your data, it’s time to make sense of it. Look for patterns, themes, and relationships within your observations (a brief sketch of this step appears after this list).

7. Draw conclusions: Based on your analysis, what can you say about the phenomenon you’ve studied? Be careful not to overstate your findings or imply causation where none has been established.
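To make the analysis step concrete, here is a minimal sketch in Python using pandas; the variables and values are hypothetical, and any statistical package would serve equally well.

```python
import pandas as pd

# Hypothetical survey responses gathered in steps 4-5
data = pd.DataFrame({
    "age": [19, 21, 20, 23, 22, 19, 25, 21],
    "daily_screen_hours": [5.5, 2.0, 3.5, 6.0, 4.0, 5.0, 1.5, 3.0],
    "mood": ["calm", "anxious", "calm", "tired",
             "calm", "anxious", "calm", "tired"],
})

# Numerical summary of each quantitative variable: count, mean, SD, quartiles
print(data[["age", "daily_screen_hours"]].describe())

# Frequency distribution of a categorical variable
print(data["mood"].value_counts())

# A first look at a possible relationship between two variables
print(data[["age", "daily_screen_hours"]].corr())
```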

Throughout this process, it’s crucial to keep ethical considerations at the forefront. Respect for participants’ privacy, confidentiality, and well-being should guide every step of your research.

The Future of Descriptive Research: Evolving Methods and New Frontiers

As we look to the future, descriptive research in psychology continues to evolve and adapt to new challenges and opportunities. Emerging technologies are opening up exciting possibilities for data collection and analysis. For instance, wearable devices and smartphone apps are enabling researchers to gather real-time data on behavior and physiological responses in natural settings.

The rise of big data and advanced analytics is also transforming descriptive research. By analyzing vast datasets of human behavior online, researchers can identify patterns and trends on a scale previously unimaginable. However, this new frontier also brings ethical challenges, particularly around privacy and consent.

Another promising direction is the integration of descriptive methods with other research approaches. Quasi-experiments in psychology, which combine elements of descriptive and experimental research, offer a middle ground that can leverage the strengths of both approaches.

As we continue to unravel the complexities of the human mind, descriptive research will undoubtedly play a crucial role. Its ability to capture the richness and diversity of human experience makes it an indispensable tool in the psychologist’s toolkit.

In conclusion, descriptive research in psychology is far more than just a preliminary step in the scientific process. It’s a powerful approach that provides the foundation for our understanding of human behavior and mental processes. By offering detailed, contextual insights into the human experience, descriptive research helps us identify patterns, generate hypotheses, and ultimately advance our knowledge of psychology.

From exploring the intricacies of child development to unraveling the dynamics of social interactions, descriptive research continues to illuminate the vast landscape of human psychology. As we move forward, the challenge for researchers will be to harness new technologies and methodologies while maintaining the core strengths of descriptive approaches: their ability to capture the nuance, complexity, and diversity of human experience.

In the end, it’s this deep, rich understanding of human behavior that drives psychological science forward, informing theories, shaping interventions, and ultimately helping us to better understand ourselves and others. As we continue to explore the fascinating world of the human mind, descriptive research will remain an essential tool, helping us to see the world through the eyes of those we study and to tell their stories with clarity, empathy, and scientific rigor.




Basics of statistics for primary care research

Timothy C. Guetterman

Family Medicine, University of Michigan, Michigan Medicine, Ann Arbor, Michigan, USA

The purpose of this article is to provide an accessible introduction to foundational statistical procedures and present the steps of data analysis to address research questions and meet standards for scientific rigour. It is aimed at individuals new to research with less familiarity with statistics, or anyone interested in reviewing basic statistics. After examining a brief overview of foundational statistical techniques, for example, differences between descriptive and inferential statistics, the article illustrates 10 steps in conducting statistical analysis with examples of each. The following are the general steps for statistical analysis: (1) formulate a hypothesis, (2) select an appropriate statistical test, (3) conduct a power analysis, (4) prepare data for analysis, (5) start with descriptive statistics, (6) check assumptions of tests, (7) run the analysis, (8) examine the statistical model, (9) report the results and (10) evaluate threats to validity of the statistical analysis. Researchers in family medicine and community health can follow specific steps to ensure a systematic and rigorous analysis.

Investigators in family medicine and community health often employ quantitative research to address aims that examine trends, relationships among variables or comparisons of groups (Fetters, 2019, this issue). Quantitative research involves collecting structured or closed-ended data, typically in the form of numbers, and analysing that numeric data to address research questions and test hypotheses. Research hypotheses provide a proposition about the expected outcome of research that may be assessed using a variety of methodologies, while statistical hypotheses are specific statements about propositions that can only be tested statistically. Statistical analysis requires a series of steps beginning with formulating hypotheses and selecting appropriate statistical tests. After preparing data for analysis, researchers then proceed with the actual statistical analysis and finally report and interpret the results.

Family medicine and community health researchers often limit their analyses to descriptive statistics: reporting frequencies, means and standard deviations (SD). While this is sometimes an appropriate stopping point, researchers may be missing opportunities for more advanced analyses. For example, knowing that patients have favourable attitudes about a treatment may be important and can be addressed with descriptive statistics. On the other hand, finding that attitudes differ (or not) between men and women, and that the difference is statistically significant, may give even more actionable information to healthcare professionals. The latter question, about differences, can be addressed through inferential statistical tests. The purpose of this article is to provide an accessible introduction to foundational statistical procedures and present the steps of data analysis to address research questions and meet standards for scientific rigour. It is aimed at individuals new to research with less familiarity with statistics and may be helpful when reading research or conducting peer review.

Foundational statistical techniques

Statistical analysis is a method of aggregating numeric data and drawing inferences about variables. Statistical procedures may be broadly classified into (1) statistics that describe data—descriptive statistics; and (2) statistics that make inferences about more general situations beyond the actual data set—inferential statistics.

Descriptive statistics

Descriptive statistics aggregate data that are grouped into variables to examine typical values and the spread of values for each variable in a data set. Statistics summarising typical values are referred to as measures of central tendency and include the mean, median and mode. The spread of values is represented through measures of variability, including the variance, SD and range. Together, descriptive statistics provide indicators of the distribution of data, or the frequency of values through the data set, as in a histogram plot. Table 1 summarises commonly used descriptive statistics. For consistency, I use the terms independent variable and dependent variable, but in some fields and types of research, such as correlational studies, the preferred terms may be predictor and outcome variable. An independent variable influences, affects or predicts a dependent variable.

Measures of central tendency
- Mean: the total of values divided by the number of values. Intent: describe all responses with the average value.
- Median: arrange all values in order and determine the halfway point. Intent: determine the middle value among all values, which is important when dealing with extreme outliers.
- Mode: examine all values and determine which one appears most frequently. Intent: describe the most common value.

Measures of variability
- Variance: calculate the difference of each value from the mean, square this difference score, sum all of the squared difference scores and divide by the number of values minus 1. Intent: provide an indicator of spread.
- Standard deviation: the square root of the variance. Intent: indicate spread by reporting on average how much values differ from the mean.
- Range: the difference between the maximum and minimum value. Intent: give a very general indicator of spread.

Frequencies
- Frequencies: count the number of occurrences of each value. Intent: provide a distribution of how many times each value occurs.
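As a concrete illustration of Table 1, here is a minimal sketch in Python with numpy (one tool among the many this literature mentions); the values are hypothetical.

```python
import numpy as np
from collections import Counter

values = np.array([4, 7, 7, 2, 9, 5, 7, 3])  # hypothetical scores

mean = values.mean()                       # total of values / number of values
median = np.median(values)                 # middle value of the ordered data
mode = Counter(values.tolist()).most_common(1)[0][0]  # most frequent value
variance = values.var(ddof=1)              # squared deviations summed / (n - 1)
sd = values.std(ddof=1)                    # square root of the variance
value_range = values.max() - values.min()  # maximum minus minimum
frequencies = Counter(values.tolist())     # occurrences of each value

print(mean, median, mode, variance, sd, value_range, frequencies)
```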

Inferential statistics: comparing groups with t tests and ANOVA

Inferential statistics are another broad category of techniques that go beyond describing a data set. Inferential statistics can help researchers draw conclusions from a sample to a population. 1 We can use inferential statistics to examine differences among groups and the relationships among variables. Table 2 presents a menu of common, fundamental inferential tests. Remember that even more complex statistics rely on these as a foundation.

Inferential statistics

- t tests: compare two groups to examine whether the difference between their means is statistically significant.
- Analysis of variance (ANOVA): compare two or more groups to examine whether differences among their means are statistically significant.
- Correlation: examine whether there is a relationship or association between two or more variables.
- Regression: examine how one or more variables predict another variable.

The t test is used to compare two group means by determining whether group differences are likely to have occurred randomly by chance or systematically indicating a real difference. Two common forms are the independent samples t test, which compares means of two unrelated groups, such as means for a treatment group relative to a control group, and the paired samples t test, which compares means of related groups, such as the pretest and post-test scores for the same individuals before and after a treatment. A t test is essentially determining whether the difference in means between groups is larger than the variability within the groups themselves.
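To illustrate, here is a minimal sketch of both forms of the t test in Python with scipy; the scores are hypothetical.

```python
from scipy import stats

# Independent samples: two unrelated groups (eg, treatment v control)
treatment = [78, 85, 90, 74, 88, 81]
control = [72, 75, 80, 70, 77, 74]
t_ind, p_ind = stats.ttest_ind(treatment, control)

# Paired samples: the same individuals before and after a treatment
pre = [60, 64, 59, 70, 66]
post = [68, 70, 63, 75, 71]
t_rel, p_rel = stats.ttest_rel(pre, post)

print(f"independent samples: t={t_ind:.2f}, p={p_ind:.3f}")
print(f"paired samples: t={t_rel:.2f}, p={p_rel:.3f}")
```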

Another fundamental set of inferential statistics falls under the general linear model and includes analysis of variance (ANOVA), correlation and regression. To determine whether group means are different, use the t test or the ANOVA. Note that the t test is limited to two groups, but the ANOVA is applicable to two or more groups. For example, an ANOVA could examine whether a primary outcome measure—dependent variable—is significantly different for groups assigned to one of three different interventions. The ANOVA result comes in an F statistic along with a p value or confidence interval (CI), which tells whether there is some significant difference among groups. We then need to use other statistics (eg, planned comparisons or a Bonferroni comparison, to give two possibilities) to determine which of those groups are significantly different from one another. Planned comparisons are established before conducting the analysis to contrast the groups, while other tests like the Bonferroni comparison are conducted post-hoc (ie, after analysis).
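A minimal one-way ANOVA sketch in Python with scipy, assuming three hypothetical intervention groups; remember that a significant F statistic only signals that some difference exists somewhere among the groups.

```python
from scipy import stats

# Hypothetical outcome scores for three intervention groups
group_a = [23, 25, 28, 22, 26]
group_b = [30, 33, 29, 35, 31]
group_c = [24, 27, 25, 26, 23]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F={f_stat:.2f}, p={p_value:.4f}")
# If p is significant, follow up with planned comparisons or a post-hoc
# test (eg, Bonferroni or Tukey) to learn which groups differ.
```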

Examining relationships using correlation and regression

The general linear model contains two other major methods of analysis, correlation and regression. Correlation reveals whether values of two variables tend to change together systematically. Correlation analysis has three general outcomes: (1) the two variables rise and fall together; (2) as values of one variable rise, the other falls; and (3) the two variables do not appear to be systematically related. To make those determinations, we use the correlation coefficient (r) and its p value or CI. First, use the p value or CI, compared with the established significance criterion (eg, p<0.05), to determine whether the relationship is statistically significant. If it is not, stop; there is no point in looking at the coefficient. If it is, move on to interpreting the correlation coefficient.

A correlation coefficient provides two very important pieces of information: the strength and the direction of the relationship. An r statistic can range from −1.0 to +1.0. Strength is determined by how close the value is to −1.0 or +1.0; either extreme indicates a perfect relationship, while a value of 0 indicates no relationship. Cohen provides guidance for interpretation: 0.1 is a weak correlation, 0.3 is a medium correlation and 0.5 is a large correlation. 1 2 These interpretations must be considered in the context of the study and relative to the literature. The valence (+ or −) of the coefficient reveals the direction of the relationship. A negative correlation means that as one value rises, the other tends to fall; a positive coefficient means that the values of the two variables tend to rise and fall together.
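A minimal correlation sketch in Python with scipy; the data are hypothetical and chosen to show a negative relationship.

```python
from scipy import stats

hours_exercised = [1, 3, 4, 6, 8, 9]
resting_heart_rate = [78, 74, 72, 68, 64, 61]

r, p = stats.pearsonr(hours_exercised, resting_heart_rate)
print(f"r={r:.2f}, p={p:.4f}")
# Check p (or the CI) first; only if significant, interpret the sign of r
# (direction) and its distance from 0 (strength).
```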

Regression adds an additional layer beyond correlation by allowing us to predict one value from another. Assume we are trying to predict a dependent variable (Y) from an independent variable (X). Simple linear regression gives an equation (Y = b0 + b1X) for a line that we can use to make that prediction. The three major components of the prediction are the constant (ie, the intercept, represented by b0), the systematic explanation of variation (b1), and the error, which is a residual value not accounted for in the equation 3 but available as part of our regression output. To assess a regression model (ie, model fit), examine key pieces of the regression output: (1) the F statistic and its significance to determine whether the model systematically accounts for variance in the dependent variable; (2) the r square value for a measure of how much variance in the dependent variable is accounted for by the model; (3) the significance of coefficients for each independent variable in the model; and (4) residuals to examine random error in the model. Other factors, such as outliers, are potentially important (see Field 4).
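A simple linear regression sketch in Python with scipy's linregress, using hypothetical data; it recovers the intercept (b0), slope (b1), R square and residuals described above.

```python
from scipy import stats

x = [2, 4, 5, 7, 9, 11]       # hypothetical independent variable (X)
y = [11, 17, 20, 26, 33, 38]  # hypothetical dependent variable (Y)

result = stats.linregress(x, y)
print(f"intercept b0 = {result.intercept:.2f}")  # the constant
print(f"slope b1 = {result.slope:.2f}")          # systematic variation
print(f"R square = {result.rvalue ** 2:.3f}")    # variance accounted for
print(f"p = {result.pvalue:.4f}")                # significance of the slope

# Residuals: the error left over after the line's predictions
residuals = [yi - (result.intercept + result.slope * xi)
             for xi, yi in zip(x, y)]
print(residuals)
```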

The aforementioned inferential tests are foundational to many other advanced statistics that are beyond the scope of this article. Inferential tests rely on foundational assumptions, including that data are normally distributed, observations are independent, and generally that our dependent or outcome variable is continuous. When data do not meet these assumptions, we turn to non-parametric statistics (see Field 4 ).

A brief history of foundational statistics

Prominent statisticians Karl Pearson and Ronald A Fisher developed and popularised many of the basic statistics that remain a foundation for statistics today. Fisher’s ideas formed the basis of null hypothesis significance testing that sets a criterion for confidence or probability of an event. 4 Among his contributions, Fisher also developed the ANOVA. Pearson’s correlation coefficient provides a way to examine whether two variables are related. The correlation coefficient is denoted by r for a relationship between two variables or R for relationships among more than two variables as in multiple correlation or regression. 4 William Gosset developed the t distribution and later the t test as a way to examine whether two values of means were statistically different. 5

Statistical software

While the aforementioned statistics can be calculated manually, researchers typically use statistical software that processes data, calculates statistics and p values, and supplies a summary output from the analysis. However, these programs still require an informed researcher to run the correct analysis and interpret the output. Several available programs include SAS, Stata, SPSS and R. Try using a program through a demonstration or trial period before deciding which one to use. It also helps to know or have access to other users of the program should you have questions.

Example study

The remainder of this article presents steps in statistical analysis that apply to many techniques. A recently published study on communication skills to break bad news to a patient with cancer provides an exemplar to illustrate these steps. 6 In that study, the team examined the validity of a competence assessment of communication skills, hypothesising that after receiving training, post-test scores would be statistically improved from pretest scores on the same measure. Another analysis was to examine pretest sensitisation, tested through a hypothesis that a group randomly assigned to receive a pretest and post-test would not be significantly different from a post-test-only group. To test the hypotheses, Guetterman et al 6 examined whether mean differences were statistically significant by applying t tests and ANOVA.

Steps in statistical analysis

Statistical analysis might be considered in 10 related steps. These steps assume necessary background activities, such as conducting a literature review and writing clear research questions or aims, are already complete.

Step 1. Formulate a hypothesis to test

In statistical analysis, we test hypotheses. Therefore, it is necessary to formulate hypotheses that are testable. A hypothesis is specific, detailed and congruent with statistical procedures. A null hypothesis gives a prediction and typically uses words like ‘no difference’ or ‘no association’. 7 For example, we may hypothesise that group means on a certain measure are not significantly different and test that with an ANOVA or t test. In the exemplar study, for instance, one of the hypotheses was ‘MPathic-VR scores will improve (decreased score reflects better performance) from the preseminar test to the postseminar test based on exposure to the [breaking bad news] BBN intervention’ (p508), which was tested with a t test. 6 Hypotheses about relationships among variables could be tested with correlation and regression. Ultimately, hypotheses are driven by the purpose or aims of a study and further subdivide the purpose or aims into aspects that are specific and testable. When forming hypotheses, a concern is that having too many dependent variables leads to multiple tests of the same data set. This concern, called multiple comparisons or multiplicity, can inflate the likelihood of finding a significant relationship when none exists. Conducting fewer tests and adjusting the p value are ways to mitigate the concern.
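To illustrate the adjustment for multiplicity, here is a minimal sketch assuming Python's statsmodels library (a tool not named in this article); the p values are hypothetical.

```python
from statsmodels.stats.multitest import multipletests

raw_p = [0.012, 0.030, 0.049, 0.200]  # hypothetical p values from four tests
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05,
                                         method="bonferroni")
print(adjusted_p)  # each p multiplied by the number of tests (capped at 1)
print(reject)      # which results remain significant after adjustment
```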

Step 2. Select a test to run based on research questions or hypotheses

The statistical test must match the intended hypothesis and research question. Descriptive statistics allow us to examine typical values, the spread of values and the distribution of data. ANOVAs and t tests are methods to test whether means are statistically different among groups and what those differences are. In the exemplar study, the authors used paired samples t tests for pre–post scores from the same individuals and independent samples t tests for differences among groups. 6

Correlation is a method to examine whether two or more variables are related to one another, and regression extends that idea by allowing us to fit a line to make predictions about one variable based on a linear relationship to another. These statistical tests alone do not determine cause and effect, but merely associations. Causal inferences can only be made with certain research designs (eg, experiments) and perhaps with advanced statistical techniques (eg, propensity score analysis). Table 3 provides guidance for determining which statistical test to use.

Choosing and interpreting statistics for studies common in primary care

- Examine trends or distributions: use descriptive statistics. Independent and dependent variables may be categorical or continuous. Interpretation: report the statistic as is to describe the data set.
- Compare two group means: use a t test. Independent variable: categorical with two levels (ie, two groups); dependent variable: continuous. Interpretation: examine the t statistic and significance level; if significant, clearly report which group mean is higher, along with the effect size.
- Compare two or more group means: use analysis of variance. Independent variable: categorical with two or more levels (ie, two or more groups); dependent variable: continuous. Interpretation: examine the F statistic and significance level; if significant, clearly report which group means are significantly different and how (eg, which are higher), along with the effect size.
- Examine whether variables are associated: use correlation. Independent and dependent variables: continuous. Interpretation: examine the r statistic and significance level; if significant, describe whether the correlation is positive or negative and its strength.
- Gain a detailed understanding of the association of variables and use one or more variables to predict another: use regression. Independent variable(s): continuous or categorical, possibly more than one in multiple regression; dependent variable: continuous. Interpretation: examine the F statistic and significance level; if significant, examine the R square for how much variance the model accounts for, then determine whether each regression coefficient is significant and, if so, discuss the coefficients.

Step 3. Conduct a power analysis to determine a sample size

Before conducting the analysis, we need to ensure that the sample size is adequate to detect an effect. Sample size relates to the concept of statistical power: the smaller the effect to be detected, the larger the sample required. Sample size is determined through a power analysis. The determination of sample size is never a simple percentage of the population, but a calculated number based on the planned statistical tests, significance level and effect size. 8 I recommend using G*Power for basic power calculations, although many other options are available. In the exemplar study, the authors did not report a power analysis conducted prior to the study, but they gave a post-hoc power analysis of the actual power based on their sample size and the effect size detected. 6
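The author recommends G*Power; purely as an illustration, the same kind of calculation can be sketched in Python with statsmodels (an assumption of this example, not a tool named in the article).

```python
from statsmodels.stats.power import TTestIndPower

# Sample size needed for an independent samples t test
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # expected medium effect (Cohen's d)
    alpha=0.05,       # significance criterion
    power=0.80,       # desired probability of detecting the effect
)
print(f"required participants per group: {n_per_group:.0f}")  # about 64
```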

Step 4. Prepare data for analysis

Data often need cleaning and other preparation before analysis. Problems requiring cleaning include values outside of an acceptable range and missing values. Any particular value could be wrong because of a data entry error or a data collection problem. Visually inspecting data can reveal anomalies. For example, an age value of 200 is clearly an error, as is a value of 9 on a 1–5 Likert-type scale. An easy way to start inspecting data is to sort each variable by ascending and then descending values to look for atypical entries, and then try to correct any problem by determining what the value should be. Missing values are a more complicated problem because the concern is why the value is missing. A few values missing at random are not necessarily a concern, but a pattern of missing values (eg, individuals from a specific ethnic group tending to skip a certain question) indicates systematic missingness that could signal a problem with the data collection instrument. Descriptive statistics are an additional way to check for errors and ensure data are ready for analysis. While not discussed in the communication assessment exemplar, the authors did prepare data for analysis and report missing values in their descriptive statistics.
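A minimal data-cleaning sketch in Python with pandas, mirroring the checks described above; the data and the acceptable ranges are hypothetical.

```python
import pandas as pd

# Hypothetical raw data with typical problems
df = pd.DataFrame({
    "age": [34, 200, 28, None, 45],  # 200 is a data entry error
    "likert": [4, 3, 9, 2, 5],       # 9 is outside the 1-5 scale
})

# Sort each variable to spot atypical values at either extreme
print(df.sort_values("age"))

# Flag out-of-range values and set them to missing for follow-up
df.loc[~df["age"].between(0, 120), "age"] = None
df.loc[~df["likert"].between(1, 5), "likert"] = None

# Count missing values per variable and look for patterns of missingness
print(df.isna().sum())
```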

Step 5. Always start with descriptive statistics

Before running inferential statistics, it is critical to first describe the data. Obtaining descriptive statistics is a way to check whether data are ready for further analysis. Descriptive statistics give a general sense of trends and can illuminate errors by reviewing frequencies, minimums and maximums that can indicate values outside of the accepted range. Descriptive statistics are also an important step to check whether we meet assumptions for statistical tests. In a quantitative study, descriptive statistics also inform the first table of the results that reports information about the sample, as seen in table 2 of the exemplar study. 6

Step 6. Check assumptions of statistical tests

All statistical tests rely on foundational assumptions. Although some tests are more robust to violations than others, checking assumptions indicates whether a test is likely to be valid for a particular data set. Foundational parametric statistics (eg, t tests, ANOVA, correlation, regression) assume independent observations, normally distributed data and, for correlation and regression, linear relationships. In the exemplar study, the authors noted ‘Data from both groups met normality assumptions, based on the Shapiro–Wilk test’ (p508), and gave the statistics in addition to noting specific assumptions for the independent t tests around equality of variances. 6
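A sketch of those assumption checks in Python with scipy, on hypothetical data: the Shapiro–Wilk test for normality (as in the exemplar study) and Levene's test for equality of variances.

```python
from scipy import stats

group_a = [23, 25, 28, 22, 26, 27, 24]
group_b = [30, 33, 29, 35, 31, 28, 32]

# Shapiro-Wilk: p > 0.05 suggests no detectable departure from normality
print(stats.shapiro(group_a))
print(stats.shapiro(group_b))

# Levene's test: p > 0.05 suggests the group variances are comparable
print(stats.levene(group_a, group_b))
```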

Step 7. Run the analysis

Conducting the analysis involves running whatever tests were planned. Statistics may be calculated manually or using software like SPSS, Stata, SAS or R. Statistical software provides an output with key test statistics, p values that indicate whether a result is likely systematic or random, and indicators of fit. In the exemplar study, the authors noted they used SPSS V.22. 6

Step 8. Examine how well the statistical model fits

Begin by examining whether the overall statistical model was significant or a good fit. For t tests, ANOVAs, correlation and regression, first examine an overall test of significance. For a t test, if the t statistic is not statistically significant (eg, p>0.05 or a CI crossing 0), we can conclude there is no significant difference between groups. The communication assessment exemplar reports the significance of the t tests along with related checks such as equality of variances.

For an ANOVA, if the F statistic is not statistically significant (eg, p>0.05 or a CI crossing 0), we can conclude no significant difference between groups and stop because there is no point in further examining what groups may be different. If the F statistic is significant in an ANOVA, we can then use contrasts or post-hoc tests to examine what is different. For a correlation test, if the r value is not statistically significant (eg, p>0.05 or a CI crossing 0), we can stop because there is no point in looking at the magnitude or direction of the coefficient. If it is significant, we can proceed to interpret the r. Finally, for a regression, we can examine the F statistic as an omnibus test and its significance. If it is not significant, we can stop. If it is significant, then examine the p value of each independent variable and residuals.
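The stop-or-continue logic of this step can be sketched in Python with scipy and statsmodels; the data are hypothetical, and Tukey's HSD stands in for whichever post-hoc test is appropriate.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

group_a = [23, 25, 28, 22, 26]
group_b = [30, 33, 29, 35, 31]
group_c = [24, 27, 25, 26, 23]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
if p_value >= 0.05:
    print("Omnibus F not significant: stop; no group differences to probe.")
else:
    # Post-hoc comparisons: which pairs of groups actually differ?
    scores = np.concatenate([group_a, group_b, group_c])
    labels = ["a"] * 5 + ["b"] * 5 + ["c"] * 5
    print(pairwise_tukeyhsd(scores, labels, alpha=0.05))
```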

Step 9. Report the results of statistical analysis

When writing statistical results, always start with descriptive statistics and note whether assumptions for tests were met. When reporting inferential statistical tests, give the statistic itself (eg, an F statistic), the measure of significance (p value or CI), the effect size and a brief written interpretation of the statistical test. The interpretation, for example, could note that an intervention was not significantly different from the control or that it was associated with improvement that was statistically significant. For instance, the exemplar study gives the pre–post means along with standard errors, t statistic, p value and an interpretation that postseminar means were lower, along with a reminder to the reader that lower is better. 6

When writing for a journal, follow the journal’s style. Many styles italicise non-Greek statistics (eg, the p value), but follow the particular instructions given. Remember that a p value can never be exactly 0, even though some statistical programs round the p to 0. In that case, most styles prefer to report it as p<0.001.
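A tiny hypothetical helper illustrating that reporting convention:

```python
def format_p(p: float) -> str:
    """Format a p value for reporting; never report p as exactly 0."""
    if p < 0.001:
        return "p<0.001"
    return f"p={p:.3f}"

print(format_p(0.00004))  # p<0.001
print(format_p(0.0312))   # p=0.031
```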

Step 10. Evaluate threats to statistical conclusion validity

Shadish et al 9 provide nine threats to statistical conclusion validity in drawing inferences about the relationship between two variables; the threats can broadly apply to many statistical analyses. Although it helps to consider and anticipate these threats when designing a research study, some only arise after data collection and analysis. Threats to statistical conclusion validity appear in table 4. 9 Pertinent threats can be dealt with to the extent possible (eg, if assumptions were not met, select another test) and should be discussed as limitations in the research report. For example, in the exemplar study, the authors noted the sample size as a limitation but reported that a post-hoc power analysis found adequate power. 6

Threats to statistical conclusion validity

- Low statistical power (see step 3): the sample size is not adequate to detect an effect.
- Violated assumptions of statistical tests (see step 6): the data violate assumptions needed for the test, such as normality.
- Fishing and error rates: repeated tests of the same data (eg, multiple comparisons) increase the chances of erroneous conclusions.
- Unreliability of measures: error in measurement or instruments can artificially inflate or deflate apparent relationships among variables.
- Restricted range: statistics can be biased by limited outcome values (eg, high/low only) or by floor or ceiling effects in which participants’ scores cluster around high or low values.
- Unreliability of treatment implementation: in experiments, unstandardised or inconsistent implementation affects conclusions about correlation.
- Extraneous variance in an experiment: the setting of a study can introduce error.
- Heterogeneity of units: as participants differ within conditions, the standard deviation increases and introduces error, making it harder to detect effects.
- Inaccurate effect size estimation: outliers or incorrect effect size calculations (eg, a continuous measure for a dichotomous dependent variable) can skew measures of effect.

Key resources to learn more about statistics include Field 4 and Salkind 10 for foundational information. For advanced statistics, Hair et al 11 and Tabachnick and Fidell 12 provide detailed information on multivariate statistics. Finally, the University of California Los Angeles Institute for Digital Research and Education (stats.idre.ucla.edu/other/annotatedoutput/) provides annotated output from SPSS, SAS, Stata and MPlus for many statistical tests to help researchers read the output and understand what it means.

Researchers in family medicine and community health often conduct statistical analyses to address research questions. Following specific steps ensures a systematic and rigorous analysis. Knowledge of these essential statistical procedures will equip family medicine and community health researchers to interpret the literature, review the literature and conduct appropriate statistical analyses of their quantitative data.

Nevertheless, I gently remind you that the steps are interrelated, and statistics is not only a consideration at the end of data collection. When designing a quantitative study, investigators should remember that statistics is based on distributions, meaning statistics works with aggregated numerical data and relies on variance within that data to test statistical hypotheses about group differences, relationships or trends. Statistics provides a broad view, based on these distributions, which brings implications at the early design phase. In designing a quantitative study, the nature of statistics generally suggests a larger number of participants in the research (ie, a larger n) to have adequate power to detect statistical significance and draw valid conclusions. Therefore, it will likely be helpful for researchers to include a biostatistician as early as possible in the research team when designing a study.

Contributors: The sole author, TCG, is responsible for the conceptualisation, writing and preparation of this manuscript.

Funding: This study was funded by the National Institutes of Health (10.13039/100000002) and grant number 1K01LM012739.

Competing interests: None declared.

Patient consent for publication: Not required.

Provenance and peer review: Not commissioned; internally peer reviewed.
