
Statistical Treatment of Data – Explained & Example


  • By DiscoverPhDs
  • September 8, 2020

Statistical Treatment of Data in Research

‘Statistical treatment’ means applying a statistical method to a data set in order to draw meaning from it. Statistical treatment can involve either descriptive statistics, which summarise the data and describe relationships between variables, or inferential statistics, which test hypotheses by making inferences from the collected data to a wider population.

Introduction to Statistical Treatment in Research

Every research student, regardless of whether they are a biologist, computer scientist or psychologist, must have a basic understanding of statistical treatment if their study is to be reliable.

This is because designing experiments and collecting data are only a small part of conducting research. The other components, which are often not so well understood by new researchers, are the analysis, interpretation and presentation of the data. These are just as important, if not more so, because this is where meaning is extracted from the study.

What is Statistical Treatment of Data?

Statistical treatment of data is when you apply some form of statistical method to a data set to transform it from a group of meaningless numbers into meaningful output.

Statistical treatment of data involves the use of statistical methods such as:

  • regression,
  • conditional probability,
  • standard deviation and
  • distribution range.

These statistical methods allow us to investigate the statistical relationships between variables in the data and to identify possible errors in the study.

In addition to being able to identify trends, statistical treatment also allows us to organise and process our data in the first place. This is because when carrying out statistical analysis of our data, it is generally more useful to draw several conclusions for each subgroup within our population than to draw a single, more general conclusion for the whole population. However, to do this, we need to be able to classify the population into different subgroups so that we can later break down our data in the same way before analysing it.

Statistical Treatment Example – Quantitative Research


For a statistical treatment of data example, consider a medical study that is investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the data into different subgroups based on these parameters to determine how each one affects the effectiveness of the drug. Categorising the data in this way is an example of performing basic statistical treatment.
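As a purely illustrative sketch (the column names and values below are hypothetical, not taken from any real study), this kind of subgrouping takes only a few lines of pandas before any formal analysis begins:

```python
import pandas as pd

# Hypothetical participant records from a drug study: one row per person.
trial = pd.DataFrame({
    "gender":   ["F", "M", "F", "M", "F", "M", "F", "M"],
    "age_band": ["18-40", "18-40", "41-65", "41-65", "18-40", "41-65", "65+", "65+"],
    "response": [0.62, 0.55, 0.48, 0.51, 0.58, 0.45, 0.33, 0.40],
})

# Basic statistical treatment: summarise the outcome within each subgroup
# rather than reporting one figure for the whole sample. Grouping by
# "gender" (or any other column) works in exactly the same way.
summary = trial.groupby("age_band")["response"].agg(["count", "mean", "std"])
print(summary)
```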

Types of Errors

A fundamental part of statistical treatment is using statistical methods to identify possible outliers and errors. No matter how careful we are, all experiments are subject to inaccuracies resulting from two types of errors: systematic errors and random errors.

Systematic errors are errors associated with either the equipment used to collect the data or the way in which that equipment is used. Random errors are errors that occur unknowingly or unpredictably in the experimental set-up, such as internal deformations within specimens or small voltage fluctuations in measurement instruments.

These experimental errors, in turn, can lead to two types of conclusion errors: type I errors and type II errors. A type I error is a false positive, which occurs when a researcher rejects a true null hypothesis. A type II error is a false negative, which occurs when a researcher fails to reject a false null hypothesis.
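To make the two conclusion errors concrete, here is a small simulation (our own illustration, not part of the original article) that estimates both error rates for a two-sample t-test using numpy and scipy:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 2000

false_pos = 0  # null hypothesis true, but the test rejects it  -> type I error
false_neg = 0  # a real effect exists, but the test misses it   -> type II error
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b_null = rng.normal(0.0, 1.0, n)   # no true difference between groups
    b_eff = rng.normal(0.5, 1.0, n)    # true difference of 0.5 standard deviations
    if ttest_ind(a, b_null).pvalue < alpha:
        false_pos += 1
    if ttest_ind(a, b_eff).pvalue >= alpha:
        false_neg += 1

print(f"estimated type I error rate:  {false_pos / trials:.3f}")  # close to alpha
print(f"estimated type II error rate: {false_neg / trials:.3f}")  # 1 - power
```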


Research Paper Statistical Treatment of Data: A Primer

We can all agree that analyzing and presenting data effectively in a research paper is critical, yet often challenging.

This primer on statistical treatment of data will equip you with the key concepts and procedures to accurately analyze and clearly convey research findings.

You'll discover the fundamentals of statistical analysis and data management, the common quantitative and qualitative techniques, how to visually represent data, and best practices for writing the results - all framed specifically for research papers.

If you are curious about how AI can help you with statistical analysis for research, check out Hepta AI.

Introduction to Statistical Treatment in Research

Statistical analysis is a crucial component of both quantitative and qualitative research. Properly treating data enables researchers to draw valid conclusions from their studies. This primer provides an introductory guide to fundamental statistical concepts and methods for manuscripts.

Understanding the Importance of Statistical Treatment

Careful statistical treatment demonstrates the reliability of results and ensures findings are grounded in robust quantitative evidence. From determining appropriate sample sizes to selecting accurate analytical tests, statistical rigor adds credibility. Both quantitative and qualitative papers benefit from precise data handling.

Objectives of the Primer

This primer aims to equip researchers with best practices for:

Statistical tools to apply during different research phases

Techniques to manage, analyze, and present data

Methods to demonstrate the validity and reliability of measurements

By covering fundamental concepts ranging from descriptive statistics to measurement validity, it enables both novice and experienced researchers to incorporate proper statistical treatment.

Navigating the Primer: Key Topics and Audience

The primer spans introductory topics including:

Research planning and design

Data collection, management, analysis

Result presentation and interpretation

While useful for researchers at any career stage, earlier-career scientists with limited statistical exposure will find it particularly valuable as they prepare manuscripts.

How do you write a statistical method in a research paper?

Statistical methods are a critical component of research papers, allowing you to analyze, interpret, and draw conclusions from your study data. When writing the statistical methods section, you need to provide enough detail so readers can evaluate the appropriateness of the methods you used.

Here are some key things to include when describing statistical methods in a research paper:

Type of Statistical Tests Used

Specify the types of statistical tests performed on the data, including:

Parametric vs nonparametric tests

Descriptive statistics (means, standard deviations)

Inferential statistics (t-tests, ANOVA, regression, etc.)

Statistical significance level (often p < 0.05)

For example: We used t-tests and one-way ANOVA to compare means across groups, with statistical significance set at p < 0.05.
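As a rough sketch of what such an analysis could look like in code (the group scores below are invented for illustration), scipy provides both tests:

```python
from scipy.stats import ttest_ind, f_oneway

control   = [71, 68, 74, 70, 72, 69]
treated_a = [78, 81, 76, 79, 80, 77]
treated_b = [74, 75, 73, 77, 76, 72]

# Two-group comparison with an independent-samples t-test.
t_stat, t_p = ttest_ind(control, treated_a)
print(f"t-test: t = {t_stat:.2f}, p = {t_p:.4f}")

# Three-group comparison with one-way ANOVA.
f_stat, f_p = f_oneway(control, treated_a, treated_b)
print(f"ANOVA:  F = {f_stat:.2f}, p = {f_p:.4f}")
print("significant at p < 0.05" if f_p < 0.05 else "not significant at p < 0.05")
```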

Analysis of Subgroups

If you examined subgroups or additional variables, describe the methods used for these analyses.

For example: We stratified data by gender and used chi-square tests to analyze differences between subgroups.
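A minimal sketch of that kind of stratified analysis, assuming hypothetical counts of improved versus not-improved outcomes by gender:

```python
from scipy.stats import chi2_contingency

# Rows: female, male; columns: improved, not improved (hypothetical counts).
counts = [[45, 15],
          [30, 30]]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```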

Software and Versions

List any statistical software packages used for analysis, including version numbers. Common programs include SPSS, SAS, R, and Stata.

For example: Data were analyzed using SPSS version 25 (IBM Corp, Armonk, NY).

The key is to give readers enough detail to assess the rigor and appropriateness of your statistical methods. The methods should align with your research aims and design. Keep explanations clear and concise using consistent terminology throughout the paper.

What are the 5 statistical treatments in research?

The five most common statistical treatments used in academic research papers include:

Mean

The mean, or average, is used to describe the central tendency of a dataset. It provides a single value that represents the middle of a distribution of numbers. Calculating means allows researchers to characterize typical observations within a sample.

Standard Deviation

Standard deviation measures the amount of variability in a dataset. A low standard deviation indicates observations are clustered closely around the mean, while a high standard deviation signifies the data is more spread out. Reporting standard deviations helps readers contextualize means.

Regression Analysis

Regression analysis models the relationship between independent and dependent variables. It generates an equation that predicts changes in the dependent variable based on changes in the independent variables. Regressions are useful for hypothesizing causal connections between variables.
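For illustration only, a simple linear regression on made-up data can be fitted with scipy's linregress:

```python
from scipy.stats import linregress

# Hypothetical data: hours studied (independent) vs exam score (dependent).
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score = [52, 55, 61, 64, 70, 73, 78, 84]

result = linregress(hours_studied, exam_score)
print(f"score = {result.intercept:.1f} + {result.slope:.1f} * hours")
print(f"R^2 = {result.rvalue ** 2:.3f}, p = {result.pvalue:.4g}")
```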

Hypothesis Testing

Hypothesis testing evaluates assumptions about population parameters based on statistics calculated from a sample. Common hypothesis tests include t-tests, ANOVA, and chi-squared. These quantify the likelihood of observed differences being due to chance.

Sample Size Determination

Sample size calculations identify the minimum number of observations needed to detect effects of a given size at a desired statistical power. Appropriate sampling ensures studies can uncover true relationships within the constraints of resource limitations.
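As a hedged example, assuming the statsmodels package is available and that the expected standardized effect size is Cohen's d = 0.5, a basic sample-size calculation for a two-group comparison looks like this:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # assumed effect (Cohen's d)
                                   alpha=0.05,       # significance level
                                   power=0.80)       # desired statistical power
print(f"required participants per group: about {n_per_group:.0f}")
```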

These five statistical analysis methods form the backbone of most quantitative research processes. Correct application allows researchers to characterize data trends, model predictive relationships, and make probabilistic inferences regarding broader populations. Expertise in these techniques is fundamental for producing valid, reliable, and publishable academic studies.

How do you know what statistical treatment to use in research?

The selection of appropriate statistical methods for the treatment of data in a research paper depends on three key factors:

The Aim and Objective of the Study

The aim and objectives that the study seeks to achieve will determine the type of statistical analysis required.

Descriptive research presenting characteristics of the data may only require descriptive statistics like measures of central tendency (mean, median, mode) and dispersion (range, standard deviation).

Studies aiming to establish relationships or differences between variables need inferential statistics like correlation, t-tests, ANOVA, regression etc.

Predictive modeling research requires methods like regression, discriminant analysis, logistic regression etc.

Thus, clearly identifying the research purpose and objectives is the first step in planning appropriate statistical treatment.

Type and Distribution of Data

The type of data (categorical, numerical) and its distribution (normal, skewed) also guide the choice of statistical techniques.

Parametric tests have assumptions related to normality and homogeneity of variance.

Non-parametric methods are distribution-free and better suited for non-normal or categorical data.

Testing data distribution and characteristics is therefore vital.
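One common way to test this in practice, though by no means the only one, is a Shapiro-Wilk normality check; the skewed data below are simulated purely for illustration:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=0.8, size=40)  # right-skewed, non-normal data

stat, p = shapiro(sample)
if p < 0.05:
    print(f"Shapiro-Wilk p = {p:.4f}: normality rejected, consider a non-parametric test")
else:
    print(f"Shapiro-Wilk p = {p:.4f}: no evidence against normality, a parametric test may be fine")
```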

Nature of Observations

Statistical methods also differ based on whether the observations are paired or unpaired.

Analyzing changes within one group requires paired tests like paired t-test, Wilcoxon signed-rank test etc.

Comparing between two or more independent groups needs unpaired tests like independent t-test, ANOVA, Kruskal-Wallis test etc.

Thus the nature of observations is pivotal in selecting suitable statistical analyses.
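The sketch below, using invented measurements, contrasts the paired and unpaired versions of the t-test in scipy:

```python
from scipy.stats import ttest_rel, ttest_ind

# Same patients measured before and after treatment -> paired test.
before = [120, 132, 128, 141, 125, 136]
after = [115, 126, 124, 133, 121, 130]
t_paired, p_paired = ttest_rel(before, after)
print(f"paired t-test:   t = {t_paired:.2f}, p = {p_paired:.4f}")

# Two independent groups of different patients -> unpaired test.
group_a = [120, 132, 128, 141, 125, 136]
group_b = [114, 122, 119, 130, 117, 125]
t_unpaired, p_unpaired = ttest_ind(group_a, group_b)
print(f"unpaired t-test: t = {t_unpaired:.2f}, p = {p_unpaired:.4f}")
```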

In summary, clearly defining the research objectives, testing the collected data, and understanding the observational units guides proper statistical treatment and interpretation.

What are statistical techniques in a research paper?

Statistical methods are essential tools in scientific research papers. They allow researchers to summarize, analyze, interpret and present data in meaningful ways.

Some key statistical techniques used in research papers include:

Descriptive statistics: These provide simple summaries of the sample and the measures. Common examples include measures of central tendency (mean, median, mode), measures of variability (range, standard deviation) and graphs (histograms, pie charts).

Inferential statistics: These help make inferences and predictions about a population from a sample. Common techniques include estimation of parameters, hypothesis testing, correlation and regression analysis.

Analysis of variance (ANOVA): This technique allows researchers to compare means across multiple groups and determine statistical significance.

Factor analysis: This technique identifies underlying relationships between observed variables and latent constructs. It allows a large set of variables to be reduced to a smaller number of factors.

Structural equation modeling: This technique estimates causal relationships using both latent and observed factors. It is widely used for testing theoretical models in social sciences.

Proper statistical treatment and presentation of data are crucial for the integrity of any quantitative research paper. Statistical techniques help establish validity, account for errors, test hypotheses, build models and derive meaningful insights from the research.

Fundamental Concepts and Data Management

Exploring basic statistical terms.

Understanding key statistical concepts is essential for effective research design and data analysis. This includes defining key terms like:

Statistics: The science of collecting, organizing, analyzing, and interpreting numerical data to draw conclusions or make predictions.

Variables: Characteristics or attributes of the study participants that can take on different values.

Measurement: The process of assigning numbers to variables based on a set of rules.

Sampling: Selecting a subset of a larger population to estimate characteristics of the whole population.

Data types: Quantitative (numerical) or qualitative (categorical) data.

Descriptive vs. inferential statistics: Descriptive statistics summarize data, while inferential statistics allow conclusions to be drawn from the sample about the larger population.

Ensuring Validity and Reliability in Measurement

When selecting measurement instruments, it is critical they demonstrate:

Validity: The extent to which the instrument measures what it intends to measure.

Reliability: The consistency of measurement over time and across raters.

Researchers should choose instruments aligned to their research questions and study methodology.

Data Management Essentials

Proper data management requires:

Ethical collection procedures respecting autonomy, justice, beneficence and non-maleficence.

Handling missing data through deletion, imputation or modeling procedures.

Data cleaning by identifying and fixing errors, inconsistencies and duplicates.

Data screening via visual inspection and statistical methods to detect anomalies.
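For instance, a minimal pandas sketch of the cleaning and screening steps above might look like this (column names, values, and thresholds are all hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 2, 2, 3, 4, 5],
    "age":   [34, 29, 29, np.nan, 41, 250],   # one missing and one implausible value
    "score": [7.2, 6.8, 6.8, 8.1, 7.5, 7.9],
})

df = df.drop_duplicates()                          # data cleaning: remove duplicate records
df["age"] = df["age"].fillna(df["age"].median())   # missing data: simple median imputation
df = df[df["age"].between(18, 100)]                # data screening: drop implausible ages

print(df)
```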

Data Management Techniques and Ethical Considerations

Ethical data management includes:

Obtaining informed consent from all participants.

Anonymization and encryption to protect privacy.

Secure data storage and transfer procedures.

Responsible use of statistical tools free from manipulation or misrepresentation.

Adhering to ethical guidelines preserves public trust in the integrity of research.

Statistical Methods and Procedures

This section provides an introduction to key quantitative analysis techniques and guidance on when to apply them to different types of research questions and data.

Descriptive Statistics and Data Summarization

Descriptive statistics summarize and organize data characteristics such as central tendency, variability, and distributions. Common descriptive statistical methods include:

Measures of central tendency (mean, median, mode)

Measures of variability (range, interquartile range, standard deviation)

Graphical representations (histograms, box plots, scatter plots)

Frequency distributions and percentages

These methods help describe and summarize the sample data so researchers can spot patterns and trends.
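As a small illustration, pandas produces most of these summaries in a couple of calls (the reaction-time values below, in milliseconds, are made up):

```python
import pandas as pd

rt = pd.Series([512, 498, 530, 472, 615, 505, 489, 540, 478, 520])

print(rt.describe())                               # count, mean, std, min, quartiles, max
print("median:", rt.median())
print("IQR:   ", rt.quantile(0.75) - rt.quantile(0.25))
```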

Inferential Statistics for Generalizing Findings

While descriptive statistics summarize sample data, inferential statistics help generalize findings to the larger population. Common techniques include:

Hypothesis testing with t-tests, ANOVA

Correlation and regression analysis

Nonparametric tests

These methods allow researchers to draw conclusions and make predictions about the broader population based on the sample data.

Selecting the Right Statistical Tools

Choosing the appropriate analyses involves assessing:

The research design and questions asked

Type of data (categorical, continuous)

Data distributions

Statistical assumptions required

Matching the correct statistical tests to these elements helps ensure accurate results.

Statistical Treatment of Data for Quantitative Research

For quantitative research, common statistical data treatments include:

Testing data reliability and validity

Checking assumptions of statistical tests

Transforming non-normal data

Identifying and handling outliers

Applying appropriate analyses for the research questions and data type

Examples and case studies help demonstrate correct application of statistical tests.
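The snippet below is an illustrative sketch, not a prescription, of three of these treatments: checking normality, log-transforming skewed data, and flagging outliers with a simple |z| > 3 rule:

```python
import numpy as np
from scipy.stats import shapiro, zscore

rng = np.random.default_rng(2)
income = rng.lognormal(mean=10, sigma=0.6, size=200)   # simulated right-skewed variable

print("raw data, Shapiro-Wilk p:", round(shapiro(income).pvalue, 4))

log_income = np.log(income)                            # transform toward normality
print("log data, Shapiro-Wilk p:", round(shapiro(log_income).pvalue, 4))

outliers = np.abs(zscore(log_income)) > 3              # flag extreme standardized values
print("flagged outliers:", int(outliers.sum()))
```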

Approaches to Qualitative Data Analysis

Qualitative data is analyzed through methods like:

Thematic analysis

Content analysis

Discourse analysis

Grounded theory

These help researchers discover concepts and patterns within non-numerical data to derive rich insights.

Data Presentation and Research Method

Crafting effective visuals for data presentation.

When presenting analyzed results and statistics in a research paper, well-designed tables, graphs, and charts are key for clearly showcasing patterns in the data to readers. Adhering to formatting standards like APA helps ensure professional data presentation. Consider these best practices:

Choose the appropriate visual type based on the type of data and the relationship being depicted. For example, use bar charts to compare categorical data and line graphs to show trends over time.

Label the x-axis, y-axis, and legends clearly. Include informative captions.

Use consistent, readable fonts and sizing. Avoid clutter with unnecessary elements. White space can aid readability.

Order data logically, such as from largest to smallest values or chronologically.

Include clear statistical notations, like error bars, where applicable.

Following academic standards for visuals lends credibility while making interpretation intuitive for readers.
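As one possible implementation of these practices, the matplotlib sketch below draws a labelled bar chart with error bars for two hypothetical groups:

```python
import matplotlib.pyplot as plt

groups = ["Control", "Intervention"]
means = [71, 78]     # hypothetical group means
sds = [4.1, 3.2]     # hypothetical standard deviations

fig, ax = plt.subplots()
ax.bar(groups, means, yerr=sds, capsize=6, color=["#9ecae1", "#3182bd"])
ax.set_xlabel("Group")
ax.set_ylabel("Mean score")
ax.set_title("Mean scores by group (error bars show 1 SD)")
fig.savefig("group_means.png", dpi=300, bbox_inches="tight")
```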

Writing the Results Section with Clarity

When writing the quantitative Results section, aim for clarity by balancing statistical reporting with interpretation of findings. Consider this structure:

Open with an overview of the analysis approach and measurements used.

Break down results by logical subsections for each hypothesis, construct measured etc.

Report exact statistics first, followed by interpretation of their meaning. For example, “Participants exposed to the intervention had significantly higher average scores (M=78, SD=3.2) compared to controls (M=71, SD=4.1), t(115)=3.42, p = 0.001. This suggests the intervention was highly effective for increasing scores.”

Use present verb tense and scientific, formal language.

Include tables/figures where they aid understanding or visualization.

Writing results clearly gives readers deeper context around statistical findings.

Highlighting Research Method and Design

With a results section full of statistics, it's vital to communicate key aspects of the research method and design. Consider including:

Brief overview of study variables, materials, apparatus used. Helps reproducibility.

Descriptions of study sampling techniques, data collection procedures. Supports transparency.

Explanations around approaches to measurement, data analysis performed. Bolsters methodological rigor.

Noting control variables, attempts to limit biases etc. Demonstrates awareness of limitations.

Covering these methodological details shows readers the care taken in designing the study and analyzing the results obtained.

Acknowledging Limitations and Addressing Biases

Honestly recognizing methodological weaknesses and limitations goes a long way in establishing credibility within the published discussion section. Consider transparently noting:

Measurement errors and biases that may have impacted findings.

Limitations around sampling methods that constrain generalizability.

Caveats related to statistical assumptions, analysis techniques applied.

Attempts made to control/account for biases and directions for future research.

Rather than detracting from the value of the work, acknowledging limitations demonstrates academic integrity regarding the research performed. It also gives readers deeper insight into interpreting the reported results and findings.

Conclusion: Synthesizing Statistical Treatment Insights

Recap of statistical treatment fundamentals.

Statistical treatment of data is a crucial component of high-quality quantitative research. Proper application of statistical methods and analysis principles enables valid interpretations and inferences from study data. Key fundamentals covered include:

Descriptive statistics to summarize and describe the basic features of study data

Inferential statistics to make judgments about probability and significance based on the data

Using appropriate statistical tools aligned to the research design and objectives

Following established practices for measurement techniques, data collection, and reporting

Adhering to these core tenets ensures research integrity and allows findings to withstand scientific scrutiny.

Key Takeaways for Research Paper Success

When incorporating statistical treatment into a research paper, keep these best practices in mind:

Clearly state the research hypothesis and variables under examination

Select reliable and valid quantitative measures for assessment

Determine appropriate sample size to achieve statistical power

Apply correct analytical methods suited to the data type and distribution

Comprehensively report methodology procedures and statistical outputs

Interpret results in context of the study limitations and scope

Following these guidelines will bolster confidence in the statistical treatment and strengthen the research quality overall.

Encouraging Continued Learning and Application

As statistical techniques continue advancing, it is imperative for researchers to actively further their statistical literacy. Regularly reviewing new methodological developments and learning advanced tools will augment analytical capabilities. Persistently putting enhanced statistical knowledge into practice through research projects and manuscript preparations will cement competencies. Statistical treatment mastery is a journey requiring persistent effort, but one that pays dividends in research proficiency.


Antonio Carlos Filho @acfilho_dev

Statistical Treatment


What is Statistical Treatment?

Statistical treatment can mean a few different things:

  • In Data Analysis: Applying any statistical method — like regression or calculating a mean — to data.
  • In Factor Analysis: Any combination of factor levels is called a treatment.
  • In a Thesis or Experiment: A summary of the procedure, including the statistical methods used.

1. Statistical Treatment in Data Analysis

The term “statistical treatment” is a catch-all term which means to apply any statistical method to your data. Treatments are divided into two groups: descriptive statistics, which summarize your data as a graph or summary statistic, and inferential statistics, which make predictions and test hypotheses about your data. Treatments could include:

  • Finding standard deviations and sample standard errors,
  • Finding T-scores or Z-scores,
  • Calculating correlation coefficients.

2. Treatments in Factor Analysis

Here, a “treatment” is any particular combination of factor levels applied to an experimental unit. For example, in an experiment with two factors, temperature (low/high) and pressure (low/high), “high temperature with low pressure” is one of the four possible treatments.

3. Treatments in a Thesis or Experiment

Sometimes you might be asked to include a treatment as part of a thesis. This is asking you to summarize the data and analysis portion of your experiment, including the measurements and formulas used. For example, the following experimental summary is from “Statistical Treatment” in Acta Physiologica Scandinavica:

“Each of the test solutions was injected twice in each subject…30-42 values were obtained for the intensity, and a like number for the duration, of the pain induced by the solution. The pain values reported in the following are arithmetical means for these 30-42 injections.”

The author goes on to provide formulas for the mean, the standard deviation and the standard error of the mean.
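For reference, the standard sample versions of those three formulas (which may differ in minor details from the ones given in the original paper) are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}, \qquad
\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}
```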

  • Vogt, W.P. (2005). Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences. SAGE.
  • Wheelan, C. (2014). Naked Statistics. W. W. Norton & Company.
  • Unknown author (1961). Chapter 3: Statistical Treatment. Acta Physiologica Scandinavica, 51(s179), 16–20.


Statistical Analysis in Research: Meaning, Methods and Types


The scientific method is an empirical approach to acquiring new knowledge by making skeptical observations and analyses to develop a meaningful interpretation. It is the basis of research and the primary pillar of modern science. Researchers seek to understand the relationships between factors associated with the phenomena of interest. In some cases, research works with vast chunks of data, making it difficult to observe or manipulate each data point. As a result, statistical analysis in research becomes a means of evaluating relationships and interconnections between variables with tools and analytical techniques for working with large data. Since researchers use statistical power analysis to assess the probability of finding an effect in such an investigation, the method is relatively accurate. Hence, statistical analysis in research eases analytical methods by focusing on the quantifiable aspects of phenomena.

What is Statistical Analysis in Research? A Simplified Definition

Statistical analysis uses quantitative data to investigate patterns, relationships, and trends in order to understand real-life and simulated phenomena. The approach is a key analytical tool in various fields, including academia, business, government, and science in general. This statistical analysis in research definition implies that the primary focus of the scientific method is quantitative research. Notably, the investigator targets the constructs developed from general concepts, as researchers can quantify their hypotheses and present their findings in simple statistics.

When a business needs to learn how to improve its product, it collects statistical data about the production line and customer satisfaction. Qualitative data is valuable and often identifies the most common themes in the stakeholders’ responses. Quantitative data, on the other hand, establishes a ranking of importance, comparing the themes based on how critical they are to the affected persons. For instance, descriptive statistics highlight tendency, frequency, variation, and position information. While the mean shows the average number of respondents who value a certain aspect, the variance indicates how widely the responses are spread around that average. In any case, statistical analysis creates simplified concepts used to understand the phenomenon under investigation. It is also a key component in academia as the primary approach to data representation, especially in research projects, term papers and dissertations.

Most Useful Statistical Analysis Methods in Research

Using statistical analysis methods in research is inevitable, especially in academic assignments, projects, and term papers. It is always advisable to seek assistance from your professor, or you can try the research paper writing service CustomWritings, before you start your academic project or write the statistical analysis in a research paper. Consulting an expert when developing a topic for your thesis or short mid-term assignment increases your chances of getting a better grade. Most importantly, it improves your understanding of research methods, with insights on how to enhance the originality and quality of personalized essays. Professional writers can also help select the most suitable statistical analysis method for your thesis, influencing the choice of data and type of study.

Descriptive Statistics

Descriptive statistics is a statistical method summarizing quantitative figures to understand critical details about the sample and population. A descriptive statistic is a figure that quantifies a specific aspect of the data. For instance, instead of analyzing the behavior of a thousand students individually, a researcher can identify the most common actions among them. In doing so, the researcher utilizes statistical analysis in research, particularly descriptive statistics.

  • Measures of central tendency. Central tendency measures are the mean, mode, and median, the averages that denote typical data points. They assess the centrality of the probability distribution, hence the name. These measures describe the data in relation to the center.
  • Measures of frequency. These statistics document the number of times an event happens. They include frequency, count, ratios, rates, and proportions. Measures of frequency can also show how often a score occurs.
  • Measures of dispersion/variation. These descriptive statistics assess the intervals between the data points. The objective is to view the spread or disparity between the specific inputs. Measures of variation include the standard deviation, variance, and range. They indicate how the spread may affect other statistics, such as the mean.
  • Measures of position. Sometimes researchers can investigate relationships between scores. Measures of position, such as percentiles, quartiles, and ranks, demonstrate this association. They are often useful when comparing the data to normalized information.

Inferential Statistics

Inferential statistics is critical in statistical analysis in quantitative research. This approach uses statistical tests to draw conclusions about the population. Examples of inferential statistics include t-tests, F-tests, ANOVA, p-values, the Mann–Whitney U test, and the Wilcoxon W test.

Common Statistical Analysis in Research Types

Although inferential and descriptive statistics can be classified as types of statistical analysis in research, they are mostly considered analytical methods. Types of research are distinguishable by the differences in the methodology employed in analyzing, assembling, classifying, manipulating, and interpreting data. The categories may also depend on the type of data used.

Predictive Analysis

Predictive research analyzes past and present data to assess trends and predict future events. An excellent example of predictive analysis is a market survey that seeks to understand customers’ spending habits to weigh the possibility of a repeat or future purchase. Such studies assess the likelihood of an action based on trends.

Prescriptive Analysis

On the other hand, a prescriptive analysis targets likely courses of action. It’s decision-making research designed to identify optimal solutions to a problem. Its primary objective is to test or assess alternative measures.

Causal Analysis

Causal research investigates the explanation behind the events. It explores the relationship between factors for causation. Thus, researchers use causal analyses to analyze root causes, possible problems, and unknown outcomes.

Mechanistic Analysis

This type of research investigates the mechanism of action. Instead of focusing only on the causes or possible outcomes, researchers may seek an understanding of the processes involved. In such cases, they use mechanistic analyses to document, observe, or learn the mechanisms involved.

Exploratory Data Analysis

Similarly, an exploratory study is extensive with a wider scope and minimal limitations. This type of research seeks insight into the topic of interest. An exploratory researcher does not try to generalize or predict relationships. Instead, they look for information about the subject before conducting an in-depth analysis.

The Importance of Statistical Analysis in Research

As a matter of fact, statistical analysis provides critical information for decision-making. Decision-makers require past trends and predictive assumptions to inform their actions. In most cases, raw data are too complex to yield meaningful inferences on their own. Statistical tools for analyzing such details help save time and money by deriving only the valuable information needed for assessment. An excellent statistical analysis in research example is a randomized controlled trial (RCT) for the Covid-19 vaccine. You can download a sample of such a document online to understand the significance such analyses have for the stakeholders. A vaccine RCT assesses the effectiveness, side effects, duration of protection, and other benefits. Hence, statistical analysis in research is a helpful tool for understanding data.


National Academies Press: OpenBook

On Being a Scientist: A Guide to Responsible Conduct in Research: Third Edition (2009)

Chapter: The Treatment of Data


In order to conduct research responsibly, graduate students need to understand how to treat data correctly. In 2002, the editors of the Journal of Cell Biology began to test the images in all accepted manuscripts to see if they had been altered in ways that violated the journal’s guidelines. About a quarter of the papers had images that showed evidence of inappropriate manipulation. The editors requested the original data for these papers, compared the original data with the submitted images, and required that figures be remade to accord with the guidelines. In about 1 percent of the papers, the editors found evidence for what they termed “fraudulent manipulation” that affected conclusions drawn in the paper, resulting in the papers’ rejection.

Researchers who manipulate their data in ways that deceive others, even if the manipulation seems insignificant at the time, are violating both the basic values and widely accepted professional standards of science. Researchers draw conclusions based on their observations of nature. If data are altered to present a case that is stronger than the data warrant, researchers fail to fulfill all three of the obligations described at the beginning of this guide. They mislead their colleagues and potentially impede progress in their field of research. They undermine their own authority and trustworthiness as researchers. And they introduce information into the scientific record that could cause harm to the broader society, as when the dangers of a medical treatment are understated.

This is particularly important in an age in which the Internet allows for an almost uncontrollably fast and extensive spread of information to an increasingly broad audience. Misleading or inaccurate data can thus have far-reaching and unpredictable consequences of a magnitude not known before the Internet and other modern communication technologies.

Misleading data can arise from poor experimental design or careless measurements as well as from improper manipulation. Over time, researchers have developed and have continually improved methods and tools designed to maintain the integrity of research. Some of these methods and tools are used within specific fields of research, such as statistical tests of significance, double-blind trials, and proper phrasing of questions on surveys. Others apply across all research fields, such as describing to others what one has done so that research data and results can be verified and extended.

Because of the critical importance of methods, scientific papers must include a description of the procedures used to produce the data, sufficient to permit reviewers and readers of a scientific paper to evaluate not only the validity of the data but also the reliability of the methods used to derive those data. If this information is not available, other researchers may be less likely to accept the data and the conclusions drawn from them. They also may be unable to reproduce accurately the conditions under which the data were derived.

The best methods will count for little if data are recorded incorrectly or haphazardly. The requirements for data collection differ among disciplines and research groups, but researchers have a fundamental obligation to create and maintain an accurate, accessible, and permanent record of what they have done in sufficient detail for others to check and replicate their work. Depending on the field, this obligation may require entering data into bound notebooks with sequentially numbered pages using permanent ink, using a computer application with secure data entry fields, identifying when and where work was done, and retaining data for specified lengths of time. In much industrial research and in some academic research, data notebooks need to be signed and dated by a witness on a daily basis. Unfortunately, beginning researchers often receive little or no formal training in recording, analyzing, storing, or sharing data. Regularly scheduled meetings to discuss data issues and policies maintained by research groups and institutions can establish clear expectations and responsibilities.

The Selection of Data

Deborah, a third-year graduate student, and Kamala, a postdoctoral fellow, have made a series of measurements on a new experimental semiconductor material using an expensive neutron test at a national laboratory. When they return to their own laboratory and examine the data, a newly proposed mathematical explanation of the semiconductor’s behavior predicts results indicated by a curve.

During the measurements at the national laboratory, Deborah and Kamala observed electrical power fluctuations that they could not control or predict were affecting their detector. They suspect the fluctuations affected some of their measurements, but they don’t know which ones.

When Deborah and Kamala begin to write up their results to present at a lab meeting, which they know will be the first step in preparing a publication, Kamala suggests dropping two anomalous data points near the horizontal axis from the graph they are preparing. She says that due to their deviation from the theoretical curve, the low data points were obviously caused by the power fluctuations. Furthermore, the deviations were outside the expected error bars calculated for the remaining data points.

Deborah is concerned that dropping the two points could be seen as manipulating the data. She and Kamala could not be sure which of their data points, if any, were affected by the power fluctuations. They also did not know if the theoretical prediction was valid. She wants to do a separate analysis that includes the points and discuss the issue in the lab meeting. But Kamala says that if they include the data points in their talk, others will think the issue important enough to discuss in a draft paper, which will make it harder to get the paper published. Instead, she and Deborah should use their professional judgment to drop the points now.

1. What factors should Kamala and Deborah take into account in deciding how to present the data from their experiment?
2. Should the new explanation predicting the results affect their deliberations?
3. Should a draft paper be prepared at this point?
4. If Deborah and Kamala can’t agree on how the data should be presented, should one of them consider not being an author of the paper?

Most researchers are not required to share data with others as soon as the data are generated, although a few disciplines have adopted this standard to speed the pace of research. A period of confidentiality allows researchers to check the accuracy of their data and draw conclusions.

However, when a scientific paper or book is published, other researchers must have access to the data and research materials needed to support the conclusions stated in the publication if they are to verify and build on that research. Many research institutions, funding agencies, and scientific journals have policies that require the sharing of data and unique research materials. Given the expectation that data will be accessible, researchers who refuse to share the evidentiary basis behind their conclusions, or the materials needed to replicate published experiments, fail to maintain the standards of science.

In some cases, research data or materials may be too voluminous, unwieldy, or costly to share quickly and without expense. Nevertheless, researchers have a responsibility to devise ways to share their data and materials in the best ways possible. For example, centralized facilities or collaborative efforts can provide a cost-effective way of providing research materials or information from large databases. Examples include repositories established to maintain and distribute astronomical images, protein sequences, archaeological data, cell lines, reagents, and transgenic animals.

New issues in the treatment and sharing of data continue to arise as scientific disciplines evolve and new technologies appear. Some forms of data undergo extensive analysis before being recorded; consequently, sharing those data can require sharing the software and sometimes the hardware used to analyze them. Because digital technologies are rapidly changing, some data stored electronically may be inaccessible in a few years unless provisions are made to transport the data from one platform to another. New forms of publication are challenging traditional practices associated with publication and the evaluation of scholarly work.

The scientific research enterprise is built on a foundation of trust. Scientists trust that the results reported by others are valid. Society trusts that the results of research reflect an honest attempt by scientists to describe the world accurately and without bias. But this trust will endure only if the scientific community devotes itself to exemplifying and transmitting the values associated with ethical scientific conduct.

On Being a Scientist was designed to supplement the informal lessons in ethics provided by research supervisors and mentors. The book describes the ethical foundations of scientific practices and some of the personal and professional issues that researchers encounter in their work. It applies to all forms of research—whether in academic, industrial, or governmental settings—and to all scientific disciplines.

This third edition of On Being a Scientist reflects developments since the publication of the original edition in 1989 and a second edition in 1995. A continuing feature of this edition is the inclusion of a number of hypothetical scenarios, together with guidance for thinking about and discussing them.

On Being a Scientist is aimed primarily at graduate students and beginning researchers, but its lessons apply to all scientists at all stages of their scientific careers.




Dealing powerfully with statistical power

A narrative review.

Darling, H. S.

Department of Medical Oncology and Hemato-oncology, Command Hospital Air Force, Bengaluru, Karnataka, India

Address for correspondence: Dr. H. S. Darling, Department of Medical Oncology & Hemato-oncology, Command Hospital Air Force, Bengaluru, Karnataka - 560 007, India [email protected]

Received May 31, 2022

Received in revised form June 14, 2022

Accepted June 14, 2022

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

Evidence-based medicine demands that the research methodology should be robust, reliable, and reproducible. Statistical power is the probability of detecting an effect that really exists in a population. It is critical to incorporate power at the designing stage of a study. A sufficiently powered study ensures reliable results and avoids wastage of resources. It is essential for a clinician to be aware of the role and interpretation of statistical power while reading the research articles. We performed a medical literature database search in PubMed, Embase, Cochrane, and Google, followed by abstract screening and then full paper study selection to gather the desired information and prepare this review on power. This review aims to provide the basic know-how about the practical aspects of statistical power for a clinician.

INTRODUCTION

“If someone understands both the disease and the medicine, only then is he a wise physician” (“rog daaroo dovai bujhai taa vaid sujaan” - Shri Guru Angad Dev Ji, the second Sikh guru).[ 1 ]

Power and sample size are critical aspects of the design of any population-based hypothesis testing study.[ 2 ] They must be addressed at the planning stage of a high-quality study. A study needs to be adequately powered to be able to prove a treatment effect as well as to avoid the wastage of resources. An underpowered study provides insufficient evidence to accept or reject a hypothesis. On the other hand, an overpowered study, in which the sample size is larger than required, wastes resources. For any given statistical method and significance level, there are four major considerations while designing a study, namely, sample size, power, clinically meaningful effect size, and the variability of the parameter of interest in the target population. If three of them are known, the fourth can be calculated.[ 3 ] Generally, researchers outsource the sample size calculation of a study to a statistician, but critical inputs from the clinician are necessary to decide the relevant outcomes and clinically meaningful differences.[ 4 ] This article intends to cover the basics of statistical power interpretation, its misinterpretations, uses, and limitations. The objective of this review article is to help clinicians understand the concept and the practical aspects of statistical power.

MATERIALS AND METHODS

An online literature search was conducted in the various databases including PubMed, Embase, Cochrane, and Google using a planned scheme, as depicted in Figure 1 . The search terms used included “statistical power,” “clinical importance,” “type II error,” and “statistical significance.” Articles published in the past 20 years were considered. We excluded duplicate citations and articles that were unrelated to the topic of statistical power, non-human studies, articles for which the full text was not available, and those published in a language other than English. We then searched the references in the bibliographies of certain selected articles to identify any other relevant literature that had been missed. Finally, we included 18 articles that contained relevant information and illustrations for this review. Two more articles were selected by the manual search done on the references of the selected articles, and one reference was from Shri Guru Granth Sahib.

[Figure 1: Scheme of the literature search used for this review]

Concept and interpretation

The statistical power is a conditional probability similar to the P value.[ 5 ] Conventionally, in hypothesis testing, the alternative hypothesis is the assumption that the null hypothesis is false. If the alternative hypothesis is actually true, the power indicates the probability of correctly rejecting the null hypothesis. The statistical power is best determined prospectively to decide whether a clinical study is worth doing, looking at the required effort, expenditure, time, manpower, and patient exposure. A hypothesis test with small power may yield large P values and large confidence intervals (CI).[ 6 ] Hence, a low-power study may fail to reject the null hypothesis, even if a clinically meaningful difference exists between the treatments being compared.

While analyzing the results of a research study, the conclusion of the investigator could be true or false. The investigator's conclusion would be false in case the alternative hypothesis was chosen when in actuality, the null hypothesis was true. This scenario (incorrectly rejecting the null hypothesis, or a false positive) indicates a type I error, and the probability of committing this error is called “α,” the significance level. The second scenario, that is, incorrectly accepting the null hypothesis, or a false negative, is called the type II error, and the probability of committing this error is called “β.” The power of a study is the complementary probability of the type II error (1 − β) and it signifies the probability of correctly rejecting the null hypothesis.[ 3 ] The levels of α and β are decided based upon the phase of the study, available resources, and the effect size (which is the quantification of the difference between two groups). The smaller these values are, the better will be the quality of the evidence generated by the study, but the larger will be the required sample size. For instance, in earlier phases, a large α and β may be acceptable.[ 7 ] Both α and β are arbitrary values and are decided by balancing between the sample size and the effect size. If the sample size is pre-fixed, for example, in previously collected data or in a retrospective analysis, one can calculate the power with a given effect size or vice versa.[ 8 ] A graph can be plotted with the power against the effect size for a fixed sample size. For a sample size that is too small, the required effect size has to be very large for a given power value. Otherwise, the study may be futile if the actual effect size is small. Power, effect size, and sample size are best decided a priori, while planning a study. In certain situations, investigators may perform post hoc analyses, but this is not ideal and should be a rarity.[ 4 ]

The calculation of the sample size with desired values of α and β requires specification of the effect size and the standard deviation (SD). The effect size and the SD may be taken from a similar previous study in the literature or may be obtained from a pilot study. There are inherent limitations to each method, including sampling errors, publication bias, and study design. To calculate the optimal sample size, an investigator can use sensitivity analysis, with multiple possible values of effect size and SD.[ 9 10 ] This is explained in more detail in the next section.
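As a brief sketch of such a sensitivity analysis, assuming the statsmodels package, a two-sided independent-samples t-test, α = 0.05 and power = 0.80, one can tabulate the required sample size across a range of plausible standardized effect sizes:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.3, 0.5, 0.8):   # candidate effect sizes (Cohen's d)
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"effect size d = {d:.1f} -> about {n:.0f} participants per group")
```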

Factors affecting the power of a study

Four factors influence the power of a study. These need to be accounted for during the planning stage of a study.

Precision and standard deviation of the data

Precision is how close or dispersed the measurements are to each other. It may be influenced by certain modifiable factors like observer and measurement biases as well as the actual variability (measured as the SD) of the population parameter. Observer bias occurs when a researcher's expectations, opinions, or prejudices influence what the researcher perceives or records in a study, whereas measurement bias refers to any systematic or non-random error that occurs in the collection of data in a study. Their collective impact is demonstrated as the 95% CI. The higher the values of these biases and the SD, the broader is the CI. The bigger the sample size, the narrower is the CI, and the closer the result is to the actual population value.

The magnitude of the effect size

Detecting minute differences between intervention effects requires very accurate results for the study to successfully determine the difference. This will require a bigger sample size and more power. A wide CI may be acceptable when the effect size is large.

Type I or type II error

A smaller type II error indicates a higher probability that an actual existing effect size will be detected, with the given α and sample size.

Type of statistical tests

Two types of statistical tests can be used to calculate the sample size. Parametric tests are generally preferable as they require a smaller sample size. However, parametric tests (e.g., Student's t test) require the data to be normally distributed, unlike non-parametric tests (e.g., Mann–Whitney U).[ 11 ]

  • The investigator can estimate the sample size required to test a specific hypothesis if the power has been pre-decided.
  • The power of a study can be calculated for an already existing dataset with a fixed sample size and a pre-defined effect size.
  • The effect size can be calculated with a given power and a fixed sample size.
  • Addressing these issues a priori while designing the study allows a tighter and more rigorous study.[ 3 ]
  • Sample size and power calculations help decide the feasibility of a study within the available resources.[ 4 ]
  • Power derivations give us the sample size with a known type I and type II error.[ 11 ]

A study needs an adequate sample size to be practically feasible, clinically valuable, and reliable. A low-powered study fails to answer the research question reliably, while an overpowered study wastes resources and can make a clinically trivial difference appear highly significant.[ 3 ]

Misinterpretations

Power increases with sample size. A power threshold of 80% is often used: if the treatment truly has an effect of the assumed size, the study will detect it (i.e., yield a statistically significant result) 80% of the time. If a large sample size is not feasible for any reason, the power of the study will be compromised. Hence, many studies of rare diseases are conducted without calculating the power beforehand. Instead, some investigators calculate the power in the post hoc setting, that is, based on the effect observed after the study has been conducted. By demonstrating low power alongside an observed meaningful effect, they tend to recommend lower power thresholds for such studies. In principle, however, statistical power refers to the population being sampled, and a post hoc power analysis rests on the assumption that the observed effect size is the real effect size.[ 12 ] Post hoc power estimates have been shown to be illogical and misleading. Investigators perform these calculations either to justify the study design or to explain why the study did not yield a statistically significant effect, but post hoc power estimates do not serve these purposes validly. A post hoc power estimate is a justification akin to the statement, “If the axe chopped down the tree, it was sharp enough.”[ 13 14 ]

Another scenario is when power is calculated for a post hoc secondary analysis of an existing dataset. The kind of research question here would be, “Would studying a dataset of this size provide sufficient evidence to reject the null hypothesis?” This is a somewhat redundant question because, in reality, that is the only dataset the investigator has, and the sample size is therefore fixed. The only conceivable benefit of performing a power analysis here would be to avoid type I or type II errors.[ 15 ] Although a secondary data analyst cannot gain access to a larger dataset if the sample size of the existing dataset is inadequate, the investigator could at least avoid wasting time and risking misleading conclusions by not doing the analysis at all.

Recommendations

When to do sample size calculations for a study.

Sample size calculations should ideally be done before the study, so that the correct answer to the research question is obtained in the most efficient way.

Sometimes, an interim power calculation is performed during an ongoing study. However, the researcher must be careful not to stop the study early simply because statistical significance has been attained at the interim analysis. Conversely, an interim power calculation can be used to avoid prolonging a study when a therapy proves clearly life-saving or hazardous. This provision must be part of the research protocol a priori.

Rarely, after a negative study, one can retrospectively interrogate the data to assess if the study was underpowered and whether the negative study represents a false negative, that is, a type II error.[ 11 ]

When can an underpowered study be acceptable?

Conventionally, the power should be at least 80%. However, a researcher may choose to conduct an underpowered study in certain situations, such as:

  • In an exploratory analysis or a pilot study
  • When there is only one available dataset
  • When studying a very interesting question with limited time and resources
  • In rare diseases and situations in which limited scientific knowledge is available
  • In resource-limited settings[ 10 16 ]
  • In a laboratory study or a retrospective correlation study[ 17 ]

Better alternatives

The answer that post hoc power estimates try to provide can usually be found in the relevant CI.[ 18 19 20 ] For instance, in a study in which the clinically meaningful effect size is 4 units and the 95% CI is (−0.3, +0.2), the researcher can reasonably accept the null hypothesis, since the interval excludes any clinically meaningful effect. Conversely, if the CI is very broad, say (−13, +22), it is clearly unsafe to assume that the true effect size is zero simply because the result is statistically non-significant; it might be zero, but it might also be strongly positive or strongly negative, and this uncertainty must be acknowledged. Although both (−0.3, +0.2) and (−13, +22) correspond to non-significant hypothesis tests in that they include the null value, they do not have the same practical implications.[ 21 ]
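A minimal sketch of this CI-based reasoning is shown below, assuming a clinically meaningful effect of 4 units and summary statistics chosen to roughly reproduce the first interval in the example above.

```python
# Sketch (illustrative summary statistics): judge a "negative" result by
# comparing the 95% CI of the mean difference against a clinically meaningful effect.
from scipy.stats import norm

mean_diff = -0.05          # observed difference between groups (hypothetical)
se_diff = 0.13             # standard error of the difference (hypothetical)
meaningful_effect = 4.0    # smallest difference considered clinically relevant

z = norm.ppf(0.975)        # 1.96 for a 95% CI
lower, upper = mean_diff - z * se_diff, mean_diff + z * se_diff
print(f"95% CI for the difference: ({lower:.2f}, {upper:.2f})")

if -meaningful_effect < lower and upper < meaningful_effect:
    print("The CI excludes any clinically meaningful effect.")
else:
    print("The CI is too wide to rule out a clinically meaningful effect.")
```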

The Bayesian method is a different approach to judging the findings of a study.[ 10 ] In a Bayesian analysis, a researcher specifies prior probabilities for different population hypotheses and/or prior distributions over possible values of population parameters. Based on these priors, which must often be chosen subjectively, Bayes' theorem is then used to calculate posterior probabilities.[ 10 ] A more detailed discussion of the Bayesian method is beyond the scope of this article.
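As a toy illustration of the Bayesian idea (not a full Bayesian analysis), the sketch below uses Bayes' theorem to update subjectively chosen prior probabilities for two competing hypotheses; all numbers are hypothetical.

```python
# Toy Bayesian update over two competing hypotheses (all values hypothetical).
# posterior is proportional to prior * likelihood of the observed data
priors = {"no effect": 0.5, "treatment works": 0.5}
likelihoods = {"no effect": 0.10, "treatment works": 0.40}  # P(data | hypothesis)

unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalised.values())
posteriors = {h: v / total for h, v in unnormalised.items()}

for hypothesis, p in posteriors.items():
    print(f"P({hypothesis} | data) = {p:.2f}")
```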

Augmenting the power without increasing the sample size

  • Correction for non-adherence of the participants
  • Adjustment for multiple comparisons
  • Innovative study designs[ 4 ]

A well-powered study is critical to generate reliable and reproducible results. In certain situations, a low-powered study may be acceptable. Prospective power analysis is the ideal and standard statistical requirement. Resorting to a post hoc power estimate is not a valid technique. When trying to interpret a failure to reject an important null hypothesis, extrapolating the information from the CI or a Bayesian analysis is better in many situations than using a post hoc power estimate.



Statistical Treatment Of Data

Statistical treatment of data is essential for turning data into a usable form. Raw data collection is only one aspect of any experiment; organising the data so that appropriate conclusions can be drawn from it is equally important. This is what statistical treatment of data is all about.

Statistics offers many techniques for treating data in the required manner. Statistical treatment of data is essential in all experiments, whether social, scientific or any other form, and the appropriate treatment depends greatly on the kind of experiment and the desired result.

For example, in a survey regarding the election of a mayor, parameters like age, gender and occupation would influence a person's decision to vote for a particular candidate, so the data need to be treated with these reference frames (subgroups) in mind.

An important aspect of statistical treatment of data is the handling of errors. All experiments invariably produce errors and noise. Both systematic and random errors need to be taken into consideration.

Depending on the type of experiment being performed, Type I and Type II errors also need to be handled. These are false positives and false negatives, respectively, which must be understood and minimised in order to draw sound conclusions from the experiment.


Treatment of Data and Distribution

Trying to classify data into commonly known patterns is a tremendous help and is intricately related to statistical treatment of data. This is because distributions such as the normal probability distribution occur so commonly in nature that they underlie most medical, social and physical experiments.

Therefore, if a given sample is known to be normally distributed, statistical treatment of the data becomes much easier for the researcher, who can draw on a large body of existing theory. Care should always be taken, however, not to assume that all data are normally distributed; normality should always be confirmed with appropriate testing.

Statistical treatment of data also involves describing the data. The best way to do this is through measures of central tendency like the mean, median and mode, which summarise where the data are concentrated. Range, uncertainty and standard deviation help to describe the spread of the data: two distributions with the same mean can have wildly different standard deviations, which shows how tightly the data points are concentrated around the mean.

Statistical treatment of data is an important aspect of all experimentation today and a thorough understanding is necessary to conduct the right experiments with the right inferences from the data obtained.

Siddharth Kalla (Apr 10, 2009). Statistical Treatment Of Data. Retrieved Sep 01, 2024 from Explorable.com: https://explorable.com/statistical-treatment-of-data


The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organisations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organise and summarise the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalise your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarise your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Frequently asked questions about statistics

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you'll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you'll record participants' scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents' incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalise your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable Type of data
Age Quantitative (ratio)
Gender Categorical (nominal)
Race or ethnicity Categorical (nominal)
Baseline test scores Quantitative (interval)
Final test scores Quantitative (interval)
Parental income Quantitative (ratio)
GPA Quantitative (interval)

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalisable findings, you should use a probability sampling method. Random selection reduces sampling bias and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be biased, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalising your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalise your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialised, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalised in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you're using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or by using statistics. A sample that's too small may be unrepresentative of the population, while a sample that's too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is usually recommended.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardised indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
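As one possible illustration of how these inputs feed into a calculator, the sketch below uses the power routines in the Python statsmodels package (assuming it is available); the standardised effect size of 0.5 is a hypothetical value.

```python
# Sketch: solving for sample size per group with statsmodels, given a
# significance level, desired power, and a hypothetical standardised effect size.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # expected standardised effect (Cohen's d), hypothetical
    alpha=0.05,        # significance level
    power=0.80,        # desired statistical power
    ratio=1.0,         # equal group sizes
)
print(f"Required sample size per group: {n_per_group:.0f}")
```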

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarise them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organising data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualising the relationship between two variables using a scatter plot .
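With a pandas data frame, these inspections might look roughly like the sketch below; the data set and column names are hypothetical.

```python
# Sketch: quick data inspection with pandas and matplotlib (hypothetical data).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "gender": ["female", "male", "female", "female", "male", "male", "female", "male"],
    "parental_income": [42000, 61000, 58000, 75000, 39000, 88000, 67000, 54000],
    "gpa": [3.1, 3.4, 3.0, 3.8, 2.9, 3.6, 3.5, 3.2],
})

# Frequency distribution table for a categorical variable
print(df["gender"].value_counts())

# Bar chart of the distribution of a key variable
df["gender"].value_counts().plot(kind="bar")
plt.show()

# Scatter plot of the relationship between two quantitative variables
df.plot(kind="scatter", x="parental_income", y="gpa")
plt.show()
```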

By visualising your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

[Figure: Mean, median, mode, and standard deviation in a normal distribution]

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
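As a quick illustration, the sketch below computes these measures of central tendency and variability for a small set of made-up scores using Python's standard library.

```python
# Sketch: central tendency and variability for a small made-up data set.
import statistics as st

scores = [68, 72, 75, 61, 75, 70, 66, 80, 75, 69]

q1, _, q3 = st.quantiles(scores, n=4)  # quartile cut points

print("mean:    ", st.mean(scores))
print("median:  ", st.median(scores))
print("mode:    ", st.mode(scores))
print("range:   ", max(scores) - min(scores))
print("stdev:   ", round(st.stdev(scores), 2))      # sample standard deviation
print("variance:", round(st.variance(scores), 2))   # sample variance
print("IQR:     ", q3 - q1)
```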

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

                     Pretest scores   Posttest scores
Mean                 68.44            75.25
Standard deviation   9.43             9.88
Variance             88.96            97.96
Range                36.25            45.12
Sample size (n)      30

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

                     Parental income (USD)   GPA
Mean                 62,100                  3.12
Standard deviation   15,000                  0.45
Variance             225,000,000             0.16
Range                8,000–378,000           2.64–4.00
Sample size (n)      653

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
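A minimal sketch of this calculation, using the summary statistics from the correlational example above (mean GPA 3.12, SD 0.45, n = 653):

```python
# Sketch: 95% confidence interval for a mean using the standard error and z score.
import math

sample_mean = 3.12      # point estimate (mean GPA from the example above)
sample_sd = 0.45        # sample standard deviation
n = 653                 # sample size

standard_error = sample_sd / math.sqrt(n)
z = 1.96                # z score for a 95% confidence level
ci_lower = sample_mean - z * standard_error
ci_upper = sample_mean + z * standard_error
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
```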

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in an outcome variable (or variables).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028
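With raw scores available, such a test could be run roughly as sketched below; the score arrays are hypothetical and will not reproduce the exact t and p values quoted above.

```python
# Sketch: dependent (paired) samples, one-tailed t test with hypothetical scores.
from scipy import stats

pretest = [62, 70, 74, 65, 68, 71, 66, 73, 69, 64]
posttest = [70, 75, 80, 71, 74, 78, 72, 79, 75, 70]

# One-tailed test of whether posttest scores exceed pretest scores
# (the `alternative` argument requires SciPy 1.6 or later)
t_stat, p_value = stats.ttest_rel(posttest, pretest, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```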

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores. Example: Effect size (correlational study) To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimise the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasises null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts, and meanings, use qualitative methods .
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Statistical analysis is the main method for analyzing quantitative research data . It uses probabilities and models to test predictions about a population from sample data.




Statistical considerations for outcomes in clinical research: A review of common data types and methodology


With the increasing number and variety of clinical trials and observational data analyses, producers and consumers of clinical research must have a working knowledge of an array of statistical methods. Our goal with this body of work is to highlight common types of data and analyses in clinical research. We provide a brief, yet comprehensive overview of common data types in clinical research and appropriate statistical methods for their analysis. These include continuous data, binary data, count data, multinomial data, and time-to-event data. We include references for further study and real-world examples of the application of these methods. In summary, we review common continuous and discrete data, summary statistics for these data, common hypothesis tests and appropriate statistical tests, and the underlying assumptions for these tests. This information is summarized in tabular format for additional accessibility.

Impact Statement

Particularly in the clinical field, a wide variety of statistical analyses are conducted, and the results are utilized by a wide range of researchers, some having more in-depth statistical training than others. Thus, we set out to summarize and outline appropriate statistical analyses for the most common data found in clinical research. We aimed to make this body of work comprehensive yet brief, so that anyone working in clinical or public health research can gain a basic understanding of the different types of data and analyses.

Introduction

Clinical research is vitally important for translating basic scientific discoveries into improved medical and public health practice through research involving human subjects. 1 The goal is to generate high-quality evidence to inform standards of care or practice. At its later stages, clinical research, in the form of clinical trials or observational studies, often focuses on comparing health outcomes between groups of persons who differ based on a treatment received or some other external exposure. 2

The scientific method dictates that we test falsifiable hypotheses with quantitative data. 3 Evidence for or against a treatment or exposure must be evaluated statistically to determine if any observed differences are likely to represent true differences or are likely to have occurred by chance. Statistical methods are used to conduct hypothesis testing to this end. 4 In addition, statistical methods are employed to summarize the results of a study and to estimate the observed effect of the treatment or exposure on the outcome of interest. 4

All clinical trials and many observational studies have a designated primary outcome of interest, which is the quantitative metric used to determine the effect of the treatment or exposure. The statistical properties, such as its probability distribution, of the outcome variable and quantifying changes in said variable due to the exposure are of primary importance in determining the choice of statistical methodology. 4 , 5 Here, we review some of the most common types of outcome variables in comparative research and common statistical methods used for analysis.

In this summary, we review standard statistical methodology used for data analysis in clinical research. We identify five common types of outcome data and provide an overview of the typical methods of analysis, effect estimates derived, and graphical presentation. We aim to provide a resource for the clinical researcher who is not a practicing statistician. Methods discussed can be reviewed in more detail in graduate-level textbooks on applied statistics, which are referenced throughout our summary. We also provide references for real-world clinical research projects that have employed each core method. In addition, the procedures available in standard statistical software for data analysis in each of these scenarios are provided in Supplemental Tables 1 and 2 .

At the core, there are generally two categories of outcome data: discrete and continuous. By definition, discrete data, also called categorical data, are the data that have natural, mutually exclusive, non-overlapping groups. 6 Two examples would be severity, defined as mild, moderate, or severe, and intervention exposure groups, such as those receiving the intervention and those not receiving the intervention. Such categories may be ordinal (having an inherent order) or nominal (no inherent order) and can range from two groups or more. 6 The categories may represent qualitative groups (such as the previous examples) or quantitative data, that is, age groupings, such as 18–35, 36–55, 56 years and above.

Continuous data have more flexibility and can be defined as a variable that “can assume any values within a specified relevant interval of values.” 6 , 7 More concrete examples include a person's age, height, weight, or blood pressure. While we may round for convenience, that is, round to the nearest integer (year) for age, there are no theoretical gaps between two continuous values. Addressing a perhaps obvious question, there are situations where data sit on the boundary between discrete and continuous. For example, when does a quantitative ordinal discrete variable have enough categories to be considered a continuous variable? Such questions are usually decided on a case-by-case basis, a priori, before the onset of the study.

In addition to the type of data, the sample size may also influence the method used for calculating test statistics and p-values for statistical inference. When sample sizes are sufficiently large, we typically use a class of statistics called asymptotic statistics that rely on a result known as the central limit theorem. 8 These often rely on a Z statistic, chi-square statistic, or F statistic. When sample sizes are more limited, we typically use non-parametric or exact statistical methods that do not rely on these large-sample assumptions. Most of the statistical methods that we review here rely on asymptotic statistics in their basic form, but often have an analogous method relying on exact and/or non-parametric approaches. 9 When a researcher encounters small sample sizes, it is important to consider these alternative methods.

In addition to identifying appropriate statistical methodology for testing hypotheses given the study's outcome data, there are a number of additional influences that should be considered, such as effect modification and confounding. Additional factors can alter the association of the exposure and outcome and thus are critical to consider when analyzing biological associations. Effect modification, by definition, occurs when a third factor alters the effect of the exposure on the outcome. 10 Specifically, the magnitude of this alteration changes across the values of this third factor. A separate phenomenon, known as confounding, occurs when an imbalance in the distribution of a third factor in the data distorts the observed effect (association) of the exposure on the outcome. 10 To meet the criteria of a confounder, this third factor must be associated with the exposure and with the outcome but must not lie on the causal pathway. If all of these criteria are met, the third factor is a confounder and introduces bias when not properly controlled. While effect modification is a biologic phenomenon in which an exposure impacts the outcome differently for different groups of individuals, confounding is a phenomenon caused by the imbalance of the data itself and may not have biologic significance.

An important consideration is that the effect-modifying or confounding factor must not be on the causal pathway from the exposure to the outcome. The causal pathway is the primary biological pathway through which the exposure influences the outcome. For example, if Variable A causes Z, and Z causes Y, then Variable Z is on the causal pathway from A to Y. In this case, controlling for Z as either a confounder or effect modifier while estimating the effect of A on Y will induce bias in the estimate. Investigators should also avoid controlling for common effects of A and Y, which can induce “collider bias.” We will discuss how to assess for effect modification and confounding later.

Methods for common variable types

Continuous data.

Continuous data, as described above, are quantitative data with no theoretical gaps between values, where the range (minimum and maximum) of values is dependent on what is being measured. For example, the natural range for age is (0, ~100) while the natural range for temperature measured in degrees Fahrenheit is (−459.67, 134). These types of data are often summarized with a central measurement and a spread measurement. 7 The most common central measurements are the mean or median and represent the “center” of the observed data. The spread measurement aims to quantify how much variation is in the data or how much of the data deviates from the central measurement. Thus, if a mean is presented as the central metric, the variance or standard deviation is typically presented as the spread measurement. If the median is presented as the central metric, the interquartile range (IQR: 25th and 75th percentile) and range (minimum and maximum) are reported as the spread measurement. Understandably, the next question is: which metric to use and when?

This leads to the topic of data distributions. If our continuous data follow what we call a normal probability distribution, which is symmetric around the mean, our mean and median will be approximately the same value. 7 While it is statistically appropriate to report either the mean (with variance or standard deviation) or the median (with IQR and range) when the data follow a normal distribution, the most common practice is to report the mean. If the data are skewed and do not follow a normal distribution, it is appropriate to report the median. When data are skewed, the mean is pulled toward the more extreme values and is no longer a true central measurement, whereas the median is not influenced by skewness. 7

A normal distribution is a statistical probability distribution, defined by a mean and variance, that describes the probability of observing a specific value or range of values from the data. It has convenient statistical properties, such as a pre-specified probability density function and cumulative density function, which are the functions that calculate said probabilities. 7 In addition to the normal distribution, other distributions exist for continuous and discrete data. Other continuous distributions include, but are not limited to, the exponential, chi-square, F, t, gamma, and beta distributions. 8 Discrete distributions include, but are not limited to, the Bernoulli, binomial, Poisson, negative binomial, and hypergeometric distributions. 8 Each distribution is defined by one or more parameters which control the average, standard deviation, and other aspects of the distribution. If the data follow one of these known distributions, calculating the likelihood of occurrence, such as for hypothesis testing, becomes straightforward.

How we determine whether data follow one of these distributions varies for each type of distribution. For the scope of this body of work, we will only cover how to assess whether a continuous variable follows a normal distribution. There are three ways in which one can assess normality; each has its strengths and weaknesses, and we therefore encourage consideration of all three approaches. Normality can be assessed visually with quantile–quantile (QQ) plots, visually with histograms, or by statistical test (Shapiro–Wilk test, Kolmogorov–Smirnov test, Cramer–von Mises test, or Anderson–Darling test). 11 , 12 Other tests exist, but these are the most commonly available in statistical software. The normality tests tend to be very strict, and even small deviations will lead to a conclusion of non-normality. 11 , 12 The visual assessments, such as QQ plots and histograms, are more subjective and rely on the researcher's judgment, hence the value of considering both visual and statistical approaches.
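For instance, a normality check might combine a visual QQ plot and histogram with the Shapiro–Wilk test, roughly as sketched below on simulated data.

```python
# Sketch: assessing normality visually (QQ plot, histogram) and with the
# Shapiro-Wilk test, using simulated data.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
values = rng.normal(loc=120, scale=15, size=80)  # e.g., simulated blood pressure

# Visual checks
fig, (ax1, ax2) = plt.subplots(1, 2)
stats.probplot(values, dist="norm", plot=ax1)    # QQ plot
ax2.hist(values, bins=10)                        # histogram
plt.show()

# Statistical check (very strict: small deviations yield non-normal conclusions)
w_stat, p_value = stats.shapiro(values)
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_value:.3f}")
```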

When our outcome variable is normally distributed, there are several factors that must be considered for selecting the appropriate statistical method to test the hypothesis, such as number of samples, independence, and so on. These analyses have been summarized in Table 1 . Note this table is not comprehensive but a generalized summary of common analyses and assumptions. When the continuous outcome violates normality, or the sample size is small, non-parametric approaches can instead be used. Non-parametric approaches are analyses that do not make any assumptions about the type of distribution; they can analyze normal and non-normally distributed data. However, if data are normal, parametric approaches are more appropriate to implement.

Table 1. Summary of continuous data analyses and assumptions (all observations are independent).

  • One variable, normal distribution — assumption: normality; point estimate: mean; effect estimate: mean; common method: one-sample t-test.
  • One variable, no distributional assumption — point estimate: median; effect estimate: median; common method: sign test or signed-rank test.
  • Two variables (groups), normal distribution — hypotheses: H0: μ1 = μ2 vs H1: μ1 ≠ μ2; assumptions: (1) normality, (2) the two groups are independent, (3) group variances are equal; point estimate: mean; effect estimate: difference of means or Cohen's d; common method: two-sample t-test.
  • Two variables (groups), no distributional assumption — hypotheses: H0: M1 = M2 vs H1: M1 ≠ M2; assumptions: (1) the two groups are independent, (2) both groups have the same distribution shape; point estimate: median; effect estimate: U statistic; common method: Mann–Whitney U test.
  • Three or more groups, normal distribution — assumptions: (1) normality, (2) all groups are independent, (3) group variances are equal; point estimate: mean; effect estimate: Cohen's f; common method: ANOVA.
  • Three or more groups, no distributional assumption — assumption: all groups are independent; point estimate: median; common method: Kruskal–Wallis test.

Association analyses (modeling the outcome as a function of one or more explanatory variables):

  • One continuous outcome variable, normal distribution — assumptions: (1) linear association between explanatory variables and outcome, (2) independent explanatory variables (if more than one), (3) normally distributed error terms, (4) equal variances; effect estimate: Cohen's f or R²; common method: linear regression (overall F-test or partial t-tests).

ANOVA: analysis of variance.

If our aim is to quantify the association between an outcome and exposure, we can apply linear regression (assuming all assumptions are met; see Table 1). As outlined earlier, we need to consider possible effect modifiers and confounders. Effect modification can be assessed by introducing an interaction term into the model. As a simple example, the model would contain the exposure variable, the possible effect modifier, and a multiplication term between the exposure and the possible effect modifier (the interaction term). If the interaction term is statistically significant, we conclude that effect modification is present. If a variable is not an effect modifier, it is then assessed as a possible confounder. Different approaches exist for assessing confounding, but the most widely used is the 10% rule. This rule states that a variable is a confounder if the regression coefficient for the exposure variable changes by more than 10% when the possible confounder is included in the model. A nice example of this can be seen in Ray et al. (2020). 16
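A rough sketch of both checks, using the statsmodels formula interface on simulated data with hypothetical variables y (outcome), x (exposure), and m (third factor):

```python
# Sketch: checking effect modification (interaction term) and confounding
# (10% change-in-estimate rule) with simulated data and hypothetical variables.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x": rng.normal(size=n), "m": rng.normal(size=n)})
df["y"] = 2.0 * df["x"] + 1.0 * df["m"] + rng.normal(size=n)

# 1. Effect modification: is the exposure-by-factor interaction significant?
interaction_model = smf.ols("y ~ x * m", data=df).fit()
print("interaction p value:", round(interaction_model.pvalues["x:m"], 3))

# 2. Confounding: does the exposure coefficient change by more than 10%
#    when the candidate confounder is added to the model?
crude = smf.ols("y ~ x", data=df).fit().params["x"]
adjusted = smf.ols("y ~ x + m", data=df).fit().params["x"]
print(f"change in x coefficient: {abs(adjusted - crude) / abs(crude) * 100:.1f}%")
```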

Counts and rates

Count data are the number of times a particular event occurs for each individual, taking non-negative integer values. In biomedical science, we most often look at count data over a period of time, creating an event rate (event count / period of time). The simplest analysis of these data involves calculating events per patient-year of follow-up. When conducting patient-year analyses in large populations, it is often acceptable to look at this statistic in aggregate (sum of total events in the population / sum of total patient-years at risk in the population). Confidence intervals can be calculated by assuming a Poisson distribution.

Statistical modeling of count data or event rates is commonly done with a Poisson model. These models can adjust for confounding by other variables and incorporate interaction terms for effect modification. When a binary treatment variable is used with the event rate as the outcome, incidence rate ratios (with confidence intervals) can be estimated from these models. The model can be extended to a zero-inflated Poisson (ZIP) model or a negative binomial model when the standard Poisson model does not fit the data well. Population-level analyses often examine disease incidence rates and ratios using these methods. 17 , 18 Recently, this type of statistical modeling was at the core of the methods used to calculate vaccine efficacy against COVID-19 in a highly impactful randomized trial. 19
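As a minimal sketch of the aggregate patient-year approach described above, a crude incidence rate ratio and its large-sample 95% CI can be computed directly from event counts and follow-up time; all numbers below are hypothetical.

```python
# Sketch: crude incidence rate ratio (IRR) with a 95% CI from aggregate counts
# and patient-years of follow-up (all numbers hypothetical).
import math

events_treated, pyears_treated = 24, 1500.0
events_control, pyears_control = 60, 1450.0

rate_treated = events_treated / pyears_treated
rate_control = events_control / pyears_control
irr = rate_treated / rate_control

# Large-sample CI on the log scale: SE(log IRR) = sqrt(1/a + 1/b)
se_log_irr = math.sqrt(1 / events_treated + 1 / events_control)
lower = math.exp(math.log(irr) - 1.96 * se_log_irr)
upper = math.exp(math.log(irr) + 1.96 * se_log_irr)

print(f"IRR = {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```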

Binary data

Arguably, the simplest form of an outcome variable in clinical research is the binary variable, for which every observation is classified in one of two groups (disease versus no disease, response versus no response, etc.). 20 We typically assume a binomial statistical distribution for this type of data. When the treatment variable is also binary, results can be analyzed with the classic 2 × 2 table. From this table, we can estimate the proportion of responses, odds of response, or risk of response/disease within each treatment group. We then compare these estimates between treatment groups using difference or ratio measures. These include the difference in proportions, risk difference, odds ratios, and risk ratios (relative risk). Hypothesis testing around these estimates may utilize the chi-square test to assess the general association between the two variables, large sample asymptotic tests relying on normality under the central limit theorem, or exact tests that do not assume a specific statistical distribution.
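The sketch below works through a hypothetical 2 × 2 table, computing the risk difference, risk ratio and odds ratio directly and then applying the chi-square and Fisher's exact tests with SciPy; the counts are made up for illustration only.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2 x 2 table: rows = treatment groups, columns = response yes/no
#                 response   no response
table = np.array([[30,        70],    # treated
                  [18,        82]])   # control

risk_treated = table[0, 0] / table[0].sum()
risk_control = table[1, 0] / table[1].sum()

risk_difference = risk_treated - risk_control
risk_ratio = risk_treated / risk_control
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

chi2_stat, chi2_p, dof, expected = chi2_contingency(table)  # general association
exact_or, exact_p = fisher_exact(table)                     # exact test

print(f"RD={risk_difference:.2f}, RR={risk_ratio:.2f}, OR={odds_ratio:.2f}, "
      f"chi-square p={chi2_p:.3f}, Fisher exact p={exact_p:.3f}")
```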

Statistical models for binary outcomes can be constructed using logistic regression. In this way, the effect estimates (typically the odds ratio) can be adjusted for confounding by measured variables. These models typically rely on asymptotic normality for hypothesis testing but exact statistics are also available. The models can also assess effect modification through statistical interaction terms. An example of the classical 2 × 2 table can be referenced in Khan et al. 21 A typical application of logistic regression can be seen in Ray et al. 22 We have summarized methods for categorical data in Table 2 .
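A minimal logistic-regression sketch, assuming simulated data with a binary outcome, a binary exposure and age as a measured confounder (all names hypothetical), could be written with statsmodels as:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
# Hypothetical data: binary outcome, binary exposure and a measured confounder
df = pd.DataFrame({
    "exposure": rng.integers(0, 2, n),
    "age": rng.normal(60, 10, n),
})
logit_p = -4.0 + 0.8 * df["exposure"] + 0.05 * df["age"]
df["disease"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Logistic regression adjusting the exposure effect for age
model = smf.logit("disease ~ exposure + age", data=df).fit(disp=False)

adjusted_or = np.exp(model.params["exposure"])
ci = np.exp(model.conf_int().loc["exposure"])
print(f"Adjusted OR = {adjusted_or:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```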

Summary of discrete data analyses and assumptions (all observations are independent).

| Type of outcome variable | Outcome statistical distribution | Theoretical hypotheses | Assumptions | Commonly used point estimate | Commonly used effect estimate | Common statistical methods |
| --- | --- | --- | --- | --- | --- | --- |
| *Discrete* | | | | | | |
| One binary variable | Binomial | H1: p ≠ p0 | 1. One binary variable | Proportion | Proportion | Z-test or binomial exact test |
| Two binary variables | Binomial | H1: p1 ≠ p2 | 1. One binary metric measured on two different samples; 2. Two samples are independent | Proportions | Difference in proportions or Cohen's h | Z-test |
| Two binary variables | Binomial | H1: OR ≠ 1 | 1. Two binary variables measured on the same sample; 2. One variable measuring outcome; 3. One variable measuring exposure | Odds | Odds ratio | Logistic regression |
| Two binary variables | Binomial | H1: RR ≠ 1 | 1. Two binary variables measured on the same sample; 2. One variable measuring outcome; 3. One variable measuring exposure | Risk | Risk ratio | Logistic, Poisson, or negative binomial regression |
| Two discrete variables | No assumption | – | 1. Two variables measured on the same sample; 2. Each variable is measuring a different metric | None | Cramér's V or Phi | Chi-squared test, Fisher's exact test (small sample sizes), or logistic regression |
| *Association analyses: modeling outcome as a function of one or more explanatory variables* | | | | | | |
| One binary variable | Binomial | H1: ORi ≠ 1 | 1. Outcome variable is binary; 2. Explanatory variables are independent; 3. Explanatory variables are linearly associated with the log odds | Odds | Odds ratio | Logistic regression |
| One discrete variable with > 2 levels | Multinomial (ordered or unordered) | – | If outcome data are nominal, the assumptions are the same as binomial logistic regression; if outcome data are ordinal, the proportional odds assumption must be met in addition to the binomial logistic regression assumptions | Odds | Odds ratio | Multinomial logistic regression: generalized logit link for unordered and cumulative logit link for ordered |
| Counts and events per follow-up | Poisson or negative binomial | H1: IRR ≠ 1 | 1. Outcome variable is positive integer counts following a Poisson or negative binomial distribution | Incidence rate | Incidence rate ratio | Poisson or negative binomial regression |
| Time-to-event | No distribution assumed | – | 1. Single discrete explanatory variable (with categories); 2. Censoring is not related to explanatory variables | 5-year survival | Difference in 5-year survival | Kaplan–Meier (log-rank test) |
| Time-to-event | No distribution assumed | H1: HR ≠ 1 | 1. Hazard remains constant over time (proportional hazards assumption); 2. Explanatory variables are independent; 3. Explanatory variables are linearly associated with the log hazard | None | Hazard ratio | Cox proportional hazards model |

Multinomial data

Multinomial data are a natural extension of binary data: a discrete variable with more than two levels. It follows that extensions of logistic regression can be applied to estimate effects and adjust for effect modification and confounding. However, multinomial data can be nominal or ordinal. For nominal data, the order is of no importance and, therefore, the models use a generalized logit link. 23 This selects one category as a referent category and then performs a set of logistic regression models, each comparing one non-referent level to this referent level. For example, Kane et al. 24 applied a multinomial logistic regression to model type of treatment (five categories) as a function of education level and other covariates. They selected watchful waiting as the referent treatment, so the analysis had four logistic regressions to report, one for each of the other treatment categories compared with watchful waiting.

If the multinomial data are ordinal, we use a cumulative logit link in the regression model. This link models the categories cumulatively and sequentially. 23 For example, suppose our outcome has three levels, 1, 2, and 3, representing the number of treatments. The cumulative logit model conducts two logistic regressions: first modeling Category 1 versus Categories 2 and 3 (combined), and then Categories 1 and 2 (combined) versus Category 3. Because of the combining of categories, this assumes that the odds are proportional across categories; this assumption must be checked and satisfied before applying the model. Depending on the outcome, only one of the logistic models may be needed, such as in Bostwick et al., 25 where the outcome was palliative performance status (low, moderate, and high) modeled against cancer/non-cancer status. There, only high performance status versus moderate and low combined was reported as the outcome.
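For a nominal outcome, a generalized-logit model can be fitted with statsmodels' mnlogit, as in the hedged sketch below; the treatment categories and covariate are hypothetical, with category 0 playing the role of the referent. For an ordinal outcome, a proportional-odds (cumulative logit) model would be used instead, for example via OrderedModel in recent versions of statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 600
# Hypothetical data: a nominal outcome with three treatment categories
# (0 is used as the referent) and one covariate
df = pd.DataFrame({"education_years": rng.normal(12, 3, n)})
df["treatment"] = rng.choice([0, 1, 2], size=n)

# Generalized logit link: one logistic comparison per non-referent category
model = smf.mnlogit("treatment ~ education_years", data=df).fit(disp=False)
print(np.exp(model.params))  # odds ratios for category 1 vs 0 and category 2 vs 0
```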

Time-to-event

Time-to-event data, often called survival data, compare the time from a baseline point to the potential occurrence of an outcome between groups. 26 These data are unique as a statistical outcome because they involve a binary component (event occurred or event did not occur) and the time to event occurrence or last follow-up. Both the occurrence of event and the time it took to occur are of interest. These outcomes are most frequently analyzed with two common statistical methodologies, the Kaplan–Meier method and the Cox proportional hazards model. 26

The Kaplan–Meier method allows for the estimation of a survival distribution of observed data in the presence of censored observations and does not assume any statistical distribution for the data. 26 , 27 In this way, knowledge that an individual did not experience an event up to a certain time point, but is still at risk, is incorporated into the estimates. For example, knowing an individual survived 2 months after a therapy and was censored is less information than knowing an individual survived 2 years after a therapy and was censored. The method assumes that the occurrence of censoring is not associated with the exposure variable. In addition to estimating the entire curve over time, the Kaplan–Meier plot allows for the estimation of the survival probability to a certain point in time, such as “5-year” survival. Survival curves are typically estimated for each group of interest (if exposure is discrete), shown together on a plot. The log-rank test is often used to test for a statistically significant difference in two or more survival curves. 26 An analogous method, known as Cumulative Incidence, takes a similar approach to the non-parametric Kaplan–Meier method, but starts from zero and counts events as they occur, with estimates increasing with time (rather than decreasing). 26 Cumulative Incidence analyses can also be adjusted for competing risks, which occur when subjects experience a different event during the follow-up time that precludes them from experiencing the event of primary interest. In the presence of competing risks, Cumulative Incidence curves can be compared using Gray’s test. 26
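A minimal Kaplan–Meier and log-rank sketch, using the lifelines package (one common choice) on simulated survival times with hypothetical group labels, might look like this:

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(5)
n = 200
# Hypothetical survival data: time in months, event indicator (1 = event, 0 = censored)
group = rng.integers(0, 2, n)
time = rng.exponential(scale=np.where(group == 1, 30, 20))
event = rng.binomial(1, 0.7, n)
df = pd.DataFrame({"time": time, "event": event, "group": group})

# Kaplan-Meier estimate per group
kmf = KaplanMeierFitter()
for g, sub in df.groupby("group"):
    kmf.fit(sub["time"], event_observed=sub["event"], label=f"group {g}")
    print(f"group {g}: median survival = {kmf.median_survival_time_:.1f} months")

# Log-rank test comparing the two survival curves
a, b = df[df["group"] == 0], df[df["group"] == 1]
result = logrank_test(a["time"], b["time"], a["event"], b["event"])
print(f"log-rank p-value = {result.p_value:.3f}")
```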

Time-to-event data can also be analyzed using statistical models. The most common statistical model is the Cox proportional hazards model. 28 From this model, we can estimate hazard ratios with confidence intervals for comparing the risk of the event occurring between two groups. 26 Multiple variable models can be fit to incorporate interaction terms or to adjust for confounding (the 10% rule can be applied to the hazard ratio estimate). Although the Cox model does not assume a statistical distribution for the outcome variable, it does assume that the ratio of effect between two treatment groups is constant across time (i.e., proportional hazards). Therefore, one hazard ratio estimate applies to all time points in the study. Extensions of this model are available to allow for more flexibility, with additional complexity in interpretation. Examples of standard applications of the Kaplan–Meier method and Cox proportional hazards models can be seen in recent papers by Mok et al. 29 and Aparicio et al. 30
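A corresponding Cox proportional hazards sketch, again using lifelines on simulated data (treatment and age are hypothetical covariates), could be:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 300
# Hypothetical data: follow-up time, event indicator, treatment and age
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "age": rng.normal(65, 8, n),
})
df["time"] = rng.exponential(scale=np.where(df["treatment"] == 1, 36, 24))
df["event"] = rng.binomial(1, 0.8, n)

# Cox proportional hazards model adjusting the treatment effect for age
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")

print(np.exp(cph.params_))  # hazard ratios for treatment and age
cph.print_summary()         # full table with confidence intervals
# The proportional hazards assumption should also be checked,
# e.g. with cph.check_assumptions(df)
```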

Generalized linear models

With the exception of time-to-event data, all of the statistical modeling techniques described above can be classified as some form of generalized linear model (GLM). 20 Modern statistical methods utilize GLMs as a broader class of statistical model. In the GLM, the outcome variable can take on different forms (continuous, categorical, multinomial, count, etc) and it is mathematically transformed using a link function. In fact, the statistical modeling methods we have discussed here are each a special case of a GLM. The GLM can accommodate multiple covariates that could be either continuous or categorical. The GLM framework is often a useful tool for understanding the interconnectedness of common statistical methods. For the interested reader, an elegant description of the most common GLMs and how they interrelate is given in Chapter 5 of Categorical Data Analysis by Alan Agresti. 20
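To illustrate this interconnectedness, the sketch below fits linear, logistic and Poisson regressions through the single GLM interface in statsmodels, changing only the outcome family; the simulated data and variable names are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({"x": rng.normal(size=n)})
df["y_cont"] = 1.0 + 2.0 * df["x"] + rng.normal(size=n)          # continuous outcome
df["y_bin"] = rng.binomial(1, 1 / (1 + np.exp(-df["x"])))         # binary outcome
df["y_count"] = rng.poisson(np.exp(0.2 + 0.5 * df["x"]))          # count outcome

# The same GLM machinery, different outcome families and link functions
linear = smf.glm("y_cont ~ x", data=df, family=sm.families.Gaussian()).fit()
logistic = smf.glm("y_bin ~ x", data=df, family=sm.families.Binomial()).fit()
poisson = smf.glm("y_count ~ x", data=df, family=sm.families.Poisson()).fit()

for name, model in [("linear", linear), ("logistic", logistic), ("Poisson", poisson)]:
    print(name, round(model.params["x"], 3))
```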

Concerns of bias and validity

While statistical significance is necessary to demonstrate that an observed result is not likely to have occurred by chance alone, it is not sufficient to ensure a valid result. Bias can arise in clinical research from many causes, including misclassification of the exposure, misclassification of the outcome, confounding, missing data, and selection of the study cohort. 10 , 31 Care should be taken at the study design phase to reduce potential bias as much as possible. To this end, application of proper research methodology is essential. Confounding can sometimes be corrected through statistical adjustment after collection of the data, if the confounding factor is properly measured in the study. 10 , 31 All of these issues are outside the scope of basic statistics and this current summary. However, good clinical research studies should consider both statistical methodology and potential threats to validity from bias. 10 , 31

In this review, we have discussed five of the most common types of outcome data in clinical studies, including continuous, count, binary, multinomial, and time-to-event data. Each data type requires specific statistical methodology, specific assumptions, and consideration of other important factors in data analysis. However, most fall within the overarching GLM framework. In addition, the study design is an important factor in the selection of the appropriate method. Statistical methods can be applied for effect estimation, hypothesis testing, and confidence interval estimation. All of the methods discussed here can be applied using commonly available statistical analysis software without excessive customized programming.

In addition to the common types of data discussed here, other statistical methods are sometimes necessary. We have not discussed in detail situations where data are correlated or clustered. These scenarios typically violate the independence assumption required by many methods. Common subsets of these include longitudinal analyses with multiple observations collected across time and time series data which also require specialized techniques. We have also not covered situations where outcome data are multidimensional, such as the case for research in genetics. The analysis of large amounts of genetic information often relies on the basic methods discussed here, but special considerations and adapted methodology are needed to account for the large numbers of hypothesis tests conducted. One consideration is multiple comparisons. When a single sample is tested more than one time, this increases the chance of making either type I or II error. 32 This means we incorrectly reject or fail to reject the null hypothesis given the truth at the population level. Because of this increased likelihood of error, the significance level must be adjusted. These types of adjustments are not discussed here. Moreover, this overview is not comprehensive, and many additional statistical methodologies are available for specific situations.

In this work, we have focused our discussion on statistical analysis. Another key element in clinical research is a priori statistical design of trials. Appropriate selection of the trial design, including both epidemiologic and statistical design, allows data to be collected in such a way that valid statistical comparisons can be made. Power and sample size calculations are key design elements that rely on many of the statistical principles discussed above. Investigators are encouraged to work with experienced statisticians early in the trial design phase, to ensure appropriate statistical considerations are made.

In summary, statistical methods play a critical role in clinical research. A vast array of statistical methods is currently available to handle a breadth of data scenarios. Proper application of these techniques requires intimate knowledge of the study design and the data collected. A working knowledge of common statistical methodologies and their similarities and differences is vital for producers and consumers of clinical research.

Supplemental Material

Authors’ Contributions: All authors participated in the design, interpretation of the studies, writing and review of the manuscript.

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.


Supplemental material: Supplemental material for this article is available online.

chrome icon

What is statistical treatment?  


Statistical treatment refers to the application of statistical methods and techniques to analyze and interpret data in various fields. It involves using statistical tools to process and analyze data, make inferences, and draw conclusions. Statistical treatment plays a crucial role in medical decision-making, where it is used for diagnosis, treatment, and prognosis. It helps in improving medical outcomes by providing tools such as probability measurement, medical indicators, reference ranges, scoring systems, and various statistical measures like odds ratio, sensitivity, specificity, and predictivities. Statistical treatment is also important in the field of fluid dynamics, specifically in the processing of data acquired via image velocimetry. It involves basic statistical treatment, frequency and modal analysis, as well as advanced research topics like multiscale modal decompositions and nonlinear dimensionality reduction.


Related Questions

Weighted mean plays a crucial role in the statistical treatment of data by providing a more accurate representation of the data when certain observations carry more significance than others. This is particularly relevant in scenarios where datasets have biases that need correction through importance weighting methods. Weighted mean is essential in constructing weighted datasets to ensure uniqueness of scores and achieve desired cumulative statistics, simplifying the analysis process. In complex survey analysis, alternative analysis weights based on the weighted mean can enhance the efficiency of model-parameter estimates while maintaining consistency in regression models. Moreover, in situations where individual cells of a dataset have assigned weights, the use of cellwise weighted likelihood functions, such as the weighted average, facilitates multivariate statistical methods like maximum likelihood estimation and covariance matrix estimation.

Statistics play a crucial role in research by providing methods to collect, process, interpret, and draw conclusions from data, making it an indispensable tool in various fields. It aids in identifying relationships within collected data, enhancing decision-making processes through hypothesis testing. Statistics help in analyzing data collected from research samples to draw conclusions about entire populations, enabling less biased decision-making and establishing causality. In foreign policy analysis, statistical techniques like regression analysis are commonly used to investigate relationships between key variables and assess their significance. Overall, statistics in research enables professionals to estimate processes efficiently, manage data effectively, and gain better insights into scientific proposals, making it a fundamental element in research processes.

The purpose of statistical treatment is to provide decision-makers with valuable information for selecting treatments. Statistical treatment involves using quantitative data collection procedures and technological tools to analyze and interpret data effectively. It aids in making accurate predictions based on available data, such as determining the best treatment type, dosage, and expected recovery time for patients. Statistical treatment also plays a crucial role in planning clinical trials by defining effects, determining sample sizes, and conducting statistical analyses to ensure the success of experimental treatments. By utilizing statistical methods like risk evaluation and minimax-regret criteria, statistical treatment helps in choosing optimal treatment rules and sample designs. Overall, statistical treatment enhances decision-making processes by providing insights derived from rigorous data analysis and interpretation.

Statistical treatment for chapter 3 involves Bayesian estimation of parameters in feed-forward neural networks, addressing challenges like lack of interpretability due to neural networks being perceived as black boxes. This lack of transparency extends to parameters like neuron numbers and connection weights, affecting the interpretation of Bayesian frameworks. Techniques like hierarchical priors, noninformative priors, and hybrid priors are utilized to express prior knowledge or lack thereof on parameter distributions. Additionally, model building approaches such as model selection and model averaging are discussed, emphasizing the distinction between modeling for understanding relationships and prediction for output accuracy. Efficient search of the model space using algorithms like Markov Chain Monte Carlo and reversible-jump Markov Chain Monte Carlo is crucial for model selection and averaging.

Statistical treatments commonly used in quantitative research design include log-rank statistical tests and proportional hazards models for comparing survival curves and estimating treatment benefits. Tables and figures are also commonly used to present specific data or statistical analysis results, providing visual understanding and interpretation of the statistical results. Alternating treatments designs (ATDs) utilize visual analysis, visual structured criteria (VSC), and a comparison involving actual and linearly interpolated values (ALIV) for analyzing single-case experimental designs. Additionally, statistical solutions, such as time-parameterized probability measures, have been proposed as a framework for global solutions and uncertainty quantification in quantitative research design.



South African Dental Journal

On-line version ISSN 0375-1562; print version ISSN 0011-8516. S. Afr. Dent. J. vol. 71, n. 6, Johannesburg, Jul. 2016.

COMMUNICATION

Statistical terms Part 1: The meaning of the MEAN, and other statistical terms commonly used in medical research

L M Sykes I ; F Gani II ; Z Vally III

I BSc, BDS, MDent (Pros). Department of Prosthodontics, University of Pretoria II BDS, MSc. Department of Prosthodontics, University of Pretoria III BDS, MDent (Pros). Department of Prosthodontics, University of Pretoria

Correspondence

INTRODUCTION

A letter was recently sent to members of a research committee which read as follows: "Dear Members. We have 27 protocols to review and will divide them between all members. Each protocol will be evaluated by two people, thus you will all have to evaluate ±9 protocols"

The response from the resident statistician read: "Hello. I would like to correct this common statement highlighted above. Although it is a colloquial statement, it should be corrected among members. It is preferred to state that "each will evaluate between 7-11 protocols or 9±2 (7-11 protocols)."

This amusing, yet technically correct, anecdote brings home the realization that many researchers, supervisors, reviewers and clinicians do not fully understand many research concepts and statistical terms, nor the significance (non-statistically speaking) behind them. This is the first of a planned series of papers which aim to explain, clarify, and simplify a number of these apparently esoteric principles. With that objective, the series could help future researchers improve their study designs, as well as empower their readers with the knowledge needed to critically evaluate any ensuing literature. The series will begin with definitions and explanations of statistical terms, and then will deal with experimental designs and levels of evidence.

The information and layout of Paper One is based on notes from the University of Barotseland 1 and on the work of Schoeman. 2 However, we recognise that the human mind responds better to stories and illustrations than to numbers and statistics. For this reason the paper has been interspersed with many "quotes and anecdotes to engage and amuse the reader, and help promote their memory", referenced by name where possible (Steven Pinker 3 ).

Scientific research refers to the "systematic technique for the advancement of knowledge and consists of developing a theory that may or may not be proven true when subject to empirical methods." 4 It should have an appropriate experimental design that produces objective data and valid results. These should be accurately analyzed and reported, so that they cannot be erroneously or ambiguously interpreted. 4 This of course is in direct contrast to the satirical remark of Evan Esar, who defined statistics as "the science of producing unreliable facts from reliable figures". 3 Classic research presupposes that a specific question can be answered, and then endeavours to do so by using a proper experimental design and following a step-wise approach of defining the problem (usually based on some observation), formulating a hypothesis (an educated guess to try to explain the problem / phenomenon), and then collecting and analyzing the data to prove or disprove the hypothesis.

1. DATA

This refers to any facts, observations, and information that come from investigations, and is the foundation upon which new knowledge is built. To paraphrase Arthur Conan Doyle, "A theory remains a theory until it is backed up by data." 5 Data can be either quantitative or qualitative.

1.1 Quantitative data is information about quantities that can be measured and written down in numbers (e.g. test score, weight).

1.2 Qualitative data is also called categorical or frequency data, and cannot be expressed in terms of numbers. Items are grouped according to some common property, after which the number of members per group is recorded (e.g. males/females, vehicle type).

In research, the target population includes all of those entities from which the researcher wishes to draw conclusions. However, it is impractical to try to conduct research on an entire population and for this reason only a small portion of the population is studied, i.e. a sample. The inclusion and exclusion criteria will help define and narrow down the target population (in human research). Sampling refers to the process of selecting research subjects from the population of interest in such a way that they are representative of the whole population.

2.1 The sample population is that small selection from the whole who are included in the research. Inferential statistics seek to make predictions about a population based on the results observed in a sample of that population.

2.2 Sample size refers to the number of patients / test specimens that finish the study and not the number that entered it. When determining sample size, most researchers would want to keep this number as low as possible for reasons of practicality, material costs, time, and availability of facilities and patients. However, the lower limit will also depend on the estimated variation between subjects. Where there is great variation, a larger sample number will be needed. Statistical analysis always takes into consideration the sample size. As Joseph Stalin put it, "A single death is a tragedy; a million deaths is a statistic." 5

2.3 Non-responders refers to those persons who refuse to take part in the study, who do not comply with study protocol, or who do not complete the entire study. Their non-participation could result in an element of bias, and can only be ignored if their reasons for refusal will not affect the interpretation of the findings.

2.4 Sampling methods are divided into nonprobability and probability sampling. In the former, not every member of the population has a chance of being selected, while in the latter, all members have an equal chance.

2.4.1 Nonprobability

a) Convenience sampling refers to taking persons as they arrive on the scene and is continued until the full desired sample number has been obtained. It is NOT representative of the population.
b) Quota sampling is similar to convenience sampling except that those sampled are selected in the same ratio as they are found in the general population.

2.4.2 Probability

a) Random sampling is when the study subjects are chosen completely by chance. At each draw, every member of the population has the same chance of being selected as any other person. Tables of random digits are available to ensure true randomness.
b) Stratified random samples are constructed by first dividing a heterogeneous population into strata and then taking random samples from within each stratum. Strata may be chosen to reflect one or more aspects of that population (e.g. gender, age, ethnicity).
c) Systematic sampling involves having the population in a predetermined sequence, e.g. names in alphabetical order. A starting point is then picked randomly and the person whose name falls in that position is taken as the first to be sampled.
d) Cluster sampling is when the population is first divided into natural subgroups, often based on their being geographically close to each other e.g. houses in a street, staff in one hospital. A number of clusters are then randomly sampled.

2.5 Generalization is an attempt to extend the results of a sample to a population and can only be done when the sample is truly representative of the entire population. Generalizing the results obtained from a sample to the broad population must take into account sample variation. Even if the sample selected is completely random, there is still a degree of variance within the population that will require your results from within a sample to include a margin of error. The greater the sample size, the more representative it tends to be of a population as a whole. Thus the margin of error falls and the confidence level rises.

2.6 Bias is a threat to a sample's validity, and prevents impartial consideration. It can come in many forms and can stem from many sources such as the researcher, the participants, study design or sample. The most common bias is due to the selection of subjects. For example, if subjects self-select into a sample group, then the results are no longer externally valid, as the type of person who wants to be in a study is not necessarily similar to the population that one is seeking to draw inferences about. Examples of bias could be: Cognitive bias, which refers to human factors, such as decisions being made on perceptions rather than evidence; Sample bias, where the sample is skewed so that certain specimens or persons are unrepresented, or have been specifically selected in order to prove a hypothesis. 4

2.7 Prevalence refers to the proportion of cases present in a population at a specified point in time, hence it describes how widespread the disease is. (Memory point - remember all the P's).

2.8 Incidence is the number of new cases that occurred over a specific time, and gives an indication about the risk of contracting a disease. 6

3. EXPERIMENTAL DESIGN

Design relates to the manner in which the data will be obtained and analyzed. For this reason, consultation with a statistician is crucial during the preparation phases of any research. Prior to embarking on the study one must already have determined the target population, sampling methods, sample size, data collection methods, and statistical tests that will be used to analyze the findings. Many studies fail or produce invalid results because this crucial step was neglected during the planning stages. As William James commented "We must be careful not to confuse data with the abstractions we use to analyse them". Light et al were more blunt in stating "You can't fix by analysis what you bungled by design". 5

3.1 Descriptive statistics are used for studies that explore observed data. In descriptive statistics, it is often helpful to divide data into equal-sized subsets. For example, dividing a list of individuals sorted by height into two parts, the tallest and the shortest, results in two quantiles, with the median height value as the dividing line. Quartiles separate a data set into four equal-sized groups, deciles into 10 groups, etc. 1

3.2 Inferential statistics are used when you don't have access to the whole population or it is not feasible to measure all the data. Smaller samples are then taken and inferential statistics are used to make generalizations about the whole group from which the sample was drawn, e.g. "Receiving your college degree increases your lifetime earnings by 50%" is an inferential statistic. 1 A word of caution: one has to be very clear about the meaning and interpretation of results presented as percentages. Consider the issue of percentages versus percentage points - they are not the same thing. For example, "if 40 out of 100 homes in a distressed suburb have mortgages, the rate is 40%. If a new law allows 10 homeowners to refinance, now only 30 mortgages are troubled. The new rate is 30%, a drop of 10 percentage points (40 - 30 = 10). This is not 10% less than the old rate; in fact, the decrease is 25% (10 / 40 = 0.25 = 25%)". 4 Another classic example of misrepresentation of data was a recent survey on smoking habits of final year medical students. There was only one Indian student in the class, who also happened to be a smoker. The resulting report declared that "100% of Indian students smoke". In the words of Henry Clay, one must still bear in mind that "Statistics are no substitute for judgement". 5

3.3 Error

In all research, a certain amount of variability will occur when humans are measuring objects or observing phenomena. This will depend on the accuracy of the measuring tool, and the manner in which it is used by the operator on each successive occasion. Thus, error does not mean a mistake; rather, it describes the variability in measurement in the study. The amount of error must be recognized, delineated, and taken into account in order to give true meaning to the data. When humans are involved, the amount of error can be defined as inter-operator (differences between different operators) or intra-operator (differences when performed by the same operator at different times). To overcome this, a certain number of objects are measured many times and by different people to detect the variation. This will then set the limits as to how accurate the results will be. 4

3.4 Accuracy, Precision, Reliability and Validity

a) Accuracy is a measure of how close measurements are to the true value.
b) Precision is the degree to which repeated measurements will produce the same results (or how close the measures are to each other).
c) Reliability is the degree to which a method produces the same results (consistency of the results) when it is used at different times, under different circumstances, by either the same or multiple observers. It can be tested by conducting inter-observer or intra-observer studies to determine error rates. Low inter-observer variation (or error) indicates high reliability. 4 The research must test what it is supposed to test, and must ensure adequacy and appropriateness of the interpretation and application of the results.
Results can have low accuracy but high precision and vice versa, which impacts on validity and reliability. An example to illustrate this would be aiming an arrow at the centre of a target. If all arrows are close together and in the centre of the target, you have high accuracy and high precision (Figure 1a); the results are then considered valid and reliable. If all arrows are both far away from the centre and spread out, there is low accuracy and low precision, and the results are neither valid nor reliable (Figure 1b). Lastly, if the arrows are all far off the centre but still close to each other, it indicates that a mistake has been made, but the same mistake is made each time. Thus, there is low accuracy but high precision, and the results are not valid, despite being reliable (Figure 1c). 7,8
d) Validity refers to how appropriate and adequate the test is for that specific purpose. It also considers how correctly the results are interpreted and subsequently used.    

A note on sensitivity and specificity.

Sensitivity and specificity are statistical measures used to determine the effectiveness of a medical diagnostic test. Sensitivity is the proportion of true positives detected and is calculated from the formula [true positives / (true positives + false negatives)], while specificity is the proportion of true negatives detected and is calculated from [true negatives / (true negatives + false positives)].
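A small sketch of these two formulas in code, using hypothetical counts from a 2 x 2 diagnostic table, is given below.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Compute sensitivity and specificity from the cells of a 2 x 2 diagnostic table."""
    sensitivity = tp / (tp + fn)  # true positives / all who truly have the disease
    specificity = tn / (tn + fp)  # true negatives / all who truly do not
    return sensitivity, specificity

# Hypothetical diagnostic test results
sens, spec = sensitivity_specificity(tp=90, fp=15, fn=10, tn=185)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```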

4. VARIABLE

This is the property of an object or event that can take on different values. For example, college major is a variable that takes on values like mathematics, computer science, English, psychology. 1

4.1 Discrete Variable has a limited number of values e.g. gender (male or female)

4.2 Continuous Variable can take on many different values anywhere between the lowest and highest points on the measurement scale.

4.3 Dependent Variable is that variable in which the researcher is interested, but is not under his/her control. It is observed and measured in response to the independent variable.

4.4 Independent Variable is a variable that is manipulated, measured, or selected by the researcher as an antecedent (precursor) condition to an observed behaviour. In a hypothesized cause-and-effect relationship, the independent variable is the cause and the dependent variable is the outcome or effect.

5. MEASURES OF CENTRE

Plotting data in a frequency distribution shows the general shape of the distribution and gives a general sense of how the numbers are bunched. Several statistics can be used to represent the "centre" of the distribution. These statistics are commonly referred to as measures of central tendency. 1

5.1 Mean (average) - is the most common measure of central tendency and refers to the average value of a group of numbers. Add up all the figures and divide by the number of values, and that is the average or mean. It is calculated from the formula ΣΧ / N [the sum of all the scores in the distribution (ΣΧ) divided by the total number of scores (N)]; a short worked example is given after 5.3 below. If you subtract each value in the distribution from the mean and then sum all of these deviation scores, the result will be zero (* see below). As one comic put it, "Whenever I read statistical reports, I try to imagine the unfortunate Mr Average Person who has 0.66 children, 0.032 cars and 0.046 TVs". 3

5.2 Median - is the score that divides the distribution into halves; half of the scores are above the median and half are below it when the data are arranged in numerical order. It is the central value, and can be useful if there is an extremely high or low value in a collection of values. The median is also referred to as the score at the 50th percentile in the distribution. The median location of N numbers can be found by the formula (N + 1) / 2. When N is an odd number, the formula yields an integer that represents the value in a numerically ordered distribution corresponding to the median location. For example, in the distribution of numbers (3 1 5 4 9 9 8) the median location is (7 + 1) / 2 = 4; when applied to the ordered distribution (1 3 4 5 8 9 9), the value 5 is the median, with three scores above 5 and three below it. If there were only 6 values (1 3 4 5 8 9), the median location would be (6 + 1) / 2 = 3.5. In this case the median is half-way between the 3rd and 4th scores (4 and 5), i.e. 4.5.

5.3 Mode - is the most frequent or common score in the distribution, and is the point or value of Χ that corresponds to the highest point on the distribution. If the highest frequency is shared by more than one value, the distribution is said to be multimodal, and will be reflected by peaks at two different points in the distribution.
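The short sketch below computes these three measures for the small distribution used in 5.2, using Python's built-in statistics module.

```python
import statistics

scores = [3, 1, 5, 4, 9, 9, 8]  # the small distribution from 5.2 above

print(statistics.mean(scores))    # (3 + 1 + 5 + 4 + 9 + 9 + 8) / 7 = 5.57...
print(statistics.median(scores))  # middle value of the ordered data = 5
print(statistics.mode(scores))    # most frequent value = 9
```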

6. MEASURES OF SPREAD

Although the mean, median, and mode give information about how scores are centred in the distribution, on their own they say little about how the scores are spread. Measures of variability provide information about the degree to which individual scores are clustered about, or deviate from, the average value in a distribution. 1

6.1 Range is the difference between the highest and lowest score in a distribution. It is not often used as the sole measure of variability because it is based solely on the most extreme scores in the distribution and does not reflect the pattern of variation within a distribution.

a) Interquartile Range (IQR) provides a measure of the spread of the middle 50% of the scores. The IQR is defined as the 75th percentile minus the 25th percentile. The interquartile range plays an important role in the graphical method known as the boxplot. The advantage of using the IQR is that it is easy to compute and extreme scores in the distribution have much less impact. However, it suffers as a measure of variability because it discards too much data. Nevertheless, researchers often want to study variability while eliminating scores that are likely to be accidents, and the boxplot allows for this distinction and is an important tool for exploring data.

6.2 Variance is a measure based on the deviations of individual scores from the mean. As noted in the definition of the mean (5.1 above), simply summing the deviations will result in a value of 0. To get around this problem the variance is based on squared deviations of scores about the mean. When the deviations are squared, the rank order and relative distance of scores in the distribution are preserved while negative values are eliminated. Then, to control for the number of subjects in the distribution, the sum of the squared deviations is divided by N (population) or by n - 1 (sample). The formula for the sample variance is thus s² = Σ(χ - Χ)² / (n - 1), where χ is each score and Χ is the mean. The result is the average of the sum of the squared deviations and it is called the variance (a short worked example is given at the end of this section).

6.3 Standard deviation provides insight into how much variation there is within a group of values. It measures the deviation (difference) from the group's mean (average). The standard deviation (s or σ) is the positive square root of the variance. The variance is a measure in squared units and has little meaning with respect to the data. Thus, the standard deviation is a measure of variability expressed in the same units as the data. The standard deviation is very much like a mean or an "average" of these deviations. In a normal (symmetric and mound-shaped) distribution, about two-thirds of the scores fall between +1 and -1 standard deviations from the mean, and the standard deviation is approximately 1/4 of the range in small samples (N < 30) and 1/5 to 1/6 of the range in large samples (N > 100).

Standard deviation and variance are both measures of variability. The variance describes how much each value in the data set deviates from the mean (i.e. the spread of the responses), and is a squared value. The standard deviation also describes variability and is defined as the square root of the variance; this allows the variability to be described in the same units as the data. A low SD means that the points of data are close to the mean, while a high SD indicates that the data are spread over a wide range of values.

The standard error (the SD divided by the square root of the sample size) is used to describe the margin of error in a statistical analysis; the margin of error is usually about twice the standard error, corresponding to the 95% confidence level. Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter. After a sample is taken, the population parameter is either in the interval or not. The desired level of confidence is set by the researcher beforehand, for example 90%, 95% or 99%. If a corresponding hypothesis test is performed, the confidence level is the complement of the level of significance, i.e. a 95% confidence interval reflects a significance level of 0.05. Greater levels of variance yield larger confidence intervals, and hence less precise estimates of the parameter. Certain factors may affect the confidence interval size, including the size of the sample, the level of confidence, and the population variability. A larger sample size normally will lead to a better estimate of the population parameter.
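As a worked example of these formulas, the sketch below computes the sample variance, standard deviation and a rough 95% confidence interval for the mean of the same small distribution used in 5.2; the "about two standard errors" multiplier is the usual approximation.

```python
import math
import statistics

scores = [3, 1, 5, 4, 9, 9, 8]
n = len(scores)

variance = statistics.variance(scores)  # sample variance, divides by n - 1
sd = statistics.stdev(scores)           # square root of the variance

# Approximate 95% confidence interval for the mean:
# mean +/- about 2 standard errors (SD / sqrt(n))
mean = statistics.mean(scores)
se = sd / math.sqrt(n)
print(f"mean={mean:.2f}, s^2={variance:.2f}, s={sd:.2f}, "
      f"95% CI roughly {mean - 2*se:.2f} to {mean + 2*se:.2f}")
```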

7. MEASURES OF SHAPE

For distributions summarizing data from continuous measurement scales, statistics can be used to describe how the distribution rises and drops. 1

7.1 Symmetric distributions have the same shape on both sides of the centre. A symmetric distribution with only one peak is referred to as a normal distribution.

7.2 Skewness refers to the degree of asymmetry in a distribution. Asymmetry often reflects extreme scores in a distribution.

a) Positively skewed is when the distribution has a tail extending out to the right (larger numbers). In this case, the mean is greater than the median, reflecting the fact that the mean is sensitive to each score in the distribution and is subject to large shifts when the sample is small and contains extreme scores.
b) Negatively skewed is when the distribution has an extended tail pointing to the left (smaller numbers) and reflects bunching of numbers in the upper part of the distribution with fewer scores at the lower end of the measurement scale.

7.3 Kurtosis has a specific mathematical definition, but generally, it refers to how scores are concentrated in the centre of the distribution, the upper and lower tails (ends), and the shoulders (between the centre and tails) of a distribution. 6

8. THE HYPOTHESIS

A hypothesis is an assumption about an unknown fact. Donald Rumsfeld may have been trying to explain this when he said "We know there are known knowns; these are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns - the ones we don't know we don't know". 5 Most studies explore the relationship between two variables, for example, that prenatal exposure to pesticides is associated with lower birth weight. This is called the alternative hypothesis. The null hypothesis (Ho) is the opposite of the stated hypothesis (i.e. there is no relationship in the data, or the treatment did not have any effect). Well-designed studies seek to disprove the Ho, in this case, that prenatal pesticide exposure is not associated with lower birth weight.

Tests of the results determine the probability of seeing such results if the Ho were true. The p-value indicates how unlikely this would be, and helps determine the amount of evidence needed to demonstrate that the results more than likely did not occur by chance. It describes the probability of observing the results if the null hypothesis is true. The p-value is compared against an arbitrary cut-off point (the significance level), usually 0.05 (5%) or 0.01 (1%). For example, a p-value of 0.01 means there is only a 1% chance of obtaining that same result if there was no real effect of the experiment. If the Ho can be rejected, then the test is said to be statistically 'significant'. NB: Significant is a statistical term and does not mean important!

9. CORRELATION

This refers to the association between variables, particularly where they move together.

9.1 Positive correlation means that as one variable rises or falls, the other does as well (e.g. caloric intake and weight).

9.2 Negative correlation indicates that two variables move in opposite directions (e.g. vehicle speed and travel time).

9.3 Causation must not be confused with correlation. Causation is when a change in one variable alters another, but causation flows in only ONE direction. It is also known as cause and effect. For example, sunrise causes an increase in air temperature; in addition, sunlight is positively correlated with increased temperature. However, the reverse is not true - increased temperature does not cause sunrise.

a) Regression analysis is a way to determine whether or not there is a correlation between two (or more) variables and how strong any correlation may be. It usually involves plotting data points on an X/Y axis and then looking for the average effect, i.e. looking at how the graph's dots are distributed and establishing a trend line. Again, correlation is not necessarily causation. While causation is sometimes easy to prove, frequently it can be difficult because of confounding variables (unknown factors that affect the two variables being studied). Once causation has been established, the factor that drives change (in the above example, sunlight) is the independent variable, and the variable that is driven is the dependent variable (see point 4 above).
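A brief sketch of correlation and simple linear regression on simulated paired data, using SciPy (the caloric-intake and weight numbers are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# Hypothetical paired data, e.g. caloric intake (x) and weight (y)
x = rng.normal(2500, 300, 50)
y = 50 + 0.01 * x + rng.normal(0, 3, 50)

r, p = stats.pearsonr(x, y)   # correlation coefficient and p-value
fit = stats.linregress(x, y)  # simple linear regression (trend line)
print(f"r = {r:.2f} (p = {p:.3f}); slope = {fit.slope:.4f}, intercept = {fit.intercept:.1f}")
```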

CONCLUSIONS

Understanding commonly used statistical terms should help clinicians decipher and understand research data analysis, and equip them with the knowledge needed to analyze results more critically. Perhaps then, the old adage of "All readers can read, but not all who can read are readers" will no longer be true of those reading the SADJ.

1. University of Barotseland. Statistics - Introduction to Basic Concepts. bobhall.tamu.edu/FiniteMath/Introduction.html, 2014.

2. Schoeman H. Biostatistics for the Health Sciences. University of Medunsa, South Africa, 2003. p. 78-91.

3. Wikipedia. http://www.brainyquote.com/quotes/keywords/statistics.html, 2015.

4. Senn D, Weems RA. Manual of Forensic Odontology. 5th ed. C. Press, 1997. Chapter 3.

5. Light R, Singer JD, Willett JB. You can't fix by analysis what you bungled by design. Course materials, quotes. https://advanceddataanalytics.net/quotes/, 2014.

6. Wikimedia. Statistical terms used in research studies; a primer for media. journalistresource.org/research/statistics-for-journalists, 2015.

7. Green Thompson L. Multiple choice exam setting workshop. Johannesburg, 2015. Accessed at: http://download.usmle.org/iwtutorial/intro.htm.

8. Green Thompson L. Multiple choice question paper setting. Johannesburg, 2015. Accessed at: http://www.nbme.org/publications/item-writing-manual/html.


Types of Variables in Research & Statistics | Examples

Published on September 19, 2022 by Rebecca Bevans . Revised on June 21, 2023.

In statistical research, a variable is defined as an attribute of an object of study. Choosing which variables to measure is central to good experimental design.

If you want to test whether some plant species are more salt-tolerant than others, some key variables you might measure include the amount of salt you add to the water, the species of plants being studied, and variables related to plant health like growth and wilting .

You need to know which types of variables you are working with in order to choose appropriate statistical tests and interpret the results of your study.

You can usually identify the type of variable by asking two questions:

  • What type of data does the variable contain?
  • What part of the experiment does the variable represent?

Table of contents

  • Types of data: quantitative vs categorical variables
  • Parts of the experiment: independent vs dependent variables
  • Other common types of variables
  • Other interesting articles
  • Frequently asked questions about variables

Data is a specific measurement of a variable – it is the value you record in your data sheet. Data is generally divided into two categories:

  • Quantitative data represents amounts
  • Categorical data represents groupings

A variable that contains quantitative data is a quantitative variable; a variable that contains categorical data is a categorical variable. Each of these types of variables can be broken down into further types.

Quantitative variables

When you collect quantitative data, the numbers you record represent real amounts that can be added, subtracted, divided, etc. There are two types of quantitative variables: discrete and continuous.

Discrete vs continuous variables

  • Discrete variables (aka integer variables): counts of individual items or values (e.g. the number of objects in a collection).
  • Continuous variables (aka ratio variables): measurements of continuous or non-finite values (e.g. water volume or weight).

Categorical variables

Categorical variables represent groupings of some kind. They are sometimes recorded as numbers, but the numbers represent categories rather than actual amounts of things.

There are three types of categorical variables: binary, nominal, and ordinal variables.

Binary vs nominal vs ordinal variables

  • Binary variables (aka dichotomous variables): yes or no outcomes (e.g. heads or tails in a coin flip).
  • Nominal variables: groups with no rank or order between them (e.g. brands of cereal).
  • Ordinal variables: groups that are ranked in a specific order (e.g. finishing places in a race).*

*Note that sometimes a variable can work as more than one type! An ordinal variable can also be used as a quantitative variable if the scale is numeric and doesn’t need to be kept as discrete integers. For example, star ratings on product reviews are ordinal (1 to 5 stars), but the average star rating is quantitative.
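
As a small illustration of the starred note above, the following sketch (hypothetical ratings, not taken from the original article) treats the same star ratings first as an ordered categorical variable and then as a quantitative one when averaging.

```python
# Hypothetical star ratings: ordinal as individual values, quantitative once averaged.
import pandas as pd

ratings = pd.Series([5, 4, 4, 3, 5, 2], name="stars")

# Treated as ordinal: an ordered categorical with ranked levels 1 < 2 < ... < 5
ordinal = pd.Categorical(ratings, categories=[1, 2, 3, 4, 5], ordered=True)
print(ordinal.min(), ordinal.max())   # comparisons respect the ranking

# Treated as quantitative: the same numbers can be summed and averaged
print(ratings.mean())                 # average star rating (about 3.83 here)
```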

Example data sheet

To keep track of your salt-tolerance experiment, you make a data sheet where you record information about the variables in the experiment, like salt addition and plant health.

To gather information about plant responses over time, you can fill out the same data sheet every few days until the end of the experiment. This example sheet is color-coded according to the type of variable: nominal, continuous, ordinal, and binary.

[Figure: Example data sheet showing types of variables in a plant salt tolerance experiment]
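
For readers who keep their data sheet in code rather than in a spreadsheet, a rough sketch of such a sheet might look like the following; the column names and values are invented for illustration and are not taken from the original example sheet.

```python
# Hypothetical salt-tolerance data sheet; each column is one type of variable.
import pandas as pd

data_sheet = pd.DataFrame({
    "species": ["A", "B", "A", "B"],             # nominal (categorical)
    "salt_added_g": [0.0, 0.0, 5.0, 5.0],        # continuous (quantitative)
    "plant_height_cm": [12.1, 10.4, 9.8, 8.2],   # continuous (quantitative)
    "health_rating": [3, 3, 2, 1],               # ordinal (1 = poor, 3 = healthy)
    "wilting": [False, False, True, True],       # binary (yes/no)
})

# Record the intended variable types explicitly so later analyses treat them correctly
data_sheet["species"] = data_sheet["species"].astype("category")
data_sheet["health_rating"] = pd.Categorical(
    data_sheet["health_rating"], categories=[1, 2, 3], ordered=True
)
print(data_sheet.dtypes)
```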


Experiments are usually designed to find out what effect one variable has on another – in our example, the effect of salt addition on plant growth.

You manipulate the independent variable (the one you think might be the cause) and then measure the dependent variable (the one you think might be the effect) to find out what this effect might be.

You will probably also have variables that you hold constant (control variables) in order to focus on your experimental treatment.

Independent vs dependent vs control variables

  • Independent variables (aka treatment variables): variables you manipulate in order to affect the outcome of an experiment. Example: the amount of salt added to each plant's water.
  • Dependent variables (aka response variables): variables that represent the outcome of the experiment. Example: any measurement of plant health and growth - in this case, plant height and wilting.
  • Control variables: variables that are held constant throughout the experiment. Example: the temperature and light in the room the plants are kept in, and the volume of water given to each plant.

In this experiment, we have one independent and three dependent variables.

The other variables in the sheet can’t be classified as independent or dependent, but they do contain data that you will need in order to interpret your dependent and independent variables.

[Figure: Example of a data sheet showing dependent and independent variables for a plant salt tolerance experiment]
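
As a minimal sketch of how these roles enter an analysis (hypothetical numbers, not from the original experiment), the manipulated salt level defines the groups below and the measured plant height is compared between them; the control variables are held constant by the experimental setup rather than appearing in the test.

```python
# Hypothetical heights: independent variable = salt level (0 g vs 5 g),
# dependent variable = plant height measured in each group.
from scipy import stats

height_no_salt = [12.1, 11.7, 12.4, 11.9]    # dependent variable at 0 g salt
height_with_salt = [9.8, 8.9, 10.1, 9.4]     # dependent variable at 5 g salt

t_stat, p_value = stats.ttest_ind(height_no_salt, height_with_salt)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```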

What about correlational research?

When you do correlational research, the terms "dependent" and "independent" don't apply, because you are not trying to establish a cause and effect relationship (causation).

However, there might be cases where one variable clearly precedes the other (for example, rainfall leads to mud, rather than the other way around). In these cases you may call the preceding variable (i.e., the rainfall) the predictor variable and the following variable (i.e., the mud) the outcome variable.

Once you have defined your independent and dependent variables and determined whether they are categorical or quantitative, you will be able to choose the correct statistical test.

But there are many other ways of describing variables that help with interpreting your results. Some useful types of variables are listed below.

Other common types of variables (with examples from the salt tolerance experiment):

  • Confounding variables: a variable that hides the true effect of another variable in your experiment. This can happen when another variable is closely related to a variable you are interested in, but you haven't controlled for it in your experiment. Be careful with these, because confounding variables run a high risk of introducing research bias into your work. Example: pot size and soil type might affect plant survival as much as or more than salt addition; in an experiment you would control these potential confounders by holding them constant.
  • Latent variables: a variable that can't be directly measured, but that you represent via a proxy. Example: salt tolerance in plants cannot be measured directly, but can be inferred from measurements of plant health in our salt-addition experiment.
  • Composite variables: a variable that is made by combining multiple variables in an experiment. These variables are created when you analyze data, not when you measure it. Example: the three plant health variables could be combined into a single plant-health score to make it easier to present your findings.
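
The composite-variable idea can be sketched in a few lines of code; the column names and the standardise-then-average scoring rule below are illustrative assumptions rather than the article's prescribed method.

```python
# Hypothetical plant-health measurements combined into one composite score.
import pandas as pd

plants = pd.DataFrame({
    "height_cm": [12.1, 10.4, 9.8, 8.2],
    "leaf_count": [14, 11, 9, 6],
    "wilting": [0, 0, 1, 1],   # 1 = wilting, which counts against health
})

# z-score each component so they contribute on a comparable scale
z = (plants - plants.mean()) / plants.std()
z["wilting"] = -z["wilting"]              # flip sign: more wilting = worse health

plants["health_score"] = z.mean(axis=1)   # the composite variable
print(plants)
```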

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Student’s  t -distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic


You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause, while a dependent variable is the effect.

In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:

  • The  independent variable  is the amount of nutrients added to the crop field.
  • The  dependent variable is the biomass of the crops at harvest time.

Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design.

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design, it's important to identify potential confounding variables and plan how you will reduce their impact.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.

Discrete and continuous variables are two types of quantitative variables:

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).


