pyDOE: The Experimental Design Package for Python

The pyDOE package is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs.

All available designs can be accessed after a simple import statement:
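For example:

```python
# Makes the design generators (fullfact, ff2n, fracfact, pbdesign, bbdesign,
# ccdesign, lhs) available in the current namespace
from pyDOE import *
```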

Capabilities

The package currently includes functions for creating designs for any number of factors:

  • General Full-Factorial (fullfact)
  • 2-Level Full-Factorial (ff2n)
  • 2-Level Fractional-Factorial (fracfact)
  • Plackett-Burman (pbdesign)
  • Box-Behnken (bbdesign)
  • Central-Composite (ccdesign)
  • Latin-Hypercube (lhs)

Requirements

Installation and download

Important note

The installation commands below should be run in a DOS or Unix command shell (not in a Python shell).

Under Windows (version 7 and earlier), a command shell can be obtained by running cmd.exe (through the Run… menu item from the Start menu). Under Unix (Linux, Mac OS X,…), a Unix shell is available when opening a terminal (in Mac OS X, the Terminal program is found in the Utilities folder, which can be accessed through the Go menu in the Finder).

Automatic install or upgrade

One of the automatic installation or upgrade procedures below might work on your system, if you have a Python package installer or use certain Linux distributions.

Under Unix, it may be necessary to prefix the commands below with sudo, so that the installation program has sufficient access rights to the system.

If you have pip, you can try to install the latest version with:
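```
pip install --upgrade pyDOE
```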

If you have setuptools, you can try to automatically install or upgrade this package with:
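```
easy_install --upgrade pyDOE
```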

Manual download and install

Alternatively, you can simply download the package archive from the Python Package Index (PyPI) and unpack it. The package can then be installed by going into the unpacked directory (pyDOE-...), and running the provided setup.py program with:
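```
python setup.py install
```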

or, for an installation in the user Python library (no additional access rights needed):
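```
python setup.py install --user
```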

or, for an installation in a custom directory my_directory:
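```
python setup.py install --install-lib my_directory
```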

or, if additional access rights are needed (Unix):
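```
sudo python setup.py install
```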

You can also simply move the pyDOE-py* directory that corresponds best to your version of Python to a location that Python can import from (a directory in which scripts using pyDOE are run, etc.); the chosen pyDOE-py* directory should then be renamed pyDOE. Python 3 users should then run 2to3 -w . from inside this directory so as to automatically adapt the code to Python 3.

Source code

The latest, bleeding-edge but working code and documentation source are available on GitHub.

Any feedback, questions, bug reports, or success stories should be sent to the author. I’d love to hear from you!

This code was originally published by the following individuals for use with Scilab:

  • Copyright (C) 2012 - 2013 - Michael Baudin
  • Copyright (C) 2012 - Maria Christopoulou
  • Copyright (C) 2010 - 2011 - INRIA - Michael Baudin
  • Copyright (C) 2009 - Yann Collette
  • Copyright (C) 2009 - CEA - Jean-Marc Martinez
  • Website: forge.scilab.org/index.php/p/scidoe/sourcetree/master/macros

Many thanks go to these individuals.

This package is provided under two licenses:

  • The BSD License (3-Clause)
  • Any other that the author approves (just ask!)

References

  • Factorial designs
  • Plackett-Burman designs
  • Box-Behnken designs
  • Central composite designs
  • Latin-Hypercube designs

There is also a wealth of information on the NIST website about the various design matrices that can be created as well as detailed information about designing/setting-up/running experiments in general.


pyDOE3: An Experimental Design Package for Python

pyDOE3 is a fork of pyDOE2, which is itself a fork of pyDOE.

Just as pyDOE2 did with respect to pyDOE, pyDOE3 came to life to solve bugs and issues that remained unsolved in pyDOE2.

The pyDOE3 package is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs.

All available designs can be accessed after a simple import statement:
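Presumably mirroring pyDOE:

```python
from pyDOE3 import *
```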

Capabilities

The package currently includes functions for creating designs for any number of factors:

Factorial Designs:

  • General Full-Factorial (fullfact)
  • 2-Level Full-Factorial (ff2n)
  • 2-Level Fractional-Factorial (fracfact)
  • Plackett-Burman (pbdesign)
  • Generalized Subset Design (gsd)

Response-Surface Designs:

  • Box-Behnken (bbdesign)
  • Central-Composite (ccdesign)

Randomized Designs:

  • Latin-Hypercube (lhs)

Requirements

Installation
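Presumably via pip (the distribution name pyDOE3 is assumed from the project name):

```
pip install pyDOE3
```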

or with the Anaconda distribution:
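A sketch, assuming the package is published on the conda-forge channel:

```
conda install -c conda-forge pydoe3
```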

This code was originally published by the following individuals for use with Scilab:

  • Copyright (C) 2012-2013, Michael Baudin
  • Copyright (C) 2012, Maria Christopoulou
  • Copyright (C) 2010-2011, INRIA, Michael Baudin
  • Copyright (C) 2009, Yann Collette
  • Copyright (C) 2009, CEA, Jean-Marc Martinez
  • Copyright (c) 2014, Abraham D. Lee & tisimst
  • Copyright (c) 2018, Rickard Sjögren & Daniel Svensson

Many thanks go to these individuals.

This package is provided under The BSD License (3-Clause).

References

  • Factorial designs
  • Plackett-Burman designs
  • Box-Behnken designs
  • Central composite designs
  • Latin-Hypercube designs

There is also a wealth of information on the NIST website about the various design matrices that can be created as well as detailed information about designing/setting-up/running experiments in general.


dexpy - Design of Experiments (DOE) in Python

dexpy is a Design of Experiments (DOE) package based on the Design-Expert® software from Stat-Ease, Inc. If you’re new to the area of DOE, here is a primer to help get you started.

The primary purpose of this package is to construct experimental designs. After performing your experiment, you can analyze the collected data using packages such as statsmodels. However, there are also functions that fill in holes in the existing statistical analysis packages, for example statistical power.

  • Dependencies on Windows
  • Screening Designs
  • Response Surface Designs
  • Mixture Designs
  • Optimal Designs
  • Statistical Power
  • Analyzing the Results
  • Confirmation
  • Optimization
  • Problem Description
  • Design Choice
  • Run the Experiment
  • Fit a Model



Design-of-Experiment (DOE) matrix generator for engineering and statistics


Copyright Notice and Code repository

Copyright (c): 2018-2028, Dr. Tirthajyoti Sarkar, Sunnyvale, CA 94086

It uses an MIT License, so although I retain the copyright of this particular code, please feel free to exercise your rights to this free software by using and enhancing it.

Please get the codebase from here.

Table of Contents

  • What is a scientific experiment?
  • What is experimental design?
  • Options for an open-source DOE builder package in Python
  • Limitation of the foundation packages used
  • Simplified user interface
  • Designs available
  • What supporting packages are required?
  • Errata for using pyDOE
  • How to run the program
  • Is an installer/Python library available?
  • Full-factorial design
  • Fractional-factorial design
  • Central-composite design
  • Latin hypercube design
  • Acknowledgements and requirements

Introduction

Design of Experiments (DOE) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis. This exercise has become critical in this age of the rapidly expanding field of data science and its associated statistical modeling and machine learning. A well-planned DOE can give a researcher a meaningful data set to act upon with an optimal number of experiments, preserving critical resources.

After all, the aim of data science is essentially to conduct the highest-quality scientific investigation and modeling with real-world data. And to do good science with data, one needs to collect it through carefully thought-out experiments that cover all corner cases and reduce any possible bias.

In its simplest form, a scientific experiment aims at predicting the outcome by introducing a change of the preconditions, which is represented by one or more independent variables , also referred to as “input variables” or “predictor variables.” The change in one or more independent variables is generally hypothesized to result in a change in one or more dependent variables , also referred to as “output variables” or “response variables.” The experimental design may also identify control variables that must be held constant to prevent external factors from affecting the results.

Experimental design involves not only the selection of suitable independent, dependent, and control variables, but also planning the delivery of the experiment under statistically optimal conditions given the constraints of available resources. There are multiple approaches for determining the set of design points (unique combinations of the settings of the independent variables) to be used in the experiment.

Main concerns in experimental design include the establishment of validity , reliability , and replicability . For example, these concerns can be partially addressed by carefully choosing the independent variable, reducing the risk of measurement error, and ensuring that the documentation of the method is sufficiently detailed. Related concerns include achieving appropriate levels of statistical power and sensitivity .

The need for careful design of experiments arises in all fields of serious scientific, technological, and even social-science investigation: computer science, physics, geology, political science, electrical engineering, psychology, business marketing analysis, financial analytics, etc.

Unfortunately, the majority of state-of-the-art DOE generators are part of commercial statistical software packages like JMP (SAS) or Minitab. However, researchers would surely benefit from an open-source tool that presents an intuitive user interface for generating an experimental design plan from a simple list of input variables. There are a couple of DOE builder Python packages, but individually they don't cover all the necessary DOE methods, and they lack a simplified user API where one can just input a CSV file of input variables' ranges and get back the DOE matrix in another CSV file.

This set of codes is a collection of functions which wrap around the core packages (mentioned below) and generate design-of-experiment (DOE) matrices for a statistician or engineer from an arbitrary range of input variables.

Neither of the core packages, which act as foundations of this repo, is complete in the sense of covering all the functions a design engineer may need to generate a DOE table while planning an experiment. They also offer only low-level APIs, in the sense that their standard output is normalized NumPy arrays. It was felt that users who may not be comfortable dealing with Python objects directly should be able to take advantage of their functionality through a simplified user interface.

The user just needs to provide a simple CSV file with a single table of variables and their ranges (2-level, i.e. min/max, or 3-level). Some of the functions work with a 2-level min/max range, while others need 3-level ranges from the user (low-mid-high). Intelligence is built into the code to handle cases where the range input is not appropriate and to generate levels by simple linear interpolation from the given input. The code will generate the DOE as per the user's choice and write the matrix to a CSV file on disk. In this way, the only API the user is exposed to consists of input and output CSV files. These files can then be used in any engineering simulator, software, or process-control module, or fed into process equipment.

  • Full factorial,
  • 2-level fractional factorial,
  • Plackett-Burman,
  • Sukharev grid,
  • Box-Behnken,
  • Box-Wilson (Central-composite) with center-faced option,
  • Box-Wilson (Central-composite) with center-inscribed option,
  • Box-Wilson (Central-composite) with center-circumscribed option,
  • Latin hypercube (simple),
  • Latin hypercube (space-filling),
  • Random k-means cluster,
  • Maximin reconstruction,
  • Halton sequence based,
  • Uniform random matrix

How to use it?

First make sure you have all the necessary packages installed. You can simply run the .bash (Unix/Linux) and .bat (Windows) files provided in the repo to install those packages from your command-line interface. They contain the following commands:
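The exact commands live in those script files; a plausible reconstruction covering the two core packages named later on this page is:

```
# reconstructed guess -- see the .bash/.bat files in the repo for the real list
pip install pydoe diversipy
```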

Please note that, as installed, pyDOE will throw some errors related to type conversion. There are two options:

  • I have modified the pyDOE code suitably and included a file with rewritten functions in the repo. This is the file called by the program while executing, so you should see no error.
  • If you encounter any error, you could try to modify the pyDOE code by going to the folder where the pyDOE files are copied and copying over the two files doe_factorial.py and doe_box_behnken.py supplied with this repo.

Note this is just a code repository and not an installer package. For the time being, please clone this repo from GitHub and store all the files in a local directory.

git clone https://github.com/tirthajyoti/Design-of-experiment-Python.git

Then start using the software by simply typing,

python Main.py

After this, a simple menu will be printed on the screen and you will be prompted for a choice of number (a DOE) and name of the input CSV file (containing the names and ranges of your variables).


You must have an input parameters CSV file stored in the same directory that you are running this code from. You should use the supplied generic CSV file as an example. Please put the factors in the columns and the levels in the rows (not the other way around). A couple of example CSV files are provided in the repo. Feel free to modify them as per your needs.

Is an installer/Python library available? At this time, no. I plan to work on turning this into a full-fledged Python library which can be installed from the PyPI repository with a pip command. But I cannot promise any timeline for that :-) If somebody wants to collaborate and work on an installer, please feel free to do so.

Let’s say the input file contains the following table for the parameter ranges. Imagine this as a generic example of a chemical process in a plant.

Pressure Temperature FlowRate Time
40 290 0.2 5
55 320 0.3 8
70 350 0.4 11

If we build a full-factorial DOE out of this, we will get a table with 81 entries, because 4 factors, each varied at 3 levels, result in 3^4 = 81 combinations!

Pressure Temperature FlowRate Time
40 290 0.2 5
50 290 0.2 5
70 290 0.2 5
40 320 0.2 5
50 320 0.2 5
70 320 0.2 5
40 290 0.3 8
50 290 0.3 8
70 290 0.3 8
40 320 0.3 8
50 320 0.3 8
70 320 0.3 8
40 320 0.4 11
50 320 0.4 11
70 320 0.4 11
40 350 0.4 11
50 350 0.4 11
70 350 0.4 11

Clearly, the full-factorial design grows quickly! Engineers and scientists therefore often use half-factorial/fractional-factorial designs, where they confound one or more factors with other factors and build a reduced DOE. Let’s say we decide to build a 2-level fractional-factorial of this set of variables, with the 4th variable as the confounding factor (i.e. not an independent variable but a function of the other variables). If the functional relationship is “A B C BC”, i.e. the 4th parameter varies depending only on the 2nd and 3rd parameters, the output table could look like:

Pressure Temperature FlowRate Time
40 290 0.2 11
70 290 0.2 11
40 350 0.2 5
70 350 0.2 5
40 290 0.4 5
70 290 0.4 5
40 350 0.4 11
70 350 0.4 11
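As a sketch, pyDOE's fracfact can generate such a confounded design in coded units; the generator string 'a b c bc' makes the fourth column the product of the second and third:

```python
# Coded (-1/+1) 2-level fractional-factorial with pyDOE (assumed installed);
# this repo's wrapper would handle scaling back to engineering units.
from pyDOE import fracfact

design = fracfact('a b c bc')  # 8 runs x 4 factors
print(design)
```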

A Box-Wilson Central Composite Design, commonly called ‘a central composite design,’ contains an embedded factorial or fractional factorial design with center points that is augmented with a group of ‘star points’ that allow estimation of curvature. One central composite design consists of cube points at the corners of a unit cube that is the product of the intervals [-1,1], star points along the axes at or outside the cube, and center points at the origin. Central composite designs are of three types. Circumscribed (CCC) designs are as described above. Inscribed (CCI) designs are as described above, but scaled so the star points take the values -1 and +1, and the cube points lie in the interior of the cube. Faced (CCF) designs have the star points on the faces of the cube. Faced designs have three levels per factor, in contrast with the other types, which have five levels per factor. The following figure shows these three types of designs for three factors. [Read this page](http://blog.minitab.com/blog/understanding-statistics/getting-started-with-factorial-design-of-experiments-doe) for more information about this kind of design philosophy.
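A minimal sketch of generating the three variants with pyDOE's ccdesign (the face argument values follow pyDOE's documented options):

```python
# Coded-unit central composite designs for 3 factors with pyDOE (assumed installed)
from pyDOE import ccdesign

ccc = ccdesign(3, face='circumscribed')  # CCC: star points outside the cube
cci = ccdesign(3, face='inscribed')      # CCI: scaled so star points sit at +/-1
ccf = ccdesign(3, face='faced')          # CCF: star points on the cube faces
```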

Sometimes, a set of randomized design points within a given range could be attractive for the experimenter to assess the impact of the process variables on the output. Monte Carlo simulations are a close example of this approach. However, a Latin Hypercube design is a better choice for experimental design than building a completely random matrix, as it tries to subdivide the sample space into smaller cells and chooses only one element out of each subcell. This way, a more ‘uniform spreading’ of the random sample points can be obtained. The user can choose the density of sample points. For example, if we choose to generate a Latin Hypercube of 12 experiments from the same input file, it could look like:

Pressure Temperature FlowRate Time
63.16 313.32 0.37 10.52
61.16 343.88 0.23 5.04
57.83 327.46 0.35 9.47
68.61 309.81 0.35 8.39
66.01 301.29 0.22 6.34
45.76 347.97 0.27 6.94
40.48 320.72 0.29 9.68
51.46 293.35 0.20 7.11
43.63 334.92 0.30 7.66
47.87 339.68 0.26 8.59
55.28 317.68 0.39 5.61
53.99 297.07 0.32 10.43

Of course, there is no guarantee that you will get the same matrix if you run this function, because these are randomly sampled, but you get the idea!
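A sketch of how such a design could be built with pyDOE's lhs, scaling unit-cube samples to the min/max ranges of the example table:

```python
# Latin Hypercube sampling with pyDOE (assumed installed)
import numpy as np
from pyDOE import lhs

mins = np.array([40, 290, 0.2, 5])    # Pressure, Temperature, FlowRate, Time
maxs = np.array([70, 350, 0.4, 11])

unit = lhs(4, samples=12)             # 12 points in the 4-D unit hypercube
scaled = mins + unit * (maxs - mins)  # map to engineering units
print(scaled.round(2))
```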

The code was written in Python 3.6. It uses the following external packages, which need to be installed on your system:

  • pydoe: A package designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs. Check the docs here.
  • diversipy: A collection of algorithms for sampling in hypercubes, selecting diverse subsets, and measuring diversity. Check the docs here.


Design of experiments for Python

relf/pyDOE3


pyDOE3: an experimental design package for Python

Documentation

pyDOE3 is a fork of the pyDOE2 package that is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs.

This fork came to life to solve bugs and issues that remained unsolved in the original package.

Capabilities

The package currently includes functions for creating designs for any number of factors:

  • General Full-Factorial (fullfact)
  • 2-level Full-Factorial (ff2n)
  • 2-level Fractional Factorial (fracfact)
  • Plackett-Burman (pbdesign)
  • Generalized Subset Designs (gsd)
  • Box-Behnken (bbdesign)
  • Central-Composite (ccdesign)
  • Latin-Hypercube (lhs)

See Documentation.

Installation
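Presumably installable from PyPI (the distribution name pyDOE3 is assumed from the project name):

```
pip install pyDOE3
```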

The original pyDOE code was converted from code published by the following individuals for use with Scilab:

  • Copyright (C) 2012-2013, Michael Baudin
  • Copyright (C) 2012, Maria Christopoulou
  • Copyright (C) 2010-2011, INRIA, Michael Baudin
  • Copyright (C) 2009, Yann Collette
  • Copyright (C) 2009, CEA, Jean-Marc Martinez

pyDOE was converted to Python by the following individual:

  • Copyright (c) 2014, Abraham D. Lee

The following individuals forked pyDOE and worked on pyDOE2 :

  • Copyright (C) 2018, Rickard Sjögren and Daniel Svensson

This package is provided under the BSD License (3-clause).

  • Factorial designs
  • Plackett-Burman designs
  • Box-Behnken designs
  • Central composite designs
  • Latin-Hypercube designs
  • Surowiec, Izabella, Ludvig Vikström, Gustaf Hector, Erik Johansson, Conny Vikström, and Johan Trygg. “Generalized Subset Designs in Analytical Chemistry.” Analytical Chemistry 89, no. 12 (June 20, 2017): 6491–97. https://doi.org/10.1021/acs.analchem.7b00506 .



Lesson 1: Introduction to Design of Experiments

Overview

In this course we will pretty much cover the textbook - all of the concepts and designs included. I think we will have plenty of examples to look at and experience to draw from.

Please note: the main topics listed in the syllabus follow the chapters in the book.

A word of advice regarding the analyses: the prerequisites for this course are STAT 501 - Regression Methods and STAT 502 - Analysis of Variance. However, the focus of the course is on the design and not on the analysis. Thus, one can successfully complete this course without these prerequisites, with just STAT 500 - Applied Statistics for instance, but it will require much more work and will afford less appreciation of the subtleties involved in the analysis. You might say it is more conceptual than it is math oriented.

  Text Reference: Montgomery, D. C. (2019). Design and Analysis of Experiments , 10th Edition, John Wiley & Sons. ISBN 978-1-119-59340-9

What is the Scientific Method?

Do you remember learning about this back in high school or junior high even? What were those steps again?

Decide what phenomenon you wish to investigate. Specify how you can manipulate the factor and hold all other conditions fixed, to ensure that these extraneous conditions aren't influencing the response you plan to measure.

Then measure your chosen response variable at several (at least two) settings of the factor under study. If changing the factor causes the phenomenon to change, then you conclude that there is indeed a cause-and-effect relationship at work.

How many factors are involved when you do an experiment? Some say two - perhaps this is a comparative experiment? Perhaps there is a treatment group and a control group? If you have a treatment group and a control group then, in this case, you probably only have one factor with two levels.

How many of you have baked a cake? What are the factors involved in ensuring a successful cake? Factors might include preheating the oven, baking time, ingredients, amount of moisture, baking temperature, etc. -- what else? You probably follow a recipe, so there are many additional factors that control the ingredients - i.e., a mixture. In other words, someone did the experiment in advance! What parts of the recipe did they vary to make the recipe a success? Probably many factors: temperature and moisture, various ratios of ingredients, and presence or absence of many additives. Now, should one keep all the factors involved in the experiment at a constant level and just vary one to see what would happen? This is a strategy that works but is not very efficient. This is one of the concepts that we will address in this course.

Upon successful completion of this lesson, you should be able to:

  • understand the issues and principles of Design of Experiments (DOE),
  • understand experimentation is a process,
  • list the guidelines for designing experiments, and
  • recognize the key historical figures in DOE.

ADOpy: a Python package for adaptive design optimization

  • Published: 08 September 2020
  • Volume 53, pages 874–897 (2021)

Jaeyeong Yang, Mark A. Pitt, Woo-Young Ahn & Jay I. Myung

Experimental design is fundamental to research, but formal methods to identify good designs are lacking. Advances in Bayesian statistics and machine learning offer algorithm-based ways to identify good experimental designs. Adaptive design optimization (ADO; Cavagnaro, Myung, Pitt, & Kujala, 2010; Myung, Cavagnaro, & Pitt, 2013) is one such method. It works by maximizing the informativeness and efficiency of data collection, thereby improving inference. ADO is a general-purpose method for conducting adaptive experiments on the fly and can lead to rapid accumulation of information about the phenomenon of interest with the fewest number of trials. The nontrivial technical skills required to use ADO have been a barrier to its wider adoption. To increase its accessibility to experimentalists at large, we introduce an open-source Python package, ADOpy, that implements ADO for optimizing experimental design. The package, available on GitHub, is written using high-level modular-based commands such that users do not have to understand the computational details of the ADO algorithm. In this paper, we first provide a tutorial introduction to ADOpy and ADO itself, and then illustrate its use in three walk-through examples: psychometric function estimation, delay discounting, and risky choice. Simulation data are also provided to demonstrate how ADO designs compare with other designs (random, staircase).


Introduction

A main goal of psychological research is to gain knowledge about brain and behavior. Scientific discovery is guided in part by statistical inference, and the strength of any inference depends on the quality of the data collected. Because human data always contain various types of noise, researchers need to design experiments so that the signal of interest (experimental manipulations) is amplified while unintended influences from uncontrolled variables (noise) are minimized. The design space, the stimulus set that arises from decisions about the independent variable (number of variables, number of levels of each variable), is critically important for creating a high-signal experiment.

A similarly important consideration is the stimulus presentation schedule during the experiment. This issue is often guided by two competing goals: efficiency and precision. How much data must be collected to be confident that differences between conditions could be found? This question is similar to that asked when performing a power analysis, but is focused on the performance of the participant during the experiment itself. Too few trials yield poor precision (low signal-to-noise ratio); there are simply not enough data to make an inference, for or against a prediction, with confidence. Adding more trials can increase precision, but it also increases practice effects. However, it may not be efficient to add too many trials, especially with a clinical population where time is really of the essence and when participants can easily get fatigued or bored. What then is the optimal number of trials that will provide the most precise performance estimates? A partial answer lies in recognizing that not all stimuli are equally informative. By optimizing stimulus selection in the design space, efficiency and precision can be balanced.

Methods of optimizing efficiency and precision have been developed for some experimental paradigms. The most widely used one is the staircase procedure for estimating a threshold (Cornsweet, 1962 ; Feeny et al., 1966 ; Rose et al., 1970 ), such as when measuring hearing or visual acuity. Stimuli differ along a one-dimensional continuum (intensity). The procedure operates by a simple heuristic rule, of which there are a handful of variants: The stimulus to present on one trial is determined by the response on the previous trial. Intensity is increased if the stimulus was not detected, decreased if it was. The experiment is stopped after a given number of reversals in direction has been observed. The staircase method is efficient because the general region of the threshold is identified after a relatively small number of trials, after which the remaining trials concentrate on obtaining a precise threshold estimate. Its ease of implementation and generally good results have made it a popular method across many fields in psychology.

Formal approaches to achieving these same ends (good efficiency and precision) have also been developed. They originated in the fields of optimal experimental design in statistics (Lindley, 1956 ; Atkinson & Donev, 1992 ) and active learning in machine learning (Cohn et al., 1994 ; Settles, 2009 ). In psychology, the application of these methods began in visual psychophysics (e.g., Kontsevich & Tyler, 1999 ), but has since expanded into other content areas (neuroscience, memory, decision making) and beyond. Common among them is the use of a Bayesian decision theoretic framework. The approach is intended to improve upon the staircase method by using not only the participant’s responses to guide the choice of the stimulus on the next trial, but also a mathematical model that is assumed to describe the psychological process of interest (discussed more fully below). The model-based algorithm integrates information from both sources (model predictions and participants’ responses) to present what it identifies as the stimulus that should be most informative on the next trial.

The method developed in our lab, adaptive design optimization (ADO), has been shown to be efficient and precise. For example, in visual psychophysics, contrast sensitivity functions (i.e., thresholds) can be estimated so precisely in 50 trials that small changes in luminance (brightness) can be differentiated (Gu et al., 2016 ; Hou et al., 2016 ). In delayed discounting, precise estimation of the k parameter of the hyperbolic model (a measure of impulsivity) can be obtained in fewer than 20 trials, and the estimate is 3-5 times more precise than the staircase method (Ahn et al., 2019 ). Other applications of ADO can be found in several areas of psychology such as retention memory (Cavagnaro et al., 2010 , 2011 ), risky choice decision (Cavagnaro et al., 2013a , b ; Aranovich et al., 2017 ), and in neuroscience (Lewi et al., 2009 ; DiMattina & Zhang, 2008 , 2011 ; Lorenz et al., 2016 ).

The technical expertise required to implement the ADO algorithm is nontrivial, posing a hurdle to its wider use. In this paper, we introduce an open-source Python package, dubbed ADOpy, that is intended to make the technology available to researchers who have limited background in Bayesian statistics or cognitive modeling (e.g., the hBayesDM package, Ahn et al., 2017). Only a working knowledge of Python programming is assumed. For an in-depth, comprehensive treatment of Bayesian cognitive modeling, the reader is directed to the following excellent sources written for psychologists (Lee & Wagenmakers, 2014; Farrell & Lewandowsky, 2018; Vandekerckhove et al., 2018). ADO is implemented in three two-choice tasks: psychometric function estimation, the delay discounting task (Green & Myerson, 2004), and the choice under risk and ambiguity (CRA) task (Levy et al., 2010). ADOpy easily interfaces with Python code running one of these tasks, requiring only a few definitions and one function call. Most model parameters have default values, but a simulation mode is provided for users to assess the consequences of changing parameter values. As we discuss below, this is a useful step that we encourage researchers to use to ensure the algorithm is optimized for their test situation.

The algorithm underlying ADO is illustrated in Fig. 1. It consists of three steps that are executed on each trial of an experiment: (1) design optimization; (2) experimentation; and (3) Bayesian updating. In the first step, we identify the optimal design (e.g., stimulus) out of all possible designs, the choice of which is intended to provide the most information about the quantity to be inferred (e.g., model parameters). In Step 2, an experiment is carried out with the chosen experimental design. In Step 3, the participant's response is used to update the belief about the informativeness of all designs. This revised (updated) knowledge is used to repeat the ADO cycle on the next trial of the experiment.

Figure 1. Schematic diagram illustrating the three iterative steps of adaptive design optimization (ADO).

The following section provides a short technical introduction to the ADO algorithm. Subsequent sections introduce the package and demonstrate how to use ADOpy for optimizing experimental design with walk-through examples from three domains: psychometric function estimation, delay discounting, and risky choice. Readers who prefer to concentrate on the practical application of the algorithm rather than its technicalities should skip Section “Adaptive design optimization (ADO)” and jump directly to Section “ADOpy”.

Adaptive design optimization (ADO)

ADO follows in the tradition of optimal experimental design in statistics (Lindley, 1956; Atkinson & Donev, 1992) and active learning in machine learning (Cohn et al., 1994; Settles, 2009). ADO is a model-based approach to optimization in the sense that it requires a quantitative (statistical, cognitive) model that predicts experimental outcomes based on the model's parameters and design variables (e.g., experimentally controllable independent variables). Statistically speaking, a model is defined in terms of the probability density function (PDF), a parametric family of probability distributions indexed by its parameters, denoted by p(y | θ, d), where y represents a vector of experimental outcomes, θ is the parameter vector, and finally, d is the vector of design variables.

ADO is formulated in a Bayesian framework of optimal experimental design (Chaloner & Verdinelli, 1995; Müller, 1999; Müller et al., 2004; Amzal et al., 2006). On each ADO trial, we seek to identify the optimal design d* that maximizes some real-valued function U(d) that represents the utility or usefulness of design d. Formally, the “global” utility function U(d) (Chaloner & Verdinelli, 1995) is defined as:

$$U(d) = \iint u(d, \theta, y)\, p(y \mid \theta, d)\, p(\theta)\, dy\, d\theta \tag{1}$$

where p(θ) is the prior distribution. In the above equation, u(d, θ, y), called the “local” utility function, measures the utility of a hypothetical experiment carried out with design d when the model outputs an outcome y given the parameter value θ. Note that the global utility U(d), which is a function of design d, represents the mean of the local utility u(d, θ, y) calculated across all possible outcomes and parameter values, weighted by the likelihood function p(y | θ, d) and the prior p(θ).

As is typically done in ADO, the ADOpy package adopts an information-theoretic framework in which the optimal design is defined as the one that is maximally informative about the unknown quantity of interest, i.e., the values of the parameter θ in our case. Specifically, by using Shannon's entropy, a particular local utility function is defined as $u(d, \theta, y) = \log \frac{p(\theta \mid y, d)}{p(\theta)}$. The global utility function in Eq. 1 then becomes the mutual information between the outcome random variable Y(d) and the parameter random variable Θ conditional on design d (Cover & Thomas, 1991):

$$U(d) = I(Y(d);\, \Theta) = H(Y(d)) - H(Y(d) \mid \Theta) \tag{2}$$

where H(Y(d)) is the marginal entropy (i.e., overall uncertainty) of the outcome event and H(Y(d) | Θ) is the conditional entropy of the outcome event given knowledge of the parameter θ. Accordingly, the optimal design d* that maximizes the mutual information in Eq. 2 is the one that maximally reduces the uncertainty about the parameters of interest.

Once the optimal design d* is identified, we then conduct an actual experiment on the current trial with the optimal design and observe an experimental outcome y_obs. The prior distribution p(θ) is updated via Bayes' rule with this new observation to obtain the posterior distribution p(θ | y_obs), which in turn becomes the new prior on the next trial, i.e., by replacing p(θ) with p(θ | y_obs) in Eq. 1. This “trilogy scheme” of design optimization, experimentation, and Bayesian updating, depicted in Fig. 1, is applied successively on each ADO trial until the end of the experiment.

Finding the optimal design d* that maximizes U(d) in Eq. 1 is computationally non-trivial, as it involves solving a high-dimensional maximization and integration problem. As such, obtaining an analytic-form solution for the problem is generally not possible; instead, approximate solutions must be sought numerically. For this purpose, the ADOpy package implements a grid-based algorithm for both the design optimization and Bayesian updating steps in Fig. 1. Implementation of the algorithm requires the discretization of both the continuous parameter and design spaces. That is, each element of the parameter vector θ and the design vector d is represented as a one-dimensional discretized line with a finite number of grid points. Further, the local utility function u(d, θ, y), the likelihood function p(y | θ, d), and the prior p(θ) are all represented numerically as vectors defined on the grid points.

Figure 2 describes the grid-based ADO algorithm implemented in the ADOpy package in four steps, which is adapted from Bayesian adaptive estimation algorithms in psychophysics (Kontsevich & Tyler, 1999; Kujala & Lukka, 2006; Lesmes et al., 2006). In Step 0, which is performed once at the start of the experiment, the algorithm first creates and stores in memory a look-up table of various functions over all possible (discretized) outcomes and parameter values. This involves pre-computation of the likelihood function p(y | θ, d) and the entropy H(Y(d) | θ) for all possible values of the response y, parameter θ, and design d. Also, the prior knowledge of the model parameters, p_0(θ), is initialized based on researchers' beliefs, typically as a uniform distribution. The use of pre-computed look-up tables makes it possible to run ADO-based experiments on the fly without additional computational time on each trial. The three steps of the ADO trilogy scheme illustrated in Fig. 1 are then executed.

Figure 2. Three steps of a grid-based ADO algorithm, with an initial step for pre-computation.

In brief, users can find an optimal experimental design with ADO that maximizes information gain. To use it efficiently in an experiment, grid-based ADO discretizes the possible design and parameter spaces and generates pre-computed look-up tables. For a more thorough description of the algorithm, see Cavagnaro et al. (2010) and Myung et al. (2013).

ADOpy

In this section, we provide a step-by-step guide on how to use the ADOpy package to compute optimal designs adaptively, with walk-through examples. It is assumed that readers are familiar with Python programming and have written experiment scripts using Python or some other language. For further information, a detailed guide to the ADOpy package is also provided in the official documentation (https://docs.adopy.org).

ADOpy is designed in a modular fashion to ensure functional flexibility and code readability. At the core of the package are three classes: Task, Model, and Engine. The Task class is used to define the design variables of a task. The Model class is used to define the model parameters and the probability density (or mass) function that specifies the probability of responses given parameters and designs (e.g., Myung, 2003; Farrell and Lewandowsky, 2018). The Engine class is used for implementing design optimization and Bayesian updating.

The general workflow of these classes is illustrated in Fig. 3. After loading the three classes, users should initialize each object, with the engine requiring the most parameters. The for-loop, which constitutes the experiment itself, is divided into three parts: 1) obtain the design (stimulus) for the next trial and present the stimulus to the participant; 2) obtain a response from the participant, which could come from a keyboard or mouse, as defined by the experimenter; 3) update the ADO engine with the participant's response together with the design.

Figure 3. ADOpy workflow. Each function call above is described in greater detail in Section “Basic usage”. Note that ADOpy itself is solely the engine for stimulus selection and does not include code to conduct an experiment (e.g., present the stimuli, collect responses, or save the data); the user must program these steps.

ADOpy implements a grid-search algorithm in which the design space and parameter space are discretized as sets of grid points. How to set grid points and the range of each grid dimension is described in detail in Section “Basic usage”.

Owing to the modular structure of ADOpy, users do not have to concern themselves with how the Engine works, other than defining the Task and the Model classes. Consequently, ADOpy dramatically reduces the amount of coding, and the likelihood of coding errors, when implementing ADO.

Prerequisites

Before installing ADOpy, users should install Python (version 3.5 or higher). Using the Anaconda distribution (https://www.anaconda.com) is recommended because it ensures compatibility among dependencies.

ADOpy depends on several core packages for scientific computing: NumPy, SciPy, and Pandas. Since ADOpy uses high dimensional matrices to compute optimal designs, it is strongly recommended to install linear algebra libraries (e.g., Intel Math Kernel Library, LAPACK, BLAS) to make the operations fast. If the Anaconda distribution is used, the Intel Math Kernel Library will be used as the default.

Installation

The ADOpy package is available from the Python Package Index (PyPI) and GitHub. The easiest way to install ADOpy is from PyPI using pip as follows:

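```
pip install adopy
```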

To install the development version, users can install it from GitHub. However, it can be unstable, so use it with caution.

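A sketch, assuming the GitHub repository is adopy/adopy with a develop branch:

```
pip install git+https://github.com/adopy/adopy.git@develop
```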

To check that ADOpy was installed successfully, run the following code at the Python prompt. As of now, the latest version is 0.3.1.
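For example:

```python
import adopy
print(adopy.__version__)  # e.g., '0.3.1'
```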

Module structure

Inside the ADOpy package, the two most important modules are adopy.base and adopy.tasks. The module adopy.base contains three basic classes: Task, Model, and Engine (see more details in Section “Basic usage”). Using these classes, users can apply the ADO procedure to their own tasks and models. For convenience, users can load these classes directly from adopy itself as follows:

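```python
from adopy import Task, Model, Engine
```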

The other module, adopy.tasks, contains three pre-implemented tasks and models (see Section “Tasks and Models implemented in ADOpy” and Table 1). The three tasks are psychometric function estimation (adopy.tasks.psi), the delay discounting task (adopy.tasks.ddt), and the choice under risk and ambiguity task (adopy.tasks.cra).

Basic usage

Implementation of ADOpy requires execution of the four steps shown in Fig.  3 , the most important and complex of which is the Initialization step, in which ADOpy objects to be used in the subsequent steps are defined. The Initialization step itself comprises four sub-steps: defining a task, defining a model, defining grids, and initializing an ADO engine. In this section, we explain the coding involved in each of these sub-steps using the delay discounting task as an example.

Defining a task

The Task class is for defining the experimental task. Using the Task class, a task object is initialized by specifying three types of information: the name of the task (name), the design variables (designs), and the response variable (responses).

Delay discounting (DD; the task is depicted in Fig. 4) refers to the well-established finding that animals, including humans, tend to discount the value of a delayed reward, such that the discount progressively increases as a function of the receipt delay (e.g., Green & Myerson, 2004; Vincent, 2016). The delay discounting task has been widely used to assess individual differences in temporal impulsivity and is a strong candidate endophenotype for addiction (Green & Myerson, 2004; Bickel, 2015). In a typical DD task, a participant is asked to indicate his/her preference between two options, a smaller-sooner (SS) option (e.g., 8 dollars now) and a larger-later (LL) option (e.g., 50 dollars in a month). Let us use the formal expression (R_SS, t_SS) to denote the SS option, where R_SS represents the reward amount and t_SS represents the receipt delay. Similarly, (R_LL, t_LL) denotes the LL option. By definition, the following constraints are imposed on the reward amounts and the delay times: R_SS < R_LL and t_SS < t_LL for a given pair of options. The choice response is recorded as either y = 1 (LL option) or y = 0 (SS option).

Figure 4. Illustrated scheme of the delay discounting (DD) task. On each trial, a participant is asked to choose between two options: a smaller-sooner (SS) option on the left and a larger-later (LL) option on the right. The dotted lines and arrows indicate the design variables of the task to be optimized.

The DD task therefore has four design variables, i.e., d = (t_SS, t_LL, R_SS, R_LL), with a binary response on each trial (i.e., 0 or 1). As such, we define a Task object for the DD task as follows:

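A sketch consistent with the description above (the name string is illustrative):

```python
task = Task(name='Delay discounting task',             # name of the task
            designs=['t_ss', 't_ll', 'r_ss', 'r_ll'],  # design variable labels
            responses=[0, 1])                          # possible responses
```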

where the four symbols (t_ss, t_ll, r_ss, r_ll) are short notations for the respective design variables (t_SS, t_LL, R_SS, R_LL). Note that the designs argument should be specified as labels for the design variables, while the responses argument should be given as the possible response values.

With the task object defined, the information passed into the object can be accessed via task.name, task.designs, and task.responses, respectively:

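For example:

```python
print(task.name)       # 'Delay discounting task'
print(task.designs)    # ['t_ss', 't_ll', 'r_ss', 'r_ll']
print(task.responses)  # [0, 1]
```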

Defining a model

Before making a model object, users should define a function that describes how to compute the response probability given design variables and model parameters. For example, the hyperbolic model for the delay discounting task is defined with the following set of equations:

$$V_{SS} = \frac{R_{SS}}{1 + k\, t_{SS}}, \qquad V_{LL} = \frac{R_{LL}}{1 + k\, t_{LL}}$$

$$P(\text{LL over SS}) = \frac{1}{1 + \exp\left[-\tau\,(V_{LL} - V_{SS})\right]} \tag{3}$$

where P(LL over SS) denotes the probability of choosing the LL option over the SS option, and V_LL and V_SS denote the subjective value estimates for the LL and SS options, respectively. There are two model parameters: k represents the discounting rate and τ represents the inverse temperature, which measures the consistency or stability of choice responses. For further details about the above model, the reader is referred to Section “Delay discounting task”.

Based on the above model, the following Python snippet computes the response probability:

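A sketch consistent with Eq. 3:

```python
import numpy as np

def compute_likelihood(t_ss, t_ll, r_ss, r_ll, k, tau):
    # Hyperbolically discounted subjective values of the two options (Eq. 3)
    v_ss = r_ss / (1.0 + k * t_ss)
    v_ll = r_ll / (1.0 + k * t_ll)
    # Logistic choice rule: probability of choosing the LL option
    return 1.0 / (1.0 + np.exp(-tau * (v_ll - v_ss)))
```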

The argument names for design variables in the above function definition must be the same as those used in the task definition (i.e., t_ss, r_ss, t_ll, r_ll). We also recommend using NumPy functions in the definition, given that they vectorize basic mathematical operations.

Specification of a mathematical model is performed with the Model class. Four arguments are required: the name of the model (name), a task object related to the model (task), labels for the model parameters (params), and the response probability function of the model (func), which in the current case is defined by the function compute_likelihood(). In terms of these arguments, a model object is defined as below:

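A sketch (the model name string is illustrative):

```python
model = Model(name='Hyperbolic model',    # name of the model
              task=task,                  # the Task object defined above
              params=['k', 'tau'],        # model parameter labels
              func=compute_likelihood)    # response probability function
```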

As with the task object, the information passed into the model object can be accessed via model.name, model.task, and model.params:

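For example:

```python
print(model.name)    # 'Hyperbolic model'
print(model.task)    # the associated Task object
print(model.params)  # ['k', 'tau']
```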

Further, users can run the response probability function passed into the model object with model.compute(), which takes the same arguments as the compute_likelihood() function, as follows:

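A sketch with illustrative values:

```python
# P(choose LL) for $8 now vs. $50 in 30 days, with k = 0.1 and tau = 1.0
p = model.compute(t_ss=0, t_ll=30, r_ss=8, r_ll=50, k=0.1, tau=1.0)
print(p)
```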

Defining grids

As mentioned earlier, ADOpy implements a grid-based algorithm that requires the discretization of both parameter and design spaces. As such, before running ADO using model and task objects, users must specify the grid resolution to be used for the design optimization and Bayesian updating steps in Fig.  1 . This amounts to defining the number and spacing of grid points on each dimension of the design and parameter variables. The grid passed to the ADO engine determines (1) the range of values in design variables that the ADO engine can suggest and (2) the range of the model parameters over which the computations will be carried out.

It is important to note that the number of grid points affects the efficiency and reliability of parameter estimation. The more sparse the grid, the more efficient but less precise parameter estimation will be; the denser the grid, the more precise but less efficient parameter estimation will be. Specifically, sparse grids can lead to poorly estimated model parameters whereas dense grids can require large amounts of memory and long computing times. Thus, before conducting an ADO-based experiment with participants, it is worth identifying the optimal grid resolution for each parameter/design variable. A simulation mode provided with ADOpy can help facilitate this process.

A grid object for ADOpy can be defined as a Python dictionary, using the name of a variable as its key and a list of the grid points as its value. If a design variable or model parameter needs to be fixed to a single value, users simply assign a single grid point for that variable. Also, to restrict the values of a variable, users can manually make a matrix in which each column vector indicates the possible values for one variable, then pass it as a value keyed by the column labels. The example code below illustrates ways of defining the grids for two design variables, t_ss and t_ll:

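A sketch (the ranges are illustrative; a complete grid would also cover r_ss and r_ll):

```python
import numpy as np

grid_design = {
    't_ss': [0],                     # a single grid point: t_ss fixed to "now"
    't_ll': np.linspace(1, 60, 20),  # 20 evenly spaced delays (e.g., in days)
}
```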

In much the same way, users can also define a grid for model parameters. For example, a grid for the two parameters of the delay discounting model in Eq. 3, k and tau, can be defined as:

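A sketch (ranges illustrative; log spacing for k is a common choice):

```python
grid_param = {
    'k': np.logspace(-4, 0, 20),       # discounting rate
    'tau': np.linspace(0.1, 5.0, 20),  # inverse temperature
}
```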

The reader is directed to Appendix A for more examples of defining grids for the delay discounting task.

Initializing an ADO engine

With the Model and Task objects defined and the grids for design and parameter variables in place, users are now ready to load an Engine for the ADO computation. It requires four arguments: (1) the task object (task); (2) the model object (model); (3) a grid for design variables (grid_design); and (4) a grid for model parameters (grid_param):

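A sketch (assuming grid_design has been extended to cover all four design variables):

```python
engine = Engine(task=task,                # Task object
                model=model,              # Model object
                grid_design=grid_design,  # grid for design variables
                grid_param=grid_param)    # grid for model parameters
```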

When an instance of Engine is initialized, it pre-computes response probabilities and mutual information for the given sets of designs and parameters. This step may take a while, with computing time increasing linearly with the number and resolution of the grids. For the three examples provided here, compute time is usually less than two seconds on an average Mac or Windows computer.

Once the engine object is in place, users can access:

  • its task object (engine.task)
  • its model object (engine.model)
  • the number of possible design combinations (engine.num_design)
  • the number of possible parameter combinations (engine.num_param)
  • the grid matrix of design variables (engine.grid_design)
  • the grid matrix of model parameters (engine.grid_param)
  • the prior distribution on the grid matrix of model parameters (engine.prior)
  • the posterior distribution on the grid matrix of model parameters (engine.post)
  • the posterior mean (engine.post_mean)
  • the covariance matrix of the posterior (engine.post_cov)
  • the standard deviations of the posterior (engine.post_sd)

Two functions are available on the engine object: engine.get_design() and engine.update(). engine.get_design() provides a set of designs for each trial of the experiment given a specified design type. With the design_type argument, users can indicate the type of design to use. There are two possible values: 'optimal' and 'random'. The value 'optimal' refers to the optimal design calculated by the ADO algorithm, and 'random' to a uniformly sampled design from the given design grid. The output of this function call is a dictionary that contains key-value pairs for each design variable and its optimal or random value. If no argument is given for design_type, the optimal design is returned by default.

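For example:

```python
design = engine.get_design()          # optimal design (the default)
design = engine.get_design('random')  # uniformly sampled from the design grid
print(design)  # e.g., {'t_ss': 0.0, 't_ll': 30.0, 'r_ss': 40.0, 'r_ll': 80.0}
```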

The other important function of the engine object is update(). Here, ADOpy first performs the Bayesian updating step described in Figs. 1 and 2 based on a participant's response to the given design, and then computes a new optimal design for the next trial using the updated posterior distributions of the model parameters. It takes two arguments: the design used on the given trial (design) and the corresponding response on that trial (response). For example, from the observation that a participant selects the SS option (response = 0) or the LL option (response = 1) on the current trial, users can update the posterior as follows:

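For example:

```python
engine.update(design, response)  # response: 0 (SS option) or 1 (LL option)
```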

Simulating responses

ADOpy can be run in simulation mode to assess design quality and experiment efficiency (see next section). The design itself, the model chosen, and the grid resolution of the design space and model parameters all affect how ADO performs. Simulation mode can be useful for fine-tuning the aforementioned variables. Using the engine object of the ADOpy package, users can generate simulated responses given true parameters. As a concrete example, let us run the simulation with true parameter values of k = 0.12 and tau = 1.5 for the delay discounting model described in Eq. 3. To acquire a simulated response, we use the Bernoulli probability distribution for a binary choice response, as described below:

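A sketch, reusing the model object defined earlier and the true parameter values from the text:

```python
from scipy.stats import bernoulli

def get_simulated_response(design, k=0.12, tau=1.5):
    # Response probability under the hyperbolic model for the "true" parameters
    p_obs = model.compute(k=k, tau=tau, **design)
    # Draw a binary choice (1 = LL option, 0 = SS option)
    return bernoulli.rvs(p_obs)
```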

With the functions and objects defined as above, we can now run the simulations with a code block like this:

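A sketch (the trial count is illustrative):

```python
NUM_TRIALS = 60  # illustrative

for trial in range(NUM_TRIALS):
    design = engine.get_design('optimal')      # 1) design optimization
    response = get_simulated_response(design)  # 2) experimentation (simulated)
    engine.update(design, response)            # 3) Bayesian updating
```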

Note that the above code block contains the by-now familiar trilogy: design optimization, experimentation, and Bayesian updating, in the same way as in an actual ADO-based experiment, as described in Fig. 1.

Practical issues

Users should carefully consider several practical issues when using ADOpy. Grid-based ADO, which is what is used here, may demand a lot of memory. While pre-computing a look-up table lessens repeated calculation between trials, it requires more and more memory as the grid size increases. Thus, users are advised to first determine the proper number of grid points on each dimension of the model parameters and design variables and to check whether the computation time with those settings is suitable (i.e., fast enough to prevent boredom between trials). For example, by varying the grid resolution, users can assess the trade-off between estimation accuracy and the computational cost of that resolution. Another option is to use a dynamic gridding algorithm, in which the grid space is dynamically adjusted and grid points near the posterior means are more finely spaced. Adaptive mesh refinement (AMR; e.g., Berger, 1984) is one such method. ADOpy does not currently support dynamic gridding; it may in the future.

A related practical issue is the computation time required to complete Step 0 in Fig. 2, in which initial lookup tables need to be created for the likelihood function and the entropy for all possible values of the response, parameter, and design variables. As noted above, it has been our experience that this step usually takes no more than a few seconds on standard laptops and PCs. To be concrete, for the delay discounting task, it takes about 0.5 seconds on an iMac and 1 to 2 seconds on a Windows PC to execute the pre-computation step. However, this step can become progressively time-inefficient as the dimensionality of the experimental task increases. In such a case, we recommend using the pickle module of Python to save the lookup tables and then load them back at the start of an experiment with each new participant. Other means of ensuring sufficiently fast computation are using linear algebra libraries (e.g., Intel MKL, LAPACK, or BLAS), which are highly efficient and can take advantage of multi-core CPUs, or using a remote server or a cloud computing system, where optimal designs are computed asynchronously.

ADOpy will eventually start to select the same or similar designs on consecutive trials. This is a sign that not much more can be learned from the experiment (e.g., parameter estimation is already quite good), and it will happen toward the end of an experiment if there are sufficient trials. One option for addressing the issue is to dilute the repetitions with filler trials, showing randomly chosen or predetermined designs whenever ADO picks the same design twice or more in a row. Another option is to run the experiment in a "self-terminating" mode: stop the experiment once a specific criterion is reached, e.g., when the standard deviations of the posterior distributions fall below predetermined values.

The focus of this tutorial is on using ADOpy for univariate and discrete responses. One might wonder how to extend it to multivariate and continuous responses, e.g., reaction times in a lexical decision task. Implementation follows the same grid-based scheme: given a multivariate continuous response vector y = (y1, y2, ..., ym), first discretize each response variable yi into a finite grid, and then pre-compute the likelihood function p(y | 𝜃, d) for all discretized values of the yi's, 𝜃, and d in the pre-computation Step 0 in Fig. 2. From there, the remaining steps of the ADO algorithm are the same and straightforward.

Tasks and Models implemented in ADOpy

Currently, three tasks are implemented in the ADOpy package; they are listed in Table 1: psychometric function estimation ( adopy.tasks.psi ), the delay discounting task ( adopy.tasks.dd ), and the choice under risk and ambiguity task ( adopy.tasks.cra ). At least two models are available for each task.

In this section, we describe these tasks, illustrate how to use each task/model in ADOpy, and show how ADO performs compared to traditional non-ADO methods (e.g., staircase, random), along with simulation results for the three tasks. In addition, we provide and discuss a complete, annotated Python script for simulating psychometric function estimation in ADOpy.

  • Psychometric function estimation

Psychometric function estimation is one of the first modeling problems in the psychological sciences in which a Bayesian adaptive framework was applied to improve the efficiency of psychophysical testing and analysis (Watson & Pelli, 1983 ; King-Smith et al., 1994 ; Kujala & Lukka, 2006 ; Lesmes et al., 2006 ). The problem involves a 2-alternative forced choice (2AFC) task in which the participant decides whether a psychophysical stimulus, visual or auditory, is present or absent while the stimulus intensity is varied from trial to trial to assess perceptual sensitivity.

The psychometric function that defines the probability of correct detection given stimulus intensity x takes the following general form (Garcia-Perez, 1998; Wichmann & Hill, 2001):
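p(y = 1 | x) = γ + (1 − γ − δ) F(x; α, β)        (4)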

The participant's response in the psychophysical task is recorded as either y = 1 (correct) or y = 0 (incorrect). The two-parameter sigmoid function F(x; α, β) that characterizes the relationship between the response probability and the stimulus intensity is typically assumed to follow the logistic, cumulative normal, or cumulative log-Weibull form (see, e.g., Wichmann & Hill, 2001, for further details). The parameter vector 𝜃 = (α, β, γ, δ) of the psychometric function consists of α (threshold), β (slope), γ (guess rate), and δ (lapse rate), as depicted in Fig. 5. Note that the design variable is the stimulus intensity, i.e., d = x.

Fig. 5: The psychometric function and its parameters defined in Eq. 4

The module ‘ adopy.tasks.psi ’ included in the ADOpy package provides classes for psychometric function estimation in the 2AFC experimental paradigm (see Table  1 ). In the module, Task2AFC is pre-defined for 2AFC tasks with a single design variable ( stimulus ) and binary responses (0 for incorrect or 1 for correct). Without passing any arguments, users can utilize the pre-defined Task2AFC class as below:

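A sketch of the corresponding code (the attribute access is shown for illustration; names may differ across ADOpy versions):

```python
from adopy.tasks.psi import Task2AFC

task = Task2AFC()
print(task.designs)    # expected: ['stimulus']
print(task.responses)  # expected: [0, 1]
```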

For the task, users can specify the form of the two-parameter sigmoid psychometric function F(x; α, β) in Eq. 4 from three classes: a logistic function ( ModelLogistic ), a log-Weibull CDF ( ModelWeibull ), and a normal CDF ( ModelProbit ). Here, assume that the psychometric function has the logistic form, which computes the probability of correct detection as:
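p(y = 1 | x) = γ + (1 − γ − δ) / [1 + exp(−β(x − α))]        (5)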

Based on Eq.  5 , the ModelLogistic class in the adopy.tasks.psi provides the equivalent model with four parameters (threshold α , slope β , guess rate γ , and lapse rate δ ).

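A sketch of the corresponding code:

```python
from adopy.tasks.psi import ModelLogistic

model = ModelLogistic()
print(model.params)  # expected: ['threshold', 'slope', 'guess_rate', 'lapse_rate']
```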

For the grid resolutions of the task and model, we provide example code below, fixing the guess rate to 0.5 and the lapse rate to 0.04. For stimulus and threshold in particular, users should define ranges appropriate for their task of interest.

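A sketch of the grid definitions, mirroring the settings reported in Appendix B.1 (grids are plain dictionaries mapping variable names to arrays of grid points; the variable names grid_design and grid_param are our own):

```python
import numpy as np

grid_design = {
    # 100 stimulus intensities, equally spaced in 20*log10 units
    'stimulus': np.linspace(20 * np.log10(0.05), 20 * np.log10(400), 100),
}

grid_param = {
    'threshold': np.linspace(20 * np.log10(0.1), 20 * np.log10(200), 100),
    'slope': np.linspace(0, 10, 100),
    'guess_rate': [0.5],    # fixed
    'lapse_rate': [0.04],   # fixed
}
```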

Based on the task object, model object, and grids, the module adopy.tasks.psi provides an Engine class, called EnginePsi , pre-implemented for psychometric function estimation. The EnginePsi class not only provides an optimal design or randomly chosen design, but also computes a design using the staircase method. The staircase method is probably the most commonly used procedure in adaptive estimation of the psychometric function (e.g., Garcia-Perez, 1998 ) in which stimulus intensity is adjusted by a fixed and pre-determined amount based on a participant’s response on the current stimulus. The following code initializes the engine and computes designs:

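A sketch of the corresponding code (the argument names follow the paper's description; newer ADOpy versions may name them differently, e.g., grid_design and grid_param):

```python
from adopy.tasks.psi import EnginePsi

engine = EnginePsi(model=model, designs=grid_design, params=grid_param)

design_ado = engine.get_design('optimal')      # ADO design
design_random = engine.get_design('random')    # randomly chosen design
design_stair = engine.get_design('staircase')  # staircase design
```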

where EnginePsi requires only three arguments ( model , designs , and params ) since the task is fixed to the psychometric function estimation.

The particular up/down scheme of the staircase method implemented in EnginePsi is as follows (see Footnote 5):

where Δ is the amount of change applied on every trial. EnginePsi has a property called d_step for computing Δ; it specifies the number of steps along the index of the design grid, so the denser the design grid, the smaller Δ becomes. By default, d_step is set to 1, but users can choose a different value as described below:

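For example, a minimal sketch:

```python
engine.d_step = 2  # each staircase move now jumps two grid points, doubling Δ
```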

Having defined and initialized the required task, model, grids, and engine objects, we are now in a position to generate simulated binary responses. This is achieved by using the module scipy.stats.bernoulli . Here, the data-generating parameter values are set to guess_rate = 0.5, lapse_rate = 0.04, threshold = 20, and slope = 1.5:

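A sketch of such a simulation function, computing the correct-detection probability from Eqs. 4 and 5 at the true parameter values:

```python
import numpy as np
from scipy.stats import bernoulli

GUESS, LAPSE, THRESHOLD, SLOPE = 0.5, 0.04, 20, 1.5  # true values

def get_simulated_response_psi(design):
    """Simulate a correct (1) / incorrect (0) response to a stimulus."""
    f = 1 / (1 + np.exp(-SLOPE * (design['stimulus'] - THRESHOLD)))  # logistic F
    p_correct = GUESS + (1 - GUESS - LAPSE) * f                      # Eq. 4
    return bernoulli.rvs(p_correct)
```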

Finally, the following example code runs 60 simulation trials:

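A sketch (swap 'optimal' for 'staircase' or 'random' to reproduce the three design conditions compared below):

```python
for trial in range(60):
    design = engine.get_design('optimal')
    response = get_simulated_response_psi(design)
    engine.update(design, response)
```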

We conclude this section with a brief presentation of simulation results comparing performance among three design conditions: ADO, staircase, and random (see Appendix B.1 for the details of the simulation setup). The simulation results are summarized in Fig. 6. As shown in Fig. 6a, for all three conditions the estimate of the threshold parameter α, as measured by root mean square error (RMSE), converges toward the ground truth, with ADO designs exhibiting clearly superior performance over staircase and random designs. For the slope parameter β, convergence is much slower (ADO and staircase) or virtually absent (random). Essentially the same pattern of results is observed when performance is measured by the posterior standard deviation (Fig. 6b). In short, the simulation demonstrates the advantage of ADO designs in psychometric function estimation.

Fig. 6: Comparison of ADO, staircase, and random designs in the simulation of psychometric function estimation. Simulations were conducted using the logistic model with parameter values of threshold α = 20, slope β = 1.5, guess rate γ = 0.5, and lapse rate δ = 0.04. The three designs are compared on root mean squared error (RMSE; Panel A) and the standard deviation of the posterior distribution (Panel B). RMSE measures the discrepancy between true and estimated parameters: the lower the RMSE, the better the estimation performance. The posterior standard deviation indicates the certainty of the belief about the model parameters: the lower the standard deviation, the higher the certainty. Each curve represents an average across 1,000 independent simulation runs

Delay discounting task

There exists a sizable literature on computational modeling of delay discounting (e.g., Green & Myerson, 2004 ; Van-DenBos & McClure, 2013 ; Cavagnaro et al., 2016 ). As described earlier in Section “ Basic usage ”, preferential choices between two options, SS (smaller-sooner) and LL (larger-later), are made based on the subjective value of each option, which takes the following form:
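V = R · D(t)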

where V is the value of an option, R and t are the amount of reward and delay of the option respectively, and D ( t ) is the discounting factor assumed to be a monotonically decreasing function of delay t .

Various models for the specific form of D ( t ) have been proposed and evaluated, including the ones below:
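Exponential: D(t) = exp(−kt)
Hyperbolic: D(t) = 1 / (1 + kt)        (8)
Hyperboloid: D(t) = 1 / (1 + kt)^s
Constant sensitivity: D(t) = exp(−(kt)^s)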

where the parameter k is the discounting rate and the parameter s reflects the subjective, nonlinear scaling of time (Green & Myerson, 2004). Based on the subjective values of the options, preferential choices are assumed to be made stochastically, depending on the difference between the subjective values, according to Eq. 3. In summary, the models for the delay discounting task assume at most three parameters, 𝜃 = (k, s, τ), and there are four design variables that can be optimized, i.e., d = (t_SS, t_LL, R_SS, R_LL). The participant's choice response on each trial is binary: y = 1 (LL option) or y = 0 (SS option).

The module ‘ adopy.tasks.dd ’ included in the ADOpy package provides classes for the delay discounting task (see Table  1 ). TaskDD represents the DD task with four design variables ( t_ss , t_ll , r_ss , and r_ll ) with a binary choice response.

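A sketch of the corresponding code:

```python
from adopy.tasks.dd import TaskDD

task = TaskDD()
print(task.designs)  # expected: ['t_ss', 't_ll', 'r_ss', 'r_ll']
```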

In addition, the same module ‘ adopy.tasks.dd ’ includes six models (see Table 1): the Exponential model (Samuelson, 1937), Hyperbolic model (Mazur, 1987), Hyperboloid model (Green & Myerson, 2004), Constant Sensitivity model (Ebert & Prelec, 2007), Quasi-Hyperbolic model (Laibson, 1997), and Double Exponential model (McClure et al., 2007). Here, we demonstrate the Hyperbolic model, which has two model parameters ( k and tau ) and computes the discounting factor as in Eq. 8:

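A sketch of the corresponding code:

```python
from adopy.tasks.dd import ModelHyp

model = ModelHyp()
print(model.params)  # expected: ['k', 'tau']
```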

A simulation experiment like that for psychometric function estimation was carried out with the hyperbolic model, comparing results from three designs (ADO, staircase, and random); see Appendix B.2 for the details of the simulation setup and the Python scripts used. The simulation results are presented in Fig. 7. As trials progress, the discounting rate parameter k converges toward the ground truth for all three design conditions, with the swiftest (almost immediate) convergence under ADO. On the other hand, the inverse temperature parameter τ showed much slower convergence, or none at all (staircase), probably due to the relatively small sample size (i.e., 42). In short, the simulation results, taken together, demonstrate the superiority of ADO designs over non-ADO designs.

Fig. 7: Comparison of ADO, staircase, and random designs in the simulation of the delay discounting task. Simulations were conducted using the hyperbolic model with parameter values of k = 0.12 and τ = 1.5. The three designs are compared on root mean squared error (RMSE; Panel A) and the standard deviation of the posterior distribution (Panel B). Each curve represents an average across 1,000 independent simulation runs

Choice under risk and ambiguity task

The choice under risk and ambiguity (CRA) task (Levy et al., 2010 ) is designed to assess how individuals make decisions under two different types of uncertainty: risk and ambiguity. Example stimuli of the CRA task are shown in Fig.  8 .

Fig. 8: Illustrated scheme of the choice under risk and ambiguity (CRA) task. The participant chooses one of two options on either a risky trial (left) or an ambiguous trial (right). A risky option displays the reward amount and the probability of winning that reward, indicated by the upper (brown) proportion of the box. For an ambiguous option, the winning probability is not explicitly shown but is partially occluded by a gray box. On each trial, a risky or ambiguous option is always paired with a fixed (reference) option whose probability of winning is set to 0.5

The task involves preferential choice decisions in which the participant is asked to indicate a preference between two options: (1) winning either a fixed amount of reward R_F with a probability of 0.5 or nothing otherwise; and (2) winning a varying amount of reward (R_V) with a varying probability (p_V) or nothing otherwise. Further, the variable option comes in two types: (a) a risky type, in which the winning probability is fully known to the participant; and (b) an ambiguous type, in which the winning probability is only partially known. The level of ambiguity (A_V) in the latter type is varied between 0 (no ambiguity, fully known) and 1 (total ambiguity, fully unknown). As a concrete example, the CRA task of Levy et al. (2010) employed the following values: R_F = 5 (reference option); R_V ∈ {5, 9.5, 18, 34, 65}, p_V ∈ {0.13, 0.25, 0.38}, and A_V = 0 (variable options on risky trials); and R_V ∈ {5, 9.5, 18, 34, 65}, p_V = 0.5, and A_V ∈ {0.25, 0.5, 0.75} (variable options on ambiguous trials).

The linear model (Levy et al., 2010 ) for the CRA task assumes that choices are based on subjective values of the two options. The subjective values are computed using the following form:
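U_F = 0.5 · R_F^α
U_V = (p_V − β · A_V / 2) · R_V^α        (9)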

where U_F and U_V are the subjective values of the fixed and variable options, respectively; α is the risk attitude parameter and β is the ambiguity attitude parameter; R_F and R_V are the reward amounts of the fixed and variable options; and A_V and p_V are the ambiguity level and winning probability of the variable option. Choices are made stochastically based on the difference between the subjective values, according to the softmax choice rule:
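P(V over F) = 1 / [1 + exp(−γ (U_V − U_F))]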

where P(V over F) represents the probability of choosing the variable option over the fixed one, and the parameter γ is the inverse temperature that captures the participant's response consistency.

To summarize, the CRA model assumes three parameters, 𝜃 = (α, β, γ): α (risk attitude), β (ambiguity attitude), and γ (response consistency). There are four design variables to be optimized, d = (R_F, R_V, A_V, p_V), where R_F > 0 is the reward amount for the fixed option, R_V > 0 the reward amount for the variable option, 0 < A_V < 1 the ambiguity level, and 0 < p_V < 1 the winning probability for the variable option. The participant's preferential choice on each trial is recorded as either y = 1 (variable option) or y = 0 (fixed option).

The module ‘ adopy.tasks.cra ’ in the ADOpy package provides classes for the choice under risk and ambiguity task (see Table  1 ). TaskCRA represents the CRA task with four design variables denoted by p_var ( p V ), a_var ( A V ), r_var ( R V ), and r_fix ( R F ), and a binary choice response.

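A sketch of the corresponding code:

```python
from adopy.tasks.cra import TaskCRA

task = TaskCRA()
print(task.designs)  # expected: ['p_var', 'a_var', 'r_var', 'r_fix']
```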

ADOpy currently implements two models of the CRA task: Linear model (Levy et al., 2010 ) and Exponential model (Hsu et al., 2005 ). For the linear model in Eq.  9 , users can define and initialize the model with ModelLinear as:

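A sketch of the corresponding code:

```python
from adopy.tasks.cra import ModelLinear

model = ModelLinear()
print(model.params)  # expected: ['alpha', 'beta', 'gamma']
```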

Now we briefly discuss the results of simulated experiments using the linear model under three design conditions: ADO, fixed, and random. The fixed design refers to the one originally used by Levy et al. (2010). See Appendix B.3 for the details of the simulation setup and code. The results, summarized in Fig. 9, indicate that two parameters, α (risk attitude) and β (ambiguity attitude), converged to their respective ground truths most rapidly under the ADO condition. On the other hand, the inverse temperature parameter (γ) showed little, if any, convergence under any of the designs, probably due to the relatively small sample size (i.e., 60).

Fig. 9: Comparison of ADO, fixed, and random designs in the simulation of the choice under risk and ambiguity task. The fixed design was pre-determined according to Levy et al. (2010). Simulations were conducted using the linear model with parameter values of α = 0.66, β = 0.67, and γ = 3.5. The three designs are compared on root mean squared error (RMSE; Panel A) and the standard deviation of the posterior distribution (Panel B). Each curve represents an average across 1,000 independent simulation runs

Integrating ADOpy with experiments

In this section we describe how to integrate ADOpy into a third-party Python package for conducting psychological experiments, such as PsychoPy (Peirce, 2007 ; 2009 ), OpenSesame (Mathôt et al., 2012 ), or Expyriment (Krause and Lindemann, 2014 ). Integration is accomplished following a two-step procedure described below.

First, users should create and initialize an ADOpy Engine object. This corresponds to the initialization step illustrated in Fig.  3 . Users can create their own task and model as described in Section “ ADOpy ” or use pre-implemented tasks and models in ADOpy (see Section “ Tasks and Models implemented in ADOpy ”). Remember that the number of design variables, model parameters, and the grid sizes affect the computation time, so users should ensure the appropriateness of their choice of grid sizes, for example, by running simulations as described in Section “ Practical issues ”.

Second, users should integrate this code into the code that runs the experiment. The interface between the two consists of collecting an observation from the participant using the computed optimal design and updating the engine with the collected response on each trial. run_trial is an experimenter-created function for data collection: it takes the design values for a given trial as arguments and returns the participant's response. The same function can be used for both simulated and real data. Users can then call run_trial within a for-loop to conduct an ADO experiment over multiple trials, as shown below:

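A sketch of that loop (run_trial is the experimenter-written function described above; the trial count is arbitrary):

```python
NUM_TRIALS = 42  # arbitrary

for trial in range(NUM_TRIALS):
    design = engine.get_design('optimal')  # 1. design optimization
    response = run_trial(design)           # 2. experimentation
    engine.update(design, response)        # 3. Bayesian updating
```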

Note that the three lines inside the for-loop correspond to the three steps in Fig.  1 .

In what follows, we elaborate on and illustrate how to run ADOpy in the DD task, using a fully worked-out, annotated Python script (Appendix C). Users new to ADO can run the PsychoPy program in the appendix without any modification after installing ADOpy and PsychoPy. The program runs the DD task using optimal designs computed by ADOpy. A short description of the ADO-powered DD task is provided below, while the non-ADO version is available on the GitHub repository of ADOpy (Footnote 6).

To utilize ADO in the program, we first need to load the ADOpy classes: the DD task and the model of our choice (hyperbolic, in this case). We could also have chosen a different model or defined one ourselves (see lines 58–61 in Fig. 10).

Fig. 10: Main code for running the delay discounting task with ADOpy, from the fully worked-out, annotated script in Appendix C

To run the DD task, we define a function run_trial that conducts a single trial of the experiment using a given design (see Appendix C, lines 250–288). Then, for the initialization step, the Task, Model, and Engine objects should be initialized. As in Section “Delay discounting task”, users can use the implemented task and models for the DD task (lines 329–357 in Fig. 10).

Once the engine is created, the code to run the ADO-based version is actually simpler than the non-ADO version (lines 420–429 in Fig. 10; see lines 435–460 for the non-ADO version on the GitHub repository). Using the Engine class of the ADOpy package, finding the optimal design and updating from an observation each take a single line of code.

Conclusion

ADOpy is a toolbox for optimizing design selection on each trial in real time so as to maximize the informativeness and efficiency of data collection. The package implements Bayesian adaptive parameter estimation for three behavioral tasks: psychometric function estimation, delay discounting, and choice under risk and ambiguity. Each task can be run in an ADO-based mode or a non-ADO-based mode (random, fixed, or staircase, depending on the task). Default parameter and design values can be used, or the user can customize these settings, including the number of trials, the parameter ranges, and the grid resolution (i.e., the number of grid points on each parameter/design dimension). Furthermore, in addition to conducting an actual experiment with participants, the package can be used to run parameter recovery simulations to assess ADO's performance. Is it likely to be superior (i.e., more precise and efficient) to random and other (staircase, fixed) designs? Questions like this can be answered by performing a comparison as described in the preceding section. Causes of unsatisfactory performance can then be explored, for example by altering the grid resolution or the number of trials. More advanced users can conduct a Bayesian sensitivity analysis on the choice of priors.

The need to tune ADO to a given experimental setup might make readers leery of the methodology. Shouldn’t it be more robust and work flawlessly in any setting without such fussing? Like any machine-learning method, use of ADO requires parameter tuning to maximize performance. ADOpy’s simulation mode is an easy and convenient way to explore how changes in the design and grid resolution alter ADO’s performance. Experimenter-informed decisions about the properties of the design space will result in the greatest gains in an ADO experiment.

Use of ADOpy is not limited to the models that come with the package. Users can define their own model using the Model class. All that is required is a specification of the model's probability density (or mass) function and its parameters, along with any changes to the design space, as mentioned above. For example, it would be straightforward to create ADO-based experiments for other behavioral tasks, such as the balloon analog risk task (BART: Lejuez et al., 2002; Wallsten et al., 2005) for assessing risk-taking propensity.
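As a hedged sketch of what such a user-defined model might look like (the Task and Model constructor arguments follow the generic pattern used earlier in the paper; consult the ADOpy documentation for the exact signatures, and note that the task, parameters, and probability function here are all hypothetical):

```python
import numpy as np
from adopy import Task, Model

# A hypothetical task with one design variable and binary responses
task = Task(name='MyTask', designs=['x'], responses=[0, 1])

def compute_prob(x, a, b):
    """Probability of response y = 1 given design x and parameters a, b."""
    return 1 / (1 + np.exp(-b * (x - a)))

model = Model(name='MyModel', task=task, params=['a', 'b'], func=compute_prob)
```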

The ADOpy package, as currently implemented, has several limitations. ADOpy cannot optimize the selection of design variables that are not expressed in the probability density (or mass) function of the model. For example, if a researcher is interested in learning how the degree of distractibility (a low or high level of background noise) impacts decision making, then unless this construct were factored into the model as a design variable, ADOpy would not optimize on that dimension. This limitation does not prevent ADO from being used; it just means that the experiment will not be optimized on that stimulus dimension. Another limitation that users must be sensitive to is the memory demand of the algorithm. As discussed earlier, the algorithm creates a pre-computed look-up table of all possible discretized combinations of the outcome variable, the parameters, and the design variables. For example, with 100 grid points defined on each of one outcome variable, three parameters, and three design variables, the total memory needed to store the look-up table would be 100^(1+3+3) = 10^14 bytes, i.e., 100 terabytes, assuming one byte per data point. This is clearly well beyond what most desktops or servers can handle. In short, as the dimensionality of the ADO problem increases linearly, the memory demand of the grid-based ADO algorithm grows exponentially, sooner or later hitting a hardware limitation; grid-based ADO does not scale well, technically speaking. The good news is that there is a scalable algorithm that does not tax memory, known in machine learning as sequential Monte Carlo (SMC) or particle filtering (Doucet et al., 2001; Andrieu et al., 2003; Cappe et al., 2007).

In conclusion, the increasing use of computational methods for analyzing and modeling data is improving how science is practiced. ADOpy is a novel and promising tool with the potential to improve the quality of inference in experiments, accomplished by exploiting the predictive precision of computational modeling in conjunction with the power of statistical and machine-learning algorithms. It is our hope that ADOpy will empower more researchers to harness this technology, one outcome of which should be more informative and efficient experiments that collectively accelerate advances in psychological science and beyond.

ADOpy is available at https://github.com/adopy/adopy .

The probability density function (PDF) for a continuous response variable, or the probability mass function (PMF) for a discrete response variable, refers to the probability of observing a response outcome given a fixed parameter value and is therefore a function defined over the set of possible outcomes.

The likelihood function represents the “likeliness” of the parameter given a fixed specific response outcome as a function over the set of possible parameter values. Specifically, the likelihood function is obtained from the same equation as the probability density function (PDF) by reversing the roles of y and 𝜃 .

See Step 1 in Fig.  2 for specific equations defining the entropy measures in Eq.  2 .

Footnote 5: For those interested, see https://www.psychopy.org/api/data.html for other implementations of staircase algorithms in PsychoPy (Peirce, 2007; 2009).

Footnote 6: https://github.com/adopy/adopy/tree/master/examples

Ahn, W.-Y., Gu, H., Shen, Y., Haines, N., Hahn, H., Teater, J. E., ..., Pitt, M. A. (2019). Rapid, precise, and reliable phenotyping of delay discounting using a Bayesian learning algorithm. bioRxiv.

Ahn, W.-Y., Haines, N., & Zhang, L. (2017). Revealing neurocomputational mechanisms of reinforcement learning and decision-making with the hBayesDM package. Computational Psychiatry, 1, 24–57.


Amzal, B., Bois, F. Y., Parent, E., & Robert, C. P. (2006). Bayesian-optimal design via interacting particle systems. Journal of the American Statistical Association , 101 (474), 773–785.


Andrieu, C., de Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50, 5–43.

Aranovich, G. J., Cavagnaro, D. R., Pitt, M. A., Myung, J. I., & Mathews, C. A. (2017). A model-based analysis of decision making under risk in obsessive-compulsive and hoarding disorders. Journal of Psychiatric Research , 90 , 126–132.


Atkinson, A., & Donev, A. (1992) Optimum experimental designs . London: Oxford University Press.


Berger, M. J. (1984). Adaptive mesh refinement for hyperbolic partial differential equations. Journal of Computational Physics , 53 , 484–512.

Bickel, W. K. (2015). Discounting of delayed rewards as an endophenotype. Biological Psychiatry , 77 (10), 846–847.

Cappe, O., Godsill, S. J., & Moulines, E. (2007). An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE , 95 (5), 899–924.

Cavagnaro, D. R., Aranovich, G. J., McClure, S. M., Pitt, M. A., & Myung, J. I. (2016). On the functional form of temporal discounting: An optimized adaptive test. Journal of Risk & Uncertainty , 52 , 233–254.

Cavagnaro, D. R., Gonzalez, R., Myung, J. I., & Pitt, M. A. (2013a). Optimal decision stimuli for risky choice experiments: An adaptive approach. Management Science , 59 (2), 358–375.

Cavagnaro, D. R., Myung, J. I., Pitt, M. A., & Kujala, J. V. (2010). Adaptive design optimization: A mutual information based approach to model discrimination in cognitive science. Neural Computation , 22 (4), 887–905.

Cavagnaro, D. R., Pitt, M. A., Gonzalez, R., & Myung, J. I. (2013b). Discriminating among probability weighting functions using adaptive design optimization. Journal of Risk and Uncertainty , 47 , 255–289.

Cavagnaro, D. R., Pitt, M. A., & Myung, J. I. (2011). Model discrimination through adaptive experimentation. Psychonomic Bulletin & Review , 18 (1), 204–210.

Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science , 10 (3), 273–304.

Cohn, D., Atlas, L., & Ladner, R. (1994). Improving generalization with active learning. Machine Learning , 15 (2), 201–221.

Cornsweet, T. N. (1962). The staircase-method in psychophysics. The American Journal of Psychology , 75 (3), 485–491.

Cover, T. M., & Thomas, J. A. (1991) Elements of information theory . Hoboken: Wiley.


DiMattina, C., & Zhang, K. (2008). How optimal stimuli for sensory neurons are constrained by network architecture. Neural Computation , 20 , 668–708.

DiMattina, C., & Zhang, K. (2011). Active data collection for efficient estimation and comparison of nonlinear neural models. Neural Computation , 23 , 2242–2288.

Doucet, A., De Freitas, N., & Gordon, N. (2001) Sequential Monte Carlo methods in practice . Berlin: Springer.

Ebert, J. E., & Prelec, D. (2007). The fragility of time: Time-insensitivity and valuation of the near and far future. Management Science , 53 (9), 1423–1438.

Farrell, S., & Lewandowsky, S. (2018) Computational modeling of cognition and behavior . Cambridge: Cambridge University Press.

Feeny, S., Kaiser, P. K., & Thomas, J. P. (1966). An analysis of data gathered by the staircase-method. The American Journal of Psychology , 79 (4), 652–654.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-samples properties. Vision Research , 38 , 1861–1881.

Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin , 130 , 769–792.

Gu, H., Kim, W., Hou, F., Lesmes, L., Pitt, M. A., Lu, Z.-L., & Myung, J. I. (2016). A hierarchical Bayesian approach to adaptive vision testing: A case study with the contrast sensitivity function. Journal of Vision , 16 (6), 15, 1–17.

Hou, F., Lesmes, L., Kim, W., Gu, H., Pitt, M. A., Myung, J. I., & Lu, Z.-L. (2016). Evaluating the performance of the quick CSF method in detecting contrast sensitivity function changes. Journal of Vision , 16 (6), 18, 1–19.

Hsu, M., Bhatt, M., Adolphs, R., Tranel, D., & Camerer, C. F. (2005). Neural systems responding to degrees of uncertainty in human decision-making. Science , 310 (5754), 1680–1683.

King-Smith, P. E., Grigsby, S. S., Vingrys, A. J., Benes, S. C., & Supowit, A. (1994). Efficient and unbiased modifications of the quest threshold method: Theory, simulations, experimental evaluation and practical implementation. Vision Research , 34 , 885–912.

Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vision Research , 39 , 2729–2737.

Krause, F., & Lindemann, O. (2014). Expyriment: A python library for cognitive and neuroscientific experiments. Behavior Research Methods , 46 (2), 416–428.

Kujala, J. V., & Lukka, T. J. (2006). Bayesian adaptive estimation: The next dimension. Journal of Mathematical Psychology , 50 (4), 369–389.

Laibson, D. (1997). Golden eggs and hyperbolic discounting. The Quarterly Journal of Economics , 112 (2), 443–478.

Lee, M. D., & Wagenmakers, E.-J. (2014) Bayesian cognitive modeling: A practical course . Cambridge: Cambridge University Press.

Lejuez, C. W., Read, J. P., Kahler, C. W., Ramsey, J. B., Stuart, G. L., et al. (2002). Evaluation of a behavioral measure of risk-taking: The balloon analogue risk task (BART). Journal of Experimental Psychology: Applied, 8(2), 75–85.


Lesmes, L. A., Jeon, S.-T., Lu, Z.-L., & Dosher, B. A. (2006). Bayesian adaptive estimation of threshold versus contrast external noise functions: The quick T v C method. Vision Research , 46 , 3160–3176.

Levy, I., Snell, J., Nelson, A. J., Rustichini, A., & Glimcher, P. W. (2010). Neural representation of subjective value under risk and ambiguity. Journal of Neurophysiology, 103, 1036–1047.

Lewi, J., Butera, R., & Paninski, L. (2009). Sequential optimal design of neurophysiology experiments. Neural Computation , 21 , 619–687.

Lindley, D. V. (1956). On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27(4), 986–1005.

Lorenz, R., Pio-Monti, R., Violante, I. R., Anagnostopoulos, C., Faisal, A. A., Montana, G., & Leech, R. (2016). The automatic neuroscientist: A framework for optimizing experimental design with closed-loop real-time fMRI. NeuroImage, 129, 320–334.

Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods, 44(2), 314–324.

Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 5. The effect of delay and of intervening events on reinforcement value (pp. 55–73). Hillsdale: Erlbaum.

McClure, S. M., Ericson, K. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2007). Time discounting for primary rewards. Journal of Neuroscience , 27 (21), 5796–5804.

Müller, P. (1999). Simulation-based optimal design. In J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.) Bayesian statistics , (Vol. 6 pp. 459–474). Oxford: Oxford University Press.

Müller, P., Sanso, B., & De Iorio, M. (2004). Optimal Bayesian design by inhomogeneous Markov chain simulation. Journal of the American Statistical Association , 99 (467), 788–798.

Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology , 47 , 90–100.

Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology , 57 , 53–67.

Peirce, J. W. (2007). PsychoPy—Psychophysics software in Python. Journal of Neuroscience Methods, 162(1-2), 8–13.

Peirce, J. W. (2009). Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics, 2, 10.


Rose, R. M., Teller, D. Y., & Rendleman, P. (1970). Statistical properties of staircase estimates. Perception & Psychophysics , 8 (4), 199–204.

Samuelson, P. A. (1937). A note on measurement of utility. The Review of Economic Studies , 4 (2), 155–161.

Settles, B. (2009). Active learning literature survey. University of Wisconsin-Madison Computer Sciences Technical Report TR1648 ( http://digital.library.wisc.edu/1793/60660 ).

Van-DenBos, W., & McClure, S. E. (2013). Towards a general model of temporal discounting. Journal of the Experimental Analysis of Behavior , 99 , 58–73.

Vandekerckhove, J., Rouder, J. N., & Krushke, J. K. (2018). Editorial: Bayesian methods for advancing psychological science. Psychonomic Bulletin & Review , 25 , 1–4.

Vincent, B. T. (2016). Hierarchical Bayesian estimation and hypothesis testing for delay discounting tasks. Behavior Research Methods , 48 , 1608–1620.

Wallsten, T. S., Pleskac, T. J., & Lejuez, C. W. (2005). Modeling behavior in a clinically diagnostic sequential risk-taking task. Psychological Review , 112 (4), 862–880.

Watson, A. B., & Pelli, D. G. (1983). Quest: A Bayesian adaptive psychometric method. Perception & Psychophysics , 33 (2), 113–120.

Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293–1313.


Acknowledgements

The research was supported by National Institute of Health Grant R01-MH093838 to M.A.P. and J.I.M., the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT, & Future Planning (NRF-2018R1C1B3007313 and NRF-2018R1A4A1025891), the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01367, BabyMind), and the Creative-Pioneering Researchers Program through Seoul National University to W.-Y.A. Portions of this paper are published in the Proceedings of the 41st Annual Meeting of the Cognitive Science Society, held in July 2019.

Author information

Authors and Affiliations

Department of Psychology, Seoul National University, Seoul, Korea

Jaeyeong Yang & Woo-Young Ahn

Department of Psychology, Ohio State University, Columbus, OH, USA

Mark A. Pitt & Jay I. Myung


Corresponding authors

Correspondence to Woo-Young Ahn or Jay I. Myung .

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Defining grids for delay discounting task

As a first example, suppose that the delay discounting task has two constraints on its designs: the delay of the SS option should be shorter than that of the LL option ( t_ss < t_ll ), and the reward amount of the SS option should be smaller than that of the LL option ( r_ss < r_ll ). Considering seven delays (i.e., right now, two weeks, a month, six months, a year, three years, and ten years) and 63 possible rewards (from $12.5 to $787.5 in increments of $12.5), users can make a grid for the design variables by executing the following lines:

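A sketch of these lines (the tuple-key convention for jointly constrained design variables follows ADOpy's grid format; verify against your ADOpy version, and note that the delay values in weeks are approximate):

```python
import numpy as np

delays = [0, 2, 4.3, 26, 52, 156, 520]  # weeks: now ... ten years (approx.)
rewards = np.arange(12.5, 800, 12.5)    # $12.5 ... $787.5 in $12.5 steps

designs = {
    ('t_ss', 't_ll'): [[ts, tl] for ts in delays for tl in delays if ts < tl],
    ('r_ss', 'r_ll'): [[rs, rl] for rs in rewards for rl in rewards if rs < rl],
}
```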

As another example, if users want to vary only the reward amount of the SS option ( r_ss ) and the delay of the LL option ( t_ll ) while fixing t_ss to 0 and r_ll to $800, they can define a grid as shown below:

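A sketch (the 18 delay values, in weeks, are approximate conversions of the delays listed in Appendix B.2):

```python
import numpy as np

designs = {
    't_ss': [0],            # fixed: right now
    't_ll': [0.43, 0.71, 1, 2, 3, 4.3, 6, 8.6, 10, 13, 17.2, 21.5,
             26, 52, 104, 156, 260, 520],
    'r_ss': np.arange(12.5, 800, 12.5),
    'r_ll': [800],          # fixed: $800
}
```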

For the model parameters, users should define a grid object containing grid points over a proper range for each parameter. For example, a grid for the hyperbolic model (Mazur, 1987) with two parameters ( k and τ ) can be defined as follows:

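A sketch matching the grid resolutions used in Appendix B.2:

```python
import numpy as np

params = {
    'k': np.logspace(-5, 0, 20),   # discounting rate, 10**-5 ... 1 (log scale)
    'tau': np.linspace(0, 5, 20),  # inverse temperature (linear scale)
}
```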

Appendix B: ADOpy simulations

B.1 Psychometric function estimation

Simulations for psychometric function estimation were conducted for a simple 2-alternative forced choice (2AFC) task with one design variable. Assuming that the psychometric function has a logistic shape, we ran 1,000 simulations for each of three designs: (a) ADO, (b) staircase, and (c) randomly chosen designs. In each simulation, responses were generated for a total of 60 trials, using Task2AFC and ModelLogistic in the module adopy.tasks.psi .

Simulated responses were generated with true parameter values of threshold α = 20, slope β = 1.5, guess rate γ = 0.5, and lapse rate δ = 0.04. The simulation for psychometric function estimation used 100 grid points for the design variable ( stimulus ) and two model parameters ( threshold and slope ) each, and the guess and lapse rates were fixed to 0.5 and 0.04, respectively. The grid settings were given as follows:

Design variable

  • stimulus : 100 grid points from 20 log10(0.05) to 20 log10(400), log-spaced.

Model parameters

  • threshold : 100 grid points from 20 log10(0.1) to 20 log10(200), log-spaced.
  • slope : 100 grid points from 0 to 10, linearly spaced.
  • guess_rate : fixed to 0.5.
  • lapse_rate : fixed to 0.04.

B.2 Delay discounting

Assuming the hyperbolic model, simulations for the delay discounting (DD) task were conducted using TaskDD and ModelHyp in the module adopy.tasks.dd . We compared three designs: (a) ADO, (b) staircase, and (c) randomly chosen designs. The staircase method runs six trials at each delay to estimate the discounting rate. With t_SS fixed to 0, it starts with an R_SS of $400 and an R_LL of $800. If the participant chooses the SS option, the staircase method increases R_SS by 50%; if the participant chooses the LL option, it decreases R_SS by 50%. After five such adjustments (six trials per delay), it proceeds to the next delay value.

One thousand independent simulations were performed for each design condition, each for a total of 108 trials. Simulated data were generated using the true parameter values of k = 0.12 and τ = 1.5. Grid resolutions used for the simulations were as follows:

Design variables

  • t_ss : fixed to 0, meaning ‘right now’.
  • t_ll : 18 delays (3 days, 5 days, 1 week, 2 weeks, 3 weeks, 1 month, 6 weeks, 2 months, 10 weeks, 3 months, 4 months, 5 months, 6 months, 1 year, 2 years, 3 years, 5 years, 10 years), expressed in units of weeks.
  • r_ss : 63 points from $12.5 to $787.5 in increments of $12.5.
  • r_ll : fixed to $800.

Model parameters

  • k (discounting rate): 20 grid points from 10^−5 to 1, log-spaced.
  • tau (inverse temperature): 20 grid points from 0 to 5, linearly spaced.

B.3 Choice under risk and ambiguity

In simulating the CRA task, we assumed the linear model and considered three experimental designs: (a) ADO, (b) the ‘fixed’ design of Levy et al. (2010), and (c) a random design.

The fixed design was set as follows. The reward of the fixed option (R_F) was set to 5, and the rewards of the variable option (R_V) to 5, 9.5, 18, 34, or 65. On risky trials, the ambiguity (A_V) is set to 0 and the winning probability of the variable option (p_V) is chosen among 0.13, 0.25, and 0.38. On ambiguous trials, the probability p_V is set to 0.5 and the ambiguity A_V is chosen from 0.25, 0.5, and 0.75. The total number of combinations is 30: 15 for risky trials and 15 for ambiguous trials.

Grid settings for the four design variables and the three model parameters were set as follows:

Design variables

  • p_var and a_var on risky trials: 9 winning probabilities for p_var (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45), with a_var fixed to 0.
  • p_var and a_var on ambiguous trials: 6 ambiguity levels for a_var (0.125, 0.25, 0.375, 0.5, 0.625, 0.75), with p_var fixed to 0.5.
  • r_var and r_fix : based on 10 reward values (10, 15, 21, 31, 45, 66, 97, 141, 206, 300), all reward pairs such that r_var > r_fix were used.

Model parameters

  • alpha (risk attitude): 11 grid points from 0 to 3, linearly spaced.
  • beta (ambiguity attitude): 11 grid points from −3 to 3, linearly spaced.
  • gamma (inverse temperature): 11 grid points from 0 to 5, linearly spaced.

One thousand independent simulations were performed for each design condition, each for a total of 60 trials, with 30 risky and 30 ambiguous trials. Simulated data were generated using the true parameter values of α = 0.66, β = 0.67, and γ = 3.5 based on Levy et al., ( 2010 ).

Appendix C: Fully worked-out python script for delay discounting task

[The fully worked-out, annotated Python script for the ADO-based delay discounting task is not reproduced here; the complete program is available in the ADOpy GitHub examples (Footnote 6).]


About this article

Yang, J., Pitt, M.A., Ahn, WY. et al. ADOpy: a python package for adaptive design optimization. Behav Res 53 , 874–897 (2021). https://doi.org/10.3758/s13428-020-01386-4


Published : 08 September 2020

Issue Date : April 2021

DOI : https://doi.org/10.3758/s13428-020-01386-4


  • Cognitive modeling
  • Bayesian adaptive experimentation
  • Optimal experimental design
  • Delay discounting
  • Risky choice

Design of Experiments (DoE) in Python?

Is there any module or package available in Python for using Design of Experiments (DoE) methods?


3 Answers

Have you tried pyDOE ? It is great for constructing factorial designs, response surface designs, and more. You can find it on PyPI .


Try this new code base, which takes a simple CSV file specifying each input variable's range and generates the desired DOE in another CSV file. It is not a full package but a collection of functions to use.

Design-of-Experiment-Python


I would suggest using OpenTURNS's DoEs . The provided DoEs are:

  • stratified DoE (axial, factorial, composite, box)
  • random (bootstrap, LHS, MonteCarlo, importance sampling)
  • deterministic (fixed, Gauss product, tensor product, Smolyak)
  • cross validation (K-Fold, leave one out)
  • low discrepancy (Faure, Halton, reverse Halton, Haselgrove and, of course, Sobol')
  • optimized LHS (from Monte-Carlo or simulated annealing, with different criteria)

Furthermore, there are different quadrature methods, which are another set of features for design of experiments.

It's easy to install using PyPI or Conda .





DoEgen 0.5.0

pip install DoEgen

Released: Oct 10, 2023

DoEgen: A Python Library for Optimised Design of Experiment Generation and Evaluation


Author: Sebastian Haan

Requires: Python >=3.6

Classifiers

  • Python :: 3
  • Python :: 3.6
  • Python :: 3.7

Project description

DoEgen is a Python library aiming to assist in generating optimised Design of Experiments (DoE), evaluating design efficiencies, and analysing experiment results.

In a first step, optimised designs can be automatically generated, and efficiencies evaluated, for any mixture of factor levels for numeric and categorical factors. Designs are automatically evaluated as a function of the number of experiment runs, and the most efficient designs are suggested. In particular, DoEgen provides computation of a wide range of design efficiencies and allows importing and evaluating externally generated designs as well.

The second part of DoEgen assists in analysing any derived experiment results in terms of factor importance, correlations, and response analysis for best parameter space selection.

Table of Contents

  • Definitions
  • Functionality
  • Requirements
  • User Templates
  • Running Tests
  • Documentation
  • Design Generation
  • Design Efficiencies
  • Design Selection
  • Experiment Result Analysis
  • Use Case Study
  • Comparison to Other DoE Tools
  • Attribution and Acknowledgments

An Experiment Design is typically defined by:

  • Number of Factors: the parameters or variates of the experiment
  • Number of Runs: the number of experiments
  • Levels: The number of value options for each factor, which can be either numeric values (discrete or continuous) or categorical. Discrete levels for continuous factors can be obtained by providing the minimum and maximum of the factor range and the number of levels. The more levels, the more “fine-grained” the experiment will evaluate this factor, but also more experimental runs are required.

The goal of optimising an experimental design is to provide an efficient design that is near-optimal in terms of, e.g., orthogonality, level balance, and two-way interaction coverage, yet can be performed with a minimum number of experimental runs, which are often costly or time-consuming.

If you would like to jumpstart a new experiment and to skip the technical details, you can find a summary of the main usage of DoEgen in Use Case Study .

Currently, the (preliminary) release contains several functions for generating and evaluating designs. Importing and evaluating externally generated designs is supported as well (e.g., for comparison to other DoE generator tools). DoEgen also implements several functions for experiment result analysis and visualisation of the parameter space.

The main functionalities are (sorted in order of typical experiment process):

  • Reading Experiment Setup Table and Settings (Parameter Name, Levels for each factor, Maximum number of runs, Min/Max etc)
  • Generating optimised design arrays for a range of runs (given maximum number of runs, and optional computation-time constraints, see settings_design.yaml ).
  • Evaluation and visualisation of more than ten design efficiencies such as level balance, orthogonality, D-efficiencies etc (see Design Efficiencies for the complete list).
  • Automatic suggestion of minimum, optimal, and best designs within a given range of experiment runs.
  • Import and evaluation of externally generated design arrays.
  • Experiment result analysis: Template table for experiment results, multi-variant RMSE computation, best model/parameter selection, Factor Importance computation, pairwise response surface and correlation computation, factor correlation analysis and Two-way interaction response plots.
  • Visualisation of experiment results.

Installation And Requirements

  • Python >= 3.6
  • SWIG >=3.0.12
  • scikit-learn

The DoEgen package is currently considered experimental and has been tested with the libraries specified in requirements.txt .

Python Setup Installation

The OApackage requires an installation of SWIG (tested with SWIG 3.0.12), which can be found at https://www.dev2qa.com/how-to-install-swig-on-macos-linux-and-windows/ or can be installed via conda.
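A typical conda invocation (an assumption; channel availability may vary) is:

```
conda install swig
```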

After installing swig and numpy , DoEgen can be installed either with
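for example, a from-source build (a sketch, assuming a cloned DoEgen repository):

```
python setup.py build
python setup.py install
```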

or using pip
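that is, the standard install from PyPI:

```
pip install DoEgen
```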

Note that OAPackage can be also installed manually by following installation instructions and documentation for OApackage (tested with OApackage 2.6.6), which can be found at https://pypi.org/project/OApackage/ .

Docker Installation

A docker image is provided in the folder docker/ and can be built, e.g., with
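a command along these lines (the image tag doegen is hypothetical):

```
cd docker/
docker build -t doegen .
```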

and tested, e.g.,
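with a sketch like the following, using the placeholders explained below (the mount point and module invocation are assumptions):

```
docker run -it -v <PATH_TO_DOCKER_IMAGE>:/doegen doegen \
    python -m doegen.doegen <SETTINGSFILE_DESIGN>
docker run -it -v <PATH_TO_DOCKER_IMAGE>:/doegen doegen \
    python -m doegen.doeval <SETTINGSFILE_ANALYSIS>
```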

replacing <PATH_TO_DOCKER_IMAGE> with the absolute path to the docker image and <SETTINGSFILE_DESIGN> and <SETTINGSFILE_ANALYSIS> with the respective path and filename of the setting file.

The factor (parameter) settings of the experiment are defined in an experiment setup table (see Experiment_setup_template.xlsx ). A new excel setup template table can also be created with create_setupfile.py . Each factor is on a new row and is specified by Parameter Name , Parameter Type , Level Number , Minimum , Maximum , Include (Y/N) (optional; by default all are included), and Levels (optional). If Levels are provided, please separate the levels by commas; levels can be a mix of numerical and string entries (the number of entries should match Level Number ).

After the experiment is run, the results have to be filled into an experiment result table (see Experiment_results_template.xlsx ). A new excel result template table can also be created with create_resultfile.py . The result table allows filling in multiple output properties (Y_label: the output target to be predicted) and experiment positions. The results have to be provided in the table with the following columns:

  • Nexp : Run# of experiment, need to match Run# in Experiment setup and design.
  • PID : Identifier# or label of the location (point) in the experiment (e.g., if the experiment is run at different locations simultaneously).
  • Y Label : Identifier# or label of Y-Variate (target property that has to be predicted or evaluated, e.g. Rain and Temperature). This allows to include multi-output models with distinct target properties. Note that currently each Y variate is evaluated separately.
  • Y Exp : The experiment result for Y.
  • Y Truth (optional): the true value for Y, if available. This is required to calculate the RMSE and to select the best parameter space.
  • Not currently considered (yet) in the result stats computation: Std Y Exp , Std Y Truth , Weight PID .


Running Tests

To verify that DoEgen works, you can run the example experiment
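presumably along these lines, based on the test/ folder referenced throughout (the settings filename is an assumption):

```
python -m doegen.doegen test/settings_design_test.yaml
```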

Please do not modify README.md . Instead, make any changes in the master documentation file MANUAL.md (which uses pandoc markdown syntax) and then convert it to the inferior GitHub markdown flavor (note that the new github-flavored markdown format gfm option does not correctly handle the figure caption and resize options):

and to pdf:

or as standalone html:
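The original pandoc commands are not preserved in this copy; plausible equivalents for the three conversions are:

```
pandoc -f markdown -t gfm MANUAL.md -o README.md
pandoc MANUAL.md -o MANUAL.pdf
pandoc -s MANUAL.md -o MANUAL.html
```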

Main Modules and Usage

Design generation with doegen.py : the main module for generating optimised designs and computing efficiencies. Settings are specified in the settings yaml file settings_design.yaml . If the yaml and .xlsx template files are not yet in your working directory (e.g., after a first DoEgen installation), you can create the yaml and excel template files with
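presumably by invoking the helper scripts named in the User Templates section (the module-style invocation is an assumption):

```
python -m doegen.create_setupfile
python -m doegen.create_resultfile
```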

Before running doegen.py , two things have to be done:

  • fill in experiment setup table (see template provided Experiment_setup_template.xlsx or example in test/ folder)
  • provide settings in settings file (see settings_design.yaml )

Now you are ready to run the design generation
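presumably by passing the settings file to the doegen module:

```
python -m doegen.doegen settings_design.yaml
```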

This will produce a number of files for different experiment run lengths (see folder test/results/DesignArray_Nrun... ):

  • The optimised design array EDarray_[factor_levels]_Nrun.csv .
  • A table of design efficiencies Efficiencies_[factor_levels]_Nrun.csv
  • Table of two-way Interaction balance Table_Interaction_Balance.txt
  • Table of Pearson correlation coefficients between all factor pairs Table_Pearson_Correlation.csv
  • Plot of pairwise correlation including regression fit pairwise_correlation.png (see example plot below)

Besides the default optimisation (based on the function doegen.doegen.optimize_design ), DoEgen also allows the user to construct full orthogonal designs using the function doegen.doegen.gen_highD , which is based on OApackage orthogonal arrays and extensions. However, this works only for special cases with a limited number of factors and design levels. Thus, it is currently not fully automated but might assist advanced users in constructing optimal designs.

DoEgen will by default select three designs based on the following criteria:

  • minimum design, with the criteria:
    • number of runs >= number of factors + 1
    • center balance > 95%
    • level balance > 95%
    • orthogonal balance > 90%
    • two-level interaction balance > 90%
    • two-level interaction minimum one = 100%
  • optimal design, with the criteria:
    • center balance > 98%
    • level balance > 98%
    • orthogonal balance > 95%
    • two-level interaction balance > 95%
  • best design, which is based on the best score, i.e., the sum of the efficiencies above, including a small penalty for run size relative to the maximum run size

This will deliver (see folder test/results/ ):

  • An overview summary of the three designs and their main efficiencies: Experiment_Design_selection_summary.txt .
  • Three tables ( Designtable_minimum/optimal/best...csv ) for the three suggested designs, converted into the actual level values.
  • An overview of the efficiencies as a function of experiment run, plotted and saved in Efficiencies_[factor_levels].png .

In case the user wants to select another design for a different run size, the design array can be converted into a design table with the function doegen.doegen.array2valuetable() .


DoEgen computes more than ten efficiencies and saves them as a .csv file for each generated design array. All indicators, except for the canonical correlations, range from 0 (worst possible) to 1 (optimal):

  • Center Balance: 100% × [1 − Sum(Center Deviation)/Array Size], i.e. the average center balance over all factors.
  • Level Balance: 100% × [1 − Sum(Imbalance)/Array Size], i.e. the average level balance over all factors (see the sketch after this list).
  • Orthogonality: 100% × [1 − Orthogonality Deviation], i.e. the average orthogonality over all factor pairs.
  • Two-way Interaction Balance: similar to level balance, but for pairwise factor balance.
  • Two-way Interaction with at least one occurrence: 100% × [1 − Sum(missing pairwise factor occurrences)/Number of Pairwise Combinations]; 100% if all factor-level pair combinations occur at least once.
  • D-Eff: D-efficiency (model includes main and quadratic terms).
  • D1-Eff: D-efficiency, main terms only.
  • D2-Eff: D-efficiency with main, quadratic, and interaction terms.
  • A-Eff: A-efficiency (main and quadratic terms).
  • A1-Eff: A-efficiency, main terms only.
  • A2-Eff: A-efficiency with main, quadratic, and interaction terms.

For further inspection, doegen.doegen.evaluate_design2 also creates the following tables and plots:

  • A table of Pearson correlations (same as above for normalised discrete variables).
  • A table of two-way interaction balance.
  • A cornerplot of pairwise factor relations with Y.


Experiment Result Analysis with doeval.py : The experiment results have to be provided in a result table in the format specified in #user-templates, together with the settings in the settings_expresults.yaml file. Then run:
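Assuming the same settings-file-as-argument pattern as for the design generation (the module path is an assumption):

```bash
python -m doegen.doeval settings_expresults.yaml
```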

This will create the following stats tables and plots (see folder test/expresults/ for an example):

  • An evaluation of the factors in terms of "importance", defined by the maximum change (range) in the average Y between any factor levels. Results are visualised in a bar plot ( Ybarplot_*.png ) and saved as csv ( Experiment_Elevation_Factorimportance.csv ), including min, max, and standard deviation across all levels.
  • The RMSE between experiment results and ground truth; results saved as csv.
  • A ranked list of the top experiments and their parameters, based on RMSE.
  • The average and variance of the best parameters, weighted by RMSE; saved to a csv file.
  • An overview plot of all correlation plots between Y and each factor ( Expresult_distribution_X-Y_*.png , see function plot_regression ).
  • An overview plot of the correlations between Y and the RMSE ( Expresult_distribution_X-RMSE_*.png , see function plot_regression ).
  • A plot of Y values for each pairwise combination of factors ( Y-pairwise-correlation_*.png , see function plot_3dmap ), which allows the user to visualise categorical factors.
  • A plot of RMSE values for each pairwise combination of factors ( RMSE-pairwise-correlation_*.png , see function plot_3dmap ).


Here we demonstrate a typical use case: we first generate and select an optimal experiment design; then, after running the experiment, we ask which parameter space is best and which parameters are important. Our case study is the test example, which consists of 8 factors (parameters) specified in the experiment setup table Experiment_setup_test.xlsx .


The first goal is to generate an efficient design with only a fraction of the entire parameter combinations (in our case the full factorial would be 3⁶ × 2² = 2916). The maximum number of experiments (here we choose 150) is set in the file settings_design_test.yaml , which also specifies input and output directory names, as well as the maximum time for optimising one run (in this case 100 seconds per design optimisation). This configuration will generate and optimise a range of designs with run sizes from 12 to 150, in steps of 6 (since the lowest common multiple of our mix of 2- and 3-level factors is 6). Note that the user can also choose a different step size by setting the parameter delta_nrun . Now we are all set up to start the design generation and optimisation, which we do by running the script doegen.py with the settings file as argument:
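For example, assuming the module invocation pattern noted earlier:

```bash
python -m doegen.doegen settings_design_test.yaml
```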

This will generate, for each run size, an optimised design array plus a list of efficiencies and diagnostic tables and plots (see Design Generation for more details). To simplify selection among the generated designs, DoEgen automatically suggests three: 1) a minimum design (lowest number of runs at a given efficiency threshold), 2) an optimal design, and 3) a best design (with an experiment run number equal to or larger than that of the optimal design). In our case the three designs are selected at run numbers 30 (minimum), 72 (optimal), and 90 (best). Since the optimal design has essentially the same efficiencies as the best design (see the efficiency overview plot Efficiencies_[factor_levels].png ) but at a lower cost in experiment runs, we choose the optimal design for our experiment, given in the table Designtable_optimal_Nrun72.csv .


Now it is time to run the experiment. In our example, we simply produce some random data for the 72 experiments, with 10 sensor locations (PID 1 to 10) and one output variable Y (e.g. temperature). To analyse the experiment, the results have to be written into a structured table in the format given in experiment_results_Nrun72.xlsx .


To run the experiment analysis script, settings such as input and output directory names are given in the settings file settings_expresults_test.yaml , and we can now run the analysis script with
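Again assuming the settings-file-as-argument pattern:

```bash
python -m doegen.doeval settings_expresults_test.yaml
```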

This analysis produces a range of diagnostic tables and result plots for each output variable Y (in our case there is only one). One question in this use case is which factors are important; this is answered in the figure Ybarplot.png . The "importance" indicates how much a factor changes Y (defined by the maximum average change in Y between any levels). This has the advantage of also identifying important factors that have low linear regression coefficients with Y (see the r values in the plot Expresult_correlation_X.png ) or that are categorical. Such insight can be valuable to determine, e.g., which factors should be investigated in more detail in a subsequent experiment, or to estimate which factors have no effect on Y.


Another important question is: what are the best parameter values given the experiment results so far? This can be answered by computing the root-mean-square error (RMSE) between experiment results and ground truth (or, alternatively, the likelihood if the model predictions also include uncertainties). The table Experiment_1_RMSE_Top10_sorted.csv provides an overview of the top 10 experiments sorted by RMSE. Moreover, we can calculate the (RMSE-weighted) average of each factor over the top experiments, which is also visualised as a bar plot.


Furthermore, multiple other diagnostic plots, such as factor-Y correlations and pairwise correlation maps with RMSE, are generated (see Experiment Result Analysis for more details).

The aim of DoEgen is to provide an open-source tool for researchers to create optimised designs and a framework for transparent evaluation of experiment designs. Moreover, DoEgen aims to assist with result analysis, enabling subsequent factor selection, parameter fine-tuning, or model building. The design generation function of DoEgen is built upon the excellent package OApackage and extends it in terms of design efficiency evaluation, filtering, automation, and experiment analysis. Multiple other DoE tools are available; the table below gives a brief (preliminary, subjective, and oversimplified) summary of the main advantages and disadvantages of each tool tested. Users are encouraged to test these tools themselves.

Feature                   | SAS JMP   | pyDOE2  | OApackage | DoEgen
Open-Source               | no (paid) | yes     | yes       | yes
Design Optimisation Score | very good | limited | good      | good
Optimal Runsize Finder    | no        | no      | no        | yes
Design Efficiency Eval    | yes       | no      | limited   | yes
Exp Result Analysis       | yes       | no      | no        | yes
Development Stage         | advanced  | early   | moderate  | very early

Eendebak, P.T. and Vazquez, A.R., 2019. OApackage: A Python package for generation and analysis of orthogonal arrays, optimal designs and conference designs. Journal of Open Source Software.

pyDOE2: An experimental design package for python

Dean, A., Morris, M., Stufken, J. and Bingham, D. eds., 2015. Handbook of design and analysis of experiments (Vol. 7). CRC Press.

Goos, P. and Jones, B., 2011. Optimal design of experiments: a case study approach. John Wiley & Sons.

Kuhfeld, W.F., 2010. Discrete choice. SAS Technical Papers, 2010, pp.285-663.

Zwerina, K., Huber, J. and Kuhfeld, W.F., 1996. A general method for constructing efficient choice designs. Durham, NC: Fuqua School of Business, Duke University.

Cheong, Y.P. and Gupta, R., 2005. Experimental design and analysis methods for assessing volumetric uncertainties. SPE Journal, 10(03), pp.324-335.

JMP, A. and Proust, M., 2010. Design of experiments guide. Cary, NC: SAS Institute Inc.

Acknowledgments are an important way for us to demonstrate the value we bring to your research. Your research outcomes are vital for ongoing funding of the Sydney Informatics Hub.

If you make use of this code for your research project, please include the following acknowledgment:

“This research was supported by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney.”

Project Contributors

Key project contributors to the DoEgen project are:

  • Sebastian Haan (Sydney Informatics Hub, University of Sydney): Main contributor and software development of DoEgen.
  • Christopher Howden (Sydney Informatics Hub, University of Sydney): Statistical consultancy, literature suggestions, and documentation.
  • Danial Azam (School of Geophysics, University of Sydney): Testing DoEgen on applications for computational geosciences.
  • Joel Nothman (Sydney Informatics Hub, University of Sydney): Code review and improvements with focus on doegen.py.
  • Dietmar Muller (School of Geophysics, University of Sydney): Suggesting the need for this project and developing real-world use cases for geoscience research.

Additional project contributors

Additional features were added to this version of DoEgen by:

  • Matt Boyd (School of Geosciences, University of Sydney): improvements with a focus on speeding up DoEgen using Python multiprocessing.

DoEgen has benefited from the OApackage library for its design optimisation code, and we would like to thank the researchers who have made their code available as open-source.

Copyright 2021 Sebastian Haan, The University of Sydney

DoEgen is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License (LGPL version 3) as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program (see LICENSE.md). If not, see https://www.gnu.org/licenses/ .


Tutorial: Design of Experiments

Learn how to create a DOE and explore the results.

  • Review fully defined geometric variables
  • Create a DOE and activate the variables
  • Adjust and review the design variables
  • Create a response
  • Review run results before the DOE is complete
  • Explore the DOE run results

Open the DOE Model

  • On the File menu, select Open .


  • On the File menu, click the Preferences button.
  • Under Inspire > Run Options > Analysis solver , select SimSolid .


Review Fully Defined Geometric Variables


These variables were generated when sketching. It is best practice to make sure that all parts are fully constrained in the model when creating geometric variables. For details on how to create variables, see Video Tutorial: Creating a Parametric Part .

  • For Main_tube_L , confirm that the Length is 5120 mm .
  • Close the Variable Manager.


  • Double-click Extrude 1 . This is where the Main_tube_L variable was defined. To see the full entry, click in the Extents field and scroll all the way to the left: Main_tube_L=4120.0000 mm . This portion of the model is fully defined.
  • Close the Extrude tool.
  • Close the Construction History Browser.

Create a DOE and Activate the Variables


Adjust and Review Design Variables


  • Click the left side of the preview slider for Main_tube_L to review the minimum value of this variable. Notice the design change in the graphics area.
  • Click the right side of the preview slider for Main_tube_L to review the maximum value of this variable. Notice the design change in the graphics area.

Create a Response


  • Use the dropdown menu on the left side of the guide bar to select Displacement .
  • Leave the default option, Model , selected to create a response for the entire model.


  • Change the DOE type to Full Factorial , which evaluates all possible combinations of input variable levels and thus resolves all effects and interactions. Using all design variables, including one with discrete parameters, it would take 2304 runs to complete this DOE.
  • This is undesirable, so let's change the DOE type back to MELS (Modified Extensible Lattice Sequence). A lattice sequence is a quasi-random, low-discrepancy sequence designed to spread points evenly through a space by minimising clumps and empty spaces. For the Number of Runs , enter 20 . Tip: depending on your machine's capabilities, you may execute up to 4 runs at a time by using the Multi Execution option.
  • Select Run .

Review Completed Runs While the DOE Is in Progress

To                 | Do this
Review run results | In the Status column of the Run Status dialog, double-click a completed run.
Display run errors | In the Status column of the Run Status dialog, mouse over a failed run to display a possible error message.
Review error logs  | In the Run Status dialog, right-click a failed run, and then select Open Run Folder.


  • When all runs are complete, close the Run Status dialog.

Explore the DOE Run Results


Positive effects are shown in shades of blue and indicate that a positive change in the design variable results in a positive change in the response.

Negative effects are shown in shades of brown and indicate that a positive change in the design variable results in a negative change in the response. FA_Tube_B has a linear effect of -2.021 , meaning an increase in the tube base width will decrease the displacement response. The full linear effect from the minimum design variable value of 96 mm to the maximum of 144 mm would be approximately -2 mm .

For more information, see Linear Effects .


  • For information on how to save and export the optimized shape, see Export Geometry and Results .

SciPy Tutorial


SciPy is a scientific computation library that uses NumPy underneath.

SciPy stands for Scientific Python.

Learning by Reading

We have created 10 tutorial pages for you to learn the fundamentals of SciPy:

Basic SciPy

Learning by Quiz Test

Test your SciPy skills with a quiz test.


Learning by Exercises

SciPy Exercises

Insert the correct syntax for printing the kilometer unit (in meters):


Learning by Examples

In our "Try it Yourself" editor, you can use the SciPy module, and modify the code to see the result.

How many cubic meters are in one liter:

Click on the "Try it Yourself" button to see how it works.



Python Web Scraping Tutorial


Web scraping, the process of extracting data from websites, has emerged as a powerful technique to gather information from the vast expanse of the internet. In this tutorial, we’ll explore various Python libraries and modules commonly used for web scraping and delve into why Python 3 is the preferred choice for this task.

Essential Packages and Tools for Python Web Scraping

The latest version of Python offers a rich set of tools and libraries specifically designed for web scraping, making it easier than ever to retrieve data from the web efficiently and effectively.

Table of Contents

  • Requests Module
  • BeautifulSoup Library
  • urllib Module
  • Why Python for Web Scraping?

The requests library is used for making HTTP requests to a specific URL and returns the response. Python requests provides built-in functionality for managing both the request and the response.

Example: Making a Request

The Python requests module has several built-in methods to make HTTP requests to a specified URI using GET, POST, PUT, PATCH, or HEAD requests. An HTTP request is meant either to retrieve data from a specified URI or to push data to a server; it works as a request-response protocol between a client and a server. Here we will be using the GET request, which retrieves information from the given server using a given URI and sends the encoded user information appended to the page request.
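A minimal reconstruction of such a request (the URL is the page scraped later in this section):

```python
import requests

# Send a GET request to the target URL
r = requests.get("https://www.geeksforgeeks.org/python-programming-language/")

# The response object carries the status code and the page body
print(r.status_code)  # e.g. 200 on success
print(r.text[:200])   # first 200 characters of the returned HTML
```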


For more information, refer to our Python Requests Tutorial . 

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn't take much code to write an application.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings unless the document doesn't define one and Beautiful Soup can't detect it; then you just have to specify the original encoding. Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try different parsing strategies or trade speed for flexibility.

  • Importing libraries: the code imports the requests library for making HTTP requests and the BeautifulSoup class from the bs4 library for parsing HTML.
  • Making a GET request: it sends a GET request to 'https://www.geeksforgeeks.org/python-programming-language/' and stores the response in the variable r.
  • Checking the status code: it prints the status code of the response, typically 200 for success.
  • Parsing the HTML: the HTML content of the response is parsed using BeautifulSoup and stored in the variable soup.
  • Printing the prettified HTML: it prints the prettified version of the parsed HTML content for readability and analysis (see the sketch below).
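A minimal reconstruction matching the steps above:

```python
import requests
from bs4 import BeautifulSoup

# Make a GET request to the target page
r = requests.get("https://www.geeksforgeeks.org/python-programming-language/")

# Check the status code of the response (200 means success)
print(r.status_code)

# Parse the HTML content of the response
soup = BeautifulSoup(r.content, "html.parser")

# Print the prettified, parsed HTML (truncated here for readability)
print(soup.prettify()[:500])
```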


Finding Elements by Class

Now we would like to extract some useful data from the HTML content. The soup object contains all the data in a nested structure that can be extracted programmatically. The website we want to scrape contains a lot of text, so let's scrape all of that content. First, let's inspect the webpage we want to scrape.


Inspecting the page shows that all of the content is under the div with class entry-content. We will use the find method, which returns the first tag matching the given name and attributes; in our case, it will find the div having class entry-content.

We can see that the content of the page is under <p> tags. Now we have to find all the p tags present in this class; for that we can use the find_all method of BeautifulSoup (see the sketch below).
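Continuing from the previous sketch (soup as parsed above):

```python
# Find the first div whose class is 'entry-content' (as seen when inspecting the page)
content = soup.find("div", class_="entry-content")

# Collect every paragraph tag inside that div and print its text
if content is not None:
    for p in content.find_all("p"):
        print(p.get_text())
```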


For more information, refer to our Python BeautifulSoup . 

Selenium is a popular Python module used for automating web browsers. It allows developers to control web browsers programmatically, enabling tasks such as web scraping, automated testing, and web application interaction. Selenium supports various web browsers, including Chrome, Firefox, Safari, and Edge, making it a versatile tool for browser automation.

Example 1: For Firefox

In this specific example, we’re directing the browser to the Google search page with the query parameter “geeksforgeeks”. The browser will load this page, and we can then proceed to interact with it programmatically using Selenium. This interaction could involve tasks like extracting search results, clicking on links, or scraping specific content from the page.
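A minimal reconstruction of this example (geckodriver must be installed and on the PATH):

```python
from selenium import webdriver

# Launch Firefox
driver = webdriver.Firefox()

# Direct the browser to the Google search page with the query "geeksforgeeks"
driver.get("https://www.google.com/search?q=geeksforgeeks")

# From here we could extract results, click links, or scrape content
print(driver.title)

driver.quit()
```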


Example 2: For Chrome

  • We import the webdriver module from the Selenium library.
  • We specify the path to the web driver executable. You need to download the appropriate driver for your browser and provide the path to it. In this example, we’re using the Chrome driver.
  • We create a new instance of the web browser using webdriver.Chrome() and pass the path to the Chrome driver executable as an argument.
  • We navigate to a webpage by calling the get() method on the browser object and passing the URL of the webpage.
  • We extract information from the webpage using various methods provided by Selenium. In this example, we retrieve the page title using the title attribute of the browser object.
  • Finally, we close the browser using the quit() method.
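A sketch of these steps; note that recent Selenium versions pass the driver path through a Service object rather than as a direct argument, and the path below is a placeholder:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the Chrome driver executable (placeholder; adjust for your system)
service = Service("/path/to/chromedriver")
driver = webdriver.Chrome(service=service)

# Navigate to a webpage and read its title
driver.get("https://www.geeksforgeeks.org/")
print(driver.title)

# Close the browser
driver.quit()
```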


For more information, refer to our Python Selenium . 

The lxml module in Python is a powerful library for processing XML and HTML documents. It provides high-performance XML and HTML parsing capabilities along with a simple and Pythonic API, and it is widely used in Python web scraping due to its speed, flexibility, and ease of use.

Here’s a simple example demonstrating how to use the lxml module for Python web scraping:

  • We import the html module from lxml along with the requests module for sending HTTP requests.
  • We define the URL of the website we want to scrape.
  • We send an HTTP GET request to the website using the requests.get() function and retrieve the HTML content of the page.
  • We parse the HTML content using the html.fromstring() function from lxml, which returns an HTML element tree.
  • We use XPath expressions to extract specific elements from the HTML tree. In this case, we’re extracting the text content of all the <a> (anchor) elements on the page.
  • We iterate over the extracted link titles and print them out.
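A minimal reconstruction of the example described above (the URL is illustrative):

```python
import requests
from lxml import html

# URL of the website to scrape
url = "https://example.com"

# Send an HTTP GET request and retrieve the page
response = requests.get(url)

# Parse the HTML content into an element tree
tree = html.fromstring(response.content)

# XPath: text content of all <a> (anchor) elements on the page
link_titles = tree.xpath("//a/text()")

# Iterate over the extracted link titles and print them
for title in link_titles:
    print(title)
```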

The urllib module in Python is a built-in library that provides functions for working with URLs. It allows you to interact with web pages by fetching URLs (Uniform Resource Locators), opening and reading data from them, and performing other URL-related tasks like encoding and parsing. Urllib is a package that collects several modules for working with URLs, such as:

  • urllib.request for opening and reading.
  • urllib.parse for parsing URLs
  • urllib.error for the exceptions raised
  • urllib.robotparser for parsing robots.txt files

Since urllib ships with the Python standard library, no separate installation is required.

Here’s a simple example demonstrating how to use the urllib module to fetch the content of a web page:

  • We define the URL of the web page we want to fetch.
  • We use urllib.request.urlopen() function to open the URL and obtain a response object.
  • We read the content of the response object using the read() method.
  • Since the content is returned as bytes, we decode it to a string using the decode() method with ‘utf-8’ encoding.
  • Finally, we print the HTML content of the web page.
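A minimal reconstruction of the steps above (the URL is illustrative):

```python
import urllib.request

# URL of the web page to fetch
url = "https://example.com"

# Open the URL and obtain a response object
with urllib.request.urlopen(url) as response:
    # Read the raw bytes and decode them to a string
    html_content = response.read().decode("utf-8")

# Print the HTML content of the page
print(html_content)
```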


The pyautogui module in Python is a cross-platform GUI automation library that enables developers to control the mouse and keyboard to automate tasks. While it’s not specifically designed for web scraping, it can be used in conjunction with other web scraping libraries like Selenium to interact with web pages that require user input or simulate human actions.

In this example, pyautogui is used to perform scrolling and take a screenshot of the search results page obtained by typing a query into the search input field and clicking the search button using Selenium.
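The original example pairs pyautogui with Selenium (which types the query and clicks the search button); a minimal sketch of just the pyautogui part, assuming the browser window already shows the search results:

```python
import time
import pyautogui

# Give the browser window time to gain focus before automating input
time.sleep(2)

# Scroll down the page (negative values scroll down)
pyautogui.scroll(-500)

# Capture the visible screen and save it to a file
screenshot = pyautogui.screenshot()
screenshot.save("search_results.png")
```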

The schedule module in Python is a simple library that allows you to schedule Python functions to run at specified intervals. It’s particularly useful in web scraping in Python when you need to regularly scrape data from a website at predefined intervals, such as hourly, daily, or weekly.

  • We import the necessary modules: schedule, time, requests, and BeautifulSoup from the bs4 package.
  • We define a function scrape_data() that performs the web scraping task. Inside this function, we send a GET request to a website (replace ‘https://example.com’ with the URL of the website you want to scrape), parse the HTML content using BeautifulSoup, extract the desired data, and print it.
  • We schedule the scrape_data() function to run every hour using schedule.every().hour.do(scrape_data).
  • We enter a main loop that continuously checks for pending scheduled tasks using schedule.run_pending() and sleeps for 1 second between iterations to prevent the loop from consuming too much CPU.
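A minimal reconstruction of the scheduled scraper described above (the URL is a placeholder, as in the original):

```python
import time

import requests
import schedule
from bs4 import BeautifulSoup

def scrape_data():
    # Replace with the URL of the website you want to scrape
    r = requests.get("https://example.com")
    soup = BeautifulSoup(r.content, "html.parser")
    # Extract the desired data; here we simply print the page title
    print(soup.title.get_text())

# Schedule the scraping job to run every hour
schedule.every().hour.do(scrape_data)

# Main loop: run pending jobs, sleeping briefly between checks
while True:
    schedule.run_pending()
    time.sleep(1)
```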


Why Python3 for Web Scraping?

Python’s popularity for web scraping stems from several factors:

  • Ease of Use : Python’s clean and readable syntax makes it easy to understand and write code, even for beginners. This simplicity accelerates the development process and reduces the learning curve for web scraping tasks.
  • Rich Ecosystem : Python boasts a vast ecosystem of libraries and frameworks tailored for web scraping. Libraries like BeautifulSoup, Scrapy, and Requests simplify the process of parsing HTML, making data extraction a breeze.
  • Versatility : Python is a versatile language that can be used for a wide range of tasks beyond web scraping. Its flexibility allows developers to integrate web scraping seamlessly into larger projects, such as data analysis, machine learning, or web development.
  • Community Support : Python has a large and active community of developers who contribute to its libraries and provide support through forums, tutorials, and documentation. This wealth of resources ensures that developers have access to assistance and guidance when tackling web scraping challenges.


Parametric Modeling and Numerical Simulation of a Three-Dimensional Random Aggregate Model of Lime–Sand Piles Based on Python–Abaqus


1. Introduction
   1.1. Research on Mesoscopic Model Modeling
   1.2. Research Ideas and Purposes
2. Meso-Model Modeling of Lime–Sand Pile
   2.1. Aggregate Gradation Test
   2.2. Determination of Basic Parameters of Aggregate
   2.3. The Generation of Aggregate
   2.4. Delivery of Aggregate
3. Numerical Simulation of Lime–Sand Pile Meso-Model
   3.1. Determination and Verification of Microscopic Parameters
   3.2. Mesh Generation of Mesoscopic Model
   3.3. Static Simulation Analysis
        3.3.1. Different Mixing Ratio Simulation Analysis
        3.3.2. Simulation Analysis of Different Heights
4. Conclusions and Suggestions
Author Contributions
Data Availability Statement
Conflicts of Interest



Lime chemical requirements (% by mass):

Lime Type      | Name                       | CaO + MgO | MgO | CO₂ | SO₃
Siliceous lime | Siliceous lime 90 (CL90-Q) | ≥90       | ≤5  | ≤4  | ≤2

Soil properties:

Type of Soil | Natural Water Content W (%) | Natural Density (g/cm³) | Plasticity Index | Compression Modulus (MPa) | Liquidity Factor | Shearing Strength C (kPa) | Shearing Strength Φ (°)
Loess        | 18.8                        | 1.75                    | 12.8             | 13.46                     | <0               | 30.0                      | 23.0

Aggregate parameters:

Aggregate Type  | Young's Modulus (MPa) | Poisson's Ratio | Natural Density (g/cm³) | Equivalent Coefficient of Linear Expansion (10 /°C)
Lime matrix     | 8000                  | 0.30            | 2.50                    | 9.40
Sand aggregate  | 10                    | 0.20            | 1.50                    | –
Loess aggregate | 23                    | 0.40            | 1.75                    | –

Mesh generation of the mesoscopic model:

Aggregate Type  | Grid Type | Grid Size | Element Number
Lime matrix     | C3D4      | 3 mm      | 634,800
Sand aggregate  | C3D4      | 1 mm      | 932,225
Loess aggregate | C3D4      | 1 mm      | 233,056

Different mixing ratios, experimental vs. simulated values:

Category | Experimental Value (kN) | Simulation Value (kN) | Relative Error (%)
S-4:5:1  | 8.58                    | 8.81                  | 2.68
M-5:4:1  | 12.37                   | 12.61                 | 1.94
L-6:3:1  | 18.48                   | 18.89                 | 2.22

Different heights, experimental vs. simulated values:

Groups | Experimental Value (kN) | Analogue Value (kN) | Relative Error (%)
M-50   | 12.37                   | 12.61               | 1.94
M-100  | 12.42                   | 12.57               | 1.21
M-150  | 12.33                   | 12.52               | 1.54

Share and Cite

Yuan, J.; Si, J.; Qiao, Y.; Sun, W.; Qiao, S.; Niu, X.; Zhou, M.; Ju, J. Parametric Modeling and Numerical Simulation of a Three-Dimensional Random Aggregate Model of Lime–Sand Piles Based on Python–Abaqus. Buildings 2024, 14, 1842. https://doi.org/10.3390/buildings14061842



COMMENTS

  1. Create your experimental design with a simple Python command

Introduction. Design of Experiment (DOE) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis. This exercise has become critical in this age of rapidly expanding the field of data science and associated statistical modeling and machine learning. A well-planned DOE can give a researcher meaningful data set to act upon with the optimal ...

  2. Design of Experiments (DOE) with python

An introduction to Design of Experiments (DOE) with Python, using a simple case study with and without interactions. Introduction. In this article, I want to ...

  3. pyDOE : The experimental design package for python


  4. Python Tutorial: Experimental Design in Python

    Want to learn more? Take the full course at https://learn.datacamp.com/courses/experimental-design-in-python at your own pace. More than a video, you'll lear...

  5. Welcome to DOEPY

Introduction¶. Design of Experiment (DOE) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis. This exercise has become critical in this age of rapidly expanding field of data science and associated statistical modeling and machine learning. A well-planned DOE can give a researcher meaningful data set to act upon with optimal number ...

  6. pyDOE2 · PyPI

    Design of experiments for Python. pyDOE2: An experimental design package for python. pyDOE2 is a fork of the pyDOE package that is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs.. This fork came to life to solve bugs and issues that remained unsolved in the original package.

  7. pyDOE3 : An experimental design package for python

Design of experiments for Python. pyDOE3: An experimental design package for python¶. pyDOE3 is a fork of pyDOE2, which is a fork of pyDOE. As with pyDOE2 relative to pyDOE, pyDOE3 came to life to solve bugs and issues that remained unsolved in pyDOE2. The pyDOE3 package is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs.

  8. dexpy

    dexpy - Design of Experiments (DOE) in Python. dexpy is a Design of Experiments (DOE) package based on the Design-Expert ® software from Stat-Ease, Inc. If you're new to the area of DOE, here is a primer to help get you started. The primary purpose of this package is to construct experimental designs. After performing your experiment, you ...

  9. Design of experiments in Python

    Design of experiments are an important part of scientific research. It is a methodology for choosing the best set of experiments to get data that will help y...

  10. GitHub

    pyDOE2: An experimental design package for python. pyDOE2 is a fork of the pyDOE package that is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs. This fork came to life to solve bugs and issues that remained unsolved in the original package.

  11. pyDOE · PyPI

Design of experiments for Python. Tags: DOE, design of experiments, experimental design, optimization, statistics, python. Development Status: 5 - Production/Stable ...

  12. Design-of-experiment (DOE) matrix generator for engineering and

    Design of Experiment (DOE) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis. This exercise has become critical in this age of rapidly expanding field of data science and associated statistical modeling and machine learning. A well-planned DOE can give a researcher meaningful data ...

  13. pyDOE3 · PyPI

    pyDOE3: An experimental design package for python. pyDOE3 is a fork of the pyDOE2 package that is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs. This fork came to life to solve bugs and issues that remained unsolved in the original package.

  14. GitHub

    pyDOE3: An experimental design package for python pyDOE3 is a fork of the pyDOE2 package that is designed to help the scientist, engineer, statistician, etc., to construct appropriate experimental designs.

  15. Response Optimization with Design of Experiments and python

    An important point while running a DOE, however, is the ability to look for the maximum response of a system. In this article we will employ some very basic tools available with python to address such a point: given the result of a full factorial DOE with 2 levels, how to plan and execute the next runs in order to achieve a maximum.

  16. Lesson 1: Introduction to Design of Experiments

Upon completion of this lesson, you should be able to: understand the issues and principles of Design of Experiments (DOE), understand that experimentation is a process, list the guidelines for designing experiments, and recognize the key historical figures in DOE. 1.1 - A Quick History of the Design of Experiments (DOE)

  17. Design Optimization with Ax in Python

    Optimize design from a set of constrained equations — an analytical model derived from first principles — that likely weave together with nonlinearities. Found in sim.py, which the tutorial explains. Improve experimental modeling and design efficiency by fitting a model of what happens in the true world.

  18. ADOpy: a python package for adaptive design optimization

    Experimental design is fundamental to research, but formal methods to identify good designs are lacking. Advances in Bayesian statistics and machine learning offer algorithm-based ways to identify good experimental designs. Adaptive design optimization (ADO; Cavagnaro, Myung, Pitt, & Kujala, 2010; Myung, Cavagnaro, & Pitt, 2013) is one such method. It works by maximizing the informativeness ...

  19. Design of Experiments (DoE) in Python?

The provided DoE are: stratified DoE (axial, factorial, composite, box); random (bootstrap, LHS, MonteCarlo, importance sampling); deterministic (fixed, Gauss product, tensor product, Smolyak); cross validation (K-Fold, leave one out); low discrepancy (Faure, Halton, reverse Halton, Haselgrove and, of course, Sobol'); optimized LHS (from Monte-Carlo ...

  20. DoEgen · PyPI

    DoEgen: A Python Library for Optimised Design of Experiment Generation and Evaluation. DoEgen is a Python library aiming to assist in generating optimised Design of Experiments (DoE), evaluating design efficiencies, and analysing experiment results. In a first step, optimised designs can be automatically generated and efficiencies evaluated for ...

  21. Tutorial: Design of Experiments

Evaluate designs by using geometric variables and applying a design-of-experiments (DOE) or optimization method. Manufacture: set up and run a basic porosity or thinning analysis. Print 3D: prepare and run an additive manufacturing simulation, and export a file for 3D printing. Inspire Python API

  22. Experimental Design: Types, Examples and Methods

    The primary types of experimental design include: Pre-experimental Research Design. True Experimental Research Design. Quasi-Experimental Research Design. Statistical Experimental Design. Pre-experimental Research Design. A preliminary approach where groups are observed after implementing cause and effect factors to determine the need for ...

  23. SciPy Tutorial

    W3Schools offers free online tutorials, references and exercises in all the major languages of the web. Covering popular subjects like HTML, CSS, JavaScript, Python, SQL, Java, and many, many more.

  24. Python Web Scraping Tutorial

In this tutorial, we'll explore various Python libraries and modules commonly used for web scraping and delve into why Python 3 is the preferred choice for this task.

  25. Buildings

    A lime-sand pile is a three-phase particle composite material composed of a lime matrix, sand, and a loess aggregate at the meso level. Establishing a random aggregate model that can reflect the actual aggregate gradation, content, and morphology is the premise of numerical simulations of the meso-mechanics of lime-sand piles. In this paper, the secondary development of Abaqus 2022 is ...