research papers on architecture

Most Downloaded Articles

Open access

Biophilic design in architecture and its contributions to health, well-being, and sustainability: A critical review

February 2022

Weijie Zhong | Torsten Schröder | Juliette Bekkering

In the last ten years, ‘nature’ and biophilic design have received widespread attention in architecture, especially in response to growing environmental challenges. However, open questions and controversies...


The effects of environmental factors on the patient outcomes in hospital environments: A review of literature

Saman Jamshidi | Jan S. Parker | Seyedehnastaran Hashemi

This study investigates the evidence supporting the impact of the built environment on the health outcomes for patients within the hospital setting. Improving the hospital environment may potentially...


Using passive cooling strategies to improve thermal performance and reduce energy consumption of residential buildings in U.A.E. buildings

Hanan M. Taleb

Passive design responds to local climate and site conditions in order to maximise the comfort and health of building users while minimising energy use. The key to designing a passive building is to...

Designing with nature: Advancing three-dimensional green spaces in architecture through frameworks for biophilic design and sustainability

August 2023

Weijie Zhong | Torsten Schroeder | Juliette Bekkering

In the transition to a more sustainable built environment over the last two decades, the “greening” of architecture as a popular approach has received widespread attention. However, there are still...

Computational design in architecture: Defining parametric, generative, and algorithmic design

Inês Caetano | Luís Santos | António Leitão

Computation-based approaches in design have emerged in the last decades and rapidly became popular among architects and other designers. Design professionals and researchers adopted different terminologies...

AI for conceptual architecture: Reflections on designing with text-to-text, text-to-image, and image-to-image generators

Anca-Simona Horvath | Panagiota Pouliou

In this paper we present a research-through-design study where we employed text-to-text, text-to-image, and image-to-image generative tools for a conceptual architecture project for the eVolo skyscraper...

Evolution of Islamic geometric patterns

Yahya Abdullahi | Mohamed Rashid Bin Embi

This research demonstrates the suitability of applying Islamic geometrical patterns (IGPs) to architectural elements in terms of time scale accuracy and style matching. To this end, a detailed survey...

Tree-inspired dendriforms and fractal-like branching structures in architecture: A brief historical overview

September 2014

Iasef Md Rian | Mario Sassone

The shapes of trees are complex and fractal-like, and they have a set of physical, mechanical and biological functions. The relation between them always draws attention of human beings throughout history...

Exploring the influence of the built environment on human experience through a neuroscience approach: A systematic review

Tulay Karakas | Dilek Yildiz

The built environment provides a habitat for the most sophisticated mammal in our universe, the human being. Developments in science and technology are forcing us to reconsider the priority of human...

Mapping urban public spaces based on the Nolli map method

September 2021

Huimin Ji | Wowo Ding

In 1748, Giambattista Nolli published a large map of Rome, which accurately depicted the form and system of public space in the city. This graphic presentation has become a powerful tool for urban space...

Kahn's light: The measurable and the unmeasurable of the Bangladesh National Assembly Building

Blerim Lutolli | Kaltrina Jashanica

To incline and unfold the constituent elements of Louis I. Kahn's architecture and philosophy, a precise attention must lie on the fundamental objectives that he vigorously pursued and mastered, especially...

Studies on sustainable features of vernacular architecture in different regions across the world: A comprehensive synthesis and evaluation

December 2019

Anh Tuan Nguyen | Nguyen Song Ha Truong | David Rockwood | Anh Dung Tran Le

Due to the increasing pressure brought by recent global environmental problems, building designers are embracing regionalism and the knowledge of traditional structures, arguing that these structures...

Healthy campus by open space design: Approaches and guidelines

December 2014

Stephen Siu Yu Lau | Zhonghua Gou | Yajing Liu

This paper examines the architectural and landscape design strategies and intentions for green, open spaces facilities targeting stress alleviation for learning environments such as those of university...

Geometric proportions: The underlying structure of design process for Islamic geometric patterns

December 2012

Loai M. Dabbour

This paper discusses geometric proportions which serve as a design tool, especially for Islamic pattern design. The key role of geometry in Islamic patterns is discussed with relation to proportions...

Cultural heritage tourism and urban regeneration: The case of Fez Medina in Morocco

Available online 15 June 2024

Djamel Boussaa | Muhammed Madandola

After Morocco gained independence in 1956, the country's historic cities, including Fez, Marrakesh, and Meknes, experienced rapid urban growth, decay, and the destruction of their rich cultural and...

The impact of the design of learning spaces on attention and memory from a neuroarchitectural approach: A systematic review

Mar Llorens-Gámez | Juan Luis Higuera-Trujillo | Carla Sentieri Omarrementeria | Carmen Llinares

Enriched environments in animal models have demonstrated that exposure to an optimal stimulus improves behavior, cognition, and genomics. However, the evidence base for the neurophysiological influence...

Short- and long-term effects of architecture on the brain: Toward theoretical formalization

Andréa de Paiva | Richard Jedon

The physical environment affects people's behavior and wellbeing. Some effects can be easily noticed through observation, whereas others require an in-depth study to be understood and measured. Although...

What is a Korean officetel? Case study on Bundang New Town

Emilien Gohaud | Seungman Baek

The purpose of this study is to identify a little-known element in Korea's housing typology: the officetel. A portmanteau of the English office and hotel, the officetel was originally a work facility...

A triangular architectural relation model among sustainability, beauty, and power

August 2024

Ján Legény | Robert Špaček | Tomáš Hubinský | Lucia Benkovičová

Sustainability, beauty, and power are notions that hit our contemporary perception every day. However, they have been an integral part of architecture and urban planning in various forms since the ages....

Integrated design of transport infrastructure and public spaces considering human behavior: A review of state-of-the-art methods and tools

Liu Yang | Koen H. van Dam | Arnab Majumdar | Bani Anvari | Washington Y. Ochieng | Lufeng Zhang

In order to achieve holistic urban plans incorporating transport infrastructure, public space and the behavior of people in these spaces, integration of urban design and computer modeling is a promising...

Quality of public space and sustainable development goals: analysis of nine urban projects in Spanish cities

Raimundo Bambó Naya | Pablo de la Cal Nicolás | Carmen Díez Medina | Isabel Ezquerra | Sergio García-Pérez | Javier Monclús

The starting point of this research is the urban model promoted by the United Nations through the Sustainable Development Goals (SDGs). The importance of public spaces is especially highlighted in Goal...

Integrating algae building technology in the built environment: A cost and benefit perspective

Nimish Biloria | Yashkumar Thakkar

Energy consumption rates have been rising globally at an escalating pace since the last three decades. The exploration of new renewable and clean sources of energy globally is thus gaining prime importance....

Contemporary construction in historical sites: The missing factors

Sina Kamali Tabrizi | Mohamed Gamal Abdelmonem

Historical sites (HSs) are akin to living entities, and their existence is perpetuated through the erection of new buildings or additions. Many HSs need sustainable development and new construction,...

From BIM model to 3D construction printing: A framework proposal

Rodrigo García-Alvarado | Pedro Soza | Ginnia Moroni | Fernando Pedreros | Martín Avendaño | Pablo Banda | Cristian Berríos

The growth of 3D construction printing needs appropriate integration into the planning and execution phases of building projects. With this aim, the Architecture, Engineering, and Construction sector...

The interaction of history and modern thought in the creation of Iran's architecture by investigating the approaches of past-oriented architecture

Mohsen Kamali

The relationship between tradition and modernity significantly influences society, culture, and architectural discourse. This philosophy offers a framework for exploring the impact of traditional architecture...


arq: Architectural Research Quarterly

  • Submit your article

This journal utilises an Online Peer Review Service (OPRS) for submissions on our partner site https://mc.manuscriptcentral.com/arq. Please be aware that your Cambridge account is not valid for this OPRS and registration is required. We strongly advise you to read all "Author instructions" in the "Journal information" area prior to submitting.


  • ISSN: 1359-1355 (Print), 1474-0516 (Online)
  • Editor: Adam Sharr, Newcastle University, UK
  • Editorial board

John Tuomey, Winner of the CICA Pierre Vago Journalism Award 2020

The 2020 CICA Pierre Vago Journalism Award from the International Committee of Architectural Critics (CICA) has been awarded to John Tuomey for his insightful article ‘Bringing Heaven Down to Earth: Reading the Plan of Ronchamp’ published in Volume 23 Issue 1 of arq: Architectural Research Quarterly.


Latest articles

Shining and automation: the phenotechnology of ornament.

  • Lars Spuybroek
  • arq: Architectural Research Quarterly, Volume 27, Issue 3

Archival plans, alterations, and 3D laser scanning of Erik Gunnar Asplund’s Stockholm Public Library

  • Patrick H. Fleming, Anders Bergström

The Ger Plug-In: demonstrating a model for sustainable and affordable housing in Ulaanbaatar’s fringe districts

  • Joshua Bolchover, Jimmy C. K. Tong

Buildings-in-buildings: museological theatres of preservation and display

  • Ashley Paine

Spatial agency practice in Tai O Village: colonial legacies and spatial-architectural approaches to collaborative urban futures

  • Daniel Keith Elkin, Michael Louw, Chi-Yuen Leung, Norah Wang Xiaolu, Markus Wernli

Examining the publicness of spaces on European social housing estates: a position paper

  • Ellen Braae, Henriette Steiner, Svava Riesto, Marie Glaser, Eveline Althaus, Liv Christensen, Lillin Knudtzon, Melissa Anne Murphy, Inger-Lise Saglie, Beata Sirowy, Bettina Lamm, Gilda Berruti, Maria Cerreta, Laura Lieto, Paola Scala, Maria Federica Palestino, Marilena Prisco, Anne Tietjen
  • arq: Architectural Research Quarterly, Volume 27, Issue 2

Domestic space as an institutional place for ready-made objects: Le Corbusier’s bidet case

  • Sung-Taeg Nam

Korean heat radiated: from Frank Lloyd Wright’s Usonian houses to postwar mass-produced houses in America

  • Hyon-Sob Kim


Flexible housing: opportunities and limits

  • Tatjana Schneider, Jeremy Till
  • arq: Architectural Research Quarterly, Volume 9, Issue 2


ARENA Journal of Architectural Research


About this journal

AJAR is an online Open Access peer-reviewed journal for all kinds of design research and scholarly research within the architectural field, and has been set up by the Architectural Research European Network Association (ARENA). It welcomes the submission of essays by doctoral students and younger researchers as well as by established architects and academics. Content for the journal is organised under four sections: Design, Technology, Practice, Humanities.

Announcements

New joint editor-in-chief and 2022 AJAR update

Launched first in April 2016, AJAR has since been publishing a wide range of top-quality essays on a huge variety of architectural subjects. Our rigorous editing procedures mean that the acceptance rate is still only around 10% of the essays submitted to us. Our fully indexed articles have now reached over 40,000 individual article views and around 6,000 article downloads to date. We are open for innovative, deeply researched essays on any aspect of architecture.  


Archnet-IJAR: International Journal of Architectural Research


Before you start

For queries relating to the status of your paper pre decision, please contact the Editor or Journal Editorial Office. For queries post acceptance, please contact the Supplier Project Manager. These details can be found in the Editorial Team section.

Author responsibilities

Our goal is to provide you with a professional and courteous experience at each stage of the review and publication process. There are also some responsibilities that sit with you as the author. Our expectation is that you will:

  • Respond swiftly to any queries during the publication process.
  • Be accountable for all aspects of your work. This includes investigating and resolving any questions about accuracy or research integrity.
  • Treat communications between you and the journal editor as confidential until an editorial decision has been made.
  • Include anyone who has made a substantial and meaningful contribution to the submission (anyone else involved in the paper should be listed in the acknowledgements).
  • Exclude anyone who hasn’t contributed to the paper, or who has chosen not to be associated with the research.
  • In accordance with COPE’s position statement on AI tools, Large Language Models cannot be credited with authorship as they are incapable of conceptualising a research design without human direction and cannot be accountable for the integrity, originality, and validity of the published work. The author(s) must describe the content created or modified as well as appropriately cite the name and version of the AI tool used; any additional works drawn on by the AI tool should also be appropriately cited and referenced. Standard tools that are used to improve spelling and grammar are not included within the parameters of this guidance. The Editor and Publisher reserve the right to determine whether the use of an AI tool is permissible.
  • If your article involves human participants, you must ensure you have considered whether or not you require ethical approval for your research, and include this information as part of your submission. Find out more about informed consent.

Generative AI usage key principles

  • Copywriting any part of an article using a generative AI tool/LLM would not be permissible, including the generation of the abstract or the literature review, for as per Emerald’s authorship criteria, the author(s) must be responsible for the work and accountable for its accuracy, integrity, and validity.
  • The generation or reporting of results using a generative AI tool/LLM is not permissible, for as per Emerald’s authorship criteria, the author(s) must be responsible for the creation and interpretation of their work and accountable for its accuracy, integrity, and validity.
  • The in-text reporting of statistics using a generative AI tool/LLM is not permissible due to concerns over the authenticity, integrity, and validity of the data produced, although the use of such a tool to aid in the analysis of the work would be permissible.
  • Copy-editing an article using a generative AI tool/LLM in order to improve its language and readability would be permissible as this mirrors standard tools already employed to improve spelling and grammar, and uses existing author-created material, rather than generating wholly new content, while the author(s) remains responsible for the original work.
  • The submission and publication of images created by AI tools or large-scale generative models is not permitted.

Research and publishing ethics

Our editors and employees work hard to ensure the content we publish is ethically sound. To help us achieve that goal, we closely follow the advice laid out in the guidelines and flowcharts on the COPE (Committee on Publication Ethics) website.

We have also developed our research and publishing ethics guidelines. If you haven’t already read these, we urge you to do so – they will help you avoid the most common publishing ethics issues.

A few key points:

  • Any manuscript you submit to this journal should be original. That means it should not have been published before in its current, or similar, form. Exceptions to this rule are outlined in our pre-print and conference paper policies. If any substantial element of your paper has been previously published, you need to declare this to the journal editor upon submission. Please note, the journal editor may use Crossref Similarity Check to check on the originality of submissions received. This service compares submissions against a database of 49 million works from 800 scholarly publishers.
  • Your work should not have been submitted elsewhere and should not be under consideration by any other publication.
  • If you have a conflict of interest, you must declare it upon submission; this allows the editor to decide how they would like to proceed. Read about conflict of interest in our research and publishing ethics guidelines.
  • By submitting your work to Emerald, you are guaranteeing that the work is not in infringement of any existing copyright.
  • If you have written about a company/individual/organisation in detail using information that is not publicly available, have spent time within that company/organisation, or the work features named/interviewed employees, you will need to clear permission by using the  consent to publish form ; please also see our permissions guidance for full details. If you have to clear permission with the company/individual/organisation, consent must be given either by the named individual in question or their representative, a board member of the company/organisation, or a HR department representative of the company/organisation.
  • You have an ethical obligation and responsibility to conduct your research in adherence to national and international research ethics guidelines, as well as the ethical principles outlined by your discipline and any relevant authorities, and to be transparent about your research methods in such a way that all involved in the publication process may fairly and appropriately evaluate your work. For all research involving human participants, you must ensure that you have obtained informed consent, meaning that you must inform all participants in your work (or their legal representative) as to why the research is being conducted, whether their anonymity is protected, how their data will be stored and used, and whether there are any associated risks from participation in the study; the submitted work must confirm that informed consent was obtained and detail how this was addressed in accordance with our policy on informed consent.
  • Where appropriate, you must provide an ethical statement within the submitted work confirming that your research received institutional and national (or international) ethical approval, and that it complies with all relevant guidelines and regulations for studies involving humans, whether that be data, individuals, or samples. Specifically, the statement should contain the name and location of the institutional ethics reviewing committee or review board, the approval number, the date of approval, and the details of the national or international guidelines that were followed, as well as any other relevant information. You should also include details of how the work adheres to relevant consent guidelines along with confirming that informed consent was secured for all participants. The details of these statements should ensure that author and participant anonymity is not compromised. Any work submitted without a suitable ethical statement and details of informed consent for all participants, where required, will be returned to the authors and will not be considered further until appropriate and clear documentation is provided. Emerald reserves the right to reject work without sufficient evidence of informed consent from human participants and ethical approval where required.

Third party copyright permissions

Prior to article submission, you need to ensure you’ve applied for, and received, written permission to use any material in your manuscript that has been created by a third party. Please note, we are unable to publish any article that still has permissions pending. The rights we require are:

  • Non-exclusive rights to reproduce the material in the article or book chapter.
  • Print and electronic rights.
  • Worldwide English-language rights.
  • To use the material for the life of the work. That means there should be no time restrictions on its re-use e.g. a one-year licence.

We are a member of the International Association of Scientific, Technical, and Medical Publishers (STM) and participate in the STM permissions guidelines, a reciprocal free exchange of material with other STM publishers. In some cases, this may mean that you don’t need permission to re-use content. If so, please highlight this at the submission stage.

Please take a few moments to read our guide to publishing permissions  to ensure you have met all the requirements, so that we can process your submission without delay.

Open access submissions and information

All our journals currently offer two open access (OA) publishing paths: gold open access and green open access.

If you would like to, or are required to, make the branded publisher PDF (also known as the version of record) freely available immediately upon publication, you can select the gold open access route once your paper is accepted. 

If you’ve chosen to publish gold open access, this is the point you will be asked to pay the APC (article processing charge). This varies per journal and can be found on our APC price list or on the editorial system at the point of submission. Your article will be published with a Creative Commons CC BY 4.0 user licence, which outlines how readers can reuse your work.

Alternatively, if you would like to, or are required to, publish open access but your funding doesn’t cover the cost of the APC, you can choose the green open access, or self-archiving, route. As soon as your article is published, you can make the author accepted manuscript (the version accepted for publication) openly available, free from payment and embargo periods.

You can find out more about our open access routes, our APCs and waivers and read our FAQs on our open research page. 


Transparency and Openness Promotion (TOP) Guidelines

We are a signatory of the Transparency and Openness Promotion (TOP) Guidelines, a framework that supports the reproducibility of research through the adoption of transparent research practices. That means we encourage you to:

  • Cite and fully reference all data, program code, and other methods in your article.
  • Include persistent identifiers, such as a Digital Object Identifier (DOI), in references for datasets and program codes. Persistent identifiers ensure future access to unique published digital objects, such as a piece of text or datasets. Persistent identifiers are assigned to datasets by digital archives, such as institutional repositories and partners in the Data Preservation Alliance for the Social Sciences (Data-PASS).
  • Follow appropriate international and national procedures with respect to data protection, rights to privacy and other ethical considerations, whenever you cite data. For further guidance please refer to our research and publishing ethics guidelines. For an example on how to cite datasets, please refer to the references section below.

Prepare your submission

Manuscript support services

We are pleased to partner with Editage, a platform that connects you with relevant experts in language support, translation, editing, visuals, consulting, and more. After you’ve agreed a fee, they will work with you to enhance your manuscript and get it submission-ready.

This is an optional service for authors who feel they need a little extra support. It does not guarantee your work will be accepted for review or publication.


Manuscript requirements

Before you submit your manuscript, it’s important you read and follow the guidelines below. You will also find some useful tips in our structure your journal submission how-to guide.

Article files should be provided in Microsoft Word format.

While you are welcome to submit a PDF of the document alongside the Word file, PDFs alone are not acceptable. LaTeX files can also be used but only if an accompanying PDF document is provided. Acceptable figure file types are listed further below.

Articles should be up to a maximum of 10000 words in length. This includes all text, for example, the structured abstract, references, all text in tables, and figures and appendices.

 

Please allow 280 words for each figure or table.
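The length rule above is simple arithmetic, and a short script can help check a draft before submission. The following is a minimal sketch only: it assumes the 280-word allowance is added once per figure and once per table on top of the text count, and it counts words by splitting on whitespace, which may differ from a word processor's count. The function names are hypothetical and not part of any journal tooling.

```python
WORD_LIMIT = 10_000               # maximum article length stated in the guidelines
WORDS_PER_FIGURE_OR_TABLE = 280   # allowance stated in the guidelines


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an approximation of a word count)."""
    return len(text.split())


def estimated_length(text: str, n_figures: int, n_tables: int) -> int:
    """Text word count plus the fixed allowance for each figure and table.

    Assumption: the allowance applies once per figure and once per table.
    """
    allowance = WORDS_PER_FIGURE_OR_TABLE * (n_figures + n_tables)
    return count_words(text) + allowance


def within_limit(text: str, n_figures: int, n_tables: int) -> bool:
    """Return True if the estimated length does not exceed the word limit."""
    return estimated_length(text, n_figures, n_tables) <= WORD_LIMIT
```

Under these assumptions, a manuscript with 8,000 words of text, five figures, and two tables comes to 8,000 + 7 × 280 = 9,960 words, which is just inside the limit.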

A concisely worded title should be provided.

The names of all contributing authors should be added to the ScholarOne submission; please list them in the order in which you’d like them to be published. Each contributing author will need their own ScholarOne author account, from which we will extract the following details:

  • Email address (institutional preferred).
  • Full name. We will reproduce it exactly, so any middle names and/or initials they want featured must be included.
  • Affiliation. This should be where they were based when the research for the paper was conducted.

In multi-authored papers, it’s important that ALL authors that have made a significant contribution to the paper are listed. Those who have provided support but have not contributed to the research should be featured in an acknowledgements section. You should never include people who have not contributed to the paper or who don’t want to be associated with the research. Read about our criteria for authorship.

If you want to include author biographies or acknowledgements, save them in a separate Microsoft Word document and upload the file with your submission. Where they are included, a brief professional biography of not more than 100 words should be supplied for each named author.

Your article must reference all sources of external research funding in the acknowledgements section. You should describe the role of the funder or financial sponsor in the entire research process, from study design to submission.

All submissions must include a structured abstract, following the format outlined below.

These four sub-headings and their accompanying explanations must always be included:

The following three sub-headings are optional and can be included, if applicable:


You can find some useful tips in our  how-to guide.

The maximum length of your abstract should be 250 words in total, including keywords and article classification (see the sections below).

Your submission should include up to 12 appropriate and short keywords that capture the principal topics of the paper. Our how-to guide contains some practical guidance on choosing search-engine friendly keywords.

Please note, while we will always try to use the keywords you’ve suggested, the in-house editorial team may replace some of them with matching terms to ensure consistency across publications and improve your article’s visibility.

During the submission process, you will be asked to select a type for your paper; the options are listed below. If you don’t see an exact match, please choose the best fit:

You will also be asked to select a category for your paper. The options for this are listed below. If you don’t see an exact match, please choose the best fit:

 Reports on any type of research undertaken by the author(s), including:

 Covers any paper where content is dependent on the author's opinion and interpretation. This includes journalistic and magazine-style pieces.

 Describes and evaluates technical products, processes or services.

 Focuses on developing hypotheses and is usually discursive. Covers philosophical discussions and comparative studies of other authors’ work and thinking.

 Describes actual interventions or experiences within organizations. It can be subjective and doesn’t generally report on research. Also covers a description of a legal case or a hypothetical case study used as a teaching exercise.

 This category should only be used if the main purpose of the paper is to annotate and/or critique the literature in a particular field. It could be a selective bibliography providing advice on information sources, or the paper may aim to cover the main contributors to the development of a topic and explore their different views.

 Provides an overview or historical examination of some concept, technique or phenomenon. Papers are likely to be more descriptive or instructional (‘how to’ papers) than discursive.

Headings must be concise, with a clear indication of the required hierarchy. 

The preferred format is for first level headings to be in bold, and subsequent sub-headings to be in medium italics.

Notes or endnotes should only be used if absolutely necessary. They should be identified in the text by consecutive numbers enclosed in square brackets. These numbers should then be listed, and explained, at the end of the article.

All figures (charts, diagrams, line drawings, webpages/screenshots, and photographic images) should be submitted electronically. Both colour and black and white files are accepted.

There are a few other important points to note:

Tables should be typed and submitted in a separate file to the main body of the article. The position of each table should be clearly labelled in the main body of the article with corresponding labels clearly shown in the table file. Tables should be numbered consecutively in Roman numerals (e.g. I, II, etc.).

Give each table a brief title. Ensure that any superscripts or asterisks are shown next to the relevant items and have explanations displayed as footnotes to the table, figure or plate.

Where tables, figures, appendices, and other additional content are supplementary to the article but not critical to the reader’s understanding of it, you can choose to host these supplementary files alongside your article on Insight, Emerald’s content-hosting platform (this is Emerald's recommended option as we are able to ensure the data remain accessible), or on an alternative trusted online repository. All supplementary material must be submitted prior to acceptance.

Emerald recommends that authors use the following two lists when searching for a suitable and trusted repository:

   

If you choose to host your supplementary files on Insight, you must submit these as separate files alongside your article. Files should be clearly labelled in such a way that makes it clear they are supplementary; Emerald recommends that the file name is descriptive and that it follows the format ‘Supplementary_material_appendix_1’ or ‘Supplementary tables’. All supplementary material must be mentioned at the appropriate moment in the main text of the article; there is no need to include the content of the file, only the file name. A link to the supplementary material will be added to the article during production, and the material will be made available alongside the main text of the article at the point of EarlyCite publication.

Please note that Emerald will not make any changes to the material; it will not be copy-edited or typeset, and authors will not receive proofs of this content. Emerald therefore strongly recommends that you style all supplementary material ahead of acceptance of the article.

Emerald Insight can host the following file types and extensions:

If you choose to use an alternative trusted online repository, you should ensure that the supplementary material is hosted on the repository ahead of submission, and then include only a link to the repository within the article. It is the responsibility of the submitting author to ensure that the material is free to access and that it remains permanently available. Where an alternative trusted online repository is used, the files hosted should always be presented as read-only. Please be aware that such usage risks compromising your anonymity during the review process if the repository contains any information that may enable the reviewer to identify you; we therefore recommend that all links to alternative repositories are reviewed carefully prior to submission.

Please note that extensive supplementary material may be subject to peer review; this is at the discretion of the journal Editor and dependent on the content of the material (for example, whether including it would support the reviewer making a decision on the article during the peer review process).

All references in your manuscript must be formatted using one of the recognised Harvard styles. You are welcome to use the Harvard style Emerald has adopted – we’ve provided a detailed guide below. Want to use a different Harvard style? That’s fine; our typesetters will make any necessary changes to your manuscript if it is accepted. Please ensure you check all your citations for completeness, accuracy and consistency.

References to other publications in your text should be written as follows:

Please note, ‘et al.’ should always be written in italics.

A few other style points. These apply to both the main body of text and your final list of references.

At the end of your paper, please supply a reference list in alphabetical order using the style guidelines below. Where a DOI is available, this should be included at the end of the reference.

Surname, initials (year),  , publisher, place of publication.

e.g. Harrow, R. (2005),  , Simon & Schuster, New York, NY.

Surname, initials (year), "chapter title", editor's surname, initials (Ed.), , publisher, place of publication, page numbers.

e.g. Calabrese, F.A. (2005), "The early pathways: theory to practice – a continuum", Stankosky, M. (Ed.),  , Elsevier, New York, NY, pp.15-20.

Surname, initials (year), "title of article",  , volume issue, page numbers.

e.g. Capizzi, M.T. and Ferguson, R. (2005), "Loyalty trends for the twenty-first century",  , Vol. 22 No. 2, pp.72-80.

Surname, initials (year of publication), "title of paper", in editor’s surname, initials (Ed.),  , publisher, place of publication, page numbers.

e.g. Wilde, S. and Cox, C. (2008), “Principal factors contributing to the competitiveness of tourism destinations at varying stages of development”, in Richardson, S., Fredline, L., Patiar, A. and Ternel, M. (Eds),  , Griffith University, Gold Coast, Qld, pp.115-118.

Surname, initials (year), "title of paper", paper presented at [name of conference], [date of conference], [place of conference], available at: URL if freely available on the internet (accessed date).

e.g. Aumueller, D. (2005), "Semantic authoring and retrieval within a wiki", paper presented at the European Semantic Web Conference (ESWC), 29 May-1 June, Heraklion, Crete, available at: http://dbs.uni-leipzig.de/file/aumueller05wiksar.pdf (accessed 20 February 2007).

Surname, initials (year), "title of article", working paper [number if available], institution or organization, place of organization, date.

e.g. Moizer, P. (2003), "How published academic research can inform policy decisions: the case of mandatory rotation of audit appointments", working paper, Leeds University Business School, University of Leeds, Leeds, 28 March.

 (year), "title of entry", volume, edition, title of encyclopaedia, publisher, place of publication, page numbers.

e.g.   (1926), "Psychology of culture contact", Vol. 1, 13th ed., Encyclopaedia Britannica, London and New York, NY, pp.765-771.

(for authored entries, please refer to book chapter guidelines above)

Surname, initials (year), "article title",  , date, page numbers.

e.g. Smith, A. (2008), "Money for old rope",  , 21 January, pp.1, 3-4.

 (year), "article title", date, page numbers.

e.g.   (2008), "Small change", 2 February, p.7.

Surname, initials (year), "title of document", unpublished manuscript, collection name, inventory record, name of archive, location of archive.

e.g. Litman, S. (1902), "Mechanism & Technique of Commerce", unpublished manuscript, Simon Litman Papers, Record series 9/5/29 Box 3, University of Illinois Archives, Urbana-Champaign, IL.

If available online, the full URL should be supplied at the end of the reference, as well as the date that the resource was accessed.

Surname, initials (year), “title of electronic source”, available at: persistent URL (accessed date month year).

e.g. Weida, S. and Stolley, K. (2013), “Developing strong thesis statements”, available at: https://owl.english.purdue.edu/owl/resource/588/1/ (accessed 20 June 2018)

Standalone URLs, i.e. those without an author or date, should be included either inside parentheses within the main text, or preferably set as a note (Roman numeral within square brackets within text followed by the full URL address at the end of the paper).

Surname, initials (year),  , name of data repository, available at: persistent URL, (accessed date month year).

e.g. Campbell, A. and Kahn, R.L. (2015),  , ICPSR07218-v4, Inter-university Consortium for Political and Social Research (distributor), Ann Arbor, MI, available at: https://doi.org/10.3886/ICPSR07218.v4 (accessed 20 June 2018)

Submit your manuscript

There are a number of key steps you should follow to ensure a smooth and trouble-free submission.

Double check your manuscript

Before submitting your work, it is your responsibility to check that the manuscript is complete, grammatically correct, and without spelling or typographical errors. A few other important points:

  • Give the journal aims and scope a final read. Is your manuscript definitely a good fit? If it isn’t, the editor may decline it without peer review.
  • Does your manuscript comply with our research and publishing ethics guidelines?
  • Have you cleared any necessary publishing permissions?
  • Have you followed all the formatting requirements laid out in these author guidelines?
  • If you need to refer to your own work, use wording such as ‘previous research has demonstrated’ not ‘our previous research has demonstrated’.
  • If you need to refer to your own, currently unpublished work, don’t include this work in the reference list.
  • Any acknowledgments or author biographies should be uploaded as separate files.
  • Carry out a final check to ensure that no author names appear anywhere in the manuscript. This includes in figures or captions.

You will find a helpful submission checklist on the website Think.Check.Submit.

The submission process

All manuscripts should be submitted through our editorial system by the corresponding author.

The only way to submit to the journal is through the journal’s ScholarOne site as accessed via the Emerald website, and not by email or through any third-party agent/company, journal representative, or website. Submissions should be done directly by the author(s) through the ScholarOne site and not via a third-party proxy on their behalf.

A separate author account is required for each journal you submit to. If this is your first time submitting to this journal, please choose the Create an account or Register now option in the editorial system. If you already have an Emerald login, you are welcome to reuse the existing username and password here.

Please note, the next time you log into the system, you will be asked for your username. This will be the email address you entered when you set up your account.

Don't forget to add your ORCiD ID during the submission process. It will be embedded in your published article, along with a link to the ORCiD registry allowing others to easily match you with your work.

Don’t have one yet? It only takes a few moments to register for a free ORCiD identifier.

Visit the ScholarOne support centre for further help and guidance.

What you can expect next

You will receive an automated email from the journal editor, confirming your successful submission. It will provide you with a manuscript number, which will be used in all future correspondence about your submission. If you have any reason to suspect the confirmation email you receive might be fraudulent, please contact the journal editor in the first instance.

Post submission

Review and decision process

Each submission is checked by the editor. At this stage, they may choose to decline or unsubmit your manuscript if it doesn’t fit the journal aims and scope, or they feel the language/manuscript quality is too low.

If they think it might be suitable for the publication, they will send it to at least two independent referees for double anonymous peer review.  Once these reviewers have provided their feedback, the editor may decide to accept your manuscript, request minor or major revisions, or decline your work.

While all journals work to different timescales, the goal is that the editor will inform you of their first decision within 60 days.

During this period, we will send you automated updates on the progress of your manuscript via our submission system, or you can log in to check on the current status of your paper.  Each time we contact you, we will quote the manuscript number you were given at the point of submission. If you receive an email that does not match these criteria, it could be fraudulent and we recommend you contact the journal editor in the first instance.

Manuscript transfer service

Emerald’s manuscript transfer service takes the pain out of the submission process if your manuscript doesn’t fit your initial journal choice. Editors from participating journals work together to identify alternative journals that better align with your research, so that your work can find a suitable publication home.

If a journal is participating in the manuscript transfer program, the Editor has the option to recommend your paper for transfer. If a transfer decision is made by the Editor, you will receive an email with the details of the recommended journal and the option to accept or reject the transfer. It’s always down to you as the author to decide if you’d like to accept. If you do accept, your paper and any reviewer reports will automatically be transferred to the recommended journal. You will then confirm your resubmission in the new journal’s ScholarOne system.

Our Manuscript Transfer Service page has more information on the process.

If your submission is accepted

Open access

Once your paper is accepted, you will have the opportunity to indicate whether you would like to publish your paper via the gold open access route.

If you’ve chosen to publish gold open access, this is the point you will be asked to pay the APC (article processing charge). This varies per journal and can be found on our APC price list or on the editorial system at the point of submission. Your article will be published with a Creative Commons CC BY 4.0 user licence, which outlines how readers can reuse your work.

For UK journal article authors - if you wish to submit your work accepted by Emerald to REF 2021, you must make a ‘closed deposit’ of your accepted manuscript to your respective institutional repository upon acceptance of your article. Articles accepted for publication after 1st April 2018 should be deposited as soon as possible, but no later than three months after the acceptance date. For further information and guidance, please refer to the REF 2021 website.

All accepted authors are sent an email with a link to a licence form.  This should be checked for accuracy, for example whether contact and affiliation details are up to date and your name is spelled correctly, and then returned to us electronically. If there is a reason why you can’t assign copyright to us, you should discuss this with your journal content editor. You will find their contact details on the editorial team section above.

Proofing and typesetting

Once we have received your completed licence form, the article will pass directly into the production process. We will carry out editorial checks, copyediting, and typesetting and then return proofs to you (if you are the corresponding author) for your review. This is your opportunity to correct any typographical errors, grammatical errors or incorrect author details. We can’t accept requests to rewrite texts at this stage.

When the page proofs are finalised, the fully typeset and proofed version of record is published online. This is referred to as the EarlyCite version. While an EarlyCite article has yet to be assigned to a volume or issue, it does have a digital object identifier (DOI) and is fully citable. It will be compiled into an issue according to the journal’s issue schedule, with papers being added by chronological date of publication.

How to share your paper

Visit our author rights page to find out how you can reuse and share your work.

To find tips on increasing the visibility of your published paper, read about how to promote your work.

Correcting inaccuracies in your published paper

Sometimes errors are made during the research, writing and publishing processes. When these issues arise, we have the option of withdrawing the paper or introducing a correction notice. Find out more about our article withdrawal and correction policies.

Need to make a change to the author list? See our frequently asked questions (FAQs) below.

Frequently asked questions

The only time we will ever ask you for money to publish in an Emerald journal is if you have chosen to publish via the gold open access route. You will be asked to pay an APC (article-processing charge) once your paper has been accepted (unless it is a sponsored open access journal), and never at submission.

At no other time will you be asked to contribute financially towards your article’s publication, processing, or review. If you haven’t chosen gold open access and you receive an email that appears to be from Emerald, the journal, or a third party, asking you for payment to publish, please contact our support team.

Please contact the editor for the journal, with a copy of your CV. You will find their contact details on the editorial team tab on this page.

Typically, papers are added to an issue according to their date of publication. If you would like to know in advance which issue your paper will appear in, please contact the content editor of the journal. You will find their contact details on the editorial team tab on this page. Once your paper has been published in an issue, you will be notified by email.

Please email the journal editor – you will find their contact details on the editorial team tab on this page. If you ever suspect an email you’ve received from Emerald might not be genuine, you are welcome to verify it with the content editor for the journal, whose contact details can be found on the editorial team tab on this page.

If you’ve read the aims and scope on the journal landing page and are still unsure whether your paper is suitable for the journal, please email the editor and include your paper's title and structured abstract. They will be able to advise on your manuscript’s suitability. You will find their contact details on the Editorial team tab on this page.

Authorship and the order in which the authors are listed on the paper should be agreed prior to submission. We have a right-first-time policy on this, and no changes can be made to the list once submitted. If you have made an error in the submission process, please email the Journal Editorial Office, who will look into your request – you will find their contact details on the editorial team tab on this page.

Editor-in-Chief

  • Professor Ashraf M. Salama University of Northumbria, Newcastle upon Tyne - UK

Editorial Assistant

  • Heather Montgomery Archnet-IJAR - UK

Regional Editor

  • Dr Abeer Allahham (Middle East and Africa) Imam Abdulrahman Bin Faisal University - Saudi Arabia
  • Dr Natasa Cukovic Ignjatovic (Europe) University of Belgrade - Serbia
  • Dr Tammy Gaber (North America) Laurentian University - Canada
  • Professor Doris Kowaltowski (South America) University of Campinas - Brazil
  • Dr Beatriz Maturana (South America) University of Chile - Chile
  • Dr Nabil Mohareb (Middle East and Africa) American University in Cairo - Egypt
  • Dr Lindy Osbourne Burton (Australia) Queensland University of Technology - Australia
  • Dr Hazem Rashed-Ali (North America) Texas Tech University, Lubbock - USA
  • Dr Gehan Selim (Europe) University of Leeds - UK
  • Dr Norsidah Ujang (Asia) University of Putra Malaysia - Malaysia
  • Dr Florian Wiedmann (Europe) University of Nottingham - UK
  • Paul Kidd Emerald Publishing - UK

Journal Editorial Office (For queries related to pre-acceptance)

  • Mahim Kaushal Emerald Publishing

Supplier Project Manager (For queries related to post-acceptance)

  • Subha Sri Aneesh Emerald Publishing

International Advisory Board

  • Professor Jamel Akbar Fatih Sultan Mehmet Vakif University - Turkey
  • Professor Nezar Al-Sayyad University of California, Berkeley - USA
  • Professor Chimay Anumba The University of Florida - USA
  • Professor Ruth Dalton University of Northumbria - UK
  • Professor Besim Hakim FAICP, AIA - USA
  • Professor Rahinah Ibrahim Universiti Putra Malaysia - Malaysia
  • Professor Paul Jones Northumbria University - UK
  • Professor Derya Oktay Maltepe Universitesi, Istanbul - Turkey
  • Professor Attilio Petruccioli Sapienza University of Rome - Italy
  • Professor Farzad Rahimian Professor of Digital Engineering and Manufacturing, Teesside University - UK
  • Professor Nikos Salingaros University of Texas, San Antonio - USA
  • Professor Flora Samuel University of Reading - UK
  • Distinguished Professor Henry Sanoff North Carolina State University - USA
  • Dr Sharon Smith Arizona State University - USA

Review Board

  • Dr Chaham Alalouch Sultan Qaboos University - Sultanate of Oman
  • Professor Ahmed Abd Elrahman Ain Shams University - Egypt
  • Dr Sherif Abdelmohsen American University in Cairo - Egypt
  • Dr Mona Abdelwahab Arab Academy for Science, Technology & Maritime Transport - Egypt
  • Dr Yasemin Afacan Bilkent University - Turkey
  • Dr Khaled Galal Ahmed United Arab Emirates University - UAEU
  • Mr Tarek Ahmed Northumbria University - UK
  • Dr Amer Al-Jokhadar University of Petra - Jordan
  • Professor Kheir Al-Kodmany University of Illinois Chicago - USA
  • Dr Husam AlWaer University of Dundee - UK
  • Dr Nadia Alaily-Mattar Technical University of Munich - Germany
  • Dr Sara Alsaadani Arab Academy for Science Technology and Maritime Transport - Egypt
  • Professor Hasim Altan Prince Mohammad bin Fahd University - Saudi Arabia
  • Professor Jane Anderson Oxford Brookes University - UK
  • Professor Yusuf Arayici University of Northumbria - UK
  • Professor Sahar A Attia Cairo University - Egypt
  • Dr Simona Azzali Prince Sultan University, Riyadh - KSA
  • Professor Samer Bagaeen 100 Resilient Cities / Rockefeller Foundation - UK
  • Professor Amar Bennadji Hanze University of Applied Science - Netherlands
  • Dr Malika Bose Pennsylvania State University, State College - USA
  • Dr James Brown Umea University - Sweden
  • Professor Hernan Casakin Ariel University - Israel
  • Dr Nadia Charalambous University of Cyprus - Cyprus
  • Dr Carin Combrinck University of Pretoria - South Africa
  • Professor Marwa Dabaieh Malmö University - Sweden
  • Dr Aparna Datey University of Queensland - Australia
  • Dr Hermie Delport University of Cape Town - South Africa
  • Professor Halime Demirkan Bilkent University - Turkey
  • Dr Mirjana Devetakovic University of Belgrade - Serbia
  • Professor Branka Dimitrijevic University of Strathclyde - UK
  • Professor Karine Dupre Griffith University, Queensland - Australia
  • Dr Marwa El-Ashmouni Beni-Suef University - Egypt
  • Professor Dalila El-Kerdany Cairo University - Egypt
  • Professor Ahmed O. El-Kholei Menoufia University - Egypt
  • Dr Amira Elnokaly Lincoln School of Architecture - UK
  • Dr Heba Elsharkawy Kingston University, London - UK
  • Professor Abeer Elshater Ain Shams University - Egypt
  • Professor Roberto Fabbri Zayed University, Abu Dhabi - United Arab Emirates
  • Dr Alia Fadel Leeds Beckett University - UK
  • Professor Leen Fakhoury German Jordanian University - Jordan
  • Professor Nisha Fernando University of Kansas, Lawrence - USA
  • Professor Hisham Gabr Cairo University - Egypt
  • Professor Mohamed Gamal Abdelmonem Nottingham Trent University - UK
  • Professor Elsa Garavaglia Politecnico di Milano, Milan - Italy
  • Dr Remah Gharib American University in Cairo - Egypt
  • Dr Neveen Hamza Newcastle University - UK
  • Dr Selma Harrington MRIAI, HonAIA, Architects Council of Europe - Ireland
  • Professor Joseph Heathcott The New School, New York - USA
  • Dr Fahriye Hilal Halicioglu Dokuz Eylul University - Turkey
  • Professor Justin B. Hollander Tufts University, Massachusetts - USA
  • Dr Suha Jaradat Edinburgh Napier University - UK
  • Dr Jiayi Jin Northumbria University - UK
  • Dr Matthew Jones University of West England - UK
  • Hesam Kamalipour Cardiff University - United Kingdom
  • Dr Orcun Kepez OKDW: Orcun Kepez Design Workshop - Turkey
  • Professor Heba Allah E. Khalil Cairo University - Egypt
  • Dr Smita Khan Visvesvaraya National Institute of Technology - India
  • Professor Mi Jeong Kim Hanyang University - Korea
  • Dr Michael Kleiss Clemson University - USA
  • Dr Georgia Lindsay University of Tasmania - Australia
  • Professor Fuad Mallick Brac University - Bangladesh
  • Dr Cameron McEwan University of Northumbria - UK
  • Professor Naglaa Megahed Port Said University - Egypt
  • Dr Biserka Mitrovic University of Belgrade - Serbia
  • Dr Jolanda Morkel STADIO Higher Education - South Africa
  • Professor Ibrahim Motawa Ulster University, Belfast - UK
  • Professor Ayman Othman The British University in Egypt - Egypt
  • Dr Celen Pasalar North Carolina State University - USA
  • Professor Ana Pereira Roders Delft University of Technology (TU Delft) - Netherlands
  • Joe Ravetz University of Manchester - United Kingdom
  • Professor Rabee Reffat Assiut University - Egypt
  • Hans Sagan University of Bergen - Norway
  • Dr Mahmoud Reza Saghafi University of Art, Esfahan - Iran
  • Professor Rachel Sara Birmingham City University - UK
  • Dr Urmi Sengupta Queen's University Belfast - UK
  • Dr Ahlam Ammar Sharif The Hashemite University - Jordan
  • Ahmed Soliman Concordia University - Canada
  • Dr Hadas Sopher Ariel University - Israel
  • Dr Ana Souto Nottingham Trent University - UK
  • Dr Ayse Torun Northumbria University - UK
  • Professor Hulya Turgut Istanbul Ozyegin University - Turkey

Citation metrics

CiteScore 2023

Further information

CiteScore is a simple way of measuring the citation impact of sources, such as journals.

Calculating the CiteScore is based on the number of citations to documents (articles, reviews, conference papers, book chapters, and data papers) by a journal over four years, divided by the number of the same document types indexed in Scopus and published in those same four years.
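The calculation described above is a simple ratio, which the following minimal Python sketch illustrates. The function name `cite_score` and the figures used are hypothetical and chosen only to show the arithmetic; they are not this journal's actual data.

```python
def cite_score(citations: int, documents: int) -> float:
    """Citations received over the four-year window, divided by the number
    of documents of the same types published in that window."""
    if documents <= 0:
        raise ValueError("documents must be a positive number")
    return citations / documents


# Hypothetical figures: 480 citations to 200 documents over four years.
score = cite_score(citations=480, documents=200)
print(f"{score:.1f}")  # prints 2.4
```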

For more information and methodology visit the Scopus definition

CiteScore Tracker 2024

(updated monthly)

CiteScore Tracker is calculated in the same way as CiteScore, but for the current year rather than previous, complete years.

The CiteScore Tracker calculation is updated every month, as a current indication of a title's performance.

2023 Impact Factor

The Journal Impact Factor is published each year by Clarivate Analytics. It is a measure of the number of times an average paper in a particular journal is cited during the preceding two years.

For more information and methodology see Clarivate Analytics

5-year Impact Factor (2023)

A base of five years may be more appropriate for journals in certain fields because the body of citations may not be large enough to make reasonable comparisons, or it may take longer than two years to publish and distribute leading to a longer period before others cite the work.

Actual value is intentionally only displayed for the most recent year. Earlier values are available in the Journal Citation Reports from Clarivate Analytics.

Publication timeline

Time to first decision

Time to first decision is expressed in days. The "first decision" occurs when the journal’s editorial team reviews the peer reviewers’ comments and recommendations. Based on this feedback, they decide whether to accept, reject, or request revisions for the manuscript.

Data is taken from submissions between 1st June 2023 and 31st May 2024

Acceptance to publication

Acceptance to publication, expressed in days, is the average time between the editorial team’s acceptance decision on the manuscript and the date of publication in the journal.

Data is taken from the previous 12 months (Last updated July 2024)

Acceptance rate

The acceptance rate is a measurement of how many manuscripts a journal accepts for publication compared to the total number of manuscripts submitted, expressed as a percentage.
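As a worked illustration of this percentage, a minimal Python sketch follows. The function name `acceptance_rate` and the counts are hypothetical; they do not reflect this journal's real submission figures.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Accepted manuscripts as a percentage of all manuscripts submitted."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive number")
    return 100.0 * accepted / submitted


# Hypothetical figures: 30 manuscripts accepted out of 200 submitted.
rate = acceptance_rate(accepted=30, submitted=200)
print(f"{rate:.1f}%")  # prints 15.0%
```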

Data is taken from submissions between 1st June 2023 and 31st May 2024.

This figure is the total number of downloads for all articles published as EarlyCite in the last 12 months.

(Last updated: July 2024)

This journal is abstracted and indexed by

  • Arts and Humanities Citation Index
  • British Library, United Kingdom
  • Avery Index to Architectural Periodicals 
  • EBSCO-Current Abstracts-Art and Architecture 
  • CNKI: China National Knowledge Infrastructure
  • Library of Congress, United States
  • Norwegian Register for Scientific Journals
  • ProQuest

Reviewer information

Peer review process

This journal engages in a double-anonymous peer review process, which strives to match the expertise of a reviewer with the submitted manuscript. Reviews are completed with evidence of thoughtful engagement with the manuscript, provide constructive feedback, and add value to the overall knowledge and information presented in the manuscript.

The mission of the peer review process is to achieve excellence and rigour in scholarly publications and research.

Our vision is to give voice to professionals in the subject area who contribute unique and diverse scholarly perspectives to the field.

The journal values diverse perspectives from the field and reviewers who provide critical, constructive, and respectful feedback to authors. Reviewers come from a variety of organizations, careers, and backgrounds from around the world.

All invitations to review, abstracts, manuscripts, and reviews should be kept confidential. Reviewers must not share their review or information about the review process with anyone without the agreement of the editors and authors involved, even after publication. This also applies to other reviewers’ “comments to author” which are shared with you on decision.

Resources to guide you through the review process

Discover practical tips and guidance on all aspects of peer review in our reviewers' section. See how being a reviewer could benefit your career, and discover what's involved in shaping a review.

Thank you to the 2023 Reviewers of Archnet-IJAR

The publishing and editorial teams would like to thank the following, for their invaluable service as 2023 reviewers for this journal. We are very grateful for the contributions made. With their help, the journal has been able to publish such high...

Thank you to the 2022 Reviewers of Archnet-IJAR: International Journal of Architectural Research

The publishing and editorial teams would like to thank the following, for their invaluable service as 2022 reviewers for this journal. We are very grateful for the contributions made. With their help, the journal has been able to publish such high...

Thank you to the 2021 Reviewers of Archnet-IJAR

The publishing and editorial teams would like to thank the following, for their invaluable service as 2021 reviewers for this journal. We are very grateful for the contributions made. With their help, the journal has ...

Literati awards

Archnet-IJAR - Literati Award Winners 2023

We are pleased to announce our 2023 Literati Award winners. Outstanding Paper Development of Special Needs Classroom ...

Archnet-IJAR - Literati Award Winners 2022

We are pleased to announce our 2022 Literati Award winners. Outstanding Paper Beyond the pandemic: the role...

Archnet-IJAR - Literati Award Winners 2021

We are pleased to announce our 2021 Literati Award winners. Outstanding Paper The impact of urban façade qu...

Archnet-IJAR is an interdisciplinary scholarly journal of architecture, urban design and planning, and built environment studies.

Signatory of DORA

Aims and scope

Archnet-IJAR: International Journal of Architectural Research (ARCH) aims at establishing a bridge between theory and practice in these fields. The journal acts as a platform that reports on the latest research findings for examining buildings and urban environments and debates innovative approaches for creating responsive environments.

Archnet-IJAR is truly international and aims at strengthening ties between scholars, academics, and practitioners from the global north and the global south with contributors and readers reaching across the boundaries of cultures and geographies.

Archnet-IJAR publishes articles in two broad areas that address a wide range of topics and scales:

  • Architectural and Design Research: involves a range of topics that include architectural pedagogy and design studio teaching practices; architectural and sustainable design; design methods and architectural theories; architectural criticism; design and project programming; environment-behaviour studies; application of information technologies; post-occupancy and facility performance evaluation; and social and cultural factors in design.
  • Cities and Urban Research: involves a range of topics that include governance and political factors contributing to the shaping of communities, cities and urban regions; community planning; sustainable urban conservation; environmental planning and eco-development; housing policy; planning, and design; new urbanism; everyday urbanism; sustainable development; urban design assessment; and urban studies.

Brief History

Archnet-IJAR was launched in March 2007 as part of Archnet, considered to be the most comprehensive digital platform for architects, planners, urban designers, interior designers, landscape architects, and scholars working in these fields, developed at the MIT School of Architecture and Planning in close co-operation with the Aga Khan Documentation Centre (AKDC) of MIT Libraries.

Professor Ashraf M. Salama has been leading Archnet-IJAR since its inception. In 2018, Archnet-IJAR was acquired by Emerald in order to foster its exposure and international appeal while enhancing its global presence.

Latest articles

These are the latest articles published in this journal (Last updated: July 2024)

Location-based examination of the characteristics of university campuses in Istanbul

The typology of school mapping synergies: a diagrammatic morphological evaluation of middle schools in Iran

Selecting applications to increase the efficiency of distance learning in architectural design

Top downloaded articles

These are the most downloaded articles over the last 12 months for this journal (Last updated: July 2024)

The Impact of ASPECTSS-based Design Intervention in Autism School Design: a Case Study

Identifying key urban design attributes for enhanced sense of safety - the case of El-Sherouk City in Cairo

The impact of biophilic design in university study areas on students' productivity

These are the top cited articles for this journal, from the last 12 months according to Crossref (Last updated: July 2024)

Sustainable Development Goals and the Future of Architectural Education -- Cultivating SDGs-Centred Architectural Pedagogies

Analytical hierarchy process for ranking green neighbourhood efforts in the Middle East and North Africa region

years of education and research-driven in sustainable architecture: where do we stand and where do we go

Related journals

This journal is part of our Property management & built environment collection. Explore our Property management & built environment subject area to find out more.  

Open House International: Sustainable & Smart Architecture and Urban Studies

Open House International: Sustainable & Smart Architecture and Urban Studies (OHI) is an interdisciplinary research...

Engineering, Construction and Architectural Management

CIB-encouraged, Engineering, Construction and Architectural Management publishes papers on global research breakthroughs...

Journal of Engineering, Design and Technology

Journal of Engineering, Design and Technology is a CIB-encouraged journal publishing research at the intersection of...

  • Review article
  • Open access
  • Published: 18 September 2020

Senses of place: architectural design for the multisensory mind

  • Charles Spence, ORCID: orcid.org/0000-0003-2111-072X

Cognitive Research: Principles and Implications volume 5, Article number: 46 (2020)

251k Accesses

89 Citations

32 Altmetric

Traditionally, architectural practice has been dominated by the eye/sight. In recent decades, though, architects and designers have increasingly started to consider the other senses, namely sound, touch (including proprioception, kinesthesis, and the vestibular sense), smell, and on rare occasions, even taste in their work. As yet, there has been little recognition of the growing understanding of the multisensory nature of the human mind that has emerged from the field of cognitive neuroscience research. This review therefore provides a summary of the role of the human senses in architectural design practice, both when considered individually and, more importantly, when studied collectively. For it is only by recognizing the fundamentally multisensory nature of perception that one can really hope to explain a number of surprising crossmodal environmental or atmospheric interactions, such as between lighting colour and thermal comfort and between sound and the perceived safety of public space. At the same time, however, the contemporary focus on synaesthetic design needs to be reframed in terms of the crossmodal correspondences and multisensory integration, at least if the most is to be made of multisensory interactions and synergies that have been uncovered in recent years. Looking to the future, the hope is that architectural design practice will increasingly incorporate our growing understanding of the human senses, and how they influence one another. Such a multisensory approach will hopefully lead to the development of buildings and urban spaces that do a better job of promoting our social, cognitive, and emotional development, rather than hindering it, as has too often been the case previously.

Significance statement

Architecture exerts a profound influence over our well-being, given that the majority of the world’s population living in urban areas spend something like 95% of their time indoors. However, the majority of architecture is designed for the eye of the beholder, and tends to neglect the non-visual senses of hearing, smell, touch, and even taste. This neglect may be partially to blame for a number of problems faced by many in society today including everything from sick-building syndrome (SBS) to seasonal affective disorder (SAD), not to mention the growing problem of noise pollution. However, in order to design buildings and environments that promote our health and well-being, it is necessary not only to consider the impact of the various senses on a building’s inhabitants, but also to be aware of the way in which sensory atmospheric/environmental cues interact. Multisensory perception research provides relevant insights concerning the rules governing sensory integration in the perception of objects and events. This review extends that approach to the understanding of how multisensory environments and atmospheres affect us, in part depending on how we cognitively interpret, and/or attribute, their sources. It is argued that the confusing notion of synaesthetic design should be replaced by an approach to multisensory congruency that is based on the emerging literature on crossmodal correspondences instead. Ultimately, the hope is that such a multisensory approach, in transitioning from the laboratory to the real world application domain of architectural design practice, will lead on to the development of buildings and urban spaces that do a better job of promoting our social, cognitive, and emotional development, rather than hindering it, as has too often been the case previously.

Introduction

We are visually dominant creatures (Hutmacher, 2019; Levin, 1993; Posner, Nissen, & Klein, 1976). That is, we all mostly tend to think, reason, and imagine visually. As Finnish architect Pallasmaa (1996) noted almost a quarter of a century ago in his influential work The eyes of the skin: Architecture and the senses, architects have traditionally been no different in this regard, designing primarily for the eye of the beholder (Bille & Sørensen, 2018; Pallasmaa, 1996, 2011; Rybczynski, 2001; Williams, 1980). Elsewhere, Pallasmaa (1994, p. 29) writes that: “The architecture of our time is turning into the retinal art of the eye. Architecture at large has become an art of the printed image fixed by the hurried eye of the camera.” The famous Swiss architect Le Corbusier (1991, p. 83) went even further in terms of his unapologetically oculocentric outlook, writing that: “I exist in life only if I can see”, going on to state that: “I am and I remain an impenitent visual—everything is in the visual” and “one needs to see clearly in order to understand”. Commenting on the current situation, Canadian designer Bruce Mau put it thus: “We have allowed two of our sensory domains—sight and sound—to dominate our design imagination. In fact, when it comes to the culture of architecture and design, we create and produce almost exclusively for one sense—the visual.” (Mau, 2018, p. 20; see also Blesser & Salter, 2007).

Such visual dominance makes sense or, at the very least, can be explained or accounted for neuroscientifically (Hutmacher, 2019; Meijer, Veselič, Calafiore, & Noppeney, 2019). After all, it turns out that far more of our brains are given over to the processing of what we see than to dealing with the information from any of our other senses (Gallace, Ngo, Sulaitis, & Spence, 2012). For instance, according to Felleman and Van Essen (1991), more than half of the cortex is engaged in the processing of visual information (see also Eberhard, 2007, p. 49; Palmer, 1999, p. 24; though note that others believe that the figure is closer to one third). This figure compares to something like just 12% of the cortex primarily dedicated to touch, around 3% to hearing, and less than 1% given over to the processing of the chemical senses of smell and taste. [Footnote 1] Information theorists such as Zimmermann (1989) arrived at a similar hierarchy, albeit with a somewhat different weighting for each of the five main senses. In particular, Zimmermann estimated a channel capacity (in bits/s) of 10^7 for vision, 10^6 for touch, 10^5 for hearing and olfaction, and 10^3 for taste (gustation).

Figure 1 schematically illustrates the hierarchy of attentional capture by each of the senses as envisioned by Morton Heilig, the inventor of the Sensorama, the world’s first multisensory virtual reality apparatus (Heilig, 1962), when writing about the multisensory future of cinema in an article first published in 1955 (see Heilig, 1992). Nevertheless, while commentators from many different disciplines would seem to agree on vision’s current pre-eminence, one cannot help but wonder what has been lost as a result of the visual dominance that one sees wherever one looks in the world of architecture (“see” and “look” being especially apposite terms here).

Fig. 1. Heilig (1992) ranked the order in which he believed our attention to be captured by the various senses. According to Heilig’s rankings: vision, 70%; audition, 20%; olfaction, 5%; touch, 4%; and taste, 1%. Does the same hierarchy (and weighting) apply to our appreciation of architecture, one might wonder? And is attentional capture the most relevant metric anyway?

While the hegemony of the visual (see Levin, 1993) is a phenomenon that appears across most aspects of our daily lives, the very ubiquity of this phenomenon certainly does not mean that the dominance of the visual should not be questioned (e.g., Dunn, 2017; Hutmacher, 2019). For, as Finnish architect and theoretician Pallasmaa (2011, p. 595) notes: “Spaces, places, and buildings are undoubtedly encountered as multisensory lived experiences. Instead of registering architecture merely as visual images, we scan our settings by the ears, skin, nose, and tongue.” Elsewhere, he writes that: “Architecture is the art of reconciliation between ourselves and the world, and this mediation takes place through the senses” (Pallasmaa, 1996, p. 50; see also Böhme, 2013). We will return later to question the visual dominance account, highlighting how our experience of space, as of anything else, is much more multisensory than most people realize.

Review outline

While architectural practice has traditionally been dominated by the eye/sight, a growing number of architects and designers have, in recent decades, started to consider the role played by the other senses, namely sound, touch (including proprioception, kinesthesis, and the vestibular sense), smell, and, on rare occasions, even taste. It is, then, clearly important that we move beyond the merely visual (not to mention modular) focus in architecture that has been identified in the writings of Juhani Pallasmaa and others, to consider the contribution that is made by each of the other senses (e.g., Eberhard, 2007; Malnar & Vodvarka, 2004). Reviewing this literature constitutes the subject matter of the next section. However, beyond that, it is also crucial to consider the ways in which the senses interact too. As will be stressed later, to date there has been relatively little recognition of the growing understanding of the multisensory nature of the human mind that has emerged from the field of cognitive neuroscience research in recent decades (e.g., Calvert, Spence, & Stein, 2004; Stein, 2012).

The principal aim of this review is therefore to provide a summary of the role of the human senses in architectural design practice, both when considered individually and, more importantly, when the senses are studied collectively. For it is only by recognizing the fundamentally multisensory nature of perception that one can really hope to explain a number of surprising crossmodal environmental or atmospheric interactions, such as between lighting colour and thermal comfort (Spence, 2020a) or between sound and the perceived safety of public spaces (Sayin, Krishna, Ardelet, Decré, & Goudey, 2015), that have been reported in recent years.

At the same time, however, this review also highlights how the contemporary focus on synaesthetic design in architecture (see Pérez-Gómez, 2016) needs to be reframed in terms of the crossmodal correspondences (see Spence, 2011, for a review), at least if the most is to be made of multisensory interactions and synergies that affect us all. Later, I want to highlight how accounts of multisensory interactions in architecture in terms of synaesthesia tend to confuse matters, rather than to clarify them. Accounting for our growing understanding of crossmodal interactions (specifically the emerging field of crossmodal correspondences research) and multisensory integration will help to explain how it is that our senses conjointly contribute to delivering our multisensory (and not just visual) experience of space. One other important issue that will be discussed later is the role played by our awareness of the multisensory atmosphere of the indoor environments in which we spend so much of our time.

Looking to the future, the hope is that architectural design practice will increasingly incorporate our growing understanding of the human senses, and how they influence one another. Such a multisensory approach will hopefully lead to the development of buildings and urban spaces that do a better job of promoting our social, cognitive, and emotional development, rather than hindering it, as has too often been the case previously. Before going any further, though, it is worth highlighting a number of the negative outcomes for our well-being that have been linked to the sensory aspects of the environments in which we spend so much of our time.

Negative health consequences of neglecting multisensory stimulation

It has been suggested that the rise in sick building syndrome (SBS) in recent decades (Love, 2018) can be put down to neglect of the olfactory aspect of the interior environments where city dwellers have been estimated to spend 95% of their lives (e.g., Ott & Roberts, 1998; Velux YouGov Report, 2018; Wargocki, 2001). Indeed, as of 2010, more people around the globe lived in cities than lived in rural areas (see UN-Habitat, 2010 and United Nations Department of Economic and Social Affairs, 2018). One might also be tempted to ask what responsibility, if any, architects bear for the high incidence of seasonal affective disorder (SAD) that has been documented in northern latitudes (Cox, 2017; Heerwagen, 1990; Rosenthal, 2019; Rosenthal et al., 1984). To give a sense of the problem of “light hunger” (as Heerwagen, 1990, refers to it), Terman (1989) claimed that as many as 2 million people in Manhattan alone experience seasonal affective and behavioural changes severe enough to require some form of additional light stimulation during the winter months.

According to Pallasmaa (1994, p. 34), Luis Barragán, the self-taught Mexican architect famed for his geometric use of bright colour (Gregory, 2016), felt that most contemporary houses would be more pleasant with only half their window surface. However, while such a suggestion might well be appropriate in Mexico, where Barragán’s work is to be found, many of us (especially those living in northern latitudes in the dark winter months) need as much natural light as we can obtain to maintain our psychological well-being. That said, Barragán is not alone in his appreciation of darkness and shadow. Some years ago, Japanese writer Junichirō Tanizaki also praised the aesthetic appeal of shadow and darkness in the native architecture of his home country in his extended essay on aesthetics, In praise of shadows (Tanizaki, 2001).

One of the problems with the extensive use of windows in northern climates is poor heat retention, an issue that is becoming all the more prominent in the era of sustainable design and global warming. One solution to this particular problem that has been put forward by a number of technology-minded researchers is simply to replace windows with large screens that relay a view of nature for those who, for whatever reason, have to work in windowless offices (Kahn Jr. et al., 2008). However, the limited research that has been conducted on this topic to date suggests that the beneficial effects of being seated near the window in an office building cannot easily be captured by seating workers next to such video screens instead.

Similarly, the failure to fully consider the auditory aspects of architectural design may help to explain some part of the global health crisis associated with noise pollution interfering with our sleep, health, and well-being (Owen, 2019). The neglect of architecture’s fundamental role in helping to maintain our well-being is a central theme in Pérez-Gómez’s (2016) influential book Attunement: Architectural meaning after the crisis of modern science. Pérez-Gómez is the director of the History and Theory of Architecture Program at McGill University in Canada. Along similar lines, geographer J. Douglas Porteous had already noted some years earlier that: “Notwithstanding the holistic nature of environmental experience, few researchers have attempted to interpret it in a very holistic [or multisensory] manner.” (Porteous, 1990, p. 201). Finally, here, it is perhaps also worth noting that there are even some researchers who have wanted to make a connection between the global obesity crisis and the obesogenic environments that so many of us inhabit (Lieberman, 2006). The poor diet of multisensory stimulation that we experience living a primarily indoor life has also been linked to the growing sleep crisis apparently facing so many people in society today (Walker, 2018).

Designing for the modular mind

Researchers working in the field of environmental psychology have long stressed the impact that the sensory features of the built environment have on us (e.g., Mehrabian & Russell, 1974, for an influential early volume detailing this approach). Indeed, many years ago, the famous modernist Swiss architect Le Corbusier (1948) made the intriguing suggestion that architectural forms “work physiologically upon our senses.” Inspired by early work with the semantic differential technique, researchers would often attempt to assess the approach-avoidance, active-passive, and dominant-submissive qualities of a building or urban space. This approach was based on the pleasure, arousal, and dominance (PAD) model that has long been dominant in the field. However, it is important to stress that in much of their research, the environmental psychologists took a separate sense-by-sense approach (e.g., Zardini, 2005).

The majority of researchers have tended to focus their empirical investigations on studying the impact of changing the stimulation presented to just one sense at a time. More often than not, in fact, they would focus on a single sensory attribute, for example investigating the consequences of changing the colour (hue) of the lighting or walls (e.g., Bellizzi et al., 1983; Bellizzi & Hite, 1992; Costa, Frumento, Nese, & Predieri, 2018; Crowley, 1993), or else just modulating the brightness of the ambient lighting (e.g., Gal, Wheeler, & Shiv, 2007; Xu & LaBroo, 2014). Such a unisensory (and, in some cases, unidimensional) approach undoubtedly makes sense inasmuch as it may help to simplify the problem of studying how design affects us (Malnar & Vodvarka, 2004). What is more, such an approach is also entirely in tune with the modular approach to mind that was so popular in the fields of psychology and cognitive neuroscience in the closing decades of the twentieth century (e.g., Barlow & Mollon, 1982; Fodor, 1983). At the same time, however, it can be argued that this sense-by-sense approach neglects the fundamentally multisensory nature of mind, and the many interactions that have been shown to take place between the senses.

The visually dominant approach to research in the field of environmental psychology also means that far less attention has been given over to studying the impact of the auditory (e.g., Blesser & Salter, 2007; Kang et al., 2016; Schafer, 1977; Southworth, 1969; Thompson, 1999), tactile, somatosensory or embodied (e.g., Heschong, 1979; Pallasmaa, 1996; Pérez-Gómez, 2016), or even the olfactory qualities of the built environment (e.g., Bucknell, 2018; Drobnick, 2002, 2005; Henshaw, McLean, Medway, Perkins, & Warnaby, 2018) than to the impact of the visual. Furthermore, until very recently, little consideration has been given by the environmental psychologists to the question of how the senses interact, one with another, in terms of their influence on an individual. This neglect is particularly striking given that the natural environment, the built environment, and the atmosphere of a space are nothing if not multisensory (e.g., Bille & Sørensen, 2018). In fact, it is no exaggeration to say that our response to the environments in which we find ourselves, be they built or natural, is always going to be the result of the combined influence of all the senses that are being stimulated, no matter whether we are aware of their influence or not (this is a point to which we will return later).

Given that those of us living in urban environments, which as we have seen is now the majority of us, spend more than 95% of our lives indoors (Ott & Roberts, 1998), architects would therefore seem to bear at least some responsibility for ensuring that the multisensory attributes of the built environment work together to deliver an experience that positively stimulates the senses, and, by so doing, facilitates our well-being, rather than hinders it (see also Pérez-Gómez, 2016, on this theme). Crucially, however, a growing body of cognitive neuroscience research now demonstrates that while we are often unaware of, or at least pay little conscious attention to, the subtle sensory cues that may be conveyed by a space (e.g., Forster & Spence, 2018), that certainly does not mean that they do not affect us. In fact, the sensory qualities or attributes of the environment have long been known to affect our health and well-being in environments as diverse as the hospital and the home, and from the office to the gym (e.g., Spence, 2002, 2003, 2021; Spence & Keller, 2019). What is more, according to the research that has been published to date, environmental multisensory stimulation can potentially affect us at the social, emotional, and cognitive levels.

It can be argued, therefore, that we all need to pay rather more attention to our senses and the way in which they are being stimulated than we do at present (see also Pérez-Gómez, 2016, on this theme). You can call it a mindful approach to the senses (Kabat-Zinn, 2005) [Footnote 2], though my preferred terminology, coined in an industry report published almost 20 years ago, is “sensism” (see Spence, 2002). Sensism provides a key to greater well-being by considering the senses holistically, as well as how they interact, and incorporating that understanding into our everyday lives. The approach also builds on the growing evidence of the nature effect (Williams, 2017) and the fact that we appear to benefit from, not to mention actually desire, the kinds of environments in which our species evolved. As support for the latter claim, consider only how it has recently emerged that most people set their central heating to a fairly uniform 17–23 °C, meaning that the average indoor temperature and humidity match the mild outdoor conditions of west central Kenya or the Ethiopian highlands (i.e., the place where human life is first thought to have evolved) more closely than those of anywhere else (Just, Nichols, & Dunn, 2019; Whipple, 2019).

Architectural design for each of the senses

It is certainly not the case that architects have uniformly ignored the non-visual senses (e.g., see Howes, 2005, 2014; McLuhan, 1961; Pallasmaa, 1994, 2011; Ragavendira, 2017). For instance, in their 2004 book on Sensory design, Malnar and Vodvarka talk about challenging visual dominance in architectural design practice by giving a more equal weighting to all of the senses (Malnar & Vodvarka, 2004; see also Mau, 2019). Meanwhile, Howes (2014) writes of the sensory monotony of the bungalow-filled suburbs and of the corporeal experience of skyscrapers as their presence looms up before those on the sidewalk below. At the same time, however, there is also a sense in which it is the gaze of the inhabitants of those tall buildings, who are offered the view, that is prioritized over the other senses.

However, very often the approach, as in fact evidenced by Malnar and Vodvarka (2004), has been to work one sense at a time. Until recently, that is, one finds exactly the same kind of sense-by-sense (or unisensory) approach in the worlds of interior design (Bailly Dunne & Sears, 1998), advertising (Lucas & Britt, 1950), marketing (Hultén, Broweus, & Dijk, 2009; Krishna, 2013; Lindstrom, 2005), and atmospherics (see Bille & Sørensen, 2018, on architectural atmospherics; and Kotler, 1974, on the theme of store atmospherics). Recently, there has been a growing recognition of the importance of the non-visual senses to various fields of design (Haverkamp, 2014; Lupton & Lipps, 2018; Malnar & Vodvarka, 2004). As yet, however, there has not been sufficient recognition of the extent to which the senses interact. As Williams (1980, p. 5) noted some 40 years ago: “Aside from meeting common standards of performance, architects do little creatively with acoustical, thermal, olfactory, and tactile sensory responses.” As we will see later, it is not clear that much has changed since.

The look of architecture

There are a number of ways in which visual perception science can be linked to architectural design practice. For instance, think only of the tricks played on the eyes by the trapezoidal balconies on the famous The Future apartment building in Manhattan (see Fig. 2). They appear to slant downward when viewed from one side, but to slope upward when viewed from the other. The causes of such a visual illusion can, at the very least, be meaningfully explained in terms of visual perception research (Bruno & Pavani, 2018).

Fig. 2. The Future apartment building at 200 East 32nd Street in Manhattan. Architectural design that appeals primarily to the eye? [Credit Jeffrey Zeldman, and reprinted under Creative Commons agreement]

Cognitive neuroscientists have recently demonstrated that we have an innate preference for visual curvature, be it in internal space (Vartanian et al., 2013), or for the furniture that is found within that space (Dazkir & Read, 2012; see also Lee, 2018; Thömmes & Hübner, 2018). We typically rate curvilinear forms as being more approachable than rectilinear ones (see Fig. 3). Angular forms, especially when pointing downward/toward us, may well be perceived as threatening, and hence are somewhat more likely to trigger an avoidance response (Salgado-Montejo, Salgado, Alvarado, & Spence, 2017). As Ingrid Lee, former design director at IDEO New York, put it in her book, Joyful: The surprising power of ordinary things to create extraordinary happiness: “Angular objects, even if they’re not directly in your path as you move through your home, have an unconscious effect on your emotions. They may look chic and sophisticated, but they inhibit our playful impulses. Round shapes do just the opposite. A circular or elliptical coffee table changes a living room from a space for sedate, restrained interaction to a lively center for conversation and impromptu games” (Lee, 2018, p. 142). One might consider here whether Lee’s comments can be scaled up to describe how we move through the city. Does the visually striking building shown in Fig. 4, for instance, really promote joyfulness and carefree travel through the urban environment? It seems doubtful, given the evidence suggesting that viewing angular shapes, even briefly, can trigger a fear response in the amygdala, the part of the brain that is involved in emotion (e.g., LeDoux, 2003). Meanwhile, Liu, Bogicevic, and Mattila (2018) have noted how the round versus angular nature of the servicescape also influences the consumer response in service encounters.

figure 3

A selection of the interiors shown to participants in a neuroimaging study designed to assess viewers’ approach-avoidance motivation in response to curvilinear vs. rectilinear spaces. [High/Low roof; Open/Enclosed space.] [Figure reprinted with permission from Vartanian et al., 2013 ]

figure 4

Montcalm Shoreditch Signature Tower Hotel, 151–157 City Road, London, completed 2015 by SMC Alsop Architects. What is lost when architectural design focuses on eye appeal? [Figure copyright Ian Ritchie, RA]

The height of the ceiling has also been shown to exert an influence over our approach-avoidance responses, and perhaps even our style of thinking (Baird, Cassidy, & Kurr, 1978; Meyers-Levy & Zhu, 2007; Vartanian et al., 2015). However, here it should also be borne in mind that the visual perception of space is significantly influenced by colour and lighting (Lam, 1992; Manav, Kutlu, & Küçükdoğu, 2010; Oberfeld, Hecht, & Gamer, 2010; von Castell, Hecht, & Oberfeld, 2018). Given many such psychological observations, it should perhaps come as no surprise to find that links between cognitive neuroscience and architecture have grown rapidly in recent years (Choo, Nasar, Nikrahei, & Walther, 2017; Eberhard, 2007; Mallgrave, 2011; Robinson & Pallasmaa, 2015). At the same time, however, it is also worth remembering that, given the confines of the brain-scanning environment, what has primarily been studied to date is people’s response to examples or styles of architecture presented visually (via a monitor), with the participant lying horizontal (though see also Papale, Chiesi, Rampinini, Pietrini, & Ricciardi, 2016). Footnote 3

It is important to realize, however, that it is not just our visual cortex that responds to architecture. For, as Frances Anderton writes in The Architectural Review: “We appreciate a place not just by its impact on our visual cortex but by the way in which it sounds, it feels and smells. Some of these sensual experiences elide, for instance our full understanding of wood is often achieved by a perception of its smell, its texture (which can be appreciated by both looking and feeling) and by the way in which it modulates the acoustics of the space.” (Anderton, 1991, p. 27). The multisensory appreciation of quality here links to a growing body of research on multisensory shitsukan perception, shitsukan being the Japanese word for “a sense of material quality” or “material perception” (see Fujisaki, 2020; Komatsu & Goda, 2018; Spence, 2020b). The following sub-sections summarize some of the key findings on how the non-visual sensory attributes of the built and urban environment affect us, when considered individually.

The sound of space: are you listening?

What a space sounds like is undoubtedly important (Bavister, Lawrence, & Gage, 2018 ; McLuhan, 1961 ; Porteous & Mastin, 1985 ; Thompson, 1999 ). Sounds can, after all, provide subtle cues as to the identity or proportions of a space, even hinting at its function (Blesser & Salter, 2007 ; Eberhard, 2007 ; Robart & Rosenblum, 2005 ). As Pallasmaa ( 1994 , p. 31) notes: “Every building or space has its characteristic sound of intimacy or monumentality, rejection or invitation, hospitality or hostility.” However, more often than not, discussion around sound and architectural design tends to revolve around how best to avoid, or minimize, unwanted noise (see Owen, 2019 , on growing concerns regarding the latter). Indeed, as J. Douglas Porteous notes: “with the rapid urbanization of the world’s population, far more attention is being given to noise than to environmental sound … Research has concentrated almost entirely upon a single aspect of sound, the concept of noise or ‘unwanted sound.’” (Porteous, 1990 , p. 48). Some years earlier, Schafer ( 1977 , p. 222) had made much the same point when he wrote that: “The modern architect is designing for the deaf …. The study of sound enters modern architecture schools only as sound reduction, isolation and absorption.” The fact that year-on-year, noise continues to be one of the top complaints from restaurant patrons, perhaps tells us all we need to know about how successful designers have been in this regard (see Spence, 2014 , for a review; Wagner, 2018 ).

There is also an emerging story here regarding the deleterious effects of loud background noise, and the often-beneficial effects of music and soundscapes, on the recovery of patients in the hospital/healthcare setting (see Spence & Keller, 2019 , for a review). Meanwhile, one of the main complaints from those office workers forced to move into one of the open plan offices that have become so popular (amongst employers, if not employees) in recent years (see ‘Redesigning the corporate office’, 2019 ) is around noise distraction (Borzykowski, 2017 ; Burkus, 2016 ; Evans & Johnson, 2000 ). Footnote 4 Once again, one might want to ask what responsibility architects bear. Experimental evidence documenting the deleterious effect of open-plan working has been reported by a number of researchers (e.g., Bernstein & Turban, 2018 ; De Croon, Sluiter, Kuijer, & Frings-Dresen, 2005 ; Otterbring, Pareigis, Wästlund, Makrygiannis, & Lindström, 2018 ).

There is research ongoing in a number of countries to investigate the use of nature sounds, such as the sound of running water, to help mask other people’s distracting conversations (Hongisto, Varjo, Oliva, Haapakangas, & Benway, 2017). Intriguingly, however, it turns out that people’s beliefs about the source of masking sounds, especially in the case of ambiguous noise, can sometimes influence how much relief they provide (Haga, Halin, Holmgren, & Sörqvist, 2016). So, for instance, Haga and her colleagues played the same ambiguous pink noise with interspersed white noise to three groups of office-workers. The experimenters said nothing to a control group; a second group of participants was told that they could hear industrial machinery noise, while a third group was told that they were listening to nature sounds, based on a waterfall. Subjective restoration was significantly higher amongst those who thought that they were listening to the nature sounds than amongst those who thought that they were listening to industrial noise. As might have been expected, the results of the control group fell somewhere in between.

Paley Park in New York has often been put forward as a particularly elegant solution to the problem of negating unwanted traffic noise in the context of urban design (e.g., Carroll, 1967; Prochnik, 2009). In 1967, the empty lot resulting from the demolition of the Stork Club on 53rd Street was transformed into a small public park (a so-called pocket park). The space was developed by Zion and Breen. In this case, the acoustic space (think only of the sounds, or better said noise, of the city) is effectively masked by the presence of a waterfall at the far end of the lot (see Fig. 5). What is more, the free-standing chairs allow visitors to move closer to the waterfall should they feel the need to drown out a little more of the urban noise. The greenery growing thickly along the side walls also likely helps to absorb the noise of the city.

figure 5

Paley Park, New York, by Zion and Breen in 1967. [Credit Jim Henderson, and reprinted under Creative Commons agreement]

Music plays an important role in our experience of the built environment - think here only of the Muzak of decades gone by (Lanza, 2004 ). This is as true of the guest’s hotel experience (e.g., when entering the lobby) as it is elsewhere (e.g., in a shopping centre or bar, say). Footnote 5 The sound that greets customers in the lobby is apparently very important to Ian Schrager, the Brooklyn-born entrepreneur who created fabled nightclub Studio 54 in New York. In recent years, he has been working with Marriott to launch The EDITION hotels in a number of major cities, including London and New York. Music plays a key role in the Schrager experience. As the entrepreneur puts it: “The sound of a hotel lobby is often dictated by monotonous, vapid lounge muzak – a zombie-like drone of new jazz and polite house, with the sole purpose of whiling away the waiting time between check-in and check-out.” As might have been expected, the music in the lobbies of The EDITION hotels is carefully curated (Eriksen, 2014 , p. 27). However, the thumping noise of the music from the nightclub/bar that is often also an integral part of the experience offered by these hip venues means that meticulous architectural design is also required in order to limit the spread of unwanted noise through the rest of the building (e.g., so as not to disturb the sleep of those who may be resting in the rooms upstairs). Note here that there are also some increasingly sophisticated solutions - including sound-absorbing panels, as well as active noise cancellation systems - to dampen unwanted sound in open spaces such as restaurants and offices (Clynes, 2012 ).

Designing for “the eyes of the skin”

The tactile element of architecture is often ignored. In fact, the first point of physical contact with a building typically occurs when we enter or leave. Or, as Pallasmaa (1994, p. 33) once evocatively put it: “The door handle is the handshake of the building”. However, once inside a building, it is worth remembering that we will also typically make contact with flooring (Tonetto, Klanovicz, & Spence, 2014), hand rails (Spence, 2020d), elevator buttons, furniture, and the like (though this is, of course, likely to change somewhat in the era of pandemia). As Richard Sennett, author of Flesh and Stone, laments in his critical take on the sensory order of modernity: “sensory deprivation which seems to curse most modern buildings; the dullness, the monotony, and the tactile sterility which afflicts the urban environment” (Sennett, 1994, p. 15). The absence of tactile interest is also something that Witold Rybczynski, author of The Look of Architecture, acknowledges when writing that: “Although architecture is often defined in terms of abstractions such as space, light and volume, buildings are above all physical artifacts. The experience of architecture is palpable: the grain of wood, the veined surface of marble, the cold precision of steel, the textured pattern of brick.” (Rybczynski, 2001, p. 89). Notice here how Rybczynski mentions both texture and temperature, two of the key attributes of tactile sensation (see also Henderson, 1939). Temperature change, and change in the flooring material (tatami matting or cedarwood), is also something that the Tom museum for the blind in Tokyo plays with deliberately (Classen, 1998, p. 150; Vorreiter, 1989; Wagner, 1989). There is also a braille poem on the knob of the exit door.

The careful use of material can evoke tactility as the viewer (or occupant) imagines or mentally simulates what it would feel like to reach out and touch or caress an intriguing surface (Sigsworth, 2019 ; see also Lupton, 2002 ). Juhani Pallasmaa, who has perhaps written more than anyone else on the theme of the tactile, or haptic in architecture, writes that “Natural materials - stone, brick and wood - allow the gaze to penetrate their surfaces and they enable us to become convinced of the veracity of matter … But the materials of today - sheets of glass, enamelled metal and synthetic materials - present their unyielding surfaces to the eye without conveying anything of their material essence or age.” (Pallasmaa, 1994 , p. 29).

Lisa Heschong, architect, and partner of architectural research firm Heschong Mahone Group, has written extensively on the theme of thermal (as opposed to textural) aspects of architectural design in her book Thermal Delight in Architecture (Heschong, 1979). There, she points to examples such as the hearth, the sauna, and Roman and Japanese baths as archetypes of thermal delight about which rituals have developed, the shared experience reinforcing social bonds of affection and ceremony (see also Lupton, 2002; Papale et al., 2016). At this point, one might also want to mention the much-admired Therme Vals Spa by Peter Zumthor, in Switzerland, with its use of different temperatures of both water and touchable surfaces (Ryan, 1997, though see also Mairs, 2017). The tactile element is, in other words, fundamental to the total (multisensory) experience of architectural design. This is true no matter whether the materiality is touched directly or not (i.e., merely seen, inferred, or imagined). So, for example, one need only think about how looking at a cheap fake marble or wood veneer can make one feel to realize that touch is often not required to assess material quality, or the lack thereof (see also Karana, 2010).

An architecture of the chemical senses

Talking of an architecture of scent, or of taste (these two of the so-called chemical senses), might seem like a step too far. That said, one does come across titles such as Eating Architecture (Horwitz & Singley, 2004 ) and An Architecture of Smell (McCarthy, 1996 ; see also Barbara & Perliss, 2006 ). Footnote 6 Unfortunately, however, all too often, consideration of the olfactory in architectural design practice has focused on the elimination of negative odours. When thinking about the mundane experience of odours in buildings, what immediately comes to mind includes the smell of wood (i.e., building materials), dust, mould, cleaning products, and flowers. As Eberhard ( 2007 , p. 47) puts it: “We all have our favorite smells in a building, as well as ones that are considered noxious. A cedar closet in the bedroom is an easy example of a good smell. The terrible smell of a house that was ravaged by fire or floods is seared in the memory of those who have endured one of these disasters.” This is perhaps no coincidence, given that it tends to be the bad odours, rather than the neutral or positive ones, that have generally proved most effective in immersing us in an experience (Baus & Bouchard, 2017 ; see also Aggleton & Waskett, 1999 ). Research by Schifferstein, Talke, and Oudshoorn ( 2011 ) investigated whether the nightlife experience could be enhanced by the use of pleasant fragrance to mask the stale odour after the indoor smoking ban was introduced a few years ago. Once again, notice how the focus here is on the elimination of the negative stale odours rather than necessarily the introduction of the positive (the latter merely being introduced in order to mask the former).

Jim Drobnick captures the idea of olfactory absence when talking about not just the “white cube” mentality but the “anosmic cube” (Drobnick, 2005). The former phrase was famously coined by O’Doherty (1999, 2009) in order to describe the then-popular practice of displaying art in gallery spaces that were devoid of colour or any other form of visual distraction. Footnote 7 Some years later, Drobnick introduced the latter phrase in order to highlight the fact that too many spaces are seemingly deliberately designed to have no smell, nor to leave any lasting olfactory trace, either. Footnote 8 And yet, at the same time, it is clear that the odour of a space can be incredibly evocative too, as anecdotally noted by Pallasmaa (1994, p. 32) in the following quote: “The strongest memory of a space is often its odor; I cannot remember the appearance of the door to my grandfather’s farm-house from my early childhood, but I do remember the resistance of its weight, the patina of its wood surface scarred by a half century of use, and I recall especially the scent of home that hit my face as an invisible wall behind the door.” And thinking back to my memories of visiting my own grandfather, long since deceased, on his fairground wagon in Bradford, it was undoubtedly the intense smell of “derv” (English slang for diesel-engine road vehicle), the liquid diesel oil that was used for trucks at the time, that I can still remember better than anything else. We tend to adapt to the positive and neutral smells in the buildings we inhabit. This is evidenced by the fact that we are typically only aware of the smell of our own home, what some call building odour, or BO for short, when we return after a long trip away (Dalton & Wysocki, 1996; McCooey, 2008).

Sick building syndrome and the problem of poor olfactory design

Improving indoor air quality might well also provide an effective means of helping to alleviate some of the symptoms of sick building syndrome (SBS) that were mentioned earlier (Guieysse et al., 2008). It is certainly striking how many large outbreaks of this still-mysterious condition reported in the 1980s were linked to the presence of an unfamiliar smell in closed office buildings with little natural ventilation (Wargocki, Wyon, Baik, Clausen, & Fanger, 1999; Wargocki, Wyon, Sundell, Clausen, & Fanger, 2000). For instance, in June 1986, more than 12% of the workforce of 2500 people working at the Harry S. Truman State Office Building in Missouri came down with the symptoms of SBS over a 3-day period (Donnell Jr. et al., 1989). The symptoms presented by some of the workers (including dizziness and difficulty in breathing) were so severe that those affected had to be rushed to the local hospital for emergency treatment. And while a thorough examination of the building subsequently failed to reveal the presence of any particular toxic airborne pollutants that might have been responsible for the outbreak, in the majority of cases, it turned out that the symptoms of SBS were preceded by the perception of unusual odours and inadequate airflow in the building.

According to Donnell Jr. et al. ( 1989 ), these complaints of odours may well have heightened the perception of poor air quality by some employees in the building. This, in turn, may have led to an epidemic anxiety state resulting in the SBS outbreak (Faust & Brilliant, 1981 ). In fact, workers suffering from SBS were more than twice as likely to have noticed a particular odour in the work area before the onset of their symptoms than those who were working in the same building who were unaffected by the outbreak. Footnote 9 At the same time, however, it should also be borne in mind that our tendency to focus on what we see and hear means that we often exhibit olfactory anosmia to ambient scents (Forster & Spence, 2018 ).

To give a sense of the potential scale of the problem, Woods (1989) estimated that 30–70 million people in the USA alone are exposed to offices that manifest SBS. As such, anything (and everything) that can be done to reduce the symptoms associated with this reaction to the indoor environment (Finnegan, Pickering, & Burge, 1984) will likely have a beneficial effect on the health and well-being of many people. At the same time, however, it is perhaps also worth bearing in mind here that the incidence of SBS would seem to have declined in recent years (though see also Joshi, 2008; Magnavita, 2015; Redlich, Sparer, & Cullen, 1997), perhaps suggesting that building design/ventilation has improved as a result of the earlier outbreaks. Footnote 10 That said, it is perhaps also worth noting that there continues to be some uncertainty as to whether the very real symptoms of SBS should be attributed to airborne pollutants, or may instead be better understood as a psychosomatic response to a particular environmental atmosphere (see Fletcher, 2005, and Love, 2018). What is more, there has been a move by some researchers to talk in terms of the less pejorative-sounding building-related symptoms (BRS) instead (Niemelä, Seppänen, Korhonen, & Reijula, 2006). One more psychological factor that may be relevant here is the feeling of a lack of control over one’s multisensory environment. Many of those working in ventilated buildings where the windows cannot be opened manually have this feeling, and it may indeed play a role in the elicitation of SBS.

Scent and the city: designing fragrant spaces

There are, however, signs that the situation is slowly starting to change with regard to the emphasis placed on olfaction in both architectural and urban design practice. For instance, a number of commentators have noted, not to mention sometimes been puzzled by, the distinctive, yet unexplained, pleasant - and hence, one assumes, deliberately introduced - fragrances that some new constructions appear to have. Take the Barclays Center arena in Brooklyn, NY, home of the Brooklyn Nets, as a case in point. On its opening in 2013, various commentators in the press drew attention to the distinctive, if not immediately identifiable, scent that appeared to pervade the space, and which appeared to have been added deliberately - almost as if it were intended to be a signature scent for the space (e.g., Albrecht, 2013; Doll, 2013; Martinez, 2013). That said, the idea of fragrancing public spaces dates back at least as far as 1913. In that year, at the opening of the Marmorhaus cinema in Berlin, the fragrance of Marguerite Carré, a perfume by Bourjois, Paris, was deliberately (and innovatively, at least for the time) wafted through the auditorium (Berg-Ganschow & Jacobsen, 1987). Meanwhile, in what may well be a sign of things to come, synaesthetic perfumer Dawn Goldsworthy and her scent design company 12:29 recently made the press after apparently creating a bespoke scent for a new US$40 million apartment in Miami (Schroeder, 2018). What further opportunities might there be to design distinctive “signature” scents for spaces/buildings, one might ask (Henshaw et al., 2018; Jones, 2006; Trivedi, 2006)?

Evidence that the olfactory element of design can be used to change behaviour for the better includes, for example, the observation that people tend to engage in more cleaning behaviours when there is a hint of citrus in the air (De Lange, Debets, Ruitenburg, & Holland, 2012; Holland, Hendriks, & Aarts, 2005). In the future, it may not be too much of a stretch to imagine public spaces filled with aromatic flowers and blossoming trees, introduced with the aim of helping to discourage people from littering, and who knows, perhaps even reducing vandalism (see also Steinwald, Harding, & Piacentini, 2014). In terms of the cognitive mechanism underlying such crossmodal effects of scent on behaviour, the suggestion, at least in the citrus cleaning example just mentioned, is that smelling an ambient scent that we associate with clean and cleaning activates, or primes, the associated concepts (Smeets & Dijksterhuis, 2014). Once these concepts have been primed, so the suggestion goes, we are that bit more likely to engage in behaviours that are congruent or consistent with them (though see Doyen, Klein, Pichon, & Cleeremans, 2012).

Elsewhere, researchers have already demonstrated the beneficial effects that lavender, and other scents normally associated with aromatherapy, have on those who are exposed to them. So, for instance, the latter tend to show reduced stress, better sleep, and even enhanced recovery from illness (see Herz, 2009; Spence, 2003, for reviews; though see also Haehner, Maass, Croy, & Hummel, 2017). According to one commentator writing in The New York Times: “While these findings have obvious implications for health care, the opportunities for architecture and urban planning are particularly intriguing. Designers are trained to focus mostly on the visual, but the science of design could significantly expand designers’ sensory palette. Call it medicinal urbanism.” (Hosey, 2013). Effects on people’s mood resulting from exposure to ambient scent have been reported in some, but by no means all, studies (Glass & Heuberger, 2016; Glass, Lingg, & Heuberger, 2014; Haehner et al., 2017; Weber & Heuberger, 2008). It remains somewhat uncertain, though, whether the beneficial effects of aromatherapy scents are better explained by priming effects, based on associative learning, as in the case of the clean citrus scents mentioned above (see Herz, 2009), or by a more direct (i.e., less cognitively mediated) physiological route (cf. Harada, Kashiwadani, Kanmura, & Kuwaki, 2018).

The olfactory scentscapes, and scent maps of cities, that have been discussed by various researchers (see Fig.  6 ) have also helped to draw people’s attention to the often rich olfactory landscapes offered by many urban spaces (e.g., https://sensorymaps.com/ ; Bucknell, 2018 ; Henshaw, 2014 ; Henshaw et al., 2018 ; Lipps, 2018 ; Lupton & Lipps, 2018 ; Margolies, 2006 ).

figure 6

Scentscape of the city. Spring scents and smells of the city of Amsterdam by Kate McLean. [Credit “Spring Scents & Smells of the City of Amsterdam” © 2013-2014. Digital print. 2000 x 2000 mm. Courtesy of Kate McLean]

The notion of the healing garden has also seen something of a resurgence in recent years, and the benefits now, as historically, are likely to revolve, at least in part, around the healing, or restorative, effect of the smell of flowers and plants (e.g., Pearson, 1991; see also Ottoson & Grahn, 2005). One building that is often mentioned in this regard, namely in terms of its olfactory design credentials, is the Silicon House by architects SelgasCano, situated on the outskirts of Madrid ( https://www.architectmagazine.com/project-gallery/silicon-house-6143 ). This house is set in what has been described as “a garden of smells”, which emphasizes the olfactory, while also stressing the tactile elements of the design. Hence, while the olfactory aspects of architectural design practice have long been ignored, there are at least signs of a revival of interest in stimulating this sense through both architectural and urban design practice.

Architectural taste

The British writer and artist Adrian Stokes once wrote of the “oral invitation of Veronese marble” (Stokes, 1978 , p. 316). And while I must admit that I have never felt the urge to lick a brick, Pallasmaa ( 1996 , p. 59) vividly recounts the urge that he once experienced to explore/connect with architecture using his tongue. He writes that: “Many years ago when visiting the DL James Residence in Carmel, California, designed by Charles and Henry Greene, I felt compelled to kneel and touch the delicately shining white marble threshold of the front door with my tongue. The sensuous materials and skilfully crafted details of Carlo Scarpa’s architecture as well as the sensuous colours of Luis Barragan’s houses frequently evoke oral experiences. Deliciously coloured surfaces of stucco lustro , a highly polished colour or wood surfaces also present themselves to the appreciation of the tongue.”

Perhaps aware of many readers’ presumed scepticism on the theme of the gustatory contribution to architecture, Footnote 11 Pallasmaa writes elsewhere that: “The suggestions that the sense of taste would have a role in the appreciation of architecture may sound preposterous. However, polished and coloured stone as well as colours in general, and finely crafted wood details, for instance, often evoke an awareness of mouth and taste. Carlo Scarpa’s architectural details frequently evoke sensation of taste.” (Pallasmaa, 2011, p. 595). The suggestion here that “colours in general … often evoke … [a] taste” seemingly links to the widespread literature on the crossmodal correspondences that have increasingly been documented between colour and basic tastes (see Spence et al., 2015, for a review). Rather than describing this in terms of architecture that one can taste, then, one might more fruitfully describe it in terms of crossmodal correspondences (see below for more on this theme).

When, in his book Architecture and the brain, Eberhard (2007, p. 47) talks about what the sense of taste has to do with architecture, he suggests that: “You may not literally taste the materials in a building, but the design of a restaurant can have an impact on your ‘conditioned response’ to the taste of the food.” The study of environmental multisensory effects on tasting is undoubtedly an area in which interest has grown markedly in recent years (e.g., see Spence, 2020c, for a review). It is worth noting, though, that just as for the olfactory case, some atmospheric effects on tasting may be more cognitively-mediated (e.g., associated with the priming of notions of luxury/expense, or lack thereof) while others may be more direct, as when changing the colour (see Oberfeld, Hecht, Allendorf, & Wickelmaier, 2009; Spence, Velasco, & Knoeferle, 2014; Torrico et al., 2020) or brightness (Gal et al., 2007; Xu & LaBroo, 2014) of the ambient lighting changes taste/flavour perception.

“An architecture of the seven senses”?

So far in this section, we have briefly reviewed the unisensory contributions of architectural design organized around each of the five main senses (vision, audition, touch, smell, and taste). However, seemingly not content with the traditional five, Pallasmaa (1994) goes further in the title of one of his early articles, “An architecture of the seven senses.” While the text itself is not altogether clear, or explicit, on this point, the skeleton and muscles would appear to be the extra senses that Pallasmaa has in mind here. Indeed, the embodied response of people to architecture is definitely something that has captured the imagination of, not to mention intrigued, a number of architectural theorists in recent years (e.g., see Bloomer & Moore, 1977; Pallasmaa, 2011; Pérez-Gómez, 2016).

The vestibular sense is also worthy of mention here (see Gulden & Grüsser, 1998; Indovina et al., 2005). Anyone who has tried out one of the VR simulations of walking along the outside ledge of a tall building will have had the feeling of vertigo. Normally, architects presumably avoid designing structures that may give rise to such discombobulating feelings. That said, the recent increase in popularity of transparent viewing platforms, and bridges, shows that, on occasion, architects are not beyond emphasizing the important contribution made by this normally “silent” sense. For instance, the Grand Canyon Skywalk is a horseshoe-shaped cantilever bridge with a glass walkway at Eagle Point, Arizona, that allows visitors to stand 500–800 ft. (150–240 m) above the canyon floor (Yost, 2007). Opened in 2007, it had attracted more than a million visitors by 2015 (see Fig. 7). While popular, it is perhaps worth noting that a number of such attractions have recently been closed down in parts of China due to safety fears (Ellis-Petersen, 2019). Walking on such structures likely also makes people more aware of their own corporeality, thus engaging the proprioceptive and kinaesthetic senses too. On a more mundane level, Heschong (1979, p. 34) draws attention to the importance of bodily movement in the case of the porch swing whose self-propelled movement, prior to air-conditioning, would have been a thermal necessity in the summer months in the southern states of the USA.

figure 7

Skywalk from outside ledge. [Attribution: Complexsimplellc at English Wikipedia reprinted under Creative Commons agreement]

Consideration of the putatively embodied response to architecture might lead one back to Hall’s ( 1966 ) seminal early notion of “proxemics”. Hall used the latter term to describe the differing response to stimuli as a function of their distance from the viewer’s body. It is certainly easy to imagine this linking to contemporary notions concerning the different regions of personal space that have been documented around an observer (e.g., Previc, 1998 ; Spence, Lee, & Stoep, 2017 ). However, while these terms might sound more or less synonymous to cognitive neuroscientists, Malnar and Vodvarka ( 2004 ), both licensed architects, choose to take a much more cautious stance concerning these terms, treating them as referencing distinct phenomena in their own book on sensory design.

Interim summary

While the impact of each of the senses, however many there might be, can undoubtedly be analysed in isolation, as has largely been attempted in the preceding sections, the fact of the matter is that they interact one with another in terms of determining our response to the environment, be it built or natural. So, having briefly addressed the contribution of each of the senses to architectural design practice, when studied individually, the next question to consider is how the senses interact in the perception of environment/atmosphere, as they do in many other aspects of our everyday perception. After all, as Malnar notes: “The point of immersing people within an environment is to activate the full range of the senses.” (Malnar, 2017 , p. 146). Pallasmaa ( 2000 , p. 78) makes a similar point writing that: “Every significant experience of architecture is multi-sensory; qualities of matter, space and scale are measured by the eye, ear, nose, skin, tongue, skeleton and muscle.” (cf. Rasmussen, 1993 ).

Malnar and Vodvarka ( 2004 , p. ix) set the scene for the discussion with the opening lines of the preface of their book on sensory design in architecture, where they write: “What if we designed for all our senses? Suppose, for a moment, that sound, touch, and odour were treated as the equals of sight, and that emotion was as important as cognition. What would our built environment be like if sensory response, sentiment, and memory were critical design factors, more vital even than structure and program?” Indeed, those who take up the challenge of designing for the multisensory mind might well take a tip from one commentator, writing in Advertising Age when talking about product innovation who suggested that: “… the most successful new products appeal on both rational and emotional levels to as many senses as possible.” (Neff, 2000 , p. 22). Architectural design practice, I suggest, would be well-advised to strive for much the same in order to optimally stimulate the multisensory mind.

Although not the primary interest of the present review, it is perhaps also worth noting in passing, how a very similar debate on the importance of designing for the non-visual senses has been playing out amongst those interested specifically in landscape design/architecture (Lynch & Hack, 1984 ; Mahvash, 2007 ; Treib, 1995 ). The garden is a multisensory space and as Mark Treib wrote once in an essay entitled “Must landscape mean?”: “Today might be a good time to once more examine the garden in relation to the senses.”

Designing for the multisensory mind: architectural design for all the senses

The architect must act as a composer that orchestrates space into a synchronization for function and beauty through the senses – and how the human body engages space is of prime importance. As the human body moves, sees, smells, touches, hears and even tastes within a space – the architecture comes to life.
The rhythm of an architecture can be felt by occupants as a result of the architect’s composition – or arrangement of all the sensorial qualities of space. By arranging spatial sensorial features, an architect can lead occupants through the functional and aesthetic rhythms of a created place. Architectural building for all the senses can serve to move occupants – elevating their experience. (quote from a blogpost by Lehman, 2009 ).

One of the most exciting developments in cognitive neuroscience in recent decades has been the growing realization that perception/experience is far more multisensory than anyone had realized (e.g., Bruno & Pavani, 2018 ; Calvert et al., 2004 ; Levent & Pascual-Leone, 2014 ; Stein, 2012 ). That is, what we hear and smell, and what we think about the experience, is often influenced by what we see, and vice versa (Calvert et al., 2004 ; Stein, 2012 ). The senses talk to, and hence influence, one another all the time, though we often remain unaware of these cross-sensory interactions and influences. In fact, wherever neuroscientists look in the human brain, activity appears to be modulated by what is going on in more than one sense, leading, increasingly, to talk of the multisensory mind (Ghazanfar & Schroeder, 2006 ; Talsma, 2015 ). The key question here must therefore be what implications this growing realization of the ubiquity of multisensory cross-talk has for the field of architectural design practice?

The problem is that, as yet, there has been relatively little research directed at the question of how atmospheric/environmental multisensory cues actually interact. Mattila and Wirtz ( 2001 , pp. 273–274) drew attention to this lacuna some years ago when writing that: “Past studies have examined the effects of individual pleasant stimuli such as music, color or scent on consumer behavior, but have failed to examine how these stimuli might interact.” At the outset, when starting to consider the multisensory perception of architecture, it is worth noting that it is rarely something that we attend to. Indeed, as Benjamin ( 1968 , p. 239) once noted: “Architecture has always represented the prototype of a work of art the reception of which is consummated in a state of distraction.” To the extent that such a view is correct, one can say that multisensory architecture is rarely foregrounded in our attention/experience. Juhani Pallasmaa, meanwhile, has suggested that: “An architectural experience silences all external noise; it focuses attention on one’s very existence.” (Pallasmaa, 1994 , p. 31). Once again, the suggestion here would appear to be that attention is directed away from the building and toward the individual and their place in the world. Given that, on an everyday basis, architecture is typically not foregrounded in our attention/experience, one might legitimately wonder as to whether the multisensory integration of atmospheric/environmental cues takes place, given that they are so often unattended.

According to the laboratory research that has been published on this question to date, the evidence would appear to suggest that while the multisensory integration of unattended cues relating to an object or event certainly can occur, it is by no means guaranteed to do so (see Spence & Frings, 2020 , for a review). Perhaps the more fundamental question here, though, is whether we need to attend to ambient/environmental sensory cues for them to influence us. However, the research that has been published to date would appear to suggest that very often environmental cues influence us even when we are not consciously aware of, or thinking about them.

One particularly striking example of this was reported by researchers who manipulated whether French or German music was played in a supermarket (North, et al., 1997 , 1999 ). The results showed that the majority of the wine purchased was French when French music was played, with this reversing to a majority of German wines being sold when German music was played. The even more striking aspect of these results was the fact that the majority of those interviewed after coming away from the tills denied that the background music had any influence over the choices they made. A number of studies have also shown that scents that we are unaware of, either because they are presented just below the perceptual threshold or because we have become functionally anosmic to their constant presence, can nevertheless still influence us (Li, Moallem, Paller, & Gottfried, 2007 ). Similarly, there is also a suggestion that inaudible infrasound waves (i.e., < 20 Hz) may also affect people without their necessarily being aware of their presence (Weichenberger et al., 2017 ). Meanwhile, in terms of visual annoyance, it has been reported that flickering LED lights that look no different to the naked eye can nevertheless trigger a significantly greater number of headaches than non-flickering lights (e.g., see Wilkins, 2017 ; Wilkins, Nimmo-Smith, Slater, & Bedocs, 1989 ). Once again, therefore, this suggests that ambient sensory phenomena do not necessarily need to be perceptible in order to affect us, adversely or otherwise.

On the benefits of multisensory design: bringing it all together

One demonstration of just how dramatic the benefits of designing for multiple senses can be was reported by Kroner, Stark-Martin, and Willemain ( 1992 ) in a technical report. These researchers examined the effects of an office make-over when a company moved to a new office building. The employees in the new office were given individual control of the temperature, lighting, air quality, and acoustic conditions where they were working. Productivity increased by approximately 15% in the new building. When the individual control of the ambient multisensory environment was disabled in the new building, performance fell by around 2% instead. Trying to balance the influence of each of the senses is one of the aims of Finnish architect Juhani Pallasmaa, whose name we have come across at several points already in this text. As Steven Holl notes in the preface to Pallasmaa’s The eyes of the skin : “I have experienced the architecture of Juhani Pallasmaa, … The way spaces feel, the sound and smell of these places, has equal weight to the way things look.” (Pallasmaa, 1996 , p. 7). One example of multisensory architectural design to which Juhani Pallasmaa draws attention in several of his writings is the Ira Keller Fountain, Portland Oregon (see Fig.  8 ).

figure 8

The Ira Keller Fountain, Portland Oregon. According to Pallasmaa ( 2011 , p. 596) this is “An architecture for all the senses including the kinaesthetic and olfactory senses.” Once again, the auditory element is provided by the sound of falling water

On the multisensory integration of atmospheric/environmental cues

To date, only a relatively small number of studies have directly studied the influence of combined ambient/atmospheric cues on people’s perception, feelings, and/or behaviour. Mattila and Wirtz ( 2001 ) conducted one of the first sensory marketing studies to be published in this area. These researchers manipulated the olfactory environment (no scent, a low-arousal scent (lavender), or a high-arousal scent (grapefruit)) while simultaneously manipulating the presence of music (no music, low-arousal music, or high-arousal music). When the scent and music were congruent in terms of their arousal potential, the customers rated the store environment more positively, exhibited higher levels of approach and impulse-buying behaviour, and expressed more satisfaction. There is, though, always a very real danger of sensory overload if the combined multisensory input becomes too stimulating (see Malhotra, 1984 ; Simmel, 1995 ).

Meanwhile, in another representative field study, Sayin et al. ( 2015 ) investigated the impact of presenting ambient soundscapes in an underground car park in Paris. In particular, they assessed the effects of introducing western European birdsong or classical instrumental music by Albinoni to the three normally silent stairwells used by members of the general public when exiting the car park. A total of 77 drivers were asked about their feelings on their way out. Birdsong was found to work best in terms of enhancing the perceived safety of the situation - in this case by around 6%. This despite the fact that all of those who were quizzed realized that the sounds that they had heard were coming from loudspeakers. Footnote 12 In an accompanying series of laboratory studies, Sayin et al.’s participants were shown a 60-s first-person perspective video that had been taken in the same Paris car park, or else a short video of someone walking through a metro station in Istanbul. Once again, participants were asked about how safe it felt, about perceived social presence, and about their willingness to purchase a monthly metro pass. Even under these somewhat contrived experimental conditions, the presence of an ambient soundscape once again increased perceived safety as well as the participants’ self-reported intention to purchase a season ticket. It was, though, the sound of people singing Alleluia that proved most effective in terms of enhancing perceived safety amongst those watching the videos. Footnote 13 It is, however, worth bearing in mind here that many of the key results reported in this study were only borderline significant. As such, adequately-powered replication would be a good idea before too much weight is given to these intriguing findings.

Recently, Ba and Kang ( 2019 ) documented crossmodal interactions between ambient sound and smell in a laboratory study that was designed to capture the sensory cues that might be encountered in a typical urban environment. These researchers decided to pair the sounds of birds, conversation, and traffic, with the smells of flowers (lilac, osmanthus), coffee, or bread, at one of three levels (low, medium, or high) in each modality. A complex array of interactions was observed, with increasing stimulus intensity sometimes enhancing the participants’ comfort ratings, while sometimes leading to a negative response instead. While Ba and Kang’s results defy any simple synopsis, given the complex pattern of results reported, their findings nevertheless clearly suggest that sound and scent interact in terms of influencing people’s evaluation of urban design.

The colour of the ambient lighting in an indoor environment has also been shown to influence the perceived ambient temperature and thermal comfort of an environment (e.g., Candas & Dufour, 2005 ; Tsushima, et al., 2020 ; Winzen, Albers, & Marggraf-Micheel, 2014 ). For instance, in one representative study, Winzen and colleagues reported that illuminating a simulated aircraft cabin in warm yellow vs. cool blue-coloured lighting exerted a significant influence over people’s self-reported thermal comfort. The participants rated the environment as feeling significantly warmer under the warm (as compared to the cool) lighting colour. One can only really make sense of such findings from a multisensory perspective (see Spence, 2020a , for a review).

Taken together, then, the results of the representative selection of studies reported in this section demonstrate that our perception of, and/or response to, multisensory environments are undoubtedly influenced by the combined influence of environmental/atmospheric cues in different sensory modalities. So, in contrast to the quote from Mattila and Wirtz ( 2001 ) that we came across a few pages ago, there is now a growing body of empirical research out there demonstrating that atmospheric cues presented in different sensory modalities, such as music, scents, and visual stimuli combine to influence how alerting, or pleasant, a particular environment, or stimulus (such as, for example, a work of art), is rated as being (e.g., Banks, Ng, & Jones-Gotman, 2012 ; Battacharya & Lindsen, 2016 ).

Sensory congruency

In their book, Spaces speak, are you listening ?, Blesser and Salter draw the reader’s attention to the importance of audiovisual congruency in architectural design. They write that: “Aural architecture, with its own beauty, aesthetics, and symbolism, parallels visual architecture. Visual and aural meanings often align and reinforce each other. For example, the visual vastness of a cathedral communicates through the eyes, while its enveloping reverberation communicates through the ears.” (Blesser & Salter, 2007 , p. 3). However, they also draw attention to the incongruency that one experiences sometimes: “Although we expect the visual and aural experience of a space to be mutually supportive, this is not always the case. Consider dining at an expensive restaurant whose decorations evoke a sense of relaxed and pampered elegance, but whose reverberating clatter produces stress, anxiety, isolation, and psychological tension, undermining the possibility of easy social exchange. The visual and aural attributes produce a conflicting response.” (Blesser & Salter, 2007 , p. 3).

Regardless of whether atmospheric/environmental sensory cues are integrated or not, one general principle underpinning our response to multisensory combinations of environmental cues is that those combinations of stimuli that are “congruent” (whatever that term means in this context) will tend to be processed more fluently, and hence be liked more, than those combinations that are deemed incongruent, and hence will often prove more difficult, and effortful, to process (Reber, 2012 ; Reber, Schwarz, & Winkielman, 2004 ; Reber, Winkielman, & Schwarz, 1998 ; Winkielman, Schwarz, Fazendeiro, & Reber, 2003 ; Winkielman, Ziembowicz, & Nowak, 2015 ). Footnote 14 Indeed, it was the putative sensory incongruency between relaxing slow-tempo music and an arousing citrus scent that was put forward as a possible explanation for why Morrin and Chebat ( 2005 ) found that adding scent and sound in the setting of the shopping mall reduced unplanned purchases as compared to either of the unisensory interventions amongst almost 800 shoppers in one North American Mall (see Fig.  9 ).

figure 9

Morrin and Chebat ( 2005 ). Sales figures (unplanned purchases) in mall as a function of music, scent, or the combination of the two. In this case, multisensory stimulation led to a significant reduction in sales, perhaps because low-tempo music was combined with a likely-alerting citrus scent

Congruency can, of course, be defined at multiple levels. For instance, as we have seen already in this section, sensory cues may be more or less congruent in terms of their arousal/relaxation potential (e.g., Homburg, Imschloss, & Kühnl, 2012 ; Mattila & Wirtz, 2001 ). Mahvash ( 2007 , pp. 56–57) talks about the use of congruent cues to convey the notion of coolness: “… the Persian garden with its patterns of light and shadow, reflecting pools, gurgling fountains, scents of flowers and fruits, and gentle cool breezes 'offers an amazing richness of variety of sensory experiences which all serve to reinforce the pervasive sense of coolness'.” However, different sensory inputs may also be deemed congruent or not in terms of their artistic style (see Hasenfus, Martindale, & Birnbaum, 1983 ; Muecke & Zach, 2007 ; cf. Hersey, 2000 , pp. 37–41). It was stylistic congruency that was manipulated in a couple of experiments, conducted both online and in the laboratory by Siefkes and Arielli ( 2015 ). These researchers had their participants explicitly concentrate on and evaluate the style of the buildings shown in one of two architectural styles (baroque or modern: one short video showed five baroque buildings, while another short video focused on five modern buildings instead). Their results revealed that the buildings were rated as looking more balanced, more coherent, and to a certain degree, more complete, Footnote 15 when viewed while listening to music that was congruent (e.g., baroque architecture with baroque music - specifically Georg Philipp Telemann’s Concerto Grosso in D major, TWV 54:D3 (1716)) rather than incongruent (e.g., baroque architecture with a Philip Glass track from the soundtrack to the movie Koyaanisqatsi).

Before moving on, though, it is worth noting that in this study, as in many of the other studies reported in this section, there is a possibility that the design of the experiments themselves may have resulted in the participants concerned paying rather more attention to the atmospheric/environmental cues (and possibly also their congruency) than is normally likely to be the case when, as was mentioned earlier, the architecture itself fades into the background. Ecological validity may, in other words, have been compromised to a certain degree.

One of the other examples of incongruency that one often comes across is linked to the growing interest in biophilic design. As Pallasmaa ( 1996 , p. 41) notes: “A walk through a forest is invigorating and healing due to the constant interaction of all sense modalities; Bachelard speaks of ‘the polyphony of the senses’. The eye collaborates with the body and the other senses. One’s sense of reality is strengthened and articulated by this constant interaction. Architecture is essentially an extension of nature into the man-made realm …” Footnote 16 No wonder, then, that many designers have been exploring the benefits of bringing elements of nature into interior spaces in order to boost the occupants’ mood and aid relaxation (Spence, 2021 ). However, one has to ask whether the benefits of adding the sounds of a tropical rainforest to a space such as the shopping area of Glasgow airport, say (Treasure, 2007 ), really outweigh the cognitive dissonance likely elicited by hearing such sounds in such an incongruous setting. Similarly, a jungle soundscape was incorporated into the children’s section of the Harrods department store in London a few years ago (“Harrods’ Toy Kingdom - The Sound Agency | Sound Branding” https://www.youtube.com/watch?v=EVUUG6VvFKQ ). Nature soundscapes have also been introduced into Audi car salesrooms, not to mention BP petrol station toilet facilities (Bashford, 2010 ; Treasure, 2007 ). It is worth noting here that given the important role that congruency has been shown to play at the level of multisensory object/event perception, there is currently a stark paucity of research that has systematically investigated the relevance/importance of congruency at the level of multisensory ambient, or environmental, cues. As the quotes earlier in this section make clear, it is something to which some architects are undoubtedly sensitive, and on which they already have an opinion. Yet the relevant underpinning research still needs to be conducted.

Ultimately, therefore, while the congruency of atmospheric/environmental cues can be defined in various ways, and while incongruency is normally negatively valenced (because it is hard to process), Footnote 17 (in)congruency may often simply not be an issue for the occupants of specific environments. This may either be because the latter simply do not pay attention to the atmospheric/environmental cues (and hence do not register their incongruency) and/or because they have no reason to believe that the stimuli should be combined in the first place.

Sensory dominance

One common feature of configurations of multisensory stimuli that are in some sense incongruent is sensory dominance. And very often, under laboratory conditions, this tends to be vision that dominates (e.g., Hutmacher, 2019 ; Meijer et al., 2019 ; Posner et al., 1976 ). Under conditions of multisensory conflict, the normally more reliable sense sometimes completely dominates the experience of the other senses, as when wine experts can be tricked into thinking that they are drinking red or rosé wine simply by adding some red food dye to white wine (Wang & Spence, 2019 ). Similarly, people’s assessment of building materials has also been shown to be dominated by the visual rather than by the feel (Wastiels, Schifferstein, Wouters, & Heylighen, 2013 ; see also Karana, 2010 ).

At the same time, however, while we are largely visually dominant, the other senses can also sometimes drive our behaviour. For instance, according to an article that appeared in the Wall Street Journal , many people will apparently refuse to check in to a hotel if there is a funny smell in the lobby (Pacelle, 1992 ). Such admittedly anecdotal observations, were they to be backed up by robust empirical data, would then support the notion that olfactory atmospheric cues can, at least under certain conditions, also dominate in terms of determining our approach-avoidance behaviour. Meanwhile, a growing number of diners have also reported how they will sometimes leave a restaurant if the noise is too loud (see Spence, 2014 , for a review; Wagner, 2018 ), resonating with the quote from Blesser and Salter ( 2007 ) that we came across a little earlier.

One other potentially important issue to bear in mind here concerns the “assumption of unity”, or coupling/binding priors that constitute an important factor modulating the extent of crossmodal binding in the case of multisensory object/event perception, according to the literature on the currently popular Bayesian causal inference (see Chen & Spence, 2017 ; Rohe, Ehlis, & Noppeney, 2019 , for reviews). Coupling priors can be thought of as the internalized long-term statistics of the environment (e.g., Girshick, Landy, & Simoncelli, 2011 ). Does it, I wonder, make sense to suggest that we have such priors concerning the unification of environmental/atmospheric cues? Or might it be, perhaps, that in a context in which we are regularly exposed to incongruent environmental/atmospheric multisensory cues - just think of how music is played from loudspeakers without any associated visual referent - our priors that what we see, hear, smell, and feel will necessarily be related, in any meaningful sense, may well be reduced substantially? See Badde, Navarro, and Landy ( 2020 ) and Gau and Noppeney ( 2016 ) on the role of context in the strength of the common-source priors in multisensory binding.
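To make the role of a coupling prior more concrete, the following minimal Python sketch follows the standard two-cue Gaussian formulation of Bayesian causal inference. This is an assumption made purely for illustration: the reviews cited above discuss variants that may differ in detail, and the function names, parameter values, and the reduction of each cue to a single number (e.g., a location) are all hypothetical. The sketch computes the posterior probability that two noisy cues share a common source, given a prior probability of a common source (`p_common`, standing in for the coupling prior).

```python
import math


def likelihood_common(x_a, x_v, sigma_a, sigma_v, sigma_p, mu_p=0.0):
    """Likelihood of both cue estimates under ONE common source.

    The unknown source value is integrated out under a Gaussian prior
    with mean mu_p and standard deviation sigma_p (illustrative values).
    """
    va, vv, vp = sigma_a ** 2, sigma_v ** 2, sigma_p ** 2
    denom = va * vv + va * vp + vv * vp
    quad = ((x_v - x_a) ** 2 * vp
            + (x_v - mu_p) ** 2 * va
            + (x_a - mu_p) ** 2 * vv) / denom
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(denom))


def likelihood_separate(x_a, x_v, sigma_a, sigma_v, sigma_p, mu_p=0.0):
    """Likelihood of both cue estimates under TWO independent sources."""
    va, vv, vp = sigma_a ** 2, sigma_v ** 2, sigma_p ** 2
    quad = (x_v - mu_p) ** 2 / (vv + vp) + (x_a - mu_p) ** 2 / (va + vp)
    norm = 2 * math.pi * math.sqrt((vv + vp) * (va + vp))
    return math.exp(-0.5 * quad) / norm


def posterior_common(x_a, x_v, p_common,
                     sigma_a=2.0, sigma_v=1.0, sigma_p=10.0):
    """Posterior probability of a common source via Bayes' rule.

    p_common plays the role of the coupling prior; the sigma defaults
    are arbitrary illustrative noise levels, not empirical estimates.
    """
    l_one = likelihood_common(x_a, x_v, sigma_a, sigma_v, sigma_p)
    l_two = likelihood_separate(x_a, x_v, sigma_a, sigma_v, sigma_p)
    return l_one * p_common / (l_one * p_common + l_two * (1.0 - p_common))


if __name__ == "__main__":
    # Same pair of cue estimates, evaluated under a strong and a weak prior.
    for prior in (0.8, 0.2):
        print(prior, posterior_common(x_a=1.0, x_v=0.0, p_common=prior))
```

In this sketch, lowering `p_common` lowers the inferred probability that the cues belong together for exactly the same sensory evidence, which is one way of expressing the suggestion that regular exposure to unrelated ambient cues might weaken binding.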

Hence, no matter whether one wants to create a tranquil space (Pheasant, Horoshenkov, Watts, & Barret, 2008 ) or one that arouses (Mattila & Wirtz, 2001 ), the senses interact as they do in various other configurations and situations (e.g., Jahncke, Eriksson, & Naula, 2015 ; Jiang, Masullo, & Maffei, 2016 ). There are, in fact, numerous examples where the senses have been shown to interact in the experience and rating of urban environments (e.g., Ba & Kang, 2019 ; Van Renterghem & Botteldooren, 2016 ).

Crossmodal correspondences in architectural design practice

The field of synaesthetic design has grown rapidly in recent years (e.g., Haverkamp, 2014 ; Merter, 2017 ; Spence, 2012b ). According to architectural historian, Alberto Pérez-Gómez, mentioned earlier, the Philips Pavilion designed by Le Corbusier for the 1958 Brussels world’s fair (Fig.  10 ) attempted to deliver a multisensory experience, or atmosphere by means of “forced” synaesthesia (Pérez-Gómez, 2016 , p. 19). Footnote 18 The interior audiovisual environment was mostly designed by Le Corbusier and Iannis Xenakis (see Sterken, 2007 ). From those descriptions that have survived there were many coloured lights and projections and a looping soundscape that was responsive to people’s movement through the space (Lootsma, 1998 ; Muecke & Zach, 2007 ).

figure 10

Philips pavilion was a World’s Fair pavilion designed for Expo 1958 in Brussels by the office of Le Corbusier. The building, which was commissioned by the electronics manufacturer Philips, was designed to house a multimedia spectacle of sound, light and projections celebrating post-war technological progress. Iannis Xenakis was responsible for much of the project management. [Figure copyright Wikimedia Commons: Wouter Hagens]

True to his oculocentric approach, mentioned at the start of this piece, Le Corbusier apparently concentrated on the visual aspects of the “Poème Electronique”, the multimedia show that was projected inside the pavilion. Meanwhile, his site manager, Iannis Xenakis, created “Concret PH” - the soundscape, broadcast over 300 loudspeakers, that accompanied it. It is, though, unclear how much connection there actually was between the auditory and visual components of this multimedia presentation. The notion of parallel, but unconnected, stimulation to eye and ear comes through in Xenakis’ quote that: “we are capable of speaking two languages at the same time. One is addressed to the eyes, the other to the ears.” (Varga, 1996 , p. 114). Moreover, in his later work (e.g., Polytopes), Xenakis pursued the idea of creating a total dissociation between visual and aural perception in large abstract sound and light installations (Sterken, 2007 , p. 33).

At several points throughout his book, Pérez-Gómez ( 2016 ) stresses the importance of “synaesthesia” to architecture, without, unfortunately, ever really quite defining what he means by the term. All one finds are quotes such as the following: “primordial synesthetic perception”, p. 11; “perception is primordially synesthetic”, p. 20; “synaesthesia as the primary modality of human perception”, p. 71. Pérez-Gómez ( 2016 , p. 149) draws heavily on Merleau-Ponty’s ( 1962 , p. 235) Phenomenology of Perception , quoting lines such as: “The senses translate each other without any need of an interpreter, they are mutually comprehensible without the intervention of any idea.” A few pages later he cites Heidegger’s “truths as correspondence” (Pérez-Gómez, 2016 , p. 162). This does, though, sound more like a description of the ubiquitous crossmodal correspondences (Marks, 1978 ; Spence, 2011 ) than necessarily fitting with contemporary definitions of synaesthesia, though the distinction between the two phenomena admittedly remains fiercely contested (e.g., Deroy & Spence, 2013 ; Sathian & Ramachandran, 2020 ). Abath ( 2017 ) has done a great job of highlighting the confusion linked to Merleau-Ponty’s incoherent use of the term synaesthesia, which has, in turn, gone on to “infect” the writings of other architectural theorists, such as Pérez-Gómez ( 2016 ).

While talking of synaesthetic design may then be something of a misnomer (Spence, 2015 ), the fundamental idea here is to base one’s design decisions on the sometimes surprising connections between the senses that we all share, such as, for example, between high-pitched sounds and small, light, fast-moving objects (e.g., Spence, 2011 , 2012a ). It is important to highlight the fact that while these crossmodal correspondences are often confused with synaesthesia, they actually constitute a superficially similar, but fundamentally quite different empirical phenomenon (see Deroy & Spence, 2013 ).

We have already come across a number of examples of crossmodal correspondences being incorporated, knowingly or otherwise, in design decisions. Just think about the use of temperature-hue correspondences (Tsushima et al., 2020 ; see Spence, 2020a , for a review). The lightness-elevation mapping (crossmodal correspondence) might also prove useful from a design perspective (Sunaga, Park, & Spence, 2016 ). And colour-taste and sound-taste correspondences have already been incorporated into the design of multisensory experiential spaces (e.g., Spence et al., 2014 ; see also Adams & Doucé, 2017 ; Adams & Vanrie, 2018 ). Once one accepts the importance of crossmodal correspondences to environmental design, then this represents an additional level at which sensory atmospheric cues may be judged as congruent (e.g., see Spence et al., 2014 ). One of the important questions that remains for future research, though, is to determine whether there may be a priority of one kind of crossmodal congruency over others when they are manipulated simultaneously.

Conclusions

While it would seem unrealistic that the dominance, or hegemony (Levin, 1993 ), of the visual will be overturned any time soon, that does not mean that we should not do our best to challenge it. As critic David Michael Levin puts it: “I think it is appropriate to challenge the hegemony of vision – the ocular-centrism of our culture. And I think we need to examine very critically the character of vision that predominates today in our world. We urgently need a diagnosis of the psychosocial pathology of everyday seeing – and a critical understanding of ourselves as visionary beings.” (Levin, 1993 , p. 205). While not specifically talking about architecture, what we can all do is to adopt a more multisensory perspective and be more sensitive to the way in which the senses interact, be it in architecture or in any other aspect of our everyday experiences.

By designing experiences that congruently engage more of the senses we may be better able to enhance the quality of life while at the same time also creating more immersive, engaging, and memorable multisensory experiences (Bloomer & Moore, 1977 ; Gallace & Spence, 2014 ; Garg, 2019 ; Spence, 2021 ; Ward, 2014 ). Stein and Meredith ( 1993 , p. xi), two of the foremost multisensory neuroscientists of the last quarter century, summarized this idea when they suggested in the preface to their influential volume The merging of the senses that: “The integration of inputs from different sensory modalities not only transforms some of their individual characteristics, but does so in ways that can enhance the quality of life. Integrated sensory inputs produce far richer experiences than would be predicted from their simple coexistence or the linear sum of their individual products.”

There is growing interest across many fields of endeavour in design that moves beyond this one dominant, or perhaps even overpowering, sense (Lupton & Lipps, 2018). The aim is increasingly to design for experience rather than merely for appearance. At the same time, however, it is also important to note that progress has been slow in translating the insights from the academic field of multisensory research to the world of architectural design practice, as noted by licensed architect Joy Monice Malnar when writing about her disappointment with the entries at the 2015 Chicago Architecture Biennial. There, she writes: “So, where are we? What is the current state of the art? Sadly, the current research on multisensory environments appearing in journals such as The Senses & Society does not appear to be impacting artists and architects participating in the Chicago Biennial. Nor are the discoveries in neuroscience offering new information about how the brain relates to the physical environment.” (Malnar, 2017, p. 153). Footnote 19 That said, the adverts for at least one new residential development in Barcelona, promising residents the benefits of “Sensory living” (The New York Times International Edition in 2019, August 31–September 1, p. 13), suggest that at least some architects/designers are starting to realize the benefits of engaging their clients’/customers’ senses. The advert promised that the newly purchased apartment would “provoke their senses”.

Ultimately, it is to be hoped that as the growing awareness of the multisensory nature of human perception continues to spread beyond the academic community, those working in the field of architectural design practice will increasingly start to incorporate the multisensory perspective into their work; and, by so doing, foster the development of buildings and urban spaces that do a better job of promoting our social, cognitive, and emotional well-being.

Availability of data and materials

Not applicable.

It is, though, worth highlighting the fact that the denigration of the sense of smell in humans, something that is, for example, also found in older volumes on advertising (Lucas & Britt, 1950 ), turns out to be based on somewhat questionable foundations. For, as noted by McGann ( 2017 ) in the pages of Science , the downplaying of olfaction can actually be traced back to early French neuroanatomist Paul Broca wanting to make more space in the frontal parts of the brain (i.e., the frontal lobes) for free will in the 1880s. In order to do so, he apparently needed to reduce the size of the olfactory cortex accordingly.

Or, as Tuan (1977, p. 18) once put it: “an object or place achieves concrete reality when our experience of it is total, that is, through all the senses as well as with the active and reflective mind”.

Relevant here, Mitchell ( 2005 ) has suggested that there are, in fact, no uniquely visual media.

This is an issue close to my own heart currently, as the Department where I work was closed due to the discovery of large amounts of asbestos (see BBC News, 2017). The university and the latest firm of architects involved in the project are currently battling it out to determine how much of the new building will be given over to individual offices versus shared open-plan offices and hot-desking. The omens, I have to say (at least pre-pandemic), from what is happening elsewhere in the education sector, do not look good (Kinman & Garfield, 2015).

Here, one might also consider the Abercrombie & Fitch clothing brand. For a number of years, the chain also managed to craft a distinctive dance sound to match the dark nightclub-like appearance of their interiors.

Writer Tanizaki ( 2001 ), in his essay on aesthetics In Praise of Shadows , also draws attention to the close interplay that exists, or better said, once existed, between architectural design and food/plateware design in traditional Japanese culture.

Intriguingly, Kirshenblatt-Gimblett ( 1991 , p. 416) describes the white cube as an apparatus for “single-sense epiphanies”.

This despite Baudelaire’s line that the smell of a room is “the soul of the apartment” (quoted in Corbin, 1986 , p. 169).

It is also worth noting how suggestible people can be concerning the presence of an odour, as first demonstrated by Slosson’s ( 1899 ) classic classroom demonstration of students in the lecture theatre detecting a fictitious odour in the air.

It has also been suggested that the energy crisis in the 1970s may have been partly to blame, as that tended to result in lower ventilation standards.

Indeed, one might wonder whether the latter quote refers more to oral stereognosis (Jacobs, Serhal, & van Steenberghe, 1998), than specifically to gustation (see also Waterman Jr., 1917, for the suggestion that the tongue can be more revealing than the hand).

This response is very different from the aesthetic disappointment, or even disgust, felt by the man once hypothetically described by the philosopher Immanuel Kant who was very much enjoying listening to a nightingale’s song until realizing that he was listening to a mechanical imitation instead (Kant, 2000 ).

The owner of the car park did not like the sound of this particular sonic intervention, meaning that the researchers were unable to try it out in the field.

At the same time, however, one might consider how marble, one of the most highly prized building materials, is in some sense incongruent, given that its surface, for all the richly textured patterning of its veined appearance, is typically perfectly smooth to the touch.

These were the anchors on three of the bipolar semantic differential scales used in this study.

The value of connecting with nature in architectural design practice was stressed in an advertorial for an arctic hideaway, which appeared in Blue Wings magazine (December 2019, p. 38) and suggested that: “True luxury today is connecting with nature and feeling that your senses work again”.

It should, though, be remembered, that sometimes incongruency may be precisely what is wanted. Just take the following quote regarding the crossmodal contrast of thermal heat combined with visual coolness from Japan as but one example: “In the summer the householder likes to hang a picture of a waterfall, a mountain stream, or similar view in the Tokonama and enjoy in its contemplation a feeling of coolness.” (Tetsuro, 1955 , p. 16).

Though Pérez-Gómez ( 2016 , p. 65) seems to be using a rather unconventional definition of synaesthesia, as a little later in his otherwise excellent work, he defines perceptual synaesthesia as “the integrated sensory modalities”, Pérez-Gómez ( 2016 , p. 65). The majority of cognitive neuroscientists would, I presume, take this as a definition of multisensory perception, rather than synaesthesia. Synaesthesia, note, is typically defined as the automatic elicitation of an idiosyncratic concurrent, not normally experienced, in response to the presence of an inducing stimulus (Grossenbacher & Lovelace, 2001 ).

Eberhard ( 2007 , p. xv) sounds a similarly pessimistic note writing that: “I doubt very much that neuroscientific findings will ever usurp intuition and inspiration as a guiding principle within architecture”.

Abath, A. (2017). Merleau-Ponty and the problem of synaesthesia. In O. Deroy (Ed.), Sensory blending: New essays on synaesthesia , (pp. 151–165). Oxford: Oxford University Press.

Adams, C., & Doucé, L. (2017). What’s in a scent? Meaning, shape, and sensorial concepts elicited by scents. Journal of Sensory Studies , 32 , e12256.

Adams, C., & Vanrie, J. (2018). The added value of designing by crossmodal correspondences: Effect on consumer reactions. In Paper presented at the 4th International Colloquium on Design, Branding and Marketing, UHasselt, Hasselt, Belgium, December 5th–7th. http://hdl.handle.net/1942/27514 .

Aggleton, J. P., & Waskett, L. (1999). The ability of odours to serve as state-dependent cues for real-world memories: can Viking smells aid the recall of Viking experiences? British Journal of Psychology , 90 , 1–7.

Albrecht, L. (2013). Barclays Center’s “signature scent” tickles noses, curiosity. http://dnainfo.com/new-york/20130520/prospect-heights/barclays-centers-signature-scent-tickles-noses-curiosity .

Anderton, F. (1991). Architecture for all senses. Architectural Review , 189 (1136), 27.

Ba, M., & Kang, J. (2019). A laboratory study of the sound-odour interaction in urban environments. Building and Environment , 147 , 314–326.

Badde, S., Navarro, K. T., & Landy, M. S. (2020). Modality-specific attention attenuates visual-tactile integration and recalibration effects by reducing prior expectations of a common source for vision and touch. Cognition , 197 , 104170.

Bailly Dunne, C., & Sears, M. (1998). Interior designing for all five senses . New York: St. Martin’s Press.

Baird, J. C., Cassidy, B., & Kurr, J. (1978). Room preference as a function of architectural features and user activities. Journal of Applied Psychology , 63 , 719–727.

Banks, S. J., Ng, V., & Jones-Gotman, M. (2012). Does good + good = better? The effect of combining hedonically valenced smells and images. Neuroscience Letters , 514 , 71–76.

Barbara, A., & Perliss, A. (2006). Invisible architecture: Experiencing places through the sense of smell . Milan: Skira.

Barlow, H., & Mollon, J. (Eds.) (1982). The senses . Cambridge: Cambridge University Press.

Bashford, S. (2010). Breaking the sound barrier . The Grocer July 24th. http://www.thegrocer.co.uk/fmcg/breaking-the-sound-barrier/211258.article .

Battacharya, J., & Lindsen, J. P. (2016). Music for a brighter world: Brightness judgment bias by musical emotion. PLoS One , 11 , e0148959.

Baus, O., & Bouchard, S. (2017). Exposure to an unpleasant odour increases the sense of presence in virtual reality. Virtual Reality , 21 , 59–74.

Bavister, P., Lawrence, F., & Gage, S. (2018). Artificial intelligence and the generation of emotional response to sound and space. Proceedings of the Institute of Acoustics, 40(3), 8 pages.

BBC News (2017). Asbestos find closes Oxford University building for two years . BBC News February 10th. https://www.bbc.co.uk/news/uk-england-oxfordshire-38934959 .

Bellizzi, J. A., Crowley, A. E., & Hasty, R. W. (1983). The effects of color in store design. Journal of Retailing, 59 (Spring), 21–45.

Bellizzi, J. A., & Hite, R. E. (1992). Environmental color, consumer feelings, and purchase likelihood. Psychology and Marketing, 9, 347–363.

Benjamin, W. (1968). Illuminations [Trans. H. Zohn] . New York: Schocken Books (First published 1955).

Berg-Ganschow, U., & Jacobsen, W. (1987). … Film … Stadt … Kino … Berlin . USA: Argon.

Bernstein, E. S., & Turban, S. (2018). The impact of the ‘open’ workspace on human collaboration. Philosophical Transactions of the Royal Society B , 373 , 20170239.

Bille, M., & Sørensen, T. F. (2018). Atmospheric architecture: Elements, processes and practices. In D. Howes (Ed.), Senses and sensation: Critical and primary sources , (vol. 4, pp. 137–154). London: Bloomsbury.

Blesser, B., & Salter, L.-R. (2007). Spaces speak, are you listening? Cambridge: MIT Press.

Bloomer, K. C., & Moore, C. W. (1977). Body, memory, and architecture . London: Yale University Press.

Böhme, G. (2013). Atmosphere as mindful physical presence in space. OASE: Journal for Architecture , 91 , 21–32.

Borzykowski, B. (2017). Why open offices are bad for us . BBC January 11th. https://www.bbc.com/worklife/article/20170105-open-offices-are-damaging-our-memories .

Bruno, N., & Pavani, F. (2018). Perception: A multisensory perspective . Oxford: Oxford University Press.

Bucknell, A. (2018). Architecture you can smell? A brief history of multisensory design . Metropolis Magazine October 11th. https://www.metropolismag.com/architecture/multisensory-architecture-design-history/ .

Burkus, D. (2016). Why your open office workspace doesn’t work . Forbes June 21st. https://www.forbes.com/sites/davidburkus/2016/06/21/why-your-open-office-workspace-doesnt-work/#188f073a435f .

Calvert, G., Spence, C., & Stein, B. E. (Eds.) (2004). The handbook of multisensory processing . Cambridge: MIT Press.

Candas, V., & Dufour, A. (2005). Thermal comfort: multisensory interactions? Journal of Physiological Anthropology , 24 , 33–36.

Carroll, M. (1967). Paley Park: A corner of quiet delights amid city’s bustle; 53rd St. haven has something for everyone . The New York Times September 20th.  https://www.nytimes.com/1967/09/20/archives/paley-park-a-corner-of-quiet-delights-amid-citys-bustle-53d-st.html

Chen, Y.-C., & Spence, C. (2017). Assessing the role of the ‘unity assumption’ on multisensory integration: a review. Frontiers in Psychology , 8 , 445.

Choo, H., Nasar, J., Nikrahei, B., & Walther, D. B. (2017). Neural codes of seeing architectural styles. Scientific Reports , 7 , 40201. https://doi.org/10.1038/srep40201 .

Classen, C. (1998). The color of angels: Cosmology, gender and the aesthetic imagination. London: Routledge.

Clynes, T. (2012). A restaurant with adjustable acoustics . Popular Science http://www.popsci.com/technology/article/2012-08/restaurant-adjustable-acoustics .

Corbin, A. (1986). The foul and the fragrant: Odor and the French social imagination . Cambridge: Harvard University Press.

Costa, M., Frumento, S., Nese, M., & Predieri, I. (2018). Interior color and psychological functioning in a university residence hall. Frontiers in Psychology , 9 , 1580.

Cox, D. (2017). The science of SAD: Understanding the causes of ‘winter depression’ . The Guardian October 30th. https://www.theguardian.com/lifeandstyle/2017/oct/30/sad-winter-depression-seasonal-affective-disorder?utm_source=esp&utm_medium=Email&%E2%80%A6 .

Crowley, A. E. (1993). The two-dimensional impact of color on shopping. Marketing Letters , 4 , 59–69.

Dalton, P., & Wysocki, C. J. (1996). The nature and duration of adaptation following long-term odor exposure. Perception & Psychophysics , 58 , 781–792.

Dazkir, S. S., & Read, M. A. (2012). Furniture forms and their influence on our emotional responses toward interior environments. Environment and Behavior , 44 , 722–734.

De Croon, E., Sluiter, J., Kuijer, P. P., & Frings-Dresen, M. (2005). The effect of office concepts on worker health and performance: A systematic review of the literature. Ergonomics , 48 , 119–134.

De Lange, M., Debets, L., Ruitenburg, K., & Holland, R. (2012). Making less of a mess: Scent exposure as a tool for behavioral change. Social Influence , 7 (2), 90–97.

Deroy, O., & Spence, C. (2013). Why we are not all synesthetes (not even weakly so). Psychonomic Bulletin & Review , 20 , 643–664.

Doll, J. (2013). The ‘signature scent’ of Brooklyn’s Barclays Center is mysterious . The Atlantic May 20th. https://www.theatlantic.com/national/archive/2013/05/signature-scent-brooklyns-barclays-center-mysterious/315078/ .

Donnell Jr., H. D., Bagby, J. R., Harmon, R. G., Crellin, J. R., Chaski, H. C., Bright, M. F., … Metzger, R. W. (1989). Report of an illness outbreak at the Harry S Truman state office building. American Journal of Epidemiology , 129 , 550–558.

Doyen, S., Klein, O., Pichon, C., & Cleeremans, A. (2012). Behavioural priming: It’s all in the mind, but whose mind? PLoS One , 7 (1), e29081.

Drobnick, J. (2002). Volatile architectures. In B. Miller, & M. Ward (Eds.), Crime and ornament: In the shadow of Adolf Loos , (pp. 263–282). Toronto: YYZ Books.

Drobnick, J. (2005). Volatile effects: Olfactory dimensions in art and architecture. In D. Howes (Ed.), Empire of the senses: The sensual culture reader , (pp. 265–280). Oxford: Berg.

Dunn, N. S. (2017). Shadowplay: Liberation and exhilaration in cities at night. In I. Heywood (Ed.), Sensory arts and design (Sensory Studies Series) , (pp. 31–48). London: Bloomsbury Academic.

Eberhard, J. P. (2007). Architecture and the brain: A new knowledge base from neuroscience . Atlanta: Greenway Communications.

Ellis-Petersen, H. (2019). Chinese province closes all glass bridges over safety fears . The Guardian October 30th. https://www.theguardian.com/world/2019/oct/30/chinese-province-closes-its-glass-bridges-over-safety-fears .

Eriksen, L. (2014). Room with a cue. B&O Play: The Journal , Autumn (3), 26–27.

Evans, G. W., & Johnson, D. (2000). Stress and open-office noise. Journal of Applied Psychology , 85 , 779–783.

Faust, H. S., & Brilliant, L. B. (1981). Is the diagnosis of “mass hysteria” an excuse for incomplete investigation of low-level environmental contamination? Journal of Occupational Medicine , 23 , 22–26.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex , 1 , 1–47.

Finnegan, M. J., Pickering, C. A. C., & Burge, P. S. (1984). The Sick Building Syndrome: Prevalence studies. British Medical Journal , 289 , 1573–1575.

Fletcher, C. (2005). Dystoposthesia: Emplacing environmental sensitivities. In D. Howes (Ed.), Empire of the senses: The sensual culture reader , (pp. 380–396). Oxford: Berg.

Fodor, J. A. (1983). The modularity of mind . Cambridge: MIT Press.

Forster, S., & Spence, C. (2018). “What smell?” Temporarily loading visual attention induces a prolonged loss of olfactory awareness. Psychological Science , 29 , 1642–1652.

Fujisaki, W. (2020). Multisensory shitsukan perception. Acoustical Science & Technology , 41 , 189–195.

Gal, D., Wheeler, S. C., & Shiv, B. (2007, unpublished manuscript). Cross-modal influences on gustatory perception. Available at SSRN: http://ssrn.com/abstract=1030197 .

Gallace, A., Ngo, M. K., Sulaitis, J., & Spence, C. (2012). Multisensory presence in virtual reality: Possibilities & limitations. In G. Ghinea, F. Andres, & S. Gulliver (Eds.), Multiple sensorial media advances and applications: New developments in MulSeMedia , (pp. 1–40). Hershey: IGI Global.

Gallace, A., & Spence, C. (2014). In touch with the future: The sense of touch from cognitive neuroscience to virtual reality . Oxford: Oxford University Press.

Garg, P. (2019). How multi-sensory design can help you create memorable experiences . UX Collective July 28th. https://uxdesign.cc/multi-sensory-design-can-help-you-create-memorable-designs-95dfc0f58da5 .

Gau, R., & Noppeney, U. (2016). How prior expectations shape multisensory perception. NeuroImage , 124 , 876–886.

Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences , 10 , 278–285.

Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience , 14 , 926–932.

Glass, S. T., & Heuberger, E. (2016). Effects of a pleasant natural odor on mood: No influence of age. Natural Product Communications , 11 , 1555–1559.

Glass, S. T., Lingg, E., & Heuberger, E. (2014). Do ambient urban odors evoke basic emotions? Frontiers in Psychology , 5 , 340.

Gregory, A. (2016). The architect who became a diamond . The New Yorker July 16th. https://www.newyorker.com/magazine/2016/08/01/how-luis-barragan-became-a-diamond .

Grossenbacher, P. G., & Lovelace, C. T. (2001). Mechanisms of synesthesia: Cognitive and physiological constraints. Trends in Cognitive Sciences , 5 , 36–41.

Guieysse, B., Hort, C., Platel, V., Munoz, R., Ondarts, M., & Revah, S. (2008). Biological treatment of indoor air for VOC removal: Potential and challenges. Biotechnology Advances , 26 , 398–410.

Gulden, W. O., & Grüsser, O.-J. (1998). Is there a vestibular cortex? Trends in Neurosciences , 21 , 254–259.

Haehner, A., Maass, H., Croy, I., & Hummel, T. (2017). Influence of room fragrance on attention, anxiety and mood. Flavour and Fragrance Journal , (1), 24–28.

Haga, A., Halin, N., Holmgren, M., & Sörqvist, P. (2016). Psychological restoration can depend on stimulus-source attribution: A challenge for the evolutionary account. Frontiers in Psychology , 7 , 1831.

Hall, E. T. (1966). The hidden dimension: Man’s use of space in public and private . London: Bodley Head.

Harada, H., Kashiwadani, H., Kanmura, Y., & Kuwaki, T. (2018). Linalool odor-induced anxiolytic effects in mice. Frontiers in Behavioral Neuroscience , 12 , 241. https://doi.org/10.3389/fnbeh.2018.00241 .

Hasenfus, N., Martindale, C., & Birnbaum, D. (1983). Psychological reality of cross-media artistic styles. Journal of Experimental Psychology: Human Perception and Performance , 9 , 841–863.

Haverkamp, M. (2014). Synesthetic design: Handbook for a multisensory approach . Basel: Birkhäuser.

Heerwagen, J. H. (1990). Affective functioning, “light hunger,” and room brightness preferences. Environment and Behavior , 22 , 608–635.

Heilig, M. (1962). Sensorama stimulator. U.S. Patent #3,050,870.

Heilig, M. L. (1992). El cine del futuro: The cinema of the future. Presence: Teleoperators, and Virtual Environments , 1 , 279–294.

Henderson, W. B. (1939). Air-conditioning a factor in comfort and profit. Super Market Merchandizing, July (6), 23.

Henshaw, V. (2014). Urban smellscapes: Understanding and designing city smell environments . New York: Routledge.

Henshaw, V., McLean, K., Medway, D., Perkins, C., & Warnaby, G. (Eds.) (2018). Designing with smell: Practices, techniques and challenges . New York: Routledge.

Hersey, G. (2000). Architecture and geometry in the age of the Baroque . Chicago: University of Chicago Press.

Herz, R. S. (2009). Aromatherapy facts and fictions: A scientific analysis of olfactory effects on mood, physiology and behavior. International Journal of Neuroscience , 119 , 263–290.

Heschong, L. (1979). Thermal delight in architecture . Cambridge: MIT Press.

Holland, R. W., Hendriks, M., & Aarts, H. (2005). Smells like clean spirit. Nonconscious effects of scent on cognition and behavior. Psychological Science , 16 , 689–693.

Homburg, C., Imschloss, M., & Kühnl, C. (2012). Of dollars and scents – Does multisensory marketing pay off? Institute for Marketing Oriented Management http://imu2.bwl.uni-mannheim.de/fileadmin/files/imu/files/ap/ri/RI009.pdf .

Hongisto, V., Varjo, J., Oliva, D., Haapakangas, A., & Benway, E. (2017). Perception of water-based masking sounds—Long-term experiment in an open-plan office. Frontiers in Psychology , 8 , 1177.

Horwitz, J., & Singley, P. (Eds.) (2004). Eating architecture . Cambridge: MIT Press.

Hosey, L. (2013). Scent and the city . The New York Times October 5th. https://nyti.ms/HlWGto .

Howes, D. (2005). Architecture of the senses. In M. Zardini (Ed.), Sense of the city: An alternate approach to urbanism , (pp. 322–331). Montreal: Lars Müller Publishers.

Howes, D. (Ed.) (2014). A cultural history of the senses in the modern age . London: Bloomsbury Academic.

Hultén, B., Broweus, N., & van Dijk, M. (2009). Sensory marketing. Basingstoke: Palgrave Macmillan.

Hutmacher, F. (2019). Why is there so much more research on vision than on any other sensory modality? Frontiers in Psychology , 10 , 2246. https://doi.org/10.3389/fpsyg.2019.02246 .

Indovina, I., Maffei, V., Bosco, G., Zago, M., Macaluso, E., & Lacquaniti, F. (2005). Representation of visual gravitational motion in the human vestibular cortex. Science , 308 , 416–419.

Jacobs, R., Serhal, C. B., & van Steenberghe, D. (1998). Oral stereognosis: A review of the literature. Clinical Oral Investigations , 2 , 3–10.

Jahncke, H., Eriksson, K., & Naula, S. (2015). The effects of auditive and visual settings on perceived restoration likelihood. Noise & Health , 17 , 1–10.

Jiang, L., Masullo, M., & Maffei, L. (2016). Effect of odour on multisensory environmental evaluations of road traffic. Environmental Impact Assessment Review , 60 , 126–133.

Jones, C. A. (2006). The mediated sensorium. In C. A. Jones (Ed.), Sensorium: Embodied experience, technology, and contemporary art , (pp. 5–49). Cambridge: MIT Press.

Joshi, S. M. (2008). The sick building syndrome. Indian Journal of Occupational and Environmental Medicine , 12 (2), 61–64.

Just, M. G., Nichols, L. M., & Dunn, R. R. (2019). Human indoor climate preferences approximate specific geographies. Royal Society Open Science , 6 (3), 180695.

Kabat-Zinn, J. (2005). Coming to our senses: Healing ourselves and the world through mindfulness . New York: Hyperion.

Kahn Jr., P. H., Friedman, B., Gill, B., Hagman, J., Severson, R. L., Freier, N. G., et al. (2008). A plasma display window? The shifting baseline problem in a technologically-mediated natural world. Journal of Environmental Psychology , 28 , 192–199.

Kang, J., Aletta, F., Gjestland, T. T., Brown, L. A., Botteldooren, D., Schulte-Fortkamp, B., et al. (2016). Ten questions on the soundscapes of the built environment. Building and Environment , 108 , 284–294.

Kant, I. (2000). Critique of the power of judgment . Cambridge: Cambridge University Press.

Karana, E. (2010). How do materials obtain their meanings? METU Journal of the Faculty of Architecture , 27 , 271–285.

Kinman, G., & Garfield, I. (2015). The open-plan university – Noisy nightmare or buzzing ideas hub? The Guardian October 16th. https://www.theguardian.com/higher-education-network/2015/oct/16/the-open-plan-university-noisy-nightmare-or-buzzing-ideas-hub .

Kirshenblatt-Gimblett, B. (1991). Objects of ethnography. In I. Karp, & S. Lavine (Eds.), Exhibiting cultures: The poetics and politics of museum display , (pp. 386–443). Washington, DC: Smithsonian Institution Press.

Komatsu, H., & Goda, N. (2018). Neural mechanisms of material perception: Quest on Shitsukan. Neuroscience , 392 , 329–347.

Kotler, P. (1974). Atmospherics as a marketing tool. Journal of Retailing , 49 (Winter), 48–64.

Krishna, A. (2013). Customer sense: How the 5 senses influence buying behaviour . New York: Palgrave Macmillan.

Kroner, W. M., Stark-Martin, J., & Willemain, T. (1992). The West Bend Mutual study . Troy: Center for Architectural Research, School of Architecture, Rensselaer Polytechnic Institute.

Lam, W. M. (1992). Perception and lighting as form-givers for architecture . New York: Van Nostrand Reinhold.

Lanza, J. (2004). Elevator music: A surreal history of Muzak, easy-listening, and other moodsong . Ann Arbor: University of Michigan Press.

Le Corbusier (1948). Towards a new architecture . London: Architectural Press.

Le Corbusier (1991). Precisions . Cambridge: MIT Press.

LeDoux, J. (2003). The emotional brain, fear, and the amygdala. Cellular and Molecular Neurobiology , 23 , 727–738.

Lee, I. F. (2018). Joyful: The surprising power of ordinary things to create extraordinary happiness . London: Rider.

Lehman, L. M. (2009). Architectural building for all the senses: Bringing space to life. https://marialorenalehman.com/post/architectural-building-for-all-the-senses .

Levent, N., & Pascual-Leone, A. (Eds.) (2014). The multisensory museum: Cross-disciplinary perspectives on touch, sound, smell, memory and space . Plymouth: Rowman & Littlefield.

Levin, M. D. (Ed.) (1993). Modernity and the hegemony of vision . Berkeley: University of California Press.

Li, W., Moallem, I., Paller, K. A., & Gottfried, J. A. (2007). Subliminal smells can guide social preferences. Psychological Science , 18 , 1044–1049.

Lieberman, L. S. (2006). Evolutionary and anthropological perspectives on optimal foraging in obesogenic environments. Appetite , 47 , 3–9.

Lindstrom, M. (2005). Brand sense: How to build brands through touch, taste, smell, sight and sound . London: Kogan Page.

Lipps, A. (2018). Scentscapes. In E. Lupton, & A. Lipps (Eds.), The senses: Design beyond vision , (pp. 108–121). Hudson: Princeton Architectural Press.

Liu, Q., Bogicevic, V., & Mattila, A. S. (2018). Circular vs. angular servicescape: “Shaping” customer response to a fast service encounter pace. Journal of Business Research , 89 , 47–56.

Lootsma, B. (1998). En route to a new tectonics. Daidalos , 68 , 34–47.

Love, S. (2018). Sick building syndrome: Is it the buildings or the people who need treatment? The Independent May 14th. https://www.independent.co.uk/news/long_reads/sick-building-syndrome-treatment-finland-health-mould-nocebo-a8323736.html .

Lucas, D. B., & Britt, S. H. (1950). Advertising psychology and research: An introductory book . New York: McGraw-Hill Book Company.

Lupton, E. (2002). Skin: Surface substance + design . New York: Princeton Architectural Press.

Lupton, E., & Lipps, A. (2018). The senses: Design beyond vision . Hudson: Princeton Architectural Press.

Lynch, K., & Hack, G. (1984). Site design. In Site planning , (3rd ed., pp. 127–129). Cambridge: MIT Press.

Magnavita, N. (2015). Work-related symptoms in indoor environments: A puzzling problem for the occupational physician. International Archives of Occupational and Environmental Health , 88 , 185–196.

Mahvash, K. (2007). Site + sound: Space. In M. W. Muecke, & M. S. Zach (Eds.), Resonance: Essays on the intersection of music and architecture , (pp. 53–75). Ames: Culicidae Press.

Mairs, J. (2017). Therme Vals spa has been destroyed says Peter Zumthor . DeZeen May 11th. https://www.dezeen.com/2017/05/11/peter-zumthor-vals-therme-spa-switzerland-destroyed-news/ .

Malhotra, N. K. (1984). Information and sensory overload. Information and sensory overload in psychology and marketing. Psychology & Marketing , 1 (3–4), 9–21.

Mallgrave, H. F. (2011). The architect’s brain: Neuroscience, creativity, and architecture . Chichester: Wiley-Blackwell.

Malnar, J. M. (2017). The 2015 Chicago Architecture Biennial: The state of sensory design. In I. Heywood (Ed.), Sensory arts and design (Sensory Studies Series) , (pp. 137–156). London: Bloomsbury Academic.

Malnar, J. M., & Vodvarka, F. (2004). Sensory design . Minneapolis: University of Minnesota Press.

Manav, B., Kutlu, R. G., & Küçükdoğu, M. S. (2010). The effects of colour and light on space perception. In Colour and Light in Architecture First International Conference 2010 Proceedings , (pp. 173–177).

Margolies, E. (2006). Vagueness gridlocked: A map of the smells of New York. In J. Drobnick (Ed.), The smell culture reader , (pp. 107–117). Oxford: Berg.

Marks, L. (1978). The unity of the senses: Interrelations among the modalities . New York: Academic.

Martinez, J. (2013). The Barclays Center has its own signature scent . Complex Media May 20th. https://www.complex.com/sports/2013/05/the-barclays-center-has-its-own-signature-scent .

Mattila, A. S., & Wirtz, J. (2001). Congruency of scent and music as a driver of in-store evaluations and behavior. Journal of Retailing , 77 , 273–289.

Mau, B. (2018). Designing LIVE. In E. Lupton, & A. Lipps (Eds.), The senses: Design beyond vision , (pp. 20–23). Hudson: Princeton Architectural Press.

Mau, B. (2019). Bruce Mau’s ‘designing for the five senses’ presented by Freeman . SXSW March 13th. https://schedule.sxsw.com/2019/events/OE38314 .

McCarthy, B. (1996). Multi-source synthesis: An architecture of smell. Architectural Design , 121, 66 (5/6), ii–v.

McCooey, C. (2008). Scenting success , (p. 1). The Financial Times, February 3rd (House & Home).

McGann, J. P. (2017). Poor human olfaction is a 19th-century myth. Science , 356 , eaam7263.

McLuhan, M. (1961). Inside the five sense sensorium. Canadian Architect , 6 (6), 49–54 (Reprinted in Howes, D. (Ed.). (2004). Empire of the senses: the sensual culture reader (pp. 42–52). Oxford, UK: Berg.).

Mehrabian, A. R., & Russell, J. A. (1974). An approach to environmental psychology . Cambridge: MIT Press.

Meijer, D., Veselič, S., Calafiore, C., & Noppeney, U. (2019). Integration of audiovisual spatial signals is not consistent with maximum likelihood estimation. Cortex , 119 , 74–88.

Merleau-Ponty, M. (1962). Phenomenology of perception [trans. C. Smith] . London: Routledge and Kegan Paul.

Merter, S. (2017). Synesthetic approach in the design process for enhanced creativity and multisensory experiences. The Design Journal , 20 (supp. 1), S4519–S4528.

Meyers-Levy, J., & Zhu, R. (2007). The influence of ceiling height: The effect of priming on the type of processing that people use. Journal of Consumer Research , 34 , 174–186.

Mitchell, W. J. T. (2005). There are no visual media. Journal of Visual Culture , 4 , 257–266.

Morrin, M., & Chebat, J. C. (2005). Person-place congruency: The interactive effects of shopper style and atmospherics on consumer expenditures. Journal of Service Research , 8 , 181–191.

Muecke, M. W., & Zach, M. S. (Eds.) (2007). Resonance: Essays on the intersection of music and architecture . Ames: Culicidae Press.

Neff, J. (2000). Product scents hide absence of true innovation . Advertising Age February 21st, 22. http://adage.com/article/news/product-scents-hide-absence-true-innovation/59353/ .

Niemelä, R., Seppänen, O., Korhonen, P., & Reijula, K. (2006). Prevalence of building-related symptoms as an indicator of health and productivity. American Journal of Industrial Medicine , 49 , 819–825.

North, A. C., Hargreaves, D. J., & McKendrick, J. (1997). In-store music affects product choice. Nature, 390, 132.

North, A. C., Hargreaves, D. J., & McKendrick, J. (1999). The influence of in-store music on wine selections. Journal of Applied Psychology, 84, 271–276.

O’Doherty, B. (1999). Inside the white cube: On the ideology of the gallery space, (1976) . Berkeley: University of California Press.

O’Doherty, B. (2009). Beyond the ideology of the white cube . Barcelona: MACBA.

Oberfeld, D., Hecht, H., Allendorf, U., & Wickelmaier, F. (2009). Ambient lighting modifies the flavor of wine. Journal of Sensory Studies , 24 , 797–832.

Oberfeld, D., Hecht, H., & Gamer, M. (2010). Surface lightness influences perceived room height. Quarterly Journal of Experimental Psychology , 63 , 1999–2011.

Ott, W. R., & Roberts, J. W. (1998). Everyday exposure to toxic pollutants. Scientific American , 278 (February), 86–91.

Otterbring, T., Pareigis, J., Wästlund, E., Makrygiannis, A., & Lindström, A. (2018). The relationship between office type and job satisfaction: Testing a multiple mediation model through ease of interaction and well-being. Scandinavian Journal of Work & Environmental Health , 44 , 330–334.

Ottoson, J., & Grahn, P. (2005). A comparison of leisure time spent in a garden with leisure time spent indoors: On measures of restoration in residents in geriatric care. Landscape Research , 30 , 23–55.

Owen, D. (2019). Is noise pollution the next big public-health crisis? The New Yorker May 13th. https://www.newyorker.com/magazine/2019/05/13/is-noise-pollution-the-next-big-public-health-crisis .

Pacelle, M. (1992). Many people refuse to check in if a hotel has odors in the lobby . Wall Street Journal July 28th, B1.

Pallasmaa, J. (1994). An architecture of the seven senses. In S. Holl, J. Pallasmaa, & A. Perez-Gomez (Eds.), Architecture and urbanism: Questions of perception: Phenomenology and architecture (Special issue), July, (pp. 27–37).

Pallasmaa, J. (1996). The eyes of the skin: Architecture and the senses (Polemics) . London: Academy Editions.

Pallasmaa, J. (2000). Hapticity and time: Notes on fragile architecture. Architectural Review , 207 , 78–84.

Pallasmaa, J. (2011). Architecture and the existential sense: Space, body, and the senses. In F. Bacci, & D. Melcher (Eds.), Art and the senses , (pp. 579–598). Oxford: Oxford University Press.

Palmer, S. E. (1999). Vision science: Photons to phenomenology . Cambridge: MIT Press.

Papale, P., Chiesi, L., Rampinini, A. C., Pietrini, P., & Ricciardi, E. (2016). When neuroscience ‘touches’ architecture: From hapticity to a supramodal functioning of the human brain. Frontiers in Psychology , 7 , 866.

Pearson, D. (1991). Making sense of architecture. Architectural Review, 10: Sensuality and Architecture , October , 68–70.

Pérez-Gómez, A. (2016). Attunement: Architectural meaning after the crisis of modern science . Cambridge: MIT Press.

Pheasant, R. J., Horoshenkov, K., Watts, G., & Barret, B. T. (2008). The acoustic and visual factors influencing the construction of tranquil space in urban and rural environments tranquil spaces-quiet places? Journal of the Acoustical Society of America , 123 , 1446–1457.

Porteous, J. D. (1990). Landscapes of the mind: Worlds of sense and metaphor . Toronto: University of Toronto Press.

Porteous, J. D., & Mastin, J. F. (1985). Soundscape. Journal of Architectural and Planning Research , 2 , 169–186.

Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review , 83 , 157–171.

Previc, F. H. (1998). The neuropsychology of 3-D space. Psychological Bulletin , 124 , 123–164.

Prochnik, G. (2009). City of earthly delights . The New York Times December 12th. https://www.nytimes.com/2009/12/13/opinion/13prochnik.html .

Ragavendira, R. (2017). Architecture and human senses. International Journal of Innovations in Engineering and Technology (IJIET) , 8 (2), 131–135.

Rasmussen, S. E. (1993). Experiencing architecture . Cambridge: MIT Press.

Reber, R. (2012). Processing fluency, aesthetic pleasure, and culturally shared taste. In A. P. Shimamura, & S. E. Palmer (Eds.), Aesthetic science: Connecting minds, brains, and experience , (pp. 223–249). Oxford: Oxford University Press.

Reber, R., Schwarz, N., & Winkielman, P. (2004). Processing fluency and aesthetic pleasure: Is beauty in the perceiver’s processing experience? Personality and Social Psychology Review , 8 , 364–382.

Reber, R., Winkielman, P., & Schwartz, N. (1998). Effects of perceptual fluency on affective judgments. Psychological Science , 9 , 45–48.

Redesigning the corporate office (2019). The Economist, September 28th. https://www.economist.com/business/2019/09/28/redesigning-the-corporate-office .

Redlich, C. A., Sparer, J., & Cullen, M. R. (1997). Sick building syndrome. Lancet , 349 , 1013–1016.

Robart, R. L., & Rosenblum, L. D. (2005). Hearing space: Identifying rooms by reflected sound. In H. Heft, & K. L. Marsh (Eds.), Studies in perception and action XIII , (pp. 152–156). Hillsdale: Lawrence Erlbaum Associates.

Robinson, S., & Pallasmaa, J. (Eds.) (2015). Mind in architecture: Neuroscience, embodiment, and the future of design . Cambridge: MIT Press.

Rohe, T., Ehlis, A. C., & Noppeney, U. (2019). The neural dynamics of hierarchical Bayesian causal inference in multisensory perception. Nature Communications , 10 , 1907.

Rosenthal, N. E. (2019). Winter blues: Everything you need to know to beat seasonal affective disorder . New York: Guilford Press.

Rosenthal, N. E., Sack, D. A., Gillin, J. C., Lewy, A. J., Goodwin, F. K., Davenport, Y., et al. (1984). Seasonal affective disorder: A description of the syndrome and preliminary findings with light therapy. Archives of General Psychiatry , 41 , 72–80.

Ryan, R. (1997). Thermal baths in Vals, Switzerland by Peter Zumthor . Architectural Review August 16th. https://www.architectural-review.com/buildings/thermal-baths-in-vals-switzerland-by-peter-zumthor/8616979.article?blocktitle=1990s-grid&contentID=24955 .

Rybczynski, W. (2001). The look of architecture . New York: The New York Public Library.

Salgado-Montejo, A., Salgado, C., Alvarado, J., & Spence, C. (2017). Simple lines and shapes are associated with, and communicate, distinct emotions. Cognition & Emotion , 31 , 511–525.

Sathian, K., & Ramachandran, V. S. (Eds.) (2020). Multisensory perception: From laboratory to clinic . San Diego: Elsevier.

Sayin, E., Krishna, A., Ardelet, C., Decré, G. B., & Goudey, A. (2015). “Sound and safe”: The effect of ambient sound on the perceived safety of public spaces. International Journal of Research in Marketing , 32 , 343–353.

Schafer, R. M. (1977). The tuning of the world . New York: Knopf.

Schifferstein, H. N. J., Talke, K. S. S., & Oudshoorn, D.-J. (2011). Can ambient scent enhance the nightlife experience? Chemosensory Perception , 4 , 55–64.

Schroeder, J. (2018). Inside the $30m Miami condo that comes with its own ‘scent identity’: Olfactory specialist spends 6months with new buyers to design their personal scent that is diffused through the HVAC system . Daily Mail Online July 10th. http://www.dailymail.co.uk/news/article-5936585/29million-condo-Miami-comes-custom-scent-identity.html .

Sennett, R. (1994). Flesh and stone: The body and the city in western civilization . New York: Norton.

Siefkes, M., & Arielli, E. (2015). An experimental approach to multimodality: How musical and architectural styles interact in aesthetic perception. In J. Wildfeuer (Ed.), Building bridges for multimodal research: International perspectives on theories and practices of multimodal analysis , (pp. 247–265). New York: Peter Lang.

Sigsworth, W. (2019). Architect Chris Downey lost sight, yet brought a new focus on touch to his architecture. Changing lives. Sappi Europe & J. Brown, Reach out and touch: The joy of the physical in the digital age (22–27). London: John Brown & Brussels: Sappi Europe.

Simmel, G. (1995). The metropolis and mental life. In P. Kasinitz (Ed.), Metropolis: Centre and symbol of our times . London: Macmillan.

Slosson, E. E. (1899). A lecture experiment in hallucination. Psychological Review , 6 , 407–408.

Smeets, M. A. M., & Dijksterhuis, G. B. (2014). Smelly primes – When olfactory primes do or do not work. Frontiers in Psychology , 5 , 96.

Southworth, M. (1969). The sonic environment of cities. Environment and Behavior , 1 (1), 49–70.

Spence, C. (2002). The ICI report on the secret of the senses . London: The Communication Group.

Spence, C. (2003). A new multisensory approach to health and well-being. In Essence, 2 , 16–22.

Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics , 73 , 971–995.

Spence, C. (2012a). Managing sensory expectations concerning products and brands: Capitalizing on the potential of sound and shape symbolism. Journal of Consumer Psychology , 22 , 37–54.

Spence, C. (2012b). Synaesthetic marketing: Cross sensory selling that exploits unusual neural cues is finally coming of age. In The Wired World in 2013, November , (pp. 104–107).

Spence, C. (2014). Noise and its impact on the perception of food and drink. Flavour , 3 , 9.

Spence, C. (2015). Book review: Synaesthetic design. Multisensory Research , 28 , 245–248.

Spence, C. (2020a). Temperature-based crossmodal correspondences: Causes & consequences. Multisensory Research, 33, 645-682. https://doi.org/10.1163/22134808-20191494 .

Spence, C. (2020b). Shitsukan – The multisensory perception of quality. Multisensory Research . https://doi.org/10.1163/22134808-bja10003 .

Spence, C. (2020c). Atmospheric effects on eating and drinking: A review. In H. Meiselman (Ed.), Handbook of eating and drinking , (pp. 257–276). Cham: Springer.

Chapter   Google Scholar  

Spence, C. (2021). Sensehacking . London: Viking Penguin.

Spence, C. (2020d). Designing for the multisensory mind. Architectural Design, December, 42-49.

Spence, C., & Frings, C. (2020). Multisensory feature integration in (and out) of the focus of spatial attention. Attention, Perception, & Psychophysics , 82 , 363–376.

Spence, C., & Keller, S. (2019). Medicine’s melodies: On the costs and benefits of music, soundscapes, and noise in healthcare settings. Music and Medicine , 11 , 211–225.

Spence, C., Lee, J., & van der Stoep, N. (2017). Responding to sounds from unseen locations: Crossmodal attentional orienting in response to sounds presented from the rear. European Journal of Neuroscience, 51, 1137–1150.

Spence, C., Velasco, C., & Knoeferle, K. (2014). A large sample study on the influence of the multisensory environment on the wine drinking experience. Flavour , 3 , 8.

Spence, C., Wan, X., Woods, A., Velasco, C., Deng, J., Youssef, J., & Deroy, O. (2015). On tasty colours and colourful tastes? Assessing, explaining, and utilizing crossmodal correspondences between colours and basic tastes. Flavour , 4 , 23.

Stein, B. E. (Ed.) (2012). The new handbook of multisensory processing . Cambridge: MIT Press.

Stein, B. E., & Meredith, M. A. (1993). The merging of the senses . Cambridge: MIT Press.

Steinwald, M., Harding, M. A., & Piacentini, R. V. (2014). Multisensory engagement with real nature relevant to real life. In N. Levent, & A. Pascual-Leone (Eds.), The multisensory museum: Cross-disciplinary perspectives on touch, sound, smell, memory and space (pp. 45–60). Plymouth: Rowman & Littlefield.

Sterken, S. (2007). Music as an art of space: Interactions between music and architecture in the work of Iannis Xenakis. In M. W. Muecke, & M. S. Zach (Eds.), Resonance: Essays on the intersection of music and architecture , (pp. 21–51). Ames: Culicidae Press.

Stokes, A. (1978). Smooth and rough. In The critical writings of Adrian Stokes , (vol. 2, pp. 213–256). London: Thames & Hudson.

Sunaga, T., Park, J., & Spence, C. (2016). Effects of lightness-location consumers’ purchase decision-making. Psychology & Marketing , 33 , 934–950.

Talsma, D. (2015). Predictive coding and multisensory integration: An attentional account of the multisensory mind. Frontiers in Integrative Neuroscience , 9 , 19.

Tanizaki, J. (2001). In praise of shadows (Trans. By T. J. Harper & E. G. Seidenstickker) . London: Vintage Books.

Terman, M. (1989). On the question of mechanism in phototherapy for seasonal affective disorder: Considerations of clinical efficacy and epidemiology. In N. E. Rosenthal, & M. C. Blehar (Eds.), Seasonal affective disorders and phototherapy , (pp. 357–376). New York: Guilford.

Tetsuro, Y. (1955). The Japanese house and garden . New York: Frederick Praeger.

Thömmes, K., & Hübner, R. (2018). Instagram likes for architectural photos can be predicted by quantitative balance measures and curvature. Frontiers in Psychology: Perception Science , 9 , 1050. https://doi.org/10.3389/fpsyg.2018.01050 .

Thompson, E. (1999). Listening to/for modernity: Architectural acoustics and the development of modern spaces in America. In P. Galison, & E. Thompson (Eds.), The architecture of science , (pp. 253–280). Cambridge: MIT Press.

Tonetto, L., Klanovicz, C. P., & Spence, C. (2014). Modifying action sounds influences people’s emotional responses and bodily sensations. i-Perception , 5 , 153–163.

Torrico, D. D., Han, Y. Sharma, C. Fuentes, S., Gonzalez Viejo, C., & Dunshea, F. R. (2020). Effects of context and virtual reality environments on the wine tasting experience, acceptability, and emotional responses of consumers. Foods, 9:191; https://doi.org/10.3390/foods9020191 .

Article   PubMed Central   Google Scholar  

Treasure, J. (2007). Sound business . Cirencester: Management Books 2000 Ltd.

Treib, M. (1995). Must landscape mean? Approaches to significance in recent landscape architecture. Landscape Journal , 14 (1), 47–62.

Trivedi, B. (2006). Recruiting smell for the hard sell. New Scientist , 2582 , 36–39.

Tsushima, Y., Okada, S., Kawai, Y., Sumita, A., Ando, H., & Miki, M. (2020). Effect of illumination on perceived temperature. PLoS One,  15(8): e0236321.

Tuan, Y. F. (1977). Space and place: the perspective of experience . Minneapolis: University of Minnesota Press.

UN-Habitat (2010). State of the world’s cities 2010/2011: Bridging the urban divide. http://www.unhabitat.org/documents/SOWC10/R7.pdf .

United Nations Department of Economic and Social Affairs (2018). 68% of the world population projected to live in urban areas by 2050, says UN. May 16th. https://www.un.org/development/desa/en/news/population/2018-revision-of-world-urbanization-prospects.html .

Van Renterghem, T., & Botteldooren, D. (2016). View on outdoor vegetation reduces noise annoyance for dwellers near busy roads. Landscape and Urban Planning , 148 , 203–215.

Varga, B. A. (1996). Conversations with Iannis Xenakis . London: Faber and Faber.

Vartanian, O., Navarrete, G., Chatterjee, A., Fich, L. B., Gonzalez-Mora, J. L., Leder, H., et al. (2015). Architectural design and the brain: Effects of ceiling height and perceived enclosure on beauty judgments and approach-avoidance decisions. Journal of Environmental Psychology , 41 , 10–18.

Vartanian, O., Navarrete, G., Chatterjee, A., Fich, L. B., Leder, H., Modroño, C., et al. (2013). Impact of contour on aesthetic judgments and approach-avoidance decisions in architecture. Proceedings of the National Academy of Sciences of the USA , 110 (Supple 2), 10446–10453.

Velux YouGov Report (2018). The indoor generation: effects of modern indoor living on health, wellbeing and productivity. www.velux.nn/indoorgeneration .

von Castell, C., Hecht, H., & Oberfeld, D. (2018). Bright paint makes interior-space surfaces appear farther away. PLoS ONE, 13(9):e0201976. https://doi.org/10.1371/journal.pone.0201976 .

Vorreiter, G. (1989). Theatre of touch. The Architectural Review, 185, 66–69.

Wagner, M. (1989). Theater of touch. Interiors, 149, 98–99.

Wagner, K. (2018). How restaurants got so loud . Atlantic Monthly November 27th. https://www.theatlantic.com/technology/archive/2018/11/how-restaurants-got-so-loud/576715/ .

Walker, M. (2018). Why we sleep . London: Penguin.

Wang, Q. J., & Spence, C. (2019). Drinking through rosé-coloured glasses: Influence of wine colour on the perception of aroma and flavour in wine experts and novices. Food Research International , 126 , 108678.

Ward, J. (2014). Multisensory memories. In N. Levent, & A. Pascual-Leone (Eds.), The multisensory museum: Cross-disciplinary perspectives on touch, sound, smell, memory and space , (pp. 273–284). Plymouth: Rowman & Littlefield.

Wargocki, P. (2001). Measurements of the effects of air quality on sensory perception. Chemical Senses , 26 , 345–348.

Wargocki, P., Wyon, D. P., Baik, Y. K., Clausen, G., & Fanger, P. O. (1999). Perceived air quality, Sick Building Syndrome (SBS) symptoms and productivity in an office with two different pollution loads. Indoor Air , 9 , 165–179.

Wargocki, P., Wyon, D. P., Sundell, J., Clausen, G., & Fanger, P. O. (2000). The effects of outdoor air supply rate in an office on perceived air quality, sick building syndrome (SBS) symptoms and productivity. Indoor Air , 10 , 222–236.

Wastiels, L., Schifferstein, H. N. J., Wouters, I., & Heylighen, A. (2013). Touching materials visually: About the dominance of vision in building material assessment. International Journal of Design , 7 , 31–41.

Waterman Jr., C. N. (1917). Hand-tongue space perception. Journal of Experimental Psychology , 2 , 289–294.

Weber, S. T., & Heuberger, E. (2008). The impact of natural odors on affective states in humans. Chemical Senses , 33 , 441–447.

Weichenberger, M., Bauer, M., Kühler, R., Hensel, J., Forlim, C. G., Ihlenfeld, A., et al. (2017). Altered cortical and subcortical connectivity due to infrasound administered near the hearing threshold – Evidence from fMRI. PLoS One , 12 (4), e0174420.

Whipple, T. (2019). Why we like our homes to be as warm as Africa , (p. 13). The Times, March 20th.

Wilkins, A. J. (2017). The scientific reason you don’t like LED bulbs—And the simple way to fix them . Scientific American August 1st. https://www.scientificamerican.com/article/the-scientific-reason-you-dont-like-led-bulbs-mdash-and-the-simple-way-to-fix-them/ .

Wilkins, A. J., Nimmo-Smith, I., Slater, I. A., & Bedocs, L. (1989). Fluorescent lighting, headaches and eyestrain. Lighting Research and Technology , 21 , 11–18.

Williams, A. R. (1980). The urban stage: A reflection of architecture and urban design . San Franciso: San Francisco Center for Architecture and Urban Studies.

Williams, F. (2017). The nature fix: Why nature makes us happier, healthier, and more creative . London: W. W. Norton & Company.

Winkielman, P., Schwarz, N., Fazendeiro, T., & Reber, R. (2003). The hedonic marking of processing fluency: Implications for evaluative judgment. In J. Musch, & K. C. Klauer (Eds.), The psychology of evaluation: Affective processes in cognition and emotion , (pp. 189–217). Mahwah: Erlbaum.

Winkielman, P., Ziembowicz, M., & Nowak, A. (2015). The coherent and fluent mind: How unified consciousness is constructed from cross-modal inputs via integrated processing experiences. Frontiers in Psychology , 6 , 83.

Winzen, J., Albers, F., & Marggraf-Micheel, C. (2014). The influence of coloured light in the aircraft cabin on passenger thermal comfort. Lighting Research Technology , 46 , 465–475.

Woods, J. E. (1989). Cost avoidance and productivity in owning and operating buildings. In J. E. Cone & M. J. Hodgson (Eds.), Problem-buildings: Building-associated illness and the sick building syndrome. Occupational Medicine: State of the Art Reviews , 4 , 753–770.

Xu, A. J., & Labroo, A. A. (2014). Incandescent affect: Turning on the hot emotional system with bright light. Journal of Consumer Psychology , 24 , 207–216.

Yost, M. (2007). Close to the edge . Wall Street Journal April 10th.

Zardini, M. (Ed.) (2005). Sense of the city: An alternate approach to urbanism: The Canadian Centre for Architecture . Montreal: Lars Müller Publishers.

Zimmerman, M. (1989). The nervous system in the context of information theory. In R. F. Schmidt, & G. Thews (Eds.), Human physiology (2nd. complete ed.) , (pp. 166–173). Berlin: Springer-Verlag.

Download references

Acknowledgements

Completion of this review was supported by AHRC “Rethinking the Senses” Grant AH/L007053/1.

Author information

Authors and affiliations.

Department of Experimental Psychology, Crossmodal Research Laboratory, University of Oxford, Anna Watts Building, Oxford, OX2 6GG, UK

Charles Spence

Contributions

The author wrote all parts of this manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Charles Spence.

Ethics declarations

Ethics approval and consent to participate, consent for publication.

The author confirms that he has consent to publish this work.

Competing interests

There are no competing interests to declare.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Spence, C. Senses of place: architectural design for the multisensory mind. Cogn. Research 5, 46 (2020). https://doi.org/10.1186/s41235-020-00243-4

Received: 01 May 2020

Accepted: 05 August 2020

Published: 18 September 2020

DOI: https://doi.org/10.1186/s41235-020-00243-4

  • Multisensory perception
  • Architecture
  • Crossmodal correspondences

Architectural Research: The Latest Architecture and News

Creating Architecture in an Uncivil Time: In Conversation with Ali Karimi of Civil Architecture

The Arabian Peninsula represents one of the world’s leading exporters and users of fossil fuels, an economic reality that influences the area’s visions for the future and, implicitly, the role of architecture and urban planning in these scenarios. A number of emerging offices are, however, countering these narratives, turning to contextual research to reframe the area’s production of architecture. Among these, cultural practice Civil Architecture has become recognized for its provocative works that explore alternative narratives for the identity of the Middle East. While in Bahrain, ArchDaily's Christele Harrouk had the chance to sit down with Ali Karimi, who, together with Hamed Bukhamseen, co-founded Civil Architecture. In the video interview, they discuss the practice of architecture in the Gulf region and the narrative and research-focused approach of the office.

The Loeb Fellowship at Harvard GSD Announces the Selection for the Class of 2025

The Harvard Graduate School of Design (Harvard GSD) has announced the Class of 2025 Loeb Fellows. Ten practitioners and activists from around the world have been selected to join the Loeb Fellowship program to expand their careers and advance their programs and initiatives focused on equity, resilience, and collective action.

The ten selected practitioners are mid-career professionals coming from diverse backgrounds. Each one has been recognized for initiating practices that are transforming public spaces and urban infrastructures, addressing public health concerns and environmental injustices, as well as housing needs and efforts to preserve the cultural, natural, and architectural heritage of diverse regions from all continents.

Jingru (Cyan) Cheng Wins 2023 Wheelwright Prize for her Study on the Impact of Sand on the Environment and Communities

Harvard University Graduate School of Design (GSD) has announced Jingru (Cyan) Cheng as the recipient of the 2023 Wheelwright Prize, a study grant created to support globally-minded research and investigative approaches to contemporary architecture. The winning research project, titled “Tracing Sand: Phantom Territories, Bodies Adrift,” delves into the multifaceted impacts of sand mining and reclamation, understood from cultural, economic, and ecological perspectives. The unassuming material has become an indispensable element for our built environment and human communities, serving as a vital component in the production of glass, concrete, asphalt roads, and artificial land. Yet the process of dredging underwater systems and sand mining leads to the disruption of habitats in a process that simultaneously shapes one habitat while devastating another.

Design for Inclusivity at the UIA World Congress of Architects 2023

The UIA World Congress of Architects 2023 is an invitation for architects from around the world to meet in Copenhagen July 2 – 6 to explore and communicate how architecture influences all 17 UN Sustainable Development Goals (SDGs). For more than two years, the Science Track and its international Scientific Committee have been analyzing the various ways in which architecture responds to the SDGs. The work has resulted in the formulation of six science panels: design for Climate Adaptation, design for Rethinking Resources, design for Resilient Communities, design for Health, design for Inclusivity, and design for Partnerships for Change. An international call for papers was sent out in 2022 and 296 of more than 750 submissions from 77 countries have been invited to present at the UIA World Congress of Architects 2023 in Copenhagen. ArchDaily is collaborating with the UIA to share articles pertaining to the six themes to prepare for the opening of the Congress.

In this fifth feature, we met with the co-chairs of design for Inclusivity: architect Magda Mostafa, Professor of Design, Department of Architecture, the American University in Cairo, and architect Ruth Baumeister, Associate Professor of Theory and History, Aarhus School of Architecture.

Design for Health at the UIA World Congress of Architects 2023

In this fourth feature, we met with the co-chairs of design for Health: architect Arif Hasan, former Visiting Professor at NED University Karachi and member of the UN's Advisory Group on Forced Evictions, and architect Christian Benimana, Senior Principal and Co-Executive Director at MASS Design Group.

Design for Resilient Communities at the UIA World Congress of Architects 2023

In this third feature, we met with the co-chairs of Design for Resilient Communities: Anna Rubbo, Senior Researcher, Center for Sustainable Urban Development (CSUD), The Climate School, Columbia University, and Juan Du, Professor and Dean of the John H. Daniels Faculty of Architecture, Landscape and Design, University of Toronto.

Rethinking Resources at the UIA World Congress of Architects 2023

The UIA World Congress of Architects 2023 is an invitation for architects from around the world to meet in Copenhagen July 2 – 6 to explore and communicate how architecture influences all 17 UN Sustainable Development Goals (SDGs). For more than two years, the Science Track and its international Scientific Committee have been analyzing the various ways in which architecture responds to the SDGs. The work has resulted in the formulation of six science panels: design for Climate Adaptation, design for Rethinking Resources, design for Resilient Communities, design for Health, design for Inclusivity, and design for Partnerships for Change. An international call for papers was sent out in 2022 and 296 of more than 750 submissions have been invited to present at the UIA World Congress of Architects 2023 in Copenhagen. ArchDaily is collaborating with the UIA to share articles pertaining to the six themes to prepare for the opening of the Congress.

In this first feature, we met with the Head of the Scientific Committee, Mette Ramsgaard Thomsen, Professor and Head of CITA (Centre for IT and Architecture), Royal Danish Academy of Fine Arts, School of Architecture, Design and Conservation, who is also co-chairing the panel design for Rethinking Resources with Carlo Ratti, Professor and Director of the Senseable Lab, MIT, and Founding Partner of CRA-Carlo Ratti Associati.

The UIA World Congress of Architects 2023 Copenhagen Science Track Announces the 6 Themes of Its Agenda

The UIA World Congress of Architects 2023 is an invitation for architects from all around the world to meet in Copenhagen to explore and communicate how architecture influences all 17 UN Sustainable Development Goals (SDGs). The Science Track of the UIA World Congress has been tasked with the development of the agenda, Sustainable Futures – Leave No One Behind. For more than two years, its international Scientific Committee has been analyzing the various ways in which architecture responds to the SDGs. The work has resulted in the formulation of six themes: climate adaptation, rethinking resources, resilient communities, health, inclusivity, and partnerships for change. ArchDaily is collaborating with UIA to share articles pertaining to the six themes to prepare for the opening of the Congress on July 2, 2023.

“Everyone Belongs to Everyone Else”: The Italian Pavilion at the 2023 Venice Biennale is Curated by Fosbury Architecture

The project for the Italian Pavilion at the 18th International Architecture Exhibition – La Biennale di Venezia will be curated by Fosbury Architecture, a collective composed of Giacomo Ardesio, Alessandro Bonizzoni, Nicola Campri, Veronica Caprino, and Claudia Mainardi. Fosbury Architecture’s vision for the exhibition is based on a research practice that sees design as the result of collective and collaborative work. From January to April, leading up to the opening of the Biennale, nine site-specific interventions titled “Spaziale presenta” are set to activate different locations across Italy.

"With Intention to Build", Moshe Safdie’s Exhibition of Unbuilt Projects Opens in Boston, USA

From October 2022 through January 2, 2023, The Boston Architectural College (BAC) and Safdie Architects will display the most groundbreaking unbuilt projects by Moshe Safdie. With Intention to Build showcases the architect's creative process throughout the 55 years of his career, including models, drawings, and various texts and photographs. The exhibition provides context and tells the story behind these radical unrealized designs that have influenced projects such as Habitat 67 in Montreal, Canada, and Marina Bay Sands in Singapore.

The 3rd Lilly Reich Grant for Equality in Architecture is Awarded to a Research Project Celebrating Anna Bofill Levi

Fundació Mies van der Rohe has announced that a research project focused on Anna Bofill Levi has been awarded the third Lilly Reich Grant for Equality in Architecture. The project, titled “La arquitectura como contracanto: 1977-1996”, was initiated by architects Ma Elia Gutiérrez Mozo, José Parra Martínez, Ana Gilsanz Díaz, and Joaquín Arnau Amo. The research contextualizes the architectural works of pianist, architect, and composer Anna Bofill Levi and brings into focus the result of her multidisciplinary approach, intertwining practices and research in design, architecture, and music.

A City of Rooms: An Analysis of Shared Housing and Domestic Living

"A city of rooms" is a research work by architect Paula Olea Fonti that focuses on the study of shared housing, which is one of the most common ways for young students and professionals to live in the city. A popular and ordinary house, if you will. One that many architects would distinguish for its low architectural value.

The Architect-Researcher: Exploring New Possibilities for the Production of Architecture

While research seems intrinsic to the design process, architectural research is a professional path in itself, whose purpose is to highlight scientific evidence and explore alternatives outside of pre-established norms or empirical considerations. Its purpose is to create a framework of knowledge that can inform the design to reach objectively better outcomes. The following discusses the role and state of research in architecture, some prominent areas of inquiry, and the architects or institutions that dedicate their work to these subjects.

RIBA Announces Winners of 2021 President’s Medal and Awards for Research

The Royal Institute of British Architects (RIBA) has announced the winners of the 2021 President's Medal and Awards for Research, highlighting the best research concerning architecture and the built environment. The President's Medal was awarded to John Lin and Sony Devabhaktuni from the University of Hong Kong for their research project As Found Houses, which explores vernacular practices in rural China. Two more awards were granted to the development of an ethics guide for architectural practitioners and a study of thin-tile vaulting in Cuba.

Between Arches, Architecture of Connection: An Alternative View of Barcelona

The gaze is a tool that the architect uses constantly but does not fully value. It is an instrument that, in addition to allowing us to know and recognize our reality and the phenomena that arise from it, can work as a method of analysis. "Entrearcos (Between-arches): architecture of connection" is a research project developed by the architect Daniela Silva Landeros that studies, in the specific case of the Ciutat Vella neighbourhood of the city of Barcelona, the issue of arches in our cities. And Silva Landeros does so from alternative points of view that call into question the way we are used to looking.

Systematica Releases First Assessment on Milan Public Realm, Green Areas and Gathering Places

Systematica has just released a case study on access to green areas and the public realm in the city of Milan. Focusing on the availability of these gathering spaces for residents, the research, particularly relevant during the pandemic, also highlights open, uncrowded public spaces suited to a safe social life.

2020 AIANY | Center for Architecture Arnold W. Brunner Grant for Mid-Career Architects

CALL FOR ENTRIES Arnold W. Brunner Grant $15,000 Deadline: Monday, February 3rd, 2020 5 pm (EST)

The Center for Architecture is now accepting applications for the 2020 Arnold W. Brunner Grant. This grant is awarded to mid-career architects for advanced study in any area of architectural investigation that will contribute to the knowledge, teaching, or practice of the art and science of architecture. The proposed investigation is to result in a publicly available written work, design project, research paper, or other form of presentation to be offered at the Center for Architecture. Previous topics of research have ranged from the impact of American

AD Interviews: Kim Nielsen of 3XN

During the World Architecture Festival 2018, which was held in Amsterdam again this year, we had the chance to sit down with Kim Nielsen, one of the founders of Denmark-based firm 3XN.

Research on Zhe-Style Dwelling Houses from the Perspective of Architecture Culture—Taking the Traditional Village in Xianju County as an Example

  • Conference paper
  • First Online: 18 September 2024

  • Gaochuan Zhang 47 , 48 , 49 ,
  • Jiajia Lv 47 ,
  • Qiaoyuan Lin 47 &
  • Min Fang 47  

Part of the book series: Advances in Science, Technology & Innovation ((ASTI))

Included in the following conference series:

  • International conference on Climate Change and Environmental Sustainability

Rural architecture serves as a significant carrier in the revitalization of rural culture and, together with the local natural scenery and customs, forms the distinctive rural landscape. This paper explores and extracts nine distinctive cultural elements of Zhe-style dwelling houses, including house layout, facade composition, gable form, street scale, wind and rain corridors, roof design, building materials, overall color scheme, and decorative carvings, through the identification of landscape genes using four modes. Based on this, an exploration of Zhe-style vernacular dwelling construction in Wengsen Street, Shuangmiao Township, Xianju County, in the central Zhejiang region is conducted, leading to the discovery of a new traditionalist design approach that combines theoretical application and Zhe-style cultural practices. In addition, promoting the integration of Zhe-style residential buildings with the natural environment helps reduce environmental damage and improve the adaptability and sustainability of residential buildings.

Acknowledgements

This research was supported by Zhejiang University of Science & Technology (NO. F701104K02; 0,101,104,502).

Author information

Authors and affiliations.

School of Civil Engineering and Architecture, Zhejiang University of Science and Technology, Hangzhou, 310023, China

Gaochuan Zhang, Jiajia Lv, Qiaoyuan Lin & Min Fang

Zhejiang Southeast Architectural Design Group CO. LTD, Hangzhou, 310023, China

Gaochuan Zhang

Zhejiang-Singapore Joint Laboratory for Urban Renewal and Future City, Hangzhou, 310023, China

Corresponding author

Correspondence to Qiaoyuan Lin .

Editor information

Editors and affiliations.

School of Architecture and Urban Planning, Chongqing University, Chongqing, China

Institute for the Advanced Study of Sustainability (UNU-IAS), Tokyo, Japan

Joni Jupesta

Faculty of Economics, University of Gdansk, Gdańsk, Poland

Giuseppe T. Cirella

School of Built Environment, University of New South Wales, Sydney, NSW, Australia

Gloria Pignatta

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Zhang, G., Lv, J., Lin, Q., Fang, M. (2024). Research on Zhe-Style Dwelling Houses from the Perspective of Architecture Culture—Taking the Traditional Village in Xianju County as an Example. In: He, B., Jupesta, J., Cirella, G.T., Pignatta, G. (eds) Urban Climate Change Adaptation. CCES CCES 2022 2023. Advances in Science, Technology & Innovation. Springer, Cham. https://doi.org/10.1007/978-3-031-65088-8_20

DOI: https://doi.org/10.1007/978-3-031-65088-8_20

Published: 18 September 2024

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-65087-1

Online ISBN: 978-3-031-65088-8

eBook Packages: Political Science and International Studies (R0)

  • Survey Paper
  • Open access
  • Published: 31 March 2021

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

  • Laith Alzubaidi   ORCID: orcid.org/0000-0002-7296-5413 1 , 5 ,
  • Jinglan Zhang 1 ,
  • Amjad J. Humaidi 2 ,
  • Ayad Al-Dujaili 3 ,
  • Ye Duan 4 ,
  • Omran Al-Shamma 5 ,
  • J. Santamaría 6 ,
  • Mohammed A. Fadhel 7 ,
  • Muthana Al-Amidie 4 &
  • Laith Farhan 8  

Journal of Big Data, volume 8, Article number: 53 (2021)

In the last few years, the deep learning (DL) computing paradigm has been deemed the gold standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, achieving outstanding results on several complex cognitive tasks, matching or even beating human performance. One of the benefits of DL is the ability to learn from massive amounts of data. The DL field has grown fast in the last few years and has been used extensively to address a wide range of traditional applications. More importantly, DL has outperformed well-known ML techniques in many domains, e.g., cybersecurity, natural language processing, bioinformatics, robotics and control, and medical information processing, among many others. Although several works have reviewed the state of the art in DL, each of them tackled only one aspect of DL, which leads to an overall lack of knowledge about it. Therefore, in this contribution, we propose a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including those enhancements recently added to the field. In particular, this paper outlines the importance of DL and presents the types of DL techniques and networks. It then presents convolutional neural networks (CNNs), the most utilized DL network type, and describes the development of CNN architectures together with their main features, starting with the AlexNet network and closing with the High-Resolution network (HR.Net). Finally, we present the challenges and suggested solutions to help researchers understand the existing research gaps, followed by a list of the major DL applications. Computational tools including FPGA, GPU, and CPU are summarized along with a description of their influence on DL. The paper ends with the evaluation metrics, benchmark datasets, and summary and conclusion.

Introduction

Recently, machine learning (ML) has become very widespread in research and has been incorporated in a variety of applications, including text mining, spam detection, video recommendation, image classification, and multimedia concept retrieval [1, 2, 3, 4, 5, 6]. Among the different ML algorithms, deep learning (DL) is very commonly employed in these applications [7, 8, 9]. Another name for DL is representation learning (RL). The continuing appearance of novel studies in the fields of deep and distributed learning is due to both the unpredictable growth in the ability to obtain data and the amazing progress made in hardware technologies, e.g., High Performance Computing (HPC) [10].

DL is derived from the conventional neural network but considerably outperforms its predecessors. Moreover, DL employs transformations and graph technologies simultaneously in order to build up multi-layer learning models. The most recently developed DL techniques have achieved outstanding performance across a variety of applications, including audio and speech processing, visual data processing, and natural language processing (NLP), among others [11, 12, 13, 14].

Usually, the effectiveness of an ML algorithm is highly dependent on the integrity of the input-data representation. It has been shown that a suitable data representation yields better performance than a poor one. Thus, a significant research trend in ML for many years has been feature engineering, which has informed numerous research studies. This approach aims at constructing features from raw data. In addition, it is extremely field-specific and frequently requires sizable human effort. For instance, several types of features were introduced and compared in the computer vision context, such as histogram of oriented gradients (HOG) [15], scale-invariant feature transform (SIFT) [16], and bag of words (BoW) [17]. As soon as a novel feature is introduced and found to perform well, it becomes a new research direction that is pursued over multiple decades.
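To make the idea of a hand-crafted feature concrete, the following is a minimal sketch of a gradient-orientation histogram in plain Python. It is only loosely in the spirit of HOG: it omits the cells, blocks, and normalization of the real descriptor, and the function name `orientation_histogram` and the toy image are assumptions made for illustration, not anything taken from the reviewed papers.

```python
import math

def orientation_histogram(image, bins=4):
    """Histogram of gradient orientations, weighted by gradient magnitude.

    A simplified, HOG-like descriptor (no cells, blocks, or normalization).
    `image` is a list of equal-length rows of numbers.
    """
    hist = [0.0] * bins
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    return hist

# Toy image with a vertical edge: intensity changes only along the horizontal axis.
vertical_edge = [[0, 0, 10, 10]] * 4
features = orientation_histogram(vertical_edge)
```

Every design decision here (central differences, unsigned angles, four bins) was made by hand, which is the kind of field-specific effort the paragraph above attributes to feature engineering.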

By contrast, DL algorithms perform feature extraction automatically. This encourages researchers to extract discriminative features using the smallest possible amount of human effort and field knowledge [18]. These algorithms have a multi-layer data representation architecture, in which the first layers extract the low-level features while the last layers extract the high-level features. Note that artificial intelligence (AI) originally inspired this type of architecture, which simulates the process that occurs in core sensorial regions within the human brain. Given different scenes, the human brain can automatically extract a data representation: the received scene information is the input, and the classified objects are the output. DL simulates this working methodology of the human brain, which underlines its main benefit.

In the field of ML, DL is currently one of the most prominent research trends owing to its considerable success. This paper presents an overview of DL from various perspectives, such as the main concepts, architectures, challenges, applications, computational tools, and evaluation metrics. The convolutional neural network (CNN) is one of the most popular and widely used DL networks [19, 20], and it is a major reason for the current popularity of DL. The main advantage of CNN compared to its predecessors is that it automatically detects the significant features without any human supervision, which has made it the most used. Therefore, we examine CNN in depth by presenting its main components. Furthermore, we describe in detail the most common CNN architectures, starting with the AlexNet network and ending with the High-Resolution network (HR.Net).
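As a rough illustration of the basic operation inside a CNN layer, the sketch below slides a small kernel over an image and applies a ReLU. It assumes no padding and a stride of 1, and the kernel weights are set by hand purely for illustration; in a trained CNN they would be learned from data, which is what removes the need for human-designed features. The names `conv2d_valid`, `relu`, and `edge_kernel` are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    This is the cross-correlation that most DL frameworks call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_rows = len(image) - kh + 1
    out_cols = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

def relu(feature_map):
    """Element-wise rectified linear unit: negative responses become zero."""
    return [[max(0, v) for v in row] for row in feature_map]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-set kernel that responds to dark-to-bright vertical edges (illustrative only).
edge_kernel = [[-1, 1], [-1, 1]]
feature_map = relu(conv2d_valid(image, edge_kernel))
```

The resulting feature map is non-zero only in the column where the edge sits, which is the sense in which early layers are said to extract low-level features.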

Several DL review papers have been published in the last few years. However, each of them addresses only one side, focusing on a single application or topic, such as the review of CNN architectures [21], DL for classification of plant diseases [22], DL for object detection [23], and DL applications in medical image analysis [24]. Although these reviews cover good topics, they do not provide a full understanding of DL topics such as concepts, detailed research gaps, computational tools, and DL applications. It is necessary first to understand the aspects of DL, including concepts, challenges, and applications, and only then to go deep into the applications. Achieving that requires extensive time and a large number of research papers. Therefore, we propose a deep review of DL to provide a more suitable starting point from which to develop a full understanding of DL from one review paper. The motivation behind our review was to cover the most important aspects of DL, including open challenges, applications, and computational tools. Furthermore, our review can be the first step towards other DL topics.

The main aim of this review is to present the most important aspects of DL so that researchers and students can easily gain a clear picture of DL from a single review paper. This review will further advance DL research by helping people discover more about recent developments in the field. It should help researchers decide on the most suitable direction of work in order to provide more accurate alternatives to the field. Our contributions are outlined as follows:

This is the first review that provides a nearly comprehensive survey of the most important aspects of deep learning. It helps researchers and students gain a good understanding from one paper.

We explain CNN, the most popular deep learning algorithm, in depth by describing its concepts, theory, and state-of-the-art architectures.

We review current challenges (limitations) of deep learning, including lack of training data, imbalanced data, interpretability of data, uncertainty scaling, catastrophic forgetting, model compression, overfitting, the vanishing gradient problem, the exploding gradient problem, and underspecification. We additionally discuss the proposed solutions tackling these issues.

We provide an exhaustive list of medical imaging applications of deep learning, categorized by task, starting with classification and ending with registration.

We discuss the computational approaches (CPU, GPU, FPGA) by comparing the influence of each tool on deep learning algorithms.

The rest of the paper is organized as follows: the “Survey methodology” section describes the survey methodology. The “Background” section presents the background. The “Classification of DL approaches” section defines the classification of DL approaches. The “Types of DL networks” section presents the types of DL networks. The “CNN architectures” section describes CNN architectures. The “Challenges (limitations) of deep learning and alternate solutions” section details the challenges of DL and alternate solutions. The “Applications of deep learning” section outlines the applications of DL. The “Computational approaches” section explains the influence of computational approaches (CPU, GPU, FPGA) on DL. The “Evaluation metrics” section presents the evaluation metrics. The “Frameworks and datasets” section lists frameworks and datasets. The “Summary and conclusion” section presents the summary and conclusion.

Survey methodology

We have reviewed the significant research papers in the field published during 2010–2020, mainly from the years 2020 and 2019, with some papers from 2021. The main focus was papers from the most reputed publishers such as IEEE, Elsevier, MDPI, Nature, ACM, and Springer. Some papers have been selected from ArXiv. We have reviewed more than 300 papers on various DL topics. There are 108 papers from the year 2020, 76 papers from the year 2019, and 48 papers from the year 2018. This indicates that this review focused on the latest publications in the field of DL. The selected papers were analyzed and reviewed to (1) list and define the DL approaches and network types, (2) list and explain CNN architectures, (3) present the challenges of DL and suggest the alternate solutions, (4) assess the applications of DL, and (5) assess computational approaches. The main keywords used as search criteria for this review paper are (“Deep Learning”), (“Machine Learning”), (“Convolution Neural Network”), (“Deep Learning” AND “Architectures”), ((“Deep Learning”) AND (“Image”) AND (“detection” OR “classification” OR “segmentation” OR “Localization”)), (“Deep Learning” AND “detection” OR “classification” OR “segmentation” OR “Localization”), (“Deep Learning” AND “CPU” OR “GPU” OR “FPGA”), (“Deep Learning” AND “Transfer Learning”), (“Deep Learning” AND “Imbalanced Data”), (“Deep Learning” AND “Interpretability of data”), (“Deep Learning” AND “Overfitting”), (“Deep Learning” AND “Underspecification”). Figure 1 shows the search structure of the survey paper. Table 1 presents the details of some of the journals that have been cited in this review paper.

Figure 1. Search framework

Background

This section presents a background of DL. We begin with a quick introduction to DL, followed by the difference between DL and ML. We then show the situations that require DL. Finally, we present the reasons for applying DL.

DL, a subset of ML (Fig. 2), is inspired by the information processing patterns found in the human brain. DL does not require any human-designed rules to operate; rather, it uses a large amount of data to map the given input to specific labels. DL is designed using numerous layers of algorithms (artificial neural networks, or ANNs), each of which provides a different interpretation of the data that has been fed to them [18, 25].

Figure 2. Deep learning family

Achieving the classification task using conventional ML techniques requires several sequential steps, specifically pre-processing, feature extraction, careful feature selection, learning, and classification. Furthermore, feature selection has a great impact on the performance of ML techniques. Biased feature selection may lead to incorrect discrimination between classes. By contrast, DL can automate the learning of feature sets for several tasks, unlike conventional ML methods [18, 26]. DL enables learning and classification to be achieved in a single shot (Fig. 3). DL has become an incredibly popular type of ML algorithm in recent years due to the huge growth and evolution of the field of big data [27, 28]. It is still in continuous development, achieving novel performance on several ML tasks [22, 29, 30, 31], and has facilitated progress in many learning fields [32, 33], such as image super-resolution [34], object detection [35, 36], and image recognition [30, 37]. Recently, DL performance has come to exceed human performance on tasks such as image classification (Fig. 4).

Figure 3. The difference between deep learning and traditional machine learning

Figure 4. Deep learning performance compared to human

Nearly all scientific fields have felt the impact of this technology. Most industries and businesses have already been disrupted and transformed through the use of DL. The leading technology and economy-focused companies around the world are in a race to improve DL. Even now, human-level performance and capability cannot exceed the performance of DL in many areas, such as predicting the time taken to make car deliveries, deciding whether to approve loan requests, and predicting movie ratings [38]. The winners of the 2019 “Nobel Prize” in computing, also known as the Turing Award, were three pioneers in the field of DL (Yann LeCun, Geoffrey Hinton, and Yoshua Bengio) [39]. Although a large number of goals have been achieved, there is further progress to be made in the DL context. In fact, DL has the ability to enhance human lives by providing additional accuracy in tasks such as estimating natural disasters [40], discovering new drugs [41], and diagnosing cancer [42, 43, 44]. Esteva et al. [45] found that a DL network, trained using 129,450 images of 2032 diseases, has the same ability to diagnose disease as twenty-one board-certified dermatologists. Furthermore, in grading prostate cancer, US board-certified general pathologists achieved an average accuracy of 61%, while the Google AI [44] outperformed these specialists by achieving an average accuracy of 70%. In 2020, DL is playing an increasingly vital role in early diagnosis of the novel coronavirus (COVID-19) [29, 46, 47, 48]. DL has become the main tool in many hospitals around the world for automatic COVID-19 classification and detection using chest X-ray images or other types of images. We end this section with a quote from AI pioneer Geoffrey Hinton: “Deep learning is going to be able to do everything”.

When to apply deep learning

Machine intelligence is useful in many situations and equals or outperforms human experts in some cases [ 49 , 50 , 51 , 52 ], meaning that DL can be a solution to the following problems:

Cases where human experts are not available.

Cases where humans are unable to explain decisions made using their expertise (language understanding, medical decisions, and speech recognition).

Cases where the problem solution updates over time (price prediction, stock preference, weather prediction, and tracking).

Cases where solutions require adaptation based on specific cases (personalization, biometrics).

Cases where the size of the problem is extremely large and exceeds our limited reasoning abilities (sentiment analysis, matching ads on Facebook, calculating webpage ranks).

Why deep learning?

Several performance features may answer this question, e.g.:

Universal Learning Approach: Because DL has the ability to perform in approximately all application domains, it is sometimes referred to as universal learning.

Robustness: In general, precisely designed features are not required in DL techniques. Instead, the optimized features are learned in an automated fashion related to the task under consideration. Thus, robustness to the usual changes of the input data is attained.

Generalization: Different data types or different applications can use the same DL technique, an approach frequently referred to as transfer learning (TL), which is explained in a later section. Furthermore, it is a useful approach in problems where data is insufficient.

Scalability: DL is highly scalable. ResNet [ 37 ], which was invented by Microsoft, comprises 1202 layers and is frequently applied at a supercomputing scale. Lawrence Livermore National Laboratory (LLNL), a large enterprise working on evolving frameworks for networks, adopted a similar approach, where thousands of nodes can be implemented [ 53 ].

Classification of DL approaches

DL techniques are classified into three major categories: unsupervised, partially supervised (semi-supervised) and supervised. Furthermore, deep reinforcement learning (DRL), also known as RL, is another type of learning technique, which is mostly considered to fall into the category of partially supervised (and occasionally unsupervised) learning techniques.

Deep supervised learning

Deep semi-supervised learning

In this technique, the learning process is based on semi-labeled datasets. Occasionally, generative adversarial networks (GANs) and DRL are employed in the same way as this technique. In addition, RNNs, which include GRUs and LSTMs, are also employed for partially supervised learning. One of the advantages of this technique is that it minimizes the amount of labeled data needed. On the other hand, one of its disadvantages is that irrelevant input features present in the training data could lead to incorrect decisions. Text document classification is one of the most popular applications of semi-supervised learning: because a large amount of labeled text documents is difficult to obtain, semi-supervised learning is ideal for this task.

Deep unsupervised learning

This technique makes it possible to implement the learning process in the absence of available labeled data (i.e. no labels are required). Here, the agent learns the significant features or interior representation required to discover the unidentified structure or relationships in the input data. Techniques of generative networks, dimensionality reduction and clustering are frequently counted within the category of unsupervised learning. Several members of the DL family have performed well on non-linear dimensionality reduction and clustering tasks; these include restricted Boltzmann machines, auto-encoders and GANs as the most recently developed techniques. Moreover, RNNs, which include GRUs and LSTMs, have also been employed for unsupervised learning in a wide range of applications. The main disadvantages of unsupervised learning are that it is unable to provide accurate information concerning data sorting and that it is computationally complex. One of the most popular unsupervised learning approaches is clustering [ 54 ].

Deep reinforcement learning

The type of reinforcement learning selected for a task depends on the space or the scope of the problem. For example, DRL is best suited to problems involving many parameters to be optimized. By contrast, derivative-free reinforcement learning is a technique that performs well for problems with limited parameters. Some of the applications of reinforcement learning are business strategy planning and robotics for industrial automation. The main drawback of reinforcement learning is that parameters may influence the speed of learning. Here are the main motivations for utilizing reinforcement learning:

It helps to identify which action produces the highest reward over a longer period.

It helps to discover which situations require action.

It also enables the agent to figure out the best approach for reaching large rewards.

Reinforcement Learning also gives the learning agent a reward function.

Reinforcement learning cannot be used in every situation, for example:

When there is sufficient data to solve the problem with supervised learning techniques.

When computation or time is limited, since reinforcement learning is computing-heavy and time-consuming, especially when the workspace is large.

Types of DL networks

The most famous types of deep learning networks are discussed in this section: these include recursive neural networks (RvNNs), RNNs, and CNNs. RvNNs and RNNs are explained briefly, while CNNs are explained in depth because of their importance; they are also the most widely used of these networks across applications.

Recursive neural networks

RvNN can make predictions in a hierarchical structure and classify the outputs utilizing compositional vectors [ 57 ]. Recursive auto-associative memory (RAAM) [ 58 ] is the primary inspiration for the RvNN development. The RvNN architecture was developed for processing objects that have arbitrarily shaped structures, such as graphs or trees. This approach generates a fixed-width distributed representation from a variable-size recursive data structure. The network is trained using back-propagation through structure (BTS) [ 58 ]. BTS follows the same technique as the general back-propagation algorithm and is able to support a tree-like structure. Auto-association trains the network to regenerate the input-layer pattern at the output layer. RvNN is highly effective in the NLP context. Socher et al. [ 59 ] introduced an RvNN architecture designed to process inputs from a variety of modalities. These authors demonstrate two applications: classifying natural language sentences, where each sentence is split into words, and classifying natural images, where each image is separated into various segments of interest. To construct a syntactic tree, RvNN calculates a score related to the merge plausibility for every pair of units. Next, the pair with the largest score is merged into a compositional vector. Following every merge, RvNN generates (a) a larger area of numerous units, (b) a compositional vector of the area, and (c) a label for the class (for instance, a noun phrase will become the class label for the new area if two units are noun words). The compositional vector for the entire area is the root of the RvNN tree structure. An example RvNN tree is shown in Fig. 5. RvNN has been employed in several applications [ 60 , 61 , 62 ].

figure 5

An example of RvNN tree
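The greedy merging procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the architecture of Socher et al.: the composition matrix `W`, the scoring vector `score_vec`, the `tanh` nonlinearity, the restriction to adjacent pairs, and the random inputs are all hypothetical choices made for the example, and class labels are omitted.

```python
import numpy as np

def greedy_merge(units, W, score_vec):
    """Build a binary tree by repeatedly merging the highest-scoring adjacent pair.

    units     : list of d-dimensional vectors (e.g. word or segment embeddings)
    W         : (d, 2d) composition matrix (hypothetical; normally learned)
    score_vec : d-dimensional vector scoring merge plausibility (hypothetical)
    Returns (root_vector, bracketed_tree_string).
    """
    nodes = [(u, str(i)) for i, u in enumerate(units)]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes) - 1):
            left, right = nodes[i][0], nodes[i + 1][0]
            parent = np.tanh(W @ np.concatenate([left, right]))  # compositional vector
            score = float(score_vec @ parent)                    # merge plausibility
            if best is None or score > best[0]:
                best = (score, i, parent)
        _, i, parent = best
        label = f"({nodes[i][1]} {nodes[i + 1][1]})"
        nodes[i:i + 2] = [(parent, label)]  # replace the merged pair with its parent
    return nodes[0]

rng = np.random.default_rng(0)
d = 4
units = [rng.normal(size=d) for _ in range(5)]
W = rng.normal(size=(d, 2 * d)) * 0.5
score_vec = rng.normal(size=d)

root, tree = greedy_merge(units, W, score_vec)
print(tree)        # bracketing of the leaf indices 0..4
print(root.shape)  # fixed-width vector, independent of the number of leaves
```

The root vector has the same width as each leaf, which mirrors the fixed-width representation of a variable-size structure mentioned in the text.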

Recurrent neural networks

RNNs are a commonly employed and familiar algorithm in the discipline of DL [ 63 , 64 , 65 ]. RNN is mainly applied in speech processing and NLP contexts [ 66 , 67 ]. Unlike conventional networks, RNN uses sequential data in the network. Since the embedded structure in the sequence of the data delivers valuable information, this feature is fundamental to a range of different applications. For instance, it is important to understand the context of the sentence in order to determine the meaning of a specific word in it. Thus, it is possible to consider the RNN as a unit of short-term memory, where x represents the input layer, y is the output layer, and s represents the state (hidden) layer. For a given input sequence, a typical unfolded RNN diagram is illustrated in Fig. 6. Pascanu et al. [ 68 ] introduced three different types of deep RNN techniques, namely “Hidden-to-Hidden”, “Hidden-to-Output”, and “Input-to-Hidden”. Based on these three techniques, they introduced a deep RNN that lessens the learning difficulty in the deep network while bringing the benefits of a deeper RNN.

figure 6

Typical unfolded RNN diagram
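As a rough illustration of the unfolded diagram, the sketch below steps a simple RNN through a sequence using the input x, state s, and output y named in the text. The weight names `U`, `W`, and `V`, the `tanh` nonlinearity, the zero initial state, and the random values are assumptions for the example, not details taken from the source.

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0=None):
    """Unfold a simple RNN over a sequence.

    s_t = tanh(U x_t + W s_{t-1})   (state carries information from earlier inputs)
    y_t = V s_t
    """
    s = np.zeros(W.shape[0]) if s0 is None else s0
    states, outputs = [], []
    for x in xs:                      # one unfolded step per sequence element
        s = np.tanh(U @ x + W @ s)
        states.append(s)
        outputs.append(V @ s)
    return np.array(states), np.array(outputs)

rng = np.random.default_rng(0)
T, n_in, n_hidden, n_out = 5, 3, 4, 2
xs = rng.normal(size=(T, n_in))
U = rng.normal(size=(n_hidden, n_in)) * 0.5
W = rng.normal(size=(n_hidden, n_hidden)) * 0.5
V = rng.normal(size=(n_out, n_hidden)) * 0.5

states, outputs = rnn_forward(xs, U, W, V)
print(states.shape, outputs.shape)  # (5, 4) (5, 2)
```

The same three matrices are reused at every time step; only the state changes as the sequence is consumed.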

However, the RNN’s sensitivity to the exploding and vanishing gradient problems represents one of the main issues with this approach [ 69 ]. More specifically, during the training process, the repeated multiplication of many large or small derivatives may cause the gradients to explode or decay exponentially. As new inputs arrive, the network stops taking the initial ones into account; its sensitivity to them therefore decays over time. This issue can be handled using LSTM [ 70 ]. This approach offers recurrent connections to memory blocks in the network. Every memory block contains a number of memory cells, which have the ability to store the temporal states of the network. In addition, it contains gated units for controlling the flow of information. In very deep networks [ 37 ], residual connections also have the ability to considerably reduce the impact of the vanishing gradient issue, which is explained in later sections. CNN is considered to be more powerful than RNN, as RNN offers less feature compatibility than CNN.
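The exponential growth or decay of gradients can be seen with a deliberately simplified scalar model. The sketch assumes that the back-propagated gradient is multiplied by the same factor at every time step, which real networks do not do exactly; the factors 0.9 and 1.1 are arbitrary illustrative values.

```python
def gradient_scale(factor, steps):
    """Gradient magnitude after being multiplied by `factor` at each of `steps` steps."""
    return factor ** steps

for factor in (0.9, 1.0, 1.1):
    scales = [gradient_scale(factor, t) for t in (10, 50, 100)]
    print(factor, scales)
```

A factor slightly below one drives the gradient toward zero over long sequences, while a factor slightly above one makes it grow without bound.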

Convolutional neural networks

In the field of DL, the CNN is the most famous and commonly employed algorithm [ 30 , 71 , 72 , 73 , 74 , 75 ]. The main benefit of CNN compared to its predecessors is that it automatically identifies the relevant features without any human supervision [ 76 ]. CNNs have been extensively applied in a range of different fields, including computer vision [ 77 ], speech processing [ 78 ], face recognition [ 79 ], etc. The structure of CNNs was inspired by neurons in human and animal brains, similar to a conventional neural network. More specifically, in a cat’s brain, a complex sequence of cells forms the visual cortex; this sequence is simulated by the CNN [ 80 ]. Goodfellow et al. [ 28 ] identified three key benefits of the CNN: equivariant representations, sparse interactions, and parameter sharing. Unlike conventional fully connected (FC) networks, shared weights and local connections in the CNN are employed to make full use of 2D input-data structures like image signals. This operation utilizes an extremely small number of parameters, which both simplifies the training process and speeds up the network. This is the same as in the visual cortex cells. Notably, only small regions of a scene are sensed by these cells rather than the whole scene (i.e., these cells spatially extract the local correlation available in the input, like local filters over the input).

A commonly used type of CNN, which is similar to the multi-layer perceptron (MLP), consists of numerous convolution layers preceding sub-sampling (pooling) layers, while the ending layers are FC layers. An example of CNN architecture for image classification is illustrated in Fig.  7 .

figure 7

An example of CNN architecture for image classification

The input x of each layer in a CNN model is organized in three dimensions: height, width, and depth, or \(m \times m \times r\) , where the height (m) is equal to the width. The depth is also referred to as the channel number. For example, in an RGB image, the depth (r) is equal to three. Several kernels (filters) available in each convolutional layer are denoted by k and also have three dimensions ( \(n \times n \times q\) ), similar to the input image; here, however, n must be smaller than m , while q is either equal to or smaller than r . In addition, the kernels are the basis of the local connections, which share similar parameters (bias \(b^{k}\) and weight \(W^{k}\) ) for generating k feature maps \(h^{k}\) , each with a size of \((m-n+1) \times (m-n+1)\) for a stride of one and no padding, and are convolved with the input, as mentioned above. The convolution layer calculates a dot product between its input and the weights as in Eq. 1 , similar to an MLP, but the inputs are small regions of the initial image. Next, by applying the nonlinearity or an activation function f to the convolution-layer output, we obtain the following, where \(*\) denotes the convolution operation:

\[ h^{k} = f\left( W^{k} * x + b^{k} \right) \tag{1} \]

The next step is down-sampling every feature map in the sub-sampling layers. This leads to a reduction in the network parameters, which accelerates the training process and in turn enables handling of the overfitting issue. For all feature maps, the pooling function (e.g. max or average) is applied to an adjacent area of size \(p \times p\) , where p is the kernel size. Finally, the FC layers receive the mid- and low-level features and create the high-level abstraction, which represents the last-stage layers as in a typical neural network. The classification scores are generated using the ending layer [e.g. support vector machines (SVMs) or softmax]. For a given instance, every score represents the probability of a specific class.

Benefits of employing CNNs

The benefits of using CNNs over other traditional neural networks in the computer vision environment are listed as follows:

The main reason to consider CNN is the weight sharing feature, which reduces the number of trainable network parameters and in turn helps the network to enhance generalization and to avoid overfitting.

Concurrently learning the feature extraction layers and the classification layer causes the model output to be both highly organized and highly reliant on the extracted features.

Large-scale network implementation is much easier with CNN than with other neural networks.

The CNN architecture consists of a number of layers (or so-called multi-building blocks). Each layer in the CNN architecture, including its function, is described in detail below.

Convolutional Layer: In CNN architecture, the most significant component is the convolutional layer. It consists of a collection of convolutional filters (so-called kernels). The input image, expressed as N-dimensional matrices, is convolved with these filters to generate the output feature map.

Kernel definition: A grid of discrete numbers or values describes the kernel. Each value is called the kernel weight. Random numbers are assigned to act as the weights of the kernel at the beginning of the CNN training process. In addition, there are several different methods used to initialize the weights. Next, these weights are adjusted at each training epoch; thus, the kernel learns to extract significant features.

Convolutional Operation: First, consider the CNN input format. The input of a traditional neural network is in vector format, while the input of the CNN is a multi-channeled image. For instance, a gray-scale image has a single channel, while an RGB image has three channels. To understand the convolutional operation, let us take an example of a \(4 \times 4\) gray-scale image with a \(2 \times 2\) random weight-initialized kernel. First, the kernel slides over the whole image horizontally and vertically. At each position, the dot product between the kernel and the corresponding region of the input image is determined: their corresponding values are multiplied and then summed up to create a single scalar value. The whole process is then repeated until no further sliding is possible. Note that the calculated dot product values represent the feature map of the output. Figure 8 graphically illustrates the primary calculations executed at each step. In this figure, the light green color represents the \(2 \times 2\) kernel, while the light blue color represents the similar size area of the input image. Both are multiplied; the end result after summing up the resulting product values (marked in a light orange color) represents an entry value to the output feature map.

figure 8

The primary calculations executed at each step of convolutional layer

Note that no padding is applied to the input image in the previous example, while a stride of one (the step size of the kernel over vertical or horizontal locations) is applied to the kernel. It is also possible to use another stride value. Increasing the stride value yields a feature map of lower dimensions.
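The sliding-window computation and the effect of the stride can be sketched with NumPy. This is a minimal single-channel illustration rather than an efficient implementation: the function name `conv2d`, the fixed kernel values (chosen instead of random ones so the result is easy to check by hand), and the zero-padding option are assumptions of the example. With input size m, kernel size n, padding p, and stride s, the output side length is floor((m + 2p - n) / s) + 1.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` and return the output feature map.

    As is usual in CNNs, the kernel is not flipped (cross-correlation).
    """
    if padding > 0:
        image = np.pad(image, padding, mode="constant")  # zero padding on every side
    m_h, m_w = image.shape
    n_h, n_w = kernel.shape
    out_h = (m_h - n_h) // stride + 1
    out_w = (m_w - n_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i * stride:i * stride + n_h, j * stride:j * stride + n_w]
            out[i, j] = np.sum(region * kernel)  # multiply element-wise, then sum
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # 4 x 4 gray-scale image
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # 2 x 2 kernel (illustrative values)

print(conv2d(image, kernel).shape)             # (3, 3)
print(conv2d(image, kernel, stride=2).shape)   # (2, 2)
print(conv2d(image, kernel, padding=1).shape)  # (5, 5)
```

A larger stride shrinks the feature map, while padding enlarges it, matching the behavior described in the text.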

Padding, on the other hand, is important for preserving the border information of the input image; without it, the features at the borders are washed away very quickly. By applying padding, the size of the input image will increase, and in turn, the size of the output feature map will also increase.

Core benefits of convolutional layers

Sparse Connectivity: Each neuron of a layer in FC neural networks links with all neurons in the following layer. By contrast, in CNNs, only a few weights are available between two adjacent layers. Thus, the number of required weights or connections is small, while the memory required to store these weights is also small; hence, this approach is memory-efficient. In addition, the matrix operation used in FC networks is computationally much more costly than the dot (.) operation in CNN.

Weight Sharing: In a CNN, no weights are allocated to individual pairs of neurons in neighboring layers; instead, the same set of weights operates on every pixel of the input matrix. Learning a single group of weights for the whole input will significantly decrease the required training time and various costs, as it is not necessary to learn additional weights for each neuron.
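A quick parameter count makes the two benefits concrete. The layer sizes below (a 32 x 32 RGB input, sixteen 3 x 3 filters, stride one, no padding) are hypothetical values picked for the example; the comparison is between a convolutional layer and an FC layer producing an output of the same size.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(kernel_size, in_channels, n_filters):
    """Weights plus one bias per filter; shared across all spatial positions."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters

h, w, c = 32, 32, 3                   # hypothetical input: 32 x 32 RGB image
n_filters, k = 16, 3                  # hypothetical layer: sixteen 3 x 3 filters
out_h, out_w = h - k + 1, w - k + 1   # 30 x 30 output, stride 1, no padding

fc = fc_params(h * w * c, out_h * out_w * n_filters)
conv = conv_params(k, c, n_filters)
print(fc, conv)  # 44251200 448
```

The convolutional layer needs several orders of magnitude fewer parameters because each filter is small and reused at every position.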

Pooling Layer: The main task of the pooling layer is the sub-sampling of the feature maps. These maps are generated by the preceding convolutional operations. In other words, this approach shrinks large-size feature maps to create smaller feature maps. Concurrently, it maintains the majority of the dominant information (or features) in every step of the pooling stage. In a similar manner to the convolutional operation, the sizes of both the stride and the kernel are assigned before the pooling operation is executed. Several types of pooling methods are available for utilization in various pooling layers. These methods include tree pooling, gated pooling, average pooling, min pooling, max pooling, global average pooling (GAP), and global max pooling. The most familiar and frequently utilized pooling methods are max, min, and GAP pooling. Figure 9 illustrates these three pooling operations.

figure 9

Three types of pooling operations
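The three pooling operations can be sketched as follows. The function name `pool2d`, the 2 x 2 window with stride two, and the sample feature map are assumptions made for illustration.

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, op=np.max):
    """Apply `op` (e.g. np.max or np.min) to each size x size window of `fmap`."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = op(window)
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 0.],
                 [7., 2., 9., 8.],
                 [3., 4., 6., 5.]])

max_pooled = pool2d(fmap, op=np.max)  # [[6, 4], [7, 9]]
min_pooled = pool2d(fmap, op=np.min)  # [[1, 0], [2, 5]]
gap = fmap.mean()                     # global average pooling: 4.125
print(max_pooled, min_pooled, gap, sep="\n")
```

Max and min pooling halve each spatial dimension here, whereas GAP collapses the whole feature map to a single value.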

Sometimes, the overall CNN performance is decreased as a result; this represents the main shortfall of the pooling layer, as this layer helps the CNN to determine whether or not a certain feature is present in the particular input image, but does not preserve the exact location of that feature. Thus, the CNN model may miss relevant information.

Activation Function (non-linearity): Mapping the input to the output is the core function of all types of activation function in all types of neural network. The input value is determined by computing the weighted summation of the neuron input along with its bias (if present). This means that the activation function makes the decision as to whether or not to fire a neuron with reference to a particular input by creating the corresponding output.

Non-linear activation layers are employed after all layers with weights (so-called learnable layers, such as FC layers and convolutional layers) in CNN architecture. The non-linear behavior of the activation layers means that the mapping of input to output will be non-linear; moreover, these layers give the CNN the ability to learn more complex functions. The activation function must also be differentiable, which is an extremely significant feature, as it allows error back-propagation to be used to train the network. The following types of activation functions are most commonly used in CNN and other deep neural networks.

Sigmoid: The input of this activation function is real numbers, while the output is restricted to between zero and one. The sigmoid function curve is S-shaped and can be represented mathematically by Eq. 2 .

Tanh: It is similar to the sigmoid function, as its input is real numbers, but the output is restricted to between − 1 and 1. Its mathematical representation is in Eq. 3 .

ReLU: The most commonly used function in the CNN context. It converts all input values to non-negative numbers by setting negative values to zero. Lower computational load is the main benefit of ReLU over the others. Its mathematical representation is in Eq. 4 .

Occasionally, a few significant issues may occur during the use of ReLU. For instance, consider an error back-propagation algorithm with a large gradient flowing through it. Passing this gradient through the ReLU function will update the weights in such a way that the neuron is never activated again. This issue is referred to as “Dying ReLU”. Some ReLU alternatives exist to solve such issues. The following discusses some of them.

Leaky ReLU: Rather than setting the negative inputs to zero as ReLU does, this activation function down-scales them, which ensures these inputs are never ignored. It is employed to solve the Dying ReLU problem. Leaky ReLU can be represented mathematically as in Eq. 5 .

Note that the leak factor is denoted by m. It is commonly set to a very small value, such as 0.001.

Noisy ReLU: This function employs a Gaussian distribution to make ReLU noisy. It can be represented mathematically as in Eq. 6 .

Parametric Linear Units: This is mostly the same as Leaky ReLU. The main difference is that the leak factor in this function is updated through the model training process. The parametric linear unit can be represented mathematically as in Eq. 7 .

Note that the learnable weight is denoted as a.
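The equations referenced above are not reproduced in this text, so the sketch below uses the standard textbook forms of these activation functions. The leak factor m and the learnable weight a follow the notation in the text; the fixed noise level `sigma` in `noisy_relu` is a simplifying assumption, since the variance of the Gaussian noise is not specified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # output in (0, 1)

def tanh(x):
    return np.tanh(x)                   # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)           # negative inputs become zero

def leaky_relu(x, m=0.001):
    return np.where(x > 0, x, m * x)    # negative inputs are scaled by the leak factor m

def noisy_relu(x, sigma=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=np.shape(x))  # Gaussian noise (sigma is assumed)
    return np.maximum(0.0, x + noise)

def prelu(x, a):
    return np.where(x > 0, x, a * x)    # like leaky ReLU, but a is learned in training

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, fn(x))
print("prelu", prelu(x, a=0.25))
```

In a real network, `a` would be updated by back-propagation along with the other weights; here it is passed in as a plain number.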

Fully Connected Layer: Commonly, this layer is located at the end of each CNN architecture. Inside this layer, each neuron is connected to all neurons of the previous layer, the so-called Fully Connected (FC) approach. It is utilized as the CNN classifier. It follows the basic method of the conventional multiple-layer perceptron neural network, as it is a type of feed-forward ANN. The input of the FC layer comes from the last pooling or convolutional layer. This input is in the form of a vector, which is created from the feature maps after flattening. The output of the FC layer represents the final CNN output, as illustrated in Fig.  10 .

figure 10

Fully connected layer
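The flattening step and the FC computation can be sketched as follows. The feature-map dimensions, the number of classes, and the random weights are hypothetical values used only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(8, 5, 5))  # hypothetical: 8 feature maps of 5 x 5
n_classes = 10                             # hypothetical number of output classes

flat = feature_maps.reshape(-1)            # flatten into a vector of length 8 * 5 * 5
W = rng.normal(size=(n_classes, flat.size)) * 0.01  # every output sees every input
b = np.zeros(n_classes)
scores = W @ flat + b                      # one score per class

print(flat.shape, scores.shape)            # (200,) (10,)
```

Each row of `W` connects one output neuron to all 200 flattened inputs, which is what makes the layer fully connected.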

Loss Functions: The previous section has presented various layer-types of CNN architecture. In addition, the final classification is achieved from the output layer, which represents the last layer of the CNN architecture. Some loss functions are utilized in the output layer to calculate the predicted error created across the training samples in the CNN model. This error reveals the difference between the actual output and the predicted one. Next, it will be optimized through the CNN learning process.

Two parameters are used by the loss function to calculate the error. The CNN estimated output (referred to as the prediction) is the first parameter. The actual output (referred to as the label) is the second parameter. Several types of loss function are employed in various problem types. The following concisely explains some of the loss function types.

Cross-Entropy or Softmax Loss Function: This function is commonly employed for measuring the CNN model performance. It is also referred to as the log loss function. Its output is the probability \(p \in [0, 1]\) . In addition, it is usually employed as a substitute for the square error loss function in multi-class classification problems. In the output layer, it employs the softmax activations to generate the output within a probability distribution. The mathematical representation of the output class probability is Eq. 8 .

Here, \(e^{a_{i}}\) represents the non-normalized output from the preceding layer, while N represents the number of neurons in the output layer. Finally, the mathematical representation of cross-entropy loss function is Eq. 9 .

Euclidean Loss Function: This function is widely used in regression problems. It is also known as the mean square error. The mathematical expression of the estimated Euclidean loss is Eq. 10 .

Hinge Loss Function: This function is commonly employed in problems related to binary classification. This problem relates to maximum-margin-based classification; this is particularly important for SVMs, which use the hinge loss function, wherein the optimizer attempts to maximize the margin between the two target classes. Its mathematical formula is Eq. 11 .

The margin m is commonly set to 1. Moreover, the predicted output is denoted as \(p_{i}\) , while the desired output is denoted as \(y_{i}\) .
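Since the loss equations are not reproduced in this text, the sketch below uses common textbook forms with the notation above (prediction p, label y, margin m). The 1/(2N) scaling in the Euclidean loss and the mapping of labels from {0, 1} to {-1, +1} in the hinge loss are conventions assumed for the example; other sources scale or encode these differently.

```python
import numpy as np

def softmax(a):
    """Turn non-normalized outputs a_i into class probabilities p_i."""
    e = np.exp(a - np.max(a))            # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, y):
    """Cross-entropy between predicted probabilities p and one-hot labels y."""
    return -np.sum(y * np.log(p + 1e-12))  # small constant avoids log(0)

def euclidean_loss(p, y):
    """Squared error averaged over N outputs (1/(2N) scaling is an assumed convention)."""
    return np.sum((p - y) ** 2) / (2 * len(p))

def hinge_loss(p, y, m=1.0):
    """Hinge loss with margin m; labels y in {0, 1} are mapped to {-1, +1}."""
    return np.sum(np.maximum(0.0, m - (2 * y - 1) * p))

a = np.array([2.0, 1.0, 0.1])            # non-normalized outputs of the last layer
y = np.array([1.0, 0.0, 0.0])            # one-hot label: class 0 is correct
p = softmax(a)
print(p, cross_entropy(p, y))
print(euclidean_loss(np.array([1.0, 2.0]), np.array([1.0, 0.0])))
print(hinge_loss(np.array([0.5]), np.array([1])))
```

The cross-entropy shrinks toward zero as the probability assigned to the correct class approaches one.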

Regularization to CNN

For CNN models, over-fitting represents the central issue associated with obtaining well-behaved generalization. The model is said to be over-fitted when it performs especially well on training data but fails on test data (unseen data), as explained further in a later section. An under-fitted model is the opposite; this case occurs when the model does not learn a sufficient amount from the training data. The model is referred to as “just-fitted” if it performs well on both training and testing data. These three types are illustrated in Fig. 11. Various intuitive concepts are used to help the regularization to avoid over-fitting; more details about over-fitting and under-fitting are discussed in later sections.

Dropout: This is a widely utilized technique for generalization. During each training epoch, neurons are randomly dropped. In doing this, the feature selection power is distributed equally across the whole group of neurons, and the model is forced to learn different independent features. During the training process, the dropped neuron will not be a part of back-propagation or forward-propagation. By contrast, the full-scale network is utilized to perform prediction during the testing process.

Drop-Weights: This method is highly similar to dropout. In each training epoch, the connections between neurons (weights) are dropped rather than dropping the neurons; this represents the only difference between drop-weights and dropout.

Data Augmentation: Training the model on a sizeable amount of data is the easiest way to avoid over-fitting. To achieve this, data augmentation is used. Several techniques are utilized to artificially expand the size of the training dataset. More details can be found in a later section, which describes the data augmentation techniques.

Batch Normalization: This method ensures that the output activations follow a unit Gaussian distribution [ 81 ]. Subtracting the mean and dividing by the standard deviation normalizes the output at each layer. While it is possible to consider this as a pre-processing task at each layer in the network, the operation is also differentiable and can be integrated with other networks. In addition, it is employed to reduce the “internal covariate shift” of the activation layers, which is defined as the variation in the activation distribution in each layer. This shift becomes very high due to the continuous weight updating through training, which may occur if the samples of the training data are gathered from numerous dissimilar sources (for example, day and night images). Thus, the model will take extra time to converge, and in turn, the time required for training will also increase. To resolve this issue, a layer representing the operation of batch normalization is applied in the CNN architecture.

The advantages of utilizing batch normalization are as follows:

It prevents the problem of vanishing gradient from arising.

It can effectively control the poor weight initialization.

It significantly reduces the time required for network convergence (for large-scale datasets, this will be extremely useful).

It reduces the dependency of training on hyper-parameters.

Chances of over-fitting are reduced, since it has a minor regularization effect.
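Dropout, drop-weights, and batch normalization as described above can be sketched with NumPy. Several details are assumptions of the example rather than statements from the text: the rescaling by 1 / (1 - drop_prob) in `dropout` (the common "inverted dropout" convention), the scale and shift parameters `gamma` and `beta`, the small constant `eps`, and the use of batch statistics only (no running averages for test time).

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob, training=True):
    """Randomly zero neurons during training; use the full network at test time."""
    if not training or drop_prob == 0.0:
        return activations
    mask = rng.random(activations.shape) >= drop_prob   # True = neuron is kept
    return activations * mask / (1.0 - drop_prob)       # rescale (assumed convention)

def drop_weights(W, drop_prob):
    """Randomly zero individual connections (weights) instead of whole neurons."""
    mask = rng.random(W.shape) >= drop_prob
    return W * mask

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch: subtract the mean, divide by the std."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))    # 64 samples, 4 features
normalized = batch_norm(batch)
print(normalized.mean(axis=0).round(6), normalized.std(axis=0).round(3))

hidden = np.ones((2, 8))
print(dropout(hidden, drop_prob=0.5))                   # roughly half the entries are zero
print(dropout(hidden, drop_prob=0.5, training=False))   # unchanged at test time
```

After normalization each feature has approximately zero mean and unit standard deviation, regardless of the scale of the incoming batch.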

figure 11

Over-fitting and under-fitting issues

Optimizer selection

This section discusses the CNN learning process. Two major issues are included in the learning process: the first issue is the learning algorithm selection (optimizer), while the second issue is the use of many enhancements (such as AdaDelta, Adagrad, and momentum) along with the learning algorithm to enhance the output.

Minimizing the loss function, i.e. the error (the difference between the actual and predicted output), which depends on numerous learnable parameters (e.g. biases and weights), is the core purpose of all supervised learning algorithms. Gradient-based learning techniques are the usual choice for a CNN. The network parameters should be updated through all training epochs, while the network should also look for the locally optimized answer in all training epochs in order to minimize the error.

The learning rate is defined as the step size of the parameter updating. The training epoch represents a complete repetition of the parameter update that involves the complete training dataset at one time. Note that the learning rate, although it is only a hyper-parameter, needs to be selected wisely so that it does not adversely affect the learning process.

Gradient Descent or Gradient-based learning algorithm: To minimize the training error, this algorithm repetitively updates the network parameters through every training epoch. More specifically, to update the parameters correctly, it needs to compute the objective function gradient (slope) by applying a first-order derivative with respect to the network parameters. Next, the parameter is updated in the reverse direction of the gradient to reduce the error. The parameter updating process is performed through network back-propagation, in which the gradient at every neuron is back-propagated to all neurons in the preceding layer. The mathematical representation of this operation is given in Eq. 12 .

\[ w_{ij}^{t} = w_{ij}^{t-1} - \Delta w_{ij}^{t}, \qquad \Delta w_{ij}^{t} = \eta \frac{\partial E}{\partial w_{ij}} \tag{12} \]

The final weight in the current training epoch is denoted by \(w_{ij}^{t}\) , while the weight in the preceding \((t-1)\) training epoch is denoted \(w_{ij}^{t-1}\) . The learning rate is \(\eta \) and the prediction error is E . Different alternatives of the gradient-based learning algorithm are available and commonly employed; these include the following:

Batch Gradient Descent: During the execution of this technique [ 82 ], the network parameters are updated only once, after all training samples have been passed through the network. In more depth, it calculates the gradient of the whole training set and subsequently uses this gradient to update the parameters. For a small-sized dataset, the CNN model converges faster and creates a more stable gradient using BGD. Since the parameters are changed only once for every training epoch, it requires a substantial amount of resources. By contrast, for a large training dataset, additional time is required for converging, and it could converge to a local optimum (for non-convex instances).

Stochastic Gradient Descent: The parameters are updated at each training sample in this technique [ 83 ]. It is preferable to shuffle the training samples randomly in every epoch before training. For a large-sized training dataset, this technique is both more memory-efficient and much faster than BGD. However, because the parameters are updated so frequently, it takes extremely noisy steps in the direction of the answer, which in turn causes the convergence behavior to become highly unstable.

Mini-batch Gradient Descent: In this approach, the training samples are partitioned into several mini-batches, where every mini-batch is a small collection of samples with no overlap between batches [ 84 ]. Parameter updating is then performed after the gradient has been computed on each mini-batch. This method combines the advantages of both BGD and SGD: it has steady convergence, better computational efficiency, and better memory efficiency. The following describes several enhancement techniques for gradient-based learning algorithms (usually SGD), which further improve the CNN training process.
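The three variants above differ only in how many samples contribute to each parameter update. The sketch below, with a hypothetical helper `minibatches`, shuffles the samples once per epoch and yields non-overlapping batches; a batch size equal to the dataset size corresponds to BGD, a batch size of one to SGD, and anything in between to mini-batch gradient descent.

```python
import random


def minibatches(samples, batch_size, seed=0):
    """Shuffle the samples, then yield non-overlapping batches of them."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


data = list(range(10))
# Each yielded batch corresponds to one parameter update in an epoch.
updates_bgd = len(list(minibatches(data, batch_size=len(data))))
updates_sgd = len(list(minibatches(data, batch_size=1)))
updates_mini = len(list(minibatches(data, batch_size=4)))
```

For ten samples this gives one update per epoch for BGD, ten for SGD, and three for mini-batches of size four (the last batch holds the two remaining samples).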

Momentum: For neural networks, this technique is applied when optimizing the objective function. It improves both the accuracy and the training speed by adding the gradient computed at the preceding training step, weighted by a factor \(\lambda \) (known as the momentum factor), to the current update. Without it, a gradient-based learner can easily become stuck in a local minimum rather than reaching the global minimum, which is the main disadvantage of gradient-based learning algorithms. Issues of this kind frequently occur when the error surface (or solution space) is not convex.

Together with the learning algorithm, momentum is used to solve this issue, which can be expressed mathematically as in Eq. 13 .

The weight increment in the current \(t\)th training epoch is denoted as \(\Delta w_{ij}^{t}\) , \(\eta \) is the learning rate, and \(\Delta w_{ij}^{t-1}\) is the weight increment in the preceding \((t-1)\)th training epoch. The momentum factor is kept within the range 0 to 1; it increases the step size of the weight update in the direction of the minimum in order to minimize the error. If the momentum factor is very low, the model loses its ability to escape local minima. By contrast, if the momentum factor is high, the model converges much more rapidly. If a high momentum factor is combined with a high learning rate (LR), however, the model could miss the global minimum by stepping over it.

When the gradient changes direction continually throughout the training process, a suitable value of the momentum factor (which is a hyper-parameter) smooths the variations in the weight updates.
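A minimal sketch of the momentum update follows, assuming the common form in which the current increment is the learning rate times the gradient plus the momentum factor times the previous increment. The function name and the toy gradient are hypothetical and serve only as an illustration.

```python
def momentum_descent(grad, w0, lr=0.1, momentum=0.9, epochs=500):
    """Gradient descent in which each step reuses part of the previous step."""
    w, delta = w0, 0.0
    for _ in range(epochs):
        # Current increment = lr * gradient + momentum * previous increment.
        delta = lr * grad(w) + momentum * delta
        w = w - delta
    return w


def toy_grad(w):
    """Gradient of the toy error E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_momentum = momentum_descent(toy_grad, w0=0.0)
w_plain_one_step = momentum_descent(toy_grad, w0=0.0, momentum=0.0, epochs=1)
```

Setting the momentum factor to zero recovers plain gradient descent, which is a convenient sanity check when experimenting with the factor.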

Adaptive Moment Estimation (Adam): This is another widely used optimization technique or learning algorithm. Adam [ 85 ] is among the most popular recent optimizers in deep learning. It is a first-order method: rather than using the Hessian matrix or any second-order derivative, it maintains running estimates of the first and second moments of the gradient. Adam was designed specifically for training deep neural networks. Its two main advantages are low memory requirements and low computational cost. The mechanism of Adam is to calculate an adaptive LR for each parameter in the model. It integrates the pros of both Momentum and RMSprop: like RMSprop, it uses the squared gradients to scale the learning rate, and like momentum, it uses a moving average of the gradient. The equation of Adam is represented in Eq. 14 .
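The mechanism described above can be sketched for a single scalar parameter as follows. This assumes the widely used formulation with bias-corrected moving averages and the usual default values for the decay rates; the function name `adam_step` and its signature are illustrative, not an API of any library.

```python
import math


def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter w with gradient g at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g        # moving average of the gradient
    v = beta2 * v + (1 - beta2) * g * g    # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


w1, m1, v1 = adam_step(w=1.0, g=2.0, m=0.0, v=0.0, t=1)
```

Because the step is scaled by the square root of the squared-gradient average, the first update has a magnitude close to the learning rate whether the gradient is 2 or 200, which illustrates the per-parameter adaptive LR.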

Design of algorithms (backpropagation)

Let’s start with a notation that refers to weights in the network unambiguously. We denote by \({\varvec{w}}_{i j}^{h}\) the weight for the connection from the \(i\text{th}\) input (or the \(i\text{th}\) neuron in the \((h-1)\text{th}\) layer) to the \(j\text{th}\) neuron in the \(h\text{th}\) layer. Fig. 12 shows the weights on connections from neurons in the first layer to neurons in the next layer of the network.

figure 12

MLP structure

Here, \(w_{11}^{2}\) represents the weight from the first neuron in the first layer to the first neuron in the second layer. Accordingly, the second weight for the same neuron is \(w_{21}^{2}\) , the weight from the second neuron in the previous layer to the first neuron in the next layer, which is the second layer in this network. The bias is not a connection between neurons of different layers, so it is handled simply: either each neuron has its own bias or, in some networks, each layer has a single bias. In the network above, each layer has its own bias. Each network is characterized by parameters such as the number of layers, the number of neurons in each layer, and the number of weights (connections) between the layers. The number of connections is easily determined from the number of neurons in each layer; for example, if ten inputs are fully connected to two neurons in the next layer, then the number of connections (weights) between them is \(10 \times 2 = 20\) . To show how the error is defined and how the weights are updated, consider a neural network with two layers.

In the error definition, \(\text {d}\) is the label of the \(i\text{th}\) individual input and \(\text {y}\) is the output for the same input. Backpropagation is about understanding how to change the weights and biases in a network based on the changes in the cost function (error). Ultimately, this means computing the partial derivatives \(\partial \text {E} / \partial \text {w}_{\text {ij}}^{h}\) and \(\partial \text {E} / \partial \text {b}_{\text {j}}^{h}.\) To compute those, a local variable \(\delta _{j}^{h}\) is introduced, which is called the local error of the \(j\text{th}\) neuron in the \(h\text{th}\) layer. Based on that local error, backpropagation gives the procedure to compute \(\partial \text {E} / \partial \text {w}_{\text {ij}}^{h}\) and \(\partial \text {E} / \partial \text {b}_{\text {j}}^{h}\) for the two-layer neural network shown in Fig. 13 .

figure 13

Neuron activation functions

Output error \(\delta _{\text {j}}^{1}\) for each \(l = 1{:}\text {L}\) , where \(\text {L}\) is the number of neurons in the output layer

where \(\text {e}(\text {k})\) is the error at epoch \(\text {k}\) , as shown in Eq. ( 2 ), and \(\varvec{\vartheta }^{\prime }\left( {\varvec{v}}_{j}({\varvec{k}})\right) \) is the derivative of the activation function for \(v_{j}\) at the output.

Back-propagate the error through all remaining layers except the output layer

where \(\delta _{j}^{1}({\mathbf {k}})\) is the output error and \(w_{j l}^{h+1}(k)\) is the weight in the layer that follows the one whose error is being computed.

After finding the error at each neuron in each layer, we can update the weights in each layer based on Eqs. ( 16 ) and ( 17 ).
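The procedure above can be sketched for a small two-layer network as follows. This is an illustration under stated assumptions rather than the exact formulation of the equations referenced here: it assumes sigmoid activations and a squared error \(E = \tfrac{1}{2}\sum (d - y)^{2}\), and all function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, W1, b1, W2, b2):
    """W[i][j] is the weight from unit i in the previous layer to unit j."""
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y = [sigmoid(sum(h[i] * W2[i][j] for i in range(len(h))) + b2[j])
         for j in range(len(b2))]
    return h, y


def error(y, d):
    """Assumed squared error between labels d and outputs y."""
    return 0.5 * sum((dj - yj) ** 2 for dj, yj in zip(d, y))


def backprop(x, d, W1, b1, W2, b2):
    h, y = forward(x, W1, b1, W2, b2)
    # Output layer: local error = -(d - y) * activation derivative y * (1 - y).
    delta2 = [-(d[j] - y[j]) * y[j] * (1 - y[j]) for j in range(len(y))]
    # Hidden layer: next-layer local errors weighted by the connecting weights.
    delta1 = [h[j] * (1 - h[j]) * sum(delta2[l] * W2[j][l] for l in range(len(y)))
              for j in range(len(h))]
    # Weight gradient = input to the connection * local error of its target.
    grad_W2 = [[h[i] * delta2[j] for j in range(len(y))] for i in range(len(h))]
    grad_W1 = [[x[i] * delta1[j] for j in range(len(h))] for i in range(len(x))]
    # The bias gradients are the local errors themselves.
    return grad_W1, delta1, grad_W2, delta2
```

A useful way to check such a sketch is to compare each analytic gradient with a finite-difference estimate of the error, which should agree to several decimal places.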

Improving performance of CNN

Based on our experiments in different DL applications [ 86 , 87 , 88 ], we conclude that the most effective ways to improve the performance of CNN are:

Expand the dataset with data augmentation or use transfer learning (explained in later sections).

Increase the training time.

Increase the depth (or width) of the model.

Add regularization.

Tune the hyperparameters more extensively.

CNN architectures

Over the last 10 years, several CNN architectures have been presented [ 21 , 26 ]. Model architecture is a critical factor in improving the performance of different applications. CNN architecture has undergone various modifications from 1989 until today, including structural reformulation, regularization, and parameter optimization. However, the key improvements in CNN performance came largely from the reorganization of processing units and the development of novel blocks. In particular, the most novel developments in CNN architectures have exploited network depth. In this section, we review the most popular CNN architectures, beginning with the AlexNet model in 2012 and ending with the High-Resolution (HR) model in 2020. Studying the features of these architectures (such as input size, depth, and robustness) is key to helping researchers choose a suitable architecture for their target task. Table  2 presents a brief overview of CNN architectures.

The history of deep CNNs began with the appearance of LeNet [ 89 ] (Fig.  14 ). At that time, CNNs were restricted to handwritten digit recognition tasks and did not scale to all image classes. Among deep CNN architectures, AlexNet is highly respected [ 30 ], as it achieved groundbreaking results in the fields of image recognition and classification. Krizhevsky et al. [ 30 ] first proposed AlexNet and improved the CNN learning ability by increasing its depth and implementing several parameter optimization strategies. Figure  15 illustrates the basic design of the AlexNet architecture.

figure 14

The architecture of LeNet

figure 15

The architecture of AlexNet

The learning ability of deep CNNs was limited at that time due to hardware restrictions. To overcome these hardware limitations, two GPUs (NVIDIA GTX 580) were used in parallel to train AlexNet. Moreover, in order to extend the applicability of the CNN to different image categories, the number of feature extraction stages was increased from five in LeNet to seven in AlexNet. Although depth improves generalization across image resolutions, overfitting was the main drawback associated with the added depth. Krizhevsky et al. used Hinton’s idea (dropout) to address this problem [ 90 , 91 ]. To make the learned features more robust, their algorithm randomly skips several transformational units during the training stage. Moreover, ReLU [ 92 ] was used as a non-saturating activation function to improve the rate of convergence by reducing the vanishing gradient problem [ 93 ]. Local response normalization and overlapping subsampling were also applied to improve generalization by decreasing overfitting. Further modifications to improve on the performance of previous networks included large-size filters \((5\times 5 \; \text{and}\; 11 \times 11)\) in the earlier layers. AlexNet has considerable significance for recent CNN generations and began an innovative research era in CNN applications.

Network-in-network

This network model, which differs slightly from the preceding models, introduced two innovative concepts [ 94 ]. The first was the use of multilayer perceptron convolution. These convolutions are executed using a 1×1 filter, which adds extra nonlinearity to the network. Moreover, this supports enlarging the network depth, which may then be regularized using dropout. In DL models, this idea is frequently employed in the bottleneck layer. The second novel concept is the use of global average pooling (GAP) as a substitute for an FC layer, which enables a significant reduction in the number of model parameters. In addition, GAP considerably changes the network architecture. Applying GAP to a large feature map makes it possible to generate a final low-dimensional feature vector without reducing the dimension of the feature maps [ 95 , 96 ]. Figure  16 shows the structure of the network.

figure 16

The architecture of network-in-network
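To illustrate why replacing an FC layer with GAP reduces the number of parameters, the following sketch averages each feature map to a single value and compares this with the weight count of an FC layer on the flattened maps. The helper names and the feature-map sizes are hypothetical.

```python
def global_average_pool(feature_maps):
    """Collapse each H x W feature map (a list of rows) to its mean value."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]


def fc_parameters(channels, height, width, num_classes):
    """Weights plus biases of an FC layer applied to the flattened feature maps."""
    return channels * height * width * num_classes + num_classes


maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
pooled = global_average_pool(maps)           # one value per feature map
fc_cost = fc_parameters(channels=2, height=2, width=2, num_classes=3)
```

GAP itself has no trainable parameters, whereas the FC layer in this toy setting already needs 27 weights and biases; the gap widens quickly for realistic feature-map sizes.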

Before 2013, the CNN learning mechanism was essentially developed on a trial-and-error basis, which precluded an understanding of the precise reason behind any improvement. This issue restricted deep CNN performance on complex images. In response, Zeiler and Fergus introduced DeconvNet (a multilayer de-convolutional neural network) in 2013 [ 97 ]. This method later became known as ZefNet, which was developed in order to visualize the network quantitatively. The purpose of visualizing network activity was to monitor CNN performance by understanding the neuron activations. Erhan et al. used the same concept to optimize deep belief network (DBN) performance by visualizing the features of the hidden layers [ 98 ]. Similarly, Le et al. assessed deep unsupervised auto-encoder (AE) performance by visualizing the image classes created by the output neurons [ 99 ]. DeconvNet operates like a forward-pass CNN but reverses the order of the convolutional and pooling operations. This reverse mapping projects the convolutional layer output backward to create visually observable image patterns, which in turn provide a neural interpretation of the internal feature representation learned at each layer [ 100 ]. Monitoring the learning scheme during the training stage was the key concept underlying ZefNet, and the outcomes were used to identify potential shortcomings of the model. This concept was validated experimentally by applying DeconvNet to AlexNet. The results indicated that only certain neurons were active in the first two layers of the network, while the others were inactive. Furthermore, they indicated that the features extracted by the second layer contained aliasing artifacts. Zeiler and Fergus therefore changed the CNN topology in light of these outcomes. In addition, they performed parameter optimization and improved CNN learning by decreasing the stride and the filter sizes in order to retain all features of the first two convolutional layers. This rearrangement of the CNN topology yielded an improvement in performance and suggested that feature visualization can be employed to identify design weaknesses and guide appropriate parameter adjustment. Figure  17 shows the structure of the network.

figure 17

The architecture of ZefNet

Visual geometry group (VGG)

After CNNs were shown to be effective in the field of image recognition, a simple and efficient design principle for CNNs was proposed by Simonyan and Zisserman. This innovative design was called Visual Geometry Group (VGG). A multilayer model [ 101 ], it was nineteen layers deep, deeper than ZefNet [ 97 ] and AlexNet [ 30 ], in order to study how depth relates to the representational capacity of the network. ZefNet, the frontier network of the 2013-ILSVRC competition, had suggested that small filters could enhance CNN performance. With reference to these results, VGG used a stack of \(3\times 3\) filters in place of the larger \(5\times 5\) and 11 × 11 filters of earlier networks. This showed experimentally that stacking these small-size filters could produce the same effect as the large-size filters. In other words, the stacked small-size filters gave a receptive field as effective as that of the large-size filters \((7 \times 7 \; \text{and}\; 5 \times 5)\) . Using small-size filters also decreased the number of parameters, which brought the additional advantage of lower computational complexity. These outcomes established a novel research trend of working with small-size filters in CNNs. In addition, VGG regulates the network complexity by inserting \(1\times 1\) convolutions between the convolutional layers, which learn a linear combination of the resulting feature maps. With respect to network tuning, a max pooling layer [ 102 ] is inserted after the convolutional layer, while padding is applied to maintain the spatial resolution. In general, VGG obtained significant results for image classification and localization problems. While it did not achieve first place in the 2014-ILSVRC competition, it acquired a reputation due to its increased depth, homogeneous topology, and simplicity. However, VGG’s computational cost was excessive due to its use of around 140 million parameters, which represented its main shortcoming. Figure  18 shows the structure of the network.

figure 18

The architecture of VGG
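The equivalence between stacked small filters and a single large filter can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the function names and the channel count of 64 are illustrative.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions of size `kernel`."""
    return 1 + layers * (kernel - 1)


def conv_weights(kernel, channels_in, channels_out):
    """Number of weights in one convolutional layer (biases ignored)."""
    return kernel * kernel * channels_in * channels_out


rf_two = stacked_receptive_field(kernel=3, layers=2)    # matches a 5 x 5 filter
rf_three = stacked_receptive_field(kernel=3, layers=3)  # matches a 7 x 7 filter
stacked_cost = 3 * conv_weights(3, 64, 64)
single_cost = conv_weights(7, 64, 64)
```

Three stacked \(3\times 3\) layers cover the same \(7\times 7\) region with 110,592 weights instead of 200,704, and they add two extra nonlinearities along the way.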

In the 2014-ILSVRC competition, GoogLeNet (also called Inception-V1) emerged as the winner [ 103 ]. Achieving high accuracy at reduced computational cost is the core aim of the GoogLeNet architecture. It introduced the novel inception block (module) concept in the CNN context, which combines multiple-scale convolutional transformations by employing split, transform, and merge functions for feature extraction. Figure  19 illustrates the inception block architecture. This architecture incorporates filters of different sizes ( \(5\times 5, 3\times 3, \; \text{and} \; 1\times 1\) ) to capture channel information together with spatial information at diverse ranges of spatial resolution. The conventional convolutional layer is replaced in GoogLeNet by small blocks, following the concept of the network-in-network (NIN) architecture [ 94 ], which replaced each layer with a micro-neural network. The split, transform, and merge concepts were used to address the problem of learning the diverse variations that occur within the same image category. The motivation of GoogLeNet was to improve the efficiency of CNN parameters, as well as to enhance the learning capacity. In addition, it regulates the computation by inserting a \(1\times 1\) convolutional filter, as a bottleneck layer, ahead of the large-size kernels. GoogLeNet employed sparse connections to overcome the redundant information problem: only some of the input channels are connected to some of the output channels, which decreases cost by neglecting the irrelevant channels. By employing a GAP layer as the end layer, rather than an FC layer, the density of connections was decreased. These parameter tunings also significantly decreased the number of parameters, from 40 million to 5 million. Additional regularization factors included the use of RMSprop as the optimizer and batch normalization [ 104 ]. Furthermore, GoogLeNet proposed the idea of auxiliary learners to speed up the rate of convergence. However, the main shortcoming of GoogLeNet was its heterogeneous topology, which requires adaptation from one module to another. Another shortcoming is the representational bottleneck, which substantially reduces the feature space in the following layer and in turn occasionally leads to the loss of valuable information.

figure 19

The basic structure of Google Block
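The saving obtained by placing a \(1\times 1\) bottleneck ahead of a large kernel can be estimated by counting multiply-accumulate operations. The sketch below assumes stride-1 convolutions that preserve the spatial size; the channel counts and the \(28\times 28\) feature map are illustrative values chosen for the example, not figures taken from the text.

```python
def conv_macs(kernel, c_in, c_out, height, width):
    """Multiply-accumulates of a stride-1 convolution that keeps the spatial size."""
    return kernel * kernel * c_in * c_out * height * width


# A 5 x 5 convolution applied directly to 192 input channels.
direct = conv_macs(5, 192, 32, 28, 28)

# The same 5 x 5 convolution after a 1 x 1 bottleneck reduces 192 channels to 16.
bottleneck = conv_macs(1, 192, 16, 28, 28) + conv_macs(5, 16, 32, 28, 28)
```

In this setting the bottleneck path needs roughly a tenth of the operations of the direct path while producing an output of the same shape.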

Highway network

Increasing the network depth enhances its performance, mainly for complicated tasks. However, it also makes the network more difficult to train. The presence of many layers in deeper networks may result in small gradient values when the error is back-propagated to the lower layers. In 2015, Srivastava et al. [ 105 ] suggested a novel CNN architecture, called Highway Network, to overcome this issue. This approach is based on the cross-connectivity concept. Unhindered information flow in the Highway Network is enabled by introducing two gating units inside the layer. The gate mechanism concept was motivated by the LSTM-based RNN [ 106 , 107 ]. Information is aggregated by merging the information of the \((i-k)\text{th}\) layers with the next \(i\text{th}\) layer to produce a regularizing effect, which makes gradient-based training of deeper networks much easier. This enables the training of networks with more than 100 layers, and even networks as deep as 900 layers, with the SGD algorithm. A Highway Network with a depth of fifty layers showed a better rate of convergence than both thin and deep architectures [ 108 ]. By contrast, [ 69 ] empirically demonstrated that the performance of a plain network declines when more than ten hidden layers are inserted. It should be noted that even a Highway Network 900 layers deep converges much more rapidly than the plain network.
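The gating idea can be sketched with the commonly used coupled form, in which a transform gate \(T\) weights the transformed signal and its complement \(1 - T\) weights the unchanged input. This form, the sigmoid gate, and the function names below are assumptions made for illustration.

```python
import math


def highway_layer(x, transform, gate_logit):
    """Elementwise y = T * H(x) + (1 - T) * x, with T = sigmoid(gate_logit)."""
    t = [1.0 / (1.0 + math.exp(-g)) for g in gate_logit]
    h = transform(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def double(values):
    """Stand-in for the layer transformation H(x)."""
    return [2.0 * a for a in values]


x = [1.0, -2.0]
carried = highway_layer(x, double, gate_logit=[-50.0, -50.0])     # gate closed
transformed = highway_layer(x, double, gate_logit=[50.0, 50.0])   # gate open
```

When the gate is closed the layer passes its input through almost unchanged, which is what lets information and gradients flow across many layers; when it is open the layer behaves like an ordinary transformation.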

He et al. [ 37 ] developed ResNet (Residual Network), which was the winner of ILSVRC 2015. Their objective was to design an ultra-deep network free of the vanishing gradient issue that affected previous networks. Several types of ResNet were developed based on the number of layers (starting with 34 layers and going up to 1202 layers). The most common type was ResNet50, which comprises 49 convolutional layers plus a single FC layer. The overall number of network weights is 25.5 M, while the overall number of MACs is approximately 3.9 billion. The novel idea of ResNet is its use of the bypass pathway concept, which was employed in Highway Nets in 2015 to address the problem of training a deeper network. This is illustrated in Fig.  20 , which contains the fundamental ResNet block diagram: a conventional feedforward network plus a residual connection. The input to the residual layer is the output of the preceding \((l - 1)\text{th}\) layer, \(x_{l-1}\) . After executing different operations on \(x_{l-1}\) [such as convolution using variable-size filters, or batch normalization, before applying an activation function like ReLU], the output is \(F(x_{l-1})\) . The final residual output is \(x_{l}\) , which can be mathematically represented as in Eq. 18 .

A residual network consists of many basic residual blocks, and the operations inside each block vary with the type of residual network architecture [ 37 ].

figure 20

The block diagram for ResNet

In comparison to the highway network, ResNet introduced shortcut connections inside layers to enable cross-layer connectivity; these are parameter-free and data-independent. Note that in the highway network the layers represent non-residual functions when a gated shortcut is closed. By contrast, in ResNet the identity shortcuts are never closed, and the residual information is always passed through. Furthermore, ResNet has the potential to prevent the vanishing gradient problem, as the shortcut connections (residual links) accelerate the convergence of the deep network. ResNet won the 2015-ILSVRC championship with a depth of 152 layers; this represents 8 times the depth of VGG and 20 times the depth of AlexNet. Even with this increased depth, it has lower computational complexity than VGG.
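A residual block with an identity shortcut can be sketched as follows, assuming the usual additive form in which the block output is the residual function of the input plus the input itself. The names `residual_block` and `residual_fn` are hypothetical, and the residual functions used are toy stand-ins for the convolutional operations.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x: the identity shortcut adds the input back unchanged."""
    return [f + xi for f, xi in zip(residual_fn(x), x)]


def zero_residual(values):
    """A residual function that has learned nothing yet."""
    return [0.0 for _ in values]


def half_relu(values):
    """Toy residual function: ReLU applied to half of each input value."""
    return [max(0.0, 0.5 * a) for a in values]


identity_out = residual_block([1.0, 2.0], zero_residual)
residual_out = residual_block([1.0, -2.0], half_relu)
```

Because the shortcut is parameter-free and never closed, a block whose residual function outputs zero reduces to the identity mapping, so adding such blocks does not by itself degrade the signal.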

Inception: ResNet and Inception-V3/4

Szegedy et al. [ 103 , 109 , 110 ] proposed Inception-ResNet and Inception-V3/4 as upgraded versions of Inception-V1/2. The concept behind Inception-V3 was to minimize the computational cost without affecting the generalization of the deeper network. Thus, Szegedy et al. used asymmetric small-size filters ( \(1\times 5\) and \(1\times 7\) ) rather than large-size filters ( \( 7\times 7\) and \(5\times 5\) ); moreover, they used a bottleneck of \(1\times 1\) convolution prior to the large-size filters [ 110 ]. These changes make the traditional convolution operation very similar to cross-channel correlation. Previously, Lin et al. had exploited the potential of the 1 × 1 filter in the NIN architecture [ 94 ]. Subsequently, [ 110 ] applied the same idea in an intelligent manner. By using the \(1\times 1\) convolutional operation in Inception-V3, the input data are mapped into three or four separate spaces, which are smaller than the original input space. Next, all of these correlations are mapped in these smaller spaces through regular \(5\times 5\) or \(3\times 3\) convolutions. By contrast, in Inception-ResNet, Szegedy et al. combined the inception block with the power of residual learning by replacing the filter concatenation with the residual connection [ 111 ]. Szegedy et al. empirically demonstrated that Inception-ResNet (Inception-V4 with residual connections) can achieve a generalization power similar to that of Inception-V4 with enlarged width and depth and without residual connections. Thus, it is clearly illustrated that using residual connections significantly accelerates the training of the Inception network. Figure  21 shows the basic block diagram for the Inception Residual unit.

figure 21

The basic block diagram for Inception Residual unit
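The benefit of asymmetric filters can be illustrated by counting weights. The sketch below assumes that an \(n\times n\) convolution is replaced by a \(1\times n\) convolution followed by an \(n\times 1\) convolution with the same channel count throughout, and it ignores biases; the function names and the value of 64 channels are illustrative.

```python
def square_weights(n, channels):
    """Weights of a single n x n convolution (biases ignored)."""
    return n * n * channels * channels


def asymmetric_weights(n, channels):
    """Weights of a 1 x n convolution followed by an n x 1 convolution."""
    return 2 * n * channels * channels


square_7 = square_weights(7, 64)
asymmetric_7 = asymmetric_weights(7, 64)
```

For \(n = 7\) the factorized pair needs 2n weights per channel pair instead of n squared, a reduction by a factor of 3.5, while covering the same \(7\times 7\) region.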

To solve the vanishing gradient problem, DenseNet was presented, following the same direction as ResNet and the Highway network [ 105 , 111 , 112 ]. One of the drawbacks of ResNet is that it explicitly preserves information by means of additive identity transformations, even though many layers contribute very little or no information. In addition, ResNet has a large number of weights, since each layer has its own separate set of weights. DenseNet employed cross-layer connectivity in an improved way to address this problem [ 112 , 113 , 114 ]. It connects each layer to all subsequent layers in the network in a feed-forward fashion, so the feature maps of every preceding layer are used as input to all of the following layers. A traditional CNN with \(l\) layers has \(l\) connections, one between each layer and the next, while DenseNet has \(\frac{l(l+1)}{2}\) direct connections. DenseNet demonstrates the influence of cross-layer depth-wise convolutions. Because DenseNet concatenates the features of the preceding layers rather than adding them, the network gains the ability to discriminate clearly between the added and the preserved information. However, due to its narrow layer structure, DenseNet becomes parametrically expensive as the number of feature maps increases. The direct access of all layers to the gradients via the loss function enhances the information flow across the whole network. In addition, this has a regularizing effect, which reduces overfitting on tasks with small training sets. Figure  22 shows the architecture of the DenseNet network.

figure 22

(adopted from [ 112 ])

The architecture of DenseNet Network
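The connection count and the effect of concatenation can be made concrete with a short sketch. The parameter name `growth_rate`, meaning the number of feature maps each layer adds, and the numeric values are assumptions introduced for illustration.

```python
def dense_connections(num_layers):
    """Direct connections when every layer feeds all later layers: l(l+1)/2."""
    return num_layers * (num_layers + 1) // 2


def dense_block_channels(c_in, growth_rate, num_layers):
    """Input channels seen by each layer when earlier outputs are concatenated."""
    return [c_in + i * growth_rate for i in range(num_layers)]


connections = dense_connections(5)
channels = dense_block_channels(c_in=16, growth_rate=12, num_layers=4)
```

Five densely connected layers have 15 direct connections instead of 5, and the input width of each layer grows linearly because features are concatenated rather than added.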

ResNext is an enhanced version of the Inception Network [ 115 ]. It is also known as the Aggregated Residual Transform Network. Cardinality, a new term introduced by [ 115 ], exploits the split, transform, and merge topology in a simple and effective way. It denotes the size of the transformation set as an extra dimension [ 116 , 117 , 118 ]. The Inception network manages network resources efficiently and enhances the learning ability of the conventional CNN; however, its transformation branches use different spatial embeddings (e.g. \(5\times 5\) , \(3\times 3\) , and \(1\times 1\) ), so each layer must be customized separately. By contrast, ResNext derives its characteristic features from ResNet, VGG, and Inception. It combines the deep homogeneous topology of VGG with the basic architecture of GoogLeNet by fixing the spatial resolution to \(3\times 3\) filters inside the split, transform, and merge blocks. Figure  23 shows the ResNext building blocks. ResNext uses multiple transformations inside the split, transform, and merge blocks and defines these transformations in terms of cardinality. As Xie et al. showed, increasing the cardinality significantly improves the performance. The complexity of ResNext is regulated by employing \(1\times 1\) filters (low embeddings) ahead of a \(3\times 3\) convolution, while skip connections are used for optimized training [ 115 ].

figure 23

The basic block diagram for the ResNext building blocks
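The role of cardinality can be illustrated by counting the weights of a split, transform, and merge block. The sketch assumes each branch consists of a \(1\times 1\) reduction, a \(3\times 3\) transformation, and a \(1\times 1\) expansion, with biases ignored; the function name and the example sizes (256 channels, 32 branches of width 4) are illustrative choices.

```python
def resnext_block_weights(channels, cardinality, branch_width):
    """Weights of `cardinality` parallel branches: 1x1 reduce, 3x3, 1x1 expand."""
    per_branch = (channels * branch_width
                  + 3 * 3 * branch_width * branch_width
                  + branch_width * channels)
    return cardinality * per_branch


many_narrow = resnext_block_weights(channels=256, cardinality=32, branch_width=4)
one_wide = resnext_block_weights(channels=256, cardinality=1, branch_width=64)
```

Thirty-two narrow branches cost about as many weights as one wide bottleneck branch, so cardinality can be raised at roughly constant complexity.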

The feature reuse problem is the core shortcoming of deep residual networks, since certain feature blocks or transformations contribute very little to learning. Zagoruyko and Komodakis [ 119 ] accordingly proposed WideResNet to address this problem. These authors argued that depth has only a supplemental influence, while the residual units carry the core learning ability of deep residual networks. WideResNet exploits the power of the residual block by making the ResNet wider instead of deeper [ 37 ]. It enlarges the width by introducing an extra factor, k, which controls the network width. In other words, it showed that layer widening is a more effective method of performance enhancement than deepening the residual network. While deep residual networks achieve enhanced representational capacity, they also have certain drawbacks, such as the exploding and vanishing gradient problems, the feature reuse problem (inactivation of several feature maps), and time-intensive training. He et al. [ 37 ] tackled the feature reuse problem by including a dropout in each residual block to regularize the network efficiently. In a similar manner, using dropout, Huang et al. [ 120 ] presented the stochastic depth concept to solve the slow learning and vanishing gradient problems. Earlier research focused on increasing the depth; thus, any small enhancement in performance required the addition of several new layers. An experimental study showed that WideResNet has twice the number of parameters of ResNet. Nevertheless, WideResNet can be trained more effectively than deep networks [ 119 ]. Note that most architectures prior to residual networks (including the highly effective VGG and Inception) were wider than ResNet; wider residual networks were established once this was recognized. Moreover, inserting a dropout between the convolutional layers (as opposed to within the residual block) made the learning more effective in WideResNet [ 121 , 122 ].

Pyramidal Net

The depth of the feature map increases in succeeding layers due to the deep stacking of multiple convolutional layers, as shown in previous deep CNN architectures such as ResNet, VGG, and AlexNet. By contrast, the spatial dimension shrinks, since a sub-sampling step follows each convolutional layer. Thus, the richer feature representation is offset by the decreasing size of the feature map. The extreme expansion in the depth of the feature map, alongside the loss of spatial information, interferes with the learning ability of deep CNNs. ResNet obtained notable results for image classification. However, deleting a convolutional block in which both the number of channels and the spatial dimensions vary (channel depth increases while spatial dimension decreases) commonly results in decreased classifier performance. Accordingly, the stochastic ResNet enhanced the performance by decreasing the information loss that accompanies dropping a residual unit. Han et al. [ 123 ] proposed Pyramidal Net to address the ResNet learning interference problem. To address the depth enlargement and extreme reduction in spatial width in ResNet, Pyramidal Net gradually enlarges the residual unit width to cover all possible locations, rather than keeping the same spatial dimension inside all residual blocks until down-sampling occurs. It was named Pyramidal Net because of this gradual, top-down enlargement of the feature map depth. The depth of the feature map, \(d_{l}\) , is regulated according to Eq. 19 .

Here, the dimension of the \(l\)th residual unit is indicated by \(d_{l}\) ; moreover, \(n\) indicates the overall number of residual units, the step factor is indicated by \(\lambda \) , and the depth increase is regulated by the factor \(\frac{\lambda }{n}\) , which distributes the increase uniformly across the feature map dimensions. Zero-padded identity mapping is used to insert the residual connections between the layers. In comparison to projection-based shortcut connections, zero-padded identity mapping requires fewer parameters, which in turn leads to enhanced generalization [ 124 ]. Multiplication- and addition-based widening are two different approaches used in Pyramidal Nets for network widening. More specifically, the first approach (multiplication) enlarges the width geometrically, while the second one (addition) enlarges it linearly [ 92 ]. The main problem associated with the width enlargement is the quadratic growth in the time and space required.
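A minimal sketch of addition-based widening follows, assuming that the depth starts from a base value and grows by \(\lambda / n\) at each of the \(n\) residual units. The base depth of 16, the choice to keep the running depth as a float and floor it only when reported, and the function name are assumptions made for illustration.

```python
import math


def pyramidal_depths(base_depth, step_factor, num_units):
    """Feature-map depth at each unit, growing by step_factor / num_units per unit."""
    depths = [base_depth]
    d = float(base_depth)
    for _ in range(num_units):
        d += step_factor / num_units   # uniform share of the total increase
        depths.append(math.floor(d))   # report an integer channel count
    return depths


depths = pyramidal_depths(base_depth=16, step_factor=48, num_units=4)
```

With a step factor of 48 spread over four units, the depth rises linearly in steps of 12, from 16 to 64, rather than doubling abruptly at each down-sampling stage.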

Xception can be regarded as an extreme Inception architecture. The main idea behind Xception is its depthwise separable convolution [ 125 ]. The Xception model modified the original inception block by making it wider and replacing the different spatial dimensions with a single dimension ( \(3 \times 3\) ) followed by a \(1 \times 1\) convolution to reduce computational complexity. Figure  24 shows the Xception block architecture. The Xception network becomes more computationally efficient by decoupling the channel and spatial correlations. It first maps the convolved output to a low-dimensional embedding by applying \(1 \times 1\) convolutions. It then performs k spatial transformations. Note that k here represents the width-defining cardinality, which is given by the number of transformations in Xception. The computations are simplified in Xception by convolving each channel separately along the spatial axes. The resulting outputs are then passed to \(1 \times 1\) convolutions (pointwise convolutions) to perform cross-channel correlation. The \(1 \times 1\) convolution is used in Xception to regulate the channel depth. Whereas the traditional CNN architecture uses only a single transformation segment and Inception uses three, Xception uses a number of transformation segments equal to the number of channels. The Xception transformation approach achieves better learning efficiency and performance, although it does not reduce the number of parameters [ 126 , 127 ].

Figure 24. The basic block diagram for the Xception block architecture

Residual attention neural network

To improve the network’s feature representation, Wang et al. [ 128 ] proposed the Residual Attention Network (RAN). The main purpose of incorporating attention into the CNN is to enable the network to learn object-aware features. The RAN is a feed-forward CNN that consists of stacked residual blocks combined with attention modules. Each attention module is divided into two branches, namely the mask branch and the trunk branch, which adopt a top-down and bottom-up learning strategy. Encapsulating the two strategies in the attention module supports top-down attention feedback and fast feed-forward processing within a single feed-forward pass. More specifically, the top-down architecture generates dense features to make inferences about every aspect, while the bottom-up feed-forward architecture generates low-resolution feature maps with strong semantic information. A top-down bottom-up strategy was previously employed in restricted Boltzmann machines [ 129 ]. Goh et al. [ 130 ] used a top-down attention mechanism as a regularizing factor in deep Boltzmann machines (DBMs) during the reconstruction phase of training. Similarly, a top-down learning strategy can globally optimize the network by progressively propagating the maps from the output back to the input during the learning process [ 129 , 130 , 131 , 132 ].

The transformation network proposed in a previous study [ 133 ] also incorporated the attention concept into convolutional blocks in a simple way. Unfortunately, its main problem is that it is inflexible and cannot adapt to varying surroundings. By contrast, stacking multiple attention modules has made RAN very effective at recognizing noisy, complex, and cluttered images. RAN’s hierarchical organization allows it to adaptively assign a weight to every feature map according to its importance within the layers. Furthermore, incorporating three distinct levels of attention (spatial, channel, and mixed) enables the model to capture object-aware features at each of these levels.

Convolutional block attention module

The importance of feature-map utilization and of the attention mechanism has been demonstrated by SE-Network and RAN [ 128 , 134 , 135 ]. The convolutional block attention module (CBAM), a novel attention-based CNN module, was first developed by Woo et al. [ 136 ]. This module is similar to SE-Network and simple in design. SE-Network disregards the object’s spatial locality in the image and considers only the channels’ contribution during image classification. In object detection, however, the spatial location of the object plays a significant role. CBAM infers the attention maps sequentially: it applies channel attention before spatial attention to obtain the refined feature maps. Spatial attention is performed using 1 × 1 convolution and pooling functions, as in the literature. An effective feature descriptor can be generated by pooling the features along the spatial axis. In addition, CBAM generates a robust spatial attention map by concatenating the max-pooling and average-pooling operations. In a similar manner, a combination of GAP and max-pooling operations is used to model the feature-map statistics. Woo et al. [ 136 ] demonstrated that using GAP alone yields a sub-optimal inference of channel attention, whereas max pooling provides an indication of the distinguishing object features. Thus, using max pooling and average pooling together enhances the network’s representational power. The refined feature maps both improve the representational power and facilitate a focus on the significant portion of the chosen features. As Woo et al. [ 136 ] showed experimentally, expressing 3D attention maps through a serial learning procedure helps decrease the computational cost and the number of parameters. Note that CBAM can be easily integrated with any CNN architecture.
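The sequential channel-then-spatial refinement described above can be illustrated with a small pure-Python sketch. It is a simplified reading of the text rather than the reference CBAM code: the shared one-hidden-layer MLP, the toy weights, and the use of a \(1 \times 1\) convolution over the two pooled maps are assumptions chosen to keep the example short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v, w1, w2):
    """One hidden ReLU layer shared by the average- and max-pooled descriptors."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    """One weight per channel from globally average- and max-pooled features."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(p + q) for p, q in zip(a, m)]


def spatial_attention(x, w_avg, w_max, bias=0.0):
    """One weight per position: pool across channels, then a 1 x 1 convolution."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sigmoid(w_avg * sum(x[k][i][j] for k in range(c)) / c
                     + w_max * max(x[k][i][j] for k in range(c)) + bias)
             for j in range(w)] for i in range(h)]


def cbam(x, w1, w2, w_avg, w_max):
    """Apply channel attention first, then spatial attention."""
    ca = channel_attention(x, w1, w2)
    x = [[[ca[k] * v for v in row] for row in ch] for k, ch in enumerate(x)]
    sa = spatial_attention(x, w_avg, w_max)
    return [[[sa[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in x]


# Toy feature map with 2 channels of size 2 x 2 and arbitrary small weights.
feature_map = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [1.0, 2.0]]]
w1 = [[0.3, -0.2]]       # 2 channels -> 1 hidden unit
w2 = [[0.5], [-0.4]]     # 1 hidden unit -> 2 channels
refined = cbam(feature_map, w1, w2, w_avg=0.7, w_max=0.3)
```

Because both attention maps pass through a sigmoid, every refined value is a down-weighted copy of the corresponding input value.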

Concurrent spatial and channel excitation mechanism

To make the approach valid for segmentation tasks, Roy et al. [ 137 , 138 ] extended the work of Hu et al. [ 134 ] by adding the influence of spatial information to the channel information. Roy et al. [ 137 , 138 ] presented three types of modules: (1) concurrent spatial and channel squeeze and excitation (scSE); (2) exciting spatially and squeezing channel-wise (sSE); (3) exciting channel-wise and squeezing spatially (cSE). For segmentation purposes, they employed auto-encoder-based CNNs and suggested inserting the modules after the encoder and decoder layers. In the first module (scSE), to highlight the object-specific feature maps, they allocated attention to every channel by deriving a scaling factor from both the channel and the spatial information. In the second module (sSE), the spatial locality is given more importance than the feature-map information, as spatial information plays a significant role during segmentation. Therefore, several channel collections are spatially divided and developed so that they can be employed in segmentation. In the final module (cSE), a concept similar to the SE block is used, and the scaling factor is derived from the contribution of the feature maps to object detection [ 137 , 138 ].

CNN is an efficient technique for detecting object features and achieves good recognition performance in comparison with state-of-the-art handcrafted feature detectors. However, CNNs have a number of restrictions: they do not consider certain relations between features, such as orientation, size, and perspective. For instance, when considering a face image, the CNN does not take into account the positions of the various face components (such as mouth, eyes, and nose), so its neurons may activate and recognize a face without taking specific relations (such as size and orientation) into account. Now consider a neuron that holds a probability in addition to feature properties such as size, orientation, and perspective. A neuron/capsule of this type can effectively detect the face along with these different types of information. A capsule network is therefore constructed from many layers of capsule nodes. The CapsuleNet or CapsNet (the initial version of the capsule networks) is formed by an encoding unit that contains three layers of capsule nodes.

For example, the MNIST architecture takes \(28\times 28\) images and applies 256 filters of size \(9\times 9\) with stride 1. The output is therefore \(28-9+1=20\) pixels on each side, with 256 feature maps. Next, these outputs are fed to the first capsule layer, which produces an 8D vector rather than a scalar; in fact, this is a modified convolution layer. Note that \(9\times 9\) filters with stride 2 are employed in this first capsule layer. Thus, the dimension of the output is \((20-9)/2+1=6\) , with the division rounded down. The initial capsules employ \(8\times 32\) filters, which generate 32 × 8 × 6 × 6 outputs (32 for groups, 8 for neurons, while 6 × 6 is the neuron size).
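The size arithmetic above can be checked with the standard convolution output formula. The helper name and the zero-padding default are assumptions introduced for this sketch; the layer settings are the ones quoted in the text.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution, rounding the division down."""
    return (size + 2 * padding - kernel) // stride + 1


conv1 = conv_output_size(28, 9, stride=1)         # first convolution layer
primary = conv_output_size(conv1, 9, stride=2)    # first capsule layer

groups, capsule_dim = 32, 8
num_capsules = groups * primary * primary         # one 8D capsule per group and position
num_values = num_capsules * capsule_dim           # 32 x 8 x 6 x 6 scalars in total
```

With these settings the first capsule layer holds 1152 capsules of 8 values each.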

Figure 25 represents the complete CapsNet encoding and decoding processes. In the CNN context, a max-pooling layer is frequently employed to handle translation changes: it can still detect a feature that has moved, provided that the feature remains within the max-pooling window. The capsule approach, by contrast, is able to detect overlapping features, since each capsule involves a weighted sum of the features from the preceding layer; this is highly significant in detection and segmentation operations.

Figure 25. The complete CapsNet encoding and decoding processes

In conventional CNNs, a single cost function is employed to evaluate the global error, which propagates backward throughout the training process. In such cases, however, the activation of a neuron is not propagated further once the weight between two neurons becomes zero. In iterative dynamic routing with agreement, the signal is instead routed based on the feature parameters rather than through a one-size-fits-all cost function. Sabour et al. [ 139 ] provide more details about this architecture. When using MNIST to recognize handwritten digits, this innovative CNN architecture gives superior accuracy. From the application perspective, this architecture is more suitable for segmentation and detection approaches than for classification approaches [ 140 , 141 , 142 ].

High-resolution network (HRNet)

High-resolution representations are necessary for position-sensitive vision tasks, such as semantic segmentation, object detection, and human pose estimation. In current state-of-the-art frameworks, the input image is encoded as a low-resolution representation using a subnetwork constructed as a series of high-to-low resolution convolutions connected in sequence, such as VGGNet and ResNet. The low-resolution representation is then recovered to a high-resolution one. Alternatively, a novel network referred to as the High-Resolution Network (HRNet) maintains high-resolution representations during the entire process [ 143 , 144 ]. This network has two principal features. First, the high-to-low resolution convolution series are connected in parallel. Second, information is repeatedly exchanged across the resolutions. The advantage is a representation that is more accurate in the spatial domain and richer in the semantic domain. Moreover, HRNet has several applications in the fields of object detection, semantic segmentation, and human pose prediction. For computer vision problems, HRNet represents a more robust backbone. Figure 26 illustrates the general architecture of HRNet.

Figure 26. The general architecture of HRNet

Challenges (limitations) of deep learning and alternate solutions

When employing DL, several difficulties must often be taken into consideration. The most challenging are listed next, together with several possible solutions.

Training data

DL is extremely data-hungry, since it also involves representation learning [ 145 , 146 ]. DL demands a very large amount of data to achieve a well-performing model, i.e. the model performance improves as the amount of data increases (Fig. 27 ). In most cases, the available data are sufficient to obtain a good model. However, sometimes there is a shortage of data for using DL directly [ 87 ]. Three methods have been suggested to address this issue. The first employs the transfer-learning concept after data is collected from similar tasks. Note that while the transferred data will not directly augment the actual data, it will help to enhance both the original input representation of the data and its mapping function [ 147 ]. In this way, the model performance is boosted. A related technique employs a well-trained model from a similar task and fine-tunes its last one or two layers on the limited original data. Refer to [ 148 , 149 ] for a review of different transfer-learning techniques applied in DL. In the second method, data augmentation is performed [ 150 ]. This is very helpful for image data, since image translation, mirroring, and rotation commonly do not change the image label. However, care is needed when applying this technique in some cases, such as with bioinformatics data. For instance, when mirroring an enzyme sequence, the output data may not represent an actual enzyme sequence. In the third method, simulated data can be used to increase the volume of the training set. If the problem is well understood, it is occasionally possible to create simulators based on the physical process, and as much data as needed can then be simulated. An example of meeting the data requirement of DL through simulation is given in Ref. [ 151 ].

Figure 27. The performance of DL regarding the amount of data

Transfer learning

Recent research has revealed a widespread use of deep CNNs, which offer ground-breaking support for solving many classification problems. Generally speaking, deep CNN models require a sizable volume of data to obtain good performance. The common challenge associated with using such models is the lack of training data. Indeed, gathering a large volume of data is an exhausting job, and no fully successful solution is available at this time. The undersized-dataset problem is therefore currently addressed using the transfer learning (TL) technique [ 148 , 149 ], which is highly efficient in addressing the lack of training data. The mechanism of TL involves first training the CNN model on a large volume of data; in the next step, the model is fine-tuned on the small target dataset.

The student-teacher relationship is a suitable way of clarifying TL. First, the teacher gathers detailed knowledge of the subject [ 152 ]. Next, the teacher provides a “course” by conveying the information within a “lecture series” over time. Put simply, the expert (teacher) transfers the knowledge (information) to the learner (student). Similarly, the DL network is trained using a vast volume of data and learns the weights and biases during the training process. These weights are then transferred to a different network for testing or retraining on a similar new task. Thus, the new model starts from pre-trained weights rather than requiring training from scratch. Figure 28 illustrates the conceptual diagram of the TL technique.
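A minimal sketch of this weight-transfer idea, in plain Python, is given below. Everything in it is hypothetical and chosen for brevity: the "pre-trained" extractor is a fixed two-unit ReLU layer rather than a real CNN, the target dataset has four samples, and only a new logistic output layer is trained while the transferred weights stay frozen.

```python
import math

# Hypothetical "pre-trained" feature extractor: a fixed linear map followed by
# ReLU. In practice these weights would come from training on a large dataset.
PRETRAINED_W = [[1.0, -1.0], [-1.0, 1.0]]


def features(x):
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in PRETRAINED_W]


def predict(head, x):
    f = features(x)
    z = sum(w * v for w, v in zip(head["w"], f)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(data, epochs=200, lr=0.5):
    """Train only the new output layer; the transferred weights stay frozen."""
    head = {"w": [0.0, 0.0], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            err = predict(head, x) - y   # gradient of the log loss w.r.t. the logit
            head["w"] = [w - lr * err * v for w, v in zip(head["w"], f)]
            head["b"] -= lr * err
    return head


# Small target dataset: label 1 when the first coordinate is the larger one.
target_data = [([2.0, 0.5], 1), ([1.5, 0.2], 1), ([0.3, 1.8], 0), ([0.1, 1.2], 0)]
frozen_before = [row[:] for row in PRETRAINED_W]
head = fine_tune(target_data)
accuracy = sum((predict(head, x) > 0.5) == bool(y)
               for x, y in target_data) / len(target_data)
```

Only the head parameters change during fine-tuning, which is why a handful of target samples can be enough in this toy setting.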

Pre-trained models: Many CNN models, e.g. AlexNet [ 30 ], GoogleNet [ 103 ], and ResNet [ 37 ], have been trained on large datasets such as ImageNet for image recognition purposes. These models can then be employed for a different task without the need to train from scratch. Furthermore, the weights remain the same apart from a few learned features. These models are very useful in cases where data samples are lacking. There are several reasons for employing a pre-trained model. First, training large models on sizeable datasets requires expensive computational power. Second, training large models can be time-consuming, taking up to several weeks. Finally, a pre-trained model can assist with network generalization and speed up convergence.

A research problem using pre-trained models: Training a DL approach requires a massive number of images, so obtaining good performance is a challenge when few are available. Deep convolutional neural networks (DCNNs) with many layers can achieve excellent outcomes in image classification or recognition applications, with performance occasionally superior to that of a human, if a huge amount of data is available [ 37 , 148 , 153 ]. However, avoiding overfitting in such applications requires sizable datasets and DCNN models that generalize properly. There is no strict lower limit on the dataset size when training a DCNN model. However, the accuracy of the model becomes insufficient if the model has few layers or if a small dataset is used for training, due to under- or over-fitting problems. Models with fewer layers have poor accuracy because they are unable to exploit the hierarchical features of sizable datasets. It is difficult to acquire sufficient training data for DL models. For example, in medical imaging and environmental science, gathering labelled datasets is very costly [ 148 ]. Moreover, the majority of crowdsourcing workers are unable to annotate medical or biological images accurately due to their lack of medical or biological knowledge. Thus, ML researchers often rely on field experts to label such images; however, this process is costly and time-consuming. Therefore, producing the large volume of labels required to develop successful deep networks turns out to be unfeasible. Recently, TL has been widely employed to address the latter issue. Nevertheless, although TL enhances the accuracy of several tasks in the fields of pattern recognition and computer vision [ 154 , 155 ], there is an essential issue related to the type of source data used by TL as compared to the target dataset.
For instance, CNN models for medical image classification are commonly pre-trained on the ImageNet dataset, which contains natural images, in an attempt to enhance their performance [ 153 ]. However, such natural images are completely dissimilar from raw medical images, meaning that the model performance is not necessarily enhanced. It has further been shown that TL from a different domain does not significantly affect performance on medical imaging tasks, as lightweight models trained from scratch perform nearly as well as standard ImageNet-transferred models [ 156 ]. Therefore, there exist scenarios in which using pre-trained models is not a worthwhile solution. In 2020, some researchers utilized same-domain TL and achieved excellent results [ 86 , 87 , 88 , 157 ]. Same-domain TL is an approach that uses images similar to the target dataset for pre-training; for example, X-ray images of different chest diseases are used to train the model, which is then fine-tuned on chest X-ray images for COVID-19 diagnosis. More details about same-domain TL and how to implement the fine-tuning process can be found in [ 87 ].

Figure 28. The conceptual diagram of the TL technique

Data augmentation techniques

If the goal is to increase the amount of available data and avoid overfitting, data augmentation techniques are one possible solution [ 150 , 158 , 159 ]. These techniques are data-space solutions for any limited-data problem. Data augmentation incorporates a collection of methods that improve the attributes and size of training datasets, so DL networks can perform better when these techniques are employed. Some data augmentation techniques are listed next.

Flipping: Flipping the horizontal axis is more common than flipping the vertical one. Flipping has proven valuable on datasets such as ImageNet and CIFAR-10, and it is very simple to implement. However, it is not a label-preserving transformation on datasets that involve text recognition (such as SVHN and MNIST).

Color space: Digital image data is commonly encoded as a tensor of dimensions ( \(\text{height} \times \text{width} \times \text{color channels}\) ). Performing augmentations in the color space of the channels is an alternative technique that is very practical to implement. A very easy color augmentation involves isolating a single color channel, such as Red, Green, or Blue. An image can be rapidly converted to its representation in a single color channel by isolating that matrix and adding two zero matrices in place of the remaining two color channels. Furthermore, the image brightness can be increased or decreased by using straightforward matrix operations to manipulate the RGB values. More advanced color augmentations can be obtained by deriving a color histogram that describes the image. Lighting alterations are also made possible by adjusting the intensity values in histograms, similar to those employed in photo-editing applications.

Cropping: Cropping a central patch of every image is a practical processing step for image data with mixed height and width dimensions. Furthermore, random cropping may be employed to produce an effect similar to translations. The difference between the two is that translations preserve the spatial dimensions of the image, while random cropping reduces the input size [for example from (256, 256) to (224, 224)]. Depending on the reduction threshold selected for cropping, the transformation may not be label-preserving.

Rotation: Rotation augmentations are obtained by rotating an image left or right about an axis by an angle between 0 and 360 degrees. The suitability of rotation augmentations is largely determined by the rotation degree parameter. In digit recognition tasks, small rotations (from 0 to 20 degrees) are very helpful. By contrast, as the rotation degree increases, the data label may no longer be preserved after the transformation.

Translation: Shifting the image up, down, left, or right is a very useful transformation for avoiding positional bias within the image data. For instance, if all the images in a dataset are centered, which is common, the model would also have to be tested on perfectly centered images. Note that when translating the initial images in a particular direction, the remaining space should be filled with Gaussian or random noise, or with a constant value such as 0 or 255. This padding preserves the spatial dimensions of the image after augmentation.

Noise injection: This approach involves injecting a matrix of random values, commonly drawn from a Gaussian distribution. Moreno-Barea et al. [ 160 ] tested noise injection on nine datasets taken from the UCI repository [ 161 ]. Injecting noise into images enables the CNN to learn more robust features.
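Several of the augmentations listed above can be sketched on a tiny grayscale image stored as a nested list. The function names, the fixed random seed, and the choice of zero as the fill value are assumptions made for this illustration; real pipelines would normally rely on an image-processing library.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def translate(img, dx, dy, fill=0):
    """Shift by (dx, dy) pixels and pad the uncovered area with a constant."""
    h, w = len(img), len(img[0])
    return [[img[i - dy][j - dx] if 0 <= i - dy < h and 0 <= j - dx < w else fill
             for j in range(w)] for i in range(h)]


def random_crop(img, size, rng):
    """Cut a random size x size patch, which reduces the input size."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flipped = hflip(image)
shifted = translate(image, 1, 0)        # one pixel to the right, zero padding
cropped = random_crop(image, 2, rng)
noisy = add_gaussian_noise(image, 0.1, rng)
```

Note that translation keeps the 3 × 3 shape through padding, whereas the crop returns a smaller 2 × 2 patch, matching the distinction drawn in the text.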

Geometric transformations are very good solutions for positional biases present in the training data. Several potential sources of bias can separate the distribution of the testing data from that of the training data. For instance, the problem of positional bias emerges when all faces are perfectly centered within the frames (as in facial recognition datasets), in which case geometric translations are the best solution. Geometric transformations are also helpful because they are simple to implement: several image-processing libraries are available that make it easy to begin with simple operations such as rotation or horizontal flipping. Additional training time, higher computational costs, and additional memory are some shortcomings of geometric transformations. Furthermore, some geometric transformations (such as arbitrary cropping or translation) should be manually checked to ensure that they do not change the image label. Finally, the biases that separate the test data from the training data are often more complicated than translational and positional changes. Hence, deciding when and where geometric transformations are suitable is not trivial.

Imbalanced data

Commonly, biological data tend to be imbalanced, as negative samples are much more numerous than positive ones [ 162 , 163 , 164 ]. For example, the volume of normal X-ray images is very large compared to that of COVID-19-positive X-ray images. Training a DL model on imbalanced data may produce undesirable results. The following techniques are used to address this issue. First, it is necessary to employ the correct criteria for evaluating the loss as well as the prediction result. With imbalanced data, the model should perform well on small classes as well as larger ones; thus, the area under the curve (AUC) should be employed as the loss as well as the evaluation criterion [ 165 ]. Second, if the cross-entropy loss is still preferred, the weighted cross-entropy loss should be employed, which ensures that the model performs well on small classes. In addition, during model training, it is possible either to down-sample the large classes or to up-sample the small classes. Finally, since a biological system frequently has a hierarchical label space, it is possible to construct models for every hierarchical level to make the data balanced, as in Ref. [ 166 ]. The effect of imbalanced data on the performance of DL models has been comprehensively investigated, and the most frequently used techniques for lessening the problem have been compared. Note, however, that these techniques are not specific to biological problems.
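As an illustration of the weighted cross-entropy idea, the sketch below weights each class by its inverse frequency. The weighting formula, the function names, and the toy predictions are assumptions made for the example; other weighting schemes are equally possible.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = sum(weights[y] for y in labels)
    return sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)) / total


# Toy dataset: 8 negative samples and 2 positive samples.
labels = [0] * 8 + [1] * 2
weights = class_weights(labels)

# Hypothetical predictions: confident on the majority class, wrong on the minority.
probs = [[0.9, 0.1]] * 8 + [[0.6, 0.4]] * 2
plain_loss = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
weighted_loss = weighted_cross_entropy(probs, labels, weights)
```

The weighted loss is larger here because mistakes on the small class are no longer drowned out by the many easy majority samples.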

Interpretability of data

DL techniques are often regarded as black boxes; in fact, they can be interpreted. A method of interpreting DL, which is used to obtain the valuable motifs and patterns recognized by the network, is needed in many fields, such as bioinformatics [ 167 ]. In disease diagnosis, it is necessary to know not only the diagnosis or prediction result of a trained DL model, but also the evidence on which the model bases its decision, in order to increase confidence in the prediction outcome [ 168 ]. To achieve this, an importance score can be assigned to every portion of a particular example. For this purpose, back-propagation-based techniques or perturbation-based approaches are used [ 169 ]. In the perturbation-based approaches, a portion of the input is changed and the effect of this change on the model output is observed [ 170 , 171 , 172 , 173 ]. This concept is simple to understand but has high computational complexity. In the back-propagation-based techniques, on the other hand, the signal from the output is propagated back to the input layer to determine the importance score of the various input portions. These techniques have been proven valuable in [ 174 ]. Note that model interpretability can have different meanings in different scenarios.
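A perturbation-based importance score can be sketched as follows. The linear `toy_model` is a hypothetical stand-in for a trained network, and replacing one input position at a time with a zero baseline is just one simple choice of perturbation.

```python
def perturbation_importance(model, x, baseline=0.0):
    """Score each input position by the output change when it is replaced by `baseline`."""
    reference = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline
        scores.append(reference - model(perturbed))
    return scores


# Hypothetical stand-in for a trained network: a fixed linear scorer.
def toy_model(x):
    weights = [0.5, -2.0, 0.0, 3.0]
    return sum(w * v for w, v in zip(weights, x))


scores = perturbation_importance(toy_model, [1.0, 1.0, 1.0, 1.0])
most_important = max(range(len(scores)), key=lambda i: abs(scores[i]))
```

The model is evaluated once per input position, which reflects the high computational cost noted above when inputs are large.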

Uncertainty scaling

When employing DL techniques for prediction, the final prediction label is commonly not the only output required; a confidence score for every query to the model is also desired. The confidence score is defined as how confident the model is in its prediction [ 175 ]. Since the confidence score prevents belief in unreliable and misleading predictions, it is a significant attribute regardless of the application scenario. In biology, the confidence score reduces the resources and time expended in verifying the outcomes of misleading predictions. In healthcare and similar applications, uncertainty scaling is frequently very significant; it helps in evaluating automated clinical decisions and the reliability of machine learning-based disease diagnosis [ 176 , 177 ]. Because different DL models can output overconfident predictions, the probability score (obtained directly from the softmax output of the DL model) is often not in the correct scale [ 178 ], and the softmax output requires post-scaling to yield a reliable probability score. Several techniques have been introduced for outputting the probability score in the correct scale, including Bayesian Binning into Quantiles (BBQ) [ 179 ], isotonic regression [ 180 ], histogram binning [ 181 ], and the well-known Platt scaling [ 182 ]. More specifically, for DL techniques, temperature scaling was recently introduced, which achieves superior performance compared to the other techniques.
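Temperature scaling divides the logits by a single scalar T before the softmax. The sketch below picks T by a simple grid search on the validation negative log-likelihood; the grid search, the candidate range, and the toy logits are assumptions made for illustration (gradient-based fitting is more common in practice).

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(softmax(z, temperature)[y])
                for z, y in zip(logit_rows, labels)) / len(labels)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate T with the lowest validation NLL."""
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Hypothetical validation logits from an overconfident model: the last
# prediction is wrong although its logits are as extreme as the others.
val_logits = [[8.0, 0.0], [6.0, 0.0], [7.0, 0.0], [5.0, 0.0]]
val_labels = [0, 0, 0, 1]
candidates = [0.5 + 0.25 * i for i in range(39)]   # 0.5, 0.75, ..., 10.0
T = fit_temperature(val_logits, val_labels, candidates)

before = softmax(val_logits[0])
after = softmax(val_logits[0], T)
```

Dividing by T greater than 1 softens the probabilities without changing which class is predicted.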

Catastrophic forgetting

Catastrophic forgetting refers to the inability of a plain DL model to incorporate new information without interfering with the information it has already learned. For instance, consider a case where there are 1000 types of flowers and a model is trained to classify these flowers, after which a new type of flower is introduced; if the model is fine-tuned only with this new class, its performance on the older classes will deteriorate [ 183 , 184 ]. Data are continually collected and renewed, which is in fact a highly typical scenario in many fields, e.g. biology. A direct solution to this issue involves employing the old and new data to train an entirely new model from scratch. This solution is time-consuming and computationally intensive; furthermore, it leads to an unstable state for the learned representation of the initial data. At present, three different types of ML techniques are available to address this problem, founded on neurophysiological theories of the human brain, which does not suffer from catastrophic forgetting [ 185 , 186 ]. Techniques of the first type are founded on regularization, such as EWC [ 183 ]. Techniques of the second type employ rehearsal training and dynamic neural network architectures, such as iCaRL [ 187 , 188 ]. Finally, techniques of the third type are founded on dual-memory learning systems [ 189 ]. Refer to [ 190 , 191 , 192 ] for more details.

Model compression

DL models have intensive memory and computational requirements due to their huge complexity and large numbers of parameters, which makes it difficult to deploy well-trained models productively [ 193 , 194 ]. Healthcare and environmental science are among the fields characterized as data-intensive. These requirements limit the deployment of DL on machines with limited computational power, mainly in the healthcare field. The numerous methods of assessing human health and the heterogeneity of the data have made the data far more complicated and vastly larger in size [ 195 ]; thus, the issue requires additional computation [ 196 ]. Novel hardware-based parallel processing solutions such as FPGAs and GPUs [ 197 , 198 , 199 ] have been developed to address the computation issues associated with DL. Recently, numerous techniques for compressing DL models, designed to reduce their computational requirements from the outset, have also been introduced. These techniques can be classified into four classes. In the first class, called parameter pruning, redundant parameters (which have no significant impact on model performance) are removed; this class includes the well-known deep compression method [ 200 ]. In the second class, called knowledge distillation, the distilled knowledge of a larger model is used to train a more compact model [ 201 , 202 ]. In the third class, compact convolution filters are used to reduce the number of parameters [ 203 ]. In the final class, low-rank factorization is used to estimate the informative parameters to be preserved [ 204 ]. These classes cover the most representative techniques for model compression. A more comprehensive discussion of the topic is provided in [ 193 ].
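To make the first class concrete, the following sketch prunes a weight matrix by magnitude. Treating small-magnitude weights as the redundant ones is an assumption of this example (it is one common pruning criterion, not necessarily the one used in the cited works), and the function name and sparsity level are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude.

    Ties at the threshold are pruned as well, so slightly more than the
    requested fraction may be removed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(flat) * sparsity)
    if n_prune == 0:
        return [row[:] for row in weights]
    threshold = flat[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


layer = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.02]]
pruned = magnitude_prune(layer, sparsity=0.5)
remaining = sum(1 for row in pruned for w in row if w != 0.0)
```

The zeroed entries can then be stored in a sparse format, which is where the memory saving comes from.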

Overfitting

DL models have a very high risk of overfitting the data at the training stage due to the vast number of parameters involved, which are correlated in a complex manner. Overfitting reduces the model’s ability to achieve good performance on the test data [ 90 , 205 ]. This problem is not limited to a specific field but arises in different tasks. Therefore, when proposing DL techniques, this problem should be fully considered and accurately handled. As recent studies suggest, the implicit bias of the training process in DL enables the model to overcome crucial overfitting problems [ 205 , 206 , 207 , 208 ]. Even so, it is still necessary to develop techniques that handle the overfitting problem. The available DL algorithms that ease the overfitting problem can be categorized into three classes. The first class acts on both the model architecture and the model parameters and includes the most familiar approaches, such as weight decay [ 209 ], batch normalization [ 210 ], and dropout [ 90 ]. In DL, the default technique is weight decay [ 209 ], which is used extensively in almost all ML algorithms as a universal regularizer. The second class works on the model inputs and includes data corruption and data augmentation [ 150 , 211 ]. One reason for the overfitting problem is the lack of training data, which makes the learned distribution not mirror the real distribution. Data augmentation enlarges the training data, whereas marginalized data corruption improves the solution without explicitly augmenting the data. The final class works on the model output. A recently proposed technique penalizes over-confident outputs to regularize the model [ 178 ]. This technique has demonstrated the ability to regularize RNNs and CNNs.
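As a small illustration of weight decay, the sketch below adds an L2 penalty term to a plain gradient-descent update. The function name, learning rate, and decay coefficient are assumptions chosen for the example.

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One gradient step; weight decay adds `weight_decay * w` to each gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
zero_grads = [0.0, 0.0]

# With a zero data gradient, only the decay term moves the weights.
no_decay = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.0)
decayed = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.5)
```

Each step with decay shrinks every weight toward zero by the factor `1 - lr * weight_decay`, which discourages large parameter values.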

Vanishing gradient problem

In general, when ANNs are trained with backpropagation and gradient-based learning techniques, a problem called the vanishing gradient problem can arise [ 212 , 213 , 214 ]. More specifically, in each training iteration, every weight of the neural network is updated by an amount proportional to the partial derivative of the error function with respect to the current weight. However, this update may effectively not occur when the gradient is vanishingly small, which in the worst case means that no further training is possible and the neural network stops learning completely. The sigmoid function, like some other activation functions, squashes a large input space into a small output range. Thus, the derivative of the sigmoid function is small, because a large variation at the input produces only a small variation at the output. In a shallow network, only a few layers use these activations, so this is not a significant issue. With more layers, however, the gradient becomes very small during the training stage, and the network can no longer be trained effectively. The back-propagation technique is used to determine the gradients of the neural network. This technique determines the derivatives of each layer in the reverse direction, starting from the last layer and progressing back to the first layer, and multiplies the derivatives of each layer together along the way. For instance, when N hidden layers employ an activation function such as the sigmoid function, N small derivatives are multiplied together. Hence, the gradient declines exponentially while propagating back to the first layer. As a result, the biases and weights of the first layers cannot be updated efficiently during the training stage because the gradient is small. Moreover, this condition decreases the overall network accuracy, as these first layers are frequently critical to recognizing the essential elements of the input data. However, such a problem can be avoided by employing activation functions that lack the squashing property, i.e., the ability to squash the input space into a small range. The ReLU [ 91 ], which maps x to max(0, x), is the most popular choice, as it does not yield a small derivative for positive inputs. Another solution involves employing the batch normalization layer [ 81 ]. As mentioned earlier, the problem occurs once a large input space is squashed into a small space, causing the derivative to vanish. Batch normalization mitigates this issue by simply normalizing the input, so that | x | does not reach the outer edges of the sigmoid function. The normalization process makes most of the inputs fall within the region where the sigmoid derivative is large enough for further actions. Furthermore, faster hardware, e.g. that provided by GPUs, can help tackle this issue: it makes standard back-propagation feasible for networks that are many layers deeper than was possible when the vanishing gradient problem was first recognized [ 215 ].
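The exponential decay described above can be checked numerically. The sketch below multiplies the activation derivatives along a single path through N layers and deliberately ignores the weight terms, which is a simplifying assumption; the pre-activation value 0.5 is arbitrary and the helper names are hypothetical.

```python
import math


def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)          # never larger than 0.25


def relu_derivative(x):
    return 1.0 if x > 0 else 0.0  # exactly 1 for positive inputs


def backpropagated_factor(derivative, pre_activations):
    """Product of the per-layer activation derivatives along one path."""
    factor = 1.0
    for x in pre_activations:
        factor *= derivative(x)
    return factor


# Same (arbitrary) pre-activation in every layer, for simplicity.
for n_layers in (2, 10, 30):
    pre_activations = [0.5] * n_layers
    print(n_layers,
          backpropagated_factor(sigmoid_derivative, pre_activations),
          backpropagated_factor(relu_derivative, pre_activations))
```

Because the sigmoid derivative is at most 0.25, the factor for N sigmoid layers is bounded by 0.25 to the power N, whereas the ReLU factor stays at 1 as long as the pre-activations are positive.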

Exploding gradient problem

The exploding gradient problem is the opposite of the vanishing gradient problem. Specifically, large error gradients accumulate during back-propagation [ 216 , 217 , 218 ]. These lead to extremely large updates to the weights of the network, so that the system becomes unstable and the model loses its ability to learn effectively. Roughly speaking, moving backward through the network during back-propagation, the gradient grows exponentially through repeated multiplication of gradients. The weight values can thus become extremely large and may overflow, resulting in not-a-number (NaN) values. Some potential solutions include:

Using different weight regularization techniques.

Redesigning the architecture of the network model.
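The failure mode described before the list, exponential growth followed by overflow and NaN, can be reproduced with ordinary floating-point arithmetic. This is only a numerical illustration under the assumption of a constant per-layer factor greater than one; it does not implement either of the listed solutions, and the factors and step counts are arbitrary.

```python
import math


def repeated_scale(value, factor, steps):
    """Multiply `value` by `factor` `steps` times, mimicking the repeated
    gradient products of back-propagation."""
    for _ in range(steps):
        value *= factor
    return value


grad = repeated_scale(1.0, 1.5, 50)    # grows like 1.5 ** 50
huge = repeated_scale(1.0, 10.0, 400)  # exceeds the float64 range and becomes inf
print(grad)
print(huge, math.isinf(huge))
print(huge - huge, math.isnan(huge - huge))  # inf - inf is NaN
```

Once a value has overflowed to infinity, any subsequent update that subtracts or rescales it can produce NaN, after which training cannot recover without intervention.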

Underspecification

In 2020, a team of computer scientists at Google identified a new challenge called underspecification [ 219 ]. ML models, including DL models, often show surprisingly poor behavior when they are tested in real-world applications such as computer vision, medical imaging, natural language processing, and medical genomics. The team attributed this weak performance to underspecification. It has been shown that small modifications can force a model towards a completely different solution and lead to different predictions in deployment domains. There are different techniques for addressing the underspecification issue. One of them is to design “stress tests” to examine how well a model works on real-world data and to uncover possible issues. Nevertheless, this demands a reliable understanding of the ways in which the model can work inaccurately. The team stated that “Designing stress tests that are well-matched to applied requirements, and that provide good “coverage” of potential failure modes is a major challenge”. Underspecification places major constraints on the credibility of ML predictions and may require reconsidering certain applications. Since ML serves applications that directly affect people, such as medical imaging and self-driving cars, this issue requires proper attention.

Applications of deep learning

Presently, DL applications are widespread around the world. These applications include healthcare, social network analysis, audio and speech processing (such as recognition and enhancement), visual data processing (such as multimedia data analysis and computer vision), and NLP (translation and sentence classification), among others (Fig. 29 ) [ 220 , 221 , 222 , 223 , 224 ]. These applications have been classified into five categories: classification, localization, detection, segmentation, and registration. Although each of these tasks has its own target, there is fundamental overlap in the pipeline implementation of these applications, as shown in Fig. 30 . Classification categorizes a set of data into classes. Detection locates objects of interest in an image against the background. In detection, multiple objects, which could be from different classes, are surrounded by bounding boxes. Localization locates a single object, which is surrounded by a single bounding box. In segmentation (semantic segmentation), the target objects are delineated by outlines along their edges, which also label them. Registration refers to fitting one image (which could be 2D or 3D) onto another. One of the most important and wide-ranging areas of DL application is healthcare [ 225 , 226 , 227 , 228 , 229 , 230 ]. This area of research is critical due to its relation to human lives. Moreover, DL has shown tremendous performance in healthcare. Therefore, we take DL applications in medical image analysis as an example to describe the DL applications.

Fig. 29 Examples of DL applications

Fig. 30 Workflow of deep learning tasks

Classification

Computer-Aided Diagnosis (CADx) is another name sometimes used for classification. Bharati et al. [ 231 ] used a chest X-ray dataset for detecting lung diseases based on a CNN. Another study attempted to read X-ray images by employing a CNN [ 232 ]. For this modality, the comparative accessibility of the images has likely accelerated the progress of DL. The authors of [ 233 ] used an improved pre-trained GoogLeNet CNN with more than 150,000 images for the training and testing processes. This dataset was augmented from 1850 chest X-rays. The authors classified the image orientation into lateral and frontal views and achieved approximately 100% accuracy. Orientation classification by itself has limited clinical use; however, as part of an eventual fully automated diagnosis workflow, this work demonstrated the efficiency of data augmentation and pre-training in learning the metadata of relevant images. Chest infection, commonly referred to as pneumonia, is a commonly occurring health problem worldwide that is extremely treatable. Rajpurkar et al. [ 234 ] utilized CheXNet, which is an improved version of DenseNet [ 112 ] with 121 convolution layers, for classifying fourteen types of disease. These authors used the ChestX-ray14 dataset [ 235 ], which comprises 112,000 images. This network achieved an excellent performance in recognizing fourteen different diseases. In particular, pneumonia classification achieved a 0.7632 AUC score using receiver operating characteristic (ROC) analysis. In addition, the network performed as well as or better than both a three-radiologist panel and four individual radiologists. Zuo et al. [ 236 ] adopted a CNN for candidate classification of lung nodules. Shen et al. [ 237 ] employed both Random Forest (RF) and SVM classifiers with CNNs to classify lung nodules. They employed three parallel CNNs, each with two convolutional layers. 
The LIDC-IDRI (Lung Image Database Consortium) dataset, which contained 1010 labeled CT lung scans, was used to classify the two types of lung nodules (malignant and benign). Each CNN used image patches at a different scale to extract features, and the output feature vector was constructed from the learned features. Next, these vectors were classified as malignant or benign using either the RF classifier or an SVM with a radial basis function (RBF) kernel. The model was robust to various levels of input noise and achieved an accuracy of 86% in nodule classification. In a different application, the model of [ 238 ] interpolates the image data missing between PET and MRI images using 3D CNNs. The Alzheimer Disease Neuroimaging Initiative (ADNI) database, containing 830 PET and MRI patient scans, was utilized in their work. The 3D CNNs were trained with the MRI images as input and the PET images as output. For patients who have no PET images, the trained 3D CNNs were then used to reconstruct the PET images. The disease recognition outcomes obtained with these reconstructed images approximately matched the actual outcomes. However, this approach did not address overfitting, which in turn restricted the technique’s capacity for generalization. Diagnosing normal versus Alzheimer’s disease patients has been achieved by several CNN models [ 239 , 240 ]. Hosseini-Asl et al. [ 241 ] attained 99% accuracy, a state-of-the-art result at the time, in diagnosing normal versus Alzheimer’s disease patients. These authors applied an auto-encoder architecture using 3D CNNs. The generic brain features were pre-trained on the CADDementia dataset. Subsequently, the outputs of these learned features became inputs to higher layers to differentiate between patient scans of Alzheimer’s disease, mild cognitive impairment, or normal brains based on the ADNI dataset and using fine-tuned deep supervision techniques. 
The architectures of VGGNet and residual networks, in that order, were the basis of the VoxCNN and ResNet models developed by Korolev et al. [ 242 ]. They also discriminated between Alzheimer’s disease and normal patients using the ADNI database. Accuracy was 79% for VoxCNN and 80% for ResNet. Both models achieved lower accuracies than Hosseini-Asl’s work. However, as Korolev et al. noted, the implementation of the algorithms was simpler and did not require hand-crafted features. In 2020, Mehmood et al. [ 240 ] trained a CNN-based network that they developed, called “SCNN”, with MRI images for the classification of Alzheimer’s disease. They achieved state-of-the-art results, obtaining an accuracy of 99.05%.

Recently, CNNs have taken some medical imaging classification tasks to a different level, from traditional diagnosis to automated diagnosis with tremendous performance. Examples of these tasks are diabetic foot ulcer (DFU) classification (into normal and abnormal (DFU) classes) [ 87 , 243 , 244 , 245 , 246 ], sickle cell anemia (SCA) classification (into normal, abnormal (SCA), and other blood components) [ 86 , 247 ], breast cancer classification of hematoxylin–eosin-stained breast biopsy images into four classes: invasive carcinoma, in-situ carcinoma, benign tumor, and normal tissue [ 42 , 88 , 248 , 249 , 250 , 251 , 252 ], and multi-class skin cancer classification [ 253 , 254 , 255 ].

In 2020, CNNs played a vital role in the early diagnosis of the novel coronavirus disease (COVID-19). CNNs have become the primary tool for automatic COVID-19 diagnosis in many hospitals around the world using chest X-ray images [ 256 , 257 , 258 , 259 , 260 ]. More details about classification applications in medical imaging can be found in [ 226 , 261 , 262 , 263 , 264 , 265 ].

Localization

Although applications in anatomy education could increase, the practicing clinician is more likely to be interested in the localization of normal anatomy. Localization could also be applied in completely automatic end-to-end applications, in which radiological images are examined and described without human intervention [ 266 , 267 , 268 ]. Zhao et al. [ 269 ] introduced a new deep learning-based approach to localize pancreatic tumors in projection X-ray images for image-guided radiation therapy without the need for fiducials. Roth et al. [ 270 ] constructed and trained a CNN with five convolutional layers to classify around 4000 transverse-axial CT images. These authors used five categories for classification: legs, pelvis, liver, lung, and neck. After data augmentation techniques were applied, they achieved an AUC score of 0.998, and the classification error rate of the model was 5.9%. To detect the positions of the spleen, kidney, heart, and liver, Shin et al. [ 271 ] employed stacked auto-encoders on 78 contrast-enhanced MRI scans of the abdominal region containing the kidneys or liver. Temporal and spatial domains were used to learn the hierarchical features. Depending on the organ, this approach achieved detection accuracies of 62–79%. Sirazitdinov et al. [ 268 ] presented an ensemble of two convolutional neural networks, namely RetinaNet and Mask R-CNN, for pneumonia detection and localization.

Detection

Computer-Aided Detection (CADe) is another name used for detection. For both the clinician and the patient, overlooking a lesion on a scan may have dire consequences. Thus, detection is a field of study requiring both accuracy and sensitivity [ 272 , 273 , 274 ]. Chouhan et al. [ 275 ] introduced an innovative deep learning framework for the detection of pneumonia by adopting the idea of transfer learning. Their approach obtained an accuracy of 96.4% with a recall of 99.62% on unseen data. In the area of COVID-19 and pulmonary disease, several convolutional neural network approaches have been proposed for automatic detection from X-ray images, and these showed excellent performance [ 46 , 276 , 277 , 278 , 279 ].

In the area of skin cancer, several applications have been introduced for the detection task [ 280 , 281 , 282 ]. Thurnhofer-Hemsi et al. [ 283 ] introduced a deep learning approach for skin cancer detection by fine-tuning five state-of-the-art convolutional neural network models. They addressed the issue of a lack of training data by adopting the ideas of transfer learning and data augmentation techniques. The DenseNet201 network showed superior results compared to the other models.

Another interesting area is that of histopathological images, which are progressively being digitized. Several papers have been published in this field [ 284 , 285 , 286 , 287 , 288 , 289 , 290 ]. Human pathologists read these images laboriously; they search for malignancy markers, such as a high index of cell proliferation measured using molecular markers (e.g. Ki-67), signs of cellular necrosis, abnormal cellular architecture, increased numbers of mitotic figures denoting increased cell replication, and enlarged nucleus-to-cytoplasm ratios. Note that a histopathological slide may contain a huge number of cells (up to the thousands). Thus, the risk of overlooking abnormal neoplastic regions is high when wading through these cells at high levels of magnification. Ciresan et al. [ 291 ] employed CNNs of 11–13 layers for identifying mitotic figures. Fifty breast histology images from the MITOS dataset were used. Their technique attained recall and precision scores of 0.7 and 0.88, respectively. Sirinukunwattana et al. [ 292 ] utilized 100 histology images of colorectal adenocarcinoma to detect cell nuclei using CNNs. Roughly 30,000 nuclei were hand-labeled for training purposes. The novelty of this approach was in the use of a Spatially Constrained CNN, which detects the centers of nuclei using the surrounding spatial context and spatial regression. Rather than a CNN, Xu et al. [ 293 ] employed a stacked sparse auto-encoder (SSAE) to identify nuclei in histological slides of breast cancer, achieving recall and precision scores of 0.83 and 0.89, respectively. They showed that unsupervised learning techniques can also be utilized effectively in this field. Albarqouni et al. [ 294 ] investigated the problem of insufficient labeling in medical images. They crowd-sourced the labeling of mitoses in histology images of breast cancer (from amateurs online). 
The recurrent issue of inadequate labeling in the analysis of medical images can be addressed by feeding the crowd-sourced input labels into the CNN. This method represents a remarkable proof-of-concept effort. In 2020, Lei et al. [ 285 ] employed deep convolutional neural networks for the automatic identification of mitotic candidates from histological sections for mitosis screening. They obtained state-of-the-art detection results on the dataset of the International Conference on Pattern Recognition (ICPR) 2012 Mitosis Detection Competition.

Segmentation

Although MRI and CT image segmentation research covers different organs such as knee cartilage, prostate, and liver, most research work has concentrated on brain segmentation, particularly tumors [ 295 , 296 , 297 , 298 , 299 , 300 ]. This issue is highly significant in surgical preparation, where precise tumor boundaries are needed to keep the surgical resection as small as possible. During surgery, excessive sacrificing of key brain regions may lead to neurological deficits including cognitive damage, emotionlessness, and limb difficulty. Conventionally, medical anatomical segmentation was done by hand; more specifically, the clinician draws outlines slice by slice through the complete stack of the CT or MRI volume. Thus, this painstaking work is an ideal candidate for automation. Wadhwa et al. [ 301 ] presented a brief overview of brain tumor segmentation of MRI images. Akkus et al. [ 302 ] wrote a brilliant review of brain MRI segmentation that addressed the different metrics and CNN architectures employed. Moreover, they explain several competitions in detail, as well as their datasets, which included Ischemic Stroke Lesion Segmentation (ISLES), Mild Traumatic brain injury Outcome Prediction (MTOP), and Brain Tumor Segmentation (BRATS).

Chen et al. [ 299 ] proposed convolutional neural networks for precise brain tumor segmentation. Their approach combines several components for better feature learning, including the DeepMedic model, a novel dual-force training scheme, a label distribution-based loss function, and Multi-Layer Perceptron-based post-processing. They evaluated their method on two of the most recent brain tumor segmentation datasets, i.e., BRATS 2017 and BRATS 2015. Hu et al. [ 300 ] introduced a brain tumor segmentation method that adopts a multi-cascaded convolutional neural network (MCCNN) and fully connected conditional random fields (CRFs). The achieved results were excellent compared with the state-of-the-art methods.

Moeskops et al. [ 303 ] employed three parallel-running CNNs, each of which had a 2D input patch of a different size, for segmenting and classifying MRI brain images. These images, from 35 adults and 22 pre-term infants, were classified into various tissue categories such as cerebrospinal fluid, grey matter, and white matter. The benefit of employing three different input patch sizes is that each patch concentrates on capturing different image aspects; here, the bigger sizes incorporated the spatial features, while the smallest patch sizes concentrated on the local textures. In general, the algorithm achieved Dice coefficients in the range of 0.82–0.87, a satisfactory accuracy. Although 2D image slices are employed in the majority of segmentation research, Milletari et al. [ 304 ] implemented a 3D CNN for segmenting MRI prostate images. They used the PROMISE2012 challenge dataset, from which fifty MRI scans were used for training and thirty for testing. Their V-net was inspired by the U-Net architecture of Ronneberger et al. [ 305 ]. This model attained a Dice coefficient score of 0.869, the same as the winning teams in the competition. To reduce overfitting and allow a deeper CNN with 11 convolutional layers, Pereira et al. [ 306 ] intentionally applied small filters of size 3 × 3. Their model used MRI scans of 274 gliomas (a type of brain tumor) for training. They achieved first place in the 2013 BRATS challenge, as well as second place in the 2015 BRATS challenge. Havaei et al. [ 307 ] also considered gliomas using the 2013 BRATS dataset. They investigated different 2D CNN architectures. Compared to the winner of BRATS 2013, their algorithm was faster, as it required only 3 min to execute rather than 100 min. The concept of cascaded architecture formed the basis of their model. Thus, it is referred to as an InputCascadeCNN. 
Chen et al. [ 308 ] introduced techniques including fully connected (FC) Conditional Random Fields (CRFs), atrous spatial pyramid pooling, and up-sampled filters. These authors aimed to enhance the accuracy of localization and enlarge the field of view of every filter at multiple scales. Their model, DeepLab, attained 79.7% mIOU (mean Intersection Over Union), an excellent performance on the PASCAL VOC-2012 image segmentation task.
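The Dice coefficient and Intersection Over Union quoted in the results above are both overlap measures between a predicted mask and a reference mask. The sketch below computes them for flat binary masks; the masks are toy data, and the convention of returning 1.0 when both masks are empty is an assumption made here.

```python
def _check(pred, target):
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")


def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks (0/1 values)."""
    _check(pred, target)
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou(pred, target):
    """IoU = |A intersect B| / |A union B| for binary masks (0/1 values)."""
    _check(pred, target)
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    return 1.0 if union == 0 else intersection / union


# Toy flattened masks: 1 marks a pixel labeled as the target structure.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
print(dice_coefficient(pred, target))
print(iou(pred, target))
```

The mean IoU (mIOU) is obtained by computing the IoU separately for each class and averaging the results.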

Recently, automatic segmentation of COVID-19 lung infection from CT images, based on several deep learning techniques, has helped to detect the development of COVID-19 infection [ 309 , 310 , 311 , 312 ].

Registration

Usually, given two input images, the canonical procedure of the image registration task consists of four main stages [ 313 , 314 ]:

Target Selection: it determines the input image onto which the second input image needs to be accurately superimposed.

Feature Extraction: it computes the set of features extracted from each input image.

Feature Matching: it finds similarities between the previously obtained features.

Pose Optimization: it aims to minimize the distance between both input images.
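As a minimal sketch of the last stage, the example below assumes that the first three stages have already produced matched 2D feature points, and it estimates the rotation and translation that minimize the squared distance between them using the standard closed-form least-squares solution. Restricting the transformation to a 2D rigid motion, and the function names used, are assumptions made for illustration; they do not come from [ 313 , 314 ].

```python
import math


def estimate_rigid_transform(source, target):
    """Least-squares rotation angle and translation mapping `source` onto
    `target`, given equal-length lists of matched (x, y) points."""
    n = len(source)
    if n == 0 or n != len(target):
        raise ValueError("need two non-empty point lists of equal length")
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        px, py, qx, qy = px - sx, py - sy, qx - tx, qy - ty
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    shift = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, shift


def apply_rigid_transform(points, theta, shift):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


# Hypothetical matched features: the target is the source rotated by 30 degrees
# and shifted by (3, -1).
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
target = apply_rigid_transform(source, math.radians(30), (3.0, -1.0))

theta, shift = estimate_rigid_transform(source, target)
aligned = apply_rigid_transform(source, theta, shift)
residual = max(math.hypot(a[0] - b[0], a[1] - b[1])
               for a, b in zip(aligned, target))
print(math.degrees(theta), shift, residual)
```

With noisy or partially wrong matches, the same estimate is typically wrapped in an iterative or robust scheme, which is beyond this sketch.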

Then, the result of the registration procedure is the suitable geometric transformation (e.g. translation, rotation, scaling, etc.) that brings both input images into the same coordinate system in such a way that the distance between them is minimal, i.e. their level of superimposition/overlap is optimal. It is out of the scope of this work to provide an extensive review of this topic. Nevertheless, a short summary is introduced next.

Commonly, the input images for a DL-based registration approach can be in various forms, e.g. point clouds, voxel grids, and meshes. Additionally, some techniques accept as inputs the results of the Feature Extraction or Feature Matching steps of the canonical scheme. Specifically, the outcome can be data in a particular form as well as the result of steps from the classical pipeline (feature vector, matching vector, and transformation). Nevertheless, with the newest DL-based methods, a novel conceptual type of ecosystem arises. It contains acquired characteristics about the target, materials, and their behavior that can be registered with the input data. Such a conceptual ecosystem is formed by a neural network and its training manner, and it can be counted as an input to the registration approach. However, it is not an input that one might adopt in every registration situation, since it corresponds to an internal data representation.

From a DL viewpoint, the interpretation of the conceptual design enables the input data of a registration approach to be differentiated into defined and non-defined models. In particular, defined models depict particular spatial data (e.g. 2D or 3D), while a non-defined model is a generalization of a data set created by a learning system. Yumer et al. [ 315 ] developed a framework in which the model acquires characteristics of objects, meaning that it can identify what a sportier car or a more comfortable chair looks like, and can adjust a 3D model to fit those characteristics while maintaining the main characteristics of the primary data. Likewise, a fundamental aspect of the unsupervised learning method introduced by Ding et al. [ 316 ] is that there is no target for the registration approach. In this instance, the network is capable of placing each input point cloud in a global space, solving SLAM problems in which many point clouds have to be registered rigidly. On the other hand, Mahadevan [ 317 ] proposed the combination of two conceptual models utilizing the growth of Imagination Machines to give flexible artificial intelligence systems and relationships between the learned phases through training schemes that are not based on labels and classifications. Another practical application of DL, especially CNNs, to image registration is the 3D reconstruction of objects. Wang et al. [ 318 ] applied an adversarial approach using CNNs to rebuild a 3D model of an object from its 2D image. The network learns many objects and implicitly accomplishes the registration between the image and the conceptual model. Similarly, Hermoza et al. [ 319 ] utilized a GAN for predicting the missing geometry of damaged archaeological objects, providing the reconstructed object in a voxel grid format together with a label selecting its class.

DL for medical image registration has numerous applications, which have been listed in several review papers [ 320 , 321 , 322 ]. Yang et al. [ 323 ] implemented stacked convolutional layers as an encoder-decoder approach to predict the morphing of the input pixel into its final configuration using MRI brain scans from the OASIS dataset. They employed a registration model known as Large Deformation Diffeomorphic Metric Mapping (LDDMM) and attained remarkable improvements in computation time. Miao et al. [ 324 ] used synthetic X-ray images to train a five-layer CNN to register 3D models of a trans-esophageal probe, a hand implant, and a knee implant onto 2D X-ray images for pose estimation. They determined that their model achieved an execution time of 0.1 s, an important improvement over conventional intensity-based registration techniques; moreover, it achieved effective registrations 79–99% of the time. Li et al. [ 325 ] introduced a neural network-based approach for the non-rigid 2D–3D registration of the lateral cephalogram and the volumetric cone-beam CT (CBCT) images.

Computational approaches

For computationally intensive applications, complex ML and DL approaches have rapidly been identified as the most significant techniques and are widely used in different fields. The development and enhancement of algorithms, combined with high computational performance and large datasets, make it possible to effectively execute several applications that were previously either not possible or difficult to take into consideration.

Currently, several standard DNN configurations are available. The interconnection patterns between layers and the total number of layers represent the main differences between these configurations. Table 2 illustrates the growth rate of the overall number of layers over time, which seems to be far faster than the “Moore’s Law growth rate”. In typical DNNs, the number of layers grew by around 2.3× each year in the period from 2012 to 2016. Recent investigations of future ResNet versions reveal that the number of layers can be extended up to 1000. An SGD technique is employed to fit the weights (or parameters), while different optimization techniques are employed to update the parameters during the DNN training process. Repeated updates are required to improve network accuracy, with each update yielding only a minor improvement. For example, training ResNet on ImageNet, a large dataset that contains more than 14 million images, takes around 30K to 40K iterations to converge to a steady solution. In addition, the overall computational load, as an upper-level estimate, may exceed 10^20 FLOPS when both the training set size and the DNN complexity increase.
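The comparison with Moore's Law can be made concrete with a short calculation. The 2.3× yearly factor and the 2012 to 2016 window come from the text; reading "Moore's Law growth rate" as a doubling every two years is an assumption made here for the sake of the comparison.

```python
def compound_growth(rate_per_year, years):
    """Total growth factor after compounding a yearly rate for `years` years."""
    return rate_per_year ** years


years = 2016 - 2012
layer_growth = compound_growth(2.3, years)         # 2.3x per year, from the text
moore_growth = compound_growth(2.0 ** 0.5, years)  # assumed: doubling every two years

print(f"layers: {layer_growth:.1f}x, Moore's Law: {moore_growth:.1f}x")
```

Under these assumptions the layer count grows by roughly 28× over the four years, against roughly 4× for a two-year doubling.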

Prior to 2008, boosting the training to a satisfactory extent was achieved by using GPUs. Usually, days or weeks are needed for a training session, even with GPU support. Consequently, several optimization strategies were developed to reduce the extensive learning time. The computational requirements are expected to increase as DNNs continuously grow in both complexity and size.

In addition to the computational load, the memory bandwidth and capacity have a significant effect on the overall training performance and, to a lesser extent, on inference. More specifically, in convolutional layers the parameters are shared across the input data, there is a sizeable amount of data reuse, and the computation exhibits a high computation-to-bandwidth ratio. By contrast, in FC layers there are no shared parameters, the amount of reused data is extremely small, and the computation-to-bandwidth ratio is extremely small. Table 3 presents a comparison between different aspects related to the devices. In addition, the table is intended to clarify the tradeoffs involved in choosing the best approach for configuring a system based on FPGA, GPU, or CPU devices. It should be noted that each has corresponding weaknesses and strengths; accordingly, there are no clear one-size-fits-all solutions.
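The contrast in computation-to-bandwidth ratio can be approximated by counting floating-point operations per weight fetched from memory. The sketch assumes that the reuse-heavy case described above corresponds to a convolutional layer, a batch size of one, stride-1 convolution with same padding and no bias, and hypothetical layer sizes; activation traffic is ignored.

```python
def conv_layer_cost(height, width, c_in, c_out, kernel):
    """FLOPs and weight count of a stride-1, same-padded convolution (no bias)."""
    weights = kernel * kernel * c_in * c_out
    # Every weight is reused at each of the height * width output positions,
    # contributing one multiply and one add each time.
    flops = 2 * weights * height * width
    return flops, weights


def fc_layer_cost(n_in, n_out):
    """FLOPs and weight count of a fully connected layer (no bias)."""
    weights = n_in * n_out
    flops = 2 * weights  # each weight is used exactly once per input sample
    return flops, weights


# Hypothetical layer sizes chosen only for illustration.
conv_flops, conv_weights = conv_layer_cost(56, 56, 64, 64, 3)
fc_flops, fc_weights = fc_layer_cost(4096, 4096)

print("conv FLOPs per weight:", conv_flops / conv_weights)
print("FC FLOPs per weight:  ", fc_flops / fc_weights)
```

Larger batch sizes raise the reuse of FC weights, which is one common way of improving their ratio.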

Although GPU processing has enhanced the ability to address the computational challenges related to such networks, the maximum GPU (or CPU) performance is not achieved, and several techniques or models have turned out to be strongly bandwidth-bound. In the worst cases, the GPU efficiency is between 15 and 20% of the maximum theoretical performance. Addressing this issue requires enlarging the memory bandwidth using high-bandwidth stacked memory. Next, different approaches based on FPGA, GPU, and CPU are detailed.

CPU-based approach

CPU nodes usually offer robust network connectivity, storage capabilities, and large memory. Although CPU nodes are more general-purpose than FPGA or GPU nodes, they cannot match them in raw computational capability, since this requires increased network ability and a larger memory capacity.

GPU-based approach

GPUs are extremely effective for several basic DL primitives, which include highly parallel operations such as activation functions, matrix multiplication, and convolutions [ 326 , 327 , 328 , 329 , 330 ]. Incorporating HBM-stacked memory into the latest GPU models significantly enhances the bandwidth. This enhancement allows numerous primitives to efficiently utilize all computational resources of the available GPUs. For dense linear algebra operations, the improvement in GPU performance over CPU performance is usually 10–20:1.

Maximizing parallel processing is the basis of the initial GPU programming model. For example, a GPU model may involve up to sixty-four computational units. There are four SIMD engines per computational unit, and each SIMD has sixteen floating-point computation lanes. The peak performance is 25 TFLOPS (fp16) and 10 TFLOPS (fp32) as utilization approaches 100%. Additional GPU performance may be achieved if the vector addition and multiplication functions are combined into inner-product instructions that match the primitives related to matrix operations.
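The unit counts quoted above determine the number of parallel lanes, and a peak throughput figure then follows from the clock rate. The source does not give a clock rate, so the sketch works backwards: it assumes that each lane completes one fused multiply-add (two floating-point operations) per cycle and computes the clock that would correspond to the quoted 10 TFLOPS (fp32). Both the assumption and the function name are illustrative.

```python
def peak_tflops(compute_units, simd_per_unit, lanes_per_simd, clock_ghz,
                flops_per_lane_per_cycle=2):
    """Theoretical peak in TFLOPS, assuming every lane is busy every cycle."""
    lanes = compute_units * simd_per_unit * lanes_per_simd
    # lanes * FLOPs per cycle * 1e9 cycles per second = GFLOPS; / 1000 = TFLOPS
    return lanes * flops_per_lane_per_cycle * clock_ghz / 1000.0


lanes = 64 * 4 * 16                             # unit counts from the text
clock_for_10_tflops = 10.0 * 1000.0 / (lanes * 2)  # GHz, under the FMA assumption

print("lanes:", lanes)
print("clock needed for 10 TFLOPS (GHz):", round(clock_for_10_tflops, 3))
print("check (TFLOPS):", peak_tflops(64, 4, 16, clock_for_10_tflops))
```

Real workloads reach only a fraction of such a peak, since it presumes 100% utilization of every lane.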

For DNN training, the GPU is usually considered to be an optimized design, while for inference operations, it may also offer considerable performance improvements.

FPGA-based approach

FPGA is widely utilized in various tasks including deep learning [ 199 , 247 , 331 , 332 , 333 , 334 ]. Inference accelerators are commonly implemented using FPGA. The FPGA can be effectively configured to reduce the unnecessary or overhead functions involved in GPU systems. Compared to the GPU, the FPGA has weaker floating-point performance and is largely restricted to integer inference. The main FPGA feature is the capability to dynamically reconfigure the array characteristics (at run-time), as well as the capability to configure the array by means of effective design with little or no overhead.

As mentioned earlier, the FPGA offers better performance and latency per watt than the GPU and CPU in DL inference operations. Implementation of custom high-performance hardware, pruned networks, and reduced arithmetic precision are three factors that enable the FPGA to implement DL algorithms with this level of efficiency. In addition, FPGA may be employed to implement CNN overlay engines with over 80% efficiency, eight-bit precision, and over 15 TOPS peak performance; this has been used for a few conventional CNNs, as Xilinx and partners demonstrated recently. By contrast, pruning techniques are mostly employed in the LSTM context. The sizes of the models can be reduced by up to 20×, which provides an important benefit during the implementation of the optimal solution, as MLP neural processing demonstrated. A recent study on fixed-point precision and custom floating-point formats has revealed that lowering the precision below 8 bits is extremely promising; moreover, it supports further advances toward peak-performance FPGA implementations of DNN models.
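To make two of these factors concrete, the sketch below shows magnitude-based pruning followed by symmetric 8-bit quantization on a toy weight vector. It is a generic illustration under simple assumptions (a single per-tensor scale, a fixed sparsity target, hypothetical function names) and does not reproduce the Xilinx flow or any method from the cited studies.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to the signed range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate real-valued weights."""
    return [v * scale for v in q]


weights = [0.82, -0.03, 0.40, 0.01, -1.27, 0.05, -0.66, 0.002]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

print("pruned:   ", pruned)
print("int8 code:", q)
print("max error:", max(abs(a - b) for a, b in zip(pruned, restored)))
```

After pruning, only the non-zero codes need to be stored and multiplied, and each surviving weight fits in one byte; custom hardware can exploit both properties, which a fixed floating-point datapath cannot do as directly.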

Evaluation metrics

Evaluation metrics adopted within DL tasks play a crucial role in achieving the optimized classifier [ 335 ]. They are utilized within a usual data classification procedure through two main stages: training and testing. During the training stage, the evaluation metric is used to optimize the classification algorithm. That is, it acts as a discriminator that selects the optimized solution, i.e., the one expected to give more accurate predictions in future evaluations of a specific classifier. During the testing stage, the evaluation metric acts as an evaluator that measures the efficiency of the created classifier on unseen data. As given in Eq. 20 , TN and TP are defined as the numbers of negative and positive instances, respectively, that are correctly classified. In addition, FN and FP are defined as the numbers of misclassified positive and negative instances, respectively. Some of the most well-known evaluation metrics are listed below.

Accuracy: Calculates the ratio of correctly predicted classes to the total number of samples evaluated (Eq. 20 ).

Sensitivity or Recall: Utilized to calculate the fraction of positive patterns that are correctly classified (Eq. 21 ).

Specificity: Utilized to calculate the fraction of negative patterns that are correctly classified (Eq. 22 ).

Precision: Utilized to calculate the fraction of patterns predicted as positive that are actually positive (Eq. 23 ).

F1-Score: Calculates the harmonic mean of the recall and precision rates (Eq. 24 ).

J Score: This metric is also called Youden's J statistic. Eq. 25 represents the metric.

False Positive Rate (FPR): This metric refers to the probability of a false alarm, as calculated in Eq. 26 .

Area Under the ROC Curve: AUC is a common ranking-type metric. It is utilized to conduct comparisons between learning algorithms [ 336 , 337 , 338 ], as well as to construct an optimal learning model [ 339 , 340 ]. In contrast to probability and threshold metrics, the AUC value reflects the overall ranking performance of the classifier. The following formula is used to calculate the AUC value for a two-class problem [ 341 ] (Eq. 27 ).

Here, \(S_{p}\) represents the sum of the ranks of all positive samples. The numbers of negative and positive samples are denoted as \(n_{n}\) and \(n_{p}\) , respectively. The AUC value has been verified both empirically and theoretically to be better than the accuracy metric for identifying an optimized solution and evaluating the classifier performance during classification training.
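As a worked illustration of these symbols, the sketch below computes the two-class AUC from ranks, assuming the standard rank-sum (Mann-Whitney) form \(AUC = \left( S_{p} - n_{p}(n_{p}+1)/2\right) / \left( n_{p}\, n_{n}\right) \), where samples are ranked in ascending order of classifier score and tied scores share their average rank. The function names and the toy scores are hypothetical.

```python
def rank_with_ties(scores):
    """Return 1-based ranks in ascending score order; ties get the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of samples that share the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def auc_two_class(scores, labels):
    """AUC = (S_p - n_p (n_p + 1) / 2) / (n_p * n_n), with labels in {0, 1}."""
    ranks = rank_with_ties(scores)
    n_p = sum(1 for y in labels if y == 1)
    n_n = len(labels) - n_p
    if n_p == 0 or n_n == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    s_p = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (s_p - n_p * (n_p + 1) / 2.0) / (n_p * n_n)


scores = [0.1, 0.4, 0.35, 0.8]  # classifier scores (toy values)
labels = [0, 0, 1, 1]           # 1 = positive, 0 = negative
auc = auc_two_class(scores, labels)
print(auc)  # 0.75
```

In this toy case the positives hold ranks 2 and 4, so \(S_{p} = 6\) and the AUC is \((6 - 3)/4 = 0.75\): three of the four positive-negative pairs are ordered correctly.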

When considering the discrimination and evaluation processes, the AUC performance is excellent. However, for multiclass problems, the AUC computation is costly, particularly when discriminating among a large number of generated solutions. Specifically, the time complexity for computing the AUC is \(O \left( |C|^{2} \; n\log n\right) \) with respect to the Hand and Till AUC model [ 341 ] and \(O \left( |C| \; n\log n\right) \) according to Provost and Domingos's AUC model [ 336 ].
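Returning to the threshold metrics of Eqs. 20 to 26, the sketch below computes them from the four confusion-matrix counts. The formulas are the standard forms implied by the verbal definitions in the list above; the function names, the toy labels, and the choice to return NaN for an undefined ratio are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels in {0, 1}."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def ratio(num, den):
    """Divide, returning NaN when the metric is undefined (assumed convention)."""
    return num / den if den else math.nan


def threshold_metrics(tp, tn, fp, fn):
    sensitivity = ratio(tp, tp + fn)           # recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # Harmonic mean of precision and recall.
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        # Youden's J statistic.
        "j_score": sensitivity + specificity - 1,
        # False alarm rate, equal to 1 - specificity.
        "fpr": ratio(fp, fp + tn),
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
metrics = threshold_metrics(*counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these toy labels the counts are TP = 3, TN = 5, FP = 1, and FN = 1, so accuracy is 0.8 while sensitivity and precision are both 0.75, which shows how a single accuracy figure can hide the behavior on the positive class.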

Frameworks and datasets

Several DL frameworks and datasets have been developed in the last few years. Various frameworks and libraries have also been used to expedite the work with good results. Through their use, the training process has become easier. Table  4 lists the most utilized frameworks and libraries.

Based on the star ratings on GitHub, as well as our own background in the field, TensorFlow is deemed the most effective and easy to use. It has the ability to work on several platforms. (GitHub is one of the biggest software hosting sites, and GitHub stars indicate how well-regarded a project is on the site.) Moreover, several benchmark datasets are employed for different DL tasks. Some of these are listed in Table  5 .

Summary and conclusion

Finally, a brief discussion is needed to gather the relevant points made throughout this extensive review. Next, an itemized analysis is presented in order to conclude our review and outline future directions.

DL still has difficulty simultaneously modeling multiple complex data modalities. Multimodal DL has therefore become a common approach in recent DL developments.

DL requires sizeable datasets (labeled data preferred) to predict unseen data and to train the models. This challenge turns out to be particularly difficult when real-time data processing is required or when the provided datasets are limited (such as in the case of healthcare data). To alleviate this issue, TL and data augmentation have been researched over the last few years.

Although ML slowly transitions to semi-supervised and unsupervised learning to manage practical data without the need for manual human labeling, many of the current deep-learning models utilize supervised learning.

The CNN performance is greatly influenced by hyper-parameter selection. Any small change in the hyper-parameter values will affect the general CNN performance. Therefore, careful parameter selection is an extremely significant issue that should be considered during optimization scheme development.

Powerful and robust hardware resources like GPUs are required for effective CNN training. Moreover, they are also required for exploring the efficiency of using CNN in smart and embedded systems.

In the CNN context, ensemble learning [ 342 , 343 ] represents a prospective research area. The collection of different and multiple architectures will support the model in improving its generalizability across different image categories through extracting several levels of semantic image representation. Similarly, ideas such as new activation functions, dropout, and batch normalization also merit further investigation.

Exploiting depth and different structural adaptations significantly improves the CNN learning capacity. Substituting the traditional layer configuration with blocks results in significant advances in CNN performance, as has been shown in the recent literature. Currently, developing novel and efficient block architectures is the main trend in new research on CNN architectures. HRNet is only one example that shows there are always ways to improve the architecture.

It is expected that cloud-based platforms will play an essential role in the future development of computational DL applications. Utilizing cloud computing offers a solution to handling the enormous amount of data. It also helps to increase efficiency and reduce costs. Furthermore, it offers the flexibility to train DL architectures.

With the recent development in computational tools including a chip for neural networks and a mobile GPU, we will see more DL applications on mobile devices. It will be easier for users to use DL.

Regarding the lack of training data, it is expected that various transfer learning techniques will be considered, such as training the DL model on large unlabeled image datasets and then transferring the knowledge to train the DL model on a small number of labeled images for the same task.

Last, this overview provides a starting point for members of the community who are interested in the field of DL. Furthermore, it should help researchers decide on the most suitable direction of work in order to provide more accurate alternatives to the field.

Availability of data and materials

Not applicable.

Rozenwald MB, Galitsyna AA, Sapunov GV, Khrameeva EE, Gelfand MS. A machine learning framework for the prediction of chromatin folding in Drosophila using epigenetic features. PeerJ Comput Sci. 2020;6:307.

Amrit C, Paauw T, Aly R, Lavric M. Identifying child abuse through text mining and machine learning. Expert Syst Appl. 2017;88:402–18.

Hossain E, Khan I, Un-Noor F, Sikander SS, Sunny MSH. Application of big data and machine learning in smart grid, and associated security concerns: a review. IEEE Access. 2019;7:13960–88.

Crawford M, Khoshgoftaar TM, Prusa JD, Richter AN, Al Najada H. Survey of review spam detection using machine learning techniques. J Big Data. 2015;2(1):23.

Deldjoo Y, Elahi M, Cremonesi P, Garzotto F, Piazzolla P, Quadrana M. Content-based video recommendation system based on stylistic visual features. J Data Semant. 2016;5(2):99–113.

Al-Dulaimi K, Chandran V, Nguyen K, Banks J, Tomeo-Reyes I. Benchmarking hep-2 specimen cells classification using linear discriminant analysis on higher order spectra features of cell shape. Pattern Recogn Lett. 2019;125:534–41.

Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE. A survey of deep neural network architectures and their applications. Neurocomputing. 2017;234:11–26.

Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu ML, Chen SC, Iyengar S. A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv (CSUR). 2018;51(5):1–36.

Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Hasan M, Van Essen BC, Awwal AA, Asari VK. A state-of-the-art survey on deep learning theory and architectures. Electronics. 2019;8(3):292.

Potok TE, Schuman C, Young S, Patton R, Spedalieri F, Liu J, Yao KT, Rose G, Chakma G. A study of complex deep learning networks on high-performance, neuromorphic, and quantum computers. ACM J Emerg Technol Comput Syst (JETC). 2018;14(2):1–21.

Adeel A, Gogate M, Hussain A. Contextual deep learning-based audio-visual switching for speech enhancement in real-world environments. Inf Fusion. 2020;59:163–70.

Tian H, Chen SC, Shyu ML. Evolutionary programming based deep learning feature selection and network construction for visual data classification. Inf Syst Front. 2020;22(5):1053–66.

Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing. IEEE Comput Intell Mag. 2018;13(3):55–75.

Koppe G, Meyer-Lindenberg A, Durstewitz D. Deep learning for small and big data in psychiatry. Neuropsychopharmacology. 2021;46(1):176–90.

Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol. 1. IEEE; 2005. p. 886–93.

Lowe DG. Object recognition from local scale-invariant features. In: Proceedings of the seventh IEEE international conference on computer vision, vol. 2. IEEE; 1999. p. 1150–7.

Wu L, Hoi SC, Yu N. Semantics-preserving bag-of-words models and applications. IEEE Trans Image Process. 2010;19(7):1908–20.

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.

Yao G, Lei T, Zhong J. A review of convolutional-neural-network-based action recognition. Pattern Recogn Lett. 2019;118:14–22.

Dhillon A, Verma GK. Convolutional neural network: a review of models, methodologies and applications to object detection. Prog Artif Intell. 2020;9(2):85–112.

Khan A, Sohail A, Zahoora U, Qureshi AS. A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev. 2020;53(8):5455–516.

Hasan RI, Yusuf SM, Alzubaidi L. Review of the state of the art of deep learning for plant diseases: a broad analysis and discussion. Plants. 2020;9(10):1302.

Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X. A review of object detection based on deep learning. Multimed Tools Appl. 2020;79(33):23729–91.

Ker J, Wang L, Rao J, Lim T. Deep learning applications in medical image analysis. IEEE Access. 2017;6:9375–89.

Zhang Z, Cui P, Zhu W. Deep learning on graphs: a survey. IEEE Trans Knowl Data Eng. 2020. https://doi.org/10.1109/TKDE.2020.2981333 .

Shrestha A, Mahmood A. Review of deep learning algorithms and architectures. IEEE Access. 2019;7:53040–65.

Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E. Deep learning applications and challenges in big data analytics. J Big Data. 2015;2(1):1.

Goodfellow I, Bengio Y, Courville A. Deep learning, vol. 1. Cambridge: MIT Press; 2016.

Shorten C, Khoshgoftaar TM, Furht B. Deep learning applications for COVID-19. J Big Data. 2021;8(1):1–54.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.

Bhowmick S, Nagarajaiah S, Veeraraghavan A. Vision and deep learning-based algorithms to detect and quantify cracks on concrete surfaces from uav videos. Sensors. 2020;20(21):6299.

Goh GB, Hodas NO, Vishnu A. Deep learning for computational chemistry. J Comput Chem. 2017;38(16):1291–307.

Li Y, Zhang T, Sun S, Gao X. Accelerating flash calculation through deep learning methods. J Comput Phys. 2019;394:153–65.

Yang W, Zhang X, Tian Y, Wang W, Xue JH, Liao Q. Deep learning for single image super-resolution: a brief review. IEEE Trans Multimed. 2019;21(12):3106–21.

Tang J, Li S, Liu P. A review of lane detection methods based on deep learning. Pattern Recogn. 2020;111:107623.

Zhao ZQ, Zheng P, Xu ST, Wu X. Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst. 2019;30(11):3212–32.

He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 770–8.

Ng A. Machine learning yearning: technical strategy for AI engineers in the era of deep learning. 2019. https://www.mlyearning.org .

Metz C. Turing award won by 3 pioneers in artificial intelligence. The New York Times. 2019;27.

Nevo S, Anisimov V, Elidan G, El-Yaniv R, Giencke P, Gigi Y, Hassidim A, Moshe Z, Schlesinger M, Shalev G, et al. Ml for flood forecasting at scale; 2019. arXiv preprint arXiv:1901.09583 .

Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep learning in drug discovery. Drug Discov Today. 2018;23(6):1241–50.

Benhammou Y, Achchab B, Herrera F, Tabik S. Breakhis based breast cancer automatic diagnosis using deep learning: taxonomy, survey and insights. Neurocomputing. 2020;375:9–24.

Wulczyn E, Steiner DF, Xu Z, Sadhwani A, Wang H, Flament-Auvigne I, Mermel CH, Chen PHC, Liu Y, Stumpe MC. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS ONE. 2020;15(6):e0233678.

Nagpal K, Foote D, Liu Y, Chen PHC, Wulczyn E, Tan F, Olson N, Smith JL, Mohtashamian A, Wren JH, et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med. 2019;2(1):1–10.

Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–8.

Brunese L, Mercaldo F, Reginelli A, Santone A. Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays. Comput Methods Programs Biomed. 2020;196:105608.

Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, Jamshidi M, La Spada L, Mirmozafari M, Dehghani M, et al. Artificial intelligence and COVID-19: deep learning approaches for diagnosis and treatment. IEEE Access. 2020;8:109581–95.

Shorfuzzaman M, Hossain MS. Metacovid: a siamese neural network framework with contrastive loss for n-shot diagnosis of COVID-19 patients. Pattern Recogn. 2020;113:107700.

Carvelli L, Olesen AN, Brink-Kjær A, Leary EB, Peppard PE, Mignot E, Sørensen HB, Jennum P. Design of a deep learning model for automatic scoring of periodic and non-periodic leg movements during sleep validated against multiple human experts. Sleep Med. 2020;69:109–19.

De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, Askham H, Glorot X, O’Donoghue B, Visentin D, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24(9):1342–50.

Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.

Kermany DS, Goldbaum M, Cai W, Valentim CC, Liang H, Baxter SL, McKeown A, Yang G, Wu X, Yan F, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172(5):1122–31.

Van Essen B, Kim H, Pearce R, Boakye K, Chen B. Lbann: livermore big artificial neural network HPC toolkit. In: Proceedings of the workshop on machine learning in high-performance computing environments; 2015. p. 1–6.

Saeed MM, Al Aghbari Z, Alsharidah M. Big data clustering techniques based on spark: a literature review. PeerJ Comput Sci. 2020;6:321.

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–33.

Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.

Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C. Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing; 2013. p. 1631–42.

Goller C, Kuchler A. Learning task-dependent distributed representations by backpropagation through structure. In: Proceedings of international conference on neural networks (ICNN’96), vol 1. IEEE; 1996. p. 347–52.

Socher R, Lin CCY, Ng AY, Manning CD. Parsing natural scenes and natural language with recursive neural networks. In: ICML; 2011.

Louppe G, Cho K, Becot C, Cranmer K. QCD-aware recursive neural networks for jet physics. J High Energy Phys. 2019;2019(1):57.

Sadr H, Pedram MM, Teshnehlab M. A robust sentiment analysis method based on sequential combination of convolutional and recursive neural networks. Neural Process Lett. 2019;50(3):2745–61.

Urban G, Subrahmanya N, Baldi P. Inner and outer recursive neural networks for chemoinformatics applications. J Chem Inf Model. 2018;58(2):207–11.

Hewamalage H, Bergmeir C, Bandara K. Recurrent neural networks for time series forecasting: current status and future directions. Int J Forecast. 2020;37(1):388–427.

Jiang Y, Kim H, Asnani H, Kannan S, Oh S, Viswanath P. Learn codes: inventing low-latency codes via recurrent neural networks. IEEE J Sel Areas Inf Theory. 2020;1(1):207–16.

John RA, Acharya J, Zhu C, Surendran A, Bose SK, Chaturvedi A, Tiwari N, Gao Y, He Y, Zhang KK, et al. Optogenetics inspired transition metal dichalcogenide neuristors for in-memory deep recurrent neural networks. Nat Commun. 2020;11(1):1–9.

Batur Dinler Ö, Aydin N. An optimal feature parameter set based on gated recurrent unit recurrent neural networks for speech segment detection. Appl Sci. 2020;10(4):1273.

Jagannatha AN, Yu H. Structured prediction models for RNN based sequence labeling in clinical text. In: Proceedings of the conference on empirical methods in natural language processing, vol. 2016. NIH Public Access; 2016. p. 856.

Pascanu R, Gulcehre C, Cho K, Bengio Y. How to construct deep recurrent neural networks. In: Proceedings of the second international conference on learning representations (ICLR 2014); 2014.

Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics; 2010. p. 249–56.

Gao C, Yan J, Zhou S, Varshney PK, Liu H. Long short-term memory-based deep recurrent neural networks for target tracking. Inf Sci. 2019;502:279–96.

Zhou DX. Theory of deep convolutional neural networks: downsampling. Neural Netw. 2020;124:319–27.

Jhong SY, Tseng PY, Siriphockpirom N, Hsia CH, Huang MS, Hua KL, Chen YY. An automated biometric identification system using CNN-based palm vein recognition. In: 2020 international conference on advanced robotics and intelligent systems (ARIS). IEEE; 2020. p. 1–6.

Al-Azzawi A, Ouadou A, Max H, Duan Y, Tanner JJ, Cheng J. Deepcryopicker: fully automated deep neural network for single protein particle picking in cryo-EM. BMC Bioinform. 2020;21(1):1–38.

Wang T, Lu C, Yang M, Hong F, Liu C. A hybrid method for heartbeat classification via convolutional neural networks, multilayer perceptrons and focal loss. PeerJ Comput Sci. 2020;6:324.

Li G, Zhang M, Li J, Lv F, Tong G. Efficient densely connected convolutional neural networks. Pattern Recogn. 2021;109:107610.

Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J, et al. Recent advances in convolutional neural networks. Pattern Recogn. 2018;77:354–77.

Fang W, Love PE, Luo H, Ding L. Computer vision for behaviour-based safety in construction: a review and future directions. Adv Eng Inform. 2020;43:100980.

Palaz D, Magimai-Doss M, Collobert R. End-to-end acoustic modeling using convolutional neural networks for hmm-based automatic speech recognition. Speech Commun. 2019;108:15–32.

Li HC, Deng ZY, Chiang HH. Lightweight and resource-constrained learning network for face recognition with performance optimization. Sensors. 2020;20(21):6114.

Hubel DH, Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol. 1962;160(1):106.

Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift; 2015. arXiv preprint arXiv:1502.03167 .

Ruder S. An overview of gradient descent optimization algorithms; 2016. arXiv preprint arXiv:1609.04747 .

Bottou L. Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010. Springer; 2010. p. 177–86.

Hinton G, Srivastava N, Swersky K. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. Cited on. 2012;14(8).

Zhang Z. Improved Adam optimizer for deep neural networks. In: 2018 IEEE/ACM 26th international symposium on quality of service (IWQoS). IEEE; 2018. p. 1–2.

Alzubaidi L, Fadhel MA, Al-Shamma O, Zhang J, Duan Y. Deep learning models for classification of red blood cells in microscopy images to aid in sickle cell anemia diagnosis. Electronics. 2020;9(3):427.

Alzubaidi L, Fadhel MA, Al-Shamma O, Zhang J, Santamaría J, Duan Y, Oleiwi SR. Towards a better understanding of transfer learning for medical imaging: a case study. Appl Sci. 2020;10(13):4523.

Alzubaidi L, Al-Shamma O, Fadhel MA, Farhan L, Zhang J, Duan Y. Optimizing the performance of breast cancer classification by employing the same domain transfer learning from hybrid deep convolutional neural network model. Electronics. 2020;9(3):445.

LeCun Y, Jackel LD, Bottou L, Cortes C, Denker JS, Drucker H, Guyon I, Muller UA, Sackinger E, Simard P, et al. Learning algorithms for classification: a comparison on handwritten digit recognition. Neural Netw Stat Mech Perspect. 1995;261:276.

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.

Dahl GE, Sainath TN, Hinton GE. Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE; 2013. p. 8609–13.

Xu B, Wang N, Chen T, Li M. Empirical evaluation of rectified activations in convolutional network; 2015. arXiv preprint arXiv:1505.00853 .

Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl Based Syst. 1998;6(02):107–16.

Lin M, Chen Q, Yan S. Network in network; 2013. arXiv preprint arXiv:1312.4400 .

Hsiao TY, Chang YC, Chou HH, Chiu CT. Filter-based deep-compression with global average pooling for convolutional networks. J Syst Arch. 2019;95:9–18.

Li Z, Wang SH, Fan RR, Cao G, Zhang YD, Guo T. Teeth category classification via seven-layer deep convolutional neural network with max pooling and global average pooling. Int J Imaging Syst Technol. 2019;29(4):577–83.

Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer; 2014. p. 818–33.

Erhan D, Bengio Y, Courville A, Vincent P. Visualizing higher-layer features of a deep network. Univ Montreal. 2009;1341(3):1.

Le QV. Building high-level features using large scale unsupervised learning. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE; 2013. p. 8595–8.

Grün F, Rupprecht C, Navab N, Tombari F. A taxonomy and library for visualizing learned features in convolutional neural networks; 2016. arXiv preprint arXiv:1606.07757 .

Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition; 2014. arXiv preprint arXiv:1409.1556 .

Ranzato M, Huang FJ, Boureau YL, LeCun Y. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: 2007 IEEE conference on computer vision and pattern recognition. IEEE; 2007. p. 1–8.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. p. 1–9.

Bengio Y, et al. Rmsprop and equilibrated adaptive learning rates for nonconvex optimization; 2015. arXiv preprint arXiv:1502.04390 .

Srivastava RK, Greff K, Schmidhuber J. Highway networks; 2015. arXiv preprint arXiv:1505.00387 .

Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans Smart Grid. 2017;10(1):841–51.

Ordóñez FJ, Roggen D. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors. 2016;16(1):115.

Cireşan D, Meier U, Masci J, Schmidhuber J. Multi-column deep neural network for traffic sign classification. Neural Netw. 2012;32:333–8.

Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, inception-resnet and the impact of residual connections on learning; 2016. arXiv preprint arXiv:1602.07261 .

Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 2818–26.

Wu S, Zhong S, Liu Y. Deep residual learning for image steganalysis. Multimed Tools Appl. 2018;77(9):10437–53.

Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 4700–08.

Rubin J, Parvaneh S, Rahman A, Conroy B, Babaeizadeh S. Densely connected convolutional networks for detection of atrial fibrillation from short single-lead ECG recordings. J Electrocardiol. 2018;51(6):S18-21.

Kuang P, Ma T, Chen Z, Li F. Image super-resolution with densely connected convolutional networks. Appl Intell. 2019;49(1):125–36.

Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 1492–500.

Su A, He X, Zhao X. Jpeg steganalysis based on ResNeXt with gauss partial derivative filters. Multimed Tools Appl. 2020;80(3):3349–66.

Yadav D, Jalal A, Garlapati D, Hossain K, Goyal A, Pant G. Deep learning-based ResNeXt model in phycological studies for future. Algal Res. 2020;50:102018.

Han W, Feng R, Wang L, Gao L. Adaptive spatial-scale-aware deep convolutional neural network for high-resolution remote sensing imagery scene classification. In: IGARSS 2018-2018 IEEE international geoscience and remote sensing symposium. IEEE; 2018. p. 4736–9.

Zagoruyko S, Komodakis N. Wide residual networks; 2016. arXiv preprint arXiv:1605.07146 .

Huang G, Sun Y, Liu Z, Sedra D, Weinberger KQ. Deep networks with stochastic depth. In: European conference on computer vision. Springer; 2016. p. 646–61.

Huynh HT, Nguyen H. Joint age estimation and gender classification of Asian faces using wide ResNet. SN Comput Sci. 2020;1(5):1–9.

Takahashi R, Matsubara T, Uehara K. Data augmentation using random image cropping and patching for deep cnns. IEEE Trans Circuits Syst Video Technol. 2019;30(9):2917–31.

Han D, Kim J, Kim J. Deep pyramidal residual networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 5927–35.

Wang Y, Wang L, Wang H, Li P. End-to-end image super-resolution via deep and shallow convolutional networks. IEEE Access. 2019;7:31959–70.

Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 1251–8.

Lo WW, Yang X, Wang Y. An xception convolutional neural network for malware classification with transfer learning. In: 2019 10th IFIP international conference on new technologies, mobility and security (NTMS). IEEE; 2019. p. 1–5.

Rahimzadeh M, Attar A. A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of xception and resnet50v2. Inform Med Unlocked. 2020;19:100360.

Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X. Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 3156–64.

Salakhutdinov R, Larochelle H. Efficient learning of deep boltzmann machines. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics; 2010. p. 693–700.

Goh H, Thome N, Cord M, Lim JH. Top-down regularization of deep belief networks. Adv Neural Inf Process Syst. 2013;26:1878–86.

Guan J, Lai R, Xiong A, Liu Z, Gu L. Fixed pattern noise reduction for infrared images based on cascade residual attention CNN. Neurocomputing. 2020;377:301–13.

Bi Q, Qin K, Zhang H, Li Z, Xu K. RADC-Net: a residual attention based convolution network for aerial scene classification. Neurocomputing. 2020;377:345–59.

Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks. In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2015. p. 2017–25.

Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 7132–41.

Mou L, Zhu XX. Learning to pay attention on spectral domain: a spectral attention module-based convolutional network for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2019;58(1):110–22.

Woo S, Park J, Lee JY, So Kweon I. CBAM: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV); 2018. p. 3–19.

Roy AG, Navab N, Wachinger C. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In: International conference on medical image computing and computer-assisted intervention. Springer; 2018. p. 421–9.

Roy AG, Navab N, Wachinger C. Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks. IEEE Trans Med Imaging. 2018;38(2):540–9.

Sabour S, Frosst N, Hinton GE. Dynamic routing between capsules. In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2017. p. 3856–66.

Arun P, Buddhiraju KM, Porwal A. Capsulenet-based spatial-spectral classifier for hyperspectral images. IEEE J Sel Topics Appl Earth Obs Remote Sens. 2019;12(6):1849–65.

Xinwei L, Lianghao X, Yi Y. Compact video fingerprinting via an improved capsule net. Syst Sci Control Eng. 2020;9:1–9.

Ma B, Li X, Xia Y, Zhang Y. Autonomous deep learning: a genetic DCNN designer for image classification. Neurocomputing. 2020;379:152–61.

Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X, et al. Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2020. https://doi.org/10.1109/TPAMI.2020.2983686 .

Cheng B, Xiao B, Wang J, Shi H, Huang TS, Zhang L. Higherhrnet: scale-aware representation learning for bottom-up human pose estimation. In: CVPR 2020; 2020. https://www.microsoft.com/en-us/research/publication/higherhrnet-scale-aware-representation-learning-for-bottom-up-human-pose-estimation/ .

Karimi H, Derr T, Tang J. Characterizing the decision boundary of deep neural networks; 2019. arXiv preprint arXiv:1912.11460 .

Li Y, Ding L, Gao X. On the decision boundary of deep neural networks; 2018. arXiv preprint arXiv:1808.05385 .

Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2014. p. 3320–8.

Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: International conference on artificial neural networks. Springer; 2018. p. 270–9.

Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. J Big Data. 2016;3(1):9.

Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):60.

Wang F, Wang H, Wang H, Li G, Situ G. Learning from simulation: an end-to-end deep-learning approach for computational ghost imaging. Opt Express. 2019;27(18):25560–72.

Pan W. A survey of transfer learning for collaborative recommendation with auxiliary data. Neurocomputing. 2016;177:447–53.

Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE; 2009. p. 248–55.

Cook D, Feuz KD, Krishnan NC. Transfer learning for activity recognition: a survey. Knowl Inf Syst. 2013;36(3):537–56.

Cao X, Wang Z, Yan P, Li X. Transfer learning for pedestrian detection. Neurocomputing. 2013;100:51–7.

Raghu M, Zhang C, Kleinberg J, Bengio S. Transfusion: understanding transfer learning for medical imaging. In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2019. p. 3347–57.

Pham TN, Van Tran L, Dao SVT. Early disease classification of mango leaves using feed-forward neural network and hybrid metaheuristic feature selection. IEEE Access. 2020;8:189960–73.

Saleh AM, Hamoud T. Analysis and best parameters selection for person recognition based on gait model using CNN algorithm and image augmentation. J Big Data. 2021;8(1):1–20.

Hirahara D, Takaya E, Takahara T, Ueda T. Effects of data count and image scaling on deep learning training. PeerJ Comput Sci. 2020;6:312.

Moreno-Barea FJ, Strazzera F, Jerez JM, Urda D, Franco L. Forward noise adjustment scheme for data augmentation. In: 2018 IEEE symposium series on computational intelligence (SSCI). IEEE; 2018. p. 728–34.

Dua D, Karra Taniskidou E. UCI machine learning repository. Irvine: University of California, School of Information and Computer Science; 2017. http://archive.ics.uci.edu/ml

Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. J Big Data. 2019;6(1):27.

Yang P, Zhang Z, Zhou BB, Zomaya AY. Sample subset optimization for classifying imbalanced biological data. In: Pacific-Asia conference on knowledge discovery and data mining. Springer; 2011. p. 333–44.

Yang P, Yoo PD, Fernando J, Zhou BB, Zhang Z, Zomaya AY. Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications. IEEE Trans Cybern. 2013;44(3):445–55.

Wang S, Sun S, Xu J. AUC-maximized deep convolutional neural fields for sequence labeling; 2015. arXiv preprint arXiv:1511.05265 .

Li Y, Wang S, Umarov R, Xie B, Fan M, Li L, Gao X. Deepre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics. 2018;34(5):760–9.

Li Y, Huang C, Ding L, Li Z, Pan Y, Gao X. Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods. 2019;166:4–21.

Choi E, Bahadori MT, Sun J, Kulas J, Schuetz A, Stewart W. Retain: An interpretable predictive model for healthcare using reverse time attention mechanism. In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2016. p. 3504–12.

Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15(141):20170387.

Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015;12(10):931–4.

Pokuri BSS, Ghosal S, Kokate A, Sarkar S, Ganapathysubramanian B. Interpretable deep learning for guided microstructure-property explorations in photovoltaics. NPJ Comput Mater. 2019;5(1):1–11.

Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?” explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016. p. 1135–44.

Wang L, Nie R, Yu Z, Xin R, Zheng C, Zhang Z, Zhang J, Cai J. An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nat Mach Intell. 2020;2(11):1–11.

Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks; 2017. arXiv preprint arXiv:1703.01365 .

Platt J, et al. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif. 1999;10(3):61–74.

Nair T, Precup D, Arnold DL, Arbel T. Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation. Med Image Anal. 2020;59:101557.

Herzog L, Murina E, Dürr O, Wegener S, Sick B. Integrating uncertainty in deep neural networks for MRI based stroke analysis. Med Image Anal. 2020;65:101790.

Pereyra G, Tucker G, Chorowski J, Kaiser Ł, Hinton G. Regularizing neural networks by penalizing confident output distributions; 2017. arXiv preprint arXiv:1701.06548 .

Naeini MP, Cooper GF, Hauskrecht M. Obtaining well calibrated probabilities using Bayesian binning. In: Proceedings of the... AAAI conference on artificial intelligence. AAAI conference on artificial intelligence, vol. 2015. NIH Public Access; 2015. p. 2901.

Li M, Sethi IK. Confidence-based classifier design. Pattern Recogn. 2006;39(7):1230–40.

Zadrozny B, Elkan C. Obtaining calibrated probability estimates from decision trees and Naive Bayesian classifiers. In: ICML, vol. 1, Citeseer; 2001. p. 609–16.

Steinwart I. Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans Inf Theory. 2005;51(1):128–42.

Lee K, Lee K, Shin J, Lee H. Overcoming catastrophic forgetting with unlabeled data in the wild. In: Proceedings of the IEEE international conference on computer vision; 2019. p. 312–21.

Shmelkov K, Schmid C, Alahari K. Incremental learning of object detectors without catastrophic forgetting. In: Proceedings of the IEEE international conference on computer vision; 2017. p. 3400–09.

Zenke F, Gerstner W, Ganguli S. The temporal paradox of Hebbian learning and homeostatic plasticity. Curr Opin Neurobiol. 2017;43:166–76.

Andersen N, Krauth N, Nabavi S. Hebbian plasticity in vivo: relevance and induction. Curr Opin Neurobiol. 2017;45:188–92.

Zheng R, Chakraborti S. A phase II nonparametric adaptive exponentially weighted moving average control chart. Qual Eng. 2016;28(4):476–90.

Rebuffi SA, Kolesnikov A, Sperl G, Lampert CH. ICARL: Incremental classifier and representation learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 2001–10.

Hinton GE, Plaut DC. Using fast weights to deblur old memories. In: Proceedings of the ninth annual conference of the cognitive science society; 1987. p. 177–86.

Parisi GI, Kemker R, Part JL, Kanan C, Wermter S. Continual lifelong learning with neural networks: a review. Neural Netw. 2019;113:54–71.

Soltoggio A, Stanley KO, Risi S. Born to learn: the inspiration, progress, and future of evolved plastic artificial neural networks. Neural Netw. 2018;108:48–67.

Parisi GI, Tani J, Weber C, Wermter S. Lifelong learning of human actions with deep neural network self-organization. Neural Netw. 2017;96:137–49.

Cheng Y, Wang D, Zhou P, Zhang T. Model compression and acceleration for deep neural networks: the principles, progress, and challenges. IEEE Signal Process Mag. 2018;35(1):126–36.

Wiedemann S, Kirchhoffer H, Matlage S, Haase P, Marban A, Marinč T, Neumann D, Nguyen T, Schwarz H, Wiegand T, et al. Deepcabac: a universal compression algorithm for deep neural networks. IEEE J Sel Topics Signal Process. 2020;14(4):700–14.

Mehta N, Pandit A. Concurrence of big data analytics and healthcare: a systematic review. Int J Med Inform. 2018;114:57–65.

Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24–9.

Shawahna A, Sait SM, El-Maleh A. FPGA-based accelerators of deep learning networks for learning and classification: a review. IEEE Access. 2018;7:7823–59.

Min Z. Public welfare organization management system based on FPGA and deep learning. Microprocess Microsyst. 2020;80:103333.

Al-Shamma O, Fadhel MA, Hameed RA, Alzubaidi L, Zhang J. Boosting convolutional neural networks performance based on FPGA accelerator. In: International conference on intelligent systems design and applications. Springer; 2018. p. 509–17.

Han S, Mao H, Dally WJ. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding; 2015. arXiv preprint arXiv:1510.00149 .

Chen Z, Zhang L, Cao Z, Guo J. Distilling the knowledge from handcrafted features for human activity recognition. IEEE Trans Ind Inform. 2018;14(10):4334–42.

Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network; 2015. arXiv preprint arXiv:1503.02531 .

Lenssen JE, Fey M, Libuschewski P. Group equivariant capsule networks. In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2018. p. 8844–53.

Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R. Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2014. p. 1269–77.

Xu Q, Zhang M, Gu Z, Pan G. Overfitting remedy by sparsifying regularization on fully-connected layers of CNNs. Neurocomputing. 2019;328:69–74.

Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. Commun ACM. 2018;64(3):107–15.

Xu X, Jiang X, Ma C, Du P, Li X, Lv S, Yu L, Ni Q, Chen Y, Su J, et al. A deep learning system to screen novel coronavirus disease 2019 pneumonia. Engineering. 2020;6(10):1122–9.

Sharma K, Alsadoon A, Prasad P, Al-Dala’in T, Nguyen TQV, Pham DTH. A novel solution of using deep learning for left ventricle detection: enhanced feature extraction. Comput Methods Programs Biomed. 2020;197:105751.

Zhang G, Wang C, Xu B, Grosse R. Three mechanisms of weight decay regularization; 2018. arXiv preprint arXiv:1810.12281 .

Laurent C, Pereyra G, Brakel P, Zhang Y, Bengio Y. Batch normalized recurrent neural networks. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE; 2016. p. 2657–61.

Salamon J, Bello JP. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Process Lett. 2017;24(3):279–83.

Wang X, Qin Y, Wang Y, Xiang S, Chen H. ReLTanh: an activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis. Neurocomputing. 2019;363:88–98.

Tan HH, Lim KH. Vanishing gradient mitigation with deep learning neural network optimization. In: 2019 7th international conference on smart computing & communications (ICSCC). IEEE; 2019. p. 1–4.

MacDonald G, Godbout A, Gillcash B, Cairns S. Volume-preserving neural networks: a solution to the vanishing gradient problem; 2019. arXiv preprint arXiv:1911.09576 .

Mittal S, Vaishay S. A survey of techniques for optimizing deep learning on GPUs. J Syst Arch. 2019;99:101635.

Kanai S, Fujiwara Y, Iwamura S. Preventing gradient explosions in gated recurrent units. In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2017. p. 435–44.

Hanin B. Which neural net architectures give rise to exploding and vanishing gradients? In: Advances in neural information processing systems. San Mateo: Morgan Kaufmann Publishers; 2018. p. 582–91.

Ribeiro AH, Tiels K, Aguirre LA, Schön T. Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness. In: International conference on artificial intelligence and statistics, PMLR; 2020. p. 2370–80.

D’Amour A, Heller K, Moldovan D, Adlam B, Alipanahi B, Beutel A, Chen C, Deaton J, Eisenstein J, Hoffman MD, et al. Underspecification presents challenges for credibility in modern machine learning; 2020. arXiv preprint arXiv:2011.03395 .

Chea P, Mandell JC. Current applications and future directions of deep learning in musculoskeletal radiology. Skelet Radiol. 2020;49(2):1–15.

Wu X, Sahoo D, Hoi SC. Recent advances in deep learning for object detection. Neurocomputing. 2020;396:39–64.

Kuutti S, Bowden R, Jin Y, Barber P, Fallah S. A survey of deep learning applications to autonomous vehicle control. IEEE Trans Intell Transp Syst. 2020;22:712–33.

Yolcu G, Oztel I, Kazan S, Oz C, Bunyak F. Deep learning-based face analysis system for monitoring customer interest. J Ambient Intell Humaniz Comput. 2020;11(1):237–48.

Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, Qu R. A survey of deep learning-based object detection. IEEE Access. 2019;7:128837–68.

Muhammad K, Khan S, Del Ser J, de Albuquerque VHC. Deep learning for multigrade brain tumor classification in smart healthcare systems: a prospective survey. IEEE Trans Neural Netw Learn Syst. 2020;32:507–22.

Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, Van Der Laak JA, Van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.

Mukherjee D, Mondal R, Singh PK, Sarkar R, Bhattacharjee D. Ensemconvnet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications. Multimed Tools Appl. 2020;79(41):31663–90.

Zeleznik R, Foldyna B, Eslami P, Weiss J, Alexander I, Taron J, Parmar C, Alvi RM, Banerji D, Uno M, et al. Deep convolutional neural networks to predict cardiovascular risk from computed tomography. Nature Commun. 2021;12(1):1–9.

Wang J, Liu Q, Xie H, Yang Z, Zhou H. Boosted efficientnet: detection of lymph node metastases in breast cancer using convolutional neural networks. Cancers. 2021;13(4):661.

Yu H, Yang LT, Zhang Q, Armstrong D, Deen MJ. Convolutional neural networks for medical image analysis: state-of-the-art, comparisons, improvement and perspectives. Neurocomputing. 2021. https://doi.org/10.1016/j.neucom.2020.04.157 .

Bharati S, Podder P, Mondal MRH. Hybrid deep learning for detecting lung diseases from X-ray images. Inform Med Unlocked. 2020;20:100391.

Dong Y, Pan Y, Zhang J, Xu W. Learning to read chest X-ray images from 16000+ examples using CNN. In: 2017 IEEE/ACM international conference on connected health: applications, systems and engineering technologies (CHASE). IEEE; 2017. p. 51–7.

Rajkomar A, Lingam S, Taylor AG, Blum M, Mongan J. High-throughput classification of radiographs using deep convolutional neural networks. J Digit Imaging. 2017;30(1):95–101.

Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz C, Shpanskaya K, et al. Chexnet: radiologist-level pneumonia detection on chest X-rays with deep learning; 2017. arXiv preprint arXiv:1711.05225 .

Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 2097–106.

Zuo W, Zhou F, Li Z, Wang L. Multi-resolution CNN and knowledge transfer for candidate classification in lung nodule detection. IEEE Access. 2019;7:32510–21.

Shen W, Zhou M, Yang F, Yang C, Tian J. Multi-scale convolutional neural networks for lung nodule classification. In: International conference on information processing in medical imaging. Springer; 2015. p. 588–99.

Li R, Zhang W, Suk HI, Wang L, Li J, Shen D, Ji S. Deep learning based imaging data completion for improved brain disease diagnosis. In: International conference on medical image computing and computer-assisted intervention. Springer; 2014. p. 305–12.

Wen J, Thibeau-Sutre E, Diaz-Melo M, Samper-González J, Routier A, Bottani S, Dormont D, Durrleman S, Burgos N, Colliot O, et al. Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Med Image Anal. 2020;63:101694.

Mehmood A, Maqsood M, Bashir M, Shuyuan Y. A deep siamese convolution neural network for multi-class classification of Alzheimer disease. Brain Sci. 2020;10(2):84.

Hosseini-Asl E, Ghazal M, Mahmoud A, Aslantas A, Shalaby A, Casanova M, Barnes G, Gimel’farb G, Keynton R, El-Baz A. Alzheimer’s disease diagnostics by a 3d deeply supervised adaptable convolutional network. Front Biosci. 2018;23:584–96.

Korolev S, Safiullin A, Belyaev M, Dodonova Y. Residual and plain convolutional neural networks for 3D brain MRI classification. In: 2017 IEEE 14th international symposium on biomedical imaging (ISBI 2017). IEEE; 2017. p. 835–8.

Alzubaidi L, Fadhel MA, Oleiwi SR, Al-Shamma O, Zhang J. DFU_QUTNet: diabetic foot ulcer classification using novel deep convolutional neural network. Multimed Tools Appl. 2020;79(21):15655–77.

Goyal M, Reeves ND, Davison AK, Rajbhandari S, Spragg J, Yap MH. Dfunet: convolutional neural networks for diabetic foot ulcer classification. IEEE Trans Emerg Topics Comput Intell. 2018;4(5):728–39.

Yap MH, Hachiuma R, Alavi A, Brungel R, Goyal M, Zhu H, Cassidy B, Ruckert J, Olshansky M, Huang X, et al. Deep learning in diabetic foot ulcers detection: a comprehensive evaluation; 2020. arXiv preprint arXiv:2010.03341 .

Tulloch J, Zamani R, Akrami M. Machine learning in the prevention, diagnosis and management of diabetic foot ulcers: a systematic review. IEEE Access. 2020;8:198977–9000.

Fadhel MA, Al-Shamma O, Alzubaidi L, Oleiwi SR. Real-time sickle cell anemia diagnosis based hardware accelerator. In: International conference on new trends in information and communications technology applications, Springer; 2020. p. 189–99.

Debelee TG, Kebede SR, Schwenker F, Shewarega ZM. Deep learning in selected cancers’ image analysis—a survey. J Imaging. 2020;6(11):121.

Khan S, Islam N, Jan Z, Din IU, Rodrigues JJC. A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recogn Lett. 2019;125:1–6.

Alzubaidi L, Hasan RI, Awad FH, Fadhel MA, Alshamma O, Zhang J. Multi-class breast cancer classification by a novel two-branch deep convolutional neural network architecture. In: 2019 12th international conference on developments in eSystems engineering (DeSE). IEEE; 2019. p. 268–73.

Roy K, Banik D, Bhattacharjee D, Nasipuri M. Patch-based system for classification of breast histology images using deep learning. Comput Med Imaging Gr. 2019;71:90–103.

Hameed Z, Zahia S, Garcia-Zapirain B, Javier Aguirre J, María Vanegas A. Breast cancer histopathology image classification using an ensemble of deep learning models. Sensors. 2020;20(16):4373.

Hosny KM, Kassem MA, Foaud MM. Skin cancer classification using deep learning and transfer learning. In: 2018 9th Cairo international biomedical engineering conference (CIBEC). IEEE; 2018. p. 90–3.

Dorj UO, Lee KK, Choi JY, Lee M. The skin cancer classification using deep convolutional neural network. Multimed Tools Appl. 2018;77(8):9909–24.

Kassem MA, Hosny KM, Fouad MM. Skin lesions classification into eight classes for ISIC 2019 using deep convolutional neural network and transfer learning. IEEE Access. 2020;8:114822–32.

Heidari M, Mirniaharikandehei S, Khuzani AZ, Danala G, Qiu Y, Zheng B. Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms. Int J Med Inform. 2020;144:104284.

Al-Timemy AH, Khushaba RN, Mosa ZM, Escudero J. An efficient mixture of deep and machine learning models for COVID-19 and tuberculosis detection using X-ray images in resource limited settings; 2020. arXiv preprint arXiv:2007.08223 .

Abraham B, Nair MS. Computer-aided detection of COVID-19 from X-ray images using multi-CNN and Bayesnet classifier. Biocybern Biomed Eng. 2020;40(4):1436–45.

Nour M, Cömert Z, Polat K. A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization. Appl Soft Comput. 2020;97:106580.

Mallio CA, Napolitano A, Castiello G, Giordano FM, D’Alessio P, Iozzino M, Sun Y, Angeletti S, Russano M, Santini D, et al. Deep learning algorithm trained with COVID-19 pneumonia also identifies immune checkpoint inhibitor therapy-related pneumonitis. Cancers. 2021;13(4):652.

Fourcade A, Khonsari R. Deep learning in medical image analysis: a third eye for doctors. J Stomatol Oral Maxillofac Surg. 2019;120(4):279–88.

Guo Z, Li X, Huang H, Guo N, Li Q. Deep learning-based image segmentation on multimodal medical imaging. IEEE Trans Radiat Plasma Med Sci. 2019;3(2):162–9.

Thakur N, Yoon H, Chong Y. Current trends of artificial intelligence for colorectal cancer pathology image analysis: a systematic review. Cancers. 2020;12(7):1884.

Lundervold AS, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik. 2019;29(2):102–27.

Yadav SS, Jadhav SM. Deep convolutional neural network based medical image classification for disease diagnosis. J Big Data. 2019;6(1):113.

Nehme E, Freedman D, Gordon R, Ferdman B, Weiss LE, Alalouf O, Naor T, Orange R, Michaeli T, Shechtman Y. DeepSTORM3D: dense 3D localization microscopy and PSF design by deep learning. Nat Methods. 2020;17(7):734–40.

Zulkifley MA, Abdani SR, Zulkifley NH. Pterygium-Net: a deep learning approach to pterygium detection and localization. Multimed Tools Appl. 2019;78(24):34563–84.

Sirazitdinov I, Kholiavchenko M, Mustafaev T, Yixuan Y, Kuleev R, Ibragimov B. Deep neural network ensemble for pneumonia localization from a large-scale chest X-ray database. Comput Electr Eng. 2019;78:388–99.

Zhao W, Shen L, Han B, Yang Y, Cheng K, Toesca DA, Koong AC, Chang DT, Xing L. Markerless pancreatic tumor target localization enabled by deep learning. Int J Radiat Oncol Biol Phys. 2019;105(2):432–9.

Roth HR, Lee CT, Shin HC, Seff A, Kim L, Yao J, Lu L, Summers RM. Anatomy-specific classification of medical images using deep convolutional nets. In: 2015 IEEE 12th international symposium on biomedical imaging (ISBI). IEEE; 2015. p. 101–4.

Shin HC, Orton MR, Collins DJ, Doran SJ, Leach MO. Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data. IEEE Trans Pattern Anal Mach Intell. 2012;35(8):1930–43.

Li Z, Dong M, Wen S, Hu X, Zhou P, Zeng Z. CLU-CNNs: object detection for medical images. Neurocomputing. 2019;350:53–9.

Gao J, Jiang Q, Zhou B, Chen D. Convolutional neural networks for computer-aided detection or diagnosis in medical image analysis: an overview. Math Biosci Eng. 2019;16(6):6536.

Lumini A, Nanni L. Review fair comparison of skin detection approaches on publicly available datasets. Expert Syst Appl. 2020. https://doi.org/10.1016/j.eswa.2020.113677 .

Chouhan V, Singh SK, Khamparia A, Gupta D, Tiwari P, Moreira C, Damaševičius R, De Albuquerque VHC. A novel transfer learning based approach for pneumonia detection in chest X-ray images. Appl Sci. 2020;10(2):559.

Apostolopoulos ID, Mpesiana TA. COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med. 2020;43(2):635–40.

Mahmud T, Rahman MA, Fattah SA. CovXNet: a multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization. Comput Biol Med. 2020;122:103869.

Tayarani-N MH. Applications of artificial intelligence in battling against COVID-19: a literature review. Chaos Solitons Fractals. 2020;142:110338.

Toraman S, Alakus TB, Turkoglu I. Convolutional capsnet: a novel artificial neural network approach to detect COVID-19 disease from X-ray images using capsule networks. Chaos Solitons Fractals. 2020;140:110122.

Dascalu A, David E. Skin cancer detection by deep learning and sound analysis algorithms: a prospective clinical study of an elementary dermoscope. EBioMedicine. 2019;43:107–13.

Adegun A, Viriri S. Deep learning techniques for skin lesion analysis and melanoma cancer detection: a survey of state-of-the-art. Artif Intell Rev. 2020;54:1–31.

Zhang N, Cai YX, Wang YY, Tian YT, Wang XL, Badami B. Skin cancer diagnosis based on optimized convolutional neural network. Artif Intell Med. 2020;102:101756.

Thurnhofer-Hemsi K, Domínguez E. A convolutional neural network framework for accurate skin cancer detection. Neural Process Lett. 2020. https://doi.org/10.1007/s11063-020-10364-y .

Jain MS, Massoud TF. Predicting tumour mutational burden from histopathological images using multiscale deep learning. Nat Mach Intell. 2020;2(6):356–62.

Lei H, Liu S, Elazab A, Lei B. Attention-guided multi-branch convolutional neural network for mitosis detection from histopathological images. IEEE J Biomed Health Inform. 2020;25(2):358–70.

Celik Y, Talo M, Yildirim O, Karabatak M, Acharya UR. Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images. Pattern Recogn Lett. 2020;133:232–9.

Sebai M, Wang X, Wang T. Maskmitosis: a deep learning framework for fully supervised, weakly supervised, and unsupervised mitosis detection in histopathology images. Med Biol Eng Comput. 2020;58:1603–23.

Sebai M, Wang T, Al-Fadhli SA. Partmitosis: a partially supervised deep learning framework for mitosis detection in breast cancer histopathology images. IEEE Access. 2020;8:45133–47.

Mahmood T, Arsalan M, Owais M, Lee MB, Park KR. Artificial intelligence-based mitosis detection in breast cancer histopathology images using faster R-CNN and deep CNNs. J Clin Med. 2020;9(3):749.

Srinidhi CL, Ciga O, Martel AL. Deep neural network models for computational histopathology: a survey. Med Image Anal. 2020;67:101813.

Cireşan DC, Giusti A, Gambardella LM, Schmidhuber J. Mitosis detection in breast cancer histology images with deep neural networks. In: International conference on medical image computing and computer-assisted intervention. Springer; 2013. p. 411–8.

Sirinukunwattana K, Raza SEA, Tsang YW, Snead DR, Cree IA, Rajpoot NM. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans Med Imaging. 2016;35(5):1196–206.

Xu J, Xiang L, Liu Q, Gilmore H, Wu J, Tang J, Madabhushi A. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans Med Imaging. 2015;35(1):119–30.

Albarqouni S, Baur C, Achilles F, Belagiannis V, Demirci S, Navab N. Aggnet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Trans Med Imaging. 2016;35(5):1313–21.

Abd-Ellah MK, Awad AI, Khalaf AA, Hamed HF. Two-phase multi-model automatic brain tumour diagnosis system from magnetic resonance images using convolutional neural networks. EURASIP J Image Video Process. 2018;2018(1):97.

Thaha MM, Kumar KPM, Murugan B, Dhanasekeran S, Vijayakarthick P, Selvi AS. Brain tumor segmentation using convolutional neural networks in MRI images. J Med Syst. 2019;43(9):294.

Talo M, Yildirim O, Baloglu UB, Aydin G, Acharya UR. Convolutional neural networks for multi-class brain disease detection using MRI images. Comput Med Imaging Gr. 2019;78:101673.

Gabr RE, Coronado I, Robinson M, Sujit SJ, Datta S, Sun X, Allen WJ, Lublin FD, Wolinsky JS, Narayana PA. Brain and lesion segmentation in multiple sclerosis using fully convolutional neural networks: a large-scale study. Mult Scler J. 2020;26(10):1217–26.

Chen S, Ding C, Liu M. Dual-force convolutional neural networks for accurate brain tumor segmentation. Pattern Recogn. 2019;88:90–100.

Hu K, Gan Q, Zhang Y, Deng S, Xiao F, Huang W, Cao C, Gao X. Brain tumor segmentation using multi-cascaded convolutional neural networks and conditional random field. IEEE Access. 2019;7:92615–29.

Wadhwa A, Bhardwaj A, Verma VS. A review on brain tumor segmentation of MRI images. Magn Reson Imaging. 2019;61:247–59.

Akkus Z, Galimzianova A, Hoogi A, Rubin DL, Erickson BJ. Deep learning for brain MRI segmentation: state of the art and future directions. J Digit Imaging. 2017;30(4):449–59.

Moeskops P, Viergever MA, Mendrik AM, De Vries LS, Benders MJ, Išgum I. Automatic segmentation of MR brain images with a convolutional neural network. IEEE Trans Med Imaging. 2016;35(5):1252–61.

Milletari F, Navab N, Ahmadi SA. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV). IEEE; 2016. p. 565–71.

Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2015. p. 234–41.

Pereira S, Pinto A, Alves V, Silva CA. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging. 2016;35(5):1240–51.

Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, Pal C, Jodoin PM, Larochelle H. Brain tumor segmentation with deep neural networks. Med Image Anal. 2017;35:18–31.

Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell. 2017;40(4):834–48.

Yan Q, Wang B, Gong D, Luo C, Zhao W, Shen J, Shi Q, Jin S, Zhang L, You Z. COVID-19 chest CT image segmentation—a deep convolutional neural network solution; 2020. arXiv preprint arXiv:2004.10987 .

Wang G, Liu X, Li C, Xu Z, Ruan J, Zhu H, Meng T, Li K, Huang N, Zhang S. A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images. IEEE Trans Med Imaging. 2020;39(8):2653–63.

Khan SH, Sohail A, Khan A, Lee YS. Classification and region analysis of COVID-19 infection using lung CT images and deep convolutional neural networks; 2020. arXiv preprint arXiv:2009.08864 .

Shi F, Wang J, Shi J, Wu Z, Wang Q, Tang Z, He K, Shi Y, Shen D. Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19. IEEE Rev Biomed Eng. 2020;14:4–5.

Santamaría J, Rivero-Cejudo M, Martos-Fernández M, Roca F. An overview on the latest nature-inspired and metaheuristics-based image registration algorithms. Appl Sci. 2020;10(6):1928.

Santamaría J, Cordón O, Damas S. A comparative study of state-of-the-art evolutionary image registration methods for 3D modeling. Comput Vision Image Underst. 2011;115(9):1340–54.

Yumer ME, Mitra NJ. Learning semantic deformation flows with 3D convolutional networks. In: European conference on computer vision. Springer; 2016. p. 294–311.

Ding L, Feng C. Deepmapping: unsupervised map estimation from multiple point clouds. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2019. p. 8650–9.

Mahadevan S. Imagination machines: a new challenge for artificial intelligence. AAAI. 2018;2018:7988–93.

Wang L, Fang Y. Unsupervised 3D reconstruction from a single image via adversarial learning; 2017. arXiv preprint arXiv:1711.09312 .

Hermoza R, Sipiran I. 3D reconstruction of incomplete archaeological objects using a generative adversarial network. In: Proceedings of computer graphics international 2018. Association for Computing Machinery; 2018. p. 5–11.

Fu Y, Lei Y, Wang T, Curran WJ, Liu T, Yang X. Deep learning in medical image registration: a review. Phys Med Biol. 2020;65(20):20TR01.

Haskins G, Kruger U, Yan P. Deep learning in medical image registration: a survey. Mach Vision Appl. 2020;31(1):8.

de Vos BD, Berendsen FF, Viergever MA, Sokooti H, Staring M, Išgum I. A deep learning framework for unsupervised affine and deformable image registration. Med Image Anal. 2019;52:128–43.

Yang X, Kwitt R, Styner M, Niethammer M. Quicksilver: fast predictive image registration—a deep learning approach. NeuroImage. 2017;158:378–96.

Miao S, Wang ZJ, Liao R. A CNN regression approach for real-time 2D/3D registration. IEEE Trans Med Imaging. 2016;35(5):1352–63.

Li P, Pei Y, Guo Y, Ma G, Xu T, Zha H. Non-rigid 2D–3D registration using convolutional autoencoders. In: 2020 IEEE 17th international symposium on biomedical imaging (ISBI). IEEE; 2020. p. 700–4.

Zhang J, Yeung SH, Shu Y, He B, Wang W. Efficient memory management for GPU-based deep learning systems; 2019. arXiv preprint arXiv:1903.06631 .

Zhao H, Han Z, Yang Z, Zhang Q, Yang F, Zhou L, Yang M, Lau FC, Wang Y, Xiong Y, et al. Hived: sharing a {GPU} cluster for deep learning with guarantees. In: 14th {USENIX} symposium on operating systems design and implementation ({OSDI} 20); 2020. p. 515–32.

Lin Y, Jiang Z, Gu J, Li W, Dhar S, Ren H, Khailany B, Pan DZ. DREAMPlace: deep learning toolkit-enabled GPU acceleration for modern VLSI placement. IEEE Trans Comput Aided Des Integr Circuits Syst. 2020;40:748–61.

Hossain S, Lee DJ. Deep learning-based real-time multiple-object detection and tracking from aerial imagery via a flying robot with GPU-based embedded devices. Sensors. 2019;19(15):3371.

Castro FM, Guil N, Marín-Jiménez MJ, Pérez-Serrano J, Ujaldón M. Energy-based tuning of convolutional neural networks on multi-GPUs. Concurr Comput Pract Exp. 2019;31(21):4786.

Gschwend D. Zynqnet: an fpga-accelerated embedded convolutional neural network; 2020. arXiv preprint arXiv:2005.06892 .

Zhang N, Wei X, Chen H, Liu W. FPGA implementation for CNN-based optical remote sensing object detection. Electronics. 2021;10(3):282.

Zhao M, Hu C, Wei F, Wang K, Wang C, Jiang Y. Real-time underwater image recognition with FPGA embedded system for convolutional neural network. Sensors. 2019;19(2):350.

Liu X, Yang J, Zou C, Chen Q, Yan X, Chen Y, Cai C. Collaborative edge computing with FPGA-based CNN accelerators for energy-efficient and time-aware face tracking system. IEEE Trans Comput Soc Syst. 2021. https://doi.org/10.1109/TCSS.2021.3059318 .

Hossin M, Sulaiman M. A review on evaluation metrics for data classification evaluations. Int J Data Min Knowl Manag Process. 2015;5(2):1.

Provost F, Domingos P. Tree induction for probability-based ranking. Mach Learn. 2003;52(3):199–215.

Rakotomamonyj A. Optimizing area under roc with SVMS. In: Proceedings of the European conference on artificial intelligence workshop on ROC curve and artificial intelligence (ROCAI 2004), 2004. p. 71–80.

Mingote V, Miguel A, Ortega A, Lleida E. Optimization of the area under the roc curve using neural network supervectors for text-dependent speaker verification. Comput Speech Lang. 2020;63:101078.

Fawcett T. An introduction to roc analysis. Pattern Recogn Lett. 2006;27(8):861–74.

Huang J, Ling CX. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng. 2005;17(3):299–310.

Hand DJ, Till RJ. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn. 2001;45(2):171–86.

Masoudnia S, Mersa O, Araabi BN, Vahabie AH, Sadeghi MA, Ahmadabadi MN. Multi-representational learning for offline signature verification using multi-loss snapshot ensemble of CNNs. Expert Syst Appl. 2019;133:317–30.

Coupé P, Mansencal B, Clément M, Giraud R, de Senneville BD, Ta VT, Lepetit V, Manjon JV. Assemblynet: a large ensemble of CNNs for 3D whole brain MRI segmentation. NeuroImage. 2020;219:117026.

Download references

Acknowledgements

We would like to thank the professors from the Queensland University of Technology and the University of Information Technology and Communications who gave their feedback on the paper.

This research received no external funding.

Author information

Authors and affiliations.

School of Computer Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia

Laith Alzubaidi & Jinglan Zhang

Control and Systems Engineering Department, University of Technology, Baghdad, 10001, Iraq

Amjad J. Humaidi

Electrical Engineering Technical College, Middle Technical University, Baghdad, 10001, Iraq

Ayad Al-Dujaili

Faculty of Electrical Engineering & Computer Science, University of Missouri, Columbia, MO, 65211, USA

Ye Duan & Muthana Al-Amidie

AlNidhal Campus, University of Information Technology & Communications, Baghdad, 10001, Iraq

Laith Alzubaidi & Omran Al-Shamma

Department of Computer Science, University of Jaén, 23071, Jaén, Spain

J. Santamaría

College of Computer Science and Information Technology, University of Sumer, Thi Qar, 64005, Iraq

Mohammed A. Fadhel

School of Engineering, Manchester Metropolitan University, Manchester, M1 5GD, UK

Laith Farhan

Contributions

Conceptualization: LA and JZ; methodology: LA, JZ, and JS; software: LA and MAF; validation: LA, JZ, MA, and LF; formal analysis: LA, JZ, YD, and JS; investigation: LA and JZ; resources: LA, JZ, and MAF; data curation: LA and OA; writing–original draft preparation: LA and OA; writing–review and editing: LA, JZ, AJH, AA, YD, OA, JS, MAF, MA, and LF; visualization: LA and MAF; supervision: JZ and YD; project administration: JZ, YD, and JS; funding acquisition: LA, AJH, AA, and YD. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Laith Alzubaidi.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Alzubaidi, L., Zhang, J., Humaidi, A.J. et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8, 53 (2021). https://doi.org/10.1186/s40537-021-00444-8

Received: 21 January 2021

Accepted: 22 March 2021

Published: 31 March 2021

DOI: https://doi.org/10.1186/s40537-021-00444-8

  • Deep learning
  • Machine learning
  • Convolution neural network (CNN)
  • Deep neural network architectures
  • Deep learning applications
  • Image classification
  • Medical image analysis
  • Supervised learning

URBN1100 Investigating the City: Hands-on Research Methods for Urban Analysis

Citing art and architecture papers

The format used for art papers is usually either MLA or Chicago Manual of Style. The resources below offer examples and explanations. For Brown Library books that can be checked out or read online, use this subject heading: "Bibliographical citations."

  • Chicago Manual of Style Online Book The full, searchable digital version of this resource.
  • Chicago Manual of Style Quick Guide Easy to use guide for looking up bibliography and footnote formats.
  • Chicago Manual of Style Examples for Every Format in Art History Created by Duke University Libraries.
  • Chicago Manual of Style Format from The Owl A comprehensive resource to the Chicago Style from Purdue University's Online Writing Lab.

Licensed for Brown

Access to ninth edition of the MLA Handbook online.

  • MLA Format from The OWL A comprehensive resource to the MLA Style from Purdue University's Online Writing Lab.
  • Citing Images Clear examples for MLA, Chicago, APA from Colgate University Visual Resources Library. Includes examples from various sources and different types of artworks plus architecture.
  • Citing / Documenting Images From Bates College Library. The basics of image citation using Chicago Manual of Style.
  • Crediting a Photo Used Online Some helpful tips.
  • Audiovisual Citation Project British Universities Film & Video Council guidelines for the citation of moving image and sound.

Postcard view of Rhode Island State House and canal

Rhode Island State House and Canal. Postcard, 1908. Public domain. Luna Collection.

  • Perma.cc Perma.CC is a service maintained by the Harvard Law Library. It supports sustainable citation of web pages and web documents by archiving an image of the page or document in case the site URL later becomes inaccessible (i.e., link rot). Archived images are stored and assigned an identifier and URL for citation in a footnote, reference list, or works cited. Features include the ability to upload a CSV of URLs and the automatic creation of permanent links. Contact [email protected] to request a Brown University Perma.CC account through the University Library, which lets users create an unlimited number of permanent links.
  • Wayback Machine Maintained by the non-profit Internet Archive. Users can enter a URL in the bar labeled 'Save Page Now' to archive a page (if the website allows crawling).
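For readers who keep their source links in a script or spreadsheet, the 'Save Page Now' step can also be expressed as a URL. The following is a minimal sketch, not an official client: it assumes the Internet Archive's public `https://web.archive.org/save/` prefix and `https://archive.org/wayback/available` lookup endpoint, and the function names are hypothetical. It only builds the links and makes no network requests.

```python
from urllib.parse import quote

# Assumed public Internet Archive endpoints (check the current
# Wayback Machine documentation before relying on them).
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"
WAYBACK_AVAILABILITY_API = "https://archive.org/wayback/available"


def save_page_now_url(page_url: str) -> str:
    """Return the link that asks the Wayback Machine to archive page_url."""
    return WAYBACK_SAVE_PREFIX + page_url


def availability_query_url(page_url: str) -> str:
    """Return the link that checks whether a snapshot of page_url exists."""
    return f"{WAYBACK_AVAILABILITY_API}?url={quote(page_url, safe='')}"


if __name__ == "__main__":
    guide = "https://libguides.brown.edu/InvestigatingCity"
    print(save_page_now_url(guide))
    print(availability_query_url(guide))
```

Opening the first printed link in a browser has the same effect as pasting the address into the 'Save Page Now' bar; the archived copy's URL is then what goes into the footnote.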

Brown supports several citation managers, including EndNote, Mendeley, and Zotero. Please see the page below for descriptions, links, and comparisons of the software.

  • Citation Management Information on citation, citation management software, plagiarism, copyright, fair use, and creative commons

Your librarian can help you with researching your paper, but what about the actual writing process?

For Brown Library books that can be checked out or read online, use this subject heading:

Art Criticism -- Authorship .

  • Research Support This guide discusses some of the basics of doing college level research, including tips for evaluating sources and a glossary of terms with examples. It also offers tutorials, a library orientation, and much more.
  • Developing a Thesis for a Research Paper This very helpful guide from the University of North Carolina at Chapel Hill Writing Center discusses the art of crafting a thesis.
  • Harriet W. Sheridan Center for Teaching and Learning (Writing Center) Writing Center associates assist students with all stages of the writing process, from finding a topic through drafting, revising, and final editing.

Postcard view showing streets with trolley tracks and wagons

Olneyville Square, Providence. Postcard, 1910. Public domain. Luna Collection.

  • Last Updated: Sep 14, 2024 5:20 PM
  • URL: https://libguides.brown.edu/InvestigatingCity

Brown University Library  |  Providence, RI 02912  |  (401) 863-2165

Taram's Novel Research Wins Multiple Prestigious Awards in Hardware Security

Assistant Professor Mohammadkazem Taram

Taram Awarded IEEE Micro Top Picks, distinguished paper at ASPLOS conference and honorable mention from Intel's Hardware Security Academic Award Program

Assistant Professor Mohammadkazem Taram’s paper in collaboration with co-authors from UT Austin, UC San Diego, Intel Labs, Google, Fastly, and Rivos, "Going Beyond the Limits of SFI: Flexible and Secure Hardware-Assisted In-Process Isolation with HFI," has garnered significant recognition in the field of computer architecture and hardware security for its groundbreaking impact.

“The beauty of HFI is that it drastically reduces the overhead of sandboxing without the need to drastically change the processor architecture,” said Taram. “It brings support for in-process isolation to modern processors through a simple, non-intrusive, yet effective and efficient architectural extension.”

This paper presents Hardware-assisted Fault Isolation (HFI), a straightforward upgrade to current processors that enhances the security, flexibility, and efficiency of in-process isolation. HFI overcomes the drawbacks of existing software-based fault isolation (SFI) methods, such as high runtime costs, difficulty scaling, vulnerability to Spectre attacks, and compatibility issues with existing code. HFI can easily work with current SFI systems like WebAssembly or directly isolate native programs without requiring any changes.
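To give a sense of where the runtime cost of software-based isolation comes from, here is a minimal sketch of the classic address-masking idea used in SFI-style sandboxes. It is an illustration only and not HFI's mechanism or code from the paper: the region base, region size, and all names are hypothetical. The point is that a software check like `confine` runs before every memory access, which is the kind of per-access work a hardware extension can take over.

```python
# Hypothetical sandbox region: base is aligned to the (power-of-two) size,
# so OR-ing the base with a masked offset always lands inside the region.
SANDBOX_BASE = 0x10000
SANDBOX_SIZE = 0x1000
OFFSET_MASK = SANDBOX_SIZE - 1


def confine(addr: int) -> int:
    """Force any address into [SANDBOX_BASE, SANDBOX_BASE + SANDBOX_SIZE)."""
    return SANDBOX_BASE | (addr & OFFSET_MASK)


class Sandbox:
    """Toy memory whose loads and stores are masked in software."""

    def __init__(self) -> None:
        self.memory = bytearray(SANDBOX_SIZE)

    def store(self, addr: int, value: int) -> None:
        # The masking step is executed on every store.
        self.memory[confine(addr) - SANDBOX_BASE] = value & 0xFF

    def load(self, addr: int) -> int:
        # The masking step is executed on every load.
        return self.memory[confine(addr) - SANDBOX_BASE]


if __name__ == "__main__":
    sandbox = Sandbox()
    sandbox.store(0xDEADBEEF, 7)          # out-of-range address is wrapped
    print(hex(confine(0xDEADBEEF)))       # address actually touched
    print(sandbox.load(confine(0xDEADBEEF)))
```

An out-of-range address is not rejected here but silently redirected into the region, which keeps the check to a couple of arithmetic operations at the price of running them on every access.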

Published at Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2023, this paper won a distinguished paper award. ASPLOS is an eminent academic forum for multidisciplinary computer systems research spanning hardware, software, and their interaction.

Taram’s paper also received an honorable mention from Intel's Hardware Security Academic Award program. Out of 60 nominations from 240 authors across 100 institutions and 17 countries, only three papers were recognized: one winner and two honorable mentions.

Most recently, this research was selected for the Institute of Electrical and Electronics Engineers (IEEE) Micro Top Picks from the Computer Architecture Conferences. This prestigious publication collects some of the most significant research papers in computer architecture based on novelty and potential for long-term impact.

Taram's work on HFI advances computer architecture and hardware security by addressing the inherent limitations of traditional software-based isolation methods. HFI paves the way for more secure, efficient, and scalable processor designs. The widespread recognition from prestigious conferences and institutions underscores the far-reaching impact of this research. 

About the Department of Computer Science at Purdue University

Department of Computer Science, 305 N. University Street, West Lafayette, IN 47907

Purdue University Indianapolis, 723 W. Michigan St., Indianapolis, IN 46202

Phone: (765) 494-6010 • Fax: (765) 494-0739

IMAGES

  1. Architecture and Environment Paper Free Essay Example

  2. Architecture Thesis Report

  3. (PDF) Architecture Pedagogy -An Overview

  4. (PDF) Successful thesis proposals in architecture and urban planning

  5. (PDF) Contemporary architecture in a historical context

  6. How to write a research paper on architecture

VIDEO

  1. Factors affecting Architecture

  2. Welcome to the Aga Khan Award for Architecture

  3. Yale School of Architecture: "Recent Work"

  4. Ship of Theseus

  5. MA in Contemporary Japanese Architecture

  6. Ancient Korean Architecture in Context

COMMENTS

  1. Frontiers of Architectural Research

    Frontiers of Architectural Research is an international journal that publishes original research papers, review articles, and case studies to promote rapid communication and exchange among scholars, architects, and engineers. This journal introduces and reviews significant and pioneering achievements in the field of architecture research.

  2. Journal of Architectural and Planning Research

    1970-1973 •. Architectural Research and Teaching. The Journal of Architectural and Planning Research is the major international interdisciplinary resource for professionals and scholars in architecture, design, and planning. Reporting internationally both recent research findings and innovative new practices, JAPR provides a link between ...

  3. Most Downloaded Frontiers of Architectural Research Articles

    The interaction of history and modern thought in the creation of Iran's architecture by investigating the approaches of past-oriented architecture. June 2024. Mohsen Kamali. The relationship between tradition and modernity significantly influences society, culture, and architectural discourse.

  4. The Journal of Architecture

    The Journal of Architecture is published by Routledge, an imprint of Taylor & Francis, for the Royal Institute of British Architects (RIBA). Since its launch in 1996, The Journal of Architecture has become widely recognised as one of the foremost peer-reviewed architecture journals internationally. It is published eight times a year, comprising both guest-edited special issues, as well as ...

  5. (PDF) Architectural design research: Drivers of practice

    Architectural design research is understood as practice-led research centered on architectural design practice and design thinking [25]. The focus of this study is on the utilization of housing ...

  6. Biophilic design in architecture and its contributions to health, well

    In this review, we adopt diverse searching, screening, and selecting methods. The key terms 'biophilia', 'biophilic design', 'biophilic architecture', and 'biophilic building' are used in the initial search for papers (Fig. 2). Three databases are considered: Scopus, Web of Science, and Google Scholar.

  7. arq: Architectural Research Quarterly

    ISSN: 1359-1355 (Print), 1474-0516 (Online) Editor: Adam Sharr Newcastle University, UK. Editorial board. arq publishes cutting-edge work covering all aspects of architectural endeavour. Contents include building design, urbanism, history, theory, environmental design, construction, materials, information technology, and practice.

  8. The Journal of Architecture: Vol 29, No 3 (Current issue)

    The Routledge Handbook of Architecture, Urban Space and Politics, Volume I: Violence, Spectacle and Data. Book Edited by Nikolina Bobic and Farzaneh Haghighi Routledge, 2023 ISBN 9780367629175 £190/$200, hardback, pp. 630, with illustrations. Maja Babić.

  9. ARENA Journal of Architectural Research

    AJAR is an online Open Access peer-reviewed journal for all kinds of design research and scholarly research within the architectural field, and has been set up by the Architectural Research European Network Association (ARENA) network. It welcomes the submission of essays by doctoral students and younger researchers as well as by established architects and academics. Content for the journal is ...

  10. Studies on sustainable features of vernacular architecture in different

    Fig. 2 presents the increased trend of international studies on vernacular architecture (127 studies indexed by SciVerse Scopus of Elsevier and Google scholar that the authors could obtain) within the last three decades. It was observed that the annual number of studies has shown a sharply increased trend since the year 2007. This indicates a greater interest on sustainable features of ...

  11. Architectural Research for Sustainable Environmental Design

    Simos Yannas Architectural Research for Sustainable Environmental Design ENHSA Conference October 2013. 6. is applicable to all building types and built forms in all inhabited locations and ...

  12. The Architect-Researcher: Exploring New Possibilities for the

    Architectural Research versus Research Through Practice. Jeremy Till's canonical paper commissioned by RIBA, Architectural Research: Three Myths and One Model, argues that "architecture is a form ...

  13. Architectural design research: Drivers of practice

    Output-driven research in architecture. Archer (1995) states that research is 'a systematic inquiry whose goal is communicable knowledge', which has become a widely accepted definition of research (Fraser 2013). Architects produce knowledge through design ideas and practice (Fraser 2013), with architectural design research increasingly expected to form part of the ...

  14. 18 Useful Research Resources for Architects Online

    7) Archnet. Archnet focuses on the built environment and iconic buildings of the Islamic World, and provides a wide range of documentation: images, drawings, publications, seminar proceedings ...

  15. Archnet-IJAR: International Journal of Architectural Research

    This journal is part of our Property management & built environment collection. Explore our Property management & built environment subject area to find out more. Archnet-IJAR is an interdisciplinary scholarly journal of architecture, urban design and planning, and built environment studies.

  16. Senses of place: architectural design for the multisensory mind

    Traditionally, architectural practice has been dominated by the eye/sight. In recent decades, though, architects and designers have increasingly started to consider the other senses, namely sound, touch (including proprioception, kinesthesis, and the vestibular sense), smell, and on rare occasions, even taste in their work. As yet, there has been little recognition of the growing understanding ...

  17. Architectural Research

    An international call for papers was sent out in 2022 and 296 of more than 750 submissions from 77 countries have been ... The following discusses the role and state of research in architecture ...

  18. Green Architecture: A Concept of Sustainability

    The reason for this popularity is to perform the sustainable development. The Concept of Green Architecture, also known as "sustainable architecture" or "green building," is the theory, science and style of buildings designed and constructed in accordance with environmentally friendly principles. Green architecture strives to minimize ...

  19. Architecture Research Papers

    Shaping the City of Tomorrow in East Asia: Concepts, Schemes and Ideas for Urban Development from 1960s to 2010, and Beyond. The paper attempts to outline the urban visions and architectural ideas and vocabulary behind the formation of the large urban conglomeration in Japan, South Korea and China, and how the seeds of Western planning theories ...

  20. Research on Zhe-Style Dwelling Houses from the Perspective of

    2.2 Identification of Zhe-Style Architecture Culture. In terms of house layout, traditional dwellings exhibit a symmetrical arrangement influenced by feudal etiquette (Yu et al., 2016). Among them, the most distinctive feature is the central courtyard, around which the layout takes the form of "Zhe-style" character shape, "Sun" character shape, or "H" shape.

  21. Review of deep learning: concepts, CNN architectures, challenges

    We have reviewed the significant research papers in the field published during 2010-2020, mainly from the years of 2020 and 2019 with some papers from 2021. The main focus was papers from the most reputed publishers such as IEEE, Elsevier, MDPI, Nature, ACM, and Springer. ... This architecture incorporates filters of different sizes (\(5 ...
