How-to conduct a systematic literature review: A quick guide for computer science research

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Associated Data

No data was used for the research described in the article.

Abstract

Performing a literature review is a critical first step in research for understanding the state of the art and identifying gaps and challenges in the field. A systematic literature review is a method that sets out a series of steps to methodically organize the review. In this paper, we present a guide designed for researchers, and in particular early-stage researchers, in the computer science field. The contributions of the article are the following:

• Clearly defined strategies to follow for a systematic literature review in computer science research, and

• An algorithmic method to tackle a systematic literature review.

Keywords: Systematic literature reviews, literature reviews, research methodology, computer science, doctoral studies

Graphical abstract

Image, graphical abstract

Method details

Overview

A Systematic Literature Review (SLR) is a research methodology to collect, identify, and critically analyze the available research studies (e.g., articles, conference proceedings, books, dissertations) through a systematic procedure [12]. An SLR updates the reader with current literature about a subject [6]. The goal is to review critical points of current knowledge on a topic, guided by research questions, in order to suggest areas for further examination [5]. Defining an “Initial Idea” or interest in a subject to be studied is the first step before starting the SLR. An early search of the relevant literature can help determine whether the topic is too broad to cover adequately in the available time frame and whether it is necessary to narrow the focus. Reading some articles can assist in setting the direction for a formal review, and formulating a potential research question (e.g., how is semantics involved in Industry 4.0?) can further facilitate this process. Once the focus has been established, an SLR can be undertaken to find more specific studies related to the variables in this question. Although there are multiple approaches for performing an SLR ([5], [26], [27]), this work aims to provide a practical, step-by-step guide while citing useful examples for computer science research. The methodology presented in this paper comprises two main phases, “Planning” and “Conducting”, each described in the section of the same name, following the depiction of the graphical abstract.

Planning

Defining the protocol is the first step of an SLR since it describes the procedures involved in the review and acts as a log of the activities to be performed. Obtaining opinions from peers while developing the protocol is encouraged, since it helps ensure the review's consistency and validity and helps identify when modifications are necessary [20]. One final goal of the protocol is to ensure the replicability of the review.

Define PICOC and synonyms

The PICOC (Population, Intervention, Comparison, Outcome, and Context) criteria break down the SLR's objectives into searchable keywords and help formulate research questions [27]. PICOC is widely used in the medical and social sciences fields to encourage researchers to consider the components of the research questions [14]. Kitchenham & Charters [6] compiled the list of PICOC elements and their corresponding terms in computer science, as presented in Table 1, which includes keywords derived from the PICOC elements. From that point on, it is essential to think of synonyms or similar terms that can later be used for building queries in the selected digital libraries. For instance, the keyword “context awareness” can also be linked to “context-aware”.

Table 1

Planning Step 1 “Defining PICOC keywords and synonyms”.

PICOC element	Description	Example (PICOC)	Example (Synonyms)
Population	Can be a specific role, an application area, or an industry domain.	Smart Manufacturing	• Digital Factory • Digital Manufacturing • Smart Factory
Intervention	The methodology, tool, or technology that addresses a specific issue.	Semantic Web	• Ontology • Semantic Reasoning
Comparison	The methodology, tool, or technology against which the Intervention is compared (if appropriate).	Machine Learning	• Supervised Learning • Unsupervised Learning
Outcome	Factors of importance to practitioners and/or the results that Intervention could produce.	Context-Awareness	• Context-Aware • Context-Reasoning
Context	The context in which the comparison takes place. Some systematic reviews might choose to exclude this element.	Business Process Management	• BPM • Business Process Modeling

Formulate research questions

Clearly defined research question(s) are the key elements which set the focus for study identification and data extraction [21]. These questions are formulated based on the PICOC criteria as presented in the example in Table 2 (PICOC keywords are underlined).

Table 2

Research questions examples.

Research Questions examples

• RQ1: What are the current challenges of context-aware systems that support the decision-making of business processes in smart manufacturing?
• RQ2: Which technique is most appropriate to support decision-making for business process management in smart factories?
• RQ3: In which scenarios are semantic web and machine learning used to provide context-awareness in business process management for smart manufacturing?

Select digital library sources

The validity of a study will depend on the proper selection of a database since it must adequately cover the area under investigation [19]. The Web of Science (WoS) is an international and multidisciplinary tool for accessing literature in science, technology, biomedicine, and other disciplines. Scopus is a database that today indexes 40,562 peer-reviewed journals, compared to 24,831 for WoS. Thus, Scopus is currently the largest existing multidisciplinary database. However, it may also be necessary to include sources relevant to computer science, such as EI Compendex, IEEE Xplore, and ACM. Table 3 compares the area of expertise of a selection of databases.

Table 3

Planning Step 3 “Select digital libraries”. Description of digital libraries in computer science and software engineering.

Database	Description	URL	Area	Advanced Search Y/N
Scopus	From Elsevier. One of the largest databases. Very user-friendly interface.	http://www.scopus.com	Interdisciplinary	Y
Web of Science	From Clarivate. Multidisciplinary database with wide ranging content.	https://www.webofscience.com/	Interdisciplinary	Y
EI Compendex	From Elsevier. Focused on engineering literature.	http://www.engineeringvillage.com	Engineering	Y (Query view not available)
IEEE Digital Library	Contains scientific and technical articles published by IEEE and its publishing partners.	http://ieeexplore.ieee.org	Engineering and Technology	Y
ACM Digital Library	Complete collection of ACM publications.	https://dl.acm.org/	Computing and information technology	Y

Define inclusion and exclusion criteria

Authors should define the inclusion and exclusion criteria before conducting the review to prevent bias, although these can be adjusted later, if necessary. The selection of primary studies will depend on these criteria. Articles are included or excluded in this first selection based on abstract and primary bibliographic data. When unsure, the article is skimmed to further decide the relevance for the review. Table 4 sets out some criteria types with descriptions and examples.

Table 4

Planning Step 4 “Define inclusion and exclusion criteria”. Examples of criteria type.

Criteria Type	Description	Example
Period	Articles can be selected based on the time period to review, e.g., reviewing the technology under study from the year it emerged, or reviewing progress in the field since the publication of a prior literature review.	*Inclusion*: From 2015 to 2021 *Exclusion*: Articles prior to 2015
Language	Articles can be excluded based on language.	*Exclusion*: Articles not in English
Type of Literature	Articles can be excluded if they fall into the category of grey literature.	*Exclusion*: Reports, policy literature, working papers, newsletters, government documents, speeches
Type of source	Articles can be included or excluded by the type of origin, i.e., conference or journal articles or books.	*Inclusion*: Articles from Conferences or Journals *Exclusion*: Articles from books
Impact Source	Articles can be excluded if the author limits the impact factor or quartile of the source.	*Inclusion*: Articles from Q1 and Q2 sources *Exclusion*: Articles with a Journal Impact Score (JIS) lower than x
Accessibility	Not accessible in specific databases.	*Exclusion*: Not accessible
Relevance to research questions	Articles can be excluded if they are not relevant to a particular question or to “n” number of research questions.	*Exclusion*: Not relevant to at least 2 research questions
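
The criteria in Table 4 can be recorded as simple, checkable rules so that every reviewer applies them in the same way. The following is a minimal Python sketch, not a prescribed implementation: the record fields (`year`, `language`, `source_type`) and the sample records are hypothetical, and the thresholds are the example values from the table. Criteria such as relevance to the research questions still require human judgment.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Bibliographic data for one article (field names are hypothetical)."""
    title: str
    year: int
    language: str
    source_type: str  # e.g., "journal", "conference", "book"


def exclusion_reasons(record: Record) -> list[str]:
    """Return the exclusion criteria a record violates (empty list = keep)."""
    reasons = []
    if not 2015 <= record.year <= 2021:  # Period criterion
        reasons.append("outside 2015-2021")
    if record.language.lower() != "english":  # Language criterion
        reasons.append("not in English")
    if record.source_type.lower() not in {"journal", "conference"}:  # Type of source
        reasons.append("not a journal or conference article")
    return reasons


# Made-up records, for illustration only.
records = [
    Record("Paper A", 2018, "English", "journal"),
    Record("Paper B", 2012, "English", "conference"),
    Record("Paper C", 2020, "Spanish", "book"),
]
included = [r for r in records if not exclusion_reasons(r)]
for r in records:
    print(r.title, exclusion_reasons(r) or "included")
```

Returning the list of violated criteria, rather than a plain yes/no, keeps a log of why each article was excluded, which supports the replicability goal of the protocol.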

Define the Quality Assessment (QA) checklist

Assessing the quality of an article requires an artifact which describes how to perform a detailed assessment. A typical quality assessment is a checklist that contains multiple factors to evaluate. A numerical scale is used to assess the criteria and quantify the QA [22]. Zhou et al. [25] presented a detailed description of assessment criteria in software engineering, classified into four main aspects of study quality: Reporting, Rigor, Credibility, and Relevance. Each of these criteria can be evaluated using, for instance, a Likert-type scale [17], as shown in Table 5. It is essential to use the same scale for all criteria established in the quality assessment.

Table 5

Planning Step 5 “Define QA assessment checklist”. Examples of QA scales and questions.

QA question	Scale
*Example 1:* Do the researchers discuss any problems (limitations, threats) with the validity of their results (reliability)?	Level of participation: • 1 – No, and not considered (Score: 0) • 2 – Partially (Score: 0.5) • 3 – Yes (Score: 1)
*Example 2:* Is there a clear definition/description/statement of the aims/goals/purposes/motivations/objectives/questions of the research?	Level of agreement: • 1 – Disagree (Score: 1) • 2 – Somewhat disagree (Score: 2) • 3 – Neither agree nor disagree (Score: 3) • 4 – Somewhat agree (Score: 4) • 5 – Agree (Score: 5)

Define the “Data Extraction” form

The data extraction form captures the information necessary to answer the research questions established for the review. Synthesizing the articles is a crucial step when conducting research. Ramesh et al. [15] presented a classification scheme for computer science research, based on topics, research methods, and levels of analysis, that can be used to categorize the selected articles. Classification methods and fields to consider when conducting a review are presented in Table 6.

Table 6

Planning Step 6 “Define data extraction form”. Examples of fields.

Classification and fields to consider for data extraction	Description and examples
Research type	• Theoretical research focuses on abstract ideas, concepts, and theories built on literature reviews [9]. • Empirical research uses scientific data or case studies for explorative, descriptive, explanatory, or measurable findings [9]. Example: • [1], an SLR on context-awareness for S-PSS, categorizes the articles into theoretical and empirical research.
By process phases, stages	When analyzing a process or series of processes, an effective way to structure the data is to find a well-established framework of reference or architecture. Examples: • [8], an SLR on self-adaptive systems, uses the MAPE-K model to understand how the authors tackle each module stage. • [13] presents a context-awareness survey that uses the stages of the context-aware lifecycle to review different methods.
By technology, framework, or platform	When analyzing a computer science topic, it is important to know the technology currently employed to understand trends, benefits, or limitations. Example: • [3], an SLR on the big data ecosystem in the manufacturing field, includes frameworks, tools, and platforms for each stage of the big data ecosystem.
By application field and/or industry domain	If the review is not limited to a specific “Context” or “Population” (industry domain), it can be useful to identify the field of application. Example: • [23], an SLR on adaptive training using virtual reality (VR), presents an extensive description of multiple application domains and examines related work.
Gaps and challenges	Identifying gaps and challenges is important in reviews to determine the research needs and further establish research directions that can help scholars act on the topic.
Findings in research	Research in computer science can deliver multiple types of findings, e.g.: Framework, algorithm, methodology, data model, development approach.
Evaluation method	Case studies, experiments, surveys, mathematical demonstrations, and performance indicators.

The data extraction must be relevant to the research questions, and the relationship to each of the questions should be included in the form. Kitchenham & Charters [6] presented more pertinent data that can be captured, such as conclusions, recommendations, strengths, and weaknesses. Although the data extraction form can be updated if more information is needed, this should be treated with caution since it can be time-consuming. It can therefore be helpful to first have a general background in the research topic to determine better data extraction criteria.
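
One way to keep the form consistent across reviewers is to fix its fields in code. The sketch below is illustrative only: the field names are assumptions loosely derived from Table 6, the article identifier is hypothetical, and the `related_rqs` field records the link between an article and the research questions it helps answer.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ExtractionForm:
    """One row of a data extraction form (field names are illustrative)."""
    article_id: str
    research_type: str  # "theoretical" or "empirical"
    technologies: list[str] = field(default_factory=list)
    application_domain: str = ""
    gaps_and_challenges: str = ""
    findings: str = ""  # e.g., framework, algorithm, data model
    evaluation_method: str = ""  # e.g., case study, experiment, survey
    related_rqs: list[str] = field(default_factory=list)  # e.g., ["RQ1", "RQ3"]


form = ExtractionForm(
    article_id="A001",  # hypothetical identifier
    research_type="empirical",
    technologies=["ontology"],
    evaluation_method="case study",
    related_rqs=["RQ1", "RQ3"],
)
print(asdict(form))  # a plain dict, ready to be written to a spreadsheet or CSV row
```

Because every field except the identifier and research type has a default, the form can be filled in incrementally while reading an article.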

Conducting

After defining the protocol, conducting the review requires following each of the steps previously described. Using tools can help simplify this task. Standard tools such as Excel or Google Sheets allow multiple researchers to work collaboratively. Another online tool, specifically designed for performing SLRs, is Parsif.al. This tool allows researchers, especially in the context of software engineering, to define goals and objectives, import articles using BibTeX files, eliminate duplicates, define selection criteria, and generate reports.

Build digital library search strings

Search strings are built from the PICOC elements and their synonyms, and are then executed in each digital library. Within a search string, the synonyms of one PICOC element are joined with the Boolean operator OR and enclosed in parentheses. The parenthesized groups for the different PICOC elements are then joined with the Boolean operator AND. An example is presented next:

(“Smart Manufacturing” OR “Digital Manufacturing” OR “Smart Factory”) AND (“Business Process Management” OR “BPEL” OR “BPM” OR “BPMN”) AND (“Semantic Web” OR “Ontology” OR “Semantic” OR “Semantic Web Service”) AND (“Framework” OR “Extension” OR “Plugin” OR “Tool”)
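
Strings like the one above can be generated from the PICOC keyword lists, which avoids unbalanced parentheses when synonyms are added or removed. The following Python sketch assumes a hypothetical helper, `build_search_string`, and uses straight double quotes for phrases; whether a given database expects exactly this quoting is an assumption to verify against its search help.

```python
def build_search_string(groups: list[list[str]]) -> str:
    """Join synonyms with OR inside parentheses, and the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# One inner list per PICOC element: the keyword followed by its synonyms.
picoc_groups = [
    ["Smart Manufacturing", "Digital Manufacturing", "Smart Factory"],
    ["Business Process Management", "BPM"],
    ["Semantic Web", "Ontology"],
]
print(build_search_string(picoc_groups))
```

Keeping the keyword lists as data also makes it easy to record in the protocol exactly which terms were searched.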

Gather studies

Databases that feature advanced searches enable researchers to run search queries against titles, abstracts, and keywords, and to filter by year or research area. Fig. 1 presents an example of an advanced search in Scopus, using titles, abstracts, and keywords (TITLE-ABS-KEY). Most databases allow the use of logical operators (e.g., AND, OR). In the example, the search is for “BIG DATA” and “USER EXPERIENCE” or “UX” as a synonym.

Fig. 1

Example of Advanced search on Scopus.
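
As a rough illustration of the query described for Fig. 1, the sketch below wraps a Boolean expression in the TITLE-ABS-KEY field code. The helper name `scopus_title_abs_key` is hypothetical, and the grouping of “USER EXPERIENCE” and “UX” in one parenthesized OR clause is an assumption based on the description in the text, not a transcription of the figure.

```python
def scopus_title_abs_key(boolean_expression: str) -> str:
    """Wrap a Boolean expression in the TITLE-ABS-KEY field code."""
    return f"TITLE-ABS-KEY({boolean_expression})"


# Assumed grouping: the synonym pair is OR-ed, then AND-ed with the main term.
expression = '"big data" AND ("user experience" OR "UX")'
query = scopus_title_abs_key(expression)
print(query)
```

The resulting text can be pasted into the advanced search box; other databases use different field codes, so the wrapper would need to be adapted for each library.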

In general, the bibliometric data of articles can be exported from the databases as a comma-separated values (CSV) file or a BibTeX file, which is helpful for data extraction and for quantitative and qualitative analysis. In addition, researchers should take advantage of reference-management software such as Zotero, Mendeley, EndNote, or JabRef, which can easily import bibliographic information.
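
An exported CSV file can be read with Python's standard `csv` module for a first quantitative look at the results. In this sketch the column names (`Title`, `Year`, `DOI`) and the two sample rows are made up for illustration; each database uses its own export headers, so the names must be adapted to the actual file.

```python
import csv
import io

# Stand-in for an exported file; column names and rows are hypothetical.
sample_export = io.StringIO(
    "Title,Year,DOI\n"
    "An ontology for smart factories,2019,10.1000/example.1\n"
    "Context-aware BPM,2020,10.1000/example.2\n"
)

rows = list(csv.DictReader(sample_export))

# Count the retrieved articles per publication year.
per_year = {}
for row in rows:
    per_year[row["Year"]] = per_year.get(row["Year"], 0) + 1

print(len(rows), per_year)
```

With a real export, `io.StringIO(...)` would be replaced by `open(path, newline="", encoding="utf-8")`.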

Study Selection and Refinement

The first step in this stage is to identify any duplicates that appear across the searches in the selected databases. Automatic procedures, such as Excel formulas or short scripts in a programming language (e.g., Python), can be convenient here.
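
A minimal Python sketch of such a duplicate check follows. It assumes each record is a dictionary with hypothetical `title`, `doi`, and `source` keys, matches on the DOI when one is present, and otherwise falls back to a normalized title; the sample records are made up.

```python
import re


def dedup_key(record: dict) -> str:
    """Prefer the DOI; otherwise fall back to a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Lowercase the title and collapse punctuation and whitespace.
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return "title:" + title


def remove_duplicates(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each key, preserving the input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records, as if merged from several database exports.
records = [
    {"title": "Context-Aware BPM", "doi": "10.1000/EXAMPLE.2", "source": "Scopus"},
    {"title": "Context-aware BPM.", "doi": "10.1000/example.2", "source": "WoS"},
    {"title": "Smart Factory Ontologies", "doi": "", "source": "IEEE"},
    {"title": "Smart factory ontologies", "doi": None, "source": "ACM"},
]
unique = remove_duplicates(records)
print([r["source"] for r in unique])
```

One limitation of this sketch is that a record exported with a DOI and the same article exported without one receive different keys, so a manual pass over the remaining titles is still advisable.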

In the second step, articles are included or excluded according to the selection criteria, mainly by reading titles and abstracts. Finally, the quality is assessed using the predefined scale. Fig. 2 shows an example of an article QA evaluation in Parsif.al, using a simple scale. In this scenario, the scoring procedure is the following: YES = 1, PARTIALLY = 0.5, and NO or UNKNOWN = 0. A cut-off score should be defined to filter out those articles that do not pass the QA. The QA will require a light review of the full text of the article.
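
The scoring and cut-off can be sketched in a few lines of Python. The answer-to-score mapping is the one given above; the cut-off value, the number of checklist questions, and the sample assessments are hypothetical and would be set in the review protocol.

```python
SCORES = {"YES": 1.0, "PARTIALLY": 0.5, "NO": 0.0, "UNKNOWN": 0.0}


def qa_score(answers: list[str]) -> float:
    """Sum the scores of one article's checklist answers."""
    return sum(SCORES[answer.upper()] for answer in answers)


CUTOFF = 2.0  # hypothetical threshold for a four-question checklist

# Made-up assessments: one answer per checklist question.
assessments = {
    "Paper A": ["YES", "YES", "PARTIALLY", "NO"],
    "Paper B": ["PARTIALLY", "NO", "UNKNOWN", "YES"],
}
passed = [
    title for title, answers in assessments.items()
    if qa_score(answers) >= CUTOFF
]
print(passed)
```

An unrecognized answer raises a `KeyError`, which surfaces data-entry mistakes instead of silently scoring them as zero.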