1. Shemilt I, Arno A, Thomas J, Lorenc T, Khouja C, Raine G, Sutcliffe K, Preethy D, Kwan I, Wright K, Sowden A. Cost-effectiveness of Microsoft Academic Graph with machine learning for automated study identification in a living map of coronavirus disease 2019 (COVID-19) research. Wellcome Open Res 2024; 6:210. [PMID: 38686019] [PMCID: PMC11056680] [DOI: 10.12688/wellcomeopenres.17141.2]
Abstract
Background Identifying new, eligible studies for integration into living systematic reviews and maps usually relies on conventional Boolean updating searches of multiple databases and manual processing of the updated results. Automated searches of a single, comprehensive, continuously updated source, with adjunctive machine learning, could enable more efficient searching, selection and prioritisation workflows for updating (living) reviews and maps, though research is needed to establish this. Microsoft Academic Graph (MAG) is a potentially comprehensive single source which also contains metadata that can be used in machine learning to help identify eligible studies efficiently. This study sought to establish whether: (a) MAG was a sufficiently sensitive single source to maintain our living map of COVID-19 research; and (b) eligible records could be identified with an acceptably high level of specificity. Methods We conducted an eight-arm cost-effectiveness analysis to assess the costs, recall and precision of semi-automated workflows, incorporating MAG with adjunctive machine learning, for continually updating our living map. Resource use data (time use) were collected from information specialists and other researchers involved in map production. Our systematic review software, EPPI-Reviewer, was adapted to incorporate MAG and the associated machine learning workflows, and was also used to collect data on recall, precision, and manual screening workload. Results The semi-automated MAG-enabled workflow dominated conventional workflows in both the base case and sensitivity analyses. At one month, our MAG-enabled workflow with machine learning, active learning and fixed screening targets identified 469 additional eligible articles for inclusion in our living map, and cost £3,179 per week less, compared with conventional methods relying on Boolean searches of Medline and Embase.
Conclusions We were able to increase the recall and coverage of a large living map whilst reducing its production costs. This finding is likely to be transferable to OpenAlex, MAG's successor platform.
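The headline result above is a "dominance" finding, which reduces to a simple pairwise comparison: one workflow dominates another when it does at least as well on every dimension and strictly better on at least one. A minimal sketch of that check follows; the function name and the absolute figures are our own illustration (only the differences, 469 extra eligible articles and £3,179 per week saved, come from the abstract).

```python
# Sketch of the dominance comparison used in cost-effectiveness analysis:
# workflow A dominates workflow B when it performs at least as well on
# every dimension and strictly better on at least one.
def dominates(a, b):
    at_least_as_good = (a["eligible_found"] >= b["eligible_found"]
                        and a["weekly_cost_gbp"] <= b["weekly_cost_gbp"])
    strictly_better = (a["eligible_found"] > b["eligible_found"]
                       or a["weekly_cost_gbp"] < b["weekly_cost_gbp"])
    return at_least_as_good and strictly_better

# Hypothetical absolute figures; only the differences (+469 eligible
# articles, -£3,179 per week) are taken from the abstract.
mag_ml = {"eligible_found": 1469, "weekly_cost_gbp": 1000}
conventional = {"eligible_found": 1000, "weekly_cost_gbp": 4179}

assert dominates(mag_ml, conventional)
assert not dominates(conventional, mag_ml)
```

When neither workflow dominates, a cost-effectiveness analysis would instead trade off the incremental cost against the incremental yield; dominance is the unambiguous case.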
Affiliation(s)
- Ian Shemilt, EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Anneliese Arno, EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- James Thomas, EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Theo Lorenc, Centre for Reviews and Dissemination, University of York, York, UK
- Claire Khouja, Centre for Reviews and Dissemination, University of York, York, UK
- Gary Raine, Centre for Reviews and Dissemination, University of York, York, UK
- Katy Sutcliffe, EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- D'Souza Preethy, EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Irene Kwan, EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Kath Wright, Centre for Reviews and Dissemination, University of York, York, UK
- Amanda Sowden, Centre for Reviews and Dissemination, University of York, York, UK
2. Boaz A, Baeza J, Fraser A, Persson E. 'It depends': what 86 systematic reviews tell us about what strategies to use to support the use of research in clinical practice. Implement Sci 2024; 19:15. [PMID: 38374051] [PMCID: PMC10875780] [DOI: 10.1186/s13012-024-01337-z]
Abstract
BACKGROUND The gap between research findings and clinical practice is well documented and a range of strategies have been developed to support the implementation of research into clinical practice. The objective of this study was to update and extend two previous reviews of systematic reviews of strategies designed to implement research evidence into clinical practice. METHODS We developed a comprehensive systematic literature search strategy based on the terms used in the previous reviews to identify studies that looked explicitly at interventions designed to turn research evidence into practice. The search was performed in June 2022 in four electronic databases: Medline, Embase, Cochrane and Epistemonikos. We searched from January 2010 up to June 2022 and applied no language restrictions. Two independent reviewers appraised the quality of included studies using a quality assessment checklist. To reduce the risk of bias, papers were excluded following discussion between all members of the team. Data were synthesised using descriptive and narrative techniques to identify themes and patterns linked to intervention strategies, targeted behaviours, study settings and study outcomes. RESULTS We identified 32 reviews conducted between 2010 and 2022. The reviews are mainly of multi-faceted interventions (n = 20) although there are reviews focusing on single strategies (ICT, educational, reminders, local opinion leaders, audit and feedback, social media and toolkits). The majority of reviews report strategies achieving small impacts (normally on processes of care). There is much less evidence that these strategies have shifted patient outcomes. Furthermore, a lot of nuance lies behind these headline findings, and this is increasingly commented upon in the reviews themselves. DISCUSSION Combined with the two previous reviews, 86 systematic reviews of strategies to increase the implementation of research into clinical practice have been identified. 
We need to shift the emphasis away from isolating individual and multi-faceted interventions to better understanding and building more situated, relational and organisational capability to support the use of research in clinical practice. This will involve drawing on a wider range of research perspectives (including social science) in primary studies and diversifying the types of synthesis undertaken to include approaches such as realist synthesis which facilitate exploration of the context in which strategies are employed.
Affiliation(s)
- Annette Boaz, Health and Social Care Workforce Research Unit, The Policy Institute, King's College London, Virginia Woolf Building, 22 Kingsway, London WC2B 6LE, UK
- Juan Baeza, King's Business School, King's College London, 30 Aldwych, London WC2B 4BG, UK
- Alec Fraser, King's Business School, King's College London, 30 Aldwych, London WC2B 4BG, UK
- Erik Persson, Federal University of Santa Catarina (UFSC), Campus Universitário Reitor João Davi Ferreira Lima, Florianópolis, SC, 88.040-900, Brazil
3. Adam GP, Paynter R. Development of literature search strategies for evidence syntheses: pros and cons of incorporating text mining tools and objective approaches. BMJ Evid Based Med 2023; 28:137-139. [PMID: 35346974] [DOI: 10.1136/bmjebm-2021-111892]
Affiliation(s)
- Gaelen P Adam, Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, Rhode Island, USA
- Robin Paynter, Scientific Resource Center, AHRQ Effective Health Care Program, Portland, Oregon, USA
4. Adam GP, Wallace BC, Trikalinos TA. Semi-automated Tools for Systematic Searches. Methods Mol Biol 2022; 2345:17-40. [PMID: 34550582] [DOI: 10.1007/978-1-0716-1566-9_2]
Abstract
Traditionally, literature identification for systematic reviews has relied on a two-step process: first, searching databases to identify potentially relevant citations, and then manually screening those citations. A number of tools have been developed to streamline and semi-automate this process, including tools to generate search terms; to visualize and evaluate search queries; to trace citation linkages; to deduplicate, limit, or translate searches across databases; and to prioritize relevant abstracts for screening. Research is ongoing into tools that can unify searching and screening into a single step, and several prototype tools have been developed. As this field grows, it is becoming increasingly important to develop and codify methods for evaluating the extent to which these tools fulfill their purpose.
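One of the tool categories listed above, deduplication across database exports, is easy to illustrate. A minimal sketch follows; the normalisation key (lower-cased, punctuation-stripped title plus year) is our own simplification, and production tools use fuzzier matching on DOIs, authors and page ranges.

```python
import re

def norm_key(record):
    """Crude dedup key: lower-cased title with punctuation stripped,
    plus publication year. (Our own simplification; real tools also
    match on DOI, authors, journal, and tolerate small edit distances.)"""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return (" ".join(title.split()), record.get("year"))

def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen, unique = set(), []
    for r in records:
        k = norm_key(r)
        if k not in seen:
            seen.add(k)
            unique.append(r)
    return unique

# The same article retrieved from two databases with cosmetic differences:
hits = [
    {"title": "Semi-automated tools for systematic searches.", "year": 2022, "source": "MEDLINE"},
    {"title": "Semi-Automated Tools for Systematic Searches", "year": 2022, "source": "Embase"},
]
unique = deduplicate(hits)
```

Here the two variants collapse to a single record; the surviving copy is the first one encountered, which is why export order can matter when sources differ in metadata quality.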
Affiliation(s)
- Gaelen P Adam, Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI, USA
- Byron C Wallace, Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Thomas A Trikalinos, Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI, USA
5. Shemilt I, Arno A, Thomas J, Lorenc T, Khouja C, Raine G, Sutcliffe K, Preethy D, Kwan I, Wright K, Sowden A. Cost-effectiveness of Microsoft Academic Graph with machine learning for automated study identification in a living map of coronavirus disease 2019 (COVID-19) research. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17141.1]
Abstract
Background: Conventionally, searching for eligible articles to include in systematic reviews and maps of research has relied primarily on information specialists conducting Boolean searches of multiple databases and manually processing the results, including deduplication between these multiple sources. Searching one comprehensive source, rather than multiple databases, could save time and resources. Microsoft Academic Graph (MAG) is potentially such a source, containing a network graph structure which provides metadata that can be exploited in machine learning processes. Research is needed to establish the relative advantage of using MAG as a single source, compared with conventional searches of multiple databases. This study sought to establish whether: (a) MAG is sufficiently comprehensive to maintain our living map of coronavirus disease 2019 (COVID-19) research; and (b) eligible records can be identified with an acceptably high level of specificity. Methods: We conducted a pragmatic, eight-arm cost-effectiveness analysis (simulation study) to assess the costs, recall and precision of our semi-automated MAG-enabled workflow versus conventional searches of MEDLINE and Embase (with and without machine learning classifiers, active learning and/or fixed screening targets) for maintaining a living map of COVID-19 research. Resource use data (time use) were collected from information specialists and other researchers involved in map production. Results: MAG-enabled workflows dominated MEDLINE-Embase workflows in both the base case and sensitivity analyses. At one month (base case analysis) our MAG-enabled workflow with machine learning, active learning and fixed screening targets identified 469 more new, eligible articles for inclusion in our living map, and cost £3,179 ($5,691 AUD) less, than conventional MEDLINE-Embase searches without any automation or fixed screening targets.
Conclusions: MAG-enabled continuous surveillance workflows have potential to revolutionise study identification methods for living maps, specialised registers, databases of research studies and/or collections of systematic reviews, by increasing their recall and coverage, whilst reducing production costs.
6. Arno A, Elliott J, Wallace B, Turner T, Thomas J. The views of health guideline developers on the use of automation in health evidence synthesis. Syst Rev 2021; 10:16. [PMID: 33419479] [PMCID: PMC7796617] [DOI: 10.1186/s13643-020-01569-2]
Abstract
BACKGROUND The increasingly rapid rate of evidence publication has made it difficult for evidence syntheses (systematic reviews and health guidelines) to be kept continually up to date. One proposed solution is the use of automation in health evidence synthesis. Guideline developers are key gatekeepers in the acceptance and use of evidence, and their opinions on the potential use of automation are therefore crucial. METHODS The objective of this study was to analyze the attitudes of guideline developers towards the use of automation in health evidence synthesis. The Diffusion of Innovations framework was chosen as the initial analytical framework because it encapsulates some of the core issues thought to affect the adoption of new innovations in practice. This well-established theory posits five dimensions which affect the adoption of novel technologies: Relative Advantage, Compatibility, Complexity, Trialability, and Observability. Eighteen interviews were conducted with individuals who were currently working, or had previously worked, in guideline development. After transcription, a multiphase mixed deductive and grounded approach was used to analyze the data. First, transcripts were coded deductively using Rogers' Diffusion of Innovations dimensions as the top-level themes. Second, sub-themes within the framework were identified using a grounded approach. RESULTS Participants were consistently most concerned with the extent to which an innovation is in line with current values and practices (i.e., Compatibility in the Diffusion of Innovations framework). Participants were also concerned with Relative Advantage and Observability, which were discussed in approximately equal amounts. For the latter, participants expressed a desire for transparency in the methodology of automation software. Participants were noticeably less interested in Complexity and Trialability, which were discussed infrequently. These results were reasonably consistent across all participants. CONCLUSIONS If machine learning and other automation technologies are to be used more widely, and to their full potential, in systematic reviews and guideline development, it is crucial to ensure that new technologies are in line with current values and practice. It will also be important to maximize the transparency of the methods of these technologies to address the concerns of guideline developers.
Affiliation(s)
- Anneliese Arno, EPPI-Centre, UCL Social Science Research Institute, University College London, London, UK
- Julian Elliott, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Byron Wallace, Khoury College of Computer Sciences, Northeastern University, Boston, USA
- Tari Turner, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- James Thomas, EPPI-Centre, UCL Social Science Research Institute, University College London, London, UK
7. Gates A, Gates M, DaRosa D, Elliott SA, Pillay J, Rahman S, Vandermeer B, Hartling L. Decoding semi-automated title-abstract screening: findings from a convenience sample of reviews. Syst Rev 2020; 9:272. [PMID: 33243276] [PMCID: PMC7694314] [DOI: 10.1186/s13643-020-01528-x]
Abstract
BACKGROUND We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening and explored whether Abstrackr's predictions varied by review or study-level characteristics. METHODS For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and proportion missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool's predictions varied by review and study-level characteristics. RESULTS Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) 26 (9, 42) h of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) -1.53 (-2.92, -0.15) to -1.17 (-2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review (systematic or rapid, P = 0.37) or intervention type (simple or complex, P = 0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P = 0.01), or that included only trials (95%) vs. multiple designs (86%) (P = 0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P = 0.0006).
Studies at high or unclear (88%) vs. low risk of bias (80%) (P = 0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P = 0.02) were more often correctly predicted as relevant. CONCLUSION Our screening approach saved time and may be suitable in conditions where the limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder, but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias. The results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses hindered strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.
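One simplified reading of the ML-assisted liberal-accelerated approach described above: a record advances to full-text review if either the single human screener includes it or the tool predicts it relevant, and is excluded only when both rate it irrelevant. The sketch below is our own toy version of that logic, not the study's exact protocol; all field names and records are hypothetical.

```python
def screen(records):
    """Toy ML-assisted liberal-accelerated pass: exclude a record only
    when BOTH the single human screener and the tool's prediction call
    it irrelevant; anything else goes forward to full-text review.
    (A simplified sketch, not the study's exact protocol.)"""
    forward = [r for r in records
               if r["human_include"] or r["ml_predicted_relevant"]]
    excluded = [r for r in records if r not in forward]
    return forward, excluded

records = [
    {"id": 1, "human_include": True,  "ml_predicted_relevant": False, "truly_relevant": True},
    {"id": 2, "human_include": False, "ml_predicted_relevant": True,  "truly_relevant": True},
    {"id": 3, "human_include": False, "ml_predicted_relevant": False, "truly_relevant": False},
    {"id": 4, "human_include": False, "ml_predicted_relevant": False, "truly_relevant": True},
]
forward, excluded = screen(records)
# A relevant record is missed only when the human AND the tool both
# reject it (record 4 here) - the failure mode the study quantifies.
missed = [r for r in excluded if r["truly_relevant"]]
```

The time saving comes from the same property: each record needs only one human read plus a machine prediction, instead of two independent human reads.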
Affiliation(s)
- Allison Gates, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Michelle Gates, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Daniel DaRosa, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Sarah A Elliott, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Jennifer Pillay, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Sholeh Rahman, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Ben Vandermeer, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Lisa Hartling, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
8. El Sherif R, Langlois A, Pandu X, Nie JY, Thomas J, Hong QN, Pluye P. Identifying empirical studies for mixed studies reviews: The mixed filter and the automated text classifier. Education for Information 2020. [DOI: 10.3233/efi-190347]
Affiliation(s)
- Reem El Sherif, Department of Family Medicine, McGill University, Montréal, QC, Canada
- Alexis Langlois, Recherche appliquée en linguistique informatique, Université de Montréal, Montréal, QC, Canada
- Xiao Pandu, Recherche appliquée en linguistique informatique, Université de Montréal, Montréal, QC, Canada
- Jian-Yun Nie, Recherche appliquée en linguistique informatique, Université de Montréal, Montréal, QC, Canada
- James Thomas, EPPI-Centre, Department of Social Science, UCL Institute of Education, University College London, London, UK
- Quan Nha Hong, EPPI-Centre, Department of Social Science, UCL Institute of Education, University College London, London, UK
- Pierre Pluye, Department of Family Medicine, McGill University, Montréal, QC, Canada
9. Gates A, Guitard S, Pillay J, Elliott SA, Dyson MP, Newton AS, Hartling L. Performance and usability of machine learning for screening in systematic reviews: a comparative evaluation of three tools. Syst Rev 2019; 8:278. [PMID: 31727150] [PMCID: PMC6857345] [DOI: 10.1186/s13643-019-1222-2]
Abstract
BACKGROUND We explored the performance of three machine learning tools designed to facilitate title and abstract screening in systematic reviews (SRs) when used to (a) eliminate irrelevant records (automated simulation) and (b) complement the work of a single reviewer (semi-automated simulation). We evaluated user experiences for each tool. METHODS We subjected three SRs to two retrospective screening simulations. In each tool (Abstrackr, DistillerSR, RobotAnalyst), we screened a 200-record training set and downloaded the predicted relevance of the remaining records. We calculated the proportion missed and workload and time savings compared to dual independent screening. To test user experiences, eight research staff tried each tool and completed a survey. RESULTS Using Abstrackr, DistillerSR, and RobotAnalyst, respectively, the median (range) proportion missed was 5 (0 to 28) percent, 97 (96 to 100) percent, and 70 (23 to 100) percent for the automated simulation and 1 (0 to 2) percent, 2 (0 to 7) percent, and 2 (0 to 4) percent for the semi-automated simulation. The median (range) workload savings was 90 (82 to 93) percent, 99 (98 to 99) percent, and 85 (85 to 88) percent for the automated simulation and 40 (32 to 43) percent, 49 (48 to 49) percent, and 35 (34 to 38) percent for the semi-automated simulation. The median (range) time savings was 154 (91 to 183), 185 (95 to 201), and 157 (86 to 172) hours for the automated simulation and 61 (42 to 82), 92 (46 to 100), and 64 (37 to 71) hours for the semi-automated simulation. Abstrackr identified 33-90% of records missed by a single reviewer. RobotAnalyst performed less well and DistillerSR provided no relative advantage. User experiences depended on user friendliness, qualities of the user interface, features and functions, trustworthiness, ease and speed of obtaining predictions, and practicality of the export file(s). 
CONCLUSIONS The workload savings afforded in the automated simulation came with increased risk of missing relevant records. Supplementing a single reviewer's decisions with relevance predictions (semi-automated simulation) sometimes reduced the proportion missed, but performance varied by tool and SR. Designing tools based on reviewers' self-identified preferences may improve their compatibility with present workflows. SYSTEMATIC REVIEW REGISTRATION Not applicable.
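The two simulation modes above can be contrasted with a deliberately crude decision-count model. This is our own construction, not the paper's accounting (which also factors in per-record reading time and conflict resolution, so the published percentages differ): dual screening reads every record twice; the "automated" mode discards predicted-irrelevant records and dual-screens the rest; the "semi-automated" mode has one human read everything with the tool as second reviewer.

```python
def screening_workload(n_records, n_predicted_relevant):
    """Rough decision-count model (a sketch, not the paper's accounting).
    dual: every record gets two human reads.
    automated: tool discards predicted-irrelevant; rest dual-screened.
    semi-automated: one human reads everything; tool is reviewer #2."""
    dual = 2 * n_records
    automated = 2 * n_predicted_relevant
    semi = n_records
    return {
        "automated_savings_pct": round(100 * (dual - automated) / dual),
        "semi_automated_savings_pct": round(100 * (dual - semi) / dual),
    }

# Hypothetical review: 10,000 candidate records, 1,000 predicted relevant.
out = screening_workload(10_000, 1_000)
```

Even this toy model reproduces the qualitative pattern in the abstract: automated simulation saves far more work than semi-automated, at the price of trusting the tool's exclusions outright.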
Affiliation(s)
- Allison Gates, Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
- Samantha Guitard, Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
- Jennifer Pillay, Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
- Sarah A Elliott, Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
- Michele P Dyson, Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
- Amanda S Newton, Department of Pediatrics, University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
- Lisa Hartling, Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
10. Norman CR, Leeflang MMG, Porcher R, Névéol A. Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy. Syst Rev 2019; 8:243. [PMID: 31661028] [PMCID: PMC6819363] [DOI: 10.1186/s13643-019-1162-x]
Abstract
BACKGROUND The large and increasing number of new studies published each year is making literature identification in systematic reviews ever more time-consuming and costly. Technological assistance has been suggested as an alternative to conventional, manual study identification to mitigate the cost, but previous literature has mainly evaluated methods in terms of recall (search sensitivity) and workload reduction. There is a need to also evaluate whether screening prioritization methods lead to the same results and conclusions as exhaustive manual screening. In this study, we examined the impact of one screening prioritization method based on active learning on sensitivity and specificity estimates in systematic reviews of diagnostic test accuracy. METHODS We simulated the screening process in 48 Cochrane reviews of diagnostic test accuracy and re-ran 400 meta-analyses based on at least 3 studies. We compared screening prioritization (with technological assistance) and screening in randomized order (standard practice without technological assistance). We examined whether the screening could have been stopped before identifying all relevant studies while still producing reliable summary estimates. For all meta-analyses, we also examined the relationship between the number of relevant studies and the reliability of the final estimates. RESULTS The main meta-analysis in each systematic review could have been performed after screening an average of 30% of the candidate articles (range 0.07 to 100%). No systematic review would have required screening more than 2,308 studies, whereas manual screening would have required screening up to 43,363 studies. Despite an average 70% recall, the estimation error would have been 1.3% on average, compared with the average 2% estimation error expected when replicating summary estimate calculations.
CONCLUSION Screening prioritization coupled with stopping criteria in diagnostic test accuracy reviews can reliably detect when the screening process has identified enough studies to perform the main meta-analysis with an accuracy within pre-specified tolerance limits. However, many of the systematic reviews did not identify enough studies for their meta-analyses to be accurate within a 2% limit even with exhaustive manual screening, i.e., under current practice.
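The core mechanic of screening prioritization is simple: rank the candidate records by a model score, screen them in descending order, and stop when a criterion fires. The sketch below uses a common heuristic criterion (stop after a run of consecutive irrelevant records); it is our own illustration, and the paper's actual criterion is tied to the stability of the meta-analysis estimates rather than to a streak count.

```python
def prioritized_screen(candidates, stop_after=50):
    """Screening prioritization sketch: read candidates in descending
    model-score order and stop once `stop_after` consecutive records
    are irrelevant. (A heuristic stopping rule for illustration; the
    paper ties stopping to meta-analysis estimate stability instead.)"""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    included, streak, screened = [], 0, 0
    for c in ranked:
        screened += 1
        if c["relevant"]:
            included.append(c)
            streak = 0
        else:
            streak += 1
            if streak >= stop_after:
                break
    return included, screened

# Hypothetical pool: relevant records cluster at high model scores.
candidates = [{"score": s, "relevant": s in (10, 9, 7)}
              for s in range(10, 0, -1)]
included, screened = prioritized_screen(candidates, stop_after=3)
```

With a well-calibrated model, the relevant records surface early and the stopping rule truncates the long irrelevant tail, which is exactly where the reported workload savings come from.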
Affiliation(s)
- Christopher R Norman, LIMSI, CNRS, Université Paris Saclay, Rue du Belvedère, Orsay, 91405, France; Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, Amsterdam, 1105 AZ, the Netherlands
- Mariska MG Leeflang, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, Amsterdam, 1105 AZ, the Netherlands
- Raphaël Porcher, Center for Clinical Epidemiology, Assistance Publique–Hôpitaux de Paris, Hôtel Dieu Hospital; Team METHODS, CRESS, INSERM U1153; University Paris Descartes, 1 place du Parvis Notre-Dame, Paris, 75004, France
- Aurélie Névéol, LIMSI, CNRS, Université Paris Saclay, Rue du Belvedère, Orsay, 91405, France
11. O'Connor AM, Tsafnat G, Thomas J, Glasziou P, Gilbert SB, Hutton B. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Syst Rev 2019; 8:143. [PMID: 31215463] [PMCID: PMC6582554] [DOI: 10.1186/s13643-019-1062-0]
Abstract
BACKGROUND Although many aspects of systematic reviews use computational tools, systematic reviewers have been reluctant to adopt machine learning tools. DISCUSSION We argue that the reasons for the slow adoption of machine learning tools into systematic reviews are multifactorial. We focus on the current absence of trust in automation and on set-up challenges as major barriers to adoption. It is important that reviews produced using automation tools are considered non-inferior or superior to current practice. However, this standard alone is unlikely to lead to widespread adoption. As with many technologies, it is important that reviewers see "others" in the review community using automation tools. Adoption will also be slow if the automation tools are not compatible with the workflows and tasks currently used to produce reviews. Many automation tools being developed for systematic reviews address classification problems; the evidence that these tools are non-inferior or superior can therefore be presented using methods similar to diagnostic test evaluations, i.e., precision and recall compared with a human reviewer. However, the assessment of automation tools presents unique challenges for investigators and systematic reviewers, including the need to clarify which metrics are of interest to the systematic review community and the documentation challenges peculiar to reproducible software experiments. CONCLUSION We discuss adoption barriers with the goal of giving tool developers guidance on how to design and report such evaluations, and end users a basis for assessing their validity. Further, we discuss approaches to formatting and announcing publicly available datasets suitable for the assessment of automation technologies and tools. Making these resources available will increase trust that tools are non-inferior or superior to current practice. Finally, we identify that, even with evidence that automation tools are non-inferior or superior to current practice, substantial set-up challenges remain for mainstream integration of automation into the systematic review process.
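The diagnostic-test framing in this abstract, precision and recall measured against a human reviewer as the reference standard, can be made concrete with a toy sketch. The `precision_recall` helper and the record IDs below are illustrative assumptions, not code or data from the paper:

```python
def precision_recall(machine_included, human_included):
    """Precision and recall of a screening tool, treating the human
    reviewer's inclusion decisions as the reference standard."""
    machine, human = set(machine_included), set(human_included)
    tp = len(machine & human)  # records both the tool and the human included
    precision = tp / len(machine) if machine else 1.0
    recall = tp / len(human) if human else 1.0
    return precision, recall

# The tool includes 4 records; the human included 5, of which 3 overlap.
p, r = precision_recall({1, 2, 3, 9}, {1, 2, 3, 4, 5})
print(p, r)  # → 0.75 0.6
```

A screening tool would normally be tuned to favour recall over precision, since a missed relevant study (false negative) costs a review more than an extra record to screen.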
Affiliation(s)
- Guy Tsafnat
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Brian Hutton
- Knowledge Synthesis Unit, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada
|
12
|
van Altena AJ, Spijker R, Olabarriaga SD. Usage of automation tools in systematic reviews. Res Synth Methods 2019; 10:72-82. [PMID: 30561081 DOI: 10.1002/jrsm.1335] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2018] [Revised: 09/28/2018] [Accepted: 12/11/2018] [Indexed: 12/12/2022]
Abstract
Systematic reviews are a cornerstone of today's evidence-informed decision making. With the rapid expansion of questions to be addressed and scientific information produced, there is a growing workload on reviewers, making current practice unsustainable without the aid of automation tools. While many automation tools have been developed and are available, uptake seems to be lagging. We therefore set out to investigate the current level of uptake and the potential barriers to, and facilitators of, the adoption of automation tools in systematic reviews. We deployed surveys among systematic reviewers that gathered information on tool uptake, demographics, systematic review characteristics, and barriers and facilitators for uptake. Systematic reviewers from multiple domains were targeted during recruitment; however, respondents were predominantly from the biomedical sciences. We found that automation tools are not currently widely used among the participants. When tools are used, participants mostly learn about them from their environment, for example through colleagues, peers, or their organisation. Tools are often chosen on the basis of user experience, either their own or that of colleagues and peers. Lastly, licensing, a steep learning curve, lack of support, and mismatch to workflow are often reported by participants as relevant barriers. While conclusions can only be drawn for the biomedical field, our work provides evidence that confirms the conclusions and recommendations of previous work, which was based on expert opinions. Furthermore, our study highlights the importance that organisations and best practices in a field can have for the uptake of automation tools for systematic reviews.
Affiliation(s)
- A J van Altena
- Department of Epidemiology, Biostatistics, and Bioinformatics, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- R Spijker
- Medical Library, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands; Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- S D Olabarriaga
- Department of Epidemiology, Biostatistics, and Bioinformatics, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
|
13
|
Abstract
Systematic review is a type of literature review designed to synthesize all available evidence on a given question. Systematic reviews require significant time and effort, which has led to the continuing development of computer support. This paper seeks to identify the gaps and opportunities for computer support. By interviewing experienced systematic reviewers from diverse fields, we identify the technical problems and challenges reviewers face in conducting a systematic review and their current uses of computer support. We propose potential research directions for how computer support could help to speed the systematic review process while retaining or improving review quality.
|
14
|
Michie S, Johnston M. Optimising the value of the evidence generated in implementation science: the use of ontologies to address the challenges. Implement Sci 2017; 12:131. [PMID: 29137660 PMCID: PMC5686802 DOI: 10.1186/s13012-017-0660-2] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2017] [Accepted: 10/30/2017] [Indexed: 12/12/2022] Open
Abstract
Implementing research findings into healthcare practice and policy is a complex process occurring in diverse contexts; it invariably depends on changing human behaviour in many parts of an intricate implementation system. Questions asked with the aim of improving implementation are multifarious variants of 'What works, compared with what, how well, with what exposure, with what behaviours (for how long), for whom, in what setting and why?'. Relevant evidence is being published at a high rate, but its quantity, complexity and lack of shared terminologies present challenges. The achievement of efficient, effective and timely synthesis of evidence is facilitated by using 'ontologies' to systematically structure and organise the evidence about constructs and their relationships, using a controlled, well-defined vocabulary.
Affiliation(s)
- Susan Michie
- Centre for Behaviour Change, University College London, 1-19 Torrington Place, London, WC1E 7HB, UK
- Marie Johnston
- Aberdeen Health Psychology Group, University of Aberdeen, Aberdeen, UK
|
15
|
Stansfield C, O'Mara-Eves A, Thomas J. Text mining for search term development in systematic reviewing: A discussion of some methods and challenges. Res Synth Methods 2017; 8:355-365. [PMID: 28660680 DOI: 10.1002/jrsm.1250] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2016] [Revised: 03/08/2017] [Accepted: 05/14/2017] [Indexed: 11/10/2022]
Abstract
Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews.
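The term frequency-inverse document frequency weighting discussed in this abstract can be sketched in a few lines. This is a toy single-word ranker on made-up documents, not the Termine pipeline or any tool evaluated in the paper:

```python
import math
import re
from collections import Counter

def tfidf_terms(documents, top_n=5):
    """Rank single-word terms by TF-IDF: terms frequent in one document
    but rare across the corpus score highest, flagging them as
    candidate search terms."""
    tokenised = [re.findall(r"[a-z]+", d.lower()) for d in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenised for term in set(doc))
    scores = Counter()
    for doc in tokenised:
        tf = Counter(doc)
        for term, count in tf.items():
            # Keep each term's best tf-idf score across the corpus.
            scores[term] = max(scores[term], count * math.log(n_docs / df[term]))
    return [term for term, _ in scores.most_common(top_n)]

docs = [
    "automated screening reduces screening workload in reviews",
    "search strategies for reviews",
    "reviews of health policy",
]
print(tfidf_terms(docs, top_n=1))  # → ['screening']
```

Here "reviews" appears in every document and so scores zero, while "screening" is frequent in one document and absent elsewhere, making it the strongest candidate search term.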
Affiliation(s)
- Claire Stansfield
- Evidence for Policy and Practice Information Co-ordinating (EPPI-) Centre, Social Science Research Unit, UCL Institute of Education, University College London, London, UK
- Alison O'Mara-Eves
- Evidence for Policy and Practice Information Co-ordinating (EPPI-) Centre, Social Science Research Unit, UCL Institute of Education, University College London, London, UK
- James Thomas
- Evidence for Policy and Practice Information Co-ordinating (EPPI-) Centre, Social Science Research Unit, UCL Institute of Education, University College London, London, UK
|
16
|
Evidence & Gap Maps: A tool for promoting evidence informed policy and strategic research agendas. J Clin Epidemiol 2016; 79:120-129. [DOI: 10.1016/j.jclinepi.2016.05.015] [Citation(s) in RCA: 108] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2015] [Revised: 04/26/2016] [Accepted: 05/06/2016] [Indexed: 11/23/2022]
|
17
|
Literature Review of the National CLAS Standards: Policy and Practical Implications in Reducing Health Disparities. J Racial Ethn Health Disparities 2016; 4:632-647. [PMID: 27444488 DOI: 10.1007/s40615-016-0267-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2016] [Revised: 07/01/2016] [Accepted: 07/05/2016] [Indexed: 10/21/2022]
Abstract
The National Standards for Culturally and Linguistically Appropriate Services (CLAS) in Health and Health Care are a practical tool for health and health care organizations to improve their provision of culturally and linguistically appropriate services (CLAS). Published by the Office of Minority Health at the U.S. Department of Health and Human Services, the National CLAS Standards provide health and health care organizations with a set of action steps for better meeting the needs of individuals from culturally and linguistically diverse backgrounds. Few studies have examined the concept of CLAS or the National CLAS Standards, and neither has been extensively reviewed. The authors conducted three literature searches between February 2014 and May 2015, examining the organizational challenges, applicability, and policy implications related to the National CLAS Standards or CLAS, and selected 55 articles for inclusion in the review. The literature highlights a number of challenges in implementing the National CLAS Standards and/or providing CLAS, including issues related to communication within health care organizations and the inconsistency of accountability measures. This literature review contributes to the growing knowledge base on the National CLAS Standards and CLAS in health and health care.
|
18
|
Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Inform Assoc 2016; 23:193-201. [PMID: 26104742 PMCID: PMC4713900 DOI: 10.1093/jamia/ocv044] [Citation(s) in RCA: 123] [Impact Index Per Article: 15.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2014] [Revised: 04/16/2015] [Accepted: 04/18/2015] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE To develop and evaluate RobotReviewer, a machine learning (ML) system that automatically assesses bias in clinical trials. From a (PDF-formatted) trial report, the system should determine risks of bias for the domains defined by the Cochrane Risk of Bias (RoB) tool, and extract supporting text for these judgments. METHODS We algorithmically annotated 12,808 trial PDFs using data from the Cochrane Database of Systematic Reviews (CDSR). Trials were labeled as being at low or high/unclear risk of bias for each domain, and sentences were labeled as being informative or not. This dataset was used to train a multi-task ML model. We estimated the accuracy of ML judgments versus humans by comparing trials with two or more independent RoB assessments in the CDSR. Twenty blinded, experienced reviewers rated the relevance of supporting text, comparing ML output with equivalent (human-extracted) text from the CDSR. RESULTS When retrieving the top 3 candidate sentences per document (top-3 recall), the best ML text was rated more relevant than text from the CDSR, but not significantly so (60.4% of ML text rated 'highly relevant' vs 56.5% of text from reviews; difference +3.9%, [-3.2% to +10.9%]). Model RoB judgments were less accurate than those from published reviews, though the difference was <10% (overall accuracy 71.0% with ML vs 78.3% with CDSR). CONCLUSION Risk of bias assessment may be automated with reasonable accuracy. Automatically identified supporting text is of comparable quality to the manually identified text in the CDSR. This technology could substantially reduce reviewer workload and expedite evidence syntheses.
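The top-3 retrieval set-up described in this abstract, judging the k best candidate sentences rather than a single best one, can be sketched as follows. The cue-word scorer is a deliberately crude stand-in invented for illustration, not RobotReviewer's trained model:

```python
def top_k_sentences(sentences, score, k=3):
    """Return the k highest-scoring sentences as candidate supporting
    text for a risk-of-bias judgment."""
    # sorted() is stable, so equally scored sentences keep document order.
    return sorted(sentences, key=score, reverse=True)[:k]

# Toy score: sentences mentioning risk-of-bias cue words rank higher.
cues = ("random", "blind", "allocat", "conceal")
score = lambda s: sum(c in s.lower() for c in cues)

sents = [
    "Patients were randomly allocated using sealed envelopes.",
    "The trial enrolled 120 patients.",
    "Outcome assessors were blinded to allocation.",
    "Follow-up lasted 12 months.",
]
print(top_k_sentences(sents, score, k=2))
# → the randomisation and blinding sentences, in document order
```

Retrieving several candidates rather than one is what makes the "top-3 recall" comparison in the RESULTS section possible: the reviewer judges whether any of the retrieved candidates is relevant.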
Affiliation(s)
- Iain J Marshall
- Department of Primary Care and Public Health Sciences, King's College London, UK
- Joël Kuiper
- University Medical Center, University of Groningen, Groningen, The Netherlands
- Byron C Wallace
- School of Information, University of Texas at Austin, Austin, Texas, USA
|
19
|
O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 2015; 4:5. [PMID: 25588314 PMCID: PMC4320539 DOI: 10.1186/2046-4053-4-5] [Citation(s) in RCA: 262] [Impact Index Per Article: 29.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/07/2014] [Accepted: 12/10/2014] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: by automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. METHODS Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. RESULTS The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most studies suggested that a workload saving of between 30% and 70% might be possible, though the saving is sometimes accompanied by the loss of 5% of relevant studies (i.e. 95% recall). CONCLUSIONS Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. The use of text mining as a 'second screener' may also be used cautiously. The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; but more developmental and evaluative work is needed in other disciplines.
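The recall/workload trade-off reported in this abstract (e.g. 95% recall at a 30-70% workload saving) can be made concrete with a small sketch. The ranking and the `recall_at_workload` helper below are illustrative assumptions, not data from the review:

```python
def recall_at_workload(ranked_relevance, screened_fraction):
    """Recall achieved if manual screening stops after a fraction of a
    machine-prioritised ranking (True = relevant record)."""
    n = len(ranked_relevance)
    cutoff = round(n * screened_fraction)  # records actually screened
    found = sum(ranked_relevance[:cutoff])
    total = sum(ranked_relevance)
    return found / total if total else 1.0

# A model that ranks relevant records early: 10 relevant out of 100,
# 9 of them placed in the first 30% of the ranking.
ranking = [True] * 9 + [False] * 21 + [True] + [False] * 69
print(recall_at_workload(ranking, 0.30))  # → 0.9
```

Screening only the top 30% of this ranking finds 9 of the 10 relevant records: a 70% workload saving at 90% recall, the kind of trade-off the review quantifies across studies.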
Affiliation(s)
- Alison O’Mara-Eves
- Evidence for Policy and Practice Information and Coordinating (EPPI)-Centre, Social Science Research Unit, UCL Institute of Education, University of London, London, UK
- James Thomas
- Evidence for Policy and Practice Information and Coordinating (EPPI)-Centre, Social Science Research Unit, UCL Institute of Education, University of London, London, UK
- John McNaught
- The National Centre for Text Mining and School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Makoto Miwa
- Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, 468-8511, Japan
- Sophia Ananiadou
- The National Centre for Text Mining and School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
|