1. Liu W, Duinkharjav B, Sun Q, Zhang SQ. FovealNet: Advancing AI-Driven Gaze Tracking Solutions for Efficient Foveated Rendering in Virtual Reality. IEEE Transactions on Visualization and Computer Graphics 2025; 31:3183-3193. [PMID: 40067704] [DOI: 10.1109/tvcg.2025.3549577]
Abstract
Leveraging real-time eye tracking, foveated rendering optimizes hardware efficiency and enhances visual quality in virtual reality (VR). This approach uses eye-tracking techniques to determine where the user is looking, allowing the system to render high-resolution graphics only in the foveal region (the small area of the retina where visual acuity is highest), while the peripheral view is rendered at lower resolution. However, modern deep learning-based gaze-tracking solutions often exhibit a long-tail distribution of tracking errors, which can degrade user experience and reduce the benefits of foveated rendering by causing misalignment and decreased visual quality. This paper introduces FovealNet, an advanced AI-driven gaze tracking framework designed to optimize system performance by strategically enhancing gaze tracking accuracy. To further reduce the implementation cost of the gaze tracking algorithm, FovealNet employs an event-based cropping method that eliminates over 64.8% of irrelevant pixels from the input image. Additionally, it incorporates a simple yet effective token-pruning strategy that dynamically removes tokens on the fly without compromising tracking accuracy. Finally, to support different runtime rendering configurations, we propose a system performance-aware multi-resolution training strategy, allowing the gaze tracking DNN to adapt and optimize overall system performance more effectively. Evaluation results demonstrate that FovealNet achieves at least a 1.42× speedup compared to previous methods and a 13% increase in perceptual quality for foveated output. The code is available at https://github.com/wl3181/FovealNet.
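For readers unfamiliar with foveated rendering, the sketch below illustrates the basic eccentricity-to-resolution mapping that such a pipeline relies on; the function name, thresholds, and pixels-per-degree value are illustrative assumptions, not parameters from FovealNet.

```python
import numpy as np

def foveation_level(px, py, gaze_xy, ppd, radii_deg=(5.0, 15.0)):
    """Return 0 (full), 1 (half), or 2 (quarter) resolution for a pixel.

    px, py    : pixel coordinates
    gaze_xy   : estimated gaze position in pixels
    ppd       : display pixels per degree of visual angle
    radii_deg : eccentricity thresholds separating the foveal,
                mid-peripheral, and far-peripheral shading regions
    """
    ecc_deg = np.hypot(px - gaze_xy[0], py - gaze_xy[1]) / ppd
    return int(np.searchsorted(radii_deg, ecc_deg))

# Example: with gaze at (960, 540) on a 30 px/deg display,
# a pixel 200 px away (~6.7 deg eccentricity) falls in the mid-peripheral band.
print(foveation_level(1160, 540, (960, 540), ppd=30.0))  # -> 1
```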
2. Wang J, Wang T, Xu B, Cossairt O, Willomitzer F. Accurate eye tracking from dense 3D surface reconstructions using single-shot deflectometry. Nat Commun 2025; 16:2902. [PMID: 40169547] [PMCID: PMC11962075] [DOI: 10.1038/s41467-025-56801-1]
Abstract
Eye-tracking plays a crucial role in the development of virtual reality devices, neuroscience research, and psychology. Despite its significance in numerous applications, achieving an accurate, robust, and fast eye-tracking solution remains a considerable challenge for current state-of-the-art methods. While existing reflection-based techniques (e.g., "glint tracking") are considered to be very accurate, their performance is limited by their reliance on sparse 3D surface data acquired solely from the cornea surface. In this paper, we rethink how specular reflections can be used for eye tracking: we propose a method for accurate and fast evaluation of the gaze direction that exploits teachings from single-shot phase-measuring deflectometry. In contrast to state-of-the-art reflection-based methods, our method acquires dense 3D surface information of both cornea and sclera within only one single camera frame (single-shot). For a typical measurement, we acquire >3000× more surface reflection points ("glints") than conventional methods. We show the feasibility of our approach with experimentally evaluated gaze errors below 0.13° on a realistic model eye. Moreover, we demonstrate quantitative measurements on real human eyes in vivo, reaching accuracy values between 0.46° and 0.97°.
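The core geometric relation behind deflectometry-style surface recovery is the law of reflection: the local surface normal bisects the direction toward the camera and the direction toward the observed screen point. A minimal sketch of that per-glint relation follows; the paper's full pipeline additionally reconstructs dense cornea and sclera surfaces from thousands of such normals and derives gaze from the fitted geometry, which is not shown here.

```python
import numpy as np

def surface_normal(to_camera, to_source):
    """Estimate a specular surface normal from the law of reflection.

    to_camera: vector from the surface point toward the camera
    to_source: vector from the surface point toward the screen pixel
               whose reflection is observed at that point
    The normal is the (normalized) bisector of the two directions.
    """
    n = to_camera / np.linalg.norm(to_camera) + to_source / np.linalg.norm(to_source)
    return n / np.linalg.norm(n)
```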
Affiliation(s)
- Jiazhang Wang
- Wyant College of Optical Sciences, University of Arizona, Tucson, AZ, 85721, USA.
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA.
- Tianfu Wang
- Department of Computer Science, ETH Zürich, Zürich, 8092, Switzerland
- Bingjie Xu
- Department of Computer Science, Northwestern University, Evanston, IL, 60208, USA
- Oliver Cossairt
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA
- Department of Computer Science, Northwestern University, Evanston, IL, 60208, USA
- Florian Willomitzer
- Wyant College of Optical Sciences, University of Arizona, Tucson, AZ, 85721, USA.
- Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, 60208, USA.
- Department of Computer Science, Northwestern University, Evanston, IL, 60208, USA.
3. Byrne SA, Maquiling V, Nyström M, Kasneci E, Niehorster DC. LEyes: A lightweight framework for deep learning-based eye tracking using synthetic eye images. Behav Res Methods 2025; 57:129. [PMID: 40164925] [PMCID: PMC11958443] [DOI: 10.3758/s13428-025-02645-y]
Abstract
Deep learning methods have significantly advanced the field of gaze estimation, yet the development of these algorithms is often hindered by a lack of appropriate publicly accessible training datasets. Moreover, models trained on the few available datasets often fail to generalize to new datasets due to both discrepancies in hardware and biological diversity among subjects. To mitigate these challenges, the research community has frequently turned to synthetic datasets, although this approach also has drawbacks, such as the computational resources and labor required to create photorealistic representations of eye images for use as training data. In response, we introduce "Light Eyes" (LEyes), a novel framework that diverges from traditional photorealistic methods by utilizing simple synthetic image generators to train neural networks for detecting key image features like pupils and corneal reflections. LEyes facilitates the generation of synthetic data on the fly that is adaptable to any recording device and enhances the efficiency of training neural networks for a wide range of gaze-estimation tasks. Presented evaluations show that LEyes, in many cases, outperforms existing methods in accurately identifying and localizing pupils and corneal reflections across diverse datasets. Additionally, models trained using LEyes data outperform standard eye trackers while employing more cost-effective hardware, offering a promising avenue to overcome the current limitations in gaze estimation technology.
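To make the idea of "simple synthetic image generators" concrete, here is a minimal sketch that renders a crude eye image with a dark pupil and bright corneal-reflection spots plus a ground-truth label; all geometry and noise parameters are invented for illustration and are not the LEyes parameterization.

```python
import numpy as np

def synth_eye_image(size=128, rng=None):
    """Render a crude synthetic eye image: dark elliptical pupil plus
    small bright corneal-reflection spots on a noisy background.
    Returns the image and the ground-truth pupil center for training.
    """
    rng = rng if rng is not None else np.random.default_rng()
    yy, xx = np.mgrid[0:size, 0:size]
    img = rng.normal(0.55, 0.05, (size, size))        # iris/skin background
    cx, cy = rng.uniform(0.3, 0.7, 2) * size           # pupil center
    a, b = rng.uniform(8, 20, 2)                       # pupil semi-axes
    pupil = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0
    img[pupil] = rng.normal(0.05, 0.02, pupil.sum())   # dark pupil
    for _ in range(rng.integers(1, 3)):                # corneal reflections
        gx, gy = cx + rng.normal(0, 6), cy + rng.normal(0, 6)
        img += np.exp(-((xx - gx) ** 2 + (yy - gy) ** 2) / (2 * 1.5 ** 2))
    return np.clip(img, 0, 1), {"pupil_center": (cx, cy)}
```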
Affiliation(s)
- Virmarie Maquiling
- Human-Centered Technologies for Learning, Technical University of Munich, Munich, Germany
- Marcus Nyström
- Lund University Humanities Lab, Lund University, Lund, Sweden
- Enkelejda Kasneci
- Human-Centered Technologies for Learning, Technical University of Munich, Munich, Germany
- Diederick C Niehorster
- Lund University Humanities Lab, Lund University, Lund, Sweden.
- Department of Psychology, Lund University, Lund, Sweden.
4. Pan W, Liang R, Wang Y, Song D, Yin Z. Situational Awareness Prediction for Remote Tower Controllers Based on Eye-Tracking and Heart Rate Variability Data. Sensors (Basel) 2025; 25:2052. [PMID: 40218565] [PMCID: PMC11991212] [DOI: 10.3390/s25072052]
Abstract
Remote tower technology is an important development direction for air traffic control to reduce the construction and operation costs of small or remote airports. However, its digital and virtualized working environment poses new challenges to controllers' situational awareness (SA). In this study, a dataset is constructed by collecting eye-tracking (ET) and heart rate variability (HRV) data from participants in a remote tower simulation control experiment. At the same time, probe questions are designed that correspond to the SA hierarchy in conjunction with the remote tower control task flow, and the dataset is annotated using the Situation Present Assessment Method (SPAM). The annotated dataset containing 25 ET and HRV features is trained using the LightGBM model optimized by a Tree-structured Parzen Estimator (TPE), and feature selection and model interpretation are performed using SHapley Additive exPlanations (SHAP) analysis. The results show that the TPE-LightGBM model exhibits excellent prediction capability, obtaining an RMSE, MAE, and adjusted R² of 0.0909, 0.0730, and 0.7845, respectively. This study presents an effective method for assessing and predicting controllers' SA in remote tower environments. It further provides a theoretical basis for understanding the effect of the physiological state of remote tower controllers on their SA.
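A minimal sketch of a TPE-optimized LightGBM regressor with SHAP-based feature analysis, in the spirit of the pipeline described above; the placeholder data, search space, and trial budget are assumptions, not the study's settings.

```python
import numpy as np
import lightgbm as lgb
import optuna
import shap
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 25)   # placeholder for the 25 ET/HRV features
y = np.random.rand(200)       # placeholder for SPAM-derived SA scores

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 15, 127),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 50),
    }
    model = lgb.LGBMRegressor(**params, n_estimators=300)
    # 5-fold CV; maximize negative RMSE, i.e. minimize RMSE
    return cross_val_score(model, X, y, cv=5,
                           scoring="neg_root_mean_squared_error").mean()

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=100)

best = lgb.LGBMRegressor(**study.best_params, n_estimators=300).fit(X, y)
explainer = shap.TreeExplainer(best)        # SHAP values for feature selection
shap_values = explainer.shap_values(X)
```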
Affiliation(s)
- Weijun Pan
- Flight Technology and Flight Safety Research Base of the Civil Aviation Administration of China, Civil Aviation Flight University of China, Guanghan 618307, China; (W.P.); (D.S.)
- Ruihan Liang
- College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China;
- Yuhao Wang
- College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China;
- Dajiang Song
- Flight Technology and Flight Safety Research Base of the Civil Aviation Administration of China, Civil Aviation Flight University of China, Guanghan 618307, China; (W.P.); (D.S.)
- Zirui Yin
- School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 611756, China;
5. Zhao Z, Meng H, Li S, Wang S, Wang J, Gao S. High-Accuracy Intermittent Strabismus Screening via Wearable Eye-Tracking and AI-Enhanced Ocular Feature Analysis. Biosensors (Basel) 2025; 15:110. [PMID: 39997012] [PMCID: PMC11852461] [DOI: 10.3390/bios15020110]
Abstract
An effective and highly accurate strabismus screening method is expected to identify potential patients and provide timely treatment to prevent further deterioration, such as amblyopia and even permanent vision loss. To satisfy this need, this work showcases a novel strabismus screening method based on a wearable eye-tracking device combined with an artificial intelligence (AI) algorithm. To identify the minor and occasional inconsistencies in binocular coordination that are usually seen in early-stage strabismus patients and rarely recognized in current studies, the system captures temporally and spatially continuous high-definition infrared images of the eye during wide-angle continuous motion, a protocol that is effective in inducing intermittent strabismus. Based on the collected eye motion information, 16 features of the oculomotor process with strong physiological interpretations, which help biomedical staff understand and evaluate the generated results, are calculated through the introduction of pupil-canthus vectors. These features can be normalized and reflect individual differences. After these features are processed by the random forest (RF) algorithm, the method experimentally yields 97.1% accuracy in strabismus detection in 70 people under diverse indoor testing conditions, validating its high accuracy and robustness and implying strong potential to support widespread and highly accurate strabismus screening.
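The classification step reduces to training a random forest on per-subject feature vectors; a minimal sketch with placeholder data (the real inputs are the 16 pupil-canthus-vector features and clinician-confirmed labels):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: one row of 16 normalized oculomotor features per subject,
# with a binary strabismus label.
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 16))
y = rng.integers(0, 2, size=70)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.3f}")
```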
Affiliation(s)
- Shuo Gao
- School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China; (Z.Z.); (H.M.); (S.L.); (S.W.); (J.W.)
6. Nyström M, Hooge ITC, Hessels RS, Andersson R, Hansen DW, Johansson R, Niehorster DC. The fundamentals of eye tracking part 3: How to choose an eye tracker. Behav Res Methods 2025; 57:67. [PMID: 39843609] [PMCID: PMC11754381] [DOI: 10.3758/s13428-024-02587-x]
Abstract
There is an abundance of commercial and open-source eye trackers available for researchers interested in gaze and eye movements. Which aspects should be considered when choosing an eye tracker? This paper describes what distinguishes different types of eye trackers, discusses their suitability for different types of research questions, and highlights questions researchers should ask themselves to make an informed choice.
Affiliation(s)
- Marcus Nyström
- Lund University Humanities Lab, Box 201, SE, 221 00, Lund, Sweden.
- Ignace T C Hooge
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
- Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
- Diederick C Niehorster
- Lund University Humanities Lab, Box 201, SE, 221 00, Lund, Sweden
- Department of Psychology, Lund University, Lund, Sweden
7. Niehorster DC, Nyström M, Hessels RS, Andersson R, Benjamins JS, Hansen DW, Hooge ITC. The fundamentals of eye tracking part 4: Tools for conducting an eye tracking study. Behav Res Methods 2025; 57:46. [PMID: 39762687] [PMCID: PMC11703944] [DOI: 10.3758/s13428-024-02529-7]
Abstract
Researchers using eye tracking are heavily dependent on software and hardware tools to perform their studies, from recording eye tracking data and visualizing it, to processing and analyzing it. This article provides an overview of available tools for research using eye trackers and discusses considerations to make when choosing which tools to adopt for one's study.
Affiliation(s)
- Diederick C Niehorster
- Lund University Humanities Lab and Department of Psychology, Lund University, Lund, Sweden.
- Marcus Nyström
- Lund University Humanities Lab, Lund University, Lund, Sweden
- Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, the Netherlands
- Jeroen S Benjamins
- Experimental Psychology, Helmholtz Institute & Social, Health and Organizational Psychology, Utrecht University, Utrecht, the Netherlands
- Dan Witzner Hansen
- Eye Information Laboratory, IT University of Copenhagen, Copenhagen, Denmark
- Ignace T C Hooge
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, the Netherlands
8. Lapsansky AB, Kreyenmeier P, Spering M, Wylie DR, Altshuler DL. Hummingbirds use compensatory eye movements to stabilize both rotational and translational visual motion. Proc Biol Sci 2025; 292:20242015. [PMID: 39809307] [PMCID: PMC11732407] [DOI: 10.1098/rspb.2024.2015]
Abstract
To maintain stable vision, behaving animals make compensatory eye movements in response to image slip, a reflex known as the optokinetic response (OKR). Although OKR has been studied in several avian species, eye movements during flight are expected to be minimal. This is because vertebrates with laterally placed eyes typically show weak OKR to nasal-to-temporal motion (NT), which simulates typical forward locomotion, compared with temporal-to-nasal motion (TN), which simulates atypical backward locomotion. This OKR asymmetry is also reflected in the pretectum, wherein neurons sensitive to global visual motion also exhibit a TN bias. Hummingbirds, however, stabilize visual motion in all directions through whole-body movements and are unique among vertebrates in that they lack a pretectal bias. We therefore predicted that OKR in hummingbirds would be symmetrical. We measured OKR in restrained hummingbirds by presenting gratings drifting across a range of speeds. OKR in hummingbirds was asymmetrical, although the direction of asymmetry varied with stimulus speed. Hummingbirds moved their eyes largely independently of one another. Consistent with weak eye-to-eye coupling, hummingbirds also exhibited disjunctive OKR to visual motion simulating forward and backward translation. This unexpected oculomotor behaviour, previously unexplored in birds, suggests a potential role for compensatory eye movements during flight.
Affiliation(s)
- Anthony B. Lapsansky
- Salish Sea Research Center, Northwest Indian College, Bellingham, WA 98226, USA
- Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Philipp Kreyenmeier
- Department of Ophthalmology & Visual Sciences, University of British Columbia, Vancouver, British Columbia V5Z 3N9, Canada
- Miriam Spering
- Department of Ophthalmology & Visual Sciences, University of British Columbia, Vancouver, British Columbia V5Z 3N9, Canada
- Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
- Douglas R. Wylie
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2R3, Canada
- Douglas L. Altshuler
- Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
9. Kowalski B, Huang X, Dubra A. Embedded CPU-GPU pupil tracking. Biomedical Optics Express 2024; 15:6799-6815. [PMID: 39679407] [PMCID: PMC11640584] [DOI: 10.1364/boe.541421]
Abstract
We explore camera-based pupil tracking using high-level programming in computing platforms with end-user discrete and integrated central processing units (CPUs) and graphics processing units (GPUs), seeking low calculation latencies previously achieved with specialized hardware and programming (Kowalski et al., Biomed. Opt. Express 12, 6496 (2021); DOI: 10.1364/BOE.433766). Various desktop and embedded computers were tested, some with two operating systems, using the traditional sequential pupil tracking paradigm, in which the processing of the camera image only starts after it is fully downloaded to the computer. The pupil tracking was demonstrated using two Scheimpflug optical setups, telecentric in both image and object spaces, with different optical magnifications and nominal diffraction-limited performance over an ∼18 mm full field of view illuminated with 940 nm light. Eye images from subjects with different iris and skin pigmentation captured at this wavelength suggest that the proposed pupil tracking does not suffer from ethnic bias. The optical axis of the setups is tilted at 45° to facilitate integration with other instruments without the need for beam splitting. Tracking with ∼0.9-4.4 µm precision and safe light levels was demonstrated using two complementary metal-oxide-semiconductor cameras with global shutter, operating at 438 and 1,045 fps with an ∼500 × 420 pixel region of interest (ROI), and at 633 and 1,897 fps with an ∼315 × 280 pixel ROI. For these image sizes, the desktop computers achieved calculation times as low as 0.5 ms, while low-cost embedded computers delivered calculation times in the 0.8-1.3 ms range.
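As a point of reference for what a basic pupil-tracking step looks like, here is a generic dark-pupil centroid estimate using OpenCV thresholding and contour moments; it is a simple baseline for illustration, not the optimized CPU-GPU implementation evaluated in the paper.

```python
import cv2
import numpy as np

def pupil_center(gray, dark_threshold=60):
    """Estimate the pupil center in a grayscale eye image by thresholding
    dark pixels and taking the centroid of the largest connected blob.
    """
    _, mask = cv2.threshold(gray, dark_threshold, 255, cv2.THRESH_BINARY_INV)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```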
Affiliation(s)
- Xiaojing Huang
- Department of Ophthalmology, Stanford University, Palo Alto, CA 94303, USA
- Alfredo Dubra
- Department of Ophthalmology, Stanford University, Palo Alto, CA 94303, USA
10. Lopes A, Ward AD, Cecchini M. Eye tracking in digital pathology: A comprehensive literature review. J Pathol Inform 2024; 15:100383. [PMID: 38868488] [PMCID: PMC11168484] [DOI: 10.1016/j.jpi.2024.100383]
Abstract
Eye tracking has been used for decades in an attempt to understand the cognitive processes of individuals. From memory access to problem-solving to decision-making, such insight has the potential to improve workflows and the education of students to become experts in relevant fields. Until recently, the traditional use of microscopes in pathology made eye tracking exceptionally difficult. However, the digital revolution of pathology from conventional microscopes to digital whole slide images allows new research to be conducted and information to be learned with regard to pathologists' visual search patterns and learning experiences. This has the promise to make pathology education more efficient and engaging, ultimately creating stronger and more proficient generations of pathologists to come. The goal of this review on eye tracking in pathology is to characterize and compare the visual search patterns of pathologists. The PubMed and Web of Science databases were searched using synonyms of 'pathology' AND 'eye tracking'. A total of 22 relevant full-text articles published up to and including 2023 were identified and included in this review. Thematic analysis was conducted to organize each study into one or more of the 10 themes identified to characterize the visual search patterns of pathologists: (1) effect of experience, (2) fixations, (3) zooming, (4) panning, (5) saccades, (6) pupil diameter, (7) interpretation time, (8) strategies, (9) machine learning, and (10) education. Expert pathologists were found to have higher diagnostic accuracy, fewer fixations, and shorter interpretation times than pathologists with less experience. Further, the literature on eye tracking in pathology indicates that there are several visual strategies for diagnostic interpretation of digital pathology images, but no evidence of a superior strategy exists. The educational implications of eye tracking in pathology have also been explored, but the effect of teaching novices how to search as an expert remains unclear. In this article, the main challenges and prospects of eye tracking in pathology are briefly discussed along with their implications for the field.
Affiliation(s)
- Alana Lopes
- Department of Medical Biophysics, Western University, London, ON N6A 3K7, Canada
- Gerald C. Baines Centre, London Health Sciences Centre, London, ON N6A 5W9, Canada
- Aaron D. Ward
- Department of Medical Biophysics, Western University, London, ON N6A 3K7, Canada
- Gerald C. Baines Centre, London Health Sciences Centre, London, ON N6A 5W9, Canada
- Department of Oncology, Western University, London, ON N6A 3K7, Canada
- Matthew Cecchini
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 3K7, Canada
11. Emile Tatinyuy V, Noumsi Woguia AV, Ngono JM, Fono LA. Multi-stage gaze-controlled virtual keyboard using eye tracking. PLoS One 2024; 19:e0309832. [PMID: 39466739] [PMCID: PMC11516013] [DOI: 10.1371/journal.pone.0309832]
Abstract
This study presents a novel multi-stage hierarchical approach to optimize key selection on virtual keyboards using eye gaze. Existing single-stage selection algorithms have difficulty with distant keys on large interfaces. The proposed technique divides the standard QWERTY keyboard into progressively smaller regions guided by eye movements, with boundary fixations first selecting halves and quarters to sequentially narrow the search area. Within each region, keys are highlighted one by one for selection. An experiment compared the multi-stage approach to single-step techniques, having participants copy text using eye gaze alone under both conditions. Metrics on selection speed, words per minute, and usability ratings were significantly improved with the hierarchical technique. Half and quarter selection times decreased by over 30% on average while maintaining accuracy, and overall task completion was 20% faster. Users also rated the multi-stage approach as more comfortable and easier to use. The multi-level refinement of the selection area optimized interaction efficiency for gaze-based text entry.
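The hierarchical selection logic can be pictured as repeated halving of the candidate key set based on which boundary the user fixates; a toy sketch (the actual interface also uses quarter regions and sequential key highlighting within the final region):

```python
def narrow_region(keys, gaze_side):
    """One refinement step of a hierarchical gaze keyboard: keep the half of
    the remaining keys indicated by a boundary fixation ('left' or 'right').
    """
    mid = len(keys) // 2
    return keys[:mid] if gaze_side == "left" else keys[mid:]

row = list("QWERTYUIOP")
region = narrow_region(row, "left")       # -> ['Q', 'W', 'E', 'R', 'T']
region = narrow_region(region, "right")   # -> ['E', 'R', 'T'] (next refinement)
```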
Affiliation(s)
- Joseph Mvogo Ngono
- Department of Applied Computer Science, University of Douala, Douala, Cameroon
- Louis Aimé Fono
- Department of Mathematics and Computer Science, University of Douala, Douala, Cameroon
12. Drews M, Dierkes K. Strategies for enhancing automatic fixation detection in head-mounted eye tracking. Behav Res Methods 2024; 56:6276-6298. [PMID: 38594440] [PMCID: PMC11541274] [DOI: 10.3758/s13428-024-02360-0]
Abstract
Moving through a dynamic world, humans need to intermittently stabilize gaze targets on their retina to process visual information. Overt attention being thus split into discrete intervals, the automatic detection of such fixation events is paramount to downstream analysis in many eye-tracking studies. Standard algorithms tackle this challenge in the limiting case of little to no head motion. In this static scenario, which is approximately realized for most remote eye-tracking systems, it amounts to detecting periods of relative eye stillness. In contrast, head-mounted eye trackers allow for experiments with subjects moving naturally in everyday environments. Detecting fixations in these dynamic scenarios is more challenging, since gaze-stabilizing eye movements need to be reliably distinguished from non-fixational gaze shifts. Here, we propose several strategies for enhancing existing algorithms developed for fixation detection in the static case to allow for robust fixation detection in dynamic real-world scenarios recorded with head-mounted eye trackers. Specifically, we consider (i) an optic-flow-based compensation stage explicitly accounting for stabilizing eye movements during head motion, (ii) an adaptive adjustment of algorithm sensitivity according to head-motion intensity, and (iii) a coherent tuning of all algorithm parameters. Introducing a new hand-labeled dataset, recorded with the Pupil Invisible glasses by Pupil Labs, we investigate their individual contributions. The dataset comprises both static and dynamic scenarios and is made publicly available. We show that a combination of all proposed strategies improves standard thresholding algorithms and outperforms previous approaches to fixation detection in head-mounted eye tracking.
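To illustrate strategy (ii), adapting detector sensitivity to head motion, here is a toy velocity-threshold fixation detector whose threshold is raised with head angular speed; the thresholds, the linear adaptation rule, and the minimum duration are illustrative assumptions rather than the paper's tuned parameters.

```python
import numpy as np

def detect_fixations(gaze_deg, head_speed, fs, base_thresh=30.0, k=0.5, min_dur=0.06):
    """Label samples as fixation when gaze speed stays below a threshold that
    grows with head-motion intensity.

    gaze_deg   : (N, 2) gaze angles in degrees
    head_speed : (N,) head angular speed in deg/s
    fs         : sampling rate in Hz
    Returns a list of (start_s, end_s) fixation intervals.
    """
    vel = np.linalg.norm(np.gradient(gaze_deg, axis=0), axis=1) * fs  # deg/s
    is_fix = vel < (base_thresh + k * head_speed)
    fixations, start = [], None
    for i, f in enumerate(np.append(is_fix, False)):  # trailing False closes runs
        if f and start is None:
            start = i
        elif not f and start is not None:
            if (i - start) / fs >= min_dur:
                fixations.append((start / fs, i / fs))
            start = None
    return fixations
```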
Affiliation(s)
- Michael Drews
- Pupil Labs, Sanderstraße 28, 12047, Berlin, Germany.
- Kai Dierkes
- Pupil Labs, Sanderstraße 28, 12047, Berlin, Germany.
13. Band TG, Bar-Or RZ, Ben-Ami E. Advancements in eye movement measurement technologies for assessing neurodegenerative diseases. Front Digit Health 2024; 6:1423790. [PMID: 39027628] [PMCID: PMC11254822] [DOI: 10.3389/fdgth.2024.1423790]
Abstract
Eye movements have long been recognized as a valuable indicator of neurological conditions, given the intricate involvement of multiple neurological pathways in vision-related processes, including motor and cognitive functions that manifest in rapid response times. Eye movement abnormalities can indicate the severity of a neurological condition and, in some cases, distinguish between disease phenotypes. With recent strides in imaging sensors and computational power, particularly in machine learning and artificial intelligence, there has been a notable surge in the development of technologies facilitating the extraction and analysis of eye movements to assess neurodegenerative diseases. This mini-review provides an overview of these advancements, emphasizing their potential in offering patient-friendly oculometric measures to aid in assessing patient conditions and progress. By summarizing recent technological innovations and their application in assessing neurodegenerative diseases over the past decades, this review also delves into current trends and future directions in this expanding field.
Affiliation(s)
- Rotem Z. Bar-Or
- Department of Neuroscience, NeuraLight Ltd., Tel Aviv, Israel
14. Jindal S, Yadav M, Manduchi R. Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024; 2024:604-614. [PMID: 39493731] [PMCID: PMC11529379] [DOI: 10.1109/cvprw63382.2024.00065]
Abstract
Gaze is an essential prompt for analyzing human behavior and attention. Recently, there has been an increasing interest in determining gaze direction from facial videos. However, video gaze estimation faces significant challenges, such as understanding the dynamic evolution of gaze in video sequences, dealing with static backgrounds, and adapting to variations in illumination. To address these challenges, we propose a simple and novel deep learning model designed to estimate gaze from videos, incorporating a specialized attention module. Our method employs a spatial attention mechanism that tracks spatial dynamics within videos. This technique enables accurate gaze direction prediction through a temporal sequence model, adeptly transforming spatial observations into temporal insights, thereby significantly improving gaze estimation accuracy. Additionally, our approach integrates Gaussian processes to include individual-specific traits, facilitating the personalization of our model with just a few labeled samples. Experimental results confirm the efficacy of the proposed approach, demonstrating its success in both within-dataset and cross-dataset settings. Specifically, our proposed approach achieves state-of-the-art performance on the Gaze360 dataset, improving by 2.5° without personalization. Further, by personalizing the model with just three samples, we achieved an additional improvement of 0.8°. The code and pre-trained models are available at https://github.com/jswati31/stage.
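One way to picture the Gaussian-process personalization step is as few-shot regression of a user-specific correction on top of the generic network's output; a sketch with scikit-learn follows, where the feature dimensionality, kernel, and residual-correction formulation are assumptions for illustration, not the paper's exact design.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Fit a GP on a few labeled samples from one user to predict the residual
# between the generic model's gaze prediction and that user's ground truth,
# then apply the predicted correction to future predictions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))            # placeholder embeddings of 3 labeled frames
residual = rng.normal(scale=2.0, size=3)   # placeholder yaw errors (deg) on those frames

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(feats, residual)

new_feat = rng.normal(size=(1, 8))
correction, std = gp.predict(new_feat, return_std=True)
print(f"apply correction {correction[0]:.2f} deg (uncertainty {std[0]:.2f})")
```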
15. Qian K, Arichi T, Edwards AD, Hajnal JV. Instant interaction driven adaptive gaze control interface. Sci Rep 2024; 14:11661. [PMID: 38778122] [PMCID: PMC11111737] [DOI: 10.1038/s41598-024-62365-9]
Abstract
Gaze estimation has long been recognised as having potential as the basis for human-computer interaction (HCI) systems, but usability and robustness of performance remain challenging. This work focuses on systems in which there is a live video stream showing enough of the subject's face to track eye movements and some means to infer gaze location from detected eye features. Currently, systems generally require some form of calibration or set-up procedure at the start of each user session. Here we explore some simple strategies for enabling gaze-based HCI to operate immediately and robustly without any explicit set-up tasks. We explore different choices of coordinate origin for combining extracted features from multiple subjects and the replacement of subject-specific calibration by system initiation based on prior models. Results show that referencing all extracted features to local coordinate origins determined by subject start position enables robust immediate operation. Combining this approach with an adaptive gaze estimation model using an interactive user interface enables continuous operation with 75th percentile gaze errors of 0.7°, and maximum gaze errors of 1.7° during prospective testing. These constitute state-of-the-art results and have the potential to enable a new generation of reliable gaze-based HCI systems.
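A minimal sketch of the key idea of referencing features to a per-subject local origin, here taken as the mean over the first frames of a session (an assumption standing in for the "subject start position" described above):

```python
import numpy as np

def to_local_origin(features, n_init=30):
    """Re-reference extracted gaze features to a per-subject local origin,
    estimated as the mean feature vector over the first n_init frames.
    features: (N, D) array of per-frame feature vectors.
    """
    origin = features[:n_init].mean(axis=0)
    return features - origin
```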
Affiliation(s)
- Kun Qian
- King's College London, Centre for the Developing Brain, School of Biomedical Engineering and Imaging Sciences, London, SE1 7EH, UK.
- Tomoki Arichi
- King's College London, Centre for the Developing Brain, School of Biomedical Engineering and Imaging Sciences, London, SE1 7EH, UK
- A David Edwards
- King's College London, Centre for the Developing Brain, School of Biomedical Engineering and Imaging Sciences, London, SE1 7EH, UK
- Joseph V Hajnal
- King's College London, Centre for the Developing Brain, School of Biomedical Engineering and Imaging Sciences, London, SE1 7EH, UK.
16. Johari K, Bhardwaj R, Kim JJ, Yow WQ, Tan UX. Eye movement analysis for real-world settings using segmented linear regression. Comput Biol Med 2024; 174:108364. [PMID: 38599067] [DOI: 10.1016/j.compbiomed.2024.108364]
Abstract
Eye movement analysis is critical to studying human brain phenomena such as perception, cognition, and behavior. However, under uncontrolled real-world settings, the recorded gaze coordinates (commonly used to track eye movements) are typically noisy and make it difficult to precisely track changes in the state of each phenomenon, primarily because the expected change is usually a slower transient process. This paper proposes an approach, Improved Naive Segmented Linear Regression (INSLR), which approximates the gaze coordinates with a piecewise linear function (PLF) referred to as a hypothesis. INSLR improves the existing NSLR approach by employing a hypotheses clustering algorithm, which redefines the final hypothesis estimation in two steps: (1) at each timestamp, measure the likelihood of each hypothesis in the candidate list of hypotheses by using the least-squares fit score and its distance from the k-means of the hypotheses in the list; (2) filter hypotheses based on a pre-defined threshold. We demonstrate the significance of the INSLR method in addressing the challenges of uncontrolled real-world settings, such as gaze denoising and minimizing gaze prediction errors from cost-effective devices like webcams. Experiment results show that INSLR consistently outperforms the baseline NSLR in denoising noisy signals from three eye movement datasets and minimizes the error in gaze prediction from a low-precision device for 71.1% of samples. Furthermore, this improvement in denoising quality is further validated by the improved accuracy of the oculomotor event classifier called NSLR-HMM and enhanced sensitivity in detecting variations in attention induced by distractors during online lectures.
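For intuition, the sketch below fits an independent least-squares line to each segment of a gaze trace between given breakpoints, which is the piecewise-linear-function idea underlying NSLR/INSLR; real NSLR-style algorithms additionally choose the breakpoints and score competing hypotheses, which is omitted here.

```python
import numpy as np

def piecewise_linear_fit(t, x, breakpoints):
    """Naive segmented linear regression: fit a least-squares line to each
    segment of the signal x(t) between consecutive breakpoints.

    t, x        : 1-D arrays of timestamps and gaze coordinates
    breakpoints : sorted interior breakpoint times
    Returns a list of (t_start, t_end, slope, intercept) tuples.
    """
    edges = np.concatenate(([t[0]], breakpoints, [t[-1]]))
    segments = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (t >= lo) & (t <= hi)
        slope, intercept = np.polyfit(t[m], x[m], 1)
        segments.append((lo, hi, slope, intercept))
    return segments
```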
Affiliation(s)
- Kritika Johari
- Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore.
- Rishabh Bhardwaj
- Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore
- Jung-Jae Kim
- Institute for Infocomm Research, Agency for Science Technology and Research (A*STAR), Singapore
- Wei Quin Yow
- Humanities, Arts and Social Sciences, Singapore University of Technology and Design, Singapore
- U-Xuan Tan
- Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore
17. Combe T, Fribourg R, Detto L, Norm JM. Exploring the Influence of Virtual Avatar Heads in Mixed Reality on Social Presence, Performance and User Experience in Collaborative Tasks. IEEE Transactions on Visualization and Computer Graphics 2024; 30:2206-2216. [PMID: 38437082] [DOI: 10.1109/tvcg.2024.3372051]
Abstract
In Mixed Reality (MR), users' heads are largely (if not completely) occluded by the MR Head-Mounted Display (HMD) they are wearing. As a consequence, one cannot see their facial expressions and other communication cues when interacting locally. In this paper, we investigate how displaying virtual avatars' heads on top of the (HMD-occluded) heads of participants in a Video See-Through (VST) Mixed Reality local collaborative task could improve their collaboration as well as social presence. We hypothesized that virtual heads would convey more communicative cues (such as eye direction or facial expressions) hidden by the MR HMDs and lead to better collaboration and social presence. To do so, we conducted a between-subjects study (n = 88) with two independent variables: the type of avatar (CartoonAvatar/RealisticAvatar/NoAvatar) and the level of facial expressions provided (HighExpr/LowExpr). The experiment involved two dyadic communication tasks: (i) the "20-question" game, where one participant asks questions to guess a hidden word known by the other participant, and (ii) an urban planning problem, where participants have to solve a puzzle by collaborating. Each pair of participants performed both tasks using a specific type of avatar and facial animation. Our results indicate that while adding an avatar's head does not necessarily improve social presence, the amount of facial expression provided through the social interaction does have an impact. Moreover, participants rated their performance higher when observing a realistic avatar but rated the cartoon avatars as less uncanny. Taken together, our results contribute to a better understanding of the role of partial avatars in local MR collaboration and pave the way for further research exploring collaboration in different scenarios, with different avatar types or MR setups.
18. Zhong W, Xia C, Zhang D, Han J. Uncertainty Modeling for Gaze Estimation. IEEE Transactions on Image Processing 2024; 33:2851-2866. [PMID: 38358877] [DOI: 10.1109/tip.2024.3364539]
Abstract
Gaze estimation is an important fundamental task in computer vision and medical research. Existing works have explored various effective paradigms and modules for precisely predicting eye gazes. However, uncertainty in gaze estimation, e.g., input uncertainty and annotation uncertainty, has been neglected in previous research. Existing models use a deterministic function to estimate gaze, which cannot reflect the actual situation in gaze estimation. To address this issue, we propose a probabilistic framework for gaze estimation by modeling the input uncertainty and annotation uncertainty. We first utilize probabilistic embeddings to model the input uncertainty, representing the input image as a Gaussian distribution in the embedding space. Based on the input uncertainty modeling, we give an instance-wise uncertainty estimation to measure the confidence of prediction results, which is critical in practical applications. Then, we propose a new label distribution learning method, probabilistic annotations, to model the annotation uncertainty, representing the raw hard labels as Gaussian distributions. In addition, we develop an Embedding Distribution Smoothing (EDS) module and a hard example mining method to improve the consistency between embedding distribution and label distribution. We conduct extensive experiments, demonstrating that the proposed approach achieves significant improvements over baseline and state-of-the-art methods on two widely used benchmark datasets, GazeCapture and MPIIFaceGaze, as well as our collected dataset using mobile devices.
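The "probabilistic embedding" idea can be sketched as predicting a mean and log-variance per image and sampling via the reparameterization trick; the snippet below is a schematic NumPy version with invented dimensions (in the actual model, mu and log_var would be heads of the backbone network, and the per-image variance serves as an instance-wise confidence).

```python
import numpy as np

def sample_probabilistic_embedding(mu, log_var, rng=None):
    """Treat an image's embedding as N(mu, diag(exp(log_var))) and draw a
    sample via the reparameterization trick: z = mu + sigma * eps.
    """
    rng = rng if rng is not None else np.random.default_rng()
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros(16)               # placeholder embedding mean
log_var = np.full(16, -2.0)     # placeholder log-variance (low uncertainty)
z = sample_probabilistic_embedding(mu, log_var)
```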
19. Saxena S, Fink LK, Lange EB. Deep learning models for webcam eye tracking in online experiments. Behav Res Methods 2024; 56:3487-3503. [PMID: 37608235] [PMCID: PMC11133145] [DOI: 10.3758/s13428-023-02190-6]
Abstract
Eye tracking is prevalent in scientific and commercial applications. Recent computer vision and deep learning methods enable eye tracking with off-the-shelf webcams and reduce dependence on expensive, restrictive hardware. However, such deep learning methods have not yet been applied and evaluated for remote, online psychological experiments. In this study, we tackle critical challenges faced in remote eye tracking setups and systematically evaluate appearance-based deep learning methods of gaze tracking and blink detection. From their own homes and laptops, 65 participants performed a battery of eye tracking tasks including (i) fixation, (ii) zone classification, (iii) free viewing, (iv) smooth pursuit, and (v) blink detection. Webcam recordings of the participants performing these tasks were processed offline through appearance-based models of gaze and blink detection. The task battery required different eye movements that characterized gaze and blink prediction accuracy over a comprehensive list of measures. We find the best gaze accuracy to be 2.4°, with a precision of 0.47°, which outperforms previous online eye tracking studies and reduces the gap between laboratory-based and online eye tracking performance. We release the experiment template, recorded data, and analysis code with the motivation to escalate affordable, accessible, and scalable eye tracking that has the potential to accelerate research in the fields of psychological science, cognitive neuroscience, user experience design, and human-computer interfaces.
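For context, accuracy and precision of the kind reported above are commonly computed from fixation trials as the mean angular offset to the target and the RMS of sample-to-sample differences; a sketch of those operational definitions (not necessarily the paper's exact formulas):

```python
import numpy as np

def accuracy_precision_deg(gaze_deg, target_deg):
    """Compute gaze accuracy and precision for one fixation trial.

    gaze_deg   : (N, 2) horizontal/vertical gaze angles in degrees
    target_deg : (2,) fixation target position in degrees
    Accuracy  = mean angular offset from the target.
    Precision = RMS of sample-to-sample angular differences.
    """
    offsets = np.linalg.norm(gaze_deg - np.asarray(target_deg), axis=1)
    accuracy = offsets.mean()
    s2s = np.linalg.norm(np.diff(gaze_deg, axis=0), axis=1)
    precision_rms = np.sqrt(np.mean(s2s ** 2))
    return accuracy, precision_rms
```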
Affiliation(s)
- Shreshth Saxena
- Music Depart., Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany.
- Dept. of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Ontario, Canada.
- Lauren K Fink
- Music Depart., Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Dept. of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Ontario, Canada
- Max Planck - NYU Center for Language Music & Emotion, Frankfurt am Main, Germany
- Elke B Lange
- Music Depart., Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
20. Liu J, Chi J, Yang Z. A review on personal calibration issues for video-oculographic-based gaze tracking. Front Psychol 2024; 15:1309047. [PMID: 38572211] [PMCID: PMC10987702] [DOI: 10.3389/fpsyg.2024.1309047]
Abstract
Personal calibration is a process of obtaining personal gaze-related information by focusing on some calibration benchmarks when the user initially uses a gaze tracking system. It not only provides conditions for gaze estimation, but also improves gaze tracking performance. Existing eye-tracking products often require users to conduct explicit personal calibration first, thereby tracking and interacting based on their gaze. This calibration mode has certain limitations, and there is still a significant gap between theoretical personal calibration methods and their practicality. Therefore, this paper reviews the issues of personal calibration for video-oculographic-based gaze tracking. The personal calibration information in typical gaze tracking methods is first summarized, and then some main settings in existing personal calibration processes are analyzed. Several personal calibration modes are discussed and compared subsequently. The performance of typical personal calibration methods for 2D and 3D gaze tracking is quantitatively compared through simulation experiments, highlighting the characteristics of different personal calibration settings. On this basis, we discuss several key issues in designing personal calibration. To the best of our knowledge, this is the first review on personal calibration issues for video-oculographic-based gaze tracking. It aims to provide a comprehensive overview of the research status of personal calibration, explore its main directions for further studies, and provide guidance for seeking personal calibration modes that conform to natural human-computer interaction and promoting the widespread application of eye-movement interaction.
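As a concrete reference point for the calibration modes discussed in the review, the classic explicit-calibration baseline fits a low-order polynomial from pupil(-glint) features to screen coordinates over the targets the user fixated; a minimal sketch:

```python
import numpy as np

def _design_matrix(pupil_xy):
    px, py = pupil_xy[:, 0], pupil_xy[:, 1]
    return np.column_stack([np.ones_like(px), px, py, px * py, px**2, py**2])

def fit_calibration(pupil_xy, target_xy):
    """Fit a second-order polynomial mapping from pupil(-glint) coordinates
    to screen gaze coordinates using the fixated calibration targets.
    Returns a (6, 2) coefficient matrix for the x and y screen coordinates.
    """
    coeffs, *_ = np.linalg.lstsq(_design_matrix(pupil_xy), target_xy, rcond=None)
    return coeffs

def apply_calibration(coeffs, pupil_xy):
    """Map new pupil coordinates to screen gaze estimates."""
    return _design_matrix(pupil_xy) @ coeffs
```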
Affiliation(s)
- Jiahui Liu
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
- Beijing Engineering Research Center of Industrial Spectrum Imaging, University of Science and Technology Beijing, Beijing, China
- Shunde Innovation School, University of Science and Technology Beijing, Foshan, China
- Jiannan Chi
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
- Beijing Engineering Research Center of Industrial Spectrum Imaging, University of Science and Technology Beijing, Beijing, China
- Shunde Innovation School, University of Science and Technology Beijing, Foshan, China
- Zuoyun Yang
- School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
21. Valtakari NV, Hessels RS, Niehorster DC, Viktorsson C, Nyström P, Falck-Ytter T, Kemner C, Hooge ITC. A field test of computer-vision-based gaze estimation in psychology. Behav Res Methods 2024; 56:1900-1915. [PMID: 37101100] [PMCID: PMC10990994] [DOI: 10.3758/s13428-023-02125-1]
Abstract
Computer-vision-based gaze estimation refers to techniques that estimate gaze direction directly from video recordings of the eyes or face without the need for an eye tracker. Although many such methods exist, their validation is often found in the technical literature (e.g., computer science conference papers). We aimed to (1) identify which computer-vision-based gaze estimation methods are usable by the average researcher in fields such as psychology or education, and (2) evaluate these methods. We searched for methods that do not require calibration and have clear documentation. Two toolkits, OpenFace and OpenGaze, were found to fulfill these criteria. First, we present an experiment where adult participants fixated on nine stimulus points on a computer screen. We filmed their face with a camera and processed the recorded videos with OpenFace and OpenGaze. We conclude that OpenGaze is accurate and precise enough to be used in screen-based experiments with stimuli separated by at least 11 degrees of gaze angle. OpenFace was not sufficiently accurate for such situations but can potentially be used in sparser environments. We then examined whether OpenFace could be used with horizontally separated stimuli in a sparse environment with infant participants. We compared dwell measures based on OpenFace estimates to the same measures based on manual coding. We conclude that OpenFace gaze estimates may potentially be used with measures such as relative total dwell time to sparse, horizontally separated areas of interest, but should not be used to draw conclusions about measures such as dwell duration.
Affiliation(s)
- Niilo V Valtakari
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584 CS, Utrecht, the Netherlands.
- Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584 CS, Utrecht, the Netherlands
- Diederick C Niehorster
- Lund University Humanities Lab, Lund University, Lund, Sweden
- Department of Psychology, Lund University, Lund, Sweden
- Charlotte Viktorsson
- Development and Neurodiversity Lab, Department of Psychology, Uppsala University, Uppsala, Sweden
- Pär Nyström
- Uppsala Child and Baby Lab, Department of Psychology, Uppsala University, Uppsala, Sweden
- Terje Falck-Ytter
- Development and Neurodiversity Lab, Department of Psychology, Uppsala University, Uppsala, Sweden
- Karolinska Institutet Center of Neurodevelopmental Disorders (KIND), Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Chantal Kemner
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584 CS, Utrecht, the Netherlands
- Ignace T C Hooge
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584 CS, Utrecht, the Netherlands
22. Mokatren M, Kuflik T, Shimshoni I. Calibration-Free Mobile Eye-Tracking Using Corneal Imaging. Sensors (Basel) 2024; 24:1237. [PMID: 38400392] [PMCID: PMC10892865] [DOI: 10.3390/s24041237]
Abstract
In this paper, we present and evaluate a calibration-free mobile eye-tracking system. The system's mobile device consists of three cameras: an IR eye camera, an RGB eye camera, and a front-scene RGB camera. The three cameras build a reliable corneal imaging system that is used to estimate the user's point of gaze continuously and reliably. The system auto-calibrates the device unobtrusively. Since the user is not required to follow any special instructions to calibrate the system, they can simply put on the eye tracker and start moving around using it. Deep learning algorithms together with 3D geometric computations were used to auto-calibrate the system per user. Once the model is built, a point-to-point transformation from the eye camera to the front camera is computed automatically by matching corneal and scene images, which allows the gaze point in the scene image to be estimated. The system was evaluated by users in real-life scenarios, indoors and outdoors. The average gaze error was 1.6° indoors and 1.69° outdoors, which is considered very good compared to state-of-the-art approaches.
Affiliation(s)
- Ilan Shimshoni
- The Department of Information Systems, University of Haifa, Haifa 3498838, Israel; (M.M.); (T.K.)
23. Ghosh S, Dhall A, Hayat M, Knibbe J, Ji Q. Automatic Gaze Analysis: A Survey of Deep Learning Based Approaches. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:61-84. [PMID: 37966935] [DOI: 10.1109/tpami.2023.3321337]
Abstract
Eye gaze analysis is an important research problem in the field of Computer Vision and Human-Computer Interaction. Even with notable progress in the last 10 years, automatic gaze analysis still remains challenging due to the uniqueness of eye appearance, eye-head interplay, occlusion, image quality, and illumination conditions. There are several open questions, including what are the important cues to interpret gaze direction in an unconstrained environment without prior knowledge and how to encode them in real-time. We review the progress across a range of gaze analysis tasks and applications to elucidate these fundamental questions, identify effective methods in gaze analysis, and provide possible future directions. We analyze recent gaze estimation and segmentation methods, especially in the unsupervised and weakly supervised domain, based on their advantages and reported evaluation metrics. Our analysis shows that the development of a robust and generic gaze analysis method still needs to address real-world challenges such as unconstrained setup and learning with less supervision. We conclude by discussing future research directions for designing a real-world gaze analysis system that can propagate to other domains including Computer Vision, Augmented Reality (AR), Virtual Reality (VR), and Human Computer Interaction (HCI).
24. Tian S, Tu H, He L, Wu YI, Zheng X. FreeGaze: A Framework for 3D Gaze Estimation Using Appearance Cues from a Facial Video. Sensors (Basel) 2023; 23:9604. [PMID: 38067977] [PMCID: PMC10708753] [DOI: 10.3390/s23239604]
Abstract
Gaze is a significant behavioral characteristic that can be used to reflect a person's attention. In recent years, there has been a growing interest in estimating gaze from facial videos. However, gaze estimation remains a challenging problem due to variations in appearance and head poses. To address this, a framework for 3D gaze estimation using appearance cues is developed in this study. The framework begins with an end-to-end approach to detect facial landmarks. Subsequently, we employ a normalization method, improve it using orthogonal matrices, and conduct comparative experiments to show that the improved normalization method achieves higher accuracy and lower computational time in gaze estimation. Finally, we introduce a dual-branch convolutional neural network, named FG-Net, which processes the normalized images and extracts eye and face features through two branches. The extracted multi-features are then integrated and input into a fully connected layer to estimate the 3D gaze vectors. To evaluate the performance of our approach, we conduct ten-fold cross-validation experiments on two public datasets, namely MPIIGaze and EyeDiap, achieving remarkable accuracies of 3.11° and 2.75°, respectively. The results demonstrate the high effectiveness of our proposed framework, showcasing its state-of-the-art performance in 3D gaze estimation.
Affiliation(s)
- Shang Tian
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Sichuan University, Chengdu 610065, China
- Haiyan Tu
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Sichuan University, Chengdu 610065, China
- Ling He
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Yue Ivan Wu
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Xiujuan Zheng
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- Key Laboratory of Information and Automation Technology of Sichuan Province, Sichuan University, Chengdu 610065, China
Collapse
|
25
|
Hou JC, Li CJ, Chou CC, Shih YC, Fong SL, Dufau SE, Lin PT, Tsao Y, McGonigal A, Yu HY. Artificial Intelligence-Based Face Transformation in Patient Seizure Videos for Privacy Protection. MAYO CLINIC PROCEEDINGS. DIGITAL HEALTH 2023; 1:619-628. [PMID: 40206307 PMCID: PMC11975641 DOI: 10.1016/j.mcpdig.2023.10.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 04/11/2025]
Abstract
Objective: To investigate the feasibility and accuracy of artificial intelligence (AI) methods of facial deidentification in hospital-recorded epileptic seizure videos, for improved patient privacy protection while preserving clinically important features of seizure semiology. Patients and Methods: Videos of epileptic seizures displaying seizure-related involuntary facial changes were selected from recordings at Taipei Veterans General Hospital Epilepsy Unit (between August 1, 2020 and February 28, 2023), and a single representative video frame was prepared per seizure. We tested 3 AI transformation models: (1) morphing the original facial image with a different male face; (2) substitution with a female face; and (3) cartoonization. Facial deidentification and preservation of clinically relevant facial detail were calculated based on: (1) scoring by 5 independent expert clinicians and (2) objective computation. Results: According to the clinician scoring of 26 facial frames in 16 patients, the best compromise between deidentification and preservation of facial semiology was the cartoonization model. A male facial morphing model was superior to the cartoonization model for deidentification, but clinical detail was sacrificed. Objective similarity testing of video data reported deidentification scores in agreement with the clinicians' scores; however, preservation of semiology gave mixed results likely due to inadequate existing comparative databases. Conclusion: Artificial intelligence-based face transformation of medical seizure videos is feasible and may be useful for patient privacy protection. In our study, the cartoonization approach provided the best compromise between deidentification and preservation of seizure semiology.
Collapse
Affiliation(s)
- Jen-Cheng Hou
- Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
| | - Chin-Jou Li
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
| | - Chien-Chen Chou
- Department of Neurology, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan
- School of Medicine, National Yang Ming Chiao Tung University College of Medicine, Taipei, Taiwan
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Yen-Cheng Shih
- Department of Neurology, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan
- School of Medicine, National Yang Ming Chiao Tung University College of Medicine, Taipei, Taiwan
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Si-Lei Fong
- Department of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Stephane E. Dufau
- Laboratoire de psychologie cognitive (UMR7290), CNRS & Aix-Marseille University, France
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
- CROSSING, IRL 2010, CNRS, Adelaide, Australia
| | - Po-Tso Lin
- Department of Neurology, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan
- School of Medicine, National Yang Ming Chiao Tung University College of Medicine, Taipei, Taiwan
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Yu Tsao
- Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
| | - Aileen McGonigal
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
- CROSSING, IRL 2010, CNRS, Adelaide, Australia
- Department of Neurosciences, Mater Misericordiae Hospital, Brisbane, Queensland, Australia and Mater Research Institute, Faculty of Medicine, University of Queensland, Australia
| | - Hsiang-Yu Yu
- Department of Neurology, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan
- School of Medicine, National Yang Ming Chiao Tung University College of Medicine, Taipei, Taiwan
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
| |
Collapse
|
26
|
Lesport Q, Joerger G, Kaminski HJ, Girma H, McNett S, Abu-Rub M, Garbey M. Eye Segmentation Method for Telehealth: Application to the Myasthenia Gravis Physical Examination. SENSORS (BASEL, SWITZERLAND) 2023; 23:7744. [PMID: 37765800 PMCID: PMC10536520 DOI: 10.3390/s23187744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/19/2023] [Revised: 08/28/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023]
Abstract
Due to the precautions put in place during the COVID-19 pandemic, utilization of telemedicine has increased quickly for patient care and clinical trials. Unfortunately, teleconsultation is closer to a video conference than to a medical consultation, with current solutions placing the patient and doctor in an evaluation that relies entirely on a two-dimensional view of each other. We are developing a patented telehealth platform that assists with diagnostic testing of ocular manifestations of myasthenia gravis. We present a hybrid algorithm combining deep learning with computer vision to give quantitative metrics of ptosis and ocular muscle fatigue leading to eyelid droop and diplopia. The method works both on a fixed image and frame by frame on video in real time, allowing capture of dynamic muscular weakness during the examination. We then use signal processing and filtering to derive robust metrics of ptosis and ocular misalignment. In our design, we have prioritized the robustness of the method over the accuracy obtainable in controlled conditions, in order to provide a method that can operate in standard telehealth conditions. The approach is general and can be applied to many disorders of ocular motility and ptosis.
Collapse
Affiliation(s)
- Quentin Lesport
- Department of Surgery, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA;
| | | | - Henry J. Kaminski
- Department of Neurology & Rehabilitation Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; (H.J.K.); (H.G.); (S.M.); (M.A.-R.)
| | - Helen Girma
- Department of Neurology & Rehabilitation Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; (H.J.K.); (H.G.); (S.M.); (M.A.-R.)
| | - Sienna McNett
- Department of Neurology & Rehabilitation Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; (H.J.K.); (H.G.); (S.M.); (M.A.-R.)
| | - Mohammad Abu-Rub
- Department of Neurology & Rehabilitation Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; (H.J.K.); (H.G.); (S.M.); (M.A.-R.)
| | - Marc Garbey
- Department of Surgery, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA;
- Care Constitution Corp., Newark, DE 19702, USA;
- LaSIE, UMR CNRS 7356, Université de la Rochelle, 17000 La Rochelle, France
| |
Collapse
|
27
|
Kopecek M, Kremlacek J. Eye-tracking control of an adjustable electric bed: construction and validation by immobile patients with multiple sclerosis. J Neuroeng Rehabil 2023; 20:75. [PMID: 37296480 PMCID: PMC10251586 DOI: 10.1186/s12984-023-01193-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2022] [Accepted: 05/10/2023] [Indexed: 06/12/2023] Open
Abstract
BACKGROUND In severe conditions of limited motor abilities, frequent position changes for work or passive and active rest are essential bedside activities to prevent further health complications. We aimed to develop a system using eye movements for bed positioning and to verify its functionality in a control group and a group of patients with significant motor limitation caused by multiple sclerosis. METHODS The eye-tracking system utilized an innovative digital-to-analog converter module to control the positioning bed via a novel graphical user interface. We verified the ergonomics and usability of the system by performing a fixed sequence of positioning tasks, in which the leg and head support was repeatedly raised and then lowered. Fifteen women and eleven men aged 42.7 ± 15.9 years in the control group and nine women and eight men aged 60.3 ± 9.14 years in the patient group participated in the experiment. The degree of disability, according to the Expanded Disability Status Scale (EDSS), ranged from 7 to 9.5 points in the patients. We assessed the speed and efficiency of the bed control and the improvement during testing. In a questionnaire, we evaluated satisfaction with the system. RESULTS The control group mastered the task in 40.2 s (median) with an interquartile interval from 34.5 to 45.5 s, and patients mastered the task in 56.5 s (median) with an interquartile interval from 46.5 to 64.9 s. The efficiency of solving the task (100% corresponds to an optimal performance) was 86.3 (81.6; 91.0) % for the control group and 72.1 (63.0; 75.2) % for the patient group. Throughout testing, the patients learned to communicate with the system, and their efficiency and task time improved. A correlation analysis showed a negative relationship (rho = - 0.587) between efficiency improvement and the degree of impairment (EDSS). In the control group, the learning was not significant. On the questionnaire survey, sixteen patients reported gaining confidence in bed control. Seven patients preferred the offered form of bed control, and in six cases, they would choose another form of interface. CONCLUSIONS The proposed system and communication through eye movements are reliable for positioning the bed in people affected by advanced multiple sclerosis. Seven of 17 patients indicated that they would choose this system for bed control and wished to extend it for another application.
Collapse
Affiliation(s)
- Martin Kopecek
- Department of Medical Biophysics, Faculty of Medicine in Hradec Kralove, Charles University, Simkova 870, Hradec Kralove, Czech Republic
| | - Jan Kremlacek
- Department of Medical Biophysics, Faculty of Medicine in Hradec Kralove, Charles University, Simkova 870, Hradec Kralove, Czech Republic
| |
Collapse
|
28
|
Zhou J, Li G, Shi F, Guo X, Wan P, Wang M. EM-Gaze: eye context correlation and metric learning for gaze estimation. Vis Comput Ind Biomed Art 2023; 6:8. [PMID: 37145171 PMCID: PMC10163188 DOI: 10.1186/s42492-023-00135-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2022] [Accepted: 04/15/2023] [Indexed: 05/06/2023] Open
Abstract
In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gazes from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze point regression error, while significantly reducing gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression performance. Subsequently, through a unified perspective for gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both gaze point regression and quadrant classification performances are improved. The experiments demonstrate that the proposed method outperforms existing gaze-estimation methods on the GazeCapture and MPIIFaceGaze datasets.
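A minimal sketch of the two-part supervision implied above is given below; the loss forms, weights, and the contrastive-style quadrant term are assumptions, not the published EM-Gaze objective.

import torch
import torch.nn.functional as F

def em_gaze_loss(embed, gaze_pred, gaze_gt, quadrant, margin=1.0, lam=0.1):
    # 2D gaze point regression term
    reg = F.smooth_l1_loss(gaze_pred, gaze_gt)
    # metric-learning term over display-quadrant labels: pull same-quadrant
    # embeddings together, push different-quadrant embeddings apart
    dist = torch.cdist(embed, embed)
    same = (quadrant[:, None] == quadrant[None, :]).float()
    eye = torch.eye(len(embed), device=embed.device)
    pos = (dist * (same - eye)).sum() / (same - eye).sum().clamp(min=1)
    neg = (F.relu(margin - dist) * (1 - same)).sum() / (1 - same).sum().clamp(min=1)
    return reg + lam * (pos + neg)

loss = em_gaze_loss(torch.randn(8, 16), torch.rand(8, 2), torch.rand(8, 2),
                    torch.randint(0, 4, (8,)))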
Collapse
Affiliation(s)
- Jinchao Zhou
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, 100191, China
| | - Guoan Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, 100191, China
| | - Feng Shi
- Kuaishou Technology, Beijing, 100085, China
| | | | | | - Miao Wang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, 100191, China.
| |
Collapse
|
29
|
Fan S, Shen Z, Jiang M, Koenig BL, Kankanhalli MS, Zhao Q. Emotional Attention: From Eye Tracking to Computational Modeling. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:1682-1699. [PMID: 35446761 DOI: 10.1109/tpami.2022.3169234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
Attending selectively to emotion-eliciting stimuli is intrinsic to human vision. In this research, we investigate how emotion-elicitation features of images relate to human selective attention. We create the EMOtional attention dataset (EMOd). It is a set of diverse emotion-eliciting images, each with (1) eye-tracking data from 16 subjects, (2) image context labels at both object- and scene-level. Based on analyses of human perceptions of EMOd, we report an emotion prioritization effect: emotion-eliciting content draws stronger and earlier human attention than neutral content, but this advantage diminishes dramatically after initial fixation. We find that human attention is more focused on awe eliciting and aesthetic vehicle and animal scenes in EMOd. Aiming to model the above human attention behavior computationally, we design a deep neural network (CASNet II), which includes a channel weighting subnetwork that prioritizes emotion-eliciting objects, and an Atrous Spatial Pyramid Pooling (ASPP) structure that learns the relative importance of image regions at multiple scales. Visualizations and quantitative analyses demonstrate the model's ability to simulate human attention behavior, especially on emotion-eliciting content.
Collapse
|
30
|
Holmqvist K, Örbom SL, Hooge ITC, Niehorster DC, Alexander RG, Andersson R, Benjamins JS, Blignaut P, Brouwer AM, Chuang LL, Dalrymple KA, Drieghe D, Dunn MJ, Ettinger U, Fiedler S, Foulsham T, van der Geest JN, Hansen DW, Hutton SB, Kasneci E, Kingstone A, Knox PC, Kok EM, Lee H, Lee JY, Leppänen JM, Macknik S, Majaranta P, Martinez-Conde S, Nuthmann A, Nyström M, Orquin JL, Otero-Millan J, Park SY, Popelka S, Proudlock F, Renkewitz F, Roorda A, Schulte-Mecklenbeck M, Sharif B, Shic F, Shovman M, Thomas MG, Venrooij W, Zemblys R, Hessels RS. Eye tracking: empirical foundations for a minimal reporting guideline. Behav Res Methods 2023; 55:364-416. [PMID: 35384605 PMCID: PMC9535040 DOI: 10.3758/s13428-021-01762-8] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/29/2021] [Indexed: 11/08/2022]
Abstract
In this paper, we present a review of how the various aspects of any study using an eye tracker (such as the instrument, methodology, environment, participant, etc.) affect the quality of the recorded eye-tracking data and the obtained eye-movement and gaze measures. We take this review to represent the empirical foundation for reporting guidelines of any study involving an eye tracker. We compare this empirical foundation to five existing reporting guidelines and to a database of 207 published eye-tracking studies. We find that reporting guidelines vary substantially and do not match with actual reporting practices. We end by deriving a minimal, flexible reporting guideline based on empirical research (Section "An empirically based minimal reporting guideline").
Collapse
Affiliation(s)
- Kenneth Holmqvist
- Department of Psychology, Nicolaus Copernicus University, Torun, Poland.
- Department of Computer Science and Informatics, University of the Free State, Bloemfontein, South Africa.
- Department of Psychology, Regensburg University, Regensburg, Germany.
| | - Saga Lee Örbom
- Department of Psychology, Regensburg University, Regensburg, Germany
| | - Ignace T C Hooge
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
| | - Diederick C Niehorster
- Lund University Humanities Lab and Department of Psychology, Lund University, Lund, Sweden
| | - Robert G Alexander
- Department of Ophthalmology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | | | - Jeroen S Benjamins
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
- Social, Health and Organizational Psychology, Utrecht University, Utrecht, The Netherlands
| | - Pieter Blignaut
- Department of Computer Science and Informatics, University of the Free State, Bloemfontein, South Africa
| | | | - Lewis L Chuang
- Department of Ergonomics, Leibniz Institute for Working Environments and Human Factors, Dortmund, Germany
- Institute of Informatics, LMU Munich, Munich, Germany
| | | | - Denis Drieghe
- School of Psychology, University of Southampton, Southampton, UK
| | - Matt J Dunn
- School of Optometry and Vision Sciences, Cardiff University, Cardiff, UK
| | | | - Susann Fiedler
- Vienna University of Economics and Business, Vienna, Austria
| | - Tom Foulsham
- Department of Psychology, University of Essex, Essex, UK
| | | | - Dan Witzner Hansen
- Machine Learning Group, Department of Computer Science, IT University of Copenhagen, Copenhagen, Denmark
| | | | - Enkelejda Kasneci
- Human-Computer Interaction, University of Tübingen, Tübingen, Germany
| | | | - Paul C Knox
- Department of Eye and Vision Science, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool, UK
| | - Ellen M Kok
- Department of Education and Pedagogy, Division Education, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands
- Department of Online Learning and Instruction, Faculty of Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands
| | - Helena Lee
- University of Southampton, Southampton, UK
| | - Joy Yeonjoo Lee
- School of Health Professions Education, Faculty of Health, Medicine, and Life Sciences, Maastricht University, Maastricht, The Netherlands
| | - Jukka M Leppänen
- Department of Psychology and Speed-Language Pathology, University of Turku, Turku, Finland
| | - Stephen Macknik
- Department of Ophthalmology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Päivi Majaranta
- TAUCHI Research Center, Computing Sciences, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
| | - Susana Martinez-Conde
- Department of Ophthalmology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Antje Nuthmann
- Institute of Psychology, University of Kiel, Kiel, Germany
| | - Marcus Nyström
- Lund University Humanities Lab, Lund University, Lund, Sweden
| | - Jacob L Orquin
- Department of Management, Aarhus University, Aarhus, Denmark
- Center for Research in Marketing and Consumer Psychology, Reykjavik University, Reykjavik, Iceland
| | - Jorge Otero-Millan
- Herbert Wertheim School of Optometry and Vision Science, University of California, Berkeley, CA, USA
| | - Soon Young Park
- Comparative Cognition, Messerli Research Institute, University of Veterinary Medicine Vienna, Medical University of Vienna, Vienna, Austria
| | - Stanislav Popelka
- Department of Geoinformatics, Palacký University Olomouc, Olomouc, Czech Republic
| | - Frank Proudlock
- The University of Leicester Ulverscroft Eye Unit, Department of Neuroscience, Psychology and Behaviour, University of Leicester, Leicester, UK
| | - Frank Renkewitz
- Department of Psychology, University of Erfurt, Erfurt, Germany
| | - Austin Roorda
- Herbert Wertheim School of Optometry and Vision Science, University of California, Berkeley, CA, USA
| | | | - Bonita Sharif
- School of Computing, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
| | - Frederick Shic
- Center for Child Health, Behavior and Development, Seattle Children's Research Institute, Seattle, WA, USA
- Department of General Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
| | - Mark Shovman
- Eyeviation Systems, Herzliya, Israel
- Department of Industrial Design, Bezalel Academy of Arts and Design, Jerusalem, Israel
| | - Mervyn G Thomas
- The University of Leicester Ulverscroft Eye Unit, Department of Neuroscience, Psychology and Behaviour, University of Leicester, Leicester, UK
| | - Ward Venrooij
- Electrical Engineering, Mathematics and Computer Science (EEMCS), University of Twente, Enschede, The Netherlands
| | | | - Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
| |
Collapse
|
31
|
Mokatren M, Kuflik T, Shimshoni I. 3D Gaze Estimation Using RGB-IR Cameras. SENSORS (BASEL, SWITZERLAND) 2022; 23:381. [PMID: 36616978 PMCID: PMC9823916 DOI: 10.3390/s23010381] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/21/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 06/17/2023]
Abstract
In this paper, we present a framework for 3D gaze estimation intended to identify the user's focus of attention in a corneal imaging system. The framework uses a headset that consists of three cameras, a scene camera and two eye cameras: an IR camera and an RGB camera. The IR camera is used to continuously and reliably track the pupil and the RGB camera is used to acquire corneal images of the same eye. Deep learning algorithms are trained to detect the pupil in IR and RGB images and to compute a per user 3D model of the eye in real time. Once the 3D model is built, the 3D gaze direction is computed starting from the eyeball center and passing through the pupil center to the outside world. This model can also be used to transform the pupil position detected in the IR image into its corresponding position in the RGB image and to detect the gaze direction in the corneal image. This technique circumvents the problem of pupil detection in RGB images, which is especially difficult and unreliable when the scene is reflected in the corneal images. In our approach, the auto-calibration process is transparent and unobtrusive. Users do not have to be instructed to look at specific objects to calibrate the eye tracker. They need only to act and gaze normally. The framework was evaluated in a user study in realistic settings and the results are promising. It achieved a very low 3D gaze error (2.12°) and very high accuracy in acquiring corneal images (intersection over union-IoU = 0.71). The framework may be used in a variety of real-world mobile scenarios (indoors, indoors near windows and outdoors) with high accuracy.
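As a rough illustration of the final step described above (not the authors' code), once the per-user 3D eye model provides the eyeball center and the 3D pupil center, the gaze direction is simply the unit ray from the former through the latter; the numeric values below are placeholders.

import numpy as np

def gaze_direction(eyeball_center, pupil_center):
    # ray from the eyeball center through the pupil center, normalized
    d = np.asarray(pupil_center, float) - np.asarray(eyeball_center, float)
    return d / np.linalg.norm(d)

def angular_error_deg(g_est, g_ref):
    cos = np.clip(np.dot(g_est, g_ref), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

g = gaze_direction([0.0, 0.0, 0.0], [1.5, -2.0, 11.0])      # example coordinates (mm)
print(angular_error_deg(g, np.array([0.0, 0.0, 1.0])))      # deviation from a reference axis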
Collapse
|
32
|
Ban S, Lee YJ, Kim KR, Kim JH, Yeo WH. Advances in Materials, Sensors, and Integrated Systems for Monitoring Eye Movements. BIOSENSORS 2022; 12:1039. [PMID: 36421157 PMCID: PMC9688058 DOI: 10.3390/bios12111039] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Revised: 11/11/2022] [Accepted: 11/13/2022] [Indexed: 06/16/2023]
Abstract
Eye movements show primary responses that reflect humans' voluntary intention and conscious selection. Because visual perception is one of the fundamental sensory interactions in the brain, eye movements contain critical information regarding physical/psychological health, perception, intention, and preference. With the advancement of wearable device technologies, the performance of monitoring eye tracking has been significantly improved. It also has led to myriad applications for assisting and augmenting human activities. Among them, electrooculograms, measured by skin-mounted electrodes, have been widely used to track eye motions accurately. In addition, eye trackers that detect reflected optical signals offer alternative ways without using wearable sensors. This paper outlines a systematic summary of the latest research on various materials, sensors, and integrated systems for monitoring eye movements and enabling human-machine interfaces. Specifically, we summarize recent developments in soft materials, biocompatible materials, manufacturing methods, sensor functions, systems' performances, and their applications in eye tracking. Finally, we discuss the remaining challenges and suggest research directions for future studies.
Collapse
Affiliation(s)
- Seunghyeb Ban
- School of Engineering and Computer Science, Washington State University, Vancouver, WA 98686, USA
- IEN Center for Human-Centric Interfaces and Engineering, Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Yoon Jae Lee
- IEN Center for Human-Centric Interfaces and Engineering, Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Ka Ram Kim
- IEN Center for Human-Centric Interfaces and Engineering, Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Jong-Hoon Kim
- School of Engineering and Computer Science, Washington State University, Vancouver, WA 98686, USA
- Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
| | - Woon-Hong Yeo
- IEN Center for Human-Centric Interfaces and Engineering, Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University School of Medicine, Atlanta, GA 30332, USA
- Neural Engineering Center, Institute for Materials, Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA 30332, USA
| |
Collapse
|
33
|
Episode-based personalization network for gaze estimation without calibration. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
34
|
Kim HG, Chang JY. ArbGaze: Gaze Estimation from Arbitrary-Sized Low-Resolution Images. SENSORS (BASEL, SWITZERLAND) 2022; 22:7427. [PMID: 36236526 PMCID: PMC9571979 DOI: 10.3390/s22197427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 09/23/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
The goal of gaze estimation is to estimate a gaze vector from an image containing a face or eye(s). Most existing studies use pre-defined fixed-resolution images to estimate the gaze vector. However, images captured in in-the-wild environments may have various resolutions, and variation in resolution can degrade gaze estimation performance. To address this problem, a gaze estimation method for arbitrary-sized low-resolution images is proposed. The basic idea of the proposed method is to combine knowledge distillation and feature adaptation. Knowledge distillation helps the gaze estimator for arbitrary-sized images generate a feature map similar to that obtained from a high-resolution image. Feature adaptation makes the feature map adaptive to the resolution of the input image by using the low-resolution image together with its scale information. The ablation study shows that combining these two ideas substantially improves gaze estimation performance. Experiments with various backbones also demonstrate that the proposed method generalizes to other widely used gaze estimation models.
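A minimal sketch of the combined objective implied by the abstract, assuming an L1 gaze loss, an MSE feature-distillation term, and an illustrative weighting (not the authors' exact formulation):

import torch
import torch.nn.functional as F

def arbgaze_loss(student_feat, teacher_feat, gaze_pred, gaze_gt, lam=1.0):
    # knowledge distillation: the low-resolution student mimics the feature
    # map the teacher produces from the high-resolution image
    distill = F.mse_loss(student_feat, teacher_feat.detach())
    # gaze regression on the predicted (pitch, yaw) angles
    regress = F.l1_loss(gaze_pred, gaze_gt)
    return regress + lam * distill

loss = arbgaze_loss(torch.randn(4, 128), torch.randn(4, 128),
                    torch.randn(4, 2), torch.randn(4, 2))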
Collapse
Affiliation(s)
- Hee Gyoon Kim
- Department of Electronics and Communications Engineering, Kwangwoon University, Seoul 01897, Korea
| | - Ju Yong Chang
- Department of Electronics and Communications Engineering, Kwangwoon University, Seoul 01897, Korea
| |
Collapse
|
35
|
Sakthivelpathi V, Qian Z, Li T, Ahn S, Dichiara AB, Soetedjo R, Chung JH. Capacitive Eye Tracker Made of Fractured Carbon Nanotube-Paper Composites for Wearable Applications. SENSORS AND ACTUATORS. A, PHYSICAL 2022; 344:113739. [PMID: 40012761 PMCID: PMC11864793 DOI: 10.1016/j.sna.2022.113739] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/28/2025]
Abstract
The uniqueness of eyes, facial geometry, and gaze direction makes eye tracking a very challenging technological pursuit. Although camera-based eye-tracking systems are popular, the obtrusiveness of their bulky equipment along with their high computational cost and power consumption is considered problematic for wearable applications. Noncontact gaze monitoring using capacitive sensing technique has been attempted but failed due to low sensitivity and parasitic capacitance. Here, we study the interaction between a novel capacitive sensor and eye movement for wearable eye-tracking. The capacitive sensors are made of a pair of asymmetric electrodes; one comprising carbon nanotube-paper composite fibers (CPC) and the other being a rectangular metal electrode. The interaction between the asymmetric sensor and a spherical object mimicking an eyeball is analyzed numerically. Using a face simulator, both single- and differential capacitive measurements are characterized with respect to proximity, geometry, and human body charge. Using a prototype eye tracker, multiple sensor locations are studied to determine the optimal configurations. The capacitive responses to vertical and horizontal gaze directions are analyzed in comparison to those of a commercial eye tracking system. The performance is demonstrated for sensitive eye-movement tracking, closed-eye monitoring, and human-machine interface. This research has important implications for the development of capacitive, wearable eye trackers, which can facilitate fields of human-machine interface, cognitive monitoring, neuroscience research, and rehabilitation.
Collapse
Affiliation(s)
| | - Zhongjie Qian
- Department of Mechanical Engineering, University of Washington, Box 352600, Seattle, WA 98195, USA
| | - Tianyi Li
- Department of Mechanical Engineering, University of Washington, Box 352600, Seattle, WA 98195, USA
| | - Sanggyeun Ahn
- Department of Industrial Design, University of Washington, Seattle, WA 98195, USA
| | - Anthony B. Dichiara
- Department of Environment and Forest Sciences, University of Washington Seattle, WA 98195, USA
| | - Robijanto Soetedjo
- Department of Physiology and Biophysics, University of Washington, Box 357290, Seattle, WA 98195
| | - Jae-Hyun Chung
- Department of Mechanical Engineering, University of Washington, Box 352600, Seattle, WA 98195, USA
| |
Collapse
|
36
|
Hsu WY, Cheng YW, Tsai CB. An Effective Algorithm to Analyze the Optokinetic Nystagmus Waveforms from a Low-Cost Eye Tracker. Healthcare (Basel) 2022; 10:healthcare10071281. [PMID: 35885808 PMCID: PMC9320438 DOI: 10.3390/healthcare10071281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Revised: 07/06/2022] [Accepted: 07/08/2022] [Indexed: 11/16/2022] Open
Abstract
Objective: Most neurological diseases are accompanied by changes in the oculomotor nerve. Analysis of different types of eye movements can provide important information in ophthalmology, neurology, and psychology. At present, many researchers use optokinetic nystagmus (OKN) to study the physiological phenomenon of eye movement. OKN is an involuntary eye movement induced by a large moving surrounding visual field. It consists of a slow pursuit eye movement, called the “slow phase” (SP), and a fast re-fixating saccadic eye movement, called the “fast phase” (FP). Non-invasive video-oculography has been used increasingly in eye movement research. However, research-grade eye trackers are often expensive and less accessible to most researchers. Using a low-cost eye tracker to quantitatively measure OKN eye movement would facilitate the general application of eye movement research. Methods & Results: We design an analytical algorithm to quantitatively measure OKN eye movements on a low-cost eye tracker. Using simple conditional filtering, accurate FP positions can be obtained quickly. Accurate FP recognition greatly aids the subsequent calculation of eye movement analysis parameters, such as mean slow phase velocity (MSPV), which can serve as a reference index for patients with strabismus and other eye diseases. Conclusions: Experimental results indicate that the proposed method achieves faster and better results than other approaches, and provides an effective algorithm to calculate and analyze the FP positions of OKN waveforms.
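As a rough illustration (not the published algorithm), fast phases can be flagged with a simple velocity threshold and MSPV computed from the remaining slow-phase samples; the threshold, sampling rate, and toy trace below are assumptions.

import numpy as np

def okn_mspv(position_deg, fs_hz, fp_threshold_deg_s=40.0):
    velocity = np.gradient(position_deg) * fs_hz        # deg/s
    fast_phase = np.abs(velocity) > fp_threshold_deg_s  # candidate fast-phase samples
    slow_velocity = velocity[~fast_phase]
    mspv = np.abs(slow_velocity).mean() if slow_velocity.size else np.nan
    return mspv, fast_phase

fs = 250.0
t = np.arange(0, 4, 1 / fs)
slow = 10.0 * (t % 0.5)                      # 10 deg/s slow-phase ramps
trace = slow - 5.0 * np.floor(t / 0.5)       # abrupt resets stand in for fast phases
mspv, fp_mask = okn_mspv(trace, fs)
print(round(mspv, 1))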
Collapse
Affiliation(s)
- Wei-Yen Hsu
- Department of Information Management, National Chung Cheng University, Chiayi 621, Taiwan; (W.-Y.H.); (Y.-W.C.)
- Center for Innovative Research on Aging Society, National Chung Cheng University, Chiayi 621, Taiwan
- Advanced Institute of Manufacturing with High-Tech Innovations, National Chung Cheng University, Chiayi 621, Taiwan
| | - Ya-Wen Cheng
- Department of Information Management, National Chung Cheng University, Chiayi 621, Taiwan; (W.-Y.H.); (Y.-W.C.)
| | - Chong-Bin Tsai
- Department of Ophthalmology, Ditmanson Medical Foundation Chiayi Christian Hospital, Chiayi 600, Taiwan
- Department of Optometry, College of Medical and Health Science, Asia University, Chiayi 600, Taiwan
- Correspondence: ; Tel.: +886-5-2765041 #8503
| |
Collapse
|
37
|
Bae Y. Decreased Saccadic Eye Movement Speed Correlates with Dynamic Balance in Older Adults. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19137842. [PMID: 35805500 PMCID: PMC9266155 DOI: 10.3390/ijerph19137842] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 12/10/2022]
Abstract
This study aimed to determine how saccadic eye movement (SEM) speed changes with age in the elderly (young older: 65–72 years; middle older: 73–80 years; old older: over 81 years) and to identify the correlation between SEM speed and balance ability. We recruited 128 elderly individuals and measured their SEM speed and balance. SEM speed was measured with the target appearing once every 2 s (0.5 Hz), twice per second (2 Hz), or three times per second (3 Hz). The SEM performance time was 1 min, with a washout period of 1 min. Balance ability was measured using the functional reach test (FRT), timed up-and-go test (TUG), and walking speed (WS). As age increased, FRT, TUG, and WS decreased, and SEM speed at 3 Hz was significantly lower in old older than in young older adults. In all participants, the 3 Hz SEM speed was significantly correlated with TUG and WS. Therefore, SEM speed may be inadequate or decreased in response to rapid external environmental stimuli and may be a factor that impairs balance ability in older adults.
Collapse
Affiliation(s)
- Youngsook Bae
- Department of Physical Therapy, College of Health Science, Gachon University, 191 Hambangmoe-ro, Yeonsu-gu, Incheon 21936, Korea
| |
Collapse
|
38
|
High-Accuracy 3D Gaze Estimation with Efficient Recalibration for Head-Mounted Gaze Tracking Systems. SENSORS 2022; 22:s22124357. [PMID: 35746135 PMCID: PMC9231356 DOI: 10.3390/s22124357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 06/04/2022] [Accepted: 06/07/2022] [Indexed: 11/16/2022]
Abstract
The problem of 3D gaze estimation can be viewed as inferring the visual axes from eye images. It remains a challenge, especially for head-mounted gaze trackers (HMGTs) with a simple camera setup, due to the complexity of the human visual system. Although mainstream regression-based methods can establish the mapping relationship between eye image features and the gaze point to calculate the visual axes, they may lead to inadequate fitting performance and appreciable extrapolation errors. Moreover, regression-based methods suffer from a degraded user experience because of the increased burden of recalibration procedures when slippage occurs between the HMGT and the head. To address these issues, a high-accuracy 3D gaze estimation method along with an efficient recalibration approach is proposed with head pose tracking in this paper. The two key parameters, the eyeball center and the camera optical center, are estimated in the head frame with a geometry-based method, and a mapping relationship between two direction features is proposed to calculate the direction of the visual axis. As the direction features are formulated with the accurately estimated parameters, the complexity of the mapping relationship can be reduced and a better fitting performance can be achieved. To prevent noticeable extrapolation errors, direction features with uniform angular intervals over the human field of view are retrieved for fitting the mapping. Additionally, an efficient single-point recalibration method is proposed with an updated eyeball coordinate system, which significantly reduces the burden of calibration procedures. Our experimental results show that the calibration and recalibration methods improve gaze estimation accuracy by 35 percent (from a mean error of 2.00 degrees to 1.31 degrees) and 30 percent (from a mean error of 2.00 degrees to 1.41 degrees), respectively, compared with state-of-the-art methods.
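The paper's formulation is geometry-based; purely as an illustration of fitting a low-order mapping between measured direction features and visual-axis angles from calibration data, a generic least-squares sketch might look as follows (the polynomial basis is an assumption, not the authors' model).

import numpy as np

def fit_direction_mapping(feat_angles, visual_angles):
    # feat_angles, visual_angles: (N, 2) arrays of horizontal/vertical angles (deg)
    x, y = feat_angles[:, 0], feat_angles[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coeffs, *_ = np.linalg.lstsq(A, visual_angles, rcond=None)
    return coeffs                      # (6, 2): one low-order polynomial per output angle

def apply_mapping(coeffs, feat_angles):
    x, y = feat_angles[:, 0], feat_angles[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    return A @ coeffs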
Collapse
|
39
|
Wang X, Zhang J, Zhang H, Zhao S, Liu H. Vision-Based Gaze Estimation: A Review. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3066465] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Affiliation(s)
- Xinming Wang
- School of Mechanical and Automation, State Key Laboratory of Robotics and Systems, Harbin Institute of Technology (Shenzhen), Shenzhen, China
| | - Jianhua Zhang
- School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
| | - Hanlin Zhang
- School of Mechanical and Automation, State Key Laboratory of Robotics and Systems, Harbin Institute of Technology (Shenzhen), Shenzhen, China
| | - Shuwen Zhao
- School of Computing, University of Portsmouth, Portsmouth, U.K
| | - Honghai Liu
- School of Mechanical and Automation, State Key Laboratory of Robotics and Systems, Harbin Institute of Technology (Shenzhen), Shenzhen, China
| |
Collapse
|
40
|
Xiao P, Wu J, Wang Y, Chi J, Wang Z. Stable Gaze Tracking with Filtering Based on Internet of Things. SENSORS (BASEL, SWITZERLAND) 2022; 22:3131. [PMID: 35590821 PMCID: PMC9101891 DOI: 10.3390/s22093131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Revised: 03/09/2022] [Accepted: 03/18/2022] [Indexed: 06/15/2023]
Abstract
Gaze tracking is a fundamental research topic in the era of the Internet of Things. This study attempts to improve the performance of an active infrared-source gaze-tracking system. Owing to unavoidable noise interference, the estimated points of regard (PORs) tend to fluctuate within a certain range. To reduce the fluctuation range and obtain more stable results, we introduce a Kalman filter (KF) to filter the gaze parameters. Because the effect of filtering depends on the motion state of the gaze, we design a measurement noise model that varies with gaze speed. In addition, we use a correlation filter-based tracking method, instead of a detection method, to quickly locate the pupil. Experiments indicate that the variance of the estimation error decreased by 73.83%, the size of the extracted pupil image decreased by 93.75%, and the extraction speed increased by 1.84 times. We also comprehensively discuss the advantages and disadvantages of the proposed method, providing a reference for related research. The proposed algorithm can be adopted in any eye camera-based gaze tracker.
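A minimal sketch of this idea, assuming a constant-velocity state model and illustrative noise parameters (not the authors' implementation): the measurement noise is switched according to the estimated gaze speed, so fixations are smoothed more strongly than saccades.

import numpy as np

def kalman_por(por_xy, dt=1/60.0, q=50.0, r_fix=30.0, r_sac=2.0, speed_thr=100.0):
    # por_xy: (N, 2) array of raw point-of-regard samples in pixels
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1.0]])
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0.0]])
    Q = q * np.eye(4)
    x = np.array([por_xy[0, 0], por_xy[0, 1], 0.0, 0.0])
    P = np.eye(4) * 100.0
    out = [x[:2].copy()]
    for z in por_xy[1:]:
        x, P = F @ x, F @ P @ F.T + Q                    # predict
        speed = np.hypot(x[2], x[3])                     # predicted gaze speed (px/s)
        R = (r_sac if speed > speed_thr else r_fix) * np.eye(2)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # update
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.array(out)

raw = np.cumsum(np.random.randn(500, 2), axis=0) + 5 * np.random.randn(500, 2)
smoothed = kalman_por(raw)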
Collapse
Affiliation(s)
- Peng Xiao
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; (P.X.); (Z.W.)
| | - Jie Wu
- School of Automation and Electronic Engineering, University of Science and Technology Beijing, Beijing 100083, China; (J.W.); (Y.W.)
| | - Yu Wang
- School of Automation and Electronic Engineering, University of Science and Technology Beijing, Beijing 100083, China; (J.W.); (Y.W.)
| | - Jiannan Chi
- School of Automation and Electronic Engineering, University of Science and Technology Beijing, Beijing 100083, China; (J.W.); (Y.W.)
| | - Zhiliang Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; (P.X.); (Z.W.)
| |
Collapse
|
41
|
Holmqvist K, Örbom SL, Zemblys R. Small head movements increase and colour noise in data from five video-based P-CR eye trackers. Behav Res Methods 2022; 54:845-863. [PMID: 34357538 PMCID: PMC8344338 DOI: 10.3758/s13428-021-01648-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/06/2021] [Indexed: 11/08/2022]
Abstract
We empirically investigate the role of small, almost imperceptible balance and breathing movements of the head on the level and colour of noise in data from five commercial video-based P-CR eye trackers. By comparing noise from recordings with completely static artificial eyes to noise from recordings where the artificial eyes are worn by humans, we show that very small head movements increase levels and colouring of the noise in data recorded from all five eye trackers in this study. This increase of noise levels is seen not only in the gaze signal, but also in the P and CR signals of the eye trackers that provide these camera image features. The P and CR signals of the SMI eye trackers correlate strongly during small head movements, but less so or not at all when the head is completely still, indicating that head movements are registered by the P and CR images in the eye camera. By recording with artificial eyes, we can also show that the pupil size artefact has no major role in increasing and colouring noise. Our findings add to and replicate the observation by Niehorster et al., (2021) that lowpass filters in video-based P-CR eye trackers colour the data. Irrespective of source, filters or head movements, coloured noise can be confused for oculomotor drift. We also find that usage of the default head restriction in the EyeLink 1000+, the EyeLink II and the HiSpeed240 result in noisier data compared to less head restriction. Researchers investigating data quality in eye trackers should consider not using the Gen 2 artificial eye from SR Research / EyeLink. Data recorded with this artificial eye are much noisier than data recorded with other artificial eyes, on average 2.2-14.5 times worse for the five eye trackers.
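Two simple descriptive statistics often used in such analyses (illustrative only, not the authors' pipeline): the sample-to-sample RMS noise level, and the slope of the log-log power spectrum, which is near zero for white noise and increasingly negative as the noise becomes more coloured.

import numpy as np

def rms_s2s(gaze_deg):
    # sample-to-sample RMS noise of a gaze signal (deg)
    return np.sqrt(np.mean(np.diff(gaze_deg) ** 2))

def spectral_slope(gaze_deg, fs_hz):
    # slope of the periodogram on log-log axes; ~0 for white, negative for coloured noise
    sig = gaze_deg - np.mean(gaze_deg)
    psd = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs_hz)
    keep = freqs > 0
    slope, _ = np.polyfit(np.log10(freqs[keep]), np.log10(psd[keep]), 1)
    return slope

white = np.random.randn(10000)
coloured = np.cumsum(white) * 0.01          # random walk: strongly coloured, for contrast
print(rms_s2s(white), spectral_slope(white, 1000.0), spectral_slope(coloured, 1000.0))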
Collapse
Affiliation(s)
- Kenneth Holmqvist
- Institute of Psychology, Nicolaus Copernicus University in Torun, Torun, Poland
- Department of Psychology, Regensburg University, Regensburg, Germany
- Department of Computer Science and Informatics, University of the Free State, Bloemfontein, South Africa
| | - Saga Lee Örbom
- Department of Psychology, Regensburg University, Regensburg, Germany
| | | |
Collapse
|
42
|
Ha J, Park S, Im CH. Novel Hybrid Brain-Computer Interface for Virtual Reality Applications Using Steady-State Visual-Evoked Potential-Based Brain-Computer Interface and Electrooculogram-Based Eye Tracking for Increased Information Transfer Rate. Front Neuroinform 2022; 16:758537. [PMID: 35281718 PMCID: PMC8908008 DOI: 10.3389/fninf.2022.758537] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2021] [Accepted: 01/27/2022] [Indexed: 11/13/2022] Open
Abstract
Brain-computer interfaces (BCIs) based on electroencephalogram (EEG) have recently attracted increasing attention in virtual reality (VR) applications as a promising tool for controlling virtual objects or generating commands in a "hands-free" manner. Video-oculography (VOG) has been frequently used as a tool to improve BCI performance by identifying the gaze location on the screen, however, current VOG devices are generally too expensive to be embedded in practical low-cost VR head-mounted display (HMD) systems. In this study, we proposed a novel calibration-free hybrid BCI system combining steady-state visual-evoked potential (SSVEP)-based BCI and electrooculogram (EOG)-based eye tracking to increase the information transfer rate (ITR) of a nine-target SSVEP-based BCI in VR environment. Experiments were repeated on three different frequency configurations of pattern-reversal checkerboard stimuli arranged in a 3 × 3 matrix. When a user was staring at one of the nine visual stimuli, the column containing the target stimulus was first identified based on the user's horizontal eye movement direction (left, middle, or right) classified using horizontal EOG recorded from a pair of electrodes that can be readily incorporated with any existing VR-HMD systems. Note that the EOG can be recorded using the same amplifier for recording SSVEP, unlike the VOG system. Then, the target visual stimulus was identified among the three visual stimuli vertically arranged in the selected column using the extension of multivariate synchronization index (EMSI) algorithm, one of the widely used SSVEP detection algorithms. In our experiments with 20 participants wearing a commercial VR-HMD system, it was shown that both the accuracy and ITR of the proposed hybrid BCI were significantly increased compared to those of the traditional SSVEP-based BCI in VR environment.
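A compact sketch of the two-stage decision is shown below. The paper detects the SSVEP target with the EMSI algorithm; the stand-in here uses a standard CCA-based detector instead, and all thresholds, frequencies, and channel layouts are assumptions.

import numpy as np
from sklearn.cross_decomposition import CCA

def eog_column(heog_volts, thr=50e-6):
    # classify horizontal gaze shift (left / middle / right) from mean horizontal EOG
    m = float(np.mean(heog_volts))
    return 0 if m < -thr else (2 if m > thr else 1)

def ssvep_score(eeg, freq, fs):
    # canonical correlation between EEG (n_samples, n_channels) and sin/cos references
    t = np.arange(eeg.shape[0]) / fs
    ref = np.column_stack([np.sin(2 * np.pi * freq * t), np.cos(2 * np.pi * freq * t),
                           np.sin(4 * np.pi * freq * t), np.cos(4 * np.pi * freq * t)])
    u, v = CCA(n_components=1).fit_transform(eeg, ref)
    return abs(np.corrcoef(u[:, 0], v[:, 0])[0, 1])

def decode_target(eeg, heog, column_freqs, fs=250.0):
    col = eog_column(heog)                                           # stage 1: column from EOG
    row = int(np.argmax([ssvep_score(eeg, f, fs) for f in column_freqs[col]]))
    return col, row                                                  # stage 2: row from SSVEP

column_freqs = [[6.0, 7.5, 9.0], [6.6, 8.2, 9.8], [7.2, 8.8, 10.4]]  # example stimulus frequencies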
Collapse
Affiliation(s)
- Jisoo Ha
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, South Korea
| | - Seonghun Park
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea
| | - Chang-Hwan Im
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, South Korea
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea
- Department of Biomedical Engineering, Hanyang University, Seoul, South Korea
| |
Collapse
|
43
|
Strauch C, Naber M. Irissometry: Effects of Pupil Size on Iris Elasticity Measured With Video-Based Feature Tracking. Invest Ophthalmol Vis Sci 2022; 63:20. [PMID: 35142787 PMCID: PMC8842542 DOI: 10.1167/iovs.63.2.20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
Abstract
Purpose: It is unclear how the iris deforms during changes in pupil size. Here, we report an application of a multi-feature iris tracking method, which we call irissometry, to investigate how the iris deforms and affects the eye position signal as a function of pupil size. Methods: To evoke pupillary responses, we repeatedly presented visual and auditory stimuli to healthy participants while we additionally recorded their right eye with a macro lens–equipped camera. We tracked changes in iris surface structure between the pupil and sclera border (limbus) by calculating local densities (distance between feature points) across evenly spaced annular iris regions. Results: The time analysis of densities showed that the inner regions of the iris stretched more strongly as compared with the outer regions of the iris during pupil constrictions. The pattern of iris densities across eccentricities and pupil size showed highly similar patterns across participants, highlighting the robustness of this elastic property. Importantly, iris-based eye position detection led to more stable signals than pupil-based detection. Conclusions: The iris regions near the pupil appear to be more elastic than the outer regions near the sclera. This elastic property explains the instability of the pupil border and the related position errors induced by eye movement and pupil size in pupil-based eye-tracking. Tracking features in the iris produces more robust eye position signals. We expect that irissometry may pave the way to novel eye trackers and diagnostic tools in ophthalmology.
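A sketch of one plausible way to compute such densities (the binning and the nearest-neighbour definition are assumptions, not the authors' exact measure): tracked iris feature points are grouped into annular regions by their normalized radius between the pupil border (0) and the limbus (1), and each region's density is summarized as the mean nearest-neighbour distance.

import numpy as np

def annular_densities(points, center, pupil_r, limbus_r, n_rings=5):
    d = np.linalg.norm(points - center, axis=1)
    rel = (d - pupil_r) / (limbus_r - pupil_r)          # 0 at pupil border, 1 at limbus
    ring = np.clip((rel * n_rings).astype(int), 0, n_rings - 1)
    densities = np.full(n_rings, np.nan)
    for k in range(n_rings):
        pts = points[ring == k]
        if len(pts) < 2:
            continue
        dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        densities[k] = dist.min(axis=1).mean()          # mean nearest-neighbour distance
    return densities

ang = np.random.uniform(0, 2 * np.pi, 300)
rad = np.random.uniform(2.0, 6.0, 300)
pts = np.column_stack([50 + rad * np.cos(ang), 50 + rad * np.sin(ang)])
print(annular_densities(pts, np.array([50.0, 50.0]), pupil_r=2.0, limbus_r=6.0))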
Collapse
Affiliation(s)
- Christoph Strauch
- Experimental Psychology, Helmholtz Institute, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands
| | - Marnix Naber
- Experimental Psychology, Helmholtz Institute, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, The Netherlands
| |
Collapse
|
44
|
Yuan G, Wang Y, Yan H, Fu X. Self-calibrated driver gaze estimation via gaze pattern learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
45
|
Cojocaru D, Manta LF, Pană CF, Dragomir A, Mariniuc AM, Vladu IC. The Design of an Intelligent Robotic Wheelchair Supporting People with Special Needs, Including for Their Visual System. Healthcare (Basel) 2021; 10:healthcare10010013. [PMID: 35052177 PMCID: PMC8774883 DOI: 10.3390/healthcare10010013] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2021] [Revised: 12/17/2021] [Accepted: 12/19/2021] [Indexed: 11/29/2022] Open
Abstract
The paper aims to study the applicability and limitations of a solution resulting from the design of an intelligent system supporting people with special needs who are not physically able to control a wheelchair using classical systems. The intelligent system uses information from smart sensors and offers a control system that replaces the use of a joystick. The necessary movements of the chair in the environment can be determined by an intelligent vision system analyzing the direction of the patient's gaze and point of view, as well as the actions of the head. In this approach, an important task is to detect the destination target in the 3D workspace. This solution has been evaluated, outdoors and indoors, under different lighting conditions. Because people with special needs sometimes also have specific problems with their optical system (e.g., strabismus, nystagmus), the intelligent wheelchair was tested on different subjects, some of them wearing eyeglasses. During the design process of the intelligent system, all the tests involving human subjects were performed in accordance with specific rules of medical security and ethics. In this sense, the process was supervised by a company specialized in health activities that involve people with special needs. The main results and findings are as follows: validation of the proposed solution for all indoor lighting conditions; a methodology to create personal profiles, used to improve the HMI efficiency and to adapt it to each subject's needs; and a primary evaluation and validation of the use of personal profiles in real-life indoor conditions. The conclusion is that the proposed solution can be used by persons who are not physically able to control a wheelchair using classical systems, including those with minor vision deficiencies or major vision impairment affecting one eye.
Collapse
|
46
|
Technologies for Multimodal Interaction in Extended Reality—A Scoping Review. MULTIMODAL TECHNOLOGIES AND INTERACTION 2021. [DOI: 10.3390/mti5120081] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
When designing extended reality (XR) applications, it is important to consider multimodal interaction techniques, which employ several human senses simultaneously. Multimodal interaction can transform how people communicate remotely, practice for tasks, entertain themselves, process information visualizations, and make decisions based on the provided information. This scoping review summarized recent advances in multimodal interaction technologies for head-mounted display-based (HMD) XR systems. Our purpose was to provide a succinct, yet clear, insightful, and structured overview of emerging, underused multimodal technologies beyond standard video and audio for XR interaction, and to find research gaps. The review aimed to help XR practitioners to apply multimodal interaction techniques and interaction researchers to direct future efforts towards relevant issues on multimodal XR. We conclude with our perspective on promising research avenues for multimodal interaction technologies.
Collapse
|
47
|
Ploumpis S, Ververas E, Sullivan EO, Moschoglou S, Wang H, Pears N, Smith WAP, Gecer B, Zafeiriou S. Towards a Complete 3D Morphable Model of the Human Head. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:4142-4160. [PMID: 32356737 DOI: 10.1109/tpami.2020.2991150] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Three-dimensional morphable models (3DMMs) are powerful statistical tools for representing the 3D shapes and textures of an object class. Here we present the most complete 3DMM of the human head to date that includes face, cranium, ears, eyes, teeth and tongue. To achieve this, we propose two methods for combining existing 3DMMs of different overlapping head parts: (i). use a regressor to complete missing parts of one model using the other, and (ii). use the Gaussian Process framework to blend covariance matrices from multiple models. Thus, we build a new combined face-and-head shape model that blends the variability and facial detail of an existing face model (the LSFM) with the full head modelling capability of an existing head model (the LYHM). Then we construct and fuse a highly-detailed ear model to extend the variation of the ear shape. Eye and eye region models are incorporated into the head model, along with basic models of the teeth, tongue and inner mouth cavity. The new model achieves state-of-the-art performance. We use our model to reconstruct full head representations from single, unconstrained images allowing us to parameterize craniofacial shape and texture, along with the ear shape, eye gaze and eye color.
Collapse
|
48
|
Gong H, Hsieh SS, Holmes D, Cook D, Inoue A, Bartlett D, Baffour F, Takahashi H, Leng S, Yu L, McCollough CH, Fletcher JG. An interactive eye-tracking system for measuring radiologists' visual fixations in volumetric CT images: Implementation and initial eye-tracking accuracy validation. Med Phys 2021; 48:6710-6723. [PMID: 34534365 PMCID: PMC8595866 DOI: 10.1002/mp.15219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 01/17/2023] Open
Abstract
PURPOSE Eye-tracking approaches have been used to understand the visual search process in radiology. However, previous eye-tracking work in computed tomography (CT) has been limited largely to single cross-sectional images or video playback of the reconstructed volume, which do not accurately reflect radiologists' visual search activities and their interactivity with three-dimensional image data at a computer workstation (e.g., scroll, pan, and zoom) for visual evaluation of diagnostic imaging targets. We have developed a platform that integrates eye-tracking hardware with in-house-developed reader workstation software to allow monitoring of the visual search process and reader-image interactions in clinically relevant reader tasks. The purpose of this work is to validate the spatial accuracy of eye-tracking data collected with this platform under different eye-tracking data acquisition modes. METHODS An eye-tracker was integrated with a previously developed workstation designed for reader performance studies. The integrated system captured real-time eye movement and workstation events at a 1000 Hz sampling frequency. The eye-tracker was operated either in head-stabilized mode or in free-movement mode. In head-stabilized mode, the reader positioned their head on a manufacturer-provided chinrest. In free-movement mode, a biofeedback tool emitted an audio cue when the head position was outside the data collection range (general biofeedback) or outside a narrower range of positions near the calibration position (strict biofeedback). Four radiologists and one resident participated in three studies to determine eye-tracking spatial accuracy under three constraint conditions: head-stabilized mode (i.e., with use of a chinrest), free movement with general biofeedback, and free movement with strict biofeedback. Study 1 evaluated the impact of head stabilization versus general or strict biofeedback using a crosshair target prior to the integration of the eye-tracker with the image viewing workstation. In Study 2, after integration of the eye-tracker and reader workstation, readers were asked to fixate on targets randomly distributed within a volumetric digital phantom. In Study 3, readers used the integrated system to scroll through volumetric patient CT angiographic images while fixating on the centerline of designated blood vessels (from the left coronary artery to the dorsalis pedis artery). Spatial accuracy was quantified as the offset between the center of the intended target and the detected fixation, in units of image pixels and degrees of visual angle. RESULTS The three head position constraint conditions yielded comparable accuracy in the studies using digital phantoms. For Study 1 involving the digital crosshair, the median ± standard deviation of offset values among readers was 15.2 ± 7.0 image pixels with the chinrest, 14.2 ± 3.6 image pixels with strict biofeedback, and 19.1 ± 6.5 image pixels with general biofeedback. For Study 2 using the random-dot phantom, the median ± standard deviation of offset values was 16.7 ± 28.8 pixels with use of a chinrest, 16.5 ± 24.6 pixels using strict biofeedback, and 18.0 ± 22.4 pixels using general biofeedback, which translated to a visual angle of about 0.8° for all three conditions. We found no obvious association between eye-tracking accuracy and target size or view time.
In Study 3, viewing patient images, use of the chinrest and strict biofeedback yielded comparable accuracy, while general biofeedback yielded slightly worse accuracy. The median ± standard deviation of offset values was 14.8 ± 11.4 pixels with use of a chinrest, 21.0 ± 16.2 pixels using strict biofeedback, and 29.7 ± 20.9 image pixels using general biofeedback, corresponding to visual angles ranging from 0.7° to 1.3°. CONCLUSIONS An integrated eye-tracker system to assess reader eye movement and interactive viewing in relation to imaging targets demonstrated reasonable spatial accuracy for assessment of visual fixation. The head-free movement condition with audio biofeedback performed similarly to head-stabilized mode.
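The accuracy metric above is a straightforward geometric computation; a minimal sketch of it follows. The pixel pitch and viewing distance are assumed values chosen only for illustration, not the study's display parameters.

```python
# Minimal sketch of the reported accuracy metric: the offset between the intended
# target and the detected fixation, in image pixels and in degrees of visual angle.
import math

def fixation_offset(target_px, fixation_px,
                    pixel_pitch_mm=0.25, viewing_distance_mm=650.0):
    """Return (offset in pixels, offset in degrees of visual angle)."""
    dx = fixation_px[0] - target_px[0]
    dy = fixation_px[1] - target_px[1]
    offset_px = math.hypot(dx, dy)                     # Euclidean offset on screen
    offset_mm = offset_px * pixel_pitch_mm             # convert pixels to millimetres
    offset_deg = math.degrees(math.atan2(offset_mm, viewing_distance_mm))
    return offset_px, offset_deg

px, deg = fixation_offset(target_px=(512, 512), fixation_px=(529, 520))
print(f"{px:.1f} px ≈ {deg:.2f}°")   # roughly 18.8 px ≈ 0.41° under these assumptions
```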
Collapse
Affiliation(s)
- Hao Gong, Department of Radiology, Mayo Clinic, Rochester, MN 55901
- Scott S. Hsieh, Department of Radiology, Mayo Clinic, Rochester, MN 55901
- David Holmes, Department of Physiology & Biomedical Engineering, Mayo Clinic, Rochester, MN 55901
- David Cook, Department of Internal Medicine, Mayo Clinic, Rochester, MN 55901
- Akitoshi Inoue, Department of Radiology, Mayo Clinic, Rochester, MN 55901
- David Bartlett, Department of Radiology, Mayo Clinic, Rochester, MN 55901
- Shuai Leng, Department of Radiology, Mayo Clinic, Rochester, MN 55901
- Lifeng Yu, Department of Radiology, Mayo Clinic, Rochester, MN 55901
Collapse
|
49
|
Narcizo FB, dos Santos FED, Hansen DW. High-Accuracy Gaze Estimation for Interpolation-Based Eye-Tracking Methods. Vision (Basel) 2021; 5:41. [PMID: 34564339 PMCID: PMC8482219 DOI: 10.3390/vision5030041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2021] [Revised: 08/29/2021] [Accepted: 09/02/2021] [Indexed: 11/17/2022] Open
Abstract
This study investigates the influence of the eye-camera location on the accuracy and precision of interpolation-based eye-tracking methods. Several factors can negatively influence gaze estimation methods when building a commercial or off-the-shelf eye tracker, including the eye-camera location in uncalibrated setups. Our experiments show that the eye-camera location, combined with the non-coplanarity of the eye plane, deforms the eye feature distribution when the eye-camera is far from the eye's optical axis. This paper proposes geometric transformation methods that reshape the eye feature distribution based on a virtual alignment of the eye-camera with the center of the eye's optical axis. The data analysis uses eye-tracking data from a simulated environment and from an experiment with 83 volunteer participants (55 males and 28 females). We evaluate the improvements achieved with the proposed methods using a Gaussian analysis, which defines a range for high-accuracy gaze estimation between −0.5° and 0.5°. Compared to traditional polynomial-based and homography-based gaze estimation methods, the proposed methods increase the number of gaze estimations in the high-accuracy range.
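For readers unfamiliar with the polynomial baseline mentioned above, the sketch below fits a second-order polynomial from pupil features to screen coordinates and then counts the fraction of test estimates inside a ±0.5° band. The data are synthetic placeholders, and screen-unit errors are treated as degrees purely for illustration; none of this reproduces the paper's experiments.

```python
# Minimal sketch of an interpolation-based (second-order polynomial) gaze estimator
# and a crude "high-accuracy fraction" check. All data here are synthetic.
import numpy as np

def design_matrix(xy):
    x, y = xy[:, 0], xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

rng = np.random.default_rng(1)
pupil = rng.uniform(-1, 1, size=(9, 2))                 # features at 9 calibration points
true_coeffs = rng.normal(size=(6, 2))
screen = design_matrix(pupil) @ true_coeffs + 0.02 * rng.normal(size=(9, 2))

# Least-squares fit of the polynomial mapping from the calibration points.
coeffs, *_ = np.linalg.lstsq(design_matrix(pupil), screen, rcond=None)

test_pupil = rng.uniform(-1, 1, size=(50, 2))
gaze = design_matrix(test_pupil) @ coeffs               # estimated gaze points
error = np.linalg.norm(gaze - design_matrix(test_pupil) @ true_coeffs, axis=1)
print("fraction within ±0.5 (illustrative units):", np.mean(error <= 0.5))
```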
Collapse
Affiliation(s)
- Fabricio Batista Narcizo, Eye Information Laboratory, Department of Computer Science, IT University of Copenhagen (ITU), 2300 Copenhagen, Denmark; Office of CTO, GN Audio A/S (Jabra), 2750 Ballerup, Denmark
- Dan Witzner Hansen, Eye Information Laboratory, Department of Computer Science, IT University of Copenhagen (ITU), 2300 Copenhagen, Denmark
Collapse
|
50
|
A Human-Computer Control System Based on Intelligent Recognition of Eye Movements and Its Application in Wheelchair Driving. MULTIMODAL TECHNOLOGIES AND INTERACTION 2021. [DOI: 10.3390/mti5090050] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
This paper presents a practical human-computer interaction system for wheelchair control based on eye tracking and eye-blink detection. In this system, the pupil is extracted from the eye image after binarization, and its center is localized to capture the trajectory of eye movement and determine the direction of gaze. In parallel, convolutional neural networks are built for feature extraction and classification of open-eye and closed-eye images, trained on multiple images of the open-eye and closed-eye states. As an application of this human-computer control system, experimental validation was carried out on a modified wheelchair, and the results show that the proposed method is effective and reliable.
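The pupil-localization step described above (binarization followed by center localization) can be sketched with standard OpenCV operations, as below. This is an assumed re-creation of the idea, not the paper's code; the threshold value and the synthetic test image are placeholders.

```python
# Minimal sketch (assumed, not the paper's code): binarize a grayscale eye image,
# take the largest dark blob as the pupil, and return its centroid as the pupil centre.
import cv2
import numpy as np

def pupil_center(eye_gray: np.ndarray, thresh: int = 40):
    """Return (cx, cy) of the pupil centroid, or None if no dark blob is found."""
    _, binary = cv2.threshold(eye_gray, thresh, 255, cv2.THRESH_BINARY_INV)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)           # largest dark region
    m = cv2.moments(pupil)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])    # centroid of the pupil blob

# Example usage on a synthetic dark disc standing in for a pupil:
img = np.full((120, 160), 200, np.uint8)
cv2.circle(img, (80, 60), 15, 0, -1)
print(pupil_center(img))   # approximately (80.0, 60.0)
```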
Collapse
|