1
Yuan LP, Dudley JJ, Kristensson PO, Qu H. Personalized Dual-Level Color Grading for 360-degree Images in Virtual Reality. IEEE Transactions on Visualization and Computer Graphics 2025; 31:2435-2444. PMID: 40072854. DOI: 10.1109/tvcg.2025.3549886.
Abstract
The rising popularity of 360-degree images and virtual reality (VR) has spurred a growing interest among creators in producing visually appealing content through effective color grading processes. Although existing computational approaches have simplified the global color adjustment for entire images with Preferential Bayesian Optimization (PBO), they neglect local colors for points of interest and are not optimized for the immersive nature of VR. In response, we propose a dual-level PBO framework that integrates global and local color adjustments tailored for VR environments. We design and evaluate a novel context-aware preferential Gaussian Process (GP) to learn contextual preferences for local colors, taking into account the dynamic contexts of previously established global colors. Additionally, recognizing the limitations of desktop-based interfaces for comparing 360-degree images, we design three VR interfaces for color comparison. We conduct a controlled user study to investigate the effectiveness of the three VR interface designs and find that users prefer to be enveloped by one 360-degree image at a time and to compare two rather than four color-graded options.
2
Chen C, Ma G, Song W, Li S, Hao A, Qin H. Saliency-Free and Aesthetic-Aware Panoramic Video Navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; PP:2037-2054. PMID: 40030675. DOI: 10.1109/tpami.2024.3516874.
Abstract
Most existing panoramic video navigation approaches are saliency-driven: off-the-shelf saliency detection tools are employed directly to help the navigation method localize video content that should be incorporated into the navigation path. In view of the dilemma faced by our research community, we rethink whether "saliency clues" are really appropriate for the panoramic video navigation task. According to our in-depth investigation, we argue that "saliency clues" cannot generate a satisfying navigation path: the path fails to represent the given panoramic video well, and the views along it also have low aesthetic quality. In this paper, we present a brand-new navigation paradigm. Although our model is still trained on eye fixations, our methodology additionally enables the trained model to perceive how "meaningful" the given panoramic video content is. Outwardly, the proposed approach is saliency-free; inwardly, it is developed from saliency but biased toward being "meaningful-driven", so it can generate a navigation path with more appropriate content coverage. In addition, this paper is the first attempt to devise an unsupervised learning scheme that ensures all localized meaningful views in the navigation path have high aesthetics, so the navigation path generated by our approach can also give users an enjoyable viewing experience. As a new topic in its infancy, we have devised a series of quantitative evaluation schemes, including objective verifications and subjective user studies. These innovative attempts have great potential to inspire and promote this research field in the near future.
3
Wang G, Chen C, Hao A, Qin H, Fan DP. WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; PP:1694-1713. PMID: 40030507. DOI: 10.1109/tpami.2024.3510793.
Abstract
To date, the widely adopted way to collect fixations for panoptic video is based on a head-mounted display (HMD): users' fixations are recorded while they wear an HMD and freely explore the given panoptic scene. However, this widely used data collection method is insufficient for training deep models to accurately predict which regions of a given panoptic video are most important when it contains intermittent salient events. The main reason is that there always exist "blind zooms" when using an HMD to collect fixations, since users cannot keep spinning their heads to explore the entire panoptic scene all the time. Consequently, the collected fixations tend to be trapped in some local views, leaving the remaining areas as "blind zooms". Therefore, fixation data collected with HMD-based methods, which accumulate local views, cannot accurately represent the overall global importance (the main purpose of fixations) of complex panoptic scenes. To overcome this, this paper introduces the auxiliary window with a dynamic blurring (WinDB) fixation collection approach for panoptic video, which does not need an HMD and reflects region-wise importance well. Using our WinDB approach, we have released a new PanopticVideo-300 dataset containing 300 panoptic clips covering over 225 categories. Because fixation collection with WinDB is free of blind zooms, our new dataset exhibits frequent and intensive "fixation shifting", a special phenomenon that has long been overlooked by previous research. We therefore present an effective fixation shifting network (FishNet) to handle it. The new fixation collection tool, dataset, and network have great potential to open a new era for fixation-related research and applications in 360° environments.
4
Deb SD, Jha RK, Jha K, Tripathi PS. A multi model ensemble based deep convolution neural network structure for detection of COVID19. Biomed Signal Process Control 2021; 71:103126. PMID: 34493940. PMCID: PMC8413482. DOI: 10.1016/j.bspc.2021.103126.
Abstract
The year 2020 will certainly be remembered for the COVID-19 outbreak. First reported in Wuhan, China, in December 2019, the number of people affected by this contagious virus has grown exponentially. Given the population density of India, the mantra of test, track, and isolate is not producing satisfactory results. A shortage of testing kits and an increasing number of fresh cases encouraged us to come up with a model that can aid radiologists in detecting COVID-19 from chest X-ray images. In the proposed framework, low-level features from chest X-ray images are extracted using an ensemble of four pre-trained Deep Convolutional Neural Network (DCNN) architectures, namely VGGNet, GoogleNet, DenseNet, and NASNet, and are then fed to a fully connected layer for classification. The proposed multi-model ensemble architecture is validated on two publicly available datasets and one private dataset. We show that the multi-model ensemble performs better than a single classifier. On the publicly available data we obtain an accuracy of 88.98% for three-class classification and 98.58% for binary classification. On the private dataset we obtain an accuracy of 93.48%. The source code and the dataset are available at https://github.com/sagardeepdeb/ensemble-model-for-COVID-detection.
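For readers who want a concrete picture of the feature-level ensemble described above, here is a minimal PyTorch sketch. It is not the authors' released implementation (see their GitHub link): the backbone choices, feature dimensions, and classification head are illustrative assumptions, and MNASNet stands in for NASNet, which torchvision does not provide.

import torch
import torch.nn as nn
from torchvision import models

class EnsembleClassifier(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        # Four ImageNet-pretrained backbones with their classification heads
        # replaced by identity so each returns a feature vector.
        vgg = models.vgg16(weights="IMAGENET1K_V1")
        vgg.classifier[-1] = nn.Identity()            # 4096-d features
        goog = models.googlenet(weights="IMAGENET1K_V1")
        goog.fc = nn.Identity()                       # 1024-d features
        dense = models.densenet121(weights="IMAGENET1K_V1")
        dense.classifier = nn.Identity()              # 1024-d features
        # NASNet is not shipped with torchvision; MNASNet is a stand-in here.
        nas = models.mnasnet1_0(weights="IMAGENET1K_V1")
        nas.classifier = nn.Identity()                # 1280-d features

        self.backbones = nn.ModuleList([vgg, goog, dense, nas])
        for p in self.backbones.parameters():         # frozen feature extractors
            p.requires_grad = False

        # Concatenated features feed a fully connected classification head.
        self.head = nn.Sequential(
            nn.Linear(4096 + 1024 + 1024 + 1280, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                             # x: (B, 3, 224, 224)
        feats = torch.cat([b(x) for b in self.backbones], dim=1)
        return self.head(feats)

# Grayscale X-rays are typically replicated to three channels before this step.
model = EnsembleClassifier(num_classes=3).eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 224, 224))       # (2, 3) class scores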
Affiliation(s)
- Sagar Deep Deb, Department of Electrical Engineering, Indian Institute of Technology Patna, India
- Rajib Kumar Jha, Department of Electrical Engineering, Indian Institute of Technology Patna, India
- Kamlesh Jha, Department of Physiology, All India Institute of Medical Sciences, Patna, India
- Prem S Tripathi, Department of Radiodiagnosis, MGM Medical College, Indore, India
5
6
Puttagunta M, Ravi S. Medical image analysis based on deep learning approach. Multimedia Tools and Applications 2021; 80:24365-24398. PMID: 33841033. PMCID: PMC8023554. DOI: 10.1007/s11042-021-10707-4.
Abstract
Medical imaging plays a significant role in clinical applications such as early detection, monitoring, diagnosis, and treatment evaluation of various medical conditions. A basic understanding of the principles and implementations of artificial neural networks and deep learning is essential for understanding medical image analysis in computer vision. The Deep Learning Approach (DLA) to medical image analysis is a fast-growing research field, and DLA has been widely used in medical imaging to detect the presence or absence of disease. This paper presents the development of artificial neural networks and a comprehensive analysis of DLA, which delivers promising medical imaging applications. Most DLA implementations concentrate on X-ray, computerized tomography, mammography, and digital histopathology images. The paper provides a systematic review of articles on classification, detection, and segmentation of medical images based on DLA, and guides researchers toward appropriate adaptations of DLA-based medical image analysis.
Affiliation(s)
- Muralikrishna Puttagunta, Department of Computer Science, School of Engineering and Technology, Pondicherry University, Pondicherry, India
- S. Ravi, Department of Computer Science, School of Engineering and Technology, Pondicherry University, Pondicherry, India
7
Chen C, Wang G, Peng C, Fang Y, Zhang D, Qin H. Exploring Rich and Efficient Spatial Temporal Interactions for Real-Time Video Salient Object Detection. IEEE Transactions on Image Processing 2021; 30:3995-4007. PMID: 33784620. DOI: 10.1109/tip.2021.3068644.
Abstract
We have witnessed a growing interest in video salient object detection (VSOD) techniques in today's computer vision applications. In contrast to temporal information, which is still considered a rather unstable source, spatial information is more stable and ubiquitous and thus influences our vision system more. As a result, current mainstream VSOD approaches infer saliency primarily from the spatial perspective, treating temporal information as subordinate. Although focusing on the spatial aspect is effective in achieving a numeric performance gain, it has two critical limitations. First, to ensure the dominance of spatial information, the temporal counterpart remains inadequately used, even though in some complex video scenes temporal information may be the only reliable data source and is critical for deriving the correct VSOD. Second, spatial and temporal saliency cues are often computed independently in advance and integrated later on, while the interactions between them are omitted completely, resulting in saliency cues of limited quality. To combat these challenges, this paper advocates a novel spatiotemporal network whose key innovation is the design of its temporal unit. Compared with existing competitors (e.g., convLSTM), the proposed temporal unit has an extremely lightweight design that does not degrade its ability to sense temporal information. Furthermore, it fully enables the computation of temporal saliency cues that interact with their spatial counterparts, boosting overall VSOD performance through mutual improvement. The proposed method is easy to implement yet effective, achieving high-quality VSOD at 50 FPS in real-time applications.
8
Wang X, Li S, Chen C, Hao A, Qin H. Depth quality-aware selective saliency fusion for RGB-D image salient object detection. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.12.071.
9
Chen C, Wei J, Peng C, Qin H. Depth-Quality-Aware Salient Object Detection. IEEE Transactions on Image Processing 2021; 30:2350-2363. PMID: 33481710. DOI: 10.1109/tip.2021.3052069.
Abstract
The existing fusion-based RGB-D salient object detection methods usually adopt the bistream structure to strike a balance in the fusion trade-off between RGB and depth (D). While the D quality usually varies among the scenes, the state-of-the-art bistream approaches are depth-quality-unaware, resulting in substantial difficulties in achieving complementary fusion status between RGB and D and leading to poor fusion results for low-quality D. Thus, this paper attempts to integrate a novel depth-quality-aware subnet into the classic bistream structure in order to assess the depth quality prior to conducting the selective RGB-D fusion. Compared to the SOTA bistream methods, the major advantage of our method is its ability to lessen the importance of the low-quality, no-contribution, or even negative-contribution D regions during RGB-D fusion, achieving a much improved complementary status between RGB and D. Our source code and data are available online at https://github.com/qdu1995/DQSD.
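As a rough illustration of the depth-quality-aware idea described above, the following PyTorch sketch uses a small gating subnet to score depth reliability and down-weight the depth features before fusion. It is a simplified stand-in under assumed feature shapes, not the DQSD architecture released at the authors' GitHub link.

import torch
import torch.nn as nn

class DepthQualityGate(nn.Module):
    """Predicts a per-image depth-quality weight in [0, 1]."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        q = self.score(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, 1)
        return q.view(-1, 1, 1, 1)

class QualityAwareFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = DepthQualityGate(channels)
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        q = self.gate(rgb_feat, depth_feat)         # low q => depth contributes less
        fused = torch.cat([rgb_feat, q * depth_feat], dim=1)
        return self.merge(fused)

fusion = QualityAwareFusion(channels=64)
out = fusion(torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56))  # (2, 64, 56, 56)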
10
Wang X, Li S, Chen C, Fang Y, Hao A, Qin H. Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection. IEEE Transactions on Image Processing 2020; 30:458-471. PMID: 33201813. DOI: 10.1109/tip.2020.3037470.
Abstract
Existing RGB-D salient object detection methods treat depth information as an independent component to complement RGB and widely follow the bistream parallel network architecture. To selectively fuse the CNN features extracted from both RGB and depth into a final result, the state-of-the-art (SOTA) bistream networks usually consist of two independent subbranches: one subbranch is used for RGB saliency, and the other aims for depth saliency. However, depth saliency is persistently inferior to RGB saliency because the RGB component is intrinsically more informative than the depth component. The bistream architecture easily biases its subsequent fusion procedure toward the RGB subbranch, leading to a performance bottleneck. In this paper, we propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction, where we cyclically convert the original 4-dimensional RGB-D into DGB, RDB, and RGD. Then, a newly designed lightweight triple-stream network is applied to these reformulated data to achieve an optimal channel-wise complementary fusion between RGB and D, yielding a new SOTA performance.
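The data-level recombination step lends itself to a short sketch. The following PyTorch snippet forms the DGB, RDB, and RGD inputs by substituting the depth map for one color channel; the (B, 4, H, W) tensor layout with channel order R, G, B, D is an assumption for illustration, and the triple-stream network that would consume these inputs is omitted.

import torch

def recombine_rgbd(rgbd: torch.Tensor):
    """rgbd: (B, 4, H, W) with channels [R, G, B, D] -> three 3-channel tensors."""
    r, g, b, d = rgbd[:, 0:1], rgbd[:, 1:2], rgbd[:, 2:3], rgbd[:, 3:4]
    dgb = torch.cat([d, g, b], dim=1)   # depth replaces the red channel
    rdb = torch.cat([r, d, b], dim=1)   # depth replaces the green channel
    rgd = torch.cat([r, g, d], dim=1)   # depth replaces the blue channel
    return dgb, rdb, rgd

rgbd = torch.rand(2, 4, 224, 224)
dgb, rdb, rgd = recombine_rgbd(rgbd)    # each (2, 3, 224, 224), one per stream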