Current research

Exploiting Monocular Depth Estimation for Style Harmonization in Landscape Painting
Style harmonization, also known as painterly image harmonization, is a technique for seamlessly blending objects from a realistic photo into the background of paintings of different styles. While deep learning-based methods have produced satisfactory results, there are important factors to consider for successful style harmonization depending on the type of target image. In particular, if a certain region of the target image takes up a large portion of the entire image, the object to be blended is heavily affected by that region, failing to achieve plausible object synthesis results. In this study, focusing on landscape paintings in which a significant portion of the entire image is occupied by sky regions, we propose a novel method, Discarding Dominant Region (DRD), that effectively removes the sky by utilizing monocular depth estimation and the Segment Anything Model (SAM).
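The core idea can be illustrated with a minimal sketch. The function below is an assumption-laden simplification: it stands in for the full pipeline by thresholding a monocular depth map to find a dominant far region (the sky), whereas the actual method would refine such a region with a SAM segment. The function name, parameters, and toy depth map are all hypothetical.

```python
import numpy as np

def sky_mask_from_depth(depth, far_quantile=0.8, area_thresh=0.3):
    """Hypothetical sketch: mask a dominant far region (e.g., sky).

    depth: HxW array from a monocular depth estimator (larger = farther).
    Pixels beyond the far depth quantile are sky candidates; the mask is
    kept only if that region dominates the image, otherwise nothing is
    discarded (mirroring the "dominant region" condition).
    """
    mask = depth >= np.quantile(depth, far_quantile)
    return mask if mask.mean() >= area_thresh else np.zeros_like(mask)

# Toy depth map: top half "far" (sky-like), bottom half near ground.
depth = np.vstack([np.full((4, 8), 10.0), np.full((4, 8), 1.0)])
mask = sky_mask_from_depth(depth)
```

In this toy case the far half of the image dominates, so the whole top half is marked for discarding before harmonization proceeds on the remainder.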
Unsupervised Image-to-Image Translation Based on Bidirectional Style Transfer
Image-to-image translation (I2I) is an image synthesis technique that aims to map a source image to the style of a target domain while preserving its content information. Existing image-to-image translation studies have achieved excellent image synthesis performance using generative adversarial network (GAN) based models, but they cannot efficiently handle the style of the target domain. As a way to overcome this limitation, a bidirectional style transfer network has been developed that performs image synthesis by intersecting images of two domains with each other's styles, but the range of applicable datasets is limited due to its supervised learning-based training. In this paper, we propose an unsupervised image-to-image translation method that employs a bidirectional style transfer network with a cyclic collaborative loss to train the model. Experimental results show that the proposed network accurately reflects the style of the target domain in the image synthesis task.
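The round-trip idea behind losses of this kind can be sketched as follows. This is an assumption: the snippet shows a generic cycle-consistency term (translate A to B and back, then compare with the original), not the paper's exact cyclic collaborative loss; the toy "translators" are invertible stand-ins for learned generators.

```python
import numpy as np

def l1(x, y):
    return np.abs(x - y).mean()

def cycle_loss(a, b, g_ab, g_ba):
    """Generic cycle-consistency term (illustrative assumption).

    g_ab / g_ba translate between domains A and B. An image translated
    to the other domain and back should match the original, which lets
    the network train without paired samples.
    """
    return l1(g_ba(g_ab(a)), a) + l1(g_ab(g_ba(b)), b)

# Toy translators: an exactly invertible pair, so the loss is ~0.
g_ab = lambda x: 2.0 * x + 1.0
g_ba = lambda y: (y - 1.0) / 2.0
rng = np.random.default_rng(0)
a, b = rng.random((3, 3)), rng.random((3, 3))
loss = cycle_loss(a, b, g_ab, g_ba)
```

In training, the loss would instead be minimized over the parameters of the two generators.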
Defective solar panel detection with pre-trained attention recycling
Methods that enable the visual inspection of solar panels are currently in demand, as a huge number of solar panels are now being deployed as a sustainable energy source. One solution for inspection automation is an end-to-end deep learning framework, but this is not recommended for this problem because such a framework requires not only powerful computational resources but also a large-scale, class-balanced dataset. In this study, we present a cost-effective solar panel defect detection method. We emphasize the spatial features of defects by utilizing an attention map generated by a pre-trained attention mechanism that can attend to stroke ends, gatherings, and bends. We define and extract 13 statistical features from the attention map, and then feed them into a conventional machine learning model. Therefore, we no longer require energy-depleting models such as end-to-end neural classifiers to discriminate between non-defective and defective panels.
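The feature-extraction-plus-classical-classifier pattern can be sketched as below. The specific descriptors and the nearest-centroid stand-in are assumptions for illustration; the paper defines 13 statistical features and uses a conventional machine learning model, neither of which is specified here.

```python
import numpy as np

def attention_features(att):
    """A few example statistical descriptors of an attention map
    (assumed; the actual method defines 13 such features)."""
    flat = att.ravel()
    return np.array([flat.mean(), flat.std(), flat.max(),
                     np.quantile(flat, 0.9), (flat > flat.mean()).mean()])

def nearest_centroid_predict(x, centroids):
    # Lightweight stand-in for the conventional ML classifier.
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

# Toy data: a defective map contains a bright localized blob.
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 0.2, (8, 8))
defect = clean.copy()
defect[2:4, 2:4] = 1.0
centroids = np.stack([attention_features(clean), attention_features(defect)])
pred = nearest_centroid_predict(attention_features(defect), centroids)
```

The point of the design is that once attention maps are summarized by a handful of statistics, a tiny classifier suffices, avoiding end-to-end training.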
Exploiting an Intermediate Latent Space between Photo and Sketch for Face Photo-Sketch Recognition
The photo-sketch matching problem is challenging because the modality gap between a photo and a sketch is very large. This work features a novel approach that uses an intermediate latent space between the two modalities to circumvent the modality-gap problem in face photo-sketch recognition. To set up a stable, homogeneous latent space between a photo and a sketch that is effective for matching, we utilize a bidirectional collaborative synthesis network and equip the latent space with rich representation power by employing StyleGAN-like architectures. This rich representation power enables accurate matching because we can effectively align the distributions of the two modalities in the latent space. In addition, to resolve the problem of insufficient paired photo/sketch samples for training, we apply a three-step training scheme. The proposed methodology can be employed in matching other modality pairs.
Cross modal facial image synthesis using a collaborative bidirectional style transfer network
This work proposes a novel GAN-based collaborative bidirectional style transfer network for cross-modal facial image synthesis, possibly with a large modality gap. Unlike existing image synthesis methods that typically formulate image synthesis as a unidirectional feed-forward mapping, our network utilizes mutual interaction between two opposite mappings in a collaborative way to address complex image synthesis problems with a large modality gap. The proposed bidirectional network aligns shape content from two modalities and exchanges their appearance styles using feature maps of the layers in the encoder space. This allows us to effectively retain the shape content and transfer style details for synthesizing each modality. Focusing on facial images, we consider a facial photo, a sketch, and color-coded semantic segmentation as different modalities. We further apply our network to style-content manipulation to generate multiple photo images with various appearance styles for the same content shape.

Past research

Effective Removal of User-Selected Foreground Object From Facial Images Using a Novel GAN-Based Network
This research features a user-friendly method for face de-occlusion in facial images where the user controls which object to remove. Our system removes one object at a time; however, it is capable of removing multiple objects through repeated application. Although we show the effectiveness of our system on five commonly occurring occluding objects, namely hands, a medical mask, a microphone, sunglasses, and eyeglasses, more object types can be handled based on the proposed methodology. Our model learns to detect a user-selected, possibly distracting, object in the first stage. The second stage then removes the object using the detection information from the first stage as guidance. To achieve this, we employ GAN-based networks in both stages. Specifically, in the second stage, we integrate both partial and vanilla convolution operations in the generator part of the GAN network. We show that with this integration, the proposed network can learn a well-incorporated structure and also overcome the problem of visual discrepancies in the affected region of the face. To train our network, we produce a paired synthetic face-occluded dataset. Our model is evaluated using real-world images collected from the Internet and the publicly available CelebA and CelebA-HQ datasets. Experimental results confirm our model's effectiveness in removing challenging foreground non-face objects from facial images compared to existing representative state-of-the-art approaches.
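The partial convolution operation mentioned above has a well-known form that can be sketched minimally. This is a single-channel, bias-free simplification for illustration only, not the generator's actual layer: only valid (unmasked) pixels contribute, the response is renormalized by mask coverage, and the mask is updated for the next layer.

```python
import numpy as np

def partial_conv2d(x, mask, kernel):
    """Minimal single-channel partial convolution (illustrative sketch).

    x: HxW image; mask: HxW binary validity map (1 = valid pixel);
    kernel: kh x kw filter. Output pixels whose window contains at
    least one valid pixel are renormalized by kernel.size / m.sum().
    """
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros_like(out)
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            if m.any():
                patch = x[i:i + kh, j:j + kw] * m
                out[i, j] = (patch * kernel).sum() * (kernel.size / m.sum())
                new_mask[i, j] = 1.0  # window now counts as valid
    return out, new_mask

# Sanity check: on a constant image with a small hole, renormalization
# reproduces the constant everywhere, so the hole leaves no artifact.
x = np.ones((5, 5))
mask = np.ones((5, 5))
mask[2, 2] = 0.0
out, new_mask = partial_conv2d(x, mask, np.ones((3, 3)) / 9.0)
```

The renormalization is what prevents the occluded (masked) region from darkening the filter response, which is why combining partial and vanilla convolutions helps avoid visual discrepancies in the filled region.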
A Novel GAN-Based Network for Unmasking of Masked Face
The goal of this research is interaction-free removal of large objects (e.g., face masks) from facial images. In this work, we focus on unmasking of masked faces because it is a very intriguing problem of great practical value. Given an input masked facial image, we detect the mask region, then feed the input image and a binary map of the detected mask region into a GAN-based network to generate an image without the non-face object, which is the mask in our case.
Efficient Generation of Multiple Sketch Styles Using a Single Network
In the real world, different artists draw sketches of the same person with different artistic styles in both texture and shape. Our goal is to synthesize realistic face sketches of different styles while retaining the input face identity, using only a single network. To achieve this, we employ a modified conditional GAN with a target style label as input. Our method is capable of synthesizing multiple sketch styles even though it is based on a single network. Sketches created by our method show quality comparable to state-of-the-art sketch synthesis methods that use multiple networks.
Exploiting the Context of Object-Action Interaction for Object Recognition
The goal of this study is to efficiently and effectively incorporate the context of object-action interaction into object recognition in order to boost recognition performance. We employ a few image frames that contain key poses, which can be used to distinguish human actions. Since an assemblage of key poses can take advantage of the fiducial appearance of the human body in action, representing human actions by concatenating a few key poses is quite effective. The main contribution of this work is the establishment of an effective Bayesian approach that exploits the probabilities of objects and actions, through random forest and multi-class AdaBoost algorithms.
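A simple instance of such Bayesian fusion can be sketched as follows. This is an assumption for illustration: the exact form of the paper's fusion is not reproduced here, and the object/action names, scores, and co-occurrence table are invented toy values.

```python
import numpy as np

def fuse(p_object, p_action, p_action_given_object):
    """Illustrative Bayesian fusion of classifier scores.

    p_object: prior scores from an object classifier (e.g., random forest);
    p_action: scores from a key-pose action classifier (e.g., AdaBoost);
    p_action_given_object[o, a]: likelihood of action a given object o.
    Posterior: P(o | evidence) ~ P(o) * sum_a P(a) * P(a | o).
    """
    post = p_object * (p_action_given_object @ p_action)
    return post / post.sum()

# Toy example: objects "cup" vs "phone", actions "drink" vs "call".
p_object = np.array([0.5, 0.5])        # object classifier is undecided
p_action = np.array([0.9, 0.1])        # the pose strongly says "drink"
p_a_given_o = np.array([[0.8, 0.2],    # drinking co-occurs with a cup
                        [0.2, 0.8]])   # calling co-occurs with a phone
post = fuse(p_object, p_action, p_a_given_o)
```

Even though the object classifier alone is undecided, the action context tips the posterior toward "cup", which is exactly the boost the interaction context provides.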
Depth edge detection
Depth edges play a very important role in many computer vision problems because they represent object contours. We strategically project structured light and exploit distortion of the light pattern in the structured light image along depth discontinuities to reliably detect depth edges.

Masked fake face detection
This research presents a novel 2D feature space in which real faces and masked fake faces can be effectively discriminated. We exploit the reflectance disparity, based on albedo, between real faces and mask materials. The feature vector consists of radiance measurements of the forehead region under 850 nm and 685 nm illuminations. Facial skin and mask material show linearly separable distributions in the proposed feature space.
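The linear separability claim can be illustrated with a toy sketch. All numeric values below are invented: the cluster centers do not come from the paper's measurements, and the simple mean-difference decision line is a stand-in for whatever classifier one would actually fit in the 2D radiance space.

```python
import numpy as np

def linear_decision(x, w, b):
    # Sign of w.x + b classifies an (850 nm, 685 nm) radiance pair.
    return 1 if np.dot(w, x) + b > 0 else 0

# Toy clusters (assumed values): skin and mask material reflect NIR
# differently, giving two linearly separable clouds in the 2D space.
rng = np.random.default_rng(1)
skin = rng.normal([0.7, 0.4], 0.03, (20, 2))   # label 1 (real face)
mask = rng.normal([0.4, 0.6], 0.03, (20, 2))   # label 0 (mask material)

# Decision line: direction connecting the class means, split halfway.
w = skin.mean(0) - mask.mean(0)
b = -np.dot(w, (skin.mean(0) + mask.mean(0)) / 2.0)
preds = [linear_decision(x, w, b) for x in np.vstack([skin, mask])]
```

Because the albedo gap is large relative to measurement noise, even this trivial line separates the two materials perfectly in the toy setting.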
Face Recognition Using ICA
We propose an effective part-based local representation method named locally salient ICA (LS-ICA) method for face recognition that is robust to local distortion and partial occlusion. The LS-ICA method only employs locally salient information from important facial parts in order to maximize the benefit of applying the idea of “recognition by parts.”
CBCT artifacts reduction
We investigate a statistical image reconstruction method for X-ray computed tomography (CT) that is based on a physical model accounting for the polyenergetic X-ray source spectrum and the measurement nonlinearities caused by energy-dependent attenuation. Applying this method to simulated X-ray CT measurements of objects containing both bone and soft tissue yields images with significantly reduced artifacts.