2017, arXiv (Cornell University)
We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics of filter responses from the optical flow ConvNet model its dynamics. To generate a novel texture, a randomly initialized input sequence is optimized to match the feature statistics from each stream of an example texture. Inspired by recent work on image style transfer and enabled by the two-stream model, we also apply the synthesis approach to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. We show that our approach generates novel, high-quality samples that match both the frame-wise appearance and temporal evolution of the input texture. Finally, we quantitatively evaluate our texture synthesis approach with a thorough user study.
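As a rough illustration of the optimization described above, the sketch below matches Gram-matrix statistics of a randomly initialized frame sequence against those of an example texture in two streams. It is only a minimal sketch: torchvision's VGG19 stands in for the appearance stream, and a small untrained convolutional stack (`dynamics`) is a placeholder for the optical-flow ConvNet, which is not reproduced here; layer choices, loss weighting, and iteration counts are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of two-stream Gram matching for dynamic texture synthesis (PyTorch).
# VGG19 stands in for the appearance stream; `dynamics` is a placeholder flow encoder.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

def gram(feat):                                   # feat: (B, C, H, W)
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

appearance = vgg19(weights="DEFAULT").features[:21].eval()   # up to relu4_1
dynamics = torch.nn.Sequential(                               # placeholder flow encoder
    torch.nn.Conv2d(6, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 64, 3, padding=1), torch.nn.ReLU()).eval()
for p in list(appearance.parameters()) + list(dynamics.parameters()):
    p.requires_grad_(False)

target = torch.rand(8, 3, 128, 128)                  # example texture frames (T, 3, H, W)
x = torch.rand_like(target).requires_grad_(True)     # randomly initialized sequence
opt = torch.optim.LBFGS([x], max_iter=20)

def closure():
    opt.zero_grad()
    # appearance stream: per-frame Gram statistics
    loss = sum(F.mse_loss(gram(appearance(x[t:t + 1])),
                          gram(appearance(target[t:t + 1]))) for t in range(len(x)))
    # dynamics stream: Gram statistics over consecutive frame pairs
    pairs_x = torch.cat([x[:-1], x[1:]], dim=1)
    pairs_t = torch.cat([target[:-1], target[1:]], dim=1)
    loss = loss + F.mse_loss(gram(dynamics(pairs_x)), gram(dynamics(pairs_t)))
    loss.backward()
    return loss

for _ in range(10):
    opt.step(closure)
```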
IAEME PUBLICATION, 2020
In this paper, we propose an image style transfer method using convolutional neural networks: given an arbitrary pair of images, a universal style transfer technique extracts the texture of a reference image to synthesize an output based on the content of another image. Stylization algorithms based on second-order statistics, however, are either computationally expensive or prone to generating artifacts because of the trade-off between image quality and run-time performance. There has recently been much progress in the field of image style transfer, a process that aims at redrawing an image in the style of another image. The proposed technique consists of a stylization step and a smoothing step. The stylization step transfers the style of the reference image onto the content photograph, while the smoothing step ensures spatially consistent stylization. Each step has a closed-form solution and can be computed efficiently. We conduct extensive experimental validation; the results show that the proposed technique generates photorealistic stylization outputs that are preferred by human subjects over those of competing methods, while running much faster.
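For context, one widely used closed-form stylization step of this kind is a whitening-and-colouring transform (WCT) applied to encoder features; the formulation below is a generic sketch of such a step, not necessarily the exact variant used in this paper. Here $f_c$ and $f_s$ are vectorized content and style features with means $\mu_c, \mu_s$, and $E D E^{\top}$ is the eigendecomposition of the corresponding feature covariance:

\[
\hat{f}_c = E_c D_c^{-1/2} E_c^{\top}\,(f_c - \mu_c),
\qquad
f_{cs} = E_s D_s^{1/2} E_s^{\top}\,\hat{f}_c + \mu_s .
\]

A smoothing step with a closed-form solution can then be written as $R = (1-\alpha)(I - \alpha S)^{-1} Y$, where $Y$ stacks the stylized pixels, $S$ is a normalized pixel-affinity matrix built from the content photo, and $\alpha$ trades smoothness against fidelity; again, this is a sketch of the general recipe rather than the paper's exact equations.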
EURASIP Journal on Image and Video Processing, 2014
In this paper, we study the use of local spatiotemporal patterns in a non-parametric dynamic texture synthesis method. Given a finite sample video of a texture in motion, dynamic texture synthesis may create a new video sequence, perceptually similar to the input, with an enlarged frame size and longer duration. In general, non-parametric techniques select and copy regions from the input sample to serve as building blocks by pasting them together one at a time onto the outcome. In order to minimize possible discontinuities between adjacent blocks, the proper representation and selection of such pieces become key issues. In previous synthesis methods, the block description has been based only on the intensities of pixels, ignoring the texture structure and dynamics. Furthermore, a seam optimization between neighboring blocks has been a fundamental step in order to avoid discontinuities. In our synthesis approach, we propose to use local spatiotemporal cues extracted with the local binary pattern from three orthogonal planes (LBP-TOP) operator, which allows us to include the appearance and motion of the dynamic texture in the video characterization. This improved representation leads to a better fitting and matching between adjacent blocks, and therefore the spatial similarity, temporal behavior, and continuity of the input can be successfully preserved. Moreover, the proposed method is simpler than previous approaches, since no additional seam optimization is needed to obtain smooth transitions between video blocks. The experiments show that the LBP-TOP representation outperforms other methods, without generating visible discontinuities or annoying artifacts. The results are evaluated using a double-stimulus continuous quality scale methodology, which is reproducible and objective. We also present results for the use of our method in video completion tasks. Additionally, we show that the proposed technique is easily extended to achieve synthesis in both the spatial and temporal domains.
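A rough sketch of the kind of block descriptor involved is shown below, assuming scikit-image's `local_binary_pattern`. It computes LBP histograms only on the three centre planes (XY, XT, YT) of a video block, which is a simplification of the full LBP-TOP operator; the neighbourhood size, radius, and histogram settings are illustrative assumptions, not the paper's parameters.

```python
# Simplified LBP-TOP-style descriptor for a grey-level video block (T, H, W),
# using scikit-image. Only the three centre planes are used in this sketch.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top_descriptor(block, P=8, R=1.0):
    t, h, w = block.shape
    planes = [block[t // 2, :, :],    # XY plane: spatial appearance
              block[:, h // 2, :],    # XT plane: horizontal motion
              block[:, :, w // 2]]    # YT plane: vertical motion
    hists = []
    for plane in planes:
        codes = local_binary_pattern(plane, P, R, method="uniform")
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        hists.append(hist)
    return np.concatenate(hists)      # candidate blocks are compared by histogram distance

descriptor = lbp_top_descriptor(np.random.rand(16, 32, 32))
```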
2018
Flow-based generative models (Dinh et al., 2014) are conceptually attractive due to tractability of the exact log-likelihood, tractability of exact latent-variable inference, and parallelizability of both training and synthesis. In this paper we propose Glow, a simple type of generative flow using an invertible 1x1 convolution. Using our method we demonstrate a significant improvement in log-likelihood on standard benchmarks. Perhaps most strikingly, we demonstrate that a generative model optimized towards the plain log-likelihood objective is capable of efficient realistic-looking synthesis and manipulation of large images. The code for our model is available at this https URL
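The core building block, the invertible 1x1 convolution, is small enough to sketch. The PyTorch snippet below shows the idea (a learned channel-mixing matrix whose log-determinant enters the log-likelihood); it is a minimal sketch that omits the LU parameterization and the actnorm and coupling layers a full Glow model also needs.

```python
# Sketch of an invertible 1x1 convolution with its log-determinant term (PyTorch).
import torch
import torch.nn.functional as F

class Invertible1x1Conv(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        w, _ = torch.linalg.qr(torch.randn(channels, channels))  # start from a rotation
        self.weight = torch.nn.Parameter(w)

    def forward(self, x):                        # x: (B, C, H, W)
        _, c, h, w = x.shape
        z = F.conv2d(x, self.weight.view(c, c, 1, 1))
        logdet = h * w * torch.slogdet(self.weight)[1]   # contribution to log-likelihood
        return z, logdet

    def inverse(self, z):
        c = z.shape[1]
        return F.conv2d(z, torch.inverse(self.weight).view(c, c, 1, 1))

layer = Invertible1x1Conv(8)
z, logdet = layer(torch.randn(2, 8, 16, 16))
x_rec = layer.inverse(z)                         # recovers the input up to numerics
```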
2007
We present a technique for synthesizing spatially and temporally varying textures on continuous flows using image or video input, guided by the physical characteristics of the fluid stream itself. This approach enables the generation of realistic textures on the fluid that correspond to the local flow behavior, creating the appearance of complex surface effects, such as foam and small bubbles. Our technique requires only a simple specification of texture behavior, and automatically generates and tracks the features and texture over time in a temporally coherent manner. Based on this framework, we also introduce a technique to perform feature-guided video synthesis. We demonstrate our algorithm on several simulated and recorded natural phenomena, including splashing water and lava flows. We also show how our methodology can be extended beyond realistic appearance synthesis to more general scenarios, such as temperature-guided synthesis of complex surface phenomena in a liquid during boiling.
Journal of Computer Science and Technology Studies
Artistic style transfer, a captivating application of generative artificial intelligence, involves fusing the content of one image with the artistic style of another to create unique visual compositions. This paper presents a comprehensive overview of a novel technique for style transfer using Convolutional Neural Networks (CNNs). By leveraging deep image representations learned by CNNs, we demonstrate how to separate and manipulate image content and style, enabling the synthesis of high-quality images that combine content and style in a harmonious manner. We describe the methodology, including content and style representations, loss computation, and optimization, and showcase experimental results highlighting the effectiveness and versatility of the approach across different styles and content.
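The loss optimized in this family of methods (following Gatys et al.) combines a content term on feature maps with a style term on Gram matrices; the weights $\alpha$, $\beta$, and $w_l$ below are the usual free parameters:

\[
\mathcal{L}(x) \;=\; \alpha \sum_{l} \big\| F^{l}(x) - F^{l}(c) \big\|_2^2
\;+\; \beta \sum_{l} w_l \,\big\| G^{l}(x) - G^{l}(s) \big\|_F^2,
\qquad
G^{l}_{ij} = \sum_{k} F^{l}_{ik}\, F^{l}_{jk},
\]

where $F^{l}$ are the CNN feature maps at layer $l$ (rows indexed by channel, columns by spatial position), $c$ is the content image, $s$ the style image, and $x$ the image being synthesized by gradient-based optimization.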
Cornell University - arXiv, 2017
We propose a recurrent variational auto-encoder for texture synthesis. A novel loss function, FLTBNK, is used for training the texture synthesizer; it is a rotationally and partially color-invariant loss function. Unlike the L2 loss, FLTBNK explicitly models the correlation of color intensity between pixels. Our texture synthesizer generates neighboring tiles to expand a sample texture and is evaluated using various texture patterns from the Describable Textures Dataset (DTD). We perform both quantitative and qualitative experiments with various loss functions to evaluate the performance of our proposed loss function (FLTBNK); a mini human-subject study is used for the qualitative evaluation.
Proceedings of Computer Graphics International 2018 (CGI 2018), 2018
Techniques for photographic style transfer have been researched for a long time, exploring effective ways to transfer the style features of a reference photo onto another content photograph. Recent works based on convolutional neural networks present an effective solution for style transfer, especially for paintings. The artistic style transformation results are visually appealing; however, photorealism is lost because of content mismatching and distortions, even when both input images are photographic. To tackle this challenge, this paper introduces a similarity loss function and a refinement method into the style transfer network. The similarity loss function can solve the content-mismatching problem; however, distortion and noise artefacts may still exist in the stylized results due to the content-style trade-off. Hence, we add a post-processing refinement step to reduce the artefacts. The robustness and effectiveness of our approach have been evaluated through extensive experiments, which show that our method can obtain finer content details and fewer artefacts than state-of-the-art methods, and transfer style faithfully. In addition, our approach is capable of processing photographic style transfer in almost real time, which makes it a potential solution for video style transfer.
INTERNATIONAL JOURNAL OF ADVANCE RESEARCH, IDEAS AND INNOVATIONS IN TECHNOLOGY
In this paper, we implement style transfer using convolutional neural networks. Style transfer means extracting the style and texture of a style image and applying them to the extracted content of another image. Our work is based on the approach proposed by L. A. Gatys. We use a pre-trained VGG-16 model. This work includes content reconstruction and style reconstruction from the content image and the style image, respectively. The style and content are then merged in a manner that retains the features of both.
ArXiv, 2020
As image generation techniques mature, there is a growing interest in explainable representations that are easy to understand and intuitive to manipulate. In this work, we turn to co-occurrence statistics, which have long been used for texture analysis, to learn a controllable texture synthesis model. We propose a fully convolutional generative adversarial network, conditioned locally on co-occurrence statistics, to generate arbitrarily large images while having local, interpretable control over the texture appearance. To encourage fidelity to the input condition, we introduce a novel differentiable co-occurrence loss that is integrated seamlessly into our framework in an end-to-end fashion. We demonstrate that our solution offers a stable, intuitive and interpretable latent representation for texture synthesis, which can be used to generate a smooth texture morph between different textures. We further show an interactive texture tool that allows a user to adjust local characteristics...
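As a point of reference, the classic (non-differentiable) grey-level co-occurrence matrix can be computed with a recent scikit-image as below; the paper's contribution is a differentiable, soft-binned counterpart usable as a loss inside the GAN. The patch size, distances, and angles here are arbitrary choices for illustration.

```python
# Classic grey-level co-occurrence statistics for a texture patch (scikit-image).
# The paper replaces this hard-binned version with a differentiable one.
import numpy as np
from skimage.feature import graycomatrix

patch = (np.random.rand(64, 64) * 255).astype(np.uint8)   # toy grey-level patch
cooc = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
# cooc: (256, 256, n_distances, n_angles); the generator is conditioned locally on
# such statistics, and a co-occurrence loss compares them between input and output.
```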
We propose StyleBank, which is composed of multiple convolution filter banks, each filter bank explicitly representing one style, for neural image style transfer. To transfer an image to a specific style, the corresponding filter bank is operated on top of the intermediate feature embedding produced by a single auto-encoder. The StyleBank and the auto-encoder are jointly learnt, where the learning is conducted in such a way that the auto-encoder does not encode any style information thanks to the flexibility introduced by the explicit filter bank representation. It also enables us to conduct incremental learning to add a new image style by learning a new filter bank while holding the auto-encoder fixed. The explicit style representation along with the flexible network design enables us to fuse styles at not only the image level, but also the region level. Our method is the first style transfer network that links back to traditional texton mapping methods, and hence provides new understanding on neural style transfer. Our method is easy to train, runs in real-time, and produces results that are qualitatively better than or at least comparable to existing methods.
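The description above maps naturally onto a very small PyTorch sketch: a shared auto-encoder plus one convolutional filter bank per style, with the selected bank applied to the bottleneck features. Layer widths, kernel sizes, and the encoder/decoder depth below are assumptions for illustration, not the paper's architecture.

```python
# Sketch of the StyleBank idea: one conv filter bank per style on top of a
# shared auto-encoder bottleneck (layer sizes are illustrative assumptions).
import torch
import torch.nn as nn

class StyleBankTransfer(nn.Module):
    def __init__(self, n_styles, feat_ch=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))
        # one filter bank per style; adding a style = adding one more bank
        self.banks = nn.ModuleList(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1) for _ in range(n_styles))

    def forward(self, image, style_id=None):
        feat = self.encoder(image)
        if style_id is not None:              # style branch
            feat = self.banks[style_id](feat)
        return self.decoder(feat)             # style_id=None: plain auto-encoder branch

model = StyleBankTransfer(n_styles=4)
stylized = model(torch.rand(1, 3, 256, 256), style_id=2)
```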
Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2019
Style is an intrinsic, inescapable part of human motion. It complements the content of motion to convey meaning, mood, and personality. Existing state-of-the-art motion style methods require large quantities of example data and intensive computational resources at runtime. To ensure output quality, such style transfer applications are often run on desktop machines with GPUs and significant memory. In this paper, we present a fast and expressive neural network-based motion style transfer method that generates stylized motion with quality comparable to state-of-the-art methods, but uses much less computational power and a much smaller memory footprint. Our method also allows the output to be adjusted in a latent style space, something not offered in previous approaches. Our style transfer model is implemented using three multi-layered networks: a pose network, a timing network and a foot-contact network. A one-hot style vector serves as an input control knob and determines the styli...
It is a long standing question how biological systems transform visual inputs to robustly infer high-level visual information. Research in the last decades has established that much of the underlying computations take place in a hierarchical fashion along the ventral visual pathway. However, the exact processing stages along this hierarchy are difficult to characterise. Here we present a method to generate stimuli that will allow a principled description of the processing stages along the ventral stream. We introduce a new parametric texture model based on the powerful feature spaces of convolutional neural networks optimised for object recognition. We show that constraining a spatial summary statistic over feature maps suffices to synthesise high-quality natural textures. Moreover we establish that our texture representations continuously disentangle high-level visual information and demonstrate that the hierarchical parameterisation of the texture model naturally enables us to gen...
ArXiv, 2019
Affine transformation, layer blending, and artistic filters are popular processes that graphic designers employ to transform pixels of an image to create a desired effect. Here, we examine various approaches that synthesize new images: pixel-based compositing models and in particular, distributed representations of deep neural network models. This paper focuses on synthesizing new images from a learned representation model obtained from the VGG network. This approach offers an interesting creative process from its distributed representation of information in hidden layers of a deep VGG network, i.e., information such as contour, shape, etc. is effectively captured in hidden layers of neural networks. Conceptually, if $\Phi$ is the function that transforms input pixels into distributed representations of VGG layers ${\bf h}$, a new synthesized image $X$ can be generated from its inverse function, $X = \Phi^{-1}({\bf h})$. We describe the concept behind the approach, present some repr...
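A minimal way to realize $X = \Phi^{-1}({\bf h})$ in practice is gradient descent on the pixels so that $\Phi(X)$ matches a target representation. The sketch below assumes torchvision's VGG19 as $\Phi$ and an arbitrary layer cut-off; the learning rate and iteration count are likewise assumptions, not settings from the paper.

```python
# Sketch: approximate inversion X = Phi^{-1}(h) by optimizing the pixels (PyTorch).
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

phi = vgg19(weights="DEFAULT").features[:16].eval()   # Phi: pixels -> hidden layer h
for p in phi.parameters():
    p.requires_grad_(False)

reference = torch.rand(1, 3, 224, 224)     # image whose representation we invert
h = phi(reference)

x = torch.rand(1, 3, 224, 224, requires_grad=True)    # synthesized image X
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    F.mse_loss(phi(x), h).backward()       # match the hidden representation
    opt.step()
    x.data.clamp_(0, 1)                    # keep pixels in a valid range
```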
arXiv (Cornell University), 2017
Figure 1: Our style transfer and texture synthesis results. The input styles are shown in (a), and style transfer results are in (b, c). Note that the angular shapes of the Picasso painting are successfully transferred on the top row, and that the more subtle brush strokes are transferred on the bottom row. The original content images are inset in the upper right corner. Unless otherwise noted, our algorithm is always run with default parameters (we do not manually tune parameters). Input textures are shown in (d) and texture synthesis results are in (e). For the texture synthesis, note that the algorithm synthesizes creative new patterns and connectivities in the output.
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
Figure 1. With TextureGAN, one can generate novel instances of common items from hand-drawn sketches and simple texture patches. You can now be your own fashion guru! Top row: sketch with texture patch overlaid. Bottom row: results from TextureGAN.
Transferring artistic styles onto everyday photographs has become an extremely popular task in both academia and industry. Recently, offline training has replaced online iterative optimization, enabling nearly real-time stylization. When those stylization networks are applied directly to high-resolution images, however, the style of localized regions often appears less similar to the desired artistic style. This is because the transfer process fails to capture small, intricate textures and maintain correct texture scales of the artworks. Here we propose a multimodal convolutional neural network that takes into consideration faithful representations of both color and luminance channels, and performs stylization hierarchically with multiple losses of increasing scales. Compared to state-of-the-art networks, our network can also perform style transfer in nearly real-time by conducting much more sophisticated training offline. By properly handling style and texture cues at multiple scales using several modalities, we can transfer not just large-scale, obvious style cues but also subtle, exquisite ones. That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Videos show continuous events, yet most, if not all, video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be: time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image and video discriminator pair and propose to use a single hypernetwork-based one. This decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024×1024 videos for the first time. We build our model on top of StyleGAN2 and it is just ≈5% more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at arbitrarily high frame rates, while prior work struggles to generate even 64 frames at a fixed rate. Our model achieves state-of-the-art results on four modern 256×256 video synthesis benchmarks and one 1024×1024 resolution benchmark.
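One concrete way to obtain continuous motion representations "through the lens of positional embeddings" is a sinusoidal Fourier embedding of a real-valued timestamp, sketched below; the dimensionality and frequency range are assumptions, and the actual model feeds such embeddings through further learned layers.

```python
# Sketch: continuous sinusoidal time embedding for real-valued timestamps (PyTorch).
import math
import torch

def time_embedding(t, dim=64, max_freq=64.0):
    """t: (B,) real timestamps -> (B, dim) Fourier features usable at any frame rate."""
    freqs = torch.exp(torch.linspace(0.0, math.log(max_freq), dim // 2))
    angles = t[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

emb = time_embedding(torch.tensor([0.0, 0.25, 1.7]))   # arbitrary, non-integer times
```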
International Journal of Computer …, 2003
Dynamic textures are sequences of images of moving scenes that exhibit certain stationarity properties in time; these include sea-waves, smoke, foliage, whirlwind but also talking faces, traffic scenes etc. We present a novel characterization of dynamic textures that poses the problems of modelling, learning, recognizing and synthesizing dynamic textures on a firm analytical footing. We borrow tools from system identification to capture the "essence" of dynamic textures; we do so by learning (i.e. identifying) models that are optimal in the sense of maximum likelihood or minimum prediction error variance. For the special case of second-order stationary processes we identify the model in closed form. Once learned, a model has predictive power and can be used for extrapolating synthetic sequences to infinite length with negligible computational cost. We present experimental evidence that, within our framework, even low dimensional models can capture very complex visual phenomena.
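Concretely, the model behind this characterization is a noise-driven linear dynamical system, and a standard closed-form identification of this kind proceeds via an SVD of the (mean-subtracted) frame matrix; the sketch below follows that common recipe, where $y_t$ is the observed frame, $x_t$ the hidden state, and $Y_{1:\tau}$ stacks the frames as columns:

\[
x_{t+1} = A x_t + v_t, \qquad y_t = C x_t + w_t, \qquad
v_t \sim \mathcal{N}(0, Q), \; w_t \sim \mathcal{N}(0, R),
\]

\[
Y_{1:\tau} \approx U \Sigma V^{\top} \;\Rightarrow\;
\hat{C} = U, \quad \hat{X}_{1:\tau} = \Sigma V^{\top}, \quad
\hat{A} = \hat{X}_{2:\tau}\, \hat{X}_{1:\tau-1}^{+},
\]

where the SVD is truncated to the desired state dimension and $(\cdot)^{+}$ denotes the pseudo-inverse. New frames are then synthesized cheaply by iterating $\hat{A}$ on the state (with injected noise) and decoding with $\hat{C}$.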
arXiv (Cornell University), 2023
Photorealistic style transfer aims to apply stylization while preserving the realism and structure of input content. However, existing methods often encounter challenges such as color tone distortions, dependency on pair-wise pre-training, inefficiency with high-resolution inputs, and the need for additional constraints in video style transfer tasks. To address these issues, we propose a Universal Photorealistic Style Transfer (UPST) framework that delivers accurate photorealistic style transfer on high-resolution images and videos without relying on pre-training. Our approach incorporates a lightweight StyleNet for per-instance transfer, ensuring color tone accuracy while supporting high-resolution inputs, maintaining rapid processing speeds, and eliminating the need for pre-training. To further enhance photorealism and efficiency, we introduce instance-adaptive optimization, which features an adaptive coefficient to prioritize content image realism and employs early stopping to accelerate network convergence. Additionally, UPST enables seamless video style transfer without additional constraints due to its strong non-color information preservation ability. Experimental results show that UPST consistently produces photorealistic outputs and significantly reduces GPU memory usage, making it an effective and universal solution for various photorealistic style transfer tasks.