0% found this document useful (0 votes)

50 views37 pages

Project Report Format 2023

The document provides guidelines for preparing a project report for graduate students, emphasizing standardization, readability, and ethical norms. Key elements include report size, formatting, order of sections, and specific requirements for acknowledgments, abstracts, and references. It also outlines the structure of the report, including page numbering and the inclusion of non-paper materials, ensuring a comprehensive approach to documenting research work.

Uploaded by

Ganga Bhavani Tentu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

50 views37 pages

Project Report Format 2023

Uploaded by

Ganga Bhavani Tentu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Guidelines for Preparation of Project Report

OBJECTIVE
A Project Report is a documentation of a Graduate student’s project work—a record of the
original work done by the student. It provides information on the student’s research work to the
future researchers. The Dept. is committed to preserve a proper copy of the student’s report for
archiving and cataloging it in the Departmental Library, making it available to others for
academic purpose.
Standardization, readability, conformance to ethical norms, and durability are the four overriding
criteria for an acceptable form of a report.
The objective of this document is to provide a set of guidelines that help a research student to
prepare the report to satisfy the above-mentioned criteria.
PRODUCTION
Report Size
1. The minimum number of pages of the Report should be 50 pages.
Paper Size
2. The standard size of paper of a Report is 21.5 cm (8½ inch) wide and 28 cm (11 inches) long.
3. Oversized figures and tables, if any, should be reduced to fit with the size of the report but the
reduction should not be so drastic as to impair clarity of their contents. One may also fold these
pages to fit with the report size.
Single-Sided printing
4. It is suggested that the report be printed on one side of the paper. Double-sided printing can
be done only if the paper is opaque enough not to impair readability on the other side in normal
lighting conditions.
Non-Paper Material
5. Digital or magnetic materials, such as CDs and DVDs, may be included in the report. They
have to be given in a closed pocket in the inside of the back cover page of the report. It should
be borne in mind that their formats may become obsolete due to rapid change in technology,
making it impossible for the Library to guarantee their preservation and use.
6. All non-paper materials, as above, must have a label each indicating the name of the student,
the date of submission, and the copyright notice.
Page Numbering
7. Page numbers for the prefacing materials of the report shall be in small Roman numerals and
should be centered at the bottom of the pages.
8. Page numbers for the body of the report should be in Arabic numerals and should be centered
at the bottom of the pages. The pagination should start with the first page of Chapter 1 and
should continue throughout the text (including tables, figures, and appendices)
Order of Report
1. Front Page ( Refer the sample page)
2. Certificate ( Refer the sample page)
3. Acknowledgement
4. Abstract
5. Table of Contents
6. List of Table
7. List of Figures
8. List of Abbreviations
9. Chapters ( All the chapters starting from introduction to conclusion and future work)
10. Appendix
11. References

Acknowledgements
It should be limited, preferably, to one page.
Contents
Chapter numbers, chapter names, section numbers, section headings, subsection numbers, and
subsection headings, along with the corresponding page numbers, should be given in the
contents.
See Sample Page
List of Symbols
All the symbols used in the report are to be given here along with their explanations and units of
measurement (if applicable).
Abstract
1. The abstract of the report should be limited to 200-300 words in 2 line spacing
2. A list of keywords should follow the abstract.
BODY OF THE REPORT
1. The report should be written in either British or American English, not a mixed mode.
However, because of increasing acceptance of both styles and blurring of the distinction between
the two, what is important is that consistency should be maintained all throughout the text.
2. The chapters should have numbers in Arabic numerals and should be written as 1.
INTRODUCTION, 2. LITERATURE REVIEW, etc. This should be followed by the title of the
chapter (e.g., Introduction, etc.). The font size should be 14-point, bold for the titles and
centered.

Ex: ABSTRACT, ACKNOWLEDGEMENT, LIST OF FIGURES, LIST OF

TABLES, CHAPTER HEADINGS (1.INTRODUCTION,
2. METHODOLOGY, 3. RESULT, 4. CONCLUSION, REFERENCES ) must
be in 14 font size with bold.
Remaining contents will be typed with font size of 12 , Times new roman. All main headings
& Sub heading must be bold.
3. At the end of Introduction chapter, Objective of the work and Organization of the thesis must
be included as sub heading.
4. Figures, tables, graphs should be positioned within the body of the text immediately after
citation. The citation should be like figure 1.1 / table 1.1 in the body of the report. See the sample
Guidelines for Tables and Figures below.
Name of the table should be written on above the table- 12 point with bold (eg: Table 2.1:)
Name of the figure should be written below the figure- 12 point with bold (eg: Figure 1.1:)
Equations should be typed in equation editor only. The equations are represented as (1.1), (1.2),
(2.1) aligned at right side of the page.
All figures, Tables and equations must be cited in the text.
References
1. IEEE style to be followed. References must be marked sequentially using square bracket and
if the same reference is required to use further, Mention in ascending order. Eg: [ 1-3, 5, 7]
2. 1 line spacing
3. All the references mentioned in the report must be cited in content.
Appendices
1. Each appendix should be identified as Appendix 1, Appendix 2, etc.
2. It should also have a title.
3. The appendices and their titles should be listed in the Contents.
4. Section and sub-section headings, equations, figures, and tables should be identified as A.1,
A.2, etc., in accordance with their appearance in an appendix.
TITLE OF THE PROJECT(18 font)
A project report submitted in partial fulfilment of the requirement
for the award of degree of (12 font)

BACHELOR OF TECHNOLOGY (14 font)

In (12 font)
COMPUTER SCIENCE AND ENGINEERING (14 font)

Submitted by (12 font)

Name 1 (Regd. No.)
Name 2 (Regd. No.)
Name 3 (Regd. No.)
Name 4 (Regd. No.)
Name 5 (Regd. No.)
Name 6 (Regd. No.)
Under the esteemed guidance of
Dr. xxxxxxxxxx
Designation, Dept. of CSE

GMR Institute of Technology

An Autonomous Institute Affiliated to JNTU-GV, Vizianagaram
(Accredited by NBA, NAAC with ‘A’ Grade & ISO 9001:2015 Certified Institution)

GMR Nagar, Rajam – 532127,

Andhra Pradesh, India
March 2023

Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the thesis entitled TITLE submitted by Names (Regd. Nos) has been

carried out in partial fulfilment of the requirement for the award of degree of Bachelor of

Technology in Computer Science and Engineering of GMRIT, Rajam affiliated to

JNTUGV, Vizianagaram is a record of bonafide work carried out by them under my guidance

& supervision. The results embodied in this report have not been submitted to any other

University or Institute for the award of any degree.

Signature of Supervisor Signature of HOD

Dr.xxxxxxxxx Dr. A. V. Ramana
Professor/Associate/Assistant Professor Professor & Head
Department of CSE Department of CSE
GMRIT, Rajam. GMRIT, Rajam.

The report is submitted for the viva-voce examination held on ………………..

Signature of Internal Examiner Signature of HOD

ACKNOWLEDGEMENT

It gives us an immense pleasure to express deep sense of gratitude to my guide

Mr/Dr. X. xxxxx, Assistant/Associate/Professor, Department of Computer Science and
Engineering for his whole hearted and invaluable guidance throughout the project work. Without
his sustained and sincere effort, this project work would not have taken this shape. He
encouraged and helped us to overcome various difficulties that we have faced at various stages
of our project work.

We would like to sincerely thank our Head of the department Dr. A. V. Ramana, for
providing all the necessary facilities that led to the successful completion of our project work.

We would like to take this opportunity to thank our beloved Principal Dr. C. L. V. R. S.
V. Prasad, for providing all the necessary facilities and a great support to us in completing the
project work.

We would like to thank all the faculty members and the non-teaching staff of the
Department of Electronics and Communication Engineering for their direct or indirect support
for helping us in completion of this project work.

Finally, we would like to thank all of our friends and family members for their
continuous help and encouragement.

Name 1 Regd. no.

Name 2 Regd. no.
Name 3 Regd. no.
Name 4 Regd. no.
iii

ABSTRACT
The current trends in wireless industry are based on multi-carrier transmission technique such as

Orthogonal Frequency Division Multiplexing (OFDM) which is highly promising in terms of

higher data rates and better immunity to frequency selective fading. Wireless standards like

IEEE 802.11a/g/n, IEEE 802.16e and many others use one or other variation of OFDM, such as

OFDMA and MIMO-OFDM. However OFDM is handicapped with a major problem of high

peak-to-average power ratio which is a trait in-built to any multi-carrier transmission system.

High PAPR causes non-linear distortion in the signal and hence results in inter-carrier

interference and out-of-band radiation. To combat the effect of high PAPR, several PAPR

reduction techniques have been devised over the last few decades. All these techniques have

to strike a trade-off among some parameters such as computational complexity, PAPR

reduction performance, BER performance and redundancy.

Keywords:
iv

Sample page of contents:

TABLE OF CONTENTS
ACKNOWLEDGEMENT (Bold and caps) iii
ABSTRACT (Bold and caps) iv
LIST OF TABLES (Bold and caps) v
LIST OF FIGURES (Bold and caps) vi
LIST OF SYMBOLS & ABBREVIATIONS (Bold and caps) vii
1. NTRODUCTION (Bold and caps) 1

1.1 Introductory paragraph

1.2 Major challenges in the current literature inline with your proposed work
1.3 Solutions to those challenges
1.4 Background/Motivation for the proposed work
1.5 Overview of the proposed work/scheme/model
2. RELATED WORK/ THEORETICAL STUDY (Bold and caps)

The section should cover complete literature related to the proposed work and
organization of your paper.
3. EXPERIMENTAL STUDY (If applicable) (Bold and caps)
4. RESULTS AND DISCUSSIONS (Bold and caps)
5. CONCLUSIONS AND FUTURE SCOPE (Bold and caps)
APPENDIX 1
APPENDIX 2
REFERENCES 60
LIST OF PUBLICATIONS (in the reference format)
Add the Publication paper (Conference paper/Conference certificate/Journal paper or both)
LIST OF TABLES

TABLE NO TITLE PAGE NO

1.1 Data hiding technique 12

1.2 OFDM Spectrum 23
2.1
2.2
3.1
vii

LIST OF FIGURES

FIGURE NO TITLE PAGE NO

1.1 Conceptual Diagram for Generic Multi-carrier 12

1.2 OFDM Spectrum 23
2.1
2.2
3.1
viii

LIST OF SYMBOLS & ABBREVIATIONS (Alphabetic order)

S/P : Serial to Parallel Converter

P/S : Parallel to Serial Converter
D/A : Digital to Analog converter
LPF : Low Pass Filter
IP : Internet Protocol
CMOS : Complementary Metal Oxide Semiconductor
DAB : Digital Audio Broadcasting
OFDM : Orthogonal Frequency Division Multiplexing
VLSI : Very Large Scale Integration
WLAN : Wireless Local Area Network
WiMAX : Worldwide Interoperability for Microwave Access

Introduction:
Diffusion models have revolutionized image generation, enabling the production of high-quality,
photorealistic visuals from textual descriptions. These models have demonstrated remarkable
success in text-to-image synthesis, where users provide prompts to generate diverse and realistic
images. However, when applied to image editing, these models face significant challenges due to
the inherent ambiguity of text prompts, which often fail to specify precise spatial modifications.
As a result, text-guided editing struggles with localizing changes, maintaining structural
coherence, and ensuring high-fidelity transformations, particularly in cases that demand fine-
grained adjustments.
To overcome these limitations, researchers have explored interactive image editing, which
allows users to modify images using intuitive visual inputs such as sketches, clicks, and drags.
This approach provides greater spatial control compared to text-based methods, enabling users to
directly indicate which regions of an image should be changed and how. Despite these
advantages, existing interactive editing techniques are still constrained by the fundamental
limitations of image-to-image generation models, which rely on diffusion-based text-to-image
pipelines. These methods typically require vast training datasets, employ additional reference
encoders to enforce consistency, and suffer from computational inefficiencies. Furthermore,
maintaining semantic coherence between the original and edited images is challenging,
especially when dealing with complex modifications such as object deformations, appearance
changes, and structural transformations.
In this work, we introduce a novel image editing framework that redefines interactive image
editing as an image-to-video generation problem. Our key insight is that image editing can be
framed as a temporal transition—where the source image represents the first frame and the edited
image acts as the second frame of a short video. By leveraging video diffusion priors, our
method enhances realism, preserves spatial consistency, and significantly reduces training costs.
Unlike traditional image-editing models that require large-scale paired datasets, our approach
benefits from inherent motion priors present in video data, making it more data-efficient while
ensuring high-quality transformations.
Built upon Stable Video Diffusion, our approach integrates a lightweight sparse control encoder,
which injects user-provided editing signals into the diffusion process while preserving key
structural features of the original image. To further enhance consistency and realism, we
introduce a novel matching attention mechanism, which establishes dense correspondences
between the source and edited images. This component mitigates artifacts, aligns key object
regions, and improves spatial consistency, addressing the limitations of traditional temporal
attention in handling large-scale deformations. By combining spatial, temporal, and cross-
attention mechanisms, it ensures high-fidelity edits while maintaining the natural structure and
texture of the original image.
Through extensive experimentation, we demonstrate that it achieves state-of-the-art performance
across a wide range of interactive editing tasks, including shape deformations, object
modifications, and fine-grained detail adjustments. Additionally, our method exhibits remarkable
generalization capabilities, handling out-of-domain edits such as transforming a clownfish into a
shark-like shape, modifying reflections, and generating complex structural changes with minimal
supervision. These results highlight the effectiveness, flexibility, and efficiency of our approach,
establishing a new paradigm for interactive image editing that leverages the power of video
diffusion models to redefine user-controlled image manipulation.
By framing image editing as a temporal progression rather than an isolated transformation, our
work paves the way for more natural, coherent, and efficient editing techniques, offering a
scalable solution for high-quality, realistic image modifications with minimal data requirements.

Literature Survey:
1. Sun, J., Wang, X., Shi, Y., Wang, L., Wang, J., & Liu, Y. (2022). Ide-3d: Interactive disentangled editing
for high-resolution 3d-aware portrait synthesis. ACM Transactions on Graphics (ToG), 41(6), 1-10.
 The paper addresses the challenge faced by existing 3D-aware facial generation methods, which often
compromise between quality and editability. The proposed approach aims to provide both high-
resolution outputs and flexible editing capabilities.
 The IDE-3D method primarily utilizes the FFHQ (Flickr-Faces-HQ) dataset for training and
evaluation. This dataset is known for its high-quality images of human faces, which helps in
achieving photorealistic results in 3D-aware portrait synthesis. The paper mentions that models are
trained with FFHQ at a resolution of 512x512, except for FENeRF, which is trained at a lower
resolution of 128x128.
 The method relies on a single image to reconstruct a 3D facial volume, which is inherently an ill-
posed problem. This can lead to implausible facial geometry in some cases, indicating a limitation
in the method's ability to accurately capture complex facial structures from limited data.
2. Cheng, Y., Gan, Z., Li, Y., Liu, J., & Gao, J. (2020, October). Sequential attention GAN for interactive
image editing. In Proceedings of the 28th ACM international conference on multimedia (pp. 4383-4391).
 The paper introduces Interactive Image Editing using SeqAttnGAN, enabling users to modify images
through multi-turn commands while maintaining contextual consistency and image quality.
 The paper introduces two new datasets for the interactive image editing task: Zap-Seq, which contains
8,734 image sequences derived from 50,025 shoe images, and DeepFashion-Seq, which includes
4,820 sequences from around 290,000 clothing images, both paired with natural language
descriptions of the differences between consecutive images.
 The model's potential difficulty in handling complex modifications that require a deeper
understanding of context and semantics, as well as the need for further exploration of how well
SeqAttnGAN generalizes to diverse image editing tasks beyond the fashion domain.
3. Jiang, Y., Huang, Z., Pan, X., Loy, C. C., & Liu, Z. (2021). Talk-to-edit: Fine-grained facial editing via
dialog. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 13799-13808).
 The objective of this paper is to present "Talk-to-Edit," an interactive facial editing system that
enables users to modify facial attributes in images through natural language requests while
preserving identity and enhancing editing realism.
 The dataset used for training MagicQuill consists of 24,315 images categorized under 4,412 different
labels, specifically selected for their aesthetic scores above 6.5, ensuring a broad spectrum of data
for effective model training.
 Its reliance on user input, which may lead to challenges in accurately interpreting ambiguous or vague
language requests, potentially resulting in unsatisfactory editing outcomes.
4. Liu, Z., Yu, Y., Ouyang, H., Wang, Q., Cheng, K. L., Wang, W., ... & Shen, Y. (2024). Magicquill: An
intelligent interactive image editing system. arXiv preprint arXiv:2411.09703.
 This paper is to present and evaluate MagicQuill, an advanced image editing system that utilizes
diffusion models and AI to enhance user experience and precision in fine-grained image
manipulation.
 The dataset used in the "Talk-to-Edit" system is a large-scale visual-language facial attribute dataset
named CelebA-Dialog, which is designed to support fine-grained and language-driven facial
editing tasks.
 Its current limitations in expanding editing capabilities, such as the lack of reference-based editing
and insufficient support for typography manipulation within images.
5. Ling, H., Kreis, K., Li, D., Kim, S. W., Torralba, A., & Fidler, S. (2021). Editgan: High-precision semantic
image editing. Advances in Neural Information Processing Systems, 34, 16331-16345.
 The primary objective of the paper is to propose EditGAN, a novel GAN-based image editing
framework that enables high-precision semantic image editing. It allows users to modify detailed
object part segmentations with minimal labeled examples, making it scalable for various object
classes and part labels
 EditGAN requires only a handful of labeled examples for training, making it a scalable tool for high-
quality, high-precision semantic image editing. It builds on a GAN framework that jointly models
images and their semantic segmentations, allowing users to edit images by modifying detailed part
segmentation masks.
 EditGAN still faces challenges with certain complex edits that require more extensive optimization,
indicating a gap in efficiency for specific use cases.
6. Brooks, T., Holynski, A., & Efros, A. A. (2023). Instructpix2pix: Learning to follow image editing
instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (pp. 18392-18402).
 The primary objective of the paper is to develop a model that can perform image edits based on
human-written instructions without requiring full descriptions of the input or output images. The
model aims to generate edited images directly in the forward pass, enhancing the efficiency of
image editing tasks.
 The dataset used in the study consists of over 450,000 training examples generated through a two-part
method: first, using a finetuned GPT-3 to create instructions and edited captions, and second,
employing StableDiffusion in combination with Prompt-to-Prompt to generate pairs of images from
those captions.
 The paper discusses the potential for incorporating human feedback, such as reinforcement learning,
to improve alignment between the model's outputs and human intentions, indicating a gap in
current capabilities.
7. Ceylan, D., Huang, C. H. P., & Mitra, N. J. (2023). Pix2video: Video editing using image diffusion.
In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 23206-23217).
 The primary objective of the paper "Pix2Video, Video Editing using Image Diffusion" is to explore
the feasibility of editing video clips using a pre-trained image diffusion model guided by text
instructions, without requiring additional training.
 The dataset used for evaluation in the study is obtained from the DAVIS dataset, which is commonly
referenced in video object segmentation tasks.
 The paper acknowledges that there is still room for improvement in terms of temporal coherency and
suggests exploring additional energy terms, such as patch-based similarity and CLIP similarity,
during the latent update stage.
8. Wang, S., Saharia, C., Montgomery, C., Pont-Tuset, J., Noy, S., Pellegrini, S., ... & Chan, W. (2023).
Imagen editor and editbench: Advancing and evaluating text-guided image inpainting. In Proceedings of
the IEEE/CVF conference on computer vision and pattern recognition (pp. 18359-18369).
 The primary objective of the paper is to present the Imagen Editor, a model designed for text-guided
image inpainting, which allows users to make localized edits to images based on user-defined
masks and text prompts.
 The dataset proposed for evaluating text-guided image inpainting is called EditBench. It consists of
three components for each example: a masked input image, an input text prompt, and a high-quality
output image for reference. EditBench captures a wide variety of language, types of images, and
levels of difficulty, with prompts categorized along attributes, objects, and scenes.
 The paper identifies gaps in the model's performance with abstract attributes and complex prompts,
suggesting that future work should focus on improving these areas.
9. Alaluf, Y., Tov, O., Mokady, R., Gal, R., & Bermano, A. (2022). Hyperstyle: Stylegan inversion with
hypernetworks for real image editing. In Proceedings of the IEEE/CVF conference on computer Vision
and pattern recognition (pp. 18511-18521).
 The primary objective of the paper is to present HyperStyle, a method for image inversion that
achieves high-quality reconstructions and editability in latent space while being computationally
efficient compared to traditional optimization techniques.
 The datasets used in the experiments include FFHQ for training and the CelebA-HQ test set for
quantitative evaluations in the human facial domain. For the cars domain, the Stanford Cars dataset
is utilize.
 The paper notes challenges in comparing editability across different inversion methods due to varying
editing strengths, which could introduce bias.
 Further research is needed to enhance robustness to diverse input conditions, particularly for images
outside the training domain.

10. Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J. Y., & Ermon, S. (2021). Sdedit: Guided image
synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073.
 The primary objective of the paper is to introduce SDEdit, a framework for guided image synthesis
and editing that balances realism and faithfulness to user inputs, enabling the generation of photo-
realistic images from various levels of detail without the need for extensive data collection or
model retraining.
 The dataset used in the experiments includes ImageNet, LSUN (cat and horse), CelebA-HQ, and
FFHQ. These datasets are utilized for stroke-based image synthesis and image compositing tasks
with SDEdit .
 The paper primarily focuses on specific datasets (e.g., LSUN and CelebA-HQ) and may not fully
address the performance of SDEdit across a broader range of image types and editing tasks.
 While SDEdit shows significant improvements, further exploration is needed to understand its
limitations in real-world applications and its adaptability to various user inputs and styles.
11. Hertz, A., Mokady, R., Tenenbaum, J., Aberman, K., Pritch, Y., & Cohen-Or, D. (2022). Prompt-to-
prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626.
 The primary goal of the paper is to develop a prompt-to-prompt image editing framework that allows
users to modify images using only textual prompts, without the need for spatial masks. This aims to
preserve the original image's structure and content while enabling intuitive editing.
 The authors acknowledge that the challenge of inversion for text-guided diffusion models is an area
for future research, indicating a gap in the current understanding and implementationThere is a
suggestion to incorporate cross-attention in higher-resolution layers to improve localized editing,
which remains unaddressed in the current work.
12. Chen, X., Zhao, Z., Zhang, Y., Duan, M., Qi, D., & Zhao, H. (2022). Focalclick: Towards practical
interactive image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (pp. 1300-1309).
 The primary goal of FocalClick is to develop a practical interactive image segmentation method that
efficiently produces fine masks with quick responses, particularly on low-power devices. It
addresses the gap between academic approaches and industrial needs by improving both efficiency
and accuracy in mask annotation.
 The dataset used in the study is based on the DAVIS dataset, specifically a new benchmark called
DAVIS-585. This dataset was created by annotating each object or accessory separately and
filtering out masks under 300 pixels, resulting in 585 test samples. The authors also simulated
defects on ground truth masks using super-pixels to generate flawed initial masks for their
experiments.
 Although FocalClick improves efficiency, there remains a challenge in maintaining accuracy,
especially when reducing input sizes for faster processing.
 Need for Larger Datasets: The performance gap compared to SOTA methods highlights the need for
larger and m
13. Hyunsu Kim, Yunjey Choi, Junho Kim, Sungjoo Yoo, Youngjung Uh; Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 852-861
 The primary objective of the paper titled "Exploiting Spatial Dimensions of Latent in GAN for
Real-time Image Editing" is to enhance the capabilities of Generative Adversarial Networks
(GANs) for real-time image editing. The authors introduce a novel approach called
StyleMapGAN, which aims to address several limitations associated with traditional GANs.
 The Paper presents several notable advantages that enhance the capabilities of image editing
using GANs Real-time Image Editing: Improved Fidelity and Accuracy High-Quality Output.
 The paper has gaps like low fidelity in encoder projections, limited exploration of spatial
dimensions, and narrow performanc comparison. It also lacks real-world adaptability insights
and a clear roadmap for integrating its methods into other architectures.
14. Omer Bar-Tal, Dolev Ofri-Amar, Rafail Fridman, Yoni Kasten, and Tali Dekel. Text2LIVE:
Textdriven layered image and video editing. arXiv preprint, arXiv:2204.02491, 2022.
 The main goal of DIFFEDIT is to enable semantic image editing by automatically identifying
regions of an image that need to be edited based on a text query, enhancing the editing process
without requiring user-generated masks
 DIFFEDIT leverages a diffusion model to produce more natural and subtle edits by integrating
the edited regions into the background effectively, outperforming previous methods
 The paper identifies gap in existing methods require user-generated masks, DIFFEDIT addresses
automatically generating masks, but it still faces challenges in ensuring the text query aligns
well with the image content.
15. . Omri Avrahami, Dani Lischinski, Ohad Fried; Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), 2022, pp. 18208-18218
 The paper aims to introduce a novel solution for performing local edits in natural images
using natural language descriptions and region-of-interest (ROI) masks. This is achieved by
combining a (CLIP) with (DDPM) to generate realistic image edits based on user prompts
 The paper Gives an advantages like Intuitive Interface means it gives highly intuitive interface
for users, making it easier to specify desired changes .
High Realism: The method outperforms previous solutions in terms of overall realism
 Improving the ranking system to consider the entire image context could enhance results. Future
research could extend the method to 3D or videos and train CLIP to be noise-agnostic for
better robustness.
16. Jacob Austin, Daniel Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured
denoising diffusion models in discrete state-spaces. In NeurIPS, volume 34, 2021.
 The paper aims to develop a framework for semantic image synthesis that generates
photorealistic images from semantic layouts, addressing the limitations of GAN-based
methods in handling complex scenes.
 The framework outperforms previous methods in generating high-fidelity, diverse images, state-
of-the-art results benchmark datasets. improves image quality and balances the trade-off
between quality and diversity.
 The paper highlights a gap in GAN-based methods' inability to generate high-fidelity and diverse
results for complex scenes, which the proposed framework addresses by using diffusion
models instead of adversarial learning.
 ore diverse training datasets to fully leverage FocalClick's capabilities.
17. Nguyen, T., Ojha, U., Li, Y., Liu, H., & Lee, Y. J. (2024). Edit One for All: Interactive Batch Image
Editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (pp. 8271-8280).
 The objective of the research is to develop a method for interactive batch image editing that
allows users to apply a specified edit from one image to a large batch of images while
maintaining a consistent final state across all edited images.
 The method for interactive batch image editing minimizes human supervision in the editing
process, allowing for significant time.
 The research identifies a gap in the existing methods that primarily focus on single image
editing, leaving batch image editing underexplored.
18. . Liang, Y., Gan, Y., Chen, M., Gutierrez, D., & Muñoz, A. (2019, October). Generic interactive
pixel‐level image editing. In Computer Graphics Forum (Vol. 38, No. 7, pp. 23-34).
 The objective of the research is to develop a generic interactive pixel-level image editing
paradigm that generates continuous additional per-pixel values from user inputs, specifically
RGB color values and user scribbles.
 The paradigm allows for interactive refinement of image edits, enhancing user control and
satisfaction
 It produces results that are on-par with state-of-the-art methods across various applications such
as depth of field blurring and dehazing.
 The research paper identifies gaps in previous superpixel-based image editing methods,
particularly regarding the propagation of additional values, which often results in artificial
discontinuities at superpixel boundaries.
19. Shi, Y., Xue, C., Liew, J. H., Pan, J., Yan, H., Zhang, W., ... & Bai, S. (2024). Dragdiffusion:
Harnessing diffusion models for interactive point-based image editing. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8839-8849).
 The objective of the research is to introduce DRAGDIFFUSION, a novel method that extends
interactive point-based editing to large-scale diffusion models. This method aims to enhance
the applicability of interactive editing by allowing users to manipulate images at a fine-grained
level through point-based interactions.
 DRAGDIFFUSION enhances the applicability of interactive point-based editing by utilizing
large-scale diffusion models, which improves generalizability compared to previous methods
reliant on GANs
20. Shin, J., Choi, D., & Park, J. (2024, December). InstantDrag: Improving Interactivity in Drag-based
Image Editing. In SIGGRAPH Asia 2024 Conference Papers (pp. 1-10).
 The objective of the research paper is to improve interactivity in drag-based image editing
through the development of a dedicated pipeline called InstantDrag.
 Instant Drag excels at preserving consistency, particularly high-frequency features, even without
the use of a mask.
 The method generates plausible images with realistic motions, enhancing the quality of the edits.
 The model occasionally struggles with preserving identity or creating accurate motions for non-
facial scenes without fine-tuning, as it has been primarily trained on facial videos. This
indicates a gap in generalizability across diverse motion types and scenes.
21. . Shinagawa, S., Yoshino, K., Alavi, S. H., Georgila, K., Traum, D., Sakti, S., & Nakamura, S. (2020). An
Interactive Image Editing System Using an Uncertainty-Based Confirmation Strategy. IEEE Access, 8,
98471-98480.
 The objective of this paper is to develop an interactive image editing framework using a modified
Deep Convolutional Generative Adversarial Network (DCGAN) with a Source Image Masking
module and an entropy-based confirmation strategy to enhance user control, dialogue efficiency,
and image quality in response to natural language editing requests.
 The dataset used in the paper is the Avatar Image Manipulation with an Instruction dataset, which
consists of 22 types of editing tasks, such as changing a beard, eyebrows, and hair. The dataset is
structured as triplets of {source image, target image, instruction (editing request)} and was split
into training, validation, and test sets in the ratio of 4:1:1, totaling 230 samples for validation and
testing each.
 It may struggle with ambiguous natural language requests, which can hinder the image editing
process and limit the effectiveness of the masking mechanism, potentially restricting the range
of changes that can be made to the images
22. Lin, J., Zhang, R., Ganz, F., Han, S., & Zhu, J. Y. (2021). Anycost gans for interactive image synthesis and
editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 14986-
14996).
 The primary objective of the paper is to propose "Anycost" GANs, which are designed for interactive
image synthesis and editing. The goal is to create a generator that can operate at various computational
costs while maintaining visually consistent outputs. This allows for quick previews during editing and
high-quality final outputs when needed.
 The paper does not provide quantitative results for latent space-based editing, which could limit the
understanding of the model's performance in practical applications.
 Spatially-Varying Trade-offs: There is a gap in the model's ability to support spatially-varying trade-
offs between fidelity and latency, which could enhance its adaptability to different editing scenarios.
23.Cui, X., Li, Z., Li, P., Hu, Y., Shi, H., & He, Z. (2023). Chatedit: Towards multi-turn interactive facial
image editing via dialogue. arXiv preprint arXiv:2303.11108.
 The primary objective is to develop a multi-turn interactive facial image editing system via
dialogue. It introduces the CHATEDIT benchmark dataset, which facilitates research in this field.
 The C HAT E DIT dataset comprises 12,000 examples, each consisting of a facial image, a
corresponding caption, and an annotated multi-turn dialogue. The dataset is divided into training
(10,000 examples), validation (1,000 examples), and testing sets (1,000 examples). It includes
approximately 96,174 utterances across these dialogues, with an average of 4 turns per dialogue and
8 utterances per dialogue. The dataset is constructed using images from the CelebA-HQ dataset,
focusing on 21 editable facial attributes.
24. Ivan Anokhin, Kirill V. Demochkin, Taras Khakhulin, Gleb Sterkin, Victor S. Lempitsky, and Denis
Korzhenkov. Image generators with conditionally-independent pixel synthesis. 2021 IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR), pages 14273–14282, 2020
 The paper presents INVE (Interactive Neural Video Editing), a real-time video editing solution that
propagates edits made on a single frame across the entire video, improving speed and editability
compared to previous methods like the Layered Neural Atlas (LNA)

 INVE speeds up training and inference, being 5 times faster than existing methods, and allows
users to make edits on one frame that automatically apply to the entire video. It simplifies video
editing for novices by reducing the need for frame-by-frame adjustments
 The paper identifies gaps in INVE's support for certain editing use cases, like direct frame editing
and rigid texture tracking, which LNA also struggled with. It also notes that despite improved
speed, the mapping process may still limit fully intuitive editing

25. Mirabet-Herranz, N. (2024). Advancing Beyond People Recognition in Facial Image Processing (Doctoral
dissertation, Sorbonne Université).
 The objective of the research is to develop a method for generating associative skeleton guidance maps
that facilitate human-object interaction in image editing. This involves creating an object-interactive
skeleton that can be synthesized naturally with a human figure interacting with an object.
 The framework demonstrates superior performance in image editing tasks compared to existing
models, as indicated by quantitative results on metrics such as FID, KID, and CS.

 The research identifies a critical limitation in existing image generation models, particularly their
inability to autonomously generate additional condition maps necessary for accurately rendering
human figures, which often requires manual input of supplementary details.
Table:
S. Title Year Objective Limitations Advantages Performa Gaps
No nce
Metrics
1. Ide-3d: 2022 The paper addresses The limitations It allows - The method relies on
Interactive the challenge faced include users to a single image to
disentangled by existing 3D- potential perform reconstruct a 3D facial
editing for aware facial distortions in interactive volume, which is
high- generation methods, facial shapes, global and inherently an ill-posed
resolution 3d- which often challenges in local editing problem. This can
aware portrait compromise between maintaining on facial lead to implausible
synthesis quality and consistency features facial geometry in
editability. The across while some cases, indicating
proposed approach different poses,maintaining a limitation in the
aims to provide both and the view method's ability to
high-resolution requirement consistency, accurately capture
outputs and flexible for standard enabling complex facial
editing capabilities front-facing adjustments structures from
images for to elements limited data
accurate like glasses,
editing. hair, and
expressions.
2 Sequential 2020 The paper introduces While It Zaq-Seq The model's potential
Attention Interactive Image SeqAttnGAN effectively 1.IS:9.58 difficulty in handling
GAN for Editing using performs well, enables 2.FID:50 complex
Interactive SeqAttnGAN, it may still interactive .31 modifications that
Image enabling users to struggle with image 3.SSIM: require a deeper
Editing modify images complex editing 0.651 understanding of
through multi-turn modifications through context and semantics,
commands while that require a multi-turn as well as the need for
maintaining deeper commands, further exploration of
contextual understanding ensuring how well
consistency and of context and high visual SeqAttnGAN
image quality semantics quality and generalizes to diverse
beyond the contextual image editing tasks
provided consistency beyond the fashion
textual in generated domain
commands images
3. Talk-to-Edit: 2021 The objective of this The paper does Its ability to 1.Bangs: Its reliance on user
Fine-Grained paper is to present not address the facilitate Is:06047 input, which may lead
Facial "Talk-to-Edit," an potential fine-grained As:0.366 to challenges in
Editing via interactive facial limitations in facial 0 accurately interpreting
Dialog editing system that handling attribute 2.Eyegla ambiguous or vague
enables users to highly manipulatio sses: language requests,
modify facial complex or n through Is:0.6229 potentially resulting in
attributes in images ambiguous natural As:0.772 unsatisfactory editing
through natural user requests, language 0 outcomes
language requests which may interactions, 3.Beard:
while preserving affect the allowing for Is:0.8324
identity and system's ability
a more As:0.689
enhancing editing to provide intuitive 1
realism. satisfactory and flexible 4.Smilin
edits in all editing g
scenarios. experience Is:0.6434
compared to As:0.502
traditional 8
methods
that require
fixed
control
patterns
4. MagicQuill: 2024 This paper is to MagicQuill Its ability to Over all Its current limitations
An Intelligent present and evaluate significantly enhance user in expanding editing
Interactive MagicQuill, an improves user satisfacti capabilities, such as
Image advanced image image editing experience on: 80% the lack of reference-
Editing editing system that efficiency and and based editing and
System utilizes diffusion precision, it precision in insufficient support
models and AI to still faces image for typography
enhance user limitations in editing manipulation within
experience and expanding through an images
precision in fine- editing intuitive
grained image capabilities, interface
manipulation. such as and real-
incorporating time
reference- prediction
based editing of user
and enhancing intentions
typography using a
support for multimodal
textual large
elements language
model.
5. EditGAN: 2021 The primary EditGAN, like EditGAN Measure Despite its
High- objective of the other GAN- requires d using a advantages, EditGAN
Precision paper is to propose based methods, significantly pretraine still faces challenges
Semantic EditGAN, a novel is limited to less d with certain complex
Image GAN-based image images that annotated ArcFace edits that require more
Editing. editing framework can be training feature extensive
that enables high- effectively data extractio optimization,
precision semantic modeled by the compared to n indicating a gap in
image editing. It GAN. This other network efficiency for specific
allows users to poses methods, to ensure use cases
modify detailed challenges needing as the
object part when applying few as 16 subject's
segmentations with it to complex labeled identity
minimal labeled scenes, such as examples. remains
examples, making it vivid intact
scalable for various cityscapes. after
object classes and editing.
part labels
6. InstructPix2P 2023 The primary The The model The The paper discusses
ix: Learning objective of the performance of allows for metrics the potential for
to Follow paper is to develop a the model is intuitive used in incorporating human
Image model that can limited by the image the paper feedback, such as
Editing perform image edits visual quality editing by to assess reinforcement
Instructions based on human- of the following the learning, to improve
written instructions generated diverse model alignment between the
without requiring dataset and the human include, model's outputs and
full descriptions of diffusion instructions, the human intentions,
the input or output model used, such as degree to indicating a gap in
images. The model which in this changing which current capabilities.
aims to generate case is Stable styles, the
edited images Diffusion replacing altered
directly in the objects, or image
forward pass, altering matches
enhancing the settings the
efficiency of image original
editing tasks image is
known as
image
consisten
cy
7. Pix2Video: 2023 The primary The key The The The paper
Video objective of the limitations approach is paper acknowledges that
Editing using paper "Pix2Video, identified in training- evaluates there is still room for
Image Video Editing using the paper is the free, the improvement in terms
Diffusion Image Diffusion" is challenge of allowing for performa of temporal coherency
to explore the maintaining generalizati nce of and suggests
feasibility of editing temporal on to a wide the exploring additional
video clips using a coherency, range of method energy terms, such as
pre-trained image especially edits using patch-based similarity
diffusion model when the without the metrics and CLIP similarity,
guided by text distance from need for such as during the latent
instructions, without the anchor video- CLIP update stage
requiring additional frame specific similarit
training. increases in fine-tuning y
longer videos, or extensive between
which can lead pre- image
to quality processing embeddi
degradation ng of
consecuti
ve
frames
(CLIP-
Image)
and the
mean-
squared
pixel
error
between
warped
frames
8. Imagen 2023 The primary The The Imagen This The paper identifies
Editor and objective of the performance of Editor is ranking- gaps in the model's
EditBench: paper is to present the Imagen preferred by based performance with
Advancing the Imagen Editor, a Editor drops human approach abstract attributes and
and model designed for significantly annotators measures complex prompts,
Evaluating text-guided image with complex over other how well suggesting that future
Text-Guided inpainting, which prompts models like the work should focus on
Image allows users to make (Mask-Rich), SD and generate improving these areas.
Inpainting localized edits to indicating that DL2, with d image
images based on while it excels preference retrieves
user-defined masks in simpler rates of the text
and text prompts. scenarios. 78% and prompt
77% . among
distractor
s
9. HyperStyle: 2022 The primary While HyperStyle Measure The paper notes
StyleGAN objective of the HyperStyle operates d using challenges in
Inversion paper is to present generalizes nearly 200 the comparing
with HyperStyle, a well, there is times faster Curricula editability
HyperNetwor method for image still a need for than rFace across different
ks for Real inversion that improvement StyleGAN2 method, inversion
Image achieves high- in handling optimizatio assessing methods due to
Editing quality unaligned n, making it how well varying editing
reconstructions and images and practical for the strengths, which
editability in latent unstructured real-time original could introduce
space while being domains. applications identity bias.
computationally Although it .It provides is Further research
efficient compared performs well, highly- preserve is needed to
to traditional there may still editable d during enhance
optimization be cases where latent codes, edits. robustness to
techniques. identity allowing for Evaluate diverse input
preservation is meaningful d conditions,
not perfect modificatio through particularly for
compared to ns while trait- images outside
some preserving specific the training
optimization identity. classifier domain
methods. HyperStyle s to
achieves determin
visually e the
comparable extent of
results to modifica
optimizatio tions
n supporte
techniques d by the
but with latent
significantly codes [8]
lower .
computation Compare
al costs d
qualitativ
ely and
quantitati
vely
against
state-of-
the-art
methods
using
metrics
like
LPIPS
and L2
loss

10. SDEDIT: 2021 The primary The SDEdit L2 The paper

GUIDED objective of the performance of outperforms Distance primarily
IMAGE paper is to introduce SDEdit can be traditional : Used focuses on
SYNTHESIS SDEdit, a framework affected by the GAN-based for specific datasets
AND for guided image quality of the models in measurin (e.g., LSUN and
EDITING synthesis and editing user-provided generating g CelebA-HQ)
WITH that balances realism guides, as realistic faithfuln and may not
STOCHASTI and faithfulness to lower-quality images that ess in fully address the
C user inputs, enabling inputs may maintain the image performance of
DIFFERENT the generation of lead to less semantics generatio SDEdit across a
IAL photo-realistic realistic of input n. broader range of
EQUATION images from various outputs [7]. stroke MTurk image types and
S levels of detail While SDEdit paintings, and KID editing tasks.
without the need for shows achieving Scores: While SDEdit
extensive data promise, its over 80% Employe shows
collection or model generalization better d for significant
retraining across diverse realism assessing improvements,
datasets and scores and realism further
tasks may still 75% higher and exploration is
require further overall overall needed to
validation satisfaction human understand its
scores satisfacti limitations in
compared to on real-world
baselines. through applications and
The pairwise its adaptability
framework comparis to various user
allows for ons inputs and styles
flexible User
image Satisfact
editing and ion
synthesis Scores:
without the Quantifie
need for d by
retraining combinin
models for g
each new faithfuln
task, ess and
addressing realism
the evaluatio
inefficiencie ns from
s of existing human
methods. assessors

11 Prompt-to- 2022 The primary goal of The current The The paper The authors
Prompt the paper is to inversion framework evaluates the acknowledg
Image develop a prompt-to- process can allows for performance e that the
Editing with prompt image lead to visible text-driven based on the challenge of
Cross editing framework distortions in editing, fidelity of inversion for
Attention that allows users to some test making it the generated text-guided
Control modify images using images, which easier for images to the diffusion
only textual prompts, affects the users to original models is an
without the need for quality of the express prompts and area for
spatial masks. This output. their intent the ability to future
aims to preserve the The attention without preserve the research,
original image's maps used in needing to original indicating a
structure and content the model are provide composition gap in the
while enabling of low detailed while current
intuitive editing. resolution, masks. making edits. understandin
limiting the The method The g and
precision of retains the effectiveness implementat
localized spatial of the ionThere is
editing layout and attention a suggestion
semantics injection to
of the across incorporate
original different cross-
image when diffusion attention in
making steps is also higher-
edits, which a key metric. resolution
is a layers to
significant improve
improveme localized
nt over editing,
traditional which
methods remains
that often unaddressed
lead to in the
complete current work
alterations

12 FocalClick: 2022 The primary goal of FocalClick FocalClick Number of Although

Towards FocalClick is to struggles with decomposes Clicks (NoC FocalClick
Practical develop a practical tiny structures, slow IOU): This improves
Interactive interactive image indicating that predictions metric efficiency,
Image segmentation its into two fast measures the there
Segmentation method that performance inferences average remains a
efficiently produces could be on small number of challenge in
fine masks with enhanced by crops, clicks maintaining
quick responses, utilizing more which required to accuracy,
particularly on low- finely significantly reach the especially
power devices. It annotated data enhances target when
addresses the gap or matting processing Intersection reducing
between academic datasets. speed and over Union input sizes
approaches and While it efficiency (IOU), with for faster
industrial needs by performs well compared to a default processing.
improving both on images previous upper limit Need for
efficiency and under 1080P, it methods of 20 clicks. Larger
accuracy in mask faces speed It Number of Datasets:
annotation bottlenecks introduces a Failures The
with 4K sub-task (NoF IOU): performance
images, called This metric gap
necessitating Interactive indicates the compared to
extensive Mask average SOTA
engineering Correction, number of methods
efforts to build allowing failures highlights
a practical users to when the the need for
annotation refine model does larger and
system preexisting not reach the more diverse
masks target IOU training
without within the datasets to
destroying specified fully
correct clicks leverage
parts, thus FocalClick's
improving capabilities
usability

13. Exploiting 2021 The primary The paper The Paper The paper The paper
Spatial objective of the presents presents evaluates has gaps
Dimensions paper titled significant several StyleMapGA like low
of Latent in "Exploiting Spatial advancements notable N metrics fidelity in
GAN for Dimensions of in real-time advantages like FID encoder
Real-time Latent in GAN for image editing that (image projections,
Image Real-time Image using GANs enhance the quality and limited
Editing Editing" is to but has capabilities diversity), exploration
enhance the limitations like of image MSE (pixel- of spatial
capabilities of dependency on editing level dimensions,
Generative segmentation using GANs accuracy), and narrow
Adversarial masks, Real-time MSE src/ref performanc
Networks (GANs) potential Image (local editingcomparison.
for real-time image artifacts, high Editing: accuracy). It also lacks
editing. The authors requirements. Improved These both real-world
introduce a novel Fidelity and pixel-level adaptability
approach called Accuracy and insights and
StyleMapGAN, High- perceptual a clear
which aims to Quality quality roadmap for
address several Output. assessment. integrating
limitations its methods
associated with into other
traditional GANs. architecture
s.
14. DIFFEDIT: 2022 The main goal of One limitation DIFFEDIT The The paper
DIFFUSION- DIFFEDIT is to noted is that leverages a performance identifies
BASED enable semantic the alignment diffusion of DIFFEDIT gap in
SEMANTIC image editing by between the model to is evaluated existing
IMAGE automatically text query and produce using metrics methods
EDIT- identifying regions the image more such as require
ING WITH of an image that caption is often natural and LPIPS user-
MASK need to be edited not perfect, subtle edits CSFID generated
GUIDANCE based on a text which can by demonstratin masks,
query, enhancing the affect the integrating g its DIFFEDIT
editing process quality of the the edited effectiveness addresses
without requiring edit regions into on datasets automaticall
user-generated the like y generating
masks background ImageNet masks, but
effectively, and COCO it still faces
outperformi challenges
ng previous in ensuring
methods the text
query aligns
well with
the image
content
15. Blended 2022 The paper aims to The model has The paper The paper Improving
Diffusion for introduce a novel high inference Gives an compares the ranking
Text-driven solution for time (30s per advantages proposed system to
Editing of performing local image) and a like method consider the
Natural edits in natural ranking system Intuitive against entire image
Images images using natural that overlooks Interface several context
language overall image means it baselines could
descriptions and quality. It also gives both enhance
region-of-interest inherits biases highly qualitatively results.
(ROI) masks. This from CLIP and intuitive and Future
is achieved by is unsuitable interface for quantitatively research
combining a (CLIP) for real-time or users, , could
with (DDPM) to low-power making it demonstratin extend the
generate realistic devices, easier to g its superior method to
image edits based on requiring faster specify performance 3D or
user prompts. diffusion desired in generating videos and
sampling. changes . realistic train CLIP
High images to be noise-
Realism: maintains agnostic for
The method background better
outperforms integrity robustness.
previous
solutions in
terms of
overall
realism

16 Semantic 2023 The paper aims to Despite The The paper The paper
Image develop a framework improvements, framework evaluates its highlights a
Synthesis via for semantic image the framework outperforms framework gap in
Diffusion synthesis that struggles with previous using FID for GAN-based
Models generates high-fidelity methods in image methods'
photorealistic images generation in generating quality, inability to
from semantic complex high- LPIPS for generate
layouts, addressing scenes like fidelity, diversity, and high-fidelity
the limitations of GANs. diverse mIOU and diverse
GAN-based methods Performance images, semantic results for
in handling complex metrics like state-of-the- interpretabilit complex
scenes. mIOU are art results y, ensuring a scenes,
resolution- benchmark comprehensi which the
sensitive, datasets. ve proposed
impacting improves performance framework
semantic image assessment. addresses
interpretabilityquality and by using
evaluation. balances the diffusion
trade-off models
between instead of
quality and adversarial
diversity. learning.
21 An 2020 The objective of this The potential Its SSIM It may
Interactive paper is to develop for ambiguity innovative With P- struggle
Image an interactive image in natural use of an value<0.001 with
Editing editing framework language entropy- ambiguous
System Using using a modified requests, based natural
an Deep Convolutional which can confirmatio language
Uncertainty- Generative complicate the n strategy requests,
Based Adversarial Network image editing within an which can
Confirmation (DCGAN) with a process, and interactive hinder the
Strategy Source Image the trade-off image image
Masking module and between image editing editing
an entropy-based quality and the framework, process and
confirmation constraints which limit the
strategy to enhance imposed by the enhances effectivenes
user control, masking user control s of the
dialogue efficiency, mechanism, and reduces masking
and image quality in which may redundant mechanism,
response to natural restrict the dialogues potentially
language editing extent of while restricting
requests. changes that maintaining the range of
can be made to high image changes that
the images quality in can be made
response to to the
natural images.
language
editing
requests
22 Anycost 2021 The primary Anycost GANs The control The paper The paper
GANs for objective of the can be over discusses the does not
Interactive paper is to propose executed at channel use of provide
Image "Anycost" GANs, different numbers reconstructio quantitative
Synthesis and which are designed computational may be n loss to results for
Editing for interactive image budgets, challenging evaluate the latent
synthesis and allowing users for users performance space-based
editing. The goal is to choose unfamiliar of the full editing,
to create a generator between with neural generator and which could
that can operate at quality and networks. its sub- limit the
various efficiency. Future generators. It understandi
computational costs This is improveme also mentions ng of the
while maintaining achieved by nts aim to measuring model's
visually consistent using subsets provide the average performanc
outputs. This allows of weights more reconstructio e in
for quick previews from the full intuitive n practical
during editing and generator controls for performance applications
high-quality final without users. across .
outputs when requiring fine- The current different Spatially-
needed. tuning. model architectures Varying
The model approximate found Trade-offs:
supports multi- s every through an There is a
resolution output pixel evolutionary gap in the
outputs and equally, algorithm. model's
adaptive- which may The LPIPS ability to
channel not loss is used support
inference, prioritize for spatially-
which important comparing varying
enhances its objects (like image trade-offs
usability faces) over quality, between
across less critical indicating a fidelity and
different background focus on latency,
hardware elements. perceptual which could
configurations This could similarity in enhance its
lead to outputs. adaptability
suboptimal to different
results in editing
certain scenarios.
scenarios.

23. CHATEDIT: 2023 The primary ` It Directly It measures The current

Towards objective is to modifies the the ability to system
Multi-turn develop a multi-turn original correctly struggles
Interactive interactive facial image track user with out-of-
Facial Image image editing system instead of requests, scope
Editing via via dialogue. It cascading accuracy attributes,
Dialogue. introduces the edits, Measures if requiring
CHATEDIT preventing all requested better
benchmark dataset, error attributes adaptability.
which facilitates accumulatio
research in this field. n and
attribute
forgetting.
24. INVE: 2021 The paper presents INVE faces INVE The paper The paper
Interactive INVE (Interactive challenges speeds up highlights identifies
Neural Video Neural Video with non-linear training and that INVE is gaps in
Editing Editing), a real-time mapping inference, 5 times faster INVE's
video editing between the being 5 than LNA, support for
solution that atlas and video times faster requiring certain
propagates edits pixels, leading than only 12,000 editing use
made on a single to less intuitive existing iterations for cases, like
frame across the editing and methods, training on direct frame
entire video, potential and allows 70-frame editing and
improving speed and artifacts. The users to videos at rigid texture
editability compared reliance on make edits 768×432 tracking,
to previous methods layered neural on one resolution, which LNA
like the Layered atlas frame that compared to also
Neural Atlas (LNA) representations automaticall LNA's struggled
may also cause y apply to 300,000 with. It also
issues with the entire iterations. notes that
drift and video. It despite
occlusions, simplifies improved
affecting edit video speed, the
quality. editing for mapping
novices by process may
reducing the still limit
need for fully
frame-by- intuitive
frame editing.
adjustments
.

RESULT:

A comprehensive comparison between our proposed method, Spatio-Temporal Unet-Editor, and

several state-of-the-art interactive image editing approaches across different editing signals, including
Sketch and CoarseEdit.

Method EditingSignal Training CLIP-FID(↓) LPIPS(↓) SSIM(↑)

Samples
MasaCtrl+ControlNet Sketch None 17.933 0.302 0.655

Spatio-Temporal Sketch 1600 6.345 0.223 0.937

UNet-Editor(ours)
MagicFixup CoarseEdit 2,500k 8.757 0.166 0.855

Spatio-Temporal CoarseEdit 1600 6.467 0.158 0.947

UNet-Editor(ours)
Table 1

A quantitative comparison of our proposed method, Spatio-Temporal UNet-Editor, against two

prominent baseline methods: MasaCtrl and MagicFixup. The comparison is based on three key
evaluation criteria that reflect the overall effectiveness and quality of the image editing process:

1. Visual Consistency (%): Measures how well the edited images maintain structural and spatial
consistency with the original content. Higher percentages indicate better preservation of the image
layout and coherence after editing.
2. Edit Accuracy (%): Evaluates how accurately the edits reflect the user-provided editing signals
(e.g., sketches or coarse edits). Higher scores demonstrate better alignment between the intended
and generated modifications.
3. Image Quality (%): Assesses the overall perceptual quality of the edited images, considering
aspects like sharpness, realism, and absence of artifacts. Higher percentages signify superior
visual fidelity and fewer distortions.

MethodComparison Visual Edit Accuracy Image Quality

Consistency
Spatio-Temporal 90.4% 94.25% 89.65%
UNet-Editor(ours)
vs MasaCtrl
Spatio-Temporal 89.3% 92.65% 86.75%
Unet-Editor(ours) vs
MagicFix

Table 2

This table presents a comparison of training and testing accuracy for three different image segmentation
methods: GAN-based, U-Net-based, and our proposed Spatio-Temporal UNet-Editor

Method Training Accuracy (%) Testing Accuracy (%)

GAN-based 83% 84%
U-Net-based (ours) 95% 92%

Table 3

CONCLUSION:

In this paper, we introduced Spatio-Temporal ViT-Editor, an advanced interactive image editing

framework that redefines traditional image editing tasks by leveraging video diffusion priors and UNET
architectures. Unlike conventional text-to-image diffusion approaches, our method frames image editing
as an image-to-video generation problem, enhancing spatial and temporal consistency, and enabling
precise, high-quality transformations with minimal data requirements.
By integrating a Sparse Control Encoder, CLIP Vision Model, and a Spatio-Temporal UNet, our
framework allows users to intuitively guide image manipulation using various input modalities such as
sketches, coarse edits, and point annotations. The matching attention mechanism ensures dense
correspondence between source and target images, preserving object identity and structural coherence
even in complex edits.

Through extensive experimentation and comparative evaluations, Spatio-Temporal UNet-Editor

demonstrates superior performance over existing methods, achieving higher visual consistency, edit
accuracy, and image quality metrics. Our method exhibits robust generalization, handling out-of-domain
edits with ease and significantly reducing the training data requirements compared to other state-of-the-
art approaches like MagicFixup and MasaCtrl. This research paves the way for scalable, efficient, and
user-friendly interactive image editing systems. Future work may explore extending the framework for
real-time video editing applications, multi-modal editing incorporating textual prompts, and further
optimization for low-resource hardware environments.

REFERNCES:

[1] Sun, J., Wang, X., Shi, Y., Wang, L., Wang, J., & Liu, Y. (2022). Ide-3d: Interactive disentangled
editing for high-resolution 3d-aware portrait synthesis. ACM Transactions on Graphics (ToG), 41(6), 1-
10.

[2] Cheng, Y., Gan, Z., Li, Y., Liu, J., & Gao, J. (2020, October). Sequential attention GAN for
interactive image editing. In Proceedings of the 28th ACM international conference on multimedia (pp.
4383-4391).

[3] Jiang, Y., Huang, Z., Pan, X., Loy, C. C., & Liu, Z. (2021). Talk-to-edit: Finegrained facial editing
via dialog. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 13799-
13808).

[4] Liu, Z., Yu, Y., Ouyang, H., Wang, Q., Cheng, K. L., Wang, W., ... & Shen, Y. (2024). Magicquill:
An intelligent interactive image editing system. arXiv preprint arXiv:2411.09703.

[5] Ling, H., Kreis, K., Li, D., Kim, S. W., Torralba, A., & Fidler, S. (2021). Editgan: High-precision
semantic image editing. Advances in Neural Information Processing Systems, 34, 16331-16345.

[6] Brooks, T., Holynski, A., & Efros, A. A. (2023). Instructpix2pix: Learning to follow image editing
instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
(pp. 18392-18402).

[7] Ceylan, D., Huang, C. H. P., & Mitra, N. J. (2023). Pix2video: Video editing using image diffusion.
In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 23206-23217).

[8] Wang, S., Saharia, C., Montgomery, C., Pont-Tuset, J., Noy, S., Pellegrini, S., ... & Chan, W.
(2023). Imagen editor and editbench: Advancing and evaluating text-guided image inpainting. In
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 18359-
18369).
[9] Alaluf, Y., Tov, O., Mokady, R., Gal, R., & Bermano, A. (2022). Hyperstyle: Stylegan inversion
with hypernetworks for real image editing. In Proceedings of the IEEE/CVF conference on computer
Vision and pattern recognition (pp. 18511-18521).

[10] Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J. Y., & Ermon, S. (2021). Sdedit: Guided image
synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073.

[11] Hertz, A., Mokady, R., Tenenbaum, J., Aberman, K., Pritch, Y., & Cohen-Or, D. (2022). Prompt-
to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626

[12] Chen, X., Zhao, Z., Zhang, Y., Duan, M., Qi, D., & Zhao, H. (2022). Focalclick: Towards practical
interactive image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (pp.1300)

[13] Hyunsu Kim, Yunjey Choi, Junho Kim, Sungjoo Yoo, Youngjung Uh; Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 852-861

[14] Omer Bar-Tal, Dolev Ofri-Amar, Rafail Fridman, Yoni Kasten, and Tali Dekel. Text2LIVE:
Textdriven layered image and video editing. arXiv preprint, arXiv:2204.02491, 2022.

[15] Omri Avrahami, Dani Lischinski, Ohad Fried; Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), 2022, pp. 18208-18218

[16] Jacob Austin, Daniel Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured
denoising diffusion models in discrete state-spaces. In NeurIPS, volume 34, 2021.

[17] Nguyen, T., Ojha, U., Li, Y., Liu, H., & Lee, Y. J. (2024). Edit One for All: Interactive Batch
Image Editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (pp. 8271-8280).

[18] Liang, Y., Gan, Y., Chen, M., Gutierrez, D., & Muñoz, A. (2019, October). Generic interactive
pixel‐level image editing. In Computer Graphics Forum (Vol. 38, No. 7, pp. 23-34).

[19] Shi, Y., Xue, C., Liew, J. H., Pan, J., Yan, H., Zhang, W., ... & Bai, S. (2024). Dragdiffusion:
Harnessing diffusion models for interactive point-based image editing. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (pp. 8839-8849).

[20] Shin, J., Choi, D., & Park, J. (2024, December). InstantDrag: Improving Interactivity in Drag-based
Image Editing. In SIGGRAPH Asia 2024 Conference Papers (pp. 1-10).

[21] Shinagawa, S., Yoshino, K., Alavi, S. H., Georgila, K., Traum, D., Sakti, S., & Nakamura, S.
(2020). An Interactive Image Editing System Using an Uncertainty-Based Confirmation Strategy. IEEE
Access, 8, 98471-98480.

[22] Lin, J., Zhang, R., Ganz, F., Han, S., & Zhu, J. Y. (2021). Anycost gans for interactive image
synthesis and editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (pp. 14986-14996).
[23] Cui, X., Li, Z., Li, P., Hu, Y., Shi, H., & He, Z. (2023). Chatedit: Towards multiturn interactive
facial image editing via dialogue. arXiv preprint arXiv:2303.11108.

[24] Ivan Anokhin, Kirill V. Demochkin, Taras Khakhulin, Gleb Sterkin, Victor S. Lempitsky, and
Denis Korzhenkov. Image generators with conditionally-independent pixel synthesis. 2021 IEEE/CVF
Conference on Computer Vision and PatterenRecognition (CVPR), pages 14273–14282, 2020

[25] Mirabet-Herranz, N. (2024). Advancing Beyond People Recognition in Facial Image

Processing (Doctoral dissertation, Sorbonne Université).

Guidelines To Prepare Project Report
No ratings yet
Guidelines To Prepare Project Report
4 pages
Rimt-Iet, Mandi Gobindgarh: Format For Preparation of Project Report
No ratings yet
Rimt-Iet, Mandi Gobindgarh: Format For Preparation of Project Report
13 pages
Project Report Guidelines
No ratings yet
Project Report Guidelines
6 pages
Guidelines To Write Project Report and Certificate
No ratings yet
Guidelines To Write Project Report and Certificate
4 pages
Project Report Format-YTC
No ratings yet
Project Report Format-YTC
13 pages
Project Report Format CS
No ratings yet
Project Report Format CS
5 pages
VTU Project Report Preparation Guidelines
No ratings yet
VTU Project Report Preparation Guidelines
6 pages
The Format For The Project Report: B.Tech (CSE/IT) - VIII Sem
No ratings yet
The Format For The Project Report: B.Tech (CSE/IT) - VIII Sem
11 pages
Project Format 4th Sem
No ratings yet
Project Format 4th Sem
10 pages
Project Format
No ratings yet
Project Format
11 pages
Major Project Report Format
No ratings yet
Major Project Report Format
8 pages
3rd, 5th & 7th Industrial Training Report Format
No ratings yet
3rd, 5th & 7th Industrial Training Report Format
13 pages
Vtu Projectreport Format
No ratings yet
Vtu Projectreport Format
9 pages
Department of Electronics and Communication Engineering: B.Tech Mini Project Guidelines & Report Format
No ratings yet
Department of Electronics and Communication Engineering: B.Tech Mini Project Guidelines & Report Format
12 pages
VTU UG Project Report Guide
No ratings yet
VTU UG Project Report Guide
3 pages
Project Report Template - GIST
No ratings yet
Project Report Template - GIST
7 pages
Final Year Mini Project Report Format
No ratings yet
Final Year Mini Project Report Format
18 pages
7th Sem Project Work Report Format
No ratings yet
7th Sem Project Work Report Format
9 pages
Project Report Format
No ratings yet
Project Report Format
13 pages
Project-Format Final
No ratings yet
Project-Format Final
11 pages
BCA Minor Project Report Format
No ratings yet
BCA Minor Project Report Format
9 pages
Article Index: Written by Administrator
No ratings yet
Article Index: Written by Administrator
5 pages
Project Report Preparation Guidelines For M.Tech Thesis
No ratings yet
Project Report Preparation Guidelines For M.Tech Thesis
9 pages
D Section Mini Project Guidelines 2019 2020
No ratings yet
D Section Mini Project Guidelines 2019 2020
9 pages
Full - Updated - Report - Format (2024-25)
No ratings yet
Full - Updated - Report - Format (2024-25)
41 pages
Guidelines For The Preparation of Be Report
No ratings yet
Guidelines For The Preparation of Be Report
5 pages
Guidelines For The Preparation of B
No ratings yet
Guidelines For The Preparation of B
11 pages
Project Report Guidelines
No ratings yet
Project Report Guidelines
15 pages
Aiml Report Format 2025-2026
No ratings yet
Aiml Report Format 2025-2026
8 pages
8th Sem Project Phase 2 Guidelines - Vtu - Academic Year 2020 - 21
No ratings yet
8th Sem Project Phase 2 Guidelines - Vtu - Academic Year 2020 - 21
7 pages
Project Report Format PDF
100% (1)
Project Report Format PDF
7 pages
Blackbook Sample For Engineering Projects
100% (3)
Blackbook Sample For Engineering Projects
26 pages
Project Report Guidelines
No ratings yet
Project Report Guidelines
7 pages
Project Thesis Guidelines for SVCE Indore
No ratings yet
Project Thesis Guidelines for SVCE Indore
11 pages
Major Project Report Guidelines
No ratings yet
Major Project Report Guidelines
27 pages
Guidelines For The Preparation of B.E./B. Tech. Project Reports
No ratings yet
Guidelines For The Preparation of B.E./B. Tech. Project Reports
6 pages
Format For Final Year Project Report
No ratings yet
Format For Final Year Project Report
8 pages
Major Project Report Format
No ratings yet
Major Project Report Format
27 pages
Mini Project Report Format 2022-23 (III Sem)
No ratings yet
Mini Project Report Format 2022-23 (III Sem)
13 pages
Guidelines For The Preparation of M
No ratings yet
Guidelines For The Preparation of M
6 pages
BCA Major Project Report Format
No ratings yet
BCA Major Project Report Format
9 pages
GUIDELINES FOR THE PREPARATION - Industry
No ratings yet
GUIDELINES FOR THE PREPARATION - Industry
6 pages
Wa0102.
No ratings yet
Wa0102.
7 pages
Project Details
No ratings yet
Project Details
5 pages
Format of Report
100% (1)
Format of Report
14 pages
Final Year PW-1 VTU Guidelines For The Preparation of The Project Reports
No ratings yet
Final Year PW-1 VTU Guidelines For The Preparation of The Project Reports
7 pages
Report
No ratings yet
Report
11 pages
Mini Project Guidelines Electronics Lab BEC151
No ratings yet
Mini Project Guidelines Electronics Lab BEC151
13 pages
Main Proj Format
No ratings yet
Main Proj Format
9 pages
SE CSE III MiniProject 1 ReportFormat 2023 24
No ratings yet
SE CSE III MiniProject 1 ReportFormat 2023 24
9 pages
M.tech Project Documentation Guidelines& Certificates
No ratings yet
M.tech Project Documentation Guidelines& Certificates
8 pages
New Project Report Format - 2022-23 Annexure
No ratings yet
New Project Report Format - 2022-23 Annexure
10 pages
2504-12 Mind The Trojan Horse Image Prompt Adapter Enabling Scalable and
No ratings yet
2504-12 Mind The Trojan Horse Image Prompt Adapter Enabling Scalable and
28 pages
ComfyUI Text To Image Workflow
No ratings yet
ComfyUI Text To Image Workflow
15 pages
EA-AI 2025 Programme
No ratings yet
EA-AI 2025 Programme
4 pages
Infinity: High-Res Image Synthesis
No ratings yet
Infinity: High-Res Image Synthesis
24 pages
3.DiffBoost Enhancing Medical Image
No ratings yet
3.DiffBoost Enhancing Medical Image
13 pages
Best AI Image Generator
100% (1)
Best AI Image Generator
12 pages
Revolutionizing Visuals: The Role of Generative AI in Modern Image Generation
No ratings yet
Revolutionizing Visuals: The Role of Generative AI in Modern Image Generation
22 pages
Mca Project
No ratings yet
Mca Project
13 pages
Got: Unleashing Reasoning Capability of Multimodal Large Language Model For Visual Generation and Editing
No ratings yet
Got: Unleashing Reasoning Capability of Multimodal Large Language Model For Visual Generation and Editing
20 pages
Janus Pro Tech Report
No ratings yet
Janus Pro Tech Report
13 pages
LLM-Grounded Diffusion for Image Generation
No ratings yet
LLM-Grounded Diffusion for Image Generation
29 pages
Ground Dino
No ratings yet
Ground Dino
14 pages
Wang Enhance Image Classification Via Inter-Class Image Mixup With Diffusion Model CVPR 2024 Paper
No ratings yet
Wang Enhance Image Classification Via Inter-Class Image Mixup With Diffusion Model CVPR 2024 Paper
11 pages
463 Linguistic Profiling of Deepfa
No ratings yet
463 Linguistic Profiling of Deepfa
12 pages
Admsci 15 00395
No ratings yet
Admsci 15 00395
26 pages
Skillcertpro GenAI 3
No ratings yet
Skillcertpro GenAI 3
66 pages
Beyond Finite Data: Towards Data-Free Out-Of-Distribution Generalization Via Extrapolation
No ratings yet
Beyond Finite Data: Towards Data-Free Out-Of-Distribution Generalization Via Extrapolation
24 pages
Generative AI System Design Resources
No ratings yet
Generative AI System Design Resources
5 pages
Intrinsix Diffusion
No ratings yet
Intrinsix Diffusion
11 pages
Movie Gen: Advanced Video Generation Models
No ratings yet
Movie Gen: Advanced Video Generation Models
96 pages
Videocrafter1: Open Diffusion Models For High-Quality Video Generation
No ratings yet
Videocrafter1: Open Diffusion Models For High-Quality Video Generation
12 pages
Editar: Unified Conditional Generation With Autoregressive Models
No ratings yet
Editar: Unified Conditional Generation With Autoregressive Models
22 pages
Capstone Project Report
No ratings yet
Capstone Project Report
26 pages
Gme: Improving Universal Multimodal Retrieval by Multi-Modal Llms
No ratings yet
Gme: Improving Universal Multimodal Retrieval by Multi-Modal Llms
32 pages
Lets Go Shopping: Webscale Image-Text Dataset For Visual Concept Understanding
No ratings yet
Lets Go Shopping: Webscale Image-Text Dataset For Visual Concept Understanding
20 pages
Video Diffusion Tutorial Prof Mike Shou NUS 2023 Dec 15
No ratings yet
Video Diffusion Tutorial Prof Mike Shou NUS 2023 Dec 15
274 pages
Make Pixels Dance - High-Dynamic Video Generation
No ratings yet
Make Pixels Dance - High-Dynamic Video Generation
11 pages
DragonDiffusion: Image Editing with Diffusion
No ratings yet
DragonDiffusion: Image Editing with Diffusion
10 pages
Research Paper Shailesh Tagadghar 31031523034
No ratings yet
Research Paper Shailesh Tagadghar 31031523034
16 pages
B.Tech Text2Video Project Report
No ratings yet
B.Tech Text2Video Project Report
49 pages

Project Report Format 2023

Uploaded by

Project Report Format 2023

Uploaded by

Guidelines for Preparation of Project Report

Ex: ABSTRACT, ACKNOWLEDGEMENT, LIST OF FIGURES, LIST OF

BACHELOR OF TECHNOLOGY (14 font)

Submitted by (12 font)

GMR Institute of Technology

GMR Nagar, Rajam – 532127,

Department of Computer Science and Engineering

Technology in Computer Science and Engineering of GMRIT, Rajam affiliated to

University or Institute for the award of any degree.

Signature of Supervisor Signature of HOD

The report is submitted for the viva-voce examination held on ………………..

It gives us an immense pleasure to express deep sense of gratitude to my guide

Name 1 Regd. no.

Orthogonal Frequency Division Multiplexing (OFDM) which is highly promising in terms of

to strike a trade-off among some parameters such as computational complexity, PAPR

reduction performance, BER performance and redundancy.

Sample page of contents:

1.1 Introductory paragraph

TABLE NO TITLE PAGE NO

1.1 Data hiding technique 12

FIGURE NO TITLE PAGE NO

1.1 Conceptual Diagram for Generic Multi-carrier 12

LIST OF SYMBOLS & ABBREVIATIONS (Alphabetic order)

S/P : Serial to Parallel Converter

10. SDEDIT: 2021 The primary The SDEdit L2 The paper

12 FocalClick: 2022 The primary goal of FocalClick FocalClick Number of Although

23. CHATEDIT: 2023 The primary ` It Directly It measures The current

A comprehensive comparison between our proposed method, Spatio-Temporal Unet-Editor, and

Method EditingSignal Training CLIP-FID(↓) LPIPS(↓) SSIM(↑)

Spatio-Temporal Sketch 1600 6.345 0.223 0.937

Spatio-Temporal CoarseEdit 1600 6.467 0.158 0.947

A quantitative comparison of our proposed method, Spatio-Temporal UNet-Editor, against two

MethodComparison Visual Edit Accuracy Image Quality

Method Training Accuracy (%) Testing Accuracy (%)

In this paper, we introduced Spatio-Temporal ViT-Editor, an advanced interactive image editing

Through extensive experimentation and comparative evaluations, Spatio-Temporal UNet-Editor

[25] Mirabet-Herranz, N. (2024). Advancing Beyond People Recognition in Facial Image

You might also like