Scene Text Editing papers

SRNet

style retention network (SRNet)
Editing Text in the Wild. SwapText and STEFANN build on this work.
Three modules
1. text conversion module : changes the text content of the source image into the target text while keeping the original text style
2. background inpainting module : erases the original text and fills the text region with appropriate texture
3. fusion module : combines the information from the two former modules, and generates the edited text images
paper : https://arxiv.org/pdf/1908.03047.pdf
code : https://github.com/endy-see/SRNet-1
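
The way the three modules hand data to each other can be sketched as below. This is only a toy illustration of the data flow, not the SRNet implementation: `EditingPipeline`, `toy_convert`, `toy_inpaint`, and `toy_fuse` are hypothetical names, images are small nested lists of grayscale values, and each stand-in function replaces what would be a trained network in the paper.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]


@dataclass
class EditingPipeline:
    """Hypothetical wiring of the three modules (not the authors' code)."""
    convert_text: Callable[[Image, Image], Image]   # (source, rendered target text) -> styled foreground
    inpaint_background: Callable[[Image], Image]    # source -> text-free background
    fuse: Callable[[Image, Image], Image]           # (foreground, background) -> edited image

    def edit(self, source: Image, target_text: Image) -> Image:
        foreground = self.convert_text(source, target_text)
        background = self.inpaint_background(source)
        return self.fuse(foreground, background)


def toy_convert(source: Image, target_text: Image) -> Image:
    # Assumption: the "style" is just the brightest value found in the source.
    style = max(max(row) for row in source)
    return [[style * px for px in row] for row in target_text]


def toy_inpaint(source: Image) -> Image:
    # Assumption: the background is the median pixel value of the source.
    fill = median_low([px for row in source for px in row])
    return [[fill for _ in row] for row in source]


def toy_fuse(foreground: Image, background: Image) -> Image:
    # Keep foreground pixels where text was drawn, background elsewhere.
    return [
        [f if f > 0 else b for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(foreground, background)
    ]


pipeline = EditingPipeline(toy_convert, toy_inpaint, toy_fuse)
source = [[0.2, 0.9, 0.2], [0.2, 0.9, 0.2]]        # bright text on a dark background
target_text = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]   # binary mask of the new text
edited = pipeline.edit(source, target_text)
```

The point of the split is that the foreground and background paths never see each other until fusion, which is also the shape of the three-stage SwapText design listed further down.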

STEFANN: character-level text editing in images
the unobserved target character is generated from the observed source character that is being modified
the generated character replaces the source character while maintaining geometric and visual consistency with neighboring characters
paper : STEFANN_CVPR_2020_paper.pdf
code: https://github.com/prasunroy/stefann

RewriteNet: extract content and style features from a text image by using scene text recognition
render the new text into the original image using those features
paper : https://arxiv.org/pdf/2107.11041.pdf
code : https://github.com/GOGOOOMA/AIFFEL_Hackathon (unofficial RewriteNet implementation; the official GitHub repository is currently not available)

Scene Text Replacement in Videos (STRIVE)
the text in all frames is normalized to a frontal pose using a spatio-temporal transformer network
the text is replaced in a single reference frame using a state-of-the-art still-image text replacement method
the new text is transferred from the reference to remaining frames using a novel learned image transformation network
paper : G_STRIVE_Scene_Text_Replacement_in_Videos_ICCV_2021_paper.pdf
github : https://github.com/striveiccv2021/STRIVE-ICCV2021
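
The three steps above amount to a frame-level pipeline, which the following sketch tries to make concrete. Everything here is an assumption for illustration: `replace_text_in_video` and its callable parameters are hypothetical names, a "frame" is reduced to a `(text, brightness)` tuple, and the lambdas stand in for the learned networks described in the paper.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[str, float]  # toy frame: (text content, brightness)


def replace_text_in_video(
    frames: Sequence[Frame],
    frontalize: Callable[[Frame], Frame],
    edit_reference: Callable[[Frame], Frame],
    transfer: Callable[[Frame, Frame], Frame],
    reference_index: int = 0,
) -> List[Frame]:
    # Step 1: normalize the text region of every frame to a frontal pose.
    frontal = [frontalize(frame) for frame in frames]
    # Step 2: replace the text in one reference frame only.
    edited_reference = edit_reference(frontal[reference_index])
    # Step 3: carry the new text over to every other frame.
    return [
        edited_reference if i == reference_index else transfer(edited_reference, frame)
        for i, frame in enumerate(frontal)
    ]


frames = [("OLD", 1.0), ("OLD", 0.8), ("OLD", 0.6)]
result = replace_text_in_video(
    frames,
    frontalize=lambda frame: frame,                  # stand-in: frames are already frontal
    edit_reference=lambda frame: ("NEW", frame[1]),  # stand-in for the still-image editor
    transfer=lambda ref, frame: (ref[0], frame[1]),  # new text, per-frame appearance kept
)
```

Editing a single reference frame and transferring the result keeps the replaced text identical across frames, while the toy `transfer` shows where per-frame appearance would be reapplied.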

MOdifying Scene Text image at strokE Level (MOSTEL)
generate stroke guidance maps to explicitly indicate regions to be edited
proposes a semi-supervised hybrid learning scheme that trains the network with both labeled synthetic images and unpaired real scene text images
paper : https://arxiv.org/pdf/2212.01982.pdf
code : https://github.com/qqqyd/MOSTEL
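
To make the idea of a stroke guidance map more concrete, here is a minimal sketch of mask-guided blending, one common way to restrict an edit to selected pixels. It is an assumption that this resembles how the map is used; the notes above do not say how MOSTEL applies it, and `blend_with_guidance` is a hypothetical helper operating on nested lists rather than tensors.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]


def blend_with_guidance(source: Image, edited: Image, guidance: Image) -> Image:
    """Take edited pixels where guidance is 1 and source pixels where it is 0.

    Values between 0 and 1 give a soft mix of the two images.
    """
    return [
        [g * e + (1.0 - g) * s for s, e, g in zip(s_row, e_row, g_row)]
        for s_row, e_row, g_row in zip(source, edited, guidance)
    ]


source = [[0.2, 0.8]]
edited = [[0.5, 0.1]]
guidance = [[0.0, 1.0]]  # only the second pixel belongs to a stroke
blended = blend_with_guidance(source, edited, guidance)
```

With a hard 0/1 map, pixels outside the strokes are copied from the source unchanged, so the network cannot damage background regions it was not asked to edit.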

SwapText. This is the research closest to the task I want to do, but the code has not been released.
a three-stage framework to transfer texts across scene images
1. first stage: a novel text swapping network to replace text labels only in the foreground image
2. second stage: a background completion network to reconstruct background images
3. third stage: a fusion network generates the word image from the foreground and background images
paper : swaptext.pdf
code : not released

transfer font styles between different languages by observing only a few samples
the network is designed in a multi-level attention form to capture both local and global features of the style images
paper : Li_Few-Shot_Font_Style_Transfer_Between_Different_Languages_WACV_2021_paper.pdf
code : https://github.com/ligoudaner377/font_translator_gan

mT5 : https://huggingface.co/docs/transformers/model_doc/mt5
mBart: https://huggingface.co/transformers/v3.5.1/model_doc/mbart.html
googletrans : free and unlimited Google Translate API for Python. Documentation
word2word : easy-to-use word translations by Kakao Brain. GitHub link
seq2seq : a general-purpose encoder-decoder framework for TensorFlow that can be used for machine translation. GitHub link

GaRNet: scene text removal using the Gated Attention and RoI Generation methods
Gated Attention : focus on the text stroke as well as the textures and colors of the surrounding regions to remove text from the input image much more precisely
RoI Generation : focus on only the region with text instead of the entire image to train the model more efficiently
paper : 4705_ECCV_2022_paper
code : https://github.com/naver/garnet
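
One way to read "focus on only the region with text" is to compute the training loss only inside the text region. The sketch below shows an L1 loss restricted to a region mask. This is an illustrative assumption, not GaRNet's actual loss, and `roi_l1_loss` is a hypothetical helper working on nested lists.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[int]]     # 1 inside the text region, 0 elsewhere


def roi_l1_loss(pred: Image, target: Image, roi_mask: Mask) -> float:
    """Mean absolute error over the pixels where roi_mask is 1."""
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, roi_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    # An empty region contributes no loss.
    return total / count if count else 0.0


pred = [[0.0, 1.0], [0.5, 0.5]]
target = [[1.0, 1.0], [0.5, 0.0]]
text_region = [[1, 0], [0, 1]]
whole_image = [[1, 1], [1, 1]]

loss_in_roi = roi_l1_loss(pred, target, text_region)
loss_everywhere = roi_l1_loss(pred, target, whole_image)
```

In this toy case the region-restricted loss is twice the whole-image loss, because the already correct background pixels no longer dilute the error.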