SEMU-Net: A Segmentation-based Corrector for Fabrication Process Variations of Nanophotonics with Microscopic Images

Rambod Azimi1*    Yijian Kong1*
[email protected]    [email protected]

Dusan Gostimirovic1    James J. Clark1
[email protected]    [email protected]

Odile Liboiron-Ladouceur1†
[email protected]

1 Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada

Abstract

Integrated silicon photonic devices, which manipulate light to transmit and process information on a silicon-on-insulator chip, are highly sensitive to structural variations. Minor deviations during nanofabrication—the precise process of building structures at the nanometer scale—such as over- or under-etching, corner rounding, and unintended defects, can significantly impact performance. To address these challenges, we introduce SEMU-Net, a comprehensive set of methods that automatically segments scanning electron microscope (SEM) images and uses them to train two deep neural network models based on U-Net and its variants. The predictor model anticipates fabrication-induced variations, while the corrector model adjusts the design to address these issues, ensuring that the final fabricated structures closely align with the intended specifications. Experimental results show that the segmentation U-Net reaches an average IoU score of 99.30%, while the corrector attention U-Net in a tandem architecture achieves an average IoU score of 98.67%.

1. Introduction

Integrated silicon photonic devices operate using light (photons) [8] rather than electrons to perform various functions through a silicon waveguide on a silicon-on-insulator (SOI) die. This approach offers significant advantages over traditional electronic devices [2]. Photonic devices can achieve lower latency, reduced heat generation, and minimal energy loss, making them ideal for high-bandwidth data transmission and processing tasks [11, 17]. This efficiency is crucial given the growing computational demands of AI and machine learning systems [23], which often require significant resources for managing and processing complex models [4, 6].

Integrated silicon photonic devices often underperform experimentally due to fabrication-induced structural variations [20]. Nanometer-scale deviations, such as over-etching, under-etching, and corner rounding, can significantly degrade device performance, efficiency, and functionality [11]. Research indicates that these variations tend to follow consistent patterns, which may allow them to be learned and mitigated [11].

With advancements in AI and particularly convolutional neural networks (CNNs) [19], researchers have achieved remarkable improvements in computer vision tasks, including image [9, 25] and video recognition [30]. These sophisticated architectures have enabled the development of models that can automatically learn and extract features from visual data, enhancing accuracy, efficiency, and robustness in tasks such as object detection [24] and semantic segmentation [21].

In this paper, we introduce SEMU-Net (see Figure 1), a comprehensive set of methods designed to automate the segmentation of scanning electron microscope (SEM) images of integrated silicon photonic devices into binary classifications of silicon (core waveguide confining light) and silica (cladding material around the core). These segmented images, along with their corresponding design files in the Graphic Data System (GDS) format—a standard file format used in the photonics industry for representing the physical layout of devices—are then used to train two CNN models: the predictor and the corrector.

The segmentation model is designed to automatically segment SEM images of photonic devices. It takes SEM images as input and uses manually segmented images as labels, leveraging a traditional U-Net [21] architecture to achieve accurate segmentation of previously unseen SEM images.
Figure 1. Overview of the SEMU-Net framework for improving the fabrication of integrated silicon photonic devices. (i) The segmentation model converts SEM images into segmented SEM images. (ii) The corrector model uses the binarized SEM images along with their corresponding GDS design files to train itself, generating corrected GDS layouts that compensate for fabrication-induced variations. (iii) The predictor model uses the corrected GDS files to predict the final fabricated structures, enabling pre-fabrication validation and further refinement.

Figure 2 presents a segmented SEM image of a fabricated structure, along with its original SEM image, the corresponding GDS design file, and the difference between the SEM of the fabricated structure and the intended design in the GDS layout.

The predictor model learns to map the actual GDS design file (the intended designed structure) to the SEM image (the fabricated device), thereby identifying structural variations that occur during fabrication and enabling the prediction of potential issues before fabrication begins. This predictive capability can significantly reduce both cost and time. Conversely, the corrector model uses an inverse design approach [22] to map SEM images to GDS design files, enabling the adjustment of design files to better match the desired output. By employing this reverse approach, the corrector model generates design files with deliberate exaggerations in areas identified by the predictor model as prone to fabrication changes. As a result, the corrected design files more accurately align with the intended design after fabrication.

SEMU-Net integrates three models—segmentation, predictor, and corrector—into a unified framework, streamlining the entire process within a cohesive system.
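The predict-then-correct interplay described above can be illustrated with a toy numpy sketch, in which over-etching is mimicked by a one-pixel erosion and the corrector by a one-pixel outward bias (dilation); these stand-ins are illustrative assumptions, not the paper's trained networks:

```python
import numpy as np

def erode(mask):
    # toy "fabrication": over-etching shrinks features by one pixel (4-neighborhood)
    m = mask.astype(bool)
    out = m.copy()
    out[1:, :] &= m[:-1, :]; out[:-1, :] &= m[1:, :]
    out[:, 1:] &= m[:, :-1]; out[:, :-1] &= m[:, 1:]
    return out

def dilate(mask):
    # toy "corrector": pre-bias the design outward by one pixel
    m = mask.astype(bool)
    out = m.copy()
    out[1:, :] |= m[:-1, :]; out[:-1, :] |= m[1:, :]
    out[:, 1:] |= m[:, :-1]; out[:, :-1] |= m[:, 1:]
    return out

def iou(a, b):
    # Intersection-over-Union between two binary masks
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

design = np.zeros((16, 16), dtype=bool)
design[4:12, 4:12] = True                     # intended 8x8 silicon feature

fabricated = erode(design)                    # uncorrected: the feature shrinks
fabricated_corrected = erode(dilate(design))  # corrected design after "fabrication"

print(iou(fabricated, design))            # 0.5625
print(iou(fabricated_corrected, design))  # 1.0
```

In this toy model the corrector exactly cancels the fabrication shrinkage; the real models learn spatially varying biases (e.g., for corner rounding) rather than a uniform one-pixel offset.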
Figure 2. A sample from the ANT NanoSOI dataset: (a) an SEM image of a photonics structure with silicon (gray) and silica (black), (b) its segmented SEM image, (c) its corresponding GDS design file, and (d) the difference between the segmented SEM image and the GDS design file.

For the model architecture, we employ U-Net [21], a widely used encoder-decoder-based CNN, to train our models using the ANT NanoSOI dataset, provided by Applied Nanotools Inc.1 The U-Net architecture is well-suited for this application due to its effectiveness in segmenting complex structures and handling the intricate details present in the photonic circuit images. In addition to the standard U-Net, we explore various U-Net variants, including attention U-Net [18], residual attention U-Net [16], and U-Net++ [28], to determine the optimal performance while maintaining relatively low computational costs.

To enhance the results further, we employ a tandem architecture by stacking the predictor and corrector models, freezing the weights of the predictor, and updating the corrector to learn the identity mapping from GDS to GDS. We assess the model's effectiveness using the Intersection-over-Union (IoU) score. The experimental results reveal that the segmentation U-Net achieves an average IoU score of 99.30%, while the corrector model attains an average IoU score of 98.67%, evaluated on a custom benchmark consisting of various distinct shapes, including gratings, stars, crosses, and circles. The segmentation model is tested using the traditional U-Net with a binary cross-entropy (BCE) loss function, while the corrector model is tested using the attention U-Net in the tandem configuration, with a loss function that combines BCE and a weighted 0.5 dice loss.

2. Related Work

2.1. Segmentation of SEM Images

Segmenting SEM images presents unique challenges due to the high level of noise, varying contrast, and intricate textures present in these images. Traditional image processing techniques, such as thresholding [1] or edge detection [5], often fall short of accurately segmenting these images.

In recent years, deep learning models like U-Net have been applied to image segmentation tasks with promising results. For instance, the U-Net architecture, originally introduced by [21], has become a standard in the field of image segmentation, particularly for biomedical images. Its encoder-decoder structure allows it to capture context while maintaining spatial resolution, making it effective for segmenting images with complex and fine details. The U-Net model's strength lies in its ability to work with limited datasets and its flexibility to adapt to various segmentation tasks, which has led to its adoption in different domains, including SEM image analysis.

Various enhancements to the U-Net architecture have been proposed to improve its performance in specific tasks. For example, [26] introduced dilated convolutions to increase the receptive field without losing resolution, which is useful in detecting fine structures in high-resolution images. Other variants, such as the attention U-Net [18], have incorporated attention mechanisms to focus on the most relevant features, which could be beneficial in handling the high noise levels often present in SEM images.

While there has been significant progress in using deep learning for SEM image segmentation, existing works have primarily concentrated on applications outside the photonics domain or on general image segmentation tasks. Our work, therefore, applies a machine learning model specifically for segmenting SEM images of nanophotonics. This novel application addresses the unique requirements of photonic integrated structures, which involve specific structural features and high-precision segmentation not covered by prior deep learning methods.

2.2. Image Translations for Photonic Devices

Photonics devices, which rely heavily on precise geometrical configurations, can undergo shape changes due to fabrication imperfections from inherent process variations. Predicting these shape changes is crucial for maintaining device performance and reliability. Traditional methods for predicting shape changes in photonics devices include finite element analysis and other simulation-based approaches in which features are explicitly biased to mimic over- and under-etching [15]. These methods do not capture, however, the process variations leading to corner rounding and curvature changes. In contrast, SEM images, which can serve as training data, are more readily available, providing an alternative pathway for predicting shape changes that captures a greater range of fabrication process variations.

There has been considerable research in using machine learning techniques for image-to-image translation, with applications spanning various domains. Notable works in this area include the development of generative adversarial networks (GANs) for tasks such as translating sketches to photos [12], converting day images to night images [29], and performing style transfer [10]. These studies highlight the potential of machine learning to transform images across domains while preserving core content.

When it comes to photonics, the application of machine learning for predicting and correcting structural changes is relatively underexplored. For example, the recent work presented in [11] laid the groundwork for integrating machine learning methods towards better performing photonic integrated devices. However, there remains significant room for further exploration and enhancement, particularly in improving prediction accuracy and generalizability to a broader range of photonic structures.

Our approach addresses this gap by integrating SEM image segmentation with a predictive model that forecasts shape changes in photonic devices, along with a corrector model that refines the structure to maintain alignment with the original image after structural variations. This integration not only enhances the accuracy of shape predictions but also streamlines the overall process, making it more applicable to real-world scenarios.

2.3. Summary of Gaps and Contributions

In summary, while there has been significant progress in the image segmentation domain and the application of machine learning in photonics, key gaps remain.

1 https://www.appliednt.com/nanosoi-fabrication-service
Most notably, the lack of integrated tools for segmentation and dynamic shape prediction in photonic integrated devices presents a challenge for practical applications. In addition, existing methods for predicting structural changes in photonic devices are computationally demanding, limiting their real-time use. Our work addresses these gaps by developing SEMU-Net, which integrates segmentation with predictive and corrective modeling to address shape changes in photonics devices.

3. Methodology

In this section, we introduce SEMU-Net, a toolkit for photonic design correction based on SEM images. As shown in Figure 1, our model comprises three key components: the segmentation model for automatic label generation, the predictor model for forecasting structural variations during fabrication, and the corrector model for design rectification. The predictor model specifically evaluates the corrected version of a device image by applying predictions to the corrected design and comparing the resulting output with the original design file. This approach ensures that the corrections are accurately aligned with the intended design specifications.

These three modules are trained and run separately. We first explain our model structure and its corresponding principles, then introduce our dataset processing strategies, and finally, the training techniques.

3.1. The Segmentation Model

We use the original U-Net model, introduced by [21], which is widely regarded for its effectiveness in image segmentation by capturing both fine details and broader contextual information. Our U-Net architecture, illustrated in Figure 3, is structured into three main components: the encoder, bottleneck, and decoder paths. The encoder, or contracting path, uses a series of 3×3 convolutional layers with ReLU activations followed by 2×2 max-pooling layers to capture increasingly abstract features. At the network's base, the bottleneck layer further processes these features and links the encoder to the decoder. The decoder, or expansive path, uses upsampling and convolutional layers to reconstruct the segmentation map, with skip connections that concatenate feature maps from the encoder to preserve spatial details.

3.2. The Predictor and Corrector Models

Predictor: For predicting structural variations in photonic devices, we employ the same U-Net model as used in the segmentation U-Net, retaining the identical encoder and decoder paths. The only difference lies in the mapping images. In the segmentation U-Net, the model learns to map the original SEM image to its binary truth label. However, the predictor model learns to map the actual GDS design file to its corresponding segmented SEM image.

Corrector: Instead of mapping GDS to SEM images, the corrector model performs the reverse of the predictor model by learning to map SEM images back to GDS files. This enables the corrector to understand how to transition from a fabricated design to the original design, allowing it to adjust new design files so that the final fabricated result closely matches the intended design. For the correction of photonic devices, we employ the attention U-Net, first introduced by [18], which enhances the standard U-Net architecture with attention mechanisms to focus on relevant features.

Attention Gate: The attention block refines feature maps through the following steps: it first applies a 1×1 convolution to reduce dimensions and then combines this with another 1×1 convolution of the gating input. The resulting features are processed with a ReLU activation and another 1×1 convolution to produce an attention map. This map, after being upsampled, is used to weight the original feature maps, enhancing important regions.

3.3. Dataset Processing

Our dataset pre-processing involves several key techniques to enhance the performance of the U-Net model.

Data Augmentation: To increase the robustness and generalizability of the model, we apply data augmentation techniques to the training images. The augmentation process is performed three times on each image with a 50% probability for each transformation. The augmentations applied include horizontal flip, rotation, shift, scale, Gaussian noise, brightness change, and contrast adjustment.

Image Patching: After applying data augmentation, the images are sliced into smaller patches to facilitate model training. Each image is divided into patches of size 256×256 pixels. This patching process helps manage memory usage and allows the model to learn from localized regions of the images.

Dataset Shuffling: By randomly reordering the data samples before each training epoch, dataset shuffling helps prevent the model from learning unintended patterns based on the sequence of the data. In our approach, we employ dataset shuffling to reduce the risk of overfitting and enhance the model's generalization ability.

3.4. Training Techniques

Tandem Architecture: The tandem architecture is designed to refine the model by updating the weights during the correction stage, ensuring more accurate and reliable corrections. It sequentially integrates the correction and prediction processes to align the final output with the intended design. This architecture consists of two key components: the tandem corrector and the tandem predictor, stacked together.
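The encoder, bottleneck, and decoder paths of Section 3.1 can be sketched at the shape level as follows; this is a two-stage toy with random weights (a real U-Net has more stages, learned weights, and a sigmoid output head), meant only to show the data flow and skip connections:

```python
import numpy as np

rng = np.random.default_rng(4)

def conv3x3(x, c_out):
    # 3x3 "same" convolution with random illustrative weights, followed by ReLU
    c_in, h, w = x.shape
    k = rng.standard_normal((c_out, c_in, 3, 3)) * 0.1
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, w))
    for di in range(3):
        for dj in range(3):
            out += np.einsum('oc,chw->ohw', k[:, :, di, dj], xp[:, di:di + h, dj:dj + w])
    return np.maximum(out, 0.0)

def maxpool2x2(x):
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet(img):
    e1 = conv3x3(img, 8)                                   # encoder stage 1
    e2 = conv3x3(maxpool2x2(e1), 16)                       # encoder stage 2
    b = conv3x3(maxpool2x2(e2), 32)                        # bottleneck
    d2 = conv3x3(np.concatenate([upsample2x(b), e2]), 16)  # decoder + skip connection
    d1 = conv3x3(np.concatenate([upsample2x(d2), e1]), 8)  # decoder + skip connection
    return conv3x3(d1, 1)                                  # 1-channel segmentation map

seg = tiny_unet(rng.standard_normal((1, 32, 32)))
print(seg.shape)  # (1, 32, 32): output resolution matches the input
```

The skip connections concatenate encoder features onto the upsampled decoder features along the channel axis, which is what lets the network recover fine spatial detail lost in pooling.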
Figure 3. Overview of the segmentation, predictor, and corrector models, all based on the U-Net architecture. The corrector model distinguishes itself by incorporating attention gates (labeled AG) in the decoder path, whereas the segmentation and predictor models do not utilize attention gates. The segmentation model takes SEM images as input and generates segmented labels, while the corrector model takes GDS designs as input and outputs corrected designs.
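The attention gate (AG) blocks in the corrector's decoder path can be sketched as below. This numpy sketch uses random weights and nearest-neighbor upsampling; the sigmoid on the attention map follows the attention U-Net paper [18] and is our assumption, since the text does not name the final activation:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in); a 1x1 convolution is a per-pixel channel mix
    return np.einsum('oc,chw->ohw', w, x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(skip, gate, c_int=8):
    # skip: encoder feature map (C, H, W); gate: decoder gating signal (C_g, H/2, W/2)
    c_s = skip.shape[0]
    c_g = gate.shape[0]
    w_x = rng.standard_normal((c_int, c_s)) * 0.1    # 1x1 conv on the skip features
    w_g = rng.standard_normal((c_int, c_g)) * 0.1    # 1x1 conv on the gating input
    w_psi = rng.standard_normal((1, c_int)) * 0.1    # 1x1 conv producing the map
    theta = conv1x1(skip[:, ::2, ::2], w_x)          # bring skip to gate resolution
    phi = conv1x1(gate, w_g)
    alpha = sigmoid(conv1x1(np.maximum(theta + phi, 0.0), w_psi))  # ReLU, then 1x1 conv
    alpha_up = alpha.repeat(2, axis=1).repeat(2, axis=2)           # upsample the map
    return skip * alpha_up                           # weight the original feature maps

skip = rng.standard_normal((16, 32, 32))
gate = rng.standard_normal((32, 16, 16))
out = attention_gate(skip, gate)
print(out.shape)  # (16, 32, 32)
```

Because the attention map lies in (0, 1), the gate can only suppress features, never amplify them, which is how irrelevant encoder activations are filtered out before the skip concatenation.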

The tandem corrector refines the device structure, generating a corrected design file, while the tandem predictor evaluates this corrected file, simulating its post-fabrication appearance. A loss function is then calculated by comparing the predicted output to the original SEM image, with the goal of minimizing this loss to ensure the corrected output closely matches the SEM image. The tandem predictor model's weights are frozen to maintain consistency, while the corrector model's weights are updated at each iteration to improve the correction process.

Early Stopping: Early stopping is a technique used during model training to prevent overfitting and improve efficiency [3]. We use this technique by monitoring the validation loss and halting training when no significant improvement is observed after a certain number of iterations.

Learning Rate Scheduler: The learning rate scheduler is a method used to adjust the learning rate during training to improve model convergence [7]. By gradually reducing the learning rate over time, the model can make finer adjustments as it approaches an optimal solution, helping to avoid overshooting minima in the loss function. We employ this technique to enhance training efficiency.

Combined Loss Function: The combination of BCE and 0.5 dice loss is used as a composite loss function to improve the correction performance [13]. BCE focuses on pixel-wise classification, penalizing incorrect predictions for each pixel, making it effective for distinguishing between foreground and background. Dice loss, on the other hand, measures the overlap between the predicted and ground truth masks, emphasizing the overall shape accuracy. By combining BCE with 0.5 dice loss, the model benefits from both fine-grained pixel accuracy and robust shape matching, leading to more precise segmentation results.

Hyperparameter Tuning: Hyperparameter tuning is crucial in optimizing model performance, involving adjusting parameters that control the learning process, such as learning rate, batch size, and network architecture [27]. In our approach, we conduct a thorough search to optimize the hyperparameter set for all three models in our study. This process ensures that each model is configured with proper parameters to enhance overall performance and accuracy.

4. Experiments

In this section, we evaluate each component of our proposed SEMU-Net framework. We present quantitative results for both the segmentation and correction models, comparing their performance using IoU as the metric. To rigorously assess the corrector model, we develop a custom benchmark comprising several hundred structure images featuring various shapes, including stars, gratings, circles, holes, and more (see Figure 4).

4.1. Experiment Setup

Dataset: For our dataset, we utilize Applied Nanotools Inc., a reputable integrated photonics foundry, for nanofabrication and SEM imaging to acquire high-quality data. Due to the high costs and lengthy timelines associated with nanofabrication, our training structures are designed to acquire a large, varied dataset with minimal chip space and imaging time.
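The combined loss used for the predictor and corrector models (BCE plus a 0.5-weighted dice loss, Section 3.4) can be written directly; a numpy sketch, with the smoothing constant `eps` as our own assumption:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    # pixel-wise binary cross-entropy; pred holds probabilities in (0, 1)
    p = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def dice_loss(pred, target, eps=1e-7):
    # 1 - Dice coefficient, measuring overlap between prediction and ground truth
    inter = np.sum(pred * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def combined_loss(pred, target):
    # BCE plus Dice weighted by 0.5, as described for the predictor/corrector training
    return bce_loss(pred, target) + 0.5 * dice_loss(pred, target)

target = np.zeros((8, 8)); target[2:6, 2:6] = 1.0
perfect = target.copy()
uniform = np.full((8, 8), 0.5)
print(combined_loss(perfect, target) < combined_loss(uniform, target))  # True
```

BCE penalizes every pixel independently, while the dice term rewards overall mask overlap, so the sum trades off local accuracy against global shape fidelity.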
Figure 4. Performance comparison between the attention U-Net in tandem configuration and the original U-Net across three sample images. Each sample presents (i) the original design and its predicted structure post-fabrication, highlighting corner rounding and structural deviations that could impact device performance, (ii) the correction and prediction of correction for the tandem attention U-Net, and (iii) the correction and prediction of correction for the original U-Net. The tandem attention U-Net outperforms the original U-Net, demonstrating better structural fidelity and achieving a higher IoU score.

The generated patterns are fabricated on a 220 nm thick silicon-on-insulator (SOI) platform using electron-beam lithography through a silicon photonic multi-project wafer service provided by Applied Nanotools Inc. After lithography and etching, SEM images with a resolution of 1 nm/pixel are taken of each pattern. The GDS and SEM images are then segmented with the help of our segmentation model, cropped, aligned, and prepared for training of the predictor and corrector models.

Finally, we augment the binarized SEM and GDS images to artificially increase their size and use these enhanced images to train our predictor and corrector models.

Computational Resources: The training and testing of our models are conducted using one GeForce RTX 4090 GPU, which provides the necessary computational power to efficiently handle the large datasets and complex operations involved in our SEMU-Net framework. This GPU allows for faster training times and enables us to experiment with various model configurations, ensuring robust and accurate results.

Evaluation Metrics: To evaluate the performance of our models, we use IoU as the primary metric. IoU is widely used in segmentation tasks, providing a measure of the overlap between the predicted segmentation and the ground truth. In our experiments, we calculate the IoU over per-pixel classifications, determining whether each pixel has been correctly categorized into silicon or silicon dioxide. A higher IoU score indicates better model performance, with an IoU of 1.0 representing perfect segmentation.

Training Configurations: The segmentation model is trained using image slices of size 256×256 over 50 epochs, with a batch size of 32 to ensure robust evaluation. The model uses BCE as the loss function and a learning rate of 0.0001. On the other hand, the predictor and corrector models are trained with a larger slice size of 2048×2048 over 20 epochs, employing early stopping after 3 epochs if no improvement in validation loss is observed. A batch size of 2 and a validation split of 20% are utilized, with a learning rate of 0.0004 and the AdamW optimizer to ensure efficient training. Additionally, data augmentation is applied to enhance dataset diversity. These models are trained using a combination of BCE and 0.5 dice loss. For the tandem corrector model, the original training configurations are maintained, but the number of epochs is extended from 20 to 60 for more comprehensive learning.

4.2. The Segmentation Model

In this section, we compare the performance of our segmentation U-Net model with two baseline models: the segment anything model (SAM) [14] fine-tuned on our dataset, and a traditional threshold-based model. The goal is to evaluate the effectiveness of each model in segmenting SEM images of photonics devices.
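The pre-processing pipeline of Section 3.3, three augmentation passes per image followed by 256×256 patching, can be sketched as follows; the augmentation set here (flip, 90-degree rotation, noise, brightness) is a simplified stand-in for the full list used in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def augment_once(img):
    # Each transform fires with 50% probability, mirroring the paper's scheme;
    # shift, scale, and contrast transforms from the full set are omitted here.
    out = img.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)                                  # horizontal flip
    if rng.random() < 0.5:
        out = np.rot90(out, k=int(rng.integers(1, 4)))        # 90/180/270 rotation
    if rng.random() < 0.5:
        out = out + rng.normal(0.0, 0.02, out.shape)          # Gaussian noise
    if rng.random() < 0.5:
        out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness change
    return out

def patchify(img, size=256):
    # slice into non-overlapping size x size patches; edge remainders are dropped
    h, w = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

img = rng.random((512, 512))
augmented = [augment_once(img) for _ in range(3)]   # three augmented copies per image
patches = [p for a in augmented for p in patchify(a)]
print(len(patches), patches[0].shape)  # 12 (256, 256)
```

Patching keeps GPU memory bounded at train time; the same slicing would be applied identically to the label masks so that image/label pairs stay aligned.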
uate the effectiveness of each model in segmenting SEM 4.3. The Predictor and Corrector Models
images of photonics devices.
Baseline Models: (1) The attention U-Net extends the In this section, we evaluate the performance of our cor-
traditional U-Net architecture by incorporating attention rector model using four different architectures: U-Net [21],
mechanisms into the skip connections between the encoder attention U-Net [18], residual attention U-Net [16], and U-
and decoder. These attention gates focus on relevant fea- Net++ [28]. Additionally, we incorporate the tandem ar-
tures from the encoder while suppressing irrelevant infor- chitecture into each model and assess the performance of
mation. The attention U-Net improves segmentation ac- all configurations. The aim is to assess the effectiveness of
curacy by dynamically highlighting important regions in each model in correcting GDS images of photonics devices
the feature maps, which helps in handling more complex and to identify the most effective configuration for optimal
and varied structures. (2) SAM is a large-scale image seg- performance.
mentation model developed by Meta is based on vision
Results: The results of the comparison between the
transformer (ViT) architecture and is pre-trained on a vast
U-Net, attention U-Net, residual attention U-Net, and U-
dataset. For our experiments, we chose the heavy model
Net++ and their tandem configuration are summarized in
ViT-H, which is then fine-tuned on our dataset to adapt its
Table 2. The table presents the IoU scores for each model,
general segmentation capabilities to the specific character-
providing a clear view of their correction performance by
istics of SEM images. This fine-tuning involves updating
the average IoU, minimum IoU and the median of Iou,
the model parameters to improve performance on our par-
which reveal that the tandem attention U-Net model outper-
ticular task. (3) The traditional threshold-based model is
forms all other models, achieving the highest IoU score. To
implemented following the workflow shown in Figure 5.
ensure consistency in the presence of random dataset order-
This simple method is used as a baseline to evaluate the
ing and weight initializations, each experiment is run five
improvements achieved by more advanced models like U-
times, and the median of the results is used.
Net and SAM. Despite its simplicity, this model provides a
useful comparison point for assessing the effectiveness of Processing images of size 2048×2048 requires a sub-
more sophisticated approaches. stantial amount of GPU memory, which can be challenging
Results: The results of the comparison between the U- to manage. To address this constraint, we reduce the filter
Net, SAM, and the traditional threshold-based model are sizes in our models, which, in turn, decreases the number
summarized in Table 1. The table shows the IoU scores of parameters and allows us to train the models efficiently
for each model, providing their segmentation performance within the limits of our available GPU resources.
by the average IoU, maximum IoU and minimum of IoU,
The results indicate that while the base models show ro-
which reveals that the U-Net model outperforms all other
bust performance, the tandem versions, particularly the at-
models, achieving the highest IoU score. The threshold-
tention U-Net (tandem), outperform others with the highest
based method comes in second and finally the SAM model. Two example SEM images are used to visualize the segmentation results from different models in Figure 6.

Despite its state-of-the-art status and extensive pre-training on a large dataset, SAM's performance in this context is noticeably lower. SAM struggles to achieve optimal results because it is unfamiliar with the specific task of segmenting SEM images. Its generalization capabilities are limited when applied to such a highly specialized dataset, indicating that fine-tuning SAM alone is insufficient for achieving top performance in this domain.

The U-Net's superior performance can be attributed to its straightforward architecture and its effectiveness on complex segmentation tasks. Its symmetric encoder-decoder structure with skip connections allows it to capture both high-level contextual information and fine-grained details. This high accuracy demonstrates the model's robustness and suitability for SEM image segmentation, where precise delineation of features is crucial.

The attention U-Net in tandem configuration achieves the best correction results, with an average IoU of 98.67%, a minimum IoU of 88.86%, and a median IoU of 99.51%. These metrics reflect a notable enhancement in accuracy and consistency when using the tandem architecture. Following closely are the U-Net (tandem) and residual attention U-Net (tandem) configurations. The U-Net (tandem) performs commendably, achieving an average IoU of 97.87%, a minimum IoU of 71.39%, and a median IoU of 98.90%, indicating effective correction capabilities. The residual attention U-Net (tandem) also demonstrates strong performance, with an average IoU of 96.74%, a minimum IoU of 65.91%, and a median IoU of 98.64%.

These results highlight the effectiveness of the tandem architecture in improving accuracy and consistency across different configurations. Figure 4 illustrates the performance comparison between the attention U-Net in tandem configuration (our best performer) and the original U-Net across three sample images from our custom dataset.
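All of the comparisons above rely on the Intersection-over-Union score. A minimal sketch of the metric and of the summary statistics reported in the tables follows; `iou` and `summarize` are illustrative helpers, not the paper's evaluation code:

```python
from statistics import mean, median

def iou(pred, truth):
    """Intersection-over-Union between two flat binary masks (0/1 ints)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0

def summarize(pairs):
    """Average, minimum, and median IoU over (prediction, ground truth)
    pairs, the three statistics reported for each model configuration."""
    scores = [iou(p, t) for p, t in pairs]
    return mean(scores), min(scores), median(scores)
```

Reporting the minimum alongside the average exposes worst-case failures (e.g. the 6.12% minimum of the plain U-Net corrector) that the average alone would hide.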
[Figure 5 diagram: SEM image → pre-processing (pad, crop, denoise, equalize) → Canny edge detection and Otsu's thresholding → post-processing (dilate, erode) → segmentation mask.]

Figure 5. Workflow of the threshold-based segmentation model. A combination of Canny and Otsu's methods is used to detect the contours and further segment the pre-processed SEM images.
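The pipeline in Figure 5 hinges on Otsu's global threshold. A minimal pure-Python sketch of Otsu's method is given below; it is illustrative only, since the paper's implementation also combines Canny edges and morphological clean-up:

```python
def otsu_threshold(pixels):
    """Otsu's method: pick the gray level (0-255) that maximizes the
    between-class variance of the histogram, splitting background from
    foreground without a hand-tuned threshold."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0       # pixels at or below the candidate threshold
    sum0 = 0.0   # their summed intensity
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

A pixel is then labeled foreground when its intensity exceeds the returned threshold; dilation followed by erosion closes small gaps in the resulting mask, as in the post-processing stage of Figure 5.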

Model # Parameters (M) Average IoU (%) Max IoU (%) Min IoU (%)
U-Net 7.94 99.30 99.71 98.35
SAM (ViT-H) 636 88.35 96.90 43.48
Threshold Model NA 96.54 98.59 86.16

Table 1. Comparison of the segmentation model performance on the custom benchmark. To ensure consistency, each experiment is run
five times due to random dataset ordering and weight initializations, and the median result is taken.
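The evaluation protocol stated in the caption can be expressed in a few lines; the run scores below are hypothetical values for illustration:

```python
from statistics import median

def median_of_runs(run_results):
    """Protocol from Tables 1 and 2: each experiment is repeated five
    times (dataset ordering and weight initialization differ per run)
    and the median score is reported to damp run-to-run noise."""
    if len(run_results) != 5:
        raise ValueError("the protocol specifies five runs")
    return median(run_results)

# Hypothetical IoU scores from five runs of one configuration
runs = [99.28, 99.31, 99.30, 99.25, 99.33]
```

The median is preferred over the mean here because a single unlucky initialization would otherwise drag down the reported score.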

Model # Parameters (M) Average IoU (%) Min IoU (%) Median IoU (%)
U-Net 1.94 92.38 6.12 97.01
Attention U-Net 2.01 93.55 72.15 97.62
Residual Attention U-Net 2.64 93.64 46.73 96.42
U-Net++ 0.56 92.63 66.64 95.19
U-Net (tandem) 3.88 97.87 71.39 98.90
Attention U-Net (tandem) 4.02 98.67 88.86 99.51
Residual Attention U-Net (tandem) 5.28 96.74 65.91 98.64
U-Net++ (tandem) 1.13 93.98 58.55 96.58

Table 2. Comparison of the corrector model performance on the custom benchmark. To ensure consistency, each experiment is run five
times due to random dataset ordering and weight initializations, and the median result is taken.
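The tandem configurations in Table 2 can be read, consistent with the predictor/corrector roles described earlier, as training the corrector through a frozen fabrication predictor: the corrector pre-compensates a target layout, the predictor estimates the fabricated outcome, and the loss measures how far that outcome drifts from the target. A minimal sketch of one such forward pass (all names here are illustrative, not the paper's code):

```python
def tandem_forward(design, corrector, predictor):
    """One tandem pass: the corrector pre-compensates the target design,
    the frozen predictor models fabrication distortion, and the loss is
    the squared mismatch between the prediction and the original target."""
    corrected = corrector(design)
    fabricated = predictor(corrected)  # predictor weights stay frozen
    loss = sum((f - d) ** 2 for f, d in zip(fabricated, design))
    return corrected, loss
```

If fabrication systematically erodes features (say, subtracting a fixed bias), a corrector that adds the same bias drives this loss toward zero, which is exactly the pre-compensation behavior the tandem training encourages.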

5. Conclusion

Integrated silicon photonic devices exhibit significant performance degradation due to structural variations during fabrication, such as over- or under-etching, corner rounding, and unintended defects. To address these challenges, we propose SEMU-Net, a comprehensive approach for addressing the issues associated with scanning electron microscope (SEM) image analysis and subsequent correction.

SEMU-Net combines advanced segmentation and correction techniques using two deep neural network models based on U-Net. The segmentation model, built on the original U-Net architecture, achieves an average Intersection-over-Union (IoU) score of 99.30% for identifying critical features in SEM images. Complementing this, the corrector attention U-Net model works in tandem to address fabrication variations, applying inverse design to correct discrepancies and produce designs that closely match intended specifications. The tandem corrector attention U-Net achieves an IoU score of 98.67%, effectively improving device accuracy by addressing structural defects.

Our approach marks a significant advancement in nanophotonic fabrication, offering a robust solution to improve device performance, yield, and reliability. Future research will focus on refining the SEMU-Net framework and expanding its application to more complex and varied fabrication processes used in nanophotonics.

References

[1] Salem Saleh Al-amri, N. V. Kalyankar, and Khamitkar S. D. Image segmentation by using threshold techniques, 2010. 3
[2] Kirubel Amsalu and Sivaprakasam Palani. A review on photonics and its applications. Materials Today: Proceedings, 33:3372–3377, 2020. International Conference on Nanotechnology: Ideas, Innovation and Industries. 1
[3] Yingbin Bai, Erkun Yang, Bo Han, Yanhua Yang, Jiatong Li, Yinian Mao, Gang Niu, and Tongliang Liu. Understanding and improving early stopping for learning with noisy labels, 2021. 5
[4] Tom B. Brown, Benjamin Mann, et al. Language models are few-shot learners, 2020. 1
[5] John Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679–698, 1986. 3
[6] Aakanksha Chowdhery, Sharan Narang, et al. PaLM: Scaling language modeling with pathways, 2022. 1
[7] Aaron Defazio, Ashok Cutkosky, Harsh Mehta, and Konstantin Mishchenko. When, why and how much? Adaptive learning rate scheduling by refinement, 2023. 5
[Figure 6 panels: for each of two sample SEM images, the input image and ground truth are shown alongside the predictions of the U-Net, SAM, and threshold models.]

Figure 6. Performance comparison between the U-Net, SAM, and threshold models for two sample SEM images; the segmentation results of each model are compared with the corresponding ground truth.

[8] Po Dong, Young-Kai Chen, Guang-Hua Duan, and David Neilson. Silicon photonic devices and integrated circuits. Nanophotonics, 3, 2014. 1
[9] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021. 1
[10] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2414–2423, 2016. 3
[11] Dusan Gostimirovic, Dan-Xia Xu, Odile Liboiron-Ladouceur, and Yuri Grinberg. Deep learning-based prediction of fabrication-process-induced structural variations in nanophotonic devices. ACS Photonics, 9(8):2623–2633, 2022. 1, 3
[12] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks, 2018. 3
[13] Shruti Jadon. A survey of loss functions for semantic segmentation. In 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pages 1–7. IEEE, Oct 2020. 5
[14] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything, 2023. 6
[15] Zeqin Lu, Jaspreet Jhoja, Jackson Klein, Xu Wang, Amy Liu, Jonas Flueckiger, James Pond, and Lukas Chrostowski. Performance prediction for silicon photonics integrated circuits with layout-dependent correlated manufacturing variability. Optics Express, 25(9):9712–9733, 2017. 3
[16] Zhen-Liang Ni, Gui-Bin Bian, Xiao-Hu Zhou, Zeng-Guang Hou, Xiao-Liang Xie, Chen Wang, Yan-Jie Zhou, Rui-Qi Li, and Zhen Li. RAUNet: Residual attention U-Net for semantic segmentation of cataract surgical instruments, 2019. 2, 7
[17] Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, Zhixing Jiang, Zhoufeng Ying, Jason Midkiff, Sourabh Jain, May H. Hlaing, David Z. Pan, and Ray T. Chen. Photonic-electronic integrated circuits for high-performance computing and AI accelerators, 2024. 1
[18] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y. Hammerla, Bernhard Kainz, Ben Glocker, and Daniel Rueckert. Attention U-Net: Learning where to look for the pancreas, 2018. 2, 3, 4, 7
[19] Keiron O'Shea and Ryan Nash. An introduction to convolutional neural networks, 2015. 1
[20] Alexander Y. Piggott, Eric Y. Ma, Logan Su, Geun Ho Ahn, Neil V. Sapra, Dries Vercruysse, Andrew M. Netherton, Akhilesh S. P. Khope, John E. Bowers, and Jelena Vučković. Inverse-designed photonics for semiconductor foundries. ACS Photonics, 7(3):569–575, 2020. 1
[21] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation, 2015. 1, 2, 3, 4, 7
[22] Martin F. Schubert, Alfred K. C. Cheung, Ian A. D. Williamson, Aleksandra Spyra, and David H. Alexander. Inverse design of photonic devices with strict foundry fabrication constraints. ACS Photonics, 9(7):2327–2336, 2022. 2
[23] Shawn Yohanes Siew, Bo Li, Feng Gao, H. Y. Zheng, Weipeng Zhang, Pengfei Guo, S. W. Xie, A. Song, Bo Dong, L. W. Luo, Chao Li, Xianshu Luo, and GuoQiang Lo. Review of silicon photonics technology and platform development. Journal of Lightwave Technology, 39(13):4374–4389, 2021. 1
[24] Farhana Sultana, Abu Sufian, and Paramartha Dutta. A Review of Object Detection Models Based on Convolutional Neural Network, pages 1–16. Springer Singapore, 2020. 1
[25] Ren Wu, Shengen Yan, Yi Shan, Qingqing Dang, and Gang Sun. Deep image: Scaling up image recognition, 2015. 1
[26] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions, 2016. 3
[27] Tong Yu and Hong Zhu. Hyper-parameter optimization: A review of algorithms and applications, 2020. 5
[28] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. UNet++: A nested U-Net architecture for medical image segmentation, 2018. 2, 7
[29] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks, 2020. 3
[30] Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, and Mu Li. A comprehensive study of deep video action recognition, 2020. 1
