ImageEntity Extractor:
Automated Extraction of Product
Attributes from Images Using
Machine Learning
Samarth Shinde
18 September 2024
Table of Contents
1. Executive Summary
2. Introduction
3. Problem Statement
4. Objectives
5. Methodology
• Data Collection
• Data Preprocessing
• Model Architecture
• Training Process
• Evaluation Metrics
6. Results
7. Challenges and Solutions
8. Conclusion
9. Future Work
10. References
Executive Summary
The ImageEntity Extractor project aims to automate the
extraction of key product attributes from images, addressing
the challenge of limited textual descriptions in digital
marketplaces. By leveraging Optical Character Recognition
(OCR) and advanced machine learning models, the project
successfully extracted vital information such as weight,
volume, voltage, wattage, and dimensions from product
images. Achieving an F1 score of 0.289 and ranking 660th out of
1,980 participating teams in the hackathon, this project demonstrates
the potential of integrating computer vision and natural language
processing to enhance data accuracy and efficiency in e-commerce
platforms.
Introduction
In the rapidly expanding digital marketplace, accurate and detailed
product information is paramount for consumer trust and informed
decision-making. However, many products lack comprehensive
textual descriptions, relying solely on images. Extracting key attributes
directly from images can bridge this information gap, enhancing
product listings and improving user experience. This project explores
the application of machine learning techniques to automate the
extraction of such attributes, providing a scalable solution for large e-
commerce platforms.
Problem Statement
Digital marketplaces often face the issue of incomplete or insufficient
product descriptions, relying heavily on images that may not convey
all necessary details. Essential attributes like weight, volume, voltage,
wattage, and dimensions are critical for consumers to make informed
purchases but are frequently absent in textual form. Manually
annotating these attributes is time-consuming and prone to errors.
Therefore, there is a need for an automated system that can
accurately extract these entity values from product images.
Objectives
• Automate Extraction: Develop a machine learning model capable
of extracting specific product attributes from images.
• Enhance Data Accuracy: Improve the accuracy and consistency of
product information in digital marketplaces.
• Scalability: Create a scalable solution that can handle large
volumes of images with varying qualities and formats.
• Efficiency: Reduce the time and resources required for manual
data annotation.
Methodology
Data Collection
The dataset comprised product images along with corresponding CSV
files containing index, image_link, group_id, entity_name, and
entity_value. The training dataset included labeled entity values, while
the test dataset provided images without these labels for prediction.
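As a rough illustration, the training CSV can be inspected with pandas before any image processing. The file path below is an assumption; the report does not state the dataset's directory layout.

import pandas as pd

# Hypothetical path; the actual CSV name and location come from the hackathon dataset.
train_df = pd.read_csv("dataset/train.csv")

# Columns described above: index, image_link, group_id, entity_name, entity_value
print(train_df[["image_link", "group_id", "entity_name", "entity_value"]].head())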
Data Preprocessing
1. Image Downloading: Utilized the download_images.py script to
download images from provided URLs, organizing them into
images/train and images/test directories.
2. OCR Processing: Employed Tesseract OCR via the pytesseract
library to extract textual information from images.
3. Data Cleaning: Processed the OCR outputs to remove noise and
irrelevant text, ensuring only pertinent data was used for training.
4. Feature Engineering: Combined extracted text with entity names
to create input features for the model (see the sketch after this list).
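A minimal sketch of steps 2–4 above, assuming Tesseract is installed locally. The helper names, the cleaning regex, and the input template are illustrative choices, not the project's exact code.

import re
from PIL import Image
import pytesseract

def extract_text(image_path):
    # Step 2: run Tesseract OCR on a downloaded product image.
    return pytesseract.image_to_string(Image.open(image_path))

def clean_text(raw):
    # Step 3: keep alphanumeric tokens, units, and light punctuation; collapse whitespace.
    text = re.sub(r"[^A-Za-z0-9.,%/ ]+", " ", raw)
    return re.sub(r"\s+", " ", text).strip()

def build_input(image_path, entity_name):
    # Step 4: pair the cleaned OCR text with the entity name to form a model input.
    return f"entity: {entity_name} text: {clean_text(extract_text(image_path))}"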
Model Architecture
Implemented a Transformer-based model using Hugging Face’s T5
architecture. The choice of T5 was due to its versatility in handling
text-to-text tasks, making it suitable for mapping extracted text to
specific entity values.
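A minimal sketch of loading T5 with Hugging Face Transformers is shown below. The t5-small checkpoint and the prompt format are assumptions, since the report does not specify the model size or input template.

from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "t5-small"  # assumed checkpoint size

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

# Text-to-text framing: OCR-derived input in, entity value string out.
inputs = tokenizer("entity: item_weight text: Net Wt 500 g", return_tensors="pt")
outputs = model.generate(**inputs, max_length=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))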
Training Process
1. Tokenization: Used T5Tokenizer to tokenize input and target
texts, ensuring uniform input sizes.
2. Dataset Preparation: Created a custom PyTorch Dataset class to
handle input-output pairs, facilitating efficient data loading (see the sketch after this list).
3. Training Loop: Trained the model using the MPS backend on a
MacBook M1 GPU, optimizing with the AdamW optimizer and a linear
learning rate scheduler.
4. Validation: Monitored performance using the F1 score to evaluate
precision and recall of the model’s predictions.
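The sketch below ties the four steps together under stated assumptions: train_inputs and train_targets are lists of strings produced by preprocessing, the tokenizer and model are the T5 objects loaded above, and the batch size, learning rate, and epoch count are illustrative rather than the project's tuned values.

import torch
from torch.utils.data import Dataset, DataLoader
from transformers import get_linear_schedule_with_warmup

class EntityDataset(Dataset):
    # Step 2: wrap tokenized input-output pairs for efficient loading.
    def __init__(self, inputs, targets, tokenizer, max_len=128):
        self.inputs, self.targets = inputs, targets
        self.tokenizer, self.max_len = tokenizer, max_len

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        enc = self.tokenizer(self.inputs[idx], max_length=self.max_len,
                             padding="max_length", truncation=True, return_tensors="pt")
        tgt = self.tokenizer(self.targets[idx], max_length=16,
                             padding="max_length", truncation=True, return_tensors="pt")
        labels = tgt.input_ids.squeeze(0)
        labels[labels == self.tokenizer.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": labels}

# Step 3: use the MPS backend when available (MacBook M1), else fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)

loader = DataLoader(EntityDataset(train_inputs, train_targets, tokenizer),
                    batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=len(loader) * 3)

model.train()
for epoch in range(3):  # epoch count is an assumption
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # T5 computes cross-entropy from the labels
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()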
Evaluation Metrics
The primary metric for evaluation was the F1 score, balancing
precision and recall to provide a comprehensive measure of the
model’s accuracy in extracting entity values.
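For reference, a simple exact-match formulation of the F1 score is sketched below; the hackathon's official scorer may normalise units or handle empty predictions differently, so this is an assumed approximation rather than the exact metric.

def entity_f1(predictions, ground_truth):
    # A prediction counts as correct only if it matches the labelled value exactly.
    correct = sum(1 for p, g in zip(predictions, ground_truth) if p and p == g)
    predicted = sum(1 for p in predictions if p)   # non-empty predictions
    labelled = sum(1 for g in ground_truth if g)   # examples with a label
    precision = correct / predicted if predicted else 0.0
    recall = correct / labelled if labelled else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)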
Results
The ImageEntity Extractor achieved an F1 score of 0.289,
securing the 660th rank out of 1,980 participating teams in the
hackathon. While there is room for improvement, this result
demonstrates the feasibility of using machine learning for
automated entity extraction from images. The model effectively
identified and extracted key attributes, though further
enhancements in data preprocessing and model architecture
could yield higher accuracy.
Challenges and Solutions
• Data Quality: Variations in image quality and text readability posed
significant challenges. To mitigate this, extensive data cleaning and
augmentation techniques were employed to enhance OCR
accuracy.
• Model Performance: Achieving a higher F1 score required fine-
tuning the Transformer model and experimenting with different
architectures. Future iterations may explore larger models or
ensemble techniques.
• Resource Constraints: Training on a MacBook M1 limited
computational resources. Optimizing code and leveraging efficient
libraries helped maximize performance within these constraints.
Conclusion
The ImageEntity Extractor successfully demonstrated the potential
of machine learning in automating the extraction of product attributes
from images. Despite achieving a moderate F1 score, the project laid
a strong foundation for further enhancements. By addressing data
quality issues and optimizing model architectures, future work can
significantly improve accuracy, making this solution highly valuable for
e-commerce platforms seeking to enrich product information
efficiently.
Future Work
• Model Optimization: Explore more advanced Transformer
architectures or incorporate pre-trained models specialized in OCR
tasks.
• Data Augmentation: Implement more sophisticated data
augmentation techniques to enhance model robustness against
varying image qualities.
• Multi-Attribute Extraction: Extend the model to handle multi-
attribute extraction simultaneously, improving efficiency and
scalability.
• Deployment: Develop a deployment pipeline to integrate the model
into live e-commerce platforms, enabling real-time attribute
extraction.
References
• Vaswani, A., et al. (2017). “Attention Is All You Need.” Advances in Neural Information Processing Systems.
• Tesseract OCR Documentation. Retrieved from https://github.com/tesseract-ocr/tesseract
• Hugging Face Transformers. Retrieved from https://huggingface.co/transformers/
• PyTorch Documentation. Retrieved from https://pytorch.org/docs/stable/index.html