FSMTSAD (FSMTSA Dataset)

Overview

The FSMTSA (Fatih Sultan Mehmet Target Sentiment Analysis) dataset is a comprehensive resource designed for sentiment analysis studies in Turkish. It includes text samples from diverse sources such as hotel reviews, restaurant reviews, movie critiques, product evaluations from e-commerce platforms, and social media posts (tweets). Additionally, the dataset is enriched with text samples generated by Large Language Models (LLMs) to improve data diversity and coverage.

This dataset has been expanded through data augmentation so that positive, neutral, and negative sentiments are represented in roughly equal proportions, which helps reduce class-imbalance bias in models trained on it.

Data Summary

Total Samples: 15,853
Class Distribution:
- Positive: 5,284 (33.3%)
- Neutral: 5,206 (32.8%)
- Negative: 5,363 (33.8%)
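
The percentages above follow directly from the class counts. As a quick sanity check, the short sketch below recomputes them; the counts are copied from this README, while the variable and function names are only illustrative.

```python
# Class counts as reported in the Data Summary above.
counts = {"Positive": 5284, "Neutral": 5206, "Negative": 5363}


def class_shares(counts):
    """Return each class's share of the total, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


total = sum(counts.values())
shares = class_shares(counts)

print(f"Total samples: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```

Because each share is rounded to one decimal place, the three printed percentages need not sum to exactly 100.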

Data Sources

The data originates from a variety of real-world sources, including:

- Hotel, restaurant, and movie reviews
- E-commerce product reviews
- Social media posts (tweets)
- Texts generated by Large Language Models (LLMs)

For neutral samples, particular care was taken to select texts without explicit subjective judgments or with balanced opposing sentiments.

Data Augmentation Techniques

Back-Translation:
- Sentences were translated into English and then back into Turkish to introduce structural variations while preserving semantic meaning.
Synonym Replacement:
- Key words in the text were replaced with their synonyms using the WordNet lexical database to create contextually equivalent variations.
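
The two techniques above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the translation functions are passed in as parameters because this README does not say which translation tool was used, and `TOY_SYNONYMS` is a hypothetical hand-written table standing in for a WordNet lookup.

```python
from typing import Callable, Dict


def back_translate(text: str,
                   to_english: Callable[[str], str],
                   to_turkish: Callable[[str], str]) -> str:
    """Round-trip a Turkish sentence through English.

    The two translators are supplied by the caller (assumption: any
    Turkish->English and English->Turkish function will do here).
    """
    return to_turkish(to_english(text))


def replace_synonyms(text: str, synonyms: Dict[str, str]) -> str:
    """Swap every whitespace-separated token that has an entry in `synonyms`."""
    return " ".join(synonyms.get(token, token) for token in text.split())


# Hypothetical toy synonym table; a real run would query a lexical database.
TOY_SYNONYMS = {"güzel": "hoş", "hızlı": "çabuk"}

augmented = replace_synonyms("kokusu güzel ve kargo hızlı", TOY_SYNONYMS)
print(augmented)
```

The token-level lookup ignores Turkish suffixes, so inflected forms such as "güzeldi" would pass through unchanged; handling them would need a morphological analyzer, which is outside this sketch.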

During the augmentation process, duplicate entries were systematically removed to ensure data quality.
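
One simple way to remove duplicates is to keep the first occurrence of each text after light normalization. The sketch below assumes samples are `(text, label)` pairs and that two texts count as duplicates when they match after whitespace collapsing and case folding; this README does not state the exact matching rule used, so treat both as assumptions.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace and case-fold the text."""
    return " ".join(text.split()).casefold()


def drop_duplicates(samples):
    """Keep the first occurrence of each normalized text, preserving order."""
    seen = set()
    unique = []
    for text, label in samples:
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            unique.append((text, label))
    return unique


# Made-up example rows, not taken from the dataset.
samples = [
    ("Harika bir otel.", 1),
    ("harika  bir otel.", 1),  # same text, different case and spacing
    ("Fena değil.", 0),
]
unique = drop_duplicates(samples)
print(len(unique))
```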

Annotation Process

- The dataset was annotated manually by three independent annotators.
- In cases of disagreement, a majority vote was taken, with further validation by a supervisor.
- Disputed annotations were compared against outputs from at least three different LLMs for consistency.
Sentiment classes are encoded as follows:
- -1: Negative
- 0: Neutral
- 1: Positive
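
The majority vote over three annotators, together with the label encoding above, can be illustrated with a short sketch. Returning `None` when no label has a strict majority (for example, when all three annotators disagree) is an assumption made here so that such cases can be escalated; this README does not describe how a three-way split was resolved.

```python
from collections import Counter
from typing import Optional, Sequence

# Label encoding as listed above.
LABELS = {-1: "Negative", 0: "Neutral", 1: "Positive"}


def majority_label(votes: Sequence[int]) -> Optional[int]:
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


print(majority_label([1, 1, 0]))   # two of three annotators agree
print(majority_label([-1, 0, 1]))  # no majority: needs manual review
```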

Example Data Samples

Text	Source	Polarity
Akşam 9'da kapanma olacak ya sanırım İstanbul'un trafik yoğunluğunun %50'si şu an Yeniköy'de bu ne hal?	Tweet	-1 (Negative)
Vatandaşlar, oy kullanma hakkına sahiptirler, ulaşılabilirlik konusuna dikkat edilmektedir.	LLM-generated	0 (Neutral)
Kokusu güzel hafif, diğer yumuşatıcılar gibi ağır yoğun bir kokusu yok. Bahar gibi kokuyor, bahar aylarında tercih edilebilecek bir yumuşatıcı bence.	Product Review	1 (Positive)
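
Rows laid out like the samples above can be parsed with a few lines of Python. The tab-separated layout and the `-1 (Negative)` style of the polarity cell are taken from how the table appears in this README; the actual file format inside `FSMTSA_Dataset` may differ, so the helper names and parsing rules below are assumptions.

```python
import re


def parse_polarity(cell: str) -> int:
    """Extract the integer label from a cell such as '-1 (Negative)'."""
    match = re.match(r"\s*(-?\d+)", cell)
    if match is None:
        raise ValueError(f"no polarity value in {cell!r}")
    value = int(match.group(1))
    if value not in (-1, 0, 1):
        raise ValueError(f"unexpected polarity {value}")
    return value


def parse_row(line: str):
    """Split a 'text<TAB>source<TAB>polarity' line into typed fields."""
    # Split from the right so a stray tab inside the text does not break parsing.
    text, source, polarity = line.rstrip("\n").rsplit("\t", 2)
    return text, source, parse_polarity(polarity)


row = parse_row(
    "Vatandaşlar, oy kullanma hakkına sahiptirler, ulaşılabilirlik konusuna "
    "dikkat edilmektedir.\tLLM-generated\t0 (Neutral)"
)
print(row[1], row[2])
```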

Usage Guidelines

- This dataset is publicly available for academic and research purposes.
- Users must properly cite the original authors and reference the dataset in their publications.
- The dataset should not be used for commercial purposes without explicit permission.

Citation Guide

If you use the FSMTSA dataset in your research, please cite it as follows:

Zümberoğlu, K. B., Dik, S. Z., Karadeniz, B. S., & Sahmoud, S. (2025). Towards Better Sentiment Analysis in the Turkish Language: Dataset Improvements and Model Innovations. Applied Sciences, 15(4), 2062. https://doi.org/10.3390/app15042062

Contact

For questions or feedback, please contact [email protected].

Acknowledgements

We acknowledge Dr. Shaaban Sahmoud for his invaluable guidance and extend our thanks to Sümeyye Zülal Dik and Büşra Sinem Karadeniz for their dedicated efforts in the creation, annotation, and validation of the FSMTSA dataset.

