The FSMTSA (Fatih Sultan Mehmet Target Sentiment Analysis) dataset is a comprehensive resource designed for sentiment analysis studies in Turkish. It includes text samples from diverse sources such as hotel reviews, restaurant reviews, movie critiques, product evaluations from e-commerce platforms, and social media posts (tweets). Additionally, the dataset is enriched with text samples generated by Large Language Models (LLMs) to improve data diversity and coverage.
This dataset has been expanded through data augmentation so that positive, neutral, and negative sentiments are represented in nearly equal proportions, which helps reduce class-imbalance bias when training sentiment models.
- Total Samples: 15,853
- Class Distribution:
  - Positive: 5,284 (33.3%)
  - Neutral: 5,206 (32.8%)
  - Negative: 5,363 (33.8%)
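The class shares above can be re-derived from the raw counts. The snippet below is a minimal sanity check that uses only the figures listed in this section; the variable names are illustrative and not part of the dataset.

```python
# Class counts as reported in this section.
counts = {"Positive": 5284, "Neutral": 5206, "Negative": 5363}

# The total should match the reported number of samples.
total = sum(counts.values())

# Percentage share of each class, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
print(shares)
```

The rounded shares add up to 99.9% rather than 100% because each value is rounded independently.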
The data originates from a variety of sources, most of them real-world, including:
- Hotel, restaurant, and movie reviews
- E-commerce product reviews
- Social media posts (tweets)
- Texts generated by Large Language Models (LLMs)
For neutral samples, particular care was taken to select texts without explicit subjective judgments or with balanced opposing sentiments.
- Back-Translation:
  - Sentences were translated into English and then back into Turkish to introduce structural variations while preserving semantic meaning.
- Synonym Replacement:
  - Key words in the text were replaced with their synonyms using the WordNet lexical database to create contextually equivalent variations.
During the augmentation process, duplicate entries were systematically removed to ensure data quality.
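As a rough illustration of synonym replacement followed by duplicate removal, the sketch below uses a tiny hand-written synonym table in place of WordNet. The table, the function names, and the sample sentences are all hypothetical; the authors' actual pipeline is not documented here and may differ substantially.

```python
# Hypothetical toy lexicon standing in for WordNet lookups (assumption).
TOY_SYNONYMS = {"güzel": "hoş", "kötü": "fena"}


def replace_synonyms(text, lexicon=TOY_SYNONYMS):
    """Swap every whitespace-separated token that has an entry in the lexicon."""
    return " ".join(lexicon.get(token, token) for token in text.split())


def deduplicate(texts):
    """Drop exact duplicates (after collapsing whitespace), keeping first occurrences."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(text.split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


# Made-up sample sentences; the third one duplicates the first.
originals = ["Otel çok güzel", "Yemek kötü", "Otel çok güzel"]
augmented = originals + [replace_synonyms(t) for t in originals]
cleaned = deduplicate(augmented)
print(cleaned)
```

Exact-match deduplication is the simplest possible choice; near-duplicate detection would need a similarity measure, which this sketch does not attempt.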
- The dataset was annotated manually by three independent annotators.
- In cases of disagreement, a majority vote was taken, with further validation by a supervisor.
- Disputed annotations were compared against outputs from at least three different LLMs for consistency.
- Sentiment classes are encoded as follows:
  - -1: Negative
  - 0: Neutral
  - 1: Positive
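The majority-vote step and the label encoding can be sketched as follows. This is a minimal illustration, not the authors' tooling: `LABEL_NAMES` and `majority_label` are hypothetical names, and returning `None` for a three-way split (to flag the item for supervisor review) is an assumption about how ties might be handled.

```python
from collections import Counter

# Label encoding as listed in this section.
LABEL_NAMES = {-1: "Negative", 0: "Neutral", 1: "Positive"}


def majority_label(votes):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Two of three annotators agree, so the majority label is used.
print(LABEL_NAMES[majority_label([1, 1, 0])])

# All three disagree: no majority, so the item would need further review.
print(majority_label([-1, 0, 1]))
```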
| Text | Source | Polarity |
|---|---|---|
| Akşam 9'da kapanma olacak ya sanırım İstanbul'un trafik yoğunluğunun %50'si şu an Yeniköy'de bu ne hal? | Tweet | -1 (Negative) |
| Vatandaşlar, oy kullanma hakkına sahiptirler, ulaşılabilirlik konusuna dikkat edilmektedir. | LLM-generated | 0 (Neutral) |
| Kokusu güzel hafif, diğer yumuşatıcılar gibi ağır yoğun bir kokusu yok. Bahar gibi kokuyor, bahar aylarında tercih edilebilecek bir yumuşatıcı bence. | Product Review | 1 (Positive) |
- This dataset is publicly available for academic and research purposes.
- Users must properly cite the original authors and reference the dataset in their publications.
- The dataset should not be used for commercial purposes without explicit permission.
If you are using the FSMTSA dataset in your research, please cite as follows: Zümberoğlu, K. B., Dik, S. Z., Karadeniz, B. S., & Sahmoud, S. (2025). Towards Better Sentiment Analysis in the Turkish Language: Dataset Improvements and Model Innovations. Applied Sciences, 15(4), 2062. https://doi.org/10.3390/app15042062
For questions or feedback, please contact [email protected].
We acknowledge Dr. Shaaban Sahmoud for his invaluable guidance and extend our thanks to Sümeyye Zülal Dik and Büşra Sinem Karadeniz for their dedicated efforts in the creation, annotation, and validation of the FSMTSA dataset.