We built a persona-driven AI chatbot that delivers personalized Vedic astrology readings by fetching users' birth-chart data via an API.
Why We Attempted Fine-Tuning
We aimed to improve:
- Simpler, More Human Language – Making responses warm, engaging, and easy to understand.
- Conversational Variability – Reducing repetition and ensuring a more dynamic, natural flow.
- Concise Output – Keeping responses brief and impactful.
Our Approach
- Collected real user chat data.
- Manually refined responses to match our desired tone and style.
- Fine-tuned the model using this improved dataset.
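For context, our training set followed the chat-style JSONL format that most fine-tuning APIs expect. A minimal sketch of how we assembled it (the filename, system prompt, and example strings here are illustrative, not our actual data):

```python
import json

def build_example(system_prompt, user_msg, refined_reply):
    """Wrap one manually refined chat turn in the chat-style
    fine-tuning format: one JSON object per line, each holding
    a system/user/assistant message triple."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": refined_reply},
        ]
    }

SYSTEM = "You are a warm, concise Vedic astrology guide."

# Each pair: a real user question plus our hand-edited reply.
examples = [
    build_example(
        SYSTEM,
        "What does my Moon in Scorpio mean?",
        "Your Moon in Scorpio gives you deep, intense emotions. "
        "You feel things fully and value honesty in close bonds.",
    ),
]

# Write one JSON object per line (JSONL), ready for upload.
with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

One detail we are second-guessing: whether repeating the same system prompt in every training row, while also using it at inference time, contributed to the degradation.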
Unexpected Fine-Tuning Issues
- Worse Overall Quality – The fine-tuned model consistently underperformed the original system-prompt-only version.
- Language & Tone Regressions – Responses became unnatural, erratic, and occasionally incoherent.
- No Gains on Our Goals – None of the three targets above (simpler language, more variety, concision) actually improved.
Looking for Insights
- Has anyone faced similar degradation when fine-tuning with user chat data?
- What alternative strategies (e.g., refined prompt engineering, reinforcement learning, or hybrid approaches) could improve chatbot responses while maintaining the strengths of the system prompt model?
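To make the "hybrid" idea concrete: one option we are weighing is keeping the proven system prompt and steering tone with a handful of curated few-shot turns instead of weight updates. A rough sketch, assuming a standard chat-messages API; all prompt text and examples below are placeholders:

```python
SYSTEM_PROMPT = "You are a warm, concise Vedic astrology guide."

# Curated (question, ideal_answer) pairs drawn from our
# hand-refined responses, used as in-context examples.
FEW_SHOT = [
    ("What does Saturn in my 7th house mean?",
     "Saturn in your 7th house asks for patience in partnerships. "
     "Bonds grow slowly but become very durable."),
]

def build_messages(few_shot_pairs, user_msg):
    """Prepend curated example turns before the live user
    message, so the model imitates their tone without any
    fine-tuning."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for question, ideal_answer in few_shot_pairs:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": ideal_answer})
    messages.append({"role": "user", "content": user_msg})
    return messages

msgs = build_messages(FEW_SHOT, "Tell me about my Sun sign.")
```

Would love to hear whether anyone has compared this kind of few-shot steering against fine-tuning for tone control specifically.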