Fine-Tuning Model Performance – Seeking Solutions

We built a persona-driven AI chatbot that delivers personalized Vedic astrology readings, processing users’ birth chart data via an API.

Why We Attempted Fine-Tuning

We aimed to improve:

  • Simpler, More Human Language – Making responses warm, engaging, and easy to understand.
  • Conversational Variability – Reducing repetition and ensuring a more dynamic, natural flow.
  • Concise Output – Keeping responses brief and impactful.

Our Approach

  • Collected real user chat data.
  • Manually refined responses to match our desired tone and style.
  • Fine-tuned the model using this improved dataset.
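For context, here is a minimal sketch of how we assembled the training data, assuming an OpenAI-style chat fine-tuning JSONL schema (the prompt text and example exchange are placeholders, not our real data). One detail we are second-guessing: whether each training example should also include the system prompt, so the fine-tuned model sees the same conditioning at training and inference time.

```python
import json

# Placeholder persona prompt -- our real system prompt is much longer.
SYSTEM_PROMPT = "You are a warm, concise Vedic astrology guide."

# Each record pairs a real user question with the manually refined reply
# we want the model to imitate (illustrative example, not actual user data).
refined_chats = [
    {
        "user": "What does my Moon in Scorpio mean?",
        "refined_reply": (
            "Moon in Scorpio gives you deep, intense feelings. "
            "You read people well, and trust matters a lot to you."
        ),
    },
]

def to_training_example(chat):
    """Convert one refined exchange into a chat fine-tuning JSONL record."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": chat["user"]},
            {"role": "assistant", "content": chat["refined_reply"]},
        ]
    }

# Write one JSON object per line, as fine-tuning endpoints typically expect.
with open("finetune.jsonl", "w", encoding="utf-8") as f:
    for chat in refined_chats:
        f.write(json.dumps(to_training_example(chat)) + "\n")
```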

Unexpected Fine-Tuning Issues

  • Worse Performance – The fine-tuned model underperformed the base model guided only by the system prompt.
  • Language & Tone Issues – Responses became unnatural, erratic, and occasionally incoherent.
  • No Net Gain – None of the targeted improvements materialized; the system-prompt approach remained more effective overall.

Looking for Insights

  • Has anyone faced similar degradation when fine-tuning with user chat data?
  • What alternative strategies (e.g., refined prompt engineering, reinforcement learning, or hybrid approaches) could improve chatbot responses while maintaining the strengths of the system prompt model?
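One hybrid direction we are weighing, sketched below: keep the strong system prompt and inject a few of our manually refined exchanges as few-shot examples at inference time, instead of fine-tuning. All names and example text here are illustrative placeholders.

```python
# Placeholder persona prompt -- stands in for our real system prompt.
SYSTEM_PROMPT = "You are a warm, concise Vedic astrology guide."

# A small set of hand-refined exchanges reused as few-shot demonstrations
# (illustrative content, not real user data).
FEW_SHOT = [
    (
        "What does Saturn in my 7th house mean?",
        "Saturn in the 7th asks for patience in partnerships. "
        "Commitments may come later, but they tend to last.",
    ),
]

def build_messages(user_question, chart_summary):
    """Assemble a chat request: system prompt, few-shot pairs, then the live query."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for question, answer in FEW_SHOT:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({
        "role": "user",
        "content": f"Birth chart summary: {chart_summary}\nQuestion: {user_question}",
    })
    return messages
```

The appeal is that the base model's fluency is untouched, while tone and brevity are steered by concrete demonstrations rather than gradient updates.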