What I've done so far: scraped posts and comments from the campus marketplace (校园集市) and Tieba, filtered out the help-seeking posts from the campus marketplace, then fed title + body + replies to a large model to summarize each post into a question and an answer, and used the results to fine-tune Qwen2.5 7B — about 300 Q&A pairs in total.
The actual results aren't great: the model answers many questions from the training set poorly. Is it just that my hyperparameters aren't tuned well? I've been wondering whether deliberately overfitting would actually be better here — a single school's data is limited, my collection already covers basically all of it, generalization doesn't matter much, and this kind of domain-specific question probably can't generalize anyway.
Can anyone suggest an approach?
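For reference, a minimal sketch of the data-prep step described above, assuming an OpenAI-compatible API; the endpoint, model name, field names, and file paths are placeholders rather than what was actually used:

```python
import json
from openai import OpenAI

# Placeholder endpoint/model -- substitute whatever LLM API you actually call.
client = OpenAI(base_url="https://api.example.com/v1", api_key="sk-placeholder")

def post_to_qa(post: dict) -> dict:
    """Ask the LLM to distill one scraped post (title + body + replies) into a Q&A pair."""
    # In practice this prompt would be written in Chinese to match the data.
    prompt = (
        "Below is a campus help post and its replies. Distill it into one question "
        "and one answer. Keep the concrete details from the replies (percentages, offices, steps). "
        'Return JSON of the form {"question": "...", "answer": "..."}.\n\n'
        f"Title: {post['title']}\nBody: {post['body']}\nReplies: {post['replies']}"
    )
    resp = client.chat.completions.create(
        model="some-summarizer-model",  # placeholder name
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns plain JSON; add retries/validation in real use.
    return json.loads(resp.choices[0].message.content)

with open("posts.jsonl", encoding="utf-8") as fin, \
     open("train.jsonl", "w", encoding="utf-8") as fout:
    for line in fin:
        qa = post_to_qa(json.loads(line))
        sample = {"messages": [
            {"role": "user", "content": qa["question"]},
            {"role": "assistant", "content": qa["answer"]},
        ]}
        fout.write(json.dumps(sample, ensure_ascii=False) + "\n")
```

With only ~300 pairs, the quality of these distilled answers matters far more than any training hyperparameter.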
I have some other ideas too: structure the post data into a knowledge base — my tests of that worked fairly well, but the token consumption is too high.
Or just return the knowledge-base retrieval results directly without having the LLM write an answer, but that doesn't feel "AI" enough.
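A minimal sketch of that knowledge-base path (not the actual setup; the embedding model and entries are placeholders): embed the structured Q&A entries once, pull the nearest few for each query, then either return the hits as-is or hand them to an LLM as context.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder embedding model; any Chinese/multilingual embedding model works.
embedder = SentenceTransformer("BAAI/bge-small-zh-v1.5")

# Structured knowledge base: one entry per summarized post (illustrative content).
kb = [
    {"q": "How do I transfer to another major?",
     "a": "A GPA rank in the top 25% is usually enough; some majors accept top 75%."},
    # ... ~300 entries
]
kb_vecs = embedder.encode([e["q"] + " " + e["a"] for e in kb], normalize_embeddings=True)

def retrieve(query: str, k: int = 3) -> list[dict]:
    """Return the k knowledge-base entries closest to the query."""
    qv = embedder.encode([query], normalize_embeddings=True)
    scores = (kb_vecs @ qv.T).squeeze(-1)   # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [kb[i] for i in top]

# Option 1: return the hits directly (no LLM tokens at all).
# Option 2: stuff the hits into an LLM prompt for a nicer-sounding answer.
print(retrieve("What GPA rank do I need to switch majors?"))
```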
What loss did the run end at?
0.1583
Has it converged? Would it drop further if you kept training?
For the knowledge base you can use SiliconFlow's free models, then you don't have to worry about token consumption.
```
2025-02-17 21:22:47,091 - INFO - The job is queuing.
2025-02-17 21:23:46,834 - INFO - data process succeeded, start to fine-tune
Fine-tune estimated consuming tokens: 129084 !
Fine-tune estimated time: 19.06 mins!
Fine-tune started
{'loss': 4.6328, 'learning_rate': 0.0, 'consumed_train_tokens': 305, 'consumed_train_samples': 4, 'consumed_train_tokens_with_padding': 305, 'finished_epoch': 0, 'current_epoch_steps': 1, 'each_epoch_steps': 16, 'epoch': 0.0625}
{'loss': 2.8148, 'learning_rate': 0.0003, 'consumed_train_tokens': 1793, 'consumed_train_samples': 20, 'consumed_train_tokens_with_padding': 1793, 'finished_epoch': 0, 'current_epoch_steps': 5, 'each_epoch_steps': 16, 'epoch': 0.3125}
{'loss': 2.5448, 'learning_rate': 0.0002977710712676958, 'consumed_train_tokens': 3500, 'consumed_train_samples': 40, 'consumed_train_tokens_with_padding': 3500, 'finished_epoch': 0, 'current_epoch_steps': 10, 'each_epoch_steps': 16, 'epoch': 0.625}
{'loss': 2.4419, 'learning_rate': 0.000291150533339526, 'consumed_train_tokens': 5213, 'consumed_train_samples': 60, 'consumed_train_tokens_with_padding': 5213, 'finished_epoch': 0, 'current_epoch_steps': 15, 'each_epoch_steps': 16, 'epoch': 0.9375}
{'loss': 2.1277, 'learning_rate': 0.0002803351619892201, 'consumed_train_tokens': 7002, 'consumed_train_samples': 80, 'consumed_train_tokens_with_padding': 7002, 'finished_epoch': 1, 'current_epoch_steps': 4, 'each_epoch_steps': 16, 'epoch': 1.25}
{'loss': 1.7535, 'learning_rate': 0.000265646411921625, 'consumed_train_tokens': 8610, 'consumed_train_samples': 100, 'consumed_train_tokens_with_padding': 8610, 'finished_epoch': 1, 'current_epoch_steps': 9, 'each_epoch_steps': 16, 'epoch': 1.5625}
{'loss': 1.7997, 'learning_rate': 0.00024752086248890656, 'consumed_train_tokens': 10134, 'consumed_train_samples': 120, 'consumed_train_tokens_with_padding': 10134, 'finished_epoch': 1, 'current_epoch_steps': 14, 'each_epoch_steps': 16, 'epoch': 1.875}
{'loss': 1.3865, 'learning_rate': 0.0002264972416694545, 'consumed_train_tokens': 11734, 'consumed_train_samples': 140, 'consumed_train_tokens_with_padding': 11734, 'finished_epoch': 2, 'current_epoch_steps': 3, 'each_epoch_steps': 16, 'epoch': 2.1875}
{'loss': 1.1832, 'learning_rate': 0.00020320041398307471, 'consumed_train_tokens': 13500, 'consumed_train_samples': 160, 'consumed_train_tokens_with_padding': 13500, 'finished_epoch': 2, 'current_epoch_steps': 8, 'each_epoch_steps': 16, 'epoch': 2.5}
{'loss': 1.0061, 'learning_rate': 0.0001783228082540057, 'consumed_train_tokens': 15086, 'consumed_train_samples': 180, 'consumed_train_tokens_with_padding': 15086, 'finished_epoch': 2, 'current_epoch_steps': 13, 'each_epoch_steps': 16, 'epoch': 2.8125}
{'eval_loss': 2.766188144683838, 'eval_acc': 0.4569309754706218, 'eval_runtime': 7.9517, 'eval_samples_per_second': 8.049, 'eval_steps_per_second': 2.012, 'epoch': 3.125}
{'loss': 0.8011, 'learning_rate': 0.0001526038372261947, 'consumed_train_tokens': 16686, 'consumed_train_samples': 200, 'consumed_train_tokens_with_padding': 16686, 'finished_epoch': 3, 'current_epoch_steps': 2, 'each_epoch_steps': 16, 'epoch': 3.125}
{'loss': 0.5516, 'learning_rate': 0.00012680792072147963, 'consumed_train_tokens': 18310, 'consumed_train_samples': 220, 'consumed_train_tokens_with_padding': 18310, 'finished_epoch': 3, 'current_epoch_steps': 7, 'each_epoch_steps': 16, 'epoch': 3.4375}
{'loss': 0.5131, 'learning_rate': 0.00010170176553685336, 'consumed_train_tokens': 19992, 'consumed_train_samples': 240, 'consumed_train_tokens_with_padding': 19992, 'finished_epoch': 3, 'current_epoch_steps': 12, 'each_epoch_steps': 16, 'epoch': 3.75}
{'loss': 0.4688, 'learning_rate': 7.803157736820147e-05, 'consumed_train_tokens': 21547, 'consumed_train_samples': 260, 'consumed_train_tokens_with_padding': 21547, 'finished_epoch': 4, 'current_epoch_steps': 1, 'each_epoch_steps': 16, 'epoch': 4.0625}
{'loss': 0.3286, 'learning_rate': 5.650088206821785e-05, 'consumed_train_tokens': 23292, 'consumed_train_samples': 280, 'consumed_train_tokens_with_padding': 23292, 'finished_epoch': 4, 'current_epoch_steps': 6, 'each_epoch_steps': 16, 'epoch': 4.375}
{'loss': 0.1983, 'learning_rate': 3.774961543555743e-05, 'consumed_train_tokens': 24821, 'consumed_train_samples': 300, 'consumed_train_tokens_with_padding': 24821, 'finished_epoch': 4, 'current_epoch_steps': 11, 'each_epoch_steps': 16, 'epoch': 4.6875}
{'loss': 0.2673, 'learning_rate': 2.2335103028971096e-05, 'consumed_train_tokens': 26628, 'consumed_train_samples': 320, 'consumed_train_tokens_with_padding': 26628, 'finished_epoch': 4, 'current_epoch_steps': 0, 'each_epoch_steps': 16, 'epoch': 5.0}
{'loss': 0.1328, 'learning_rate': 1.071549532480653e-05, 'consumed_train_tokens': 28170, 'consumed_train_samples': 340, 'consumed_train_tokens_with_padding': 28170, 'finished_epoch': 5, 'current_epoch_steps': 5, 'each_epoch_steps': 16, 'epoch': 5.3125}
{'loss': 0.1442, 'learning_rate': 3.2361505584861427e-06, 'consumed_train_tokens': 29850, 'consumed_train_samples': 360, 'consumed_train_tokens_with_padding': 29850, 'finished_epoch': 5, 'current_epoch_steps': 10, 'each_epoch_steps': 16, 'epoch': 5.625}
{'loss': 0.1583, 'learning_rate': 1.1936997944770487e-07, 'consumed_train_tokens': 31642, 'consumed_train_samples': 380, 'consumed_train_tokens_with_padding': 31642, 'finished_epoch': 5, 'current_epoch_steps': 15, 'each_epoch_steps': 16, 'epoch': 5.9375}
{'eval_loss': 3.9192488193511963, 'eval_acc': 0.4244152880775813, 'eval_runtime': 6.3999, 'eval_samples_per_second': 10.0, 'eval_steps_per_second': 2.5, 'epoch': 6.0}
{'train_runtime': 516.9354, 'train_samples_per_second': 2.971, 'train_steps_per_second': 0.186, 'train_loss': 1.0940038183083136, 'epoch': 6.0}
Actual number of consumed tokens is 129084!
Uploaded checkpoint!
Fine-tune succeeded!
2025-02-17 21:36:25,277 - INFO - fine-tuned output got, start to transfer it for inference
2025-02-17 21:37:32,918 - INFO - transfer for inference succeeded, start to deliver it for inference
2025-02-17 21:42:14,966 - INFO - start to save checkpoint
2025-02-17 21:43:25,955 - INFO - finetune-job succeeded
2025-02-17 21:43:26,404 - INFO - training usage 129084
2025-02-17 21:43:26,463 - INFO - ##FT_COMPLETE##
```
Here's the log. It does look like overfitting, though I can't fully read it.
The free ones are all small models and their summarization isn't great; the knowledge-base approach only works well on models with more parameters. Testing it on R1 it's simply unbeatable — its answers are better than mine.
Come learn LLM fine-tuning from 始皇.
So a 7B model isn't enough?
When you say it "answers the training-set questions poorly", is that a subjective impression? If compute is limited, you could try QLoRA fine-tuning of a larger model.
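For anyone unfamiliar: QLoRA loads the frozen base model in 4-bit and trains small low-rank adapters on top, so a base larger than 7B can fit on a single modest GPU. A rough sketch with transformers + peft + bitsandbytes (the model choice and hyperparameters are illustrative, not tuned):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-14B-Instruct"  # example of a larger base than 7B

# 4-bit NF4 quantization of the frozen base weights.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# Small trainable LoRA adapters on the attention/MLP projections.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Then run your usual SFT loop/trainer on the ~300 Q&A pairs; with data this
# small, keep epochs low and watch eval loss to avoid the overfitting seen above.
```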
I'd suggest pursuing the knowledge-base route; the dark art of model fine-tuning is the dataset...
Our school fine-tuned one based on DeepSeek.
This scenario is better suited to RAG — campus data can be updated in real time, while fine-tuning fits stable, foundational domain knowledge better. As for RAG's high token consumption, you can address it by tuning how you split chunks, or by running the retrieved candidates through a reranker and truncating to the top results.
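A sketch of that recall-then-rerank step (the reranker model name is just an example): recall a broad candidate set from the vector index, rescore it with a cross-encoder, and only put the few surviving chunks into the prompt, which is what keeps the token cost down.

```python
from sentence_transformers import CrossEncoder

# Example cross-encoder reranker; any bge-reranker-style model works here.
reranker = CrossEncoder("BAAI/bge-reranker-base")

def rerank(query: str, chunks: list[str], keep: int = 3) -> list[str]:
    """Rescore recalled chunks with the cross-encoder and keep only the best few."""
    scores = reranker.predict([(query, c) for c in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: -pair[0])
    return [chunk for _, chunk in ranked[:keep]]

# Recall a broad candidate set (say top 20) from the vector index, then only
# pay LLM tokens for the chunks that survive the rerank.
query = "What GPA rank do I need to transfer majors?"
candidates = [  # placeholder chunks standing in for the recall step
    "Transfer policy: a GPA rank in the top 25% is generally required ...",
    "Dormitory application deadline for the fall semester ...",
]
context = "\n\n".join(rerank(query, candidates, keep=1))
print(context)
```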
The problem is that its answers contain no useful information; they read like a Tieba busybody. For example, on transferring majors, the answer in my dataset is: "Generally a GPA rank in the top 25% is enough to transfer, and some majors accept the top 75%; see last year's transfer-policy document from the Academic Affairs Office for details."
The fine-tuned model answers: "Transferring majors depends on your GPA; check the Academic Affairs Office website for specifics."
The 7B run is really just a test to get familiar with the workflow; I'll try 32B later.
My plan is to use the knowledge base as a supplement; I only went the fine-tuning route because I wanted to learn about LLMs.
The fine-tuning method itself isn't hard to learn, but 80% of how well it works comes down to the dataset.
Which platform can you fine-tune models on?
Can R1 also be used as an embedding model?
I want to build an LLM-related project to put on my résumé.
University data like this basically doesn't change for years, so the knowledge base can handle the updates. I'll look into rerank — thanks for the pointers!