Su Zhu

13 comments by Su Zhu

Find solid baselines in https://github.com/sz128/slot_filling_and_intent_detection_of_SLU.

@lale314 You can filter out the incomplete samples with the following code:

```python
import sys
import json

with open(sys.argv[1]) as fin:
    for line in fin:
        line = line.strip()
        sample = json.loads(line)
        output = sample['output'].strip(" \n\"”")
        if output[-1] in set("?!.。?!})]`》)") or...
```

> Isn't this already supported with #31629?

It seems that both implementations are similar. But we need to consider the situation where position IDs are not reset between different...

> Hey! Don't you think that ragging the tensor would be more efficient?

Yes. I didn't describe it well. I updated the description of this PR. The implementations of this...
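For context, "ragging" a tensor here means dropping the padding and storing the variable-length sequences contiguously, instead of carrying a rectangular padded batch. A minimal sketch of the idea (the function name and `pad_id` are illustrative, not from the PR):

```python
# Sketch: convert a padded batch into a "ragged" packed form.
# Illustrative only; the actual PR's implementation may differ.
def rag(padded, pad_id=0):
    """Flatten a padded batch into one packed token list plus per-sample lengths."""
    packed, lengths = [], []
    for row in padded:
        toks = [t for t in row if t != pad_id]  # strip padding tokens
        packed.extend(toks)
        lengths.append(len(toks))
    return packed, lengths

padded = [
    [5, 6, 7, 0, 0],   # sample 1 (pad_id = 0)
    [8, 9, 0, 0, 0],   # sample 2
]
packed, lengths = rag(padded)
print(packed)   # [5, 6, 7, 8, 9]
print(lengths)  # [3, 2]
```

The packed form avoids wasting compute on pad positions; the `lengths` list is enough to recover the original sample boundaries.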

> > > Hey! Don't you think that ragging the tensor would be more efficient?
> > >
> > > Yes. I didn't describe it well. I updated the...

> > we need to consider the scenario where position IDs are not reset between different short samples, especially for LLM pre-training
>
> does this imply us properly computing...

> > > > we need to consider the scenario where position IDs are not reset between different short samples, especially for LLM pre-training
> > >
> > > ...

> Why wouldn't we use `position_ids` to encode all information (packed, not packed, padded, not padded) in a slightly more elegant way without touching `attention_mask`?
>
> For example let's...
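The proposal above relies on `position_ids` carrying the packing information: if each packed sample's positions restart at 0, the sample boundaries can be recovered from the position IDs alone. A minimal sketch of that recovery, assuming the reset convention (the function name is hypothetical):

```python
# Sketch: recover packed-sample lengths from position_ids alone,
# assuming each packed sample's positions restart at 0 (illustrative).
def seq_lengths_from_position_ids(pos):
    lengths, start = [], 0
    for i in range(1, len(pos)):
        if pos[i] == 0:            # a reset to 0 marks the start of a new sample
            lengths.append(i - start)
            start = i
    lengths.append(len(pos) - start)
    return lengths

print(seq_lengths_from_position_ids([0, 1, 2, 0, 1]))  # [3, 2]
```

This is exactly why the non-reset convention needs separate handling: with continuous positions there are no resets to detect, so the boundaries would have to be passed explicitly.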

> Yep, agree with that definitely! My proposal was to leave this choice to users to set in data collator. If they wish to treat such concatenated sequences as a...