Official repository for the paper "RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs." Code and refine models coming soon!