BigCode
271 posts
BigCode
@BigCodeProject
Open and responsible research and development of large language models for code. #BigCodeProject run by @huggingface + @ServiceNowRSRCH
bigcode-project.org
Joined August 2022
3 Following · 9,133 Followers
  • Pinned
    BigCode
    @BigCodeProject
    Feb 28, 2024
    Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open! hf.co/bigcode/starco…
    223K
  • BigCode
    @BigCodeProject
    May 4, 2023
    Introducing: 💫StarCoder StarCoder is a 15B LLM for code with 8k context and trained only on permissive data in 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and act as a Tech Assistant. Try it here: shorturl.at/cYZ06r Release thread🧵
    882K
  • BigCode
    @BigCodeProject
    Oct 27, 2022
    Introducing 📑 The Stack - a 3TB dataset of permissively licensed code in 30 programming languages. hf.co/datasets/bigco… Want your code excluded from model training? There is an opt-out form and a data governance plan: bigcode-project.org/docs/about/the… Let's take a tour🧵
  • BigCode
    @BigCodeProject
    Dec 22, 2022
    Announcing a holiday gift: 🎅SantaCoder - a 1.1B multilingual LM for code that outperforms much larger open-source models on both left-to-right generation and infilling! Demo: hf.co/spaces/bigcode… Paper: hf.co/datasets/bigco… Attribution: hf.co/spaces/bigcode… A🧵:
    264K
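Infilling of the kind SantaCoder is evaluated on is typically learned with fill-in-the-middle (FIM) training: a document is split into prefix, middle and suffix and reordered so that a left-to-right model learns to generate the middle after seeing both sides. The sketch below is a simplified, character-level illustration under stated assumptions. The sentinel strings `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` are modelled on StarCoder's tokenizer (SantaCoder's spelling may differ), and the helpers `fim_transform` and `infill_prompt` are hypothetical names.

```python
import random

# Assumed sentinel strings (modelled on StarCoder's tokenizer; not confirmed for SantaCoder).
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def fim_transform(doc: str, rng: random.Random) -> str:
    """Hypothetical training-time transform: cut `doc` at two random character
    positions and emit it in prefix-suffix-middle (PSM) order, so the middle
    becomes the left-to-right target that follows FIM_MIDDLE."""
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def infill_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical inference-time prompt: whatever the model generates after
    FIM_MIDDLE is the candidate text for the gap between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    rng = random.Random(0)
    print(fim_transform("def add(a, b):\n    return a + b\n", rng))
    print(infill_prompt("def add(a, b):\n    return ", "\n"))
```

At inference, the generated middle is spliced back between the prefix and the suffix. Real training pipelines usually work on tokens rather than characters and apply the transform to only a fraction of documents; those details are not covered here.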
  • BigCode
    @BigCodeProject
    Jun 8, 2023
    📣 Introducing ⭐ StarCoder+ & StarChat Beta! We trained StarCoder on the Falcon model's English web dataset and instruction-tuned it. Both models rank highly on the LLM leaderboard, with strong natural language performance and coding capabilities. huggingface.co/HuggingFaceH4/…
    80K
  • BigCode
    @BigCodeProject
    Apr 5, 2023
    We started training something big and the daily training updates have degenerated into weather reports 🌦:
    67K
  • BigCode
    @BigCodeProject
    Apr 29, 2024
    Releasing StarCoder2 Instruct! 🚀 It achieves a 72% HumanEval score using only self-generated content, without any GPT-3.5/4 data. This work demonstrates that self-instruct already works well at the 15B scale without data from proprietary models! Read more: huggingface.co/blog/sc2-instr…
    39K
  • BigCode
    @BigCodeProject
    May 4, 2023
    Replying to @BigCodeProject
    Today we release two open-access models! StarCoderBase: trained on 1T tokens in 80+ programming languages. huggingface.co/bigcode/starco… StarCoder: StarCoderBase additionally trained on 35B Python tokens, which can be prompted to reach 40.8% pass@1. huggingface.co/bigcode/starco…
    86K
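Pass@1 figures like the 40.8% above are commonly computed with the unbiased pass@k estimator introduced alongside HumanEval. For each problem, generate n samples, count the c that pass the unit tests, and estimate the probability that at least one of k samples drawn from them passes: pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. This is a minimal sketch; the function names are hypothetical, and the sample counts BigCode actually used are not stated here.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark score: average pass@k over (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: 20 samples per problem for two problems.
    print(mean_pass_at_k([(20, 8), (20, 12)], k=1))
```

For k = 1 the estimator reduces to c / n, the fraction of correct samples. Drawing several samples per problem rather than one lowers the variance of the estimate.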
  • BigCode
    @BigCodeProject
    Jul 27, 2023
    🌌 News from the StarCoder cosmos! We trained smaller versions of StarCoder: 1B, 3B and 7B models. 1T tokens, 80+ programming languages, 8k context window, MQA & FIM.
    78K
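MQA (multi-query attention) keeps a single key/value head shared by all query heads instead of one key/value head per query head. Its main effect is a smaller key/value cache during generation, which matters at long contexts such as the 8k window above. This is a back-of-the-envelope sketch. The layer count, head count and head size are hypothetical values, not the published configuration of these models, and fp16/bf16 storage (2 bytes per element) is assumed.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Memory for cached keys and values: two tensors (K and V) per layer,
    each of shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model shape (not StarCoder's actual config); 8k context = 8192 tokens.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN = 24, 16, 128, 8192

mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)  # one K/V head per query head
mqa = kv_cache_bytes(N_LAYERS, 1, HEAD_DIM, SEQ_LEN)              # one shared K/V head

print(f"MHA: {mha / 2**30:.2f} GiB, MQA: {mqa / 2**20:.0f} MiB, ratio {mha // mqa}x")
# prints "MHA: 1.50 GiB, MQA: 96 MiB, ratio 16x"
```

Only the number of cached K/V heads changes, so the saving equals the number of query heads. The query projections and the per-head attention computation keep the same shape.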
  • BigCode
    @BigCodeProject
    Sep 26, 2022
    print("Hello world! 🎉") Excited to announce the BigCode project led by @ServiceNowRSRCH and @huggingface! In the spirit of BigScience we aim to develop large language models for code in an open and responsible way. Join here: bigcode-project.org/docs/about/joi… A thread with our goals🧵
  • BigCode
    @BigCodeProject
    Apr 18, 2023
    Day 18: Weather is clear and the loss is still going down ...
    37K
  • BigCode
    @BigCodeProject
    May 22, 2023
    Introducing the BigCode Evaluation Harness for Code LLMs: github.com/bigcode-projec… Inspired by the lm-evaluation-harness from @AiEleuther, it ensures ease-of-use, reproducibility and efficiency. Let’s explore its key features 🧵:
    32K
  • BigCode
    @BigCodeProject
    Dec 1, 2022
    Today we are releasing The Stack v1.1! 🚀 We added more data, included more programming languages, and extended the list of permissive licenses used. huggingface.co/datasets/bigco… Also the first batch of opt-out requests was removed from the dataset.
  • BigCode
    @BigCodeProject
    May 4, 2023
    Replying to @BigCodeProject
    We present the most extensive evaluation of code LLMs to date in the full tech report with 68 (!) authors. You can also read up on all the details from data preprocessing and governance to training at scale! drive.google.com/file/d/1cN-b9G…
    35K
