Xinyun Chen
232 posts
Xinyun Chen
@xinyun_chen_
Research Scientist @Meta MSL. Prev. @GoogleDeepMind. PhD @Berkeley_EECS.
jungyhuk.github.io
Joined February 2020
1,295
Following
7,198
Followers
  • Pinned
    Xinyun Chen
    @xinyun_chen_
    Apr 9
    It’s been a great honor to work in the incredible team on our first milestone towards personal superintelligence! Really proud of what we have achieved in the past nine months. Try our new models on meta.ai and let us know your feedback!
    Alexandr Wang
    Meta
    @alexandr_wang
    Apr 8
    1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
    6.2K
  • Xinyun Chen
    @xinyun_chen_
    Apr 12, 2023
    New preprint: Teach LLMs to self-debug! (arxiv.org/abs/2304.05128) With few-shot demonstrations, LLMs can perform rubber duck debugging: w/o error messages, it can identify bugs by explaining the predicted code. SOTA on several code generation benchmarks using code-davinci-002.
    140K
  • Xinyun Chen
    @xinyun_chen_
    Nov 30, 2023
    New preprint: Universal Self-Consistency for Large Language Model Generation arxiv.org/abs/2311.17311 We propose Universal Self-Consistency (USC) to aggregate free-form responses, such as code generation and summarization, where the original SC is not applicable.
    52K
  • Xinyun Chen
    @xinyun_chen_
    Oct 13, 2023
    Our new work (arxiv.org/abs/2310.07064) shows that LLMs can learn (sometimes uncommon) rules with 2 stages: (1) induction: generate and verify rules from exemplars; (2) deduction: utilize the rule library for new problems. 11-27% gain on reasoning tasks that require rule learning.
    Zhaocheng Zhu
    @zhu_zhaocheng
    Oct 12, 2023
    🔥 When talking about training LLMs, do you think of updating model parameters? In fact, you can use LLMs to learn a rule library. This not only improves multi-step reasoning, but also has many advantages: interpretability, transferability, and applicable to black-box LLMs. 🧵1/6
    44K
  • Xinyun Chen
    @xinyun_chen_
    Feb 2, 2022
    I am very excited to be part of the team #AlphaCode in my summer internship last year! A huge thanks to my host @liyuajia for adding me to this amazing team! Looking forward to see what comes next!
    Google DeepMind
    @GoogleDeepMind
    Feb 2, 2022
    Introducing #AlphaCode: a system that can compete at average human level in competitive coding competitions like @codeforces. An exciting leap in AI problem-solving capabilities, combining many advances in machine learning! Read more: dpmd.ai/Alpha-Code 1/
  • Xinyun Chen
    @xinyun_chen_
    Dec 19, 2024
    Very excited to be part of the team that builds Gemini 2.0 Flash Thinking. Try our experimental model at aistudio.google.com/prompts/new_ch…. Any feedback is welcome and appreciated!
    Jeff Dean
    @JeffDean
    Dec 19, 2024
    Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time
    20K
  • Xinyun Chen
    @xinyun_chen_
    Feb 15, 2024
    New preprint🔥: Premise Order Matters in Reasoning with Large Language Models arxiv.org/abs/2402.08939 In typical logical reasoning, premise order doesn't matter. However, for SOTA LLMs, changing the premise order may cause an accuracy drop of >30%! 🧵 1/8
    11K
  • Xinyun Chen
    @xinyun_chen_
    Jul 4, 2023
    Our work (x.com/xinyun_chen_/s…) demonstrates that self-debugging ability already exists in the base model w/o instruction tuning (code-davinci-002). The main difference is that we need few-shot prompting for such models to trigger self-debugging.
    Jim Fan
    @DrJimFan
    Jul 3, 2023
    GPT-4 has one emergent ability that is extremely useful and stronger than any other models: self-debug. Even the most expert human programmer cannot always get a program correct at the first try. We look at execution results, reason about what's wrong, apply fixes, rinse and
    35K
  • Xinyun Chen
    @xinyun_chen_
    Sep 9, 2023
    Thanks for sharing our work (arxiv.org/abs/2309.03409)! Besides the huge improvement with prompts optimized by LLMs, we are also amazed by the creativity of LLMs, which continually surprise us with interesting prompts tailored to the LLM in the optimization loop!
    Ethan Mollick
    @emollick
    Sep 8, 2023
    In a new paper showing that AI comes up with more effective prompts for other AIs than humans do, there is this gem that shows how weird AIs are... The single most effective prompt was to start by telling the AI "Take a deep breath and work step-by-step!" arxiv.org/pdf/2309.03409…
    29K
  • Xinyun Chen
    @xinyun_chen_
    Oct 5, 2023
    Our new work (arxiv.org/abs/2310.01798) shows that currently LLM self-correction w/o external feedback (e.g., oracle verification, code execution) often degrades the performance on reasoning tasks. The main issue is the LLM itself does not properly judge its reasoning correctness.
    Jie Huang
    xAI
    @jefffhj
    Oct 4, 2023
    Can LLMs Self-Correct Their Reasoning? Recent studies (self-refine, self-critique, etc.) suggest LLMs possess a great ability to self-correct their responses. However, our research indicates LLMs cannot self-correct their reasoning intrinsically. arxiv.org/abs/2310.01798 [1/n]
    18K
  • Xinyun Chen
    @xinyun_chen_
    Sep 11, 2023
    In our work arxiv.org/abs/2309.03409, besides prompt optimization as our primary application, we also investigate the potential of LLMs for broader optimization problems. Interestingly, LLMs can find good solutions to some small-scale classic optimization problems; e.g., TSP. This
    Chengrun Yang
    @chengrun_yang
    Sep 8, 2023
    New preprint: Large Language Models as Optimizers (arxiv.org/abs/2309.03409) (1/5)
    24K
  • Xinyun Chen
    @xinyun_chen_
    Feb 16, 2024
    Excited to share our work (read-agent.github.io) for reading long documents way exceeding the context window (up to 20x). Inspired by human reading paradigm, Read Agent summarizes the input episodically as gist memories, and uses them to retrieve relevant details when needed.
    Kuang-Huei Lee
    @kuanghueilee
    Feb 16, 2024
    We propose ReadAgent 📖, a LLM agent that reads and reasons over text up to 20x more than the raw context length. Like humans, it decides where to pause, keeps fuzzy episodic memories of past readings, and looks up detail info as needed. Just by prompting. read-agent.github.io
    11K
  • Xinyun Chen
    @xinyun_chen_
    Mar 9, 2023
    Our new work led by @JerryWeiAI on the in-context learning ability of large language models. While smaller-scale pretrained models rely more on their semantic prior, larger models can follow in-context exemplars that are even contradictory to their own knowledge.
    Jerry Wei
    @JerryWeiAI
    Mar 8, 2023
    New @GoogleAI paper: How do language models do in-context learning? arxiv.org/abs/2303.03846 Large language models (GPT-3.5, PaLM) can follow in-context exemplars, even if the labels are flipped or semantically unrelated. This ability wasn’t present in small language models. 1/
    22K
  • Xinyun Chen
    @xinyun_chen_
    Oct 4, 2023
    Our new work (arxiv.org/abs/2310.01714) shows that LLM-generated exemplars can outperform hand-crafted CoT. Interestingly, LLM-generated tutorials for competitive programming improve the results even if the generated example problems are much simpler than the new contest problem!
    arxiv.org
    Large Language Models as Analogical Reasoners
    Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we...
    16K
