Xinyun Chen (@xinyun_chen_)

232 posts

Xinyun Chen

@xinyun_chen_

Research Scientist @Meta MSL. Prev. @GoogleDeepMind. PhD @Berkeley_EECS.

Joined February 2020

Pinned
Xinyun Chen
@xinyun_chen_
Apr 9
It’s been a great honor to work in the incredible team on our first milestone towards personal superintelligence! Really proud of what we have achieved in the past nine months. Try our new models on meta.ai and let us know your feedback!
Alexandr Wang
@alexandr_wang
Apr 8
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
6.2K
Xinyun Chen
@xinyun_chen_
Apr 12, 2023
New preprint: Teach LLMs to self-debug! (arxiv.org/abs/2304.05128) With few-shot demonstrations, LLMs can perform rubber duck debugging: w/o error messages, they can identify bugs by explaining the predicted code. SOTA on several code generation benchmarks using code-davinci-002.
140K
Xinyun Chen
@xinyun_chen_
Nov 30, 2023
New preprint: Universal Self-Consistency for Large Language Model Generation arxiv.org/abs/2311.17311 We propose Universal Self-Consistency (USC) to aggregate free-form responses, such as code generation and summarization, where the original SC is not applicable.
52K
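The aggregation step described in the post above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording is invented for the example, and `llm` is a hypothetical callable (prompt string in, reply string out) standing in for whatever model API is used. The idea shown is that, instead of voting on exact-match answers, all sampled responses are placed in one prompt and the model is asked to pick the most consistent one.

```python
import re
from typing import Callable, List


def build_usc_prompt(question: str, responses: List[str]) -> str:
    """Put every candidate response into a single selection prompt.

    The wording here is illustrative only (an assumption), not the
    prompt used in the paper.
    """
    numbered = "\n\n".join(
        f"Response {i}: {text}" for i, text in enumerate(responses, start=1)
    )
    return (
        f"Question: {question}\n\n"
        f"{numbered}\n\n"
        "Select the most consistent response based on majority consensus. "
        'Reply in the form "The most consistent response is Response X".'
    )


def universal_self_consistency(
    question: str, responses: List[str], llm: Callable[[str], str]
) -> str:
    """Return the response that the model selects as most consistent.

    `llm` is a hypothetical callable; swap in a real model client.
    Falls back to the first response if the reply cannot be parsed.
    """
    if not responses:
        raise ValueError("need at least one candidate response")
    reply = llm(build_usc_prompt(question, responses))
    match = re.search(r"Response\s+(\d+)", reply)
    if match:
        index = int(match.group(1)) - 1
        if 0 <= index < len(responses):
            return responses[index]
    return responses[0]
```

Because selection is delegated to the model, the same function applies to free-form outputs such as summaries or code, where majority voting over exact answers has nothing to count.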
Xinyun Chen
@xinyun_chen_
Oct 13, 2023
Our new work (arxiv.org/abs/2310.07064) shows that LLMs can learn (sometimes uncommon) rules with 2 stages: (1) induction: generate and verify rules from exemplars; (2) deduction: utilize the rule library for new problems. 11-27% gain on reasoning tasks that require rule learning.
Zhaocheng Zhu
@zhu_zhaocheng
Oct 12, 2023
🔥 When talking about training LLMs, do you think of updating model parameters? In fact, you can use LLMs to learn a rule library. This not only improves multi-step reasoning, but also has many advantages: interpretability, transferability, and applicable to black-box LLMs. 🧵1/6
44K
Xinyun Chen
@xinyun_chen_
Feb 2, 2022
I am very excited to be part of the team #AlphaCode in my summer internship last year! A huge thanks to my host @liyuajia for adding me to this amazing team! Looking forward to seeing what comes next!
Google DeepMind
@GoogleDeepMind
Feb 2, 2022
Introducing #AlphaCode: a system that can compete at average human level in competitive coding competitions like @codeforces. An exciting leap in AI problem-solving capabilities, combining many advances in machine learning! Read more: dpmd.ai/Alpha-Code 1/
Xinyun Chen
@xinyun_chen_
Dec 19, 2024
Very excited to be part of the team that builds Gemini 2.0 Flash Thinking. Try our experimental model at aistudio.google.com/prompts/new_ch…. Any feedback is welcome and appreciated!
Jeff Dean
@JeffDean
Dec 19, 2024
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time
20K
Xinyun Chen
@xinyun_chen_
Feb 15, 2024
New preprint🔥: Premise Order Matters in Reasoning with Large Language Models arxiv.org/abs/2402.08939 In typical logical reasoning, premise order doesn't matter. However, for SOTA LLMs, changing the premise order may cause an accuracy drop of >30%! 🧵 1/8
11K
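One way to probe the premise-order effect described above is to build several prompts that differ only in the order of their premises and compare model accuracy across them. The sketch below covers only the prompt-construction side; the function names, prompt layout, and example premises are assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import List


def build_prompt(premises: List[str], question: str) -> str:
    """Format numbered premises followed by the question."""
    lines = [f"{i}. {p}" for i, p in enumerate(premises, start=1)]
    return "Premises:\n" + "\n".join(lines) + f"\nQuestion: {question}"


def premise_order_variants(
    premises: List[str], question: str, n_variants: int, seed: int = 0
) -> List[str]:
    """Return the original-order prompt plus `n_variants` shuffled prompts.

    Every variant contains the same premises, so a logically sound
    reasoner should give the same answer for each of them.
    """
    rng = random.Random(seed)  # seeded so the variants are reproducible
    prompts = [build_prompt(premises, question)]
    for _ in range(n_variants):
        shuffled = list(premises)
        rng.shuffle(shuffled)
        prompts.append(build_prompt(shuffled, question))
    return prompts


# Hypothetical example problem.
premises = [
    "If A then B.",
    "If B then C.",
    "A is true.",
]
variants = premise_order_variants(premises, "Is C true?", n_variants=3)
```

Sending each entry of `variants` to the same model and comparing the answers gives a simple per-problem measure of sensitivity to premise order.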
Xinyun Chen
@xinyun_chen_
Jul 4, 2023
Our work (x.com/xinyun_chen_/s…) demonstrates that self-debugging ability already exists in the base model w/o instruction tuning (code-davinci-002). The main difference is that we need few-shot prompting for such models to trigger self-debugging.
Jim Fan
@DrJimFan
Jul 3, 2023
GPT-4 has one emergent ability that is extremely useful and stronger than any other models: self-debug. Even the most expert human programmer cannot always get a program correct at the first try. We look at execution results, reason about what's wrong, apply fixes, rinse and
35K
Xinyun Chen
@xinyun_chen_
Sep 9, 2023
Thanks for sharing our work (arxiv.org/abs/2309.03409)! Besides the huge improvement with prompts optimized by LLMs, we are also amazed by the creativity of LLMs, which continually surprise us with interesting prompts tailored to the LLM in the optimization loop!
Ethan Mollick
@emollick
Sep 8, 2023
In a new paper showing that AI comes up with more effective prompts for other AIs than humans do, there is this gem that shows how weird AIs are... The single most effective prompt was to start by telling the AI "Take a deep breath and work step-by-step!" arxiv.org/pdf/2309.03409…
29K
Xinyun Chen
@xinyun_chen_
Oct 5, 2023
Our new work (arxiv.org/abs/2310.01798) shows that currently LLM self-correction w/o external feedback (e.g., oracle verification, code execution) often degrades the performance on reasoning tasks. The main issue is the LLM itself does not properly judge its reasoning correctness.
Jie Huang
@jefffhj
Oct 4, 2023
Can LLMs Self-Correct Their Reasoning? Recent studies (self-refine, self-critique, etc.) suggest LLMs possess a great ability to self-correct their responses. However, our research indicates LLMs cannot self-correct their reasoning intrinsically. arxiv.org/abs/2310.01798 [1/n]
18K
Xinyun Chen
@xinyun_chen_
Sep 11, 2023
In our work arxiv.org/abs/2309.03409, besides prompt optimization as our primary application, we also investigate the potential of LLMs for broader optimization problems. Interestingly, LLMs can find good solutions to some small-scale classic optimization problems; e.g., TSP. This
Chengrun Yang
@chengrun_yang
Sep 8, 2023
New preprint: Large Language Models as Optimizers (arxiv.org/abs/2309.03409) (1/5)
24K
Xinyun Chen
@xinyun_chen_
Feb 16, 2024
Excited to share our work (read-agent.github.io) for reading long documents way exceeding the context window (up to 20x). Inspired by human reading paradigm, Read Agent summarizes the input episodically as gist memories, and uses them to retrieve relevant details when needed.
Kuang-Huei Lee
@kuanghueilee
Feb 16, 2024
We propose ReadAgent 📖, an LLM agent that reads and reasons over text up to 20x more than the raw context length. Like humans, it decides where to pause, keeps fuzzy episodic memories of past readings, and looks up detail info as needed. Just by prompting. read-agent.github.io
11K
Xinyun Chen
@xinyun_chen_
Mar 9, 2023
Our new work led by @JerryWeiAI on the in-context learning ability of large language models. While smaller-scale pretrained models rely more on their semantic prior, larger models can follow in-context exemplars that are even contradictory to their own knowledge.
Jerry Wei
@JerryWeiAI
Mar 8, 2023
New @GoogleAI paper: How do language models do in-context learning? arxiv.org/abs/2303.03846 Large language models (GPT-3.5, PaLM) can follow in-context exemplars, even if the labels are flipped or semantically unrelated. This ability wasn’t present in small language models. 1/
22K
Xinyun Chen
@xinyun_chen_
Oct 4, 2023
Our new work (arxiv.org/abs/2310.01714) shows that LLM-generated exemplars can outperform hand-crafted CoT. Interestingly, LLM-generated tutorials for competitive programming improve the results even if the generated example problems are much simpler than the new contest problem!
arxiv.org
Large Language Models as Analogical Reasoners
Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we...
16K