Matthew Carrigan (@carrigmat) / X

Matthew Carrigan

2,308 posts

Matthew Carrigan

@carrigmat

@huggingface engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working He/him

Dublin, Ireland

github.com/rocketknight1

Joined April 2021

Pinned
Matthew Carrigan
@carrigmat
Aug 12, 2024
Big announcement today @huggingface: We now have a unified API for tool use across models from @MistralAI, @AIatMeta, @cohere, @NousResearch and more! That means that you can reuse the same simplified, portable code to add tool capabilities to all of those models! 🧵
78K
Matthew Carrigan
@carrigmat
Jan 28, 2025
Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost, $6,000. All download and part links below:
5.5M
Matthew Carrigan
@carrigmat
Nov 29, 2024
The prophecy of @karpathy has come to pass
Julien Chaumond
@julien_c
Nov 29, 2024
Qwen QwQ switching to Chinese when it needs to _really think_ about something, then switching back to English, is pretty cool @Alibaba_Qwen
189K
Matthew Carrigan
@carrigmat
Jul 27, 2024
I almost spat water all over my screen while reading the LLM benchmarks on r/LocalLLaMA
219K
Matthew Carrigan
@carrigmat
Jun 21, 2024
Good morning. At some point this summer, perhaps quite soon, @AIatMeta will be releasing a LLaMA-3 model with 400B parameters. It will likely be the strongest open-source LLM ever released by a wide margin. This is a thread about how to run it locally. 🧵
373K
Matthew Carrigan
@carrigmat
Feb 1, 2022
Deep learning pro tip: When submitting a paper for blind review, claim that you used JAX + Haiku. Unable to see the author byline, the reviewers will assume you're at DeepMind and be intimidated into automatically accepting you, possibly even for a keynote presentation.
Matthew Carrigan
@carrigmat
Jan 28, 2025
Replying to @carrigmat
And once it passes that test, just use llama-server to host the model and pass requests in from your other software. You now have frontier-level intelligence hosted entirely on your local machine, all open-source and free to use!
165K
Matthew Carrigan
@carrigmat
Jan 28, 2025
Replying to @carrigmat
Motherboard: Gigabyte MZ73-LM0 or MZ73-LM1. We want 2 EPYC sockets to get a massive 24 channels of DDR5 RAM to max out that memory size and bandwidth.
MZ73-LM0 (Rev. 3.x) Server Motherboard - GIGABYTE Global
From gigabyte.com
341K
Matthew Carrigan
@carrigmat
Apr 9, 2024
Alright, strap in. Support for Command-R+ was merged into llama.cpp exactly 4 hours ago. We're going to start talking to a GPT-4 level model on local hardware without a GPU. If you have 64GB of RAM, feel free to follow along 🧵
281K
Matthew Carrigan
@carrigmat
Jan 28, 2025
Replying to @carrigmat
And if you got this far: Yes, there's no GPU in this build! If you want to host on GPU for faster generation speed, you can! You'll just lose a lot of quality from quantization, or if you want Q8 you'll need >700GB of GPU memory, which will probably cost $100k+
170K
Matthew Carrigan
@carrigmat
Jan 28, 2025
Replying to @carrigmat
If all goes well, you should witness a short load period followed by the stream of consciousness as a state-of-the-art local LLM begins to ponder your question:
00:00
183K
Matthew Carrigan
@carrigmat
Aug 29, 2024
An elegant idea I got from a @GoogleDeepMind paper years back: When doing continuous-valued regression with a neural net, don't have a single neuron output estimating the value. Instead, have a layer of neurons outputting the mean/SD/weight of gaussians. 🧵
134K
Matthew Carrigan
@carrigmat
Feb 16, 2022
Real programmers debug by putting something like print("aa aaaa AAAAA") inside a hot loop.
Matthew Carrigan
@carrigmat
Jan 28, 2025
Replying to @carrigmat
RAM: This is the big one. We are going to need 768GB (to fit the model) across 24 RAM channels (to get the bandwidth to run it fast enough). That means 24 x 32GB DDR5-RDIMM modules. Example kits: v-color.net/products/ddr5-… newegg.com/nemix-ram-384g…
299K