Log inSign up
Matthew Carrigan
2,308 posts
user avatar
Matthew Carrigan
@carrigmat
@huggingface engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working He/him
Dublin, Ireland
github.com/rocketknight1
Joined April 2021
399
Following
13.4K
Followers
  • Pinned
    user avatar
    Matthew Carrigan
    @carrigmat
    Aug 12, 2024
    Big announcement today @huggingface: We now have a unified API for tool use across models from @MistralAI, @AIatMeta, @cohere, @NousResearch and more! That means that you can reuse the same simplified, portable code to add tool capabilities to all of those models! 🧵
    78K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jan 28, 2025
    Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost, $6,000. All download and part links below:
    5.5M
  • user avatar
    Matthew Carrigan
    @carrigmat
    Nov 29, 2024
    The prophecy of @karpathy has come to pass
    user avatar
    Julien Chaumond
    @julien_c
    Nov 29, 2024
    Qwen QwQ switching to Chinese when it needs to _really think_ about something, then switching back to English, is pretty cool @Alibaba_Qwen
    189K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jul 27, 2024
    I almost spat water all over my screen while reading the LLM benchmarks on r/LocalLLaMA
    219K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jun 21, 2024
    Good morning. At some point this summer, perhaps quite soon, @AIatMeta will be releasing a LLaMA-3 model with 400B parameters. It will likely be the strongest open-source LLM ever released by a wide margin. This is a thread about how to run it locally. 🧵
    373K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Feb 1, 2022
    Deep learning pro tip: When submitting a paper for blind review, claim that you used JAX + Haiku. Unable to see the author byline, the reviewers will assume you're at DeepMind and be intimidated into automatically accepting you, possibly even for a keynote presentation.
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jan 28, 2025
    Replying to @carrigmat
    And once it passes that test, just use llama-server to host the model and pass requests in from your other software. You now have frontier-level intelligence hosted entirely on your local machine, all open-source and free to use!
    165K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jan 28, 2025
    Replying to @carrigmat
    Motherboard: Gigabyte MZ73-LM0 or MZ73-LM1. We want 2 EPYC sockets to get a massive 24 channels of DDR5 RAM to max out that memory size and bandwidth.
    MZ73-LM0 (Rev. 3.x) Server Motherboard - GIGABYTE Global
    From gigabyte.com
    341K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Apr 9, 2024
    Alright, strap in. Support for Command-R+ was merged into llama.cpp exactly 4 hours ago. We're going to start talking to a GPT-4 level model on local hardware without a GPU. If you have 64GB of RAM, feel free to follow along 🧵
    281K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jan 28, 2025
    Replying to @carrigmat
    And if you got this far: Yes, there's no GPU in this build! If you want to host on GPU for faster generation speed, you can! You'll just lose a lot of quality from quantization, or if you want Q8 you'll need >700GB of GPU memory, which will probably cost $100k+
    170K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jan 28, 2025
    Replying to @carrigmat
    If all goes well, you should witness a short load period followed by the stream of consciousness as a state-of-the-art local LLM begins to ponder your question:
    00:00
    183K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Aug 29, 2024
    An elegant idea I got from a @GoogleDeepMind paper years back: When doing continuous-valued regression with a neural net, don't have a single neuron output estimating the value. Instead, have a layer of neurons outputting the mean/SD/weight of gaussians. 🧵
    134K
  • user avatar
    Matthew Carrigan
    @carrigmat
    Feb 16, 2022
    Real programmers debug by putting something like print("aa aaaa AAAAA") inside a hot loop.
  • user avatar
    Matthew Carrigan
    @carrigmat
    Jan 28, 2025
    Replying to @carrigmat
    RAM: This is the big one. We are going to need 768GB (to fit the model) across 24 RAM channels (to get the bandwidth to run it fast enough). That means 24 x 32GB DDR5-RDIMM modules. Example kits: v-color.net/products/ddr5-… newegg.com/nemix-ram-384g…
    299K

New to X?

Sign up now to get your own personalized timeline!

Create account

By signing up, you agree to the Terms of Service and Privacy Policy, including Cookie Use.

Terms·Privacy·Cookies·Accessibility·Ads Info·© 2026 X Corp.
Don't miss what's happening
People on X are the first to know.
Log inSign up