Benn Stancil:

I think I'd have questions about that:

1. It makes sense to me that doing "math" on text (be it hand-wavy LLM magic or topic models or older sentiment analysis stuff or whatever) would have some weird nuances like what you're describing - but as you said, the same seems like it's true of math on numbers? I don't think either number math or text "math" makes the conclusions obvious, and there's still a lot of human interpretation to be done in both cases. But it seems to me that number math has made working with lots of numbers a lot easier, so that we're at least able to transform the numbers into something more digestible. Not conclusive, not on its own, but digestible. My question is whether we can do the same with lots of text.

2. The even more hand-wavy thought is that, while the text "math" might have these weird properties (e.g., summarize(1000) + summarize(200) makes a whole new thing), that wouldn't be the case for human researchers. If you gave 20 interviews to people and said, "tell me what is important," they'd probably say pretty much the same thing if you gave them that 20 plus 5 more. Unless those 5 are really different, or make some trend obvious, in which case they might say something new - but then, that seems like the right decision? Ultimately, I think that's my thought here: Can LLMs approximate the same results as what people would do? And if they can, that seems kinda "right"?
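The summarize(1000) + summarize(200) worry can be sketched with a toy extractive summarizer - word frequency standing in for an LLM. Everything here (the `summarize` helper, the batch data) is invented for illustration:

```python
from collections import Counter

def summarize(docs, k=3):
    # Toy "summary": the k most frequent words across all docs.
    # A hypothetical stand-in for an LLM summarizer, not a real one.
    words = [w for d in docs for w in d.lower().split()]
    return [w for w, _ in Counter(words).most_common(k)]

batch_a = ["pricing pricing support", "pricing onboarding"]
batch_b = ["support support support", "support latency"]

# Summarizing each batch and merging the summaries...
merged = summarize(batch_a) + summarize(batch_b)
# ...is not the same operation as summarizing everything at once:
combined = summarize(batch_a + batch_b)
```

Merging per-batch summaries preserves each batch's own emphasis, while summarizing the combined pile reweights everything at once - so the headline theme can flip when more text is added, which is exactly the "whole new thing" behavior.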

melee_warhead:

1) We're in the same space. Where I get concerned is that when I think about most of the "math on numbers" problems, a very large % of them are descriptive, and they work in obvious ways. So, add a high number, and the average changes.

The text-math examples are more complex modeling problems where the inflection points of change are less obvious. I'm in agreement that digestibility is likely possible. I disagree with the idea of turning it into a metric.

2) "If you gave 20 interviews to people and said, "tell me what is important," they'd probably say pretty much the same thing if you gave them that 20 plus 5 more." And there may be ways of doing this with types of training for these models, as in "retain X clusters but add Y new variables".

I just know that with people, they're doing a thoughtful trade-off evaluation on their clustering approach. Maybe a ChatGPT will just have "good enough" clustering, I don't know? My understanding of topic modeling is that there are several different types of approaches, and that it's still domain-specific (as in, the type of solution will need to match the type of problem), with multiple potential approaches. If reality strictly works a certain way, the same topics will always show up. However, I think this is more model-like, and less like standard descriptive analysis, at this point in time. As in, a bit more "fiddly" & "hand-wavy" than the comparison set of objective numerical metrics.
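One way to see the "fiddly" point: even a minimal clustering routine settles on different "topics" depending on where it starts. A toy 1-D k-means sketch (not a real topic model like LDA - just an illustration of initialization-sensitivity, with invented data):

```python
def kmeans_1d(points, centers, iters=20):
    # Tiny 1-D k-means: repeatedly assign points to the nearest center,
    # then move each center to its cluster's mean.
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(round(c, 1) for c in centers)

# Nine made-up "documents" on a single axis, in three loose groups:
scores = [1, 2, 3, 10, 11, 12, 20, 21, 22]

run_a = kmeans_1d(scores, centers=[1, 2])   # -> [2.0, 16.0]
run_b = kmeans_1d(scores, centers=[1, 22])  # -> [6.5, 21.0]
```

Same data, same algorithm, two different starting points, two different stable "topic" structures - the model converges either way, and nothing in the output tells you which carve-up was "right."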

Maybe I'm off-base, or I'm being thrown off by cursory research into topic models a few months back. I can (in theory) see a company getting used to this approach, but it's not obvious.

Benn Stancil:

1) Sure, that's fair. It's definitely not precise. You couldn't do anything properly scientific this way, I don't think. It'd really have to be more like humanities research, where there's not only variability, but sometimes outright disagreement. (Though as I say that, I do wonder if there'd be some sort of rough "central limit theorem" with this, where if you have large enough samples, every model built in broadly similar ways would converge-ish. But who knows.)
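The rough "central limit theorem" hunch can at least be simulated: imagine many independent "model runs," each sampling interviews from the same underlying population and reporting the dominant theme. A sketch under wholly invented assumptions (the topic mix, sample sizes, and run counts are all made up):

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical population of interview "topics" with a true skew:
population = ["pricing"] * 50 + ["support"] * 30 + ["onboarding"] * 20

def top_topic(sample_size):
    # One "model run": sample interviews, report the most common topic.
    sample = random.choices(population, k=sample_size)
    return Counter(sample).most_common(1)[0][0]

small_runs = [top_topic(5) for _ in range(200)]    # small samples: runs disagree
large_runs = [top_topic(500) for _ in range(200)]  # large samples: runs converge
```

With tiny samples, different runs regularly crown different themes; with large ones, nearly every run lands on the same answer - converge-ish, as long as the "models" are drawing from broadly the same pool.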

2) I could see "different models do different things" also being related to the topic modeling stuff you're describing. Even if LLMs (and AI generally) weren't fundamentally probabilistic, you can always get different results by asking questions in slightly different ways, training the models differently, using slightly different models, and so on. So even if one company had a standard approach for how they do it, it's almost more cultural. The research analogy might still work there: Give 20 interviews to one research team; they'll give you X back. Give 20+5 to the same team, and you'll probably get X-ish. Give 20 to a different team, and who knows? You could get something entirely different.