Is 'compounding correctness' mainly observed in the coding or math domains? Or in other complex real-world domains too? Especially for open-ended tasks?
Extraordinary claims need extraordinary evidence.
If 'compounding correctness' is reliable, why is it not being used by Anthropic to solve unsolved math conjectures every hour? How about every week? Every month?
Or why is it not being used to create new breakthrough killer apps or middleware or efficient OSes or new device drivers or publication-worthy algorithms?
How out-of-training-distribution does the task have to get before the situation reverts to compounding error, heh?
> Is 'compounding correctness' mainly observed in the coding or math domains? Or in other complex real-world domains too? Especially for open-ended tasks?
In the video above, Rohan shows how it can be used for UX design, for example.
> If 'compounding correctness' is reliable, why is it not being used by Anthropic to solve unsolved math conjectures every hour? How about every week? Every month?
> Or why is it not being used to create new breakthrough killer apps or middleware or efficient OSes or new device drivers or publication-worthy algorithms?
I feel like these things are happening? Compounding correctness is not free, and the token cost is not always worth it, but we are absolutely seeing publications around AI-driven or AI-supplemented gains in, e.g., mathematics.
> How out-of-training-distribution does the task have to get before the situation reverts to compounding error, heh?
Billion-dollar question, tbh.