Scaling Textual Gradients via Sampling-Based Momentum

Ding, Zixin; Hong, Junyuan; Shi, Zhan; Wang, Jiachen T.; Lin, Zinan; Yin, Li; Liu, Meng; Wang, Zhangyang; Chen, Yuxin

doi:10.1145/3786335.3813168

arXiv:2506.00400 (cs)

[Submitted on 31 May 2025 (v1), last revised 29 Jun 2026 (this version, v4)]

Abstract: LLM-based prompt optimization, which uses LLM-provided "textual gradients" (feedback) to refine prompts, has emerged as an effective method for automatic prompt engineering. However, its scalability and stability remain unclear as more training data is used. We systematically investigate the potential and challenges of scaling training data in textual gradient descent. We show that naively scaling training examples is infeasible due to both explicit context-length limits and an implicit context wall, where long-context degradation yields diminishing returns. Inspired by prior wisdom in stochastic gradient descent, we propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), which reweights updates through momentum sampling, using bootstrapped minibatch validation accuracy as importance weights over historical prompts. To stabilize TSGD and enable effective scaling within a limited context window, TSGD-M carries information from prior prompts by dynamically exploring the top-performing past prompts without expanding the input context length. TSGD-M integrates seamlessly into existing prompt optimization frameworks, including TextGrad, DSPy-COPRO, and AdalFlow, and achieves consistent gains across 6 benchmarks.
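
The momentum-sampling idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so everything below is an assumption made for illustration: the function names (`bootstrap_accuracy`, `sample_momentum_prompt`), the `score_fn` callback, the batch size, and the choice to sample a historical prompt with probability proportional to its bootstrapped minibatch accuracy. It is not the authors' implementation.

```python
import random


def bootstrap_accuracy(prompt, validation_set, score_fn, batch_size, rng):
    """Estimate a prompt's accuracy on one bootstrapped minibatch.

    The minibatch is drawn with replacement from the validation set.
    score_fn(prompt, example) is a hypothetical callback returning 1.0
    for a correct answer and 0.0 otherwise.
    """
    batch = [rng.choice(validation_set) for _ in range(batch_size)]
    return sum(score_fn(prompt, example) for example in batch) / batch_size


def sample_momentum_prompt(history, validation_set, score_fn,
                           batch_size=8, rng=None):
    """Pick one historical prompt, weighted by bootstrapped accuracy.

    Assumption: weights are used directly as unnormalized sampling
    probabilities. If every weight is zero, fall back to a uniform draw.
    """
    rng = rng or random.Random()
    weights = [
        bootstrap_accuracy(p, validation_set, score_fn, batch_size, rng)
        for p in history
    ]
    if sum(weights) == 0:
        return rng.choice(history)
    return rng.choices(history, weights=weights, k=1)[0]
```

Under these assumptions, each optimization step would call `sample_momentum_prompt` on the list of past prompts and continue from the returned one, so information from earlier prompts is reused without concatenating them into the LLM's input context.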

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.00400 [cs.CL]
	(or arXiv:2506.00400v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2506.00400
Journal reference:	CAIS '26: Proceedings of the ACM Conference on AI and Agentic Systems, 2026
Related DOI:	https://doi.org/10.1145/3786335.3813168

Submission history

From: Zixin Ding
[v1] Sat, 31 May 2025 05:35:45 UTC (3,267 KB)
[v2] Thu, 13 Nov 2025 00:48:56 UTC (466 KB)
[v3] Tue, 18 Nov 2025 00:22:57 UTC (473 KB)
[v4] Mon, 29 Jun 2026 04:01:03 UTC (1,217 KB)
