Researchers at Stanford and MIT Introduce Stream of Search (SoS)

A machine learning framework that enables language models to learn to solve problems by searching in language, without any external support

Researchers from Stanford University, MIT, and Harvey Mudd have designed a method that teaches language models to search and backtrack by representing the search process as a serialized string, a "Stream of Search" (SoS). They propose a unified search language and demonstrate it on the game of Countdown. Pretraining Transformer-based language models on search streams increased accuracy by 25%, and further fine-tuning with policy-improvement methods solved 36% of previously unsolved problems. This suggests that language models can learn to solve problems through search, improve themselves, and discover new search strategies on their own.
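The core idea of serializing search into text can be sketched in a few lines of Python. The snippet below is an illustrative reconstruction, not the paper's exact search language: a depth-first Countdown solver that writes every step, including dead ends and backtracking, into one text stream. The `countdown_stream` function and its output format are assumptions made for illustration.

```python
from itertools import combinations

def countdown_stream(nums, target):
    """DFS over Countdown states, serializing every step -- including
    dead ends and backtracking -- into one text 'search stream'.
    (Illustrative sketch; not the paper's exact search language.)"""
    stream = []

    def dfs(state):
        stream.append(f"Current State: {target}:{sorted(state)}")
        if target in state:
            stream.append("Goal Reached")
            return True
        if len(state) == 1:
            return False
        for a, b in combinations(state, 2):
            rest = list(state)
            rest.remove(a)
            rest.remove(b)
            # Candidate operations on the pair (a, b).
            ops = [(a + b, f"{a}+{b}"), (a * b, f"{a}*{b}"),
                   (abs(a - b), f"{a}-{b}" if a >= b else f"{b}-{a}")]
            if b != 0 and a % b == 0:
                ops.append((a // b, f"{a}/{b}"))
            if a != 0 and b % a == 0:
                ops.append((b // a, f"{b}/{a}"))
            for val, expr in ops:
                stream.append(f"Exploring Operation: {expr}={val}")
                if dfs(rest + [val]):
                    return True
                stream.append(f"Backtracking from {expr}")
        return False

    solved = dfs(list(nums))
    return solved, "\n".join(stream)

solved, stream = countdown_stream([4, 6, 10], 64)
```

Because mistakes and recoveries are written into the stream rather than discarded, a model trained on such strings sees the full decision-making process, not just the final answer.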

Recent research integrates language models into search and planning systems, using them to generate and evaluate candidate actions or states. These approaches rely on symbolic search algorithms such as BFS or DFS to dictate the exploration strategy; the LM is used only at inference time, and its underlying reasoning ability is never improved. Alternatively, in-context demonstrations can describe the search procedure in language, letting the LM carry out a tree search accordingly, but these methods are limited by the demonstrations provided. Process supervision instead trains an external verifier model to give detailed feedback for LM training; it outperforms outcome supervision but requires large amounts of labeled data.
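As a rough sketch of this division of labor, the snippet below pairs a fixed symbolic best-first search with a stand-in scoring function for Countdown. The `lm_score` function is a hypothetical placeholder: in the systems described here, it would be a prompted language model evaluating states, while the surrounding search loop stays symbolic and fixed.

```python
import heapq

def lm_score(state, target):
    """Stand-in for a language-model evaluator: scores how promising a
    partial state looks (lower is better). A real system would prompt
    an LM here; this heuristic is an assumption for illustration."""
    return min(abs(v - target) for v in state)

def best_first_search(nums, target, max_steps=1000):
    """Symbolic best-first search over Countdown states. The 'LM' only
    scores states; the exploration strategy itself never changes."""
    frontier = [(lm_score(nums, target), tuple(sorted(nums)))]
    seen = set()
    steps = 0
    while frontier and steps < max_steps:
        steps += 1
        _, state = heapq.heappop(frontier)
        if target in state:
            return True
        if state in seen or len(state) == 1:
            continue
        seen.add(state)
        for i in range(len(state)):
            for j in range(i + 1, len(state)):
                a, b = state[i], state[j]
                rest = state[:i] + state[i + 1:j] + state[j + 1:]
                vals = {a + b, a * b, abs(a - b)}
                if a != 0 and b % a == 0:
                    vals.add(b // a)
                if b != 0 and a % b == 0:
                    vals.add(a // b)
                for val in vals:
                    child = tuple(sorted(rest + (val,)))
                    heapq.heappush(frontier, (lm_score(child, target), child))
    return False
```

The limitation the text describes is visible in the structure: no matter how good the scorer becomes, the strategy (best-first expansion) is hard-coded and cannot be learned or improved.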

The following is a summary of the paper:

Imagine seeing only correct solutions to problems, never the mistakes made along the way or the recoveries from them. You might conclude that problems must be solved in one clean pass rather than through exploration and error. Most of the data used to train language models (LMs) reflects only the outcome of a decision-making process, not the process itself. Trained this way, LMs never see mistakes being made, so they never learn to search, plan, or backtrack. Yet complex decision-making and reasoning require search. In this paper, we explore the effect of training LMs on the search process itself, errors included, and then letting them improve themselves.

Autoregressive Transformer-based models have proven ill-suited to planning (Valmeekam et al., 2024; Pallagani et al., 2023; Momennejad et al., 2024). Recent work attributes this weakness of autoregressive models to two major issues (LeCun, 2023; Bachmann & Nagarajan, 2024):
1) error snowballing, where individual mistakes compound and degrade performance in subsequent steps (Ross et al., 2011; Arora et al., 2022), and
2) difficulty with lookahead tasks, where a model must predict the consequences of its actions several steps in advance (credit assignment; cf. Sutton & Barto, 2018).
Both problems can be attributed to limited abilities to search and backtrack. While recent efforts combine language models with symbolic search algorithms (Ahn et al., 2022; Yao et al., 2024) to alleviate some of these issues, they are limited: the language model merely supplements the search at inference time, leaving open the question of whether a language model can learn to search effectively on its own. Arguably the most important effects of learning to search arise during training (Silver et al., 2018). If language models can learn to search during training, they may discover more flexible search strategies through self-improvement, and thus cope better with compounding errors and lookahead tasks.

The results show that Transformer-based language models can learn to solve problems through search, recovering from errors and exploring different options along the way. More importantly, our results show that these models can self-improve, autonomously applying different search strategies to solve previously unsolved problems. Finally, we saw some evidence that they discover new search strategies when trained to optimize accuracy.

Each of these operations can either be implicit, affecting only how the trajectory 𝒯 unfolds, or be expressed explicitly in language as part of the search trajectory. When an operation is implicit, the model must internalize an abstract representation of it, which can then be improved through training. An explicit operation becomes an explicit reasoning action taken by the LM. We chose to express the current state, the goal state, backtracking operations, goal checks, and the exploration of options explicitly in the trajectory, and to leave the heuristics, state values, and pruning strategy implicit.
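A trajectory with this choice of explicit operations might look like the hypothetical serialization below. The exact token format is an assumption; only the choice of which operations appear in the text (states, exploration, goal checks, backtracking) versus which stay implicit (the heuristic that ordered the candidates) follows the description above.

```python
# Hypothetical serialized trajectory for Countdown (target 64 from 4, 6, 10).
# Explicit operations appear as text; the heuristic that chose the order of
# candidate operations is never written out -- it stays implicit, something
# the model must internalize during training.
trajectory = "\n".join([
    "Current State: 64:[4, 6, 10]",
    "Exploring Operation: 6+10=16, Resulting State: [4, 16]",
    "Current State: 64:[4, 16]",
    "Exploring Operation: 4+16=20, Resulting State: [20]",
    "20 != 64, Backtracking",                 # explicit goal check + backtrack
    "Current State: 64:[4, 16]",
    "Exploring Operation: 4*16=64, Resulting State: [64]",
    "64 = 64, Goal Reached",                  # explicit goal check succeeds
])
```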

@Stanford @MIT_CSAIL
Paper: https://arxiv.org/abs/2404.03683
GitHub: https://github.com/kanishkg/stream-of-search
Quick reading: https://marktechpost.com/2024/04/10/researchers-at-stanford-and-mit-introduced-the-stream-of-search-sos-a-machine-learning-framework-that-enables-language-models-to-learn-to-solve-problems-by-searching-in-language-without-any-externa/

If you want to learn more, click the link below the video.
Thank you for watching. If you enjoyed this video, please like and subscribe. Thanks!
