We present R-MCTS and Exploratory Learning for building o1-like models for agentic applications. Our R-MCTS agent extends traditional MCTS with (1) contrastive reflection, which lets the agent learn from past successes and mistakes, and (2) a multi-agent-debate value function. Exploratory Learning is a novel training strategy that teaches models to explore the environment, evaluate each state, and backtrack to a viable state when the current one is unpromising. Both R-MCTS and Exploratory Learning exhibit compute-scaling properties at training and test time.
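To make the explore/evaluate/backtrack loop concrete, here is a minimal sketch of that control flow on a toy number-line environment. All names here (`ToyEnv`, `value_fn`, `explore_with_backtracking`, the `threshold` parameter) are illustrative assumptions, not the actual implementation: in practice the value function would be a learned model (e.g. the multi-agent-debate evaluator), not a hand-written heuristic.

```python
import random

class ToyEnv:
    """Hypothetical toy environment: walk along a number line toward a goal."""

    def __init__(self, goal=5):
        self.goal = goal

    def step(self, state, action):
        # Actions are -1 (left) or +1 (right).
        return state + action

def value_fn(state, goal):
    """Stand-in value function: states closer to the goal score higher."""
    return 1.0 / (1.0 + abs(goal - state))

def explore_with_backtracking(env, start=0, threshold=0.2, max_steps=50):
    """Exploratory-learning-style loop: try an action, evaluate the
    resulting state, and keep (backtrack to) the current viable state
    when the new one looks unpromising."""
    random.seed(0)  # fixed seed for reproducibility of the sketch
    state = start
    trajectory = [state]
    for _ in range(max_steps):
        if state == env.goal:
            break
        action = random.choice([-1, 1])
        nxt = env.step(state, action)
        improved = value_fn(nxt, env.goal) >= value_fn(state, env.goal)
        viable = value_fn(nxt, env.goal) >= threshold
        if improved or viable:
            state = nxt  # accept the explored state
        # else: unpromising state -> backtrack (stay at current state)
        trajectory.append(state)
    return state, trajectory

if __name__ == "__main__":
    env = ToyEnv(goal=5)
    final, traj = explore_with_backtracking(env)
    print(final, len(traj))
```

The key design point the sketch illustrates is that evaluation happens *before* committing to a state transition, so the agent can reject low-value states instead of continuing down an unpromising branch.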