Value Function in Reinforcement Learning

Daguan Data Intelligent Recommendation: The Core of Building a Highly Relevant Recommendation Mechanism for Short Video Information Platforms

When opening a short video information app and swiping across the screen, users expect to find content that is 'just what they want to see' — but the reality is often different: during commutes, users ...

The 9 old-school money habits that modern experts say are actually genius

In today’s world of contactless payments that ...

EMNLP2025 | The Combination of SFT and RL: vivo AI Lab Proposes a New Post-Training Method

Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) fine-tuning are two common methods for post-training large models. While reinforcement learning fine-tuning has made significant progress ...
