arxiv.org
Apr 10, 2026
7.20/10
Medium
Agent Memory Systems
🔧 MemReader-0.6B, MemReader-4B, MemOS, GRPO (Group Relative Policy Optimization), ReAct
arxiv.org
Apr 10, 2026
6.50/10
Low
Reinforcement Learning for LLM Agents
🔧 T-STAR, Group Relative Policy Optimization
arxiv.org
Apr 10, 2026
6.50/10
Low
Reinforcement Learning for Mobile AI Agents
🔧 Android Coach, UI-TARS-1.5-7B, PPO, GRPO, AndroidLab, AndroidWorld
arxiv.org
Apr 10, 2026
6.50/10
Medium
Multimodal AI / Reinforcement Learning for LLMs
🔧 OpenVLThinkerV2, G2RPO (Gaussian GRPO)
arxiv.org
Apr 10, 2026
6.20/10
Low
3D Sketch Generation / Spatial AI Reasoning
🔧 CLIP, GRPO (Group Relative Policy Optimization)
arxiv.org
Apr 10, 2026
5.50/10
Low
Empathetic AI / Conversational AI
🔧 PEER, SER, UnifiReward, GRPO
pub.towardsai.net
Apr 8, 2026
6.50/10
Medium
LLM Fine-Tuning and Alignment
🔧 DPO (Direct Preference Optimization), GRPO (Group Relative Policy Optimization), PPO (Proximal Policy Optimization), LoRA, QLoRA, vLLM, SGLang, LMDeploy
arxiv.org
Apr 8, 2026
7.20/10
Medium
LLM Alignment & Sycophancy Reduction
🔧 GRPO (Group Relative Policy Optimisation), SycophancyEval
arxiv.org
Apr 8, 2026
7.20/10
Medium
Agentic AI Training
🔧 TRACE, LoRA, GRPO, GEPA, tau2-bench, ToolSandbox
arxiv.org
Apr 8, 2026
7.20/10
Medium
Chain-of-Thought Reasoning Optimization
🔧 ETR (Entropy Trend Reward), GRPO (Group Relative Policy Optimization), DeepSeek-R1-Distill-7B, arXiv, GitHub, DeepSeek
arxiv.org
Apr 8, 2026
7.20/10
Low
LLM Reasoning and Algorithmic Innovation
🔧 Qwen3-4B-Thinking-2507, GRPO
arxiv.org
Apr 8, 2026
7.20/10
Medium
Information Retrieval / RAG Optimization
🔧 OpenAI text-embedding-3-small, OpenAI text-embedding-3-large, Jina-ColBERT-V2, GRPO, OpenAI, Jina AI
arxiv.org
Apr 8, 2026
6.50/10
Low
LLM Training / Reinforcement Learning
🔧 ThinkTwice, GRPO (Group Relative Policy Optimization)
arxiv.org
Apr 8, 2026
6.50/10
Medium
LLM Inference Efficiency / Multi-Turn Reasoning
🔧 TAB (Turn-Adaptive Budgets), TAB All-SubQ, GRPO (Group Relative Policy Optimization)
arxiv.org
Apr 8, 2026
6.50/10
Low
Knowledge Editing in Large Language Models
🔧 CoT2Edit, RAG (Retrieval-Augmented Generation), GRPO (Group Relative Policy Optimization), SFT (Supervised Fine-Tuning), GitHub
arxiv.org
Apr 8, 2026
5.50/10
Low
Multimodal AI Reasoning / Reinforcement Learning
🔧 NoisyGRPO, Qwen2.5-VL 3B
arxiv.org
Apr 8, 2026
5.50/10
Low
Multimodal Retrieval-Augmented Generation / Re-Ranking
🔧 Region-R1, r-GRPO (region-aware group relative policy optimization), arXiv
aws.amazon.com
Apr 6, 2026
7.20/10
Medium
AI Model Fine-Tuning / Agentic AI
🔧 Amazon SageMaker AI, Kiro, MLflow, GRPO, RLVR, RLAIF, SFT, DPO
arxiv.org
Apr 6, 2026
9.50/10
High
Agentic Reinforcement Learning / Competitive Programming AI
🔧 GrandCode, Agentic GRPO, Codeforces, Google
arxiv.org
Apr 6, 2026
8.00/10
Medium
Reinforcement Learning for AI Agents
🔧 MT-GRPO, GTPO, Qwen3.5-4B, Qwen3-30B-A3B, GPT-4.1, GPT-4o, Claude Sonnet 4.5, arXiv
arxiv.org
Apr 6, 2026
7.20/10
Low
Embodied AI / Active Visual Perception
🔧 EyeVLA, GRPO (Group Relative Policy Optimization)
arxiv.org
Apr 6, 2026
6.50/10
Low
LLM Alignment / Reinforcement Learning
🔧 RTT (Rubrics to Tokens), RTT-GRPO
arxiv.org
Apr 6, 2026
6.50/10
Low
Reinforcement Learning for Language Agents
🔧 Self-Guide, GRPO
arxiv.org
Apr 6, 2026
6.50/10
Low
Autonomous Driving / Reinforcement Learning
🔧 ExploreVLA, GRPO (Group Relative Policy Optimization), NAVSIM, nuScenes
arxiv.org
Apr 6, 2026
6.20/10
Low
LLM Reasoning / Reinforcement Learning from Process Rewards
🔧 PROGRS, GRPO (Group Relative Policy Optimization)
arxiv.org
Apr 3, 2026
7.20/10
Medium
Reinforcement Learning / AI Safety
🔧 GRPO, representation engineering
arxiv.org
Apr 3, 2026
6.50/10
Low
LLM Fine-Tuning / Optimization Methods
🔧 Evolution Strategies (ES), GRPO (Group Relative Policy Optimization), GitHub
arxiv.org
Apr 3, 2026
6.50/10
Low
LLM Post-Training / Reinforcement Learning Optimization
🔧 GRPO, SDPO, SRPO
arxiv.org
Apr 3, 2026
5.50/10
Low
3D Scene Understanding / Affordance Reasoning
🔧 A3R, MLLM (Multimodal Large Language Model), GRPO (Group Relative Policy Optimization), arXiv
arxiv.org
Apr 1, 2026
8.50/10
High
AI Safety / Adversarial Fine-Tuning
🔧 Constitutional Classifiers, GRPO-based hybrid reinforcement learning, Anthropic
arxiv.org
Apr 1, 2026
8.50/10
Medium
AI Research Automation / Neural Architecture Search
🔧 ASI-Evolve, DeltaNet, GRPO, arXiv
arxiv.org
Apr 1, 2026
7.80/10
Medium
Reinforcement Learning for Clinical Decision Support
🔧 DeToxR, GRPO (Group Relative Policy Optimization)
arxiv.org
Apr 1, 2026
7.20/10
Medium
AI Calibration in Medical Imaging
🔧 ConRad, GRPO
arxiv.org
Apr 1, 2026
6.50/10
Medium
AI Agent Memory Systems
🔧 MemFactory, LLaMA-Factory, Memory-R1, RMM, MemAgent, GRPO (Group Relative Policy Optimization)
arxiv.org
Apr 1, 2026
6.50/10
Low
AI Benchmarks & Visual Code Generation
🔧 VectorGym, GRPO, VLM-as-a-Judge, Hugging Face, ServiceNow, OpenAI
arxiv.org
Apr 1, 2026
6.50/10
Low
Egocentric Video Understanding / Multimodal Reasoning
🔧 EgoReasoner, GRPO, Chain-of-Thought (CoT), Qwen
arxiv.org
Apr 1, 2026
6.20/10
Low
LLM Training / Reinforcement Learning
🔧 ShapE-GRPO, GRPO
arxiv.org
Mar 31, 2026
7.20/10
Medium
Vision-Language Model Reasoning
🔧 VAPO-Thinker-7B, VAPO (Vision-Anchored Policy Optimization), GRPO (Group Relative Policy Optimization)
arxiv.org
Mar 31, 2026
7.20/10
Low
Reinforcement Learning / Reasoning Models
🔧 SARL (Structure Aware Reinforcement Learning), PPO, GRPO, Qwen3-4B
arxiv.org
Mar 31, 2026
6.50/10
Low
Reinforcement Learning / LLM Training Optimization
🔧 ERPO, GRPO
arxiv.org
Mar 31, 2026
6.50/10
Low
Autonomous Driving AI
🔧 AutoDrive-P3, P3-CoT, P3-GRPO, arXiv, GitHub
arxiv.org
Mar 31, 2026
6.50/10
Low
Reinforcement Learning for Video Generation
🔧 GRPO (Group Relative Policy Optimization), Wan-R1
arxiv.org
Mar 31, 2026
6.50/10
Low
Reinforcement Learning for LLM Agents
🔧 RetroAgent, SimUtil-UCB, GRPO, ALFWorld, WebShop, Sokoban, MineSweeper
arxiv.org
Mar 31, 2026
6.20/10
Low
Multimodal AI / Retrieval-Augmented Generation
🔧 OmniRAG-Agent, OmniLLM, GRPO (Group Relative Policy Optimization), OmniVideoBench, WorldSense, Daily-Omni
arxiv.org
Mar 31, 2026
5.50/10
Low
Vision-Language Model Training / Reinforcement Learning from Feedback
🔧 GRPO (Group Relative Policy Optimization), Differential Feedback
arxiv.org
Mar 31, 2026
5.50/10
Low
Medical AI / Reinforcement Learning
🔧 MedLoc-R1, GRPO (Group Relative Policy Optimization), GitHub, MembrAI
arxiv.org
Mar 31, 2026
5.50/10
Low
Reinforcement Learning for Generative Models
🔧 Flow-GRPO, Stepwise-Flow-GRPO, DDIM
arxiv.org
Mar 27, 2026
7.20/10
Medium
Adversarial Robustness / Robotic AI Security
🔧 SABER, GRPO, ReAct, LIBERO benchmark
arxiv.org
Mar 27, 2026
6.50/10
Low
Reinforcement Learning / LLM Training Efficiency
🔧 GRPO, DAPO, ARRoL
arxiv.org
Mar 27, 2026
6.50/10
Low
Mixture-of-Experts / Vision-Language Model Optimization
🔧 MoE-GRPO, GRPO (Group Relative Policy Optimization), arXiv