cs.AI updates on the arXiv.org e-print archive.
TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories
1 week ago
ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning
1 week ago
Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
1 week ago
cs.CV updates on the arXiv.org e-print archive.
Self-Improving 4D Perception via Self-Distillation
1 week ago
Cost-Efficient Multi-Scale Fovea for Semantic-Based Visual Search Attention
1 week ago
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
1 week ago
cs.LG updates on the arXiv.org e-print archive.
Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets
1 week ago
CASE: Cadence-Aware Set Encoding for Large-Scale Next Basket Repurchase Recommendation
1 week ago
GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning
1 week ago
cs.CL updates on the arXiv.org e-print archive.
A systematic framework for generating novel experimental hypotheses from language models
1 week ago
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
1 week ago
TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving
1 week ago
cs.RO updates on the arXiv.org e-print archive.
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
1 week ago
"Why This Avoidance Maneuver?" Contrastive Explanations in Human-Supervised Maritime Autonomous Navigation
1 week ago
Spatio-Temporal Grounding of Large Language Models from Perception Streams
1 week ago
Making AI accessible to 100K+ learners. Find the most practical, hands-on and comprehensive AI Engineering and AI for Work certifications at academy.towardsai.net - we have pathways for any experience ...
21 Models in One Pipeline: What Actually Drives Knowledge Graph Quality
1 week ago
Why Temperature Matters for LLMs
1 week ago
Top 20 Anomaly Detection Interview Questions and Answers (Part 1 of 2)
1 week ago
stat.ML updates on the arXiv.org e-print archive.
LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers
1 week ago
Capturing Unseen Spatial Heat Extremes Through Dependence-Aware Generative Modeling
1 week ago
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
1 week ago
Community focused on running large language models locally. Covers llama.cpp, Ollama, quantization, and open-weight models.
Just bought a DGX Spark, what kind of VLMs are you guys running on this kind of hardware?
2 weeks ago
Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions (Part 2)
2 weeks ago
ATOM Report highlights the sheer dominance of Chinese labs in the Open-Source LLM space
2 weeks ago
A community blog devoted to refining the art of rationality
The Unintelligibility is Ours: Notes on Chain-of-Thought
1 week ago
Reproducing steering against evaluation awareness in a large open-weight model
1 week ago
Linear vs Non-linear Probes for Interpretability
1 week ago
cs.IR updates on the arXiv.org e-print archive.
Detecting RAG Advertisements Across Advertising Styles
1 week ago
Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
1 week ago
DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation
1 week ago
Discussion forum for machine learning research, papers, projects, and career advice.
Free tool I built to score dataset quality (LQS) — feedback welcome [D]
2 weeks ago
[P] Building a LLM from scratch with Mary Shelley's "Frankenstein" (on Kaggle)
2 weeks ago
[R] Hybrid attention for small code models: 50x faster inference, but data scaling still dominates
2 weeks ago
cs.MA updates on the arXiv.org e-print archive.
Enhancing Clinical Trial Patient Matching through Knowledge Augmentation and Reasoning with Multi-Agent
1 week ago
Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization
1 week ago
From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation
1 week ago
The most recent home feed on DEV Community.
Teaching Machines to Understand Documents with Docling
1 week ago
Why I debug my RAG pipeline stage by stage, not end to end
1 week ago
RedSOC: Open-source framework to benchmark adversarial attacks on AI-powered SOCs — 100% detection rate across 15 attack scenarios [paper + code]
1 week ago
Artificial Intelligence: News, Business, Research
LLMs crush coding and math but choke on casual questions, and that's not a contradiction
1 week ago
New Stanford study reveals when teaming up AI agents is worth the compute
1 week ago
Google's AI Overviews are correct nine out of ten times, study finds
2 weeks ago
Publish AI, ML & data-science insights to a global community of data professionals.
Why MLOps Retraining Schedules Fail — Models Don’t Forget, They Get Shocked
1 week ago
How Does AI Learn to See in 3D and Understand Space?
1 week ago
A Visual Explanation of Linear Regression
1 week ago
Rapid AI paper summaries and research news
Alibaba’s Tongyi Lab Releases VimRAG: a Multimodal RAG Framework that Uses a Memory Graph to Navigate Massive Visual Contexts
1 week ago
A Coding Guide to Markerless 3D Human Kinematics with Pose2Sim, RTMPose, and OpenSim
1 week ago
NVIDIA Releases AITune: An Open-Source Inference Toolkit That Automatically Finds the Fastest Inference Backend for Any PyTorch Model
1 week ago
Community for deep learning practitioners covering neural networks, architectures, training techniques, and research papers.
Gemma 4 E4B enterprise benchmark — structured output, compliance, and reasoning results
2 weeks ago
Can AI ignore "Hospital Food" complaints to find a Brain Tumor? MANN-Engram Router
2 weeks ago
Cross-Validation Explained Visually | K-Fold, Stratified, LOOCV & Nested CV
2 weeks ago
AI Technology & Industry Review
Comment on DeepSeek-V3 New Paper is coming! Unveiling the Secrets of Low-Cost Large Model Training through Hardware-Aware Co-design by Video to Text
1 week ago
Comment on Meta’s Sapiens: Revolutionizing Human Pose, Segmentation, and Depth Estimation with Vision Transformers by openskycc com
2 weeks ago
Comment on Microsoft’s Fully Pipelined Distributed Transformer Processes 16x Sequence Length with Extreme Hardware Efficiency by gin'gin li
2 weeks ago
BANKING77-77: New best of 94.61% on the official test set (+0.13pp) over our previous tests 94.48%.
2 weeks ago
Finally Abliterated Sarvam 30B and 105B!
2 weeks ago
Hugging Face contributes Safetensors to PyTorch Foundation to secure AI model execution
2 weeks ago
cs.NE updates on the arXiv.org e-print archive.
Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration
1 week ago
Multi-Modal Learning meets Genetic Programming: Analyzing Alignment in Latent Space Optimization
1 week ago
Internal noise in deep neural networks: interplay of depth, neuron number, and noise injection step
1 week ago
A recent study has found that LLMs are worse at giving accurate, truthful answers to people who have lower English proficiency and less formal education, rendering them more unreliable towards the most vulnerable users.
2 weeks ago
We are already in the early stages of recursive self improvement, which will eventually result in superintelligent AI that humans can't control - Roman Yampolskiy
2 weeks ago
Voice-based AI gets FDA breakthrough status for detecting heart failure in 5 seconds
2 weeks ago
Apple machine learning teams are engaged in state of the art research in machine learning and artificial intelligence. Learn about the latest advancements.
LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss
2 weeks ago
Beyond Real Data: Synthetic Data through the Lens of Regularization
3 weeks ago
Entropy-Preserving Reinforcement Learning
3 weeks ago
Humans with irrational brains writing about machines with rational brains.
The Intelligence Paradox: Why We're Building LLMs Wrong (And How to Fix It)
1 week ago
Bonsai-8B-gguf Shrinks an 8B Model to Just 1.15 GB
1 week ago
This 20B Search Model Helps AI Systems Find Better Evidence Faster
2 weeks ago
Top news and commentary for technology's leaders, from all around the web
The CIA says it recently used AI to create its first-ever autonomous intelligence report, and plans to build "AI co-workers" into all of its analytic platforms (John Sakellariadis/Politico)
1 week ago
Z.ai releases GLM-5.1, a 754B-parameter model that it says outperforms GPT-5.4 and Claude Opus 4.6 on SWE-bench Pro, available under an MIT license (Carl Franzen/VentureBeat)
2 weeks ago
Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, versus 53.4% for Opus 4.6 (Michael Nuñez/VentureBeat)
2 weeks ago
Import AI 452: Scaling laws for cyberwar; rising tides of AI automation; and a puzzle over GDP forecasting
2 weeks ago
Import AI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text
1 month ago
Import AI 448: AI R&D; Bytedance’s CUDA-writing agent; on-device satellite AI
1 month ago
Latest technology news, AI breakthroughs, and electric vehicle developments from China's innovative tech landscape
PolyU and OPPO Propose Vision-Only Super-Resolution Framework VOSR
1 week ago
Alibaba Launches Qwen3.5-Omni Multimodal Model
3 weeks ago
SpatialPoint Integrates Depth as Core Input for Vision-Language Models
3 weeks ago
t3n digital pioneers - News
Benchmarks should meet these 4 criteria: only then can we assess the benefit of AI in the workplace
2 weeks ago
Raven: This AI system discovers over 100 new exoplanets in old NASA data
2 weeks ago
AI models disobey commands to protect each other from being shut down
2 weeks ago
Learn everything about Analytics
From Karpathy’s LLM Wiki to Graphify: AI Memory Layers are Here
1 week ago
LLM Wiki Revolution: How Andrej Karpathy’s Idea is Changing AI
2 weeks ago
Google’s Gemma 4: Is it the Best Open-Source Model of 2026?
2 weeks ago
Last week in Generative Image & Video
2 weeks ago
Help me find optimal hyper-parameters for Ultimate Stable Diffusion Upscale and complete my master's degree!
2 weeks ago
Inpainting with reference to LTX-2.3 (MR2V)
2 weeks ago
Academic experts explain AI developments in plain language, offering research-backed perspectives on artificial intelligence
AI can design and run thousands of lab experiments without human hands. Humanity isn’t ready for the new risks this brings to biology
1 week ago
Artificial intelligence and biology: AI’s potential for launching a novel era for health and medicine
2 weeks ago
AI is reengineering drug discovery by speeding up testing and scanning petabytes of data for connections between diseases
2 weeks ago