cs.AI updates on the arXiv.org e-print archive.
TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories
1 week ago
ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning
1 week ago
Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
1 week ago
The most recent home feed on DEV Community.
Cert-gating every tool call: zero-trust for AI agents
1 week ago
Claude Code install and config for Ollama, llama.cpp, pricing
1 week ago
Hermes AI Assistant - Install, Setup, Workflow, and Troubleshooting
1 week ago
cs.CV updates on the arXiv.org e-print archive.
Self-Improving 4D Perception via Self-Distillation
1 week ago
UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding
1 week ago
Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains
1 week ago
cs.LG updates on the arXiv.org e-print archive.
GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning
1 week ago
SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
1 week ago
Sensitivity-Positional Co-Localization in GQA Transformers
1 week ago
Making AI accessible to 100K+ learners. Find the most practical, hands-on and comprehensive AI Engineering and AI for Work certifications at academy.towardsai.net - we have pathways for any experience ...
21 Models in One Pipeline: What Actually Drives Knowledge Graph Quality
1 week ago
Your AI Assistant Is Lying to You — And It’s Not the AI’s Fault
1 week ago
I tested GLM-5.1 — it beat GPT-5.4 & Claude Opus 4.6 and is 7.8× cheaper.
1 week ago
cs.CL updates on the arXiv.org e-print archive.
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
1 week ago
TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving
1 week ago
PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
1 week ago
Community focused on running large language models locally. Covers llama.cpp, Ollama, quantization, and open-weight models.
New TTS Model: VoxCPM2
2 weeks ago
Just bought a DGX Spark, what kind of VLMs are you guys running on this kind of hardware?
2 weeks ago
[AutoBe] Qwen 3.5-27B Just Built Complete Backends from Scratch — 100% Compilation, 25x Cheaper
2 weeks ago
Community for discussing Anthropic's Claude AI assistant, sharing prompts, use cases, and tips.
Claude Code hallucinated a GitHub username for tool usage
2 weeks ago
I used Claude to build a full networking protocol for AI agents. It’s now at 12K+ nodes across 19 countries.
2 weeks ago
I built an MCP that gives Claude Code its own servers to fix bugs in parallel
2 weeks ago
stat.ML updates on the arXiv.org e-print archive.
LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers
1 week ago
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
1 week ago
Differentially Private Language Generation and Identification in the Limit
1 week ago
Artificial Intelligence: News, Business, Research
LLMs crush coding and math but choke on casual questions, and that's not a contradiction
1 week ago
New Stanford study reveals when teaming up AI agents is worth the compute
1 week ago
Anthropic launches managed infrastructure for autonomous AI agents
1 week ago
Publish AI, ML & data-science insights to a global community of data professionals.
Why MLOps Retraining Schedules Fail — Models Don’t Forget, They Get Shocked
1 week ago
How Does AI Learn to See in 3D and Understand Space?
1 week ago
A Visual Explanation of Linear Regression
1 week ago
Discussion forum for machine learning research, papers, projects, and career advice.
Free tool I built to score dataset quality (LQS) — feedback welcome [D]
2 weeks ago
[P] Building a LLM from scratch with Mary Shelley's "Frankenstein" (on Kaggle)
2 weeks ago
[R] Hybrid attention for small code models: 50x faster inference, but data scaling still dominates
2 weeks ago
Community for the Ollama project — running LLMs locally, model management, and self-hosted AI.
Gemma 4 E2B and Qwen 3.5 2B on a Raspberry Pi 5 with Ollama — here's what each one is actually good for
2 weeks ago
I built a free, open-source CLI coding agent specifically for LLMs with 8k context windows.
2 weeks ago
The Open Source AI Lie: Weight-Washing, Broken Definitions, and Who Benefits
2 weeks ago
Technology insight for the enterprise
AWS targets AI agent sprawl with new Bedrock Agent Registry
1 week ago
AI agents aren’t failing. The coordination layer is failing
1 week ago
Anthropic rolls out Claude Managed Agents
1 week ago
InfoQ AI, ML & Data Engineering feed
AAIF's MCP Dev Summit: Gateways, gRPC, and Observability Signal Protocol Hardening
1 week ago
Article: Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery
1 week ago
Google Brings MCP Support to Colab, Enabling Cloud Execution for AI Agents
1 week ago
Rapid AI paper summaries and research news
Alibaba’s Tongyi Lab Releases VimRAG: a Multimodal RAG Framework that Uses a Memory Graph to Navigate Massive Visual Contexts
1 week ago
A Coding Guide to Markerless 3D Human Kinematics with Pose2Sim, RTMPose, and OpenSim
1 week ago
NVIDIA Releases AITune: An Open-Source Inference Toolkit That Automatically Finds the Fastest Inference Backend for Any PyTorch Model
1 week ago
Enterprise technology leadership news covering IT strategy, digital transformation, and CIO decision-making.
Understanding tokenization and consumption in LLMs
1 week ago
Q&A: Design principles for multi-environment AI architectures
3 weeks ago
NetSuite expands toolkit to ease enterprise use of third-party AI assistants with ERP data
3 weeks ago
cs.IR updates on the arXiv.org e-print archive.
DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation
1 week ago
ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment
1 week ago
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature
1 week ago
BANKING77-77: New best of 94.61% on the official test set (+0.13pp over our previous best of 94.48%).
2 weeks ago
Finally Abliterated Sarvam 30B and 105B!
2 weeks ago
Hugging Face contributes Safetensors to PyTorch Foundation to secure AI model execution
2 weeks ago
Community for deep learning practitioners covering neural networks, architectures, training techniques, and research papers.
Gemma 4 E4B enterprise benchmark — structured output, compliance, and reasoning results
2 weeks ago
BREAKING : Anthropic announced Claude Managed Agents in public beta on Claude Platform!
2 weeks ago
Can AI ignore "Hospital Food" complaints to find a Brain Tumor? MANN-Engram Router
2 weeks ago
Stay updated with the latest news, research, and developments in the world of generative AI. We cover everything from AI model updates, comprehensive tutorials, and real-world applications to the broa ...
Code Got Faster. Everything Else Didn’t.
2 weeks ago
Conversating Agents for Portfolio Drift Analysis with Semantic Kernel
2 weeks ago
Modern RAG in 2026: The Components That Actually Matter
2 weeks ago
Had Claude review a popular ComfyUI node by Painter called "LongVideo" after a developer called it BS on Discord. This is Claude's full review - "The node is essentially writing data into conditioning that nothing reads".
2 weeks ago
ComfyUI LTX Lora Trainer for 16GB VRAM
2 weeks ago
Black Forest Labs just released FLUX.2 Small Decoder: a faster, drop-in replacement for their standard decoder. ~1.4x faster, Lower peak VRAM - Compatible with all open FLUX.2 models
2 weeks ago
Learn everything about Analytics
From Karpathy’s LLM Wiki to Graphify: AI Memory Layers are Here
1 week ago
How to Run Gemma 4 on Your Phone Without Internet: A Hands-On Guide
2 weeks ago
Rethinking Enterprise Search: How Cortex Search Turns Data into Business Impact
2 weeks ago
Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice.
How to Build a Secure AI PR Reviewer with Claude, GitHub Actions, and JavaScript
1 week ago
CUDA Programming for NVIDIA H100s
1 week ago
How to Build Reliable AI Systems.
1 week ago
AI Technology & Industry Review
Comment on DeepSeek-V3 New Paper is coming! Unveiling the Secrets of Low-Cost Large Model Training through Hardware-Aware Co-design by Video to Text
1 week ago
Comment on Meta’s Sapiens: Revolutionizing Human Pose, Segmentation, and Depth Estimation with Vision Transformers by openskycc com
2 weeks ago
Comment on Microsoft’s Fully Pipelined Distributed Transformer Processes 16x Sequence Length with Extreme Hardware Efficiency by gin'gin li
2 weeks ago
A community blog devoted to refining the art of rationality
A Fast and Loose Clustering of LLM Benchmarks
1 week ago
Exploring capability gated out-of-context reasoning
1 week ago
Zero-Shot Alignment: Harm Detection via Incongruent Attention Mechanisms
2 weeks ago
Agentic Infrastructure
1 week ago
Zero Data Retention on AI Gateway
2 weeks ago
Opus 4.6 Fast Mode available on AI Gateway
2 weeks ago
cs.MA updates on the arXiv.org e-print archive.
Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization
1 week ago
From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation
1 week ago
"Theater of Mind" for LLMs: A Cognitive Architecture Based on Global Workspace Theory
1 week ago
Web Directions
Quantization from the ground up | ngrok blog
3 weeks ago
We Rewrote JSONata with AI in a Day, Saved $500K/Year | Reco
3 weeks ago
Using Git with coding agents – Agentic Engineering Patterns – Simon Willison’s Weblog
1 month ago
Latest technology news, AI breakthroughs, and electric vehicle developments from China's innovative tech landscape
Li Auto Unveils StreamingClaw, a Unified Agent Framework for Embodied AI
2 weeks ago
Alibaba Launches Qwen3.6-Plus, Enhancing Coding and AI Agent Capabilities
2 weeks ago
ByteDance Launches Standalone Version of TRAE AI Coding Tool “SOLO”
3 weeks ago