When the actor in your systems is an AI
Anthropic, Grab and Meta each shipped infrastructure work this week that, read together, sketches the early blueprint for running autonomous AI agents safely inside a real company.
3 papers
// plain-english deep dives into the research shaping the field
Prediction is turning into a commodity. The value is migrating to everything around it — calibration, execution, reusable representations, and the decision a forecast is actually meant to serve.
6 papers
Anthropic, Grab and Meta each shipped infrastructure work this week that, read together, sketches the early blueprint for running autonomous AI agents safely inside a real company.
3 papers
A maturity-aware graph model turns a tiny forecasting edge into a smooth, near-market-neutral return stream from commodity calendar spreads — correlation with the S&P 500 of about minus two-hundredths.
arXiv:2606.25811
The standard ways we grade a model's confidence can be not just unhelpful but actively misleading — and the fix is to stop grading the forecast and start grading the decision it enables.
arXiv:2606.26990
In fast markets a mid-price signal decays in microseconds, so latency is part of the strategy — not an implementation detail. "Best model" is meaningless without a time budget.
arXiv:2606.25986
Correctness, infrastructure, culture. Jane Street, Slack and The Guardian each pulled a different lever this month — and together they show where lasting advantage hides when demos get cheap.
3 papers
Three teams — Cloudflare, Netflix and Airbnb — on three faces of staying reliable at scale: how to undo a half-finished job, schedule a flood of work fairly, and push changes to thousands of servers without breaking any.
3 papers
Traditional weather uncertainty methods claim ninety percent confidence on rainfall but catch the truth only a third of the time. A distribution-free fix lands where forecasting begins.
arXiv:2606.27001
Point a pretrained time-series model at the stock market and it wins almost every contest. Look closer and the gaps are a fraction of a single bit — a ranking victory that carries no edge you could trade.
arXiv:2606.27100
A retailer may need a billion product-by-store forecasts a week. This method predicts only the smooth 0.3% of them, then splits the rest with loaded dice so everything adds up.
arXiv:2606.26774
A line of algebra can match forecasting architectures thousands of times its size — if you stop scaling the model and start tuning the preprocessing nobody bothers to touch.
arXiv:2606.27282
Language models shatter numbers into meaningless fragments before they ever reason about them. Fix that one interface — not the model — and forecasting accuracy jumps.
OpenReview preprint
A camera that physically follows a ball in flight, cancels its travel, and reads the spin off the surface — live, on the regular ball, at 750 measurements a second.
arXiv preprint
A popular way of training trajectory predictors quietly corrupts the probabilities a self-driving planner relies on. The fix needs no retraining — just a clearer view of what went wrong.
arXiv:2606.26424
Neural networks can beat the decades-old workhorses of bond forecasting — but this study insists on the harder test, asking whether a better forecast actually makes a better trade.
arXiv:2606.26815
A machine-learning system can be genuinely right about Bitcoin's next move and still lose almost everything — because the cost of acting on each small correct call exceeds the call itself.
arXiv:2606.00060
One reusable embedding of every on-ball event, learned BERT-style with the player names stripped out — and reused for expected goals, action value, and scouting.
arXiv:2606.09327
A sixty-year-old chess-rating trick turns out to be a special case of a clean statistical principle — one that extends naturally to draws, scorelines, and whole-field rankings, with fairness baked in.
arXiv:2604.09143
Today's AI coding agents close ten-minute tickets with ease. Give them a forty-hour project — port Kubernetes, clone Slack — and the best of them fail seven times out of ten.
arXiv:2606.07682
AI weather models are brilliant on average and quietly overconfident at the tails. A cheap, distribution-free correction makes their stated uncertainty match reality.
arXiv:2606.19642
From AIs that talk to each other in shorthand to a four-line fix after a six-week bug hunt — nine recent reads, and the threads that tie them together.
9 papers
For decades feeds answered one question — of everything that exists, what should we show you? This vision argues the next question is what should we make for you.
arXiv:2304.03516
If the reader is another AI, why write in tidy English at all? A new method compresses text into dense, alien symbols that humans can't read but models still understand.
arXiv:2606.19857
Skip HTML parsing entirely — retrieve and read the web as screenshots, in pixel space, and beat text-based retrieval even on text-only questions.
PixelRAG paper (GitHub)
An AI agent follows a written rule most of the time — but "most of the time" is exactly where reliability dies. The craft is knowing which rules to prompt and which to enforce in code.
Anthropic