Editor's Note
This project started as a learning exercise — I wanted to build something real with Python, statistical modeling, data pipelines, and machine learning.
I decided to take my boring investment philosophy — buy and hold cheap index funds — and do something interesting with it. This tool asks one additional question: for extra capital on the sidelines, is now a particularly good or poor time to deploy it into that same, existing philosophy?
Phase 1 — The data pipeline
30 years of S&P 500 prices, 12 FRED macro series, Shiller's CAPE data back to 1881, and 48 of Buffett's annual letters — all flowing into a local DuckDB warehouse. The first real lesson: data engineering is mostly about dealing with what's missing, broken, or mislabeled.
Phase 2 — The first models (and their honest failures)
Three models predicting 63-day market direction. The best reached AUC 0.72 — but a Phase 3 audit revealed 7 of 15 features flip their predictive sign between regimes. The model was answering a trader's question, not an investor's. I got caught up in building something I thought was cool instead of something I truly believe in.
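A sign-flip audit of this kind could be sketched as follows. This is a minimal illustration, not the project's audit code: it assumes each observation carries a regime label, and it treats the sign of the within-regime covariance between a feature and the forward direction as the "predictive sign". The function names, the regime labels, and the toy data are all hypothetical.

```python
def covariance(xs, ys):
    """Sample covariance; its sign matches the sign of the Pearson correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def regime_signs(feature, target, regimes):
    """Map each regime label to +1, -1, or 0: the sign of the feature/target covariance."""
    signs = {}
    for regime in sorted(set(regimes)):
        xs = [x for x, r in zip(feature, regimes) if r == regime]
        ys = [y for y, r in zip(target, regimes) if r == regime]
        cov = covariance(xs, ys)
        signs[regime] = (cov > 0) - (cov < 0)
    return signs


def flips_sign(feature, target, regimes):
    """True if the feature points up in one regime and down in another."""
    nonzero = set(regime_signs(feature, target, regimes).values()) - {0}
    return len(nonzero) > 1


# Toy data (hypothetical): 1 = market up over the next 63 days, 0 = down.
regimes = ["low_rate"] * 4 + ["high_rate"] * 4
target = [0, 0, 1, 1, 1, 1, 0, 0]
unstable_feature = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
stable_feature = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]

print(flips_sign(unstable_feature, target, regimes))
print(flips_sign(stable_feature, target, regimes))
```

A feature that flips sign can still raise a pooled AUC while telling opposite stories in different environments, which is why a check like this is worth running per regime rather than on the full sample.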
Phase 3 — The pivot
CAPE's documented predictive power shows up at 10-year horizons, not over 63 days. I rebuilt the model around a 10-year real return regression and added two novel features: Berkshire's capital deployment posture from SEC filings, and Buffett's annual letters scored by an LLM on a −2 to +2 scale.
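To make the regression idea concrete, here is a minimal sketch under stated assumptions: annual observations, a single predictor (the natural log of CAPE), and ordinary least squares on the annualized forward 10-year real return. The single-predictor specification, the function names, and the synthetic numbers are illustrative assumptions; the project's actual model and features are not shown in this note.

```python
import math


def annualized_forward_return(real_index, start, years=10):
    """Annualized real return from `start` to `start + years` on an annual real total-return index."""
    growth = real_index[start + years] / real_index[start]
    return growth ** (1 / years) - 1


def fit_ols(xs, ys):
    """One-variable least squares. Returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic check data, NOT real market history: an index growing 5% a year in real terms.
real_index = [100 * 1.05 ** t for t in range(21)]
print(annualized_forward_return(real_index, 0))

# Synthetic, exactly linear relation between log(CAPE) and forward return,
# used only to confirm the fit recovers the coefficients it was built from.
cape_values = [10.0, 20.0, 30.0, 40.0]
log_cape = [math.log(c) for c in cape_values]
forward_returns = [0.20 - 0.04 * x for x in log_cape]
intercept, slope = fit_ols(log_cape, forward_returns)

# Predicted annualized 10-year real return at a hypothetical CAPE of 25.
predicted = intercept + slope * math.log(25.0)
print(intercept, slope, predicted)
```

Because 10-year windows that start one year apart share nine years of returns, the residuals of a fit like this are heavily autocorrelated, so ordinary standard errors would overstate confidence.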
Where we landed
Backtest result: Oracle overlay vs baseline DCA = +7 basis points annualized. That is no meaningful improvement, and I report it honestly. The tool's value is behavioral: a disciplined framework that prevents panic-driven pauses and encourages faster deployment during genuinely attractive conditions.
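For a sense of scale, the arithmetic behind a basis-point gap can be sketched like this. The 8.00% and 8.07% returns, the $10,000 starting amount, and the 20-year horizon are made-up illustrations, not the backtest's figures, and the helper names are hypothetical. The simple growth-rate formula here assumes a single lump sum; a DCA stream with many contributions would need a money-weighted return instead.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two portfolio values."""
    return (end_value / start_value) ** (1 / years) - 1


def gap_in_bps(strategy_return, baseline_return):
    """Difference between two annualized returns, in basis points (1 bp = 0.01%)."""
    return (strategy_return - baseline_return) * 10_000


# Illustrative returns only: 8.00% vs 8.07% is a 7 bp gap.
baseline, overlay = 0.0800, 0.0807
print(round(gap_in_bps(overlay, baseline), 2))

# What that gap compounds to on a $10,000 lump sum over 20 years.
years = 20
extra_dollars = 10_000 * ((1 + overlay) ** years - (1 + baseline) ** years)
print(round(extra_dollars, 2))

# Annualized growth rate of a lump sum that doubles in 10 years.
print(cagr(100.0, 200.0, 10))
```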
"My advice to the trustee couldn't be more simple: Put 10% of the cash in short-term government bonds and 90% in a very low-cost S&P 500 index fund. (I suggest Vanguard's.)"
— Warren Buffett, 2013 Annual Letter to Shareholders
Continue regular S&P 500 index contributions on schedule. The signal below applies only to extra capital beyond your regular contributions.
Pick any date from January 1995 onward. See the signal at that moment — and what $5,000 deployed then would be worth today.
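The "worth today" figure is simple growth arithmetic. A minimal sketch, assuming a date-sorted total-return index (dividends reinvested) and using the most recent level on or before the chosen date when that date is not a trading day. The function names and the three placeholder index levels are hypothetical, not real S&P 500 data.

```python
from bisect import bisect_right
from datetime import date


def level_on_or_before(dates, levels, when):
    """Latest index level dated on or before `when`; `dates` must be sorted ascending."""
    i = bisect_right(dates, when)
    if i == 0:
        raise ValueError("no data on or before the requested date")
    return levels[i - 1]


def value_today(amount, dates, levels, deployed_on):
    """Grow `amount` by the index change from the deployment date to the last observation."""
    return amount * levels[-1] / level_on_or_before(dates, levels, deployed_on)


# Placeholder series (NOT real prices); the last entry stands in for "today".
dates = [date(1995, 1, 3), date(1995, 1, 4), date(1995, 1, 6)]
levels = [100.0, 101.0, 125.0]

print(value_today(5_000, dates, levels, date(1995, 1, 3)))
# January 5 has no observation here, so the January 4 level is used.
print(value_today(5_000, dates, levels, date(1995, 1, 5)))
```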
Honest limitations: ~100 non-overlapping 10-year windows in full history. Quintile calibration is in-sample and indicative. Backtest: Oracle vs DCA = +7 bps (no meaningful improvement). The tool never recommends selling, pausing contributions, or holding cash beyond 12 months.
Built by Carlos Portocarrero · ask.carlosportocarrero.com