Teaching My Analyst to Remember

So I've been running Pulse 3-4 times a day for the past couple days and the thing that kept bugging me wasn't what it got wrong. It was what it kept re-discovering. Every scan it would come back and tell me, with great confidence, that the Strait of Hormuz was closed. Yeah man, I know. You told me that yesterday. And the day before. It's in the confirmed facts file that I built specifically so you'd stop doing this.

One problem was that all the context it needed was sitting in data files on disk, but the prompts themselves were (stupidly) partially frozen. I had included some hardcoded strings with stale reference values from March 6th. Gold at "$5,097 approximately." The S&P at 6,740. Those numbers were already days old and the model was using them as its sanity check for whether new data made sense. Which meant sometimes the new data looked wrong to it when it was actually right.

The other problem was, when you send an AI to search the web for "what happened in the last 24 hours," it comes back with a mix of things. Some of it is genuinely new. Some of it is a news article published today that's summarizing something that happened two days ago. And the model treats it all the same. Today's article about Monday's NATO intercept gets reported as breaking news on Wednesday. My audience (me and like 4 friends) checks this thing multiple times a day. They notice.

So Claude and I rebuilt the prompt infrastructure. Three changes that sound boring but matter a lot.

First: the searches. I had four web searches that each covered two domains mashed together. Conflict and humanitarian in one query. Markets and second-order effects in another. That was a rate-limiting workaround from the early build. But it meant the humanitarian stuff was always getting crowded out by military headlines, and domestic politics was an afterthought jammed into the diplomacy search. I split it into six dedicated searches, one per domain. Each one gets its own focused query. Humanitarian gets to be about casualties and infrastructure damage without competing with missile launch rates for attention.

The trick, if you can even call it that, was keeping the total token budget roughly the same. Six searches instead of four, but each one does two web lookups instead of three and returns shorter results. Same twelve total web searches, similar cost. Just better organized.
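The budget math is easy to sketch. To be clear, this is my own illustration, not Pulse's actual config: only the first two old pairings are named in the post, and the domain names are guesses.

```python
# Hypothetical sketch of the search budget, before and after the split.
# Only "conflict+humanitarian" and "markets+second_order" are pairings
# described in the post; the rest are invented for illustration.
OLD_SEARCHES = {          # 4 combined searches, 3 web lookups each
    "conflict+humanitarian": 3,
    "markets+second_order": 3,
    "diplomacy+domestic": 3,
    "energy+misc": 3,
}
NEW_SEARCHES = {          # 6 dedicated searches, 2 web lookups each
    "conflict": 2, "humanitarian": 2, "markets": 2,
    "diplomacy": 2, "domestic": 2, "energy": 2,
}

def total_lookups(searches: dict[str, int]) -> int:
    return sum(searches.values())
```

Either way you spend twelve web lookups per scan; the new plan just stops the domains from fighting over the same query.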

Second: source quality. The old prompts said "prioritize wire service reports" but that was just a suggestion in the search query, and Haiku (the cheap model doing the searching) would happily mix a Reuters dispatch with some random blog post and present them side by side. Now each search query explicitly says to prioritize wire services and official statements, and to flag tabloid or opinion sources. The synthesis model (Opus) already knew to weigh sources differently, but it was working with a jumble of stuff. If you separate the Reuters from the commentary before it even gets to the analyst, the analyst doesn't have to do that work. It can just analyze.
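In Pulse the separation happens inside the search prompt, not in code, but the same tiering idea can be sketched as a pre-sort. The domain lists and tier names here are mine, not the system's:

```python
from urllib.parse import urlparse

# Illustrative tiers only -- the real filtering lives in the search prompt,
# and these domain lists are examples, not Pulse's actual config.
WIRE_SERVICES = {"reuters.com", "apnews.com", "afp.com"}
FLAG_MARKERS = ("blog", "opinion", "substack")

def tier(url: str) -> str:
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in WIRE_SERVICES:
        return "wire"
    if any(marker in host for marker in FLAG_MARKERS):
        return "flagged"
    return "other"

def presort(result_urls: list[str]) -> list[str]:
    # Put wire-service links first so the synthesis model sees them
    # before any flagged commentary.
    order = {"wire": 0, "other": 1, "flagged": 2}
    return sorted(result_urls, key=lambda u: order[tier(u)])
```

Doing this before synthesis is the whole point: the analyst gets pre-sorted inputs instead of spending its attention untangling a jumble.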

Third, and this is the main one I wanted to improve: the feedback loop. Every scan produces "watch items" — things like "watch for the G7 SPR release decision on Tuesday" or "watch whether Turkey invokes Article 5." Previously those just showed up on the dashboard and then vanished when the next scan ran. Now they get fed back into the search queries. So when Tuesday comes around, the energy search specifically includes "check if the G7 announced a coordinated SPR release." The system is telling its future self what to look for.
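A minimal sketch of that injection, assuming watch items carry a domain tag (the `domain`/`text` field names are my invention, not the real data shape):

```python
# Feed the previous scan's watch items back into the next scan's queries.
# Field names ("domain", "text") are assumptions for illustration.
def build_query(base: str, watch_items: list[dict], domain: str) -> str:
    followups = [w["text"] for w in watch_items if w["domain"] == domain]
    if not followups:
        return base
    return base + " Also specifically check: " + "; ".join(followups) + "."
```

The nice property is that it degrades gracefully: no outstanding watch items for a domain means the query is exactly what it was before.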

It also now pulls the latest market data from Yahoo Finance (which I fetch separately) and uses those real numbers as the indicator sanity check instead of frozen values. So when Haiku comes back claiming gold is $2,900 (a 2024 number), the synthesis prompt knows gold was $5,199 as of this morning and can reject that.
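The check itself is one line once you have the live quote. The 5% tolerance here is my guess at a reasonable threshold, not a number from Pulse:

```python
# Compare a model-claimed price against the live quote fetched separately
# (from Yahoo Finance, in Pulse's case). Tolerance is an assumed value.
def price_is_plausible(claimed: float, live: float, tolerance: float = 0.05) -> bool:
    return abs(claimed - live) / live <= tolerance
```

With gold at $5,199 this morning, a claimed $2,900 is off by roughly 44% and gets rejected, while a claimed $5,180 sails through.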

Oh, and I fixed prediction ID collisions. The model kept generating predictions starting at pred-001 and they'd silently collide with existing ones and get dropped. Now it knows the highest existing ID and starts from the next one. Small thing, but predictions are the part of this I find most interesting, so I want them all captured. There may still be too many.
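The fix is tiny: scan the existing IDs and start after the max. A sketch, with the `pred-NNN` format assumed from the post:

```python
import re

# Find the next free prediction ID instead of restarting at pred-001.
def next_prediction_id(existing_ids: list[str]) -> str:
    nums = [int(m.group(1)) for pid in existing_ids
            if (m := re.fullmatch(r"pred-(\d+)", pid))]
    return f"pred-{max(nums, default=0) + 1:03d}"
```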

Is any of this going to make the analysis dramatically better? I don't know. Probably not. But it should stop the model from rediscovering old news and hallucinating market numbers, which were the two things that made me not trust it. And the watch item feedback loop is the kind of thing where you won't notice it immediately but over a few days of scans the coverage should get more targeted. The system learns what matters.

Basically, if you just let AI do the work you still get a bunch of problems... I think most vibe-coded things are pretty lame. But the problem isn't really the AI so much as the prompts, the context management, the state that persists between calls. The model is smart enough; it really is the prompt that matters. Also I noticed that Pulse looks like about 3 other sites I read today.