Issue #3 · May 24, 2026

The agent moved inside the suite.

The standalone AI product is losing this month. Anthropic shipped Claude inside Microsoft Word, Excel, and PowerPoint as a native add-in on May 7. SAP turned its application suite into 200+ agents at Sapphire on May 13. dbt made its semantic layer callable by any OAuth-enabled agent in its May release. Three vendors. Three different angles. One conclusion.

The AI destination — the standalone surface a user opens, types into, and shuts — is being demoted. The AI integration — the agent that lives inside the tool the user already had open — is taking the seat. The pattern crossed a threshold this month. If your AI feature is a new tab the user has to remember exists, you are now competing with three vendors who do not make them leave the surface they were on.

The implication for builders is sharper than the headline. Your AI feature’s moat is the surface it lives on, not the model it calls. Shelf-space is the budget line. The integration is the product. The standalone AI app you shipped last year is the line item the procurement team will quietly drop at renewal.

What’s actually moving this week

1. Claude went GA inside Microsoft Office. May 7. Excel, Word, and PowerPoint shipped as native add-ins. Outlook is in public beta. The headline feature is cross-app context: Claude remembers what happened in Outlook when the user opens Excel, and carries that through to PowerPoint. Copilot resets between apps. Anthropic is the first non-Microsoft AI vendor to ship as a native add-in across the full Office suite, and it did so by riding Microsoft’s distribution rather than going around it. The B2B AI vendor that controls the surface wins the seat.

2. SAP unveiled the Autonomous Suite at Sapphire. May 11–13 in Orlando. Autonomous Finance, Spend, SCM, HCM, and CX — 200+ agents and 50+ assistants shipping across the next two quarters. Underneath sits the consolidated SAP Business AI Platform (BTP + Business Data Cloud + AI Foundation, unified May 13) and the SAP Knowledge Graph as the semantic substrate. Same architectural move Tableau, Microsoft Fabric, and Databricks have made in the last six weeks. Different vendor. Same play. The semantic graph is the floor every agent in the suite stands on.

3. dbt shipped MCP OAuth, Skills, and Admin-API agent access. The May release made dbt Fusion self-serve in the platform with 30× faster parsing. The dbt MCP server now supports OAuth, so Claude, ChatGPT, and Glean connect using the user’s existing dbt login, with no token management. Skills are loadable structured-knowledge files that agents pull on demand. The Admin API responds to agents directly, so Claude or Cursor can troubleshoot a failed job, not just write the query that broke it. dbt’s pitch is no longer “we describe your data.” It’s “we are the substrate every agent calls.”

4. Databricks shipped the first context-engineer certification. May 19 — the Databricks Context Engineer Associate, the first industry credential for reliable AI agent systems rather than for prompt engineering. The framing matters more than the cert itself. Vendors are not just shipping agents anymore; they are now teaching the profession that builds them. “Context engineer” lands on a job description next quarter. The reliability discipline around agents — eval, retrieval, routing, observability — just got a credential. If your team still treats agent reliability as a model problem, the hiring market is going to outpace your roadmap.

5. HubSpot is deprecating an AI feature. As of May 22, the Segments Intro tab and the legacy AI Segment Suggestions go away. Existing segments keep working; the AI-powered filters move into the core builder. The story is the deprecation, not the replacement. Vendors are pruning the standalone AI surfaces that didn’t pull weight and absorbing the ones that did into the core flow they were supposed to enhance. Watch for more of this. The “AI assistant” tabs shipped in 2024–25 that never landed are quietly being sunset across the B2B SaaS stack.

What I’d ship in your app this week

Feature one: the embedded AI agent inside your largest integration partner. Pick the partner where your customers actually spend the most time: Slack for sales ops teams, Salesforce for revenue ops, Excel for finance, Teams for advertiser ops. Ship the AI feature there as a native plugin or add-in. Not in your own app. The user is already there. The seat your team is fighting to keep users coming back to is the wrong seat.

  • Shape. Native plugin or add-in that calls your product's API. Retrieval grounded in the user's data inside your system. Single LLM call to render the answer or the action, with a one-click "open in our app" escape hatch.
  • Latency budget. 5 seconds, inline render. The user is mid-task in the partner's surface. They will not wait.
  • Cost ceiling. $0.04 per call, hard daily cap per user. Anything more and the unit economics on a seat-priced product break.
  • Eval. 200 golden examples written by senior account managers, scored on "would the user have opened our app to do this otherwise." That is the unit of value the feature has to clear.
  • Two weeks in. 15% of paid seats invoke the plugin at least once per week. If you don't see it, the install flow is wrong — not the feature.
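The latency budget and cost ceiling above can be enforced in a few lines. What follows is a minimal sketch, not a reference implementation. The names `BudgetGuard` and `answer_inline` are hypothetical, and the $1.00 daily cap is an assumed placeholder because the spec above asks for a hard cap without naming a figure. The sketch measures latency after the call returns; a real plugin would more likely cancel the request at the deadline.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field

# Figures from the spec above, stored as integer micro-dollars to avoid float drift.
MAX_COST_PER_CALL_MICROS = 40_000       # $0.04 per call
LATENCY_BUDGET_S = 5.0                  # inline render budget
# Hypothetical: the spec asks for a hard daily cap but gives no number.
DAILY_CAP_PER_USER_MICROS = 1_000_000   # assumed $1.00 per user per day


@dataclass
class BudgetGuard:
    """Tracks spend per (user, day) and refuses calls that would break a limit."""
    spend: dict = field(default_factory=lambda: defaultdict(int))

    def allow(self, user_id: str, day: str, est_cost_micros: int) -> bool:
        # Reject any single call priced above the per-call ceiling.
        if est_cost_micros > MAX_COST_PER_CALL_MICROS:
            return False
        # Reject a call that would push the user past the daily cap.
        projected = self.spend[(user_id, day)] + est_cost_micros
        return projected <= DAILY_CAP_PER_USER_MICROS

    def record(self, user_id: str, day: str, cost_micros: int) -> None:
        self.spend[(user_id, day)] += cost_micros


def answer_inline(guard, user_id, day, est_cost_micros, call_model,
                  clock=time.monotonic):
    """Return the model's answer, or a fallback that points to the full app."""
    fallback = {"status": "fallback", "text": "Open in our app"}
    if not guard.allow(user_id, day, est_cost_micros):
        return fallback
    start = clock()
    text = call_model()                  # the single LLM call
    elapsed = clock() - start
    # The money is spent whether or not the answer arrived in time.
    guard.record(user_id, day, est_cost_micros)
    if elapsed > LATENCY_BUDGET_S:
        return fallback                  # too slow to render inline
    return {"status": "ok", "text": text, "elapsed_s": elapsed}
```

The fallback doubles as the "open in our app" escape hatch: when the budget or the clock says no, the user still gets a way forward instead of a spinner.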

Feature two: the eval harness, before the next AI feature ships. Treat the eval harness as the product. The Databricks cert tells you what the bar is becoming. Build the harness once. Make it the gate every AI feature has to clear before launch, and put the golden set in the repo next to the code.

  • Shape. 250 golden examples per feature, versioned in the repo. CI runs the eval on every PR that touches the feature path. Regression fails the build, not the customer.
  • Latency budget. 4 minutes per full run locally. 15 minutes in CI. Longer than that and engineers route around it.
  • Cost ceiling. $8 per CI run, tracked monthly. Model spend on evals is a real budget line now; treat it as one.
  • Eval-of-the-eval. Every quarter, 20% of the golden set gets re-graded by a senior reviewer to catch label drift.
  • Two weeks in. Every AI feature in flight has a passing baseline checked into the repo. Anything that doesn't gets paused, not shipped.
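The gate described above is small enough to sketch. This is one possible shape under stated assumptions, not a prescribed harness: the function names are hypothetical, golden examples are assumed to be dicts with `input` and `expected` keys, and scoring is plain exact match, where a real harness would likely use a rubric or a grader. The last function picks the 20% slice for the quarterly re-grade.

```python
import random


def pass_rate(examples, predict):
    """Fraction of golden examples where predict(input) equals the expected output."""
    if not examples:
        raise ValueError("golden set is empty")
    passed = sum(1 for ex in examples if predict(ex["input"]) == ex["expected"])
    return passed / len(examples)


def gate(current, baseline, tolerance=0.0):
    """True if the current pass rate has not dropped below baseline minus tolerance."""
    return current >= baseline - tolerance


def ci_exit_code(examples, predict, baseline):
    """Return 0 if the gate passes and 1 if it fails, for use as a CI exit status."""
    return 0 if gate(pass_rate(examples, predict), baseline) else 1


def regrade_sample(examples, fraction=0.20, seed=0):
    """Pick a reproducible slice of the golden set for the quarterly re-grade."""
    k = round(len(examples) * fraction)
    # A seeded generator keeps the sample stable across reruns in the same quarter.
    return random.Random(seed).sample(examples, k)
```

In CI, the job would end with something like `sys.exit(ci_exit_code(golden, predict, baseline))`, with `golden` and `baseline` loaded from files checked into the repo next to the feature code. Changing the seed each quarter rotates which examples the senior reviewer sees.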

Both ship in two to three weeks with the team you already have. Both compound. The plugin earns shelf-space inside someone else’s app. The harness pays for itself the first time it catches a regression before a customer does.


Send me an email and we will talk. If something here landed close to what you're working on, the door is open. No calendar funnel, no pitch deck — I read every note that comes in.

Doing the work rather than deciding what to build? Crafting is the letter for that chair.

Drafted with Claude · Edited by Paul Brown