Vercel eve: agents as services, not conversations

@marcoscamara01 | June 21, 2026

The problem with many agents is not that they cannot reason. It is that they do not live anywhere reliable.

An agent that only exists as a prompt, a local script, or a chat demo works for five minutes. The hard part starts when it has to remember task state, wait for approval, call GitHub, survive a deploy, answer from Slack, and leave traces someone can audit.

That is where eve fits. Its thesis is blunt: an AI agent is a durable, deployable backend application connected to real channels, not a script or a chat demo. eve is Vercel's open source framework for building exactly that.

This is not an installation guide. It is a technical read of eve based on the official documentation, the open source repo, the Vercel Labs templates, and Reprokit, a side project I built on top of it.

#1. What eve is

eve is a filesystem-first framework for building AI agents as durable backend applications. The base idea is simple: an agent is a directory. Instead of hiding behavior inside one large code block or a long prompt chain, eve uses file conventions under agent/.

Vercel's landing page sums it up as "Markdown for instructions and skills, TypeScript for tools". The Vercel Knowledge Base and the official vercel/eve repo say the same thing differently: agent capabilities live in conventional locations so the project is easier to inspect, extend, and operate.

So eve is not another wrapper around a model call. It packages agent behavior as an application, with pieces a team can review as code:

agent.ts to choose the model and configure the runtime.
instructions.md for persistent agent rules.
tools/ for TypeScript functions the model can call.
skills/ for Markdown procedures the agent loads on demand.
channels/ to connect the same agent to HTTP, Slack, GitHub, Discord, or Linear.
connections/ to reach external services through MCP or OpenAPI with managed auth.
sandbox/, subagents, scheduled tasks, hooks, evaluations, and durable state for more demanding flows.

A chatbot starts from the interface. eve starts from behavior and from the infrastructure needed to run that behavior. That difference is what turns an agent into something you can discuss in a pull request, not only something you can demo.

#2. The problem eve solves

For a long time, "building an agent" has meant gluing together three things: a prompt, a model call, and an improvised tool. That is fine for prototypes, but it breaks once the agent has to live beyond one HTTP request or one chat session.

eve moves the center of gravity. The jump is not from "short prompt" to "longer prompt". It is from prompt + API call to deployable agent system. The official introduction lists durable execution, isolated compute, human approvals, subagents, and evaluations, and that list maps onto the problems that appear when an agent touches production:

durability to pause and resume work without keeping compute active;
channels so it is not locked to one chat UI;
typed tools so actions are auditable and testable;
skills so procedural knowledge does not pile up in the base prompt;
human-in-the-loop approval for expensive, sensitive, or irreversible actions;
deployment and observability so a team can operate it.

One caveat before going further: eve is in public preview (beta, per the docs), announced on June 17, 2026. The surface is still moving, so for anything serious I would pin the version, read the changelog before upgrading, and cover critical flows with evaluations.

My read is that eve does for agents what web frameworks did for routes, compilation, data, and deployment: it turns repeated architectural decisions into a conventional surface. An agent does not only fail because the model is wrong. It fails because nobody knows which prompt version was active, which tool ran, whether the approval came from the right person, or whether the same webhook arrived twice. eve gives each of those an explicit place, which changes the question from "what prompt should I use?" to "what operational contract does this agent have?".

#3. The mental model

eve turns a file tree into an executable agent. You do not write a giant orchestration graph; you declare the pieces, and the runtime discovers, compiles, and connects them.

A minimal structure looks like this:

agent/
├── agent.ts
├── instructions.md
├── tools/
│   └── get_weather.ts
├── skills/
│   └── investigate.md
├── channels/
│   └── slack.ts
├── connections/
│   └── linear.ts
├── subagents/
│   └── researcher/
└── schedules/
    └── monday_summary.ts

The convention is deliberate. tools/get_weather.ts becomes a tool named get_weather. skills/investigate.md is loaded when the model needs it. channels/slack.ts adapts Slack events to the same agent. agent.ts sets the model and runtime config. That separation — instructions for stable behavior, tools for actions, skills for procedures, channels for input and output, sandbox for isolated work — is worth more than a longer prompt once the agent grows.
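To make the naming convention concrete, here is a minimal sketch of how a file path could map to a tool name. It is not eve's discovery code: `toolNameFromPath` is a hypothetical helper, and the accepted filename characters are an assumption.

```typescript
// Hypothetical helper: derive a tool name from a path under tools/.
// Illustrates the convention only; eve's real discovery step may differ.
function toolNameFromPath(filePath: string): string | null {
  // Normalize Windows separators so the same check works on any platform.
  const normalized = filePath.replace(/\\/g, "/");
  // Match ".../tools/<name>.ts" and capture <name>.
  const match = /(?:^|\/)tools\/([A-Za-z0-9_-]+)\.ts$/.exec(normalized);
  return match?.[1] ?? null;
}

// Only files under tools/ yield a tool name; everything else is skipped.
const discovered = [
  "agent/tools/get_weather.ts",
  "agent/skills/investigate.md",
  "agent/channels/slack.ts",
]
  .map(toolNameFromPath)
  .filter((toolName): toolName is string => toolName !== null);

console.log(discovered);
```

The useful property is that the name a reviewer sees in the file tree is the name the model sees, so a renamed file is a visible change in a pull request.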

The runtime pipeline reads like this:

agent/
  → discovery + compile
  → manifest + module map
  → durable runtime
       ├─ sessions and turns
       ├─ tools and skills
       ├─ channels: HTTP, Slack, GitHub, Discord, Linear
       ├─ sandbox and human-in-the-loop
       └─ deployment on Vercel

The core is sessions and turns. A session is the durable conversation or task; a turn is a message and the work it triggers. eve runs sessions on top of Vercel Workflows, persisting progress as an event log. On replay, finished work is reused and work interrupted halfway is re-executed.
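The replay behavior is easier to reason about with a toy model. The sketch below is not eve's or Vercel Workflows' API: `EventLog`, `runStep`, and the step names are hypothetical, and an in-memory map stands in for the persisted event log.

```typescript
// Toy model of event-log replay. All names here are hypothetical.
type StepEvent = { stepId: string; result: unknown };

class EventLog {
  private completed = new Map<string, unknown>();

  constructor(events: StepEvent[] = []) {
    for (const e of events) this.completed.set(e.stepId, e.result);
  }

  // Reuse the recorded result if the step already finished; run it otherwise.
  runStep<T>(stepId: string, work: () => T): T {
    if (this.completed.has(stepId)) {
      return this.completed.get(stepId) as T;
    }
    const result = work(); // if this throws, nothing is recorded
    this.completed.set(stepId, result);
    return result;
  }

  // What would survive a crash: only the steps that completed.
  snapshot(): StepEvent[] {
    return Array.from(this.completed.entries()).map(([stepId, result]) => ({ stepId, result }));
  }
}

let fetchCalls = 0;
let summarizeCalls = 0;

function turn(log: EventLog, crashDuringSummary: boolean): string {
  const issue = log.runStep("fetch-issue", () => {
    fetchCalls++;
    return "Login button unresponsive";
  });
  return log.runStep("summarize", () => {
    summarizeCalls++;
    if (crashDuringSummary) throw new Error("simulated interruption");
    return `Summary: ${issue}`;
  });
}

// First attempt: interrupted halfway through the second step.
const firstLog = new EventLog();
try {
  turn(firstLog, true);
} catch {
  // The process "dies" here; only the snapshot survives.
}

// Replay: the finished step is reused, the interrupted one runs again.
const replayLog = new EventLog(firstLog.snapshot());
const summary = turn(replayLog, false);
console.log(summary, { fetchCalls, summarizeCalls });
```

After the replay, the fetch step has run once and the summarize step twice, which is the behavior to keep in mind whenever a step does more than compute a value.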

That replay behavior drives the most important rule in eve: any tool with an external effect (sending an email, charging money, deleting data, opening a PR) must be idempotent or gated behind approval, because replay means it can run again. Designing agents in eve is therefore less about "what should I tell the model?" and more about which parts are instructions, which are actions, which run in the sandbox, which channel starts the work, and which decisions require a person.
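One common way to satisfy the idempotency half of that rule is an idempotency key. The sketch below assumes a key built from the session and the request; `openPullRequest`, the key format, and the in-memory stores are all hypothetical and stand in for a real API client and durable storage.

```typescript
// Hypothetical sketch of an idempotent tool. Nothing here is eve's API.
type PrRequest = { repo: string; branch: string; title: string };
type PrResult = { prNumber: number };

// Stand-ins: in practice these would be the external API and durable storage.
const sideEffects: PrRequest[] = [];
const completedByKey = new Map<string, PrResult>();

// Assumption: one PR per session, repo, and branch is the intended invariant.
function idempotencyKey(sessionId: string, req: PrRequest): string {
  return `${sessionId}:open-pr:${req.repo}:${req.branch}`;
}

function openPullRequest(sessionId: string, req: PrRequest): PrResult {
  const key = idempotencyKey(sessionId, req);
  const previous = completedByKey.get(key);
  if (previous) return previous; // replayed call: return the earlier result

  sideEffects.push(req); // the external effect happens exactly once per key
  const result: PrResult = { prNumber: sideEffects.length };
  completedByKey.set(key, result);
  return result;
}

const request: PrRequest = { repo: "acme/app", branch: "fix/issue-17", title: "Fix login button" };
const firstCall = openPullRequest("session-1", request);
const replayedCall = openPullRequest("session-1", request);
console.log(firstCall.prNumber === replayedCall.prNumber, sideEffects.length);
```

In a real system the check and the write need to be atomic, or the external service has to accept the key itself; otherwise a crash between the effect and the record reopens the gap. Where that cannot be guaranteed, the approval gate is the safer choice.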

#4. What the landing page does not show

There is a reading of eve that is easy to miss: it does not only organize an agent, it pushes you to name what each part actually is.

In a prototype, instructions, permissions, execution, memory, business logic, API access, and human decisions all blur together. In eve, a skill teaches a procedure or a team convention, and a tool crosses into the outside world — it calls an API, writes a file, comments on GitHub. That keeps "knowledge" and "capability" from collapsing into the same thing.

A channel is more than a Slack or GitHub integration. It normalizes input, decides how responses are delivered, and keeps the continuationToken that lets a task resume. If you need strict ordering between messages, let the agent settle — it parks while waiting — before sending the next one.
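The normalization part can be sketched without any framework. The payload types below are simplified stand-ins, not the full Slack or GitHub schemas, and `conversationKey` is a hypothetical field used for illustration; it is not eve's continuationToken.

```typescript
// Hypothetical sketch: two channel-specific events reduced to one input shape.
type SlackMessage = { kind: "slack"; channel: string; threadTs: string; user: string; text: string };
type GitHubComment = { kind: "github"; repo: string; issueNumber: number; author: string; body: string };
type ChannelEvent = SlackMessage | GitHubComment;

interface AgentInput {
  source: "slack" | "github";
  conversationKey: string; // stable per thread or issue, so follow-ups join the same task
  sender: string;
  text: string;
}

function normalize(event: ChannelEvent): AgentInput {
  switch (event.kind) {
    case "slack":
      return {
        source: "slack",
        conversationKey: `slack:${event.channel}:${event.threadTs}`,
        sender: event.user,
        text: event.text.trim(),
      };
    case "github":
      return {
        source: "github",
        conversationKey: `github:${event.repo}#${event.issueNumber}`,
        sender: event.author,
        text: event.body.trim(),
      };
  }
}

const fromSlack = normalize({
  kind: "slack", channel: "C123", threadTs: "1718900000.000100", user: "U42", text: " /repro ",
});
const fromGitHub = normalize({
  kind: "github", repo: "acme/app", issueNumber: 17, author: "octocat", body: "/repro",
});
console.log(fromSlack, fromGitHub);
```

Past this boundary the agent sees the same command text from both sources, so the logic behind it does not need to know which channel started the work.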

The most useful part is the review surface. An eve agent is a set of files that answers concrete questions:

What can it actually do?
What persistent instructions does it have?
Which actions require approval?
Which secrets reach the runtime, and which never enter the sandbox?
Which channel can start work, and with what auth?
Which scheduled tasks exist?

That list is more useful than an abstract "safe agent" label. It gives concrete audit points and pushes agents toward reviewable software.

#5. Where eve makes sense — and where it does not

The Vercel Labs templates are a good signal of the intended use cases: the Slack agent (a minimal agent with tools and skills), the content agent (Slack, Notion, Vercel Blob), the personal agent (web chat, Slack, iMessage, Linear, user-approved memory), the PR triage agent (GitHub and diffs), and the chat template (a persisted Next.js interface).

Product shapes that fit:

Internal agents that query systems like Linear, Notion, GitHub, or Datadog.
Support automation that gathers context, drafts a response, and waits for review.
Engineering agents that triage PRs, issues, incidents, or regressions.
Workflows where a person approves sensitive steps before execution.
Agent backends that serve a web UI, Slack, and webhooks without duplicating logic.

Adopting eve is not free, though, and a few costs stay visible:

Platform dependency. eve leans on Vercel Workflows, Vercel Sandbox, AI Gateway, Vercel Connect, and Vercel observability. An advantage if you already live there; an architectural commitment if you do not.
Initial complexity. For a small automation, agent/, channels, tools, skills, sandbox, and evaluations are too much. A script or a simple queue is better.
Security is not automatic. Route auth is not user, tenant, or session authorization. If several users share one agent, you still design ownership and permissions.
Uneven channels. I only exercised GitHub; I would expect maturity and ergonomics to vary across Slack, Linear, Discord, and Teams.
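The "security is not automatic" point is the one most worth spelling out in code. A minimal sketch, assuming each session records a tenant and an owner: the types and `authorizeSessionAccess` are hypothetical and not something eve provides. The check runs after route auth has already established who the caller is.

```typescript
// Hypothetical sketch: an authenticated caller is not automatically
// allowed to read or resume someone else's session.
interface Caller { userId: string; tenantId: string }
interface SessionRecord { sessionId: string; tenantId: string; ownerId: string }

type AccessDecision = { allowed: true } | { allowed: false; reason: string };

function authorizeSessionAccess(caller: Caller, record: SessionRecord): AccessDecision {
  // Tenant isolation first: never leak across tenants.
  if (caller.tenantId !== record.tenantId) {
    return { allowed: false, reason: "session belongs to another tenant" };
  }
  // Then ownership inside the tenant.
  if (caller.userId !== record.ownerId) {
    return { allowed: false, reason: "caller does not own this session" };
  }
  return { allowed: true };
}

const sessionRecord: SessionRecord = { sessionId: "s-1", tenantId: "acme", ownerId: "alice" };

const ownerDecision = authorizeSessionAccess({ userId: "alice", tenantId: "acme" }, sessionRecord);
const colleagueDecision = authorizeSessionAccess({ userId: "bob", tenantId: "acme" }, sessionRecord);
const outsiderDecision = authorizeSessionAccess({ userId: "alice", tenantId: "globex" }, sessionRecord);
console.log(ownerDecision, colleagueDecision, outsiderDecision);
```

Returning a reason alongside the decision keeps denials auditable, which fits the review-surface argument from earlier in the article.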

So the rule of thumb: reach for eve when the agent already needs durability, channels, human review, or repeatable deployment. Not before. Below that line, the conventions are weight without payoff; above it, they start to earn their keep.

#6. Reprokit: a project built on eve

To ground the idea in code, I built Reprokit, a project that turns GitHub issues into verifiable reproductions, reviewed fixes, and PRs opened only after checks pass. Its core commands (/repro, /fix, /compare, /stop) can be invoked from an issue or from the CLI, and the agent uses tools to read issues, prepare an isolated checkout, run external workers, generate reports, run checks, and open PRs.

I want to be honest about scope: Reprokit adopts eve's file conventions and uses eve as an orchestrator, but it keeps an independent webhook path for the simpler flow. So read it as an experiment in the architecture eve encourages, not as a stress test of the durable runtime.

The interesting part is the safety policy around the flow, not the bug-fixing itself:

/repro never modifies code, creates branches, or opens PRs.
/fix requires an explicit human signal in the issue.
There is no auto-merge and no automatic deployment.
Each run uses an isolated checkout under .runs/issue-<number>/.
Logs are redacted before they are published.
PRs stay reviewable by humans.
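Rules like these are easiest to keep honest when they are a function rather than a paragraph. The sketch below is not Reprokit's code: the action names, `RunContext`, and `isAllowed` are hypothetical, and treating /compare and /stop as read-only is my assumption, since the list above only states that rule for /repro.

```typescript
// Hypothetical policy gate in the spirit of the rules above.
type Command = "/repro" | "/fix" | "/compare" | "/stop";
type Action =
  | "read_issue" | "run_checks"
  | "modify_code" | "create_branch" | "open_pr"
  | "merge" | "deploy";

interface RunContext {
  command: Command;
  humanApprovedFix: boolean; // the explicit human signal in the issue
  checksPassed: boolean;
}

const WRITE_ACTIONS: ReadonlySet<Action> = new Set<Action>(["modify_code", "create_branch", "open_pr"]);

function isAllowed(action: Action, ctx: RunContext): boolean {
  // No auto-merge and no automatic deployment, whatever the command.
  if (action === "merge" || action === "deploy") return false;
  // Read-only actions are always permitted.
  if (!WRITE_ACTIONS.has(action)) return true;
  // Assumption: only /fix may write, and only with the human signal.
  if (ctx.command !== "/fix" || !ctx.humanApprovedFix) return false;
  // PRs are opened only after checks pass.
  if (action === "open_pr" && !ctx.checksPassed) return false;
  return true;
}

// Each run gets its own checkout directory, keyed by issue number.
function runDirectory(issueNumber: number): string {
  return `.runs/issue-${issueNumber}/`;
}

const reproRun: RunContext = { command: "/repro", humanApprovedFix: true, checksPassed: true };
const fixRun: RunContext = { command: "/fix", humanApprovedFix: true, checksPassed: false };

console.log(isAllowed("open_pr", reproRun));                          // /repro never opens PRs
console.log(isAllowed("modify_code", fixRun));                        // approved /fix may edit code
console.log(isAllowed("open_pr", fixRun));                            // blocked until checks pass
console.log(isAllowed("open_pr", { ...fixRun, checksPassed: true })); // now permitted
console.log(runDirectory(17));
```

A gate like this is also what the offline tests can exercise first, before any real channel or credential is involved.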

That policy points at a transferable pattern: separate orchestrating from doing the dangerous work. For real agents the safer flow is rarely just "do the task". It is a sequence: understand the context, prepare an isolated environment, generate evidence, request approval when something meaningful changes, execute, verify, and leave a reviewable artifact. eve fits that sequence because it separates instructions, tools, skills, channels, and durable state; Reprokit just applies it to GitHub bugs.

You do not need a full deployment to try it. Clone the repo, install dependencies, and run npm run typecheck and npm test to exercise the parsers, reports, and policies offline; the eve mode and the GitHub webhook need real credentials and a public HTTPS URL on top of that. Testing in layers like this is the point — compilation and policy first, real channels later.

#7. Where this goes

eve's strength shows up late, not on day one. For a small demo it can look like too much: folders, compilation, channels, runtime, auth, sandbox, evaluations. The real question is day thirty, when the agent touches real data, receives messages through more than one entry point, and has to explain why it did something.

A maintainable agent needs boundaries, state, typed tools, repeatable deployment, observability, a security policy, and a way to ask a human for help. eve does not answer all of that for you; it gives those answers a place to live. It is still early to know how the approach ages across teams, stricter security requirements, and version migrations. But the shift it pushes is already useful: from agents as conversations to agents as services — software you can deploy, review, observe, and constrain.