How to build your first AI agent in 2026 (and what MCP and context engineering actually mean)

July 6, 2026 · 18 min read

Software & AI Engineer · Independent Scrimba Reviewer

Last updated: July 6, 2026


To build your first AI agent, you hand a language model one tool it can call and a loop that lets it decide when to call it. That is the whole idea, and you can do it in JavaScript in an afternoon without touching Python. If you are a web developer who keeps landing on "AI agent" tutorials that open with a Python virtual environment and a model-training detour, this is the shorter route: one keyless API, about forty lines of code, and a plain explanation of the two terms the rest of the internet is being strange about, MCP and context engineering.

The short version: an agent is an LLM that can take actions

An AI agent is a language model that can take actions between your question and its answer, instead of only producing text. That is the entire distinction, and it is worth getting it from a primary source rather than a vendor landing page. Anthropic's definition is the clean one: agents are "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks" (Building effective agents, December 2024), as opposed to workflows, where the code path is fixed in advance.

In practice the difference comes down to a loop. A single prompt is one input and one reply. An agent gets a request, decides whether it needs a tool, calls that tool, reads what came back, and then decides what to do next, possibly calling another tool before it answers you. You wrote the tools and the stop condition. The model chose the order. That choosing is the part people mean when they say "agentic."
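
To make that loop concrete before any SDK is involved, here is a minimal sketch with a scripted stand-in for the model. Everything in it is hypothetical and chosen for illustration: fakeModel, the add tool, and the message shapes are not any provider's real format. The point is only the shape: ask the model, run the tool it picked, feed the result back, and stop at an answer or a step limit.

```javascript
// Tools are plain functions the "model" is allowed to request.
const tools = {
  add: ({ a, b }) => a + b,
};

// Hypothetical stand-in for an LLM call: it asks for the tool once,
// then answers from the tool result it finds in the history.
async function fakeModel(messages) {
  const toolMessage = messages.find((m) => m.role === 'tool');
  if (!toolMessage) {
    return { type: 'tool-call', toolName: 'add', input: { a: 2, b: 3 } };
  }
  return { type: 'answer', text: `The sum is ${toolMessage.content}.` };
}

async function runAgent(question, model, maxSteps = 5) {
  const messages = [{ role: 'user', content: question }];

  for (let step = 0; step < maxSteps; step++) {
    const decision = await model(messages);

    // The model chose to answer, so the loop ends.
    if (decision.type === 'answer') return decision.text;

    // The model chose a tool: run it and append the result for the next step.
    const output = await tools[decision.toolName](decision.input);
    messages.push({ role: 'assistant', content: `call ${decision.toolName}` });
    messages.push({ role: 'tool', content: JSON.stringify(output) });
  }

  // The stop condition you wrote: give up rather than loop forever.
  return 'Stopped: step limit reached.';
}

runAgent('What is 2 + 3?', fakeModel).then(console.log); // prints "The sum is 5."
```

Swap fakeModel for a real model call and the structure stays the same. The tools and the step limit are yours; the order of calls is the model's.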

This is not a fringe interest anymore. Gartner named multiagent systems, collections of interacting AI agents, one of its top strategic technology trends for 2026 when it published the list in October 2025. On the hiring side, LinkedIn's Jobs on the Rise 2026 (January 2026) ranked AI Engineer the number one fastest-growing job in the US, and levels.fyi puts applied AI engineer total compensation in roughly the $134k to $193k band at mainstream employers as of mid-2026, with frontier labs pulling the top of the range far above that. The skill is real and it pays. The barrier to a first version is much lower than the job titles suggest.

What is MCP (Model Context Protocol), in plain terms

MCP is a shared standard for telling a model what tools and data it can use, so you write an integration once instead of rewriting it for every model. The closest familiar comparison is HTTP: every browser speaks it, so any browser can load any website without a custom integration per site. MCP aims to be that for tools, so an agent built against one model can reach the same tools when you swap to another.

The timeline is what makes it worth your attention rather than a passing acronym. Anthropic introduced the Model Context Protocol and open-sourced it on November 25, 2024, as an open standard for connecting models to tools and data. In March 2025 OpenAI adopted MCP, with Sam Altman saying support would roll out across OpenAI's products. Google followed in April 2025, with DeepMind CEO Demis Hassabis confirming Gemini would support MCP. Within about five months of launch, MCP had backing from all three major model vendors, which almost never happens that fast with a new standard.

Here is the honest part most explainers skip: you do not need MCP for your first agent. The build below uses one inline tool, defined right in your own code, and that is the right place to start. MCP earns its keep later, when you want the same tool reused across different models or exposed to other people's agents without rewriting the glue each time. When you reach that point, the Intro to Model Context Protocol course walks through exposing a tool over the protocol rather than hard-wiring it.
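
For a feel of what "exposing a tool over the protocol" involves, here is a rough sketch of the two messages that matter most. MCP is built on JSON-RPC 2.0: a client asks which tools exist with tools/list and invokes one with tools/call. The word_count tool and the handleRequest function are hypothetical, and the sketch leaves out the initialization handshake and the transport, both of which a real server built on an official SDK would handle.

```javascript
// A tool is described by a name, a description, and a JSON Schema for its input.
const wordCountTool = {
  name: 'word_count', // hypothetical tool, for illustration only
  description: 'Count the words in a piece of text',
  inputSchema: {
    type: 'object',
    properties: { text: { type: 'string' } },
    required: ['text'],
  },
};

// Hypothetical in-process handler: takes a JSON-RPC request, returns a response.
function handleRequest(request) {
  const { id, method, params } = request;

  if (method === 'tools/list') {
    return { jsonrpc: '2.0', id, result: { tools: [wordCountTool] } };
  }

  if (method === 'tools/call') {
    if (params.name !== wordCountTool.name) {
      return { jsonrpc: '2.0', id, error: { code: -32602, message: 'Unknown tool: ' + params.name } };
    }
    const words = params.arguments.text.trim().split(/\s+/).filter(Boolean);
    return {
      jsonrpc: '2.0',
      id,
      // Tool results come back as a list of content items.
      result: { content: [{ type: 'text', text: String(words.length) }], isError: false },
    };
  }

  // JSON-RPC's standard "method not found" error.
  return { jsonrpc: '2.0', id, error: { code: -32601, message: 'Method not found' } };
}

console.log(
  handleRequest({
    jsonrpc: '2.0',
    id: 1,
    method: 'tools/call',
    params: { name: 'word_count', arguments: { text: 'one two three' } },
  }).result.content[0].text,
); // prints "3"
```

Because the tool describes itself with a schema, any client that speaks the protocol can discover and call it without custom glue. That is the reuse the standard is after.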

What context engineering actually means

Context engineering is deciding what information and tools to put in front of the model at each step of a task. I will say the slightly cynical thing first: as a named discipline it is partly a rebrand of something developers were already doing, which is choosing what goes in the context window and when. The term got popular in 2025. The underlying problem is older than the label.

That said, the problem it points at is real, and two of the companies pushing the term define it usefully. LangChain's working definition is "building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish the task" (The rise of context engineering, June 2025). Anthropic frames it as "the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference" (Effective context engineering for AI agents, September 2025).

The reason it matters more for agents than for plain prompts is the loop. A prompt is one input you write once and tune by hand. An agent accumulates context as it runs: your question, the tool calls, every result those calls returned, and its own intermediate reasoning. By the tenth step the window can be full of stale tool output that crowds out the thing the model needs to answer well. Context engineering is the work of keeping that window useful while the agent runs, and the Learn Context Engineering course is built around exactly that token-budget problem.
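
To see how quickly that accumulation adds up, here is a small simulation. It is a sketch under stated assumptions: the four-characters-per-token figure is a rough heuristic for English text, not a real tokenizer, and simulateRun and its message shapes are hypothetical.

```javascript
// Rough heuristic only: about four characters per token for English text.
// Real tokenizers differ, so treat this as an order-of-magnitude estimate.
function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Simulate an agent that makes one tool call per step, where every call
// returns a payload of toolOutputChars characters that stays in the history.
function simulateRun(steps, toolOutputChars) {
  const messages = [
    { role: 'user', content: 'Should I bring an umbrella in Berlin tomorrow?' },
  ];
  const sizes = [];
  for (let step = 1; step <= steps; step++) {
    messages.push({ role: 'assistant', content: `calling tool (step ${step})` });
    messages.push({ role: 'tool', content: 'x'.repeat(toolOutputChars) });
    sizes.push(estimateTokens(messages)); // context size the model sees next
  }
  return sizes;
}

// Ten steps with 2,000-character tool results: print the estimate per step.
console.log(simulateRun(10, 2000));
```

The estimate grows with every step because nothing is ever removed. The question itself is a tiny fraction of what the model reads by the end.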

What you need before you build your first agent

You need three things: Node 20 or newer, an API key from one model provider, and the ability to call fetch. That is genuinely the whole list.

For the toolkit, I used the Vercel AI SDK, which Vercel describes as "the leading TypeScript toolkit for building AI applications" and reported at over 20 million monthly downloads in its AI SDK 6 announcement (December 2025). It handles the loop, tool calling, and streaming so you are not hand-rolling the request format. The example here was tested with the ai package on its current major version (v7) on Node 20, in late June 2026. Version 6, released December 22, 2025, is where MCP support became stable if you go that route later; the plain-tool code below works the same on either.

The data source is the part where I can keep the "no API key friction" promise honestly. Open-Meteo needs no API key and no sign-up for non-commercial use, so the worked example is a plain fetch with nothing to configure. You still need one key for the model itself (OpenAI or Anthropic, your call), but the tool the agent calls has nothing to register, no billing dashboard, and no rate-limit page to read first. That is usually the step where beginners stall, and here it is gone.
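
If you want to check the keyless claim before any model is involved, a sketch like this calls the forecast endpoint directly. The helper names forecastUrl and printRainChance are mine, and the coordinates in the usage comment are approximate values for Berlin. The timezone=auto parameter is an addition that asks Open-Meteo to bucket days by the location's local time.

```javascript
// Build the forecast URL. Note what is missing: no key, no token, no account id.
function forecastUrl(latitude, longitude) {
  const params = new URLSearchParams({
    latitude: String(latitude),
    longitude: String(longitude),
    daily: 'precipitation_probability_max',
    forecast_days: '2',
    timezone: 'auto',
  });
  return 'https://api.open-meteo.com/v1/forecast?' + params.toString();
}

// Fetch and print the daily dates alongside the rain probabilities.
async function printRainChance(latitude, longitude) {
  const response = await fetch(forecastUrl(latitude, longitude));
  if (!response.ok) throw new Error('Open-Meteo returned HTTP ' + response.status);
  const data = await response.json();
  console.log(data.daily.time, data.daily.precipitation_probability_max);
}

// Usage (makes a real network request):
// printRainChance(52.52, 13.41);
```

If that prints two dates and two percentages, the tool side of the agent has nothing left to set up.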

Build your first AI agent: a worked example

Here is a working weather agent in about forty lines. It answers a question like "should I bring an umbrella in Berlin tomorrow?" by calling a real forecast API and reasoning over the result, instead of guessing from training data. Install the dependencies first:

npm init -y
npm install ai @ai-sdk/openai zod

Set "type": "module" in your package.json, export your OPENAI_API_KEY, then create agent.js:

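import { generateText, tool, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// One tool: look up the city's coordinates, then fetch tomorrow's rain chance.
const getWeather = tool({
  description: 'Get the chance of rain tomorrow for a city',
  inputSchema: z.object({ city: z.string() }),
  execute: async ({ city }) => {
    // Step 1: geocode the city name (Open-Meteo, no API key needed).
    const geo = await fetch(
      'https://geocoding-api.open-meteo.com/v1/search?count=1&name=' +
        encodeURIComponent(city),
    ).then((r) => r.json());
    // An unknown city comes back with no results, so report that to the
    // model instead of crashing on geo.results[0].
    if (!geo.results?.length) return { city, error: 'City not found' };
    const { latitude, longitude } = geo.results[0];

    // Step 2: two-day forecast. timezone=auto keeps "tomorrow" on the
    // city's local calendar rather than GMT.
    const data = await fetch(
      'https://api.open-meteo.com/v1/forecast?forecast_days=2&timezone=auto' +
        '&daily=precipitation_probability_max' +
        '&latitude=' + latitude + '&longitude=' + longitude,
    ).then((r) => r.json());

    // Index 0 is today, index 1 is tomorrow.
    return { city, rainChanceTomorrow: data.daily.precipitation_probability_max[1] };
  },
});

const result = await generateText({
  model: openai('gpt-4o-mini'),
  tools: { getWeather },
  stopWhen: stepCountIs(5), // allow up to five model/tool steps
  prompt: 'Should I bring an umbrella in Berlin tomorrow?',
});

console.log(result.text);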

Run it with node agent.js. The model reads your question, sees it has a getWeather tool, calls it with city: "Berlin", the function hits Open-Meteo and returns a rain percentage, and the model writes an answer grounded in that number.
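
To see that sequence rather than take it on faith, one option is to print what happened at each step. The helper below is a sketch: describeSteps is my own name, and the field names it reads (toolCalls, toolResults, toolName, input, output, text) are an assumption about the shape of result.steps in recent AI SDK versions, so check them against the version you installed.

```javascript
// Turn a list of step records into readable lines, one per event.
// Assumed shape per step: { toolCalls: [...], toolResults: [...], text: '...' }
function describeSteps(steps) {
  const lines = [];
  steps.forEach((step, index) => {
    const label = `step ${index + 1}`;
    for (const call of step.toolCalls ?? []) {
      lines.push(`${label}: called ${call.toolName} with ${JSON.stringify(call.input)}`);
    }
    for (const toolResult of step.toolResults ?? []) {
      lines.push(`${label}: ${toolResult.toolName} returned ${JSON.stringify(toolResult.output)}`);
    }
    if (step.text) lines.push(`${label}: model wrote "${step.text}"`);
  });
  return lines;
}

// Usage inside agent.js, after generateText resolves:
// console.log(describeSteps(result.steps).join('\n'));
```

If the first step shows no getWeather call, the model answered from training data and the result is a guess, however confident it sounds.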

The one line that makes this an agent rather than a single call is stopWhen: stepCountIs(5). Without it, generateText stops after a single step by default (check this against the SDK version you install), so the model can request the tool but never gets a second turn to answer from the result. With it, the SDK runs the loop: call the tool, feed the result back, let the model decide whether it is done or needs to act again, up to five steps. That loop is Anthropic's "dynamically direct their own processes" definition, written out in code you can read. Swap the weather function for any API you already call at work, and the shape does not change.

Context engineering for agents that don't fall over

A toy agent works on the first question and starts to wobble on the tenth, and context engineering is the set of fixes for that wobble. The failure modes are predictable once you have built one. The model forgets something it was told early because newer tool output pushed it out of the window. It calls the same tool twice because it lost track of the first result. On a bad day it loops, burning tokens and your API budget, because nothing told it when to stop.
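
The repeated-call failure is the easiest of the three to blunt in code. One approach, sketched here with a hypothetical helper called withCallCache, is to wrap a tool's execute function so that identical inputs within one run reuse the first result. The cache key is the JSON-serialized input, which is an assumption that suits small, flat inputs; objects with the same fields in a different order would count as different calls.

```javascript
// Wrap an execute function so repeated calls with the same input
// return the first result instead of hitting the API again.
function withCallCache(execute) {
  const cache = new Map();
  return (input) => {
    const key = JSON.stringify(input);
    if (!cache.has(key)) {
      const pending = Promise.resolve()
        .then(() => execute(input))
        .catch((error) => {
          cache.delete(key); // do not cache failures, so a retry can succeed
          throw error;
        });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

// Usage: create the wrapper once per agent run so results never outlive
// the question they were fetched for.
// const cachedLookup = withCallCache(async ({ city }) => fetchForecast(city));
```

The cache only makes a repeated call cheap. It does not stop the model from making it, so the step cap is still needed.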

The fixes map directly onto the definitions above. Cap the steps, which the example already does with stepCountIs. Trim or summarize old turns before they crowd out the current task, instead of replaying the entire history every step. Pass the model only the tools a given step needs, not all twenty you eventually build, because a long tool list is its own kind of noise. None of this is exotic. It is bookkeeping about what the model sees at each step, which is the whole job that the "context engineering" label points at. The agents that stay reliable in production are the ones where someone did this bookkeeping deliberately.
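
Two of those fixes fit in a few lines each. This is a sketch, not a recipe: trimHistory and pickTools are hypothetical helpers, the message shape is simplified to role and content, and replacing old tool output with a placeholder is only one trimming strategy. Summarizing it with a cheaper model call is another.

```javascript
// Keep the original request and the most recent messages intact, and
// replace the content of older tool results with a short placeholder.
// The message count stays the same, so every tool call still has a result.
function trimHistory(messages, keepRecent = 6) {
  if (messages.length <= keepRecent + 1) return messages;
  const [first, ...rest] = messages;
  const cutoff = rest.length - keepRecent;
  return [
    first,
    ...rest.map((message, index) =>
      index < cutoff && message.role === 'tool'
        ? { ...message, content: '[older tool output removed]' }
        : message,
    ),
  ];
}

// Hand the model only the tools this step needs, selected by name.
function pickTools(allTools, names) {
  return Object.fromEntries(
    Object.entries(allTools).filter(([name]) => names.includes(name)),
  );
}

// Usage, assuming a registry object of tools keyed by name:
// const tools = pickTools(allTools, ['getWeather']);
// const messages = trimHistory(history, 6);
```

Neither function mutates its input, so you can log the full history for debugging while sending the trimmed one to the model.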

From toy agent to a structured build

Once you have felt where the toy version strains, the forgetting, the runaway loops, the pain of wiring each new tool by hand, the next move is to learn the upgrades in the order a real project needs them: retrieval so the agent can pull from your own documents, MCP so tools are reusable instead of hard-wired, and a deploy so it runs somewhere other than your terminal.

That is the gap a guided sequence closes. Scrimba's AI Engineer Path runs about 11.4 hours, taught entirely in JavaScript and TypeScript (no PyTorch), covering agents, RAG, MCP, the Vercel AI SDK, embeddings, and a Cloudflare Workers deploy, with the MCP and Context Engineering modules added recently. I broke down the full module list and who it suits in the AI Engineer Path review, and there is a free intro scrim with Tom Chant so you can test the edit-the-code format before paying anything. If it clicks, Pro is a low monthly subscription (see current Scrimba pricing), and our link applies 20% off:

Start the AI Engineer Path with 20% off

I point you there for one reason: the path teaches MCP and context engineering as the named answers to the exact failures your forty-line agent just hit, in roughly the sequence a real build follows. For the deeper why-JavaScript argument, how to learn AI engineering on Scrimba covers the same ground from the curriculum side.

Common beginner mistakes when building AI agents

The mistakes are predictable, and most cost you a day each if you do not see them coming. The first is reaching for MCP or a heavyweight framework before a single inline tool works. Get one fetch-backed tool returning real data first; standardize and reuse it later. The second is shipping without a step cap, which is how a loop quietly runs your API bill up while you watch the terminal. The third is trusting tool output and model reasoning without reading either, the same almost-right trap that bites people pasting AI-written code: an agent that calls the wrong endpoint and confidently summarizes the wrong number looks exactly like one that worked.

The fourth is starting in Python because a tutorial told you that is where AI lives. Python is not a prerequisite: 84% of developers now use or plan to use AI tools (Stack Overflow 2025 Developer Survey, the freshest published numbers), and plenty of them are JavaScript developers wiring agents into apps they already ship. If you can call an API, you are closer to this than the Python-first framing suggests. For a sense of which assistants actually help while you build, I keep a rundown of AI tools every developer should know.

So here is the concrete next step, and you can finish it today. Build the weather agent above, confirm it actually calls Open-Meteo, then change one thing: swap the weather function for an API you already use in your own work, a database lookup, an internal endpoint, anything that returns data the model cannot guess. When the model calls your tool and answers from the result, you have built an agent. Everything after that is making it reliable.

Frequently asked questions

How do I build my first AI agent in 2026? Give a language model one tool it can call, such as a function that fetches data, and a loop that lets it decide when to call that tool. The model reads your request, calls the tool, reads the result, and answers. In JavaScript with the Vercel AI SDK this is about forty lines of code, and you can finish a working agent in an afternoon.

What is MCP, the Model Context Protocol, in plain terms? MCP is a shared standard for telling a model what tools and data it is allowed to use, so you write an integration once instead of once per model. Anthropic introduced it on November 25, 2024, and within about five months OpenAI and Google had both adopted it, which is why it is worth learning once your agent needs more than one tool.

Do I need to learn Python to build an AI agent? No. If you can call an API with fetch in JavaScript, you can build a real agent. Python dominates model training and research, but agents shipped inside web apps are commonly written in JavaScript and TypeScript, supported by the Vercel AI SDK and the official OpenAI and Anthropic JS clients. You only need Python if you move into training or research roles.

What does context engineering actually mean? Context engineering is deciding what information and tools to put in front of the model at each step of a multi-step task. A prompt is one input you write once. Context engineering manages the whole evolving context across the loop: what stays, what gets trimmed, and what the model still sees on the tenth turn rather than only the first.

What is the difference between an AI agent and a single prompt? A single prompt sends one input and gets one answer back. An agent can take actions between the question and the answer. Anthropic defines agents as systems where the model directs its own steps and tool use, as opposed to workflows where the code path is fixed in advance. The loop that lets the model decide its next step is what makes it an agent.

How long does it take to learn to build AI agents? A first working agent takes an afternoon. Getting reliable agents that handle retrieval, multiple tools, and deployment takes longer. Scrimba's AI Engineer Path covers that ground in 11.4 hours of interactive lessons, taught in JavaScript and TypeScript, which is a realistic estimate for the full on-ramp rather than the toy version.

