
What Is NLWeb? Microsoft's Natural Language Web Protocol

SwingIntel · AI Search Intelligence · 8 min read

Websites have always spoken HTML to browsers. NLWeb is the protocol designed to make them speak natural language to AI agents — and it could reshape how your content gets discovered, cited, and recommended.

Key Takeaways

  • NLWeb (Natural Language Web) is an open protocol from Microsoft that lets websites respond to natural language queries using Schema.org and JSON, functioning as a Model Context Protocol (MCP) server
  • Created by R.V. Guha — the same person behind RSS, RDF, and Schema.org — NLWeb is positioned as the "HTML of the agentic web"
  • The protocol ingests a site's existing structured data into a vector database, then serves natural language answers through a standardised /ask endpoint
  • NLWeb complements llms.txt and MCP rather than replacing them — each protocol serves a different layer of AI discoverability
  • Early adopters include Eventbrite, Shopify, Tripadvisor, and O'Reilly Media, signalling serious enterprise interest

How NLWeb Works: From Structured Data to Conversational AI

NLWeb operates on a straightforward principle: take the structured data your website already publishes — Schema.org markup, RSS feeds, JSONL — and make it queryable through natural language.

Here is the technical flow. NLWeb crawls your site and extracts Schema.org JSON-LD markup. That structured data gets loaded into a vector database, which represents content as mathematical vectors rather than keywords. When a user or AI agent sends a natural language query to the site's /ask endpoint, NLWeb combines vector search results with an LLM to generate a contextual, Schema.org-formatted JSON response.
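The flow above can be sketched from the client side. This is an illustrative example, not the reference implementation: the `query` and `mode` parameter names follow NLWeb's documented /ask interface, but the helper function and the Schema.org response shape shown here are simplified assumptions.

```python
import json
from urllib.parse import urlencode

def build_ask_url(base, query, mode="list"):
    # Hypothetical helper: compose a natural language query
    # against a site's NLWeb /ask endpoint.
    return f"{base}/ask?" + urlencode({"query": query, "mode": mode})

url = build_ask_url("https://example.com",
                    "vegan dinner recipes under 30 minutes")

# A response is Schema.org-formatted JSON; this sample shape is
# illustrative of what an agent might receive back.
sample_response = json.loads("""
{
  "results": [
    {"@type": "Recipe",
     "name": "Vegan Pad Thai",
     "url": "https://example.com/recipes/vegan-pad-thai",
     "description": "One-pan noodles, ready in 25 minutes."}
  ]
}
""")

for item in sample_response["results"]:
    print(item["@type"], "-", item["name"])
```

Because the answer comes back as typed Schema.org objects rather than prose, the querying agent can cite, filter, or rank the results without re-parsing HTML.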

Every NLWeb instance acts as a Model Context Protocol (MCP) server. This means any AI agent that supports MCP — including Claude, ChatGPT, and Copilot — can query your website directly through the same interface that human users access. The protocol supports three query modes: List (return matching items), Summarise (condense results), and Generate (create new content from the data).
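The three query modes can be thought of as three treatments of the same retrieved result set. The sketch below is a toy illustration of that dispatch, assuming lowercase mode strings; the function name and the placeholder "generate" branch are invented here, and a real NLWeb instance would hand that branch to an LLM.

```python
def answer(items, mode="list"):
    # Toy dispatch over NLWeb's three query modes (names assumed).
    if mode == "list":
        # Return matching items as structured data, untouched.
        return items
    if mode == "summarize":
        # Condense the results into a single short answer.
        return "; ".join(item["name"] for item in items)
    if mode == "generate":
        # Placeholder for an LLM call grounded in the retrieved data.
        return f"Generated answer drawing on {len(items)} matching items."
    raise ValueError(f"unknown mode: {mode}")

items = [{"name": "Jazz Night"}, {"name": "Food Truck Festival"}]
print(answer(items, "summarize"))
```

The important point is that all three modes draw on the same vector-retrieved, Schema.org-backed result set; only the final presentation step differs.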

What makes NLWeb practical is that it leverages data most websites already have. If your site has Schema.org markup on product pages, recipe listings, event schedules, or business profiles, NLWeb can ingest and serve that data conversationally. You do not need to restructure your entire site or build a custom API from scratch. The protocol meets you where you are.
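For context, this is the kind of Schema.org JSON-LD that NLWeb ingests. The product and values below are invented for illustration; the structure follows standard Schema.org `Product` markup that many e-commerce sites already embed in their pages.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trailhead Waterproof Hiking Boot",
  "description": "Lightweight waterproof boot for day hikes.",
  "offers": {
    "@type": "Offer",
    "price": "129.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```

If markup like this already exists on your pages, NLWeb's ingestion step has everything it needs; no separate export or custom feed is required.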

Who Created NLWeb and Why It Matters

NLWeb was conceived by R.V. Guha, a Technical Fellow and Corporate Vice President at Microsoft. Guha's track record is hard to overstate: he created RSS, RDF, and Schema.org — three standards that fundamentally shaped how data is shared and structured across the web.

Microsoft introduced NLWeb in 2025 with a bold framing: NLWeb is to MCP and A2A (Agent-to-Agent protocol) what HTML is to HTTP. Just as HTML gave the web a universal document format, NLWeb aims to give the AI web a universal query-and-response format for websites.

The project is fully open source, with reference implementations in both Python and .NET 9 available on GitHub. Twelve organisations are already collaborating on the protocol, including Eventbrite, Shopify, Tripadvisor, O'Reilly Media, and Chicago Public Media. These are not experiments — production systems are being built on NLWeb today.

For business owners, NLWeb represents a fundamental shift in how websites participate in the AI ecosystem. Rather than waiting for AI crawlers to scrape your pages and hoping they interpret your content correctly, NLWeb lets you serve authoritative, structured answers directly to AI agents on your own terms. You control the data. You control the response. The AI agent gets exactly what you want it to have.

NLWeb vs llms.txt vs MCP: How the Protocols Fit Together

The emerging AI web runs on multiple protocols, and they serve different purposes. Understanding where NLWeb fits relative to llms.txt and MCP is essential for anyone building an AI discoverability strategy.

MCP (Model Context Protocol) provides the transport layer. It defines how AI agents connect to external tools and data sources. Think of MCP as the plumbing that enables AI agents to interact with the outside world — it handles the connection, but it does not define what data flows through it.

llms.txt is a static Markdown file that gives AI crawlers a curated, human-readable summary of your site's most important content. It is passive — you publish it, and AI systems consume it during training or retrieval. It is lightweight, easy to implement, and immediately useful for AI training pipelines.
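A minimal llms.txt, following the proposed format (an H1 site name, a blockquote summary, then H2 sections of annotated links), looks something like this. The site and URLs are placeholders:

```markdown
# Example Outdoor Store

> Outdoor gear retailer with product guides, sizing advice, and
> trail-tested reviews.

## Key pages

- [Product catalogue](https://example.com/products.md): full listings
- [Sizing guide](https://example.com/sizing.md): fit advice for boots
```

The contrast with NLWeb is the point: this file is written once and read passively, while an NLWeb endpoint answers arbitrary questions on demand.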


NLWeb operates at the application layer, combining MCP as its transport with Schema.org as its data format to create live, queryable conversational interfaces. Unlike llms.txt, NLWeb is interactive — AI agents can ask questions and receive real-time, structured answers drawn directly from your site's data.

According to Search Engine Journal, these protocols together form the standards powering the agentic web. The practical takeaway: they are complementary, not competing. A well-optimised site in 2026 might publish an llms.txt file for AI training crawlers, implement Schema.org markup for both traditional search and NLWeb consumption, and expose an NLWeb endpoint for real-time AI agent queries — all running over MCP as the transport layer.

What NLWeb Means for Your AI Visibility Strategy

If your business depends on being found by AI search agents — and increasingly, every business does — NLWeb adds a new dimension to your visibility strategy.

Today, AI visibility depends on whether AI agents can find, understand, and cite your content. Most of that process is indirect: you publish content, hope crawlers index it, and trust that the AI models interpret it accurately. NLWeb makes the process direct. Your website becomes a queryable API that AI agents can interact with natively, getting authoritative answers straight from the source.
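What "your website becomes a queryable API" means in practice can be sketched as a tiny server. This is a deliberately minimal WSGI toy, not NLWeb itself: the static `SITE_DATA` list stands in for the vector database, and the substring match stands in for semantic retrieval plus an LLM.

```python
import json
from urllib.parse import parse_qs

# Stand-in for a vector database of Schema.org items.
SITE_DATA = [
    {"@type": "Event", "name": "Jazz Night", "startDate": "2026-03-01"},
    {"@type": "Event", "name": "Food Truck Festival",
     "startDate": "2026-03-08"},
]

def app(environ, start_response):
    # Serve an NLWeb-style /ask endpoint returning Schema.org JSON.
    if environ.get("PATH_INFO") != "/ask":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("query", [""])[0].lower()
    # Naive substring retrieval in place of vector search + LLM.
    matches = [d for d in SITE_DATA if query in d["name"].lower()]
    body = json.dumps({"results": matches}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

# Exercise the app directly, without opening a network socket.
captured = {}
def start_response(status, headers):
    captured["status"] = status

environ = {"PATH_INFO": "/ask", "QUERY_STRING": "query=jazz"}
result = json.loads(b"".join(app(environ, start_response)))
print(captured["status"], result["results"][0]["name"])
```

Even in this toy form, the contract is visible: the agent sends plain language, and the site answers with structured data it fully controls.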

This matters because AI agents are becoming the primary interface between consumers and businesses. When someone asks ChatGPT "find me beach resorts in Thailand with kid-friendly activities" or "which project management tool has the best free tier," the answer increasingly comes from AI agents querying structured data — not from traditional search result pages.

The businesses that expose their data through protocols like NLWeb will have a structural advantage: AI agents can query them directly, get authoritative answers, and cite them with confidence. The businesses that do not will depend entirely on third-party interpretations of their content — interpretations they cannot control and may not even know are happening.

Right now, the foundation for NLWeb readiness is strong structured data and clear, AI-readable content. You can see how well AI agents currently discover and cite your website with a free AI readiness scan — it takes 30 seconds and requires no signup. For the complete picture, SwingIntel's AI Readiness Audit delivers expert research across 9 AI platforms, covering structured data quality, citation testing, and the technical discoverability signals that protocols like NLWeb will rely on.

Frequently Asked Questions

What is NLWeb in simple terms?

NLWeb (Natural Language Web) is an open protocol from Microsoft that allows websites to answer natural language questions from both AI agents and human users. Instead of AI crawlers scraping your pages and guessing what the content means, NLWeb lets your site serve structured, authoritative answers directly through a standardised API endpoint.

How is NLWeb different from llms.txt?

llms.txt is a static Markdown file that AI crawlers read passively during training or retrieval. NLWeb is an interactive protocol — AI agents send natural language queries, and your site responds with real-time, structured answers. They are complementary: llms.txt helps AI systems learn about your site, while NLWeb lets them query it live.

Do I need Schema.org markup for NLWeb to work?

NLWeb is designed to consume the structured data your site already publishes, primarily Schema.org JSON-LD markup. While it can also work with RSS feeds and JSONL data, having robust Schema.org markup gives NLWeb the richest data to work with. If your site lacks structured data, implementing it benefits both traditional search rankings and NLWeb readiness simultaneously.

Is NLWeb ready for production use?

NLWeb is open source with production-quality reference implementations in Python and .NET 9. Twelve organisations — including Eventbrite, Shopify, and Tripadvisor — are already building on the protocol. While adoption is still early, the protocol is mature enough for forward-looking businesses to evaluate and pilot today.

Who should care about NLWeb?

Any business that relies on digital discovery should pay attention. NLWeb is particularly relevant for sites with structured content — e-commerce catalogues, event listings, recipe collections, business directories, and media archives. If AI agents are becoming a discovery channel for your customers, NLWeb offers a way to serve them directly rather than relying on intermediaries.

Tags: ai-search · ai-visibility · structured-data · ai-discoverability
