Websites have always spoken HTML to browsers. NLWeb is the protocol designed to make them speak natural language to AI agents — and it could reshape how your content gets discovered, cited, and recommended.
Key Takeaways
- NLWeb (Natural Language Web) is an open protocol from Microsoft that lets websites respond to natural language queries using Schema.org and JSON, functioning as a Model Context Protocol (MCP) server
- Created by R.V. Guha — the same person behind RSS, RDF, and Schema.org — NLWeb is positioned as the "HTML of the agentic web"
- The protocol ingests a site's existing structured data into a vector database, then serves natural language answers through a standardised /ask endpoint
- NLWeb complements llms.txt and MCP rather than replacing them — each protocol serves a different layer of AI discoverability
- Early adopters include Eventbrite, Shopify, Tripadvisor, and O'Reilly Media, signalling serious enterprise interest
How NLWeb Works: From Structured Data to Conversational AI
NLWeb operates on a straightforward principle: take the structured data your website already publishes — Schema.org markup, RSS feeds, JSONL — and make it queryable through natural language.
Here is the technical flow. NLWeb crawls your site and extracts Schema.org JSON-LD markup. That structured data is loaded into a vector database, which stores content as embedding vectors so queries match by meaning rather than by exact keywords. When a user or AI agent sends a natural language query to the site's /ask endpoint, NLWeb combines vector search results with an LLM to generate a contextual, Schema.org-formatted JSON response.
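The request/response shape can be sketched in a few lines. The /ask path and the Schema.org-formatted JSON response come from the protocol description above; the specific field names (query, mode, results) and the example data are illustrative assumptions, not the published spec.

```python
import json

# Hypothetical payload for a site's /ask endpoint. The path comes from the
# protocol; the field names here are illustrative assumptions.
request_payload = {
    "query": "beach resorts in Thailand with kid-friendly activities",
    "mode": "list",
}

# A sketch of the kind of Schema.org-formatted JSON a response might carry.
response_body = json.dumps({
    "results": [
        {
            "@type": "Resort",
            "name": "Example Sands Resort",
            "url": "https://example.com/resorts/example-sands",
            "amenityFeature": {
                "@type": "LocationFeatureSpecification",
                "name": "Kids' club",
            },
        }
    ]
})

# An agent consuming the response works with typed Schema.org items,
# not scraped HTML it has to interpret.
for item in json.loads(response_body)["results"]:
    print(f'{item["@type"]}: {item["name"]}')
```

The point of the sketch: both sides of the exchange are structured JSON, so the agent never has to guess what a page means.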
Every NLWeb instance acts as a Model Context Protocol (MCP) server. This means any AI agent that supports MCP — including Claude, ChatGPT, and Copilot — can query your website directly through the same interface that human users access. The protocol supports three query modes: List (return matching items), Summarise (condense results), and Generate (create new content from the data).
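The three modes can be pictured as different treatments of the same set of matched items. The dispatcher below is a toy sketch of that distinction, not the reference implementation; in practice the summarise and generate modes would involve an LLM, which is stubbed out here.

```python
# Toy sketch of the three NLWeb query modes. In the real protocol,
# "summarise" and "generate" would call an LLM; here they are stubbed.
def handle_query(matched_items, mode):
    if mode == "list":
        # Return the matching items unchanged.
        return matched_items
    if mode == "summarise":
        # Condense the result set (stub: join the item names).
        return ", ".join(item["name"] for item in matched_items)
    if mode == "generate":
        # Create new content grounded in the data (stub).
        return f"Drawing on {len(matched_items)} matching items..."
    raise ValueError(f"unknown mode: {mode!r}")

items = [{"name": "Pad Thai"}, {"name": "Green Curry"}]
print(handle_query(items, "summarise"))  # Pad Thai, Green Curry
```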
What makes NLWeb practical is that it leverages data most websites already have. If your site has Schema.org markup on product pages, recipe listings, event schedules, or business profiles, NLWeb can ingest and serve that data conversationally. You do not need to restructure your entire site or build a custom API from scratch. The protocol meets you where you are.
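To make that concrete, here is a minimal sketch, using only the Python standard library, of pulling Schema.org JSON-LD out of a page — the kind of structured data NLWeb ingests. The page markup and product are invented for illustration.

```python
import json
from html.parser import HTMLParser

# Invented example page carrying Schema.org JSON-LD product markup.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Trail Running Shoe",
 "offers": {"@type": "Offer", "price": "89.99", "priceCurrency": "USD"}}
</script>
</head></html>
"""

class JSONLDExtractor(HTMLParser):
    """Collects every application/ld+json block found in a page."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_data(self, data):
        if self.in_jsonld:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.blocks.append(json.loads("".join(self.buf)))
            self.buf = []
            self.in_jsonld = False

parser = JSONLDExtractor()
parser.feed(PAGE)
print(parser.blocks[0]["name"])  # the product name NLWeb would index
```

If your pages already carry markup like this for search engines, the data NLWeb needs is already in place.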
Who Created NLWeb and Why It Matters
NLWeb was conceived by R.V. Guha, a Technical Fellow and Corporate Vice President at Microsoft. Guha's track record is hard to overstate: he created RSS, RDF, and Schema.org — three standards that fundamentally shaped how data is shared and structured across the web.
Microsoft introduced NLWeb in 2025 with a bold framing: NLWeb is to MCP and A2A (Agent-to-Agent protocol) what HTML is to HTTP. Just as HTML gave the web a universal document format, NLWeb aims to give the AI web a universal query-and-response format for websites.
The project is fully open source, with reference implementations in both Python and .NET 9 available on GitHub. Twelve organisations are already collaborating on the protocol, including Eventbrite, Shopify, Tripadvisor, O'Reilly Media, and Chicago Public Media. These are not experiments — production systems are being built on NLWeb today.
For business owners, NLWeb represents a fundamental shift in how websites participate in the AI ecosystem. Rather than waiting for AI crawlers to scrape your pages and hoping they interpret your content correctly, NLWeb lets you serve authoritative, structured answers directly to AI agents on your own terms. You control the data. You control the response. The AI agent gets exactly what you want it to have.
NLWeb vs llms.txt vs MCP: How the Protocols Fit Together
The emerging AI web runs on multiple protocols, and they serve different purposes. Understanding where NLWeb fits relative to llms.txt and MCP is essential for anyone building an AI discoverability strategy.
MCP (Model Context Protocol) provides the transport layer. It defines how AI agents connect to external tools and data sources. Think of MCP as the plumbing that enables AI agents to interact with the outside world — it handles the connection, but it does not define what data flows through it.
llms.txt is a static Markdown file that gives AI crawlers a curated, human-readable summary of your site's most important content. It is passive — you publish it, and AI systems consume it during training or retrieval. It is lightweight, easy to implement, and immediately useful for AI training pipelines.
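For comparison, an llms.txt file is just curated Markdown served from the site root (conventionally at /llms.txt): a title, a short summary, and sections of annotated links. The content below is an invented example of the format, not a real site's file.

```markdown
# Example Outdoor Gear Co

> Direct-to-consumer retailer of trail running and hiking equipment.

## Key pages

- [Product catalogue](https://example.com/products): full range with specs
- [Sizing guide](https://example.com/sizing): fit advice for all shoe lines
- [Returns policy](https://example.com/returns): 60-day free returns
```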
NLWeb operates at the application layer, combining MCP as its transport with Schema.org as its data format to create live, queryable conversational interfaces. Unlike llms.txt, NLWeb is interactive — AI agents can ask questions and receive real-time, structured answers drawn directly from your site's data.
According to Search Engine Journal, these protocols together form the standards powering the agentic web. The practical takeaway: they are complementary, not competing. A well-optimised site in 2026 might publish an llms.txt file for AI training crawlers, implement Schema.org markup for both traditional search and NLWeb consumption, and expose an NLWeb endpoint for real-time AI agent queries — all running over MCP as the transport layer.
What NLWeb Means for Your AI Visibility Strategy
If your business depends on being found by AI search agents — and increasingly, every business does — NLWeb adds a new dimension to your visibility strategy.
Today, AI visibility depends on whether AI agents can find, understand, and cite your content. Most of that process is indirect: you publish content, hope crawlers index it, and trust that the AI models interpret it accurately. NLWeb makes the process direct. Your website becomes a queryable API that AI agents can interact with natively, getting authoritative answers straight from the source.
This matters because AI agents are becoming the primary interface between consumers and businesses. When someone asks ChatGPT "find me beach resorts in Thailand with kid-friendly activities" or "which project management tool has the best free tier," the answer increasingly comes from AI agents querying structured data — not from traditional search result pages.
The businesses that expose their data through protocols like NLWeb will have a structural advantage: AI agents can query them directly, get authoritative answers, and cite them with confidence. The businesses that do not will depend entirely on third-party interpretations of their content — interpretations they cannot control and may not even know are happening.
Right now, the foundation for NLWeb readiness is strong structured data and clear, AI-readable content. You can see how well AI agents currently discover and cite your website with a free AI readiness scan — it takes 30 seconds and requires no signup. For the complete picture, SwingIntel's AI Readiness Audit delivers expert research across 9 AI platforms, covering structured data quality, citation testing, and the technical discoverability signals that protocols like NLWeb will rely on.
Frequently Asked Questions
What is NLWeb in simple terms?
NLWeb (Natural Language Web) is an open protocol from Microsoft that allows websites to answer natural language questions from both AI agents and human users. Instead of AI crawlers scraping your pages and guessing what the content means, NLWeb lets your site serve structured, authoritative answers directly through a standardised API endpoint.
How is NLWeb different from llms.txt?
llms.txt is a static Markdown file that AI crawlers read passively during training or retrieval. NLWeb is an interactive protocol — AI agents send natural language queries, and your site responds with real-time, structured answers. They are complementary: llms.txt helps AI systems learn about your site, while NLWeb lets them query it live.
Do I need Schema.org markup for NLWeb to work?
NLWeb is designed to consume the structured data your site already publishes, primarily Schema.org JSON-LD markup. While it can also work with RSS feeds and JSONL data, having robust Schema.org markup gives NLWeb the richest data to work with. If your site lacks structured data, implementing it benefits both traditional search rankings and NLWeb readiness simultaneously.
Is NLWeb ready for production use?
NLWeb is open source with production-quality reference implementations in Python and .NET 9. Twelve organisations — including Eventbrite, Shopify, and Tripadvisor — are already building on the protocol. While adoption is still early, the protocol is mature enough for forward-looking businesses to evaluate and pilot today.
Who should care about NLWeb?
Any business that relies on digital discovery should pay attention. NLWeb is particularly relevant for sites with structured content — e-commerce catalogues, event listings, recipe collections, business directories, and media archives. If AI agents are becoming a discovery channel for your customers, NLWeb offers a way to serve them directly rather than relying on intermediaries.