What Is NLWeb? The Protocol That Makes Websites Queryable by AI Agents

SwingIntel · AI Search Intelligence · 9 min read

The web was built for humans navigating pages. The agentic web is being built for AI agents querying endpoints. NLWeb is the protocol bridging those two worlds — and it may be the most significant shift in web architecture since HTML itself.

Developed by R.V. Guha — the creator of RSS, RDF, and Schema.org — NLWeb is an open-source project from Microsoft that transforms any website into a natural language interface. Instead of requiring users to click through navigation menus and search filters, NLWeb enables both humans and AI agents to query your website's content directly using plain language and receive structured, verified answers.

The implications for AI visibility are enormous. Here is what NLWeb is, how it works, and what your business needs to do about it.

Key Takeaways

  • NLWeb is an open-source Microsoft project that turns websites into AI-queryable endpoints using Schema.org structured data — every NLWeb instance automatically functions as a Model Context Protocol (MCP) server.
  • The protocol was created by R.V. Guha, who also created RSS, RDF, and Schema.org, and positions NLWeb as playing "a similar role to HTML in the emerging agentic web."
  • Unlike traditional LLM citations that generate answers then find sources, NLWeb retrieves verified objects directly from your structured data and presents them as natural language — reducing hallucination risk and giving publishers greater control.
  • Early adopters include Eventbrite, Shopify, Tripadvisor, O'Reilly Media, and Hearst, with Yoast building WordPress integration via Schema Aggregation.
  • Content with proper Schema.org markup has a 2.5x higher chance of appearing in AI-generated answers — NLWeb makes that structured data directly queryable by AI agents.

What NLWeb Actually Does

At its core, NLWeb converts your website's structured data into a conversational interface. When an AI agent — or a human — asks your website a question, NLWeb processes the query against your Schema.org markup, RSS feeds, and other semi-structured data, then returns a natural language response grounded in verified, published information.

This is fundamentally different from how AI search works today. Current AI platforms like ChatGPT and Perplexity crawl your site, ingest the content into their models, and generate answers that may or may not accurately represent what you actually published. NLWeb inverts that process. Instead of the AI inventing answers and then looking for sources, it pulls verified objects directly from your website's structured data and presents them in natural language.

The practical difference: you retain far more control over how your brand is represented in AI-generated answers.
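
The retrieval-first flow described above can be sketched as a small client. This is an illustrative sketch, not the official SDK: it assumes a site exposes NLWeb's `/ask` endpoint with a `query` parameter and returns JSON whose `results` field carries the grounded Schema.org objects — the exact response shape may differ per deployment.

```python
import json
import urllib.parse
import urllib.request


def build_ask_url(site: str, question: str) -> str:
    """Build the request URL for a site's NLWeb /ask endpoint."""
    return f"{site}/ask?" + urllib.parse.urlencode({"query": question})


def extract_results(payload: dict) -> list[dict]:
    """Pull the grounded Schema.org objects out of an /ask response.

    NLWeb returns verified objects from the site's structured data,
    not freshly generated text, so each result is a Schema.org item.
    """
    return payload.get("results", [])


def ask_nlweb(site: str, question: str) -> list[dict]:
    """Ask a question against a (hypothetical) NLWeb-enabled site."""
    with urllib.request.urlopen(build_ask_url(site, question)) as resp:
        return extract_results(json.load(resp))
```

The key design point mirrors the article: the agent receives published, structured objects to present, rather than inventing an answer and searching for support afterwards.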

The Architecture Behind NLWeb

NLWeb consists of five core modules:

  • AskAgent — the central query processor that handles natural language questions against your Schema.org data
  • AgentFinder — a discovery service that helps AI agents locate NLWeb instances across the web
  • DataFinder — translates natural language queries into structured database requests for enterprise systems
  • ModelRouter — intelligently selects which LLM to use based on cost and quality thresholds
  • NLWebScorer — neural ranking models that evaluate search result relevance

The system supports all major LLMs (OpenAI, Anthropic, Gemini, DeepSeek), multiple vector databases (Qdrant, Elasticsearch, PostgreSQL, Azure AI Search), and runs on any operating system. It is MIT-licensed with over 6,000 stars on GitHub.
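
Microsoft has not published ModelRouter's internal logic, but the cost-and-quality routing idea can be sketched in a few lines. Everything here — the `Model` fields and the threshold rule — is an assumption for illustration, not NLWeb's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative
    quality: float             # benchmark score in [0, 1], illustrative


def route(models: list[Model], min_quality: float) -> Model:
    """Pick the cheapest model that clears the quality bar.

    Hypothetical ModelRouter logic: filter by quality threshold,
    then minimise cost among the survivors.
    """
    eligible = [m for m in models if m.quality >= min_quality]
    if not eligible:
        raise ValueError("no model meets the quality threshold")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)
```

A simple query might route to a cheap model, while a query demanding high answer quality would justify a more expensive one.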

NLWeb and the Model Context Protocol

Every NLWeb instance automatically functions as an MCP (Model Context Protocol) server. This is the detail that makes NLWeb strategically important rather than just technically interesting.

MCP, created by Anthropic, has become the universal standard for connecting AI applications to external tools and data sources — reaching 97 million monthly SDK downloads within its first year, with adoption from OpenAI, Google, and Microsoft. By making every NLWeb site an MCP server, Microsoft ensures that any website running NLWeb immediately becomes accessible to the entire ecosystem of MCP-compatible AI assistants and agents.

In practical terms: when a customer asks ChatGPT, Claude, or any MCP-compatible agent a question that your website can answer, NLWeb provides the standardised pathway for that agent to query your content directly — not through a search engine intermediary, but from your own data.
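
At the wire level, an MCP tool invocation is a JSON-RPC 2.0 message with the method `tools/call`. The sketch below builds such a message by hand; the tool name `ask` is what NLWeb instances are described as exposing, but treat it — and the omitted transport layer — as assumptions for illustration.

```python
import json


def mcp_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialise an MCP tools/call request (JSON-RPC 2.0 framing per the MCP spec)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# An agent asking an NLWeb site a question through its MCP interface
# (transport — stdio or HTTP — omitted; "ask" is assumed as the tool name):
msg = mcp_tool_call("ask", {"query": "family-friendly restaurants in Barcelona"})
```

Because every MCP-compatible assistant speaks this same framing, a site that answers `tools/call` requests is reachable by the whole agent ecosystem without bespoke integrations.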

NLWeb vs. llms.txt: Different Problems, Different Solutions

If you have been following the llms.txt conversation, you might wonder how NLWeb compares. They address different layers of the same challenge.

| | NLWeb | llms.txt |
|---|---|---|
| What it does | Dynamic conversational endpoint | Static Markdown file |
| Data format | Schema.org JSON-LD | Markdown with links |
| Interaction model | AI agents query your site in real time | AI agents read a file at crawl time |
| Content control | Responses grounded in your structured data | Curated table of contents |
| Adoption | Connectors for all major LLM platforms | 844,000+ implementations but no confirmed LLM usage as a ranking signal |

The two protocols are complementary. An llms.txt file helps AI crawlers understand your site structure. NLWeb enables AI agents to query your content in real time. One is a signpost. The other is a conversation.

Why Structured Data Is Now Your Entry Ticket

NLWeb does not work without structured data. Since the protocol relies entirely on crawling and extracting Schema.org markup, the precision, completeness, and interconnectedness of your site's structured data determine whether NLWeb can surface your content at all.

This changes the calculus for every business investing in AI visibility. Structured data was already important for rich results in traditional search and for AI citation likelihood. NLWeb raises the stakes further — your Schema.org implementation is no longer just a visibility tactic. It is the fundamental infrastructure that determines whether AI agents can interact with your website at all.

As Search Engine Land puts it: "Robust, entity-first schema optimization is no longer just a way to win a rich result; it is the fundamental barrier to entry for the agentic web."
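
What "entity-first" markup looks like in practice: below, a Product references its publisher Organization by `@id` inside a single `@graph`, so the two entities are explicitly linked rather than floating as disconnected blobs. The names and URL are hypothetical; the JSON-LD shape follows standard Schema.org conventions.

```python
import json

# Two linked entities in one @graph: the Product's brand points at the
# Organization's @id, giving crawlers an explicit entity relationship.
org = {
    "@type": "Organization",
    "@id": "https://example.com/#org",
    "name": "Example Co",
}
product = {
    "@type": "Product",
    "name": "Trail Runner 2",
    "brand": {"@id": "https://example.com/#org"},
    "offers": {"@type": "Offer", "price": "129.00", "priceCurrency": "USD"},
}
markup = json.dumps(
    {"@context": "https://schema.org", "@graph": [org, product]},
    indent=2,
)
```

Embedded in a `<script type="application/ld+json">` tag, this is exactly the kind of interconnected structured data the protocol consumes.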

Who Is Already Using NLWeb

Microsoft launched NLWeb with twelve early adopters spanning publishing, commerce, and technology, including:

  • Shopify — product catalogue queries via natural language
  • Tripadvisor — "family-friendly restaurants in Barcelona with outdoor seating" returns structured Schema.org data rather than a list of links
  • Eventbrite — event discovery through conversational queries
  • O'Reilly Media — technical content accessible to AI agents
  • Common Sense Media — media reviews queryable by parents and AI assistants
  • Hearst — publishing content exposed as queryable endpoints

On the implementation side, Yoast announced Schema Aggregation in March 2026 — a feature that organises WordPress sites' structured data specifically to reduce the technical effort required to build NLWeb integration. This gives the millions of WordPress sites running Yoast a direct on-ramp.

The Four Protocols Shaping the Agentic Web

NLWeb does not exist in isolation. It is one of four protocols collectively defining how AI agents interact with the web — and they are being adopted at unprecedented speed:

  1. MCP (Model Context Protocol) — Anthropic's standard for connecting AI to tools and data. 97 million monthly SDK downloads. Universal platform adoption in 12 months.
  2. A2A (Agent2Agent) — Google's protocol for agents from different vendors to discover and collaborate with each other. From 50 to 150+ organisations in three months.
  3. NLWeb — Microsoft's protocol for making websites conversationally queryable. Major publisher adoption at launch.
  4. AGENTS.md — Standardised guidance files for AI coding agents. 60,000+ open-source projects adopted within months.

These protocols are coordinated through the Linux Foundation's Agentic AI Foundation (AAIF), whose eight platinum members include AWS, Anthropic, Google, Microsoft, and OpenAI. This is not speculative — the companies building the AI agents are simultaneously building the protocols those agents will use.

What Your Business Should Do Now

You do not need to deploy NLWeb today. But you do need to prepare the foundation it requires — which, not coincidentally, is the same foundation that improves your AI visibility across every platform right now.

1. Audit your structured data for completeness

NLWeb consumes Schema.org markup. If your structured data is incomplete, disconnected, or inaccurate, NLWeb cannot represent your business to AI agents. Audit your JSON-LD for entity relationships — do your Product, Organisation, LocalBusiness, and FAQPage types properly reference each other?
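
One concrete audit check you can automate: find `@id` references in a JSON-LD `@graph` that point at no defined entity. This is a minimal sketch (it treats a bare `{"@id": ...}` object as a reference), not a full Schema.org validator.

```python
def dangling_refs(jsonld: dict) -> set[str]:
    """Return @id references in a JSON-LD @graph that resolve to no defined entity."""
    nodes = jsonld.get("@graph", [])
    defined = {n["@id"] for n in nodes if isinstance(n, dict) and "@id" in n}
    refs: set[str] = set()

    def walk(node) -> None:
        if isinstance(node, dict):
            # A dict whose only key is @id is a reference, not a definition.
            if set(node) == {"@id"}:
                refs.add(node["@id"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(nodes)
    return refs - defined
```

An empty result means every cross-reference in your markup lands on a real entity; anything returned is a broken link in your entity graph.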

2. Think entities, not pages

The entity-first approach that drives AI visibility is exactly what NLWeb needs. Make sure your brand is a well-defined entity in your structured data, with clear relationships to your products, services, locations, and people.

3. Ensure your content is server-rendered

NLWeb crawls your site to extract structured data. If your content is rendered client-side with JavaScript, NLWeb — like every other AI crawler — cannot access it. Server-side rendering is non-negotiable.
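
You can spot-check this yourself: fetch a page with a plain HTTP client (which, like AI crawlers, executes no JavaScript) and look for JSON-LD in the raw HTML. The URL-fetching helper is an untested convenience wrapper; the detection logic is the testable part.

```python
import re
import urllib.request


def has_static_jsonld(html: str) -> bool:
    """True if the raw (pre-JavaScript) HTML already contains JSON-LD markup."""
    return bool(
        re.search(r'<script[^>]+type=["\']application/ld\+json["\']', html, re.I)
    )


def check_url(url: str) -> bool:
    # urllib executes no JavaScript, so this sees roughly what AI crawlers see.
    with urllib.request.urlopen(url) as resp:
        return has_static_jsonld(resp.read().decode("utf-8", "replace"))
```

If this check fails on your pages but the markup appears in the browser's rendered DOM, your structured data is being injected client-side — invisible to NLWeb and most AI crawlers.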

4. Monitor the NLWeb ecosystem

Yoast's Schema Aggregation is the first major integration tool. If you run WordPress, evaluate it. If you run a custom stack, watch the NLWeb GitHub repository for integration guides and community connectors.

5. Test your AI visibility baseline

Before optimising for any new protocol, know where you stand. Measure how AI agents currently perceive and cite your brand across ChatGPT, Perplexity, Gemini, Claude, and other platforms. That baseline tells you which gaps to close first — whether through structured data improvements, content restructuring, or protocol adoption.

The Bigger Picture

NLWeb represents a philosophical shift in how the web works. The web's purpose is evolving from a link graph designed for human clicks to a queryable knowledge graph optimised for machine interaction. The brands that succeed in agentic AI are those that treat their websites as queryable endpoints, not just collections of pages.

The good news: the investment required to prepare for NLWeb is the same investment that improves your AI visibility today. Better structured data. Clearer entity definitions. Server-rendered content. Machine-readable markup. These are not future-proofing exercises — they are the foundations of AI search visibility right now, and the protocol layer is simply making that visibility programmable.

Tags: ai-search · ai-visibility · structured-data · ai-optimization · ai-discoverability
