Page Finder: List Every URL on a Website
A page finder is any method or tool that lists every URL on a website. Whether you are auditing your own site, sizing up a competitor, or just trying to remember which pages you published two years ago, the goal is the same: produce a complete, deduplicated list of the URLs that make up a domain. The catch is that no single source holds the full picture — sitemaps, search engines, and crawlers each see a different slice of the site. This guide shows you exactly how each method works, where each one falls short, and how to combine them so nothing slips through.
What a Page Finder Actually Does
A page finder answers one question: which URLs exist on this domain? That sounds simple, but a real website is rarely a tidy folder of files. Pages are generated dynamically, hidden behind filters, paginated into hundreds of variations, or published and then quietly forgotten. A good page finder doesn't just grab the obvious links from the navigation — it reaches the pages that are buried, orphaned, or only reachable through search.
There are three fundamentally different ways to build that list, and understanding the difference between them is the whole game:
- Ask the site what it wants you to see: read its XML sitemap, the file the owner publishes specifically to list important pages.
- Ask Google what it has indexed: use the site: search operator to see a sample of pages already in Google's index.
- Walk the site yourself: run a crawler that starts at the homepage and follows every internal link, the way a search engine bot does.
Each approach finds pages the others miss. Used together, they get you very close to a truly complete list.
The Three Core Methods: Sitemap, Google Search, and a Crawler
1. The XML sitemap
The fastest starting point is the site's XML sitemap — a machine-readable file the owner publishes to tell search engines which URLs matter. It usually lives at one of these paths:
- https://example.com/sitemap.xml
- https://example.com/sitemap_index.xml
- https://example.com/sitemap-index.xml
Large sites rarely use a single flat file. Instead they publish a sitemap index — a sitemap that lists other sitemaps. A single sitemap file is capped at 50,000 URLs and 50 MB uncompressed, so a site with hundreds of thousands of pages splits them across many child sitemaps (one for posts, one for products, one for categories) and ties them together with an index file. To list every URL, you fetch the index, then fetch each child sitemap it references.
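You can sketch the "fetch the index, then fetch each child sitemap" loop in Python with only the standard library. This is a minimal illustration, not how any particular tool works. The function names list_sitemap_urls and http_fetch are hypothetical. The sketch assumes uncompressed XML in the sitemaps.org namespace, so gzipped .xml.gz sitemaps would need decompressing first. It also skips error handling.

```python
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.request import urlopen

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}
INDEX_TAG = f"{{{SITEMAP_NS}}}sitemapindex"


def http_fetch(url: str) -> bytes:
    """Download a sitemap file (hypothetical helper: no retries, no gzip support)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def list_sitemap_urls(start_url: str, fetch: Callable[[str], bytes]) -> list[str]:
    """Return every page URL reachable from a sitemap or sitemap index."""
    pages: list[str] = []
    visited: set[str] = set()
    queue = [start_url]
    while queue:
        sitemap_url = queue.pop(0)
        if sitemap_url in visited:
            continue  # guard against index files that reference each other
        visited.add(sitemap_url)
        root = ET.fromstring(fetch(sitemap_url))
        if root.tag == INDEX_TAG:
            # A sitemap index: queue every child sitemap it lists.
            queue.extend(loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS))
        else:
            # A regular <urlset>: collect the page URLs themselves.
            pages.extend(loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS))
    return pages
```

Calling list_sitemap_urls("https://example.com/sitemap_index.xml", http_fetch) would return the page URLs from every child sitemap the index references. The fetcher is passed in as a parameter, so you can test the parsing logic without network access.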
The sitemap is the cleanest list you'll get — but it only contains what the owner chose to submit. Pages can be live and perfectly crawlable yet absent from the sitemap, and conversely sitemaps sometimes list stale URLs that no longer exist. Treat it as a strong starting point, not the final word. (For a deeper walkthrough, see our guide on how to find a website's sitemap.)
2. The Google site: operator
Type this straight into Google to see pages it has indexed for a domain:
- site:example.com
- site:example.com/blog
- site:example.com inurl:product
The site: operator is instant, requires no tools, and works on any public website — including competitors. You can narrow it with a path (site:example.com/blog) or combine it with inurl: to filter by URL pattern.
But it has real limits. The result count Google shows is a rough estimate, not an exact figure, and it often disagrees with reality by a wide margin. Google only returns pages it has actually indexed — not every page that exists — and it paginates results, so you typically can't scroll past a few hundred entries even on a site with thousands of pages. Use site: as a quick sanity check on what's indexed, never as a definitive page inventory.
3. A crawler
A crawler is the most thorough page finder. It starts at a seed URL (usually the homepage), reads the HTML, extracts every link, then visits those links and repeats — mapping the entire site the way Googlebot does. Desktop tools like Screaming Frog SEO Spider and Sitebulb, and cloud crawlers built into Ahrefs and Semrush, all work this way. Because a crawler follows actual links, it surfaces pages that never made it into the sitemap — as long as something on the site links to them. That last condition is the crawler's one blind spot, and it's the reason orphan pages are so easy to lose track of (more on that below).
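The follow-every-link loop is easier to picture in code. Below is a minimal breadth-first crawler sketch in Python. It illustrates the idea under stated assumptions and is not how Screaming Frog or any other tool is implemented:

- crawl, LinkExtractor and http_fetch are hypothetical names.
- Only <a href> links on the same host are followed.
- The max_pages cap is arbitrary.

A real crawler would also honour robots.txt, throttle its requests, and handle redirects and canonical tags.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url: str) -> Optional[str]:
    """Return a page's HTML, or None for errors and non-HTML responses."""
    try:
        with urlopen(url, timeout=10) as response:
            if "text/html" not in response.headers.get("Content-Type", ""):
                return None
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return None


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 500) -> list[str]:
    """Breadth-first crawl that follows internal links outward from `seed`."""
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([seed])
    found: list[str] = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable or not HTML: don't record or expand it
        found.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments (same page, different anchor).
            absolute, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(absolute)
            if parts.scheme in ("http", "https") and parts.netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found
```

The crawler only learns about a URL from a link on a page it has already fetched. A page that nothing links to therefore never enters the queue, which is the orphan-page blind spot this paragraph describes.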
| Method | Finds | Misses |
|---|---|---|
| XML sitemap | URLs the owner submitted | Pages left out of the sitemap |
| Google site: | A sample of indexed pages | Non-indexed pages; capped result list |
| Crawler | Every internally linked page | Orphan pages with no inbound links |
How to Find Pages When a Site Has No Sitemap
Plenty of sites — small business pages, older builds, hand-coded sites — have no sitemap at all. That doesn't stop you from finding their pages. Work through these in order:
- Check robots.txt anyway. Even sites without an obvious sitemap often declare one in /robots.txt. Open example.com/robots.txt and scan for a Sitemap: directive. It can point to a path you'd never have guessed.
- Crawl from the homepage. With no sitemap, a crawler becomes your primary tool. Point Screaming Frog (or any crawler) at the root URL and let it follow internal links to map the reachable site.
- Mine Google's index. Run site:example.com and page through the results. Anything Google has indexed was a real, public URL when it was crawled, and some of those URLs may not be reachable by the crawler.
- Use a third-party index. SEO tools like Ahrefs and Semrush keep their own crawl of the web. Their “Top Pages” or “Site Structure” reports can list URLs your own crawl missed, drawn from links they've seen across the wider web.
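You can automate the robots.txt check with Python's standard urllib.robotparser module. Its site_maps() method (Python 3.8 and later) returns the URLs from Sitemap: lines. The sketch below is a hedged illustration. The helper names are hypothetical, and falling back to the conventional paths is this example's own assumption, not a description of any particular sitemap finder. Fetching the robots.txt text itself is left to the caller.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# The conventional sitemap locations listed earlier in this guide.
STANDARD_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml"]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on Sitemap: lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap: lines.
    return parser.site_maps() or []


def candidate_sitemaps(base_url: str, robots_txt: str) -> list[str]:
    """Declared sitemaps first, then the conventional paths, without duplicates."""
    declared = sitemaps_from_robots(robots_txt)
    guessed = [urljoin(base_url, path) for path in STANDARD_PATHS]
    # dict.fromkeys removes repeats while preserving order.
    return list(dict.fromkeys(declared + guessed))
```

Because the Sitemap: directive is independent of any User-agent group, it can appear anywhere in the file. Parsing the whole body rather than scanning only the top therefore catches declarations placed at the end.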
Merge the output of all four steps and deduplicate. The union of a crawl plus Google's index plus a third-party index is usually as complete as a public page finder can get without server access.
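Merging is a set union. Tracking which source reported each URL also makes the gaps between methods visible. Here is a minimal Python sketch. The function and source names are hypothetical, and it removes only exact duplicates, so variant forms of the same URL still need normalising.

```python
from collections import defaultdict


def merge_sources(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Union several URL lists, recording which source(s) reported each URL."""
    found_by = defaultdict(set)
    for source_name, urls in sources.items():
        for url in urls:
            found_by[url.strip()].add(source_name)  # exact-match dedup only
    return dict(found_by)


def found_only_by(merged: dict[str, set[str]], source_name: str) -> list[str]:
    """URLs that a single source reported and every other source missed."""
    return sorted(url for url, names in merged.items() if names == {source_name})
```

Running found_only_by for each source shows what every method uniquely contributes. For example, it lists the URLs that only Google's index knew about.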
Listing All Subpages and URLs of a Domain
Once you have a raw list, the work shifts from finding URLs to organising them. A flat dump of 4,000 URLs isn't useful; a grouped, deduplicated inventory is. A few practical steps:
- Normalise duplicates. Collapse trailing-slash variants, http vs https, and www vs non-www so the same page isn't counted three times.
- Strip or group URL parameters. Filter and tracking parameters (?color=red, ?utm_source=...) can balloon a few hundred real pages into thousands of near-duplicates. Decide which parameters create genuinely distinct pages and fold the rest together.
- Group by path. Sort by directory (/blog/, /products/, /docs/) to see the site's structure at a glance and spot whole sections you forgot about.
- Check status codes. A useful inventory flags which URLs return 200, which redirect, and which are broken (404/5xx). A “page” that 404s isn't really a page.
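You can sketch the normalising, grouping and status-flagging steps in Python. The rules here are illustrative assumptions:

- https, the non-www host, and no trailing slash are treated as canonical.
- The TRACKING_PARAMS set is a hypothetical starting list.
- Any other query parameter is kept as page-defining.

Check each rule against the site's actual canonical URLs before folding anything together.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical starting list: parameters assumed never to create a distinct page.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "gclid", "fbclid"}


def normalise(url: str) -> str:
    """Map common duplicate variants of a URL onto one canonical form."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]  # assumption: the non-www host is canonical
    path = parts.path or "/"
    if path != "/":
        path = path.rstrip("/") or "/"  # assumption: no trailing slash is canonical
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k not in TRACKING_PARAMS]
    query = urlencode(sorted(kept))  # sorted so ?a=1&b=2 and ?b=2&a=1 match
    return urlunsplit(("https", host, path, query, ""))  # force https, drop #fragment


def section_of(url: str) -> str:
    """First directory of the path, e.g. /blog/ for /blog/some-post."""
    first = urlsplit(url).path.strip("/").split("/")[0]
    return f"/{first}/" if first else "/"


def group_by_section(urls: list[str]) -> dict[str, list[str]]:
    """Normalise, deduplicate, and group URLs by their top-level directory."""
    groups = defaultdict(list)
    for url in sorted({normalise(u) for u in urls}):
        groups[section_of(url)].append(url)
    return dict(groups)


def status_bucket(code: int) -> str:
    """Classify an HTTP status code for the inventory."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    return "broken"  # 4xx, 5xx, and anything unexpected
```

In this sketch, five raw variants of a blog URL and a product URL collapse to one entry each, grouped under /blog/ and /products/.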
For a step-by-step version of this process aimed at your own site, see our guide on how to find all pages on a website.
Pages Google Has Indexed vs All Crawlable Pages
This is the single most misunderstood part of finding pages, so it's worth being precise. There are three different populations of URLs, and they are almost never the same size:
- Pages that exist: every URL that returns content, whether or not anything links to it.
- Pages Google has discovered: URLs Google knows about because it found them via links, your sitemap, or a previous crawl. “Discovered” does not mean indexed; Google may know a page exists and still not have crawled or stored it.
- Pages Google has indexed: the subset Google actually crawled, judged worth keeping, and made eligible to appear in search. This is what site: roughly reflects.
The gap between “discovered” and “indexed” is where most SEO problems hide. A page can be perfectly crawlable and listed in your sitemap, yet still show as “Discovered – currently not indexed” in Google Search Console. That status means Google has found the URL but not yet crawled it, typically because it postponed the crawl or treated the page as low priority. So a page finder gives you the raw inventory, while Search Console's Pages report tells you what Google did with each URL. You need both. If you want to verify which of your pages made it into the index, our guide on how to check if your site is indexed walks through every method.
Why Orphan Pages Break Most Page Finders
An orphan page is a live URL that nothing on the site links to. It exists, it loads, but no menu, footer, or in-content link points at it. Orphan pages defeat crawlers entirely — a crawler can only reach what it can follow a link to, so a page with zero inbound internal links is invisible to a crawl no matter how thorough it is. They're also frequently missing from the sitemap, because the same neglect that orphaned them usually means nobody added them to the sitemap either.
That's why a page with no links and no sitemap entry can slip past all three core methods at once. The only reliable way to surface true orphans is data that records actual hits or known URLs regardless of internal linking:
- Server access logs — every URL a bot or user has requested, linked or not.
- Analytics data — any page that received a visit is a page that exists.
- Google Search Console's Pages report — lists URLs Google has encountered, including some you never linked.
The standard technique is to crawl the site, export the sitemap, pull these data sources, and then compare the lists. URLs that appear in your logs or analytics but not in the crawl are your orphans — and finding them is half the reason to run a page finder in the first place.
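You can sketch the comparison step in Python for a server log in the common Combined Log Format. This is a minimal example under several assumptions:

- The regular expression only handles GET and HEAD request lines.
- Only 200 responses count.
- Static assets are skipped by file extension.
- Analytics data is exported as a plain list of paths.

The function names are hypothetical. On a real site, normalise log paths and crawled URLs the same way first, or trailing-slash differences will show up as false orphans.

```python
import re
from urllib.parse import urlsplit

# Request line and status code of a Common/Combined Log Format entry, e.g.
# ... "GET /blog/post-1 HTTP/1.1" 200 5120 ...
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

# Assumption: requests for these file types are assets or config files, not pages.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".webp", ".woff2", ".xml", ".txt")


def page_paths_from_log(lines) -> set[str]:
    """Paths that returned 200 in the access log, ignoring query strings and assets."""
    paths = set()
    for line in lines:
        match = LOG_RE.search(line)
        if not match or match.group(2) != "200":
            continue
        path = match.group(1).split("?", 1)[0]
        if not path.lower().endswith(ASSET_SUFFIXES):
            paths.add(path)
    return paths


def find_orphans(crawled_urls, log_lines, analytics_paths=()) -> list[str]:
    """Paths seen in logs or analytics that the crawl never reached."""
    crawled_paths = {urlsplit(url).path or "/" for url in crawled_urls}
    seen_elsewhere = page_paths_from_log(log_lines) | set(analytics_paths)
    return sorted(seen_elsewhere - crawled_paths)
```

The same set difference works for the sitemap export: subtracting the crawled paths from the sitemap's paths lists URLs that the sitemap knows about but no internal link reaches.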
Free Ways to See Every Page on a Website
You don't need paid tools to get most of the way there. The free stack:
- The sitemap, read directly. Open /sitemap.xml (or the index it points to) in your browser and read the URLs straight from the XML. Completely free, and the cleanest list available.
- Google site: search. Free, instant, works on any public site. Best for a quick read on what's indexed.
- Screaming Frog (free tier). The free version of the SEO Spider crawls up to 500 URLs, enough to fully map many small sites.
- Google Search Console (for your own site). Free, and its Pages report is the most accurate view of which of your URLs Google has indexed or excluded — including ones a crawl would miss.
- A free online page finder. Tools like our sitemap finder locate a site's sitemap and list its URLs without you installing anything.
Using SitemapFixer to Find and Audit Every Page
Finding URLs is step one. The reason you want the list is usually to act on it — to see which pages are missing from the sitemap, which are broken, which Google is ignoring, and which need fixing. That's where SitemapFixer comes in. Point it at your domain and it locates your sitemap, pulls every URL, and audits the whole set in one pass.
- Use the sitemap finder to locate the sitemap (including ones declared only in robots.txt or split across a sitemap index) and list every URL it contains.
- Run the sitemap checker to validate that list — flagging broken URLs, redirects, non-canonical entries, and pages that shouldn't be in the sitemap.
- Get AI-generated fixes for each issue, so finding the pages and fixing them happens in the same workflow instead of across five different tools.
A page finder that only lists URLs leaves you with homework. One that lists and audits turns the inventory into an action plan.
Frequently Asked Questions
How do I find all pages on a website?
Start with the XML sitemap (usually at /sitemap.xml or listed in /robots.txt) — it lists the URLs the owner wants indexed. To catch pages missing from it, run a crawler like Screaming Frog from the homepage, which follows every internal link. For a quick public estimate of indexed pages, search site:example.com in Google. Combining all three gives the most complete list.
How do I see every URL on a site?
There's no single source that shows literally every URL. The sitemap shows what the owner submitted, a crawler shows every internally linked page, and Google's site: operator shows a sample of indexed pages. Orphan pages with no links and no sitemap entry can be missed by all three — those usually only surface in server logs or analytics. Merge a sitemap export with a full crawl and deduplicate for the fullest picture.
Can I find pages not in the sitemap?
Yes. A crawler that follows internal links will find linked pages left out of the sitemap. Pages with no internal links at all (orphan pages) can't be found by crawling alone — you need server logs, analytics, or Google Search Console's Pages report. Comparing a crawl against the sitemap is the fastest way to spot what the sitemap is missing.
How do I find a website's sitemap?
Try example.com/sitemap.xml first. If it 404s, open example.com/robots.txt and look for a Sitemap: line — it points to the real location, often a sitemap index (sitemap_index.xml) listing several child sitemaps. A sitemap finder checks all the standard paths and robots.txt for you automatically.
How many pages does my site have?
It depends which number you mean. The sitemap count is how many URLs you submitted. The crawl count is how many internally linked pages exist. The indexed count (from Search Console or site:) is how many Google kept. These rarely match — the sitemap may list pages that no longer link anywhere, the crawl may find pages you forgot, and Google indexes only a subset. Use Search Console's Pages report for the most accurate indexed count.