Robots & Indexing

Per-page noindex/nofollow, site-wide auto-noindex for unpublished and hidden pages, granular Google directives (max-snippet, max-image-preview), and AI/LLM opt-out.

How SEO NEO handles robots

SEO NEO intentionally omits the <meta name="robots"> tag for pages in the default index,follow state. Crawlers assume index,follow when no tag is present, so emitting a no-op tag would only add bytes. The tag appears only when a non-default directive is needed.

Per-page noindex and nofollow

Every page with seoneo_tab on its template shows two checkboxes on the SEO tab:

  • Noindex — adds noindex to the robots directive for this page. Tells search engines not to include this page in their index.
  • Nofollow — adds nofollow to the robots directive. Tells bots not to follow links on this page.

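These take precedence over the auto-noindex default for hidden pages, but not over the site-wide kill switches or the auto-noindex rule for unpublished pages, both described below.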

Site-wide kill switches

Two checkboxes at the top of Modules > Configure > SeoNeo let you flip the entire site to noindex/nofollow without touching individual pages. These are the highest-priority signal in SEO NEO — they override every per-page checkbox and per-template hook.

  • Site-wide noindex — when enabled, every page is rendered with noindex. The intended use is staging environments, pre-launch previews, or an emergency "take us out of the index" while you fix something.
  • Site-wide nofollow — when enabled, every page is rendered with nofollow. Less commonly needed, but useful in the same staging / pre-launch contexts.

Important: remember to turn these off before you go live. A common deployment mistake is to leave the site-wide noindex on in staging, copy the database to production, and ship the kill switch with it. Add this check to your launch checklist.
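One way to automate that check is to inspect the rendered HTML of a few key pages after deployment. The sketch below is a minimal example using only the Python standard library; the class and function names are illustrative and are not part of SEO NEO. It works on an HTML string, so fetching the page is left to whatever HTTP client you already use.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def blocks_indexing(html):
    """True when the page asks search engines not to index it."""
    found = robots_directives(html)
    # "none" is shorthand for "noindex,nofollow".
    return "noindex" in found or "none" in found


# Hypothetical sample pages: one with the kill switch still on, one without.
staging_html = '<head><meta name="robots" content="noindex,nofollow"></head>'
live_html = "<head><title>Home</title></head>"

print(blocks_indexing(staging_html))
print(blocks_indexing(live_html))
```

A deployment script could fail the release when blocks_indexing() is true for the production home page.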

Auto-noindex defaults

Two further toggles handle the most common edge cases automatically:

  • Auto-noindex unpublished pages — enabled by default. ProcessWire allows superusers and editors with view-permission to render unpublished pages on the frontend (for preview). Without this toggle a search engine following an internal preview link could index a draft. With it on, any unpublished page gets noindex regardless of its checkbox state.
  • Auto-noindex hidden pages — off by default. Hidden pages are publicly viewable; enable this if you use the Hidden flag as a "not for search" signal (e.g. utility pages, redirect stubs).

Per-page checkboxes override the hidden-page auto-default. Auto-noindex for unpublished pages applies even when the page's Noindex checkbox is unchecked (drafts cannot opt into the index). Site-wide kill switches win over everything else.
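The rules in this section and the two before it can be summarised as a small decision function. This is an illustrative model, not SEO NEO's internal code: the class and field names are hypothetical, hooks are ignored, and it assumes an additive reading in which each source can only add noindex or nofollow. How an unchecked box interacts with the hidden-page default is not spelled out above, so treat that part as an assumption.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    """Hypothetical snapshot of one page's status and SEO tab checkboxes."""
    unpublished: bool = False
    hidden: bool = False
    noindex_checked: bool = False
    nofollow_checked: bool = False


@dataclass
class SiteConfig:
    """Hypothetical mirror of the module config toggles described above."""
    sitewide_noindex: bool = False
    sitewide_nofollow: bool = False
    auto_noindex_unpublished: bool = True   # documented default: enabled
    auto_noindex_hidden: bool = False       # documented default: off


def resolve(page, config):
    """Return (noindex, nofollow) for a page under the additive reading."""
    noindex = (
        config.sitewide_noindex                                    # kill switch
        or (config.auto_noindex_unpublished and page.unpublished)  # drafts
        or page.noindex_checked                                    # SEO tab
        or (config.auto_noindex_hidden and page.hidden)            # hidden flag
    )
    nofollow = config.sitewide_nofollow or page.nofollow_checked
    return noindex, nofollow


# A draft is noindexed by default even with its checkbox unchecked.
print(resolve(PageState(unpublished=True), SiteConfig()))
# A hidden page stays indexable until the hidden-page toggle is enabled.
print(resolve(PageState(hidden=True), SiteConfig()))
```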

Granular Google directives

Four optional settings in module config compose additional directives into the same <meta name="robots"> tag. All default to empty (nothing emitted) until you opt in:

  • max-snippet: caps the character length of Google's text snippet in search results for this site. Use -1 for no limit, 0 to suppress snippets entirely, or a positive integer for the cap. Useful for paywalled content or news sites with snippet licensing.
  • max-image-preview: controls image preview size in search results: none (no image preview), standard (a default-size preview), or large (a larger preview, up to the width of the viewport). Leave blank to let Google choose.
  • max-video-preview: the number of seconds of video preview allowed. Use -1 for no limit, or 0 to allow at most a static image.
  • unavailable_after: an RFC 850 or ISO 8601 datetime. Google stops showing the page in search results after this point. Useful for event listings, time-limited offers, or embargoed news. Example: 2026-12-31T23:59:59+00:00.

When any of these is configured alongside a noindex, they all appear in the same tag:

<meta name="robots" content="noindex,nofollow,max-snippet:50,max-image-preview:large">

These directives are site-wide. For per-template or per-page overrides, hook ___getRobotsDirectives($page) — see Hooks & Customisation.
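To make the composition concrete, here is a minimal sketch of how such a tag could be assembled from the settings above. It is not the module's implementation: the function and parameter names are hypothetical, and the choice to write an explicit index or follow whenever the tag is emitted is inferred from the sample tags on this page.

```python
def build_robots_content(noindex=False, nofollow=False, max_snippet=None,
                         max_image_preview=None, max_video_preview=None,
                         unavailable_after=None):
    """Return the content attribute value, or None when no tag is needed."""
    granular = []
    if max_snippet is not None:
        granular.append(f"max-snippet:{max_snippet}")
    if max_image_preview:
        if max_image_preview not in ("none", "standard", "large"):
            raise ValueError("max_image_preview must be none, standard or large")
        granular.append(f"max-image-preview:{max_image_preview}")
    if max_video_preview is not None:
        granular.append(f"max-video-preview:{max_video_preview}")
    if unavailable_after:
        granular.append(f"unavailable_after:{unavailable_after}")

    # Default state with nothing configured: omit the tag entirely.
    if not noindex and not nofollow and not granular:
        return None

    base = ["noindex" if noindex else "index",
            "nofollow" if nofollow else "follow"]
    return ",".join(base + granular)


def render_robots_meta(**settings):
    """Return the full meta tag, or an empty string when it is omitted."""
    content = build_robots_content(**settings)
    if content is None:
        return ""
    return f'<meta name="robots" content="{content}">'


print(render_robots_meta(noindex=True, nofollow=True,
                         max_snippet=50, max_image_preview="large"))
```

Leaving every setting at None mirrors the "empty until you opt in" default: render_robots_meta() with no arguments returns an empty string.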

AI / LLM opt-out directives

Two optional toggles in module config append AI-specific directives to the robots tag:

  • noai — asks AI crawlers not to use the site's content for generative-AI training.
  • noimageai — asks AI crawlers not to include the site's images in AI training datasets.

These are honoured by some AI crawlers (the spec originates from DeviantArt). They're a polite request, not enforcement. For stronger blocking, deny GPTBot, ClaudeBot, PerplexityBot, and others at the robots.txt or HTTP level — that's outside SEO NEO's scope, but a simple site/templates/robots.php override or MarkupRobotsTxt handles it.

When both toggles are enabled on a page in the default index,follow state, the output looks like:

<meta name="robots" content="index,follow,noai,noimageai">
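If you take the stronger robots.txt route mentioned above, it can help to test a draft before deploying it. The sketch below uses Python's standard urllib.robotparser to check offline which crawlers a draft would block. The robots.txt content and the example.com URL are illustrative assumptions, and the user-agent tokens are the ones named in this section.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt draft: deny three AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True when robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"):
    print(agent, allowed(ROBOTS_TXT, agent, "https://example.com/news/"))
```

Like noai, a robots.txt rule only works for crawlers that choose to obey it; blocking at the HTTP level is the only option that does not rely on cooperation.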

Canonical URLs and pagination

Robots directives and canonical URLs work together. A page with noindex still emits a canonical; some SEOs prefer this on large sites because it keeps crawl signals consolidated. For paginated series, SEO NEO's default canonical policy includes the page number in the URL (/news/page2/), so each paginated variant has its own canonical rather than all of them collapsing to /news/. See Smart-Map & Fallbacks and the canonical section of Configuration for more detail.
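The pagination policy can be illustrated with a small helper. This is a sketch rather than the module's code: the function name is hypothetical, the "page" prefix is taken from the /news/page2/ example, and mapping page 1 to the unsuffixed URL is an assumption.

```python
def paginated_canonical(base_url, page_num, prefix="page"):
    """Build a per-page canonical path such as /news/page2/."""
    if page_num < 1:
        raise ValueError("page_num must be 1 or greater")
    base = base_url if base_url.endswith("/") else base_url + "/"
    if page_num == 1:
        # Assumption: the first page keeps the plain URL.
        return base
    return f"{base}{prefix}{page_num}/"


print(paginated_canonical("/news/", 1))
print(paginated_canonical("/news/", 2))
```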

Common gotchas

  • Page appears in Google despite noindex. Google only sees the noindex tag if it can crawl the page. If the URL is also disallowed in robots.txt, Google obeys the disallow, never fetches the page, and so never reads the noindex; the URL can then stay in the index. Remove the robots.txt block and let Google recrawl the page.
  • Auto-noindex not firing on a staging site. Auto-noindex only triggers on unpublished or hidden pages. To block an entire staging domain, flip the Site-wide noindex kill switch in module config — or, for stronger isolation, add HTTP basic auth at the web server.
  • The robots tag shows even though the page is index,follow. Check whether a granular directive or AI opt-out is configured in module config — those force the tag to appear even on default-state pages.

See also

  • Configuration — where to find the site-wide robots defaults and granular directive settings.
  • Hooks & Customisation — per-template robots overrides via ___getRobotsDirectives($page).
  • The SEO Tab — per-page noindex and nofollow checkboxes in the page editor.
