---
name: ai-crossref-commentary
description: |
  Build the "Commentary & Insights" section of an AI Bot × AI Assistant Visits × Google
  Search Clicks crossref report. Produces five tabs (Headline, Source Behaviour, Content
  Themes, Trend Reading, Actions) interpreting raw Ahrefs Web Analytics, Bot Analytics
  and GSC data, with a per-section methodology modal listing every endpoint, metric,
  step, and gotcha used. Trigger when the user asks for "AI bot vs AI visit vs Google
  cross-reference", "AI page priorities", "AI overlap commentary", or wants to add
  human-readable interpretation on top of the standard 3-channel crossref tables.
version: 1.0
inputs:
  - ahrefs_project_id (int64-as-string) with Bot Analytics + Web Analytics + GSC enabled
  - scope filter (full domain or URL prefix, e.g. /blog)
  - timeframe (current + prior windows of equal length, default 30d + 30d)
outputs:
  - One Commentary & Insights tabbed section embedded in a Flask/Jinja Console report
  - Five sub-tabs with editorial bullets (Headline / Source / Themes / Trends / Actions)
  - A methodology modal opened from a small "i" info button next to the section title,
    listing every connector endpoint, every metric extracted, numbered steps, and notes
---

# AI Crossref — Commentary & Insights section

This skill describes how to build the **Commentary & Insights** block of an "AI Bot
crawls × AI Assistant visits × Google Search clicks" crossref report. It is the
narrative layer on top of seven structured tables (Headline KPIs, Anomalies,
Top 25 overlap, per-channel Top 25, Source↔Bot, per-platform overlap, Trends).

The block contains:

1. A section heading `<h2>` with an inline `<button class="info-btn">i</button>` that
   opens a methodology modal.
2. A sub-tabs row (5 tabs) and 5 `<div class="tab-panel">` panes underneath.
3. A methodology modal at the end of the page, populated from a `METHODOLOGY[key]`
   record passed via `{{ methodology | tojson }}`.

The interpretation surfaced in every tab MUST trace back to a specific table on the
page. If the table and prose disagree, the table wins.

---

## 1. Data inputs — Ahrefs endpoints used

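The commentary tabs read from **11 numbered connector queries** against 3
endpoints (`bot_stats`, `stats`, `top_pages`). Several are issued twice, once for
the current window and once for the prior one, and query #5 fans out into one
call per platform. All filter to the same URL prefix (e.g. `/blog`) and the same
two equal-length windows.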

The exact connector IDs are from the agent platform's `ahrefs_web_analytics` and
`ahrefs_gsc` providers, which proxy the official Ahrefs API v3. See section 8
(Ahrefs MCP) for the equivalent MCP tool names.

### A. Bot Analytics (AI crawler side) — 5 calls

| # | Endpoint | Args | Why |
|---|---|---|---|
| 1 | `ahrefs_web_analytics.bot_stats` | `dimension=none`, `metrics=[count]`, `where: is_ai_bot=true AND page icontains_any /scope/` | Headline raw total of AI-bot hits in the window. One row. |
| 2 | `ahrefs_web_analytics.bot_stats` (current) | `dimension=bot_page`, `metrics=[count]`, AI-bot filter, scope filter, `limit=200`, `order_by_metric=count desc` | Top 200 URLs by AI-bot hits — feeds anomaly classification, editorial vs non-editorial split, Content Themes "what bots crawl", Actions "high-bot-low-web" identification. |
| 3 | `ahrefs_web_analytics.bot_stats` (prior) | Same as #2, prior-window dates | Trend Reading — Δ bot hits per URL. |
| 4 | `ahrefs_web_analytics.bot_stats` | `dimension=bot_name`, AI-bot filter, scope filter | Source Behaviour — which bots dominate; Meta/Apple zero-conversion observation. |
| 5 | `ahrefs_web_analytics.bot_stats` (×3, per platform) | `dimension=bot_page`, `where: page icontains_any /scope/ AND bot_name is [<platform bots>]`, `limit=100` | Three calls — one each for ChatGPT, Gemini, Perplexity — to compute editorial-only crawl per platform. Needed for the editorial crawl→visit ratio that drives Source Behaviour. |

**Critical metric note:** with `dimension=bot_page` for AI bots, use `count`, NOT
`visits` or `pageviews` (those return 0 for AI-classified bots — silent zero).

### B. Web Analytics (AI assistant visits side) — 4 calls

| # | Endpoint | Args | Why |
|---|---|---|---|
| 6 | `ahrefs_web_analytics.stats` (current) | `dimension=source`, `metrics=[visits]`, `where: source_channel is [llm] AND page icontains_any /scope/` | Headline LLM session total (5,005 in the reference run) and per-source share (ChatGPT 71% etc.). |
| 7 | `ahrefs_web_analytics.stats` (prior) | Same as #6, prior-window dates | Trend Reading — source-share movement period over period. |
| 8 | `ahrefs_web_analytics.stats` (current) | `dimension=page`, `metrics=[visits]`, `source_channel=llm`, scope filter, `limit=200` | URL-level web visits — feeds Content Themes and Actions identification of pages AI sends traffic to. |
| 9 | `ahrefs_web_analytics.stats` (prior) | Same as #8, prior-window dates | Trend Reading — per-URL Δ web visits. |

**Critical filter note:** `source_channel` is an **array** field — use
`{"field":"source_channel","is":["is",["llm"]]}`. `source` is a plain string —
`{"field":"source","is":["is_stringy","ChatGPT"]}`. They are NOT interchangeable.
Mixing the two raises a validation error.
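
As a minimal sketch, the two filter shapes can be kept behind small helpers so the array operator and the string operator never get swapped. The helper names are hypothetical; the dict shapes are copied from the note above, and how individual filters combine into a full `where` clause is not shown here.

```python
import json


def llm_channel_filter() -> dict:
    """Array-field filter: `source_channel` includes `llm`."""
    return {"field": "source_channel", "is": ["is", ["llm"]]}


def source_filter(name: str) -> dict:
    """String-field filter: `source` equals the given source name."""
    return {"field": "source", "is": ["is_stringy", name]}


print(json.dumps(llm_channel_filter()))
print(json.dumps(source_filter("ChatGPT")))
```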

### C. Google Search Console — 2 calls

| # | Endpoint | Args | Why |
|---|---|---|---|
| 10 | `ahrefs_gsc.top_pages` (current) | `project_id=<id>`, `period_from`/`period_to` = current window, `filters.url_contains_partial=/scope/`, `limit=500` | Headline GSC click total and URL-level clicks for the Top 25 GSC table, content-theme cross-reference (evergreens vs AI-meta), Actions "GSC-strong-AI-weak refresh" candidate list. |
| 11 | `ahrefs_gsc.top_pages` (prior) | Same as #10, prior-window dates | Trend Reading — Δ GSC clicks per URL. |

**Critical metric note:** GSC top_pages returns `metrics.clicks` inside a nested
`metrics` object (not the top-level `clicks` field, which is null in the response).
The `page` field is a **stringified Python dict** — extract `protocollessUrl` via
regex: `re.search(r"'protocollessUrl':\s*'([^']+)'", page_str)`.
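
A minimal sketch of reading one GSC row, assuming each row is a dict with a stringified `page` field and a nested `metrics` object as described above. The function name and the sample row are hypothetical; only the regex and the `metrics.clicks` location come from this skill.

```python
import re
from typing import Optional, Tuple

PROTOCOLLESS_RE = re.compile(r"'protocollessUrl':\s*'([^']+)'")


def parse_gsc_row(row: dict) -> Optional[Tuple[str, int]]:
    """Return (protocolless_url, clicks), or None when no URL can be extracted."""
    match = PROTOCOLLESS_RE.search(str(row.get("page", "")))
    if match is None:
        return None
    # Clicks live under the nested `metrics` object; top-level `clicks` is null.
    clicks = (row.get("metrics") or {}).get("clicks") or 0
    return match.group(1), int(clicks)


# Hypothetical row shaped after the description above.
sample = {
    "page": "{'protocollessUrl': 'example.com/blog/what-is-llms-txt/', 'other': 1}",
    "clicks": None,
    "metrics": {"clicks": 42},
}
print(parse_gsc_row(sample))
```

Returning `None` for unparseable rows keeps them out of the join instead of silently keying them under an empty URL.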

---

## 2. Metrics extracted

| Metric | Source | Used in |
|---|---|---|
| `count` per URL | bot_stats (`dimension=bot_page`) | Anomalies, Top 25 bot, overlap join, per-platform overlap, trend bot side |
| `count` per bot | bot_stats (`dimension=bot_name`) | Source ↔ Bot table; Source Behaviour ("heaviest crawler") |
| `visits` per URL | stats (`dimension=page`) | Top 25 web, overlap join, per-platform overlap, trend web side |
| `visits` per source | stats (`dimension=source`) | Headline ChatGPT-share figure (use this, NOT per-page sum; per-page double-counts multi-page sessions) |
| `metrics.clicks` per URL | gsc.top_pages | Top 25 GSC, overlap join, trend GSC side |
| Derived: `editorial_crawl` per platform | sum of `count` on URLs classified `editorial` from call #5 | Source Behaviour crawl→visit ratio |
| Derived: `crawl_to_visit_ratio` | `editorial_crawl / source_visits` | Source Behaviour ("most efficient converter") |
| Derived: AI Priority | `(bot * web * gsc) ** (1/3) * 3` (geometric mean × channel count) | Top 25 overlap ranking, Δ Priority in trends, Actions priority ordering |
| Derived: anomaly share | `(raw_total − editorial_total) / raw_total` | Headline non-editorial % |
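
The derived rows above can be sketched as three small functions. The formulas are the ones in the table; the function names, the zero-handling, and the sample numbers are assumptions for illustration.

```python
from typing import Optional


def anomaly_share(raw_total: int, editorial_total: int) -> float:
    """Share of AI-bot hits that landed on non-editorial URLs."""
    if raw_total <= 0:
        return 0.0
    return (raw_total - editorial_total) / raw_total


def crawl_to_visit_ratio(editorial_crawl: int, source_visits: int) -> Optional[float]:
    """Editorial crawl hits per LLM visit; None when there are no visits."""
    if source_visits <= 0:
        return None
    return editorial_crawl / source_visits


def ai_priority(bot: float, web: float, gsc: float) -> float:
    """Geometric mean of the three channels, multiplied by the channel count."""
    return (bot * web * gsc) ** (1 / 3) * 3


# Illustrative numbers only, not taken from any real report.
print(round(anomaly_share(1000, 200), 2))
print(crawl_to_visit_ratio(1060, 10))
print(round(ai_priority(8, 8, 8), 6))
print(ai_priority(500, 0, 40))  # a zero in any channel zeroes the score
```

Because the score is a product, a URL missing from any one channel scores zero, which matches the intersection rule used for the overlap table.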

---

## 3. Steps — how the commentary numbers are produced

1. Fire all 11 connector calls in parallel where possible. Cache raw JSON responses
   on disk; do not stream rows through chat context.
2. **Normalise URLs identically** across every result set: lowercase host, strip
   query and fragment, collapse double slashes, preserve trailing slash. Without
   this the channels won't join cleanly (e.g. `https://Example.com/x?ref=1` and
   `https://example.com/x` would count as two rows). Because the trailing slash
   is preserved, `/x` and `/x/` stay distinct URLs.
3. **Classify every URL** into `editorial` / `asset` / `wordpress` / `api`:
   - `asset` → path ends in `.(css|js|map|png|jpg|jpeg|svg|gif|webp|ico|woff|woff2|ttf|eot)`
   - `wordpress` → path contains `/wp-json`, `/wp-content`, `/wp-admin`, `/feed`,
     `/sitemap`, `/robots.txt`
   - `api` → path contains `/mcp/`, `/api/`, or host starts with `api.`/`app.`
   - everything else → `editorial`
4. Compute the **anomaly share** for the Headline tab. Drop non-editorial rows
   for every ranking, ratio, and "heaviest crawler" claim downstream. A single
   MCP or `wp-json/oembed/1.0/embed` endpoint can absorb 80%+ of all hits.
5. Compute **per-platform editorial crawl**: sum `count` on editorial URLs only
   from each of the three per-platform `bot_page` queries (call #5 family).
   Use the canonical platform→bots mapping:
   - ChatGPT → `GPTBot`, `OAI-SearchBot`, `ChatGPT-User`
   - Claude → `ClaudeBot`, `Claude-User`, `Claude-SearchBot`, `Claude-Code`,
     `Anthropic AI`
   - Perplexity → `PerplexityBot`, `Perplexity-User`
   - Gemini → `Google-Extended`, `Google-NotebookLM`, `Gemini-Deep-Research`,
     `GoogleAgent-Mariner`
   - Meta → `Meta-ExternalAgent`, `Meta-ExternalFetcher`
   - Apple → `Applebot`, `Applebot-Extended`
   - Mistral → `MistralAI-User`
   - Copilot → ∅ (Bing uses bingbot, not AI-classified — visits ✓, crawl ✗)
6. Compute `crawl_to_visit_ratio = editorial_crawl / source_visits` per platform,
   where `source_visits` is the per-source `visits` from call #6. This drives the
   "ChatGPT ~106:1", "Perplexity ~17:1" claims. Do NOT use raw `dimension=bot_name`
   totals (call #4) here; they include WordPress/asset contamination.
7. **Intersect three channels** for the overlap set: keep only URLs present in
   bot ∩ web ∩ gsc AND classified `editorial`. Compute AI Priority per URL.
   The Top 25 by AI Priority gives the content clusters that feed Content Themes.
8. Cross-reference the per-channel Top 25 tables to identify imbalanced pages:
   - high Bot / low Web → AI sees but doesn't pick → answer-quality refresh
   - low Bot / high Web → cited from training → freshness updates
   - strong GSC / weak AI → Google ranks but AI doesn't cite → modernise for
     2026 citation patterns
   These mappings feed the Actions tab.
9. **Trend math**: for every URL touching either window compute `pri_cur`,
   `pri_prev`, `delta_pri`, per-channel Δ% and `up`/`down` counts. Apply the
   balance filter (≥2 of 3 channels moving same direction). Rising/Declining/New
   lists feed Trend Reading.
10. Render the commentary as five tabs. Every bullet must reference either a
    KPI tile, a specific table row, or a specific derived ratio.
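
Steps 2, 3 and 7 above can be sketched as follows. This is one possible reading, assuming absolute URLs with a scheme (protocolless GSC URLs would need a scheme prepended first) and plain substring matching for the path markers; all function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

ASSET_RE = re.compile(
    r"\.(css|js|map|png|jpg|jpeg|svg|gif|webp|ico|woff|woff2|ttf|eot)$", re.I
)
WORDPRESS_MARKERS = ("/wp-json", "/wp-content", "/wp-admin", "/feed", "/sitemap", "/robots.txt")
API_MARKERS = ("/mcp/", "/api/")


def normalise_url(url: str) -> str:
    """Lowercase host, drop query and fragment, collapse repeated slashes."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)  # trailing slash is left untouched
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


def classify_url(url: str) -> str:
    """Return 'asset', 'wordpress', 'api' or 'editorial' (first match wins)."""
    parts = urlsplit(url)
    host, path = parts.netloc.lower(), parts.path.lower()
    if ASSET_RE.search(path):
        return "asset"
    if any(marker in path for marker in WORDPRESS_MARKERS):
        return "wordpress"
    if any(marker in path for marker in API_MARKERS) or host.startswith(("api.", "app.")):
        return "api"
    return "editorial"


def editorial_overlap(bot: dict, web: dict, gsc: dict) -> list:
    """URLs present in all three channel dicts and classified editorial."""
    shared = bot.keys() & web.keys() & gsc.keys()
    return sorted(u for u in shared if classify_url(u) == "editorial")


# Made-up per-URL counts keyed by normalised URL.
page = normalise_url("https://Example.com//blog//what-is-llms-txt/?utm=1#top")
embed = "https://example.com/wp-json/oembed/1.0/embed"
bot = {page: 90, embed: 4000}
web = {page: 12, embed: 1}
gsc = {page: 300, embed: 2}
print(page)
print(editorial_overlap(bot, web, gsc))
```

Substring matching follows the rules as written, so a slug such as `/blog/feedback-loops` would match `/feed`; tighten the markers if that is a concern for your site.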

---

## 4. The five sub-tabs — content template

Tab IDs and active state are controlled by a `data-tabgroup="insights"` div with
five `<button class="tab-btn" data-target="ins-…">` buttons and five
`<div class="tab-panel" id="ins-…">` panes. Vanilla JS toggles `.active`.

### Tab 1 — Headline
Bullet template — fill brackets from KPI tile values:

- AI bot crawls (top-200 sample): **{bot_raw}** hits, of which **{anomaly_pct}%**
  are non-editorial (WordPress JSON, assets, MCP/API); **editorial-only:
  {bot_editorial}**. Headline domain-wide: ~{full_period_editorial} editorial
  /blog AI-bot hits.
- AI assistant visits: **{web_total}** sessions across {n} named sources.
  **{top_source} = {top_source_pct}%**.
- Google Search clicks: **{gsc_total}** across {n_pages} top-{limit} /blog pages.
- Overlap (all 3 channels, editorial only): **{overlap_count} qualifying URLs**.
- Heaviest editorial AI crawler: {top_bot} ({top_bot_count} hits).

### Tab 2 — Source Behaviour
One bullet per source mentioned in Source↔Bot, anchored to its editorial ratio:

- **ChatGPT dominates end-to-end** — X% of LLM visits, Y raw bot hits, ~Z:1
  editorial crawl→visit ratio.
- **Gemini crawls aggressively, converts poorly.** Inspect top-crawled URLs;
  if `.js`/`.css` dominate, call out the asset-scraping pattern.
- **Perplexity efficiency check.** Compute its ratio — typically the cleanest.
- **Meta-ExternalAgent / Applebot** crawl-heavy zero-conversion observation —
  training scrapers, not live assistants.
- **Copilot crawl invisibility caveat** — Bing's bingbot isn't AI-classified.
  Don't compute a ratio.
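
A minimal sketch of the per-source ratios behind these bullets, assuming per-platform editorial crawl totals and per-source visit totals have already been computed. The platform-to-bots mapping is the one from step 5; the function name and all sample numbers are made up.

```python
from typing import Dict, Optional

PLATFORM_BOTS = {
    "ChatGPT": ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "Claude": ["ClaudeBot", "Claude-User", "Claude-SearchBot", "Claude-Code", "Anthropic AI"],
    "Perplexity": ["PerplexityBot", "Perplexity-User"],
    "Gemini": ["Google-Extended", "Google-NotebookLM", "Gemini-Deep-Research", "GoogleAgent-Mariner"],
    "Meta": ["Meta-ExternalAgent", "Meta-ExternalFetcher"],
    "Apple": ["Applebot", "Applebot-Extended"],
    "Mistral": ["MistralAI-User"],
    "Copilot": [],  # no AI-classified bot, so crawl is invisible
}


def editorial_ratios(
    editorial_crawl: Dict[str, int], source_visits: Dict[str, int]
) -> Dict[str, Optional[float]]:
    """crawl_to_visit_ratio per platform; None where no ratio can be stated."""
    ratios: Dict[str, Optional[float]] = {}
    for platform, bots in PLATFORM_BOTS.items():
        visits = source_visits.get(platform, 0)
        if not bots or visits <= 0:
            ratios[platform] = None  # Copilot, or zero attributable visits
        else:
            ratios[platform] = editorial_crawl.get(platform, 0) / visits
    return ratios


# Illustrative totals only.
crawl = {"ChatGPT": 2120, "Perplexity": 340, "Meta": 900}
visits = {"ChatGPT": 20, "Perplexity": 20, "Copilot": 15}
for platform, ratio in editorial_ratios(crawl, visits).items():
    print(platform, "n/a" if ratio is None else f"~{ratio:.0f}:1")
```

Returning `None` rather than `0` or infinity keeps the Meta/Apple and Copilot bullets as observations instead of misleading ratios.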

### Tab 3 — Content Themes
- **AI-meta content owns AI Priority top** — pages *about* AI search
  (e.g. `ai-overviews-reduce-clicks-update`, `what-is-llms-txt`, `geo-…`) are
  what assistants cite most.
- **GSC top is dominated by evergreens** — tutorial pages with strong Google
  ranking, moderate AI engagement. These are refresh candidates.
- **Bot-strong / Web-weak** = answer-quality problem (AI sees, doesn't pick).
- **Bot-weak / Web-strong** = trained-from cache (freshness updates compound).
- **Balanced rows** = cleanest signal of active AI engagement → prioritise.

### Tab 4 — Trend Reading
- **Compounding cluster** — list the top 4–5 URLs marked `3↑` in the rising
  table; characterise the theme (AI-meta + commercial / pricing / fundamentals).
- **Declining cluster** — top 4–5 with `2↓` or `3↓`; characterise (early-2026
  AI-topic content losing freshness / outdated mechanics).
- **New from zero** — name the most prominent entrants (fundamentals being
  re-cited, new editorial launches).
- **Bot-sampling caveat** — explicitly state that bot top-200 captures only the
  current top of distribution. URLs dropping out of top-200 in one window can
  read as "down" without actually losing traffic. Trust `3↑`/`3↓` rows fully;
  treat `2↑`/`2↓` rows as suggestive but verify with web + GSC.
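
The direction labels used in these bullets can be sketched like this. It assumes a channel counts as "up" when its current value exceeds the prior value, which is one plausible reading of the trend step; the function names and sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple

CHANNELS = ("bot", "web", "gsc")


def pct_change(cur: float, prev: float) -> Optional[float]:
    """Period-over-period change in percent; None when the prior value is zero."""
    if prev == 0:
        return None
    return (cur - prev) / prev * 100


def direction_counts(cur: Dict[str, float], prev: Dict[str, float]) -> Tuple[int, int]:
    """Count channels that rose and channels that fell between the two windows."""
    up = sum(1 for c in CHANNELS if cur.get(c, 0) > prev.get(c, 0))
    down = sum(1 for c in CHANNELS if cur.get(c, 0) < prev.get(c, 0))
    return up, down


def dir_label(up: int, down: int) -> Optional[str]:
    """Balance filter: keep a URL only when at least 2 of 3 channels agree."""
    if up >= 2:
        return f"{up}↑"
    if down >= 2:
        return f"{down}↓"
    return None


# Made-up per-URL values for the current and prior windows.
cur = {"bot": 120, "web": 30, "gsc": 400}
prev = {"bot": 80, "web": 20, "gsc": 350}
up, down = direction_counts(cur, prev)
print(dir_label(up, down))
print(pct_change(cur["web"], prev["web"]))
```

A URL whose prior value is zero gets no percentage at all, which is a natural hook for the "New from zero" list.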

### Tab 5 — Actions
Ordered priority list — each item maps to a specific table:

1. **Refresh balanced top-20** (from Top 25 overlap, ranked by AI Priority).
   Improvements compound.
2. **Fix AI-sees-doesn't-pick** — top URLs from Top 25 Bot with low Web Visits.
   Add direct-answer summaries, definitions, stats boxes near top.
3. **Refresh GSC-strong/AI-weak** — top URLs from Top 25 GSC with low Web
   Visits. Modernise for AI citation patterns.
4. **Refresh decline cluster** — URLs in Declining table that still have audience
   signal but are degrading on all sides.
5. **Don't optimise for Meta/Apple traffic** — name the bots' raw hit volume,
   call out that they drive zero attributable LLM visits, recommend only
   infrastructure-protection action.

---

## 5. Methodology modal — schema + UI

The modal is a single fixed-position `<div id="meth-overlay" class="meth-overlay">`
(the JS finds it by `id`, the CSS styles it by class) populated from a JS const
`METHODOLOGY` injected as `{{ methodology | tojson }}` in the Jinja template.
Every section heading gets a sibling info button:

```html
<h2>Commentary & Insights
  <button class="info-btn" type="button" data-method="insights"
          aria-label="How was this calculated?">i</button>
</h2>
```

The `METHODOLOGY` dict keys map to `data-method` button attributes. Each value
is a struct of:

```python
METHODOLOGY = {
    "insights": {
        "title": "Commentary & Insights",
        "intro": "Human-written interpretation of every other section on this page. The narrative is editorial, but the figures it cites all come from the connector queries listed below.",
        "endpoints": [
            ["ahrefs_web_analytics.bot_stats (totals)", "..."],
            ["ahrefs_web_analytics.bot_stats (per URL)", "..."],
            ["ahrefs_web_analytics.bot_stats (per bot)", "..."],
            ["ahrefs_web_analytics.bot_stats (per-platform pages)", "..."],
            ["ahrefs_web_analytics.stats (per source)", "..."],
            ["ahrefs_web_analytics.stats (per URL)", "..."],
            ["ahrefs_web_analytics.stats (per source, prior window)", "..."],
            ["ahrefs_gsc.top_pages", "..."],
        ],
        "metrics": [
            "`count` per URL and per bot (bot_stats) — raw AI-bot hits, summed by URL or bot name.",
            "`visits` per URL and per source (stats) — sessions where `source_channel` includes `llm`.",
            "`metrics.clicks` per URL (gsc.top_pages) — organic Google clicks; the `page` field is a stringified dict and `protocollessUrl` is parsed via regex.",
            "Derived: `editorial_crawl` (sum of `count` on URLs classified `editorial` only) per platform.",
            "Derived: `crawl_to_visit_ratio = editorial_crawl / source_visits` per platform.",
            "Derived: AI Priority = (bot * web * gsc) ** (1/3) * 3 (geometric mean × 3 channels).",
        ],
        "steps": [
            "Fetch the 8 queries plus the 3 prior-window queries (bot per URL, web per URL, GSC). 11 total connector calls.",
            "Normalise every URL identically (lowercase host, strip query/fragment, collapse slashes).",
            "Classify every URL into editorial / asset / wordpress / api. Compute anomaly share; drop non-editorial everywhere else.",
            "Compute per-platform editorial crawl and crawl→visit ratios.",
            "Intersect three channels; compute AI Priority and rank — top 25 surfaces content clusters.",
            "Cross-reference per-channel top-25 tables for imbalanced pages.",
            "Run trend math: for every URL in either window compute `delta_pri` and per-channel `up`/`down` counts.",
            "Translate structured outputs into the five commentary tabs.",
        ],
        "notes": "Every claim in the Commentary tabs traces back to a specific row in a specific table on this page — if the prose and a table disagree, trust the table.",
    },
}
```
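
Because the modal script destructures each endpoint as a `[name, desc]` pair, a quick server-side shape check can catch malformed records before they reach the template. This is a hypothetical helper, not part of the report code; which fields are treated as required is an assumption.

```python
from typing import Any, Dict, List


def validate_methodology(methodology: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks usable."""
    problems: List[str] = []
    for key, entry in methodology.items():
        title = entry.get("title")
        if not isinstance(title, str) or not title:
            problems.append(f"{key}: missing title")
        for pair in entry.get("endpoints", []):
            if not (isinstance(pair, (list, tuple)) and len(pair) == 2):
                problems.append(f"{key}: endpoint is not a [name, desc] pair: {pair!r}")
        for field in ("metrics", "steps"):
            if not all(isinstance(item, str) for item in entry.get(field, [])):
                problems.append(f"{key}: {field} must contain only strings")
        if not isinstance(entry.get("notes", ""), str):
            problems.append(f"{key}: notes must be a string")
    return problems


# Tiny stand-in records; the real dict is the METHODOLOGY structure above.
good = {"insights": {"title": "Commentary & Insights",
                     "endpoints": [["ahrefs_gsc.top_pages", "..."]],
                     "metrics": ["`count` per URL"], "steps": ["Fetch"], "notes": "ok"}}
bad = {"insights": {"endpoints": [["only-a-name"]], "steps": [1, 2]}}
print(validate_methodology(good))
print(len(validate_methodology(bad)), "problems found")
```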

JS to wire it up:

```javascript
const METHODOLOGY = {{ methodology | tojson }};

function escapeHtml(s) {
  // Escape the five HTML-significant characters.
  return String(s).replace(/[&<>"']/g, c =>
    ({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;',"'":'&#39;'}[c]));
}

function renderInline(s) {
  // Render backtick `code spans` as <code>; escape everything else.
  if (!s) return '';
  return String(s).split(/(`[^`]+`)/).map(p =>
    // The length check keeps a stray "`" or "``" from becoming an empty <code>.
    (p.length > 2 && p.startsWith('`') && p.endsWith('`'))
      ? '<code class="inline">' + escapeHtml(p.slice(1, -1)) + '</code>'
      : escapeHtml(p)
  ).join('');
}

function openMethModal(key) {
  const m = METHODOLOGY[key];
  if (!m) return;
  document.getElementById('meth-title').textContent = m.title;
  document.getElementById('meth-intro').innerHTML = renderInline(m.intro || '');
  // endpoints
  const ep = document.getElementById('meth-endpoints'); ep.innerHTML = '';
  (m.endpoints || []).forEach(([name, desc]) => {
    const row = document.createElement('div');
    row.className = 'meth-endpoint';
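    // Set the name via textContent so it is never parsed as HTML.
    const nameEl = document.createElement('code');
    nameEl.className = 'name';
    nameEl.textContent = name;
    const descEl = document.createElement('span');
    descEl.className = 'desc';
    descEl.innerHTML = renderInline(desc);
    row.append(nameEl, descEl);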
    ep.appendChild(row);
  });
  // metrics (bulleted) and steps (numbered) — same pattern
  // notes — single notes div
  document.getElementById('meth-overlay').classList.add('open');
}

document.addEventListener('click', ev => {
  const btn = ev.target.closest('[data-method]');
  if (btn) openMethModal(btn.dataset.method);
  if (ev.target.closest('[data-meth-close]') || ev.target.id === 'meth-overlay') {
    document.getElementById('meth-overlay').classList.remove('open');
  }
});
document.addEventListener('keydown', e => {
  if (e.key === 'Escape')
    document.getElementById('meth-overlay').classList.remove('open');
});
```

The modal layout has four sections: **Ahrefs Endpoints Used** (rendered as cards
with the connector ID styled like a code chip + a descriptive line), **Metrics
Extracted** (bullet list), **Steps** (numbered list with circular step badges),
and **Notes** (highlighted callout).

---

## 6. CSS — minimum needed

```css
.info-btn { display: inline-flex; align-items: center; justify-content: center;
            width: 1.35rem; height: 1.35rem; border-radius: 999px;
            background: var(--card); border: 1px solid var(--border-strong);
            color: var(--fg-muted); font-size: 0.72rem; font-weight: 700;
            cursor: pointer; vertical-align: middle; margin-left: 0.55rem; }
.info-btn:hover { color: var(--accent); border-color: var(--accent); }

.meth-overlay { position: fixed; inset: 0; z-index: 1000;
                background: rgba(0,0,0,0.72); display: none;
                align-items: flex-start; justify-content: center;
                padding: 4vh 1rem; overflow-y: auto; }
.meth-overlay.open { display: flex; }
.meth-modal { background: var(--bg); border: 1px solid var(--border-strong);
              border-radius: 12px; width: 100%; max-width: 760px;
              padding: 1.5rem 1.75rem 1.75rem; }

.meth-endpoint { background: var(--card); border: 1px solid var(--border);
                 border-radius: 6px; padding: 0.55rem 0.8rem;
                 margin-bottom: 0.45rem; }
.meth-endpoint code.name { display: inline-block; color: var(--accent);
                           font-size: 0.82rem; font-weight: 700;
                           background: rgba(96,165,250,0.10);
                           border: 1px solid rgba(96,165,250,0.22);
                           border-radius: 4px; padding: 0.08rem 0.4rem; }
.meth-endpoint .desc { display: block; color: var(--fg-muted);
                       font-size: 0.83rem; margin-top: 0.4rem; }

.meth-modal ol { list-style: none; padding: 0; counter-reset: meth-step; }
.meth-modal ol li { counter-increment: meth-step; position: relative;
                    padding-left: 2.1rem; margin-bottom: 0.65rem; }
.meth-modal ol li::before { content: counter(meth-step); position: absolute;
                            left: 0; top: 0.05rem; width: 1.45rem;
                            height: 1.45rem; border-radius: 999px;
                            background: var(--accent); color: #fff;
                            display: flex; align-items: center;
                            justify-content: center; }

.meth-notes { background: rgba(96,165,250,0.06);
              border-left: 3px solid var(--accent); border-radius: 4px;
              padding: 0.7rem 0.95rem; }
.meth-notes strong { color: var(--accent); }
```

---

## 7. Gotchas (don't skip)

- **`source_channel` is an array, `source` is a string.** Don't mix the
  operators (`is [llm]` vs `is_stringy "ChatGPT"`).
- **Bot side uses `count`, not `visits`.** With `dimension=bot_page` AI bots
  return zero for visits/pageviews — silent failure.
- **Per-page web `visits` double-counts multi-page sessions.** Use
  `dimension=source` for headline session totals; use `dimension=page` only
  for ranking.
- **`metrics.clicks`, not `clicks`.** GSC top_pages puts clicks inside a nested
  `metrics` object; the top-level `clicks` field is null.
- **GSC `page` field is a stringified dict.** Parse `protocollessUrl` via regex.
- **Connector responses can be truncated.** Anything >~10 KB returns
  `_truncated: true` + `_full_payload_path` — always load the file path; the
  `_preview` is only the first 5 rows.
- **Anomaly URLs distort everything.** A WordPress `wp-json/oembed/1.0/embed`
  or MCP endpoint can absorb 80%+ of all AI-bot hits. Classify and exclude
  *before* any ranking, ratio, or "heaviest crawler" claim.
- **Overlap = intersection, not union.** A URL with huge bot volume but zero
  web visits goes in the per-channel table, NOT the overlap table.
- **Geometric mean penalises imbalance** — that's the point. Don't substitute
  arithmetic mean or sum.
- **Trust the `3↑`/`3↓` Dir column over Δ% on individual channels** — bot
  top-200 sampling means lower-ranked URLs drop out of the top-200 between
  windows without losing real traffic.
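
For the truncation gotcha above, a loader along these lines keeps the preview from being mistaken for the full result. The `_truncated`, `_full_payload_path` and `_preview` keys come from this skill; the `rows` key and the assumption that the spilled file is JSON are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_rows(response: Dict[str, Any]) -> List[Any]:
    """Return the full row list, reading the spilled file when truncated."""
    if response.get("_truncated"):
        path = Path(response["_full_payload_path"])
        payload = json.loads(path.read_text(encoding="utf-8"))
    else:
        payload = response
    # Assumption: rows are either the payload itself or live under a "rows" key.
    if isinstance(payload, list):
        return payload
    return payload.get("rows", [])


# Demo with a temporary file standing in for the spilled payload.
with tempfile.TemporaryDirectory() as tmp:
    full_path = Path(tmp) / "bot_stats_full.json"
    all_rows = [{"page": f"/blog/post-{i}", "count": 100 - i} for i in range(8)]
    full_path.write_text(json.dumps(all_rows), encoding="utf-8")
    response = {
        "_truncated": True,
        "_full_payload_path": str(full_path),
        "_preview": all_rows[:5],
    }
    rows = load_rows(response)

print(len(rows), "rows loaded;", len(response["_preview"]), "in preview")
```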

---

## 8. Ahrefs MCP — fallback / alternative when this connector layer isn't available

The agent platform's `ahrefs_web_analytics.*` and `ahrefs_gsc.*` connector IDs
are a typed wrapper around Ahrefs API v3. If you're rebuilding this elsewhere
(Claude Desktop, ChatGPT, Cursor) using the **official Ahrefs MCP server**
(`@ahrefs/mcp` or the remote MCP at `mcp.ahrefs.com`), the mapping is:

### Available via Ahrefs MCP (use these directly)

The Ahrefs MCP exposes the **API v3 REST surface as MCP tools** — ~112 tools
covering Site Explorer, Keywords Explorer, Rank Tracker, Site Audit, and Brand
Radar. Use these tools the same way:

| What this skill needs | Ahrefs MCP / API v3 equivalent |
|---|---|
| Google Search Console clicks/impressions per URL | `rank-tracker_gsc-pages-table` / `gscPagesTable` (rank-tracker domain with GSC linked). Returns the same `clicks` / `impressions` / `ctr` / `position` per URL. Filter via `filters.url_contains_partial`, paginate via `limit`. |
| Top GSC pages | `rank-tracker_gsc-top-pages` / `gscTopPages` — drop-in replacement for `ahrefs_gsc.top_pages`. |
| Site Explorer organic traffic (proxy for GSC if no GSC integration) | `site-explorer_top-pages-v3` / `siteExplorerTopPagesV3`. Less faithful than GSC but usable. |
| Backlink / referring-domain context for rising URLs | `site-explorer_backlinks-v3`, `site-explorer_refdomains-v3` |
| Keyword intent for content-theme analysis | `keywords-explorer_keywords-metrics-by-keywords` |
| Brand mentions in AI surfaces (for cross-validation of LLM visits) | `brand-radar_*` family — Brand Radar add-on required |

### NOT available via Ahrefs MCP — fall back / proxy

The two **most important** endpoints for this skill have **no MCP equivalent**:

| Missing | Why | Workaround |
|---|---|---|
| `ahrefs_web_analytics.bot_stats` | **Bot Analytics** is a separate Ahrefs product fed by Cloudflare Logpush / Worker (server-side log ingestion), not part of the v3 REST API. No MCP tool exposes it as of this writing. | (a) Sign up for Ahrefs Bot Analytics, configure Cloudflare Logpush, then export via the Bot Analytics UI manually; (b) substitute Cloudflare Bot Analytics directly (Cloudflare's own dashboard / API), which is cheaper and gives similar data; (c) parse raw access logs with a User-Agent classifier; (d) use Vercel/Cloudflare Workers Analytics if the site runs on either. None match Ahrefs' AI-bot classification quality, but all are usable. |
| `ahrefs_web_analytics.stats` (with `source_channel=llm`) | **Web Analytics** is Ahrefs' first-party JS tracker — also outside the v3 REST API. No MCP tool. | (a) Use Google Analytics 4 with a custom referrer-classification dimension (regex-match `chatgpt.com`, `claude.ai`, `perplexity.ai`, `gemini.google.com`, `copilot.microsoft.com`, `mistral.ai` against `session_source`). Pull via GA4 Data API. (b) Use Plausible / Fathom / Vercel Analytics with similar referrer regex. (c) Server-side analytics with a custom `is_llm_source` field set via Referer-header parsing. |

### Suggested MCP-only architecture (no Web Analytics, no Bot Analytics)

If you only have the Ahrefs MCP plus GA4 MCP plus Google Search Console MCP, the
skill degrades to a **two-channel** crossref (GSC clicks × LLM visits from GA4)
and the bot side becomes a **separate** report fed from Cloudflare Bot Analytics
or raw logs. The commentary tabs adapt:

- **Headline** → drop the bot raw/editorial split; keep web sessions + GSC
  clicks + overlap count.
- **Source Behaviour** → keep the per-source visits from GA4; you cannot compute
  crawl→visit ratios. Replace that bullet with "raw bot crawl volume from
  Cloudflare Bot Analytics: N / week" if available.
- **Content Themes** → unchanged; still works with two channels.
- **Trend Reading** → balance filter is "≥2 of 2" — strict.
- **Actions** → drop priority 2 (bot-sees-doesn't-pick) since you can't measure
  the imbalance.

The AI Priority formula generalises: with `n` channels, `priority =
(prod(values)) ** (1/n) * n`.
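
A minimal sketch of the generalised formula, assuming non-negative channel values; the function name is hypothetical. With two channels it gives the degraded two-channel score described above.

```python
import math
from typing import Iterable


def ai_priority_n(values: Iterable[float]) -> float:
    """Geometric mean of the channel values, multiplied by the channel count."""
    vals = list(values)
    n = len(vals)
    if n == 0:
        raise ValueError("at least one channel value is required")
    if any(v < 0 for v in vals):
        raise ValueError("channel values must be non-negative")
    return math.prod(vals) ** (1 / n) * n


# Illustrative values: GSC clicks and LLM visits for one URL.
print(ai_priority_n([4, 9]))
```

Scores computed with different `n` are not on the same scale, so a two-channel report should not be ranked against a three-channel one.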

### Ahrefs MCP setup (for reference)

Remote MCP (preferred):
```json
{"mcpServers": {"ahrefs": {"url": "https://mcp.ahrefs.com",
  "headers": {"Authorization": "Bearer <API_KEY>"}}}}
```

Local Node install:
```bash
npm install --prefix=~/.global-node-modules @ahrefs/mcp -g
```

```json
{"mcpServers": {"ahrefs": {"command": "npx",
  "args": ["--prefix=~/.global-node-modules", "@ahrefs/mcp"],
  "env": {"API_KEY": "<API_KEY>"}}}}
```

Ahrefs MCP requires **Enterprise plan API access** and consumes **Integration
API units** (Lite 25K/mo → Enterprise 2M/mo). Each MCP call costs ≥50 units,
with the heavier endpoints costing more: exactly the same unit model as direct
API v3 calls. The remote MCP server handles auth + rate limits automatically.

---

## 9. Wiring checklist

1. Add `METHODOLOGY` dict to the report's Python file. Pass via Jinja context as
   `methodology=METHODOLOGY`.
2. Render `{{ methodology | tojson }}` into a JS const inside `<script>`.
3. Add the modal markup at the bottom of the page (one instance per page).
4. Add the info button next to every `<h2>` you want to document, with
   `data-method="<key>"` matching a key in `METHODOLOGY`.
5. Add the 5-tab `data-tabgroup="insights"` block with five
   `<button data-target="ins-X">` and five `<div class="tab-panel" id="ins-X">`.
6. Add CSS (info button + modal + tab panels). Match your page's design tokens.
7. Wire the global `click` and `keydown` listeners for open/close.
8. Pre-format all KPI strings server-side (`f"{n:,}"`) — don't ship raw ints to
   the template and rely on Jinja for formatting; it complicates the per-tab
   bullet templates.
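
Item 8 can be sketched as a small pre-formatting pass that runs before the template context is built. The function name and the sample values are hypothetical (the source count `n` in particular is made up); the template string is the Tab 1 visits bullet.

```python
from typing import Any, Dict


def format_kpis(raw: Dict[str, Any]) -> Dict[str, str]:
    """Turn raw KPI values into display strings for the bullet templates."""
    formatted: Dict[str, str] = {}
    for key, value in raw.items():
        if isinstance(value, bool):
            formatted[key] = str(value)
        elif isinstance(value, int):
            formatted[key] = f"{value:,}"      # thousands separators
        elif isinstance(value, float):
            formatted[key] = f"{value:,.1f}"   # one decimal place
        else:
            formatted[key] = str(value)
    return formatted


VISITS_BULLET = (
    "AI assistant visits: **{web_total}** sessions across {n} named sources. "
    "**{top_source} = {top_source_pct}%**."
)

kpis = format_kpis(
    {"web_total": 5005, "n": 6, "top_source": "ChatGPT", "top_source_pct": 71}
)
print(VISITS_BULLET.format(**kpis))
```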

---

## 10. Acceptance test

- ✓ Every `<h2>` has a visible `i` button.
- ✓ Clicking the button opens the modal centred over the page.
- ✓ Modal shows: title, intro, endpoint list (with code-chip styling), bullet
  metrics, numbered steps, optional notes callout.
- ✓ Backtick `code spans` in any field render as styled inline code.
- ✓ Modal closes on overlay click, the × button, and Escape.
- ✓ Sub-tab clicks within Commentary & Insights swap the visible panel without
  affecting other tab groups (overlap top-25 channel tabs, per-platform tabs,
  trend tabs).
- ✓ Every bullet in every commentary tab maps to a number that exists in one of
  the seven tables further down the page.
- ✓ No new connector calls fire when the commentary tab is rendered — all
  numbers come from the same dataset that built the structured tables.