Engine Optimization Criterion #503

Migrate Any Site to Astro: 50-Page Mirror, 25 MB Assets, 1 Commit

AEO Website Migrator pulls a live site, renders every page through headless Chromium, downloads same-origin assets, and commits the entire mirror to a fresh GitHub repo as one atomic commit. The end-to-end run takes minutes, not weeks.

One of 48 criteria in AEO Rank, the citation-readiness score we run against every site we audit.

By Alex Shortov

low effort high impact

Quick Answer

An operator pastes a source URL into admin.aeocontent.ai/aeo-website, and within minutes there is a fresh Astro 5 repo on GitHub holding a working mirror of the site: up to 50 pages by default (hard cap 500), every same-origin asset rehosted under /_mirror/, and every page rendered with Playwright Chromium so JavaScript-built markup is preserved. Six platform extractors (WordPress, Webflow, Bitrix, Squarespace, Wix, Tilda) refine the raw mirror into typed Astro components when the source platform is detected. A pixel-perfect QA loop compares the mirror against the original and iterates until the visual diff is within tolerance.

Audit Note

In our audits, we've measured this criterion on live sites, compared implementations, and audited the gaps that keep scores low.

How does AEO Content AI migrate a website to Astro?

Paste the source URL at admin.aeocontent.ai/aeo-website and a Modal job discovers pages, renders with Playwright, rehosts assets, and commits a working Astro 5 repo.

Which CMS platforms can be migrated, and what does each extractor do?

Six platform extractors cover WordPress, Webflow, Bitrix, Squarespace, Wix, and Tilda, each replacing raw HTML with typed Astro components that match the block model.

Will my JavaScript-built pages render correctly after migration?

Yes, Playwright Chromium renders JavaScript-heavy pages before capture so the resulting Astro mirror reflects what users actually see, not the empty SPA shell.

What happens to images, fonts, and other assets during migration?

Same-origin assets are rehosted under /_mirror/ within the new repo, with a default budget of 2,000 files and 25 MB that can be raised per migration for larger sites.

Can I run a pixel-perfect comparison between the mirror and the original?

Yes, a pixel-perfect QA loop renders the mirror and original side-by-side and iterates targeted fixes until the visual diff drops under the configured threshold.


Figure: Migration Pipeline (Discover URLs, Render Pages, Rehost Assets, Extract Components, Commit to Repo). Infographic illustrating the AEO Rank criterion discussed in this article.

What this article answers

  • How does AEO Content AI migrate a website to Astro?
  • Which CMS platforms can be migrated, and what does each extractor do?
  • Will my JavaScript-built pages render correctly after migration?
  • What happens to images, fonts, and other assets during migration?
  • Can I run a pixel-perfect comparison between the mirror and the original?

Key takeaways

  • Migration runs as one Modal job - URL discovery (sitemap.xml or BFS over same-origin links), page render with Playwright Chromium, asset rehosting, single atomic Git commit. No CMS export plugin, no manual file copying.
  • Default crawl is 50 pages, hard cap 500. Asset budget is 2,000 files / 25 MB - tunable per migration if you have an unusually large site.
  • Six platform extractors handle the high-volume CMS shapes: WordPress, Webflow, Bitrix, Squarespace, Wix, Tilda. Each replaces the raw HTML capture with typed Astro components that match the platform’s block model.
  • Pixel-perfect QA loop renders mirror + original side-by-side and iterates targeted fixes until diff falls under threshold - prevents the “mostly right but obviously broken” outcome of cheap site copiers.
  • The new repo ships with the Astro 5 starter (npm run dev/build/preview, strict TypeScript), commits land via GitHub App so no developer credentials are shared, and the resulting site is ready to deploy on Cloudflare Pages, Vercel, or Netlify without further config.
  • The migrator is one of two pieces - the other is the publish layer that ships future articles into this same repo, so once you migrate you also unlock automatic article publishing into the same Astro codebase.

Why Migrate to Astro in the First Place?

Migrating to Astro turns your CMS-locked site into version-controlled files, letting AI agents inspect and edit schema, sitemap, llms.txt, and pages with one commit.

WordPress, Webflow, Squarespace, and Wix all share a structural problem from an AEO perspective: the content lives inside a CMS database that can only be touched through admin screens, plugins, and platform-specific APIs. Every meaningful update is manual labor or paid plugin glue. Schema changes, sitemap regeneration, llms.txt updates, robots.txt rewrites, RSS feeds - all of it needs CMS access or extension installs.

A static Astro site flips that constraint. The website becomes files in a code repository. Pages, schema, redirects, sitemap, llms.txt, styles, and deploy configuration are all version-controlled. AI agents (Claude Code, Codex) can inspect the entire site, propose changes through pull requests, and ship those changes the moment a human approves. The whole stack is reviewable, diffable, rollback-able.

For AEO specifically, the Astro layout unlocks two compounding wins: content updates ship at code-deploy speed (seconds, not days waiting for editorial workflow), and the publish pipeline can regenerate sitemap.xml + llms.txt + feed.xml in the same commit as the new article - no out-of-band cron job, no half-published states where the article exists but the sitemap doesn’t know about it yet.

How Does the Migration Actually Run?

The migrator runs at admin.aeocontent.ai/aeo-website, bootstrapping an Astro 5 repo, crawling source pages, rewriting assets, then committing the mirror through a GitHub App.

The migrator lives at admin.aeocontent.ai/aeo-website. An operator pastes the source URL, optionally tweaks the target repo name (auto-derived from the hostname), picks a target org (default Data-Subsystems), and hits Trigger. From there the pipeline runs in well-defined phases.

Phase 1 - Repo bootstrap (seconds). A POST /api/migrations/trigger validates input, inserts an aeo_migrations row in phase=triggered, then authenticates as the AEO Content Publisher GitHub App (app id 3425939) and calls repos.createInOrg. The freshly-created repo is initialized with the 9-file Astro 5 starter (package.json, astro.config.mjs, tsconfig.json, src/layouts/Layout.astro, src/pages/index.astro, public/.gitkeep, README.md, .gitignore, .env.example) as one atomic commit. Row transitions to phase=starter_committed.

Phase 2 - URL discovery (seconds). The Modal worker aeo-web2astro takes over. It looks for sitemap.xml first (one level of sitemap_index.xml expansion), and if that returns nothing useful, falls back to BFS-following <a> links on the same origin. The crawl is bounded by max_pages (default 50, hard cap 500) so even pathological sites don’t run for hours.
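
The discovery order described above (sitemap first, one level of index expansion, then a bounded breadth-first crawl) can be sketched roughly as follows. This is a minimal illustration, not the aeo-web2astro worker's code: the `fetch` callable, the function names, and the choice to look only at /sitemap.xml are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlsplit
import xml.etree.ElementTree as ET

MAX_PAGES_DEFAULT = 50    # default from the text
MAX_PAGES_HARD_CAP = 500  # hard cap from the text

# Hypothetical fetcher: returns the response body as text, or None on failure.
Fetch = Callable[[str], Optional[str]]


def _locs(xml_text: str) -> tuple[str, list[str]]:
    """Return (root tag without namespace, every <loc> value in the document)."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter() if el.tag.endswith("loc") and el.text]
    return root.tag.rsplit("}", 1)[-1], locs


class _LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self.hrefs.append(href)


def discover_urls(start_url: str, fetch: Fetch, max_pages: int = MAX_PAGES_DEFAULT) -> list[str]:
    limit = max(1, min(max_pages, MAX_PAGES_HARD_CAP))
    origin = urlsplit(start_url)

    def same_origin(url: str) -> bool:
        parts = urlsplit(url)
        return (parts.scheme, parts.netloc) == (origin.scheme, origin.netloc)

    # 1) Sitemap first, expanding a sitemap index by exactly one level.
    body = fetch(urljoin(start_url, "/sitemap.xml"))
    if body:
        try:
            kind, locs = _locs(body)
            if kind == "sitemapindex":
                pages: list[str] = []
                for child_url in locs:
                    child = fetch(child_url)
                    if child:
                        child_kind, child_locs = _locs(child)
                        if child_kind == "urlset":
                            pages.extend(child_locs)
                locs = pages
            found = list(dict.fromkeys(u for u in locs if same_origin(u)))
            if found:
                return found[:limit]
        except ET.ParseError:
            pass  # malformed sitemap: fall through to the crawl

    # 2) Fallback: breadth-first crawl over same-origin <a> links.
    start = urldefrag(start_url)[0]
    seen, queue, ordered = {start}, deque([start]), []
    while queue and len(ordered) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        ordered.append(url)
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            nxt = urldefrag(urljoin(url, href))[0]
            if same_origin(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return ordered
```

Passing `fetch` in as a parameter keeps the sketch free of network code; in practice it would wrap an HTTP client with timeouts.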

Phase 3 - Page rendering (the bulk of the time). Every discovered URL is rendered with headless Chromium via Playwright. This is the critical step that separates a real migration from naive HTML copying: a site built on React or Vue without server-side rendering shows almost no content in raw HTTP responses, but renders normally inside a browser. The Playwright pass captures the DOM after JavaScript has built it. Same-origin assets - <img>, <script>, <link>, <source>, <video>, <audio>, plus srcset references - are queued for download in this pass.
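
A rough sketch of this phase, assuming Playwright's Python sync API: render the page, take the post-JavaScript DOM, then collect same-origin asset references from the tags listed above. The function names and the "networkidle" wait condition are choices made for this example, and the srcset handling is simplified (it assumes URLs contain no commas).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tag -> attribute that carries the asset URL (the tags named in the text).
ASSET_ATTRS = {"img": "src", "script": "src", "link": "href",
               "source": "src", "video": "src", "audio": "src"}


def render_page(url: str, timeout_ms: int = 30_000) -> str:
    """Return the DOM serialized after JavaScript has run (requires the playwright package)."""
    from playwright.sync_api import sync_playwright  # lazy import: the parser below works without it

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


class _AssetParser(HTMLParser):
    """Collects raw asset references, including srcset candidates."""

    def __init__(self) -> None:
        super().__init__()
        self.refs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        wanted = ASSET_ATTRS.get(tag)
        if wanted and attr_map.get(wanted):
            self.refs.append(attr_map[wanted])
        if tag in ("img", "source") and attr_map.get("srcset"):
            for candidate in attr_map["srcset"].split(","):
                parts = candidate.split()
                if parts:
                    self.refs.append(parts[0])  # URL comes before the 1x / 640w descriptor


def same_origin_assets(html: str, page_url: str) -> list[str]:
    """Absolute, de-duplicated asset URLs that share the page's scheme and host."""
    origin = urlsplit(page_url)
    parser = _AssetParser()
    parser.feed(html)
    found: list[str] = []
    for ref in parser.refs:
        absolute = urljoin(page_url, ref)
        parts = urlsplit(absolute)
        if (parts.scheme, parts.netloc) == (origin.scheme, origin.netloc) and absolute not in found:
            found.append(absolute)
    return found
```

Typical use would be `same_origin_assets(render_page(url), url)`; cross-origin and data: references are filtered out by the origin check.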

Phase 4 - Asset rehosting. Discovered assets are downloaded (bounded MAX_ASSETS=2000, MAX_ASSET_BYTES=25 MB) and rewritten in the captured HTML to point at /_mirror/... URLs that resolve to files in the new repo’s public/ tree. Cross-origin assets stay as absolute URLs so the mirror doesn’t bloat with third-party CDN content.
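
The budget and rewrite step might look roughly like the sketch below. Several things here are assumptions rather than facts from the migrator: that 25 MB is a total budget across all assets, that the captured HTML already uses absolute asset URLs, and that query strings can be ignored when naming mirrored files. Only the two constant names and the /_mirror/ prefix come from the text.

```python
from urllib.parse import urlsplit

MAX_ASSETS = 2000                    # from the text
MAX_ASSET_BYTES = 25 * 1024 * 1024   # 25 MB from the text; treated here as a total budget (assumption)
MIRROR_PREFIX = "/_mirror"


def mirror_path(asset_url: str) -> str:
    """Map an asset URL to its rehosted path, e.g. /img/a.png -> /_mirror/img/a.png."""
    return MIRROR_PREFIX + (urlsplit(asset_url).path or "/")


def plan_rehost(assets: dict[str, int]) -> dict[str, str]:
    """assets maps URL -> size in bytes; returns URL -> mirror path for assets that fit the budget."""
    plan: dict[str, str] = {}
    total = 0
    for url, size in assets.items():
        if len(plan) >= MAX_ASSETS:
            break
        if total + size > MAX_ASSET_BYTES:
            continue  # this asset would blow the byte budget; leave its original URL in place
        plan[url] = mirror_path(url)
        total += size
    return plan


def rewrite_html(html: str, plan: dict[str, str]) -> str:
    """Point planned asset URLs at the mirror; everything else (cross-origin) is untouched."""
    # Longest URLs first so that a URL which is a prefix of another is not replaced early.
    for url in sorted(plan, key=len, reverse=True):
        html = html.replace(url, plan[url])
    return html
```

A production rewriter would more likely edit attributes through an HTML parser than use string replacement; the string version keeps the example short.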

Phase 5 - Astro page emission. Each captured page becomes a .astro file under src/pages/. Path mirrors the source URL structure. The HTML is rendered through a single top-level <Fragment set:html={html} /> so quotes and template-special characters can’t break the build. Platform extractors (next section) may rewrite this into proper component shapes per platform.
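
As an illustration of why routing the HTML through set:html is robust, the sketch below maps a source URL to a page file and writes the captured HTML into the frontmatter as a JSON-encoded string literal. The text does not say how the HTML is passed in, so the frontmatter constant and the helper names are assumptions for this example.

```python
import json
from urllib.parse import urlsplit


def page_file(url: str) -> str:
    """Map a source URL to a file under src/pages/ so the route matches the original path."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index"  # "/" -> index.astro, "/docs/" -> docs/index.astro
    return f"src/pages/{path.lstrip('/')}.astro"


def astro_page(html: str) -> str:
    """Emit a page whose whole body is one <Fragment set:html={html} />."""
    # json.dumps yields a valid JavaScript string literal: quotes, backslashes,
    # and newlines are escaped, so arbitrary captured HTML cannot end the string early.
    literal = json.dumps(html)
    return f"---\nconst html = {literal};\n---\n<Fragment set:html={{html}} />\n"
```

Because the HTML never appears in the template body itself, characters such as `{` and `}` in the captured markup are not interpreted as Astro expressions.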

Phase 6 - Single atomic commit. A shallow clone of the repo is updated with the new pages + assets, one commit is created with a message referencing the migration id, push lands via the aeo-github-pat-audit PAT, and the migration row transitions to phase=mirror_done, status=succeeded. Pages-discovered, pages-mirrored, and assets-downloaded counts are persisted for the admin dashboard.
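
The phase bookkeeping can be pictured as a small state machine. The sketch below models only the three phase values named in this section (triggered, starter_committed, mirror_done) and the three persisted counts; the class name, the initial "running" status, and the transition table are hypothetical, and the real aeo_migrations row may pass through additional phases.

```python
from dataclasses import dataclass

# Only the phases named in the text; intermediate phases are not modeled.
ALLOWED = {
    "triggered": {"starter_committed"},
    "starter_committed": {"mirror_done"},
    "mirror_done": set(),
}


@dataclass
class MigrationRow:
    source_url: str
    phase: str = "triggered"
    status: str = "running"      # hypothetical initial status
    pages_discovered: int = 0
    pages_mirrored: int = 0
    assets_downloaded: int = 0

    def advance(self, next_phase: str) -> None:
        """Move to the next phase, rejecting transitions that skip or go backwards."""
        if next_phase not in ALLOWED.get(self.phase, set()):
            raise ValueError(f"illegal transition {self.phase} -> {next_phase}")
        self.phase = next_phase
        if next_phase == "mirror_done":
            self.status = "succeeded"
```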

Which Platform Extractors Exist?

Six platform extractors (WordPress, Webflow, Bitrix, Squarespace, Wix, Tilda) recognize the source platform and emit typed Astro components matching the original block model.

The raw-mirror approach gets a site running fast, but the resulting Astro pages are one big <Fragment set:html> blob per page. That works for hosting and AEO crawling, but it doesn’t unlock the developer experience benefits of a real component library. To bridge that gap, we built six platform extractors that recognize the source platform and emit typed Astro components matching that platform’s block model.

Platform | What the extractor produces
WordPress | Block-level Astro components matching Gutenberg block taxonomy. Theme styles inlined, post types preserved.
Webflow | Component decomposition matching the Designer’s symbol library. CMS Collection items become Astro content collections.
Bitrix (1C-Bitrix) | Recognizes the Russian-market platform’s component structure. Handles Cyrillic content and bx-* selectors.
Squarespace | Section / block / grid hierarchy preserved as nested Astro components.
Wix | Editor + Studio templates supported. Velo back-end logic flagged for manual review (not auto-ported).
Tilda | Tilda’s zero-block taxonomy mapped to a known set of Astro component primitives.

If no platform is detected, the migration ships the raw-mirror output - the site still runs, just with one big HTML blob per page instead of components. Extractor coverage is additive and shipping iteratively; a site mirrored today as raw HTML can re-run through a new extractor later without rebuilding the repo.
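
How detection works is not described here, so the following is only one plausible approach: score the rendered HTML against well-known platform fingerprints and fall back to the raw mirror when nothing matches. The marker patterns are commonly seen on these platforms but are assumptions for this example, not the migrator's actual rules.

```python
import re
from typing import Optional

# Heuristic fingerprints (assumed for illustration; real detection may differ).
PLATFORM_MARKERS = {
    "wordpress": [r"/wp-content/", r"/wp-includes/"],
    "webflow": [r"data-wf-site=", r"data-wf-page="],
    "bitrix": [r"/bitrix/", r"\bbx-[\w-]+"],
    "squarespace": [r"static1\.squarespace\.com", r"squarespace-cdn\.com"],
    "wix": [r"static\.wixstatic\.com", r"static\.parastorage\.com"],
    "tilda": [r"tildacdn\.com", r'class="[^"]*\bt-rec\b'],
}


def detect_platform(html: str) -> Optional[str]:
    """Return the platform with the most matching markers, or None for the raw-mirror fallback."""
    scores = {
        name: sum(1 for pattern in patterns if re.search(pattern, html))
        for name, patterns in PLATFORM_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Returning None rather than guessing matches the behavior described above: an undetected site still ships as a raw mirror and can be re-run through an extractor later.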

Extractor coverage and the typical fidelity of the resulting mirror vary by source platform.

Source Platform | Extractor Coverage | Typical Fidelity
WordPress | Full | High
Webflow | Full | High
Squarespace | Partial | Moderate
Wix | Partial | Moderate
Shopify (content) | Partial | Moderate
Generic HTML | Universal fallback | Depends on markup

How Do You Know the Mirror Actually Matches the Original?

A pixel-perfect QA loop renders the mirror and original through headless Chromium, computes visual diffs per page, then runs targeted normalization passes until deltas fall below threshold.

A migration that ships “mostly right but obviously broken” is worse than no migration - the brand impression hit when a footer renders wrong or a hero image misaligns is worse than just letting the old site keep running. To prevent that, the migrator includes a pixel-perfect QA loop that runs after Phase 6.

The loop renders both the live mirror and the original source page through headless Chromium at the same viewport size, computes a visual diff (pixel-by-pixel + structural comparison), and reports per-page deltas. Where deltas exceed threshold, the loop runs a targeted normalization pass (font fallbacks, CSS specificity adjustments, asset URL fixes from the rewriter library) and re-renders. Iteration continues until either the diff drops under threshold or the loop hits its iteration cap, at which point the failing pages are flagged for operator review in the admin dashboard.
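
The control flow of the loop can be sketched as below. It covers only the pixel-by-pixel part of the comparison (not the structural comparison), represents a screenshot as a flat list of RGB tuples, and takes the render and fix steps as callables. The threshold, the iteration cap, and all names are placeholders, since the text does not give the real values.

```python
from typing import Callable, Sequence

Pixel = tuple[int, int, int]
Image = Sequence[Pixel]  # flattened RGB pixels captured at the same viewport size


def diff_ratio(a: Image, b: Image, tolerance: int = 0) -> float:
    """Fraction of pixels whose channels differ by more than `tolerance`."""
    if len(a) != len(b):
        return 1.0  # different dimensions: treat as a complete mismatch
    if not a:
        return 0.0
    differing = sum(
        1 for pa, pb in zip(a, b)
        if any(abs(x - y) > tolerance for x, y in zip(pa, pb))
    )
    return differing / len(a)


def qa_loop(
    render_original: Callable[[], Image],
    render_mirror: Callable[[], Image],
    apply_fix: Callable[[int], None],
    threshold: float = 0.01,   # placeholder value
    max_iterations: int = 3,   # placeholder iteration cap
) -> tuple[bool, list[float]]:
    """Re-render and fix until the diff is within threshold or the cap is reached."""
    original = render_original()
    history: list[float] = []
    for iteration in range(max_iterations + 1):
        ratio = diff_ratio(original, render_mirror())
        history.append(ratio)
        if ratio <= threshold:
            return True, history
        if iteration < max_iterations:
            apply_fix(iteration)  # e.g. a font-fallback or asset-URL normalization pass
    return False, history  # still failing: flag the page for operator review
```

The returned history gives the per-iteration deltas, which is the kind of data a per-page QA report would need.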

The output is a per-migration QA report: which pages match within tolerance, which need manual touch-up, what the specific deltas are. Operators see the visual diffs directly in the admin UI and can decide whether to ship as-is, run another loop iteration, or hand-fix specific pages.

Are URLs and SEO Metadata Preserved?

Migrations preserve URL paths by default, keeping every backlink and indexed URL resolving, with a redirects placeholder for the rare cases where structure must change.

Migrations preserve URL paths by default - if the source page lived at /blog/launch-announcement it lives at the same path in the new Astro repo. That means existing backlinks, search-engine indexed URLs, and bookmarked links keep resolving as long as the new site deploys to the same hostname (or behind a CDN rewriting the old hostname to the new one).

For sites where the URL structure does need to change (rare - usually only when consolidating multiple subdomains into one), the operator can add explicit redirects to the repo’s _redirects or vercel.json file. The starter includes a redirects placeholder so this is a one-file edit, not a CMS plugin configuration.
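
For the cases where paths do change, a small helper can generate both redirect formats from one mapping. This is a sketch with illustrative paths: the _redirects lines follow the "source destination status" form used by Netlify and Cloudflare Pages, and the JSON follows Vercel's "redirects" array. The helper names are made up for the example.

```python
import json


def redirects_file(mapping: dict[str, str], status: int = 301) -> str:
    """Render a _redirects file: one 'source destination status' rule per line."""
    return "".join(f"{old} {new} {status}\n" for old, new in mapping.items())


def vercel_config(mapping: dict[str, str]) -> str:
    """Render the redirects section of vercel.json (permanent redirects)."""
    rules = [
        {"source": old, "destination": new, "permanent": True}
        for old, new in mapping.items()
    ]
    return json.dumps({"redirects": rules}, indent=2)


if __name__ == "__main__":
    # Illustrative mapping only; these paths are not from a real migration.
    moved = {"/old-blog/launch": "/blog/launch-announcement"}
    print(redirects_file(moved), end="")
    print(vercel_config(moved))
```

Only one of the two outputs is needed, depending on where the site deploys.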

The migration also preserves <title>, meta description, OpenGraph tags, and existing JSON-LD schema blocks per page. Because URLs and on-page metadata stay the same, Search Console rankings, AEO Site Rank scores, and citation graph data should carry forward.

What Happens After the Migration Completes?

Three things unlock immediately: the new repo deploys to any host with zero config, AEO articles publish through git, and AI agents can edit the entire site.

Three things become possible the moment the mirror lands:

First - the new repo is immediately deployable. Cloudflare Pages, Vercel, Netlify, and Coolify all auto-detect Astro and build with zero config. Point DNS at the new deploy and the migration is live.

Second - any future article generated by the AEO Content pipeline publishes into the same repo through the same GitHub App (see how-aeo-publishes-to-your-cms). One PR per article, one atomic commit including the article + the regenerated sitemap.xml + llms.txt + feed.xml. The whole AEO content motion runs through git from this point forward.

Third - AI agents (Claude Code, Codex) can now inspect and improve the site. Want to add a new section to the homepage? File a PR. Want a redesign? Run an agent against the codebase. The site is files, not admin screens.

What Are the Limits and Edge Cases?

Known limits include JavaScript-heavy SPA shells, authentication-gated content, and thin headless CMS consumers, each requiring a different migration approach than the default mirror.

The migrator has known boundaries we ship honestly rather than paper over:

  • JavaScript-heavy SPA shells (React/Vue without SSR + no static export) capture as one page with empty bodies because the client-side router never gets the chance to materialize sub-routes. These need a different approach (run the SPA’s build step and capture the output, or rewrite the router into Astro routes).
  • Authentication-gated content is not captured - the crawler runs unauthenticated by design, so anything behind a login is excluded.
  • Headless CMS architectures (Contentful, Sanity, Strapi) where the live site is a thin API consumer benefit more from a content-export migration than a mirror - the mirror captures the rendered output but loses the source-of-truth content model.
  • Very large sites (anything over 500 pages, and especially those with 5,000+) exceed the 500-page hard cap. The cap is a budget rather than a fixed rule - operators can run multiple sequential migrations covering disjoint path prefixes, or run the migration with an elevated cap on a per-project basis.
  • Heavy plugin sites (e.g., WordPress with 50+ plugins) often have plugin runtime that the mirror cannot replicate. Forms, comments, search widgets, e-commerce flows need to be re-implemented or wired through API integrations after the mirror lands.
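
For the large-site case in the list above, splitting a URL inventory into disjoint path prefixes could be planned with something like the sketch below. It groups by the first path segment only and simply reports groups that still exceed the cap; the function name and the grouping rule are assumptions for the example, not a feature of the migrator.

```python
from collections import defaultdict
from urllib.parse import urlsplit

HARD_CAP = 500  # page cap from the text


def group_by_prefix(urls: list[str], cap: int = HARD_CAP):
    """Split URLs into disjoint first-segment prefixes; return (groups that fit, groups over the cap)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # Pages with zero or one path segment ("/", "/about") share the root group.
        prefix = f"/{segments[0]}/" if len(segments) > 1 else "/"
        groups[prefix].append(url)
    fits = {p: u for p, u in groups.items() if len(u) <= cap}
    too_big = {p: u for p, u in groups.items() if len(u) > cap}
    return fits, too_big
```

Groups in `too_big` would need a deeper prefix split or an elevated cap before they can run as a single migration.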

The migrator gets you the static content + brand surface in minutes. The remaining 5-15% of platform-specific behavior is what the platform extractors are gradually closing, plus operator review.

How This Connects to Your AEO Score

A migration is not directly a scoring action - it is the prerequisite that unlocks every other AEO improvement at code-deploy speed. Once your site is in git:

  • llms.txt can be added in one commit (covers the AI Discovery pillar criterion that requires an llms.txt file)
  • robots.txt with the 2026 AEO crawler allowlist is one file
  • JSON-LD schema across Organization / Article / FAQPage / BreadcrumbList lands as edits to the Layout component, applied site-wide in one commit
  • Sitemap completeness is automated by the build step; every new article gets included automatically
  • Internal linking can be edited at the source instead of through CMS interfaces
  • Topic coherence improves because article publishing through the pipeline tracks cluster membership at the schema level
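
As one concrete instance of the list above, the JSON-LD item amounts to emitting a script tag from structured data. The sketch below builds a schema.org Article object and serializes it safely for embedding in HTML; the helper names and field selection are illustrative, and in the Astro repo the equivalent logic would live in the Layout component rather than in Python.

```python
import json


def article_schema(headline: str, url: str, author: str, date_published: str) -> dict:
    """Minimal schema.org Article object (field selection is illustrative)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
    }


def json_ld_script(data: dict) -> str:
    """Wrap the object in a JSON-LD script tag."""
    # Escaping "</" as "<\/" is valid JSON and stops a value such as "</script>"
    # from closing the tag early.
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```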

The score lifts that follow are not from the migration itself - they are from the fact that every other improvement is now a 5-minute commit instead of a multi-step CMS workflow.
