Yasser Alattas | MLOps, Kubernetes & SRE Blog

GEO Research — Generative Engine Optimization (March 2026)

Introduction

SEO is getting a new sibling. Generative Engine Optimization (GEO) is how you get your content cited by ChatGPT, Perplexity, Google AI Overviews, and Copilot — not just ranked in a list of blue links.

I spent a few days digging into the research: the Princeton GEO paper, Microsoft’s official guidance, real case studies, and every open-source tool I could find. This post is the distilled version — how RAG pipelines actually pick sources, what content formats get cited, and a prioritized action plan you can steal for your own developer blog.

Fair warning: this is long. It’s meant to be a reference you come back to, not a single-sitting read.

This post is 99% AI-written but validated by me. Validation here means quick online research to check the claims, so it is not 100% evidence-based. The goal is a practical, actionable guide based on the best information available as of March 2026.

RAG and Source Selection

Confidence: High

Retrieval-Augmented Generation (RAG) is the foundational architecture behind all major AI search engines. Here’s how it works at a technical level:

Vector Embedding Process:
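Here is a minimal sketch of this step. It makes two simplifying assumptions. Chunking uses fixed-size word windows with overlap. A toy hashed bag-of-words vector stands in for a real embedding model; production engines use neural encoders, which capture meaning rather than shared words. `chunkText`, `embed`, `cosine`, and `retrieve` are illustrative names, not any engine's API.

```javascript
// Split text into overlapping word windows so no passage loses its context edge.
function chunkText(text, size = 50, overlap = 10) {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(' '));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

// Toy embedding: hash each lowercase word into a fixed-length count vector.
// Stand-in for a neural embedding model; it only captures word overlap.
function embed(text, dims = 256) {
  const vec = new Array(dims).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    vec[h % dims] += 1;
  }
  return vec;
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Embed the query once, score every chunk, keep the k most similar.
function retrieve(query, chunks, k = 3) {
  const q = embed(query);
  return chunks
    .map(chunk => ({ chunk, score: cosine(q, embed(chunk)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Swapping `embed` for a real model changes the scores but not the flow. You chunk the page, embed each chunk once, embed the query, and rank by cosine similarity.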

Hybrid Retrieval (How Modern Systems Actually Work):
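Hybrid retrieval typically runs a lexical search (such as BM25) and a vector search side by side, then merges the two ranked lists. Reciprocal Rank Fusion (RRF) is one common merge method, sketched below. The constant k = 60 comes from the original RRF paper. The document IDs are hypothetical.

```javascript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document,
// so items ranked highly by either retriever float to the top of the merge.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId, score]) => ({ docId, score }));
}

// Hypothetical results for the query "karpenter vs cluster autoscaler".
const bm25Results = ['post-karpenter', 'post-hpa', 'post-eks-costs'];
const vectorResults = ['post-eks-costs', 'post-karpenter', 'post-pdb'];

console.log(reciprocalRankFusion([bm25Results, vectorResults]).map(r => r.docId).join(', '));
// prints: post-karpenter, post-eks-costs, post-hpa, post-pdb
```

`post-karpenter` wins even though the vector search put `post-eks-costs` first, because both retrievers rank it near the top. That agreement effect is why pages that match a query both lexically and semantically tend to survive fusion.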

Re-ranking as the Critical Filter:
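Re-ranking is usually a second stage. A cheap retriever returns a broad candidate set. A more expensive model then rescores only those candidates; this is often a cross-encoder that reads the query and passage together. The sketch below shows that shape with a hypothetical `scorePair` function. The term-and-phrase heuristic inside it is a stand-in, not how any production re-ranker scores.

```javascript
// Stand-in relevance scorer. A real system would call a cross-encoder model here.
// This toy version counts query terms found in the passage and adds a bonus
// when the whole query appears verbatim as a phrase.
function scorePair(query, passage) {
  const q = query.toLowerCase();
  const p = passage.toLowerCase();
  const terms = q.split(/\s+/).filter(Boolean);
  const hits = terms.filter(t => p.includes(t)).length;
  const phraseBonus = p.includes(q) ? terms.length : 0;
  return hits + phraseBonus;
}

// Two-stage retrieval: keep the top `candidates` from the first stage,
// rescore them with the expensive scorer, and return the best `k`.
function rerank(query, firstStage, { candidates = 20, k = 3 } = {}) {
  return firstStage
    .slice(0, candidates)
    .map(passage => ({ passage, score: scorePair(query, passage) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

In a toy run with the query "memory limit", a passage that only mentions "limits" in passing falls out of the top results. Passages that state "memory limit" directly move up. This is the filtering role the heading describes.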

Information Gain — The Hidden Ranking Signal:
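Google's information gain patent (source 13) is commonly summarized as scoring a document by how much new information it adds beyond what the user has already seen. The sketch below is a deliberately crude proxy, not the patent's method. It measures the share of a candidate's distinct words that appear in none of the already-seen documents. Treating word sets as "information" is an assumption made for brevity.

```javascript
// Lowercase word set for a document (a crude stand-in for extracted facts).
function termSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9%]+/g) || []);
}

// Share of the candidate's distinct terms not found in any already-seen document.
// 1 = entirely new vocabulary, 0 = nothing the reader hasn't already seen.
function noveltyScore(candidate, seenDocs) {
  const seen = new Set();
  for (const doc of seenDocs) for (const t of termSet(doc)) seen.add(t);
  const terms = [...termSet(candidate)];
  if (terms.length === 0) return 0;
  const fresh = terms.filter(t => !seen.has(t)).length;
  return fresh / terms.length;
}
```

Suppose the reader has already seen "Kubernetes pods need resource requests". Restating that sentence scores 0. Adding a concrete figure ("60% audited") raises the score to 2/7, which mirrors the idea that restatement earns nothing.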

Structured Data for GEO

Confidence: Medium (direct impact on RAG is debated; indirect benefits are proven)

What we know:

What’s unproven:

What the evidence suggests (pragmatic view):

Key Schema Types for GEO (priority order):

  1. Person — for author/creator identification
  2. Article — headline, author, datePublished, dateModified (partially implemented)
  3. FAQPage — for Q&A content (not implemented)
  4. HowTo — for tutorial content (not implemented)
  5. Organization — for company/employer association
  6. BreadcrumbList — for navigation structure
  7. WebSite: with a potentialAction of type SearchAction for site search
  8. sameAs (a property on Person/Organization, not a type): links to social profiles, critical for entity disambiguation

So what: Implement comprehensive schema markup across your sites. Even if direct RAG impact is unproven, it signals professionalism, aids entity recognition, and has zero downside. The sameAs property linking to GitHub, LinkedIn, and other profiles is particularly important for entity disambiguation — it helps AI systems confirm that you’re the same person across platforms.

Content Formats That Get Cited

Confidence: High

Based on the Princeton GEO research, Microsoft’s official guidance, and multiple industry analyses:

Formats that perform best in AI citation:

  1. Direct definitions + expansions — Lead with a concise 1-2 sentence answer, then expand. The “inverted pyramid” structure. AI systems can extract the lead sentence as a citation.
  2. Comparison tables — Side-by-side comparisons are easily extractable. “Karpenter vs. Cluster Autoscaler” would be perfect for AI citation.
  3. Numbered/bulleted lists — Steps, features, recommendations. AI engines lift these almost verbatim. A structured “N problems” format already follows this pattern.
  4. Original statistics — Specific numbers from real experience: “60%+ of pods running without resource requests”, “hundreds of millions of daily API requests”. AI engines preferentially cite content with hard data.
  5. FAQ sections — Direct Q&A pairs mirror how users query AI systems. “Q: Why do Kubernetes pods get OOMKilled?” → direct answer.
  6. Code blocks with explanations — Technical content with inline YAML/code examples, especially with explanatory context, gets cited in developer-focused queries.
  7. Case studies with measurable outcomes — “We reduced compute spend by 30%” is more citable than “we optimized cloud costs.”

Formats that perform poorly:

So what: If your blog already uses winning formats (numbered lists, code blocks, specific numbers), the opportunity is to: (1) add FAQ schema, (2) add comparison tables, (3) include more specific statistics, and (4) add TL;DR summaries at the top of each section.

E-E-A-T in the AI Era

Confidence: Medium-High

E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) remains important, but the signals differ for AI engines:

Experience: AI engines can detect first-person experience claims (“In my experience auditing clusters…”, “When I built this at my company…”). First-person production stories are strong signals.

Expertise: Signaled through:

Authoritativeness: For AI, this comes from:

Trustworthiness: Signaled by:

So what: Most developer blogs score well on Experience and Expertise. The common gap is Authoritativeness (no third-party mentions, few indexed pages) and Trustworthiness (blog posts should cite more external sources and include specific, verifiable data).


Content Strategy for GEO

Confidence: High

Based on the Go Fish Digital case study (+43% AI traffic, +83% conversions) and Microsoft’s official guidance:

1. Fact-Dense Cornerstone Content

Create 5-8 comprehensive articles on core topics that serve as definitive references:

Each article should include: original statistics, comparison tables, code examples, FAQ sections, and external citations.

2. Prompt Mapping

Map the actual queries users ask AI systems:

Write content that directly addresses these queries. Go Fish Digital calls this “query fan-out”: AI systems expand a single query into related sub-questions, so cover the entire question cluster, not just the headline query.

3. Information Gain Content

Focus on content that provides information NOT available elsewhere:

4. Content Cadence

Technical Setup Checklist

Confidence: High

robots.txt (CRITICAL)

Replace the current generic robots.txt with AI-aware directives:

# Standard crawlers
User-agent: *
Allow: /
Sitemap: https://yasser.solutions/sitemap.xml

# AI Search Crawlers — ALLOW (these handle live user queries)
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Applebot
Allow: /

# AI Training Crawlers — BLOCK (prevent content scraping for model training)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

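Rationale: Allow search-mode AI crawlers (they send traffic and citations back) while blocking training-mode crawlers (they consume content without attribution). This mirrors the approach recommended in Playwire’s publisher guide (Jan 2026). Two caveats apply. First, robots.txt is advisory: it only restrains crawlers that choose to honor it. Second, Google-Extended is a control token rather than a separate crawler. Blocking it opts your content out of use for Gemini model training, but it does not remove pages from Google Search or AI Overviews, which are governed by Googlebot.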

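The same approach applies to a blog on a subdomain. robots.txt is scoped per host, so blog.yasser.solutions needs its own copy at blog.yasser.solutions/robots.txt.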
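Before deploying, it helps to check which rules a given crawler will actually follow. The sketch below is a simplified evaluator in the spirit of RFC 9309. A crawler obeys the groups that name its product token and falls back to `*` if none do. The longest matching path wins, and Allow wins a tie. It ignores `*` and `$` path wildcards, so treat it as a sanity check rather than a reference parser. `parseRobots` and `isAllowed` are hypothetical helper names.

```javascript
// Parse robots.txt into groups of { agents, rules }. Consecutive User-agent
// lines share one group; Allow/Disallow lines attach to the current group.
function parseRobots(text) {
  const groups = [];
  let current = null;
  let collectingAgents = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === 'user-agent') {
      if (!collectingAgents) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      collectingAgents = true;
    } else if (key === 'allow' || key === 'disallow') {
      if (current) current.rules.push({ allow: key === 'allow', path: value });
      collectingAgents = false;
    } else {
      collectingAgents = false; // e.g. Sitemap ends a run of agent lines
    }
  }
  return groups;
}

// Decide whether `userAgent` (a product token such as 'GPTBot') may fetch `path`.
function isAllowed(robotsTxt, userAgent, path) {
  const groups = parseRobots(robotsTxt);
  const ua = userAgent.toLowerCase();
  // Obey every group that names this crawler; otherwise fall back to '*'.
  let matched = groups.filter(g => g.agents.includes(ua));
  if (matched.length === 0) matched = groups.filter(g => g.agents.includes('*'));
  const rules = matched
    .flatMap(g => g.rules)
    .filter(r => r.path !== '' && path.startsWith(r.path)); // empty Disallow = no rule
  if (rules.length === 0) return true; // nothing matches: allowed
  // Longest (most specific) path wins; on a tie, Allow beats Disallow.
  rules.sort((a, b) => b.path.length - a.path.length || Number(b.allow) - Number(a.allow));
  return rules[0].allow;
}
```

Running the robots.txt above through `isAllowed` for each crawler name is a quick way to confirm that GPTBot resolves to blocked and OAI-SearchBot to allowed.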

Sitemap Improvements

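The current sitemap lists hash-fragment URLs (#about, #experience), which is a problem. Search engines generally ignore fragments, so those entries all collapse into the homepage URL. Convert each section you want indexed into a real page with its own path (for example, /about and /experience).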

For the blog, ensure each article has its own sitemap entry with <lastmod> dates.
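Here is a minimal sketch of generating those entries. It assumes a hypothetical `posts` array with `path` and `updated` fields produced by your build step. The output follows the sitemaps.org protocol: one `<url>` per page, an absolute `<loc>`, and a W3C-format `<lastmod>` date, with XML special characters such as `&` escaped. It also rejects fragment URLs, for the reason given above.

```javascript
// Escape the five XML special characters so URLs with query strings stay valid.
function xmlEscape(s) {
  return s.replace(/[&<>"']/g, c => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
  })[c]);
}

// Build a sitemap with one entry per post. `posts` is hypothetical build
// output: [{ path: '/posts/slug', updated: Date }, ...].
function buildSitemap(origin, posts) {
  const entries = posts.map(({ path, updated }) => {
    const url = new URL(path, origin);
    // Fragments (#about) point inside one page; sitemaps need real URLs.
    if (url.hash) throw new Error(`Fragment URL is not a separate page: ${path}`);
    const lastmod = updated.toISOString().slice(0, 10); // YYYY-MM-DD
    return `  <url>\n    <loc>${xmlEscape(url.href)}</loc>\n    <lastmod>${lastmod}</lastmod>\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}
```

Feeding `updated` from the same field that drives `dateModified` in the Article schema keeps the two freshness signals consistent.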

Meta Tags

Ensure every page has:

<meta name="description" content="[150-160 char compelling description]">
<meta name="author" content="Yasser Alattas">
<meta property="og:title" content="[Page Title]">
<meta property="og:description" content="[Description]">
<meta property="og:type" content="article"> <!-- for blog posts -->
<meta property="og:image" content="[Social share image URL]">
<meta property="og:url" content="[Canonical URL]">
<link rel="canonical" href="[Canonical URL]">
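To check a rendered page against this list, a rough sketch like the one below can flag missing tags. For brevity it uses regular expressions over raw HTML. That assumes double-quoted attributes in the order shown in the template above, which holds for simple static pages; a real audit should use an HTML parser. The 150-160 character range comes from the template.

```javascript
// Required tags from the checklist, as [label, pattern] pairs.
const REQUIRED_TAGS = [
  ['description', /<meta\s+name="description"\s+content="([^"]*)"/i],
  ['author', /<meta\s+name="author"/i],
  ['og:title', /<meta\s+property="og:title"/i],
  ['og:description', /<meta\s+property="og:description"/i],
  ['og:type', /<meta\s+property="og:type"/i],
  ['og:image', /<meta\s+property="og:image"/i],
  ['og:url', /<meta\s+property="og:url"/i],
  ['canonical', /<link\s+rel="canonical"/i],
];

// Return a list of human-readable problems (empty list = all present).
function auditMetaTags(html) {
  const problems = [];
  for (const [label, pattern] of REQUIRED_TAGS) {
    const match = html.match(pattern);
    if (!match) {
      problems.push(`missing ${label}`);
    } else if (label === 'description') {
      const length = match[1].length; // captured content attribute
      if (length < 150 || length > 160) {
        problems.push(`description is ${length} chars (target 150-160)`);
      }
    }
  }
  return problems;
}
```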

HTTP Headers for AI Crawlers

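Add it in a Cloudflare Worker, or on Cloudflare Pages with a _headers file (the /* line applies the header to every path). Since index, follow is already the default when no directive is present, this makes the intent explicit rather than changing behavior:

/*
  X-Robots-Tag: index, follow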

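Ensure Cloudflare’s bot protection (Bot Fight Mode, WAF rules, and its AI-crawler blocking settings) doesn’t inadvertently block the AI search crawlers you want to allow. Separately, user-agent rules are not airtight: Cloudflare documented in Aug 2025 that Perplexity used undeclared stealth crawlers to evade blocks.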

Schema Patterns for AI Visibility

Confidence: Medium-High

Portfolio Site — Enhanced Person Schema

{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Yasser Alattas",
  "jobTitle": "Senior Cloud & MLOps Engineer",
  "description": "Cloud infrastructure engineer building multi-cluster Kubernetes platforms, GitOps pipelines, and AI inference systems at production scale. 7+ years experience, 400M+ daily API requests.",
  "url": "https://yasser.solutions",
  "image": "https://yasser.solutions/assets/yasser-photo.jpg",
  "sameAs": [
    "https://linkedin.com/in/yasseralattas",
    "https://github.com/yalattas",
    "https://blog.yasser.solutions"
  ],
  "knowsAbout": [
    "Kubernetes", "Amazon EKS", "GitOps", "ArgoCD",
    "Terraform", "MLOps", "SageMaker", "Karpenter",
    "Istio", "DevSecOps", "Cloud Infrastructure"
  ],
  "worksFor": {
    "@type": "Organization",
    "name": "Your Company",
    "url": "https://company.com"
  },
  "hasCredential": {
    "@type": "EducationalOccupationalCredential",
    "credentialCategory": "certification",
    "name": "AWS Solutions Architect Associate"
  }
}

Critical addition: sameAs — this links the Person entity across platforms, helping AI systems build a complete picture of who you are. Without this, AI systems may not connect your portfolio, blog, GitHub, and LinkedIn into a single entity.

Critical addition: knowsAbout. It explicitly declares the topics you claim expertise in, giving AI systems a machine-readable list of subjects to associate with your name.

Blog Articles — Article Schema (for each post)

{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Every Kubernetes Cluster I Audit Has the Same Problems",
  "description": "The four misconfigurations found in nearly every production K8s cluster — and what to do about them.",
  "author": {
    "@type": "Person",
    "name": "Yasser Alattas",
    "url": "https://yasser.solutions"
  },
  "datePublished": "YYYY-MM-DD",
  "dateModified": "YYYY-MM-DD",
  "publisher": {
    "@type": "Person",
    "name": "Yasser Alattas",
    "url": "https://yasser.solutions"
  },
  "mainEntityOfPage": "https://blog.yasser.solutions/posts/kubernetes-clusters-have-similar-problems",
  "keywords": ["Kubernetes", "EKS", "production readiness", "resource requests", "PodDisruptionBudget", "Karpenter"],
  "about": [
    {"@type": "Thing", "name": "Kubernetes"},
    {"@type": "Thing", "name": "Site Reliability Engineering"}
  ],
  "proficiencyLevel": "Expert"
}

FAQ Schema (add to blog posts)

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Why do Kubernetes pods get OOMKilled?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Pods get OOMKilled when they exceed their memory limits or when the node runs out of memory. Without resource limits, a single misbehaving pod can consume all node resources and trigger OOMKills across co-located workloads."
      }
    },
    {
      "@type": "Question",
      "name": "What happens if you don't set resource requests in Kubernetes?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Without resource requests, the Kubernetes scheduler cannot make informed placement decisions. Nodes appear underutilized on paper but may be memory-starved, leading to cascading OOMKills. Over 60% of pods in production clusters audited run without resource requests."
      }
    }
  ]
}
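Rather than hand-writing this JSON for every post, a small helper can build it from question/answer pairs kept alongside the post. The sketch assumes a hypothetical `faqs` array in each post's front matter or build data. The output has the same FAQPage structure as the example above.

```javascript
// Build FAQPage JSON-LD from plain { question, answer } pairs.
function buildFaqSchema(faqs) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map(({ question, answer }) => ({
      '@type': 'Question',
      name: question,
      acceptedAnswer: { '@type': 'Answer', text: answer },
    })),
  };
}

// Serialize for a <script type="application/ld+json"> tag. Escaping "<" keeps
// an answer containing "</script>" from closing the tag early; JSON parsers
// decode \u003c back to "<".
function toJsonLdScript(data) {
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">${json}</script>`;
}
```

Keeping the pairs as data means the visible FAQ section and the structured data come from one source, so they cannot drift apart.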

Meta Tags & Headers That Matter

Confidence: High

Based on Microsoft Bing’s official guidance (Oct 2025):

Title Tag: Most important signal. Should clearly summarize what the content delivers using natural language aligned with search intent.

H1: Should match or closely reflect the page title. Sets expectations for what follows.

Meta Description: Explain value/outcome, avoid keyword stuffing.

H2/H3 Headings: Act as “chapter titles” that AI uses to segment content. Use descriptive headings:

Self-contained phrasing: Sentences that make sense when pulled out of context. AI systems extract individual passages — they should work standalone.
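Here is one way to see why headings and standalone sentences matter. Many retrieval pipelines split pages into passages at heading boundaries. Some also add context to each chunk so it still makes sense in isolation; Anthropic's contextual retrieval write-up (source 12) describes prepending generated context. The sketch below is a simpler variant of that idea: it prefixes each passage with its heading path. It assumes Markdown input with `##`/`###` headings and is illustrative, not how any particular engine segments pages.

```javascript
// Split Markdown into passages at ## and ### headings. Each passage carries
// its heading path so it can be read (and embedded) on its own.
function splitByHeadings(markdown) {
  const passages = [];
  let h2 = '';
  let h3 = '';
  let body = [];

  const flush = () => {
    const text = body.join('\n').trim();
    if (text) {
      const path = [h2, h3].filter(Boolean).join(' > ');
      passages.push(path ? `${path}\n${text}` : text);
    }
    body = [];
  };

  for (const line of markdown.split('\n')) {
    const m = line.match(/^(#{2,3})\s+(.*)$/);
    if (!m) {
      body.push(line);
      continue;
    }
    flush(); // a new heading closes the previous passage
    if (m[1] === '##') {
      h2 = m[2].trim();
      h3 = '';
    } else {
      h3 = m[2].trim();
    }
  }
  flush();
  return passages;
}
```

A vague heading such as "Fix" is only useful here because its parent heading travels with it. A descriptive heading and a self-contained first sentence make each passage citable even without that help.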


Open Source Tools

Confidence: Medium-High

1. GEO-Bench (Princeton/Georgia Tech)

2. AutoGEO (CMU — ICLR 2026)

3. Gego — Open Source GEO Tracker

4. GEO/AEO Tracker (from amplifying-ai list)

Cloudflare-Deployable Solutions

Confidence: Medium

No mature GEO-specific tool runs natively on Cloudflare Workers. However, Cloudflare can support GEO through:

  1. Cloudflare Workers for Dynamic Schema Injection
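// Example: Inject FAQ schema into blog post responses
export default {
  async fetch(request, env) {
    const response = await fetch(request);
    const { pathname } = new URL(request.url);
    const contentType = response.headers.get('content-type') || '';

    // Decide before reading the body: once response.text() has consumed it,
    // the original response can no longer be returned.
    if (!pathname.startsWith('/posts/') || !contentType.includes('text/html')) {
      return response;
    }

    // Dynamically populated per post (e.g. from KV or a build-time manifest).
    // An FAQPage with no questions adds nothing, so skip injection.
    const questions = [];
    if (questions.length === 0) return response;

    const faqSchema = JSON.stringify({
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity": questions
    });
    const injection = `<script type="application/ld+json">${faqSchema}</script>`;
    const html = await response.text();
    const modified = html.replace('</head>', `${injection}</head>`);

    // Keep status and headers, minus Content-Length (the body grew).
    const headers = new Headers(response.headers);
    headers.delete('content-length');
    return new Response(modified, { status: response.status, statusText: response.statusText, headers });
  }
};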
  2. Cloudflare _headers File: Set robots tags and cache headers for AI crawlers
  3. Cloudflare Page Rules / WAF: Ensure AI search crawlers (OAI-SearchBot, ChatGPT-User, PerplexityBot) aren’t blocked by bot protection
  4. Cloudflare Analytics: Monitor for AI crawler traffic patterns

Monitoring & Tracking

Confidence: High

Commercial Tools (for reference)

| Tool | Coverage | Price | Key Feature |
|---|---|---|---|
| Otterly.ai | ChatGPT, Perplexity, Google AIO | Freemium | AI visibility tracking, citation analysis |
| Peec AI | Multi-LLM daily monitoring | Paid | Enterprise-grade, export-ready reports |
| Promptmonitor.io | ChatGPT, Perplexity | Paid | Prompt testing, answer visibility |
| Semrush | Google + AI citations | Paid | Integrated with existing SEO tooling |
| Profound | ChatGPT, Perplexity, Google AIO | Enterprise | 680M+ citation dataset, deepest analysis |

Free/DIY Monitoring Approach

  1. Manual prompt testing: Weekly, submit 25-50 relevant queries across ChatGPT, Perplexity, and Google AI Overviews. Document which sources are cited.
  2. GA4 AI referral tracking: Set up regex filters for ChatGPT, Perplexity, and Copilot referral traffic in Google Analytics.
  3. Gego (self-hosted): Deploy on existing infrastructure for automated tracking.
  4. Google Search Console: Monitor indexed pages and crawl activity.
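For the GA4 step, the filter is a regular expression over the session source or referrer. The domains below are an assumption based on where these assistants are served today: chatgpt.com (and the older chat.openai.com), perplexity.ai, and copilot.microsoft.com. Verify them against your own referral reports, because assistants change domains and some AI traffic arrives with no referrer at all. The same pattern is also written as a JavaScript classifier so it can be tested locally. `AI_REFERRERS`, `GA4_SOURCE_REGEX`, and `classifyReferrer` are illustrative names.

```javascript
// Referrer hostnames for AI assistants (assumed; check your own GA4 data).
const AI_REFERRERS = {
  ChatGPT: /(^|\.)(chatgpt\.com|chat\.openai\.com)$/,
  Perplexity: /(^|\.)perplexity\.ai$/,
  Copilot: /(^|\.)copilot\.microsoft\.com$/,
};

// Single pattern to paste into a GA4 regex condition on session source.
const GA4_SOURCE_REGEX = 'chatgpt\\.com|chat\\.openai\\.com|perplexity\\.ai|copilot\\.microsoft\\.com';

// Map a referrer URL to an assistant name, or null for non-AI traffic.
function classifyReferrer(referrer) {
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch {
    return null; // empty or malformed referrer
  }
  for (const [name, pattern] of Object.entries(AI_REFERRERS)) {
    if (pattern.test(host)) return name;
  }
  return null;
}
```

Anchoring on the hostname, rather than searching the whole URL, avoids false positives from pages that merely mention these domains in a query string.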

Content Optimization Tools

Confidence: Medium-High

| Tool | Type | Use Case |
|---|---|---|
| Frase.io | SaaS | AI content optimization, question research |
| Schema.org Validator | Free | Validate JSON-LD structured data |
| Google Rich Results Test | Free | Test schema implementation |
| Screaming Frog | Freemium | Technical SEO audit including schema validation |
| ChatGPT/Claude | SaaS | Test your own content by asking AI about topics you’ve written about |

Top GEO-Optimized Developer Sites

Confidence: Medium (based on general patterns; specific developer portfolio GEO studies are rare)

Sites that consistently get cited in AI responses for technical content:

  1. DigitalOcean Community — Tutorials with consistent structure, code blocks, step-by-step format. Heavily cited by all AI platforms.
  2. Learnk8s.io — Kubernetes-specific educational content with diagrams, comparisons, and structured data. Frequently cited for K8s queries.
  3. Martin Fowler’s blog — Opinionated, experience-based technical writing with clear definitions. Model for authoritative developer content.
  4. Julia Evans (jvns.ca) — Visual, accessible explanations of complex topics. High citation rates due to unique explanation style.
  5. Kelsey Hightower’s GitHub/talks — Minimal content but extremely high authority on Kubernetes topics.

Common patterns among highly-cited developer content:

What They Do Differently

Confidence: High

Based on the Go Fish Digital case study and Chen et al. (2025) research:

  1. Earned media dominance: They don’t just publish on their own blog — they contribute to Reddit threads, answer Stack Overflow questions, write on Medium/Dev.to, and get mentioned in other people’s articles.
  2. Information gain: Every piece of content adds something new — a benchmark, a comparison, a real production experience — rather than summarizing existing knowledge.
  3. Semantic coverage: They don’t write one article per topic — they cover an entire “topic cluster” with multiple related articles that cross-reference each other.
  4. Machine scannability: Clean HTML, consistent heading structures, code blocks with language annotations, tables for comparisons.
  5. Freshness signals: Regular updates, dateModified metadata, “Updated [date]” notices.

Case Studies

Confidence: High

Case Study 1: Go Fish Digital (Sep 2025)

Case Study 2: Princeton GEO Research (KDD 2024)

Case Study 3: TheRankMasters GEO Case Study


How to Assess Your GEO Readiness

Here’s a framework for scoring any site’s GEO readiness across 10 dimensions:

| Category | What to Measure |
|---|---|
| Content Quality | Original data, opinionated, experience-based writing |
| Content Volume | Enough articles to establish topical authority (aim for 8+) |
| Schema Markup | Person, Article, FAQ, sameAs, knowsAbout implemented |
| Technical SEO | robots.txt configured, sitemap clean, pages indexed |
| AI Crawler Access | AI search bots explicitly allowed; not blocked by WAF |
| Cross-Platform Presence | Active on Reddit, Stack Overflow, Medium, Dev.to |
| Earned Media | Third-party mentions and citations |
| E-E-A-T Signals | Experience + expertise + authority + trust evidence |
| Content Structure | Headings, code blocks, lists, FAQ sections, comparison tables |
| Freshness | Regular updates, dateModified signals |
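The framework does not fix a scale, so the sketch below assumes one. Each of the 10 categories is scored 0-10 by hand. The result is reported as a percentage, along with the three weakest categories to fix first. Equal weighting is also an assumption; weight the categories differently if your audit suggests it. `geoReadiness` is a hypothetical helper.

```javascript
const CATEGORIES = [
  'Content Quality', 'Content Volume', 'Schema Markup', 'Technical SEO',
  'AI Crawler Access', 'Cross-Platform Presence', 'Earned Media',
  'E-E-A-T Signals', 'Content Structure', 'Freshness',
];

// scores: { [category]: 0-10 }. Missing categories count as 0.
function geoReadiness(scores) {
  const rows = CATEGORIES.map(name => {
    const s = scores[name] ?? 0;
    if (s < 0 || s > 10) throw new RangeError(`${name}: score must be 0-10`);
    return { name, score: s };
  });
  const total = rows.reduce((sum, r) => sum + r.score, 0);
  // Lowest scores first: these are the gaps to prioritize.
  const weakest = [...rows].sort((a, b) => a.score - b.score).slice(0, 3).map(r => r.name);
  return { percent: Math.round((total / (CATEGORIES.length * 10)) * 100), weakest };
}
```

For example, a site scoring 8 everywhere except Earned Media (1), Cross-Platform Presence (2), and Content Volume (3) lands at 62%, and those three categories come back as the priorities.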

Common GEO Gaps (Checklist)

These are the most common gaps I see when auditing sites for GEO readiness:

  1. CRITICAL: Pages not indexed — If Google hasn’t indexed your pages, Google AI Overviews can’t cite your content. Fix this first.
  2. CRITICAL: No AI-specific robots.txt — Generic robots.txt doesn’t explicitly allow AI search crawlers (ChatGPT-User, PerplexityBot, OAI-SearchBot).
  3. HIGH: Thin content library — AI engines need a body of work to establish topical authority. A handful of posts is insufficient.
  4. HIGH: No cross-platform presence — AI engines (especially Perplexity and Google AI Overviews) heavily cite Reddit, LinkedIn, and Medium.
  5. HIGH: No earned media: Chen et al. (2025) found that AI search shows an “overwhelming bias toward earned media.”
  6. HIGH: Missing sameAs in Person schema — Without linking profiles across platforms, AI systems can’t build a complete entity graph.
  7. MEDIUM: No FAQ schema — Blog posts without FAQ structured data miss easy wins for Q&A-style AI queries.
  8. MEDIUM: No Article/TechArticle schema — Blog posts without proper Article structured data lack datePublished, dateModified, author link signals.
  9. MEDIUM: Sitemap issues — Fragment URLs (#about, #experience) in sitemaps are ignored by search engines.
  10. MEDIUM: No external citations in content — AI engines value content that cites authoritative external sources.
  11. MEDIUM: No comparison/benchmark content — “X vs Y” content is highly citable.
  12. MEDIUM: No OpenGraph images — Missing og:image for social sharing and AI system content cards.
  13. LOW: No RSS feed promotion — AI indexing services can use RSS for content discovery.
  14. LOW: No content update signals — No dateModified, no “Updated on” notices, no content refresh strategy.

Sources & References

| # | Source | URL | Type | Date | Confidence |
|---|---|---|---|---|---|
| 1 | Aggarwal et al., “GEO: Generative Engine Optimization” | https://arxiv.org/abs/2311.09735 | Academic (KDD 2024) | Nov 2023 | High |
| 2 | Chen et al., “Generative Engine Optimization: How to Dominate AI Search” | https://arxiv.org/abs/2509.08919 | Academic | Sep 2025 | High |
| 3 | Wu et al., “What Generative Search Engines Like and How to Optimize Web Content” (AutoGEO) | https://arxiv.org/abs/2510.11438 | Academic (ICLR 2026) | Oct 2025 | High |
| 4 | Microsoft Ads, “Optimizing Your Content for Inclusion in AI Search Answers” | https://about.ads.microsoft.com/en/blog/post/october-2025/optimizing-your-content-for-inclusion-in-ai-search-answers | Official (Bing) | Oct 2025 | High |
| 5 | Profound, “AI Platform Citation Patterns” (680M citations analyzed) | https://www.tryprofound.com/blog/ai-platform-citation-patterns | Industry Research | Aug 2025 | High |
| 6 | WordStream, “GEO vs. SEO: Everything to Know in 2026” | https://www.wordstream.com/blog/generative-engine-optimization | Industry Blog | Mar 2026 | Medium-High |
| 7 | Contentful, “What is GEO and how does it differ from SEO?” | https://www.contentful.com/blog/generative-engine-optimization-seo/ | Industry Blog | Dec 2025 | Medium-High |
| 8 | Go Fish Digital, “GEO Case Study: 3X’ing Leads” | https://gofishdigital.com/blog/generative-engine-optimization-geo-case-study-driving-leads/ | Case Study | Sep 2025 | High |
| 9 | Visively, “How LLMs and RAG Systems Retrieve, Rank, and Cite Content” | https://visively.com/kb/ai/llm-rag-retrieval-ranking | Technical Analysis | Dec 2025 | High |
| 10 | Cloudflare, “Perplexity is using stealth crawlers” | https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers/ | Official (Cloudflare) | Aug 2025 | High |
| 11 | Perplexity, “Perplexity Crawlers Documentation” | https://docs.perplexity.ai/docs/resources/perplexity-crawlers | Official (Perplexity) | 2025 | High |
| 12 | Anthropic, “Contextual Retrieval” | https://www.anthropic.com/engineering/contextual-retrieval | Official (Anthropic) | 2024 | High |
| 13 | Google, “Information Gain Patent” | https://patents.google.com/patent/US20200349181A1/en | Patent | 2020 | High |
| 14 | NVIDIA, “What Is Retrieval-Augmented Generation (RAG)” | https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/ | Official (NVIDIA) | Oct 2025 | High |
| 15 | Gego GitHub Repository | https://github.com/AI2HU/gego | Open Source Tool | 2025 | Medium |
| 16 | AutoGEO GitHub Repository (ICLR 2026) | https://github.com/cxcscmu/AutoGEO | Research Tool | 2025 | High |
| 17 | GEO-Bench Repository | https://github.com/GEO-optim/GEO | Research Benchmark | 2023 | High |
| 18 | Playwire, “How to Block AI Bots with robots.txt” | https://www.playwire.com/blog/how-to-block-ai-bots-with-robotstxt-the-complete-publishers-guide | Publisher Guide | Jan 2026 | Medium-High |
| 19 | Semrush, “How to Optimize Content for AI Search Engines” | https://www.semrush.com/blog/how-to-optimize-content-for-ai-search-engines/ | Industry Blog | Mar 2026 | Medium-High |
| 20 | SEOPress, “How to Optimize Content for AI Overviews” | https://www.seopress.org/newsroom/featured-stories/generative-engine-optimization/ | Industry Blog | Feb 2026 | Medium |
| 21 | Frase.io, “What is GEO?” | https://www.frase.io/blog/what-is-generative-engine-optimization-geo | Industry Guide | Mar 2026 | Medium |
| 22 | SingleGrain, “Real GEO Optimization Case Studies” | https://www.singlegrain.com/search-everywhere-optimization/real-geo-optimization-case-studies/ | Case Studies | Oct 2025 | Medium |
| 23 | Conductor, “Top 10 AEO/GEO Tools” | https://www.conductor.com/academy/best-aeo-geo-tools/ | Tool Review | Nov 2025 | Medium |
| 24 | WPRiders, “Schema Markup: 8 Tactics to Boost AI Citations” | https://wpriders.com/schema-markup-for-ai-search-types-that-get-you-cited/ | Industry Blog | Dec 2025 | Medium |
| 25 | Neil Patel, “GEO vs AEO” | https://neilpatel.com/blog/geo-vs-aeo/ | Industry Blog | Dec 2025 | Medium |
| 26 | Digiday, “WTF are GEO and AEO?” | https://digiday.com/media/wtf-are-geo-and-aeo-and-how-they-differ-from-seo/ | Industry Blog | Oct 2025 | Medium |
| 27 | AWS, “What is RAG?” | https://aws.amazon.com/what-is/retrieval-augmented-generation/ | Official (AWS) | Mar 2026 | High |
| 28 | Conbersa, “What Is JSON-LD Structured Data?” | https://www.conbersa.ai/learn/what-is-json-ld-structured-data | Industry Blog | Feb 2026 | Medium |
| 29 | DeepMind, “GopherCite” | https://deepmind.google/blog/gophercite-teaching-language-models-to-support-answers-with-verified-quotes/ | Official (DeepMind) | 2022 | High |
| 30 | Data Studios, “How Does Perplexity Choose Sources” | https://www.datastudios.org/post/how-does-perplexity-choose-and-rank-its-information-sources-algorithm-and-transparency | Analysis | Feb 2026 | Medium |
| 31 | Strapi, “GEO vs Traditional SEO Guide” | https://strapi.io/blog/generative-engine-optimization-vs-traditional-seo-guide | Industry Blog | 2025 | Medium |
| 32 | Search Engine Journal, “Google CTR Study: AI Overviews Rise as Click Rates Decline” | https://www.searchenginejournal.com/google-ctr-study-ai-overviews-rise-as-click-rates-decline/541465/ | Industry Research | 2025 | Medium-High |
| 33 | awesome-generative-engine-optimization | https://github.com/amplifying-ai/awesome-generative-engine-optimization | Curated List | 2025 | Medium |

Appendices

A: GEO Audit Checklist

Use this checklist to evaluate any page’s GEO readiness:

Technical Foundation

Structured Data

Content Quality

Authority & Distribution

Freshness

B: Schema.org Templates for GEO

Person (Portfolio Site)

{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Yasser Alattas",
  "jobTitle": "Senior Cloud & MLOps Engineer",
  "description": "Cloud infrastructure engineer building multi-cluster Kubernetes platforms, GitOps pipelines, and AI inference systems at production scale.",
  "url": "https://yasser.solutions",
  "image": "https://yasser.solutions/assets/yasser-photo.jpg",
  "sameAs": [
    "https://linkedin.com/in/yasseralattas",
    "https://github.com/yalattas",
    "https://blog.yasser.solutions",
    "https://dev.to/yalattas",
    "https://stackoverflow.com/users/XXXXXX/yasser-alattas"
  ],
  "knowsAbout": [
    "Kubernetes", "Amazon EKS", "GitOps", "ArgoCD", "Terraform",
    "MLOps", "SageMaker", "Karpenter", "Istio", "DevSecOps",
    "Cloud Infrastructure", "Platform Engineering"
  ],
  "worksFor": {
    "@type": "Organization",
    "name": "Salla",
    "url": "https://salla.com",
    "description": "Saudi Arabia's leading e-commerce platform"
  },
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Jeddah",
    "addressCountry": "SA"
  },
  "hasCredential": {
    "@type": "EducationalOccupationalCredential",
    "credentialCategory": "certification",
    "name": "AWS Solutions Architect Associate"
  }
}

TechArticle (Blog Posts)

{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "[Article Title]",
  "description": "[150 char description]",
  "author": {
    "@type": "Person",
    "name": "Yasser Alattas",
    "url": "https://yasser.solutions",
    "sameAs": ["https://linkedin.com/in/yasser-alattas", "https://github.com/yalattas"]
  },
  "datePublished": "YYYY-MM-DD",
  "dateModified": "YYYY-MM-DD",
  "publisher": {
    "@type": "Person",
    "name": "Yasser Alattas",
    "url": "https://yasser.solutions"
  },
  "mainEntityOfPage": "[Canonical URL]",
  "keywords": ["keyword1", "keyword2"],
  "about": [
    {"@type": "Thing", "name": "Kubernetes"},
    {"@type": "Thing", "name": "Cloud Computing"}
  ],
  "proficiencyLevel": "Expert",
  "wordCount": 2000,
  "inLanguage": "en"
}

FAQPage (Add to Article Pages)

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "[Question text — match how users ask AI]",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "[Direct, specific answer with data points]"
      }
    }
  ]
}

WebSite (Both Sites)

{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Yasser Alattas — Cloud Infrastructure Engineer",
  "url": "https://yasser.solutions",
  "description": "Portfolio and blog of Yasser Alattas, Senior Cloud & MLOps Engineer",
  "author": {
    "@type": "Person",
    "name": "Yasser Alattas"
  }
}
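After pasting these templates into a page, a quick local check can confirm that each JSON-LD block parses and that the Person entity carries sameAs links. The sketch is a regex-based extractor, an assumption that suits simple static pages. The Schema.org Validator and Google Rich Results Test listed earlier remain the authoritative checks. `extractJsonLd` and `checkSchema` are hypothetical helpers.

```javascript
// Pull every <script type="application/ld+json"> body out of an HTML string.
function extractJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push({ ok: true, data: JSON.parse(match[1]) });
    } catch (err) {
      blocks.push({ ok: false, error: err.message });
    }
  }
  return blocks;
}

// Summarize which @types a page declares and flag common gaps from the checklist.
function checkSchema(html) {
  const blocks = extractJsonLd(html);
  const issues = blocks.filter(b => !b.ok).map(b => `invalid JSON-LD: ${b.error}`);
  // A block may hold a single object or an array of objects.
  const items = blocks.filter(b => b.ok).flatMap(b => (Array.isArray(b.data) ? b.data : [b.data]));
  const types = items.map(item => item['@type']);
  for (const item of items) {
    if (item['@type'] === 'Person' && !(Array.isArray(item.sameAs) && item.sameAs.length > 0)) {
      issues.push(`Person "${item.name}" has no sameAs links`);
    }
  }
  return { types, issues };
}
```

Parsing the JSON, rather than searching for a string like `"@type":"WebSite"`, also catches markup that is present but broken, which a substring check would report as fine.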

C: Cloudflare Workers Code Snippets

1. Dynamic Schema Injection Worker

// Cloudflare Worker: Inject/enhance JSON-LD schema
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const response = await fetch(request);
    
    // Only modify HTML responses
    const contentType = response.headers.get('content-type') || '';
    if (!contentType.includes('text/html')) return response;
    
    let html = await response.text();
    
    // Add WebSite schema if missing
    if (!html.includes('"@type":"WebSite"') && !html.includes('"@type": "WebSite"')) {
      const websiteSchema = `<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Yasser Alattas Blog",
  "url": "https://blog.yasser.solutions",
  "author": {"@type": "Person", "name": "Yasser Alattas", "url": "https://yasser.solutions"}
}
</script>`;
      html = html.replace('</head>', `${websiteSchema}\n</head>`);
    }
    
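    // Keep the origin's status (a 404 must stay a 404) and headers,
    // but drop Content-Length because the body size may have changed.
    const headers = new Headers(response.headers);
    headers.delete('content-length');
    headers.set('X-Robots-Tag', 'index, follow');
    return new Response(html, {
      status: response.status,
      statusText: response.statusText,
      headers,
    });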
  },
};

2. AI Crawler Analytics Worker

// Cloudflare Worker: Log AI crawler visits
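const AI_CRAWLERS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
  'PerplexityBot', 'Perplexity-User',
  'ClaudeBot', 'Claude-SearchBot',
  'Applebot', 'CCBot', 'Meta-ExternalAgent'
  // Google-Extended is omitted: it is a robots.txt token only. Google crawls
  // with its regular user agents, so it never appears in a user-agent string.
];

export default {
  async fetch(request, env, ctx) {
    const ua = request.headers.get('user-agent') || '';
    const crawler = AI_CRAWLERS.find(c => ua.includes(c));

    // Log only when a crawler matched and the KV binding is configured.
    // User agents can be spoofed, so this counts claimed crawlers, not verified ones.
    if (crawler && env.AI_CRAWLER_LOGS) {
      const logEntry = {
        timestamp: new Date().toISOString(),
        crawler,
        path: new URL(request.url).pathname,
        ip: request.headers.get('cf-connecting-ip'),
      };

      // Random suffix avoids key collisions when two hits share a timestamp.
      const key = `${logEntry.timestamp}-${crawler}-${crypto.randomUUID()}`;

      // waitUntil lets the KV write finish after the response is returned,
      // so logging adds no latency to the crawler's request.
      ctx.waitUntil(
        env.AI_CRAWLER_LOGS.put(key, JSON.stringify(logEntry), {
          expirationTtl: 86400 * 30, // 30 days
        })
      );
    }

    return fetch(request);
  },
};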

3. Robots.txt Dynamic Handler

// Cloudflare Worker: Serve enhanced robots.txt
export default {
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname !== '/robots.txt') return fetch(request);
    
    const robotsTxt = `# Yasser Alattas — Cloud Infrastructure Engineer
# AI Search Crawlers: Welcome
# AI Training Crawlers: Please respect our boundaries

User-agent: *
Allow: /
Sitemap: ${url.origin}/sitemap.xml

# AI Search Crawlers — ALLOWED
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Applebot
Allow: /

# AI Training Crawlers — BLOCKED
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /
`;
    
    return new Response(robotsTxt, {
      headers: { 'Content-Type': 'text/plain; charset=utf-8' },
    });
  },
};

Research compiled from the 33 sources listed above, including academic papers, official platform guidance, and industry case studies.

