{
  "name": "Discover article URLs from any website with GPT-5-mini and Google Sheets",
  "nodes": [
    {
      "id": "bce47b0d-6d94-4dbf-9b27-ef7b1f1385e9",
      "name": "Sticky Note - Introduction",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -400,
        80
      ],
      "parameters": {
        "width": 460,
        "height": 540,
        "content": "## 🚀 AI-Powered Multi-Source URL Discovery Engine\n\nThis workflow automatically discovers article URLs from any website using AI intelligence.\n\n### How it works:\n1. **Input URLs** → Read seed URLs from"
      }
    },
    {
      "id": "e30f2f04-4f9f-4ca9-8561-236c6a69f234",
      "name": "Sticky Note - Input",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        336,
        80
      ],
      "parameters": {
        "width": 280,
        "height": 536,
        "content": "## 📥 Input Stage\n\nReads seed URLs from your Google Sheets.\n\n**Required Sheet Columns:**\n- `URL` - The webpage to crawl\n\n**Tip:** Add multiple publisher homepages, blog indexes, or news feeds to discov"
      }
    },
    {
      "id": "6822173c-3e3a-45a3-9b35-495d9d933766",
      "name": "Sticky Note - Loop",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        720,
        80
      ],
      "parameters": {
        "width": 424,
        "height": 536,
        "content": "## 🔄 Processing Loop\n\nProcesses each URL with rate limiting to avoid being blocked.\n\n**Features:**\n- Batch processing (1 at a time)\n- Wait node prevents rate limiting\n- Error handling continues on fai"
      }
    },
    {
      "id": "156a24f3-8e3f-4ffa-853d-2da7721e6f82",
      "name": "Sticky Note - Fetch",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1424,
        80
      ],
      "parameters": {
        "width": 440,
        "height": 536,
        "content": "## 🌐 Web Fetching\n\nFetches HTML with browser User-Agent to avoid bot detection.\n\n**Key Settings:**\n- Custom User-Agent header\n- Error handling: continue on failure\n- Converts HTML → Markdown for AI"
      }
    },
    {
      "id": "6e4911b5-c368-43e2-8398-5814e7a972b4",
      "name": "Sticky Note - AI",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1952,
        80
      ],
      "parameters": {
        "width": 440,
        "height": 708,
        "content": "## 🤖 AI URL Extraction\n\nThe AI Agent analyzes page content and extracts valid article URLs.\n\n**What it identifies:**\n✅ Article/blog/news URLs\n✅ Multi-word slugs with dates\n✅ Content pages\n\n**What it e"
      }
    },
    {
      "id": "7ee53dbe-63b3-4903-9a67-3169f7bbd2db",
      "name": "Sticky Note - Parser",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2464,
        80
      ],
      "parameters": {
        "width": 280,
        "height": 536,
        "content": "## 🔧 URL Parser & Normalizer\n\nCleans and validates AI output.\n\n**Processing:**\n- Strips markdown code fences\n- Parses JSON array\n- Normalizes URLs (removes query params)\n- Removes duplicates\n- Handles"
      }
    },
    {
      "id": "923f5e7d-57ac-47d9-9863-7164d666dbe0",
      "name": "Sticky Note - Output",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2848,
        80
      ],
      "parameters": {
        "width": 328,
        "height": 716,
        "content": "## 💾 Output Stage\n\nSaves discovered URLs to Google Sheets.\n\n**Output Columns:**\n- `URL` - Discovered article URL\n- `Source` - Publisher/category\n- `Status` - Set to \"Pending\"\n\n**Deduplication:**\nUses "
      }
    },
    {
      "id": "daf17cd7-1dc4-4770-99a1-70ee9e0b27c7",
      "name": "Daily Schedule (6 AM)",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        128,
        304
      ]
    },
    {
      "id": "a0ee457a-f997-454f-b4ee-ec411d40688f",
      "name": "Manual Trigger",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        128,
        496
      ]
    },
    {
      "id": "8763f6c9-f9e0-4b48-be2d-7489f1db81d9",
      "name": "Read Seed URLs",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        432,
        400
      ]
    },
    {
      "id": "69fe92b6-996f-4225-9bda-7785bda6daae",
      "name": "Loop Over URLs",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        784,
        400
      ]
    },
    {
      "id": "7173466b-9bef-4683-8358-86cb7726d357",
      "name": "Rate Limit (3s)",
      "type": "n8n-nodes-base.wait",
      "position": [
        1008,
        416
      ]
    },
    {
      "id": "27de86ef-f8e0-4b22-85eb-3d67a73340f9",
      "name": "Fetch Webpage HTML",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1488,
        416
      ]
    },
    {
      "id": "6cbae844-1781-438d-b8e4-6e1ae000b836",
      "name": "HTML to Markdown",
      "type": "n8n-nodes-base.markdown",
      "position": [
        1712,
        416
      ]
    },
    {
      "id": "1472c962-8dcf-4b43-bf84-14626ed71583",
      "name": "AI URL Extractor",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        2048,
        416
      ]
    },
    {
      "id": "6caf2efa-9170-4c17-8931-8d2725188a30",
      "name": "URL Parser & Normalizer",
      "type": "n8n-nodes-base.code",
      "position": [
        2560,
        400
      ]
    },
    {
      "id": "5af45284-92bc-4ae9-8b36-f078765b0a57",
      "name": "Save Discovered URLs",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        2960,
        608
      ]
    },
    {
      "id": "45d35d27-b373-480a-95c0-2d82fade93d0",
      "name": "Completion Summary",
      "type": "n8n-nodes-base.set",
      "position": [
        1216,
        224
      ]
    },
    {
      "id": "e5d9866e-c489-44ae-88bd-a79f3a7869c9",
      "name": "OpenAI Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "position": [
        2096,
        624
      ]
    }
  ],
  "connections": {
    "Loop Over URLs": {
      "main": [
        [
          {
            "node": "Completion Summary",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Rate Limit (3s)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Manual Trigger": {
      "main": [
        [
          {
            "node": "Read Seed URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Read Seed URLs": {
      "main": [
        [
          {
            "node": "Loop Over URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Rate Limit (3s)": {
      "main": [
        [
          {
            "node": "Fetch Webpage HTML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "AI URL Extractor": {
      "main": [
        [
          {
            "node": "URL Parser & Normalizer",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTML to Markdown": {
      "main": [
        [
          {
            "node": "AI URL Extractor",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "AI URL Extractor",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Webpage HTML": {
      "main": [
        [
          {
            "node": "HTML to Markdown",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Save Discovered URLs": {
      "main": [
        [
          {
            "node": "Loop Over URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Daily Schedule (6 AM)": {
      "main": [
        [
          {
            "node": "Read Seed URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "URL Parser & Normalizer": {
      "main": [
        [
          {
            "node": "Save Discovered URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}