{
  "name": "Index legal documents for hybrid search with Qdrant, OpenAI & BM25",
  "nodes": [
    {
      "id": "2556a724-93f9-4ecc-8112-10458fea8b3e",
      "name": "Create Collection",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        560,
        368
      ]
    },
    {
      "id": "c4c7120a-aff6-4bdd-880b-903761b88af8",
      "name": "Check Collection Exists",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        208,
        288
      ]
    },
    {
      "id": "0639e81c-130c-4fd0-a4df-80509c2f0aaf",
      "name": "If",
      "type": "n8n-nodes-base.if",
      "position": [
        400,
        288
      ]
    },
    {
      "id": "c454200a-9216-4e69-88cf-bcb3f93b65f0",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1056,
        192
      ],
      "parameters": {
        "width": 592,
        "height": 864,
        "content": "## Index Legal Dataset to Qdrant for Hybrid Retrieval\n*This pipeline is the first part of **\"Hybrid Search with Qdrant & n8n, Legal AI\"**.  \nThe second part, **\"Hybrid Search with Qdrant & n8n, Legal "
      }
    },
    {
      "id": "03b3d5c1-cbed-43c6-8d2a-241c8a04d79d",
      "name": "Index Dataset from HuggingFace",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -368,
        768
      ]
    },
    {
      "id": "8e97d7e3-1daf-4cb8-89ea-6235b0d5f8ad",
      "name": "Split Them All Out",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        256,
        944
      ]
    },
    {
      "id": "4e9a2449-ef56-4f76-b6b6-9195a591e2a8",
      "name": "Get Dataset Splits",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        64,
        944
      ]
    },
    {
      "id": "4227306b-4008-4d3a-a233-404d12729114",
      "name": "Divide Per Row",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        640,
        944
      ]
    },
    {
      "id": "8d9b6c80-00ff-48c5-a9aa-75318c10e080",
      "name": "Loop Over Batches",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        2640,
        496
      ]
    },
    {
      "id": "987ee18a-78b8-46f4-be12-5897176784e0",
      "name": "Aggregate a Batch",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        2976,
        512
      ]
    },
    {
      "id": "5a11322c-665d-41e4-86fa-b7a0b16a4c75",
      "name": "Upsert Points",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        3232,
        512
      ]
    },
    {
      "id": "a4d4ed4a-b24a-4dba-895c-46964d2915be",
      "name": "Limit",
      "type": "n8n-nodes-base.limit",
      "position": [
        1440,
        1264
      ]
    },
    {
      "id": "3d45c4b2-c3da-4add-9256-a9cdba062637",
      "name": "Merge",
      "type": "n8n-nodes-base.merge",
      "position": [
        2224,
        784
      ]
    },
    {
      "id": "8a5ba479-f1b1-4bdf-8934-ff39dfa384dd",
      "name": "Sum them Up",
      "type": "n8n-nodes-base.summarize",
      "position": [
        1856,
        1264
      ]
    },
    {
      "id": "dced86c8-5dfb-4718-89ce-707997268382",
      "name": "Get the Average Text Length",
      "type": "n8n-nodes-base.set",
      "position": [
        2064,
        1264
      ]
    },
    {
      "id": "c6de3504-36f4-47b9-8a1d-7df398284e8e",
      "name": "Loop Over Batches1",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        2640,
        1312
      ]
    },
    {
      "id": "19e6b91d-f03a-4cb7-afd9-a148eb724877",
      "name": "Upsert Points1",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        4192,
        1312
      ]
    },
    {
      "id": "1b4ceeb5-fa40-4544-a4f8-cfd9860de452",
      "name": "Create Collection1",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        3008,
        1840
      ]
    },
    {
      "id": "948b1d9a-a529-4919-bb99-63ce30e2e2a5",
      "name": "Check Collection Exists1",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        2608,
        1744
      ]
    },
    {
      "id": "e73d6246-e782-4293-bd57-ccd9a9276e06",
      "name": "If1",
      "type": "n8n-nodes-base.if",
      "position": [
        2816,
        1744
      ]
    },
    {
      "id": "7809aff3-02d1-45e4-949d-b251b37be7ef",
      "name": "Merge1",
      "type": "n8n-nodes-base.merge",
      "position": [
        3680,
        1312
      ]
    },
    {
      "id": "d68cf8a5-400f-41e3-b8bf-3a3e71ff1985",
      "name": "Split Out",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        3520,
        1104
      ]
    },
    {
      "id": "cdac0c35-6aa9-441a-9859-3f3bfa8e3521",
      "name": "Get OpenAI embeddings",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        3344,
        1104
      ]
    },
    {
      "id": "3a5ba038-021f-4cfc-8d59-189357309479",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        0,
        592
      ],
      "parameters": {
        "width": 1344,
        "height": 528,
        "content": "## Get Dataset from Hugging Face\n\nFetching a sample dataset from Hugging Face using the [Dataset Viewer API](https://huggingface.co/docs/dataset-viewer/quick_start).\n**Dataset:** [LegalQAEval from isa"
      }
    },
    {
      "id": "4f9d02bb-6474-4448-9eab-5bc599cc2587",
      "name": "Get Dataset Rows (Pagination)",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        448,
        944
      ]
    },
    {
      "id": "d1b63d11-d424-44ca-8ca9-843eb488235a",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1424,
        1024
      ],
      "parameters": {
        "width": 800,
        "height": 416,
        "content": "## Estimate Average Length of Text Chunks\n\nThe average text length in the dataset is a parameter of the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) formula used for keyword-based retrieval.\n\n1. **Sele"
      }
    },
    {
      "id": "b16cbdd6-789c-4b21-8755-502e089ca547",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        16,
        -128
      ],
      "parameters": {
        "width": 1088,
        "height": 640,
        "content": "## Create [Qdrant Collection](https://qdrant.tech/documentation/concepts/collections/) for Hybrid Search\nThe collection used for **Hybrid Search** is configured here with two types of vectors:\n\n**1. ["
      }
    },
    {
      "id": "3f4cedea-edeb-4796-967b-d75b95fd4aad",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2544,
        288
      ],
      "parameters": {
        "width": 960,
        "height": 480,
        "content": "## (Option №1) Index Text Chunks to Qdrant Using [Cloud Inference](https://qdrant.tech/documentation/cloud/inference/)\n\n- **Embed & upsert text chunks in batches**  \n  - **Dense embeddings inference +"
      }
    },
    {
      "id": "67fc6b7c-9168-4214-94cd-3c2d68e477cc",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2528,
        1552
      ],
      "parameters": {
        "width": 688,
        "height": 448,
        "content": "## (Option №2) 1. Configure a Collection for OpenAI Embeddings & BM25 Retrieval\nSince OpenAI [`text-embedding-3-small`] embeddings have a different dimensionality (1536) than mxbai embeddings (1024), "
      }
    },
    {
      "id": "ed76cf94-3b3b-4c8f-af1f-2ea5f7096785",
      "name": "Sticky Note6",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2512,
        864
      ],
      "parameters": {
        "width": 1872,
        "height": 1152,
        "content": "## (Option №2) Index Text Chunks to Qdrant Using External Embedding Provider (OpenAI)\n*Don't forget to create and configure a separate collection for OpenAI’s [`text-embedding-3-small`](https://platfo"
      }
    },
    {
      "id": "5eb0cbf7-a151-4bf4-a180-914909a04901",
      "name": "Restructure for Deduplicating",
      "type": "n8n-nodes-base.set",
      "position": [
        816,
        944
      ]
    },
    {
      "id": "e3f582f9-aad1-47a4-83a8-1e0127b78ce9",
      "name": "Restructure for Batching",
      "type": "n8n-nodes-base.set",
      "position": [
        1200,
        944
      ]
    },
    {
      "id": "74568439-a6ab-4f4e-acc5-9a0784d6c1d2",
      "name": "Deduplicate Texts",
      "type": "n8n-nodes-base.summarize",
      "position": [
        1008,
        944
      ]
    },
    {
      "id": "b65a9c60-44e1-465c-99f4-1d33428e5c4a",
      "name": "Calculate #words in Each Text",
      "type": "n8n-nodes-base.set",
      "position": [
        1648,
        1264
      ]
    },
    {
      "id": "f778e469-8a74-47fe-a854-7da473156f87",
      "name": "Edit Fields",
      "type": "n8n-nodes-base.set",
      "position": [
        2912,
        1104
      ]
    },
    {
      "id": "5a66c3c1-2c6b-4280-b7cb-514f2ae5c720",
      "name": "Aggregate a Batch to Embed",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        3088,
        1216
      ]
    },
    {
      "id": "1e4971c7-c41f-4e7b-b9a1-c777193578c7",
      "name": "Aggregate a Batch to Upsert",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        3952,
        1312
      ]
    }
  ],
  "connections": {
    "If": {
      "main": [
        [],
        [
          {
            "node": "Create Collection",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "If1": {
      "main": [
        [],
        [
          {
            "node": "Create Collection1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Limit": {
      "main": [
        [
          {
            "node": "Calculate #words in Each Text",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Merge": {
      "main": [
        [
          {
            "node": "Loop Over Batches",
            "type": "main",
            "index": 0
          },
          {
            "node": "Loop Over Batches1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Merge1": {
      "main": [
        [
          {
            "node": "Aggregate a Batch to Upsert",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Out": {
      "main": [
        [
          {
            "node": "Merge1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Sum them Up": {
      "main": [
        [
          {
            "node": "Get the Average Text Length",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Upsert Points": {
      "main": [
        [
          {
            "node": "Loop Over Batches",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Divide Per Row": {
      "main": [
        [
          {
            "node": "Restructure for Deduplicating",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Upsert Points1": {
      "main": [
        [
          {
            "node": "Loop Over Batches1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate a Batch": {
      "main": [
        [
          {
            "node": "Upsert Points",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Create Collection": {
      "main": [
        []
      ]
    },
    "Deduplicate Texts": {
      "main": [
        [
          {
            "node": "Restructure for Batching",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over Batches": {
      "main": [
        [],
        [
          {
            "node": "Aggregate a Batch",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get Dataset Splits": {
      "main": [
        [
          {
            "node": "Split Them All Out",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over Batches1": {
      "main": [
        [
          {
            "node": "Edit Fields",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Merge1",
            "type": "main",
            "index": 1
          },
          {
            "node": "Aggregate a Batch to Embed",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Them All Out": {
      "main": [
        [
          {
            "node": "Get Dataset Rows (Pagination)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get OpenAI embeddings": {
      "main": [
        [
          {
            "node": "Split Out",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check Collection Exists": {
      "main": [
        [
          {
            "node": "If",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check Collection Exists1": {
      "main": [
        [
          {
            "node": "If1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Restructure for Batching": {
      "main": [
        [
          {
            "node": "Limit",
            "type": "main",
            "index": 0
          },
          {
            "node": "Merge",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate a Batch to Embed": {
      "main": [
        [
          {
            "node": "Get OpenAI embeddings",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate a Batch to Upsert": {
      "main": [
        [
          {
            "node": "Upsert Points1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get the Average Text Length": {
      "main": [
        [
          {
            "node": "Merge",
            "type": "main",
            "index": 1
          }
        ]
      ]
    },
    "Calculate #words in Each Text": {
      "main": [
        [
          {
            "node": "Sum them Up",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get Dataset Rows (Pagination)": {
      "main": [
        [
          {
            "node": "Divide Per Row",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Restructure for Deduplicating": {
      "main": [
        [
          {
            "node": "Deduplicate Texts",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Index Dataset from HuggingFace": {
      "main": [
        [
          {
            "node": "Get Dataset Splits",
            "type": "main",
            "index": 0
          },
          {
            "node": "Check Collection Exists",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}