Extract Invoice Data from PDFs to JSON with Gemini AI and XML Transformation - n8n Workflow

Medium complexity · Trigger · 6 nodes · Invoice Processing · by Mauricio Perera

Overview

This n8n workflow converts invoices in PDF format into structured, ready-to-use JSON, using AI and XML transformation, without writing any code.

🚀 How it works

Upload form → The user uploads a PDF file.
Text extraction → The PDF content is extracted as plain text.
XML schema definition → A standard invoice structure is defined with fields such as:

- Invoice number
- Customer and issuer details
- Items with description, quantity, and price
- Totals and taxes
- Bank account details

AI (

Nodes used

Google Gemini

Workflow Preview

Sticky notes: PDF to text · Clean data and XML structure definition · Generate XML string · String to XML to Json
Nodes: On form submission · Extract from File · Message a model · Limpio data · Limpio XML · XML to JSON
6 nodes, 5 edges
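
The preview mentions an XML structure definition. As a rough illustration only, the fields listed in the overview could be laid out in a template like the one built below. Every element name here (`invoice`, `invoice_number`, `issuer`, `bank_account`, and so on) is hypothetical, since the workflow's actual schema is not shown on this page.

```python
import xml.etree.ElementTree as ET


def build_invoice_template() -> str:
    """Build an empty invoice skeleton as an XML string.

    All element names are assumptions made for illustration; they are
    not taken from the workflow itself.
    """
    root = ET.Element("invoice")
    ET.SubElement(root, "invoice_number")

    # Issuer and customer share the same (hypothetical) sub-fields.
    for party in ("issuer", "customer"):
        node = ET.SubElement(root, party)
        ET.SubElement(node, "name")
        ET.SubElement(node, "address")

    # One example line item with description, quantity, and price.
    items = ET.SubElement(root, "items")
    item = ET.SubElement(items, "item")
    for field in ("description", "quantity", "price"):
        ET.SubElement(item, field)

    totals = ET.SubElement(root, "totals")
    for field in ("subtotal", "tax", "total"):
        ET.SubElement(totals, field)

    ET.SubElement(root, "bank_account")
    return ET.tostring(root, encoding="unicode")


template = build_invoice_template()
print(template)
```

A template of this kind would typically be placed in the model prompt so that the reply follows a fixed structure, which is what makes the later XML-to-JSON conversion predictable.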

How it Works

  1. Trigger

    The workflow starts with a form trigger (On form submission), where the user uploads a PDF invoice.

  2. Process

    Data flows through 6 nodes, which use the extractfromfile, formtrigger, googlegemini, set, and xml integrations.

  3. Output

    The workflow ends by converting the generated XML into structured JSON (the XML to JSON node).
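
For readers who want to see what the last two stages amount to, here is a minimal sketch outside n8n, using only the Python standard library. It assumes the model may wrap its XML reply in a Markdown code fence (an assumption, not something this page states), strips that fence, and converts the XML to JSON. The function names and the sample invoice are hypothetical.

```python
import json
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this example readable


def strip_code_fences(text: str) -> str:
    """Remove a leading fence (with optional language tag) and a trailing fence."""
    text = text.strip()
    text = re.sub(r"^`{3}[a-zA-Z]*\s*", "", text)
    text = re.sub(r"\s*`{3}$", "", text)
    return text.strip()


def element_to_obj(elem):
    """Convert an element to a string (leaf) or a dict (has children).

    Repeated child tags, such as several <item> elements, become a list.
    Attributes are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(model_output: str) -> str:
    """Clean the model output, parse it as XML, and serialize it as JSON."""
    root = ET.fromstring(strip_code_fences(model_output))
    return json.dumps({root.tag: element_to_obj(root)}, indent=2)


# Made-up sample reply, for illustration only.
sample = (
    FENCE + "xml\n"
    "<invoice><invoice_number>INV-001</invoice_number>"
    "<items>"
    "<item><description>Widget</description><quantity>2</quantity><price>9.50</price></item>"
    "<item><description>Gadget</description><quantity>1</quantity><price>20.00</price></item>"
    "</items></invoice>\n" + FENCE
)
print(xml_to_json(sample))
```

All values come out as strings in this sketch; converting quantities and prices to numbers would be a separate step.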

Node Details (6)

Google Gemini

n8n-nodes-langchain.googleGemini

#1

How to Import This Workflow

  1. Click the Download JSON button on the right to save the workflow file.
  2. Open your n8n instance. Go to Workflows → New → Import from file.
  3. Select the downloaded extract-invoice-data-from-pdfs-to-json-with-gemini-ai-and-xml-transformation file and click Import.
  4. Set up credentials for each service node (API keys, OAuth, etc.).
  5. Click Test Workflow to verify everything works, then activate it.

Or paste directly in n8n → Import from JSON:

{ "name": "Extract Invoice Data from PDFs to JSON with Gemini AI and XML Transformation", "nodes": [...], ...}
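
Before importing, it can help to check what a downloaded workflow file contains. The sketch below reads an export and lists its name, node count, and node types. It relies on the top-level `name` and `nodes` keys shown in the snippet above; the helper name `summarize_workflow` and the two-node stand-in export are hypothetical and do not reproduce the real file.

```python
import json


def summarize_workflow(raw: str) -> dict:
    """Return the workflow name, node count, and sorted node types from an export."""
    workflow = json.loads(raw)
    nodes = workflow.get("nodes", [])
    return {
        "name": workflow.get("name", ""),
        "node_count": len(nodes),
        "node_types": sorted({node.get("type", "unknown") for node in nodes}),
    }


# Stand-in export with only two nodes, for illustration; the real file has six.
sample_export = json.dumps(
    {
        "name": "Extract Invoice Data from PDFs to JSON with Gemini AI and XML Transformation",
        "nodes": [
            {"name": "On form submission", "type": "n8n-nodes-base.formTrigger"},
            {"name": "Extract from File", "type": "n8n-nodes-base.extractFromFile"},
        ],
        "connections": {},
    }
)

summary = summarize_workflow(sample_export)
print(summary)
```

With a real download, the same function could be called on the file's contents, for example `summarize_workflow(open(path, encoding="utf-8").read())`, where `path` points at the saved file.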

Integrations

extractfromfile, formtrigger, googlegemini, set, xml

Get This Workflow

Nodes: 6
Complexity: medium
Trigger: trigger

Created by

Mauricio Perera

@rckflr

Tags

extractfromfile, formtrigger, googlegemini, set, xml


Related Invoice Processing Workflows

Complexity: medium

Automate Custom QuickBooks Invoice PDFs & Email with n8n

Standard accounting templates often fail to reflect a premium brand identity. This sophisticated n8n workflow bridges the gap between financial record-keeping and professional client presentation. By moving beyond the native limitations of QuickBooks Online, this automation enables businesses to generate high-end, multi-page PDF invoices that align perfectly with their corporate styling.

The process begins the moment a new invoice is generated in QuickBooks, triggering a webhook that captures real-time billing data. The workflow then utilizes advanced HTML-to-File conversion and custom Code nodes to structure data into a polished, branded layout. It handles complex logic such as line-item merging and multi-page formatting automatically. Once the document is rendered, the system bypasses generic 'no-reply' senders by routing the finalized PDF through your preferred email provider. This ensures a seamless, white-labeled experience for your clients while eliminating the manual overhead of exporting, styling, and attaching files. Ideal for agencies and service providers, this flow guarantees that your most frequent touchpoint, the bill, is as professional as your work.

**Common Use Cases:**

- High-end creative agencies requiring bespoke, white-labeled billing documents for premium clients.
- Automated recurring subscription billing where custom tax disclosures or localized branding are required.
- Service-based businesses needing to attach dynamic project reports or terms of service directly to QuickBooks invoices.

Webhook · 12 nodes