Codex, Claude Code, Claude, Cursor, Gemini

Data Extractor

Extract structured data from documents in any format: PDF, DOCX, HTML, TXT, images, and more. Converts unstructured or semi-structured content into clean JSON, CSV, or other structured formats. Handles invoices, forms, reports, and free-text documents.

Overview

The Data Extractor is a skill that lets AI agents process unstructured and semi-structured information from a range of file formats. It is available in the TerminalSkills/skills repository and works with agents such as Claude, Gemini, and Codex, parsing content from PDFs, DOCX files, HTML, and images. It turns raw text, invoices, and reports into organized formats such as JSON or CSV for further analysis, so free-text documents can be converted into machine-readable data structures automatically. The TerminalSkills collection that hosts the skill currently has 72 stars on GitHub. It is aimed at developers building agentic workflows and automated document processing pipelines.

Use Cases

Converting scanned invoice images into structured JSON for accounting software integration.

Parsing complex PDF reports to extract specific data points for Pandas-based analysis.

Transforming unstructured HTML or text files into CSV format for database ingestion.
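The use cases above share one shape: text comes out of a document, named fields are pulled from it, and the result is serialized as JSON or CSV. The following is a minimal sketch of that shape using only the Python standard library. It is not the skill's own implementation; the sample text, the field names, the regular expressions, and the helper names (extract_fields, to_json, to_csv) are all hypothetical and chosen for illustration.

```python
import csv
import io
import json
import re

# Hypothetical sample input: plain text as it might look after a PDF or OCR step.
INVOICE_TEXT = """\
Invoice Number: INV-1042
Date: 2024-03-15
Vendor: Acme Supplies
Total: 1,250.00
"""

# Hypothetical field patterns; real documents need patterns tuned to their layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "vendor": re.compile(r"Vendor:\s*(.+)"),
    "total": re.compile(r"Total:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Return one record; fields that are not found are set to None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else None
    # Normalize the amount to a number so downstream tools can do arithmetic.
    if record["total"] is not None:
        record["total"] = float(record["total"].replace(",", ""))
    return record


def to_json(record):
    """Serialize a single record as pretty-printed JSON."""
    return json.dumps(record, indent=2)


def to_csv(records):
    """Serialize a list of records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


record = extract_fields(INVOICE_TEXT)
print(to_json(record))
print(to_csv([record]))
```

Missing fields are kept as None rather than dropped, so every record has the same keys and the CSV columns stay aligned across documents.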

Install Notes

# Review source first
open https://github.com/TerminalSkills/skills/blob/main/skills/data-extractor/SKILL.md

Copy or clone the skill folder into your agent skills directory after reviewing its instructions and scripts.

Security Notes

This skill processes document content to generate structured outputs. Users should ensure that sensitive information within PDFs, images, or text files is handled according to their specific privacy requirements. As part of the TerminalSkills repository, the tool operates within the execution environment of the compatible AI agent, and data handling is subject to the permissions granted to that agent.

Related Skills

Electron

vercel-labs/agent-browser

Automate Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify, etc.) using agent-browser via Chrome DevTools Protocol. Use when the user needs to interact with an Electron app, automate a desktop app, connect to a running app, control a native app, or test an Electron application. Triggers include "au

37,057 stars · Source linked

CodeQL

trailofbits/skills

Scans a codebase for security vulnerabilities using CodeQL's interprocedural data flow and taint tracking analysis. Triggers on "run codeql", "codeql scan", "codeql analysis", "build codeql database", or "find vulnerabilities with codeql". Supports "run all" (security-and-quality + security-experimental suites) and "im

Claude Code, Claude

typescript, python

5,853 stars · Source linked

Deep Agents Orchestration

langchain-ai/langchain-skills

INVOKE THIS SKILL when using subagents, task planning, or human approval in Deep Agents. Covers SubAgentMiddleware, TodoList for planning, and HITL interrupts.

typescript, python

817 stars · Source linked

LangChain Fundamentals

langchain-ai/langchain-skills

Create LangChain agents with create_agent, define tools, and use middleware for human-in-the-loop and error handling.

typescript, python

817 stars · Source linked

LangGraph Fundamentals

langchain-ai/langchain-skills

INVOKE THIS SKILL when writing ANY LangGraph code. Covers StateGraph, state schemas, nodes, edges, Command, Send, invoke, streaming, and error handling.

typescript, python

817 stars · Source linked

Ecosystem Primer

langchain-ai/langchain-skills

INVOKE FIRST for any LangChain / LangGraph / Deep Agents agent building project before consulting other skills or writing any agent code. Required starting point for up to date info on framework selection (LangChain vs LangGraph vs Deep Agents vs hybrid composition), agent patterns, install, environment setup, and whic

typescript, python

817 stars · Source linked