Document Layout Analysis resources and repos for development with PdfPig.
Updated Oct 1, 2023 - C#
Cross-platform PDF reader application
Extract tables from PDF files (port of tabula-java)
Cross-platform C# library to render PDF as images
A C# library to extract tabular data from PDFs (a port of the Python library Camelot, using PdfPig).
Proof of concept of training a simple region classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block on a PDF document page as a title, text, list, table, or image.
ChatGPT-like application using the RAG pattern that lets me ask questions about my own documents. I used Semantic Kernel to integrate an LLM (OpenAI) from C# and orchestrate AI plugins (Azure Cognitive Services). For the document embeddings, I used Qdrant as the vector database and PdfPig to extract the content from the PDFs.
Proof of concept of a simple SVM region classifier using PdfPig and Accord.NET. The objective is to classify each text block on a PDF document page as a title, text, list, table, or image.
AI Resume Analyzer: Azure OpenAI-powered ATS Scoring & Skill Gap Detection
🔬 Proof of concept for extracting content from PDF files using multiple PDF libraries
PDF layout intelligence for .NET — structured extraction tuned for RAG and LLM pipelines.
This project implements a production-style RAG ingestion and retrieval pipeline in .NET 8 using Azure OpenAI, Azure AI Search, and Azure Blob Storage, with an accompanying retrieval benchmark comparing chunk sizes using Recall@K, Precision@K, MRR, and Hit@K.
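The benchmark above compares chunk sizes using Recall@K, Precision@K, MRR, and Hit@K. The repository's own implementation is not shown here. What follows is a minimal sketch, assuming each query comes with a ranked list of retrieved chunk IDs and a set of chunk IDs judged relevant. It is written in Python rather than C# for brevity. The names `Run` and `evaluate` and the chunk IDs are hypothetical, and MRR is truncated at K here, which is one common convention but not necessarily the one the project uses.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical evaluation case: the ranked chunk IDs the retriever returned
# for one query, plus the set of chunk IDs judged relevant for that query.
Run = Tuple[Sequence[str], Set[str]]


def _hits(ranked: Sequence[str], relevant: Set[str], k: int) -> int:
    # Count distinct relevant chunk IDs among the top-k results.
    return len(set(ranked[:k]) & relevant)


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # 1.0 if at least one relevant chunk appears in the top k, else 0.0.
    return 1.0 if _hits(ranked, relevant, k) > 0 else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # Fraction of all relevant chunks that were retrieved in the top k.
    return _hits(ranked, relevant, k) / len(relevant) if relevant else 0.0


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # Fraction of the k retrieved slots that hold a relevant chunk.
    return _hits(ranked, relevant, k) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # 1 / rank of the first relevant chunk within the top k (0.0 if none).
    for rank, chunk_id in enumerate(ranked[:k], start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Run], k: int) -> Dict[str, float]:
    # Average each per-query metric over all queries.
    # MRR is the mean of the per-query reciprocal ranks.
    metrics = {
        "Recall@K": recall_at_k,
        "Precision@K": precision_at_k,
        "MRR": reciprocal_rank,
        "Hit@K": hit_at_k,
    }
    return {
        name: sum(fn(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
        for name, fn in metrics.items()
    }


if __name__ == "__main__":
    runs = [
        (["c3", "c1", "c7"], {"c1", "c9"}),
        (["c2", "c5", "c4"], {"c2"}),
    ]
    # {'Recall@K': 0.75, 'Precision@K': 0.3333333333333333, 'MRR': 0.75, 'Hit@K': 1.0}
    print(evaluate(runs, k=3))
```

To compare chunk sizes, one would re-index the corpus at each chunk size, collect the ranked results for the same set of queries, and call `evaluate` once per configuration.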