A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
Updated Feb 14, 2026 - Python
A package for parsing PDFs and analyzing their content using LLMs.
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.
Embedding-driven, context-aware text chunking for Semantic Kernel and RAG workflows in .NET
⚡ Debug your RAG pipeline without leaving the terminal. Real-time chunking visualization, batch testing, quality metrics, and one-click export to LangChain/LlamaIndex.
This project is designed to extract text from documents and prepare it for processing by Large Language Models (LLMs). It stores and uses text style information, which lets the program identify potential headers and titles and segment content accordingly.
Cutting-edge tool designed to intelligently segment text documents into optimally sized chunks
A lightweight TypeScript text splitter for RAG applications
Turn any document into a powerful Anki deck with NeuralDeck. This offline desktop app uses local AI to create high-quality flashcards from your PDFs, Word documents, and more. With smart deck matching, AI editing, and direct Anki sync, NeuralDeck is built for serious students who demand control, privacy, and efficiency.
An exploration of text splitting and chunking in JavaScript
A service-oriented .NET library for AI with interchangeable orchestrations and vector stores.
Semantic chunking algorithm in (mostly) Go
Document preprocessing service for RAG (Retrieval-Augmented Generation)
Smart text chunker for LLM preprocessing (sections → paragraphs → sentences → hard splits).
A Streamlit-based semantic search engine that converts documents into embeddings and retrieves the most meaningful text chunks using cosine similarity and dynamic chunking.
This Text Summarization Tool uses advanced machine learning models to create concise, meaningful summaries of lengthy texts. Built with Hugging Face Transformers and Gradio, it efficiently handles inputs of various lengths, making it ideal for summarizing articles, reports, and more.
📝 Parse, chunk, and evaluate Markdown for RAG pipelines with token-accurate support and flexible strategies for optimal context management.
Complete course on AI agents with LangChain, LangRAP, and n8n. It includes practical examples, simple agents, agents that summarize PDFs, girly agents, environment management, environment variables, and best practices with GitHub.
A lightweight, modular Retrieval-Augmented Generation (RAG) system built with Streamlit, FAISS, and LLMs like OpenAI and Ollama. Upload documents, embed them, and ask intelligent questions with real-time context-aware responses.
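The "sections → paragraphs → sentences → hard splits" strategy mentioned in one of the descriptions above can be sketched as a recursive splitter. This is a minimal illustration and is not the code of any project listed here. The boundary rules (Markdown headings for sections, blank lines for paragraphs, terminal punctuation for sentences), the character-based `max_chars` limit, and the names `LEVELS` and `chunk_text` are all assumptions made for the example.

```python
import re

# Assumed boundary rules, coarsest first. Each entry is
# (split pattern, joiner used when packing small neighbouring pieces together).
LEVELS = [
    (re.compile(r"\n(?=#{1,6} )"), "\n\n"),  # sections: newline before a Markdown heading
    (re.compile(r"\n\s*\n"), "\n\n"),        # paragraphs: blank line
    (re.compile(r"(?<=[.!?])\s+"), " "),     # sentences: whitespace after . ! or ?
]


def chunk_text(text, max_chars, level=0):
    """Split text into chunks of at most max_chars characters (hypothetical helper)."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if level >= len(LEVELS):
        # Hard split: fixed-width slices once no finer boundary is left.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    pattern, joiner = LEVELS[level]
    pieces = [p.strip() for p in pattern.split(text) if p.strip()]
    chunks = []
    current = ""
    for piece in pieces:
        if len(piece) > max_chars:
            # Too big for this level: flush the buffer and descend one level.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(chunk_text(piece, max_chars, level + 1))
        elif not current:
            current = piece
        elif len(current) + len(joiner) + len(piece) <= max_chars:
            # Greedily pack neighbouring pieces while they still fit.
            current += joiner + piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


doc = "# Intro\nShort intro.\n\n# Body\nFirst sentence here. Second sentence here. Third one."
for chunk in chunk_text(doc, 40):
    print(repr(chunk))
```

Coarse boundaries are tried first, and a finer level is used only for pieces that are still too long, so headings and paragraphs stay intact whenever they fit. A token-accurate variant would swap `len()` for a tokenizer's token count.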