text-chunking

Here are 32 public repositories matching this topic...

isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

python nlp text splitting chunking text-chunking text-splitting semantic-chunking isaacus

Updated Feb 14, 2026
Python

lazyFrogLOL / llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

nlp ocr chunking document-analysis pdf-parser pdfparser rag llm text-chunking

Updated Aug 6, 2024
Python

jparkerweb / semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows

vector embeddings chunking text-splitter llm text-chunking text-splitting semantic-chunking

Updated Feb 3, 2026
JavaScript

drittich / SemanticSlicer

🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.

windows macos linux ai embeddings openai gpt chunking chunker azure-openai llm chatgpt chat-gpt langchain text-chunking

Updated Feb 26, 2026
C#

GregorBiswanger / SemanticChunker.NET

Embedding-driven, context-aware text chunking for Semantic Kernel and RAG workflows in .NET

library ai csharp dotnet chunking slm embedding rag llm semantic-kernel semantickernel text-chunking semanticchunker

Updated Feb 9, 2026
C#

rasinmuhammed / rag-tui

⚡ Debug your RAG pipeline without leaving the terminal. Real-time chunking visualization, batch testing, quality metrics, and one-click export to LangChain/LlamaIndex.

python nlp textual tui embeddings developer-tools chunking terminal-ui rag vector-search llm langchain llamaindex retrieval-augmented-generation ollama text-chunking

Updated Feb 8, 2026
Python

ChenTaHung / HTML-Text-Parser

Extracts text from documents and prepares it for processing by large language models (LLMs). It stores and uses text style information to identify and segment content at likely headers and titles.

python data-processing text-parsing large-language-models llms text-chunking

Updated Nov 17, 2024
HTML

smart-models / Sentences-Chunker

Cutting-edge tool designed to intelligently segment text documents into optimally-sized chunks

nlp docker-compose gpu-acceleration document-processing rag fastapi text-chunking

Updated Sep 30, 2025
Python

betcorg / llm-text-splitter

A lightweight TypeScript text splitter for RAG applications

chatbots rag text-splitter text-chunking

Updated Feb 28, 2026
TypeScript

Aegean-E / NeuralDeck

Turn any document into a powerful Anki deck with NeuralDeck. This offline desktop app uses local AI to create high-quality flashcards from your PDFs, Word documents, and more. With smart deck matching, AI editing, and direct Anki sync, NeuralDeck is built for serious students who demand control, privacy, and efficiency.

desktop-app python flashcards anki anki-addon document-processing privacy-first pdf-processing study-tool local-llm text-chunking llm-integration openai-compatible docx-processing ai-study-assistant

Updated Feb 24, 2026
Python

philnash / chunkers

An exploration of text splitting and chunking in JavaScript

text-splitter llamaindex langchain-js text-chunking text-splitting

Updated Nov 20, 2025
TypeScript

vivet / Vivet.AI

A service-oriented .NET library for AI with interchangeable orchestrations and vector stores.

chat ai knowledge memory azure inference openai summarization embedding huggingface llm metadata-retrieval ollama amazon-bedrock text-chunking google-gemini context-deduplication

Updated Nov 20, 2025
C#

njyeung / go-semantic-chunking

Semantic chunking algorithm in (mostly) Go

vector embeddings chunking semantic-segmentation text-splitter text-chunking semantic-chunking retreival-augmented-generation

Updated Feb 6, 2026
Go

Besthope-Official / predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

api microservice yolo pdf-parser text-embedding document-parser rag text-chunking

Updated Oct 22, 2025
Python

cspnms / MSchunker

Smart text chunker for LLM preprocessing (sections → paragraphs → sentences → hard splits).

Updated Dec 7, 2025
Python

Khushmeet-patil / Sementic-Search-task

A Streamlit-based semantic search engine that converts documents into embeddings and retrieves the most meaningful text chunks using cosine similarity and dynamic chunking.

python nlp machine-learning information-retrieval ai cosine-similarity semantic-search streamlit sentence-transformers text-chunking

Updated Dec 7, 2025
Python

ushakiranmai / text_summarization

This text summarization tool uses machine learning models to create concise, meaningful summaries of lengthy texts. Built with Hugging Face Transformers and Gradio, it handles inputs of various lengths, making it suited to summarizing articles, reports, and more.

web-development file-handling text-summarization gradio-interface text-chunking model-handling output-formats python-libraries-and-tools

Updated Jan 23, 2025
Python

ItzikAquaMotek / rag-chunk

📝 Parse, chunk, and evaluate Markdown for RAG pipelines with token-accurate support and flexible strategies for optimal context management.

tree-sitter library ai csharp dotnet chroma ia code-structure embedding-vectors streamlit hybrid-search aisearch semantickernel text-chunking rag-pipeline llama3 document-chunking propositional-models

Updated Mar 6, 2026
Python

mariamarmolejo / agentes-girly

Complete course on AI agents with LangChain, LangRAP, and n8n. Includes practical examples, simple agents, agents that summarize PDFs, girly agents, environment management, environment variables, and best practices with GitHub.

environment-variables intelligent-agents practical-exercises faiss virtual-environment n8n langchain text-chunking custom-prompts faiss-vector-database huggingface-embeddings chatgroq n8n-automation pypdfloader langrap rag-flow simple-agents summarizer-agents line-by-line-comments

Updated Nov 17, 2025
Python

samay-jain / Retrieval-Augmented-Generation-RAG-simple-program

A lightweight, modular Retrieval-Augmented Generation (RAG) system built with Streamlit, FAISS, and LLMs like OpenAI and Ollama. Upload documents, embed them, and ask intelligent questions with real-time context-aware responses.

embeddings openai nomic chroma faiss python-nlp rag vector-search streamlit gpt4 langchain ollama text-chunking llama3 llm-app simple-rag document-question-answering pdf-nlp qa-application

Updated Jun 26, 2025
Python
