OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
After successfully extracting text from a .docx file that includes text in table cells, HybridChunker drops useful text from the resulting chunks, causing data loss. Encountered on AWS with v2.49.0 ...
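One way to confirm this kind of loss is to check whether every table-cell string from the source document appears in at least one chunk. The following is a minimal, library-agnostic sketch. It assumes you have already collected the cell texts and the chunk texts as plain strings, and it does not call HybridChunker itself. The function names `normalize` and `missing_cells` and the sample data are hypothetical.

```python
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse all whitespace runs to single spaces so that line breaks
    inside a chunk do not cause false misses."""
    return " ".join(text.split())


def missing_cells(cell_texts: Iterable[str], chunk_texts: Iterable[str]) -> List[str]:
    """Return the table-cell strings that do not appear in any chunk.

    An empty result means every non-empty cell was found somewhere in the
    chunk output; a non-empty result points to dropped table text.
    Matching is a plain substring test, so very short cells (e.g. "1")
    may match unrelated text and hide a real loss.
    """
    chunks = [normalize(c) for c in chunk_texts]
    missing = []
    for cell in cell_texts:
        needle = normalize(cell)
        # Empty cells carry no data, so they are skipped rather than reported.
        if needle and not any(needle in chunk for chunk in chunks):
            missing.append(cell)
    return missing


if __name__ == "__main__":
    # Hypothetical sample: the "1,250" cell never made it into a chunk.
    cells = ["Revenue", "Q1 2024", "1,250"]
    chunks = ["Revenue | Q1 2024", "Summary paragraph"]
    print(missing_cells(cells, chunks))  # prints ['1,250']
```

Because the check only compares strings, it can be run against the output of any version or deployment (locally or on AWS) to see whether table text survives chunking. It cannot, however, tell you *why* a cell was dropped.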