📑 Exparso

Exparso is a library for parsing documents that contain images. It outputs their content as text so that the documents can be used with conventional vector search and full-text search.

📥 Installation

LibreOffice

Install LibreOffice, which is used to convert Office files to text.

# Ubuntu
sudo apt install libreoffice

# Mac
brew install --cask libreoffice
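After installing, it can be useful to confirm that LibreOffice is reachable from Python. The following is a minimal sketch, not part of Exparso: the helper name `find_libreoffice` is hypothetical, and it assumes the executable is exposed on `PATH` as `soffice` or `libreoffice`, which may differ depending on how LibreOffice was installed.

```python
import shutil
from typing import Optional, Sequence


def find_libreoffice(
    candidates: Sequence[str] = ("soffice", "libreoffice"),
) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None.

    The candidate names are an assumption about how LibreOffice is exposed
    on this machine; adjust them if your installation differs.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_libreoffice()
    if location is None:
        print("LibreOffice was not found on PATH")
    else:
        print(f"LibreOffice found at {location}")
```

If the helper returns `None` even though LibreOffice is installed, the executable is probably not on `PATH`.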

Installing the library

pip install exparso

💡 Usage

Use the parse_document function to parse a document.

from exparso import parse_document
from langchain_openai import AzureChatOpenAI

llm_model = AzureChatOpenAI(model="gpt-4o")
text = parse_document(path="path/to/document.pdf", model=llm_model)
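When several documents need to be parsed, one failing file should not abort the whole run. The sketch below shows one possible wrapper. `parse_many` and `parse_fn` are hypothetical names, not part of Exparso; the parser is passed in as a callable so the helper makes no assumption about what `parse_document` returns.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def parse_many(
    paths: Iterable[str],
    parse_fn: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Apply parse_fn to each path, collecting results and failures separately."""
    results: Dict[str, Any] = {}
    failures: Dict[str, Exception] = {}
    for path in paths:
        try:
            results[path] = parse_fn(path)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures[path] = exc
    return results, failures


# With Exparso, parse_fn could wrap the call shown above, for example:
#   parse_fn = lambda p: parse_document(path=p, model=llm_model)
```

The second returned dictionary maps each failed path to the exception it raised, which makes it easy to log or retry those files later.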

📑 Supported files

| Content type | Extensions |
| --- | --- |
| 📑 Documents | PDF, PowerPoint |
| 🖼️ Images | JPEG, PNG, BMP |
| 📝 Text data | Text files, Markdown |
| 📊 Tabular data | Excel, CSV |
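Before handing files to the parser, it may help to filter out anything outside the supported formats. The sketch below is illustrative only: `filter_supported` is a hypothetical helper, and the concrete suffixes in `SUPPORTED_SUFFIXES` are an assumption derived from the format names in the table, not a list confirmed by Exparso.

```python
from pathlib import Path
from typing import AbstractSet, Iterable, List

# Assumed mapping from the format names in the table to file suffixes.
SUPPORTED_SUFFIXES = frozenset(
    {".pdf", ".pptx", ".jpg", ".jpeg", ".png", ".bmp", ".txt", ".md", ".xlsx", ".csv"}
)


def filter_supported(
    paths: Iterable[str],
    suffixes: AbstractSet[str] = SUPPORTED_SUFFIXES,
) -> List[str]:
    """Keep only paths whose suffix (case-insensitive) is in the given set."""
    return [p for p in paths if Path(p).suffix.lower() in suffixes]
```

For example, `filter_supported(["a.PDF", "b.docx", "c.md"])` keeps `a.PDF` and `c.md` and drops `b.docx`.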

🔥 LLM

| Cloud vendor | Models |
| --- | --- |
| Azure | ChatGPT (`gpt-4o`, `gpt-4o-mini`) |
| Google Cloud | Claude (`claude-3.7-sonnet`, `claude-3.5-sonnet`), Gemini (`gemini-2.0-flash`, `gemini-1.5-flash-`, `gemini-2.0-pro-`) |
