pyterrier-deepct

Advanced PyTerrier bindings for DeepCT.

Installation

pip install --upgrade git+https://github.com/terrierteam/pyterrier_deepct.git

Usage

import pyterrier as pt
from pyterrier_deepct import DeepCT, Toks2Text
deepct = DeepCT() # by default, loads macavaney/deepct, a version of the model weights converted to Hugging Face format
indexer = deepct >> Toks2Text() >> pt.IterDictIndexer("./deepct_index_path")
# dataset is any PyTerrier dataset, e.g. pt.get_dataset("vaswani")
indexer.index(dataset.get_corpus_iter())

Options:

  • device: device to run the model on, default cuda if available (or cpu if not)
  • batch_size: batch size when encoding documents, default 64
  • scale: score multiplier that moves the model outputs to a reasonable integer range, default 100
  • round: round the scores to the nearest integer, default True
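
To make the scale and round options concrete, here is a minimal pure-Python sketch of the idea: per-term weights are multiplied by scale, optionally rounded to integers, and then expanded into repeated tokens so that a standard indexer records them as term frequencies. The helpers scale_weights and toks_to_text are hypothetical names used only for illustration, the input weights are made up, and the sketch assumes that Toks2Text expands term counts into repeated tokens; it is not the library's implementation.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT part
# of pyterrier_deepct. They mimic what the scale and round options describe.

def scale_weights(weights, scale=100, round_scores=True):
    """Multiply each term weight by `scale`, optionally rounding to an int."""
    scaled = {}
    for term, weight in weights.items():
        value = weight * scale
        scaled[term] = int(round(value)) if round_scores else value
    return scaled


def toks_to_text(term_counts):
    """Repeat each term `count` times so an indexer sees it as a term frequency."""
    words = []
    for term, count in term_counts.items():
        words.extend([term] * int(count))
    return " ".join(words)


# Made-up weights for demonstration; real values would come from the model.
raw = {"terrier": 0.4237, "dog": 0.051, "the": 0.001}

# scale=10 keeps the output short here; the documented default is 100.
counts = scale_weights(raw, scale=10)
text = toks_to_text(counts)

print(counts)
print(text)
```

Under these assumptions, a term whose scaled weight rounds to zero contributes no tokens, so it would not be indexed for that document.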

Usage (legacy API)

The old API uses the deepct repository, which requires TensorFlow 1 (not available everywhere, e.g., on Colab).

Given an existing DeepCT checkpoint and the original Google BERT files, a DeepCT transformer can be created as follows:

import pyterrier as pt
from pyterrier_deepct import DeepCTTransformer
deepct = DeepCTTransformer("bert-base-uncased/bert_config.json", "marco/model.ckpt-65816")
indexer = deepct >> pt.IterDictIndexer("./deepct_index_path")
indexer.index(dataset.get_corpus_iter())

Demos

  • vaswani.ipy - [Github] [Colab] - demonstrates end-to-end indexing and retrieval on the Vaswani corpus (~11k documents)

Credits

  • Craig Macdonald, University of Glasgow
  • Sean MacAvaney, University of Glasgow
