@jedyang97 jedyang97 commented Jun 1, 2023

Try the newest demo here!

Specifically, this PR makes the following improvements:

  • Use LERF embeddings + DBSCAN clustering to determine camera poses
  • Take a picture for each object instance
  • Use LLaVA-13B to caption each picture
  • GPT-4 reads all captions and either reasons internally or asks the user for clarification to ground the object
  • Display grounding results to the user: object instances are highlighted in a 3D mesh using bounding spheres
  • Significantly speed up the pipeline by parallelizing rendering and LLaVA inference
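
For reviewers who want a feel for the flow, here is a rough, self-contained sketch of three of the steps above: clustering into object instances, computing a bounding sphere per instance, and captioning instances in parallel. It is not the code in this PR. It assumes plain 3D points stand in for the LERF-relevant points, uses a tiny pure-Python DBSCAN instead of a library implementation, uses a simple centroid-based sphere (not the minimal enclosing sphere), and replaces the LLaVA call with a placeholder. All function names (`dbscan`, `bounding_sphere`, `caption_all`, `fake_caption`) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch: one label per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def bounding_sphere(points):
    """Centroid-based bounding sphere (simple, not the minimal sphere)."""
    n = len(points)
    center = tuple(sum(p[k] for p in points) / n for k in range(3))
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def fake_caption(instance_id):
    # Placeholder for a captioning model call (assumption, not LLaVA's API).
    return f"caption for instance {instance_id}"


def caption_all(instance_ids, caption_fn=fake_caption, max_workers=4):
    """Caption instances concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(caption_fn, instance_ids))


if __name__ == "__main__":
    pts = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0),
           (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5), (20, 20, 20)]
    labels = dbscan(pts, eps=0.5, min_samples=2)
    instance_ids = sorted(set(labels) - {-1})
    for cid in instance_ids:
        members = [p for p, lab in zip(pts, labels) if lab == cid]
        print(cid, bounding_sphere(members))
    print(caption_all(instance_ids))
```

A thread pool suits the captioning step only if each call mostly waits on I/O or on a model server; for CPU-bound rendering, a process pool or batched GPU inference would be the more natural choice.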

"How many doors are there in this room?"
image

"find all the chairs"
image

@jedyang97 jedyang97 requested a review from XuweiyiChen June 1, 2023 07:45
@jedyang97 jedyang97 self-assigned this Jun 1, 2023
@jedyang97 jedyang97 requested a review from JasonQSY June 1, 2023 07:49
@jedyang97 jedyang97 merged commit f419c7b into main Jun 1, 2023