I build retrieval systems that understand images, text, video, and sound - not just literal matches.
I'm a PhD researcher at Virginia Tech, working on vision-language models (VLMs), retrieval-augmented generation (RAG), and ranking/reranking. My focus is multi-prompt (multi-vector) embeddings: many small, controllable "views" of meaning that make search richer, more interpretable, and less prone to collapse (a minimal sketch follows the list below).
- Reasoning in VLMs.
- Cross-modal retrieval across images, text, video, and audio.
- Structured information extraction from multimodal data.
- Knowledge representation for multimodal reasoning.
- Room acoustics: room impulse responses (RIRs) as spatial signals for learning geometry-aware representations.
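
Here is a minimal, self-contained sketch of the multi-prompt idea. The prompts and the `embed` stand-in are hypothetical (a real system would use a trained text/VLM encoder); the point is the shape of the approach: one input becomes several prompt-conditioned vectors, and a ColBERT-style max-sim score lets each query view match its best document view.

```python
import numpy as np

# Hypothetical prompts; each steers the encoder toward a different "view" of meaning.
PROMPTS = [
    "Describe the literal content:",
    "Describe the figurative or idiomatic meaning:",
    "Describe the emotional tone:",
]

def embed(text: str) -> np.ndarray:
    """Stand-in for a real encoder: a random projection seeded by the text,
    deterministic within a process, just to keep the sketch runnable."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def multi_prompt_embed(text: str) -> np.ndarray:
    # One input -> a matrix of prompt-conditioned embeddings, one row per view.
    return np.stack([embed(f"{p} {text}") for p in PROMPTS])

def score(query: str, doc: str) -> float:
    # Late-interaction (max-sim) scoring: each query view matches its best
    # document view, so no single vector has to carry every meaning.
    q, d = multi_prompt_embed(query), multi_prompt_embed(doc)
    return float((q @ d.T).max(axis=1).sum())
```

Because the views stay separate until scoring, a document can rank highly for its figurative meaning even when its literal embedding is a poor match.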
Real-world queries are polysemous: idioms, metaphor, culture, and context often matter more than surface similarity. I design retrieval pipelines that surface the right connections, not only the nearest neighbor.
- Multi-Prompt Embedding for Retrieval
  - One input -> multiple focused embeddings to boost recall and reduce length/bias collapse.
- RAG + Reranker for Multimodal Search
  - Lightweight bi-encoder retrieval + VLM reader + cross-encoder reranker for better final ranking (see the retrieve-then-rerank sketch after this list).
- Diversity-Aware VLM Retrieval
  - Retrieves multiple perspectives (literal/figurative/emotional/abstract/background) instead of forcing a single vector.
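
As a rough sketch of the retrieve-then-rerank stage (omitting the VLM reader), assuming the `sentence-transformers` library; the model names and the `search` helper are illustrative, not the project's actual configuration:

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

# Illustrative model choices: a cheap bi-encoder for recall,
# a cross-encoder for precise re-scoring of the shortlist.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def search(query: str, corpus: list[str], k: int = 50, top: int = 5) -> list[str]:
    # Stage 1: bi-encoder recall over the whole corpus (fast, approximate).
    doc_emb = bi_encoder.encode(corpus, normalize_embeddings=True)
    q_emb = bi_encoder.encode(query, normalize_embeddings=True)
    idx = np.argsort(-(doc_emb @ q_emb))[:k]
    # Stage 2: cross-encoder rerank of the k candidates only (slow, accurate).
    pairs = [(query, corpus[i]) for i in idx]
    scores = reranker.predict(pairs)
    order = np.argsort(-scores)[:top]
    return [corpus[idx[i]] for i in order]
```

The split matters because the cross-encoder reads the query and each document together, which is far more accurate than comparing precomputed vectors but too slow to run over the full corpus.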
If you are working on diversity-aware retrieval, interpretable VLMs, or multimodal reasoning benchmarks, let's talk.
- Website: https://hanialomari.github.io/
- Google Scholar: https://scholar.google.com/citations?user=Ft_qTcwAAAAJ&hl=en
- LinkedIn: https://www.linkedin.com/in/hanialomari/
- Email: [email protected]
