Bug report: our implementation of searching SharePoint in large resultsets is buggy #5710

@martinlingstuyl

Description

Today I was searching SharePoint using PnP PowerShell (Invoke-PnPSearchQuery) and ran into duplicated results on large resultsets.

I got around 100,000 files as result rows, but only a third of those were unique (based on the URL of the file), which is odd. It looked as if I was finding all files, but what I actually got back was duplicated rows. (This was not caused by TrimDuplicates:False.)
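
For anyone who wants to check their own results the same way, this is roughly how the uniqueness count can be done. It is a minimal sketch: the `ResultRow` shape and the `countUniquePaths` name are made up for illustration, and I am assuming the file URL is in the `Path` property.

```typescript
// Hypothetical row shape: only the Path (file URL) matters for this check.
interface ResultRow {
  Path: string;
}

// Counts distinct file URLs. Comparison is case-insensitive,
// because SharePoint URLs are not case-sensitive.
function countUniquePaths(rows: ResultRow[]): number {
  return new Set(rows.map(row => row.Path.toLowerCase())).size;
}
```

If `countUniquePaths(rows)` is clearly lower than `rows.length`, the result set contains duplicated rows.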

In this scenario I had implemented paging myself (for various reasons). I discovered that when I used the -All parameter on the cmdlet instead of paging myself, I got the same number of result rows, but now they were all unique files. A very odd discovery.

When looking through the PnP.PowerShell codebase (and searching online), it turned out that paging with the StartRow property is unreliable on large resultsets. PnP.PowerShell has therefore implemented a different approach, which is also documented by Microsoft. Source: https://learn.microsoft.com/en-us/sharepoint/dev/general-development/pagination-for-large-result-sets

Although this source doesn't mention the duplicate results that I was struggling with, it does show that for paging on large resultsets you should sort by DocId and page on that value. And that's precisely what PnP.PowerShell does if you look into their codebase.

I looked into ours and it seems we are paging by StartRow when --allResults is used.

I propose that we update the way we page in this command to the DocId version, to make our command trustworthy on large resultsets as well.

Implementation

In short, this is what we need to do:

  1. Query everything where IndexDocId>${lastDocId}
  2. Sort by DocId: [DocId]:ascending
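
The two steps above could be sketched roughly like this. This is not our actual command code: `SearchRow`, `PageRequest`, `FetchPage`, `buildPageRequest` and `getAllRows` are hypothetical names, and `fetchPage` stands in for whatever function executes the search request. I am also assuming that DocId is included in the selected properties and comes back as a string.

```typescript
// Hypothetical shapes for illustration; not the real CLI types.
interface SearchRow {
  DocId: string; // kept as a string to avoid precision issues with large ids
  Path: string;
}

interface PageRequest {
  queryText: string;
  sortList: string;
  rowLimit: number;
}

// Stand-in for the function that actually calls the search API.
type FetchPage = (request: PageRequest) => Promise<SearchRow[]>;

// Step 1 and 2: restrict the query to documents after the last seen DocId
// and sort ascending by DocId, so every page continues where the previous ended.
function buildPageRequest(baseQuery: string, lastDocId: string, rowLimit: number): PageRequest {
  return {
    queryText: `${baseQuery} IndexDocId>${lastDocId}`,
    sortList: '[DocId]:ascending',
    rowLimit
  };
}

async function getAllRows(baseQuery: string, fetchPage: FetchPage, rowLimit = 500): Promise<SearchRow[]> {
  const allRows: SearchRow[] = [];
  let lastDocId = '0';

  while (true) {
    const rows = await fetchPage(buildPageRequest(baseQuery, lastDocId, rowLimit));
    if (rows.length === 0) {
      break; // nothing left after the last seen DocId
    }

    allRows.push(...rows);
    // The rows are sorted ascending, so the last row holds the highest DocId.
    lastDocId = rows[rows.length - 1].DocId;
  }

  return allRows;
}
```

Note that StartRow is never used here: every request starts at the first row of a query that has already been narrowed down by IndexDocId.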

I wrote down a slightly more extensive explanation in my blog:
https://www.blimped.nl/correctly-paging-when-searching-sharepoint/

Additional

While we're at it, let's add an extra verbose message while looping through all results.

Something like:

Processing search query, retrieved 1000 of 1000000 items
Processing search query, retrieved 1500 of 1000000 items
Processing search query, retrieved 2000 of 1000000 items
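
Producing that line could be as simple as the helper below. The `formatProgress` name is only a suggestion, and how the message is written to the verbose output is left out on purpose.

```typescript
// Hypothetical helper that builds the verbose progress message.
function formatProgress(retrieved: number, total: number): string {
  return `Processing search query, retrieved ${retrieved} of ${total} items`;
}
```

It would be called once per retrieved page, with the running row count and the total row count reported by the search result.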
