Today I was searching SharePoint using PnP PowerShell (Invoke-PnPSearchQuery) and ran into duplicated results on large result sets.
I got around 100,000 files as result rows, but only a third of those results were unique (unique based on the URL of the file). Which is an odd thing. It appeared I was finding all files, but what I actually got back was duplicated rows. (Not due to TrimDuplicates:False)
In this scenario I had implemented paging myself (for various reasons). I discovered that when I used the -All parameter on the cmdlet (instead of paging myself), I got the same number of result rows, but now they were all unique files. Very odd discovery.
When looking into the PnP.PowerShell codebase (and searching online), it turned out that paging with the StartRow property is unreliable. PnP.PowerShell therefore implements a different approach, which is also documented by MS. Source: https://learn.microsoft.com/en-us/sharepoint/dev/general-development/pagination-for-large-result-sets
Although this source doesn't mention the duplicate results I was struggling with, it does show that for paging on large result sets you should sort by the DocId and page on that value. And that's precisely what PnP.PowerShell is doing if you look into its codebase.
I looked into our codebase and it seems we are paging by StartRow when --allResults is used.
I propose that we update the way we page in this command to the DocId approach, to make our command reliable on large result sets as well.
Implementation
In short, this is what we need to do:
- Query everything where IndexDocId>${lastDocId}
- Sort by DocId: [DocId]:ascending
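A minimal sketch of what that loop could look like. Note that `SearchRow` and `fetchPage` are hypothetical names for illustration; `fetchPage` stands in for an actual call to the SharePoint search API with the `IndexDocId>${lastDocId}` query refinement and the `[DocId]:ascending` sort list applied.

```typescript
interface SearchRow {
  DocId: number;
  // ...other requested managed properties
}

// Pages through all results by DocId instead of StartRow. Each request
// asks for rows with a DocId greater than the last one seen, sorted by
// DocId ascending, so no row can appear twice across pages.
async function getAllResults(
  fetchPage: (lastDocId: number, rowLimit: number) => Promise<SearchRow[]>,
  rowLimit: number = 500
): Promise<SearchRow[]> {
  const all: SearchRow[] = [];
  let lastDocId = 0;

  while (true) {
    const rows = await fetchPage(lastDocId, rowLimit);
    all.push(...rows);

    // A short page means we've reached the end of the result set.
    if (rows.length < rowLimit) {
      break;
    }

    // Results are sorted by DocId ascending, so the last row of this
    // page carries the highest DocId retrieved so far.
    lastDocId = rows[rows.length - 1].DocId;
  }

  return all;
}
```

Because the continuation token is the DocId itself rather than a row offset, the index can shuffle rows between requests without causing duplicates or gaps.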
I wrote down a slightly more extensive explanation in my blog:
https://www.blimped.nl/correctly-paging-when-searching-sharepoint/
Additional
While we're at it, let's add an extra verbose message while looping through all results.
Something like:
Processing search query, retrieved 1000 of 1000000 items
Processing search query, retrieved 1500 of 1000000 items
Processing search query, retrieved 2000 of 1000000 items