Skip to content

Commit 8909b7e

Browse files
committed
Update docs
1 parent 6e655df commit 8909b7e

3 files changed

Lines changed: 104 additions & 9 deletions

File tree

tests/FSharp.Compiler.Service.Tests2/DepResolving.fs

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -194,6 +194,8 @@ let detectFileDependencies (nodes : FileAST[]) : Graph =
194194
)
195195
|> Array.choose id
196196

197+
log "ASTs traversed"
198+
197199
let trie = buildTrie nodes
198200

199201
// Find dependencies for all files (can be in parallel)
@@ -275,7 +277,8 @@ let detectFileDependencies (nodes : FileAST[]) : Graph =
275277
|> Seq.toArray
276278
node.Name, (int64)node.Code.Length, deps
277279
)
278-
analyseEfficiency graph
280+
log "Done"
281+
//analyseEfficiency graph
279282
graph
280283

281284
[<Test>]
@@ -359,8 +362,8 @@ type B = int
359362
let Test () =
360363
log "start"
361364
let m = AnalyzerManager()
362-
// let projectFile = @"C:\projekty\fsharp\fsharp_main\src\Compiler\FSharp.Compiler.Service.fsproj"
363-
let projectFile = @"C:\projekty\fsharp\heuristic\tests\FSharp.Compiler.ComponentTests\FSharp.Compiler.ComponentTests.fsproj"
365+
let projectFile = @"C:\projekty\fsharp\fsharp_main\src\Compiler\FSharp.Compiler.Service.fsproj"
366+
// let projectFile = @"C:\projekty\fsharp\heuristic\tests\FSharp.Compiler.ComponentTests\FSharp.Compiler.ComponentTests.fsproj"
364367
let analyzer = m.GetProject(projectFile)
365368
let results = analyzer.Build()
366369
log "built"

tests/FSharp.Compiler.Service.Tests2/Docs.md

Lines changed: 94 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,13 @@ For more details see https://github.com/dotnet/fsharp/pull/13521
6565

6666
## Parallel type-checking of independent files
6767

68+
The main idea is quite simple:
69+
- process files in a graph order instead of sequential order
70+
- reduce the dependency graph used for type-checking, increasing parallelism
71+
- implement branching and merging of type-checking information
72+
73+
Below is some quasi-theoretical background on this.
74+
6875
### Background
6976
Files in an F# project are ordered and processed from the top (first) and the bottom (last) file.
7077
The compiler ensures that no information, including type information, flows upwards.
@@ -105,8 +112,92 @@ A few slightly imprecise/vague statements about all the graphs:
105112
3. Type-checking _must_ process files in an order that is compatible with the topological order in the _necessary dependencies_ graph.
106113
4. If using a dependency graph as an ordering mechanism for (parallel) type-checking, the closer it is to the _necessary dependencies_ graph, the higher parallelism is possible.
107114
5. Type-checking files in appearance order is equivalent to using the `allowed dependencies` graph for ordering.
108-
6. Removing an edge from a _dependency_ graph _can_ increase (but not decrease) the level of parallelism possible and improve wall-clock time.
115+
6. Removing an edge from the _dependency_ graph used _can_ increase (but not decrease) the level of parallelism possible and improve wall-clock time.
116+
117+
Let's look at point `6.` in more detail.
118+
119+
### The impact of reducing the dependency graph on type-checking parallelisation and wall-clock time.
120+
121+
Let's make a few definitions and assumptions:
122+
1. Time it takes to type-check file f = 'T(f)'
123+
2. Time it takes to type-check files f1...fn in parallel = 'T(f1+...fn)'
124+
3. Time it takes to type-check a file f and all its dependencies = 'D(f)'
125+
4. Type-checking is performed on a machine with infinite number of parallel processors.
126+
5. There is no slowdowns due to parallel processing, ie. T(f1+...+fn) = max(T(f1),...,T(fn))
127+
128+
With the above it can be observed that:
129+
```
130+
D(f) = max(D(n)) + T(f), for n = any necessary dependency of f
131+
```
132+
In other words wall-clock time for type-checking using a given dependency graph is equal to the "longest" path in the graph.
133+
134+
Therefore the main goal that the idea presented here aims to achieve is to replace the _allowed dependencies_ graph as currently used with a reduced graph that's much closer to the _necessary dependencies_ graph, therefore optimising the type-checking process.
135+
136+
## A way to reduce the dependency graph used
137+
138+
For all practical purposes the only way to calculate the _necessary dependencies_ graph fully accurately is to perform the type-checking process, which misses the point of this exercise.
139+
140+
However, there exist cheaper solutions that reduce the initial graph significantly with low computational cost, providing a good trade-off.
141+
142+
As noted in https://github.com/dotnet/fsharp/discussions/11634 , scanning the ASTs can provide a lot information that helps narrow down the set of types, modules/namespaces and files that a given file _might_ depend on.
143+
144+
This is the approach we're taking.
145+
146+
The dependency detection algorithm can be summarised as follows:
147+
1. For each parsed file in parallel:
148+
1. Extract its top-level module or a list of top-level namespaces
149+
2. Find all partial module references in the AST by traversing it once.
150+
2. Build a single [Trie](https://en.wikipedia.org/wiki/Trie) by adding all top-level items extracted in 1.i. Every module/namespace _segment_ (eg. `FSharp, Compiler, Service in FSharp.Compiler.Service`) is represented by a single edge in the Trie. Note down positions of all added files in the Trie.
151+
3. For each file in parallel:
152+
1. Clone the Trie.
153+
2. Start by marking the root as 'reachable'.
154+
3. Process all partial module references found in this file in 1.ii one-by-one, in order of appearance in the AST:
155+
1. For a given partial reference:
156+
1. Start at every 'reachable' node.
157+
2. 'Extend' the node with the new partial reference by walking down the Trie. If a leaf is reached, do not go further/do not extend the Trie.
158+
3. Mark the reached node as 'reachable'
159+
4. Collect all reachable nodes and their ancestors.
160+
5. Find all files that added one or more modules/namespaces to any of the nodes found in 5.
161+
6. Return those as dependencies.
162+
163+
### Performance
164+
165+
Little to no effort has been invested in optimising the algorithm.
166+
167+
However, initial tests show that in its current unoptimised form it is very performant.
168+
169+
Sample results for `FSharp.Compiler.Service`:
170+
171+
| Phase | Time |
172+
|------------------------------------------|-------------------------------------|
173+
| Parallel Parsing | 1.70s |
174+
| Type-Checking | 21.6s (13s with fsi optimisation) |
175+
| Total compilation time w/o optimisations | 40.3s (33.3s with fsi optimisation) |
176+
| Dependency resolution - total | 0.23s |
177+
| Dependency resolution - AST traversal | 0.18s |
178+
| Dependency resolution - Trie processing | 0.05s |
179+
180+
Things that can be easily improved:
181+
- Quicker AST traversal - tail-recursion, not using `seq`, avoid allocations
182+
- Quicker Trie operations
183+
184+
#### Overhead of dispatching work and branching/merging state
185+
186+
On top of dependency resolution, the feature will add some overhead for dispatching work and allowing separation of type-checking state in the graph.
187+
188+
No timings are available at the moment.
189+
190+
## The problem of maintaining multiple instances of type-checking information
191+
192+
The parallel type-checking idea generates a problem that needs to be solved.
193+
Instead of one instance of the type-checking information, we now have to:
194+
- 'clone' an instance multiple times when passing the state from one file to one or more of its dependants
195+
- 'merge' an instance when receiving state instances from one or more dependencies
196+
197+
We believe this is doable, although no implementation exists as of yet.
198+
199+
### Ordering of diagnostics/errors
109200

110-
Let's look at point `5.` in detail.
201+
As noted in https://github.com/dotnet/fsharp/pull/13737#issuecomment-1224124532 , disrupting the processing order makes it difficult to retain the original behaviour when it comes to the order in which diagnostics are presented and suppressed.
111202

112-
### The impact of reducing the dependency graph
203+
The importance of this problem and potential solutions are open questions.

tests/service/Program.fs

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@ open System
44

55
[<EntryPoint>]
66
let main argv =
7-
//FSharp.Compiler.Service.Tests.DepResolving.Test()
8-
Environment.CurrentDirectory <- "c:/projekty/fsharp/heuristic/src/Compiler"
9-
FSharp.Compiler.Service.Tests.RunCompiler.runCompiler()
7+
FSharp.Compiler.Service.Tests.DepResolving.Test()
8+
0
9+
//Environment.CurrentDirectory <- "c:/projekty/fsharp/heuristic/src/Compiler"
10+
//FSharp.Compiler.Service.Tests.RunCompiler.runCompiler()

0 commit comments

Comments
 (0)