Skip to content

Conversation

@JongHeonChoi
Copy link
Contributor

@JongHeonChoi JongHeonChoi commented Jul 14, 2021

Add new 'symbolicate' commands to dotnet-stack.

# dotnet dotnet-stack -h
Usage:
  dotnet-stack [options] [command]

Options:
  --version    Display version information

Commands:
  report                      reports the managed stacks from a running .NET process
  ps                          Lists the dotnet processes that traces can be collected
  symbolicate <input-path>    Get the line number from the Method Token and IL Offset in a stacktrace

# dotnet dotnet-stack symbolicate -h
symbolicate:
  Get the line number from the Method Token and IL Offset in a stacktrace

Usage:
  dotnet-stack symbolicate [options] <input-path>

Arguments:
  <input-path>    Path to the stacktrace text file

Options:
  -d, --search-dir <directory1 directory2 ...>    Path of multiple directories with assembly and pdb
  -o, --output <output-path>                      Output directly to a file (Default: <input-path>.symbolicated)
  -c, --stdout                                    Output directly to a console

Related PR : #2145, dotnet/runtime#44013

@JongHeonChoi JongHeonChoi requested a review from a team as a code owner July 14, 2021 04:21
@JongHeonChoi
Copy link
Contributor Author

JongHeonChoi commented Jul 14, 2021

Sample test:

sh-3.2# cat stack.trace 
System.NullReferenceException: Object reference not set to an instance of an object.
   at LineNumberExtract.Program.ExceptionMethod2(String[] args) in LineNumberExtract.Tizen.dll: token 0x6000001+0x5
   at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in LineNumberExtract.Tizen.dll: token 0x6000002+0x1
   at LineNumberExtract.Program.OnCreate() in LineNumberExtract.Tizen.dll: token 0x6000004+0x15
   at LineNumberExtract.App.OnStart(String[] args) in LineNumberExtract.dll: token 0x6000002+0x2

sh-3.2# dotnet dotnet-stack symbolicate stack.trace  
Symbolicate result:     

   at LineNumberExtract.Program.ExceptionMethod2(String[] args) in LineNumberExtract.Tizen.dll: token 0x6000001+0x5
> at LineNumberExtract.Program.ExceptionMethod2(String[] args) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 14

   at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in LineNumberExtract.Tizen.dll: token 0x6000002+0x1
> at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 19

   at LineNumberExtract.Program.OnCreate() in LineNumberExtract.Tizen.dll: token 0x6000004+0x15
> at LineNumberExtract.Program.OnCreate() in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 58

   at LineNumberExtract.App.OnStart(String[] args) in LineNumberExtract.dll: token 0x6000002+0x2
> at LineNumberExtract.App.OnStart(String[] args) in U:\LineNumberExtract\LineNumberExtract\LineNumberExtract.cs:line 33

@josalem
Copy link
Contributor

josalem commented Jul 14, 2021

Rather than output the symbolicated lines one at a time out of context like that, could we output the entire input text with the symbols inserted in context?

e.g., like this:

sh-3.2# cat stack.trace 
System.NullReferenceException: Object reference not set to an instance of an object.
   at LineNumberExtract.Program.ExceptionMethod2(String[] args) in LineNumberExtract.Tizen.dll: token 0x6000001+0x5
   at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in LineNumberExtract.Tizen.dll: token 0x6000002+0x1
   at LineNumberExtract.Program.OnCreate() in LineNumberExtract.Tizen.dll: token 0x6000004+0x15
   at LineNumberExtract.App.OnStart(String[] args) in LineNumberExtract.dll: token 0x6000002+0x2

sh-3.2# dotnet stack symbolicate stack.trace  
System.NullReferenceException: Object reference not set to an instance of an object.
   at LineNumberExtract.Program.ExceptionMethod2(String[] args) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 14
   at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 19
   at LineNumberExtract.Program.OnCreate() in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 58
   at LineNumberExtract.App.OnStart(String[] args) in U:\LineNumberExtract\LineNumberExtract\LineNumberExtract.cs:line 33

Long inputs will result in long outputs, but that's what > file and -o|--output are for.

Comment on lines 42 to 51
string input = string.Empty;
if (inputPath.Length > 1)
{
for (int i = 0; i < inputPath.Length; i++)
{
Console.WriteLine($" {i + 1}. {inputPath[i]}");
}
Console.Write($"Select one of several stacktrace files [1-{inputPath.Length}]: ");
input = inputPath[Int32.Parse(Console.ReadLine()) - 1].FullName;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user specifies multiple input files, should the tool symbolicate all of them rather than forcing user interaction to pick one?

It occurred to me that we'd need to do an in-place update of the existing file or drop and updated version of each file next to it. It also occurred to me that we don't have a specific file extension we can look for. Specifying multiple files with a wildcard would be problematic due to that.

@noahfalk / @shirhatti what are your thoughts on handling multiple files? Should we have users script around the tool to hit multiple files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to receive only one specified extension(xxxxx.stack or xxxxx.stacktrace).
What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with specifying an extension is that there isn't an accepted, standard extension for this type of output. This means we either need to have our users manually change the name of each file they want touched or specifically list all the files they want touched. In either case we are requiring the user to manually list out their files.

I'm inclined to not allow wildcards and only accept a list of specific files since it is simpler for our tool to handle. If a user wants to wildcard, they can rely on the shell's ability to do so, e.g., $ find ./*_exception_log | xargs dotnet-stack symbolicate.

We will have different default behaviors depending on the number of input files as a result:

  • 1 input file --> print modified file to stdout
  • >1 input file --> print modified file to new file next to old one (e.g., input.txt -> intput.sym.txt)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user specifies multiple input files, should the tool symbolicate all of them rather than forcing user interaction to pick one?

Should we allow multiple stacktrace files to be input? Isn't it better to input only one file at a time?
Or how about guiding the format of the input file contents?

  ------- Input file format -------
   at {typeName}.{methodName}({parameters}) in {moduleName}: token {methodToken}+{ilOffset}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noahfalk / @shirhatti what are your thoughts on handling multiple files? Should we have users script around the tool to hit multiple files?

I would keep it simple (at least initially) and support a single file input and write the output to some path adjacent to the output. Then as optional improvements we could add arguments that change the output path or write the output to stdout. File output as the default appealed to me because:
a) It requires less shell scripting knowledge
b) Probably if the input started in a file the user wants the output in a file
c) If we later add support for multiple files the experience doesn't need to abruptly change behavior

Or how about guiding the format of the input file contents?

I assume we'll have to have some definition of the format the tool can parse. I'm not sure if I understood the question though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with @noahfalk. The tool should take one file in, and write to a new file next to the old one, e.g.,

$ ls
input.txt
$ dotnet stack symbolicate input.txt
$ ls
input.txt	input.symbolicated.txt

or whatever addition to the file extension we want to use to differentiate.

If you're asking whether we should require the input file to have a specific format, I'm not sure that's something we can require of users. There isn't a standardized file format that we can expect. I think it is sufficient to look for the at ... in ... token ... pattern and fixup when we find it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a standardized file format that we can expect.

I think the format of the input file is already set in the function:
https://github.com/dotnet/runtime/blob/c93bb62e33934c3b8b6b1d293612d44360483bd8/src/libraries/System.Private.CoreLib/src/System/Diagnostics/StackTrace.cs#L206
So, the user using this tool thought that it was easy to input without changing the output stack.
Because it is already output as below.
" at {typeName}.{methodName}({parameters}) in {moduleName}: token {methodToken}+{ilOffset}"

I think it is sufficient to look for the at ... in ... token ... pattern and fixup when we find it.

Also, I agree with you. But wouldn't most users use it without change? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might be saying the same thing.

I think we should take in any file and treat it as a text file because there is no agreed upon file extension for stack traces like this. We should just scan the file for the pattern and attempt to fix it up in place, then save the fixed-up file next to the original.

Comment on lines 96 to 108
foreach (var pdbFile in pdbFiles)
{
foreach (var peFile in peFiles)
{
if (Path.GetFileNameWithoutExtension(peFile) == Path.GetFileNameWithoutExtension(pdbFile))
{
string xmlPath = Path.Combine(tempDirectory, Path.GetFileName(Path.ChangeExtension(peFile, "xml")));
GenXmlFromPdb(peFile, pdbFile, xmlPath);
xmlList.Add(xmlPath);
break;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect we would only have one PDB per assembly. If you sorted both arrays before iterating and then used a for(;;;) loop to not lose your spot in the array, you would do less work. As it is now, this could potentially iterate all the assembly files for every PDB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect we would only have one PDB per assembly.

Right. But #2145 (comment)

In generally, CE products(mobile, wearable, TV, refrigerator, etc) do not have enough memory.
So, It is often deployed an application without a PDB file to reduce the deployment size or obfuscate.

I think PDB file per assembly may not exist. So, I generated XML file only when there is assembly and pdb of the same name. Do you have a better idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only suggesting a way to match up the assembly list and the pdb list with less looping.

for example (in pseudo code):

function match_assembly_pdb:
	pairList <- []
	pdbList.sort()
	asmList.sort()
	j <- 0
	for i in pdbList.length: # assuming pdbList.length < asmList.length
		while j < asmList.length:
			if pdbList[i] == asmList[j]:
				pairList.add((pdbList[i], asmList[j]))
				break
			j++

Then the code only ever touches each element of each array once after the sort.

All that being said, I chatted with my team a bit and I think there is a simpler way of getting the symbols without converting to XML. Rather than use DiaSymReader, you can use the System.Reflection.Metadata APIs directly since those already know how to manipulate portable PDBs.

Here is an example from SOS on how you can use these APIs:

private bool GetSourceLineByILOffset(
OpenedReader openedReader,
int methodToken,
long ilOffset,
out int lineNumber,
out string fileName)
{
lineNumber = 0;
fileName = null;
MetadataReader reader = openedReader.Reader;
try
{
Handle handle = MetadataTokens.Handle(methodToken);
if (handle.Kind != HandleKind.MethodDefinition)
return false;
MethodDebugInformationHandle methodDebugHandle = ((MethodDefinitionHandle)handle).ToDebugInformationHandle();
if (methodDebugHandle.IsNil)
return false;
MethodDebugInformation methodDebugInfo = reader.GetMethodDebugInformation(methodDebugHandle);
SequencePointCollection sequencePoints = methodDebugInfo.GetSequencePoints();
SequencePoint? nearestPoint = null;
foreach (SequencePoint point in sequencePoints)
{
if (point.Offset > ilOffset)
break;
if (point.StartLine != 0 && !point.IsHidden)
nearestPoint = point;
}
if (nearestPoint.HasValue)
{
lineNumber = nearestPoint.Value.StartLine;
fileName = reader.GetString(reader.GetDocument(nearestPoint.Value.Document).Name);
return true;
}
}
catch (Exception ex)
{
Trace.TraceError($"GetSourceLineByILOffset: {ex.Message}");
}
return false;
}

Here is an example from the runtime on how to use those APIs (this one is a little simpler and handles caching parsed PDBs):
https://github.com/dotnet/runtime/blob/b1452c8fcab789dd9f2921c1479425b291111c5d/src/libraries/System.Diagnostics.StackTrace/src/System/Diagnostics/StackTraceSymbols.cs#L13-L94

Since non-portable PDBs are opt-in only and only work on Windows, I'm inclined to not add the extra headache of using DiaSym which ultimately wraps the System.Reflection.Metadata for portable PDBs anyway. This will help with caching parsed symbols, reduce dependencies, and decrease complexity by not requiring the conversion to XML phase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to emphasize the use of System.Reflection.Metadata here since that should simplify the logic and dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to emphasize the use of System.Reflection.Metadata here since that should simplify the logic and dependencies.

Ok. I'll check the System.Reflection.Metadata

Comment on lines 256 to 275
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(xmlPath);
XmlElement xRoot = xmlDoc.DocumentElement;
XmlNodeList xnList = xRoot.ChildNodes;
int xnCount = xnList.Count;
if (xnCount > 0)
{
for (int i = xnCount - 1; i >= 0; i--)
{
XmlNode node = xnList[i];
if (node.Name == "files")
{
ParseFile(node.ChildNodes, stInfo);
}
else if (node.Name == "methods")
{
ParseMethod(node.ChildNodes, stInfo);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we cache this load and the symbol lookup? This looks like it's going to load the file, parse the XML, and walk every method in the parsed XML for every frame of a stack trace. If a stack has many frames from a single assembly with a large PDB, this could be expensive.

Comment on lines 14 to 16
<PackageReference Include="Microsoft.DiaSymReader" Version="1.4.0-beta2-20217-02" />
<PackageReference Include="Microsoft.DiaSymReader.Converter.Xml" Version="1.1.0-beta2-20216-01" />
<PackageReference Include="Microsoft.DiaSymReader.PortablePdb" Version="1.6.0-beta2-21126-02"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hoyosjs are these packages managed under DARC or should we leave the manual versioning here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd centralize them. Also, we already use Dia in SOS / dotnet dump.

@josalem josalem added this to the 6.0.0 milestone Jul 14, 2021
@JongHeonChoi
Copy link
Contributor Author

JongHeonChoi commented Jul 19, 2021

I'll check this #2436 (comment) in the next commit.

public static Option<string> OutputOption() =>
new Option<string>(new[] { "-o", "--output" }, "Output directly to a file")
{
Argument = new Argument<string>(name: "output-path")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in the latest versions of System.CommandLine, this property is not accessible.

@JongHeonChoi JongHeonChoi force-pushed the add_dotnet-stack_symbolicate branch from 1a593f3 to 43f39b2 Compare August 3, 2021 04:19
Comment on lines 194 to 195
fswo?.Close();
fswi?.Close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put these in a finally block, or refactor so you can use them in a using block.

Comment on lines 165 to 177
StreamWriter fswi = null;
if (!inputPath.EndsWith(".symbolicated"))
{
fswi = new StreamWriter(new FileStream(inputPath + ".symbolicated", FileMode.Create, FileAccess.Write));
}

string output = string.Empty;
StreamWriter fswo = null;
if (outputPath != null)
{
fswo = new StreamWriter(new FileStream(outputPath, FileMode.Create, FileAccess.Write));
output = $"\nOutput: {outputPath}\n";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it always writes the fixed up text to a .symbolicated file next to the input file. and then optionally also writes to an output file.

The tool should do one or the other, but not both since that would mean a user generates two files when they specify an output path.

if (!line.Contains("at ") || !line.Contains(" in ") || !line.Contains("token"))
{
fswo?.WriteLine(line);
console.Out.WriteLine($"{line}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in a previous comment chain, we had decided to only print to a file. We can have it print to the console, but we should put it behind a --stdout flag or something similar.

}
}

private static string GetRegex(IConsole console, string line, List<string> xmlList, StreamWriter fsw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems to have a few side-effects unrelated to getting the regex. I'd rename the function to better reflect that. Something like TrySymbolicateLine or something similar, since it looks like it can fail.

In a similar vein, it looks like this function has the logic for writing to one file, while GetLineFromStack has the logic for writing to other outputs. Having the print logic spread out makes it difficult to reason about what's being written where and when. In this case, the function name doesn't imply that it does any writing.

Preferably, GetLineFromStack might look something like this:

outputWriter = outputPath.IsNullOrEmpty() ?
	/* open file next to input */ :
	/* open at output path */;
foreach (line in inputFile)
	outputWriter.Write(TrySymbolicateLine(line));

so all the writing happens in one place.

public string EndLine;
}

private static void GetLineFromStack(IConsole console, List<string> xmlList, string inputPath, string outputPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the name of this function reflects what it is doing. Perhaps something like SymbolicateFile?

Get implies this returns something, but it returns void.

Comment on lines 22 to 24
private static readonly string tempDirectory = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
private static readonly Regex regex = new Regex(@" at (?<type>[\w+\.?]+)\.(?<method>\w+)\((?<params>.*)\) in (?<filename>[\w+\.?]+)(\.dll|\.ni\.dll): token (?<token>0x\d+)\+(?<offset>0x\d+)", RegexOptions.Compiled);
private static readonly Regex verifyRegex = new Regex(@"at (?<typeMethod>.*)\((?<params>.*?)\) in (?<filename>.*)token(?<token>.*)\+(?<offset>.*)", RegexOptions.Compiled);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static readonly variables typically either get s_<name> or PascalCased names, e.g., s_tempDirectoryPath or TempDirectoryPath.

Comment on lines 41 to 47
if (searchDir.Length != 0)
{
foreach (var path in searchDir)
{
search_paths.Add(path.FullName);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably skip this stringification if you use DirectoryInfo[] and FileInfo[] everywhere.

}
string ret = GetRegex(console, line, xmlList, fswi);
fswo?.WriteLine(ret);
console.Out.WriteLine($"{ret}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting for a single string is unnecessary work. This can just be Console.Out.WriteLine(ret).

str.Append("+");
str.Append(match.Groups["offset"].Value.TrimStart().Split(" ")[0]);

if (regex.Match(str.ToString()).Success)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should come before doing all the StringBuilder work above since you throw it out if it is false.

{
string line = fsri.ReadLine();
// The stacktrace must have "at ... in ... token ..."
if (!line.Contains("at ") || !line.Contains(" in ") || !line.Contains("token"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a full read of the line 3 times in a row. You could simply check against the compiled regex once that would either match an return all the necessary info or not match, all in one step.

@JongHeonChoi
Copy link
Contributor Author

Exception log:

sh-3.2# pwd
/opt/usr/globalapps/org.tizen.example.LineNumberExtract.Tizen/bin
sh-3.2# cat stack.trace
System.NullReferenceException: Object reference not set to an instance of an object.
   at LineNumberExtract.Program.ExceptionMethod2(String[] args) in LineNumberExtract.Tizen.dll: token 0x6000001+0x5
   at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in LineNumberExtract.Tizen.dll: token 0x6000002+0x1
   at LineNumberExtract.Program.OnCreate() in LineNumberExtract.Tizen.dll: token 0x6000004+0x15
   at LineNumberExtract.App.OnStart(String[] args) in LineNumberExtract.dll: token 0x6000002+0x2

Case 1:

sh-3.2# dotnet dotnet-stack symbolicate stack.trace

Output: /opt/usr/globalapps/org.tizen.example.LineNumberExtract.Tizen/bin/stack.trace.symbolicated

Case 2:

sh-3.2# dotnet dotnet-stack symbolicate stack.trace -o result.out

Output: /opt/usr/globalapps/org.tizen.example.LineNumberExtract.Tizen/bin/result.out

Case 3:

sh-3.2# dotnet dotnet-stack symbolicate stack.trace --stdout     
System.NullReferenceException: Object reference not set to an instance of an object.
   at LineNumberExtract.Program.ExceptionMethod2(String[] args) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 14
   at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 19
   at LineNumberExtract.Program.OnCreate() in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 58
   at LineNumberExtract.App.OnStart(String[] args) in U:\LineNumberExtract\LineNumberExtract\LineNumberExtract.cs:line 33

Output: /opt/usr/globalapps/org.tizen.example.LineNumberExtract.Tizen/bin/stack.trace.symbolicated

Case 4:

sh-3.2# dotnet dotnet-stack symbolicate stack.trace -o result.out --stdout
System.NullReferenceException: Object reference not set to an instance of an object.
   at LineNumberExtract.Program.ExceptionMethod2(String[] args) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 14
   at LineNumberExtract.Program.ExceptionMethod1(IntPtr data) in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 19
   at LineNumberExtract.Program.OnCreate() in U:\LineNumberExtract\LineNumberExtract.Tizen\LineNumberExtract.Tizen.cs:line 58
   at LineNumberExtract.App.OnStart(String[] args) in U:\LineNumberExtract\LineNumberExtract\LineNumberExtract.cs:line 33

Output: /opt/usr/globalapps/org.tizen.example.LineNumberExtract.Tizen/bin/result.out

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things are definitely shaping up in the right direction! Another round of comments. Thank you for the contributions!

Comment on lines 39 to 44
List<string> searchPaths = new List<string>();
foreach (var path in searchDir)
{
searchPaths.Add(path.FullName);
}
searchPaths.Add(Directory.GetCurrentDirectory());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier to pass the {File|Directory}Info[] around, so we don't have to stringify everything here.

If the order of entries in these arrays is used to locate files, the current directory should probably be at the front of the array.

Comment on lines 59 to 62
private static void Symbolicator(IConsole console, List<string> searchPaths, string inputPath, string outputPath, bool isStdout)
{
CreateSymbolicateFile(console, GetSearchPathList(searchPaths), inputPath, outputPath, isStdout);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can remove this function and just call CreateSymbolicateFile directly in Symbolicate.

Comment on lines 106 to 109
foreach (var peFile in Directory.GetFiles(assemDir, searchPattern, SearchOption.AllDirectories))
{
files.Add(peFile);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
foreach (var peFile in Directory.GetFiles(assemDir, searchPattern, SearchOption.AllDirectories))
{
files.Add(peFile);
}
files.AddRange(Directory.GetFiles(assemDir, searchPattern, SearchOption.AllDirectories));

Comment on lines 211 to 240
private static MetadataReader GetMetadataReader(string filePath)
{
try
{
Func<string, Stream> streamProvider = sp => new FileStream(sp, FileMode.Open, FileAccess.Read);
using Stream stream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
if (stream != null)
{
MetadataReaderProvider provider = null;
if (filePath.Contains(".dll"))
{
using PEReader peReader = new PEReader(stream);
if (!peReader.TryOpenAssociatedPortablePdb(filePath, streamProvider, out provider, out string pdbPath))
{
return null;
}
}
/*else if (filePath.Contains(".pdb"))
{
provider = MetadataReaderProvider.FromPortablePdbStream(stream);
}*/
return provider.GetMetadataReader();
}
return null;
}
catch
{
return null;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should attempt to cache the MetadataReaders so we don't repeat work if there are multiple frames from the same assembly.

I recommend stashing them in a Dictionary<string,MetadataReader> and being sure to clean them up at the end of the process.

Comment on lines 201 to 207
foreach (var path in searchPathList)
{
if (path.Contains(stInfo.Assembly))
{
return GetLineFromMetadata(GetMetadataReader(path), ret, stInfo);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you add a caching mechanism for the MetadataReaders, you can eliminate this loop by making the key of the cache the assembly name. Then this becomes a constant time lookup rather than a linear lookup. You would need some function like bool TryGetMetadataReader(string assemblyName, out MetadataReader reader) that uses what you have written in GetMetadataReader that also attempts to cache things.

Comment on lines 22 to 23
private static readonly Regex s_regex = new Regex(@" at (?<type>[\w+\.?]+)\.(?<method>\w+)\((?<params>.*)\) in (?<filename>[\w+\.?]+)(\.dll|\.ni\.dll): token (?<token>0x\d+)\+(?<offset>0x\d+)", RegexOptions.Compiled);
private static readonly Regex s_verifyRegex = new Regex(@"at (?<typeMethod>.*)\((?<params>.*?)\) in (?<filename>.*)token(?<token>.*)\+(?<offset>.*)", RegexOptions.Compiled);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does s_verifyRegex accomplish that s_regex doesn't already?

Comment on lines 148 to 168
private static string GetVerifiedStackTrace(string line)
{
Match match = s_verifyRegex.Match(line);
StringBuilder str = new StringBuilder();
str.Append(" at ");
str.Append(match.Groups["typeMethod"].Value.TrimEnd());
str.Append("(");
str.Append(match.Groups["params"].Value);
str.Append(") in ");
str.Append(match.Groups["filename"].Value.Replace(":", "").Trim());
str.Append(": token ");
str.Append(match.Groups["token"].Value.Trim());
str.Append("+");
str.Append(match.Groups["offset"].Value.TrimStart().Split(" ")[0]);

if (s_regex.Match(str.ToString()).Success)
{
return str.ToString();
}
return line;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this attempting to force all lines that roughly match the pattern into a specific format?

I'm not sure that is necessary as a separate step. s_regex looks like it should handle this already.

public string Offset;
}

private static string GetRegex(string line, List<string> searchPathList)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static string GetRegex(string line, List<string> searchPathList)
private static string TrySymbolicateLine(string line)

If you add a cache for the MetadataReaders you don't need to pass the list of paths around and can access it from some global state.

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of feedback 😃. I think this is almost ready for check-in.

{
try
{
SetMetadataReader(searchDir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-populating the MetadataReader cache. This can be pretty expensive if we have an application that has a large number of PDBs. We also may not be using all of the PDB/MetadataReaders. If possible, these should be lazily constructed when they're first needed.

Comment on lines 64 to 69
if (!s_regex.Match(line).Success)
{
fileStreamWriter?.WriteLine(ret);
if (isStdout) console.Out.WriteLine(ret);
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is duplicating work that happens in TrySymbolicateLine. ret from TrySymbolicateLine will already be the original input if it doesn't match the regex. This call to RegularExpression.Match can be removed.

Comment on lines 210 to 217
private static MetadataReader TryGetMetadataReader(string assemblyName)
{
if (s_metadataReaderDictionary.ContainsKey(assemblyName))
{
return s_metadataReaderDictionary[assemblyName];
}
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have logic like this:

if (dict.TryGetValue(assemblyName, out MetadataReader reader))
	return reader;
else
	dict[assemblyName] = /* instantiate new MetadataReader */;

This will only instantiate a MetadataReader when we ask for it.

{
Arity = ArgumentArity.ZeroOrOne
}
}.ExistingOnly()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this method. Is this saying that the output file needs to exist in order for the option to be valid? That seems opposite of the intention of the option, i.e., create the file and fill it with the contents we generate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will generate an error if the file doesn't exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. It's my mistake. I'll delete it.

{
string sourceFile = reader.GetString(reader.GetDocument(bestPointSoFar.Value.Document).Name);
int sourceLine = bestPointSoFar.Value.StartLine;
return $" at {stInfo.Type}.{stInfo.Method}({stInfo.Param}) in {sourceFile}:line {sourceLine}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, the input line could have any number or type of whitespace (or other characters) at the beginning of the line. For example, the input text may be from a log that has timestamps on every line. We could potentially be throwing out data by rewriting the line like this.

If possible, we should try to rewrite the line in situ. You could reuse the regex you already have with the Regex.Replace(String, String, MatchEvaluator, RegexOptions) API. You would write a MatchEvaluator method that receives each match in the regex and replaces it with the appropriate part of the StackTraceInfo. You may also need to add a capture group for the : token part so you can replace it with : line.

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're to a point where I'm ready to merge this feature 🎊 . I'm assuming you have been doing some testing along the way? I'd like some sample input/output from local testing a few scenarios before we merge.

  • input file that is just the stack without symbols
  • input file where the stack is embedded in a log-ish format, e.g., timestamps or other text around the stack lines
  • output file where only some of the PDBs were available

It doesn't need to be part of this PR, but I would like some automated tests of this verb at some point. When this merges, I'll create an issue to track adding those.

@JongHeonChoi
Copy link
Contributor Author

  1. without PDBs
E/STDERR  ( 4653): Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
E/STDERR  ( 4653):    at DotnetStackSymbolicate.App.MethodA()
E/STDERR  ( 4653):    at DotnetStackSymbolicate.App..ctor()
E/STDERR  ( 4653):    at DotnetStackSymbolicate.Program.OnCreate()
E/STDERR  ( 4653): onSigabrt called

// output
E/STDERR  ( 4653): Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
E/STDERR  ( 4653):    at DotnetStackSymbolicate.App.MethodA()
E/STDERR  ( 4653):    at DotnetStackSymbolicate.App..ctor()
E/STDERR  ( 4653):    at DotnetStackSymbolicate.Program.OnCreate()
E/STDERR  ( 4653): onSigabrt called
  1. without PDBs(token+ILOffset)
08-25 14:17:56.403+0900 E/STDERR  (12932): Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App.MethodA() in DotnetStackSymbolicate.dll:token 0x6000002+0x6
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App..ctor() in DotnetStackSymbolicate.dll:token 0x6000003+0x51
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.Program.OnCreate() in DotnetStackSymbolicate.Tizen.dll:token 0x6000001+0x8
08-25 14:17:56.403+0900 E/STDERR  (12932): onSigabrt called

// output
08-25 14:17:56.403+0900 E/STDERR  (12932): Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App.MethodA() in DotnetStackSymbolicate.dll:token 0x6000002+0x6
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App..ctor() in DotnetStackSymbolicate.dll:token 0x6000003+0x51
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.Program.OnCreate() in DotnetStackSymbolicate.Tizen.dll:token 0x6000001+0x8
08-25 14:17:56.403+0900 E/STDERR  (12932): onSigabrt called
  1. with PDBs
// output
08-25 14:17:56.403+0900 E/STDERR  (12932): Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App.MethodA() in C:\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate.cs:line 19
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App..ctor() in C:\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate.cs:line 38
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.Program.OnCreate() in C:\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate.Tizen\DotnetStackSymbolicate.Tizen.cs:line 12
08-25 14:17:56.403+0900 E/STDERR  (12932): onSigabrt called
  1. Only specific PDB(DotnetStackSymbolicate.pdb) exist
// output
08-25 14:17:56.403+0900 E/STDERR  (12932): Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App.MethodA() in C:\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate.cs:line 19
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.App..ctor() in C:\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate\DotnetStackSymbolicate.cs:line 38
08-25 14:17:56.403+0900 E/STDERR  (12932):    at DotnetStackSymbolicate.Program.OnCreate() in DotnetStackSymbolicate.Tizen.dll:token 0x6000001+0x8
08-25 14:17:56.403+0900 E/STDERR  (12932): onSigabrt called

@josalem
Copy link
Contributor

josalem commented Aug 25, 2021

Merging 🎊

Thanks @JongHeonChoi for this contribution! This will be an awesome feature 😄

@josalem josalem merged commit 3ec1f31 into dotnet:main Aug 25, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Jan 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants