Skip to content

Conversation

@AndyAyersMS
Copy link
Member

Need to check for \r\n as well as \n.

Need to check for \r\n as well as \n.
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 6, 2020
@AndyAyersMS
Copy link
Member Author

PTAL @sandreenko
CC @dotnet/jit-contrib

Suspect this is a result of #43736. Not sure why the current version is passing in CI, perhaps we are enabling old behavior there still?

@stephentoub
Copy link
Member

Suspect this is a result of #43736

It shouldn't be, as regex uses ordinal by default.

@stephentoub
Copy link
Member

Not sure why the current version is passing in CI, perhaps we are enabling old behavior there still?

Maybe when the data files got checked out locally, they got rewritten with \r\n?

@AndyAyersMS
Copy link
Member Author

Maybe when the data files got checked out locally, they got rewritten with \r\n?

Maybe, though this is new behavior and I don't recall changing any config settings.

@danmoseley
Copy link
Member

Presumably the test is failing without your change?

@AndyAyersMS
Copy link
Member Author

Locally, yes...

agggtaaa|tttaccct 1
[cgt]gggtaaa|tttaccc[acg] 0
a[act]ggtaaa|tttacc[agt]t 0
ag[act]gtaaa|tttac[agt]ct 0
agg[act]taaa|ttta[agt]cct 0
aggg[acg]aaa|ttt[cgt]ccct 1
agggt[cgt]aa|tt[acg]accct 0
agggta[cgt]a|t[acg]taccct 0
agggtaa[cgt]|[acg]ttaccct 0

342
256
157
ERRORLEVEL=-1

That last 157 is a string length, it should be 152. There are 5 newlines in the input file...

@BruceForstall
Copy link
Contributor

fwiw, I see the same failure as Andy. And the .txt files all have \r\n on my file system.

Did something change in the moving of the test tree? #38058

E.g., were the files previously marked binary?

Presumably Andy's change allows the tests to work no matter what the newline format is, so is general goodness?

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM, but I agree with Bruce it would be useful to understand what has changed and why it is passing in CI.

Maybe when the data files got checked out locally, they got rewritten with \r\n?

These files had /r/n from the beginning according to my Notepad++, do not know if it is possible to show it on GitHub.

@stephentoub
Copy link
Member

stephentoub commented Nov 6, 2020

These files had /r/n from the beginning according to my Notepad++, not if it is possible to show it on GitHub.

Not in the git repo:

using System;
using System.Net.Http;

using var c = new HttpClient();
Console.WriteLine((await c.GetByteArrayAsync("https://raw.githubusercontent.com/dotnet/runtime/master/src/tests/JIT/Performance/CodeQuality/BenchmarksGame/regex-redux/regexdna-input25.txt")).AsSpan().IndexOf((byte)'\r'));
Console.WriteLine((await c.GetByteArrayAsync("https://raw.githubusercontent.com/dotnet/runtime/master/src/tests/JIT/Performance/CodeQuality/BenchmarksGame/regex-redux/regexdna-input25000.txt")).AsSpan().IndexOf((byte)'\r'));

outputs:

-1
-1

@BruceForstall
Copy link
Contributor

BruceForstall commented Nov 6, 2020

I think I found the problem: .gitattributes didn't get updated when the tests moved -- they still point at the old files:

# CLR specific
src/coreclr/src/pal/tests/palsuite/paltestlist.txt text eol=lf
src/coreclr/src/pal/tests/palsuite/paltestlist_to_be_reviewed.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/regexdna/regexdna-input25.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/regexdna/regexdna-input25000.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/regex-redux/regexdna-input25.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/regex-redux/regexdna-input25000.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/reverse-complement/revcomp-input25.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/reverse-complement/revcomp-input25000.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/k-nucleotide/knucleotide-input.txt text eol=lf
src/coreclr/tests/src/JIT/Performance/CodeQuality/BenchmarksGame/k-nucleotide/knucleotide-input-big.txt text eol=lf
src/coreclr/tests/src/performance/Scenario/JitBench/Resources/word2vecnet.patch text eol=lf

cc @trylek

It's unfortunate that the tests are dependent on the specific line endings and require this configuration.

@stephentoub
Copy link
Member

the version before the move

No '\r's in that one either 😄

@stephentoub
Copy link
Member

It's unfortunate that the tests are dependent on the specific line endings

My only hesitancy in changing them is it moves away from how regex redux is defined:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/regexredux.html#regexredux
"use the same simple regex pattern match-replace to remove FASTA sequence descriptions and all linefeed characters"
and the same ">.*\n|\n" pattern that all the submissions use.

If the goal of having the test here is to match that, we should be wary of changing it. If the goal of having the test here is just to have a meaty regex test, then changing it is fine.

@trylek
Copy link
Member

trylek commented Nov 6, 2020

My apologies about the missing .gitattributes path renames, I'll send out a PR to fix these.

@BruceForstall
Copy link
Contributor

For simplicity, we should probably keep these as close to possible as the "official" code, so we'll just need to live with the config to force LF line endings for the test input.

@AndyAyersMS
Copy link
Member Author

Ah, I remember creating those gitattributes, long ago. Updating them makes sense, so I think we can abandon this PR.

Still not clear how the tests are passing in CI. Any explanation for that? I suppose we are shipping files around so perhaps we're building on one OS and running on another?

@BruceForstall
Copy link
Contributor

In the CI we always (?) build tests on Mac, so its git doesn't do line ending translation.

@trylek
Copy link
Member

trylek commented Nov 6, 2020

Correct - well, in fact I recently switched that over to Linux as the lab Macs have become much slower but your argument still holds.

@AndyAyersMS AndyAyersMS closed this Nov 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants