Add a way to match regular expressions #55
Comments
That's an interesting idea! Would you want to apply the regular expression to a single file / folder at a time, or a larger piece of the path? E.g. would your example |
If you can provide a simple example of two files and what diff output you want, it would help me understand the specific behavior you're looking for. And to determine how much of an extension it is. |
Thanks for your interest! Here are a few examples. The LHS is controlled, and the RHS is independent. LHS: {
"length":{{[0-9]+}},
"location":{
"index":0,
"uri":"file://{{.*}}clang/test/Frontend/sarif-diagnostics.cpp"
},
"mimeType":"text/plain",
"roles":[
"resultFile"
]
} RHS (no diff): {
"length":100,
"location":{
"index":0,
"uri":"file:///tmp/clang/test/Frontend/sarif-diagnostics.cpp"
},
"mimeType":"text/plain",
"roles":[
"resultFile"
]
} RHS ( {
"length":"100",
"location":{
"index":0,
"uri":"file:///tmp/clang/test/Frontend/sarif-diagnostics.cpp"
},
"mimeType":"text/plain",
"roles":[
"resultFile"
]
} @ ["length"]
- {{\d+}}
+ "100" RHS ( {
"length":100,
"location":{
"index":0,
"uri":"/tmp/clang/test/Frontend/sarif-diagnostics.cpp"
},
"mimeType":"text/plain",
"roles":[
"resultFile"
]
} @ ["location","uri"]
- "file://{{.*}}/clang/test/Frontend/sarif-diagnostics.cpp"
+ "/tmp/clang/test/Frontend/sarif-diagnostics.cpp" |
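One way to read the `{{...}}` placeholders in the examples above is to treat everything outside the braces as literal text and everything inside as a regular expression. The following is only a minimal sketch under that assumption; `template_to_regex` and `matches` are hypothetical names and are not part of jd. It handles string values only (the unquoted `{{[0-9]+}}` for `length` is not valid JSON and is out of scope here), and it does not handle patterns that themselves contain `}}`.

```python
import re


def template_to_regex(template):
    """Compile a string with {{regex}} placeholders into one pattern."""
    # With one capture group, re.split alternates: literal, regex, literal, ...
    parts = re.split(r"\{\{(.*?)\}\}", template)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))      # literal text, escaped
        else:
            pieces.append("(?:" + part + ")")   # user-supplied regex
    return re.compile("".join(pieces))


def matches(template, actual):
    """True if the whole of `actual` fits the template."""
    return template_to_regex(template).fullmatch(actual) is not None


lhs = "file://{{.*}}clang/test/Frontend/sarif-diagnostics.cpp"
print(matches(lhs, "file:///tmp/clang/test/Frontend/sarif-diagnostics.cpp"))  # True
print(matches(lhs, "/tmp/clang/test/Frontend/sarif-diagnostics.cpp"))         # False
```

The second call fails because the `file://` prefix is literal text in the template, which mirrors the `["location","uri"]` diff shown above.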
I had never considered using jd this way! You want to define equality in a very flexible way. This is bordering on defining a schema. E.g. JSON Schema. If it "matches" then it's equal. Otherwise there is a difference. JSON as a format lacks metadata--a way of attaching information directly to values--data about data. Other, similar data formats have it, such as Ion Annotations and EDN tags. So we have to either use flag values (special keys and values, like Your example isn't valid JSON ( The jd library and format can accept and encode metadata in the path to a value. It's a way to point to a part of the structure and say "interpret this data in this way". When making a diff you have to provide it out-of-band with the Example: @ ["roles", ["set"], {}]
+ "notResultFile" The Or this for short. @ ["roles", {}]
+ "notResultFile" The existing metadata is about how to interpret collections (arrays and objects). But you want to provide metadata about how to interpret leaf nodes--individual values. "Two values are the same if they are both numbers." "Two values are the same if they are strings and have the same ending". This could easily extend to "two values are the same if they are a string representing the same datetime" (but different timezones). So we need a powerful way to express a binary function for equality. You are proposing regular expressions, which are a solid option. They are well known, compact and powerful. Good for inlining into another data format. We just need to provide these expressions in a side-channel, either a file or as command-line parameters. So we could do something like this:
Applied to these two files would return no diff: {
"location":{
"uri":"file:///tmp/clang/test/Frontend/sarif-diagnostics.cpp"
}
} {
"location":{
"uri":"file:///my-temporary-folder/clang/test/Frontend/sarif-diagnostics.cpp"
}
} Or the metadata could be provided as a separate file. Like this: [
["location","uri",{"regex":"file://{{.*}}clang/test/Frontend/sarif-diagnostics.cpp"}],
["length",{"regex":"\\d+"}]
] (I have a strong preference for using valid JSON as much as possible) The LHS has a concrete value which falls within the "schema". But that's kinda weird because you don't really care what the concrete value is. Just that it matches the metadata. Just that it falls within the "schema". So why not just use an existing schema language to set constraints? Maybe your LHS should be something like this: {
"type": "object",
"properties": {
"length": {
"type": "integer"
},
...
"required": [ "length", ... ]
} Maybe you should be using a tool that validates JSON Schema instead of jd. Or maybe we should start thinking of jd as a terse schema validator and add validation inline per your original suggestion. Let me ask some follow-on questions to understand your use case better. How do you plan to maintain and update the LHS? Will you have a source-of-truth JSON file that you update from time-to-time? Or will you write it carefully by hand as validation for the RHS? Will you start with a strict match and make parts "fuzzy" selectively (like the path)? Or will it be primarily composed of regular expressions? |
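To make the separate-metadata-file idea above more concrete, here is a rough sketch of how such a file could drive a comparison. It is not how jd works: `load_masks` and `diff` are hypothetical names, the output is just a list of differing paths rather than the jd diff format, and the patterns are written as plain regular expressions (without `{{...}}`), which is an assumption. A masked value is accepted only if it has the same JSON type as the concrete LHS value and its text matches the pattern.

```python
import json
import re


def load_masks(spec):
    """Map each path (a tuple of keys) to a compiled regex."""
    masks = {}
    for entry in spec:
        *path, rule = entry          # last element is {"regex": "..."}
        masks[tuple(path)] = re.compile(rule["regex"])
    return masks


def diff(lhs, rhs, masks, path=()):
    """Return the paths at which rhs is not acceptable."""
    if path in masks:
        # Strings are matched as-is, other scalars by their JSON text.
        text = rhs if isinstance(rhs, str) else json.dumps(rhs)
        ok = type(lhs) is type(rhs) and masks[path].fullmatch(text) is not None
        return [] if ok else [path]
    if isinstance(lhs, dict) and isinstance(rhs, dict):
        out = []
        for key in sorted(lhs.keys() | rhs.keys()):
            if key not in lhs or key not in rhs:
                out.append(path + (key,))
            else:
                out.extend(diff(lhs[key], rhs[key], masks, path + (key,)))
        return out
    if isinstance(lhs, list) and isinstance(rhs, list) and len(lhs) == len(rhs):
        out = []
        for i, (a, b) in enumerate(zip(lhs, rhs)):
            out.extend(diff(a, b, masks, path + (i,)))
        return out
    return [] if lhs == rhs else [path]


masks = load_masks([
    ["location", "uri",
     {"regex": r"file://.*clang/test/Frontend/sarif-diagnostics\.cpp"}],
    ["length", {"regex": r"\d+"}],
])
golden = {"length": 100, "location": {
    "uri": "file:///tmp/clang/test/Frontend/sarif-diagnostics.cpp"}}
actual = {"length": 42, "location": {
    "uri": "file:///my-temporary-folder/clang/test/Frontend/sarif-diagnostics.cpp"}}
print(diff(golden, actual, masks))  # []
```

Using the LHS value only for its type is one possible answer to the "you don't really care what the concrete value is" observation; a schema would express the same constraint without a concrete value at all.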
Another odd thing about embedding regular expressions in the LHS would be that you repeat them (denormalize) throughout a list. Example: {
"locations":[
{
"index":0,
"uri":"file://{{.*}}clang/test/Frontend/sarif-diagnostics.cpp"
},
{
"index":9,
"uri":"file://{{.*}}clang/test/Frontend/another-diagnostics.cpp"
},
{
"index":99,
"uri":"file://{{.*}}clang/test/Frontend/yet-another-diagnostics.cpp"
}
]
} This example might not make sense for your use case, but I could easily see a case where we have a list of the same type of object. A schema would give that object a name and define its shape once. But using jd with inline regular expressions, we would have to repeat the "type" definition over and over. A second odd thing about using inline regular expressions would be telling the difference between types. Your example wants a number for length. Definitely not a string. So you embedded the regular expression Instead we should just say what type we want. Which again points to out-of-band (not inline) metadata describing the shape. So your specific use case sounds a lot more like a job for JSON schema. Schema files can be pretty verbose so a good way to get started would be to use a schema generator (Google You still need to handle the file prefix problem, a problem for which JSON schema doesn't have native support. But most schema validators allow you to provide a custom validation function and you could have a "file" function that you parameterize with the suffix. For that matter, we could add some custom function hooks into the jd library so you could provide the same kinds of validation function. You would need to write them in golang since that's the jd library language unless you want to run jd as WASM (which is totally possible--that's how the UI works). But I don't want to add such custom functions into jd natively (in repo, built into the binary, part of the jd diff format) for two reasons: /1/ it's departing from the "do one thing well" principle and getting pretty far into the schema validation space, for which there are better tools and /2/ I've maintained a pretty strict round-trip invariant that all diffs can also be applied as patches to produce the original input.
So in conclusion, you could still go either way: /1/ use JSON schema to validate the RHS (using a generator on a golden LHS, then writing custom schema functions) or /2/ extend jd to accept custom equality functions, then use the jd library to build a different tool (I can help you with this). The answer depends on the details of your use case (questions above) and your appetite for building new tooling. What do you think? |
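As an illustration of the "custom equality function" idea, here is a small sketch of per-path hooks, including a "file" function parameterized with a suffix. It is written in Python purely for illustration; hooks for the jd library itself would have to be written in Go, as noted above. All names (`file_suffix`, `both_integers`, `values_equal`) are hypothetical and do not exist in jd or in any schema validator.

```python
def file_suffix(suffix):
    """Build a hook: equal if both values are strings ending in `suffix`."""
    def equal(a, b):
        return (isinstance(a, str) and isinstance(b, str)
                and a.endswith(suffix) and b.endswith(suffix))
    return equal


def both_integers(a, b):
    """Hook: equal if both values are integers, whatever their value."""
    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)
    return is_int(a) and is_int(b)


def values_equal(path, a, b, hooks):
    """Use the hook registered for `path`, else fall back to strict equality."""
    equal = hooks.get(path)
    return equal(a, b) if equal else a == b


hooks = {
    ("location", "uri"): file_suffix("clang/test/Frontend/sarif-diagnostics.cpp"),
    ("length",): both_integers,
}
print(values_equal(("length",), 100, 7, hooks))      # True
print(values_equal(("length",), 100, "100", hooks))  # False
```

Hooks like these say what type is wanted directly, so the number-versus-string question no longer depends on how a regular expression is embedded.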
I was thinking about your Do you really care if the I'm leaning more toward building some of this into jd because it aligns well with the path masking feature. And one of the masks could be a regular expression. Also sorry for the gigantic response. It's just helpful to externalize my thought process. 😉 |
Possibly! We need a simple diff for the vast majority of our output: it's just paths and lengths that get regexed.
Our LHS is the SARIF that we expect Clang to output (i.e. it's a test case). The test is embedded in the source file as the source of truth (here's an example of what it would look like*). Although we could technically update *The |
No worries! I got distracted on Friday, so I appreciate your patience :)
We don't care about the length at the moment because I only recently learnt that our current (non-JSON-friendly) tool supports regex, so this isn't a tall order.
Good question. I'll mull this one over and get back to you in the next day or so (though if path masking gets added, that may suffice). Very much appreciate how much time and thought you've put into this, thank you :) |
A third option is for |
There are apparently some infra reasons we can't use Go, so unfortunately Thanks for all your assistance on this issue, and for making a really cool utility! |
@cjdb I'm glad the tool is useful and I'm sorry you can't use Go! Keep me posted if you do create a port. I would love to share ideas and keep them compatible. I'm in the process of implementing a 2.0 version of the format. I need to make some backward incompatible changes to add context for producing minimal diffs: #50 |
I'm reopening this issue because it's a feature that is still useful. Even if the requestor doesn't need it anymore. |
I'm working on a project that wants to use jd to check our program output, but there are some fields that contain absolute paths. In order to have a robust test suite, we only hardcode the relative path, starting from the project directory, and fill the rest in with a regular expression. For example:
Our current tool interprets
{{.*}}
as a regular expression. Sadly, it's not a JSON diff tool, so we need to do some other, unreadable things to get it to work (hence the desire to use jd). Being able to use regular expressions in jd would be wonderful. Is there any appetite for such an extension?