Plots diff for different file paths in current workspace #5808
Replies: 14 comments 2 replies
-
@pared Thoughts on this as a solution to one of the questions raised by @mnrozhkov?
-
@dberenbaum I think that could be a proper solution. It does raise the question of how to approach diffing across revisions versus between different files. Currently, our API is built around the idea that we compare the same files between revisions. I think that we need to
Providing a solution to 2 seems like a breaking change from the user's perspective, so we might need to consider (until 3.0) keeping the current API and providing
Also, there is a question of whether we want to support comparing different files from different revisions at all. I think it's hard to design a CLI interaction that allows this functionality and is pleasant to use at the same time.
-
Hmm, good point. I'm not sure if that's a likely enough scenario to be worth the UI headache, but it does seem odd to leave it out if all the functionality is there. I guess DVC could always keep following the Git syntax: https://stackoverflow.com/questions/43041882/git-how-to-diff-two-different-files-in-different-commits 🤷
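For reference, the Git syntax mentioned here is the `<rev>:<path>` form, which names a file as it exists at a given commit. The sketch below only demonstrates Git's own behavior in a throwaway repository; the file names are borrowed from the example in the issue description, the TSV contents are made up, and nothing here is a DVC command.

```shell
#!/bin/sh
# Sketch: diff two different files taken from two different commits.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "demo@example.com"   # placeholder identity for the demo repo
git config user.name "Demo"

# First commit adds one file (contents are illustrative only).
printf 'threshold\ttpr\n0.5\t0.80\n' > model1_roc.tsv
git add model1_roc.tsv
git commit --quiet -m "add model1"

# Second commit adds a different file.
printf 'threshold\ttpr\n0.5\t0.90\n' > model2_roc.tsv
git add model2_roc.tsv
git commit --quiet -m "add model2"

# <rev>:<path> selects a blob, so two different paths at two
# different revisions can be compared directly.
diff_out=$(git diff HEAD~1:model1_roc.tsv HEAD:model2_roc.tsv)
printf '%s\n' "$diff_out"
```

Each argument carries both a revision and a path, which is what makes the cross-revision, cross-file case expressible in a single command.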
-
@dberenbaum makes sense!
-
Do you really think so? I'm less certain, since it's new and obscure syntax not used elsewhere. Keep in mind that whatever we do with plots will probably need to be implemented across other output types. I think it's probably fine to either borrow the Git syntax or leave out this use case.
-
Was applying to
#!/bin/bash
# Start from a clean Git + DVC repository
rm -rf repo
mkdir repo
pushd repo
git init --quiet
dvc init --quiet
git add -A
git commit -am "init"
# Track a data file, then create a stage whose output name contains a colon
echo data >> data
dvc add data
dvc run -n train -d data -o "a:c" "cat data >> 'a:c'"
Which just adds naming restrictions on outputs on top of the other things we would need to do. So, this use case (cross-revision, cross-files
As to solving the current issue: It seems that we do agree that, for now
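The script above creates an output literally named a:c, which is why a `<rev>:<path>` style argument would collide with valid output names. A minimal sketch of that ambiguity follows. The splitting logic is a hypothetical parser written for illustration, not DVC's actual argument handling, and it assumes a filesystem that allows colons in file names.

```shell
#!/bin/sh
# Sketch: the same string can be read as a plain path or as rev:path.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# A colon is a legal file name character on typical POSIX filesystems.
echo data > 'a:c'

spec='a:c'

# Hypothetical "rev:path" split on the first colon.
rev=${spec%%:*}    # text before the first colon
path=${spec#*:}    # text after the first colon
echo "parsed as rev='$rev' path='$path'"

# The unsplit string is also an existing file.
if [ -e "$spec" ]; then
    echo "'$spec' is also an existing file in the workspace"
fi
```

Both readings are valid at once, so a CLI adopting this syntax would need either an escaping rule or a restriction on output names.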
-
👍 I also realized that this looks like a duplicate of #5074. If you want to close it out in favor of that one, feel free.
-
Most of the discussion is here, so let's not split the conversation.
-
I'm afraid it's not :) Could you remind me what the question was? This seems like a workaround within the current paradigm of plotting functionality in DVC. Here are a few concerns:
-
DVC plots diffs currently rely on comparing the same filepath between different commits. I thought you had asked about a workflow where outputs are spread across multiple filepaths/directories in the same commit/workspace. If we misunderstood, we can consider closing this issue and continue discussing other concerns elsewhere.
-
@mnrozhkov what does the data structure look like in the case of, for example, GridSearch? Would you dump all results into a single file?
-
Yes, I think this is getting a bit off-topic for this issue, but would getting data from many experiments into one file that you could use to generate various plots (or do any other analysis) address some of your concerns, @mnrozhkov?
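As a rough illustration of what "many experiments in one file" could look like, the sketch below merges per-model TSV files into a single file with an extra column naming the source. The file names come from the example in the issue description. The fpr/tpr columns, the series column name, and the sample values are assumptions. Whether a DVC plot template can group by such a column is outside what this sketch shows.

```shell
#!/bin/sh
# Sketch: combine per-experiment TSV files into one file with a "series" column.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Illustrative inputs (column names and values are assumptions).
printf 'fpr\ttpr\n0.0\t0.0\n1.0\t1.0\n' > model1_roc.tsv
printf 'fpr\ttpr\n0.0\t0.1\n1.0\t1.0\n' > model2_roc.tsv

# Emit the header once, then tag every data row with its source file name.
awk 'BEGIN { FS = OFS = "\t" }
     FNR == 1 { if (NR == 1) print "series", $0; next }
     { name = FILENAME; sub(/\.tsv$/, "", name); print name, $0 }' \
    model1_roc.tsv model2_roc.tsv > combined.tsv

cat combined.tsv
```

The merge runs as a single step over existing outputs, so no individual experiment has to append to a shared file.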
-
Hi @dberenbaum, @pared! Sorry, I missed the notifications on your comments. I'm trying to remember which use case we discussed that may have led to this issue. Probably, it was the customer use case with 2 models (on different feature sets):
The main request is: how can plots show/diff handle plots with multiple data series (like GridSearch)?
At the moment, the only option is to save each plot separately and try to compare them. Issue #5693 could actually help with this, @dberenbaum. However, it does not seem to be the right direction for the reasons discussed above. In my opinion, we need to think about
-
Probably yes. In this case we need to think about such a file structure, and this file would need to be changed (by appending new experiment data) by several tasks/pipelines. This does not seem like a good pattern. Another issue in this case is usability. For my personal projects, I really don't like generating plots in a multi-step scenario:
I understand that this is due to DVC's
-
Users may run similar experiment variants in different directories or paths (especially now that foreach syntax makes this structure easy to set up). They may want to do a plot comparing between those paths rather than between git references.
This could borrow from the git diff --no-index syntax (https://mirrors.edge.kernel.org/pub/software/scm/git/docs/git-diff.html). For example:
dvc plots diff --no-index model1_roc.tsv model2_roc.tsv
To keep this simple, I'll ignore the implications for other commands (params/metrics) for now 😄.
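For comparison, this is how the Git command being borrowed from behaves. The sketch runs git diff --no-index on two plain files outside any repository. The dvc plots diff --no-index form above is the proposal under discussion, so it is not run here. The TSV contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: git diff --no-index compares two paths on disk, with no revisions involved.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Illustrative inputs (column names and values are assumptions).
printf 'fpr\ttpr\n0.0\t0.0\n1.0\t1.0\n' > model1_roc.tsv
printf 'fpr\ttpr\n0.0\t0.1\n1.0\t1.0\n' > model2_roc.tsv

# --no-index implies --exit-code, so a status of 1 means "the files differ".
status=0
no_index_out=$(git diff --no-index model1_roc.tsv model2_roc.tsv) || status=$?

printf '%s\n' "$no_index_out"
echo "exit status: $status"
```

The two positional arguments are paths rather than revisions, which is the behavior the proposal would mirror for plots.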