Adding Debugging Scenario tests for V1 APIs #2937

Merged (5 commits) on Mar 14, 2019

rogancarr (Contributor)

As laid out in #2498, we need scenarios to cover the debugging functionality that we want fully supported in V1.

Scenarios

  • I can see how my data was read in to verify that I specified the schema correctly
  • I can see the output at the end of my pipeline to see which columns are available (score, probability, predicted label)
  • I can look at intermediate steps of the pipeline to debug my model. Example: if I were to have the text "Help I'm a bug!", I should be able to see the steps where it is normalized to "help i'm a bug", then tokenized into ["help", "i'm", "a", "bug"], then mapped into term numbers [203, 25, 3, 511], then projected into the sparse float vector {3:1, 25:1, 203:1, 511:1}, and so on.
  • (P1) I can access the information needed for understanding the progress of my training (e.g. number of trees trained so far out of how many)
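
The intermediate steps in the third scenario can be sketched in plain C#. This is only an illustration of what each stage should expose to someone debugging; it does not use the ML.NET API, the class and method names are hypothetical, and the vocabulary is copied from the example rather than learned from data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a text featurization pipeline; NOT the ML.NET API.
public static class TextStagesSketch
{
    // Term numbers copied from the scenario's example (assumption: a real
    // pipeline would learn this vocabulary from the training data).
    private static readonly Dictionary<string, int> Vocabulary = new Dictionary<string, int>
    {
        ["help"] = 203, ["i'm"] = 25, ["a"] = 3, ["bug"] = 511
    };

    // Stage 1: lowercase, then keep only letters, digits, apostrophes and spaces.
    public static string Normalize(string text) =>
        new string(text.ToLowerInvariant()
            .Where(c => char.IsLetterOrDigit(c) || c == '\'' || c == ' ')
            .ToArray());

    // Stage 2: split the normalized text on spaces.
    public static string[] Tokenize(string normalized) =>
        normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Stage 3: look up each token's term number (throws on unknown tokens).
    public static int[] MapToTerms(string[] tokens) =>
        tokens.Select(t => Vocabulary[t]).ToArray();

    // Stage 4: count occurrences per term number, ordered by term number.
    public static SortedDictionary<int, float> ToSparseVector(int[] terms)
    {
        var vector = new SortedDictionary<int, float>();
        foreach (var term in terms)
        {
            vector.TryGetValue(term, out var count);
            vector[term] = count + 1f;
        }
        return vector;
    }
}
```

Keeping each stage as a separate function mirrors the scenario: the output of every step can be inspected on its own before it is fed to the next one.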

Fixes #2932

codecov bot commented Mar 13, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@4dbc327).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master    #2937   +/-   ##
=========================================
  Coverage          ?   72.23%           
=========================================
  Files             ?      796           
  Lines             ?   142139           
  Branches          ?    16056           
=========================================
  Hits              ?   102668           
  Misses            ?    35091           
  Partials          ?     4380

| Flag | Coverage Δ |
| --- | --- |
| #Debug | 72.23% <100%> (?) |
| #production | 67.98% <ø> (?) |
| #test | 88.39% <100%> (?) |

| Impacted Files | Coverage Δ |
| --- | --- |
| test/Microsoft.ML.Functional.Tests/Debugging.cs | 100% <100%> (ø) |


public LogWatcher()
{
    Lines = new Dictionary<string, int>();

@abgoswam (Member), Mar 14, 2019

Dictionary

Do we need a dictionary for this? #Resolved

@rogancarr (Contributor, Author), Mar 14, 2019

Good question. I want to hold all the lines, keep a count of how many times I see each unique line, and have the lookup be quick. A hash seems like the way to go, and a Dictionary is a type-safe hash (obligatory Stack Overflow link).

What do you think? I could also reverse the lookup, so that we have a dictionary with the lines of choice in them, and use LogWatcher to count the occurrences. Then I could assert on the number of occurrences (0, 1, 2, etc.) back in the main function.
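
As a minimal sketch of the counting approach described above: the class name and the `Observe` and `CountOf` members are assumptions for illustration, not the `LogWatcher` code in this PR; only the `Lines` dictionary of type `Dictionary<string, int>` comes from the diff.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of a line counter; not the LogWatcher from the PR.
public sealed class LineCounterSketch
{
    // Maps each distinct log line to the number of times it has been seen.
    public Dictionary<string, int> Lines { get; } = new Dictionary<string, int>();

    // Record one occurrence of a log line.
    public void Observe(string line)
    {
        // TryGetValue leaves count at 0 when the line has not been seen yet.
        Lines.TryGetValue(line, out var count);
        Lines[line] = count + 1;
    }

    // Number of times a line was observed; 0 if it never appeared.
    public int CountOf(string line) =>
        Lines.TryGetValue(line, out var count) ? count : 0;
}
```

A test could then feed every log message to `Observe` and assert on `CountOf` for each expected line.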



Contributor Author

I'll keep it as is, because the perf tradeoff is negligible.



new TweetSentiment[]
{
    new TweetSentiment { Sentiment = true, SentimentText = "I love ML.NET." },
    new TweetSentiment { Sentiment = true, SentimentText = "I love TLC." },

@abgoswam (Member), Mar 14, 2019

TLC? #Resolved

Contributor Author

Tiny little cakes?



Contributor

lol, Tender Loving Care



@abgoswam (Member) left a comment

:shipit:


// Verify that columns can be inspected.
// Validate the tokens column.
var tokensColumn = transformedData.GetColumn<string[]>(transformedData.Schema["Features_TransformedText"]);

@shmoradims (Contributor), Mar 14, 2019

Features_TransformedText

Where is this magic string coming from? #Resolved

Contributor Author

This magic string is the name of the tokenized column. It takes the output column name you give it and returns ${OutputColumnName}_TransformedText. I'll file a separate issue on it.



Contributor Author

Issue #2957



var expectedLines = new string[3] {
    @"[Source=SdcaTrainerBase; Training, Kind=Info] Auto-tuning parameters: L2 = 0.001.",
    @"[Source=SdcaTrainerBase; Training, Kind=Info] Auto-tuning parameters: L1Threshold (L1/L2) = 0.",
    @"[Source=SdcaTrainerBase; Training, Kind=Info] Using best model from iteration 7."};

@shmoradims (Contributor), Mar 14, 2019

Is this text guaranteed to be constant across runs and OSs? #Resolved

Contributor Author

Good question!

A fixed seed gives consistency across runs, and the tests passing on every OS indicates that the text is constant across OSs too.



@shmoradims (Contributor) left a comment

:shipit:

rogancarr merged commit d794383 into dotnet:master on Mar 14, 2019
rogancarr deleted the 2932_debugging_scenarios branch on March 14, 2019 18:40
ghost locked as resolved and limited conversation to collaborators on Mar 23, 2022