[ML] DF Analytics: create classification jobs via the UI #51619

Merged
Original file line number Diff line number Diff line change
@@ -31,6 +31,14 @@ interface RegressionAnalysis {
};
}

interface ClassificationAnalysis {
classification: {
dependent_variable: string;
training_percent?: number;
num_top_classes?: string;
};
}

export const SEARCH_SIZE = 1000;

export const defaultSearchQuery = {
@@ -77,11 +85,16 @@ interface LoadEvaluateResult {
error: string | null;
}

type AnalysisConfig = OutlierAnalysis | RegressionAnalysis | GenericAnalysis;
type AnalysisConfig =
| OutlierAnalysis
| RegressionAnalysis
| ClassificationAnalysis
| GenericAnalysis;

export enum ANALYSIS_CONFIG_TYPE {
OUTLIER_DETECTION = 'outlier_detection',
REGRESSION = 'regression',
CLASSIFICATION = 'classification',
UNKNOWN = 'unknown',
}

@@ -100,6 +113,10 @@ export const getDependentVar = (analysis: AnalysisConfig) => {
if (isRegressionAnalysis(analysis)) {
depVar = analysis.regression.dependent_variable;
}

if (isClassificationAnalysis(analysis)) {
depVar = analysis.classification.dependent_variable;
}
return depVar;
};

@@ -132,6 +149,11 @@ export const isRegressionAnalysis = (arg: any): arg is RegressionAnalysis => {
return keys.length === 1 && keys[0] === ANALYSIS_CONFIG_TYPE.REGRESSION;
};

export const isClassificationAnalysis = (arg: any): arg is ClassificationAnalysis => {
const keys = Object.keys(arg);
return keys.length === 1 && keys[0] === ANALYSIS_CONFIG_TYPE.CLASSIFICATION;
};
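
The single-key type guards in this hunk can be tried in isolation. The following is a minimal sketch, assuming trimmed-down copies of the analysis interfaces; `hasOnlyKey` is a hypothetical helper introduced here for illustration and is not part of the PR, and `unknown` is used in place of the PR's `any`.

```typescript
// Trimmed-down, illustrative copies of the analysis types from the diff.
interface RegressionAnalysis {
  regression: { dependent_variable: string; training_percent?: number };
}
interface ClassificationAnalysis {
  classification: { dependent_variable: string; training_percent?: number };
}
interface OutlierAnalysis {
  outlier_detection: Record<string, unknown>;
}
type AnalysisConfig = OutlierAnalysis | RegressionAnalysis | ClassificationAnalysis;

// Hypothetical helper (not in the PR): true when `arg` is an object
// whose one and only own key is `key`.
const hasOnlyKey = (arg: unknown, key: string): boolean => {
  if (typeof arg !== 'object' || arg === null) return false;
  const keys = Object.keys(arg);
  return keys.length === 1 && keys[0] === key;
};

const isRegressionAnalysis = (arg: unknown): arg is RegressionAnalysis =>
  hasOnlyKey(arg, 'regression');

const isClassificationAnalysis = (arg: unknown): arg is ClassificationAnalysis =>
  hasOnlyKey(arg, 'classification');

// Mirrors the shape of getDependentVar in the diff: only analysis types
// that carry a dependent variable return one.
const getDependentVar = (analysis: AnalysisConfig): string | undefined => {
  if (isRegressionAnalysis(analysis)) return analysis.regression.dependent_variable;
  if (isClassificationAnalysis(analysis)) return analysis.classification.dependent_variable;
  return undefined;
};
```

Because the guard requires exactly one top-level key, a config object that mixes two analysis types is rejected by every guard rather than matched by the first one tested.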

export const isRegressionResultsSearchBoolQuery = (
arg: any
): arg is RegressionResultsSearchBoolQuery => {
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ import {
createPermissionFailureMessage,
} from '../../../../../privilege/check_privilege';

import { getAnalysisType } from '../../../../common/analytics';
import { getAnalysisType, isClassificationAnalysis } from '../../../../common/analytics';

import { getResultsUrl, isDataFrameAnalyticsRunning, DataFrameAnalyticsListRow } from './common';
import { stopAnalytics } from '../../services/analytics_service';
@@ -26,10 +26,12 @@ export const AnalyticsViewAction = {
render: (item: DataFrameAnalyticsListRow) => {
const analysisType = getAnalysisType(item.config.analysis);
const jobStatus = item.stats.state;
const isDisabled = isClassificationAnalysis(item.config.analysis);
Contributor:
Should we disable the view button for all job types that aren't supported, for cases where the advanced editor has been used to create a type of job that isn't currently supported in the UI?

Contributor Author:
Sounds like a good idea. From what I can see, outlier detection, regression and classification are the only job types, so classification is the only one that we need to check for right now.
Are you suggesting changing the check to be more like this?

const isDisabled = !isRegressionAnalysis(item.config.analysis) && !isOutlierDetectionAnalysis(item.config.analysis);

Contributor:
Yes, I think switching the check to disable the button if the job isn't one of the supported types (currently regression and outlier_detection, with classification to be added soon) makes sense. There aren't any other types coming in the short term, but checking against the known supported types would make it future-proof.

Contributor Author:
Updated in c09aa27
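
The allow-list approach discussed in this thread could look roughly like the sketch below. It is not the code from c09aa27, which is not shown here; `VIEWABLE_TYPES` and `isViewDisabled` are hypothetical names, and the set of viewable types is an assumption taken from the reviewer's comment (regression and outlier detection).

```typescript
enum ANALYSIS_CONFIG_TYPE {
  OUTLIER_DETECTION = 'outlier_detection',
  REGRESSION = 'regression',
  CLASSIFICATION = 'classification',
  UNKNOWN = 'unknown',
}

// Assumption: the job types that have a results view at this point in the PR.
const VIEWABLE_TYPES = new Set<string>([
  ANALYSIS_CONFIG_TYPE.OUTLIER_DETECTION,
  ANALYSIS_CONFIG_TYPE.REGRESSION,
]);

// Hypothetical helper: the view button is disabled unless the analysis
// object has exactly one key and that key is a known viewable type.
const isViewDisabled = (analysis: Record<string, unknown>): boolean => {
  const keys = Object.keys(analysis);
  const onlyKey = keys.length === 1 ? keys[0] : undefined;
  return onlyKey === undefined || !VIEWABLE_TYPES.has(onlyKey);
};
```

Checking against the supported set means a job created through the advanced editor with a type the UI does not know about is disabled by default, and supporting a new type later only requires adding it to the set.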


const url = getResultsUrl(item.id, analysisType, jobStatus);
return (
<EuiButtonEmpty
isDisabled={isDisabled}
onClick={() => (window.location.href = url)}
size="xs"
color="text"
Original file line number Diff line number Diff line change
@@ -53,6 +53,8 @@ const NUMERICAL_FIELD_TYPES = new Set([
'scaled_float',
]);

const SUPPORTED_CLASSIFICATION_FIELD_TYPES = new Set(['boolean', 'text', 'keyword', 'ip']);

// List of system fields we want to ignore for the numeric field check.
const OMIT_FIELDS: string[] = ['_source', '_type', '_index', '_id', '_version', '_score'];

@@ -111,6 +113,18 @@ export const CreateAnalyticsForm: FC<CreateAnalyticsFormProps> = ({ actions, sta
}
};

// Regression supports numeric fields. Classification supports numeric, boolean, text, keyword and ip.
const shouldAddFieldOption = (field: Field) => {
Contributor:
Boolean types aren't currently returned from the newJobCapsService. You will need to add ES_FIELD_TYPES.BOOLEAN into the list of supported types in server/models/job_service/new_job_caps/field_service.ts, and in turn some of the tests will need editing.

Contributor Author:
Updated newJobCapsService in c09aa27

if (field.id === EVENT_RATE_FIELD_ID) return false;

const isNumerical = NUMERICAL_FIELD_TYPES.has(field.type);
const isSupportedByClassification =
isNumerical || SUPPORTED_CLASSIFICATION_FIELD_TYPES.has(field.type);

if (jobType === JOB_TYPES.REGRESSION) return isNumerical;
if (jobType === JOB_TYPES.CLASSIFICATION) return isNumerical || isSupportedByClassification;
};
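
The filtering rule above (regression accepts numeric fields, classification additionally accepts boolean, text, keyword and ip) can be exercised outside the component. This is a standalone sketch under assumptions: the numeric type list is an illustrative subset, the `EVENT_RATE_FIELD_ID` value is a placeholder, `jobType` is passed as a parameter instead of being read from form state, and `dependentVariableOptions` is a hypothetical wrapper.

```typescript
interface Field {
  id: string;
  type: string;
}
type JobType = 'outlier_detection' | 'regression' | 'classification';

// Placeholder value for illustration only.
const EVENT_RATE_FIELD_ID = '__event_rate_placeholder__';

// Assumption: an illustrative subset of numeric types; the full list in the
// source file is only partially visible in the diff.
const NUMERICAL_FIELD_TYPES = new Set(['long', 'integer', 'double', 'float', 'scaled_float']);
const SUPPORTED_CLASSIFICATION_FIELD_TYPES = new Set(['boolean', 'text', 'keyword', 'ip']);

const shouldAddFieldOption = (field: Field, jobType: JobType): boolean => {
  if (field.id === EVENT_RATE_FIELD_ID) return false;

  const isNumerical = NUMERICAL_FIELD_TYPES.has(field.type);
  if (jobType === 'regression') return isNumerical;
  if (jobType === 'classification') {
    return isNumerical || SUPPORTED_CLASSIFICATION_FIELD_TYPES.has(field.type);
  }
  // Explicit fallback for job types without a dependent variable
  // (the function in the diff falls through and returns undefined here).
  return false;
};

// Hypothetical wrapper: builds the option list for the dependent variable dropdown.
const dependentVariableOptions = (fields: Field[], jobType: JobType): Array<{ label: string }> =>
  fields.filter((field) => shouldAddFieldOption(field, jobType)).map((field) => ({ label: field.id }));
```

Returning an explicit `false` at the end keeps the return type a plain `boolean`, which is a small departure from the function in the diff.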

const loadModelMemoryLimitEstimate = async () => {
try {
const jobConfig = getJobConfigFromFormState(form);
@@ -150,7 +164,7 @@ export const CreateAnalyticsForm: FC<CreateAnalyticsFormProps> = ({ actions, sta
const options: Array<{ label: string }> = [];

fields.forEach((field: Field) => {
if (NUMERICAL_FIELD_TYPES.has(field.type) && field.id !== EVENT_RATE_FIELD_ID) {
if (shouldAddFieldOption(field)) {
Contributor:
The comment about filtering for numeric types above here should be updated too, now that the filtering is doing more checks.

Contributor Author:
Updated in c09aa27

options.push({ label: field.id });
}
});
@@ -195,7 +209,10 @@ export const CreateAnalyticsForm: FC<CreateAnalyticsFormProps> = ({ actions, sta
};

useEffect(() => {
if (jobType === JOB_TYPES.REGRESSION && sourceIndexNameEmpty === false) {
if (
(jobType === JOB_TYPES.REGRESSION || jobType === JOB_TYPES.CLASSIFICATION) &&
sourceIndexNameEmpty === false
) {
loadDependentFieldOptions();
} else if (jobType === JOB_TYPES.OUTLIER_DETECTION && sourceIndexNameEmpty === false) {
validateSourceIndexFields();
@@ -205,11 +222,11 @@ export const CreateAnalyticsForm: FC<CreateAnalyticsFormProps> = ({ actions, sta
useEffect(() => {
const hasBasicRequiredFields =
jobType !== undefined && sourceIndex !== '' && sourceIndexNameValid === true;
const jobTypesWithDepVar =
jobType === JOB_TYPES.REGRESSION || jobType === JOB_TYPES.CLASSIFICATION;

const hasRequiredAnalysisFields =
(jobType === JOB_TYPES.REGRESSION &&
dependentVariable !== '' &&
trainingPercent !== undefined) ||
(jobTypesWithDepVar && dependentVariable !== '' && trainingPercent !== undefined) ||
jobType === JOB_TYPES.OUTLIER_DETECTION;

if (hasBasicRequiredFields && hasRequiredAnalysisFields) {
@@ -401,7 +418,7 @@ export const CreateAnalyticsForm: FC<CreateAnalyticsFormProps> = ({ actions, sta
isInvalid={!destinationIndexNameEmpty && !destinationIndexNameValid}
/>
</EuiFormRow>
{jobType === JOB_TYPES.REGRESSION && (
{(jobType === JOB_TYPES.REGRESSION || jobType === JOB_TYPES.CLASSIFICATION) && (
peteharverson marked this conversation as resolved.
<Fragment>
<EuiFormRow
label={i18n.translate(
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ import { AnalyticsJobType, JOB_TYPES } from '../../hooks/use_create_analytics_fo

interface Props {
type: AnalyticsJobType;
setFormState: any; // TODO update type
setFormState: React.Dispatch<React.SetStateAction<any>>;
}

export const JobType: FC<Props> = ({ type, setFormState }) => {
@@ -33,9 +33,18 @@ export const JobType: FC<Props> = ({ type, setFormState }) => {
}
);

const classificationHelpText = i18n.translate(
'xpack.ml.dataframe.analytics.create.classificationHelpText',
{
defaultMessage:
'Classification supports fields that are numeric, boolean, text, keyword and ip. It is also tolerant of missing values. Please use the advanced editor to apply custom options such as num top classes.',
peteharverson marked this conversation as resolved.
}
);

const helpText = {
outlier_detection: outlierHelpText,
regression: regressionHelpText,
classification: classificationHelpText,
};

return (
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ import { DataFrameAnalyticsConfig } from '../../../../common';

import { ACTION } from './actions';
import { reducer, validateAdvancedEditor } from './reducer';
import { getInitialState } from './state';
import { getInitialState, JOB_TYPES } from './state';

jest.mock('ui/index_patterns', () => ({
validateIndexPattern: () => true,
@@ -51,6 +51,7 @@ describe('useCreateAnalyticsForm', () => {
destinationIndex: 'the-destination-index',
jobId: 'the-analytics-job-id',
sourceIndex: 'the-source-index',
jobType: JOB_TYPES.OUTLIER_DETECTION,
},
});
expect(updatedState.isValid).toBe(true);
Original file line number Diff line number Diff line change
@@ -22,7 +22,11 @@ import {
JOB_ID_MAX_LENGTH,
ALLOWED_DATA_UNITS,
} from '../../../../../../../common/constants/validation';
import { getDependentVar, isRegressionAnalysis } from '../../../../common/analytics';
import {
getDependentVar,
isRegressionAnalysis,
isClassificationAnalysis,
} from '../../../../common/analytics';

const mmlAllowedUnitsStr = `${ALLOWED_DATA_UNITS.slice(0, ALLOWED_DATA_UNITS.length - 1).join(
', '
@@ -53,7 +57,7 @@ const getSourceIndexString = (state: State) => {
};

export const validateAdvancedEditor = (state: State): State => {
const { jobIdEmpty, jobIdValid, jobIdExists, jobType, createIndexPattern } = state.form;
const { jobIdEmpty, jobIdValid, jobIdExists, createIndexPattern } = state.form;
const { jobConfig } = state;

state.advancedEditorMessages = [];
@@ -89,9 +93,9 @@ export const validateAdvancedEditor = (state: State): State => {
}

let dependentVariableEmpty = false;
if (isRegressionAnalysis(jobConfig.analysis)) {
if (isRegressionAnalysis(jobConfig.analysis) || isClassificationAnalysis(jobConfig.analysis)) {
const dependentVariableName = getDependentVar(jobConfig.analysis) || '';
dependentVariableEmpty = jobType === JOB_TYPES.REGRESSION && dependentVariableName === '';
dependentVariableEmpty = dependentVariableName === '';
}

if (sourceIndexNameEmpty) {
@@ -201,7 +205,10 @@ const validateForm = (state: State): State => {
modelMemoryLimit,
} = state.form;

const dependentVariableEmpty = jobType === JOB_TYPES.REGRESSION && dependentVariable === '';
const jobTypeEmpty = jobType === undefined;
const dependentVariableEmpty =
(jobType === JOB_TYPES.REGRESSION || jobType === JOB_TYPES.CLASSIFICATION) &&
dependentVariable === '';
const modelMemoryLimitEmpty = modelMemoryLimit === '';

if (!modelMemoryLimitEmpty && modelMemoryLimit !== undefined) {
@@ -210,6 +217,7 @@ const validateForm = (state: State): State => {
}

state.isValid =
!jobTypeEmpty &&
state.form.modelMemoryLimitUnitValid &&
!jobIdEmpty &&
jobIdValid &&
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@ export enum DEFAULT_MODEL_MEMORY_LIMIT {
regression = '100mb',
// eslint-disable-next-line @typescript-eslint/camelcase
outlier_detection = '50mb',
classification = '50mb',
Contributor:
Probably makes more sense to have the default as 100mb as it's more similar to regression.

}

export type EsIndexName = string;
@@ -34,6 +35,7 @@ export interface FormMessage {
export enum JOB_TYPES {
OUTLIER_DETECTION = 'outlier_detection',
REGRESSION = 'regression',
CLASSIFICATION = 'classification',
}

export interface State {
@@ -149,9 +151,12 @@ export const getJobConfigFromFormState = (
model_memory_limit: formState.modelMemoryLimit,
};

if (formState.jobType === JOB_TYPES.REGRESSION) {
if (
formState.jobType === JOB_TYPES.REGRESSION ||
formState.jobType === JOB_TYPES.CLASSIFICATION
) {
jobConfig.analysis = {
regression: {
[formState.jobType]: {
dependent_variable: formState.dependentVariable,
training_percent: formState.trainingPercent,
},
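
The last hunk swaps the hard-coded `regression` key for a computed property key, so one branch serves both job types that take a dependent variable. A minimal standalone sketch of that pattern follows; `buildSupervisedAnalysis` and its parameter list are hypothetical and stand in for the relevant part of `getJobConfigFromFormState`.

```typescript
type DepVarJobType = 'regression' | 'classification';

interface SupervisedAnalysisBody {
  dependent_variable: string;
  training_percent: number;
}

// Hypothetical helper: the job type string itself becomes the single
// top-level key of the analysis object via a computed property name.
const buildSupervisedAnalysis = (
  jobType: DepVarJobType,
  dependentVariable: string,
  trainingPercent: number
): Record<string, SupervisedAnalysisBody> => ({
  [jobType]: {
    dependent_variable: dependentVariable,
    training_percent: trainingPercent,
  },
});

// Example: a classification job on a hypothetical `species` field.
const example = buildSupervisedAnalysis('classification', 'species', 80);
console.log(JSON.stringify(example));
```

This works because the `JOB_TYPES` enum values in the diff (`'regression'`, `'classification'`) are the same strings used as analysis keys in the job config, which is also what the single-key type guards rely on.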