CodeCamp #147 [Doc] Add Chinese version of train & test tutorial #2355

Merged: 5 commits, Dec 12, 2022

Conversation

BLUE-coconut
Contributor

Thanks for your contribution; we appreciate it a lot. Following the instructions below will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMDet3D.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@mm-assistant

mm-assistant bot commented Nov 28, 2022

We recommend using English or English & Chinese for pull requests so that we could have broader discussion.

@CLAassistant

CLAassistant commented Nov 28, 2022

CLA assistant check
All committers have signed the CLA.

@xiexinch xiexinch self-assigned this Nov 28, 2022
@xiexinch xiexinch changed the title from "英译中,完成4_train_test.md中文教程文档" to "[Doc] Add Chinese version of train & test tutorial" Nov 28, 2022
@@ -0,0 +1,223 @@
# 教程4:使用现有模型进行训练和测试

MMsegmentation 支持在多种设备上训练和测试模型。如下文,具体方式分别为单GPU、分布式、族群式的训练和测试。通过本教程,你将知晓如何用MMsegmentation提供的脚本进行训练和测试。

Suggested change
MMsegmentation 支持在多种设备上训练和测试模型。如下文,具体方式分别为单GPU、分布式、族群式的训练和测试。通过本教程,你将知晓如何用MMsegmentation提供的脚本进行训练和测试
MMsegmentation 支持在多种设备上训练和测试模型。如下文,具体方式分别为单GPU、分布式以及计算集群的训练和测试。通过本教程,你将知晓如何用 MMsegmentation 提供的脚本进行训练和测试


- `--work-dir ${工作路径}`: 重新指定工作路径
- `--amp`: 使用自动混合精度计算
- `--resume`: 从工作路径中调用保存的最新的模型权重文件(checkpoint)

Suggested change
- `--resume`: 从工作路径中调用保存的最新的模型权重文件(checkpoint)
- `--resume`: 从工作路径中保存的最新检查点文件(checkpoint)恢复训练

- `--work-dir ${工作路径}`: 重新指定工作路径
- `--amp`: 使用自动混合精度计算
- `--resume`: 从工作路径中调用保存的最新的模型权重文件(checkpoint)
- `--cfg-options ${需更新的具体配置}`: 覆盖已载入的配置中的部分设置,并且 以 xxx=yyy 格式的键值对 将被合并到config配置文件中。

Suggested change
- `--cfg-options ${需更新的具体配置}`: 覆盖已载入的配置中的部分设置,并且 以 xxx=yyy 格式的键值对 将被合并到config配置文件中
- `--cfg-options ${需覆盖的配置}`: 覆盖已载入的配置中的部分设置,并且 以 xxx=yyy 格式的键值对 将被合并到 config 配置文件中
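
The `xxx=yyy` merging that `--cfg-options` describes can be pictured with a minimal sketch. This is not MMSegmentation's or MMEngine's actual implementation: `merge_cfg_options` is a hypothetical helper, the config is a plain nested `dict`, and the base port value `29500` is only an illustrative starting value.

```python
from copy import deepcopy


def merge_cfg_options(cfg, options):
    """Return a copy of cfg with dotted-key overrides applied (hypothetical helper)."""
    merged = deepcopy(cfg)
    for dotted_key, value in options.items():
        node = merged
        *parents, leaf = dotted_key.split(".")
        # Walk down the nested dicts, creating intermediate levels if missing.
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return merged


# Illustrative base config; the values are assumptions, not project defaults.
base = {"env_cfg": {"dist_cfg": {"backend": "nccl", "port": 29500}}}
overridden = merge_cfg_options(base, {"env_cfg.dist_cfg.port": 29501})
print(overridden["env_cfg"]["dist_cfg"])  # {'backend': 'nccl', 'port': 29501}
```

Only the key named on the command line changes; sibling keys such as `backend` keep the value loaded from the config file.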


下面是对于多GPU测试的可选参数:

- `--launcher`: 用来分布式任务初始化运载器。允许选择的参数值有 `none`, `pytorch`, `slurm`, `mpi`。特别的,如果设置为none,测试将非分布式模式下进行。

Suggested change
- `--launcher`: 用来分布式任务初始化运载器。允许选择的参数值有 `none`, `pytorch`, `slurm`, `mpi`。特别的,如果设置为none,测试将非分布式模式下进行。
- `--launcher`: 执行器的启动方式。允许选择的参数值有 `none`, `pytorch`, `slurm`, `mpi`。特别的,如果设置为 none,测试将在非分布式模式下进行。

- `--launcher`: 用来分布式任务初始化运载器。允许选择的参数值有 `none`, `pytorch`, `slurm`, `mpi`。特别的,如果设置为none,测试将非分布式模式下进行。
- `--local_rank`: 分布式中进程的序号。如果没有指定,默认设置为0。

**注意:** 在config配置文件中 `--resume` 和 field `load_from` 的不同之处:

Suggested change
**注意:** 在config配置文件中 `--resume` 和 field `load_from` 的不同之处:
**注意:** 命令行参数 `--resume` 和在配置文件中的参数 `load_from` 的不同之处:

基础用法如下:

```shell
[GPUS=${GPUS}] sh tools/slurm_test.sh ${划分} ${进程名} ${配置文件} ${检查点文件} [可选参数]

Suggested change
[GPUS=${GPUS}] sh tools/slurm_test.sh ${划分} ${进程名} ${配置文件} ${检查点文件} [可选参数]
[GPUS=${GPUS}] sh tools/slurm_test.sh ${分区} ${进程名} ${配置文件} ${检查点文件} [可选参数]

[GPUS=${GPUS}] sh tools/slurm_test.sh ${划分} ${进程名} ${配置文件} ${检查点文件} [可选参数]
```

你可以检查 [the source code](../../../tools/slurm_test.sh) 来查看全部的参数和环境变量。

Suggested change
你可以检查 [the source code](../../../tools/slurm_test.sh) 来查看全部的参数和环境变量。
你可以通过 [源码](../../../tools/slurm_test.sh) 来查看全部的参数和环境变量。

GPUS=4 GPUS_PER_NODE=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${工作路径} --cfg-options env_cfg.dist_cfg.port=29501
```

2. 通过修改config配置文件,设置不同的通讯端口:

Suggested change
2. 通过修改config配置文件,设置不同的通讯端口
2. 通过修改配置文件设置不同的通讯端口

env_cfg = dict(dist_cfg=dict(backend='nccl', port=29501))
```

然后你可以通过 config1.py 和 config2.py 同时进行两个任务:

Suggested change
然后你可以通过 config1.py 和 config2.py 同时进行两个任务
然后你可以通过 config1.py 和 config2.py 同时启动两个任务
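
Two jobs launched at the same time need distinct communication ports. As a rough sketch, one way to pick them is to ask the operating system for free ports and turn each into an `env_cfg.dist_cfg.port` override. `find_free_ports` is a hypothetical helper, not part of MMSegmentation, and a port reported as free here could in principle be taken by another process before the job starts.

```python
import socket


def find_free_ports(count):
    """Ask the OS for `count` distinct free TCP ports on localhost (hypothetical helper)."""
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Port 0 lets the OS choose any currently unused port.
            sock.bind(("127.0.0.1", 0))
            sockets.append(sock)
        # All sockets are still open here, so the ports cannot collide.
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        for sock in sockets:
            sock.close()


ports = find_free_ports(2)
overrides = [f"env_cfg.dist_cfg.port={port}" for port in ports]
for config, override in zip(["config1.py", "config2.py"], overrides):
    print(config, "--cfg-options", override)
```

Each printed override can be appended to the corresponding launch command, so the two configs do not have to be edited by hand.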

CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${划分} ${进程名} config2.py ${工作路径}
```

3. 使用环境变量设置命令中的端口 'MASTER_PORT':

Suggested change
3. 使用环境变量设置命令中的端口 'MASTER_PORT'
3. 在命令行中通过环境变量 `MASTER_PORT` 设置端口

@xiexinch xiexinch changed the title from "[Doc] Add Chinese version of train & test tutorial" to "CodeCamp #147 [Doc] Add Chinese version of train & test tutorial" Dec 5, 2022
docs/zh_cn/user_guides/4_train_test.md (outdated, resolved)
@MeowZheng MeowZheng merged commit 7edb141 into open-mmlab:dev-1.x Dec 12, 2022
MeowZheng pushed a commit to MeowZheng/mmsegmentation that referenced this pull request Dec 30, 2022
…orial open-mmlab#2355

* doc

* modify part of content

* changed parts of content

* modified

* Update docs/zh_cn/user_guides/4_train_test.md

Co-authored-by: 谢昕辰 <[email protected]>
aravind-h-v pushed a commit to aravind-h-v/mmsegmentation that referenced this pull request Mar 27, 2023
…-mmlab#2355)

correctly locate 3rd file; also correct misleading docs
wjkim81 pushed a commit to wjkim81/mmsegmentation that referenced this pull request Dec 3, 2023

5 participants