YAML Schema

A distributed training workload is defined in a YAML file and launched with the command ads opctl run -f path/to/yaml.

distributed schema

Key         Type    Description
kind        string  Must be distributed
apiVersion  string  Must be v1.0
spec        dict    See distributed.spec schema.

distributed.spec schema

Key             Type  Description
infrastructure  dict  See distributed.spec.infrastructure schema.
cluster         dict  See distributed.spec.cluster schema.
runtime         dict  See distributed.spec.runtime schema.
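
As a rough illustration of how the two tables above nest, a skeleton of the top-level layout might look like the following. The empty mappings are placeholders for the sections described in the remaining tables; this is a sketch of the structure, not a complete workload definition.

```yaml
# Top-level layout only; each empty mapping is a placeholder.
kind: distributed
apiVersion: v1.0
spec:
  infrastructure: {}   # see distributed.spec.infrastructure schema
  cluster: {}          # see distributed.spec.cluster schema
  runtime: {}          # see distributed.spec.runtime schema
```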

distributed.spec.infrastructure schema

Key         Type    Description
kind        string  Must be infrastructure
type        string  Must be dataScienceJob
apiVersion  string  Must be v1.0
spec        dict    See distributed.spec.infrastructure.spec schema.

distributed.spec.cluster schema

Key         Type    Description
kind        string  One of PYTORCH, DASK, HOROVOD, dask, pytorch, or horovod
apiVersion  string  Must be v1.0
spec        dict    See distributed.spec.cluster.spec schema.

distributed.spec.runtime schema

Key         Type    Description
kind        string
apiVersion  string  Must be v1.0
spec        dict    See distributed.spec.runtime.spec schema.

distributed.spec.infrastructure.spec schema

Key               Type     Description
displayName       string
compartmentId     string
projectId         string
logGroupId        string
logId             string
subnetId          string
shapeName         string
blockStorageSize  integer  Minimum: 50
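
To make the infrastructure section more concrete, here is a hedged sketch of what it could look like inside spec. Every value is an illustrative assumption: the angle-bracket OCIDs are placeholders to replace with your own, and the display name and shape are examples that the schema does not prescribe.

```yaml
infrastructure:
  kind: infrastructure
  type: dataScienceJob
  apiVersion: v1.0
  spec:
    displayName: distributed-training-example   # hypothetical name
    compartmentId: "<compartment_ocid>"         # placeholder
    projectId: "<project_ocid>"                 # placeholder
    logGroupId: "<log_group_ocid>"              # placeholder
    logId: "<log_ocid>"                         # placeholder
    subnetId: "<subnet_ocid>"                   # placeholder
    shapeName: VM.Standard2.4                   # example shape, an assumption
    blockStorageSize: 50                        # integer, must be at least 50
```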

distributed.spec.cluster.spec schema

Key      Type    Description
image    string  URI of the container image.
workDir  string  Object storage URI to store cluster information during the training.
name     string
config   dict    See distributed.spec.cluster.spec.config schema.
main     dict    See distributed.spec.cluster.spec.main schema.
worker   dict    See distributed.spec.cluster.spec.worker schema.

distributed.spec.runtime.spec schema

Key          Type    Description
type         string
uri          string  URI of the source code location.
branch       string  Name of the Git repository branch.
commit       string  Git commit (SHA-1 hash).
gitSecretId  string
entryPoint   string
kwargs       string
args         list    List of numbers or strings.
env          list    List of dicts. For each element, see distributed.spec.runtime.spec.env schema.

distributed.spec.cluster.spec.config schema

Key           Type  Description
startOptions  list  List of strings.
env           list  List of dicts. For each element, see distributed.spec.cluster.spec.config.env schema.

distributed.spec.cluster.spec.main schema

Key       Type     Description
name      string
replicas  integer
config    dict     See distributed.spec.cluster.spec.main.config schema.

distributed.spec.cluster.spec.worker schema

Key       Type     Description
name      string
replicas  integer
config    dict     See distributed.spec.cluster.spec.worker.config schema.

distributed.spec.runtime.spec.env schema

Key    Type
name   string
value  number or string
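
A runtime section built from the distributed.spec.runtime.spec and distributed.spec.runtime.spec.env tables might look roughly like this. The values are assumptions for illustration: the schema does not restrict runtime kind, so python is only an example, and the repository URI, branch, entry point, arguments, and environment variable are hypothetical.

```yaml
runtime:
  kind: python                  # assumption: the schema accepts any string here
  apiVersion: v1.0
  spec:
    uri: "https://example.com/<org>/<repo>.git"   # placeholder source location
    branch: main                # hypothetical branch name
    entryPoint: train.py        # hypothetical training script
    args:                       # each item may be a number or a string
      - "--epochs"
      - 10
    env:                        # list of name/value pairs
      - name: EXAMPLE_LOG_LEVEL # hypothetical variable
        value: info
```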

distributed.spec.cluster.spec.config.env schema

Key    Type
name   string
value  number or string
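
Putting the cluster tables together, a cluster section could be sketched as follows. The image and workDir values are placeholders, and the names, replica counts, and environment variable are hypothetical; only the key names, types, and the allowed kind values come from the schema.

```yaml
cluster:
  kind: PYTORCH                 # one of the allowed kind values
  apiVersion: v1.0
  spec:
    image: "<container_image_uri>"       # placeholder
    workDir: "<object_storage_uri>"      # placeholder
    name: pytorch-example                # hypothetical cluster name
    config:
      env:                               # cluster-wide name/value pairs
        - name: EXAMPLE_TIMEOUT          # hypothetical variable
          value: 600
    main:
      name: main
      replicas: 1
    worker:
      name: worker
      replicas: 2
```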

distributed.spec.cluster.spec.main.config schema

Key  Type  Description
env  list  List of dicts. For each element, see distributed.spec.cluster.spec.main.config.env schema.

distributed.spec.cluster.spec.worker.config schema

Key  Type  Description
env  list  List of dicts. For each element, see distributed.spec.cluster.spec.worker.config.env schema.

distributed.spec.cluster.spec.main.config.env schema

Key    Type
name   string
value  number or string

distributed.spec.cluster.spec.worker.config.env schema

Key    Type
name   string
value  number or string
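
The main and worker sections each accept their own env list, in addition to the cluster-level one. The fragment below, which would sit under cluster.spec, sketches this with hypothetical variable names and values.

```yaml
# Fragment of cluster.spec; variable names and values are hypothetical.
main:
  name: main
  replicas: 1
  config:
    env:
      - name: EXAMPLE_ROLE
        value: main       # string value
worker:
  name: worker
  replicas: 2
  config:
    env:
      - name: EXAMPLE_THREADS
        value: 4          # numeric value
```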

The following Cerberus schema is used to validate the YAML:

kind:
  type: string
  allowed:
    - distributed
apiVersion:
  type: string
  allowed:
    - v1.0
spec:
  type: dict
  schema:
    infrastructure:
      type: dict
      schema:
        kind:
          type: string
          allowed:
            - infrastructure
        type:
          type: string
          allowed:
            - dataScienceJob
        apiVersion:
          type: string
          allowed:
            - v1.0
        spec:
          type: dict
          schema:
            displayName:
              type: string
            compartmentId:
              type: string
            projectId:
              type: string
            logGroupId:
              type: string
            logId:
              type: string
            subnetId:
              type: string
            shapeName:
              type: string
            blockStorageSize:
              type: integer
              min: 50
    cluster:
      type: dict
      schema:
        kind:
          type: string
          allowed:
            - PYTORCH
            - DASK
            - HOROVOD
            - dask
            - pytorch
            - horovod
        apiVersion:
          type: string
          allowed:
            - v1.0
        spec:
          type: dict
          schema:
            image:
              type: string
              meta: URI of the container image.
            workDir:
              type: string
              meta: Object storage URI to store cluster information during the training.
            name:
              type: string
            config:
              type: dict
              nullable: true
              schema:
                startOptions:
                  type: list
                  schema:
                    type: string
                env:
                  type: list
                  nullable: true
                  schema:
                    type: dict
                    schema:
                      name:
                        type: string
                      value:
                        type:
                          - number
                          - string
            main:
              type: dict
              schema:
                name:
                  type: string
                replicas:
                  type: integer
                config:
                  type: dict
                  nullable: true
                  schema:
                    env:
                      type: list
                      nullable: true
                      schema:
                        type: dict
                        schema:
                          name:
                            type: string
                          value:
                            type:
                              - number
                              - string
            worker:
              type: dict
              schema:
                name:
                  type: string
                replicas:
                  type: integer
                config:
                  type: dict
                  nullable: true
                  schema:
                    env:
                      type: list
                      nullable: true
                      schema:
                        type: dict
                        schema:
                          name:
                            type: string
                          value:
                            type:
                              - number
                              - string
    runtime:
      type: dict
      schema:
        kind:
          type: string
        apiVersion:
          type: string
          allowed:
            - v1.0
        spec:
          type: dict
          schema:
            type:
              type: string
            uri:
              type: string
              meta: URI of the source code location.
            branch:
              type: string
              meta: Name of the Git repository branch.
            commit:
              type: string
              meta: Git commit (SHA-1 hash).
            gitSecretId:
              type: string
            entryPoint:
              type: string
            kwargs:
              type: string
            args:
              type: list
              schema:
                type:
                  - number
                  - string
            env:
              type: list
              nullable: true
              schema:
                type: dict
                schema:
                  name:
                    type: string
                  value:
                    type:
                      - number
                      - string
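
One detail of the schema above is that every config and env key is marked nullable: true, so these keys may be left empty. The fragment below is a small sketch of that behaviour; the names and replica counts are hypothetical.

```yaml
cluster:
  kind: dask
  apiVersion: v1.0
  spec:
    name: dask-example    # hypothetical cluster name
    config:               # left empty: parses as null, allowed by nullable: true
    main:
      name: scheduler     # hypothetical name
      replicas: 1
      config:
        env:              # an empty env list is likewise accepted as null
    worker:
      name: worker
      replicas: 2         # must be an integer; a quoted "2" would not match type: integer
```

Keys without nullable: true, such as name or replicas, can be omitted but should not be set to an empty value.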