YAML Schema
The distributed training workload is defined in a YAML file
and launched with the `ads opctl run -f path/to/yaml`
command.
distributed schema
Key | Type | Description
---|---|---
kind | string | Must be `distributed`.
apiVersion | string | Must be `v1.0`.
spec | dict | See distributed.spec schema.
distributed.spec schema
Key | Type | Description
---|---|---
infrastructure | dict | See distributed.spec.infrastructure schema.
cluster | dict | See distributed.spec.cluster schema.
runtime | dict | See distributed.spec.runtime schema.
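As a rough sketch of how the top-level keys fit together, the skeleton below fills in only the `kind`, `type`, and `apiVersion` fields and leaves out every nested `spec` block. The runtime `kind` value `python` is an assumption for illustration; the schema only requires a string there.

```yaml
# Skeleton only: the nested spec blocks of each section are omitted.
kind: distributed
apiVersion: v1.0
spec:
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    apiVersion: v1.0
  cluster:
    kind: PYTORCH          # DASK, HOROVOD, and the lowercase forms are also allowed
    apiVersion: v1.0
  runtime:
    kind: python           # assumed value; the schema only constrains the type to string
    apiVersion: v1.0
```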
distributed.spec.infrastructure schema
Key | Type | Description
---|---|---
kind | string | Must be `infrastructure`.
type | string | Must be `dataScienceJob`.
apiVersion | string | Must be `v1.0`.
spec | dict | See distributed.spec.infrastructure.spec schema.
distributed.spec.cluster schema
Key | Type | Description
---|---|---
kind | string | One of `PYTORCH`, `DASK`, `HOROVOD`, `dask`, `pytorch`, or `horovod`.
apiVersion | string | Must be `v1.0`.
spec | dict | See distributed.spec.cluster.spec schema.
distributed.spec.runtime schema
Key | Type | Description
---|---|---
kind | string |
apiVersion | string | Must be `v1.0`.
spec | dict | See distributed.spec.runtime.spec schema.
distributed.spec.infrastructure.spec schema
Key | Type | Description
---|---|---
displayName | string |
compartmentId | string |
projectId | string |
logGroupId | string |
logId | string |
subnetId | string |
shapeName | string |
blockStorageSize | integer | Minimum: 50.
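A minimal sketch of an `infrastructure` section that should satisfy the two infrastructure tables above. Every identifier is a placeholder, and the display name and shape name are assumptions chosen for illustration; substitute values from your own tenancy.

```yaml
infrastructure:
  kind: infrastructure
  type: dataScienceJob
  apiVersion: v1.0
  spec:
    displayName: "distributed-training-example"              # hypothetical name
    compartmentId: "ocid1.compartment.oc1..<unique_id>"       # placeholder
    projectId: "ocid1.datascienceproject.oc1.<unique_id>"     # placeholder
    logGroupId: "ocid1.loggroup.oc1.<unique_id>"              # placeholder
    logId: "ocid1.log.oc1.<unique_id>"                        # placeholder
    subnetId: "ocid1.subnet.oc1.<unique_id>"                  # placeholder
    shapeName: "VM.Standard2.4"                               # assumed shape
    blockStorageSize: 50                                      # integer, must be at least 50
```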
distributed.spec.cluster.spec schema
Key | Type | Description
---|---|---
image | string | URI of the container image.
workDir | string | Object storage URI to store cluster information during the training.
name | string |
config | dict | See distributed.spec.cluster.spec.config schema.
main | dict | See distributed.spec.cluster.spec.main schema.
worker | dict | See distributed.spec.cluster.spec.worker schema.
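The scalar fields of the cluster spec might look like the sketch below. The image and `workDir` values are placeholders, and the `oci://<bucket>@<namespace>/<prefix>` form is an assumption about how the object storage URI is written; the `config`, `main`, and `worker` keys are left out.

```yaml
cluster:
  kind: PYTORCH
  apiVersion: v1.0
  spec:
    image: "<region>.ocir.io/<tenancy>/<repository>:<tag>"   # placeholder container image URI
    workDir: "oci://<bucket>@<namespace>/<prefix>"           # placeholder; URI form is assumed
    name: "PyTorch Distributed"                              # hypothetical cluster name
```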
distributed.spec.runtime.spec schema
Key | Type | Description
---|---|---
type | string |
uri | string | URI of the source code location.
branch | string | Name of the Git repository branch.
commit | string | Git commit (SHA-1 hash).
gitSecretId | string |
entryPoint | string |
kwargs | string |
args | list | List of number or string.
env | list | List of dict. For each element, see distributed.spec.runtime.spec.env schema.
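To illustrate the field types in the runtime spec, here is a hedged sketch. The schema does not list allowed values for `kind` or `type`, so `python` and `git` are assumptions, as are the repository URI, entry point, and argument names.

```yaml
runtime:
  kind: python                                   # assumed value
  apiVersion: v1.0
  spec:
    type: git                                    # assumed value
    uri: "https://github.com/<org>/<repo>.git"   # placeholder source location
    branch: main                                 # hypothetical branch name
    entryPoint: "train.py"                       # hypothetical script
    kwargs: "--epochs 10"                        # a single string, not a mapping
    args:                                        # each item is a number or a string
      - "--data-dir"
      - "/data"
      - 42
```

Note that `kwargs` is typed as a string while `args` is a list, so the two cannot be written interchangeably.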
distributed.spec.cluster.spec.config schema
Key | Type | Description
---|---|---
startOptions | list | List of string.
env | list | List of dict. For each element, see distributed.spec.cluster.spec.config.env schema.
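A small sketch of the `config` block as it would sit under `cluster.spec`. The option string and variable names are hypothetical. According to the Cerberus schema at the end of this page, both `config` and `env` are nullable, so either key may also be left empty.

```yaml
config:
  startOptions:              # list of strings
    - "--example-option"     # hypothetical option
  env:
    - name: LOG_LEVEL        # hypothetical variable
      value: "INFO"
    - name: NUM_THREADS      # hypothetical variable
      value: 4
```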
distributed.spec.cluster.spec.main schema
Key | Type | Description
---|---|---
name | string |
replicas | integer |
config | dict | See distributed.spec.cluster.spec.main.config schema.
distributed.spec.cluster.spec.worker schema
Key | Type | Description
---|---|---
name | string |
replicas | integer |
config | dict | See distributed.spec.cluster.spec.worker.config schema.
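The `main` and `worker` blocks share the same shape. A sketch with hypothetical names and replica counts, placed under `cluster.spec`:

```yaml
main:
  name: "Main"        # hypothetical name
  replicas: 1         # must be an integer
worker:
  name: "Worker"      # hypothetical name
  replicas: 2         # must be an integer
```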
distributed.spec.runtime.spec.env schema
Key | Type | Description
---|---|---
name | string |
value | number or string |
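Because `value` accepts only a number or a string, YAML quoting matters. The sketch below uses hypothetical variable names to show how each form is parsed; an unquoted `true` would be read as a boolean, which is neither of the two accepted types and would likely be rejected.

```yaml
env:
  - name: BATCH_SIZE
    value: 64            # parsed as an integer, which counts as a number
  - name: LEARNING_RATE
    value: 0.001         # parsed as a float, which counts as a number
  - name: PORT
    value: "8080"        # quoted, so it stays a string
  - name: DEBUG
    value: "true"        # quoted to keep it a string rather than a boolean
```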
distributed.spec.cluster.spec.config.env schema
Key | Type | Description
---|---|---
name | string |
value | number or string |
distributed.spec.cluster.spec.main.config schema
Key | Type | Description
---|---|---
env | list | List of dict. For each element, see distributed.spec.cluster.spec.main.config.env schema.
distributed.spec.cluster.spec.worker.config schema
Key | Type | Description
---|---|---
env | list | List of dict. For each element, see distributed.spec.cluster.spec.worker.config.env schema.
distributed.spec.cluster.spec.main.config.env schema
Key | Type | Description
---|---|---
name | string |
value | number or string |
distributed.spec.cluster.spec.worker.config.env schema
Key | Type | Description
---|---|---
name | string |
value | number or string |
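Environment variables can also be set per role through `main.config.env` and `worker.config.env`. The sketch below uses hypothetical variable names; how these entries interact with the cluster-level `config.env` is not specified by the schema.

```yaml
main:
  config:
    env:
      - name: ROLE_LABEL      # hypothetical variable
        value: "main"
worker:
  config:
    env:
      - name: ROLE_LABEL      # hypothetical variable
        value: "worker"
      - name: PREFETCH_COUNT  # hypothetical variable
        value: 2
```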
The following Cerberus schema is used to validate the YAML:
    kind:
      type: string
      allowed:
        - distributed
    apiVersion:
      type: string
      allowed:
        - v1.0
    spec:
      type: dict
      schema:
        infrastructure:
          type: dict
          schema:
            kind:
              type: string
              allowed:
                - infrastructure
            type:
              type: string
              allowed:
                - dataScienceJob
            apiVersion:
              type: string
              allowed:
                - v1.0
            spec:
              type: dict
              schema:
                displayName:
                  type: string
                compartmentId:
                  type: string
                projectId:
                  type: string
                logGroupId:
                  type: string
                logId:
                  type: string
                subnetId:
                  type: string
                shapeName:
                  type: string
                blockStorageSize:
                  type: integer
                  min: 50
        cluster:
          type: dict
          schema:
            kind:
              type: string
              allowed:
                - PYTORCH
                - DASK
                - HOROVOD
                - dask
                - pytorch
                - horovod
            apiVersion:
              type: string
              allowed:
                - v1.0
            spec:
              type: dict
              schema:
                image:
                  type: string
                  meta: URI of the container image.
                workDir:
                  type: string
                  meta: Object storage URI to store cluster information during the training.
                name:
                  type: string
                config:
                  type: dict
                  nullable: true
                  schema:
                    startOptions:
                      type: list
                      schema:
                        type: string
                    env:
                      type: list
                      nullable: true
                      schema:
                        type: dict
                        schema:
                          name:
                            type: string
                          value:
                            type:
                              - number
                              - string
                main:
                  type: dict
                  schema:
                    name:
                      type: string
                    replicas:
                      type: integer
                    config:
                      type: dict
                      nullable: true
                      schema:
                        env:
                          type: list
                          nullable: true
                          schema:
                            type: dict
                            schema:
                              name:
                                type: string
                              value:
                                type:
                                  - number
                                  - string
                worker:
                  type: dict
                  schema:
                    name:
                      type: string
                    replicas:
                      type: integer
                    config:
                      type: dict
                      nullable: true
                      schema:
                        env:
                          type: list
                          nullable: true
                          schema:
                            type: dict
                            schema:
                              name:
                                type: string
                              value:
                                type:
                                  - number
                                  - string
        runtime:
          type: dict
          schema:
            kind:
              type: string
            apiVersion:
              type: string
              allowed:
                - v1.0
            spec:
              type: dict
              schema:
                type:
                  type: string
                uri:
                  type: string
                  meta: URI of the source code location.
                branch:
                  type: string
                  meta: Name of the Git repository branch.
                commit:
                  type: string
                  meta: Git commit (SHA-1 hash).
                gitSecretId:
                  type: string
                entryPoint:
                  type: string
                kwargs:
                  type: string
                args:
                  type: list
                  schema:
                    type:
                      - number
                      - string
                env:
                  type: list
                  nullable: true
                  schema:
                    type: dict
                    schema:
                      name:
                        type: string
                      value:
                        type:
                          - number
                          - string