Prerequisites

Before you can access data in the Data Catalog or work with Data Flow, a few setup steps must be completed.

To configure Data Flow, you will need to:

  • Create a bucket to store the Data Flow logs and a data warehouse bucket; Data Flow requires both. Refer to the Data Flow documentation for setting up storage.

  • Set the IAM policies that allow Data Flow to access resources and to manage and run applications and sessions. Refer to the Data Flow documentation on how to set up policies.

  • Publish your conda pack to OCI Object Storage; Data Flow natively supports conda packs published there. Ensure that the Data Flow resource has read access to the bucket or path of your published conda pack, and that the Spark version is 3 or later when running your Data Flow application or session. A configuration sketch follows this list.
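
The sketch below shows one way these pieces come together once the buckets, policies, and published conda pack exist, using the OCI Python SDK to create a Data Flow application. It is a minimal illustration under those assumptions, not the required setup: every OCID, bucket name, namespace, path, and shape is a placeholder, and the `spark.archives` entry is one common way to attach a published conda pack.

```python
# Minimal sketch: wiring the prerequisites above into a Data Flow application with the
# OCI Python SDK. All OCIDs, bucket names, namespaces, paths, and shapes are placeholders.
import oci

config = oci.config.from_file()                  # API-key auth; signers work as well
df_client = oci.data_flow.DataFlowClient(config)

details = oci.data_flow.models.CreateApplicationDetails(
    compartment_id="ocid1.compartment.oc1..<unique_id>",
    display_name="my-dataflow-app",
    language="PYTHON",
    spark_version="3.2.1",                       # conda packs need Spark 3 or later
    driver_shape="VM.Standard2.1",
    executor_shape="VM.Standard2.1",
    num_executors=1,
    file_uri="oci://<code-bucket>@<namespace>/app.py",
    logs_bucket_uri="oci://<logs-bucket>@<namespace>/",            # bucket for Data Flow logs
    warehouse_bucket_uri="oci://<warehouse-bucket>@<namespace>/",  # data warehouse bucket
    configuration={
        # Attach the conda pack published to Object Storage; the Data Flow resource
        # must have read access to this path.
        "spark.archives": "oci://<conda-bucket>@<namespace>/conda_environments/<pack>#conda",
    },
)

app = df_client.create_application(details).data
print("Created Data Flow application:", app.id)
```

The same values (logs bucket, warehouse bucket, Spark 3.x, conda pack URI) apply when running a Data Flow session instead of an application.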

To configure Data Catalog, you will need to:

  • Set the IAM policies that Data Catalog requires. Refer to the Data Catalog documentation on how to set up policies.

  • Configure the spark-defaults.conf file with the properties needed to reach the Data Catalog. A sketch of supplying the equivalent settings programmatically follows this list.
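
Because spark-defaults.conf holds ordinary Spark key/value properties, the same settings can also be passed programmatically while testing, which is an easy way to confirm they are picked up. The pyspark sketch below is only an illustration: the metastore property name and OCID shown are assumptions, so use the keys and values given in the Data Catalog documentation for your environment.

```python
# Minimal sketch: spark-defaults.conf (under $SPARK_CONF_DIR or $SPARK_HOME/conf) is read
# automatically when a SparkSession starts; the same key/value pairs can also be set here.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("data-catalog-config-check")
    # Illustrative property pointing Spark at the Data Catalog metastore; replace the
    # key and OCID with the values from the Data Catalog documentation.
    .config("spark.hadoop.oracle.dcat.metastore.id", "ocid1.datacatalogmetastore.oc1..<unique_id>")
    .enableHiveSupport()
    .getOrCreate()
)

# Print the catalog-related settings the session actually picked up.
for key, value in spark.sparkContext.getConf().getAll():
    if "dcat" in key or "catalog" in key.lower():
        print(key, "=", value)
```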