Prerequisites

Before you can access data in Data Catalog or work with Data Flow, you must complete several setup steps.

To configure Data Flow:

  • Data Flow requires a bucket for storing logs and a data warehouse bucket. Refer to the Data Flow documentation for setting up storage.

  • Data Flow requires IAM policies that grant access to the resources used to manage and run applications. Refer to the Data Flow documentation on how to set up policies.

  • Data Flow natively supports conda packs published to OCI Object Storage. Ensure that the Data Flow resource has read access to the bucket or path of your published conda pack, and that your Data Flow Application runs Spark version 3 or later.

  • Configure the core-site.xml file.
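The core-site.xml step can be illustrated with a minimal sketch. It assumes that core-site.xml is used here to configure the OCI HDFS connector for Object Storage access with API-key authentication; the `fs.oci.client.*` property names reflect our understanding of that connector and should be checked against its documentation. All values are hypothetical placeholders, and the helpers `build_core_site` and `spark_version_ok` are illustrative names, not part of any Oracle library. The second helper expresses the Spark 3 or later requirement from the conda pack item.

```python
"""Sketch: write a Hadoop-style core-site.xml and check the Spark 3+ requirement.

Assumption: the fs.oci.client.* keys are OCI HDFS connector settings; every
value below is a hypothetical placeholder to be replaced with your own.
"""
import xml.etree.ElementTree as ET
from typing import Dict


def build_core_site(properties: Dict[str, str]) -> str:
    """Return a core-site.xml document with one <property> per key/value pair."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(configuration, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body


def spark_version_ok(version: str, minimum_major: int = 3) -> bool:
    """True if a version string such as "3.2.1" has a major version >= 3."""
    major = int(version.split(".")[0])
    return major >= minimum_major


# Hypothetical placeholder values for API-key authentication.
oci_properties = {
    "fs.oci.client.hostname": "https://objectstorage.us-ashburn-1.oraclecloud.com",
    "fs.oci.client.auth.tenantId": "EXAMPLE_TENANCY_OCID",
    "fs.oci.client.auth.userId": "EXAMPLE_USER_OCID",
    "fs.oci.client.auth.fingerprint": "EXAMPLE_KEY_FINGERPRINT",
    "fs.oci.client.auth.pemfilepath": "/path/to/oci_api_key.pem",
}

if __name__ == "__main__":
    print(build_core_site(oci_properties))
    print(spark_version_ok("3.2.1"))
```

Generating the file from a dictionary keeps the placeholders in one place; you would write the returned string to the core-site.xml location your Spark installation reads.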

To configure Data Catalog:

  • Data Catalog requires IAM policies. Refer to the Data Catalog documentation on how to set up policies.

  • Configure the spark-defaults.conf file.
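The spark-defaults.conf step can be sketched in the same way. Spark reads this file as one key and value per line, separated by whitespace, with `#` starting a comment. The sketch assumes the Data Catalog settings are `spark.hadoop.oracle.dcat.metastore.id` (the Data Catalog metastore) and `spark.sql.warehouse.dir` (the warehouse bucket); confirm these names in the Data Catalog documentation. The values are hypothetical placeholders, and `render_spark_defaults` and `parse_spark_defaults` are illustrative helpers only.

```python
"""Sketch: render and re-read a spark-defaults.conf for Data Catalog.

Assumption: the property names and placeholder values below are illustrative;
verify them against the Data Catalog documentation before use.
"""
from typing import Dict


def render_spark_defaults(settings: Dict[str, str]) -> str:
    """Format settings as aligned 'key  value' lines, one per property."""
    if not settings:
        return ""
    width = max(len(key) for key in settings)
    lines = [f"{key:<{width}}  {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse whitespace-separated key/value lines, skipping blanks and comments."""
    result: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        result[parts[0]] = parts[1] if len(parts) > 1 else ""
    return result


# Hypothetical placeholder values.
catalog_settings = {
    "spark.hadoop.oracle.dcat.metastore.id": "<metastore-ocid>",
    "spark.sql.warehouse.dir": "oci://<warehouse-bucket>@<namespace>/",
}

if __name__ == "__main__":
    conf_text = render_spark_defaults(catalog_settings)
    print(conf_text, end="")
    # Round-trip check: parsing the rendered text recovers the same settings.
    print(parse_spark_defaults(conf_text) == catalog_settings)
```

The parser is included only to show that the rendered file round-trips; Spark itself does the real parsing when it starts.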