Prerequisites

Before you can access data in the Data Catalog or work with Data Flow, complete the setup steps below.

To configure Data Flow you will need to:

  • Set up a bucket to store the logs, and a data warehouse bucket. Refer to the Data Flow documentation for setting up storage.

  • Set policies in IAM that grant access to the resources needed to manage and run applications. Refer to the Data Flow documentation on how to set up policies.

  • Configure the core-site.xml file.

To configure Data Catalog you will need to:

  • Set policies in IAM. Refer to the Data Catalog documentation on how to set up policies.

  • Configure the spark-defaults.conf file.
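
As a quick sanity check on the two configuration files named above, the following minimal Python sketch reports whether each file exists and how many entries it contains. It rests on several assumptions that are not stated in this document: that both files sit in one directory that you pass in, that core-site.xml uses the usual Hadoop layout of <property> elements with <name> and <value> children, and that spark-defaults.conf uses whitespace-separated "key value" lines. The function names are hypothetical helpers, not part of any library, and the sketch does not check that the settings themselves are correct.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def read_core_site(path):
    """Return {name: value} for each <property> in a Hadoop-style XML file."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def read_spark_defaults(path):
    """Return {key: value} from a spark-defaults.conf style file.

    Simplification: only whitespace-separated "key value" lines are handled.
    """
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def check_config_files(conf_dir):
    """Map each file name to its entry count, or None if the file is missing."""
    conf_dir = Path(conf_dir)  # assumed location; adjust to your environment
    readers = {
        "core-site.xml": read_core_site,
        "spark-defaults.conf": read_spark_defaults,
    }
    report = {}
    for filename, reader in readers.items():
        path = conf_dir / filename
        report[filename] = len(reader(path)) if path.is_file() else None
    return report
```

Calling check_config_files with the directory that holds your configuration returns a dictionary keyed by file name. A value of None means the file was not found there, which suggests the corresponding step in the lists above is still outstanding.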