core-site.xml
¶
The core-site.xml
is used to configure connections to Data Flow. This file can be configured manually or with the aid of the odsc
command-line tool. The best practice is to use the odsc core-site config
command-line tool when you want to connect to Data Flow. It gathers information about your environment and uses that to build the file.
The odsc core-site config
command-line tool has no required parameters. Default values are used or values are taken from your notebook session environment and OCI configuration file. Below is a discussion of common parameters that you may need to override.
The --authentication
option sets the authentication mode. It supports resource principal and API keys. The preferred method for authentication is resource principal and this is sent with --authentication resource_principal
. If you want to use API keys then used the option --authentication api_key
. If the --authentication
is not specified, API keys will be used. When API keys are used, information from the OCI configuration file is used to create the core-site.xml
file.
The Object Storage and the Data Flow are regional services. By default, the region is set to the region that your notebook session is in. This information is taken from the environment variable NB_REGION
. Use the --region
option to override this behavior.
The default location of the core-site.xml
file is in the ~/spark_conf_dir
directory, as defined in the SPARK_CONF_DIR
environment variable. Use the --output
option to define the directory where the file is to be written.
odsc
Command-line¶
The odsc core-site config
command-line tool is ideal for setting up the core-site.xml
file as it gathers information about your environment and uses that to build the file.
You will need to determine what settings are appropriate for your configuration. However, the following will work for most configurations.
odsc core-site config --authentication resource_principal
If the option --authentication api_key
is used, it will extract information from the OCI configuration file that is stored in ~/.oci/config
.
For details on the command-line option use the command:
odsc core-site config --help
Manual¶
The odsc
command-line tool is the preferred method for configuring the core-site.xml
file. However, if you are not in a notebook session or if you have special requirements, you may need to manually configure the file. This section will guide you through the steps.
The core-site.xml
file has the following format. The name of the parameter goes in between the <name> </name>
tags and the value goes in between the <value> </value>
tags. Each parameter is in between the <property> </property>
tags.
<?xml version="1.0"?>
<configuration>
<property>
<name>NAME_1</name>
<value>VALUE_1</value>
</property>
<property>
<name>NAME_2</name>
<value>VALUE_2</value>
</property>
</configuration>
The fs.oci.client.hostname
needs to be specified. It is the address of Object Storage. For example, https://objectstorage.us-ashburn-1.oraclecloud.com
You have to replace us-ashburn-1
with the region you are in.
Depending on the authentication method that is to be used there are additional parameters that need to be set. See the following sections for guidance.
Resource Principals¶
Update the core-site.xml
file parameters to use resource principal to authenticate:
fs.oci.client.custom.authenticator
: Set the value tocom.oracle.bmc.hdfs.auth.ResourcePrincipalsCustomAuthenticator
.
The following example core-site.xml
file illustrates using resource principals for authentication to Object Storage:
<?xml version="1.0"?>
<configuration>
<property>
<name>fs.oci.client.hostname</name>
<value>https://objectstorage.us-ashburn-1.oraclecloud.com</value>
</property>
<property>
<name>fs.oci.client.custom.authenticator</name>
<value>com.oracle.bmc.hdfs.auth.ResourcePrincipalsCustomAuthenticator</value>
</property>
</configuration>
For details, see HDFS connector for Object Storage using a resource principal for authentication.
API Keys¶
Update the core-site.xml
file parameters to use API keys to authenticate:
fs.oci.client.auth.fingerprint
: Fingerprint for the key pair.fs.oci.client.auth.passphrase
: An optional password phrase if the PEM key is encrypted.fs.oci.client.auth.pemfilepath
: The fully qualified file name of the private key used for authentication.fs.oci.client.auth.tenantId
: OCID of your tenancy.fs.oci.client.auth.userId
: Your user OCID.
The values of these parameters are found in the OCI configuration file.
<?xml version="1.0"?>
<configuration>
<property>
<name>fs.oci.client.hostname</name>
<value>https://objectstorage.us-ashburn-1.oraclecloud.com</value>
</property>
<property>
<name>fs.oci.client.auth.tenantId</name>
<value>ocid1.tenancy.oc1..<unique_id></value>
</property>
<property>
<name>fs.oci.client.auth.userId</name>
<value>ocid1.user.oc1..<unique_id></value>
</property>
<property>
<name>fs.oci.client.auth.fingerprint</name>
<value>01:23:45:67:89:ab:cd:ef:01:23:45:67:89:ab:cd:ef</value>
</property>
<property>
<name>fs.oci.client.auth.pemfilepath</name>
<value>/home/datascience/.oci/<filename>.pem</value>
</property>
</configuration>