Oracle Big Data Service
Available with ADS v2.5.10 and greater
To connect to Oracle Big Data Service (BDS), you need the following:
principal: The unique identity to which Kerberos can assign tickets. It is used to generate the Kerberos ticket.
kerb5 config file: The krb5.conf file, which can be copied from /etc/krb5.conf on the master node of the BDS cluster. It is used to generate the Kerberos ticket.
keytab file: The principal's keytab file, which can be downloaded from the master node of the BDS cluster. It is used to generate the Kerberos ticket.
hdfs host: The HDFS host name used to connect to the HDFS file system.
hdfs port: The HDFS port used to connect to the HDFS file system.
hive host: The Hive host name used to connect to the Hive Server.
hive port: The Hive port used to connect to the Hive Server.
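Before saving these values anywhere, it can help to sanity-check them locally. The following is a minimal sketch, not part of ADS: the BDSConnectionParams class and its problems method are hypothetical names introduced here for illustration, and the host names, ports, and paths are made-up example values.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BDSConnectionParams:
    """Hypothetical container for the seven values listed above (not an ADS class)."""

    principal: str
    kerb5_path: str
    keytab_path: str
    hdfs_host: str
    hdfs_port: str
    hive_host: str
    hive_port: str

    def problems(self) -> list:
        """Return human-readable problems; an empty list means every check passed."""
        found = []
        # Identity and host names must not be blank.
        for name in ("principal", "hdfs_host", "hive_host"):
            if not getattr(self, name).strip():
                found.append(f"{name} is empty")
        # Ports must be plain ASCII digits in the valid TCP range.
        for name in ("hdfs_port", "hive_port"):
            value = str(getattr(self, name))
            is_number = value.isascii() and value.isdigit()
            if not is_number or not 0 < int(value) < 65536:
                found.append(f"{name} is not a valid port: {value!r}")
        # The Kerberos files must exist locally before they can be saved.
        for name in ("kerb5_path", "keytab_path"):
            path = Path(getattr(self, name)).expanduser()
            if not path.is_file():
                found.append(f"{name} does not point to a file: {path}")
        return found


# Example values only; replace them with the details of your own cluster.
params = BDSConnectionParams(
    principal="user@EXAMPLE.COM",
    kerb5_path="/no/such/dir/krb5.conf",
    keytab_path="/no/such/dir/user.keytab",
    hdfs_host="hdfs.example.com",
    hdfs_port="8020",
    hive_host="hive.example.com",
    hive_port="not-a-port",
)
for problem in params.problems():
    print(problem)
```

Collecting every problem in a list, rather than raising on the first one, lets you fix all the inputs in a single pass.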
The BDSSecretKeeper class saves the BDS credentials to the OCI Vault service.
Saving Credentials
Prerequisites
OCID of the vault created in the OCI Console.
OCID of the master key to use for encrypting the secret content stored inside the vault.
OCID of the compartment where the vault resides. This defaults to the compartment of the notebook session when used in a Data Science notebook session.
BDSSecretKeeper
You can save the connection parameters, along with the files needed to configure Kerberos authentication, to the vault. This lets you reuse them across different notebook sessions, machines, and jobs.
BDSSecretKeeper requires the following fields:
principal: str. The unique identity to which Kerberos can assign tickets.
hdfs_host: str. The HDFS host name from the BDS cluster.
hive_host: str. The Hive host name from the BDS cluster.
hdfs_port: str. The HDFS port from the BDS cluster.
hive_port: str. The Hive port from the BDS cluster.
kerb5_path: str. The krb5.conf file path.
keytab_path: str. The path to the keytab file.
vault_id: str. OCID of the vault.
key_id: str. OCID of the master key used for encrypting the secret.
compartment_id: str. OCID of the compartment where the vault is located. This defaults to the compartment of the notebook session when used in a Data Science notebook session.
BDSSecretKeeper.save
The BDSSecretKeeper.save API serializes and stores the credentials to Vault using the following parameters:
name (str): Name of the secret when saved in Vault.
description (str): Description of the secret when saved in Vault.
freeform_tags (dict, optional): Default None. Free-form tags to use for saving the secret in the OCI Console.
defined_tags (dict, optional): Default None. Save the tags under predefined tags in the OCI Console.
save_files (bool, optional): Default True. If set to True, then the keytab and kerb5 config files are serialized and saved.
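To make the effect of save_files concrete, here is a minimal sketch of how connection fields and binary file contents could be serialized into a single text payload. It is an illustration only and does not reproduce how ADS stores the secret: build_secret_payload, read_secret_payload, and the "_content" key suffix are hypothetical names chosen for this example.

```python
import base64
import json
import os
import tempfile


def build_secret_payload(fields: dict, file_paths: dict, save_files: bool = True) -> str:
    """Serialize fields to JSON, embedding file contents only when save_files is True."""
    payload = dict(fields)
    if save_files:
        for key, path in file_paths.items():
            with open(path, "rb") as handle:
                # Keytab files are binary, so encode them as base64 text for JSON.
                payload[key + "_content"] = base64.b64encode(handle.read()).decode("ascii")
    return json.dumps(payload)


def read_secret_payload(serialized: str) -> dict:
    """Reverse build_secret_payload, decoding any embedded file contents to bytes."""
    payload = json.loads(serialized)
    for key in list(payload):
        if key.endswith("_content"):
            payload[key] = base64.b64decode(payload[key])
    return payload


# Demonstration with a throwaway file standing in for a real keytab.
with tempfile.TemporaryDirectory() as tmp:
    keytab = os.path.join(tmp, "user.keytab")
    with open(keytab, "wb") as handle:
        handle.write(b"\x05\x02fake-keytab-bytes")
    fields = {"principal": "user@EXAMPLE.COM", "keytab_path": keytab}
    with_files = build_secret_payload(fields, {"keytab": keytab}, save_files=True)
    without_files = build_secret_payload(fields, {"keytab": keytab}, save_files=False)

restored = read_secret_payload(with_files)
print(sorted(restored))
print(sorted(json.loads(without_files)))
```

With save_files=False only the connection fields and paths travel with the secret, so the keytab and kerb5 config files must already exist on any machine that later loads it.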
Examples
Saving a Secret With the Keytab and kerb5 config Files
import ads
import fsspec
import os
from ads.secrets.big_data_service import BDSSecretKeeper
from ads.bds.auth import has_kerberos_ticket, refresh_ticket, krbcontext
ads.set_auth('resource_principal')
principal = "<your_principal>"
hdfs_host = "<your_hdfs_host>"
hive_host = "<your_hive_host>"
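hdfs_port = "<your_hdfs_port>"
hive_port = "<your_hive_port>"
vault_id = "ocid1.vault.oc1.iad.*********"
key_id = "ocid1.key.oc1.iad.*********"
# Local paths to the keytab and krb5.conf files obtained from the BDS master node.
keytab_path = "<your_local_keytab_file_path>"
kerb5_path = "<your_local_kerb5_config_file_path>"
secret = BDSSecretKeeper(
    vault_id=vault_id,
    key_id=key_id,
    principal=principal,
    hdfs_host=hdfs_host,
    hive_host=hive_host,
    hdfs_port=hdfs_port,
    hive_port=hive_port,
    keytab_path=keytab_path,
    kerb5_path=kerb5_path
)
# save_files=True also stores the contents of the keytab and krb5.conf files.
saved_secret = secret.save(name="your_bds_config_secret_name",
                           description="your bds credentials",
                           freeform_tags={"schema": "emp"},
                           defined_tags={},
                           save_files=True)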
Saving a Secret Without Saving the Keytab and kerb5 config File
import ads
import fsspec
import os
from ads.secrets.big_data_service import BDSSecretKeeper
from ads.bds.auth import has_kerberos_ticket, refresh_ticket, krbcontext
ads.set_auth('resource_principal')
principal = "<your_principal>"
hdfs_host = "<your_hdfs_host>"
hive_host = "<your_hive_host>"
hdfs_port = "<your_hdfs_port>"
hive_port = "<your_hive_port>"
vault_id = "ocid1.vault.oc1.iad.*********"
key_id = "ocid1.key.oc1.iad.*********"
# Local paths to the keytab and krb5.conf files; only the paths are stored here.
keytab_path = "<your_local_keytab_file_path>"
kerb5_path = "<your_local_kerb5_config_file_path>"
bds_keeper = BDSSecretKeeper(
    vault_id=vault_id,
    key_id=key_id,
    principal=principal,
    hdfs_host=hdfs_host,
    hive_host=hive_host,
    hdfs_port=hdfs_port,
    hive_port=hive_port,
    keytab_path=keytab_path,
    kerb5_path=kerb5_path
)
# save_files=False skips serializing the keytab and krb5.conf file contents.
saved_secret = bds_keeper.save(name="your_bds_config_secret_name",
                               description="your bds credentials",
                               freeform_tags={"schema": "emp"},
                               defined_tags={},
                               save_files=False)
print(saved_secret.secret_id)
'ocid1.vaultsecret.oc1..<unique_ID>'
Loading Credentials
Prerequisite
OCID of the secret stored in vault.
BDSSecretKeeper.load_secret
The BDSSecretKeeper.load_secret API deserializes and loads the credentials from Vault. You can use this API in one of the following ways:
Using a with statement:
with BDSSecretKeeper.load_secret('ocid1.vaultsecret.oc1..<unique_ID>') as bdssecret:
    print(bdssecret['hdfs_host'])
Without using a with statement:
bdssecretobj = BDSSecretKeeper.load_secret('ocid1.vaultsecret.oc1..<unique_ID>')
bdssecret = bdssecretobj.to_dict()
print(bdssecret['hdfs_host'])
load_secret takes the following parameters:
source: Either the file that was exported from export_vault_details or the OCID of the secret.
format: Optional. If source is a file, then this value must be json or yaml, depending on the file format.
export_env: Default is False. If set to True, the credentials are exported as environment variables when used with the with statement.
export_prefix: By default, the environment variables are named after the credential fields, such as principal and hdfs_host. You can add a prefix to avoid name collisions.
auth: Provide overriding authorization information if it differs from the ads.set_auth setting.
keytab_dir: Optional. Directory path where the keytab file is saved after its contents are retrieved from the vault. If the keytab content is not available in the specified secret OCID, then this attribute is ignored.
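The interaction between export_env and export_prefix can be pictured with a small stand-alone context manager. This is a sketch of the general pattern, not the ADS implementation: exported_credentials is a hypothetical helper, and joining the prefix and field name with a "." is an assumption made for this example.

```python
import os
from contextlib import contextmanager


@contextmanager
def exported_credentials(credentials: dict, export_env: bool = False, export_prefix: str = ""):
    """Yield the credentials, optionally exposing them as environment variables."""
    previous = {}
    if export_env:
        for key, value in credentials.items():
            # Assumed naming scheme: "<prefix>.<field>" when a prefix is given.
            name = f"{export_prefix}.{key}" if export_prefix else key
            previous[name] = os.environ.get(name)
            os.environ[name] = str(value)
    try:
        yield credentials
    finally:
        # Restore the environment exactly as it was before entering the block.
        for name, old_value in previous.items():
            if old_value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old_value


# Example values only.
creds = {"principal": "user@EXAMPLE.COM", "hive_port": 10000}
with exported_credentials(creds, export_env=True, export_prefix="bds") as cred:
    inside = os.environ.get("bds.principal")
after = os.environ.get("bds.principal")
print(inside, after)
```

Scoping the variables to the with block keeps credentials out of the process environment once the work that needs them is done.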
If the keytab and kerb5 configuration files were saved in the vault, then load_secret creates a keytab and kerb5 configuration file of the same name. By default, the keytab file is created in the keytab_path specified in the secret. To change the location, set the directory path with keytab_dir. However, the kerb5 configuration file is always saved in the “~/.bds_config/krb5.conf” path.
Note that the keytab and kerb5 configuration files are created only when their contents were saved into the vault.
After the configuration files are loaded and saved locally, you can call the krbcontext context manager to create a Kerberos ticket.
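The keytab location rule described above (same file name, optionally relocated into keytab_dir) can be sketched as a small path calculation. The resolve_keytab_path function is a hypothetical helper written for this illustration and is not an ADS API; the paths are example values.

```python
import os
from typing import Optional


def resolve_keytab_path(saved_keytab_path: str, keytab_dir: Optional[str] = None) -> str:
    """Return where the keytab file would be written under the rule described above."""
    if keytab_dir is None:
        # Default: reuse the keytab_path that was stored in the secret.
        return os.path.expanduser(saved_keytab_path)
    # Override: keep the file name, but place it in the requested directory.
    file_name = os.path.basename(saved_keytab_path)
    return os.path.join(os.path.expanduser(keytab_dir), file_name)


default_location = resolve_keytab_path("/home/datascience/user.keytab")
custom_location = resolve_keytab_path("/home/datascience/user.keytab", keytab_dir="/tmp/keytabs")
print(default_location)
print(custom_location)
```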
Examples
Access Credentials Using a With Statement
To specify where the keytab file is saved locally, set the directory path with keytab_dir:
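from pyhive import hive
from ads.bds.auth import krbcontext
from ads.secrets.big_data_service import BDSSecretKeeper
# Load the secret, write the keytab under keytab_dir, and obtain a Kerberos ticket.
with BDSSecretKeeper.load_secret(saved_secret.secret_id, keytab_dir="~/path/to/save/keytab_file/") as cred:
    with krbcontext(principal=cred["principal"], keytab_path=cred['keytab_path']):
        hive_cursor = hive.connect(host=cred["hive_host"],
                                   port=cred["hive_port"],
                                   auth='KERBEROS',
                                   kerberos_service_name="hive").cursor()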
Now you can query the data from Hive:
hive_cursor.execute("""
select *
from your_db.your_table
limit 10
""")
import pandas as pd
pd.DataFrame(hive_cursor.fetchall(), columns=[col[0] for col in hive_cursor.description])
Access Credentials Without Using a With Statement
Loading from secret id:
bdssecretobj = BDSSecretKeeper.load_secret(saved_secret.secret_id)
bdssecret = bdssecretobj.to_dict()
print(bdssecret)
Loading from a JSON file:
bdssecretobj = BDSSecretKeeper.load_secret(source="./my_bds_vault_info.json", format="json")
bdssecretobj.to_dict()
Loading from a YAML file:
bdssecretobj = BDSSecretKeeper.load_secret(source="./my_bds_vault_info.yaml", format="yaml")
bdssecretobj.to_dict()