Connect

Available with ADS v2.5.10 and later

Notebook Session

Notebook sessions require a conda environment that has the BDS module of ADS installed.

Using the Vault

The preferred method to connect to a BDS cluster is to use the BDSSecretKeeper class. This allows you to store the BDS credentials in the vault rather than in the notebook. It also gives you finer access control over the secrets and allows credentials to be rotated without breaking connections from the various sources that use them.

import ads
import os

from ads.bds.auth import krbcontext
from ads.secrets.big_data_service import BDSSecretKeeper
from pyhive import hive

ads.set_auth('resource_principal')
with BDSSecretKeeper.load_secret("<secret_id>") as cred:
    with krbcontext(principal=cred["principal"], keytab_path=cred['keytab_path']):
        cursor = hive.connect(host=cred["hive_host"],
                              port=cred["hive_port"],
                              auth='KERBEROS',
                              kerberos_service_name="hive").cursor()
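
Once the cursor exists, it can be used like any other DB-API 2.0 cursor. The following is a minimal sketch of running a query and collecting the column names with the rows. The fetch_rows() helper and the demo table are hypothetical and not part of ADS or pyhive. An in-memory sqlite3 cursor stands in for the Hive cursor so that the sketch runs without a cluster.

```python
import sqlite3


def fetch_rows(cursor, sql, limit=10):
    """Run `sql` on a DB-API 2.0 cursor and return (column_names, rows)."""
    cursor.execute(sql)
    # cursor.description holds one sequence per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    # Fetch at most `limit` rows to avoid pulling a large table into memory.
    rows = cursor.fetchmany(limit)
    return columns, rows


# Stand-in cursor (assumption for illustration only). In a notebook session,
# pass the pyhive cursor created inside the krbcontext block instead.
demo_cursor = sqlite3.connect(":memory:").cursor()
demo_cursor.execute("CREATE TABLE t (id INTEGER, name TEXT)")
demo_cursor.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")
columns, rows = fetch_rows(demo_cursor, "SELECT id, name FROM t ORDER BY id")
```

With the Hive cursor, the call would take the same shape, for example fetch_rows(cursor, "SHOW TABLES"), run while the Kerberos context is still active.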

Without Using the Vault

BDS requires a Kerberos ticket to authenticate to the service. The preferred method is to use the vault and BDSSecretKeeper because it is more secure and prevents private information from being stored in a notebook. However, if this is not possible, you can use the refresh_ticket() method to manually create the Kerberos ticket. This method requires the following parameters:

  • kerb5_path: The path to the krb5.conf file. You can copy this file from /etc/krb5.conf on the master node of the BDS cluster.

  • keytab_path: The path to the principal’s keytab file. You can download this file from the master node on the BDS cluster.

  • principal: The unique identity to which Kerberos can assign tickets.

import ads
import fsspec
import os

from ads.bds.auth import refresh_ticket
from pyhive import hive

ads.set_auth('resource_principal')
refresh_ticket(principal="<your_principal>", keytab_path="<your_local_keytab_file_path>",
               kerb5_path="<your_local_kerb5_config_file_path>")
cursor = hive.connect(host="<hive_host>", port="<hive_port>",
                      auth='KERBEROS', kerberos_service_name="hive").cursor()
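
Because refresh_ticket() depends on local copies of the keytab and krb5.conf files, it can help to confirm that the three parameters look sane before calling it. The following is a minimal sketch of such a check. The check_kerberos_inputs() helper is hypothetical and not part of ADS, and whether refresh_ticket() performs similar checks itself is not stated here.

```python
import os


def check_kerberos_inputs(principal, keytab_path, kerb5_path):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not principal or not principal.strip():
        problems.append("principal is empty")
    for label, path in (("keytab_path", keytab_path), ("kerb5_path", kerb5_path)):
        # Expand "~" so that paths relative to the home directory are handled.
        expanded = os.path.expanduser(path)
        if not os.path.isfile(expanded):
            problems.append(f"{label} does not point to a file: {path}")
    return problems


# Example with placeholder values that do not exist on disk.
issues = check_kerberos_inputs("", "/nonexistent/user.keytab", "/nonexistent/krb5.conf")
```

If the returned list is empty, the same three values can be passed to refresh_ticket(); otherwise the messages indicate which file still needs to be copied from the master node.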

Jobs

A job requires a conda environment that has the BDS module of ADS installed. It also requires secrets and configuration information that can be used to obtain a Kerberos ticket for authentication. The keytab and krb5.conf files must be present on the job instance, and you can copy them there as part of the job. We recommend that you save them in the vault and then use BDSSecretKeeper to access them. This is secure because the vault provides access control and allows for key rotation without breaking existing jobs. You can use the notebook to load configuration parameters like hdfs_host, hdfs_port, hive_host, hive_port, and so on. The keytab and krb5.conf files are securely loaded from the vault and then saved on the job instance. The krbcontext() method is then used to create the Kerberos ticket. Once the ticket is created, you can query BDS.
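
The flow described above can be sketched as a small job script. This is an illustration under stated assumptions, not a definitive implementation: the BDS_SECRET_OCID environment variable and both helper functions are hypothetical, and the ADS and pyhive calls are the same ones used in the notebook session example. The ADS imports are placed inside the function so that the settings helper can be exercised without a cluster.

```python
import os


def read_secret_id(env=None):
    """Read the vault secret OCID from BDS_SECRET_OCID (hypothetical variable name)."""
    env = os.environ if env is None else env
    secret_id = env.get("BDS_SECRET_OCID", "").strip()
    if not secret_id:
        raise ValueError("Set BDS_SECRET_OCID to the OCID of the vault secret")
    return secret_id


def run_query(secret_id, sql):
    """Load credentials from the vault, obtain a Kerberos ticket, and run one query."""
    # Imported lazily: these packages are only available in a conda
    # environment that has the BDS module of ADS installed.
    import ads
    from ads.bds.auth import krbcontext
    from ads.secrets.big_data_service import BDSSecretKeeper
    from pyhive import hive

    ads.set_auth("resource_principal")
    with BDSSecretKeeper.load_secret(secret_id) as cred:
        with krbcontext(principal=cred["principal"], keytab_path=cred["keytab_path"]):
            cursor = hive.connect(host=cred["hive_host"],
                                  port=cred["hive_port"],
                                  auth="KERBEROS",
                                  kerberos_service_name="hive").cursor()
            cursor.execute(sql)
            # Fetch inside the context so the ticket and credentials are still valid.
            return cursor.fetchall()
```

In a job, the entry point would then be something like run_query(read_secret_id(), "SHOW TABLES"), with the environment variable set in the job configuration.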