ads.bds package
Submodules
ads.bds.auth module
- exception ads.bds.auth.KRB5KinitError
Bases:
Exception
Raised when the `kinit -kt` command fails to generate a cached ticket from the keytab file and the krb5 config file.
- ads.bds.auth.has_kerberos_ticket()
Check whether a cached Kerberos ticket exists.
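A minimal sketch of what such a check can look like, assuming the standard MIT Kerberos tools are available (`klist -s` exits 0 when a valid cached ticket exists). `has_ticket_sketch` is a hypothetical stand-in, not the ADS implementation:

```python
import subprocess

def has_ticket_sketch() -> bool:
    """Return True if a valid (non-expired) Kerberos ticket is cached.

    `klist -s` is the standard MIT Kerberos way to probe the credential
    cache: it prints nothing and exits 0 when a usable ticket exists.
    """
    try:
        return subprocess.run(["klist", "-s"], capture_output=True).returncode == 0
    except FileNotFoundError:
        # Kerberos client tools are not installed on this machine.
        return False
```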
- ads.bds.auth.init_ccache_with_keytab(principal: str, keytab_file: str) None
Initialize credential cache using keytab file.
- Parameters:
principal (str) – The unique identity to which Kerberos can assign tickets.
keytab_file (str) – Path to your keytab file.
- Returns:
Nothing.
- Return type:
None
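The KRB5KinitError description above indicates the cache is initialized by shelling out to `kinit -kt`. A hedged sketch of that flow follows; `init_ccache_sketch` is hypothetical and stands in for the ADS internals:

```python
import subprocess

def init_ccache_sketch(principal: str, keytab_file: str) -> None:
    """Initialize the credential cache via `kinit -kt <keytab> <principal>`,
    raising on failure much as KRB5KinitError does."""
    proc = subprocess.run(
        ["kinit", "-kt", keytab_file, principal],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"kinit failed: {proc.stderr.strip()}")
```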
- ads.bds.auth.krbcontext(principal: str, keytab_path: str, kerb5_path: str = '~/.bds_config/krb5.conf') None
A context manager for Kerberos-related actions. It provides a Kerberos context that you can put code inside. If no cached ticket exists, it automatically initializes the credential cache from the keytab; otherwise it does nothing.
- Parameters:
principal (str) – The unique identity to which Kerberos can assign tickets.
keytab_path (str) – Path to your keytab file.
kerb5_path (str, optional) – Path to your krb5 config file.
- Returns:
Nothing.
- Return type:
None
Examples
>>> from ads.bds.auth import krbcontext
>>> from pyhive import hive
>>> with krbcontext(principal="your_principal", keytab_path="your_keytab_path"):
...     hive_cursor = hive.connect(host="your_hive_host",
...                                port="your_hive_port",
...                                auth='KERBEROS',
...                                kerberos_service_name="hive").cursor()
- ads.bds.auth.refresh_ticket(principal: str, keytab_path: str, kerb5_path: str = '~/.bds_config/krb5.conf') None
Generate a new cached ticket based on the principal and keytab file path.
- Parameters:
principal (str) – The unique identity to which Kerberos can assign tickets.
keytab_path (str) – Path to your keytab file.
kerb5_path (str, optional) – Path to your krb5 config file.
- Returns:
Nothing.
- Return type:
None
Examples
>>> from ads.bds.auth import refresh_ticket
>>> from pyhive import hive
>>> refresh_ticket(principal="your_principal", keytab_path="your_keytab_path")
>>> hive_cursor = hive.connect(host="your_hive_host",
...                            port="your_hive_port",
...                            auth='KERBEROS',
...                            kerberos_service_name="hive").cursor()
ads.bds.big_data_service module
- class ads.bds.big_data_service.ADSHiveConnection(host: str, port: str = '10000', auth_mechanism: str = 'GSSAPI', driver: str = 'impyla', **kwargs)
Bases:
object
Initiate the connection.
- Parameters:
host (str) – Hive host name.
port (str) – Hive port. Defaults to "10000".
auth_mechanism (str) – Defaults to "GSSAPI". Use "PLAIN" for an unsecured cluster.
driver (str) – Defaults to "impyla". Client used to communicate with Hive. Only impyla is supported so far.
kwargs – Other connection parameters accepted by the client.
- insert(table_name: str, df: DataFrame, if_exists: str, batch_size: int = 1000, **kwargs)
Insert a pandas DataFrame into a table.
- Parameters:
table_name (str) – Table name, which may include the database name. By default the 'default' database is used. You can specify the database via table_name=<db_name>.<tb_name>.
df (pd.DataFrame) – Data to be inserted into the database.
if_exists (str) – Whether to replace, append, or fail if the table already exists.
batch_size (int, default 1000) – Inserting in batches improves insertion performance. Choose this value based on available memory and network bandwidth.
kwargs (dict) – Other parameters used by pandas.DataFrame.to_sql.
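The effect of `batch_size` can be illustrated with a small batching helper. This is plain Python, not the ADS internals:

```python
def iter_batches(rows, batch_size=1000):
    """Yield successive `batch_size`-sized slices of `rows`.

    Batched inserts trade peak memory for fewer round trips to the
    server, which is why a larger batch_size usually inserts faster
    as long as each batch still fits in memory.
    """
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```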
- query(sql: str, bind_variables: Optional[Dict] = None, chunksize: Optional[int] = None) Union[DataFrame, Iterator[DataFrame]]
Query data; supports SELECT statements.
- Parameters:
sql (str) – SQL query.
bind_variables (Optional[Dict]) – Parameters to be bound to variables in the SQL query, if any. Impyla supports all DB API paramstyles, including qmark, numeric, named, format, and pyformat.
chunksize (Optional[int]) – Chunk size of each DataFrame in the iterator.
- Returns:
A pandas DataFrame or a pandas DataFrame iterator.
- Return type:
Union[pd.DataFrame, Iterator[pd.DataFrame]]
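The `bind_variables` mechanics can be demonstrated with the stdlib sqlite3 module, which implements the DB API qmark paramstyle (one of the styles listed above). A Hive cursor would accept bound parameters analogously; sqlite3 here is only a stand-in:

```python
import sqlite3

# sqlite3 uses the DB API "qmark" paramstyle: `?` placeholders plus a
# tuple of values, so parameters are bound rather than string-formatted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.execute("INSERT INTO employees VALUES (?, ?)", (1, "alpha"))
row = conn.execute("SELECT name FROM employees WHERE id = ?", (1,)).fetchone()
# row is now ("alpha",)
```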
- class ads.bds.big_data_service.HiveConnection(**params)
Bases:
ABC
Base class interface.
Set up the Hive connection.
- abstract get_cursor()
Returns the cursor from the connection.
- Returns:
cursor using a specific client.
- Return type:
HiveServer2Cursor
- abstract get_engine()
Returns the engine from the connection.
- Return type:
Engine object for the connection.
- class ads.bds.big_data_service.HiveConnectionFactory
Bases:
object
- clientprovider = {'impyla': <class 'ads.bds.big_data_service.ImpylaHiveConnection'>}
- classmethod get(driver='impyla')
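The factory resolves a driver name to a connection class through its `clientprovider` dictionary. A minimal sketch of the same registry pattern follows; the classes here are placeholders, not the ADS ones:

```python
class FakeImpylaConnection:
    """Placeholder standing in for ImpylaHiveConnection."""

class ConnectionFactory:
    # driver name -> connection class, mirroring clientprovider above
    clientprovider = {"impyla": FakeImpylaConnection}

    @classmethod
    def get(cls, driver="impyla"):
        # unknown drivers resolve to None rather than raising
        return cls.clientprovider.get(driver)
```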
- class ads.bds.big_data_service.ImpylaHiveConnection(**params)
Bases:
HiveConnection
ImpylaHiveConnection class, which uses the impyla client.
Set up the Impala connection.
- get_cursor() impala.hiveserver2.HiveServer2Cursor
Returns the cursor from the connection.
- Returns:
cursor using impyla client.
- Return type:
impala.hiveserver2.HiveServer2Cursor
- get_engine(schema='default')
Return the SQLAlchemy engine from the connection.
- Parameters:
schema (str) – Defaults to "default". The default schema used for queries.
- Returns:
engine using a specific client.
- Return type:
sqlalchemy.engine