ads.bds package¶
Submodules¶
ads.bds.auth module¶
- exception ads.bds.auth.KRB5KinitError[source]¶
Bases:
Exception
Raised when the kinit -kt command fails to generate a cached ticket from the keytab file and the krb5 config file.
- ads.bds.auth.init_ccache_with_keytab(principal: str, keytab_file: str) None [source]¶
Initialize the credential cache using a keytab file.
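A minimal usage sketch based on the signature above; the principal and keytab path are placeholders:

>>> from ads.bds.auth import init_ccache_with_keytab
>>> init_ccache_with_keytab("your_principal", "your_keytab_path")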
- ads.bds.auth.krbcontext(principal: str, keytab_path: str, kerb5_path: str = '~/.bds_config/krb5.conf') None [source]¶
A context manager for Kerberos-related actions. It provides a Kerberos context that you can put code inside. It will initialize credential cache automatically with keytab if no cached ticket exists. Otherwise, does nothing.
- Parameters:
principal (str) – The unique identity to which Kerberos can assign tickets.
keytab_path (str) – Path to the keytab file.
kerb5_path (str, optional) – Path to the krb5 config file. Defaults to '~/.bds_config/krb5.conf'.
- Returns:
Nothing.
- Return type:
None
Examples
>>> from ads.bds.auth import krbcontext
>>> from pyhive import hive
>>> with krbcontext(principal="your_principal", keytab_path="your_keytab_path"):
...     hive_cursor = hive.connect(host="your_hive_host",
...                                port="your_hive_port",
...                                auth='KERBEROS',
...                                kerberos_service_name="hive").cursor()
- ads.bds.auth.refresh_ticket(principal: str, keytab_path: str, kerb5_path: str = '~/.bds_config/krb5.conf') None [source]¶
Generate a new cached ticket based on the principal and the keytab file path.
- Parameters:
principal (str) – The unique identity to which Kerberos can assign tickets.
keytab_path (str) – Path to the keytab file.
kerb5_path (str, optional) – Path to the krb5 config file. Defaults to '~/.bds_config/krb5.conf'.
- Returns:
Nothing.
- Return type:
None
Examples
>>> from ads.bds.auth import refresh_ticket
>>> from pyhive import hive
>>> refresh_ticket(principal="your_principal", keytab_path="your_keytab_path")
>>> hive_cursor = hive.connect(host="your_hive_host",
...                            port="your_hive_port",
...                            auth='KERBEROS',
...                            kerberos_service_name="hive").cursor()
ads.bds.big_data_service module¶
- class ads.bds.big_data_service.ADSHiveConnection(host: str, port: str = '10000', auth_mechanism: str = 'GSSAPI', driver: str = 'impyla', **kwargs)[source]¶
Bases:
object
Initiate the connection.
- Parameters:
host (str) – Hive host name.
port (str) – Hive port. Defaults to 10000.
auth_mechanism (str) – Defaults to "GSSAPI". Use "PLAIN" for an unsecured cluster.
driver (str) – Defaults to "impyla". Client used to communicate with Hive. Only impyla is supported so far.
kwargs – Other connection parameters accepted by the client.
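A minimal connection sketch, assuming a Kerberized cluster with the defaults above; the host value is a placeholder:

>>> from ads.bds.big_data_service import ADSHiveConnection
>>> connection = ADSHiveConnection(host="your_hive_host")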
- insert(table_name: str, df: DataFrame, if_exists: str, batch_size: int = 1000, **kwargs)[source]¶
Insert data from a pandas DataFrame into a table.
- Parameters:
table_name (str) – Table name, which may also include the database name. By default the 'default' database is used. Specify the database name as table_name="<db_name>.<tb_name>".
df (pd.DataFrame) – Data to be inserted into the database.
if_exists (str) – Whether to replace, append, or fail if the table already exists.
batch_size (int, default 1000) – Inserting in batches improves insertion performance. Choose this value based on available memory and network bandwidth.
kwargs (dict) – Other parameters used by pandas.DataFrame.to_sql.
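For example, a sketch that appends a small DataFrame, reusing the connection above; the table name is a placeholder:

>>> import pandas as pd
>>> df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
>>> connection.insert(table_name="my_table", df=df, if_exists="append")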
- query(sql: str, bind_variables: Dict | None = None, chunksize: int | None = None) DataFrame | Iterator[DataFrame] [source]¶
Query data using a SELECT statement.
- Parameters:
sql (str) – SQL query.
bind_variables (Optional[Dict]) – Parameters to be bound to variables in the SQL query, if any. Impyla supports all DB API paramstyles, including qmark, numeric, named, format, and pyformat.
chunksize (Optional[int]) – Chunk size of each DataFrame in the returned iterator.
- Returns:
A pandas DataFrame or a pandas DataFrame iterator.
- Return type:
Union[pd.DataFrame, Iterator[pd.DataFrame]]
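A usage sketch, reusing the connection above; the table name is a placeholder:

>>> df = connection.query("SELECT * FROM my_table")
>>> for chunk in connection.query("SELECT * FROM my_table", chunksize=500):
...     print(chunk.shape)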
- class ads.bds.big_data_service.HiveConnection(**params)[source]¶
Bases:
ABC
Base class interface.
Set up the Hive connection.
- class ads.bds.big_data_service.HiveConnectionFactory[source]¶
Bases:
object
- clientprovider = {'impyla': <class 'ads.bds.big_data_service.ImpylaHiveConnection'>}¶
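The mapping can be read directly to resolve a driver name to its connection class; a minimal sketch:

>>> from ads.bds.big_data_service import HiveConnectionFactory
>>> HiveConnectionFactory.clientprovider["impyla"]
<class 'ads.bds.big_data_service.ImpylaHiveConnection'>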
- class ads.bds.big_data_service.ImpylaHiveConnection(**params)[source]¶
Bases:
HiveConnection
ImpylaHiveConnection class, which uses the impyla client.
Set up the impala connection.
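A direct-use sketch, assuming the keyword parameters are forwarded to the impyla client; the host and port values are placeholders:

>>> from ads.bds.big_data_service import ImpylaHiveConnection
>>> conn = ImpylaHiveConnection(host="your_hive_host",
...                             port=10000,
...                             auth_mechanism="GSSAPI")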