Source code for ads.type_discovery.credit_card_detector
#!/usr/bin/env python
# -*- coding: utf-8; -*-

# Copyright (c) 2020, 2022 Oracle and/or its affiliates.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/

"""
NOTE: There's an opportunity here to generate a new feature. Credit card numbers are not
predictive because they don't generalize; however, if the feature is replaced by the type
of card, that might be predictive.

 - Visa: ^4[0-9]{12}(?:[0-9]{3})?$
   All Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13.

 - MasterCard: ^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$
   MasterCard numbers either start with the numbers 51 through 55 or with the numbers
   2221 through 2720. All have 16 digits.

 - American Express: ^3[47][0-9]{13}$
   American Express card numbers start with 34 or 37 and have 15 digits.

 - Diners Club: ^3(?:0[0-5]|[68][0-9])[0-9]{11}$
   Diners Club card numbers begin with 300 through 305, 36 or 38. All have 14 digits.
   There are Diners Club cards that begin with 5 and have 16 digits. These are a joint
   venture between Diners Club and MasterCard, and should be processed like a MasterCard.

 - Discover: ^6(?:011|5[0-9]{2})[0-9]{12}$
   Discover card numbers begin with 6011 or 65. All have 16 digits.

 - JCB: ^(?:2131|1800|35\d{3})\d{11}$
   JCB cards beginning with 2131 or 1800 have 15 digits. JCB cards beginning with 35
   have 16 digits.
"""

from __future__ import print_function, absolute_import, division

import re

import pandas as pd

from ads.type_discovery import logger
from ads.type_discovery.abstract_detector import AbstractTypeDiscoveryDetector
from ads.type_discovery.typed_feature import CreditCardTypedFeature
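The NOTE above suggests replacing raw card numbers with the card type. A minimal sketch of that idea, assuming the docstring's regexes are applied to the digit string of each value; `CARD_TYPE_PATTERNS` and `card_type` are hypothetical names for illustration, not part of the `ads` API:

```python
import re
from typing import Optional

# Card-brand patterns copied from the module docstring (hypothetical helper, not ads API).
CARD_TYPE_PATTERNS = {
    "visa": r"^4[0-9]{12}(?:[0-9]{3})?$",
    "mastercard": r"^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$",
    "amex": r"^3[47][0-9]{13}$",
    "diners_club": r"^3(?:0[0-5]|[68][0-9])[0-9]{11}$",
    "discover": r"^6(?:011|5[0-9]{2})[0-9]{12}$",
    "jcb": r"^(?:2131|1800|35\d{3})\d{11}$",
}
_COMPILED = {name: re.compile(p) for name, p in CARD_TYPE_PATTERNS.items()}


def card_type(number) -> Optional[str]:
    """Return the brand whose pattern matches the digits of `number`, or None."""
    digits = str(number)
    for name, pattern in _COMPILED.items():
        if pattern.match(digits):
            return name
    return None
```

Applied to a pandas column (for example `df["card"].astype(str).map(card_type)`, where `df["card"]` is a hypothetical column), this would turn a high-cardinality identifier into a small categorical feature. The prefixes in the patterns do not overlap, so the dictionary order does not change the result.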
def is_credit_card(self, name, values):
    cc = re.compile(CreditCardDetector._pattern_string, re.VERBOSE)

    # since the nulls have been previously filtered we can safely do "all"
    samp = (
        values
        if values.size <= CreditCardDetector._max_sample_size_to_luhn_check
        else values.sample(n=CreditCardDetector._max_sample_size_to_luhn_check)
    )

    if samp.dtype.name in ["float16", "float32", "float64"]:
        if samp.apply(float.is_integer).all():
            samp = samp.fillna(0.0).astype(int)

    if samp.dtype.name in ["int16", "int32", "int64"]:
        samp = samp.astype(str)

    if all([cc.match(str(x)) for x in samp]):
        #
        # iff the pattern matching succeeds do we try the luhn algorithm on a sample
        #
        return all([self.is_luhn_valid(x) for x in samp])

    return False
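`is_luhn_valid`, `_pattern_string` and `_max_sample_size_to_luhn_check` are referenced above but not shown in this listing. The sketch below is a standalone approximation of the same flow (sample, cast whole-valued floats to integers, stringify, regex-match, then Luhn-check). The 13-to-19-digit pattern and the sample size of 1000 are assumptions standing in for the class attributes, and `luhn_valid` / `looks_like_credit_cards` are hypothetical names, not the library's implementation:

```python
import re

import pandas as pd

# Assumed stand-in for CreditCardDetector._pattern_string: 13 to 19 ASCII digits.
CARD_PATTERN = re.compile(r"^[0-9]{13,19}$")
# Assumed stand-in for CreditCardDetector._max_sample_size_to_luhn_check.
MAX_SAMPLE_SIZE = 1000


def luhn_valid(number) -> bool:
    """Standard Luhn (mod 10) checksum on the decimal digits of `number`."""
    digits = str(number)
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    total = 0
    # Walk from the rightmost (check) digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0


def looks_like_credit_cards(values: pd.Series) -> bool:
    """Sketch of the detection flow; assumes nulls were already dropped."""
    if values.size <= MAX_SAMPLE_SIZE:
        samp = values
    else:
        samp = values.sample(n=MAX_SAMPLE_SIZE)
    # Whole-valued floats are cast to int so str() gives plain digits (no trailing ".0").
    if samp.dtype.kind == "f" and samp.apply(lambda x: float(x).is_integer()).all():
        samp = samp.astype("int64")
    samp = samp.astype(str)
    # Only run the Luhn check if every sampled value matches the pattern.
    if all(CARD_PATTERN.match(x) for x in samp):
        return all(luhn_valid(x) for x in samp)
    return False
```

One caveat of the float path: float64 represents integers exactly only up to 2**53 (about 9.0e15), so card numbers stored as floats may already have lost trailing digits before the cast; in that case the Luhn check fails and the column is conservatively not tagged as credit cards.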