A High-level Scorecard Modeling API | Scorecard modeling, all in one place
**V1.2.2**

V1.2.2 fixes some non-critical bugs found in previous versions:

- `plt.annotate()`: in previous versions, the parameter `s` was used to pass in the annotation text. Matplotlib has since renamed this parameter to `text`, and in newer environments (e.g. Python 3.9 with a recent Matplotlib) continuing to use `s` causes `TypeError: annotate() missing 1 required positional argument: 'text'`. From V1.2.2, the `text` parameter is used when calling `plt.annotate()`.
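The signature change behind this fix can be reproduced with two hypothetical stub functions (the real function is `matplotlib.pyplot.annotate`; these stubs only mimic its old and new parameter names):

```python
# Hypothetical stubs mimicking the Matplotlib parameter rename.
def annotate_old(s, xy, **kwargs):      # older Matplotlib: label argument named `s`
    return s

def annotate_new(text, xy, **kwargs):   # newer Matplotlib: renamed to `text`
    return text

# Passing the label positionally is compatible with both signatures:
assert annotate_old("peak", xy=(1, 2)) == annotate_new("peak", xy=(1, 2))

# Passing it as the old keyword fails against the new signature, because `s`
# is swallowed by **kwargs and the required `text` argument is left unfilled:
try:
    annotate_new(s="peak", xy=(1, 2))
except TypeError as e:
    print(e)  # annotate_new() missing 1 required positional argument: 'text'
```

Passing the text as the first positional argument sidesteps the rename entirely, which is why switching the keyword (or dropping it) restores compatibility.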
- Changed a default parameter value: the default value of parameter `min_intervals` in `ChiMerge` is changed from 1 to 2.
- Adjusted the naming of private variables in classes: classes in Scorecard-Bundle inherit from the `BaseEstimator` and `TransformerMixin` classes in Scikit-learn, and for each `__init__` parameter Scikit-learn checks whether an attribute with the exact same name exists on the instance. The previous code stored such parameters as private variables with a double-underscore prefix, which resulted in errors like `cannot found __xx in class xxxx` when users tried to print an instance or access these variables. Note that this problem does not affect the correctness of the results. The affected variables in `ChiMerge`, `WOE` and `LogisticRegressionScoreCard` have been renamed to avoid this problem.

**V1.2.1**

This is an emergency update that fixes 2 related bugs which may be triggered in rare cases but are hard to debug for anyone unfamiliar with the code. Thanks to @zeyunH for bringing one of the bugs to my attention.
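The attribute lookup described in the V1.2.2 private-variable fix above can be illustrated with a minimal sketch. The classes and the `get_params` helper below are hypothetical, mimicking Scikit-learn's lookup, not the library's actual code:

```python
# Why double-underscore attributes break a get_params-style lookup:
# Python name mangling turns __min_intervals into _BadEstimator__min_intervals,
# so an attribute named exactly like the __init__ parameter no longer exists.
import inspect

class BadEstimator:
    def __init__(self, min_intervals=2):
        self.__min_intervals = min_intervals  # mangled attribute name

class GoodEstimator:
    def __init__(self, min_intervals=2):
        self.min_intervals = min_intervals    # matches the parameter name

def get_params(est):
    """Mimic Scikit-learn: fetch an attribute named exactly like each
    __init__ parameter of the estimator's class."""
    names = [p for p in inspect.signature(type(est).__init__).parameters
             if p != "self"]
    return {name: getattr(est, name) for name in names}

print(get_params(GoodEstimator()))  # {'min_intervals': 2}
try:
    get_params(BadEstimator())
except AttributeError as e:
    print(e)  # no attribute 'min_intervals' on the instance
```

Renaming the stored attributes to match the constructor parameters (as V1.2.2 does) makes the lookup succeed.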
- Added parameter `force_inf` to `_assign_interval_base()`, `assign_interval_unique()` and `assign_interval_str()` in `scorecardbundle/utils/func_numpy.py`. When `force_inf` is set to `True`, the largest interval is overwritten from `(xxx, upper]` to `(xxx, inf]`; in other words, the previous upper boundary value is abandoned. Set `force_inf=True` when fitting ChiMerge; set `force_inf=False` when calling ChiMerge `transform()` or Scorecard `predict()`.
- Bug fixed: when `_assign_interval_base` is called in ChiMerge `fit()`, the largest interval is overwritten from `(xxx, upper]` to `(xxx, inf]` so that it covers the entire value range. Previously the code only performed this when the upper boundary (one of the given thresholds) was exactly equal to the maximum value of the data, while in practice the boundary may be larger due to rounding (e.g. the max value is 3.14159 and the chosen threshold is rounded up to 3.1416 by the `decimal` parameter of ChiMerge). From V1.2.1, the condition has been changed to `>=`.
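The effect of `force_inf` can be sketched as follows. This is a simplified stand-in for the idea, not the library's actual `_assign_interval_base` implementation:

```python
# Simplified sketch of force_inf: during training the top interval is
# widened to (last_threshold, inf] so values above the fitted boundaries
# still fall into an interval.
import bisect, math

def assign_interval(x, boundaries, force_inf=True):
    """Label each value with an '(a,b]' interval given sorted upper boundaries."""
    bounds = list(boundaries)
    if force_inf:
        bounds[-1] = math.inf  # widen the largest interval to (a, inf]
    lowers = [-math.inf] + bounds[:-1]
    labels = []
    for v in x:
        # bisect_left gives (a, b] semantics: a value equal to a boundary
        # belongs to the interval that ends at that boundary.
        i = min(bisect.bisect_left(bounds, v), len(bounds) - 1)
        labels.append(f"({lowers[i]},{bounds[i]}]")
    return labels

print(assign_interval([1.0, 2.5, 9.9], [2.0, 3.0], force_inf=True))
# ['(-inf,2.0]', '(2.0,inf]', '(2.0,inf]']
```

With `force_inf=False` the fitted boundaries are applied as-is, which is the behavior wanted at `transform()`/`predict()` time.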
- Bug fixed: `force_inf=False` is now used in function `assign_interval_str` when calling Scorecard `predict()`.
- Added a check on the `X_beforeWOE` parameter of `LogisticRegressionScoreCard.predict()`: if the Scorecard rules contain features that are not in the passed feature data, or the passed feature data contains features that are not in the Scorecard rules, an exception is raised.

**V1.2.0**

`feature_discretization`:
- Added parameter `decimal` to class `ChiMerge.ChiMerge()`, which allows users to control the number of decimals of the feature interval boundaries.
- Added function `FeatureIntervalAdjustment.plot_event_dist()`.
- Added function `FeatureIntervalAdjustment.feature_stat()`, which computes the input feature's sample distribution, including the sample sizes, event sizes and event proportions of each feature value.

`feature_selection.FeatureSelection`:
- Added function `identify_colinear_features()`, which identifies highly-correlated feature pairs that may cause a colinearity problem.
- Added function `unstacked_corr_table()`, which returns an unstacked correlation table to help analyze the colinearity problem.

`model_training.LogisticRegressionScoreCard`:
- Refactored the `LogisticRegressionScoreCard` class so that it now accepts all parameters of `sklearn.linear_model.LogisticRegression`, and its `fit()` function accepts all parameters of the `fit()` of `sklearn.linear_model.LogisticRegression` (including `sample_weight`).
- Added parameter `baseOdds` for `LogisticRegressionScoreCard`, which allows users to pass user-defined base odds (# of y=1 / # of y=0) to the Scorecard model.

`model_evaluation.ModelEvaluation`:
- Added function `pref_table`, which evaluates the classification performance on different levels of model scores. This function is useful for setting a classification threshold based on precision and recall.

`model_interpretation`:

- Added `ScorecardExplainer.important_features()` to help interpret the result of an individual instance. This function identifies the features that contribute the most to pushing the total score of a particular instance above a threshold.

**V1.1.3**

V1.1.3 covers all major steps of creating a scorecard model. This version has been used in dozens of scorecard modeling tasks without any error or bug being found during my career as a data analyst.
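The idea behind `ScorecardExplainer.important_features()` described above can be sketched as a greedy selection over per-feature score contributions. The function below is a hypothetical illustration, not the library's implementation:

```python
# Hypothetical sketch: pick the smallest set of top-scoring features whose
# cumulative contribution pushes an instance's total score above a threshold.
def important_features(feature_scores, threshold):
    """feature_scores: {feature_name: score contribution for one instance}."""
    ranked = sorted(feature_scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0
    for name, score in ranked:
        chosen.append(name)
        total += score
        if total > threshold:
            break
    return chosen

scores = {"age": 120, "income": 90, "debt": 30}
print(important_features(scores, 180))  # ['age', 'income']
```

For an individual instance, this surfaces the few intervals/features that dominate its score, which is what makes a scorecard decision explainable to a reviewer.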