Scorecard Bundle Versions

A High-level Scorecard Modeling API | Everything for scorecard modeling, all in one place

V1.2.2

2 years ago

V1.2.2 fixed some non-critical bugs in previous versions.

  1. Corrected the use of deprecated parameters:
    • Previous versions passed the annotation text to plt.annotate() through parameter s. Matplotlib has renamed this parameter to text, and from Python 3.9 onward, continuing to use s may raise TypeError: annotate() missing 1 required positional argument: 'text'. V1.2.2 passes the annotation through parameter text when calling plt.annotate().
  2. Changed default parameter values: the default value of parameter min_intervals in ChiMerge is changed from 1 to 2.

  3. Adjusted the naming of private variables in classes:

    • Several classes in ScorecardBundle inherit from the BaseEstimator and TransformerMixin classes in Scikit-learn, and for each constructor parameter Scikit-learn checks whether the class has an attribute with exactly the same name. The previous code stored such parameters as private variables prefixed with two underscores. This resulted in errors such as "cannot found __xx in class xxxx" when users tried to print an instance or access these private variables. Note that this problem does not affect the correctness of the results.
    • V1.2.2 adjusts the use of OOP in ChiMerge, WOE and LogisticRegressionScoreCard to avoid this problem.
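The naming problem can be reproduced without Scikit-learn. The minimal sketch below assumes a hypothetical, simplified stand-in called MiniBaseEstimator (not Scikit-learn's class) whose get_params() mimics the relevant lookup: it reads the __init__ signature and fetches an attribute with exactly the same name as each parameter. OldChiMerge and NewChiMerge are also hypothetical, illustrating the naming before and after V1.2.2.

```python
import inspect


class MiniBaseEstimator:
    """Hypothetical, simplified stand-in for sklearn.base.BaseEstimator."""

    def get_params(self):
        # Mimic scikit-learn: collect the __init__ parameter names and read
        # an attribute with exactly the same name for each of them.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}


class OldChiMerge(MiniBaseEstimator):
    """Hypothetical class using the pre-V1.2.2 naming style."""

    def __init__(self, min_intervals=2):
        # Two leading underscores trigger name mangling, so the value is
        # stored as self._OldChiMerge__min_intervals, not self.min_intervals.
        self.__min_intervals = min_intervals


class NewChiMerge(MiniBaseEstimator):
    """Hypothetical class storing the parameter under the argument's name."""

    def __init__(self, min_intervals=2):
        self.min_intervals = min_intervals


if __name__ == "__main__":
    print(NewChiMerge(min_intervals=3).get_params())  # {'min_intervals': 3}
    try:
        OldChiMerge().get_params()
    except AttributeError as err:
        print("AttributeError:", err)
```

Scikit-learn's own get_params() performs a similar same-name lookup, and printing an estimator relies on it, which is why the old naming surfaced as errors when instances were printed.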

v1.2.1

2 years ago

This is an emergency update to fix 2 related bugs that may be triggered only in rare cases but are hard to debug for anyone unfamiliar with the code. Thanks to @zeyunH for bringing one of the bugs to my attention.

  • feature_discretization:
    • [Fix] Add parameter force_inf to scorecardbundle/utils/func_numpy.py/_assign_interval_base(), assign_interval_unique(), assign_interval_str()
      • This parameter controls whether to force the largest interval's right boundary to be positive infinity. Default is True.
      • When the upper boundary is not smaller than the maximum value of the data, the largest interval in the output will be (xxx, upper]. In tasks such as fitting ChiMerge, where the output intervals are supposed to cover the entire value space (-inf ~ inf), force_inf should be set to True so that the largest interval is overwritten from (xxx, upper] to (xxx, inf]. In other words, the previous upper boundary value is abandoned.
      • However, when merely applying given boundaries, each value should be assigned exactly the interval it belongs to according to the given boundaries, and the intervals do not have to cover the entire value space. Users may pass in only a few values to transform into intervals, so forcing the largest interval to end at inf may generate intervals that do not exist in the given boundaries.
      • Therefore, set force_inf=True when fitting ChiMerge, and set force_inf=False when calling ChiMerge transform() or Scorecard predict().
    • [Fix] When generating intervals with _assign_interval_base in ChiMerge fit(), the largest interval is overwritten from (xxx, upper] to (xxx, inf] to cover the entire value range. However, the previous code only did this when the upper boundary (one of the given thresholds) was equal to the maximum value of the data, while in practice the upper boundary may be larger due to rounding (e.g. the maximum value is 3.14159, the threshold happens to be this value, and it is rounded up to 3.1416 due to the decimal parameter of ChiMerge). From V1.2.1, the condition has been changed to >=.
  • model_training.LogisticRegressionScoreCard:
    • [Fix] Set force_inf=False in function assign_interval_str when calling Scorecard predict();
    • [Add] Add a sanity check of the X_beforeWOE parameter of LogisticRegressionScoreCard.predict() against the Scorecard rules. If the Scorecard rules contain features that are not in the passed feature data, or the passed feature data contains features that are not in the Scorecard rules, an exception will be raised.
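The force_inf rule and the >= condition can be illustrated with a small pure-Python sketch. The function assign_interval() below is hypothetical: it only mirrors the behavior described above and is not the actual code of _assign_interval_base() or assign_interval_str(), whose signatures may differ. It assumes left-open, right-closed intervals (lo, hi] returned as string labels.

```python
import bisect
import math


def assign_interval(values, boundaries, force_inf=True):
    """Hypothetical sketch: map each value to an interval label "(lo, hi]".

    boundaries are the cut points; the outermost edges are -inf and inf.
    """
    values = list(values)
    bounds = sorted(boundaries)
    edges = [-math.inf] + bounds + [math.inf]
    intervals = []
    for v in values:
        # bisect_left returns the first index i with edges[i] >= v,
        # so edges[i - 1] < v <= edges[i] for any finite v.
        i = bisect.bisect_left(edges, v)
        intervals.append((edges[i - 1], edges[i]))

    # V1.2.1-style condition: widen when the largest threshold is >= the data
    # maximum (an == check misses thresholds that were rounded up).
    if force_inf and bounds and values and bounds[-1] >= max(values):
        top = bounds[-1]
        intervals = [(lo, math.inf if hi == top else hi) for lo, hi in intervals]

    return [f"({lo}, {hi}]" for lo, hi in intervals]


if __name__ == "__main__":
    data = [1.0, 2.5, 3.14159]
    cuts = [2.0, 3.1416]  # 3.14159 rounded up to 4 decimals
    print(assign_interval(data, cuts, force_inf=True))
    # ['(-inf, 2.0]', '(2.0, inf]', '(2.0, inf]']
    print(assign_interval(data, cuts, force_inf=False))
    # ['(-inf, 2.0]', '(2.0, 3.1416]', '(2.0, 3.1416]']
```

With force_inf=True the labels cover the whole value space, as fitting requires; with force_inf=False they stay exactly within the given boundaries, as expected when transforming or predicting.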

V1.2.0

3 years ago

Updates in V1.2.0

  • feature_discretization:

    • [Add] Add parameter decimal to class ChiMerge.ChiMerge(), which allows users to control the number of decimals of the feature interval boundaries.
    • [Add] Add a data table to the feature visualization FeatureIntervalAdjustment.plot_event_dist().
    • [Add] Add function FeatureIntervalAdjustment.feature_stat() that computes the input feature's sample distribution, including the sample sizes, event sizes and event proportions of each feature value.
  • feature_selection.FeatureSelection:

    • [Add] Add function identify_colinear_features() that identifies pairs of highly correlated features that may cause collinearity problems.
    • [Add] Add function unstacked_corr_table() that returns the unstacked correlation table to help analyze collinearity problems.
  • model_training.LogisticRegressionScoreCard:

    • [Fix] Alter the LogisticRegressionScoreCard class so that it now accepts all parameters of sklearn.linear_model.LogisticRegression, and its fit() function accepts all parameters of the fit() function of sklearn.linear_model.LogisticRegression (including sample_weight).
    • [Add] Add parameter baseOdds for LogisticRegressionScoreCard. This allows users to pass user-defined base odds (# of y=1 / # of y=0) to the Scorecard model.
  • model_evaluation.ModelEvaluation:

    • [Add] Add function pref_table, which evaluates the classification performance at different levels of model scores. This function is useful for setting a classification threshold based on precision and recall.
  • model_interpretation:

    • [Add] Add function ScorecardExplainer.important_features() to help interpret the result of an individual instance. This function identifies the features that contribute the most to pushing the total score of a particular instance above a threshold.
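The idea behind identify_colinear_features() and unstacked_corr_table() can be sketched as follows. This is a hypothetical pure-Python illustration, not the library's implementation: it assumes Pearson correlation, non-constant features, and a user-chosen absolute-correlation threshold (0.8 here is an arbitrary example). The pairwise correlations are flattened into (feature_a, feature_b, r) rows, which is the shape of an unstacked correlation table.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def unstacked_corr(features):
    """Hypothetical: one (feature_a, feature_b, r) row per unordered pair."""
    return [
        (a, b, round(pearson(features[a], features[b]), 4))
        for a, b in itertools.combinations(sorted(features), 2)
    ]


def colinear_pairs(features, threshold=0.8):
    """Hypothetical: keep pairs whose |r| reaches the threshold, strongest first."""
    rows = [row for row in unstacked_corr(features) if abs(row[2]) >= threshold]
    return sorted(rows, key=lambda row: -abs(row[2]))


if __name__ == "__main__":
    data = {
        "age": [1, 2, 3, 4, 5],
        "income": [2, 4, 6, 8, 10],
        "noise": [5, 1, 4, 2, 3],
    }
    print(unstacked_corr(data))
    # [('age', 'income', 1.0), ('age', 'noise', -0.3), ('income', 'noise', -0.3)]
    print(colinear_pairs(data))  # [('age', 'income', 1.0)]
```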

1.1.3

3 years ago

V1.1.3 covers all major steps of creating a scorecard model. During my career as a data analyst, this version has been used in dozens of scorecard modeling tasks, and no errors or bugs have been found.