In case you missed it, I built a neural network to predict loan risk using a public dataset from LendingClub. Then I built a public API to serve the model's predictions. That's nice and all, but… how good is my model?

Today I'm going to put it to the test, pitting it against the risk models of the very institution that issued those loans. That's right: LendingClub included their own calculated loan grades (and sub-grades) in the dataset, so all the pieces are in place for the most thrilling risk modeling smackdown of this century (or at least this week). May the best algorithm win!

```python
import joblib

prev_notebook_folder = "../input/building-a-neural-network-to-predict-loan-risk/"
loans = joblib.load(prev_notebook_folder + "loans_for_eval.joblib")
loans.shape
```

```
(1110171, 70)
```

```python
loans.head()
```

|   | loan_amnt | term | emp_length | home_ownership | annual_inc | purpose | dti | delinq_2yrs | cr_hist_age_mths | fico_range_low | ... | tax_liens | tot_hi_cred_lim | total_bal_ex_mort | total_bc_limit | total_il_high_credit_limit | fraction_recovered | issue_d | grade | sub_grade | expected_return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3600.0 | 36 months | 10+ years | MORTGAGE | 55000.0 | debt_consolidation | 5.91 | 0.0 | 148.0 | 675.0 | ... | 0.0 | 178050.0 | 7746.0 | 2400.0 | 13734.0 | 1.0 | Dec-2015 | C | C4 | 4429.08 |
| 1 | 24700.0 | 36 months | 10+ years | MORTGAGE | 65000.0 | small_business | 16.06 | 1.0 | 192.0 | 715.0 | ... | 0.0 | 314017.0 | 39475.0 | 79300.0 | 24667.0 | 1.0 | Dec-2015 | C | C1 | 29530.08 |
| 2 | 20000.0 | 60 months | 10+ years | MORTGAGE | 63000.0 | home_improvement | 10.78 | 0.0 | 184.0 | 695.0 | ... | 0.0 | 218418.0 | 18696.0 | 6200.0 | 14877.0 | 1.0 | Dec-2015 | B | B4 | 25959.60 |
| 4 | 10400.0 | 60 months | 3 years | MORTGAGE | 104433.0 | major_purchase | 25.37 | 1.0 | 210.0 | 695.0 | ... | 0.0 | 439570.0 | 95768.0 | 20300.0 | 88097.0 | 1.0 | Dec-2015 | F | F1 | 17394.60 |
| 5 | 11950.0 | 36 months | 4 years | RENT | 34000.0 | debt_consolidation | 10.20 | 0.0 | 338.0 | 690.0 | ... | 0.0 | 16900.0 | 12798.0 | 9400.0 | 4000.0 | 1.0 | Dec-2015 | C | C3 | 14586.48 |

5 rows × 70 columns

This post was adapted from a Jupyter Notebook, by the way, so if you'd like to follow along in your own notebook, go ahead and fork mine on Kaggle or GitHub!

## Ground rules

This is going to be a clean fight: my model won't use any data LendingClub wouldn't have access to at the point they calculate a loan's grade (including the grade itself). I'm going to sort the dataset chronologically (using the `issue_d` column, the month and year the loan was issued) and split it into two parts. The first 80% I'll use for training my competition model, and I'll compare performance on the last 20%.
```python
from sklearn.model_selection import train_test_split

loans["date"] = loans["issue_d"].astype("datetime64[ns]")
loans.sort_values("date", axis="index", inplace=True, kind="mergesort")

train, test = train_test_split(loans, test_size=0.2, shuffle=False)
train, test = train.copy(), test.copy()
print(f"The test set contains {len(test):,} loans.")
```

```
The test set contains 222,035 loans.
```

At the earlier end of the test set my model may have a slight informational advantage, having been trained on a few loans that may not have closed yet at the point LendingClub was grading those ones. On the other hand, LendingClub may have a slight informational advantage on the later end of the test set, since they would have known the outcomes of some loans on the earlier end of the test set by that time.

I have to give credit to Michael Wurm, by the way, for the idea of comparing my model's performance to LendingClub's loan grades, but my approach is pretty different. I'm not trying to simulate the performance of an investment portfolio; I'm just evaluating how well my predictions of simple risk compare.

## Test metric

The test: who can pick the best set of grade A loans, judged on the basis of the target variable from my last notebook, `fraction_recovered`, the fraction of an expected loan return that a prospective borrower will pay back (which I engineered in that notebook).

LendingClub will take the plate first. I'll gather all their grade A loans from the test set, count them, and calculate their average `fraction_recovered`. That average will be the metric my model has to beat.

Then I'll train my model on the training set using the same pipeline and parameters I settled on in my last notebook. Once it's trained, I'll use it to make predictions on the test set, then gather the number of top predictions equal to the number of LendingClub's grade A loans. Finally, I'll calculate the same average of `fraction_recovered` on that subset, and we'll have ourselves a winner!
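In other words, both contenders get scored by the same computation: rank the loans, take the top N, and average their `fraction_recovered`. As a hypothetical illustration (the actual cells below compute this inline), the metric could be sketched as a small helper, where the ranking score is LendingClub's grade for one contender and the model's prediction for the other:

```python
from statistics import mean


def top_n_mean_recovery(loans, n):
    """Mean `fraction_recovered` over the n loans ranked best.

    `loans` is a list of (ranking_score, fraction_recovered) pairs,
    where a higher ranking_score means the ranker considers the loan safer.
    """
    ranked = sorted(loans, key=lambda pair: pair[0], reverse=True)
    return mean(fraction for _, fraction in ranked[:n])


# Tiny made-up illustration: mean outcome of the two best-ranked loans
loans = [(0.9, 1.0), (0.8, 0.95), (0.5, 0.6), (0.2, 0.4)]
print(top_n_mean_recovery(loans, 2))  # → 0.975
```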
## LendingClub's turn

```python
from statistics import mean

lc_grade_a = test[test["grade"] == "A"]
print(f"LendingClub gave {len(lc_grade_a):,} loans in the test set an A grade.")
print("\nAverage `fraction_recovered` on LendingClub's grade A loans:")
print(round(mean(lc_grade_a["fraction_recovered"]), 5))
```

```
LendingClub gave 38,779 loans in the test set an A grade.

Average `fraction_recovered` on LendingClub's grade A loans:
0.96021
```

That's a pretty high percentage. I'm a bit nervous.

## My turn

First, I'll copy over my `run_pipeline` function from my previous notebook:

```python
from sklearn.model_selection import train_test_split
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense, Dropout


def run_pipeline(
    data,
    onehot_cols,
    ordinal_cols,
    batch_size,
    validate=True,
):
    X = data.drop(columns=["fraction_recovered"])
    y = data["fraction_recovered"]
    X_train, X_valid, y_train, y_valid = (
        train_test_split(X, y, test_size=0.2, random_state=0)
        if validate
        else (X, None, y, None)
    )

    transformer = DataFrameMapper(
        [
            (onehot_cols, OneHotEncoder(drop="if_binary")),
            (
                list(ordinal_cols.keys()),
                OrdinalEncoder(categories=list(ordinal_cols.values())),
            ),
        ],
        default=StandardScaler(),
    )

    X_train = transformer.fit_transform(X_train)
    X_valid = transformer.transform(X_valid) if validate else None

    input_nodes = X_train.shape[1]
    output_nodes = 1

    model = Sequential()
    model.add(Input((input_nodes,)))
    model.add(Dense(64, activation="relu"))
    model.add(Dropout(0.3, seed=0))
    model.add(Dense(32, activation="relu"))
    model.add(Dropout(0.3, seed=1))
    model.add(Dense(16, activation="relu"))
    model.add(Dropout(0.3, seed=2))
    model.add(Dense(output_nodes))
    model.compile(optimizer="adam", loss="mean_squared_logarithmic_error")

    history = model.fit(
        X_train,
        y_train,
        batch_size=batch_size,
        epochs=100,
        validation_data=(X_valid, y_valid) if validate else None,
        verbose=2,
    )

    return history.history, model, transformer


onehot_cols = ["term", "application_type", "home_ownership", "purpose"]
ordinal_cols = {
    "emp_length": [
        "< 1 year",
        "1 year",
        "2 years",
        "3 years",
        "4 years",
        "5 years",
        "6 years",
        "7 years",
        "8 years",
        "9 years",
        "10+ years",
    ]
}
```

Now for the moment of truth:

```python
# Train the model
_, model, transformer = run_pipeline(
    train.drop(columns=["issue_d", "date", "grade", "sub_grade", "expected_return"]),
    onehot_cols,
    ordinal_cols,
    batch_size=128,
    validate=False,
)

# Make predictions
X_test = transformer.transform(
    test.drop(
        columns=[
            "fraction_recovered",
            "issue_d",
            "date",
            "grade",
            "sub_grade",
            "expected_return",
        ]
    )
)
test["model_predictions"] = model.predict(X_test)

# Gather top predictions
test_sorted = test.sort_values("model_predictions", axis="index", ascending=False)
ty_grade_a = test_sorted.iloc[0:len(lc_grade_a)]

# Display results
print("\nAverage `fraction_recovered` on Ty's grade A loans:")
print(format(mean(ty_grade_a["fraction_recovered"]), ".5f"))
```

```
Epoch 1/100
6939/6939 - 13s - loss: 0.0249
Epoch 2/100
6939/6939 - 13s - loss: 0.0204
Epoch 3/100
6939/6939 - 13s - loss: 0.0202
Epoch 4/100
6939/6939 - 13s - loss: 0.0202
Epoch 5/100
6939/6939 - 13s - loss: 0.0202
...
Epoch 99/100
6939/6939 - 13s - loss: 0.0200
Epoch 100/100
6939/6939 - 13s - loss: 0.0200

Average `fraction_recovered` on Ty's grade A loans:
0.96166
```

## Victory!

Phew, that was a close one! My win might be too small to be statistically significant, but hey, it's cool seeing that I can keep up with LendingClub's best and brightest. What I'd really like to know now is what quantitative range of estimated risk each LendingClub grade and sub-grade corresponds to, but it looks like that's proprietary.
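On the significance question, one quick way to gauge it would be a two-sample test on the two groups' `fraction_recovered` values (the real comparison would use `lc_grade_a["fraction_recovered"]` and `ty_grade_a["fraction_recovered"]` from the cells above). The sketch below uses simulated stand-in data: the group sizes and means mirror the results, but the spread is invented, so treat it as an illustration of the method, not a verdict:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Simulated stand-ins for the two groups' `fraction_recovered` values;
# sizes and means mirror the results above, but the spread is made up.
lc_scores = rng.normal(loc=0.96021, scale=0.1, size=38_779).clip(0, 1)
ty_scores = rng.normal(loc=0.96166, scale=0.1, size=38_779).clip(0, 1)

# Welch's t-test (no equal-variance assumption)
t_stat, p_value = ttest_ind(ty_scores, lc_scores, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Since the real outcome variable is heavily skewed toward 1.0 rather than normally distributed, a non-parametric test like `scipy.stats.mannwhitneyu` would probably be the safer choice on the actual data.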
Does anyone know if loan grades generally correspond to certain percentage ranges, like letter grades in academic classes? If not, do you have any ideas for better benchmarks I could use to evaluate my model's performance? Go ahead and chime in in the comments below.

Previously published at https://tymick.me/blog/loan-grading-showdown