如何在Python中按时间序列计算不同逾期天数的贷款违约率
Hey there, let's work through this loan portfolio default rate calculation problem step by step!
Problem Background
I'm calculating different default levels for a loan portfolio over time, where default is defined as:amount_of_money_due_to_be_paid - amount_of_money_actually_paid
Sample Dataset
| id | amount | due_date | paid y/n |
|---|---|---|---|
| 1 | 1000 | 2017-01-01 | no |
| 2 | 1000 | 2017-02-01 | yes |
| 3 | 1000 | 2017-03-01 | no |
| 4 | 1000 | 2017-04-01 | no |
| 5 | 1000 | 2017-05-01 | yes |
Core Calculation Needs
I need to compute default metrics for different past-due thresholds (like 30 days_past_due, 60 days_past_due, etc.):
- At each month-end, sum the amount of loans that are overdue by more than x days (e.g., as of June 30, sum amounts overdue >15 days)
- Calculate the default rate using this formula:
amount_of_money_actually_paid / amount_money_due_to_be_paid_up_to_15_days_ago
Expected Output
I want a DataFrame where column names are month-end dates, and rows represent the default rate, like this example:
| 2017-01-31 | 2017-02-29 | 2017-03-31 | 2017-04-30 | 2017-05-31 | |
|---|---|---|---|---|---|
| Default % | 100% | 50% | 66.67% | 25% | 40% |
Note: In Excel terms, the denominator is the sum of amounts where the due date is earlier than (column header date - x days).
Current Code Limitation & Request
I have a Python snippet that currently counts the number of qualifying loans, but I need to modify it to sum the amounts instead. Can someone help adjust this code?
def default_days(date, outstanding_loans, days): m = (outstanding_loans['repayment_date'] <= date - pd.DateOffset(days)) c = sum(m) return c dates = pd.date_range('2015-11-30',pd.to_datetime('today'), freq='M') defaultt = pd.DataFrame({'date':dates}) defaultt['15 days'] = defaultt['date'].apply(default_days, args=[repayments, 15])
内容的提问来源于stack exchange,提问作者Stats DUB01




