The Effect of AI-Enabled Credit Scoring on Financial Inclusion

Hongchang Wang
Department of Information Systems, Naveen Jindal School of Management, University of Texas at Dallas, Dallas, TX, U.S.A. {[Link]@[Link]}

Songtao Jiang
Department of Data Science, CreditX Inc., Changning, Shanghai, CHINA {jiangsongtao@[Link]}

Bin Gu
Department of Information Systems, Questrom School of Business, Boston University, Boston, MA, U.S.A. {bgu@[Link]}
We studied the effect of a major bank adopting an AI-enabled credit scoring model on financial inclusion as
measured by changes to the approval rate, default rate, and utilization level of a personal loan product for an
underserved population. The bank serves over 50 million customers and previously used a traditional rule-based
model to evaluate the default risk of each loan application. It recently developed an AI model with a higher
prediction accuracy of default risk and used the AI model and the traditional model together to assess loan
applications for one of its personal loan products. Although the AI model may be more accurate in estimating
default risk, little is known about its impact on financial inclusion. We investigated this question using a difference-
in-differences approach by comparing changes in financial inclusion of the personal loan product that adopted
the AI model to that of a similar personal loan product that did not adopt the AI model. We found that the AI
model enhanced financial inclusion for the underserved population by simultaneously increasing the approval
rate and reducing the default rate. Further analysis attributed the enhancement in financial inclusion to the use
of weak signals (i.e., data not conventionally used to evaluate creditworthiness) by the AI model and its
sophisticated machine learning algorithms. Our findings are consistent with statistical discrimination theory, as
the use of weak signals and sophisticated machine learning algorithms improves prediction accuracy at the
individual level, thus reducing the reliance on group characteristics that often lead to financial exclusion. We
elaborated on the development process of the AI model to illustrate how and why the AI model can better evaluate
members of underserved populations. We also found the impacts of the AI model to be heterogeneous across
subgroups, and those with missing weak signals saw smaller improvements in the approval rate. A simulation-
based analysis showed that simplified AI models were also able to increase the approval rate and reduce the
default rate for this population. Our findings provide rich theoretical and practical implications for social justice
by documenting how an AI model designed for improving prediction accuracy can enhance financial inclusion.
Keywords: Financial inclusion, credit scoring, AI models, weak signals, social justice
1 Min-Seok Pang, Atreyi Kankanhalli, Margunn Aanestad, Sudha Ram, and Likoebe M. Maruping were the accepting senior editors for this paper. Jennifer Jie Zhang served as the associate editor. Hongchang Wang is the corresponding author. The transparency materials for this paper can be found at [Link]

©2024. The Authors. This work is licensed under the terms of the Creative Commons Attribution CC BY-NC-ND 4.0 License ([Link])

DOI: 10.25300/MISQ/2024/18340 | MIS Quarterly Vol. 48 No. 4, pp. 1803-1834 / December 2024
2 The definition and meaning of the underserved population may differ slightly in various contexts. After introducing our research context, we provide a detailed definition in Table 1.
Given the promises and pitfalls of AI models, we are interested in understanding how AI models influence financial inclusion. Considering the competing theoretical arguments and uncertain prior expectations of the impacts of AI models, we are motivated to investigate the following three research questions without proposing formal hypotheses:

RQ1—treatment effects: How do AI models influence financial inclusion regarding the approval rate, default rate, and utilization level of the underserved population?

RQ2—underlying mechanisms: Through what mechanisms and logic chains do AI models impact financial inclusion, and what are the key drivers?

RQ3—theoretical boundaries: Under what conditions and for what subpopulations are AI models effective at enhancing financial inclusion?

To answer these research questions, we cooperated with a traditional state-owned bank in China, which serves over 50 million customers (the focal bank hereafter). One decade ago, China started publishing a series of financial inclusion policies to motivate banks to serve the underserved population better. Among all the suggestions provided by these policies, one was to use financial technologies (e.g., AI models) in the evaluation process. Motivated by these policies, the focal bank developed and tested an AI-enabled credit scoring model in one of its personal loan products. This AI model incorporated weak signals and used sophisticated machine learning algorithms to improve the prediction accuracy of default risk. The bank tested the AI model by using it in combination with the existing traditional rule-based model to make the final lending decisions. We also identified a similar product that used the traditional model only during this time period (i.e., the control product) and applied a difference-in-differences (DID) strategy to estimate the impacts of the AI model on financial inclusion.

Our sample covered seven months and nine million applications. Our analysis yielded several important results. First, the adoption of the AI model increased the approval rate for the underserved population and reduced the default rate for both the underserved and regular populations.3 It also increased the utilization level of the whole population. Second, the enhancement in financial inclusion came from the improved prediction accuracy of the AI model. The use of both weak signals and sophisticated machine learning algorithms contributed to this accuracy improvement because they could help generate novel features that are predictive of creditworthiness and connect these features to creditworthiness in complex and novel ways. While the weak signals provide the raw information for meaningful features and the machine learning algorithms connect features to creditworthiness, the combination of them brought the largest improvement (compared to using weak signals by humans only or using machine learning algorithms on strong signals only). This improvement benefited the underserved population significantly more because they were overlooked by the traditional model, and the higher prediction accuracy reduced the reliance on group characteristics that often lead to financial exclusion. Third, the impacts of the AI model were heterogeneous, but almost all subgroups of the underserved population witnessed an increase in the approval rate and a decrease in the default rate. For subgroups with one or more missing data domains of weak signals, the impacts of the AI model were similar, though with a smaller magnitude. Fourth, our simulation analysis showed that the positive impact of AI models on financial inclusion was potentially generalizable to financial institutions without strong IT capability or access to rich weak signals. We found that simplified versions of the focal AI model (e.g., built with a simple AI algorithm or with only one domain of weak signals) could still increase the approval rate and reduce the default rate for the underserved population. These findings held across multiple robustness checks and the consideration of alternative explanations.

Our findings make a theoretical contribution to the AI and social justice literature by revealing how and why AI models can influence financial inclusion. More specifically, we show that AI models with a goal to improve prediction accuracy can have a significant and positive impact on financial inclusion. This effect is driven by AI models' ability to utilize both weak signals and sophisticated machine learning algorithms to improve prediction accuracy, which reduces the bank's reliance on traditional strong signals in loan evaluation. Because the underserved population often lacks strong signals, the reduced reliance on such signals leads to enhanced financial inclusion. Therefore, AI models can widen access to capital for the underserved population. Our findings also make a practical contribution by demonstrating a feasible and powerful AI tool to enhance financial inclusion, even when it was trained to enhance the prediction accuracy of default risk. We provide rich information and detailed discussions on the design process of AI models, the compliance issues in using AI models and privacy-sensitive data, the implementation of AI models with traditional models, and the heterogeneous impacts of AI models. All these findings could help practitioners decide how to develop their own AI models and how to deploy AI models in their business settings. We also discuss the limitations associated with AI models. First, AI
3 More precisely, the adoption of the AI model helped select the underserved applicants who were less likely to default, which was why the average default rate of the underserved population decreased. The adoption of the AI model did not necessarily reduce the default rate by increasing the repayment capability or repayment intention of the approved borrowers from the underserved population.
models do not necessarily benefit all applicants. A small portion of applicants might have a lower chance of approval because AI models can better understand their creditworthiness, though negatively. Second, AI models cannot solve financial inclusion issues caused by structural social injustice (e.g., structural racism or discrimination that has led to unfair opportunities/wealth allocation, which in turn has affected applicants' creditworthiness). These issues would require systematic policy interventions. As our research shows, AI tools can, however, be quite effective at addressing financial inclusion issues caused by statistical discrimination (Laouenan & Rathelot, 2022). Third, policymakers should collect more empirical evidence to understand the impacts of AI models and work on actionable AI regulations to deal with privacy or financial exclusion issues.

Literature Review

Our study is related to three streams of research, including the literature on financial inclusion and social justice, the literature on AI in lending, and the literature on weak signals and AI-enabled credit scoring.

Financial Inclusion and Social Justice

The concept of financial inclusion emerged more than a decade ago, advocating for access to valuable and affordable financial products and services (United Nations, 2015; World Bank, 2010). Financial inclusion is a critical component of social inclusion, which promotes a just society for all (i.e., all people should have access to education, capital, employment, health care, political rights, and housing) (United Nations, 1995). Financial inclusion is the foundation for social justice as it provides key financial resources (United Nations, 2006). For example, the underserved population needs financial resources such as personal loans to improve their immediate surroundings and to create new opportunities for themselves and their families. Adequate financial access can provide them with better healthcare, potential home ownership, and career and employment opportunities (Mitlin, 2008).

The key challenge in financial inclusion is that the underserved population often lacks sufficient financial literacy and credit history, which prevents financial institutions from accurately assessing their risk and creditworthiness (Loufield et al., 2018). Because traditional models tend to rely heavily on credit history, the lack of credit history is considered a sign of uncreditworthiness (Yawe & Prabhu, 2015). Therefore, the underserved population is likely to be classified as high risk and thus is likely to be directly rejected, charged high interest rates (Brown et al., 2019), or provided with low-limit lines of credit or loan amounts (Jappelli, 1990). As a result, the underserved population may have difficulty obtaining loans or lines of credit from traditional financial institutions (Leyshon & Thrift, 1995) and fall victim to predatory lending (Lusardi & Scheresberg, 2013). To summarize, the tendency for financial institutions to lend to borrowers with good credit history creates a "chicken-and-egg" problem that keeps preventing the underserved and marginalized population from gaining access to capital.

Multiple initiatives and actions have been taken to solve this problem, including opening more bank branches in rural and underserved areas, reducing eligibility criteria for the underserved population, and designing specific products (Tantri, 2021). These efforts have solved the financial inclusion issue to some extent. However, they come with high operational costs and default risks (Awaworyi-Churchill, 2020; Cull, 2011). Our research aims to study whether the advances in information technology, especially AI-enabled credit scoring models, could help financial institutions enhance financial inclusion for the underserved population.

AI in Lending

AI has been widely used in the finance industry for algorithmic trading, identity verification, customer service, fraud detection, risk assessment, asset valuation, etc. (Gomber et al., 2018; Kang et al., 2017; Kankanhalli & Mellouli, 2019; Lee et al., 2020). Our literature review focuses on the application of AI in credit scoring and lending decision-making. A few traditional financial institutions and many alternative lenders (e.g., Lending Club, [Link], OnDeck) have already started using AI models to consider alternative data and mine complex patterns between borrower data and loan performance (Jagtiani & Lemieux, 2019; Liu et al., 2015). For example, Lending Club and [Link] used hundreds of variables in their AI models to evaluate the creditworthiness of their applicants. As a result, they rely less on FICO scores and have thus been able to lend money to applicants with relatively low FICO scores (Nowak et al., 2018). OnDeck and Kabbage included social media data, e-commerce data, and transportation data in their AI models to evaluate small businesses and have outperformed traditional banks that use sales data and credit data only (Godbillon-Camus & Godlewski, 2005).

Extant studies have documented the impacts of AI models on consumer and small business lending, including both the reduction in the default rate (Serrano-Cinca et al., 2015) and the changes in approved borrowers' features (Agarwal et al., 2020; Bartlett et al., 2022). However, there are also concerns and criticisms about the adoption of AI in general and of AI-
enabled credit scoring models in particular (Hiller, 2020). One concern is that the selection of the training sample may carry historical bias and discrimination (Gianfrancesco et al., 2018; Mejia & Parker, 2021). The other concern is that AI models are typically "black boxes" that do not clearly show the relationships between data input and creditworthiness (Neumann et al., in press; Rudin, 2019). If not handled carefully, these two issues may strengthen the historical bias or lead to new biases (Cowgill et al., 2020).

Given the promises and pitfalls of AI models in lending, the adoption of AI-enabled credit scoring models by traditional financial institutions has been expected for a long time but has yet to materialize. Because traditional financial institutions face more scrutiny, AI models' design, deployment, impacts, and implications would also be expected to differ from those of alternative lenders. To fill this research gap, we cooperated with a traditional bank to understand the impacts of a compliance-compatible AI model on the approval rate and default rate of the underserved population. We also examined the underlying mechanisms and the conditions under which AI models can effectively enhance financial inclusion.

Weak Signals and AI-Enabled Credit Scoring

One advantage of AI models is their ability to utilize weak signals. We adopt the concept of "weak signals" from the existing literature to refer to data that is not directly related to creditworthiness or not conventionally used in traditional credit scoring models (Mendonça et al., 2012). Existing studies also used similar terms—for example, "alternative data" (Loufield, 2018; Lu et al., 2023; Serrano-Cinca & Gutiérrez-Nieto, 2016), "soft information" (Iyer et al., 2016; Liberti & Peterson, 2019), or "nonfinancial data." We prefer the term "weak signals" because it indicates that the data is not only unconventional but also noisy in nature. Traditional models rely heavily on credit history data, e.g., repayment information, utilization level, and frequency of credit inquiries (Ozler, 1992). They sometimes also consider employment, income, assets, and home ownership (Johnson, 2019). We call these domains of information strong signals because they have strong and intuitive relationships with creditworthiness and are largely available in numerical values with structural patterns (Liberti, 2019).

Unlike strong signals, weak signals can cover a broader range of financial or nonfinancial data domains. Previous studies have discussed the following data domains of weak signals: electronic footprint and trajectory (Kim, 2020; Lu et al., 2023), social networks (Lin et al., 2013; Lu et al., 2012; Gao et al., 2022), educational background (Li & Hu, 2019), lender-borrower communication (Xu & Chau, 2018), mobile phone usage (Ma et al., 2018; Lu et al., 2023), facial information (Chen et al., 2023), and other soft information (Iyer et al., 2016; Hou et al., 2023). Weak signals have been underutilized by traditional models due mainly to three reasons (Autor, 2014; Fügener et al., 2022; Liberti, 2018; Monk et al., 2019): (1) they do not have a clear and direct relationship with creditworthiness, so including them challenges underwriting transparency and explainability; (2) they may contain noisy information and unstructured contents, so significant efforts are required to make sense of them and incorporate them into traditional models; (3) they are often difficult to obtain and suffer from missing data issues.

Although taking advantage of weak signals can be challenging, they can be extremely valuable in credit scoring, especially for the underserved population, who often have a thin credit history or none at all. The value of weak signals stems from the fact that weak signals can provide helpful information on an applicant's willingness or ability to repay loans (Lu et al., 2023). For example, data collected from mobile financial apps can reveal an applicant's financial management skills. Data related to social connections and social media activities can reflect an applicant's social capital, which is known to influence career success (Seibert et al., 2017). To fulfill the potential of weak signals, modern machine learning algorithms such as random forests or neural networks can learn complex models from vast training data sets (LeCun et al., 2015; Schmidhuber, 2015). These algorithms can considerably reduce the cost and difficulty of utilizing weak signals and discover interaction effects between strong and weak signals that no traditional model has ever exploited. Due to these advantages of AI models, we posit that AI models can help financial institutions better understand their applicants, especially members of the underserved population who lack credit history.

Data and Empirical Strategy

Research Setting and Institutional Background

We collaborated with a state-owned regional bank to understand the impacts of AI models on financial inclusion. The focal bank serves over 50 million people in its home province, providing various financial products such as saving accounts, bill payments, social security funds, credit cards, personal loans, mortgages, and investments. The focal bank previously applied traditional rule-based models in its loan evaluation process. However, it was motivated by a series of government policies advocating banking financial inclusion and, starting in 2020, decided to experiment with AI-enabled credit scoring models in one of its personal loan products. Different government agencies at various levels published these policies, and the essential purpose was to advocate for banks to extend access to
capital for the underserved population by using AI models to better assess their creditworthiness. We summarize the major policies in Table 1. This setting provided us with a unique empirical opportunity to investigate the impacts of AI models because the use of AI models was previously not allowed or not encouraged for major banks.

The focal bank had multiple personal loan products and planned to apply AI models to all these products eventually. It selected one product as a pilot (the treated product hereafter) because the business team was more open to using new technologies. A development team spent about six months training an AI model to predict applicants' creditworthiness (as reflected by credit scores ranging from 300 to 850) and implemented the AI model into the treated product on February 4, 2021. It is important to note that the AI model focused solely on default risk, without any adjustment to favor applicants from the underserved population.

The treated product provides each approved borrower a credit allowance, typically between 2,000 and 20,000 CNY. Once the borrowers use their credit, they must repay the principal and interest within one month. The product's interest rate was determined by a time-variant base point of interest and did not vary across individual applicants. Upon approval, the credit allowance was determined by a separate formula (rather than the traditional model or the AI model), which was mainly related to the bank's available funds and the borrower's income.4 The AI model was used only in the underwriting process to determine whether to approve a loan application. Before the adoption of the AI model, only 15% of loans were issued to the underserved population, who accounted for 80% of the province's population. In contrast, 85% of the loans were issued to the regular population, who accounted for only 20% of the population of this province (the regular population hereafter). We were able to identify a similar personal loan product provided by the focal bank that had a similar design and served the same customer pool (the control product hereafter).5 During our observation period, no significant adjustment was applied to the control product (the focal bank started to develop an AI model for the control product in 2022). Figure 1 displays the timeline for the initial adoption of the AI model and a follow-up update on the AI model (more on the update later).

We collected data on the treated and control products from October 1, 2020, to April 30, 2021.6 We ended up with a sample of 2.5 million applications and 1.2 million approved loans for the treated product and a sample of 6.8 million applications and 3.2 million approved loans for the control product. We also collected rich data on applicants' characteristics. We report the descriptive statistics of the regular population in Table 2 and those of the underserved population in Table 3. To fully reveal the dynamics of the applicant pool after the adoption of the AI model, we provide detailed statistics in two-week time slots.

As reported in Tables 2 and 3, before the adoption of the AI model, around 50% of applicants from the regular population were approved, whereas only 17% of applicants from the underserved population were approved. Three aspects of applicants' characteristics may explain this phenomenon: the first distinction between the underserved and regular populations is that applicants from the underserved population often had a limited credit history and a higher default risk; the second distinction is that applicants from the regular population were often employed by large firms and government agencies, whereas applicants from the underserved population were often self-employed (e.g., small business owners or farmers); the third distinction is that applicants from the regular population often lived in big cities, whereas applicants from the underserved population often lived in small cities, towns, or rural areas. The descriptive statistics indicate that the launch of the AI model dramatically increased the approval rate for the underserved population and reduced the default rate for both the underserved and regular populations.

Development and Implementation of the AI Model

As mentioned earlier, the AI model does not affect the formula for interest rate or credit allowance; it merely influences the decision on whether an application will be approved. The focal bank's traditional evaluation process relies on a rule-based credit scoring model that (1) primarily uses strong signals, (2) is built upon expert knowledge and conventional rules, and (3) allows only linear combinations and a tree-structure logic.7 The AI model was developed by a team of five people over six months. We detail the development and implementation procedure in Figure 2 and discuss the major actions in each step.
4 This practice is typical for financial institutions in China and the U.S. The credit line or the loan amount is often determined after creditworthiness is evaluated.

5 See Table A1 for the comparison between underserved applicants of the treated and control products.

6 For details on the data, cleaning and processing procedures, and analysis code, please refer to the transparency materials.

7 Traditional rule-based models rely largely on expert knowledge and experience. Rule developers may try different combinations of rules according to historical data, but their analysis mainly checks descriptive statistics. Data analytics may play a more important role for traditional models that use complex algorithms. However, traditional models usually restrict themselves to conventional and commonly used data (e.g., credit history) and algorithms (e.g., regressions). The reason is that traditional models are used by traditional financial institutions subject to regulations.
As shown in Figure 2, the development team collected data from both internal and external sources in Step 1, including not only the aforementioned strong signals (e.g., credit data and asset data) but also weak signals (e.g., app data). In Step 2, the team cleaned the data by connecting multiple data sources, checking extreme values, handling missing values, and converting categorical variables. The team then generated features from all data domains using different feature generation techniques in Step 3 and trained four individual learners using different combinations of features to predict loan performance (i.e., default or not) in Step 5. After the initial individual models were built, the team reviewed the feature importance index of each feature, the consistency of the index of each feature, and the meanings of top features to remove unimportant, inconsistent, or hard-to-interpret features in Step 4. Steps 4 and 5 formed an iterative process because features were selected after models were trained and new models were further trained with new features. Once all features were confirmed, the winning algorithm was selected for each individual model.8 Then in Step 6, the team trained the ultimate ensemble model using the four individual models and tested its performance. The final AI model is a two-layer model that combines four individual learners and one ensemble learner (see Figure A1 in the Appendix for the kernel of the AI model). In Step 7, before the implementation of the AI model, the development team and the validation team reviewed the model again and removed hard-to-interpret features or features related to protected attributes (e.g., gender and disability). This change pointed the process back to Step 4 and resulted in a slightly adjusted AI model. After the model was eventually approved and implemented, the development team continued to monitor its performance over time and its potential impacts on marginalized subgroups (i.e., minority races) in Step 8.

The development team used data samples from July 1, 2020, to December 31, 2020 (around 300,000 issued loans), to build the AI model. The sample was randomly split into three data sets, including 50% training data, 30% validation data, and 20% testing data. The training data was used in Step 5 for training individual learners. The validation data was used in Step 5 to confirm individual winning algorithms and in Step 6 to train the ensemble learner. The testing data was used in Step 6 to determine the winning ensemble learner. The training process followed a common supervised learning approach and ended up with a LightGBM model as the final winner. It is worth emphasizing that the training data set contained the natural proportion of the underserved population, and the training process did not assign higher weights to the underserved population. The regular and underserved populations shared the same features and were trained using the same model. To summarize, compared to the traditional model, the AI model (1) uses both strong and weak signals, (2) is trained using machine learning algorithms, and (3) allows complex connections between features and creditworthiness. Due to financial regulations, the focal bank did not replace the traditional model with the AI model entirely. Figure 3 shows how the AI model worked with the traditional model.

As indicated by Figure 3, the traditional and AI models first evaluated the applicants independently. Then, a decision matrix was used to determine the final lending decision. The cutoff score was 520 before the adoption of the AI model; thus, applicants with a score above 520 would have been approved if the traditional model had been used. However, once adopted, the AI model could overturn the recommendations from the traditional model (as indicated in the shaded cells in Figure 3): from approval to denial, from denial to approval, or from approval/denial to pending (pending means the bank needs to collect more information from the applicant and feed the information into this evaluation process again to decide). When the traditional and AI models shared the same recommendation, the evaluation process resulted in a straightforward decision.

Identification Strategy and Empirical Specifications

The adoption of the AI model for the treated product by the focal bank provided us with a unique opportunity to investigate the impacts of AI-enabled credit scoring models on financial inclusion. We applied a difference-in-differences (DID) identification strategy to estimate their impacts. We took several steps to remove potential confounding effects in our research setting.

The first challenge came from the potential impacts of government policies on financial inclusion. One potential impact of the government policies was the change in the applicant pool from the underserved population. Because many of the policies were issued several years before the adoption of the AI model, we do not believe they would have confounded our results. Furthermore, these policies would influence both the treated and control products similarly, which would be handled by the DID strategy. Another potential impact of government policies is on the bank's loan evaluation process and eligibility criteria. We communicated with the focal bank to fully understand their actions to comply with these government policies. The focal bank explicitly decided to use the same cutoff score for loan approval decisions for both the underserved and regular populations, as it intended to enhance financial inclusion without hurting lending performance. Besides adopting the AI model, the focal bank also took two actions. During the third quarter of 2020, they proactively promoted all their financial products to the entire population in their home province and adjusted the traditional rule-based models. We therefore excluded this period from our sample to eliminate any confounding effects (this is why our sample starts on October 1, 2020). The focal bank also had a few bankers and loan officers reach out to some applicants from the underserved population to promote its products and evaluate the applications manually during our observation period. This behavior would have confounded our findings, so we removed all the applications that were evaluated manually (manual evaluations were marked in the bank's database).
8 The team compared multiple state-of-the-art algorithms and found the LightGBM algorithm to be the winner. More details on LightGBM (short for light gradient-boosting machine) are available in Figure A1.
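The two-layer architecture described above (four individual learners whose predictions feed one ensemble learner, trained on a 50/30/20 split) follows the standard stacking pattern. The sketch below illustrates that pattern only; the bank's actual feature groups, algorithms, hyperparameters, and code are not public, so the column groupings and settings here are hypothetical.

```python
# Minimal stacking sketch of a two-layer credit model: four base learners
# trained on different feature groups, plus a LightGBM ensemble learner.
# Feature groups, column names, and hyperparameters are hypothetical.
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import train_test_split

def fit_two_layer(X, y, feature_groups):
    # 50/30/20 split mirrors the paper's training/validation/testing design
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.4, random_state=0)

    # Layer 1: one learner per feature combination (e.g., credit history,
    # income/assets, social security fund, in-app behavior)
    base_models = [
        lgb.LGBMClassifier(n_estimators=200).fit(X_train[cols], y_train)
        for cols in feature_groups
    ]

    # Layer 2: ensemble learner trained on the base predictions over the
    # validation data, as in Step 6
    meta_features = np.column_stack(
        [m.predict_proba(X_val[cols])[:, 1] for m, cols in zip(base_models, feature_groups)]
    )
    ensemble = lgb.LGBMClassifier(n_estimators=100).fit(meta_features, y_val)

    # The held-out testing data evaluates the final two-layer model
    test_meta = np.column_stack(
        [m.predict_proba(X_test[cols])[:, 1] for m, cols in zip(base_models, feature_groups)]
    )
    return base_models, ensemble, ensemble.predict_proba(test_meta)[:, 1], y_test
```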
Figure 3. The Implementation of the AI-Enabled Credit Scoring Model at the Treated Product
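Figure 3 itself is not reproduced here, so the following sketch only illustrates the kind of logic the text describes: each model issues an independent recommendation, agreement yields a straightforward decision, and disagreement can overturn the traditional recommendation or route the case to pending. Only the 520 traditional cutoff comes from the paper; the AI cutoff and the disagreement rules below are assumptions for illustration.

```python
# Illustrative decision matrix combining the traditional and AI scores.
# Only the 520 traditional cutoff is from the paper; the rest is assumed.
def lending_decision(traditional_score: float, ai_score: float,
                     cutoff_traditional: float = 520,
                     cutoff_ai: float = 520) -> str:
    trad_approve = traditional_score >= cutoff_traditional
    ai_approve = ai_score >= cutoff_ai

    if trad_approve and ai_approve:          # models agree: straightforward approval
        return "approve"
    if not trad_approve and not ai_approve:  # models agree: straightforward denial
        return "deny"
    # Disagreement: the AI model may overturn the traditional recommendation,
    # or the case may go to "pending" so the bank can collect more information
    # and re-run the evaluation (hypothetical tie-break below).
    if abs(traditional_score - ai_score) > 100:  # large gap: defer the decision
        return "pending"
    return "approve" if ai_approve else "deny"   # otherwise follow the AI model
```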
The second challenge comes from the selection effect at the product level. We explicitly discussed the motivation for deploying the AI model on the treated product, and the focal bank explained that they had a plan to apply AI models to all personal loan products (including the control product). The treated product was selected as a pilot because the business team was more open to new technologies (actually, another AI model has been under development for the control product since 2022). This selection was not 100% random, but it had no direct or clear connection with financial inclusion (see Table A1 in the Appendix for the empirical evidence).

After carefully reviewing the institutional background and cleaning the data set, we ended up with a sample containing 9,300,107 applications. Using this sample, we examined the changes brought by the AI model after its implementation into the evaluation process. The treated loan product serves as the treatment group, whereas a similar loan product serves as the control group. We used Equation (1) to investigate how the adoption of the AI model influenced the approval rate of loan applicants. The unit of analysis is each loan application, and the sample period is from October 1, 2020, to April 30, 2021.

Approved_ijt = β0 + β1 AI Model_jt + β2 Applicant Characteristics_i + β3 Model Updates_jt + Time_t + Product_j + ε_ijt   (1)

Approved_ijt denotes whether a loan application was approved, with 1 indicating approval. Because the unit of analysis is each application, i denotes each application, j denotes the product (i.e., the treated product or control product), and t denotes the day when the application was evaluated. Once we know the application i, we know the product type j and the application date t. However, we used all three subscripts together to make Equation (1) clearer. AI Model_jt is the key independent variable in this equation, denoting whether the AI model has been adopted in the evaluation process. For the control product, this variable always takes the value of 0. For the treated product, this variable takes the value of 1 after the adoption of the AI model. Before the adoption, the value is 0. We controlled for four sets of variables to make the estimation more precise, including (1) Applicant Characteristics_i, which covers applicant/borrower financial and demographic information, e.g., income, credit history, gender, age, etc.; (2) Model Updates_jt, which covers the two minor updates on the traditional model and one update on the AI model after its adoption. Two traditional model updates occurred on November 15, 2020, and December 13, 2020, while the AI model update occurred on April 17, 2021; (3) Time_t, which captures the time-fixed effects at the daily level; (4) Product_j, which captures the product-fixed effects, i.e., the general long-lasting differences between the treated and control products. In this DID setting, we essentially compared the change in performance in the treated group before and after the adoption of the AI model versus the change in performance in the control group before and after the same time period.

In addition to the approval rate, we were also interested in how the AI model influences lending performance. Therefore, we followed the logic in Equation (1) and applied Equation (2) to answer this question. Equation (2) is the same as Equation (1) other than two differences: first, the unit of analysis is at the loan level rather than the loan application level because we could observe lending performance on the approved loans only; second, the key dependent variables are a loan's default status and its utilization level. Loan default is a dummy variable, with 1 indicating default (i.e., repayment was late for over one month). Utilization level represents the extent to which the borrower used the allocated credit allowance. It is measured by the outstanding loan a borrower had in the immediate one month after approval divided by the credit allowance. Because the interest rate varied over time and was unrelated to individual borrowers' characteristics, the time-fixed effects absorb the effect of the loan interest rate. Credit allowance was added as a control variable.

Loan Performance_ijt = β0 + β1 AI Model_jt + β2 Applicant Characteristics_i + β3 Model Updates_jt + β4 Credit Allowance_i + Time_t + Product_j + ε_ijt   (2)
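As a concrete illustration, Equation (1) can be estimated as a linear probability model with daily and product fixed effects and standard errors clustered at the application-day level, as in Tables 4 and 5. The sketch below uses statsmodels and hypothetical column names; the actual control set in the paper is richer.

```python
# DID estimation sketch for Equation (1); column names are hypothetical.
import statsmodels.formula.api as smf

def estimate_did(df):
    # ai_model = 1 for treated-product applications evaluated after Feb 4, 2021
    formula = (
        "approved ~ ai_model"
        " + has_income + income_imputed + has_credit_history"
        " + bank_balance + female + age"                  # applicant characteristics
        " + update_nov15 + update_dec13 + update_apr17"   # model updates
        " + C(app_day) + C(product)"                      # time and product fixed effects
    )
    model = smf.ols(formula, data=df)
    # Cluster standard errors at the application-day level, as in the paper
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["app_day"]})

# Equation (2) follows the same pattern on the approved-loan subsample, with
# default or utilization as the outcome and credit allowance as a control.
```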
Table 4. The Impacts of the AI Model on the Approval Rate and Lending Performance: Whole Population
Sample Whole Population
Dependent variables Approval (1/0) Default (1/0) Utilization Level
The initial launch of the AI model -0.008 (0.005) -0.008*** (0.003) 0.058** (0.027)
AI model update (Apr 17) 0.033*** (0.004) -0.003** (0.001) 0.032*** (0.011)
Traditional process update (Nov 15) -0.035*** (0.011) -0.014*** (0.004) 0.012 (0.015)
Traditional process update (Dec 13) -0.007 (0.007) -0.011*** (0.003) 0.023* (0.013)
Has income (1/0) 0.009*** (0.002) -0.006*** (0.001) 0.016*** (0.002)
Income imputed 0.006*** (0.002) -0.002*** (0.001) 0.003* (0.000)
Has credit history (1/0) 0.016*** (0.005) -0.007* (0.004) 0.005*** (0.001)
Number of applications imputed 0.000 (0.001) 0.004*** (0.000) 0.011*** (0.004)
Number of rejections imputed -0.010*** (0.003) 0.005*** (0.002) 0.013*** (0.001)
Bank account balance 0.004*** (0.002) -0.009*** (0.003) -0.003*** (0.001)
Gender (Female = 1) -0.003*** (0.001) -0.005*** (0.002) -0.013*** (0.001)
Age -0.006*** (0.002) -0.007** (0.004) 0.003*** (0.001)
Loan amount/credit allowance N.A. 0.000 (0.001) -0.003*** (0.001)
Time fixed effects √ √ √
Product fixed effects √ √ √
Adjusted R2 0.035 0.016 0.015
No. of observations 9,300,107 4,368,626 4,368,626
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level. Adjusted R2 was calculated
without including the fixed effects. Considering that not every applicant has income information or credit history, we used the dummy variables
"has income" / "has credit history" to denote if this information was missing. We further imputed missing income data or credit history data with
0. As a result, the coefficient of "has income" reflects the impact of the presence of verified income data on lending decisions, and the coefficient
of "income imputed" reflects the impact of the true/reported income (conditional on “has income” being 1). We included both in our model
because they had different implications ("has income" estimated the importance of having income data in the full sample, whereas "income
imputed" estimated the importance of income amount among the subsample who had income data).
Table 5. The Impacts of the AI Model on the Approval Rate and Lending Performance: Subsample Analysis
Sample: Regular population (first three columns); Underserved population (last three columns)
Dependent variables: Approval (1/0), Default (1/0), Utilization Level (for each population)
The initial launch of the AI model (Feb 4): -0.006 (0.007), -0.008*** (0.003), 0.057** (0.027); 0.150*** (0.012), -0.009*** (0.002), 0.058** (0.028)
AI model update (Apr 17): 0.030** (0.005), -0.003*** (0.001), 0.032*** (0.011); 0.030*** (0.006), -0.007*** (0.001), 0.032*** (0.010)
Traditional process update (Nov 15): -0.126*** (0.033), -0.015*** (0.005), 0.011 (0.015); -0.013 (0.024), -0.010** (0.004), 0.017 (0.018)
Traditional process update (Dec 13): -0.012 (0.024), -0.011*** (0.003), 0.024* (0.013); -0.033*** (0.010), -0.014*** (0.003), 0.022 (0.014)
Has income (1/0): 0.013*** (0.003), -0.007*** (0.001), 0.020*** (0.004); 0.009*** (0.003), 0.003* (0.002), 0.009*** (0.002)
Income imputed: 0.007*** (0.002), -0.003*** (0.001), 0.003*** (0.000); 0.000 (0.001), 0.003*** (0.001), 0.002*** (0.000)
Has credit history (1/0): 0.010*** (0.002), -0.008*** (0.000), 0.004*** (0.001); 0.020*** (0.005), -0.004*** (0.001), 0.006** (0.004)
Number of credit applications imputed: 0.000 (0.000), 0.004*** (0.000), 0.007* (0.005); 0.000 (0.000), 0.001*** (0.000), 0.011*** (0.002)
Number of credit rejections imputed: -0.012*** (0.002), 0.006*** (0.001), 0.013*** (0.002); -0.004*** (0.001), 0.003*** (0.000), 0.015*** (0.004)
Bank account balance: 0.004*** (0.002), -0.009*** (0.002), -0.004*** (0.008); 0.007*** (0.002), -0.004*** (0.001), -0.002*** (0.000)
Gender = Female: -0.002*** (0.001), -0.004*** (0.001), -0.012*** (0.003); -0.006*** (0.002), -0.009*** (0.004), -0.016*** (0.002)
Age: -0.005*** (0.002), -0.008*** (0.002), 0.004*** (0.001); -0.006*** (0.002), -0.004*** (0.001), 0.002 (0.002)
Loan amount: N.A., 0.000 (0.000), -0.008*** (0.001); N.A., 0.003* (0.002), 0.002 (0.001)
Time fixed effects: √ in all columns
Product fixed effects: √ in all columns
Adjusted R2: 0.050, 0.015, 0.013; 0.024, 0.011, 0.021
No. of observations: 7,927,519, 4,016,328, 4,016,328; 1,372,588, 352,297, 352,297
Notes: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.
The improvement in financial inclusion occurred against the backdrop of the traditional model, which evaluated applicants using strong signals only. Because the underserved population does not necessarily possess such strong signals, the traditional model often categorically rejects them. This is a form of statistical discrimination (Fang & Moro, 2011). The AI model addressed this issue by incorporating weak signals and using advanced machine learning algorithms, thus providing more accurate estimations of applicants' creditworthiness. Our results show that the AI model identified some "bad" regular applicants and denied access to capital for them, leading to a slight (though insignificant) reduction in the approval rate and a substantial reduction in the default rate. At the same time, the AI model increased the approval rate and decreased the default rate for the underserved population. Our results are consistent with the AI model update, which further increased the approval rate and decreased the default rate for the underserved population.

The findings from Tables 4 and 5 suggest that the AI model can better distinguish good applicants from bad applicants. This has two implications: first, the approved applicants should have a lower default rate, which we have already seen; second, the rejected applicants should have a higher default rate if approved. In other words, the default rate of the rejected population should be positively related to the adoption of the AI model. However, their loan performance would have no ground truth because their applications would have been rejected. To approximate their loan performance, we identified applicants rejected by the treated product or the control product but approved by other financial products within one week.9 We set one week as the
9 Most applicants rejected by the treated or control products failed to get a loan elsewhere. Only 2.3% of them could get approved by other products within seven days of the application date.
time interval to increase the likelihood that these applicants' creditworthiness would not change. Using their default status with other financial products as a proxy, we conducted a DID analysis of the impact of the AI model on the lending performance of the rejected population. Table 6 reports the results and shows that the applicants rejected by the treated product are associated with a higher default rate after the adoption of the AI model. In addition, the applicants rejected by the treated product are also associated with lower utilization levels after the adoption of the AI model.

Our main findings suggest that the AI model simultaneously reduces Type I errors (false approval) and Type II errors (false denial), consistent with our theoretical expectation that the AI model reduces statistical discrimination against the underserved population.

Mechanisms

Prediction Accuracy

To understand the seemingly contradictory finding that the AI model simultaneously increases the approval rate for the underserved population, reduces the default rate, and increases the utilization level for the whole population, we note that an essential requirement of the AI model is that it offers a more accurate creditworthiness prediction model for both the regular and underserved populations. To assess the prediction accuracy of the AI model, we provide descriptive statistics on the approval rate and the default rate by looking at the distributions of credit score brackets generated by the traditional and AI models in Table 7.

Several interesting findings emerge from Table 7. First, the traditional model often underestimated the underserved population. For example, the traditional model assigned only 6.8% of the underserved population a credit score between 620 and 850. However, this number was 15.7% for the AI model. Given that the traditional model used an absolute cutoff score (i.e., 520) to approve applicants, the adjustment on credit score distribution by the AI model helped approve more applicants from the underserved population. Second, the credit score from the AI model had better prediction accuracy, which can be seen from the correlation between the credit score and the default rate. The traditional model did a poor job for the underserved population, as the credit score was not closely correlated to the default rate (e.g., the default rate in the 620-850 range is even higher than the default rate in the 520-620 range). The AI model vastly improved prediction accuracy, as the default rate of the above-cutoff population was significantly reduced, and the differences in the default rate between various credit score brackets were significantly enlarged. These two facts collectively explain why the AI model can simultaneously increase the approval rate and decrease the default rate. The more accurate and fair distribution of the credit scores for the underserved population qualifies more applicants from the underserved population and better distinguishes the creditworthy ones from the uncreditworthy ones.

Data Domains

To strengthen and deepen this logic chain, we further investigated the factors in the AI model that contribute to the increase in prediction accuracy. As we mentioned earlier, this increase may come from the use of weak signals (including the features generated from them), advanced machine learning algorithms, or both. We first compared the data foundations for the traditional and AI models, and then estimated each mechanism's magnitude. The traditional model of the treated product mainly relies on human expertise to generate features and rules using strong signals (e.g., Rule 1: the number of overdue credit products is more than 3). Experts assign each rule a score, and the collection of rules hit by each applicant determines the final credit score and lending decision. The traditional model eventually includes about 80 rules built upon 50 features. Unlike the traditional model, the AI model utilizes both strong and weak signals to predict the default rate (see Table 8). Specifically, the traditional model uses Data Domains 1-6. In addition to these strong signals, the AI model also uses social security fund data, provident fund data, in-app financial behavioral data, and in-app nonfinancial behavioral data (i.e., Data Domains 7-10).

As defined earlier, weak signals refer to information that is not directly related to creditworthiness and suffers from noisy patterns and missing data issues. First, none of the weak signals is directly related to creditworthiness. For example, the provident fund can only be used to purchase real estate but cannot be used to pay off credit card bills or other loans; the social security fund can only be withdrawn after retirement and, again, cannot be used to pay off credit card bills or other loans; the in-app financial behaviors include depositing money, paying bills, and reading statements, which have no direct relationship with creditworthiness; in-app nonfinancial behaviors include reading news, checking advertisements, and watching videos, which again have no direct relationship with creditworthiness. Second, human experts have limited ability to infer complex relationships between weak signals and creditworthiness. For example, given the employment status and the nature of their jobs, the underserved population may exhibit various patterns in their social security fund deposits, which makes it difficult for human experts to infer repayment capability. Therefore, making weak signals into rules is largely impractical for traditional models.
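For contrast with the AI model, the traditional process can be pictured as a small rule engine: experts define rules over strong signals, each rule carries a score adjustment, and the rules an applicant "hits" determine the final credit score. The toy sketch below follows that structure; the specific rules and score values are invented for illustration (the real model has about 80 rules over 50 features).

```python
# Toy version of an expert rule-based credit scorer; rules and weights are
# invented, but the structure (hit rules -> score adjustments) mirrors the text.
BASE_SCORE = 600

RULES = [
    # (description, predicate over applicant dict, score adjustment)
    ("overdue credit products > 3", lambda a: a.get("overdue_products", 0) > 3, -90),
    ("no credit history",           lambda a: not a.get("has_credit_history"), -60),
    ("verified stable income",      lambda a: a.get("has_income", False),       40),
]

def traditional_score(applicant: dict) -> int:
    score = BASE_SCORE
    for _, predicate, adjustment in RULES:
        if predicate(applicant):   # applicant "hits" this rule
            score += adjustment
    return score

# An applicant lacking credit history is penalized regardless of weak signals,
# which is the statistical-discrimination pattern the AI model relaxes.
print(traditional_score({"overdue_products": 0, "has_credit_history": False}))
```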
Table 8 above also demonstrates the disadvantages of the underserved population when the traditional model is used. More than half of the underserved population has no information on Data Domains 4-6 (which are from credit history data); even for those with data, the data are still limited, as indicated by the number of features the data can generate. In contrast, the underserved population has better availability of weak signals (i.e., Data Domains 7-10), and they have a similar number of features generated from weak signals as the regular population.10

… in which we used strong signals only to train a new AI model with loans approved before February 4, 2021. We report the results in Table A4. The results show that the use of advanced algorithms enables AI to achieve higher prediction accuracy, even when the AI and traditional models use the same set of information. The deep root of this improvement stems from the advanced feature generation techniques and the complex yet reliable connections between features and creditworthiness. Table A5 demonstrates this feature generation process and provides further details and evidence.

10 A high proportion of the underserved population has weak signals because they are using the focal bank's app to deposit social security and provident funds or pay utility fees (the focal bank is the largest bank in that province handling these services). Members of the underserved population who are not customers of the focal bank will not have weak signals in this regard, and applying the AI model would not improve financial inclusion for them. Some other actions are needed to provide access to capital to this subgroup.

11 AUC stands for "area under the ROC curve," which is a measure of the prediction accuracy of classification models. ROC stands for "receiver operating characteristic curve," which is a graph showing the true positive rate against the false positive rate at all classification thresholds. Higher AUC values indicate higher prediction accuracy.
Table 9. Why Does the AI Model Have Higher Prediction Accuracy: Weak Signals and Advanced Algorithms
Models: (1) Strong signals + Traditional model; (2) Strong signals + Weak signals + Traditional model; (3) Strong signals + AI model; (4) Strong signals + Weak signals + AI model
Regular population: (1) Baseline, (2) +114%, (3) +186%, (4) +229%
Underserved population: (1) Baseline, (2) +143%, (3) +229%, (4) +357%
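Table 9's grid can be read as follows: hold the training sample fixed, vary the feature set (strong signals only vs. strong plus weak) and the model family, and compare prediction accuracy, with footnote 11 suggesting AUC as the accuracy measure. The sketch below reproduces that grid in spirit, approximating the traditional rule-based model with a simple linear learner; the feature-column lists are hypothetical.

```python
# Sketch of the Table 9 grid: accuracy (AUC) by feature set x model family.
# Column lists are hypothetical; LogisticRegression stands in for the
# traditional model, which is actually rule-based.
import lightgbm as lgb
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def auc_grid(X_train, y_train, X_val, y_val, strong_cols, weak_cols):
    configs = {
        "strong + simple model":      (strong_cols, LogisticRegression(max_iter=1000)),
        "strong+weak + simple model": (strong_cols + weak_cols, LogisticRegression(max_iter=1000)),
        "strong + AI model":          (strong_cols, lgb.LGBMClassifier()),
        "strong+weak + AI model":     (strong_cols + weak_cols, lgb.LGBMClassifier()),
    }
    results = {}
    for name, (cols, model) in configs.items():
        model.fit(X_train[cols], y_train)
        pred = model.predict_proba(X_val[cols])[:, 1]
        results[name] = roc_auc_score(y_val, pred)
    return results  # compare each AUC against the strong + simple baseline
```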
Additional Predictive Power of Weak Signals

To further understand the value of weak signals, we explored two aspects of weak signals: their predictive power on default likelihood and the relationships between weak signals and strong signals. Table 10 reports the importance/power of each data domain in explaining default likelihood. We employed a single-domain explainability approach, which uses features from only one data domain to train the AI model each time. The performance reflects the highest potential predictive power of each data domain. The results suggest that weak signals contain reasonable predictive power, which is close to the power contained by strong signals. This predictive power is the foundation for weak signals' value in improving prediction accuracy.

To assess the relationship between weak signals and strong signals in predicting default, we focused on correlation scores. However, we note that within each data domain there can be dozens or hundreds of features. Therefore, we first applied a principal-component analysis (PCA) transformation and used the primary principal component to capture the maximum variation in all features from the same domain. We then used the primary component from each data domain to produce a correlation table. Table 11 shows the results. Strong signals are correlated mainly among themselves, but weak signals are not highly correlated to strong signals. An exception is that social security fund data and provident fund data are highly correlated with each other, and both are highly correlated to identity information and income and asset information. However, they are not highly correlated with other strong signals. In-app financial behavioral data and in-app nonfinancial behavioral data are highly correlated, but neither of them is correlated to any other data domains, including both strong signals and other weak signals. To summarize, Tables 10 and 11 collectively support that weak signals provide valuable and additional information beyond strong signals in predicting creditworthiness.
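The two analyses just described — single-domain predictive power (Table 10) and cross-domain correlations of first principal components (Table 11) — can be sketched as follows, with a hypothetical domain-to-column mapping.

```python
# Sketch of the single-domain explainability and PCA-correlation analyses.
# The domain -> columns mapping is hypothetical.
import pandas as pd
import lightgbm as lgb
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score

def single_domain_power(X_train, y_train, X_val, y_val, domains):
    # Train one model per data domain; its AUC bounds that domain's
    # standalone predictive power on default likelihood (Table 10 logic).
    return {
        name: roc_auc_score(
            y_val,
            lgb.LGBMClassifier().fit(X_train[cols], y_train)
               .predict_proba(X_val[cols])[:, 1],
        )
        for name, cols in domains.items()
    }

def domain_correlations(X, domains):
    # Reduce each domain to its first principal component, then correlate
    # the components across domains (Table 11 logic).
    components = {
        name: PCA(n_components=1).fit_transform(X[cols].fillna(0))[:, 0]
        for name, cols in domains.items()
    }
    return pd.DataFrame(components).corr()
```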
Heterogeneous Impacts and Statistical Discrimination

Although the adoption of the AI model increases the average approval rate for the underserved population, it may have unequal effects across the underserved population. It is also possible for the AI model to hurt certain subgroups in the underserved population. We first investigated the heterogeneous impacts of the AI model on five subgroups categorized based on the five criteria of the underserved population. We reran the basic DID model within each subsample and report the coefficients of "the initial launch of the AI model" on the approval rate and default rate in Table 12. The AI model increases the approval rate for four subgroups (except for applicants without permanent residence) and reduces the default rate for all five subgroups. The results indicate that the AI model largely improves financial inclusion across the underserved population, except those with no permanent residence, and helps banks better understand the creditworthiness of the underserved population.

As one key engine of the AI model is the use of weak signals, we further analyzed the impact of the AI model on subgroups with incomplete or missing weak signals. We generated dummy variables to represent the status of missing data in each data domain and interacted them with "the initial launch of the AI model" in Equation (1) to investigate how the impact of the AI model on the approval rate varies with missing data. Because in-app financial behavioral data and in-app nonfinancial behavioral data come from the same data source, we combined them to generate one dummy variable. We also generated one dummy variable to represent cases in which all weak signals were missing. Table 13 reports the results.

The negative coefficients of the interaction terms indicate that missing weak signals generally reduce the impact of the AI model on the approval rate. However, because the main effect of the AI model is positive for the underserved population, subgroups with missing data are still better off, and the absolute impact of the AI model on the approval rate is still positive and significant. One potential reason is that all data domains of weak signals help the AI model better understand the creditworthiness of the underserved population, and most of the time, they help correct the underestimated credit score generated by the traditional model. When one data domain is missing, the other data domains can still provide helpful information to the AI model.

The ability of the AI model to enhance financial inclusion for the underserved population comes from its improved prediction accuracy, which comes from the use of weak signals and advanced algorithms. As a result, the AI model reduces reliance on strong signals, which often favor the regular population and disadvantage the underserved population. To further investigate the AI model's ability to reduce statistical discrimination, we focused on three strong features that are considered important in traditional loan evaluation: urban household designation, self-employed status, and the availability of credit history. These three features are also the key differentiators between the regular and underserved populations. We posited that the importance of these features in underwriting would be reduced after the launch of the AI model. To assess this proposition, we reran Equation (1), including interaction terms between "the initial launch of the AI model" and these three features. The coefficients of these interaction terms indicate how the approval likelihood changes for each applicant type. Table 14 reports the results.

Table 14 shows that applicants who were not self-employed, were living in urban areas, and had a credit history had a higher likelihood of being approved before the adoption of the AI model. After the adoption of the AI model, their advantages shrank significantly within both the regular and underserved populations, as indicated by the fact that the coefficients of these interaction terms all have opposite signs. This effect is even more substantial for the underserved population: the approval advantage/preference shrank by 47.1% (i.e., -0.008/0.017) for employed applicants, 52.0% (i.e., -0.013/0.025) for applicants living in urban areas, and 27.3% (i.e., -0.003/0.011) for applicants with a credit history. Although applicants who were self-employed, living in nonurban areas, and lacking a credit history were still considered unfavorable compared to applicants who were employed, living in urban areas, and having a credit history, the bias toward them (or the disadvantage faced by them) was reduced significantly by the adoption of the AI model.

Robustness Checks

One prerequisite for a valid DID strategy is the parallel trends assumption. The treatment and control groups should follow similar trends on key dependent variables to enhance the likelihood that the post-treatment change does come from the treatment itself. We tested this assumption using a leads/lags model (relative time model), as shown by Equation (3).

Y_ijt = β0 + Σ_{τ=−5}^{−2} β_τ AI Model_jτ + Σ_{τ=0}^{+5} β_τ AI Model_jτ + β2 Applicant Characteristics_i + β3 Model Updates_jt + Time_t + Product_j + ε_ijt   (3)
Table 13. Heterogeneous Impacts of the AI Model on Subgroups with Missing Weak Signals
Dependent variable: Approval (1/0). Each specification interacts "the initial launch of the AI model" with one missing-data dummy.

Regular population
  Missing data domain              AI model × Missing     The initial launch    Adjusted R2
  (1) Social security fund data     -0.003*** (0.001)      -0.008 (0.007)          0.049
  (2) Provident fund data           -0.002*   (0.001)      -0.008 (0.006)          0.051
  (3) App usage data                 0.002    (0.002)      -0.007 (0.006)          0.047
  (4) All weak signals              -0.004*** (0.001)      -0.008 (0.006)          0.052

Underserved population
  Missing data domain              AI model × Missing     The initial launch    Adjusted R2
  (5) Social security fund data     -0.005*** (0.002)       0.132*** (0.004)        0.028
  (6) Provident fund data           -0.020*** (0.008)       0.141*** (0.004)        0.027
  (7) App usage data                -0.005*** (0.001)       0.132*** (0.003)        0.028
  (8) All weak signals              -0.080**  (0.027)       0.131*** (0.007)        0.025

All columns include other controls, time fixed effects, and product fixed effects. No. of observations: 7,927,519 (regular population); 1,372,588 (underserved population).
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.
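For readers who want to run this type of specification on their own data, the following Python sketch estimates a DID regression with one missing-weak-signal interaction, in the spirit of Table 13. The file name, column names, and control set are hypothetical placeholders, not the paper's actual variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical application-level data: one row per loan application.
df = pd.read_csv("applications.csv")

# DID treatment term: treated-product applications after the AI model launch.
df["ai_model"] = df["treated_product"] * df["post_launch"]

# Equation (1) augmented with one missing-data interaction, estimated within
# the underserved subsample (as in columns (5)-(8) of Table 13).
sub = df[df["underserved"] == 1]
fit = smf.ols(
    "approved ~ ai_model + ai_model:missing_provident_fund"
    " + age + annual_income + C(week) + C(product)",
    data=sub,
).fit(cov_type="cluster", cov_kwds={"groups": sub["application_day"]})

print(fit.params[["ai_model", "ai_model:missing_provident_fund"]])
```

A negative interaction coefficient alongside a positive main effect would reproduce the Table 13 pattern: missing weak signals attenuate, but do not eliminate, the AI model's positive impact on approval.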
Table 14. The Impacts of the AI Model on Approval Preference: Heterogeneity Analysis Based on Three Distinction Features
Dependent variable: Approval (1/0). Each specification interacts "the initial launch of the AI model" with one of the three strong features.

Regular population
  Feature              AI model × Feature     Feature (1/0)         The initial launch    Adjusted R2
  (1) Self-employed      0.006*** (0.002)     -0.010*** (0.002)      -0.007 (0.006)          0.062
  (2) Urban household   -0.001*   (0.001)      0.021*** (0.007)      -0.007 (0.006)          0.058
  (3) Credit history    -0.000    (0.000)      0.013*** (0.004)      -0.007 (0.006)          0.065

Underserved population
  Feature              AI model × Feature     Feature (1/0)         The initial launch    Adjusted R2
  (4) Self-employed      0.008*** (0.001)     -0.017*** (0.005)       0.131*** (0.003)        0.030
  (5) Urban household   -0.013*** (0.002)      0.025*** (0.009)       0.132*** (0.003)        0.031
  (6) Credit history    -0.003*** (0.000)      0.011*** (0.000)       0.131*** (0.003)        0.031

All columns include other controls, time fixed effects, and product fixed effects. No. of observations: 7,927,519 (regular population); 1,372,588 (underserved population).
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.
Table 15. The Impacts of the AI Model on the Approval Rate and Lending Performance: Relative Time Model

                                  Regular population                      Underserved population
Dependent variable       Approval    Default    Utilization       Approval    Default    Utilization
                          (1/0)       (1/0)        level           (1/0)       (1/0)        level
Relative time -5          0.009       0.006      -0.049***         0.001       0.008      -0.040***
  or beyond              (0.015)     (0.005)      (0.009)         (0.003)     (0.006)      (0.008)
Relative time -4          0.009      -0.004       -0.003          -0.015      -0.002       -0.007
                         (0.010)     (0.003)      (0.007)         (0.016)     (0.004)      (0.012)
Relative time -3          0.010      -0.003       -0.008          -0.013       0.000       -0.010
                         (0.015)     (0.003)      (0.012)         (0.008)     (0.002)      (0.018)
Relative time -2          0.003      -0.001        0.003           0.003      -0.005        0.003
                         (0.010)     (0.001)      (0.002)         (0.008)     (0.003)      (0.004)
Relative time -1                                       Baseline
Relative time 0          -0.013*     -0.005        0.027***        0.010      -0.007**      0.013**
                         (0.009)     (0.004)      (0.004)         (0.009)     (0.004)      (0.005)
Relative time 1           0.008      -0.001        0.037***       -0.011      -0.008        0.028***
                         (0.013)     (0.002)      (0.013)         (0.007)     (0.007)      (0.007)
Relative time 2           0.030***    0.004***    -0.044***        0.062***   -0.007***    -0.024***
                         (0.003)     (0.002)      (0.002)         (0.008)     (0.001)      (0.004)
Relative time 3           0.016***    0.001       -0.052***        0.088***   -0.009***     0.013
                         (0.004)     (0.001)      (0.025)         (0.029)     (0.003)      (0.012)
Relative time 4          -0.004*      0.000        0.053***        0.133***   -0.008***     0.023
                         (0.003)     (0.001)      (0.019)         (0.015)     (0.003)      (0.017)
Relative time +5         -0.009      -0.004**      0.051***        0.144***   -0.010***     0.053***
  or beyond              (0.007)     (0.001)      (0.014)         (0.013)     (0.001)      (0.008)
Other controls              √           √            √               √           √            √
Time fixed effects          √           √            √               √           √            √
Product fixed effects       √           √            √               √           √            √
Adjusted R2               0.053       0.017        0.129           0.025       0.011        0.155
No. of observations     7,927,519   4,016,328    4,016,328       1,372,588    352,297      352,297
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.
Equation (3) extends Equation (1) with one change: it splits the AI Model variable into multiple relative time dummies based on how far the current observation is from the adoption time of the AI model, measured in weeks (we used week dummies rather than day dummies to smooth the trend). These relative time dummies are always 0 for applications of the control product; exactly one of them equals 1 for each application of the treated product. When τ is negative, the observation is τ weeks before the adoption date. We collapsed all observations five or more weeks before adoption into the -5 relative time dummy and removed the -1 relative time dummy to serve as the baseline. When τ is nonnegative, the observation is τ weeks after the adoption date. We likewise collapsed all observations five or more weeks after adoption into the +5 relative time dummy. The nonnegative relative time dummies estimate how soon the AI model takes effect and whether the effect persists. Table 15 reports the results.

We detected no heterogeneous trends or significant differences between the treated product and the control product for the approval rate and default rate before the adoption of the AI model. The utilization level showed one significant difference between the treated and control products, but it was far from the adoption date (i.e., 5+ weeks) and is unlikely to have confounded our findings. These results support the parallel trends assumption and alleviate the concern that the selection of the treated product, rather than the adoption of the AI model, leads to the enhancement in financial inclusion.

Although the control product appears to be a good counterfactual based on the results of the relative time model, there are other concerns about the self-selection of the AI model at different levels and the confounding effect of the focal bank's other actions. We summarize these concerns in Table 16 and address each of them. In a nutshell, our main findings hold across multiple robustness checks.
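As a concrete illustration of the relative time dummies in Equation (3), the following Python sketch constructs the dummies and runs the event-study regression. The column names, control set, and adoption date are illustrative assumptions, not the paper's actual data.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applications.csv", parse_dates=["app_date"])
ADOPTION = pd.Timestamp("2021-03-01")  # illustrative, not the actual date

# Weeks relative to adoption, collapsed into [-5, +5] as in the paper.
weeks = ((df["app_date"] - ADOPTION).dt.days // 7).clip(-5, 5)

# Dummies are always 0 for the control product; exactly one equals 1 for
# each treated-product application. tau = -1 is the omitted baseline.
taus = [-5, -4, -3, -2, 0, 1, 2, 3, 4, 5]
cols = []
for tau in taus:
    col = f"rel_m{abs(tau)}" if tau < 0 else f"rel_p{tau}"
    df[col] = ((weeks == tau) & (df["treated_product"] == 1)).astype(int)
    cols.append(col)

fit = smf.ols(
    f"approved ~ {' + '.join(cols)} + age + annual_income + C(week) + C(product)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["application_day"]})

# Pre-adoption dummies near zero support the parallel trends assumption.
print(fit.params.filter(like="rel_"))
```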
12 It is also challenging from an ethical perspective, but that is beyond the scope of our paper.
13 [Link]
Discussion and Conclusion

Financial inclusion is essential for members of the underserved population, who are often from rural areas, are self-employed, and lack a credit history. Enhancing financial inclusion is a major task for promoting social justice because access to capital may influence opportunities for education, healthcare, employment, housing, and more. Common solutions, such as opening more branches in rural areas or reducing requirements for the underserved population, can help enhance financial inclusion but often come with high operational costs or default risk. We examined the financial inclusion issue from another perspective by investigating the role of modern information technologies. More specifically, we investigated the impacts of an AI-enabled credit scoring model on the underserved population's approval rate, default rate, and utilization level. We collaborated with a regional bank in China and took advantage of a rare event in which the focal bank deployed an AI model to work with its traditional model to evaluate one of its personal loan products. This setting allowed us to identify a similar personal loan product during the same period as a control group and apply a DID strategy to estimate the impacts of the AI model.
Our findings suggest that the AI model enhances financial inclusion without sacrificing loan performance. The AI model significantly increases the approval rate for the underserved population while decreasing the default rate and increasing the utilization level for the whole population. These results are encouraging because they help solve the persistent dilemma of how to balance financial inclusion and loan performance: the AI model improves financial inclusion without hurting lending performance. This is feasible because the AI model can leverage weak signals and advanced algorithms, thus reducing statistical discrimination against the underserved population and improving their access to capital.

Our study also delves into the underlying mechanisms of why AI models can simultaneously increase the approval rate and reduce the default rate for the underserved population. The direct reason is that the AI model improves the prediction accuracy of the evaluation/underwriting process. Weak signals and advanced algorithms are the two engines for improving prediction accuracy. Human experts and advanced techniques can generate novel and meaningful features from weak signals, which are then used by machine learning algorithms to predict creditworthiness. Our empirical analysis finds that both weak signals and advanced algorithms contribute to prediction accuracy and that the combination of the two brings the largest improvement, especially for the underserved population.

Our study contributes to the social justice literature by investigating one of its foundations: financial inclusion. We focused on one potential solution enabled by modern AI technologies and found that AI models can enhance financial inclusion without hurting lending performance. Our study also contributes to the AI and consumer lending literature by revealing the impacts of an AI model on financial inclusion. Although previous studies have documented the value of machine learning models and weak signals in improving credit scoring models, few studies have considered their impact on financial inclusion (Di Giuseppe, 2021; Liang et al., 2018; Liu, 2022). Leveraging a unique opportunity arising when a financial institution introduced an AI model to one of its personal loan products, we were able to use a DID identification strategy to assess the impacts of the AI model on financial inclusion for the underserved population.

While our analysis demonstrates the value of AI models in reducing statistical discrimination against the underserved population and improving their financial inclusion, it is important to recognize the limitations of AI models. First, AI models cannot address structural discrimination, which can cause inequity in wealth distribution, access to education, or career advancement and in turn affects applicants' creditworthiness. Such structural discrimination would require systematic policy interventions. Second, our analysis shows that the effect of the AI model on financial inclusion is weaker for subgroups with missing weak signals and that less sophisticated AI models have a weaker impact on financial inclusion. Future studies could seek to better understand the heterogeneous impacts of various AI models. It would also be interesting for future studies to investigate the impacts of AI models on other financial products. Third, the "black-box" problem of AI models and the use of personal data should be handled better by financial institutions. Policymakers should work on actionable AI regulations to deal with data privacy and financial exclusion issues.

Acknowledgments

We thank Dr. Mingjie Zhu, the chairman and CEO of CraiditX, for his unwavering support of our six-year-long project, including the provision of manpower, materials, and other resources. Our gratitude also extends to the co-editors of the special issue, Min-Seok Pang, Atreyi Kankanhalli, Margunn Aanestad, Sudha Ram, and Likoebe M. Maruping, as well as to the associate editor, Jennifer Jie Zhang, the transparency editor, and the reviewers for their constructive and developmental feedback throughout the review process. Chunxiao Li expresses gratitude for the support from the National Natural Science Foundation of China (NSFC) [Grant 72121001]. Hongchang Wang is the corresponding author of this paper.

References

Abrahams, C. R., & Zhang, M. (2008). Fair lending compliance: Intelligence and implications for credit risk management. John Wiley & Sons.
Agarwal, S., Alok, S., Ghosh, P., & Gupta, S. (2020). Financial inclusion and alternate credit scoring for the millennials: Role of big data and machine learning in fintech. SSRN. [Link]
Armstrong, C., Craig, B., Jackson, W. E., & Thomson, J. B. (2013). The moderating influence of financial market development on the relationship between loan guarantees for SMEs and local market employment rates. Journal of Small Business Management, 52(1), 126-140. [Link]
Autor, D. (2014). Polanyi's paradox and the shape of employment growth (NBER Working Paper No. 20485). National Bureau of Economic Research. [Link]
Awaworyi-Churchill, S. (2019). Microfinance financial sustainability and outreach: Is there a trade-off? Empirical Economics, 59(3), 1329-1350. [Link]
Bao, Z., & Huang, D. (2021). Shadow banking in a crisis: Evidence from Fintech during COVID-19. Journal of Financial and Quantitative Analysis, 56(7), 2320-2355. [Link]
Bartlett, R., Morse, A., Stanton, R., & Wallace, N. (2022). Consumer-lending discrimination in the fintech era. Journal of Financial Economics, 143(1), 30-56. [Link]
Brown, J. R., Cookson, J. A., & Heimer, R. Z. (2019). Growing up without finance. Journal of Financial Economics, 134(3), 591-616. [Link]
Bucker, M., Szepannek, G., Gosiewska, A., & Biecek, P. (2022). Transparency, auditability, and explainability of machine learning models in credit scoring. Journal of the Operational Research Society, 73(1), 70-90. [Link]
Burtch, G., & Chan, J. (2019). Investigating the relationship between medical crowdfunding and personal bankruptcy in the United States: Evidence of a digital divide. MIS Quarterly, 43(1), 237-262. [Link]
Cai, L., & Zhu, Y. (2015). The challenges of data quality and data quality assessment in the big data era. Data Science Journal, 14(2), 1-10. [Link]
Chen, Z., Liu, Y. J., Meng, J., & Wang, Z. (2023). What's in a face? An experiment on facial information and loan-approval decision. Management Science, 69(4), 2263-2283. [Link]
Cowgill, B., Dell'Acqua, F., Deng, S., Hsu, D., Verma, N., & Chaintreau, A. (2020). Biased programmers? Or biased data? A field experiment in operationalizing AI ethics. SSRN. [Link]
Cull, R., Demirgüç-Kunt, A., & Morduch, J. (2011). Microfinance trade-offs: Regulation, competition and financing. In B. Armendáriz & M. Labie (Eds.), The handbook of microfinance (pp. 141-157). World Scientific. [Link]
Di Giuseppe, D. (2021). Credit scoring model using machine learning (Working paper). [Link]
Dobbie, W., Liberman, A., Paravisini, D., & Pathania, V. (2021). Measuring bias in consumer lending. The Review of Economic Studies, 88(6), 2799-2832. [Link]
Dubber, M. D., Pasquale, F., & Das, S. (2020). The Oxford handbook of ethics of AI. Oxford University Press.
Engel, K. C., & McCoy, P. A. (2001). A tale of three markets: The law and economics of predatory lending. SSRN. [Link]
Fang, H., & Moro, A. (2011). Theories of statistical discrimination and affirmative action: A survey. In J. Benhabib, A. Bisin, & M. O. Jackson (Eds.), Handbook of social economics (pp. 133-200). North-Holland. [Link]
Fu, R., Huang, Y., & Singh, P. V. (2021). Crowds, lending, machine, and bias. Information Systems Research, 32(1), 72-92. [Link]
Fügener, A., Grahl, J., Gupta, A., & Ketter, W. (2022). Cognitive challenges in human-artificial intelligence collaboration: Investigating the path toward productive delegation. Information Systems Research, 33(2), 678-696. [Link]
Gao, H., Kumar, S., Tan, Y. (Ricky), & Zhao, H. (2022). Socialize more, pay less: Randomized field experiments on social pricing. Information Systems Research, 33(3), 935-953. [Link]
Garfinkel, S., Matthews, J., Shapiro, S. S., & Smith, J. M. (2017). Toward algorithmic transparency and accountability. Communications of the ACM, 60(9), Article 5. [Link]
Gianfrancesco, M. A., Tamang, S., Yazdany, J., & Schmajuk, G. (2018). Potential biases in machine learning algorithms using electronic health record data. JAMA Internal Medicine, 178(11), 1544. [Link]
Godbillon-Camus, B., & Godlewski, C. J. (2005). Credit risk management in banks: Hard information, soft information and manipulation. SSRN. [Link]
Gomber, P., Kauffman, R. J., Parker, C., & Weber, B. W. (2018). On the Fintech revolution: Interpreting the forces of innovation, disruption, and transformation in financial services. Journal of Management Information Systems, 35(1), 220-265. [Link]
Gopal, R. D., Hidaji, H., Kutlu, S. N., Patterson, R. A., & Yaraghi, N. (2023). Law, economics, and privacy: Implications of government policies on website and third-party information sharing. Information Systems Research, 34(4), 1375-1397. [Link]
Gross, J. P. K., Cekic, O., Hossler, D., & Hillman, N. (2010). What matters in student loan default: A review of the research literature. Journal of Student Financial Aid, 39(1), Article 2. [Link]
Gunnarsson, B. R., vanden Broucke, S., Baesens, B., Óskarsdóttir, M., & Lemahieu, W. (2021). Deep learning for credit scoring: Do or don't? European Journal of Operational Research, 295(1), 292-305. [Link]
Hao, K., & Stray, J. (2019). Can you make AI fairer than a judge? Play our courtroom algorithm game. MIT Technology Review. [Link]
Hiller, J. S. (2020). Fairness in the eyes of the beholder: AI, fairness, and alternative credit scoring. West Virginia Law Review, 123(3), 907-935. [Link]
Hou, J., Zhang, J., & Zhang, K. (2023). Pictures that are worth a thousand donations: How emotions in project images drive the success of online charity fundraising campaigns? An image design perspective. MIS Quarterly, 47(2), 535-584. [Link]
Hurley, M., & Adebayo, J. (2016). Credit scoring in the era of big data. Yale Journal of Law & Technology, 18, 148-202.
Hurlin, C., Perignon, C., & Saurin, S. (2022). The fairness of credit scoring models. SSRN. [Link]
Iyer, R., Khwaja, A. I., Luttmer, E. F., & Shue, K. (2016). Screening peers softly: Inferring the quality of small borrowers. Management Science, 62(6), 1554-1577. [Link]
Jagtiani, J., & Lemieux, C. (2019). The roles of alternative data and machine learning in fintech lending: Evidence from the LendingClub consumer platform. Financial Management, 48(4), 1009-1029. [Link]
Jappelli, T. (1990). Who is credit constrained in the U.S. economy? The Quarterly Journal of Economics, 105(1), 219-234. [Link]
Johnson, K. N. (2019). Examining the use of alternative data in underwriting and credit scoring to expand access to credit (Tulane Public Law Research Paper No. 19-7). SSRN. [Link]
Kallus, N., Mao, X., & Zhou, A. (2020). Assessing algorithmic fairness with unobserved protected class using data combination. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. [Link]
Kang, L., Jiang, Q., & Tan, C.-H. (2017). Remarkable advocates: An investigation of geographic distance and social capital for crowdfunding. Information & Management, 54(3), 336-348. [Link]
Kankanhalli, A., Charalabidis, Y., & Mellouli, S. (2019). IoT and AI for smart government: A research agenda. Government Information Quarterly, 36(2), 304-309. [Link]
Kim, D. (2019). The importance of detailed patterns of herding behaviour in a P2P lending market. Applied Economics Letters, 27(2), 127-130. [Link]
Kwon, J., & Johnson, M. (2018). Meaningful healthcare security: Does meaningful-use attestation improve information security performance? MIS Quarterly, 42(4), 1043-1067. [Link]
Langenbucher, K. (2020). Responsible AI-based credit scoring: A legal framework. European Business Law Review, 31(4), 527-572. [Link]
Laouénan, M., & Rathelot, R. (2022). Can information reduce ethnic discrimination? Evidence from Airbnb. American Economic Journal: Applied Economics, 14(1), 107-132. [Link]
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Lee, G. M., Naughton, J. P., Zheng, X., & Zhou, D. (2020). Predicting litigation risk via machine learning. SSRN. [Link]
Leyshon, A., & Thrift, N. (1995). Geographies of financial exclusion: Financial abandonment in Britain and the United States. Transactions of the Institute of British Geographers, 20(3), 312-314. [Link]
Li, J., & Hu, J. (2019). Does university reputation matter? Evidence from peer-to-peer lending. Finance Research Letters, 31, 66-77. [Link]
Liang, F., Das, V., Kostyuk, N., & Hussain, M. M. (2018). Constructing a data-driven society: China's social credit system as a state surveillance infrastructure. Policy & Internet, 10(4), 415-453. [Link]
Liberti, J. M. (2018). Initiative, incentives, and soft information. Management Science, 64(8), 3714-3734. [Link]
Liberti, J. M., & Petersen, M. A. (2019). Information: Hard and soft. Review of Corporate Finance Studies, 8(1), 1-41. [Link]
Lin, M., Prabhala, N. R., & Viswanathan, S. (2013). Judging borrowers by the company they keep: Friendship networks and information asymmetry in online peer-to-peer lending. Management Science, 59(1), 17-35. [Link]
Liu, D., Brass, D. J., Lu, Y., & Chen, D. (2015). Friendship in online peer-to-peer lending: Pipes, prisms, and relational herding. MIS Quarterly, 39(3), 729-742. [Link]
Liu, M. (2022). Assessing human information processing in lending decisions: A machine learning approach. Journal of Accounting Research, 60(2), 607-651. [Link]
Loufield, E., Ferenzy, D., & Johnson, T. (2018). Accelerating financial inclusion with new data. Center for Financial Inclusion. [Link]
Lu, J., Lee, D. (DK), Kim, T. W., & Danks, D. (2019). Good explanation for algorithmic transparency. SSRN. [Link]
Lu, T., Zhang, Y., & Li, B. (2023). Profit vs. equality? The case of financial risk assessment and a new perspective of alternative data. MIS Quarterly, 47(4), 1517-1556. [Link]
Lu, Y., Gu, B., Ye, Q., & Sheng, Z. (2012). Social influence and defaults in peer-to-peer lending networks. In Proceedings of the 33rd International Conference on Information Systems. [Link]
Lusardi, A., & Scheresberg, C. de B. (2013). Financial literacy and high-cost borrowing in the United States (NBER Working Paper No. 18969). National Bureau of Economic Research. [Link]
Ma, L., Zhao, X., Zhou, Z., & Liu, Y. (2018). A new aspect on P2P online lending default prediction using meta-level phone usage data in China. Decision Support Systems, 111, 60-71. [Link]
Macha, M., Foutz, N., Li, B., & Ghose, A. (2023). Personalized privacy preservation in consumer mobile trajectories. Information Systems Research, 35(1), 249-271. [Link]
Martin, K. (2019). Designing ethical algorithms. MIS Quarterly Executive, 18(2), 129-142. [Link]
Mejia, J., & Parker, C. (2021). When transparency fails: Bias and financial incentives in ridesharing platforms. Management Science, 67(1), 166-184. [Link]
Mendonça, S., Cardoso, G., & Caraça, J. (2012). The strategic strength of weak signal analysis. Futures, 44(3), 218-228. [Link]
Mitlin, D. (2008). Urban poor funds: Development by the people for the people. IIED.
Monk, A., Prins, M., & Rook, D. (2019). Rethinking alternative data in institutional investment. The Journal of Financial Data Science, 1(1), 14-31. [Link]
Neumann, N., Tucker, C. E., Kaplan, L., Mislove, A., & Sapiezynski, P. (in press). Data deserts and black box bias: The impact of socio-economic status on consumer profiling. Management Science. Advance online publication. [Link]
Nowak, A., Ross, A., & Yencha, C. (2018). Small business borrowing and peer-to-peer lending: Evidence from Lending Club. Contemporary Economic Policy, 36(2), 318-336. [Link]
O'Neil, C. (2017). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown.
Ozler, S. (1992). Have commercial banks ignored history? (NBER Working Paper No. 3959). National Bureau of Economic Research. [Link]
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. [Link]
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. [Link]
Seibert, S. E., Kraimer, M. L., & Liden, R. C. (2001). A social capital theory of career success. Academy of Management Journal, 44(2), 219-237. [Link]
Serrano-Cinca, C., & Gutiérrez-Nieto, B. (2016). The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending. Decision Support Systems, 89, 113-122. [Link]
Talha, M., Elmarzouqi, N., & Kalam, A. (2020). Towards a powerful solution for data accuracy assessment in the big data context. International Journal of Advanced Computer Science and Applications, 11(2), 419-429. [Link]
Tantri, P. (2021). Fintech for the poor: Financial intermediation without discrimination. Review of Finance, 25(2), 561-593. [Link]
Teodorescu, M., Morse, L., Awwad, Y., & Kane, G. (2021). Failures of fairness in automation require a deeper understanding of human-ML augmentation. MIS Quarterly, 45(3), 1483-1500. [Link]
United Nations. (1995). Report of the world summit for social development. United Nations. [Link]
United Nations. (2006). Social justice in an open world: The role of the United Nations. [Link]
United Nations. (2015). Economic and social survey of Asia and the Pacific 2015: Making growth more inclusive for sustainable development. United Nations. [Link]
Wei, Y., Yildirim, P., Van den Bulte, C., & Dellarocas, C. (2016). Credit scoring with social network data. Marketing Science, 35(2), 234-258. [Link]
World Bank. (2010). Financial inclusion, poverty reduction and economic growth. [Link]
World Bank. (2023). Financial inclusion overview. [Link]
Xu, J. J., & Chau, M. (2018). Cheap talk? The impact of lender-borrower communication on peer-to-peer lending outcomes. Journal of Management Information Systems, 35(1), 53-85. [Link]
Yawe, B., & Prabhu, J. (2015). Innovation and financial inclusion: A review of the literature. Journal of Payments Strategy & Systems, 9(3), 215-228. [Link]
Zech, J. R., Badgeley, M. A., Liu, M., Costa, A. B., Titano, J. J., & Oermann, E. K. (2018). Confounding variables can degrade generalization performance of radiological deep learning models. arXiv. [Link]
Zhang, S., Mehta, N., Singh, P. V., & Srinivasan, K. (2021). Frontiers: Can an artificial intelligence algorithm mitigate racial economic inequality? An analysis in the context of Airbnb. Marketing Science, 40(5), 813-820. [Link]
Zhang, Y. (2018). Assessing fair lending risks using race/ethnicity proxies. Management Science, 64(1), 178-197. [Link]
Zhu, N. (2011). Household consumption and personal bankruptcy. The Journal of Legal Studies, 40(1), 1-37. [Link]

About the Authors

Chunxiao Li is an associate professor at the School of Sci-tech Business and the School of Management at the University of Science and Technology of China. Her research primarily focuses on the areas of fintech and AI ethics, with a particular emphasis on user and machine behavior. She is dedicated to exploring fairness and interpretability in AI systems, the elimination of bias, and the identification of irrational behavior. Her research has been published in journals such as Journal of the Association for Information Systems and IEEE Transactions on Knowledge and Data Engineering. She has received two best paper runner-up awards. She received her Ph.D. degree from the W. P. Carey School of Business at Arizona State University and has previously worked at the Antai College of Economics and Management, Shanghai Jiao Tong University. ORCiD: 0000-0002-6946-9726

Hongchang Wang is an assistant professor in information systems at the Naveen Jindal School of Management in the University of Texas at Dallas. His research investigates the economic and social impacts of information systems, financial technologies, artificial intelligence, and digital platforms. Implications of his research cover areas and industries such as enterprise systems (e.g., ERP, SCM, CRM), online lending (e.g., Lending Club, [Link], traditional banks), online accommodation (e.g., Airbnb), and blockchain applications (e.g., NFT). His research has appeared in Management Science and other outlets. He has received three Best Paper runner-up awards and a Best Reviewer award. ORCiD: 0000-0002-8707-2810

Songtao Jiang is the senior scientist at CraiditX, Inc. His research primarily concentrates on the identification and mitigation of risks in transactions and financial lending. This critical work aids in the prevention of financial crimes such as money laundering and telecommunications fraud, and supports financial institutions in minimizing potential losses, thereby enhancing decision-making processes and business management strategies. He received his bachelor of science degree in finance from the Southwestern University of Finance and Economics in Chengdu, China, in 2014, followed by a master of science degree in financial risk management from the University of Leeds, UK, in 2015. ORCiD: 0000-0001-6860-6998

Bin Gu is Everett W. Lord Distinguished Faculty Scholar, professor, and department chair of information systems at the Questrom School of Business, Boston University. His research interests are in using information technologies and artificial intelligence to address information asymmetry and social inequity in business and society. His research has been published in Management Science, MIS Quarterly, Information Systems Research, and Journal of Management Information Systems, among others. He received his Ph.D. from the University of Pennsylvania. ORCiD: 0000-0002-0396-8899
Appendix
Table A1 shows the descriptive statistics of the applicants from the underserved population for the treated and control product. The application
ratio and the features of these underserved applicants are very similar. Figure A1 describes the kernel of the AI model. Building upon the 10
data domains, the development team used various feature-generation techniques to create features within each data domain. For example, a
natural-language processing technique generated features from text messages and news content. A time-series technique generated features
from a sequence of provident fund deposits. This process led to four groups of features, which were used to train four individual learners
predicting default (multiple individual learners, compared to one learner, can better utilize all features and integrate related features). Each
learner was a LightGBM model, outperforming other state-of-the-art machine learning models in this context. Eventually, an ensemble learner
(also a LightGBM model) combined the results from these four individual learners and served as the final model to generate credit scores.
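As a hedged illustration of the time-series feature generation mentioned above, the Python snippet below derives a few simple features from a hypothetical sequence of monthly provident fund deposits. The feature names and choices are our own illustrative assumptions; the development team's actual features are richer and not disclosed.

```python
import numpy as np
import pandas as pd

# Illustrative only: derive simple time-series features from a sequence of
# monthly provident fund deposits. The actual engineered features differ.
def deposit_features(deposits: pd.Series) -> dict:
    x = deposits.astype(float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0] if len(x) > 1 else 0.0  # deposit trend
    return {
        "deposit_mean": x.mean(),
        "deposit_std": x.std(ddof=0),          # payment stability
        "deposit_trend": slope,                # growing or shrinking deposits
        "months_active": int((x > 0).sum()),   # contribution consistency
        "last_over_mean": x.iloc[-1] / x.mean() if x.mean() else np.nan,
    }

print(deposit_features(pd.Series([900, 900, 950, 1000, 1000, 1100])))
```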
Table A1. Descriptive Statistics of the Underserved Applicants for Two Products

                                                                    Treated product    Control product
Percent of underserved population among all applicants                  27.53%             19.63%
Approval rate of the underserved population                             17.50%             16.67%
Percent of urban households within the underserved population           15.17%             13.02%
Percent of self-employed within the underserved population              82.72%             79.67%
Percent with income data within the underserved population              10.52%             11.88%
Percent with credit history within the underserved population           48.62%             53.23%
Note: We show the descriptive statistics of the underserved population applicants for the treated and control products.
Developed by Microsoft, LightGBM (light gradient-boosting machine) is a gradient-boosting framework designed primarily for ranking, classification, and regression tasks (Ke et al., 2017). It is based on decision tree algorithms and produces a prediction model in the form of an ensemble of simple decision trees. LightGBM shares many advantages with XGBoost, including sparse optimization, parallel training, multiple loss functions, regularization, bagging, and early stopping (Wikipedia, 2023). Diverging from XGBoost's level-wise tree growth strategy, LightGBM employs a leaf-wise tree growth strategy, which prunes inefficient splits to improve training speed. It also introduces histogram-based optimization for continuous features, compressing the search space during node splits and reducing memory usage during training. Compared to frameworks such as XGBoost and CatBoost, LightGBM offers higher computational efficiency and lower memory consumption while maintaining comparable model accuracy.
LightGBM has a strong track record in both academic research and real-world applications. In academic research, Ma et al. (2018) showed its efficacy in enhancing P2P lending models, achieving a tangible reduction in loan defaults. Sun et al. (2020) demonstrated LightGBM's superior predictive accuracy in forecasting cryptocurrency trends compared to traditional models such as support vector machines and random forests. Wang et al. (2022) utilized LightGBM to evaluate the financial risk of 186 firms; their experiments, benchmarking LightGBM against other algorithms, consistently underscored its superior predictive performance in this domain. In industry, LightGBM earned more than 30 top-three placements in Kaggle competitions from 2016 to 2019, distinguishing itself in flagship contests such as CIKM AnalytiCup 2017 and IEEE Fraud Detection. Amazon has also incorporated LightGBM into platforms like SageMaker for distributed training since January 2023.
In this project, the development team utilized the open source code of LightGBM ([Link]) and applied it five times in the AI model. The four individual learners were trained separately by LightGBM using different sets of features. The ensemble learner then integrated their predictions and generated the final scores (Sawhney et al., 2020). The development team applied this two-layer structure for three reasons: (1) training distinct LightGBM models tailored to specific objectives improves model efficiency, whereas merging all features into a single model heightens the risk of overfitting; (2) the structure not only improves model interpretability by demarcating feature contributions but also ensures strong performance across models; and (3) LightGBM's built-in "exclusive feature bundling" mechanism can amalgamate highly collinear features, so when strong and weak signals coexist in a single model, their inter-feature bundling can obscure the individual contribution of each to the predictive model.
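To make the two-layer structure concrete, the following Python sketch trains four LightGBM learners on separate (here synthetic) feature groups and stacks their predicted default probabilities with a LightGBM ensemble learner. It is a minimal illustration of the architecture described above, not the bank's actual pipeline; the feature groups, hyperparameters, and holdout scheme are our own assumptions.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the four feature groups; the real domains,
# features, and hyperparameters belong to the development team.
rng = np.random.default_rng(0)
n = 10_000
groups = {g: rng.normal(size=(n, 5)) for g in ["identity", "income_asset", "fund", "in_app"]}
y = (rng.random(n) < 0.1).astype(int)  # default indicator (~10% positives)

train_idx, hold_idx = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Layer 1: one LightGBM learner per feature group.
base_preds = []
for name, X in groups.items():
    booster = lgb.train(
        {"objective": "binary", "num_leaves": 31, "verbosity": -1},
        lgb.Dataset(X[train_idx], label=y[train_idx]),
        num_boost_round=200,
    )
    base_preds.append(booster.predict(X))  # predicted default probabilities

# Layer 2: a LightGBM ensemble learner stacks the four predictions.
Z = np.column_stack(base_preds)
ensemble = lgb.train(
    {"objective": "binary", "num_leaves": 7, "verbosity": -1},
    lgb.Dataset(Z[hold_idx], label=y[hold_idx]),  # held-out rows limit leakage
    num_boost_round=100,
)
final_scores = ensemble.predict(Z)  # credit scores in [0, 1]
```

Training each base learner on its own feature group keeps the contribution of each data domain separable, which mirrors reason (2) above; fitting the ensemble on rows held out from the base learners is one common way to limit overfitting in stacked models.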
Table A2 reveals three findings: first, weak signals have substantial predictive value because they improve prediction accuracy even when used by the traditional model; second, the best combination is for the AI model to use both strong and weak signals, which further improves prediction accuracy; and third, the improvement brought by weak signals and advanced algorithms is more important for the underserved population than for the regular population.
Table A3 illustrates how the traditional approach (i.e., human expertise) and the advanced approaches (i.e., knowledge graph, time series, natural language processing) generate features from weak signals. Weak signals are hard for human experts to use because only limited features can be generated from human intuition. The advanced approaches, however, can accommodate the different natures of the data and generate novel and meaningful features, consequently serving as the foundation for the AI model. Table A4 reveals two patterns: first, advanced algorithms do lead to higher prediction accuracy, in both the regular and underserved populations; second, although prediction accuracy is worse for the underserved population than for the regular population under the traditional model, the performance for the underserved population becomes closer to (or even better than) that for the regular population under the AI model. In a nutshell, advanced algorithms (e.g., a complex machine learning algorithm) are one of the mechanisms that can enhance financial inclusion and are thus especially important for the underserved population.
Table A5 illustrates how the traditional approach (i.e., human expertise) and the advanced approaches (i.e., knowledge graph, time series) generate features from strong signals. The novel and meaningful features generated by the advanced approaches serve as the foundation for the AI model. Table A6 addresses the concern that changes in the applicant pool of the treated product confounded the impacts of the AI model. We reran the main equations using shorter time periods: two, four, and six weeks before and after the adoption date. The trade-off is that a short time window may help rule out confounding events and changes, but it may not be long enough to capture the entire impact. The results are largely consistent with those from our main sample: the adoption of the AI model increases the approval rate and reduces the default rate for the underserved population. Therefore, changes in the applicant pool are not a major concern for our study.
Table A6. The Impacts of the AI Model on the Approval Rate and Lending Performance: Shorter Time Period Analysis

                                          Regular population                     Underserved population
Data period                        Approval    Default    Utilization     Approval    Default    Utilization
                                    (1/0)       (1/0)        level          (1/0)       (1/0)        level
Six weeks before and after          -0.006    -0.002***     0.047***       0.104***   -0.008***     0.065***
                                    (0.005)    (0.001)      (0.018)        (0.013)     (0.001)      (0.019)
Four weeks before and after         -0.006    -0.001***     0.035**        0.089***   -0.007***     0.072***
                                    (0.005)    (0.001)      (0.019)        (0.011)     (0.002)      (0.008)
Two weeks before and after          -0.002    -0.001        0.029          0.045***   -0.005        0.034
                                    (0.002)    (0.000)      (0.023)        (0.008)     (0.004)      (0.027)
Note: We reran Equations (1) and (2) using alternate time periods as a robustness check and report the coefficients of "the initial launch of the AI model" on the approval rate, default rate, and utilization level in this table.
Table A7 reports the results from a matched sample analysis. We applied a coarsened exact matching approach to match the applicants for the treated product to those for the control product. We generated a variable "underserved" and used it in the matching, so underserved (regular) applicants applying for the treated product were matched to underserved (regular) applicants applying for the control product (an M-to-N match). We also used age, gender, with income data, with credit history, self-employed, urban household, and annual income as matching variables. This matching approach yielded a well-balanced sample of around 25% of the original sample size (the balance check is reported in Table A8). The process helped us build a sample containing similar applicants who applied for either product both before and after the adoption of the AI model. The results in Table A7 are consistent with our main findings. Table A9 reports the results from a subsample analysis using online applicants only. The focal bank never promoted the treated product to the underserved population via online channels; therefore, online applications were unlikely to suffer from human intervention. The impacts of the AI model on the underserved population are similar to our main findings.
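The coarsened-exact-matching step can be sketched in pandas as follows; the binning choices, file name, and column names are illustrative assumptions rather than the exact procedure used in the paper.

```python
import pandas as pd

# Minimal coarsened-exact-matching sketch: coarsen continuous covariates
# into bins, then keep only strata populated by both products (M-to-N).
df = pd.read_csv("applications.csv")
df["age_bin"] = pd.cut(df["age"], bins=[18, 25, 35, 45, 55, 100])
df["income_bin"] = pd.qcut(df["annual_income"], q=5, duplicates="drop")

strata_cols = ["underserved", "gender", "with_income_data", "with_credit_history",
               "self_employed", "urban_household", "age_bin", "income_bin"]

# A stratum is matched if applicants of both products appear in it.
both = df.groupby(strata_cols, observed=True)["treated_product"].transform("nunique") == 2
matched = df[both]
print(f"matched sample: {len(matched) / len(df):.1%} of applications")
```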
Table A7. The Impacts of the AI Model on the Approval Rate and Lending Performance: Matched Sample

                                   Regular population                     Underserved population
Data                        Approval    Default    Utilization     Approval    Default    Utilization
                             (1/0)       (1/0)        level          (1/0)       (1/0)        level
Matched sample:
The initial launch of        0.005     -0.002***     0.036***       0.096***   -0.004***     0.014***
  the AI model              (0.004)     (0.000)      (0.005)        (0.037)     (0.002)      (0.004)
Note: We reran Equations (1) and (2) using the matched sample and report the coefficients of "the initial launch of the AI model" on the approval rate, default rate, and utilization level in this table.
Table A9. The Impacts of the AI Model on the Approval Rate and Lending Performance: Online Applications

                                   Regular population                     Underserved population
Data                        Approval    Default    Utilization     Approval    Default    Utilization
                             (1/0)       (1/0)        level          (1/0)       (1/0)        level
Online applications:
The initial launch of        0.010***  -0.005***     0.030***       0.110***   -0.007***     0.014
  the AI model              (0.003)     (0.000)      (0.013)        (0.007)     (0.001)      (0.033)
Note: We reran Equations (1) and (2) using applicants from the online channel as a robustness check (there was barely any human intervention in this process) and report the coefficients of "the initial launch of the AI model" on the approval rate, default rate, and utilization level in this table.
Table A10 reports the results from three subsample analyses that split the whole sample based on the location of bank branches. Given the concern that the effect of manual promotion and evaluation may be at play and persist for a while, we distinguished the bank branches that were unlikely to be affected by this intervention (i.e., branches in big cities) from those that were likely to be affected (i.e., branches in medium and small cities). The working assumption is that because the focal bank sent almost all of its special personnel to the branches in medium and small cities/towns, the impact identified for applications via branches in big cities likely comes from the AI model only. The results are consistent with our main findings, with an increase in the approval rate and a decrease in the default rate for applicants from the underserved population in big cities.
Table A11 reports the results from three placebo tests. Given the concern that the focal bank might have undertaken multiple efforts to echo government policies, we assumed that the focal bank had indeed taken some unobserved actions to promote financial inclusion and treated them as fake treatments. Relying on the same specifications and using the time period before the adoption of the AI model, we estimated the impacts of these fake treatments. As shown in Table A11, the effect of human intervention is largely insignificant, and its magnitude on the approval rate is about one tenth of the impact of the AI model (i.e., 0.018 compared to 0.150). More importantly, the impact of human intervention, if any, on the default rate is positive. These results further support that AI models can increase the approval rate and reduce the default rate simultaneously, which manual interventions cannot easily achieve. Table A12 summarizes how weak signals are used in various contexts (outside of China), as documented in the literature. Our literature review shows that although many data domains and features may seem sensitive or private, they have been widely used by businesses both in China and in the U.S., which is appropriate as long as businesses obtain consent from customers and comply with relevant regulations.
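The placebo logic can be sketched as follows: restrict the sample to the pre-adoption period, assign the fake adoption dates used in Table A11, and re-estimate the DID specification. The file name, column names, and the real adoption date are illustrative assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applications.csv", parse_dates=["app_date"])
REAL_ADOPTION = pd.Timestamp("2021-03-01")  # illustrative placeholder
pre = df[df["app_date"] < REAL_ADOPTION].copy()

# Fake treatment dates from Table A11; a near-zero "effect" supports the design.
for fake in ["2020-11-01", "2020-12-01", "2021-01-01"]:
    pre["fake_treat"] = (
        (pre["treated_product"] == 1) & (pre["app_date"] >= pd.Timestamp(fake))
    ).astype(int)
    fit = smf.ols(
        "approved ~ fake_treat + age + annual_income + C(week) + C(product)",
        data=pre,
    ).fit(cov_type="cluster", cov_kwds={"groups": pre["application_day"]})
    print(fake, round(fit.params["fake_treat"], 4))
```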
Table A10. The Impacts of the AI Model on the Approval Rate and Lending Performance: A Branch-Based Analysis

                                   Regular population                     Underserved population
Data                        Approval    Default    Utilization     Approval    Default    Utilization
                             (1/0)       (1/0)        level          (1/0)       (1/0)        level
Large cities/towns:
The initial launch of       -0.006***  -0.008***     0.072***       0.117***   -0.008***     0.072***
  the AI model              (0.001)     (0.003)      (0.031)        (0.015)     (0.002)      (0.031)
Medium-sized cities/towns:
The initial launch of        0.002     -0.007***     0.037          0.158***   -0.005***     0.041***
  the AI model              (0.002)     (0.003)      (0.028)        (0.009)     (0.001)      (0.012)
Small cities/towns:
The initial launch of       -0.002     -0.008***     0.040***       0.126***   -0.008***     0.036***
  the AI model              (0.002)     (0.003)      (0.007)        (0.018)     (0.002)      (0.011)
Note: We reran Equations (1) and (2) using subsamples based on city/town size as a robustness check and report the coefficients of "the initial launch of the AI model" on the approval rate, default rate, and utilization level in this table. The results come from three individual regressions.
Table A11. The Impacts of the AI Model on the Approval Rate and Lending Performance: Placebo Tests

                                   Regular population                     Underserved population
Fake treatment date         Approval    Default    Utilization     Approval    Default    Utilization
                             (1/0)       (1/0)        level          (1/0)       (1/0)        level
Nov 1, 2020                  0.002       0.003***     0.010***       0.018*      0.003***     0.003
                            (0.002)     (0.000)      (0.002)        (0.009)     (0.000)      (0.003)
Dec 1, 2020                  0.002***    0.001***    -0.004***       0.017       0.003***     0.003***
                            (0.000)     (0.000)      (0.001)        (0.010)     (0.000)      (0.001)
Jan 1, 2021                 -0.001***   -0.000       -0.004***       0.011       0.002***     0.000
                            (0.000)     (0.000)      (0.000)        (0.009)     (0.000)      (0.001)
Note: We reran Equations (1) and (2) within the time period before the initial launch of the AI model and used fake treatment dates. We report the coefficients of "the initial launch of the fake treatment" on the approval rate, default rate, and utilization level. The results come from three individual regressions. Standard errors in parentheses.