SPECIAL ISSUE

THE EFFECT OF AI-ENABLED CREDIT SCORING ON
FINANCIAL INCLUSION: EVIDENCE FROM AN
UNDERSERVED POPULATION OF OVER ONE MILLION1
Chunxiao Li
Department of Information Systems, Faculty of Business in Scitech, School of Management, University of Science and Technology of China
Hefei, Anhui, CHINA {[Link]@[Link]}

Hongchang Wang
Department of Information Systems, Naveen Jindal School of Management, University of Texas at Dallas
Dallas, TX, U.S.A. {[Link]@[Link]}

Songtao Jiang
Department of Data Science, CreditX Inc.
Changning, Shanghai, CHINA {jiangsongtao@[Link]}

Bin Gu
Department of Information Systems, Questrom School of Business, Boston University
Boston, MA U.S.A. {bgu@[Link]}

We studied the effect of a major bank adopting an AI-enabled credit scoring model on financial inclusion as measured by changes to the approval rate, default rate, and utilization level of a personal loan product for an underserved population. The bank serves over 50 million customers and previously used a traditional rule-based model to evaluate the default risk of each loan application. It recently developed an AI model with a higher prediction accuracy of default risk and used the AI model and the traditional model together to assess loan applications for one of its personal loan products. Although the AI model may be more accurate in estimating default risk, little is known about its impact on financial inclusion. We investigated this question using a difference-in-differences approach by comparing changes in financial inclusion of the personal loan product that adopted the AI model to that of a similar personal loan product that did not adopt the AI model. We found that the AI model enhanced financial inclusion for the underserved population by simultaneously increasing the approval rate and reducing the default rate. Further analysis attributed the enhancement in financial inclusion to the use of weak signals (i.e., data not conventionally used to evaluate creditworthiness) by the AI model and its sophisticated machine learning algorithms. Our findings are consistent with statistical discrimination theory, as the use of weak signals and sophisticated machine learning algorithms improves prediction accuracy at the individual level, thus reducing the reliance on group characteristics that often lead to financial exclusion. We elaborated on the development process of the AI model to illustrate how and why the AI model can better evaluate members of underserved populations. We also found the impacts of the AI model to be heterogeneous across subgroups, and those with missing weak signals saw smaller improvements in the approval rate. A simulation-based analysis showed that simplified AI models were also able to increase the approval rate and reduce the default rate for this population. Our findings provide rich theoretical and practical implications for social justice by documenting how an AI model designed for improving prediction accuracy can enhance financial inclusion.
Keywords: Financial inclusion, credit scoring, AI models, weak signals, social justice
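The difference-in-differences comparison described in the abstract reduces to a two-by-two contrast of group means. The sketch below uses invented approval rates for illustration; they are not the paper's data:

```python
# Minimal difference-in-differences (DID) sketch. The approval rates below
# are invented for illustration; they are not the paper's data.
means = {
    ("treated", "pre"): 0.17,   # treated product, before the AI model
    ("treated", "post"): 0.30,  # treated product, after the AI model
    ("control", "pre"): 0.18,   # control product, same periods
    ("control", "post"): 0.19,
}

# Within-product change over time
treated_change = means[("treated", "post")] - means[("treated", "pre")]
control_change = means[("control", "post")] - means[("control", "pre")]

# DID estimate: the treated product's change, net of the common time trend
did_estimate = treated_change - control_change
print(round(did_estimate, 3))  # prints 0.12
```

In the paper, the same contrast is estimated in a regression framework on millions of applications with controls; the sketch only isolates the core two-by-two logic, in which any shock that moves both products equally cancels out.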

1 Min-Seok Pang, Atreyi Kankanhalli, Margunn Aanestad, Sudha Ram, and Likoebe M. Maruping were the accepting senior editors for this paper. Jennifer Jie Zhang served as the associate editor. Hongchang Wang is the corresponding author. The transparency materials for this paper can be found at [Link]

©2024. The Authors. This work is licensed under the terms of the Creative Commons Attribution CC BY-NC-ND 4.0 License ([Link]).

DOI:10.25300/MISQ/2024/18340 MIS Quarterly Vol. 48 No. 4 pp. 1803-1834 / December 2024 1803
Li et al. / The Effect of AI-Enabled Credit Scoring on Financial Inclusion

Introduction

Financial inclusion refers to the accessibility of useful and affordable financial products and services that cater to the needs of individuals and businesses, including transactions, payments, savings, credit, and insurance—delivered in a responsible and sustainable manner. (World Bank, 2023)

Financial inclusion is an important driver of social justice and has significant implications for improving living conditions and supporting the economic development of vulnerable and marginalized groups (Dev, 2006). Research has found that the measures of financial inclusion, such as small business loans, student loans, and others, have a far-reaching effect on employment (Armstrong et al., 2014), education (Gross et al., 2010), health care (Burtch & Chan, 2019), and housing (Zhu, 2011). In this sense, financial inclusion is one of the foundations for social justice that can provide a solution to social injustice.

While financial inclusion is a cornerstone of development (The Global Findex Database, 2021), one main hurdle to financial inclusion (as measured by access to capital) is the lack of credit history, which financial institutions conventionally rely on to assess an applicant’s creditworthiness. Credit history is critical in this evaluation process because it provides detailed and comprehensive information on an applicant’s past usage and loan repayment. For example, the Fair Isaac Corporation’s (FICO) Score uses the consumer’s debt level, length of credit history, and on-time payment history to determine their creditworthiness. However, the heavy reliance on credit history poses a huge challenge to financial inclusion and social justice because members of marginalized groups (“the underserved population” hereafter)2 often have a thin credit history or none at all (Abrahams & Zhang, 2008). Due to historical discrimination and long-lasting disadvantages in socioeconomic status, the underserved population has difficulty accessing capital, so they never have a fair chance to build a credit history (O’Neil, 2017). This phenomenon represents a classic “chicken-and-egg” problem: the underserved population needs access to capital to build their credit history, but the lack of credit history prevents them from getting approved by financial institutions in the first place (Engel & McCoy, 2001).

Financial institutions have developed multiple approaches to provide access to capital for the underserved population, including opening more branches in rural and underserved areas, lowering eligibility criteria, and designing specific financial products (Tantri, 2021). These endeavors have enhanced financial inclusion but also face significant limitations: first, the operational costs are high because new facilities and staff are often involved; second, the default risks are high because the underserved population, on average, has a higher default rate than the regular population; third, these solutions cannot be easily applied at large scale (Bao & Huang, 2021). Therefore, finding a solution that can enhance financial inclusion at scale without incurring significant costs is crucial and urgent.

The advances in information technology, particularly in machine learning, big data, and artificial intelligence (AI), provide a new way to address this notorious financial inclusion problem. In the context of consumer lending, AI is most widely adopted as a credit-scoring model trained on historical loan performance data and used to evaluate applicants’ creditworthiness (Di Giuseppe, 2021; Dobbie et al., 2021; Liu, 2022). Compared to the traditional rule-based credit scoring models developed by underwriting experts based on human experience and descriptive statistics (traditional models hereafter), AI-enabled credit scoring models (AI models hereafter) have two outstanding advantages: first, AI models can incorporate a broader range of information, especially weak financial signals or even nonfinancial signals that are usually not considered in traditional models (Iyer et al., 2015; Wei et al., 2016); second, AI models can utilize advanced machine learning or deep learning algorithms to consider complex relationships between these signals and creditworthiness (Fu et al., 2021; Hurley & Adebayo, 2016). As a result, AI models often have higher prediction accuracy and, thus, a stronger capability to identify the creditworthiness of individual applicants (Gunnarsson et al., 2021). This effect would be more profound for the underserved population because they often lack credit history (i.e., strong signals hereafter), which is used extensively in traditional models. However, concerns and criticism have continued since the invention of AI models, and many of them are still relevant in the financial inclusion context. One data-related concern is that AI models may carry historical bias because historical data are used to train AI models (Gianfrancesco et al., 2018; Martin, 2019; Teodorescu et al., 2021). In addition, using a wide range of data domains may bring new disadvantages and even discrimination to some subgroups, especially those who do not have information in these data domains (Hao & Stray, 2019). One model-related concern is that there is no guarantee that all features make logical sense and no protected features are used because AI models can be “black boxes” (Garfinkel et al., 2017; Lu et al., 2019; Zech et al., 2018; Zhang et al., 2021).

2 The definition and meaning of the underserved population may differ slightly in various contexts. After introducing our research context, we will provide a detailed definition in Table 1.
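The two advantages discussed above (drawing on weak signals when credit history is missing, and judging applicants individually rather than by group averages) can be sketched on synthetic data. The signal names, thresholds, and rates below are all invented for illustration and do not come from the paper:

```python
import random

random.seed(0)

# Synthetic applicant pool (invented for illustration; not the paper's data).
# Underserved applicants lack a credit history (the "strong signal") but do
# have a "weak signal" (say, regularity of mobile bill payments) that noisily
# tracks latent creditworthiness.
pool = []
for _ in range(10_000):
    underserved = random.random() < 0.8
    quality = random.random()                  # latent creditworthiness
    pool.append({
        "history": None if underserved else quality + random.gauss(0, 0.1),
        "weak": quality + random.gauss(0, 0.2),
        "lottery": random.random(),            # used by the rule-based fallback
        "defaults": quality < 0.25,            # low-quality applicants default
    })

def rule_based(a):
    # Traditional model: approve on credit history; with no history, it can
    # only act on the group's average risk (here: approve a 10% random share).
    if a["history"] is None:
        return a["lottery"] < 0.10
    return a["history"] > 0.4

def ai_style(a):
    # AI-style model: when history is missing, score the weak signal instead,
    # so each applicant is judged individually.
    signal = a["history"] if a["history"] is not None else a["weak"]
    return signal > 0.4

def outcomes(policy):
    approved = [a for a in pool if policy(a)]
    approval_rate = len(approved) / len(pool)
    default_rate = sum(a["defaults"] for a in approved) / len(approved)
    return approval_rate, default_rate

rule_approval, rule_default = outcomes(rule_based)
ai_approval, ai_default = outcomes(ai_style)
print(f"rule-based: approve {rule_approval:.0%}, default {rule_default:.1%}")
print(f"ai-style:   approve {ai_approval:.0%}, default {ai_default:.1%}")
```

Because the rule-based policy can only act on the underserved group's average risk, its approvals there carry the group's full default rate; once an individual weak signal is available, approvals are screened person by person, so the approval rate rises while the default rate falls, which is the direction of the effect the paper reports.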

Given the promises and pitfalls of AI models, we are interested in understanding how AI models influence financial inclusion. Considering the competing theoretical arguments and uncertain prior expectations of the impacts of AI models, we are motivated to investigate the following three research questions without proposing formal hypotheses:

RQ1—treatment effects: How do AI models influence financial inclusion regarding the approval rate, default rate, and utilization level of the underserved population?

RQ2—underlying mechanisms: Through what mechanisms and logic chains do AI models impact financial inclusion, and what are the key drivers?

RQ3—theoretical boundaries: Under what conditions and for what subpopulations are AI models effective at enhancing financial inclusion?

To answer these research questions, we cooperated with a traditional state-owned bank in China, which serves over 50 million customers (the focal bank hereafter). One decade ago, China started publishing a series of financial inclusion policies to motivate banks to serve the underserved population better. Among all the suggestions provided by these policies, one was to use financial technologies (e.g., AI models) in the evaluation process. Motivated by these policies, the focal bank developed and tested an AI-enabled credit scoring model in one of its personal loan products. This AI model incorporated weak signals and used sophisticated machine learning algorithms to improve the prediction accuracy of default risk. The bank tested the AI model by using it in combination with the existing traditional rule-based model to make the final lending decisions. We also identified a similar product that used the traditional model only during this time period (i.e., the control product) and applied a difference-in-differences (DID) strategy to estimate the impacts of the AI model on financial inclusion.

Our sample covered seven months and nine million applications. Our analysis yielded several important results. First, the adoption of the AI model increased the approval rate for the underserved population and reduced the default rate for both the underserved and regular populations.3 It also increased the utilization level of the whole population. Second, the enhancement in financial inclusion came from the improved prediction accuracy of the AI model. The use of both weak signals and sophisticated machine learning algorithms contributed to this accuracy improvement because they could help generate novel features that are predictive of creditworthiness and connect these features to creditworthiness in complex and novel ways. While the weak signals provide the raw information for meaningful features and the machine learning algorithms connect features to creditworthiness, the combination of them brought the largest improvement (compared to using weak signals by humans only or using machine learning algorithms on strong signals only). This improvement benefited the underserved population significantly more because they were overlooked by the traditional model, and the higher prediction accuracy reduced the reliance on group characteristics that often lead to financial exclusion. Third, the impacts of the AI model were heterogeneous, but almost all subgroups of the underserved population witnessed an increase in the approval rate and a decrease in the default rate. For subgroups with one or more missing data domains of weak signals, the impacts of the AI model were similar, though with a smaller magnitude. Fourth, our simulation analysis showed that the positive impact of AI models on financial inclusion was potentially generalizable to financial institutions without strong IT capability or access to rich weak signals. We found that simplified versions of the focal AI model (e.g., built with a simple AI algorithm or with only one domain of weak signals) could still increase the approval rate and reduce the default rate for the underserved population. These findings held across multiple robustness checks and the consideration of alternative explanations.

Our findings make a theoretical contribution to the AI and social justice literature by revealing how and why AI models can influence financial inclusion. More specifically, we show that AI models with a goal to improve prediction accuracy can have a significant and positive impact on financial inclusion. This effect is driven by AI models’ ability to utilize both weak signals and sophisticated machine learning algorithms to improve prediction accuracy, which reduces the bank’s reliance on traditional strong signals in loan evaluation. Because the underserved population often lacks strong signals, the reduced reliance on such signals leads to enhanced financial inclusion. Therefore, AI models can widen access to capital for the underserved population. Our findings also make a practical contribution by demonstrating a feasible and powerful AI tool to enhance financial inclusion, even when it was trained to enhance the prediction accuracy of default risk. We provide rich information and detailed discussions on the design process of AI models, the compliance issues in using AI models and privacy-sensitive data, the implementation of AI models with traditional models, and the heterogeneous impacts of AI models. All these findings could help practitioners decide how to develop their own AI models and how to deploy AI models in their business settings. We also discuss the limitations associated with AI models.

3 More precisely, the adoption of the AI model helped select the underserved applicants who were less likely to default, which was why the average default rate of the underserved population decreased. The adoption of the AI model did not necessarily reduce the default rate by increasing the repayment capability or repayment intention of the approved borrowers from the underserved population.

First, AI models do not necessarily benefit all applicants. A small portion of applicants might have a lower chance of approval because AI models can better understand their creditworthiness, though negatively. Second, AI models cannot solve financial inclusion issues caused by structural social injustice (e.g., structural racism or discrimination that has led to unfair opportunities/wealth allocation, which in turn has affected applicants’ creditworthiness). These issues would require systematic policy interventions. As our research shows, AI tools can, however, be quite effective at addressing financial inclusion issues caused by statistical discrimination (Laouenan & Rathelot, 2022). Third, policymakers should collect more empirical evidence to understand the impacts of AI models and work on actionable AI regulations to deal with privacy or financial exclusion issues.

Literature Review

Our study is related to three streams of research, including the literature on financial inclusion and social justice, the literature on AI in lending, and the literature on weak signals and AI-enabled credit scoring.

Financial Inclusion and Social Justice

The concept of financial inclusion emerged more than a decade ago, advocating for access to valuable and affordable financial products and services (United Nations, 2015; World Bank, 2010). Financial inclusion is a critical component of social inclusion, which promotes a just society for all (i.e., all people should have access to education, capital, employment, health care, political rights, and housing) (United Nations, 1995). Financial inclusion is the foundation for social justice as it provides key financial resources (United Nations, 2006). For example, the underserved population needs financial resources such as personal loans to improve their immediate surroundings and to create new opportunities for themselves and their families. Adequate financial access can provide them with better healthcare, potential home ownership, and career and employment opportunities (Mitlin, 2008).

The key challenge in financial inclusion is that the underserved population often lacks sufficient financial literacy and credit history, which prevents financial institutions from accurately assessing their risk and creditworthiness (Loufield et al., 2018). Because traditional models tend to rely heavily on credit history, the lack of credit history is considered a sign of uncreditworthiness (Yawe & Prabhu, 2015). Therefore, the underserved population is likely to be classified as high risk and thus is likely to be directly rejected, charged high interest rates (Brown et al., 2019), or provided with low-limit lines of credit or loan amounts (Jappelli, 1990). As a result, the underserved population may have difficulty obtaining loans or lines of credit from traditional financial institutions (Leyshon & Thrift, 1995) and fall victim to predatory lending (Lusardi & Scheresberg, 2013). To summarize, the tendency for financial institutions to lend to borrowers with good credit history creates a “chicken-and-egg” problem that keeps preventing the underserved and marginalized population from gaining access to capital.

Multiple initiatives and actions have been taken to solve this problem, including opening more bank branches in rural and underserved areas, reducing eligibility criteria for the underserved population, and designing specific products (Tantri, 2021). These efforts have solved the financial inclusion issue to some extent. However, they come with high operational costs and default risks (Awaworyi-Churchill, 2020; Cull, 2011). Our research aims to study whether the advances in information technology, especially AI-enabled credit scoring models, could help financial institutions enhance financial inclusion for the underserved population.

AI in Lending

AI has been widely used in the finance industry for algorithmic trading, identity verification, customer service, fraud detection, risk assessment, asset valuation, etc. (Gomber et al., 2018; Kang et al., 2017; Kankanhalli & Mellouli, 2019; Lee et al., 2020). Our literature review focuses on the application of AI in credit scoring and lending decision-making. A few traditional financial institutions and many alternative lenders (e.g., Lending Club, [Link], OnDeck) have already started using AI models to consider alternative data and mine complex patterns between borrower data and loan performance (Jagtiani & Lemieux, 2019; Liu et al., 2015). For example, Lending Club and [Link] used hundreds of variables in their AI models to evaluate the creditworthiness of their applicants. As a result, they rely less on FICO scores and have thus been able to lend money to applicants with relatively low FICO scores (Nowak et al., 2018). OnDeck and Kabbage included social media data, e-commerce data, and transportation data in their AI models to evaluate small businesses and have outperformed traditional banks that use sales data and credit data only (Godbillon-Camus & Godlewski, 2005).

Extant studies have documented the impacts of AI models on consumer and small business lending, including both the reduction in the default rate (Serrano-Cinca et al., 2015) and the changes in approved borrowers’ features (Agarwal et al., 2020; Bartlett et al., 2022). However, there are also concerns and criticisms about the adoption of AI in general and of AI-enabled credit scoring models in particular (Hiller, 2020). One concern is that the selection of the training sample may carry historical bias and discrimination (Gianfrancesco et al., 2018; Mejia & Parker, 2021). The other concern is that AI models are typically “black boxes” that do not clearly show the relationships between data input and creditworthiness (Neumann et al., in press; Rudin, 2019). If not handled carefully, these two issues may strengthen the historical bias or lead to new biases (Cowgill et al., 2020).

Given the promises and pitfalls of AI models in lending, the adoption of AI-enabled credit scoring models by traditional financial institutions has been expected for a long time but has yet to materialize. Because traditional financial institutions face more scrutiny, AI models’ design, deployment, impacts, and implications would also be expected to differ from those of alternative lenders. To fill this research gap, we cooperated with a traditional bank to understand the impacts of a compliance-compatible AI model on the approval rate and default rate of the underserved population. We also examined the underlying mechanisms and the conditions under which AI models can effectively enhance financial inclusion.

Weak Signals and AI-Enabled Credit Scoring

One advantage of AI models is their ability to utilize weak signals. We adopt the concept of “weak signals” from the existing literature to refer to data that is not directly related to creditworthiness or not conventionally used in traditional credit scoring models (Mendonça et al., 2012). Existing studies also used similar terms—for example, “alternative data” (Loufield, 2018; Lu et al., 2023; Serrano-Cinca & Gutiérrez-Nieto, 2016), “soft information” (Iyer et al., 2016; Liberti & Peterson, 2019), or “nonfinancial data.” We prefer the term “weak signals” because it indicates that the data is not only unconventional but also noisy in nature. Traditional models rely heavily on credit history data, e.g., repayment information, utilization level, and frequency of credit inquiries (Ozler, 1992). They sometimes also consider employment, income, assets, and home ownership (Johnson, 2019). We call these domains of information strong signals because they have strong and intuitive relationships with creditworthiness and are largely available in numerical values with structural patterns (Liberti, 2019).

Unlike strong signals, weak signals can cover a broader range of financial or nonfinancial data domains. Previous studies have discussed the following data domains of weak signals: electronic footprint and trajectory (Kim, 2020; Lu et al., 2023), social networks (Lin et al., 2013; Lu et al., 2012; Gao et al., 2022), educational background (Li & Hu, 2019), lender-borrower communication (Xu & Chau, 2018), mobile phone usage (Ma et al., 2018; Lu et al., 2023), facial information (Chen et al., 2023), and other soft information (Iyer et al., 2016; Hou et al., 2023). Weak signals have been underutilized by traditional models due mainly to three reasons (Autor, 2014; Fügener et al., 2022; Liberti, 2018; Monk et al., 2019): (1) they do not have a clear and direct relationship with creditworthiness, so including them challenges underwriting transparency and explainability; (2) they may contain noisy information and unstructured contents, so significant efforts are required to make sense of them and incorporate them into traditional models; (3) they are often difficult to obtain and suffer from missing data issues.

Although taking advantage of weak signals can be challenging, they can be extremely valuable in credit scoring, especially for the underserved population who often have a thin credit history or none at all. The value of weak signals stems from the fact that weak signals can provide helpful information on an applicant’s willingness or ability to repay loans (Lu et al., 2023). For example, data collected from mobile financial apps can reveal an applicant’s financial management skills. Data related to social connections and social media activities can reflect an applicant’s social capital, which is known to influence career success (Seibert et al., 2017). To fulfill the potential of weak signals, modern machine learning algorithms such as random forests or neural networks can learn complex models from vast training data sets (LeCun et al., 2015; Schmidhuber, 2015). These algorithms can considerably reduce the cost and difficulty of utilizing weak signals and discover interaction effects between strong and weak signals that no traditional model has ever exploited. Due to these advantages of AI models, we posit that AI models can help financial institutions better understand their applicants, especially members of the underserved population who lack credit history.

Data and Empirical Strategy

Research Setting and Institutional Background

We collaborated with a state-owned regional bank to understand the impacts of AI models on financial inclusion. The focal bank serves over 50 million people in its home province, providing various financial products such as saving accounts, bill payments, social security funds, credit cards, personal loans, mortgages, and investments. The focal bank previously applied traditional rule-based models in its loan evaluation process. However, it was motivated by a series of government policies advocating banking financial inclusion and, starting in 2020, decided to experiment with AI-enabled credit scoring models in one of its personal loan products. Different government agencies at various levels published these policies, and the essential purpose was to advocate for banks to extend access to capital for the underserved population by using AI models to better assess their creditworthiness.

capital for the underserved population by using AI models to loans for the treated product and a sample of 6.8 million
better assess their creditworthiness. We summarize the major applications and 3.2 million approved loans for the control
policies in Table 1. This setting provided us with a unique product. We also collected rich data on applicants’
empirical opportunity to investigate the impacts of AI models characteristics. We report the descriptive statistics of the
because the use of AI models was previously not allowed or not regular population in Table 2 and those of the underserved
encouraged for use by major banks. population in Table 3. To fully reveal the dynamics of the
applicant pool after the adoption of the AI model, we provide
The focal bank had multiple personal loan products and planned detailed statistics in two-week time slots.
to apply AI models to all these products eventually. It selected
one product as a pilot (the treated product hereafter) because the As reported in Tables 2 and 3, before the adoption of the AI
business team was more open to using new technologies. A model, around 50% of applicants from the regular population
development team spent about six months training an AI model were approved, whereas only 17% of applicants from the
to predict applicants’ creditworthiness (as reflected by credit underserved population were approved. Three aspects of
scores ranging from 300 to 850) and implemented the AI model into the treated product on February 4, 2021. It is important to note that the AI model focused solely on default risk, without any adjustment to favor applicants from the underserved population.

The treated product provides each approved borrower a credit allowance, typically between 2,000 and 20,000 CNY. Once the borrowers use their credit, they must repay the principal and interest within one month. The product's interest rate was determined by a time-variant base point of interest and did not vary across individual applicants. Upon approval, the credit allowance was determined by a separate formula (rather than the traditional model or the AI model), which was mainly related to the bank's available funds and the borrower's income.4 The AI model was used only in the underwriting process to determine whether to approve a loan application. Before the adoption of the AI model, only 15% of loans were issued to the underserved population, who accounted for 80% of the province's population. In contrast, 85% of the loans were issued to the regular population, who accounted for only 20% of the population of this province (the regular population hereafter). We were able to identify a similar personal loan product provided by the focal bank that had a similar design and served the same customer pool (the control product hereafter).5 During our observation period, no significant adjustment was applied to the control product (the focal bank started to develop an AI model for the control product in 2022). Figure 1 displays the timeline for the initial adoption of the AI model and a follow-up update on the AI model (more on the update later).

We collected data on the treated and control products from October 1, 2020, to April 30, 2021.6 We ended up with a sample of 2.5 million applications and 1.2 million approved loans.

Applicants' characteristics may explain this phenomenon: the first distinction between the underserved and regular populations is that applicants from the underserved population often had a limited credit history and a higher default risk; the second distinction is that applicants from the regular population were often employed by large firms and government agencies, whereas applicants from the underserved population were often self-employed (e.g., small business owners or farmers); the third distinction is that applicants from the regular population often lived in big cities, whereas applicants from the underserved population often lived in small cities, towns, or rural areas. The descriptive statistics indicate that the launch of the AI model dramatically increased the approval rate for the underserved population and reduced the default rate for both the underserved and regular populations.

Development and Implementation of the AI Model

As mentioned earlier, the AI model does not affect the formula for interest rate or credit allowance; it merely influences the decision on whether an application will be approved. The focal bank's traditional evaluation process relies on a rule-based credit scoring model that (1) primarily uses strong signals, (2) is built upon expert knowledge and conventional rules, and (3) allows only linear combination and a tree-structure logic.7 The AI model was developed by a team of five people over six months. We detail the development and implementation procedure in Figure 2 and discuss the major actions in each step.

4 This practice is typical for financial institutions in China and the U.S. The credit line or the loan amount is often determined after creditworthiness is evaluated.
5 See Table A1 for the comparison between underserved applicants of the treated and control products.
6 For details on the data, cleaning and processing procedures, and analysis code, please refer to the transparency materials.
7 Traditional rule-based models rely largely on expert knowledge and experience. Rule developers may try different combinations of rules according to historical data, but their analysis mainly checks descriptive statistics. Data analytics may play a more important role for traditional models that use complex algorithms. However, traditional models usually restrict themselves to conventional and commonly used data (e.g., credit history) and algorithms (e.g., regressions). The reason is that traditional models are used by traditional financial institutions subject to regulations.

1808 MIS Quarterly Vol. 48 No. 4 / December 2024


Li et al. / The Effect of AI-Enabled Credit Scoring on Financial Inclusion

Table 1. Government Policies on Financial Inclusion


Year | Government agency | Policy title | Main content
2015 | General Office of the State Council | Promoting the development plan of financial inclusion | Brought up the long-term goal to enhance the coverage and accessibility of financial services for the underserved population
2018 | China Banking and Insurance Regulatory Commission (CBIRC) | Report on the development of financial inclusion in China | Brought up the idea of digital financial inclusion
2019 | People's Bank of China, CBIRC, Ministry of Agriculture and Rural Affairs | Guidance for financial services for rural revitalization | Encouraged banks to build a credit scoring system to serve the underserved population
2020 | Standing Committee of the X Province's People's Congress | X province's local financial regulations | Supported the application of emerging technologies such as big data and AI in enhancing financial inclusion
Note: These government policies do not precisely define the underserved population. The focal bank identified applicants as underserved based on any of these criteria: (1) annual household income < 20,000 CNY (i.e., Chinese Yuan, which is the official currency of China); (2) household assets < 80,000 CNY; (3) have been unemployed for six consecutive months; (4) have no permanent residence (parents' house can be considered as children's permanent residence); (5) have received government benefits in the last three years.
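The bank's screening criteria in the note above can be expressed as a single predicate. The sketch below is illustrative rather than the bank's code, and the field names are hypothetical stand-ins for the bank's data:

```python
def is_underserved(applicant: dict) -> bool:
    """Classify an applicant as underserved if ANY of the focal bank's five
    criteria holds. Field names are hypothetical."""
    return (
        applicant["annual_household_income_cny"] < 20_000
        or applicant["household_assets_cny"] < 80_000
        or applicant["consecutive_months_unemployed"] >= 6
        or not applicant["has_permanent_residence"]
        or applicant["received_benefits_last_3_years"]
    )

# Example: a self-employed farmer with modest income but low household
# assets is flagged as underserved by the second criterion alone.
print(is_underserved({
    "annual_household_income_cny": 45_000,
    "household_assets_cny": 60_000,
    "consecutive_months_unemployed": 0,
    "has_permanent_residence": True,
    "received_benefits_last_3_years": False,
}))  # True
```

Meeting any single criterion suffices, mirroring the "any of these criteria" wording in the note.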

Figure 1. The Adoption Timeline of the AI Model

Table 2. Descriptive Statistics for the Regular Population


Sample | Regular population
Variables/time | Before | After 0-2 weeks | After 2-4 weeks | After 4-6 weeks | After 6+ weeks
Total loan applications | 3,752,483 | 359,059 | 461,883 | 542,398 | 2,812,676
Approval rate | 49.89% | 49.76% | 51.25% | 53.37% | 51.20%
Default rate | 0.30% | 0.11% | 0.11% | 0.11% | 0.10%
Utilization level | 32.35% | 35.33% | 34.53% | 38.86% | 38.58%
Average credit allowance | 5,805 | 5,876 | 5,933 | 8,126 | 9,547
Percent of urban households | 93.46% | 90.93% | 91.07% | 90.57% | 90.81%
Percent of self-employed | 10.39% | 11.93% | 12.26% | 12.37% | 12.20%
Average monthly revenue of self-employed | 26,474 | 26,535 | 26,113 | 26,587 | 26,899
Percent with income data | 23.79% | 23.18% | 22.93% | 22.95% | 23.69%
Average annual income (CNY) | 148,134 | 149,336 | 149,767 | 150,029 | 151,573
Percent with credit history | 94.39% | 94.59% | 94.88% | 94.23% | 94.99%
Average number of credit applications | 3.40 | 3.25 | 3.24 | 3.22 | 3.28
Average number of credit rejections | 0.36 | 0.38 | 0.38 | 0.39 | 0.39
Average account balance at the bank | 46,238 | 45,037 | 45,456 | 44,612 | 44,335
Percent of females | 0.41 | 0.44 | 0.44 | 0.44 | 0.44
Average age | 38.78 | 36.33 | 36.23 | 35.61 | 36.14
Note: “before” and “after” denote whether the time of loan application is before or after February 4, 2021, which is the adoption date of the AI model.


Figure 2. The Development and Implementation of the AI Model

Table 3. Descriptive Statistics for the Underserved Population


Sample | Underserved population
Variables/time | Before | After 0-2 weeks | After 2-4 weeks | After 4-6 weeks | After 6+ weeks
Total loan applications | 583,252 | 62,041 | 89,275 | 107,290 | 531,725
Approval rate | 16.83% | 23.36% | 25.58% | 26.62% | 35.50%
Default rate | 4.58% | 2.17% | 2.15% | 1.03% | 0.93%
Utilization level | 46.55% | 49.67% | 49.54% | 56.73% | 58.62%
Average credit allowance† | 3,656 | 3,573 | 3,617 | 5,828 | 6,822
Percent of urban households | 12.33% | 17.25% | 17.31% | 19.26% | 19.83%
Percent of self-employed | 80.56% | 82.03% | 82.11% | 83.53% | 84.55%
Average monthly revenue of self-employed | 24,867 | 23,701 | 23,669 | 23,981 | 23,007
Percent with income data | 11.57% | 11.05% | 11.03% | 10.58% | 10.13%
Average annual income (CNY) | 58,862 | 58,073 | 57,535 | 57,427 | 57,013
Percent with credit history | 52.32% | 55.37% | 59.54% | 61.23% | 68.88%
Average number of credit applications | 1.32 | 1.03 | 1.05 | 1.01 | 1.01
Average number of credit rejections | 0.16 | 0.20 | 0.20 | 0.19 | 0.21
Average account balance at the bank | 27,535 | 26,271 | 25,231 | 24,089 | 23,331
Percent of women | 0.29 | 0.34 | 0.35 | 0.39 | 0.39
Average age | 35.29 | 31.31 | 30.23 | 30.17 | 29.51
Note: "Before" and "after" denote whether the time of loan application is before or after February 4, 2021, which is the implementation date of the AI model. †The change in the average credit allowance around six weeks after the adoption date was due to increased fund allocation from the bank. The funds had always been available, and the bank might have decided to increase the allocation because it saw a lower default rate. The increase was applied proportionally to both the underserved and regular populations.

As shown in Figure 2, the development team collected data from both internal and external sources in Step 1, including not only the aforementioned strong signals (e.g., credit data and asset data) but also weak signals (e.g., app data). In Step 2, the team cleaned the data by connecting multiple data sources, checking extreme values, handling missing values, and converting categorical variables. The team then generated features from all data domains using different feature generation techniques in Step 3 and trained four individual learners using different combinations of features to predict loan performance (i.e., default or not) in Step 5. After the initial individual models were built, the team reviewed the feature importance index of each feature, the consistency of the index of each feature, and the meanings of top features to remove unimportant, inconsistent, or hard-to-interpret features in Step 4. Steps 4 and 5 formed an iterative process because features were selected after models were trained and new models were further trained with new features. Once all features were confirmed, the winning algorithm was selected for each individual model.8 Then in Step 6, the team trained the ultimate ensemble model using the four individual models and tested its performance. The final AI model is a two-layer model that combines four individual learners and one ensemble learner (see Figure A1 in the Appendix for the kernel of the AI model). In Step 7, before the implementation of the AI model, the development team and the validation team reviewed the model again and removed hard-to-interpret features or features related to protected features (e.g., gender and disability). This change pointed the process back to Step 4 and resulted in a slightly adjusted AI model. After the model was eventually approved and implemented, the development team continued to monitor its performance over time and its potential impacts on marginalized subgroups (i.e., minority races) in Step 8.

The development team used data samples from July 1, 2020, to December 31, 2020 (around 300,000 issued loans), to build the AI model. The sample was randomly split into three data sets, including 50% training data, 30% validation data, and 20% testing data. The training data was used in Step 5 for training individual learners. The validation data was used in Step 5 to confirm individual winning algorithms and in Step 6 to train the ensemble learner. The testing data was used in Step 6 to determine the winning ensemble learner. The training process followed a common supervised learning approach and ended up with a LightGBM model as the final winner. It is worth emphasizing that the training data set contained the natural proportion of the underserved population, and the training process did not assign higher weights to the underserved population. The regular and underserved populations shared the same features and were trained using the same model. To summarize, compared to the traditional model, the AI model (1) uses both strong and weak signals, (2) is trained using machine learning algorithms, and (3) allows complex connections between features and creditworthiness. Due to financial regulations, the focal bank did not replace the traditional model with the AI model entirely. Figure 3 shows how the AI model worked with the traditional model.

As indicated by Figure 3, the traditional and AI models first evaluated the applicants independently. Then, a decision matrix was used to determine the final lending decision. The cutoff score was 520 before the adoption of the AI model; thus, applicants with a score above 520 would have been approved if the traditional model had been used. However, once adopted, the AI model could overturn the recommendations from the traditional model (as indicated in the shaded cells in Figure 3), from approval to denial, from denial to approval, or from approval/denial to pending (pending means the bank needs to collect more information from the applicant and feed the information into this evaluation process again to decide). When the traditional and AI models shared the same recommendation, the evaluation process resulted in a straightforward decision.

Identification Strategy and Empirical Specifications

The adoption of the AI model for the treated product by the focal bank provided us with a unique opportunity to investigate the impacts of AI-enabled credit scoring models on financial inclusion. We applied a difference-in-differences (DID) identification strategy to estimate their impacts. We took several steps to remove potential confounding effects in our research setting.

The first challenge came from the potential impacts of government policies on financial inclusion. One potential impact of the government policies was the change in the applicant pool from the underserved population. Because many of the policies were issued several years before the adoption of the AI model, we do not believe they would have confounded our results. Furthermore, these policies would influence both the treated and control products similarly, which would be handled by the DID strategy. Another potential impact of government policies is on the bank's loan evaluation process and eligibility criteria. We communicated with the focal bank to fully understand their actions to comply with these government policies. The focal bank explicitly decided to use the same cutoff score for loan approval decisions for both the underserved and regular populations, as it intended to enhance financial inclusion without hurting lending performance. Besides adopting the AI model, the focal bank also took two actions. During the third quarter of 2020, they proactively promoted all their financial products to the entire population in their home province and adjusted the traditional rule-based models. We therefore excluded this period from our sample to eliminate any confounding effects (this is why our sample starts on October 1, 2020). The focal bank also had a few bankers and loan officers reach out to some applicants from the underserved population to promote its products and evaluate the applications manually during our observation period. This behavior would have confounded our findings, so we removed all the applications that were evaluated manually (manual evaluations were marked in the bank's database).

8 The team compared multiple state-of-the-art algorithms and found the LightGBM algorithm to be the winner. More details on LightGBM (short for light gradient-boosting machine) are available in Figure A1.
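As a rough illustration of the procedure described in this section (not the bank's actual pipeline), the 50/30/20 split and the two-layer structure, four individual learners feeding one ensemble learner, can be sketched as follows. The placeholder rules and weights are invented stand-ins for the trained LightGBM models:

```python
import random

def split_50_30_20(rows, seed=7):
    """Randomly split the sample into 50% training, 30% validation,
    and 20% testing sets, mirroring the split described in the text."""
    rng = random.Random(seed)
    rows = list(rows)
    rng.shuffle(rows)
    n = len(rows)
    cut1, cut2 = n * 5 // 10, n * 8 // 10
    return rows[:cut1], rows[cut1:cut2], rows[cut2:]

# Placeholder "individual learners": each maps a feature dict to a default
# probability using a different slice of features. These hypothetical rules
# stand in for the four trained LightGBM learners.
def learner_credit(x):   return 0.9 if x["rejections"] > 2 else 0.2
def learner_income(x):   return 0.8 if x["income"] < 20_000 else 0.3
def learner_app(x):      return 0.7 if x["app_sessions"] < 5 else 0.25
def learner_balance(x):  return 0.85 if x["balance"] < 1_000 else 0.3

INDIVIDUAL_LEARNERS = (learner_credit, learner_income, learner_app, learner_balance)

def ensemble_score(x, weights=(1.0, 1.0, 1.0, 1.0)):
    """Second layer: combine the four individual scores. A weighted average
    stands in for the trained ensemble learner."""
    scores = [f(x) for f in INDIVIDUAL_LEARNERS]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

In the paper's pipeline, the first-layer learners are fit on the training data, the ensemble weights on the validation data, and final performance is read off the testing data.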


Figure 3. The Implementation of the AI-Enabled Credit Scoring Model at the Treated Product
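The decision logic that Figure 3 depicts (independent evaluations, then a decision matrix in which the AI model can overturn the traditional recommendation to approval, denial, or pending) might be sketched as below. The bank's actual overturn cells are not disclosed, so the disagreement rule used here, sending near-cutoff cases to pending, is purely illustrative:

```python
def final_decision(traditional_score, ai_approves, cutoff=520, pending_band=30):
    """Combine the traditional rule-based score with the AI model's
    recommendation. Agreement yields a straightforward decision; on
    disagreement, this sketch sends near-cutoff cases to 'pending' (more
    information is collected and the evaluation reruns) and otherwise lets
    the AI model overturn the traditional recommendation. The disagreement
    rule is an assumption for illustration only."""
    traditional_approves = traditional_score > cutoff
    if traditional_approves == ai_approves:
        return "approve" if ai_approves else "deny"
    if abs(traditional_score - cutoff) <= pending_band:
        return "pending"
    return "approve" if ai_approves else "deny"

print(final_decision(600, True))   # approve (both models agree)
print(final_decision(530, False))  # pending (disagreement near the cutoff)
print(final_decision(480, True))   # approve (AI overturns a traditional denial)
```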

The second challenge comes from the selection effect at the product level. We explicitly discussed the motivation for deploying the AI model on the treated product, and the focal bank explained that they had a plan to apply AI models to all personal loan products (including the control product). The treated product was selected as a pilot because the business team was more open to new technologies (actually, another AI model has been under development for the control product since 2022). This selection was not 100% random, but it had no direct or clear connection with financial inclusion (see Table A1 in the Appendix for the empirical evidence).

After carefully reviewing the institutional background and cleaning the data set, we ended up with a sample containing 9,300,107 applications. Using this sample, we examined the changes brought by the AI model after its implementation into the evaluation process. The treated loan product serves as the treatment group, whereas a similar loan product serves as the control group. We used Equation (1) to investigate how the adoption of the AI model influenced the approval rate of loan applicants. The unit of analysis is each loan application, and the sample period is from October 1, 2020, to April 30, 2021.

Approved_ijt = β0 + β1 AI Model_jt + β2 Applicant Characteristics_i + β3 Model Updates_jt + Time_t + Product_j + ε_ijt    (1)

Approved_ijt denotes whether a loan application was approved, with 1 indicating approval. Because the unit of analysis is each application, i denotes each application, j denotes the product (i.e., the treated product or the control product), and t denotes the day when the application was evaluated. Once we know the application i, we know the product type j and the application date t. However, we used all three subscripts together to make Equation (1) clearer. AI Model_jt is the key independent variable in this equation, denoting whether the AI model has been adopted in the evaluation process. For the control product, this variable always takes the value of 0. For the treated product, this variable takes the value of 1 after the adoption of the AI model. Before the adoption, the value is 0. We controlled for four sets of variables to make the estimation more precise, including (1) Applicant Characteristics_i, which covers applicant/borrower financial and demographic information, e.g., income, credit history, gender, age, etc.; (2) Model Updates_jt, which covers the two minor updates on the traditional model and one update on the AI model after its adoption. Two traditional model updates occurred on November 15, 2020, and December 13, 2020, while the AI model update occurred on April 17, 2021; (3) Time_t, which captures the time-fixed effects at the daily level; (4) Product_j, which captures the product-fixed effects, i.e., the general long-lasting differences between the treated and control products. In this DID setting, we essentially compared the change in performance in the treated group before and after the adoption of the AI model versus the change in performance in the control group before and after the same time period.

In addition to the approval rate, we were also interested in how the AI model influences lending performance. Therefore, we followed the logic in Equation (1) and applied Equation (2) to answer this question. Equation (2) is the same as Equation (1) other than two differences: first, the unit of analysis is at the loan level rather than the loan application level because we could observe lending performance on the approved loans only; second, the key dependent variables are a loan's default status and its utilization level. Loan default is a dummy variable, with 1 indicating default (i.e., repayment was late for over one month). Utilization level represents the extent to which the borrower used the allocated credit allowance. It is measured by the outstanding loan a borrower had in the immediate one month after approval divided by the credit allowance. Because the interest rate varied over time and was unrelated to individual borrowers' characteristics, the time-fixed effects absorb the effect of the loan interest rate. Credit allowance was added as a control variable.

Loan Performance_ijt = β0 + β1 AI Model_jt + β2 Applicant Characteristics_i + β3 Model Updates_jt + β4 Credit Allowance_i + Time_t + Product_j + ε_ijt    (2)
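The DID comparison behind Equation (1) can be illustrated with a minimal, dependency-free computation of the raw difference-in-differences in approval rates. The numbers below are made up for the sketch; the paper's actual estimates come from the full regression with controls and fixed effects:

```python
def approval_rate(apps):
    """Share of approved applications in a group."""
    return sum(a["approved"] for a in apps) / len(apps)

def did_estimate(apps):
    """(treated after - treated before) - (control after - control before)."""
    groups = {}
    for a in apps:
        groups.setdefault((a["product"], a["after"]), []).append(a)
    return (
        approval_rate(groups[("treated", True)]) - approval_rate(groups[("treated", False)])
    ) - (
        approval_rate(groups[("control", True)]) - approval_rate(groups[("control", False)])
    )

# Toy data: treated approval rises from 17% to 35%; control stays at 50%.
apps = (
    [{"product": "treated", "after": False, "approved": i < 17} for i in range(100)]
    + [{"product": "treated", "after": True, "approved": i < 35} for i in range(100)]
    + [{"product": "control", "after": False, "approved": i < 50} for i in range(100)]
    + [{"product": "control", "after": True, "approved": i < 50} for i in range(100)]
)
print(round(did_estimate(apps), 2))  # 0.18
```

The regression form of Equation (1) recovers this same contrast as the coefficient β1 on AI Model_jt once controls and fixed effects are added.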


Table 4. The Impacts of the AI Model on the Approval Rate and Lending Performance: Whole Population
Sample | Whole population
Dependent variables | Approval (1/0) | Default (1/0) | Utilization level
The initial launch of the AI model | -0.008 (0.005) | -0.008*** (0.003) | 0.058** (0.027)
AI model update (Apr 17) | 0.033*** (0.004) | -0.003** (0.001) | 0.032*** (0.011)
Traditional process update (Nov 15) | -0.035*** (0.011) | -0.014*** (0.004) | 0.012 (0.015)
Traditional process update (Dec 13) | -0.007 (0.007) | -0.011*** (0.003) | 0.023* (0.013)
Has income (1/0) | 0.009*** (0.002) | -0.006*** (0.001) | 0.016*** (0.002)
Income imputed | 0.006*** (0.002) | -0.002*** (0.001) | 0.003* (0.000)
Has credit history (1/0) | 0.016*** (0.005) | -0.007* (0.004) | 0.005*** (0.001)
Number of applications imputed | 0.000 (0.001) | 0.004*** (0.000) | 0.011*** (0.004)
Number of rejections imputed | -0.010*** (0.003) | 0.005*** (0.002) | 0.013*** (0.001)
Bank account balance | 0.004*** (0.002) | -0.009*** (0.003) | -0.003*** (0.001)
Gender (Female = 1) | -0.003*** (0.001) | -0.005*** (0.002) | -0.013*** (0.001)
Age | -0.006*** (0.002) | -0.007** (0.004) | 0.003*** (0.001)
Loan amount/credit allowance | N.A. | 0.000 (0.001) | -0.003*** (0.001)
Time fixed effects | √ | √ | √
Product fixed effects | √ | √ | √
Adjusted R2 | 0.035 | 0.016 | 0.015
No. of observations | 9,300,107 | 4,368,626 | 4,368,626
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level. Adjusted R2 was calculated
without including the fixed effects. Considering that not every applicant has income information or credit history, we used the dummy variables
"has income" / "has credit history" to denote if this information was missing. We further imputed missing income data or credit history data with
0. As a result, the coefficient of "has income" reflects the impact of the presence of verified income data on lending decisions, and the coefficient
of "income imputed" reflects the impact of the true/reported income (conditional on “has income” being 1). We included both in our model
because they had different implications ("has income" estimated the importance of having income data in the full sample, whereas "income
imputed" estimated the importance of income amount among the subsample who had income data).
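The missing-indicator coding described in the note can be sketched as a small helper. This is an illustration of the coding scheme, not the authors' code:

```python
def encode_with_indicator(raw_value):
    """Missing-indicator coding used for income and credit history in the
    regressions: a dummy for whether the value is present, plus the value
    itself imputed with 0 when missing. Returns (has_value, value_imputed)."""
    if raw_value is None:
        return 0, 0.0
    return 1, float(raw_value)

print(encode_with_indicator(None))     # (0, 0.0)
print(encode_with_indicator(148_134))  # (1, 148134.0)
```

Including both terms lets the "has" dummy absorb the effect of missingness in the full sample while the imputed value estimates the effect of the amount among applicants who report it.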

Results

Main Impacts

We report the key findings in this section about the treatment effects of the AI model on the approval rate, default rate, and utilization level. Table 4 reports the results in the full sample. Surprisingly, the overall approval rate did not change much after the adoption of the AI model (the coefficient -0.008 is insignificant). We will dig deeper into this finding when we discuss how the AI model influenced the underserved and regular populations differently. The AI model also reduced the default rate and increased the utilization level, leading to more profits. To summarize, the overall effect of the AI model is to improve profitability with little impact on loan approval. It is worth noting that the "AI model update (Apr 17)" further decreased the default rate and increased the utilization level. The coefficients of borrower features are also in line with theoretical expectations. For example, applicants with income information and credit history are more likely to be approved and less likely to default. Higher income, higher bank account balances, and fewer rejections are associated with higher approval likelihood and lower default likelihood. Because the full sample analysis may mask the nuanced and heterogeneous impacts of the AI model, we split the full sample into the regular and underserved populations and reran Equations (1) and (2). Table 5 reports the results.

The adoption of the AI model increased the approval rate by 15 percentage points and reduced the default rate by 0.9 percentage points for the underserved population. Considering that the approval rate was 16.8% and the default rate was 4.6% before the adoption of the AI model, the relative magnitudes of the treatment effects are +89.3% for the approval rate and -19.6% for the default rate. The results indicate that the traditional model overlooked significant opportunities in the underserved population. By including weak signals and using advanced machine learning algorithms, the AI model was able to more accurately estimate the default risk of each applicant from the underserved population and selectively offer access to capital to those who were creditworthy. Given that the AI model was able to significantly increase the approval rate and decrease the default rate simultaneously for the underserved population, we conclude that the AI model can enhance financial inclusion. In addition, the adoption of the AI model also increased the utilization level, which can bring more profits to the focal bank. Compared to the effect on the underserved population, the effect on the approval rate of the regular population was in the opposite direction (though insignificant). The launch of the AI model reduced the approval rate by 0.6 percentage points, equaling a relative magnitude of -1.2% (i.e., -0.006/0.499). However, the AI model significantly reduced the default rate and increased the utilization level for the regular population.
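The relative magnitudes quoted above are simply the DID coefficients divided by the pre-adoption baselines for the underserved population:

```python
# Coefficient / baseline = relative magnitude of the treatment effect.
approval_effect, approval_baseline = 0.150, 0.168   # +15.0 pp on a 16.8% base
default_effect, default_baseline = -0.009, 0.046    # -0.9 pp on a 4.6% base

print(f"{approval_effect / approval_baseline:+.1%}")  # +89.3%
print(f"{default_effect / default_baseline:+.1%}")    # -19.6%
```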


Table 5. The Impacts of the AI Model on the Approval Rate and Lending Performance: Subsample Analysis
Sample | Regular population | | | Underserved population | |
Dependent variables | Approval (1/0) | Default (1/0) | Utilization level | Approval (1/0) | Default (1/0) | Utilization level
The initial launch of the AI model (Feb 4) | -0.006 (0.007) | -0.008*** (0.003) | 0.057** (0.027) | 0.150*** (0.012) | -0.009*** (0.002) | 0.058** (0.028)
AI model update (Apr 17) | 0.030** (0.005) | -0.003*** (0.001) | 0.032*** (0.011) | 0.030*** (0.006) | -0.007*** (0.001) | 0.032*** (0.010)
Traditional process update (Nov 15) | -0.126*** (0.033) | -0.015*** (0.005) | 0.011 (0.015) | -0.013 (0.024) | -0.010** (0.004) | 0.017 (0.018)
Traditional process update (Dec 13) | -0.012 (0.024) | -0.011*** (0.003) | 0.024* (0.013) | -0.033*** (0.010) | -0.014*** (0.003) | 0.022 (0.014)
Has income (1/0) | 0.013*** (0.003) | -0.007*** (0.001) | 0.020*** (0.004) | 0.009*** (0.003) | 0.003* (0.002) | 0.009*** (0.002)
Income imputed | 0.007*** (0.002) | -0.003*** (0.001) | 0.003*** (0.000) | 0.000 (0.001) | 0.003*** (0.001) | 0.002*** (0.000)
Has credit history (1/0) | 0.010*** (0.002) | -0.008*** (0.000) | 0.004*** (0.001) | 0.020*** (0.005) | -0.004*** (0.001) | 0.006** (0.004)
Number of credit applications imputed | 0.000 (0.000) | 0.004*** (0.000) | 0.007* (0.005) | 0.000 (0.000) | 0.001*** (0.000) | 0.011*** (0.002)
Number of credit rejections imputed | -0.012*** (0.002) | 0.006*** (0.001) | 0.013*** (0.002) | -0.004*** (0.001) | 0.003*** (0.000) | 0.015*** (0.004)
Bank account balance | 0.004*** (0.002) | -0.009*** (0.002) | -0.004*** (0.008) | 0.007*** (0.002) | -0.004*** (0.001) | -0.002*** (0.000)
Gender (Female = 1) | -0.002*** (0.001) | -0.004*** (0.001) | -0.012*** (0.003) | -0.006*** (0.002) | -0.009*** (0.004) | -0.016*** (0.002)
Age | -0.005*** (0.002) | -0.008*** (0.002) | 0.004*** (0.001) | -0.006*** (0.002) | -0.004*** (0.001) | 0.002 (0.002)
Loan amount | N.A. | 0.000 (0.000) | -0.008*** (0.001) | N.A. | 0.003* (0.002) | 0.002 (0.001)
Time fixed effects | √ | √ | √ | √ | √ | √
Product fixed effects | √ | √ | √ | √ | √ | √
Adjusted R2 | 0.050 | 0.015 | 0.013 | 0.024 | 0.011 | 0.021
No. of observations | 7,927,519 | 4,016,328 | 4,016,328 | 1,372,588 | 352,297 | 352,297
Notes: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.

The improvement in financial inclusion occurred against the backdrop of the traditional model, which evaluated applicants using strong signals only. Because the underserved population does not necessarily possess such strong signals, the traditional model often categorically rejects them. This is a form of statistical discrimination (Fang & Moro, 2011). The AI model addressed this issue by incorporating weak signals and using advanced machine learning algorithms, thus providing more accurate estimations of applicants' creditworthiness. Our results show that the AI model identified some "bad" regular applicants and denied access to capital for them, leading to a slight (though insignificant) reduction in the approval rate and a substantial reduction in the default rate. At the same time, the AI model increased the approval rate and decreased the default rate for the underserved population. Our results are consistent with the AI model update, which further increased the approval rate and decreased the default rate for the underserved population.

The findings from Tables 4 and 5 suggest that the AI model can better distinguish good applicants from bad applicants. This has two implications: first, the approved applicants should have a lower default rate, which we have already seen; second, the rejected applicants should have a higher default rate if approved. In other words, the default rate of the rejected population should be positively related to the adoption of the AI model. However, their loan performance would have no ground truth because their applications would have been rejected. To approximate their loan performance, we identified applicants rejected by the treated product or the control product but approved by other financial products within one week.9 We set one week as the time interval to increase the likelihood that these applicants' creditworthiness would not change. Using their default status with other financial products as a proxy, we conducted a DID analysis of the impact of the AI model on the lending performance of the rejected population. Table 6 reports the results and shows that the applicants rejected by the treated product are associated with a higher default rate after the adoption of the AI model. In addition, the applicants rejected by the treated product are also associated with lower utilization levels after the adoption of the AI model.

9 Most applicants rejected by the treated or control products failed to get a loan elsewhere. Only 2.3% of them could get approved by other products within seven days of the application date.

Our main findings suggest that the AI model simultaneously reduces Type I errors (false approval) and Type II errors (false denial), consistent with our theoretical expectation that the AI model reduces statistical discrimination against the underserved population.

Mechanisms

Prediction Accuracy

To understand the seemingly contradictory finding that the AI model simultaneously increases the approval rate for the underserved population and reduces the default rate and increases the utilization level for the whole population, we note that an essential requirement of the AI model is that it offers a more accurate creditworthiness prediction model for both the regular and underserved populations. To assess the prediction accuracy of the AI model, we provide descriptive statistics on the approval rate and the default rate by looking at the distributions of credit score brackets generated by the traditional and AI models in Table 7.

Several interesting findings emerge from Table 7. First, the traditional model often underestimated the underserved population. For example, the traditional model assigned only 6.8% of the underserved population a credit score between 620 and 850. However, this number was 15.7% for the AI model. Given that the traditional model used an absolute cutoff score (i.e., 520) to approve applicants, the adjustment on the credit score distribution by the AI model helped approve more applicants from the underserved population. Second, the credit score from the AI model had better prediction accuracy, which can be seen from the correlation between the credit score and the default rate. The traditional model did a poor job for the underserved population, as the credit score was not closely correlated with the default rate (e.g., the default rate in the 620-850 range is even higher than the default rate in the 520-620 range). The AI model vastly improved prediction accuracy, as the default rate of the above-cutoff population was significantly reduced, and the differences in the default rate between various credit score brackets were significantly enlarged. These two facts collectively explain why the AI model can simultaneously increase the approval rate and decrease the default rate. The more accurate and fair distribution of the credit scores for the underserved population qualifies more applicants from the underserved population and better distinguishes the creditworthy ones from the uncreditworthy ones.

Data Domains

To strengthen and deepen this logic chain, we further investigated the factors in the AI model that contribute to the increase in prediction accuracy. As we mentioned earlier, this increase may come from the use of weak signals (including the features generated from them), advanced machine learning algorithms, or both. We first compared the data foundations for the traditional and AI models and then estimated each mechanism's magnitude. The traditional model of the treated product mainly relies on human expertise to generate features and rules using strong signals (e.g., Rule 1: the number of overdue credit products is more than 3). Experts assign each rule a score, and the collection of rules hit by each applicant determines the final credit score and lending decision. The traditional model eventually includes about 80 rules built upon 50 features. Unlike the traditional model, the AI model utilizes both strong and weak signals to predict the default rate (see Table 8). Specifically, the traditional model uses Data Domains 1-6. In addition to these strong signals, the AI model also uses social security fund data, provident fund data, in-app financial behavioral data, and in-app nonfinancial behavioral data (i.e., Data Domains 7-10).

As defined earlier, weak signals refer to information that is not directly related to creditworthiness and suffers from noisy patterns and missing data issues. First, none of the weak signals is directly related to creditworthiness. For example, the provident fund can only be used to purchase real estate but cannot be used to pay off credit card bills or other loans; the social security fund can only be withdrawn after retirement and, again, cannot be used to pay off credit card bills or other loans; the in-app financial behaviors include depositing money, paying bills, and reading statements, which have no direct relationship with creditworthiness; in-app nonfinancial behaviors include reading news, checking advertisements, and watching videos, which again have no direct relationship with creditworthiness. Second, human experts have limited ability to infer complex relationships between weak signals and creditworthiness. For example, given the employment status and the nature of their jobs, the underserved population may exhibit various patterns in their social security fund deposits, which makes it difficult for human experts to infer repayment capability. Therefore, making weak signals into rules is largely impractical for traditional models.

MIS Quarterly Vol. 48 No. 4 / December 2024 1815


Li et al. / The Effect of AI-Enabled Credit Scoring on Financial Inclusion

Table 6. The Impact of the AI Model on Financial Inclusion—Screening Performance


Sample | Regular applicants who are rejected and matched with other loans | Underserved applicants who are rejected and matched with other loans
Dependent variable | Default (1/0) | Utilization level | Default (1/0) | Utilization level
The initial launch of the AI model (Feb 4) | 0.008*** (0.001) | -0.072*** (0.004) | 0.026*** (0.002) | -0.042*** (0.008)
AI model update (Apr 17) | 0.005 (0.006) | -0.039*** (0.001) | 0.043*** (0.005) | -0.026*** (0.001)
Other controls | √ | √ | √ | √
Time fixed effects | √ | √ | √ | √
Product fixed effects | √ | √ | √ | √
Adjusted R2 | 0.089 | 0.192 | 0.096 | 0.137
No. of observations | 80,252 | 80,252 | 37,229 | 37,229
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.

Table 7. Credit Score Distributions from the Traditional and AI Models


Traditional model AI model
Regular population Underserved population Regular population Underserved population
Score range Percentage Default rate Percentage Default rate Percentage Default rate Percentage Default rate
620-850 23.5% 0.04% 6.8% 0.61% 18.7% 0.00% 15.7% 0.05%
520-620 21.3% 0.07% 11.2% 0.52% 21.6% 0.01% 22.8% 0.22%
480-520 30.2% 0.11% 30.2% 1.95% 35.2% 0.07% 30.8% 1.87%
450-480 16.6% 0.48% 33.2% 2.09% 16.8% 0.41% 21.4% 3.61%
300-450 8.4% 0.69% 18.7% 3.75% 7.3% 1.33% 9.3% 7.26%
Note: We used the applications after the adoption date of the AI model to generate these results. Because the traditional and AI models
evaluated the applicants separately, we know the percentage of applicants in each credit score bracket. The default rate represents the
performance of approved applicants within each credit score bracket. There was always a separate whitelist system that approved certain
applicants (e.g., social honor recipients and veterans) regardless of credit scores, so each bracket may contain some approved applicants.
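One way to see the accuracy pattern in Table 7 is to check whether default rates rise monotonically as the credit score falls within each model. The sketch below hard-codes the underserved-population default rates from Table 7 (brackets ordered from highest to lowest score); the helper function is ours, not the paper's.

```python
# Default rates (%) by score bracket from Table 7, underserved population,
# ordered 620-850, 520-620, 480-520, 450-480, 300-450 (highest score first).
TRADITIONAL = [0.61, 0.52, 1.95, 2.09, 3.75]
AI = [0.05, 0.22, 1.87, 3.61, 7.26]

def risk_rises_as_score_falls(default_rates):
    """True if default risk strictly increases as the score bracket falls."""
    return all(a < b for a, b in zip(default_rates, default_rates[1:]))

print(risk_rises_as_score_falls(TRADITIONAL))  # False: the 620-850 bracket defaults more than 520-620
print(risk_rises_as_score_falls(AI))           # True: scores and realized risk line up
```

The inversion in the traditional model's top two brackets is exactly the anomaly the text cites as evidence of poor calibration for the underserved population.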

Table 8. Data Domains for the Traditional and AI Models


Feature domain (no.) | Signal type | Sample features | Regular population: % without data; # features / total | Underserved population: % without data; # features / total
Blacklist (1) | Strong | ID hits fraud risk list | 73.26%; 2.4 / 10 | 75.33%; 1.5 / 10
Identity information (2) | Strong | Inconsistent education | 0.00%; 5.9 / 10 | 0.11%; 4.1 / 10
Income asset information (3) | Strong | Current assets | 56.33%; 1.38 / 5 | 59.13%; 0.75 / 5
Overdue credit products (4) | Strong | Current overdue amount | 7.27%; 28.6 / 120 | 50.32%; 12.57 / 120
Credit product application (5) | Strong | Number of applications | 7.27%; 102.5 / 150 | 50.32%; 77.3 / 150
Use of credit products (6) | Strong | Current utilization level | 7.27%; 288.6 / 600 | 50.32%; 216.7 / 600
Social security fund data (7) | Weak | Social security withholding amount | 7.03%; 24.88 / 45 | 7.27%; 24.42 / 45
Provident fund data (8) | Weak | Provident fund withholding amount | 6.34%; 15.2 / 35 | 13.83%; 12.5 / 35
In-app financial data (9) | Weak | Deposit, bill payment, money transfer | 2.82%; 199 / 600 | 2.37%; 192 / 600
In-app nonfinancial data (10) | Weak | Read news, watch videos, collect coupons | 2.35%; 334 / 1500 | 5.75%; 315 / 1500


Table 8 above also demonstrates the disadvantages of the underserved population when the traditional model is used. More than half of the underserved population has no information on Data Domains 4-6 (which come from credit history data); even for those with data, the data are still limited, as indicated by the number of features the data can generate. In contrast, the underserved population has better availability of weak signals (i.e., Data Domains 7-10) and generates a similar number of features from weak signals as the regular population.10


Weak Signals and Advanced Models

Given the data domains mentioned above, we first investigated the extent to which the impact comes from weak signals. If the enhanced financial inclusion is due to the inclusion of weak signals, we would expect traditional models to work better using both strong and weak signals than using strong signals only. To examine this expectation, we compared three decision models: the first was the actual traditional model in use on February 4, 2021; the second was a simulated traditional model, which used rules built on features generated from both strong and weak signals (because the actual traditional model uses Data Domains 1-6 only, we borrowed some important weak-signal features from the AI model and converted them into rules added to the traditional model); and the third was the actual AI model. The prediction target was whether a loan had defaulted, and we used the loans approved after February 4, 2021, to evaluate the prediction accuracy of each model. Table A2 reports the results. The results show that weak signals can help the traditional model improve prediction accuracy. The deep root of this improvement stems from how features are generated from weak signals and why they are predictive of creditworthiness. Table A3 demonstrates this feature generation process and provides further details and evidence on the value of weak signals.

The second channel we investigated is the extent to which the increase in prediction accuracy comes from advanced machine learning algorithms. If this were the case, we should expect the AI model to work better than the traditional model with the same set of information (i.e., Data Domains 1-6 in Table 8). To examine this expectation, we compared three decision models: the first was the actual traditional model in use on February 4, 2021; the second was the actual AI model in use on February 4, 2021, but fed with strong signals only to generate credit scores; and the third came from a simulation approach in which we used strong signals only to train a new AI model with loans approved before February 4, 2021. We report the results in Table A4. The results show that the use of advanced algorithms enables AI to achieve higher prediction accuracy, even when the AI and traditional models use the same set of information. The deep root of this improvement stems from the advanced feature generation techniques and the complex yet reliable connections between features and creditworthiness. Table A5 demonstrates this feature generation process and provides further details and evidence.

To better understand how the two engines of the AI model influence financial inclusion for the underserved population, we summarize the findings from Tables A2 and A4 in Table 9 based on the AUC values.11 Column 1 in Table A2 serves as the baseline in Table 9. Columns 2-4 in Table 9 reflect the relative improvement in AUC from the inclusion of weak signals, the use of advanced algorithms, and the combination of weak signals and advanced algorithms. We subtracted 0.5 (i.e., the minimum AUC value) from the raw AUC values and then compared the relative change in that new value.

Consistent with our previous findings, Table 9 shows that both weak signals and advanced algorithms contribute to the improvement in prediction accuracy. Comparing the magnitudes in Column 2 with those in Column 3, we conclude that the improvement from advanced algorithms is greater than that from weak signals. Table 9 further shows that the improvement brought by weak signals and advanced algorithms to the underserved population is stronger than that brought to the regular population. Because of the lack of strong signals from the underserved population and the limited ability of human experts to create nuanced features and infer complex relationships between weak signals and creditworthiness, the AI model can take better advantage of weak signals with advanced algorithms. The last finding is that the underserved population benefits more from including weak signals than the regular population when using advanced algorithms. The additional lift from Column 3 to Column 4 is 43 percentage points for the regular population but 128 percentage points for the underserved population. This is because strong signals are largely available and precise for the regular population, so weak signals provide little additional value. However, as strong signals are largely missing for the underserved population, weak signals become particularly important in evaluating their creditworthiness.

10. A high proportion of the underserved population has weak signals because they use the focal bank's app to deposit social security and provident funds or pay utility fees (the focal bank is the largest bank in that province handling these services). Members of the underserved population who are not customers of the focal bank will not have weak signals in this regard, and applying the AI model would not improve financial inclusion for them. Some other actions are needed to provide access to capital to this subgroup.

11. AUC stands for "area under the ROC curve," a measure of the prediction accuracy of classification models. ROC stands for "receiver operating characteristic curve," a graph showing the true positive rate against the false positive rate at all classification thresholds. Higher AUC values indicate higher prediction accuracy.


Table 9. Why Does the AI Model Have Higher Prediction Accuracy: Weak Signals and Advanced Algorithms
Column | (1) Strong signals + Traditional model | (2) Strong signals + Weak signals + Traditional model | (3) Strong signals + AI model | (4) Strong signals + Weak signals + AI model
Regular population | Baseline | +114% | +186% | +229%
Underserved population | Baseline | +143% | +229% | +357%
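The AUC normalization behind Table 9 (subtracting the 0.5 random-guess floor before taking relative changes) can be written as a one-line helper. The AUC values below are hypothetical placeholders, since the paper reports only the resulting lifts.

```python
def relative_auc_lift(auc, baseline_auc):
    """Relative improvement after removing the 0.5 random-guess floor:
    the change in (AUC - 0.5), expressed as a fraction of the baseline's (AUC - 0.5)."""
    return ((auc - 0.5) - (baseline_auc - 0.5)) / (baseline_auc - 0.5)

# Hypothetical example: baseline AUC 0.60, improved model AUC 0.72.
lift = relative_auc_lift(0.72, 0.60)
print(f"+{lift:.0%}")  # +120%
```

Without the 0.5 subtraction, the same pair of AUCs would look like a modest 20% gain, understating how much of the achievable discrimination (above random guessing) the model actually captured.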

Table 10. Data Domain Importance Based on Single-Domain Explainability


Sample Regular population Underserved population
Model AI model AI model
Data domains K-S AUC K-S AUC
Blacklist 0.17 59 0.20 63
Identity information 0.16 59 0.10 57
Income asset information 0.20 62 0.15 57
Overdue credit products 0.28 68 0.18 62
Credit product application 0.20 63 0.20 63
Use of credit products 0.23 65 0.16 58
Social security fund data 0.16 59 0.11 55
Provident fund data 0.17 59 0.15 57
In-app financial behavioral data 0.13 57 0.22 67
In-app nonfinancial behavioral data 0.15 57 0.22 68
Note: K-S stands for Kolmogorov-Smirnov test scores, which measure the degree of separation between the positive and negative distributions
of classification models. When two groups can perfectly separate positives and negatives, the K-S score is 1. A larger K-S score indicates higher
prediction accuracy.
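The K-S scores in Table 10 measure the maximum gap between the score distributions of defaulters and non-defaulters. A minimal empirical version of that statistic, using only the standard library (the helper is our own, not the bank's implementation):

```python
def ks_statistic(pos_scores, neg_scores):
    """Max |ECDF_pos(x) - ECDF_neg(x)| over all observed score thresholds."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))

    def ecdf(sample, x):
        return sum(s <= x for s in sample) / len(sample)

    return max(abs(ecdf(pos_scores, x) - ecdf(neg_scores, x)) for x in thresholds)

# Perfectly separated groups give K-S = 1; identical groups give K-S = 0,
# matching the interpretation in the note to Table 10.
print(ks_statistic([1, 2, 3], [7, 8, 9]))  # 1.0
print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
```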

Table 11. Correlation Table of Strong Signals and Weak Signals


Feature domains (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
Blacklist (1) 1
Identity information (2) 0.36 1
Income asset information (3) 0.22 0.32 1
Overdue credit products (4) 0.87 0.58 0.16 1
Credit product application (5) 0.20 0.48 0.24 0.49 1
Use of credit products (6) 0.33 0.40 0.15 0.44 0.39 1
Social security fund data (7) 0.07 0.67 0.73 0.38 0.26 0.16 1
Provident fund data (8) 0.16 0.41 0.56 0.30 0.17 0.11 0.67 1
In-app financial behavioral data (9) 0.05 0.07 0.07 0.04 0.07 0.22 0.14 0.05 1
In-app nonfinancial behavioral data (10) 0.03 0.15 0.05 0.03 0.05 0.17 0.15 0.06 0.52 1
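Each cell of Table 11 is a correlation between two domains' first principal components; the PCA reduction itself is described in the paragraphs below. A minimal numpy sketch of that per-domain step, run on synthetic data rather than the bank's features:

```python
import numpy as np

def first_principal_component(X):
    """Scores of the first principal component of feature matrix X (rows = applicants)."""
    Xc = X - X.mean(axis=0)                        # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]                              # project onto the leading right singular vector

# Synthetic "data domain": four features driven by one shared latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
domain = np.column_stack([latent + 0.05 * rng.normal(size=500) for _ in range(4)])

pc1 = first_principal_component(domain)
# PC1 recovers the shared driver (up to sign), so |corr| is near 1.
print(abs(np.corrcoef(pc1, latent)[0, 1]) > 0.99)  # True
```

Collapsing each domain to one component this way is what makes a 10 × 10 correlation table feasible despite some domains holding hundreds of features.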

Additional Predictive Power of Weak Signals

To further understand the value of weak signals, we explored two aspects of weak signals: their predictive power on default likelihood and their relationships with strong signals. Table 10 reports the importance/power of each data domain in explaining default likelihood. We employed a single-domain explainability approach, which uses features from only one data domain to train the AI model each time. The performance reflects the highest potential predictive power of each data domain. The results suggest that weak signals contain reasonable predictive power, close to that contained by strong signals. This predictive power is the foundation for weak signals' value in improving prediction accuracy.

To assess the relationship between weak signals and strong signals in predicting default, we focused on correlation scores. However, we note that within each data domain there can be dozens or hundreds of features. Therefore, we first applied a principal-component analysis (PCA) transformation and used the primary principal component to capture the maximum variation in all features from the same domain. We then used the primary component from each data domain to produce a correlation table. Table 11 shows the results. Strong signals are correlated mainly among themselves, but weak signals are not highly correlated to strong signals. An exception is that social security fund data and provident fund data are highly correlated with each other, and both are highly correlated to identity information and income asset information. However, they are not highly correlated with


other strong signals. In-app financial behavioral data and in-app nonfinancial behavioral data are highly correlated, but neither of them is correlated to any other data domain, including both strong signals and other weak signals. To summarize, Tables 10 and 11 collectively support that weak signals provide valuable additional information beyond strong signals in predicting creditworthiness.


Heterogeneous Impacts and Statistical Discrimination

Although the adoption of the AI model increases the average approval rate for the underserved population, it may have unequal effects across the underserved population. It is also possible for the AI model to hurt certain subgroups in the underserved population. We first investigated the heterogeneous impacts of the AI model on five subgroups categorized based on the five criteria of the underserved population. We reran the basic DID model within each subsample and report the coefficients of "the initial launch of the AI model" on the approval rate and default rate in Table 12. The AI model increases the approval rate for four subgroups (all except applicants without permanent residence) and reduces the default rate for all five subgroups. The results indicate that the AI model largely improves financial inclusion across the underserved population (except those with no permanent residence) and helps banks better understand the creditworthiness of the underserved population.

As one key engine of the AI model is the use of weak signals, we further analyzed the impact of the AI model on subgroups with incomplete or missing weak signals. We generated dummy variables to represent the status of missing data in each data domain and interacted them with "the initial launch of the AI model" in Equation (1) to investigate how the impact of the AI model on the approval rate varies with missing data. Because in-app financial behavioral data and in-app nonfinancial behavioral data come from the same data source, we combined them to generate one dummy variable. We also generated one dummy variable to represent cases in which all weak signals were missing. Table 13 reports the results.

The negative coefficients of the interaction terms indicate that missing weak signals generally reduce the impact of the AI model on the approval rate. However, because the main effect of the AI model is positive for the underserved population, subgroups with missing data are still better off, and the absolute impact of the AI model on the approval rate remains positive and significant. One potential reason is that all data domains of weak signals help the AI model better understand the creditworthiness of the underserved population, and most of the time they help correct the underestimated credit score generated by the traditional model. When one data domain is missing, the other data domains can still provide helpful information to the AI model.

The ability of the AI model to enhance financial inclusion for the underserved population comes from its improved prediction accuracy, which in turn comes from the use of weak signals and advanced algorithms. As a result, the AI model reduces reliance on strong signals, which often favor the regular population and disadvantage the underserved population. To further investigate the AI model's ability to reduce statistical discrimination, we focused on three strong features that are considered important in traditional loan evaluation: urban household designation, self-employed status, and the availability of credit history. These three features are also the key differentiators between the regular and underserved populations. We posited that the importance of these features in underwriting would be reduced after the launch of the AI model. To assess this proposition, we reran Equation (1) by including interaction terms between "the initial launch of the AI model" and these three features. The coefficients of these interaction terms indicate how the approval likelihood changes for each applicant type. Table 14 reports the results.

Table 14 shows that applicants who were not self-employed, were living in urban areas, and had a credit history had a higher likelihood of being approved before the adoption of the AI model. After the adoption of the AI model, their advantages shrank significantly within both the regular and underserved populations, as indicated by the fact that the coefficients of these interaction terms all carry signs opposite to the corresponding main effects. This effect is even more substantial for the underserved population: the approval advantage/preference shrank by 47.1% (i.e., 0.008/-0.017) for employed applicants, 52.0% (i.e., -0.013/0.025) for applicants living in urban areas, and 27.3% (i.e., -0.003/0.011) for applicants with a credit history. Although applicants who were self-employed, living in nonurban areas, and lacking a credit history were still treated unfavorably compared with applicants who were employed, living in urban areas, and holding a credit history, the disadvantage they faced was reduced significantly by the adoption of the AI model.


Robustness Checks

One prerequisite for a valid DID strategy is the parallel trends assumption: the treatment and control groups should follow similar trends on key dependent variables, which enhances the likelihood that the post-treatment change comes from the treatment itself. We tested this assumption using a leads/lags model (relative time model), as shown by Equation (3).

Y_ijt = β_0 + Σ_{τ=−5}^{−2} β_{1τ} AI Model_jt^(τ) + Σ_{τ=0}^{+5} β_{1τ} AI Model_jt^(τ) + β_2 Applicant Characteristics_i + β_3 Model Updates_jt + Time_t + Product_j + ε_ijt    (3)


Table 12. Heterogeneous Impacts of the AI Model on Five Underserved Subgroups


Approval (1/0) Default (1/0)
Total assets < 80,000 CNY 0.103*** (0.006) -0.008*** (0.001)
Individual income < 20,000 CNY 0.136*** (0.016) -0.010*** (0.001)
No permanent residence -0.005 (0.004) -0.003*** (0.001)
Long-time unemployed 0.094*** (0.003) -0.007*** (0.001)
Government benefits 0.154*** (0.014) -0.009*** (0.002)
Note: Each coefficient comes from a separate regression using Equations (1) and (2) and a subsample of the underserved population. For
example, Row 1 uses applicants with a total asset below 80,000 CNY in the underserved population.

Table 13. Heterogeneous Impacts of the AI Model on Subgroups with Missing Weak Signals
Dependent variable: Approval (1/0)
Sample: Columns (1)-(4) Regular population; Columns (5)-(8) Underserved population. Each column adds one interaction term.
AI model × Missing social security fund data | (1) -0.003*** (0.001) | (5) -0.005*** (0.002)
AI model × Missing provident fund data | (2) -0.002* (0.001) | (6) -0.020*** (0.008)
AI model × Missing app usage data | (3) 0.002 (0.002) | (7) -0.005*** (0.001)
AI model × Missing all weak signals | (4) -0.004*** (0.001) | (8) -0.080** (0.027)
The initial launch of the AI model | (1)-(4): -0.008 (0.007), -0.008 (0.006), -0.007 (0.006), -0.008 (0.006) | (5)-(8): 0.132*** (0.004), 0.141*** (0.004), 0.132*** (0.003), 0.131*** (0.007)
Other controls, time fixed effects, and product fixed effects | Included in all columns
Adjusted R2 | (1)-(4): 0.049, 0.051, 0.047, 0.052 | (5)-(8): 0.028, 0.027, 0.028, 0.025
No. of observations | (1)-(4): 7,927,519 each | (5)-(8): 1,372,588 each
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.
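The missing-data dummies and interactions behind Table 13 are mechanical to construct: one indicator per weak-signal domain (with the two in-app domains merged, as noted above), each multiplied by the post-launch indicator. A plain-Python sketch with invented field names:

```python
# Each applicant record flags which weak-signal domains are absent (None = missing).
applicant = {"social_security": None, "provident_fund": 1520.0, "app_usage": None}

def missing_dummies(record, post_launch):
    """Missing-data dummies plus their interactions with the AI-launch indicator."""
    dummies = {f"missing_{k}": int(v is None) for k, v in record.items()}
    dummies["missing_all_weak"] = int(all(v is None for v in record.values()))
    # Interaction terms, e.g., "AI model x Missing social security fund data".
    interactions = {f"ai_x_{k}": post_launch * d for k, d in dummies.items()}
    return {**dummies, **interactions}

row = missing_dummies(applicant, post_launch=1)
print(row["ai_x_missing_social_security"])  # 1
print(row["missing_all_weak"])              # 0: provident fund data are present
```

Before the launch (post_launch = 0), every interaction term is zero, so the interaction coefficients isolate how missingness changes the AI model's effect rather than missingness per se.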

Table 14. The Impacts of the AI Model on Approval Preference: Heterogeneity Analysis Based on Three Distinction Features
Dependent variable: Approval (1/0)
Sample: Columns (1)-(3) Regular population; Columns (4)-(6) Underserved population. Each column adds one feature and its interaction with the AI launch.
AI model × Self-employed | (1) 0.006*** (0.002) | (4) 0.008*** (0.001)
AI model × Urban household | (2) -0.001* (0.001) | (5) -0.013*** (0.002)
AI model × Credit history | (3) -0.000 (0.000) | (6) -0.003*** (0.000)
Self-employed (1/0) | (1) -0.010*** (0.002) | (4) -0.017*** (0.005)
Urban household (1/0) | (2) 0.021*** (0.007) | (5) 0.025*** (0.009)
Has credit history (1/0) | (3) 0.013*** (0.004) | (6) 0.011*** (0.000)
The initial launch of the AI model | (1)-(3): -0.007 (0.006) in each column | (4)-(6): 0.131*** (0.003), 0.132*** (0.003), 0.131*** (0.003)
Other controls, time fixed effects, and product fixed effects | Included in all columns
Adjusted R2 | (1)-(3): 0.062, 0.058, 0.065 | (4)-(6): 0.030, 0.031, 0.031
No. of observations | (1)-(3): 7,927,519 each | (4)-(6): 1,372,588 each
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.
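The shrinkage percentages quoted in the text are simply each interaction coefficient as a share of the corresponding main effect, taken from the underserved-population columns of Table 14:

```python
# (interaction coefficient, main-effect coefficient) from Table 14, underserved sample.
effects = {
    "self-employed":   (0.008, -0.017),
    "urban household": (-0.013, 0.025),
    "credit history":  (-0.003, 0.011),
}

for feature, (interaction, main) in effects.items():
    shrinkage = abs(interaction) / abs(main)  # share of the pre-AI advantage offset
    print(f"{feature}: {shrinkage:.1%}")
```

Running this reproduces the 47.1%, 52.0%, and 27.3% figures cited in the text.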


Table 15. The Impacts of the AI Model on the Approval Rate and Lending Performance: Relative Time Model
Sample: Columns (1)-(3) Regular population; Columns (4)-(6) Underserved population
Dependent variables: (1)/(4) Approval (1/0); (2)/(5) Default (1/0); (3)/(6) Utilization level
Relative time -5 or earlier | 0.009 (0.015) | 0.006 (0.005) | -0.049*** (0.009) | 0.001 (0.003) | 0.008 (0.006) | -0.040*** (0.008)
Relative time -4 | 0.009 (0.010) | -0.004 (0.003) | -0.003 (0.007) | -0.015 (0.016) | -0.002 (0.004) | -0.007 (0.012)
Relative time -3 | 0.010 (0.015) | -0.003 (0.003) | -0.008 (0.012) | -0.013 (0.008) | 0.000 (0.002) | -0.010 (0.018)
Relative time -2 | 0.003 (0.010) | -0.001 (0.001) | 0.003 (0.002) | 0.003 (0.008) | -0.005 (0.003) | 0.003 (0.004)
Relative time -1 | Baseline
Relative time 0 | -0.013* (0.009) | -0.005 (0.004) | 0.027*** (0.004) | 0.010 (0.009) | -0.007** (0.004) | 0.013** (0.005)
Relative time 1 | 0.008 (0.013) | -0.001 (0.002) | 0.037*** (0.013) | -0.011 (0.007) | -0.008 (0.007) | 0.028*** (0.007)
Relative time 2 | 0.030*** (0.003) | 0.004*** (0.002) | -0.044*** (0.002) | 0.062*** (0.008) | -0.007*** (0.001) | -0.024*** (0.004)
Relative time 3 | 0.016*** (0.004) | 0.001 (0.001) | -0.052*** (0.025) | 0.088*** (0.029) | -0.009*** (0.003) | 0.013 (0.012)
Relative time 4 | -0.004* (0.003) | 0.000 (0.001) | 0.053*** (0.019) | 0.133*** (0.015) | -0.008*** (0.003) | 0.023 (0.017)
Relative time 5 or later | -0.009 (0.007) | -0.004** (0.001) | 0.051*** (0.014) | 0.144*** (0.013) | -0.010*** (0.001) | 0.053*** (0.008)
Other controls, time fixed effects, and product fixed effects | Included in all columns
Adjusted R2 | 0.053 | 0.017 | 0.129 | 0.025 | 0.011 | 0.155
No. of observations | 7,927,519 | 4,016,328 | 4,016,328 | 1,372,588 | 352,297 | 352,297
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors in parentheses are clustered at the application day level.

Equation (3) extends Equation (1) with one change: it splits the AI Model_jt variable into multiple relative time dummies based on how far the current observation is from the adoption time of the AI model, measured in weeks (we used week dummies rather than day dummies to smooth the trend). These relative time dummies are always 0 for applications of the control product, and exactly one of them is 1 for each application of the treated product. When τ is negative, the observation is τ weeks before the adoption date. We collapsed all observations five or more weeks before adoption into the -5 relative time dummy and removed the -1 relative time dummy to serve as the baseline. When τ is nonnegative, the observation is τ weeks after the adoption date. We likewise collapsed all observations five or more weeks after adoption into the +5 relative time dummy. These nonnegative relative time dummies estimate how soon the AI model takes effect and whether the effect persists. Table 15 reports the results.

We detected no heterogeneous trends or significant differences between the treated product and the control product for the approval rate and default rate before the adoption of the AI model. The utilization level showed one significant difference between the treated and control products, but it was far from the adoption date (i.e., 5+ weeks) and is unlikely to have confounded our findings. These results support the parallel trends assumption and alleviate the concern that the selection of the treated product, rather than the adoption of the AI model, leads to the enhancement in financial inclusion.

Although the control product appears to be a good counterfactual based on the results of the relative time model, there are other concerns about self-selection into the AI model at different levels and about the confounding effect of the focal bank's other actions. We summarize these concerns in Table 16 and address each of them. In a nutshell, our main findings hold across multiple robustness checks.
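The relative-time dummy construction described above (weekly bins, collapsing at ±5 weeks, week -1 omitted as the baseline, all dummies zero for the control product) can be sketched with the standard library; the function and field names are ours, not the paper's.

```python
from datetime import date

ADOPTION = date(2021, 2, 4)  # launch of the AI model for the treated product

def relative_time_dummies(application_date, product):
    """Leads/lags dummies for Equation (3): weekly relative time clipped to [-5, +5],
    week -1 omitted as the baseline, and all dummies 0 for the control product."""
    taus = [t for t in range(-5, 6) if t != -1]
    if product != "treated":
        return {t: 0 for t in taus}            # always 0 for control applications
    week = (application_date - ADOPTION).days // 7
    week = max(-5, min(5, week))               # collapse 5+ weeks into the endpoints
    return {t: int(t == week) for t in taus}

d = relative_time_dummies(date(2021, 3, 20), "treated")
print(d[5])  # 1: 44 days after adoption collapses into the +5 dummy
```

An application from the week just before adoption (the -1 baseline) receives all-zero dummies, so its outcome is absorbed by the intercept, exactly as the omitted-category logic of Equation (3) requires.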


Table 16. Robustness Checks


Question domain | Concern to solve | Solution | Results
Self-selection at the product level | The business team of the treated product may have chosen to implement the AI model because it would be easier to include their underserved applicants. | We compared the descriptive statistics of the underserved populations of the two products to check their differences. | Table A1
Self-selection at the applicant level | The applicant pool may have changed before and after the adoption of the AI model for the treated product. | We conducted a subsample analysis using a shorter time period of data to focus on a consistent applicant pool. | Table A6
General self-selection | The applicants for the two products might have been different. | We built a matched sample to focus on similar applicants. | Table A7
Manual endeavors | Manual promotion for and evaluation of the treated product might have happened during the time period. | We conducted a subsample analysis using online applications only (no manual intervention at all). | Table A9
Lagged effects of manual endeavors | Manual efforts before Oct 1, 2020, might have lasted longer and confounded the impact of AI. | We conducted a heterogeneous analysis based on bank branches (i.e., a proxy for the level of manual efforts). | Table A10
Unobserved actions/events | The focal bank might have done something else besides using the AI model to promote financial inclusion. | We conducted a placebo test of fake treatments using the pre-treatment data to rule out spurious connections. | Table A11

Compliance and Generalizability

The generalizability of our findings is important, given that using AI models and privacy-sensitive data is challenging from both a legal perspective and a technical perspective.12 As both strong and weak signals are often privacy sensitive, their use is strictly regulated by data privacy protection laws in different countries. These laws specify customers' data rights and regulate firms' data collection, data disclosure, data sharing, and data protection practices. Different countries have developed various laws and regulations related to the use of AI models and personal data, so it is important to understand their details and the practices that comply with them. Although the focal bank operates in China, it developed the AI model with multiple potential regulations in mind because China often moves quickly on new regulations. Table 17 summarizes the focal bank's actions and solutions to regulatory requirements. We also complemented Table 17 with potential solutions from the literature, which can serve as a roadmap for practitioners looking for solutions to specific regulatory issues. With the compliance issue in mind, businesses and financial institutions have developed novel ways to incorporate weak signals into their assessments of customer creditworthiness, as reported in Table A12. For example, NeuroID, a company using applicants' online behavior during loan or insurance application processes to predict their default rate and fraud risk, has experienced three-digit growth in recent years.13

Another challenge to generalizability concerns technical capability and data availability. Not all financial institutions have the capability to build a sophisticated AI model or the resources to access and process weak signals. We conducted a simulation analysis to address this generalizability concern by investigating the potential impacts of simplified AI models. We trained different versions of simplified AI models and assessed their performance. The first version used one LightGBM model/learner only (instead of four individual learners and one ensemble learner), and the second to fifth versions each used one data domain of weak signals only (instead of all four data domains). We reran Equations (1) and (2) by assuming these AI models were implemented at the adoption date and report the results in Table 18. The results show that although the impacts of these simplified AI models are weaker, they can still enhance financial inclusion to some extent. The impacts on the approval rate for the underserved population ranged from 3.3% to 8.6%, whereas the impacts on the default rate ranged from -0.2% to -0.4%. In conclusion, even simple AI models can still enhance financial inclusion in a sizeable and significant way.

12. It is also challenging from an ethical perspective, but that is beyond the scope of our paper.

13. [Link]

Table 17. Compliance on Consumer Lending and Data Privacy

| Issue domains | Laws and regulations | Requirements | Solutions used in this project | Potential solutions |
| Data collection permission | CCPA, GDPR | Firms should ask for permission to access and use consumer data. | Permission was explained and asked for in each loan application. | Gopal et al. (2023) |
| Data privacy | CCPA, GDPR, GLBA | Firms should handle private and sensitive data carefully (e.g., in sharing with third parties). | (1) No data sharing with third parties; (2) feature vectorization was used. | Kwon & Johnson (2018), Gopal et al. (2023), Macha et al. (2023) |
| Protected features | ECOA | Protected features cannot be used in credit scoring. | No protected feature was used, and a post-training feature review was conducted. | Langenbucher (2020), Hurlin et al. (2022) |
| Lending outcome | ECOA | Lending outcomes cannot discriminate against marginalized groups. | (1) Extended sample of the whole population was used in model training; (2) manual lending outcome review after adopting the AI model. | Zhang (2018), Fu et al. (2021), Kallus et al. (2022) |
| Data accuracy | GDPR | Firms should help consumers make sure their data are correct. | N.A. | Cai & Zhu (2015), Talha et al. (2020) |
| Process transparency | GDPR | Firms should explain how data and models were used in credit scoring. | Multilayer AI explainability: data domain level and feature level. | Dubber et al. (2020), Bucker et al. (2022) |

Note: CCPA = California Consumer Privacy Act; ECOA = Equal Credit Opportunity Act; GDPR = General Data Protection Regulation; GLBA = Gramm-Leach-Bliley Act.

Table 18. Impacts of Simulated Simplified AI Models on Financial Inclusion

| Model | Regular population: Approval (1/0) | Regular population: Default (1/0) | Underserved population: Approval (1/0) | Underserved population: Default (1/0) |
| Actual AI model | -0.006 (0.007) | -0.008*** (0.003) | 0.150*** (0.012) | -0.009*** (0.002) |
| Simulated AI model with weak algorithm (one learner only) | 0.003 (0.003) | -0.003 (0.003) | 0.086*** (0.015) | -0.004*** (0.001) |
| Simulated AI model with social security fund data only | -0.002 (0.002) | -0.002*** (0.001) | 0.033*** (0.008) | -0.002*** (0.000) |
| Simulated AI model with provident fund data only | 0.008*** (0.003) | -0.001 (0.001) | 0.058*** (0.012) | -0.002 (0.002) |
| Simulated AI model with in-app financial behavioral data only | -0.004*** (0.001) | -0.001 (0.001) | 0.052*** (0.008) | -0.003*** (0.001) |
| Simulated AI model with in-app nonfinancial behavioral data only | -0.007** (0.006) | -0.004*** (0.001) | 0.058*** (0.008) | -0.004*** (0.001) |

Note: Standard errors are in parentheses.

Discussion and Conclusion

Financial inclusion is essential for members of the underserved population, who are often from rural areas, are self-employed, and lack a credit history. Enhancing financial inclusion is a major task for promoting social justice because access to capital may influence opportunities for education, healthcare, employment, housing, etc. Common solutions, such as opening more branches in rural areas or reducing requirements for the underserved population, can help enhance financial inclusion but often come with high operational costs or default risk. We examined the financial inclusion issue from another perspective by investigating the role of modern information technologies. More specifically, we investigated the impacts of an AI-enabled credit scoring model on the underserved population's approval rate, default rate, and utilization level. We collaborated with a regional bank in China and took advantage of a rare event in which the focal bank deployed an AI model to work with its traditional model to evaluate one of its personal loan products. This setting allowed us to identify a similar personal loan product during the same period as a control group and apply a DID strategy to estimate the impacts of the AI model.
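In sketch form, the DID logic compares the change in outcomes for the treated product with the change for the control product over the same window. The code below is a minimal, self-contained illustration with made-up approval indicators; the paper's Equations (1) and (2) are regressions with controls, which this deliberately strips down to a comparison of group means.

```python
# Minimal difference-in-differences (DID) sketch with hypothetical toy
# data. The paper's actual estimation uses regressions with controls;
# this reduces the idea to four group means.

def mean(xs):
    return sum(xs) / len(xs)

def did_estimate(outcomes):
    """outcomes maps (group, period) -> list of outcome values (e.g.,
    approval indicators), with group in {"treated", "control"} and
    period in {"pre", "post"}."""
    treated_change = (mean(outcomes[("treated", "post")])
                      - mean(outcomes[("treated", "pre")]))
    control_change = (mean(outcomes[("control", "post")])
                      - mean(outcomes[("control", "pre")]))
    return treated_change - control_change

toy = {
    ("treated", "pre"):  [0, 0, 1, 0],  # 25% approval before adoption
    ("treated", "post"): [1, 0, 1, 1],  # 75% after adoption
    ("control", "pre"):  [0, 1, 0, 0],  # 25% on the control product
    ("control", "post"): [0, 1, 1, 0],  # 50% (the common time trend)
}
print(did_estimate(toy))  # prints 0.25: the effect net of the trend
```

Subtracting the control product's change nets out shocks common to both products, which is exactly what the control group is for.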


Our findings suggest that the AI model enhances financial inclusion without sacrificing loan performance. The AI model significantly increases the approval rate for the underserved population while decreasing the default rate and increasing the utilization level for the whole population. These results are encouraging because they help solve the persistent dilemma of how to balance financial inclusion and loan performance. Our study solves this dilemma using an AI model to improve financial inclusion without hurting lending performance. This is feasible because the AI model can leverage weak signals and advanced algorithms, thus reducing statistical discrimination against the underserved population and improving their access to capital.

Our study also delves into the underlying mechanisms of why AI models can simultaneously increase the approval rate and reduce the default rate for the underserved population. The direct reason is that the AI model improves the prediction accuracy of the evaluation/underwriting process. Weak signals and advanced algorithms are two engines for improving prediction accuracy. Human experts and advanced techniques can generate novel and meaningful features from weak signals, which are then used by machine learning algorithms to predict creditworthiness. Our empirical analysis finds that both weak signals and advanced algorithms contribute to prediction accuracy and that the combination of the two brings the largest improvement, especially for the underserved population.

Our study contributes to the social justice literature by investigating one of its foundations—financial inclusion. We focused on one potential solution enabled by modern AI technologies and found that AI models can enhance financial inclusion without hurting lending performance. Our study also contributes to the AI and consumer lending literature by revealing the impacts of an AI model on financial inclusion. Although previous studies have documented the value of machine learning models and weak signals in improving credit scoring models, few studies have considered their impact on financial inclusion (Di Giuseppe, 2021; Liang et al., 2018; Liu, 2022). Leveraging a unique opportunity arising when a financial institution introduced an AI model to one of its personal loan products, we were able to use a DID identification strategy to assess the impacts of the AI model on financial inclusion for the underserved population.

While our analysis demonstrates the value of AI models in reducing statistical discrimination against the underserved population and improving their financial inclusion, it is important to recognize the limitations of AI models. First, AI models cannot address structural discrimination, which can cause inequity in wealth distribution, access to education, or career advancement and in turn affects applicants' creditworthiness. Such structural discrimination would require systematic policy interventions. Second, our analysis shows that the effect of the AI model on financial inclusion is weaker for subgroups with missing weak signals and that less sophisticated AI models have a weaker impact on financial inclusion. Future studies could seek to better understand the heterogeneous impacts of various AI models. It would also be interesting for future studies to investigate the impacts of AI models on other financial products. Third, the "black-box" problem of AI models and the use of personal data should be handled better by financial institutions. Policymakers should work on actionable AI regulations to deal with data privacy and financial exclusion issues.

Acknowledgments

We thank Dr. Mingjie Zhu, the chairman and CEO of CraiditX, for his unwavering support of our six-year-long project, including the provision of manpower, materials, and other resources. Our gratitude also extends to the co-editors of the special issue, Min-Seok Pang, Atreyi Kankanhalli, Margunn Aanestad, Sudha Ram, and Likoebe M. Maruping, as well as to the associate editor, Jennifer Jie Zhang, the transparency editor, and the reviewers for their constructive and developmental feedback throughout the review process. Chunxiao Li expresses gratitude for the support from the National Natural Science Foundation of China (NSFC) [Grant 72121001]. Hongchang Wang is the corresponding author of this paper.

References

Abrahams, C. R., & Zhang, M. (2008). Fair lending compliance: Intelligence and implications for credit risk management. John Wiley & Sons.
Agarwal, S., Alok, S., Ghosh, P., & Gupta, S. (2020). Financial inclusion and alternate credit scoring for the millennials: Role of big data and machine learning in fintech. SSRN. [Link]
Armstrong, C., Craig, B., Jackson, W. E., & Thomson, J. B. (2013). The moderating influence of financial market development on the relationship between loan guarantees for SMEs and local market employment rates. Journal of Small Business Management, 52(1), 126-140. [Link]
Autor, D. (2014). Polanyi's paradox and the shape of employment growth (NBER Working Paper No. 20485). National Bureau of Economic Research. [Link]
Awaworyi-Churchill, S. (2019). Microfinance financial sustainability and outreach: Is there a trade-off? Empirical Economics, 59(3), 1329-1350. [Link]
Bao, Z., & Huang, D. (2021). Shadow banking in a crisis: Evidence from Fintech during COVID-19. Journal of Financial and Quantitative Analysis, 56(7), 2320-2355. [Link]
Bartlett, R., Morse, A., Stanton, R., & Wallace, N. (2022). Consumer-lending discrimination in the fintech era. Journal of Financial Economics, 143(1), 30-56. [Link]
Brown, J. R., Cookson, J. A., & Heimer, R. Z. (2019). Growing up without finance. Journal of Financial Economics, 134(3), 591-616. [Link]


Bucker, M., Szepannek, G., Gosiewska, A., & Biecek, P. (2022). Transparency, auditability, and explainability of machine learning models in credit scoring. Journal of the Operational Research Society, 73(1), 70-90. [Link]
Burtch, G., & Chan, J. (2019). Investigating the relationship between medical crowdfunding and personal bankruptcy in the United States: Evidence of a digital divide. MIS Quarterly, 43(1), 237-262. [Link]
Cai, L., & Zhu, Y. (2015). The challenges of data quality and data quality assessment in the big data era. Data Science Journal, 14(2), 1-10. [Link]
Chen, Z., Liu, Y. J., Meng, J., & Wang, Z. (2023). What's in a face? An experiment on facial information and loan-approval decision. Management Science, 69(4), 2263-2283. [Link]
Cowgill, B., Dell'Acqua, F., Deng, S., Hsu, D., Verma, N., & Chaintreau, A. (2020). Biased programmers? Or biased data? A field experiment in operationalizing AI ethics. SSRN. [Link]
Cull, R., Demirgüç-Kunt, A., & Morduch, J. (2011). Microfinance trade-offs: Regulation, competition and financing. In B. Armendáriz & M. Labie (Eds.), The handbook of microfinance (pp. 141-157). World Scientific. [Link]
Di Giuseppe, D. (2021). Credit scoring model using machine learning (Working paper). [Link]
Dobbie, W., Liberman, A., Paravisini, D., & Pathania, V. (2021). Measuring bias in consumer lending. The Review of Economic Studies, 88(6), 2799-2832. [Link]
Dubber, M. D., Pasquale, F., & Das, S. (2020). The Oxford handbook of ethics of AI. Oxford University Press.
Engel, K. C., & McCoy, P. A. (2001). A tale of three markets: The law and economics of predatory lending. SSRN. [Link]
Fang, H., & Moro, A. (2011). Theories of statistical discrimination and affirmative action: A survey. In J. Benhabib, A. Bisin, & M. O. Jackson (Eds.), Handbook of social economics (pp. 133-200). North-Holland. [Link]
Fu, R., Huang, Y., & Singh, P. V. (2021). Crowds, lending, machine, and bias. Information Systems Research, 32(1), 72-92. [Link]
Fügener, A., Grahl, J., Gupta, A., & Ketter, W. (2022). Cognitive challenges in human–artificial intelligence collaboration: Investigating the path toward productive delegation. Information Systems Research, 33(2), 678-696. [Link]
Gao, H., Kumar, S., Tan, Y. (Ricky), & Zhao, H. (2022). Socialize more, pay less: Randomized field experiments on social pricing. Information Systems Research, 33(3), 935-953. [Link]
Garfinkel, S., Matthews, J., Shapiro, S. S., & Smith, J. M. (2017). Toward algorithmic transparency and accountability. Communications of the ACM, 60(9), Article 5. [Link]
Gianfrancesco, M. A., Tamang, S., Yazdany, J., & Schmajuk, G. (2018). Potential biases in machine learning algorithms using electronic health record data. JAMA Internal Medicine, 178(11), 1544. [Link]
Godbillon-Camus, B., & Godlewski, C. J. (2005). Credit risk management in banks: Hard information, soft information and manipulation. SSRN. [Link]
Gomber, P., Kauffman, R. J., Parker, C., & Weber, B. W. (2018). On the Fintech revolution: Interpreting the forces of innovation, disruption, and transformation in financial services. Journal of Management Information Systems, 35(1), 220-265. [Link]
Gopal, R. D., Hidaji, H., Kutlu, S. N., Patterson, R. A., & Yaraghi, N. (2023). Law, economics, and privacy: Implications of government policies on website and third-party information sharing. Information Systems Research, 34(4), 1375-1397. [Link]
Gross, J. P. K., Cekic, O., Hossler, D., & Hillman, N. (2010). What matters in student loan default: A review of the research literature. Journal of Student Financial Aid, 39(1), Article 2. [Link]
Gunnarsson, B. R., vanden Broucke, S., Baesens, B., Óskarsdóttir, M., & Lemahieu, W. (2021). Deep learning for credit scoring: Do or don't? European Journal of Operational Research, 295(1), 292-305. [Link]
Hao, K., & Stray, J. (2019). Can you make AI fairer than a judge? Play our courtroom algorithm game. MIT Technology Review. [Link]
Hiller, J. S. (2020). Fairness in the eyes of the beholder: AI, fairness, and alternative credit scoring. West Virginia Law Review, 123(3), 907-935. [Link]
Hou, J., Zhang, J., & Zhang, K. (2023). Pictures that are worth a thousand donations: How emotions in project images drive the success of online charity fundraising campaigns? An image design perspective. MIS Quarterly, 47(2), 535-584. [Link]
Hurley, M., & Adebayo, J. (2016). Credit scoring in the era of big data. Yale Journal of Law & Technology, 18, 148-202.
Hurlin, C., Perignon, C., & Saurin, S. (2022). The fairness of credit scoring models. SSRN. [Link]
Iyer, R., Khwaja, A. I., Luttmer, E. F., & Shue, K. (2016). Screening peers softly: Inferring the quality of small borrowers. Management Science, 62(6), 1554-1577. [Link]
Jagtiani, J., & Lemieux, C. (2019). The roles of alternative data and machine learning in fintech lending: Evidence from the LendingClub consumer platform. Financial Management, 48(4), 1009-1029. [Link]
Jappelli, T. (1990). Who is credit constrained in the U.S. economy? The Quarterly Journal of Economics, 105(1), 219-234. [Link]
Johnson, K. N. (2019). Examining the use of alternative data in underwriting and credit scoring to expand access to credit (Tulane Public Law Research Paper 19-7). SSRN. [Link]
Kallus, N., Mao, X., & Zhou, A. (2020). Assessing algorithmic fairness with unobserved protected class using data combination. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. [Link]
Kang, L., Jiang, Q., & Tan, C.-H. (2017). Remarkable advocates: An investigation of geographic distance and social capital for


crowdfunding. Information & Management, 54(3), 336-348. [Link]
Kankanhalli, A., Charalabidis, Y., & Mellouli, S. (2019). IoT and AI for smart government: A research agenda. Government Information Quarterly, 36(2), 304-309. [Link]
Kim, D. (2019). The importance of detailed patterns of herding behaviour in a P2P lending market. Applied Economics Letters, 27(2), 127-130. [Link]
Kwon, J., & Johnson, M. (2018). Meaningful healthcare security: Does meaningful-use attestation improve information security performance? MIS Quarterly, 42(4), 1043-1067. [Link]
Langenbucher, K. (2020). Responsible AI-based credit scoring: A legal framework. European Business Law Review, 31(4), 527-572. [Link]
Laouénan, M., & Rathelot, R. (2022). Can information reduce ethnic discrimination? Evidence from Airbnb. American Economic Journal: Applied Economics, 14(1), 107-132. [Link]
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Lee, G. M., Naughton, J. P., Zheng, X., & Zhou, D. (2020). Predicting litigation risk via machine learning. SSRN. [Link]
Leyshon, A., & Thrift, N. (1995). Geographies of financial exclusion: Financial abandonment in Britain and the United States. Transactions of the Institute of British Geographers, 20(3), 312-314. [Link]
Li, J., & Hu, J. (2019). Does university reputation matter? Evidence from peer-to-peer lending. Finance Research Letters, 31, 66-77. [Link]
Liang, F., Das, V., Kostyuk, N., & Hussain, M. M. (2018). Constructing a data-driven society: China's social credit system as a state surveillance infrastructure. Policy & Internet, 10(4), 415-453. [Link]
Liberti, J. M. (2018). Initiative, incentives, and soft information. Management Science, 64(8), 3714-3734. [Link]
Liberti, J. M., & Petersen, M. A. (2019). Information: Hard and soft. Review of Corporate Finance Studies, 8(1), 1-41. [Link]
Lin, M., Prabhala, N. R., & Viswanathan, S. (2013). Judging borrowers by the company they keep: Friendship networks and information asymmetry in online peer-to-peer lending. Management Science, 59(1), 17-35. [Link]
Liu, D., Brass, D. J., Lu, Y., & Chen, D. (2015). Friendship in online peer-to-peer lending: Pipes, prisms, and relational herding. MIS Quarterly, 39(3), 729-742. [Link]
Liu, M. (2022). Assessing human information processing in lending decisions: A machine learning approach. Journal of Accounting Research, 60(2), 607-651. [Link]
Loufield, E., Ferenzy, D., & Johnson, T. (2018). Accelerating financial inclusion with new data. Center for Financial Inclusion. [Link]
Lu, J., Lee, D. (DK), Kim, T. W., & Danks, D. (2019). Good explanation for algorithmic transparency. SSRN. [Link]
Lu, T., Zhang, Y., & Li, B. (2023). Profit vs. equality? The case of financial risk assessment and a new perspective of alternative data. MIS Quarterly, 47(4), 1517-1556. [Link]
Lu, Y., Gu, B., Ye, Q., & Sheng, Z. (2012). Social influence and defaults in peer-to-peer lending networks. In Proceedings of the 33rd International Conference on Information Systems. [Link]
Lusardi, A., & Scheresberg, C. de B. (2013). Financial literacy and high-cost borrowing in the United States (NBER Working Paper No. 18969). National Bureau of Economic Research. [Link]
Ma, L., Zhao, X., Zhou, Z., & Liu, Y. (2018). A new aspect on P2P online lending default prediction using meta-level phone usage data in China. Decision Support Systems, 111, 60-71. [Link]
Macha, M., Foutz, N., Li, B., & Ghose, A. (2023). Personalized privacy preservation in consumer mobile trajectories. Information Systems Research, 35(1), 249-271. [Link]
Martin, K. (2019). Designing ethical algorithms. MIS Quarterly Executive, 18(2), 129-142. [Link]
Mejia, J., & Parker, C. (2021). When transparency fails: Bias and financial incentives in ridesharing platforms. Management Science, 67(1), 166-184. [Link]
Mendonça, S., Cardoso, G., & Caraça, J. (2012). The strategic strength of weak signal analysis. Futures, 44(3), 218-228. [Link]
Mitlin, D. (2008). Urban poor funds: Development by the people for the people. IIED.
Monk, A., Prins, M., & Rook, D. (2019). Rethinking alternative data in institutional investment. The Journal of Financial Data Science, 1(1), 14-31. [Link]
Neumann, N., Tucker, C. E., Kaplan, L., Mislove, A., & Sapiezynski, P. (in press). Data deserts and black box bias: The impact of socio-economic status on consumer profiling. Management Science. Advance online publication. [Link]
Nowak, A., Ross, A., & Yencha, C. (2018). Small business borrowing and peer-to-peer lending: Evidence from lending club. Contemporary Economic Policy, 36(2), 318-336. [Link]
O'Neil, C. (2017). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown.
Ozler, S. (1992). Have commercial banks ignored history? (NBER Working Paper No. 3959). National Bureau of Economic Research. [Link]
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. [Link]
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. [Link]
Serrano-Cinca, C., & Gutiérrez-Nieto, B. (2016). The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending. Decision Support Systems, 89, 113-122. [Link]
Seibert, S. E., Kraimer, M. L., & Liden, R. C. (2001). A social capital theory of career success. Academy of Management Journal, 44(2), 219-237. [Link]


Talha, M., Elmarzouqi, N., & Kalam, A. (2020). Towards a powerful solution for data accuracy assessment in the big data context. International Journal of Advanced Computer Science and Applications, 11(2), 419-429. [Link]
Tantri, P. (2021). Fintech for the poor: Financial intermediation without discrimination. Review of Finance, 25(2), 561-593. [Link]
Teodorescu, M., Morse, L., Awwad, Y., & Kane, G. (2021). Failures of fairness in automation require a deeper understanding of human-ML augmentation. MIS Quarterly, 45(3), 1483-1500. [Link]
United Nations. (1995). Report of the world summit for social development. United Nations. [Link]
United Nations. (2006). Social justice in an open world: The role of the United Nations. [Link]
United Nations. (2015). Economic and social survey of Asia and the Pacific 2015: Making growth more inclusive for sustainable development. United Nations. [Link]
Wei, Y., Yildirim, P., Van den Bulte, C., & Dellarocas, C. (2016). Credit scoring with social network data. Marketing Science, 35(2), 234-258. [Link]
World Bank. (2010). Financial inclusion, poverty reduction and economic growth. [Link]
World Bank. (2023). Financial inclusion overview. [Link]
Xu, J. J., & Chau, M. (2018). Cheap talk? The impact of lender-borrower communication on peer-to-peer lending outcomes. Journal of Management Information Systems, 35(1), 53-85. [Link]
Yawe, B., & Prabhu, J. (2015). Innovation and financial inclusion: A review of the literature. Journal of Payments Strategy & Systems, 9(3), 215-228. [Link]
Zech, J. R., Badgeley, M. A., Liu, M., Costa, A. B., Titano, J. J., & Oermann, E. K. (2018). Confounding variables can degrade generalization performance of radiological deep learning models. arXiv. [Link]
Zhang, S., Mehta, N., Singh, P. V., & Srinivasan, K. (2021). Frontiers: Can an artificial intelligence algorithm mitigate racial economic inequality? An analysis in the context of Airbnb. Marketing Science, 40(5), 813-820. [Link]
Zhang, Y. (2018). Assessing fair lending risks using race/ethnicity proxies. Management Science, 64(1), 178-197. [Link]
Zhu, N. (2011). Household consumption and personal bankruptcy. The Journal of Legal Studies, 40(1), 1-37. [Link]

About the Authors

Chunxiao Li is an associate professor at the School of Sci-tech Business and the School of Management at the University of Science and Technology of China. Her research primarily focuses on the areas of fintech and AI ethics, with a particular emphasis on user and machine behavior. She is dedicated to exploring fairness and interpretability in AI systems, the elimination of bias, and the identification of irrational behavior. Her research has been published in journals such as Journal of the Association for Information Systems and IEEE Transactions on Knowledge and Data Engineering. She has received two best paper runner-up awards. She received her Ph.D. degree from the W. P. Carey School of Business at Arizona State University and has previously worked at the Antai College of Economics and Management, Shanghai Jiao Tong University. ORCiD: 0000-0002-6946-9726

Hongchang Wang is an assistant professor in information systems at the Naveen Jindal School of Management at the University of Texas at Dallas. His research investigates the economic and social impacts of information systems, financial technologies, artificial intelligence, and digital platforms. Implications of his research cover areas and industries such as enterprise systems (e.g., ERP, SCM, CRM), online lending (e.g., Lending Club, [Link], traditional banks), online accommodation (e.g., Airbnb), and blockchain applications (e.g., NFT). His research has appeared in Management Science and other outlets. He has received three Best Paper runner-up awards and a Best Reviewer award. ORCiD: 0000-0002-8707-2810

Songtao Jiang is the senior scientist at CraiditX, Inc. His research primarily concentrates on the identification and mitigation of risks in transactions and financial lending. This critical work aids in the prevention of financial crimes such as money laundering and telecommunications fraud, and supports financial institutions in minimizing potential losses, thereby enhancing decision-making processes and business management strategies. He received his bachelor of science degree in finance from the Southwestern University of Finance and Economics in Chengdu, China, in 2014, followed by a master of science degree in financial risk management from the University of Leeds, UK, in 2015. ORCiD: 0000-0001-6860-6998

Bin Gu is Everett W. Lord Distinguished Faculty Scholar, professor, and department chair of information systems at the Questrom School of Business, Boston University. His research interests are in using information technologies and artificial intelligence to address information asymmetry and social inequity in business and society. His research has been published in Management Science, MIS Quarterly, Information Systems Research, and Journal of Management Information Systems, among others. He received his Ph.D. from the University of Pennsylvania. ORCiD: 0000-0002-0396-8899


Appendix
Table A1 shows the descriptive statistics of the applicants from the underserved population for the treated and control products. The application ratio and the features of these underserved applicants are very similar across the two products. Figure A1 describes the kernel of the AI model. Building upon the 10
data domains, the development team used various feature-generation techniques to create features within each data domain. For example, a
natural-language processing technique generated features from text messages and news content. A time-series technique generated features
from a sequence of provident fund deposits. This process led to four groups of features, which were used to train four individual learners
predicting default (multiple individual learners, compared to one learner, can better utilize all features and integrate related features). Each
learner was a LightGBM model, outperforming other state-of-the-art machine learning models in this context. Eventually, an ensemble learner
(also a LightGBM model) combined the results from these four individual learners and served as the final model to generate credit scores.

Table A1. Descriptive Statistics of the Underserved Applicants for Two Products

| | Treated product | Control product |
| Percent of underserved population among all applicants | 27.53% | 19.63% |
| Approval rate of the underserved population | 17.50% | 16.67% |
| Percent of urban households within the underserved population | 15.17% | 13.02% |
| Percent of self-employed within the underserved population | 82.72% | 79.67% |
| Percent with income data within the underserved population | 10.52% | 11.88% |
| Percent with credit history within the underserved population | 48.62% | 53.23% |

Note: We show the descriptive statistics of the underserved population applicants for the treated and control products.

Figure A1. Kernel of the AI Model


Developed by Microsoft, LightGBM, short for light gradient-boosting machine, represents an avant-garde gradient-boosting framework,
catering predominantly to ranking, classification, and regression tasks (Ke et al., 2017). LightGBM is based on decision tree algorithms and
gives a prediction model in the form of an ensemble of simple decision trees. LightGBM shares many advantages with XGBoost, including
sparse optimization, parallel training, multiple loss functions, regularization, bagging, and early stopping (Wikipedia 2023). Diverging from
XGBoost's level-wise tree growth strategy, LightGBM employs a leaf-wise tree growth strategy, expanding the leaf with the largest loss reduction first and skipping low-gain splits to speed up training. It also adopts histogram-based optimization for continuous features,
compressing the search space during node splits, and minimizing memory usage during training. Compared to frameworks like XGBoost and
CatBoost, LightGBM comes with enhanced computational efficiency and reduced memory consumption, maintaining commendable model
accuracy.
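To make the gradient-boosting idea concrete, the toy sketch below additively fits depth-1 regression trees (stumps) to residuals under squared-error loss. It is a minimal illustration of boosting in general, not LightGBM's implementation, which adds histogram-based splits, leaf-wise growth, feature bundling, regularization, and much more.

```python
# Toy gradient boosting with depth-1 trees (stumps) and squared-error
# loss. Illustrative only; see LightGBM for the production algorithm.

def fit_stump(xs, residuals):
    """Best single-threshold split on one feature, fit to residuals.
    Returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def boost(xs, ys, n_rounds=20, lr=0.5):
    """Each round fits a stump to the current residuals and adds a
    shrunken copy of it to the model (the essence of boosting)."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + lr * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]

    def predict(x):
        out = base
        for t, lv, rv in stumps:
            out += lr * (lv if x <= t else rv)
        return out
    return predict

# A noiseless step function is recovered almost exactly.
model = boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

Because each round fits only what the current model still gets wrong, the prediction error on this separable toy data shrinks geometrically with the number of rounds.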

Positioning itself at the forefront of financial analytics, LightGBM transcends traditional models, as evidenced by its commendable
achievements in academic research and real-world applications. In academic research, Ma et al. (2018) showed its efficacy in enhancing P2P
lending models, marking a tangible reduction in loan defaults. Sun et al. (2020) demonstrated LightGBM’s superior predictive accuracy in
forecasting cryptocurrency trends compared to traditional models like support vector machines and random forests. Wang et al. (2022) utilized
LightGBM to evaluate the finance risk of 186 firms. Their exhaustive experiments, benchmarking LightGBM against other algorithms,
consistently underscored its superior predictive prowess in this domain. In industry, within the competitive milieu of Kaggle, from 2016 to
2019, LightGBM notched more than 30 top-three placements, distinguishing itself in flagship contests such as CIKM AnalytiCup 2017 and
IEEE Fraud Detection. Amazon has also incorporated LightGBM into platforms like SageMaker for distributed training since January 2023.

In this project, the development team utilized the open source code of LightGBM ([Link]) and applied it
five times in the AI model. The four individual learners were trained separately by LightGBM using different sets of features. The ensemble
learner then integrated their predictions and generated the final scores (Sawhney et al., 2020). The development team applied this two-layer
structure for three reasons: (1) Training distinct LightGBM models tailored to specific objectives enhances model efficiency. Conversely,
merging all features into a singular model heightens the risk of overfitting; (2) This structure not only magnifies model interpretability by
demarcating feature contributions but also ensures optimal performance across models; (3) LightGBM’s inherent “exclusive feature
bundling” mechanism can amalgamate highly collinear features. Thus, when strong and weak signals coexist in a single model, their inter-
feature bundling can obscure the individual contributions of each to the predictive model.
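To make the two-layer structure concrete, the following is a minimal pure-Python sketch (all class and function names are ours, and a toy centroid scorer stands in for the LightGBM learners so the example stays self-contained): four base learners, each trained on its own feature subset, feed an ensemble layer that combines their risk scores.

```python
# Sketch of the two-layer scoring structure. In production each base
# learner was a LightGBM model; here a toy centroid scorer stands in.

class CentroidScorer:
    """Scores an applicant by relative distance to the 'good' vs. 'bad' centroid."""

    def fit(self, rows, labels):
        good = [r for r, y in zip(rows, labels) if y == 0]
        bad = [r for r, y in zip(rows, labels) if y == 1]
        dims = range(len(rows[0]))
        self.good_c = [sum(r[i] for r in good) / len(good) for i in dims]
        self.bad_c = [sum(r[i] for r in bad) / len(bad) for i in dims]
        return self

    def predict(self, row):
        d_good = sum((a - b) ** 2 for a, b in zip(row, self.good_c))
        d_bad = sum((a - b) ** 2 for a, b in zip(row, self.bad_c))
        return d_good / (d_good + d_bad + 1e-12)  # higher = riskier


def train_two_layer(feature_sets, labels, weights):
    """feature_sets: {learner name: per-applicant feature rows}.
    Layer 1 trains one learner per feature subset; layer 2 is a
    weighted combination of their scores (the 'ensemble learner')."""
    learners = {name: CentroidScorer().fit(rows, labels)
                for name, rows in feature_sets.items()}

    def score(applicant):  # applicant: {learner name: feature row}
        return sum(weights[name] * model.predict(applicant[name])
                   for name, model in learners.items())

    return score
```

Keeping each learner on its own feature subset mirrors reason (1) above: no single model sees every feature, which limits overfitting and keeps each subset's contribution separable.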

Table A2 reveals three findings: first, weak signals have substantial predictive value because they improve prediction accuracy even when used by the traditional model; second, the best combination is to use both strong and weak signals in the AI model, which further improves prediction accuracy; and third, the improvement brought by weak signals and advanced algorithms matters more for the underserved population than for the regular population.

Table A2. Mechanism Analysis on Weak Signals

Prediction            (1) Traditional model    (2) Traditional model           (3) AI model
performance \ model   (the real one in use     (a simulated model using both   (the real one launched
                      on 02/04/2021)           strong and weak signals)        on 02/04/2021)
Panel A: Regular population
AUC                   0.57                     0.65                            0.73
KS                    0.12                     0.21                            0.34
Top KS                0.12                     0.13                            0.24
Panel B: Underserved population
AUC                   0.57                     0.67                            0.82
KS                    0.08                     0.24                            0.43
Top KS                0.08                     0.16                            0.25
Note: This analysis measures how well each model can predict the loan performance of the approved borrowers in a later period. The testing sample period is from 02/04/2021 to 04/30/2021, which covers the period after the adoption of the AI model. Panel A reports the results from the regular population, while Panel B reports the results from the underserved population. Column 1 reflects the prediction accuracy of the traditional model, which is the baseline. We applied AUC and KS as two performance measures of each model's overall score ranking efficiency. Considering that it is more challenging to rank top borrowers, we also compared top KS by focusing on only the top 15% of borrowers in terms of credit scores. The values in Column 2 minus the values in Column 1 indicate the additional value of weak signals under the traditional model, whereas the values in Column 3 minus the values in Column 1 indicate the additional value of advanced algorithms using both strong and weak signals.
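For readers unfamiliar with the measures in Table A2, the following pure-Python sketch shows how AUC, KS, and a top-slice KS can be computed (function names are ours and the paper's evaluation code is not published; scores are treated here as model risk scores, with higher values indicating higher default risk):

```python
# AUC, KS, and top-slice KS on (risk score, default label) pairs.

def auc(scores, labels):
    """Probability that a random defaulter (label 1) outscores a random non-defaulter."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks(scores, labels):
    """Max gap between the cumulative score distributions of defaulters and non-defaulters."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tpr = fpr = best = 0.0
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tpr += 1 / n_pos
        else:
            fpr += 1 / n_neg
        best = max(best, abs(tpr - fpr))
    return best

def top_ks(scores, labels, top_frac=0.15):
    """KS restricted to the top fraction of the score ranking
    (assumes both classes appear in the top slice)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    top = ranked[:max(2, int(len(ranked) * top_frac))]
    return ks([s for s, _ in top], [y for _, y in top])
```

Restricting KS to the top slice of the ranking captures the table's point that ranking top borrowers correctly is the harder task.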

MIS Quarterly Vol. 48 No. 4 / December 2024 1829


Li et al. / The Effect of AI-Enabled Credit Scoring on Financial Inclusion

Table A3 illustrates how the traditional approach (i.e., human expertise) and the advanced approaches (i.e., knowledge graph, time series, natural language processing) generate features from weak signals. Weak signals are evidently hard for human experts to use because only limited features can be generated from human intuition. The advanced approaches, however, can suit different types of data and generate novel, meaningful features, consequently serving as the foundation for the AI model. Table A4 reveals two patterns: first, advanced algorithms do lead to higher prediction accuracy, in both the regular and underserved populations; second, although prediction accuracy is worse for the underserved population than for the regular population under the traditional model, the performance for the underserved population becomes close to (or even better than) that for the regular population under the AI model. In a nutshell, advanced algorithms (e.g., a complex machine learning algorithm) are one of the mechanisms that can enhance financial inclusion and are thus especially important for the underserved population.

Table A3. Feature Generation for Weak Signals (data domain: in-app nonfinancial data)

Human expertise (used when the raw information is straightforward to think of):
- Count of entertainment news visits: news taste is related to personality and everyday focus.
- Active app usage length within 7 days: extreme cases (too active or too inactive) are often risky.

Natural language processing (used when the content of browsing is important):
- Rude and offensive words used in comments and replies: related to one's personality.
- Main topics/interests of browsed videos: applicants' everyday interests are associated with personality and probably creditworthiness.

Time series (used when the trend is important):
- The trend of the app activity level over 60 days: a decreasing trend probably means the applicant has become busier.
- The change of article topic distribution over 60 days: a shift in reading focus probably means a shift in life.

Knowledge graph (used when the social network of online communities is important):
- The number of overlapping major online social communities/groups over 60 days: a shift in social groups probably means a shift in life.
- The number of defaulted consumers in the applicant's social groups: applicants with more connections that have defaulted are also more likely to default.
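As one concrete example of the time-series row in Table A3, the trend of app activity over a 60-day window can be reduced to an ordinary least-squares slope (a sketch with hypothetical function names, not the bank's actual feature pipeline):

```python
# Trend feature: OLS slope of daily app activity against the day index.
# A clearly negative slope flags declining usage over the window.

def activity_trend(daily_counts):
    n = len(daily_counts)
    mean_x = (n - 1) / 2                      # mean of day indices 0..n-1
    mean_y = sum(daily_counts) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(daily_counts))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

declining = [60 - day for day in range(60)]   # one fewer session per day
steady = [30] * 60
```

A slope near -1 here means the applicant drops roughly one session per day, matching the "decreasing trend of app activity" interpretation in the table.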

Table A4. Mechanism Analysis on Advanced Algorithms

Prediction            (1) Traditional model    (2) AI model                    (3) AI model
performance \ model   (the real one in use     (the part of the real model     (a simulated model using
                      on 02/04/2021)           launched on 02/04 that uses     strong signals only)
                                               strong signals only)
Panel A: Regular population
AUC                   0.57                     0.61                            0.70
KS                    0.12                     0.15                            0.30
Top KS                0.12                     0.12                            0.17
Panel B: Underserved population
AUC                   0.57                     0.66                            0.73
KS                    0.08                     0.25                            0.35
Top KS                0.08                     0.15                            0.15
Note: We followed the logic in Table A2 to generate this table. Column 1 reflects the prediction accuracy of the traditional model, which is the baseline. The values in Column 2 minus the values in Column 1 indicate the actual improvement from advanced algorithms with strong signals, whereas the values in Column 3 minus the values in Column 1 indicate the best possible improvement from advanced algorithms with strong signals.


Table A5 illustrates how the traditional approach (i.e., human expertise) and the advanced approaches (i.e., knowledge graph, time series) generate features from strong signals. The novel and meaningful features generated by the advanced approaches serve as the foundation for the AI model. Table A6 addresses the concern that changes in the applicant pool of the treated product confounded the impacts of the AI model. We reran the main equations using shorter time periods: two weeks, four weeks, and six weeks before and after the adoption date. The trade-off is that a short time window may help rule out confounding events or changes, but it may not be long enough to capture the entire impact. The results are largely consistent with those from our main sample: the adoption of the AI model increases the approval rate and reduces the default rate for the underserved population. Therefore, changes in the applicant pool are not a major concern for our study.
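The estimation itself is not reproduced in this appendix. As a hedged illustration only, assuming Equations (1) and (2) follow a standard 2x2 difference-in-differences structure (treated vs. control product, before vs. after the launch), the shorter-window check amounts to re-estimating on a date-restricted sample:

```python
from datetime import date  # used only to window observations by date

def did_estimate(records):
    """records: (outcome, treated, post) tuples; returns the classic
    2x2 difference-in-differences estimate."""
    def mean(group):
        vals = [y for y, t, p in records if (t, p) == group]
        return sum(vals) / len(vals)
    return ((mean((True, True)) - mean((True, False)))
            - (mean((False, True)) - mean((False, False))))

def within_window(records, obs_dates, launch, weeks):
    """Keep only observations within +/- `weeks` of the launch date."""
    return [r for r, d in zip(records, obs_dates)
            if abs((d - launch).days) <= weeks * 7]
```

The actual equations likely include fixed effects and controls; this sketch captures only the window-restriction logic of the robustness check.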

Table A5. Feature Generation for Strong Signals

Data domain: Identity information
  Human expertise (used when the raw information is straightforward and stand-alone):
  - Residence zip code: largely reflects the health of the local economy.
  - Occupation: largely reflects income consistency.
  Knowledge graph (used when the implication of trustworthiness is embedded in an applicant's social connections):
  - The number of defaulted consumers associated with the applicant's organization: applicants with more connections that have defaulted are also more likely to default.
  - The social network density of the applicant: the positions of applicants in their social networks reflect their lifestyles.
  - The social network centrality of the applicant: the positions of applicants in their social networks reflect their lifestyles.

Data domain: Credit product application
  Human expertise (used when the raw information is straightforward):
  - Number of credit product applications in 6 months: reflects the demand of the applicant.
  - Number of active credit lines: reflects the burden of the applicant.
  Time series (used when the trend is important):
  - A change in the product type (e.g., personal loans to payday loans): reflects the magnitude of the demand of the applicant.
  - A change in lender (e.g., traditional banks to online lenders): reflects a change in the eligibility of the applicant.
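The knowledge-graph features in Table A5 (network density, centrality, and defaulted connections) can be illustrated on a simple undirected edge list (a sketch; the function names and graph representation are ours, not the bank's production pipeline):

```python
# Toy graph features over an undirected edge list of social connections.

def density(edges, nodes):
    """Fraction of possible ties among `nodes` that actually exist."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0

def degree_centrality(edges, nodes, node):
    """Applicant's degree normalized by the maximum possible degree."""
    deg = sum(node in e for e in edges)
    return deg / (len(nodes) - 1)

def defaulted_neighbors(edges, node, defaulted):
    """Number of the applicant's direct contacts who have defaulted."""
    contacts = {v for e in edges if node in e for v in e} - {node}
    return len(contacts & defaulted)
```

Density and centrality summarize the applicant's position in the network, while `defaulted_neighbors` mirrors the table's argument that applicants with more defaulted connections are themselves more likely to default.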

Table A6. The Impacts of the AI Model on the Approval Rate and Lending Performance: Shorter Time Period Analysis

                              Regular population                     Underserved population
Data period                   Approval    Default    Utilization    Approval    Default    Utilization
                              (1/0)       (1/0)      level          (1/0)       (1/0)      level
Six weeks before and after    -0.006      -0.002***  0.047***       0.104***    -0.008***  0.065***
                              (0.005)     (0.001)    (0.018)        (0.013)     (0.001)    (0.019)
Four weeks before and after   -0.006      -0.001***  0.035**        0.089***    -0.007***  0.072***
                              (0.005)     (0.001)    (0.019)        (0.011)     (0.002)    (0.008)
Two weeks before and after    -0.002      -0.001     0.029          0.045***    -0.005     0.034
                              (0.002)     (0.000)    (0.023)        (0.008)     (0.004)    (0.027)
Note: We reran Equations (1) and (2) using alternate time periods as a robustness check. Each cell reports the coefficient of "the initial launch of the AI model" on the approval rate, default rate, or utilization level, with standard errors in parentheses.


Table A7 reports the results from a matched sample analysis. We applied a coarsened exact matching approach to match the applicants for the treated product with those for the control product. We generated a variable, "underserved," and used it in the matching so that the underserved (regular) applicants applying for the treated product were matched to the underserved (regular) applicants applying for the control product (an M-to-N match). We also used age, gender, with income data, with credit history, self-employed, urban household, and annual income as matching variables. This matching approach yielded a well-balanced sample of around 25% of the original sample size (the balance check result is reported in Table A8). This process helped us build a sample that contained similar applicants who applied for either product both before and after the adoption of AI. The results in Table A7 are consistent with our main findings. Table A9 reports the results from a subsample analysis using online applicants only. The focal bank never promoted the treated product to the underserved population via online channels; therefore, online applications were unlikely to suffer from human intervention. The impacts of the AI model for the underserved population are similar to our main findings.
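A minimal sketch of the coarsened-exact-matching logic described above (variable and function names are ours; the paper's exact binning choices are not reported): continuous covariates are coarsened into bins, and treated/control applicants are retained only in strata where both groups appear, which yields the M-to-N match.

```python
# Coarsened exact matching: bin continuous covariates, match exactly on
# the resulting strata, and keep only strata with both groups present.

def coarsen(applicant, bin_widths):
    """Map an applicant's covariates to a stratum key; continuous
    variables are binned, categorical values pass through as-is."""
    key = []
    for var, val in applicant.items():
        if var in bin_widths:
            key.append((var, int(val // bin_widths[var])))
        else:
            key.append((var, val))
    return tuple(sorted(key))

def cem_match(treated, control, bin_widths):
    strata_t, strata_c = {}, {}
    for a in treated:
        strata_t.setdefault(coarsen(a, bin_widths), []).append(a)
    for a in control:
        strata_c.setdefault(coarsen(a, bin_widths), []).append(a)
    common = strata_t.keys() & strata_c.keys()
    kept_t = [a for k in common for a in strata_t[k]]
    kept_c = [a for k in common for a in strata_c[k]]
    return kept_t, kept_c
```

Because the "underserved" indicator enters the stratum key uncoarsened, underserved applicants can only match other underserved applicants, as in the paper's setup.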

Table A7. The Impacts of the AI Model on the Approval Rate and Lending Performance: Matched Sample

                    Regular population                     Underserved population
Sample              Approval    Default    Utilization    Approval    Default    Utilization
                    (1/0)       (1/0)      level          (1/0)       (1/0)      level
Matched sample      0.005       -0.002***  0.036***       0.096***    -0.004***  0.014***
                    (0.004)     (0.000)    (0.005)        (0.037)     (0.002)    (0.004)
Note: We reran Equations (1) and (2) using the matched sample. Each cell reports the coefficient of "the initial launch of the AI model" on the approval rate, default rate, or utilization level, with standard errors in parentheses.

Table A8. Balance Check on the Matched Sample

Matching variable     Mean (treated group)   Mean (control group)   p-value of difference
Underserved           29.2%                  29.4%                  0.715
Age                   35.8                   35.7                   0.933
Gender = female       47.4%                  47.4%                  0.312
With income data      21.2%                  21.1%                  0.451
With credit history   71.2%                  71.9%                  0.102
Self-employed         25.3%                  25.6%                  0.641
Urban household       73.7%                  74.3%                  0.077
Annual income         115,130                117,026                0.364
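The p-values in Table A8 come from two-sample tests on each matching variable. A minimal sketch (ours, using a Welch statistic with a normal approximation for the two-sided p-value, which is adequate at these sample sizes):

```python
import math

def welch_p(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
```

A large p-value (e.g., 0.715 for "underserved") indicates the treated and control means are statistically indistinguishable after matching.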

Table A9. The Impacts of the AI Model on the Approval Rate and Lending Performance: Online Applications

                       Regular population                     Underserved population
Sample                 Approval    Default    Utilization    Approval    Default    Utilization
                       (1/0)       (1/0)      level          (1/0)       (1/0)      level
Online applications    0.010***    -0.005***  0.030***       0.110***    -0.007***  0.014
                       (0.003)     (0.000)    (0.013)        (0.007)     (0.001)    (0.033)
Note: We reran Equations (1) and (2) using applicants who came from the online channel as a robustness check (there was barely any human intervention in this process). Each cell reports the coefficient of "the initial launch of the AI model" on the approval rate, default rate, or utilization level, with standard errors in parentheses.

Table A10 reports the results from three subsample analyses that split the whole sample based on the location of bank branches. Given the concern that the effect of manual promotion and evaluation may be at play and persist for a while, we distinguished the bank branches that were unlikely to suffer from this intervention (i.e., bank branches in big cities) from those that were likely to suffer from it (i.e., bank branches in medium and small cities). The working assumption is that because the focal bank sent almost all its special personnel to the branches in medium and small cities/towns, the impact identified for applications via branches in big cities likely comes from the AI model only. The results are consistent with our main findings, with an increase in the approval rate and a decrease in the default rate for applicants from the underserved population in big cities.


Table A11 reports the results from three placebo tests. Given the concern that the focal bank might have made multiple efforts to echo government policies, we assume that the focal bank did take some unobserved actions to promote financial inclusion and call them fake treatments. Relying on the same specifications and using the time period before the adoption of the AI model, we estimated the impacts of these fake treatments. As shown in Table A11, the estimated effect of human intervention is largely insignificant, and its magnitude on the approval rate is about 1/10 of the impact of the AI model (i.e., 0.018 compared to 0.150). More importantly, the impact of human intervention on the default rate, if there was any, is positive. The results further support that AI models can increase the approval rate and reduce the default rate simultaneously, which manual interventions cannot easily achieve. Table A12 summarizes how weak signals are used in various contexts (outside of China), as documented in the literature. Our literature review shows that although many data domains and features may seem sensitive or private, they have been widely used by businesses both in China and in the U.S., which is appropriate as long as firms obtain consent from customers and comply with relevant regulations.

Table A10. The Impacts of the AI Model on the Approval Rate and Lending Performance: A Branch-Based Analysis

                           Regular population                     Underserved population
Sample                     Approval    Default    Utilization    Approval    Default    Utilization
                           (1/0)       (1/0)      level          (1/0)       (1/0)      level
Large cities/towns         -0.006***   -0.008***  0.072***       0.117***    -0.008***  0.072***
                           (0.001)     (0.003)    (0.031)        (0.015)     (0.002)    (0.031)
Medium-sized cities/towns  0.002       -0.007***  0.037          0.158***    -0.005***  0.041***
                           (0.002)     (0.003)    (0.028)        (0.009)     (0.001)    (0.012)
Small cities/towns         -0.002      -0.008***  0.040***       0.126***    -0.008***  0.036***
                           (0.002)     (0.003)    (0.007)        (0.018)     (0.002)    (0.011)
Note: We reran Equations (1) and (2) using subsamples based on city/town size as a robustness check; the results come from three individual regressions. Each cell reports the coefficient of "the initial launch of the AI model" on the approval rate, default rate, or utilization level, with standard errors in parentheses.

Table A11. The Impacts of the AI Model on the Approval Rate and Lending Performance: Placebo Tests

                  Regular population                     Underserved population
Fake treatment    Approval    Default    Utilization    Approval    Default    Utilization
date              (1/0)       (1/0)      level          (1/0)       (1/0)      level
Nov 1, 2020       0.002       0.003***   0.010***       0.018*      0.003***   0.003
                  (0.002)     (0.000)    (0.002)        (0.009)     (0.000)    (0.003)
Dec 1, 2020       0.002***    0.001***   -0.004***      0.017       0.003***   0.003***
                  (0.000)     (0.000)    (0.001)        (0.010)     (0.000)    (0.001)
Jan 1, 2021       -0.001***   -0.000     -0.004***      0.011       0.002***   0.000
                  (0.000)     (0.000)    (0.000)        (0.009)     (0.000)    (0.001)
Note: We reran Equations (1) and (2) within the time period before the initial launch of the AI model, using fake treatment marks. Each cell reports the coefficient of "the initial launch of the fake treatment" on the approval rate, default rate, or utilization level, with standard errors in parentheses. The results come from three individual regressions.

Table A12. Practical Usage of Weak Signals

Data domain: Provident fund data
- Real estate balance, homeownership: crowdfunding loan default prediction (Fu et al., 2021)
- Financial funds: purchase prediction (Martens et al., 2016)

Data domain: In-app financial data
- Balance of e-commerce accounts: financial risk prediction (Wang et al., 2021)
- Payment transactions: purchase prediction (Martens et al., 2016)
- History and sales of e-commerce accounts: small business lending (Jagtiani & Lemieux, 2016)

Data domain: In-app nonfinancial data
- Sentiment of comments: financial risk prediction (Wang et al., 2021)
- Browsing sequence, browsing intensity: app recommendation (He et al., 2019)
- User's ID, location, profile: app analytics (Kummer & Schulte, 2019)
- Online reviews and followers: small business lending (Jagtiani & Lemieux, 2016)
- Education and field of study: mortgage (Chan et al., 2022)
- Real-time click and input data: fraud risk (Weinmann et al., 2022)

