Analysis of Learning Behavior
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3278370
ABSTRACT Information literacy is a basic ability that college students need in order to adapt to current social needs, and it is also a necessary quality for self-directed and lifelong learning. Using the rich and diverse characteristics of information literacy learning behavior to predict learning effect is an effective way to reveal the mechanism of information literacy teaching. This paper analyzes the characteristics of college students' learning behaviors and explores learning effect prediction by constructing a predictive model of learning effect based on information literacy learning behavior characteristics. The experiment used information literacy learning data from 320 college students at a Chinese university. Pearson correlation analysis of the learning behavior characteristics of college students' information literacy reveals a significant correlation between the characteristics of information thinking and learning effect. Supervised classification algorithms, namely Decision Tree, KNN, Naive Bayes, Neural Net and Random Forest, are used to classify and predict the learning effect of college students' information literacy. The Random Forest prediction model performs best in the classification prediction of learning effect, with an Accuracy of 92.50%, Precision of 84.56%, Recall of 94.81%, F1-Score of 89.39% and Kappa coefficient of 0.859. This paper puts forward differentiated intervention suggestions and management decision-making references for the information literacy teaching process of college students, with a view to adjusting information literacy teaching behavior, improving information literacy teaching quality, optimizing educational decision-making, and promoting the sustainable development of high-quality, innovative talents in the information society. Our research on the thinking and direction of the sustainable development of information literacy training proved encouraging.
INDEX TERMS machine learning, information literacy, learning behavior characteristics, learning effect,
innovative talents
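As a consistency check on the figures reported in the abstract: the F1-Score is the harmonic mean of Precision and Recall, and the Kappa coefficient compares the observed agreement P_o with the agreement P_c expected by chance, kappa = (P_o - P_c)/(1 - P_c). The Python sketch below recomputes F1 from the reported Precision and Recall, and shows how kappa can be derived from a confusion matrix. The function names are hypothetical, and the 2x2 confusion matrix is a made-up toy example, not the study's data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def kappa(confusion: list[list[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted).

    p_o: observed agreement, i.e. the share of cases on the diagonal.
    p_c: agreement expected by chance, computed from the row and column totals.
    """
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_c = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_c == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_c) / (1 - p_c)


# Precision and Recall as reported for the Random Forest model.
print(round(f1_score(0.8456, 0.9481), 4))  # 0.8939

# Toy 2-class confusion matrix (illustrative only): p_o = 0.85, p_c = 0.5.
print(round(kappa([[45, 5], [10, 40]]), 2))  # 0.7
```

The recomputed value matches the reported F1-Score of 89.39%, which suggests F1 was derived from the aggregate Precision and Recall rather than averaged per class.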
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3278370
a part of cultural literacy and overall quality. Cultivating college students' information literacy has become an important issue facing contemporary higher education.
Information literacy includes the basic knowledge and skills of information and information technology; the ability to use information technology to learn, cooperate, communicate and solve problems; and information awareness and social ethics. Information literacy education now receives attention from all sectors of society. Education departments and libraries in the United States, the United Kingdom, Australia and other countries have carried out information literacy education to varying degrees. In 2022, four departments of China, including the Ministry of Education, jointly issued the "Key Points for Improving the Digital Literacy and Skills of the Whole People in 2022", under which students' information literacy and digital literacy are expected to improve further in the next few years [4]. In recent years, driven by online and hybrid teaching and by the development of artificial intelligence technology, information literacy has received increasing research attention. Many colleges and universities in China and abroad have opened information literacy courses in various forms to carry out targeted information literacy education. For example, on the Chinese University MOOC platform, Tsinghua University offers "Information Literacy: A Compulsory Course for Academic Research", Wuhan University offers "Information Literacy and Practice - A Pair of Academic Eyes", Sun Yat-sen University offers "Information Literacy General Course - A Compulsory Course for Digital Survival", and Sichuan Normal University offers "Information Literacy and Lifelong Learning (Autonomous Mode)" [5]. Nevertheless, many problems have emerged in existing information literacy education for college students.
In the field of educational big data, learning prediction is a highly meaningful topic, and learning effect prediction is one of the core issues in learning analytics. Its essence is to use the various data generated by learners during the learning process, together with methods such as machine learning, to predict the learning effect. Based on the prediction results, teachers can understand learners' learning status and intervene in the learning process in time, for example by improving learners' learning habits or adjusting teaching strategies (Wu Fati et al., 2019) [6]. Learning analytics technology has developed from principle exploration and application value to applications in learning behavior analysis, data visualization and learning prediction (Hu Hang et al., 2020) [7]. Learning prediction draws on learning achievement, learning goals and learning ability, and predicts learning effect and learning experience from the characteristics of learning behavior before and after learning (AlShammari et al., 2013) [8]. Research on learning outcome prediction covers theoretical prediction models, empirical studies of prediction models, comparison of algorithms, development of algorithms, research on early warning factors, literature reviews, and so on. Students' learning performance and learning effect have been predicted using regression analysis, neural networks, Bayesian methods and other techniques (Wang Gaihua et al., 2019) [9]. UNESCO's 2019 report, Artificial Intelligence in Education: Challenges and Opportunities for Sustainable Development, explores how artificial intelligence technologies can help education systems use data to promote equity and quality in education [10]. Using educational data mining and machine learning to build learning effect prediction models in a data-driven way, that is, learning the prediction model automatically from data, is the current research focus and trend.
This study links multiple specific behavioral data together to create an integrated data chain based on college students' learning behaviors in information literacy courses. Different machine learning classification models are then analyzed, evaluated and used to classify and predict the learning effect of college students. This study focuses on the following questions.
(1) Which indicators of college students' information literacy learning behavior characteristics have better predictive ability for learning effect?
(2) Which machine learning models have better predictive performance and efficacy on the study sample?
(3) What diagnostic observations for learning recommendations and instructional interventions can be derived from the study findings?

II. LITERATURE REVIEW
This study reviewed the literature on the analysis of learning behavior characteristics and the prediction of learning effect, with the aim of improving college students' information literacy.

Table 1. Abbreviations of professional terms.
Acronym | Description
ML | Machine Learning
ITUB | Information Technology Usage Behaviour
PSCLE | Preferences for Smart Classroom Learning Environments
ILSs | Information Literacy Skills
ISB | Information-Seeking Behavior
CBL | Case-Based Learning
DNN | Deep Neural Network
CNN | Convolutional Neural Network
SVM | Support Vector Machine
GBDT | Gradient Boosting Decision Tree
LR | Logistic Regression
RF | Random Forest
KNN | K-Nearest Neighbor
BP-NN | Back Propagation Neural Network
The literature data mainly come from common international databases for paper retrieval, such as Web of Science, Scopus and Ei Compendex, and are mainly based on relevant research from the past three years.

Table 2. Research on information literacy learning behavior and learning effect.
Ref. | Domain | ML | Methodology | Finding
[11] | Hot Spots and Enlightenment | | Bibliometric analysis | Assessment tools, information security, and personalized learning recommendations
[12] | Evaluation of Enhancement | √ | ML | Constructs a predictive model for enhancing ITUB
[13] | Smart Classroom Preferences | | A quantitative method | A high level of information literacy obtained significantly higher scores on PSCLE
[14] | Training Strategies | | Data mining | Discusses several strategies of information literacy education under the background of big data
[15] | Online Course Based on MOOC | | Analyzes the data | The "MOOC+SPOC+Flipped Classroom" teaching method
[16] | Assessment of ILSs and ISB | | Quantitative research | Enhancing the ILSs of medical students
[17] | The Frame of Evaluation Index | | Means of questionnaires | The methods to improve information literacy
[18] | Assess the Self-Assessment | | Statistical free software R | Explore new teaching methods
[19] | Metacognitive Abilities | | Factorial quasi-experiment | Improve metacognitive abilities classified based on students' information literacy
[20] | Evaluation of Information Literacy | | Quantitative evaluation and data mining | Establish a reciprocal teaching mode
[21] | Critical Thinking Skills | | Directed qualitative content analysis | The CBL unit was effective in increasing their information literacy and critical thinking skills
[22] | Cross-Media Data Analysis | √ | DNN | Analyzes their media selection tendency, media usage time, positive influence, and the relationship with new media literacy
[23] | A Multilevel Modeling Approach | | Multilevel modeling | The models related to teacher and student characteristics

Since the reference documents contain many professional and machine learning algorithm terms, they are described uniformly in Table 1.

A. INFORMATION LITERACY LEARNING BEHAVIOR ANALYSIS AND LEARNING EFFECT EVALUATION
Scholars have studied information literacy from different angles. The literature is analyzed and compared as follows:
Literature comparison by research domain: With respect to information literacy learning behavior and learning effect, many recent studies have focused on the evaluation framework of information literacy effect, strategies for improving information literacy learning behavior, the cultivation of specific information literacy abilities, and online courses or the relationship between intelligent environments and information literacy.
Literature comparison by research methodology: Most scholars mainly use quantitative research, qualitative research, questionnaire surveys, data mining and factorial quasi-experiments, while a few use machine learning technology. The tick marks in Table 2 indicate the use of machine learning. According to the literature, scholars mainly rely on traditional research methods and seldom use machine learning methods in related research.
Literature comparison by research findings: Scholars have produced fruitful results. For example, some scholars establish a reciprocal teaching mode, some explore new teaching methods for information literacy, and some build prediction models to enhance the use of information literacy. The literature shows that there are few achievements in the analysis of information literacy learning behavior and the construction of learning effect prediction models. The detailed comparative studies are summarized in Table 2.

B. LEARNING BEHAVIOR ANALYSIS AND LEARNING EFFECT PREDICTION BASED ON MACHINE LEARNING
Table 3. Learning behavior analysis and learning effect prediction based on machine learning.
example, performance prediction model; Students'
willingness analysis model; Prediction of classroom teaching effect; Learning behavior diagnosis model, etc.
To sum up, current research on information literacy is mainly based on theoretical deduction and experience: researchers hypothesize that certain factors are related to academic performance, and then collect data through questionnaires and interviews to verify the hypothesis. This approach can only demonstrate a correlation between the selected factors and academic achievement; it is difficult to determine the quantitative relationship between them. Machine learning and data mining are rarely used, and intelligent data analysis of information literacy education is lacking. Some researchers use decision trees, neural networks and other algorithms to build academic achievement prediction models, but research on predicting information literacy learning effect is still lacking.
The continuous development and maturity of intelligent technologies such as data mining, emotion analysis and pattern recognition, and especially the combination of machine learning with education, provide strong technical support for learning prediction research. Although some studies have pointed out negative impacts of artificial intelligence on educational research, the use of educational data mining and related technologies remains the current research trend. It is therefore urgent to build an information literacy learning behavior characteristic analysis and learning effect prediction model for college students that is highly usable, easy to operate and accurate, and to provide differentiated recommendations and interventions based on the prediction results.

III. Materials and Methods

A. RESEARCH TOOLS
Common learning prediction tools include Weka, SPSS, Python and RapidMiner. In this study, SPSS and RapidMiner are mainly used for data preprocessing, feature set selection, classification prediction and model performance evaluation, with RapidMiner used mainly for the machine learning tasks. RapidMiner is a mainstream data mining and machine learning platform. It provides data preprocessing and visualization, predictive analysis and statistical modeling, evaluation and deployment, and a rich set of machine learning algorithms [39].

B. RESEARCH OBJECT AND DATA SOURCE
Because of regional factors, different regions have different requirements for information literacy, so information literacy standards should not be limited to general standards [40]. The research team built an evaluation index system for college students' information literacy in previous research [41], and this index system provides a basic reference tool for this paper. Based on this evaluation index, the research team observed, measured, extracted and described the information literacy learning behavior characteristics of college students, and formed an observation scale of information literacy learning behavior characteristics for college students. The scale covers awareness and attitude, knowledge and skills, application and innovation, and ethics and responsibility. Awareness and attitude mainly concern understanding of the importance of information technology. Knowledge and skills mainly concern the knowledge and skills of information technology. Application and innovation mainly examine cognitive thinking and the innovative application of information technology. Morality and responsibility mainly concern information laws, regulations and moral concepts. There are 4 first-level indicators, 9 second-level indicators and 28 third-level indicators. To measure students' learning effect, this study divides students' learning scores into five categories: excellent (5), good (4), medium (3), qualified (2) and unqualified (1). Each third-level indicator of college students' information literacy learning behavior characteristics is rated on a five-point Likert scale: "1=never"; "2=seldom"; "3=sometimes"; "4=often"; "5=always".
Table 4 describes the observed indicators of college students' information literacy learning behavior characteristics. Building on the above, the four areas of learning behavior are described as learning behavior in consciousness and attitude, in knowledge and skills, in application and innovation, and in morality and responsibility.
Learning behavior in consciousness and attitude: Mainly includes Information perception consciousness (IPC), Information application consciousness (IAC) and Lifelong learning consciousness (LLC). Specific behaviors include: Identify and classify information (IPC1); Use the Web to find, filter, and judge information (IPC2); Determine the correctness and reliability of information sources (IPC3); Use information technology related knowledge and methods to solve problems (IAC1); Use information technology tools such as mind mapping tools to assist learning (IAC2); Leverage information technology to support lifelong learning (LLC1); Use information technology to support professional and personal development (LLC2).
Learning behavior in knowledge and skills: Mainly includes Information science knowledge (ISK) and Information application skills (IAS). Specific behaviors include: Understand the operation methods of various operating systems, word processing software, graphics and image processing software, and video and audio processing software (ISK1); Understand the development history, basic status and future trend of information technology (ISK2); Master the basic knowledge and technology of information retrieval and evaluation, and information classification and storage methods
(ISK3); Master the basic scientific knowledge of information literacy, data literacy, visual literacy and other multi-literacies (ISK4); Use various search engines and network platforms to find the required information (IAS1); Classify information and present it in tabular form (IAS2); Identify and analyze information through various approaches and methods (IAS3); Create valuable information resources based on specific teaching content or around specific teaching topics (IAS4).
Learning behavior in application and innovation: Mainly includes Information thinking (IT) and Information behavior (IB). Specific behaviors include: Define and identify implicit assumptions in information, and deduce information (IT1); Carry out targeted information-based instructional design and implement effective instructional activities (IT2); Use information technology to support services and management (IT3); Construct problem solutions by integrating resources and using reasonable algorithms (IT4); Use collaborative tools to create and manage content, such as project management systems and shared documents (IB1); Use advanced communication tools to

Table 4. Observation scale of information literacy learning behavior characteristics of college students.
First-level indicator | Second-level indicator | Observable behavior
Consciousness and Attitude | 1 IPC | 1) Identify and classify information (IPC1)
| | 2) Use the Web to find, filter, and judge information (IPC2)
| | 3) Determine the correctness and reliability of information sources (IPC3)
| 2 IAC | 4) Use information technology related knowledge and methods to solve problems (IAC1)
| | 5) Use information technology tools such as mind mapping tools to assist learning (IAC2)
| 3 LLC | 6) Leverage information technology to support lifelong learning (LLC1)
| | 7) Use information technology to support professional and personal development (LLC2)
Knowledge and Skills | 1 ISK | 8) Understand the operation methods of various operating systems, word processing software, graphics and image processing software, and video and audio processing software (ISK1)
| | 9) Understand the history, basic status and future trend of information technology (ISK2)
| | 10) Master the basic knowledge and technology of information retrieval and evaluation, and information classification and storage methods (ISK3)
| | 11) Master the basic scientific knowledge of information literacy, data literacy, visual literacy and other multi-literacies (ISK4)
| 2 IAS | 12) Use various search engines and network platforms to find the required information (IAS1)
| | 13) Classify information and present it in tabular form (IAS2)
| | 14) Identify and analyze information through various approaches and methods (IAS3)
| | 15) Create valuable information resources based on specific teaching content or topics (IAS4)
Application and Innovation | 1 IT | 16) Define and identify implicit assumptions in information, and deduce information (IT1)
| | 17) Carry out targeted information-based instructional design and implement effective instructional activities (IT2)
| | 18) Use information technology to support services and management (IT3)
| | 19) Construct problem solutions by integrating resources and using reasonable algorithms (IT4)
| 2 IB | 20) Use collaborative tools to create and manage content (such as project management systems, shared documents, etc.) (IB1)
| | 21) Use advanced communication tools to communicate with people (e.g. video conferencing, data sharing, application sharing) (IB2)
| | 22) Develop innovative teaching applications (IB3)
| | 23) Carry out information technology cooperation and exchange (IB4)
Morality and Responsibility | 1 IE | 24) Use learning resources healthily and correctly to create a sound information environment (IE1)
| | 25) Restrain one's own information ethical behavior and supervise others' information behavior (IE2)
| | 26) Abide by the network civilization convention, purify online language, and learn and communicate in a civil and polite way (IE3)
| 2 ILR | 27) Impart knowledge of laws, regulations and ethics related to technology utilization (ILR1)
| | 28) Learn about the right of equal access to information and respect the intellectual property rights of others (ILR2)
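The indicator counts stated in the text (4 first-level, 9 second-level and 28 third-level indicators) follow directly from Table 4. The Python sketch below encodes the scale as a nested dictionary, a hypothetical representation chosen for illustration rather than the data format used by the authors, and checks those counts. It also confirms that removing the three items later excluded for low correlation (IPC1, LLC2, ILR2) leaves the 25 features retained for modeling.

```python
# Hypothetical encoding of the Table 4 observation scale:
# first-level indicator -> {second-level indicator: number of third-level items}.
SCALE: dict[str, dict[str, int]] = {
    "Consciousness and Attitude": {"IPC": 3, "IAC": 2, "LLC": 2},
    "Knowledge and Skills": {"ISK": 4, "IAS": 4},
    "Application and Innovation": {"IT": 4, "IB": 4},
    "Morality and Responsibility": {"IE": 3, "ILR": 2},
}

# Five-point Likert coding applied to every third-level item.
LIKERT = {1: "never", 2: "seldom", 3: "sometimes", 4: "often", 5: "always"}


def item_codes(scale: dict[str, dict[str, int]]) -> list[str]:
    """Expand each second-level indicator into item codes, e.g. IPC -> IPC1, IPC2, IPC3."""
    return [
        f"{code}{i}"
        for second_level in scale.values()
        for code, n_items in second_level.items()
        for i in range(1, n_items + 1)
    ]


items = item_codes(SCALE)
n_first = len(SCALE)
n_second = sum(len(second_level) for second_level in SCALE.values())
print(n_first, n_second, len(items))  # 4 9 28

# The three items excluded for correlations below 0.500 in the feature-subset step.
dropped = {"IPC1", "LLC2", "ILR2"}
retained = [code for code in items if code not in dropped]
print(len(retained))  # 25
```

Storing only the item count per second-level indicator keeps the hierarchy compact, and the codes (IPC1, IPC2, ...) are generated in the same order as the rows of Table 4.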
(4) Carry out prediction effect evaluation and comparative analysis, and establish the optimal prediction algorithm model.

D. DATA COLLECTION AND PREPROCESSING

1) DATA COLLECTION
The research data come from the "Special Survey on Information Literacy of College Students" carried out by the research group of "Research on Information Literacy of College Students Supported by Smart Campus", a teaching quality project in Anhui Province, China, in 2022. The study takes into account the impact of scattered, random and representative data on the student population, and involves students from a variety of disciplinary and professional backgrounds. The data consist of information literacy learning behavior questionnaire data and information literacy course performance data from 320 junior students at Huainan Normal University in 2020. The data were collected by means of a web-based questionnaire administered in batches to students in each class. A pre-survey was conducted before the questionnaire was distributed to test whether the subjects fully understood the questions, whether the wording was appropriate, and how cooperative the subjects were, so the overall response quality of the questionnaire was very high. The data follow a normal distribution with small differences between observations, and show good reliability and validity.

2) DATA PREPROCESSING
Figure 2 gives a descriptive statistical overview. The horizontal axis represents the variables and the vertical axis their values, giving an indication of the data for each variable. The descriptive statistics revealed a few missing values and outliers. The figure shows the Min, Max, Average and Deviation of each feature. IAC1, IPC3 and IPC2 rank in the top 3 by Average, while IB4, IB2 and IB3 rank in the bottom 3; IE1, IPC1 and IAC1 rank in the bottom 3 by Deviation, while IB4, ISK2 and ISK3 rank in the top 3.
To ensure the quality of the classification model, the collected learning behavior characteristics and performance data were preprocessed, including missing value processing, abnormal data processing and data transformation.
Outlier processing. SPSS was used to remove null data and other abnormal data from the training set. Outliers were then removed using box plots. After data cleaning, 315 records were retained.
Data transformation. To enable the machine learning model to achieve better recognition, the data in the collected training set need to be transformed. The learning effect attribute is defined as a nominal attribute with the levels "5=Excellent", "4=Good", "3=Medium", "2=Pass" and "1=Fail".

IV. RESULTS AND DISCUSSION

A. CORRELATION ANALYSIS OF LEARNING BEHAVIOR CHARACTERISTICS AND LEARNING EFFECT
The feature subset for modeling can be selected through correlation analysis of learning behavior characteristics and learning effect. Correlation analysis measures the degree of association between two or more variables, and the variables must have some kind of association for the analysis to be meaningful.
If two variables are strongly interdependent, they are said to be highly correlated. If the values of both increase together, they are positively correlated; if one increases while the other decreases, they are negatively correlated. Pearson's method is used here to calculate the correlation. Pearson's correlation coefficient is an important measure of the relationship between two variables and takes values between -1 and 1. For p variables, the p×p correlation matrix $R_{p\times p}$ contains the following number of distinct pairwise correlation coefficients:
$$N = \frac{p(p-1)}{2} \qquad (1)$$
If the variables are arranged in a square array in the order of their numbering, this array is the correlation matrix. Each entry on the diagonal from top left to bottom right pairs a variable with itself and equals 1; the coefficients above the diagonal mirror those below it.
The Pearson correlation coefficient between each variable and the learning effect was calculated to measure their linear correlation. The correlation coefficients between the variables are shown in Figure 4: each row-column intersection shows the significance plot for that pair of variables, and the color bar at the bottom maps to the correlation coefficient. The correlation between the predictor variables and the learning effect is shown in Figure 3. The coefficient r takes values between -1 and +1. If r>0, the two variables are positively correlated, i.e., the larger the value of one variable, the larger the value of the other; if r<0, they are negatively correlated, i.e., the larger the value of one variable, the smaller the value of the other. The larger the absolute value of r, the stronger the correlation; the smaller the absolute value of r, the weaker the correlation [42]. Calculating Pearson's correlation coefficients between the information literacy variables of college students and learning effect showed that the vast majority of the predictor variables are positively correlated with learning outcomes to some degree. Because the strength of these correlations varies across variables, this supports the analysis of learning behavior characteristics.
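The correlation step above can be illustrated with a short Python sketch. It computes Pearson's r directly from its definition, counts the p(p-1)/2 distinct variable pairs of Equation (1), and keeps the features whose r with the learning effect is at least 0.5, mirroring the 0.500 cut-off the paper applies when building the feature subset. The feature names A, B, C and the five toy responses are hypothetical; the authors performed this analysis in SPSS and RapidMiner, not with this code.

```python
import math
from itertools import combinations


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant sample")
    return cov / (sx * sy)


def select_features(features: dict[str, list[float]], target: list[float],
                    threshold: float = 0.5) -> dict[str, float]:
    """Keep features whose correlation with the target is at least `threshold`."""
    scores = {name: pearson_r(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if r >= threshold}


# Toy Likert-scale responses for five students (illustrative, not the study data).
features = {
    "A": [5, 4, 3, 2, 1],
    "B": [1, 2, 3, 4, 5],
    "C": [3, 3, 4, 3, 3],
}
effect = [5, 4, 3, 3, 1]  # learning effect coded 1 (Fail) .. 5 (Excellent)

print(len(list(combinations(features, 2))))  # 3 pairs: p(p-1)/2 with p = 3
print(select_features(features, effect))     # only "A" passes the 0.5 threshold
```

Here A correlates strongly and positively with the toy learning effect (r is about 0.96), B is its mirror image (r is about -0.96), and C is nearly uncorrelated, so only A survives the threshold.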
are abstract in nature; in terms of when the learning behaviors are presented, these three learning behaviors are less integrated with college students' study, daily life, and existing learning environment than the other learning behavior characteristics. This offers a reference for later pedagogical improvement: college students need more streamlined and effective learning activities targeting these indicators.
3) ESTABLISHMENT OF A SUBSET OF LEARNING BEHAVIOR CHARACTERISTICS
To construct better prediction models that achieve better prediction results with fewer features, the three learning behavior features whose correlation with learning effect was below 0.500 were excluded from model construction, namely IPC1 (0.486), LLC2 (0.484), and ILR2 (0.430). The gender variable was also excluded from model construction, because its correlation with learning effect was -0.103.
In summary, a subset of 25 information literacy learning behavior characteristics of college students is retained, specifically IT1, IT4, IAS1, IT2, IT3, ILR1, IAS4, IB1, ISK4, IE1, IE3, IAC1, ISK3, IPC2, IAS3, IB2, LLC1, IE2, IAS2, IPC3, ISK1, IB3, ISK2, IB4, and IAC2.

B. CLASSIFICATION MODEL PREDICTION OF LEARNING EFFECT
The performance evaluation metrics of a binary classification model include Accuracy, Precision, Recall, and F1-Score (F1) [44]. TP denotes the number of positive samples whose learning effect was correctly predicted; TN denotes the number of negative samples whose learning effect was correctly predicted; FP denotes the number of negative samples incorrectly predicted as positive; and FN denotes the number of positive samples incorrectly predicted as negative.
Accuracy is the proportion of all samples that the classification model predicts correctly, and it reflects the accuracy of the overall classification.
Accuracy = (TP+TN)/(TP+TN+FP+FN) (2)
Precision is the ratio of the number of positive cases correctly predicted by the classification model to the number of all cases the model predicts as positive, i.e., the proportion of true positives among all results predicted as positive.
Precision = TP/(TP+FP) (3)
Recall is the ratio of the number of positive samples correctly predicted by the classification model to the actual number of positive samples in the test set, i.e., the proportion of actual positives that the classification model finds.
Recall = TP/(TP+FN) (4)
The F1-Score is a composite metric that combines Precision and Recall as their harmonic mean. Because Precision and Recall usually trade off against each other, and different problems emphasize different criteria, the F1-Score is a good overall evaluation metric; the larger its value, the better.
F1-Score = 2×Precision×Recall/(Precision+Recall) (5)
The kappa (KIA) coefficient is a measure of classification accuracy that corrects for the agreement expected by chance. KIA can express both overall consistency and per-class consistency, and it is used here to assess the accuracy of the multi-class classification model. The higher the value of this coefficient, the higher the classification accuracy achieved by the model. The kappa coefficient is calculated as follows, where Po denotes the observed agreement, i.e., the proportion of samples whose predicted class agrees with the actual class, and Pc denotes the proportion of agreement expected by chance.
KIA = (Po-Pc)/(1-Pc) (6)
In terms of prediction model selection, the main goal is to predict college students' learning effect levels from their information literacy behavioral performance indicators. This is a typical classification problem, so classical machine learning classification algorithms are used, and the prediction performance of the different models is compared on this study sample. The five models below are trained and tested using ten-fold cross validation. The dataset is first divided into ten parts; in each round, nine parts are used as training data and the remaining part as test data, so that each part serves as the test set once. The accuracy averaged over the ten rounds is taken as the estimate of the algorithm's accuracy, which makes maximal use of the samples.
1) DECISION TREE
FIGURE 5. The selection process of Decision Tree hyperparameters.
Decision tree is a greedy algorithm that classifies instances based on features and performs recursive binary partitioning of the feature space. Starting from the root node of the tree, the sample data are compared with the feature nodes in the decision tree, the branch at the next level is selected to continue the comparison based on the result, and
the final leaf node is the classification result [45]. The advantage of decision trees is that they are easy to read and fast at classification [46]. The C4.5 decision tree algorithm uses the "gain ratio" to select the optimal splitting attribute.
By training and tuning the Decision Tree parameters, the best recognition performance is obtained when the core parameter maximum depth is set to 8, the minimum leaf size to 2, and the confidence to 0.1. The hyperparameter selection process for the Decision Tree is shown in Figure 5. Across multiple experiments, the best accuracy obtained is 84.17%.
2) K-NEAREST NEIGHBOR
The K-Nearest Neighbor (KNN) algorithm is a classification algorithm based on statistics. Its advantage is that it does not need to partition the vector space formed by all data records; it classifies a sample by finding the K most similar vectors in the training data. Its disadvantage is that it is sensitive to noisy samples and outliers [47].
By training and tuning the KNN parameters, the best recognition performance is obtained when the core parameter K is set to 6. The hyperparameter selection process for KNN is shown in Figure 6. Across multiple experiments, the best accuracy obtained is 90.83%.
3) NAIVE BAYES
Naive Bayes is a data detection and classification algorithm based on probability theory. It relates the prior and posterior probabilities of events through Bayes' theorem, using sample data together with prior information to determine the posterior probability of each event. Its advantages are that the model is simple to construct and offers high efficiency and stability [48].
By training and tuning the Naive Bayes parameters, the best recognition performance is obtained when the core parameter minimum bandwidth is set to 0.2. The hyperparameter selection process for Naive Bayes is shown in Figure 7. Across multiple experiments, the best accuracy obtained is 90.00%.
4) NEURAL NET
A neural network is a mathematical model that processes information by simulating biological neural networks, and neural networks have been applied to classification problems with good results [49]. A neural network is mainly composed of an input layer, a hidden layer, and an output layer.
By training and tuning the Neural Net parameters, the best recognition performance is obtained when the core parameters momentum, training cycles, and learning rate are set to 0.9, 200, and 0.01, respectively. The hyperparameter selection process for the Neural Net is shown in Figure 8. Across multiple experiments, the best accuracy obtained is 91.67%.
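The KNN rule described in 2) above, finding the K training vectors most similar to a sample and letting them vote, can be sketched in plain Python. This is a hypothetical illustration rather than the authors' implementation: Euclidean distance and simple majority voting (ties broken in favor of the nearest neighbor) are assumptions, since the paper does not state the distance measure or voting scheme, and `knn_predict` is an illustrative name. K defaults to 6, the value reported above.

```python
# Hypothetical sketch of the KNN rule: find the k training vectors closest to
# the query (Euclidean distance, an assumption) and take a majority vote.
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_x, train_y, query, k=6):
    """Predict the label of `query` from its k nearest training samples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Indices of the k closest training samples, nearest first.
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # Tie-break: among the most-voted labels, prefer the one with the nearest member.
    for i in nearest:
        if votes[train_y[i]] == top:
            return train_y[i]


# Toy two-cluster data, invented for illustration.
train_x = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
train_y = ["Pass"] * 4 + ["Excellent"] * 4
print(knn_predict(train_x, train_y, (1.5, 1.5)))  # Pass
print(knn_predict(train_x, train_y, (8.5, 8.5)))  # Excellent
```

With K = 6 and only four samples per cluster, each query also "sees" two samples from the other cluster, yet the majority still decides correctly; this is why K is usually tuned, as the authors did.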
FIGURE 7. The selection process of Naive Bayes hyperparameters.
FIGURE 8. The selection process of Neural Net hyperparameters.
5) RANDOM FOREST
Random Forest trains multiple tree classifiers on random samples of the data and random subsets of the features, so that no single tree learns from all samples and all features. This increases randomness and helps avoid overfitting, and the results of the individual decision trees are combined according to the Bagging rule [50]. The training data are sampled with replacement to generate K classification and regression trees; assuming there are n features in the feature space, m features are randomly selected at each node of each tree, with m < n; each tree is grown to its maximum size without pruning; the trees together form a forest, and the classification result is determined by majority vote of the tree classifiers.
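The bagging procedure described above (bootstrap sampling with replacement, a random subset of m < n features, and a majority vote) can be sketched as follows. This is a simplified, hypothetical illustration and not the authors' implementation: to keep it short, each "tree" is a one-split decision stump chosen by training error, rather than a fully grown, unpruned tree split by gain ratio. The helper names and the default m = floor(sqrt(n)) are assumptions; `n_trees` defaults to 150, the number of trees the paper reports for its tuned model.

```python
# Simplified, hypothetical sketch of Random Forest-style bagging:
# bootstrap samples + a random feature subset per tree + majority vote.
# Each "tree" is a one-split decision stump chosen by training error;
# a real Random Forest grows full, unpruned trees.
import random
from collections import Counter


def majority(labels):
    """Most common label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(xs, ys, feature_ids):
    """Return a classifier that splits on the best (feature, threshold) pair
    among `feature_ids`, scored by the number of training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            if not right:  # threshold at the maximum value: not a real split
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every candidate feature is constant in this sample
        label = majority(ys)
        return lambda x: label
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def fit_forest(xs, ys, n_trees=150, m=None, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    n = len(xs[0])  # number of features
    m = m or max(1, int(n ** 0.5))  # assumed default: about sqrt(n) features
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]  # with replacement
        feats = rng.sample(range(n), m)  # random feature subset
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Classify x by majority vote over all trees."""
    return majority([tree(x) for tree in forest])


# Toy two-cluster data, invented for illustration.
xs = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
ys = ["Pass"] * 4 + ["Excellent"] * 4
forest = fit_forest(xs, ys)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))
```

The fixed `seed` makes the bootstrap draws reproducible; in practice one would vary it, or use a library implementation, to check that conclusions do not depend on a single random forest.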
By training and tuning the parameters of the Random Forest model, the best recognition performance was obtained with the number of trees set to 150 and the criterion set to gain_ratio. The experimental results of the Random Forest after repeated runs are shown in Figure 9. Across multiple experiments, the best accuracy obtained is 92.50%.
by Neural Net. The results of all indicators show that the Random Forest prediction model has the best performance and can be used to improve the prediction of the information literacy learning effect of college students.
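To make the comparison over "all indicators" concrete, the following sketch computes the metrics of Eqs. (2)-(6) from lists of actual and predicted learning-effect labels, together with the ten-fold train/test index split. It is a hypothetical illustration, not the authors' code: macro-averaging (the mean of per-class one-vs-rest values) is an assumption, because the paper does not state how per-class Precision, Recall, and F1 were combined, and the folds are formed by a simple stride without shuffling or stratification.

```python
# Hypothetical sketch of Eqs. (2)-(6) for multi-class labels, plus a
# ten-fold split. Macro-averaging is an assumption, not the paper's method.
from collections import Counter


def evaluate(actual, predicted):
    """Accuracy, macro Precision/Recall/F1, and the kappa coefficient."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    labels = sorted(set(actual) | set(predicted))
    cm = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    prec, rec, f1 = [], [], []
    for c in labels:  # one-vs-rest view: class c is "positive"
        tp = cm[(c, c)]
        fp = sum(cm[(a, c)] for a in labels if a != c)
        fn = sum(cm[(c, p)] for p in labels if p != c)
        p = tp / (tp + fp) if tp + fp else 0.0  # Eq. (3)
        r = tp / (tp + fn) if tp + fn else 0.0  # Eq. (4)
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)  # Eq. (5)
    po = sum(cm[(c, c)] for c in labels) / n  # observed agreement = accuracy
    act, pred = Counter(actual), Counter(predicted)
    pc = sum(act[c] * pred[c] for c in labels) / n ** 2  # chance agreement
    kappa = (po - pc) / (1 - pc) if pc < 1 else float("nan")  # Eq. (6)
    k = len(labels)
    return {"accuracy": po, "precision": sum(prec) / k,
            "recall": sum(rec) / k, "f1": sum(f1) / k, "kappa": kappa}


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


m = evaluate(["A", "A", "B", "B"], ["A", "B", "B", "B"])
print(m["accuracy"], m["kappa"])  # 0.75 0.5
splits = list(k_fold_indices(320))  # 320 samples, as in the study
print(len(splits), len(splits[0][1]))  # 10 32
```

In the toy call, three of four labels agree (Po = 0.75) while chance agreement is Pc = 0.5, so kappa is 0.5: noticeably lower than accuracy, which is why kappa is reported alongside it.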
algorithms applied to learning effect prediction modeling, which is broadly consistent with the findings of related studies such as Wang Juan et al. (2022) [57] and Sun Faqin (2019) [58]. By using the Random Forest model to predict the learning effect of college students' information literacy, we can predict learning effect in information literacy education more accurately, guide the adjustment of teaching behaviors and the allocation of teaching resources, and help guarantee teaching quality.

V. CONCLUSION
The results indicate that the prediction model proposed in this paper is of significant value for cultivating the information literacy of college students. On the one hand, algorithmic analysis of the learning behavior characteristics of college students' information literacy reveals a more significant correlation of information thinking and information application skills with learning effect. Emphasis should therefore be placed on cultivating information thinking, while not neglecting the cultivation of information acquisition ability. Universities should recognize the importance and urgency of information literacy education for college students from the standpoint of sustainable talent development and the cultivation of high-quality, innovative talents. It is necessary to make full use of network and multimedia technologies to provide intelligent learning tools and learning environments conducive to independent, cooperative, and research-based learning; to incorporate critical thinking methods into the information literacy education system; to establish a long-term assessment mechanism oriented to the cultivation of critical thinking; and to actively promote college students' critical thinking, cognition, and knowledge creation skills. On the other hand, it is necessary to further optimize learning methods in areas such as information laws and regulations, streamline learning content in areas such as information perception awareness, and continuously guide college students to establish the concept of lifelong learning. This study provides a more reliable data basis for educational administrators to analyze the potential connections between information literacy education phenomena and outcomes, thereby increasing the success rate of educational decisions.

In conclusion, this study proposes an effective machine learning approach to characterize the learning behaviors and predict the learning effects of information literacy among college students. Its data-driven approach helps teachers and students optimize their learning paths and improves the effectiveness of information literacy instruction. It also supports the implementation of differentiated teaching decision-making [59] and the construction of a long-term mechanism for differentiated educational decisions through a data-driven approach.

Although this study has conducted some exploration, it still has limitations. While machine learning approaches work to some extent, the study suggests that technological tools need to be applied according to the specific teaching and learning situation. Further research is proposed in the following areas. (1) Learning effect prediction has not yet covered other possible factors in learning scenarios, which places higher demands on the quality of the learning behavior trait scale. We will explore learning behavior characteristics in more scenarios, refine the learning behavior characteristics scale, improve its generality, and form a closed loop of textbook development, teaching experiment, teaching research, and teaching practice. (2) This study uses only five supervised classification algorithms: Decision Tree, KNN, Naive Bayes, Neural Net, and Random Forest. In subsequent studies, the adopted algorithms can be collectively improved to achieve better prediction results.

ACKNOWLEDGMENT
The author would like to thank everyone who made every effort to improve the content of this research paper. Research for this paper was partially supported by the Academic Affairs Office of Huainan Normal University. Special thanks to the anonymous reviewers and editors for their work on the publication of this paper. This research was funded by the Scientific Research Project of the Education Department of Anhui Province of China, grant number 2020jyxm1733; the Scientific Research Project of the Education Department of Anhui Province of China, grant number 2020qkl38; and the Key Research Project of Huainan Normal University in Natural Science Category, grant number 2022XJZD027.

REFERENCES
[1] Zhang Changhai, “Research on the information literacy education model of Chinese college students based on critical thinking and creativity,” Journal of China Library, vol. 4, no. 15, pp. 15-16, Aug. 2016, doi: 10.13530/j.cnki.jlis.164008.
[2] Shang Hui, “Information literacy education for college students,” Educational theory and practice, vol. 30, no. 10, pp. 38-39, Oct. 2008, doi: CNKI:SUN:JYLL.0.2008-30-016.
[3] Yan D, Li G, “A Heterogeneity Study on the Effect of Digital Education Technology on the Sustainability of Cognitive Ability for Middle School Students,” Sustainability, vol. 15, no. 3, pp. 2784-2786, Feb. 2023, doi: 10.3390/su15032784.
[4] Yang Xiaohong, Zheng Xin, Zhang Jing, “Research on the Chinese Experience of the ‘Epidemic’ of the Education War - Online Education Perspective,” China Distance Education, vol. 43, no. 1, pp. 1-11, Jan. 2023, doi: 10.13541/j.cnki.chinade.2023.01.002.
[5] Gao Shan, Fan Shan, “Digital Literacy Practice and Enlightenment of the University of Queensland Library in Australia,” Library Science Research, vol. 7, no. 10, pp. 95-101, Jul. 2019, doi: 10.15941/j.cnki.issn1001-0424.2019.10.015.
[6] Wu Fati, Tian Hao, “Mining the characteristics of meaningful learning behavior: a framework for predicting learning outcomes,” Open education research, vol. 25, no. 6, pp. 75-82, Jan. 2019, doi: 10.13966/j.cnki.kfjyyj.2019.06.008.
[7] Hu Hang, Li Yaxin, Lang Qi'e, Yang Hairu, Zhao Qiuhua, Cao Yifan, “The occurrence process, design model and mechanism explanation of deep learning,” China's distance education, vol. 5, no. 1, pp. 54-61, Mar. 2020, doi: 10.13541/j.cnki.chinade.2020.01.005.
[8] AlShammari, I. A., Aldhafiri, M. D., & Al-Shammari, Z., “A Meta-Analysis of Educational Data Mining on Improvements in Learning Outcomes,” College Student Journal, vol. 47, no. 2, pp. 326-333, Jun. 2013.
[9] Wang Gaihua, Fu Gangshan, “Prediction of online learning behavior and achievement and design of learning intervention model,” China's distance education, vol. 2, no. 2, pp. 39-48, Mar. 2019, doi: 10.13541/j.cnki.chinade.20181214.007.
[10] UNESCO. [Online]. Available: https://www.unesco.org/en/articles/challenges-and-opportunities-artificial-intelligence-education. Accessed on: Apr. 1, 2023.
[11] Yang, G.F., Wen, B., & Lin, W., “Research Status, Hot Spots and Enlightenment of College Students' Information Literacy: Based on Bibliometric Analysis of CNKI from 2000 to 2021,” Proceedings of the 4th World Symposium on Software Engineering, no. 6, pp. 161-166, Sept. 2022, doi: 10.1145/3568364.3568389.
[12] Li, J., “Machine Learning-Based Evaluation of Information Literacy Enhancement among College Teachers,” International Journal of Emerging Technologies in Learning (IJET), vol. 17, no. 22, pp. 116-131, Nov. 2022, doi: 10.3991/ijet.v17i22.35117.
[13] Yu, L., Wu, D., Yang, H.H., & Zhu, S., “Smart classroom preferences and information literacy among college students,” Australasian Journal of Educational Technology, vol. 38, no. 2, pp. 144-163, Feb. 2022, doi: 10.14742/ajet.7081.
[14] Ying, Y., “Research on college students’ information literacy based on big data,” Cluster Computing, vol. 22, no. 2, pp. 3463-3470, Mar. 2019, doi: 10.1007/s10586-018-2193-0.
[15] Mian, Z., Bai, Y., & ur, R.K., “Research on College Computer-Computing and Information Literacy online course based on MOOC: taking the North Minzu University as an example,” 2021 IEEE 3rd International Conference on Computer Science and Educational Informatization (CSEI), vol. 6, no. 7, pp. 300-306, Jun. 2021, doi: 10.1109/CSEI51395.2021.9477751.
[16] Haider, M.S. and Ya, C., “Assessment of information literacy skills and information-seeking behavior of medical students in the age of technology: a study of Pakistan,” Information Discovery and Delivery, vol. 49, no. 1, pp. 84-94, Feb. 2021, doi: 10.1108/IDD-07-2020-0083.
[17] Ouyang, X., Xiao, Y., & Zhong, J., “Research on the Influencing Factors and the Promotion Measures of College Students’ Information Literacy,” 2016 8th International Conference on Information Technology in Medicine and Education (ITME), vol. 12, no. 12, pp. 728-732, Dec. 2016, doi: 10.1109/ITME.2016.0169.
[18] Nishikawa, T., & Izuta, G., “The information technology literacy level of newly enrolled female college students in Japan,” Humanities & Social Sciences Reviews, vol. 7, no. 1, pp. 1-10, Mar. 2019, doi: 10.18510/hssr.2019.711.
[19] Diani, R., Susanti, A.L., Lestari, N., Yuberti, Saputri, M., & Fujiani, D., “The influence of connecting, organizing, reflecting, and extending (CORE) learning model toward metacognitive abilities viewed from students’ information literacy in physics learning,” Journal of Physics: Conference Series, vol. 1796, no. 1, pp. 1-9, Feb. 2021, doi: 10.1088/1742-6596/1796/1/012073.
[20] Liu, F., Zhang, Q., “A New Reciprocal Teaching Approach for Information Literacy Education under the Background of Big Data,” Int. J. Emerg. Technol. Learn., vol. 16, no. 3, pp. 246-260, Feb. 2021, doi: 10.3991/ijet.v16i03.20459.
[21] Masko, M.K., Thormodson, K., & Borysewicz, K., “Using Case-Based Learning to Teach Information Literacy and Critical Thinking Skills in Undergraduate Music Therapy Education: A Cohort Study,” Music Therapy Perspectives, vol. 38, no. 2, pp. 143-149, Oct. 2020, doi: 10.1093/mtp/miz025.
[22] Xu, R., Wang, C., Hsu, Y., & Wang, X., “Research on the Influence of DNN-Based Cross-Media Data Analysis on College Students' New Media Literacy,” Computational Intelligence and Neuroscience, vol. 2022, Article ID 9224834, Aug. 2022, doi: 10.1155/2022/9224834.
[23] Aydın, M., “A multilevel modeling approach to investigating factors impacting computer and information literacy: ICILS Korea and Finland sample,” Education and Information Technologies, vol. 27, no. 2, pp. 1675-1703, Aug. 2021, doi: 10.1007/s10639-021-10690-1.
[24] Sun, Y., Tan, Z., Li, Z., & Long, S., “Predicting and Analyzing College Students’ Performance Based on Multifaceted Data Using Machine Learning,” 2022 4th International Conference on Advances in Computer Technology, Information Science and Communications (CTISC), no. 4, pp. 1-6, Apr. 2022, doi: 10.1109/CTISC54888.2022.9849815.
[25] Guo, J., & Xu, T., “IT Monitoring and Management of Learning Quality of Online Courses for College Students Based on Machine Learning,” Mobile Information Systems, vol. 2022, Article ID 5501322, Sept. 2022, doi: 10.1155/2022/5501322.
[26] Jia, Y., & Wang, E., “Research on Information Anxiety of College Students under the Background of Information Overloaded Based on Support Vector Machine Optimization Alogrithm,” 2021 2nd International Conference on Information Science and Education (ICISE-IE), no. 11, pp. 484-487, Nov. 2021, doi: 10.1109/ICISE-IE53922.2021.00117.
[27] Pei, C., “The Construction of a Prediction Model for the Teaching Effect of Two Courses Education in Colleges and Universities Based on Machine Learning Algorithms,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 1167454, Aug. 2022, doi: 10.1155/2022/1167454.
[28] Xu, H., “GBDT-LR: A Willingness Data Analysis and Prediction Model Based on Machine Learning,” 2022 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), no. 8, pp. 396-401, Aug. 2022, doi: 10.1109/AEECA55500.2022.9919013.
[29] Li, T., “Students’ Numeracy and Literacy Aptitude Analysis and Prediction Using Machine Learning,” Journal of Computer and Communications, vol. 10, no. 8, pp. 90-103, Aug. 2022, doi: 10.4236/jcc.2022.108006.
[30] Shen, X., & Yuan, C., “A College Student Behavior Analysis and Management Method Based on Machine Learning Technology,” Wirel. Commun. Mob. Comput., vol. 2021, Article ID 3126347, Aug. 2021, doi: 10.1155/2021/3126347.
[31] Wang, R., “Research on effect of college english blended teaching mode under small private online course based on machine learning,” SN Applied Sciences, vol. 5, no. 1, pp. 1-13, Jan. 2023, doi: 10.1007/s42452-023-05278-y.
[32] Rusdiana, E., Violinda, Q., Pramana, C., Purwoko, R.Y., Chamidah, D.D., Rahmah, N., Prihanto, Y.J., Hasnawati, F.Y., Susanti, R., Haimah, A.Y., Purwahida, R., Arkiang, F., Equatora, M.A., & Mujib, A., “College Students’ Perception of Electronic Learning During Covid-19 Pandemic in Indonesia: A Cross-Sectional Study,” Journal of Higher Education Theory and Practice, vol. 10, no. 13, pp. 29-44, Oct. 2022, doi: 10.33423/jhetp.v22i13.5505.
[33] Liu, X., & Yang, C., “Analysis of College Students' Practical Teaching Effect Based on Machine Learning Correlation Analysis Algorithm: Take the Software Technology Course as an Example,” 8th International Conference on Education, Management, Information and Management Society, vol. 8, no. 1, pp. 782-788, Aug. 2018, doi: 10.2991/emim-18.2018.159.
[34] Akram, H., Abdelrady, A. H., Al-Adwan, A. S., & Ramzan, M., “Teachers’ Perceptions of Technology Integration in Teaching-Learning Practices: A Systematic Review,” Frontiers in Psychology, vol. 13, p. 920317, Jun. 2022, doi: 10.3389/fpsyg.2022.920317.
[35] Hussain, M., Zhu, W., Zhang, W., & Abidi, S. M. R., “Student Engagement Predictions in an e-Learning System and Their Impact on Student Course Assessment Scores,” Computational Intelligence and Neuroscience, vol. 2018, Article ID 6347186, Oct. 2018, doi: 10.1155/2018/6347186.
[36] Karthikeyan, V.G., Thangaraj, P., & Karthik, S., “Towards developing hybrid educational data mining model (HEDM) for efficient and accurate student performance evaluation,” Soft Computing, vol. 6, no. 24, pp. 18477-18487, Jun. 2020, doi: 10.1007/s00500-020-05075-4.
[37] Hu Hang, Du Shuang, Liang Jiarou, Kang Zhonglin, “Construction of learning performance prediction model: from big data analysis of learning behavior,” China's distance education, no. 4, pp. 8-20, May 2021, doi: 10.13541/j.cnki.chinade.
[38] Mou Zhijia, Wu Fati, “MOOC learning results prediction index exploration and learning group characteristics analysis,” Research