Daniel Project
HUMAN TRAFFICKING IDENTIFICATION AND PREDICTION
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of “Danielson Kwame Klutsey”,
Enrollment No: A9920122002949 (el) who carried out the Project work as a team entitled “Human
Trafficking Identification and Prediction” under my supervision from March 2024 to May 2024. The
project work embodies the original research work undertaken by the candidate and meets the requirements
for the partial fulfillment of M.B.A in DATA SCIENCE. This project report has not been submitted
DATE : 01-05-2024
DECLARATION
I, “Danielson Kwame Klutsey”, Enrollment No A9920122002949 (el), hereby declare that the major
project on human trafficking identification and prediction using Python, entitled “Human Trafficking
Identification And Prediction”, is the result of my own original research work and has been carried out
under the guidance of “Dr. Francis Sarkodie M.P.M., Ph.D”. All sources of information and assistance
utilized during the course of this project have been duly acknowledged and cited in the bibliography.
DATE: 01-05-2024
ACKNOWLEDGEMENT
encouragement in doing this project and for completing it successfully. I am grateful to them.
I convey my thanks to “Dr. Francis Sarkodie M.P.M., Ph.D” for providing me necessary
I would like to express my sincere and deep sense of gratitude to my Project Guide “Dr.
Francis Sarkodie M.P.M., Ph.D”, for his valuable guidance, suggestions and constant
I wish to express my thanks to all Teaching and Non-teaching staff members of the Amity
University who were helpful in many ways for the completion of the project.
ABSTRACT
Human trafficking is a global crime that affects millions of individuals, yet it remains a highly
challenging issue for law enforcement and other relevant authorities to combat effectively. To
address this complex problem, this study proposes a Comprehensive Human Trafficking
Identification and Prediction System (CHTIPS) that leverages the power of machine learning
human trafficking and to predict the likelihood of individuals becoming victims. The system
utilizes a variety of data sources, including social media posts, financial transactions, and
algorithms, such as support vector machines, random forests, and neural networks, are
employed to extract patterns and identify potential indicators of human trafficking. These
algorithms are trained using labeled data, consisting of known cases of human trafficking, to
enhance their predictive capabilities. The system also incorporates real-time data, allowing
for dynamic updates and adjustments. Through the integration of diverse data sources and
advanced machine learning algorithms, CHTIPS aims to improve the accuracy and timeliness
of human trafficking identification and prediction. Such a system could greatly assist law
enforcement agencies in their efforts to detect and prevent human trafficking, leading to more
effective interventions and increased victim support. However, further research and
collaboration with relevant stakeholders are needed to refine and validate CHTIPS in real-
world scenarios.
TABLE OF CONTENTS
CHAPTER 1 : INTRODUCTION…………………………………………………….01-04
CHAPTER 8 : CONCLUSION………………………………………………………….51-54
CHAPTER 9 : REFERENCES……………………………………………………………55-56
APPENDIX
A. SOURCE CODE……………………………………………………………………..57-69
B. SCREENSHOTS……………………………………………………………………..70-73
C. RESEARCH PAPER………………………………………………………………...74-87
LIST OF FIGURES
Fig B.1 Source Data of Different people from different states…………………………70
Fig B.2 Bar diagram of collection of data of various states Age wise………………….70
Fig B.3 Bar diagram of collection of data of various states Gender wise……………….71
Fig B.4 Bar diagram of collection of data of various states Education wise…………….71
Fig B.7 Diagram Showing Web Page created using code in Python…………………..73
CHAPTER 1
INTRODUCTION
Human trafficking is a global crime that involves the exploitation of people through force,
fraud, or coercion for various purposes such as forced labor, sexual exploitation, or
complexities involved in identifying and predicting human trafficking cases have led to the
automating processes and analyzing large volumes of data to detect patterns and indicators
populations, such as women, children, migrants, and refugees, are particularly at risk due
and physical violence, to control and manipulate their victims. Moreover, the clandestine
nature of human trafficking makes it difficult to gather accurate data and evidence,
hampering efforts to combat the crime effectively. Consequently, there is a pressing need
for innovative approaches that leverage emerging technologies like machine learning to
entail the use of algorithms that can process vast amounts of data from diverse sources,
including social media, online advertisements, financial transactions, and law enforcement
records, to identify potential victims and traffickers. By analyzing these data sets, machine
learning algorithms can detect hidden patterns, correlations, and anomalies that may
indicate trafficking activities, aiding in the early identification and prevention of such
crimes. In conclusion, human trafficking is a complex and pervasive crime that necessitates
possibilities for the development of comprehensive systems for better user interface.
Identification and prediction play a crucial role in combating human trafficking, and their
holding traffickers accountable. Often, victims of human trafficking are hidden in plain
agencies can leverage cutting-edge technology to identify potential victims and gather
evidence to build strong cases against traffickers. Machine learning algorithms can sift
through vast amounts of data, such as social media interactions, online advertisements, and
financial transactions, to identify patterns and indicators of human trafficking. This can
help in locating individuals at risk, tracking the movement of victims, and apprehending
the perpetrators involved. Identifying victims not only offers them a chance to escape their
captors but also allows support services to intervene and provide them with necessary
proactive measures to be taken. By analyzing historical data and patterns, machine learning
algorithms can anticipate where and when human trafficking is likely to occur. This
resources and focus their efforts on potential hotspots. Furthermore, with the ability to
predict specific individuals or groups who are at higher risk of being trafficked, preventive
can increase surveillance at transportation hubs, strengthen border control, and enhance
Prediction can also aid in targeting the underlying causes of human trafficking, such as
authorities can enhance their ability to identify victims, gather evidence against traffickers,
and provide assistance to survivors. Moreover, the predictive capabilities of such systems
One of the main challenges in identifying and predicting human trafficking is the lack of
reliable and comprehensive data. Human trafficking often operates in the shadows, making
it difficult to obtain accurate information about its scale and scope. Many victims are afraid
data about their experiences. Additionally, the clandestine nature of human trafficking
means that law enforcement agencies and other relevant organizations may not have access
diversity of human trafficking cases. Human trafficking can take many forms, such as
forced labor, sex trafficking, and child trafficking. Each type of trafficking requires unique
identification and prediction strategies, as the risk factors and indicators can vary.
Moreover, human trafficking networks often adapt and evolve their tactics, making it
difficult to keep up with these changing patterns. Machine learning techniques need to be
able to analyze and adapt to these diverse and evolving scenarios to accurately predict
Furthermore, the ethical considerations surrounding the use of machine learning in human
predictive models requires the use of historical data, which may include sensitive
that this data is stored securely and used ethically. Additionally, the deployment of machine
learning systems should not replace the work of human experts and frontline organizations.
and ensuring that human rights and dignity are protected throughout the process.
comprehensive system using machine learning techniques include the lack of reliable data,
the complexity and diversity of human trafficking cases, and the ethical considerations
CHAPTER 2
STUDY HYPOTHESIS
Human trafficking represents a grave violation of human rights and a significant global
challenge. Traditional methods for identifying and combating human trafficking often rely
biases. Therefore, there is a pressing need for more effective and efficient approaches to
identify, track, and predict human trafficking activities. This study aims to explore the
At the core of this project lies the hypothesis that machine learning algorithms can be
trained to recognize patterns and indicators of human trafficking from various types of data
sources. These data sources may include but are not limited to, online advertisements,
social media activities, financial transactions, transportation records, and law enforcement
reports. By analyzing these diverse datasets, machine learning models can potentially
uncover hidden connections, identify red flags, and extract valuable insights that may not
One key aspect of this hypothesis is the assumption that human trafficking leaves
discernible digital footprints across different data domains. These footprints may manifest
machine learning algorithms, it is postulated that these digital footprints can be captured
and leveraged to develop predictive models capable of identifying and predicting human
Furthermore, the hypothesis suggests that machine learning models can adapt and improve
over time as they are fed more data and exposed to new patterns of human trafficking
and reinforcement learning, these models can continuously refine their predictive
capabilities and stay abreast of evolving trends and tactics employed by traffickers. This
adaptive nature of machine learning holds the promise of creating robust and resilient
Additionally, the hypothesis posits that machine learning algorithms can help overcome
Unlike manual methods that may be hindered by human biases, cognitive limitations, and
the sheer volume of data to sift through, machine learning models are capable of
processing large-scale datasets rapidly and objectively. Moreover, they can detect subtle
patterns and correlations that may elude human investigators, thereby augmenting the
In summary, this study hypothesizes that by harnessing the power of machine learning
techniques and leveraging diverse datasets, it is possible to develop robust, accurate, and
proactive systems for human trafficking identification and prediction. These systems have
the potential to revolutionize the way we combat human trafficking, enabling law
CHAPTER 3
LITERATURE REVIEW
identification and prediction system using these techniques. The study, published in
the WMU Journal of Maritime Affairs, provides insights into the potential of machine
2. Summers, L., Shallenberger, A. N., Cruz, J., & Fulton, L. V. (2023). A Multi-
472.
In this research paper, Summers et al. (2023) present a novel approach for classifying
sex trafficking from online escort advertisements. The authors propose a multi-input
machine learning system that combines textual and visual features to accurately
comprehensive system that can effectively predict and identify human trafficking
activities. This research contributes to the field of machine learning and knowledge
3. Youssef, B., Bouchra, F., & Brahim, O. (2023, March). State of the Art
Youssef, B., Bouchra, F., and Brahim, O. conducted a study on the state of the art
literature regarding anti-money laundering using machine learning and deep learning
using machine learning techniques. Their findings were published in the conference
4. Ray, A., Arora, V., Maass, K., & Ventresca, M. (2023). Optimal resource
Transactions, 1-15.
In their study, Ray, Arora, Maass, and Ventresca (2023) propose an optimal resource
study aims to improve the accuracy of human trafficking detection while minimizing
false positives and false negatives. The findings of this research contribute to the
5. Gakiza, J., Jilin, Z., Chang, K. C., & Tao, L. (2022). Human trafficking solution
Gakiza, J., Jilin, Z., Chang, K. C., & Tao, L. (2022) presented a research paper titled
"Human trafficking solution by deep learning with Keras and OpenCV" at the
system using machine learning techniques. The system utilizes deep learning
associated with human trafficking. The paper discusses the methodology and
Diagnose Eye diseases using Deep Learning Techniques. In 2022 4th International
Agarwal, S., & Bhat, A. (2022) conducted a study on investigating ophthalmic images
to diagnose eye diseases using deep learning techniques. The research was presented at
Control and Networking (ICAC3N) and published by IEEE. The authors focused on
diagnosis and prediction of eye diseases. The paper discusses the methods and findings
of the study, emphasizing the potential of deep learning techniques in the field of
ophthalmology.
7. Li, C., Zhu, B., Zhang, J., Guan, P., Zhang, G., Yu, H., ... & Liu, L. (2022).
and age-related eye diseases in mainland China. Frontiers in Public Health, 10,
966006.
Li et al. (2022) conducted a study on the epidemiology, health policy, and public
China. Their research aimed to provide insights into the prevalence, risk factors, and
impact of these conditions in the Chinese population. The study findings contribute to
visual impairment and age-related eye diseases. The research highlights the
significance of integrating public health measures and health policies to address the
12, 14.
Their research focuses on the application of artificial intelligence for the accurate
identification of these eye conditions. The study explores the potential of machine
these diseases. This research holds promise for the development of a comprehensive
9. Cheng, Y., Ren, T., & Wang, N. (2023). Biomechanical homeostasis in ocular
In their mini-review, Cheng, Ren, and Wang (2023) explore the concept of
significance of maintaining this balance and its implications for understanding and
10. Sanghavi, J., & Kurhekar, M. (2023). Ocular disease detection systems based
In their survey paper, Sanghavi and Kurhekar (2023) explore ocular disease detection
systems that rely on fundus images. They discuss the potential of these systems for early
The existing system for human trafficking identification and prediction relies mainly
on manual methods and limited data analysis. Law enforcement agencies and
and analysis of official records to identify potential human trafficking cases. However,
trafficking is a major global issue that involves the forced exploitation of individuals
for various purposes such as labor, sex trafficking, and organ trafficking. The system
will leverage the power of machine learning algorithms and techniques to analyze
large volumes of data and identify patterns and indicators of human trafficking. By
analyzing various data sources such as social media, online advertisements, financial
transactions, and immigration records, the system will be able to identify potential
Data Sensitivity: Human trafficking data is highly sensitive. Ensuring the privacy
False Negatives: Failing to identify a genuine trafficking case can result in continued
Bias in Data: The training data might be biased towards certain regions, demographics,
computationally challenging.
CHAPTER 4
RESEARCH METHODOLOGY
In this pivotal phase, our primary aim is to bolster server performance, spearheaded by the
plan alongside detailed cost estimates. Simultaneously, as we delve into system analysis,
an exhaustive feasibility study of the proposed system takes precedence. This necessitates
not only a profound understanding of the system's major requirements but also a holistic
This in-depth investigation probes the economic landscape, scrutinizing the projected
impact of the system on the organization's financial framework. Given the finite pool of
Here, our focus intensifies on the intricate technical fabric of the system, meticulously
assessing its requirements. It is paramount that the system's development exerts minimal
strain on available technical resources, thus averting undue burdens on the client. By
thereby minimizing the need for significant alterations during implementation and ensuring
seamless integration.
This facet of the study delves into the human aspect, probing the system's acceptance
among users. Central to this endeavor is comprehensive user training aimed at fostering
efficient system utilization. Our paramount goal is to instill user confidence and
acceptance, positioning the system not as a threat but as an indispensable tool within their
operational ecosystem. Efforts are channeled towards seamlessly integrating the system
into existing workflows, ensuring its perceived necessity and facilitating smooth adoption.
Python: As the primary language for machine learning model development and data
processing due to its extensive libraries like scikit-learn, TensorFlow, and PyTorch.
TensorFlow or PyTorch: For building and training deep learning models to handle complex
Scikit-learn: For traditional machine learning algorithms such as Random Forest, Support
Vector Machines (SVM), and Logistic Regression for feature engineering and model
comparison.
Flask or Django: To develop a web-based interface or API for the deployment and
SQL or NoSQL Database: Depending on the nature of the data, choose a suitable database
PostgreSQL, MySQL, MongoDB: Examples of databases that can handle large-scale data
storage and retrieval, essential for managing datasets related to human trafficking
incidents.
Matplotlib and Seaborn: For generating visualizations such as histograms, scatter plots,
Plotly or Tableau: For creating interactive and informative dashboards to visualize patterns
NLTK (Natural Language Toolkit) or SpaCy: For preprocessing and analyzing textual
data, extracting relevant information from textual sources such as social media posts,
Word Embedding Models (Word2Vec, GloVe): To convert textual data into numerical
GeoPandas: For handling geospatial data and performing spatial operations such as
GIS Software (ArcGIS, QGIS): For advanced geospatial analysis and visualization of
and prevent overfitting by splitting the data into training and testing subsets.
Model Deployment Platforms (AWS, Azure, Google Cloud): For deploying trained
Encryption and Access Control Mechanisms: To safeguard sensitive data related to human
HIPAA.
preserving the utility of the data for analysis and model training.
Version Control Systems (Git, GitHub, GitLab): For tracking changes in codebase,
Project Management Tools (Jira, Trello): For coordinating tasks, setting milestones, and
Jupyter Notebooks: For creating interactive documents combining code, visualizations, and
Sphinx or MkDocs: For generating documentation from code comments and markdown
files, providing comprehensive guidelines for project setup, usage, and maintenance.
Automated Testing Frameworks (PyTest, UnitTest): For writing and executing test cases to
Continuous Integration Platforms (Travis CI, Jenkins): For automating the build, testing,
and deployment processes, ensuring code quality and stability throughout the development
lifecycle.
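To illustrate the model comparison mentioned in the Scikit-learn entry above, the following minimal sketch trains Random Forest, SVM, and Logistic Regression classifiers on a single train/test split. The data are synthetic (generated with make_classification) because the project dataset is not reproduced here; the feature counts, split ratio, and hyperparameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 400 records, 10 numeric features, binary label.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Hold out 25% of the records for testing, keeping the class ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVM and Logistic Regression are sensitive to feature scale, so they are
# wrapped in a pipeline that standardizes the inputs first.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Fit each model on the training subset and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, accuracy in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: test accuracy = {accuracy:.3f}")
```

Accuracy alone can be misleading when trafficking cases are rare in the data, so a comparison of this kind would normally also report precision and recall.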
CHAPTER 5
surveillance systems and leveraging advanced data analytics techniques, the system
can flag suspicious activities in real time, enabling law enforcement agencies to take
The data collection and cleaning phase is a crucial step in the development of a
learning techniques. This phase involves gathering relevant data from various sources,
such as law enforcement databases, social media platforms, and victim interviews, to
create a comprehensive dataset. The collected data needs to be cleaned and preprocessed
to ensure its quality and consistency. This includes handling missing values, outliers, and
protect the identities of victims and perpetrators. By effectively collecting and cleaning
data, accurate and reliable models can be developed to identify and predict human
human trafficking cases. These data sources serve as the foundation for training the
machine learning algorithms to detect patterns and make predictions in the realm of
human trafficking. These sources can include various types of data such as official
reports from law enforcement agencies, court records, victim testimonies, online
advertisements, social media posts, and news articles. Additionally, collaboration with
organizations working against human trafficking can provide valuable insights and
access to datasets that may not be publicly available. It is important to ensure the data
trafficking to increase the effectiveness and accuracy of the machine learning system. By
gathering and analyzing these relevant data sources, the developed system can contribute
prediction system involves the removal of irrelevant or duplicate data entries through the
utilization of machine learning techniques. This step plays a crucial role in ensuring the
accuracy and efficiency of the system. By eliminating irrelevant data, the system can
focus solely on analyzing relevant information, thereby enhancing its ability to identify
potential human trafficking cases. Additionally, the removal of duplicate data entries helps
techniques such as data pre-processing algorithms and anomaly detection methods can be
employed to achieve this task effectively. These techniques enable the system to identify
and filter out irrelevant or duplicate entries based on predefined criteria and patterns. As a
result, the comprehensive human trafficking identification and prediction system can
provide more accurate and reliable insight into the occurrence and likelihood of human
prediction system that employs machine learning techniques. However, the paper lacks
detailed information on the specific machine learning algorithms used, the size and
characteristics of the dataset used for training the models, and the evaluation metrics
employed to assess the performance of the proposed system. Additionally, the study does
not provide information on the sources of data used for training and testing, which might
impact the reliability and generalizability of the results. Furthermore, the paper could
benefit from a discussion on the potential limitations and ethical considerations associated
with using machine learning techniques in the context of human trafficking detection. A
more thorough explanation of the features and variables included in the dataset, as well as
the rationale behind their selection, would also contribute to a better understanding of the
system's effectiveness. Finally, it would be valuable to discuss potential future work and
areas for improvement in the proposed system, such as exploring different feature
Feature selection and engineering are crucial aspects of building a comprehensive human
trafficking identification and prediction system using machine learning techniques. In this
context, feature selection involves identifying the most pertinent features that contribute
significantly to the detection and prediction of human trafficking activities. This process
helps to eliminate irrelevant or redundant features, reducing the complexity and improving
the efficiency of the system. By selecting the most informative features, the model can
focus on the most important aspects of human trafficking, resulting in more accurate
predictions. On the other hand, feature engineering involves creating new features or
transforming existing ones to enhance the predictive power of the system. This technique
enables the model to capture hidden patterns or relationships that may not be apparent in
the original dataset. By engineering features that capture meaningful information about the
nature of human trafficking, such as location, age, gender, and socio-economic factors, the
system can generate more robust predictions. Together, feature selection and engineering
play a pivotal role in designing and developing a powerful machine learning system that
In the field of human trafficking identification and prediction, various machine learning
preprocessing techniques, such as data cleaning, imputation, and normalization, are used to
ensure the quality and consistency of the data. Feature selection methods, such as genetic
algorithms, information gain, and principal component analysis, are applied to identify the
most relevant features from the dataset. Finally, various classification algorithms, such as
decision trees, support vector machines, and artificial neural networks, are used to train
predictive models that can identify and predict human trafficking patterns. These
techniques leverage the power of machine learning to analyze large amounts of data, detect
subtle patterns, and generate accurate predictions. By integrating these techniques into a
comprehensive system, law enforcement agencies and NGOs can enhance their efforts in
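The steps named above (imputation, normalization, feature selection by information gain, and a decision tree classifier) can be chained as in the sketch below. It assumes scikit-learn, uses synthetic data with values removed artificially, and approximates information gain with mutual information. None of the parameter values come from the project itself.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 300 records, 12 numeric features.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, random_state=1
)

# Blank out roughly 5% of the cells to mimic missing values in real records.
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.05] = np.nan

pipeline = Pipeline(
    steps=[
        # Imputation: replace each missing value with the column median.
        ("impute", SimpleImputer(strategy="median")),
        # Normalization: rescale every feature to the range [0, 1].
        ("normalize", MinMaxScaler()),
        # Feature selection: keep the 4 features with the highest mutual information.
        ("select", SelectKBest(partial(mutual_info_classif, random_state=1), k=4)),
        # Classification: a shallow decision tree.
        ("classify", DecisionTreeClassifier(max_depth=5, random_state=1)),
    ]
)

# 5-fold cross-validation refits every step on each training fold,
# so the held-out fold never influences imputation or selection.
cv_scores = cross_val_score(pipeline, X, y, cv=5)
print(f"mean accuracy over 5 folds: {cv_scores.mean():.3f}")
```

Keeping the preprocessing inside the pipeline, rather than applying it to the whole dataset first, avoids leaking information from the test folds into the training step.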
5.2.5 Identifying key features or variables that may be predictive of human trafficking
incidents
prediction system using machine learning techniques involves the collection and
preprocessing of training data. This process is crucial to ensure the accuracy and reliability
of the machine learning model. Firstly, data must be collected from various sources such
data may include information on known trafficking cases, victim profiles, trafficker
profiles, recruitment tactics, and patterns of exploitation. Once the data is collected, it
needs to be preprocessed to make it suitable for the machine learning model. This involves
extracted from the data to capture the relevant information for the prediction task. This
may involve techniques such as text mining, image processing, or network analysis,
depending on the nature of the data. Overall, the collection and preprocessing of training
data lay the foundation for building an effective machine learning model for human
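A minimal sketch of the preprocessing described above, assuming pandas and a small hypothetical table. The column names (case_id, state, age, gender), the sample values, and the rules for filling gaps are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

# Hypothetical records (assumption): one exact duplicate row and two missing values.
raw = pd.DataFrame(
    {
        "case_id": [101, 102, 102, 103, 104],
        "state": ["State A", "State B", "State B", None, "State C"],
        "age": [17, 24, 24, None, 31],
        "gender": ["F", "M", "M", "F", "F"],
    }
)


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate rows, fill missing values, and encode categories."""
    cleaned = df.drop_duplicates().copy()
    # Numeric gap: fill with the median age of the remaining records.
    cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
    # Categorical gap: mark it explicitly instead of guessing a state.
    cleaned["state"] = cleaned["state"].fillna("Unknown")
    # One-hot encode gender so that a model receives numeric columns.
    cleaned = pd.get_dummies(cleaned, columns=["gender"])
    return cleaned.reset_index(drop=True)


cleaned = clean_records(raw)
print(cleaned)
```

Real case data would also need identifying fields removed or pseudonymized before this step; that is not shown here.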
5.2.6 Creating new features or transforming existing ones to improve predictive power
Feature engineering involves creating new features or transforming existing ones to better
represent the underlying patterns in the data. This process can include techniques like
combining multiple related features or creating interaction features. Feature selection, on
the other hand, aims to identify the most relevant features that contribute the most to the
model’s predictive power while reducing irrelevant or redundant ones. This can be achieved
through techniques such as recursive feature elimination, select k-best, or L1 regularization.
Both feature engineering and selection play a significant role in enhancing model performance
by providing the algorithm with more informative and discriminative features and reducing
overfitting. With well-chosen features, the machine learning model can achieve higher
accuracy, precision, and recall in predicting and identifying human trafficking activities,
thus enabling more effective prevention and
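The three selection techniques named above (recursive feature elimination, select k-best, and L1 regularization) can be compared on the same data, as in the following sketch. It assumes scikit-learn and synthetic data; the number of features kept (3) and the regularization strength C are arbitrary choices for illustration, and the L1 step uses a linear SVM as one possible L1-penalized model.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption): 10 features, only 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=2, random_state=2
)
X = StandardScaler().fit_transform(X)

# 1. Recursive feature elimination: repeatedly drop the weakest feature
#    according to the coefficients of a logistic regression model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
rfe_idx = [i for i, keep in enumerate(rfe.support_) if keep]

# 2. Select k-best: score each feature on its own (ANOVA F-test) and keep the top 3.
kbest = SelectKBest(score_func=f_classif, k=3).fit(X, y)
kbest_idx = [int(i) for i in kbest.get_support(indices=True)]

# 3. L1 regularization: the penalty drives some coefficients to exactly zero;
#    the features with non-zero coefficients are the ones retained.
l1_model = LinearSVC(penalty="l1", dual=False, C=0.05, max_iter=5000).fit(X, y)
l1_idx = [i for i, weight in enumerate(l1_model.coef_[0]) if weight != 0.0]

print("RFE keeps features:        ", rfe_idx)
print("SelectKBest keeps features:", kbest_idx)
print("L1 keeps features:         ", l1_idx)
```

The three methods need not agree. Features chosen by all of them are the strongest candidates to keep.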
The evaluation of the effectiveness of the machine learning system for A Comprehensive
improvement. One way to evaluate the system's effectiveness is by measuring its accuracy
This can be done by comparing the system's predictions with real-world data and assessing
its levels of false positives and false negatives. Additionally, the system's precision and
recall can be measured to evaluate its ability to correctly classify instances of human
trafficking and identify potential victims. To enhance the effectiveness of the system, several
Firstly, incorporating more comprehensive and up-to-date data can help improve the
and data sources, such as social media or online platforms commonly used by human
traffickers, can provide a more robust and comprehensive analysis. Further improvements
can include exploring advanced machine learning techniques, such as deep learning or
ensemble models, to enhance the system's performance. Regular updates and continuous
retraining of the system with new data can also help in achieving a more effective and
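The measures discussed above (false positives, false negatives, precision, and recall) can be computed from a confusion matrix, as in this short sketch. It assumes scikit-learn; the two label lists are hypothetical and exist only to make the arithmetic visible.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels (assumption): 1 = trafficking case, 0 = not a trafficking case.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # what actually happened
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]  # what the system predicted

# For binary labels the flattened matrix is ordered: TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

print(f"false positives: {fp}, false negatives: {fn}")
print(f"accuracy: {accuracy:.2f}, precision: {precision:.2f}, recall: {recall:.2f}")
```

In this setting a false negative is a missed case, so recall usually deserves at least as much attention as precision.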
The Web User Interface (UI) for our comprehensive Human Trafficking Identification and
human trafficking. The UI features a range of functionalities that allow users to input and
manage various types of data related to human trafficking cases, such as victim
learning techniques, the system can analyze this data to identify potential trafficking
hotspots, predict future patterns, and aid in decision-making processes. The UI provides
intuitive data visualization tools, including interactive charts and maps, to facilitate a better
Additionally, the UI enables users to generate comprehensive reports and share important
information with other stakeholders. The system aims to increase the efficiency and
The database for the comprehensive human trafficking identification and prediction system
using machine learning techniques consists of various types of data that are crucial for
cases, including details such as the locations where trafficking occurred, the demographic
information of the victims, and the characteristics of the traffickers. This database also
incorporates data from social media platforms, online classifieds websites, and other online
contains data related to law enforcement efforts, such as the number of arrests made and
populations and vulnerable groups is also included in the database to aid in predictive
modeling. Other relevant data, such as immigration records, financial transactions, and
the human trafficking identification and prediction system can effectively analyze patterns
and trends using machine learning techniques to identify potential trafficking activities
5.2.9 Scaling numerical data to a common range to prevent bias towards certain
features
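A minimal sketch of scaling numerical features to a common range is given here. It assumes min-max scaling to the interval [0, 1], which the report does not name explicitly; the function name min_max_scale and the sample columns are hypothetical and serve only as illustration.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns measured on very different scales.
ages = [16.0, 22.0, 34.0, 19.0]
transaction_amounts = [120.0, 15000.0, 900.0, 4300.0]

scaled_ages = min_max_scale(ages)
scaled_amounts = min_max_scale(transaction_amounts)
```

After scaling, both columns lie in the same range, so a distance-based or gradient-based learner is not dominated by the column with the larger raw values.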
The need for a comprehensive human trafficking identification and prediction system is
crucial in the fight against this heinous crime. Machine learning techniques offer immense
leveraging the power of data analysis, machine learning algorithms can analyze and detect
patterns and anomalies that may indicate the presence of human trafficking networks.
These techniques can be applied to various sources of data, including social media, online
predict future hotspots or areas where human trafficking is likely to occur, enabling law
implement strict security measures, including data encryption, access controls, and regular
identification and prediction system can significantly contribute to the global fight against
Discovering and fixing such problems is what testing is all about. The purpose of testing
is to find and correct any problems with the final product. It's a method for evaluating the
quality of the operation of anything from a whole product to a single component. The
goal of stress testing software is to verify that it retains its original functionality under
extreme circumstances. There are several different tests from which to pick. Many tests are
available because the range of options is so vast; assessing them against the given dataset
helps make the result reliable.
Who performs the testing: Everyone who plays an integral role in the software
development process is responsible for testing. Testing the software is the responsibility
of a wide variety of specialists, including the end users and the project manager.
When testing should begin: Testing starts with the first step of the process. It begins with
the requirement-collection phase, also known as the planning phase, and ends with the
deployment phase. In the waterfall model, testing is explicitly arranged and carried out in
a dedicated testing phase. In the incremental model, testing is carried out at the conclusion
of each increment or iteration.
When it is appropriate to halt testing: Testing the programme is an ongoing activity that
will never end. Without first putting the software through its paces, it is impossible for
5.2.11 Ensuring all features are on a comparable scale for machine learning
algorithms
crucial to implement unit testing. Unit testing helps identify and fix any issues or bugs
Testcase1: The system should be able to accurately classify a given set of text data as
either indicative of human trafficking or not. This can be tested by providing a sample
of text data known to be related to human trafficking and verifying that the system
Testcase2: The system should have a low false positive rate, meaning it should not
This can be evaluated by providing a set of non-human trafficking text data and
Testcase3: The system should be able to handle a large volume of incoming data
system a large dataset and measuring its response time and resource usage to ensure it
remains efficient.
By conducting these and similar test cases, the unit testing process can help guarantee
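The three test cases above could be expressed with Python's built-in unittest module roughly as follows. This is only a sketch: the keyword-based classify function is a hypothetical stand-in for the trained model, and the indicator phrases, sample texts, false positive threshold, and time budget are illustrative assumptions, not values from the project. Only elapsed time is checked for the large batch; resource usage is not measured.

```python
import time
import unittest

# Hypothetical indicator phrases, used only to make the sketch self-contained.
SUSPICIOUS_PHRASES = ("no passport needed", "debt must be repaid", "new in town")


def classify(text: str) -> bool:
    """Toy stand-in for the trained model: flag text containing an indicator phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)


def false_positive_rate(benign_texts) -> float:
    """Fraction of known-benign texts that the classifier wrongly flags."""
    flagged = sum(1 for t in benign_texts if classify(t))
    return flagged / len(benign_texts)


class ClassifierTests(unittest.TestCase):
    def test_known_positive_is_flagged(self):
        # Test case 1: text known to be suspicious must be classified as such.
        self.assertTrue(classify("Work abroad, NO PASSPORT NEEDED, travel paid"))

    def test_false_positive_rate_is_low(self):
        # Test case 2: benign text should rarely be flagged.
        benign = ["Used bicycle for sale", "Room to rent near campus", "Piano lessons"]
        self.assertLessEqual(false_positive_rate(benign), 0.05)

    def test_large_batch_is_handled_quickly(self):
        # Test case 3: a large batch is processed within a generous time budget.
        batch = ["Piano lessons for beginners"] * 50_000
        start = time.perf_counter()
        flagged = sum(classify(t) for t in batch)
        elapsed = time.perf_counter() - start
        self.assertEqual(flagged, 0)
        self.assertLess(elapsed, 5.0)


# Run the suite programmatically so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifierTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```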
Integration testing is a crucial step in the software development process, especially for
Prediction System using Machine Learning techniques. This type of testing focuses on
In the case of the Human Trafficking system, integration testing will involve
testing the integration and interaction between the different machine learning
algorithms, data processing modules, and the user interface. This will help ensure
Modules
Expected Outcome: Verify if the data is correctly processed and passed to the
The techniques can provide valuable insights and predictions, the system should be
used in conjunction with human expertise and judgment to ensure accurate and
this system holds great promise in bolstering global efforts to combat human
identification and prediction
normalization, are used to ensure the quality and consistency of the data. Feature
component analysis, are applied to identify the most relevant features from the
vector machines, and artificial neural networks, are used to train predictive models
that can identify and predict human trafficking patterns. These techniques leverage
the power of machine learning to analyze large amounts of data, detect subtle
comprehensive system, law enforcement agencies and NGOs can enhance their
victims.
5.3.2 Training data collection and preprocessing for the machine learning model
prediction system using machine learning techniques involves the collection and
preprocessing of training data. This process is crucial to ensure the accuracy and
reliability of the machine learning model. Firstly, data must be collected from
organizations, and online platforms. This data may include information on known
make it suitable for the machine learning model. This involves removing any
data to capture the relevant information for the prediction task. This may involve
on the nature of the data. Overall, the collection and preprocessing of training data
lay the foundation for building an effective machine learning model for human
trafficking identification and prediction system using machine learning techniques. Feature
engineering involves creating new features or transforming existing ones to better represent
the underlying patterns in the data. This process can include techniques like combining
multiple related
features, encoding categorical variables, and creating interaction features. Feature selection,
on the other hand, aims to identify the most relevant features that contribute the most to the
model’s predictive power while reducing irrelevant or redundant ones. This can be achieved
Both feature engineering and selection play a significant role in enhancing model
performance by providing the algorithm with more informative and discriminative features,
and selecting features, the machine learning model can achieve higher accuracy, precision,
and recall in predicting and identifying human trafficking activities, thus enabling more
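The feature engineering steps mentioned above (encoding categorical variables and creating interaction features) can be sketched as follows. The column names, category values, and helper functions one_hot and interaction are hypothetical; the report does not list the features actually engineered.

```python
from typing import Dict, List


def one_hot(values: List[str]) -> Dict[str, List[int]]:
    """Encode a categorical column as one binary column per category."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


def interaction(a: List[float], b: List[float]) -> List[float]:
    """Create an interaction feature as the element-wise product of two columns."""
    return [x * y for x, y in zip(a, b)]


# Hypothetical records: where an advertisement appeared and how often it was posted.
platform = ["classifieds", "social", "classifieds", "forum"]
posts_per_day = [12.0, 3.0, 7.0, 1.0]

encoded = one_hot(platform)
# Posting frequency that counts only for ads on classifieds sites.
posts_on_classifieds = interaction(posts_per_day, encoded["is_classifieds"])
```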
5.3.4 Evaluating the effectiveness of the machine learning system and potential
future improvements
The evaluation of the effectiveness of the machine learning system for A Comprehensive
improvement. One way to evaluate the system's effectiveness is by measuring its accuracy in
correctly identifying instances of human trafficking and predicting future occurrences. This
can be done by comparing the system's predictions with real-world data and assessing its
levels of false positives and false negatives. Additionally, the system's precision and recall
can be measured to evaluate its ability to correctly classify instances of human trafficking
and identify potential victims. To enhance the effectiveness of the system, several potential
future improvements can be considered. Firstly, incorporating more comprehensive and up-
to-date data can help improve the accuracy and relevance of the predictions. Additionally,
incorporating additional features and data sources, such as social media or online platforms
commonly used by human traffickers, can provide a more robust and comprehensive
analysis.
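The evaluation measures named above (accuracy, precision, recall, false positives, and false negatives) can be computed from predictions and ground-truth labels as in the sketch below. The labels are invented for illustration and are not results from this project; the function name evaluate is likewise an assumption.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix counts and the usual derived metrics (1 = trafficking)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
```

In this domain a false negative (a missed case) and a false positive (a wrongly flagged person or advertisement) carry very different costs, which is why precision and recall are worth reporting separately rather than relying on accuracy alone.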
In the field of human trafficking identification and prediction, various machine learning
preprocessing techniques, such as data cleaning, imputation, and normalization, are used to
ensure the quality and consistency of the data. Feature selection methods, such as genetic
algorithms, information gain, and principal component analysis, are applied to identify the
most relevant features from the dataset. Finally, various classification algorithms, such as
decision trees, support vector machines, and artificial neural networks, are used to train
predictive models that can identify and predict human trafficking patterns. These techniques
leverage the power of machine learning to analyze large amounts of data, detect subtle
comprehensive system, law enforcement agencies and NGOs can enhance their efforts in
5.4.1 Database
The first step in developing a comprehensive human trafficking identification and prediction
system using machine learning techniques involves the collection and preprocessing of
training data. This process is crucial to ensure the accuracy and reliability of the machine
learning model. Firstly, data must be collected from various sources such as law
may include information on known trafficking cases, victim profiles, trafficker profiles,
recruitment tactics, and patterns of exploitation. Once the data is collected, it needs to be
from the data to capture the relevant information for the prediction task. This may involve
techniques such as text mining, image processing, or network analysis, depending on the
5.4.2 Security
Feature engineering involves creating new features or transforming existing ones to better
represent the underlying patterns in the data. This process can include techniques like
interaction features. Feature selection, on the other hand, aims to identify the most relevant
features that contribute the most to the model’s predictive power while reducing irrelevant
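or redundant ones. This can be achieved through techniques such as recursive feature
elimination and select k-best. Both feature engineering and selection play a significant role
in enhancing model performance by providing the algorithm with more informative and
discriminative features. With well-engineered and well-selected features, the machine
learning model can achieve higher accuracy, precision, and recall in predicting and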
identifying human trafficking activities, thus enabling more effective prevention and
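A simplified illustration of the select k-best idea follows. It ranks features by the absolute Pearson correlation with the label, which is a stand-in chosen to keep the sketch dependency-free; library implementations such as scikit-learn's SelectKBest typically use other scoring functions, such as chi-squared or the ANOVA F-value. The feature names and values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_k_best(features: Dict[str, List[float]], target: List[int], k: int) -> List[str]:
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature table: one value per advertisement.
features = {
    "ads_per_phone": [1, 8, 2, 9, 1, 7],
    "text_length": [40, 42, 39, 41, 43, 40],
    "night_posting": [0, 1, 0, 1, 0, 0],
}
labels = [0, 1, 0, 1, 0, 1]

best = select_k_best(features, labels, k=2)
```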
To implement this model, the program is executed in Google Colab. Necessary
In the present study, groundwater samples are collected following random sampling
technique and the sampling stations are chosen in a near-grid pattern. Samples were
collected at 44 locations in the study area representing pre-monsoon and post- monsoon
due to seasonal impact. The dataset consists of Latitude, Longitude points and the data
points of 8 heavy metal in groundwater namely Zinc, Lead, Manganese, Nickel, Cobalt,
5.7.1 PYTHON
set, and versatile applicability. Python is well suited to machine learning because it is
platform-independent and runs on all major operating systems. Machine learning is a branch
of AI that aims to reduce the need for explicit programming by allowing computers to
learn from data and experience and to perform routine tasks automatically. However,
which is the method through which computers are trained to recognize visual and auditory
cues, understand spoken language, and translate between languages. The demand for
intelligent solutions to real-world problems has made it necessary to develop AI further in
order to automate tasks that are arduous to program without it. Python is a
widely used programming language that is often considered to have the best algorithm for
5.7.2 ANACONDA
Anaconda is an open-source distribution of Python and R that bundles the conda package
and environment manager. It is one of the most popular platforms among data science
professionals for running Python and R implementations. Data science relies on hundreds
of libraries, so having a robust distribution system for them is a must for any
professional in this field. Anaconda simplifies package
deployment and management. On top of that, it has plenty of tools that can help you with
With Anaconda, you can easily set up, manage, and share Conda environments.
Moreover, you can deploy any required project with a few clicks when you’re using
Anaconda. There are many advantages to using Anaconda, and the following are the most
prominent among them. Anaconda is free and open-source, so you can use
it without spending any money. In the data science sector, Anaconda is an industry staple,
and its open-source nature has made it widely popular. If you want to become a data
science professional, you must know how to use Anaconda for Python because every
It has more than 1500 Python and R data science packages, so you don’t face any
compatibility issues while collaborating with others. For example, suppose your
colleague sends you a project which requires packages called A and B but you only have
package A. Without having package B, you wouldn’t be able to run the project. Anaconda
mitigates the chances of such errors. You can easily collaborate on projects without
worrying about any compatibility issues. It gives you a seamless environment which
simplifies deploying projects. You can deploy any project with just a few clicks and
commands while managing the rest. Anaconda has a thriving community of data
scientists and machine learning professionals who use it regularly. If you encounter an
issue, chances are the community has already answered it. You can also ask people in the
community about the issues you face; it is a very helpful community, ready to help new
learners. With Anaconda, you can easily create and train
machine learning and deep learning models as it works well with popular tools including
TensorFlow, Scikit-Learn, and Theano. You can create visualizations by using Bokeh,
Now that we have discussed all the basics in our Python Anaconda tutorial, let’s
discuss some fundamental commands you can use to start using this package manager.
To begin using Anaconda, you’d need to see how many Conda environments are present
in your machine:
conda env list
It will list all the available Conda environments in your machine.
You can create a new Conda environment by going to the required directory and using this
command:
You can replace <your_environment_name> with the name of your environment. After
entering this command, conda will ask you if you want to proceed to which you should
reply with y:
Proceed ([y]/n)?
On the other hand, if you want to create an environment with a particular version of
Similarly, if you want to create an environment with a particular package, you can use
the following command:
Here, you can replace pack_name with the name of the package you want to use.
If you have a .yml file, you can use the following command to create a new
We have also discussed how you can export an existing Conda environment to a .yml
Activating an Environment
You can activate a Conda environment by using the following command:
conda activate <environment_name>
You should activate the environment before you start working on the same. Also,
activate. On the other hand, if you want to deactivate an environment use the following
command:
conda deactivate
Now that you have an activated environment, you can install packages into it by using
the following command:
Replace the term <pack_name> with the name of the package you want to install in
Suppose you want to share your project with someone else (colleague, friend, etc.).
While you can share the directory on Github, it would have many Python packages,
making the transfer process very challenging. Instead of that, you can create an
environment configuration .yml file and share it with that person. Now, they can create
For exporting the environment to the .yml file, you’ll first have to activate the
The person you want to share the environment with only has to use the exported file by
If you want to uninstall a package from a specific Conda environment, use the
following command:
On the other hand, if you want to uninstall a package from an activated environment,
CHAPTER 6
IMPLEMENTATION DETAILS
In the “Development and Deployment Setup” section, you provide detailed insights into
the environment, tools, and processes involved in the creation and deployment of your
Learning Techniques.
Explain the software and hardware components constituting the development environment.
Discuss the programming languages, frameworks, and libraries chosen for building the
Detail the strategies employed for collecting and preprocessing human trafficking data.
Describe how data was sourced, cleaned, and transformed into a format suitable for
Elaborate on the methodologies used to train and optimize machine learning models.
Discuss the selection of algorithms, the rationale behind their choices, and the techniques
Provide insights into the infrastructure used for deploying the system. Discuss the server
Explain how the user interface was designed and developed to facilitate interaction with
the human trafficking identification system. Discuss user experience considerations and
Address ethical and legal considerations pertaining to the development setup. Discuss
any measures taken to ensure the responsible and ethical use of data, as well as
Discuss how collaboration and communication were managed during the development
phase. Highlight any tools or platforms used for team collaboration and communication,
6.2 Algorithms
Identification and Prediction System. This section aims to elucidate the reasoning
Explain the supervised learning models employed for the identification and prediction
Decision Trees, or Support Vector Machines, and discuss their respective strengths in
Detail the use of unsupervised learning techniques in your system for uncovering
If applicable, delve into the implementation of deep learning models. Explain the neural
network architectures chosen, such as convolutional neural networks (CNNs) for image
analysis or recurrent neural networks (RNNs) for sequential data, and discuss their
6.3 Testing
Explain the metrics used to evaluate the performance of the machine learning models.
Discuss standard metrics such as precision, recall, F1 score, and accuracy, emphasizing
6.3.2 Cross-Validation:
models. Explain the partitioning of data into training and testing sets, ensuring robust
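The partitioning of data into training and testing sets for cross-validation can be sketched as below. This assumes plain k-fold cross-validation with shuffling; the report does not state the number of folds or whether stratification was used, so the function k_fold_indices, the fold count, and the seed are illustrative assumptions.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, test_idx))
    return splits


splits = k_fold_indices(n_samples=10, k=5)
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and scored on test_idx here.
    pass
```

Averaging the metric over all k test folds gives a more stable estimate than a single train/test split, because every record contributes to testing exactly once.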
CHAPTER 7
RESULT AND DISCUSSION
crime. Through the integration of diverse data sources and state-of-the-art machine
potential victims and predicting the likelihood of individuals falling prey to human
trafficking.
The effectiveness of CHTIPS stems from its ability to analyze a wide range of data,
including social media posts, financial transactions, and online advertisements, to extract
meaningful patterns and indicators of human trafficking activity. By leveraging this diverse
array of information, the system can uncover subtle connections and behaviors that may
Machine learning algorithms such as support vector machines, random forests, and neural
networks play a pivotal role in CHTIPS by autonomously learning from labeled data to
continuous training and refinement, these algorithms enhance their predictive capabilities,
One of the key strengths of CHTIPS lies in its ability to incorporate real-time data,
enabling dynamic updates and adjustments to the system's predictive models. This real-
time capability ensures that CHTIPS remains responsive to emerging threats and changing
trafficking incidents.
Despite the promising potential of CHTIPS, several challenges and considerations must be
Firstly, the ethical and legal implications of data collection and analysis must be carefully
navigated to safeguard individual privacy and civil liberties. Additionally, the reliability
and accuracy of the data sources utilized by CHTIPS must be rigorously evaluated to
partnerships and sharing insights and expertise, CHTIPS can be refined and optimized to
better meet the needs and challenges of combating human trafficking on a global scale
CHAPTER 8
CONCLUSION
8.1 Conclusion
outcomes, and insights gained from the development and implementation of your system.
identification.
Example:
learning techniques for combating human trafficking. The system has demonstrated
commendable accuracy and reliability, offering valuable insights for law enforcement and
In the “Future Work” section, outline potential avenues for further research and
enhancement. Identify areas where the system can be expanded or improved, considering
Example:
While our system has shown promising results, future work can focus on enhancing its
Exploring advanced deep learning architectures for image and text analysis can further
improve the system’s accuracy and ability to handle diverse data types. Collaboration with
international organizations and NGOs can facilitate the integration of global data,
In the “Research Issues” section, you address specific challenges and issues encountered
Identification and Prediction System. This section allows you to reflect on the complexities
inherent in the project and share valuable insights gained from navigating these challenges.
Discuss any challenges related to biases in the data used for training and testing your
Example:
One notable research issue encountered was the presence of biases in the
training data, which could influence the system’s predictions. Addressing this issue required a
exploring ways to mitigate biases. Future research should focus on developing methods.
Example:
The project highlighted the need to consider regional variations in human trafficking
regions posed a challenge due to variations in reporting practices, legal frameworks, and
and Prediction System. This section offers insights into the real-world complexities of
Discuss any challenges related to the scalability of the system, especially when dealing
with larger datasets or increased user demand. Address how these challenges were
Example:
handling larger datasets or increased user demand. The system’s architecture required
refinement to ensure seamless scalability, and the incorporation of cloud services played a
pivotal role. Future implementations should prioritize a modular and scalable architecture
Explore challenges related to integrating your system with existing platforms, databases, or
Example:
Integrating the system with existing platforms and databases posed implementation
and NGOs, was crucial to understanding their systems and ensuring smooth integration.
protocols.
CHAPTER – 9
REFERENCES
[1]. Agarwal, S., & Bhat, A. (2022, December). Investigating Ophthalmic images to Diagnose
Eye diseases using Deep Learning Techniques. In 2022 4th International Conference on
Advances in Computing, Communication Control and Networking (ICAC3N) (pp. 973-979).
IEEE.
[3]. Cheng, Y., Ren, T., & Wang, N. (2023). Biomechanical homeostasis in ocular diseases: A
mini-review. Frontiers in Public Health, 11, 1106728.
[4]. Gakiza, J., Jilin, Z., Chang, K. C., & Tao, L. (2022). Human trafficking solution by deep
learning with Keras and OpenCV. In Proceedings of the International Conference on
Advanced Intelligent Systems and Informatics 2021 (pp. 70-79). Springer International
Publishing.
[5]. Gamage, C., Dinalankara, R., Samarabandu, J., & Subasinghe, A. (2023). A comprehensive
survey on the applications of machine learning techniques on maritime surveillance to detect
abnormal maritime vessel behaviors. WMU Journal of Maritime Affairs, 1-31.
[6]. Li, C., Zhu, B., Zhang, J., Guan, P., Zhang, G., Yu, H., ... & Liu, L. (2022). Epidemiology,
health policy and public health implications of visual impairment and age-related eye diseases
in mainland China. Frontiers in Public Health, 10, 966006.
[7]. Ray, A., Arora, V., Maass, K., & Ventresca, M. (2023). Optimal resource allocation to
minimize errors when detecting human trafficking. IISE Transactions, 1-15.
[8]. Sanghavi, J., & Kurhekar, M. (2023). Ocular disease detection systems based on fundus
images: a survey. Multimedia Tools and Applications, 1-26.
[9]. Summers, L., Shallenberger, A. N., Cruz, J., & Fulton, L. V. (2023). A Multi-Input Machine
Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements.
Machine Learning and Knowledge Extraction, 5(2), 460-472.
[10]. Youssef, B., Bouchra, F., & Brahim, O. (2023, March). State of the Art Literature on Anti-
money Laundering Using Machine Learning and Deep Learning Techniques. In The
International Conference on Artificial Intelligence and Computer Vision.
APPENDIX
A. SOURCE CODE
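#!/usr/bin/env python3
# Author: Catalina Vajiac
# Purpose: Coarse clustering of text documents
# Usage: ./infoshieldcoarse.py [filename]

import heapq
import math
import networkx as nx
import numpy as np
import os, sys
import pandas
import pickle
import random
import re
import scipy
import time
# Counter, defaultdict and TfidfVectorizer are used by the class below.
from collections import Counter, defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer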
# Utilities
def filter_text(text):
    # Non-string values (e.g. NaN) are replaced by an empty string.
    if type(text) is not str:  # nan
        return ''
    return text


class InfoShieldCoarse():
    def __init__(self, filename: str,
                 doc_id_header='', doc_text_header='',
                 num_phrases=10):
        # init basic variables
        self.time = time.time()
        self.num_phrases = num_phrases
        self.filename_full = filename.split('.')[0]
        self.filename = os.path.basename(filename).split('.')[0]
        #self.time_filename = '{}_streaming-{}_time.txt'.format(self.filename)
        self.ngrams = (5, 5)
        self.index_to_docid = Counter()
        self.docid_to_index = Counter()

        self.data = pandas.read_csv(filename, lineterminator='\n')
        self.determine_header_names(doc_text_header, doc_id_header)
        #self.data = self.data.drop_duplicates(subset=['title', self.description])
        if 'timestamp' in self.data.columns:
            # since ads not in order as they should be
            # (note: sort_values returns a new DataFrame; the result is not assigned here)
            self.data.sort_values(by=['timestamp'])
        self.num_ads = len(self.data.index)
        self.cluster_graph = nx.Graph()

        # setup tfidf
        tfidf = TfidfVectorizer(token_pattern=r'[^\s]+',
                                lowercase=False, ngram_range=self.ngrams,
                                sublinear_tf=True)
        self.tokenizer = tfidf.build_analyzer()
        self.document_freq = defaultdict(float)
        self.term_freq = defaultdict(lambda: Counter())
        self.length = Counter()
        self.data[self.description] = self.data[self.description].apply(filter_text)
        self.tfidfs = tfidf.fit_transform(self.data[self.description].values)
        self.tfidf_indices = tfidf.get_feature_names_out()
        self.num_ads_features = 0

    def determine_header_names(self, doc_text_header: str, doc_id_header: str):
        ''' automatically determine relevant header names for doc id, doc text'''
        columns = set(self.data.columns)
        indices = {'ad_id', 'index', 'TweetID', 'id'}
        descriptions = {'u_Description', 'description', 'body', 'Tweet', 'text'}
        phones = {'u_PhoneNumbers', 'phone', 'PhoneNumber'}
        descriptions.add(doc_text_header)
        indices.add(doc_id_header)
        indices.add(doc_text_header)
        for name, field in [('text', descriptions), ('unique id', indices)]:#, ('phone #', phones)]:
            if not len(columns.intersection(field)):
                print('Add "{}" header to possible descriptions!'.format(name))
                exit(1)

        self.description = columns.intersection(descriptions).pop()
        self.id = columns.intersection(indices).pop()
        self.phone = columns.intersection(phones).pop() if len(columns.intersection(phones)) else ''
        if 'title' in row:
            text = row['title'] if type(row['title']) == str else ''
        else:
            text = ''
        text += ' ' + row[self.description] if type(row[self.description]) == str else ''
        phrases = self.tokenize_text(text)
        self.cluster_graph.add_nodes_from(top_tfidf, bipartite=0)
        self.cluster_graph.add_node(doc_id, bipartite=1)
        self.cluster_graph.add_edges_from([(doc_id, phrase) for phrase in top_tfidf])

    def generate_labels(self):
        document_nodes = set([n for n, d in self.cluster_graph.nodes(data=True) if d['bipartite']])
        self.labels = [-1]*len(self.data.index)
        for i, component in enumerate(nx.connected_components(self.cluster_graph)):
            docs = [c for c in component if c in document_nodes]
            if len(docs) == 1:
                continue

    def write_cluster_graph(self):
        ''' write cluster graph as pkl file '''
        if not os.path.isdir('pkl_files'):
            os.mkdir('pkl_files')
        #with open('pkl_files/{}_ad_graph.pkl'.format(self.filename), 'wb') as f:
        #    pickle.dump(self.cluster_graph, f)

    def write_csv_labels(self):
        ''' write new csv, with LSH labels '''
        filename_stub = self.filename_full
        self.final_data_filename = filename_stub + '_LSH_labels.csv'
        self.unfiltered_data_filename = filename_stub + '_full_LSH_labels.csv'
        data_filtered.to_csv(self.unfiltered_data_filename, index=False)
        data_filtered =

    def clustering(self):
        ''' process each ad individually and incrementally save cluster graph. '''
        t = time.time()
        index = 0
        for _, row in self.data.iterrows():
            if index and not index % 10000:
                time_elapsed = time.time() - t
                print(index, '/', self.num_ads, 'time', time_elapsed)
            self.docid_to_index[row[self.id]] = index
            self.index_to_docid[index] = row[self.id]
            self.process_ad(index, row)
            index += 1

        self.generate_labels()
        self.write_cluster_graph()
        self.write_csv_labels()
        self.total_time = time.time() - t
        print('Finished clustering!', self.total_time)
        #print(self.num_ads_features, len(self.data.index)

    def get_clusters(self):
        ''' given cluster graph, return the relevant

    def print_clusters(self):
        print('number of clusters', len(self.get_clusters()))
        clusters = sorted(self.get_clusters(), key=lambda x: len(self.get_docs(x)), reverse=True)
        document_nodes = [n for n, d in self.cluster_graph.nodes(data=True) if d['bipartite']]
        for i, cluster in enumerate(clusters):
            docs = [c for c in cluster if c in document_nodes]
            print('cluster:', i, 'len:', len(docs))
            for doc_id in docs:
                index = self.docid_to_index[doc_id]
                row = self.data.loc[index]
                try:
                    description = row[self.description]
                except:
                    print('issue with doc_id', doc_id, 'and desc', self.description)
                print(doc_id, [n for n in self.cluster_graph.neighbors(doc_id)], row['label'])
                #print(doc_id, [n for n in self.cluster_graph.neighbors(doc_id)], row['is_spam'])
            print()
        print('\n\n')

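def usage(exit_code):
    print('Usage: _ [filename]')
    exit(exit_code)


if __name__ == '__main__':
    # Show the usage message instead of failing with an IndexError when no file is given.
    if len(sys.argv) < 2:
        usage(1)
    filename = sys.argv[1]
    d = InfoShieldCoarse(filename, num_phrases=10)
    d.clustering()
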
import os
import numpy as np
from math import ceil
import pandas as pd
import string
from nltk.corpus import stopwords
def log_star(x):
    """
    Universal code length
    """
    return 2 * ceil(np.log2(x)) + 1 if x != 0 else 0


def word_cost():
    return GOLBAL_VOC_COST


def sequence_cost(seq):
    """
    Output encoding cost for a given sequence
    """
    return log_star(len(seq)) + len(seq) * word_cost()


def str_prep(s):
    s = s.translate(str.maketrans('', '', string.punctuation)).split(' ')
    voc = set()
    for label in lsh_label:
        for id, text in df[df['LSH label'] == label][[id_str, text_str]].values:
            try:
                text = str_prep(text)
                for t in text:
                    voc.add(t)
            except:
                continue
            if len(text) != 0:
                data[label][id] = text
    gvc = ceil(np.log2(len(voc)))
    set_global_voc_cost(gvc)
    return data, gvc
"""
### Initialize document
doc = Document()
proc = doc.add_paragraph()
for s, c in zip(['Slot', 'Matched', 'Substitution', 'Deletion', 'Insertion'], WCI.values()):
    font = proc.add_run(s).font
    font.highlight_color = c
    proc.add_run(' ')
doc.save(word_path)
"""
if len(temp_arr) > 0 and not os.path.exists(output_path):
    os.makedirs(output_path)
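The encoding-cost helpers in the listing above (log_star, word_cost, sequence_cost) can be illustrated with a small worked example. This is only a sketch under stated assumptions: the eight-word vocabulary and the sample token list are made up, vocabulary_cost is a hypothetical helper name, and sequence_cost takes the per-word cost as a parameter instead of reading the global used in the project code.

```python
from math import ceil, log2


def log_star(x):
    """Universal code length for a non-negative integer (same formula as the listing)."""
    return 2 * ceil(log2(x)) + 1 if x != 0 else 0


def vocabulary_cost(vocabulary_size):
    """Bits needed to name one word out of the vocabulary (hypothetical helper)."""
    return ceil(log2(vocabulary_size))


def sequence_cost(seq, word_cost):
    """Bits to encode the sequence length plus every word in the sequence."""
    return log_star(len(seq)) + len(seq) * word_cost


# Made-up vocabulary and token sequence, for illustration only.
vocabulary = {"call", "now", "for", "details", "today", "open", "late", "visit"}
word_cost = vocabulary_cost(len(vocabulary))   # ceil(log2(8)) = 3 bits per word
tokens = ["call", "now", "for", "details", "today"]
cost = sequence_cost(tokens, word_cost)        # log_star(5) + 5 * 3 = 7 + 15
print(word_cost, cost)
```

With these toy values the length costs 7 bits and the five words cost 15 bits, so a shorter sequence or a smaller vocabulary lowers the total encoding cost.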
B. SCREENSHOTS
Figure B.2: Bar diagram of the data collected from various states, age-wise
Figure B.3: Bar diagram of the data collected from various states, gender-wise
Figure B.4: Bar diagram of the data collected from various states, education-wise
Figure B.7: Web page created using Python code
C. RESEARCH PAPER
A MACHINE-LEARNING APPROACH TO HUMAN TRAFFICKING IDENTIFICATION
AND PREDICTION
AUTHORS :
ABSTRACT :
This study introduces a comprehensive method for identifying and predicting human
trafficking using Machine-Learning. Conventional manual approaches are time-consuming, and
there is an urgent need for more efficient prevention and intervention techniques to address
this pervasive crime. The proposed method automates the identification and prediction
processes by leveraging various Machine-Learning techniques. It analyzes extensive data,
including social media posts, individual demographics, and internet activity, to pinpoint
potential victims and forecast their likelihood of involvement in human trafficking. Methods
such as decision trees, support vector machines, and neural networks enhance the system's
effectiveness, while cross-validation, model evaluation, and feature selection further boost
its accuracy. This technique offers a substantial improvement in accuracy, aiding law
enforcement organizations in their endeavors to combat this heinous crime.
trafficking hotspots, this system will use interfaces and visualization elements
variety of data sources, including social [14]. All things considered, this all-
1. Problem Definition:
2. Information Gathering:
Clean the dataset by handling missing values, outliers, and inconsistent records. Scale or normalize the features to guarantee uniformity, encode categorical variables, and address imbalances in the dataset [18].

Identify the features that improve the performance of the model. Extract useful information from the data by deriving location-based features or performing text analysis on social media content [19].

Use exploratory data analysis to learn more about the dataset, and display patterns, relationships, and distributions in the data visually [20].

6. Model Selection:
Depending on the type of task (classification, clustering, etc.), choose appropriate machine-learning techniques. For improved performance, consider deep learning models or ensemble approaches [21].

Split the dataset into training and validation sets. Train the chosen model on the training set, adjusting the hyperparameters as necessary [21]. Assess the model's performance on the validation set.

Choose evaluation criteria (accuracy, precision, recall, F1 score) suited to the nature of the problem, and evaluate the model on both the training and validation datasets [21].

Review the performance of the model and improve it iteratively by experimenting with different algorithms, adjusting hyperparameters, or adding features [22].

10. Ethical Issues:
Address the ethical issues surrounding the use of sensitive data. To prevent bias, make sure the model is transparent and equitable. Aim for interpretability in the model so that interested parties can understand and have faith in the forecasts [22].

11. Interpretability:
Make the model's processes as clear as possible so that stakeholders can understand and trust the forecasts [22].

12. Deployment:
Deploy the trained model in a scalable and secure environment. Integrate the model into an intuitive user interface that end users or the relevant authorities can use [23].

13. Monitoring and Maintenance:
Set up a way to track performance in the real world. Update the model frequently in light of new data and emerging trends [23].

Ⅶ. RESULT AND DISCUSSION:
The Comprehensive Human Trafficking Detection and Prediction System represents a groundbreaking initiative aimed at tackling the widespread issue of human trafficking. Developed through extensive research, this strategy strives both to predict future incidents and to identify potential cases of human trafficking promptly and accurately [24]. Leveraging Machine-Learning algorithms, the system scrutinizes vast amounts of data from diverse sources, including social media, internet marketing, and criminal records, to uncover trends and indicators of human trafficking. These algorithms, regularly updated and trained on new data, enable the system to continuously enhance its accuracy and efficiency over time. The technology aids law enforcement agencies in locating potential victims, as well as identifying hotspots and areas where human trafficking is likely to occur [25]. This proactive approach provides authorities with improved resources for implementing preventive measures. Furthermore, the system offers law enforcement and other stakeholders an intuitive user interface, providing real-time data, visualizations, and statistical analysis to support their decision-making processes. In essence, by delivering a cutting-edge and effective tool to identify potential victims, apprehend traffickers, and ultimately work towards eradicating this horrifying crime, this technology revolutionizes efforts to combat human trafficking [26].
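The preprocessing, train/validation split, and evaluation criteria named in the methodology (scaling, encoding of categorical variables, accuracy, precision, recall, F1 score) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the system's actual pipeline: all function names are hypothetical, the age, gender, and label values are made up, and a real project would more likely rely on an established machine-learning library.

```python
import random


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator list per row."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def train_validation_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


# Made-up toy values, for illustration only.
ages = min_max_scale([18, 24, 30])
genders = one_hot(["F", "M", "F"])
train_rows, validation_rows = train_validation_split(range(8))
metrics = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0],
                                 [1, 0, 0, 1, 0, 1, 1, 0])
print(ages, genders, len(train_rows), len(validation_rows), metrics)
```

On the toy labels there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics come out at 0.75. On imbalanced data, which the methodology flags as a concern, precision and recall are usually more informative than accuracy alone.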
Fig 4: Grid showing how human trafficking victims are used according to their age
[2] Summers, L., Shallenberger, A. N., Cruz, J., & Fulton, L. V. (2023). A Multi-Input Machine Learning Approach to Classifying Sex Trafficking from Online Escort Advertisements. Machine Learning and Knowledge Extraction, 5(2), 460-472.
[3] Youssef, B., Bouchra, F., & Brahim, O. (2023, March). State of the Art Literature on Anti-money Laundering
Intelligence and Computer Vision (pp.
Switzerland.
[6] Agarwal, S., & Bhat, A. (2022, December). Investigating ophthalmic to Diagnose Eye diseases using Deep Learning Techniques. In 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N) (pp. 973-979). IEEE.
[7] Li, C., Zhu, B., Zhang, J., Guan, P., Zhang, G., Yu, H., ... & Liu, L. (2022).
China. Frontiers in Public Health, 10,
[8] Arias-Serrano, I., Velásquez-López, P. A., Avila-Briones, L. N., Laurido-Mora, F. C., Villalba-Meneses, F., Tirado-Espin, A., ... & Almeida-Galárraga, D. (2023). Artificial Intelligence based glaucoma and diabetic retinopathy detection using MATLAB—Retrained AlexNet convolutional neural network. F1000Research, 12, 14.