Research Proposal
College/institute/School :
Department :
Project Initiator(s) :
Co-Project Initiators :
Title :
Place of Work :
Thematic Area :
Project Duration :
Project Code :
1. Introduction
Information is data that is accurate, timely, specific and organised for a purpose, presented within a
context that gives it meaning and relevance, and that can lead to an increase in understanding and a
decrease in uncertainty ([Link]). Information is valuable because it can affect behaviour, a decision or
an outcome. Although information is recognised as an important development resource, and it is
acknowledged that an absence of information may impede development (Boon 1992; Camble 1994),
information is power only when it is used and applied effectively (Boon 1992; Martin 1984;
Paez-Urdaneta 1989).
Knowledge management includes the sharing, exchange and dissemination of knowledge. Its main
purpose is to transform information and intellectual assets into enduring value (Metcalfe, 2005). The
basic idea is to propel development by using the wealth of available information for the benefit of the
community.
In universities, the number of students and staff members doing research increases every year. At
present there is no digital repository of project information at the department, institute or university
level that is accessible and available to all users. Storage, retrieval and analysis methods that would
identify repeated research projects or theses have not yet been developed at any of these levels.
Researchers and staff members therefore do not have access to the research completed at the
University for reference and study purposes.
It is very difficult for a department or institute to detect the repetition of research topics, and the
University currently has no system for analysing information on research topics.
2. Rationale of the Project
Staff projects, PhD and Master’s theses and UG project reports are not accessible to everyone.
Information is an asset only when it is used by the people it is intended for. If it remains on the
shelves of libraries and offices, the purpose for which it was generated is defeated.
In addition, because the number of projects increases every year, checking for repeated projects and
assigning new topics is a tedious task.
To fill these gaps, software will be developed to create a repository of information that everyone can
access and use, at first within the University and in the future between universities as well. It will
benefit students, teachers and researchers, and it will help to avoid the repetition of research topics.
3. Relevance of the Project
To develop software for the storage, analysis and retrieval of information on student projects and/or
theses (Under Graduate, Post Graduate, Master and Doctorate) and staff research projects (Community,
Research and Technology Transfer) completed at the University level.
4. Specific Objectives
1. To establish a mechanism to classify, document and store the student projects and/or theses and
staff research projects (under graduate, post graduate, PhD and staff projects: community
projects, research projects and technology transfer projects) completed at the University level.
2. To develop pilot software for project information analysis within Ambo University.
5. Scope of the Project
Integrated project information analysis software will be developed for all campuses of Ambo
University. Upon successful completion of this project, the system can be scaled up by linking various
universities in Ethiopia.
6. Review of Literature
Manber introduced the concept of an approximate index to measure the similarity of character strings
between documents, and developed a tool called “Sif” to find similar files in a large file system
(Manber 1994 [6]). The concept was later adopted by many similar systems.
Manber (1994 [6]) described using fingerprints (which he called anchors) and a fixed number of
characters as a baseline to search for plagiarism. In a similar approach, rather than considering a
fixed number of characters, where changing one character may affect the whole comparison, we decided
to select 4 words as the baseline. An initial method is developed to calculate the most frequent words in
a paper and use them as anchors. This is done after removing all generic words, prepositions and
any other words that are expected to be seen in any paper (e.g., “abstract”, “keywords”, “this paper”).
For each occurrence of those frequent words, the algorithm takes 4 words starting from the frequent
word and then looks in all subject documents for possible matches.
Comparison will be based on two criteria: performance and plagiarism detection. If a sufficient number
of baselines (i.e., 4-word statements) are common to the two files under comparison, this is good
enough evidence that the two files are similar in some way.
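The anchor and 4-word baseline idea described above can be illustrated with a minimal sketch. It is not the implementation planned for this project: the stop-word list, the number of anchor words (top_n), the threshold (min_shared) and all function names are assumptions chosen only for illustration.

```python
import re
from collections import Counter

# Assumed stop-word list; the text only says that generic words are removed.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "or",
              "is", "are", "by", "with", "this", "paper", "abstract", "keywords"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def anchor_words(tokens, top_n=5):
    """Return the top_n most frequent non-generic words (the anchors)."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return {word for word, _ in counts.most_common(top_n)}


def baselines(tokens, anchors, size=4):
    """Collect every run of `size` words that starts at an anchor word."""
    found = set()
    for i, token in enumerate(tokens):
        if token in anchors and i + size <= len(tokens):
            found.add(tuple(tokens[i:i + size]))
    return found


def shared_baselines(doc_a, doc_b, top_n=5, size=4):
    """Baselines of doc_a (anchored on its frequent words) also found in doc_b."""
    tokens_a, tokens_b = tokenize(doc_a), tokenize(doc_b)
    anchors = anchor_words(tokens_a, top_n)
    return baselines(tokens_a, anchors, size) & baselines(tokens_b, anchors, size)


def looks_similar(doc_a, doc_b, min_shared=2):
    """Flag two documents when they share at least min_shared baselines."""
    return len(shared_baselines(doc_a, doc_b)) >= min_shared
```

Calling looks_similar on a new project title or abstract and on each stored one would give a first indication of repeated topics; a suitable value for min_shared would have to be tuned on real project data.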
The tool developed in this paper uses several different search algorithms. The first searches a
directory of files for documents that may be similar to the subject document. The other algorithm
searches for similar documents through the Internet. In many cases, calculating similarity between
documents does not require similarity in cosmetic attributes such as file type, size or number of words.
Manber defined a checksum algorithm called “fingerprint” that identifies keywords in each
document and parses a certain number of characters starting from those keywords to calculate similarity.
In this checksum, anchor words serve as starting points from which a certain number of characters is
selected and compared among documents. Anchors are created by analysing text from many different
files and selecting a fixed set of representative strings. In a somewhat similar approach, we used the
most frequent words in the subject document as our anchors, from which the algorithm starts looking
for possible plagiarism or matching sentences.
Some papers have tried to tackle the performance problem of finding plagiarism in documents by using
indexing (Mozgovoy et al., 2005 [7]). The same concept is also used in search engines for fast document
retrieval.
7. Expected Outcome of this project
Software will be developed to classify, document, store and analyse the research projects and theses
completed in the various departments of Ambo University.
All completed research projects and theses will be classified, documented and stored in dedicated
server space. Information can then be retrieved quickly through convenient search methods, according
to the user’s requirements.
Repetition of research topics selected by students and staff members can be avoided by using this
software. Any information on the research projects or theses can be searched and retrieved quickly and
efficiently. It will help staff members and students to refer to previous research work.
8. Methodology of Software Development
The development design of the software is depicted in the following UML diagram. The project topics
submitted will be searchable by title, keyword and abstract. It is planned to use semantic search (fuzzy
search) methods for searching. The software will be a web-based application, and it is planned to
integrate it with the existing Ambo University website. PHP and MySQL will be used to develop the
application. The authors of a project or thesis can decide whether their full work or only the abstract
is made available in the portal.
Figure 1. UML Diagram
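As an illustration of the planned search by title, keyword and abstract, the sketch below ranks stored projects by approximate string similarity using difflib from the Python standard library. It is only a sketch under stated assumptions: the application itself is planned in PHP and MySQL, the function names, field names and threshold are hypothetical, and approximate string matching is only one possible reading of "fuzzy search".

```python
from difflib import SequenceMatcher

# Hypothetical field names; they mirror the Project Title, Keyword and
# Abstract fields of the project tables.
SEARCH_FIELDS = ("title", "keyword", "abstract")


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_projects(query, projects, threshold=0.6):
    """Return (score, project_id) pairs for projects whose best-matching
    field reaches the threshold, with the highest score first."""
    results = []
    for project in projects:
        score = max(similarity(query, project[field]) for field in SEARCH_FIELDS)
        if score >= threshold:
            results.append((round(score, 2), project["project_id"]))
    return sorted(results, reverse=True)
```

A query containing a spelling mistake would still match the stored title with a high score, which is the behaviour that makes approximate matching useful when checking a newly proposed topic against existing ones.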
The database design of the proposed software is given below.
1. University_DB
S.N  Field Name       Data Type  Size  Constraint
1    University ID    VarChar    10    Primary Key
2    University Name  VarChar    25
2. Institute DB
S.N  Field Name      Data Type  Size  Constraint
1    Institute ID    VarChar    10    Primary Key
2    Institute Name  VarChar    25
3    University ID   VarChar    10    Reference Key
3. Department DB
S.N  Field Name       Data Type  Size  Constraint
1    Department ID    VarChar    10    Primary Key
2    Department Name  VarChar    25
3    Institute ID     VarChar    10    Reference Key
4    Programme Type   VarChar    2
4. Project Student DB
S.N  Field Name           Data Type  Size  Constraint
1    Project ID           VarChar    10    Primary Key
2    Project Title        VarChar    100
3    Department ID        VarChar    10    Reference Key
4    Programme Type       VarChar    2
5    Duration_of_Project  Number
6    Guide Name           VarChar    25
7    CoAdvisor            VarChar    25
8    Project Member(s)    VarChar    200
9    Keyword              VarChar    50
10   Abstract             VarChar    1000
11   Status               VarChar    15
5. Project Staff DB
S.N  Field Name              Data Type  Size  Constraint
1    Project ID              VarChar    10    Primary Key
2    Project Title           VarChar    100
3    Department ID           VarChar    10    Reference Key
4    Principal Investigator  VarChar    25
5    Co-Investigators        VarChar    500
6    Duration_of_Project     Number
7    Year of Commence        VarChar    20
8    Year of Completion      VarChar    20
9    Keyword                 VarChar    50
10   Abstract                VarChar    1000
11   Status                  VarChar    15
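The table design above can be expressed as a schema, for example as in the following sketch. It uses SQLite through the sqlite3 module of the Python standard library only so that the example is self-contained and runnable; the proposal itself plans to use MySQL. The snake_case table and column names and the use of INTEGER for the "Number" type are assumptions, and the Project Staff table, which would follow the same pattern, is omitted for brevity.

```python
import sqlite3

# Schema sketch: field sizes follow the tables above; each "Reference Key"
# is written as a foreign key to the parent table.
SCHEMA = """
CREATE TABLE university (
    university_id   VARCHAR(10) PRIMARY KEY,
    university_name VARCHAR(25)
);
CREATE TABLE institute (
    institute_id   VARCHAR(10) PRIMARY KEY,
    institute_name VARCHAR(25),
    university_id  VARCHAR(10) REFERENCES university(university_id)
);
CREATE TABLE department (
    department_id   VARCHAR(10) PRIMARY KEY,
    department_name VARCHAR(25),
    institute_id    VARCHAR(10) REFERENCES institute(institute_id),
    programme_type  VARCHAR(2)
);
CREATE TABLE project_student (
    project_id          VARCHAR(10) PRIMARY KEY,
    project_title       VARCHAR(100),
    department_id       VARCHAR(10) REFERENCES department(department_id),
    programme_type      VARCHAR(2),
    duration_of_project INTEGER,
    guide_name          VARCHAR(25),
    co_advisor          VARCHAR(25),
    project_members     VARCHAR(200),
    keyword             VARCHAR(50),
    abstract            VARCHAR(1000),
    status              VARCHAR(15)
);
"""


def create_database():
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
    conn.executescript(SCHEMA)
    return conn
```

With foreign-key checks enabled, a department cannot be stored under an institute that does not exist, which keeps the university, institute, department and project hierarchy consistent.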
9. Work plan
Duration of the Project - 1 Year
S.N  Activities                                                 Duration
Work Plan for 2010 E.C.
1    Visit by initiators to all the campuses to convey the      June, 2010
     information to the authorities
2    Training the material collectors                           September, 2010
Work Plan for 2011 E.C.
1    Collection of data from IOT, Main Campus, Gudar Campus     October, 2011
     and Wolliso Campus
2    Design the software, based on the collected information    November, 2011 - January, 2011
3    Coding                                                     February to March, 2011
4    Preparing Software Manual                                  April-May, 2011
5    Workshop for HODs (by visiting all campuses)               June, 2011
6    Report submission and Presentation                         June 2011
10. Budget Breakdown.
Budget Breakdown for the year 2010 E.C.
B.1. Stationery Expenses for the year 2010 E.C.
Type/items              Unit (Number)  Quantity (Number)  Unit Price (Birr)  Total Price (Birr)  Remarks
A4 size paper           1 pack         2                  200                400
Scribbling pad (Small)  1              8                  15                 120
Pen                     1              8                  5                  40
Marker                  1              3                  10                 30
Flip Chart              1              2                  130                260
Total                                                                        850
B.2. Fuel Cost:
S.N  Item                      No. of Trips  Quantity (Litres)  Unit price (Birr/litre)  Total price (Birr)
1    Visit to Wolliso Campus   1             75                 20                       1500
2    Sub total                                                                           1500
B.3. Personal Expenses
S.N  Activities                            Qualification        No. of     No. of  Daily rate  Total payment  Remark
                                                                personnel  days    (Birr)      (Birr)
1    Per diem for initiators to visit      Post-Graduation      7          3       179         3759
     Wolliso Campus                        and PhD
2    Per diem for Material Collectors to   Under-Graduation     8          2       50          800
     attend training
3    Training Refreshment [Material                             15         2       50          1500
     collectors and Trainers]
     Total                                                                                     6059
Budget Breakdown for the year 2011 E.C.
B.1. Stationery Expenses for the year 2011 E.C.
Type/items              Unit   Quantity (Number)  Unit Price (Birr)  Total Price (Birr)  Remarks
A4 size paper           Packs  10                 200                2000                (Software Manual)
Scribbling pad (Small)  Pcs    95                 15                 1425
Pen                     Pcs    95                 5                  475
Marker                  Pcs    10                 10                 100
Flip Chart              Pcs    4                  130                520
Total                                                                4520
B.2. Fuel Cost:
S.N  Item                      No. of Trips  Quantity (Litres)  Unit price (Birr/litre)  Total price (Birr)
1    Visit to Wolliso Campus   1             75                 20                       1500
2    Sub total                                                                           1500
B.3. Personal Expenses
S.N  Activities                            Qualification        No. of     No. of  Daily rate  Total payment  Remark
                                                                personnel  days    (Birr)      (Birr)
1    Per diem for Material Collectors to   Under-Graduation
     collect data from
       IOT                                                      2          12      50          1200
       Main Campus                                              2          12      50          1200
       Gudar                                                    2          12      50          1200
       Wolliso                                                  2          12      50          1200
2    Software Manual Preparation                                                               4500
3    Workshop Refreshment
       IOT (12 + 5)                                             17         1       50          850
       Main Campus (33 + 5*)                                    38         1       50          1900
       Gudar Campus (11 + 5*)                                   16         1       50          800
       Wolliso Campus (19 + 5*)                                 24         1       50          1200
       Trainers                                                 7          4       50          1400
4    Per diem for workshop trainers
       Gudar Campus                                             7          1       99          693
       Wolliso Campus                                           7          3       179         3759
     Total                                                                                     19902
* Research and community service staff from Directorate and Institute.
C. Budget summary
S.N  Budget Item          Total Budget in Birr  Remarks
1.   Stationery Expenses  5370.00
2.   Fuel cost            3000.00
3.   Personal Expenses    25961.00
     Total cost           34331.00
     Contingency (10%)    3433.10
     Grand Total          37764.10
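The summary figures follow from the yearly totals in sections B.1 to B.3. The short sketch below reproduces the arithmetic; the variable and function names are illustrative assumptions, and the amounts are taken from the tables above.

```python
from decimal import Decimal

# Yearly totals in Birr, in the order [2010 E.C., 2011 E.C.].
YEARLY_TOTALS = {
    "Stationery Expenses": [850, 4520],
    "Fuel cost": [1500, 1500],
    "Personal Expenses": [6059, 19902],
}


def summarise(yearly_totals, contingency_rate=Decimal("0.10")):
    """Return per-item totals, total cost, contingency and grand total."""
    items = {name: sum(amounts) for name, amounts in yearly_totals.items()}
    total = sum(items.values())
    contingency = total * contingency_rate
    return items, total, contingency, total + contingency


items, total, contingency, grand_total = summarise(YEARLY_TOTALS)
print(items)        # per-item totals over both years
print(total)        # total cost before contingency
print(contingency)  # 10% contingency
print(grand_total)  # total cost plus contingency
```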
11. References
1. L. Prechelt, M. Guido and M. Philippsen, “JPlag: Finding plagiarisms among a set of programs”,
Journal of Universal Computer Science, vol. 8, no. 11, (2000).
2. J. Faidhi and S. K. Robinson, “An empirical approach for detecting program similarity within a
university programming environment”, Computers & Education, vol. 11, no. 1, (1987), pp. 11-
19.
3. M. Wise, “Detection of similarities in student programs: YAP’ing may be preferable to
Plague’ing”, ACM SIGCSE Bulletin (Proc. of 23rd SIGCSE Technical Symp.), vol. 24, no. 1,
(1992), pp. 268-271.
4. D. Gitchell and N. Tran, “Sim: a utility for detecting similarity in computer programs”, The
proceedings of the thirtieth SIGCSE technical symposium on Computer science education, New
Orleans, Louisiana, United States, (1999) March 24-28, pp. 266-270.
5. S. Grier, “A tool that detects plagiarism in Pascal programs”, ACM SIGCSE Bulletin, vol. 13,
no. 1, (1981), pp. 15-20.
6. U. Manber, “Finding similar files in a large file system[C/OL]”, Proceedings of the Winter
USENIX Conference, (1994), (2006), pp. 1-10.
7. M. Mozgovoy, K. Fredriksson, D. White, M. Joy and E. Sutinen, “Fast plagiarism detection
system”, Lecture Notes in Computer Science, vol. 3772, (2005), pp. 267-270.
8. B. Baker, “A theory of parameterized pattern matching: Algorithms and applications”, 25th
Annual ACM Symposium on Theory of Computing, San Diego, CA, (1993), pp. 71-80.
9. Broder, Z. Glassman, C. Steven, M. Manasse and G. Zweig, “Syntactic Clustering of the Web”,
Proceedings of the Sixth WWW Conference. Santa Clara, CA, (1997).
10. Jun-Peng, S. Jun-Yi, L. Xiao-Dong and S. Qin-Bao, “A Survey on Natural Language Text Copy
Detection”, Journal of Software, vol. 14, no. 10, (2003), pp. 1753-1760.
11. H. Maurer, F. Kappe and B. Zaka, “Plagiarism, a survey”, Journal of universal computer science,
vol. 12, no. 8, (2006).
12. C. Kustanto and I. Liem, “Automatic Source Code Plagiarism Detection”, SNPD, (2009), pp.
481-486.
13. J. Hage, P. Rademaker and N. Vugt, “A comparison of plagiarism detection tools, Technical
Report”, UU-CS-2010-015, Department of Information and Computing Sciences Utrecht
University, Utrecht, The Netherlands, (2010).
14. El Tahir, H. Abdulla and V. Snasel, “Survey of Plagiarism Detection Methods”, 2011 Fifth Asia
Modelling Symposium, Manila, Philippines, May 24-May 26.
15. Alhami and I. Alsmadi, “Automatic code homework grading based on concept extraction”,
International Journal of Software Engineering and Its Applications IJSEIA
([Link]), vol. 5, no. 4, (2011).
16. Boon, J.A. (1992). Information and development: some reasons for failure. Information Society,
8(3), 227-241.
17. Camble, E. (1994). The information environment of rural development workers in Borno State,
Nigeria. African Journal of Library Archives and Information Science, 4(2), 99-106.
18. Martin, W.J. (1984). The potential for community information services in a developing country.
IFLA Journal, 10(4), 385-392.
19. Metcalfe, A.S. (ed.). 2005. Knowledge Management and Higher Education: A Critical Analysis.
(Online). Available at [Link]
[Link].
20. Paez-Urdaneta, I. (1989). Information in the Third World. International Library Review, 21(2),
177-191.
Assurance of the Principal Investigator:
The undersigned agrees to accept responsibility for the scientific, ethical and technical conduct of the
research project and for the provision of the required progress reports as per the terms and conditions
of the University policy in effect at the time of grant, if a grant is awarded as the result of this
application.
Date: Signature:
Approval By:
Head of Department of Signature
Date:
College/Institute Research Team Leader: Signature
Date:
Dean, Institute of Technology Signature
Date:
Research, Consultancy and Community Service
For Use by the RCCSD Office
Amount of total budget approved
Amount of Budget approved for the current year
Period of Allocation