
2024 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS) | 979-8-3503-4846-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/SCEECS61402.2024.10482119

Automated API Docs Generator using Generative AI

Prakhar Dhyani, Department of Computer Science and Engineering, Graphic Era Hill University, Dehradun, India ([email protected])
Shubhang Nautiyal, Department of Computer Science and Engineering, Graphic Era Hill University, Dehradun, India ([email protected])
Aditya Negi, Department of Computer Science and Engineering, Graphic Era Hill University, Dehradun, India ([email protected])
Shikhar Dhyani, Department of Computer Science and Engineering, Graphic Era Hill University, Dehradun, India ([email protected])
Mrs. Preeti Chaudhary, Department of Computer Science and Engineering, Graphic Era Hill University, Dehradun, India ([email protected])

Abstract— Our study improves the creation of Application Programming Interface (API) usage documentation using the efficiency and power of Generative AI. APIs play an important role in software integration and maintenance, but the process of creating API documentation has remained traditional and has not evolved with time. This paper employs Generative AI to enhance the accuracy, speed, and scale of API documentation generation. The automated API documentation generator applies natural language processing through a large language model (TinyPixel/Llama-2-7B-bf16-sharded). Training data was created by web scraping the documentation pages of several large technology companies to obtain a high-quality, industry-standard dataset, which was then diversified and enlarged using a GPT model to cover a wide range of API scenarios. Fine-tuning greatly enhanced the model's efficiency and output quality, as shown by the reduced response time and the accuracy of the documentation generated. A comparative study confirms the effectiveness of the approach. Our conclusion offers a comprehensive approach that should improve software development processes and pave the way for further developments in API documentation.

Keywords— API Documentation, Generative AI, Fine-Tuning, Large Language Models, Web Scraping, Natural Language Processing.

I. INTRODUCTION
APIs (Application Programming Interfaces) allow different software components to communicate with each other easily. To provide usability and integration, and to maintain complex architecture, comprehensive documentation is required. Earlier, manual or partially automated creation and maintenance made it difficult to keep up with ongoing rapid development updates. Therefore, an AI-driven API documentation generator, as suggested in this paper, is a better approach to creating API documentation.

This system is based on Generative AI technology, which combines machine learning and natural language processing to provide tools capable of handling difficult tasks that would normally require human input. This strategy could substantially change how API documentation is written. The goals of this research are: (1) to evaluate the economic and practical advantages of incorporating AI technology, and (2) to show how AI technology improves data-driven operations by giving them the speed and efficiency of modern software development. The intended outcome is comprehensive, precise, and up-to-date API docs. This study examines the current state of API documentation strategies, highlights the limitations and flaws of the methods in use, and, to address these issues, proposes Generative AI as a tool that can greatly increase the precision, speed, and scalability of API documentation creation.¹

¹ API - Wikipedia

II. BACKGROUND AND EVOLUTION OF APPLICATION PROGRAMMING INTERFACES
APIs, or application programming interfaces, are crucial in the large field of software development because they allow the linkage and integration of different applications. To understand the importance of APIs, one must first explore



their origins, history, and critical functions in today's software ecosystem. As software development progresses, new techniques emerge, each with its own set of advantages and uses. API documentation that is clear and thorough is in high demand. Among the techniques in use are:

A. Manual Documentation
The conventional method of producing API documentation is to have a developer or technical author write it, usually with a text editor or a platform such as Microsoft Word or Confluence. This method takes a lot of time and work, but it allows for extension and customization. One of the main problems with this approach is keeping the documentation compatible with constant API updates.²

B. Automated Generation from Code Comments
In order to extract relevant information, programs like Javadoc for Java and Doxygen for C++ take structured comments straight out of the source code. This integration supports consistency between API usage and the related documentation. Nevertheless, the comprehensiveness and caliber of the code's comments determine how successful this strategy will be.

Figure 1 Sample Javadoc Documentation for a Java Class ³

C. API Description Languages
Languages such as OpenAPI (previously known as Swagger) and RAML provide developers with the means to describe APIs in machine-readable formats. These descriptions act as a base for generating structured documentation, which aids in maintaining uniformity and simplifies the process of updating the documentation. This approach enhances the efficiency and consistency of API documentation.

D. Framework-Specific Documentation Generators
Some web frameworks come with integrated or additional tools specifically for generating documentation. Examples include the Django REST Framework for Python and Spring REST Docs for Java. These tools enable the creation of documentation directly from the framework's ecosystem, ensuring a high level of coherence with the API's development environment. This integration, closely linked with the entire development cycle, streamlines the documentation process.

E. Interactive API Documentation
Interactive interfaces from Postman and Swagger UI set these tools apart from standard manuals. Users can explore endpoints and make API queries straight from within the documentation. This helps with testing and trial runs and improves the usability of API documentation.⁴

Figure 2 Example of Interactive API Documentation using Swagger UI ⁵

F. Version Control and Synchronization Tools
Version control integrated into documentation platforms such as Read the Docs lets you update the code and its documentation together. These systems automatically refresh the documentation when the code changes, making the information more accurate and reducing the amount of manual work needed to keep everything current.

G. Generative AI and Machine Learning
Using advanced algorithms and machine learning, systems can automatically detect patterns, track code updates, and manage large volumes of data on their own. This represents a big step forward in how software documentation is created and maintained.

In conclusion, attention is shifting toward using AI to make API documentation better and more up to date. This means that documentation will be able to keep up with the fast changes in software development, making everything more user-friendly and adaptable.

² API - Wikipedia
³ https://www.baeldung.com/javadoc
⁴ Intro to APIs: History of APIs | Postman Blog
⁵ https://editor.swagger.io/

III. LITERATURE REVIEW
Reference [1] presents a way of improving the user experience of producing interactive documentation for Web APIs. In this method, computational approaches are employed to generate and interpret natural language descriptions that are useful to automated systems as well as human users. The study was conducted with enthusiasts from data-related domains to validate the efficacy of the methodology. It was found to significantly reduce the time and effort required for understanding and using web APIs, thereby demonstrating its utility in API description. The main purpose of [3] is to investigate how scaling up computational language models influences their few-shot performance. The research thoroughly compares various model sizes and configurations. Through experiments with different model scales and little data for fine-tuning, this study found that making the models larger improves few-shot learning by a great extent. These enlarged models perform on par with or even

better than previous models that relied on large amounts of data.
RbG has been introduced as a documentation tool specifically designed for scientific and engineering software [6]. In generating documentation, this program automatically extracts mathematical formulas and decision-making logic from the code through code analysis. RbG arranges these documents using comments within the source code itself, giving the developer total control over the content that is provided. RbG's usage in many different professional contexts, such as reverse engineering, documenting new software projects, and updating existing system documentation, demonstrates its flexibility. These case studies demonstrate how RbG can be customised to satisfy different needs related to software development documentation.
Reference [7] looks at the benefits of combining more general instructions with specialist models for natural language task processing. This study compares models trained on general data with those designed for specific tasks to investigate the effects of adding extra data on performance. The results are quite impressive when training data is scarce. The main finding of this work is that reliable, high-quality generalist data are essential to prevent models developed for specific tasks from performing inadequately. This work highlights the difficulties and complexities of carefully integrating data to enhance natural language processing algorithms.
Table 1 Comparative Analysis of Large Language Model Applications and Methodologies

| Reference | Purpose | Model/Technique Used | Dataset Used | Comparison Parameters and Methodologies | Results/Findings |
| [2] | News summary generation using LLM | LLM with evolutionary fine-tuning | Niche domain dataset, PENS, grain storage pest | Compared with TF-IDF and TextRank algorithms | NSG generates accurate, reliable summaries |
| [4] | Adapting language models to society | GPT-3 language model with target values | Hand-curated dataset | Metrics: output adherence, toxicity, common word | Metrics for PALMS process evaluation |
| [5] | CodeBERT for programming and NL-PL apps | CodeBERT pre-trained NLP model | Provided by Husain et al. (2019) | Pre-trained on large-scale corpus, fine-tuned | Performance on code documentation, retrieval |
| [9] | Compute-optimal training of transformer | Chinchilla (Transformer language model) | Not explicitly mentioned | Investigated optimal model size and tokens | Compute-optimal training solution |
| [10] | BERT fine-tuning for text classification | BERT (Bidirectional Encoder Representations) | IMDb, Yelp P., Yelp F., TREC, Yahoo! Answers, AG's News, DBPedia, Sogou News | Investigated various BERT fine-tuning methods | Achieved new state-of-the-art results on text classification |
| [13] | Fine-tuning pre-trained language models | RoBERTa LARGE, mBART LARGE | GLUE benchmark | Two-stage fine-tuning approach | Task-agnostic mask, adapter fine-tuning |

Reference [8] investigates the tranGAN model and the use of Generative Adversarial Networks (GANs) for text generation. This model combines the actor-critic technique with a transformer architecture to address common text generation issues, including sequence dependence and exposure bias. The Penn Treebank dataset was used to assess tranGAN's capacity to generate grammatically sound and logical sentences, and these assessments show off tranGAN's text creation skills.
A prompt tuning technique is presented in [11] that makes it possible to condition already-trained language models for specific tasks effectively. The method works well on a variety of natural language processing applications and requires less training time and fewer parameters than traditional fine-tuning techniques. To make a model produce targeted outputs without a lot of additional training, prompt tuning customizes a prompt vector for that task. Tasks such as question answering, natural language inference, and text categorization show how well this technique works, making it more robust and allowing efficient and quick combination, thereby enhancing its general applicability.
The purpose of [12] is to create greater transparency in evaluating language models. It provides a comprehensive framework called HELM that evaluates these models on aspects such as toxicity, efficiency, bias, fairness, robustness, accuracy, and calibration. The evaluation of 30 different popular language models on 42 scenarios makes

HELM extend the scope of language model evaluation greatly. Moreover, it outlines the trade-offs between metrics and models while also providing a benchmark to compare different generative AI techniques across languages.
A detailed examination of generative AI can be found in [14], which discusses several advanced computational techniques for creating meaningful content. Generative AI is a rapidly expanding field that must consider its limitations as well as its opportunities. The methods presented include variational autoencoders (VAEs), generative adversarial networks (GANs), and deep learning, with examples of how they are used in different types of content creation such as writing, music, and photos. The paper examines the problems facing the development and deployment of Generative AI systems, among them large training dataset requirements, worries about potential biases, and moral dilemmas.

IV. PROPOSED METHODOLOGY

The project aims to introduce artificial intelligence into API documentation. The specific objective is to develop an AI model that can instantly generate accurate documentation for various APIs. This comprehensive approach includes creating a diverse dataset, fine-tuning a large language model, and designing a user-friendly interface for inputting API details. Ultimately, it should improve the accuracy and efficiency of API documentation and hence maintain an effective software development process flow. The procedure is divided into several steps:

Figure 3 Workflow Diagram for AI-Driven API Documentation System

A. Dataset building and preparation
This step involved collecting data by web scraping the documentation of various leading companies. We collected the open-source API documentation provided by these companies, with the clear goal of achieving the best documentation possible. Some of the companies were PayPal, Google Maps APIs, Stripe, etc.
The data collected was further shaped into input and output objects to give our model a concise and clear structure to work on. To diversify and enlarge the data, artificial data was generated through GPT. This was a critical step in diversifying the dataset to cover API edge cases. Lastly, the dataset was formatted according to standards for optimizing AI models for NLP tasks, such as modeling human-assistant interaction.

B. Fine Tuning Process
To get the most out of AI capabilities in this project, some changes had to be made to the LLaMA 2 7B-parameter sharded model. This model was chosen because it has the rich language generation and understanding capacities necessary for producing accurate API docs. The fine-tuning process became easier with the AutoTrain package, which provided frameworks and tools able to efficiently optimize different training settings. In this step we had to make sure that the model was trained well enough to understand API documentation, while concentrating on generating outputs that were both highly relevant and accurate.

C. Generation and Evaluation
The deployment of the model marked a significant advancement in the generation and evaluation of API documentation. To get better responses from the model, we refined it using GenerationConfig, which helped us adjust important settings for the best performance. We verified that the model was doing a good job by checking how fast it works and how good the created API documentation is.

D. Output
After the model was evaluated, we added a user-friendly interface built with the Streamlit library for the user to provide API details. The model then creates API documentation in JSON format, which is displayed in a readable and interactive manner using HTML, CSS, and JS. The API docs produced by our model are presented as an interactive webpage.

Figure 4 User Interface for providing API details to the model
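As a minimal illustration of the dataset-shaping described in step A, the sketch below turns scraped endpoint descriptions into instruction-response pairs in a human-assistant format. The field names, sample records, and prompt template here are assumptions for illustration, not the paper's exact schema.

```python
import json

# Hypothetical scraped records; the real dataset was built from the
# documentation pages of companies such as PayPal, Google Maps, and Stripe.
scraped = [
    {"endpoint": "GET /v1/charges", "description": "List all charges for the account."},
    {"endpoint": "POST /v1/customers", "description": "Create a new customer object."},
]

def to_training_pair(record):
    """Shape one scraped record into a human-assistant style pair.

    The "### Human / ### Assistant" template is an assumed stand-in for
    the formatting standard mentioned in the paper, not the exact one.
    """
    return {
        "input": f"### Human: Write API documentation for {record['endpoint']}",
        "output": f"### Assistant: {record['endpoint']}: {record['description']}",
    }

pairs = [to_training_pair(r) for r in scraped]

# Serialize as JSON Lines, a common on-disk format for LLM fine-tuning data.
jsonl = "\n".join(json.dumps(p) for p in pairs)
print(jsonl)
```

Each line of the resulting JSONL file is one self-contained training example, which is the shape fine-tuning toolkits such as AutoTrain typically consume.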

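The output step in D, turning the model's JSON into a readable page, can be sketched with the standard library alone. The JSON field names below are assumed for illustration; the paper does not publish its exact output schema.

```python
import html
import json

# Assumed shape of the model's JSON output (illustrative field names only).
doc_json = json.loads("""
{
  "endpoint": "POST /v1/customers",
  "summary": "Create a new customer object.",
  "parameters": [
    {"name": "email", "type": "string", "description": "Customer email address."}
  ]
}
""")

def render_html(doc):
    """Render one API-doc object as a small HTML fragment.

    html.escape guards against stray markup in model-generated text.
    """
    rows = "".join(
        f"<tr><td>{html.escape(p['name'])}</td>"
        f"<td>{html.escape(p['type'])}</td>"
        f"<td>{html.escape(p['description'])}</td></tr>"
        for p in doc["parameters"]
    )
    return (
        f"<h2>{html.escape(doc['endpoint'])}</h2>"
        f"<p>{html.escape(doc['summary'])}</p>"
        f"<table><tr><th>Name</th><th>Type</th><th>Description</th></tr>{rows}</table>"
    )

page = render_html(doc_json)
print(page)
```

In the described system this fragment would be styled with CSS and made interactive with JavaScript; the sketch shows only the JSON-to-HTML transformation itself.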
Figure 5 Sample Output from our API Documentation

V. RESULT ANALYSIS
Our focus was to maximize the performance of the TinyPixel/Llama-2-7B-bf16-sharded model by fine-tuning it on a dataset meant for API documentation. Our fine-tuned model outperformed the baseline model in several categories.

A. Initial Model Performance
There were problems in the initial responses of the TinyPixel/Llama-2-7B-bf16-sharded model. To produce API documentation, it took an average of fifty seconds. Furthermore, these outputs frequently fell short of the task's true requirements in terms of the accuracy and comprehensiveness needed to create an excellent API specification. The first findings show that the model misinterpreted the user's request for API documentation.

B. Performance after Refinement
After the model was adjusted, both its output quality and speed significantly increased. It now takes only 36 seconds on average to construct an instance of API documentation. This acceleration is necessary for practical applications, especially in settings where software development proceeds rapidly. Furthermore, there was a discernible improvement in the quality of the documents generated. The updated model adhered to professional API documentation standards and produced documentation that was more precise, intelligible, and appropriate for the given scenario.

C. Comparative Analysis
A comparison of the fine-tuned and original performances reveals a large difference. In terms of generation speed as well as content correctness and relevancy, the upgraded model performs better than the original. This illustrates the usefulness of our approach and the significance of using a dataset that has been specifically created for a particular task. The results from our dataset were not only appropriate and useful but also took less time to generate once the model had a good idea of what to produce.

Figure 7 Comparison of API Documentation Outputs Before and After Fine-Tuning

VI. CONCLUSION
The process of creating and maintaining API documentation for modern applications has been made easier through this study, which leverages the power of Generative AI to create API documentation that is concise and meets industry standards. We optimized the TinyPixel/Llama-2-7B-bf16-sharded model to achieve significant speed and quality gains. The improvements shown by the model demonstrate its flexibility and accuracy, and they expand the potential of machine learning techniques. By using web technologies to present the model's output, we enhance the user experience and provide a visually appealing and engaging interface. Developers and other stakeholders will find it simpler to understand and learn about API integration as a result. The addition of HTML, CSS, and JavaScript has made the API documentation easier to read and comprehend. Now that documentation can be accessed more quickly and intuitively, developers can fully understand the possibilities of APIs. In addition to accelerating the learning curve for engineers, this method promotes a more organized and productive software development environment.
In the end, this study provides a strong basis for upcoming API documentation enhancements. A new era of precise, interactive, and user-centered documentation is being ushered in by the combination of modern web technologies and advanced computational methodologies. This all-encompassing approach will transform our interactions with API documentation and propel the industry to new heights of efficiency, accessibility, and usability.

Figure 6 Performance Comparison Before and After Model Fine-Tuning

VII. REFERENCES

[1] González-Mora, C., Barros, C., Garrigós, I., Zubcoff, J., Lloret, E., & Mazón, J. N. (2023). Improving open data web API documentation through interactivity and natural language generation. Computer Standards & Interfaces, 83, 103657.

[2] Xiao, L., & Chen, X. (2023). Enhancing LLM with Evolutionary Fine
Tuning for News Summary Generation. arXiv preprint
arXiv:2307.02839.
[3] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal,
P., ... & Amodei, D. (2020). Language models are few-shot learners.
Advances in neural information processing systems, 33, 1877-1901.
[4] Solaiman, I., & Dennison, C. (2021). Process for adapting language
models to society (palms) with values-targeted datasets. Advances in
Neural Information Processing Systems, 34, 5861-5873.
[5] Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., ... & Zhou,
M. (2020). Codebert: A pre-trained model for programming and natural
languages. arXiv preprint arXiv:2002.08155.
[6] Moser, M., Pichler, J., Fleck, G., & Witlatschil, M. (2015, March). Rbg:
A documentation generator for scientific and engineering software. In
2015 IEEE 22nd International Conference on Software Analysis,
Evolution, and Reengineering (SANER) (pp. 464-468). IEEE.
[7] Shi, C., Su, Y., Yang, C., Yang, Y., & Cai, D. (2023). Specialist or
Generalist? Instruction Tuning for Specific NLP Tasks. arXiv preprint
arXiv:2310.15326.
[8] Zhang, C., Xiong, C., & Wang, L. (2019, August). A research on
generative adversarial networks applied to text generation. In 2019
14th International Conference on Computer Science & Education
(ICCSE) (pp. 913-917). IEEE.
[9] Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T.,
Rutherford, E., ... & Sifre, L. (2022). Training compute-optimal large
language models. arXiv preprint arXiv:2203.15556.
[10] Sun, C., Qiu, X., Xu, Y., & Huang, X. (2019). How to fine-tune bert
for text classification?. In Chinese Computational Linguistics: 18th
China National Conference, CCL 2019, Kunming, China, October 18–
20, 2019, Proceedings 18 (pp. 194-206). Springer International
Publishing.
[11] Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for
parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691.
[12] Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga,
M., ... & Koreeda, Y. (2022). Holistic evaluation of language models.
arXiv preprint arXiv:2211.09110.
[13] Liao, B., Meng, Y., & Monz, C. (2023). Parameter-Efficient Fine-
Tuning without Introducing New Latency. arXiv preprint
arXiv:2305.16742.
[14] Feuerriegel, S., Hartmann, J., Janiesch, C., & Zschech, P. (2023).
Generative AI.

