2009
…
5 pages
1 file
AI-generated Abstract
The paper discusses the challenges associated with integrating financial data from multiple sources, specifically focusing on the issues of abstraction, linking, and consolidation necessary for effective data analysis. It emphasizes the importance of data quality and reliability in the integration process, as erroneous data can adversely affect analysis outcomes. The conclusion highlights that overcoming data integration challenges is critical for developing sophisticated semantic analysis methods that can enhance transparency and drive informed business decisions.
2011
We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships.
2014
Abstract. Finance practitioners and researchers rely heavily on accurate and accessible historical data. Practitioners require the data to evaluate trading and investing decisions. Researchers may use data to test market quality and efficiency. Unfortunately, data is not error-free and is difficult to access and integrate with other sources. The ongoing project FINDS (Financial News and Data Service) is designed to fill this gap and provide clean, integrated and accessible data to both practitioners and researchers in finance. We achieve these goals via flexible data preprocessing and novel data preparation methods presented below. Data integration includes the task of combining data residing at different sources and providing the user with a unified view of this data [1]. Formally, we can pro-
2004
This paper describes the development and applications of FRAANK (Financial Reporting and Auditing Agent with Net Knowledge). The prototype of FRAANK presented here provides automated access to, and understanding and integration of, rapidly changing financial information available from various sources on the Internet. In particular, FRAANK implements intelligent parsing to extract accounting numbers from natural-text financial statements available from the SEC EDGAR repository. FRAANK develops an "understanding" of the accounting numbers by means of matching the line-item labels to synonyms of tags in an XBRL taxonomy. As a result, FRAANK converts the consolidated balance sheet, income statement, and statement of cash flows into XBRL-tagged format. Based on FRAANK, we propose an empirical approach towards the evaluation and improvement of XBRL taxonomies and for identifying and justifying needs for specialized taxonomies by assessing a taxonomy fit to the historical data, i.e., the quarterly and annual EDGAR filings. Using a test set of 10-K SEC filings, we evaluate FRAANK's performance by estimating its success rate in extracting and tagging the line items using the year 2000 C&I XBRL Taxonomy, Version 1. The evaluation results show that FRAANK is an advanced research prototype that can be useful in various practical applications. FRAANK also integrates the accounting numbers with other financial information publicly available on the Internet, such as timely stock quotes and analysts' forecasts of earnings, and calculates important financial ratios and other financial-analysis indicators.
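The label-to-tag matching step described in this abstract can be illustrated with a minimal sketch. It assumes exact matching after normalization against a small synonym table; the tag names, synonyms, and function names below are hypothetical and are not taken from FRAANK or from the C&I XBRL taxonomy.

```python
import re

# Hypothetical synonym table: tags and synonyms are illustrative only,
# not copied from any real XBRL taxonomy.
TAG_SYNONYMS = {
    "CashAndCashEquivalents": ["cash and cash equivalents", "cash and equivalents"],
    "TotalAssets": ["total assets"],
    "NetIncome": ["net income", "net earnings", "net income (loss)"],
}


def normalize(label: str) -> str:
    """Lowercase the label, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", label.lower())
    return " ".join(cleaned.split())


# Reverse index from each normalized synonym to its tag.
SYNONYM_INDEX = {
    normalize(synonym): tag
    for tag, synonyms in TAG_SYNONYMS.items()
    for synonym in synonyms
}


def tag_line_item(label: str):
    """Return the tag whose synonym equals the normalized label, or None."""
    return SYNONYM_INDEX.get(normalize(label))


def tag_statement(line_items):
    """Split (label, value) pairs into a {tag: value} dict and a list of unmatched labels."""
    tagged, unmatched = {}, []
    for label, value in line_items:
        tag = tag_line_item(label)
        if tag is None:
            unmatched.append(label)
        else:
            tagged[tag] = value
    return tagged, unmatched
```

The share of line items that end up in `tagged` gives a rough success rate of the kind the abstract mentions, and the `unmatched` list points to labels the synonym table does not yet cover.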
Proceedings of the 6th International Conference on Semantic Systems - I-SEMANTICS '10, 2010
One of the main ways of populating the Web of Data is by triplifying existing data sources. One interesting candidate for this approach is data based on the eXtensible Business Reporting Language (XBRL), a standard for business and financial reporting. Many institutions are making available or requiring data in this format, e.g. the US SEC through the EDGAR program. However, XBRL data is loosely interconnected and it is difficult to mix and query it. Our contribution is a translation from XBRL filings to linked data, which we have applied to more than 1000 filings obtaining 3 million triples. The resulting semantic data is easier to integrate and cross query. Moreover, it can be interconnected with the rest of the Web of Data in order to extract its full potential.
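A rough sketch of what triplifying an XBRL fact might look like follows. It is not the authors' translation: the sample instance is heavily simplified, and the base URI, the `ex:` predicates, and the `triplify` function are assumptions made for illustration. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XBRL-like instance. Real filings carry units,
# multiple contexts, and taxonomy references that are omitted here.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.org/taxonomy#">
  <context id="FY2009">
    <period><instant>2009-12-31</instant></period>
  </context>
  <ex:Assets contextRef="FY2009" unitRef="USD" decimals="0">5000</ex:Assets>
  <ex:Liabilities contextRef="FY2009" unitRef="USD" decimals="0">3000</ex:Liabilities>
</xbrl>"""

BASE = "http://example.org/filing/1"  # hypothetical base URI for one filing


def triplify(xml_text, base=BASE):
    """Turn each fact element into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    count = 0
    for element in root:
        context = element.get("contextRef")
        if context is None:
            continue  # contexts and units are not facts
        # ElementTree stores namespaced tags as "{namespace}local".
        namespace, _, local = element.tag[1:].partition("}")
        count += 1
        fact = f"{base}#fact{count}"
        triples.append((fact, "rdf:type", namespace + local))
        triples.append((fact, "ex:context", f"{base}#{context}"))  # assumed predicate
        triples.append((fact, "ex:value", element.text.strip()))    # assumed predicate
    return triples
```

Because each fact gets its own URI and points at a shared context URI, facts from different filings can be joined on concept or period once they sit in the same triple store.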
Journal of Information Systems, 2005
This paper describes the development and applications of FRAANK—Financial Reporting and Auditing Agent with Net Knowledge. The prototype of FRAANK presented here provides automated access to, and understanding and integration of, rapidly changing financial information available from various sources on the Internet. In particular, FRAANK implements intelligent parsing to extract accounting numbers from natural-text financial statements available from the SEC EDGAR repository. FRAANK develops an “understanding” of the accounting numbers by means of matching the line-item labels to synonyms of tags in an XBRL taxonomy. As a result, FRAANK converts the consolidated balance sheet, income statement, and statement of cash flows into XBRL-tagged format. Based on FRAANK, we propose an empirical approach toward the evaluation and improvement of XBRL taxonomies and for identifying and justifying needs for specialized taxonomies by assessing a taxonomy fit to the historical data, i.e., the quar...
Applied Economics and Finance, 2016
Rigorous and proper linking of financial databases is a necessary step to test trading strategies incorporating multimodal sources of information. This paper proposes a machine learning solution to match companies in heterogeneous financial databases. Our method, named Financial Attribute Selection Distance (FASD), has two stages, each of them corresponding to one of the two interrelated tasks commonly involved in heterogeneous database matching problems: schema matching and entity matching. FASD's schema matching procedure is based on the Kullback-Leibler divergence of string and numeric attributes. FASD's entity matching solution relies on learning a company distance flexible enough to deal with the numeric and string attribute links found by the schema matching algorithm, and it incorporates different string matching approaches such as edit-based and token-based metrics. The parameters of the distance are optimized using the F-score as the cost function. FASD is able to match the joint Compustat/CRSP and Institutional Brokers' Estimate System (I/B/E/S) databases with an F-score over 0.94 using only a hundred manually labeled company links.
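To make the entity-matching ingredients concrete, here is a minimal sketch that combines one edit-based metric (normalized Levenshtein distance) with one token-based metric (Jaccard distance over word tokens) and scores a set of predicted links with the F-score. This is not the FASD distance itself: the fixed weight `w_edit` stands in for parameters that FASD learns, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


def edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def jaccard_distance(a: str, b: str) -> float:
    """One minus the Jaccard similarity of the two names' word sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def company_distance(a: str, b: str, w_edit: float = 0.5) -> float:
    """Weighted mix of the two metrics; w_edit is an assumed, hand-set weight."""
    a, b = a.lower(), b.lower()
    return w_edit * edit_distance(a, b) + (1.0 - w_edit) * jaccard_distance(a, b)


def f_score(predicted: set, actual: set) -> float:
    """Harmonic mean of precision and recall over sets of links."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)
```

In a learned setting, the weight and a match threshold would be tuned on the labeled links so that `f_score` of the predicted matches is as high as possible.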
U.S. corporations are obligated to file financial statements with the U.S. Securities and Exchange Commission (SEC). The SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, containing millions of financial statements, is one of the most important sources of corporate information available. The paper illustrates which financial statements are publicly available by analyzing the entire SEC EDGAR database since its implementation in 1993. It shows how to retrieve financial statements in a fast and efficient way from EDGAR. The key contribution, however, is a platform-independent algorithm for business and research purposes designed to extract textual information embedded in financial statements. The dynamic extraction algorithm, capable of identifying structural changes within financial statements, is applied to more than 180,000 annual reports on Form 10-K filed with the SEC for descriptive statistics and validation purposes.
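As a very rough illustration of text extraction from a 10-K, the sketch below splits plain filing text into sections at "Item N." headings. It is an assumed, simplified approach and not the paper's extraction algorithm: it ignores HTML markup and does not handle a table of contents that repeats the headings (a later heading with the same number simply overwrites the earlier one). The pattern and function name are hypothetical.

```python
import re

# Matches headings such as "Item 1." or "ITEM 7A:" at the start of a line.
ITEM_RE = re.compile(r"^\s*item\s+(\d+[a-z]?)\s*[.:]", re.IGNORECASE | re.MULTILINE)


def split_items(text: str) -> dict:
    """Return {item number: section text} for each 'Item N.' heading found."""
    matches = list(ITEM_RE.finditer(text))
    sections = {}
    for match, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        sections[match.group(1).upper()] = text[match.end():end].strip()
    return sections
```

For example, calling `split_items` on text containing "Item 1. Business" followed by "Item 1A. Risk Factors" yields a dictionary with the keys `"1"` and `"1A"`.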
2003
We present a system to extract, visualize, and analyze inter-corporation relationships disclosed by public companies in their annual reports to the U.S. Securities and Exchange Commission (SEC). In improving the transparency of these disclosures, we allow policy makers, analysts, investors, and the general public to analyze these relationships at both the firm level and the industry level. Using probabilistic information retrieval and extraction techniques, we automatically extract a dataset of 45,000 relationships between 26,000 companies from over 15 gigabytes of SEC 10-K documents. These relationships range from ownerships, agreements, and personal connections to competition and legal disagreements. Information visualization and social network analytic techniques can then be applied to explore and analyze the dataset.
Proceedings of the eighth international conference on Information and knowledge management - CIKM '99, 1999
The proliferation of electronically available data within large organizations as well as publicly available data (e.g. over the World Wide Web) poses challenges for users who wish to efficiently interact with and integrate multiple heterogeneous sources. This paper presents CI³, a corporate information integrator, which applies XML as a tool to facilitate data mediation and integration amongst heterogeneous sources in the context of financial analysts creating corporate profiles. Sources include Lotus Notes, relational databases, and the World Wide Web. CI³ applies a unified XML data model to automate integration. By preserving metadata about the source of each datum in the integrated result set, CI³ supports source attribution. Users may trace the attribution metadata from the result back to the underlying sources and leverage their expertise in interpreting the data and, if necessary, use their judgment in assessing the authenticity and veracity of results. We present a functional overview of CI³, its system architecture including the XML data model, and the integration procedures. We conclude by reflecting on lessons learned.
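The idea of source attribution, keeping a record of where each datum came from through integration, can be sketched as follows. This uses plain Python dictionaries rather than the XML data model the abstract describes, and the `Datum` class, function names, and source labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A single value together with the name of the source that supplied it."""
    value: object
    source: str


def integrate(records):
    """Merge (source, {field: value}) pairs into {field: [Datum, ...]}.

    Every value is kept, so conflicting sources stay visible to the user.
    """
    profile = {}
    for source, fields in records:
        for name, value in fields.items():
            profile.setdefault(name, []).append(Datum(value, source))
    return profile


def sources_of(profile, field):
    """Trace a field in the integrated profile back to its sources."""
    return [datum.source for datum in profile.get(field, [])]


profile = integrate([
    ("lotus_notes", {"ceo": "A. Smith"}),
    ("web", {"ceo": "A. Smith", "revenue": 10}),
])
```

Keeping every `Datum` instead of picking a winner leaves the judgment about authenticity to the analyst, which matches the role the abstract gives to attribution metadata.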
International Journal of Accounting Information …, 2001
Most major corporations in the U.S. (and a growing number of companies around the world) are reporting some level of financial information on their Web sites. However, it is not clear that the stakeholders are fully satisfied with this Web-based data. The time and effort allocated to the mechanics of Web retrieval are actually increasing because of the difficulty of finding pages and specific data within the enormity of the public Web (over 1 billion pages) or of many corporate intranets. One way to deal with this vast information source would be to automate the Web search mechanics by developing and using intelligent software agents. However, developing these agents in the current Web environment is very problematic. Three factors are preconditions for effective utilization of the Web. First, appropriate metadata representation of financial reporting information on the Web is required that could improve the accuracy of searches (the resource discovery problem). Second, accounting data points within Web pages should be able to be reliably parsed (the attribute recognition problem). Third, standard mechanisms are required that will encourage or require corporations to report in a consistent fashion. The reality of the Web is that it falls far short of a reliable communication medium for accounting and financial information on all three of these factors. The eXtensible Markup Language (XML) provides a method to tag financial information to greatly improve the automation of information location and retrieval, and provides technical solutions to the resource discovery and attribute recognition problems. However, if every company were free to develop its own labels for its XML tags, then the searching for financial information would be only marginally improved. 
The recent development, by a consortium led by the American Institute of CPAs (AICPA), of the so-called "eXtensible Business Reporting Language" (XBRL) is an initiative to develop an XML-based, Web-based business reporting specification. The widespread adoption of XBRL would mean that both humans and intelligent software agents could operate on financial information disseminated on the Web with a high degree of accuracy and reliability. XBRL provides rich research opportunities, including new taxonomies, database accounting, financial statement assurance, intelligent agents, human/computer interfaces, standard development process, adoption incentives, global adoption, and formal ontologies.
International Journal of Accounting Information Systems, 2012
Lecture Notes in Computer Science, 2013
Journal of Accounting and Public Policy, 2010
Decision Support Systems, 2014
2010
Proceedings of the International Workshop on Data Science for Macro-Modeling - DSMM'14, 2014
Oxford Business Law Blog, 2024
Intelligent Systems and Applications, 2018
Citeseer
Asian Journal of Economics, Business and Accounting, 2018
International Journal of Science and Research (IJSR), 2019
Social Science Research Network, 2015
IOSR Journal of Engineering, 2012