Advances in Systems Analysis, Software Engineering, and High Performance Computing
In the era of data-driven science, corpus-based language technology is an essential part of cyber-physical systems. In this chapter, the authors describe the design and development of an extensible domain-specific web corpus to be used in a distributed social application for the care of the elderly at home. The domain of interest is the medical field of chronic diseases. The corpus is conceived as a flexible and extensible textual resource, where additional documents and additional languages will be appended over time. The main purpose of the corpus is to be used for building and training language technology applications for the “layfication” of specialized medical jargon. “Layfication” refers to the automatic identification of more intuitive linguistic expressions that can help laypeople (e.g., patients, family caregivers, and home care aides) understand medical terms, which often appear opaque. Exploratory experiments are presented and discussed.
In a SweClarin cooperation project we apply topic modelling to the texts found with pins in Pinterest boards. The data in focus are digitisations of Viking Age finds from the Swedish History Museum, and the underlying research question is how they are given new contextual meanings in boards. We illustrate how topic modelling can support interpretation of polysemy and culturally situated meanings. The paper expands on the employment of topic modelling by accentuating the necessity of interpretation in every step of the process, from capturing and cleaning the data to modelling and visualisation. The paper concludes that the national context of digitisations of Viking Age jewellery in the Swedish History Museum's collection management system is replaced by several transnational contexts in which Viking Age jewellery is appreciated for its symbolic meanings and decorative functions in contemporary genres for re-imagining, reliving and performing European pasts and mythologies. The emerging contexts on Pinterest also highlight the business opportunities involved in genres such as reenactment, neo-paganism, lajv (Swedish live-action role-play) and fantasy. The boards are clues to how digitisations serve as prototypes for replicas.
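The topic-modelling step described above can be illustrated with a minimal sketch. This is not the project's actual pipeline; the pin descriptions, vectorizer settings, and topic count below are all illustrative, using scikit-learn's standard LDA implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical pin descriptions standing in for the scraped board texts.
pins = [
    "viking age brooch bronze replica jewellery",
    "norse mythology odin raven tattoo art",
    "bronze brooch museum find sweden jewellery",
    "larp costume fantasy reenactment viking",
]

# Bag-of-words counts, then LDA with an (arbitrary) two-topic model.
X = CountVectorizer().fit_transform(pins)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
topic_mix = lda.transform(X)  # one topic-proportion row per pin
```

Interpreting what each topic "means" (e.g. replica commerce vs. mythology) remains a manual step, which is the paper's point about interpretation pervading the whole process.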
We report on results from using the multivariate readability model SVIT to classify texts into various levels. We investigate how the language features integrated in the SVIT model can be transformed to values on known criteria like vocabulary, grammatical fluency and propositional knowledge. Such text criteria, sensitive to content, readability and genre, in combination with the profile of a student's reading ability, form the base of individually adapted texts. The procedure of levelling texts into different stages of complexity is presented along with results from the first cycle of tests conducted on 8th grade students. The results show that SVIT can be used to classify texts into different complexity levels.
2020 IEEE Frontiers in Education Conference (FIE), 2020
This Innovative Practice Full Paper presents an approach to integrate three critical elements in Computer Science education. The call to imbue computer science graduates with the strategic skills needed to address our pressing global sustainability challenges is extremely important, and a great challenge to degree programmes in computer science and software engineering. Doing this successfully requires great care, and possibly several iterations across an entire curriculum. In this regard, learning for sustainability faces similar challenges as understanding scientific results and ethics. Improving skills in searching for, reading, and producing academic texts is often neglected, as are skills in understanding ethics, that is, the norms and values that guide our choices of methods for solving problems. To address the fact that these subjects (academic writing, ethics and sustainability) are treated separately, which lowers student engagement with the topics, we have successfully integrated them into one coherent subject of Professionalism in Computer Science. By integrating the three subjects, we do three things: a) describe a multi-faceted but integrated engineering role; b) integrate the three aspects of the role we focus on in education and steer away from the view that these are add-ons; and c) increase the motivation of students to take on these aspects of the engineering role. Our approach uses a flipped-classroom style with students playing educational games, participating in discussion seminars and conducting critical analyses of other students' choices in IT system design. Much emphasis is on the students' academic writing abilities, including critical information search and a student peer-review procedure.
Also, we do this using an integrated assessment format where teachers from different disciplinary backgrounds jointly assess material from students, which stimulates discussions among ourselves about what and how to assess, and provides a practical way to integrate assessments. We present results from attitude surveys, course evaluations and the contents of the students' analyses in their final essays. In conclusion, our approach demonstrates a clear shift in how students perceive sustainability, showing that it is possible to achieve changes in attitude towards the subjects as such and their importance for computer scientists.
This paper presents the motivation and challenges of developing semantic interoperability for an Internet of Things network that is used in the context of home-based care. The paper describes a research environment which examines these challenges and illustrates the motivation through a scenario whereby a network of devices in the home is used to provide high-level information about elderly patients by leveraging techniques in context awareness, automated reasoning, and configuration planning.
We present a novel model for text complexity analysis which can be fitted to ordered categorical data measured on multiple scales, e.g. a corpus with binary responses mixed with a corpus with more than two ordered outcomes. The multiple scales are assumed to be driven by the same underlying latent variable describing the complexity of the text. We propose an easily implemented Gibbs sampler to sample from the posterior distribution by a direct extension of established data augmentation schemes. By being able to combine multiple corpora with different annotation schemes we can get around the common problem of having more text features than annotated documents, i.e. an example of the $p>n$ problem. The predictive performance of the model is evaluated using both simulated and real world readability data with very promising results.
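The data-augmentation Gibbs sampler described above can be sketched in the style of the classic Albert–Chib ordinal-probit scheme: each document has a latent complexity value, and each corpus maps that value to its own ordered labels through its own set of cutpoints. This is a toy sketch, not the paper's model; the feature matrix, cutpoints, prior, and iteration count are all illustrative.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Toy setup: n documents with p text features; half labelled on a binary
# scale (corpus 0), half on a three-level scale (corpus 1). Illustrative only.
n, p = 60, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.25])
z_true = X @ beta_true + rng.normal(size=n)
corpus = np.repeat([0, 1], n // 2)
cuts = {0: np.array([-np.inf, 0.0, np.inf]),          # binary annotation scheme
        1: np.array([-np.inf, -0.5, 0.5, np.inf])}    # 3-level annotation scheme
y = np.array([np.searchsorted(cuts[c][1:-1], z) for c, z in zip(corpus, z_true)])

# Gibbs sampler with a N(0, I) prior on beta and unit residual variance.
beta = np.zeros(p)
V = np.linalg.inv(X.T @ X + np.eye(p))  # posterior covariance (fixed here)
for it in range(500):
    # 1) augment: draw each latent complexity z_i from a normal truncated
    #    to the interval its observed ordinal label y_i implies.
    lo = np.array([cuts[c][k] for c, k in zip(corpus, y)])
    hi = np.array([cuts[c][k + 1] for c, k in zip(corpus, y)])
    mu = X @ beta
    z = truncnorm.rvs(lo - mu, hi - mu, loc=mu, scale=1.0, random_state=rng)
    # 2) draw beta from its conjugate Gaussian full conditional.
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
```

Because both corpora share the same latent z scale, documents annotated under either scheme contribute to the same posterior over beta, which is how the combination gets around the small-n problem.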
Background: Individuals with intellectual disabilities (ID) show difficulties with everyday planning. A tablet-based training program for everyday planning may be a suitable intervention, but its feasibility must be evaluated. This study evaluated how behaviour changes during training and if individuals with ID can use technology by themselves. Method: 33 adolescents with ID and 30 younger children with typical development were recruited. The participants were instructed to train in school for a total of 300 minutes. After the intervention, the participants were matched on Mental Age (MA). Results: Only 16% of the participants trained for all 300 minutes. Participants in the MA group trained for a longer time than the ID group. Both groups made fewer errors per task in the end compared to the beginning. Individuals with ID started off making fewer attempts per task and increased their activity during the training. This pattern was not seen in the comparison group. Conclusions: Both...
In this article, we present the results of a corpus-based study where we explore whether it is possible to automatically single out different facets of text complexity in a general-purpose corpus. To this end, we use factor analysis as applied in Biber’s multi-dimensional analysis framework. We evaluate the results of the factor solution by correlating factor scores and readability scores to ascertain whether the selected factor solution matches the independent measurement of readability, which is a notion tightly linked to text complexity. The corpus used in the study is the Swedish national corpus, called the Stockholm-Umeå Corpus or SUC. The SUC contains subject-based text varieties (e.g., hobby), press genres (e.g., editorials), and mixed categories (e.g., miscellaneous). We refer to them collectively as ‘registers’. Results show that it is indeed possible to elicit and interpret facets of text complexity using factor analysis despite some caveats. We propose a tentative text complexi...
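The evaluation step described above, fitting a factor model to per-text feature counts and correlating the resulting factor scores with an external readability measure, can be sketched as follows. This is a synthetic illustration, not the study's data or feature set; the feature matrix and the readability proxy below are made up.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)

# Synthetic stand-in for per-text linguistic feature counts: two underlying
# facets generating six observed features, plus noise (sizes illustrative).
n_texts = 200
latent = rng.normal(size=(n_texts, 2))
loadings = rng.normal(size=(2, 6))
feats = latent @ loadings + 0.3 * rng.normal(size=(n_texts, 6))

# Fit a two-factor solution and obtain per-text factor scores.
fa = FactorAnalysis(n_components=2, random_state=0).fit(feats)
scores = fa.transform(feats)

# Correlate a factor score with an external readability score (here a
# synthetic proxy; the article uses independent readability measures).
readability = feats[:, 0] * 1.5 + rng.normal(size=n_texts)
r = np.corrcoef(scores[:, 0], readability)[0, 1]
```

A strong correlation between a factor score and the independent readability measure is what licenses interpreting that factor as a complexity facet.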
We present results from a study of truck drivers' experience of using two different interfaces, spoken interaction and visual-manual interaction, to perform secondary tasks while driving. The instruments used to measure their experience are based on three popular questionnaires, measuring different aspects of usability and cognitive load: SASSI, SUS and DALI. Our results show that the speech interface is preferred both regarding usability and cognitive demand.
We present an evaluation of an extraction based summarizer based on human assessments of the summaries. In the experiment humans read the various summaries, answered questions on the content of the text, and filled in a questionnaire with subjective assessments. The time it took to read a summary was also measured. The texts were taken from the readability tests of a national test of knowledge and ability to be engaged in university studies (Sw. Högskoleprovet). Our results show that summaries are faster to read, but miss information needed to fully answer questions related to the text, and also that human readers consider them harder to read than the original texts.
In this paper we present a novel technique to capture Web users' behaviour based on their interest-oriented actions. In our approach we utilise the vector space model Random Indexing to identify the latent factors or hidden relationships among Web users' navigational behaviour. Random Indexing is an incremental vector space technique that allows for continuous Web usage mining. User requests are modelled by Random Indexing for individual users' navigational pattern clustering and common user profile creation. Clustering Web users' access patterns may capture common user interests and, in turn, build user profiles for advanced Web applications, such as Web caching and prefetching. We present results from the Web user clustering approach through experiments on a real Web log file with promising results. We also apply our data to a prefetching task and compare that with previous approaches. The results show that Random Indexing provides more accurate prefetching.
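The incremental property of Random Indexing mentioned above can be sketched as follows: each page gets a fixed sparse ternary index vector, and a user's profile is just the running sum of the index vectors of the pages they request, so it can be updated in constant time per request. This is a minimal sketch under assumed parameters (dimensionality, sparsity, the example URLs), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
D, NNZ = 512, 8  # vector dimensionality and non-zeros per index vector (illustrative)

def index_vector():
    """Sparse ternary index vector: a few randomly placed +1/-1 entries."""
    v = np.zeros(D)
    pos = rng.choice(D, size=NNZ, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=NNZ)
    return v

# Each page (context) gets a fixed random index vector.
pages = {url: index_vector() for url in ["/news", "/sport", "/tech"]}

# A user's profile accumulates index vectors of requested pages, incrementally.
profile = np.zeros(D)
for url in ["/news", "/tech", "/news"]:  # hypothetical request stream
    profile += pages[url]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Similarity of the profile to each page reflects the user's interests.
sims = {u: cos(profile, v) for u, v in pages.items()}
```

Because profiles live in a fixed low-dimensional space regardless of how many pages appear in the log, new requests never force a re-dimensioning, which is what makes the mining continuous.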
Although previous studies have shown that errors occur in texts summarized by extraction based summarizers, no study has investigated how common different types of errors are and how that changes with degree of summarization. We have conducted studies of errors in extraction based single document summaries using 30 texts, summarized to 5 different degrees and tagged for errors by human judges. The results show that the most common errors are absent cohesion or context and various types of broken or missing anaphoric references. The amount of errors is dependent on the degree of summarization, where some error types have a linear relation to the degree of summarization and others have U-shaped or cutoff linear relations. These results show that the degree of summarization has to be taken into account to minimize the amount of errors by extraction based summarizers.
We present results from using Random Indexing for Latent Semantic Analysis to handle Singular Value Decomposition tractability issues. We compare Latent Semantic Analysis, Random Indexing, and Latent Semantic Analysis on Random Indexing reduced matrices. In this study we use a corpus comprising 1003 documents from the MEDLINE corpus. Our results show that Latent Semantic Analysis on Random Indexing reduced matrices provides better results on Precision and Recall than Random Indexing alone. Furthermore, computation time for Singular Value Decomposition on a Random Indexing reduced matrix is almost halved compared to Latent Semantic Analysis.
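The reduce-then-decompose pipeline described above can be sketched as follows: project the large term-document matrix down with a sparse random ternary matrix first, then run the SVD on the much smaller result. The matrix sizes, sparsity, and final LSA dimensionality below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy term-document count matrix (sizes illustrative; real corpora are larger).
terms, docs, k = 5000, 300, 200
A = rng.poisson(0.05, size=(terms, docs)).astype(float)

# Random-Indexing-style reduction: sparse ternary projection from the
# term space down to k dimensions.
R = rng.choice([-1.0, 0.0, 1.0], size=(k, terms), p=[0.05, 0.9, 0.05])
B = R @ A  # k x docs -- the SVD now factors this small matrix, not A

# LSA on the reduced matrix: keep the top 50 singular dimensions.
U, s, Vt = np.linalg.svd(B, full_matrices=False)
doc_vecs = (np.diag(s[:50]) @ Vt[:50]).T  # one 50-dim vector per document
```

The cost saving comes from the SVD operating on a k-by-docs matrix instead of a terms-by-docs one, while the random projection approximately preserves the distances the decomposition relies on.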
Analysis of distilled dialogues, i.e. post-processed natural dialogues, is a complement to analyses of dialogues collected either in Wizard of Oz experiments or in natural settings for development of dialogue systems. However, the distillation process itself has been found to provide ...
This paper describes the coding model and coding conventions used for the LINDA and LINLIN dialogue models developed in Linköping. Both dialogue and focus structure are included in the analysis. ...
Papers by Arne Jönsson