The classic problem of the capital cost optimization of branched piped networks consists of choosing pipe diameters for each pipe in the network from a discrete set of commercially available pipe diameters. Each pipe in the network can consist of multiple segments of differing diameters. Water networks also include intermediate tanks that act as buffers between incoming flow from the primary source and the outgoing flow to the demand nodes. The network from the primary source to the tanks is called the primary network, and the network from the tanks to the demand nodes is called the secondary network. During the design stage, the primary and secondary networks are optimized separately, with the tanks acting as demand nodes for the primary network. Typically, the choice of tank locations, their elevations, and the set of demand nodes to be served by different tanks is made manually in an ad hoc fashion before any optimization is done. It is therefore desirable to include this tank configuration choice in the cost optimization process itself. In this work, we explain why the choice of tank configuration is important to the design of a network and describe an integer linear program model that integrates the tank configuration into the standard pipe diameter selection problem. To aid the designers of piped-water networks, the improved cost optimization formulation is incorporated into our existing network design system, JalTantra.
ACM Transactions on Asian Language Information Processing, Dec 1, 2010
Today, parallel corpus-based systems dominate the transliteration landscape. But resource-scarce languages do not enjoy the luxury of a large parallel transliteration corpus. For these languages, rule-based transliteration is the only viable option. In this article, we show that by properly harnessing monolingual resources in conjunction with a manually created rule base, one can achieve reasonable transliteration performance. We achieve this performance by exploiting the power of Character Sequence Modeling (CSM), which requires only monolingual resources. We present the results of our rule-based system for Hindi to English, English to Hindi, and Persian to English transliteration tasks. We also perform an extrinsic evaluation of transliteration systems in the context of Cross Lingual Information Retrieval. Another important contribution of our work is to explain the widely varying accuracy numbers reported in the transliteration literature in terms of the entropy of the language pairs and the datasets involved.
Workshop on Parallel and Distributed Simulation, Jul 1, 1998
In traditional distributed simulation schemes, the entire simulation needs to be restarted if any of the participating LPs crashes. This is highly undesirable for long-running simulations. Some form of fault tolerance is required to minimize the wasted computation. In this paper, a rollback-based optimistic fault-tolerance scheme is integrated with an optimistic distributed simulation scheme. In rollback recovery schemes, checkpoints are periodically saved on stable storage. After a crash, these saved checkpoints are used to restart the computation. We make use of the novel insight that a failure can be modeled as a straggler event with the receive time equal to the virtual time of the last checkpoint saved on stable storage. This results in savings in implementation effort, as well as reduced overheads. We define stable global virtual time (SGVT) as the virtual time such that no state with a lower timestamp will ever be rolled back despite crash failures. A simple change is made in existing GVT algorithms to compute SGVT. Our use of transitive dependency tracking eliminates antimessages. LPs are clubbed in clusters to minimize stable storage access time.
We propose a pre-processing stage for Statistical Machine Translation (SMT) systems where the words of the source sentence are reordered as per the syntax of the target language prior to the alignment process, so that the alignment found by the statistical system is improved. We take a dependency parse of the source sentence and linearize it as per the syntax of the target language before it is used in either the training or the decoding phase. During this linearization, the ordering decisions among dependency nodes having a common parent are made based on two aspects: parent-child positioning and relation priority. To make the linearization process rule-driven, we assume that the relative word order of a dependency relation's relata does not depend either on the semantic properties of the relata or on the rest of the expression. We also assume that the relative word order of various relations sharing a relatum does not depend on the rest of the expression. We experiment with a publicly available English-Hindi parallel corpus and show that our scheme improves the BLEU score.
Existing techniques for the cost optimization of water distribution networks either employ metaheuristics or try to develop problem-specific optimization techniques. Instead, we exploit recent advances in generic nonlinear programming (NLP) solvers and explore a rich set of model refinement techniques. The networks that we study contain a single source and multiple demand nodes with residual pressure constraints. Indeterminism of flow values and flow direction in the network leads to non-linearity in these constraints, making the optimization problem non-convex. While the physical network is cyclic, flow through the network is necessarily acyclic and thus enforces an acyclic orientation. We devise different strategies for finding acyclic orientations and explore the benefit of enforcing such orientations explicitly as a constraint. Finally, we propose a parallel link formulation that models flow in each link as two separate flows with opposing directions. This allows us to tackle numerical difficulties in optimization when flow in a link is near zero. We find that all our proposed formulations give results on par with the least-cost solutions reported in the literature on benchmark networks. We also introduce a suite of large test networks, since existing benchmark networks are small, and find that the parallel link approach outperforms all other approaches on these bigger networks, resulting in a more tractable technique for cost optimization.
English-Hindi parallel corpus collected from several sources. Tokenized and sentence-aligned. A part of the data is our patch for the Emille parallel corpus.
We present a performance study of security overheads in 802.11g networks. At 54 Mbps, the security overhead becomes significant in single-client scenarios for many of the popular security protocols. However, we find that the most secure protocol, WPA2, is also the protocol with the lowest security overhead.
Lexical co-occurrence is an important cue for detecting word associations. We present a theoretical framework for discovering statistically significant lexical co-occurrences from a given corpus. In contrast with the prevalent practice of giving weight to unigram frequencies, we focus only on the documents containing both the terms (of a candidate bigram). We detect biases in span distributions of associated words, while being agnostic to variations in global unigram frequencies. Our framework has the fidelity to distinguish different classes of lexical co-occurrences, based on the strengths of the document-level and corpus-level cues of co-occurrence in the data. We perform extensive experiments on benchmark data sets to study the performance of various co-occurrence measures that are currently known in the literature. We find that a relatively obscure measure called Ochiai and a newly introduced measure, CSA, capture the notion of lexical co-occurrence best, followed by LLR, Dice, and TTest, while another popular measure, PMI, surprisingly performs poorly in the context of lexical co-occurrence.
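For readers unfamiliar with the measures named above, the following is a minimal sketch of the textbook document-frequency forms of Ochiai, Dice, and PMI. It is an illustration only: the function name and its parameters are hypothetical, and the paper's own span-aware definitions may differ. CSA, LLR, and TTest are not sketched because the abstract does not define them.

```python
import math


def cooccurrence_scores(n_docs, df_x, df_y, df_xy):
    """Textbook association scores from document frequencies (illustrative only).

    n_docs: total number of documents in the corpus
    df_x, df_y: number of documents containing word x / word y
    df_xy: number of documents containing both words
    """
    if min(n_docs, df_x, df_y, df_xy) <= 0:
        raise ValueError("all counts must be positive for these formulas")
    # Ochiai: co-occurrence count over the geometric mean of the two frequencies.
    ochiai = df_xy / math.sqrt(df_x * df_y)
    # Dice: co-occurrence count over the arithmetic mean of the two frequencies.
    dice = 2 * df_xy / (df_x + df_y)
    # PMI: log ratio of the observed joint probability to the one expected
    # if the two words occurred independently.
    pmi = math.log2((df_xy * n_docs) / (df_x * df_y))
    return {"ochiai": ochiai, "dice": dice, "pmi": pmi}


scores = cooccurrence_scores(n_docs=1000, df_x=100, df_y=100, df_xy=50)
```

Note that Ochiai and Dice depend only on the three word-level counts, while PMI also depends on the corpus size.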
Traditionally, crop yield has been the main focus of agricultural policies and technological interventions. For designing appropriate agricultural interventions, a holistic set of indicators accounting for the short- and long-term benefits and environmental impacts as well as the socioeconomic sustainability of farmers is needed. In contrast with existing frameworks for assessing farming practices, where the indicators are restricted to preset attributes, we developed a stock-and-flow-based framework for a systemic identification of both short- and long-term indicators. While stock variables inside the system capture the stability and resilience of the system, indicators identified from various dimensions of the biophysical flows across the system–environment boundary capture the desirable outcomes and undesirable impacts. Our framework also aids in the selection of appropriate proxy indicators for hard-to-measure primary indicators by tracing their forward and backward linkages rather than avoiding them due to their complexity.
Government bodies responsible for drinking water distribution in India face the challenging task of designing schemes that provide a quality of service that is adequate to meet the needs of citizens at a cost below the strict government norms. Engineers at these government bodies must undertake the design process using tools that are not optimal and consider only pipe diameter selection, which is only one component of the entire scheme design. As such, much of the design process is undertaken in an ad hoc and heuristic manner, relying on the experience and intuition of the engineers. We developed JalTantra, a web system that aids these government engineers in sizing both pipe diameters and the various other water network components, such as tanks, pumps, and valves. We use an integer linear program model, which allows us to solve the problem optimally and quickly. History: This paper was refereed.
Word sense disambiguation (WSD) is the task of selecting the appropriate sense of a word in a given context. It is the essence of communication in a natural language. It is motivated by its use in many crucial applications such as information retrieval, information extraction, machine translation, and part-of-speech tagging. Various issues like scalability, ambiguity, diversity (of languages), and evaluation pose challenges to WSD solutions. The aim of this project is to develop a WSD technique that can handle all these issues with better accuracy and performance. This report presents our preliminary work towards solving the problem.
We use a Phrase-Based Statistical Machine Translation approach to transliteration, where words are replaced by characters and sentences by words. We employ standard SMT tools: GIZA++ for learning alignments and Moses for learning the phrase tables and decoding. Besides tuning the standard SMT parameters, we focus on tuning the parameters related to the Character Sequence Model (CSM), such as the order of the CSM, the weight assigned to the CSM during decoding, and the corpus used for CSM estimation. Our results show that paying sufficient attention to the CSM pays off in terms of increased transliteration accuracy.
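The substitution of characters for words and words for sentences can be illustrated with a small preprocessing sketch. This is an assumption about how such data might be prepared, not the authors' actual pipeline: the function names are hypothetical, and splitting on Unicode code points is a simplification (scripts with combining marks may call for a different segmentation).

```python
def to_character_tokens(word):
    """Turn one word into a space-separated 'sentence' of characters."""
    return " ".join(word.strip())


def make_parallel_lines(name_pairs):
    """Convert (source, target) word pairs into two aligned lists of lines,
    the shape of input that word-level SMT training tools typically expect."""
    source_lines = [to_character_tokens(src) for src, _ in name_pairs]
    target_lines = [to_character_tokens(tgt) for _, tgt in name_pairs]
    return source_lines, target_lines


# Hypothetical romanized example pair, for illustration only.
src_lines, tgt_lines = make_parallel_lines([("dilli", "delhi")])
```

With the data in this form, each character plays the role of a word, so a "phrase" learned by the SMT system is a character sequence and the language model becomes a character sequence model.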
In the correct-by-construction programming methodology, programs are incrementally derived from their formal specifications by repeatedly applying transformations to partially derived programs. At an intermediate stage in a derivation, users may have to make certain assumptions to proceed further. To ensure that the assumptions hold true at that point in the program, certain other assumptions may need to be introduced upstream as loop invariants or preconditions. Typically, these other assumptions are made in an ad hoc fashion and may result in unnecessary rework or, worse, complete exclusion of some of the alternative solutions. In this work, we present rules for propagating assumptions through annotated programs. We show how these rules can be integrated into a top-down derivation methodology to provide a systematic approach for propagating the assumptions, materializing them with executable statements at a place different from the place of introduction, and strengthening loop invariants with minimal additional proof effort.
Electronic Notes in Theoretical Computer Science, Dec 1, 2011
Many properties of a system may not be obvious just by a quick inspection of the corresponding Event-B model. Users typically rely on animation, scenario analysis, and inspection of state transition graphs for discovering certain behavior of the system. We propose a methodology for generating a hierarchical representation of the system for visualising Event-B models. Our representation is succinct and it provides multiple views to aid in better comprehension of the Event-B models.
Drinking Water Engineering and Science, Jun 9, 2017
The classic problem of the capital cost optimization of branched piped networks consists of choosing pipe diameters for each pipe in the network from a discrete set of commercially available pipe diameters. Each pipe in the network can consist of multiple segments of differing diameters. Water networks also include intermediate tanks that act as buffers between incoming flow from the primary source and the outgoing flow to the demand nodes. The network from the primary source to the tanks is called the primary network, and the network from the tanks to the demand nodes is called the secondary network. During the design stage, the primary and secondary networks are optimized separately, with the tanks acting as demand nodes for the primary network. Typically, the choice of tank locations, their elevations, and the set of demand nodes to be served by different tanks is made manually in an ad hoc fashion before any optimization is done. It is therefore desirable to include this tank configuration choice in the cost optimization process itself. In this work, we explain why the choice of tank configuration is important to the design of a network and describe an Integer Linear Program (ILP) model that integrates this choice into the standard pipe diameter selection problem. To aid the designers of piped water networks, the improved cost optimization formulation is incorporated into our existing network design system, JalTantra.
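As background, the standard pipe diameter selection problem for a branched network is commonly written as a linear program over segment lengths. The sketch below shows that commonly used base form only; the symbols are assumptions introduced here, and it does not reproduce the paper's tank configuration model, which adds integer variables on top of it. Here $l_{ij}$ is the length of pipe $i$ laid with commercial diameter $j$, $C_j$ is the unit cost of diameter $j$, $L_i$ is the total length of pipe $i$, $h_{ij}$ is the head loss per unit length of pipe $i$ at diameter $j$ (known in advance because flows in a branched network follow from the demands), $H_s$ is the source head, and $E_n$ and $P_n$ are the elevation and minimum required pressure at demand node $n$.

```latex
\begin{align}
\min \quad & \sum_{i} \sum_{j} C_j \, l_{ij} \\
\text{s.t.} \quad & \sum_{j} l_{ij} = L_i && \forall i \\
& H_s - E_n - \sum_{i \in \mathrm{path}(n)} \sum_{j} h_{ij} \, l_{ij} \ge P_n && \forall n \\
& l_{ij} \ge 0 && \forall i, j
\end{align}
```

Allowing several $l_{ij}$ to be nonzero for the same pipe is what lets a pipe consist of multiple segments of differing diameters.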
The Government of India conducts a well census every five years. It is time-consuming, costly, and usually incomplete. By using transfer learning-based object detection algorithms, we have built a system for the automatic detection of wells in satellite images. We analyze the performance of three object detection algorithms (Convolutional Neural Network, Haar Cascade, and Histogram of Oriented Gradients) on the task of well detection and find that the Convolutional Neural Network-based YOLOv2 performs best; it forms the core of our system. Our current system has a precision of 0.95 and a recall of 0.91 on our dataset. The main contribution of our work is a novel open-source system for well detection in satellite images and an associated dataset, which will be put in the public domain. A related contribution is the development of a general-purpose satellite image annotation system to annotate and validate objects in satellite images. While our focus is on well detection, the system is general purpose and can be used for the detection of other objects as well.
Agroecology and Sustainable Food Systems, Dec 24, 2018
A holistic set of indicators based on a stock and flow framework is used to assess farming practices across socioeconomic and ecological dimensions. We design a methodology to estimate, normalize, and aggregate the indicators to form composite indices. The indicators under each dimension are aggregated using the progressive weighted average to give three dimensional indices, viz. economic, social, and ecological indices, which are aggregated to give a single holistic index called the Farm Assessment Index (FAI). Unlike other approaches, where the comparison of farming systems is restricted to the sample under study, normalization of indicators using regional averages makes the FAI suitable for universal comparisons of farming systems across crops and regions. The methodology was applied to evaluate the farming practices of 60 organic and 60 conventional farmers from two Indian states over three years. The results from the application of the FAI demonstrate that a focus on yield or income as the sole indicator for policy decisions will not lead to sustainable farming systems. Policy makers need to shift toward holistic measures emphasizing human health, the livelihood of farmers, and the sustenance provided by agroecology. Case studies show the FAI to be a valuable tool for decision-makers in assessing farm practices and designing better agricultural policies and programs.
In this paper, we present our Hindi to English and Marathi to English CLIR systems developed as part of our participation in the CLEF 2007 Ad-Hoc Bilingual task. We take a query translation-based approach using bilingual dictionaries. Query words not found in the dictionary are transliterated using a simple rule-based transliteration approach. The resultant transliteration is then compared with the unique words of the corpus to return the 'k' words most similar to the transliterated word. The resulting multiple translation/transliteration choices for each query word are disambiguated using an iterative PageRank-style algorithm which, based on term-term co-occurrence statistics, produces the final translated query. Using the above approach, for Hindi, we achieve a Mean Average Precision (MAP) of 0.2366 using title and a MAP of 0.2952 using title and description. For Marathi, we achieve a MAP of 0.2163 using title.
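Since the results above are reported as Mean Average Precision, a minimal sketch of how MAP is conventionally computed may help. The function names and toy document IDs are hypothetical, the sketch assumes each ranking contains no repeated document IDs, and official evaluations rely on standard tooling rather than code like this.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data: two queries with made-up document IDs.
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
toy_map = mean_average_precision(toy_runs)
```

Relevant documents that are never retrieved still count in the denominator, so a system is penalized for missing them.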
The classic problem of the capital cost optimization of branched piped networks consists of choos... more The classic problem of the capital cost optimization of branched piped networks consists of choosing pipe diameters for each pipe in the network from a discrete set of commercially available pipe diameters. Each pipe in the network can consist of multiple segments of differing diameters. Water networks also consist of intermediate tanks that act as buffers between incoming flow from the primary source and the outgoing flow to the demand nodes. The network from the primary source to the tanks is called the primary network, and the network from the tanks to the demand nodes is called the secondary network. During the design stage, the primary and secondary networks are optimized separately, with the tanks acting as demand nodes for the primary network. Typically the choice of tank locations, their elevations, and the set of demand nodes to be served by different tanks is manually made in an ad hoc fashion before any optimization is done. It is desirable therefore to include this tank configuration choice in the cost optimization process itself. In this work, we explain why the choice of tank configuration is important to the design of a network and describe an integer linear program model that integrates the tank configuration to the standard pipe diameter selection problem. In order to aid the designers of piped-water networks, the improved cost optimization formulation is incorporated into our existing network design system called JalTantra.
ACM Transactions on Asian Language Information Processing, Dec 1, 2010
Today, parallel corpus-based systems dominate the transliteration landscape. But the resourcescar... more Today, parallel corpus-based systems dominate the transliteration landscape. But the resourcescarce languages do not enjoy the luxury of large parallel transliteration corpus. For these languages, rule-based transliteration is the only viable option. In this article, we show that by properly harnessing the monolingual resources in conjunction with manually created rule base, one can achieve reasonable transliteration performance. We achieve this performance by exploiting the power of Character Sequence Modeling (CSM), which requires only monolingual resources. We present the results of our rule-based system for Hindi to English, English to Hindi, and Persian to English transliteration tasks. We also perform extrinsic evaluation of transliteration systems in the context of Cross Lingual Information Retrieval. Another important contribution of our work is to explain the widely varying accuracy numbers reported in transliteration literature, in terms of the entropy of the language pairs and the datasets involved.
Workshop on Parallel and Distributed Simulation, Jul 1, 1998
In traditional distributed simulation schemes, entire simulation needs to be restarted if any of ... more In traditional distributed simulation schemes, entire simulation needs to be restarted if any of the participating LP crashes. This is highly undesirable for long running simulations. Some form of fault-tolerance is required to minimize the wasted c omputation. In this paper, a rollback based optimistic faulttolerance scheme is integrated with an optimistic distributed simulation scheme. In rollback recovery schemes, checkpoints are periodically saved on stable storage. After a crash, these saved checkpoints are used t o r estart the computation. We make use of the novel insight that a failure c an be modeled as a straggler event with the receive time equal to the virtual time of the last checkpoint saved on stable storage. This results in saving of implementation e orts, as well as reduced overheads. We de ne stable global virtual time SGVT, as the virtual time such that no state with a lower timestamp will ever be rolled back despite crash failures. A simple change is made in existing GVT algorithms to compute SGVT. Our use of transitive dependency tracking eliminates antimessages. LPs are clubbed in clusters to minimize stable storage access time.
We propose a pre-processing stage for Statistical Machine Translation (SMT) systems where the wor... more We propose a pre-processing stage for Statistical Machine Translation (SMT) systems where the words of the source sentence are reordered as per the syntax of the target language prior to the alignment process, so that the alignment found by the statistical system is improved. We take a dependency parse of the source sentence and linearize it as per the syntax of the target language, before it is used in either the training or the decoding phase. During this linearization, the ordering decisions among dependency nodes having a common parent are done based on two aspects: parent-child positioning and relation priority. To make the linearization process rule-driven, we assume that the relative word order of a dependency relation's relata does not depend either on the semantic properties of the relata or on the rest of the expression. We also assume that the relative word order of various relations sharing a relata does not depend on the rest of the expression. We experiment with a publicly available English-Hindi parallel corpus and show that our scheme improves the BLEU score.
Existing techniques for the cost optimization of water distribution networks either employ metahe... more Existing techniques for the cost optimization of water distribution networks either employ metaheuristics, or try to develop problem-specific optimization techniques. Instead, we exploit recent advances in generic NLP solvers and explore a rich set of model refinement techniques. The networks that we study contain a single source and multiple demand nodes with residual pressure constraints. Indeterminism of flow values and flow direction in the network leads to non-linearity in these constraints making the optimization problem non-convex. While the physical network is cyclic, flow through the network is necessarily acyclic and thus enforces an acyclic orientation. We devise different strategies of finding acyclic orientations and explore the benefit of enforcing such orientations explicitly as a constraint. Finally, we propose a parallel link formulation that models flow in each link as two separate flows with opposing directions. This allows us to tackle numerical difficulties in optimization when flow in a link is near zero. We find that all our proposed formulations give results at par with least cost solutions obtained in the literature on benchmark networks. We also introduce a suite of large test networks since existing benchmark networks are small in size, and find that the parallel link approach outperforms all other approaches on these bigger networks, resulting in a more tractable technique of cost optimization.
English-Hindi parallel corpus collected from several sources. Tokenized and sentence-aligned. A p... more English-Hindi parallel corpus collected from several sources. Tokenized and sentence-aligned. A part of the data is our patch for the Emille parallel corpus.
We present a performance study of security overheads in 802.11g networks. At 54 Mbps, the securit... more We present a performance study of security overheads in 802.11g networks. At 54 Mbps, the security overhead becomes significant in single client scenarios for many of the popular security protocols. However, we find that the most secure protocol, WPA•2, is also the protocol with the lowest security overhead.
Lexical co-occurrence is an important cue for detecting word associations. We present a theoretic... more Lexical co-occurrence is an important cue for detecting word associations. We present a theoretical framework for discovering statistically significant lexical co-occurrences from a given corpus. In contrast with the prevalent practice of giving weightage to unigram frequencies, we focus only on the documents containing both the terms (of a candidate bigram). We detect biases in span distributions of associated words, while being agnostic to variations in global unigram frequencies. Our framework has the fidelity to distinguish different classes of lexical co-occurrences, based on strengths of the document and corpuslevel cues of co-occurrence in the data. We perform extensive experiments on benchmark data sets to study the performance of various co-occurrence measures that are currently known in literature. We find that a relatively obscure measure called Ochiai, and a newly introduced measure CSA capture the notion of lexical co-occurrence best, followed next by LLR, Dice, and TTest, while another popular measure, PMI, suprisingly, performs poorly in the context of lexical co-occurrence.
Traditionally, crop yield has been the main focus of agricultural policies and technological inte... more Traditionally, crop yield has been the main focus of agricultural policies and technological interventions. For designing appropriate agricultural interventions, a holistic set of indicators accounting for the short and long-term benefits and environmental impacts as well as socioeconomic sustainability of farmers is needed. In contrast with existing frameworks for assessing farming practices where the indicators are restricted to a preset attributes, we developed a stock and flow based framework for a systemic identification of both short and long-term indicators. While stock variables inside the system capture the stability and resilience of the system, indicators identified from various dimensions of the biophysical flows across the system–environment boundary capture the desirable outcomes and undesirable impacts. Our framework also aids in selection of appropriate proxy indicators for hard to measure primary indicators by tracing their forward and backward linkages rather than avoiding them due to their complexity.
Government bodies responsible for drinking water distribution in India face the challenging task ... more Government bodies responsible for drinking water distribution in India face the challenging task of designing schemes that provide a quality of service that is adequate to meet the needs of citizens at a cost below the strict government norms. Engineers at these government bodies must undertake the design process using tools that are not optimal and consider only pipe diameter selection, which is only one component of the entire scheme design. As such, much of the design process is undertaken in an ad hoc and heuristic manner, relying on the experience and intuition of the engineers. We developed JalTantra, a web system that aids these government engineers in sizing both pipe diameters and the various other water network components, such as tanks, pumps, and valves. We use an integer linear program model, which allows us to solve the problem optimally and quickly. History: This paper was refereed.
Word sense disambiguation (WSD) is the task of selecting the appropriate senses of a word in a gi... more Word sense disambiguation (WSD) is the task of selecting the appropriate senses of a word in a given context. It is essence of communication in a natural language. It is motivated by its use in many crucial applications such as Information retrieval, Information extraction, Machine Translation, Partof-Speech tagging, etc. Various issues like scalability, ambiguity, diversity (of languages) and evaluation pose challenges to WSD solutions. The aim of this project is to develop a WSD technique which can handle all these issues with better accuracy and performance. This report presents our preliminary work towards solving the problem.
We use a Phrase-Based Statistical Machine Translation approach to Transliteration where the words... more We use a Phrase-Based Statistical Machine Translation approach to Transliteration where the words are replaced by characters and sentences by words. We employ the standard SMT tools like GIZA++ for learning alignments and Moses for learning the phrase tables and decoding. Besides tuning the standard SMT parameters, we focus on tuning the Character Sequence Model (CSM) related parameters like order of the CSM, weight assigned to CSM during decoding and corpus used for CSM estimation. Our results show that paying sufficient attention to CSM pays off in terms of increased transliteration accuracies.
In the correct-by-construction programming methodology, programs are incrementally derived from t... more In the correct-by-construction programming methodology, programs are incrementally derived from their formal specifications, by repeatedly applying transformations to partially derived programs. At an intermediate stage in a derivation, users may have to make certain assumptions to proceed further. To ensure that the assumptions hold true at that point in the program, certain other assumptions may need to be introduced upstream as loop invariants or preconditions. Typically these other assumptions are made in an ad hoc fashion and may result in unnecessary rework, or worse, complete exclusion of some of the alternative solutions. In this work, we present rules for propagating assumptions through annotated programs. We show how these rules can be integrated in a top-down derivation methodology to provide a systematic approach for propagating the assumptions, materializing them with executable statements at a place different from the place of introduction, and strengthening of loop invariants with minimal additional proof efforts.
Electronic Notes in Theoretical Computer Science, Dec 1, 2011
Many properties of a system may not be obvious just by a quick inspection of the corresponding Event-B model. Users typically rely on animation, scenario analysis, and inspection of state transition graphs for discovering certain behavior of the system. We propose a methodology for generating a hierarchical representation of the system for visualising Event-B models. Our representation is succinct and it provides multiple views to aid in better comprehension of the Event-B models.
Drinking Water Engineering and Science, Jun 9, 2017
The classic problem of the capital cost optimization of branched piped networks consists of choosing pipe diameters for each pipe in the network from a discrete set of commercially available pipe diameters. Each pipe in the network can consist of multiple segments of differing diameters. Water networks also consist of intermediate tanks that act as buffers between incoming flow from the primary source and the outgoing flow to the demand nodes. The network from the primary source to the tanks is called the primary network, and the network from the tanks to the demand nodes is called the secondary network. During the design stage, the primary and secondary networks are optimized separately, with the tanks acting as demand nodes for the primary network. Typically the choice of tank locations, their elevations, and the set of demand nodes to be served by different tanks is made manually in an ad hoc fashion before any optimization is done. It is therefore desirable to include this tank configuration choice in the cost optimization process itself. In this work, we explain why the choice of tank configuration is important to the design of a network and describe an Integer Linear Program (ILP) model that integrates the tank configuration into the standard pipe diameter selection problem. To aid the designers of piped water networks, the improved cost optimization formulation is incorporated in our existing network design system called JalTantra.
The Government of India conducts a well census every five years. It is time-consuming, costly, and usually incomplete. By using transfer learning-based object detection algorithms, we have built a system for the automatic detection of wells in satellite images. We analyze the performance of three object detection algorithms (Convolutional Neural Network, Haar Cascade, and Histogram of Oriented Gradients) on the task of well detection and find that the Convolutional Neural Network-based YOLOv2 performs best; it forms the core of our system. Our current system has a precision of 0.95 and a recall of 0.91 on our dataset. The main contribution of our work is a novel open-source system for well detection in satellite images and an associated dataset, which will be put in the public domain. A related contribution is the development of a general-purpose satellite image annotation system to annotate and validate objects in satellite images. While our focus is on well detection, the system is general purpose and can be used for detection of other objects as well.
Agroecology and sustainable food systems, Dec 24, 2018
A holistic set of indicators using a stock and flow framework is used to assess farming practices across socioeconomic and ecological dimensions. We design a methodology to estimate, normalize, and aggregate the indicators to form composite indices. The indicators under each dimension are aggregated using the progressive weighted average to give three dimensional indices, viz. the economic, social, and ecological indices, which are in turn aggregated to give a single holistic index called the Farm Assessment Index (FAI). Unlike other approaches, where the comparison of farming systems is restricted to the sample under study, normalization of indicators using regional averages makes the FAI suitable for universal comparisons of farming systems across crops and regions. The methodology was applied to evaluate the farming practices of 60 organic and 60 conventional farmers from two Indian states over three years. The results from the application of the FAI demonstrate that a focus on yield or income as the sole indicator for policy decisions will not lead to sustainable farming systems. Policy makers need to shift toward holistic measures emphasizing human health, the livelihood of farmers, and the sustenance provided by agroecology. Case studies show the FAI to be a valuable tool for decision-makers in assessing farm practices and designing better agricultural policies and programs.
In this paper, we present our Hindi to English and Marathi to English CLIR systems developed as part of our participation in the CLEF 2007 Ad-Hoc Bilingual task. We take a query-translation-based approach using bilingual dictionaries. Query words not found in the dictionary are transliterated using a simple rule-based transliteration approach. The resultant transliteration is then compared with the unique words of the corpus to return the 'k' words most similar to the transliterated word. The resulting multiple translation/transliteration choices for each query word are disambiguated using an iterative PageRank-style algorithm which, based on term-term co-occurrence statistics, produces the final translated query. Using the above approach, for Hindi, we achieve a Mean Average Precision (MAP) of 0.2366 using title and a MAP of 0.2952 using title and description. For Marathi, we achieve a MAP of 0.2163 using title.
Papers by Om Damani