Papers by Vinicius Rofatto

Journal of Surveying Engineering
In the development of neural networks, many realizations are performed to decide which solution provides the smallest prediction error. Due to the inevitable random errors associated with the data and the randomness related to the network (e.g., the initialization of the weights and the initial conditions linked to the learning procedure), there is usually no single optimal solution. However, we can take advantage of the idea of making several realizations based on resampling methods. Resampling methods are often used to replace theoretical assumptions by repeatedly resampling the original data and making inferences from the resamples. Resampling methods give us the opportunity to make an interval prediction instead of only a point prediction. Following this idea, we introduce three resampling methods in neural networks, namely Delete-d Jackknife Trials, Delete-1 Jackknife Trials, and Hold-Out Trials. They are discussed and applied to a real coordinate transformation problem. Although the Delete-1 Jackknife Trials offer better results, the choice of resampling method will depend on the dimension of the problem at hand.
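As a rough illustration of the Delete-1 Jackknife Trials idea, the sketch below retrains a small network once per left-out observation and summarizes the spread of the resulting predictions as an interval instead of a single point prediction. It is a minimal Python sketch using scikit-learn; the 2-D coordinate data, network size and query point are all hypothetical, not the data of the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
n = 30
src = rng.uniform(0, 1, size=(n, 2))                          # normalized source coordinates (hypothetical)
tgt = 0.99 * src + 0.05 + rng.normal(0, 0.001, size=(n, 2))   # target coordinates with small random noise

query = np.array([[0.5, 0.5]])                                # point to be transformed
preds = []
for i in range(n):                                            # Delete-1 Jackknife Trials: leave one observation out per trial
    keep = np.delete(np.arange(n), i)
    net = MLPRegressor(hidden_layer_sizes=(20,), solver="lbfgs", max_iter=5000, random_state=i)
    net.fit(src[keep], tgt[keep])
    preds.append(net.predict(query)[0])

preds = np.array(preds)
print("point prediction (mean over trials):", preds.mean(axis=0))
print("95% interval from the trials:")
print(np.percentile(preds, [2.5, 97.5], axis=0))
```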
IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium

IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium
The Total Electron Content (TEC) derived from Global Navigation Satellite System (GNSS) data processing has been used as a tool for monitoring earthquakes. The purpose of this study is to bring an alternative approach to the prediction of earthquakes and the determination of their magnitudes based on Artificial Neural Networks (ANN) and ionospheric disturbances. For this, Vertical Total Electron Content (VTEC) data from the National Oceanic and Atmospheric Administration (NOAA) were used to train the ANN. Results show that the ANN achieved an accuracy of 85.71% in the validation assessment when predicting the Tres Picos Mw = 8.2 earthquake from 01:30 UTC to 04:00 UTC, approximately 3 hours before the seismic event. For magnitude classification, the ANN achieved an accuracy of 94.60%. The Matthews Correlation Coefficient (MCC), which takes into account all true/false positives and negatives, was also evaluated and showed promising results.
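The following is a minimal sketch of the kind of pipeline the abstract describes, not the authors' actual model: a small MLP classifier is trained on windows of VTEC values to flag disturbed epochs, and accuracy and the MCC are reported. The synthetic VTEC windows, labels and disturbance signature are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, matthews_corrcoef

rng = np.random.default_rng(2019)
n, window = 600, 12                           # 12 VTEC samples per window (hypothetical)
X = rng.normal(30.0, 5.0, size=(n, window))   # background VTEC in TEC units
y = rng.integers(0, 2, n)                     # 1 = pre-seismic ionospheric disturbance (toy label)
X[y == 1] += np.linspace(0, 8, window)        # toy disturbance signature added to positive windows

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", round(accuracy_score(y_te, pred), 3))
print("MCC     :", round(matthews_corrcoef(y_te, pred), 3))
```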

Revista Brasileira de Cartografia, Feb 28, 2015
One of the meteorological products derived from high-precision GNSS data processing is the estimate of the zenith tropospheric delay, which can be used to quantify the integrated water vapor in the atmospheric column, an important measurement for the atmospheric sciences. Since these observations are affected by several error sources, different mitigation or treatment methodologies can be employed when using different softwares or processing strategies, which ultimately yields solutions with subtle differences. Combining zenith tropospheric delay time series obtained with different methodologies aims to generate a single solution that is more reliable than the individual ones. This article proposes a near real-time combination of the zenith tropospheric delay based on multiple solutions for a specific epoch. Under these circumstances, the combination was obtained by the least squares method, with quality control performed through the detection, identification and adaptation of eventual inconsistencies. The zenith tropospheric delay estimates were obtained with the GNSS data processing softwares GAMIT and GIPSY-OASIS II, which use different processing methods and strategies. Both softwares represent the state of the art in GNSS data processing. Several possibilities were investigated and their impacts on the tropospheric delay estimates could be assessed. The time series were generated, and the combined solutions for each epoch in a sliding window were produced together with the bias between the tropospheric delay estimates of each software. Using this procedure...
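A minimal sketch of the per-epoch combination step described above, assuming only two solutions, a simple weighted least-squares (inverse-variance) combination and a crude normalized-difference check standing in for the detection stage of a DIA-style quality control; all values and sigmas are hypothetical.

```python
import numpy as np

ztd = np.array([2.4512, 2.4531])      # ZTD estimates from two softwares at one epoch (m), hypothetical
sig = np.array([0.004, 0.006])        # their standard deviations (m), hypothetical

w = 1.0 / sig**2                      # inverse-variance weights
ztd_comb = np.sum(w * ztd) / np.sum(w)
sig_comb = np.sqrt(1.0 / np.sum(w))

# detection step: normalized difference between the two solutions
t = abs(ztd[0] - ztd[1]) / np.sqrt(sig[0]**2 + sig[1]**2)
flag = "consistent" if t < 3.29 else "inconsistent"   # 3.29 is roughly the two-sided 0.1% normal quantile

print(f"combined ZTD = {ztd_comb:.4f} m (sigma = {sig_comb:.4f} m), solutions {flag}")
```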
Geo-spatial Information Science, 2022

For more than half a century, the reliability theory introduced by Baarda (1968) has been used as standard practice for quality control in geodesy. Although it meets mathematical rigor and the assumptions of probability, the theory was originally developed for a Data Snooping that considers a specific observation as an outlier. In practice, we do not know which observation is an outlier. If the goal of the Data Snooping procedure is to screen each individual observation for an outlier, then a more appropriate alternative hypothesis would be: "There is at least one outlier in the observed data". Now we are interested in answering: "Where?". The answer to this question lies in the problem of locating, among the alternative hypotheses, the one that led to the rejection of the null hypothesis; in other words, we are interested in identifying the outlier. This problem is known as the multiple alternative hypotheses problem. Although advances have been made over this period, the theories presented so far consider only a single round of Data Snooping, without any subsequent diagnosis such as the removal of the outlier. In practice, however, Data Snooping is applied iteratively: after the identification and elimination of a possible outlier, the data are reprocessed and the identification is restarted. This procedure is called Iterative Data Snooping (IDS). IDS is therefore a case that involves not only multiple alternative hypotheses but also multiple rounds of estimation, testing and adaptation. Estimating the probability levels associated with IDS is practically impossible with the analytical methods usually employed in simpler procedures, such as the global model test and Data Snooping with a single alternative hypothesis. For this reason, a rigorous and complete reliability theory was not available until now. Although great advances took place in the mid-1970s, such as microprocessor-based computers, Baarda had a disadvantage: the technology of his time was insufficient to apply intelligent computational techniques. Today the computational scenario is completely different from the time of Baarda's reliability theory. Here, following the current trend of modern science, we use the Monte Carlo method and extend the reliability theory to IDS. In this work, we demonstrate that the estimation depends on the testing and the adaptation and, therefore, IDS is in fact an estimator. Until now, the choice of the number of Monte Carlo simulations has been evaluated only in terms of precision. This raised a question: how can we find an optimal number of Monte Carlo experiments in terms of accuracy? Here, we use events with known probabilities to evaluate the accuracy of the Monte Carlo method. The results showed that, among the tested numbers of experiments, m = 200,000 provided sufficient numerical precision, with a relative error smaller than 0.1%. The test statistic associated with IDS is the extreme value of the normalized least squares residuals. It is well known in the literature that critical values of this test cannot be derived from known distributions, but must be computed numerically by means of the Monte Carlo method.
This work provides the first results on the Monte-Carlo-based critical value under different scenarios of correlation between the test statistics. We tested whether increasing the joint significance level, or reducing the critical value, improves outlier identifiability. The results showed that the smaller the critical value, or the larger the joint significance level, the higher the probability of correct detection and the smaller the MDB (Minimal Detectable Bias). However, this relationship does not hold in terms of identification. We observed that, when the effect of all observations on the false exclusion rate (Type III error) decreases, it is possible to find the minimal identifiable bias (MIB). The reason is that the effect of the correlation between the residuals becomes insignificant for a certain outlier magnitude, which increases the probability of correct identification.
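To make the Monte Carlo idea concrete, the sketch below simulates Iterative Data Snooping on a tiny, hypothetical linear model: in each run a single outlier is injected, the max-|w| test and the removal step are iterated, and the fraction of runs in which the truly biased observation is removed approximates the probability of correct identification. The design matrix, sigma, outlier size and critical value are illustrative assumptions, not values from the work above.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.], [1., 2.]])  # hypothetical design matrix (n=5, u=2)
sigma = 1.0                     # known standard deviation of the observations
k = 3.29                        # illustrative critical value for max |w|
outlier = 6.0 * sigma           # size of the injected outlier
m_runs, correct = 20000, 0

def snoop(y, A, sigma, k):
    """Iterative Data Snooping: estimate, test, remove the flagged observation, repeat."""
    idx = np.arange(len(y))
    removed = []
    while len(y) > A.shape[1] + 1:
        N = A.T @ A
        x = np.linalg.solve(N, A.T @ y)
        v = y - A @ x                                                    # least-squares residuals
        Qv = sigma**2 * (np.eye(len(y)) - A @ np.linalg.solve(N, A.T))   # residual covariance matrix
        w = np.abs(v) / np.sqrt(np.diag(Qv))                             # w-test statistics
        j = int(np.argmax(w))
        if w[j] < k:                                                     # no test exceeds the critical value: stop
            break
        removed.append(int(idx[j]))                                      # adaptation: remove the flagged observation
        y, A, idx = np.delete(y, j), np.delete(A, j, axis=0), np.delete(idx, j)
    return removed

for _ in range(m_runs):
    y = A @ np.array([10., 5.]) + rng.normal(0, sigma, A.shape[0])
    y[2] += outlier                                                      # inject a single outlier in observation 2
    correct += (snoop(y, A, sigma, k) == [2])

print(f"estimated probability of correct identification: {correct / m_runs:.3f}")
```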
dataset = first column: Number ID; second column: North UTM (m); third column: East UTM (m); fourth column: pH; fifth column: pH; sixth column: Ca; seventh column: Mg; eighth column: P; and ninth column: K. jack_40.m: Example of the JACK-1T script for the case where the sample was randomly and uniformly reduced to 40%. redSample.m: Script that randomly and uniformly reduces the sample size to a desired percentage.

corrMatrix.m: This Matlab function computes the correlation matrix of the w-test statistics. KMC.m: This Matlab function computes the critical values for the max-w test statistic based on the Monte Carlo method. corrMatrix.m must be run before using it. kNN.m: This Matlab function, based on neural networks, allows anyone to obtain the desired critical value with good control of the Type I error. In that case, you need to download the file SBPNN.mat and save it in your folder. corrMatrix.m must be run before using it. SBPNN.mat: MATLAB's flexible network object (called SBPNN.mat) that allows anyone to obtain the desired critical value with good control of the Type I error. Examples.txt: File containing examples of both design and covariance matrices for adjustment problems of geodetic networks. rawMC.txt: Monte-Carlo-based critical values for the following significance levels: α′ = 0.001, α′ = 0.01, α′ = 0.05, α′ = 0.1 and α′ = 0.5. The number of observations (n) was fixed for each α′, ranging from n = 5 to n = 100 in increments of 5. For each n, the correlation between the w-tests (ρwi,wj) was also fixed, ranging from ρwi,wj = 0.00 to ρwi,wj = 1.00 in increments of 0.1, also taking into account the correlation ρwi,wj = 0.999. For each combination of α′, n and ρwi,wj, m = 5,000,000 Monte Carlo experiments were run.
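A minimal sketch of the computation carried out by a routine like KMC.m, under the assumption that the w-tests are jointly standard normal under the null hypothesis with a given correlation matrix: the critical value is then the (1 - α′) quantile of the maximum absolute value of correlated normal deviates. The 3 x 3 correlation matrix is hypothetical, and fewer Monte Carlo experiments are used here than the 5,000,000 behind rawMC.txt.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])     # correlation matrix of the w-tests (illustrative; from corrMatrix.m in practice)
alpha = 0.001                       # joint significance level alpha'
m = 1_000_000                       # Monte Carlo experiments (rawMC.txt used 5,000,000)

w = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=m, method="cholesky")
max_w = np.abs(w).max(axis=1)       # max |w| over the correlated w-tests in each experiment
k = np.quantile(max_w, 1.0 - alpha)
print(f"Monte Carlo critical value for max |w| at alpha' = {alpha}: {k:.3f}")
```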

Mathematical Problems in Engineering, 2021
Robust estimation has proved to be a valuable alternative to the least squares estimator for cases where the dataset is contaminated with outliers. Many robust estimators have been designed to be minimally affected by the outlying observations and to produce a good fit for the majority of the data. Among them, the redescending estimators have demonstrated the best estimation capabilities. It is little known, however, that the success of a robust estimation method depends not only on the robust estimator used but also on the way the estimator is computed. In the present paper, we show that for complicated cases, the predominant method of computing the robust estimator by means of an iteratively reweighted least squares scheme may result in a local optimum of significantly lower quality than the global optimum attainable by means of a global optimization method. Further, the sequential use of the proposed global robust estimation proves to successfully solve the problem of M-split estimation.
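A minimal sketch of the computational point the abstract makes: a redescending M-estimator (here Tukey's biweight) computed by iteratively reweighted least squares depends on its starting point because the loss is non-convex, which is exactly where a local optimum can appear; a global optimization method would instead search over starting points or over the parameter space directly. The data, contamination and tuning constant below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.3, x.size)
y[::10] += 8.0                                        # contaminate 10% of the observations with outliers
A = np.column_stack([x, np.ones_like(x)])             # straight-line model (illustrative)

def tukey_weights(r, c=4.685):
    """Redescending weights of Tukey's biweight, with a MAD-based robust scale."""
    s = 1.4826 * np.median(np.abs(r - np.median(r)))
    u = r / (c * s)
    w = (1 - u**2) ** 2
    w[np.abs(u) >= 1] = 0.0                           # large residuals get exactly zero weight
    return w

beta = np.linalg.lstsq(A, y, rcond=None)[0]           # LS start; a poor start can trap IRLS in a local optimum
for _ in range(50):                                   # iteratively reweighted least squares
    r = y - A @ beta
    W = np.diag(tukey_weights(r))
    beta_new = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        break
    beta = beta_new

print("IRLS biweight estimate (slope, intercept):", np.round(beta, 3))
```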

The Lancet Regional Health - Americas, 2021
Background Brazil has faced two simultaneous problems related to respiratory health: forest fires and the high mortality rate due to the COVID-19 pandemic. The Amazon rain forest is one of the Brazilian biomes that suffers the most from fires caused by droughts and illegal deforestation. These fires can bring respiratory diseases associated with air pollution, and the State of Pará in Brazil is the most affected. The COVID-19 pandemic associated with air pollution can potentially increase hospitalizations and deaths related to respiratory diseases. Here, we aimed to evaluate the association of fire occurrences with COVID-19 mortality rates and general respiratory disease hospitalizations in the State of Pará, Brazil. Methods We employed the k-means machine learning clustering technique, together with the elbow method to identify the ideal number of clusters for the k-means algorithm, grouping the cities of the State of Pará into 10 clusters, from which we selected the clusters with the highest and lowest fire occurrence from 2015 to 2019. Next, an Autoregressive Integrated Moving Average with Exogenous inputs (ARIMAX) model was proposed to study the serial correlation of respiratory disease hospitalizations and their associations with fire occurrences. Regarding the COVID-19 analysis, we computed the mortality risk and its confidence level considering the quarterly incidence rate ratio in clusters with high and low exposure to fires. Findings Using the k-means algorithm, we identified, from a group of ten clusters that divided the State of Pará, two clusters with similar HDI (Human Development Index) and GDP (Gross Domestic Product) but with diverse behavior considering hospitalizations and forest fires in the Amazon biome. From the autoregressive and moving average model (ARIMAX), it was possible to show that, besides the serial correlation, the fire occurrences contribute to the increase in respiratory diseases, with an observed lag of six months after the fires for the case with high exposure to fires. A highlight that deserves attention concerns the relationship between fire occurrences and deaths. Historically, the risk of mortality from respiratory diseases is higher (about double) in regions and periods with high exposure to fires than in those with low exposure to fires. The same pattern remains in the period of the COVID-19 pandemic, where the risk of mortality from COVID-19 was 80% higher in the region and period with high exposure to fires. Regarding the SARS-CoV-2 analysis, the risk of mortality related to COVID-19 is higher in the period with high exposure to fires than in the period with low exposure to fires. Another highlight concerns the relationship between fire occurrences and COVID-19 deaths. The results show that regions with high fire occurrences are associated with more COVID-19 deaths. Interpretation The decision-making process is a critical problem, mainly when it involves environmental and health control policies. Environmental policies are often more cost-effective as health measures than the use of public health services. This highlights the importance of data analyses to support decision making and to identify populations in need of better infrastructure due to historical environmental factors and the knowledge of the associated health risk. The results suggest that fire occurrences contribute to the increase in respiratory disease hospitalizations.
The mortality rate related to COVID-19 was higher for the period with high exposure to fires than for the period with low exposure to fires. The regions with high fire occurrences are associated with more COVID-19 deaths, mainly in the months with a high number of fires. Funding No additional funding source was required for this study.
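A minimal sketch of the clustering step only, assuming a synthetic feature matrix (one row per municipality, e.g. fire counts and socio-economic indicators) and a crude numerical rule for picking the elbow; it is not the study's data or code.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# synthetic feature matrix: one row per municipality (e.g. fires, HDI, GDP), three hidden groups
X = np.vstack([rng.normal(loc, 0.5, size=(48, 3)) for loc in (0.0, 3.0, 6.0)])

inertias = []
for k in range(1, 11):                       # the study grouped the cities into 10 clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)             # within-cluster sum of squares

# crude elbow rule: first k after which the relative drop in inertia falls below 20%
drops = -np.diff(inertias) / np.array(inertias[:-1])
k_elbow = int(np.argmax(drops < 0.2)) + 1
print("inertia by k:", [round(v, 1) for v in inertias])
print("suggested elbow k:", k_elbow)
```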

Revista Brasileira de Cartografia, 2021
The most recent version of reliability theory has been used to describe the ability of a measurement system to detect, identify and remove outliers at a certain probability level. However, applications of this theory have been directed at simulated levelling networks. Here, on the other hand, we apply the theory in the context of networks based on GNSS (Global Navigation Satellite System) positioning, using real data collected in the field. We tested whether the covariances between the baseline components have an effect on reliability. We verified that the covariances between the baseline components increase the success rate of outlier identification and, therefore, increase the reliability of the network. The minimal identifiable bias, at the 80% level of correct identification, had an average reduction of ~30% for the ΔX and ΔY components and ~14% for ΔZ, compared with the scenario with null covariances. The increase in the significance level...

Mathematical Problems in Engineering, 2021
Robust estimators often lack a closed-form expression for the computation of their residual covariance matrix, which is a prerequisite for obtaining critical values for normalized residuals. We present an approach based on Monte Carlo simulation to compute the residual covariance matrix and critical values for robust estimators. Although initially designed for robust estimators, the new approach can be extended to other adjustment procedures. In this sense, the proposal was applied to both the well-known minimum L1-norm and least squares estimators in three different levelling network geometries. The results show that (1) the covariance matrix of residuals changes along with the estimator; (2) critical values for the minimum L1-norm based on a false positive rate cannot be derived from well-known test distributions; (3) in contrast to critical values for extreme normalized residuals in least squares, critical values for the minimum L1-norm do not necessarily tend to be higher as network redundancy increases.
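A minimal sketch of the Monte Carlo approach for an estimator without a closed-form residual covariance: the minimum L1-norm adjustment is solved by linear programming for many simulated noise vectors, and the empirical covariance of the residuals, plus a critical value for the maximum normalized residual, is taken from the simulations. The small design matrix, sigma, number of runs and significance level are illustrative assumptions, not the networks of the paper.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(11)
A = np.array([[1., 0.], [0., 1.], [1., 1.], [-1., 1.], [1., 2.]])   # small hypothetical design matrix
n, p = A.shape
sigma = 1.0
m = 2000                                    # Monte Carlo runs (a real study would use far more)

def l1_residuals(y, A):
    """Minimum L1-norm adjustment via linear programming; returns the residual vector."""
    n, p = A.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])              # minimize the sum of |v|
    A_ub = np.block([[-A, -np.eye(n)], [A, -np.eye(n)]])       # u >= +(y - A x) and u >= -(y - A x)
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * p + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return y - A @ res.x[:p]

V = np.empty((m, n))
for i in range(m):
    y = rng.normal(0.0, sigma, n)           # null hypothesis: random noise only
    V[i] = l1_residuals(y, A)

Qv = np.cov(V, rowvar=False)                # Monte Carlo residual covariance matrix
std = np.sqrt(np.diag(Qv))
std[std < 1e-12] = np.inf                   # guard against numerically tiny variances
max_w = np.abs(V / std).max(axis=1)         # maximum normalized residual per run
print("diagonal of the Monte Carlo residual covariance:", np.round(np.diag(Qv), 3))
print("1% critical value for the maximum normalized residual:", round(np.quantile(max_w, 0.99), 3))
```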

PLOS ONE, 2020
Reliability analysis allows for the estimation of a system's probability of detecting and identifying outliers. Failure to identify an outlier can jeopardize the reliability level of a system. Due to their importance, outliers must be appropriately treated to ensure the normal operation of a system. System models are usually developed under certain constraints. Constraints play a central role in model precision and validity. In this work, we present a detailed investigation of the effects of hard and soft constraints on the reliability of a measurement system model. Hard constraints represent a case in which known functional relations between the unknown model parameters exist, whereas soft constraints are employed where such functional relations can be slightly violated depending on their uncertainty. The results highlighted that the success rate of identifying an outlier for the case of hard constraints is larger than for soft constraints. This suggests that hard constraints be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. After identifying and removing possible outliers, one should set up the soft constraints to propagate their uncertainties to the model parameters during the data processing.
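A minimal sketch contrasting the two types of constraint in an ordinary least-squares adjustment: a hard constraint fixes a quantity exactly (here by eliminating it from the parameter vector), while a soft constraint enters as a pseudo-observation with its own finite weight. The tiny levelling example and all numbers are hypothetical.

```python
import numpy as np

# observations: dh(A->1), dh(1->2), dh(2->A), with A a datum benchmark of height 100.0 m
dh = np.array([2.013, 1.498, -3.509])
sigma_dh = 0.003

# hard constraint: H_A = 100 exactly, eliminated from the parameter vector (unknowns: H1, H2)
A_hard = np.array([[1., 0.], [-1., 1.], [0., -1.]])
y_hard = dh + np.array([100.0, 0.0, -100.0])          # move the fixed height to the observation side
x_hard = np.linalg.solve(A_hard.T @ A_hard, A_hard.T @ y_hard)

# soft constraint: H_A is an unknown with pseudo-observation H_A = 100 +/- 0.010 m (unknowns: H_A, H1, H2)
A_soft = np.array([[-1., 1., 0.], [0., -1., 1.], [1., 0., -1.], [1., 0., 0.]])
y_soft = np.concatenate([dh, [100.0]])
P = np.diag([1 / sigma_dh**2] * 3 + [1 / 0.010**2])   # weights: observations plus pseudo-observation
x_soft = np.linalg.solve(A_soft.T @ P @ A_soft, A_soft.T @ P @ y_soft)

print("hard constraint  H1, H2:", np.round(x_hard, 4))
print("soft constraint  H_A, H1, H2:", np.round(x_soft, 4))
```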

Survey Review, 2021
The goal of this paper is to evaluate the outlier identification performance of Iterative Data Snooping (IDS) and the L1-norm in levelling networks by considering the redundancy of the network and the number and size of the outliers. For this purpose, several Monte Carlo experiments were conducted for three different levelling network configurations. In addition, a new way to compare the results of IDS based on Least Squares (LS) residuals and robust estimators such as the L1-norm has also been developed and presented. Considering only the success rate, it is shown that the L1-norm performs better than IDS for networks with low redundancy, especially for cases where more than one outlier is present in the dataset. In the relationship between the false positive rate and the outlier identification success rate, however, IDS performs better than the L1-norm, independently of the levelling network configuration and the number and size of outliers.

International Journal of Environmental Research and Public Health, 2020
The relationship between fire occurrences and diseases is an essential issue for making public health policy and environmental protection strategy. Thanks to the Internet, today we have a huge amount of health data and fire occurrence reports at our disposal. The challenge, therefore, is how to deal with the 4 Vs (volume, variety, velocity and veracity) associated with these data. To overcome this problem, in this paper we propose a method that combines techniques based on Data Mining and Knowledge Discovery from Databases (KDD) to discover spatial and temporal associations between diseases and fire occurrences. Here, the case study addressed Malaria, Leishmaniasis and respiratory diseases in Brazil. Instead of losing a lot of time verifying the consistency of the database, the proposed method uses a Decision Tree, a machine-learning-based supervised classification, to perform fast management and extract only relevant and strategic information, with the knowledge of how r...
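A minimal sketch of the supervised-classification step mentioned above: a decision tree is trained to separate relevant from irrelevant records so that only strategic information is kept for the association analysis. The features, labels and the toy relevance rule are synthetic assumptions, not the study's database.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(5)
n = 400
X = np.column_stack([
    rng.integers(0, 500, n),        # monthly fire occurrences in a municipality (synthetic)
    rng.uniform(0, 300, n),         # hospitalizations for respiratory diseases (synthetic)
    rng.uniform(0, 1, n),           # fraction of missing fields in the record (synthetic)
])
y = ((X[:, 2] < 0.2) & (X[:, 0] > 100)).astype(int)   # toy rule standing in for "relevant record"

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf, feature_names=["fires", "hospitalizations", "missing_frac"]))
```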

Revista Brasileira de Geomática, Jul 28, 2017
I thank God, Author and Principle of all life, for granting me wisdom and good health so that I could overcome this crucial and rich stage of my life. To my parents, Antônio and Célia, for being my great supporters and for warmly welcoming and respecting each of my decisions, I express my deepest gratitude. I am grateful to my advisor, Prof. Msc. Vinicius Francisco Rofatto, for all the support offered, for the valuable suggestions and pertinent criticisms and, above all, for the bond of friendship and companionship established. To Prof. Dr. Gabriel do Nascimento Guimarães, a great friend made along this journey, a model professional, storyteller and counselor, I leave a hug and my sincere thanks. I thank the coordination of the Surveying and Cartographic Engineering program, with all its professionals, for their commitment to always offering the best solutions to the administrative and bureaucratic challenges faced during the undergraduate course. I thank my course mates for sharing experiences, for making great stories possible, and for sharing with me this moment of extreme happiness and accomplishment. I thank Prof. Dr. Daniele Barroca Marra Alves, from UNESP, and her entire team, for the immense collaboration provided, without which the execution of this work would have been impossible. I am also grateful to Prof. Dr. André Luiz Naves de Oliveira, a person endowed with the greatest intellectual capacity I encountered in this whole journey, for being a model of dedication to studies and of serenity in dealing with day-to-day difficulties. I thank all the agents (professors, public figures and other collaborators) who made possible the construction of the advanced campus of the Universidade Federal de Uberlândia in the city of Monte Carmelo, which constitutes the main event in the rich history of my hometown. I thank all the professionals who make the Universidade Federal de Uberlândia an institution of excellence that contributes to society through the education of serious, skilled and committed professionals.

In this paper we evaluate the effects of hard and soft constraints on Iterative Data Snooping (IDS), an iterative outlier elimination procedure. Here, the measurements of a levelling geodetic network were classified according to the local redundancy and the maximum absolute correlation between the outlier test statistics, referred to as clusters. We highlight that the larger the relaxation of the constraints, the higher the sensitivity indicators MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias), for both the clustering of measurements and the clustering of constraints. There are circumstances in which increasing the family-wise error rate (FWE) of the test statistics increases the performance of the IDS. Under a scenario of soft constraints, one should set up at least three soft constraints in order to identify an outlier in the constraints. In general, hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outliers.