Train/test file header may not contain all categories of a categorical variable #350

@mfeurer


Hey, I just tried AutoWEKA using the code from #349 and I think I found two issues related to ARFF files and categorical attributes. I only looked at the KDD Appetency dataset (1111), because AutoWEKA failed on it even though, according to your 2019 paper, it should have produced results.

  1. Empty numerical columns (i.e., columns where all values are missing) are emitted as strings. The original file declares
    `@attribute Var141 numeric`, whereas the generated file declares `@ATTRIBUTE Var141 STRING`.
  2. If an attribute has two categories that differ only in letter case, the benchmark appears to drop one of them. This can be seen in Var217: the original ARFF file has both uUsP and UUSp, but dataset_test_0.arff only declares the category UUSP. If you retrieve the categories from the server, this is a server issue, since the server swallows the extra category, as can be seen here. Most likely it is https://github.com/openml/OpenML/issues/1114. This results in the following failure:
     **** AutoWEKA [vlatest]****
     
     Using 4096MB memory per run on 8 parallel runs.
     Running cmd `java -cp /bench/frameworks/AutoWEKA/lib/autoweka/autoweka.jar:/bench/frameworks/AutoWEKA/lib/weka/weka.jar weka.classifiers.meta.AutoWEKAClassifier -t "/input/org/openml/www/datasets/1111/dataset_train_0.arff" -T "/input/org/openml/www/datasets/1111/dataset_test_0.arff" -memLimit 4096 -classifications "weka.classifiers.evaluation.output.prediction.CSV -distribution -file \"/output/predictions/KDDCup09_appetency/0/predictions.weka_pred.csv\"" -timeLimit 60 -parallelRuns 8 -metric areaUnderROC -seed 17193`
     java.io.IOException: nominal value not declared in header, read Token[uUsP], line 261
         at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:354)
         at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:719)
         at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:545)
         at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:514)
         at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:500)
         at weka.core.converters.ArffLoader.getDataSet(ArffLoader.java:1286)
         at weka.core.converters.ConverterUtils$DataSource.getDataSet(ConverterUtils.java:266)
         at weka.core.converters.ConverterUtils$DataSource.getDataSet(ConverterUtils.java:289)
         at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1618)
         at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:668)
         at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:141)
         at weka.classifiers.meta.AutoWEKAClassifier.main(AutoWEKAClassifier.java:266)
     java.lang.NullPointerException
         at weka.core.Capabilities.test(Capabilities.java:1138)
         at weka.core.Capabilities.testWithFail(Capabilities.java:1468)
         at weka.classifiers.meta.AutoWEKAClassifier.buildClassifier(AutoWEKAClassifier.java:298)
         at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1632)
         at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:668)
         at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:141)
         at weka.classifiers.meta.AutoWEKAClassifier.main(AutoWEKAClassifier.java:266)
     
     java.io.IOException: nominal value not declared in header, read Token[uUsP], line 261
         at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:354)
         at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:719)
         at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:545)
         at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:514)
         at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:500)
         at weka.core.converters.ArffLoader.getDataSet(ArffLoader.java:1286)
         at weka.core.converters.ConverterUtils$DataSource.getDataSet(ConverterUtils.java:266)
         at weka.core.converters.ConverterUtils$DataSource.getDataSet(ConverterUtils.java:289)
         at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1618)
         at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:668)
         at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:141)
         at weka.classifiers.meta.AutoWEKAClassifier.main(AutoWEKAClassifier.java:266)
     java.lang.NullPointerException
         at weka.core.Capabilities.test(Capabilities.java:1138)
         at weka.core.Capabilities.testWithFail(Capabilities.java:1468)
         at weka.classifiers.meta.AutoWEKAClassifier.buildClassifier(AutoWEKAClassifier.java:298)
         at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1632)
         at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:668)
         at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:141)
         at weka.classifiers.meta.AutoWEKAClassifier.main(AutoWEKAClassifier.java:266)
     
     AutoWEKA failed producing any prediction.
     Traceback (most recent call last):
       File "/bench/amlb/benchmark.py", line 511, in run
         meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
       File "/bench/frameworks/AutoWEKA/__init__.py", line 10, in run
         return run(*args, **kwargs)
       File "/bench/frameworks/AutoWEKA/exec.py", line 80, in run
         raise NoResultError("AutoWEKA failed producing any prediction.")
     amlb.results.NoResultError: AutoWEKA failed producing any prediction.
    

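A minimal sketch of how item 1 could be avoided, assuming the ARFF writer has access to the type declared in the source dataset. `infer_arff_type` and `attribute_line` are hypothetical helpers, not part of amlb or openml-python. The idea is that value-based inference has nothing to work with when every value is missing, so the declared type should win; the NUMERIC fallback for undeclared, all-missing columns is an assumption.

```python
from typing import Optional, Sequence


def infer_arff_type(values: Sequence[Optional[str]], declared_type: Optional[str] = None) -> str:
    """Return the ARFF type for a column (hypothetical helper).

    The declared type from the source dataset always takes precedence, so an
    all-missing numeric column such as Var141 stays numeric.
    """
    if declared_type is not None:
        return declared_type
    present = [v for v in values if v is not None]
    if not present:
        # No declaration and no values to inspect: assume NUMERIC rather
        # than silently degrading the column to STRING.
        return "NUMERIC"
    try:
        for v in present:
            float(v)
    except ValueError:
        return "STRING"
    return "NUMERIC"


def attribute_line(name: str, values: Sequence[Optional[str]], declared_type: Optional[str] = None) -> str:
    """Format an @ATTRIBUTE header line (hypothetical helper)."""
    return f"@ATTRIBUTE {name} {infer_arff_type(values, declared_type)}"


if __name__ == "__main__":
    # Var141 is declared numeric in the original file but every value is missing.
    print(attribute_line("Var141", [None, None, None], declared_type="numeric"))
    # @ATTRIBUTE Var141 numeric
```

Preferring the declaration over inference also keeps train and test headers identical, which Weka requires when it loads the two files separately.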
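For item 2, the sketch below contrasts a case-preserving category collection with a case-folding one. All helper names are hypothetical, not amlb or OpenML code. Upper-casing in `uppercased_categories` is only an assumption, chosen because it reproduces the single `UUSP` category seen in `dataset_test_0.arff`. `undeclared_values` mirrors the check behind Weka's "nominal value not declared in header" error.

```python
from typing import Iterable, List, Optional


def nominal_categories(values: Iterable[Optional[str]]) -> List[str]:
    """Distinct categories, case-sensitive, in first-seen order."""
    seen = {}
    for v in values:
        if v is not None and v not in seen:
            seen[v] = None
    return list(seen)


def uppercased_categories(values: Iterable[Optional[str]]) -> List[str]:
    """Assumed buggy behaviour: categories are deduplicated after upper-casing."""
    seen = {}
    for v in values:
        if v is not None:
            seen.setdefault(v.upper(), None)
    return list(seen)


def undeclared_values(values: Iterable[Optional[str]], header_categories: List[str]) -> List[str]:
    """Values that a strict ARFF reader would reject against the header."""
    declared = set(header_categories)
    return sorted({v for v in values if v is not None and v not in declared})


def nominal_attribute_line(name: str, categories: List[str]) -> str:
    # Simplification: no ARFF quoting of categories containing spaces or commas.
    return f"@ATTRIBUTE {name} {{{','.join(categories)}}}"


if __name__ == "__main__":
    var217 = ["uUsP", "UUSp", None, "uUsP"]
    header = uppercased_categories(var217)
    print(header)                             # ['UUSP']
    print(undeclared_values(var217, header))  # ['UUSp', 'uUsP']
    print(nominal_attribute_line("Var217", nominal_categories(var217)))
    # @ATTRIBUTE Var217 {uUsP,UUSp}
```

Because ARFF nominal values are case-sensitive, any normalisation of category names before writing the header can reintroduce this failure.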
Labels: bug (Something isn't working)
