-
Notifications
You must be signed in to change notification settings - Fork 2k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
Datafusion gives incorrect results when running TPC-H Query 8 with parquet files.
To Reproduce
- Generate TPC-H parquet files for scale factor 10
- Open datafusion-cli and create external tables pointing to files
create external table lineitem stored as parquet location '/path/to/lineitem/lineitem_1687987398_default/';
create external table customer stored as parquet location '/path/to/customer/customer_1687987384_default/';
create external table nation stored as parquet location '/path/to/nation/nation_1687988005_default/';
create external table orders stored as parquet location '/path/to/orders/orders_1687988005_default/';
create external table region stored as parquet location '/path/to/region/region_1687988223_default/';
create external table supplier stored as parquet location '/path/to/supplier/supplier_1687988223_default/';
create external table part stored as parquet location '/path/to/part/part_1687988133_default/';- Run TPC-H Query 8
select
o_year, sum(case when nation = 'BRAZIL' then volume else
0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (
1
- l_discount) as volume,
n2.n_name as nation
from part, supplier, lineitem, orders, customer, nation n1, nation n2, region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'ECONOMY ANODIZED STEEL'
) as all_nations
group by o_year
order by o_year;- Incorrect results displayed below
+--------+-------------------------------------------+
| o_year | mkt_share |
+--------+-------------------------------------------+
| 1995.0 | -0.00000000000011380044067220119495060732 |
| 1996.0 | 0.00000000000019588288500717383285218261 |
+--------+-------------------------------------------+
2 rows in set. Query took 1.131 seconds.
Expected behavior
The correct results should be:
- From Postgres:
o_year | mkt_share
--------+------------------------
1995 | 0.03882014251433219622
1996 | 0.03948968749183991638
(2 rows)
Time: 11925.416 ms (00:11.925)
- From DuckDb (with same parquet files):
┌────────┬─────────────────────┐
│ o_year │ mkt_share │
│ int64 │ double │
├────────┼─────────────────────┤
│ 1995 │ 0.0388201425143322 │
│ 1996 │ 0.03948968749183992 │
└────────┴─────────────────────┘
Run Time (s): real 1.328 user 8.095730 sys 0.358842
Additional context
No response
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working