[c++] Apply ArraySchemaEvolution to specified timestamp#2895
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2895 +/- ##
==========================================
+ Coverage 89.83% 89.98% +0.15%
==========================================
Files 37 37
Lines 3926 3926
==========================================
+ Hits 3527 3533 +6
+ Misses 399 393 -6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
Good. That should also help for R where we documented this first, |
|
There is something going on, since should have been since we wrote |
|
I've verified that ☝️ is a separate bug. Running this on a built checkout without this PR: I get |
@eddelbuettel great! If you have a ilnk for that "where" handy, please share it here -- that will be helpful. |
ArraySchemaEvolution to specified timestamp [WIP]ArraySchemaEvolution to specified timestamp
Never mind -- I found #2879 |
|
@johnkerl Here is my actual test script from last week (which is not yet committed to the 'scraps' repo I use for these). With the SOMA version I have here (a few days old) it just comes up empty (no 'strings' for the enumeration). Worth testing on your patch. Details!#!/usr/bin/env Rscript
uri <- "/tmp/tiledb/somadf-evolve-ts"
if (dir.exists(uri)) unlink(uri, recursive=TRUE)
schema <- arrow::schema(arrow::field("soma_joinid", arrow::int64(), nullable = FALSE),
arrow::field("int", arrow::int32(), nullable = FALSE),
arrow::field("str", arrow::dictionary(index_type = arrow::int8(), value_type = arrow::utf8(), ordered = FALSE), nullable=FALSE))
print(schema)
ts1 <- rep(as.POSIXct(1, tz="UTC"), 2)
sdf <- tiledbsoma::SOMADataFrameCreate(uri, schema, tiledb_timestamp=ts1[1])
data <- arrow::arrow_table(soma_joinid = bit64::as.integer64(1L:5L),
int = 101:105L,
str = factor(c('a','b','b','a','b')))
cat("\n")
print(tibble::as_tibble(data))
ts2 <- rep(as.POSIXct(2, tz="UTC"), 2)
sdf$write(data, ts2)
data <- arrow::arrow_table(soma_joinid = bit64::as.integer64(6L:10L),
int = 106:110L,
str = factor(c('c','b','c','c','b')))
cat("\n")
print(tibble::as_tibble(data))
ts3 <- rep(as.POSIXct(3, tz="UTC"), 2)
sdf$write(data, ts3)
sdf$close()
cat("\n")
tiledb::tiledb_array(uri, return_as="data.table")[]
sdf <- tiledbsoma::SOMADataFrameOpen(uri, tiledb_timestamp=ts3)
res <- sdf$read()$concat()
print(res)
print(res$str)
#tibble::as_tibble(res)
|
@eddelbuettel before & after this PR I get the following when I run that: This PR should be merged to remove a known defect from the C++ implementation and we can continue from there. |
* [c++] Apply `ArraySchemaEvolution` to specified timestamp * Add Python unit-test coverage * run `make format`
) * [c++] Apply `ArraySchemaEvolution` to specified timestamp * Add Python unit-test coverage * run `make format` Co-authored-by: John Kerl <[email protected]>
|
@johnkerl when you said above
where you on But that is on the de/sc-52516/timestamprange branch. |
|
Good to know -- awesome @eddelbuettel ! :) |
|
Branch should hopefully be ready "Real Soon Now (TM)" too. |
)" This reverts commit 7241fef. It seems to break cellxgene_census_builder/tests/test_builder.py.
Issue and/or context: #2879
Changes: If we've got the array open at a timestamp, and we need to evolve it, evolve from that same timestamp.
Notes for Reviewer:
Material from @nguyenv to be incorporated into unit-test material:
Script:
Before:
After:
Difference: