User Profile: RockRoll
Joined: Jul 30 '20
Last Activity: Aug 26 '20, 03:08 PM

  • Possible to have different datatypes among columns of an array?

    Hi Everyone,

    I create an expandable EArray of Nx4 columns. Some columns require the float64 datatype; the others can be managed with int32. Is it possible to vary the data type among the columns? Right now I just use one (float64, below) for all, but it takes huge disk space for large (>10 GB) files.

    For example, how can I ensure that the elements of columns 1-2 are int32 and those of columns 3-4 are float64?

    Code:
    a = f1.create_earray(f1.root,
    ...
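The question above is truncated, but the usual way to get per-column dtypes is a structured (record) dtype, which PyTables' Table (as opposed to a homogeneous EArray) is built around. A minimal NumPy sketch of the idea; the field names and row count are illustrative:

```python
import numpy as np

# Structured dtype: columns 1-2 as int32, columns 3-4 as float64.
# The field names ("c1" .. "c4") are hypothetical; any names work.
row_dtype = np.dtype([("c1", np.int32), ("c2", np.int32),
                      ("c3", np.float64), ("c4", np.float64)])

rows = np.zeros(3, dtype=row_dtype)  # three Nx4-style rows
rows["c1"] = [1, 2, 3]
rows["c3"] = [0.5, 1.5, 2.5]

# Each field keeps its own itemsize: 2*4 + 2*8 = 24 bytes per row
# instead of 4*8 = 32 bytes for an all-float64 row.
print(row_dtype.itemsize)  # 24
```

A PyTables `create_table` call accepts a description of exactly this shape, so the same space saving should carry over to the file on disk.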

  • Hey SioSio,

    Thanks for your assistance. That "n" was just a typo here. However, I solved the problem: it turns out that PyTables' "append" method was much faster than resizing the HDF5 file. Just wanted to mention it here in case anyone stops by in the future!

    Thanks again for your time!
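For anyone landing here later, the append-instead-of-resize pattern mentioned above looks roughly like this. A sketch, assuming PyTables is installed; the file path, node name, and chunk sizes are illustrative:

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "data.h5")
with tables.open_file(path, "w") as f1:
    # EArray with an unlimited first dimension (shape[0] == 0).
    a = f1.create_earray(f1.root, "a", atom=tables.Float64Atom(),
                         shape=(0, 4))
    # append() grows the array chunk by chunk; no explicit resize needed.
    for _ in range(3):
        a.append(np.random.rand(1000, 4))
    print(a.shape)  # (3000, 4)
```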


  • RockRoll
    started a topic Pandas: Merging Sorted Dataframes


    Hi,

    I have a large (Nx4, >10GB) array that I need to sort based on col.2.

    I am reading my data in chunks and sorting each chunk using Pandas, but I am unable to combine the sorted chunks into a final large Nx4 array sorted on col. 2. Here is what I have tried so far:

    Code:
    chunks = pd.read_csv(ifile[0], chunksize=50000, skiprows=0,
                         names=['col-1', 'col-2', 'col-3', 'col-4'])
    ...
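Since the code above is truncated, here is a hedged sketch of one way to combine chunks that have each already been sorted on col-2: a k-way merge with `heapq.merge`, which streams the sorted inputs and never needs the full array in memory. The tiny arrays below are stand-ins for the real chunks:

```python
import heapq

import numpy as np

# Stand-ins for two chunks, each already sorted on column 2 (index 1).
chunk_a = np.array([[1, 2.0, 0.1, 0.2],
                    [3, 5.0, 0.3, 0.4]])
chunk_b = np.array([[2, 3.0, 0.5, 0.6],
                    [4, 9.0, 0.7, 0.8]])

# heapq.merge lazily merges already-sorted iterables, so each chunk
# could just as well be streamed row by row from disk.
merged = np.array(list(heapq.merge(chunk_a, chunk_b,
                                   key=lambda row: row[1])))
print(merged[:, 1])  # column 2 is globally sorted
```

This is a classic external merge sort: sort each chunk (e.g. with pandas), write the sorted chunks out, then do a single merge pass over all of them.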

  • RockRoll
    started a topic Python/Numpy: Automating to save generated data


    I want to save a particular number of values in maps I create. For example, when creating (4064x1) values, I want to save the first (1000x1) in map1, the next (1000x1) in map2, and so on. The last map will have the remaining (64x1) elements. I need these maps later for fast processing.

    Now the issue is that I want to automate this, since the number 4064 varies with the data I analyze. Here is a simplistic version of something I tried that is working (L is...
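Since the example above is truncated, here is a hedged sketch of one way to automate the split for any length N; the `map1`, `map2`, ... names mirror the post and are illustrative:

```python
import numpy as np

N, size = 4064, 1000          # N varies with the data being analysed
data = np.arange(N)

# Consecutive blocks of `size`; the final block keeps the remainder.
maps = {f"map{i + 1}": data[start:start + size]
        for i, start in enumerate(range(0, N, size))}

print(len(maps), len(maps["map5"]))  # 5 64
```

Because NumPy slicing returns views, the blocks cost no extra memory; `np.array_split` is an alternative when equal-as-possible block counts (rather than a fixed block size) are wanted.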

  • RockRoll
    started a topic Extracting rows based on condition on one column


    Hi Everyone,

    Here is an interesting problem I am trying to solve.

    I have an (Nx4) array and want to extract the rows whose third-column element lies in a certain range. Are there existing capabilities in NumPy for this? Below is a simple example.

    PS: I know for loops can be used, comparing each element of col. 3 and saving the rows that meet the condition. But I want to use NumPy here (like slicing...
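NumPy handles this without a loop via boolean masking. A minimal sketch; the array values and range bounds are hypothetical:

```python
import numpy as np

a = np.array([[1,  2, 10,  4],
              [5,  6, 25,  8],
              [9, 10, 40, 12]])

lo, hi = 5, 30  # hypothetical range for the third column
mask = (a[:, 2] >= lo) & (a[:, 2] <= hi)  # one bool per row
rows = a[mask]  # only rows whose col-3 element is in [lo, hi]
```

The mask is evaluated in C, so this stays fast even for very large N; `np.where(mask)[0]` gives the matching row indices if those are needed instead.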

  • So here is what is happening:
    1. I choose a flexible-shape dset on line 9; flexible because I am dealing with large arrays whose size can vary with the input file size.
    2. I fill in some values of interest at line 24.
    3. At line 23, I am basically expanding the current size of dset by n (=1); the added row is filled with the values I create at line 24.

    Simply put, I am generating some numbers (line 22) and filling in dset...
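The steps above (grow the dataset by n, then fill the new row) can be sketched with h5py resizable datasets. This is a sketch under the assumption that the script uses h5py's `maxshape`/`resize`, as the later snippets in this thread suggest; the path and names are illustrative:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.h5")
with h5py.File(path, "w") as f_w:
    # Flexible shape: the first axis is unlimited (maxshape None).
    dset = f_w.create_dataset("dset", shape=(0, 4),
                              maxshape=(None, 4), chunks=True)
    for _ in range(3):
        row = np.random.rand(1, 4)              # values generated per step
        dset.resize(dset.shape[0] + 1, axis=0)  # expand by n (=1)
        dset[-1] = row                          # fill the appended row
```

Note that a per-row `resize` like this is exactly the pattern the poster later found slow compared with appending larger blocks at once.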


    Yeah, line 9 seems to be fine. f_w is providing a file object. Basically, I am creating a new "data.h5" and saving into its dset at line 24.

    I can also change line 9 to: dset = f_r.create_dataset('dataset_3', data=d1, maxshape=(None, None), chunks=True)

    This, instead of creating a new HDF5 file, creates a new dataset_3 in input.h5, but the computation time is unimpacted.

    My suspicion is something can...


    I tried that too. It gives "ValueError: Not a dataset (not a dataset)" at line 12, where e1 asks for dset1.

    I can't transfer dataset_1 and dataset_2 directly to a list/NumPy array, as dataset_1 and dataset_2 are really large.

    Any other thought?


    I closed it using "f_r.close()" at the end and it didn't change anything. Any other suggestions?


  • Fastest way to subtract elements of datasets of HDF5 file?

    Hey Everyone:

    Here is one interesting problem.

    Input: two arrays (Nx4, sorted on column 2) stored in dataset-1 and dataset-2 of an HDF5 file (input.h5). N is huge (the data originally belongs to a 10 GB file, hence the HDF5 storage).

    Output: the result of subtracting each column-2 element of dataset-2 from each column-2 element of dataset-1, keeping only differences (delta) between +/-4000, eventually saved in the dset of a new HDF5 file....
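Because both datasets are sorted on column 2, the +/-4000 window can be found without pairwise loops using `np.searchsorted`. A hedged sketch on tiny stand-in data; the real code would apply this per chunk of dataset-1:

```python
import numpy as np

# Stand-ins for the sorted column-2 values of dataset-1 and dataset-2.
col2_1 = np.array([0.0, 1000.0, 5000.0, 9000.0])
col2_2 = np.array([500.0, 4500.0, 8500.0])
delta = 4000.0

# For each element of dataset-1, [lo[i], hi[i]) is the slice of
# dataset-2 whose difference from it lies within +/-delta.
lo = np.searchsorted(col2_2, col2_1 - delta, side="left")
hi = np.searchsorted(col2_2, col2_1 + delta, side="right")

counts = hi - lo  # number of dataset-2 partners per dataset-1 element
```

Each `searchsorted` is a binary search (O(N log M) overall), so only the matching slices ever need to be read from the HDF5 file and differenced.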