ChunkStore: Iterator always returns an empty dataframe #384
Comments
It's pretty hard to tell what you are doing from the code sample you produced. 'D' chunking results in chunks that comprise all data for the given day. So the chunk range is correct: the start of the chunk is Oct 18, 2016, and that's also the end of the chunk, since it's a daily chunk. I'm not sure what If you call the iterator method Does Also, due to the nature of your last issue, I want to make sure this isn't on CosmosDB.
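To make the point about 'D' chunking concrete, here is a minimal stdlib-only sketch of bucketing timestamps by calendar day. It is an illustration under assumptions, not Arctic's implementation: `group_by_day` is a hypothetical helper, and the sample timestamps are made up (eight readings, six hours apart).

```python
from collections import OrderedDict
from datetime import datetime, timedelta

# Hypothetical sample data: 8 measurements taken every 6 hours
timestamps = [datetime(2017, 5, 1, 1) + timedelta(hours=6 * i) for i in range(8)]


def group_by_day(stamps):
    """Bucket timestamps by calendar day (illustrative helper, not an Arctic API)."""
    chunks = OrderedDict()
    for ts in stamps:
        chunks.setdefault(ts.date(), []).append(ts)
    return chunks


daily_chunks = group_by_day(timestamps)
for day, rows in daily_chunks.items():
    # Each daily chunk is identified by a single date, so its start and end coincide
    print(day, len(rows))
```

With this sample, the eight readings fall into two daily buckets of four rows each, and each bucket is labelled by one date rather than by a start and end time.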
It will be 100x easier for me to help you if you can produce a very small code / data sample that reproduces the issue as well. Something like: create a small dataframe, write it to arctic, then make the other calls that reproduce the error.
In case you haven't already seen it:
Hi @bmoscon, sorry for the lack of clarity. I've written a small script that replicates the issue, running locally against MongoDB (not CosmosDB):

from datetime import datetime
import pandas
from arctic import Arctic, CHUNK_STORE
from pandas import DataFrame
from pandas.tseries.index import DatetimeIndex
# Create store and library
store = Arctic('localhost', 'test')
store.initialize_library('lib', lib_type=CHUNK_STORE)
lib = store['lib']
# Create dataframe of time measurements taken every 6 hours
date_range = pandas.date_range(start=datetime(2017, 5, 1, 1), periods=8, freq='6H')
df = DataFrame(data={'something': [100, 200, 300, 400, 500, 600, 700, 800]},
               index=DatetimeIndex(date_range, name='date'))
# Write to database
lib.write('testkey', df, chunk_size='D')
# Iterate
for chunk in lib.iterator('testkey'):
    no_of_rows = len(chunk)
    print(no_of_rows)
    print(chunk)
# Read
print(lib.read('testkey'))

The output of the above script is:
I would expect the output to be:
The
this looks sensible, as you say we're chunking by day but then this produces ranges of:
The only timestamps that could possibly be in these date ranges are 2017-05-01T00:00:00 and 2017-05-02T00:00:00, respectively. On closer inspection, it looks like this could be fixed by simply passing yield self.read(symbol, chunk_range=c.to_range(chunk[0], chunk[1])) What do you think?
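The reasoning about zero-width date ranges can be checked with a small stdlib-only sketch. This is not the fix that was merged (the thread does not show it) and not Arctic code; `rows_in_range` is a hypothetical helper that applies an inclusive range filter to made-up timestamps with the same 6-hour spacing as the script above.

```python
from datetime import datetime, timedelta

# Hypothetical sample data: 8 measurements taken every 6 hours, starting at 01:00
timestamps = [datetime(2017, 5, 1, 1) + timedelta(hours=6 * i) for i in range(8)]


def rows_in_range(stamps, start, end):
    """Return timestamps with start <= ts <= end (inclusive on both ends)."""
    return [ts for ts in stamps if start <= ts <= end]


day = datetime(2017, 5, 1)

# Zero-width range: start and end are both midnight of the same day
degenerate = rows_in_range(timestamps, day, day)

# Range widened to cover the whole calendar day
end_of_day = day + timedelta(days=1) - timedelta(microseconds=1)
full_day = rows_in_range(timestamps, day, end_of_day)

print(len(degenerate))  # 0
print(len(full_day))    # 4
```

Only a reading stamped exactly at midnight could match the zero-width range, which is consistent with the empty dataframes reported for data that carries intraday times.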
Alright, let me take a look this evening. Thanks for sending the code that replicates it. I can use that to generate a test case as well for the unit tests.
Oh ok, I see now that the issue is related to the times being part of the datetime index. That's nothing we ever tried or intended to work, but let me see if I can get it working and preserve the base case (no times).
Fixed and merged.
Thanks a lot! Quick question just for my own understanding:
How were you using the chunkstore with no times stored in the datetime index? Is that just because you only had one data point per day? But in that case, what's the use case of the ChunkStore with a 'day' chunk size, as it results in one document (or chunk) per data point? Sorry for the questions, I just want to make sure I'm not misusing the library.
I wouldn't really use daily, but there is a use case: imagine you have gigs of data per day. Ideally you want enough data in your chunk that you get good compression out of it, but not so much that it's bad for reads.
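The compression side of that trade-off can be sketched with the standard library. Everything here is an assumption for illustration: the rows are synthetic CSV-style text at a 10-second frequency, `compressed_size` is a hypothetical helper, and `zlib` stands in for whatever compressor the store actually uses.

```python
import zlib
from datetime import datetime, timedelta

# One hypothetical day of readings at a 10-second frequency (8640 rows)
start = datetime(2017, 5, 1)
rows = [
    "{},{}\n".format((start + timedelta(seconds=10 * i)).isoformat(), 100 + i % 7)
    for i in range(8640)
]


def compressed_size(rows, rows_per_chunk):
    """Total bytes after compressing each chunk of rows separately."""
    total = 0
    for i in range(0, len(rows), rows_per_chunk):
        payload = "".join(rows[i:i + rows_per_chunk]).encode("utf-8")
        total += len(zlib.compress(payload))
    return total


per_minute = compressed_size(rows, 6)    # 1440 small chunks
per_day = compressed_size(rows, 8640)    # 1 large chunk
print(per_minute, per_day)
```

Many tiny chunks each pay the compressor's fixed overhead and cannot share repeated patterns, so the single daily chunk ends up smaller in total. The sketch does not model the read side: a larger chunk also means more data to fetch and decompress when only a small slice is needed.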
I understand. Thanks a lot for your quick response and help!
Arctic Version
Arctic Store
Platform and version
Python 3.5
Description of problem and/or code sample that reproduces the issue
I am using the DateChunker with chunk_size='D' and the data stored has a frequency of about 10 seconds.
no_of_rows in the above code is always 0 and chunk is always an empty dataframe. From debugging, this looks like it could be because get_chunk_ranges creates a list of date tuples like:
so using c.to_range(chunk[0], chunk[1]) produces:
therefore, since the date range is infinitely small, no dates are returned.
Is this a bug or am I doing something wrong?
Cheers
Dave