admin管理员组

文章数量:1431761

I have some stock 5min data, like as:

                  Date   Open   High    Low  Close    Volume
0  2024-11-19 09:35:00  11.75  11.79  11.55  11.78  32673600
1  2024-11-19 09:40:00  11.78  11.81  11.73  11.79  14802700
2  2024-11-19 09:45:00  11.79  11.84  11.79  11.82  13837400
3  2024-11-19 09:50:00  11.81  11.83  11.76  11.82   8534200
4  2024-11-19 09:55:00  11.82  11.87  11.80  11.87   8540500
5  2024-11-19 10:00:00  11.87  11.96  11.87  11.90  20659800
6  2024-11-19 10:05:00  11.89  11.90  11.82  11.82  11691000
7  2024-11-19 10:10:00  11.82  11.82  11.73  11.74   8762900
8  2024-11-19 10:15:00  11.74  11.74  11.71  11.73   6870500
9  2024-11-19 10:20:00  11.73  11.73  11.68  11.70   6244800
10 2024-11-19 10:25:00  11.70  11.70  11.66  11.69   5083000
11 2024-11-19 10:30:00  11.70  11.73  11.69  11.71   5342400
12 2024-11-19 10:35:00  11.72  11.74  11.71  11.73   3311800
13 2024-11-19 10:40:00  11.73  11.74  11.71  11.72   2331900
14 2024-11-19 10:45:00  11.72  11.72  11.70  11.72   3024100
15 2024-11-19 10:50:00  11.71  11.74  11.70  11.71   2774200
16 2024-11-19 10:55:00  11.70  11.72  11.70  11.71   1313000
17 2024-11-19 11:00:00  11.72  11.75  11.71  11.74   1737400
18 2024-11-19 11:05:00  11.75  11.75  11.73  11.75   1690600
19 2024-11-19 11:10:00  11.74  11.76  11.73  11.76   1751800
20 2024-11-19 11:15:00  11.76  11.76  11.72  11.73   2248700
21 2024-11-19 11:20:00  11.73  11.73  11.70  11.71   2464200
22 2024-11-19 11:25:00  11.71  11.71  11.69  11.70   1033600
23 2024-11-19 11:30:00  11.69  11.70  11.67  11.69   2063600

I use df.resample to convert them to 30m data, the code is:

df = df.set_index('Date')
df = df.resample('30T').agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
                                 'Volume':'sum'}, closed='right', label = 'right').dropna()

But I got strange result like these:

                      Open   High    Low  Close    Volume
Date                                                     
2024-11-19 09:30:00  11.75  11.87  11.55  11.87  78388400
2024-11-19 10:00:00  11.87  11.96  11.66  11.69  59312000
2024-11-19 10:30:00  11.70  11.74  11.69  11.71  18097400
2024-11-19 11:00:00  11.72  11.76  11.69  11.70  10926300
2024-11-19 11:30:00  11.69  11.70  11.67  11.69   2063600

Here are correct 30m data export from my trading software:

Time    Open    High    Low Close   Volume
 2024/11/19-10:00   11.75   11.96   11.55   11.9    99048200
 2024/11/19-10:30   11.89   11.9    11.66   11.71   43994600
 2024/11/19-11:00   11.72   11.75   11.7    11.74   14492400
 2024/11/19-11:30   11.75   11.76   11.67   11.69   11252500
 

The data at 9:30 is irrelevant, mainly because the following data are not correct. but I did not find more parameters of df.sample. How can I correctly aggregate the data?

I have some stock 5min data, like as:

                  Date   Open   High    Low  Close    Volume
0  2024-11-19 09:35:00  11.75  11.79  11.55  11.78  32673600
1  2024-11-19 09:40:00  11.78  11.81  11.73  11.79  14802700
2  2024-11-19 09:45:00  11.79  11.84  11.79  11.82  13837400
3  2024-11-19 09:50:00  11.81  11.83  11.76  11.82   8534200
4  2024-11-19 09:55:00  11.82  11.87  11.80  11.87   8540500
5  2024-11-19 10:00:00  11.87  11.96  11.87  11.90  20659800
6  2024-11-19 10:05:00  11.89  11.90  11.82  11.82  11691000
7  2024-11-19 10:10:00  11.82  11.82  11.73  11.74   8762900
8  2024-11-19 10:15:00  11.74  11.74  11.71  11.73   6870500
9  2024-11-19 10:20:00  11.73  11.73  11.68  11.70   6244800
10 2024-11-19 10:25:00  11.70  11.70  11.66  11.69   5083000
11 2024-11-19 10:30:00  11.70  11.73  11.69  11.71   5342400
12 2024-11-19 10:35:00  11.72  11.74  11.71  11.73   3311800
13 2024-11-19 10:40:00  11.73  11.74  11.71  11.72   2331900
14 2024-11-19 10:45:00  11.72  11.72  11.70  11.72   3024100
15 2024-11-19 10:50:00  11.71  11.74  11.70  11.71   2774200
16 2024-11-19 10:55:00  11.70  11.72  11.70  11.71   1313000
17 2024-11-19 11:00:00  11.72  11.75  11.71  11.74   1737400
18 2024-11-19 11:05:00  11.75  11.75  11.73  11.75   1690600
19 2024-11-19 11:10:00  11.74  11.76  11.73  11.76   1751800
20 2024-11-19 11:15:00  11.76  11.76  11.72  11.73   2248700
21 2024-11-19 11:20:00  11.73  11.73  11.70  11.71   2464200
22 2024-11-19 11:25:00  11.71  11.71  11.69  11.70   1033600
23 2024-11-19 11:30:00  11.69  11.70  11.67  11.69   2063600

I use df.resample to convert them to 30m data, the code is:

df = df.set_index('Date')
df = df.resample('30T').agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
                                 'Volume':'sum'}, closed='right', label = 'right').dropna()

But I got strange result like these:

                      Open   High    Low  Close    Volume
Date                                                     
2024-11-19 09:30:00  11.75  11.87  11.55  11.87  78388400
2024-11-19 10:00:00  11.87  11.96  11.66  11.69  59312000
2024-11-19 10:30:00  11.70  11.74  11.69  11.71  18097400
2024-11-19 11:00:00  11.72  11.76  11.69  11.70  10926300
2024-11-19 11:30:00  11.69  11.70  11.67  11.69   2063600

Here are correct 30m data export from my trading software:

Time    Open    High    Low Close   Volume
 2024/11/19-10:00   11.75   11.96   11.55   11.9    99048200
 2024/11/19-10:30   11.89   11.9    11.66   11.71   43994600
 2024/11/19-11:00   11.72   11.75   11.7    11.74   14492400
 2024/11/19-11:30   11.75   11.76   11.67   11.69   11252500
 

The data at 9:30 is irrelevant, mainly because the following data are not correct. but I did not find more parameters of df.sample. How can I correctly aggregate the data?

Share Improve this question asked Nov 19, 2024 at 12:19 Sun JarSun Jar 3411 silver badge12 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 1

By default the reference in resample is the start of the day. It looks like you want the start of the data. You should set origin='start' instead of the default origin='start_day':

(df.resample('30min', origin='start')
   .agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
         'Volume':'sum'}, closed='right', label = 'right')
  .dropna()
)

Output:

                      Open   High    Low  Close    Volume
Date                                                     
2024-11-19 09:35:00  11.75  11.96  11.55  11.90  99048200
2024-11-19 10:05:00  11.89  11.90  11.66  11.71  43994600
2024-11-19 10:35:00  11.72  11.75  11.70  11.74  14492400
2024-11-19 11:05:00  11.75  11.76  11.67  11.69  11252500

And presumably, you want to pass the label parameter to resample:

(df.resample('30min', origin='start', label='right')
   .agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
         'Volume':'sum'})
 .dropna()
)

Output:

                      Open   High    Low  Close    Volume
Date                                                     
2024-11-19 10:05:00  11.75  11.96  11.55  11.90  99048200
2024-11-19 10:35:00  11.89  11.90  11.66  11.71  43994600
2024-11-19 11:05:00  11.72  11.75  11.70  11.74  14492400
2024-11-19 11:35:00  11.75  11.76  11.67  11.69  11252500

Finally, if you want to round to 30 minutes:

(df.resample('30min', origin='start', label='right')
   .agg({'Open':'first', 'High':'max', 'Low':'min', 
         'Close':'last', 'Volume':'sum'})
   .dropna()
   .pipe(lambda x: x.set_axis(x.index.floor('30min')))
)

Output:

                      Open   High    Low  Close    Volume
Date                                                     
2024-11-19 10:00:00  11.75  11.96  11.55  11.90  99048200
2024-11-19 10:30:00  11.89  11.90  11.66  11.71  43994600
2024-11-19 11:00:00  11.72  11.75  11.70  11.74  14492400
2024-11-19 11:30:00  11.75  11.76  11.67  11.69  11252500

The closed='right' and label='right' parameters causing the intervals to align differently.

closed='right': This makes the intervals include the right endpoint but exclude the left.

label='right': This labels the resulting bins by the right edge of the interval.

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
 
# Resampling
resampled_df = df.resample('30T', closed='left', label='right').agg({
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum'
}).dropna()
 
print(resampled_df)

1.Use closed='left' to include the left edge of each interval.

2.Use label='right' to match your trading software's labeling.

3.Use .agg() to compute the required statistics.

本文标签: dataframePandas resample stock 5min data not alignStack Overflow