Pandas OHLC aggregation on OHLC data

Question

I understand that OHLC resampling in pandas works perfectly on a single column of data, for example on the following dataframe:

    >>> df
            ctime  openbid
       1443654000  1.11700
       1443654060  1.11700
    ...
    df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
    df = df.set_index('ctime')
    df.resample('1H', how='ohlc', axis=0, fill_method='bfill')
    >>>
                            open     high      low    close
    ctime
    2015-09-30 23:00:00  1.11700  1.11700  1.11687  1.11697
    2015-09-30 24:00:00  1.11700  1.11712  1.11697  1.11697
    ...

But what do I do if the data is already in OHLC format? From what I can gather, the OHLC method of the API computes an OHLC slice for every column, so if my data is in this format:

            ctime  openbid  highbid   lowbid  closebid
    0  1443654000  1.11700  1.11700  1.11687   1.11697
    1  1443654060  1.11700  1.11712  1.11697   1.11697
    2  1443654120  1.11701  1.11708  1.11699   1.11708

When I try to resample, I get an OHLC for each of the columns, like so:

                         openbid                             highbid           \
                            open     high      low    close     open     high
    ctime
    2015-09-30 23:00:00  1.11700  1.11700  1.11700  1.11700  1.11700  1.11712
    2015-09-30 23:01:00  1.11701  1.11701  1.11701  1.11701  1.11708  1.11708
    ...
                                            lowbid                             \
                             low    close     open     high      low    close
    ctime
    2015-09-30 23:00:00  1.11700  1.11712  1.11687  1.11697  1.11687  1.11697
    2015-09-30 23:01:00  1.11708  1.11708  1.11699  1.11699  1.11699  1.11699
    ...
                        closebid
                            open     high      low    close
    ctime
    2015-09-30 23:00:00  1.11697  1.11697  1.11697  1.11697
    2015-09-30 23:01:00  1.11708  1.11708  1.11708  1.11708

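Is there a quick workaround for this that someone is willing to share, without me having to dig deep into the pandas manual?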

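Thanks.

P.S. There is this answer, "Converting OHLC stock data into a different timeframe with python and pandas", but it was written 4 years ago, so I am hoping there has been some progress.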

chrisb · Accepted Answer

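This is similar to the answer you linked, but it is a little cleaner and faster, because it uses the optimized aggregations rather than lambdas.

Note that the resample(...).agg(...) syntax requires pandas version 0.18.0 or later.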

    In [101]: df.resample('1H').agg({'openbid': 'first',
                                     'highbid': 'max',
                                     'lowbid': 'min',
                                     'closebid': 'last'})
    Out[101]:
                          lowbid  highbid  closebid  openbid
    ctime
    2015-09-30 23:00:00  1.11687  1.11712   1.11708    1.117
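For anyone who wants to try this end to end, here is a minimal self-contained sketch using the three sample rows from the question. The 5-minute target frequency and the names `rules` and `bars` are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

# The three 1-minute bars shown in the question.
df = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443654120],
    "openbid": [1.11700, 1.11700, 1.11701],
    "highbid": [1.11700, 1.11712, 1.11708],
    "lowbid": [1.11687, 1.11697, 1.11699],
    "closebid": [1.11697, 1.11697, 1.11708],
})
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

# One aggregation per column: first open, highest high, lowest low, last close.
rules = {"openbid": "first", "highbid": "max", "lowbid": "min", "closebid": "last"}

# "5min" is an assumed target frequency; all three rows fall into one bin.
bars = df.resample("5min").agg(rules)
print(bars)
```

Each column gets exactly one aggregate, so the result has flat columns instead of the nested open/high/low/close block per column that the question shows.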

Benjamin Crouzier · Answer

To keep the column order in newer versions of pandas, you need to use an OrderedDict, like so:

    import pandas as pd
    from collections import OrderedDict

    df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
    df = df.set_index('ctime')
    df = df.resample('5Min').agg(
        OrderedDict([
            ('open', 'first'),
            ('high', 'max'),
            ('low', 'min'),
            ('close', 'last'),
            ('volume', 'sum'),
        ])
    )
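If you would rather not depend on how the mapping is ordered at all, one option is to select the columns explicitly after aggregating. The sketch below wraps that in a small helper. `resample_ohlcv` and `OHLCV_RULES` are hypothetical names I made up, and the sample numbers are invented purely for illustration.

```python
import pandas as pd

# Hypothetical helper data: canonical column order plus the aggregation for each.
OHLCV_RULES = [("open", "first"), ("high", "max"), ("low", "min"),
               ("close", "last"), ("volume", "sum")]


def resample_ohlcv(bars, rule):
    """Resample OHLCV bars to a coarser frequency, in canonical column order."""
    # Only aggregate the columns that are actually present in the input.
    rules = {col: how for col, how in OHLCV_RULES if col in bars.columns}
    out = bars.resample(rule).agg(rules)
    # Explicit selection fixes the order regardless of how agg() orders columns.
    return out[list(rules)]


# Made-up 1-minute bars with deliberately shuffled columns.
idx = pd.date_range("2015-09-30 23:00", periods=4, freq="1min")
bars = pd.DataFrame({
    "volume": [10, 20, 30, 40],
    "close": [1.2, 1.3, 1.1, 1.4],
    "open": [1.0, 1.2, 1.3, 1.1],
    "low": [0.9, 1.1, 1.0, 1.0],
    "high": [1.3, 1.4, 1.35, 1.5],
}, index=idx)

two_min = resample_ohlcv(bars, "2min")
print(two_min)
```

The explicit `out[list(rules)]` selection is the part that guarantees the order, so the same helper works whether or not the pandas version in use preserves mapping order.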

Sivakumar D · Answer

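This seems to work:

    def ohlcVolume(x):
        # Aggregate one resample bin of OHLCV bars; empty bins return None.
        if len(x):
            ohlc = {
                "open": x["open"].iloc[0],     # first bar's open (positional)
                "high": max(x["high"]),
                "low": min(x["low"]),
                "close": x["close"].iloc[-1],  # last bar's close (positional)
                "volume": sum(x["volume"]),
            }
            return pd.Series(ohlc)

    daily = df.resample('1D').apply(ohlcVolume)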

Ben · Answer

Given a dataframe with price and amount columns:

    import numpy as np
    import pandas as pd

    def agg_ohlcv(x):
        # Build one OHLCV bar from the raw trades that fall into a bin.
        arr = x['price'].values
        names = {
            'low': min(arr) if len(arr) > 0 else np.nan,
            'high': max(arr) if len(arr) > 0 else np.nan,
            'open': arr[0] if len(arr) > 0 else np.nan,
            'close': arr[-1] if len(arr) > 0 else np.nan,
            'volume': sum(x['amount'].values) if len(x['amount'].values) > 0 else 0,
        }
        return pd.Series(names)

    df = df.resample('1min').apply(agg_ohlcv)
    # Carry the previous bar forward into minutes with no trades.
    df = df.ffill()
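As an alternative to a custom function, the same kind of bars can probably be built from pandas' built-in aggregations: `ohlc()` on the price column and `sum()` on the amount column. This is a sketch with made-up tick data, not the method from the answer above. It also treats empty minutes differently: it fills them with a flat bar at the previous close and volume 0, whereas `ffill()` on the whole frame copies the previous bar's open, high and low as well.

```python
import pandas as pd

# Made-up trades: two in the 23:00 minute, none in 23:01, one in 23:02.
idx = pd.to_datetime([
    "2015-09-30 23:00:05",
    "2015-09-30 23:00:40",
    "2015-09-30 23:02:10",
])
ticks = pd.DataFrame({"price": [1.0, 1.2, 1.1], "amount": [5, 3, 2]}, index=idx)

# Built-in OHLC on the price series; bins with no trades come back as NaN.
price_bars = ticks["price"].resample("1min").ohlc()

# Traded amount per bin; the sum over an empty bin is 0.
volume = ticks["amount"].resample("1min").sum()

ohlcv = price_bars.assign(volume=volume)

# Fill empty minutes with a flat bar at the previous close.
ohlcv["close"] = ohlcv["close"].ffill()
for col in ["open", "high", "low"]:
    ohlcv[col] = ohlcv[col].fillna(ohlcv["close"])

print(ohlcv)
```

Which gap-filling convention is right depends on how the bars will be used, so treat the flat-bar choice here as one possibility.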