Data Interfaces
This module includes classes that download, store, and serve market data.
The two main abstractions are SymbolData and MarketData.
Neither is exposed outside this module; their derived classes are.
If you want to interface cvxportfolio with financial data sources other than the ones we provide, you should derive from one of those two classes.
Single-symbol data download and storage
- class cvxportfolio.YahooFinance(symbol, storage_backend='pickle', base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'))
Yahoo Finance symbol data.
- Parameters:
symbol (str) – The symbol to download.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'. By default 'pickle'.
base_location (pathlib.Path) – The location of the storage. We store in a subdirectory named after the class which derives from this. By default it’s a directory named cvxportfolio_data in your home folder.
grace_period (pandas.Timedelta) – If the most recent observation in the data is less old than this, we do not download new data. By default it’s one day.
- Attribute data:
The downloaded and cleaned data for the symbol.
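The grace_period rule can be illustrated with a small plain-Python sketch. The function `needs_download` is hypothetical (it is not part of cvxportfolio), and the sketch assumes timezone-aware timestamps; it only reproduces the age comparison described above, not how YahooFinance actually fetches data.

```python
from datetime import datetime, timedelta, timezone


def needs_download(last_observation, now, grace_period=timedelta(days=1)):
    """Hypothetical helper: True if the stored data is at least grace_period old.

    Follows the documented rule: if the most recent observation is less old
    than grace_period, no new data is downloaded.
    """
    return now - last_observation >= grace_period


now = datetime(2024, 1, 10, 15, 0, tzinfo=timezone.utc)
print(needs_download(now - timedelta(hours=6), now))  # False: within grace period
print(needs_download(now - timedelta(days=3), now))   # True: data is stale
```

With the real class, the result is read from the data attribute, e.g. `cvxportfolio.YahooFinance('AAPL').data`, which needs network access the first time the symbol is requested.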
- class cvxportfolio.Fred(symbol, storage_backend='pickle', base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'))
Fred single-symbol data.
- Parameters:
symbol (str) – The symbol to download.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'. By default 'pickle'.
base_location (pathlib.Path) – The location of the storage. We store in a subdirectory named after the class which derives from this. By default it’s a directory named cvxportfolio_data in your home folder.
grace_period (pandas.Timedelta) – If the most recent observation in the data is less old than this, we do not download new data. By default it’s one day.
- Attribute data:
The downloaded data for the symbol.
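The storage layout described for base_location (a subdirectory named after the derived class, under ~/cvxportfolio_data by default) can be sketched as follows. The helper `storage_directory` is hypothetical, and the file names used inside that directory are an implementation detail not covered here.

```python
from pathlib import Path
from typing import Optional


def storage_directory(class_name: str, base_location: Optional[Path] = None) -> Path:
    """Hypothetical helper: directory holding the data of one SymbolData subclass.

    Mirrors the documented layout: a subdirectory named after the derived
    class, inside base_location (by default ~/cvxportfolio_data).
    """
    if base_location is None:
        base_location = Path.home() / "cvxportfolio_data"
    return Path(base_location) / class_name


print(storage_directory("Fred"))  # e.g. /home/<user>/cvxportfolio_data/Fred
```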
Market data servers
- class cvxportfolio.UserProvidedMarketData(returns, volumes=None, prices=None, copy_dataframes=True, trading_frequency=None, min_history=Timedelta('365 days 05:45:36'), base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'), cash_key='USDOLLAR')
User-provided market data.
- Parameters:
returns (pandas.DataFrame) – Historical open-to-open returns. The return at time \(t\) is \(r_t = p_{t+1}/p_t - 1\), where \(p_t\) is the (open) price at time \(t\). Must have a datetime index. You can also include cash returns as its last column, and set cash_key below to the last column’s name.
volumes (pandas.DataFrame or None) – Historical market volumes, expressed in units of value (e.g., US dollars).
prices (pandas.DataFrame or None) – Historical open prices (e.g., used for rounding trades in the MarketSimulator).
trading_frequency (str or None) – Instead of using the frequency implied by the index of the returns, down-sample all dataframes. We implement 'weekly', 'monthly', 'quarterly', and 'annual'. By default (None) don’t down-sample.
min_history (pandas.Timedelta) – Minimum amount of time for which the returns are not np.nan before each asset enters a back-test.
base_location (pathlib.Path) – The location of the storage, only used in case it downloads the cash returns. By default it’s a directory named cvxportfolio_data in your home folder.
cash_key (str) – Name of the cash account. If it is not the last column of the provided returns, its returns will be downloaded. In that case you should make sure your provided dataframes have a timezone-aware datetime index. Its returns are the risk-free rate.
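The returns convention above, \(r_t = p_{t+1}/p_t - 1\) labeled at the time \(t\) of the starting price, can be checked on plain Python lists. This is a sketch with made-up prices and a hypothetical constant cash rate; it only shows the indexing convention and the column order with cash_key last.

```python
from datetime import date


def open_to_open_returns(open_prices):
    """r_t = p_{t+1} / p_t - 1; the last price has no successor, so one fewer value."""
    return [p_next / p - 1 for p, p_next in zip(open_prices, open_prices[1:])]


dates = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4)]
aapl = open_to_open_returns([100.0, 102.0, 99.96])
cash = [0.0002, 0.0002]         # hypothetical per-period risk-free returns
columns = ["AAPL", "USDOLLAR"]  # the cash_key column comes last

# Each return is labeled with the time t of its starting price p_t.
returns = {t: dict(zip(columns, row)) for t, row in zip(dates, zip(aapl, cash))}
print({t: round(r["AAPL"], 4) for t, r in returns.items()})
```

With pandas, the same returns can be obtained from a DataFrame of open prices as `prices.shift(-1) / prices - 1`, which leaves NaN in the last row.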
- serve(t)
Serve data for policy and simulator at time \(t\).
- Parameters:
t (pandas.Timestamp) – Time of execution, e.g., stock market open of a given day.
- Returns:
(past_returns, current_returns, past_volumes, current_volumes, current_prices)
- Return type:
(pandas.DataFrame, pandas.Series, pandas.DataFrame or None, pandas.Series or None, pandas.Series or None)
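The split performed by serve can be sketched on plain lists. The sketch is hypothetical and reproduces only the returns part of the output tuple: past_returns holds rows strictly before \(t\), which is the information a policy deciding at \(t\) may use, while current_returns is \(r_t\), realized between \(t\) and the next period and used by the simulator.

```python
from datetime import date


def serve(t, index, returns):
    """Hypothetical sketch of the past/current split done at time t.

    index: sorted list of trading times; returns: rows aligned with index.
    Only the returns part of the documented output tuple is reproduced.
    """
    pos = index.index(t)  # ValueError if t is not in the trading calendar
    past_returns = returns[:pos]    # strictly before t: what a policy may use
    current_returns = returns[pos]  # r_t, realized between t and the next period
    return past_returns, current_returns


index = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4)]
rows = [{"AAPL": 0.01}, {"AAPL": -0.02}, {"AAPL": 0.005}]
past, current = serve(date(2024, 1, 3), index, rows)
print(past, current)  # [{'AAPL': 0.01}] {'AAPL': -0.02}
```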
- trading_calendar(start_time=None, end_time=None, include_end=True)
Get trading calendar from market data.
- Parameters:
start_time (pandas.Timestamp) – Initial time of the trading calendar. Always inclusive if present. If None, use the first available time.
end_time (pandas.Timestamp) – Final time of the trading calendar. If None, use the last available time.
include_end (bool) – Include end time.
- Returns:
Trading calendar.
- Return type:
pandas.DatetimeIndex
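The parameter rules of trading_calendar can be mirrored on a sorted list of dates. This is a hypothetical sketch of the documented semantics (start always inclusive, end inclusive only if include_end, None meaning the first or last available time), not the library’s code.

```python
from datetime import date


def trading_calendar(index, start_time=None, end_time=None, include_end=True):
    """Hypothetical sketch of the documented filtering rules on a sorted index."""
    start = index[0] if start_time is None else start_time  # always inclusive
    end = index[-1] if end_time is None else end_time
    return [t for t in index
            if t >= start and (t <= end if include_end else t < end)]


days = [date(2024, 1, d) for d in (2, 3, 4, 5, 8)]
print(trading_calendar(days, date(2024, 1, 3), date(2024, 1, 5)))
print(trading_calendar(days, date(2024, 1, 3), date(2024, 1, 5), include_end=False))
```

In the sketch, a start_time that is not itself a trading time simply makes the calendar begin at the next available time; a back-test would then call serve(t) for each t returned.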
- class cvxportfolio.DownloadedMarketData(universe=(), datasource='YahooFinance', cash_key='USDOLLAR', base_location=PosixPath('/home/docs/cvxportfolio_data'), storage_backend='pickle', min_history=Timedelta('365 days 05:45:36'), grace_period=Timedelta('1 days 00:00:00'), trading_frequency=None)
Market data that is downloaded.
- Parameters:
universe (list) – List of names as understood by the data source used, e.g., ['AAPL', 'GOOG'] if using the default Yahoo Finance data source.
datasource (str or SymbolData class) – The data source used.
cash_key (str) – Name of the cash account; its rates will be downloaded and added as the last column of the returns. Its returns are the risk-free rate.
base_location (pathlib.Path) – The location of the storage. By default it’s a directory named cvxportfolio_data in your home folder.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'. By default 'pickle'.
min_history (pandas.Timedelta) – Minimum amount of time for which the returns are not np.nan before each asset enters a back-test.
grace_period (pandas.Timedelta) – If the most recent observation of each symbol’s data is less old than this, we do not download new data. By default it’s one day.
trading_frequency (str or None) – Instead of using the frequency implied by the index of the returns, down-sample all dataframes. We implement 'weekly', 'monthly', 'quarterly', and 'annual'. By default (None) don’t down-sample.
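Down-sampling returns amounts to compounding them over each longer period, since the \((1 + r)\) factors multiply. The sketch below does this for 'weekly' on plain Python data, grouping by ISO calendar week; the grouping rule and the function name are assumptions for illustration, and it says nothing about how cvxportfolio treats volumes or prices when down-sampling.

```python
from datetime import date
from itertools import groupby
from math import prod


def downsample_weekly(daily):
    """Compound (date, return) pairs, sorted by date, into one return per ISO week."""
    weekly = {}
    for week, group in groupby(daily, key=lambda pair: tuple(pair[0].isocalendar())[:2]):
        weekly[week] = prod(1 + r for _, r in group) - 1
    return weekly


daily = [(date(2024, 1, 2), 0.01), (date(2024, 1, 3), 0.02), (date(2024, 1, 8), -0.01)]
print(downsample_weekly(daily))  # week (2024, 1) compounds to about 0.0302
```

With pandas, a comparable aggregation of a returns DataFrame is `(1 + returns).resample('W').prod() - 1`, though its period labels may differ from the ones cvxportfolio uses.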
- serve(t)
Serve data for policy and simulator at time \(t\).
- Parameters:
t (pandas.Timestamp) – Time of execution, e.g., stock market open of a given day.
- Returns:
(past_returns, current_returns, past_volumes, current_volumes, current_prices)
- Return type:
(pandas.DataFrame, pandas.Series, pandas.DataFrame or None, pandas.Series or None, pandas.Series or None)
- trading_calendar(start_time=None, end_time=None, include_end=True)
Get trading calendar from market data.
- Parameters:
start_time (pandas.Timestamp) – Initial time of the trading calendar. Always inclusive if present. If None, use the first available time.
end_time (pandas.Timestamp) – Final time of the trading calendar. If None, use the last available time.
include_end (bool) – Include end time.
- Returns:
Trading calendar.
- Return type:
pandas.DatetimeIndex
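The min_history parameter of both market data servers can be sketched as an eligibility filter. This plain-Python version assumes that an asset’s history starts at its first non-NaN return and ignores any later gaps; that simplification and the helper names are hypothetical.

```python
import math
from datetime import date, timedelta

MIN_HISTORY = timedelta(days=365.24)  # matches the default shown, 365 days 05:45:36


def first_valid_time(index, values):
    """First time at which a return series is not NaN (None if it never is)."""
    for t, v in zip(index, values):
        if not math.isnan(v):
            return t
    return None


def eligible_assets(t, first_valid, min_history=MIN_HISTORY):
    """Assets with at least min_history of non-NaN returns before time t."""
    return [asset for asset, start in first_valid.items()
            if start is not None and t - start >= min_history]


index = [date(2022, 1, 3), date(2023, 1, 3), date(2023, 6, 1)]
first_valid = {
    "AAPL": first_valid_time(index, [0.01, 0.02, -0.01]),
    "NEWCO": first_valid_time(index, [math.nan, math.nan, 0.03]),
}
print(eligible_assets(date(2024, 1, 2), first_valid))  # ['AAPL']
```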
Base classes (for using other data sources)
- class cvxportfolio.data.SymbolData(symbol, storage_backend='pickle', base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'))
Base class for single-symbol time series data.
The data is either a pandas Series or DataFrame, and has a datetime index.
This class needs to be derived. At a minimum, one should redefine the _download method, which implements the downloading of the symbol’s time series from an external source. The method takes the current (already downloaded and stored) data and is supposed to only append to it. In this way we only store new data and don’t modify already downloaded data.
Additionally, one can redefine the _preload method, which prepares data to serve to the user (so the data is stored in a different format than what the user sees). We found that this separation can be useful.
This class interacts with module-level functions named _loader_BACKEND and _storer_BACKEND, where BACKEND is the name of the storage system used. We define pickle, csv, and sqlite backends. These may have limitations; see their docstrings for more information.
- Parameters:
symbol (str) – The symbol to download.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'. By default 'pickle'.
base_location (pathlib.Path) – The location of the storage. We store in a subdirectory named after the class which derives from this. By default it’s a directory named cvxportfolio_data in your home folder.
grace_period (pandas.Timedelta) – If the most recent observation in the data is less old than this, we do not download new data. By default it’s one day.
- Attribute data:
The downloaded data for the symbol.
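The append-only contract of _download and the storage/serving split of _preload can be illustrated with a toy, self-contained class. It does not import or subclass cvxportfolio: ToySymbolData, its update method, the dict-based storage, and the source callable are all hypothetical, and the real _download signature should be checked in cvxportfolio.data before deriving from SymbolData.

```python
from datetime import date


class ToySymbolData:
    """Toy stand-in (not cvxportfolio) for the _download/_preload split."""

    def __init__(self, symbol, source):
        self.symbol = symbol
        self._source = source   # hypothetical callable: source(symbol) -> {date: price}
        self._stored = {}       # stands in for what a storage backend would persist
        self.data = None

    def _download(self, current):
        """Return `current` with only observations newer than its last one appended."""
        last = max(current) if current else None
        updated = dict(current)  # already-stored observations are never modified
        for t, value in self._source(self.symbol).items():
            if last is None or t > last:
                updated[t] = value
        return updated

    def _preload(self, stored):
        """Prepare stored prices for the user; here, as open-to-open returns."""
        ts = sorted(stored)
        return {t: stored[nxt] / stored[t] - 1 for t, nxt in zip(ts, ts[1:])}

    def update(self):
        self._stored = self._download(self._stored)
        self.data = self._preload(self._stored)


# Hypothetical external source whose history can be revised after the fact.
feed = {date(2024, 1, 2): 100.0, date(2024, 1, 3): 101.0}
sym = ToySymbolData("XYZ", lambda symbol: dict(feed))
sym.update()
feed[date(2024, 1, 2)] = 999.0   # revision of an already-stored value: ignored
feed[date(2024, 1, 4)] = 103.02  # new observation: appended
sym.update()
# sym._stored keeps 100.0 for 2024-01-02 and gains 2024-01-04
```

Storing raw prices while serving returns is just one example of keeping the stored format different from what the user sees.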
- class cvxportfolio.data.MarketData
Prepare, hold, and serve market data.
- serve(t)
Serve data for policy and simulator at time \(t\).
- Parameters:
t (pandas.Timestamp) – Trading time. It must be included in the timestamps returned by trading_calendar().
- Returns:
past_returns, current_returns, past_volumes, current_volumes, current_prices
- Return type:
(pandas.DataFrame, pandas.Series, pandas.DataFrame, pandas.Series, pandas.Series)
- trading_calendar(start_time=None, end_time=None, include_end=True)
Get trading calendar between times.
- Parameters:
start_time (pandas.Timestamp) – Initial time of the trading calendar. Always inclusive if present. If None, use the first available time.
end_time (pandas.Timestamp) – Final time of the trading calendar. If None, use the last available time.
include_end (bool) – Include end time.
- Returns:
Trading calendar.
- Return type:
pandas.DatetimeIndex
- property periods_per_year
Average trading periods per year.
- Return type:
int
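One plausible way to compute an average number of trading periods per year, assumed here rather than taken from the library’s code, is to count the intervals between consecutive timestamps and divide by the elapsed time in years of 365.24 days.

```python
from datetime import date, timedelta


def periods_per_year(index):
    """Average number of periods per year implied by a sorted list of times.

    Counts intervals between consecutive times and divides by the elapsed
    time in years of 365.24 days; the result is rounded to an int.
    """
    years = (index[-1] - index[0]) / timedelta(days=365.24)
    return int(round((len(index) - 1) / years))


weekly = [date(2023, 1, 2) + timedelta(weeks=k) for k in range(53)]
print(periods_per_year(weekly))  # 52
```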
- property full_universe
Full universe, which might not be available for trading.
- Returns:
Full universe.
- Return type:
pandas.Index
- partial_universe_signature(partial_universe)
Unique signature of this instance with a partial universe.
A partial universe is a subset of the full universe that is available for trading at some time.
This is used in cvxportfolio.cache to sign back-test caches that are saved on disk. If not redefined, it returns None, which disables on-disk caching.
- Parameters:
partial_universe (pandas.Index) – A subset of the full universe.
- Returns:
Signature.
- Return type:
str or None
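A derived class that wants on-disk caching needs a signature that is deterministic and changes whenever the data it describes changes. The toy class below (hypothetical, not part of cvxportfolio) hashes its class name, a description of its data source and time span, and the partial universe with SHA-256; which fields are enough to identify the data is an assumption each data source has to settle for itself.

```python
import hashlib


class HypotheticalMarketData:
    """Toy stand-in (not cvxportfolio) showing one way to sign a partial universe."""

    def __init__(self, source_name, start, end):
        self.source_name = source_name
        self.start = start
        self.end = end

    def partial_universe_signature(self, partial_universe):
        """Deterministic hex string; changes if the data description or universe does."""
        payload = repr((type(self).__name__, self.source_name, self.start,
                        self.end, tuple(partial_universe)))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


md = HypotheticalMarketData("my_vendor", "2020-01-01", "2024-01-01")
print(md.partial_universe_signature(["AAPL", "GOOG"]))
```

Including the time span matters: if the data can grow after a new download, a signature that ignored it could let an outdated cache be reused. Returning None, the base-class behavior, is always safe because it simply disables on-disk caching.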