Manual#
Here we explain some concepts that apply throughout the library and are useful to know. The code blocks in this document assume the following imports
import cvxportfolio as cvx
import numpy as np
import pandas as pd
Passing Data#
Most Cvxportfolio objects, such as policies, constraints, and objective terms, accept user-provided data. The values can be constant or varying in time, and either the same for all assets or specific to each asset. They are specified as follows.
Python scalars. These represent values that are constant for all assets (if the object requires a value for each asset) and for all times. For example
cvx.LeverageLimit(3)
is a constraint that requires the leverage of the portfolio to be less than or equal to three, at all times. Or,
cvx.HoldingCost(short_fees=5.25)
is a cost object that models 5.25% annual fees on short positions, for all assets and at all times.
Pandas series. These represent either values that are constant in time and vary for each asset, or values that vary in time and are constant for all assets. For example
my_forecast = pd.Series([0.001, 0.0005], index=['AAPL', 'GOOG'])
cvx.ReturnsForecast(r_hat=my_forecast)
is an objective term that models market returns forecasts of 0.1% and 0.05% for the two stocks that are specified, over the trading period used. The forecasts are constant in time.
Note
During a back-test the trading universe may change. Cvxportfolio objects are aware of the current trading universe at each point of a back-test. If you pass data that vary for each asset, Cvxportfolio objects will try to slice it using the current universe. If they fail, they throw an error. So, you should always provide data for all assets that ever appear in a back-test.
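The slicing behavior described in this note can be pictured with plain pandas. The sketch below is only an illustration and not Cvxportfolio's internal code: the helper `slice_to_universe` and the `MSFT` asset are hypothetical names introduced for the example.

```python
import pandas as pd

# Forecasts indexed by asset name, as in the example above.
my_forecast = pd.Series([0.001, 0.0005], index=['AAPL', 'GOOG'])


def slice_to_universe(data, universe):
    """Hypothetical helper: return the values of `data` for `universe`.

    Raises KeyError if any asset in the universe has no entry.
    """
    missing = [asset for asset in universe if asset not in data.index]
    if missing:
        raise KeyError(f"no data provided for assets: {missing}")
    return data.loc[list(universe)]


# A universe that is a subset of the provided assets slices cleanly.
goog_only = slice_to_universe(my_forecast, ['GOOG'])

# A universe containing an asset without data fails.
try:
    slice_to_universe(my_forecast, ['AAPL', 'MSFT'])
    slicing_failed = False
except KeyError:
    slicing_failed = True
```

Providing entries for every asset that may ever be traded, as recommended above, avoids the failing case.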
If instead the pandas series has a datetime index, it is assumed to contain values that vary in time. For example
datetime_index_2020 = pd.date_range('2020-01-01', '2020-12-31')
short_fees_2020 = pd.Series(5.0, index=datetime_index_2020)
datetime_index_2021 = pd.date_range('2021-01-01', '2021-12-31')
short_fees_2021 = pd.Series(5.25, index=datetime_index_2021)
historical_short_fees = pd.concat([short_fees_2020, short_fees_2021])
cvx.HoldingCost(short_fees=historical_short_fees)
is a cost object that models annual fees on short positions, for all assets, of 5% in 2020 and 5.25% in 2021.
Note
Make sure that the timestamps you use match the timestamps used by the market data server: for example, they must have the same timezone. To find the correct timestamps you can call the trading_calendar() method of a market data object.
Pandas dataframes. The same conventions used for Pandas series apply, so you should read the above, including the two notes. With dataframes you can specify data that varies both for each asset and in time, or multi-dimensional data that varies in time, or for each asset, or both. If you provide data that varies in time, the datetime index should always be the index (not the columns) and in case of a multi-index it should be the first level. For example
my_forecast = pd.DataFrame(
    [[0.1, 0.05], [0.15, 0.06]],
    index=[pd.Timestamp('2020-01-01'), pd.Timestamp('2021-01-01')],
    columns=['AAPL', 'GOOG'])
cvx.ReturnsForecast(r_hat=my_forecast)
is an objective term that models returns forecasts of 10% and 5% for the first period, and 15% and 6% for the second period, for the two assets specified. In this case the two periods are one year each (you get that, for example, by setting the trading_frequency attribute of a market data server to 'annual'). Remember again that the timestamps must match those provided by the trading_calendar() method of the market data server used.
Multi-dimensional data constant in time is modeled as follows
exposures = pd.DataFrame(
    [[1, -.5], [-.25, .75]],
    index=['AAPL', 'GOOG'],
    columns=['factor_1', 'factor_2'])
cvx.FactorNeutral(factor_exposure=exposures)
so the resulting constraint requires neutrality of the portfolio with respect to those two factors. The index must contain all assets that appear in a back-test, and it will be sliced if at some point in time of a back-test only a subset of those assets is traded (see the note above).
Multi-dimensional data can also vary in time. It is modeled as a Pandas multi-indexed dataframe. If the data is time-varying, the first level of the multi-index should be a Pandas datetime index.
multi_index = pd.MultiIndex.from_product(
    [[pd.Timestamp('2020-01-01'), pd.Timestamp('2021-01-01')],
     ['AAPL', 'GOOG']])
exposures = pd.DataFrame(
    [[1, -.5], [-.25, .75], [.9, -.3], [-.1, .9]],
    index=multi_index,
    columns=['factor_1', 'factor_2'])
cvx.FactorNeutral(factor_exposure=exposures)
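To see how such a time-varying frame is laid out, one can select its cross-section at a single timestamp with plain pandas. This is a minimal sketch of the data layout only; it does not claim to reproduce how Cvxportfolio reads the data internally.

```python
import pandas as pd

timestamps = [pd.Timestamp('2020-01-01'), pd.Timestamp('2021-01-01')]
multi_index = pd.MultiIndex.from_product([timestamps, ['AAPL', 'GOOG']])
exposures = pd.DataFrame(
    [[1, -.5], [-.25, .75], [.9, -.3], [-.1, .9]],
    index=multi_index,
    columns=['factor_1', 'factor_2'])

# The first level of the multi-index holds the timestamps.
first_level_is_datetime = isinstance(
    exposures.index.levels[0], pd.DatetimeIndex)

# Selecting on the first level returns the assets-by-factors table
# that applies at that timestamp.
exposures_2021 = exposures.xs(pd.Timestamp('2021-01-01'), level=0)
```

The result has the same shape as the constant-in-time exposures frame: one row per asset and one column per factor.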
All the conventions above apply (timestamps should match the ones provided by the trading_calendar() method of the market data server, assets’ names should include all the ones that are traded, …).
Another example is the factor covariances that appear in low-rank factor model covariances. These are specified as follows
multi_index = pd.MultiIndex.from_product(
    [[pd.Timestamp('2020-01-01'), pd.Timestamp('2021-01-01')],
     ['factor_1', 'factor_2']])
factor_covariances = pd.DataFrame(
    # factor covariance at '2020-01-01'
    [[1, 0.25], [0.25, 1],
    # factor covariance at '2021-01-01'
     [1, .1], [.1, 1]],
    index=multi_index,
    columns=['factor_1', 'factor_2'])
cvx.FactorModelCovariance(
    F=exposures, Sigma_F=factor_covariances, d=0.01)
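Since all time-indexed data must use timestamps that match the trading calendar, it can be useful to check this before running a back-test. The sketch below is one possible check under stated assumptions: `calendar` is a hypothetical stand-in for the timestamps returned by trading_calendar() (its business-day frequency and UTC timezone are invented for the example), and `covers_calendar` is a hypothetical helper, not part of Cvxportfolio.

```python
import pandas as pd

# Hypothetical stand-in for the output of trading_calendar(); the
# frequency and the UTC timezone are assumptions made for this sketch.
calendar = pd.date_range('2020-01-01', '2020-01-10', freq='B', tz='UTC')

# User data built with timezone-naive timestamps.
naive_index = pd.date_range('2020-01-01', '2020-01-10', freq='B')
short_fees = pd.Series(5.0, index=naive_index)


def covers_calendar(data, calendar):
    """Hypothetical helper: True if `data` has an entry, with the same
    timezone, for every timestamp in `calendar`."""
    if str(data.index.tz) != str(calendar.tz):
        return False
    return bool(calendar.isin(data.index).all())


# The naive timestamps do not match the timezone-aware calendar.
naive_matches = covers_calendar(short_fees, calendar)

# Localizing the index to the calendar's timezone fixes the mismatch.
short_fees_utc = short_fees.tz_localize('UTC')
localized_matches = covers_calendar(short_fees_utc, calendar)
```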
Numpy arrays. These are not recommended but can be used in simple cases. One use case is to model data that is constant in time and varies across assets. If the trading universe varies through a back-test these can’t be used: an error is thrown whenever the sizes of the trading universe and of the array don’t match. For example
my_forecast = np.array([0.001, 0.0005])
cvx.ReturnsForecast(r_hat=my_forecast)
models returns’ forecasts of 0.1% for the first asset in the universe and 0.05% for the second asset in the universe. The ordering is the one of the data provided by the serve() method of the market data server.
Another use case, less problematic, is to model data that varies across other dimensions, such as risk factors. For example a constant factor covariance can be provided as follows
factor_covariance = np.array(
    # constant in time
    [[1, 0.25], [0.25, 1]])
cvx.FactorModelCovariance(
    F=exposures, Sigma_F=factor_covariance, d=0.01)
The ordering used here is that of the columns of the provided exposures dataframe.
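Because a Numpy array carries no labels, it may help to make the implied ordering explicit and to run a few sanity checks before passing it. The following is a minimal sketch using only pandas and Numpy; the names `labeled_covariance`, `shape_ok`, and `is_symmetric` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

exposures = pd.DataFrame(
    [[1, -.5], [-.25, .75]],
    index=['AAPL', 'GOOG'],
    columns=['factor_1', 'factor_2'])
factor_covariance = np.array([[1, 0.25], [0.25, 1]])

# Attach the factor names implied by the column order of `exposures`.
labeled_covariance = pd.DataFrame(
    factor_covariance,
    index=exposures.columns,
    columns=exposures.columns)

# The array should be square, sized by the number of factors,
# and symmetric.
n_factors = exposures.shape[1]
shape_ok = factor_covariance.shape == (n_factors, n_factors)
is_symmetric = bool(np.allclose(factor_covariance, factor_covariance.T))
```

Inspecting the labeled frame makes it easy to confirm that each entry refers to the intended pair of factors.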
Missing values#
When Cvxportfolio objects access user-provided data, after they locate the right
time and slice with the current trading universe (if applicable), they check
that the resulting data does not contain any np.nan missing value. If any is
found, they throw an error. Thus, you should make sure that no np.nan values
are contained in any data passed that will be accessed. It is fine to have
np.nan values for assets that are not traded at a certain time (for example,
because they didn’t exist) because that data won’t be accessed.
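The sequence described here (locate the time, slice by universe, then check for missing values) can be sketched with plain pandas. This is an illustration under assumptions, not the library's implementation: `accessed_values` is a hypothetical helper, and the scenario where GOOG is not traded on the first date is invented for the example.

```python
import numpy as np
import pandas as pd

# GOOG has no forecast on the first date; in this hypothetical
# scenario it is not yet traded at that time.
forecasts = pd.DataFrame(
    [[0.001, np.nan], [0.002, 0.0005]],
    index=[pd.Timestamp('2020-01-01'), pd.Timestamp('2021-01-01')],
    columns=['AAPL', 'GOOG'])


def accessed_values(data, t, universe):
    """Hypothetical helper: locate time t, slice by universe, and
    raise ValueError if the selected values contain np.nan."""
    values = data.loc[t, list(universe)]
    if values.isnull().any():
        raise ValueError(f"missing values in data at time {t}")
    return values


# The np.nan for GOOG is never touched while GOOG is not traded.
first_period = accessed_values(
    forecasts, pd.Timestamp('2020-01-01'), ['AAPL'])

# If GOOG were in the universe on that date, the check would fail.
try:
    accessed_values(forecasts, pd.Timestamp('2020-01-01'), ['AAPL', 'GOOG'])
    nan_detected = False
except ValueError:
    nan_detected = True
```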
Cash account#
Many Cvxportfolio internal variables, such as the weights and holdings vectors
that you can access in a cvxportfolio.BacktestResult object, include
the cash account as their last element. In most cases user-provided data is not
concerned with the cash account (such as all examples above) and so it can be
ignored. Exceptions are noted in the documentation of each object.
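As an illustration of this convention, the sketch below separates the cash entry from the asset entries of a weights vector. The vector is made up for the example and the label 'cash' is a placeholder; the actual name of the cash account in a back-test result may differ.

```python
import pandas as pd

# Hypothetical weights vector at one point in time; by the convention
# above, the cash account is the last element.
weights = pd.Series([0.6, 0.3, 0.1], index=['AAPL', 'GOOG', 'cash'])

# Everything except the last element refers to non-cash assets.
asset_weights = weights.iloc[:-1]

# The last element is the weight of the cash account.
cash_weight = weights.iloc[-1]
```

Selecting by position rather than by label keeps the sketch independent of how the cash account is named.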