Loading Data with Pandas

On at least a couple of occasions lately, I realized that I might need Python in the near future. While I have picked up some experience with the language over the years, I never spent the time to understand Pandas, its de facto standard data-frame library.


Where does one start? For me, it’s usually with the data. Simple stuff: loading, wrangling, etc. Re-writing my little R6 helper class for loading futures data looked like a perfect candidate.

There was some frustration, which is totally expected after years of experience with R. Some things were less intuitive; surprisingly, however, pretty much nothing was downright ugly. 🙂 And when it comes to code, I am not easy to please. The end result is available here.

Here is a little example of how to use the code, although one can’t do much without the data, which I can’t distribute:

import pandas as pd
import instrumentdb as idb

def main():
    # Create the object for the database
    db = idb.CsiDb()

    # Load the data for three symbols (named bars rather than all,
    # to avoid shadowing the built-in all())
    bars = db.mload_bars(["HO2", "RB2", "CL2"])
    print(bars['HO2'].head())
    print(bars['RB2'].head())

    # Build a list of the closing prices for each series
    closes = []
    for ss in bars.keys():
        closes.append(bars[ss]['close'])

    # Create a single data frame using these series
    all_df = pd.concat(closes, join='inner', axis=1)
    all_df.columns = [xx.lower() for xx in bars.keys()]

    print(all_df.tail())

    # That's the only line that would work without the data.
    print(db.future_list())

if __name__ == "__main__":
    main()
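Since the example above cannot run without the database, here is a minimal sketch of just the combining step on synthetic data. The dates and prices are made up, and the dictionary of frames only assumes the shape that mload_bars appears to return (one data frame per symbol, with a close column). It also passes the column names through the keys argument of pd.concat instead of assigning them afterwards.

```python
import pandas as pd

# Synthetic stand-ins for the per-symbol bar frames (made-up dates and prices).
ho = pd.DataFrame(
    {"close": [1.0, 1.1, 1.2]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
rb = pd.DataFrame(
    {"close": [2.0, 2.2]},
    index=pd.to_datetime(["2020-01-03", "2020-01-06"]),
)
bars = {"HO2": ho, "RB2": rb}

# One closing-price series per symbol, in dictionary order.
closes = [frame["close"] for frame in bars.values()]

# join="inner" keeps only the dates present in every series;
# keys= supplies the column names of the combined frame.
closes_df = pd.concat(
    closes, join="inner", axis=1, keys=[symbol.lower() for symbol in bars]
)

print(closes_df)
```

The inner join drops 2020-01-02, because only one of the two series has a bar on that date.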

The structure of the database is available from Tradelib’s source code (I am using the SQLite version for this test). To bootstrap (create) the database, I use the .read command of sqlite3.exe, passing it data.sqlite.sql as a parameter. To be used via the CsiDb class, the database is configured using a TOML configuration file:

flavor = "SQLite"
db = "sqlite:///C:/Users/qmoron/Documents/csidata.sqlite"
bars_table = "csi_bars"
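For illustration, a configuration like the one above can be parsed with Python's standard tomllib module (Python 3.11 or later; the tomli package offers the same interface on older versions). This is only a sketch: how CsiDb actually locates and reads its configuration is not shown here, and the config text is inlined rather than read from a file.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same interface

# The configuration shown above, inlined for the sake of the example.
CONFIG_TEXT = """
flavor = "SQLite"
db = "sqlite:///C:/Users/qmoron/Documents/csidata.sqlite"
bars_table = "csi_bars"
"""

config = tomllib.loads(CONFIG_TEXT)

# The db entry is a URL; stripping the scheme leaves the file path
# (an assumption about how the value might be used, not CsiDb's actual logic).
prefix = "sqlite:///"
db_url = config["db"]
db_path = db_url[len(prefix):] if db_url.startswith(prefix) else db_url

print(config["flavor"], config["bars_table"], db_path)
```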

Now a little rant: In the above code, I tried to create a module, instrumentdb, to keep the source code in it. This created some problems while developing the module. Apparently, once a module is loaded, it’s pretty hard to re-load it properly within the same REPL session. Coming from R, where I am used to re-loading files, or even packages, as my development goes, that seemed quite an obstacle. After struggling with the issue for a while, the best I was able to come up with is the above approach of using a full-blown “main” file to drive the execution and some tests. This is unlikely to scale (in the sense of using it for rapid REPL prototyping), so I am open to suggestions.
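To make the problem concrete, here is a small self-contained sketch of what the standard library's importlib.reload does and does not update. The module name demo_reload_mod and the class Greeter are hypothetical, created in a temporary directory purely for the demonstration. The second write_text call stands in for editing the source during development.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so the rewritten source is always recompiled,
# even when the file changes within the same second.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_reload_mod.py"  # hypothetical module
    module_path.write_text("class Greeter:\n    version = 1\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    import demo_reload_mod
    from demo_reload_mod import Greeter  # binds the name to the current class

    # Simulate editing the module source, then reload it.
    module_path.write_text("class Greeter:\n    version = 2  # edited\n")
    importlib.reload(demo_reload_mod)

    stale_version = Greeter.version                  # name still bound to the old class
    fresh_version = demo_reload_mod.Greeter.version  # attribute lookup sees the new class

    # Repeating the from-import after the reload rebinds the name.
    from demo_reload_mod import Greeter
    rebound_version = Greeter.version

    # Clean up the import machinery before the directory disappears.
    sys.path.remove(tmp)
    del sys.modules["demo_reload_mod"]

print(stale_version, fresh_version, rebound_version)
```

In other words, a reload re-executes the module and updates its attributes, but names pulled in earlier with a from-import, and any objects already created from the old class, keep pointing at the old definitions. Accessing the class through the module (demo_reload_mod.Greeter here) or repeating the from-import after the reload is one way around that.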

Comments

  1. Clemens Brunner says:

    For efficiency reasons, Python imports a module only once. If you must, you can re-import a single module (such as instrumentdb in your case) as follows:

    import importlib
    importlib.reload(instrumentdb)

    For Python development, I strongly recommend PyCharm. Running code in PyCharm always creates a new Python interpreter instance – therefore, all modules are reloaded.

    1. quintuitive says:

      I found that as well, but that works only for the entire module. For parts of the module, like the way I load the class, it doesn’t seem to.

  2. janschulz says:

    If you use the IPython kernel in a Jupyter notebook, try `%autoreload`: https://ipython.org/ipython-doc/3/config/extensions/autoreload.html
