DevTalk.net

A blog on programming and quantitative finance

Archive for the ‘DotNet’ Category

X2C – Convert Excel Spreadsheets into Executable Code

without comments

Introduction

About a year ago, while I was putting the polish on MathSharp (a MathML-to-code converter), I had yet another idea. I observed that for computational problems, despite the usefulness of MATLAB, Mathematica and other math applications, a great many people used Excel.

Excel really is convenient, and I also model things in it. But for executing code on ‘live’ server systems, rather than running an Excel workbook as a server (yes, it is possible), people typically prefer executable code.

And so with that in mind, I decided to put the two together and create an Excel add-in that would let me turn Excel spreadsheets (with formulae and all) into ready-to-compile C++ code. Thus, the X2C project was born.

High-Level Design

How can Excel and C++ constructs be related? My approach is to define mappings from cells to C++ code constructs. There are 5 different types of mapping:

  • Scalar — a scalar maps a single cell to a variable. This means that a single A1 cell with the value 42 can be mapped to a global field defined as double A1 = 42. Of course, the name is customizable, and I left an option to not set the default value if it’s not needed.

  • Function — this corresponds to a global function. For example, if cell A2 has a formula =A1+23, we generate a global function: double A2() { return A1+23; }

  • Vector — this is a one-dimensional array of values. This means that a selection of A1:A3 can be mapped to a double Stuff[] array with 3 elements in it. If an array is subsequently used in a formula, we use the [] operator to extract the right element. Note that some advanced C++ is used here: for example if someone has a formula =SUM(A1:A3), we cannot create a temporary variable and a for loop to sum things up.

  • Matrix — this is a two-dimensional array. It works just like a 1D array but its initialization is tricker and it’s tougher to understand the iteration code in case it’s used in a formula somewhere. Also, one can take 1D slices out of a 2D matrix, which turns things really nasty.

  • Entity — this construct basically converts a selection of scalar/function definitions into a class! This lets you take a chunk of a spreadsheet and organize it in a way that its constituent parts now become part of a larger, named entity. Of course, lots of tricks are needed here too, because for example, a formula might use a cell mapped to an entity, so you need to add EntityName& to the parameters and dereference accordingly.

All of these mapping are created by an Excel add-in that lets you just click a button with the cell selected and edit the mapping. It’s really as simple as that, though mappings do have different code generation options for added customization.

Mathematical Issues

When dealing with math, there’s lots of issues caused by mismatches between real-world-math and computational math. Here are a few examples:

  • Functions with known equivalents are easy: they are typically replicated by corresponding STL and Boost calls. In certain cases, though, it makes sense to create temporaries: for example, if a function uses NORM.DIST in many places, it makes sense to create the Boost normal distribution construct only once and then reuse it.

  • Functions with no equivalents and easy implementation are typically done using best-available methods. For example, ODD and EVEN Excel functions require checking the last bit of an integer, so we can simply check e.g. (long int)foo & 1.

  • Functions that are weird or just difficult to implement, like BAHTTEXT for example, are completely left out: not much we can do about those.

  • Lots of matrix functions are meticulously implemented with cell-by-cell addressing. For example, a multiplication of two Excel matrices will be spelled out by creating real-life matrices from the data and then using Boost to perform the multiplication. The code for such a construct can look quite scary when generated, but the important thing is that it’s syntactically correct.

  • Inefficient operations are rewritten when necessary. This includes things like repeated calculations. If you’re OK with having POWER(x,3) then that’s fine, but you can also get x * x * x or MyIntPow(x, 3) or even _IntPow(x,3) if you feel like it.

Things to Come

As it stands, X2C only supports C++, and it is capable if generating C++ files that you can preview and dump to disk. There’s rudimentary operations like removal and editing of mappings, and some customization functionality already in the box.

The following ideas might be worth pursuing (depending on demand for the program):

  • .NET support is probably highest on the list. MathSharp already supports C# and F# (but not C++ — that might also happen at some point), so having at least F# in X2C would be nice. It would especially be nice to get continuous (re) compilation working for the generated code, so that the user can see immediately his model updating the code and the code being executed.

  • Web service support is also another neat idea. Generating web server stubs isn’t exactly difficult, but would let users put up the calculations as a service and then share it accross the internet.

  • UI generation is another very attractive feature, because once you get the model in a compilable state, the next thing you want to do is to is to start wiring up the UI so you can manipulate the data and observe the results. So why not generate the UI from the outset?

All in all, the development of X2C really depends on two things: my own personal needs as well as how much demand there is for such a tool. Meanwhile, if I got you interested, check out the product page (I promise to update the video as soon as I can) and let me know what you think.

Written by Dmitri Nesteruk

September 23rd, 2012 at 2:19 pm

Posted in DotNet

Tagged with ,