Archive for the ‘DotNet’ Category
X2C – Convert Excel Spreadsheets into Executable Code
Introduction
About a year ago, while I was putting the polish on MathSharp (a MathML-to-code converter), I had yet another idea. I observed that for computational problems, despite the usefulness of MATLAB, Mathematica and other math applications, a great many people used Excel.
Excel really is convenient, and I also model things in it. But for executing code on ‘live’ server systems, rather than running an Excel workbook as a server (yes, it is possible), people typically prefer executable code.
And so with that in mind, I decided to put the two together and create an Excel add-in that would let me turn Excel spreadsheets (with formulae and all) into ready-to-compile C++ code. Thus, the X2C project was born.
High-Level Design
How can Excel and C++ constructs be related? My approach is to define mappings from cells to C++ code constructs. There are 5 different types of mapping:
-
Scalar — a scalar maps a single cell to a variable. This means that a single
A1cell with the value42can be mapped to a global field defined asdouble A1 = 42. Of course, the name is customizable, and I left an option to not set the default value if it’s not needed. -
Function — this corresponds to a global function. For example, if cell
A2has a formula=A1+23, we generate a global function:double A2() { return A1+23; } -
Vector — this is a one-dimensional array of values. This means that a selection of
A1:A3can be mapped to adouble Stuff[]array with 3 elements in it. If an array is subsequently used in a formula, we use the[]operator to extract the right element. Note that some advanced C++ is used here: for example if someone has a formula=SUM(A1:A3), we cannot create a temporary variable and aforloop to sum things up. -
Matrix — this is a two-dimensional array. It works just like a 1D array but its initialization is tricker and it’s tougher to understand the iteration code in case it’s used in a formula somewhere. Also, one can take 1D slices out of a 2D matrix, which turns things really nasty.
-
Entity — this construct basically converts a selection of scalar/function definitions into a
class! This lets you take a chunk of a spreadsheet and organize it in a way that its constituent parts now become part of a larger, named entity. Of course, lots of tricks are needed here too, because for example, a formula might use a cell mapped to an entity, so you need to addEntityName&to the parameters and dereference accordingly.
All of these mapping are created by an Excel add-in that lets you just click a button with the cell selected and edit the mapping. It’s really as simple as that, though mappings do have different code generation options for added customization.
Mathematical Issues
When dealing with math, there’s lots of issues caused by mismatches between real-world-math and computational math. Here are a few examples:
-
Functions with known equivalents are easy: they are typically replicated by corresponding STL and Boost calls. In certain cases, though, it makes sense to create temporaries: for example, if a function uses
NORM.DISTin many places, it makes sense to create the Boost normal distribution construct only once and then reuse it. -
Functions with no equivalents and easy implementation are typically done using best-available methods. For example,
ODDandEVENExcel functions require checking the last bit of an integer, so we can simply check e.g.(long int)foo & 1. -
Functions that are weird or just difficult to implement, like
BAHTTEXTfor example, are completely left out: not much we can do about those. -
Lots of matrix functions are meticulously implemented with cell-by-cell addressing. For example, a multiplication of two Excel matrices will be spelled out by creating real-life matrices from the data and then using Boost to perform the multiplication. The code for such a construct can look quite scary when generated, but the important thing is that it’s syntactically correct.
-
Inefficient operations are rewritten when necessary. This includes things like repeated calculations. If you’re OK with having
POWER(x,3)then that’s fine, but you can also getx * x * xorMyIntPow(x, 3)or even_IntPow(x,3)if you feel like it.
Things to Come
As it stands, X2C only supports C++, and it is capable if generating C++ files that you can preview and dump to disk. There’s rudimentary operations like removal and editing of mappings, and some customization functionality already in the box.
The following ideas might be worth pursuing (depending on demand for the program):
-
.NET support is probably highest on the list. MathSharp already supports C# and F# (but not C++ — that might also happen at some point), so having at least F# in X2C would be nice. It would especially be nice to get continuous (re) compilation working for the generated code, so that the user can see immediately his model updating the code and the code being executed.
-
Web service support is also another neat idea. Generating web server stubs isn’t exactly difficult, but would let users put up the calculations as a service and then share it accross the internet.
-
UI generation is another very attractive feature, because once you get the model in a compilable state, the next thing you want to do is to is to start wiring up the UI so you can manipulate the data and observe the results. So why not generate the UI from the outset?
All in all, the development of X2C really depends on two things: my own personal needs as well as how much demand there is for such a tool. Meanwhile, if I got you interested, check out the product page (I promise to update the video as soon as I can) and let me know what you think.