Blackmire Open-Sourced

We’ve made the decision to open-source Blackmire, the tool that converts C# to C++ using Roslyn. As of right now, the source code is available on GitHub: https://github.com/ActiveMesa/Blackmire

Blackmire is a very complicated project and its commercial viability is somewhat problematic given that, unlike, say, MathSharp, Blackmire needs every single construct to be translated correctly. Couple that with various issues around the Object type, as well as the need for ever more emulation of C# features not present in C++, and you’ve got yourself a project bigger than what is commercially viable for a tiny code shop.

Of course, we’ll keep committing to Blackmire and we encourage other people to participate as well: together we can make a tool that is truly usable and works to benefit everyone. Thanks! ■

Property Change Notifications in C++

Having spent a lot of time in .NET land, I find it difficult to adapt to the lack of first-class support for certain commonplace concepts in C++. One problem I had recently is change notifications: the ability for a property to notify whoever is interested that its value has changed. C++ does not support this because

  • There is no concept of a ‘property’ in standard C++ (fields exist, of course).
  • There is no concept of an ‘event’ in C++ either.

Luckily, both of these problems can be successfully solved – either by using a library or – more contentiously – a compiler extension. Let’s take a look.

Properties

A property in a language like C# is simply a wrapper for accessor and mutator (a.k.a. getter and setter) methods. For example, in C# you can write

bool CanVote
{
  get { return age >= 16; }
}

And this could be used as person.CanVote, i.e. as if we were accessing a field-like construct. Under the covers, of course, this turns into a method (that’s what they call functions in C#/Java), but that’s just compiler magic.

In C++, apart from the (obvious) way of making an accessor function, there’s no support for properties. Unless, of course, you are using a compiler from Microsoft or Intel. This extension is called __declspec(property) and what it does is… properties! Yep, properties in C++.

So here’s how it works: you create accessor/mutator functions as always but then create a field-like declaration with __declspec(property) prefix:

class Person
{
public:
  int age;
  int get_age() const { return age; }
  void put_age(int value)
  {
    if (age == value) return;
    age = value;
    // change notification will happen here
  }
  __declspec(property(get = get_age, put = put_age)) int Age;
};

Okay, so once you’ve got everything in place, you can use it as follows:

Person p;
p.Age = 33;
p.Age++;

Under the covers, your assignment/increment/whatever mutating calls will be proxied over to put_age, which does three things:

  1. Checks that the value has in fact been changed, and kicks us out otherwise.
  2. Assigns the value (duh!).
  3. Notifies whoever’s listening that the value has changed.

Steps 1 and 3 are required for sensible change notification; step 3 isn’t shown above because we haven’t implemented notifications yet.

Oh, and just in case you’re wondering about encapsulation, I want to note two things:

  • There is no way to hide the getter/setter functions. If you make them private, your program will not compile, because they are actually used when reading/writing values.
  • You can make the field private. A small consolation prize, I guess (see the sketch below).
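
For illustration, here’s a minimal sketch of the same class with the backing field hidden:

class Person
{
  int age; // private backing field -- the consolation prize
public:
  // the accessors must stay public: property reads/writes compile into calls to them
  int get_age() const { return age; }
  void put_age(int value) { age = value; }
  __declspec(property(get = get_age, put = put_age)) int Age;
};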

Change Notification

In .NET, notifications are done using events, which are first-class implementations of the Observer pattern built right into the language. In C++, events are typically a library-level affair, provided by libraries such as Boost.Signals2 (for more info on Boost, see my course) and built around the signal and slot paradigm, which we are going to leverage here.
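
If you haven’t met Boost.Signals2 before, here is a minimal, self-contained sketch of the paradigm: a signal is the event source, and any callable connected to it becomes a slot that is invoked when the signal fires.

#include <boost/signals2.hpp>
#include <iostream>

int main()
{
  boost::signals2::signal<void(int)> valueChanged; // the 'event'
  valueChanged.connect([](int x)                   // a 'slot' subscribes to it
  {
    std::cout << "value changed to " << x << std::endl;
  });
  valueChanged(42); // firing the signal invokes every connected slot
}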

Let’s define a uniform interface… ugh, I mean class, for sending change notifications (C++ doesn’t have interfaces as such, either):

#include <string>
#include <boost/signals2.hpp>
using boost::signals2::signal;
using std::string;

template <typename T> class INotifyPropertyChanged
{
public:
  signal<void(T&, string)> PropertyChanged;
};

The name here is stolen from the equivalent .NET interface. Now, what this class has is a signal that we fire when a particular property is changed. The first argument passes a reference to the object; the second contains the name of the property that was changed.

Having written this, we can now inherit this type in Person:

class Person : public INotifyPropertyChanged<Person>
{
public:
  int age;
  int get_age() const { return age; }
  void put_age(int value)
  {
    if (age == value) return;
    age = value;
    PropertyChanged(*this, "Age"); // <-- we notify here!
  }
  __declspec(property(get = get_age, put = put_age)) int Age;
};

So, now that we’ve implemented this, any change in the value of age will be sent as a notification to the subscribers. How do we get notifications? Well, this is rather easy:

Person p;
p.PropertyChanged.connect([](Person& p, string name){
  if (name == "Age")
    cout << "Person's age changed to " << p.Age << endl;
});
p.Age = 44;
p.Age++;

And the output is:

Person's age changed to 44
Person's age changed to 45

Conclusion

This is an illustration of some of the hoops you have to jump through to get things to work the way you want. I actually use __declspec(property) in many places in my programs and while some people might see it as non-standard or whatnot, I really do not care. ■

Five Ways of Improving Excel to Code Conversion

Converting Excel to compile-ready C++ or C# code isn’t as easy as you might think: there are plenty of cases where a straightforward translation is either inefficient or would result in incorrect code.

Here is an overview of five different cases where X2C performs adjustments.

1. Sequence Folding

Say you decide to map a few cells containing the values {0, 1, 2, 3, 4} to a vector.

If you were to perform initialization of this vector, it might look as follows:

double Foo[5];
void InitializeFoo()
{
  Foo[0] = 0;
  Foo[1] = 1;
  Foo[2] = 2;
  Foo[3] = 3;
  Foo[4] = 4;
}

The above is correct, but tedious. Wouldn’t it be great if we could detect repetition like this and put it in a loop? Well, X2C does exactly this, so your generated code will look as follows:

double Foo[5];
void InitializeFoo()
{
  for (int i = 0; i < 5; ++i)
    Foo[i] = i;
}

X2C does its best to detect row/column-dependent values and, if it can see that even a part of the elements can be initialized in a for loop, it rewrites the code accordingly.

2. Range Function Substitutions

Let’s say you decided to find the smallest of the elements above.

What sort of code would you expect? Maybe something like

double Min() const
{
  return std::min(Foo[0], std::min(Foo[1], std::min(Foo[2],  // you get the idea
}

Well, X2C could just give you that, but it’s a lot smarter. Depending on the language, X2C knows about ways of aggregating values so, for the above case, the generated code would be as follows:

double Min() const
{
  return *std::min_element(std::begin(Foo), std::end(Foo));
}

3. Complex Aggregate Calculations

Sometimes, though, aggregations get complicated – not just a simple sum, average or calculation of the smallest or largest element, but rather something like a sum of element-wise products over two ranges (Excel’s SUMPRODUCT, say).

In this case, what X2C does is perform a walk over each of the elements and outputs the final result based on element-wise reads, i.e.:

double FooBar() const
{
  return (Foo[0]*Bar[0] + Foo[1]*Bar[1] + Foo[2]*Bar[2] + 
    Foo[3]*Bar[3] + Foo[4]*Bar[4]);
}

In the future, we aim to teach X2C to fold even these types of expressions into loops, as the above could be calculated using a simple for loop and just one temporary variable.
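
For illustration, the folded version of the function above might look like this (a sketch of possible future output, not what X2C generates today):

double FooBar() const
{
  double result = 0.0;
  for (int i = 0; i < 5; ++i) // one loop, one temporary
    result += Foo[i] * Bar[i];
  return result;
}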

4. Synthetic Evaluation for Simple Functions

Some functions which appear deceptively easy in Excel require additional evaluation steps when translated to code. Here’s a simple example: truncating the value of π to two decimal places with TRUNC().

What would you expect X2C to do with the TRUNC() function in the above example? Well, for functions which are difficult to immediately express using existing APIs, X2C simply rewrites them the best way it knows. For the above, you’d get the following code:

double Foo::Pi() const
{
  return M_PI;
}
double Foo::PiTruncated() const
{
  return (boost::math::round(Pi() * 100) / 100);
}

The value 100 in the above comes from the fact that we need to truncate to 2 decimal places (10² = 100).
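
The idea generalizes to any number of decimal places. A hypothetical helper (not actual X2C output) might look as follows; note that I use boost::math::trunc here, which cuts digits off the way Excel’s TRUNC does, rather than rounding:

#include <cmath>
#include <boost/math/special_functions/trunc.hpp>

// hypothetical helper: truncate x to n decimal places
double trunc_to(double x, int n)
{
  const double scale = std::pow(10.0, n); // n = 2 gives 100
  return boost::math::trunc(x * scale) / scale;
}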

5. Compile-Time Evaluation

Sometimes, the parameters to an Excel function actually determine which function to call. In this case, X2C simply checks the parameter at ‘compile-time’ and writes the appropriate code. Here’s an example: a cell containing =NORM.DIST(0.5, 0, 1, TRUE).

In the above, it’s the 4th parameter to the normal distribution calculation that’s the problem. If set to FALSE, this is a calculation of the probability density function φ(x), whereas if it is TRUE, we need to compute the cumulative distribution function Φ(x) instead.

To get around this, X2C simply looks at the parameter value and infers the invocation for that. So for the cell above, you would get the following function:

double Foo() const
{
  return boost::math::cdf<>(
    boost::math::normal_distribution<>(0, 1), 0.5);
}

Conclusion

These are just some of the ways in which X2C improves the conversion process. If you want to see these, and many other transformations in action, download X2C and give it a go! ■

Late-Night Ruminations on C++

Way Back When

I think it was in 199X when I got my first experience of ‘serious’ programming. One day, being fed up with toying about with Delphi (it was relevant at that time, unlike now), I went out and bought one of those book+CD packages. The package included two books on Programming Windows with Visual Studio 97, a reference book (essentially, a very large reprint of a chunk of MSDN), and the CDs containing a trial version (the equivalent of today’s Express) of Visual Studio. I installed it on my 166 MHz, 16 MB RAM machine, and I was hooked.

The fact that I got to actually write MFC apps was amazing. After all, MFC, with all its insane macros and types such as CString that supplanted ordinary STL types (was there an STL back then?), was about as appropriate for learning C++ as losing your virginity to a horse. (Sorry, really crude example, but true.)

At any rate, I seem to have somehow survived MFC (without writing a single meaningful app, of course) and moved on to just writing console apps. At the time I was trying to implement the gaming rules of AD&D (2nd edition, I believe), and I distinctly remember my excitement at learning about Microsoft’s __declspec(property) which, for those of you who do not know, allows one to have C#-style properties. Yes, we really had this capability ages ago (and still do).

However, at some point, C++ gave way first to Java, then to C#. I suppose the real problem was that creating a good GUI app was much easier using .NET. That, and the fact that C# had usable strings, memory management, and a whole host of other things. You didn’t have to deal with arcana such as Boost, or even the STL, which is arcane in its own right, especially if you consider its error messages.

Recently

Fast-forward a few years, and I’ve started using C++ again. Why? Well, I think the reason I got back into it was image processing. In C#, manipulating an image is easy (the API is very intuitive and System.Drawing.Bitmap is nice), but the speed basically kills the effort outright. So what I did was use P/Invoke: Lock() all the bytes in place, send them off to a C++ DLL for processing, and then bring the results back in.
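
The native side of that arrangement is nothing fancy, just a C-style export; here’s a hedged sketch (all names made up for illustration) of the kind of function a [DllImport] declaration on the C# side would bind to:

// hypothetical export, e.g. inverting an 8-bit grayscale image in place
extern "C" __declspec(dllexport)
void InvertPixels(unsigned char* pixels, int width, int height, int stride)
{
  for (int y = 0; y < height; ++y)
    for (int x = 0; x < width; ++x)
      pixels[y * stride + x] = 255 - pixels[y * stride + x];
}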

Performance increased tenfold. And sure, you can fiddle around in C# with unsafe and pointers, but why bother?

Anyway, the real draw of C++ over C#, to me at least, is that C++ is generally superior for parallelization. That’s why, at some point, I completely gave up on using the Microsoft compiler (which, to be fair, has improved in recent years) and focused on using Intel Parallel Studio. The great thing about IPS is that, first, it’s a terrific set of compiler+libraries+tools, and a real joy to use provided you invest some time in learning how it all works; and second, the Intel compiler is available on non-Windows OSs, which helps in terms of uniformity, even if I don’t use other OSs as much.

In terms of parallelization, .NET has TPL whereas C++ has

  • PPL

  • OpenMP

  • Intel Threading Building Blocks (similar to PPL, by the way)

  • MPI

Okay, so maybe it’s not fair to bring all of these into the same category, but they all serve one purpose – speeding up code. And they do it pretty well. I’ve used OpenMP on a project, and it is basically ‘blind parallelization’, kind of like trusting TPL’s Parallel.For() to optimize your loop for multicore.
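
To give a taste of just how ‘blind’ that parallelization is, here’s a minimal OpenMP sketch: one pragma, and the compiler farms the loop iterations out across cores.

#include <omp.h>

void scale(double* data, int n)
{
  // the pragma asks the compiler to split the iterations across threads
  #pragma omp parallel for
  for (int i = 0; i < n; ++i)
    data[i] *= 2.0;
}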

‘Modern’ C++

Today’s C++ is every bit as painful as the C++ of old. You still have to do the same menial things. In terms of what can seriously annoy a C#/Java developer, I can think of the following:

  • Having to manage your own memory. Worse yet, there are lots of ambiguous ways of using all those funny shared_ptr, unique_ptr etc. containers, and it’s nearly impossible to figure them all out even for experienced devs, let alone the uninitiated.

  • The lack of proper string literals, as well as C++’s inability to just bite the bullet and create a compiler flag for how to treat strings, has led us to the insanity of char/wchar_t, various BSTR implementations, and the need to put the letter L in front of string literals.

  • Truly insane compiler messages that Clang is supposed to be able to fix (err… I’ll believe it when I see it).

  • Really messed up mechanisms for declaring/initializing arrays, as well as the equally insane idea that an array is just a pointer to the first element. And don’t get me started on multi-dimensional arrays.

  • The separation between header and implementation files results in huge amounts of wasted effort. Personally, I try to avoid it by keeping everything in a single, huge .cpp file, but that’s not a good idea either.

On the other hand, there are reasons to use C++ nowadays. I’ve mentioned parallelization briefly, but there’s also C++ AMP, which is a way to transparently (ugh, more or less) execute C++ on a GPU. And of course, like it or not, interfacing with CUDA is a lot easier from C++ than from .NET, though products are showing up all over the place that illustrate it’s possible to just transcompile. Still, if you go looking for jobs in the GPGPU space, you’re essentially looking at CUDA C which is, you guessed it, a C variant, so something closely related to C++.

Another advantage to C/C++ is that it’s a lot easier to integrate with .NET. I won’t talk about MC++ here, because it is a really ambiguous product, but rather about P/Invoke – a technology that lets you transparently call C++ DLL functions right from C# (or indeed F#/VB.NET). Of course, you cannot use OOP structures from C#, but chances are you’re just optimizing a particular algorithm and you don’t need to import any hierarchies.

On that subject, there have been a few attempts to bring even OOP libraries to .NET. A good example is QuantLib, a C++ library for quant finance (a topic very close to my heart). This library actually uses Boost for lifetime management, and of course it does use objects. Still, there are mechanisms which let you use it from .NET, though you must realize that you’re essentially getting a second-rate product.

Conclusion

There is no conclusion, since these are just my ruminations on a language that has largely gone out of relevance and now remains the niche of a few specialists. That’s not to say that it’s fundamentally bad, rather that it’s no longer something that warrants serious investment. In fact, I’d argue that knowing CUDA C is a more valuable skill than knowing C++, unless of course you’re job seeking in the quant space, where it’s still considered fresh.

X2C – Convert Excel Spreadsheets into Executable Code

Introduction

About a year ago, while I was putting the polish on MathSharp (a MathML-to-code converter), I had yet another idea. I observed that for computational problems, despite the usefulness of MATLAB, Mathematica and other math applications, a great many people used Excel.

Excel really is convenient, and I model things in it too. But on ‘live’ server systems, rather than running an Excel workbook as a server (yes, it is possible), people typically prefer executable code.

And so with that in mind, I decided to put the two together and create an Excel add-in that would let me turn Excel spreadsheets (with formulae and all) into ready-to-compile C++ code. Thus, the X2C project was born.

High-Level Design

How can Excel and C++ constructs be related? My approach is to define mappings from cells to C++ code constructs. There are 5 different types of mapping:

  • Scalar — a scalar maps a single cell to a variable. This means that a single A1 cell with the value 42 can be mapped to a global field defined as double A1 = 42. Of course, the name is customizable, and I left an option to not set the default value if it’s not needed.

  • Function — this corresponds to a global function. For example, if cell A2 has a formula =A1+23, we generate a global function: double A2() { return A1+23; }

  • Vector — this is a one-dimensional array of values. This means that a selection of A1:A3 can be mapped to a double Stuff[] array with 3 elements in it. If an array is subsequently used in a formula, we use the [] operator to extract the right element. Note that some advanced C++ is used here: for example, if someone has a formula =SUM(A1:A3), we don’t need a temporary variable and a for loop to sum things up (see the sketch after this list).

  • Matrix — this is a two-dimensional array. It works just like a 1D array, but its initialization is trickier and it’s tougher to understand the iteration code in case it’s used in a formula somewhere. Also, one can take 1D slices out of a 2D matrix, which makes things really nasty.

  • Entity — this construct basically converts a selection of scalar/function definitions into a class! This lets you take a chunk of a spreadsheet and organize it in a way that its constituent parts now become part of a larger, named entity. Of course, lots of tricks are needed here too, because for example, a formula might use a cell mapped to an entity, so you need to add EntityName& to the parameters and dereference accordingly.
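
To illustrate the =SUM(A1:A3) case mentioned in the Vector bullet, hypothetical generated code (names invented here) can lean on an STL algorithm instead of a hand-rolled loop:

#include <numeric>
#include <iterator>

double Stuff[3] = { 1, 2, 3 }; // mapped from A1:A3

// hypothetical output for a cell containing =SUM(A1:A3)
double A4()
{
  return std::accumulate(std::begin(Stuff), std::end(Stuff), 0.0);
}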

All of these mappings are created by an Excel add-in that lets you just click a button with the cell selected and edit the mapping. It’s really as simple as that, though mappings do have different code generation options for added customization.

Mathematical Issues

When dealing with math, there are lots of issues caused by mismatches between real-world math and computational math. Here are a few examples:

  • Functions with known equivalents are easy: they are typically replicated by corresponding STL and Boost calls. In certain cases, though, it makes sense to create temporaries: for example, if a function uses NORM.DIST in many places, it makes sense to create the Boost normal distribution construct only once and then reuse it.

  • Functions with no equivalents and easy implementation are typically done using best-available methods. For example, ODD and EVEN Excel functions require checking the last bit of an integer, so we can simply check e.g. (long int)foo & 1.

  • Functions that are weird or just difficult to implement, like BAHTTEXT for example, are completely left out: not much we can do about those.

  • Lots of matrix functions are meticulously implemented with cell-by-cell addressing. For example, a multiplication of two Excel matrices will be spelled out by creating real-life matrices from the data and then using Boost to perform the multiplication. The code for such a construct can look quite scary when generated, but the important thing is that it’s syntactically correct.

  • Inefficient operations are rewritten when necessary. This includes things like repeated calculations. If you’re OK with having POWER(x,3) then that’s fine, but you can also get x * x * x or MyIntPow(x, 3) or even _IntPow(x,3) if you feel like it.

Things to Come

As it stands, X2C only supports C++, and it is capable of generating C++ files that you can preview and dump to disk. There are rudimentary operations like removal and editing of mappings, and some customization functionality already in the box.

The following ideas might be worth pursuing (depending on demand for the program):

  • .NET support is probably highest on the list. MathSharp already supports C# and F# (but not C++ — that might also happen at some point), so having at least F# in X2C would be nice. It would especially be nice to get continuous (re)compilation working for the generated code, so that the user can immediately see the code update and execute as the model changes.

  • Web service support is another neat idea. Generating web service stubs isn’t exactly difficult, but it would let users put up their calculations as a service and share them across the internet.

  • UI generation is another very attractive feature, because once you get the model into a compilable state, the next thing you want to do is start wiring up a UI so you can manipulate the data and observe the results. So why not generate the UI from the outset?

All in all, the development of X2C really depends on two things: my own personal needs, and how much demand there is for such a tool. Meanwhile, if I’ve got you interested, check out the product page (I promise to update the video as soon as I can) and let me know what you think.

Performance of String Histogram Building in C++ AMP

We all know that GPUs are excellent number crunchers. Given a large amount of data, GPUs can process numerics up to 2 orders of magnitude more efficiently than today’s multi-core CPUs. But to date, the use of GPUs has been restricted mainly to exactly this scientific/mathematical domain, without much attention to alternative uses.

In this post, I want to take a look at other ways of exploiting GPUs – specifically, how individual strings as well as string arrays can be processed on the GPU and what kind of performance benefit we can get. For the purposes of my investigation, I will be using an ATI 6800 series GPU with an ordinary Core 2 Quad, and I will use ordinary C++ and C++ AMP as the technologies being compared. I will use basic Latin characters (in a range that can be displayed in a console), but in Unicode (i.e., wchar_t-based) strings as befits a modern framework.

Please note that the examples use minimal optimization of the C++ code, i.e., common STL algorithms and approaches are used and no attempt is made to improve or replace basic C++ constructs. This, I believe, constitutes a fairer test than attempting to fine-tune C++ for peak performance.

Character histogram

We’ll begin with a simple case: given a rather lengthy text, how long would it take to build a histogram of all the characters present in the string? We’ll use a special function to randomly generate strings of different sizes:

// uses printable chars 32-126
void fill_string(wchar_t* chars, int count)
{
  for (int i = 0; i < count; ++i)
  {
    chars[i] = 32 + rand() % 95;
  }
}

We’ll perform the experiment on strings with lengths from 2 to 2²⁴ to ascertain how the algorithm performs under different conditions. First, we begin with a C++ implementation – without any constraints on the architecture, we’ll use a simple map-based approach:

#include <memory>
#include <ppl.h>
#include <concurrent_unordered_map.h>

using namespace concurrency;
using namespace std;

typedef concurrent_unordered_map<wchar_t, int> histogram;

unique_ptr<histogram> cpu_histogram(wchar_t* str, size_t count)
{
  unique_ptr<histogram> result(new histogram);
  parallel_for((size_t)0, count, [&](size_t i)
  {
    // note: the map supports concurrent insertion, but the ++ itself is not atomic
    (*result)[str[i]]++;
  });

  return result;
}

The GPU function is quite a bit more complicated. First of all, there is no real way of treating elements as wchar_t types, since C++ AMP does not recognize elements smaller than a uint32_t. Thus, any view over a wchar_t array has to use the uint32_t data type.
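
To make that concrete, here is a hedged sketch (not the actual implementation, which is discussed next) of viewing 16-bit character data through 32-bit elements:

#include <amp.h>
#include <cstdint>

using namespace concurrency;

void touch_chars(const wchar_t* str, int count)
{
  // two 16-bit wchar_t values are packed into each 32-bit element
  array_view<const uint32_t, 1> chars(
    count * (int)sizeof(wchar_t) / (int)sizeof(uint32_t),
    reinterpret_cast<const uint32_t*>(str));
  parallel_for_each(chars.extent, [=](index<1> idx) restrict(amp)
  {
    uint32_t first = chars[idx] & 0xFFFF;          // low 16 bits
    uint32_t second = (chars[idx] >> 16) & 0xFFFF; // high 16 bits
    // per-character histogram increments would go here
  });
}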

In order to test the histogram, I used an existing implementation written by Daniel Moth; I won’t replicate the code here. All I changed in Daniel’s implementation is the definition of the data type, which I set to wchar_t, plus I added an alternative constructor that takes actual data rather than just the data size.

From then on, I created strings of sizes 2ⁿ and filled them with random printable characters:

int size = sizes[i];
wchar_t* target = new wchar_t[size+1];
ZeroMemory(target, (size+1) * sizeof(wchar_t));
fill_string(target, size);

Then, I simply timed the ‘cost’ of the CPU and GPU calls and averaged the results:

double aggregate = 0.0;
for (int s = 0; s < 20; ++s)
{
  Timer gpuTimer;
  gpuTimer.Start();
  auto gh = gpu_histogram(target, size);
  gpuTimer.Stop();
  aggregate += gpuTimer.Elapsed();
}
wcout << (aggregate / 20.0) << endl;

Measuring Performance

For performance measurement I found yet another useful blog post describing the steps necessary to get the measurement just right. I also used a high-res timer from this article (isn’t the internet wonderful?).

The end result is the following performance measurements from the CPU and GPU. This is from a Release build with all optimizations enabled. Each iteration was repeated 20 times with the results averaged.

Figure 1. Comparison of GPU and CPU performance for building a string histogram. The X axis corresponds to a string of length 2ˣ; the Y axis represents the elapsed time in milliseconds and is binary-log-scaled.

It’s clear that, up to a point, the CPU has the advantage, as the GPU consistently pays a small but annoying start-up cost. As for measuring the benefits, the parallel lines on the right-hand side of the chart suggest a constant 8× performance improvement when using the GPU. The GPU’s benefits can only be appreciated when dealing with strings longer than 2¹⁶ (65,536) characters.

Conclusion

This experiment is just a small performance investigation, probably full of various methodological errors and inaccuracies, but it has certainly been beneficial in terms of figuring out that:

  • The GPU kernel appears to have a fixed start-up cost associated with it. However, the start-up cost seems to be around 2ms, which is fairly insignificant, except maybe for things like high-frequency trading.

  • The GPU appears to hold a constant-factor advantage over the CPU (about 3 binary orders of magnitude), which is surprising, because I would have expected the results to diverge non-linearly.

  • The current approach seems to be entirely unfit for calculating histograms over unlimited character sets. The reason is that, currently, a histogram array matches the size of the full character set, and there are many such arrays. Essentially, making lots of arrays of size 2^(8·sizeof(wchar_t)) is impossible – the GPU just doesn’t have that much memory.

I’ll certainly be playing more with C++ AMP and string processing specifically. Meanwhile, if you’d like to get the source code for this article, you can find it here. Who knows, maybe your performance measurements will be entirely different? Or maybe the algorithm won’t even run on your machine? At any rate, let me know. ■

ReSharper SDK Adventures Part 4 – SSR, Gutter Marks and Suppressions in Agent Mulder

In the previous three parts of the SDK Adventures we looked at some artificially created samples. Time to change all that. In this part, we’re going to take a look at some of the advanced features used by Igal Tabachnik’s (@hmemcpy) excellent Agent Mulder plugin.

Brief Overview of Agent Mulder

The principal idea behind Agent Mulder is to get ReSharper to support various IoC frameworks. To get a better appreciation of its features, take a look at the screencast or, better yet, download the plugin and try it out for yourself. But to sum things up, Agent Mulder augments ReSharper with intrinsic knowledge of the types registered in the IoC container and enables all sorts of wonderful things, such as the ability to navigate to the point where a type is registered.

But seeing how our SDK adventures concern actual implementations of the various features, here’s what we’re going to talk about today:

  • Programmatic use of Structural Search and Replace (SSR)

  • Use of gutter marks

  • Suppression of existing inspections

All of these are fairly advanced topics, but we’ll tread carefully and hopefully things will make sense.

Programmatic SSR

Structural Search and Replace is a ReSharper feature that helps people locate code based on some pattern. For example, you can go off looking for $1.Foo($2) where $1 and $2 have specific types. The Replace part of SSR lets you then replace the found pattern with something else by presenting a context action.

So how does this work in Agent Mulder? Well, AM needs to locate places where a particular component has been registered. To do that, it creates and uses search patterns – definitions of type IStructuralSearchPattern which are object-oriented equivalents of what one would otherwise define in the SSR GUI. For example, here’s the pattern for a Castle Windsor container registration:

private static readonly IStructuralSearchPattern pattern =
  new CSharpStructuralSearchPattern("$container$.Register($arguments$)",
    new ExpressionPlaceholder("container", "Castle.Windsor.IWindsorContainer", false),
    new ArgumentPlaceholder("arguments", -1, -1)); // any number of arguments

The above is a simple search for $container$.Register($arguments$) where the first parameter has a type of IWindsorContainer and the second argument is actually variadic (i.e., there might be several arguments).

Now that we’ve got the pattern to search with, how do we perform the search? Well, to match an expression to a pattern we need a matcher, embodied by the IStructuralMatcher interface. This matcher is created from the defined pattern, for example:

var matcher = pattern.CreateMatcher();

With the matcher in hand, any particular ITreeNode can then be matched against the pattern using the QuickMatch() method. This method takes the tree node as a parameter and returns a bool indicating whether there is, in fact, a match. Keep in mind that a matcher works on a particular invocation, not on the entire tree. This is why Agent Mulder also tries to get the tree node as an IInvocationExpression and, if successful, performs a match against all its subexpressions.

Gutter Marks

Gutter marks are little glyphs on the left-hand side of the editing pane. They typically show various useful bits of information, such as indicating that a type inherits from another type. Gutter marks are also clickable. In the case of Agent Mulder, a gutter mark is used to indicate that a type is registered in a container.

So what’s a gutter mark in terms of the API? Put simply, it is a class that inherits from IconGutterMark and is registered with an assembly-level RegisterHighlighter attribute. There are a few things the gutter mark class has to do. First of all, it has to call the base class constructor and provide an image that it is going to display:

public ContainerGutterMark()
  : base(ImageLoader.GetImage("Hat", Assembly.GetExecutingAssembly()))
{
}

The above loads the Hat.png image file from the current assembly. In order for the above invocation to work, an additional assembly-level attribute is required in AssemblyInfo.cs:

[assembly: ImagesBase("AgentMulder.ReSharper.Plugin.Resources")]

Now, once the image is loaded, there is also the IsClickable property of the gutter mark class to implement. Its function is predictable: if set to true, the OnClick() function must be defined.

public override bool IsClickable
{
  get { return true; }
}

Now, before we get to clickability itself, we need to discuss the way gutter marks are actually added to a file because, after all, their invocation (the OnClick() override) depends entirely on their position relative to code.

The answer to this question is rather simple: gutter mark positions are actually controlled by highlightings – yes, the same highlightings that are used to indicate warnings, errors, and so on. This exact mechanism allows the clickable gutter mark to actually get information about its location. Here’s how it works: the gutter mark is registered at the assembly level with an attribute similar to the following:

[assembly: RegisterHighlighter("Container Registration", 
  "{B57372C1-16C3-4CB5-8B68-A0FBEFB487AD}", 
  EffectType = EffectType.GUTTER_MARK, GutterMarkType = typeof(ContainerGutterMark), 
  Layer = 2001)]

The critical parameter in the above registration is the Id, and in this case it has a value of "Container Registration". Now, we can go ahead and create a simple highlighting with a severity of INFO. The critical piece of the puzzle is that the AttributeId parameter of the highlighting has to match the Id from above:

[StaticSeverityHighlighting(Severity.INFO, "GutterMarks", 
  OverlapResolve = OverlapResolveKind.NONE, AttributeId = "Container Registration", 
  ShowToolTipInStatusBar = false)]
public sealed class RegisteredByContainerHighlighting : IClickableGutterHighlighting
{
  // implementation here
}

But this doesn’t answer the most difficult of questions: how does the gutter mark know where to go? It knows because the gutter mark’s OnClick() override gets an IHighlighter as a parameter. This means that if we get a solution manager and this highlighter, we can get at the actual highlighting and invoke its own OnClick() method:

public override void OnClick(IHighlighter highlighter)
{
  ISolution currentSolution = Shell.Instance.GetComponent<ISolutionManager>().CurrentSolution;
  if (currentSolution == null)
  {
    return;
  }
  var clickable = Daemon.GetInstance(currentSolution).GetHighlighting(highlighter)
                  as IClickableGutterHighlighting;
  if (clickable != null)
  {
    clickable.OnClick();
  }
}

Put all of the above together and you get a clickable gutter mark.

Suppression of Inspections

If you turn on Solution-Wide Analysis (SWA), you’ll see container-registered classes flagged as unused, which isn’t good because they are used, albeit via the container. Sure, you could just mark these as [UsedImplicitly], but this is extra work for the developer, so Agent Mulder’s idea is to handle this automatically.

How does it work? Well, one of the daemon processes in ReSharper’s ecosystem is the CollectUsagesStageProcess. This process handles usage information, and conveniently enough, this component has a SetElementState() method that can set the ‘usage state’ of a particular element (in the case of Agent Mulder, an IConstructor):

private void SetConstructorsState(ITypeElement typeElement, UsageState state)
{
  foreach (IConstructor constructor in typeElement.Constructors)
  {
    collectUsagesStageProcess.SetElementState(constructor, state);
  }
}

It really is that simple. Agent Mulder actually sets the usages in its daemon stage process as it goes through the files. This means that, for classes that are registered in the container, constructors are marked as used, which prevents SWA from claiming that the class is unused.

Conclusion

I’ve gone over most of the core aspects of Agent Mulder, leaving out perhaps just one – the search & navigation aspect. This really warrants its own entry however, so stay tuned for more R# SDK adventures. Oh, and check out Agent Mulder if you haven’t done so already!

ReSharper SDK Adventures Part 3 – CSV Paste Action

The two previous posts have been about various ReSharper features such as analyzers and context actions. But what if you’ve got a plugin that just wants to do something without showing fancy UI or interacting with the R# ecosystem in any major way? Well, in this case, you can create something known as an action.

What are actions?

Actions are, effectively, commands that you can invoke and your plugin can respond to. To invoke an action, you can use any number of menus as well as keyboard shortcuts. Actions are actually published as commands in Visual Studio, so binding them to a particular key is no problem.

For example, let’s say we’re working with Excel and we want to cut-and-paste data from Excel into a C# file, naturally turning the data into some easily digestible form, like e.g. an array. What we can do is define an action to handle it. An action has to implement the IActionHandler interface and be decorated with the ActionHandler attribute, whose parameter takes the action’s identifier.

[ActionHandler("PasteCSV")]
public class PasteCSV : IActionHandler
{
  // todo
}

Now, the IActionHandler interface has two methods that you need to implement.

First, there is the Update() method, which determines whether our action gets executed at all. This is a great place to check, in our case, if we’ve got the right data on the clipboard. Interestingly, when you copy a chunk of an Excel worksheet, data gets placed on the clipboard in multiple formats simultaneously, including a plain-text format (separated by tabs) and a CSV format (separated by commas). We’ll go for the CSV format here. Also, we need to check that there is an editor to work with, because if there isn’t, pasting is meaningless. Here’s the implementation we end up with:

public bool Update(IDataContext context, ActionPresentation presentation, DelegateUpdate nextUpdate)
{
  return Clipboard.ContainsText(TextDataFormat.CommaSeparatedValue) &&
    context.GetData(JetBrains.TextControl.DataContext.DataConstants.TEXT_CONTROL) != null;
}

The second method is the Execute() method where, as you may have guessed, execution of the action actually happens. This is where the bulk of our algorithm will reside.

The Algorithm

Now, we know text is in there, so let’s think a little about the algorithm that we might want to implement when pasting things. Data can be coming in as a single row, column, or a table of multiple rows and columns. It can also be homogeneous (e.g., all numbers) or heterogeneous, mixing text and numbers.

Without overcomplicating things, I’d argue that:

  • A single row or column of numeric data can be declared as an array of fixed size.

  • A table of purely numeric data can be treated as a rectangular array.

  • Any ‘mixed’ data needs to be declared as an array of Tuples. The supported data types are double, DateTime or string, depending on what parses.

We therefore start by getting the clipboard data and breaking it into lines:

var csv = Clipboard.GetText(TextDataFormat.CommaSeparatedValue);
var lines = csv.Split(new[]{Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);

We can now write a simple method checking that all data is numeric:

public bool IsAllDataNumeric(string[] lines)
{
  double dummy;
  return lines.SelectMany(line => line.Split(','))
    .All(element => double.TryParse(element, out dummy));
}

Let’s consider the numeric case first. If either the number of rows or the number of columns is equal to one, we get a one-dimensional array; otherwise, we get a rectangular array:

if (rows == 1 || cols == 1)
{
  sb.AppendFormat("double[] foo = {{ {0} }};",
                  rows == 1 ? lines[0] : lines.Join(","));
}
else
{
  sb.Append("double[,] foo = {").AppendLine();
  foreach (var line in lines)
    sb.AppendFormat("{{ {0} }},", line).AppendLine();
  sb.Append("};");
}

Well, that was the easy part; now the tough part. How can we guess the type of a non-numeric data item? Well, we can try parsing it as each of the types we care about. And once we’ve got the type, we can format things accordingly:

public string FormatForType(string input)
{
  DateTime dt;
  double d;
  if (DateTime.TryParse(input, out dt))
    return dt.ToAssemblyCode();
  else if (double.TryParse(input, out d))
    return input;
  else return input.Quoted();
}

In the above, the ToAssemblyCode() extension method creates a new DateTime(...) declaration corresponding to the actual DateTime object. Quoted() simply puts double quotes around a string.

Now, if we are making an array, we can still keep implicit typing just so long as the data is roughly the same. This can cause a few hiccups: for example, is 123 a double? According to type conversion rules, it may as well be. But according to type inference rules, it’s not, and Tuple.Create(123) is actually a Tuple<int>, even if you’ve got a large array where some elements are of type double. You need to be explicit about this.

Anyway, the implementation of the Tuple-based algorithm for both vectors and matrices is as follows:

if (rows == 1 || cols == 1)
{
  var data = rows == 1 ? lines[0].Split(',') : lines.ToArray();
  sb.AppendFormat("var foo = Tuple.Create({0});", data.Select(FormatForType).Join(","));
} else
{
  sb.Append("var foo = new[] {").AppendLine();
  foreach (var line in lines)
    sb.Append("Tuple.Create(").Append(line.Split(',').Select(FormatForType).Join(",")).Append("),");
  sb.Append("};");
}

And finally, we can insert the generated text into the current document at the caret position:

var textControl = context.GetData(JetBrains.TextControl.DataContext.DataConstants.TEXT_CONTROL);
var doc = textControl.Document;
var pos = textControl.Caret.Position;
doc.InsertText(pos.Value.ToDocOffset(), sb.ToString());

And that’s all there really is to it.

Using Actions

There are two fairly obvious ways to use actions.

The first is to simply bind it to a shortcut. To do this, you go to Tools | Options and choose the Environment → Keyboard section.

The other option is to have the action display its own menu item, either in the top menu bar or in any number of context menus for, e.g., the solution, the project, etc. In order to implement this, you need to do three things:

  • Create in your project a file called Actions.xml and set its Build Action to “Embedded Resource”

  • Edit the file, specifying the action and the menu you want it to appear under. For example, to have our action in the top-level menu under ReSharper | Foo, you would specify the following in the XML file:

    <actions>
      <insert group-id="ReSharper" position="last">
        <action-group id="Foo" text="Foo" shared="true">
          <action id="PasteCSV" text="Paste CSV"/>
        </action-group>
      </insert>
    </actions>
    

    A more comprehensive example of places where the menu item can be added is available in the SDK.

  • Finally, you need to specify the location of the Actions.xml file in AssemblyInfo.cs, i.e., add a line similar to the following:

    [assembly: ActionsXml("Bar.Baz.Actions.xml")]
    

    In the above, Bar.Baz refers to the name of the assembly you’re working with. (We are referencing an embedded resource, after all.)

Conclusion

Once again, I’ve presented an example that would require a lot more rigor if it were to be deemed production-ready code. Actions are simple, but there are lots of useful things you can do with them. And in case you’re interested, the source code for this action can be found here.

ReSharper SDK Adventures Part 2 – Math.Pow Improvements

In the first part of our SDK experiments, we implemented a way of identifying Math.Pow() calls with integer-based powers and wrote a quick-fix to correct the situation. Let us now try to improve the robustness of our code as well as increase its functionality.

Checking that caller is indeed System.Math

So far, we only checked that the function being called is Pow by getting the name of the reference:

bool functionIsCalledPow = false;
var e = element.InvokedExpression as IReferenceExpression;
if (e != null)
{
  if (e.Reference.GetName().Equals("Pow"))
    functionIsCalledPow = true;
}

This means that we can easily get a false positive with something like this:

double y = Foo.Pow(x, 2.0);

Now, to handle this situation properly, we’re going to scrap our previous check of the name and replace it with the following:

bool isOnMathPow = false;
var r = element.InvocationExpressionReference.Resolve();
var m = r.DeclaredElement as IMethod;
if (m != null)
{
  var parent = m.GetContainingType();
  if (parent != null)
  {
    isOnMathPow = parent.GetClrName().FullName.Equals("System.Math")
                  && m.ShortName.Equals("Pow");
  }
}

There’s quite a lot that’s happening in the above, so let’s go through it step-by-step:

  • The first thing you’ll notice is that we now use the InvocationExpressionReference and Resolve() it. This typically yields us a ‘resolve result’, but can also fail in case the method doesn’t resolve to anything.

  • To figure out if the resolution happened correctly, we simply take the result’s DeclaredElement and cast it to an IMethod, since Math.Pow() is a static method call.

  • If the result is OK, we also get the type that contains the method. If this is the right call, this should point us to the System.Math class.

  • To check that we do in fact have System.Math, we get the parent’s CLR name, and from that its FullName, which has the full namespace prefix.

  • To check we have the right method, we simply check its name as before.

The above manipulations, though complex, rid us of false positives like that Foo.Pow() call.

Custom function handling

Changing Math.Pow(x, 2.0) to x*x makes sense. So does changing Math.Pow(y, 3) to y*y*y. Beyond this threshold, though, things get a bit ridiculous: you really don’t want z*z*z*z*z*z in your code, particularly since there’s no way to tell at a glance that this is z⁶. Wouldn’t it be nice if the user could specify their own custom function that we ought to use for powers, say, greater than 3?

Let’s do this. First of all, we’ll define a settings class that will house the user’s preferences, specifically whether they want to have a custom power function, and what its name is:

[SettingsKey(typeof(Missing), "General Settings")]
public class GeneralSettings
{
  [SettingsEntry(false, "Use Custom Power Function")]
  public bool UseCustomPowerFunction { get; set; }
 
  [SettingsEntry("", "Custom Power Function Name")]
  public string CustomPowerFunctionName { get; set; }
}

We can now create a settings page (a user control that implements IOptionsPage) and bind the properties as follows:

settings.SetBinding(this.lifetime, (GeneralSettings gs) => gs.UseCustomPowerFunction,
  WinFormsProperty.Create(this.lifetime, chUseCustomFunction, x => x.Checked, true));
settings.SetBinding(this.lifetime, (GeneralSettings gs) => gs.CustomPowerFunctionName,
  WinFormsProperty.Create(this.lifetime, tbInliningFunctionName, x => x.Text, true));

Now, we need to change our inlining fix. This is tricky. First of all, let’s define a method that would actually get the two values given a context function:

public Pair<bool, string> GetCustomFunctionSettings(Func<Lifetime, DataContexts, IDataContext> ctx)
{
  var ss = Shell.Instance.GetComponent<ISettingsStore>();
  var bs = ss.BindToContextTransient(ContextRange.Smart(ctx));
  var s = bs.GetKey<GeneralSettings>(SettingsOptimization.DoMeSlowly);
  return new Pair<bool, string>(s.UseCustomPowerFunction, s.CustomPowerFunctionName);
}

The above simply gets a settings store, binds it to a transient context that’s based on the function we provide (more on that in a sec), reads the settings and returns them in a Pair<>.

The function above may be a bit tricky, but it’s required to create a context. Typically, though, you can get the function directly from whatever code element you’re operating on. In our case, the highlighting passed into a quick-fix has an Expression, so we can simply use that:

public IntPowerInliningFix(IntPowerHighlighting highlighting)
{
  this.highlighting = highlighting;
  customNameSettings = GetCustomFunctionSettings(highlighting.Expression.ToDataContext());
}

Now that we’ve got the settings, all that remains is to use them! The approach I’m going to take here is to use the custom function, if available, for powers greater than 3:

protected override Action<ITextControl> ExecutePsiTransaction(ISolution solution, IProgressIndicator progress)
{
  var expr = highlighting.Expression;
  var arg = expr.Arguments[0];
  var factory = CSharpElementFactory.GetInstance(expr.GetPsiModule());
  ICSharpExpression replacement;
  if (customNameSettings.First && highlighting.Power > 3)
  {
    var template = "$0($1, " + highlighting.Power + ")";
    replacement = factory.CreateExpression(template, 
      customNameSettings.Second, expr.Arguments[0]);
  }
  else
  {
    replacement = factory.CreateExpression(
      Enumerable.Range(0, highlighting.Power).Select(i => "$0").Join("*"), arg.Value);
  }
  ModificationUtil.ReplaceChild(expr, replacement);
      
  return null;
}

Now, a user can specify a replacement function Maths.IntPow and have their Math.Pow(x, 4.0) call refactored to Maths.IntPow(x, 4). Note that it’s 4, not 4.0 — we assume that the target function takes an int as the second parameter. Note also that in order to get 4 to appear, we cannot use the $ notation – instead, we prepare the template by concatenating strings, putting the integer there manually.

Code Cleanup

Changing the Math.Pow() invocations one at a time is too slow if you’ve got hundreds of such instances. The mechanism to perform changes en masse in ReSharper is called Code Cleanup and that’s the mechanism we’re going to use to perform changes on all int-bearing instances of Math.Pow() that are found, be it in a file, project or whole solution.

So let’s start with the basics: what we need is a code cleanup module, a class that implements the ICodeCleanupModule interface and is decorated with [CodeCleanupModule]. Unfortunately, ICodeCleanupModule has six methods that we must implement. Some of these are similar to IQuickFix methods but actually serve different purposes. There’s also the thorny issue of descriptors, so let’s start with those.

A descriptor is basically an option of code cleanup. For example, the option of whether to change declarations to var is encapsulated within a descriptor. We need at least one descriptor to define whether we want to do the Math.Pow() replacement at all, so here goes:

[DefaultValue(false)]
[DisplayName("Replace Math.Pow() integer calls")]
[Category(CSharpCategory)]
private class Descriptor : CodeCleanupBoolOptionDescriptor
{
  public Descriptor() : base("ReplaceMathPowIntegerCalls") {}
}

All this does is provide an option to turn the feature on or off in various code cleanup profiles. Now, this feature can be instantiated and returned from the code cleanup module:

private static readonly Descriptor descriptor = new Descriptor();
public ICollection<CodeCleanupOptionDescriptor> Descriptors
{
  get { return new[] {descriptor}; }
}

Now, moving on, let’s implement the IsAvailable() method. This method takes an IPsiSourceFile and we’re supposed to determine whether it can be used. Well, our only restriction right now is that this has to be a C# file, so…

public bool IsAvailable(IPsiSourceFile sourceFile)
{
  return sourceFile.GetPsiFile<CSharpLanguage>() != null;
}

One small thing to note is that the above causes the action to also fire in injected PSI, i.e., in C# that’s part of MVC or Razor views. If you want to avoid this, use GetNonInjectedPsiFile() instead.

Next, let’s go after SetDefaultSetting(), which determines the default value of this option depending on the code cleanup profile that’s being used. Let’s assume that, by default,

  • Our feature is on in the “Full Cleanup” profile.

  • Our feature is off in the “Reformat” profile.

Thus, the following implementation of SetDefaultSetting() communicates our preferences:

public void SetDefaultSetting(CodeCleanupProfile profile, CodeCleanup.DefaultProfileType profileType)
{
  switch (profileType)
  {
    case CodeCleanup.DefaultProfileType.FULL:
      profile.SetSetting(descriptor, true);
      break;
    default:
      profile.SetSetting(descriptor, false);
      break;
  }
}

Now, there’s also the IsAvailableOnSelection property, which is pretty self-descriptive. You can return true or false here based on applicability; it doesn’t matter much for our experiments.

Finally, we come to the meat of the problem: the Process() method where changes actually happen. But, annoyingly enough, this method needs to reuse our analyzer and quick-fix code. We don’t really want to run it all again, do we? As a result, we perform the following refactorings:

  • In IntPowerProblemAnalyzer, we isolate the check on the right method into a static method called InvocationExpressionIsMathPowCall. This function now returns two values – a bool indicating whether this is the right function, and an int indicating the power that is being used.

  • In IntPowerInliningFix, we ensure the function to acquire settings is static; we then isolate the code that performs the change into a separate (again, static) method.

We can now attempt to perform the code cleanup itself. This is no mean feat, considering that previously, in our quick-fix, a lot of the plumbing was handled by the BulbItemImpl class that we inherited from. Let’s do a few checks first: that we’ve got the right file, and that we are allowed to do the change:

var file = sourceFile.GetPsiFile<CSharpLanguage>();
if (file == null) 
  return;
 
if (!profile.GetSetting(descriptor)) 
  return;

If these checks pass, there’s some legitimacy to doing the change. We may as well get the user settings for the quick-fix right now, because fetching them every time we meet an invocation expression is a bad idea:

var settings = IntPowerInliningFix.GetCustomFunctionSettings(sourceFile.ToDataContext());

Now, regrettably, we must manually set up a transaction using the PsiManager. We’re also going to be using shell locks, specifically a write lock that is required to perform changes in a file. The IShellLocks variable can be injected into the constructor of our module:

file.GetPsiServices().PsiManager.DoTransaction(() =>
{
  using (shellLocks.UsingWriteLock())
  {
    // scary stuff here
  }
}, "Code cleanup");

So, what exactly is happening inside the above? Well, we have to go recursively through all the IInvocationExpressions in the file and, for each one that happens to be a correct Math.Pow() call, perform the replacement. This is where our refactorings come into play, because without them we’d be duplicating code or creating meaningless (and quite possibly broken) instances of our analyzer or quick-fix.

Here’s what the ‘scary stuff’ looks like:

var itemsToChange = new List<Pair<IInvocationExpression,int>>();
file.ProcessChildren<IInvocationExpression>(e =>
{
  int power;
  if (IntPowerProblemAnalyzer.InvocationExpressionIsMathPowCall(e, out power))
    itemsToChange.Add(new Pair<IInvocationExpression, int>(e, power));
});
foreach (var e in itemsToChange)
  IntPowerInliningFix.PerformChange(e.First, settings, e.Second);

Notice how we have to cache the items before they are processed: modifying these items as they are being iterated is not safe, just as it’s not safe to modify the loop iterator within the loop. Thus, we use a List<> to cache the items and their respective powers, and then process them wholesale once the iteration has completed.

Conclusion

In this post I’ve demonstrated how to dig into the invocation’s type name (no easy feat), how to define settings and use them. I’ve also demonstrated how to write a code cleanup module to perform large-scale changes on the code base. You can find the source code here.

ReSharper SDK Adventures Part 1 – Math.Pow Inlining

Seeing how the ReSharper SDK has been out for quite a while, I’d like to start a series of posts demonstrating how it can be used. And since this is the first post in a series, I’m going to explain a little about some of the practices related to plugin development, and illustrate a few concepts that are central to R# as a product.

Problem

Why does the plugin ecosystem exist in the first place? Why make the SDK? The motivation is, of course, the fact that individual features that you might need for your particular case aren’t available out of the box. For example, you are aware of a particular case where code is bad and you want to somehow correct this. (SSR helps in simple cases, but the one we’ll consider is a bit more complex.)

For example, the following is, arguably, bad code:

var y = Math.Pow(x, 2.0);

The performance overhead of calculating x² this way is massive. The reason is that Math.Pow() is tuned towards floating-point powers, which means that, up to a point, it is a lot more efficient to do one of the following:

  • Write the expression inline, i.e., Math.Pow(x, 2.0) → x*x.

  • Substitute the expression with your own implementation where the power is an integer, i.e., Math.Pow(x, 2.0) → Maths.Pow(x, 2), where Maths is your own utility class and Pow is a function that has an integer overload (sketched right after this list).
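
Just to illustrate the second option, here is a sketch of what such a utility class might contain. The name Maths and the implementation are mine, not anything prescribed – plain repeated multiplication, which handily beats Math.Pow()'s general floating-point machinery for small integer powers:

public static class Maths
{
  // Integer-power overload: repeated multiplication avoids the cost
  // of Math.Pow()'s general floating-point implementation.
  public static double Pow(double x, int n)
  {
    if (n < 0) return 1.0 / Pow(x, -n);
    double result = 1.0;
    for (int i = 0; i < n; ++i)
      result *= x;
    return result;
  }
}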

Now, the above is a problem, and I’d like this problem to be taken care of by using ReSharper’s extensibility mechanisms.

Identifying the Structure

If we are to somehow fix the above, we first need to be able to identify the ‘offending’ structure. We are going to assume the simplest possible case, Math.Pow(x, y) where x is an identifier and y is a numeric value that happens to be an integer.

We should immediately recognize the constraints we are placing on this model. For example, we assume that x is an identifier and not, say, a numeric value. In fact, if it were, we could precompute the whole thing. But what if it were a constant, something like Math.Pow(Math.PI, 2)? Clearly, in this particular case, we wouldn't care whether the power was an integer or not, because in any case, the replacement would be a computed constant value.

So, let’s look at a typical call, Math.Pow(x, 2.0). Here’s how to get info about it:

  1. First, fire up Visual Studio with the /ReSharper.Internal switch to get all the internal debug menus.

  2. Make a bare-bones file that contains the Math.Pow() statement in it.

  3. In the top-level menu, choose ReSharper | Internal | Go to PSI Viewer.

What you should end up with is something like this:

The PSI viewer is a really neat tool for figuring out what the code you’re after actually is. For instance, the above basically shows us that Math.Pow(x, 2.0) is

  • An IInvocationExpression containing

    • A reference expression referring to Math.Pow

    • A list of arguments containing

      • A reference expression with an identifier x

      • A literal expression with a floating-point value of 2.0

And that’s just about all we need to know – for now. We can now create a simple problem analyzer to look for this particular scenario.

Problem Analyzer

A problem analyzer is a piece of code that ReSharper can use to find out if there's a problem somewhere. Since we've agreed to consider integer-based Math.Pow() expressions a performance problem, we are going to create an element problem analyzer that will help us identify them.

First of all, let us define a class called IntPowerProblemAnalyzer. Since this class looks explicitly for IInvocationExpressions, it inherits from ElementProblemAnalyzer<IInvocationExpression>. There's also some metadata at the top that we'll discuss in a moment.

[ElementProblemAnalyzer(new[]{typeof(IInvocationExpression)}, 
  HighlightingTypes=new[]{typeof(IntPowerHighlighting)})]
public class IntPowerProblemAnalyzer : ElementProblemAnalyzer<IInvocationExpression>
{
  protected override void Run(IInvocationExpression element, 
    ElementProblemAnalyzerData data, IHighlightingConsumer consumer)
  {
    // todo :)
  }
}

Okay, so the class we have right now has to implement a single method called Run(). This method is important because this is where we teach the plugin to identify and highlight code that conforms to our pattern. But how do we actually identify the pattern? Using the PSI viewer output above, we can deduce that checking for a Math.Pow() call consists of two parts: checking that we are calling a Pow() function, and checking that the arguments are correct. The first condition is checked as follows:

bool functionIsCalledPow = false;
var e = element.InvokedExpression as IReferenceExpression;
if (e != null)
{
  if (e.Reference.GetName().Equals("Pow"))
    functionIsCalledPow = true;
}

Note that we’re not testing it was Math that this was called on – this is more complicated and a bit out-of-scope right now. The second part of the check is a little more convoluted:

bool firstArgIsIdentifier = false;
bool secondArgIsInteger = false;
if (element.Arguments.Count == 2)
{
  firstArgIsIdentifier = element.Arguments[0].Value is IReferenceExpression;
  var secondArg = element.Arguments[1].Value as ICSharpLiteralExpression;
  if (secondArg != null && (secondArg.IsConstantValue()))
  {
    double value = -1.0;
    var cv = secondArg.ConstantValue;
    if (cv.IsDouble())
      value = (double)cv.Value;
    else if (cv.IsInteger())
      value = (int) cv.Value;
    secondArgIsInteger = (value > 0.0) && value == Math.Floor(value);
  }
}

The above does a very basic, almost simplistic check on the structure of the code in question. Note that I’ve greatly simplified the identification of the reference, but it will do for now. The next thing that we have to discuss is how problematic code is highlighted.
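
One step the snippets above do not show is the tail end of Run(): once all three flags check out, we hand a highlighting over to the consumer. Assuming the integer power was captured into a variable while the second argument was being validated, and modulo the exact AddHighlighting() overload (which varies between SDK versions), this looks roughly like the following:

if (functionIsCalledPow && firstArgIsIdentifier && secondArgIsInteger)
{
  // 'power' is assumed to have been captured as an int
  // while the second argument was being validated
  consumer.AddHighlighting(new IntPowerHighlighting(element, power),
    element.GetDocumentRange(), element.GetContainingFile());
}

Incidentally, in the full source this whole check ends up factored into a static helper, InvocationExpressionIsMathPowCall(), which is what lets other components (such as the code cleanup module) reuse it.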

Highlighting the Problem

We’ve got some code for finding the problem, but how to show it to the user? In the snippet above, we used the highlighting consumer and passed it an instance of a highlighting. So what is it?

A highlighting is simply a class that implements the IHighlighting interface and is decorated with the appropriate attributes. All it does is define how a particular problem is highlighted by ReSharper. This highlighting is passed to the consumer and is also featured in the HighlightingTypes parameter of the problem analyzer's attribute. We are currently using the following definition of IntPowerHighlighting (note that the expression is exposed as a public property, since our quick-fix will need it later):

[StaticSeverityHighlighting(Severity.WARNING, CSharpLanguage.Name)]
public class IntPowerHighlighting : IHighlighting
{
  // exposed publicly because the quick-fix needs access to the expression
  public IInvocationExpression Expression { get; private set; }
  public int Power { get; private set; }
  public IntPowerHighlighting(IInvocationExpression expression, int power)
  {
    Expression = expression;
    Power = power;
  }
  public bool IsValid()
  {
    return Expression != null && Expression.IsValid();
  }
  public string ToolTip { get { return "Inefficient use of integer-based power"; } }
  public string ErrorStripeToolTip { get { return ToolTip; } }
  public int NavigationOffsetPatch { get { return 0; } }
}

Together with the element problem analyzer, we can fire up our plugin and get a result similar to the following:

And that’s what we wanted in the first place, but just seeing the problem isn’t enough. How about letting the end user correct the problem?

Quick-Fix

A quick-fix is a fix associated with a highlighting. In our case, we want to let the user replace Math.Pow(x,2) with x*x, with the number of x's corresponding to the power. Note that inlining is entirely unreasonable beyond a certain power, typically around 2–3, so our fix will only be offered for these values.

In order to implement an inlining fix, we need to create an additional class. This class inherits from BulbItemImpl, implements the IQuickFix interface, and is decorated with a [QuickFix] attribute as follows:

[QuickFix]
public class IntPowerInliningFix : BulbItemImpl, IQuickFix
{
  private readonly IntPowerHighlighting highlighting;
  public IntPowerInliningFix(IntPowerHighlighting highlighting)
  {
    this.highlighting = highlighting;
  }
  protected override Action<ITextControl> ExecutePsiTransaction(ISolution solution, IProgressIndicator progress)
  {
    // todo – we'll build this method up step by step below
  }
  public override string Text
  {
    get { return "Inline integer power"; }
  }
  public bool IsAvailable(IUserDataHolder cache)
  {
    int power = highlighting.Power;
    return power == 2 || power == 3;
  }
}

Note how the fix is only available for powers 2 and 3. The ExecutePsiTransaction() method is where all interesting things occur. Specifically, this is where we actually replace the Math.Pow() call with an inlined multiplication. Since this is somewhat more complicated, let’s go through this step-by-step. First of all, we need to get our argument – the one we’re going to multiply a few times:

var arg = highlighting.Expression.Arguments[0];

Now we need to create an element factory to construct the replacement expression. We use the original expression to get at the PSI module:

var factory = CSharpElementFactory.GetInstance(highlighting.Expression.GetPsiModule());

Now, through a bit of LINQ magic, we create an expression that represents the multiplication. Essentially, we create lots of $0 placeholders separated by multiplication signs (for a power of 3, the format string is "$0*$0*$0") and let CSharpElementFactory substitute arg.Value for each of them:

var replacement = factory.CreateExpression(
  Enumerable.Range(0, highlighting.Power).Select(i => "$0").Join("*"), arg.Value);

Finally, we replace the old expression with the new one. And since ExecutePsiTransaction() must return an Action<ITextControl>, but we have no need to reposition the caret after the change, we simply return null:

ModificationUtil.ReplaceChild(highlighting.Expression, replacement);
return null;

And here is what it looks like:

Conclusion

This demonstration is, admittedly, very simple. I’ve taken a number of methodological shortcuts that would never be acceptable in production code, namely:

  • Only checked that the function is called Pow() without verifying that it’s part of System.Math

  • Only did a basic check against the first argument

  • Assumed that the power is either a double or an int, whereas in reality it could be, e.g., a short

  • Didn’t write any tests

In the next part of the series we’ll continue with this theme and take a look at how the plugin can be enhanced and its operations can be more rigorously checked. Meanwhile, check out the source code. For more information on the ReSharper API, check out the ReSharper Plugin Development Guide. Happy coding!