At ABB Corporate Research, we use SrcML.NET to perform lightweight code analysis for a variety of tasks ranging from experimentation to powering tools. It hasn’t seen too many updates recently. In this post, I’m going to lay out what we’d like to do with SrcML.NET. Each task will get a github issue. The tasks will be broken into different categories: organization, code modernization, and new features.

Organization

The SrcML.NET repository is currently organized as a single solution with a number of different sub-projects. The projects range from core libraries (ABB.SrcML and ABB.SrcML.Data), to tools (Src2SrcMLPreview), to visual studio integration (the program transformation VS addin and the SrcML Service VSIX package). The problem with this layout is that it makes it very difficult to on-board new project members. In particular, including the Visual Studio libraries means that we need to have VS 2010, 2012, & 2013 installed along with their respective VS SDKs. This is incredibly frustrating for someone who just wants to work on one of the core libraries or standalone tools. The solution is to split the monolithic solution into many different solutions. There will be a few different repositories:

  • The main ABB.SrcML repository will have the core library (ABB.SrcML) and associated test libraries. It will also include ABB.SrcML.Data and the command line runner for generating srcML archives. This limited set of core projects means that we can start thinking about making SrcML.NET platform independent (Linux/Mac support via Mono).
  • The Src2SrcMLPreview tool is a small GUI tool that allows users to visualize how srcML for a source code snippet is structured. While I would like to package this with the “core” repository, I believe platform independence is a more important goal.
  • The Visual Studio projects will each get their own solution: the SrcML service (and associated test projects) and the program transformation add-in.

Additionally, the srcML executable that we depend on are included as part of the ABB.SrcML project file. Other projects (such as the SrcML Service) must have a manual link to those files in order to include them. Instead, we would like to pull theose libraries & executables out of ABB.SrcML and package them into their own nuget package. This way, projects that need them can explicitly declare that dependency. We can also let individual packages depend on different versions of the Kent State executables. For instance, ABB.SrcML doesn’t care about changes in source code parsing by srcML — it only cares if the command line tools or library APIs changes. However, ABB.SrcML.Data is very dependent on how source code is parsed. Packaging the srcML binaries separately will allow us to manage these relationships more effectively.

Right now, the project has a coding standard defined in the wiki. However, not many people know about it. This has led to inconsistency in the codebase. I want to look at EditorConfig and/or StyleCop in order to automatically enforce these guidelines. Each solution will include these files.

GitHub Issues

Code Modernization

Tasks in this category aren’t really new features. However, they should allow the codebase to be more easily understandable.

Right now, there’s a lot of code in ABB.SrcML, ABB.SrcML.Data, and the SrcML Service devoted to monitoring something and then routing those monitoring events to different archives. There are monitors for file systems, Visual Studio, and archives. While the code itself isn’t terribly complicated, the interplay of the different sources and their monitors can be hard to understand. It would be nice to explore existing, well-maintained libraries for managing these types of relationships. That way, we can focus SrcML.NET on creating and managing srcML instead of event routing. One possible library is the Reactive Extensions Library.

Make ABB.SrcML and ABB.SrcML.Data platform agnostic. The core srcML functionality should be platform agnostic. It should run on Windows, Linux, and Mac OS X equally well. This issue should also modify the NuGet package created in #67 so that it works on these different platforms.

GitHub Issues

New Features

Tasks in this category are new features that we can work on once tasks in the previous two sections have either been completed or have seen significant progress.

Item 1 is to improve the public facing APIs for ABB.SrcML.Data. It is currently very difficult to manage the object lifecycle for objects returned from the data queries. One avenue to explore is using an HTTP-based front end for submitting and answering queries. For example, OmniSharp used NancyFx to provide an HTTP front end.

Item 2 is to improve the call graph query code. The call graph currently works by creating a large structure in memory on which method calls and object references can be built. The code that keeps this structure up to date (in response to file changes, for instance) is very complicated. We should look at doing name resolution on individual data files through something like a reverse index.

Item 3 is to implement more accurate expression parsing. Currently, the SrcML.Data handling of expressions is very basic and basically mirrors how srcML stores expressions. This issue should look at making our expressions reflect an actual expression tree. This should improve e accuracy of name resolution and the call graph.

GitHub Issues

Conclusions

These improvements should improve the accuracy and performance of SrcML.NET while improving the maintainability of the codebase. If you’re interested, comment on one of the issues to get started!