SrcML.NET Roadmap

At ABB Corporate Research, we use SrcML.NET to perform lightweight code analysis for a variety of tasks ranging from experimentation to powering tools. It hasn’t seen too many updates recently. In this post, I’m going to lay out what we’d like to do with SrcML.NET. Each task will get a GitHub issue. The tasks are broken into three categories: organization, code modernization, and new features.


Organization

The SrcML.NET repository is currently organized as a single solution with a number of different sub-projects. The projects range from core libraries (ABB.SrcML and ABB.SrcML.Data), to tools (Src2SrcMLPreview), to Visual Studio integration (the program transformation VS add-in and the SrcML Service VSIX package). The problem with this layout is that it makes it very difficult to on-board new project members. In particular, including the Visual Studio libraries means that we need to have VS 2010, 2012, & 2013 installed along with their respective VS SDKs. This is incredibly frustrating for someone who just wants to work on one of the core libraries or standalone tools. The fix is to split the monolithic solution into several smaller solutions, spread across a few different repositories:

  • The main ABB.SrcML repository will have the core library (ABB.SrcML) and associated test libraries. It will also include ABB.SrcML.Data and the command line runner for generating srcML archives. This limited set of core projects means that we can start thinking about making SrcML.NET platform independent (Linux/Mac support via Mono).
  • The Src2SrcMLPreview tool is a small GUI tool that allows users to visualize how srcML for a source code snippet is structured. While I would like to package this with the “core” repository, I believe platform independence is a more important goal.
  • The Visual Studio projects will each get their own solution: the SrcML service (and associated test projects) and the program transformation add-in.

Additionally, the srcML executables that we depend on are included as part of the ABB.SrcML project file. Other projects (such as the SrcML Service) must have a manual link to those files in order to include them. Instead, we would like to pull those libraries & executables out of ABB.SrcML and package them into their own NuGet package. This way, projects that need them can explicitly declare that dependency. We can also let individual packages depend on different versions of the Kent State executables. For instance, ABB.SrcML doesn’t care about changes in source code parsing by srcML; it only cares whether the command line tools or library APIs change. However, ABB.SrcML.Data is very dependent on how source code is parsed. Packaging the srcML binaries separately will allow us to manage these relationships more effectively.

Right now, the project has a coding standard defined in the wiki. However, not many people know about it. This has led to inconsistency in the codebase. I want to look at EditorConfig and/or StyleCop in order to automatically enforce these guidelines. Each solution will include these files.

GitHub Issues

Code Modernization

Tasks in this category aren’t really new features. However, they should make the codebase easier to understand.

Right now, there’s a lot of code in ABB.SrcML, ABB.SrcML.Data, and the SrcML Service devoted to monitoring something and then routing those monitoring events to different archives. There are monitors for file systems, Visual Studio, and archives. While the code itself isn’t terribly complicated, the interplay of the different sources and their monitors can be hard to understand. It would be nice to explore existing, well-maintained libraries for managing these types of relationships. That way, we can focus SrcML.NET on creating and managing srcML instead of event routing. One possible library is the Reactive Extensions Library.
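To make the routing problem concrete, here is a minimal, language-neutral sketch written in Python for brevity. It is not SrcML.NET code and does not use the Reactive Extensions API; `FileEvent`, `EventStream`, and `Archive` are hypothetical names. It only illustrates the shape such a library would give us: monitors publish events to one stream, and each archive subscribes with a filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class FileEvent:
    """A change reported by some monitor (hypothetical type)."""
    source: str  # e.g. "filesystem" or "visualstudio"
    kind: str    # "added", "changed", or "deleted"
    path: str


class EventStream:
    """A tiny publish/subscribe hub standing in for an Rx-style observable."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[FileEvent], None]] = []

    def subscribe(self, handler, predicate=lambda event: True) -> None:
        # Wrap the handler so that only matching events reach it.
        self._subscribers.append(
            lambda event: handler(event) if predicate(event) else None
        )

    def publish(self, event: FileEvent) -> None:
        for subscriber in self._subscribers:
            subscriber(event)


class Archive:
    """Tracks the set of paths it currently knows about."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.paths: Set[str] = set()

    def on_event(self, event: FileEvent) -> None:
        if event.kind == "deleted":
            self.paths.discard(event.path)
        else:
            self.paths.add(event.path)


def is_source_file(event: FileEvent) -> bool:
    # Assumed set of extensions, for illustration only.
    return event.path.endswith((".cs", ".cpp", ".java"))


stream = EventStream()
srcml_archive = Archive("srcml")
stream.subscribe(srcml_archive.on_event, is_source_file)

# Two different monitors feed the same stream.
stream.publish(FileEvent("filesystem", "added", "src/Foo.cs"))
stream.publish(FileEvent("filesystem", "added", "README.md"))
stream.publish(FileEvent("visualstudio", "added", "src/Bar.java"))
stream.publish(FileEvent("visualstudio", "deleted", "src/Foo.cs"))
```

The point of the design is that the archive never needs to know which monitor produced an event, and the filtering logic lives in one place instead of being spread across monitor classes.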

Make ABB.SrcML and ABB.SrcML.Data platform agnostic. The core srcML functionality should be platform agnostic. It should run on Windows, Linux, and Mac OS X equally well. This issue should also modify the NuGet package created in #67 so that it works on these different platforms.

GitHub Issues

New Features

Tasks in this category are new features that we can work on once tasks in the previous two sections have either been completed or have seen significant progress.

Item 1 is to improve the public facing APIs for ABB.SrcML.Data. It is currently very difficult to manage the object lifecycle for objects returned from the data queries. One avenue to explore is using an HTTP-based front end for submitting and answering queries. For example, OmniSharp used NancyFx to provide an HTTP front end.
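As a rough illustration of the HTTP idea, the sketch below uses only the Python standard library. It is not NancyFx, OmniSharp, or any existing SrcML.NET API: the `/callers` route, the `CALLERS` table, and the function names are all assumptions made up for this example. What it shows is that clients receive plain JSON, so they never hold on to live objects whose lifecycle they would have to manage.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for real query results.
CALLERS = {"Foo.Bar": ["Program.Main", "Baz.Run"]}


def answer_query(url: str):
    """Return (status, payload) for a URL such as /callers?method=Foo.Bar."""
    parsed = urlparse(url)
    if parsed.path != "/callers":
        return 404, {"error": "unknown query"}
    method = parse_qs(parsed.query).get("method", [""])[0]
    return 200, {"method": method, "callers": CALLERS.get(method, [])}


class QueryHandler(BaseHTTPRequestHandler):
    """Serializes query answers as JSON over HTTP GET."""

    def do_GET(self):
        status, payload = answer_query(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a local server; the port is an arbitrary choice."""
    return HTTPServer(("127.0.0.1", port), QueryHandler)
```

Calling `make_server().serve_forever()` would start the server; keeping `answer_query` separate from the handler means the query logic can be tested without opening a socket.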

Item 2 is to improve the call graph query code. The call graph currently works by creating a large structure in memory on which method calls and object references can be built. The code that keeps this structure up to date (in response to file changes, for instance) is very complicated. We should look at doing name resolution on individual data files through something like a reverse index.
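One way to picture the reverse index is the sketch below. It is a hypothetical illustration in Python, not the planned implementation: `NameIndex` and its methods are invented names, and a real index would store richer data than file paths. The idea is that each file contributes its own entries, so a file change only touches that file's entries instead of a global in-memory structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class NameIndex:
    """Maps a method name to the set of files that define it."""

    def __init__(self) -> None:
        self._files_by_name: Dict[str, Set[str]] = defaultdict(set)
        self._names_by_file: Dict[str, Set[str]] = {}

    def update_file(self, path: str, defined_names: Iterable[str]) -> None:
        """Replace everything the index knows about one file."""
        self.remove_file(path)
        names = set(defined_names)
        self._names_by_file[path] = names
        for name in names:
            self._files_by_name[name].add(path)

    def remove_file(self, path: str) -> None:
        """Drop one file's entries; other files are untouched."""
        for name in self._names_by_file.pop(path, set()):
            files = self._files_by_name[name]
            files.discard(path)
            if not files:
                del self._files_by_name[name]

    def lookup(self, name: str) -> Set[str]:
        """Return the files that define the given name (possibly empty)."""
        return set(self._files_by_name.get(name, ()))


index = NameIndex()
index.update_file("a.cs", ["Foo", "Bar"])
index.update_file("b.cs", ["Foo"])
```

Resolving a call to `Foo` then becomes `index.lookup("Foo")`, and handling an edit to `a.cs` is a single `update_file` call.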

Item 3 is to implement more accurate expression parsing. Currently, the SrcML.Data handling of expressions is very basic and largely mirrors how srcML stores expressions. This issue should look at making our expressions reflect an actual expression tree. This should improve the accuracy of name resolution and the call graph.
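To show the difference between a flat expression and an expression tree, here is a small sketch using precedence climbing. It is written in Python for brevity and makes several assumptions: the token list is a simplified stand-in for what srcML stores (not its actual format), only four binary operators are handled, and `BinaryOp`, `parse_expression`, and `parse_primary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Union

# Assumed operator table; higher numbers bind more tightly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


@dataclass
class BinaryOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[str, BinaryOp]


def parse_expression(tokens: List[str], min_prec: int = 1) -> Node:
    """Consume tokens from the front of the list and return a tree."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # The +1 makes operators of equal precedence left-associative.
        right = parse_expression(tokens, PRECEDENCE[op] + 1)
        left = BinaryOp(op, left, right)
    return left


def parse_primary(tokens: List[str]) -> Node:
    """Parse a name or a parenthesized sub-expression."""
    token = tokens.pop(0)
    if token == "(":
        inner = parse_expression(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return token


tree = parse_expression(["a", "+", "b", "*", "c"])
```

Here `tree` nests the multiplication under the addition, which is the structure name resolution needs in order to know which sub-expression a name or call belongs to.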

GitHub Issues


These changes should improve the accuracy and performance of SrcML.NET while making the codebase easier to maintain. If you’re interested, comment on one of the issues to get started!

Thoughts on Community from ATO2015

Open source is where society innovates — Jono Bacon

This is the first post in my All Things Open 2015 Series.

What makes a healthy community? What are examples of healthy communities? What features can we lift from successful communities for our own teams and projects?

Goals and principles

The defining feature of open source projects and communities is directly in the name: “openness”.

There was a great keynote on the first day by Red Hat’s CEO, Jim Whitehurst. He defined an “open organization” to be

An organization that engages participative communities both inside and out

Organizations must welcome all kinds of input from all kinds of people to attract and retain talent.

Nuts and bolts

There were some great tips that people felt contributed to the above goals. Two of the key talks in this area were Brandon Keepers’s Open Source Principles for Better Engineering Teams and Kaitlin Devine’s Power to the People: Transforming Government with Open Source. They both had great things to say about how their respective organizations (GitHub & 18F) work in a distributed fashion and interact with the community. The core points were to:

  • Communicate via permanent, asynchronous mediums
  • Have a strong code of conduct
  • Learn by lurking / teach by doing
  • Automate grunt work

One fact that Brandon highlighted was that, historically, open source projects never had the luxury of face-to-face meetings. Instead, they use issue tracking systems, mailing lists, and version control systems to communicate. The permanent record provided by these tools helps newcomers learn the ropes. The asynchronous nature of these tools means that you can e-mail a mailing list and keep working until someone responds, and that people can respond at their leisure.

Brandon noted that at GitHub, they respond to questions with a URL (to documentation, a pull request, or a blog post). He stated that informal knowledge stops a conversation:

Alice: Why don’t we do it this way?
Bob: We tried that a few years ago and it didn’t work

A durable, searchable knowledge base lets the conversation continue:

Alice: Why don’t we do it this way?
Bob: We tried that a few years ago and it didn’t work. See http://link-to-project
Alice (some time later): I see — since then, there have been these new advances. It might be worth trying some of them to see if this would work now.

Kaitlin talked about the importance of having a strong code of conduct. A code of conduct protects community members by making it clear what kind of conduct is appropriate and what the procedure is when someone violates it. An easy way to make your code of conduct prominent on a project-by-project basis is to link to it prominently from both the and documents.

The conference itself was a good example of a community with a strong code of conduct. Each communication from the organizing committee mentions the code of conduct.

The way we start working with communities (open source or otherwise) is important. Brandon stated that new OSS project members learn by lurking and veteran members teach by doing.

In the course of their daily work, veteran community members work primarily on durable mediums. By responding to questions and requests with links (references to the durable record), they teach new community members to work the same way. Similarly, when veterans respond to code of conduct violations in a public fashion, new members learn what to expect and how to handle violations. This both promotes a safe space and reinforces community norms.

The final thought that resonated with me was the automation of grunt work. GitHub employees go out of their way to automate grunt work so that people can keep working on the hard parts of their jobs. Some of the things that they automate include:

  1. Coding standard checks and other basic warnings: this way, code reviews end up being about the substance of the code rather than important but nitpicky details.
  2. Writing style: the GitHub blog uses automated checks to ensure quality. In particular, they have a Jenkins job that flags common writing flaws by running candidate posts through write-good. This promotes a consistent style across the blog, eliminates simple errors, and lets human reviewers focus on the substance of the post.
  3. Blog post calendar: They considered appointing someone to organize when specific entries would get posted to the blog to ensure that there was a regular flow of posts. Instead, they added another Jenkins job that throws an error if a day has too many entries. If a day is too full, the job suggests that you schedule your post for a different day.
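The calendar check in the last item is simple enough to sketch. The code below is my own guess at what such a job might do, not GitHub's actual Jenkins job: the function names, the one-post-per-day default, and the error message are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable


def next_open_day(scheduled: Iterable[date], candidate: date,
                  max_per_day: int = 1) -> date:
    """Return the first day on or after `candidate` that still has room."""
    if max_per_day < 1:
        raise ValueError("max_per_day must be at least 1")
    counts = Counter(scheduled)
    day = candidate
    while counts[day] >= max_per_day:
        day += timedelta(days=1)
    return day


def check_post(scheduled: Iterable[date], candidate: date,
               max_per_day: int = 1) -> None:
    """Fail (as a CI job would) if the candidate day is already full."""
    suggestion = next_open_day(scheduled, candidate, max_per_day)
    if suggestion != candidate:
        raise ValueError(f"{candidate} is full; consider {suggestion} instead")


# Example calendar: posts are already scheduled for two consecutive days.
calendar = [date(2015, 10, 19), date(2015, 10, 20)]
```

With this calendar, `check_post(calendar, date(2015, 10, 19))` fails and points the author at the next open day, which is exactly the kind of feedback nobody has to deliver in person.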

What I love about these suggestions is that they take the common machinery required to work day-to-day and focus it on providing thoughtful feedback. Computers handle the minutiae that they are best at.

We are more receptive to feedback from pedantic robots than pedantic people ...and robots are more reliable... — Brandon Keepers

There was a lot of good “how to manage a community” advice sprinkled throughout ATO this year. I plan on taking these ideas and trying to drive them in my own teams and projects.

All Things Open 2015

Ready for ATO2015

This is the third year that I’ve had the pleasure of attending All Things Open in Raleigh, NC.

The conference has gotten bigger every year. This year there were 1,700 attendees across thirteen tracks. The tracks cover a variety of technical topics ranging from JavaScript/web development to big data, to community management.

I haven’t previously blogged about the talks I go to. This year (I tell myself) will be different.

This year is big enough that I’m going to highlight several different themes and projects that really caught my attention over the two days. I’ll write a post on each of the following topics:

  1. Community: community is a big part of All Things Open, and many of the talks are either specifically about open source communities, or peripherally mention them as a key to technical success. I think there were a lot of good takeaways on how to engineer communities to be inclusive, productive, and self-sustaining.
  2. Development & Deployment Environments: While I’d previously been aware of tools like Docker and Vagrant, hearing experts talk about them and give live demonstrations really cements what these tools are for and how they can be used to simplify my life as a developer, researcher, and systems administrator.
  3. Big Data: I went to several of the big data talks; this year there were two great talks on using Apache NiFi and Apache Spark to ingest and explore data.
  4. Graph Databases: Graph databases (like Neo4j) are an interesting take on databases. Again, it was instructive to see an expert talk about graph databases and give a live demonstration.
  5. Rust: Rust is an interesting new programming language that’s seen a considerable amount of development over the previous year.
  6. SrcML.NET: There was no talk on SrcML.NET this year — however, as its primary maintainer, I hope to apply many of the social and technical ideas I learned this year to the project.

As I write each post, I’ll update this post with a link.

Krista Tippett interviews Thich Nhat Hanh

MS. TIPPETT: The kingdom of God?

BROTHER THAY: Yeah, because I could not like to go to a place where there is no suffering. I could not like to send my children to a place where there is no suffering because, in such a place, they have no way to learn how to be understanding and compassionate. And the kingdom of God is a place where there is understanding and compassion, and, therefore, suffering should exist.

MS. TIPPETT: That’s quite different from some religious perspectives which would say that the kingdom of God is a place where we’ve transcended suffering or moved beyond it.

BROTHER THAY: Yes. And suffering and happiness, they are both organic, like a flower and garbage. If the flower is on her way to become a piece of garbage, the garbage can be on her way to becoming a flower.

Transcript: Thich Nhat Hanh, Cheri Maples, and Larry Ward — Mindfulness, Suffering, and Engaged Buddhism | On Being

This is my blog. There are many like it, but they probably get updated more often.

