Pushing the Limits of Xtext for C/C++ Linker Scripts

Xtext is the popular Eclipse language development framework for domain-specific languages. Its sweet spot is JVM languages, and it is excellent for languages where you can define the grammar yourself. But how well can Xtext cope with a non-JVM language that has undergone decades of evolution?

In our case, we want to see if we can take advantage of Xtext to create an editor for C/C++ linker scripts in CDT. Linker scripts are used to specify the memory sections, layouts and how code relates to these sections. Linker scripts consist of the ld command language, and this is what a simple typical script might look like:

MEMORY {
 RAM : ORIGIN = 0x0, LENGTH = 0x2000
 ROM : ORIGIN = 0x80000, LENGTH = 0x10000
}

SECTIONS {
 .text : { *(.text) *(.text.*) } > ROM
 .rodata : { *(.rodata) *(.rodata.*) } > ROM
 .data : { *(.data) *(.data.*) } > RAM
 .bss : { _bss = .; *(.bss) *(.bss.*) *(COMMON) _ebss = .; _end = .; } > RAM
}
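
To make the MEMORY block concrete, here is a minimal Python sketch that reads the region definitions from the script above and computes the address range each one covers. The regular expression and the function name memory_regions are illustrative assumptions; the pattern only handles the simple hexadecimal ORIGIN/LENGTH form used in this example, not the full ld syntax.

```python
import re

SCRIPT = """
MEMORY {
 RAM : ORIGIN = 0x0, LENGTH = 0x2000
 ROM : ORIGIN = 0x80000, LENGTH = 0x10000
}
"""

# Hypothetical pattern for this sketch: NAME : ORIGIN = 0x..., LENGTH = 0x...
REGION = re.compile(
    r"(\w+)\s*:\s*ORIGIN\s*=\s*(0x[0-9A-Fa-f]+)\s*,\s*LENGTH\s*=\s*(0x[0-9A-Fa-f]+)"
)


def memory_regions(script):
    """Map each region name to (start, end), where end is exclusive."""
    regions = {}
    for name, origin, length in REGION.findall(script):
        start = int(origin, 16)
        regions[name] = (start, start + int(length, 16))
    return regions


if __name__ == "__main__":
    for name, (start, end) in memory_regions(SCRIPT).items():
        print(f"{name}: {start:#x} .. {end:#x}")
```

For the script above this gives RAM as 0x0 to 0x2000 and ROM as 0x80000 to 0x90000, with the end addresses exclusive.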

Alternatives to Xtext

Besides using Xtext, it’s worth considering some of the other options for this task:

  • Roll your own – the existing C/C++ editor in CDT does this. It gives full control and the best error recovery, and it supports bidirectionality (recreating source from the abstract syntax tree, or AST). It is a last resort, though, as it would be an enormous amount of work and take a long time to get right.
  • ANTLR – write your own ANTLR grammar. But since Xtext already uses ANTLR, we may as well use Xtext and get the benefits of Eclipse editor integration.
  • Reuse the linker’s bison grammar – this would give perfect parsing, but it is a no-go because i) it’s GPL, ii) it generates C code, not Java, and iii) the requirements for editing are much more demanding than for linking; for example, it would not support bidirectionality (i.e. you can’t recreate the linker file from the AST).

Benefits of Xtext

The Xtext framework additionally provides these nice features we are interested in:

  • Parsing, lexing & AST generation
    • serialisation support is particularly important to support bidirectionality and preserve users’ comments, whitespace etc.
  • Rich Editor Features
    • syntax highlighting
    • content assist
    • validation & error markers
    • code folding & bracket matching
  • Integrated Outline editor
  • Ecore model generation which can be used for integration with UI frameworks such as EMF Forms, Sirius, etc.

Linker Script Parsing Challenges

When we talk about the ld command language being a non-JVM language, here are some specific challenges related to what that means.

  1. Crazy Identifiers! The following are valid identifiers in linker scripts:
    • .text
    • *
    • hello*.o
    • "spaces are ok, just quote the identifier"
    • this+is-another*crazy[example]
  2. Identifier or Number? Things that appear to be identifiers may actually be numbers:
    • a123 – identifier
    • a123x – number
    • 123y – identifier
    • 123h – number
  3. Identifier or Expression?

For example, depending on context, 2+3 can be either an identifier or an expression:

SECTIONS {
 .out_name : {
  file*.o(.text.*)
  2+3(*)
  symbol = 2+3;
 }
}

The first 2+3 is a filename, so almost anything that can be a filename is allowed there. The second 2+3 is an expression to be assigned to symbol.
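
The two readings of 2+3 can be sketched in a few lines of Python. This is only an illustration of the idea of context-dependent tokenising, not how ld or Xtext implement it: the function tokenize, the two context names and both token patterns are assumptions made up for this example and cover far less than the real language.

```python
import re

# Hypothetical rules for expression context: numbers, operators and names
# are separate tokens.
EXPRESSION_TOKEN = re.compile(
    r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[+\-*/])|(?P<NAME>[A-Za-z_.][\w.]*))"
)
# Hypothetical rule for file-pattern context: anything up to whitespace or
# structural punctuation is one word.
PATTERN_TOKEN = re.compile(r"\s*(?P<WORD>[^\s(){};=]+)")


def tokenize(text, context):
    """Split text into (kind, value) pairs using the rules for the given context."""
    rule = PATTERN_TOKEN if context == "pattern" else EXPRESSION_TOKEN
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = rule.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        kind = match.lastgroup  # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("2+3", "pattern"))     # one filename-like word
    print(tokenize("2+3", "expression"))  # number, operator, number
```

The same three characters come out as a single WORD in pattern context and as NUMBER, OP, NUMBER in expression context, which is why a lexer that cannot see the surrounding context has to pick one reading.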

Resolutions

Here’s what we did to support the linker language as far as we could:

  1. Custom Xtext grammar – as extending the XType grammar does not make sense, the main job is to craft a grammar that understands all the specifics of linker script identifiers and expressions. This is an iterative process as we add support for more and more language features, and the grammar is still a work in progress.
  2. Limited Identifier Support – in some cases we opted not to support certain identifiers unless they are escaped (double-quoted). While linker scripts theoretically support such identifiers (e.g. 1234abcd), we have not yet found a single real-world case of an identifier that would actually need escaping. If one did crop up, the user could adjust it to work with the editor (e.g. "1234abcd").
  3. Context Based Lexing – telling an identifier from an expression would require context-based lexing rules. However, this will not work with the ANTLR lexer. We have the option to replace it with a custom or external lexer, which can be considered in the future if desirable.
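
The quoting escape from item 2 can be illustrated with a small classifier. This is a sketch under a simplified rule inferred from the four examples in the "Identifier or Number?" list, not the actual ld lexer or the Xtext grammar: a word counts as a number if it is all decimal digits, a 0x-prefixed hex literal, or a run of hex digits followed by an h or x suffix. The function name classify is hypothetical.

```python
import re

# Assumed simplification of the number rule, for illustration only.
NUMBER = re.compile(r"(?:[0-9]+|0[xX][0-9A-Fa-f]+|[0-9A-Fa-f]+[hHxX])")


def classify(word):
    """Return "identifier" or "number" for a single unspaced word."""
    # A double-quoted word is always read as an identifier.
    if len(word) >= 2 and word[0] == word[-1] == '"':
        return "identifier"
    return "number" if NUMBER.fullmatch(word) else "identifier"


if __name__ == "__main__":
    for word in ["a123", "a123x", "123y", "123h", '"123h"']:
        print(word, "->", classify(word))
```

Under this rule a123 and 123y are identifiers while a123x and 123h are numbers, matching the list above, and wrapping 123h in double quotes turns it back into an identifier.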

Conclusion

Xtext is a great language development framework. While Xtext may not be able to support every theoretical case of the long-lived linker script command language, it can be used to provide a very high level of support for the common features. Support for context based lexing in the future would enable a higher level of language support. Xtext can be used to provide a rich language editor with syntax colouring, command completion, integrated outline view & more in a relatively short space of time. A powerful linker script editor is another great feature for C/C++ developers that use CDT, the reference C/C++ IDE in the industry.

The Sound of the Universe @ EclipseCon Europe 2016

At EclipseCon this year I heard the sound of the universe. And it was awesome and breathtaking. To be precise, it was the sound of two black holes colliding over a billion years ago, part of the enthralling final keynote from Dr Benno Willke on detecting gravitational waves. It was a fitting way to end a conference that had kicked off with another amazing keynote: Stephen Carver delivering a powerful and emotional story of the people and tech behind the space shuttle disasters, framed in profound lessons on real communication and avoiding silo thinking.

For the very first time at EclipseCon Europe we held a CDT summit. Over 10 years ago I had the honour of being the first developer from Europe involved in CDT, so to bring the summit to Europe was a particularly special moment for me, especially with our renowned project co-lead Doug Schaefer in attendance. The summit was a success, particularly in welcoming contributors from the wider community into the fold, and is definitely something we will be doing again next year.

This year there was also a big community focus on diversity and on raising awareness of the topic. The activity included my talk on ‘7 Habits of Highly Diverse Communities’, addressing the board on the topic and a diversity BOF session. The discussions were great, with lots of good energy and practical suggestions, and I am so proud to see the community work together to ensure we can be as open and inclusive as possible.

The Science Working Group had good reason to celebrate at the conference: we have just completed our very first simultaneous release of five projects. This is a significant milestone for this nascent group, and it was terrific to talk about the projects with the rest of the community.

There was an incredible amount on at the conference this year. The best way to get a quick taste was hearing what people enjoyed: language servers, Xtext, Sirius, scripting, IoT & testing were topics that kept coming up. On a personal level, it was my most intense EclipseCon yet, with three talks, a BOF and a summit to organize. On the whole it was the busiest conference yet, with a record attendance of 619. The most important thing is always the people: lots of new and old friends to talk to and exchange energy with. At EclipseCon this year I heard the sound of the Eclipse universe. And it was awesome and breathtaking.

(A version of this article was first published on jaxenter.com: https://jaxenter.com/eclipsecon-europe-at-a-glance-129883.html)