Pushing the Limits of Xtext for C/C++ Linker Scripts

Xtext is a popular Eclipse framework for developing domain-specific languages. Its sweet spot is JVM languages, and it excels at languages whose grammar you can define yourself. But how well can Xtext cope with a non-JVM language that has evolved over decades?

In our case, we want to see whether we can use Xtext to create an editor for C/C++ linker scripts in CDT. Linker scripts specify the memory regions, the layout of the output sections, and how code and data are placed into those sections. They are written in the ld command language, and a simple, typical script might look like this:

MEMORY {
 RAM : ORIGIN = 0x0, LENGTH = 0x2000
 ROM : ORIGIN = 0x80000, LENGTH = 0x10000
}

SECTIONS {
 .text : { *(.text) *(.text.*) } > ROM
 .rodata : { *(.rodata) *(.rodata.*) } > ROM
 .data : { *(.data) *(.data.*) } > RAM
 .bss : { _bss = .; *(.bss) *(.bss.*) *(COMMON) _ebss = .; _end = .; } > RAM
}
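To make the MEMORY block concrete: each region is a start address (ORIGIN) plus a size (LENGTH). So RAM covers 0x0 up to 0x2000 and ROM covers 0x80000 up to 0x90000. The Java sketch below is a throwaway regex, not how an Xtext-generated parser works. It pulls the two values out of a region line and prints the address range the region covers. It assumes plain 0x-prefixed hex values, so it does not handle forms that real scripts also allow, such as 64K or an attribute list like (rx). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MemoryRegions {
    // Matches region lines such as "RAM : ORIGIN = 0x0, LENGTH = 0x2000".
    // Assumption: hex values only; attributes like (rx) and K/M suffixes are not supported.
    private static final Pattern REGION = Pattern.compile(
            "(\\w+)\\s*:\\s*ORIGIN\\s*=\\s*0[xX]([0-9A-Fa-f]+)\\s*,\\s*LENGTH\\s*=\\s*0[xX]([0-9A-Fa-f]+)");

    static String describe(String line) {
        Matcher m = REGION.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a simple MEMORY region: " + line);
        }
        long origin = Long.parseLong(m.group(2), 16);
        long length = Long.parseLong(m.group(3), 16);
        // The region covers [origin, origin + length); the end address is exclusive.
        return String.format("%s: [0x%X, 0x%X) = %d bytes", m.group(1), origin, origin + length, length);
    }

    public static void main(String[] args) {
        System.out.println(describe("RAM : ORIGIN = 0x0, LENGTH = 0x2000"));
        System.out.println(describe("ROM : ORIGIN = 0x80000, LENGTH = 0x10000"));
    }
}
```

Running main prints "RAM: [0x0, 0x2000) = 8192 bytes" and "ROM: [0x80000, 0x90000) = 65536 bytes".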

Alternatives to Xtext

Besides Xtext, it’s worth considering the other options for this task:

  • Roll-your-own – the existing C/C++ editor in CDT takes this approach. It gives full control and the best error recovery, and it supports bidirectionality (recreating source from the abstract syntax tree, or AST). But it is a last resort: it would be an enormous amount of work and would take a long time to get right.
  • ANTLR – write our own ANTLR grammar. However, since Xtext already uses ANTLR, we may as well use Xtext and get the benefits of its Eclipse editor integration.
  • Reuse the linker’s Bison grammar – this would give perfect parsing, but it is a no-go because (i) it’s GPL, (ii) it generates C code, not Java, and (iii) the requirements for editing are much more demanding than those for linking. For example, it would not support bidirectionality (i.e. you can’t recreate the linker script from the AST).

Benefits of Xtext

The Xtext framework also provides these features that we are interested in:

  • Parsing, lexing & AST generation
    • serialisation support, which is particularly important for bidirectionality and for preserving users’ comments, whitespace, etc.
  • Rich Editor Features
    • syntax highlighting
    • content assist
    • validation & error markers
    • code folding & bracket matching
  • Integrated Outline view
  • Ecore model generation, which can be used to integrate with UI frameworks such as EMF Forms and Sirius

Linker Script Parsing Challenges

Here are some of the specific challenges that come from the ld command language being a non-JVM language:

  1. Crazy Identifiers! The following are valid identifiers in linker scripts:
    • .text
    • *
    • hello*.o
    • “spaces are ok, just quote the identifier”
    • this+is-another*crazy[example]
  2. Identifier or Number? Things that appear to be identifiers may actually be numbers:
    • a123 – identifier
    • a123x – number
    • 123y – identifier
    • 123h – number
  3. Identifier or Expression?

Depending on context, 2+3 can be either an identifier or an expression. For example:

SECTIONS {
 .out_name : {
  file*.o(.text.*)
  2+3(*)
  symbol = 2+3;
 }
}

The first 2+3, in 2+3(*), is a filename (all input sections from a file named 2+3), so almost anything that can be a filename is allowed there. The second 2+3 is an expression whose value is assigned to symbol.
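One way to picture the problem is a lexer that is told which context it is in. The Java sketch below is not Xtext code and does not follow GNU ld's exact rules. The Mode enum, the class name and the character sets are all simplified assumptions. In FILENAME mode, characters such as +, -, * and [ ] are just part of a name, while in EXPRESSION mode a name stops at the first operator. Given the two lines from the example, the sketch returns a single token for the file name 2+3 but three tokens for the expression.

```java
import java.util.ArrayList;
import java.util.List;

final class ContextLexer {
    // Hypothetical lexing contexts; a real linker script lexer would need more.
    enum Mode { FILENAME, EXPRESSION }

    // Simplified assumption: a file name may contain anything except whitespace
    // and the punctuation below, so '+', '-', '*', '[' and ']' are allowed.
    private static boolean isFilenameChar(char c) {
        return !Character.isWhitespace(c) && "(){};=,".indexOf(c) < 0;
    }

    // In expressions, a name or number is a run of letters, digits, '_' and '.'.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '.';
    }

    static List<String> lex(String input, Mode mode) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (mode == Mode.FILENAME && isFilenameChar(c)) {
                // '+' and '*' do not end a file name.
                while (i < input.length() && isFilenameChar(input.charAt(i))) {
                    i++;
                }
            } else if (mode == Mode.EXPRESSION && isWordChar(c)) {
                // Any operator ends a name or number.
                while (i < input.length() && isWordChar(input.charAt(i))) {
                    i++;
                }
            } else {
                i++; // any other character is a one-character token
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("2+3(*)", Mode.FILENAME));          // [2+3, (, *, )]
        System.out.println(lex("symbol = 2+3;", Mode.EXPRESSION)); // [symbol, =, 2, +, 3, ;]
    }
}
```

The same FILENAME mode also keeps this+is-another*crazy[example] together as one token. The hard part is not the lexing itself but knowing which mode applies. Only the parser knows whether it is inside an input section description or an assignment. As far as we understand, GNU ld's own parser handles this by switching its flex lexer between start states as it goes.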

Resolutions

Here’s what we did to support the linker language as far as we could:

  1. Custom Xtext grammar – extending the Xtype grammar (which is geared towards JVM types) does not make sense here, so the main job is to craft a grammar that understands all the specifics of linker script identifiers and expressions. This means iterating as we add support for more and more language features; the grammar is still a work in progress.
  2. Limited Identifier Support – in some cases we opted not to support certain identifiers unless they are escaped (double-quoted). While linker scripts theoretically allow such identifiers unquoted (e.g. 1234abcd), we have not yet found a single real-world case of an identifier that would actually need escaping. If one did crop up, the user could quote it to make it work with the editor (e.g. “1234abcd”).
  3. Context-Based Lexing – telling an identifier apart from an expression would require context-based lexing rules, and these will not work with the ANTLR lexer. We do have the option of replacing it with a custom or external lexer, which we may consider in the future if it proves worthwhile.
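The escaping workaround in point 2 can be sketched as a helper that decides whether a symbol name must be double-quoted before the editor writes it back out. Both rules in this Java sketch are assumptions for illustration, not the editor’s actual grammar.

- An unquoted identifier must start with a letter, _ or . (roughly following the ld manual’s description of unquoted symbol names, minus hyphens).
- Any name that looks like a number with a radix suffix is quoted too. Examples are a123x and 123h from the challenges above; this rule is loosely modelled on the suffixes GNU ld accepts.

```java
import java.util.regex.Pattern;

final class IdentifierQuoting {
    // Hypothetical editor rule: an unquoted identifier starts with a letter, '_' or '.'
    // and continues with letters, digits, '_' or '.'.
    private static final Pattern PLAIN_IDENTIFIER = Pattern.compile("[A-Za-z_.][A-Za-z0-9_.]*");

    // Simplified, ld-inspired number rule: hex digits followed by a radix suffix,
    // h/x (hex), o (octal), b (binary) or d (decimal), e.g. a123x or 123h.
    private static final Pattern SUFFIXED_NUMBER = Pattern.compile("[0-9A-Fa-f]+[HhXxOoBbDd]");

    static boolean needsQuotes(String name) {
        return !PLAIN_IDENTIFIER.matcher(name).matches()
                || SUFFIXED_NUMBER.matcher(name).matches();
    }

    static String quoteIfNeeded(String name) {
        return needsQuotes(name) ? "\"" + name + "\"" : name;
    }

    public static void main(String[] args) {
        // Prints: _ebss -> _ebss, a123 -> a123, a123x -> "a123x", 1234abcd -> "1234abcd"
        for (String name : new String[] {"_ebss", "a123", "a123x", "1234abcd"}) {
            System.out.println(name + " -> " + quoteIfNeeded(name));
        }
    }
}
```

One side effect of the simplified number rule is that ordinary-looking words made only of hex digits plus a suffix letter, such as dead, are quoted as well. The sketch deliberately errs on the side of quoting, since a quoted name still refers to the same symbol.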

Conclusion

Xtext is a great language development framework. While it may not be able to support every theoretical case of the long-lived linker script command language, it can provide a very high level of support for the common features, and support for context-based lexing in the future would raise that level further. Xtext can be used to build a rich language editor with syntax colouring, command completion, an integrated outline view and more in a relatively short time. A powerful linker script editor is another great feature for C/C++ developers who use CDT, the reference C/C++ IDE in the industry.

Suspicious Semicolon: CDT at EclipseCon France 2016

The CDT project was well represented in Toulouse this year.

CDT Latest & Greatest

Jonah Graham gave a CDT overview in his talk CDT: Latest & Greatest Tooling for C/C++. In a mostly hands-on session, Jonah used CPython, the C implementation of the Python interpreter, to demonstrate how to set up, build and debug a sizeable project with CDT.

This included some of the new features in Neon, such as suppressing Codan warnings and the enhanced memory view. Jonah also talked about upcoming features, including the new GDB console and improved multicore breakpoint support.

“Hello World!” – One Small Step for Python Scripting in Eclipse

We are excited to be able to run a ‘hello world’ command using EASE and CPython in Eclipse (quickly followed by running the ‘Zen of Python’ for good measure :). It is the first key stage in proving our approach of using Py4J as the enabling technology for making Python scripting available in Eclipse through EASE. It is a small step, but a significant leap towards unprecedented automation, dramatically easier Eclipse extensions and powerful third-party library integration using Python.

Improved CDT Source Lookup Path Mappings in Neon

Source lookup path mapping is a CDT debug feature that you would never notice if it always just worked. And it mostly does. But when it doesn’t, you get an error while debugging that looks like this:

[Screenshot: source not found error]

And things can get worse for the user: even if they go on to locate the file, they can then run into problems when trying to set breakpoints. In the example below, the breakpoint is not installed (no blue tick on the breakpoint icon) and an error message from gdb shows up in the console. As far as the user is concerned, the file does exist and is sitting right there in their workspace, yet there is no help or indication of how to solve the problem.

[Screenshot: can’t set breakpoint]

Source Lookup Path Mappings

Source lookup path mapping is the feature responsible for translating compilation paths into local paths. For example, a binary built on a build machine will contain paths like /build/machine. To debug it on a user’s machine and locate the corresponding source files, this path must be mapped to a local path, say /user/project. There are two parts to it: …
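At its core, the translation is a prefix rewrite. The Java sketch below is not CDT’s implementation; the class name, the method name and the file names are hypothetical. It does three things:

- It replaces a compilation-path prefix with a local one.
- It picks the longest matching prefix when several apply.
- It only matches on whole path segments, so /build/machine does not accidentally match /build/machinery.

```java
import java.util.Map;
import java.util.Optional;

final class SourcePathMapper {
    // Rewrites `path` using the longest prefix in `mappings` that matches on a
    // whole path-segment boundary. Returns empty if no mapping applies.
    static Optional<String> remap(Map<String, String> mappings, String path) {
        String best = null;
        for (String prefix : mappings.keySet()) {
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        return Optional.of(mappings.get(best) + path.substring(best.length()));
    }

    public static void main(String[] args) {
        // Compilation-path prefix -> local-path prefix, using the example paths from the text.
        Map<String, String> mappings = Map.of("/build/machine", "/user/project");

        // Hypothetical file names, for illustration only.
        System.out.println(remap(mappings, "/build/machine/src/main.c").orElse("<no mapping>")); // /user/project/src/main.c
        System.out.println(remap(mappings, "/build/machinery/main.c").orElse("<no mapping>"));   // <no mapping>
    }
}
```

Longest-prefix matching lets a more specific mapping, for example one covering a single library’s source tree, override a broader one.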

What’s Between You and the Hardware?

This slide is from my EclipseCon Europe talk with Gordon Williams on Espruino.

It aims to show, at a high level, what sits between our user software and the hardware itself, and why it matters.

Arduino – an Arduino sketch is compiled directly to object code that runs directly on the hardware, i.e. ‘bare metal’. This is great for accessing the hardware directly, but bare metal can be a pain: when it comes to scheduling when different things should run, for instance, you’re on your own.

Raspberry Pi – a powerful board that runs the Linux operating system, with a Python interpreter on top of that. While you can do a lot with a Pi, and they are often used for running servers, it can be overkill for smaller automated tasks, and then you pay the price in power consumption or battery life.

Espruino – Espruino aims to fill the gap between the two. Its custom JavaScript interpreter provides powerful functionality without giving up tight control over the hardware. In particular, the interpreter includes scheduler methods, which means it knows exactly when to go to sleep and for how long, giving huge gains in power saving and therefore battery life.