parsing

Parsing Strategy

I've gotta process C++ code, converting it to C++ code (with some constructs transformed in the middle). Last week I started playing with Hannibal, a Spirit-based parser that handles some of C++. It covers most of the important parts, but making it choke was far too easy: the source to Boost.Wave's wave tool killed it without much effort. Hannibal has over 100 rules in it, but still only covers about half the language. Later, it'd be a great starting point for a full C++ grammar.

Strategy
Well, the Hannibal-based approach would require lots of grunt analysis and grammar work, while my program only needs a very shallow understanding of the text it's processing. That's a poor mix. Instead, I'll take the hack approach: the parser just filters what's important from what isn't.
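
As a rough illustration of how shallow that filtering can be, here is a minimal sketch that treats the source as opaque text and only looks for a keyword as a whole word. The helper names (is_ident_char, find_word) are made up for this example, and it assumes the keyword never shows up inside a comment or string literal, which a real filter would have to deal with.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that can appear inside a C++ identifier.
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Returns the offset of the first whole-word occurrence of `word` in `text`
// at or after `from`, or std::string::npos if there is none.
// Hypothetical helper: it does not skip comments or string literals.
inline std::size_t find_word(const std::string& text, const std::string& word,
                             std::size_t from = 0) {
    assert(!word.empty());
    for (std::size_t pos = text.find(word, from); pos != std::string::npos;
         pos = text.find(word, pos + 1)) {
        const std::size_t end = pos + word.size();
        // Reject matches that are only part of a longer identifier.
        const bool left_ok = pos == 0 || !is_ident_char(text[pos - 1]);
        const bool right_ok = end == text.size() || !is_ident_char(text[end]);
        if (left_ok && right_ok) return pos;
    }
    return std::string::npos;
}
```

Everything that isn't one of the handful of keywords we care about just passes through untouched, which is the whole point of the hack.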

lally - Wed, 2006-06-28 19:20

Iterator Slicing

The big question I've been pondering is how to easily copy content to the right source files, and make it simple enough for a parser-based automaton to use.

Slicing it Up
The solution is to slice up the file into ranges of iterators. A file such as:

stuff (a)
export namespace Blah {
stuff (b)
}
stuff (c)

This gets sliced into stuff (a), the export block (including stuff (b)), and stuff (c).
Stuff (a) and stuff (c) go into the .cpp file, and stuff (b), along with a generated namespace declaration for Blah, goes into the .h (which is included by the .cpp).
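
The slicing described above might look something like the sketch below. The Slices struct and slice_export function are names invented for this example. It assumes a plain substring search is enough to find "export namespace", and that no braces sit inside comments or string literals to throw off the depth count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

using Iter = std::string::const_iterator;
using Range = std::pair<Iter, Iter>;  // half-open [first, second)

struct Slices {
    Range before;  // stuff (a): goes to the .cpp
    Range block;   // the whole export block, keyword through closing brace
    Range body;    // stuff (b): text between the braces, goes to the .h
    Range after;   // stuff (c): goes to the .cpp
    bool found;    // false if no complete export block was seen
};

// Slices `text` around the first "export namespace" block.
// The returned iterators point into `text`, so `text` must outlive them.
inline Slices slice_export(const std::string& text) {
    const Iter b = text.begin(), e = text.end();
    auto at = [b](std::size_t i) { return b + static_cast<std::ptrdiff_t>(i); };
    Slices s{{b, e}, {e, e}, {e, e}, {e, e}, false};
    const std::size_t kw = text.find("export namespace");
    if (kw == std::string::npos) return s;
    const std::size_t open = text.find('{', kw);
    if (open == std::string::npos) return s;
    int depth = 0;
    for (std::size_t i = open; i < text.size(); ++i) {
        if (text[i] == '{') {
            ++depth;
        } else if (text[i] == '}' && --depth == 0) {
            assert(kw < open && open < i);
            s.before = {b, at(kw)};
            s.block = {at(kw), at(i + 1)};
            s.body = {at(open + 1), at(i)};
            s.after = {at(i + 1), e};
            s.found = true;
            return s;
        }
    }
    return s;  // unbalanced braces: everything stays in `before`
}
```

Because the slices are only iterator pairs, nothing is copied until the ranges are finally written out.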

Also, both files will #include others as we go, but we won't know the full set of #includes until we've finished processing the file. To make sure we can insert an #include directive at any time, we don't immediately emit the ranges of text into the destination files. Instead, we queue the ranges up, and then do a final emission at the end.
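
One way to picture the queue-then-emit idea is the sketch below. OutputFile and its members are hypothetical names. Writing the #include lines first, in sorted order, with duplicates dropped and quoted header names, is an assumption of this example and not something decided above.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Iter = std::string::const_iterator;
using Range = std::pair<Iter, Iter>;  // half-open [first, second)

// Hypothetical output buffer for one destination file (.cpp or .h).
// Queued ranges point into the source text, which must outlive this object.
class OutputFile {
public:
    // Can be called at any point during processing.
    void add_include(const std::string& header) { includes_.insert(header); }

    // Remembers a range of source text without copying it yet.
    void queue(Range r) {
        assert(!(r.second < r.first));
        ranges_.push_back(r);
    }

    // Final emission: all #include lines first, then the queued text in order.
    std::string emit() const {
        std::ostringstream out;
        for (const std::string& h : includes_) out << "#include \"" << h << "\"\n";
        for (const Range& r : ranges_) out << std::string(r.first, r.second);
        return out.str();
    }

private:
    std::set<std::string> includes_;  // sorted, no duplicates
    std::vector<Range> ranges_;       // kept in queue order
};
```

Since nothing is written until emit(), an #include discovered halfway through the file still ends up at the top.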

lally - Tue, 2006-06-20 00:09

Grammar Strategy

What constructs do we need to match, how well, and what can we ignore?

Fundamentally, we just need:

stuff
import module Blah;
export module Blah {
stuff
public:
stuff
private:
stuff
}
stuff

Where all the 'stuff' can be saved and manipulated in bulk.
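
To make the "in bulk" part concrete, here is a small sketch that splits the text between a module's braces into chunks at the public: and private: labels. Section, label_at and split_sections are made-up names. It only honors labels at brace depth 0, so access labels inside a nested struct are left alone, and it assumes no labels or braces hide in comments or string literals. Text before the first label gets an empty label here; what visibility that should mean is left open.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Section {
    std::string label;  // "public", "private", or "" before the first label
    std::string text;   // everything up to the next top-level label
};

// Returns "public" or "private" if that label, followed by ':', starts at
// offset i on a word boundary; otherwise returns an empty string.
inline std::string label_at(const std::string& body, std::size_t i) {
    assert(i < body.size());
    if (i > 0) {
        const unsigned char prev = static_cast<unsigned char>(body[i - 1]);
        if (std::isalnum(prev) != 0 || prev == '_') return "";
    }
    static const char* const kLabels[] = {"public", "private"};
    for (const char* label : kLabels) {
        const std::string tag = std::string(label) + ":";
        if (body.compare(i, tag.size(), tag) == 0) return label;
    }
    return "";
}

// Splits a module body into sections at top-level access labels.
inline std::vector<Section> split_sections(const std::string& body) {
    std::vector<Section> out(1);  // starts with the unlabeled section
    int depth = 0;
    for (std::size_t i = 0; i < body.size();) {
        if (depth == 0) {
            const std::string label = label_at(body, i);
            if (!label.empty()) {
                out.push_back(Section{label, ""});
                i += label.size() + 1;  // skip the label and its ':'
                continue;
            }
        }
        if (body[i] == '{') ++depth;
        else if (body[i] == '}') --depth;
        out.back().text += body[i++];
    }
    return out;
}
```

Each section's text is never inspected beyond counting braces, so the 'stuff' stays opaque.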

The export keyword is already used with templates, but nowhere else.
import is a new keyword.

So it really comes down to recognizing these constructs and ignoring
everything else flat out, without screwing up the constructs we care about.

hmm...

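// First stab, matching on lexer token IDs. export_block is unfinished:
// the braced body isn't matched yet.
import_stmt = ch_p(T_IMPORT) >> ch_p(T_MODULE) >> ch_p(T_IDENTIFIER);
export_block = ch_p(T_EXPORT) >> ch_p(T_MODULE) >> ch_p(T_IDENTIFIER)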

lally - Tue, 2006-06-13 20:43

Summary and Strategy

Requirements
My work this summer is implementing the 'modules' mentioned here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1964.pdf
which are written with a syntax like:

// File_1.cpp:
export namespace Lib { // Module definition.
 import namespace std;
public:
 struct S {
  S() { std::cout << "S()\n"; }
 };
}
// File_2.cpp:
import namespace Lib;
int main() {
 Lib::S s;
}

I'm using Boost.Spirit for this work. Spirit is a parsing framework in C++ that uses expression templates to define BNF-esque grammars inline. Using it means I have to parse some C++ myself, with no preexisting grammar to start from. I've committed to using Spirit to make a preprocessor that converts "Modular C++" to "C++", which can then be compiled with a stock compiler (I'm using G++).

lally - Sun, 2006-05-28 12:17