A Marpa-powered alternate C::Scan, and even a command-line

It is a natural extension for MarpaX::Languages::C::AST, after providing the c2ast tool, to take a more business-oriented view of C source code, like C::Scan does (a compatibility layer is provided), and to extend the concept.

Going back to the roots, what is (or was) the purpose of C::Scan? To give a coherent view of all declarations within a C source file. Indeed, C always boils down to two things, declarations and function definitions, as stated by the top rule of its grammar:

  translationUnit     ::= externalDeclaration+
  externalDeclaration ::= functionDefinition
                        | declaration

Look a little further, and we see that a function definition is nothing more than declaration-specific rules followed by a compound statement. So we can always say that C source code boils down to a linked list of declarations and function bodies.
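To make the grammar rule concrete, here is a minimal sketch of a translation unit annotated with the rule each top-level item falls under. The names counter_t, calls and bump are invented for this illustration.

```c
/* Every top-level item below is an externalDeclaration. */

/* declaration: a typedef */
typedef unsigned int counter_t;

/* declaration: a variable with an initializer */
static counter_t calls = 0;

/* declaration: a function prototype */
counter_t bump(counter_t step);

/* functionDefinition: the same declaration-like head as the prototype,
 * followed by a compound statement instead of a semicolon. */
counter_t bump(counter_t step)
{
    calls += step;
    return calls;
}
```

Note how the prototype and the definition share exactly the same head; only the trailing compound statement turns a declaration into a function definition.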

Amongst the few methods provided by C::Scan, some can hardly be anything but heuristic: everything that belongs to preprocessing, i.e.:

  #include "something"
  #include <somewhere>
  #define AND
  #define MACROS(with, eventual, parameters)

because all of that vanishes after preprocessing, which is what yields the real C source code. MarpaX::Languages::C::AST will do its own heuristics as well.
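A small sketch of why this is so: the C grammar only ever sees the text that is left once the preprocessor has run. The macro and function names below (GREETING, TWICE, greeting_len, twice_three) are invented for this illustration.

```c
#include <string.h>

/* Preprocessor directives: none of these lines survive preprocessing. */
#define GREETING "hello"
#define TWICE(x) ((x) + (x))

/* After preprocessing, the body reads: return strlen("hello"); */
size_t greeting_len(void)
{
    return strlen(GREETING);
}

/* After preprocessing, the body reads: return ((3) + (3)); */
int twice_three(void)
{
    return TWICE(3);
}
```

Running only the preprocessing stage (for example with the -E option that most C compilers accept) shows the expanded text. The names GREETING and TWICE are gone by then, so a tool that wants to report them cannot rely on the C grammar alone.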

Then there are the other methods, where the MarpaX::Languages::C::AST implementation differs completely from C::Scan, since it is based on a grammar and not on heuristics. That is, the output is always, modulo a bug (-;), the exact representation of what is in the C source code. It provides a C::Scan-compatible interface for those who want to switch, and extends the concept to provide a deep but still "programmer's mind" oriented view of the source code.

A great feature is that there is also a cscan command-line tool to experiment with! All the C::Scan-like methods are exposed, plus other handy ones, two of which I'd like to share with you here.

First, a method that is not limited to declarations: the answer to the usual question "What are all the hardcoded strings!?". cscan gives the answer for the whole source, including function bodies:

  % cat /tmp/test.c
  char *x1 = "String1";
  int f() {
    char *x2 = "String2";
  }
  % cscan --get strings /tmp/test.c
  [
    '"String1"',
    '"String2"'
  ]

And something which I believe is new: running an XPath query on all the declarations within the source code! For example, to find all nodes having an identifier that contains "x":

  cscan --xpath "//*[contains(@nm,'x')]" /tmp/test.c

To find all function definitions that have at least one argument of type "double":

  cscan --xpath "//*[@func=\"1\"]/args[@var=\"1\" and @ty=\"double\"]/.." /tmp/test.c
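As a hedged illustration of the kind of input this query targets, here is a hypothetical C file; the function names scale and count_up are invented for this example. Assuming the func, args, var and ty attributes behave the way the query suggests, one would expect scale to be selected and count_up not. The actual cscan output is not reproduced here.

```c
/* Has one argument of type double: expected to match the query. */
double scale(double factor, int value)
{
    return factor * value;
}

/* No argument of type double: expected not to match. */
int count_up(int value)
{
    return value + 1;
}
```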

Let's finish by taking the small /tmp/test.c file above again, and asking for strings in the declaration section:

  cscan --xpath "//*[starts-with(@init,'\"')]" /tmp/test.c

The output is full of information that matters from a "programming point of view":

  {
    file => '/tmp/test.c',       # Source
    ft => 'char *x1 = "String1"',# Full text
    init => '"String1"',         # An initializer
    line => '1',                 # Line number
    nm => 'x1',                  # Identifier
    ty => 'char *',              # Type
    var => '1'                   # Yes, a variable
  }

The last item is crucial: sometimes entries are just types, and you want to have them in the parsed declaration output. In particular, a struct of structs implicitly declares an inner struct type.
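Here is a minimal sketch of that last point, with invented names (outer, inner, standalone, inner_sum). In C, a struct declared inside another struct is not private to it: the inner tag becomes usable on its own in the enclosing scope, so it is a type-only entry that a declaration scanner should report.

```c
struct outer {
    /* This member declaration also declares the type "struct inner". */
    struct inner {
        int x;
        int y;
    } in;
    int z;
};

/* "struct inner" is usable on its own, outside of struct outer. */
struct inner standalone = { 1, 2 };

int inner_sum(void)
{
    struct outer o = { { 3, 4 }, 5 };
    return o.in.x + o.in.y + standalone.x + standalone.y;
}
```

This holds in C; C++ scopes the nested type inside the enclosing struct instead.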


About Jean-Damien Durand
