A Marpa-powered alternate C::Scan, and even a command-line
It is a natural extension for MarpaX::Languages::C::AST, after providing the c2ast tool, to take a look at a more business-oriented view of C source, like C::Scan (a compatibility layer is provided), and to extend the concept.
Coming back to the roots, what is (or was) the purpose of C::Scan? To have a coherent view of all declarations within C source code. Indeed, C always boils down to two things: declarations and function definitions, as stated by the top rule of its grammar:
  translationUnit     ::= externalDeclaration+
  externalDeclaration ::= functionDefinition
                        | declaration
A little bit further, we see that a function definition boils down to nothing but declaration-specific rules followed by a compound statement. So we can always say that C source code boils down to a linked list of declarations and function bodies.
Amongst the few methods provided by C::Scan, there are those that can hardly avoid being heuristic: everything that belongs to preprocessing, i.e.:
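To make the grammar point concrete, here is a minimal sketch in C. The names limit, clamp and check_clamp are made up for illustration; the comments show which part of each external declaration is which.

```c
#include <assert.h>

/* A declaration: declaration specifiers ("static const int")
   followed by a declarator ("limit") and an initializer. */
static const int limit = 10;

/* Another declaration: a function prototype, with no body. */
int clamp(int value);

/* A function definition: the same kind of specifiers and declarator,
   this time followed by a compound statement. */
int clamp(int value)
{
    return value > limit ? limit : value;
}

/* Small self-check, callable from any main(). */
void check_clamp(void)
{
    assert(clamp(42) == 10);
    assert(clamp(3) == 3);
}
```

Every top-level item in this file is either a declaration or a function definition, nothing else.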
  #include "something"
  #include <somewhere>
  #define AND
  #define MACROS(with, eventual, parameters)
because all of that vanishes after preprocessing, which yields the real C source code. MarpaX::Languages::C::AST will do its own heuristics as well.
And then there are the other methods, where the MarpaX::Languages::C::AST implementation differs completely from C::Scan, since it is based on a grammar and not on heuristics. That is, the output is always, modulo a bug (-;), the exact representation of what is in the C source code. It provides a C::Scan-compatible interface for those who want to switch, and extends the concept by providing a deep but still "programmer's mind" oriented view of the source code.
A great feature is that there is also a cscan command-line tool to try out! All the C::Scan-like methods are exposed, plus other handy methods, two of which I'd like to share with you here.
First, a method that is not limited to declarations: the answer to the usual question "What are all the hardcoded strings!?". cscan will give the answer for the whole source, including function bodies:
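A small illustration of why preprocessor information cannot be recovered from the grammar alone. This is only a sketch: the macro and function names (BUFFER_SIZE, SQUARE, scaled_size, check_scaled_size) are hypothetical.

```c
#include <assert.h>

#define BUFFER_SIZE 16          /* object-like macro */
#define SQUARE(x) ((x) * (x))   /* function-like macro */

/* After preprocessing, the body below reads
     return ((n) * (n)) + 16;
   The names BUFFER_SIZE and SQUARE no longer appear anywhere. */
int scaled_size(int n)
{
    return SQUARE(n) + BUFFER_SIZE;
}

/* Small self-check, callable from any main(). */
void check_scaled_size(void)
{
    assert(scaled_size(3) == 25);
}
```

A parser working on the preprocessed text only sees the expanded expression, so anything that reports includes or macro definitions has to look at the original file by other means.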
  % cat /tmp/test.c
  char *x1 = "String1";
  int f() {
    char *x2 = "String2";
  }
  % cscan --get strings /tmp/test.c
  [
    '"String1"',
    '"String2"'
  ]
And something which I believe is new: doing an XPath query on all the declarations within the source code! For example, to find all nodes whose identifier contains "x":
  % cscan --xpath "//*[contains(@nm,'x')]" /tmp/test.c
To find all function definitions that have at least one argument of type "double":
  % cscan --xpath "//*[@func=\"1\"]/args[@var=\"1\" and @ty=\"double\"]/.." /tmp/test.c
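As a hypothetical input for that query, consider the sketch below. The function names are made up, and the comments only restate the intent described above (select function definitions with a "double" argument); they are not actual cscan output.

```c
#include <assert.h>

/* Has an argument of type "double": the kind of definition
   the query is meant to select. */
double half(double value)
{
    return value / 2.0;
}

/* No "double" argument: meant to be left out. */
int twice(int value)
{
    return 2 * value;
}

/* Small self-check, callable from any main(). No "double" argument either. */
void check_half_and_twice(void)
{
    assert(half(4.0) == 2.0);
    assert(twice(4) == 8);
}
```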
Let's finish by taking the small /tmp/test.c file above again, and asking for strings in the declaration section:
  % cscan --xpath "//*[starts-with(@init,'\"')]" /tmp/test.c
The output is full of important information from the programmer's point of view:
  {
    file => '/tmp/test.c',       # Source
    ft => 'char *x1 = "String1"',# Full text
    init => '"String1"',         # An initializer
    line => '1',                 # Line number
    nm => 'x1',                  # Identifier
    ty => 'char *',              # Type
    var => '1'                   # Yes, a variable
  }
The last item is crucial: sometimes entries are just types, and you want to have them in the parsed declaration output. In particular, a struct of structs implicitly declares an inner struct type.
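The implicit inner type can be seen in a few lines of C. This is a minimal sketch with made-up names (outer, inner, make_inner, check_inner), showing that declaring one struct can bring a second type into existence without any variable being involved.

```c
#include <assert.h>

/* Declaring "struct outer" also declares "struct inner" as a side effect. */
struct outer {
    struct inner {      /* implicit declaration of a second struct type */
        int x;
    } in;
    int y;
};

/* In C (unlike C++), "struct inner" is visible at file scope,
   so it can be used on its own. */
struct inner make_inner(int x)
{
    struct inner i;
    i.x = x;
    return i;
}

/* Small self-check, callable from any main(). */
void check_inner(void)
{
    struct outer o;
    o.in = make_inner(7);
    o.y = 1;
    assert(o.in.x == 7);
    assert(o.y == 1);
}
```

Both "struct outer" and "struct inner" are entries that are just types, which is exactly the case where a flag telling variables apart from types is useful.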
 About::Me::And::Perl