A Marpa-powered alternate C::Scan, and even a command-line
It is a natural extension for MarpaX::Languages::C::AST, after providing the c2ast tool, to take a more business-oriented view of C source code, like C::Scan does (a compatibility layer is provided), and to extend the concept.
Going back to the roots, what is (or was) the purpose of C::Scan? To give a coherent view of all the declarations within a C source file. Indeed, C always boils down to two things, declarations and function definitions, as stated by the top rules of its grammar:
    translationUnit ::= externalDeclaration+
    externalDeclaration ::= functionDefinition | declaration
Looking a little further, we see that a function definition is nothing but declaration-specific rules followed by a compound statement. So we can always say that a C source file boils down to a list of declarations and function bodies.
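To make the two grammar rules concrete, here is a minimal sketch of a translation unit. All names in it (counter_t, point, calls, add) are hypothetical and chosen only for illustration; the comments say which rule each top-level item falls under.

```c
/* Hypothetical translation unit: every top-level item is an
 * externalDeclaration, i.e. either a declaration or a functionDefinition. */

typedef unsigned long counter_t;   /* declaration: introduces a type name */

struct point { int x; int y; };    /* declaration: introduces a struct type */

static counter_t calls = 0;        /* declaration: variable with an initializer */

int add(int a, int b);             /* declaration: function prototype */

/* functionDefinition: declaration specifiers and a declarator,
 * followed by a compound statement (the function body). */
int add(int a, int b)
{
    calls++;                       /* statements only appear inside a body */
    return a + b;
}
```

Only the last item carries a body; everything else is a plain declaration, which is the part of the source a C::Scan-like tool is mostly interested in.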
Among the few methods provided by C::Scan, some can hardly avoid being heuristic: everything that belongs to preprocessing, i.e.:
    #include "something"
    #include <somewhere>
    #define AND
    #define MACROS(with, eventual, parameters)
because all of that vanishes after preprocessing, which is what yields the real C source code. MarpaX::Languages::C::AST relies on its own heuristics here as well.
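As a small illustration of why preprocessing information has to be recovered heuristically, consider the sketch below. The names BUFFER_SIZE, SQUARE, buffer, buffer_len and square are hypothetical; the comments describe what remains once the preprocessor has run, which is all a grammar-based parser gets to see.

```c
#include <stddef.h>              /* replaced by the header's contents */

#define BUFFER_SIZE 16           /* object-like macro: gone after preprocessing */
#define SQUARE(x) ((x) * (x))    /* function-like macro: gone as well */

int buffer[BUFFER_SIZE];         /* the parser sees: int buffer[16]; */

size_t buffer_len(void)
{
    return sizeof buffer / sizeof buffer[0];
}

int square(int n)
{
    return SQUARE(n);            /* the parser sees: return ((n) * (n)); */
}
```

With gcc or clang, running the compiler with the -E option shows the preprocessed text, in which neither the #include line nor the two #define lines survive.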
Then come the other methods, where the MarpaX::Languages::C::AST implementation differs completely from C::Scan, since it is based on a grammar and not on heuristics. That is, the output is always, modulo a bug (-;), the exact representation of what is in the C source code. It provides a C::Scan-compatible interface for those who want to switch, and extends the concept to provide a deep but still "programmer's mind" oriented view of the source code.
A great feature is that there is also a cscan command-line tool to try out! All the C::Scan-like methods are exposed, plus other handy methods, two of which I'd like to share with you here.
First, a method that is not limited to declarations: the answer to the usual question "What are all the hardcoded strings!?". cscan gives the answer for the whole source, including function bodies:
    % cat /tmp/test.c
    char *x1 = "String1";
    int f() {
      char *x2 = "String2";
    }
    % cscan --get strings /tmp/test.c
    [
      '"String1"',
      '"String2"'
    ]
And something which I believe is new: running an XPath query on all the declarations within the source code! For example, to find all nodes whose identifier contains "x":
    cscan --xpath "//*[contains(@nm,'x')]" /tmp/test.c
To find all function definitions that have at least one argument of type "double":
    cscan --xpath "//*[@func=\"1\"]/args[@var=\"1\" and @ty=\"double\"]/.." /tmp/test.c
Let's finish by taking the small /tmp/test.c file above again and asking for the strings in the declaration section:
    cscan --xpath "//*[starts-with(@init,'\"')]" /tmp/test.c
The output is full of important information from the "programming point of view":
    {
      file => '/tmp/test.c',          # Source
      ft   => 'char *x1 = "String1"', # Full text
      init => '"String1"',            # An initializer
      line => '1',                    # Line number
      nm   => 'x1',                   # Identifier
      ty   => 'char *',               # Type
      var  => '1'                     # Yes, a variable
    }
The last item is crucial: sometimes entries are just types, and you want to have them in the parsed declaration output. In particular, a struct nested inside a struct implicitly declares an inner struct type.
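The nested-struct case can be sketched as follows. The names outer, inner, inner_value and outer_value are hypothetical. The point is that, in C, the single declaration of struct outer also introduces struct inner as a type of its own, usable elsewhere in the file, so a declaration listing that only reported variables would miss it.

```c
/* One declaration, two types: struct outer and, implicitly, struct inner. */
struct outer {
    int id;
    struct inner {               /* declared as a side effect of struct outer */
        double value;
    } in;                        /* member of type struct inner */
};

/* In C, struct inner is visible on its own after the declaration above. */
double inner_value(struct inner i)
{
    return i.value;
}

double outer_value(struct outer o)
{
    return inner_value(o.in);
}
```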