In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of fully-fledged programming languages.
A common example from computer programming is the processing performed on source code before the next step of compilation. In some computer languages (e.g., C) there is a phase of translation known as preprocessing.
Lexical preprocessors are the lowest-level of preprocessors, insofar as they only require lexical analysis, that is, they operate on the source text, prior to any parsing, by performing simple substitution of tokenized character sequences for other tokenized character sequences, according to user-defined rules. They typically perform macro substitution, textual inclusion of other files, and conditional compilation or inclusion.
The most widely used lexical preprocessor is CPP, the C preprocessor, used pervasively in C and its descendant, C++. This preprocessor is used to provide the usual set of preprocessing services.
The most common use of the C preprocessor is the
#include "..."
or
#include <...>
directive, which copies the full content of a file into the current file, at the point at which the directive occurs. These files usually (almost always) contain interface definitions for various library functions and data types, which must be included before they can be used; thus, the #include directive usually appears at the head of the file. The files so included are called "header files" for this reason. Some examples include <math.h> and <stdio.h> from the standard C library, providing mathematical and input/output functions, respectively.
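As a minimal sketch of the directive in use, the file below includes <stdio.h> so that the declaration of snprintf() is visible before it is called. The function name format_greeting and its parameters are hypothetical, chosen only for illustration.

```c
#include <stdio.h>   /* textually includes the declaration of snprintf() */

/* Hypothetical helper: writes "Hello, <name>!" into buf.
   Returns the number of characters the full greeting needs,
   as reported by snprintf(). */
int format_greeting(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "Hello, %s!", name);
}
```

Without the #include line, the compiler would see no declaration for snprintf() or size_t at the point of use.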
While this use of a preprocessor for code reuse is simple, it is also slow and rather inefficient, since the text of each header is processed again in every source file that includes it, and it requires the additional use of conditional compilation to avoid multiple inclusions of a given header file.
Since the 1970s, faster, safer and more efficient alternatives to reuse by file inclusion have been known and used by the programming language community, and implemented in most programming languages: Java and Common Lisp have packages, Pascal has units, Modula, OCaml, Haskell and Python have modules, and D, designed as a replacement for C and C++, has imports.
Macros are commonly used in C to define small snippets of code. During the preprocessing phase, each macro call is replaced, in-line, by the corresponding macro definition. If the macro has parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance.
For instance,
#define max(a,b) a > b ? a : b
defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing,
z = max(x,y);
becomes
z = x > y ? x : y ;
While this use of macros is very important for C, for instance to define type-safe generic data-types or debugging tools, the substitution is purely textual and may lead to a number of pitfalls.
For instance, if f and g are two functions, calling
z = max(f(), g());
will not evaluate f() once and g() once, and place the higher value in z, as one might believe. Rather, one of the functions will be evaluated twice. If that function has side effects, this is usually not the expected behavior.
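The double evaluation can be observed with a small sketch. The counters f_calls and g_calls and the bodies of f and g are hypothetical, added only to make the effect visible. The function precedence_pitfall shows a second, related problem that follows from the same textual substitution: the macro body has no parentheses, so surrounding operators can regroup it.

```c
#define max(a,b) a > b ? a : b   /* the macro from the text, unchanged */

int f_calls = 0;                 /* hypothetical call counters */
int g_calls = 0;

int f(void) { f_calls++; return 1; }
int g(void) { g_calls++; return 2; }

/* Returns the value stored in z and leaves the call counts in
   f_calls and g_calls. */
int demo_max(void)
{
    int z;
    f_calls = 0;
    g_calls = 0;
    z = max(f(), g());   /* expands to: z = f() > g() ? f() : g(); */
    return z;            /* g() was the larger, so it ran a second time */
}

/* Expands to 2 * 1 > 3 ? 1 : 3, which groups as (2 * 1) > 3 ? 1 : 3
   and yields 3 rather than the intended 2 * 3. */
int precedence_pitfall(void)
{
    return 2 * max(1, 3);
}
```

A common mitigation for the second problem is to parenthesize every parameter and the whole body, as in ((a) > (b) ? (a) : (b)); this does not remove the double evaluation.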
C macros are capable of mimicking functions, creating new syntax within some limitations, as well as expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or else comments), but they have some limitations as a programming construct. Macros which mimic functions, for instance, can be called like real functions, but a macro cannot be passed to another function using a function pointer, since the macro itself has no address.
More modern languages typically do not use this form of metaprogramming through macro expansion of character strings, relying instead on either automatic or manual inlining of functions and methods, and other abstraction techniques such as templates, generic functions, or parametric polymorphism. In particular, inline functions overcome one of the major disadvantages of macros in modern C and C++ implementations, since an inline function provides the macro's advantage of avoiding the overhead of a function call, while its address can still be stored in a function pointer for indirect calls or use in parameters. Also, the problem of multiple evaluation, seen above in the max macro, would not occur in an inlined function.
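A minimal sketch of the inline alternative in C99 follows. The names max_int, next_value, apply and demo_inline are hypothetical, and the function is restricted to int, which is one of the trade-offs against the untyped macro.

```c
/* Hypothetical inline replacement for the max macro, for int only. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

int calls = 0;                        /* counts evaluations of next_value() */
int next_value(void) { return ++calls; }

/* Accepts any two-argument int function through a function pointer. */
int apply(int (*op)(int, int), int x, int y)
{
    return op(x, y);
}

int demo_inline(void)
{
    int z;
    calls = 0;
    /* Each argument expression is evaluated exactly once before the call. */
    z = max_int(next_value(), next_value());
    /* Unlike a macro, max_int has an address and can be passed as a value. */
    return z + apply(max_int, 10, 20);
}
```

Whether the call is actually inlined remains the compiler's decision; the keyword is only a hint.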
The C preprocessor also offers conditional compilation. This permits having different versions of the same code in the same source file. Typically, this is used to customize the program with respect to the compilation platform, the status (debugging code can be "defined out" in production code), as well as to ensure that header files are only included once.
In this common case, the programmer will use a construct like this:
#ifndef FOO_H
#define FOO_H
... (header file code) ...
#endif
This "macro guard" protects the header file from duplicate inclusion by testing for the existence of a macro which, by convention, has the same name as the header file itself. The definition of the FOO_H macro takes place when the header file is first processed by CPP. Thereafter, if that header file is included again, FOO_H will already be defined, causing the preprocessor to skip the entirety of the header file's text.
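The effect of the guard can be sketched in a single file by pasting the guarded text twice, which is what two #include directives for the same header would produce. The header name foo.h, the struct foo and the function foo_value are hypothetical.

```c
/* First "inclusion" of the hypothetical header foo.h. */
#ifndef FOO_H
#define FOO_H
struct foo { int value; };
#endif

/* Second "inclusion": FOO_H is already defined, so the preprocessor
   skips everything up to the matching #endif. Without the guard, the
   repeated struct definition would be a compile-time error. */
#ifndef FOO_H
#define FOO_H
struct foo { int value; };
#endif

int foo_value(void)
{
    struct foo f = { 42 };
    return f.value;
}
```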
Preprocessor conditionals can be used in more complex ways, as below:
#ifdef x
...
#else
...
#endif
or
#if x
...
#else
...
#endif
This technique is often used in system header files to test for various features whose definition can change depending on the platform; for example, the GNU C Library uses "feature-test" macros to ensure that operating system and hardware differences are properly handled, while maintaining the same portable interface.
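The two forms differ in what they test: #ifdef asks whether a macro is defined at all, while #if evaluates its value. The sketch below illustrates this with a hypothetical switch USE_FAST_PATH that is defined as 0, followed by a platform test using macros that common compilers predefine (_WIN32, __APPLE__, __linux__). The function names are assumptions made for illustration.

```c
#define USE_FAST_PATH 0   /* hypothetical switch: defined, but zero */

/* #ifdef only checks that the name is defined, so this returns 1. */
int ifdef_result(void)
{
#ifdef USE_FAST_PATH
    return 1;
#else
    return 0;
#endif
}

/* #if evaluates the macro's value, so this returns 0. */
int if_result(void)
{
#if USE_FAST_PATH
    return 1;
#else
    return 0;
#endif
}

/* Selects one branch at preprocessing time, depending on the platform
   macros the compiler predefines. Only the chosen line is compiled. */
const char *platform_name(void)
{
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "Apple";
#elif defined(__linux__)
    return "Linux";
#else
    return "unknown";
#endif
}
```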
Once again, most modern programming languages discard this feature, relying instead on traditional if...then...else... flow control constructs, and leaving to the compiler the task of removing useless code from the executable.
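The same idea can be sketched in C itself by testing a compile-time constant in an ordinary if statement. The names DEBUG_ENABLED, debug_messages and twice are hypothetical. Whether the unreachable branch is actually removed from the executable depends on the compiler and its optimization settings; the language does not guarantee it.

```c
#include <stdio.h>

enum { DEBUG_ENABLED = 0 };   /* hypothetical compile-time constant */
int debug_messages = 0;       /* counts debug lines actually emitted */

int twice(int x)
{
    /* An ordinary if: both branches are parsed and type-checked, unlike
       text skipped by #if. The condition is a constant 0, so an
       optimizing compiler is free to drop the body entirely. */
    if (DEBUG_ENABLED) {
        fprintf(stderr, "twice(%d)\n", x);
        debug_messages++;
    }
    return 2 * x;
}
```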
Other lexical preprocessors include the general-purpose m4, most commonly used in cross-platform build systems such as autoconf, and GEMA, an open source macro processor which operates on patterns of context.
Syntactic preprocessors were introduced with the Lisp family of languages. Their role is to transform syntax trees according to a number of user-defined rules. For some programming languages, the rules are written in the same language as the program (compile-time reflection). This is the case with Lisp and OCaml. Some other languages rely on a fully external language to define the transformations, such as the XSLT preprocessor for XML, or its statically typed counterpart CDuce.
Syntactic preprocessors are typically used to customize the syntax of a language, extend a language by adding new primitives, or embed a Domain-Specific Programming Language inside a general purpose language.
A good example of syntax customization is the existence of two different syntaxes in the Objective Caml programming language. Programs may be written indifferently using the "normal syntax" or the "revised syntax", and may be pretty-printed with either syntax on demand.
Similarly, a number of programs written in OCaml customize the syntax of the language by the addition of new operators.
The best examples of language extension through macros are found in the Lisp family of languages. While the languages, by themselves, are simple dynamically-typed functional cores, the standard distributions of Scheme or Common Lisp permit imperative or object-oriented programming, as well as static typing. Almost all of these features are implemented by syntactic preprocessing, although it bears noting that the "macro expansion" phase of compilation is handled by the compiler in Lisp. This can still be considered a form of preprocessing, since it takes place before other phases of compilation.
Similarly, statically-checked, type-safe regular expressions or code generation may be added to the syntax and semantics of OCaml through macros, as well as micro-threads (also known as coroutines or fibers), monads or transparent XML manipulation.
One of the unusual features of the Lisp family of languages is the possibility of using macros to create an internal Domain-Specific Programming Language. Typically, in a large Lisp-based project, a module may be written in a variety of such minilanguages, one perhaps using a SQL-based dialect of Lisp, another written in a dialect specialized for GUIs or pretty-printing, etc. Common Lisp's standard library contains an example of this level of syntactic abstraction in the form of the LOOP macro, which implements an Algol-like minilanguage to describe complex iteration, while still enabling the use of standard Lisp operators.
The MetaOCaml preprocessor/language provides similar features for external Domain-Specific Programming Languages. This preprocessor takes the description of the semantics of a language (i.e., an interpreter) and, by combining compile-time interpretation and code generation, turns that definition into a compiler to the OCaml programming language, and from that language, either to bytecode or to native code.
Most preprocessors are specific to a particular data processing task (e.g., compiling the C language). A preprocessor may be promoted as being general purpose, meaning that it is not aimed at a specific usage or programming language, and is intended to be used for a wide variety of text processing tasks.
M4 is probably the most well known example of such a general purpose preprocessor, although the C preprocessor is sometimes used in a non-C specific role.
Examples: