A compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another computer language (the target language). Computer programs (also software programs, or just programs) are instructions for a Computer. A programming language is an Artificial language that can be used to write programs which control the behavior of a machine particularly a Computer. The original sequence is usually called the source code and the output called object code. In Computer science, source code (commonly just source or code) is any sequence of statements or declarations written in some Human-readable In Computer science, object code, or an object file, is the representation of code that a Compiler or Assembler generates by processing Commonly the output has a form suitable for processing by other programs (e. g. , a linker), but it may be a human-readable text file. A text file (sometimes spelled "textfile" is a kind of Computer file that is structured as a sequence of lines.
The most common reason for wanting to translate source code is to create an executable program. In Computing, an executable (file causes a computer "to perform indicated tasks according to encoded instructions," as opposed to a file that only contains The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e. In computing a high-level programming language is a Programming language with strong abstraction from the details of the computer g. , assembly language or machine language). See the terminology section below for information regarding inconsistent use of the terms assembly and assembler Machine code or machine language is a system of instructions and data executed directly by a Computer 's Central processing unit. A program that translates from a low level language to a higher level one is a decompiler. A decompiler is the name given to a Computer program that performs the reverse operation to that of a Compiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language. In Mathematics, Computer science and Logic, rewriting covers a wide range of potentially non-deterministic methods of replacing subterms of a
A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization. In Computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens In Computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program In Computer science and Linguistics, parsing, or more formally syntactic analysis, is the process of analyzing a sequence of tokens to In Computer science, code generation is the process by which a Compiler 's code generator converts some internal representation of Source code In Computing, optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources
Contents |
Software for early computers was exclusively written in assembly language for many years. Higher level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler. Computer data storage, often called storage or memory, refers to Computer components devices and recording media that retain digital
Towards the end of the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. Rear Admiral Grace Murray Hopper ( December 9 1906 – January 1 1992) was an American Computer scientist and United The A-0 system, written by Grace Hopper in 1951 and 1952 for the UNIVAC I, was the first Compiler ever developed for an electronic computer The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler, in 1957. Fortran (previously FORTRAN) is a general-purpose, procedural, imperative Programming language that is especially suited to John Warner Backus ( December 3, 1924 – March 17, 2007) was an American Computer scientist. International Business Machines Corporation abbreviated IBM and nicknamed "Big Blue", is a multinational Computer Technology COBOL was an early language to be compiled on multiple architectures, in 1960. COBOL (ˈkoʊbɒl is one of the oldest programming languages still in active use [1]
In many application domains the idea of using a higher level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex. A programming language is an Artificial language that can be used to write programs which control the behavior of a machine particularly a Computer.
Early compilers were written in assembly language. The first self-hosting compiler — capable of compiling its own source code in a high-level language — was created for Lisp by Hart and Levin at MIT in 1962. The term Self-hosting was coined to refer to the use of a Computer program as part of the Toolchain or Operating system that produces new versions of that Lisp (or LISP) is a family of Computer Programming languages with a long history and a distinctive fully parenthesized syntax [2] Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. Pascal is an influential imperative and procedural Programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small tags please moot on the talk page first! --> In Computing, C is a general-purpose cross-platform block structured Building a self-hosting compiler is a bootstrapping problem -- the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter. Bootstrapping is a term used in Computer science to describe the techniques involved in writing a Compiler (or assembler) in the target Programming In Computer science, an interpreter normally means a Computer program that executes, i
Compiler construction and compiler optimization are taught at universities as part of the computer science curriculum. Compiler optimization is the process of tuning the output of a Compiler to minimize or maximize some attribute of an Executable computer program Computer science (or computing science) is the study and the Science of the theoretical foundations of Information and Computation and their Such courses are usually supplemented with the implementation of a compiler for an educational programming language. An educational programming language is a Programming language that is designed primarily as a Learning instrument and not so much as a tool for writing real-world A well-documented example is Niklaus Wirth's PL/0 compiler, which Wirth used to teach compiler construction in the 1970s. Niklaus Emil Wirth (b February 15, 1934) is a Swiss computer scientist, best known for designing several Programming languages including There are at least two programming languages known as PL/0. One is a subset of IBM's general purpose Programming language PL/I. [3] In spite of its simplicity, the PL/0 compiler introduced several influential concepts to the field:
One method used to classify compilers is by the platform on which the generated code they produce executes. A recursive descent parser is a top-down parser built from a set of mutually-recursive procedures (or a non-recursive equivalent where each such Procedure In Computer science, Extended Backus–Naur Form (EBNF is a Metasyntax notation used to express Context-free grammars that is a formal way to describe In Computer science, code generation is the process by which a Compiler 's code generator converts some internal representation of Source code Bootstrapping is a term used in Computer science to describe the techniques involved in writing a Compiler (or assembler) in the target Programming In Computing, a platform describes some sort of Hardware architecture or Software framework (including Application frameworks, that allows This is known as the target platform.
A native or hosted compiler is one whose output is intended to directly run on the same type of computer and operating system as the compiler itself runs on. The output of a cross compiler is designed to run on a different platform. A cross compiler is a Compiler capable of creating Executable code for a platform other than the one on which the compiler is run Cross compilers are often used when developing software for embedded systems that are not intended to support a software development environment. An embedded system is a special-purpose Computer system designed to perform one or a few dedicated functions often with Real-time computing constraints
The output of a compiler that produces code for a virtual machine (VM) may or may not be executed on the same platform as the compiler that produced it. In Computer science, a virtual machine (VM is a Software implementation of a machine (computer that executes programs like a real machine For this reason such compilers are not usually classified as native or cross compilers.
Higher-level programming languages are generally divided for convenience into compiled languages and interpreted languages. A compiled language is a Programming language whose implementations are typically Compilers (translators which generate Machine code from In Computer programming an interpreted language is a Programming language whose implementation often takes the form of an interpreter. However, there is rarely anything about a language that requires it to be exclusively compiled, or exclusively interpreted. The categorization usually reflects the most popular or widespread implementations of a language — for instance, BASIC is thought of as an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.
In a sense, all languages are interpreted, with "execution" being merely a special case of interpretation performed by transistors switching on a CPU. In Electronics, a transistor is a Semiconductor device commonly used to amplify or switch electronic signals Modern trends toward just-in-time compilation and bytecode interpretation also blur the traditional categorizations. In Computing, just-in-time compilation ( JIT) also known as dynamic translation, is a technique for improving the runtime performance of a Computer Bytecode is a term which has been used to denote various forms of Instruction sets designed for efficient execution by a software interpreter as well as being suitable
There are exceptions. Some language specifications spell out that implementations must include a compilation facility; for example, Common Lisp. Common Lisp, commonly abbreviated CL, is a dialect of the Lisp Programming language, published in ANSI standard document Information Other languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder; for example, APL, SNOBOL4, and many scripting languages allow programs to construct arbitrary source code at runtime with regular string operations, and then execute that code by passing it to a special evaluation function. SNOBOL ( String Oriented Symbolic Language) is a Computer Programming language developed between 1962 and 1967 at AT&T Bell Laboratories To implement these features in a compiled language, programs must usually be shipped with a runtime library that includes a version of the compiler itself. In Computer programming, a runtime library is a special Program library used by a Compiler, to implement functions built into a Programming language
The output of some compilers may target hardware at a very low level. Hardware is a general term that refers to the physical artifacts of a Technology. For example a Field Programmable Gate Array (FPGA) or structured Application-specific integrated circuit (ASIC). FPGAs should not be confused with the Flip-chip pin grid array, a form of integrated circuit packaging Such compilers are said to be hardware compilers or synthesis tools because the programs they compile effectively control the final configuration of the hardware and how it operates; the output of the compilation are not instructions that are executed in sequence - only an interconnection of transistors or lookup tables. A silicon compiler is a Software system that takes a user's specifications and automatically generates an Integrated circuit (IC For example, XST is the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity, Synopsys and other vendors.
The approach taken to compiler design is affected by the complexity of the processing that needs to be done, the experience of the person(s) designing it, and the resources (eg, people and tools) available.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high quality output is required the design may be split into a number of relatively independent phases, or passes. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase by an improved one, or to insert new phases later (eg, additional optimizations).
The division of the compilation processes in phases (or passes) was championed by the Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon University. The Production Quality Compiler-Compiler Project (or PQCC was a long-term project led by William Wulf at Carnegie Mellon University to produce an industrial-strength Carnegie Mellon University (also known as CMU) is a private Research University in Pittsburgh, Pennsylvania, United This project introduced the terms front end, middle end (rarely heard today), and back end.
All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point at where these two ends meet is always open to debate. The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).
The middle end is usually designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.
The back end takes the output from the middle. It may perform more analysis, transformations and optimizations that are for a particular computer. Then, it generates code for a particular processor and OS.
This front-end/middle/back-end approach makes it possible to combine front ends for different languages with back ends for different CPUs. A programming language is an Artificial language that can be used to write programs which control the behavior of a machine particularly a Computer. Practical examples of this approach are the GNU Compiler Collection, LLVM, and the Amsterdam Compiler Kit, which have multiple front-ends, shared analysis and multiple back-ends. The GNU Compiler Collection (usually shortened to GCC) is a set of Compilers produced for various Programming languages by the GNU Project The Low Level Virtual Machine, generally known as LLVM, is a Compiler infrastructure written in C++, which is designed for Compile-time The Amsterdam Compiler Kit (ACK is a fast lightweight and Retargetable compiler suite and toolchain written by Andrew Tanenbaum and Ceriel Jacobs
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers are generally faster than multi-pass compilers. In Computer programming, a one-pass compiler is a Compiler that passes through the Source code of each Compilation unit only once A multi-pass compiler is a type of Compiler that processes the Source code or Abstract syntax tree of a program several times Many languages were designed so that they could be compiled in a single pass (e. g. , Pascal). Pascal is an influential imperative and procedural Programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small
In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. Compiler optimization is the process of tuning the output of a Compiler to minimize or maximize some attribute of an Executable computer program It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.
Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:
DOALL statements). The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. In Computing, an intermediate representation (IR is a Data structure that is constructed from input data to a program, and from which part or all of the It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type and scope. In Computer science, a symbol table is a Data structure used by a language translator such as a Compiler or interpreter, where each Identifier This is done over several phases, which includes some of the following:
The term back end is sometimes confused with code generator because of the overlapped functionality of generating assembly code. In Computer science, code generation is the process by which a Compiler 's code generator converts some internal representation of Source code Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.
The main phases of the back end include the following:
Compiler analysis is the prerequisite for any compiler optimization, and they tightly work together. For example, dependence analysis is crucial for loop transformation. In Compiler theory, dependence analysis produces execution-order constraints between statements/instructions In Compiler theory, loop optimization plays an important role in improving cache performance making effective use of parallel processing capabilities and reducing overheads
In addition, the scope of compiler analysis and optimizations vary greatly, from as small as a basic block to the procedure/function level, or even over the whole program (interprocedural optimization). In Computing, a basic block is code that has one entry point (i Interprocedural optimization (IPO is a Compiler technique used in Computer programming to improve performance in programs containing many frequently used functions Obviously, a compiler can potentially do a better job using a broader view. But that broad view is not free: large scope analysis and optimizations are very costly in terms of compilation time and memory space; this is especially true for interprocedural analysis and optimizations.
The existence of interprocedural analysis and optimizations is common in modern commercial compilers from HP, IBM, SGI, Intel, Microsoft, and Sun Microsystems. International Business Machines Corporation abbreviated IBM and nicknamed "Big Blue", is a multinational Computer Technology Silicon Graphics Inc (commonly initialised to SGI, historically sometimes referred to as Silicon Graphics Computer Systems or SGCS) is a company Microsoft Corporation is an American multinational Computer technology Corporation, which rose to dominate the Home computer Sun Microsystems Inc ( is a multinational vendor of Computers computer components Computer software, and Information technology services The open source GCC was criticized for a long time for lacking powerful interprocedural optimizations, but it is changing in this respect. The GNU Compiler Collection (usually shortened to GCC) is a set of Compilers produced for various Programming languages by the GNU Project Another good open source compiler with full analysis and optimization infrastructure is Open64, which is used by many organizations for research and commercial purposes. Open64 is an Open source, optimizing Compiler for the Intel IA-64 (Itanium AMD Opteron and Intel IA-32e architecture
Due to the extra time and space needed for compiler analysis and optimizations, some compilers skip them by default. Users have to use compilation options to explicitly tell the compiler which optimizations should be enabled.
Assembly language is not a high-level language and a program that compiles it is more commonly known as an assembler, with the inverse program known as a disassembler. See the terminology section below for information regarding inconsistent use of the terms assembly and assembler A disassembler is a Computer program that translates Machine language into Assembly language —the inverse operation to that of an assembler.
A program that translates from a low level language to a higher level one is a decompiler. A decompiler is the name given to a Computer program that performs the reverse operation to that of a Compiler.
A program that translates between high-level languages is usually called a language translator, source to source translator, language converter, or language rewriter. In Mathematics, Computer science and Logic, rewriting covers a wide range of potentially non-deterministic methods of replacing subterms of a The last term is usually applied to translations that do not involve a change of language.