This article is a part of GNE. It is freely redistributable under the terms of the GNU Free Documentation License, version 1.1 or any later version published by the Free Software Foundation.

Author: Andrew Main (Zefram) <zefram@fysh.org>

the C programming language

C is a high-level imperative programming language that characteristically provides a very close correspondence between high-level language statements and machine-language instructions. It was initially developed in the 1970s and was primarily used for systems programming, most notably for the implementation of the Unix operating system. In the 1980s its popularity broadened massively: it came into use in some form on almost all computing platforms, and it remained popular throughout the 1990s.

Two versions of C have been standardized, in 1989 and 1999. It has also spawned a profusion of dialects and descendant languages, most notably C++. Because of C's enormous popularity, parts of its syntax have also been adopted for inter-human communication among programmers.

More detailed information follows, organized into sections on history, dialects, technical aspects, and references.

history

The initial developments towards C occurred on the nascent Unix operating system in 1969. When the need was felt for a system programming language, Unix co-inventor Ken Thompson designed and implemented the language B, derived from BCPL, which is a descendant of CPL and ultimately of Algol 60. The name "B" was chosen as the first letter of "BCPL". Like BCPL, B is a typeless word-oriented language, and it inherits BCPL's block structure. However, B had to be stripped down to fit into the DEC PDP-7 on which Unix was being developed. Its syntax is therefore largely novel, and very terse.

In 1970, the Unix project moved to the DEC PDP-11, a byte-oriented machine. Although B was ported to the PDP-11, its word-oriented architectural assumptions made it an inconvenient language to use there. Starting in 1971, Unix co-inventor Dennis Ritchie extended the B language to support bytes as well as words. The subsequent addition of structures and more conventional array semantics, followed by generalizations of the type system, yielded the earliest incarnation of C. Ritchie deliberately left unanswered the question of whether the name "C" was chosen as the next letter of the alphabet after "B" or as the next letter of "BCPL".

The remainder of C's most formative development occurred in 1972-1973, and in 1973 Unix itself was finally rewritten in C. Further development, primarily in the type system, occurred up to 1980. During this period, an intermediate version of the language was described in the book "The C Programming Language". This version of the language became a de facto standard, named "K&R C" after the authors of the book, Brian Kernighan and Dennis Ritchie.

During the 1980s, C compilers spread widely, and C became an extremely popular language. This spread raised issues of portability that had previously gone largely unaddressed. Together with the deficiencies of the K&R standard (which was already out of date), these issues created a recognized need for more formal standardization. An ANSI standardization effort, from 1983 to 1989, led to a formal standard, ANSI X3.159-1989, which was subsequently adopted by ISO as ISO/IEC 9899:1990.

The 1989 standard went slightly beyond codifying the language already in use, by continuing the language development a little, in particular making the type system more complete. The resulting "ANSI C" or "C89" dialect was progressively adopted during the early 1990s, so that by around 1995 K&R C was viewed as a historical artifact, rather than as a current language.

A second round of standardization, starting in 1995, developed the language further, principally by adding syntactic features for the convenience of programmers. This resulted in the "C99" standard, ISO/IEC 9899:1999, which to date (2001) has yet to make any significant impact.

dialects

The mainline C dialects (K&R C, ANSI C and C99) are discussed in the history section. Although most C compilers have always made a handful of changes to the C language that they implement, few such dialects have gained significance in their own right. One of the few, and probably the most significant current non-standard C dialect, is GNU C, which is the C implemented by the GCC C compiler. Although, like other compiler-specific dialects, many of its features are tied to the internals of the compiler, it anticipated or even originated some of the features of C99, such as variable-length arrays and the long long type.

There is also a large group of languages derived from C. Because of C's familiarity and the ease of access to compilers, it has been used as the basis for many experimental and some non-experimental projects in adding specialized features to an existing general-purpose language. A few such projects have grown to the status of being independent languages. There are also some true descendant languages that do not maintain compatibility with C. In approximate chronological order, here are the most significant of both types of descendant:

technical aspects

C is a high-level imperative programming language, designed to be statically compiled into machine code for byte-oriented architectures. Many of its basic features are typical of languages in this class. For example, expressions and variables are strongly and statically typed; there is strictly lexical scoping with no function closures; dynamic storage allocation is entirely manual; and there are many details of the language semantics that vary with the target architecture, for efficiency.

There is also much about C that is atypical. The following paragraphs discuss C's unusual or original features.

C is, at its core, a fairly small language, due to its origin in very small computers. Because of its closeness to the machine, it is easily implemented, with C code naturally compiling into a comparable amount of machine code. There are no I/O primitives in the language itself; all interaction with the environment is relegated to a library of separately compiled functions. All of these aspects make it well suited to embedded computing, where it is indeed very popular.
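As a small illustration of I/O living in the library rather than in the language, the sketch below writes a line of text using only ordinary function calls declared in the standard header <stdio.h>. The helper name write_line is hypothetical and chosen just for this example.

```c
#include <stdio.h>   /* declarations for the standard I/O library */

/* Hypothetical helper: writes text followed by a newline to stream.
   fputs and fputc are ordinary library functions, not language
   primitives. Returns 0 on success and -1 if either call fails. */
int write_line(FILE *stream, const char *text)
{
    if (fputs(text, stream) == EOF)
        return -1;
    if (fputc('\n', stream) == EOF)
        return -1;
    return 0;
}
```

A call such as write_line(stdout, "hello") would send the text to standard output. On an embedded target with no such library, the language itself would still be fully usable.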

Because so little is built into the language itself, those operators that are part of the language are reused for many purposes. The composition of complex data types from a handful of basic data types that are native to the target architecture is very transparent. The language gives a number of partial guarantees about the representation of certain data types, and the relationships between them, allowing the programmer to implement many operations by direct manipulation of bits in memory, essentially using the C language as a portable machine language. The language imposes few restrictions to impede the programmer in doing such things, and there is a history of lax enforcement by compilers of the official restrictions.
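The kind of direct bit manipulation described above might look like the following sketch. It assumes only that unsigned integers use a pure binary representation and that any object may be inspected through a pointer to unsigned char. The function names are hypothetical, invented for this illustration.

```c
/* Hypothetical helper: returns word with the given bit set.
   The caller must keep bit below the width of unsigned int. */
unsigned int set_bit(unsigned int word, unsigned int bit)
{
    return word | (1u << bit);
}

/* Hypothetical helper: returns 1 if the given bit of word is set,
   0 otherwise. The same restriction on bit applies. */
int test_bit(unsigned int word, unsigned int bit)
{
    return (word >> bit) & 1u;
}

/* Looks at the first byte of an unsigned int in memory. Returns 1 if
   the least significant byte is stored first (little-endian layout),
   0 otherwise. The answer depends on the target architecture. */
int is_little_endian(void)
{
    unsigned int probe = 1u;
    const unsigned char *bytes = (const unsigned char *)&probe;
    return bytes[0] == 1;
}
```

The first two functions behave identically on every conforming implementation. The third deliberately exposes an architecture-dependent detail, which is the sort of thing the language permits but does not guarantee.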

A particularly notable area in which C promotes direct manipulation of data structures is arrays. C guarantees that arrays are laid out contiguously, and so has remarkably flexible array handling, due to the flexibility of its pointer handling and direct memory manipulation. In fact, for historical reasons, even very basic array operations require the use of pointers; array indexing, for example, is made up of pointer arithmetic, with only a little syntactic sugar.
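To make the relationship between indexing and pointer arithmetic concrete, here is a minimal sketch. The expression a[i] is defined by the language to mean *(a + i), so the functions below (whose names are hypothetical) compute the same results in different notations.

```c
#include <stddef.h>

/* Sums n elements using index notation. */
int sum_indexed(const int *a, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];          /* a[i] means *(a + i) */
    return total;
}

/* Sums n elements by walking a pointer across the array. */
int sum_pointer(const int *a, size_t n)
{
    int total = 0;
    const int *p;
    const int *end = a + n;     /* one past the last element */
    for (p = a; p != end; p++)
        total += *p;
    return total;
}

/* Because addition commutes, i[a] is also accepted and means the
   same as a[i]. It is shown only as a curiosity. */
int element_commuted(const int *a, size_t i)
{
    return i[a];
}
```

Note that both summing functions take a pointer parameter: when an array is passed to a function, what the function receives is a pointer to its first element.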

C has flow control primitives mostly typical of languages in its class, with the block structuring derived from BCPL. It includes the goto of imperative languages; goto is not strictly necessary, given the code structuring capabilities, but its use is not as controversial among C programmers as it is among, for example, programmers of Pascal-derived languages. Unusually for its class, though, C has no for loop in the conventional sense; its for statement is instead syntactic sugar for a while loop, leaving loop counter advancement for the programmer to specify.
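The equivalence between the for statement and a while loop can be sketched as follows. The example walks a linked list, where the "advancement" step is not a counter increment at all. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Counts the nodes using a for statement. The three clauses are
   arbitrary expressions; here the third follows a pointer. */
size_t length_for(const struct node *head)
{
    size_t n = 0;
    const struct node *p;
    for (p = head; p != NULL; p = p->next)
        n++;
    return n;
}

/* The same loop written as the while loop that the for statement
   abbreviates: initialize, test, body, then advance. */
size_t length_while(const struct node *head)
{
    size_t n = 0;
    const struct node *p = head;
    while (p != NULL) {
        n++;
        p = p->next;
    }
    return n;
}
```

The two forms differ only when the body contains a continue statement, which in a for loop still executes the advancement expression.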

Another unusual flow control feature of C is the relatively unstructured nature of its switch statement (its version of what is usually referred to as a case statement). Instead of selecting between alternative code sequences to execute, it selects between a set of labels to jump to; while in syntax it looks much like a conventional case statement, its semantics are more akin to the `computed goto' of FORTRAN. Full use of this facility is controversial, as particularly noted in the case of the infamous `Duff's device', which uses the freedom in label placement to interleave the structures of a switch statement and a while loop.
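The interleaving of switch and loop can be sketched roughly as below. This is an illustrative memory-to-memory variant in the style of Duff's device, not a reproduction of the original code, and the name duff_copy is hypothetical. The case labels sit inside the body of the do-while loop, so the switch jumps into the middle of the first pass to handle the leftover bytes, and every later pass copies a full eight.

```c
#include <stddef.h>

/* Copies count bytes from from to to, eight per loop pass.
   The regions must not overlap. */
void duff_copy(char *to, const char *from, size_t count)
{
    size_t passes;

    if (count == 0)
        return;
    passes = (count + 7) / 8;   /* number of loop passes, rounded up */

    switch (count % 8) {        /* jump to a label inside the loop */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```

For a count of 10, the switch enters at case 2 and copies two bytes, then the loop runs one full pass of eight. Modern compilers generally unroll simple loops themselves, so code like this is mainly of interest as a demonstration of how loosely the switch statement is structured.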

Another noted feature of C's syntax is its declarations, which avoid requiring additional keywords to describe pointers, arrays, and functions, by reusing the syntax for dereferencing pointers, indexing arrays, and calling functions. A C declaration consists of a base type and a `declarator', which gives the name of the object being declared and the means of deriving its type from the base type, in the approximate form of an expression involving the named object and having the type of the declared base type. This feature originated early in C's development, when multiply-derived types were added to the language, and is one of the features consistently copied in languages derived from C.
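A few declarations illustrate how the declarator mirrors the way the object is used. In each case below the base type is int, and the comment gives the expression that has that type. The function names are hypothetical and exist only to make the sketch self-contained.

```c
/* Hypothetical function used as a target for the function pointer. */
static int twice(int x)
{
    return 2 * x;
}

/* Declares one object of each derived kind, then uses each one in
   the same form in which it was declared. */
int declarator_demo(void)
{
    int value = 5;
    int a[3] = {1, 2, 3};       /* a[i] is an int: array of int        */
    int *p = &value;            /* *p is an int: pointer to int        */
    int (*fp)(int) = twice;     /* (*fp)(x) is an int: pointer to
                                   a function returning int            */
    int *ap[3];                 /* *ap[i] is an int: array of pointers */
    int (*pa)[3] = &a;          /* (*pa)[i] is an int: pointer to
                                   an array                            */

    ap[0] = &a[0];
    ap[1] = &a[1];
    ap[2] = &a[2];

    /* 5 + 2 + 8 + 3 + 1 */
    return *p + a[1] + (*fp)(4) + *ap[2] + (*pa)[0];
}
```

The last two declarations differ only in their parentheses, exactly as the corresponding expressions do: indexing binds more tightly than dereferencing, in declarators as in expressions.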

references

History