C Programming Tutorial

1. Introduction
C is a computer language available on the GCOS and UNIX operating systems at Murray Hill and (in preliminary form) on OS/360 at Holmdel. C lets you write your programs clearly and simply -- it has decent control flow facilities so your code can be read straight down the page, without labels or GOTO's; it lets you write code that is compact without being too cryptic; it encourages modularity and good program organization; and it provides good data-structuring facilities.
This memorandum is a tutorial to make learning C as painless as possible. The first part concentrates on the central features of C; the second part discusses those parts of the language which are useful (usually for getting more efficient and smaller code) but which are not necessary for the new user. This is not a reference manual. Details and special cases will be skipped ruthlessly, and no attempt will be made to cover every language feature. The order of presentation is hopefully pedagogical instead of logical. Users who would like the full story should consult the "C Reference Manual" by D. M. Ritchie [1], which should be read for details anyway. Runtime support is described in [2] and [3]; you will have to read one of these to learn how to compile and run a C program.
We will assume that you are familiar with the mysteries of creating files, text editing, and the like in the operating system you run on, and that you have programmed in some language before.
2. A Simple C Program
main( ) {
printf("hello, world");
}
A C program consists of one or more functions, which are similar to the functions and subroutines of a Fortran program or the procedures of PL/I, and perhaps some external data definitions. main is such a function, and in fact all C programs must have a main. Execution of the program begins at the first statement of main. main will usually invoke other functions to perform its job, some coming from the same program, and others from libraries.
One method of communicating data between functions is by arguments. The parentheses following the function name surround the argument list; here main is a function of no arguments, indicated by ( ). The {} enclose the statements of the function. Individual statements end with a semicolon but are otherwise free-format.
printf is a library function which will format and print output on the terminal (unless some other destination is specified). In this case it prints
hello, world
A function is invoked by naming it, followed by a list of arguments in parentheses. There is no CALL statement as in Fortran or PL/I.
3. A Working C Program; Variables; Types and Type Declarations
Here's a bigger program that adds three integers and prints their sum.
main( ) {
int a, b, c, sum;
a = 1; b = 2; c = 3;
sum = a + b + c;
printf("sum is %d", sum);
}
Arithmetic and the assignment statements are much the same as in Fortran (except for the semicolons) or PL/I. The format of C programs is quite free. We can put several statements on a line if we want, or we can split a statement among several lines if it seems desirable. The split may be between any of the operators or variables, but not in the middle of a name or operator. As a matter of style, spaces, tabs, and newlines should be used freely to enhance readability.
C has four fundamental types of variables:
 int integer (PDP-11: 16 bits; H6070: 36 bits; IBM360: 32 bits)
 char one byte character (PDP-11, IBM360: 8 bits; H6070: 9 bits)
 float single-precision floating point
 double double-precision floating point
There are also arrays and structures of these basic types, pointers to them and functions that return them, all of which we will meet shortly.
All variables in a C program must be declared, although this can sometimes be done implicitly by context. Declarations must precede executable statements. The declaration
int a, b, c, sum;
declares a, b, c, and sum to be integers.
Variable names have one to eight characters, chosen from A-Z, a-z, 0-9, and _, and start with a non-digit. Stylistically, it's much better to use only a single case and give functions and external variables names that are unique in the first six characters. (Function and external variable names are used by various assemblers, some of which are limited in the size and case of identifiers they can handle.) Furthermore, keywords and library functions may only be recognized in one case.
4. Constants
We have already seen decimal integer constants in the previous example -- 1, 2, and 3. Since C is often used for system programming and bit-manipulation, octal numbers are an important part of the language. In C, any number that begins with 0 (zero!) is an octal integer (and hence can't have any 8's or 9's in it). Thus 0777 is an octal constant, with decimal value 511.
A ``character'' is one byte (an inherently machine-dependent concept). Most often this is expressed as a character constant, which is one character enclosed in single quotes. However, it may be any quantity that fits in a byte, as in flags below:
char quest, newline, flags;
quest = '?';
newline = '\n';
flags = 077;
The sequence `\n' is C notation for ``newline character'', which, when printed, skips the terminal to the beginning of the next line. Notice that `\n' represents only a single character. There are several other ``escapes'' like `\n' for representing hard-to-get or invisible characters, such as `\t' for tab, `\b' for backspace, `\0' for end of file, and `\\' for the backslash itself.
float and double constants are discussed in section 26.
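As a quick check on the constants of this section, here is a small sketch. It is written in present-day standard C, which is an assumption on our part, since the dialect in this tutorial is older; the function names are made up for the illustration.

```c
#include <stdio.h>

/* Returns the value of the octal constant 0777 (hypothetical helper). */
int octal_example(void)
{
    return 0777;            /* leading zero means octal: 7*64 + 7*8 + 7 */
}

/* Returns the number of bytes occupied by the character '\n'. */
int newline_size(void)
{
    char newline = '\n';    /* one character, despite the two-symbol spelling */
    return (int)sizeof newline;
}

/* Prints both values; call it from main to see them. */
void show_constants(void)
{
    printf("0777 is %d decimal\n", octal_example());
    printf("newline occupies %d byte\n", newline_size());
}
```

Calling show_constants( ) from main prints the two values; the first one should agree with the 511 quoted above.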
5. Simple I/O -- getchar, putchar, printf
main( ) {
char c;
c = getchar( );
putchar(c);
}
getchar and putchar are the basic I/O library functions in C. getchar fetches one character from the standard input (usually the terminal) each time it is called, and returns that character as the value of the function. When it reaches the end of whatever file it is reading, thereafter it returns the character represented by `\0' (ascii NUL, which has value zero). We will see how to use this very shortly.
putchar puts one character out on the standard output (usually the terminal) each time it is called. So the program above reads one character and writes it back out. By itself, this isn't very interesting, but observe that if we put a loop around this, and add a test for end of file, we have a complete program for copying one file to another.
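The loop just described might be sketched as follows. This is present-day standard C rather than the dialect of this tutorial: current libraries report end of file with the value EOF from <stdio.h> instead of `\0', so the character is held in an int. The name copy_stream is hypothetical, not a library function.

```c
#include <stdio.h>

/* Copies every character from in to out and returns how many were copied.
   copy_stream is a made-up name for this illustration. */
long copy_stream(FILE *in, FILE *out)
{
    int c;                          /* int, so EOF can be told apart from data */
    long count = 0;

    while ((c = getc(in)) != EOF) { /* stop at end of file */
        putc(c, out);
        count++;
    }
    return count;
}
```

With stdin and stdout as the two arguments, this is the getchar/putchar loop with an end-of-file test around it.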
printf is a more complicated function for producing formatted output. We will talk about only the simplest use of it. Basically, printf uses its first argument as formatting information, and any successive arguments as variables to be output. Thus
printf ("hello, world\n");
is the simplest use. The string ``hello, world\n'' is printed out. No formatting information, no variables, so the string is dumped out verbatim. The newline is necessary to put this out on a line by itself. (The construction
"hello, world\n"
is really an array of chars. More about this shortly.)
More complicated, if sum is 6,
printf ("sum is %d\n", sum);
prints
sum is 6
Within the first argument of printf, the characters ``%d'' signify that the next argument in the argument list is to be printed as a base 10 number.
Other useful formatting commands are ``%c'' to print out a single character, ``%s'' to print out an entire string, and ``%o'' to print a number as octal instead of decimal (no leading zero). For example,
n = 511;
printf ("What is the value of %d in octal? ", n);
printf ("%s! %d decimal is %o octal\n", "Right", n, n);
prints
What is the value of 511 in octal? Right! 511 decimal is 777 octal
Notice that there is no newline at the end of the first output line. Successive calls to printf (and/or putchar, for that matter) simply put out characters. No newlines are printed unless you ask for them. Similarly, on input, characters are read one at a time as you ask for them. Each line is generally terminated by a newline (\n), but there is otherwise no concept of record.
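To experiment with the format codes without a terminal, one can format into a character array instead. The sketch below assumes present-day standard C and its library function snprintf; describe_octal is a hypothetical helper. The number is passed as unsigned because that is what ``%o'' expects in current C.

```c
#include <stdio.h>

/* Writes "Right! <n> decimal is <n in octal> octal" into buf.
   Returns what snprintf returns: the length the full text needs. */
int describe_octal(char *buf, size_t size, unsigned int n)
{
    return snprintf(buf, size, "%s! %u decimal is %o octal", "Right", n, n);
}
```

Note that no newline appears in the result, because none was asked for in the format.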
6. If; relational operators; compound statements
The basic conditional-testing statement in C is the if statement:
c = getchar( );
if( c == '?' )
printf("why did you type a question mark?\n");
The simplest form of if is
if (expression) statement
The condition to be tested is any expression enclosed in parentheses. It is followed by a statement. The expression is evaluated, and if its value is non-zero, the statement is executed. There's an optional else clause, to be described soon.
The character sequence `==' is one of the relational operators in C; here is the complete set:
== equal to (.EQ. to Fortraners)
!= not equal to
> greater than
< less than
>= greater than or equal to
<= less than or equal to
The value of ``expression relation expression'' is 1 if the relation is true, and 0 if false. Don't forget that the equality test is `=='; a single `=' causes an assignment, not a test, and invariably leads to disaster.
Tests can be combined with the operators `&&' (AND), `||' (OR), and `!' (NOT). For example, we can test whether a character is blank or tab or newline with
if( c==' ' || c=='\t' || c=='\n' ) ...
C guarantees that `&&' and `||' are evaluated left to right -- we shall soon see cases where this matters.
One of the nice things about C is that the statement part of an if can be made arbitrarily complicated by enclosing a set of statements in {}. As a simple example, suppose we want to ensure that a is bigger than b, as part of a sort routine. The interchange of a and b takes three statements in C, grouped together by {}:
if (a < b) {
t = a;
a = b;
b = t;
}
...
aflag = bflag = cflag = 0;
if( argc > 1 && argv[1][0] == '-' ) {
for( i=1; (c=argv[1][i]) != '\0'; i++ )
if( c=='a' )
aflag++;
else if( c=='b' )
bflag++;
else if( c=='c' )
cflag++;
else
printf("%c?\n", c);
--argc;
++argv;
}
...
There are several things worth noticing about this code. First, there is a real need for the left-to-right evaluation that && provides; we don't want to look at argv[1] unless we know it's there. Second, the statements
--argc;
++argv;
let us march along the argument list by one position, so we can skip over the flag argument as if it had never existed; the rest of the program is independent of whether or not there was a flag argument. This only works because argv is a pointer which can be incremented.
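The flag scan can be packaged as a function, sketched here in present-day standard C. The struct flags and the name parse_flags are assumptions made for this illustration; unknown letters are counted rather than printed so the function has no output of its own.

```c
/* Counts of each flag letter seen; names are hypothetical. */
struct flags {
    int a, b, c;
    int unknown;    /* letters other than a, b, c */
};

/* Examines argv[1] if it begins with '-', tallies its letters into *f,
   and returns the number of arguments consumed (0 or 1). */
int parse_flags(int argc, char **argv, struct flags *f)
{
    f->a = f->b = f->c = f->unknown = 0;
    if (argc > 1 && argv[1][0] == '-') {    /* && never looks at argv[1] if argc <= 1 */
        for (int i = 1; argv[1][i] != '\0'; i++) {
            char ch = argv[1][i];
            if (ch == 'a')
                f->a++;
            else if (ch == 'b')
                f->b++;
            else if (ch == 'c')
                f->c++;
            else
                f->unknown++;
        }
        return 1;
    }
    return 0;
}
```

A caller can subtract the return value from argc and add it to argv, which has the same effect as the --argc and ++argv above.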
19. The Switch Statement; Break; Continue
The switch statement can be used to replace the multi-way test we used in the last example. When the tests are like this:
if( c == 'a' ) ...
else if( c == 'b' ) ...
else if( c == 'c' ) ...
else ...
testing a value against a series of constants, the switch statement is often clearer and usually gives better code. Use it like this:
switch( c ) {

case 'a':
aflag++;
break;
case 'b':
bflag++;
break;
case 'c':
cflag++;
break;
default:
printf("%c?\n", c);
break;
}
The case statements label the various actions we want; default gets done if none of the other cases are satisfied. (A default is optional; if it isn't there, and none of the cases match, you just fall out the bottom.)
The break statement in this example is new. It is there because the cases are just labels, and after you do one of them, you fall through to the next unless you take some explicit action to escape. This is a mixed blessing. On the positive side, you can have multiple cases on a single statement; we might want to allow both upper and lower
case 'a': case 'A': ...

case 'b': case 'B': ...
etc.
But what if we just want to get out after doing case `a' ? We could get out of a case of the switch with a label and a goto, but this is really ugly. The break statement lets us exit without either goto or label.
switch( c ) {

case 'a':
aflag++;
break;
case 'b':
bflag++;
break;
...
}
/* the break statements get us here directly */
The break statement also works in for and while statements; it causes an immediate exit from the loop.
The continue statement works only inside for's and while's; it causes the next iteration of the loop to be started. This means it goes to the increment part of the for and the test part of the while. We could have used a continue in our example to get on with the next iteration of the for, but it seems clearer to use break instead.
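Here is a small sketch that puts switch, break and continue side by side, in present-day standard C. The function count_vowels is a made-up example, not part of any library; it counts vowels in either case and skips digits.

```c
/* Counts the vowels in s, upper or lower case (hypothetical example). */
int count_vowels(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9')
            continue;           /* go straight to the s++ step of the for */
        switch (*s) {
        case 'a': case 'A':     /* several labels share one statement */
        case 'e': case 'E':
        case 'i': case 'I':
        case 'o': case 'O':
        case 'u': case 'U':
            n++;
            break;              /* leaves the switch, not the for loop */
        default:
            break;
        }
    }
    return n;
}
```

Notice that a break inside the switch only gets out of the switch; the enclosing for carries on with the next character.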
20. Structures
The main use of structures is to lump together collections of disparate variable types, so they can conveniently be treated as a unit. For example, if we were writing a compiler or assembler, we might need for each identifier information like its name (a character array), its source line number (an integer), some type information (a character, perhaps), and probably a usage count (another integer).
char id[10];
int line;
char type;
int usage;
We can make a structure out of this quite easily. We first tell C what the structure will look like, that is, what kinds of things it contains; after that we can actually reserve storage for it, either in the same statement or separately. The simplest thing is to define it and allocate storage all at once:
struct {
char id[10];
int line;
char type;
int usage;
} sym;
This defines sym to be a structure with the specified shape; id, line, type and usage are members of the structure. The way we refer to any particular member of the structure is
structure-name . member
as in
sym.type = 077;
if( sym.usage == 0 ) ...
while( sym.id[j++] ) ...
etc.
Although the names of structure members never stand alone, they still have to be unique; there can't be another id or usage in some other structure.
So far we haven't gained much. The advantages of structures start to come when we have arrays of structures, or when we want to pass complicated data layouts between functions. Suppose we wanted to make a symbol table for up to 100 identifiers. We could extend our definitions like
char id[100][10];
int line[100];
char type[100];
int usage[100];
but a structure lets us rearrange this spread-out information so all the data about a single identifier is collected into one lump:
struct {
char id[10];
int line;
char type;
int usage;
} sym[100];
This makes sym an array of structures; each array element has the specified shape. Now we can refer to members as
sym[i].usage++; /* increment usage of i-th identifier */
for( j=0; sym[i].id[j++] != '\0'; ) ...
etc.
Thus to print a list of all identifiers that haven't been used, together with their line number,
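for( i=0; i<nsym; i++ )
if( sym[i].usage == 0 )
printf("%d\t%s\n", sym[i].line, sym[i].id);
...
main( ) {
...
if( (index = lookup(newname)) >= 0 )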
sym[index].usage++; /* already there ... */
else
install(newname, newline, newtype);
...
}

lookup(s)
char *s; {
int i;
extern struct {
char id[10];
int line;
char type;
int usage;
} sym[ ];

for( i=0; i<nsym; i++ )
if( compar(s, sym[i].id) > 0 )
return(i);
return(-1);
}

compar(s1,s2) /* return 1 if s1==s2, 0 otherwise */
char *s1, *s2; {
while( *s1++ == *s2 )
if( *s2++ == '\0' )
return(1);
return(0);
}
The declaration of the structure in lookup isn't needed if the external definition precedes its use in the same source file, as we shall see in a moment.
Now what if we want to use pointers?
struct symtag {
char id[10];
int line;
char type;
int usage;
} sym[100], *psym;

psym = &sym[0]; /* or psym = sym; */
This makes psym a pointer to our kind of structure (the symbol table), then initializes it to point to the first element of sym.
Notice that we added something after the word struct: a ``tag'' called symtag. This puts a name on our structure definition so we can refer to it later without repeating the definition. It's not necessary but useful. In fact we could have said
struct symtag {
... structure definition
};
which wouldn't have assigned any storage at all, and then said
struct symtag sym[100];
struct symtag *psym;
which would define the array and the pointer. This could be condensed further, to
struct symtag sym[100], *psym;
The way we actually refer to a member of a structure by a pointer is like this:
ptr -> structure-member
The symbol `->' means we're pointing at a member of a structure; `->' is only used in that context. ptr is a pointer to the (base of) a structure that contains the structure member. The expression ptr->structure-member refers to the indicated member of the pointed-to structure. Thus we have constructions like:
psym->type = 1;
psym->id[0] = 'a';
and so on.
For more complicated pointer expressions, it's wise to use parentheses to make it clear who goes with what. For example,
struct { int x, *y; } *p;
p->x++ increments x
++p->x so does this!
(++p)->x increments p before getting x
*p->y++ uses y as a pointer, then increments it
*(p->y)++ so does this
*(p++)->y uses y as a pointer, then increments p
The way to remember these is that ->, . (dot), ( ) and [ ] bind very tightly. An expression involving one of these is treated as a unit. p->x, a[i], y.x and f(b) are names exactly as abc is.
If p is a pointer to a structure, any arithmetic on p takes into account the actual size of the structure. For instance, p++ increments p by the correct amount to get the next element of the array of structures. But don't assume that the size of a structure is the sum of the sizes of its members -- because of alignments of different sized objects, there may be ``holes'' in a structure.
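Both points can be checked with a short sketch in present-day standard C. The helper names are hypothetical, and the amount of padding (if any) depends on the machine and compiler, so the code only compares sizes and does not assume particular numbers.

```c
#include <stddef.h>

struct symtag {
    char id[10];
    int line;
    char type;
    int usage;
};

/* Sum of the member sizes, ignoring any holes the compiler inserts. */
size_t members_total(void)
{
    return sizeof(char[10]) + sizeof(int) + sizeof(char) + sizeof(int);
}

/* Distance in bytes between consecutive array elements, measured by
   stepping a pointer with ++ (hypothetical helper). */
size_t element_stride(void)
{
    static struct symtag table[2];
    struct symtag *p = table;
    char *before = (char *)p;

    p++;                                /* advances by one whole structure */
    return (size_t)((char *)p - before);
}
```

On any machine element_stride( ) equals sizeof(struct symtag), and that is at least members_total( ); whether it is strictly larger depends on the alignment rules in force.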
Enough theory. Here is the lookup example, this time with pointers.
struct symtag {
char id[10];
int line;
char type;
int usage;
} sym[100];

main( ) {
struct symtag *lookup( );
struct symtag *psym;
...
if( (psym = lookup(newname)) ) /* non-zero pointer */
psym -> usage++; /* means already there */
else
install(newname, newline, newtype);
...
}

struct symtag *lookup(s)
char *s; {
struct symtag *p;
for( p=sym; p < &sym[nsym]; p++ )
if( compar(s, p->id) > 0)
return(p);
return(0);
}
The function compar doesn't change: `p->id' refers to a string.
In main we test the pointer returned by lookup against zero, relying on the fact that a pointer is by definition never zero when it really points at something. The other pointer manipulations are trivial.
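For comparison, here is roughly how the pointer version might look in present-day standard C. This is a sketch under assumptions: the install function is our own stand-in (the text never shows its body), the library function strcmp takes the place of compar, and NULL is the modern spelling of the zero pointer.

```c
#include <string.h>

struct symtag {
    char id[10];
    int line;
    char type;
    int usage;
};

static struct symtag sym[100];
static int nsym = 0;        /* number of entries in use */

/* Appends a symbol and returns a pointer to it, or NULL if the table is
   full or the name does not fit. A stand-in for the install( ) in the text. */
struct symtag *install(const char *name, int line, char type)
{
    if (nsym >= 100 || strlen(name) >= sizeof sym[0].id)
        return NULL;
    struct symtag *p = &sym[nsym++];
    strcpy(p->id, name);
    p->line = line;
    p->type = type;
    p->usage = 0;
    return p;
}

/* Returns a pointer to the entry named s, or NULL if there is none. */
struct symtag *lookup(const char *s)
{
    for (struct symtag *p = sym; p < &sym[nsym]; p++)
        if (strcmp(s, p->id) == 0)      /* strcmp returns 0 on a match */
            return p;
    return NULL;
}
```

The test in main stays the same in spirit: a non-NULL result from lookup means the name is already there, so its usage count is bumped; otherwise install is called.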
The only complexity is the set of lines like
struct symtag *lookup( );
This brings us to an area that we will treat only hurriedly; the question of function types. So far, all of our functions have returned integers (or characters, which are much the same). What do we do when the function returns something else, like a pointer to a structure? The rule is that any function that doesn't return an int has to say explicitly what it does return. The type information goes before the function name (which can make the name hard to see).
Examples:
char f(a)
int a; {
...
}

int *g( ) { ... }

struct symtag *lookup(s) char *s; { ... }
The function f returns a character, g returns a pointer to an integer, and lookup returns a pointer to a structure that looks like symtag. And if we're going to use one of these functions, we have to make a declaration where we use it, as we did in main above.
Notice the parallelism between the declarations
struct symtag *lookup( );
struct symtag *psym;
In effect, this says that lookup( ) and psym are both used the same way -- as a pointer to a structure -- even though one is a variable and the other is a function.
21. Initialization of Variables
An external variable may be initialized at compile time by following its name with an initializing value when it is defined. The initializing value has to be something whose value is known at compile time, like a constant.
int x 0; /* "0" could be any constant */
int a 'a';
char flag 0177;
int *p &y[1]; /* p now points to y[1] */
An external array can be initialized by following its name with a list of initializations enclosed in braces:
int x[4] {0,1,2,3}; /* makes x[i] = i */
int y[ ] {0,1,2,3}; /* makes y big enough for 4 values */
char *msg "syntax error\n"; /* braces unnecessary here */
char *keyword[ ]{
"if",
"else",
"for",
"while",
"break",
"continue",
0
};
This last one is very useful -- it makes keyword an array of pointers to character strings, with a zero at the end so we can identify the last element easily. A simple lookup routine could scan this until it either finds a match or encounters a zero keyword pointer:
lookup(str) /* search for str in keyword[ ] */
char *str; {
int i,j,r;
for( i=0; keyword[i] != 0; i++) {
for( j=0; (r=keyword[i][j]) == str[j] && r != '\0'; j++ );
if( r == str[j] )
return(i);
}
return(-1);
}
Sorry -- neither local variables nor structures can be initialized.
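In present-day standard C the initializer is introduced by an `=' sign, and local variables and structures may be initialized as well. As a sketch under that assumption, the keyword table and its lookup might be written like this; keyword_index is a hypothetical name, and strcmp from the standard library replaces the inner comparison loop.

```c
#include <string.h>

/* Table of keywords, ended by a null pointer so the scan knows where to stop. */
static const char *keyword[] = {
    "if", "else", "for", "while", "break", "continue", NULL
};

/* Returns the index of str in keyword[], or -1 if it is not there. */
int keyword_index(const char *str)
{
    for (int i = 0; keyword[i] != NULL; i++)
        if (strcmp(keyword[i], str) == 0)
            return i;
    return -1;
}
```

Adding a keyword means adding one string before the NULL; the loop needs no change because it never mentions the table's length.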
22. Scope Rules: Who Knows About What
A complete C program need not be compiled all at once; the source text of the program may be kept in several files, and previously compiled routines may be loaded from libraries. How do we arrange that data gets passed from one routine to another? We have already seen how to use function arguments and values, so let us talk about external data. Warning: the words declaration and definition are used precisely in this section; don't treat them as the same thing.
A major shortcut exists for making extern declarations. If the definition of a variable appears before its use in some function, no extern declaration is needed within the function. Thus, if a file contains
f1( ) { ... }

int foo;

f2( ) { ... foo = 1; ... }

f3( ) { ... if ( foo ) ... }
no declaration of foo is needed in either f2 or f3, because the external definition of foo appears before them. But if f1 wants to use foo, it has to contain the declaration
f1( ) {
extern int foo;
...
}
This is true also of any function that exists on another file; if it wants foo it has to use an extern declaration for it. (If somewhere there is an extern declaration for something, there must also eventually be an external definition of it, or you'll get an ``undefined symbol'' message.)
There are some hidden pitfalls in external declarations and definitions if you use multiple source files. To avoid them, first, define and initialize each external variable only once in the entire set of files:
int foo 0;
You can get away with multiple external definitions on UNIX, but not on GCOS, so don't ask for trouble. Multiple initializations are illegal everywhere. Second, at the beginning of any file that contains functions needing a variable whose definition is in some other file, put in an extern declaration, outside of any function:
extern int foo;

f1( ) { ... }
etc.
The #include compiler control line, to be discussed shortly, lets you make a single copy of the external declarations for a program and then stick them into each of the source files making up the program.
23. #define, #include
C provides a very limited macro facility. You can say
#define name something
and thereafter anywhere ``name'' appears as a token, ``something'' will be substituted. This is particularly useful in parameterizing the sizes of arrays:
#define ARRAYSIZE 100
int arr[ARRAYSIZE];
...
while( i++ < ARRAYSIZE )...
...
>> right shift (arithmetic on PDP-11; logical on H6070, IBM360)
25. Assignment Operators
An unusual feature of C is that the normal binary operators like `+', `-', etc. can be combined with the assignment operator `=' to form new assignment operators. For example,
x =- 10;
uses the assignment operator `=-' to decrement x by 10, and
x =& 0177
forms the AND of x and 0177. This convention is a useful notational shortcut, particularly if x is a complicated expression. The classic example is summing an array:
for( sum=i=0; i<n; i++ )
sum =+ array[i];
But the spaces around the operator are critical! For
x = -10;
sets x to -10, while
x =- 10;
subtracts 10 from x. When no space is present,
x=-10;
also decreases x by 10. This is quite contrary to the experience of most programmers. In particular, watch out for things like
c=*s++;
y=&x[0];
both of which are almost certainly not what you wanted. Newer versions of various compilers are courteous enough to warn you about the ambiguity.
Because all other operators in an expression are evaluated before the assignment operator, the order of evaluation should be watched carefully:
x = x<<y | z;
means ``shift x left y places, then OR with z, and store in x.'' But
x =<< y | z;
means ``shift x left by y|z places'', which is rather different.
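Present-day standard C writes these operators the other way round (`+=', `-=', `<<=' and so on), which removes the ambiguity over spaces: x=-10 now always assigns -10. The order-of-evaluation point still holds, though, as this sketch shows; the function names are made up for the illustration.

```c
/* x = x << y | z: shift first, then OR, because << binds tighter than |. */
unsigned shift_then_or(unsigned x, unsigned y, unsigned z)
{
    x = x << y | z;
    return x;
}

/* x <<= y | z: the whole right side is evaluated first, so x is shifted
   by (y | z) places. */
unsigned shift_by_or(unsigned x, unsigned y, unsigned z)
{
    x <<= y | z;
    return x;
}

/* Sums n array elements with +=, the modern spelling of =+. */
int sum_array(const int *array, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += array[i];
    return sum;
}
```

With x = 1, y = 2 and z = 1, the first function gives (1 shifted left 2) OR 1, which is 5, while the second shifts 1 left by 3 places and gives 8.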
26. Floating Point
We've skipped over floating point so far, and the treatment here will be hasty. C has single and double precision numbers (where the precision depends on the machine at hand). For example,
double sum;
float avg, y[10];
sum = 0.0;
for( i=0; i<n; i++ )
sum =+ y[i];
avg = sum/n;
forms the sum and average of the array y.
All floating arithmetic is done in double precision. Mixed mode arithmetic is legal; if an arithmetic operator in an expression has both operands int or char, the arithmetic done is integer, but if one operand is int or char and the other is float or double, both operands are converted to double. Thus if i and j are int and x is float,
(x+i)/j converts i and j to float
x + i/j does i/j integer, then converts
Type conversion may be made by assignment; for instance,
int m, n;
float x, y;
m = x;
y = n;
converts x to integer (truncating toward zero), and n to floating point.
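The difference between the two mixed-mode expressions, and the truncation on assignment, can be seen in a short sketch. It assumes present-day standard C, where float operands are not necessarily widened to double as described here; the values below are exactly representable either way, so the results do not depend on that. The function names are hypothetical.

```c
/* (x + i) / j: i is converted to floating point for the addition, and j
   for the division, so no fraction is lost. */
double all_floating(float x, int i, int j)
{
    return (x + i) / j;
}

/* x + i / j: i / j is integer division, truncated before it is converted. */
double integer_division_first(float x, int i, int j)
{
    return x + i / j;
}

/* Assigning a floating value to an int truncates toward zero. */
int truncate_to_int(double x)
{
    int m = x;
    return m;
}
```

With x = 0.5, i = 3 and j = 2, the first expression is 3.5/2, or 1.75; the second is 0.5 + 1, or 1.5, because 3/2 is done in integers.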
Floating constants are just like those in Fortran or PL/I, except that the exponent letter is `e' instead of `E'. Thus:
pi = 3.14159;
large = 1.23456789e10;
printf will format floating point numbers: ``%w.df'' in the format string will print the corresponding variable in a field w digits wide, with d decimal places. An e instead of an f will produce exponential notation.
