What I didn't know about the C language

Contrary to the title, I will begin with a quote about programming in C that matches what I have already experienced.

A C program is like a fast dance on a newly waxed dance floor by people carrying razors.

— Waldi Ravens

Now, let’s begin.

Recently, inspired by a workshop about operating systems at my university, I decided to read more about the GNU C Library. “More” means that I started reading the manual. It took me several evenings to get through it, but it was worth the time. There are plenty of features I did not know existed, and I believe they are not widely known. Here I present some that I find the most curious, even if they are not always that useful.

1. Customizing printf

If you use C++, you probably recognize this pattern:

class Foo {
   private:
      int foo;
      //...
   public:
      //...
      friend ostream &operator<<(ostream &output, const Foo &f) { 
         output << "foo : " << f.foo;
         return output;            
      }
};

It lets you define custom behavior for your class when an instance is written to cout or another stream with the stream operator. In C (but only when using glibc) there is a similar, and maybe even more powerful, feature for printf. How does it work? To understand that, here is a quick reminder of how printf works:

#include <stdio.h>

int main()
{
	int foo = 12;
	printf("Foo is %d", foo);
	return 0;
}

The function takes a format string and some arguments that replace %d and the other conversion specifications introduced by a percent sign. The letter d means that the argument should be printed as a decimal integer, but there are many more possibilities: floats, characters, strings, pointers, and more. Including our own. This is where glibc comes in with the function register_printf_specifier. Let’s see an example:

#include <stdio.h>
#include <printf.h>

typedef struct
{
	char *name;
	int age;
}
Person;

int print_person(FILE *stream, const struct printf_info *info, const void *const *args)
{
	const Person *p = *((const Person **)(args[0]));
	return fprintf(stream, "%s is %d years old", p->name, p->age);
}

int print_person_arginfo(const struct printf_info *info, size_t n, int *argtypes, int *sizes)
{
	if (n > 0)
	{
		argtypes[0] = PA_POINTER;
		sizes[0] = sizeof(Person*);
	}	
	return 1;
}

int main()
{
	register_printf_specifier ('P', print_person, print_person_arginfo);

	Person me;
	me.name = "Piotr";
	me.age = 20;

	printf("%P and is awesome", &me);
	//Piotr is 20 years old and is awesome
	return 0;
}

I will not explain this code or all the possibilities this function provides, because that would be enough for a separate post. To compile the code without warnings, you need a flag such as -Wformat=0, because compilers assume by default that the conversion P is wrong. And this is a good point to note that you should avoid using this feature. A compiler warning means you are doing something potentially dangerous, and changing conversions really can cause trouble: you can accidentally override a standard conversion, or pass the wrong argument and crash your program.

So why do I show you this? Because it is interesting.

2. Third argument of main()

There are two popular ways to declare the main() function:

//1. Without arguments
int main()

//2. With 2 arguments
int main(int argc, char **argv)

The second way takes two arguments: the number of arguments passed to the program (including the program name) and the array of these arguments. But few people know about a third option, with a third argument:

//3. With 3 arguments
int main(int argc, char **argv, char **envp)

What is the third argument? It is an array of environment variables, the same one that the global variable environ points to. Using environ is preferred over main() with three arguments because of portability. The main() function declared as above works on most UNIX systems, but not on all of them, and it is not guaranteed by the C standard or by POSIX. So it is better not to use it, even if it works.

3. Interesting math functions

The math.h part of the library turns out to have lots of already implemented math functions. The existence of some of them surprised me. Let’s see them.

void sincos(double x, double *sinx, double *cosx)

You have an angle and need to compute both sine and cosine of it? You can do it in one function call!

double hypot(double x, double y)

Calculates sqrt(x*x + y*y), which is the length of the hypotenuse of a right triangle with sides of length x and y, or the distance of the point (x, y) from the origin. Using this function instead of the direct formula is wise: the error is much smaller, and the result does not overflow just because the intermediate squares x*x or y*y would.

double expm1(double x)

This function returns a value equivalent to exp (x) - 1. It is computed in a way that is accurate even if x is near zero, a case where exp (x) - 1 would be inaccurate owing to subtraction of two nearly equal numbers.

double erf(double x)

Returns the error function of x. The error function is defined as:

erf (x) = 2/sqrt(pi) * integral from 0 to x of exp(-t^2) dt

There are many more implemented functions, including gamma, Bessel functions, complex versions of many functions, etc. There are also functions meant to reduce the error of computations (like hypot), but some of them do not actually do so. Here is a snippet of the source code of glibc. The comment and the implementation of exp10 explain everything :)

double
__ieee754_exp10 (double arg)
{
  if (isfinite (arg) && arg < DBL_MIN_10_EXP - DBL_DIG - 10)
    return DBL_MIN * DBL_MIN;
  else
    /* This is a very stupid and inprecise implementation.  It'll get
       replaced sometime (soon?).  */
    return __ieee754_exp (M_LN10 * arg);
}

4. Hashtables and trees

Hashtables and binary trees are useful data structures for storing and searching for any type of element. They offer faster lookups than plain lists or unsorted arrays, so it is advisable to use them, especially for managing large amounts of data.

I was surprised that glibc implements such structures, but even more surprising to me was the way they are implemented. When it comes to hash tables, the manual says:

The weakest aspect of this function is that there can be at most one hashing table used throughout the whole program. The table is allocated in local memory out of control of the programmer.

I failed to find out why it is like that, but I am sure that if you really need more hashtables, you can find another library which supports them.

When it comes to trees, using them is also a little tricky. There are functions for searching (with or without adding a new element), deleting elements, walking, and destroying the tree. I wrote a simple example of using trees from glibc, which you can see below. Juggling triple stars and void * all the time is pretty cumbersome, but I believe that after writing a few helper functions it would not be so complicated to use the trees in C.

#include <stdio.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>

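/* The tree stores pointers to our char * variables, so each argument
   passed to the comparison function is really a char **. */
int strcmp_void(const void *a, const void *b){
	return strcmp(*(char *const *)a, *(char *const *)b);
}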

void look_for(char **animal, void **tree)
{
	char ***ptr;

	ptr = (char ***)tfind((void *)animal, tree, strcmp_void);
	if(ptr != NULL)
		printf("%s is in the tree!\n", *animal);
	else
		printf("%s not found!\n", *animal);
}

void action(const void *nodep, const VISIT which, const int depth)
{
    char **datap;

    switch (which) {
	    case preorder:
	        break;
	    case postorder:
	        datap = *(char ***)nodep;
	        printf("%s\n", *datap);
	        break;
	    case endorder:
	        break;
	    case leaf:
	        datap = *(char ***)nodep;
	        printf("%s\n", *datap);
	        break;
    }
} 

int main()
{
	char *cat = "cat";
	char *frog = "frog";
	char *dog = "dog";
	char *giraffe = "giraffe";
	
	void *tree = NULL;

	tsearch((void *)(&cat), &tree, strcmp_void);
	tsearch((void *)(&frog), &tree, strcmp_void);
	tsearch((void *)(&dog), &tree, strcmp_void);
	
	look_for(&cat, &tree);
	look_for(&giraffe, &tree);
	
	printf("\nList of animals in the tree:\n");
	twalk(tree, action);

	tdelete((void *)(&dog), &tree, strcmp_void);

	printf("\nList of animals in the tree without dog:\n");
	twalk(tree, action);

	return 0;
}

// cat is in the tree!
// giraffe not found!
//
// List of animals in the tree:
// cat
// dog
// frog
//
// List of animals in the tree without dog:
// cat
// frog

That’s all!

It is not everything I found, but it is enough to write about in one post. Obviously, there are many more interesting features, and some are so complex that they would need articles of their own, but my goal is just to give you a superficial description of a few of them and encourage you to read more.

I hope that you now know something new, and I invite you to glance at the GNU C Library Manual. It is really worth it. And there are some Easter eggs inside. Can you find them? Thank you for reading!