Crash Course on STL - C++ Programming Tutorials

Topic : Crash Course on STL
Author : Osvaldo
Page : 1 Next >>

Crash Course on STL

I though you would make good use of this info, because C++ may be ugly but it's far from dead, and also there are good ideas here...

What the STL is, how does it work
STL (the Standard Template Library) has some fundamental elements:

1.Iterators
2.Algorithms
3.Collections
4.Things (*)
(*) My personal terminology.  A Thing is any value in C++ (object or primitive).

Containers will hold Things, Iterators will walk on Containers and point to Things, and Algorithms will manipulate Things and/or Containers that have Things inside. The main idea of reuse is:

Algorithms, Containers and Things are independent from each other.

So you never have to write more than one version of any Algorithm. The same version works for all kinds of Things and can work on any kind of organization of Things (Containers).

Let's see first the basic of generics in C++. I can make stacks:

stack<int> s1;      s1.push(5);
stack<Person> s2;   s2.push(osvaldo);
stack<Person*> s3;  s3.push(new SpecialPerson);
stack<void*> s4;    s4.push(...any Thing you like...);

C++ templates offer much more detail, like default values for generic parameters, member templates, non-type parameters (e.g. "template<class Thing, int initialsize = 100> class stack..." uses both default and non-type
params) and lots of other fine details and rules for desambiguation, well I don't think you want to read these :)

Now, I can do a generic algorithm:

template <class Thing> Thing max (Thing a, Thing b)
{
    return a > b ? a : b;
}
x = max(y,z);
If y and z are of the same type (or can be coerced to a common type), then the compiler instantiates the template for that type (e.g. max<int>) and calls (or inlines) it in the expression. The instantiated code will replace '>' by whatever version of "operator>" applies for that type, which may be a language operator or a user-defined operator if Thing is a class.

Moving to STL, we must know about Iterators. They come in a number of types. Iterators are (conceptually) objects that know some kind of Container, and can bavigate it; but Iterators have an interface exactly compatible with that of C++ pointers (e.g. * to get the pointer element, ++ to move to next element and so on) and simple iterators are actually implemented as plain C pointers (this means zero overhead).

That's how I can use this:

for (stack<int>::iterator i = s1.begin(); i != s1.end(); ++i)
    cout << *i;

This strange code will get all Things in the stack and print them. begin() and end() are methods provided by all Containers; begin() produces an iterators for the first element (whatever "first" means in that kind of Container) and end() is a past-the-end iterator pointing to some INVALID position, just after the last valid one. Use of past-the-end logic is pervasive in STL, so you'd better be used to this... ranges are always [begin,end). That's why the comparision is != and not <= (this will become important soon). Finally, ++i moves the iterator to the next position and *i gets the element. WARNING: Not everything that looks like a C pointer is a C pointer!! If my Container is a file, the Iterator may be a more complex object, * will map (at compile time) to "read" and ++ will map to "seek"
(for example).

Iterator Categories
Let's see other foundation of STL's design. (STL has an very formal design down to the big-Oh performance specification of everything.)  Iterators come in five flavors:

1.Input
2.Output
3.Forward
4.Bidirectional
5.Random Access

Input Iterators can only be dereferenced and incremented; Output Iterators only support increment and assigned dereference (e.g. *i = value). And both cannot dereference the same position more than once. These are mostly used
for sequential files.

Forward Iterators are like Input+Output, but they support multiple accesses to the same element. Bidirectional Iterators are like Forward, but can also decrement. Finally, Random Access Iterators are like Bidirectional, but adds support for indexing (both [] and all C's pointer arithmetics); difference btw two RA Iterators is a Distance, but a Distance maps to an integer for most Iterators.

Generic Algorithms and Collections
Algorithms don't depend on the Collection's internal structures, but they depend on which class of Iterator the Collection provides -- there are some metaprotocols to query these, e.g. stack<int>::iterator will return a type that you can use to declare some variable like in my loop example. (In fact STL mimics Eiffel's "like" typing declarations by having public typedefs into the classes, returning relevant types...)

Collections will provide the most powerful iterator they can, and Algorithms will be declared to require the least powerful iterator that will do the job. For example, the "find" algorithm has this implementation:

template<class InputIter, class Thing> inline
Thing find(InputIter first, InputIter last, const Thing& value)
{
    for (; first != last; ++first)
        if (*first == value)
            break;
    return first;
}

InputIter is just a name, but if the user provides a type that doesn't implement all methods in the InputIterator specification, the compilation will fail IF those missing methods are actually used there. There is no concrete specification of Iterator protocols (in Java's JGL, interfaces are used); STL doesn't use abstract classes for this because that would exclude primitive types like plain pointers from qualification.

The return is an iterator for the position in the Container where the element was found; if nothing is found, the return is equal to end(), so standard usage of that is:

if (find(s1.begin(), s1.end(), value) == s1.end())
    cout << "Not found!";

Of course, if the user provides a better Iterator than required (e.g. Random Access) everything works fine.

Now, all this stuff may seem ugly, but the find() algorithm you see above will find Things on whatever Container you want: stacks, arrays, lists, deques, hashtables, strings, files, ropes, sets, multisets, plain C arrays, and whatever other Container the user may write. THIS is 50% of the "magic" of STL. The other 50% is the fact that your performance will match that of handcrafted, collection-specific algorithms; just trust me, or investigate C++ specific optimizations and the innards of STL if you don't... of course, some iterators and collections may be far more complex to implement than stacks and plain pointers. Oh yes, no matter what kind of Thing or Collection you have, find()'s precondition is always [first,last) being a valid range, and performance behavior is always O(N); also the spec rwequires that Things are comparable for equality (this is in the abstract protocol EqualityComparable which means having != and == and several invariants like identity, transitivity and other rules).

Iterators are not required to implement comparision operators other than == and !=; that's why you can't safely use "i <= s1.end()" in the first loop example -- that would work for most Iterators including all built-in ones, but it's legal to write an Iterator that doesn't implement operator<=.

Now you may wonder that Algorithms are not organized in an OO manner. That's true; all Algorithms are global functions, they are not methods of any class. In Java libraries (JGL or Collections) the Algorithms are all static methods, so your only advantage is some name organization (but then C++ has namespaces). The Collections have methods to be compatible to the required protocols (e.g. begin()) and for things like inserting or removing elements. There are core methods implemented by all Collections that apply; e.g. you can use push_back(Thing) to insert stuff on any Collection, and also wrappers like stack::push() (which maps to push_back). In addition to Iterators, _some_ Algorithms rely on _some_ core methods to manipulate generic Collections.

Once again, there are no abstract classes ore other language mechanism to specify the core methods that all Collections must answer to. This would prevent plain C arrays to qualify as Collections. All binding and checking is done statically by the compiler. There is no typing relationship btw collections. Inheritance is almost never used; every class is root. There is a TON of polymorphism in STL but this is implemented through overloading & template instantiation, instead of inheritance.

Functors and Other Fun Stuff

STL has a huge collection of stuff, including very specialized algorithms all the way from random_shuffle to set_symmetric_difference or next_permutation... but I will only show a final example that illustrates more cool features of STL.

template <class Thing> class hasName
{
    cchar* name; // p.s. cchar = const char
public:
    hasName (cchar* _name) : name(_name) {}
    bool operator () (Thing* obj) { return !strcmp(obj->name(), name); }
};

The code above defines an STL Functor. A Functor is something that emulates first-class functions and it's used as parameters in Algorithms like:

template<class InputIter, class Predicate> inline
InputIter find_if (InputIter first, InputIter last, Predicate pred);
This variant of find takes a third argument, a Predicate,

Page : 1 Next >>

C++ Home Menu