Sunday, July 31, 2022

C++ API Design Best Practices - Improving Performance


PASS INPUT ARGUMENTS BY CONST REFERENCE

  • Prefer const references over pointers to pass input parameters, that is, parameters that are not changed by the function. However, you should prefer pointers over non-const references for output parameters so that their mutability is clearly advertised to clients. This section now offers some additional performance reasons to prefer the use of const references to pass input arguments into a function.
  • By default, function arguments in C++ are passed “by value.” This means that the object being passed into the function is copied and then that copy is destroyed when the function returns. As a result, the original object that is passed into the method can never be modified. However, this involves the overhead of calling the object’s copy constructor and then the destructor. Instead you should pass a const reference to the object. This has the effect of only passing the pointer to the object, but also ensuring that the object is not modified by the method. This can be particularly important for embedded systems where you have a very limited stack size.

                        void SetValue(std::string str); // pass by value
                        void SetValue(std::string &str); // pass by reference
                        void SetValue(const std::string &str); // pass by const reference
  • Always prefer passing a non-mutable object as a const reference rather than passing it by value. This will avoid the memory and performance costs to create and destroy a temporary copy of the object and all of its members and inherited objects.
  • This rule only applies to objects. It is not necessary for built-in types, such as int, bool, float, double, or char, because these are small enough to fit in a CPU register. In addition, 
  • STL iterators and function objects are designed to be passed by value. However, for all other custom types, you should favor references or const references.
  • If you assume that the SetObjectByValue() and SetObjectByConstReference() methods both simply assign their argument to the mObject member variable, then the sequence of operations that get performed when each of these methods is called is as follows.
                        SetObjectByValue(object)
      1. std::string constructor
      2. MyObject copy constructor
      3. MyObject assignment operator
      4. MyObject destructor
      5. std::string destructor

                        SetObjectByConstReference(object)
      1. MyObject assignment operator
  • The situation becomes worse if MyObject is derived from some base class because then the copy constructor and destructor of each base class in the object hierarchy would also have to be called for the pass-by-value case.

 MINIMIZE #INCLUDE DEPENDENCIES

  • The time it takes to compile a large project can depend greatly on the number and depth of #include files. As such, one of the common techniques for decreasing build times is to try to reduce the number of #include statements in header files
  • Some APIs provide a single large header file that pulls in all of the classes and global definitions for the interface. This can seem like a convenient affordance for your clients, however, it only serves to increase the compile-time coupling between your clients’ code and your API, which means that even the most minimal use of your API must pull in every public symbol.
  • For example, the standard Win32 header windows.h pulls in well over 200,000 lines of code (under Visual Studio 9.0). Every .cpp file that includes this header effectively adds over 4 MB of extra code that needs to be loaded from around 90 separate files and compiled for every source file.

Forward Declarations

  • A header file, A, includes another header file, B, in order to pull in the declaration of a class, function, struct, enum, or other entity that is used in header A. The most common situation in an object-oriented program is that header A wants to pull in the declaration of one or more classes from header B. However, in many situations, header A does not actually need to include header B and can instead simply provide a forward declaration for the classes needed. A forward declaration can be used when
    1. The size of the class is not required. If you include the class as a member variable or subclass from it, then the compiler will need to know the size of the class.
    2. You do not reference any member methods of the class. Doing so would require knowing the method prototype: its argument and return types.
    3. You do not reference any member variables of the class; but you already know to never make those public (or protected).
  • use forward declarations if header A only refers to the name of classes from header B via pointers or references
                class B; // forward declaration
                class A
                {
                public:
                                void SetObject(const &B obj);
                private:
                                B *mObj;
                };
  • However, if you were to change the definition of class A so that that compiler needs to know theb actual size of class B, then you must include the actual declaration of class B, that is, you must #include its header. For example, if you store an actual copy of B inside of A.
                #include <B.h>
                class A
                {
                public:
                                void SetObject(const &B obj);
                private:
                                B mObj;
                };
  • As a rule of thumb, you should only need to #include the header file for a class if you use an object of that class as a data member in your own class or if you inherit from that class.

Redundant #include Guards

  • Another way to reduce the overhead of parsing too many include files is to add redundant preprocessor guards at the point of inclusion. For example, if you have an include file, bigfile.h, that looks like this

                #ifndef BIGFILE_H

                #define BIGFILE_H

                // lots and lots of code

                #endif

  • Then you might include this file from another header by doing the following

                #ifndef BIGFILE_H
                #include "bigfile.h"
                #endif
  • Consider adding redundant #include guards to your headers to optimize compile time for your clients.

DECLARING CONSTANTS

  • Often you want to define a number of public constants for your API. This is a great technique for avoiding the proliferation of hardcoded values throughout your client’s code, such as maximum values or default strings. For example, you might declare several constants in the global scope of your header in this way.
                const int MAX_NAME_LENGTH = 128;
                const float LOG_2E = log2(2.71828183f);
                const std::string LOG_FILENAME = "filename.log";
  • The issue to be aware of here is that only very simple constants for built-in types will be inlined by your C++ compiler. By default, any variable that you define in this way will cause your compiler to store space for the variable in every module that includes your header. In the aforementioned case, this will likely happen for both the float and the string constant. If you declare many constants and your API headers are included in many .cpp files, then this can cause bloat in the client’s .o object files and the final binary. The solution is to declare the constants as extern.
                extern const int MAX_NAME_LENGTH;
                extern const float LOG_2E;
                extern const std::string LOG_FILENAME;
  • Then define the value of each constant in the accompanying .cpp file. In this way, the space for the variables is only allocated once. This also has the additional benefit of hiding actual constant values from the header file: they are implementation details after all. 
  • A better way to do this is if you can declare the constants within a class. Then you can declare them as static const (so they will not count toward the per-object memory size).
                // myapi.h
                class MyAPI
                {
                public:
                                static const int MAX_NAME_LENGTH;
                                static const int MAX_RECORDS;
                                static const std::string LOG_FILENAME;
                };
  • You can then define the value for these constants in the associated .cpp file
                // myapi.cpp
                const int MyAPI::MAX_NAME_LENGTH = 128;
                const int MyAPI::MAX_RECORDS = 65525;
                const std::string MyAPI::LOG_FILENAME = "filename.log";
  • Another option to avoid the bloat issue in certain cases is to use enums as an alternative to variables or you could also use getter methods to return the constant values, such as
                // myapi.h
                class MyAPI
                {
                public:
                                static int GetMaxNameLength();
                                static int GetMaxRecords();
                                static std::string GetLogFilename();
                };
  • Declare global scope constants with extern or declare constants in classes as static const. Then define the value of the constant in the .cpp file. This reduces the size of object files for modules that include your headers. Even better, hide these constants behind a function call.

INITIALIZATION LISTS

  • C++ provides constructor initialization lists to let you easily initialize all of the member variables in your class. Using this feature can afford a slight performance increase over simply initializing each member variable in the body of the constructor.
  • 1st example
            // avatar.h
           class Avatar
           {
           public:
                      Avatar(const std::string &first, const std::string &last)
                      {
                                 mFirstName = first;
                                 mLastName = last;
                      }

           private
                      std::string mFirstName;
                      std::string mLastName;
           };
  • 2nd example - Use the following method, it will hide implementation details as well 
           // avatar.h
           class Avatar
           {
           public:
                      Avatar(const std::string &first, const std::string &last);

           private:
                      std::string mFirstName;
                      std::string mLastName;
           };

           // avatar.cpp 
           Avatar::Avatar(const std::string &first, const std::string &last) :
                                            mFirstName(first),
                                            mLastName(last)
           {           
           }



  • Because member variables are constructed before the body of the constructor is called, 
  • In the first example, the default std::string constructor will be called to initialize the mName member variable, and then inside the constructor, the assignment operator is called However,
  • In the second example, only the assignment operator is invoked.
  • Using an initialization list avoids the cost of calling the default constructor for each member variable that you include in the list.
  • Use constructor initialization lists to avoid the cost of a constructor call for each data member, but declare these in the .cpp file to hide implementation details.
  • Here are a few things to be aware of when using initialization lists:
    • The order of variables in the initialization list must match the order of the variables specified in the class.
    • Cannot specify arrays in an initialization list. However, you can specify a std::vector, which may be a better choice of data structure anyway.
    • If you are declaring a derived class, the default constructor for any base classes will be called implicitly. You can use the initialization list to call a non-default constructor instead. If specified, a call to a base class constructor must appear before any member variables.
    • If you have declared any of your member variables as references or as const, then you must initialize them via the initialization list (to avoid the default constructor defining their initial, and only, value).

MEMORY OPTIMIZATION

  • One key technique is to reduce the size of your objects: the smaller your objects are, the more of them can potentially fit into a cache. There are several ways that you can reduce the size of your objects.

  • Cluster member variables by type. Modern computers access memory a single “word” at a time. Your C++ compiler will therefore align certain data members so that their memory addresses fall on word boundaries. A number of unused padding bytes may be added to a structure in order to make this happen. By clustering all member variables of the same type next to each other, you can minimize the amount of memory loss to these padding bytes

  • Use bit fields. A bit field is a decorator for a member variable that specifies how many bits the variable should occupy, for example, int tinyInt:4. This is particularly useful for packing several bools into a single byte or for squeezing two or more numbers into the space of a single int. The downside is that there is normally a performance penalty for using bit field sizes that are not a multiple of 8, but if memory is your biggest concern then this may be an acceptable cost.


  • Use unions. A union is a structure where data members share the same memory space. This can be used to allow multiple values that are never used at the same time to share the same area of memory, thus saving memory. The size of a union is the size of the largest type in the union.
  • Don’t add virtual methods until you need them. I recommended this as a way to keep an API minimally complete, but there are also performance reasons to do this. Once you add one virtual method to a class, that class needs to have a vtable. Only one copy of the vtable needs to be allocated per class type, but a pointer to the vtable is stored in every instance of your object. This adds the size of one pointer to your overall object size (normally 4 bytes for a 32-bit application or 8 bytes for a 64-bit application).
  • Use explicit size-based types. The size of various types can differ by platform, compiler, and whether you are building a 32-bit or a 64-bit application. If you want to specify the exact size of a member variable, then you should use a type that specifically enforces this rather than assuming that types such as bool, short, or int will be a specific size. Unfortunately, the way to declare a fixed-size variable varies for different platforms. For example, on UNIX-based systems, the stdint.h header file provides types such as int8_t, uint32_t, and int64_t to specify an 8-bit integer, 32-bit unsigned integer, and a 64-bit integer, respectively.

  • Cluster the member variables based on their type and sort them based on the size of each type (bools, chars, then ints), then the size of the structure can be reduced to 32 bytes, a 33% reduction.
            class Fireworks_A
            {
                        bool mIsActive;
                        int mOriginX;
                        int mOriginY;
                        bool mVaryColor;
                        char mRed;
                        int mRedVariance;
                        char mGreen;
                        int mGreenVariance;
                        char mBlue;
                        int mBlueVariance;
                        bool mRepeatCycle;
                        int mTotalParticles;
                        bool mFadeParticles;
            };

            class Fireworks_B
            {
                        bool mIsActive;
                        bool mVaryColor;
                        bool mRepeatCycle;
                        bool mFadeParticles;
                        char mRed;
                        char mGreen;
                        char mBlue;
                        int mRedVariance;
                        int mGreenVariance;
                        int mBlueVariance;
                        int mTotalParticles;
                        int mOriginX;
                        int mOriginY;
            };


  • Cluster member variables by their type to optimize object size.
  • You can still squeeze a few more bytes out of the structure, however. By using bit fields you can make each of the bool flags occupy a single bit instead of an entire byte. Doing this lets you get the structure size down to 28 bytes, a 42% reduction.

            class Fireworks_C
            {
                        bool mIsActive:1;
                        bool mVaryColor:1;
                        bool mRepeatCycle:1;
                        bool mFadeParticles:1;
                        char mRed;
                        char mGreen;
                        char mBlue;
                        int mRedVariance;
                        int mGreenVariance;
                        int mBlueVariance;
                        int mTotalParticles;
                        int mOriginX;
                        int mOriginY;
            };
  • Use size-specific types, such as int32_t or uint16_t, to specify the maximum number of required bits for a variable.
            class Fireworks_D
            {
                        bool mIsActive:1;
                        bool mVaryColor:1;
                        bool mRepeatCycle:1;
                        bool mFadeParticles:1;
                        char mRed;
                        char mGreen;
                        char mBlue;
                        char mRedVariance;
                        char mGreenVariance;
                        char mBlueVariance;
                        int mTotalParticles;
                        short mOriginX;
                        short mOriginY;
            };
  • This version of our structure occupies only 16 bytes: a reduction of 66% from the original unoptimized size of 48 bytes.

COPY ON WRITE

  • One of the best ways to save memory is to not allocate any until you really need to. This is the essential goal of copy-on-write techniques. 
  • These work by allowing all clients to share a single resource until one of them needs to modify the resource. Only at that point is a copy made. Hence the name: copy on write. 
  • The advantage is that if the resource is never modified then it can be shared for all clients. This is related to the Flyweight design pattern, which describes objects that share as much memory as possible to minimize memory consumption
  • Use copy-on-write semantics to reduce the memory cost for many copies of your objects.


Array References

  • The primary purpose of this method is to perform. In essence, it is a way to collapse a series of connected nodes of a graph data structure into a sequential array data structure. This provides a data structure that can be very efficiently iterated over but also locates elements adjacent to each other in memory. 
  • The result is a data structure that can take better advantage of CPU caching strategies, as opposed to a tree structure where individual nodes in the tree may be fragmented across the process’s address space.
  • This technique is particularly efficient if the client keeps the same array around to service multiple calls to getAllPaths(). Also, any initial performance overhead to fill the array can be compensated for if the array is kept around to support multiple iterations over its elements.
  • Adopt an iterator model for traversing simple linear data structures. If you have a linked list or tree data structure, then consider using array references if iteration performance is critical.

ITERATING OVER ELEMENTS

  • Iterating over a collection of objects is an extremely common task for client code so it is worth spending some time looking at alternative strategies that offer different strengths and weaknesses. That way you can choose the best solution for your particular API requirements.
  • The STL approach to this problem is to use iterators. These are objects that can traverse over some or  all elements in a container class .
  • An iterator points to a single element in a container, with various operators available, such as operator* to return the current element, operator-> to access the members of the container element directly, and operatorþþ to step forward to the next element. 
  • This design intentionally mimics the interface of plain pointer manipulation in C/C++. Clients can then use the begin() and end() methods on each container class to return iterators that bound all elements in the container or they can use various STL algorithms that return iterators within the set of all elements, such as std::find(), std::lower_bound(), and std::upper_bound(). The following code segment provides a simple example of using an STL iterator to sum all the values in a std::vector:
            float sum = 0.0f;
            std::vector<float>::const_iterator it;
            for (it = values.begin(); it != values.end(); ++it)
            {
                        sum+=*it;
            }

  • In terms of your own API designs, here are some reasons why you may want to adopt an iterator model to allow your clients to iterate over data.

  1. Iterators are a well-known pattern that most engineers are already familiarwith.As such, using an iterator model in your own APIs will minimize the learning curve for users
  2. The iterator abstraction can be applied to simple sequential data structures, such as arrays or lists, as well as more complicated data structures, such as sets and maps, which are often implemented as self-balancing binary search trees such as red-black trees
  3. Iterators can be implemented quite efficiently, even as simply as a pointer in some cases.
  4. Iterators can be used to traverse massive data sets that may not even fit entirely into memory. For example, the iterator could be implemented to page in blocks of data from disk as needed and free previously processed blocks.
  5. Clients can create multiple iterators to traverse the same data and use these iterators simultaneously. In the case where clients wish to insert or delete elements while traversing the container, there are established patterns for doing this while maintaining the integrity of the iterators

Random Access

  • An iterator allows clients to traverse linearly through each element in a container. However, you may have cases where you wish to support random access to any element, such as accessing a specific element in an array or vector container. STL container classes that support random accesses provide this in a couple of ways.
  1. The [] operator. This is meant to simulate the array indexing syntax of C/C++. Normally this operator is implemented without any bounds checking so that it can be made very efficient.
  2. The at() method. This method is required to check if the supplied index is out of range and throw an exception in this case. As a result, this approach can be slower than the [] operator.

         float sum = 0.0f;
         const size_t len = values.size();
         for (size_t it = 0; it < len; ++it)
         {
                  sum += values[it];
         }

  • In terms of performance, these two methods are essentially equivalent.

References 

  • API Design for C++ Book by Martin Reddy

No comments:

Post a Comment

LeetCode C++ Cheat Sheet June

🎯 Core Patterns & Representative Questions 1. Arrays & Hashing Two Sum – hash map → O(n) Contains Duplicate , Product of A...