A multi-core system is a computer with two or more central processing units (CPUs), or cores, integrated into a single package. Although unified on one chip, the cores operate independently of each other. Each core performs basic computing tasks such as running programs, managing data, and executing instructions. The difference between single-core systems (e.g., the Intel Pentium 4 or AMD Athlon 64 FX-55) and multi-core systems is that multi-core systems can comfortably run multiple programs and instruction streams at the same time, thereby increasing the speed and responsiveness of the computer. The cores are usually fitted onto a single integrated circuit die (known as a chip multiprocessor, or CMP), or onto multiple dies in a single chip package. Since about 2005, single-core systems have almost become history.
Information and communication technology has evolved very fast in the past five years. Computer chip manufacturers keep targeting higher clock speeds and transistor density by putting more processing cores and hardware threads on each chip. A multi-core processor may have two cores, as in dual-core CPUs such as the Intel Celeron Dual-Core aimed at entry-level computers; three cores, as in the AMD Phenom II X3; or four to eight cores, as in the IBM POWER7 series. High-end many-core processors such as the Intel Xeon Phi series have as many as 57 to 61 processing cores. Multi-core designs are also indispensable in domains such as graphics processing units (GPUs) and digital signal processing (DSP).
Multi-core processors are ideal for servers because they increase the number of users that can share server resources simultaneously. Servers also run many independent threads of execution, which lets web and application servers achieve much better throughput.
Advantages of Multi-Core Processor
Locating multiple processing cores on the same die means the cache coherency circuitry can operate at higher clock speeds than if the signals had to travel off-chip. Signals between the cores travel a shorter distance and are therefore less likely to degrade, permitting more data to be transferred per session without frequent amplification.
Since multiple cores fit into one die package, a multi-core CPU requires less printed circuit board (PCB) surface than two single-core processors coupled together.
A multi-core processor also uses less power than two coupled single-core processors, because less power is required to drive signals back and forth between chips.
Cores on the same die share common circuitry, such as the L2 cache and the front-side bus (FSB) interface. Multi-core systems also reuse proven CPU core designs, producing a system with a lower risk of design error.
Multi-core processors deliver higher performance at lower power. This advantage makes them suitable for battery-powered mobile devices.
Disadvantages of Multi-Core Processor
Before a multi-core processor can be used on any device, both the device’s operating system and existing application software have to be adapted to suit it.
Multi-core processors don’t improve system performance on their own. Their ability to improve performance depends largely on how well applications make use of multiple threads.
Heat dissipation in multi-core systems, particularly in mobile devices, is hard to manage.
Technically, raw processing power is not the only factor that boosts system performance. Other factors such as memory, motherboard circuitry, cache, and on-board bandwidth also play important roles. Adding more processing cores without addressing these factors won’t bring any significant performance improvement.
The performance gain from a multi-core processor depends largely on the software algorithms run on it. Performance is limited by the fraction of the software that can run in parallel across the cores, a limit known as Amdahl’s law. Many applications run effectively on a single core: if the work cannot be spread evenly across multiple cores and a single thread ends up doing all the processing, a multi-core system is of little use to that application. Multi-core processing therefore shapes the way modern software is built.
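Amdahl’s law can be sketched in a few lines of Python (the parameter values here are illustrative, not from the text): even a small serial fraction caps the achievable speedup, no matter how many cores are added.

```python
def amdahl_speedup(parallel_fraction, cores):
    # Amdahl's law: overall speedup = 1 / ((1 - p) + p / n),
    # where p is the parallel fraction and n the core count.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# A program that is 90% parallel gains modestly on 4 cores...
four_core = amdahl_speedup(0.9, 4)      # about 3.08x
# ...and can never exceed 10x, even with a thousand cores.
many_core = amdahl_speedup(0.9, 1000)   # about 9.91x
```

The 10% that must run serially dominates as the core count grows, which is why spreading work evenly matters more than adding cores.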
Some programming languages offer poor support for multi-core systems. Sharing application workload among the processors can be daunting, though there are ways to deal with the problem, such as using a coordination language or higher-order functions. Each block of the application can have a different implementation for each type of processor, and during compilation the compiler chooses the best implementation for the context. Developers often have to call into numerical libraries written in languages such as Fortran and C, which perform computations faster than popular languages such as C#.
High power consumption and heat problems mean that more emphasis is placed on multi-core chip design and threading. What actually improves a computer’s performance is multithreaded software that takes advantage of all the available cores. If a program’s developers cannot design it to exploit multiple cores, the program will never reach the system’s performance ceiling. To work around this, two dual-core dies may be packaged together with a single unified cache, and either dual-core die can be used, instead of running more cores on a single die.
Writing multithreaded code requires complex and careful coordination of threads. A simple error can introduce subtle bugs that are difficult to find because of the interleaving of shared data between threads on multiple cores. As a result, such programs are harder to debug when they break. Consequently, there aren’t many consumer-level threaded applications, because most computer users rarely push their hardware to its limits.
To meet demands for faster performance and efficiency, applications must scale across the hardware threads of multiple cores. Software has to be designed with methods for efficiently sharing its work among the cores. Any application meant to run in a multi-core environment that ignores this during design will end up with performance problems.
The main headache when designing software for a multi-core system is how to spread tasks across the multiple processors. The most common approach is a threading model in which work is broken down into separate execution units that run on different processors in parallel. If the threads are independent of each other, their design does not have to specify how they work together, as in the case of two different applications running as separate processes: each runs on its own core without any awareness of the other. System and application performance is unaffected unless the applications contend for a shared resource such as system memory. That raises another issue: how to manage shared memory in a multi-core system.
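A minimal sketch of this threading model in Python (the data and function names are illustrative): the work is broken into independent chunks, each handed to its own execution unit, and no mutable state is shared between them.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each execution unit works only on its own slice of the data.
    return sum(chunk)

data = list(range(1, 101))
# Break the task into four independent units.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# Run the units in parallel; since they share no mutable state,
# their design needs no coordination logic.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(partial_sum, chunks))

total = sum(results)  # 5050
```

Because the threads never touch each other’s data, the only serial step is combining the partial results at the end.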
Memory management is the process of allocating and sharing available computer memory among various running programs when needed and freeing up the memory when the application process has ended. Efficient memory allocation is important to any system that is required to multi-task at any time.
Memory management is a function of the hardware, operating system (OS), and the applications being run.
Hardware Memory: Memory management in hardware is the function of the physical components that store data, such as flash-based solid-state drives (SSDs), ATA/SATA disks, RAM chips, and memory caches.
Operating System Memory: Memory management in the operating system requires the OS to constantly allocate memory to individual user programs as they demand it and to reclaim that memory when it is no longer required, for instance after an application has been closed. When available memory is used up, additional applications can no longer run on the system. Memory can be freed up by deleting surplus data and uninstalling rarely used applications.
Application Memory: Applications cannot define in advance how much memory they will need when launched, so they rely on code that makes memory requests on their behalf at run time. These requests secure memory for each running program until it is closed.
Application memory management involves the combination of two related tasks, known as allocation and recycling.
Allocation: When an application needs memory, it requests a block of memory. Memory is then allocated to it by the memory manager called “the allocator.”
Recycling: When an application is closed and its data in previously allocated memory blocks is no longer needed, those blocks can be recycled and reassigned when needed again. Recycling can be done automatically by the memory manager or manually by the programmer.
Automatic Memory Management
This is either part of the programming language used to build an application or a language extension; it automatically recycles memory that the program no longer uses. Automatic memory managers, also called “collectors,” work by recycling blocks that have become unreachable by the application, e.g., data the application no longer holds any reference to. In automatic mode, memory management is clearly more efficient, and there are fewer memory bugs. On the downside, memory may be erroneously retained: as long as a block is still reachable by the application, the collector will not recycle it, even if it is never used again.
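Python’s built-in collector illustrates this behavior; in the sketch below (class and variable names are illustrative), a weak reference stands in for an observer that does not keep the block reachable.

```python
import gc
import weakref

class Block:
    """Stands in for a block of allocated memory."""
    pass

block = Block()
ref = weakref.ref(block)   # a weak reference does not keep the block reachable
assert ref() is not None   # still reachable via `block`, so not recycled

del block                  # the application drops its last reference
gc.collect()               # the collector recycles unreachable blocks

# The block was unreachable, so the collector reclaimed it.
assert ref() is None
```

Note the flip side described above: had `block` still been reachable anywhere, `gc.collect()` would have left it alone regardless of whether it was ever used again.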
Manual Memory Management
Manual memory management requires the programmer to recycle system memory explicitly, using code that manages the control stack or makes direct calls to the heap (a reserved area of computer memory that applications can use to store data temporarily). No memory is recycled unless the programmer initiates it. While this makes it easier to know everything going on within the system, the programmer has to write recycling code continually and take regular inventory of the memory.
It is quite common for programmers faced with an inefficient manual memory manager to write code that duplicates its behavior, recycle memory blocks internally, or allocate large memory blocks and split them up for use. Memory management code can be written in languages such as Fortran, C++, COBOL, or Pascal, and conservative garbage collection extensions may also be used.
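The last pattern named above, allocating one large block and splitting it for use, can be sketched as a toy pool allocator (the class and method names are illustrative, not a real library API):

```python
class FixedPool:
    """Grab one large block up front and hand out fixed-size chunks of it."""

    def __init__(self, chunk_size, chunk_count):
        self.chunk_size = chunk_size
        self.buffer = bytearray(chunk_size * chunk_count)  # the one large block
        self.free_list = list(range(chunk_count))          # indices of free chunks

    def allocate(self):
        if not self.free_list:
            raise MemoryError("pool exhausted")
        # Hand out a chunk index; the caller uses
        # buffer[index*chunk_size : (index+1)*chunk_size].
        return self.free_list.pop()

    def free(self, index):
        # Manual recycling: the programmer must remember to call this.
        self.free_list.append(index)

pool = FixedPool(chunk_size=64, chunk_count=4)
a = pool.allocate()
b = pool.allocate()
pool.free(a)         # recycled chunk becomes available again
c = pool.allocate()  # reuses the chunk that was just freed
```

Everything here is the programmer’s responsibility: forgetting a `free` leaks a chunk, and freeing a chunk still in use produces a dangling reference.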
Memory Management Problems
The main problem in memory management is deciding which data to keep, how long to keep it, and when to clear it so that the memory can be freed for reuse. Although this may sound trivial, poor management of a system’s memory degrades the effectiveness and speed of running applications. Common memory management problems include:
API Complexity: An application programming interface (API) must be designed with memory management in mind, specifying how the application manages memory, especially objects that are constantly allocated and freed.
Premature Frees and Dangling Pointers: Applications are required to give up memory for recycling once they are done with it. A premature free occurs when memory is given up too soon and the application later tries to access it, which can make the program behave sluggishly, hang, or crash. A pointer that still refers to memory the program has already given up is called a dangling pointer: the application ought to forget data as soon as it relinquishes the memory. Both premature frees and dangling pointers are more prevalent with manual memory management.
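Python does not allow raw dangling pointers, but a weak reference can stand in for one in a sketch (names are illustrative): once the data is given up, the stale reference yields nothing, and an unguarded dereference would misbehave.

```python
import weakref

class Record:
    def __init__(self, value):
        self.value = value

record = Record(42)
dangling = weakref.ref(record)  # stands in for a pointer kept past the free

# Premature free: the data is given up while a reference still exists.
del record

# The dangling reference no longer yields valid data. In C this access
# would be undefined behavior; here it must be guarded explicitly.
target = dangling()
assert target is None  # the stale reference is detected instead of crashing
```

In languages with manual memory management there is no such safety net, which is why these bugs are so hard to find.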
Fragmentation: Memory fragmentation occurs when free memory becomes split into many small blocks separated by blocks still in use. The allocator can then fail to satisfy a large request even though enough total memory is free, and storage space is wasted.
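A toy first-fit allocator over a tiny address space (illustrative only, with no coalescing of freed ranges) makes the failure mode concrete: after interleaved frees, a large request fails even though enough memory is free in total.

```python
class FirstFit:
    """Toy first-fit allocator over a small address space."""

    def __init__(self, size):
        self.free = [(0, size)]  # list of (offset, length) free ranges

    def allocate(self, length):
        for i, (off, avail) in enumerate(self.free):
            if avail >= length:
                if avail == length:
                    self.free.pop(i)
                else:
                    self.free[i] = (off + length, avail - length)
                return off
        return None  # no single free range is large enough

    def release(self, off, length):
        self.free.append((off, length))  # no coalescing: gaps stay separate

heap = FirstFit(40)
blocks = [heap.allocate(10) for _ in range(4)]  # fill the heap completely
heap.release(blocks[0], 10)                     # free alternating blocks,
heap.release(blocks[2], 10)                     # leaving in-use blocks between

# 20 bytes are free in total, but split by blocks still in use:
assert heap.allocate(20) is None      # the large request fails
assert heap.allocate(10) is not None  # a small request still succeeds
```

Real allocators mitigate this by coalescing adjacent free ranges and segregating blocks by size.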
Memory Leak: A memory leak occurs when an application keeps being allocated memory every time it requests it but never gives the memory back, so the memory can never be recycled.
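A common leak pattern even under automatic memory management is a long-lived container that is never pruned; a sketch (names illustrative):

```python
import gc
import weakref

cache = []  # a long-lived container that is never pruned

class Session:
    pass

def handle_request():
    session = Session()
    cache.append(session)  # leak: the reference is kept after the request ends
    return weakref.ref(session)

refs = [handle_request() for _ in range(100)]
gc.collect()

# None of the sessions can be recycled: the cache still reaches all of them,
# so the collector must retain every one, even though none will be used again.
leaked = sum(1 for r in refs if r() is not None)
assert leaked == 100
```

This is the “erroneously retained because reachable” case described earlier: the collector is working correctly, but the program’s own references keep the memory alive forever.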
Misplaced Locality of Reference: Access to memory is faster when the memory manager arranges related blocks close together; this is referred to as locality of reference. A shorter distance means data can be moved back and forth faster. If related memory blocks are located far apart, applications’ performance is likely to suffer.
How to Manage Multi-Core Systems Memory
Avoid Memory Contention: Memory contention is the situation in which two different programs try to use the same memory resources, such as disk space, RAM, cache, or processing threads, at the same time. This can result in deadlock or thrashing (constantly moving data between main memory and secondary storage in blocks called pages).
Memory bus traffic and core interactions should be kept as low as possible by minimizing the sharing of storage and data between cores. Access to shared memory can be regulated by queuing requests and using a good scheduler.
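One way to regulate access by queuing, sketched in Python (the names are illustrative): producer threads never touch the shared resource directly; they put requests on a queue, and a single consumer applies them, so the resource itself is never contended.

```python
import queue
import threading

shared_log = []           # the shared resource: mutated by one thread only
requests = queue.Queue()  # all access is funneled through this queue

def writer(worker_id):
    for i in range(10):
        requests.put((worker_id, i))  # producers never touch shared_log

def consumer():
    while True:
        item = requests.get()
        if item is None:           # sentinel: no more work
            return
        shared_log.append(item)    # only this thread mutates the resource

drain = threading.Thread(target=consumer)
drain.start()
writers = [threading.Thread(target=writer, args=(w,)) for w in range(4)]
for t in writers:
    t.start()
for t in writers:
    t.join()
requests.put(None)  # tell the consumer to stop
drain.join()

assert len(shared_log) == 40  # every request applied, with no data race
```

The queue serializes access, trading a little latency for freedom from locks around the shared data itself.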
Avoid Heap Contention: As stated earlier, the heap is a reserved area of computer memory that applications use to store data temporarily, and it is shared among the cores during processing. Heap contention is one of the main problems for multi-core applications that perform intensive memory allocation. To avoid it, a private (per-thread) heap may be used; private heaps also improve multi-core system performance compared with using only a global heap.
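The private-heap idea can be sketched with thread-local storage (the buffer here is a list standing in for a per-thread arena; all names are illustrative): each thread allocates from its own store, so no locking is needed.

```python
import threading

local = threading.local()  # each thread sees its own attributes

def get_private_buffer():
    # Lazily create a per-thread "private heap". No locking is needed
    # because no other thread can ever reach this buffer.
    if not hasattr(local, "buffer"):
        local.buffer = []
    return local.buffer

buffers = {}

def worker(name):
    buf = get_private_buffer()
    for i in range(5):
        buf.append((name, i))  # allocation hits the private store only
    buffers[name] = buf        # expose the result for inspection

threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The two threads never contended: each filled its own separate buffer.
assert len(buffers["a"]) == 5 and len(buffers["b"]) == 5
assert buffers["a"] is not buffers["b"]
```

Real multi-threaded allocators (per-thread arenas) follow the same principle at the level of actual heap memory.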
Avoid False Sharing: False sharing occurs when two or more processors in a multi-core system concurrently modify independent data that happens to reside on the same cache line. The cache coherency system then repeatedly invalidates or rewrites the cached copies held by the other processors, even though the data items are unrelated.
Different processors deal with false sharing in different ways. It can be avoided by carefully aligning data structures to cache-line boundaries, using the compiler’s alignment directives for each target processor. Another approach is to group the frequently used fields of a data structure so that they land on the same cache line and can be accessed together when needed.
Avoid Lock Contention: Lock contention occurs when a thread attempts to acquire a lock that is already held by another thread. One technique for avoiding it is to adopt lock-free algorithms and concurrent data structure designs that eliminate locks and synchronization tools such as mutexes, relying on atomic operations instead.
When traditional locking tools such as spinlocks are used, monolithic or global locks should be broken into pieces so that each lock protects a small area of the data structure. This lets multiple threads use different locks concurrently instead of contending for one lock, achieving better concurrency.
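This lock-striping technique can be sketched in Python (class and stripe count are illustrative): each key maps to one of several small locks, so threads working on different stripes never contend.

```python
import threading

class StripedCounter:
    """Counters protected by several small locks instead of one global lock."""

    STRIPES = 8

    def __init__(self):
        self.locks = [threading.Lock() for _ in range(self.STRIPES)]
        self.counts = [0] * self.STRIPES

    def increment(self, key):
        stripe = hash(key) % self.STRIPES  # each key maps to one small lock
        with self.locks[stripe]:           # threads on different stripes
            self.counts[stripe] += 1       # never contend with each other

    def total(self):
        return sum(self.counts)

counter = StripedCounter()

def worker(key):
    for _ in range(1000):
        counter.increment(key)

threads = [threading.Thread(target=worker, args=(k,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter.total() == 4000  # no increments lost, no global lock needed
```

With a single global lock, all four threads would serialize on every increment; with stripes, only threads that hash to the same stripe ever wait on each other.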
Figure 1: High-level architecture of an example single-core system (left), a dual-core system (middle), and an N-core system (right). The chip is shaded. The DRAM memory system, part of which is off chip, is encircled.
Figure 2: Multi-core System Die
*Image Credit (Figure 1): Thomas Moscibroda, Onur Mutlu: Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems; 16th USENIX Security Symposium, 5 July 2007
*Gurudutt Kumar: Considerations in software design for multi-core multiprocessor architectures; IBM Developer Works, 20 May 2013
*Multi-Core Processor; https://en.wikipedia.org/wiki/Multi-core_processor
*The Memory Management Reference; Ravenbrook Limited, 2016. Available online at: http://www.memorymanagement.org/mmref/begin.html