In this paper we present how Intel's Single-Chip-Cloud processor behaves for parallel macro pipeline applications. Subsets of the SCC's available cores can be arranged as a pipeline where each core processes one stage of the overall workload. Each of the independent cores processes a small part of a larger task and feeds the following core with new data after it finishes its work. Our case-study is a parallel rendering system which renders successive images and applies different filters on them. On normal graphics adapters this is usually done in multiple cycles, we do this in a single pipeline pass. We show that we can achieve a significant speedup by using multiple parallel pipelines on the SCC. We show that we can further improve performance by using SCC's controlling PC in conjunction with the SCC. We also identify aspects of the SCC that hinder the overall performance, mainly the lack of local memory banks for each core on the SCC. The results presented in this paper are not limited to only image processing, but users could expect similar experiences where macro pipelining is used in other applications on the SCC.
The different versions of the original document can be found in: