Cloud 1 : A Performance Goal Oriented Processor Allocation Technique for Centralized Heterogeneous Multi-cluster Environments

Abstract

This paper proposes a processor allocation technique named temporal look-ahead processor allocation (TLPA), which makes allocation decisions by evaluating their effects on subsequent jobs in the waiting queue. TLPA has two strengths. First, it takes multiple performance factors into account when making allocation decisions. Second, it can be used to optimize different performance metrics. To evaluate TLPA, we compare it with the best-fit and fastest-first algorithms. Simulation results show that TLPA improves average turnaround time by up to 32.75% over conventional processor allocation algorithms across various system configurations.
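The look-ahead idea can be sketched under a deliberately simplified, hypothetical model: each cluster runs one job at a time at a fixed relative speed, all queued jobs were submitted at `now`, and turnaround time is the only factor (TLPA itself weighs multiple performance factors). The names `Cluster`, `greedy_turnaround`, `look_ahead_choice` and the `depth` parameter are illustrative assumptions. For each candidate cluster, the head job is tentatively placed there, the next `depth` jobs are allocated with a fastest-completion greedy rule, and the placement with the lowest summed turnaround wins.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Cluster:
    speed: float    # relative processing speed (hypothetical model)
    free_at: float  # time at which the cluster next becomes idle


def finish(free_at: float, speed: float, now: float, work: float) -> float:
    """Completion time of a job of size `work` placed on a cluster."""
    return max(now, free_at) + work / speed


def greedy_turnaround(free: List[float], speeds: Sequence[float],
                      now: float, jobs: Sequence[float]) -> float:
    """Fastest-completion greedy allocation; returns summed turnaround."""
    free = list(free)
    total = 0.0
    for work in jobs:
        best = min(range(len(free)),
                   key=lambda i: finish(free[i], speeds[i], now, work))
        free[best] = finish(free[best], speeds[best], now, work)
        total += free[best] - now
    return total


def look_ahead_choice(clusters: Sequence[Cluster], now: float,
                      queue: Sequence[float], depth: int) -> Tuple[int, float]:
    """Choose a cluster for queue[0] by simulating the next `depth` jobs."""
    speeds = [c.speed for c in clusters]
    head, window = queue[0], queue[1:1 + depth]
    best_idx, best_cost = -1, float("inf")
    for i in range(len(clusters)):
        free = [c.free_at for c in clusters]
        free[i] = finish(free[i], speeds[i], now, head)
        cost = (free[i] - now) + greedy_turnaround(free, speeds, now, window)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx, best_cost
```

With clusters of speed 1.0 and 1.5 and a queue of jobs of size 2 and 10, fastest-first sends the small job to the faster cluster and then makes the large job wait behind it, whereas the look-ahead keeps the faster cluster free for the large job. Because the all-greedy schedule is one of the candidates evaluated, the look-ahead cost over the window is never worse than fastest-first in this model.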

 

Cloud 2 : Addressing Resource Fragmentation in Grids through Network-Aware Meta-scheduling in Advance

Abstract

Grids consist of geographically dispersed heterogeneous computing resources, which makes providing Quality of Service (QoS) a challenging task. One way to enhance the QoS perceived by users is to schedule jobs in advance, since resource reservations are not always possible. This makes it more likely that the appropriate resources are available to run a job when needed. One drawback of this scenario is fragmentation, a well-known effect of allocating jobs to resources that causes poor resource utilization. We have therefore developed a new technique to tackle fragmentation, which consists of rescheduling already scheduled tasks. To this end, we implement heuristics that calculate the intervals to be replanned and select the jobs involved in the process. Another heuristic places rescheduled jobs as close together as possible to minimize fragmentation. The technique has been tested on a real test bed.
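The "as close together as possible" step can be illustrated with a simple left-shift compaction of a replanned interval. This is a minimal sketch, assuming a single resource, jobs that keep their relative order, and a hypothetical `earliest` field giving the earliest start each job may be moved to; it is not the paper's heuristic. `Slot`, `idle_time` and `compact` are illustrative names.

```python
from typing import List, NamedTuple


class Slot(NamedTuple):
    job: str
    start: float
    duration: float
    earliest: float  # earliest permitted start (hypothetical constraint)


def idle_time(slots: List[Slot], window_start: float) -> float:
    """Total idle gap between reservations inside the replanned interval."""
    gaps, t = 0.0, window_start
    for s in sorted(slots, key=lambda s: s.start):
        gaps += max(0.0, s.start - t)
        t = max(t, s.start + s.duration)
    return gaps


def compact(slots: List[Slot], window_start: float) -> List[Slot]:
    """Shift jobs left, in their current order, as far as constraints allow."""
    out, t = [], window_start
    for s in sorted(slots, key=lambda s: s.start):
        start = max(t, s.earliest)
        out.append(s._replace(start=start))
        t = start + s.duration
    return out
```

If the original reservations did not overlap and each `earliest` is no later than the current start, every job moves only earlier, so compaction never delays a job while shrinking the gaps that cause fragmentation.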

 

Cloud 3 : APP: Minimizing Interference Using Aggressive Pipelined Prefetching in Multi-level Buffer Caches

Abstract

As services become more complex, with multiple interactions, and storage servers are shared by multiple services, the I/O streams arising from these services compete for disk attention. Because a storage server services a large number of streams, most of the disk time is spent seeking, which degrades response times. Storage clients enabled with Aggressive Pipelined Prefetching (APP) are designed to manage the buffer cache and I/O streams so as to minimize the disk I/O interference arising from competing streams. The goal of APP is to decrease application execution time by increasing the throughput of individual I/O streams and by using idle capacity on remote nodes and idle network time, thereby avoiding alternating bursts of activity and periods of inactivity. APP significantly increases overall I/O throughput and decreases overall messaging overhead between servers. In APP, the intelligence is embedded in the clients, which automatically infer parameters to achieve maximum throughput. APP clients use aggressive prefetching and data offloading to remote buffer caches in multi-level buffer cache hierarchies to minimize disk interference and mitigate the effects of aggressive prefetching. We implemented our scheme on a 16-node Linux cluster and evaluated it with Radix-k, an extremely I/O-intensive application developed at Argonne National Laboratory and used in studies of the scalability of parallel image composition and particle tracing, on data sets of up to 128 GB. With our scheme, the execution time of the application decreased by 68% on average.

Cloud 4 : ASDF: An Autonomous and Scalable Distributed File System

Abstract

The demand for huge storage space for data-intensive applications and high-performance scientific computing continues to grow. Integrating massive distributed storage resources to provide huge storage space is an important and challenging issue in Cloud and Grid computing. In this paper, we propose a distributed file system, called ASDF, that meets the demands not only of data-intensive applications but also of end users, developers and administrators. While sharing many goals with previous distributed file systems, such as scalability, reliability, and performance, it is also designed with an emphasis on compatibility, extensibility and autonomy. With these design goals in mind, we address several issues and present a design that adopts peer-to-peer technology, replication, multi-source data transfer, metadata caching and a service-oriented architecture. The experimental results show that the proposed distributed file system meets our design goals and will be useful in Cloud and Grid computing.
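Multi-source data transfer means fetching different parts of one file from several replicas at once. A minimal sketch, assuming the client knows an estimated bandwidth per replica and splits the file statically in proportion to it (a real system may rebalance ranges during the transfer); `split_ranges` is a hypothetical name, not part of ASDF.

```python
from typing import List, Tuple


def split_ranges(size: int, bandwidths: List[float]) -> List[Tuple[int, int, int]]:
    """Return (replica index, byte offset, length) ranges proportional to bandwidth."""
    total = sum(bandwidths)
    ranges, offset = [], 0
    for i, bw in enumerate(bandwidths):
        if i == len(bandwidths) - 1:
            length = size - offset  # last replica takes the remainder
        else:
            length = int(size * bw / total)
        ranges.append((i, offset, length))
        offset += length
    return ranges
```

Assigning the remainder to the last replica guarantees that the ranges are contiguous and cover the whole file even when the proportional lengths are rounded down.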

 

Cloud 5 : Assertion Based Parallel Debugging

Abstract

Programming languages have advanced tremendously over the years, but program debuggers have hardly changed. Sequential debuggers do little more than allow a user to control the flow of a program and examine its state. Parallel debuggers support the same operations on multiple processes; this is adequate for a small number of processors but becomes unwieldy and ineffective on very large machines. Typical scientific codes have enormous multi-dimensional data structures, and it is impractical to expect a user to view the data using traditional display techniques. In this paper we discuss the use of debug-time assertions and show that they can be used to debug parallel programs. These techniques reduce debugging complexity because they reason about the state of large arrays without requiring the user to know the expected value of every element. Assertions can be expensive to evaluate, but their performance can be improved by evaluating them in parallel. We demonstrate the system with a case study that finds errors in a parallel version of the Shallow Water Equations, and we evaluate the performance of the tool on a 4,096-core Cray XE6.
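The key idea, asserting a property over a whole array instead of inspecting every element, and splitting that check into chunks evaluated concurrently, can be sketched as follows. This is a minimal illustration, assuming a flat array and an element-wise predicate such as "every value is finite and non-negative" (a hypothetical example assertion); Python threads stand in for the distributed evaluation a real parallel debugger would perform. `parallel_assert` is an illustrative name.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def parallel_assert(data: Sequence[float], predicate: Callable[[float], bool],
                    n_chunks: int = 4) -> List[int]:
    """Return indices that violate `predicate`; an empty list means the assertion holds."""
    step = max(1, math.ceil(len(data) / n_chunks))
    bounds = [(lo, min(lo + step, len(data))) for lo in range(0, len(data), step)]

    def check(bound):
        lo, hi = bound
        return [i for i in range(lo, hi) if not predicate(data[i])]

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(check, bounds))  # one partial result per chunk
    return [i for chunk in results for i in chunk]  # merged in index order
```

The user only states the property; the tool reports where it fails, which is what makes the approach practical for arrays far too large to display.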

 

 

Cloud 6 : Autonomic SLA-Driven Provisioning for Cloud Applications

Abstract

Significant advances have been made in the automated allocation of cloud resources. However, the performance of applications may be poor during peak load periods unless their cloud resources are dynamically adjusted. Moreover, although cloud resources dedicated to different applications are virtually isolated, performance fluctuations do occur because of resource sharing and software or hardware failures (e.g., unstable virtual machines or power outages). In this paper, we propose a decentralized economic approach for dynamically adapting the cloud resources of various applications so as to statistically meet their SLA performance and availability goals under varying loads or failures. In our approach, the dynamic economic fitness of a Web service determines whether it is replicated, migrated to another server, or deleted. The economic fitness of a Web service depends on its individual performance constraints, its load, and the utilization of the resources where it resides. Cascading performance objectives are dynamically calculated for individual tasks in the application workflow according to the user requirements. Using a full implementation of our framework, we show experimentally that, unlike static resource settings, our adaptive approach statistically meets the performance objectives during peak load periods or failures.

 

 

Cloud 7 : BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing

Abstract

In recent years, large-scale data processing has become increasingly common in cloud computing systems such as MapReduce, Hadoop, and Dryad. In these systems, files are split into many small blocks, and each block is replicated on several servers. To process files efficiently, each job is divided into many tasks, and each task is allocated to a server to process one file block. Because network bandwidth is a scarce resource in these systems, enhancing task data locality (placing tasks on servers that hold their input blocks) is crucial for reducing job completion time. Although many approaches to improving data locality have been proposed, most of them are either greedy and ignore global optimization, or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), which first produces an initial task allocation and then gradually reduces the job completion time by tuning that allocation. By taking a global view, BAR can adjust data locality dynamically according to network state and cluster workload. Simulation results show that BAR can handle large problem instances in a few seconds and outperforms previous related algorithms in terms of job completion time.
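The two phases described above can be sketched as follows. This is a simplified illustration inspired by the description, not the paper's algorithm: it assumes unit tasks, a fixed `local_cost` for reading a block locally and a larger fixed `remote_cost` for reading it over the network (a hypothetical stand-in for network state), and job completion time equal to the most loaded server's finish time. `bar_schedule` is an illustrative name.

```python
from typing import List, Set, Tuple


def bar_schedule(tasks: List[Set[int]], n_servers: int,
                 local_cost: float, remote_cost: float) -> Tuple[List[int], float]:
    """tasks[t] is the set of servers holding task t's input block.
    Returns (server chosen for each task, job completion time = max server load)."""
    load = [0.0] * n_servers
    assign: List[int] = []

    # Balance phase: place each task on its least-loaded local replica.
    for replicas in tasks:
        s = min(replicas, key=lambda r: load[r])
        load[s] += local_cost
        assign.append(s)

    # Reduce phase: move a task from the most loaded server to the least
    # loaded one whenever that lowers the busiest server's finish time,
    # even if the moved task must then read its block remotely.
    while True:
        hi = max(range(n_servers), key=lambda s: load[s])
        lo = min(range(n_servers), key=lambda s: load[s])
        moved = False
        for t, s in enumerate(assign):
            if s != hi:
                continue
            old = local_cost if hi in tasks[t] else remote_cost
            new = local_cost if lo in tasks[t] else remote_cost
            if load[lo] + new < load[hi]:
                load[hi] -= old
                load[lo] += new
                assign[t] = lo
                moved = True
                break
        if not moved:
            return assign, max(load)
```

When all four blocks live on server 0, the balance phase yields a completion time of 4; the reduce phase gives up locality for one task (remote cost 2) and lowers it to 3, after which no further move helps.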

 

 

Cloud 8 : Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines

Abstract

In this paper we characterize the memory locality management behavior of scientific computing applications running in virtualized environments. In current solutions (KVM and Xen), NUMA locality is enforced by pinning virtual machines to CPUs and by providing NUMA-aware allocation in hypervisors. Our analysis shows that, due to two-level memory management and a lack of integration with page reclamation mechanisms, applications running on warm VMs suffer from a "leakage" of page locality. Our results using MPI, UPC and OpenMP implementations of the NAS Parallel Benchmarks, running on Intel and AMD NUMA systems, indicate that applications observe an overall average performance degradation of 55% compared with native execution. Runs on "cold" VMs suffer an average performance degradation of 27%, while subsequent runs are roughly 30% slower than the cold runs. We quantify the impact of locality improvement techniques designed for full virtualization environments: hypervisor-level page remapping and partitioning the NUMA domains between multiple virtual machines. Our analysis shows that hypervisor-only schemes have little or no potential for performance improvement. When the programming model allows it, system partitioning with proper VM and runtime support is able to reproduce native performance: in a partitioned system with one virtual machine per socket, the average workload performance is 5% better than native.

 

Cloud 9 : Classification and Composition of QoS Attributes in Distributed, Heterogeneous Systems

Abstract

In large-scale distributed systems, selecting the services and data sources that respond to a given request is a crucial task. Non-functional or Quality of Service (QoS) attributes need to be considered when several candidate services have identical functionality. Before any service selection optimization strategy is applied, the system has to be analyzed in terms of QoS metrics, much like the statistics needed by a database query optimizer. This paper presents a classification approach for QoS attributes of system components, from which aggregation functions for composite services are derived. The applicability and usefulness of the approach are demonstrated on a distributed system from a High-Energy Physics experiment that poses a complex service selection challenge.
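To make "aggregation functions derived from a classification" concrete, the sketch below uses a common, illustrative classification (not necessarily the paper's): additive attributes such as latency, multiplicative ones such as availability, and bottleneck ones such as throughput, each with one rule for sequential and one for parallel composition. The attribute names, the rule tables `SEQUENTIAL` and `PARALLEL`, and the assumption that all parallel branches are required are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Rule = Callable[[Sequence[float]], float]

# Illustrative classification: one aggregation function per attribute class
# and composition pattern.
SEQUENTIAL: Dict[str, Rule] = {
    "latency": sum,             # additive: delays accumulate along the chain
    "availability": math.prod,  # multiplicative: every service must be up
    "throughput": min,          # bottleneck: the slowest service limits the chain
}
PARALLEL: Dict[str, Rule] = {
    "latency": max,             # the composite waits for the slowest branch
    "availability": math.prod,  # assumes all branches are required
    "throughput": min,
}


def compose(services: List[Dict[str, float]], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Aggregate per-service QoS values into values for the composite service."""
    return {attr: fn([s[attr] for s in services]) for attr, fn in rules.items()}
```

Once each attribute is classified, a selection optimizer can evaluate any candidate composition without attribute-specific code, which is the role the abstract compares to a query optimizer's statistics.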

 

Cloud 10 : Cloud computing in Aircraft Data Network

Abstract

The introduction of data networks within aircraft has created several service opportunities for air carriers. Using the available Internet connectivity, carriers could offer services such as Video-on-Demand (VoD), Voice-over-IP (VoIP), and gaming-on-demand within the aircraft. One of the major roadblocks to implementing any of these services is the additional hardware and software they require. Each service requires dedicated hardware resources to run the appropriate software components. It is not possible to accommodate every hardware component within the aircraft due to space, power, and ventilation restrictions. It is also not economically viable to install and maintain hardware components for every aircraft. One solution is to use cloud computing. Cloud computing is a recent innovation in distributed computing that allows organizations to consolidate several hardware resources into one physical device, helping them reduce overall power consumption and maintenance costs. The cloud computing concept could be extended to the Aircraft Data Network environment, with every aircraft subscribing to cloud resources to run its non-mission-critical applications. In this paper, the authors explore the possibility of using cloud services for Aircraft Data Networks. The authors evaluate the performance issues involved with aircraft mobility and with dynamic resource transfer between servers when the aircraft's point of attachment changes. The authors predict that using cloud computing concepts would encourage many carriers to offer new services within the aircraft.

 

Cloud 11 : Dealing with Grid-Computing Authorization Using Identity-Based Certificateless Proxy Signature

Abstract

In this paper, we propose a new Identity-Based Certificateless Proxy Signature scheme for the grid environment that enables attribute-based authorization, fine-grained delegation, and enhanced delegation chain establishment and validation, all without relying on any kind of PKI certificates or proxy certificates. We show that our scheme is correct and secure. We also evaluate the computational and communication overhead of the proposed scheme. Simulations show satisfactory results.

 

Cloud 12 : Enabling Multi-physics Coupled Simulations within the PGAS Programming Framework

Abstract

Complex coupled multi-physics simulations are playing increasingly important roles in scientific and engineering applications such as fusion plasma and climate modeling. At the same time, extreme scales, high levels of concurrency, and the advent of multicore and many-core technologies are making the high-end parallel computing systems on which these simulations run hard to program. While Partitioned Global Address Space (PGAS) languages attempt to address this problem, the PGAS model does not easily support the coupling of multiple application codes, which coupled multi-physics simulations require. Furthermore, existing frameworks that support coupled simulations have been developed for fragmented programming models such as message passing, and are conceptually mismatched with the shared memory address space abstraction of the PGAS programming model. This paper explores how multi-physics coupled simulations can be supported within the PGAS programming framework. Specifically, we present the design and implementation of the XpressSpace programming system, which enables efficient and productive development of coupled simulations across multiple independent PGAS Unified Parallel C (UPC) executables. XpressSpace provides a global-view programming interface that is consistent with the memory model of UPC, and an efficient runtime system that dynamically captures the data decomposition of global-view arrays and enables fast exchange of parallel data structures between coupled codes. In addition, XpressSpace allows the coupling process to be defined in a specification file that is independent of the program source code. We evaluate the performance and scalability of the XpressSpace prototype implementation using different coupling patterns extracted from real-world multi-physics simulation scenarios on the Jaguar Cray XT5 system at Oak Ridge National Laboratory.
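Once the data decomposition of both coupled codes is known at runtime, the exchange reduces to intersecting the ranges each side owns. A minimal sketch, assuming both codes use a 1-D block decomposition of the same global array (XpressSpace handles general global-view arrays); `block_ranges` and `exchange_schedule` are hypothetical names.

```python
from typing import List, Tuple


def block_ranges(n: int, parts: int) -> List[Tuple[int, int]]:
    """Half-open [lo, hi) ranges of a 1-D block decomposition of n elements."""
    base, extra = divmod(n, parts)
    ranges, lo = [], 0
    for r in range(parts):
        hi = lo + base + (1 if r < extra else 0)  # first `extra` ranks get one more
        ranges.append((lo, hi))
        lo = hi
    return ranges


def exchange_schedule(n: int, producers: int, consumers: int) -> List[Tuple[int, int, int, int]]:
    """(src rank, dst rank, lo, hi) messages that redistribute the array."""
    msgs = []
    for p, (plo, phi) in enumerate(block_ranges(n, producers)):
        for c, (clo, chi) in enumerate(block_ranges(n, consumers)):
            lo, hi = max(plo, clo), min(phi, chi)
            if lo < hi:  # only overlapping ranges need a message
                msgs.append((p, c, lo, hi))
    return msgs
```

For a 10-element array written by 2 producer ranks and read by 5 consumer ranks, six messages suffice, and consumer rank 2 receives its elements from both producers.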

 

Cloud 13 : EZTrace: A Generic Framework for Performance Analysis

Abstract

Modern supercomputers with multi-core nodes enhanced by accelerators, as well as hybrid programming models, introduce more complexity into modern applications. Efficiently exploiting all the resources requires a complex analysis of application performance in order to detect time-consuming sections. We present eztrace, a generic trace generation framework that aims to provide a simple way to analyze applications. eztrace is based on plugins that allow it to trace different programming models such as MPI, pthread or OpenMP, as well as user-defined libraries or applications. eztrace works in two steps: basic information is collected during execution, and the analysis is performed post-mortem. This makes it possible to trace applications with low overhead while still allowing the analysis to be refined after execution. We also present a scripting language for eztrace that lets users easily define the functions to instrument without modifying the application's source code.
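The two-step design (record cheap raw events at run time, interpret them afterwards) can be mirrored in a few lines. eztrace itself targets compiled libraries such as MPI and OpenMP through plugins; this Python decorator is only an analogue of the idea, and `instrument`, `EVENTS` and `post_mortem` are hypothetical names.

```python
import time
from collections import defaultdict
from functools import wraps

EVENTS = []  # raw (function name, enter time, exit time) records


def instrument(fn):
    """Record entry/exit timestamps only; all interpretation is deferred."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            EVENTS.append((fn.__name__, t0, time.perf_counter()))
    return wrapper


def post_mortem(events):
    """After the run: aggregate call counts and total time per function."""
    stats = defaultdict(lambda: [0, 0.0])
    for name, t0, t1 in events:
        stats[name][0] += 1
        stats[name][1] += t1 - t0
    return dict(stats)
```

Keeping the run-time path down to two timestamps and a list append is what keeps overhead low, and because the raw events are preserved, `post_mortem` can later be replaced by a more refined analysis without re-running the application.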

 

 

 

Cloud 14 : Implementing Trust in Cloud Infrastructures

Abstract

Today's cloud computing infrastructures usually require customers who transfer data into the cloud to trust the providers of the cloud infrastructure. Not every customer is willing to grant this trust without justification. It should be possible to detect that at least the configuration of the cloud infrastructure (provided in the form of a hypervisor and administrative domain software) has not been changed without the customer's consent. We present a system that enables periodic and necessity-driven integrity measurements and remote attestation of vital parts of cloud computing infrastructures. Building on an analysis of several relevant attack scenarios, our system is implemented on top of the Xen Cloud Platform and uses trusted computing technology to provide security guarantees. We evaluate both the security and the performance of this system. We show how our system attests the integrity of a cloud infrastructure and detects all changes performed by system administrators in a typical software configuration, even in the presence of a simulated denial-of-service attack.
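Integrity measurement with trusted computing typically chains component hashes into a Platform Configuration Register (PCR), so that any change to any measured component, or to their order, changes the final value a verifier compares against. A minimal software simulation, assuming TPM 1.2-style SHA-1 PCRs (`PCR_new = SHA1(PCR_old || digest)`); a real deployment uses the TPM itself and signed quotes, and the component names in the usage are hypothetical.

```python
import hashlib
from typing import Iterable

PCR_SIZE = 20  # SHA-1 digest length used by TPM 1.2 PCRs


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM 1.2-style extend: the new PCR value hashes the old value with the new digest."""
    return hashlib.sha1(pcr + digest).digest()


def measure_chain(components: Iterable[bytes]) -> bytes:
    """Measure each component in order, starting from an all-zero (reset) PCR."""
    pcr = bytes(PCR_SIZE)
    for blob in components:
        pcr = extend(pcr, hashlib.sha1(blob).digest())
    return pcr
```

A verifier holding the known-good measurement list recomputes the chain and compares it with the reported value; modifying, replacing or reordering a component yields a different PCR.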

 

Cloud 15 : Improving Utilization of Infrastructure Clouds

Abstract

A key advantage of infrastructure-as-a-service (IaaS) clouds is providing users on-demand access to resources. To provide on-demand access, however, cloud providers must either significantly overprovision their infrastructure (and pay a high price for operating resources with low utilization) or reject a large proportion of user requests (in which case the access is no longer on-demand). At the same time, not all users require truly on-demand access to resources. Many applications and workflows are designed for recoverable systems where interruptions in service are expected. For instance, many scientists use high-throughput computing (HTC)-enabled resources, such as Condor, where jobs are dispatched to available resources and terminated when the resource is no longer available. We propose a cloud infrastructure that combines on-demand allocation of resources with opportunistic provisioning of cycles from idle cloud nodes to other processes by deploying backfill virtual machines (VMs). For demonstration and experimental evaluation, we extend the Nimbus cloud computing toolkit to deploy backfill VMs on idle cloud nodes for processing an HTC workload. Initial tests show an increase in IaaS cloud utilization from 37.5% to 100% during a portion of the evaluation trace, with only a 6.39% overhead cost for processing the HTC workload. We demonstrate that a shared infrastructure between IaaS cloud providers and an HTC job management system can be highly beneficial to both the IaaS cloud provider and HTC users, by increasing the utilization of the cloud infrastructure (thereby decreasing the overall cost) and contributing cycles that would otherwise be idle to processing HTC jobs.

Cloud 16 : Inferring Network Topologies in Infrastructure as a Service Cloud

Abstract

Infrastructure as a Service (IaaS) clouds are gaining increasing popularity as a platform for distributed computations. The virtualization layers of these clouds offer new possibilities for rapid resource provisioning, but also hide aspects of the underlying IT infrastructure that have often been exploited in classic cluster environments. One of those hidden aspects is the network topology, i.e., the way the rented virtual machines are physically interconnected inside the cloud. We propose an approach to infer the network topology connecting a set of virtual machines in IaaS clouds and to exploit it for data-intensive distributed applications. Our inference approach relies on delay-based end-to-end measurements and can be combined with traditional IP-level topology information, if available. We evaluate the inference accuracy using the popular hypervisors KVM and Xen and highlight possible performance gains for distributed applications.
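A very simple form of delay-based inference groups VMs whose pairwise round-trip time is below a threshold, on the assumption that low delay indicates a shared host or switch. This single-level sketch is an illustration only; the paper's inference is likely more elaborate, and the threshold, RTT values and `infer_groups` name are hypothetical. Union-find merges VMs into connected components.

```python
from typing import Dict, List, Tuple


def infer_groups(vms: List[str], rtt_ms: Dict[Tuple[str, str], float],
                 threshold_ms: float) -> List[List[str]]:
    """Group VMs whose measured RTT is below `threshold_ms` (connected components)."""
    parent = {v: v for v in vms}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (a, b), rtt in rtt_ms.items():
        if rtt < threshold_ms:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for v in vms:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

Raising the threshold merges groups into larger ones, so repeating the grouping at several thresholds yields a rough hierarchy that a data-intensive application could use for placement.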

 

Cloud 17 : MPI-IO/Gfarm: An Optimized Implementation of MPI-IO for the Gfarm File System

Abstract

This paper presents the design and implementation of an MPI-IO implementation for the Gfarm file system, called MPI-IO/Gfarm. The Gfarm file system is a global file system that federates the local storage of compute nodes across several clusters. It has a scale-out architecture designed to support distributed data-intensive computing. However, the Gfarm file system does not achieve scalable performance for parallel writes to a single file, a typical file operation in MPI-IO. This paper proposes an optimization technique that improves the performance of parallel writes to a single file. In our evaluation, MPI-IO/Gfarm achieves scalable parallel I/O performance.

 

Cloud 18 : Multiple Services Throughput Optimization in a Hierarchical Middleware

Abstract

Accessing the power of distributed resources can nowadays easily be done using middleware based on a client/server approach. Several architectures exist for such middleware, and the most scalable ones rely on a hierarchical design. Determining the best shape for the hierarchy, the one giving the best service throughput, is not an easy task. We first propose a computation and communication model for such hierarchical middleware. Our model takes into account the deployment of several services in the hierarchy. Then, based on this model, we propose algorithms for automatically constructing a hierarchy on two kinds of heterogeneous platforms: communication-homogeneous/computation-heterogeneous platforms, and fully heterogeneous platforms. The proposed algorithms aim to offer users the best ratio of obtained to requested throughput, while providing fairness on this ratio across the different kinds of services and using as few resources as possible for the hierarchy. For each kind of platform, we compare our model with experimental results on a real middleware called DIET (Distributed Interactive Engineering Toolbox).

 

 

Cloud 19 : Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems

Abstract

One-sided communication is important for enabling asynchronous communication and data movement in Global Address Space (GAS) programming models. Such communication is typically realized through direct messages between initiator and target processes. On petascale systems with tens of thousands of nodes and hundreds of thousands of cores, these direct messages require dedicated communication buffers and/or channels, which can lead to significant scalability challenges for GAS programming models. In this paper, we describe a network-friendly communication model, multinode cooperation, that enables indirect one-sided communication. Compute nodes work together to handle one-sided requests through (1) request forwarding, in which one node intercepts a request and forwards it to a target node, and (2) request aggregation, in which one node aggregates many requests to a target node. We have implemented multinode cooperation for a popular GAS runtime library, the Aggregate Remote Memory Copy Interface (ARMCI). Our experimental results on a large-scale Cray XT5 system demonstrate that multinode cooperation greatly increases memory scalability by reducing the communication buffers required on each node. In addition, multinode cooperation improves the resilience of the GAS runtime system to network contention. Furthermore, multinode cooperation can benefit the performance of scientific applications: in one case, it reduces the total execution time of an NWChem application by 52%.
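Request aggregation at a forwarding node can be pictured as batching incoming requests by target, so that each target receives one combined message instead of one message per originating node. A minimal in-process sketch; the request tuple layout and the `aggregate` name are hypothetical and do not reflect ARMCI's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[int, int, bytes]  # (source node, target node, payload)


def aggregate(requests: List[Request]) -> Dict[int, List[Tuple[int, bytes]]]:
    """Forwarding node: batch requests into one message per target node."""
    batches: Dict[int, List[Tuple[int, bytes]]] = defaultdict(list)
    for src, target, payload in requests:
        batches[target].append((src, payload))  # keep the source for the reply path
    return dict(batches)
```

Because a target now exchanges messages only with forwarding nodes rather than with every possible initiator, it needs fewer dedicated buffers, which is the memory-scalability effect described above.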

 

Cloud 20 : On the Performance Variability of Production Cloud Services

Abstract

Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through virtualization and resource time-sharing, clouds serve a large user base with diverse needs using a single set of physical resources. Thus, clouds have the potential to provide their owners the benefits of an economy of scale and, at the same time, to become an alternative to self-owned clusters, grids, and parallel production environments for both industry and the scientific community. For this potential to become reality, the first generation of commercial clouds needs to be proven dependable. In this work we analyze the dependability of cloud services. To this end, we analyze long-term performance traces from Amazon Web Services and Google App Engine, currently two of the largest commercial clouds in production. We find that the performance of about half of the cloud services we investigate exhibits yearly and daily patterns, but also that most services have periods of especially stable performance. Finally, through trace-based simulation we assess the impact of the variability observed for the studied cloud services on three large-scale applications: job execution in scientific computing, virtual goods trading in social networks, and state management in social gaming. We show that the impact of performance variability depends on the application, and we give evidence that performance variability can be an important factor in cloud provider selection.

 

Cloud 21 : On the Relation between Congestion Control, Switch Arbitration and Fairness

Abstract

In lossless interconnection networks such as InfiniBand, congestion control (CC) can be an effective mechanism for achieving high performance and good utilization of network resources. The InfiniBand standard describes CC functionality for detecting and resolving congestion, but the design decisions on how to implement this functionality are left to the hardware designer. As our study shows, these design decisions must be made carefully to avoid introducing fairness problems. In this paper we study the relationship between congestion control, switch arbitration, and fairness. Specifically, we look at fairness among different traffic flows arriving at a hot-spot switch on different input ports when CC is turned on. In addition, we study fairness among traffic flows at a switch where some flows are exclusive users of their input ports while other flows share an input port (the parking lot problem). Our results show that the implementation of congestion control in a switch is vulnerable to unfairness if care is not taken. In particular, we found that a threshold hysteresis of more than one MTU is needed to resolve arbitration unfairness. Furthermore, fully solving the parking lot problem requires proper configuration of the CC parameters.
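Threshold hysteresis means that a port enters the congested state when its queue reaches a threshold but leaves it only after the queue drops a further margin below that threshold, which prevents the state from toggling on every packet. A generic sketch of such a detector; it does not reproduce the InfiniBand CC state machine, and the byte values, MTU and `CongestionMarker` name are hypothetical.

```python
class CongestionMarker:
    """Generic congestion-state detector with a threshold and hysteresis (bytes)."""

    def __init__(self, threshold: int, hysteresis: int):
        self.threshold = threshold
        self.hysteresis = hysteresis
        self.marking = False

    def update(self, queue_bytes: int) -> bool:
        """Feed the current queue occupancy; return whether the port is marking."""
        if not self.marking and queue_bytes >= self.threshold:
            self.marking = True
        elif self.marking and queue_bytes < self.threshold - self.hysteresis:
            self.marking = False
        return self.marking
```

With a hysteresis of two 2048-byte MTUs, a queue oscillating just below the threshold keeps the port marking; with zero hysteresis, the same queue trace flips the state off as soon as one packet drains.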

 

 

 

Cloud 22 : Open Social Based Collaborative Science Gateways

Abstract

In data-driven science projects, researchers distributed across different institutions often wish to team up easily to share data and computing resources in order to address challenging scientific problems. Typical VO-based authorization schemes are not suitable for such user-organized scientific collaboration. Using the emerging OAuth protocol, we introduce a novel group authorization scheme that supports ad-hoc team formation and user-controlled resource sharing. Integrating this group authorization scheme, we define an Open Social based scientific collaboration framework and develop a science gateway prototype named Open Life Science Gateway (OLSGW) to verify and refine the framework. Our experience developing OLSGW shows that an OAuth 2.0 based group authorization scheme is a very promising approach to resource sharing in Cloud environments, and that the Open Social based framework can help science gateway developers create domain-specific collaborative applications in a very flexible way.

 

Cloud 23 : Optimized Management of Power and Performance for Virtualized Heterogeneous Server Clusters

Abstract

This paper proposes and evaluates an approach to power and performance management in virtualized server clusters. The major goal of our approach is to reduce power consumption in the cluster while meeting performance requirements. The contributions of this paper are: (1) a simple but effective way of modeling the power consumption and capacity of servers, even under heterogeneous and changing workloads, and (2) an optimization strategy based on a mixed integer programming model for improving power efficiency while providing performance guarantees in the virtualized cluster. In the optimization model, we address application workload balancing and the often ignored switching costs incurred by frequent, undesirable server on/off transitions and VM relocations. We show the effectiveness of the approach on a server cluster test bed. Our experiments show that our approach saves about 50% of the energy required by a system designed for the peak workload scenario, with little impact on the applications' performance goals. Moreover, by using prediction in our optimization strategy, we achieved further QoS improvement.

 

Cloud 24 : PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers

Abstract

Cache replacement policy plays an important role in guaranteeing the availability of cache blocks, reducing miss rates, and improving applications' overall performance. However, recent research efforts to improve replacement policies require either significant additional hardware or major modifications to the organization of the existing cache. In this study, we propose the PAC-PLRU cache replacement policy. PAC-PLRU not only utilizes but also judiciously salvages the prediction information discarded from a widely adopted stride prefetcher. The main idea behind PAC-PLRU is to use the prediction results generated by the existing stride prefetcher and to prevent these predicted cache blocks from being replaced in the near future. Experimental results show that using PAC-PLRU with a stride prefetcher reduces the average L2 cache miss rate by 91% compared with a baseline system using only the PLRU policy, and by 22% compared with a system using PLRU with an unconnected stride prefetcher. Most importantly, PAC-PLRU requires only minor modifications to the existing cache architecture to obtain these benefits. The proposed PAC-PLRU policy is promising for fostering the connection between prefetching and replacement policies and for having a lasting impact on improving overall cache performance.
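The core idea, keeping blocks the prefetcher has predicted from being chosen as victims, can be sketched on a single cache set. For brevity this sketch uses bit-PLRU (one MRU bit per way) rather than the tree-PLRU common in hardware, and it models the prefetcher's predictions as a hypothetical `protected` set of tags passed in on each access; it illustrates the protection idea, not the exact PAC-PLRU hardware.

```python
from typing import AbstractSet, List, Optional


class PlruSet:
    """One cache set with bit-PLRU replacement and prefetch-aware victim protection."""

    def __init__(self, ways: int):
        self.tags: List[Optional[int]] = [None] * ways
        self.mru = [False] * ways

    def _touch(self, way: int) -> None:
        self.mru[way] = True
        if all(self.mru):  # saturated: clear every bit except the one just set
            self.mru = [w == way for w in range(len(self.mru))]

    def _victim(self, protected: AbstractSet[int]) -> int:
        ways = range(len(self.tags))
        for w in ways:
            if self.tags[w] is None:  # fill empty ways first
                return w
        # Prefer a not-recently-used block that the prefetcher did not predict.
        for w in ways:
            if not self.mru[w] and self.tags[w] not in protected:
                return w
        for w in ways:  # every candidate is protected: fall back to plain PLRU
            if not self.mru[w]:
                return w
        return 0  # only reachable for a 1-way set

    def access(self, tag: int, protected: AbstractSet[int] = frozenset()) -> bool:
        """Return True on a hit; `protected` holds tags predicted by the prefetcher."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        w = self._victim(protected)
        self.tags[w] = tag
        self._touch(w)
        return False
```

In a 4-way set holding tags 1 to 4, plain bit-PLRU would evict tag 2 on the next miss; if the prefetcher has predicted tag 2, the protected variant evicts tag 3 instead, so the predicted block is still resident when it is used.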

 

Cloud 25 : Performance under Failures of MapReduce Applications

Abstract

The MapReduce programming paradigm has gained increasing popularity in recent years due to its support for easy programming, data distribution, and fault tolerance. Failure is an unwanted but inevitable fact that all large-scale parallel computing systems have to face. MapReduce introduces a novel data replication and task re-execution strategy for fault tolerance. This study aims to provide a better understanding of such fault tolerance mechanisms. In particular, we build a stochastic performance model to quantify the impact of failures on MapReduce applications and to investigate the effectiveness of these mechanisms under different computing environments. We also carried out simulations to verify the accuracy of the proposed model. Our results show that data replication is an effective approach even when the failure rate is high, and that the task migration mechanism of MapReduce works well in balancing the reliability differences among individual nodes. This work provides a theoretical foundation for optimizing large-scale MapReduce applications, especially when fault tolerance is a concern.
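Two textbook building blocks show the kind of quantities such a model works with; they are not the paper's model. If failures arrive as a Poisson process with rate λ and a failed task restarts from scratch with zero repair time, a task of length t has expected completion time

(e^{λt} − 1) / λ,

which tends to t as λ approaches 0. If each node holding a replica fails independently with probability p, a block with r replicas is lost with probability p^r. The function names are illustrative.

```python
import math


def expected_task_time(t: float, failure_rate: float) -> float:
    """Expected wall time of a task of length t that restarts from scratch on
    each failure (exponential failures, zero repair time)."""
    if failure_rate == 0:
        return t
    return (math.exp(failure_rate * t) - 1.0) / failure_rate


def data_loss_probability(node_failure_prob: float, replicas: int) -> float:
    """Probability that every replica of a block is lost, assuming independent failures."""
    return node_failure_prob ** replicas
```

A 10-unit task with failure rate 0.1 takes about 17.2 units on average, whereas three-way replication with a 10% node failure probability loses a block only with probability 0.001. This contrast matches the intuition that replication remains effective even at high failure rates.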

 

Cloud 26 : Small Discrete Fourier Transforms on GPUs

 

Abstract

Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data sizes. On the other hand, several applications perform multiple DFTs on small data sizes. In fact, even algorithms for large data sizes use a divide-and-conquer approach, in which small DFTs eventually need to be performed. We discuss our DFT implementation, which is efficient for multiple small DFTs. One feature of our implementation is the use of the asymptotically slow matrix multiplication approach for small data sizes, which improves performance on the GPU due to its regular memory access and computational patterns. We combine this algorithm with the mixed radix algorithm for 1-D, 2-D, and 3-D complex DFTs. We also demonstrate the effect of different optimization techniques. When GPUs are used to accelerate a component of an application running on the host, it is important that decisions taken to optimize GPU performance do not degrade the performance of the rest of the application on the host. Another feature of our implementation is that it uses a data layout that is not optimal for the GPU so that the overall effect on the application is better. Our implementation performs up to two orders of magnitude faster than cuFFT on an NVIDIA GeForce 9800 GTX GPU and one to two orders of magnitude faster than FFTW on a CPU for multiple small DFTs. Furthermore, we show that our implementation can accelerate a Quantum Monte Carlo application for which cuFFT is not effective. The primary contributions of this work lie in demonstrating the utility of the matrix multiplication approach and in providing an implementation that is efficient for small DFTs when a GPU is used to accelerate an application running on the host.
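The matrix multiplication approach can be sketched in plain Python: build the n×n DFT matrix W[r][c] = exp(-2πi·rc/n) once and apply it to every small signal in the batch. This O(n²)-per-transform sketch only illustrates the idea; it does not reproduce the paper's GPU kernels, its mixed-radix combination, or its host-friendly data layout, and the function names are hypothetical.

```python
import cmath


def dft_matrix(n):
    """n x n DFT matrix with entries W[r][c] = exp(-2*pi*i*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)] for r in range(n)]


def batched_small_dft(signals):
    """Apply one shared DFT matrix to a batch of equal-length signals.

    On a GPU the whole batch becomes one regular matrix-matrix product,
    which is what makes the asymptotically slow method attractive for small n.
    """
    if not signals:
        return []
    n = len(signals[0])
    if any(len(x) != n for x in signals):
        raise ValueError("all signals in a batch must have the same length")
    w = dft_matrix(n)  # built once, reused for every signal in the batch
    return [[sum(w[r][c] * x[c] for c in range(n)) for r in range(n)] for x in signals]
```

For example, the transform of the impulse [1, 0, 0, 0] is all ones, and the transform of [1, 1, 1, 1] is [4, 0, 0, 0] up to rounding error.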

 

Cloud 27 : Sophia: Local Trust for Securing Routing in DHTs

Abstract

Distributed Hash Tables (DHTs) are a common building block in many distributed applications, including Cloud and Grid systems. However, important security vulnerabilities still hinder their adoption in today's large-scale computing platforms. For instance, routing vulnerabilities have been the subject of intensive research, but existing solutions rely on redundancy rather than on improving the quality of routing paths. In this paper, we present Sophia, a novel generic security technique that combines iterative routing with local trust to fortify routing in DHTs. Sophia relies solely on first-hand observations of the success or failure of a node's own lookups to improve forwarding paths. Moreover, unlike redundant routing, Sophia dynamically protects routing without introducing additional network overhead. To the best of our knowledge, this is the first work that exploits a local trust system to fortify routing in DHTs. We compared the performance of Sophia with that of redundant routing in the Kademlia DHT and obtained significant improvements in routing resilience, self-adjustment, and network traffic reduction.
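A minimal sketch of how first-hand lookup outcomes could steer iterative routing, assuming a Laplace-smoothed success rate as the local trust score and a rule that picks the most trusted peers among the few candidates closest to the target in Kademlia's XOR metric. The scoring rule, the `window` and `alpha` parameters, and the class and function names are illustrative assumptions, not Sophia's published algorithm.

```python
from collections import defaultdict


class LocalTrust:
    """Per-peer trust built only from this node's own lookup outcomes."""

    def __init__(self):
        self.success = defaultdict(int)
        self.failure = defaultdict(int)

    def record(self, peer, ok):
        if ok:
            self.success[peer] += 1
        else:
            self.failure[peer] += 1

    def score(self, peer):
        # Laplace-smoothed success rate: peers never observed start at 0.5.
        s, f = self.success[peer], self.failure[peer]
        return (s + 1) / (s + f + 2)


def choose_next_hops(candidates, target, trust, window=3, alpha=2):
    """Among the `window` candidates closest to `target` (XOR distance),
    return the `alpha` most trusted ones, breaking ties by distance."""
    closest = sorted(candidates, key=lambda peer: peer ^ target)[:window]
    return sorted(closest, key=lambda peer: (-trust.score(peer), peer ^ target))[:alpha]
```

Because the scores come only from the node's own lookups, no extra messages are exchanged: a peer that repeatedly fails lookups simply drifts down the ranking and is skipped in favor of slightly farther but more reliable peers.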

 

Cloud 28 : Supporting Federated Multi-authority Security Models

Abstract

The JISC-funded Shintau project has produced an extension to the Shibboleth profile that allows a user to link together information from more than one IdP using a custom Linking Service (LS). This paper describes both the application and the independent evaluation of this software by the National e-Science Centre (NeSC) at the University of Glasgow within the context of the ESRC-funded Data Management through e-Social Science (DAMES) project.

 

Cloud 29 : The Grid Observatory

Abstract

The goal of the Grid Observatory project (GO) is to contribute to an experimental theory of large grid systems by integrating the collection of data on the behaviour of the flagship European Grid Infrastructure (EGI) and its users, the development of models, and an ontology for the domain knowledge. The GO gives the wider computer science community access to a database of grid usage traces without the need for grid credentials. The paper presents the architecture of the digital curation process enacted by the GO, together with examples of how the curated data are exploited.

 

Cloud 32 : APP: Minimizing Interference Using Aggressive Pipelined Prefetching in Multi-level Buffer Caches

Abstract

As services become more complex, with multiple interactions, and storage servers are shared by multiple services, the I/O streams arising from these services compete for disk attention. Storage clients enabled with Aggressive Pipelined Prefetching (APP) are designed to manage the buffer cache and I/O streams so as to minimize the disk I/O interference arising from competing streams. Because a storage server services a large number of streams, most of the disk time is spent seeking, which degrades response times. The goal of APP is to decrease application execution time by increasing the throughput of individual I/O streams and by using idle capacity on remote nodes along with idle network time, thereby avoiding alternating bursts of activity and periods of inactivity. APP significantly increases overall I/O throughput and decreases overall messaging overhead between servers. In APP, the intelligence is embedded in the clients, which automatically infer parameters in order to achieve maximum throughput. APP clients use aggressive prefetching and data offloading to remote buffer caches in multi-level buffer cache hierarchies in an effort to minimize disk interference and mitigate the side effects of aggressive prefetching. We implemented our scheme on a 16-node Linux cluster and evaluated it with Radix-k, an extremely I/O-intensive application developed at Argonne National Laboratory and employed in studies on the scalability of parallel image composition and particle tracing, using data sets of up to 128 GB. We observed that the execution time of the application decreased by 68% on average when using our scheme.

Cloud 33 : ASDF: An Autonomous and Scalable Distributed File System

Abstract

The demand for huge storage space for data-intensive applications and high-performance scientific computing continues to grow. Integrating massive distributed storage resources to provide huge storage space is an important and challenging issue in Cloud and Grid computing. In this paper, we propose a distributed file system, called ASDF, to meet the demands of not only data-intensive applications but also end users, developers, and administrators. While it shares many goals with previous distributed file systems, such as scalability, reliability, and performance, it is also designed with an emphasis on compatibility, extensibility, and autonomy. With these design goals in mind, we address several issues and present a design that adopts peer-to-peer technology, replication, multi-source data transfer, metadata caching, and a service-oriented architecture. The experimental results show that the proposed distributed file system meets our design goals and can be useful in Cloud and Grid computing.

 

Cloud 34 : Assertion Based Parallel Debugging

Abstract

Programming languages have advanced tremendously over the years, but program debuggers have hardly changed. Sequential debuggers do little more than allow a user to control the flow of a program and examine its state. Parallel debuggers support the same operations on multiple processes; this is adequate with a small number of processors but becomes unwieldy and ineffective on very large machines. Typical scientific codes have enormous multi-dimensional data structures, and it is impractical to expect a user to view the data using traditional display techniques. In this paper we discuss the use of debug-time assertions and show that they can be used to debug parallel programs. These techniques reduce debugging complexity because they reason about the state of large arrays without requiring the user to know the expected value of every element. Assertions can be expensive to evaluate, but their performance can be improved by running them in parallel. We demonstrate the system with a case study in which we find errors in a parallel version of the Shallow Water Equations, and we evaluate the performance of the tool on a 4,096-core Cray XE6.
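A debug-time assertion over a large array can be sketched as a predicate checked chunk by chunk in parallel, with the violating indices gathered at the end, so the user only inspects the elements that break the property. This Python version uses a `concurrent.futures.ThreadPoolExecutor` purely to show the decomposition (CPython threads will not speed up pure-Python predicates); the function names and chunking scheme are hypothetical and are not the tool described in the paper, which runs on a distributed-memory machine.

```python
from concurrent.futures import ThreadPoolExecutor


def _check_chunk(data, start, stop, predicate):
    """Return the indices in [start, stop) whose element violates the predicate."""
    return [i for i in range(start, stop) if not predicate(data[i])]


def parallel_assert(data, predicate, workers=4):
    """Evaluate `predicate` on every element in parallel chunks.

    Returns the sorted list of violating indices; an empty list means
    the assertion holds for the whole array.
    """
    n = len(data)
    chunk = max(1, -(-n // workers))  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: _check_chunk(data, b[0], b[1], predicate), bounds)
        violations = sorted(i for part in parts for i in part)
    return violations
```

For instance, a hypothetical shallow-water check that every water height is non-negative would be `parallel_assert(heights, lambda h: h >= 0)`, which reports only the offending cells instead of requiring the user to know every expected value.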

 

 

Cloud 36 : Autonomic SLA-Driven Provisioning for Cloud Applications

Abstract

Significant progress has been made in the automated allocation of cloud resources. However, the performance of applications may be poor during peak load periods unless their cloud resources are dynamically adjusted. Moreover, although cloud resources dedicated to different applications are virtually isolated, performance fluctuations do occur because of resource sharing and software or hardware failures (e.g., unstable virtual machines or power outages). In this paper, we propose a decentralized economic approach for dynamically adapting the cloud resources of various applications so as to statistically meet their SLA performance and availability goals in the presence of varying loads or failures. In our approach, the dynamic economic fitness of a Web service determines whether it is replicated, migrated to another server, or deleted. The economic fitness of a Web service depends on its individual performance constraints, its load, and the utilization of the resources where it resides. Cascading performance objectives are dynamically calculated for individual tasks in the application workflow according to the user requirements. Using a full implementation of our framework, we show experimentally that, unlike static resource settings, our adaptive approach statistically meets the performance objectives during peak load periods or failures.

 

 

Cloud 37 : BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing

Abstract

Large-scale data processing has become increasingly common in recent years in cloud computing systems such as MapReduce, Hadoop, and Dryad. In these systems, files are split into many small blocks, and all blocks are replicated over several servers. To process files efficiently, each job is divided into many tasks, and each task is allocated to a server to process a file block. Because network bandwidth is a scarce resource in these systems, enhancing task data locality (placing tasks on servers that contain their input blocks) is crucial for the job completion time. Although there have been many approaches to improving data locality, most of them are either greedy and ignore global optimization or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), which first produces an initial task allocation and then gradually reduces the job completion time by tuning that allocation. By taking a global view, BAR can adjust data locality dynamically according to the network state and cluster workload. The simulation results show that BAR is able to handle large problem instances in a few seconds and outperforms previous related algorithms in terms of job completion time.
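The two-phase structure can be sketched under simplifying assumptions: every task costs `local_cost` on a server that holds its input block and `remote_cost` elsewhere, and the job completion time is the largest per-server load. The balance phase greedily places each task on its least-loaded replica holder; the reduce phase repeatedly moves a task from the busiest server to the idlest one, accepting a (possibly remote) placement only while the idlest server stays below the current maximum. This illustrates the balance-then-tune idea only; it is not the paper's exact BAR procedure or network model, and the function name is hypothetical.

```python
def bar_schedule(task_replicas, servers, local_cost, remote_cost):
    """Return (assignment, makespan); task_replicas[task] is the set of
    servers holding that task's input block."""
    if local_cost <= 0 or remote_cost < local_cost:
        raise ValueError("expect 0 < local_cost <= remote_cost")

    def cost(task, server):
        return local_cost if server in task_replicas[task] else remote_cost

    load = {s: 0 for s in servers}
    assign = {}
    # Balance phase: data-local initial allocation on the least-loaded holder.
    for task, holders in task_replicas.items():
        target = min(holders, key=lambda h: (load[h], h))
        assign[task] = target
        load[target] += local_cost

    # Reduce phase: shift work from the busiest to the idlest server while it helps.
    while True:
        busiest = max(servers, key=lambda s: (load[s], s))
        idlest = min(servers, key=lambda s: (load[s], s))
        movable = [t for t, s in assign.items() if s == busiest]
        if busiest == idlest or not movable:
            break
        # Prefer a task that would remain data-local on the idlest server.
        task = min(movable, key=lambda t: (cost(t, idlest), t))
        if load[idlest] + cost(task, idlest) >= load[busiest]:
            break  # moving would not lower the maximum load
        load[busiest] -= cost(task, busiest)
        load[idlest] += cost(task, idlest)
        assign[task] = idlest
    return assign, max(load.values())
```

With four tasks whose blocks all live on server A, a local cost of 1, and a remote cost of 2, the balance phase yields a makespan of 4; the reduce phase moves one task to B (load 2) and stops at a makespan of 3, because moving a second task would make B the new bottleneck.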

 

 

 

Cloud 38 : Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines

Abstract

In this paper we characterize the memory locality management behavior of scientific computing applications running in virtualized environments. NUMA locality in current solutions (KVM and Xen) is enforced by pinning virtual machines to CPUs and providing NUMA-aware allocation in hypervisors. Our analysis shows that, due to two-level memory management and the lack of integration with page reclamation mechanisms, applications running on warm VMs suffer from a "leakage" of page locality. Our results using MPI, UPC, and OpenMP implementations of the NAS Parallel Benchmarks, running on Intel and AMD NUMA systems, indicate that applications observe an overall average performance degradation of 55% compared to native execution. Runs on "cold" VMs suffer an average performance degradation of 27%, while subsequent runs are roughly 30% slower than the cold runs. We quantify the impact of locality improvement techniques designed for full virtualization environments: hypervisor-level page remapping and partitioning the NUMA domains between multiple virtual machines. Our analysis shows that hypervisor-only schemes have little or no potential for performance improvement. When the programming model allows it, system partitioning with proper VM and runtime support is able to reproduce native performance: in a partitioned system with one virtual machine per socket, the average workload performance is 5% better than native.

 

Cloud 39 : Classification and Composition of QoS Attributes in Distributed, Heterogeneous Systems

Abstract

In large-scale distributed systems, the selection of services and data sources to respond to a given request is a crucial task. Non-functional or Quality of Service (QoS) attributes need to be considered when there are several candidate services with identical functionality. Before applying any service selection optimization strategy, the system has to be analyzed in terms of QoS metrics, comparable to the statistics needed by a database query optimizer. This paper presents a classification approach for QoS attributes of system components, from which aggregation functions for composite services are derived. The applicability and usefulness of the approach are demonstrated on a distributed system from a High-Energy Physics experiment that poses a complex service selection challenge.
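To illustrate what class-specific aggregation functions look like, the sketch below applies the common textbook rules for services invoked in sequence: additive attributes (such as response time) sum, multiplicative ones (such as availability) multiply, and bottleneck attributes (such as throughput) take the minimum. The class labels and the attribute-to-class mapping are assumptions for illustration, not the classification derived in the paper.

```python
import math

# Aggregation rule per attribute class for services invoked in sequence.
AGGREGATORS = {
    "additive": sum,              # e.g. response time, monetary cost
    "multiplicative": math.prod,  # e.g. availability, success probability
    "min": min,                   # e.g. throughput, bandwidth
}

# Hypothetical mapping from attribute name to attribute class.
ATTRIBUTE_CLASS = {
    "response_time": "additive",
    "availability": "multiplicative",
    "throughput": "min",
}


def aggregate_sequence(services):
    """Compose per-service QoS dicts into the QoS of the whole chain."""
    return {
        attr: AGGREGATORS[cls](s[attr] for s in services)
        for attr, cls in ATTRIBUTE_CLASS.items()
    }
```

For parallel branches the rules change (response time, for instance, aggregates with `max` rather than `sum`), which is why the composition structure matters as much as the attribute class.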

 

Cloud 40 : Cloud computing in Aircraft Data Network

Abstract

The introduction of data networks within aircraft has created several service opportunities for air carriers. Using the available Internet connectivity, carriers could offer services such as Video-on-Demand (VoD), Voice-over-IP (VoIP), and gaming-on-demand within the aircraft. One of the major roadblocks to implementing any of these services is the additional hardware and software they require. Each service requires dedicated hardware resources to run the appropriate software components. It is not possible to accommodate every hardware component within the aircraft due to space, power, and ventilation restrictions. Also, it is not economically viable to install and maintain hardware components in every aircraft. One solution is to use cloud computing, a recent innovation in distributed computing. Cloud computing allows organizations to consolidate several hardware resources into one physical device, which helps them reduce overall power consumption and maintenance costs. The cloud computing concept could be extended to the Aircraft Data Network environment, with every aircraft subscribing to cloud resources to run its non-mission-critical applications. In this paper, the authors explore the possibility of using cloud services for Aircraft Data Networks. The authors evaluate the performance issues involved with aircraft mobility and dynamic resource transfer between servers when the aircraft's point of attachment changes. The authors predict that using cloud computing concepts would encourage many carriers to offer new services within the aircraft.

 

Cloud 41 : Dealing with Grid-Computing Authorization Using Identity-Based Certificateless Proxy Signature

Abstract

In this paper, we propose a new Identity-Based Certificateless Proxy Signature scheme for the grid environment that enables attribute-based authorization, fine-grained delegation, and enhanced delegation chain establishment and validation, all without relying on any kind of PKI certificates or proxy certificates. We show that our scheme is correct and secure. We also evaluate the computational and communication overhead of the proposed scheme. Simulations show satisfactory results.

 

Cloud 42 : Enabling Multi-physics Coupled Simulations within the PGAS Programming Framework

Abstract

Complex coupled multi-physics simulations are playing increasingly important roles in scientific and engineering applications such as fusion plasma and climate modeling. At the same time, extreme scales, high levels of concurrency, and the advent of multicore and many-core technologies are making the high-end parallel computing systems on which these simulations run hard to program. While Partitioned Global Address Space (PGAS) languages attempt to address this problem, the PGAS model does not easily support the coupling of multiple application codes, which is necessary for coupled multi-physics simulations. Furthermore, existing frameworks that support coupled simulations have been developed for fragmented programming models such as message passing and are conceptually mismatched with the shared memory address space abstraction of the PGAS programming model. This paper explores how multi-physics coupled simulations can be supported within the PGAS programming framework. Specifically, we present the design and implementation of the XpressSpace programming system, which enables efficient and productive development of coupled simulations across multiple independent PGAS Unified Parallel C (UPC) executables. XpressSpace provides a global-view programming interface that is consistent with the memory model in UPC, and an efficient runtime system that can dynamically capture the data decomposition of global-view arrays and enable fast exchange of parallel data structures between coupled codes. In addition, XpressSpace provides the flexibility to define the coupling process in a specification file that is independent of the program source code. We evaluate the performance and scalability of the XpressSpace prototype implementation using different coupling patterns extracted from real-world multi-physics simulation scenarios on the Jaguar Cray XT5 system at Oak Ridge National Laboratory.

 

Cloud 43 : EZTrace: A Generic Framework for Performance Analysis

Abstract

Modern supercomputers with multi-core nodes enhanced by accelerators, as well as hybrid programming models, introduce more complexity into modern applications. Efficiently exploiting all the resources requires a complex analysis of application performance in order to detect time-consuming sections. We present eztrace, a generic trace generation framework that aims to provide a simple way to analyze applications. eztrace is based on plugins that allow it to trace different programming models such as MPI, pthread, or OpenMP, as well as user-defined libraries or applications. eztrace works in two steps: basic information is collected during execution, and the analysis is performed post-mortem. This permits tracing the execution of applications with low overhead while allowing the analysis to be refined after the execution. We also present a script language for eztrace that lets the user easily define the functions to instrument without modifying the source code of the application.

 

 

 

Cloud 44 : Implementing Trust in Cloud Infrastructures

Abstract

Today's cloud computing infrastructures usually require customers who transfer data into the cloud to trust the providers of the cloud infrastructure. Not every customer is willing to grant this trust without justification. It should at least be possible to detect whether the configuration of the cloud infrastructure, as provided in the form of a hypervisor and administrative domain software, has been changed without the customer's consent. We present a system that enables periodic and necessity-driven integrity measurements and remote attestations of vital parts of cloud computing infrastructures. Building on the analysis of several relevant attack scenarios, our system is implemented on top of the Xen Cloud Platform and makes use of trusted computing technology to provide security guarantees. We evaluate both the security and the performance of this system. We show how our system attests the integrity of a cloud infrastructure and detects all changes performed by system administrators in a typical software configuration, even in the presence of a simulated denial-of-service attack.

 

Cloud 45 : Improving Utilization of Infrastructure Clouds

Abstract

A key advantage of infrastructure-as-a-service (IaaS) clouds is providing users on-demand access to resources. To provide on-demand access, however, cloud providers must either significantly overprovision their infrastructure (and pay a high price for operating resources with low utilization) or reject a large proportion of user requests (in which case the access is no longer on-demand). At the same time, not all users require truly on-demand access to resources. Many applications and workflows are designed for recoverable systems where interruptions in service are expected. For instance, many scientists use high-throughput computing (HTC)-enabled resources, such as Condor, where jobs are dispatched to available resources and terminated when the resource is no longer available. We propose a cloud infrastructure that combines on-demand allocation of resources with opportunistic provisioning of cycles from idle cloud nodes to other processes by deploying backfill virtual machines (VMs). For demonstration and experimental evaluation, we extend the Nimbus cloud computing toolkit to deploy backfill VMs on idle cloud nodes for processing an HTC workload. Initial tests show an increase in IaaS cloud utilization from 37.5% to 100% during a portion of the evaluation trace, with only a 6.39% overhead cost for processing the HTC workload. We demonstrate that a shared infrastructure between IaaS cloud providers and an HTC job management system can be highly beneficial to both the IaaS cloud provider and HTC users by increasing the utilization of the cloud infrastructure (thereby decreasing the overall cost) and contributing cycles that would otherwise be idle to processing HTC jobs.

Cloud 46 : Inferring Network Topologies in Infrastructure as a Service Cloud

Abstract

Infrastructure as a Service (IaaS) clouds are gaining popularity as a platform for distributed computations. The virtualization layers of these clouds offer new possibilities for rapid resource provisioning, but also hide aspects of the underlying IT infrastructure that have often been exploited in classic cluster environments. One of these hidden aspects is the network topology, i.e., the way the rented virtual machines are physically interconnected inside the cloud. We propose an approach to infer the network topology connecting a set of virtual machines in IaaS clouds and exploit it for data-intensive distributed applications. Our inference approach relies on delay-based end-to-end measurements and can be combined with traditional IP-level topology information, if available. We evaluate the inference accuracy using the popular hypervisors KVM and Xen and highlight possible performance gains for distributed applications.
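A much-simplified illustration of delay-based inference: treat VMs whose measured pairwise round-trip time falls below a threshold as sharing a nearby switch, and group them with union-find (single-link clustering). The threshold, the single-level grouping, and the function name are assumptions; the paper's approach is more elaborate and can also incorporate IP-level information.

```python
def infer_groups(rtt, threshold):
    """Group VMs whose measured RTT is below `threshold`.

    `rtt` maps frozenset({a, b}) -> round-trip time for each measured pair.
    Returns a sorted list of sorted groups of VM names.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for pair, delay in rtt.items():
        a, b = sorted(pair)
        find(a)
        find(b)
        if delay < threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for vm in parent:
        groups.setdefault(find(vm), []).append(vm)
    return sorted(sorted(g) for g in groups.values())
```

With RTTs of about 0.1 ms inside two pairs of VMs and about 0.9 ms across them, a 0.5 ms threshold recovers the two pairs as separate groups, which a data-intensive application could then use to keep heavy traffic within a group.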

 

Cloud 47 : MPI-IO/Gfarm: An Optimized Implementation of MPI-IO for the Gfarm File System

Abstract

This paper proposes the design and implementation of MPI-IO/Gfarm, an MPI-IO implementation for the Gfarm file system. The Gfarm file system is a global file system that federates the local storage of compute nodes among several clusters. It has a scale-out architecture designed to support distributed data-intensive computing. However, the Gfarm file system does not achieve scalable performance for parallel writes to a single file, a typical file operation in MPI-IO. This paper proposes an optimization technique to improve parallel write performance to a single file. In the evaluation, MPI-IO/Gfarm achieves scalable parallel I/O performance.

 

Cloud 48 : Multiple Services Throughput Optimization in a Hierarchical Middleware

Abstract

Accessing the power of distributed resources can nowadays easily be done using middleware based on a client/server approach. Several architectures exist for such middleware. The most scalable ones rely on a hierarchical design. Determining the best shape for the hierarchy, the one giving the best throughput of services, is not an easy task. We first propose a computation and communication model for such hierarchical middleware. Our model takes into account the deployment of several services in the hierarchy. Then, based on this model, we propose algorithms for automatically constructing a hierarchy on two kinds of heterogeneous platforms: communication-homogeneous/computation-heterogeneous platforms and fully heterogeneous platforms. The proposed algorithms aim to offer users the best ratio of obtained to requested throughput, while providing fairness on this ratio across the different kinds of services and using as few resources as possible for the hierarchy. For each kind of platform, we compare our model with experimental results on a real middleware called DIET (Distributed Interactive Engineering Toolbox).

 

 

Cloud 49 : Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems

Abstract

One-sided communication is important for enabling asynchronous communication and data movement in Global Address Space (GAS) programming models. Such communication is typically realized through direct messages between initiator and target processes. For petascale systems with 10,000s of nodes and 100,000s of cores, these direct messages require dedicated communication buffers and/or channels, which can lead to significant scalability challenges for GAS programming models. In this paper, we describe a network-friendly communication model, multinode cooperation, that enables indirect one-sided communication. Compute nodes work together to handle one-sided requests through (1) request forwarding, in which one node can intercept a request and forward it to a target node, and (2) request aggregation, in which one node can aggregate many requests to a target node. We have implemented multinode cooperation for a popular GAS runtime library, the Aggregate Remote Memory Copy Interface (ARMCI). Our experimental results on a large-scale Cray XT5 system demonstrate that multinode cooperation is able to greatly increase memory scalability by reducing the communication buffers required on each node. In addition, multinode cooperation improves the resiliency of the GAS runtime system to network contention. Furthermore, multinode cooperation can benefit the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.
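Request aggregation can be sketched as an intermediate node that buffers forwarded one-sided requests per target and sends them to that target as a single message once a batch fills, or on an explicit flush. The batch size, the callback-based message representation, and the class name are hypothetical; ARMCI's actual implementation is not reproduced here.

```python
from collections import defaultdict


class Aggregator:
    """Buffers forwarded one-sided requests per target node and sends them in batches."""

    def __init__(self, batch_size, send):
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self.batch_size = batch_size
        self.send = send                  # callback(target, list_of_requests)
        self.pending = defaultdict(list)  # target -> buffered requests

    def forward(self, target, request):
        """Accept a request from another node and buffer it for `target`."""
        queue = self.pending[target]
        queue.append(request)
        if len(queue) == self.batch_size:
            self._emit(target)

    def flush(self):
        """Send every partially filled batch, e.g. at a synchronization point."""
        for target in list(self.pending):
            if self.pending[target]:
                self._emit(target)

    def _emit(self, target):
        batch = self.pending.pop(target)
        self.send(target, batch)
```

With a batch size of 4, ten requests to the same target leave the aggregator as three messages (4, 4, and 2 requests), and the target needs buffer space for the aggregator's channel rather than one channel per originating node.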

 

Cloud 50 : On the Performance Variability of Production Cloud Services

Abstract

Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through the use of virtualization and resource time-sharing, clouds serve a large user base with diverse needs from a single set of physical resources. Thus, clouds have the potential to provide their owners the benefits of an economy of scale and, at the same time, to become an alternative to self-owned clusters, grids, and parallel production environments for both industry and the scientific community. For this potential to become reality, the first generation of commercial clouds needs to be proven dependable. In this work we analyze the dependability of cloud services. To this end, we analyze long-term performance traces from Amazon Web Services and Google App Engine, currently two of the largest commercial clouds in production. We find that the performance of about half of the cloud services we investigate exhibits yearly and daily patterns, but also that most services have periods of especially stable performance. Finally, through trace-based simulation, we assess the impact of the variability observed for the studied cloud services on three large-scale applications: job execution in scientific computing, virtual goods trading in social networks, and state management in social gaming. We show that the impact of performance variability depends on the application, and we give evidence that performance variability can be an important factor in cloud provider selection.

 

Cloud 51 : On the Relation between Congestion Control, Switch Arbitration and Fairness

Abstract

In lossless interconnection networks such as InfiniBand, congestion control (CC) can be an effective mechanism for achieving high performance and good utilization of network resources. The InfiniBand standard describes CC functionality for detecting and resolving congestion, but the design decisions on how to implement this functionality are left to the hardware designer. As our study shows, one must be careful not to introduce fairness problems when making these design decisions. In this paper we study the relationship between congestion control, switch arbitration, and fairness. Specifically, we look at fairness among different traffic flows arriving at a hot-spot switch on different input ports when CC is turned on. In addition, we study fairness among traffic flows at a switch where some flows are exclusive users of their input ports while other flows share an input port (the parking lot problem). Our results show that the implementation of congestion control in a switch is vulnerable to unfairness if care is not taken. In particular, we found that a threshold hysteresis of more than one MTU is needed to resolve arbitration unfairness. Furthermore, fully solving the parking lot problem requires proper configuration of the CC parameters.
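Threshold hysteresis can be illustrated with a small state machine: a switch port starts marking packets as congested (for example with the FECN bit) when its queue exceeds the threshold, and stops only after the queue drains below the threshold minus the hysteresis. The byte values and the class name below are assumptions for illustration, not InfiniBand-mandated parameters or the switch model used in the study.

```python
class HysteresisMarker:
    """Decides, per queue sample, whether a switch port marks packets as congested."""

    def __init__(self, threshold, hysteresis):
        if not 0 <= hysteresis <= threshold:
            raise ValueError("need 0 <= hysteresis <= threshold")
        self.on_level = threshold                # start marking above this (bytes)
        self.off_level = threshold - hysteresis  # stop marking below this (bytes)
        self.marking = False

    def update(self, queue_bytes):
        if not self.marking and queue_bytes > self.on_level:
            self.marking = True
        elif self.marking and queue_bytes < self.off_level:
            self.marking = False
        return self.marking
```

With a 2048-byte MTU and a hysteresis of two MTUs, a queue that dips by a single packet just below the threshold keeps the port marking; with zero hysteresis the same queue trace toggles marking on every packet, which is the kind of oscillation that can interact badly with arbitration.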

 

 

 

Cloud 52 : Open Social Based Collaborative Science Gateways

Abstract

In data-driven science projects, researchers at different institutions often wish to team up easily to share data and computing resources in order to address challenging scientific problems. Typical VO-based authorization schemes are not suitable for such user-organized scientific collaboration. Using the emerging OAuth protocol, we introduce a novel group authorization scheme to support ad-hoc team formation and user-controlled resource sharing. Integrating this group authorization scheme, we define an Open Social based scientific collaboration framework and develop a science gateway prototype named Open Life Science Gateway (OLSGW) to verify and refine the framework. Our experience developing OLSGW shows that an OAuth 2.0 based group authorization scheme is a very promising approach to resource sharing in Cloud environments, and that the Open Social based framework can help science gateway developers create domain-specific collaborative applications in a very flexible way.

 

Cloud 53 : Optimized Management of Power and Performance for Virtualized Heterogeneous Server Clusters

Abstract

This paper proposes and evaluates an approach for power and performance management in virtualized server clusters. The major goal of our approach is to reduce power consumption in the cluster while meeting performance requirements. The contributions of this paper are: (1) a simple but effective way of modeling the power consumption and capacity of servers even under heterogeneous and changing workloads, and (2) an optimization strategy based on a mixed integer programming model for improving power efficiency while providing performance guarantees in the virtualized cluster. In the optimization model, we address application workload balancing and the often-ignored switching costs caused by frequent and undesirable server on/off transitions and VM relocations. We show the effectiveness of the approach on a server cluster test bed. Our experiments show that our approach conserves about 50% of the energy required by a system designed for the peak workload scenario, with little impact on the applications' performance goals. In addition, using prediction in our optimization strategy yielded a further QoS improvement.
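
To make the power/switching trade-off concrete, the sketch below enumerates every on/off assignment for a handful of servers and keeps the cheapest one that covers the demand. It assumes a linear power model (idle power plus a utilization-proportional share of the idle-to-peak range), load spread in proportion to capacity, and a flat cost per server whose on/off state changes; the function `plan` and these modeling choices are hypothetical. Exhaustive search stands in here for the paper's mixed integer programming model, only scales to small clusters, and does not model VM relocation costs.

```python
from itertools import product


def plan(servers, demand, currently_on, switch_cost):
    """servers: list of (capacity, idle_watts, peak_watts) under a hypothetical linear power model.

    Returns (cost, on_flags) minimizing power plus switching cost, or None if
    no subset of servers can cover the demand.
    """
    best = None
    for on in product((False, True), repeat=len(servers)):
        capacity = sum(cap for (cap, _, _), o in zip(servers, on) if o)
        if capacity < demand:
            continue
        util = demand / capacity if capacity else 0.0  # load spread in proportion to capacity
        power = sum(idle + (peak - idle) * util
                    for (_, idle, peak), o in zip(servers, on) if o)
        switches = sum(o != was for o, was in zip(on, currently_on))
        cost = power + switch_cost * switches
        if best is None or cost < best[0]:
            best = (cost, on)
    return best
```

With two equal-capacity servers where only the less efficient one is running, `switch_cost=0` moves the load to the more efficient server, while a larger `switch_cost` makes it cheaper to keep the running server on, which is the kind of effect the model's switching-cost terms are meant to capture.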

 

Cloud 54 : PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers

Abstract

The cache replacement policy plays an important role in guaranteeing the availability of cache blocks, reducing miss rates, and improving applications' overall performance. However, recent research efforts on improving replacement policies require either significant additional hardware or major modifications to the organization of the existing cache. In this study, we propose the PAC-PLRU cache replacement policy. PAC-PLRU not only utilizes but also judiciously salvages the prediction information discarded by a widely adopted stride prefetcher. The main idea behind PAC-PLRU is to use the prediction results generated by the existing stride prefetcher and to prevent these predicted cache blocks from being replaced in the near future. Experimental results show that combining PAC-PLRU with a stride prefetcher reduces the average L2 cache miss rate by 91% over a baseline system with only the PLRU policy, and by 22% over a system using PLRU with an unconnected stride prefetcher. Most importantly, PAC-PLRU requires only minor modifications to the existing cache architecture to obtain these benefits. The proposed PAC-PLRU policy is promising both for fostering the connection between prefetching and replacement policies and for having a lasting impact on overall cache performance.
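
The protection idea can be sketched on a single 4-way set managed by tree-based pseudo-LRU. In this hypothetical sketch, each way carries a `protected` flag that would be set for blocks the stride prefetcher has predicted; victim selection follows the PLRU tree but steers away from protected ways, and falls back to plain PLRU when every way is protected. The class name, the flag, and the fallback rule are assumptions for illustration, not the exact PAC-PLRU hardware.

```python
class ProtectedTreePLRU:
    """One 4-way set with tree-PLRU bits and a hypothetical per-way protection flag."""

    def __init__(self):
        # bits[0]: root, bits[1]: ways 0/1, bits[2]: ways 2/3.
        # A bit value of 0 means "the victim lies on the left", 1 means "on the right".
        self.bits = [0, 0, 0]
        # Set for blocks a stride prefetcher predicts will be referenced soon.
        self.protected = [False] * 4

    def touch(self, way):
        """On a hit or fill, point every node on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 if way == 0 else 0
        else:
            self.bits[0] = 0
            self.bits[2] = 1 if way == 2 else 0

    def victim(self):
        """Follow the PLRU tree, skipping protected ways when possible."""
        candidates = [w for w in range(4) if not self.protected[w]] or list(range(4))
        go_right = self.bits[0] == 1
        if go_right and not any(w >= 2 for w in candidates):
            go_right = False
        elif not go_right and not any(w < 2 for w in candidates):
            go_right = True
        base, node = (2, 2) if go_right else (0, 1)
        pick = base + self.bits[node]
        if pick not in candidates:
            pick = base + (1 - self.bits[node])
        return pick
```

After the accesses 0, 1, 2, 3, plain PLRU evicts way 0; protecting way 0 redirects the eviction to way 1, and protecting ways 0 and 1 redirects it to way 2.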

 

Cloud 55 : Performance under Failures of MapReduce Applications

Abstract

The MapReduce programming paradigm has gained increasing popularity in recent years due to its support for easy programming, data distribution, and fault tolerance. Failure is an unwanted but inevitable fact that all large-scale parallel computing systems have to face. MapReduce introduces a novel data replication and task re-execution strategy for fault tolerance. This study aims to provide a better understanding of such fault tolerance mechanisms. In particular, we build a stochastic performance model to quantify the impact of failures on MapReduce applications and to investigate the effectiveness of these mechanisms under different computing environments. Simulations have also been carried out to verify the accuracy of the proposed model. Our results show that data replication is an effective approach even when the failure rate is high, and that the task migration mechanism of MapReduce works well in balancing the reliability differences among individual nodes. This work provides a theoretical foundation for optimizing large-scale MapReduce applications, especially when fault tolerance is a concern.
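
Two textbook quantities give a feel for why replication and re-execution help. Assuming, purely for illustration, that nodes fail independently with a fixed probability, a block stored on r replicas is unreachable only if all r copies fail (availability 1 - p^r), and a task that fails with probability q per attempt needs a geometrically distributed number of attempts with mean 1 / (1 - q). This is a simplified sketch with hypothetical function names, not the stochastic model developed in the paper.

```python
def data_availability(p_fail: float, replicas: int) -> float:
    """P(at least one of `replicas` copies survives), assuming independent failures."""
    if not 0.0 <= p_fail <= 1.0 or replicas < 1:
        raise ValueError("need 0 <= p_fail <= 1 and replicas >= 1")
    return 1.0 - p_fail ** replicas


def expected_attempts(q_fail: float) -> float:
    """Mean number of executions until success when each attempt fails
    independently with probability q_fail (geometric distribution)."""
    if not 0.0 <= q_fail < 1.0:
        raise ValueError("need 0 <= q_fail < 1")
    return 1.0 / (1.0 - q_fail)
```

For example, with a 10% per-node failure probability, three replicas keep a block reachable with probability 0.999, while a task that fails half the time needs two attempts on average.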

 

Cloud 56 : Small Discrete Fourier Transforms on GPUs

Abstract

Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance for large data sizes, but are not competitive with CPU code for small data sizes. However, several applications perform multiple DFTs on small data sizes. In fact, even algorithms for large data sizes use a divide-and-conquer approach in which small DFTs eventually need to be performed. We discuss our DFT implementation, which is efficient for multiple small DFTs. One feature of our implementation is the use of the asymptotically slower matrix multiplication approach for small data sizes, which improves performance on the GPU due to its regular memory access and computational patterns. We combine this algorithm with the mixed radix algorithm for 1-D, 2-D, and 3-D complex DFTs. We also demonstrate the effect of different optimization techniques. When GPUs are used to accelerate a component of an application running on the host, it is important that decisions taken to optimize GPU performance do not degrade the performance of the rest of the application on the host. Another feature of our implementation is that we use a data layout that is not optimal for the GPU so that the overall effect on the application is better. Our implementation performs up to two orders of magnitude faster than cuFFT on an NVIDIA GeForce 9800 GTX GPU and between one and two orders of magnitude faster than FFTW on a CPU for multiple small DFTs. Furthermore, we show that our implementation can accelerate a Quantum Monte Carlo application for which cuFFT is not effective. The primary contributions of this work lie in demonstrating the utility of the matrix multiplication approach and in providing an implementation that is efficient for small DFTs when a GPU is used to accelerate an application running on the host.
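
The matrix multiplication approach can be illustrated on the CPU: an n-point DFT is the product of the n × n matrix W, with entries W[j][k] = exp(-2πi·jk/n), and the input vector, so a batch of small DFTs can reuse one precomputed matrix. The sketch below is a plain-Python illustration of that O(n²) formulation; it does not reproduce the paper's GPU kernels, data layout, or mixed radix combination, and the function names are hypothetical.

```python
import cmath


def dft_matrix(n):
    """Dense n-point DFT matrix: W[j][k] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def batched_dft(signals):
    """Apply one precomputed DFT matrix to every equal-length signal in the batch.

    Each transform is a matrix-vector product (O(n^2) work), which is
    asymptotically slower than an FFT but has a very regular access pattern.
    """
    n = len(signals[0])
    w = dft_matrix(n)
    return [[sum(w[j][k] * x[k] for k in range(n)) for j in range(n)]
            for x in signals]
```

For instance, `batched_dft([[1, 0, 0, 0], [1, 1, 1, 1]])` returns approximately `[[1, 1, 1, 1], [4, 0, 0, 0]]`, up to floating-point rounding.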

 

Cloud 57 : Sophia: Local Trust for Securing Routing in DHTs

Abstract

Distributed Hash Tables (DHTs) have been used as a common building block in many distributed applications, including Cloud and Grid systems. However, important security vulnerabilities still hinder their adoption in today's large-scale computing platforms. For instance, routing vulnerabilities have been the subject of intensive research, but existing solutions rely on redundancy rather than on improving the quality of routing paths. In this paper, we present Sophia, a novel generic security technique that combines iterative routing with local trust to fortify routing in DHTs. Sophia relies solely on first-hand observations of the success or failure of a node's own lookups to improve forwarding paths. Moreover, unlike redundant routing, Sophia dynamically protects routing without introducing additional network overhead. To the best of our knowledge, this is the first work that exploits a local trust system to fortify routing in DHTs. We compared the performance of Sophia with redundant routing in the Kademlia DHT and obtained significant improvements in routing resilience, self-adjustment, and network traffic reduction.
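
A minimal sketch of the local-trust idea follows. It assumes integer node IDs, Kademlia-style XOR distance, and a trust score equal to a smoothed success rate, (s + 1) / (s + f + 2), built only from the node's own lookup outcomes. The class `LocalTrust`, the function `next_hops`, and the rule of ranking the k closest candidates by trust are hypothetical and are not Sophia's published algorithm.

```python
from collections import defaultdict


class LocalTrust:
    """First-hand trust built only from the outcome of this node's own lookups."""

    def __init__(self):
        self.successes = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, node_id, succeeded):
        """Credit or blame a node that was used as a hop in one of our lookups."""
        if succeeded:
            self.successes[node_id] += 1
        else:
            self.failures[node_id] += 1

    def trust(self, node_id):
        """Smoothed success rate; unseen nodes start at 0.5."""
        s, f = self.successes[node_id], self.failures[node_id]
        return (s + 1) / (s + f + 2)


def next_hops(trust, candidates, target, k=3, alpha=1):
    """Take the k candidates closest to `target` by XOR distance, then prefer
    the most trusted ones (ties broken by distance); return `alpha` of them."""
    closest = sorted(candidates, key=lambda n: n ^ target)[:k]
    return sorted(closest, key=lambda n: (-trust.trust(n), n ^ target))[:alpha]
```

With no history, all candidates score 0.5 and the closest one is chosen; after two failed lookups through node 1 and one successful lookup through node 2, node 2 becomes the preferred next hop even though node 1 is closer to the target.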

 

Cloud 58 : Supporting Federated Multi-authority Security Models

Abstract

The JISC-funded Shintau project has produced an extension to the Shibboleth profile that allows a user to link information from more than one IdP using a custom Linking Service (LS). This paper describes both the application and the independent evaluation of this software by the National e-Science Centre (NeSC) at the University of Glasgow, within the context of the ESRC-funded Data Management through e-Social Science (DAMES) project.

 

Cloud 59 : The Grid Observatory

Abstract

The goal of the Grid Observatory (GO) project is to contribute to an experimental theory of large grid systems by integrating the collection of data on the behaviour of the flagship European Grid Infrastructure (EGI) and its users, the development of models, and an ontology for the domain knowledge. The GO gives the wider computer science community access to a database of grid usage traces without the need for grid credentials. The paper presents the architecture of the digital curation process enacted by the GO and examples of how the curated data are exploited.

 

Cloud 60 : A Flexible Policy Framework for the QoS Differentiated Provisioning of Services

Abstract

We propose a policy-based framework for the QoS differentiated provisioning of services. The proposed framework improves on the state of the art in policy-based preference specification by combining cardinal and ordinal preferences. We describe the underlying models, focussing on the key features and contributions of the proposed framework. We also show how, using our framework, the QoS evaluation problem can be translated into a Constraint Satisfaction Problem while preserving the semantics of the preference policies.

 

Cloud 61 : Towards Real-Time, Volunteer Distributed Computing

Abstract

Many large-scale distributed computing applications demand real-time responses within soft deadlines. To enable such real-time task distribution and execution on volunteer resources, we previously proposed the design of a real-time volunteer computing platform called RT-BOINC. The system provides low, O(1) worst-case execution time for task management operations such as task scheduling, state transitioning, and validation. In this work, we present a full implementation of RT-BOINC, adding new features including a deadline timer and parameter-based admission control. We evaluate RT-BOINC at large scale using two real-time applications, namely the games Go and Chess. The results of our case study show that RT-BOINC provides much better performance than the original BOINC in terms of average and worst-case response time, scalability, and efficiency.

 

Cloud 62 : Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms

Abstract

Scientific workflows are commonplace in eScience applications. Yet the lack of integrated support for data models, including streaming data, structured collections, and files, limits the ability of workflows to support emerging stream-oriented applications in energy informatics. This is compounded by the absence of Cloud data services that support reliable and performant streams. In this paper, we propose a scientific workflow framework that supports streams as first-class data and is optimized for performant and reliable execution across desktop and Cloud platforms. We present the features of the workflow framework and its empirical evaluation on a private Eucalyptus cloud.

 

Cloud 63 : Unifying Cloud Management: Towards Overall Governance of Business Level Objectives

Abstract

We address the challenge of providing unified cloud resource management towards an overall business level objective, given the multitude of managerial tasks to be performed and the complexity of any architecture to support them. Resource level management tasks include elasticity control, virtual machine and data placement, autonomous fault management, etc., which are intrinsically difficult problems since services normally have unknown lifetimes and capacity demands that vary widely over time. Unifying the management of these problems (for optimization with respect to some higher-level business objective, such as optimizing revenue while breaking no more than a certain percentage of service level agreements) becomes even more challenging, as the resource level managerial challenges are far from independent. After providing the general problem formulation, we review recent approaches taken by the research community, mainly general autonomic computing technology for large-scale environments and resource level management tools equipped with some business-oriented or otherwise qualitative features. We propose and illustrate a policy-driven approach in which a high-level management system monitors overall system and service behavior and adjusts lower level policies (e.g., thresholds for admission control, elasticity control, server consolidation level, etc.) to optimize towards measurable business level objectives.

 

Cloud 64 : Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing

Abstract

Cloud Computing has been envisioned as the next-generation architecture of IT enterprise. It moves application software and databases to centralized large data centers, where the management of the data and services may not be fully trustworthy. This unique paradigm brings about many new security challenges, which have not been well understood. This work studies the problem of ensuring the integrity of data storage in Cloud Computing. In particular, we consider the task of allowing a third party auditor (TPA), on behalf of the cloud client, to verify the integrity of the dynamic data stored in the cloud. The introduction of a TPA relieves the client of auditing whether his data stored in the cloud are indeed intact, which can be important in achieving economies of scale for Cloud Computing. Support for data dynamics via the most general forms of data operation, such as block modification, insertion, and deletion, is also a significant step toward practicality, since services in Cloud Computing are not limited to archive or backup data. While prior works on ensuring remote data integrity often lack support for either public auditability or dynamic data operations, this paper achieves both. We first identify the difficulties and potential security problems of directly extending prior works with fully dynamic data updates, and then show how to construct an elegant verification scheme for the seamless integration of these two salient features in our protocol design. In particular, to achieve efficient data dynamics, we improve the existing proof of storage models by manipulating the classic Merkle Hash Tree construction for block tag authentication. To support efficient handling of multiple auditing tasks, we further explore the technique of bilinear aggregate signature to extend our main result into a multiuser setting, where the TPA can perform multiple auditing tasks simultaneously.
Extensive security and performance analyses show that the proposed schemes are highly efficient and provably secure.
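
The classic Merkle Hash Tree that the scheme builds on can be sketched with SHA-256 from Python's standard library: leaves are hashes of data blocks, each parent hashes the concatenation of its two children, and a verifier holding only the root checks a block against the sibling hashes along its path. Changing one block only changes the hashes on that block's path to the root, which is what makes the structure attractive for dynamic data. This sketch shows only the unmodified textbook tree; the paper's modified construction, block tags, and bilinear aggregate signatures are not reproduced, and the helper names and the duplicate-last-node rule for odd levels are assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return all tree levels, leaves first; levels[-1][0] is the root."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd level: pair the last node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def auth_path(levels, index):
    """Sibling hashes from leaf `index` up to (not including) the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # matches the duplicate-last-node rule above
        path.append((level[sibling], index % 2 == 0))  # True: current node is a left child
        index //= 2
    return path


def verify(block, path, root):
    """Recompute the root from a block and its authentication path."""
    node = h(block)
    for sibling, is_left in path:
        node = h(node + sibling) if is_left else h(sibling + node)
    return node == root
```

Verifying one block needs only about log2(n) sibling hashes rather than the whole file, and a tampered block fails to reproduce the stored root.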

 

Cloud 65 : Multicloud Deployment of Computing Clusters for Loosely Coupled MTC Applications

Abstract

Cloud computing is gaining acceptance in many IT organizations as an elastic, flexible, and variable-cost way to deploy their service platforms using outsourced resources. Unlike traditional utilities, where a single-provider scheme is common practice, ubiquitous access to cloud resources easily enables the simultaneous use of different clouds. In this paper, we explore this scenario to deploy a computing cluster on top of a multicloud infrastructure for solving loosely coupled Many-Task Computing (MTC) applications. In this way, the cluster nodes can be provisioned with resources from different clouds to improve the cost effectiveness of the deployment or to implement high-availability strategies. We demonstrate the viability of this kind of solution by evaluating the scalability, performance, and cost of different configurations of a Sun Grid Engine cluster deployed on a multicloud infrastructure spanning a local data center and three different cloud sites: Amazon EC2 Europe, Amazon EC2 US, and ElasticHosts. Although the testbed deployed in this work is limited to a small number of computing resources (due to hardware and budget limitations), we have complemented our analysis with a simulated infrastructure model that includes a larger number of resources and runs larger problem sizes. Data obtained by simulation show that the performance and cost results can be extrapolated to large-scale problems and cluster infrastructures.

 

Cloud 66 : Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud

Abstract

In recent years, ad hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major Cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this paper, we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to those of the popular data processing framework Hadoop.

 

Cloud 67 : Robust Execution of Service Workflows Using Redundancy and Advance Reservations

Abstract

In this paper, we develop a novel algorithm that allows service consumers to execute business processes (or workflows) of interdependent services in a dependable manner within tight time constraints. In particular, we consider large interorganizational service-oriented systems, where services are offered by external organizations that demand financial remuneration and where their use has to be negotiated in advance using explicit service-level agreements (as is common in Grids and cloud computing). Here, different providers often offer the same type of service at varying levels of quality and price. Furthermore, some providers may be less trustworthy than others and may fail to meet their agreements. To control this unreliability and ensure end-to-end dependability while maximizing the profit obtained from completing a business process, our algorithm automatically selects the most suitable providers. Moreover, unlike existing work, it reasons about the dependability properties of a workflow and controls them by using service redundancy for critical tasks and by planning for contingencies. Finally, our algorithm reserves services for only parts of its workflow at any time, in order to retain flexibility when failures occur. We show empirically that our algorithm consistently outperforms existing approaches, achieving up to a 35-fold increase in profit and successfully completing most workflows, even when the majority of providers fail.
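
The benefit of redundancy for a critical task can be quantified under a simple independence assumption: if providers succeed independently with probabilities p_1 to p_k, invoking them in parallel fails only when all of them fail, so the task succeeds with probability 1 - Π(1 - p_i). The greedy helper below adds the most reliable providers until a target success probability is reached. It is a hypothetical illustration that ignores price, time constraints, and reservations, and it is not the algorithm proposed in the paper.

```python
from math import prod


def success_probability(probs):
    """P(at least one redundant invocation succeeds), assuming independent providers."""
    return 1.0 - prod(1.0 - p for p in probs)


def choose_redundant(providers, target):
    """providers: list of (name, success_prob). Greedily add the most reliable
    providers until the target is met; return their names, or None if it cannot be."""
    chosen = []
    for name, p in sorted(providers, key=lambda item: item[1], reverse=True):
        chosen.append((name, p))
        if success_probability([q for _, q in chosen]) >= target:
            return [n for n, _ in chosen]
    return None
```

With providers that succeed 90%, 80%, and 50% of the time, a 0.95 target needs the two most reliable ones (about 0.98 combined), while a 0.999 target cannot be met even with all three (about 0.99).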