Commit 9ba88b4d authored by Omran Saleh

minor changes for camera ready

parent 8b3a3fd5
A wide range of (near) real-time applications process stream-based data including
financial data analysis, traffic management, telecommunication monitoring, environmental monitoring, the smart grid, weather forecasting, and social media analysis. These applications focus mainly on finding useful information and patterns on the fly, and on deriving valuable higher-level information from lower-level data in continuously incoming data streams, in order to report on and monitor the progress of some activities. In the last few years, several systems for processing streams of information have been proposed, each offering its own processing solution. This line of work was pioneered by academic systems such as Aurora and Borealis~\cite{Abadi:2003:ANM:950481.950485} and commercial systems like IBM InfoSphere Streams or StreamBase. Recently, some novel distributed stream computing platforms have been developed based on data parallelization approaches, which try to support scalable operation in cluster environments for processing massive data streams. Examples of these platforms are Storm~\cite{storm}, Spark Streaming~\cite{spark}, and Flink~\cite{flink}. Although these engines provide abstractions for processing (possibly) infinite streams of data, they lack support for higher-level declarative languages. Some of these engines provide only a programming interface where operators and topologies have to be implemented in a programming language like Java or Scala. Moreover, to build a particular program (i.e., query) in these systems, users must be experts with deep knowledge of the syntax and programming constructs of the language, especially if the system supports multiple languages. Therefore, no time or effort is saved, as the user has to write each programming statement correctly. To make life much easier, the current trend in data analytics should be to adopt the ``Write once, run anywhere'' slogan.
This slogan was first used by Sun Microsystems to illustrate that Java code can be developed on any platform and be expected to run on any platform equipped with a Java virtual machine (JVM). In general, the development of various stream processing engines raises the question of whether we can provide a unified programming model or a standard language in which the user can write one stream processing script and expect to execute it on any stream processing engine. Bringing all these things together, we provide a demonstration of our solution, called \PipeFlow. In our \PipeFlow system, we address the following issues:
\begin{itemize}
\item Developing a scripting language that provides most of the features of stream processing scripting languages, e.g., Storm and Spark Streaming. To this end, we have chosen a dataflow language called \PipeFlow. Initially, this language was intended to be used in conjunction with a stream processing engine called PipeFabric \cite{DBIS:SalBetSat14year2014,DBIS:SalSat14}. Later, it was extended to be used with other engines. The source script written in the \PipeFlow language is parsed and compiled, and then a target program (i.e., for Spark Streaming and Storm as well as PipeFabric) is generated based on the user's selection. This target program is functionally equivalent to the original \PipeFlow script. Once the target program is generated, the user can execute it in the selected engine.
\item Mapping or translating a \PipeFlow script into other programs requires each operator in \PipeFlow to be implemented in the target engine. Since \PipeFlow contains a set of predefined operators, all of these operators have been implemented directly or indirectly in each target engine.
\item Providing a flexible architecture that lets users extend the system with support for more engines as well as new operators. These extensions should integrate into the system smoothly.
\item Developing a front-end web application to enable users who have little experience with the \PipeFlow language to express the program and its associated processing algorithm and data pipeline graphically.
......
......@@ -34,7 +34,7 @@
%\crdata{0-12345-67-8/90/01} % Allows default copyright data (0-89791-88-6/97/05) to be over-ridden - IF NEED BE.
% --- End of Author Metadata ---
\title{Demo: The PipeFlow Approach: Write Once, Run in Different Stream-processing Engines}
\numberofauthors{2} % in this sample file, there are *total*
% of EIGHT authors. SIX appear on the 'first-page' (for formatting
% reasons) and the remaining two appear in the \additionalauthors section.
......@@ -63,18 +63,28 @@ Omran Saleh\\
\email{kus@tu-ilmenau.de}
}
\newfont{\mycrnotice}{ptmr8t at 7pt}
\newfont{\myconfname}{ptmri8t at 7pt}
\let\crnotice\mycrnotice%
\let\confname\myconfname%
\permission{Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.}
\conferenceinfo{DEBS'15,} {June 29 - July 3, 2015, Oslo, Norway.}
\copyrightetc{Copyright 2015 ACM \the\acmcopyr}
\crdata{978-1-4503-3286-6/15/06\ ...\$15.00.\\
http://}
\maketitle
\begin{abstract}
Recently, some distributed stream computing platforms have been
developed for processing massive data streams, such as Storm and Spark Streaming. However, these platforms lack support for higher-level declarative languages and provide only a programming interface. Moreover, users must be well versed in the syntax and programming constructs of each language on these platforms. In this paper, we demonstrate our \PipeFlow system. In the \PipeFlow system, the user can write a stream-processing script (i.e., query) using a higher-level dataflow language. This script can be translated into different stream-processing programs that run on the corresponding engines. In this way, the user only needs to know a single language and can thus write one stream-processing script and expect to execute it on different engines.
\end{abstract}
% % A category with the (minimum) three required fields
\category{H.2.4}{Database Management}{Systems - Query Processing}
\category{H.4}{Information Systems Applications}{Miscellaneous}
\category{I.5}{Pattern Recognition}{Miscellaneous}
\keywords{Data stream processing, Autostudio, Query processing, PipeFabric, Spark Streaming, Storm, Flink, PipeFlow}
......