Commit c77d1352 authored by Omran Saleh
To provide a higher level of abstraction, we have developed a web application called \textbf{AutoStudio}. It is a user-friendly, cross-platform web application built with HTML5, Draw2D touch\footnote{\url{http://www.draw2d.org/draw2d/index.html}}, and node.js\footnote{\url{http://www.nodejs.org}}. AutoStudio offers several functionalities:
\begin{itemize}
\item It enables users to leverage the emerging \PipeFlow language graphically via a collection of operators (represented by icons) that can be ``dragged and dropped'' onto a drawing canvas. The user assembles the operators into a dataflow graph that visually shows how they are related, and from this graph an equivalent \PipeFlow script can be generated. Clicking on an operator icon opens a pop-up window in which the user specifies the operator's required parameters. Moreover, the user can display the help contents for each operator.
\item Contacting the \PipeFlow system to generate the right script (e.g., a Storm, Spark, or PipeFabric script) based upon the user's selection of language on the dashboard page. As a result, the user does not need to know the syntax and constructs of any stream-processing language, including \PipeFlow itself. Through the application, the user can trigger the execution of the script by the \PipeFlow system, which calls the respective engine. Moreover, the application handles real-time statistics, including execution and performance results, sent by the \PipeFlow system while the script is running. When the execution is complete, the application can notify the user by email.
\item It provides options for saving the generated scripts or flow designs for future reference, loading a saved script, and executing it whenever required.
\item An evaluation tool for the generated scripts, for users interested in comparing and evaluating the performance of stream-processing systems in terms of throughput, latency, and resource consumption such as CPU and memory. The evaluation can be performed online using dynamic figures or offline using static figures.
\end{itemize}
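To illustrate the first functionality, the following Java sketch shows one way the canvas state (operators plus connections) could be held in memory before a script is emitted. All class and method names here are invented for illustration, and the emitted syntax is only a stand-in; the concrete \PipeFlow grammar differs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (names invented): operators dropped on the
// AutoStudio canvas and their connections, held in memory before a
// PipeFlow-like script is generated from the dataflow graph.
public class DataflowSketch {

    // One operator on the canvas: an id, a type, and its parameters.
    public static final class Op {
        public final String id, type;
        public final Map<String, String> params = new LinkedHashMap<>();
        public Op(String id, String type) { this.id = id; this.type = type; }
    }

    // Emit one assignment line per operator and one arrow per connection.
    public static String generate(List<Op> ops, List<String[]> edges) {
        StringBuilder script = new StringBuilder();
        for (Op op : ops) {
            script.append(op.id).append(" = ").append(op.type)
                  .append(op.params.toString().replace('{', '(').replace('}', ')'))
                  .append(";\n");
        }
        for (String[] e : edges) {
            script.append(e[0]).append(" -> ").append(e[1]).append(";\n");
        }
        return script.toString();
    }

    public static void main(String[] args) {
        Op src = new Op("src", "source");
        src.params.put("file", "input.csv");
        Op f1 = new Op("f1", "filter");
        f1.params.put("predicate", "price > 10");
        System.out.print(generate(List.of(src, f1),
                                  List.of(new String[]{"src", "f1"})));
        // prints:
        // src = source(file=input.csv);
        // f1 = filter(predicate=price > 10);
        // src -> f1;
    }
}
```

In the real application the graph is assembled client-side with Draw2D touch; a structure like the one above would be serialized and sent to the server for script generation.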
%The second part is the server side which consists of the node.js web server.
AutoStudio prominently uses open-source software and frameworks. On the client side, HTML5, JavaScript, Cascading Style Sheets (CSS), jQuery (and helper libraries), Twitter Bootstrap, and Hogan.js are used for building the graphical user interface, performing Ajax requests, uploading and downloading files, etc. AutoStudio extensively uses pre-compiled Hogan templates: the data returned from the server is simply passed to these templates for quick rendering. In addition, Draw2D touch enables the creation of diagram applications in the browser by creating and manipulating operators and connections. On the server side, we needed a web server suitable for data-intensive real-time applications; therefore, node.js and supporting modules such as nodemailer and socket.io were employed.
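The real-time statistics path follows a simple push model: the server relays each measurement to all subscribed browser clients as it arrives. The actual implementation uses node.js with socket.io; the following language-neutral Java sketch (all names invented) only illustrates the publish/subscribe idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Language-neutral sketch of the push model behind AutoStudio's
// real-time statistics display. The real server uses node.js with
// socket.io; this Java version only illustrates the idea.
public class StatsRelay {
    private final List<Consumer<String>> clients = new ArrayList<>();

    // A browser client subscribes once and then receives every update.
    public void subscribe(Consumer<String> client) { clients.add(client); }

    // Called whenever the PipeFlow system reports a new measurement;
    // every subscribed client is notified immediately.
    public void publish(String statLine) {
        for (Consumer<String> c : clients) c.accept(statLine);
    }

    public static void main(String[] args) {
        StatsRelay relay = new StatsRelay();
        relay.subscribe(line -> System.out.println("client A got: " + line));
        relay.publish("throughput=1200 tuples/s");
        // prints: client A got: throughput=1200 tuples/s
    }
}
```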
\begin{figure}[hb]
\centering
\includegraphics[width=3.5in, height = 2in]{autostudio.png}
\caption{Front-end web application: AutoStudio}
\end{figure}
During the demo session, we will bring a laptop running our system and demonstrate its capabilities and supported services using real queries and datasets, which can be interactively explored by the demo audience. Since the front-end is a web-based application, the whole demonstration can be done in the laptop's browser. We will prepare some sample programs expressed in \PipeFlow, created by dragging and dropping operators. Users can switch between the different generated scripts (i.e., different languages) without changing the dataflow and observe the differences between them. To show real-time results, the generated scripts will be compiled and executed on the respective engines. Moreover, the audience will see a real-time performance evaluation of the engines through static and dynamic figures. The audience will play an active role in the session: they will change and create dataflow programs interactively through the web front-end, switch between the generated scripts to see real-time results, and check the status of the running programs.
A wide range of (near) real-time applications process stream-based data, including
financial data analysis, traffic management, telecommunication monitoring, environmental monitoring, the smart grid, weather forecasting, and social media analysis. These applications focus mainly on finding useful information and patterns on-the-fly, as well as deriving valuable higher-level information from lower-level data in continuously incoming streams, in order to report and monitor the progress of some activities. In the last few years, several systems for processing streams of information have been proposed, each offering its own processing solution. The area was pioneered by academic systems such as Aurora and Borealis~\cite{Abadi:2003:ANM:950481.950485,Borealis} and STREAM~\cite{stream}, and by commercial systems like IBM InfoSphere Streams or StreamBase. Recently, novel distributed stream computing platforms based on data-parallelization approaches have been developed, which aim to support scalable operation in cluster environments for processing massive data streams; examples of these platforms are Storm~\cite{storm}, Spark~\cite{spark}, and Flink~\cite{flink}. Although these stream processing engines (SPEs) provide abstractions for processing (possibly) infinite streams of data, they lack support for higher-level declarative languages. Some of these engines provide only a programming interface, where operators and topologies have to be implemented in a programming language like Java or Scala. Moreover, to build a particular program (i.e., a query) in these systems, users must have deep knowledge of the syntax and programming constructs of the language, especially if the system supports multiple languages. Therefore, no time and effort savings can be achieved, as the user needs to write each programming statement correctly. To make life easier, the current trend in data analytics should be the adoption of the ``Write once, run anywhere'' slogan.
This slogan was first coined by Sun Microsystems to illustrate that Java code can be developed on any platform and be expected to run on any platform equipped with a Java virtual machine (JVM). In general, the development of various stream processing engines raises the question of whether we can provide a unified programming model or a standard language in which the user writes one stream-processing script and can expect to execute it on any stream-processing engine. Bringing all these things together, we provide a demonstration of our solution called \PipeFlow. In our \PipeFlow system, we address the following issues:
\begin{itemize}
\item Developing a scripting language that provides most of the features of stream-processing scripting languages, e.g., those of Storm and Spark. To this end, we have chosen a dataflow language called \PipeFlow. Initially, this language was intended to be used in conjunction with a stream processing engine called PipeFabric; later, it was extended to be used with other engines. The source script written in the \PipeFlow language is parsed and compiled, and then the target program (for Spark and Storm as well as PipeFabric) is generated based on the user's selection. This target script is equivalent in its functionality to the input \PipeFlow program. Once the target program is generated, the user can execute it on the specific engine.
\item Mapping a \PipeFlow script to other scripts requires that each operator in \PipeFlow is also implemented in the target engine. Since \PipeFlow contains a set of pre-defined operators, all of these operators have already been implemented in each engine, directly or indirectly.
\item Providing a flexible architecture that lets users extend the system by supporting more engines as well as new operators. The latter, in which custom processing can be defined, should integrate into the system smoothly.
\item Developing a web application as a front-end that enables users with little \PipeFlow experience to express their processing problem, the associated algorithms, and the data pipeline graphically.
\end{itemize}
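The operator-mapping requirement above can be viewed as a lookup from each \PipeFlow operator to an engine-specific code template. The following Java sketch illustrates this view; the template strings and class names are invented placeholders, not the real generated Storm or Spark code, and the actual \PipeFlow compiler is certainly more elaborate.

```java
import java.util.Map;

// Hypothetical sketch: code generation as a per-operator, per-engine
// template lookup. Template contents are invented placeholders, not
// the real code emitted by the PipeFlow compiler.
public class OperatorMapping {
    static final Map<String, Map<String, String>> TEMPLATES = Map.of(
        "filter", Map.of(
            "storm", "builder.setBolt(\"filter\", new FilterBolt(...))",
            "spark", "stream.filter(...)"),
        "window", Map.of(
            "storm", "builder.setBolt(\"window\", new WindowBolt(...))",
            "spark", "stream.window(...)"));

    // Look up the target-engine snippet for one operator; a missing
    // entry means the operator is not implemented in that engine.
    public static String generate(String op, String engine) {
        Map<String, String> byEngine = TEMPLATES.get(op);
        if (byEngine == null || !byEngine.containsKey(engine))
            throw new IllegalArgumentException(
                "operator '" + op + "' has no implementation for " + engine);
        return byEngine.get(engine);
    }

    public static void main(String[] args) {
        System.out.println(generate("filter", "spark"));
        // prints: stream.filter(...)
    }
}
```

The lookup failure case mirrors the constraint stated above: a \PipeFlow script can only be mapped to a target engine if every operator it uses has a counterpart there.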
One useful application of this approach is helping developers evaluate various stream processing systems. Instead of manually writing several scripts that perform the same task for different systems, writing a single script in our approach yields the same result faster and more efficiently.
The remainder of the paper is structured as follows: In Sect.~\ref{sec:pipeflow}, we introduce the \PipeFlow language, the system architecture, and an example of the mapping between scripts. Next, in Sect.~\ref{sec:app}, we describe our front-end application and give details about its design and provided functionalities. Finally, the planned demonstration is described in Sect.~\ref{sec:demo}.
\usepackage{color}
\usepackage{graphicx}
\usepackage{multirow}
\usepackage{xcolor}
\usepackage{listings}
\lstset{language=Java,
showspaces=false,
showtabs=false,
breaklines=true,
showstringspaces=false,
breakatwhitespace=true,
basicstyle=\ttfamily,
}
\newtheorem{mydef}{Definition}
\newcommand{\todo}[1]{\textcolor[rgb]{1,0,0}{#1}}
\maketitle
\begin{abstract}
Recently, distributed stream computing platforms such as Storm and Spark have been developed for processing massive data streams. In general, these platforms lack support for higher-level declarative languages and provide only a programming interface. Moreover, users must be well versed in the syntax and programming constructs of each language on these platforms. In this paper, we demonstrate our \PipeFlow system. It provides a higher-level dataflow language that can be translated into different stream processing languages to be run on the corresponding engines. In this case, the user needs to know only one language: he/she can write one stream-processing script and expect to execute it on different engines.
\end{abstract}
%\subsection{Query 2: Outliers}
%\section{Demonstration}
%\label{sec:demo}
%\input{demo}
\section{Demonstration}
\label{sec:demo}
\input{demo}
%