Commit 85207e97 authored by Omran Saleh

changes

parent 2b5952a2
@@ -6,7 +6,7 @@ In this section we provide a description of our \PipeFlow language and the syst
\end{alltt}
where \texttt{\$out} denotes a pipe variable referring to the typed output stream of operator \texttt{op} and \texttt{\$in\emph{i}} refers to its input streams. By using the output pipe of one operator as the input pipe of another operator, a dataflow graph is formed. Dataflow operators can be further parametrized by the clauses described below.
\begin{itemize}
\item \texttt{by} clause: The by clause allows to specify a boolean expression which has to be satisfied by each output tuple. This is used for instance for filter operators, grouping, or joins. Expressions are formulated in standard C notation as follows:
\item \texttt{by} clause: This clause allows specifying a boolean expression that has to be satisfied by each output tuple. It is used, for instance, for filter operators, grouping, or joins. Expressions are formulated in standard C notation as follows:
\vspace*{-0.5cm}
\begin{alltt}
\begin{center}
@@ -14,7 +14,7 @@ where \texttt{\$out} denotes a pipe variable referring to the typed output strea
\end{center}
\end{alltt}
\vspace*{-0.5cm}
where \texttt{\$res} is the output pipe to which the filter operator publishes its result stream , \texttt{\$in} is the input pipe from which the operator receives its input tuples and has an attributed x, and \texttt{x > 42} is an predicate to discard the incoming tuples accordingly.
where \texttt{\$res} is the output pipe to which the filter operator publishes its result stream, \texttt{\$in} is the input pipe from which the operator receives its input tuples (each carrying an attribute \texttt{x}), and \texttt{x > 42} is the predicate used to discard non-matching tuples.
\item \texttt{with} clause: The with clause is used to explicitly specify the schema associated with the output pipe of the operator. This is only required for some operators such as \texttt{file\_source}. An example of this clause is:
\vspace*{-0.5cm}
\begin{alltt}
@@ -23,8 +23,7 @@ where \texttt{\$res} is the output pipe to which the filter operator publishes i
\end{center}
\end{alltt}
\vspace*{-0.5cm}
\item \texttt{using} clause: This clause allows to pass operator-specific parameters. These parameters are given as a list of key-value pairs with the following syntax, e.g., the 'filename' parameter for the file
reader operator to specify the input file:
\item \texttt{using} clause: This clause allows passing operator-specific parameters. These parameters are given as a list of key-value pairs with the following syntax; for example, the \texttt{filename} parameter of the file reader operator specifies the input file:
\vspace*{-0.5cm}
\begin{alltt}
\begin{center}
@@ -32,7 +31,7 @@ using (param1 = value1, param2 = value2, ...);
\end{center}
\end{alltt}
\vspace*{-0.5cm}
\item \texttt{generate} clause: The generate clause specifies how an output tuple of the operator is constructed. For this purpose, a comma-separated list of expressions is given, optionally a new field name can be specified using the keyword as:
\item \texttt{generate} clause: The generate clause specifies how an output tuple of the operator is constructed. For this purpose, a comma-separated list of expressions is given; optionally, a new field name can be assigned using the keyword \texttt{as}:
\vspace*{-0.5cm}
\begin{alltt}
\begin{center}
@@ -40,7 +39,7 @@ generate x, y, (z * 2) as res;
\end{center}
\end{alltt}
\vspace*{-0.5cm}
\item \texttt{on} clause: The on clause is used to specify a list of fields from the input pipe(s) used for grouping or joining. For instance, ordering the tuples on \texttt{x} attribute in a descending order can specified as follows:
\item \texttt{on} clause: The on clause is used to specify a list of fields from the input pipe(s) used for grouping or joining. For example, ordering the tuples on the \texttt{x} attribute in descending order can be specified as follows:
\vspace*{-0.5cm}
\begin{alltt}
\begin{center}
@@ -57,44 +56,42 @@ where \texttt{limit} parameter limits the number of results returned.
\begin{figure}[hb]
\centering
\includegraphics[width=3in]{architecture.png}
\includegraphics[width=3.5in, height = 2in]{architecture.eps}
\caption{\PipeFlow architecture}
\label{fig:arch}
\end{figure}
In our approach, we have adopted Automated Code Translation (ACT) technique by taking an input source code written in \PipeFlow language and converting it into an output source code in another language.
\PipeFlow system is written in Java and depends heavily on Antlr\footnote{\url{http://www.antlr.org}} and StringTemplate\footnote{\url{http://www.stringtemplate.org}} libraries. The former libraries generates a \PipeFlow language parser that can build and walk parse trees whereas the latter generates code using pre-defined templates. Basically, the following components are used to achieve the translation of \PipeFlow source script to equivalent target scripts (PipeFabric, Spark, or Storm): \emph{(1)} parser \emph{(2)} flow graph \emph{(3)} template file, and \emph{(4)} code generator. The latter two components are specific to target scripts and differ from each other depending on the target code to be generated. And thus for every target language we need to create a separate template file and code generator to generate its code.
In our approach, we have adopted the automated code translation (ACT) technique: an input source script written in the \PipeFlow language is converted into an output source script in another language. The \PipeFlow system is written in Java and depends heavily on the ANTLR\footnote{\url{http://www.antlr.org}} and StringTemplate\footnote{\url{http://www.stringtemplate.org}} libraries. The former generates a \PipeFlow language parser that can build and walk parse trees, whereas the latter generates code using pre-defined templates. The following components are used to translate a \PipeFlow source script into equivalent target scripts (PipeFabric, Spark, or Storm): \emph{(1)} parser, \emph{(2)} flow graph, \emph{(3)} template file, and \emph{(4)} code generator. The latter two components are specific to the target language and differ depending on the code to be generated; thus, for every target language a separate template file and code generator have to be created.
\textbf{Role of Components}: The roles and functionalities of each of the above-mentioned components are described below and shown in Fig.~\ref{fig:arch}.
\begin{description}
\item[Parser:] This component simply does the lexical analysis by parsing the input program written in \PipeFlow and identifying the dataflow graph instances from the program. From a formal language description called a \PipeFlow grammar, ANTLR generates a parser for that language that can automatically build parse trees, which are data structures representing how a grammar matches the input. ANTLR also automatically generates tree walkers that you can use to visit the nodes of those trees. A listener interface of the ANTLR parser is called while parsing the \PipeFlow specification and constructs the corresponding flow node instances which are later used to form the dataflow graph. Thus, the flow graph object can be created based on the input program.
\item[Parser:] This component performs the lexical and syntactic analysis: it parses the input program written in \PipeFlow and identifies the dataflow graph instances in the program. Initially, ANTLR automatically generates a parser for the \PipeFlow language from the \PipeFlow grammar. This parser creates a parse tree, a data structure representing how the grammar matches the \PipeFlow script. Additionally, ANTLR automatically generates a tree walker interface which can be used to visit the nodes of the parse tree. A listener implementing this interface visits the nodes in order to construct the corresponding flow node instances during the tree traversal. From these flow node instances, the flow graph object can be created.
\item[Flow graph: ] A logical presentation of the input program which comprises nodes (flow nodes) and edges (pipes). A flow node instance represents an operator in the data-flow program. It is the intermediate mean between the parser and code generator to build up a graph of nodes and generate the target code, respectively. Pipe instance represents the edge between two nodes of a dataflow graph. Therefore, each pipe contains the input node (the node producing tuples which are sent via the pipe) and the output node (the node consuming these tuples from the pipe). This component is mainly used to generate the target code in a specific language. And Irrespective of target language program to be generated this component remains same.
\item[Flow graph: ] A logical representation of the input program which comprises nodes (flow nodes) and edges (pipes). A flow node instance represents an operator in the dataflow program; it is the intermediary between the parser and the code generator, used to build up a graph of nodes and to generate the target code, respectively. A pipe instance represents the edge between two nodes of the dataflow graph; therefore, each pipe contains the input node (the node producing tuples which are sent via the pipe) and the output node (the node consuming these tuples from the pipe). This component is mainly used to generate the target code in a specific language and, irrespective of the target language to be generated, it remains the same. A simplified sketch of these data structures is given after this list.
\item[Code generator:] Once the flow graph is generated from the input \PipeFlow source program, the code generator generates the target code based on it. The code generator takes the flow graph generated by parser and a template file, processes all nodes and pipes iteratively, and creates the equivalent code using the StringTemplate library. There are separate code generators for each specific target language depending on the target code to be generated.
\item[Template file:] It is also defined as a string template group file (stg). We can imagine this file as string with holes which has to be filled by the code generator. inside this file, the rules with its arguments should be defined to specify how to format the operators code and which part in it has to render. Therefore, some parts will be rendered as it is where other parts contain place-holders will be replaced with the provided arguments. Template file is different for each specific target code to be generated as the format and syntax of each target language to be generated is different. The template file contains the library files to be included, packages to be imported, or the code blocks for operators, etc.
\item[Code generator:] Once the flow graph has been built from the input \PipeFlow source program, the code generator produces the target code from it. It takes the flow graph created by the parser together with a template file, processes all nodes and pipes iteratively, and creates the equivalent code using the StringTemplate library (a minimal StringTemplate example is sketched after this list). There is a separate code generator for each target language, depending on the code to be generated.
\item[Template file:] This file is also known as a string template group file (\texttt{.stg}). We can imagine it as a string with holes which have to be filled by the code generator. Inside this file, rules with their arguments are defined to specify how the operator code is formatted and which parts of it have to be rendered. Therefore, some parts are rendered as they are, while other parts contain placeholders that are replaced with the provided arguments. The template file is different for each target language, as the format and syntax of each target language differ. It contains the library files to be included, the packages to be imported, the code blocks for the operators, etc.
\end{description}
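To illustrate how the parser and the flow graph interact, the following Java sketch shows hypothetical flow graph data structures together with a build step that a parse-tree listener could invoke for every statement of the form \texttt{\$out := op(\$in, ...);}. The class and method names (\texttt{FlowNode}, \texttt{Pipe}, \texttt{FlowGraph}, \texttt{addOperator}) are illustrative only and do not correspond to the actual \PipeFlow implementation.
\begin{lstlisting}[caption={Illustrative flow graph sketch (hypothetical names)}]
// Hypothetical sketch only; not the actual PipeFlow classes.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlowNode {                  // one dataflow operator, e.g. filter
  String op;                      // operator name
  Map<String, String> params = new HashMap<>(); // parameters of the using clause
  List<Pipe> inputs  = new ArrayList<>();
  List<Pipe> outputs = new ArrayList<>();
  FlowNode(String op) { this.op = op; }
}

class Pipe {                      // edge between two flow nodes
  FlowNode producer;              // node publishing tuples into the pipe
  FlowNode consumer;              // node consuming tuples from the pipe
}

class FlowGraph {
  Map<String, Pipe> pipes = new HashMap<>(); // pipe variable name -> pipe
  List<FlowNode> nodes = new ArrayList<>();

  // Invoked by the parse-tree listener for each statement out := op(in, ...);
  // assumes the producers of the input pipes appeared earlier in the script.
  FlowNode addOperator(String op, String outVar, List<String> inVars) {
    FlowNode node = new FlowNode(op);
    for (String v : inVars) {
      Pipe in = pipes.get(v);
      in.consumer = node;
      node.inputs.add(in);
    }
    Pipe out = new Pipe();
    out.producer = node;
    node.outputs.add(out);
    pipes.put(outVar, out);
    nodes.add(node);
    return node;
  }
}
\end{lstlisting}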
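The interplay of the code generator and a template can be illustrated with the StringTemplate API. The following sketch fills a filter rule with values taken from a flow node and prints the resulting PipeFabric-style C++ snippet; for brevity the template is given as an inline string rather than loaded from an \texttt{.stg} file, and the rule text is only an approximation of the real \PipeFlow templates.
\begin{lstlisting}[caption={Illustrative use of StringTemplate (hypothetical template)}]
// Hypothetical sketch only; an inline rule instead of a real .stg file.
import org.stringtemplate.v4.ST;

public class TemplateDemo {
  public static void main(String[] args) {
    // A "string with holes": $out$, $in$ and $pred$ are placeholders
    // that the code generator fills from the attributes of a flow node.
    ST filter = new ST(
        "auto $out$ = new Filter<TupleType1_>("
      + "[&](auto tp) { return $pred$; });\n"
      + "makeLink($in$, $out$);",
      '$', '$');
    filter.add("out", "op2_");
    filter.add("in", "op1_");
    filter.add("pred", "std::get<0>(*tp) > 18");
    System.out.println(filter.render());
  }
}
\end{lstlisting}
In the actual system, such rules are kept per target language in the corresponding template group file and instantiated by the respective code generator.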
\subsection{An Example of Translation}
Consider below a simple\footnote{Since of the limitation in the number of pages.} sample script written in \PipeFlow that needs to be translated to PipeFabric and Storm by our system. This script generates a stream that contains the fields \texttt{x} and \texttt{y}. Later, \texttt{x} files is filtered and aggregated to find the the sum of all \texttt{y} fields for particular \texttt{x}. Note that the \PipeFlow construct is simpler than other engine construct.
Consider the simple\footnote{Due to space limitations.} sample script below, written in \PipeFlow, which has to be translated into PipeFabric and Storm code by our system. The script reads a stream containing the fields \texttt{x} and \texttt{y}. The stream is then filtered on \texttt{x} and aggregated to compute the sum of all \texttt{y} values for each particular \texttt{x}. Note that the \PipeFlow script is simpler than the corresponding constructs of the other engines.
\begin{alltt}
\$in := file_source() using
(filename = "input.csv") with (x int, y int);
$1 := filter($in) by x > 18;
$2 := aggregate($1) on x generate x, sum(y) as counter;
\$in := file_source() using (filename = "input.csv") with (x int, y int);
\$1 := filter(\$in) by x > 18;
\$2 := aggregate(\$1) on x generate x, sum(y) as counter;
\end{alltt}
The following represents the most important parts in storm and PipeFabric code, respectively. Both codes have the same functionalities but in different languages and engines.
\begin{lstlisting}
The following listings show the most important parts of the generated Storm and PipeFabric code, respectively. Both provide the same functionality, but in different languages and for different engines.
\begin{lstlisting}[caption={Generated Storm code}, label=storm]
public class MyFilter extends BaseFilter {
public boolean isKeep(TridentTuple tuple) {
return tuple.getInteger(0) > 18;
}
}
// filter on x, then group by x and sum y into counter
stream.each(new Fields("x","y"), new MyFilter())
      .groupBy(new Fields("x"))
      .aggregate(new Fields("y"), new Sum(), new Fields("counter"));
\end{lstlisting}
\begin{lstlisting}
typedef pfabric::Tuple<int, int> TupleType1_;
\begin{lstlisting}[caption={Generated PipeFabric code}, label=pipefabric]
typedef pfabric::Tuple<int, int>
TupleType1_;
auto op1_ = new FileSource<TupleType1_>("input.csv");
auto op2_ = new Filter<TupleType1_>([&](TupleType1_ tp) { return std::get<0>(*tp) > 18; });
makeLink(op1_, op2_);
......