Lemonade (Live Exploration and Mining Of a Non-trivial Amount of Data from Everywhere) is an analytics platform that supports intuitive definition of tasks for knowledge discovery, mining, and learning from large amounts of data that come from a wide spectrum of scenarios. The platform interface is a web application in which users may define analytics workflows visually by dragging and dropping operations and data sources, and connecting them.
Lemonade has 7 micro-components:
- Limonero: stores metadata about data sources and provides it as a service. For each data source, it records the location, access permissions, storage details (such as name, data type, size, precision, and data format), and data characteristics such as distribution, missing values, mean, and maximum values.
- Tahiti: maintains metadata about individual operations and dataflows created by users and provides it as a service. Operations are the smallest units in Lemonade, and they are divided into five categories: execution, privacy/security, monitoring, appearance, and quality of service requirements (QoS).
- Citron: the web interface that users use to create, execute, and monitor their data flows. With it, users can choose predefined operations, drag them, and connect them through their ports to compose a data flow.
- Juicer: the module that actually runs the data flows and supports the monitoring of their execution. Upon receiving a data flow, it generates the equivalent Spark source code, acting as a transpiler (source-to-source compiler), where each operation becomes a method. The Spark code is then instantiated in the cloud execution environment, observing the user-defined QoS parameters to make sure operations execute with sufficient resources to meet user demands.
- Stand: coordinates the communication between Citron and Juicer, ensuring independence between the two components. Execution starts when a user requests to run a dataflow through the Citron interface; Citron then invokes Stand, which connects back to Citron to provide feedback to the user.
- Thorn: responsible for security, privacy and access control (AAA) in Lemonade. Some of its tasks are challenging, such as determining who will be able to access the results from applying an operation to a database that contains sensitive attributes.
- Caipirinha: provides visualizations through different visual metaphors.
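The transpiler role described for Juicer can be illustrated with a minimal sketch. It is not Juicer's actual implementation: the workflow layout (`tasks`, `flows`), the operation names, and the code templates are all hypothetical, and each task is assumed to have at most one input. The sketch orders the operations so that each one follows its input, then emits Python source in which every operation becomes one method that uses the PySpark DataFrame API.

```python
from collections import deque

# Hypothetical workflow description: tasks plus (source, target) connections.
WORKFLOW = {
    "tasks": [
        {"id": "t1", "operation": "read_data", "params": {"path": "data.csv"}},
        {"id": "t2", "operation": "filter", "params": {"condition": "age > 30"}},
        {"id": "t3", "operation": "save", "params": {"path": "out.parquet"}},
    ],
    "flows": [("t1", "t2"), ("t2", "t3")],
}

# Hypothetical templates: the method body generated for each operation type.
TEMPLATES = {
    "read_data": "return spark.read.csv({path!r}, header=True, inferSchema=True)",
    "filter": "return df.filter({condition!r})",
    "save": "df.write.parquet({path!r})\nreturn df",
}


def topological_order(workflow):
    """Kahn's algorithm: order tasks so every task follows its inputs."""
    indegree = {t["id"]: 0 for t in workflow["tasks"]}
    successors = {t["id"]: [] for t in workflow["tasks"]}
    for src, dst in workflow["flows"]:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(tid for tid, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        tid = ready.popleft()
        order.append(tid)
        for nxt in successors[tid]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("workflow contains a cycle")
    return order


def generate_code(workflow):
    """Return Python source with one method per task and a main() driver."""
    tasks = {t["id"]: t for t in workflow["tasks"]}
    inputs = {dst: src for src, dst in workflow["flows"]}  # one input per task
    order = topological_order(workflow)
    lines = []
    for tid in order:
        task = tasks[tid]
        body = TEMPLATES[task["operation"]].format(**task["params"])
        lines.append(f"def task_{tid}(spark, df=None):")
        lines.extend("    " + line for line in body.splitlines())
        lines.append("")
    lines.append("def main(spark):")
    lines.append("    results = {}")
    for tid in order:
        arg = f", results[{inputs[tid]!r}]" if tid in inputs else ""
        lines.append(f"    results[{tid!r}] = task_{tid}(spark{arg})")
    lines.append("    return results")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_code(WORKFLOW))
```

The generated source only needs a Spark session passed to `main`. A real transpiler would also have to handle operations with several input and output ports, which this sketch leaves out.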
Essential information for potential users
Lemonade is an open-source solution. All dependencies (operating system, processing frameworks, infrastructure technologies) are also open source, so there are no licensing costs. The license scheme is under discussion and will be finalised for the first release. To stay up and running, Lemonade requires a cluster of processing computers and data storage. The size and capacity of the cluster depend on the number of users, the data volume, and the complexity of the workflows and tasks.
Lemonade depends on Apache Mesos (standalone mode) or a distributed processing technology (Apache Spark, BSC COMPSs or CMCC Ophidia), an Oracle MySQL database server, and a Linux operating system distribution. Lemonade requires a reliable infrastructure, which may be provided by platform-as-a-service (PaaS) companies such as Google, Amazon, or Microsoft, or by the organization using Lemonade.
Lemonade supports three user roles: system administrator, data scientist, and data explorer. The system administrator is responsible for keeping Lemonade running, adding new users, setting permissions and security, and managing data sets. Data scientists must know the Lemonade operations in order to create processing workflows, and must understand the data being processed, its characteristics, and how their results can be applied in a real scenario. Data explorers are the users of existing models.
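The three roles can be read as a simple role-to-permission mapping. The sketch below is only an illustration of that reading: the action names and the `is_allowed` helper are hypothetical, and the text does not describe how Thorn actually models permissions.

```python
from enum import Enum


class Role(Enum):
    SYSTEM_ADMINISTRATOR = "system administrator"
    DATA_SCIENTIST = "data scientist"
    DATA_EXPLORER = "data explorer"


# Hypothetical action names derived from the responsibilities listed above.
PERMISSIONS = {
    Role.SYSTEM_ADMINISTRATOR: {"add_user", "set_permissions", "manage_data_sets"},
    Role.DATA_SCIENTIST: {"create_workflow"},
    Role.DATA_EXPLORER: {"use_existing_model"},
}


def is_allowed(role, action):
    """Return True if the given role may perform the given action."""
    return action in PERMISSIONS.get(role, set())
```

For example, `is_allowed(Role.DATA_EXPLORER, "create_workflow")` evaluates to `False` under this mapping, because data explorers only use existing models.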
Owner type: Academia/Research