59 changes: 59 additions & 0 deletions .github/workflows/sync-wiki-from-github.yml
@@ -0,0 +1,59 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Sync Wiki from GitHub

on:
  gollum:
  workflow_dispatch:

jobs:
  sync-from-wiki:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          path: main-repo

      - name: Checkout wiki repository
        uses: actions/checkout@v4
        with:
          repository: ${{ github.repository }}.wiki
          path: wiki-repo

      - name: Sync Wiki to texera.wiki
        run: |
          # Clear existing texera.wiki content in main repo (excluding .git or other ignored files if any)
          rm -rf main-repo/texera.wiki/*

          # Copy new wiki content, then drop the wiki's own .git so it is not committed as an embedded repository
          cp -rT wiki-repo main-repo/texera.wiki/ && rm -rf main-repo/texera.wiki/.git

          # Commit and push to main repo
          cd main-repo
          git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git config --local user.name "github-actions[bot]"
          git add texera.wiki/

          if git diff-index --quiet HEAD; then
            echo "No changes to sync."
          else
            git commit -m "docs: sync wiki from GitHub [skip ci]"
            git push
          fi
63 changes: 63 additions & 0 deletions .github/workflows/sync-wiki-from-pr.yml
@@ -0,0 +1,63 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Sync Wiki from PR

on:
  push:
    branches:
      - main
    paths:
      - 'texera.wiki/**'
  workflow_dispatch:

jobs:
  sync-wiki:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          path: main-repo

      - name: Checkout wiki repository
        uses: actions/checkout@v4
        with:
          repository: ${{ github.repository }}.wiki
          path: wiki-repo

      - name: Sync texera.wiki to Wiki
        run: |
          # Clear existing wiki content (excluding .git)
          find wiki-repo -mindepth 1 -maxdepth 1 ! -name ".git" -exec rm -rf {} +

          # Copy new wiki content
          cp -rT main-repo/texera.wiki wiki-repo

          # Commit and push
          cd wiki-repo
          git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git config --local user.name "github-actions[bot]"
          git add .

          if git diff-index --quiet HEAD; then
            echo "No changes to sync."
          else
            git commit -m "docs: sync wiki from main repo [skip ci]"
            git push
          fi
27 changes: 27 additions & 0 deletions texera.wiki/Apache-License-header.md
@@ -0,0 +1,27 @@
Every file must include the Apache License as a header. This can be automated in IntelliJ by
adding a Copyright profile:

1. Go to "Settings" → "Editor" → "Copyright" → "Copyright Profiles".
2. Add a new profile and name it "Apache".
3. Add the following text as the license text:

```
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
4. Go to "Editor" → "Copyright" and choose the "Apache" profile as the default profile for this
project.
5. Click "Apply".
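
The profile only applies to files created or updated in IntelliJ, so a quick check can help spot files that still lack the header. The following is a minimal sketch, not a Texera tool: the function name `missing_license_header` and the 20-line search window are hypothetical choices, and it only looks for the first line of the license text above.

```shell
# Hypothetical helper (not part of Texera): print every file given as an
# argument whose first 20 lines do not contain the first line of the ASF header.
missing_license_header() {
  for f in "$@"; do
    # -F: match the header line literally (it contains parentheses)
    if ! head -n 20 "$f" | grep -qF "Licensed to the Apache Software Foundation (ASF) under one"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, from the repository root (the file patterns are only examples): `git ls-files '*.scala' '*.ts' '*.py' | while IFS= read -r f; do missing_license_header "$f"; done`.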
@@ -0,0 +1,53 @@
This document describes how to set up the local development environment for developing and deploying `core/micro-services`.

## Prerequisite

Before following this document, complete the Texera local development environment setup described in the [Texera wiki](https://github.com/Texera/texera/wiki).

## What is `micro-services`?

`core/micro-services` is an sbt-managed project added in PR https://github.com/Texera/texera/pull/2922. As part of the ongoing code-separation effort, all the services in `core/amber` will gradually be migrated to `core/micro-services`.

## How to build and run the micro-services directly

If you only want to run some of the services under `micro-services`, you can use the provided shell scripts.

### `WorkflowCompilingService`

```shell
cd texera/core

# Make sure the scripts have execute permission
chmod +x scripts/build-workflow-compiling-service.sh
chmod +x scripts/workflow-compiling-service.sh

# Build the WorkflowCompilingService
scripts/build-workflow-compiling-service.sh

# Run the WorkflowCompilingService
scripts/workflow-compiling-service.sh
```

## How to set up the development environment

Because `micro-services` contains many sbt sub-projects, IntelliJ is the most suitable IDE for setting up the whole environment.

### Use IntelliJ (Recommended)

1. In IntelliJ, use `Open Project` to open the folder `texera/core/micro-services`.
<img width="716" alt="Screenshot 2024-11-19 at 6 00 08 PM" src="https://github.com/user-attachments/assets/4e446332-7cfa-4974-b59b-2088a7a2d921">

IntelliJ will detect the sbt settings and start loading the project. Once loading finishes, the sbt tab should show `micro-services` as the root project, with the individual services as sub-projects:
<img width="200" alt="Screenshot 2024-11-19 at 6 05 15 PM" src="https://github.com/user-attachments/assets/24ba1a31-1c82-4441-b525-7facc00c3ada">


2. Run `sbt clean compile` in the folder `core/micro-services`. This compiles everything under `micro-services` and generates the code specified by the proto files.









@@ -0,0 +1,30 @@
This tutorial walks through preparing data by creating a dataset, and then creating a Texera workflow to analyze the data stored in that dataset.

Specifically, we will create a dataset named `Sales Dataset` that contains a file of sales data for different types of merchandise across several countries. The workflow will then calculate the average units sold per item type across the European countries in [CountrySalesData.csv](statics/files/CountrySalesData.csv) (make sure the downloaded file keeps the `.csv` extension). The sales data was downloaded from [eforexcel.com](http://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/) and has 100 rows.

We will first create a dataset and upload the sales data to it. Then we will create a workflow in the Texera Web UI to:
1. read the data from the file;
2. filter the relevant data based on keywords;
3. perform an aggregation.

**1. Upload data by creating a Dataset**
* Go to the Dataset tab and click the `dataset creation` icon to start creating the dataset.
* Name the dataset `Sales Dataset`, then drag and drop `CountrySalesData.csv` into the file upload area.
* Click `Create`. The new dataset is shown, along with a preview of `CountrySalesData.csv`.
![2024-03-05 22 00 43](https://github.com/Texera/texera/assets/43344272/e17631b3-bf58-442f-af19-00f0ab704acb)

**2. Read data in Workflow**
* In the left panel, go to the `environment` tab and click `Add Dataset` to add `Sales Dataset` to the current workflow. `CountrySalesData.csv` can then be previewed and loaded into the workflow.
![2024-03-05 22 26 45](https://github.com/Texera/texera/assets/43344272/45e98e6b-fe6a-405c-bd24-22ee28ee3716)
* Drag and drop a `CSV File Scan` operator. In the right panel, enter the file name `CountrySalesData.csv` and select the path from the drop-down menu.
* Run the workflow; you should see the loaded sales data.
![2024-03-05 22 46 11](https://github.com/Texera/texera/assets/43344272/77389a4c-dd73-4179-b8c0-ebf10241b182)


**3. Add operators to analyze data**
* Drag and drop a `Filter` operator to keep only the sales records from `Europe`.
![2024-03-05 22 51 26](https://github.com/Texera/texera/assets/43344272/9b73fcaa-a7df-4efb-8189-4054a6bef527)

* Drag and drop an `Aggregate` operator to compute the average units sold, grouped by `Item Type`.
![2024-03-05 22 53 06](https://github.com/Texera/texera/assets/43344272/67ade74c-df20-44b1-a9fa-1b8edb4af0cf)
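
To sanity-check the `Aggregate` result outside Texera, the same filter-then-average computation can be sketched with `awk`. This is a hypothetical helper, not part of Texera. It assumes `CountrySalesData.csv` has a header row with columns named `Region`, `Item Type`, and `Units Sold`, and that no field contains a quoted comma, because it splits each line naively on commas.

```shell
# Hypothetical cross-check (not part of Texera): keep rows whose Region is
# "Europe", then print the average "Units Sold" per "Item Type", sorted by type.
# Assumes a header row with those column names and no quoted commas in fields.
avg_units_by_item_type() {
  awk -F',' '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {                               # locate columns by header name
      for (i = 1; i <= NF; i++) {
        if ($i == "Region") region = i
        else if ($i == "Item Type") item = i
        else if ($i == "Units Sold") units = i
      }
      next
    }
    $region == "Europe" {                   # Filter step
      sum[$item] += $units                  # Aggregate step: per-group sum
      cnt[$item]++                          # and per-group row count
    }
    END {
      for (k in sum) printf "%s,%.2f\n", k, sum[k] / cnt[k]
    }
  ' "$1" | sort
}
```

Running `avg_units_by_item_type CountrySalesData.csv` prints one `Item Type,average` line per item type, which can be compared with the workflow's result panel.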
