Publishing pandas data frames to Tableau Online

via a Python environment

Based on an article by Eric Chan.

First, download the Tableau Extract API and save it in the /results folder for later use:

wget -q --show-progress --progress=bar:force -P /results https://downloads.tableau.com/tssoftware/extractapi-py-linux-x86_64-2019-2-6.tar.gz

Next, untar the archive and install Tableau's Python SDK, add pandleau, Tableau Server Client, Modin and pandas, and update conda itself:

cd /results
tar zxvf extractapi-py-linux-x86_64-2019-2-6.tar.gz
cd hyperextractapi-py-linux-x86_64-release_2019_2.2019.2.6.199.r40e5865b/
python setup.py install
conda install -c conda-forge tableauserverclient modin pandas
pip install pandleau --no-deps
conda update -n base -c defaults conda
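To confirm the installation worked, here is a minimal sketch that checks from Python whether each package can be found. It assumes the Extract API's setup.py installs a module named `tableausdk`; the other module names match the packages installed above. `check_modules` is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

# Module names for the packages installed above; "tableausdk" is assumed to be
# the module provided by the Extract API's setup.py.
REQUIRED_MODULES = ["tableausdk", "tableauserverclient", "pandleau", "modin", "pandas"]


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether it can be found."""
    status = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed;
        # it locates the module without importing it
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```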
Configure Modin to use 4 CPUs, the Ray engine and the pandas backend:
export MODIN_CPUS=4
export MODIN_ENGINE=ray
export MODIN_BACKEND=pandas
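Variables exported in a Bash cell apply to that shell and the processes it starts; depending on how the notebook runs its cells, they may not reach the Python runtime. A minimal sketch that sets the same variables from Python before Modin is imported. `configure_modin` is a hypothetical helper, and the variable names are copied from the Bash cell above; newer Modin releases have renamed or redefined some of them.

```python
import os

# Same values as the Bash cell above. Newer Modin releases rename or redefine
# some of these variables, so check the Modin configuration docs for your version.
MODIN_SETTINGS = {"MODIN_CPUS": "4", "MODIN_ENGINE": "ray", "MODIN_BACKEND": "pandas"}


def configure_modin(settings=MODIN_SETTINGS, override=False):
    """Copy Modin settings into os.environ; call before importing modin.pandas.

    Existing values are kept unless override=True. Returns the active values.
    """
    for key, value in settings.items():
        if override or key not in os.environ:
            os.environ[key] = str(value)
    return {key: os.environ[key] for key in settings}


print(configure_modin())
# import modin.pandas as pd  # import Modin only after the variables are set
```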

How to use this runtime

Import the environment container, python-tableau-online, which has all the necessary packages installed.
This image was imported from: docker.nextjournal.com/environment@sha256:5cf7d089d7a2d3a1955eb2f8a9f7259bc57f44cb9f0b8f1393aa97cbf8ce3639

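Populate your DataFrame with data first, e.g. a large CSV file read in chunks of 500,000 rows. Note that concatenating the chunks still loads the whole file into memory, so the final DataFrame must fit in RAM.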

import modin.pandas as pd
tfr = pd.read_csv("/path/to_csv", chunksize=500000, iterator=True)
df = pd.concat(tfr, ignore_index=True)
print(df)
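If memory is tight, one option (a swapped-in technique, not part of the original article) is to shrink numeric dtypes chunk by chunk before concatenating. A minimal sketch with plain pandas; `read_csv_downcast` is a hypothetical helper, and downcasting floats to float32 trades precision for memory.

```python
import io

import numpy as np
import pandas as pd


def read_csv_downcast(path_or_buffer, chunksize=500_000):
    """Read a CSV in chunks and shrink numeric dtypes before concatenating."""
    chunks = []
    for chunk in pd.read_csv(path_or_buffer, chunksize=chunksize):
        # Smallest signed integer type that holds this chunk's values (e.g. int8)
        for col in chunk.select_dtypes(include=[np.integer]).columns:
            chunk[col] = pd.to_numeric(chunk[col], downcast="integer")
        # float64 -> float32 where possible; this trades precision for memory
        for col in chunk.select_dtypes(include=[np.floating]).columns:
            chunk[col] = pd.to_numeric(chunk[col], downcast="float")
        chunks.append(chunk)
    # concat promotes mismatched dtypes across chunks (e.g. int8 + int16 -> int16)
    return pd.concat(chunks, ignore_index=True)


# Usage on a tiny in-memory CSV; for a real file pass "/path/to_csv" instead
sample = io.StringIO("a,b,c\n1,2.5,x\n3,4.5,y\n200,1.0,z\n")
df = read_csv_downcast(sample, chunksize=2)
print(df.dtypes)
```

Because each chunk is downcast independently, a column can end up with different integer widths in different chunks; `pd.concat` then promotes it to a common type, so the result stays correct.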

Now let's create a Python function, publishtotableau, that publishes a pandas DataFrame as a data source on Tableau Online so that we can reuse it later in code:

import os

import modin.pandas as pd
import tableauserverclient as TSC
from pandleau import *


def publishtotableau(df, folder_path, projectid, datasource_name, auth_list, site='yoursite'):
    """
    Log in to Tableau Online and publish a pandas DataFrame as a data source.
    Assumes the following packages are imported:
        - tableauserverclient as TSC
        - from pandleau import *
        - pandas as pd
    Args:
        df: DataFrame to publish
        folder_path: folder in which to store the generated temp.hyper file
        projectid: Tableau Server project ID
        datasource_name: name of the data source to publish
        auth_list: list-like with the username at index 0 and the password at index 1
        site: Tableau Server site ID
    Returns:
        None
    """
    # Write the DataFrame to a Hyper extract; os.path.join works with or
    # without a trailing slash on folder_path
    hyper_path = os.path.join(folder_path, 'temp.hyper')
    pandleau(df).to_tableau(hyper_path, add_index=False)
    tableau_auth = TSC.TableauAuth(auth_list[0], auth_list[1], site_id=site)
    # The server URL depends on the Tableau Online pod hosting your site (here 10ax)
    server = TSC.Server('https://10ax.online.tableau.com/', use_server_version=True)
    with server.auth.sign_in(tableau_auth):
        mydatasourceitem = TSC.DatasourceItem(projectid, name=datasource_name)
        # Overwrite replaces an existing data source with the same name in the project
        item = server.datasources.publish(mydatasourceitem, hyper_path, TSC.Server.PublishMode.Overwrite)
        print("{} successfully published with id: {}".format(item.name, item.id))
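To avoid hard-coding credentials in the notebook, auth_list can be built from environment variables. A minimal sketch; the helper `tableau_credentials_from_env` and the variable names `TABLEAU_USERNAME` and `TABLEAU_PASSWORD` are hypothetical, and the project ID, site and folder in the usage comment are placeholders.

```python
import os


def tableau_credentials_from_env(user_var="TABLEAU_USERNAME", password_var="TABLEAU_PASSWORD"):
    """Return [username, password] in the auth_list order publishtotableau expects.

    The environment variable names are hypothetical defaults; pick your own.
    """
    try:
        return [os.environ[user_var], os.environ[password_var]]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Hypothetical usage (project ID, site and folder are placeholders):
# auth = tableau_credentials_from_env()
# publishtotableau(df, "/results/", "your-project-id", "my_datasource", auth, site="yoursite")
```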

Alternatively, populate your DataFrame with standard pandas instead of Modin, again reading a large CSV file in chunks:

import pandas as pd
import tableauserverclient as TSC
from pandleau import *
tfr = pd.read_csv("/path/to_csv", chunksize=500000, iterator=True)
df = pd.concat(tfr, ignore_index=True)
print(df)

Alternative method
