CKAN is a complex piece of software that works together with many other web services. Creating the Docker version was therefore not straightforward and required a lot of troubleshooting. All useful links are collected here for further study:

  1. CKAN help:
    1. https://docs.ckan.org/en/2.8/maintaining/configuration.html#sqlalchemy-url
  2. Docker:
    1. command lines: https://docs.docker.com/engine/reference/commandline/docker/
    2. docker network: https://www.freecodecamp.org/news/how-to-get-a-docker-container-ip-address-explained-with-examples/
    3. https://docs.docker.com/compose/networking/
    4. https://stackoverflow.com/questions/58073936/how-to-get-ip-address-of-docker-desktop-vm
  3. CKAN Docker (official):
    1. github: https://github.com/ckan/ckan/tree/2.8
    2. Documentation: https://docs.ckan.org/en/latest/maintaining/installing/install-from-docker-compose.html#
  4. SDDI Docker:
    1. https://github.com/tum-gis/SDDI-CKAN-Docker
  5. Other CKAN Docker setups:
    1. Stable: https://github.com/keitaroinc/docker-ckan
    2. https://github.com/ckan/ckan-docker
    3. https://github.com/okfn/docker-ckan
    4. https://github.com/kowh-ai/ckan-docker-travis
    5. https://github.com/eccenca/ckan-docker
    6. https://github.com/UKHomeOffice/docker-ckan
    7. Instruction: https://herrmann.tech/en/blog/2020/09/30/how-to-install-and-configure-ckan-2-9-0-using-docker.html
  6. Docker - DB:
    1. Connect From Your Local Machine to a PostgreSQL Database in Docker:
      1. https://medium.com/better-programming/connect-from-local-machine-to-postgresql-docker-container-f785f00461a7
      2. https://reachmnadeem.wordpress.com/2020/06/02/running-postgresql-database-in-docker-and-connecting-from-host-outside-container/
    2. https://stackoverflow.com/questions/37694987/connecting-to-postgresql-in-a-docker-container-from-outside
    3. How to Restore Database Dumps for Postgres in Docker Container: https://simkimsia.com/how-to-restore-database-dumps-for-postgres-in-docker-container/
    4. How to List Databases and Tables in PostgreSQL Using psql: https://chartio.com/resources/tutorials/how-to-list-databases-and-tables-in-postgresql-using-psql/
  7. Database dump inside Docker:
    1. https://simkimsia.com/how-to-restore-database-dumps-for-postgres-in-docker-container/
    2. https://www.pgadmin.org/docs/pgadmin4/development/backup_dialog.html
  8. Some useful discussions:
    1. https://lists-archive.okfn.org/pipermail/ckan-dev/2014-November/019762.html
    2. https://github.com/ckan/ckan/issues/5572#issuecomment-685453697
    3. Apache or Nginx - CKAN: https://github.com/ckan/ckan/issues/4991
      1. https://medium.com/@ksashok/using-nginx-for-production-ready-flask-app-with-uwsgi-9da95d8ac0f9
  9. Flask and front-end web server:
    1. Example 1: https://testdriven.io/blog/dockerizing-flask-with-postgres-gunicorn-and-nginx/
    2. Example 2: https://github.com/mandrewcito/flask-example-app
    3. Example 3: https://github.com/carlostighe/apache-flask
  10. Installing Docker and running CKAN on Linux:
    1. https://serverfault.com/questions/715905/why-am-i-getting-an-invalid-command-proxypass-error-when-i-start-my-apache-2/715906
    2. https://www.digitalocean.com/community/tutorials/how-to-use-apache-http-server-as-reverse-proxy-using-mod_proxy-extension
    3. https://www.linode.com/community/questions/311/how-do-i-enabledisable-a-website-hosted-with-apache
    4. https://askubuntu.com/questions/629995/apache-not-able-to-restart

CKAN Services:


Mandana Moshref
@MandanaMoshref

Dec 01 11:33
Hello,
I have a question regarding the Docker installation. It seems the Docker setup provided by CKAN itself is only for development purposes (neither NGINX nor Apache is included in the Dockerfile or docker-compose.yml).
I need a Docker version of our own CKAN implementation, which is based on the CKAN core Docker setup with some revisions to fit our requirements.
I made the changes and it works on my personal machine. Now I need to move it to our server for production use. My question is: does it make sense to run it on localhost on my Docker server and then use an Apache reverse proxy to map localhost:5000 to https://my-website.com?

Brett
@kowh-ai

Dec 01 13:26
Yes that makes sense - you could include a NGINX docker image/container configuration in your docker-compose.yml file making sure your NGINX configuration contains a proxy_pass line to proxy requests to the CKAN Docker container eg: proxy_pass http://ckan:5000/;

mabah-mst
@mabah-mst

Dec 01 13:33
I have had the same concerns with the docker installation process that is described in the documentation. I am working on deploying a setup using the docker-compose setup from https://github.com/keitaroinc/docker-ckan .

Mandana Moshref
@MandanaMoshref

Dec 01 13:59
@kowh-ai thanks for the reply.
Is including NGINX a recommendation or a requirement?
What will happen if I go ahead without the NGINX Docker container?
Two more points for clarification: 1) my ckan version is 2.8.0 2) Apache server is not a docker installation.

Brett
@kowh-ai

Dec 01 14:34
@MandanaMoshref I'd always recommend to have some sort of HTTP server on the front (especially for Prod). It would be a simple NGINX docker container configuration. Running a front-end web/proxy server outside of Docker may cause you some grief (especially with networking) unless you are comfortable working with infrastructure... @mabah-mst
yes the keitaroinc setup is certainly more complete than the current CKAN docker one.

Mandana Moshref
@MandanaMoshref

Dec 03 20:44
@kowh-ai Thanks a lot Brett for your advice. I have one more question.
I did it as you suggested, including the NGINX Docker container, and then used an Apache reverse proxy with the SSL certificate to secure it. I am just wondering whether it is better to install the certificate in the NGINX container, or whether it is also fine to keep it on the Apache server?
Thanks again.

Brett
@kowh-ai

Dec 04 10:14
Oh so your setup is along these lines: user —SSL—> Apache Web Server —non-SSL—> NGINX(Container) —non-SSL—> CKAN(Container) ?

Mandana Moshref
@MandanaMoshref

Dec 04 10:16
Yes... but only because it has worked this way. Honestly, I have no idea how good or bad my approach is.

Brett
@kowh-ai

Dec 04 15:01
Assuming the Apache web server and containers are all running on the same machine (you mentioned moving to your production-ready server previously) I would try and simplify by just taking out the Apache web server from the configuration as it’s not really needed. You could update the NGINX container to listen via an SSL port. The non-Docker CKAN “Deploying a source install” instructions (https://docs.ckan.org/en/latest/maintaining/installing/deployment.html) include the NGINX web server as just a reverse proxy to a WSGI Server (which is the Apache Server replacement with CKAN 2.9). With Docker containers, (I think) you don’t need to worry about the WSGI server as the running CKAN container exposes the 5000 port. I hope that isn’t confusing and hasn’t given you a lot more work to do…

HEF AgriHUB Docker

There are several Docker installation repositories for CKAN (also listed in the useful links above).
Our installation is mainly based on the official one from CKAN, with improvements inspired by the keitaroinc CKAN Docker setup (https://github.com/keitaroinc/docker-ckan).

The following describes the design, installation and operation of the HEF AgriHUB Docker setup on the HEF server hosted by LRZ:


This setup uses Docker Compose.

All files relevant for creating HEF AgriHUB are listed below:

Main folder

  • readME.rm: short instructions on how to run the HEF AgriHUB Docker setup
  • SetupCKANDocker: a shell script that automatically sets up and runs the whole HEF AgriHUB catalog on Linux-based systems
  • Setup_CKAN_Docker.bat: a batch file that does the same on Windows
  • agrihub.dump: full backup of the HEF AgriHUB main database
  • datastore.dump: full backup of the HEF AgriHUB datastore database

docker folder

  • .env: defines some of the high-level required settings (for example "CKAN_SITE_URL", or database-related settings such as the password and user).
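Before running docker-compose it can be useful to check that .env defines the settings the stack depends on. The following is a minimal sketch: CKAN_SITE_URL comes from this page, while POSTGRES_PASSWORD and DATASTORE_READONLY_PASSWORD are assumed variable names modelled on the official CKAN 2.8 contrib/docker/.env template, and check_env is a hypothetical helper. Adjust the key list to the real file.

```shell
#!/bin/sh
# Hypothetical pre-flight check for contrib/docker/.env.
# CKAN_SITE_URL is documented on this page; the other keys are ASSUMED names
# based on the official CKAN 2.8 .env template. Adjust the list as needed.
check_env() {
  env_file=${1:-.env}
  status=0
  for key in CKAN_SITE_URL POSTGRES_PASSWORD DATASTORE_READONLY_PASSWORD; do
    # A key counts as set if a line starts with KEY= followed by at least one character.
    if ! grep -q "^${key}=." "$env_file"; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_env .env && docker-compose up -d --build` only starts the build when all listed keys have a value.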

docker-compose.yml: running "docker-compose up -d --build" inside this folder reads this file. It defines 6 services; in our deployment nginx is deactivated because it conflicts with port 80, and the Apache server on the host machine is used instead, so five containers run.

After this step, CKAN should be running at CKAN_SITE_URL.

There should be five containers running (docker ps):

  • ckan: CKAN with standard extensions
  • db: CKAN’s database, later also running CKAN’s datastore database
  • redis: A pre-built Redis image.
  • solr: A pre-built Solr image set up for CKAN.
  • datapusher: A pre-built CKAN Datapusher image.

The Postgres container may need longer to initialize the database cluster than the ckan container waits for. How long depends heavily on the available system resources. If the CKAN logs show problems connecting to the database, restart the ckan container a few times:

docker-compose restart ckan
docker ps | grep ckan
docker-compose logs -f ckan
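Instead of restarting the ckan container by trial and error, one option is to poll the database until it accepts connections and then restart ckan once. The sketch below assumes it is run from contrib/docker, that the db image ships pg_isready (standard in the official PostgreSQL images) and that the database user is named ckan; DOCKER_COMPOSE and WAIT_SECONDS are hypothetical overrides.

```shell
#!/bin/sh
# Wait until PostgreSQL in the "db" service accepts connections.
# Assumptions: run from contrib/docker, database user "ckan".
# DOCKER_COMPOSE (default: docker-compose) and WAIT_SECONDS (default: 2)
# are hypothetical overrides, e.g. DOCKER_COMPOSE="docker compose".
wait_for_db() {
  tries=${1:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # pg_isready exits with status 0 once the server accepts connections.
    if ${DOCKER_COMPOSE:-docker-compose} exec -T db pg_isready -U ckan >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_SECONDS:-2}"
  done
  return 1
}
```

Usage: `wait_for_db 30 && docker-compose restart ckan` waits up to about a minute and restarts ckan only once the database is ready.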

There should be four named Docker volumes (docker volume ls | grep docker). Their names are prefixed with the Docker Compose project name (default: docker, or the value of the host environment variable COMPOSE_PROJECT_NAME).

  • docker_ckan_config: home of production.ini
  • docker_ckan_home: home of the CKAN venv and source code, later also of additional CKAN extensions
  • docker_ckan_storage: home of CKAN's filestore (resource files)
  • docker_pg_data: home of the database files for CKAN's default and datastore databases
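The naming rule above (project name, underscore, volume name) can be turned into a quick check. This is a sketch; expected_volumes and missing_volumes are hypothetical helper names, and only missing_volumes needs access to the Docker daemon.

```shell
#!/bin/sh
# Print the four volume names described above, prefixed with the Compose
# project name (default "docker", the name of the contrib/docker folder).
expected_volumes() {
  prefix=${COMPOSE_PROJECT_NAME:-docker}
  for v in ckan_config ckan_home ckan_storage pg_data; do
    echo "${prefix}_${v}"
  done
}

# Report every expected volume that `docker volume ls` does not list.
missing_volumes() {
  existing=$(docker volume ls --format '{{.Name}}')
  for v in $(expected_volumes); do
    echo "$existing" | grep -qx "$v" || echo "missing volume: $v"
  done
}
```

An empty output from `missing_volumes` means all four volumes exist.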

Docker structure

  • For HEF AgriHUB, instead of relying on the default Docker network, we define our own network structure with the top-level networks key, which contains two separate networks, "frontend" and "backend". This allows more complex topologies as well as custom network drivers and options. Each service then lists the networks it joins under its service-level networks key, whose entries reference names under the top-level networks key. Services that should only be reached internally ("db", "solr" and "redis") are attached only to "backend". Services "datapusher" and "ckan" must be reachable from the front but also talk to the backend services, so they are attached to both "backend" and "frontend". In this way, the backend services are isolated from containers outside the backend network.
  • In addition, services "ckan" and "datapusher" publish ports that can be accessed from the host machine, while the other services publish no ports to the host.
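To confirm that the frontend/backend split works as intended, the containers attached to each network can be listed. Docker Compose prefixes network names with the project name, so the sketch below assumes the default project name docker (giving docker_frontend and docker_backend); network_name and network_members are hypothetical helpers.

```shell
#!/bin/sh
# Compose stores a network declared as "backend" under "<project>_backend".
# The project name defaults to "docker" here (assumption, see above).
network_name() {
  echo "${COMPOSE_PROJECT_NAME:-docker}_$1"
}

# Print the names of all containers attached to the given Compose network.
network_members() {
  docker network inspect -f '{{range .Containers}}{{.Name}} {{end}}' "$(network_name "$1")"
}
```

If the setup matches the description, `network_members backend` should list the db, solr, redis, ckan and datapusher containers, while `network_members frontend` should list only ckan and datapusher (the exact container names depend on the compose file).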

For each service, the build folder is provided for reference.

CKAN container

  • Dockerfile: when HEF AgriHUB is built, Docker Compose goes to the folder ckan for the service "ckan" and first reads the file "Dockerfile". This file consists of the following sections:
    • pick the base image debian:jessie
    • install the required system packages
    • define the CKAN-specific environment paths
    • build a virtual environment and its dependencies [not strictly necessary, as this environment is used only for CKAN]
    • set up CKAN:
      • copy the whole folder into the container
      • install the required dependencies (requirement-setuptools.txt && requirements.txt)
      • install the CKAN source code provided inside the folder ckan (ckan/ckan)
    • provide the entry point
    • install all extensions; the source code of many of them is provided directly in the extensions folder
    • copy basemaps.js into the config path
    • copy production.ini into the config path
    • start CKAN with the CLI command ckan-paster serve and the config file production.ini

CKAN extension

Here is the list of all required extensions and their Python dependencies, as installed in the Dockerfile. Detailed explanations of these extensions are provided on the page Developer Guideline: HEF CKAN (The Comprehensive Knowledge Archive Network).

RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install ckantoolkit
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install ckanapi
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install geoalchemy2
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install git+https://github.com/eawag-rdm/lucparser
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install lxml
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install shapely
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install requests
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-basiccharts
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-composite
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-contact
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-dashboard
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-datarequests
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-datesearch
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-disqus
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-fluent
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-geoview
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-hierarchy-g
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-hierarchy-s
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-mapviews
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-pdfview
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-viewhelpers
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-relation
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-repeating
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-restricted
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-scheming
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-userautoadd

RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install -e git+https://github.com/ckan/ckanext-dcat.git#egg=ckanext-dcat
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install -r /usr/lib/ckan/default/src/ckanext-dcat/requirements.txt

RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install -e git+https://github.com/datopian/ckanext-gdpr.git#egg=ckanext-gdpr

RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-spatial
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install -r /usr/lib/ckan/default/src/ckanext-spatial/pip-requirements.txt

RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install /usr/lib/ckan/default/src/ckanext-harvest
RUN . $CKAN_VENV/bin/activate && $CKAN_VENV/bin/pip install -r /usr/lib/ckan/default/src/ckanext-harvest/pip-requirements.txt
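One way to confirm that the extensions above really are installed in the running container is to compare `pip list` output with the expected names. The function below is a sketch; missing_packages is a hypothetical helper, and it assumes each extension's distribution name equals its folder name (e.g. ckanext-scheming), which should be checked against the extensions' setup.py files.

```shell
#!/bin/sh
# Read `pip list` output on stdin and print every package name given as an
# argument that does not appear in it. Names are compared case-insensitively,
# with "_" treated like "-" (similar to how pip normalises names).
missing_packages() {
  # The first column of `pip list` holds the package name, in both the
  # column format and the older "name (version)" format.
  installed=$(cut -d ' ' -f 1 | tr 'A-Z_' 'a-z-')
  for pkg in "$@"; do
    wanted=$(echo "$pkg" | tr 'A-Z_' 'a-z-')
    echo "$installed" | grep -qxF "$wanted" || echo "$pkg"
  done
}
```

Usage, assuming the container is named ckan and CKAN_VENV is set in the image (the RUN lines above rely on it): `docker exec ckan sh -c '$CKAN_VENV/bin/pip list' | missing_packages ckanext-scheming ckanext-spatial ckanext-harvest ckanext-dcat`. The single quotes make the variable expand inside the container.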


data & backup files

  • HEF - Resources: this folder contains all resources and data uploaded to the catalog (incl. Excel files, PDFs, images, etc.)
  • agrihub.dump: Full backup of HEF AgriHUB main database  
  • datastore.dump: Full backup of HEF AgriHUB datastore database

HEF AgriHUB docker

Running SetupCKANDocker performs the following steps:

  1. Go to the folder contrib/docker, read "docker-compose.yml" and create the services (described in the previous part).
  2. Copy the whole HEP folder into the ckan container.
  3. In the db container:
    1. add the postgis extension
    2. add the spatial reference systems
    3. change the owner of the view geometry_columns to the ckan user
    4. change the owner of the table spatial_ref_sys to the ckan user
  4. Copy agrihub.dump into the ckan container.
  5. Clean the CKAN database with a CKAN CLI command.
  6. Restore agrihub.dump into the freshly installed CKAN/HEF AgriHUB database.
  7. Remove agrihub.dump.
  8. Repeat step 4 for datastore.dump.
  9. Repeat step 6 for datastore.dump.
  10. Remove datastore.dump.
  11. Rebuild the Solr index with a CKAN CLI command.
  12. Give the ckan user access to the folder that stores the uploaded files.
  13. Set the required permissions and grants for the datastore database.
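Steps 3 to 13 can be sketched as a shell script. This is not the actual SetupCKANDocker script, and several details are assumptions: the container names ckan and db (the container_name values in the official CKAN 2.8 docker-compose.yml), the database user ckan, the database names ckan and datastore, the config path /etc/ckan/production.ini, and plain-SQL dump files. Unlike the original, the datastore dump is copied into the db container and restored there with psql. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch of the restore part of SetupCKANDocker (steps 3-13).
# ASSUMPTIONS: containers "ckan" and "db", database user "ckan", databases
# "ckan" and "datastore", plain-SQL dumps, config /etc/ckan/production.ini.
CKAN_INI=${CKAN_INI:-/etc/ckan/production.ini}

# Execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# CKAN 2.8 CLI: paster with the ckan plugin, via the ckan-paster link in the image.
paster_ckan() {
  run docker exec ckan /usr/local/bin/ckan-paster --plugin=ckan "$@" -c "$CKAN_INI"
}

restore_agrihub() {
  # Step 3: PostGIS extension and ownership of its catalog objects.
  for sql in 'CREATE EXTENSION IF NOT EXISTS postgis;' \
             'ALTER VIEW geometry_columns OWNER TO ckan;' \
             'ALTER TABLE spatial_ref_sys OWNER TO ckan;'; do
    run docker exec db psql -U ckan -d ckan -c "$sql"
  done

  # Steps 4-7: copy the main dump, clean the database, load the dump, remove it.
  run docker cp agrihub.dump ckan:/tmp/agrihub.dump
  paster_ckan db clean
  paster_ckan db load /tmp/agrihub.dump
  run docker exec -u root ckan rm -f /tmp/agrihub.dump

  # Steps 8-10: restore the datastore dump (here from inside the db container).
  run docker cp datastore.dump db:/tmp/datastore.dump
  run docker exec db psql -U ckan -d datastore -f /tmp/datastore.dump
  run docker exec db rm -f /tmp/datastore.dump

  # Step 11: rebuild the Solr search index.
  paster_ckan search-index rebuild

  # Step 12: let the ckan user own the upload folder (CKAN_STORAGE_PATH in the image).
  run docker exec -u root ckan sh -c 'chown -R ckan:ckan "$CKAN_STORAGE_PATH"'

  # Step 13: generate the datastore permission SQL and apply it in the db container.
  if [ -n "$DRY_RUN" ]; then
    echo "datastore set-permissions | psql"
  else
    docker exec ckan /usr/local/bin/ckan-paster --plugin=ckan datastore set-permissions -c "$CKAN_INI" \
      | docker exec -i db psql -U ckan
  fi
}
```

Running `DRY_RUN=1 restore_agrihub` first shows the full command sequence, which makes it easier to compare with the real script before anything touches the databases.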


CKAN CLI

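For more about working with the CKAN CLI, see https://docs.ckan.org/en/2.9/maintaining/cli.html. Note that this page describes the ckan command introduced in CKAN 2.9; the CKAN 2.8 image used here provides the older paster-based commands (ckan-paster --plugin=ckan ...), as in the ckan-paster serve call of the Dockerfile.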
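Inside the CKAN 2.8 container, most commands take the form `ckan-paster --plugin=ckan <command> -c <config>`. A small wrapper saves typing; the sketch below assumes the container is named ckan and the config lives at /etc/ckan/production.ini, and ckan_cli as well as the DOCKER and CKAN_INI overrides are hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper: run a CKAN 2.8 paster command inside the "ckan" container.
# Assumes the container name "ckan" and /etc/ckan/production.ini (override with CKAN_INI).
# -it is only added when stdin is a terminal, so the wrapper also works in scripts.
ckan_cli() {
  if [ -t 0 ]; then tty_flags="-it"; else tty_flags="-i"; fi
  ${DOCKER:-docker} exec $tty_flags ckan /usr/local/bin/ckan-paster --plugin=ckan "$@" -c "${CKAN_INI:-/etc/ckan/production.ini}"
}
```

For example, `ckan_cli sysadmin add johndoe` creates a sysadmin and `ckan_cli search-index rebuild` rebuilds the Solr index.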


Apache server as a reverse proxy

At this stage, HEF AgriHUB is running and can be reached via the server's IP address on port 5000 (ip:5000).

However, we want it to run securely under the DNS name "https://agrihub.hef.tum.de" and to limit access to the MWN (Munich Scientific Network). For that we installed the Apache server on the HEF-LRZ server (host machine) with the following configuration file:

<VirtualHost *:80>
    ServerName agrihub.hef.tum.de
    Redirect / https://agrihub.hef.tum.de/
</VirtualHost>

<VirtualHost *:443>

    ServerName agrihub.hef.tum.de
    ServerAdmin admin@hef.tum.de

    SSLEngine on
    SSLProtocol -all +TLSv1.2
    SSLCertificateFile /etc/apache2/ssl_agrihub/ckanhef.pem
    SSLCertificateKeyFile /etc/apache2/ssl_agrihub/ckanhef.key
    SSLCertificateChainFile /etc/apache2/ssl_agrihub/cacerts_DNS.crt
    SSLCipherSuite          ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128:AES256:AES:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK
    SSLHonorCipherOrder     on
    SSLCompression          off

    <Proxy *>
        Order Deny,Allow
        Deny from all
        Allow from 129.187 10.157 10.152 10.162.246 2001:4ca0:2fff::/48 2001:4ca0:2fff:9:0:1::/96 2001:4ca0:2fff:9:0:2::/96
    </Proxy>

    ProxyPass / http://localhost:5000/
    ProxyPassReverse / http://localhost:5000/

    #ProxyPreserveHost On
    #ProxyRequests Off

</VirtualHost>
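For the configuration above to load, the corresponding Apache modules must be enabled; an "Invalid command 'ProxyPass'" error (see the serverfault link at the top of this page) is the typical symptom of a missing mod_proxy. The sketch below assumes a Debian/Ubuntu host with the a2enmod/a2ensite helpers and a site file named agrihub.hef.tum.de.conf. It also enables access_compat, which Apache 2.4 needs for the Order/Deny/Allow directives in the <Proxy> block (the native 2.4 equivalent is Require ip).

```shell
#!/bin/sh
# Sketch for a Debian/Ubuntu host (assumption): enable the modules used by the
# VirtualHosts above, enable the site, and reload Apache only if the config is valid.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

enable_agrihub_site() {
  # ssl: SSLEngine; proxy + proxy_http: ProxyPass to http://; access_compat: Order/Allow/Deny.
  run a2enmod ssl proxy proxy_http access_compat &&
    run a2ensite agrihub.hef.tum.de &&
    run apachectl configtest &&
    run systemctl reload apache2
}
```

Chaining the commands with && means a failing configtest stops the reload, so a broken configuration does not take the running site down.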


To restrict access, we followed this LRZ documentation: https://doku.lrz.de/display/PUBLIC/VPN-Technik

HEF AgriHUB currently runs with the following services, ports and access restrictions:

Service | Port | Accessibility
Apache (/etc/apache2/sites-available/agrihub.hef.tum.de.conf) | 80 and 443 (reverse proxy to 5000) | allowed from 129.187, 10.157, 10.152, 10.162.246, 2001:4ca0:2fff::/48, 2001:4ca0:2fff:9:0:1::/96, 2001:4ca0:2fff:9:0:2::/96
CKAN | 5000 | limited to the local machine
datapusher | 8000 | limited to the local machine
db | 5432 | inside the Docker network only
solr | 8983 | inside the Docker network only
redis | 6379 | inside the Docker network only
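Whether ports 5000 (CKAN) and 8000 (datapusher) really are limited to the local machine can be checked on the host with ss. In Docker Compose such a restriction is typically achieved by publishing the port on the loopback address (for example "127.0.0.1:5000:5000"); whether the HEF compose file does it this way is an assumption. The awk filter below is a sketch (exposed_ports is a hypothetical name) that reads `ss -ltn` output and prints every listener on the given ports that is not bound to 127.0.0.1 or [::1].

```shell
#!/bin/sh
# Read `ss -ltn` output on stdin and print each listening address on one of
# the given ports that is NOT bound to loopback (127.0.0.1 or [::1]).
exposed_ports() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    NR > 1 {
      addr = $4                               # "Local Address:Port" column
      port = addr; sub(/.*:/, "", port)       # text after the last colon
      host = addr; sub(/:[^:]*$/, "", host)   # text before the last colon
      if ((port in want) && host != "127.0.0.1" && host != "[::1]") print addr
    }'
}
```

Usage: `ss -ltn | exposed_ports 5000 8000` should print nothing if both services are reachable only from the host itself.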

In docker-compose.yml, the services are configured to start automatically after every system reboot or Docker restart with the setting:

restart: always

Even so, HEF AgriHUB should be checked every week to make sure it is running as expected.
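Part of the weekly check can be automated by calling CKAN's status_show API action, which returns a JSON document with "success": true when the API works. This is a sketch: it has to run from a machine inside the allowed networks (or on the host against http://localhost:5000 via AGRIHUB_URL), and ckan_status_ok and check_agrihub are hypothetical names. It only confirms that CKAN answers; Solr, the datastore and the datapusher would need separate checks.

```shell
#!/bin/sh
# Return success if the CKAN API JSON response on stdin reports "success": true.
ckan_status_ok() {
  tr -d ' \n' | grep -q '"success":true'
}

# Query status_show through the public URL (override with AGRIHUB_URL).
check_agrihub() {
  url=${AGRIHUB_URL:-https://agrihub.hef.tum.de}
  if curl -fsS --max-time 20 "$url/api/3/action/status_show" | ckan_status_ok; then
    echo "OK: $url"
  else
    echo "FAILED: $url" >&2
    return 1
  fi
}
```

Run from a weekly cron job, for example, a non-zero exit status of check_agrihub can trigger a notification.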


