SlapOS is a decentralized Cloud Computing technology that can automate the deployment and configuration of applications in a heterogeneous environment.


Upfront: I am not experienced at all with the buildout process and SlapOS.

 I am trying to create a software release for the Python library NLTK. The actual package is straightforward; however, the library requires a (large) amount of data to be downloaded and installed separately.

 This data is usually installed through an interactive GUI or with a few Python calls, e.g.:

import nltk
packages = ['words', 'webtext']
nltk.download(packages)  # fetch each listed data package

The "raw" data is also available as a direct download here. However, looking at the source code, installing this raw data seems rather involved and complicated.

So my question is whether it is okay to use the Python utility provided by nltk to download the data, or whether I should somehow try to install it via the direct download (one thing to consider is that using the downloader circumvents possible caching and such provided by the more orthodox installation process).



As you have noticed, there are indeed several ways:

  1. Direct download. This is simple from the buildout point of view; I have no idea why it is so complex from the nltk point of view.
  2. Have buildout generate a wrapper script, download-nltk, that downloads what is needed. The script is executed manually.
  3. Same as #2, but configurable through the profile and with the script called automatically (as a service or similar).
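
To make options 2 and 3 more concrete, here is a minimal sketch of what such a download-nltk wrapper could look like. It is only an illustration, not the code of any existing profile: the function names, the --target option and the idea of exposing main() as a buildout-generated script are assumptions. The only NLTK call used is nltk.download(name, download_dir=...), which returns True on success.

```python
import argparse
import sys


def download_packages(packages, target_dir, download=None):
    """Download each NLTK data package into target_dir; return the ids that failed."""
    if download is None:
        # Imported lazily so the helper can be exercised without nltk installed.
        import nltk
        download = nltk.download
    failed = []
    for name in packages:
        # nltk.download returns True on success and False otherwise.
        if not download(name, download_dir=target_dir):
            failed.append(name)
    return failed


def main(argv=None):
    """Hypothetical entry point for a generated download-nltk script."""
    parser = argparse.ArgumentParser(description="Fetch NLTK data packages")
    parser.add_argument("--target", required=True, help="directory for nltk_data")
    parser.add_argument("packages", nargs="+", help="package ids, e.g. words webtext")
    args = parser.parse_args(argv)
    failed = download_packages(args.packages, args.target)
    if failed:
        print("failed: " + ", ".join(failed), file=sys.stderr)
        return 1
    return 0
```

For option 2 one would run the generated script by hand, for example with the arguments --target /some/dir words webtext; for option 3 the profile would pass the same arguments automatically.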

Anyway, I feel that anyone who needs to work with nltk will need a set of downloads that can be quite different from what others need (i.e. it is a matter of configuration). Thus one most likely needs the freedom to download more packages than what we "decide" the default SlapOS profile should have. For this, it is either SlapOS configuration or file system access over ssh to "hack".

Can we see the current profile code?




This is the current profile:

And this is how others use the nltk downloader with buildout:

About direct download versus using the nltk downloader: according to Sebastian, the nltk downloader does more than just downloading; it also does some configuration after the download (but I did not check the code, so I am not sure).

The software release must certainly have a configuration option to decide which packages to download, because software using nltk will depend on the availability of certain packages.

The question which Sebastian and I have is: what is preferred? A) Using a (configurable) direct download, or B) using a (configurable) download by calling nltk-download?

B) seems easier to implement (see link above), but we were not sure whether using a downloader script is against the SlapOS design.
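
As a rough illustration of such an option, the sketch below turns a whitespace-separated option value (the way multi-line values are commonly written in buildout profiles) into a clean list of package ids. The function name and the option format are assumptions made for this example, not part of any existing profile.

```python
def parse_package_option(value):
    """Split an option value into a list of package ids, dropping duplicates."""
    packages = []
    for name in value.split():
        # Keep the first occurrence so the configured order is preserved.
        if name not in packages:
            packages.append(name)
    return packages


# Example: a multi-line option value as it might appear in a profile.
configured = parse_package_option("words\n  webtext\n  words")
```

The resulting list could then be handed to whichever download mechanism is chosen, so that overriding the option is all a user needs to do to get a different data collection.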

Regards, Klaus

I just posted a Merge Request so that you can install NLTK with data and easily override the collection that will be downloaded. Please review it and leave your comments.