There are several reasons why you might need to get files from Kaggle via a script. In my case, I was playing with Theano and Lasagne and wanted to download data directly to an AWS GPU instance.

I wrote this topology as my final project for the “Real-Time Analytics with Apache Storm” course at Udacity. It has little practical value, because the maximum throughput of the Bitcoin network is only about 7 transactions/second. That limit is unlikely to change soon, at least until the number of transactions in the network approaches it. At that rate, Apache Storm is overkill: you can analyse Bitcoin transactions with a single script (application), with no need to distribute the computation, unless you perform some really heavy, CPU-intensive calculations. Nonetheless, it can be a good starting point for your own spout or WebSocket client in Java.
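
The figure of about 7 transactions/second can be reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration: the 1 MB block size limit, the 250-byte average transaction size and the 10-minute block interval are commonly quoted assumptions, not values taken from the topology, and the function name is made up for this example.

```python
# Rough estimate of Bitcoin's maximum throughput.
# All three constants are assumptions, not values taken from the topology.
MAX_BLOCK_SIZE_BYTES = 1_000_000   # assumed 1 MB block size limit
AVG_TX_SIZE_BYTES = 250            # assumed average transaction size
BLOCK_INTERVAL_SECONDS = 600       # assumed target of one block every 10 minutes


def max_transactions_per_second(block_size, tx_size, block_interval):
    """Return an upper bound on transactions per second (hypothetical helper)."""
    # How many average-sized transactions fit into one full block.
    tx_per_block = block_size // tx_size
    # Spread those transactions over the time it takes to mine one block.
    return tx_per_block / block_interval


max_tps = max_transactions_per_second(
    MAX_BLOCK_SIZE_BYTES, AVG_TX_SIZE_BYTES, BLOCK_INTERVAL_SECONDS
)
print(f"{max_tps:.2f} transactions/second")
```

Under these assumptions the bound comes out slightly below 7 transactions/second, a load that a single process can handle comfortably.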

Lasagne is a powerful Python library for building and training neural networks in Theano. However, you don’t get all of its benefits unless you have a CUDA-capable GPU. Fortunately, we can use Amazon Web Services, which combines convenience with reasonable prices. We only need to create and configure an instance once, and can then start and stop it whenever we need it.