There are several reasons why you might need to fetch files from Kaggle with a script. In my case, I was playing with Theano and Lasagne and wanted to download data directly to an AWS GPU instance.
Apache Storm Spout for Bitcoin transaction processing
I wrote this topology as my final project for the “Real-Time Analytics with Apache Storm” course at Udacity. It has little practical use, because the maximum throughput of the Bitcoin network is only about 7 transactions/second. That limit is unlikely to change soon, at least until the number of transactions in the network approaches it. At such a rate, Apache Storm is overkill: you can analyse Bitcoin transactions with a single script or application, with no need to distribute the computation, unless you perform some really heavy, CPU-intensive calculations. Nonetheless, it can be a good starting point for your own spout or WebSocket client in Java.
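As a rough illustration of where a figure of about 7 transactions/second can come from, here is a back-of-the-envelope sketch. The block size limit, average transaction size, and block interval below are commonly quoted round numbers that I am assuming for the example; they are not taken from the project itself.

```python
# Back-of-the-envelope estimate of Bitcoin's throughput ceiling.
# All three constants are assumptions chosen for illustration.
MAX_BLOCK_SIZE_BYTES = 1_000_000  # assumed 1 MB block size limit
AVG_TX_SIZE_BYTES = 250           # assumed typical transaction size
BLOCK_INTERVAL_SECONDS = 600      # assumed target of one block per 10 minutes


def max_transactions_per_second(block_size, tx_size, block_interval):
    """Return an upper bound on transactions per second."""
    tx_per_block = block_size / tx_size
    return tx_per_block / block_interval


tps = max_transactions_per_second(
    MAX_BLOCK_SIZE_BYTES, AVG_TX_SIZE_BYTES, BLOCK_INTERVAL_SECONDS
)
print(f"{tps:.1f} transactions/second")
```

With these assumed values the estimate comes out just under 7 transactions/second, a rate that a single process can handle comfortably.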
Running Lasagne via Jupyter on an AWS GPU instance
Lasagne is a powerful Python library for building and training neural networks in Theano. However, you don’t get the full benefit of it unless you have a CUDA-capable GPU. Fortunately, we can use Amazon Web Services, which combines convenience with reasonable prices. We only need to create and configure the instance once, and can then start or stop it whenever we need it.