NeuralCloud Blog

We have built APIs on a parallel bare-metal cluster that you can spin up in seconds and use for Hadoop and other HPC applications.

Our focus & solutions: Oil & Gas Exploration, 3D Imaging, 3D Analytics, Digital Forensics, Cyber Triage, and Cyber Threat & Malware Analytics.

If you understand the behavior of an ANN (Artificial Neural Network) API application, or you are a software programmer or algorithm developer working on accelerated and multicore simulations with experience in MATLAB & Simulink, CUDA, or OpenCL, you may be interested in running your application in our cloud. Please contact us at: info@neuralcloud.org


Blog Post

GPU Computing: The Revolution!!!

Want to solve a problem more quickly? Parallel processing would be faster, but the learning curve is steep – isn’t it?
Not anymore. With CUDA, you can send C, C++, and Fortran code straight to the GPU, with no assembly language required.
Developers at companies such as Adobe, ANSYS, Autodesk, MathWorks, and Wolfram Research are waking that sleeping giant – the GPU – to do general-purpose scientific and engineering computing across a range of platforms.
Using high-level languages, GPU-accelerated applicatio…

Read more

Blog Post

Find more complex relationships in your data
IBM® SPSS® Neural Networks software offers nonlinear data modeling procedures that enable you to discover more complex relationships in your data. The software lets you set the conditions under which the network learns. You can control the training stopping rules and network architecture, or let the procedure automatically choose the architecture for you.
With SPSS Neural Networks software, you can develop more accurate and effective predictive mo…

Read more


Using Matlab to call Python std's

There is a simple strategy that should work: first, figure out how to call Python from stand-alone C functions, and then use that code within a MEX function. While tedious, at least this is straightforward. Using the not-terribly-complicated PyObject API, we can create Python objects in C, pass them to Python functions, and unpack whatever the Python functions give us back. However, everything goes bad as soon as we try to import numpy in our Python code. We’ll get an error that looks like this:

……../pylib/numpy/core/multiarray.so:
undefined symbol: _Py_ZeroStruct
even though all the required symbols are defined in libpython2.x.
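
To make the first half of the strategy concrete, here is a minimal sketch of a MEX function that embeds Python through the PyObject API. The module name mymodule and its function score are made up for illustration, and error handling is kept to a bare minimum:

/* call_py.c -- hypothetical sketch; compile with mex, linking against libpython2.x */
#include <Python.h>   /* the Python docs ask for this to come before other headers */
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 1 || !mxIsDouble(prhs[0]))
        mexErrMsgTxt("expected one double scalar input");

    if (!Py_IsInitialized())
        Py_Initialize();                                   /* start the embedded interpreter once */

    PyObject *module = PyImport_ImportModule("mymodule");  /* hypothetical module name */
    if (!module)
        mexErrMsgTxt("could not import the Python module");

    PyObject *func = PyObject_GetAttrString(module, "score");  /* hypothetical function */
    if (!func)
        mexErrMsgTxt("mymodule has no attribute 'score'");

    PyObject *arg = PyFloat_FromDouble(mxGetScalar(prhs[0]));      /* MATLAB double -> PyObject */
    PyObject *res = PyObject_CallFunctionObjArgs(func, arg, NULL); /* call mymodule.score(arg)  */
    if (!res)
        mexErrMsgTxt("the Python call failed");

    plhs[0] = mxCreateDoubleScalar(PyFloat_AsDouble(res)); /* PyObject -> MATLAB double */

    Py_DECREF(res); Py_DECREF(arg); Py_DECREF(func); Py_DECREF(module);
}

This works fine for plain Python code; but as soon as the hypothetical mymodule does import numpy, the call dies inside MATLAB with exactly the undefined-symbol error above.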

This problem has been asked about several times on Stack Overflow, with no satisfactory answer. But luckily, after much searching, I stumbled upon https://github.com/pv/pythoncall, whose author found a way to solve it.

Basically, MATLAB loads dynamic libraries in a peculiar way that keeps libpython’s symbols from being globally visible, so numpy’s extension modules cannot resolve them. But if we execute the code

#include <stdio.h>
#include <dlfcn.h>

static int dlopen_hacked = 0;   /* run the preload only once */

/* Re-open libpython with RTLD_GLOBAL so that its symbols become visible
   to extension modules such as numpy's multiarray.so. */
void dlopen_python_hack(void){
    if (!dlopen_hacked){
        printf("Preloading the library for the hack: %s\n", LIBPYTHON_PATH);
        dlopen(LIBPYTHON_PATH, RTLD_NOW | RTLD_GLOBAL);
        dlopen_hacked = 1;
    }
}

where LIBPYTHON_PATH points to libpython2.x.so, then suddenly all the messed-up symbols fix themselves, and we won’t have undefined symbol problems anymore.

Reinforce

Reinforce is one of my favorite algorithms in machine learning. It’s useful for reinforcement learning.

The formal goal of reinforce is to maximize an expectation, \sum_x p_\theta(x)r(x), where r(x) is the reward function and p_\theta(x) is a distribution. To apply reinforce, all we need is to be able to sample from p(x) and to evaluate the reward r(x), which is really nothing.

This is the case because of the following simple bit of math:

\nabla_\theta \sum_x p_\theta(x) r(x) = \sum_x p_\theta(x)\nabla_\theta \log p_\theta(x) r(x)

which follows from the identity \nabla_\theta p_\theta(x) = p_\theta(x)\nabla_\theta \log p_\theta(x), and which shows that to estimate the gradient of the expected reward with respect to \theta, we merely need to sample x from p_\theta(x) and weigh the gradient \nabla_\theta \log p_\theta(x) by the reward r(x).
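
To make this concrete, here is a minimal sketch in C of the estimator for a softmax distribution over K outcomes; the reward table, the learning rate, and the number of steps are made up for illustration.

/* reinforce_sketch.c -- single-sample estimate of the gradient of the expected reward */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define K 4

/* p_theta(x) = exp(theta[x]) / sum_j exp(theta[j]) */
static void softmax(const double *theta, double *p) {
    double z = 0.0;
    for (int k = 0; k < K; k++) { p[k] = exp(theta[k]); z += p[k]; }
    for (int k = 0; k < K; k++) p[k] /= z;
}

/* draw one sample x ~ p_theta */
static int sample(const double *p) {
    double u = (double)rand() / RAND_MAX, c = 0.0;
    for (int k = 0; k < K; k++) { c += p[k]; if (u <= c) return k; }
    return K - 1;
}

int main(void) {
    double theta[K] = {0, 0, 0, 0}, p[K], grad[K];
    const double reward[K] = {0.0, 1.0, 0.2, 0.5};   /* hypothetical r(x) */
    const double lr = 0.1;
    srand(0);

    for (int step = 0; step < 2000; step++) {
        softmax(theta, p);
        int x = sample(p);                 /* sample from p_theta */
        double r = reward[x];              /* evaluate the reward */
        /* grad_theta log p_theta(x) = one_hot(x) - p, so the single-sample
           estimate of the gradient of the expected reward is (one_hot(x) - p) * r */
        for (int k = 0; k < K; k++) grad[k] = ((k == x) - p[k]) * r;
        for (int k = 0; k < K; k++) theta[k] += lr * grad[k];   /* ascent step */
    }

    softmax(theta, p);
    printf("learned probabilities:");
    for (int k = 0; k < K; k++) printf(" %.2f", p[k]);
    printf("\n");
    return 0;
}

Each iteration uses a single sample, so the gradient estimate is noisy but unbiased; averaging over more samples per step reduces the noise.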

Reinforce is so tantalizing because sampling from a distribution is very easy. For example, the distribution p(x) could be the combination of a parametrized control policy and the actual responses of the real world to our actions: x could be the sequence of states and actions chosen by our policy and the environment, so

p(x)=\prod_t p_{\textrm{world}}(x_{t+1}|x_t,a_t)\,p_\theta(a_t|x_t),

and only part of p(x) is parameterized by \theta.

Similarly, r(x) is easily obtained from the environment.
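
Writing this out also makes explicit why the environment’s dynamics never get in the way: the p_{\textrm{world}} factors do not depend on \theta, so

\nabla_\theta \log p(x) = \sum_t \nabla_\theta \log p_\theta(a_t|x_t),

and the gradient estimate only requires differentiating the policy.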

So reinforce is dead easy to apply: for example, it could be applied to a robot’s policy. To sample from our distribution, we’d run the policy, let the robot interact with the world, and collect our reward. We’d get an unbiased estimate of the gradient, and presto: we’d be doing stochastic gradient ascent on the policy’s parameters.

Unfortunately, this simple approach is not so easy to apply in practice. The problem lies in the huge variance of the estimator \nabla_\theta \log p_\theta(x)\,r(x). It is easy to see, intuitively, where this variance comes from. Reinforce obtains its learning signal from the noise in its policy distribution p_\theta. In effect, reinforce makes a large number of random choices through the randomness in its policy distribution, and if those choices end up doing better than average, the parameters are changed so that they become more likely in the future. Similarly, if the random choices end up doing worse than average, the model will try to avoid choosing this specific configuration of actions.

(\sum_x p_\theta(x)\nabla_\theta \log p_\theta(x)\,r(x) = \sum_x p_\theta(x)\nabla_\theta \log p_\theta(x)\,(r(x)-r_\textrm{avg}) because \sum_x p_\theta(x)\nabla_\theta \log p_\theta(x)=0. There is a simple formula for choosing the optimal r_\textrm{avg}.)
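
In code, this baseline is a two-line change to the sketch above; the running average used here (an exponential moving average) is just one simple choice, not the optimal r_\textrm{avg}:

double r_avg = 0.0;                        /* declared once, before the training loop */

/* inside the training loop, after sampling x and reading r: */
r_avg = 0.99 * r_avg + 0.01 * r;           /* running estimate of the average reward */
for (int k = 0; k < K; k++)
    grad[k] = ((k == x) - p[k]) * (r - r_avg);   /* weight by (r - r_avg) instead of r */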

To paraphrase, reinforce adds a small perturbation to the choices and checks whether the total reward has improved. If we’re trying to do something akin to supervised learning with reinforce on a small label set, it won’t do too badly: each action would be a classification decision, and we’d have a decent chance of guessing the correct answer. So we’d be guessing the correct answer quite often, which would supply our neural net with a training signal, and learning would succeed. However, it is completely hopeless to train a system that must make a large number of decisions before receiving a single, tiny reward at the end.

On a more optimistic note, large companies that deploy millions of robots could refine their robots’ policies with large-scale reinforce: during the day, the robots would collect data under the current policy, and during the night, the policy would be updated.
