Haskell - Neural Network Always Produces Same/Similar Outputs for Any Input
I have a problem trying to create a neural network for tic-tac-toe. However, for some reason, training the neural network causes it to produce the same (or nearly the same) output for any given input.
I did take a look at the Artificial neural networks benchmark, but my network implementation is built from neurons with the same activation function for each neuron, i.e. no constant neurons.
To make sure the problem wasn't just due to my choice of training set (1218 board states and moves generated by a genetic algorithm), I tried to train the network to reproduce XOR. The logistic activation function was used. Instead of using the derivative, I multiplied the error by output*(1-output), as some sources suggested that this is equivalent to using the derivative. I can put the Haskell source on hpaste, but it's a little embarrassing to look at. The network has 3 layers: the first layer has 2 inputs and 4 outputs, the second has 4 inputs and 1 output, and the third has 1 output. Increasing to 4 neurons in the second layer didn't help, and neither did increasing to 8 outputs in the first layer.
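The equivalence mentioned above can be checked numerically. The following is a minimal sketch, not the asker's Haskell code: it compares output*(1-output) against a finite-difference estimate of the logistic derivative. The function names are hypothetical and chosen only for this illustration.

```python
import math

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_derivative_from_output(out):
    """Derivative of the logistic function, written in terms of its output."""
    return out * (1.0 - out)

def numeric_derivative(f, z, h=1e-6):
    """Central finite-difference estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-2.0, 0.0, 0.5, 3.0):
    out = logistic(z)
    analytic = logistic_derivative_from_output(out)
    numeric = numeric_derivative(logistic, z)
    print(f"z={z:+.1f}  out*(1-out)={analytic:.6f}  finite difference={numeric:.6f}")
```

The two columns should agree to several decimal places, which suggests that multiplying the error by output*(1-output) is not by itself the source of the problem.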
I then calculated the errors, network output, bias updates, and weight updates by hand based on http://hebb.mit.edu/courses/9.641/2002/lectures/lecture04.pdf to make sure there wasn't an error in those parts of the code (there wasn't, but I will probably do it again just to make sure). Because I am using batch training, I did not multiply by x in equation (4) there. I am adding the weight change, though http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-2.html suggests subtracting it instead.
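Whether the weight change is added or subtracted usually depends on how the error is defined. The sketch below assumes a single logistic neuron with one weight and a squared-error loss. It is not taken from either linked source, and the function names are hypothetical. In the textbook delta rule the weight change also includes the input x feeding that weight, and batch training sums these per-example terms before applying them.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def step_adding(w, x, target, lr):
    """Error defined as (target - output): the weight change is added."""
    output = logistic(w * x)
    delta = (target - output) * output * (1.0 - output)
    return w + lr * delta * x

def step_subtracting(w, x, target, lr):
    """Error defined as (output - target): the gradient is subtracted."""
    output = logistic(w * x)
    gradient = (output - target) * output * (1.0 - output) * x
    return w - lr * gradient

def step_mixed_up(w, x, target, lr):
    """Subtracting while using (target - output): moves away from the target."""
    output = logistic(w * x)
    delta = (target - output) * output * (1.0 - output)
    return w - lr * delta * x

w, x, target, lr = 0.5, 1.0, 1.0, 0.1
before = logistic(w * x)
print("before:          ", before)
print("adding:          ", logistic(step_adding(w, x, target, lr) * x))
print("subtracting:     ", logistic(step_subtracting(w, x, target, lr) * x))
print("mixed convention:", logistic(step_mixed_up(w, x, target, lr) * x))
```

The first two updates are the same step written two ways. The third mixes the conventions and pushes the output away from the target, which is one way a network can drift toward a saturated output.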
The problem persisted, even in this simplified network. For example, these are the results after 500 epochs of batch training and of incremental training.
Input    |Target|Output (batch)      |Output (incremental)
[1.0,1.0]|[0.0] |[0.5003781562785173]|[0.5009731800870864]
[1.0,0.0]|[1.0] |[0.5003740346965251]|[0.5006347214672715]
[0.0,1.0]|[1.0] |[0.5003734471544522]|[0.500589332376345]
[0.0,0.0]|[0.0] |[0.5003674110937019]|[0.500095157458231]
Subtracting instead of adding produces the same problem, except that everything is 0.99-something instead of 0.50-something. 5000 epochs produces the same result, except that the batch-trained network returns 0.5 for each case. (Heck, even 10,000 epochs didn't work for batch training.)
Is there anything in general that could produce this behavior?
Also, I looked at the intermediate errors for incremental training, and although the inputs of the hidden/input layers varied, the error for the output neuron was always +/-0.12. For batch training, the errors were increasing, but extremely slowly, and the errors were all extremely small (x10^-7). Different initial random weights and biases made no difference, either.
Note that this is a school project, so hints/guides would be more helpful. Although reinventing the wheel and making my own network (in a language I don't know well!) was a horrible idea, I felt it would be more appropriate for a school project (so I know what's going on... in theory, at least. There doesn't seem to be a computer science teacher at my school).
Edit: Two layers, an input layer of 2 inputs to 8 outputs and an output layer of 8 inputs to 1 output, produce much the same results: 0.5+/-0.2 (or so) for each training case. I'm also playing around with PyBrain, seeing if any network structure there will work.
Edit 2: I am using a learning rate of 0.1. Sorry for forgetting about that.
Edit 3: PyBrain's "trainUntilConvergence" doesn't get me a fully trained network, either, but 20000 epochs does, with 16 neurons in the hidden layer. 10000 epochs and 4 neurons, not so much, but close. So, in Haskell, with the input layer having 2 inputs & 2 outputs, the hidden layer 2 inputs and 8 outputs, and the output layer 8 inputs and 1 output... I get the same problem with 10000 epochs. And with 20000 epochs.
Edit 4: I ran the network by hand again based on the MIT PDF above, and the values match, so the code should be correct unless I am misunderstanding those equations.
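A hand calculation can be complemented by a numerical gradient check, which compares the backpropagated gradient with a finite-difference estimate of the same quantity. The sketch below is a minimal version under stated assumptions: a 2-input network with one hidden layer of 4 logistic units, a squared-error loss, and hypothetical function names. It is written in Python for brevity and is not the asker's Haskell implementation.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """One hidden layer of logistic units feeding one logistic output."""
    w1, b1, w2, b2 = params
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    out = logistic(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, out

def loss(params, x, target):
    _, out = forward(params, x)
    return 0.5 * (out - target) ** 2

def backprop_w1_gradient(params, x, target):
    """Analytic dLoss/dw1[j][i] obtained with the chain rule."""
    w1, b1, w2, b2 = params
    hidden, out = forward(params, x)
    delta_out = (out - target) * out * (1.0 - out)
    grads = []
    for j, h in enumerate(hidden):
        delta_hidden = delta_out * w2[j] * h * (1.0 - h)
        grads.append([delta_hidden * xi for xi in x])
    return grads

def numeric_w1_gradient(params, x, target, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    w1 = params[0]
    grads = []
    for j in range(len(w1)):
        row = []
        for i in range(len(w1[j])):
            original = w1[j][i]
            w1[j][i] = original + eps
            plus = loss(params, x, target)
            w1[j][i] = original - eps
            minus = loss(params, x, target)
            w1[j][i] = original  # restore the weight before moving on
            row.append((plus - minus) / (2.0 * eps))
        grads.append(row)
    return grads

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n_hidden = 4
params = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    rng.uniform(-1, 1),
)
analytic = backprop_w1_gradient(params, [1.0, 1.0], 0.0)
numeric = numeric_w1_gradient(params, [1.0, 1.0], 0.0)
max_diff = max(abs(a - n)
               for row_a, row_n in zip(analytic, numeric)
               for a, n in zip(row_a, row_n))
print("largest difference:", max_diff)
```

If the two gradients disagree by more than a tiny amount, the backpropagation code and the loss it is supposed to minimize are out of step, even when each formula looks right in isolation.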
Some of my source code is at http://hpaste.org/42453/neural_network__not_working; I'm working on cleaning up my code somewhat and putting it in a GitHub (rather than a private Bitbucket) repository.
All of the relevant source code is now at https://github.com/l33tnerd/hsann.
I've had similar problems, but was able to solve them by changing these:
- Scale down the problem to a manageable size. I first tried too many inputs, with too many hidden layer units. Once I scaled down the problem, I could see if the solution to the smaller problem was working. This also works because when it's scaled down, the times to compute the weights drop significantly, so I can try many different things without waiting.
- Make sure you have enough hidden units. This was a major problem for me. I had about 900 inputs connecting to ~10 units in the hidden layer. This was way too small to converge quickly. But it also became very slow if I added additional units. Scaling down the number of inputs helped a lot.
- Change the activation function and its parameters. I was using tanh at first. I tried other functions: sigmoid, normalized sigmoid, Gaussian, etc. I also found that changing the function parameters to make the functions steeper or shallower affected how quickly the network converged.
- Change the learning algorithm parameters. Try different learning rates (0.01 to 0.9). Also try different momentum parameters, if your algorithm supports it (0.1 to 0.9).
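The last suggestion can be tried on the scaled-down XOR problem with a small sweep. The following is a minimal sketch, assuming a 2-4-1 logistic network trained incrementally on squared error. The names `init`, `train_incremental`, the seed, the hidden size, and the 2000-epoch budget are all hypothetical choices for illustration, and momentum is left out to keep it short.

```python
import math
import random

XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def init(n_hidden, seed):
    """Random weights and biases in [-1, 1]; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }

def forward(net, x):
    hidden = [logistic(w[0] * x[0] + w[1] * x[1] + b)
              for w, b in zip(net["w1"], net["b1"])]
    out = logistic(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, out

def mean_squared_error(net):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in XOR) / len(XOR)

def train_incremental(net, lr, epochs):
    """Per-example gradient descent; error is (target - output), so changes are added."""
    for _ in range(epochs):
        for x, t in XOR:
            hidden, out = forward(net, x)
            delta_out = (t - out) * out * (1.0 - out)
            for j, h in enumerate(hidden):
                # Hidden delta uses the output weight from before this update.
                delta_h = delta_out * net["w2"][j] * h * (1.0 - h)
                net["w2"][j] += lr * delta_out * h
                net["w1"][j][0] += lr * delta_h * x[0]
                net["w1"][j][1] += lr * delta_h * x[1]
                net["b1"][j] += lr * delta_h
            net["b2"] += lr * delta_out
    return net

results = {}
for lr in (0.01, 0.1, 0.5, 0.9):
    net = init(n_hidden=4, seed=0)  # same starting weights for every rate
    before = mean_squared_error(net)
    after = mean_squared_error(train_incremental(net, lr, epochs=2000))
    results[lr] = (before, after)
    print(f"lr={lr:<4}  MSE before={before:.4f}  after={after:.4f}")
```

Starting every run from the same seed keeps the comparison about the learning rate rather than the initial weights. A mean squared error that stays near 0.25 means the outputs are still hovering around 0.5 for every input.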
Hope this helps those who find this thread on Google!