Rewriting a convolutional network's forward pass in C++ to reproduce Theano's test results

Published: 2025-12-09 16:15:36

Reposted from http://blog.csdn.net/qiaofangjie/article/details/18042407

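My requirement:

Train a neural network with Theano's CNN and save the final, stable weights. Then implement the CNN forward pass in C++, load the Theano weights, and reproduce Theano's test results.

What I ended up with:

1. The forward pass of a convolutional neural network
2. Forward and backward passes for an MLP, which means it can be used for training
Note:
To reproduce Theano's test results, the hidden-layer activation function must be tanh;
otherwise, for MLP training, choose sigmoid as the activation function.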
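Why the choice matters can be seen from the two functions themselves. The sketch below is only an illustration (the helper name `sigmoid` is an assumption, not taken from the C++ code): tanh maps into (-1, 1) and sigmoid into (0, 1), so weights trained with one will not give the same outputs under the other.

```python
import math

def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

x = 0.3
t = math.tanh(x)   # output in (-1, 1)
s = sigmoid(x)     # output in (0, 1)

# The two are related by tanh(x) = 2 * sigmoid(2x) - 1,
# but they are not interchangeable without rescaling.
rescaled = 2.0 * sigmoid(2.0 * x) - 1.0
```

Because the outputs differ for the same input, the activation used at test time has to match the one used when the weights were trained.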

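The results:

The figure below shows Theano's training and test results; the validation error rate is 9.23%.

Below is my C++ program; its validation error rate is also 9.23%, which reproduces Theano's result exactly.

In short, there were two difficulties:

1. How can Theano's weights and test samples be exchanged with C++?

2. During convolution in Theano, how are the feature maps of the previous layer combined and mapped onto each pixel of the current layer?

I took many detours while solving these two problems: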

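To reproduce Theano's test results in C++, the C++ program has to read the weights and test samples that Theano saved. My reasoning went as follows:

1. Theano's weights are NumPy arrays. Exchanging them directly with C++ is hard: the NumPy format is awkward to parse and there is little material about it online.
2. Use Python as an intermediate conversion step. Reading the Theano code later, I found that training samples read into Python do not need to be converted to NumPy arrays; plain Python data is enough. However, files dumped by Python's cPickle carry a lot of extra formatting and are not suitable for exchange with C++.
3. Convert through JSON. Both Python and C++ have JSON interfaces, so both sides can convert to JSON and exchange data that way. But the trained Theano weights are NumPy arrays and must become Python lists before json can write them to a file. The question became: how do you turn a NumPy array into a Python list?
4. After a day of searching for an answer to 3, I finally found the tolist method of NumPy arrays, which converts a NumPy array into a Python list.
5. Now both Python and C++ can use JSON. I studied the jsoncpp library to read the JSON files written by Python. Testing showed that jsoncpp is not suited to reading large files: it easily runs out of memory and is very slow, so I dropped it.
6. Write C++ functions that parse the JSON files directly. When generating training and test samples from pot files, also generate them directly in C++, with no conversion to NumPy arrays.

This resolved difficulty 1: C++ and Theano exchange weights and test samples through JSON, and the JSON files are parsed by hand-written functions.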
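A minimal sketch of this JSON round trip, under some assumptions: the weight values are made up, `dump_layers` and `load_layers` are hypothetical helper names, and plain nested lists stand in for what tolist would return from a NumPy array. The layout, one `[w, b]` pair per layer, follows the writer function shown later in the post.

```python
import json

def dump_layers(layers):
    """Serialize a list of (w, b) pairs as JSON text, one [w, b] entry per layer."""
    return json.dumps([[w, b] for w, b in layers])

def load_layers(text):
    """Parse the JSON text back into a list of (w, b) tuples."""
    return [(w, b) for w, b in json.loads(text)]

# Hypothetical tiny network: a 2x3 weight matrix with 2 biases,
# followed by a 1x2 weight matrix with 1 bias.
layers = [
    ([[0.5, -1.25, 2.0], [0.0, 3.5, -0.75]], [0.125, -0.5]),
    ([[1.0, -2.0]], [0.25]),
]

text = dump_layers(layers)
restored = load_layers(text)
```

The file contains nothing but nested arrays of numbers, which is what makes a small hand-written parser on the C++ side practical.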
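For difficulty 2, look at a typical CNN diagram.

Difficulty 2 in detail:

When Theano goes from S2 to C3, how does it choose which S2 feature maps to combine? Is the selection fixed, or are the maps combined dynamically by some algorithm?
In Theano's pooling step from C3 to S4, with poolsize (2*2), how does every 4 pixels of C3 become one pixel of S4?

After a lot of analysis and cross-checking, I reached these conclusions:

When Theano goes from S2 to C3, it combines all of the S2 feature maps.
In Theano's pooling step from C3 to S4, with poolsize (2*2), the maximum of every 4 pixels of C3 becomes one pixel of S4.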
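The first conclusion can be sketched as follows. This is an illustrative pure-Python version with made-up 3x3 inputs, not the author's implementation; the function name `conv_valid_all_maps` is an assumption. Every output pixel sums the window products over all input maps, and the kernels are applied as stored, without flipping.

```python
def conv_valid_all_maps(inputs, kernels):
    """Sum of 'valid' sliding-window products over ALL input maps.

    inputs:  list of H x W maps (nested lists)
    kernels: one K x K kernel per input map, applied as stored (no flip)
    """
    h, w, k = len(inputs[0]), len(inputs[0][0]), len(kernels[0])
    out = [[0.0] * (w - k + 1) for _ in range(h - k + 1)]
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            for img, ker in zip(inputs, kernels):  # every input map contributes
                for m in range(k):
                    for n in range(k):
                        out[i][j] += img[i + m][j + n] * ker[m][n]
    return out

maps = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
c = conv_valid_all_maps(maps, kernels)
```

With two 3x3 inputs and 2x2 kernels the result is a single 2x2 map, in which each pixel depends on both inputs.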

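With the analysis above, the theory was roughly clear. The next step was to write code based on it. What really took time was debugging: I kept failing to reproduce Theano's test results. More than once I thought it simply could not be reproduced; who knows what Theano does internally. Today the code finally works and I am thrilled, hence this post. Two bugs stood in my way: one came from a gap in theory, where I had not grasped a detail of Theano's convolution; the other was my own carelessness, a wrongly initialized variable. They are:

For the S2-to-C3 convolution, Theano rotates the kernel by 180 degrees before convolving as in the figure below (I was new to this and really had no idea...).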
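The effect of the rotation can be checked on a small example. The sketch below is an illustration with made-up numbers and hypothetical function names: a true convolution, which reads the kernel backwards, gives the same result as a plain sliding-window product with the kernel rotated by 180 degrees first. That is why weights saved in rotated form can be used by a loop that never flips anything.

```python
def rot180(kernel):
    """Rotate a 2-D kernel by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in kernel[::-1]]

def correlate_valid(image, kernel):
    """Slide the kernel over the image without flipping it."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(k) for n in range(k))
             for j in range(cols)] for i in range(rows)]

def convolve_valid(image, kernel):
    """True convolution: the kernel is indexed backwards relative to the image."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + m][j + n] * kernel[k - 1 - m][k - 1 - n]
                 for m in range(k) for n in range(k))
             for j in range(cols)] for i in range(rows)]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 2], [3, 4]]
flipped = convolve_valid(image, kernel)
plain_on_rotated = correlate_valid(image, rot180(kernel))
plain = correlate_valid(image, kernel)
```

Here `flipped` and `plain_on_rotated` agree, while `plain` differs; skipping the rotation therefore changes every output pixel unless the kernel happens to be symmetric.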

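When taking the maximum pixel from C3 to S4, I assumed all pixel values were positive and initialized the variable to 0, so the maximum came out wrong (this bug took the longest to find; a painful lesson...).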
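The initialization bug is easy to reproduce in isolation. The sketch below uses made-up values and hypothetical helper names; it also applies the bias and tanh after pooling, in the order the C++ pooling layer uses further down in this post.

```python
import math

def pool_window_max(fmap, top, left, p, start):
    """Maximum of the p x p window at (top, left), scanning from `start`."""
    best = start
    for m in range(p):
        for n in range(p):
            if fmap[top + m][left + n] > best:
                best = fmap[top + m][left + n]
    return best

def max_pool_tanh(fmap, p, bias, start=-math.inf):
    """Non-overlapping p x p max pooling, then add the bias and apply tanh."""
    rows, cols = len(fmap) // p, len(fmap[0]) // p
    return [[math.tanh(pool_window_max(fmap, i * p, j * p, p, start) + bias)
             for j in range(cols)] for i in range(rows)]

fmap = [[-3.0, -1.5], [-2.0, -4.0]]                 # every value is negative
good = pool_window_max(fmap, 0, 0, 2, -math.inf)    # correct starting value
bad = pool_window_max(fmap, 0, 0, 2, 0.0)           # the buggy starting value
pooled = max_pool_tanh(fmap, 2, bias=0.5)
```

With an all-negative window the search started at 0 returns 0 instead of -1.5, and that wrong value then propagates through every later layer.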

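The function that writes out Theano's weights. Note that it saves the kernels after rotating them by 180 degrees; if a weight matrix is two-dimensional, its rows and columns are swapped (to match the weight layout used in C++).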

import json
import numpy

def getDataJson(layers):
    data = []
    i = 0
    for layer in layers:
        w, b = layer.params
        # print '..layer is', i
        w, b = w.get_value(), b.get_value()
        wshape = w.shape
        # print '...the shape of w is', wshape
        if len(wshape) == 2:
            # fully connected layer: swap rows and columns
            w = w.transpose()
        else:
            # convolutional layer: rotate every kernel by 180 degrees
            for k in xrange(wshape[0]):
                for j in xrange(wshape[1]):
                    # copy so the rotated view does not alias w[k][j]
                    w[k][j] = numpy.rot90(w[k][j], 2).copy()
            # one flat row per output feature map
            w = w.reshape((wshape[0], numpy.prod(wshape[1:])))
        w = w.tolist()
        b = b.tolist()
        data.append([w, b])
        i += 1
    return data

def writefile(data, name = '../../tmp/src/data/theanocnn.json'):
    print ('writefile is ' + name)
    f = open(name, "wb")
    json.dump(data,f)
    f.close()
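The reshape at the end of getDataJson turns each layer's kernels into one flat row per output feature map, and the C++ convolution later indexes into that row as k * width * width + m * width + n. A small sketch of that layout, using nested lists in place of NumPy arrays (`flatten_kernels` and `flat_index` are hypothetical helpers, not part of the original code):

```python
def flatten_kernels(w):
    """Flatten w[f][c][m][n] into one row per output feature map f."""
    return [[v for chan in fmap for row in chan for v in row] for fmap in w]

def flat_index(c, m, n, k):
    """Position of kernel element (c, m, n) inside a flattened row."""
    return c * k * k + m * k + n

# Hypothetical tensor: 1 output map, 2 input maps, 2x2 kernels.
w = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]]
rows = flatten_kernels(w)
```

Element (c, m, n) of output map f is then found at `rows[f][flat_index(c, m, n, k)]`, which is the same row-major order a NumPy reshape produces.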

Reading the weights back into Theano:

def readfile(layers, nkerns, name = '../../tmp/src/data/theanocnn.json'):
    # Load the saved weights
    print ('readfile is ' + name)
    f = open(name, 'rb')
    data = json.load(f)
    f.close()
    readwb(data, layers, nkerns)

def readwb(data, layers, nkerns):
    i = 0
    kernSize = len(nkerns)
    inputnum = 1
    for layer in layers:
        w, b = data[i]
        w = numpy.array(w, dtype='float32')
        b = numpy.array(b, dtype='float32')
        # print '..layer is', i
        # print w.shape
        if i >= kernSize:
            # fully connected layer: undo the row/column swap
            w = w.transpose()
        else:
            # convolutional layer: restore the 4-D shape and undo the rotation
            w = w.reshape((nkerns[i], inputnum, 5, 5))
            for k in xrange(nkerns[i]):
                for j in xrange(inputnum):
                    c = w[k][j]
                    # copy so the rotated view does not alias w[k][j]
                    w[k][j] = numpy.rot90(c, 2).copy()
            inputnum = nkerns[i]
        # print '..readwb, transpose and rot180'
        # print w.shape
        layer.W.set_value(w, borrow=True)
        layer.b.set_value(b, borrow=True)
        i += 1

The test samples are generated from the MNIST handwritten digit database. The core code is:

import cPickle
import json

def mnist2json_small(cnnName = 'mnist_small.json', validNumber = 10):
    dataset = '../../data/mnist.pkl'
    print '... loading data', dataset
    # Load the dataset
    f = open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    #print test_set
    f.close()

    def np2listSmall(train_set, number):
        trains, labels = train_set
        trainfile = []
        # comment out the next line to generate only `number` validation samples
        number = len(labels)
        for one in trains[:number]:
            one = one.tolist()
            trainfile.append(one)
        labelfile = labels[:number].tolist()
        datafile = [trainfile, labelfile]
        return datafile

    smallData = valid_set
    print len(smallData)
    valid, validlabel = np2listSmall(smallData, validNumber)
    datafile = [valid, validlabel]
    basedir = '../../tmp/src/data/'
    # basedir = './'
    json.dump(datafile, open(basedir + cnnName, 'wb'))
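The file written above has the layout `[samples, labels]`. As a rough sketch of how a reader can consume it and score predictions (the two samples, the predictions, and the helper name `error_rate` are all made up for illustration):

```python
import json

def error_rate(predictions, labels):
    """Fraction of predictions that differ from the labels."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)

# Hypothetical two-sample file in the same [samples, labels] layout.
text = json.dumps([[[0.0, 0.5, 1.0, 0.25], [1.0, 0.0, 0.0, 0.75]], [3, 7]])
samples, labels = json.loads(text)

rate = error_rate([3, 1], labels)   # one of the two predictions is wrong
```

The C++ side computes the same ratio in CalErrorRate: the number of mismatches divided by the number of validation samples.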

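What I learned:

When facing a hard task, break it down step by step and solve the pieces one at a time.

While solving a problem, if one approach is a dead end, look for another right away, just like doing math proofs back in school.

Keep a positive attitude, do not give up easily, and do your best to finish the task.

When debugging, first build a reasonably comprehensive set of test cases; that lets you locate bugs quickly.

My requirements and the difficulties I ran into are now mostly described. If other small problems remain, I am sure you can solve them in far less time than it took me. Now for the code.

If you do not want to set up a project yourself, here is the C++ code for VS2008; generate the weights with Theano as described and you can load them and run.

C++ code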

main.cpp


#include <iostream>

#include "mlp.h"

#include "util.h"

#include "testinherit.h"

#include "neuralNetwork.h"

using namespace std;

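/************************************************************************/
/* This program implements
1. the forward pass of a convolutional neural network
2. forward and backward passes for an MLP, so it can be used for training
Note:
to reproduce Theano's test results, the hidden-layer activation must be tanh;
otherwise, for MLP training, choose sigmoid as the activation function
*/
/************************************************************************/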

int main()

{

cout << "****cnn****" << endl;

TestCnnTheano(28 * 28, 10);

// TestMlpMnist trains the MLP on the training samples and tests it

//TestMlpMnist(28 * 28, 500, 10);

return 0;

}

neuralNetwork.h


#ifndef NEURALNETWORK_H

#define NEURALNETWORK_H

#include "mlp.h"

#include "cnn.h"

#include <vector>

using std::vector;

/************************************************************************/

/* A convolutional neural network */

/************************************************************************/

class NeuralNetWork

{

public:

NeuralNetWork(int iInput, int iOut);

~NeuralNetWork();

void Predict(double** in_data, int n);

double CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel);

void Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb);

void SetTrainNum(int iNum);

int Predict(double *pInputData);

// void Forward_propagation(double** ppdata, int n);

double* Forward_propagation(double *);

private:

int m_iSampleNum; // number of samples

int m_iInput; // input dimension

int m_iOut; // output dimension

vector<CnnLayer *> vecCnns;

Mlp *m_pMlp;

};

void TestCnnTheano(const int iInput, const int iOut);

#endif

neuralNetwork.cpp


#include "neuralNetwork.h"

#include <iostream>

#include "util.h"

#include <iomanip>

using namespace std;

NeuralNetWork::NeuralNetWork(int iInput, int iOut):m_iSampleNum(0), m_iInput(iInput), m_iOut(iOut), m_pMlp(NULL)

{

int iFeatureMapNumber = 20, iPoolWidth = 2, iInputImageWidth = 28, iKernelWidth = 5, iInputImageNumber = 1;

CnnLayer *pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);

vecCnns.push_back(pCnnLayer);

iInputImageNumber = 20;

iInputImageWidth = 12;

iFeatureMapNumber = 50;

pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);

vecCnns.push_back(pCnnLayer);

const int ihiddenSize = 1;

int phidden[ihiddenSize] = {500};

// construct LogisticRegression

m_pMlp = new Mlp(m_iSampleNum, iFeatureMapNumber * 4 * 4, m_iOut, ihiddenSize, phidden);

}

NeuralNetWork::~NeuralNetWork()

{

for (vector<CnnLayer*>::iterator it = vecCnns.begin(); it != vecCnns.end(); ++it)

{

delete *it;

}

delete m_pMlp;

}

void NeuralNetWork::SetTrainNum(int iNum)

{

m_iSampleNum = iNum;

for (size_t i = 0; i < vecCnns.size(); ++i)

{

vecCnns[i]->SetTrainNum(iNum);

}

m_pMlp->SetTrainNum(iNum);

}

int NeuralNetWork::Predict(double *pdInputdata)

{

double *pdPredictData = NULL;

pdPredictData = Forward_propagation(pdInputdata);

int iResult = -1;

iResult = m_pMlp->m_pLogisticLayer->Predict(pdPredictData);

return iResult;

}

double* NeuralNetWork::Forward_propagation(double *pdInputData)

{

double *pdPredictData = pdInputData;

vector<CnnLayer*>::iterator it;

CnnLayer *pCnnLayer;

for (it = vecCnns.begin(); it != vecCnns.end(); ++it)

{

pCnnLayer = *it;

pCnnLayer->Forward_propagation(pdPredictData);

pdPredictData = pCnnLayer->GetOutputData();

}

//pCnnLayer now points to the last convolutional layer and pdPredictData holds its output

//pass that output through the hidden layers of the MLP

pdPredictData = m_pMlp->Forward_propagation(pdPredictData);

return pdPredictData;

}

void NeuralNetWork::Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb)

{

for (size_t i = 0; i < vecCnns.size(); ++i)

{

vecCnns[i]->Setwb(vvAllw[i], vvAllb[i]);

}

size_t iLayerNum = vvAllw.size();

//declared outside the loop so that it really advances from one hidden layer to the next

int iHiddenIndex = 0;

for (size_t i = vecCnns.size(); i < iLayerNum - 1; ++i)

{

m_pMlp->m_ppHiddenLayer[iHiddenIndex]->Setwb(vvAllw[i], vvAllb[i]);

++iHiddenIndex;

}

m_pMlp->m_pLogisticLayer->Setwb(vvAllw[iLayerNum - 1], vvAllb[iLayerNum - 1]);

}

double NeuralNetWork::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)

{

cout << "Predict------------" << endl;

int iErrorNumber = 0, iValidNumber = vecValidlabel.size();

//iValidNumber = 1;

for (int i = 0; i < iValidNumber; ++i)

{

int iResult = Predict(vecvalid[i]);

//cout << i << ",valid is " << iResult << " label is " << vecValidlabel[i] << endl;

if (iResult != vecValidlabel[i])

{

++iErrorNumber;

}

}

cout << "the num of error is " << iErrorNumber << endl;

double dErrorRate = (double)iErrorNumber / iValidNumber;

cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;

return dErrorRate;

}

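/************************************************************************/
/* The test samples come from MNIST. This CNN has the same structure as the
   one in the Theano tutorial: the input is a 28*28 image, followed by 2
   convolutional layers (convolution + pooling) with 20 and 50 feature maps,
   then a fully connected layer (500 neurons) and finally an output layer
   with 10 neurons. */
/************************************************************************/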

void TestCnnTheano(const int iInput, const int iOut)
{
    // build the convolutional neural network
    NeuralNetWork neural(iInput, iOut);
    // holds the Theano weights
    vector< vector<double*> > vvAllw;
    vector< vector<double> > vvAllb;
    // holds the test samples and their labels
    vector<double*> vecValid;
    vector<WORD> vecLabel;
    // files containing the Theano weights and the test samples
    const char *szTheanoWeigh = "../../data/theanocnn.json", *szTheanoTest = "../../data/mnist_validall.json";
    // second dimension (row width) of each layer's weights, needed to read the json file
    vector<int> vecSecondDimOfWeigh;
    vecSecondDimOfWeigh.push_back(5 * 5);
    vecSecondDimOfWeigh.push_back(20 * 5 * 5);
    vecSecondDimOfWeigh.push_back(50 * 4 * 4);
    vecSecondDimOfWeigh.push_back(500);
    cout << "loadwb ---------" << endl;
    // read the weights
    LoadWeighFromJson(vvAllw, vvAllb, szTheanoWeigh, vecSecondDimOfWeigh);
    // install the weights in the cnn
    neural.Setwb(vvAllw, vvAllb);
    // read the test file
    LoadTestSampleFromJson(vecValid, vecLabel, szTheanoTest, iInput);
    // set the total number of test samples
    int iVaildNum = vecValid.size();
    neural.SetTrainNum(iVaildNum);
    // run the forward pass over all samples and print the error rate
    neural.CalErrorRate(vecValid, vecLabel);
    // free the memory allocated for the test samples
    for (vector<double*>::iterator cit = vecValid.begin(); cit != vecValid.end(); ++cit)
    {
        delete [](*cit);
    }
}
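The layer sizes hard-coded in neuralNetwork.cpp (input widths 28 and 12, `iFeatureMapNumber * 4 * 4` inputs to the MLP, and the row widths pushed into vecSecondDimOfWeigh) follow from simple arithmetic. The sketch below re-derives them; `conv_pool_width` is a hypothetical helper and the calculation assumes a 'valid' convolution followed by non-overlapping pooling, as in the code above.

```python
def conv_pool_width(width, kernel, pool):
    """Output width after a 'valid' convolution and non-overlapping pooling."""
    return (width - kernel + 1) // pool

w1 = conv_pool_width(28, 5, 2)   # first conv + pool layer
w2 = conv_pool_width(w1, 5, 2)   # second conv + pool layer
mlp_inputs = 50 * w2 * w2        # 50 feature maps feed the hidden layer

# Row width of each layer's weight matrix, in file order:
# conv1, conv2, hidden layer, output layer.
row_widths = [1 * 5 * 5, 20 * 5 * 5, mlp_inputs, 500]
```

If the kernel width, pool width, or number of feature maps changes, these derived numbers have to change with them on both the Theano and the C++ side.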

cnn.h


#ifndef CNN_H

#define CNN_H

#include "featuremap.h"

#include "poollayer.h"

#include <vector>

using std::vector;

typedef unsigned short WORD;

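/**
 * This convolution layer mimics Theano's test-time computation.
 * When the input consists of num feature maps, this layer is assumed to have featureNum feature maps.
 * Each pixel of this layer combines all num feature maps of the previous layer, without a bias.
 * The result goes to the pooling layer, which takes the maximum within each poolsize window
 * and then adds the bias; there is one bias value per feature map.
 */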

class CnnLayer

{

public:

CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,

int iKernelWidth, int iPoolWidth);

~CnnLayer();

void Forward_propagation(double *pdInputData);

void Back_propagation(double* , double* , double );

void Train(double *x, WORD y, double dLr);

int Predict(double *);

void Setwb(vector<double*> &vpdw, vector<double> &vdb);

void SetInputAllData(double **ppInputAllData, int iInputNum);

void SetTrainNum(int iSampleNumber);

void PrintOutputData();

double* GetOutputData();

private:

int m_iSampleNum;

FeatureMap *m_pFeatureMap;

PoolLayer *m_pPoolLayer;

//values needed for back-propagation

double **m_ppdDelta;

double *m_pdInputData;

double *m_pdOutputData;

};

void TestCnn();

#endif // CNN_H

cnn.cpp


#include "cnn.h"

#include "util.h"

#include <cassert>

CnnLayer::CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,

int iKernelWidth, int iPoolWidth):

m_iSampleNum(iSampleNum), m_pdInputData(NULL), m_pdOutputData(NULL)

{

m_pFeatureMap = new FeatureMap(iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth);

int iFeatureMapWidth = iInputImageWidth - iKernelWidth + 1;

m_pPoolLayer = new PoolLayer(iFeatureMapNumber, iPoolWidth, iFeatureMapWidth);

}

CnnLayer::~CnnLayer()

{

delete m_pFeatureMap;

delete m_pPoolLayer;

}

void CnnLayer::Forward_propagation(double *pdInputData)

{

m_pFeatureMap->Convolute(pdInputData);

m_pPoolLayer->Convolute(m_pFeatureMap->GetFeatureMapValue());

m_pdOutputData = m_pPoolLayer->GetOutputData();

/************************************************************************/

/* Debug output for each stage of the convolution; remove once it works */

/************************************************************************/

/*m_pFeatureMap->PrintOutputData();

m_pPoolLayer->PrintOutputData();*/

}

void CnnLayer::SetInputAllData(double **ppInputAllData, int iInputNum)

{

}

double* CnnLayer::GetOutputData()

{

assert(NULL != m_pdOutputData);

return m_pdOutputData;

}

void CnnLayer::Setwb(vector<double*> &vpdw, vector<double> &vdb)

{

m_pFeatureMap->SetWeigh(vpdw);

m_pPoolLayer->SetBias(vdb);

}

void CnnLayer::SetTrainNum( int iSampleNumber )

{

m_iSampleNum = iSampleNumber;

}

void CnnLayer::PrintOutputData()

{

m_pFeatureMap->PrintOutputData();

m_pPoolLayer->PrintOutputData();

}

void TestCnn()
{
    const int iFeatureMapNumber = 2, iPoolWidth = 2, iInputImageWidth = 8, iKernelWidth = 3, iInputImageNumber = 2;
    double *pdImage = new double[iInputImageWidth * iInputImageWidth * iInputImageNumber];
    double arrInput[iInputImageNumber][iInputImageWidth * iInputImageWidth];
    MakeCnnSample(arrInput, pdImage, iInputImageWidth, iInputImageNumber);
    // Both kernels are heap-allocated: Setwb hands them to FeatureMap, which
    // takes ownership and releases them with delete[] in its destructor.
    double *pdKernel = new double[3 * 3 * iInputImageNumber];
    double *arrKernel = new double[3 * 3 * iInputImageNumber];
    MakeCnnWeigh(pdKernel, iInputImageNumber) ;
    CnnLayer cnn(3, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
    vector <double*> vecWeigh;
    vector <double> vecBias;
    for (int i = 0; i < iFeatureMapNumber; ++i)
    {
        vecBias.push_back(1.0);
    }
    vecWeigh.push_back(pdKernel);
    for (int i = 0; i < 3 * 3 * iInputImageNumber; ++i)
    {
        arrKernel[i] = i;
    }
    vecWeigh.push_back(arrKernel);
    cnn.Setwb(vecWeigh, vecBias);
    cnn.Forward_propagation(pdImage);
    cnn.PrintOutputData();
    // the kernels are freed by cnn; only the image buffer is deleted here
    delete []pdImage;
}

featuremap.h


#ifndef FEATUREMAP_H

#define FEATUREMAP_H

#include <cassert>

#include <vector>

using std::vector;

class FeatureMap

{

public:

FeatureMap(int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber, int iKernelWidth);

~FeatureMap();

void Forward_propagation(double* );

void Back_propagation(double* , double* , double );

void Convolute(double *pdInputData);

int GetFeatureMapSize()

{

return m_iFeatureMapSize;

}

int GetFeatureMapWidth()

{

return m_iFeatureMapWidth;

}

double* GetFeatureMapValue()

{

assert(m_pdOutputValue != NULL);

return m_pdOutputValue;

}

void SetWeigh(const vector<double *> &vecWeigh);

void PrintOutputData();

double **m_ppdWeigh;

double *m_pdBias;

private:

int m_iInputImageNumber;

int m_iInputImageWidth;

int m_iInputImageSize;

int m_iFeatureMapNumber;

int m_iFeatureMapWidth;

int m_iFeatureMapSize;

int m_iKernelWidth;

// double m_dBias;

double *m_pdOutputValue;

};

#endif // FEATUREMAP_H

featuremap.cpp


#include "featuremap.h"

#include "util.h"

#include <cassert>

#include <cstdlib>

#include <ctime>

#include <iostream>

using std::cout;

using std::endl;

FeatureMap::FeatureMap(int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber, int iKernelWidth):

m_iInputImageNumber(iInputImageNumber),

m_iInputImageWidth(iInputImageWidth),

m_iFeatureMapNumber(iFeatureMapNumber),

m_iKernelWidth(iKernelWidth)

{

m_iFeatureMapWidth = m_iInputImageWidth - m_iKernelWidth + 1;

m_iInputImageSize = m_iInputImageWidth * m_iInputImageWidth;

m_iFeatureMapSize = m_iFeatureMapWidth * m_iFeatureMapWidth;

int iKernelSize;

iKernelSize = m_iKernelWidth * m_iKernelWidth;

double dbase = 1.0 / m_iInputImageSize;

srand((unsigned)time(NULL));

m_ppdWeigh = new double*[m_iFeatureMapNumber];

m_pdBias = new double[m_iFeatureMapNumber];

for (int i = 0; i < m_iFeatureMapNumber; ++i)

{

m_ppdWeigh[i] = new double[m_iInputImageNumber * iKernelSize];

for (int j = 0; j < m_iInputImageNumber * iKernelSize; ++j)

{

m_ppdWeigh[i][j] = uniform(-dbase, dbase);

}

//m_pdBias[i] = uniform(-dbase, dbase);

//Theano's convolution layer does not seem to use the bias; it is applied in the pooling layer

m_pdBias[i] = 0;

}

m_pdOutputValue = new double[m_iFeatureMapNumber * m_iFeatureMapSize];

// m_dBias = uniform(-dbase, dbase);

}

FeatureMap::~FeatureMap()

{

delete []m_pdOutputValue;

delete []m_pdBias;

for (int i = 0; i < m_iFeatureMapNumber; ++i)

{

delete []m_ppdWeigh[i];

}

delete []m_ppdWeigh;

}

void FeatureMap::SetWeigh(const vector<double *> &vecWeigh)

{

assert(vecWeigh.size() == (DWORD)m_iFeatureMapNumber);

for (int i = 0; i < m_iFeatureMapNumber; ++i)

{

delete []m_ppdWeigh[i];

m_ppdWeigh[i] = vecWeigh[i];

}

}

//Convolution

//pdInputData: a one-dimensional vector holding several input images

void FeatureMap::Convolute(double *pdInputData)

{

for (int iMapIndex = 0; iMapIndex < m_iFeatureMapNumber; ++iMapIndex)

{

double dBias = m_pdBias[iMapIndex];

//for each feature map

for (int i = 0; i < m_iFeatureMapWidth; ++i)

{

for (int j = 0; j < m_iFeatureMapWidth; ++j)

{

double dSum = 0.0;

int iInputIndex, iKernelIndex, iInputIndexStart, iKernelStart, iOutIndex;

//index into the output vector

iOutIndex = iMapIndex * m_iFeatureMapSize + i * m_iFeatureMapWidth + j;

//accumulate over every input image

for (int k = 0; k < m_iInputImageNumber; ++k)

{

//start of the input window that lines up with the kernel

//iInputIndexStart = k * m_iInputImageSize + j * m_iInputImageWidth + i;

iInputIndexStart = k * m_iInputImageSize + i * m_iInputImageWidth + j;

//start of the kernel for input image k

iKernelStart = k * m_iKernelWidth * m_iKernelWidth;

for (int m = 0; m < m_iKernelWidth; ++m)

{

for (int n = 0; n < m_iKernelWidth; ++n)

{

//iKernelIndex = iKernelStart + n * m_iKernelWidth + m;

iKernelIndex = iKernelStart + m * m_iKernelWidth + n;

//row-major offset of pixel (i + m, j + n) in input image k

iInputIndex = iInputIndexStart + m * m_iInputImageWidth + n;

dSum += pdInputData[iInputIndex] * m_ppdWeigh[iMapIndex][iKernelIndex];

}//end n

}//end m

}//end k

//add the bias (disabled: the bias is applied in the pooling layer)

//dSum += dBias;

m_pdOutputValue[iOutIndex] = dSum;

}//end j

}//end i

}//end iMapIndex

}

void FeatureMap::PrintOutputData()

{

for (int i = 0; i < m_iFeatureMapNumber; ++i)

{

cout << "featuremap " << i <<endl;

for (int m = 0; m < m_iFeatureMapWidth; ++m)

{

for (int n = 0; n < m_iFeatureMapWidth; ++n)

{

cout << m_pdOutputValue[i * m_iFeatureMapSize +m * m_iFeatureMapWidth +n] << ' ';

}

cout << endl;

}

cout <<endl;

}

}

poollayer.h


#ifndef POOLLAYER_H

#define POOLLAYER_H

#include <vector>

using std::vector;

class PoolLayer

{

public:

PoolLayer(int iOutImageNumber, int iPoolWidth, int iFeatureMapWidth);

~PoolLayer();

void Convolute(double *pdInputData);

void SetBias(const vector<double> &vecBias);

double* GetOutputData();

void PrintOutputData();

private:

int m_iOutImageNumber;

int m_iPoolWidth;

int m_iFeatureMapWidth;

int m_iPoolSize;

int m_iOutImageEdge;

int m_iOutImageSize;

double *m_pdOutData;

double *m_pdBias;

};

#endif // POOLLAYER_H

poollayer.cpp


#include "poollayer.h"

#include "util.h"

#include <cassert>

#include <climits>

#include <iostream>

using std::cout;

using std::endl;

PoolLayer::PoolLayer(int iOutImageNumber, int iPoolWidth, int iFeatureMapWidth):

m_iOutImageNumber(iOutImageNumber),

m_iPoolWidth(iPoolWidth),

m_iFeatureMapWidth(iFeatureMapWidth)

{

m_iPoolSize = m_iPoolWidth * m_iPoolWidth;

m_iOutImageEdge = m_iFeatureMapWidth / m_iPoolWidth;

m_iOutImageSize = m_iOutImageEdge * m_iOutImageEdge;

m_pdOutData = new double[m_iOutImageNumber * m_iOutImageSize];

m_pdBias = new double[m_iOutImageNumber];

/*for (int i = 0; i < m_iOutImageNumber; ++i)

{

m_pdBias[i] = 1;

}*/

}

PoolLayer::~PoolLayer()

{

delete []m_pdOutData;

delete []m_pdBias;

}

void PoolLayer::Convolute(double *pdInputData)

{

int iFeatureMapSize = m_iFeatureMapWidth * m_iFeatureMapWidth;

for (int iOutImageIndex = 0; iOutImageIndex < m_iOutImageNumber; ++iOutImageIndex)

{

double dBias = m_pdBias[iOutImageIndex];

for (int i = 0; i < m_iOutImageEdge; ++i)

{

for (int j = 0; j < m_iOutImageEdge; ++j)

{

double dValue = 0.0;

int iInputIndex, iInputIndexStart, iOutIndex;

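/************************************************************************/

/* This was the worst bug: dMaxPixel used to start at 0 before the search for the maximum.

** Pixel values can be negative, so everything computed afterwards was wrong, and it was very hard to track down. */

/************************************************************************/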

double dMaxPixel = INT_MIN ;

iOutIndex = iOutImageIndex * m_iOutImageSize + i * m_iOutImageEdge + j;

iInputIndexStart = iOutImageIndex * iFeatureMapSize + (i * m_iFeatureMapWidth + j) * m_iPoolWidth;

for (int m = 0; m < m_iPoolWidth; ++m)

{

for (int n = 0; n < m_iPoolWidth; ++n)

{

// int iPoolIndex = m * m_iPoolWidth + n;

//row-major offset of pixel (i * m_iPoolWidth + m, j * m_iPoolWidth + n) in this feature map

iInputIndex = iInputIndexStart + m * m_iFeatureMapWidth + n;

if (pdInputData[iInputIndex] > dMaxPixel)

{

dMaxPixel = pdInputData[iInputIndex];

}

}//end n

}//end m

dValue = dMaxPixel + dBias;

assert(iOutIndex < m_iOutImageNumber * m_iOutImageSize);

//m_pdOutData[iOutIndex] = (dMaxPixel);

m_pdOutData[iOutIndex] = mytanh(dValue);

}//end j

}//end i

}//end iOutImageIndex

}

void PoolLayer::SetBias(const vector<double> &vecBias)

{

assert(vecBias.size() == (DWORD)m_iOutImageNumber);

for (int i = 0; i < m_iOutImageNumber; ++i)

{

m_pdBias[i] = vecBias[i];

}

}

double* PoolLayer::GetOutputData()

{

assert(NULL != m_pdOutData);

return m_pdOutData;

}

void PoolLayer::PrintOutputData()

{

for (int i = 0; i < m_iOutImageNumber; ++i)

{

cout << "pool image " << i <<endl;

for (int m = 0; m < m_iOutImageEdge; ++m)

{

for (int n = 0; n < m_iOutImageEdge; ++n)

{

cout << m_pdOutData[i * m_iOutImageSize + m * m_iOutImageEdge + n] << ' ';

}

cout << endl;

}

cout <<endl;

}

}

mlp.h


#ifndef MLP_H

#define MLP_H

#include "hiddenLayer.h"

#include "logisticRegression.h"

class Mlp

{

public:

Mlp(int n, int n_i, int n_o, int nhl, int *hls);

~Mlp();

// void Train(double** in_data, double** in_label, double dLr, int epochs);

void Predict(double** in_data, int n);

void Train(double *x, WORD y, double dLr);

void TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr);

double CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel);

void Writewb(const char *szName);

void Readwb(const char *szName);

void Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb);

void SetTrainNum(int iNum);

int Predict(double *pInputData);

// void Forward_propagation(double** ppdata, int n);

double* Forward_propagation(double *);

int* GetHiddenSize();

int GetHiddenNumber();

double *GetHiddenOutputData();

HiddenLayer **m_ppHiddenLayer;

LogisticRegression *m_pLogisticLayer;

private:

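int m_iSampleNum; // number of samples

int m_iInput; // input dimension

int m_iOut; // output dimension

int m_iHiddenLayerNum; // number of hidden layers

int* m_piHiddenLayerSize; // hidden layer sizes, e.g. {3,4} means two hidden layers, the first with 3 nodes and the second with 4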

};

void mlp();

void TestMlpTheano(const int m_iInput, const int ihidden, const int m_iOut);

void TestMlpMnist(const int m_iInput, const int ihidden, const int m_iOut);

#endif

mlp.cpp


#include <iostream>

#include "mlp.h"

#include "util.h"

#include <cassert>

#include <iomanip>

using namespace std;

const int m_iSamplenum = 8, innode = 3, outnode = 8;

Mlp::Mlp(int n, int n_i, int n_o, int nhl, int *hls)
{
    m_iSampleNum = n;
    m_iInput = n_i;
    m_iOut = n_o;
    m_iHiddenLayerNum = nhl;
    m_piHiddenLayerSize = hls;
    // build the network structure
    m_ppHiddenLayer = new HiddenLayer* [m_iHiddenLayerNum];
    for(int i = 0; i < m_iHiddenLayerNum; ++i)
    {
        if(i == 0)
        {
            m_ppHiddenLayer[i] = new HiddenLayer(m_iInput, m_piHiddenLayerSize[i]); // first hidden layer
        }
        else
        {
            m_ppHiddenLayer[i] = new HiddenLayer(m_piHiddenLayerSize[i-1], m_piHiddenLayerSize[i]); // other hidden layers
        }
    }
    if (m_iHiddenLayerNum > 0)
    {
        m_pLogisticLayer = new LogisticRegression(m_piHiddenLayerSize[m_iHiddenLayerNum - 1], m_iOut, m_iSampleNum); // final softmax layer
    }
    else
    {
        m_pLogisticLayer = new LogisticRegression(m_iInput, m_iOut, m_iSampleNum); // final softmax layer
    }
}

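Mlp::~Mlp()
{
    // what a pointer-to-pointer holds is not necessarily a two-dimensional array
    for(int i = 0; i < m_iHiddenLayerNum; ++i)
        delete m_ppHiddenLayer[i]; // single object: delete without []
    delete[] m_ppHiddenLayer;
    // m_pLogisticLayer is a plain object pointer and must not be deleted as an array
    delete m_pLogisticLayer; // delete without []
}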

void Mlp::TrainAllSample(const vector<double *> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)

{

cout << "Mlp::TrainAllSample" << endl;

for (int j = 0; j < m_iSampleNum; ++j)

{

Train(vecTrain[j], vectrainlabel[j], dLr);

}

}

void Mlp::Train(double *pdTrain, WORD usLabel, double dLr)
{
    // cout << "******pdLabel****" << endl;
    // printArrDouble(ppdinLabel, m_iSampleNum, m_iOut);
    double *pdLabel = new double[m_iOut];
    MakeOneLabel(usLabel, pdLabel, m_iOut);
    // forward pass
    for(int n = 0; n < m_iHiddenLayerNum; ++ n)
    {
        if(n == 0) // the first hidden layer takes the input data directly
        {
            m_ppHiddenLayer[n]->Forward_propagation(pdTrain);
        }
        else // other hidden layers take the previous layer's output as input
        {
            m_ppHiddenLayer[n]->Forward_propagation(m_ppHiddenLayer[n-1]->m_pdOutdata);
        }
    }
    // the softmax layer takes the last hidden layer's output as input
    m_pLogisticLayer->Forward_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
    // backward pass
    m_pLogisticLayer->Back_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata, pdLabel, dLr);
    for(int n = m_iHiddenLayerNum-1; n >= 1; --n)
    {
        if(n == m_iHiddenLayerNum-1)
        {
            m_ppHiddenLayer[n]->Back_propagation(m_ppHiddenLayer[n-1]->m_pdOutdata,
                m_pLogisticLayer->m_pdDelta, m_pLogisticLayer->m_ppdW, m_pLogisticLayer->m_iOut, dLr);
        }
        else
        {
            double *pdInputData;
            pdInputData = m_ppHiddenLayer[n-1]->m_pdOutdata;
            m_ppHiddenLayer[n]->Back_propagation(pdInputData,
                m_ppHiddenLayer[n+1]->m_pdDelta, m_ppHiddenLayer[n+1]->m_ppdW, m_ppHiddenLayer[n+1]->m_iOut, dLr);
        }
    }
    // layer 0 is handled separately because its input is the training sample itself
    // (original note: not sure how best to write this)
    if (m_iHiddenLayerNum > 1)
        m_ppHiddenLayer[0]->Back_propagation(pdTrain,
            m_ppHiddenLayer[1]->m_pdDelta, m_ppHiddenLayer[1]->m_ppdW, m_ppHiddenLayer[1]->m_iOut, dLr);
    else
        m_ppHiddenLayer[0]->Back_propagation(pdTrain,
            m_pLogisticLayer->m_pdDelta, m_pLogisticLayer->m_ppdW, m_pLogisticLayer->m_iOut, dLr);
    delete []pdLabel;
}

void Mlp::SetTrainNum(int iNum)

{

m_iSampleNum = iNum;

}

double* Mlp::Forward_propagation(double* pData)
{
    double *pdForwardValue = pData;
    for(int n = 0; n < m_iHiddenLayerNum; ++ n)
    {
        if(n == 0) // the first hidden layer takes the input data directly
        {
            pdForwardValue = m_ppHiddenLayer[n]->Forward_propagation(pData);
        }
        else // other hidden layers take the previous layer's output as input
        {
            pdForwardValue = m_ppHiddenLayer[n]->Forward_propagation(pdForwardValue);
        }
    }
    return pdForwardValue;
    // the softmax layer takes the last hidden layer's output as input
    // m_pLogisticLayer->Forward_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
    // m_pLogisticLayer->Predict(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
}

int Mlp::Predict(double *pInputData)

{

Forward_propagation(pInputData);

int iResult = m_pLogisticLayer->Predict(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);

return iResult;

}

void Mlp::Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb)

{

for (int i = 0; i < m_iHiddenLayerNum; ++i)

{

m_ppHiddenLayer[i]->Setwb(vvAllw[i], vvAllb[i]);

}

m_pLogisticLayer->Setwb(vvAllw[m_iHiddenLayerNum], vvAllb[m_iHiddenLayerNum]);

}

void Mlp::Writewb(const char *szName)

{

for(int i = 0; i < m_iHiddenLayerNum; ++i)

{

m_ppHiddenLayer[i]->Writewb(szName);

}

m_pLogisticLayer->Writewb(szName);

}

double Mlp::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)

{

int iErrorNumber = 0, iValidNumber = vecValidlabel.size();

for (int i = 0; i < iValidNumber; ++i)

{

int iResult = Predict(vecvalid[i]);

if (iResult != vecValidlabel[i])

{

++iErrorNumber;

}

}

cout << "the num of error is " << iErrorNumber << endl;

double dErrorRate = (double)iErrorNumber / iValidNumber;

cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;

return dErrorRate;

}

void Mlp::Readwb(const char *szName)

{

long dcurpos = 0, dreadsize = 0;

for(int i = 0; i < m_iHiddenLayerNum; ++i)

{

dreadsize = m_ppHiddenLayer[i]->Readwb(szName, dcurpos);

cout << "hiddenlayer " << i + 1 << " read bytes: " << dreadsize << endl;

if (-1 != dreadsize)

dcurpos += dreadsize;

else

{

cout << "read wb error from HiddenLayer" << endl;

return;

}

}

dreadsize = m_pLogisticLayer->Readwb(szName, dcurpos);

if (-1 != dreadsize)

dcurpos += dreadsize;

else

{

cout << "read wb error from softmaxLayer" << endl;

return;

}

}

int* Mlp::GetHiddenSize()

{

return m_piHiddenLayerSize;

}

double* Mlp::GetHiddenOutputData()

{

assert(m_iHiddenLayerNum > 0);

return m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata;

}

int Mlp::GetHiddenNumber()

{

return m_iHiddenLayerNum;

}

//double **makeLabelSample(double **label_x)

double **makeLabelSample(double label_x[][outnode])

{

double **pplabelSample;

pplabelSample = new double*[m_iSamplenum];

for (int i = 0; i < m_iSamplenum; ++i)

{

pplabelSample[i] = new double[outnode];

}

for (int i = 0; i < m_iSamplenum; ++i)

{

for (int j = 0; j < outnode; ++j)

pplabelSample[i][j] = label_x[i][j];

}

return pplabelSample;

}

double **maken_train(double train_x[][innode])

{

double **ppn_train;

ppn_train = new double*[m_iSamplenum];

for (int i = 0; i < m_iSamplenum; ++i)

{

ppn_train[i] = new double[innode];

}

for (int i = 0; i < m_iSamplenum; ++i)

{

for (int j = 0; j < innode; ++j)

ppn_train[i][j] = train_x[i][j];

}

return ppn_train;

}

void TestMlpMnist(const int m_iInput, const int ihidden, const int m_iOut)

{

const int ihiddenSize = 1;

int phidden[ihiddenSize] = {ihidden};

// construct LogisticRegression

Mlp neural(m_iSamplenum, m_iInput, m_iOut, ihiddenSize, phidden);

vector<double*> vecTrain, vecvalid;

vector<WORD> vecValidlabel, vectrainlabel;

LoadTestSampleFromJson(vecvalid, vecValidlabel, "../../data/mnist.json", m_iInput);

LoadTestSampleFromJson(vecTrain, vectrainlabel, "../../data/mnisttrain.json", m_iInput);

// test

int itrainnum = vecTrain.size();

neural.SetTrainNum(itrainnum);

const int iepochs = 1;

const double dLr = 0.1;

neural.CalErrorRate(vecvalid, vecValidlabel);

for (int i = 0; i < iepochs; ++i)

{

neural.TrainAllSample(vecTrain, vectrainlabel, dLr);

neural.CalErrorRate(vecvalid, vecValidlabel);

}

for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)

{

delete [](*cit);

}

for (vector<double*>::iterator cit = vecvalid.begin(); cit != vecvalid.end(); ++cit)

{

delete [](*cit);

}

}

void TestMlpTheano(const int m_iInput, const int ihidden, const int m_iOut)

{

const int ihiddenSize = 1;

int phidden[ihiddenSize] = {ihidden};

// construct LogisticRegression

Mlp neural(m_iSamplenum, m_iInput, m_iOut, ihiddenSize, phidden);

vector<double*> vecTrain, vecw;

vector<double> vecb;

vector<WORD> vecLabel;

vector< vector<double*> > vvAllw;

vector< vector<double> > vvAllb;

const char *pcfilename = "../../data/theanomlp.json";

vector<int> vecSecondDimOfWeigh;

vecSecondDimOfWeigh.push_back(m_iInput);

vecSecondDimOfWeigh.push_back(ihidden);

LoadWeighFromJson(vvAllw, vvAllb, pcfilename, vecSecondDimOfWeigh);

LoadTestSampleFromJson(vecTrain, vecLabel, "../../data/mnist_validall.json", m_iInput);

cout << "loadwb ---------" << endl;

int itrainnum = vecTrain.size();

neural.SetTrainNum(itrainnum);

neural.Setwb(vvAllw, vvAllb);

cout << "Predict------------" << endl;

neural.CalErrorRate(vecTrain, vecLabel);

for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)

{

delete [](*cit);

}

}

void mlp()

{

//input samples

double X[m_iSamplenum][innode]= {

{0,0,0},{0,0,1},{0,1,0},{0,1,1},{1,0,0},{1,0,1},{1,1,0},{1,1,1}

};

double Y[m_iSamplenum][outnode]={

{1, 0, 0, 0, 0, 0, 0, 0},

{0, 1, 0, 0, 0, 0, 0, 0},

{0, 0, 1, 0, 0, 0, 0, 0},

{0, 0, 0, 1, 0, 0, 0, 0},

{0, 0, 0, 0, 1, 0, 0, 0},

{0, 0, 0, 0, 0, 1, 0, 0},

{0, 0, 0, 0, 0, 0, 1, 0},

{0, 0, 0, 0, 0, 0, 0, 1},

};

WORD pdLabel[outnode] = {0, 1, 2, 3, 4, 5, 6, 7};

const int ihiddenSize = 2;

int phidden[ihiddenSize] = {5, 5};

//printArr(phidden, 1);

Mlp neural(m_iSamplenum, innode, outnode, ihiddenSize, phidden);

double **train_x, **ppdLabel;

train_x = maken_train(X);

//printArrDouble(train_x, m_iSamplenum, innode);

ppdLabel = makeLabelSample(Y);

for (int i = 0; i < 3500; ++i)

{

for (int j = 0; j < m_iSamplenum; ++j)

{

neural.Train(train_x[j], pdLabel[j], 0.1);

}

}

cout<<"training complete..."<<endl;

for (int i = 0; i < m_iSamplenum; ++i)

neural.Predict(train_x[i]);

// szName is the file that stores the weights

// const char *szName = "mlp55new.wb";

// neural.Writewb(szName);

// Mlp neural2(m_iSamplenum, innode, outnode, ihiddenSize, phidden);

// cout<<"Readwb start..."<<endl;

// neural2.Readwb(szName);

// cout<<"Readwb end..."<<endl;

// cout << "----------after readwb________" << endl;

// for (int i = 0; i < m_iSamplenum; ++i)

// neural2.Forward_propagation(train_x[i]);

for (int i = 0; i != m_iSamplenum; ++i)

{

delete []train_x[i];

delete []ppdLabel[i];

}

delete []train_x;

delete []ppdLabel;

cout<<endl;

}
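
The test drivers above report their results through CalErrorRate, which counts how often the index of the largest output differs from the label. Here is a minimal standalone sketch of that measurement. `ArgMax` and `ErrorRate` are hypothetical names, and the outputs are assumed to be already computed and held in `std::vector`s rather than produced by the Mlp class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the largest score (the predicted class).
int ArgMax(const std::vector<double>& scores) {
    return static_cast<int>(
        std::max_element(scores.begin(), scores.end()) - scores.begin());
}

// Fraction of samples whose predicted class differs from the label.
double ErrorRate(const std::vector<std::vector<double> >& outputs,
                 const std::vector<int>& labels) {
    assert(outputs.size() == labels.size() && !labels.empty());
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (ArgMax(outputs[i]) != labels[i]) {
            ++wrong;
        }
    }
    return static_cast<double>(wrong) / labels.size();
}
```

Because `std::max_element` compares the elements with each other, no initial "maximum" value has to be guessed, which avoids the initialise-to-zero mistake described earlier in this post.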

hiddenLayer.h

#ifndef HIDDENLAYER_H

#define HIDDENLAYER_H

#include "neuralbase.h"

class HiddenLayer: public NeuralBase

{

public:

HiddenLayer(int n_i, int n_o);

~HiddenLayer();

double* Forward_propagation(double* input_data);

void Back_propagation(double *pdInputData, double *pdNextLayerDelta,

double** ppdnextLayerW, int iNextLayerOutNum, double dLr);

};

#endif

hiddenLayer.cpp

#include <cmath>

#include <cassert>

#include <cstdlib>

#include <ctime>

#include <iostream>

#include "hiddenLayer.h"

#include "util.h"

using namespace std;

HiddenLayer::HiddenLayer(int n_i, int n_o): NeuralBase(n_i, n_o, 0)

{

}

HiddenLayer::~HiddenLayer()

{

}

/************************************************************************/

/* Note:

to reproduce Theano's test results, the hidden layer must use tanh as its activation function;

otherwise, for training the MLP, use sigmoid */

/************************************************************************/

double* HiddenLayer::Forward_propagation(double* pdInputData)

{

NeuralBase::Forward_propagation(pdInputData);

for(int i = 0; i < m_iOut; ++i)

{

// m_pdOutdata[i] = sigmoid(m_pdOutdata[i]);

m_pdOutdata[i] = mytanh(m_pdOutdata[i]);

}

return m_pdOutdata;

}

void HiddenLayer::Back_propagation(double *pdInputData, double *pdNextLayerDelta,

double** ppdnextLayerW, int iNextLayerOutNum, double dLr)

{

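// pdInputData: input data of this layer

// pdNextLayerDelta: residuals (delta) of the next layer, an array of size iNextLayerOutNum

// ppdnextLayerW: weights from this layer to the next layer

// iNextLayerOutNum: the n_out of the next layer

// dLr: learning rate

// m_iSamplenum: total number of training samples

// sigma must have as many elements as this layer has units; the code circulating online gets this wrong,

// apparently because its author never tested it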

//double* sigma = new double[iNextLayerOutNum];

double* sigma = new double[m_iOut];

//double sigma[10];

for(int i = 0; i < m_iOut; ++i)

sigma[i] = 0.0;

for(int i = 0; i < iNextLayerOutNum; ++i)

{

for(int j = 0; j < m_iOut; ++j)

{

sigma[j] += ppdnextLayerW[i][j] * pdNextLayerDelta[i];

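}

}

// compute this layer's residual delta; y * (1 - y) is the sigmoid derivative,

// so this assumes the sigmoid activation is enabled in Forward_propagation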

for(int i = 0; i < m_iOut; ++i)

{

m_pdDelta[i] = sigma[i] * m_pdOutdata[i] * (1 - m_pdOutdata[i]);

}

// update this layer's weights w

for(int i = 0; i < m_iOut; ++i)

{

for(int j = 0; j < m_iInput; ++j)

{

m_ppdW[i][j] += dLr * m_pdDelta[i] * pdInputData[j];

}

m_pdBias[i] += dLr * m_pdDelta[i];

}

delete[] sigma;

}
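
The note above says the hidden layer uses tanh to reproduce Theano and sigmoid for training. The two activations also need different derivatives in Back_propagation. The following is a small sketch of both derivatives written in terms of the layer output y. The function names and the `useTanh` switch are hypothetical and not part of the original classes.

```cpp
#include <cassert>
#include <cmath>

// Derivative of the logistic sigmoid, written in terms of its output y.
double SigmoidGradFromOutput(double y) {
    return y * (1.0 - y);
}

// Derivative of tanh, written in terms of its output y.
double TanhGradFromOutput(double y) {
    return 1.0 - y * y;
}

// Residual of one hidden unit: the back-propagated weighted sum (sigma)
// multiplied by the derivative of whichever activation produced y.
double HiddenDelta(double sigma, double y, bool useTanh) {
    return sigma * (useTanh ? TanhGradFromOutput(y) : SigmoidGradFromOutput(y));
}
```

If the hidden layer were ever trained with tanh enabled, the `y * (1 - y)` factor in Back_propagation would have to be swapped for `1 - y * y` accordingly.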

logisticRegression.h

#ifndef LOGISTICREGRESSIONLAYER

#define LOGISTICREGRESSIONLAYER

#include "neuralbase.h"

typedef unsigned short WORD;

class LogisticRegression: public NeuralBase

{

public:

LogisticRegression(int n_i, int i_o, int);

~LogisticRegression();

double* Forward_propagation(double* input_data);

void Softmax(double* x);

void Train(double *pdTrain, WORD usLabel, double dLr);

void SetOldWb(double ppdWeigh[][3], double arriBias[8]);

int Predict(double *);

void MakeLabels(int* piMax, double (*pplabels)[8]);

};

void Test_lr();

void Testwb();

void Test_theano(const int m_iInput, const int m_iOut);

#endif

logisticRegression.cpp

#include <cmath>

#include <cassert>

#include <iomanip>

#include <ctime>

#include <iostream>

#include "logisticRegression.h"

#include "util.h"

using namespace std;

LogisticRegression::LogisticRegression(int n_i, int n_o, int n_t): NeuralBase(n_i, n_o, n_t)

{

}

LogisticRegression::~LogisticRegression()

{

}

void LogisticRegression::Softmax(double* x)

{

double _max = x[0]; // start from the first element so that all-negative inputs are handled correctly

double _sum = 0.0;

for(int i = 0; i < m_iOut; ++i)

{

if(_max < x[i])

_max = x[i];

}

for(int i = 0; i < m_iOut; ++i)

{

x[i] = exp(x[i]-_max);

_sum += x[i];

}

for(int i = 0; i < m_iOut; ++i)

{

x[i] /= _sum;

}

}

double* LogisticRegression::Forward_propagation(double* pdinputdata)

{

NeuralBase::Forward_propagation(pdinputdata);

/************************************************************************/

/* debugging */

/************************************************************************/

//cout << "Forward_propagation from LogisticRegression" << endl;

//PrintOutputData();

//cout << "over\n";

Softmax(m_pdOutdata);

return m_pdOutdata;

}

int LogisticRegression::Predict(double *pdtest)

{

Forward_propagation(pdtest);

/************************************************************************/

/* for debugging */

/************************************************************************/

//PrintOutputData();

int iResult = getMaxIndex(m_pdOutdata, m_iOut);

return iResult;

}

void LogisticRegression::Train(double *pdTrain, WORD usLabel, double dLr)

{

Forward_propagation(pdTrain);

double *pdLabel = new double[m_iOut];

MakeOneLabel(usLabel, pdLabel);

Back_propagation(pdTrain, pdLabel, dLr);

delete []pdLabel;

}

//double LogisticRegression::CalErrorRate(const vector<double*> &vecvalid, const vector<WORD> &vecValidlabel)

//{

// int iErrorNumber = 0, iValidNumber = vecValidlabel.size();

// for (int i = 0; i < iValidNumber; ++i)

// {

// int iResult = Predict(vecvalid[i]);

// if (iResult != vecValidlabel[i])

// {

// ++iErrorNumber;

// }

// cout << "the num of error is " << iErrorNumber << endl;

// double dErrorRate = (double)iErrorNumber / iValidNumber;

// cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;

// return dErrorRate;

//}

void LogisticRegression::SetOldWb(double ppdWeigh[][3], double arriBias[8])

{

for (int i = 0; i < m_iOut; ++i)

{

for (int j = 0; j < m_iInput; ++j)

m_ppdW[i][j] = ppdWeigh[i][j];

m_pdBias[i] = arriBias[i];

}

cout << "Setwb----------" << endl;

printArrDouble(m_ppdW, m_iOut, m_iInput);

printArr(m_pdBias, m_iOut);

}

//void LogisticRegression::TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)

//{

// for (int j = 0; j < m_iSamplenum; ++j)

// {

// Train(vecTrain[j], vectrainlabel[j], dLr);

// }

//}

void LogisticRegression::MakeLabels(int* piMax, double (*pplabels)[8])

{

for (int i = 0; i < m_iSamplenum; ++i)

{

for (int j = 0; j < m_iOut; ++j)

pplabels[i][j] = 0;

int k = piMax[i];

pplabels[i][k] = 1.0;

}

}

void Test_theano(const int m_iInput, const int m_iOut)

{

// construct LogisticRegression

LogisticRegression classifier(m_iInput, m_iOut, 0);

vector<double*> vecTrain, vecvalid, vecw;

vector<double> vecb;

vector<WORD> vecValidlabel, vectrainlabel;

LoadTestSampleFromJson(vecvalid, vecValidlabel, "../.../../data/mnist.json", m_iInput);

LoadTestSampleFromJson(vecTrain, vectrainlabel, "../.../../data/mnisttrain.json", m_iInput);

// test

int itrainnum = vecTrain.size();

classifier.m_iSamplenum = itrainnum;

const int iepochs = 5;

const double dLr = 0.1;

for (int i = 0; i < iepochs; ++i)

{

classifier.TrainAllSample(vecTrain, vectrainlabel, dLr);

if (i % 2 == 0)

{

cout << "Predict------------" << i + 1 << endl;

classifier.CalErrorRate(vecvalid, vecValidlabel);

}

}

for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)

{

delete [](*cit);

}

for (vector<double*>::iterator cit = vecvalid.begin(); cit != vecvalid.end(); ++cit)

{

delete [](*cit);

}

}

void Test_lr()

{

srand(0);

double learning_rate = 0.1;

int n_epochs = 200;

int test_N = 2;

const int trainNum = 8, m_iInput = 3, m_iOut = 8;

//int m_iOut = 2;

double train_X[trainNum][m_iInput] = {

{1, 1, 1},

{1, 1, 0},

{1, 0, 1},

{1, 0, 0},

{0, 1, 1},

{0, 1, 0},

{0, 0, 1},

{0, 0, 0}

};

// sziMax stores the index of the maximum value (the class label of each sample)

int sziMax[trainNum];

for (int i = 0; i < trainNum; ++i)

sziMax[i] = trainNum - i - 1;

// construct LogisticRegression

LogisticRegression classifier(m_iInput, m_iOut, trainNum);

// Train online

for(int epoch=0; epoch<n_epochs; epoch++) {

for(int i=0; i<trainNum; i++) {

//classifier.trainEfficient(train_X[i], train_Y[i], learning_rate);

classifier.Train(train_X[i], sziMax[i], learning_rate);

}

}

const char *pcfile = "test.wb";

classifier.Writewb(pcfile);

LogisticRegression logistic(m_iInput, m_iOut, trainNum);

logistic.Readwb(pcfile, 0);

// test data

double test_X[2][m_iInput] = {

{1, 0, 1},

{0, 0, 1}

};

// test

cout << "before Readwb ---------" << endl;

for(int i=0; i<test_N; i++) {

classifier.Predict(test_X[i]);

cout << endl;

}

cout << "after Readwb ---------" << endl;

for(int i=0; i<trainNum; i++) {

logistic.Predict(train_X[i]);

cout << endl;

}

cout << "*********\n";

}

void Testwb()

{

// int test_N = 2;

const int trainNum = 8, m_iInput = 3, m_iOut = 8;

//int m_iOut = 2;

double train_X[trainNum][m_iInput] = {

{1, 1, 1},

{1, 1, 0},

{1, 0, 1},

{1, 0, 0},

{0, 1, 1},

{0, 1, 0},

{0, 0, 1},

{0, 0, 0}

};

double arriBias[m_iOut] = {1, 2, 3, 3, 3, 3, 2, 1};

// construct LogisticRegression

LogisticRegression classifier(m_iInput, m_iOut, trainNum);

classifier.SetOldWb(train_X, arriBias);

const char *pcfile = "test.wb";

classifier.Writewb(pcfile);

LogisticRegression logistic(m_iInput, m_iOut, trainNum);

logistic.Readwb(pcfile, 0);

}
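
Softmax above subtracts the largest input before calling exp. That only helps if the maximum is taken over the actual inputs and not started from 0. Below is a minimal sketch of the same computation on a `std::vector`. `StableSoftmax` is a hypothetical name and this is not the member function used by the class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Softmax with the usual max-subtraction trick for numerical stability.
std::vector<double> StableSoftmax(const std::vector<double>& logits) {
    assert(!logits.empty());
    // The maximum is taken from the data itself, so it is correct even
    // when every logit is negative.
    const double maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    for (std::size_t i = 0; i < probs.size(); ++i) {
        probs[i] /= sum;
    }
    return probs;
}
```

With inputs such as -1000, -1001, -1002, a maximum that starts at 0 would make every exp underflow to 0 and the division produce NaN. Taking the maximum from the inputs keeps the largest term at exp(0) = 1.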

neuralbase.h

#ifndef NEURALBASE_H

#define NEURALBASE_H

#include <vector>

using std::vector;

typedef unsigned short WORD;

class NeuralBase

{

public:

NeuralBase(int , int , int);

virtual ~NeuralBase();

virtual double* Forward_propagation(double* );

virtual void Back_propagation(double* , double* , double );

virtual void Train(double *x, WORD y, double dLr);

virtual int Predict(double *);

void Callbackwb();

void MakeOneLabel(int iMax, double *pdLabel);

void TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr);

double CalErrorRate(const vector<double*> &vecvalid, const vector<WORD> &vecValidlabel);

void Printwb();

void Writewb(const char *szName);

long Readwb(const char *szName, long);

void Setwb(vector<double*> &vpdw, vector<double> &vdb);

void PrintOutputData();

int m_iInput;

int m_iOut;

int m_iSamplenum;

double** m_ppdW;

double* m_pdBias;

// output of this layer's forward pass; for the last layer it is also the final prediction

double* m_pdOutdata;

// values needed for back-propagation

double* m_pdDelta;

private:

void _callbackwb();

};

#endif // NEURALBASE_H

neuralbase.cpp

#include "neuralbase.h"

#include <cmath>

#include <cassert>

#include <ctime>

#include <iomanip>

#include <iostream>

#include "util.h"

using namespace std;

NeuralBase::NeuralBase(int n_i, int n_o, int n_t):m_iInput(n_i), m_iOut(n_o), m_iSamplenum(n_t)

{

m_ppdW = new double* [m_iOut];

for(int i = 0; i < m_iOut; ++i)

{

m_ppdW[i] = new double [m_iInput];

}

m_pdBias = new double [m_iOut];

double a = 1.0 / m_iInput;

srand((unsigned)time(NULL));

for(int i = 0; i < m_iOut; ++i)

{

for(int j = 0; j < m_iInput; ++j)

m_ppdW[i][j] = uniform(-a, a);

m_pdBias[i] = uniform(-a, a);

}

m_pdDelta = new double [m_iOut];

m_pdOutdata = new double [m_iOut];

}

NeuralBase::~NeuralBase()

{

Callbackwb();

delete[] m_pdOutdata;

delete[] m_pdDelta;

}

void NeuralBase::Callbackwb()

{

_callbackwb();

}

double NeuralBase::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)

{

int iErrorNumber = 0, iValidNumber = vecValidlabel.size();

for (int i = 0; i < iValidNumber; ++i)

{

int iResult = Predict(vecvalid[i]);

if (iResult != vecValidlabel[i])

{

++iErrorNumber;

}

}

cout << "the num of error is " << iErrorNumber << endl;

double dErrorRate = (double)iErrorNumber / iValidNumber;

cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;

return dErrorRate;

}

int NeuralBase::Predict(double *)

{

cout << "NeuralBase::Predict(double *)" << endl;

return 0;

}

void NeuralBase::_callbackwb()

{

for(int i=0; i < m_iOut; i++)

delete []m_ppdW[i];

delete[] m_ppdW;

delete[] m_pdBias;

}

void NeuralBase::Printwb()

{

cout << "'****m_ppdW****\n";

for(int i = 0; i < m_iOut; ++i)

{

for(int j = 0; j < m_iInput; ++j)

cout << m_ppdW[i][j] << ' ';

cout << endl;

}

cout << "'****m_pdBias****\n";

for(int i = 0; i < m_iOut; ++i)

{

cout << m_pdBias[i] << ' ';

}

cout << endl;

cout << "'****output****\n";

for(int i = 0; i < m_iOut; ++i)

{

cout << m_pdOutdata[i] << ' ';

}

cout << endl;

}

double* NeuralBase::Forward_propagation(double* input_data)

{

for(int i = 0; i < m_iOut; ++i)

{

m_pdOutdata[i] = 0.0;

for(int j = 0; j < m_iInput; ++j)

{

m_pdOutdata[i] += m_ppdW[i][j]*input_data[j];

}

m_pdOutdata[i] += m_pdBias[i];

}

return m_pdOutdata;

}

void NeuralBase::Back_propagation(double* input_data, double* pdlabel, double dLr)

{

for(int i = 0; i < m_iOut; ++i)

{

m_pdDelta[i] = pdlabel[i] - m_pdOutdata[i] ;

for(int j = 0; j < m_iInput; ++j)

{

m_ppdW[i][j] += dLr * m_pdDelta[i] * input_data[j] / m_iSamplenum;

}

m_pdBias[i] += dLr * m_pdDelta[i] / m_iSamplenum;

}

}

void NeuralBase::MakeOneLabel(int imax, double *pdlabel)

{

for (int j = 0; j < m_iOut; ++j)

pdlabel[j] = 0;

pdlabel[imax] = 1.0;

}

void NeuralBase::Writewb(const char *szName)

{

savewb(szName, m_ppdW, m_pdBias, m_iOut, m_iInput);

}

long NeuralBase::Readwb(const char *szName, long dstartpos)

{

return loadwb(szName, m_ppdW, m_pdBias, m_iOut, m_iInput, dstartpos);

}

void NeuralBase::Setwb(vector<double*> &vpdw, vector<double> &vdb)

{

assert(vpdw.size() == (DWORD)m_iOut);

for (int i = 0; i < m_iOut; ++i)

{

delete []m_ppdW[i];

m_ppdW[i] = vpdw[i];

m_pdBias[i] = vdb[i];

}

}

void NeuralBase::TrainAllSample(const vector<double *> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)

{

for (int j = 0; j < m_iSamplenum; ++j)

{

Train(vecTrain[j], vectrainlabel[j], dLr);

}

}

void NeuralBase::Train(double *x, WORD y, double dLr)

{

(void)x;

(void)y;

(void)dLr;

cout << "NeuralBase::Train(double *x, WORD y, double dLr)" << endl;

}

void NeuralBase::PrintOutputData()

{

for (int i = 0; i < m_iOut; ++i)

{

cout << m_pdOutdata[i] << ' ';

}

cout << endl;

}
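
NeuralBase::Forward_propagation computes out = W * x + b over arrays managed by hand with new[] and delete[]. As a comparison, here is a sketch of the same forward pass with `std::vector` storage, which removes the manual cleanup. The `DenseLayer` struct and its row-major weight layout are assumptions made for this illustration and are not part of the original code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical fully connected layer: out[i] = b[i] + sum_j w[i][j] * x[j].
struct DenseLayer {
    int nIn;
    int nOut;
    std::vector<double> w;  // row-major: w[i * nIn + j] connects input j to output i
    std::vector<double> b;  // one bias per output unit

    DenseLayer(int in, int out)
        : nIn(in), nOut(out), w(in * out, 0.0), b(out, 0.0) {}

    std::vector<double> Forward(const std::vector<double>& x) const {
        assert(static_cast<int>(x.size()) == nIn);
        std::vector<double> y(nOut);
        for (int i = 0; i < nOut; ++i) {
            double sum = b[i];
            for (int j = 0; j < nIn; ++j) {
                sum += w[i * nIn + j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }
};
```

The vectors release their memory automatically, so a layer built this way needs neither a destructor nor a helper like _callbackwb.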

util.h

#ifndef UTIL_H

#define UTIL_H

#include <iostream>

#include <cstdio>

#include <cstdlib>

#include <ctime>

#include <vector>

using namespace std;

typedef unsigned char BYTE;

typedef unsigned short WORD;

typedef unsigned int DWORD;

double sigmoid(double x);

double mytanh(double dx);

typedef struct stShapeWb

{

stShapeWb(int w, int h):width(w), height(h){}

int width;

int height;

}ShapeWb_S;

void MakeOneLabel(int iMax, double *pdLabel, int m_iOut);

double uniform(double _min, double _max);

//void printArr(T *parr, int num);

//void printArrDouble(double **pparr, int row, int col);

void initArr(double *parr, int num);

int getMaxIndex(double *pdarr, int num);

void Printivec(const vector<int> &ivec);

void savewb(const char *szName, double **m_ppdW, double *m_pdBias,

int irow, int icol);

long loadwb(const char *szName, double **m_ppdW, double *m_pdBias,

int irow, int icol, long dstartpos);

void TestLoadJson(const char *pcfilename);

bool LoadvtFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput);

bool LoadwbFromJson(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput);

bool LoadTestSampleFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput);

bool LoadwbByByte(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput);

bool LoadallwbByByte(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb, const char *filename,

const int m_iInput, const int ihidden, const int m_iOut);

bool LoadWeighFromJson(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb,

const char *filename, const vector<int> &vecSecondDimOfWeigh);

void MakeCnnSample(double arr[2][64], double *pdImage, int iImageWidth, int iNumOfImage );

void MakeCnnWeigh(double *, int iNumOfKernel);

template <typename T>

void printArr(T *parr, int num)

{

cout << "****printArr****" << endl;

for (int i = 0; i < num; ++i)

cout << parr[i] << ' ';

cout << endl;

}

template <typename T>

void printArrDouble(T **pparr, int row, int col)

{

cout << "****printArrDouble****" << endl;

for (int i = 0; i < row; ++i)

{

for (int j = 0; j < col; ++j)

{

cout << pparr[i][j] << ' ';

}

cout << endl;

}

}

#endif
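
The loaders declared in util.h (LoadTestSampleFromJson, LoadWeighFromJson and the others) read bracketed lists of numbers such as those Python's json module writes. Below is a compact sketch of the same idea for a single two-level list, using `std::strtod` to convert each number. `ParseMatrix` is a hypothetical helper, not one of the functions in this post, and it handles only the `[[...], [...]]` shape.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Parses text such as "[[0.5, 1.0], [2, -3e-1]]" into rows of doubles.
// Returns false when the brackets do not match or a number is malformed.
bool ParseMatrix(const std::string& text,
                 std::vector<std::vector<double> >& rows) {
    rows.clear();
    const char* p = text.c_str();
    int depth = 0;  // number of currently open '[' brackets
    while (*p != '\0') {
        const char c = *p;
        if (c == '[') {
            ++depth;
            if (depth > 2) return false;  // only two nesting levels supported
            if (depth == 2) rows.push_back(std::vector<double>());
            ++p;
        } else if (c == ']') {
            if (depth == 0) return false;
            --depth;
            ++p;
        } else if (c == ',' || std::isspace(static_cast<unsigned char>(c))) {
            ++p;  // separators carry no information here
        } else {
            if (depth != 2) return false;  // numbers may only appear inside a row
            char* end = 0;
            const double value = std::strtod(p, &end);
            if (end == p) return false;  // not a number
            rows.back().push_back(value);
            p = end;  // continue right after the parsed number
        }
    }
    return depth == 0 && !rows.empty();
}
```

Letting `strtod` consume the whole number avoids collecting digits byte by byte. The trade-off is that the text must already be in memory, which may not suit files as large as the full MNIST dump.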

util.cpp

#include "util.h"

#include <iostream>

#include <ctime>

#include <cmath>

#include <cassert>

#include <fstream>

#include <cstring>

#include <stack>

#include <iomanip>

using namespace std;

int getMaxIndex(double *pdarr, int num)

{

double dmax = -1;

int iMax = -1;

for(int i = 0; i < num; ++i)

{

if (pdarr[i] > dmax)

{

dmax = pdarr[i];

iMax = i;

}

}

return iMax;

}

double sigmoid(double dx)

{

return 1.0/(1.0+exp(-dx));

}

double mytanh(double dx)

{

// the library tanh avoids the overflow of exp(2 * dx) for large dx

return tanh(dx);

}

double uniform(double _min, double _max)

{

return rand()/(RAND_MAX + 1.0) * (_max - _min) + _min;

}

void initArr(double *parr, int num)

{

for (int i = 0; i < num; ++i)

parr[i] = 0.0;

}

void savewb(const char *szName, double **m_ppdW, double *m_pdBias,

int irow, int icol)

{

FILE *pf;

if( (pf = fopen(szName, "ab" )) == NULL )

{

printf( "File could not be opened " );

return;

}

int isizeofelem = sizeof(double);

for (int i = 0; i < irow; ++i)

{

if (fwrite((const void*)m_ppdW[i], isizeofelem, icol, pf) != (size_t)icol)

{

fputs ("Writing m_ppdW error",stderr);

fclose(pf);

return;

}

}

if (fwrite((const void*)m_pdBias, isizeofelem, irow, pf) != (size_t)irow)

{

fputs ("Writing m_pdBias error",stderr);

fclose(pf);

return;

}

fclose(pf);

}

long loadwb(const char *szName, double **m_ppdW, double *m_pdBias,

int irow, int icol, long dstartpos)

{

FILE *pf;

long dtotalbyte = 0, dreadsize;

if( (pf = fopen(szName, "rb" )) == NULL )

{

printf( "File could not be opened " );

return -1;

}

// move the file pointer to the correct offset

fseek(pf, dstartpos , SEEK_SET);

int isizeofelem = sizeof(double);

for (int i = 0; i < irow; ++i)

{

dreadsize = fread((void*)m_ppdW[i], isizeofelem, icol, pf);

if (dreadsize != icol)

{

fputs ("Reading m_ppdW error",stderr);

return -1;

}

// add the element count of every successful read to dtotalbyte; it is converted to bytes and returned at the end

dtotalbyte += dreadsize;

}

dreadsize = fread(m_pdBias, isizeofelem, irow, pf);

if (dreadsize != irow)

{

fputs ("Reading m_pdBias error",stderr);

return -1;

}

dtotalbyte += dreadsize;

dtotalbyte *= isizeofelem;

fclose(pf);

return dtotalbyte;

}

void Printivec(const vector<int> &ivec)

{

for (vector<int>::const_iterator it = ivec.begin(); it != ivec.end(); ++it)

{

cout << *it << ' ';

}

cout << endl;

}

void TestLoadJson(const char *pcfilename)

{

vector<double *> vpdw;

vector<double> vdb;

vector< vector<double*> > vvAllw;

vector< vector<double> > vvAllb;

int m_iInput = 28 * 28, ihidden = 500, m_iOut = 10;

LoadallwbByByte(vvAllw, vvAllb, pcfilename, m_iInput, ihidden, m_iOut );

}

//read vt from mnist, format is [[[], [],..., []],[1, 3, 5,..., 7]]

bool LoadvtFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput)

{

cout << "loadvtFromJson" << endl;

const int ciStackSize = 10;

const int ciFeaturesize = m_iInput;

int arriStack[ciStackSize], iTop = -1;

ifstream ifs;

ifs.open(filename, ios::in);

assert(ifs.is_open());

BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma, ucSpace;

ucLeftbrace = '[';

ucRightbrace = ']';

ucComma = ',';

ucSpace = '0';

ifs >> ucRead;

assert(ucRead == ucLeftbrace);

// the stack only holds left brackets: 1 marks a pushed bracket, 0 a cleared slot

arriStack[++iTop] = 1;

// start of the training samples

ifs >> ucRead;

assert(ucRead == ucLeftbrace);

arriStack[++iTop] = 1;//iTop is 1

int iIndex;

bool isdigit = false;

double dread, *pdvt = NULL;

//load vt sample

while (iTop > 0)

{

if (isdigit == false)

{

ifs >> ucRead;

isdigit = true;

if (ucRead == ucComma)

{

//next char is space or leftbrace

// ifs >> ucRead;

isdigit = false;

continue;

}

if (ucRead == ucSpace)

{

//if pdvt is null, next char is leftbrace;

//else next char is double value

if (pdvt == NULL)

isdigit = false;

continue;

}

if (ucRead == ucLeftbrace)

{

pdvt = new double[ciFeaturesize];

memset(pdvt, 0, ciFeaturesize * sizeof(double));

// iIndex is the array index

iIndex = 0;

arriStack[++iTop] = 1;

continue;

}

if (ucRead == ucRightbrace)

{

if (pdvt != NULL)

{

assert(iIndex == ciFeaturesize);

vecTrain.push_back(pdvt);

pdvt = NULL;

}

isdigit = false;

arriStack[iTop--] = 0;

continue;

}

}

else

{

ifs >> dread;

pdvt[iIndex++] = dread;

isdigit = false;

}

};

//next char is comma

ifs >> ucRead;

assert(ucRead == ucComma);

cout << vecTrain.size() << endl;

// read the labels

WORD usread;

isdigit = false;

while (iTop > -1 && ifs.eof() == false)

{

if (isdigit == false)

{

ifs >> ucRead;

isdigit = true;

if (ucRead == ucComma)

{

//next char is space or leftbrace

// ifs >> ucRead;

// isdigit = false;

continue;

}

if (ucRead == ucSpace)

{

//if pdvt is null, next char is leftbrace;

//else next char is double value

if (pdvt == NULL)

isdigit = false;

continue;

}

if (ucRead == ucLeftbrace)

{

arriStack[++iTop] = 1;

continue;

}

// the character after this right bracket is another right bracket (the last character)

if (ucRead == ucRightbrace)

{

isdigit = false;

arriStack[iTop--] = 0;

continue;

}

}

else

{

ifs >> usread;

vecLabel.push_back(usread);

isdigit = false;

}

};

assert(vecLabel.size() == vecTrain.size());

assert(iTop == -1);

ifs.close();

return true;

}

bool testjsonfloat(const char *filename)

{

vector<double> vecTrain;

cout << "testjsondouble" << endl;

const int ciStackSize = 10;

int arriStack[ciStackSize], iTop = -1;

ifstream ifs;

ifs.open(filename, ios::in);

assert(ifs.is_open());

BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma;

ucLeftbrace = '[';

ucRightbrace = ']';

ucComma = ',';

ifs >> ucRead;

assert(ucRead == ucLeftbrace);

// the stack only holds left brackets: 1 marks a pushed bracket, 0 a cleared slot

arriStack[++iTop] = 1;

// start of the training samples

ifs >> ucRead;

assert(ucRead == ucLeftbrace);

arriStack[++iTop] = 1;//iTop is 1

double fread;

bool isdigit = false;

while (iTop > -1)

{

if (isdigit == false)

{

ifs >> ucRead;

isdigit = true;

if (ucRead == ucComma)

{

//next char is space or leftbrace

// ifs >> ucRead;

isdigit = false;

continue;

}

if (ucRead == ' ')

continue;

if (ucRead == ucLeftbrace)

{

arriStack[++iTop] = 1;

continue;

}

if (ucRead == ucRightbrace)

{

isdigit = false;

// the character after this right bracket is another right bracket (the last character)

arriStack[iTop--] = 0;

continue;

}

}

else

{

ifs >> fread;

vecTrain.push_back(fread);

isdigit = false;

}

}

ifs.close();

return true;

}

bool LoadwbFromJson(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput)

{

cout << "LoadwbFromJson" << endl;

const int ciStackSize = 10;

const int ciFeaturesize = m_iInput;

int arriStack[ciStackSize], iTop = -1;

ifstream ifs;

ifs.open(filename, ios::in);

assert(ifs.is_open());

BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma, ucSpace;

ucLeftbrace = '[';

ucRightbrace = ']';

ucComma = ',';

ucSpace = '0';

ifs >> ucRead;

assert(ucRead == ucLeftbrace);

// the stack only holds left brackets: 1 marks a pushed bracket, 0 a cleared slot

arriStack[++iTop] = 1;

// start of the training samples

ifs >> ucRead;

assert(ucRead == ucLeftbrace);

arriStack[++iTop] = 1;//iTop is 1

int iIndex;

bool isdigit = false;

double dread, *pdvt = NULL;

//load vt sample

while (iTop > 0)

{

if (isdigit == false)

{

ifs >> ucRead;

isdigit = true;

if (ucRead == ucComma)

{

//next char is space or leftbrace

// ifs >> ucRead;

isdigit = false;

continue;

}

if (ucRead == ucSpace)

{

//if pdvt is null, next char is leftbrace;

//else next char is double value

if (pdvt == NULL)

isdigit = false;

continue;

}

if (ucRead == ucLeftbrace)

{

pdvt = new double[ciFeaturesize];

memset(pdvt, 0, ciFeaturesize * sizeof(double));

// iIndex is the array index

iIndex = 0;

arriStack[++iTop] = 1;

continue;

}

if (ucRead == ucRightbrace)

{

if (pdvt != NULL)

{

assert(iIndex == ciFeaturesize);

vecTrain.push_back(pdvt);

pdvt = NULL;

}

isdigit = false;

arriStack[iTop--] = 0;

continue;

}

}

else

{

ifs >> dread;

pdvt[iIndex++] = dread;

isdigit = false;

}

};

//next char is comma

ifs >> ucRead;

assert(ucRead == ucComma);

cout << vecTrain.size() << endl;

// read the labels

double usread;

isdigit = false;

while (iTop > -1 && ifs.eof() == false)

{

if (isdigit == false)

{

ifs >> ucRead;

isdigit = true;

if (ucRead == ucComma)

{

//next char is space or leftbrace

// ifs >> ucRead;

// isdigit = false;

continue;

}

if (ucRead == ucSpace)

{

//if pdvt is null, next char is leftbrace;

//else next char is double value

if (pdvt == NULL)

isdigit = false;

continue;

}

if (ucRead == ucLeftbrace)

{

arriStack[++iTop] = 1;

continue;

}

// the character after this right bracket is another right bracket (the last character)

if (ucRead == ucRightbrace)

{

isdigit = false;

arriStack[iTop--] = 0;

continue;

}

}

else

{

ifs >> usread;

vecLabel.push_back(usread);

isdigit = false;

}

};

assert(vecLabel.size() == vecTrain.size());

assert(iTop == -1);

ifs.close();

return true;

}

bool vec2double(vector<BYTE> &vecDigit, double &dvalue)

{

if (vecDigit.empty())

return false;

int ivecsize = vecDigit.size();

const int iMaxlen = 50;

char szdigit[iMaxlen];

assert(iMaxlen > ivecsize);

memset(szdigit, 0, iMaxlen);

int i;

for (i = 0; i < ivecsize; ++i)

szdigit[i] = vecDigit[i];

szdigit[i++] = '\0';

vecDigit.clear();

dvalue = atof(szdigit);

return true;

}

bool vec2short(vector<BYTE> &vecDigit, WORD &usvalue)

{

if (vecDigit.empty())

return false;

int ivecsize = vecDigit.size();

const int iMaxlen = 50;

char szdigit[iMaxlen];

assert(iMaxlen > ivecsize);

memset(szdigit, 0, iMaxlen);

int i;

for (i = 0; i < ivecsize; ++i)

szdigit[i] = vecDigit[i];

szdigit[i++] = '\0';

vecDigit.clear();

usvalue = atoi(szdigit);

return true;

}

void readDigitFromJson(ifstream &ifs, vector<double*> &vecTrain, vector<WORD> &vecLabel,

vector<BYTE> &vecDigit, double *&pdvt, int &iIndex,

const int ciFeaturesize, int *arrStack, int &iTop, bool bFirstlist)

{

BYTE ucRead;

WORD usvalue;

double dvalue;

const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';

ifs.read((char*)(&ucRead), 1);

switch (ucRead)

{

case ucLeftbrace:

{

if (bFirstlist)

{

pdvt = new double[ciFeaturesize];

memset(pdvt, 0, ciFeaturesize * sizeof(double));

iIndex = 0;

}

arrStack[++iTop] = 1;

break;

}

case ucComma:

{

//next char is space or leftbrace

if (bFirstlist)

{

if (vecDigit.empty() == false)

{

vec2double(vecDigit, dvalue);

pdvt[iIndex++] = dvalue;

}

}

else

{

if(vec2short(vecDigit, usvalue))

vecLabel.push_back(usvalue);

}

break;

}

case ucSpace:

break;

case ucRightbrace:

{

if (bFirstlist)

{

if (pdvt != NULL)

{

vec2double(vecDigit, dvalue);

pdvt[iIndex++] = dvalue;

vecTrain.push_back(pdvt);

pdvt = NULL;

}

assert(iIndex == ciFeaturesize);

}

else

{

if(vec2short(vecDigit, usvalue))

vecLabel.push_back(usvalue);

}

arrStack[iTop--] = 0;

break;

}

default:

{

vecDigit.push_back(ucRead);

break;

}

}

}

void readDoubleFromJson(ifstream &ifs, vector<double*> &vecTrain, vector<double> &vecLabel,

vector<BYTE> &vecDigit, double *&pdvt, int &iIndex,

const int ciFeaturesize, int *arrStack, int &iTop, bool bFirstlist)

{

BYTE ucRead;

double dvalue;

const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';

ifs.read((char*)(&ucRead), 1);

switch (ucRead)

{

case ucLeftbrace:

{

if (bFirstlist)

{

pdvt = new double[ciFeaturesize];

memset(pdvt, 0, ciFeaturesize * sizeof(double));

iIndex = 0;

}

arrStack[++iTop] = 1;

break;

}

case ucComma:

{

//next char is space or leftbrace

if (bFirstlist)

{

if (vecDigit.empty() == false)

{

vec2double(vecDigit, dvalue);

pdvt[iIndex++] = dvalue;

}

}

else

{

if(vec2double(vecDigit, dvalue))

vecLabel.push_back(dvalue);

}

break;

}

case ucSpace:

break;

case ucRightbrace:

{

if (bFirstlist)

{

if (pdvt != NULL)

{

vec2double(vecDigit, dvalue);

pdvt[iIndex++] = dvalue;

vecTrain.push_back(pdvt);

pdvt = NULL;

}

assert(iIndex == ciFeaturesize);

}

else

{

if(vec2double(vecDigit, dvalue))

vecLabel.push_back(dvalue);

}

arrStack[iTop--] = 0;

break;

}

default:

{

vecDigit.push_back(ucRead);

break;

}

}

}

bool LoadallwbByByte(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb, const char *filename,

const int m_iInput, const int ihidden, const int m_iOut)

{

cout << "LoadallwbByByte" << endl;

const int szistsize = 10;

int ciFeaturesize = m_iInput;

const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';

int arrStack[szistsize], iTop = -1, iIndex = 0;

ifstream ifs;

ifs.open(filename, ios::in | ios::binary);

assert(ifs.is_open());

double *pdvt;

BYTE ucRead;

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

// the stack only holds left brackets: 1 marks a pushed bracket, 0 a cleared slot

arrStack[++iTop] = 1;

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 1

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 2

vector<BYTE> vecDigit;

vector<double *> vpdw;

vector<double> vdb;

while (iTop > 1 && ifs.eof() == false)

{

readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, m_iInput, arrStack, iTop, true);

};

//next char is comma

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucComma);

cout << vpdw.size() << endl;

//next char is space

while (iTop > 0 && ifs.eof() == false)

{

readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, m_iInput, arrStack, iTop, false);

};

assert(vpdw.size() == vdb.size());

assert(iTop == 0);

vvAllw.push_back(vpdw);

vvAllb.push_back(vdb);

//clear vpdw and pdb 's contents

vpdw.clear();

vdb.clear();

//next char is comma

ifs.read((char*)(&ucRead), 1);;

assert(ucRead == ucComma);

//next char is space

ifs.read((char*)(&ucRead), 1);;

assert(ucRead == ucSpace);

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 1

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 2

while (iTop > 1 && ifs.eof() == false)

{

readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, ihidden, arrStack, iTop, true);

};

//next char is comma

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucComma);

cout << vpdw.size() << endl;

//next char is space

while (iTop > -1 && ifs.eof() == false)

{

readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, ihidden, arrStack, iTop, false);

};

assert(vpdw.size() == vdb.size());

assert(iTop == -1);

vvAllw.push_back(vpdw);

vvAllb.push_back(vdb);

//clear vpdw and pdb 's contents

vpdw.clear();

vdb.clear();

//close file

ifs.close();

return true;

}

bool LoadWeighFromJson(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb,

const char *filename, const vector<int> &vecSecondDimOfWeigh)

{

cout << "LoadWeighFromJson" << endl;

const int szistsize = 10;

const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';

int arrStack[szistsize], iTop = -1, iIndex = 0;

ifstream ifs;

ifs.open(filename, ios::in | ios::binary);

assert(ifs.is_open());

double *pdvt;

BYTE ucRead;

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

// the stack only holds left brackets: 1 marks a pushed bracket, 0 a cleared slot

arrStack[++iTop] = 1;

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 1

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 2

int iVecWeighSize = vecSecondDimOfWeigh.size();

vector<BYTE> vecDigit;

vector<double *> vpdw;

vector<double> vdb;

//读取iVecWeighSize个[w,b]

for (int i = 0; i < iVecWeighSize; ++i)

{

int iDimesionOfWeigh = vecSecondDimOfWeigh[i];

while (iTop > 1 && ifs.eof() == false)

{

readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, iDimesionOfWeigh, arrStack, iTop, true);

};

//next char is dot

ifs.read((char*)(&ucRead), 1);;

assert(ucRead == ucComma);

cout << vpdw.size() << endl;

//next char is space

while (iTop > 0 && ifs.eof() == false)

{

readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, iDimesionOfWeigh, arrStack, iTop, false);

};

assert(vpdw.size() == vdb.size());

assert(iTop == 0);

vvAllw.push_back(vpdw);

vvAllb.push_back(vdb);

//clear vpdw and pdb 's contents

vpdw.clear();

vdb.clear();

//如果最后一对[w,b]读取完毕，就退出，下一个字符是右括号']'

if (i >= iVecWeighSize - 1)

{

break;

}

//next char is comma

ifs.read((char*)(&ucRead), 1);;

assert(ucRead == ucComma);

//next char is space

ifs.read((char*)(&ucRead), 1);;

assert(ucRead == ucSpace);

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 1

ifs.read((char*)(&ucRead), 1);

assert(ucRead == ucLeftbrace);

arrStack[++iTop] = 1;//iTop is 2

}

ifs.read((char*)(&ucRead), 1);;

assert(ucRead == ucRightbrace);

--iTop;

assert(iTop == -1);

//close file

ifs.close();

return true;

}

// read the test samples (vt) and their labels from JSON; format is [[[], [],..., []],[1, 3, 5,..., 7]]
bool LoadTestSampleFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput)
{
    cout << "LoadTestSampleFromJson" << endl;
    const int szistsize = 10;
    const int ciFeaturesize = m_iInput;
    const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
    int arrStack[szistsize], iTop = -1, iIndex = 0;
    ifstream ifs;
    ifs.open(filename, ios::in | ios::binary);
    assert(ifs.is_open());
    double *pdvt;
    BYTE ucRead;
    ifs.read((char*)(&ucRead), 1);
    assert(ucRead == ucLeftbrace);
    // the stack holds only left brackets: 1 marks a pushed bracket, 0 a cleared slot
    arrStack[++iTop] = 1; // iTop is 0
    ifs.read((char*)(&ucRead), 1);
    assert(ucRead == ucLeftbrace);
    arrStack[++iTop] = 1; // iTop is 1
    vector<BYTE> vecDigit;
    // read the list of sample vectors
    while (iTop > 0 && ifs.eof() == false)
    {
        readDigitFromJson(ifs, vecTrain, vecLabel, vecDigit, pdvt, iIndex, ciFeaturesize, arrStack, iTop, true);
    }
    // next char is a comma (operator>> skips any leading whitespace)
    ifs >> ucRead;
    assert(ucRead == ucComma);
    cout << vecTrain.size() << endl;
    // the label list and the final closing bracket are read by the loop below
    // ifs.read((char*)(&ucRead), 1);
    // assert(ucRead == ucLeftbrace);
    while (iTop > -1 && ifs.eof() == false)
    {
        readDigitFromJson(ifs, vecTrain, vecLabel, vecDigit, pdvt, iIndex, ciFeaturesize, arrStack, iTop, false);
    }
    assert(vecLabel.size() == vecTrain.size());
    assert(iTop == -1);
    ifs.close();
    return true;
}

// build a one-hot label vector of length m_iOut with a 1 at position iMax
void MakeOneLabel(int iMax, double *pdLabel, int m_iOut)
{
    for (int j = 0; j < m_iOut; ++j)
        pdLabel[j] = 0;
    pdLabel[iMax] = 1.0;
}

void MakeCnnSample(double arrInput[2][64], double *pdImage, int iImageWidth, int iNumOfImage)
{
    int iImageSize = iImageWidth * iImageWidth;
    // fill image 0 with i + j + 1 and every later image with i + j
    for (int k = 0; k < iNumOfImage; ++k)
    {
        int iStart = k * iImageSize;
        for (int i = 0; i < iImageWidth; ++i)
        {
            for (int j = 0; j < iImageWidth; ++j)
            {
                int iIndex = iStart + i * iImageWidth + j;
                pdImage[iIndex] = 1;
                pdImage[iIndex] += i + j;
                if (k > 0)
                    pdImage[iIndex] -= 1;
                arrInput[k][i * iImageWidth + j] = pdImage[iIndex];
                //pdImage[iIndex] /= 15.0 ;
            }
        }
    }
    // print every generated image
    cout << "input image is\n";
    for (int k = 0; k < iNumOfImage; ++k)
    {
        int iStart = k * iImageSize;
        cout << "k is " << k << endl;
        for (int i = 0; i < iImageWidth; ++i)
        {
            for (int j = 0; j < iImageWidth; ++j)
            {
                int iIndex = i * iImageWidth + j;
                double dValue = arrInput[k][iIndex];
                cout << dValue << ' ';
            }
            cout << endl;
        }
        cout << endl;
    }
    cout << endl;
}

void MakeCnnWeigh(double *pdKernel, int iNumOfKernel)
{
    const int iKernelWidth = 3;
    double iSum = 0;
    double arrKernel[iKernelWidth][iKernelWidth] = {{4, 7, 1},
                                                    {3, 8, 5},
                                                    {3, 2, 3}};
    // arr2 is not used below
    double arr2[iKernelWidth][iKernelWidth] = {{6, 5, 4},
                                               {5, 4, 3},
                                               {4, 3, 2}};
    // kernel 0 is i + j + 2; every later kernel is a copy of arrKernel
    for (int k = 0; k < iNumOfKernel; ++k)
    {
        int iStart = k * iKernelWidth * iKernelWidth;
        for (int i = 0; i < iKernelWidth; ++i)
        {
            for (int j = 0; j < iKernelWidth; ++j)
            {
                int iIndex = i * iKernelWidth + j + iStart;
                pdKernel[iIndex] = i + j + 2;
                if (k > 0)
                    pdKernel[iIndex] = arrKernel[i][j];
                iSum += pdKernel[iIndex];
            }
        }
    }
    cout << "sum is " << iSum << endl;
    // print every kernel
    for (int k = 0; k < iNumOfKernel; ++k)
    {
        cout << "kernel :" << k << endl;
        int iStart = k * iKernelWidth * iKernelWidth;
        for (int i = 0; i < iKernelWidth; ++i)
        {
            for (int j = 0; j < iKernelWidth; ++j)
            {
                int iIndex = i * iKernelWidth + j + iStart;
                //pdKernel[iIndex] /= (double)iSum;
                cout << pdKernel[iIndex] << ' ';
            }
            cout << endl;
        }
        cout << endl;
    }
    cout << endl;
}
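
The C++ loaders above check the file byte by byte: opening brackets first, then a comma followed by exactly one space between a weight matrix and its bias list. That is the layout Python's `json.dump` produces with its default separators. The sketch below uses made-up toy values (the names `toy`, `text` and `max_bracket_depth` are illustrative only, not part of the original code) to show the text the loaders have to walk through and how deep the brackets nest.

```python
import json

# Toy stand-in for the saved data: a list of [w, b] pairs, where w is a
# 2-D list (one row per unit) and b is a 1-D list. The values are made up.
toy = [
    [[[1.0, 2.0], [3.0, 4.0]], [0.5, 0.25]],  # layer 0: 2x2 weights, 2 biases
    [[[5.0, 6.0]], [0.125]],                  # layer 1: 1x2 weights, 1 bias
]

# Default separators: items are joined by a comma plus one space.
text = json.dumps(toy)
print(text)


def max_bracket_depth(s):
    """Return the deepest '[' nesting level, counting like the C++ stack."""
    depth = 0
    deepest = 0
    for ch in s:
        if ch == '[':
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ']':
            depth -= 1
    assert depth == 0  # every bracket was closed
    return deepest


# Four levels: list of layers, [w, b] pair, weight matrix, matrix row.
print(max_bracket_depth(text))
```

LoadWeighFromJson consumes the first three opening brackets itself, so the fourth level (one bracket per matrix row) is presumably handled inside readDoubleFromJson.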

The code below trains for two epochs and writes out the theano weights:

cnn_mlp_theano.py


#coding=utf-8
import cPickle
import gzip
import os
import sys
import time
import json

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer


class LeNetConvPoolLayer(object):
    """Pool Layer of a convolutional network """

    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        Allocate a LeNetConvPoolLayer with shared variable internal parameters.

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dtensor4
        :param input: symbolic image tensor, of shape image_shape

        :type filter_shape: tuple or list of length 4
        :param filter_shape: (number of filters, num input feature maps,
                              filter height,filter width)

        :type image_shape: tuple or list of length 4
        :param image_shape: (batch size, num input feature maps,
                             image height, image width)

        :type poolsize: tuple or list of length 2
        :param poolsize: the downsampling (pooling) factor (#rows,#cols)
        """
        assert image_shape[1] == filter_shape[1]
        self.input = input

        # there are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit
        fan_in = numpy.prod(filter_shape[1:])
        # each unit in the lower layer receives a gradient from:
        # "num output feature maps * filter height * filter width" /
        # pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
                   numpy.prod(poolsize))
        # initialize weights with random weights
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(numpy.asarray(
            rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
            dtype=theano.config.floatX),
            borrow=True)

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # convolve input feature maps with filters
        conv_out = conv.conv2d(input=input, filters=self.W,
                               filter_shape=filter_shape, image_shape=image_shape)

        # downsample each feature map individually, using maxpooling
        pooled_out = downsample.max_pool_2d(input=conv_out,
                                            ds=poolsize, ignore_border=True)

        # add the bias term. Since the bias is a vector (1D array), we first
        # reshape it to a tensor of shape (1,n_filters,1,1). Each bias will
        # thus be broadcasted across mini-batches and feature map
        # width & height
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]

def getDataNumpy(layers):
    data = []
    for layer in layers:
        wb = layer.params
        w, b = wb[0].get_value(), wb[1].get_value()
        data.append([w, b])
    return data

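def getDataJson(layers):
    data = []
    i = 0
    for layer in layers:
        w, b = layer.params
        # print '..layer is', i
        w, b = w.get_value(), b.get_value()
        wshape = w.shape
        # print '...the shape of w is', wshape
        if len(wshape) == 2:
            # fully-connected layer: swap rows and columns to match the C++ layout
            w = w.transpose()
        else:
            # conv layer: rotate every kernel by 180 degrees, then flatten
            # each filter into one row
            for k in xrange(wshape[0]):
                for j in xrange(wshape[1]):
                    w[k][j] = numpy.rot90(w[k][j], 2)
            w = w.reshape((wshape[0], numpy.prod(wshape[1:])))
        # JSON cannot store numpy arrays, so convert to plain Python lists
        w = w.tolist()
        b = b.tolist()
        data.append([w, b])
        i += 1
    return data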

def writefile(data, name = '../../tmp/src/data/theanocnn.json'):
    print ('writefile is ' + name)
    f = open(name, "wb")
    json.dump(data,f)
    f.close()

def readfile(layers, nkerns, name = '../../tmp/src/data/theanocnn.json'):
    # Load the saved weights
    print ('readfile is ' + name)
    f = open(name, 'rb')
    data = json.load(f)
    f.close()
    readwb(data, layers, nkerns)

def readwb(data, layers, nkerns):
    i = 0
    kernSize = len(nkerns)
    inputnum = 1
    for layer in layers:
        w, b = data[i]
        w = numpy.array(w, dtype='float32')
        b = numpy.array(b, dtype='float32')
        # print '..layer is', i
        # print w.shape
        if i >= kernSize:
            # fully-connected layer: undo the transpose applied in getDataJson
            w = w.transpose()
        else:
            # conv layer: restore the 4D shape (5x5 kernels are assumed here)
            # and undo the 180 degree rotation
            w = w.reshape((nkerns[i], inputnum, 5, 5))
            for k in xrange(nkerns[i]):
                for j in xrange(inputnum):
                    c = w[k][j]
                    w[k][j] = numpy.rot90(c, 2)
            inputnum = nkerns[i]
        # print '..readwb, transpose and rot180'
        # print w.shape
        layer.W.set_value(w, borrow=True)
        layer.b.set_value(b, borrow=True)
        i += 1

def loadwb(classifier, name='theanocnn.json'):
    data = json.load(open(name, 'rb'))
    w, b = data
    print type(w)
    w = numpy.array(w, dtype='float32').transpose()
    classifier.W.set_value(w, borrow=True)
    classifier.b.set_value(b, borrow=True)

def savewb(classifier, name='theanocnn.json'):
    w, b = classifier.params
    w = w.get_value().transpose().tolist()
    b = b.get_value().tolist()
    data = [w, b]
    json.dump(data, open(name, 'wb'))

def evaluate_lenet5(learning_rate=0.1, n_epochs=2,
                    dataset='../../data/mnist.pkl',
                    nkerns=[20, 50], batch_size=500):
    """ Demonstrates lenet on MNIST dataset

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
                          gradient)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: path to the dataset used for training /testing (MNIST here)

    :type nkerns: list of ints
    :param nkerns: number of kernels on each layer
    """
    rng = numpy.random.RandomState(23455)

    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches /= batch_size
    n_valid_batches /= batch_size
    n_test_batches /= batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch
    x = T.matrix('x')    # the data is presented as rasterized images
    y = T.ivector('y')   # the labels are presented as 1D vector of
                         # [int] labels

    ishape = (28, 28)  # this is the size of MNIST images

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # Reshape matrix of rasterized images of shape (batch_size,28*28)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # Construct the first convolutional pooling layer:
    # filtering reduces the image size to (28-5+1,28-5+1)=(24,24)
    # maxpooling reduces this further to (24/2,24/2) = (12,12)
    # 4D output tensor is thus of shape (batch_size,nkerns[0],12,12)
    layer0 = LeNetConvPoolLayer(rng, input=layer0_input,
            image_shape=(batch_size, 1, 28, 28),
            filter_shape=(nkerns[0], 1, 5, 5), poolsize=(2, 2))

    # Construct the second convolutional pooling layer
    # filtering reduces the image size to (12-5+1,12-5+1)=(8,8)
    # maxpooling reduces this further to (8/2,8/2) = (4,4)
    # 4D output tensor is thus of shape (batch_size,nkerns[1],4,4)
    layer1 = LeNetConvPoolLayer(rng, input=layer0.output,
            image_shape=(batch_size, nkerns[0], 12, 12),
            filter_shape=(nkerns[1], nkerns[0], 5, 5), poolsize=(2, 2))

    # the TanhLayer being fully-connected, it operates on 2D matrices of
    # shape (batch_size,num_pixels) (i.e matrix of rasterized images).
    # This will generate a matrix of shape (batch_size,nkerns[1]*4*4),
    # i.e. (500,800) with the default arguments
    layer2_input = layer1.output.flatten(2)

    # construct a fully-connected tanh layer
    layer2 = HiddenLayer(rng, input=layer2_input, n_in=nkerns[1] * 4 * 4,
                         n_out=500, activation=T.tanh)

    # classify the values of the fully-connected tanh layer
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # the cost we minimize during training is the NLL of the model
    cost = layer3.negative_log_likelihood(y)

    # create a function to compute the mistakes that are made by the model
    test_model = theano.function([index], layer3.errors(y),
             givens={
                x: test_set_x[index * batch_size: (index + 1) * batch_size],
                y: test_set_y[index * batch_size: (index + 1) * batch_size]})

    validate_model = theano.function([index], layer3.errors(y),
            givens={
                x: valid_set_x[index * batch_size: (index + 1) * batch_size],
                y: valid_set_y[index * batch_size: (index + 1) * batch_size]})

    # create a list of all model parameters to be fit by gradient descent
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # create a list of gradients for all model parameters
    grads = T.grad(cost, params)

    layers = [layer0, layer1, layer2, layer3]

    # train_model is a function that updates the model parameters by
    # SGD. Since this model has many parameters, it would be tedious to
    # manually create an update rule for each model parameter. We thus
    # create the updates list by automatically looping over all
    # (params[i],grads[i]) pairs.
    updates = []
    for param_i, grad_i in zip(params, grads):
        updates.append((param_i, param_i - learning_rate * grad_i))

    train_model = theano.function([index], cost, updates=updates,
          givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]})

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'
    # early-stopping parameters
    patience = 10000  # look as this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                           # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                  # go through this many
                                  # minibatches before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_params = None
    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        # save the current weights as JSON at the start of every epoch
        print '...epoch is', epoch, 'writefile'
        writefile(getDataJson(layers))
        for minibatch_index in xrange(n_train_batches):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print 'training @ iter = ', iter
            cost_ij = train_model(minibatch_index)

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' % \
                      (epoch, minibatch_index + 1, n_train_batches, \
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    #improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [test_model(i) for i in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)
                    print((' epoch %i, minibatch %i/%i, test error of best '
                           'model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i,'\
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))

    # Reload the weights last written by writefile() (saved at the start of
    # the final epoch) and recompute the validation error from them
    readfile(layers, nkerns)
    validation_losses = [validate_model(i) for i
                         in xrange(n_valid_batches)]
    this_validation_loss = numpy.mean(validation_losses)
    print('validation error %f %%' % \
          (this_validation_loss * 100.))

if __name__ == '__main__':
    evaluate_lenet5()
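
getDataJson rotates every kernel by 180 degrees before saving because, as noted earlier, theano flips the kernel when it convolves, whereas a plain sliding-window loop multiplies the kernel as stored. The following pure-Python sketch (no theano or numpy; all function names are hypothetical helpers, and the toy image and kernel only borrow the patterns used in MakeCnnSample and MakeCnnWeigh) checks that a true convolution equals a sliding-window sum with the rotated kernel. It also shows 2x2 max pooling with the running maximum started at negative infinity rather than 0, the initialization that avoids the pooling bug described above.

```python
def rot180(m):
    """Rotate a 2-D list by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in m[::-1]]


def correlate_valid(img, ker):
    """Slide ker over img without flipping it ('valid' output size)."""
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * ker[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def convolve_valid(img, ker):
    """True 2-D convolution: the kernel is indexed backwards (flipped)."""
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * ker[kh - 1 - i][kw - 1 - j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def max_pool_2x2(fm):
    """2x2 max pooling that ignores a trailing odd row or column."""
    out = []
    for r in range(0, len(fm) - 1, 2):
        row = []
        for c in range(0, len(fm[0]) - 1, 2):
            best = float('-inf')  # not 0: the inputs may all be negative
            for i in range(2):
                for j in range(2):
                    best = max(best, fm[r + i][c + j])
            row.append(best)
        out.append(row)
    return out


# Toy 4x4 image with the pattern i + j + 1 and an asymmetric 3x3 kernel.
img = [[i + j + 1 for j in range(4)] for i in range(4)]
ker = [[4, 7, 1],
       [3, 8, 5],
       [3, 2, 3]]

conv = convolve_valid(img, ker)
corr_rot = correlate_valid(img, rot180(ker))
corr_raw = correlate_valid(img, ker)

print(conv == corr_rot)   # convolution matches correlation with the rotated kernel
print(conv == corr_raw)   # but not correlation with the kernel as stored
print(max_pool_2x2([[-3, -1], [-2, -4]]))
```

Under these assumptions, a C++ forward pass that uses the simple sliding-window loop can consume the rotated weights from the JSON file directly, with no flipping at run time.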
