dinsdag 26 februari 2019

Machine learning in SQL2017 II

I succeeded to implement the machine learning on my laptop. My previous post on SQL server describes the required configuration. For the 2019 version (and possibly also for the 2017 version) the possibility becomes available to use different versions of Python, using a special tool. Its seems that for each version a different PythonLauncher.dll is needed, so probably the process depends on the preparation of different versions of this file by Microsoft.

There are many ways to access SQL from C#. I used the internally developed Data Access Layer for this. Here is the preparation of the SQL statement:


string startSql = EscapeQuote(string.Format(@"Select words = '{0}',  emailFrom = '{1}', emailTo = '{2}', emailCc = '{3}', emailFileRef = '{4}'", CheckNullOrEmpty(d.Words), CheckNullOrEmpty(d.EmailFrom), CheckNullOrEmpty(d.EmailTo), CheckNullOrEmpty(d.EmailCC), CheckNullOrEmpty(d.FileName)));



//Apply learning model:

string pythonScript = GetPythonScript();

string sql = string.Format(@"EXECUTE sp_execute_external_script

@language = N'Python'

, @script = N'{0}'

, @input_data_1 = N'{1}'

, @input_data_1_name = N'doc_data'

WITH RESULT SETS(([Matter] CHAR(100) NOT NULL));

(one has to be sure to also escape the quote in the Python script string)

It appeared that for accessing Python from SQL and back, the 'panda' structure is used:


import pandas as pd






#Input params:

words=str(doc_data.iat[0,0])

emailFrom=str(doc_data.iat[0,1])

emailTo=str(doc_data.iat[0,2])

emailCc=str(doc_data.iat[0,3])

fileName=str(doc_data.iat[0,4])







OutputDataSet = pd.DataFrame(best5idx)


Somehow assigning the results to the OutputDataSet is enough to return it to SQL. The 'WITH RESULT SETS' will assign headers to the output table.

However I encountered 2 issues:
  • When deployed on the server I got a 'memory error'. The model has to load more than 600MB of weight-values. I found that when more then 70% of the servers memory is used, one can get these kind of errors. The problem could possibly be resolved just by ordering more memory for the virtual server (7,50 euro/month).
  • It looked like for each prediction the model had to be recompiled, and the weights-file reloaded, because the SQL server starts the Python script, executes it, and then closes the session after getting the results. A sort-of solution for this could have been to speed up the process by declaring a virtual disk.
The developed system involved going from the rest service in C# to SQL to Python and back. Apart from the 2 issues it seemed overcomplicated, so I decided to try another approach: the Python Microservice.

Geen opmerkingen:

Een reactie posten