Artificial Intelligence prepping data – Back to basics, part 1

What did street-legal cars teach me about machine learning? It’s all about having the right data available, and the RDW data can’t be trusted!

I impulsively signed up for an Artificial Intelligence certification track 2 months ago, so I’ve been experimenting with Artificial Intelligence for a while now, and at the beginning of the course it was one tough cookie! Those formulas for interpreting data predictability really freaked me out!

But once I got past the formulas and saw how much the workspace resembles Microsoft products like SSIS and BI, I saw endless possibilities. This takes data to a whole new level.

Preparing the data:

I ran a test on all cars currently on the road in the Netherlands and combined it with performance data. I wanted to find the fastest street-legal car; I guess I just wanted to find out what kind of cars I should fancy these days according to the performance stats.

I used an open data set from the Dutch RDW (Rijksdienst voor het Wegverkeer, the Dutch vehicle authority). It contained 14 million rows and was 7 GB in size, so I had to prepare the data to keep the experiment basic and the performance high. I imported it into my SQL Server and filtered out the station wagons, campers, scooters and trailers, which left me with a data set of 900,000 rows.
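The filtering itself was done in SQL Server. As a minimal sketch of the same idea in Python, assume each record is a dict with a hypothetical `vehicle_type` field; the category labels in `EXCLUDED_TYPES` are illustrative, not the actual RDW column values.

```python
from typing import Iterable, Iterator

# Hypothetical category labels for illustration; the real RDW data uses
# its own column names and values, which are not reproduced here.
EXCLUDED_TYPES = {"station wagon", "camper", "scooter", "trailer"}


def filter_vehicles(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield the rows whose vehicle type is not in EXCLUDED_TYPES.

    Matching ignores case and surrounding whitespace. Rows with no
    vehicle type at all are kept, because we cannot tell what they are.
    """
    for row in rows:
        vehicle_type = (row.get("vehicle_type") or "").strip().lower()
        if vehicle_type not in EXCLUDED_TYPES:
            yield row


if __name__ == "__main__":
    sample = [
        {"plate": "AA-01", "vehicle_type": "Passenger car"},
        {"plate": "AA-02", "vehicle_type": "Camper"},
        {"plate": "AA-03", "vehicle_type": " trailer "},
    ]
    print([r["plate"] for r in filter_vehicles(sample)])  # ['AA-01']
```

Because `filter_vehicles` is a generator, it streams rows one at a time, which matters when the source has 14 million of them.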

I used SQL Server 2017 and Microsoft Azure Machine Learning Studio to create a new experiment.

To make a prediction, I needed to combine the brand data with the engine displacement data to see which models are high-performance based on engine capacity, because horsepower data was not available. Sadly, this means smaller supercharged engines are not correctly represented in the prediction.
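The combination step can be sketched in Python as a join followed by a ranking. This assumes a hypothetical `plate` key shared by the brand records and a displacement lookup in cc; the original work was a SQL join, and the field names here are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Vehicle:
    plate: str  # hypothetical join key shared by both data sets
    brand: str
    model: str


def top_models_by_displacement(
    vehicles: List[Vehicle],
    displacement_cc: Dict[str, Optional[int]],
    n: int = 3,
) -> List[Tuple[str, str, int]]:
    """Join vehicles to their engine displacement (in cc) by plate and
    return the n models with the largest engine, largest first.

    Vehicles without a displacement value are dropped, like an inner join.
    Each (brand, model) appears once, with its largest displacement.
    """
    best: Dict[Tuple[str, str], int] = {}
    for v in vehicles:
        cc = displacement_cc.get(v.plate)
        if cc is None:
            continue
        key = (v.brand, v.model)
        best[key] = max(cc, best.get(key, 0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(brand, model, cc) for (brand, model), cc in ranked[:n]]
```

Ranking purely on displacement is exactly what hides small supercharged engines: a 2.0-litre turbo simply sorts below any large naturally aspirated engine.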

Running the calculation based on the rules above took about 15 minutes on a local SQL Server instance on an i5 laptop. I needed more data preparation.

Based on engine displacement, a top 3 came up, but I didn’t like the results at all. Sure, the engine displacement was high, but the cars are heavy and their performance isn’t the best. Superchargers, turbos and gearing make all the difference, but they aren’t properly represented in this data.

I had to filter out a lot of data. Next, I added the weight of the car, but that wasn’t trustworthy either. I found a data set containing the kW rating and top speed of the cars, joined it with my current results, and added a calculation in SQL that multiplies the kW column by 1.362 to get the car’s horsepower. The hp outcome looks pretty accurate. After 4 hours of combining data and filtering queries, I gave up. Based on this data there is no way you can truly point out the fastest cars, so I had to change my plans. There are too many uncertain variables to make a decent prediction, and this isn’t even close to the start of an AI project 🙁

Lots of NULL data
This No. 1 car can’t be trusted!
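The kW-to-horsepower step can be sketched in Python as a lookup plus a multiplication. The factor 1.362 is the one from the post (1 kW is about 1.3596 metric horsepower, PS; for mechanical horsepower the factor would be about 1.341). Keying the kW data by a `model` field is an assumption for illustration, and NULLs are carried through as `None` rather than guessed.

```python
from typing import Dict, List, Optional

# Conversion factor used in the post. 1 kW is about 1.3596 metric
# horsepower (PS); for mechanical horsepower it would be about 1.341.
KW_TO_HP = 1.362


def kw_to_hp(kw: Optional[float]) -> Optional[float]:
    """Convert power in kW to horsepower; a missing (NULL) value stays None."""
    if kw is None:
        return None
    return kw * KW_TO_HP


def add_horsepower(cars: List[dict], power_kw: Dict[str, float]) -> List[dict]:
    """Return copies of the cars with an 'hp' field looked up by model.

    This mimics a LEFT JOIN: cars with no match in power_kw get hp None.
    """
    return [{**car, "hp": kw_to_hp(power_kw.get(car["model"]))} for car in cars]
```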

After more data crunching, the results still weren’t really worth displaying. So here is a top 21 of “fastest” cars… based on… well, the obvious HP and weight sorting:
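The post doesn’t spell out the exact sort, so this Python sketch assumes horsepower descending with weight ascending as the tie-breaker, and skips rows where either value is NULL. The field names `hp` and `weight_kg` are hypothetical.

```python
from typing import List


def top_fastest(cars: List[dict], n: int = 21) -> List[dict]:
    """Rank cars by horsepower (highest first), breaking ties by weight
    (lightest first), and return the top n.

    Cars with a missing hp or weight are skipped, since NULLs cannot be
    ranked meaningfully.
    """
    complete = [
        c for c in cars
        if c.get("hp") is not None and c.get("weight_kg") is not None
    ]
    complete.sort(key=lambda c: (-c["hp"], c["weight_kg"]))
    return complete[:n]
```

An alternative that weighs both columns equally would be to sort on power-to-weight ratio (`hp / weight_kg`) instead of using weight only as a tie-breaker.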

Btw, did you know there is only one Koenigsegg on Dutch roads?

OK, I got a little carried away with data prepping.

Now let’s import it into an AI experiment. First you need to create a resource in the Azure portal for your workspace. I won’t go into detail; we did this before!

Verify that you created the following new resources: a Machine Learning workspace, a Machine Learning plan and a storage account.

Browse to the Machine learning workspace you created and launch Machine Learning Studio. This opens a new browser page.

Go to Experiments and click NEW in the bottom-left corner.

Rename the experiment and add a dataset by uploading a new one: Datasets –> NEW –> Select data to upload. Once the dataset is ready, you can drag it into your experiment and start running tests with different variables on the data.
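The upload step needs a local file. A minimal sketch, assuming you export the prepared results as a CSV with a header row and upload that file; the file name and column list you pass in are up to you.

```python
import csv
from typing import Iterable, List


def write_upload_csv(rows: Iterable[dict], path: str, fieldnames: List[str]) -> None:
    """Write rows to a UTF-8 CSV file with a header row.

    Only the listed fieldnames are written (extra keys are ignored),
    missing keys become empty fields, and None is written as an empty
    field by the csv module.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_upload_csv(results, "fastest_cars.csv", ["brand", "model", "hp"])` (hypothetical names) produces a file you can pick in the "Select data to upload" dialog. Writing NULLs as empty fields keeps missing values visible as missing instead of turning them into zeros.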

In my next post, we will dive deeper into Artificial Intelligence.

Polybase installation on SQL Server 2017, part I: Oracle JRE 7 Update 51 (64-bit) or higher is required

A fresh new year, so a good time to check out the newest SQL Server! So far, the installation process of SQL Server 2017 brings no big surprises. Just like with SQL Server 2016, SSMS is a separate, optional download from the Microsoft website; the link is provided once the installation has finished.

SQL Server 2017

Next, the installation and configuration start. I’ll highlight the one pain in the ass I encountered this time.

I already talked about the Polybase feature in a podcast in early 2016, but this time it’s an install and setup walkthrough, plus a warning for everyone bravely installing Oracle’s newest version of Java.

When you select Polybase for installation and you paid close attention, or already used it in the 2016 edition, you know that you need the Oracle Java SE Runtime Environment.

Polybase Oracle JRE

If it is not already installed on your computer, the installation will fail with this message:

---------------------------
Rule Check Result
---------------------------
Rule "Oracle JRE 7 Update 51 (64-bit) or higher is required for Polybase" failed.

This computer does not have the Oracle Java SE Runtime Environment Version 7 Update 51 (64-bit) or higher installed. The Oracle Java SE Runtime Environment is software provided by a third party. Microsoft grants you no rights for such third-party software. You are responsible for and must separately locate, read and accept applicable third-party license terms. To continue, download the Oracle SE Java Runtime Environment from https://go.microsoft.com/fwlink/?LinkId=526030.
---------------------------
OK
---------------------------

You need to head over to oracle.com and install version 7 Update 51 or higher. Currently 9.0.1 is the highest version, so it seems legit to install that one.
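"7 Update 51 or higher" is a version threshold, but Java changed how it writes versions: Java 8 and earlier report strings like `1.8.0_151`, while Java 9 adopted the new scheme from JEP 223 (`9.0.1`). This Python sketch normalises both into a comparable `(major, update)` pair; mapping Java 9’s security number onto "update" is a simplifying assumption that only matters within the same major version.

```python
import re
from typing import Tuple

# The rule from the setup message: "Oracle JRE 7 Update 51 (64-bit) or higher".
MINIMUM = (7, 51)


def parse_java_version(version: str) -> Tuple[int, int]:
    """Turn a Java version string into a comparable (major, update) pair.

    Java 8 and earlier use '1.<major>.0_<update>', e.g. '1.8.0_151'.
    Java 9 switched to '<major>.<minor>.<security>', e.g. '9.0.1'; here
    the security number stands in for the update number.
    """
    legacy = re.fullmatch(r"1\.(\d+)\.\d+(?:_(\d+))?", version)
    if legacy:
        return int(legacy.group(1)), int(legacy.group(2) or 0)
    modern = re.fullmatch(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if modern:
        return int(modern.group(1)), int(modern.group(3) or 0)
    raise ValueError(f"unrecognised Java version string: {version!r}")


def meets_polybase_minimum(version: str) -> bool:
    return parse_java_version(version) >= MINIMUM


if __name__ == "__main__":
    for v in ("1.7.0_45", "1.7.0_51", "1.8.0_151", "9.0.1"):
        print(v, meets_polybase_minimum(v))  # False, then True three times
```

By the numbers alone, both 9.0.1 and 8 Update 151 clear the bar, so a rejected Java 9 install points at how setup detects the JRE rather than at the version being too low.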

Java install

Once you have downloaded the correct product (in my case, I chose Windows Offline), run the Java install and return to your SQL Server setup for a re-run.

Wait, what? Same message! “Requires JRE 7 Update 51 or higher”. I had just installed the latest JRE version and done a restart, and Java is up and running.
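To double-check what is actually installed, including the "(64-bit)" part of the rule, you can run `java -version` and read its output. This Python sketch does that; note that `java -version` writes to stderr, and that it only reports the Java found on your PATH. SQL Server setup may detect the JRE differently, so a good result here does not guarantee the setup rule passes.

```python
import re
import subprocess
from typing import Optional, Tuple


def describe_java(output: str) -> Tuple[Optional[str], bool]:
    """Extract (version string, is 64-bit) from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    version = match.group(1) if match else None
    return version, "64-Bit" in output


def installed_java() -> Optional[Tuple[Optional[str], bool]]:
    """Run `java -version` and describe the JRE found on the PATH.

    `java -version` prints to stderr, so both streams are captured.
    Returns None when no java executable can be started.
    """
    try:
        proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return describe_java(proc.stderr + proc.stdout)


if __name__ == "__main__":
    print(installed_java())
```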

So this is the moment you ask yourself: do I really, really want the Polybase feature that badly? The answer is yes! To start troubleshooting, I decided to go back a version. The oldest version available on the site without signing in with my Oracle account is 8 Update 151, and guess what… this did the trick!

So stay away from the newest Java 9 for as long as possible.

The next post will cover the setup and configuration of Polybase.