(For more resources related to this topic, see here.)
Let’s look at an example of a house price-based regression model, and create some real data to examine. These are actual numbers from houses for sale, and we will be trying to find the value of a house we are supposed to sell:
Size (m2) | Land (m2) | Rooms | Granite | Extra bathroom | Price |
1076 | 2801 | 6 | 0 | 0 | €324.500,00 |
990 | 3067 | 5 | 1 | 1 | €466.000,00 |
1229 | 3094 | 5 | 0 | 1 | €425.900,00 |
731 | 4315 | 4 | 1 | 0 | €387.120,00 |
671 | 2926 | 4 | 0 | 1 | €312.100,00 |
1078 | 6094 | 6 | 1 | 1 | €603.000,00 |
909 | 2854 | 5 | 0 | 1 | €383.400,00 |
975 | 2947 | 5 | 1 | 1 | ?? |
To load files in Weka, we have to put the table in the ARFF file format and save it as house.arff. Make sure the attributes are numeric, as shown here:
@RELATION house
@ATTRIBUTE size NUMERIC
@ATTRIBUTE land NUMERIC
@ATTRIBUTE rooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE extra_bathroom NUMERIC
@ATTRIBUTE price NUMERIC
@DATA
1076,2801,6,0,0,324500
990,3067,5,1,1,466000
1229,3094,5,0,1,425900
731,4315,4,1,0,387120
671,2926,4,0,1,312100
1078,6094,6,1,1,603000
909,2854,5,0,1,383400
975,2947,5,1,1,?
Use the following snippet:
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;
public class Regression{
public static void main(String args[]) throws Exception{
//load data
Instances data = new Instances(new BufferedReader(new
FileReader("dataset/house.arff")));
data.setClassIndex(data.numAttributes() - 1);
//build model
LinearRegression model = new LinearRegression();
model.buildClassifier(data); //the last instance with missing
class is not used
System.out.println(model);
//classify the last instance
Instance myHouse = data.lastInstance();
double price = model.classifyInstance(myHouse);
System.out.println("My house ("+myHouse+"): "+price);
}
}
Here is the output:
Linear Regression Model
price =
195.2035 * size +
38.9694 * land +
76218.4642 * granite +
73947.2118 * extra_bathroom +
2681.136
My house (975,2947,5,1,1,?): 458013.16703945777
Import a basic regression model named weka.classifiers.functions.LinearRegression:
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;
Load the house dataset:
Instances data = new Instances(new BufferedReader(new
FileReader("dataset/house.arff")));
data.setClassIndex(data.numAttributes() - 1);
Initialize and build a regression model. Note, that the last instance is not used for building the model since the class value is missing:
LinearRegression model = new LinearRegression();
model.buildClassifier(data);
Output the model:
System.out.println(model);
Use the model to predict the price of the last instance in the dataset:
Instance myHouse = data.lastInstance();
double price = model.classifyInstance(myHouse);
System.out.println("My house ("+myHouse+"): "+price);
This section lists some additional algorithms.
There is a wide variety of implemented regression algorithms one can use in Weka:
We learned how to use models that predict a value of numerical class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, the regression builds a model, usually an equation that is used to compute the predicted class value.
Further resources on this subject:
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…