




















































Let's look at an example of a house price regression model built from real data. The following table lists actual figures for houses on sale; the last row is the house we are supposed to sell, whose price we will try to estimate:
Size (m2) | Land (m2) | Rooms | Granite | Extra bathroom | Price |
1076 | 2801 | 6 | 0 | 0 | €324.500,00 |
990 | 3067 | 5 | 1 | 1 | €466.000,00 |
1229 | 3094 | 5 | 0 | 1 | €425.900,00 |
731 | 4315 | 4 | 1 | 0 | €387.120,00 |
671 | 2926 | 4 | 0 | 1 | €312.100,00 |
1078 | 6094 | 6 | 1 | 1 | €603.000,00 |
909 | 2854 | 5 | 0 | 1 | €383.400,00 |
975 | 2947 | 5 | 1 | 1 | ?? |
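To load the data in Weka, we have to put the table in the ARFF file format and save it as house.arff (the program that follows reads it from dataset/house.arff). Make sure the attributes are numeric, and mark the unknown price of the last house with a question mark, as shown here: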
@RELATION house
@ATTRIBUTE size NUMERIC
@ATTRIBUTE land NUMERIC
@ATTRIBUTE rooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE extra_bathroom NUMERIC
@ATTRIBUTE price NUMERIC
@DATA
1076,2801,6,0,0,324500
990,3067,5,1,1,466000
1229,3094,5,0,1,425900
731,4315,4,1,0,387120
671,2926,4,0,1,312100
1078,6094,6,1,1,603000
909,2854,5,0,1,383400
975,2947,5,1,1,?
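If the table lives in code rather than in a text editor, the same file can be generated programmatically. The following is a minimal sketch using only the Java standard library; the class name HouseArffWriter and the helper toArff are hypothetical names introduced for illustration, and a null cell is assumed to stand for the unknown price:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class HouseArffWriter {

    // Attribute names in column order; the last one (price) is the class.
    static final String[] ATTRIBUTES =
        {"size", "land", "rooms", "granite", "extra_bathroom", "price"};

    // Builds the ARFF lines; a null cell is written as '?', the ARFF missing-value marker.
    static List<String> toArff(String relation, Integer[][] rows) {
        List<String> lines = new ArrayList<>();
        lines.add("@RELATION " + relation);
        for (String name : ATTRIBUTES) {
            lines.add("@ATTRIBUTE " + name + " NUMERIC");
        }
        lines.add("@DATA");
        for (Integer[] row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i] == null ? "?" : row[i].toString());
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Integer[][] rows = {
            {1076, 2801, 6, 0, 0, 324500},
            {990, 3067, 5, 1, 1, 466000},
            {1229, 3094, 5, 0, 1, 425900},
            {731, 4315, 4, 1, 0, 387120},
            {671, 2926, 4, 0, 1, 312100},
            {1078, 6094, 6, 1, 1, 603000},
            {909, 2854, 5, 0, 1, 383400},
            {975, 2947, 5, 1, 1, null}   // price unknown
        };
        // Writes house.arff into the working directory; move it to dataset/
        // if the loading code expects it there.
        Files.write(Paths.get("house.arff"), toArff("house", rows));
    }
}
```

Keeping the formatting in a separate method makes it easy to check the generated lines without touching the file system.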
The following program loads the dataset, builds a linear regression model, and predicts the price of the last house:
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;

public class Regression {
    public static void main(String[] args) throws Exception {
        // load the data; the last attribute (price) is the class to predict
        Instances data = new Instances(new BufferedReader(
                new FileReader("dataset/house.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // build the model; the last instance, whose class is missing, is not used
        LinearRegression model = new LinearRegression();
        model.buildClassifier(data);
        System.out.println(model);

        // classify the last instance
        Instance myHouse = data.lastInstance();
        double price = model.classifyInstance(myHouse);
        System.out.println("My house (" + myHouse + "): " + price);
    }
}
Here is the output:
Linear Regression Model
price =
195.2035 * size +
38.9694 * land +
76218.4642 * granite +
73947.2118 * extra_bathroom +
2681.136
My house (975,2947,5,1,1,?): 458013.16703945777
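To see where the predicted price comes from, the printed equation can be evaluated by hand. The sketch below hard-codes the coefficients exactly as they appear in the output above; the class name HousePriceByHand and the method predict are hypothetical. Note that rooms does not appear in the printed equation, so it is not an input here.

```java
class HousePriceByHand {

    // Coefficients copied from the printed model, which rounds them
    // to four decimal places.
    static double predict(double size, double land,
                          double granite, double extraBathroom) {
        return 195.2035 * size
             + 38.9694 * land
             + 76218.4642 * granite
             + 73947.2118 * extraBathroom
             + 2681.136;
    }

    public static void main(String[] args) {
        // The house we want to sell: 975,2947,5,1,1,?
        double price = predict(975, 2947, 1, 1);
        System.out.printf("Predicted price: %.2f%n", price);
    }
}
```

The result is close to 458013, within a fraction of a euro of the value Weka reports; the small gap arises because the printed coefficients are rounded, while Weka computes with full precision.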
Import a basic regression model named weka.classifiers.functions.LinearRegression:
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instance;
import weka.core.Instances;
import weka.classifiers.functions.LinearRegression;
Load the house dataset:
Instances data = new Instances(new BufferedReader(new
FileReader("dataset/house.arff")));
data.setClassIndex(data.numAttributes() - 1);
Initialize and build a regression model. Note that the last instance is not used for building the model, since its class value is missing:
LinearRegression model = new LinearRegression();
model.buildClassifier(data);
Output the model:
System.out.println(model);
Use the model to predict the price of the last instance in the dataset:
Instance myHouse = data.lastInstance();
double price = model.classifyInstance(myHouse);
System.out.println("My house ("+myHouse+"): "+price);
Linear regression is not the only option: Weka implements a wide variety of other regression algorithms.
We learned how to use models that predict the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, regression builds a model, usually an equation, that is used to compute the predicted class value.
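To make the idea of "a model as an equation" concrete, here is a minimal sketch of ordinary least squares on a single attribute, written from scratch with the standard library. It is not how Weka's LinearRegression works internally (that class handles several attributes at once); the class name SimpleLeastSquares and the choice of land as the only predictor are assumptions made purely for illustration.

```java
class SimpleLeastSquares {

    // Returns {slope, intercept} of the least-squares line
    // y = slope * x + intercept.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    public static void main(String[] args) {
        // Land and price of the seven houses with a known price.
        double[] land  = {2801, 3067, 3094, 4315, 2926, 6094, 2854};
        double[] price = {324500, 466000, 425900, 387120, 312100, 603000, 383400};

        double[] line = fit(land, price);
        System.out.println("price = " + line[0] + " * land + " + line[1]);

        // Estimate for the house we want to sell (land = 2947).
        System.out.println("Estimate: " + (line[0] * 2947 + line[1]));
    }
}
```

A single-attribute fit ignores size, granite, and the extra bathroom, so its estimate will differ from the multi-attribute model built by Weka.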