{
"cells": [
{
"cell_type": "markdown",
"id": "2f8e19e4",
"metadata": {},
"source": [
"# Linear Regression with Multiple Features ($d>1$)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "643861b2",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"# plotting settings\n",
"pd.plotting.register_matplotlib_converters()\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import seaborn as sns\n",
"from tqdm.notebook import tqdm"
]
},
{
"cell_type": "markdown",
"id": "315bd31f",
"metadata": {},
"source": [
"As an example, we use the [Melbourne Housing Snapshot](https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot) dataset. It is also available in Moodle under `data/kaggle/melb_data.csv`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e3381ac0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',\n",
" 'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',\n",
" 'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',\n",
" 'Longtitude', 'Regionname', 'Propertycount'],\n",
" dtype='object')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"melbourne_file_path = 'data/melb_data.csv'\n",
"melbourne_data = pd.read_csv(melbourne_file_path)\n",
"melbourne_data = melbourne_data.dropna(axis=0) # drop rows with missing values\n",
"melbourne_data.columns # column names of the table (potential features)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0f80237c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Suburb</th>\n",
" <th>Address</th>\n",
" <th>Rooms</th>\n",
" <th>Type</th>\n",
" <th>Price</th>\n",
" <th>Method</th>\n",
" <th>SellerG</th>\n",
" <th>Date</th>\n",
" <th>Distance</th>\n",
" <th>Postcode</th>\n",
" <th>...</th>\n",
" <th>Bathroom</th>\n",
" <th>Car</th>\n",
" <th>Landsize</th>\n",
" <th>BuildingArea</th>\n",
" <th>YearBuilt</th>\n",
" <th>CouncilArea</th>\n",
" <th>Lattitude</th>\n",
" <th>Longtitude</th>\n",
" <th>Regionname</th>\n",
" <th>Propertycount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Abbotsford</td>\n",
" <td>25 Bloomburg St</td>\n",
" <td>2</td>\n",
" <td>h</td>\n",
" <td>1035000.0</td>\n",
" <td>S</td>\n",
" <td>Biggin</td>\n",
" <td>4/02/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>156.0</td>\n",
" <td>79.0</td>\n",
" <td>1900.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8079</td>\n",
" <td>144.9934</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abbotsford</td>\n",
" <td>5 Charles St</td>\n",
" <td>3</td>\n",
" <td>h</td>\n",
" <td>1465000.0</td>\n",
" <td>SP</td>\n",
" <td>Biggin</td>\n",
" <td>4/03/2017</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>134.0</td>\n",
" <td>150.0</td>\n",
" <td>1900.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8093</td>\n",
" <td>144.9944</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Abbotsford</td>\n",
" <td>55a Park St</td>\n",
" <td>4</td>\n",
" <td>h</td>\n",
" <td>1600000.0</td>\n",
" <td>VB</td>\n",
" <td>Nelson</td>\n",
" <td>4/06/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>120.0</td>\n",
" <td>142.0</td>\n",
" <td>2014.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8072</td>\n",
" <td>144.9941</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Abbotsford</td>\n",
" <td>124 Yarra St</td>\n",
" <td>3</td>\n",
" <td>h</td>\n",
" <td>1876000.0</td>\n",
" <td>S</td>\n",
" <td>Nelson</td>\n",
" <td>7/05/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>245.0</td>\n",
" <td>210.0</td>\n",
" <td>1910.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8024</td>\n",
" <td>144.9993</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Abbotsford</td>\n",
" <td>98 Charles St</td>\n",
" <td>2</td>\n",
" <td>h</td>\n",
" <td>1636000.0</td>\n",
" <td>S</td>\n",
" <td>Nelson</td>\n",
" <td>8/10/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>256.0</td>\n",
" <td>107.0</td>\n",
" <td>1890.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8060</td>\n",
" <td>144.9954</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" Suburb Address Rooms Type Price Method SellerG \\\n",
"1 Abbotsford 25 Bloomburg St 2 h 1035000.0 S Biggin \n",
"2 Abbotsford 5 Charles St 3 h 1465000.0 SP Biggin \n",
"4 Abbotsford 55a Park St 4 h 1600000.0 VB Nelson \n",
"6 Abbotsford 124 Yarra St 3 h 1876000.0 S Nelson \n",
"7 Abbotsford 98 Charles St 2 h 1636000.0 S Nelson \n",
"\n",
" Date Distance Postcode ... Bathroom Car Landsize BuildingArea \\\n",
"1 4/02/2016 2.5 3067.0 ... 1.0 0.0 156.0 79.0 \n",
"2 4/03/2017 2.5 3067.0 ... 2.0 0.0 134.0 150.0 \n",
"4 4/06/2016 2.5 3067.0 ... 1.0 2.0 120.0 142.0 \n",
"6 7/05/2016 2.5 3067.0 ... 2.0 0.0 245.0 210.0 \n",
"7 8/10/2016 2.5 3067.0 ... 1.0 2.0 256.0 107.0 \n",
"\n",
" YearBuilt CouncilArea Lattitude Longtitude Regionname \\\n",
"1 1900.0 Yarra -37.8079 144.9934 Northern Metropolitan \n",
"2 1900.0 Yarra -37.8093 144.9944 Northern Metropolitan \n",
"4 2014.0 Yarra -37.8072 144.9941 Northern Metropolitan \n",
"6 1910.0 Yarra -37.8024 144.9993 Northern Metropolitan \n",
"7 1890.0 Yarra -37.8060 144.9954 Northern Metropolitan \n",
"\n",
" Propertycount \n",
"1 4019.0 \n",
"2 4019.0 \n",
"4 4019.0 \n",
"6 4019.0 \n",
"7 4019.0 \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"melbourne_data.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b4939e52",
"metadata": {},
"outputs": [],
"source": [
"#features = ['BuildingArea', 'Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude', 'YearBuilt', 'Distance']\n",
"features = ['Rooms', 'BuildingArea']\n",
"data = melbourne_data[features + ['Price']]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "47f35849",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Rooms</th>\n",
" <th>BuildingArea</th>\n",
" <th>Price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>6196.000000</td>\n",
" <td>6196.000000</td>\n",
" <td>6.196000e+03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>2.931407</td>\n",
" <td>141.568645</td>\n",
" <td>1.068828e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.971079</td>\n",
" <td>90.834824</td>\n",
" <td>6.751564e+05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.310000e+05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>2.000000</td>\n",
" <td>91.000000</td>\n",
" <td>6.200000e+05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>3.000000</td>\n",
" <td>124.000000</td>\n",
" <td>8.800000e+05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>4.000000</td>\n",
" <td>170.000000</td>\n",
" <td>1.325000e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>8.000000</td>\n",
" <td>3112.000000</td>\n",
" <td>9.000000e+06</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"             Rooms  BuildingArea         Price\n",
"count  6196.000000   6196.000000  6.196000e+03\n",
"mean      2.931407    141.568645  1.068828e+06\n",
"std       0.971079     90.834824  6.751564e+05\n",
"min       1.000000      0.000000  1.310000e+05\n",
"25%       2.000000     91.000000  6.200000e+05\n",
"50%       3.000000    124.000000  8.800000e+05\n",
"75%       4.000000    170.000000  1.325000e+06\n",
"max       8.000000   3112.000000  9.000000e+06"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ed0fdea0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Rooms</th>\n",
" <th>BuildingArea</th>\n",
" <th>Price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>79.0</td>\n",
" <td>1035000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>150.0</td>\n",
" <td>1465000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>142.0</td>\n",
" <td>1600000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>3</td>\n",
" <td>210.0</td>\n",
" <td>1876000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2</td>\n",
" <td>107.0</td>\n",
" <td>1636000.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"   Rooms  BuildingArea      Price\n",
"1      2          79.0  1035000.0\n",
"2      3         150.0  1465000.0\n",
"4      4         142.0  1600000.0\n",
"6      3         210.0  1876000.0\n",
"7      2         107.0  1636000.0"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "markdown",
"id": "b5126919",
"metadata": {},
"source": [
"## Defining the Functions for Linear Regression"
]
},
{
"cell_type": "markdown",
"id": "dacabf66",
"metadata": {},
"source": [
"From the lecture:\n",
"\n",
"$$ h(x, w) = w^T x . $$\n",
"\n",
"In the lecture we used $\\theta$ instead of $w$.\n",
"\n",
"**Important:** This definition of $h$ assumes that the first component of $x$, i.e. `x[0]` in Python code, is always 1.\n",
"\n",
"We can also define a vectorized form of $h(x, w)$ in which the input $X$ contains several (or all) training examples and the output is a vector holding the value of $h$ for each of them. In matrix notation:\n",
"\n",
"$$ h(X, w) = X w , $$\n",
"\n",
"where each row of $X$ is one training example (including the \"1\" in the first component).\n",
"\n",
"Because of the way `numpy` handles the special case of multiplying two vectors, we can unify the code for both variants of $h$ above and define a single function $h(x, w)$ that works with one input row as well as with several.\n",
"\n",
"When two one-dimensional numpy arrays (i.e. two vectors) are multiplied with the `@` operator, numpy always computes their dot product, without either vector having to be transposed. That is, for two column vectors $w, x$ the correct notation is, strictly speaking:\n",
"$$w^T x$$\n",
"but numpy lets us simply write `w @ x` or `x @ w` instead of `w.T @ x` (which also works, because `.T` leaves a one-dimensional array unchanged).\n",
"\n",
"This makes it easy to write a vectorized form of $h(x, w)$ that works both when the parameter `x` is a single row of input data (e.g. one training example) and when it is the whole feature matrix `X`, containing all (or several) training examples at once."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "14116a52",
"metadata": {},
"outputs": [],
"source": [
"def h(x, w):\n",
"    \"\"\"x and w are numpy arrays; x can also be the complete feature matrix\"\"\"\n",
"    # This form allows passing a whole (feature) matrix for x. Each row of the matrix\n",
"    # holds one data point for which h is to be computed.\n",
"    # x @ w is then a vector with one result per row of the (feature) matrix passed in.\n",
"    return x @ w"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "82129a25",
"metadata": {},
"outputs": [],
"source": [
"# Definition of the cost function\n",
"def J(w, X, y):\n",
"    \"\"\"\n",
"    w, X, y must be numpy arrays\n",
"    X: feature matrix of all training data incl. the column of ones; shape: n x (d+1)\n",
"    y: vector of all targets belonging to X\n",
"    \"\"\"\n",
"    errors = y - h(x=X, w=w)\n",
"    mse = 1.0/(2.0*len(y)) * ( errors @ errors )\n",
"    return mse"
]
},
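{
"cell_type": "markdown",
"id": "7c2d41aa",
"metadata": {},
"source": [
"As a reading aid, the cost function that `J` computes can be written out as follows. This is a sketch inferred from the code above; $n$ (the number of training examples, `len(y)`) and the superscript $(i)$ for the $i$-th example are notation assumed here, not taken from the lecture:\n",
"\n",
"$$ J(w) = \\frac{1}{2n} \\sum_{i=1}^{n} \\left( y^{(i)} - h(x^{(i)}, w) \\right)^2 = \\frac{1}{2n} (y - Xw)^T (y - Xw) . $$\n",
"\n",
"The second form is the one evaluated by `errors @ errors`. Because of the factor $\\frac{1}{2}$, the returned value is half the mean squared error, even though the variable is called `mse`."
]
},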
{
"cell_type": "code",
"execution_count": 9,
"id": "4209dc9c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6196, 3)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.shape"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8b34a5c7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1.],\n",
" [1.],\n",
" [1.],\n",
" ...,\n",
" [1.],\n",
" [1.],\n",
" [1.]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.ones((len(data),1))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "6632cabe",
"metadata": {},
"outputs": [],
"source": [
"def feature_matrix_from_data(data):\n",
"    # build the matrix of input data (features), including the leading column of ones\n",
"    return np.hstack((np.ones((len(data),1)), data.to_numpy(copy=True)))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "74556a69",
"metadata": {},
"outputs": [],
"source": [
"# build the matrix of input data (features), including the leading column of ones\n",
"#X = np.hstack((np.ones((len(data),1)), data[features].to_numpy(copy=True)))\n",
"X = feature_matrix_from_data(data[features])\n",
"# and also the vector of targets\n",
"y = data.Price.to_numpy(copy=True)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "79f5e3e0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6196, 3)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.shape"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "8f9724c3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1. , 2. , 79. ],\n",
" [ 1. , 3. , 150. ],\n",
" [ 1. , 4. , 142. ],\n",
" ...,\n",
" [ 1. , 1. , 35.64],\n",
" [ 1. , 2. , 61.6 ],\n",
" [ 1. , 6. , 388.5 ]])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X"
]
},
{
"cell_type": "markdown",
"id": "85733d6b",
"metadata": {},
"source": [
"**Note:** The matrix $X$ has the same shape as `data`, but `data` contains a `Price` column that is not part of $X$. In its place, $X$ has the column of ones as its first column."
]
},
{
"cell_type": "markdown",
"id": "1d8e64e6",
"metadata": {},
"source": [
"## Analytical Solution of Linear Regression\n",
"\n",
"The analytical solution works exactly as in the case with only one feature.\n",
"\n",
"`np.linalg.solve(A, b)` computes $w$ in the linear system\n",
"\n",
"$$ A w = b $$\n",
"\n",
"where $A$ is a matrix, $w$ is a vector (our unknowns) and $b$ is a vector.\n",
"\n",
"We are looking for the solution $w$ of the following system:\n",
"\n",
"$$ X^T X w = X^T y $$\n",
"\n",
"With $A = X^T X$ and $b = X^T y$, `np.linalg.solve(A, b)` computes the parameters of the linear regression that we are looking for."
]
},
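{
"cell_type": "markdown",
"id": "5e90b3d6",
"metadata": {},
"source": [
"For readers who want to see where this system comes from, here is a short sketch of the derivation. It assumes the cost function $J(w) = \\frac{1}{2n}(y - Xw)^T (y - Xw)$ implemented in `J` above, with $n$ denoting the number of training examples (notation introduced here), and it assumes that $X^T X$ is invertible:\n",
"\n",
"$$ \\nabla_w J(w) = -\\frac{1}{n} X^T (y - Xw) = 0 \\quad\\Longleftrightarrow\\quad X^T X w = X^T y . $$\n",
"\n",
"Since $J$ is convex in $w$, a point with vanishing gradient is a global minimum, so solving this system yields the optimal parameters directly."
]
},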
{
"cell_type": "code",
"execution_count": 15,
"id": "fc1d2c0a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The 3 parameters of the linear regression:\n",
"[ 42769.88494072 232612.86504788 2431.15453776]\n",
"Cost function J(w_ana): 147658829426.14856\n",
"CPU times: user 11.3 ms, sys: 2.13 ms, total: 13.4 ms\n",
"Wall time: 1.7 ms\n"
]
}
],
"source": [
"%%time\n",
"w_ana = np.linalg.solve(X.T @ X, X.T @ y)\n",
"print('The {} parameters of the linear regression:\\n{}'.format(len(w_ana), w_ana))\n",
"J_ana = J(w=w_ana, X=X, y=y)\n",
"print('Cost function J(w_ana): {}'.format(J_ana))"
]
},
{
"cell_type": "markdown",
"id": "daab4572",
"metadata": {},
"source": [
"## Numerical Solution with Gradient Descent"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "b314f36a",
"metadata": {},
"outputs": [],
"source": [
"## Numerical solution with gradient descent\n",
"def grad_desc_upd(w, alpha, x, y):\n",
"    \"\"\"x: feature matrix (one row per example), y: target vector, w: parameter vector (numpy arrays)\"\"\"\n",
"    errors = y - h(x=x, w=w)\n",
"    w = w + alpha / len(y) * (x.T @ errors)\n",
"    return w"
]
},
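{
"cell_type": "markdown",
"id": "c3a7f1e2",
"metadata": {},
"source": [
"The update in `grad_desc_upd` can be read as one step of gradient descent. As a sketch, assuming the cost function $J(w) = \\frac{1}{2n}(y - Xw)^T (y - Xw)$ from `J` above, with $n$ (`len(y)`) as notation introduced here:\n",
"\n",
"$$ w \\leftarrow w - \\alpha \\nabla_w J(w) = w + \\frac{\\alpha}{n} X^T (y - Xw) . $$\n",
"\n",
"The plus sign in the code comes from defining the errors as $y - Xw$ rather than $Xw - y$."
]
},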
{
"cell_type": "code",
"execution_count": 17,
"id": "3dc2775c",
"metadata": {},
"outputs": [],
"source": [
"def grad_desc(w, alpha, x, y, n_iterations):\n",
"    # J_all[0]: iteration indices, J_all[1]: cost recorded at those iterations\n",
"    J_all = [[0], [J(w=w, X=x, y=y)]]\n",
"    for it in tqdm(range(n_iterations)):\n",
"        w = grad_desc_upd(w=w, alpha=alpha, x=x, y=y)\n",
"        if it % 100 == 0:  # record the cost only on every 100th iteration\n",
"            J_all[1].append(J(w=w, X=x, y=y))\n",
"            J_all[0].append(it)\n",
"    return w, J_all"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a801cac3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 2.43686014, 5.00088371, 206.19316114])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grad_desc_upd(w=np.ones(X.shape[1]), alpha=1e-6, x=X[:7], y=y[:7])"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "dc6f778a",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e9c9403f9b08472294edb52bb2c10c1d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/10000 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.94 s, sys: 1.87 s, total: 3.81 s\n",
"Wall time: 417 ms\n"
]
}
],
"source": [
"%%time\n",
"w_init = np.ones(X.shape[1])\n",
"alpha = 3.1e-10 # try different values of alpha\n",
"n_iterations = 10000\n",
"_, J_tmp = grad_desc(w=w_init, alpha=alpha, x=X, y=y, n_iterations=n_iterations)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "c04ebb9f",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "57bac370a44f48e9951dc5dba56b29ef",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/100000 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The 3 parameters of the linear regression:\n",
"[ 10.49970832 31.89447067 1601.95876825]\n",
"Cost function J: 540959857400.77966\n",
"J relative to initial cost: 0.6771395257663181\n",
"Cost function compared to analytical solution: 3.66*J_ana\n",
"Parameters relative to analytical solution: [2.45493022e-04 1.37113958e-04 6.58929222e-01]*w_ana\n",
"CPU times: user 21.5 s, sys: 11 s, total: 32.6 s\n",
"Wall time: 3.53 s\n"
]
}
],
"source": [
"%%time\n",
"w_init = np.ones(X.shape[1])\n",
"alpha = 1e-10 # try different values of alpha\n",
"n_iterations = 100000\n",
"w_gd, J_all = grad_desc(w=w_init, alpha=alpha, x=X, y=y, n_iterations=n_iterations)\n",
"print('The {} parameters of the linear regression:\\n{}'.format(len(w_gd), w_gd))\n",
"print('Cost function J: {}'.format(J_all[1][-1]))\n",
"print('J relative to initial cost: {}'.format(J_all[1][-1]/J_all[1][0]))\n",
"print('Cost function compared to analytical solution: {:.2f}*J_ana'.format(J_all[1][-1]/J_ana))\n",
"print('Parameters relative to analytical solution: {}*w_ana'.format((w_gd)/w_ana))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "8b4db3ee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: >"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.lineplot(x=J_all[0], y=J_all[1])"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "31574b9a",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "16a65185cb4541ffbdb5a02b33027697",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1000000 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The 3 parameters of the linear regression:\n",
"[ 81.08915614 248.45370379 6493.32860783]\n",
"Cost function J: 201611248738.63248\n",
"J relative to initial cost: 0.37282268754553793\n",
"Cost function compared to analytical solution: 1.36539*J_ana\n",
"Parameters relative to analytical solution: [1.89594048e-03 1.06809958e-03 2.67088270e+00]*w_ana\n",
"CPU times: user 3min 26s, sys: 1min 26s, total: 4min 53s\n",
"Wall time: 31.6 s\n"
]
}
],
"source": [
"%%time\n",
"alpha = 3.1e-10 # try different values of alpha\n",
"n_iterations = 1000000\n",
"w_gd2, J_all2 = grad_desc(w=w_gd, alpha=alpha, x=X, y=y, n_iterations=n_iterations)\n",
"print('The {} parameters of the linear regression:\\n{}'.format(len(w_gd2), w_gd2))\n",
"print('Cost function J: {}'.format(J_all2[1][-1]))\n",
"print('J relative to initial cost: {}'.format(J_all2[1][-1]/J_all2[1][0]))\n",
"print('Cost function compared to analytical solution: {:.5f}*J_ana'.format(J_all2[1][-1]/J_ana))\n",
"print('Parameters relative to analytical solution: {}*w_ana'.format((w_gd2)/w_ana))"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "4434e050",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: >"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.lineplot(x=J_all2[0], y=J_all2[1])"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "4d0fbfee",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d680d4cc18984adab5920af97da76e2e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/10000000 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The 3 parameters of the linear regression:\n",
"[ 540.5020477 1598.80804052 6469.41806027]\n",
"Cost function J: 200954758401.09796\n",
"J relative to initial cost: 0.9967438136028319\n",
"Cost function compared to analytical solution: 1.36*J_ana\n",
"Parameters relative to analytical solution: [0.01263744 0.00687326 2.66104765]*w_ana\n",
"CPU times: user 37min 33s, sys: 9min 27s, total: 47min 1s\n",
"Wall time: 5min 1s\n"
]
}
],
"source": [
"%%time\n",
"alpha = 3.1e-10 # try different values of alpha\n",
"n_iterations = 10000000\n",
"w_gd3, J_all3 = grad_desc(w=w_gd2, alpha=alpha, x=X, y=y, n_iterations=n_iterations)\n",
"\n",
"print('The {} parameters of the linear regression:\\n{}'.format(len(w_gd3), w_gd3))\n",
"print('Cost function J: {}'.format(J_all3[1][-1]))\n",
"print('J relative to initial cost: {}'.format(J_all3[1][-1]/J_all3[1][0]))\n",
"print('Cost function compared to analytical solution: {:.2f}*J_ana'.format(J_all3[1][-1]/J_ana))\n",
"print('Parameters relative to analytical solution: {}*w_ana'.format((w_gd3)/w_ana))"
]
},
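{
"cell_type": "markdown",
"id": "gd-step-size-note",
"metadata": {},
"source": [
"One possible reading of the slow convergence above, as a sketch rather than a statement about how `grad_desc` is actually implemented: assume it performs the standard batch update on the cost $J(w) = \\frac{1}{2n}\\lVert Xw - y\\rVert^2$ (the factor $\\frac{1}{2}$ is an assumption, chosen to match `MSE = 2*J_ana` in the $R^2$ section).\n",
"\n",
"$$w \\leftarrow w - \\alpha\\,\\nabla J(w), \\qquad \\nabla J(w) = \\frac{1}{n} X^\\top (Xw - y)$$\n",
"\n",
"Writing $H = \\frac{1}{n} X^\\top X$ with eigenvalues $\\lambda_i$, and $w^*$ for the analytical solution, the error then evolves as\n",
"\n",
"$$w_{k+1} - w^* = (I - \\alpha H)\\,(w_k - w^*),$$\n",
"\n",
"so the error component along the $i$-th eigenvector shrinks by the factor $|1 - \\alpha\\lambda_i|$ per step. Convergence requires $\\alpha < 2/\\lambda_{\\max}$, and directions with $\\alpha\\lambda_i \\ll 1$ barely move. Under this assumption, unscaled features with large raw values would make $\\lambda_{\\max}$ very large (forcing a tiny $\\alpha$ such as `3.1e-10`) while other directions converge very slowly, which would be consistent with one parameter still being off by a factor of about 2.66 after $10^7$ iterations."
]
},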
{
"cell_type": "code",
"execution_count": 27,
"id": "252656f1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: >"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.lineplot(x=J_all3[0], y=J_all3[1])"
]
},
{
"cell_type": "markdown",
"id": "e8b1f648",
"metadata": {},
"source": [
"## $R^2$"
]
},
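{
"cell_type": "markdown",
"id": "r2-definition-note",
"metadata": {},
"source": [
"As a sketch of what the next cell computes, assuming (as `MSE = 2*J_ana` suggests) that the cost is $J(w) = \\frac{1}{2n}\\sum_{i=1}^{n} (\\hat{y}_i - y_i)^2$ with predictions $\\hat{y} = Xw$ (the prediction formula is likewise an assumption):\n",
"\n",
"$$R^2 = 1 - \\frac{\\mathrm{MSE}}{\\sigma_y^2} = 1 - \\frac{\\frac{1}{n}\\sum_{i=1}^{n} (y_i - \\hat{y}_i)^2}{\\frac{1}{n}\\sum_{i=1}^{n} (y_i - \\mu_y)^2}$$\n",
"\n",
"Here $\\mu_y$ is the mean and $\\sigma_y^2$ the (population) variance of the prices. $R^2 = 1$ corresponds to a perfect fit, while $R^2 = 0$ means the model does no better than always predicting $\\mu_y$."
]
},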
{
"cell_type": "code",
"execution_count": 24,
"id": "50022cc2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Explained variance (R^2): 0.3520362618371272\n"
]
}
],
"source": [
"X = feature_matrix_from_data(data[features])\n",
"y = data.Price.to_numpy(copy=True)\n",
"J_ana = J(w=w_ana, X=X, y=y)  # cost at the analytical solution\n",
"MSE = 2*J_ana  # J is half the mean squared error\n",
"mu_y = sum(y)/len(y)  # mean of y\n",
"sigma_y_quadrat = ( (y - mu_y) @ (y - mu_y) ) / len(y)  # variance of y\n",
"R2 = 1 - MSE/sigma_y_quadrat\n",
"print('Explained variance (R^2): {}'.format(R2))"
]
},
{
"cell_type": "markdown",
"id": "104ad3d4",
"metadata": {},
"source": [
"$R^2$ is larger than for the model with only one feature (BuildingArea)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}