{
"cells": [
{
"cell_type": "markdown",
"id": "24e7652f",
"metadata": {},
"source": [
"# \"Python for ML\" Quick Introduction"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "92bb5be1",
"metadata": {},
"outputs": [],
"source": [
"# imports are allowed anywhere in the code, but by convention all required import statements\n",
"# go at the very top of a file\n",
"\n",
"# numpy is a Python module for numerical computing that offers both functionality and efficiency\n",
"import numpy as np\n",
"\n",
"# pandas works very well with tabular data, whether csv, xls, or xlsx\n",
"import pandas as pd\n",
"\n",
"# plotting settings\n",
"pd.plotting.register_matplotlib_converters()\n",
"\n",
"# matplotlib is a very comprehensive module for creating visualizations/plots\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# seaborn makes it easier to create commonly used plot types;\n",
"# it is itself built on matplotlib, and the two can be combined\n",
"# a nice introduction to seaborn: https://www.kaggle.com/learn/data-visualization\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"id": "0562db47",
"metadata": {},
"source": [
"Jupyter has different cell types: code and Markdown. Markdown lets you document code more readably than comments inside the code itself. It supports *various* **formatting options** and even LaTeX-style mathematical formulas, both inline ($h_\\theta(x) = \\theta^Tx$) and centered on separate lines:\n",
"\n",
"$$h_\\theta(x) = \\theta^Tx$$\n",
"\n",
"<p><b>HTML</b> is recognized as well.</p>\n",
"\n",
"We now load a CSV file with pandas:"
]
},
{
"cell_type": "markdown",
"id": "55f91177",
"metadata": {},
"source": [
"## Loading data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "724b4875",
"metadata": {},
"outputs": [],
"source": [
"data_file_path = '../data/exam-iq.csv'\n",
"data = pd.read_csv(data_file_path)\n"
]
},
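{
"cell_type": "markdown",
"id": "c0de0001",
"metadata": {},
"source": [
"`pd.read_csv` accepts a file-like object as well as a path. The following is a minimal sketch that reads a few rows from an in-memory string, so it runs without the course data on disk. The column names match the `exam-iq.csv` table in this notebook; the comma separator, the selection of rows, and the names `csv_text` and `sample` are assumptions made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0002",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Assumption: comma-separated layout with the columns Pass, Hours, IQ;\n",
"# only a handful of illustrative rows are included here\n",
"csv_text = '''Pass,Hours,IQ\n",
"0,0.50,110\n",
"0,0.75,95\n",
"1,2.00,104\n",
"1,2.25,120\n",
"'''\n",
"\n",
"# read_csv takes a path or any file-like object\n",
"sample = pd.read_csv(io.StringIO(csv_text))\n",
"\n",
"# quick sanity checks after loading: size, column types, summary statistics\n",
"print(sample.shape)\n",
"print(sample.dtypes)\n",
"print(sample.describe())"
]
},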
{
"cell_type": "code",
"execution_count": 3,
"id": "b5661e8d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pass</th>\n",
" <th>Hours</th>\n",
" <th>IQ</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0.50</td>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>0.75</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>1.00</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>1.25</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>1.50</td>\n",
" <td>100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0</td>\n",
" <td>1.75</td>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0</td>\n",
" <td>1.75</td>\n",
" <td>115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1</td>\n",
" <td>2.00</td>\n",
" <td>104</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>2.25</td>\n",
" <td>120</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0</td>\n",
" <td>2.50</td>\n",
" <td>98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1</td>\n",
" <td>2.75</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>0</td>\n",
" <td>3.00</td>\n",
" <td>88</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>1</td>\n",
" <td>3.25</td>\n",
" <td>108</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>1</td>\n",
" <td>4.00</td>\n",
" <td>109</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>1</td>\n",
" <td>4.25</td>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>1</td>\n",
" <td>4.50</td>\n",
" <td>112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>1</td>\n",
" <td>4.75</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>1</td>\n",
" <td>5.00</td>\n",
" <td>102</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>1</td>\n",
" <td>5.50</td>\n",
" <td>109</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>0</td>\n",
" <td>3.50</td>\n",
" <td>125</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pass Hours IQ\n",
"0 0 0.50 110\n",
"1 0 0.75 95\n",
"2 0 1.00 118\n",
"3 0 1.25 97\n",
"4 0 1.50 100\n",
"5 0 1.75 110\n",
"6 0 1.75 115\n",
"7 1 2.00 104\n",
"8 1 2.25 120\n",
"9 0 2.50 98\n",
"10 1 2.75 118\n",
"11 0 3.00 88\n",
"12 1 3.25 108\n",
"13 1 4.00 109\n",
"14 1 4.25 110\n",
"15 1 4.50 112\n",
"16 1 4.75 97\n",
"17 1 5.00 102\n",
"18 1 5.50 109\n",
"19 0 3.50 125"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "66f73953",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pass</th>\n",
" <th>Hours</th>\n",
" <th>IQ</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0.50</td>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>0.75</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>1.00</td>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>1.25</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>1.50</td>\n",
" <td>100</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pass Hours IQ\n",
"0 0 0.50 110\n",
"1 0 0.75 95\n",
"2 0 1.00 118\n",
"3 0 1.25 97\n",
"4 0 1.50 100"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "markdown",
"id": "7a2feffb",
"metadata": {},
"source": [
"**Useful shortcuts:**<br>\n",
"b - inserts an empty cell below the currently active one<br>\n",
"a - inserts an empty cell above the currently active one<br>\n",
"CTRL + ENTER - runs the active cell (Mac: CMD + ENTER)<br>\n",
"SHIFT + ENTER - runs the active cell and moves to the next cell<br>\n",
"ENTER - switches to a cell's edit mode<br>\n",
"ESC - switches to a cell's command mode<br>\n",
"d d - (press d twice) - deletes the active cell<br>\n",
"CTRL + C and CTRL + V work as expected<br>"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "099ff4d7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='Hours', ylabel='IQ'>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=data['Hours'], y=data['IQ'])\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3e62008e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='Hours', ylabel='IQ'>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=data['Hours'], y=data['IQ'], hue=data['Pass'])"
]
},
{
"cell_type": "markdown",
"id": "a0f6f2ee",
"metadata": {},
"source": [
"Another dataset..."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9f7c4ebd",
"metadata": {},
"outputs": [],
"source": [
"melbourne_file_path = 'data/melb_data.csv'\n",
"melbourne_data = pd.read_csv(melbourne_file_path)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "d5e017fe",
"metadata": {},
"outputs": [],
"source": [
"melbourne_data = melbourne_data.dropna(axis=0)\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0de6a41c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Suburb</th>\n",
" <th>Address</th>\n",
" <th>Rooms</th>\n",
" <th>Type</th>\n",
" <th>Price</th>\n",
" <th>Method</th>\n",
" <th>SellerG</th>\n",
" <th>Date</th>\n",
" <th>Distance</th>\n",
" <th>Postcode</th>\n",
" <th>...</th>\n",
" <th>Bathroom</th>\n",
" <th>Car</th>\n",
" <th>Landsize</th>\n",
" <th>BuildingArea</th>\n",
" <th>YearBuilt</th>\n",
" <th>CouncilArea</th>\n",
" <th>Lattitude</th>\n",
" <th>Longtitude</th>\n",
" <th>Regionname</th>\n",
" <th>Propertycount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Abbotsford</td>\n",
" <td>25 Bloomburg St</td>\n",
" <td>2</td>\n",
" <td>h</td>\n",
" <td>1035000.0</td>\n",
" <td>S</td>\n",
" <td>Biggin</td>\n",
" <td>4/02/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>156.0</td>\n",
" <td>79.0</td>\n",
" <td>1900.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8079</td>\n",
" <td>144.9934</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abbotsford</td>\n",
" <td>5 Charles St</td>\n",
" <td>3</td>\n",
" <td>h</td>\n",
" <td>1465000.0</td>\n",
" <td>SP</td>\n",
" <td>Biggin</td>\n",
" <td>4/03/2017</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>134.0</td>\n",
" <td>150.0</td>\n",
" <td>1900.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8093</td>\n",
" <td>144.9944</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Abbotsford</td>\n",
" <td>55a Park St</td>\n",
" <td>4</td>\n",
" <td>h</td>\n",
" <td>1600000.0</td>\n",
" <td>VB</td>\n",
" <td>Nelson</td>\n",
" <td>4/06/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>120.0</td>\n",
" <td>142.0</td>\n",
" <td>2014.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8072</td>\n",
" <td>144.9941</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Abbotsford</td>\n",
" <td>124 Yarra St</td>\n",
" <td>3</td>\n",
" <td>h</td>\n",
" <td>1876000.0</td>\n",
" <td>S</td>\n",
" <td>Nelson</td>\n",
" <td>7/05/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>245.0</td>\n",
" <td>210.0</td>\n",
" <td>1910.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8024</td>\n",
" <td>144.9993</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Abbotsford</td>\n",
" <td>98 Charles St</td>\n",
" <td>2</td>\n",
" <td>h</td>\n",
" <td>1636000.0</td>\n",
" <td>S</td>\n",
" <td>Nelson</td>\n",
" <td>8/10/2016</td>\n",
" <td>2.5</td>\n",
" <td>3067.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>256.0</td>\n",
" <td>107.0</td>\n",
" <td>1890.0</td>\n",
" <td>Yarra</td>\n",
" <td>-37.8060</td>\n",
" <td>144.9954</td>\n",
" <td>Northern Metropolitan</td>\n",
" <td>4019.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" Suburb Address Rooms Type Price Method SellerG \\\n",
"1 Abbotsford 25 Bloomburg St 2 h 1035000.0 S Biggin \n",
"2 Abbotsford 5 Charles St 3 h 1465000.0 SP Biggin \n",
"4 Abbotsford 55a Park St 4 h 1600000.0 VB Nelson \n",
"6 Abbotsford 124 Yarra St 3 h 1876000.0 S Nelson \n",
"7 Abbotsford 98 Charles St 2 h 1636000.0 S Nelson \n",
"\n",
" Date Distance Postcode ... Bathroom Car Landsize BuildingArea \\\n",
"1 4/02/2016 2.5 3067.0 ... 1.0 0.0 156.0 79.0 \n",
"2 4/03/2017 2.5 3067.0 ... 2.0 0.0 134.0 150.0 \n",
"4 4/06/2016 2.5 3067.0 ... 1.0 2.0 120.0 142.0 \n",
"6 7/05/2016 2.5 3067.0 ... 2.0 0.0 245.0 210.0 \n",
"7 8/10/2016 2.5 3067.0 ... 1.0 2.0 256.0 107.0 \n",
"\n",
" YearBuilt CouncilArea Lattitude Longtitude Regionname \\\n",
"1 1900.0 Yarra -37.8079 144.9934 Northern Metropolitan \n",
"2 1900.0 Yarra -37.8093 144.9944 Northern Metropolitan \n",
"4 2014.0 Yarra -37.8072 144.9941 Northern Metropolitan \n",
"6 1910.0 Yarra -37.8024 144.9993 Northern Metropolitan \n",
"7 1890.0 Yarra -37.8060 144.9954 Northern Metropolitan \n",
"\n",
" Propertycount \n",
"1 4019.0 \n",
"2 4019.0 \n",
"4 4019.0 \n",
"6 4019.0 \n",
"7 4019.0 \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"melbourne_data.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b3523f16",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(0.0, 1000.0)]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = sns.scatterplot(x=melbourne_data['BuildingArea'], y=melbourne_data['Price'])\n",
"ax.set(xlim=(0, 1000))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b3cc3971",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"xplot = [1,2]\n",
"yplot = [3, 8]\n",
"ax = sns.lineplot(x=xplot, y=yplot)"
]
},
{
"cell_type": "markdown",
"id": "b3f8a1a5",
"metadata": {},
"source": [
"You can plot several things in one figure by calling seaborn multiple times in a row within the same code cell. We now plot both the data points from above and a straight line:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "410a31ee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6196"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(melbourne_data)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fb578f78",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='BuildingArea', ylabel='Price'>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = sns.scatterplot(x=melbourne_data['BuildingArea'][:100], y=melbourne_data['Price'][:100])\n",
"xplot = [0,300]\n",
"yplot = [1e5, 3e6]\n",
"sns.lineplot(x=xplot, y=yplot)"
]
},
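{
"cell_type": "markdown",
"id": "c0de0003",
"metadata": {},
"source": [
"When no Axes is given, seaborn's plotting functions draw onto the current matplotlib Axes, which is why consecutive calls in one cell end up in the same figure. A slightly more explicit variant, sketched here, creates the Axes once and passes it to each call through the `ax` parameter. The data values are the first five `BuildingArea`/`Price` pairs from the table above and the line endpoints are those of the previous cell; the names `area`, `price`, and `fig` and the choice `color='red'` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0004",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"# Illustrative values: the first five BuildingArea/Price pairs shown above\n",
"area = [79.0, 150.0, 142.0, 210.0, 107.0]\n",
"price = [1035000.0, 1465000.0, 1600000.0, 1876000.0, 1636000.0]\n",
"\n",
"# Create one Axes and hand it to both seaborn calls\n",
"fig, ax = plt.subplots()\n",
"sns.scatterplot(x=area, y=price, ax=ax)\n",
"sns.lineplot(x=[0, 300], y=[1e5, 3e6], ax=ax, color='red')\n",
"\n",
"# Plain lists carry no names, so the axis labels are set by hand\n",
"ax.set(xlabel='BuildingArea', ylabel='Price')"
]
},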
{
"cell_type": "code",
"execution_count": 14,
"id": "223eeae8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = [1,2,3]\n",
"l.append(4)\n",
"l"
]
},
{
"cell_type": "markdown",
"id": "8e335cc5",
"metadata": {},
"source": [
"This is one way to build the feature matrix X and the vector of outputs Y:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "30b87fbd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l1 = [1,2,3]\n",
"l2 = [4]\n",
"l1 + l2"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "a5215abe",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[1, 2, 3], [4, 5, 6]]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M = [\n",
" [1,2,3],\n",
" [4,5,6]\n",
"]\n",
"M"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "791052f6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2, 3],\n",
" [4, 5, 6]])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.array(M)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "79166874",
"metadata": {},
"outputs": [],
"source": [
"X = []\n",
"Y = []\n",
"for i, row in melbourne_data.iterrows():\n",
"    X.append([1] + [row['BuildingArea']])\n",
"    Y.append(row['Price'])\n",
"\n",
"# numpy is better suited for matrix multiplication, so we convert the Python lists to numpy arrays:\n",
"X = np.array(X)\n",
"Y = np.array(Y)"
]
},
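{
"cell_type": "markdown",
"id": "c0de0005",
"metadata": {},
"source": [
"Iterating with `iterrows()` is easy to follow but slow on large tables. As a hedged alternative, the sketch below builds the same kind of matrix without an explicit Python loop, using `np.column_stack`. The small DataFrame `df` is a made-up stand-in for `melbourne_data` (values taken from its first three rows), and the names `X_vec`, `Y_vec`, and `X_loop` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0006",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Assumption: a tiny stand-in for melbourne_data with only the two columns used here\n",
"df = pd.DataFrame({\n",
"    'BuildingArea': [79.0, 150.0, 142.0],\n",
"    'Price': [1035000.0, 1465000.0, 1600000.0],\n",
"})\n",
"\n",
"# A column of ones (intercept term) next to the feature column\n",
"X_vec = np.column_stack([np.ones(len(df)), df['BuildingArea'].to_numpy()])\n",
"Y_vec = df['Price'].to_numpy()\n",
"\n",
"# The row-by-row construction gives the same matrix\n",
"X_loop = np.array([[1, row['BuildingArea']] for _, row in df.iterrows()])\n",
"print(X_vec)\n",
"print(np.array_equal(X_vec, X_loop))"
]
},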
{
"cell_type": "code",
"execution_count": 19,
"id": "23f97e97",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1. , 79. ],\n",
" [ 1. , 150. ],\n",
" [ 1. , 142. ],\n",
" ...,\n",
" [ 1. , 35.64],\n",
" [ 1. , 61.6 ],\n",
" [ 1. , 388.5 ]], shape=(6196, 2))"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X"
]
},
{
"cell_type": "markdown",
"id": "860aea24",
"metadata": {},
"source": [
"### Slicing numpy arrays"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "69d5eed6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1. 79.]\n",
" [ 1. 150.]\n",
" [ 1. 142.]\n",
" [ 1. 210.]\n",
" [ 1. 107.]]\n",
"[1035000. 1465000. 1600000. 1876000. 1636000. 1097000. 1350000.]\n"
]
}
],
"source": [
"# slicing numpy arrays\n",
"print(X[:5])\n",
"print(Y[:7])"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "a5ebf4dd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 79. 150. 142. ... 35.64 61.6 388.5 ]\n"
]
}
],
"source": [
"# slicing numpy arrays: selecting a column\n",
"print(X[:,1])"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "796dfee7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 79. 150. 142. 210. 107.]\n"
]
}
],
"source": [
"# slicing numpy arrays: the first 5 rows of column 2 (i.e. index 1):\n",
"print(X[:5,1])"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "681e36cf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1. 1. 1. 1. 1.]\n"
]
}
],
"source": [
"# slicing numpy arrays: the first 5 rows of column 1 (i.e. index 0):\n",
"print(X[:5,0])"
]
},
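{
"cell_type": "markdown",
"id": "c0de0007",
"metadata": {},
"source": [
"The same slicing rules can be checked on a small array where every entry is easy to recognize. This sketch uses a made-up 3x4 array `A` (not part of the original notebook); it also shows that a basic slice is a view onto the original data rather than a copy."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de0008",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative array: rows are [0 1 2 3], [4 5 6 7], [8 9 10 11]\n",
"A = np.arange(12).reshape(3, 4)\n",
"\n",
"print(A[:2])        # first two rows\n",
"print(A[:, -1])     # last column: 3, 7, 11\n",
"print(A[1:, 1:3])   # rows 1 and 2, columns 1 and 2: [[5, 6], [9, 10]]\n",
"\n",
"# A basic slice is a view: writing through it changes A as well\n",
"col = A[:, 0]\n",
"col[0] = 100\n",
"print(A[0, 0])      # 100\n",
"\n",
"# Use .copy() when an independent array is needed\n",
"independent = A[:, 0].copy()\n",
"independent[1] = -1\n",
"print(A[1, 0])      # still 4"
]
},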
{
"cell_type": "markdown",
"id": "ce97c986",
"metadata": {},
"source": [
"### Multiplying numpy arrays\n",
"\n",
"You can multiply numpy arrays element-wise as well as with matrix multiplication."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "6f4cabc0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = 'asdas {}'\n",
"type(s)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "73cc9eb1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'a: 7.1234567'"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = 7.1234567\n",
"s = 'a: {}'.format(a)\n",
"s"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "c853f2d8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a: [1 1 1]\n",
"b: [1 2 3]\n",
"Element-wise operations:\n",
"a+b: [2 3 4]\n",
"a*b: [1 2 3]\n",
"a/b: [1. 0.5 0.33333333]\n",
"\n",
"Dot product:\n",
"a@b: 6\n",
"\n",
"Matrix x vector:\n",
"M:\n",
"[[1 3]\n",
" [2 1]]\n",
"x: [1 1]\n",
"M@x: [4 3]\n",
"\n",
"Transposing a matrix:\n",
"M.T:\n",
"[[1 2]\n",
" [3 1]]\n"
]
}
],
"source": [
"a = np.array([1,1,1])\n",
"b = np.array([1,2,3])\n",
"print('a: {}'.format(a))\n",
"print('b: {}'.format(b))\n",
"\n",
"print('Element-wise operations:')\n",
"print('a+b: {}'.format(a+b))\n",
"print('a*b: {}'.format(a*b))\n",
"print('a/b: {}'.format(a/b))\n",
"\n",
"print()\n",
"print('Dot product:')\n",
"print('a@b: {}'.format(a@b))\n",
"\n",
"print()\n",
"print('Matrix x vector:')\n",
"M = np.array([[1,3], [2,1]])\n",
"x = np.array([1,1])\n",
"print('M:\\n{}'.format(M))\n",
"print('x: {}'.format(x))\n",
"print('M@x: {}'.format(M@x))\n",
"\n",
"print()\n",
"print('Transposing a matrix:')\n",
"print('M.T:\\n{}'.format(M.T))"
]
},
{
"cell_type": "markdown",
"id": "e0577f21",
"metadata": {},
"source": [
"## Analytic solution of linear regression\n",
"\n",
"`np.linalg.solve(A, b)` computes $w$ in the linear system of equations\n",
"\n",
"$ A w = b $\n",
"\n",
"$A$ - matrix,\n",
"$w$ - vector (our unknowns),\n",
"$b$ - vector.\n",
"\n",
"We are looking for the solution $w$ of the following system of equations:\n",
"\n",
"$$ X^T X w = X^T Y $$\n",
"\n",
"With $A = X^TX$ and $b = X^T Y$, `np.linalg.solve(A, b)` computes the parameters we are looking for in the linear regression."
]
},
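{
"cell_type": "markdown",
"id": "c0de0009",
"metadata": {},
"source": [
"A short derivation of the system above, added as a sketch: it assumes the parameters are chosen to minimize the squared error and that $X$ has full column rank, so that $X^T X$ is invertible (neither assumption is stated explicitly above).\n",
"\n",
"$$ J(w) = \\lVert Xw - Y \\rVert^2 = (Xw - Y)^T (Xw - Y) $$\n",
"\n",
"$$ \\nabla_w J(w) = 2 X^T (Xw - Y) = 0 \\quad\\Longrightarrow\\quad X^T X w = X^T Y $$"
]
},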
{
"cell_type": "code",
"execution_count": 27,
"id": "35a78137",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[510531.72552189 3943.64499525]\n",
"CPU times: user 302 μs, sys: 10 μs, total: 312 μs\n",
"Wall time: 316 μs\n"
]
}
],
"source": [
"%%time\n",
"w_ana = np.linalg.solve(X.T @ X, X.T @ Y)\n",
"print(w_ana)"
]
}
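,
{
"cell_type": "markdown",
"id": "c0de000a",
"metadata": {},
"source": [
"As a plausibility check, one could compare the normal-equation solution with `np.linalg.lstsq`, which solves the same least-squares problem directly. The sketch below uses synthetic data rather than the Melbourne dataset; `X_demo`, `Y_demo`, `w_true`, the seed, and the noise level are all made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0de000b",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Synthetic data (assumption): y = 2 + 3*x plus a little Gaussian noise\n",
"rng = np.random.default_rng(0)\n",
"n = 50\n",
"x_demo = rng.uniform(0, 10, size=n)\n",
"X_demo = np.column_stack([np.ones(n), x_demo])\n",
"w_true = np.array([2.0, 3.0])\n",
"Y_demo = X_demo @ w_true + rng.normal(0, 0.1, size=n)\n",
"\n",
"# Normal equations, solved as a linear system\n",
"w_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)\n",
"\n",
"# Least-squares solver applied to X and Y directly\n",
"w_lstsq, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)\n",
"\n",
"print(w_solve)\n",
"print(np.allclose(w_solve, w_lstsq))"
]
}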
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}