Contents
- Setup
- Download the data
- Preprocess the data
- Logistic regression
- Logistic regression fundamentals
- The log loss function
- The gradient descent update rule
- Train the model
- Performance evaluation
- Save the model
- Conclusion
This guide demonstrates how to use the TensorFlow Core low-level APIs to perform binary classification with logistic regression. It uses the Wisconsin Breast Cancer Dataset for tumor classification.
Logistic regression is one of the most popular algorithms for binary classification. Given a set of examples with features, the goal of logistic regression is to output values between 0 and 1, which can be interpreted as the probabilities of each example belonging to a particular class.
Setup
This tutorial uses pandas for reading a CSV file into a DataFrame, seaborn for plotting a pairwise relationship in a dataset, scikit-learn for computing a confusion matrix, and matplotlib for creating visualizations.
pip install -q seaborn
import tensorflow as tf
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns
import sklearn.metrics as sk_metrics
import tempfile
import os
# Preset matplotlib figure sizes.
matplotlib.rcParams['figure.figsize'] = [9, 6]
print(tf.__version__)
# To make the results reproducible, set the random seed value.
tf.random.set_seed(22)
2024-08-15 02:45:41.468739: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-15 02:45:41.489749: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-15 02:45:41.496228: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2.17.0
Download the data
Next, load the Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository. This dataset contains various features such as a tumor's radius, texture, and concavity.
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data'
features = ['radius', 'texture', 'perimeter', 'area', 'smoothness', 'compactness',
'concavity', 'concave_poinits', 'symmetry', 'fractal_dimension']
column_names = ['id', 'diagnosis']
for attr in ['mean', 'ste', 'largest']:
for feature in features:
column_names.append(feature + "_" + attr)
Read the dataset into a pandas DataFrame using pandas.read_csv:
dataset = pd.read_csv(url, names=column_names)
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 569 non-null int64
1 diagnosis 569 non-null object
2 radius_mean 569 non-null float64
3 texture_mean 569 non-null float64
4 perimeter_mean 569 non-null float64
5 area_mean 569 non-null float64
6 smoothness_mean 569 non-null float64
7 compactness_mean 569 non-null float64
8 concavity_mean 569 non-null float64
9 concave_poinits_mean 569 non-null float64
10 symmetry_mean 569 non-null float64
11 fractal_dimension_mean 569 non-null float64
12 radius_ste 569 non-null float64
13 texture_ste 569 non-null float64
14 perimeter_ste 569 non-null float64
15 area_ste 569 non-null float64
16 smoothness_ste 569 non-null float64
17 compactness_ste 569 non-null float64
18 concavity_ste 569 non-null float64
19 concave_poinits_ste 569 non-null float64
20 symmetry_ste 569 non-null float64
21 fractal_dimension_ste 569 non-null float64
22 radius_largest 569 non-null float64
23 texture_largest 569 non-null float64
24 perimeter_largest 569 non-null float64
25 area_largest 569 non-null float64
26 smoothness_largest 569 non-null float64
27 compactness_largest 569 non-null float64
28 concavity_largest 569 non-null float64
29 concave_poinits_largest 569 non-null float64
30 symmetry_largest 569 non-null float64
31 fractal_dimension_largest 569 non-null float64
dtypes: float64(30), int64(1), object(1)
memory usage: 142.4+ KB
Display the first five rows:
dataset.head()
(Output: the first five rows of the DataFrame — the `id` and `diagnosis` columns followed by the 30 numeric feature columns.)
Divide the dataset into training and test sets using pandas.DataFrame.sample, pandas.DataFrame.drop and pandas.DataFrame.iloc. The test set is used to evaluate your model's generalizability to unseen data.
train_dataset = dataset.sample(frac=0.75, random_state=1)
len(train_dataset)
427
test_dataset = dataset.drop(train_dataset.index)
len(test_dataset)
142
# The `id` column can be dropped since each row is unique
x_train, y_train = train_dataset.iloc[:, 2:], train_dataset.iloc[:, 1]
x_test, y_test = test_dataset.iloc[:, 2:], test_dataset.iloc[:, 1]
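The split above can be sketched with the standard library alone: a 75/25 split amounts to shuffling row indices with a fixed seed and cutting the list. The 8-row dataset here is made up for illustration.

```python
import random

# Hypothetical 8-row dataset: shuffle the row indices, then cut at 75%
random.seed(1)  # fixed seed, analogous to `random_state=1` above
indices = list(range(8))
random.shuffle(indices)
cut = int(0.75 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

# The two index sets partition the rows with no overlap
assert set(train_idx).isdisjoint(test_idx)
print(len(train_idx), len(test_idx))  # 6 2
```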
Preprocess the data
This dataset contains the mean, standard error, and largest values for each of the 10 tumor measurements collected per example. The "diagnosis" target column is a categorical variable with 'M' indicating a malignant tumor and 'B' indicating a benign tumor diagnosis. This column needs to be converted into a numerical binary format for model training.
The pandas.Series.map function is useful for mapping binary values to the categories.
The dataset should also be converted to a tensor with the tf.convert_to_tensor function after the preprocessing is completed.
y_train, y_test = y_train.map({'B': 0, 'M': 1}), y_test.map({'B': 0, 'M': 1})
x_train, y_train = tf.convert_to_tensor(x_train, dtype=tf.float32), tf.convert_to_tensor(y_train, dtype=tf.float32)
x_test, y_test = tf.convert_to_tensor(x_test, dtype=tf.float32), tf.convert_to_tensor(y_test, dtype=tf.float32)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1723689945.265757 132290 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
Use seaborn.pairplot to review the joint distribution of a few pairs of mean-based features from the training set and observe how they relate to the target:
sns.pairplot(train_dataset.iloc[:, 1:6], hue = 'diagnosis', diag_kind='kde');
This pairplot demonstrates that certain features such as radius, perimeter, and area are highly correlated. This is expected since the tumor radius is directly involved in the computation of both perimeter and area. Additionally, note that malignant diagnoses seem to be more right-skewed for many of the features.
Make sure to also check the overall statistics. Note how each feature covers a vastly different range of values.
train_dataset.describe().transpose()[:10]
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| id | 427.0 | 2.756014e+07 | 1.162735e+08 | 8670.00000 | 865427.500000 | 905539.00000 | 8.810829e+06 | 9.113205e+08 |
| radius_mean | 427.0 | 1.414331e+01 | 3.528717e+00 | 6.98100 | 11.695000 | 13.43000 | 1.594000e+01 | 2.811000e+01 |
| texture_mean | 427.0 | 1.924468e+01 | 4.113131e+00 | 10.38000 | 16.330000 | 18.84000 | 2.168000e+01 | 3.381000e+01 |
| perimeter_mean | 427.0 | 9.206759e+01 | 2.431431e+01 | 43.79000 | 75.235000 | 86.87000 | 1.060000e+02 | 1.885000e+02 |
| area_mean | 427.0 | 6.563190e+02 | 3.489106e+02 | 143.50000 | 420.050000 | 553.50000 | 7.908500e+02 | 2.499000e+03 |
| smoothness_mean | 427.0 | 9.633618e-02 | 1.436820e-02 | 0.05263 | 0.085850 | 0.09566 | 1.050000e-01 | 1.634000e-01 |
| compactness_mean | 427.0 | 1.036597e-01 | 5.351893e-02 | 0.02344 | 0.063515 | 0.09182 | 1.296500e-01 | 3.454000e-01 |
| concavity_mean | 427.0 | 8.833008e-02 | 7.965884e-02 | 0.00000 | 0.029570 | 0.05999 | 1.297500e-01 | 4.268000e-01 |
| concave_poinits_mean | 427.0 | 4.872688e-02 | 3.853594e-02 | 0.00000 | 0.019650 | 0.03390 | 7.409500e-02 | 2.012000e-01 |
| symmetry_mean | 427.0 | 1.804597e-01 | 2.637837e-02 | 0.12030 | 0.161700 | 0.17840 | 1.947000e-01 | 2.906000e-01 |
Given the inconsistent ranges, it is beneficial to standardize the data such that each feature has a zero mean and unit variance. This process is called normalization.
class Normalize(tf.Module):
def __init__(self, x):
# Initialize the mean and standard deviation for normalization
self.mean = tf.Variable(tf.math.reduce_mean(x, axis=0))
self.std = tf.Variable(tf.math.reduce_std(x, axis=0))
def norm(self, x):
# Normalize the input
return (x - self.mean)/self.std
def unnorm(self, x):
# Unnormalize the input
return (x * self.std) + self.mean
norm_x = Normalize(x_train)
x_train_norm, x_test_norm = norm_x.norm(x_train), norm_x.norm(x_test)
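As a quick sanity check of what `Normalize` computes, here is a stdlib-only sketch with made-up values: after z-scoring, a column has (approximately) zero mean and unit standard deviation.

```python
import statistics

# Illustrative feature column (values are made up)
x = [143.5, 420.0, 553.5, 790.8, 2499.0]

mean = statistics.fmean(x)
std = statistics.pstdev(x)  # population std, matching tf.math.reduce_std
x_norm = [(v - mean) / std for v in x]

# The normalized column is centered with unit spread
assert abs(statistics.fmean(x_norm)) < 1e-9
assert abs(statistics.pstdev(x_norm) - 1.0) < 1e-9
```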
Logistic regression
Before building a logistic regression model, it is crucial to understand the method's differences compared to traditional linear regression.
Logistic regression fundamentals
Linear regression returns a linear combination of its inputs; this output is unbounded. The output of a logistic regression is in the (0, 1) range. For each example, it represents the probability that the example belongs to the positive class.
Logistic regression maps the continuous outputs of traditional linear regression, (-∞, ∞), to probabilities, (0, 1). This transformation is also symmetric so that flipping the sign of the linear output results in the inverse of the original probability.
Let Y denote the probability of being in class 1 (the tumor is malignant). The desired mapping can be achieved by interpreting the linear regression output as the log odds ratio of being in class 1 as opposed to class 0:
ln(Y/(1−Y)) = wX + b
By setting wX + b = z, this equation can then be solved for Y:
Y = e^z/(1 + e^z) = 1/(1 + e^(−z))
The expression 1/(1 + e^(−z)) is known as the sigmoid function σ(z). Hence, the equation for logistic regression can be written as Y = σ(wX + b).
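To make the inverse relationship concrete, here is a small stdlib sketch: the sigmoid and the log-odds (logit) functions undo each other, so the linear output z can be recovered from the probability.

```python
import math

def sigmoid(z):
    # 1 / (1 + e^(-z)), equivalently e^z / (1 + e^z)
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    # ln(p / (1 - p)): the inverse of the sigmoid
    return math.log(p / (1.0 - p))

z = 0.7
p = sigmoid(z)
print(round(log_odds(p), 6))  # 0.7 — the original linear output is recovered
```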
The dataset in this tutorial deals with a high-dimensional feature matrix. Therefore, the above equation must be rewritten in matrix vector form as follows:
Y = σ(Xw + b)
where:
- Y (m×1): a target vector
- X (m×n): a feature matrix
- w (n×1): a weight vector
- b: a bias
- σ: a sigmoid function applied to each element of the output vector
Start by visualizing the sigmoid function, which transforms the linear output, (-∞, ∞), to fall between 0 and 1. The sigmoid function is available in tf.math.sigmoid.
x = tf.linspace(-10, 10, 500)
x = tf.cast(x, tf.float32)
f = lambda x : (1/20)*x + 0.6
plt.plot(x, tf.math.sigmoid(x))
plt.ylim((-0.1,1.1))
plt.title("Sigmoid function");
The log loss function
The log loss, or binary cross-entropy loss, is the ideal loss function for a binary classification problem with logistic regression. For each example, the log loss quantifies the similarity between a predicted probability and the example's true value. It is determined by the following equation:
L = −(1/m) Σᵢ₌₁ᵐ [yᵢ · log(ŷᵢ) + (1 − yᵢ) · log(1 − ŷᵢ)]
where:
- ŷ: a vector of predicted probabilities
- y: a vector of true targets
You can use the tf.nn.sigmoid_cross_entropy_with_logits function to calculate the log loss. This function automatically applies the sigmoid activation to the regression output:
def log_loss(y_pred, y):
# Compute the log loss function
ce = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_pred)
return tf.reduce_mean(ce)
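For intuition about why the function takes logits rather than probabilities: per the TensorFlow documentation, `tf.nn.sigmoid_cross_entropy_with_logits` evaluates the numerically stable form `max(z, 0) - z*y + log(1 + exp(-|z|))`, which avoids computing `log(sigmoid(z))` directly for large-magnitude logits. A stdlib sketch checking that this matches the naive formula:

```python
import math

def stable_bce(y, z):
    # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def naive_bce(y, z):
    # Direct form: -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z)
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

# The two forms agree for moderate logits
for y in (0.0, 1.0):
    for z in (-3.0, 0.5, 4.0):
        assert abs(stable_bce(y, z) - naive_bce(y, z)) < 1e-9
```

The stable form still works where the naive form would overflow, e.g. for a confidently correct prediction `stable_bce(1.0, 50.0)` is a tiny positive number.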
The gradient descent update rule
The TensorFlow Core APIs support automatic differentiation with tf.GradientTape. If you are curious about the mathematics behind the logistic regression gradient updates, here is a short explanation:
In the above equation for the log loss, recall that each ŷᵢ can be rewritten in terms of the inputs as σ(Xᵢw + b).
The goal is to find a w and b that minimize the log loss:
L = −(1/m) Σᵢ₌₁ᵐ [yᵢ · log(σ(Xᵢw + b)) + (1 − yᵢ) · log(1 − σ(Xᵢw + b))]
By taking the gradient of L with respect to w, you get the following:
∂L/∂w = (1/m) (σ(Xw + b) − y) X
By taking the gradient of L with respect to b, you get the following:
∂L/∂b = (1/m) Σᵢ₌₁ᵐ (σ(Xᵢw + b) − yᵢ)
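The gradient formulas above can be verified numerically. The following stdlib-only sketch (with a made-up one-feature dataset) compares the analytic gradients to central finite differences of the loss:

```python
import math

# Made-up one-feature dataset
X = [0.5, -1.2, 2.0, 0.3]
y = [1.0, 0.0, 1.0, 0.0]
w, b = 0.4, -0.1
m = len(X)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b):
    # Mean log loss over the dataset
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(w * xi + b)
        total -= yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    return total / m

# Analytic gradients from the formulas above
grad_w = sum((sigmoid(w * xi + b) - yi) * xi for xi, yi in zip(X, y)) / m
grad_b = sum(sigmoid(w * xi + b) - yi for xi, yi in zip(X, y)) / m

# Central finite differences should agree closely
eps = 1e-6
fd_w = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
fd_b = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)
assert abs(fd_w - grad_w) < 1e-8
assert abs(fd_b - grad_b) < 1e-8
```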
Now, build the logistic regression model.
class LogisticRegression(tf.Module):
def __init__(self):
self.built = False
def __call__(self, x, train=True):
# Initialize the model parameters on the first call
if not self.built:
# Randomly generate the weights and the bias term
rand_w = tf.random.uniform(shape=[x.shape[-1], 1], seed=22)
rand_b = tf.random.uniform(shape=[], seed=22)
self.w = tf.Variable(rand_w)
self.b = tf.Variable(rand_b)
self.built = True
# Compute the model output
z = tf.add(tf.matmul(x, self.w), self.b)
z = tf.squeeze(z, axis=1)
if train:
return z
return tf.sigmoid(z)
To validate, make sure the untrained model outputs values in the range of (0, 1) for a small subset of the training data.
log_reg = LogisticRegression()
y_pred = log_reg(x_train_norm[:5], train=False)
y_pred.numpy()
array([0.9994985 , 0.9978607 , 0.29620072, 0.01979049, 0.3314926 ],
dtype=float32)
Next, write an accuracy function to calculate the proportion of correct classifications during training. In order to retrieve classifications from the predicted probabilities, set a threshold for which all probabilities higher than the threshold belong to class 1. This is a configurable hyperparameter that can be set to 0.5 as a default.
def predict_class(y_pred, thresh=0.5):
# Return a tensor with `1` if `y_pred` > `0.5`, and `0` otherwise
return tf.cast(y_pred > thresh, tf.float32)
def accuracy(y_pred, y):
# Return the proportion of matches between `y_pred` and `y`
y_pred = tf.math.sigmoid(y_pred)
y_pred_class = predict_class(y_pred)
check_equal = tf.cast(y_pred_class == y,tf.float32)
acc_val = tf.reduce_mean(check_equal)
return acc_val
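As a tiny worked example of the thresholding step (the probabilities and labels here are made up), classifying at 0.5 and scoring matches looks like this:

```python
# Hypothetical predicted probabilities and true labels
probs  = [0.9, 0.4, 0.6, 0.2, 0.51]
labels = [1,   0,   1,   0,   0   ]

# Everything strictly above the 0.5 threshold is assigned class 1
preds = [1 if p > 0.5 else 0 for p in probs]

# Accuracy is the proportion of predictions matching the labels
accuracy = sum(int(p == l) for p, l in zip(preds, labels)) / len(labels)
print(accuracy)  # 0.8 — four of the five predictions match
```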
Train the model
Using mini-batches for training provides both memory efficiency and faster convergence. The tf.data.Dataset API has useful functions for batching and shuffling. The API enables you to build complex input pipelines from simple, reusable pieces.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train_norm, y_train))
train_dataset = train_dataset.shuffle(buffer_size=x_train.shape[0]).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test_norm, y_test))
test_dataset = test_dataset.shuffle(buffer_size=x_test.shape[0]).batch(batch_size)
Now, write a training loop for the logistic regression model. The loop utilizes the log loss function and its gradients with respect to the input in order to iteratively update the model's parameters.
# Set training parameters
epochs = 200
learning_rate = 0.01
train_losses, test_losses = [], []
train_accs, test_accs = [], []
# Set up the training loop and begin training
for epoch in range(epochs):
batch_losses_train, batch_accs_train = [], []
batch_losses_test, batch_accs_test = [], []
# Iterate over the training data
for x_batch, y_batch in train_dataset:
with tf.GradientTape() as tape:
y_pred_batch = log_reg(x_batch)
batch_loss = log_loss(y_pred_batch, y_batch)
batch_acc = accuracy(y_pred_batch, y_batch)
# Update the parameters with respect to the gradient calculations
grads = tape.gradient(batch_loss, log_reg.variables)
for g,v in zip(grads, log_reg.variables):
v.assign_sub(learning_rate * g)
# Keep track of batch-level training performance
batch_losses_train.append(batch_loss)
batch_accs_train.append(batch_acc)
# Iterate over the testing data
for x_batch, y_batch in test_dataset:
y_pred_batch = log_reg(x_batch)
batch_loss = log_loss(y_pred_batch, y_batch)
batch_acc = accuracy(y_pred_batch, y_batch)
# Keep track of batch-level testing performance
batch_losses_test.append(batch_loss)
batch_accs_test.append(batch_acc)
# Keep track of epoch-level model performance
train_loss, train_acc = tf.reduce_mean(batch_losses_train), tf.reduce_mean(batch_accs_train)
test_loss, test_acc = tf.reduce_mean(batch_losses_test), tf.reduce_mean(batch_accs_test)
train_losses.append(train_loss)
train_accs.append(train_acc)
test_losses.append(test_loss)
test_accs.append(test_acc)
if epoch % 20 == 0:
print(f"Epoch: {epoch}, Training log loss: {train_loss:.3f}")
Epoch: 0, Training log loss: 0.661
Epoch: 20, Training log loss: 0.418
Epoch: 40, Training log loss: 0.269
Epoch: 60, Training log loss: 0.178
Epoch: 80, Training log loss: 0.137
Epoch: 100, Training log loss: 0.116
Epoch: 120, Training log loss: 0.106
Epoch: 140, Training log loss: 0.096
Epoch: 160, Training log loss: 0.094
Epoch: 180, Training log loss: 0.089
Performance evaluation
Observe the changes in your model's loss and accuracy over time.
plt.plot(range(epochs), train_losses, label = "Training loss")
plt.plot(range(epochs), test_losses, label = "Testing loss")
plt.xlabel("Epoch")
plt.ylabel("Log loss")
plt.legend()
plt.title("Log loss vs training iterations");
plt.plot(range(epochs), train_accs, label = "Training accuracy")
plt.plot(range(epochs), test_accs, label = "Testing accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy (%)")
plt.legend()
plt.title("Accuracy vs training iterations");
print(f"Final training log loss: {train_losses[-1]:.3f}")
print(f"Final testing log Loss: {test_losses[-1]:.3f}")
Final training log loss: 0.089
Final testing log Loss: 0.077
print(f"Final training accuracy: {train_accs[-1]:.3f}")
print(f"Final testing accuracy: {test_accs[-1]:.3f}")
Final training accuracy: 0.968
Final testing accuracy: 0.979
The model demonstrates a high accuracy and a low loss when it comes to classifying tumors in the training dataset and also generalizes well to the unseen test data. To go one step further, you can explore error rates that give more insight beyond the overall accuracy score. The two most popular error rates for binary classification problems are the false positive rate (FPR) and the false negative rate (FNR).
For this problem, the FPR is the proportion of malignant tumor predictions amongst tumors that are actually benign. Conversely, the FNR is the proportion of benign tumor predictions among tumors that are actually malignant.
Compute a confusion matrix using sklearn.metrics.confusion_matrix, which evaluates the accuracy of the classification, and use matplotlib to display the matrix:
def show_confusion_matrix(y, y_classes, typ):
# Compute the confusion matrix and normalize it
plt.figure(figsize=(10,10))
confusion = sk_metrics.confusion_matrix(y.numpy(), y_classes.numpy())
confusion_normalized = confusion / confusion.sum(axis=1, keepdims=True)
axis_labels = range(2)
ax = sns.heatmap(
confusion_normalized, xticklabels=axis_labels, yticklabels=axis_labels,
cmap='Blues', annot=True, fmt='.4f', square=True)
plt.title(f"Confusion matrix: {typ}")
plt.ylabel("True label")
plt.xlabel("Predicted label")
y_pred_train, y_pred_test = log_reg(x_train_norm, train=False), log_reg(x_test_norm, train=False)
train_classes, test_classes = predict_class(y_pred_train), predict_class(y_pred_test)
show_confusion_matrix(y_train, train_classes, 'Training')
show_confusion_matrix(y_test, test_classes, 'Testing')
Observe the error rate measurements and interpret their significance in the context of this example. In many medical testing studies such as cancer detection, having a high false positive rate to ensure a low false negative rate is perfectly acceptable and in fact encouraged, since the risk of missing a malignant tumor diagnosis (false negative) is a lot worse than misclassifying a benign tumor as malignant (false positive).
In order to control for the FPR and FNR, try changing the threshold hyperparameter before classifying the probability predictions. A lower threshold increases the model's overall chances of making a malignant tumor classification. This inevitably increases the number of false positives and the FPR, but it also helps to reduce the number of false negatives and the FNR.
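This trade-off can be sketched with made-up probabilities: lowering the threshold raises the FPR while lowering the FNR.

```python
# Made-up predicted probabilities and true labels
probs  = [0.2, 0.4, 0.45, 0.6, 0.7, 0.9]
labels = [0,   0,   1,    0,   1,   1  ]

def rates(thresh):
    # Return (FPR, FNR) at a given classification threshold
    preds = [1 if p > thresh else 0 for p in probs]
    fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
    return fp / labels.count(0), fn / labels.count(1)

print(rates(0.5))  # FPR/FNR at the default threshold
print(rates(0.3))  # lower threshold: FPR goes up, FNR goes down
```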
Save the model
Start by making an export module that takes in raw data and performs the following operations:
- Normalization
- Probability prediction
- Class prediction
class ExportModule(tf.Module):
def __init__(self, model, norm_x, class_pred):
# Initialize pre- and post-processing functions
self.model = model
self.norm_x = norm_x
self.class_pred = class_pred
@tf.function(input_signature=[tf.TensorSpec(shape=[None, None], dtype=tf.float32)])
def __call__(self, x):
# Run the `ExportModule` for new data points
x = self.norm_x.norm(x)
y = self.model(x, train=False)
y = self.class_pred(y)
return y
log_reg_export = ExportModule(model=log_reg,
norm_x=norm_x,
class_pred=predict_class)
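The export module is essentially function composition: normalize, score, then threshold. A stdlib sketch of the same pattern (all components here are illustrative stand-ins, not the trained model):

```python
import math

def make_pipeline(normalize, model, to_class):
    # Compose preprocessing, model, and postprocessing into one callable
    def pipeline(x):
        return to_class(model(normalize(x)))
    return pipeline

# Stand-in components (made up for illustration)
normalize = lambda x: (x - 5.0) / 2.0         # z-score with assumed stats
model = lambda z: 1.0 / (1.0 + math.exp(-z))  # sigmoid score
to_class = lambda p: 1 if p > 0.5 else 0      # threshold at 0.5

predict = make_pipeline(normalize, model, to_class)
print(predict(9.0), predict(1.0))  # 1 0
```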
If you want to save the model in its current state, you can do so with tf.saved_model.save. To load a saved model and make predictions, use tf.saved_model.load.
models = tempfile.mkdtemp()
save_path = os.path.join(models, 'log_reg_export')
tf.saved_model.save(log_reg_export, save_path)
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp9k_sar52/log_reg_export/assets
log_reg_loaded = tf.saved_model.load(save_path)
test_preds = log_reg_loaded(x_test)
test_preds[:10].numpy()
array([1., 1., 1., 1., 0., 1., 1., 1., 1., 1.], dtype=float32)
Conclusion
This notebook introduced a few techniques to handle a logistic regression problem. Here are a few more tips that may help:
- The TensorFlow Core APIs can be used to build machine learning workflows with high levels of configurability.
- Analyzing error rates is a great way to gain more insight about a classification model's performance beyond its overall accuracy score.
- Overfitting is another common problem for logistic regression models, though it wasn't a problem for this tutorial. Visit the Overfit and underfit tutorial for more help with this.
For more examples of using the TensorFlow Core APIs, check out the guide. To learn more about loading and preparing data, see the tutorials on image data loading or CSV data loading.