Python exercises and solutions PDF

Title Python exercises and solutions
Author Kanan Mammadzada
Course Storia della fotografia
Institution Universidad del Sol


02.11.2018

Lesson05 (1)

Lesson 5: Pandas (I)

Third-party modules

pandas provides high-performance, easy-to-use data structures. The primary object in pandas is the DataFrame, an Excel-like, two-dimensional, column-oriented tabular data structure with both row and column labels.

matplotlib is the most popular Python library for producing plots and other 2D data visualizations.

NumPy, short for Numerical Python, is the foundational package for scientific computing in Python. Among other things, it provides an efficient array data structure for numerical work, e.g., manipulating matrices and generating random numbers.

SciPy contains additional routines needed in scientific work, for example routines for computing integrals numerically, solving differential equations, and optimization. Specialized tools for machine learning are available in the separate scikit-learn package, which builds on NumPy and SciPy.

The import statement

Before you can use the functions of a module, you need to import the module. The basic way to do this is to use an import statement that consists of the following:

- The import keyword
- The name of the module
- Optionally, more module names, as long as they are separated by commas
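The forms described above can be sketched with standard-library modules, so the example runs without installing anything. The alias form (`import ... as ...`) is the one this lesson uses for pandas; the variable names `root` and `avg` are chosen only for illustration.

```python
import math                  # import a single module
import os, sys               # several modules, separated by commas
import statistics as stats   # import a module under a shorter alias

# Functions are then accessed as module.function (or alias.function).
root = math.sqrt(16)
avg = stats.mean([1, 3, 5, 7, 11])

print(root)
print(avg)
```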

The Pandas module

In [1]: import pandas as pd

Series

A Pandas Series is a one-dimensional array (a vector) whose values can be integers, floats or strings, among other types. It also carries an index, which by default starts at 0.

http://localhost:8888/notebooks/Lesson05%20(1).ipynb



In [2]: FivePrimes = [1,3,5,7,11]
        s1 = pd.Series(data = FivePrimes)
        s1
Out[2]:
0     1
1     3
2     5
3     7
4    11
dtype: int64

In [4]: YesAndNo = ['Yes','Yes','No','Yes','No']
        s2 = pd.Series(data = YesAndNo)
        s2
Out[4]:
0    Yes
1    Yes
2     No
3    Yes
4     No
dtype: object

In [5]: type(s1)
Out[5]: pandas.core.series.Series

In [6]: type(s2)
Out[6]: pandas.core.series.Series
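The default index runs from 0 upwards, but a Series can also be given explicit labels. The following is a minimal sketch, assuming hypothetical labels 'a' to 'e' that do not appear in the lesson:

```python
import pandas as pd

# The labels 'a'..'e' are made up for illustration; they replace the default 0..4 index.
s3 = pd.Series(data=[1, 3, 5, 7, 11], index=['a', 'b', 'c', 'd', 'e'])

first = s3['a']           # access a value by its label
labels = list(s3.index)   # the index is stored alongside the values

print(first)
print(labels)
```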

Dataframe

A Pandas dataframe represents a tabular, spreadsheet-like data structure containing columns, rows, column names, and an index. Each column can have a different data type (numeric or string). New columns and rows can easily be added to the dataframe. In addition, dataframes can easily be exported to and imported from CSV, Excel, JSON, HTML and SQL databases.


In [7]: df = pd.DataFrame(data = list(zip(FivePrimes, YesAndNo)), columns = ['x','y'])
        df
Out[7]:
    x    y
0   1  Yes
1   3  Yes
2   5   No
3   7  Yes
4  11   No

In [7]: type(df)
Out[7]: pandas.core.frame.DataFrame
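Adding columns and rows to a dataframe might look like the sketch below. The column name `x_squared` and the values of the new row are assumptions made up for this example; `pd.concat` is one of several ways to append a row.

```python
import pandas as pd

# Rebuild the lesson's dataframe so the block runs on its own.
df = pd.DataFrame(data=list(zip([1, 3, 5, 7, 11], ['Yes', 'Yes', 'No', 'Yes', 'No'])),
                  columns=['x', 'y'])

# Add a column computed from an existing one ('x_squared' is a hypothetical name).
df['x_squared'] = df['x'] ** 2

# Add a row by concatenating a one-row dataframe (the values are made up).
new_row = pd.DataFrame({'x': [13], 'y': ['Yes'], 'x_squared': [169]})
df = pd.concat([df, new_row], ignore_index=True)

print(df.shape)
```

Passing `ignore_index=True` renumbers the rows 0 to 5, so the appended row does not keep its own index label 0.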

Exporting and importing CSV and Excel files

In [11]: df.to_csv('SimpleData.csv', index=False)
         df.to_excel('SimpleData.xlsx', index=False)

In [12]: df = pd.read_csv('SimpleData.csv', delimiter = ',')
         df
Out[12]:
    x    y
0   1  Yes
1   3  Yes
2   5   No
3   7  Yes
4  11   No
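To see what the CSV export actually contains, the round trip can be done in memory instead of on disk. This is a minimal sketch, assuming an in-memory buffer rather than the SimpleData.csv file used above; the names `csv_text` and `df_back` are illustrative.

```python
import io
import pandas as pd

# Rebuild the lesson's dataframe so the block runs on its own.
df = pd.DataFrame({'x': [1, 3, 5, 7, 11], 'y': ['Yes', 'Yes', 'No', 'Yes', 'No']})

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# read_csv accepts any file-like object, so the text is wrapped in StringIO.
df_back = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(df_back)
```

With `index=False` the row labels are not written, which is why reading the text back yields a fresh default index.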


In [13]: df = pd.read_excel('SimpleData.xlsx')
         df
Out[13]:
    x    y
0   1  Yes
1   3  Yes
2   5   No
3   7  Yes
4  11   No

In [13]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
x    5 non-null int64
y    5 non-null object
dtypes: int64(1), object(1)
memory usage: 160.0+ bytes

In [14]: df.shape
Out[14]: (5, 2)

In [14]: df.head(3)
Out[14]:
   x    y
0  1  Yes
1  3  Yes
2  5   No

In [15]: df.tail(3)
Out[15]:
    x    y
2   5   No
3   7  Yes
4  11   No
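The inspection calls above can be combined as in the following sketch. It relies on two facts about pandas: `shape` is a (rows, columns) tuple, and `head()` and `tail()` return five rows when no argument is given. The variable names are illustrative.

```python
import pandas as pd

# Rebuild the lesson's dataframe so the block runs on its own.
df = pd.DataFrame({'x': [1, 3, 5, 7, 11], 'y': ['Yes', 'Yes', 'No', 'Yes', 'No']})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
first_five = df.head()      # with no argument, head() returns the first 5 rows
last_two = df.tail(2)       # tail(n) returns the last n rows

print(n_rows, n_cols)
print(list(last_two.index))
```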


Selecting columns by name

Unlike a list, a Pandas dataframe can have several columns (variables), which are selected by name as follows.

In [16]: df['x']
Out[16]:
0     1
1     3
2     5
3     7
4    11
Name: x, dtype: int64

In [17]: df[['x','y']]
Out[17]:
    x    y
0   1  Yes
1   3  Yes
2   5   No
3   7  Yes
4  11   No
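The two selection forms above return different types: a single name gives a Series, while a list of names gives a DataFrame, even when the list has only one entry. A short sketch (the variable names are illustrative):

```python
import pandas as pd

# Rebuild the lesson's dataframe so the block runs on its own.
df = pd.DataFrame({'x': [1, 3, 5, 7, 11], 'y': ['Yes', 'Yes', 'No', 'Yes', 'No']})

as_series = df['x']     # a single column name: the result is a Series
as_frame = df[['x']]    # a list of names: the result is a DataFrame

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.shape)             # (5, 1)
```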

Selecting rows by indexing and slicing

As with lists, you can get single or multiple rows by indexing and slicing. For example:

In [18]: df[0:1]
Out[18]:
   x    y
0  1  Yes


In [19]: df[1:4]
Out[19]:
   x    y
1  3  Yes
2  5   No
3  7  Yes

In [20]: df[2:]
Out[20]:
    x    y
2   5   No
3   7  Yes
4  11   No

In [22]: df['x'][0]
Out[22]: 1

In [23]: df['x'][1:3]
Out[23]:
1    3
2    5
Name: x, dtype: int64

In [21]: df.loc[1:3,'x']
Out[21]:
1    3
2    5
3    7
Name: x, dtype: int64
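The last two outputs differ in length because a plain slice works by position and excludes its end, whereas `.loc` slices by label and includes its end. The sketch below puts the two side by side and adds `.iloc`, which is not used in the lesson but is the explicit position-based counterpart of `.loc`.

```python
import pandas as pd

# Rebuild the lesson's dataframe so the block runs on its own.
df = pd.DataFrame({'x': [1, 3, 5, 7, 11], 'y': ['Yes', 'Yes', 'No', 'Yes', 'No']})

by_position = df['x'][1:3]     # positions 1 and 2; the end position is excluded
by_label = df.loc[1:3, 'x']    # labels 1, 2 and 3; the end label is included
by_iloc = df.iloc[1:3]         # iloc slices rows by position, like a list

print(by_position.tolist())    # [3, 5]
print(by_label.tolist())       # [3, 5, 7]
print(by_iloc['x'].tolist())   # [3, 5]
```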

Selecting rows by filtering Suppose you want to select rows from a dataframe based on values in a column. For example, we can get all rows in which the values of column x are greater than 4 by writing:


In [22]: df[df['x'] > 4]
Out[22]:
    x    y
2   5   No
3   7  Yes
4  11   No
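It can help to look at the condition on its own before using it as a filter. In this sketch the comparison is stored in a variable (the name `mask` is chosen for illustration), which shows that it is simply a Series of True/False values, one per row:

```python
import pandas as pd

# Rebuild the lesson's dataframe so the block runs on its own.
df = pd.DataFrame({'x': [1, 3, 5, 7, 11], 'y': ['Yes', 'Yes', 'No', 'Yes', 'No']})

mask = df['x'] > 4     # a boolean Series, one True/False per row
selected = df[mask]    # keeps only the rows where the mask is True

print(mask.tolist())             # [False, False, True, True, True]
print(selected.index.tolist())   # [2, 3, 4]
```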

Within the square brackets, you can find the condition df['x'] > 4, which is fulfilled (or True, more about it later) for the rows indexed by 2 or greater. Furthermore, you can filter rows based on two or more conditions. While & means "and", i.e. all conditions must be fulfilled, | indicates "or", i.e. at least one condition must be fulfilled for a row to be selected.

In [27]: df[(df['x'] > 4) & (df['x'] 5) | (df['x']...
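Combining conditions with & and | might look like the sketch below. The thresholds and the choice of columns are assumptions made up for this example and are not the ones from the truncated cell above. Each condition needs its own parentheses, because & and | bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Rebuild the lesson's dataframe so the block runs on its own.
df = pd.DataFrame({'x': [1, 3, 5, 7, 11], 'y': ['Yes', 'Yes', 'No', 'Yes', 'No']})

# "and": both conditions must hold for a row to be kept (hypothetical conditions).
both = df[(df['x'] > 4) & (df['y'] == 'No')]

# "or": at least one condition must hold for a row to be kept (hypothetical thresholds).
either = df[(df['x'] < 3) | (df['x'] > 7)]

print(both.index.tolist())     # [2, 4]
print(either.index.tolist())   # [0, 4]
```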

