Back
Close

Computing with Data

elgeish
34.9K views

The pandas Package

Pandas is a package that provides an implementation of a dataframe object that can be used to store datasets (like the ones in R):

import pandas as pd
import numpy as np
# Example: how to create a dataframe
data = {
"names": ["John", "Jane", "George"],
"age": [25, 35, 52],
"height": [68.1, 62.5, 60.5],
}
df = pd.DataFrame(data)
print("dataframe content =\n" + str(df))
print("dataframe types =\n" + str(df.dtypes))
# Example: accessing data in a dataframe
df["age"] = 35 # assign 35 to all age values
print("age column =\n" + str(df["age"]))
print("height column =\n" + str(df.height))
print("second row =\n" + str(df.ix[1]))
# Example: adding a new column
df = pd.DataFrame(data)
df["weight"] = [170.2, 160.7, 185.5]
print(df)
# Example: the median function
df = pd.DataFrame(data)
print("medians of columns =\n" + str(df.median()))
print("medians of rows =\n" + str(df.median(axis=1)))
# Example: apply f(x) = x + 1 to all columns
data = {
"age": [25.2, 35.4, 52.1],
"height": [68.1, 62.5, 60.5],
"weight": [170.2, 160.7, 185.5],
}
df = pd.DataFrame(data)
print(df.apply(lambda z: z + 1))
# Example: working with missing data
data = {
"age" : [25.2, np.nan, np.nan],
"height" : [68.1, 62.5, 60.5],
"weight" : [170.2, np.nan, 185.5],
}
df = pd.DataFrame(data)
# NA stands for Not Available
print("column means (NA skipped):")
print(str(df.mean()))
print("column means: (NA not skipped)")
print(str(df.mean(skipna=False)))
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Create your playground on Tech.io
This playground was created on Tech.io, our hands-on, knowledge-sharing platform for developers.
Go to tech.io
codingame x discord
Join the CodinGame community on Discord to chat about puzzle contributions, challenges, streams, blog articles - all that good stuff!
JOIN US ON DISCORD
Online Participants