python3

date: 2021-05-02 excerpt: python3チートシート

tag: python python3 チートシート str utf8 unicode

python3チートシート

N進数表現

10進数を文字列のN進数に変換する

format(5, "08b") # '00000101'
format(10, "o") # '12'
format(10, "x") # 'a'

一部だけ知りたいならば、bit演算で求めたほうが早い

N進数の文字列を10進数の数値に変換する

int("10", 2) # 2
int("10", 8) # 8
int("10", 16) # 16

zip(転置)

x = [[1,2,3], [4,5,6], [7,8,9]]
list(zip(*x)) # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

argparse

基本的な使用例

import argparse
parser = argparse.ArgumentParser()

# 名前付き引数, e.g. python3 a.py --echo=foo
parser.add_argument("--echo")
# デフォルトの値が0で、int指定
# 指定と異なった値を入れるとエラー
parser.add_argument("-p", "--param", type=int, default=0)
# デフォルトでFalse, 指定されるとTrue
parser.add_argument("-v", "--verbose", help="increase output verbosity",
                    action="store_true")

args = parser.parse_args()
print(args.echo)
print(args.param)
print(args.verbose)

Argparse Tutorial

path操作

`os.path`で現在のワーキングディレクトリを得る

jupyterや対話型インタプリタで実行していると得られない変数__file__を利用し、かつ、locals関数で例外をハンドルする

cwd = os.path.dirname(__file__) if "__file__" in locals() else os.getcwd()

`os.path.join`の挙動

__file__ = os.getcwd()
__file__ # '/foo/bar'
os.path.join(__file__, "../..") # '/foo/bar/../..'

`os.walk`のように使う`path.rglob`

再帰的にglobするのに用いることができる
ネストしたディレクトリをすべてスキャンすることができる

image_names = []
for path in tqdm(Path(images_dir).rglob("*")):
    if path.is_dir():
        continue
    image_names.append(path)

拡張子(extention)を削除する

Path.with_suffix関数を用いる

from pathlib import Path
print(Path("/tmp/example.pdf").with_suffix("")) # /tmp/example が出力される

ファイルの統計情報

`Path.stat()`, `os.stat()`

Path.stat()とos.stat()は同じである

from pathlib import Path
import os

Path("a").stat() # st_mode, st_ino, st_dev, st_link, st_uidのデータクラスが得られる
os.stat("a")

標準入出力

`input()`以外の標準出力のスキャン方法

sys.stdin.readline()を用いると高速にスキャンできる
- この速度差はおおよそ3倍である

readline = sys.stdin.readline
x = list(map(int, readline().split()))

可変長引数を受け取って変数を宣言する方法

*A, = map(int, "1 2 3 4 5".split())
print(A) # [1, 2, 3, 4, 5]

例外

複数の型の例外をワンラインでキャッチする

except (zlib.error, gzip.BadGzipFile) as exc:
    pass

算術演算

エクスポネンシャル表記

1e+4や1e-3などで10000.0や0.001を表現できる

display(1e-2) # 0.01
display(1e-1) # 0.1
display(1e+0) # 1.0
display(1e+1) # 10.0
display(1e+2) # 100.0
display(1e+3) # 1000.0

exponential-example

`pow()`関数

AをN乗するときに、とても大きな数になるとき、MODを取ることを許容するととても高速に計算できる
具体的には、繰り返し二乗法という方法で計算が行われている
計算量がn回かけるとき、単純に計算するとO(n)であるが、O(log n)に削減できる

MOD=998244353
pow(13, 9999999, MOD) # 早い, 796122862
13**9999999%MOD # 遅い, 796122862

`itertools.count()`関数

無限カウンタを作る
初期値をitertools.count(n)でnを与えることができる

`math.comb`関数

import math
n=7
k=5
print(math.comb(n, k)) # 21

高速な`combination`の実装(mod付き)

競技プログラミング等で実装する基礎方針になる

def cmb(n, r, mod):
    if (r < 0) or (n < r):
        return 0
    r = min(r, n - r)
    return fact[n] * factinv[r] * factinv[n-r] % mod
mod = 10 ** 9 + 7
N = 10 ** 6  # N は必要分だけ用意する
fact = [1, 1]  # fact[n] = (n! mod p)
factinv = [1, 1]  # factinv[n] = ((n!)^(-1) mod p)
inv = [0, 1]  # factinv 計算用

for i in range(2, N + 1):
    fact.append((fact[-1] * i) % mod)
    inv.append((-inv[mod % i] * (mod // i)) % mod)
    factinv.append((factinv[-1] * inv[-1]) % mod)
a,b = map(int,input().split())
a-=1; b-=1
print(cmb(a+b, b, mod))

`math.radians`関数

360°で表現される角度からmath.cos, math.sinで計算できるラジアンを返す

math.radians(360/n) # n等分したラジアン  

`math.atan2`関数

ベクトルを引数にその角度が構成しているラジアンを得る
引数の順序がmath.atan2(y,x)なので注意

math.atan2(2,0) == math.pi/2 # true

`array`

list構造とは別にarrayというものがある
insertなどの操作がlistより3倍程度高速

import array

a = array.array("i", [0, 1, 2])

colab

文字列関連

capitalize(最初の文字をUppercaseにする)

print("i am a hero.".capitalize())
# I am a hero.
print(" i am a hero.".capitalize())
#  i am a hero.

`string`モジュール

英語小文字、大文字、数字、標準でサポートされるascii文字などの一覧を得られる

import string

print(string.ascii_letters)  # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.ascii_uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits)  # 0123456789
print(string.hexdigits)  # 0123456789abcdefABCDEF
print(string.octdigits)  # 01234567
print(string.punctuation)  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(
    string.printable
)  # 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

並列化

multiprocessing.Poolを用いた並列化

from multiprocessing import Pool, cpu_count
from tqdm.auto import tqdm
import time
import os

def run(arg):
    time.sleep(1)
    print(os.getpid(), arg)
    return arg*arg

args = list(range(10))
with Pool(cpu_count()) as p:
    *ret, = tqdm(p.imap(run, args), total=len(args), desc="work1")
print(ret)

collectionsのデータ型

ネストした`collections.defaultdict`

adj = collections.defaultdict(lambda :collections.defaultdict(int))

# e.g.
adj[1][2] += 1

モジュール化

`init.py`の書き方

`src`などプロジェクトレベルのディレクトリで他のライブラリに公開する時

.(ピリオド)は__init__.pyから見た相対のパスを意味する
作成しているライブラリの__init__.pyで他のライブラリに使わせる関数名、クラス名を露出させる役割を持つ

from .myfunc import my_add
from .myfunc import my_devide

`tests`から`src`プロジェクトを利用する時

こちらの目的は暗黙的に最初に実行されるファイルとしての側面が強い
相対ディレクトリのパスのインポート等が目的になる

import sys
import os

cwd = os.path.dirname(__file__) if "__file__" in locals() else os.getcwd()
lib = os.path.join(cwd, "../src")
sys.path.insert(0, lib)

テスト

デフォルトのユニットテストライブラリ`unittest`

/python-unittest/

`importlib`ライブラリ

実行しているpythonに特定のモジュールがインストールされているかチェックすることができる
モジュールがインストールされているとModuleSpecオブジェクトが返り、インストールされていないとNoneが返る

>>> import importlib
>>> importlib.util.find_spec("math")
ModuleSpec(name='math', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in')
>>> importlib.util.find_spec("something-not-exists")
>>>

docstringの書き方

docstringの例

いくつかの方言があるが、google styleが最も整理されている
インデントはspace 4
function
- 関数の下に説明を書く
- Args:の後に引数の説明を書く
- Returns:の後に戻り値を書く
class
- クラスの直下に説明を書く
- Attributes:の後にアトリビュートの説明を書く
- Methods:の後にメソッドの概要を書く

function

def add_binary(a, b):
    """
    Returns the sum of two decimal numbers in binary digits.
	...

    Args:
        a (int): 
			A decimal integer
        b (int): 
			Another decimal integer
    Returns:
        str: 
			Binary string of the sum of a and b
    """
    binary_sum = bin(a+b)[2:]
    return binary_sum
print(add_binary.__doc__)

class

class Person:
    """
    A class to represent a person.
    ...

    Attributes: 
		name (str): 
			first name of the person
		surname (str): 
			family name of the person
		age (int): 
			age of the person

    Methods:
		info(additional=""):
			Prints the person's name and age.
    """

classの設計

static class

シングルトンクラスを設計する際や簡単に隠蔽したい時などはstatic classが便利
明示的に@staticmethodをつけると良い

class Mochi:
    a: int = 0
    b: str = "おもち"
    @staticmethod
    def get():
        return Mochi.b
    @staticmethod
    def set(x):
        Mochi.b = x
print(Mochi.get()) # おもち
Mochi.set("ねこ")
print(Mochi.get()) # ねこ

class method

自身のインスタンスを返すスタティックメソッド

class Neko:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __str__(self):
        return f"{self.a}と{self.b}"
    @classmethod
    def from_string(cls, x):
        a, b = map(int, x.split("-"))
        neko = Neko(a, b)
        return neko

print(Neko(10, 20)) # 10と20
print(Neko.from_string("3-7")) # 3と7

関数デコレータ

@で装飾して関数の機能を装飾するもの
以下は時間測定の例

import timeit

def timer(function):
  def wrap():
    start_time = timeit.default_timer()
    function()
    elapsed = timeit.default_timer() - start_time
    print(f'"{function.__name__}" took {elapsed:0.06f} seconds')
  return wrap

@timer
def something():
    for i in range(10**7):
        i*i*i
something() # "something" took 1.124430 seconds

基本的なデータ構造

list

l = [1, 2, 3, 4, 1]
l.count(1) # 2, 指定された値が何個あるか 
l.index(1) # 0, 指定された値が最初に何番目のindexにあるか

iterator

リストを回すとO(N)の計算量がかかるがiteratorの形で宣言するとO(1)に圧縮できる
リストの一部だけ必要で、すべてをスキャンする必要がないとき、iteratorで宣言すると早い

%%time
obj = [r for r in range(10**7) if r%2 == 0 and r > 1000]
print(obj[0]) # 1.27 s

%%time
obj = iter(r for r in range(10**7) if r%2 == 0 and r > 1000)
print(next(obj)) # 65.8 ms

実験

シリアライズ

pickle

pythonのデフォルトのシリアライズ
jsonなどよりも多少効率が良いのと型情報を保存できる
classや関数を含む情報のシリアライズをできるが、lambdaはできないなどの制約がある

import pickle
import collections

wc = collections.defaultdict(lambda :[0, 1, 2])
pickle.dumps(wc) # シリアライズできない

def func():
    return [0, 1, 2]
wc = collections.defaultdict(func)
pickle.dumps(wc)

実験

str型とunicodeの関係

それぞれの関係

bytes型が基底としてあり、それを文字コードに従って特殊化する形でstr型がある
- 標準ではbytes型をutf8として解釈することでstr型が作成される
- 明示的に文字コードを指定して再解釈させることも可能
utf8のstrを.encode('unicode-escape')することでエスケープされたbytes型のunicode文字を得られる

jp = "おはよう桃ちゃん!"
print(jp)


# unicodeのrawでエスケープされたunicodeを得られる
unicode_jp = jp.encode("unicode-escape")
print(unicode_jp) # b'\\u304a\\u306f\\u3088\\u3046\\u6843\\u3061\\u3083\\u3093!'

# もとに戻せる
print(unicode_jp.decode("unicode-escape")) # おはよう桃ちゃん!

# ユニコードがそのままテキストとして得られる
print(unicode_jp.decode("utf8")) # \u304a\u306f\u3088\u3046\u6843\u3061\u3083\u3093!

contextmanagerの使い方

withステートメント等を作ることができる
- 典型例としてはインスタンスを引数で受け取る -> インスタンスのクリーンアップ
- 添付している例はステートメントの実行時間を測定する

from contextlib import contextmanager
from loguru import logger
import time
from typing import Any

@contextmanager
def closing(any: Any = None):
    tlog = None
    try:
        tlog = time.time()
        yield None # ここで受け取ったインスタンスを返す
    finally:
        elp = time.time() - tlog
        logger.info(f"exit, elapsed = {elp}") 
        if any:
           del any # インスタンス引数があればそれを削除

with closing():
    time.sleep(1.0)

datetimeとフォーマットコード

パースや文字列として出力する際にフォーマットコードが必要

python3のドキュメントにまとまっている
- strftime() and strptime() Format Codes