Recording and playing audio from Python with PyAudio


Posted by トビウオ on November 28, 2018, 18:09

Overview

Using the Python library PyAudio, I tried recording from a microphone, playing back through speakers, and analyzing the captured audio in real time.

Setup

When installing the library, note that it depends on other libraries.
Running just "pip install pyaudio" can stop partway through because a dependency is missing.
On Mac OS you therefore need "brew install portaudio" first, and on Linux "sudo apt-get install portaudio19-dev python-all-dev". See other people's articles for details.

Usage

First, create an instance of the PyAudio class. The PyAudio class does not implement __enter__ or __exit__, so you have to call its terminate() method explicitly when you are done with it.

import pyaudio
p = pyaudio.PyAudio()

# (do your PyAudio processing here)

p.terminate()
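
Since the class has no __enter__/__exit__, one way to avoid forgetting terminate() is a small wrapper. The following is a minimal sketch, not part of PyAudio: the helper name `terminating` is made up for this article, and it works with any object that has a terminate() method.

```python
from contextlib import contextmanager


@contextmanager
def terminating(audio):
    """Yield `audio`, then call its terminate() method on exit.

    `terminating` is a hypothetical helper, not a PyAudio API.
    """
    try:
        yield audio
    finally:
        # Runs even if the body of the with-block raises
        audio.terminate()


# Intended usage (requires PyAudio to be installed):
#   import pyaudio
#   with terminating(pyaudio.PyAudio()) as p:
#       print(p.get_device_count())
```

This mirrors contextlib.closing from the standard library, which calls close() rather than terminate().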

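Next, here is what you can do with a PyAudio instance.

Finding out which devices are available

You can get the list of audio devices usable for recording and playback.
Note, however, that the API only provides a method that returns the number of devices and a method that returns the information for the device at a given index; there is no method that returns the list of devices directly.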

for index in range(p.get_device_count()):
    print(p.get_device_info_by_index(index))
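
If you want something closer to a "list of devices" call, the two methods can be wrapped in a generator. This is a sketch under the assumption that the argument behaves like a PyAudio instance; `iter_devices` is a name invented here, not a PyAudio API.

```python
def iter_devices(audio):
    """Yield the info dict of every device known to a PyAudio-like object.

    `audio` only needs get_device_count() and get_device_info_by_index().
    """
    for index in range(audio.get_device_count()):
        yield audio.get_device_info_by_index(index)


# Intended usage with a real PyAudio instance `p`:
#   devices = list(iter_devices(p))
```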

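PyAudio is a wrapper around PortAudio, a cross-platform audio library. PortAudio calls

the kind of audio device a "Host API" (the major category)
an audio interface belonging to a Host API a "Device" (the minor category)

and manages devices under those names; PyAudio lets you refer to and work with them in the same way.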

for host_index in range(p.get_host_api_count()):
    host_api_info = p.get_host_api_info_by_index(host_index)
    print(host_api_info)
    for device_index in range(host_api_info['deviceCount']):
        print(p.get_device_info_by_host_api_device_index(host_index, device_index))

To make this more concrete, suppose the computer's audio devices are in the state shown in the table below.

host_api_index	device_index 1	device_index 2	name
0	0	0	A
0	1	1	B
0	2	2	C
1	0	3	D
1	1	4	E
2	0	5	F

# Returns the total number of Host APIs (3 here)
host_api_count = p.get_host_api_count()

# Returns the information for the specified Host API
host_api_info1 = p.get_host_api_info_by_index(0)
host_api_info2 = p.get_host_api_info_by_index(1)

# Read the number of Devices attached to a Host API from its information
device_count1 = host_api_info1['deviceCount']  # host_api_index=0, so this is 3
device_count2 = host_api_info2['deviceCount']  # host_api_index=1, so this is 2

# Returns the information for the specified Device of the specified Host API
# The second argument corresponds to "device_index 1" in the table above
device_info1 = p.get_device_info_by_host_api_device_index(0, 0)  # A
device_info2 = p.get_device_info_by_host_api_device_index(0, 1)  # B
device_info3 = p.get_device_info_by_host_api_device_index(1, 0)  # D

# Returns the total number of devices (6 here)
device_count = p.get_device_count()

# Returns the information for the specified device
# The argument corresponds to "device_index 2" in the table above
device_info4 = p.get_device_info_by_index(0)  # A
device_info5 = p.get_device_info_by_index(1)  # B
device_info6 = p.get_device_info_by_index(3)  # D
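
The two indexing schemes can also be checked without any audio hardware. The sketch below is only a model of the example table, not PyAudio itself; the function names are invented for illustration.

```python
# Rows of the example table:
# (host_api_index, "device_index 1", "device_index 2", name)
TABLE = [
    (0, 0, 0, "A"),
    (0, 1, 1, "B"),
    (0, 2, 2, "C"),
    (1, 0, 3, "D"),
    (1, 1, 4, "E"),
    (2, 0, 5, "F"),
]


def name_by_host_api_device_index(host_api_index, host_api_device_index):
    """Look up a device by Host API and its index within that Host API."""
    for host, local_index, _global_index, name in TABLE:
        if (host, local_index) == (host_api_index, host_api_device_index):
            return name
    raise LookupError("no such device")


def name_by_index(device_index):
    """Look up a device by its index across all Host APIs."""
    for _host, _local_index, global_index, name in TABLE:
        if global_index == device_index:
            return name
    raise LookupError("no such device")


def device_count_of_host(host_api_index):
    """Count the devices that belong to one Host API."""
    return sum(1 for host, *_rest in TABLE if host == host_api_index)
```

In this model, name_by_host_api_device_index(1, 0) and name_by_index(3) both refer to device D.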

Here is the Host API information and Device information returned by get_host_api_info_by_index() and get_device_info_by_index(), respectively.

# Host API information. It follows the PaHostApiInfo struct. For details see:
# http://portaudio.com/docs/v19-doxydocs/structPaHostApiInfo.html
{
    'defaultInputDevice': 0,    # "device_index 2" of the default input device
    'defaultOutputDevice': 1,    # "device_index 2" of the default output device
    'deviceCount': 2,    # number of audio devices belonging to the Host API
    'index': 0,    # index of the Host API
    'name': 'Core Audio',    # name of the Host API
    'structVersion': 1,    # version number of the PaHostApiInfo struct
    'type': 5    # type of the Host API; for example, 5 means paCoreAudio (Core Audio on Mac OS)
}

# Device information. It follows the PaDeviceInfo struct. For details see:
# http://portaudio.com/docs/v19-doxydocs/structPaDeviceInfo.html
{
    # Default input latency for non-interactive use (such as playing a WAV file)
    'defaultHighInputLatency': 0.01310657596371882,
    # Default output latency for non-interactive use (such as playing a WAV file)
    'defaultHighOutputLatency': 0.1,
    # Default input latency for interactive use
    'defaultLowInputLatency': 0.0029478458049886623,
    # Default output latency for interactive use
    'defaultLowOutputLatency': 0.01,
    # Default sampling rate
    'defaultSampleRate': 44100.0,
    'hostApi': 0,    # index of the Host API the Device belongs to
    'index': 0,    # corresponds to "device_index 2" in the table above
    'maxInputChannels': 2,    # maximum number of input channels; 0 means no input
    'maxOutputChannels': 0,    # maximum number of output channels; 0 means no output
    'name': 'Built-in Microphone',    # name of the Device
    'structVersion': 2    # version number of the PaDeviceInfo struct
}
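
Because maxInputChannels and maxOutputChannels are 0 for devices that cannot record or play, they are enough to separate microphones from speakers. The following is a rough sketch that works on plain dicts shaped like the Device information above; the function names and the 'Built-in Output' entry are made up for illustration.

```python
def input_devices(device_infos):
    """Return the device info dicts that accept at least one input channel."""
    return [info for info in device_infos if info["maxInputChannels"] > 0]


def output_devices(device_infos):
    """Return the device info dicts that accept at least one output channel."""
    return [info for info in device_infos if info["maxOutputChannels"] > 0]


# Hypothetical sample data; real dicts come from get_device_info_by_index()
SAMPLE_DEVICES = [
    {"index": 0, "name": "Built-in Microphone",
     "maxInputChannels": 2, "maxOutputChannels": 0},
    {"index": 1, "name": "Built-in Output",
     "maxInputChannels": 0, "maxOutputChannels": 2},
]

print([info["name"] for info in input_devices(SAMPLE_DEVICES)])
print([info["name"] for info in output_devices(SAMPLE_DEVICES)])
```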

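You can also use get_host_api_info_by_type() to get the information for a Host API by specifying its type.
However, there is no API for looking up a name such as "pyaudio.paCoreAudio" from the number that represents the Host API type (enum PaHostApiTypeId in PortAudio), so write one yourself if you need it.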
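
A hand-written reverse lookup could look like the sketch below. The numbers are taken from PortAudio's PaHostApiTypeId enum as I understand it, so check them against the portaudio.h of your version; `host_api_type_name` is a name invented here.

```python
# Assumed values of PortAudio's PaHostApiTypeId enum (6 is unused)
HOST_API_TYPE_NAMES = {
    0: "paInDevelopment",
    1: "paDirectSound",
    2: "paMME",
    3: "paASIO",
    4: "paSoundManager",
    5: "paCoreAudio",
    7: "paOSS",
    8: "paALSA",
    9: "paAL",
    10: "paBeOS",
    11: "paWDMKS",
    12: "paJACK",
    13: "paWASAPI",
}


def host_api_type_name(type_id):
    """Return the constant name for a Host API type number."""
    return HOST_API_TYPE_NAMES.get(type_id, f"unknown({type_id})")


# With the Host API information shown earlier, 'type': 5 maps to paCoreAudio
print(host_api_type_name(5))
```

Hard-coding the table keeps the sketch independent of PyAudio; building it from the constants of the pyaudio module would be another option.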

Incidentally, the information for the default Host API and for the default input and output Devices is available from get_default_host_api_info(), get_default_input_device_info(), and get_default_output_device_info(), respectively.

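Using PyAudio for recording and playback

To record or play audio with PyAudio, you obtain a Stream instance from a device and operate on it.
When you create a Stream instance, you can specify whether it is used for input, for output, or for both. It is also possible to create an input-only Stream and pass its data, after processing, to an output-only Stream.
You can also choose which devices to use for input and output when creating the Stream (the default devices are used if you do not).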

# Open a Stream (blocking mode)
stream = p.open(format=pyaudio.paInt16,
                channels=2,
                rate=44100,
                input=True,
                output=True,
                input_device_index=0,
                output_device_index=1,
                frames_per_buffer=4096)

# Open a Stream (non-blocking mode; func is a callback function defined elsewhere)
stream = p.open(format=pyaudio.paInt16,
                channels=2,
                rate=44100,
                input=True,
                output=True,
                input_device_index=0,
                output_device_index=1,
                frames_per_buffer=4096,
                stream_callback=func)

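Example: loading and playing a WAV file

This is typical blocking processing. Open an output Stream, then write the contents of the WAV file to it chunk by chunk.
The code is based on the example in the official PyAudio documentation.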

import pyaudio
import wave

# Chunk size (granularity), in frames
CHUNK_SIZE = 1024

# Open the WAV file
wf = wave.open('test.wav', 'rb')

# Create a PyAudio instance
p = pyaudio.PyAudio()

# Open a Stream. The format, channel count, and sampling rate are matched to
# the WAV file here. Playback still runs if they do not match, but the sound
# will differ from the original (a different rate changes the speed and pitch).
# "Format" here means the bit depth. Note that it is given as a byte count:
# p.get_format_from_width(1) for 8-bit, p.get_format_from_width(2) for 16-bit.
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# Read one chunk of data
data = wf.readframes(CHUNK_SIZE)

# Writing the data to the Stream is what plays it
while len(data) > 0:
    # Write to the Stream
    stream.write(data)

    # Read the next chunk, and repeat
    data = wf.readframes(CHUNK_SIZE)

# Stop the Stream and close it. As long as it is not closed,
# it can be resumed with start_stream()
stream.stop_stream()
stream.close()

# Close the WAV file and dispose of the PyAudio instance
wf.close()
p.terminate()
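
The example assumes that a file named test.wav already exists. If you have none at hand, the standard wave module can generate one. The following is a minimal sketch that writes a mono, 16-bit sine tone; the function name and its default values are arbitrary choices for illustration.

```python
import math
import struct
import wave


def write_sine_wav(path, frequency=440.0, seconds=1.0, rate=44100, amplitude=0.5):
    """Write a mono 16-bit sine tone to `path` and return the frame count."""
    n_frames = int(rate * seconds)
    frames = bytearray()
    for n in range(n_frames):
        sample = amplitude * math.sin(2.0 * math.pi * frequency * n / rate)
        # 16-bit signed little-endian, as used by WAV files
        frames += struct.pack('<h', int(sample * 32767))

    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes = 16-bit
        wf.setframerate(rate)
        wf.writeframes(bytes(frames))
    return n_frames


# Intended usage:
#   write_sine_wav('test.wav')
```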

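Example: computing and continuously displaying the volume (RMS) of recorded audio

This is typical non-blocking processing. You open an input Stream, and the key point is to set a callback function when doing so.
The callback function takes four arguments, "(in_data, frame_count, time_info, status)", and returns a tuple "(out_data, flag)". The meaning of each is as follows.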

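in_data: bytes. The audio data recorded at that moment, in binary form. The WAV format is little-endian, interleaves stereo samples in LRLR order, and uses 1 byte per sample for 8-bit and 2 bytes per sample for 16-bit, so use struct to parse the binary data.
frame_count: int. The number of frames in the recorded audio data. It is not necessarily equal to the number of bytes in in_data; for stereo 16-bit audio, for example, frame_count × 4 = len(in_data).
time_info: dict. Holds the time the input was read from the input buffer, the time the output is written to the output buffer, and the current time, in seconds (probably the elapsed time since the OS booted). It can be used for things like synchronizing with other audio devices.
status: int. Holds the current state, as described in the official documentation.
out_data: bytes. Same layout as in_data. If the Stream was opened with output=True you naturally write data here; otherwise passing None is fine.
flag: int. Chooses whether recording continues, aborts, or completes. If you plan to call stream.stop_stream() later anyway, you can simply always return pyaudio.paContinue.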
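
As an illustration of the in_data layout and of the frame_count relation described above, here is a rough sketch that splits stereo 16-bit data into left and right channels with struct. It assumes little-endian signed samples in LRLR order; `split_stereo_int16` is a name invented for this example.

```python
import struct


def split_stereo_int16(in_data, frame_count):
    """Split interleaved stereo 16-bit bytes into (left, right) tuples."""
    channels = 2
    sample_width = 2  # bytes per sample for 16-bit audio

    # For stereo 16-bit audio, frame_count * 4 == len(in_data)
    expected = frame_count * channels * sample_width
    if len(in_data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(in_data)}")

    # '<' = little-endian, 'h' = signed 16-bit integer
    samples = struct.unpack(f'<{frame_count * channels}h', in_data)

    # Even positions are the left channel, odd positions the right
    return samples[0::2], samples[1::2]


# Two frames: (L=1, R=-1) and (L=2, R=-2)
data = struct.pack('<4h', 1, -1, 2, -2)
left, right = split_stereo_int16(data, 2)
print(left, right)
```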

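import math
import struct
import time

import pyaudio

def callback(in_data, frame_count, time_info, status):
    # Convert the bytes object to a sequence of numbers
    # (this example assumes 8-bit mono; each sample is one byte
    # in the range 0 to 255, with 128 as the midpoint)
    in_data2 = struct.unpack(f'<{len(in_data)}B', in_data)
    in_data3 = tuple((x - 128) / 128.0 for x in in_data2)

    # Compute the RMS of the samples (each a real number from -1 to 1)
    rms = math.sqrt(sum(x * x for x in in_data3) / len(in_data3))

    # Convert the RMS to decibels and print it
    db = 20 * math.log10(rms) if rms > 0.0 else -math.inf
    print(f"RMS: {db:3.1f} [dB]")

    return None, pyaudio.paContinue

p = pyaudio.PyAudio()

# paUInt8 is unsigned 8-bit, matching the 0 to 255 decoding in the callback
stream = p.open(format=pyaudio.paUInt8,
                channels=1,
                rate=8000,
                input=True,
                stream_callback=callback)

# The callback runs on a separate thread, so keep the main thread alive
# until the stream stops or Ctrl+C is pressed
try:
    while stream.is_active():
        time.sleep(0.1)
except KeyboardInterrupt:
    pass
finally:
    stream.stop_stream()
    stream.close()
    p.terminate()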
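
The same calculation can be tried without a microphone, and for 16-bit audio as well. The sketch below assumes little-endian signed 16-bit samples and normalizes by 32768; `rms_db_int16` is a name invented here, and the function could be called from a callback on streams opened with pyaudio.paInt16.

```python
import math
import struct


def rms_db_int16(in_data):
    """Return the RMS level in dB (relative to full scale) of 16-bit PCM bytes."""
    count = len(in_data) // 2
    if count == 0:
        return -math.inf

    # '<' = little-endian, 'h' = signed 16-bit integer
    samples = struct.unpack(f'<{count}h', in_data[:count * 2])

    # Normalize each sample to the range -1.0 to 1.0 before squaring
    rms = math.sqrt(sum((s / 32768.0) ** 2 for s in samples) / count)
    return 20 * math.log10(rms) if rms > 0.0 else -math.inf


# A constant half-scale signal is roughly 6 dB below full scale
half_scale = struct.pack('<4h', 16384, -16384, 16384, -16384)
print(f"{rms_db_int16(half_scale):.1f} dB")
```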
