Python Learning

Friday, 10 January 2020

Choosing columns in pandas DataFrame

Sometimes, you have a lot of columns in your DataFrame and want to use only some of them.

Picking specific columns

df['col1']

This command picks a single column and returns it as a Series.

df[['col1']]

Here, I pass a list with one column name, so I get a DataFrame instead of a Series.

df[['col1', 'col2']]

This is the same command as above, but this time I am choosing more than one column.
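To make the Series versus DataFrame difference concrete, here is a minimal sketch. The sample data and the variable names (single, frame_one, frame_two) are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})

single = df["col1"]                # single brackets -> Series
frame_one = df[["col1"]]           # list with one name -> one-column DataFrame
frame_two = df[["col1", "col2"]]   # list of names -> DataFrame with those columns

print(type(single).__name__)       # Series
print(type(frame_one).__name__)    # DataFrame
print(frame_two.columns.tolist())  # ['col1', 'col2']
```

The double brackets are not special syntax: the inner pair is an ordinary Python list of column names.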

Picking certain values from a column

df[df['col1'] == value]

You choose all of the rows where the value in column 1 is equal to the value.

df[df['col1'] != value]

All of the rows where the value in column 1 is not equal to the value.

df[df['col1'] < value]

All of the rows where the value in column 1 is smaller than the value.

df[df['col1'] > value]

All of the rows where the value in column 1 is bigger than the value.

[Screenshot: after filtering, all of the values in 'Police District' are 8]

df['col1'] == value

This is the same comparison as in the commands above, but on its own it returns a Series of Boolean values (True or False for each row) instead of the filtered rows.
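The comparisons above can be tried out on a small frame. This is only a sketch with invented data; the names mask, equal, not_equal, smaller and bigger are placeholders chosen for this example.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"col1": [5, 8, 8, 12], "col2": ["a", "b", "c", "d"]})
value = 8

mask = df["col1"] == value           # Boolean Series, one entry per row
equal = df[mask]                     # rows where col1 equals value
not_equal = df[df["col1"] != value]  # rows where col1 differs from value
smaller = df[df["col1"] < value]     # rows where col1 is below value
bigger = df[df["col1"] > value]      # rows where col1 is above value

print(mask.tolist())                 # [False, True, True, False]
print(equal["col2"].tolist())        # ['b', 'c']
print(smaller["col2"].tolist())      # ['a']
print(bigger["col2"].tolist())       # ['d']
```

Putting the Boolean Series inside the brackets is what turns the comparison into a row filter.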

Picking certain rows

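df.loc[index]

You can choose a row by using its index label (the number on the far left). The original post used df.ix here, but .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so use .loc for index labels and .iloc for integer positions.

df.loc['index name']

This command does exactly the same thing as above, but you use it when you have named your indices.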
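Here is a small sketch of row selection by label and by position with .loc and .iloc, which work on current pandas versions. The index names and values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a named index.
df = pd.DataFrame(
    {"col1": [10, 20, 30]},
    index=["first", "second", "third"],
)

by_label = df.loc["second"]   # row whose index label is "second"
by_position = df.iloc[1]      # second row, counting from 0

print(by_label["col1"])       # 20
print(by_position["col1"])    # 20
```

With the default integer index (0, 1, 2, ...) labels and positions coincide, so both accessors return the same row.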

Deleting columns

df.drop(['col1', 'col2'], axis=1)

This command returns a copy of the DataFrame without the specified columns. The original DataFrame is not changed unless you assign the result back to it.

[Screenshot: the DataFrame without the two columns that I dropped]

del df['col1']

This command deletes a specific column and modifies the DataFrame in place, so be careful how you use it.

[Screenshot: this column is now gone]
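The difference between the two ways of deleting columns is easy to check. This is a sketch on made-up data; trimmed is just a name picked for the example.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

trimmed = df.drop(["col1", "col2"], axis=1)  # returns a new DataFrame
print(trimmed.columns.tolist())  # ['col3']
print(df.columns.tolist())       # ['col1', 'col2', 'col3'] (df is unchanged)

del df["col1"]                   # removes the column from df itself
print(df.columns.tolist())       # ['col2', 'col3']
```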

All of the code can be found on my GitHub: https://github.com/kasiarachuta/Blog/blob/master/Choosing%20specific%20columns%20or%20rows%20from%20pandas%20DataFrame.ipynb

from : https://medium.com/@kasiarachuta/choosing-columns-in-pandas-dataframe-d0677b34a6ca

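Printing tables with tabulate in Python

tabulate is a library that helps you print neatly formatted tables. It is very convenient to use and supports many output formats. Once you have installed it with pip install tabulate, it is ready to use. Here is a simple example: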

from tabulate import tabulate

table = [["spam", 42], ["eggs", 451], ["bacon", 0]]
headers = ["item", "qty"]

# Every table format shown in the output below
fmts = ["plain",
        "simple",
        "grid",
        "fancy_grid",
        "pipe",
        "orgtbl",
        "jira",
        "psql",
        "rst",
        "mediawiki",
        "moinmoin",
        "html",
        "latex",
        "latex_booktabs",
        "textile"]

# Print the name of each format, then the table rendered in it
for fmt in fmts:
    print(fmt + ":")
    print(tabulate(table, headers, tablefmt=fmt))

The printed tables look like this:

plain:

item      qty
spam       42
eggs      451
bacon       0

simple:

item      qty
------  -----
spam       42
eggs      451
bacon       0

grid:

+--------+-------+
| item   |   qty |
+========+=======+
| spam   |    42 |
+--------+-------+
| eggs   |   451 |
+--------+-------+
| bacon  |     0 |
+--------+-------+

fancy_grid:

╒════════╤═══════╕
│ item   │   qty │
╞════════╪═══════╡
│ spam   │    42 │
├────────┼───────┤
│ eggs   │   451 │
├────────┼───────┤
│ bacon  │     0 │
╘════════╧═══════╛

pipe:

| item   |   qty |
|:-------|------:|
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |

orgtbl:

| item   |   qty |
|--------+-------|
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |

jira:

|| item   ||   qty ||
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |

psql:

+--------+-------+
| item   |   qty |
|--------+-------|
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |
+--------+-------+

rst:

======  =====
item      qty
======  =====
spam       42
eggs      451
bacon       0
======  =====

mediawiki:

{| class="wikitable" style="text-align: left;"
|+ <!-- caption -->
|-
! item   !! align="right"|   qty
|-
| spam   || align="right"|    42
|-
| eggs   || align="right"|   451
|-
| bacon  || align="right"|     0
|}

moinmoin:

|| ''' item   ''' ||<style="text-align: right;"> '''   qty ''' ||
||  spam    ||<style="text-align: right;">     42  ||
||  eggs    ||<style="text-align: right;">    451  ||
||  bacon   ||<style="text-align: right;">      0  ||

html:

<table>
<thead>
<tr><th>item  </th><th style="text-align: right;">  qty</th></tr>
</thead>
<tbody>
<tr><td>spam  </td><td style="text-align: right;">   42</td></tr>
<tr><td>eggs  </td><td style="text-align: right;">  451</td></tr>
<tr><td>bacon </td><td style="text-align: right;">    0</td></tr>
</tbody>
</table>

latex:

\begin{tabular}{lr}
\hline
 item   &   qty \\
\hline
 spam   &    42 \\
 eggs   &   451 \\
 bacon  &     0 \\
\hline
\end{tabular}

latex_booktabs:

\begin{tabular}{lr}
\toprule
 item   &   qty \\
\midrule
 spam   &    42 \\
 eggs   &   451 \\
 bacon  &     0 \\
\bottomrule
\end{tabular}

textile:

|_.  item   |_.   qty |
|<. spam    |>.    42 |
|<. eggs    |>.   451 |
|<. bacon   |>.     0 |

from : http://blog.zhengyi.one/tabulate.html

Counting word frequency across multiple documents

Inspired by the Counter collection that you use:

import os
import re
from collections import Counter
from glob import glob

folderpaths = 'd:/individual-articles'
counter = Counter()

# Collect every .txt file in the folder
filepaths = glob(os.path.join(folderpaths, '*.txt'))
for file in filepaths:
    with open(file) as f:
        # Lowercase the text and split it into word tokens
        words = re.findall(r'\w+', f.read().lower())
        # Add this file's counts to the running total
        counter.update(words)
print(counter)
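The same accumulation pattern works when the words are already in memory as several lists. The sketch below is not part of the quoted answer; the documents list is invented sample data standing in for the words of three files.

```python
from collections import Counter

# Hypothetical token lists standing in for the words of three documents.
documents = [
    ["spam", "eggs", "spam"],
    ["eggs", "bacon"],
    ["spam"],
]

counter = Counter()
for words in documents:
    counter.update(words)  # adds this list's counts to the running total

print(counter.most_common(2))  # [('spam', 3), ('eggs', 2)]
```

Calling update() changes the existing Counter, whereas counter + Counter(words) builds a new Counter on every pass through the loop.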

from : https://stackoverflow.com/questions/17399535/to-count-the-word-frequency-in-multiple-documents-python

Thursday, 9 January 2020

Fix TypeError: sequence item 3: expected string, float found

IIUC, you need to convert the values to strings inside the lambda function:

g = df1.groupby('Attribute_spcName')['Char_spcValue'].apply(lambda x: ', '.join(x.astype(str)))
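To see where the error comes from, here is a small sketch. The column names are taken from the snippet above, but the values in df1 are invented, and the error_message variable is only there to capture the exception text.

```python
import pandas as pd

# Hypothetical data: the value column mixes strings and a float.
df1 = pd.DataFrame({
    "Attribute_spcName": ["color", "color", "size"],
    "Char_spcValue": ["red", 1.5, "large"],
})

# str.join only accepts strings, so the float triggers a TypeError.
try:
    ", ".join(df1["Char_spcValue"])
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Converting each group to str first makes the join safe.
g = df1.groupby("Attribute_spcName")["Char_spcValue"].apply(
    lambda x: ", ".join(x.astype(str))
)
print(g["color"])  # red, 1.5
print(g["size"])   # large
```

The item number in the error message is the position of the first non-string value in the sequence being joined.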

from : https://stackoverflow.com/questions/49359266/typeerror-sequence-item-3-expected-string-float-found

Python Pandas : How to display full Dataframe i.e. print all rows & columns without truncation

pd.set_option('display.max_rows', None)

pd.set_option('display.max_columns', None)

pd.set_option('display.width', None)

pd.set_option('display.max_colwidth', None)

(The original post passed -1 here. pandas 1.0 deprecated -1 for this option in favor of None.)
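If you only want the full output for one print, pd.option_context applies the options temporarily and restores the old values afterwards. This is a sketch on a made-up 100-row frame; the names before, default_repr and full_repr are placeholders.

```python
import pandas as pd

# Hypothetical frame that is long enough to be truncated by default.
df = pd.DataFrame({"n": range(100)})

before = pd.get_option("display.max_rows")
default_repr = repr(df)  # truncated under the default settings

# The options apply only inside the with block.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full_repr = repr(df)

after = pd.get_option("display.max_rows")

print(len(default_repr.splitlines()) < len(full_repr.splitlines()))
print(before == after)
```

Unlike pd.set_option, this leaves the display settings untouched for the rest of the session.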

from : https://thispointer.com/python-pandas-how-to-display-full-dataframe-i-e-print-all-rows-columns-without-truncation/