Friday, 10 January 2020

Choosing columns in pandas DataFrame

Sometimes, you have a lot of columns in your DataFrame and want to use only some of them.

Picking specific columns

df['col1']
This command picks a single column and returns it as a Series.
df[['col1']]
Here, I passed a list containing one column name, so I get a DataFrame instead of a Series.
df[['col1', 'col2']]
This is the same command as above, but this time the list names more than one column, so both columns are returned in one DataFrame.
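
A minimal sketch of the difference between the two bracket styles, using a small made-up DataFrame (the column names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data, not from the original post
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["a", "b", "c"],
    "col3": [0.1, 0.2, 0.3],
})

one_series = df["col1"]          # single brackets: a Series
one_frame = df[["col1"]]         # list of one name: a DataFrame
two_frame = df[["col1", "col2"]] # list of two names: a DataFrame

print(type(one_series).__name__)
print(type(one_frame).__name__)
print(list(two_frame.columns))
```

The list form is handy when later code expects a DataFrame even though only one column is needed.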

Picking certain values from a column

df[df['col1'] == value]
You choose all of the rows where column 1 is equal to the value.
df[df['col1'] != value]
All of the rows where column 1 is not equal to the value.
df[df['col1'] < value]
All of the rows where column 1 is smaller than the value.
df[df['col1'] > value]
All of the rows where column 1 is bigger than the value.
Example result: after filtering for equality with 8, all of the values in 'Police District' are 8.
df['col1'] == value
Similar to the commands above, but without the outer df[...] you get a Series of Boolean values (True where the condition holds) instead of the filtered rows.
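
As a rough illustration of how the Boolean column and the filtered rows relate, here is a small sketch with invented data (the values and the name `mask` are assumptions, not part of the original post):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col1": [5, 8, 8, 12], "col2": ["a", "b", "c", "d"]})
value = 8

mask = df["col1"] == value        # Boolean Series, one entry per row
equal_rows = df[mask]             # rows where the mask is True
not_equal_rows = df[df["col1"] != value]
smaller_rows = df[df["col1"] < value]
bigger_rows = df[df["col1"] > value]

print(mask.tolist())
print(equal_rows)
```

Storing the mask in a variable first makes it easy to reuse the same condition or to inspect how many rows match with `mask.sum()`.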

Picking certain rows

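df.loc[index]
You can choose a row by using its index label (the number on the far left). The older df.ix accessor did this too, but it was deprecated and then removed in pandas 1.0.
For example, df.loc[0] picks the row with index 0.
df.loc['index name']
This command does exactly the same thing as above, but you use it when you have named your indices. To pick a row by its integer position regardless of the labels, use df.iloc[position].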
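
In current pandas releases, row lookup by label and by position is done with .loc and .iloc. A minimal sketch with made-up index names (the data here is an assumption for illustration):

```python
import pandas as pd

# Hypothetical DataFrame with named indices
df = pd.DataFrame(
    {"col1": [10, 20, 30], "col2": ["x", "y", "z"]},
    index=["first", "second", "third"],
)

by_label = df.loc["second"]   # row whose index label is "second"
by_position = df.iloc[0]      # first row, whatever its label is

print(by_label)
print(by_position)
```

Both calls return the row as a Series whose entries are labelled by the column names.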

Deleting columns

df.drop(['col1', 'col2'], axis=1)
This command returns a new DataFrame without the listed columns. df itself is unchanged unless you assign the result back to it.
The result no longer has the two columns that I dropped.
del df['col1']
This command deletes a specific column from the DataFrame in place, so be careful how you use it.
After running it, this column is gone from df.
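
The difference between the two approaches can be sketched as follows, again with invented data (the name `trimmed` is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# drop returns a new DataFrame and leaves df untouched
trimmed = df.drop(["col1", "col2"], axis=1)
columns_after_drop = list(df.columns)

# del removes the column from df itself
del df["col1"]
columns_after_del = list(df.columns)

print(list(trimmed.columns))
print(columns_after_drop)
print(columns_after_del)
```

If the original DataFrame is still needed later, drop is the safer choice because nothing is lost until the result is assigned back.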

from : https://medium.com/@kasiarachuta/choosing-columns-in-pandas-dataframe-d0677b34a6ca

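Printing tables with tabulate in Python

tabulate is a library that helps you print neatly formatted tables. It is very convenient to use and supports many output formats. After installing it with pip install tabulate, it is ready to use. Here is a simple example: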
from tabulate import tabulate

table = [["spam", 42], ["eggs", 451], ["bacon", 0]]
headers = ["item", "qty"]

fmts = ["plain",
        "simple",
        "grid",
        "fancy_grid",
        "pipe",
        "orgtbl",
        "jira",
        "psql",
        "rst",
        "mediawiki",
        "moinmoin",
        "html",
        "latex",
        "latex_booktabs",
        "textile"]

# Print the same table once in every format
for fmt in fmts:
    print(fmt + ":")
    print(tabulate(table, headers, tablefmt=fmt))
The printed tables look like this:
plain:
item      qty
spam       42
eggs      451
bacon       0
simple:
item      qty
------  -----
spam       42
eggs      451
bacon       0
grid:
+--------+-------+
| item   |   qty |
+========+=======+
| spam   |    42 |
+--------+-------+
| eggs   |   451 |
+--------+-------+
| bacon  |     0 |
+--------+-------+
fancy_grid:
╒════════╤═══════╕
│ item   │   qty │
╞════════╪═══════╡
│ spam   │    42 │
├────────┼───────┤
│ eggs   │   451 │
├────────┼───────┤
│ bacon  │     0 │
╘════════╧═══════╛
pipe:
| item   |   qty |
|:-------|------:|
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |
orgtbl:
| item   |   qty |
|--------+-------|
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |
jira:
|| item   ||   qty ||
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |
psql:
+--------+-------+
| item   |   qty |
|--------+-------|
| spam   |    42 |
| eggs   |   451 |
| bacon  |     0 |
+--------+-------+
rst:
======  =====
item      qty
======  =====
spam       42
eggs      451
bacon       0
======  =====
mediawiki:
{| class="wikitable" style="text-align: left;"
|+ <!-- caption -->
|-
! item   !! align="right"|   qty
|-
| spam   || align="right"|    42
|-
| eggs   || align="right"|   451
|-
| bacon  || align="right"|     0
|}
moinmoin:
|| ''' item   ''' ||<style="text-align: right;"> '''   qty ''' ||
||  spam    ||<style="text-align: right;">     42  ||
||  eggs    ||<style="text-align: right;">    451  ||
||  bacon   ||<style="text-align: right;">      0  ||
html:
<table>
<thead>
<tr><th>item  </th><th style="text-align: right;">  qty</th></tr>
</thead>
<tbody>
<tr><td>spam  </td><td style="text-align: right;">   42</td></tr>
<tr><td>eggs  </td><td style="text-align: right;">  451</td></tr>
<tr><td>bacon </td><td style="text-align: right;">    0</td></tr>
</tbody>
</table>
latex:
\begin{tabular}{lr}
\hline
 item   &   qty \\
\hline
 spam   &    42 \\
 eggs   &   451 \\
 bacon  &     0 \\
\hline
\end{tabular}
latex_booktabs:
\begin{tabular}{lr}
\toprule
 item   &   qty \\
\midrule
 spam   &    42 \\
 eggs   &   451 \\
 bacon  &     0 \\
\bottomrule
\end{tabular}
textile:
|_.  item   |_.   qty |
|<. spam    |>.    42 |
|<. eggs    |>.   451 |
|<. bacon   |>.     0 |


from : http://blog.zhengyi.one/tabulate.html

Counting word frequency across multiple documents

Inspired by the Counter collection that you use:
import os
import re
from glob import glob
from collections import Counter

folderpaths = 'd:/individual-articles'
counter = Counter()

# Collect every .txt file in the folder
filepaths = glob(os.path.join(folderpaths, '*.txt'))
for file in filepaths:
    with open(file) as f:
        # Lowercase the text and split it into word tokens
        words = re.findall(r'\w+', f.read().lower())
        # Add this file's counts to the running total
        counter.update(words)
print(counter)
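
To try the same idea without a folder of articles at hand, here is a self-contained sketch that writes two tiny sample files into a temporary directory first. The file names and their contents are made up for illustration and are not part of the original answer:

```python
import os
import re
import tempfile
from glob import glob
from collections import Counter

# Hypothetical sample documents
samples = {"a.txt": "Spam eggs spam", "b.txt": "eggs bacon"}

counter = Counter()
with tempfile.TemporaryDirectory() as folder:
    # Write the sample files so there is something to count
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write(text)

    # Same loop as in the post: read each file and accumulate word counts
    for path in glob(os.path.join(folder, "*.txt")):
        with open(path, encoding="utf-8") as f:
            counter.update(re.findall(r"\w+", f.read().lower()))

# Show the most frequent words first
print(counter.most_common())
```

Counter.most_common() sorts by count, which is usually the view you want when comparing word frequencies across documents.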

from : https://stackoverflow.com/questions/17399535/to-count-the-word-frequency-in-multiple-documents-python

Thursday, 9 January 2020