電腦領域 HKEPC Hardware - Powered by Discuz! Board

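Title: [Technical Discussion] Looking for software to manage 10TB of files on HDD/NAS

Author: CarlR696 Time: 2023-11-4 01:04 Title: Looking for software to manage 10TB of files on HDD/NAS

Notice: The author has been banned or deleted; content automatically hidden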

Author: bongbong3481 Time: 2023-11-4 09:56

Leaving a reply here so I can follow along and learn.

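Author: eo38cl Time: 2023-11-4 14:53

Last edited by eo38cl on 2023-11-4 14:54

On Windows, the best option is Whereisit, but it is no longer updated. The last version dates from 2014 (build 220), and it can no longer be bought from the official site.

After that come WinCatalog, Everything, abeMeda (formerly CDWinder), and so on.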
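
For anyone who would rather roll their own, the general idea of a disk catalog (index a disk once, then search the index later while the disk is unplugged) can be sketched in a few lines of Python. This is only an illustration and makes no claim about how any of the tools above work internally; the table layout, the function names build_catalog and search_catalog, and the disk label are all made up for the example.

```python
import os
import sqlite3


def build_catalog(db_path, root, label):
    """Record path, size and mtime of every file under root into SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(disk TEXT, path TEXT, size INTEGER, mtime REAL)"
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that cannot be read
            # label is a hypothetical name for the disk, e.g. "HDD-01"
            rows.append((label, path, st.st_size, st.st_mtime))
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)


def search_catalog(db_path, keyword):
    """Find catalogued files whose path contains keyword."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT disk, path, size FROM files WHERE path LIKE ? ORDER BY path",
            (f"%{keyword}%",),
        )
        return cur.fetchall()
    finally:
        conn.close()
```

Calling build_catalog("catalog.db", "E:\\", "HDD-01") once per disk and then search_catalog("catalog.db", "holiday") would list matching paths along with the disk they live on. Keeping the disk label in each row is what lets the search answer "which drive do I need to plug in".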

Author: CarlR696 Time: 2023-11-4 20:57

Notice: The author has been banned or deleted; content automatically hidden

Author: ericauky Time: 2023-11-4 22:51

Fellow members, a side question while we are on the topic: is there a program that can check whether an HDD contains duplicate files?

Author: CarlR696 Time: 2023-11-5 12:07

Notice: The author has been banned or deleted; content automatically hidden

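Author: SuperElephant Time: 2023-11-13 16:31

Last edited by SuperElephant on 2023-11-13 19:46

Reply to 5# ericauky
Comparing large files takes time. The answer below compares the full content of every file (by hash), not the file names.

Open cmd, type "python --version" and press Enter. If no version is shown, install Python.
Create the file C:\dedup.py, copy and paste the code below into dedup.py, then save it.
Open cmd again and run: python "C:\dedup.py" "D:\target_directory"
Change "D:\target_directory" to the path you want to search for duplicates.

Code: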

import os
import hashlib
import argparse


def calculate_sha256(file_path):
    # Note: this reads the whole file into memory before hashing it
    with open(file_path, 'rb') as file:
        data = file.read()
    return hashlib.sha256(data).hexdigest()


def find_duplicates(target_directory):
    # Map each content hash to the list of paths that have that content
    file_hashes = {}
    for dirpath, dirnames, filenames in os.walk(target_directory):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            file_hash = calculate_sha256(file_path)
            if file_hash in file_hashes:
                file_hashes[file_hash].append(file_path)
            else:
                file_hashes[file_hash] = [file_path]
    # Keep only the hashes shared by more than one file
    duplicates = {k: v for k, v in file_hashes.items() if len(v) > 1}
    return duplicates


# Parse command-line arguments
parser = argparse.ArgumentParser(description='Find duplicate files in a directory.')
parser.add_argument('directory', type=str, help='The target directory.')
args = parser.parse_args()
target_directory = args.directory
duplicates = find_duplicates(target_directory)
for file_hash, file_paths in duplicates.items():
    print(f"Duplicate files for hash {file_hash}:")
    for file_path in file_paths:
        print(f"\t{file_path}")
print(f"Scan completed. Found {len(duplicates)} duplicate hashes.")

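
Since comparing large files takes time, one common way to cut the work down is to group files by size first and hash only those that share a size with at least one other file, because files of different sizes cannot be identical. What follows is a rough sketch of that idea rather than part of the script above; the function names are hypothetical and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates_by_size_then_hash(target_directory):
    """Return {sha256: [paths]} for content that appears more than once."""
    # Pass 1: group by size. Only metadata is read here, so this is cheap.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(target_directory):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it

    # Pass 2: hash only files that share their size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[sha256_of(path)].append(path)
            except OSError:
                continue
    return {h: p for h, p in by_hash.items() if len(p) > 1}
```

On a disk full of large, mostly unique files (videos, disk images), most files have a unique size and are never opened at all, which is where the time saving would come from.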

Author: hoho1986 Time: 2023-11-17 09:57

WizTree
It can export to CSV, and it works with external HDDs too.
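
Once a listing has been exported to CSV, it can be searched without the disk attached. The sketch below is one way to do that in Python. It assumes the export has a header row containing columns named "File Name" and "Size", possibly preceded by a banner line; those column names are an assumption and should be checked against the actual file. The function name search_listing is made up for the example.

```python
import csv


def search_listing(csv_path, keyword, name_col="File Name", size_col="Size"):
    """Return (name, size) pairs whose name contains keyword, ignoring case."""
    keyword = keyword.lower()
    matches = []
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        name_i = size_i = None
        for row in csv.reader(f):
            if name_i is None:
                # Skip any banner lines until the header row shows up.
                # The column names are assumed, not taken from any spec.
                if name_col in row and size_col in row:
                    name_i, size_i = row.index(name_col), row.index(size_col)
                continue
            if len(row) > max(name_i, size_i) and keyword in row[name_i].lower():
                matches.append((row[name_i], row[size_i]))
    return matches
```

The file is read row by row rather than loaded whole, since a listing for several terabytes of files can run to millions of rows.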

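Author: Jip仔 Time: 2023-11-17 16:24

Fellow members, a side question while we are on the topic: is there a program that can check whether an HDD contains duplicate files?
ericauky posted on 2023-11-4 22:51

Duplicate Cleaner Free
https://www.duplicatecleaner.com/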

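Author: CarlR696 Time: 2023-11-17 17:16

Notice: The author has been banned or deleted; content automatically hidden

Author: CarlR696 Time: 2023-11-17 17:17

Notice: The author has been banned or deleted; content automatically hidden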

Author: sonichkhk Time: 2023-11-17 22:07

Reply to 5# ericauky

Use fclones or ddh.

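Author: Jip仔 Time: 2023-11-17 23:01

Answering with something that a single Google search turns up
is an insult.

What I ask is never that simple. ...
CarlR696 posted on 2023-11-17 17:17

OP, may I ask what you mean by replying to me like this?
I was replying to post #5 (finding duplicate files), not to you.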

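Author: s20012797 Time: 2023-11-17 23:42

Reply to ericauky
Comparing large files takes time. The answer below compares the full content of every file (by hash), not the file names.

Open cmd and type "pyth ...
SuperElephant posted on 2023/11/13 16:31

If it were me, I would write it like this:

DupFileFind.py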

# This script is more robust when dealing with large files and exceptions, and presents the results in a more user-friendly way.
import os
import hashlib
import argparse


# This function is used to calculate the SHA256 hash of a file
def calculate_sha256(file_path):
    try:
        # Create a new hashlib.sha256 object
        hash_sha256 = hashlib.sha256()
        # Open the file and read its content
        with open(file_path, 'rb') as file:
            # Read the file content in chunks and update the hash
            for chunk in iter(lambda: file.read(4096), b""):
                hash_sha256.update(chunk)
        # Return the hash in hexadecimal format
        return hash_sha256.hexdigest()
    except Exception as e:
        # If an error occurs while reading the file, print the error message and return None
        print(f"Error reading file {file_path}: {str(e)}")
        return None


# This function is used to find duplicate files in the target directory
def find_duplicates(target_directory):
    # Create a dictionary to store the file hashes and their corresponding file paths
    file_hashes = {}
    # Traverse the target directory and all its subdirectories
    for dirpath, dirnames, filenames in os.walk(target_directory):
        # Traverse each file
        for filename in filenames:
            # Get the full path of the file
            file_path = os.path.join(dirpath, filename)
            # Calculate the hash of the file
            file_hash = calculate_sha256(file_path)
            # Skip files that could not be read (hash is None)
            # If the file hash is already in the dictionary, add the file path to the corresponding list
            # Otherwise, create a new list and add the file path to it
            if file_hash:
                if file_hash in file_hashes:
                    file_hashes[file_hash].append(file_path)
                else:
                    file_hashes[file_hash] = [file_path]
    # Select the hashes that have more than one file path from the dictionary, these are the duplicate files
    duplicates = {k: v for k, v in file_hashes.items() if len(v) > 1}
    return duplicates


# Parse command-line arguments
parser = argparse.ArgumentParser(description='Find duplicate files in a directory.')
parser.add_argument('directory', type=str, help='The target directory.')
args = parser.parse_args()
# Get the target directory
target_directory = args.directory
# Find duplicate files
duplicates = find_duplicates(target_directory)
# Print the information of duplicate files
for file_hash, file_paths in duplicates.items():
    print(f"Duplicate files for hash {file_hash}:")
    for file_path in file_paths:
        print(f"\t{file_path}")
# Print the completion message
print(f"Scan completed. Found {len(duplicates)} duplicate hashes.")

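
The chunked-reading loop is what keeps memory use flat on large files. As a side note, Python 3.11 and later ship hashlib.file_digest, which does the chunked reading itself. A minimal sketch that uses it when available and otherwise falls back to a manual loop with a larger chunk; the function name sha256_file and the 1 MiB chunk size are choices made for this example, not something from the script above:

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file without loading it whole."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: the standard library handles the chunked reads.
            return hashlib.file_digest(f, "sha256").hexdigest()
        # Older Pythons: read fixed-size chunks and feed them to the hash.
        digest = hashlib.sha256()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
        return digest.hexdigest()
```

Either branch produces the same digest as hashing the whole file at once, so it could stand in for calculate_sha256 if the error handling is kept around the call.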

Author: SuperElephant Time: 2023-11-18 15:00

All of us here understand the point of replying, and we all reply to posts with that same idea in mind.

Author: CarlR696 Time: 2023-11-18 15:43

Notice: The author has been banned or deleted; content automatically hidden

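Author: SuperElephant Time: 2023-11-18 16:29

Reply to 16# CarlR696

Take a broader view.
What we enjoy is watching people whose skill levels are worlds apart,
and replying with the loftiest stance, the deepest information, and the most detailed answers.
The contrast is what makes it entertaining,
and the irony carries a touch of insult.
Those who get it will get it; whether EPC is rubbish or not is beyond anyone's control.
Replying to these posts is not only about answering the question.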

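Author: rabbit82047 Time: 2023-11-18 17:09

If you want to ask, ask; if you want to reply, reply. There is no ranking of who is more advanced.

If the answer is "just Google it", then nobody should ask anything; might as well bring in ChatGPT too.
Just shut the whole forum down, since everyone is such an expert that nobody needs to ask.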